Search API
Query string syntax
If you'd like to run more complex queries than simple words or phrases, reading https://www.elastic.co/guide/en/elasticsearch/reference/1.4/query-dsl-query-string-query.html#query-string-syntax should prove very useful. The datastore behind DOAJ is Elasticsearch, and knowing more about its query syntax will let you send more advanced queries to DOAJ. This is not a prerequisite for using the DOAJ API: the sections below give instructions for the most common use cases. If you have achieved something with the API that you think would be useful to others, please email us; we would be glad to add it as an example to the API documentation below.
Default handling of phrases
When searching for e.g. "understanding shadows in 3D scenes", DOAJ's web interface will return articles and journals whose metadata contains *all* of the words "understanding", "shadows", "in" (which may be ignored), "3D" and "scenes". In technical terms, the default query operator is AND. You can override it by sending a query such as "understanding OR shadows". We find that AND queries return much more relevant results when looking for specific topics, whereas OR queries are best for exploring what is available, e.g. based loosely on the interests of your users.
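To make the difference between the two operators concrete, here is a toy Python illustration. It is not how Elasticsearch analyses or ranks text (there is no stemming, stop-word handling or scoring here); it only checks whether every term (AND) or at least one term (OR) appears in a few made-up titles.

```python
# Toy illustration only: NOT how Elasticsearch analyses or scores text.
# It contrasts the AND and OR operators on a tiny, made-up list of titles.

def tokens(text: str) -> set[str]:
    """Lower-case, whitespace-split tokenisation (a crude stand-in for real analysis)."""
    return set(text.lower().split())

def matches(query_terms: list[str], text: str, operator: str = "AND") -> bool:
    """Return True if text satisfies the query under the given operator."""
    words = tokens(text)
    terms = [t.lower() for t in query_terms]
    if operator == "AND":
        return all(t in words for t in terms)  # every term must appear
    if operator == "OR":
        return any(t in words for t in terms)  # at least one term must appear
    raise ValueError(f"unsupported operator: {operator}")

# Hypothetical titles, invented for this example.
titles = [
    "Understanding shadows in 3D scenes",
    "Shadows and light in painting",
    "Understanding protein folding",
]

query = ["understanding", "shadows"]
and_hits = [t for t in titles if matches(query, t, "AND")]
or_hits = [t for t in titles if matches(query, t, "OR")]
print(and_hits)  # ['Understanding shadows in 3D scenes']
print(or_hits)   # all three titles
```

The AND query keeps only the title containing both words, while the OR query returns everything that mentions either word, which is why OR suits exploration better than targeted searches.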
Searching inside a specific field
When querying a specific field, you can use the JSON dot notation used by Elasticsearch. For example, to search on the journal title of an article, you could use:
bibjson.journal.title:"Journal of Science"
Note that all fields are analysed, which means that the above search does not look for the exact string "Journal of Science". To match the exact contents of a field, add ".exact" to any string field (not date or number fields):
bibjson.journal.title.exact:"Journal of Science"
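A small Python helper can build such field queries. This is a hypothetical convenience function, not part of any DOAJ client library; it assumes values are wrapped in double quotes as in the examples above, and it is up to you to use exact=True only on string fields.

```python
def field_query(field: str, value: str, exact: bool = False) -> str:
    """Build a field-scoped query such as bibjson.journal.title:"Journal of Science".

    Hypothetical helper. If exact is True, ".exact" is appended to the field name;
    per the documentation this only applies to string fields, not date or number fields.
    """
    if exact:
        field = f"{field}.exact"
    # Quote the value so a multi-word value is treated as a phrase,
    # escaping any double quotes it already contains.
    escaped = value.replace('"', '\\"')
    return f'{field}:"{escaped}"'

print(field_query("bibjson.journal.title", "Journal of Science"))
# bibjson.journal.title:"Journal of Science"
print(field_query("bibjson.journal.title", "Journal of Science", exact=True))
# bibjson.journal.title.exact:"Journal of Science"
```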
Special characters
All forward slash (/) characters will be automatically escaped for you unless you escape them yourself. This means any forward slash / will become \/, which ends up encoded as %5C/ in a URL, since a "naked" backslash \ is not allowed in a URL. So you can search for a DOI by giving the articles endpoint either of the following queries (they will give you the same results):
doi:10.3389/fpsyg.2013.00479
doi:10.3389%5C/fpsyg.2013.00479
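The escaping rule above can be sketched in Python as follows. DOAJ applies this escaping for you, so the sketch only matters if you want to escape slashes yourself or see what the resulting URL fragment looks like. The helper names are hypothetical, and keeping ':' and '/' unencoded assumes, as the example above suggests, that the query is placed in the URL path.

```python
import re
from urllib.parse import quote

def escape_slashes(query: str) -> str:
    """Escape each forward slash that is not already preceded by a backslash."""
    # (?<!\\) is a negative lookbehind: skip slashes that are already escaped.
    return re.sub(r"(?<!\\)/", r"\\/", query)

def encode_for_url(query: str) -> str:
    """Percent-encode the query, keeping '/' and ':' readable.

    A backslash is not URL-safe, so it becomes %5C.
    """
    return quote(query, safe="/:")

raw = "doi:10.3389/fpsyg.2013.00479"
escaped = escape_slashes(raw)
print(escaped)                  # doi:10.3389\/fpsyg.2013.00479
print(encode_for_url(escaped))  # doi:10.3389%5C/fpsyg.2013.00479

# Already-escaped input is left alone, so escaping twice is harmless.
assert escape_slashes(escaped) == escaped
```

Because already-escaped slashes are skipped, sending either form of the DOI query ends up at the same escaped query, which matches the behaviour described above.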
Short field names
For convenience we also offer shorter field names for you to use when querying. Note that you cannot use the ".exact" notation mentioned above on these substitutions.
The substitutions for journals are as follows:
- title - search within the journal's title
- issn - the journal's ISSN
- publisher - the journal's publisher (not exact match)
- license - the journal's licence (exact match)
In addition, if you have a publisher account with the DOAJ, you may use the field "username" to query for your own publicly available journal records. Usernames are not available in the returned journal records, and no list of usernames is available to the public; you need to know your own username to use this field. For example, include "username:myusername" in your search.
The substitutions for articles are as follows:
- title - search within the article title
- doi - the article's DOI
- issn - the ISSN of the article's journal
- publisher - the article's journal's publisher (not exact match)
- abstract - search within the article abstract
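If you build queries programmatically, a client-side check along these lines can catch unknown short field names, or attempts to combine them with ".exact", before a request is sent. The field sets are copied from the lists above (including "username" for journals); the helper itself is hypothetical, and quoting every value is a choice made here, not a DOAJ requirement.

```python
# Short field names as listed in this documentation.
SHORT_FIELDS = {
    "journal": {"title", "issn", "publisher", "license", "username"},
    "article": {"title", "doi", "issn", "publisher", "abstract"},
}

def short_field_query(kind: str, field: str, value: str) -> str:
    """Build a query on a short field name for kind "journal" or "article".

    Hypothetical helper: rejects ".exact" (not supported on short names)
    and names that are not in the documented lists.
    """
    allowed = SHORT_FIELDS[kind]  # raises KeyError for any other kind
    if field.endswith(".exact"):
        raise ValueError("'.exact' cannot be used with short field names")
    if field not in allowed:
        raise ValueError(f"{field!r} is not a short field name for {kind}s")
    return f'{field}:"{value}"'

# The ISSN below is a made-up placeholder.
print(short_field_query("journal", "issn", "1234-5678"))  # issn:"1234-5678"
print(short_field_query("article", "doi", "10.3389/fpsyg.2013.00479"))
```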
Sorting of results
Each request can take a "sort" URL parameter, in one of the following forms:
sort=field
sort=field:direction
The field again uses the dot notation.
If specifying the direction, it must be one of "asc" or "desc". If no direction is supplied then "asc" is used.
For example:
sort=bibjson.title
sort=bibjson.title:desc
Note that for fields which may contain multiple values (i.e. arrays), the sort will use the "smallest" value in that field (depending on the definition of "smallest" for that field type).
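A minimal Python sketch for building the sort parameter, validating the direction as described above. The helper name is hypothetical. The last line uses the standard library's urlencode, which percent-encodes the colon as %3A; standard URL decoding turns it back into a colon.

```python
from typing import Optional
from urllib.parse import urlencode

def sort_param(field: str, direction: Optional[str] = None) -> str:
    """Return the value of the "sort" URL parameter: "field" or "field:direction"."""
    if direction is None:
        return field  # no direction given: "asc" is used
    if direction not in ("asc", "desc"):
        raise ValueError('direction must be "asc" or "desc"')
    return f"{field}:{direction}"

print(sort_param("bibjson.title"))          # bibjson.title
print(sort_param("bibjson.title", "desc"))  # bibjson.title:desc

# As part of a query string; urlencode percent-encodes ":" as %3A.
print(urlencode({"sort": sort_param("bibjson.title", "desc")}))  # sort=bibjson.title%3Adesc
```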
The query string - advanced usage
The format of the query part of the URL is that of an Elasticsearch query string, as documented here: https://www.elastic.co/guide/en/elasticsearch/reference/1.4/query-dsl-query-string-query.html#query-string-syntax. Elasticsearch uses Lucene under the hood.
Some of the Elasticsearch query syntax has been disabled in order to prevent queries which may damage performance. The disabled features are:
- Wildcard searches. You may not put a * into a query string: https://www.elastic.co/guide/en/elasticsearch/reference/1.4/query-dsl-query-string-query.html#_wildcards
- Regular expression searches. You may not put an expression between two forward slashes (/regex/) into a query string: https://www.elastic.co/guide/en/elasticsearch/reference/1.4/query-dsl-query-string-query.html#_regular_expressions. This is done both for performance reasons and because of the escaping of forward slashes described above.
- Fuzzy searches. You may not use the ~ notation: https://www.elastic.co/guide/en/elasticsearch/reference/1.4/query-dsl-query-string-query.html#_fuzziness
- Proximity searches. You may not use the ~ notation after a phrase (e.g. "quick fox"~5) either: https://www.elastic.co/guide/en/elasticsearch/reference/1.4/query-dsl-query-string-query.html#_proximity_searches
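Before sending a query, you could flag the disabled syntax on the client side. The sketch below is hypothetical and deliberately simple: it only looks for unescaped * and ~ characters (the latter covers both fuzzy and proximity searches), and it does not check for /regex/ because forward slashes are escaped automatically, as described above. It can give false positives (e.g. a URL containing ~ inside a quoted value), and DOAJ's own server-side checks may behave differently.

```python
import re

# Hypothetical client-side pre-check; patterns match only unescaped characters.
DISABLED_SYNTAX = {
    "wildcard": re.compile(r"(?<!\\)\*"),
    "fuzzy or proximity": re.compile(r"(?<!\\)~"),
}

def disabled_features(query: str) -> list[str]:
    """Return the names of disabled features whose syntax appears unescaped in query."""
    return [name for name, pattern in DISABLED_SYNTAX.items() if pattern.search(query)]

print(disabled_features("title:shadow*"))                 # ['wildcard']
print(disabled_features('abstract:"3D scenes"~2'))        # ['fuzzy or proximity']
print(disabled_features("doi:10.3389/fpsyg.2013.00479"))  # []
```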
CRUD API
Creating articles
If you try to create an article with the same DOI or full-text URL as another article associated with your account, the system will detect it as a duplicate and overwrite the existing article with the new data you supply via the CRUD Article Create endpoint. This works in the same way as submitting article metadata to DOAJ via XML upload or manual entry with your publisher user account.
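The duplicate rule described above can be modelled roughly as follows. This is a hypothetical, simplified sketch and not DOAJ's implementation: the Article record has only the fields needed here, and it assumes DOIs and full-text URLs are compared as exact strings (the documentation does not say whether any normalisation is applied).

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Article:
    # Hypothetical, heavily simplified record; real DOAJ article JSON has many more fields.
    title: str
    doi: Optional[str] = None
    fulltext_url: Optional[str] = None

def find_duplicate(new: Article, existing: list[Article]) -> Optional[int]:
    """Return the index of an existing article sharing the new one's DOI or full-text URL."""
    for i, old in enumerate(existing):
        same_doi = new.doi is not None and new.doi == old.doi
        same_url = new.fulltext_url is not None and new.fulltext_url == old.fulltext_url
        if same_doi or same_url:
            return i
    return None

def create_or_overwrite(new: Article, existing: list[Article]) -> list[Article]:
    """Overwrite a duplicate if one exists, otherwise add the article (returns a new list)."""
    idx = find_duplicate(new, existing)
    if idx is None:
        return existing + [new]
    updated = list(existing)
    updated[idx] = new  # the old record is replaced wholesale by the new data
    return updated

# Made-up DOI, for illustration only.
account = [Article("Old title", doi="10.1234/abc")]
account = create_or_overwrite(Article("New title", doi="10.1234/abc"), account)
print([a.title for a in account])  # ['New title']
```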