Search API

Query string syntax

If you'd like to do more complex queries than simple words or phrases, reading https://www.elastic.co/guide/en/elasticsearch/reference/1.4/query-dsl-query-string-query.html#query-string-syntax should prove very useful. The datastore that backs DOAJ is Elasticsearch and knowing more about its query syntax will let you send more advanced queries to DOAJ. This is not a prerequisite for using the DOAJ API - in the sections below, we provide instructions for the most common use cases. Please do email us if you think what you have achieved with the API would be useful to others and would like to add an example to the API documentation below.

Default handling of phrases

When searching for e.g. "understanding shadows in 3D scenes", DOAJ's web interface will return articles and journals which have metadata that contains *all* of the words "understanding", "shadows", "in" (may be ignored), "3D" and "scenes". In technical terms, the default query operator is AND. You can override it by sending us a query such as "understanding OR shadows". We find that the results returned by AND queries are much more relevant when looking for specific topics, whereas OR queries are best for exploring what is available, e.g. based loosely on the interests of your users.
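As a rough illustration of the difference between the two operators, the Python sketch below applies AND and OR matching to two made-up titles. It is a toy model that only splits text on whitespace; the function name `matches` and the sample titles are assumptions for this example, and the real analysis and ranking done by Elasticsearch is considerably more involved.

```python
def matches(text, terms, operator="AND"):
    """Return True if `text` satisfies the terms under the given operator."""
    words = set(text.lower().split())
    hits = [term.lower() in words for term in terms]
    # AND requires every term to be present; OR requires at least one.
    return all(hits) if operator == "AND" else any(hits)


# Made-up sample data, for illustration only.
titles = [
    "Understanding shadows in 3D scenes",
    "Shadows and light in Renaissance painting",
]
terms = ["understanding", "shadows"]

and_results = [t for t in titles if matches(t, terms, "AND")]
or_results = [t for t in titles if matches(t, terms, "OR")]
```

Here the AND search keeps only the first title, while the OR search keeps both, which is why OR tends to suit exploratory searches.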

When you are querying on a specific field you can use the JSON dot notation used by Elasticsearch. For example, to access the journal title of an article, you could use:

bibjson.journal.title:"Journal of Science"

Note that all fields are analysed, which means that the above search does not look for the exact string "Journal of Science". To do that, add ".exact" to any string field (not date or number fields) to match the exact contents:

bibjson.journal.title.exact:"Journal of Science"
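If you build such queries in code, a small helper can keep the two forms consistent. The following is a minimal Python sketch; the helper name `field_query` is an assumption for this example and is not part of the DOAJ API. It does not escape double quotes inside the phrase, so it is only suitable for phrases that do not contain them.

```python
def field_query(field, phrase, exact=False):
    """Build a field-specific query such as field:"phrase".

    When `exact` is True, ".exact" is appended to the field name so that
    the whole string is matched rather than the analysed words.
    """
    name = field + ".exact" if exact else field
    return '{}:"{}"'.format(name, phrase)


analysed = field_query("bibjson.journal.title", "Journal of Science")
exact = field_query("bibjson.journal.title", "Journal of Science", exact=True)
```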

Special characters

All forward slash / characters will be automatically escaped for you unless you escape them yourself. This means any forward slashes / will become \/ which ends up encoded as %5C/ in a URL since a "naked" backslash \ is not allowed in a URL. So you can search for a DOI by giving the articles endpoint either of the following queries (they will give you the same results):

doi:10.3389/fpsyg.2013.00479
doi:10.3389%5C/fpsyg.2013.00479
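The escaping described above can be reproduced on the client side if you prefer to send the fully escaped form yourself. The sketch below is an illustration of the described behaviour, not DOAJ's own implementation; the function names are assumptions for this example.

```python
import re
from urllib.parse import quote


def escape_forward_slashes(query):
    """Put a backslash before every forward slash that is not already escaped."""
    # The negative lookbehind skips slashes already preceded by a backslash.
    return re.sub(r"(?<!\\)/", r"\\/", query)


def encode_query(query):
    """Escape forward slashes, then percent-encode the result for use in a URL."""
    # "/" and ":" are left as they are; the backslash becomes %5C.
    return quote(escape_forward_slashes(query), safe="/:")


encoded = encode_query("doi:10.3389/fpsyg.2013.00479")
```

Because already escaped slashes are left untouched, passing the output of `escape_forward_slashes` through the function a second time does not double the backslashes.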

Short field names

For convenience we also offer shorter field names for you to use when querying. Note that you cannot use the ".exact" notation mentioned above on these substitutions.

The substitutions for journals are as follows:

  • title - search within the journal's title
  • issn - the journal's ISSN
  • publisher - the journal's publisher (not exact match)
  • license - the journal's licence (exact match)

In addition, if you have a publisher account with the DOAJ, you may use the field "username" to query for your own publicly available journal records. Usernames are not available in the returned journal records, and no list of usernames is available to the public; you need to know your own username to use this field. You would include "username:myusername" in your search.

The substitutions for articles are as follows:

  • title - search within the article title
  • doi - the article's DOI
  • issn - the ISSN of the article's journal
  • publisher - the publisher of the article's journal (not exact match)
  • abstract - search within the article abstract
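Since the ".exact" notation cannot be combined with the short field names, a client may want to catch that mistake before sending a query. The following is a minimal Python sketch of such a check; the names `check_field`, `JOURNAL_SHORT_FIELDS` and `ARTICLE_SHORT_FIELDS` are assumptions for this example, and the "username" field is deliberately left out.

```python
# Short field names taken from the lists above.
JOURNAL_SHORT_FIELDS = {"title", "issn", "publisher", "license"}
ARTICLE_SHORT_FIELDS = {"title", "doi", "issn", "publisher", "abstract"}


def check_field(field, record_type):
    """Return `field` unchanged, or raise if ".exact" is used on a short name.

    `record_type` is "journal" or "article" (a convention of this sketch).
    """
    short = JOURNAL_SHORT_FIELDS if record_type == "journal" else ARTICLE_SHORT_FIELDS
    base, _, suffix = field.rpartition(".")
    if suffix == "exact" and base in short:
        raise ValueError('".exact" cannot be used with short field name "%s"' % base)
    return field


ok_short = check_field("title", "article")
ok_full = check_field("bibjson.journal.title.exact", "article")
```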

Sorting of results

Each request can take a "sort" URL parameter, in one of the following forms:

sort=field
sort=field:direction

The field again uses the dot notation.

If specifying the direction, it must be one of "asc" or "desc". If no direction is supplied then "asc" is used.

So for example

sort=bibjson.title
sort=bibjson.title:desc

Note that for fields which may contain multiple values (i.e. arrays), the sort will use the "smallest" value in that field to sort by (depending on the definition of "smallest" for that field type).
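The rules for the "sort" parameter can be captured in a few lines of code. The sketch below is one possible way to build the parameter in Python; the function name `sort_param` is an assumption for this example.

```python
def sort_param(field, direction=None):
    """Build a "sort" URL parameter from a dot-notation field name.

    If no direction is given it is omitted, in which case "asc" applies.
    """
    if direction is None:
        return "sort=" + field
    if direction not in ("asc", "desc"):
        raise ValueError('direction must be "asc" or "desc"')
    return "sort={}:{}".format(field, direction)


ascending = sort_param("bibjson.title")
descending = sort_param("bibjson.title", "desc")
```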

The query string - advanced usage

The format of the query part of the URL is that of an Elasticsearch query string, as documented here: https://www.elastic.co/guide/en/elasticsearch/reference/1.4/query-dsl-query-string-query.html#query-string-syntax. Elasticsearch uses Lucene under the hood.

Some of the Elasticsearch query syntax has been disabled in order to prevent queries which may damage performance. The disabled features are:

  1. Wildcard searches. You may not put a * into a query string: https://www.elastic.co/guide/en/elasticsearch/reference/1.4/query-dsl-query-string-query.html#_wildcards

  2. Regular expression searches. You may not put an expression between two forward slashes /regex/ into a query string: https://www.elastic.co/guide/en/elasticsearch/reference/1.4/query-dsl-query-string-query.html#_regular_expressions. This is done both for performance reasons and because of the escaping of forward slashes / described above.

  3. Fuzzy searches. You may not use the ~ notation: https://www.elastic.co/guide/en/elasticsearch/reference/1.4/query-dsl-query-string-query.html#_fuzziness

  4. Proximity searches. You may not use the "phrase"~N notation: https://www.elastic.co/guide/en/elasticsearch/reference/1.4/query-dsl-query-string-query.html#_proximity_searches
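If you accept query strings from your own users, you may want to spot the disabled features before forwarding a query. The following Python sketch is a conservative client-side check based only on the list above; the function name `disabled_features` and its labels are assumptions for this example, and it does not reflect how DOAJ itself rejects or rewrites such queries. It makes no attempt to recognise backslash-escaped * or ~ characters, so it may flag queries that are in fact acceptable.

```python
import re


def disabled_features(query):
    """Return the names of disabled query features that `query` appears to use."""
    found = []
    if "*" in query:
        found.append("wildcard")
    # Text enclosed between two unescaped forward slashes looks like a regex.
    if re.search(r"(?<!\\)/.*?(?<!\\)/", query):
        found.append("regex")
    # A tilde that does not directly follow a closing quote: fuzzy search.
    if re.search(r'(?<!")~', query):
        found.append("fuzzy")
    # A tilde directly after a closing quote: proximity search.
    if '"~' in query:
        found.append("proximity")
    return found


plain = disabled_features("doi:10.3389/fpsyg.2013.00479")
fuzzy = disabled_features("quikc~")
```

A query containing a single forward slash, such as the DOI search above, is not flagged, because only a pair of unescaped slashes is treated as a regular expression.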

CRUD API

Creating articles

If you try to create an article with the same DOI or full-text URL as another article associated with your account, the system will detect it as a duplicate and overwrite the existing article with the new data you supply via the CRUD Article Create endpoint. This works in the same way as submitting article metadata to DOAJ via XML upload or manual entry with your publisher user account.
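To make the duplicate rule concrete, here is a toy in-memory model of it in Python. It is only a sketch of the behaviour described above, not DOAJ's implementation; the function `create_article`, the flat `doi`, `fulltext` and `owner` keys, and the sample records are all assumptions for this example and do not match the real article data format.

```python
def create_article(store, owner, article):
    """Add `article` to `store`, overwriting a duplicate owned by `owner`.

    A duplicate is an existing record from the same owner that shares
    either the DOI or the full-text URL of the new article.
    """
    record = dict(article, owner=owner)
    for index, existing in enumerate(store):
        if existing["owner"] != owner:
            continue
        same_doi = bool(article.get("doi")) and article.get("doi") == existing.get("doi")
        same_url = bool(article.get("fulltext")) and article.get("fulltext") == existing.get("fulltext")
        if same_doi or same_url:
            store[index] = record  # the old record is replaced by the new data
            return "overwritten"
    store.append(record)
    return "created"


store = []
first = create_article(store, "publisher_a", {"doi": "10.1234/example", "title": "Old title"})
second = create_article(store, "publisher_a", {"doi": "10.1234/example", "title": "New title"})
```

After the second call the store still holds a single record, now carrying the new title.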