Wikipedia:OABOT

From Wikipedia, the free encyclopedia

OAbot is a tool that makes it easy to edit articles so that academic citations link to open access publications (see list of edits made).

Wikipedia links to hundreds of thousands of paywalled sources. Our community does not prohibit or even discourage citing paywalled sources, but nothing prevents surfacing open access (OA) versions right alongside those citations either, as long as the link does not violate copyright. Indeed, a good citation gives as much information as possible so that readers can find (and use) the source in whatever way is easiest for them.

Bot

Workflow

The bot looks for CS1 citation templates, and for each of them:

  • parses the citation using wikiciteparser
  • queries the Dissemin API and Unpaywall with the metadata it has extracted
  • translates the pdf_url it returns into a parameter of the citation (|arxiv=, |pmc=, |doi=, or |url= as a fallback)
  • if the template does not already have that parameter, and no existing link is already free to read, adds it to the template.
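The Unpaywall lookup in the second step can be sketched in Python. This is a minimal sketch, not OAbot's code: it assumes the public Unpaywall v2 endpoint (https://api.unpaywall.org/v2/<doi>?email=...) and its is_oa and best_oa_location fields (url_for_pdf, url). The function names are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: Unpaywall v2 REST endpoint; callers identify themselves with an email.
UNPAYWALL_BASE = "https://api.unpaywall.org/v2/"


def unpaywall_request_url(doi, email):
    """Build the lookup URL for one DOI (hypothetical helper)."""
    return f"{UNPAYWALL_BASE}{quote(doi, safe='/')}?email={quote(email, safe='')}"


def fetch_unpaywall(doi, email, timeout=30):
    """Fetch and decode the JSON record for a DOI (performs a network request)."""
    with urlopen(unpaywall_request_url(doi, email), timeout=timeout) as resp:
        return json.load(resp)


def best_oa_url(record):
    """Return the PDF link of the best OA location, falling back to its plain URL.

    Returns None when the record reports no open access copy.
    """
    if not record.get("is_oa"):
        return None
    best = record.get("best_oa_location") or {}
    return best.get("url_for_pdf") or best.get("url")
```

The link returned by best_oa_url would then go through the last two steps of the workflow: translation into a citation parameter, and addition only if the template lacks it.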

Examples

  • Adding a free to read |url=:
    • Before: Groussard, M.; Rauchs, G.; Landeau, B.; Viader, F.; Desgranges, B.; Eustache, F.; Platel, H. (2010). "The neural substrates of musical memory revealed by fMRI and two semantic tasks". NeuroImage. 53: 1301–1309. doi:10.1016/j.neuroimage.2010.07.013.
    • After: Groussard, M.; Rauchs, G.; Landeau, B.; Viader, F.; Desgranges, B.; Eustache, F.; Platel, H. (2010). "The neural substrates of musical memory revealed by fMRI and two semantic tasks" (PDF). NeuroImage. 53: 1301–1309. doi:10.1016/j.neuroimage.2010.07.013.
  • Adding a |citeseerx=:
    • Before: Selinger, Peter (2011). "A survey of graphical languages for monoidal categories". New Structures for Physics. Lecture Notes in Physics. 813. Springer. pp. 289–233.
    • After: Selinger, Peter (2011). "A survey of graphical languages for monoidal categories". New Structures for Physics. Lecture Notes in Physics. 813. Springer. pp. 289–233. CiteSeerX 10.1.1.216.4918.

Code

Contributions to the code are very welcome (for instance through pull requests on GitHub), and you can join the development team on wmflabs by requesting access to the Tools project.

If you want to make suggestions or report bugs, please add a task to the Phabricator project.

Questions

How does the bot work?

OABOT extracts the citations from an article and searches various indexes, APIs, and repositories for free-to-read versions of non-OA articles. OABOT uses the Dissemin backend to find these versions from sources like CrossRef, BASE, DOAI and SHERPA/RoMEO. When it finds an alternative version, it checks whether that version is already in the citation. If it is not, it adds a free-to-read link to the citation, which helps readers access the full text.

What kind of links does the bot add?

The bot adds a link with one of the following parameters:

|arxiv=
|hdl=
|doi=
|pmc=
|citeseerx=
|url=

The bot only uses |url= if none of the other, more specific parameters is known or applicable. The bot only adds a parameter if it was previously empty or absent (so the bot never erases information from the templates).
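The mapping from a free-to-read link to one of these parameters, together with the rule that a filled parameter is never overwritten, might look like the Python sketch below. The URL patterns are illustrative assumptions, not the patterns OAbot actually uses; url_to_parameter and propose_edit are hypothetical names, and params stands for the template's parameters as a plain dict.

```python
import re

# Illustrative patterns only (assumption): specific identifiers are tried first,
# and |url= is the fallback when nothing matches.
_RULES = [
    ("arxiv", re.compile(r"arxiv\.org/(?:abs|pdf)/(.+?)(?:v\d+)?(?:\.pdf)?$")),
    ("pmc", re.compile(r"ncbi\.nlm\.nih\.gov/pmc/articles/PMC(\d+)")),
    ("hdl", re.compile(r"hdl\.handle\.net/(.+)$")),
    ("citeseerx", re.compile(r"citeseerx\.ist\.psu\.edu/viewdoc/\w+\?doi=([\d.]+)")),
    ("doi", re.compile(r"doi\.org/(10\..+)$")),
]


def url_to_parameter(url):
    """Translate a free-to-read URL into a (parameter, value) pair."""
    for name, pattern in _RULES:
        match = pattern.search(url)
        if match:
            return name, match.group(1)
    return "url", url


def propose_edit(params, oa_url):
    """Return the (parameter, value) to add, or None if that parameter already has content."""
    name, value = url_to_parameter(oa_url)
    if params.get(name, "").strip():
        return None  # never erase or replace existing information
    return name, value
```

For example, a CiteSeerX summary URL becomes |citeseerx=10.1.1.216.4918, while an arbitrary repository PDF falls through to |url=.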

What kinds of links won't the bot add?

  • The bot won't add a link to a version not found in CrossRef, BASE, DOAI, or SHERPA/RoMEO (it is not an open-web search for any version or PDF; it only draws from curated sources)
  • The bot won't add a link to an alternative version of a source that is already signaled as free to read (that is, if Free to read appears in the rendered source)
  • The bot won't ever replace an existing |url= with a different one, or add a second |url=
  • The bot won't modify free-form citations: it only considers citation templates.
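These filters can be combined into a single predicate. The Python sketch below is an illustration under stated assumptions, not OAbot's logic: it treats a template as a citation template when its name starts with "cite " or is "citation", and it approximates "already signaled as free to read" by checking for identifiers that are free by nature (arxiv, pmc, citeseerx) or for an access parameter such as |doi-access=free. All function names are hypothetical.

```python
# Assumptions for illustration: which identifiers count as free by nature, and
# which access parameters can mark a link as free.
FREE_BY_NATURE = ("arxiv", "pmc", "citeseerx")
ACCESS_PARAMS = ("doi-access", "hdl-access")


def is_citation_template(name):
    """Only templated citations are considered; free-form references are skipped."""
    name = name.strip().lower()
    return name.startswith("cite ") or name == "citation"


def already_free(params):
    """Heuristic for 'a link in this citation is already signaled as free to read'."""
    if any(params.get(p, "").strip() for p in FREE_BY_NATURE):
        return True
    return any(params.get(p, "").strip().lower() == "free" for p in ACCESS_PARAMS)


def may_add(template_name, params, target_param):
    """True if the bot may add target_param to this citation."""
    if not is_citation_template(template_name):
        return False
    if already_free(params):
        return False
    # An existing value blocks the edit, so an existing |url= is never replaced or duplicated.
    return not params.get(target_param, "").strip()
```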

What repositories is the bot querying and pulling from?

The bot currently queries:

In the future we could add CORE (https://core.ac.uk/) or others (SHARE Notify, Handle.net, MLA CORE, CHORUS) once their indexes provide additional benefit and they offer a workable API.

I am a publisher. How do I make sure OAbot recognizes my full texts?

You should make sure that

  • You comply with the Google Scholar guidelines for exposing your full texts. In particular, the landing page for articles that are free to read should contain the meta tag citation_pdf_url with a direct link to a PDF file.
  • Zotero is able to import metadata and the full text from any landing page. This should be straightforward if you comply with Google Scholar's guidelines. Otherwise, you can fix the Zotero translator yourself by submitting a pull request to Zotero.
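To check what a harvester will see on a landing page, the citation_pdf_url meta tag can be read with Python's standard html.parser module. This is a minimal self-check, assuming the tag uses the name/content form described in Google Scholar's inclusion guidelines; it is not the code that OAbot, Dissemin or Zotero runs, and the names are hypothetical.

```python
from html.parser import HTMLParser


class CitationPdfUrlParser(HTMLParser):
    """Collect the content of every <meta name="citation_pdf_url"> tag."""

    def __init__(self):
        super().__init__()
        self.pdf_urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; self-closing <meta .../>
        # tags also reach this method through the default handle_startendtag.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "citation_pdf_url" and attributes.get("content"):
            self.pdf_urls.append(attributes["content"].strip())


def citation_pdf_urls(html):
    """Return the PDF links a landing page advertises (possibly an empty list)."""
    parser = CitationPdfUrlParser()
    parser.feed(html)
    parser.close()
    return parser.pdf_urls
```

An empty list means the landing page gives harvesters no direct PDF link.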

In addition, it also is useful if you make sure that

  • All your fully open-access journals are registered in DOAJ.
  • The CrossRef metadata includes the correct license for each article: it should be possible to tell whether an article is free to read simply by looking at this piece of information.
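A publisher can check what its license metadata says with a few lines of Python. The sketch reads the license array of a Crossref REST API work record (https://api.crossref.org/works/<doi>, whose entries carry URL, content-version and delay-in-days). The rule that a Creative Commons license on the version of record with no delay means "free to read" is an assumption for illustration, not the rule OAbot or Dissemin applies.

```python
def crossref_license_says_free(work):
    """Heuristic (assumption): free if a Creative Commons license applies to the
    version of record ("vor") or an unspecified version, with no embargo delay."""
    for lic in work.get("message", {}).get("license", []):
        url = lic.get("URL", "").lower()
        version = lic.get("content-version", "unspecified")
        delay = lic.get("delay-in-days", 0)
        if "creativecommons.org/" in url and version in ("vor", "unspecified") and delay == 0:
            return True
    return False
```

In practice the work argument would be the decoded JSON returned by the Crossref endpoint for one DOI.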

Once you comply with these guidelines, the bot should mark your DOIs as free to read in Wikipedia, with a green lock.

I run a repository. How do I make sure OAbot can add links to my repository?

  • Provide a valid OAI-PMH interface and make sure it is harvested by BASE
  • Comply with the Google Scholar guidelines for exposing your full texts. In particular, the landing page for articles that are free to read should contain the meta tag citation_pdf_url with a direct link to a PDF file.
  • Zotero should be able to retrieve metadata and the full text from any landing page. This should be straightforward if you comply with Google Scholar's guidelines. Otherwise, you can fix the Zotero translator yourself by submitting a pull request to Zotero.
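One way to see what a harvester such as BASE would get from your repository is to issue a standard OAI-PMH ListRecords request in the oai_dc metadata format and look at the dc:identifier values. The Python sketch below only builds the request URL and parses one response page (resumption tokens are ignored); the base URL is a placeholder, and this is not a description of how BASE itself harvests.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Standard OAI-PMH 2.0 and Dublin Core namespaces.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url):
    """Build a ListRecords request asking for unqualified Dublin Core (oai_dc)."""
    return base_url + "?" + urlencode({"verb": "ListRecords", "metadataPrefix": "oai_dc"})


def record_identifiers(xml_text):
    """Map each OAI record identifier to its dc:identifier values (often landing-page or file URLs)."""
    root = ET.fromstring(xml_text)
    result = {}
    for record in root.iterfind(".//oai:record", NS):
        oai_id = record.findtext("oai:header/oai:identifier", namespaces=NS)
        result[oai_id] = [e.text.strip() for e in record.iterfind(".//dc:identifier", NS) if e.text]
    return result
```

If a record's identifiers include neither a landing page nor a full-text link, a harvester has nothing it can point readers to.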

I am a researcher. How do I make sure OAbot finds full texts for my papers?

Make sure all your papers are deposited in a mature repository (that complies with the guidelines above) such as Zenodo. You can use http://dissem.in/ for that. Other large repositories such as PubMed Central, arXiv or HAL will work too. The repository should give free access to the full text (not just the abstract). Records with ongoing embargoes are not considered.

Full texts stored on personal homepages will generally not be considered.

How many links should the bot add?

The bot only adds one link, even if it finds multiple alternative versions. For example, if OABOT finds a preprint on arXiv, a postprint in a university repository, and a PDF on the author's website, it chooses only one, based on a ranking algorithm in Dissemin.
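Dissemin's ranking algorithm is not described on this page, so the following Python sketch is only a stand-in that shows the "keep exactly one" step. It uses a hypothetical preference order; only the placement of |url= last reflects this page, and the order among the identifiers is arbitrary.

```python
# Hypothetical preference order; only "url last" reflects this page.
PREFERENCE = ("doi", "pmc", "arxiv", "hdl", "citeseerx", "url")


def rank(candidate):
    """Lower is better; unknown parameter names sort last."""
    name = candidate[0]
    return PREFERENCE.index(name) if name in PREFERENCE else len(PREFERENCE)


def pick_one(candidates):
    """Keep exactly one (parameter, value) candidate, or None if there are none."""
    return min(candidates, key=rank) if candidates else None
```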

What does the citation look like?

The citation doesn't have any additional text or graphical elements, just an additional link.

Can we signal the version type (preprint, postprint, published version)?

At the moment, no. For most repositories this metadata just doesn't exist or isn't well-curated.

How can the bot be localized/globalized to work on any wiki?

The bot can function on any wiki, but only if that wiki uses the CS1 citation templates, and uses them in the same way.

Edge cases for future development

OABot will encounter citations where a |url= is already present but does not point to an open version, while the bot can locate a free-to-read version. In some cases we can add the free link as an identifier, but there are edge cases where the bot's behavior is undetermined and we need consensus:

  1. When the |url= matches an existing identifier:
    Say we have |doi=10.1004/1543 and |url=http://doi.org/10.1004/1543. Can we overwrite |url= to put a free-to-read repository there?
  2. When we can't match the |url= with an existing identifier but OABot finds a repository version:
    For instance if we find |url=http://www.sciencedirect.com/science/article/pii/S1535610816303981, we won't overwrite |url=, but we would like to add the free repository URL somewhere else. If the free URLs we want to add stem from few repositories, is it appropriate to create templates for these specific repositories, and add them as |id={{my repository|12345}}?
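Detecting the first case is mechanical even though the policy is not settled. This Python sketch (hypothetical function name; it only recognizes doi.org and dx.doi.org resolver URLs) reports whether |url= merely repeats the |doi= identifier, comparing case-insensitively because DOI names are case-insensitive. It does not decide whether overwriting is acceptable.

```python
import re
from urllib.parse import unquote

# Assumption: only resolver URLs of these two forms are recognized.
_DOI_URL = re.compile(r"^https?://(?:dx\.)?doi\.org/(.+)$", re.IGNORECASE)


def url_duplicates_doi(params):
    """True if |url= is just a resolver link to the same DOI as |doi=."""
    url = params.get("url", "").strip()
    doi = params.get("doi", "").strip()
    if not url or not doi:
        return False
    match = _DOI_URL.match(url)
    return bool(match) and unquote(match.group(1)).lower() == doi.lower()
```

With the examples above, |url=http://doi.org/10.1004/1543 is flagged as a duplicate of |doi=10.1004/1543, while the ScienceDirect URL is not.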

Next steps

Resources

See also:

People