Everything You Need to Know About Openverse and the WordPress Photo Directory

A screenshot of the redesigned Openverse homepage, with images from a search for 'Olympic games' of athletes from many decades and backgrounds.
The redesigned wordpress.org/openverse homepage

When we announced last year that Openverse had joined WordPress, we were thrilled about the exciting changes coming to the platform. Many of those updates are here.

Openverse, previously known as CC Search, is a search engine for openly licensed media. The index, which joined WordPress in mid-2021, has over 600 million Creative Commons licensed and public domain image and audio files. All files can be used free of charge.

Openverse has several new features, including:

  • A redesigned interface: Openverse has a new brand identity and user interface optimized for usability. Find the images and audio files you’re looking for and filter results by license, source, and many other options.
  • Internationalization: Openverse is fully translated in 12 languages, with additional partial translations in other languages. We encourage anyone in the community to submit translations in their own languages.
  • Audio support: Openverse now includes songs, podcasts, samples, and other audio files from FreeSound, Wikimedia Commons, and Jamendo.
  • New image providers: The Openverse team has added two new sources of high-quality photographs, the WordPress Photo Directory and StockSnap. In addition, photo libraries such as EDUimages and Images of Empowerment are now available from Meta Search.

The Openverse project is part of the WordPress community and welcomes contributions from those who want to help it become the best openly licensed media search engine on the internet. The WordPress Photo Directory provides such an opportunity. 

What’s the WordPress Photo Directory?

The WordPress Photo Directory is both a new curated source of free, high-quality photographs and a new submission tool for Openverse, powered by the WordPress community. Without it, you’d need to use Flickr, Wikimedia Commons, or other sources to submit your work to Openverse.

The WordPress Photo Directory aims to be a trusted place for the community to create, share, discover, and reuse free and openly licensed media. All photos in the WordPress Photo Directory are licensed with the CC0 public domain tool.

The WordPress Photo Directory welcomes contributions in different forms. One of the best ways to get involved is by submitting your photos:

  • Anyone with a wordpress.org account can submit their work to the photo directory. All submissions must meet these guidelines to ensure the quality of content. 
  • Photos will also be categorized and tagged to facilitate searching. Once a submission gets approved, it will be automatically added to the WordPress Photo Directory and the Openverse search engine.

You can also report issues with the directory, or become a photo directory moderator.

It is worth noting that Openverse and the WordPress Photo Directory are separate and independent projects. However, they are complementary in that the images from the directory are discoverable via the Openverse search. All WordPress Photo Directory images can be viewed in Openverse.

Where can you learn more about Openverse?

The Make Openverse blog is one of the best ways to follow along with the project. Feel free to reach out to any Openverse contributors in the #openverse channel on Slack, on GitHub, or through any other channel to learn more about the project. If you are interested in contributing code to Openverse, look at our good first issues or our guide for new contributors.

We hope you are as excited as we are about Openverse, and we look forward to your contributions!

Happy searching!


Thanks to @rmartinezduque @anjanavasan @callye @zackkrida @angelasjin for their work on this post.

#media, #openverse, #photos

Community Meeting Recap (21 June 2022)

Announcements

We’re changing our default throttle level this week. See the GitHub issue to get the details in advance.

Takeaways

This is the second week working on the milestone of bringing Audio out of the ‘Beta’ state.

Done

We’re glad to have many community PRs and new contributors this time [ref]:

  • Added flag to strip slash in URLs while validating
  • Fixed typo in the Frontend repository’s README
  • Improved the look of the audio detail page on large screens

  • Mitigated some behavior that was troublesome for the API [ref]
  • Fetching related images on the frontend is now improved and feels faster [ref]
  • Fixed the Audio “Length” filter on the frontend [ref]
  • Prevented shifting of audio results when more results are added [ref]
  • The audio player has a loading indicator now [ref]
  • Prevented thumbnail timeout [ref]

In progress

Needs review

  • Make search.vue media fetching non-blocking (merged during the meeting) [ref]
  • Add missing audio file extensions [ref]
  • Several Catalog PRs adding filetype/filesize extraction [ref]

Upcoming

Finally, priorities were adjusted [ref], and new issues to address this week were presented [ref], discussed and (self/re)assigned. A batch of data cleaning is in the plan, alongside many improvements to the Openverse infrastructure.

Mitigating out of terms API usage

Yesterday at 20:20 UTC, we released version 2.5.5 of our API! Along with a few dependency upgrades and DevEx improvements/fixes, this release also brings an important change regarding anonymous API requests. As of v2.5.5, any media searches that are made without an API key cannot request more than 20 results per page.

This change was made to mitigate behavior we were seeing on the API which was adversely affecting performance for other users, our capacity to update the data that backs Openverse, and our ability to deploy new changes to the API.

Our API Terms of Service state:

– A user must adhere to all rate limits, registration requirements, and comply with all requirements in the Openverse API documentation;

– A user must not scrape the content in the Openverse Catalog;

– A user must not use multiple machines to circumvent rate limits or otherwise take measures to bypass our technical or security measures;

– A user must not operate in a way that negatively affects other users of the API or impedes the WordPress Foundation’s ability to provide its services;

Background

Beginning around May 18th, we saw a significant increase in traffic.

Total requests made to api.openverse.engineering over the last 30 days

While the digital demographics (browser, user agent, OS, device type, etc.) were quite varied, one feature stuck out – these requests were all being made with the page_size=500 parameter.

Total requests made to api.openverse.engineering over the last 30 days using the page_size=500 parameter

Over the course of the last 30 days, these requests constituted almost 80% of our total traffic! While our application is designed to handle this many requests, it is not designed to handle each request querying for 500 results per page (the default page size is 20). As such, this created significant strain on our Elasticsearch cluster and eventually caused disruptions in the API’s ability to serve results. The image below combines a few of our monitoring tools to show a general correlation between the page_size=500 requests and our Elasticsearch resource utilization.
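To make the scale concrete, here is a rough back-of-the-envelope sketch in Python. It assumes, purely for illustration, that the remaining ~20% of requests used the default page size of 20. The function name is hypothetical and the output is not taken from our monitoring.

```python
DEFAULT_PAGE_SIZE = 20   # default page size stated above
LARGE_PAGE_SIZE = 500    # page size used by the unusual traffic


def result_share(large_fraction: float) -> float:
    """Fraction of all requested results that come from large-page requests.

    Assumes every other request asks for DEFAULT_PAGE_SIZE results
    (an illustrative assumption, not a measured figure).
    """
    large = large_fraction * LARGE_PAGE_SIZE
    small = (1 - large_fraction) * DEFAULT_PAGE_SIZE
    return large / (large + small)


# Each page_size=500 request asks for 25 times as many results as a default one.
amplification = LARGE_PAGE_SIZE / DEFAULT_PAGE_SIZE

# With ~80% of requests using page_size=500, they account for roughly 99%
# of all results requested under the assumption above.
share = result_share(0.8)
print(f"{amplification:.0f}x per request, {share:.1%} of requested results")
```

Under that assumption, the page_size=500 traffic would dominate the number of results requested even more heavily than it dominated the request count.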

Request count compared to Elasticsearch resource utilization

Even before this release, our application was set up to throttle individual, anonymous users to 1 request/second. These page_size=500 requests were coming from a myriad of different hosts; the initiator was able to circumvent the individual throttles by employing a large number of machines (also known as a botnet). These machines were also predominantly tied to a single data center and a single ASN, which led us to believe this was orchestrated by a single user.
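The sketch below illustrates why a per-client throttle alone could not stop this traffic. It is a minimal, self-contained Python model and not the API's actual throttling code; the class name, the use of a host string as the client identifier, and the fixed one-second interval are all assumptions for illustration.

```python
from typing import Dict


class PerClientThrottle:
    """Allow at most one request per `min_interval` seconds per client."""

    def __init__(self, min_interval: float = 1.0) -> None:
        self.min_interval = min_interval
        # Timestamp of the last allowed request, keyed by client identifier.
        self._last_allowed: Dict[str, float] = {}

    def allow(self, client_id: str, now: float) -> bool:
        last = self._last_allowed.get(client_id)
        if last is not None and now - last < self.min_interval:
            return False  # this client is still inside its throttle window
        self._last_allowed[client_id] = now
        return True


throttle = PerClientThrottle(min_interval=1.0)

# Five requests from one host at the same instant: only the first is allowed.
single_host = sum(throttle.allow("host-a", now=0.0) for _ in range(5))

# Five requests from five different hosts at the same instant: all are allowed.
many_hosts = sum(throttle.allow(f"bot-{i}", now=0.0) for i in range(5))

print(single_host, many_hosts)  # prints: 1 5
```

Because the limit is tracked per client, the aggregate rate grows with the number of distinct hosts, which is the gap the botnet used.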

This behavior was clearly in violation of our Terms of Service, since it was:

  1. Not using a registered API key for high-volume use
  2. Scraping data from Openverse
  3. Using multiple machines to circumvent the application throttles
  4. Consuming significant enough resources that it impacted other users of Openverse

Mitigation

As mentioned above, we deployed a change that returns a 401 Unauthorized for any anonymous request to the API that includes a page_size greater than the default of 20. Almost immediately after deployment, we saw this mitigation take effect when observing request behavior:
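The rule itself is simple enough to express in a few lines. The following Python sketch models only the decision described above; it is not the code shipped in v2.5.5, the function name is hypothetical, and it treats any non-empty key as valid, whereas a real service would verify the key.

```python
from typing import Optional

DEFAULT_PAGE_SIZE = 20  # default (and anonymous maximum) page size stated above


def status_for_search(
    page_size: int = DEFAULT_PAGE_SIZE, api_key: Optional[str] = None
) -> int:
    """Return the HTTP status this rule would produce for a search request.

    Illustrative only: key validation and all other checks are omitted.
    """
    is_anonymous = not api_key
    if is_anonymous and page_size > DEFAULT_PAGE_SIZE:
        return 401  # Unauthorized: anonymous requests may not exceed the default
    return 200


print(status_for_search(page_size=500))                    # anonymous, too large
print(status_for_search(page_size=20))                     # anonymous, default size
print(status_for_search(page_size=500, api_key="example"))  # hypothetical key
```

Anonymous requests at the default page size are unaffected, so ordinary unauthenticated use keeps working while the oversized requests are rejected.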

Screenshot of a Cloudflare analytics page. The graph in the center shows total requests with page_size=500, separated by status code over 6 hours. A consistent number of requests (split between 301 and 200) can be seen starting at 9:00 PST. At 13:00 PST, the number of 401 requests begins to overtake the number of 200 requests. After 13:15, the number of 200 requests drops to zero and all requests returned are 401s.
Total number of page_size=500 requests made over the course of 6 hours, separated by return status code

In the above graph, you can see where we deployed v2.5.5 (~13:00 PST) – the number of 200 OK responses decreased, and the number of 401 Unauthorized responses increased significantly! Eventually all of the page_size=500 requests were being rejected as unauthorized.

With this change, we were able to successfully mitigate the botnet and return our resource consumption to typical levels. This can be seen easily with a few Elasticsearch metrics:

Elasticsearch metrics over the last 12 hours

While the intention behind Openverse is to make openly licensed media easy to access, we don’t currently have the capacity to enable users to access the entire dataset at once. We do plan on exploring options for this in the future.

We’re pleased that this mitigation was successful, and we will continue to be vigilant in ensuring uninterrupted access to Openverse for our users!

#openverse, #infrastructure, #api

Community Meeting Recap (14 June 2022)

Takeaways

Done

In progress

Needs review

Discussion

We had little time to discuss the agenda, but we touched on the four points shared, starting from:

  1. New update on the facilitation guide.
  2. Follow up on the process for updating Elasticsearch Indexes.
  3. Follow up on Update database to version 002.
  4. API’s next deployment plan.

Community Meeting Recap (7 June 2022)

Takeaways

Done

  • 71 PRs merged over the two-week period
  • We continue to receive and merge contributions from people new to Openverse

In progress

Needs review

Discussions

We had two agenda items to discuss.

Data refresh / API latency issue

This is an ongoing production issue that is blocking new API deployments. The problem appears to be an interaction between our infrastructure and new code deployed as part of 2.5.3 (which was an aggregate deployment of 2.5.1, 2.5.2 and 2.5.3).

Madison shared some difficulties with trying to reproduce the issue in staging. We discussed trying to test in production, but due to the risks inherent in that approach, we agreed to continue working toward 1:1 parity between staging and production, and even to deploy a new production stack in another AWS region to test against (or to use as a live backup if we need to test in the currently live production stack).

Thumbnail field in catalog/API

We discussed ongoing work to clarify the ways we wish to store thumbnail URLs consumed by our thumbnail proxy. We will continue to follow the strategy of having all “generated” secondary artifacts (thumbnails, waveforms, etc.) handled by microservices that are heavily cached using Cloudflare. Furthermore, we agreed to move the thumbnail URL for audio out of the main data.

X-post: Announcement: Incident Response Training


Community Meeting Recap (May 17)

Takeaways

Done

  • Deployment of the API v2.5.0 and frontend v3.3.0 [ref]
  • Catalog: Data refresh process in the catalog made clearer [ref]
  • Frontend: Saving network requests by not fetching the data twice on the search page [ref]
  • Frontend: layout PR fixed a lot of issues with scrolling and the header [ref]
  • Frontend: fix for the image size jumps on single result page [ref]
  • Openverse: Addition of Renovate version management to the Openverse repo [ref]

In progress

Needs review

  • API: Overhaul serializer to eliminate manually-defined fields [ref]
  • API: Refactor search controller for consistency and clarity [ref]
  • Frontend: Fix for Copy attribution tabs [ref]
  • Frontend: Support for additional sources with a feature flag [ref]

Upcoming

We are reviewing the required data fields in the catalog to see if any required fields are missing and trying to backfill missing data. This will help us establish trust in the availability of the data and help clean up duplicated code that is necessary now on all layers of the stack because data is not always present.

We also plan on deploying the API with the fix for email registration and the audio length query.

Openverse maintainers welcome Rebecca Widom as a new committer

It gives us great pleasure to announce that Rebecca Widom has been added as a committer to the Openverse project! Her code contributions to the catalog and planning work on the addition of 3D models as a new media type have been tremendously beneficial. We’re thankful for her continued effort as a community contributor.

#openverse-committers

Community Meeting Recap (May 10)

Community notices

We’re refining our process for incorporating community contributions from frequent committers! We want to make this process easy for folks so they can continue to make excellent contributions with ease. Look out for some more information in the coming days about what being a “committer” to Openverse looks like! [ref]

Takeaways

Done

  • Deployment of the API v2.5.0 [ref]
  • Two frontend milestones closed [ref]
  • API Dockerization [ref]
  • Mobile keyboard close on search submit [ref]
  • Updated Terms of Service [ref]
  • Bundle size reporting & deploys of Storybook & Tailwind per PR [ref]
  • (Community contribution) Accessibility fix for “Back to search results” on mobile [ref]
  • (Community contribution) Improved reporting for quick ingestion tasks [ref]

In progress

Needs review

  • API refactoring [ref]
  • Data refresh task unification [ref]
  • VCheckbox update [ref]
  • Critical fix for app layout issues [ref]
  • Tab ordering [ref]
  • License explanation close button [ref]
  • Image jump fix [ref]

Discussions

  • Labels & emojis [ref]
  • Reprioritization of provider API key requests [ref]

Upcoming

Lots of work is underway on internal infrastructure improvements, additional monitoring, and improved security measures. This effort may not be as publicly visible, but it is happening behind the scenes. We also plan on deploying a new version of the frontend this week.

#openverse-weekly-community-meeting

Community Meeting Recap (April 26)

Due to the cancellation of last week’s meeting, this meeting covers the last two weeks.

Takeaways

Done

  • Exciting first contribution from a community member, improving Catalog connection configuration [ref]
  • Many TypeScript PRs merged for the frontend [ref]
  • Improvements to filter fields in the API [ref]
  • Cleanup of the Smithsonian provider DAG, eliminating unneeded DAG and DB table [ref]
  • Interface bug fixes, including translation banner and audio layout updates [ref]
  • The v1.2.1 milestone was completed for the Catalog [ref]
  • Added Tailwind configuration viewer to improve lookups for tailwind values for styling the frontend [ref]

In progress

Needs review

  • Modal updates and TS utilities [ref]
  • High priority frontend bug fixes requiring review [ref]
  • Tabs component [ref]
  • Create VSources Table component, requirement for Removing old styles milestone [ref]
  • Proof of concept for Feature Flag [ref]

Upcoming

Priorities of stalled tickets have been adjusted: issues labeled with critical priority will be addressed first, followed by those with open milestones.

A new milestone was created in the frontend repository for frontend bugs which must be fixed before the next deploy [ref].

#openverse-weekly-community-meeting

Community Meeting Recap (April 12th)

Announcements

Next week’s meeting is canceled; the next Openverse Weekly Development chat will be at 1500 UTC on April 26th.

Takeaways

Done

  • New version of the API released; thumbnail, detail, and related URLs are now HTTPS [ref]
  • The Monitoring RFC has been approved and, with it, preliminary sketched-out code was merged [ref]
  • Added storybook visual regression tests [ref]
  • Advances in the TypeScriptification milestone [ref]
  • Improvements to the Slack notifications for the catalog [ref]

In progress

Needs review

  • Writing an RFC for a Monorepo [ref]
  • Improve the thumbnail service to support compression [ref]
  • Move media type categories to constants module [ref]
  • Several PRs for store migration to use Pinia 🍍 [ref]
  • Creation of issues for addition of new type: 3D model [ref]

Needs discussion

  • The handling and meaning of the alt_files field and extensions for audio [ref]
  • Evaluate GitHub labels [ref]

Upcoming

Priorities of stalled tickets have been adjusted: issues labeled with critical priority will be addressed first, followed by those with open milestones.

#openverse-weekly-community-meeting