Openverse is a search engine for openly-licensed media.
The Openverse team builds the Openverse Catalog, API, and front-end application, as well as integrations between Openverse and WordPress. Follow this site for updates and discussions on the project.
You can also come chat with us in #Openverse on Slack. We have a weekly developer chat at 15:00 UTC on Tuesdays.
When we announced last year that Openverse had joined WordPress, we were thrilled about the exciting changes coming to the platform. Many of those updates are here.
Openverse, previously known as CC Search, is a search engine for openly licensed media. The index, which joined WordPress in mid-2021, has over 600 million Creative Commons licensed and public domain image and audio files. All files can be used free of charge.
Openverse has several new features, including:
A redesigned interface: Openverse has a new brand identity and user interface optimized for usability. Find the images and audio files you’re looking for and filter results by license, source, and many other options.
Internationalization: Openverse is fully translated into 12 languages, with additional partial translations in other languages. We encourage anyone in the community to submit translations in their own languages.
Audio support: Openverse now includes songs, podcasts, samples, and other audio files from FreeSound, Wikimedia Commons, and Jamendo.
New image providers: The Openverse team has added two new sources of high-quality photographs, the WordPress Photo Directory and StockSnap. In addition, photo libraries such as EDUimages and Images of Empowerment are now available from Meta Search.
The Openverse project is part of the WordPress community and welcomes contributions from those who want to help it become the best openly licensed media search engine on the internet. The WordPress Photo Directory provides such an opportunity.
What’s the WordPress Photo Directory?
The WordPress Photo Directory is both a new curated source of free, high-quality photographs and a new submission tool for Openverse, powered by the WordPress community. Without it, you’d need to use Flickr, Wikimedia Commons, or other sources to submit your work to Openverse.
The WordPress Photo Directory aims to be a trusted place for the community to create, share, discover, and reuse free and openly licensed media. All photos in the WordPress Photo Directory are licensed with the CC0 public domain tool.
The WordPress Photo Directory welcomes contributions in different forms. One of the best ways to get involved is by submitting your photos:
Anyone with a wordpress.org account can submit their work to the photo directory. All submissions must meet these guidelines to ensure the quality of content.
Photos will also be categorized and tagged to facilitate searching. Once a submission gets approved, it will be automatically added to the WordPress Photo Directory and the Openverse search engine.
It is worth noting that Openverse and the WordPress Photo Directory are separate and independent projects. However, they are complementary in that the images from the directory are discoverable via the Openverse search. All WordPress Photo Directory images can be viewed in Openverse.
We’re glad to have many community PRs and new contributors this time [ref]:
Added flag to strip slash in URLs while validating
Fixed typo in the frontend repository’s README
Improved the look of the audio detail page on large screens
Mitigated some behavior that was troublesome for the API [ref]
Fetching related images on the frontend is now improved and feels faster [ref]
Fixed the Audio “Length” filter on the frontend [ref]
Prevented shifting of audio results when more results are added [ref]
The audio player has a loading indicator now [ref]
Many Catalog PRs adding filetype/filesize extraction [ref]
Upcoming
Finally, priorities were adjusted [ref], and new issues to address this week were presented [ref], discussed, and (self/re)assigned. A batch of data cleaning is planned, alongside many improvements to the Openverse infrastructure.
Yesterday at 20:20 UTC, we released version 2.5.5 of our API! Along with a few dependency upgrades and DevEx improvements/fixes, this release also brings an important change regarding anonymous API requests. After v2.5.5, any media searches that are made without an API key cannot request more than 20 results per page.
This change was made in order to mitigate behavior we were seeing on the API which was adversely affecting performance for other users, our capacity to update the data that backs Openverse, and our ability to deploy new changes to the API.
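For API consumers, the practical effect of the new limit might look like the sketch below. It is an illustration only, not an official client: the base URL, the `q` parameter, and the bearer-token header are assumptions made for the example. Only the `page_size` parameter and the anonymous limit of 20 come from this announcement.

```python
from urllib.parse import urlencode

# Assumed values for illustration; consult the Openverse API documentation.
BASE_URL = "https://api.example.org/v1/images/"  # hypothetical endpoint
ANONYMOUS_MAX_PAGE_SIZE = 20  # limit stated for requests without an API key


def build_search_request(query, page_size=20, api_key=None):
    """Return (url, headers) for a search, clamping page_size when anonymous."""
    if api_key is None:
        # Without a key, asking for more than 20 results would be rejected,
        # so clamp the value on the client side instead.
        page_size = min(page_size, ANONYMOUS_MAX_PAGE_SIZE)
    params = urlencode({"q": query, "page_size": page_size})
    # The header format is an assumption, not confirmed by this post.
    headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}
    return f"{BASE_URL}?{params}", headers
```

Clamping on the client keeps an anonymous integration working after the change, at the cost of needing more pages to retrieve the same number of results.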
Our Terms of Service include the following requirements:
– A user must adhere to all rate limits, registration requirements, and comply with all requirements in the Openverse API documentation;
– A user must not scrape the content in the Openverse Catalog;
– A user must not use multiple machines to circumvent rate limits or otherwise take measures to bypass our technical or security measures;
– A user must not operate in a way that negatively affects other users of the API or impedes the WordPress Foundation’s ability to provide its services.
Background
Beginning around May 18th, we saw a significant increase in traffic.
While the digital demographics (browser, user agent, OS, device type, etc.) were quite varied, one feature stuck out – these requests were all being made with the page_size=500 parameter.
Over the course of the last 30 days, these requests constituted almost 80% of our total traffic! While our application is designed to handle this many requests, it is not designed to handle each request querying for 500 results per page (the default page size is 20). As such, this created significant strain on our Elasticsearch cluster and eventually caused disruptions in the API’s ability to serve results. The image below combines a few of our monitoring tools to show a general correlation between the page_size=500 requests and our Elasticsearch resource utilization.
Even before this release, our application was set up to throttle individual, anonymous users to 1 request/second. These page_size=500 requests were coming from a myriad of different hosts; the initiator was able to circumvent the individual throttles by employing a large number of machines (also known as a botnet). These machines were also predominantly tied to a single data center and a single ASN, which led us to believe this was orchestrated by a single user.
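To see why a per-user throttle alone does not stop this pattern, consider the simplified sketch below. It is not the Openverse implementation: the class, the idea of keying on a client identifier, and the host names are all hypothetical, and only the 1 request/second figure comes from the text above.

```python
class PerClientThrottle:
    """Allow each client at most one request per `min_interval` seconds."""

    def __init__(self, min_interval=1.0):
        self.min_interval = min_interval
        self._last_allowed = {}  # client id -> time of last allowed request

    def allow(self, client_id, now):
        last = self._last_allowed.get(client_id)
        if last is not None and now - last < self.min_interval:
            return False  # this client already used its slot
        self._last_allowed[client_id] = now
        return True


def count_allowed(throttle, requests):
    """Count how many (client_id, timestamp) requests pass the throttle."""
    return sum(throttle.allow(client, ts) for client, ts in requests)


# One host sending 100 requests at the same instant: only the first passes.
single_host = [("host-0", 0.0)] * 100
# 100 distinct hosts sending one request each at that instant: all pass.
many_hosts = [(f"host-{i}", 0.0) for i in range(100)]
```

Each machine stays under its own limit, so the aggregate load grows with the number of machines even though no individual throttle is ever tripped.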
This behavior was clearly in violation of our Terms of Service, since it was:
Not using a registered API key for high-volume use
Scraping data from Openverse
Using multiple machines to circumvent the application throttles
Consuming significant enough resources that it impacted other users of Openverse
Mitigation
As mentioned above, we deployed a change that returns a 401 Unauthorized for any anonymous request to the API that includes a page_size greater than the default of 20. Almost immediately after deployment, we saw this mitigation take effect when observing request behavior:
In the above graph, you can see where we deployed v2.5.5 (~13:00 PST) – the number of 200 OK responses decreased, and the number of 401 Unauthorized responses increased significantly! Eventually all of the page_size=500 requests were being rejected as unauthorized.
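The rule described in this section can be summarized in a few lines. The function below is a hedged sketch of the behavior, not the code shipped in v2.5.5; the function name and its `authenticated` flag are hypothetical, and any separate limit for authenticated requests is deliberately left out because this post does not state one.

```python
from http import HTTPStatus

DEFAULT_PAGE_SIZE = 20  # default page size mentioned in this post


def status_for_search(page_size=DEFAULT_PAGE_SIZE, authenticated=False):
    """Return the response status the described rule would produce."""
    if not authenticated and page_size > DEFAULT_PAGE_SIZE:
        # Anonymous requests may not ask for more than the default page size.
        return HTTPStatus.UNAUTHORIZED
    return HTTPStatus.OK
```

Under this rule an anonymous request with page_size=500 maps to 401 Unauthorized, while the same request at the default page size still succeeds.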
With this change, we were able to successfully mitigate the botnet and return our resource consumption to typical levels. This can be seen easily with a few Elasticsearch metrics:
While the intention behind Openverse is to make openly licensed media easy to access, we don’t currently have the capacity to enable users to access the entire dataset at once. We do plan on exploring options for this in the future.
We’re pleased that this mitigation was successful, and we will continue to be vigilant in ensuring uninterrupted access to Openverse for our users!
We had little time to discuss the agenda, but we touched on the four points shared, starting from:
New update on the facilitation guide.
Follow up on the process for updating Elasticsearch Indexes.
Follow up on Update database to version 002.
API’s next deployment plan.
We continue to receive and merge contributions from people new to Openverse.
This is an ongoing production issue that is blocking new API deployments. The problem appears to be a marriage of our infrastructure and new code deployed as part of 2.5.3 (which was an aggregate deployment of 2.5.1, 2.5.2 and 2.5.3).
Madison shared some difficulties with trying to reproduce the issue in staging. We discussed trying to test in production, but due to the risks inherent with that approach, we agreed to continue trying to bring staging to 1:1 parity with production and even to deploy a new production stack in another AWS region to test against (or use as a live back-up in the case of needing to test in the currently live production).
We discussed ongoing work to clarify the ways we wish to store thumbnail URLs consumed by our thumbnail proxy. We will continue to follow the strategy of having all “generated” secondary artifacts (thumbnails, waveforms, etc.) handled by microservices that are heavily cached using Cloudflare. Furthermore, we agreed to move the thumbnail URL for audio out of the main data.
Catalog: Data refresh process in the catalog made clearer [ref]
Frontend: Saving network requests by not fetching the data twice on the search page [ref]
Frontend: layout PR fixed a lot of issues with scrolling and the header [ref]
Frontend: fix for the image size jumps on single result page [ref]
Openverse: Addition of Renovate version management to the Openverse repo [ref]
In progress
Needs review
API: Overhaul serializer to eliminate manually-defined fields [ref]
API: Refactor search controller for consistency and clarity [ref]
Frontend: Support for additional sources with a feature flag [ref]
Upcoming
We are reviewing the required data fields in the catalog to see if any required fields are missing and trying to backfill missing data. This will help us establish trust in the availability of the data and help clean up duplicated code that is necessary now on all layers of the stack because data is not always present.
We also plan on deploying the API with the fix for email registration and the audio length query.
Madison
5:12 pm on May 11, 2022 Tags: openverse-committers
We’re refining our process for incorporating community contributions from frequent committers! We want to make this process easy for folks so they can continue to make excellent contributions with ease. Look out for some more information in the coming days about what being a “committer” to Openverse looks like! [ref]
API Dockerization [ref]
Bundle size reporting & deploys of Storybook & Tailwind per PR [ref]
(Community contribution) Accessibility fix for “Back to search results” on mobile [ref]
(Community contribution) Improved reporting for quick ingestion tasks [ref]
Reprioritization of provider API key requests [ref]
Upcoming
Lots of work is underway on internal infrastructure improvements, additional monitoring, and improved security measures. This effort may not be as publicly visible, but it is happening behind the scenes. We also plan on deploying a new version of the frontend this week.
Improvements to filter fields in the API [ref]
Cleanup of the Smithsonian provider DAG, eliminating unneeded DAG and DB table [ref]
Interface bug fixes, including translation banner and audio layout updates [ref]
The v1.2.1 milestone was completed for the Catalog [ref]
Added Tailwind configuration viewer to improve lookups for tailwind values for styling the frontend [ref]
Priorities of stalled tickets have been adjusted; issues labeled with critical priority will be addressed first, followed by those with open milestones.
A new milestone was created in the frontend repository for frontend bugs which must be fixed before the next deploy [ref].
Next week’s meeting is canceled; the next Openverse Weekly Development chat will be at 15:00 UTC on April 26th.
Takeaways
Done
New version of the API released; thumbnail, detail, and related URLs are now HTTPS [ref]
The Monitoring RFC has been approved and, with it, preliminary sketched-out code merged [ref]
Advances in the TypeScriptification milestone [ref]
Improvements to the Slack notifications for the catalog [ref]
Improve the thumbnail service to support compression [ref]
Move media type categories to constants module [ref]
Several PRs for store migration to use Pinia 🍍 [ref]
Creation of issues for addition of new type: 3D model [ref]
Needs discussion
The handling and meaning of the alt_files field and extensions for audio [ref]
Evaluate GitHub labels [ref]
Upcoming
Priorities of stalled tickets have been adjusted; issues labeled with critical priority will be addressed first, followed by those with open milestones.