Openverse’s staging site has moved

Openverse’s staging site, where contributors evaluate their recently-merged work and prepare new releases of Openverse, has moved from search-staging.openverse.engineering to staging.openverse.org, following the project’s acquisition of the openverse.org domain name at the end of last year. The new name is simpler to remember and to type.

For the foreseeable future, the old .engineering domain name will redirect to the new one. At this time, this change only affects the staging frontend site. Any other domain name changes to Openverse services will be announced at a future date.

If you see any references to the old domain name in our documentation, or elsewhere on the web, please let us know.

Preparing for iNaturalist

Today we were able to merge some massive and significant changes contributed by @beccawidom to the iNaturalist DAG! This PR includes a number of changes, namely:

  • The transformation steps have changed from “CSV -> Postgres -> TSV -> Postgres” now to “CSV -> Postgres -> Postgres”. This significantly reduces disk space, time, and processing overhead, and was a necessary change in order to process all of the iNaturalist data in a reasonable timeframe. It also serves as a proof-of-concept for future bulk data imports, since the transformation & data cleaning steps are happening entirely in SQL (an Openverse first!).
  • Images are now connected with the Catalog of Life, which provides English vernacular names. This should help improve search relevancy over the current scientific names.

I want to take a moment to celebrate this huge accomplishment, and the tremendous effort @beccawidom poured into this work. Thank you!


Now that this DAG is ready to be run once again, we’re faced with the impressive and daunting notion that we could, in a matter of days, increase the size of the image catalog by ~137 million (a roughly 23.3% increase in size). With that information, it’s important to consider the implications of including this data.

We have a weekly image data refresh process which transfers images from the catalog into our API for public use. Presently, this data refresh takes around 47 hours without the popularity recalculation and 60 hours with it. If we assume these times scale linearly with catalog size, we can expect them to become roughly 58 hours and 74 hours respectively. Since these are run weekly, this still gives us about 94 hours left in the 168-hour week, even in the slower case, before we start having data refreshes queued while previous ones are running.
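
The projection above is simple enough to check by hand, but a small sketch makes the assumption explicit. It assumes, as the post does, that refresh time scales linearly with catalog size; the function and variable names are illustrative only and do not come from the Openverse codebase.

```python
# A minimal sketch of the linear projection described above.
# Assumption: refresh duration grows in proportion to catalog size.
HOURS_PER_WEEK = 7 * 24  # 168


def project_refresh_hours(current_hours, growth_fraction):
    """Scale a refresh duration by the expected catalog growth."""
    return current_hours * (1 + growth_fraction)


def weekly_headroom(projected_hours):
    """Hours left in the week after one refresh has completed."""
    return HOURS_PER_WEEK - projected_hours


if __name__ == "__main__":
    growth = 0.233  # the roughly 23.3% increase quoted in the post
    scenarios = {
        "without popularity recalculation": 47,
        "with popularity recalculation": 60,
    }
    for label, hours in scenarios.items():
        projected = project_refresh_hours(hours, growth)
        print(
            f"{label}: {hours}h now, about {projected:.0f}h projected, "
            f"about {weekly_headroom(projected):.0f}h of headroom"
        )
```

If the real scaling turns out to be worse than linear, the same two functions can be reused with a measured duration from the first post-ingestion refresh.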

Here are some steps we can take to monitor the process:

  1. Take a manual database snapshot of the catalog prior to enabling the iNaturalist DAG.
  2. Enable the DAG shortly after the weekly data refresh has completed. This will allow iNaturalist to run without other significant database operations occurring.
  3. Disable the DAG after the run while we verify the following steps.
  4. Monitor the next scheduled image data refresh closely for significant aberrations in step duration.
  5. Make a number of searches after the data refresh is complete to see how results are affected. We can make a number of searches which we would expect to return iNaturalist data (e.g. cat, mushroom, alligator) and some we expect should not (e.g. computer, transistor, book).
  6. Re-enable the iNaturalist DAG.
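
The spot check in step 5 could be made a little more systematic with something like the sketch below. It is only an illustration: it assumes each search result is a dictionary with a `provider` key and that iNaturalist records are labelled `inaturalist`, and the 20% threshold and sample data are made up for the example. Fetching the results from the API is left out on purpose.

```python
# A rough sketch for summarising how much of a result page comes from
# one provider. The "provider" key, the "inaturalist" identifier, and
# the threshold are assumptions made for this example.
from collections import Counter


def provider_share(results, provider="inaturalist"):
    """Return the fraction of results that come from `provider`."""
    if not results:
        return 0.0
    counts = Counter(item.get("provider") for item in results)
    return counts[provider] / len(results)


def flag_flooded_queries(results_by_query, unrelated_queries, threshold=0.2):
    """List the unrelated queries whose results exceed the threshold."""
    flagged = []
    for query in unrelated_queries:
        share = provider_share(results_by_query.get(query, []))
        if share > threshold:
            flagged.append(query)
    return flagged


if __name__ == "__main__":
    # Made-up sample data standing in for real search responses.
    sample = {
        "mushroom": [{"provider": "inaturalist"}] * 3 + [{"provider": "flickr"}],
        "computer": [{"provider": "inaturalist"}] * 2 + [{"provider": "flickr"}] * 2,
        "book": [{"provider": "flickr"}] * 4,
    }
    print(provider_share(sample["mushroom"]))
    print(flag_flooded_queries(sample, ["computer", "book"]))
```

Running the same summary before and after the data refresh would give a rough baseline to compare against.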

One of our big-picture goals for 2023 is search relevancy, and a key piece required for making improvements in that area is understanding how our existing document scoring works. I’m not sure that we can predict how adding this much data will affect our result relevancy. In the case where we notice result relevancy is negatively impacted (e.g. unrelated queries are flooded with iNaturalist results), there are a few actions we can take to mitigate this:

  • Alter the weight of the provider in the API (@sarayourfriend had mentioned this as an option).
  • Set the authority boost of the provider in the ingestion server and reindex the images.
  • Disable the iNaturalist provider in the API.

We would like to do all we can to avoid the last option. I don’t presume that the iNaturalist data will require taking the above actions, but I wanted to outline them and open up space in case other folks have mitigation ideas.

We’re incredibly excited for the addition of this data!

#catalog #database

Openverse switches to Photon for thumbnail generation

Openverse has moved from a self-hosted Imaginary instance for thumbnail generation to using Photon. Photon is a fast and flexible open-source image service with powerful tools for cropping, resizing, and filtering images.

The hosted instance of Photon we’re connecting to is provided to all Jetpack-connected WordPress sites and to sites hosted on the WordPress.com platform. We’ve been granted permission to use it for the Openverse frontend and API; other instances of the Openverse API, should you choose to host one, would need to connect to a different endpoint to comply with Photon’s Terms of Service. The Openverse API is easily configurable to switch to a different thumbnail proxy endpoint.
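
To illustrate what a configurable thumbnail proxy endpoint can look like, here is a minimal sketch. It is not the Openverse API’s actual implementation: the environment variable `EXAMPLE_THUMBNAIL_PROXY_URL`, the placeholder default endpoint, and the function name are hypothetical. The URL shape (upstream host and path appended to the endpoint, with `w` for width and `ssl=1` for HTTPS upstreams) follows Photon’s documented query parameters as we understand them.

```python
# A hedged sketch of building a Photon-style thumbnail URL against a
# configurable endpoint. The environment variable name and the default
# endpoint are hypothetical and not taken from the Openverse codebase.
import os
from urllib.parse import urlencode, urlsplit

DEFAULT_ENDPOINT = "https://thumbnails.example.invalid"  # placeholder


def build_thumbnail_url(image_url, width, endpoint=None):
    """Return a proxy URL asking for `image_url` resized to `width`."""
    endpoint = endpoint or os.environ.get(
        "EXAMPLE_THUMBNAIL_PROXY_URL", DEFAULT_ENDPOINT
    )
    parts = urlsplit(image_url)
    params = {"w": width}
    if parts.scheme == "https":
        # Tell the proxy to fetch the upstream image over HTTPS.
        params["ssl"] = 1
    # The upstream URL is passed without its scheme. Any query string
    # on the original image URL is ignored in this simplified sketch.
    upstream = parts.netloc + parts.path
    return f"{endpoint.rstrip('/')}/{upstream}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_thumbnail_url("https://example.com/photos/cat.jpg", 600))
```

Keeping the endpoint in configuration means a self-hosted instance only needs to change one value to point at its own image proxy.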

Please let us know if you encounter any issues with thumbnails on wordpress.org/openverse or in our API responses. Any feedback or concerns can be shared as comments on this post or as a GitHub issue. Thank you!

Community Meeting Recap (10 January 2023)

🗓️ Note: There will be no meeting for the next two weeks. The normal schedule will resume on January 31st.

[Slack: Meeting start]

🎉 Done!

👀 Needs review

🚧 In progress/To Do

📒 Agenda

  • There were no items to discuss.

[Slack: Meeting end]

#openverse-weekly-community-meeting

Community Meeting Recap (3 January 2023)

[Slack: Meeting start]

Done!

  • Add hash for livereload wheel [PR] [Slack]
  • Update the search term when users click the back button; extract useSearch [PR] [Slack]

In progress/Needs review

  • Split homepage VR tests into old and new [PR] [Slack]
  • Update spacing in the internal header [PR] [Slack]
  • Component scaffolding generator [PR] [Slack]
  • Move new homepage and 404 to the default layout [PR] [Slack]
  • Update the VSearchTypeButton to match the new homepage designs [PR] [Slack]
  • Update search types popover for the new homepage and header [PR] [Slack]
  • Allow no content responses from GitHub [PR] [Slack]
  • Upgrade to Airflow 2.5.0 [PR] [Slack]
  • Temporarily increase Freesound delay & timeout [PR] [Slack]
  • Make Phylopic a dated-only DAG [PR] [Slack]
  • Render per-repo pull request template [PR] [Slack] (Now “Done!” 🎉)
  • Add a Nappy provider DAG using ProviderDataIngester [PR] [Slack]

Closed

  • Postgres connection is crashing in production [Issue]

Agenda

  • IFrame project. Can we discuss remaining work and a potential timeline? [Slack]
    • Frontend: All PRs merged except 1, to add popover or modal for selecting the search type. The spacing issue and filter counter can be for after the iframe migration.
    • Infrastructure: Being tracked in milestone containing 3 issues.
  • License discrepancy between repos [Slack]
    • State: Still looking for legal clarity on the copyright holder part.
    • Prior art
  • Search result error pages [Slack]
    • Only error pages are limited to a max-width of 1280px
    • Can be safely converted to a column-based layout if necessitated by mockups

[Slack: Meeting end]

#openverse-weekly-community-meeting

Community Meeting Recap (20 December 2022)

[Slack: Meeting start]

🎉 Done!

👀 Needs review

🚧 In progress/To Do

📒 Agenda

Comments are welcome in each issue/post.

[Slack: Meeting end]

#openverse-weekly-community-meeting

Community Meeting Recap (13 December 2022)

[Slack: Meeting start]

Done!

  • Change the label checker action [PR] [Slack]
  • fix margin bottom of pages menu [PR] [Slack]
  • Cleanup the fonts [PR] [Slack]
  • Replace deprecated set-output command [PR] [Slack]
  • Prevent pre-commit installation if it exists [PR] [Slack]
  • Reinstate image thumbnail column [PR] [Slack]
  • Make Finnish DAG dated [PR] [Slack]

In progress/Needs review

  • RFC: Frontend event tracking [PR] [Slack]
  • Add documentation for finding & interpreting dashboards [PR] [Slack]
  • Add db-shell just command and creates a custom user in the web container [PR] [Slack]
  • Split the e2e and vr tests [PR] [Slack]
  • Reduce DB queries needed in search results [PR] [Slack]
  • Make page_count self-truncating when we detect the last page of a query [PR] [Slack]
  • Replace grequests with asyncio solution [PR] [Slack]
  • Use environment variable to determine whether to filter dead links by default; fix flaky tests [PR] [Slack]

Agenda

  • Airflow base Docker image [Slack]
  • Openverse x Gutenberg integration [Slack]
  • Tailwind typography plugin [Slack]
    • Will be dropped till i18n works well with raw HTML
    • Francisco will share a file for solving text style inconsistencies
  • New labels for design [Slack]
    • Resolved with the existing “💬 talk: discussion” and “🏁 status: ready for work” labels
    • Flow will be “needs design” → “aspect: design” + “talk: discussion” → “aspect: design” + “status: ready for work”

[Slack: Meeting end]

X-post: Suggest Topics for the 2023 WordPress Community Summit

X-comment from +make.wordpress.org/community: Comment on Suggest Topics for the 2023 WordPress Community Summit

Community Meeting Recap (7 December 2022)

[Slack: Meeting start]

🎉 Done!

Notably, this week included many submissions from community contributors!

👀 Needs review

🚧 In progress/To Do

📒 Agenda

Reminder

  • Openverse contributors will host a community meeting to discuss priorities for 2023 at 1500 UTC on 2022-12-07. More details about the meeting and its format in the Post.

Comments are welcome in each issue/post.

[Slack: Meeting end]

#openverse-weekly-community-meeting

Openverse Monthly Priorities Meeting 2022-12-07

Openverse contributors will host a community meeting to discuss priorities for 2023 at 1500 UTC on 2022-12-07.

A sync video chat link will be provided. We hope to see you there.

You can read the notes document for these meetings and the recap after our last session here. The notes document contains a tentative outline which the conversation will follow.

Much of the conversation will be driven by our Thinking towards 2023 post from November 16th.