Openverse is a search engine for openly-licensed media.
The Openverse team builds the Openverse Catalog, API, and front-end application, as well as integrations between Openverse and WordPress. Follow this site for updates and discussions on the project.
You can also come chat with us in #Openverse on Slack. We have a weekly developer chat at 15:00 UTC on Tuesdays.
Openverse contributors will host a community meeting to discuss priorities for the month of October at 1500 UTC on 2022-10-05.
A sync video chat link will be provided. We hope to see you there!
You can read the notes document for these meetings to catch up on past discussions.
Madison
11:10 pm on September 30, 2022 Tags: data-normalization, provider
Today I attempted to refactor the Walters Art Museum provider API script (see this GitHub issue). While working on this refactor, I noticed that I could neither use the testing sandbox provided by the API nor create a user account to receive an API key. We have tried reaching out a number of times over the past year to ask for the CC Search API key to no avail.
As it stands, we have no way of confirming that the API will be accessible once this DAG is turned on. We only have 16,948 records in the catalog/API (confirmed in both places). The last update to the API codebase was made on August 7th, 2015, and the last update to any of our data was December 1st, 2020. The media that our data references still exists, AFAICT.
Given all this context, I propose that we:
Create a one-off script to populate height, width, filesize, and filetype (see the filesize/filetype and height/width backfill GitHub issues). This can likely be done without an API key using the direct image URLs we have in our database; a rough sketch of what this could look like follows below.
Move the Walters provider script into the Retired DAGs directory and decommission the DAG.
It does not seem likely that the API will become accessible to us again in the near future. The backfills described above would at least give us the minimum data we’d like to have as part of our ongoing data normalization effort, and allow us to continue to serve the data we already have in the API.
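As a very rough sketch of what that one-off backfill could look like (the table layout, column names, provider identifier, and connection string below are placeholders, and we’re assuming the Walters image URLs still resolve), something like this could derive the missing fields from the files themselves:

```python
"""Illustrative one-off backfill sketch; table, column, and provider names are placeholders."""
from io import BytesIO

import psycopg2
import requests
from PIL import Image

WALTERS_PROVIDER = "thewaltersartmuseum"  # assumed provider identifier


def fields_from_file(url: str) -> dict:
    """Derive filesize, filetype, width, and height from the image file itself."""
    # A HEAD request gives us size and MIME type without downloading the full file.
    head = requests.head(url, allow_redirects=True, timeout=30)
    head.raise_for_status()
    filesize = int(head.headers.get("Content-Length", 0)) or None
    filetype = head.headers.get("Content-Type", "").partition("/")[2] or None

    # Dimensions require the actual bytes; Pillow can read them from memory.
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    width, height = Image.open(BytesIO(resp.content)).size
    return {"filesize": filesize, "filetype": filetype, "width": width, "height": height}


def main():
    conn = psycopg2.connect("dbname=openledger")  # connection string is a placeholder
    with conn, conn.cursor() as cur:
        cur.execute(
            "SELECT identifier, url FROM image "
            "WHERE provider = %s AND (filesize IS NULL OR filetype IS NULL "
            "OR width IS NULL OR height IS NULL)",
            (WALTERS_PROVIDER,),
        )
        for identifier, url in cur.fetchall():
            fields = fields_from_file(url)
            cur.execute(
                "UPDATE image SET filesize=%(filesize)s, filetype=%(filetype)s, "
                "width=%(width)s, height=%(height)s WHERE identifier=%(identifier)s",
                {**fields, "identifier": identifier},
            )


if __name__ == "__main__":
    main()
```

In practice we’d want batched updates and some retry/backoff around the HTTP calls, but the basic shape is just “read the file, record its metadata.”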
Design update: Francisco (@fcoveram) is going away from keyboard (AFK) for a month and will have the header design done by the end of this week. The update includes creating several components, documenting the Figma file, and recording a video explaining the change and its composition. Everything will be shared in the design issue.
We finished 2 frontend milestones: Remove Audio’s ‘Beta’ status and Copy improvements
Audio peaks are now optional on API responses (see the request sketch after this list)
API’s CI/CD pipeline was fixed with the addition of a user for ingestion-server and indexer_worker containers
Added peaks=true query param to all audio searches (in the frontend)
Made link validation expiry maximally configurable (in the API)
Moved page and page_size query param validation into the serializer
Moved the dead link tally script PR to WordPress/openverse instead of the API repo
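For context on the peaks items above, here is a quick, hedged sketch of what requesting waveform peaks from the API could look like; the host, endpoint, and exact response handling are assumptions rather than something verified against this release:

```python
# Hedged sketch: request audio search results with waveform peaks included.
# The host and endpoint below are assumptions about the deployed API.
import requests

resp = requests.get(
    "https://api.openverse.engineering/v1/audio/",
    params={"q": "birdsong", "peaks": "true"},  # peaks are omitted unless requested
    timeout=10,
)
resp.raise_for_status()
first = resp.json()["results"][0]
print(first["title"], len(first.get("peaks", [])))
```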
v3.4.8 of the Openverse frontend was released today. View the full changelog on GitHub.
Most crucially, we have released a new version of our audio track component with accessibility improvements. We would love it if folks with #accessibility expertise could test the component and provide any feedback. Specifically, we’re looking for feedback on the experience of using our audio component with keyboard controls and a screen reader. Here is an example URL:
Composite audio player for better accessibility
Remote participation of Openverse at WCUS on Sunday, 11 September 2022 @ 0900 to 1530 PDT.
The Openverse Biweekly update is a summary, published every two weeks, of the work completed by the Openverse team.
This period we added +14,778,368 new images and +4,484 new audio files. A majority of these images came from our new iNaturalist integration, written by community contributor @beccawidom. We’ve so far only ingested a small subset of their collection, but have added some remarkable images to Openverse as a result. We have open PRs to make some optimizations to the iNaturalist DAG moving forward.
Call for a11y testing on our new audio track
At the end of last week we merged a substantial PR from @dhruvkb which makes major accessibility and usability improvements to our audio track component. Tracks can be played, paused, seeked (bonus tip: pressing Shift+left/right arrows seeks faster), and navigated to, all from a single root element. Previously, the component required navigating to individual controls to perform a specific action. We’ve also added a helpful snackbar when navigating by keyboard that announces the available controls to users. Here’s a quick video of how it works:
You can compare this to the previous behavior below, along with my failed attempts to seek the audio tracks while focused on the play/pause buttons:
While we’ve done local testing with Safari and VoiceOver on macOS, and with NVDA on Windows, we are not daily screen reader users. Frankly, we’ve also found reference implementations of audio players, from SoundCloud for example, to be woefully inadequate in their accessibility. We would love it if any regular screen reader users or general #accessibility experts could take a look at our staging audio results page and give us some feedback. We’ll be posting a full request to our Make blog later this week.
Other Highlights
Thanks to smart diagnostic work and memory profiling from @sarayourfriend and a rapid, high-quality refactor from @olgabulat, we were able to mitigate our frontend memory leak and close the project ahead of schedule.
It’s the last day to leave your thoughts on our team priorities for the month; we’ve had a lively discussion with several community members chiming in. Let us know what you’d like to see us work on!
One thing we’ll definitely be working on is the Openverse migration away from using an iframe. This will bring dramatic improvements to SEO and overall usability, link sharing in particular. Stay tuned for a kickoff post on that project in the coming weeks.
We’re making steady progress on our Catalog milestone to refactor all of our existing provider scripts. This standardization will make bulk changes to provider script behavior a breeze, and allow us to centrally make optimizations and improvements that benefit the data quality and performance of all of our provider scripts.
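To give a flavor of what that standardization looks like, here is a heavily simplified sketch of the shared-ingester pattern; the class and method names below are illustrative rather than the exact ones in the catalog:

```python
# Illustrative sketch of the shared provider-ingester pattern the refactor moves
# every provider script toward; names are not the exact ones in the catalog.
import abc

import requests


class ProviderDataIngesterSketch(abc.ABC):
    """Shared ingestion loop; subclasses only describe provider specifics."""

    endpoint: str  # each subclass points at its provider's API

    def ingest_records(self):
        query_params = None
        while True:
            query_params = self.get_next_query_params(query_params)
            # Centralized request logic: retries, rate limiting, and error
            # handling can be improved here for every provider at once.
            response = requests.get(self.endpoint, params=query_params, timeout=30)
            response.raise_for_status()
            batch = self.get_batch_data(response.json())
            if not batch:
                break
            for record in batch:
                data = self.get_record_data(record)
                if data:
                    self.save(data)

    def save(self, data: dict):
        # Stand-in for the shared media-store logic that writes normalized rows.
        print(data)

    @abc.abstractmethod
    def get_next_query_params(self, prev_query_params):
        """Return the query params for the next page of the provider's API."""

    @abc.abstractmethod
    def get_batch_data(self, response_json):
        """Pull the list of raw records out of one API response."""

    @abc.abstractmethod
    def get_record_data(self, record):
        """Map one raw record onto the normalized catalog fields."""
```

Because the request loop, error handling, and saving live in one place, a fix or optimization there benefits every provider script at once.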
This was a passing thought I had that I wanted to note somewhere. Currently the ingestion server is a small Falcon app that runs most aspects of the data refresh, but then also (in staging/prod) interacts with a fleet of “indexer worker” EC2 instances when performing the Postgres -> Elasticsearch indexing.
We have plans for moving the data refresh steps from the ingestion server into Airflow. Most of these steps are operations on the various databases, so they’re not very processor-intensive on the server end. However, the indexing steps are intensive, which is why they’re spread across 6 machines in production (and even then it can take a number of hours to complete).
We could replicate this process in Airflow by setting up Celery-based workers so that the tasks run on a separate instance from the webserver/scheduler. Ultimately I’d like to go this route (or use something like the ECS Executor rather than Celery), but that’s a non-trivial effort to complete.
One other way we could accomplish this would be to use ECS tasks! We could have a container defined specifically for the indexing step, which expects to receive the range on which to index and all the necessary connection information. We could then kick off n of those jobs using the EcsRunTaskOperator and use the EcsTaskStateSensor to determine when they complete. This could be done in our current setup without any new Airflow infrastructure. It would also allow us to remove the indexer workers, which currently sit idle (albeit in the stopped state) in EC2 until they are used.
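For concreteness, here is a very rough sketch of how that could look as a DAG using the Amazon provider’s operator and sensor; the cluster, task definition, container name, index-range environment variables, and the XCom handoff of the task ARN are all placeholders and assumptions, not our actual configuration:

```python
# Rough sketch only: cluster/task definition/container names, the index-range
# handoff, and the XCom key used to give the sensor the task ARN are assumptions.
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.hooks.ecs import EcsTaskStates
from airflow.providers.amazon.aws.operators.ecs import EcsRunTaskOperator
from airflow.providers.amazon.aws.sensors.ecs import EcsTaskStateSensor

N_WORKERS = 6  # mirrors the current fleet of indexer workers

with DAG(
    "es_indexing_via_ecs_sketch",
    start_date=datetime(2022, 1, 1),
    schedule_interval=None,
) as dag:
    for i in range(N_WORKERS):
        run_task = EcsRunTaskOperator(
            task_id=f"run_indexer_{i}",
            cluster="openverse-indexing",      # placeholder cluster name
            task_definition="indexer-worker",  # placeholder task definition
            launch_type="FARGATE",             # network configuration omitted for brevity
            overrides={
                "containerOverrides": [
                    {
                        "name": "indexer",  # placeholder container name
                        # Hand each container its slice of the table to index.
                        "environment": [
                            {"name": "WORKER_INDEX", "value": str(i)},
                            {"name": "WORKER_COUNT", "value": str(N_WORKERS)},
                        ],
                    }
                ]
            },
        )
        wait_for_task = EcsTaskStateSensor(
            task_id=f"wait_for_indexer_{i}",
            cluster="openverse-indexing",
            # Assumes the run operator makes the started task's ARN available via XCom.
            task=f"{{{{ ti.xcom_pull(task_ids='run_indexer_{i}') }}}}",
            target_state=EcsTaskStates.STOPPED,
            failure_states={EcsTaskStates.NONE},
        )
        run_task >> wait_for_task
```

Even if the ARN handoff needs a small wrapper in practice, the overall shape stays the same: kick off n fire-and-forget ECS tasks, then sense for their completion.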