Openverse Monthly Priorities Meeting 2022-10-05

Openverse contributors will host a community meeting to discuss priorities for the month of October at 1500 UTC on 2022-10-05.

A sync video chat link will be provided. We hope to see you there!

You can read the notes document for these meetings to catch up on past discussions.

Next steps for Walters Art Museum data

Today I attempted to refactor the Walters Art Museum provider API script (see this GitHub issue). While working on this refactor, I noticed that I could neither use the testing sandbox provided by the API nor create a user account to receive an API key. We have tried reaching out a number of times over the past year to ask for the CC Search API key to no avail.

As it stands, we have no way of confirming that the API could be accessible once this DAG is turned on. We only have 16,948 records in the catalog/API (confirmed in both places). The last update to the API codebase was made on August 7th, 2015, and the last update to any of our data was December 1st, 2020. The media that our data references still exists AFAICT.

Given all this context, I propose that we:

  1. Create a one-off script to populate height, width, filesize, and filetype (see the filesize/filetype and height/width backfill GitHub issues). This can likely be done without an API key using the direct image URLs we have in our database.
  2. Move the Walters provider script into the Retired DAGs directory and decommission the DAG.

It does not seem likely that the API will become accessible to us again in the near future. The backfills described above would at least give us the minimum data we’d like to have as part of our ongoing data normalization effort, and allow us to continue to serve the data we have in the API.
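As a rough illustration of the first step, here is a minimal sketch of how height, width, filesize, and filetype could be derived from an image's bytes using only the Python standard library. The function names and the returned dictionary keys are hypothetical and are not taken from the Openverse catalog code; it only handles PNG, GIF, and JPEG, and a real backfill would more likely use an image library and write the results back to the database.

```python
import struct
import urllib.request
from typing import Optional


def image_metadata(data: bytes) -> Optional[dict]:
    """Return filetype, width, height, and filesize for PNG, GIF, or JPEG bytes.

    Returns None when the format is not recognised or the header is malformed.
    """
    filesize = len(data)

    # PNG: 8-byte signature, then the IHDR chunk with big-endian width/height.
    if data[:8] == b"\x89PNG\r\n\x1a\n" and data[12:16] == b"IHDR":
        width, height = struct.unpack(">II", data[16:24])
        return {"filetype": "png", "width": width, "height": height, "filesize": filesize}

    # GIF: 6-byte signature, then little-endian logical screen width/height.
    if data[:6] in (b"GIF87a", b"GIF89a"):
        width, height = struct.unpack("<HH", data[6:10])
        return {"filetype": "gif", "width": width, "height": height, "filesize": filesize}

    # JPEG: walk the marker segments until a start-of-frame (SOF) marker.
    if data[:2] == b"\xff\xd8":
        pos = 2
        while pos + 9 <= len(data):
            if data[pos] != 0xFF:
                return None  # not at a marker boundary: malformed or unsupported
            marker = data[pos + 1]
            if marker == 0xFF:
                pos += 1  # skip fill bytes between segments
                continue
            # SOF markers are 0xC0-0xCF, excluding DHT (C4), JPG (C8), DAC (CC).
            if 0xC0 <= marker <= 0xCF and marker not in (0xC4, 0xC8, 0xCC):
                height, width = struct.unpack(">HH", data[pos + 5 : pos + 9])
                return {"filetype": "jpg", "width": width, "height": height, "filesize": filesize}
            (length,) = struct.unpack(">H", data[pos + 2 : pos + 4])
            pos += 2 + length  # marker (2 bytes) plus segment length
    return None


def fetch_image_metadata(url: str, timeout: float = 30.0) -> Optional[dict]:
    """Download a direct image URL and inspect it (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return image_metadata(response.read())
```

A one-off script along these lines would loop over the stored image URLs for the provider, call `fetch_image_metadata` on each, and update the corresponding record, without needing an API key.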

What do y’all think?

#data-normalization, #provider

Community Meeting Recap (27 September 2022)

Meeting start

🎉 Done!

👀 Needs review

🚧 In progress/To Do

Meeting end

#openverse-weekly-community-meeting

Community Meeting Recap (20 September 2022)

Meeting start

🎉 Done!

👀 Needs review

🚧 In progress/To Do

💬 Agenda discussion

Meeting end

#openverse-weekly-community-meeting

X-post: WordCamp US Contributor Day 2022 Recap

X-comment from +make.wordpress.org/updates: Comment on WordCamp US Contributor Day 2022 Recap

Community Meeting Recap (13 September 2022)

📣 Announcement

Design update: Francisco (@fcoveram) is going away from keyboard (AFK) for a month and will have the header design done by the end of this week. The update includes the creation of several components, documenting the Figma file and recording a video explaining the change and its composition. Everything will be shared in the design issue.

Meeting start

🎉 Done!

  • We finished 2 frontend milestones: Remove Audio’s ‘Beta’ status and Copy improvements
  • Audio peaks are now optional on API responses
  • API’s CI/CD pipeline was fixed with the addition of a user for ingestion-server and indexer_worker containers
  • Added peaks=true query param to all audio searches (in the frontend)
  • Made link validation expiry maximally configurable (in the API)

👀 Needs review

🚧 In progress

  • Test the new API key for the Brooklyn Museum
  • Move page and page_size query param validation into serializer
  • Move the dead link tally script PR to WordPress/openverse instead of the API repo

Meeting end

#openverse-weekly-community-meeting

Frontend Release v3.4.8 and a call for a11y testing

v3.4.8 of the Openverse frontend was released today. View the full changelog in GitHub.

Most crucially, we have released a new version of our audio track component with accessibility improvements. We would love if folks with #accessibility expertise could test the component and provide any feedback. Specifically, we’re looking for feedback on the experience of using our audio component with keyboard controls in a screen reader. Here is an example URL:

https://wordpress.org/openverse/search/audio?q=dance

Any identified bugs or concerns can be shared here in GitHub. Thank you!

We also fixed a bug with the ‘load more’ button disappearing when one media type returned zero results, along with a number of small improvements.

Thank you to all contributors!

Community meeting recap (6 September 2022)

Meeting start

Done

  • Copy improvements in the frontend
  • Composite audio player for better accessibility
  • Removal of unused fonts

Fixes

  • Fixes to Prettier and ESLint in lint-staged
  • Fixes to Jamendo URLs
  • Fixes to audio type accuracy on the frontend
  • Fixes to audio playback when transitioning between pages

In progress

Discussions

Deployments

Announcements

  • Monthly priorities call on Wednesday, 7 September 2022 @ 1500 UTC.
  • Remote participation of Openverse at WCUS on Sunday, 11 September 2022 @ 0900 to 1530 PDT.

Meeting end

#openverse-weekly-community-meeting

Openverse Biweekly Update – September 5th

The Openverse Biweekly update is an every-two-weeks summary of the work completed by the Openverse team.

Openverse and WCUS

Openverse will be participating in WCUS’ contributor day remotely. Details here. We’ve also had a new community member graciously offer to help out the folks at our table, so we’ll have an in-person presence as well as our remote support. @krysal and I (@zackkrida) along with @dhruvkb (for a bit) will be hanging out in Slack to support contributors.

iNaturalist and our latest media totals

iNaturalist is a joint initiative of the California Academy of Sciences and the National Geographic Society.

This period we added +14,778,368 new images and +4,484 new audio files. A majority of these images came from our new iNaturalist integration, written by community contributor @beccawidom. We’ve so far only ingested a small subset of their collection, but have added some remarkable images to Openverse as a result. We have open PRs to make some optimizations to the iNaturalist DAG moving forward.

Call for a11y testing on our new audio track

At the end of last week we merged a substantial PR from @dhruvkb which makes major accessibility and usability improvements to our audio track component. Tracks can be played/paused, seeked (bonus tip: you can do a faster seek by pressing Shift+left/right arrows), and navigated to all from a single root element. Previously, the component required navigating to individual controls to perform a specific action. We’ve also added a helpful snackbar when navigating by keyboard that announces the available controls to users. Here’s a quick video of how it works:

Which you can compare to the previous behavior, along with observing my failed attempts to seek the audio tracks while focused on the play/pause buttons:

While we’ve done local testing in macOS with Safari and VoiceOver, and with NVDA on Windows, we are not daily screen reader users. Frankly, we’ve also found reference implementations of audio players (SoundCloud’s, for example) to be woefully inadequate in their accessibility. We would love if any regular screen reader users or general #accessibility experts could take a look at our staging audio results page and give us some feedback. We’ll be posting a full request to our Make blog later this week.

Other Highlights

  • Thanks to smart diagnostic work and memory profiling from @sarayourfriend and a rapid, high-quality refactor from @olgabulat, we were able to mitigate our frontend memory leak and close the project ahead of schedule.
  • It’s the last day to leave your thoughts on our team priorities for the month; we’ve had a lively discussion with several community members chiming in. Let us know what you’d like to see us work on!
  • One thing we’ll definitely be working on is the Openverse migration away from using an iframe. This will make some dramatic improvements to SEO and overall usability, link-sharing in particular. Stay tuned for a kickoff post on that project in the coming weeks.
  • We’re making steady progress on our Catalog milestone to refactor all of our existing provider scripts. This standardization will make bulk changes to provider script behavior a breeze, and allow us to make optimizations and improvements centrally that will improve the data quality and performance of all of our provider scripts.

#biweekly-update

Applying ECS to the ingestion server/data refresh

This was a passing thought I had that I wanted to note somewhere. Currently the ingestion server is a small Falcon app that runs most aspects of the data refresh, but then also (in staging/prod) interacts with a fleet of “indexer worker” EC2 instances when performing the Postgres -> Elasticsearch indexing.

We have plans for moving the data refresh steps from the ingestion server into Airflow. Most of these steps are operations on the various databases, so they’re not very processor-intensive on the server end. However, the indexing steps are intensive, which is why they’re spread across 6 machines in production (and even then it can take a number of hours to complete).

We could replicate this process in Airflow by setting up Celery-based workers so that the tasks run on a separate instance from the webserver/scheduler. Ultimately I’d like to go this route (or use something like the ECS Executor rather than Celery), but that’s a non-trivial effort to complete.

One other way we could accomplish this would be to use ECS tasks! We could have a container defined specifically for the indexing step, which expects to receive the range on which to index and all necessary connection information. We could then kick off n of those jobs using the EcsRunTaskOperator, and use the EcsTaskStateSensor to determine when they complete. This could be done in our current setup without any new Airflow infrastructure. It’d also allow us to remove the indexer workers, which currently sit idle (albeit in the stopped state) in EC2 until they are used.
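To make the "kick off n of those jobs" idea a bit more concrete, here is a minimal sketch of how the index range might be split into n chunks, with one set of container overrides built per chunk. The container name and the `INDEX_START`/`INDEX_END` environment variable names are hypothetical; the dictionaries follow the general shape of ECS RunTask `overrides`, which is what I'd expect to hand to each EcsRunTaskOperator, but that wiring is an assumption and isn't shown here.

```python
from typing import Dict, List, Tuple


def split_range(start: int, end: int, n: int) -> List[Tuple[int, int]]:
    """Split the half-open range [start, end) into n contiguous, near-equal chunks."""
    if n < 1:
        raise ValueError("n must be at least 1")
    base, extra = divmod(end - start, n)
    chunks = []
    lo = start
    for i in range(n):
        # The first `extra` chunks take one additional item each.
        size = base + (1 if i < extra else 0)
        chunks.append((lo, lo + size))
        lo += size
    return chunks


def build_overrides(container_name: str, start: int, end: int, n: int) -> List[Dict]:
    """Build one ECS-style overrides dict per chunk of the index range.

    The environment variable names are placeholders the indexing container
    would need to be written to read.
    """
    return [
        {
            "containerOverrides": [
                {
                    "name": container_name,
                    "environment": [
                        {"name": "INDEX_START", "value": str(lo)},
                        {"name": "INDEX_END", "value": str(hi)},
                    ],
                }
            ]
        }
        for lo, hi in split_range(start, end, n)
    ]
```

For example, `build_overrides("indexer", 0, 600, 6)` would produce six override sets covering 100 records each, mirroring the six indexer workers we use in production today. Each one would be passed to its own task, and a sensor per task would wait for it to finish.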

#airflow, #data-refresh, #ecs, #infrastructure, #openverse