Wikimedia blog

News from the Wikimedia Foundation and about the Wikimedia movement

Announcing the official Commons app for iOS and Android

Login screen on the Commons app for Android.

Love taking photos on your smartphone? Now you don’t need to wait to get home to upload your high quality educational photos to Wikimedia Commons, the free image repository used by Wikipedia and many other projects.

The official Wikimedia Commons app for iOS and Android allows you to quickly and easily upload your photos to Commons. You can also upload multiple files and add categories (Android only so far) and share your uploads through your favorite image sharing sites. Your contributions to Commons can help illustrate the world’s largest encyclopedia and make knowledge come to life for millions of readers around the globe.

The "my uploads" view on the Commons app for iOS.

In the future, we hope to add more features and make it easier to browse and discover all the great content Commons has to offer. We also look forward to being able to run more campaigns like Wiki Loves Monuments, encouraging expert Commons users and people new to Wikimedia projects alike to contribute to high-need content areas.

As always, we need your help and input to make these apps better. Take the apps for a test drive and let us know if you encounter bugs, or if you have great ideas for features we should add in the future.

And if you don’t have an iOS or Android device, don’t feel left out! The mobile versions of all Wikimedia projects support uploading to Commons from a wider range of phones and browsers.

Maryana Pinchuk, Associate Product Manager, Wikimedia Foundation

Education program leaders gather to share experiences

More than 40 people from 25 countries gathered together in person in Milan, Italy, last week to discuss Wikimedia projects’ use in education. Representatives from Wikimedia chapters, the Wikimedia Foundation, and universities worldwide discussed ways to further develop the relationships between educational institutions and Wikipedia and other Wikimedia projects.

Participants in the Education Program Leaders Workshop in Milan.

The Education Program Leaders Workshop was held in conjunction with the Wikimedia chapters conference in Milan, an annual opportunity for representatives from around the world to meet in person to discuss the future of the movement. The worldwide enthusiasm for the program bodes well for the future of Wikimedia projects like Wikipedia in education.

Notes from the workshop highlight the incredible depth and breadth of activities happening worldwide in the education sphere. Some programs, such as those in Serbia, the Czech Republic, Ukraine, Brazil, and Egypt, have been running for several terms and have achieved impressive results on their language Wikipedias. Others, including programs in Germany, Sweden, and the United Kingdom, have dedicated staff working to further their goals. Programs in Mexico, Switzerland, and Saudi Arabia are small but effective, thanks to individual volunteer educators whose drive to use Wikipedia in their own classrooms has strengthened their language Wikipedias. Still others are just getting started, and many are exploring opportunities to work with government bodies that create curricula and education policy, so that these include Wikipedia and other Wikimedia projects.

“Education” is a broad field, and participants represented programs working with everyone from school-aged children to seniors. Workshop participants discussed the different activities relevant to education programs and talked about the best way to set goals for programs as a whole. The Wikimedia Foundation remains committed to supporting education programs worldwide through resources such as brochures, a MediaWiki extension, and online trainings. Workshop participants agreed that a better system for sharing experiences across countries, perhaps a searchable database of learnings, would help programs learn from each other’s mistakes and determine the best path forward for their own programs. With more than 30 programs in operation worldwide, the future is bright for Wikimedia projects and education.

LiAnna Davis, Wikipedia Education Program Communications Manager

FLOSS internship programs as catalysts for richer community collaboration

OPW's robocats happy to work on their first contributions.

These days we are welcoming a new wave of candidates for Google Summer of Code and FOSS Outreach Program for Women (OPW) internships. Interested? Stop reading and hurry up! Or keep reading to learn why these free software mentorship programs are doing so much good.

Since 2006, Wikimedia has mentored 32 GSoC students. Of those, only one (3.13%) was a woman (accepted in 2011), and she didn’t stick around. This share is even lower than the overall percentage of women accepted in GSoC 2012 (8.3%), although it may be in line with the composition of our own tech community (we lack data on this). Can we do better?

We think we can. This is why we joined OPW last November. It was the first round open to organizations other than the GNOME Foundation, founders of the initiative. After 5 rounds of OPW, GNOME women are not an exotic exception anymore. It is too soon to evaluate results in the Wikimedia tech community, but the six interns we got during the 5th round delivered their projects in the areas of software development, internationalization, UX design, quality assurance and product management, and so far they are sticking around. We also learned some lessons that we are applying to the next internship programs. As we speak, several women are applying for Wikimedia in the current GSoC edition. A promising trend!

But there is more positive change. Paid internships are like subcutaneous injections for a free software community: in just one shot you get a full-time contributor dedicated to helping you within a defined scope and amount of time, with the incentive of a stipend ($5,000). The lives of the injected contributors change in the new environment. They learn and adapt to new situations. They gain valuable experience that will help them become experienced volunteers and better professionals. At least, this is the goal. But the life of the community receiving the injection should also change for the better with the arrival of these full-time contributors. This is also the goal. So what has improved so far in our tech community?

Scaling up complex projects

Mentorship programs require a good alignment of project ideas supported by the community and by available mentors. Thanks to the efforts of many, we now have a list of possible projects, including a selection of featured project ideas ready to start. The list includes proposals coming from different Wikimedia projects, Wikimedia Foundation-driven initiatives and MediaWiki features for third parties.

These project ideas link to Bugzilla reports in order to keep track of the technical discussion, involving the candidates, the mentors and whoever else wants to join. Full transparency! We also provide basic guidelines for candidates willing to propose their own projects.

All this has been done for the current GSoC and OPW round, but is potentially also useful in the context of other initiatives like OpenHatch, SocialCoding4Good, or Wikimedia’s Individual Engagement Grants. If you want to propose a technical project that could keep a person or team busy for 3–4 months, now you know where to start.

Improving our Welcome carpet

We are still learning how to attract newcomers.

Each mentorship program brings a wave of newcomers eager to get up to speed as soon as possible. We are betting on the “medium is the message” approach, giving as much importance to the candidates’ participation and collaboration in our regular community channels as to their proposals. But all this requires better landing surfaces on mediawiki.org.

This pressure and the repetition of similar questions by newcomers have encouraged the creation or promotion of references such as Where to start, How to contribute and Annoying little bugs. We keep working on an easier introduction to our community through the fresh and work-in-progress Starter kit, a team of volunteer Greeters and other initiatives discussed at the new Project:New contributors. And you know what? Several former interns are involved!

Diversity enters our agenda

We believe that “a healthy mix of demographic and cultural characteristics everywhere throughout the movement is key to Wikimedia’s success.” Diversity is good for creativity and sustainability, which are primary goals of any free software community. Yet diversity in these communities tends to be quite limited, and our case is not an exception.

We have mentioned the problem of male predominance, but there are other biases and types of discrimination that we would like to help level out. What about working on other barriers related to ability, age, language, or cultural, ethnic, or economic background? Just as we are doing with OPW, we can start with programs for specific audiences that we can sync with mainstream activities like GSoC, increasing their diversity. Ideas are welcome.

Quim Gil, Technical Contributor Coordinator (IT Communications Manager)

Join the Language Mavens!

Among the Wikimedia projects, Wikipedia has the highest number of individual language projects — 285. The Language Engineering team focuses on building language tools and assets that improve the ability to interact with any article on Wikipedia. Language assets like fonts and input methods are integrated into MediaWiki and its extensions, and our wikis are localized using collaborative translation with translation tools to ensure a decent user experience.

Collaboration in Language Projects and the Language Maven Program

Language Engineering community meetup during GNUnify 2013 at Pune, India

Language tools are constantly evolving to ensure support for our users. Scaling our small engineering team to support hundreds of languages would be slow, if not impossible, without close collaboration with our language communities, which include many capable and technically savvy editors and administrators.

The Wikimedia Language Engineering team has compiled a proposal for the formation of a special interest group named the Language Mavens. With members from various language communities from around the globe, we hope to learn from our users, seek advice, guidance and validation on language features. We hope that the Language Mavens will pull in participation from community members and experts who care about language support features and their adoption in the wikis they read and contribute to.

Getting started with the Maven Program

The Language Maven pilot was rolled out earlier this month on April 13 with a meeting that was well attended. Program scope and activities were discussed. One of the recommendations was to ensure that documents and handy checklists be prepared for easy reference to the language tools available to each language community. Activities that the Mavens can participate in include usability tests, bug triages, testing days and even blogging to share valuable insights about the internationalization tools in their favorite language wiki projects.

The Mavens program will focus on collecting feedback and providing support for the language tools and assets the team deploys. This will help build a long-term user group that will be instrumental in helping other language community members learn about the latest language features and tools as they roll out. The Maven team expects to meet once a month and communicate through the mediawiki-i18n mailing list. To participate as a Language Maven, please fill out this form to let us know about your interest, or ping me (runa at wikimedia dot org) with any questions!

Help us make your language experience better — join the Mavens!

Runa Bhattacharjee, Outreach and QA coordinator, Language Engineering

Try the new login and account creation on Wikimedia projects

An account creation and login process that is simple and pleasant to use is a must-have for engaging more contributors to Wikimedia projects. On the English Wikipedia alone, more than 3,000 people sign up for an account on an average day. For many new editors, these forms are their first interaction with the site beyond reading content.

We’re happy to announce that, starting today, users of all Wikimedia projects will be able to try a new look for our account creation and login. For about a week, we’re asking all Wikimedia volunteer editors to give the update a try and help us spot any nagging bugs or errors in translation. We’ll then enable the new forms as the default on all our wikis.

The new account creation (mockup)

Help test the new forms

If you’re a current or prospective member of a Wikimedia community, we need your help. Please give the new interfaces a try, report bugs, or leave comments for us on your wiki’s preferred noticeboard.

We’re providing this week-long testing period, instead of simply rolling out the new interface with less advance notice, to get help making sure our localizations are correct and that the interfaces will be bug-free for the 800 or so wiki communities we support.

Both links above are to our largest and most active community, English Wikipedia, but if you’re a contributor to any other project, you can try out the new forms by simply appending &useNew=1 to either URL on your favorite wiki. You can also find more detailed, step-by-step testing instructions if you’re willing to go a little deeper with testing the forms.
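The only change needed on other wikis is one extra query parameter. The Python sketch below shows one careful way to add it; the example login URLs (built on MediaWiki's canonical `Special:UserLogin` page name) are assumptions, since the original links are not reproduced here, and `with_use_new` is a hypothetical helper name. It also covers URLs that have no query string yet, where the parameter has to start with `?` rather than `&`.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_use_new(url: str) -> str:
    """Return `url` with useNew=1 in its query string (hypothetical helper).

    Any existing useNew value is replaced, and ':' is left unescaped so that
    page names such as Special:UserLogin stay readable.
    """
    parts = urlsplit(url)
    # Keep every other query parameter, in its original order.
    params = [(key, value)
              for key, value in parse_qsl(parts.query, keep_blank_values=True)
              if key != "useNew"]
    params.append(("useNew", "1"))
    return urlunsplit(parts._replace(query=urlencode(params, safe=":")))


# Assumed example URLs; substitute the login or signup page of any wiki.
print(with_use_new("https://en.wikipedia.org/w/index.php?title=Special:UserLogin"))
print(with_use_new("https://fr.wikipedia.org/wiki/Special:UserLogin"))
```

The first call yields `...index.php?title=Special:UserLogin&useNew=1`; the second, which starts without a query string, yields `.../wiki/Special:UserLogin?useNew=1`.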

How we got here

The new login (mockup)

The Wikimedia Foundation’s Editor Engagement Experiments team has been optimizing these forms, using weekly controlled tests to measure the impact of our new signup form and iterate on our ideas. (See our original announcement.)

Overall, the results of these experiments were encouraging. Using English Wikipedia as our proving ground, our most successful experiment gained around 800 additional signups over a two-week period. Conversion rose by 4 percentage points (a relative increase of about 14 percent), from 28 percent to 32 percent of users successfully creating an account after visiting the signup page. The total number of new users gained will vary with seasonal trends. We also reduced the number of errors that held up users after they submitted the form by 14 percent.
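Because conversion rates are themselves percentages, it helps to keep absolute and relative change apart. This short Python check uses the 28 and 32 percent figures reported above; the function name `conversion_change` is illustrative.

```python
def conversion_change(before_pct: float, after_pct: float) -> tuple[float, float]:
    """Return (absolute change in percentage points, relative change in percent)."""
    absolute_points = after_pct - before_pct
    relative_percent = absolute_points / before_pct * 100
    return absolute_points, relative_percent


# Signup-page conversion before and after the most successful experiment.
points, relative = conversion_change(28.0, 32.0)
print(f"+{points:.0f} percentage points, +{relative:.1f}% relative")
# prints "+4 percentage points, +14.3% relative"
```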

This interface redesign marks the first time MediaWiki core (the platform shared by all our projects) uses the new form styles that we have experimented with in account creation, in our new onboarding experience for Wikipedia editors, and in other features. The patterns we’re introducing via the new account creation and login, codenamed “Agora” by the Wikimedia Foundation design team, can now be reused in a more standardized way by MediaWiki developers.

The redesigns we’re introducing to login and account creation are hardly radical. Simple use of typography, color and vertically-aligned form fields are not what could be called bold innovation in design. Nonetheless, we’re extremely happy to be releasing an experience that will make signing up and logging in less of a burden for the many contributors to Wikimedia communities, and thus enable them to create great, free educational resources.

Steven Walling, Associate Product Manager

The alpha version of the VisualEditor is now in 15 languages

Today the Wikimedia Foundation launched an alpha, opt-in version of the VisualEditor on fourteen more Wikipedias, following our release on the English Wikipedia in December. The VisualEditor lets editors create and modify real articles visually, using a new system where the articles they edit look the same as when one reads them, like writing a document in a word processor.
The VisualEditor is now on 15 language Wikipedias

Editors on fifteen Wikipedias – Arabic, Chinese, Dutch, English, French, German, Hebrew, Hindi, Italian, Japanese, Korean, Polish, Russian, Spanish and Swedish – can now get an idea of what the VisualEditor looks like in the “real world”, so they can give us feedback about how well it integrates with their current editing processes. We also want to get their thoughts on what aspects of development we should be prioritizing in the coming months.

The editor is still at an early stage and is missing significant functions, which we will address in the coming months. Because of this, we are mostly looking for feedback from experienced editors; the alpha VisualEditor is insufficient to really give new volunteers a proper experience of editing. We don’t want to promise an easier editing experience to new editors before it is ready.

As we develop improvements, we will push them live every two weeks to the wikis, allowing you to give us feedback as we go, and tell us what you want us to work on next.

How can I try it out?
The VisualEditor is now available to all logged-in accounts as a new preference, switched off by default, on the fifteen Wikipedias listed above. If you go to your “Preferences” screen and click into the “Editing” section, it will have an option labelled “Enable VisualEditor.”

Once enabled, for each article you can edit, you will get a second editor tab labelled “VisualEditor” next to the “Edit” tab. If you click this, after a little pause you will enter the VisualEditor. From here, you can play around, edit and save real articles and get an idea of what it will be like when complete.

At this early stage in our development, we recommend that after saving any edits, you check whether they broke anything. All edits made with the VisualEditor will show up in articles’ history tabs with a “VisualEditor” tag next to them, so you can track what is happening.

How can I help?
It’s vital that our software is available in the native language of as many of our volunteers as possible. If you speak one of these languages, or any of the other 280 languages that we support, like Welsh, Punjabi, Urdu or Scots Gaelic, please consider looking at the translations and helping us improve them!

We would love your feedback on what we have done so far — whether it’s a problem you discovered, an aspect that you find confusing, the areas you think we should work on next, or anything else, please do let us know.

James Forrester, Product Manager, VisualEditor and Parsoid

The Wikidata revolution is here: enabling structured data on Wikipedia

The logo of Wikidata

A year after its announcement as the first new Wikimedia project since 2006, Wikidata has now begun to serve more than 280 language versions of Wikipedia as a common source of structured data that can be used in more than 25 million articles of the free encyclopedia.

By providing Wikipedia editors with a central venue for their efforts to collect and vet such data, Wikidata leads to a higher level of consistency and quality in Wikipedia articles across the many language editions of the encyclopedia. Beyond Wikipedia, Wikidata’s universal, machine-readable knowledge database will be freely reusable by anyone, enabling numerous external applications.

“Wikidata is a powerful tool for keeping information in Wikipedia current across all language versions,” said Wikimedia Foundation Executive Director Sue Gardner. “Before Wikidata, Wikipedians needed to manually update hundreds of Wikipedia language versions every time a famous person died or a country’s leader changed. With Wikidata, such new information, entered once, can automatically appear across all Wikipedia language versions. That makes life easier for editors and makes it easier for Wikipedia to stay current.”

The Wikidata entry on Johann Sebastian Bach (as displayed in the “Reasonator” tool), containing among other data the composer’s places of birth and death, family relations, entries in various bibliographic authority control databases, a list of compositions, and public monuments depicting him

The dream of a wiki-based, collaboratively edited repository of structured data that could be reused in Wikipedia infoboxes goes back to at least 2004, when Wikimedian Erik Möller (now the deputy director of the Wikimedia Foundation) posted a detailed proposal for such a project. The following years saw work on related efforts like the Semantic MediaWiki extension, and discussions of how to implement a central data repository for Wikimedia intensified in 2010 and 2011.

The development of Wikidata began in March 2012, led by Wikimedia Deutschland, the German chapter of the Wikimedia movement. Since Wikidata.org went live on 30 October 2012, a growing community of around 3,000 active contributors has been building its database of ‘items’ (e.g. things, people or concepts), starting with topics that are already the subject of Wikipedia articles in several languages. An item’s central page on Wikidata replaces the complex web of language links that previously connected these articles about the same topic in different Wikipedia versions.
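The shift from per-article language links to one central item is easy to quantify. The Python sketch below uses a hypothetical, heavily trimmed dictionary loosely modeled on the shape of Wikidata's JSON entities (a `sitelinks` map keyed by wiki ID such as `enwiki`); the item contents and the helper names `language_links` and `link_counts` are illustrative assumptions, not Wikidata's actual code. If every article about a topic in n languages must list the other n - 1 versions, n(n - 1) links have to be kept in sync by hand, while a central item needs only n entries.

```python
from typing import Any

# Hypothetical, heavily trimmed item loosely modeled on Wikidata's JSON
# entities: each language edition appears exactly once, under "sitelinks".
ITEM: dict[str, Any] = {
    "labels": {"en": {"language": "en", "value": "Johann Sebastian Bach"}},
    "sitelinks": {
        "enwiki": {"site": "enwiki", "title": "Johann Sebastian Bach"},
        "dewiki": {"site": "dewiki", "title": "Johann Sebastian Bach"},
        "frwiki": {"site": "frwiki", "title": "Jean-Sébastien Bach"},
        "cawiki": {"site": "cawiki", "title": "Johann Sebastian Bach"},
    },
}


def language_links(item: dict[str, Any], current_site: str) -> list[tuple[str, str]]:
    """Derive the language links an article on `current_site` would display."""
    return sorted(
        (site, link["title"])
        for site, link in item["sitelinks"].items()
        if site != current_site
    )


def link_counts(n_wikis: int) -> tuple[int, int]:
    """(links stored if every article lists every other one, entries on one item)."""
    return n_wikis * (n_wikis - 1), n_wikis


print(language_links(ITEM, "enwiki"))
print(link_counts(len(ITEM["sitelinks"])))  # (12, 4)
print(link_counts(200))                     # (39800, 200)
```

Adding a new language edition then means adding one sitelink to the item, rather than editing every existing article.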

Wikidata’s collection of these items now numbers over 10 million. The community also began to enrich Wikidata’s database with factual statements about these topics (data like the mayor of a city, the ISBN of a book, the languages spoken in a country, etc.). This information has now become available for use on Wikipedia itself, and Wikipedians on many language Wikipedias have already started to add it to articles, or to discuss how best to use it.

“It is the goal of Wikidata to collect the world’s complex knowledge in a structured manner so that anybody can benefit from it,” said Wikidata project director Denny Vrandečić. “Whether that’s readers of Wikipedia who are able to be up to date about certain facts or engineers who can use this data to create new products that improve the way we access knowledge.”

The next phase of Wikidata will allow for the automatic creation of lists and charts based on the data in Wikidata. Wikimedia Deutschland will continue to support the project with an engineering team that is dedicated to Wikidata’s second year of development and maintenance.

Wikidata is operated by the Wikimedia Foundation and its fact database is published under a Creative Commons 0 public domain dedication. Funding of Wikidata’s initial development was provided by the Allen Institute for Artificial Intelligence [AI]², the Gordon and Betty Moore Foundation and Google, Inc.

Tilman Bayer, Senior Operations Analyst, Wikimedia Foundation

Some of the first applications demonstrating the potential of Wikidata:

  • http://simia.net/treeoflife/ – a (still very incomplete) “tree of life” drawn from relations among biological species in Wikidata’s database
  • “GeneaWiki” generates a graph showing a person’s family relations as recorded in Wikidata, example: Bach family
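One way a tool like GeneaWiki could assemble such a graph is a breadth-first walk over family-relation statements. This Python sketch is an assumption about the approach, not GeneaWiki's code: it uses a small hand-written dictionary in place of live Wikidata queries, keyed by person name rather than item ID, with Wikidata's father (P22), mother (P25) and child (P40) property IDs. The function name `family_edges` is hypothetical.

```python
from collections import deque

# Hand-written, trimmed stand-in for relation statements. The property IDs are
# Wikidata's father (P22), mother (P25) and child (P40) properties.
RELATIONS = {
    "Johann Sebastian Bach": {
        "P22": ["Johann Ambrosius Bach"],
        "P25": ["Maria Elisabeth Lämmerhirt"],
        "P40": ["Wilhelm Friedemann Bach", "Carl Philipp Emanuel Bach",
                "Johann Christian Bach"],
    },
    # Trimmed: only one of Johann Ambrosius Bach's children is listed here.
    "Johann Ambrosius Bach": {"P40": ["Johann Sebastian Bach"]},
}
FAMILY_PROPERTIES = ("P22", "P25", "P40")


def family_edges(start: str, max_depth: int) -> set[tuple[str, str, str]]:
    """Breadth-first walk collecting (person, property, relative) edges
    for everyone within `max_depth - 1` hops of `start`."""
    edges: set[tuple[str, str, str]] = set()
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand people at the edge of the radius
        for prop in FAMILY_PROPERTIES:
            for relative in RELATIONS.get(person, {}).get(prop, []):
                edges.add((person, prop, relative))
                if relative not in seen:
                    seen.add(relative)
                    queue.append((relative, depth + 1))
    return edges


for edge in sorted(family_edges("Johann Sebastian Bach", max_depth=2)):
    print(edge)
```

The edges are raw statements, so a relation recorded on both people (father on one item, child on the other) appears once in each direction; a drawing tool would typically merge such pairs.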

Indian WikiWomen celebrate Women’s History Month

(This is a guest post by Ms. Netha Hussain, a Wikipedia contributor from India who regularly contributes to Malayalam Wikipedia, among other projects.)

March 2013 was a busy month for women Wikimedians in India, as we held various events, such as edit-a-thons and workshops, to celebrate the presence of women in Wikimedia projects. The women Wikimedians, members of the Wikimedia India Chapter and the Access to Knowledge Team, brainstormed events that would encourage women to participate and improve the quality of articles related to Indian women on the English and Indian-language Wikipedias. We decided to hold workshops and meetups in various Indian cities, in addition to online edit-a-thons.

Women participants of the Wikipedia Workshop, Bangalore

We created a co-ordination page on English Wikipedia and added suggestions for articles to edit. We invited participants to join the edit-a-thon by spreading the word on mailing lists, social media networks and blogs. The Times of India published a feature about the event, which drew many newcomers to take part. We also created separate pages for offline events taking place in parallel and added a summary of the events to the main page. Participants in the edit-a-thon signed up on the co-ordination page, where we also added the details and status of Women’s History Month events happening on various Indian-language Wikipedias.

The inaugural event took place on International Women’s Day (March 8) at Nirmala Institute of Education, Goa. Of the 100 participants who attended, 90 were women. Veteran Wikimedians Rohini and Nitika conducted a basic Wikipedia editing workshop. The event also kicked off the two-day online edit-a-thon, in which fourteen editors took part. Participants included homemakers, students and professionals. On the day of the workshop (March 8), Rohini took charge as Chairperson of the special interest group (SIG) for Gendergap at the Wikimedia India Chapter. She plans to conduct more workshops for women in the future.

Organizers subsequently held a series of events at two venues in Bengaluru and one in Ernakulam. Experienced Wikimedians Pavithra and Nikita Belavate led the workshops in Bengaluru, which also served as an occasion for editors living in and around Bengaluru to meet. The Ernakulam event, led by Wikimedian Ditty Mathew, aimed to increase the participation of women on Malayalam Wikipedia. Around 40 women participated in the three edit-a-thons. A Wikipedia Academy with 9 participants was held in Hyderabad. The last of the events, led by Anupama Srinivas, took place in Chennai on 30 March 2013.

Nikita, who led the Bangalore event, said she was filled with happiness watching the exuberance in the eyes of women participants who edited and saved their edits live on Wikipedia. “This year’s Women’s History month makes me once again believe in the power of women and honing it by empowering them, Wikiwomenising them,” said Nikita.

Participants of the Bangalore workshop organized by FSMK

Vishnu Vardhan, the Program Director of the Access to Knowledge team, was with the WikiWomen throughout the editathon, connecting people, planning events and urging them to contribute. He encouraged his mother, wife and female cousins to contribute to Wikipedia.

“I wish more of us took the initiative of involving the women in our life to share their knowledge on Wikipedia and truly make the Wikipedias the sum of all human knowledge,” he said. Harriet, one of the key organizers of the women’s day events, believes that the Indian Wikimedia community has gained momentum in favor of bridging the gender gap because of this event. She urged the Indian community to follow this success and to increase the participation of women in the Wikimedia movement. Though she could not attend the events in person, she ensured her participation in the edit-a-thon by arranging the logistics, monitoring the coordination page and suggesting changes.

The events had good participation from men as well. Among the 14 participants who signed up on English Wikipedia, 5 were men. In Malayalam Wikipedia, 18 out of the 26 participants who signed up for the online edit-a-thon were men. Dileep Unnikrishan, a male participant of the edit-a-thon, and a fan of Wikipedia, participated in the Ernakulam event because he was curious to find out how Wikipedia works. With women participants, he edited three articles and found it exciting to “be a part of the movement that has brought about a knowledge revolution in the world. The best thing I noticed about Wiki is that it has a peer-to-peer way of organization, which makes it warm and welcoming to newbies like me,” said Dileep.

The Indian WikiWomen are planning to conduct similar events in the future to increase the participation of women in Wikipedia and its sister projects. We are hopeful we will bridge the gender gap in the Indian Wikimedia community by conducting outreach programs, increasing awareness about free knowledge programs among women and conducting action-oriented events targeting women.

Netha Hussain

Catalan Wikipedia hits the 400,000 articles milestone during 35-hour edit-a-thon

The GLAM movement in Catalonia has been very active over the past few years. Edit-a-thons and workshops have taken place at all kinds of institutions, but the one held this April at Fundació Miró in Barcelona (Catalonia), co-organized by Amical Viquipèdia, was really special: the edit-a-thon lasted for 35 consecutive hours, split into three sessions. Moreover, during the first hours of the edit-a-thon, Catalan Wikipedia reached 400,000 articles, a magical coincidence that made the event even more special.

35 consecutive hours editing Wikipedia? It IS possible!

Fundació Miró’s Espai 13 is celebrating its 35th anniversary. Fundació Miró had already collaborated with Wikipedia back in 2011, when it hosted an edit-a-thon about the Catalan artist Joan Miró. But this time Amical Viquipèdia and Fundació Miró agreed to hold a huge celebration to commemorate the anniversary: 35 consecutive hours editing Wikipedia.

First session of the Miró Editathon

During that time, around fifty Art and Philosophy university students from all over the country and around fifteen volunteer Wikipedians gathered at Fundació Miró to start or expand articles on 300 artists who have exhibited at Espai 13, Fundació Miró’s space dedicated to promoting young artists’ work.

To start the event, we held a press conference at noon on Friday, April 12th, 2013. The first shift of participants was ready to start working on the 300 proposed articles about the Espai 13 artists, and some of those artists were present at the event too, so the students were able to take freely licensed pictures of them and upload them to Wikimedia Commons. The 26 Art and Philosophy students who took part in the first shift, plus the 5 volunteer Wikipedians who were there to help them, stayed until 10pm, that is, 10 hours. The second shift comprised a similar number of participants. They worked admirably through the whole night without rest until 10am the next day, when the third shift took over and stayed until the end of the edit-a-thon eleven hours later, finishing at 9pm on April 13th, 2013.

The students and the volunteer Wikipedians didn’t just write on Wikipedia; parallel activities were scheduled so they could get out, relax and get ready for more work on articles. In addition to lunch and dinner in the magnificent gardens of the museum, those activities included a guided visit to the museum at midnight, talks by Wikipedians, a couple of performances by two of the artists being written about, and two yoga sessions, one of them held at 6am on Fundació Miró’s balcony, when Barcelona was waking up and the view was breathtaking.

Catalan Wikipedia reaches 400,000 articles

But the edit-a-thon at Fundació Miró was not the only celebration of the day. As luck would have it, the 400,000th article on Catalan Wikipedia was written during the event. Catalan is the 75th most spoken language in the world, with 11.5 million speakers, yet Catalan Wikipedia ranks 15th by number of articles. Catalan-speaking territories are located in Spain, France, and Italy, whose languages exert a strong influence on Catalan speakers, especially Spanish: most Catalan speakers are bilingual and also know Spanish.

At 5:23pm, in the middle of a talk on “Open knowledge and the cultural institutions,” a participant announced the good news and we opened bottles of champagne in the presence of Barcelona TV, which covered the news live. Catalan National TV also joined the event at midnight. The next day it broadcast a two-minute video about the event as the longest edit-a-thon ever and about the 400,000-article milestone.

Arnau Duran (User:Arnaugir), member of Amical Viquipèdia
Note: for more information about the edit-a-thon see this page (in Catalan).


Wikipedia Adopts MariaDB

This past Wednesday marked a milestone in the evolution of Wikimedia’s database infrastructure: the completion of the migration of the English and German Wikipedias, as well as Wikidata, to MariaDB 5.5.

For the last several years, we’ve been operating the Facebook fork of MySQL 5.1 with most of our production environment running a build of r3753. We’ve been pleased with its performance; Facebook’s MySQL team contains some of the finest database engineers in the industry and they’ve done much to advance the open source MySQL ecosystem.

That said, several features provide compelling reasons to consider upgrading. These include MariaDB’s optimizer enhancements and the feature sets of Oracle’s MySQL 5.5 and Percona’s XtraDB. Many of XtraDB’s features overlap with the Facebook patch, but I particularly like add-ons such as the ability to save the buffer pool LRU list, which avoids costly warmups on new servers. Equally important, as a supporter of the free culture movement, the Wikimedia Foundation strongly prefers free software projects. That includes a preference for projects that do not split their code base into differently licensed free and enterprise editions. We welcome and support the MariaDB Foundation as a not-for-profit steward of the free and open MySQL-related database community.

Preparing For Change

Major version upgrades of a production database are not to be made lightly. In fact, as late as 2011, some Wikipedia languages were still running a heavily patched version of MySQL 4.0. The migration to 5.1 required both schema changes and direct modifications of data dumps to alter the padding of binary-typed columns. MySQL 5.5 has a variety of incompatibilities with prior versions, thanks in part to better compliance with SQL standards. Changes to the query optimizer between versions may also change the execution plan for common queries. Historically, some of those changes have been for the better and some have not. SQL behavior changes may result in replication breakage or data consistency issues. Performance regressions, whether from query plan changes or other causes, can cause site outages. All of this calls for a lot of testing.
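As an illustration of how an SQL behavior change can break replication, here is a simplified model. It is not MySQL source code. It compares two ways of handling a negative result in an unsigned 64-bit column: wrapping around modulo 2^64, or rejecting the value with an error. This mirrors the kind of wrap-around issue mentioned in the following paragraph. The sketch assumes that the old master wraps and the new replica rejects. It does not claim to describe the exact rules of any particular MySQL version or SQL mode. All names are hypothetical.

```python
UINT64_MODULUS = 2 ** 64


class OutOfRangeError(Exception):
    """Raised by the strict model when an unsigned result would be negative."""


def unsigned_sub_wrapping(a: int, b: int) -> int:
    # Lenient model (assumed old behavior): a negative result wraps modulo 2**64.
    return (a - b) % UINT64_MODULUS


def unsigned_sub_strict(a: int, b: int) -> int:
    # Strict model (assumed new behavior): a negative result is an error.
    result = a - b
    if result < 0:
        raise OutOfRangeError(f"{a} - {b} is out of range for an unsigned column")
    return result


def replicate(args, master_op, replica_op):
    """Apply one 'statement' on the master, then on a replica with different semantics."""
    master_value = master_op(*args)
    try:
        replica_value = replica_op(*args)
    except OutOfRangeError as err:
        # On a real replica, an error like this halts the replication thread.
        return master_value, f"replication stopped: {err}"
    return master_value, replica_value


if __name__ == "__main__":
    # Decrementing a counter that is already 0: the two servers now disagree.
    print(replicate((0, 1), unsigned_sub_wrapping, unsigned_sub_strict))
    # An in-range decrement behaves identically on both.
    print(replicate((5, 3), unsigned_sub_wrapping, unsigned_sub_strict))
```

Watching replicas for exactly this kind of error is cheaper than discovering it after a master has been upgraded.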

We tested compatibility by running MariaDB replicas outside of production, watching for replication errors, and replaying production read queries to validate the results. We first identified and fixed a couple of MediaWiki issues that surfaced as replication errors. These were along the lines of trying to set unsigned integer columns to negative values, which previously caused a wrap-around instead of an error. After that, we replayed production read queries using pt-upgrade from Percona Toolkit. pt-upgrade replays a query log against two servers and compares the responses for variances or errors. We had developed scripts for our recent datacenter migration that warm up many standby databases simultaneously with current production read traffic, and these also helped with rough load testing and benchmarking. Along the way, we identified a pair of bugs in MariaDB 5.5.28 and 5.5.29. One of them was a rare but potentially severe performance regression related to a new query optimizer feature. The MariaDB team was very responsive and quick to offer solutions, complete with test cases.
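The core idea behind pt-upgrade is to replay the same read queries against two servers and flag any difference in their answers. This is a minimal sketch of that idea, not pt-upgrade itself. The two "servers" are in-memory SQLite databases standing in for MySQL and MariaDB. The `page` table, the sample rows and the helper names are hypothetical. The comparison is order-sensitive, so real query logs would need ORDER BY clauses or sorted results to avoid false positives.

```python
import sqlite3


def run(conn, sql):
    """Return ('ok', rows) or ('error', message) for one query."""
    try:
        return "ok", conn.execute(sql).fetchall()
    except sqlite3.Error as err:
        return "error", str(err)


def compare_servers(queries, conn_a, conn_b):
    """Replay each query on both servers and collect any variances."""
    variances = []
    for sql in queries:
        result_a = run(conn_a, sql)
        result_b = run(conn_b, sql)
        if result_a != result_b:  # differing rows or differing errors
            variances.append((sql, result_a, result_b))
    return variances


def make_server(rows):
    """Build a throwaway stand-in server holding a tiny hypothetical table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT)")
    conn.executemany("INSERT INTO page VALUES (?, ?)", rows)
    return conn


if __name__ == "__main__":
    old = make_server([(1, "Barcelona"), (2, "Milan")])
    new = make_server([(1, "Barcelona"), (2, "Milano")])  # one deliberately divergent row
    query_log = [
        "SELECT page_title FROM page WHERE page_id = 1",  # same answer on both
        "SELECT page_title FROM page WHERE page_id = 2",  # variance
        "SELECT missing_column FROM page",                # same error on both
    ]
    for sql, a, b in compare_servers(query_log, old, new):
        print(sql, a, b)
```

Treating identical errors on both sides as "no variance" keeps the report focused on behavior that actually changed between versions.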

Performance Testing In Production

As a read-heavy site, Wikipedia aggressively uses edge caching. Approximately 90% of pageviews are served entirely from the edge. At the application layer, we use both memcached and Redis in addition to MySQL. Despite that, the MySQL databases serving English Wikipedia alone reach a daily peak of ~50k queries/second. Most are read queries served by load-balanced slaves, depending on consistency requirements. 80% of the English Wikipedia query load (up to 40k qps) is typically handled by just two database servers at any given time. Our most common query type (40% of all queries) has a median execution time of ~0.2ms and a 95th percentile time of ~50ms. To successfully use MariaDB in production, we need it to match the performance we obtained from Facebook’s MySQL fork and to behave consistently as traffic patterns change.
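Median and 95th percentile figures summarize a latency distribution better than an average does, because a handful of slow queries can dominate the mean. The following is a minimal nearest-rank percentile sketch. The sample latencies are invented for illustration and are not Wikipedia measurements. The last lines check the load arithmetic from the paragraph above: 80% of a ~50k qps peak is ~40k qps. That comes to roughly 20k qps per server if the load is split evenly between the two servers, although the even split is an assumption.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples <= it."""
    if not samples or not 0 < p <= 100:
        raise ValueError("need samples and 0 < p <= 100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical latencies in ms: mostly fast index lookups, a few slow outliers.
latencies_ms = [0.2] * 90 + [5.0] * 5 + [50.0] * 4 + [400.0]

if __name__ == "__main__":
    print("median:", percentile(latencies_ms, 50))
    print("p95:", percentile(latencies_ms, 95))
    print("mean:", sum(latencies_ms) / len(latencies_ms))  # pulled up by the outliers

    peak_qps = 50_000
    two_server_qps = 0.8 * peak_qps  # share handled by the two busiest servers
    print("per server (even split assumed):", two_server_qps / 2)
```

With these made-up samples, the median stays at 0.2ms while the mean is over 6ms. That gap is why the post reports percentiles rather than only averages.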

Ishmael views of pt-query-digest data collected via tcpdump for the most common Wikipedia read queries (pdf). The first page of a query shows data from db1042, running mysql-facebook-r3753, the second from db1043 over the same time period, running MariaDB 5.5.30.


Once confident that application compatibility issues were solved and comfortable with performance obtained under benchmark conditions, it was time to test in production. One of the production read slaves from the English Wikipedia shard was taken out of rotation, upgraded to MariaDB 5.5.30, and then returned for warmup. The load balancer weight was then gradually increased until it and a server still running MySQL 5.1-facebook-r3753 were equally weighted and receiving most of the query load.
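Weight ramping can be sketched as follows. Under weighted selection, each server's share of reads is its weight divided by the sum of all weights. Raising a freshly upgraded server's weight in steps therefore shifts traffic to it gradually while its caches warm up. The server names are taken from the Ishmael caption above. The weights, the ramp schedule and the helper names are hypothetical and do not describe Wikimedia's actual load balancer configuration.

```python
import random


def traffic_shares(weights):
    """Expected fraction of queries each server receives under weighted selection."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def choose_server(weights, rng=random):
    """Pick one server at random, proportionally to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


if __name__ == "__main__":
    # Hypothetical ramp: the upgraded replica starts small and ends equally weighted.
    for upgraded_weight in (5, 25, 50, 100):
        weights = {"db1042 (5.1-facebook)": 100, "db1043 (MariaDB)": upgraded_weight}
        shares = traffic_shares(weights)
        print(upgraded_weight, {n: round(s, 3) for n, s in shares.items()})

    # Sanity check that random selection tracks the expected share.
    rng = random.Random(0)
    picks = [choose_server({"a": 1, "b": 3}, rng) for _ in range(10_000)]
    print("observed share for b:", picks.count("b") / len(picks))  # close to 0.75
```

Ramping weights rather than switching a server in at full weight limits the damage if the new server turns out to be slow while its buffer pool is still cold.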

We also use pt-query-digest, another Percona Toolkit tool, across all database servers to collect query performance data, which is then stored in a centralized database. Query data is collected from two sources per server and stored in separate buckets. One source is the slow query log, which only captures queries exceeding 450ms. The other is brief periodic samples of all queries, captured with tcpdump. Ishmael provides a convenient way to visualize and inspect query digest data over time. Using it, along with direct analysis of the raw data, allowed us to validate that every query continued to perform within acceptable bounds.
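pt-query-digest groups queries that differ only in their literal values under a single normalized "fingerprint" and aggregates statistics per fingerprint. The sketch below is a heavily simplified version of that idea and does not reproduce pt-query-digest's actual normalization rules. It lowercases the query, replaces quoted-string and numeric literals with `?`, and files each timing into a per-source bucket ("slowlog" or "tcpdump"), matching the two sources described above. The function names, the event format and the sample queries are hypothetical.

```python
import re
from collections import defaultdict

SLOW_LOG_THRESHOLD_MS = 450  # the slow log only records queries slower than this


def fingerprint(sql):
    """Normalize a query so that queries differing only in literals group together."""
    q = sql.strip().lower()
    q = re.sub(r"'(?:[^'\\]|\\.)*'", "?", q)   # quoted string literals
    q = re.sub(r"\b\d+(?:\.\d+)?\b", "?", q)   # standalone numeric literals
    return re.sub(r"\s+", " ", q)              # collapse whitespace


def collect(events):
    """Group (source, sql, duration_ms) events into per-source, per-fingerprint buckets."""
    buckets = defaultdict(lambda: defaultdict(list))
    for source, sql, ms in events:
        buckets[source][fingerprint(sql)].append(ms)
    return buckets


if __name__ == "__main__":
    events = [
        ("tcpdump", "SELECT page_title FROM page WHERE page_id = 1", 0.19),
        ("tcpdump", "SELECT page_title FROM page WHERE page_id = 2", 0.21),
        ("slowlog", "SELECT page_title FROM page WHERE page_id = 3", 612.0),
    ]
    for source, by_fp in collect(events).items():
        for fp, timings in by_fp.items():
            print(source, fp, len(timings), max(timings))
```

Keeping the two sources in separate buckets avoids mixing a complete record of slow queries with a sample of all traffic, whose counts mean different things.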

For our most common query type, 95th percentile times over an 8-hour period dropped from 56ms to 43ms and the average from 15.4ms to 12.7ms. 50th percentile times remained a bit better with the 5.1-facebook build over the sample period, 0.185ms vs. 0.194ms. Many query types were 4-15% faster with MariaDB 5.5.30 under production load, a few were 5% slower, and nothing appeared aberrant beyond those bounds.
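The relative changes behind these figures can be checked with simple arithmetic. In the sketch below, a positive value means MariaDB 5.5.30 was faster than the 5.1-facebook build on that metric. The millisecond values are the ones quoted in the paragraph above, and the helper name is ours. The 95th percentile improved by about 23%, the average by about 17.5%, and the median was about 4.9% slower.

```python
def improvement_pct(old_ms, new_ms):
    """Percentage reduction in latency from old_ms to new_ms (negative means slower)."""
    return (old_ms - new_ms) / old_ms * 100


# (metric, 5.1-facebook build, MariaDB 5.5.30), in milliseconds, as quoted above
measurements = [
    ("95th percentile", 56.0, 43.0),
    ("average", 15.4, 12.7),
    ("median", 0.185, 0.194),
]

if __name__ == "__main__":
    for metric, fb_ms, maria_ms in measurements:
        print(f"{metric}: {improvement_pct(fb_ms, maria_ms):+.1f}%")
```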

From there, we upgraded the remaining slaves one by one, before finally rotating in a newer upgraded class of servers to act as masters. The switch was seamless and performance continues to look good. We’ll be completing the migration of shards covering the rest of our projects over the next month. Beyond that, we’re looking forward to the future release of MariaDB 10 (global transaction IDs!), and are continually assessing ways to improve our data storage infrastructure. If you’re interested in helping, the Wikimedia Foundation is hiring!
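Upgrading replicas one at a time follows a common rolling-upgrade pattern. Each replica is depooled and upgraded, then repooled at a low weight and ramped back up, so that at most one replica's capacity is missing at any moment. The sketch below only simulates that bookkeeping. The replica names, weights, ramp percentages and the `upgrade` callback are hypothetical, and the final switch to new masters is left out.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    version: str
    weight: int  # load-balancer weight; 0 means out of rotation


def rolling_upgrade(replicas, target_version, upgrade, ramp=(10, 50, 100)):
    """Upgrade replicas one at a time, depooling only one replica at any moment."""
    log = []
    for replica in replicas:
        if replica.version == target_version:
            continue  # already upgraded, e.g. the first server tested in production
        full_weight = replica.weight
        replica.weight = 0  # depool before touching the server
        log.append(f"depool {replica.name}")
        upgrade(replica, target_version)
        for pct in ramp:  # repool gradually while caches warm up
            replica.weight = full_weight * pct // 100
            log.append(f"{replica.name} weight={replica.weight}")
    return log


def fake_upgrade(replica, version):
    """Stand-in for the real package upgrade and restart."""
    replica.version = version


if __name__ == "__main__":
    pool = [
        Replica("replica-1", "mariadb-5.5.30", 100),
        Replica("replica-2", "5.1-facebook-r3753", 100),
        Replica("replica-3", "5.1-facebook-r3753", 200),
    ]
    for line in rolling_upgrade(pool, "mariadb-5.5.30", fake_upgrade):
        print(line)
```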

Asher Feldman, Site Architect