What it takes to parse MediaWiki page titles...in Rust

04:00, Thursday, 23 2021 December UTC

In the UseModWiki days, Wikipedia page titles were "CamelCase" and automatically linked (see CamelCase and Wikipedia).

MediaWiki on the other hand uses the famous [[bracketed links]], aka "free links". For most uses, page titles are the primary identifier of a page, whether it's in URLs for external consumption or [[Page title|internal links]]. Consequently, there are quite a few different normalization and validation steps MediaWiki titles go through.

Erutuon and I have been working on a Rust library that parses, validates and normalizes MediaWiki titles: mwtitle. The first 0.1 release was published earlier this week! It aims to replicate all of the PHP logic, but in Rust. This is just a bit harder than it seems...

First, let's understand what a MediaWiki title is. A complete title looks like: interwiki:Namespace:Title#fragment (in modern MediaWiki jargon titles are called "link targets").

The optional interwiki prefix references a title on another wiki. On most wikis, looking at Special:Interwiki shows the list of possible interwiki prefixes.

Namespaces are used to distinguish types of pages, like articles, help pages, templates, categories, and so on. Each namespace has an accompanying "talk" namespace used for discussions related to those pages. Each namespace also has an internal numerical ID, a canonical English form, and if the wiki isn't in English, localized forms. Namespaces can also have aliases, for example "WP:" is an alias for the "Wikipedia:" namespace. The main article namespace (ns #0) is special, because its name is the empty string.

The actual title part goes through various normalization routines and is stored in the database with spaces replaced by underscores.

And finally the fragment is just a URL fragment that points to a section heading or some other anchor on pages.
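
To make the four parts concrete, here is a minimal sketch of splitting a raw link target. It is not mwtitle's API: `RawTitle`, `split_title` and `take_prefix` are hypothetical names, the prefix lists are passed in by the caller (a real wiki gets them from its site configuration), and prefixes are matched ASCII case-insensitively as a simplification.

```rust
/// The pieces of a link target, before any validation or normalization.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
pub struct RawTitle<'a> {
    pub interwiki: Option<&'a str>,
    pub namespace: Option<&'a str>,
    pub title: &'a str,
    pub fragment: Option<&'a str>,
}

/// Pops a leading "prefix:" if `prefix` is in `known`; otherwise returns the input untouched.
fn take_prefix<'a>(rest: &'a str, known: &[&str]) -> (Option<&'a str>, &'a str) {
    match rest.split_once(':') {
        Some((prefix, tail)) if known.iter().any(|k| k.eq_ignore_ascii_case(prefix)) => {
            (Some(prefix), tail)
        }
        _ => (None, rest),
    }
}

/// Splits `input` into interwiki, namespace, title and fragment.
pub fn split_title<'a>(
    input: &'a str,
    interwikis: &[&str],
    namespaces: &[&str],
) -> RawTitle<'a> {
    // Everything after the first '#' is the fragment.
    let (rest, fragment) = match input.split_once('#') {
        Some((before, after)) => (before, Some(after)),
        None => (input, None),
    };
    let (interwiki, rest) = take_prefix(rest, interwikis);
    let (namespace, rest) = take_prefix(rest, namespaces);
    RawTitle { interwiki, namespace, title: rest, fragment }
}
```

A title with no recognized prefix simply ends up in the main namespace, which matches the idea that the main namespace's name is the empty string.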

There are some basic validation steps that MediaWiki does. Titles can't be empty, can't have a relative path (Foo/../Bar), can't start with a colon, can't have magic tilde sequences (~~~, this syntax is used for signatures), and they can't contain illegal characters. This last one is where the fun begins, as MediaWiki actually allows users to configure what characters are allowed in titles:
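
The checks that don't depend on configuration can be sketched in a few lines. This is a simplified illustration rather than the code in mwtitle: `TitleError` and `basic_checks` are made-up names, and the relative-path rule is approximated as "any path segment equal to `.` or `..`".

```rust
/// Hypothetical error type covering the configuration-independent checks.
#[derive(Debug, PartialEq)]
pub enum TitleError {
    Empty,
    LeadingColon,
    RelativePath,
    MagicTildes,
}

/// Runs the basic checks on the title text (after any prefixes were removed).
pub fn basic_checks(title: &str) -> Result<(), TitleError> {
    if title.is_empty() {
        return Err(TitleError::Empty);
    }
    if title.starts_with(':') {
        return Err(TitleError::LeadingColon);
    }
    // "Foo/../Bar", "./Foo" and "Foo/." all contain a "." or ".." segment.
    if title.split('/').any(|segment| segment == "." || segment == "..") {
        return Err(TitleError::RelativePath);
    }
    // Three or more tildes would be expanded as a signature.
    if title.contains("~~~") {
        return Err(TitleError::MagicTildes);
    }
    Ok(())
}
```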

$wgLegalTitleChars = " %!\"$&'()*,\\-.\\/0-9:;=?@A-Z\\\\^_`a-z~\\x80-\\xFF+";

This then gets put into a regex like [^$wgLegalTitleChars], which, if it matches, is an illegal character. This works fine if you're in PHP, except we're using Rust! Looking closely, you'll see that / is escaped, because it's used as the delimiter of the PHP regex, except that's an error when using the regex crate. And the byte sequences of \x80-\xFF mean we need to operate on bytes, when we really would be fine with just matching against \u0080-\u00FF.
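
For the default value shown above, one way to sidestep the regex translation entirely is a plain character predicate. This is only a sketch for the default setting (a wiki that overrides $wgLegalTitleChars would need something else), and the function names are hypothetical. Because \x80-\xFF in byte mode covers every byte of a multi-byte UTF-8 sequence, the sketch treats every non-ASCII character as legal.

```rust
/// Returns true if `c` is allowed by the default `$wgLegalTitleChars`.
pub fn is_legal_title_char(c: char) -> bool {
    match c {
        // Any non-ASCII character is made only of bytes in \x80-\xFF.
        c if !c.is_ascii() => true,
        'A'..='Z' | 'a'..='z' | '0'..='9' => true,
        ' ' | '%' | '!' | '"' | '$' | '&' | '\'' | '(' | ')' | '*' | ',' | '-' | '.'
        | '/' | ':' | ';' | '=' | '?' | '@' | '\\' | '^' | '_' | '`' | '~' | '+' => true,
        // Everything else, e.g. # < > [ ] { } | and control characters.
        _ => false,
    }
}

/// Returns the first character that would make the title invalid, if any.
pub fn first_illegal_char(title: &str) -> Option<char> {
    title.chars().find(|&c| !is_legal_title_char(c))
}
```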

MediaWiki has some (IMO crazy) code that parses the regex to rewrite it into the unicode escape syntax so it can be used in JavaScript. T297340 tracks making this better and I have a patch outstanding to hopefully make this easier for other people in the future.

Then there's normalization. So what kind of normalization routines does MediaWiki do?

One of the most obvious ones is that the first letter of a page title is uppercase. For example, the article about iPods is actually called "IPod" in the database (it has a display title override). Except of course, for all the cases where this isn't true. Like on Wiktionaries, where the first letter is not forced to uppercase and "iPod" is actually "iPod" in the database.

Seems simple enough, right? Just take the first character, call char.to_uppercase(), and then merge it back with the rest of the characters.

Except...PHP uppercases characters differently and changes behavior based on the PHP and possibly ICU version in use. Consider the character ᾀ (U+1F80). When run through mb_strtoupper() using PHP 7.2 (3v4l), which is what Wikimedia currently uses, you get ᾈ (U+1F88). In Rust (playground) and later PHP versions, you get ἈΙ (U+1F08 and U+0399).

For now we're storing a map of these characters inside mwtitle, which is terrible, but I filed a bug for exposing this via the API: T297342.
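
Here is a rough sketch of both behaviors. `uppercase_first` is the naive standard-library version; `uppercase_first_php72` consults a hypothetical override table first. The table below holds only the single U+1F80 mapping mentioned above and is not the real map stored in mwtitle.

```rust
/// Uppercases only the first character using Rust's standard library.
pub fn uppercase_first(title: &str) -> String {
    let mut chars = title.chars();
    match chars.next() {
        // to_uppercase() may yield more than one char, so chain it with the rest.
        Some(first) => first.to_uppercase().chain(chars).collect(),
        None => String::new(),
    }
}

/// Hypothetical override table with the one PHP 7.2 mapping quoted in the text.
fn php72_override(c: char) -> Option<char> {
    match c {
        '\u{1F80}' => Some('\u{1F88}'),
        _ => None,
    }
}

/// Same as `uppercase_first`, but prefers the override table when it has an entry.
pub fn uppercase_first_php72(title: &str) -> String {
    let mut chars = title.chars();
    match chars.next() {
        Some(first) => match php72_override(first) {
            Some(upper) => std::iter::once(upper).chain(chars).collect(),
            None => first.to_uppercase().chain(chars).collect(),
        },
        None => String::new(),
    }
}
```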

There's also a whole normalization routine that sanitizes IP addresses, especially IPv6. For example, User talk:::1 normalizes to User talk:0:0:0:0:0:0:0:1.
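
The IPv6 expansion in that example can be approximated with the standard library alone. This is a sketch, not the routine mwtitle uses: `expand_ipv6` is a hypothetical name, and writing the hex digits in uppercase is my assumption about the output format.

```rust
use std::net::Ipv6Addr;

/// Expands an IPv6 address into eight groups without "::" compression
/// or leading zeros. Returns None if `text` is not an IPv6 address.
pub fn expand_ipv6(text: &str) -> Option<String> {
    let addr: Ipv6Addr = text.parse().ok()?;
    let groups: Vec<String> = addr
        .segments()
        .iter()
        // Uppercase hex is an assumption about the desired output format.
        .map(|group| format!("{:X}", group))
        .collect();
    Some(groups.join(":"))
}
```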

Finally, adjacent whitespace is normalized down into a single space. But of course, MediaWiki uses its own list of what counts as whitespace, which doesn't exactly match char.is_whitespace().
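
A sketch of what a custom whitespace list looks like in practice. The set of characters below is my recollection of the list MediaWiki uses and should be checked against its source before relying on it; note that it includes the underscore and leaves out the tab, which char.is_whitespace() treats the other way around. Trimming the ends is also an assumption of this sketch.

```rust
/// Whitespace according to an assumed MediaWiki-style list (verify before use).
fn is_title_whitespace(c: char) -> bool {
    matches!(
        c,
        ' ' | '_'
            | '\u{00A0}'
            | '\u{1680}'
            | '\u{180E}'
            | '\u{2000}'..='\u{200A}'
            | '\u{2028}'
            | '\u{2029}'
            | '\u{202F}'
            | '\u{205F}'
            | '\u{3000}'
    )
}

/// Collapses each run of title whitespace into one space and trims both ends.
pub fn collapse_whitespace(title: &str) -> String {
    let mut out = String::with_capacity(title.len());
    let mut in_run = false;
    for c in title.chars() {
        if is_title_whitespace(c) {
            // Emit a single space for the whole run.
            if !in_run {
                out.push(' ');
            }
            in_run = true;
        } else {
            out.push(c);
            in_run = false;
        }
    }
    out.trim_matches(' ').to_string()
}
```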

We developed mwtitle by initially doing a line-by-line port of MediaWikiTitleCodec::splitTitleString(), and discovering stuff we messed up or overlooked by copying test cases too. Eventually this escalated into writing a PHP extension wrapper, php-mwtitle, which could be plugged into MediaWiki for running MediaWiki's own test suite. And after a few fixes, it fully passes everything.

Since I already wrote the integration, I ran some basic benchmarks: the Rust version is about 3-4x faster than MediaWiki's current PHP implementation (see the raw perf measurements). But title parsing isn't particularly hot, so switching to the Rust version would probably result in only a ~0.5% speedup overall, based on some rough estimations looking at flamegraphs. That's not really worth it, considering the social and tooling overhead of introducing a Rust-based PHP extension as an optional MediaWiki dependency.

For now mwtitle is primarily useful for people writing bots and other MediaWiki tools in Rust. Given that a lot of people tend to use Python for these tasks, we could look into using PyO3 to write a Python wrapper.

There's also generally a lot of cool code in mwtitle, including sets and maps that can perform case-insensitive matching without requiring string allocations (nearly all Erutuon's fantastic work!).
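
To give a flavor of the idea (this is not the implementation in mwtitle, which is more involved), a case-insensitive comparison can be done lazily, character by character, so that neither string has to be lowercased into a new allocation first. `eq_ignore_case` is a hypothetical helper.

```rust
/// Compares two strings case-insensitively without allocating:
/// both sides are lowercased on the fly and compared as iterators.
pub fn eq_ignore_case(a: &str, b: &str) -> bool {
    a.chars()
        .flat_map(char::to_lowercase)
        .eq(b.chars().flat_map(char::to_lowercase))
}
```

This is the kind of comparison you want when looking up a namespace name or alias, where "wp", "WP" and "Wp" should all find the same entry.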

Throughout this process, we found a few bugs mostly by just staring at and analyzing this code over and over:

And filed some that would make parsing titles outside of PHP easier:

mwtitle is one part of the new mwbot-rs project, where we're building a framework for writing MediaWiki bots and tools in Rust the wiki way. We're always looking for more contributors, so please reach out if you're interested, either on-wiki, on GitLab, or in the #wikimedia-rust:libera.chat room (Matrix or IRC).

Matt Vetter

Matt Vetter is Associate Professor of English and affiliate faculty in the Composition and Applied Linguistics PhD Program at Indiana University of Pennsylvania. A veteran instructor with Wiki Education, Vetter has been teaching with Wikipedia since 2011 and has published extensively on Wikipedia-based education. His recent book, Wikipedia and the Representation of Reality, co-authored with Zach McDowell, is open access and available from Routledge.

Full disclosure? I’m not a GLAM or IT professional; in fact, I don’t have any formal training in structured data. And while I am well-versed in other Wikimedia projects, especially Wikipedia, my understanding of Wikidata (and structured open data in general) was fairly superficial just a few months ago. All of that changed when I enrolled in Wiki Education’s Wikidata Institute for a three-week class in October of 2021.

Despite the fact that I have roughly a decade of experience teaching and researching Wikipedia and Wikipedia-based education, almost everything I learned was new. I began the course with a good understanding of Wikipedia, its parent foundation Wikimedia, and a few of its sister projects, but little knowledge of the how’s and why’s of Wikidata. Completing the course, I gained confidence and competence in the processes and elements of Wikidata, which helped me to better understand the Wikimedia ecosystem as a whole, as well as make contributions to this open knowledge database. But perhaps more importantly, I also learned how and why Wikidata is important to my professional work as a researcher and professor of rhetoric and writing studies, a subfield of English Studies that sits at the intersection of the humanities and social sciences. 

Wiki Education’s professional development courses are valuable in that participants not only learn about and apply technical knowledge in Wikimedia projects, we’re also provided an opportunity to network with others across professional sectors. In this way, taking a course from Wiki Education is a little bit like attending a series of workshops at an interdisciplinary professional conference — a unique opportunity for sure. In fact, hearing from other participants in the class about their own experience with particular tools or uses of Wikidata became one of the most helpful aspects of the Institute — especially as a way to better understand how professionals saw value in Wikidata for different types of work. 

Bringing my own experience as an academic researcher and professor in English Studies meant that, while I didn’t have much of a background in structured data (like other participants who might be working at a library, for instance), I was able to broaden my perspective and the perspectives of others simply by sharing my experience and goals surrounding Wikidata. My initial goals were fairly simple, and align well with the major educational outcomes outlined by Wiki Education: 

1. Identify properties and add qualifiers, ranks, and citations

2. Communicate effectively with the Wikidata community

3. Communicate about issues of equity and systemic bias facing Wikidata

In short, I wanted to better understand how Wikidata works, practice editing and adding items, and learn more about the social element of this community, all while being able to translate some of my understanding of Wikipedia’s systemic biases to this sister project. 

This wasn’t the first time I had taken a professional development course with Wiki Education, however, nor was it my first experience working with them. In the spring of 2021, I completed a separate Wiki Scholars & Scientists course focused on editing Wikipedia articles related to COVID-19. As someone who has been editing and teaching others to edit for nearly a decade (with the help of Wiki Education’s Student Program), much of the material in this previous course was, for me, review. The main value added was getting the chance to see someone else teach Wikipedia editing, as well as being able to set professional time aside to do important editorial work in a professional community. 

The Wikidata Institute was very different, of course, because I had little familiarity with the processes and concepts covered. While I initially felt like a bit of an outsider, this feeling dissipated as I learned to edit and add new Wikidata items, focusing especially on adding information relevant to research and researchers in writing studies. As I became more comfortable editing and adding items, we were also introduced to Wikidata’s SPARQL based query service that lets both humans and bots ask questions about…well just about anything. Learning to author queries in SPARQL was perhaps the most challenging part of this course, but also one of the most rewarding, because it was through the query service that I was able to make a more direct connection to my academic work. 

As an educator and researcher in English interested in Wikipedia and other OERs, I’m a contributing member of the CCCC Wikipedia Initiative, an organization that has made it its goal to “expand Wikipedia’s coverage of topics related to writing research and pedagogy to be comprehensive and current with major conversations in published scholarship.” As part of this work, the Initiative has also formed a related WikiProject: WikiProject Writing.

Coming into the course, I knew that Wikidata would be important to my work with WikiProject Writing, because I had seen examples of other WikiProjects successfully integrating Wikidata for assessment and project management. However, it wasn’t until I learned how to build Wikidata queries that I realized just how useful Wikidata could be for understanding specific knowledge gaps and biases related to the representation of my field. Two of my more successful queries yielded the following lists of Wikidata items, which I link to below (while also providing the SPARQL). 

Q1.  Instance of human whose main field of study is rhetoric


SELECT DISTINCT ?item ?itemLabel WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
  {
    SELECT DISTINCT ?item WHERE {
      ?item p:P31 ?statement0.
      ?statement0 (ps:P31/(wdt:P279*)) wd:Q5.
      ?item p:P101 ?statement1.
      ?statement1 (ps:P101/(wdt:P279*)) wd:Q81009.
    }
    LIMIT 100
  }
}

Q2. Instance of human whose main field of study is composition studies


SELECT DISTINCT ?item ?itemLabel WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
  {
    SELECT DISTINCT ?item WHERE {
      ?item p:P31 ?statement0.
      ?statement0 (ps:P31/(wdt:P279*)) wd:Q5.
      ?item p:P101 ?statement1.
      ?statement1 (ps:P101/(wdt:P279*)) wd:Q2791145.
    }
    LIMIT 100
  }
}

 

While both queries could perhaps be modified to retrieve a larger or more relevant set of data, what surprised me the most about the results was their incompleteness. As someone deeply familiar with writing studies’ history and research, I could immediately see the glaring gap in the numbers of scholars represented in Wikidata. This was a turning point in my understanding because I realized just how much work there is to be done in terms of adding items and improving taxonomies in Wikidata. More importantly, I also realized how academics, scientists, and others with privileged access to research (no matter what field they’re in) need to do a better job at actually sharing that research. Wikidata, like Wikipedia and its other sister projects, is an important method for making that mission more possible.

Interested in taking the Wikidata Institute course Matt took? Visit wikiedu.org/wikidata to enroll.

Image credits: Armineaghayan, CC BY-SA 4.0, via Wikimedia Commons; Matthewvetter, CC BY-SA 4.0, via Wikimedia Commons

Friends’ Newsletter/2021/Issue 03

13:26, Tuesday, 21 2021 December UTC

We are working through bugs on the Wikimedia UK wiki site, so are temporarily posting the winter newsletter here. This issue will be published on the usual Friends’ Newsletter wiki page in the new year.

Welcome

Season’s greetings! Welcome to Wikimedia UK’s Winter Newsletter 2021. As we come to the end of another strange and rather difficult year for many, we are very grateful for the continued support of our donors, members and volunteers. You and our partners have made it possible for us to continue delivering creative and inclusive projects and programmes that open up knowledge, and enable more people to benefit from the Wikimedia projects. During a year of mostly remote working, we have welcomed three new members of staff as well as new Wikimedians in Residence at the British Library, National Institute for Health Research and the University of the Arts London. We took part in the Big Give Christmas challenge for the first time, and were delighted to achieve our fundraising target. We developed a new Equity, Diversity and Inclusion Framework and Action Plan, and have also drafted our new organisational strategy for 2022 – 2025, with input from our community. Thank you for all your support, and I hope you enjoy reading our final newsletter of 2021. The staff team and I are sending you our warmest wishes for a safe festive season. 

Lucy Crompton-Reid

Chief Executive

UK chapter focus

We’re running another Train the Trainer course, due to take place in early 2022. Volunteer trainers play a key role in the delivery of Wikimedia UK programmes, helping us to achieve our strategic objectives by delivering Wikimedia project training to new and existing editors across the country. Demand for training often outstrips staff capacity to fulfil, and we’re conscious that our existing networks do not always allow us to reach all the communities with whom we’d like to work. In the past, we’ve offered our main Train the Trainer programme as a 3-4 day in-person training course, and it has often focussed on training design and pedagogy. This time however, we’re taking a slightly different approach, which we hope will offer more flexibility to our volunteer trainers, and which we have developed in response to feedback from the community, and from partner organisations. The aim of this round of training will be to equip Volunteer Trainers with the skills, experience and resources to deliver a standard ‘Introduction to Wikipedia’ session, such as would take place at an online editathon or wiki workshop.

Dr Sara Thomas and Bhav Patel outline the content of the Train the Trainer course in the call for participants.

Wikimedia UK works on a three year strategic planning cycle, and we are now developing our new strategy for 2022 to 2025. We’ve held meetings with the staff, board, and wider community to develop the new strategy. Our schedule is aligned with our application deadline for funding from the Wikimedia Foundation, for which we’ll be applying for multi-year funding for the first time. While our strategy is still being finalised, we’re expecting to continue our work in themes such as knowledge equity and information literacy, while also increasing our focus on data and information on both the climate crisis and environmental conservation.

Events and projects you can join

The Connected Heritage project had an excellent turnout in 2021 and will be continuing into 2022. Over 100 participants took part in the free hour-long webinars, tailored specifically to those working in the heritage sector. There was also a follow-up session for 12 participants who wanted further training. The webinars covered open knowledge, the digital skills gap, digital preservation and how Wikimedia UK is addressing those issues. Participants were provided with access to resources and materials to take back to their organisations, and the opportunity to follow up with the project and engage in partnership. In 2022 there will be more webinars on the 18th January, 2nd February, 17th February and 4th March, and an International Women’s Day Potluck Wikithon on Friday 11th March.

Connected Heritage flyer – 2022 dates, with Commons images: 1, 2, 3 and 4.

#WikiForHumanRights will return from 15th April through to 14th June 2022. This year’s focus is on the Right to a Healthy Environment. The environmental crisis is getting more complex and humanity needs to make thousands of big and small decisions to address it. As the UN Environmental Program described it, we need to make “Peace with Nature” and protect the human rights of the most vulnerable. Wikipedia and other platforms need to fill the knowledge gaps at the intersection of sustainability and human rights in every context and language. The world needs access to reliable information about the link between environmental sustainability and human rights. We encourage individuals or organisations interested in the campaign to organise activities around the intersecting themes of human rights and the environment.

#1Lib1Ref is back for 2022! Abbreviated from 1 Librarian for 1 Reference, 1Lib1Ref calls on librarians around the world, and anyone who has a passion for free knowledge, to add missing references to articles on Wikipedia. #1Lib1Ref runs every January 15th to February 5th and every May 15th to June 5th.

The #1Lib1Ref campaign video.

Our work in partnership

We’ve helped the National Institute for Health Research recruit a new Wikimedian in Residence. The initial proposal was for a six-month placement with the potential to extend. The resident will help the Institute share health information with a larger audience, train staff in editing the Wikimedia projects, and scope out what information the Institute has.

National Institute of Health Research logo, with Commons images: 1, 2 and 3.

We ran an editathon in partnership with the University of Leeds for Black History Month, citing African scholars on Wikipedia. The aim was to increase the representation of African scholars and sources across the Wikimedia projects, with a suite of resources available to the students and staff who attended. 269 citations from African scholars were added to Wikipedia, with 26 articles created, and 116 articles edited.

The Wikimedian in Residence at the British Library, Dr Lucy Hinnie, who is also a project lead for Connected Heritage, has been running Wikimedia workshops for library volunteers. Lucy’s been working on a number of fascinating projects such as the Agents of Enslavement data project which was covered by The Guardian; the Canadian Copyright collection – a significant collection of Canadian photographs that were received between the years 1895 and 1923; the India Office Records project; a Bengali Wikisource collaboration, and the relabelling of historic materials in line with modern thought and vocabulary.

A 1780 engraving of an Englishman selling his mistress into slavery in Barbados. Photograph: The Granger Collection/Alamy in the British Library.

Dr Martin Poulter, Wikimedian in Residence at the Khalili Collections, and Dr Sara Thomas, Scotland Programme Coordinator here at Wikimedia UK, hosted a series of online Wikidata workshops. There was a workshop for beginners, a more advanced workshop, a workshop for the Education sector, and one for the GLAM (galleries, libraries, archives, and museums) sector. Wikidata is much easier to understand and has more interesting and varied content than most databases, so it is a good place to start when considering how knowledge can be represented by computers. It can create interactive educational visualisations on all sorts of topics and adding to Wikidata is already used as a platform for educational assignments. It can give a new lease of life to research outputs by joining them up with other information sources in a connected web.

We led a session for the School Library Association (SLA) to give school librarians a deeper understanding of Wikipedia’s mechanisms and how it strives to improve, so that they can give informed advice to students on how to approach Wikipedia.

To quote Wikipedia on the subject, The Red Book of Hergest is a large vellum manuscript written shortly after 1382, which ranks as one of the most important mediaeval manuscripts written in the Welsh language. It preserves a collection of Welsh prose and poetry, notably the tales of the Mabinogion and Gogynfeirdd poetry. We’ve been working with Jesus College Oxford to upload a substantial number of photographs of the manuscript to Wikimedia Commons. You can see the Commons category here.

Images of the 14-15th c. Welsh language manuscript photographed in 1997 for the project ‘Early Manuscripts at Oxford University’. 1.

Our Wikimedian in Residence at the Science Museum Group has also had their residency extended. They continue to collaborate with the Wellcome Collection to get the Science Museum’s vast collection of high quality images on Wikimedia Commons. The resident is utilising and training the museum’s volunteer network.

The Devil’s Porridge Museum ran the Miracle Worker Research Project, working with remote volunteers to uncover the untold histories of munitions workers at HM Factory Gretna during WW1. They’ve since been making efforts to get the research on Wikipedia, with editathons in September, October, and November, and the project is set to develop in the New Year.

Further and Higher Education

Our Scotland Programme Coordinator, Dr Sara Thomas, has been working with Edinburgh College on their first ever Wikipedia in the Classroom project. A cohort of around 40 students from the Art & Ethics course worked to create Wikipedia articles on underrepresented artists, learning about open licensing, underrepresentation in the canon, and how this applies to their own practice and reflection.

The Decolonising Wikipedia Network is relaunching, supporting students and staff at University of the Arts London (UAL) to edit Wikipedia through the lenses of anti-racism and decolonisation. This includes (but is not limited to) increasing the visibility and credibility of under-represented and marginalised figures and topics connected to our subject disciplines on Wikipedia. We’ve helped expand the network from the comms department to the arts department. We hope to see the network continue to grow, and add to their incredible successes such as the over 7000 words they have added to Wikipedia articles.

The University of Sussex has a module for students to take part in, called the ‘Education for Development: Aid, Policy and the Global Agenda’ module. The students have been working with articles such as education in Indonesia, and will hopefully be taking their text live in the new year.

Occasionally our staff give guest lectures at universities. Dr Sara Thomas returned to The University of Glasgow to lecture on Wikipedia and Information Management, to undergrad students in Digital media and information studies, and another lecture for the postgrad students in Information management and preservation.

The University of Edinburgh’s Wikimedian in Residence, Ewan McAndrew, continues to do fantastic work. Over the residency Ewan has recruited a number of interns from the student body, most recently the intern at the Library and University Collections, Joshua Jackson. Together they wrote a report on the library and university’s engagement with Wikimedia, which can be found on Commons and offers excellent insight into the value of Wikimedia in higher education. Ewan also liaises with the professors at the university to introduce Wikimedia components into their courses, such as the Reproductive Biomedicine BSc sixth year students workshop, and on the Digital Education MSc course.

UNIVERSITY OF EDINBURGH – Wikimedia and the Library & University Collections Report (2021) 1.

Blog highlights

Our projects and collaborations are many, so while there’s not a post for every activity, the ‘Our news’ page is the perfect home for a more in-depth look at the great Wiki initiatives happening in the UK.

Connected Heritage

The launch of our Connected Heritage project and the three month achievements of the project are live on our blog, with information on how heritage professionals can take part in the 2022 webinars.

Wikimedia UK returns to the office, but trials a new way of working

Our Chief Executive, Lucy Crompton-Reid, lays out our new way of hybrid working. For the time being we are restricting the number of people in the office at any one time to a maximum of six. Alongside this, we are consulting with all staff individually to determine what their working pattern might look like within this hybrid model. For anyone trying to reach us, email is probably still the best route. Since writing this blog post, the risk of the Omicron variant has further restricted our staff from travelling into the office.

Ada Lovelace day 24 hour global editathon

On the 12th October, an international 24 hour editing marathon started in New Zealand to improve the coverage of women in Wikipedia. The relay of volunteering editing reached the UK at 2pm, with an event hosted at the Pankhurst Centre in Manchester as both an in-person and an online event. This blog detailed the editathon for anyone interested in getting involved.

2021 Palestine-Wales editathon

Wikiproject Palestine-Wales was a month-long editathon, which took place in August 2021, between Wikimedia UK and Wikimedia Levant. The event generated a total of 242 new articles. Robin Owain, Wales Programme Manager, details the event.

UK based Punjabi artist opens up his archive

UK-based Punjabi writer and photographer Amarjit Chandan opened up images from his archive. As of 19th June 2021, a total of 471 images have been uploaded to Wikimedia Commons and at least 54 distinct images (11% of the total) are being used across languages and projects, with the most images being used on Punjabi Wikipedia, followed by English Wikipedia and Wikidata. More photos followed.

Talking strategy with Wikimedia UK’s community

We had the pleasure of facilitating a meeting for our community to help shape the future direction of Wikimedia UK. We work on a 3 year strategic planning cycle, and we’re now developing our new strategy for 2022-25. Our Chief Executive gives an overview.

Train the Trainer

As referenced above, we invited expressions of interest in our next round of Train the Trainer, due to take place in early 2022. We are delighted to say that we’ll once again be partnering with Trainer Bhav Patel.

National Institute for Health Research launches Wikimedian in Residence in collaboration with Wikimedia UK

As referenced above, the National Institute for Health Research (NIHR) has recruited a new Wikimedian in Residence. The six-month post is part of a pilot to help evaluate the opportunities for using Wikimedia to support dissemination of NIHR-funded research. We spoke to Adam Harangozo about his role.

Join us

We’re very grateful to and proud of the network we’ve built around our chapter. You can support the governance of the charity by becoming a member, donate to us online, or volunteer on some of the projects above.

We’re also on social media if you prefer to chat there. We always appreciate new followers and sharers of our news: Twitter, Facebook, Instagram and LinkedIn.

The post Friends’ Newsletter/2021/Issue 03 appeared first on WMUK.

The final stage of WLM 2021 has begun

15:56, Monday, 20 2021 December UTC

The Wiki Loves Monuments team is very excited to be able to share with you the first results of the national juries of Wiki Loves Monuments 2021.

This year was the tenth year for Wiki Loves Monuments as an international competition, and we had people from 37 countries participating in our photo competition for built heritage.

Of the 4926 uploaders this year, 3204 registered after the start of WLM 2021 and are therefore considered new users. Together they uploaded over 172,974 new images of heritage for use on the Wikimedia projects – and beyond!

The national competitions have now ended their jury process and have submitted their winners to the international jury. The international jury is preparing itself for the final stage of WLM 2021: we hope to be announcing the winners in February 2022, but in the meantime we invite you to look through the national winners that have been publicly announced, and pick your personal favorites.

Tech News issue #51, 2021 (December 20, 2021)

00:00, Monday, 20 2021 December UTC

weeklyOSM 595

11:01, Sunday, 19 2021 December UTC

07/12/2021-13/12/2021


Multimap(a)s [1] | © Javier Jiménez Shaw | map data © OpenStreetMap contributors

Mapping

  • TomTom has analysed OSM data in Sri Lanka, Nepal, Philippines, Taiwan and Vietnam. They have posted the areas identified for improvement on MapRoulette.
  • indoor= updates their map data every hour. François2 described in his diary entry how map changes with respect to indoor tagging can be previewed without waiting for an hour.
  • User TreeTracks (writing as LittleMaps) reviewed and analysed the high accuracy of OpenStreetMap’s road surface tags for sealed vs unsealed roads in Victoria, Australia.
  • A tagging scheme has been proposed (de) > en to allow the passenger accessibility attributes from DELFI (de) > en to be translated into OSM mapping. The proposal has been developed by Nahverkehrsgesellschaft Baden-Württemberg (NVBW (de) > en ), OPENER next (de) > en — an mFund project to promote public transport accessibility for people with reduced mobility, and Mentz — a public transport software and service provider.
  • Mapper cytryn proposed (pl) > en a unified tagging scheme for memorial benches in Poland.
  • Florian Lainez has proposed (fr) > en , on Talk-fr, creating a new ‘deprecated and unsupported’ status for tags and establishing a calendar to implement this deprecation. He also started to draft a proposal on the wiki.
  • Voting is underway for the following proposals:
    • inlet=* for tagging the details of where the flow enters a culvert or pipeline, until Tuesday 28 December.
    • amenity=parcel_machine for mapping lockers that are used to store parcels awaiting self-service pick-up, until Wednesday 29 December.

Community

Imports

  • Jacob (user safetygoggles) wants to import the buildings of Morris County, New Jersey, USA, from data provided by Morris County GIS, from whom he claims to have obtained permission for use in OpenStreetMap.

OpenStreetMap Foundation

  • Heikki Vesanto has posted, on the OSM Foundation mailing list for discussion, suggestions for logos to use on the web map. This follows up an earlier thread, as we reported earlier.
  • The OSMF Board met Wednesday 15 December, by video conference, for the first time after the election. The topics discussed were:
    • Election of officers
    • Etiquette Guidelines
    • Screen-to-screen/face-to-face board meetings in 2022
    • Holiday circular blackout.

    Minutes of the Board meeting have been published on the wiki.

  • At last month’s advisory board meeting, Microsoft gave a presentation about allowing people who are already logged into a Microsoft account to improve OSM without logging in again to OpenStreetMap.
  • Guillaume Rischard published the 2021 OSMF Treasurer’s report on the wiki.
  • At last week’s OSMF Annual General Meeting a special resolution was passed by 357 votes to 28. This resolution allows the time spent as an associate member to count towards the qualification period required to be a board member. The 2021 Annual General Meeting wiki page reports in detail about the election and the AGM.

Education

  • Dominik Weckmüller wrote a tutorial on how to create 3D OSM city models with QGIS and Aerialod. As an example, he published the result of Trier, the oldest town in Germany.
  • The newest video in HOT’s YouTube playlist ‘HOT Training Webinar Series’ is for mappers who want to begin editing with JOSM.
  • Maya Lovo wrote in her article, ‘Instead of consuming maps, let’s produce maps!’, on TeachOSM, about how Celeste Reynolds uses OpenStreetMap in the classroom. If teachers are doing similar projects with their students, they should contact Maya Lovo.

Maps

  • Harry Wood has blogged a collection of interesting OSM-related maps from the #30DayMapChallenge.
  • Hauke Stieler has made a QGIS project for an outdoor map based on OpenStreetMap data.

Open Data

  • Gonéo has published (fr) > en a blog post explaining how to integrate an open data dataset with OpenStreetMap.

Software

  • The OSM BTC system, initiated by OsmAnd, has been adapted in order to increase the distribution among more active participants. Instead of calculating payments by changes (changesets), now changed objects in changesets are counted. The minimum number will be 300 per month.
  • OsmAnd version 4.1 has been released for Android and iOS. The Android update includes support for Android Auto, disables the Mapillary layer plugin by default, and adds app shortcuts – for example, to quickly start a recording to upload tracks to OSM. The iOS update includes improved loading of maps.
  • Vladimir Agafonkin has published an article entitled ‘Reimagining projections for the interactive maps era’.

Did you know …

  • [1] … multimap(a)s Compare Map by Javier Jiménez Shaw? The layers panel offers a large inventory of cartographic layers from around the world assembled by country and allows you to upload your own layers with various formats. A ‘Story Style Tutorial’ describes the functionalities available through the various buttons.
  • … the Map Design Guide, from the State of Minnesota Interagency Map Accessibility Workgroup, which presents a number of best practices to ensure accessibility and usability, including styles and colours?
  • … Marble, the online globe for Linux, Mac, Windows and Android with OSM maps?

Other “geo” things

  • With the completion of the Kunming–Singapore railway central route on 3 December, Epic Maps tweeted a map showing the train trip you can now take from Singapore to Lagos, Portugal. Mashable has an article with more details about the journey.
  • Anders Sundell, a political scientist at the University of Gothenburg, tweeted a map with all the soccer (football) fields in Europe, according to OpenStreetMap. In his analysis he stated: ‘There are more fields in Germany than anywhere else, but per capita, Liechtenstein wins’.

Upcoming Events

Where | What | When
京都市 | 幕末京都オープンデータソン#15:岩倉具視と岩倉村 | 2021-12-18
Lyon | Rencontre mensuelle Lyon | 2021-12-21
Bonn | 146. Treffen des OSM-Stammtisches Bonn | 2021-12-21
Lüneburg | Lüneburger Mappertreffen (online) | 2021-12-21
京都市 | 京都!街歩き!マッピングパーティ:第28回 智積院 | 2021-12-25
Bremen | Bremer Mappertreffen (Online) | 2021-12-27
Düsseldorf | Düsseldorfer OSM-Treffen (online) | 2021-12-29
London | Missing Maps London Mapathon | 2022-01-04
Landau an der Isar | Virtuelles Niederbayern-Treffen | 2022-01-04
Berlin | OSM-Verkehrswende #31 (Online) | 2022-01-04
Stuttgart | Stuttgarter Stammtisch (Online) | 2022-01-04

Note:
If you would like to see your event here, please put it into the OSM calendar. Only data which is there will appear in weeklyOSM.

This weeklyOSM was produced by MatthiasMatthias, PierZen, RicoElectrico, SK53, Strubbl, TheSwavu, cafeconleche, derFred.

SMWCon 2021

00:00, Saturday, 18 2021 December UTC

What happened during the 21st installment of the Semantic MediaWiki conference?

This year the conference spanned three days, each focusing on a specific target audience. The conference kicked off with the Enterprise & Government day, followed by the Developer and Community day, and concluded by the Research and Education day. View the full schedule.

Due to the pandemic, the conference was once again held entirely online, on hopin.com. We hope that next year it will be possible to meet all of you in person.

The Yearly SMW Overview

You can find the slides on Google Slides.

Recordings on YouTube

You can find the recordings of the SMWCon Fall 2021 talks on the SMW YouTube channel.

A good place to start is the excellent keynote by Rich Evans:

Deborah Krieger
Image courtesy Deborah Krieger, all rights reserved.

As the Exhibit & Program Coordinator for the Museum of Work & Culture in Woonsocket, Rhode Island, Deborah Krieger organizes the museum’s changing exhibitions, develops programming to accompany those exhibitions, and works on the museum’s permanent exhibits. She also has a master’s degree in Public Humanities from Brown University. In both her academic and professional careers, Deborah had used Wikipedia — a lot.

The Museum of Work & Culture is a division of the Rhode Island Historical Society that tells the story of French-Canadian immigration to the Blackstone Valley and their lives as workers in the area’s textile mills — and it’s also a member of the Smithsonian Affiliates. As part of the Smithsonian’s American Women’s History Initiative, Wiki Education ran a series of Wiki Scholars courses, teaching Affiliates staff how to improve Wikipedia biographies of American women related to their collection. Deborah, as an avid Wikipedia reader, signed up to also become an editor.

“I have used Wikipedia many, many, many times over the years, and have observed as it became a more and more useful and reliable resource and jumping-off point for research and information, so I like to think I brought a Wikipedia user’s enthusiasm as well as professional and academic expertise to the course,” she says.

Perhaps that deep reader experience gave Deborah a better grasp of how to create a well-developed biography article than many new editors come to Wikipedia with — because she brought her first article, on Anne Burlak, from a short article of three or four total paragraphs all the way up to Good Article status on Wikipedia. A Good Article designation is given after an extensive peer review process; fewer than 1% of all articles on Wikipedia reach this status, and it’s extremely rare for a newcomer to achieve it with their first article, as Deborah did.

“The Wiki Education class really helped me learn about the process of editing Wikipedia — most important, how to take an underpopulated article to a Good Article, as my article on Anne Burlak was recently deemed!” she says. “Using the sandbox, responding to editorial comments and suggestions from other Wikipedians… all very helpful as I decide which article to work on next.”

Deborah chose Anne Burlak as the focus of her work as she is featured in the Museum of Work & Culture’s history of local unions. A union organizer and activist, Burlak even inspired a poem by Muriel Rukeyser.

“I thought it would be a great opportunity to study Burlak and collect the disparate sources of information about her on the internet into a place where people interested in labor history could learn about the ‘Red Flame,’ as she was known,” Deborah says.

Since taking the course, Deborah was inspired to host an edit-a-thon, where she and fellow participants helped improve articles related to five women featured in the museum’s recent exhibit, Rhode Island Women Create. Next up for Deborah is to get more involved in Wikipedia, potentially as an article reviewer for other Good Article nominations, since she had such a great experience working with the editor who reviewed her work.

“Since Wikipedia is based on collaboration and reciprocity, I think I could help pay it forward by helping another Wikipedian take their work to the next level,” she says.

Image credit: Swampyank at English Wikipedia, CC BY-SA 3.0, via Wikimedia Commons

A Tale Of Code Review Review

Production Excellence #38: November 2021

21:48, Monday, 13 2021 December UTC

How’d we do in our striving for operational excellence last month? Read on to find out!

Incidents

6 documented incidents last month. That's above the two-year and five-year median of 4 per month (per Incident graphs).

2021-11-04 large file upload timeouts
Impact: For 9 months, editors were unable to upload large files (e.g. to Commons). Editors would receive generic error messages, typically after a timeout. In retrospect, a dozen distinct production errors had been reported and regularly observed that were related and provided different clues; however, most of these remained untriaged and uninvestigated for months. This may be related to the affected components having no active code steward.

2021-11-05 TOC language converter
Impact: For 6 hours, wikis experienced a blank or missing table of contents on many pages. For up to 3 days prior, wikis that have multiple language variants (such as Chinese Wikipedia) displayed the table of contents in an incorrect or inconsistent language variant (which is not understandable to some readers).

2021-11-10 cirrussearch commonsfile outage
Impact: For ~2.5 hours, the Search results page was unavailable on many wikis (except English Wikipedia). On Wikimedia Commons the search suggestions feature was unresponsive as well.

2021-11-18 codfw ipv6 network
Impact: For 8 minutes, the Codfw cluster experienced partial loss of IPv6 connectivity for upload.wikimedia.org. This did not affect availability of the service because the "Happy Eyeballs" algorithm ensures browsers (and other clients) automatically fall back to IPv4. The Codfw cluster generally serves Mexico and parts of the US and Canada. The upload.wikimedia.org service serves photos and other media/document files, such as those displayed in Wikipedia articles.

2021-11-23 core network routing
Impact: For about 12 minutes, Eqiad was unable to reach hosts in other data centers via public IP addresses. This was due to a BGP routing error. There was no impact on end-user traffic, and impact on internal traffic was limited (only Icinga alerts themselves) because internal traffic generally uses local IP subnets which we currently route with OSPF instead of BGP.

2021-11-25 eventgate-main outage
Impact: For about 3 minutes, eventgate-main was down. This resulted in 25,000 MediaWiki backend errors due to inability to queue new jobs. About 1000 user-facing web requests failed (HTTP 500 Error). Event production briefly dropped from ~3000 per second to 0 per second.


Incident follow-up

Remember to review and schedule Incident Follow-up work in Phabricator: these are the preventive measures and tech debt mitigations written down after an incident is concluded. Read more about past incidents at Incident status on Wikitech.

Recently resolved incident follow-up:

Disable DPL on wikis that aren't using it.
Filed after a July 2021 incident, done by Amir (Ladsgroup) and Kunal (Legoktm).

Create easy access to MySQL ports for faster incident response and maintenance.
Filed in Sep 2021, and carried out by Stevie (Kormat).

Create paging alert for primary DB hosts.
Filed after a Sept 2019 incident, done by Stevie (Kormat).


Trends

November saw 27 new production error reports of which 14 were resolved, and 13 remain open and carry over to the next month.

Of the 301 errors still open from previous months, 16 were resolved. Together with the 13 carried over from November that brings the workboard to 298 unresolved tasks.
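The tally above can be double-checked with a few lines of Python. This is only a sketch of the arithmetic; the variable names are illustrative and not taken from the workboard or the spreadsheet.

```python
# Figures quoted in the Trends paragraphs above.
open_from_previous_months = 301
resolved_from_previous_months = 16
new_in_november = 27
resolved_in_november = 14

# New November reports that remain open and carry over.
carried_over = new_in_november - resolved_in_november

# Older open errors, minus those resolved, plus the November carry-over.
unresolved = open_from_previous_months - resolved_from_previous_months + carried_over

print(carried_over, unresolved)  # prints: 13 298
```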

For the month-over-month numbers, refer to the spreadsheet data.


Outstanding errors
💡 Did you know:

To find your team's error reports, use the appropriate "Filter" link in the sidebar of the workboard.

View Workboard

Issues carried over from recent months:

Apr 2021 9 of 42 issues left.
May 2021 16 of 54 issues left.
Jun 2021 9 of 26 issues left.
Jul 2021 11 of 31 issues left.
Aug 2021 10 of 46 issues left.
Sep 2021 10 of 24 issues left.
Oct 2021 20 of 49 issues left.
Nov 2021 13 of 27 new issues are carried forward.

Thanks!

Thank you to everyone who helped by reporting, investigating, or resolving problems in Wikimedia production.

Until next time,

– Timo Tijhof

Mike Benjamin
Image courtesy Mike Benjamin, all rights reserved.

In spring 2018, Mike Benjamin was in his third year of graduate school for Occupational Hygiene at the University of Cincinnati. As an assignment for one of his classes, Mike was asked to edit Wikipedia through Wiki Education’s Wikipedia Student Program. Mike enjoyed the experience so much that he knew he wanted to stay engaged with Wikipedia.

Today, Mike teaches his own classes at the University of North Carolina at Charlotte — and for the last three terms, he’s been assigning his own students to edit Wikipedia through our Wikipedia Student Program.

“I think the main difference as an instructor is seeing some of the same reservations and mistakes I made as a student, but now I am able to see that article edits won’t stop after publishing each contribution in order to continually improve each article,” he says. “I was a little afraid to make my first edit as a student, in case I messed up the article. Now I hear that same hesitancy from a good portion of my students (and they sometimes make the first sentence edit with me watching via Zoom). After they submit the edit, they realize that it didn’t crash Wikipedia, so they feel more confident with the next one.”

The course he took as a student and the courses he now teaches are supported by both Wiki Education and the National Institute for Occupational Safety and Health (NIOSH), which has a long-standing effort to support content development on occupational safety and health-related topics on Wikipedia. As a student, Mike edited two articles: National Occupational Research Agenda (NORA) and Workplace Health Surveillance.

“I chose NORA as a topic because in my graduate program, we took a lot of Industrial Hygiene coursework, and we were focused on so many topics that we sometimes lost sight of the bigger picture — namely, the focus areas that NIOSH had identified as priority areas for occupational health research (industries and their related health outcomes). It’s important for people to see that such a strategy is needed when limited resources are available, and NIOSH tries to direct those resources responsibly,” he says. “The other topic, Workplace Health Surveillance, seemed to combine multiple topics together, and editing was needed. I’m not sure how much improvement I provided on that one, but it was still a good learning experience.”

Mike kept editing after the end of his coursework, adding information about safety to several articles throughout 2019. When he started teaching at UNC Charlotte in 2020, his courses were cross-listed as both undergraduate and graduate courses, so Mike immediately thought that adding the Wikipedia assignment as the additional work needed for the graduate level was a perfect fit. The NIOSH community, including longtime NIOSH-Wikipedia advocate Thais Morata and NIOSH Wikimedian in Residence John Sadowski, offered support. And Wiki Education’s framework and training have also been helpful, as well as the course Dashboard, which shows page views.

“It’s easy to get a sense of satisfaction when you know that people are reading the information you provided in your article,” he says.

Mike’s program attracts many international students. To make the assignment more meaningful to them, he’s opened the possibility of translating course-related Good Articles from Wikipedia into their native languages, including Arabic and Spanish.

“I believe this has increased the motivation of many of our international students, since the students demonstrate the value of their language in a translation, learn how to write formally in their language, and provide service to their home communities, where occupational health and safety information may be lacking,” he says. “After their page is published, they can look at their article ‘statistics’, like page views, which provides positive feedback for them. In short, I think that this effort is a respectful effort for inclusion and sharing of ideas in our program, which sometimes gets overlooked in the engineering disciplines.”

And his students and Wikipedia’s readers aren’t the only ones getting something out of writing for or translating Wikipedia articles.

“I learn new things as part of the assignment, too,” he says. “The translations are pretty cool actually. For example, what do you do when there’s not a direct translation for an English word? I had a few students argue in class about whether a translated word was correct or not, but nobody knew the ‘correct’ answer. I usually have to ask for assistance — a contact at NIOSH for Arabic translations and a colleague in my department who is fluent in Spanish. English speakers can learn more than they expect about translations (and how dialects create a challenge) and appreciate some of the language differences.”

This engagement with students is part of what Mike values most about running a Wikipedia assignment, and why he keeps doing it every term. He’s already signed up to teach with Wikipedia again in spring 2022.

To teach with Wikipedia, visit teach.wikiedu.org.
Image credit: Bz3rk, CC BY-SA 3.0, via Wikimedia Commons

Tech News issue #50, 2021 (December 13, 2021)

00:00, Monday, 13 2021 December UTC

weeklyOSM 594

10:56, Sunday, 12 2021 December UTC

30/11/2021-06/12/2021

Impact of UN Mappers – IIEP UNESCO & Madagascar Ministry of Education’s mapping campaign [1] | © IIEP UNESCO & Madagascar Ministry of Education | map data © OpenStreetMap contributors

Breaking news

  • At the OSMF general meeting on 12 December, the chair, Allan Mustard, announced the results of the election to the board. Guillaume Rischard, Amanda McCann, and Mikel Maron were re-elected. Roland Olbricht joins them as a new member. 742 ballots were cast. The detailed results are available.

Mapping campaigns

  • [1] The UN Mappers – IIEP UNESCO mapping campaign, in collaboration with the Ministry of Education in Madagascar, brought together almost 900 volunteers to advance education planning: 9845 km of roads were mapped around the Vakinankaratra region in Madagascar. 97% of schools are now connected to road network data, compared to just over half at the beginning of the project. The road network produced is processed using the IIEP-UNESCO school catchment plugin to assess the travel times children face every day from home to school, and to design better interventions.

Mapping

  • darkonus gave instructions explaining how to install JOSM on a Mac with Apple Silicon.
  • Forteller shared the adventure of surveying by bike and then mapping more than 100 parks in Oslo, which led to an opportunity to discover new places.
  • Requests have been made for comments on the following proposals:
    • amenity=library_dropoff for mapping a place where library patrons can return or drop-off books, other than the library itself.
    • ele=* to allow the use of any of the documented height units for tagging elevations.

Community

  • OpenMapChile reported on their progress on the mapping of urban trees, specifically in the city of Valdivia, where they have achieved a total of 15,062 trees added.
  • OpenStreetMap Belgium’s Mapper of the Month for December is d1sr4n from Russia.
  • TechnicallyNotDeaf, a beginner contributor from Victoria, Australia, outlined what they have learnt from their first week of mapping.
  • User pedr0faria was the winner of the second edition of the UN Map Marathon, the event organised by UN Mappers to contribute to UN projects through mapping in OpenStreetMap.

OpenStreetMap Foundation

  • Michael Spreng, on behalf of the Membership Working Group, indicated that during the election season, which often leads to huge volumes of messages, contributors are limited to one message a day on talk-osmf.

OSM research

  • A scientific article discussed (fr) > en the contribution of cartography to the real and the virtual, including the case of mapathons that make it possible to make real the presence of people missing from maps other than OSM.

Maps

  • Yves noted that, having added the remaining missing functionality of the OpenSnowMap website to the mobile-friendly version, it’s time to decommission the older version. In case you feel lost, you can still find the old version (at least for a while) from the menu option ‘LEGACY OPENSNOWMAP.ORG’.

Open Data

  • Michael Cieslik demonstrated how open elevation profile data can be used to compare building heights with LOD2 style 3D renderings of OSM data. The discussion in Michael’s tweet (de) focuses on the differences in the datasets and potential means of synchronisation.

Software

  • While discussing the topic of software governance, Roland Olbricht pointed out that the Overpass API has a low bus factor. mmd added (de) > en that the existence of their fork doesn’t necessarily increase the bus factor and that things other than the number of developers can help a project survive in the long term.
  • The development team behind the OSM Welcome Tool recently wrote instructions on how to use the tool (available on the tool’s wiki page) and are looking for feedback from new users.
  • Robhubi explained (de) > en the workflow they use to import data from GIP to compare with OSM. The Graph Integration Platform (GIP) is Austria’s online system for collecting and sharing transport route information.
  • The Russian internet company VK (the Russian Facebook, formerly Mail.Ru Group) has deployed an overpass turbo website. They have also published (ru) > en a tutorial on how to use it.

Other “geo” things

  • Anonymaps informed us of an exciting opportunity to work for Google in return for no pay.
  • The Bucharest court confirmed (ro) > en the decision of a chief prosecutor of DIICOT (Direcţia de Investigare a Infracţiunilor de Criminalitate Organizată şi Terorism) to reopen the criminal investigation in a case in which the Romanian Patriarchate filed a criminal complaint after the location and name of the ‘Cathedral of the Salvation of the Nation’ on Google Maps was changed to the ‘Cathedral of the Stupidity of the Nation’.
  • The Department of Geoinformatics at Heidelberg University is seeking (de) a part-time research assistant (m, f, d) as soon as possible to develop methods for generating spatially high-resolution CO2 emission inventories using methods from Spatial Data Science and Machine Learning (esp. Deep Learning) in the context of the GeCO project.
  • Google Mexico has proposed a crowdsourced project for mapping all sorts of street stalls, which account for 50% of the businesses in the country. These businesses are not visible on the map, and Google Mexico took notice of the situation from custom maps prepared by a freelance data analyst.

Upcoming Events

Where | What | When
建设街道 | 长株潭区域作业绘图后续修正 | 2021-12-10 – 2021-12-15
Grenoble | OSM Grenoble Atelier OpenStreetMap | 2021-12-13
臺北市 | OSM x Wikidata Taipei #35 | 2021-12-13
- | OSMF Engineering Working Group meeting | 2021-12-13
- | Toronto OpenStreetMap Enthusiasts Meeting | 2021-12-14
Washington | MappingDC Mappy Hour | 2021-12-15
20095 | Hamburger Mappertreffen | 2021-12-14
Derby | East Midlands OSM Pub Meet-up : Derby | 2021-12-14
- | Reunión mensual de la comunidad española | 2021-12-14
Decatur County | OSM US Mappy Hour | 2021-12-16
京都市 | 幕末京都オープンデータソン#15:岩倉具視と岩倉村 | 2021-12-18
Lyon | Rencontre mensuelle Lyon | 2021-12-21
Bonn | 146. Treffen des OSM-Stammtisches Bonn | 2021-12-21
Lüneburg | Lüneburger Mappertreffen (online) | 2021-12-21
京都市 | 京都!街歩き!マッピングパーティ:第28回 智積院 | 2021-12-25
Bremen | Bremer Mappertreffen (Online) | 2021-12-27
Düsseldorf | Düsseldorfer OSM-Treffen (online) | 2021-12-29

Note:
If you would like to see your event here, please put it into the OSM calendar. Only data which is there will appear in weeklyOSM.

This weeklyOSM was produced by PierZen, SK53, TheSwavu, cafeconleche, derFred, Can.

Wikidata user and project talk page connection graph

09:23, Sunday, 12 2021 December UTC

Talk pages are a pretty key part of how wikis have worked over the years. Realtime chat apps and services are probably changing this dynamic somewhat, but talk pages are still used, and most of their history is still recorded.

I started up an IPython Notebook to try and take a look at some of the connections between different users on Wikidata over the years. Below you’ll find a few representations of these connections, as well as notable things I spotted along the way, the generating code, SQL query and more!

The data

MediaWiki maintains links tables for all pages, so getting all of the current links out of Wikidata is very easy. I made use of the Wikimedia Cloud Quarry service to run this query and host a CSV of the results.


SELECT
    SUBSTRING_INDEX(page_title, '/', 1) AS t1,
    pl_from_namespace AS t1ns,
    SUBSTRING_INDEX(pl_title, '/', 1) AS t2,
    pl_namespace AS t2ns
FROM pagelinks, page
WHERE pl_namespace IN (3, 5)
    AND pl_from_namespace IN (3, 5)
    AND page_id = pl_from
    AND page_title != pl_title
GROUP BY t1, t2

I then loaded this data directly into an IPython Notebook and did some cleaning, such as removing all IP addresses. I then spent quite some time applying more filtering and twiddling knobs to try and get some graphics out that are easy to look at. The first attempts looked like solid blobs as you can see in this tweet.
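As a rough illustration of the cleaning step (this is not the notebook's actual code), the sketch below reads rows in the shape the query returns and drops any link that involves an IP address. It assumes the CSV columns are named after the query aliases (t1, t1ns, t2, t2ns); the helper names and the sample rows are made up for the example.

```python
import csv
import io
import ipaddress


def is_ip_address(title: str) -> bool:
    """Return True if a page title parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(title)
        return True
    except ValueError:
        return False


def load_edges(csv_file):
    """Yield (source, target) title pairs, skipping rows that involve an IP."""
    for row in csv.DictReader(csv_file):
        # Namespace 3 is "User talk"; only those pages can belong to an IP user.
        if int(row["t1ns"]) == 3 and is_ip_address(row["t1"]):
            continue
        if int(row["t2ns"]) == 3 and is_ip_address(row["t2"]):
            continue
        yield row["t1"], row["t2"]


# Hypothetical sample rows standing in for the Quarry CSV export.
sample = io.StringIO(
    "t1,t1ns,t2,t2ns\n"
    "Alice,3,Bob,3\n"
    "192.0.2.1,3,Alice,3\n"
    "Bob,3,Property_proposal,5\n"
    "2001:db8::1,3,Bob,3\n"
)
edges = list(load_edges(sample))
print(edges)
```

Parsing each title with the standard library's ipaddress module avoids hand-written regular expressions and covers both IPv4 and IPv6 user names.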

You can find a copy of the Notebook on notebooksharing.space.

The Graphs

For all of these graphs, edges are relationships between user talk pages and project talk pages on Wikidata. Edges occur if their talk pages (or subpages) are linked. Various filtering is then applied (see notebook) to visually show the graph in a nice way.

The first graph tries to show as much of the community as possible. Generally speaking, any page names, be that user name or project page names, that are toward the middle of the graph have the most connections to other nodes. This centre section includes many long time Wikidata users, as well as key project pages such as “Request for comment”, “Property Proposal”, “Notability” and more.

Each edge must connect to a node with 200 other potential edges, and all nodes must have at least 25 potential edges. Everything else is hidden.
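One way to read this rule is as a pair of degree thresholds. The sketch below is an interpretation, not the notebook's code: it assumes "potential edges" means the number of distinct neighbours a page has before any filtering, keeps an edge only if at least one endpoint is a hub, and requires both endpoints to meet the lower threshold. The function name, parameter names and toy graph are hypothetical.

```python
from collections import defaultdict


def filter_edges(edges, hub_min, node_min):
    """Keep edges that touch a hub and whose endpoints are both well linked."""
    # Count distinct neighbours per node in the unfiltered graph.
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    degree = {node: len(linked) for node, linked in neighbours.items()}

    kept = []
    for a, b in edges:
        both_visible = degree[a] >= node_min and degree[b] >= node_min
        touches_hub = degree[a] >= hub_min or degree[b] >= hub_min
        if both_visible and touches_hub:
            kept.append((a, b))
    return kept


# Made-up toy graph with tiny thresholds; the graph above uses 200 and 25.
toy = [("Hub", "A"), ("Hub", "B"), ("Hub", "C"), ("A", "B"), ("C", "D"), ("D", "E")]
print(filter_edges(toy, hub_min=3, node_min=2))
```

Under this reading, the later graphs would correspond to calling the same function with thresholds of 900 and 10, and then 1500 and 5.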

The next graph moves toward highlighting the hubs of these link graphs, now requiring hubs with 900 links rather than 200. 10 or so very well linked users pop out at this point.

The names that appear within the centre of these nodes probably make up a core part of the community over the years.

Each edge must connect to a node with 900 other potential edges, and all nodes must have at least 10 potential edges. Everything else is hidden.

The final graph focuses on these key hubs once again, filtering out the rest of the cruft. We see that there are 5 hubs that have over 1500 potential edges.

There are now also some key connectors between these hubs that can be easily identified in the middle, even if some of the names are hard to read.

Each edge must connect to a node with 1500 other potential edges, and all nodes must have at least 5 potential edges. Everything else is hidden.

The post Wikidata user and project talk page connection graph appeared first on addshore.

New Wikimedia Code of Conduct

Remedial Skills In Open-To-The-Public Working Groups

Design, and Friction Preventing Design Improvement, in Open Tech

Inclusive-Or: Hospitality in Bug Tracking

Grief

Leadership Crisis at the Wikimedia Foundation

What Should We Stop Doing? (FLOSS Community Metrics Meeting keynote)

Comparing Codes of Conduct to Copyleft Licenses (My FOSDEM Speech)

How To Improve Bus Factor In Your Open Source Project

The Triumph Of Outreachy

Join Me In Donating to Stumptown Syndicate and Open Source Bridge

How I made a tidepool: Implementing the Friendly Space Policy for Wikimedia Foundation technical events

The Continuing Adventures (Transitioning From Intern To Volunteer)

The next Tor, role models, and criticism: the future I want

I'm Leaving My Job At The Wikimedia Foundation

Case Study of a Good Internship