I just published a VS Code language extension that provides syntax highlighting for the Stuttgart Finite State Transducer (SFST) formalism. You can download and install the extension, and its source code is available. I learned how to write a language extension when I worked on OpenType feature file support, so I thought of applying that learning to SFST, which I regularly use for the Malayalam morphology analyser project.

Wikimedia’s Event Data Platform, or JSON is ok too

18:10, Thursday, 10 2020 September UTC

By Andrew Otto, Staff Site Reliability Engineer

In the past few years, event-driven architectures (EDAs) have been getting a lot of attention. When done right, they help fulfill some of the promises of service-oriented architectures (SOAs) while mitigating many of their headaches. Event sourcing, coupled with complex event processing in particular, allows data to be materialized into services in the format they need when they need it. This recent data architecture trend can mostly be attributed to the success of Apache Kafka, which enables these kinds of architectures at scale. Confluent, founded by the creators of Kafka, have done an amazing job writing about and evangelizing EDAs on their blog, so we won’t try to expound on them any further. Instead, this 3-part series will focus on how Wikimedia has adapted these ideas for our own unique technical environment.

The Wikimedia Foundation has been working with event data since 2012. Over time, our event collection systems have transitioned from being used only to collect analytics data to being used to build important user-facing features. A few years ago, we began an effort to refactor our special-purpose analytics events system to one that closely resembles what Confluent calls a ‘stream data platform.’ However, even though most of Confluent’s components are free to use, we chose to implement some of our own. This article will first explain this decision and also briefly describe the Open Source components we developed to build Wikimedia’s Event Data Platform.

Confluent Community License

When Wikimedia began designing our event platform, all of Confluent’s free components were Open Sourced under the Apache 2.0 license. At the end of 2018, Confluent switched all its free software to a new custom Confluent Community License, which is not strictly a Free Open Source software license. One of the Wikimedia Foundation’s guiding principles is to use Free Open Source software in the software and systems we develop. Our decisions to avoid Confluent components were not originally influenced by the CCL—we began our system design process before they changed licenses—but in hindsight they proved to be good choices. (Aside: we did want to use Confluent’s Kafka Connect HDFS and now can’t due to the CCL, but I’ll save that story for another day.)

Event schemas

The main differences between Wikimedia’s and Confluent’s event stream platforms are all related to event schemas. In an event-driven architecture, events must conform to a schema in order for the various disparate producers and consumers to be able to communicate reliably. Schemas define a contract for the data used by many services, just like an API specification would in typical SOAs. They are a required component of building dependable and extensible EDAs.

What about Avro?

Confluent’s stream data platform uses Apache Avro for event schemas and for in-flight data serialization. Avro is a natural choice for organizations that are already heavily JVM-based. Avro is language-agnostic, but the software ecosystem around it is oriented towards the Java virtual machine. In addition to needing an Avro implementation in a language to use Avro data, an Avro library also needs an Avro schema in order to serialize and deserialize Avro binary data records. Outside of a streaming context, Avro data is usually serialized in binary files. These files contain a schema header which instructs Avro readers how to read the data records in the rest of the file.

However, in a streaming context, where every message in Kafka is a single data record, how can a consumer get the Avro schema it needs to deserialize the binary record? Confluent solves this problem with its Schema Registry. The Schema Registry is a centralized service that stores Avro (and as of April 2020, JSONSchema and Protobuf) schemas. To produce an Avro message, you must use a custom Confluent serializer that wraps the binary Avro message with an envelope containing an opaque numeric schema id registered with your running installation of Schema Registry. When consuming the message, you must also use a custom Confluent deserializer that knows how to unwrap the message payload to read the schema id, and then look up that schema id from the Schema Registry.
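To make the envelope idea concrete, here is a minimal Python sketch of the framing as it is commonly documented for Confluent's serializers: a zero "magic" byte, a 4-byte big-endian schema id, then the Avro-encoded bytes. The function names `wrap` and `unwrap` are illustrative assumptions, not Confluent's client API, and the payload is treated as opaque bytes rather than real Avro.

```python
import struct
from typing import Tuple

# Assumed framing: 1 magic byte (0) + 4-byte big-endian schema id + payload.
MAGIC_BYTE = 0
HEADER = struct.Struct(">bI")  # 5 bytes, no padding


def wrap(schema_id: int, avro_payload: bytes) -> bytes:
    """Prefix an already-serialized record with the schema id envelope."""
    return HEADER.pack(MAGIC_BYTE, schema_id) + avro_payload


def unwrap(message: bytes) -> Tuple[int, bytes]:
    """Split a framed message into (schema_id, payload)."""
    if len(message) < HEADER.size:
        raise ValueError("message too short to carry a schema id")
    magic, schema_id = HEADER.unpack(message[:HEADER.size])
    if magic != MAGIC_BYTE:
        raise ValueError("unexpected magic byte")
    return schema_id, message[HEADER.size:]


if __name__ == "__main__":
    framed = wrap(42, b"opaque-avro-bytes")
    schema_id, payload = unwrap(framed)
    # A real consumer would now ask the registry for schema `schema_id`
    # before it could decode `payload`.
    print(schema_id, payload)
```

The point of the sketch is the dependency it exposes: the payload cannot be decoded until the id has been resolved against a running registry.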

That’s a lot of service coupling just to produce or consume data. Any time you want to produce or consume any data from Kafka, you must have on hand a Confluent client implementation that is configured properly to talk to the running Schema Registry service. This coupling might be acceptable in a closed JVM shop, but it is untenable for the Wikimedia Foundation.

Making choices

Wikimedia’s software development model is decentralized. Volunteer developers must be able to contribute to our software just as Wikimedia staff does. We try to build forkable software, data, and systems. For example, our production helm charts and puppet manifests are Open Source. If someone tried hard enough, they could reproduce the entirety of the Wikimedia technical infrastructure. Using Avro and the Confluent Schema Registry would make this difficult. Yes, we could find ways of transforming data and formats for public consumption, but why make our lives harder if we don’t have to?

JSON is arguably more ubiquitous than Avro. But it is only loosely schemaed. The name of every field is stored in each record, but types of those fields are not. JSONSchema is commonly used to validate that JSON records conform to a schema, but it can also be used to solve data integration and conversion problems (AKA ETL) as long as the schema maps well to a strongly typed data model.
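As a rough illustration of how a schema supplies the type information that a JSON record lacks, the sketch below checks a record against a tiny subset of JSONSchema (`type`, `properties`, `required`). It is a hand-rolled toy for explanation only, not a conforming validator, and the example field names are hypothetical.

```python
# Maps JSONSchema type names to Python types (small subset only).
_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def validate(instance, schema, path="$"):
    """Return a list of error strings; an empty list means the record conforms."""
    expected = schema.get("type")
    if expected is not None:
        ok = isinstance(instance, _TYPES[expected])
        # Python treats bool as a subclass of int; JSON keeps them distinct.
        if expected in ("integer", "number") and isinstance(instance, bool):
            ok = False
        if not ok:
            return [f"{path}: expected {expected}"]
    errors = []
    if isinstance(instance, dict):
        for name in schema.get("required", []):
            if name not in instance:
                errors.append(f"{path}.{name}: required field missing")
        for name, subschema in schema.get("properties", {}).items():
            if name in instance:
                errors.extend(validate(instance[name], subschema, f"{path}.{name}"))
    return errors


# Hypothetical event schema, for illustration only.
EXAMPLE_SCHEMA = {
    "type": "object",
    "required": ["page_title", "rev_id"],
    "properties": {
        "page_title": {"type": "string"},
        "rev_id": {"type": "integer"},
    },
}

if __name__ == "__main__":
    print(validate({"page_title": "Kafka", "rev_id": "12"}, EXAMPLE_SCHEMA))
```

Because every field is pinned to a single type, a schema like this maps cleanly onto a strongly typed table or struct, which is what makes the data integration use case workable.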

In 2018, we went through a Wikimedia Technical RFC process to decide if we should commit to the Avro route or stick with JSON, and ended up choosing JSON mostly due to the fact that most consumers of event data would not need any special tooling to read the data. However, by choosing JSONSchema instead of Avro, we walked away from Confluent’s Schema Registry, as at the time it only supported Avro schemas. We also missed out on a really important feature of Avro: schema evolution.

We were left needing to implement for JSON and JSONSchema two features that are built into Confluent’s default stream data platform components: Schema evolution and schema distribution. We noticed that we weren’t the only ones that needed tools for using JSONSchemas in EDAs, so we decided to solve this problem in a decentralized and open sourced way. Schemas are programming language-agnostic data types. We felt that they should be treated just like we treat code, so we chose to distribute schemas using git. We also chose to enforce compatible schema evolution just like we’d check code changes: using code tests. The outcome was jsonschema-tools, a CLI and library to manage a git repository of semantically versioned JSONSchemas.
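One way to picture "enforcing compatible schema evolution with code tests" is a function that compares two versions of a schema and reports breaking changes, which a test suite can then assert is empty. The rules below (no removed fields, no changed types, no newly required fields) are assumptions chosen for illustration; they are not taken from jsonschema-tools, whose actual checks may differ.

```python
def compatibility_errors(old, new, path="$"):
    """List the ways `new` breaks compatibility with `old` (illustrative rules only)."""
    errors = []
    old_props = old.get("properties", {})
    new_props = new.get("properties", {})
    for name, old_sub in old_props.items():
        field = f"{path}.{name}"
        if name not in new_props:
            errors.append(f"{field}: field removed")
            continue
        new_sub = new_props[name]
        old_type, new_type = old_sub.get("type"), new_sub.get("type")
        if old_type != new_type:
            errors.append(f"{field}: type changed from {old_type} to {new_type}")
        elif old_type == "object":
            # Recurse so nested objects follow the same rules.
            errors.extend(compatibility_errors(old_sub, new_sub, field))
    newly_required = set(new.get("required", [])) - set(old.get("required", []))
    for name in sorted(newly_required):
        errors.append(f"{path}.{name}: newly required")
    return errors


# Hypothetical schema versions, for illustration only.
V1_0_0 = {"type": "object", "properties": {"page_title": {"type": "string"}}}
V1_1_0 = {
    "type": "object",
    "properties": {"page_title": {"type": "string"}, "rev_id": {"type": "integer"}},
}

if __name__ == "__main__":
    # Adding an optional field is treated as compatible under these rules.
    print(compatibility_errors(V1_0_0, V1_1_0))
```

Running a check like this over every pair of adjacent versions in a git repository turns schema review into an ordinary test failure, the same way a broken code change would be caught.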

We also needed an easy way for developers to produce valid event data into Kafka. Confluent has a Kafka REST Proxy which integrates with their Schema Registry to do this. We wanted a generic event validating and producing service that would work with decentralized JSONSchemas. EventGate was born to do just that.

Next up

The next post in this series will explain how we manage JSONSchemas for streaming event data using jsonschema-tools. After that, expect a post on how EventGate interacts with a schema repository to validate and produce events, as well as Wikimedia’s specific EventGate customizations and features.

About this post

This post is part of a three-part series. Check back next week for the next post!

Featured image credit: Dobson Stream by Wharfedale hut with the moon, Mt Oxford area, Canterbury, New Zealand, Michal Klajban, CC BY-SA 4.0

By Sara Thomas, Scotland Programme Coordinator.

At the beginning of the lockdown period in the UK, the Programmes Team, along with our network of Wikimedians in Residence, got to work on exploring how we could continue to support our community and partner organisations. We discussed ways in which we could offer engagement with the Wikimedia projects as lockdown activities (the National Library of Scotland’s work on WikiSource is an excellent example of how this worked out) for both staff and in terms of public engagement, as well as in the fight against mis- and disinformation surrounding COVID-19.

Back in November we had engaged a new trainer, Bhav Patel, to deliver our Train the Trainer course in Glasgow, Scotland. In light of the success of that programme, we decided to approach him again.

We held two brainstorming sessions, inviting all of our existing trainers to attend. Turnout for these was encouraging, and we were happy to see some familiar faces as well as some individuals whom we had not seen for a while. These sessions were intended to capture both appetite for training and training needs. We anticipated that, because we were approaching existing trainers, there might be less demand for content pertaining to design; however, this was not wholly the case. We were also conscious that the high standard of Bhav’s past training on training design might be beneficial for our existing volunteers. There was significant interest in how to convert existing training methods to the online format, the “tips & tricks” of using specific tools, and liaison with partners in advance to properly assess training needs. There was also some trepidation around how to give individual support over video conferencing, and how to adapt to feedback in-session without any physical feedback from the room.

Following these sessions, and further discussions with Bhav, we held two sets of three sessions, delivered at varying times of day to accommodate the existing commitments of volunteers. The first of these sessions, led by Bhav, was titled “Going Online”, and focussed on the move from onsite to online design and delivery of events, taking a platform-agnostic approach. The second was “Tools, Tech & Event Management”, focussing on tips, tricks & a variety of online conferencing and supporting tools, as well as overall event management, and was led by me. The third session was an opportunity for practice and feedback, offered to all participants as a way to test out new tools and ideas they had for leading online training. In total, 14 trainers attended across the six sessions, with positive feedback. The two training sessions were recorded for future use, and a pool of accompanying resources created. These were sent to all participants and to the wider volunteer trainer pool, and are being kept as a resource for partner organisations, Wikimedians in Residence, and future volunteers.

Reflecting on this process, we noted a few things. Firstly, that just because we deal with an online resource does not mean that we are automatically prepared to deliver online. Secondly, that the online space opens up opportunities as well as presenting barriers – trainers’ geographic location matters less, for example. Thirdly, that we were able to engage volunteers with a variety of time commitments, from a wider demographic, and that the pressures on their time likely reflect those of our various audiences. Lastly, that ongoing investment in existing volunteers helps to reignite engagement.

Once again, a huge thanks to our volunteer trainers for the time and energy that you give to the movement; it is massively appreciated!

Amama Mbabazi is a former Prime Minister of Uganda and held many other governmental positions. There are 19 Wikipedia articles about him; Mr Mbabazi is notable.

When you want to find a picture of him and you know his name in Russian, you can use Special:MediaSearch. You can also find him with اماما_مبابازى

One of the positions Mr Mbabazi held is Justice Minister of Uganda. English Wikipedia has a category for these Ministers; at this time two people are included, but not Mr Mbabazi. There are ten Ministers missing from the category. Mind you, it is English Wikipedia that has a list that made my work at Wikidata possible!

On the talkpage for the Wikidata item for Justice Minister of Uganda, I added the {{PositionHolderHistory}} template. A bot updates the information every day and it adds comments on the quality of the information. This makes it easy to add positions of interest to a watchlist. 

On my Africa project you find Listeria lists for African political positions. It is duplicated to several Wikipedias and once a Wikipedia is synchronised it will show information like the lists that include Mr Mbabazi. 

One day all the data will be complete and up to date. In the meantime it is a "work in progress" and you are kindly invited to check the information out, find its shortcomings and make updates where necessary.

Thanks,

       GerardM

Opentype feature file support for VS Code

10:14, Thursday, 10 2020 September UTC

I just published a VS Code language extension to support OpenType feature files in the Adobe “AFDKO” format. The extension provides syntax highlighting and code snippet support. (Screenshot from Amiri font) The syntax highlighting patterns for AFDKO are based on the opentype-feature-bundle for Atom Editor by Kenneth Ormandy, which is based upon Brook Elgie’s original Textmate bundle. The code snippets are based on the snippets prepared by Simon Cozens for AFDKO-SublimeText.

Elevating the voices of women in science through Wikipedia

17:10, Tuesday, 08 2020 September UTC

Adriana Bankston is a Principal Legislative Analyst at University of California. She recently took the 500 Women Scientists Wiki Scholars course and reflects on her experience in this guest blog post. This post represents the writer’s personal views and not the views of their employer, University of California.

Adriana Bankston

I’m a former bench scientist who transitioned into science policy, and consequently operate at the intersection between academia and policy. As an academic, I was really interested in the environments in which researchers work. To this end, I advocated for women in science by various means, including by organizing professional development activities with a local affiliate group of the Association for Women in Science (AWIS), and participating in local outreach activities through 500 Women Scientists and others. In doing these activities, I realized how important it is for girls to have role models in science, and looked for ways in which I could show girls of different ages what a scientist looks like.

My own PhD advisor was also one of two female faculty in the department. She has since moved on from academia, but she inspired me to be bold and stand out despite the circumstances. She also taught me that it’s really important to give back to the community, and I think that drove a lot of my advocacy for women in science. In my transition from science to policy, I maintained an interest in diversifying the research enterprise, particularly after experiencing first hand how the gender gap in science is greater at higher levels on the career ladder. 

During COVID-19, I sought ways to enrich my own professional development, and happened to see a call for applications for the 500 Women Scientists Wiki Scholars course on social media. I had never edited Wikipedia before, but I was eager to make an impact for women in science. I was really excited when I was accepted into the course. My motivation for enrolling was to learn a new skill, but also, by editing pages about women in science, to help increase their visibility.

The first Wikipedia article I worked on was Sandra Schmid, whom I had previously met in person and long admired from my days in the laboratory as someone who had made a significant impact in her field. She was also an advocate for improving the research system by changing mentoring and hiring practices, as well as by changing the system through improving postdoc training. In short, she was another role model for me both scientifically and professionally in research, and someone whom I felt more people should know about through Wikipedia.

Sandy’s article was an easy one to start with, given that I was already familiar with the topic and considered it notable. During the course, however, I also challenged myself to learn Wikipedia by picking another kind of article, one which initially required a lot of improvements (Brooke Borel), so that I would learn multiple editing skills by correcting different types of information that were either wrong or missing.

Notably, while taking the course, I was able to engage in the #editWikipedia4BlackLives #HackForBlackLives event on June 10, in observance of the #Strike4BlackLives and #ShutDownAcademia #ShutDownSTEM as part of a group of 200 people. Altogether, this group created over 50 articles, edited over 480, added over 100,000 words—and in just a few days, those articles had been collectively viewed over 500,000 times. 

By this point, I was a few weeks into the course, and felt more confident in my edits. Inspired by this event, I was excited to try my hand at editing articles that I felt could have an impact in the community by elevating voices of People of Color in science. I chose to edit articles on Shirley Malcom, Melina Abdullah and Lori White, which taught me a lot. It was a really powerful experience to participate in this collective day of action, and I was glad to utilize Wikipedia editing skills for this purpose.

As a scientist, I know that we always look for reputable sources, and in the past I might not have considered Wikipedia as such, partly because academics are mostly trained to use peer-reviewed journal articles of high impact as sources to cite for our work. This course gave me a new appreciation for the power of Wikipedia, ways in which it can impact different cross-cutting communities and identities around the world, as well as how we can leverage its accessibility to increase diversity, equity and inclusion in science and beyond.

In closing, I’m glad I took the course and I appreciate the chance to make a difference for women in STEM through the WikiProject Women Scientists, in particular as it relates to diversifying the research pipeline. To this end, I wanted to point out two pages: one is WikiProject Women in Red, which looks to create pages for women in science that don’t yet exist, and the second is the List of African American Women in STEM Fields. Both are useful tools for making a powerful impact in science.

Since the course ended, I have sought to utilize these skills in my current field by editing relevant pages such as a science policy page, in order to educate the community on important issues in this area and how they can contribute. 

Interested in taking a course like the one Adriana took? Visit learn.wikiedu.org to see current course offerings. Another 500 Women Scientists Wiki Scientists course is now enrolling!

Tech News issue #37, 2020 (September 7, 2020)

00:00, Monday, 07 2020 September UTC

weeklyOSM 528

10:05, Sunday, 06 2020 September UTC

25/08/2020-31/08/2020

The French community maps defibrillators for one month 1 | © Copyright 2020 ProjetDuMois.fr

Mapping

  • [1] Since 4 August the French community has been discussing (fr) > en the project of the month for September (defibrillators) in detail on the mailing list talk-fr. Anyone wishing to participate can now read up on the wiki (fr) > en what to do and which tools are available. The project includes (fr) > en daily evaluations.
  • The Open Maps team from Microsoft announced they will start road editing in Colombia. They will use available street-level and aerial imagery to add where possible but there will be no automatic imports, algorithms or robot edits to improve the map.
  • RobJN published two proposals: ref:GB:usrn=* to refer to the unique identifiers for every street and ref:GB:uprn=* to refer to the unique identifiers for every addressable location in the United Kingdom. The voting for both is open until 13 September 2020.
  • OpenStreetMap hit 90 million changesets. The changeset was done using RapiD by Diana IRM-ED, an Indonesian mapper paid by HOT.
  • Christopher Beddow is concerned about the ‘OpenStreetMap undated imagery crisis’ and he documents it with two pictures on Twitter.

Community

  • In early October, the Wikimedia Foundation’s OSM-based public tile service will be shut down. The reason for this is abuse; the majority of traffic has nothing to do with the Wikimedia projects or comes from commercial websites, writes Erica Litrenta on the Maps-l mailing list of the Wikimedia Foundation.
  • OpenStreetMap US announced that Connect 2020 will be held online from 29 to 31 October. The call for participants is now open and everyone is invited to apply with proposals related to OpenStreetMap.
  • The Unite Maps Initiative seeks a new logo and has announced a competition for one. The deadline for entries is 16 September 2020.
  • In the run-up to the 2020 State of the Map conference, OSM Africa surveyed OpenStreetMap community leaders across the continent. Geoffrey Kateregga looks at the results and offers a deep dive into the state and trajectory of OpenStreetMap communities in Africa.
  • Changeset 90148752 briefly turned Great Britain into a giant beach and was even visible on the map tiles for about an hour before it was reverted.
  • As OSM users of printed maps (es) and OsmAnd Live, the firefighters in Cordoba, Argentina, requested assistance in mapping roads, waterways and other elements. The community launched (es) a mapathon and in three days they added a lot of information in a very wild area. The firefighters confirmed (es) > en that OSM’s detailed maps help save forests.

OpenStreetMap Foundation

Events

  • Members of the Italian OSM community in the Piedmont region are organising (it) > en an event in the mountains on 19 and 20 September. The weekend includes presentations, mapping and sharing OSM experiences. To participate you must fill in this form (it) > en.
  • The next Geomob, organised as always by Ed Freyfogle and Steven Feldman, takes place on 16 September at 6.00 pm UTC (20:00 CEST) online. Speakers will be Jenny Allen, Maps Team Lead at Elastic; Ian Hannigan, founder of Formation; Olivier Cottray, discussing using GIS for demining; and Hanc Naum, founder of PinApp.
  • State of the Map Japan 2020 (ja) > en will be held online on Saturday 7 November. The organisers are calling for volunteers and presentations.

Education

  • All 35(!) worksheets of OpenSchoolMaps (de) > en are now available not only as downloadable PDFs but also as web pages. The materials are available in English as well.
  • A paper released by the Department of Urban Planning and Spatial Analysis, University of Southern California, uses OSMnx to analyse how urban evolution, planning, design, and millions of individual human decisions shape cities.

Maps

  • Frank Schmirler now also offers his ‘Light of the Seas’ in the Portuguese and Russian languages. It is really worth taking a look at this impressive map to explore new shores.

Software

  • Pieter Vander Vennet announced MapComplete on talk, where you can now create your own theme with the ‘new easy-to-use editor’. If you have over 500 changesets, you can create a theme from scratch which can be shared via URL. Please visit GitHub to see some results such as cyclofix or the Open Bookcase Map.
  • Over the past three months DWeaver worked on OSM2World for the Google Summer of Code. The aim of his project was to add support for indoor tagging, allowing the rendering of indoor data in 3D. This includes basic features such as rooms and corridors, windows and doors as well as objects that may span levels such as stairs and barriers. The code that he produced can be seen in these three pull requests: 1, 2, 3.

Releases

  • The GIScience Research Group of HeiGIT, the Heidelberg Institute for Geoinformation Technology, announced that it is now possible to calculate long distance routes using all bike and pedestrian profiles, whilst taking into account restrictions such as avoiding ferries, with the newest update of OpenRouteService.

Did you know …

  • Martijn van Exel published his stream on how to use the Meet Your Mappers tool.

OSM in the media

  • Buzzfeed contributors and reporters have investigated the regions that were blanked out on Baidu’s maps by comparing with other data sources including OpenStreetMap. They have used those locations to find a network of buildings in Xinjiang bearing the hallmarks of prisons and internment camps. They explained their methodology in this article.

Other “geo” things

  • China released (zh-cn) > en a new version of its standard map, displaying the ‘correct’ map that all map publishers should follow. The standard map shows the Chinese territorial area extending to include Taiwan and the disputed South China Sea, along with the infamous 11-dash-line. When the announcement was reported by Taiwanese media (zh-tw) > en, netizens of Taiwan derided (zh-tw) the map by saying that China is ridiculous and the imagery map is divorced from reality.
  • Mapillary announced that anyone can now access their map features directly via an API; filtering and downloading images, detected objects, and map features programmatically.

Upcoming Events

Where | What | When | Country
Taipei | OSM x Wikidata #20 | 2020-09-07 | taiwan
Lyon | Rencontre mensuelle | 2020-09-08 | france
Berlin | 147. Berlin-Brandenburg Stammtisch | 2020-09-10 | germany
Munich | Münchner Treffen | 2020-09-10 | germany
Zurich | 121. OSM Meetup Zurich | 2020-09-11 | switzerland
Leoben | Stammtisch Obersteiermark | 2020-09-12 | austria
San José | National Day of Civic Hacking | 2020-09-12 | united states
Ashurst | Trek View New Forest Pano Party | 2020-09-13 | united kingdom
Lüneburg | Lüneburger Mappertreffen | 2020-09-15 | germany
Salt Lake City / Virtual | OpenStreetMap Utah Map Night | 2020-09-15 | united states
Kabul / Online | Why OSM and how to Contribute into it on Software Freedom Day 2020 | 2020-09-18 | afghanistan
Nottingham | Nottingham pub meetup | 2020-09-22 | united kingdom
Düsseldorf | Düsseldorfer OSM-Stammtisch | 2020-09-25 | germany
Alice | 2020 Pista ng Mapa | 2020-11-13-2020-11-27 | philippines

Note: If you would like to see your event here, please put it into the calendar. Only data which is there will appear in weeklyOSM. Please check your event in our public calendar preview and correct it, where appropriate.

This weeklyOSM was produced by AnisKoutsi, Anne Ghisla, MatthiasMatthias, Nakaner, Nordpfeil, NunoMASAzevedo, Polyglot, Rogehm, Sammyhawkrad, TheSwavu, YoViajo, derFred, richter_fn.

Outreachy report #12: Launching the December 2020 round

00:00, Sunday, 06 2020 September UTC



Delayed opening

It’s been a while since the opening of an application period was so stressful, but we made it through. While Sage made sure new changes to the codebase were merged as soon as possible, I had the chance to ping communities directly to let them know we would be open for applications soon and invite them to participate in this round. Most of them confirmed their intention to sign up and followed through, one of them signed up without needing a ping, and one of them replied saying they wouldn’t have the time to participate due to COVID-19 constraints.

Sage and I had a few conversations about how to improve the application review workflow (and avoid repetitive strain injury from reviewing thousands of applications). These changes are now live and I’m quite confident they will improve our review practices significantly.

Promotion

In addition to participating in #OutreachyChat, I saw an increase in requests to speak in both public and private events about the program. I received an invitation to become a curator for a week on a project about free and open source software makers, but ultimately decided to schedule my participation to next year to align it with our summer round schedule.

One of our interns requested an interview with an Outreachy organizer, and I’m planning on following up with her this week. Some Brazilian communities have approached me as well, and I’m doing my best to reply to their inquiries during the initial application period to encourage Brazil residents to apply.



Ornithologists in cartoons

06:30, Saturday, 05 2020 September UTC
From: The Graphic. 25 April 1874.
It is said that the modern version of badminton evolved from a game played in Poona (some sources name the game itself as Poona). When I saw this picture from 1874 about five years ago, I gave little thought to it. Revisiting it now, after some research on one of A.O. Hume's ornithological collaborators, I have a strong hunch that one of the people depicted in the picture is recognizable, although it is not going to be easy to confirm this.

I recently created a Wikipedia entry for a British administrator who worked in the Bombay Presidency, G.W. Vidal, when I came across a genealogy website (whose maintainer unfortunately was uncontactable by email) with notes on his life that included a photograph in profile and a cartoon. The photograph was apparently taken by Vidal himself, a keen amateur photographer apart from being a snake and bird enthusiast. As was common among naturalists of that epoch, many of his specimens were shot, skinned or pickled and sent off to museums or specialists. He was an active collaborator of Hume and contributed a long note in Stray Feathers on the birds of Ratnagiri District, where he was a senior ICS official. After Hume's exit from ornithology, he continued to contribute notes to the Journal of the Bombay Natural History Society. This gives further support for an idea I have suggested before, that a key stimulus for the formation of the BNHS was the end of Stray Feathers. Vidal's mother has a claim to being the first woman novelist of Australia. Interestingly, one of his daughters, Norah, married Major Robert Mitchell Betham (2 May 1864 – 14 March 1940), another keen amateur ornithologist born in Dapoli, who is well-known in Bangalore birding circles for being the first to note Lesser Floricans in the region. Now Vidal was involved in popularizing badminton in India, apparently creating some of the rules that allowed matches to be played. The man at the left in the sketch in the 1874 edition of The Graphic looks quite like Vidal, but who knows! What do you think?

PS: Vidal sent bird specimens to Hume, and at least two subspecies have been named from his specimens after him - Perdicula asiatica vidali and Todiramphus chloris vidali.

For more information on Vidal, do take a look at the Wikipedia entry. More information from readers is welcome as usual.

PS: 26-July-2020: It would appear that an old badminton (once also known as Fives) court near Sholapur was also of some ornithological interest.
I think I can safely say that there is only one place in India where this bird has been shot, and where I have shot it during every month in the year, and that is Sholapur. There was a grass and baubul jungle near the old Fives court on the Bijapur Road which always contained florican. - "Felix" (1906). Recollections of a Bison & Tiger Hunter. London:J.M. Dent & Co. p. 183.

Into the WikiWorld

17:47, Thursday, 03 2020 September UTC

Valerie Bentivegna is a bio-engineer working and living in Seattle. She is a member of the Seattle pod of 500 Women Scientists and chair of the Communication Working Group of the Marie Curie Alumni Association. You can follow her on her personal blog and on Twitter.

Valerie Bentivegna

We all know Wikipedia. It’s almost impossible not to. 

Whether it’s a quick look-up of some fact to prove a point in an argument with friends, double-checking a chemical structure for schoolwork, or translating an obscure plant name I can’t think of in English, I’ve used Wikipedia consistently for well over a decade.

I’ve always known that Wikipedia was an online encyclopedia that anyone could edit. But I’d never even considered making an edit myself. Until one day in April, I received an email from 500 Women Scientists with the opportunity to attend a 6-week wiki-editing course. I’d already been working from home for a few weeks, with a considerably lower workload than usual, and — to be honest — not quite sure what to do with myself. So, I jumped on the opportunity to learn how to use the skills I already have — hey, I’m a scientist, I’ve been researching and writing and fact-checking for years! — to make Wikipedia a more inclusive place.

500 Women Scientists Wikipedia

About 10 women scientists gathered twice a week to learn how to edit Wikipedia with one main goal: putting more women on Wikipedia. I was saddened, but not surprised, to learn that of all the biographies on Wikipedia, only ~18% are about women. That percentage is ~16% if we only look at academic biographies, and it drops down to ~6.5% for female engineers, my own field. 

One potential reason for this is that a lot of Wikipedia editors are men. And — likely due to implicit bias — they write and edit articles about… other men. Even if the academic world is becoming more inclusive, this isn’t necessarily reflected on the online encyclopedia that everyone uses. 

And that’s a problem. Middle or high schoolers who look to learn more about notable figures in a field of interest and don’t find anyone who looks like them or comes from a similar background might be turned off from pursuing studies in that field. So that’s where 500 Women Scientists Wikipedia comes in. By increasing representation of women in the academic biography category of Wikipedia, either by improving existing articles or writing new ones (for example through the Women in Red WikiProject, which aims to write articles for “redlinked” women), we could improve representation and therefore make Wikipedia a better and more inclusive resource.

That all sounds good, but how?

Okay, so I knew I wanted to make Wikipedia more inclusive and I knew why, but that didn’t really help me with the “how.” Again, the fact that anyone can edit doesn’t make me feel comfortable doing so right away! Luckily, the WikiEducators (if that’s the term, the course was organized by Wiki Education, and everything related to Wikipedia seems to have “Wiki” in it!) walked us through the core policies of Wikipedia, the do’s and don’ts, and helped us through our first article edits.

Here is a list of things that stuck (but you can find all that is relevant to editing Wikipedia, on — you guessed it! — Wikipedia):

  • Statements on Wikipedia must be verifiable, which does not mean they are necessarily true. It just means there’s a sourceable body of work to back up the statement. This feels counterintuitive (shouldn’t we be writing “the truth”?) but it ensures there are reliable sources for everything on Wikipedia.
  • Wikipedia is not a place for opinion; articles should reflect a neutral point of view. I did like that this meant according to consensus, as opposed to the journalistic rule of equal time. For example, if 90% of climate researchers are in agreement that climate change is real, that viewpoint should be reflected for 90% of the article.
  • To have a biography on Wikipedia, a person must be notable. They have to meet criteria with regard to their academic achievements, prizes won, and impact to merit a presence on the online encyclopedia. In an academic culture where men are typically still more valued than women, this can be another factor for why there are so few biographies about women on Wikipedia.
  • The definition of Wikipedia as an “online encyclopedia” is incredibly broad, and apparently it’s easier to define what Wikipedia is not.
  • You can contribute to Wikipedia in several different ways, whether it’s writing new content, taking care of layout, correcting spelling and grammar, or making Wikipedia more aesthetically pleasing (just to name a few). 

Making the first edit

The first edit was scary! 

What if I made a mistake? What if I undid someone else’s edit and stepped on their toes? What if I did something that was inherently anti-Wikipedian?

Wikipedia’s mantra is “Be Bold” — make the change! The beauty of a massively open, crowd-sourced, and peer-reviewed platform is that almost everyone there is willing to help. It’s not seen as a faux-pas to make mistakes, and if you do, someone else will come along and fix it. Accidentally left in a typo? Someone will fix it. Mistakenly got a fact not quite right? Someone will fix it. Change someone’s important edits without noticing? They can come back and undo your change. And Wikipedia keeps track of all the changes in the “history” tab, making the whole editing process transparent and traceable.

Working on the second article was considerably easier. Sure, there are still some really tricky things, like adding images or editing boxes, but overall making edits on Wikipedia is really easy!

“So fix it”

Another Wikipedia Mantra is “So fix it”: if you see something wrong, make it better. 

If you see a lack of representation, write a new article. Make existing articles better (I was surprised to learn about how some articles in the outer corners of Wikipedia are not great). Increasing representation is not just about getting more women biographies on Wikipedia. Black, Indigenous and People of Color academics are more underrepresented on Wikipedia than they are in academia (thanks to the #editWikipedia4BlackLives effort on June 10th and ongoing efforts from the people involved, that will hopefully change), and Pride month brings LGBTQA+ themed “editathons” (sessions where groups of people edit pages together). Wikipedia is a group effort, and together we can all make Wikipedia better: more representative, more inclusive, and more equitable. I myself plan to edit or write one article a week! 💪

Interested in taking a course like the one Valerie took? Visit learn.wikiedu.org to see current course offerings. Another 500 Women Scientists Wiki Scientists course is now enrolling!

Semantic Wikibase released

00:00, Thursday, 03 2020 September UTC

You can now use Wikibase data in your Semantic MediaWiki with our new extension "Semantic Wikibase".

Both Wikibase and Semantic MediaWiki are extensions to MediaWiki that allow you to work with structured data within your wiki. They each serve their own use cases and have different strengths and weaknesses.

Our new Semantic Wikibase extension allows you to get the best of both worlds. Data added in the form of Wikibase items and properties is automatically made available in Semantic MediaWiki (SMW). This means you can use it with all SMW functionality, such as embedded visualizations, autocompletion in forms, export in RDF/JSON/CSV/etc and more.

Embedded visualizations

Wikibase on its own does not allow you to visualize or query the data stored in your wiki. You can install the Wikibase Query service and write SPARQL queries, though this is technically challenging. With Semantic Wikibase you can place queries directly into wiki pages without installing and maintaining extra software services. This is done with a simple query language designed for use in wikis, and can get you anything from a simple static list to fancy interactive visualizations that can dynamically gather additional data.

You can learn more about SMW queries on the SMW website.

Data translation

The connection that Semantic Wikibase provides is one way. It will make your Wikibase data available in SMW but not the other way round. Data coming from Wikibase is seamlessly combined with what you have stored in SMW itself, if anything.

Semantic Wikibase can also help you if you want a simplified projection of your Wikibase data. The Wikibase data model is more complex than just property value pairs associated with wiki pages / items. Each value has a rank, can have qualifiers and can have references. This is in contrast with SMW, where people typically stick to property value pairs. Semantic Wikibase only stores the main values of Wikibase statements into SMW. This makes querying and visualizing the data a lot easier.
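
As a rough illustration of what such a simplified projection means, the Python sketch below reduces a Wikibase-style item to plain property/value pairs by keeping only the main value of each statement. It is not how the extension itself is implemented: the helper name main_values and the sample item are made up, and the input only loosely follows the Wikibase JSON layout (claims, mainsnak, datavalue).

```python
def main_values(entity):
    """Return {property_id: [main values]} for a Wikibase-style entity dict.

    Hypothetical helper for illustration only; rank, qualifiers and
    references are deliberately ignored.
    """
    projected = {}
    for property_id, statements in entity.get("claims", {}).items():
        for statement in statements:
            snak = statement.get("mainsnak", {})
            # "somevalue" / "novalue" snaks carry no datavalue, so skip them.
            if snak.get("snaktype") != "value":
                continue
            projected.setdefault(property_id, []).append(snak["datavalue"]["value"])
    return projected


# Made-up sample item: one statement with a qualifier and a reference,
# and one "novalue" statement that has nothing to project.
sample_item = {
    "id": "Q1",
    "claims": {
        "P1": [
            {
                "mainsnak": {
                    "snaktype": "value",
                    "property": "P1",
                    "datavalue": {"value": "Berlin", "type": "string"},
                },
                "rank": "preferred",
                "qualifiers": {"P2": []},
                "references": [{"snaks": {}}],
            },
            {
                "mainsnak": {"snaktype": "novalue", "property": "P1"},
                "rank": "normal",
            },
        ]
    },
}

print(main_values(sample_item))
```

The result is a flat mapping from property to values, which is roughly the shape of data that SMW-style queries work with.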

You can view details in the documentation on GitHub.

Installation

You can install Semantic Wikibase for free, as it is fully open source and released under a free license. We thank Frans AL van der Horst for enabling initial development. Contact us if you are interested in funding additional development.

View the installation instructions and documentation.

Wikidata, qualifiers and more

Semantic Wikibase only helps you if you have a Wikibase installation yourself. But what if you want to query data from another Wikibase, say Wikidata? This would be super cool. And this first version of Semantic Wikibase is a step in that direction. Find out more in Using Wikidata in Your Wiki.

Besides support for remote Wikibases, here are a few potentially useful features that the extension does not yet have.

Data translation:

  • Ability to whitelist or blacklist entities from being translated
  • Ability to whitelist or blacklist statements from being translated
  • Translation of qualifiers, references, statement rank and other non-main-snak data
  • Support for Entities other than Items and Properties
  • Translation of Item sitelinks

Properties:

  • Detection and possibly prevention of property name conflicts between Wikibase and SMW
  • (Multilingual) descriptions of Wikibase properties on SMW property pages
  • Grouping of Wikibase properties on Special:Browse

We would love to hear your thoughts on which features you would like to see next. You can also fund development of new features.

Participate

Join the Semantic Wikibase Telegram to share your ideas and use cases or to ask questions.

Comment on reddit

A journey of a single step begins with a thousand miles

17:15, Wednesday, 02 2020 September UTC

By Andrew Bogott, Senior Site Reliability Engineer, Wikimedia Cloud Services

A couple of years ago I ran across a bug in OpenStack Neutron: I was trying to gather quota information for display in the WMCS tool “Openstack Browser” and got the response “Only admin is authorized to access quotas for another tenant.”

Most API calls in OpenStack are governed by customizable role-based access controls, aka ‘RBAC.’ For some reason Neutron quotas weren’t; they were just hard-coded to require admin access. I wasn’t the first person to run into this. There was already a bug entered in the bug tracker and a pending fix.

The pending fix looked pretty good but was a bit out of date. I tuned it up, resubmitted an updated version, and waited for review. After a bit of back-and-forth, there was a delay while I waited for an answer to a question. In the meantime, EVERYTHING about RBAC was redesigned in OpenStack, leaving my patch totally broken. Months later (in response to prodding from another developer) I finally submitted an updated patch. There were some complications after that with people asking for tests, me adding tests, someone else deciding that maybe the tests were in the wrong place, the CI testing framework breaking, etc. etc. etc., but last week, my tiny patch was finally merged.

OpenStack has fairly rapid release cycles, so this patch will probably be included in an official release in the next couple of months.  This will be OpenStack version ‘Victoria’.  WMCS currently runs OpenStack version ‘Rocky,’ so we will need to upgrade our install 4 times (the release names are alphabetical) before we’re running the fixed version of Neutron in production.

We install OpenStack using upstream Debian packages. Right now, our Hypervisors are running Debian Stretch; the last version of OpenStack available for Debian Stretch is Rocky. Before we upgrade from Rocky to Stein we need to upgrade our Virtualization hardware from Stretch to Buster. Typically, we don’t actually ‘upgrade’ hardware; instead, we wipe servers clean and reinstall with a fresh, empty OS. In the case of our Hypervisors, though, there are many (sometimes dozens of) VMs stored locally on the hardware. Wiping the servers would delete our users’ data, so a hypervisor upgrade is a huge, delicate pain in the neck.

Fortunately, we’re in the process of moving all VMs to distributed storage, after which we can easily transfer VMs here and there to get them safely out of the way of OS upgrades.

To summarize the remaining steps:

  • We need to finish moving VMs to Ceph
    • So we can upgrade Hypervisors to Buster
    • So we can upgrade our OpenStack to version Stein
    • So we can upgrade our OpenStack version to Train
  • At which point we’ll need to upgrade our OpenStack web interfaces to use version Victoria (Horizon is mercifully backwards-compatible so I don’t have to rebuild it with every release) and upgrade all our custom panels to handle whatever API changes have happened since Rocky…
  • And then we can upgrade OpenStack to Ussuri
    • So that we can upgrade OpenStack to Victoria
  • At which point we can finally fix the Neutron quota issue in OpenStack Browser.
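
The release-by-release part of that chain can be sketched in a few lines of Python. This is only an illustration: it assumes upgrades happen one release at a time, as described above, and the helper name upgrade_path is made up. The release names are the ones mentioned in this post.

```python
# Release names taken from the post, in alphabetical (chronological) order.
RELEASES = ["Rocky", "Stein", "Train", "Ussuri", "Victoria"]


def upgrade_path(current, target, releases=RELEASES):
    """Return the releases to install, in order, to get from current to target."""
    start = releases.index(current)
    end = releases.index(target)
    if end < start:
        raise ValueError("target release is older than the current one")
    return releases[start + 1 : end + 1]


path = upgrade_path("Rocky", "Victoria")
print(path)       # the releases to step through
print(len(path))  # the number of upgrades needed
```

For Rocky to Victoria this yields four steps, which matches the count given earlier.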

From the perspective of that one little bug, this is a terrible story!  Of course, in reality, this chain of patches and upgrades intersects dozens of other issues and improvements, all stumbling forward together and occasionally knocking each other down or getting in each others’ way in the process. This Neutron issue is just the one lucky bug that got to be on the sidelines to watch it all.

I got to watch it too! The average tech worker keeps a given job for something like 18 months. It’s a genuine privilege to be in one place long enough to watch step after step of a plan come to life and sometimes get to close a bug that I opened years ago.

File:Phabricator dependency chain.png
https://commons.wikimedia.org/wiki/File:Phabricator_dependency_chain.png

About this post

Featured image credit: Cirina forda, caterpillar, Lautenschläger et al, CC BY 4.0

How can I get data on all the dams in the world? Use Wikidata

07:58, Wednesday, 02 2020 September UTC

During my first week at Newspeak House, while I was explaining Wikidata and Wikibase to some folks on the terrace, the topic of dams came up in a discussion of an old project that someone had worked on. Back in the day, collecting information about dams would have been quite an effort: compiling data from many different sources to try to get a complete worldwide view of the topic. Perhaps it is easier with Wikidata now?

Below is a very brief walkthrough of topic discovery and exploration using various Wikidata features and the SPARQL query service.

A typical known Dam

In order to get an idea of the data space for the topic within Wikidata I start with a Dam that I know about already, the Three Gorges Dam (Q12514). Using this example I can see how Dams are typically described.

Classification

The first thing I notice is that this Dam is an “instance of” “gravity dam”. An “instance of” is represented with the id P31 and a “gravity dam” is represented by the id Q3497167. This is probably a subclass of a wider set. When navigating to gravity dam I see that it is a “subclass of” “dam”. A “subclass of” is represented by the id P279 and “dam” is represented with the id Q12323. This feels like the top level for this ontological tree.

Looking at the talk page for the dam item, I can see some useful links allowing us to dive into the subclasses of “dam”, and also various instances of “dam”.
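
To make the classification concrete, here is a small Python sketch that walks this tree on a toy copy of the statements: one “instance of” hop followed by any number of “subclass of” hops. The two dictionaries hold only the IDs mentioned above (a real item has many more statements), and the helper name is_instance_of is made up for illustration.

```python
# Toy statements using the IDs mentioned above; a real item has many more.
INSTANCE_OF = {"Q12514": {"Q3497167"}}   # Three Gorges Dam -> gravity dam (P31)
SUBCLASS_OF = {"Q3497167": {"Q12323"}}   # gravity dam -> dam (P279)


def is_instance_of(item, target_class):
    """True if item is an instance of target_class or of any of its subclasses."""
    to_visit = list(INSTANCE_OF.get(item, ()))
    seen = set()
    while to_visit:
        cls = to_visit.pop()
        if cls == target_class:
            return True
        if cls in seen:
            continue
        seen.add(cls)
        # Follow "subclass of" links upwards in the tree.
        to_visit.extend(SUBCLASS_OF.get(cls, ()))
    return False


print(is_instance_of("Q12514", "Q12323"))    # reached via gravity dam
print(is_instance_of("Q12514", "Q3497167"))  # direct "instance of"
```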

Properties

Taking another look at the Three Gorges dam item page we see various properties used to describe the dam that might be useful to look at in the dam context:

The whole set

The best way to get an overview of the whole data set is to use the query service. The dam item talk page already included a link to the query service listing all instances of dam, so we can start there.

This query will list a random set of 1000 “instance of” (P31) or instances of “subclasses of” (P279) the “dam” (Q12323) item, while also providing the English label (name) of the “dam”.

SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31/(wdt:P279)* wd:Q12323 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en"  }  
}
LIMIT 1000

If we remove the LIMIT from the query we can see that 84215 dams are currently collected in Wikidata. However, sometimes this might not be desired, as the lists can get pretty long.
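
The same query can also be sent to the query service from a script. Below is a minimal Python sketch, assuming the public endpoint at https://query.wikidata.org/sparql and its query and format URL parameters. The helper names are made up, the LIMIT is lowered to keep the example small, and [AUTO_LANGUAGE] is replaced by a plain "en" on the assumption that this placeholder is filled in by the web interface rather than by the endpoint.

```python
import json
import urllib.parse
import urllib.request

# Assumed public endpoint of the Wikidata Query Service.
ENDPOINT = "https://query.wikidata.org/sparql"

DAM_QUERY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31/(wdt:P279)* wd:Q12323 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
LIMIT 10
"""


def build_url(query, endpoint=ENDPOINT):
    """Return a GET URL that asks the endpoint for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return f"{endpoint}?{params}"


def run_query(query):
    """Fetch and decode the result rows (needs network access)."""
    request = urllib.request.Request(
        build_url(query),
        # A descriptive User-Agent is good manners for automated clients.
        headers={"User-Agent": "dam-example/0.1 (illustrative script)"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]


# Building the URL needs no network access, so it is safe to print here.
print(build_url(DAM_QUERY))
```

Calling run_query(DAM_QUERY) should then return one dictionary per result row, with entries such as row["item"]["value"] and row["itemLabel"]["value"].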

We can see more information about the dams on this list by expanding the query to look for triples for statements of installed capacity (P2109). Statements are the basic data building blocks in Wikidata connecting a property such as “instance of” with a value such as “dam”. Triples are a representation of this data which is queried by the SPARQL language.

SELECT ?item ?itemLabel ?installedCapacity WHERE {
  ?item wdt:P31/(wdt:P279)* wd:Q12323 .
  ?item wdt:P2109 ?installedCapacity
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en"  }  
}
LIMIT 1000

This will only return dams that have this statement defined (261). In order to still return the complete list, this needs to be added as an OPTIONAL triple.

SELECT ?item ?itemLabel ?installedCapacity WHERE {
  ?item wdt:P31/(wdt:P279)* wd:Q12323 .
  OPTIONAL{ ?item wdt:P2109 ?installedCapacity }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en"  }  
}
LIMIT 1000
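
The difference between the required and the OPTIONAL version is essentially that between an inner join and a left join. A toy Python sketch with made-up data (three dams, only one with an installed capacity value; the function names are illustrative):

```python
# Made-up toy data: three dams, only one with an installed capacity value.
dams = ["Dam A", "Dam B", "Dam C"]
installed_capacity = {"Dam A": 22500}


def required_match(items, values):
    """Like a plain triple pattern: rows without a value are dropped."""
    return [(item, values[item]) for item in items if item in values]


def optional_match(items, values):
    """Like OPTIONAL: every row is kept, missing values stay unbound (None)."""
    return [(item, values.get(item)) for item in items]


print(required_match(dams, installed_capacity))
print(optional_match(dams, installed_capacity))
```

The first call keeps only the dam that has a value; the second keeps all three and leaves the missing values empty.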

More data points can be extracted in much the same way where available.

SELECT ?item ?itemLabel ?installedCapacity ?annualEnergyOutput ?watershedArea WHERE {
  ?item wdt:P31/(wdt:P279)* wd:Q12323 .
  ?item wdt:P2109 ?installedCapacity .
  ?item wdt:P4131 ?annualEnergyOutput .
  ?item wdt:P2053 ?watershedArea .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en"  }  
}
LIMIT 1000

Other views can also be used for the data. Setting the defaultView option in a comment will make this happen once the query has run.

#defaultView:Map
SELECT ?item ?geo WHERE {
  ?item wdt:P31/wdt:P279* wd:Q12323;
        wdt:P625 ?geo .
}
LIMIT 10000

This map displays 10,000 random dams, and allows you to zoom in, hover and inspect them.
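
If the coordinates are needed outside the map view, they have to be parsed first. The Python sketch below assumes the ?geo values come back as WKT point literals of the form "Point(longitude latitude)", with longitude first. The helper name and the sample coordinates are illustrative only, and the pattern handles plain decimal numbers, not every possible WKT variant.

```python
import re

# Matches simple WKT point literals such as "Point(111.003 30.823)".
POINT_PATTERN = re.compile(r"^Point\((-?\d+(?:\.\d+)?) (-?\d+(?:\.\d+)?)\)$")


def parse_point(wkt):
    """Return (latitude, longitude) from a "Point(longitude latitude)" string."""
    match = POINT_PATTERN.match(wkt)
    if match is None:
        raise ValueError(f"not a simple WKT point: {wkt!r}")
    longitude, latitude = (float(part) for part in match.groups())
    return latitude, longitude


print(parse_point("Point(111.003 30.823)"))
```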

Conclusion

Starting with a topic area to explore and a single known example I have explored the way that Wikidata describes the topic and figured out where the topic fits within the larger tree of concepts. I have also expanded from a single example to a complete data set, visualizing that set on a map.

SPARQL and the query service allow much more than is discussed in this post, such as filtering, alternate data representation and visualization, and much more.

Further reading

The post How can I get data on all the dams in the world? Use Wikidata appeared first on Addshore.

Every Wikimedia project has information available that could be shared with other Wikimedia projects. Data is incomplete in every project and the objective of this proposal is to indicate missing data so that it may be included.
It starts with a category. This category links to six English Wikipedia articles. Using a tool, that information is now available in Wikidata as well. As this information is further enriched, it is found that one article should be included in the category.

The category exists on many Wikipedias, and its definition in Wikidata records what content the category contains. Reasonator shows the information with an inbuilt query. When you check the article with a list, it is obvious that many articles do not have a category entry. The latest entry in the list is known to Wikidata thanks to the Latin Wikipedia.

The proposal is simple. Have a messaging agent that indicates missing categories on articles. This will enable any Wikipedian to add them. For Wikidata we would import data based on the definition of categories. The process would be enabled per defined category.
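
At its core, the signalling step is a set difference. A minimal Python sketch with made-up article titles (the function name is hypothetical, and a real agent would read both lists from Wikidata and the Wikipedia in question):

```python
def missing_category_entries(expected_articles, categorised_articles):
    """Return articles that match the category definition but lack the category."""
    return sorted(set(expected_articles) - set(categorised_articles))


# Made-up example data.
from_wikidata_definition = ["Article A", "Article B", "Article C"]
in_wikipedia_category = ["Article A", "Article C"]

for title in missing_category_entries(from_wikidata_definition, in_wikipedia_category):
    # Signal only; nothing is edited automatically.
    print(f"Consider adding the category to: {title}")
```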

Arguments: 
  • Nothing happens on a Wikipedia without prior agreement
  • The mechanism used is by default one of signalling and not of updating 
  • It follows existing practice for importing data from Wikipedias into Wikidata
Thanks,
      GerardM

Remedial Skills In Open-To-The-Public Working Groups

17:45, Tuesday, 01 2020 September UTC

I'm talking in this post about wikis, political clubs, open source projects, fanvidding exchanges -- any groups where people try to work together and are open to the public.

"No, what's that?"

Some people joining your groups don't know things you take for granted.

Years ago, while helping new applicants to Google Summer of Code get into working on (I think) MediaWiki or Zulip, I was talking with a person who was (at that time) a student in an Indian engineering college. He had run into trouble and was trying to debug a setup issue. He seemed to be asking for help but not systematically investigating what the problem might be.

So I said, let me give you some tips on troubleshooting. You've heard of the scientific method?

He said: no, what's that?

I gave him a link to the Simple English Wikipedia article on it -- and to the They Might Be Giants video for their song "Put It To The Test", on forming falsifiable hypotheses and testing them. I asked him to watch and read them and then tell me when he had done so. He did.

They Might Be Giants - Put It to the Test from They Might Be Giants on Vimeo.

I then said: so, you do that. You come up with a hypothesis for what might be the problem, and figure out how you would check to see if that is the case. You run the experiment, and you use the resulting data to refine your hypothesis -- if you're wrong, you try to come up with a new hypothesis. And then you either find and fix the problem, or if you get stuck, you at least have a bunch of data to give someone so they can help you better.

He said: THIS IS AMAZING! I'M GONNA USE THIS ALL THE TIME! And I was like IT IS! And then in a separate private conversation with colleagues, I was privately very angry with the educational system that had let him get that far without teaching him this!

[This is one of the experiences that led to me writing a fairly well-regarded blog post for new open source software contributors, on the scientific method, learning from and contributing to shared notes and logs, and self-reliance and interdependence.]

Lessons

Some lessons here for you:

  1. Don't act surprised when people say they don't know something. Because (as I said a few years back) that’s just a dominance display. That's grandstanding. That makes the other person feel a little bit bad and makes them less likely to show you vulnerability in the future. It makes them more likely to go off and surround themselves in a protective shell of seeming knowledge before ever contacting you again.
  2. Some gaps you can remediate. Sometimes it's as quick as the explanation and links I used above. Sometimes it's more involved, as with providing free English tutoring to your internship applicants. Sometimes remediation takes a lot more work, and you may not have time to provide it; see my suggestions in "How to Teach And Include Volunteers who Write Poor Patches" which include some lower-effort options, like the section "Using their knowledge and curiosity to improve the project in other ways."

    In open source projects, I think it's also okay to exclude unskilled people from a project based on "we do not have time to help them learn remedial skills, and this is not a suitable project for novices." If you are going to do this, explicit is better than implicit. You should try to forewarn contributors with explicit "not good for beginners/prerequisite skills are [list]" language in your README and CONTRIBUTORS files. And when you need to tell contributors that they aren't ready to participate in your group yet, you should offer them a redirect to a project more suitable to their skill level, you should take care not to insult them while redirecting, and you should tell them they'll be welcome back in the future after learning more of the prerequisites.

  3. The scientific method is still as kickass a cognitive tool as it has ever been. It's amazing and empowering!

Thanks to Dr Linda McIver for the conversation that spurred this post.

Tech News issue #36, 2020 (August 31, 2020)

00:00, Monday, 31 2020 August UTC

weeklyOSM 527

09:42, Sunday, 30 2020 August UTC

18/08/2020-24/08/2020

The Melbourne “Monolisk” [1] appeared in Microsoft’s Flight Simulator through a typo in OSM and created big media coverage this week | © MS Flight Simulator – picture by Twitter@alexandermuscat | data © OpenStreetMap contributors

Mapping

  • Following on from erroneous tagging on a building in Melbourne (Australia) that then appeared in Microsoft Flight Simulator, here is another example from Portugal: The Statue of Christ the King in Almada, Portugal, was mapped as a modern residential building with many floors! ‘It’s the old bad habit of classifying statues, fountains, piers, etc., as building=yes‘, says Luis Forte.
  • Vollis proposed the new tag amenity=funeral_hall to enable proper mapping of buildings (or parts of buildings) dedicated to funeral services, which may or may not be religious and are often within a cemetery or linked to one.
  • Supaplex030 is proposing the tag kerb=regular to explicitly distinguish kerbs (curbs) with ‘normal’ standard height from kerb=raised. The voting is open until 3 September.

Community

  • Nirab Pudasaini created a lot of graphics analysing his own OSM twitter dataset containing over 45,000 tweets containing “OpenStreetMap”. Cluster analysis reveals some sub-communities such as osm.fr, OSM enterprises with Mapbox and Mapillary, the OSM community from the Americas, the sub community from Africa, clusters focused on teaching and learning about OSM, clusters around map based data viz centred around @qgis, and many more.
  • User Arjun has written a blog post about ‘Downloading Vector Data for Highways, National Parks and Other Common Features from OpenStreetMap for Geospatial Analysis’.
  • Regional Ambassadors Chomba Chishala and Yusuf Suleiman wrote about how YouthMappers are contributing to the 2030 agenda for sustainable development.
  • Christopher Beddow started a poll on Twitter about how people contribute to OSM, specifically if they use mobile phones as their primary or only method. Many users commented in response about editing OSM on mobiles as compared to on a computer.
  • The OpenStreetMap Ops team has now enabled a new secondary database server, which once fully tested will likely become the primary server.
  • The August newsletter of OpenStreetMap US has been published.
  • María Fernanda Peña Valencia shared her experience being part of YouthMappers and encourages other students to join the network.

OpenStreetMap Foundation

  • What type of memberships are there in the OSM Foundation? Apply for a free membership if you have mapped on 42 or more days in the past year.
  • Didn’t reach the 42-day mapping goal, but contributed to the success of OpenStreetMap? Then you too can become a voting member of the OSMF for free. Here is the form where you can specify these activities.
  • While keeping up activities on proprietary social media platforms to be visible in those spaces, OSMF board member Tobias Knerr assures us that all effective participation in the OSM community will remain possible through free channels (free means accessible through open-source software and open protocols, and not requiring an account at a third-party service to access).
  • OpenStreetMap US has made an application to become an official local group of the OSMF (we covered earlier). Joost Schouppe asked, on behalf of the OSMF Board, for people to express their opinions on the wiki page by 5 September.
  • OSM Kosovo, or Free Libre Open Source Software Kosova (FLOSSK) submitted an application to the OSMF to become an official local group. Mikel Maron, on behalf of the OSMF, asked for expressions of opinion by 7 September.
  • OSM Buildings has existed since 2012. Cesium has applied for the right to use the name ‘Cesium OSM Buildings’ and has been approved by the OSMF Board. According to the project OSM Buildings had expressed concerns and asked for the opportunity to clarify the situation. OSMF and Cesium published on the product ‘Cesium OSM Buildings’ (we covered earlier). It was also announced that Cesium is now a silver sponsor of OSMF. OSM Buildings now fears that the use of the project name by Cesium will affect the continued existence of the project.

Events

  • The call for venues for State of the Map Africa 2021 is open! Find all the details on what is required and how to submit your proposal on the wiki page. The deadline for proposals is 20 October. The winner will be announced on 31 October.
  • FOSS4G Tokyo/KANSAI/Tokai (regional events in Japan) have announced (ja) their joint online event FOSS4G 2020 Online, and invited event sponsors and session participants.
  • FOSS4G Hokkaido (ja), one of the regional events in Japan, will be held online on 26 September. Applications for sessions have started.
  • Nakaner has been invited to present OpenStreetMap at the Eurobike 2020 expo, as almost all apps and devices for cyclists nowadays include OSM data. The (free of charge) expo will take place online on 3 September. The conference language is English.

Humanitarian OSM

  • HOT and the GAL School of Peru have been working together to support the Peruvian and local governments in their COVID-19 response, by mapping the Cusco region and providing analytical tools and expertise to make use of the data.

Education

Maps

  • The French geography blog decryptageo (fr) featured Tatsuo Mitsuchi’s (ja) fork of anvaka’s website. Tatsuo Mitsuchi created a website where the orientation of a city’s road network is represented by different colours.
  • Elana Levin Schtulberg creates beautiful drawings by colouring regions of city maps in a ‘painting-by-numbers’ style.
  • Thomas Froitzheim presented PhoneMaps (de), an easy-to-use smartphone app with Europe-wide free OSM offline vector maps and improved routing on bicycle and hiking trails as well as on mountain bike tracks.
  • Supaplex continued writing about the history of mapping in Taiwan. In the metropolitan area of Hsinzhu, there was an almost empty space compared to other areas of Taiwan on OpenStreetMap. Unfortunately there used to be only low-resolution Bing imagery available. Now that better data is available, most of the ponds on the tableland have been mapped.

Open Data

  • Mapillary has announced that computer vision extracted data is now available globally on OSM. Previously only traffic signs were available in JOSM, and other data was available by request in iD. Now data including pedestrian crossings (crosswalks), traffic lights, bicycle parking, and many other classes are available globally, thanks to image contributions from Mapillary users.
  • OpenAQ aggregates environmental sensor data from all around the world. Sensor.Community published a guide to building your own similar measuring station.

Software

  • The tool used in ‘Generation Streets’, a game available on Steam, to generate 3D worlds based on OpenStreetMap has been open sourced. Rvtgen3d is the command line tool and is part of the rsgeotools toolset, now available on GitHub.

Did you know …

OSM in the media

  • [1] ‘The typo heard around the world’ (we covered earlier) continues to garner press coverage in English, French, Russian, Italian, Taiwanese Mandarin, Japanese, Norwegian, Spanish, and German (we won’t insult your intelligence by providing auto-translate links but do have a read of Ilya Zverev’s take on the issue). In other news, some scientists believe there may be areas in the upper Amazon basin where the news of Nathan Wright’s typo has not yet spread.
  • Rebecca Firth explained (it) > en (video in English) how the OpenStreetMap humanitarian team (HOT) is using open source software to put one billion people on the map over the next five years.
  • Antônio Heleno Caldas Laranjeira wrote (pt) > en about territories hidden by Google Maps.

Other “geo” things

  • The Cloud GIS Market Status and Forecast report lists OpenStreetMap among the most significant players.
  • Google has updated their map style with more details and colour.
  • David Pollack wrote about ‘GPS and The Future of Indoor Navigation’.
  • Matt Parker published a YouTube video about his attempt to determine if the effect of terrain is included in estimates of country sizes and whether this would affect the rankings of mountainous countries.
  • Peter Van Geit had planned to spend six months this year fast hiking 5,000 km and exploring new passes across the Indian Himalaya. After the pandemic hit, foiling his plans, he started a new project — creating detailed hiking maps of the area.
  • What happens in our brain when we look at a map? Do grid lines help us to orientate? The team led by Prof. Dr. Frank Dickmann, Ruhr University Bochum, Germany, has investigated (de) > en these questions.
  • Robert Murrah investigates why maps are important for our response to the COVID-19 pandemic.

Upcoming Events

Where | What | When | Country
London | Missing Maps London Mapathon | 2020-09-01 | United Kingdom
Salt Lake City / Virtual | OpenStreetMap Utah Map Night | 2020-09-01 | United States
Stuttgart | Stuttgarter Stammtisch | 2020-09-02 | Germany
San José | South Bay Civic Hack & Map Night | 2020-09-03 | United States
Taipei | OSM x Wikidata #20 | 2020-09-07 | Taiwan
Lyon | Rencontre mensuelle | 2020-09-08 | France
Berlin | 147. Berlin-Brandenburg Stammtisch | 2020-09-10 | Germany
Munich | Münchner Treffen | 2020-09-10 | Germany
Zurich | 121. OSM Meetup Zurich | 2020-09-11 | Switzerland
Leoben | Stammtisch Obersteiermark | 2020-09-12 | Austria
Ashurst | Trek View New Forest Pano Party | 2020-09-13 | United Kingdom
Lüneburg | Lüneburger Mappertreffen | 2020-09-15 | Germany
Salt Lake City / Virtual | OpenStreetMap Utah Map Night | 2020-09-15 | United States
Kandy | 2020 State of the Map Asia | 2020-10-31 to 2020-11-01 | Sri Lanka

Note: If you would like to see your event here, please put it into the calendar. Only data which is there will appear in weeklyOSM. Please check your event in our public calendar preview and correct it, where appropriate.

This weeklyOSM was produced by AnisKoutsi, Elizabete, LorenzoStucchi, MatthiasMatthias, Nordpfeil, NunoMASAzevedo, Polyglot, Rogehm, SK53, Sammyhawkrad, Supaplex, TheSwavu, YoViajo, derFred, jinalfoflia, k_zoar.

Wiki Loves Monuments in 2020

05:30, Sunday, 30 2020 August UTC

Even in complicated times like 2020, we will organize Wiki Loves Monuments, albeit in a slightly modified form.

Because of the coronavirus pandemic, the number of in-person activities has been reduced to a minimum, and most countries will not organize any in-person activities (photo walks, lectures, workshops, etc.) at all. However, as far as your local health advisories permit, you can still photograph buildings by yourself, taking all safety precautions as advised by local health officials and keeping in mind the advice of the World Health Organization.

Traditionally, Wiki Loves Monuments is organized in September. This year, some countries will organize their national competition in October or November. Please check the national website for more information. This will also mean that the international jury process will happen much later than usual. You can expect an international announcement on the winners in February 2021.

Finally, there may be some special prizes this year. Stay tuned!

Malayalam Spellchecker version 1.1.1 released

11:54, Saturday, 29 2020 August UTC

A new version of the Malayalam spell checker based on mlmorph is available as a Python library.

Install the library:

    $ pip install mlmorph_spellchecker

Sample usage:

    >>> from mlmorph_spellchecker import SpellChecker
    >>> spellchecker = SpellChecker()
    >>> word = "ഉച്ഛാരണം"
    >>> spellchecker.spellcheck(word)
    False
    >>> spellchecker.candidates(word)
    ['ഉച്ചാരണം']
    >>> spellchecker.spellcheck("ചിത്രകാരൻ")
    True

The new version adds a database of commonly mistaken Malayalam words for quick checks and correction. If the given word is present in that list, the spellcheck result and correction suggestions will be based on that database.
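The quick-check idea described above can be sketched roughly as follows. This is only an illustration of the lookup-first pattern, not the library's actual implementation: `COMMON_MISTAKES`, `check_word`, and the `slow_check` callback are hypothetical names, and the single table entry is simply the word pair from the sample usage.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical table: commonly mistaken spelling -> accepted spelling.
COMMON_MISTAKES: Dict[str, str] = {
    "ഉച്ഛാരണം": "ഉച്ചാരണം",
}

CheckResult = Tuple[bool, List[str]]


def check_word(word: str, slow_check: Callable[[str], CheckResult]) -> CheckResult:
    """Return (is_correct, suggestions), consulting the quick table first."""
    if word in COMMON_MISTAKES:
        # Known mistake: answer immediately from the table.
        return False, [COMMON_MISTAKES[word]]
    # Otherwise fall back to the slower, analysis-based check.
    return slow_check(word)


def always_correct(word: str) -> CheckResult:
    """Stand-in for a morphology-based check (assumption for this sketch)."""
    return True, []


print(check_word("ഉച്ഛാരണം", always_correct))
```

The table lookup is a constant-time dictionary access, so frequent mistakes never reach the more expensive fallback.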

Manjari version 1.910 released

04:40, Friday, 28 2020 August UTC

A new version of the Manjari typeface is available now. The new version adds about 25 Latin glyphs that are considered important by Google Fonts checks. Manjari is now integrated with the Fontbakery font quality check in its CI. Some bugs reported by Fontbakery are also fixed. It is available at the SMC website. The change log is available on GitLab. SMC has also started to publish the font releases on a new release file server - releases.

An unusual place to find community

16:43, Thursday, 27 2020 August UTC

Dr. Azmina Bhayani is a family physician. She recently completed one of our Wiki Scholars courses sponsored by the Society of Family Planning. She practices in New York City and is particularly interested in reproductive health and medical education.

Azmina Bhayani

Community means different things to different people. Wikipedia says community is “a social unit…with commonality such as norms, religion, values, customs, or identity. Communities may share a sense of place situated in a given geographical area…or in virtual space through communication platforms.” Through the Society of Family Planning Wiki Scholars Program, I was pleasantly surprised to learn that while it may seem vast and amorphous, Wikipedia is a place to find community.

As a scholar, I was connected with like-minded health care professionals and public health researchers. Our common goal was to create or improve Wikipedia articles related to family planning and reproductive health, so the most accurate information was available to the world. Through our weekly chats, I learned about the other scholars’ areas of interest and got ideas about how to improve my own articles about infections of the female reproductive system and postpartum medical care.

We were incredibly fortunate to learn from Wikipedia experts, who showed us not only the basics of how to edit and create articles on Wikipedia, but also that Wikipedia works because it is created and maintained by a community of folks across the globe. I find it inspiring that individuals take on the responsibility of various roles in order to keep Wikipedia functional, accurate, and true to its mission. These members offer friendly pushback when articles are edited and also engage in less glamorous tasks like checking article references and correcting grammar. This powerful force ensures that Wikipedia stays alive and well, and makes me feel proud of the work my cohort of scholars and others engage in to deliver accurate information to the public.

So, to anyone on the fence about whether you should edit Wikipedia articles, I say go for it! You will not only learn how easy it can be to add knowledge to this amazing resource, but also find a sense of community while doing it.

A Teen Threw Scots Wiki Into Chaos and It Highlights a Massive Problem With Wikipedia:

An overview article of the Scots Wikipedia saga, which provides useful context about how editing issues come to happen in smaller-language Wikipedias. Excerpt:  

Back in 2013, a 12-year-old American with an enthusiasm for the Scots language decided to contribute to the Scots Wikipedia. For seven years, they edited tens of thousands of articles with little oversight. Then, a Reddit post blew everything up.

The problem here was the teen in question is not a native Scots speaker and was a prolific contributor for a small wiki. Several Wikipedia admins and editors familiar with the situation have reached out to Gizmodo, and by all accounts, the teen was acting in good faith and meant no harm. It’s the sort of earnest and naive attempt to help that sometimes ends up doing more harm than intended.

That hasn’t stopped the backlash. Sure, you might write off this teen as lacking common sense, but the question remains: How did a single person, a teenager at that, have this much free reign for such a long period of time over what is largely considered to be a reference platform? […]

Scots Wikipedia may be the latest, and perhaps most well-known example thus far, but multiple Wikipedia editors emailed Gizmodo to bring attention to other examples of smaller wiki projects facing similar problems.

Read the whole thing

By Lalitha, Outreachy Intern and Srishti Sethi and Sarah R. Rodlund, Wikimedia Developer Advocacy

In partnership with Wikimedia and other technical organizations, Outreachy provides several rounds of paid internships each year with the aim of supporting diversity in the Open Source and free software movement.

Outreachy invites and encourages anyone “who faces under-representation, systemic bias, or discrimination in the technology industry of their country” to apply to its programs. Outreachy interns hail from around the globe and often work remotely with experienced mentors on Open Source technical projects. 

As an Outreachy coordinator for Wikimedia, one of my duties is to ensure that interns submit their bi-weekly reports on time. I also try to read these reports often to learn about their progress on the projects, how the internship is going, whether there are any challenges that I should be aware of, and so on. Sometimes in these posts, interns share personal stories that make me feel even more connected with them and help me understand where they are coming from, their backgrounds, and their motivations for joining our organization.

This piece of the program is the key for me and is truly uplifting! During the last round of Outreachy, I read this weekly report shared by Lalitha, one of our interns. Reading her post brought me “aha!” moments. I immediately went on to Twitter to document what I captured and found that another intern echoed my sentiments! As an immigrant myself from India, as is Lalitha, it was an eye-opener for me to learn about her struggle as an ESL parent and immigrant in the US and how she spent significant years learning and becoming comfortable in English.

Lalitha is almost my mother’s age, has the desire to learn something new, and is interested in picking up software development skills! It brought me so much joy reading this in the post! I remember, after reading her report, getting up from my chair to walk around the office and sharing this story with my colleagues.

At the end of the Outreachy program, we interviewed Lalitha about her experience.

Here is Lalitha in her own words:

Can you tell us a little about your background?

I am a 54-year-old immigrant mother of two: a son in his thirties and a daughter in high school. I love to cook new healthy recipes, especially with the produce I grow in my garden. Though I came to this country 20 years ago, it took me 10 years to motivate myself to join the local community college to begin my ESL classes. Since I had completed a 3-year college degree in Math but in Telugu medium, I had to start with the basics of the English language.

I used to attribute my children’s obsession with their smartphones to our generational gap and was quite content remaining oblivious to technology. However, I began to notice many of my friends were also active on social media and accustomed to using the latest tech products. Realizing that I could no longer remain a bystander, or keep to societal expectations of me, I decided to not only become a part of the tech space but to contribute to it as well.

My journey in software engineering began with ESL classes to help with my English ten years ago. I then took some math classes to refresh my Math skills. Core computer science classes followed soon after. After taking several Python courses in the fall of 2017, I started preparing for a Bootcamp to accelerate my learning. I took a Hackbright prep course in the summer of 2018, which led me to get accepted into the Fall 2018 Hackbright Software Engineering Bootcamp. 

How did you get to hear about Outreachy? 

Even though I was using many Open Source platforms and products including GitHub, Atom, React, Wikipedia, and others as part of my software education, I wasn’t aware of open-source projects until my son introduced me to the Outreachy program a few months before I applied for the internship.

My situation was unique, with no prior work experience and no peers of my age or background around me to learn and practice programming with. While I was searching for ways to enter the IT industry, the Outreachy opportunity seemed exactly suited to my needs.

Can you tell us about the project you participated in?    

When I was searching the list for a project to choose, I came across the Wiki Education Dashboard. My familiarity with Wikipedia and its open-source nature attracted me, and on top of that, this project was looking for React programmers, which happened to be my interest and an area that I was already learning and familiar with.

My project scope was about converting the campaign view of an instructor from a Ruby and Haml implementation to a React implementation. This was part of the larger project to update all of their code to React so that the whole page is not re-rendered when navigating the dashboard.

My specific assignment was to convert Haml pages that relate to a campaign into React. Campaigns are collections of courses that an instructor is teaching. I broke up the code into components such as campaign, campaign navbar, campaign stats, and campaign home to account for the various parts of the campaign page. 

One of my challenges was to get the data from the server side for the campaign.jsx component using the Redux data flow. Redux is another area that I was initially very confused about. The boilerplate for Redux seemed very complex to understand. There were many interlinked terms in Redux, such as reducers, actions, constants, thunk, selectors, mapStateToProps, dispatch, and connect. It took me a while to get to know all the terms and how Redux sends data as props.

Working remotely, I initially found it challenging to articulate my problems and questions in the right way to my mentors in Slack.

What was it like working with Outreachy mentors?

Both of my mentors have been very supportive of me throughout this internship. Initially, I did have a hard time understanding Sage Ross. The technical jargon he used and the quick pace at which he spoke were challenging. But when I mentioned this to him, he immediately made adjustments to make me feel more comfortable. Whenever I was having a hard time solving a problem, Sage always pair-programmed with me and taught me so much in the process.

Although my other mentor, Khayati Soneji, lives in India, which is on the other side of the world, she always found time to respond to my questions. Once I had a great time talking to her for almost the whole day. She worked with me on Zoom from 11 AM to 3 PM even though it was very late at night for her. We often pair programmed but also found time to connect on a personal level. 

I miss my mentors, especially Sage who always was smiling and talking to us standing next to his desk while we were all sitting.

What are you working on now?

After my Outreachy internship, I have been attending career fairs, applying for jobs, practicing coding problems, whiteboarding, and volunteering at Organiz (https://organiz.org) as a software developer. 

Do you have advice for future Outreachy interns?

I think three months is a very short period of time. My advice would be to ask questions and ask for help without any inhibition including about any language barrier that you may have. Speak up and share the difficulties that you are going through with mentors so that they are aware of your requirements and can help you suitably. 

The next round of Outreachy internships will open in late August 2020. 

To learn more about the program, visit the program page here: https://www.outreachy.org/. Once you clear an initial application phase, you will be able to browse projects available for all mentoring organizations, including Wikimedia, on Outreachy’s program website. You might also want to check out Outreachy’s response to Covid-19, and the policies they revised for students in the previous round. Keep an eye on this page for new information for the upcoming round.

If you are a potential intern or a mentor interested in participating in the program, we look forward to your participation! For questions, come and join Wikimedia’s Zulip chat: https://wikimedia.zulipchat.com/

About this post

Featured image credit: Two-brown-trees, Johannes Plenio, CC BY-SA 4.0

Ceph at WMCS, the numbers and the details

15:52, Monday, 24 2020 August UTC

By Andrew Bogott, Senior Site Reliability Engineer and Brooke Storm, Staff Site Reliability Engineer

We’re currently running Ceph version ‘Nautilus,’ which is the stock version packaged with Debian Buster.

General hardware overview

The details of how Ceph actually works are well beyond the scope of this blog post (more detail can be found here).  In brief (and for our present use case), it splits all data into arbitrary chunks and maintains three copies of each chunk, keeps track of where those copies are, and makes sure that all three eggs are in different baskets.  The software that keeps track of the health of all this runs on hosts called ‘monitor’ hosts. The hosts that actually store and replicate the data are called ‘OSD hosts’.
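To make the "three eggs in different baskets" idea concrete, here is a minimal sketch that assigns each chunk to three distinct hosts. It deliberately uses a simple hash ranking (rendezvous hashing) rather than Ceph's own placement algorithm, and the host names, chunk IDs, and the `place_chunk` function are all hypothetical.

```python
import hashlib
from typing import List

REPLICAS = 3  # three copies of each chunk, as described above


def place_chunk(chunk_id: str, hosts: List[str], replicas: int = REPLICAS) -> List[str]:
    """Pick `replicas` distinct hosts for a chunk by ranking hosts on a per-chunk hash."""
    if replicas > len(hosts):
        raise ValueError("need at least as many hosts as replicas")

    def score(host: str) -> str:
        # Each (chunk, host) pair gets a stable pseudo-random score.
        return hashlib.sha256(f"{chunk_id}:{host}".encode()).hexdigest()

    # Sorting a list of hosts and slicing can never return the same host twice.
    return sorted(hosts, key=score)[:replicas]


# Hypothetical host names for illustration only.
hosts = [f"osd-host-{n:02d}" for n in range(1, 16)]
placement = place_chunk("chunk-0001", hosts)
print(placement)
```

Because the ranking depends only on the chunk ID and the host names, any node can recompute where a chunk's copies live without consulting a central table, which is the general property that hash-based placement schemes aim for.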

Having three copies of everything is great — it means that there’s never a tie about state, and it means that a lot of hardware would have to die at once in order for any data to be lost. Having three copies also means that the total hardware needs are immense.

Our current cluster contains 15 OSD hosts; each host contains eight 1.8 terabyte SSDs.  Altogether that’s about 216 terabytes of raw storage, which is enough to handle all the current VMs plus just a little bit of space for growth.  The good news is that since Ceph worries about redundancy and striping we don’t lose anything to local RAIDs, and expanding the cluster when needed is operationally trivial.
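The arithmetic behind those numbers is easy to check. The sketch below uses the figures from the post; the "usable" figure is a simplifying assumption that just divides raw capacity by the replica count and ignores any filesystem overhead or free-space headroom a real cluster needs.

```python
OSD_HOSTS = 15        # from the post
SSDS_PER_HOST = 8     # from the post
SSD_TB = 1.8          # terabytes per SSD, from the post
REPLICAS = 3          # three copies of every chunk

raw_tb = OSD_HOSTS * SSDS_PER_HOST * SSD_TB
# Assumption: usable space is raw space divided by the number of copies.
usable_tb = raw_tb / REPLICAS

print(f"raw: {raw_tb:.0f} TB, usable with {REPLICAS} copies: {usable_tb:.0f} TB")
# prints "raw: 216 TB, usable with 3 copies: 72 TB"
```

This is why triple replication makes the total hardware needs so large: only about a third of the raw capacity is available for VM data.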

There are also three monitor hosts.  Three is, again, the lucky number that means there’s redundancy but no chance of a tied vote in case of a disagreement.
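The "lucky number" reasoning can be sketched with a simple strict-majority rule. This illustrates the arithmetic only, under the assumption that decisions need more than half of the monitors to agree; it is not a model of Ceph's actual monitor protocol, and the function names are hypothetical.

```python
def quorum_size(monitors: int) -> int:
    """Smallest number of monitors that forms a strict majority."""
    return monitors // 2 + 1


def tolerated_failures(monitors: int) -> int:
    """How many monitors can fail while a majority is still reachable."""
    return monitors - quorum_size(monitors)


for n in (2, 3, 4, 5):
    print(f"{n} monitors: quorum {quorum_size(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

With three monitors a majority is two, so one host can fail and the remaining two can still agree. A fourth monitor would raise the quorum to three without tolerating any additional failures, which is why odd counts are the usual choice.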

Networking

In order to safely maintain redundant data copies, blocks are constantly getting copied, deleted, and rebalanced among the OSD nodes in a probabilistic fashion according to the CRUSH algorithm. That’s a fair amount of network chatter even when things are at rest; if a given node loses power or suffers some other kind of failure, everyone will get to work replacing the lost copies and the network will get extremely busy. Although we don’t estimate our system will be anywhere near the scale needed to cause such a problem, to prevent Ceph from even being able to launch an accidental denial of service attack on the datacenter we’ve installed everything in such a way that the backend traffic cannot flood uplinks for other services. The cluster network for backend traffic isn’t even on a routed VLAN.

Traffic to hypervisors is on separate interfaces to keep things isolated and performant even during minor outages of individual OSD servers. This traffic will be rate-limited at the VM level via the VM’s flavor definition in Openstack in order to prevent individual VMs from flooding the hypervisor’s network capacity.

Planning for the worst

The WMCS team is new to Ceph, which means we’re largely unfamiliar with possible failure scenarios. It has a good reputation for stability, but as we rely on it for ever more storage cases it also becomes an ever-more intimidating disaster risk. To mitigate that we’re pursuing several approaches:

  • We are hiring a short-term consultant, both to provide training and also to be on call in case problems arise that are beyond our understanding.
  • We’re investigating backup solutions for Ceph-hosted volumes. We will probably never have the resources to comprehensively back up every piece of data, but we do hope to have a selective backup process whereby more project-critical VMs can have short-term backups outside of Ceph in case of collapse. More research is needed before we’ll know what level of durability we can commit to.

About this Post

This is Part 2 of 2 posts on Ceph and Wikimedia Cloud Services. Read Part 1.

Featured image credit: Takoyaki, nakashi, CC BY-SA 2.0

Ceph distributed VM storage coming to Cloud Services

15:52, Monday, 24 2020 August UTC

By Andrew Bogott, Senior Site Reliability Engineer

When the Wikimedia cloud platform was first designed in 2011, it relied heavily on distributed storage.  Home directories, project storage, and VM root drives were all located on a GlusterFS file system spread out over a cluster of servers. Gluster was new; we were early adopters, and our use case was outside the initial expectations of the Gluster designers. The result was a bit ugly.  Every week or two the filesystem would get overwhelmed and fall into a ‘split-brain state’ where different hosts in the cluster had different ideas about what the latest version of a file looked like; then admins would have to shut everything down, go through the logs, and make unilateral choices about which file versions lived and which died. Often there was data corruption followed by the inevitable apologetic admin emails: “Sorry, your VM was corrupted and will have to be rebuilt.  Remember: cattle, not pets!”

As is often the case with young projects, every fixed bug had a new bug hiding behind it. It eventually became clear that we weren’t going to be able to provide stability for our users with GlusterFS. Shared files (home directories, etc.) moved to NFS. VM storage was moved off of shared systems entirely.

That is essentially how things remain: each VM has a single, fixed-size drive that is stored on a single physical host (aka ‘hypervisor’).  This approach has served us surprisingly well — RAID storage means that when individual drives fail, we can generally replace them before any data is lost, and we’ve never had a major loss of VM data.  Despite this success, I’ve spent the last five years waiting for the other shoe to drop, because local VM storage has some serious disadvantages:

  • If a hypervisor does suffer total failure (e.g. a misbehaving drive controller), all VMs hosted on that system could be corrupted or lost.  Some of our hypervisors host as many as 80 VMs at once.  As easy as it is to say ‘cattle, not pets,’ it’s clear that most volunteer-run projects don’t have the resources to maintain a fully-puppetized, easily recreated setup; such a failure would be extremely disruptive.
  • Shutting down a hypervisor for maintenance means shutting down all the VMs hosted there.  This is one of the main causes of those ‘Instance Downtime’ emails that we send out periodically.
  • If we want to replace or re-image a hypervisor, we have to copy the complete contents of every VM off of the hypervisor first.  This takes many hours, and results in yet more ‘Instance Downtime’ emails.  When the VMs that need moving are part of a larger cluster, this involves all kinds of elaborate load-balancing and failover dances that require special knowledge of each service involved.
  • File storage is strictly bound to individual VMs.  If someone wants to rebuild a VM with a different operating system or just get a fresh start, they have to download all of their data first, or arrange for a complicated VM-to-VM copying process.

Nearly a decade has elapsed since our original experience with shared storage, and the bruises have mostly healed. In recent years, it’s been more and more obvious that no one building a new cloud today is doing things the way we are; for most public cloud providers, the amount of risk and downtime that comes along with local storage would be simply unacceptable.

Some of these other cloud providers are using a much-more-advanced-and-finally-ready-for-prime-time GlusterFS setup.  Even more are using a different storage solution, Ceph. Based on conventional wisdom (and remaining trauma associated with Gluster), we’ve decided to adopt Ceph as our new storage system.  Last year, Jason Hedden built a small proof-of-concept Ceph cluster and convinced us that it’s a viable option. A few minor WMCS projects have been running on Ceph for several months. Jason has since left to pursue other opportunities, but after many budget discussions and hardware orders, we are now ready to build out a full-sized cluster that will eventually support storage for the entire WMCS platform.

The first use of the new cluster will be to host the most cattle-like VMs in our platform, Toolforge Kubernetes worker nodes. As we iron out various issues and bugs we’ll gradually move more and more VMs off of local storage.

Once everything is Ceph-hosted we’ll start to see the first big payoff: ‘live migration’ of VMs.  With live migration, we can transfer a virtual server from one physical host to another with no interruption in service.  That will finally allow us to do needed hypervisor maintenance without imposing downtime on our users. Ceph will also maintain multiple replicas of every block of VM data, which will cause us to lose a lot less sleep worrying about data loss from hypervisor failure.

Once we’ve secured better stability and uptime for VMs, we hope to also implement attachable block storage.  That will allow users to attach additional storage to running virtual servers, and even detach that storage and reattach it to a different server.

We’re planning to be cautious in our progress to this new platform, so it may be several weeks or months before you start hearing maintenance & downtime announcements about the move.  When you do, you can rejoice in knowing that they might be our last.

About this post

This is Part 1 of a 2-part series of posts on Ceph and Wikimedia Cloud Services. Read Part 2.

Featured image credit: Dumbo Octopus, NOAA, CC BY-SA 2.0

Tech News issue #35, 2020 (August 24, 2020)

00:00, Monday, 24 2020 August UTC
Once there is a stalemate, where positions are entrenched, there is only sniping and little progress. At the English Wikipedia they are adamant; they do not want automatic changes from Wikidata. As a result there is little or no progress making effective use of the information that is at all the Wikipedias and Wikidata. There is room for improvement, improvement that will benefit both English Wikipedia and Wikidata.
Let me explain with an example. In the Gambia they have foreign ministers. Great information can be found at this English article. There is also an incomplete category, incomplete because not all the foreign ministers with an article are included. 

When somebody enters the data for Gambian foreign ministers in Wikidata, the result is best shown using Reasonator. Reasonator shows it best because it can display the result in any and all languages. That is quite relevant because there may be lists in other languages, like in German for instance. The German list has only one red link, the English list has five and the Reasonator list, once completed, will have none.

When you summarise the state of play for lists of positions like this, the presentation of these lists differs greatly while the content is by definition the same. When you want to spare both the cabbage and the goat, it takes extra moves. The Reasonator information for the category shows 24 entries and categories in four languages. It is easy to test whether all the linked articles have a category entry for each language and also whether Wikidata knows these people for the position they hold.

We do not have to put Wikidata in "your face" like we would do with automatically changing infoboxes. Having a system that indicates that attention is needed is a first step for getting used to shared information. Information that comes from all Wikimedia projects and has Wikidata as its intermediary.
Thanks,
       GerardM

weeklyOSM 526

09:55, Sunday, 23 2020 August UTC

11/08/2020-17/08/2020

Project of the month 1 | © Projet du Mois | Map data © OpenStreetMap contributors

Mapping

  • A green alley proposal from 2013 is revived: alley=official_green_alley. A green alley is a service alley that residents embellish with vegetation, such as trees, vines and flowers.
  • Stefan Tauner wonders (de) whether seasonal businesses should be mapped with two nodes. This concerns, for example, bicycle shops that become ski shops, or ice cream shops that are rented out to other businesses in winter.
  • Martin Koppenhoefer has drafted a new wiki page for tree_lined, a tag for indicating that a feature is tree lined. The tag was discussed (de) on the German forum.
  • As part of Ireland’s 2020 Heritage Week, Anne-Karoline Distel created a video showing how to make your own heritage map for historical and heritage societies.
  • The proposed tag leisure=drawing_surface would identify walls designated for graffiti.
  • Woodpeck writes in his blog about why he is now mapping trees in OpenStreetMap and how he is doing that.

Community

  • Kwame Odame, a doctoral researcher at the University of Cape Coast, Ghana, shared his experience working as a YouthMappers Regional Ambassador in Ghana. He gave a brief overview of YouthMappers in Ghana and talked about the trips he’s made, giving insights into students’ interaction and the impact he’s made in his role.
  • Geoffrey Kateregga wrote in his diary about ‘The State of OpenStreetMap in Africa’, analysing among other things buildings in OSM, buildings per 1000 people, and communication channels.
  • The Times of India reports that the OSM Kerala volunteer Manoj Karingamadathil has mapped Kochi Corporation wards at the request of the Ernakulam sub-district collector. The media coverage has highlighted the need for more open data in India.
  • [1] Adrien Pavie is working with a small team to develop ProjetDuMois (fr) > en (project of the month) in France, to encourage thematic contributions to OpenStreetMap during a month. The website will offer the community a dashboard with contribution statistics, a web map for efficient mapping, and badges for gamification. Adrien also wrote about his experiences working with full-history OSM files to share his journey with us, hoping to make it easier for others to work with these files.

Imports

  • n76 blogged on the issues he faced while carrying out an import of buildings and addresses in Orange County, California.

OpenStreetMap Foundation

Events

  • Mappers and members of emergency organisations (de) > en from Switzerland and Austria met online (de) > en on 12 August and decided to form a core group (de) > en, with its own mailing list, to discuss topics specific to the field of public safety and develop solutions. Existing or implemented solutions include: Defikarte.ch (de), Notfallkarten (de), and Feuerwehreinsatzkarten mit OSM (de).
  • The OSM Geography Awareness Week is taking place from 15 to 21 November. A global coalition of partners is hosting mapping events at colleges, community centres and other institutions to map places around the globe. If you are organising an event, or even just thinking about it, get in touch.

Humanitarian OSM

  • In recognition of their 10-year anniversary, HOT hosted a webinar on 21 August with members of four OSM communities from around the world to discuss how microgrants have aided their work and growth.
  • At the 2020 HOT Voting Members Annual Meeting on Wednesday 12 August, the members confirmed the re-election of Gertrude ‘Trudy’ Hope Namitala, from Zambia, and the election of Felix Delattre, from Germany, as a new Board member. Other candidates in the election were Celina Agaton, Maning Sambale, Matseliso Thobei Letsie, Dale Kunce and Willy Franck Sob. Questions to the candidates and detailed results are presented on the Board Election 2020 wiki page. Matthew Gibb was re-elected as chairperson. With the principle of rotating seat elections, two Board seats were up for election this year, while next year the election will cover the other five Board seats.
  • OpenStreetMap India has been active in HOTOSM. A small team has formed to create and manage HOTOSM tasks.
  • Rebecca Firth, of the Humanitarian OpenStreetMap Team (HOT), gave a TED Talk about how HOT is going to work over the next five years to support the humanitarian mapping community, with the goal of mapping an area home to one billion people.

Education

  • The International Journal of Geoinformation (ISPRS), published by MDPI, Switzerland, invites people to submit research papers for an ‘OpenStreetMap as a multi-disciplinary nexus: Perspectives, Practices and Procedures’ issue. The deadline for manuscript submissions is 30 November 2020.

Maps

  • Supaplex explained why there is a blank area located south-east of Taiwan’s Taoyuan International Airport on most of the commercial maps available in Taiwan. There is a former military base in this blank area, the former Taoyuan Airbase of ROC Air Force and ROC Navy. The former military airport was once the home base of the Black Cat Squadron, which flew U-2 surveillance planes to investigate nuclear weapon infrastructure and capacity in China, and is mapped in very high detail on OpenStreetMap.
  • Mapbox has partnered with Zenrin to provide indoor maps for major metro stations and underground buildings in Japan.
  • Visualisations of operational sites and tactical conditions are becoming increasingly important. The fire brigade of Gossau ZH decided (de) > en to use OpenStreetMap for the creation of an emergency map. The project started with hydrants, with the intention of being able to show firefighters the nearest one as quickly as possible. However, the project was quickly expanded to include additional, mission-relevant data.
  • Tatsuo Mitsuchi has created (ja) a map in which the colours are painted according to the direction of the road. The distribution of colours seems to reflect (ja) the topography and history of the city. You can try it out on your favourite city on a test server. This system is based on city-roads.

switch2OSM

  • Microsoft’s new flight simulator uses OpenStreetMap data to procedurally generate cities (we covered earlier). A combination of an ill-conceived university assignment, a bored student, and a typo created a 212-storey house in the outer suburbs of Melbourne, Victoria. The ‘Melbourne Monolisk’ has been attracting attention, including from those who have taken the opportunity to land on it before it disappears forever.

Licences

Software

  • Recently openrouteservice has been updated to version 6.2 by HeiGIT, which has brought some pretty useful features to OSM-based routing. One of the ‘new’ features is the re-introduction of the maximum_speed parameter so users can set the maximum speed their vehicle can travel at.
  • Dirk Stöcker announced that the OpenStreetMap Subversion repository is now read-only (we covered earlier). The JOSM and JMapViewer parts have been moved to the JOSM SVN server. For more detail see the source code and developing plugins wiki pages.

Did you know …

  • … Anonymaps, the crowdsourced sarcasm on Twitter?
  • … the real-time LightningMaps is based on OSM? Also available as an app. Lightning is displayed within two seconds!
  • … the Android app ‘RADARES of Portugal’ (pt)? It is available for Android devices and provides real-time information on speed radars (fixed or mobile) installed on Portuguese roads. The basemap is OSM.

Other “geo” things

  • Frederik Ramm asked (de) on talk-de what the state of the art is for fixed-wing drones and received many hints.
  • Manfred Handschuher discussed (de) the new motorcycle navigation system Garmin Zumo XT.

Upcoming Events

Where | What | When | Country
Derby | Derby pub meetup | 2020-08-25 | United Kingdom
Düsseldorf | Düsseldorfer OSM-Stammtisch | 2020-08-26 | Germany
London | Missing Maps London Mapathon | 2020-09-01 | United Kingdom
Salt Lake City / Virtual | OpenStreetMap Utah Map Night | 2020-09-01 | United States
Stuttgart | Stuttgarter Stammtisch | 2020-09-02 | Germany
Taipei | OSM x Wikidata #20 | 2020-09-07 | Taiwan
Lyon | Rencontre mensuelle | 2020-09-08 | France
Berlin | 147. Berlin-Brandenburg Stammtisch | 2020-09-10 | Germany
Munich | Münchner Treffen | 2020-09-10 | Germany
Zurich | 121. OSM Meetup Zurich | 2020-09-11 | Switzerland
Kandy | 2020 State of the Map Asia | 2020-10-31 to 2020-11-01 | Sri Lanka

Note: If you would like to see your event here, please put it into the calendar. Only data that is in the calendar will appear in weeklyOSM. Please check your event in our public calendar preview and correct it where appropriate.

This weeklyOSM was produced by MatthiasMatthias, NunoMASAzevedo, PierZen, Polyglot, Rogehm, Sammyhawkrad, Guillaume Rischard (Stereo), Supaplex, TheSwavu, YoViajo, derFred, naveenpf, richter_fn, ᚛ᚏᚒᚐᚔᚏᚔᚋ᚜ 🏳️‍🌈.

Monthly Report, June 2020

18:19, Wednesday, 19 2020 August UTC

Highlights

  • June marked the end of our Spring 2020 cohort of courses, and we couldn’t be more proud of our instructors and students. To say that Spring 2020 was a challenging term is a gross understatement, and we’re incredibly grateful to and impressed by all of our instructors and students as they forged ahead with their Wikipedia assignments in spite of the upheavals brought about by the pandemic. Students edited 6,390 articles, created 591 new entries, and added 5.27 million words and 56,700 references.
  • In June we debuted our three-week-long Wikidata Summer Institute. The idea behind this new approach was to condense the length of the course and meet twice a week instead of once. Scheduling has proven to be challenging during the pandemic and this seemed like a worthwhile change to test. Although total edits were significantly lower than in the longer courses, the engagement and enthusiasm of the participants were as strong as ever. Take a look at this Dashboard to see a detailed list of their accomplishments. We were thrilled to work with participants from Western Canada, LACMA, and some independent data consultants.
  • In June, we were happy to work with the American Academy of Developmental Medicine & Dentistry (AADMD) to meet their members virtually. Several AADMD members joined our Wiki Scholars course sponsored by the WITH Foundation, and we enjoyed speaking to members about Wikipedia’s missing coverage of healthcare access for people with disabilities. We’re excited to see the work those members will do in our current course to make Wikipedia more equitable.

Organizational updates

Take our survey

We’ve been publishing this Monthly Report in the same format for many years now. As we turn to our new fiscal year on July 1, we are interested in determining if the report still meets our stakeholders’ needs. We’ve created a survey about our Monthly Report. If you read our report, please take a few minutes to give us your feedback.

Staffing

At the end of June, we said goodbye to several staff members, a painful decision necessitated by the economic uncertainty in the United States due to the COVID-19 pandemic. We express our deep gratitude to the following individuals for their contributions to Wiki Education:

  • Paul Carroll joined us in February as our Director of Institutional Funding. While we didn’t work with Paul long, he dove right into Wiki Education’s world. Paul made every effort from the first day to use his considerable network in the philanthropic community to support our work. We appreciated his enthusiasm for our work.
  • Elysia Webb worked for us as a Wikipedia Expert for nearly two years. She supported countless student editors and participants in our Scholars & Scientists Program, answering questions and providing excellent feedback on work. More recently, she taught some of our Scholars & Scientists Program courses, where her knowledge of Wikipedia enriched the participants’ experiences.
  • Cassidy Villeneuve has served as the voice of our organization since 2017. She has expertly guided our blog, social media, training material development, marketing, and any other communications task we threw her way. Cassidy’s work is read by thousands of our program participants each year as they seek out her clear explanations of complex Wikipedia policies and guidelines, and we know her work will continue increasing our visibility and informing our participants.
  • Ozge Gundogdu’s work as office manager, executive assistant, and human resources manager entailed tracking a lot of little details, from ensuring we paid our rent on time to making sure our time off was accurately recorded. She worked closely with our ED, supported our external accountants, and served as the liaison to the board. Throughout it all, Ozge made everyone feel supported and appreciated.
  • Shalor Toncray single-handedly supported thousands of new Wikipedia editors each year as a Wikipedia Expert. As the first Wikipedian many of our participants engaged with for the last three years, Shalor showcased the best newbie welcome Wikipedia can offer. She was exceptionally dedicated to ensuring Wikipedia’s quality got better while student editors had a great experience.
  • Ryan McGrady worked for Wiki Education for more than five years, serving in a variety of Program Manager roles. He’s overseen our Wikipedia Student Program, our Visiting Scholars Program, and most recently built our Scholars & Scientists Program’s Wikipedia courses. His deep engagement with both Wikipedia and academia shines through everything he does, and his dedication enriched every program he’s worked on. His influence will elevate our work for years to come.
  • Samantha Weald joined Wiki Education in 2014, and for many instructors in our Student Program and participants in our Scholars & Scientists Program, she has been the first person they meet. Her outreach skills have brought participants into every program we’ve run, and her ability to make literally hundreds of individuals feel personally welcomed simultaneously is nothing short of astounding. Samantha has been instrumental in establishing processes that have helped Wiki Education scale our impact with limited resources, and we’re grateful for her adaptability and role as a team player.

We wish these seven individuals all the best in their future endeavors.

Programs

Wikipedia Student Program

Status of the Wikipedia Student Program for Spring 2020 in numbers, as of June 30:

  • 409 Wiki Education-supported courses were in progress (268, or 65%, were led by returning instructors).
  • 7,498 student editors were enrolled.
  • 54% of students were up-to-date with their assigned training modules.
  • Students edited 6,390 articles, created 591 new entries, and added 5.27 million words and 56,700 references.

June marked the end of our Spring 2020 cohort of courses, and we couldn’t be more proud of our instructors and students. To say that Spring 2020 was a challenging term is a gross understatement, and we’re incredibly grateful to and impressed by all of our instructors and students as they forged ahead with their Wikipedia assignments in spite of the upheavals brought about by the pandemic. 

The Student Program team spent much of June closing out courses from the spring term and planning ahead for the fall. We know the Fall term will also be full of uncertainties, and we’re striving to support our program participants in the ways that help them most.

Student work highlights:

The idea that plants, which lack a brain or nervous system, are capable of “behavior” may seem odd to many, but that’s the focus of Elizabeth Van Volkenburgh’s Plant Behavior class. Students in the class created articles on hydraulic signaling in plants, plant nucleus movement, Mechanoreceptors (in plants), root phenotypic plasticity and plant memory. Another student tripled the size of the plant root exudates article, which had been created by a student in an earlier iteration of the class back in 2018, and another expanded the kin selection article, an important concept in evolutionary biology, to include information about plants.

For the fourth consecutive year, students in Ashleigh Theberge’s Meso and Microfluidics in Chemical Analysis course continued to expand and improve the droplet-based microfluidics article that students in a previous class created in 2016. Other students in the class made major improvements to related articles like microfluidic cell culture, digital microfluidics, paper-based microfluidics as well as the main microfluidics article.

Scholars & Scientists Program

Wikidata

We’re excited to be trying out a new version of our Wikidata program. In June we debuted our three-week-long Wikidata Summer Institute. The idea behind this new approach was to condense the length of the course and meet twice a week instead of once. Scheduling has proven to be challenging during the pandemic and this seemed like a worthwhile change to test. Although total edits were significantly lower than in the longer courses, the engagement and enthusiasm of the participants were as strong as ever. Take a look at this Dashboard to see a detailed list of their accomplishments. We were thrilled to work with participants from Western Canada, LACMA, and some independent data consultants.

This group did some excellent work. Check out the newly-improved item for Salome Bey, a Canadian singer, composer, and actor. One participant also did some excellent work on the Plan of laying out the ground of Publick Square, London (Ontario) item.

This new approach to the Wikidata courses will create new opportunities for more participants to take the course and we hope the new schedule will be easier to fit into participants’ already-busy weeks.

Wikipedia

This month we wrapped up our first COVID-themed course and launched a second 6-week intensive course. The idea behind both of these courses is to focus on improving Wikipedia’s coverage of COVID-19 pandemic information, specifically state-specific articles. Since responses vary so much from state to state, capturing this information has become even more vital. This original blog post details how we are able to run a course at no cost to participants in order to shore up this essential content.

From the course that wrapped up, you should spend some time looking at how much the North Dakota article has expanded. Similarly, new sections have appeared on the New York article. Although these courses are condensed, the urgency around this topic places a special emphasis on the timeliness of high-quality information on these articles. We have yet to see how this will impact continued editing after the courses wrap up, but it is our hope that these participants are able to continue to add to these articles and develop them.

Our second COVID-themed course has also begun. We are thrilled to be working with these 17 editors, most of whom have not edited Wikipedia before. The approach to this course is identical to the first one. A set of editors in this group have taken to improving the impact section of the articles. Most impact sections have, so far, only addressed the pandemic’s impact on sports. These participants acknowledge that the impact of the pandemic extends into other aspects of life – education, economy, workplaces, and specific communities of those affected by the pandemic. Spend some time on the South Carolina and Arizona articles to see these newly expanded sections. We’re looking forward to all of the contributions this group will be making to these articles.

We are pleased to announce that we have started a second Wiki Scholar course with the WITH Foundation. As with the first course, the focus of this course will be improving articles in the field of disability and individuals with disabilities. We are just a few weeks into the course, getting to know these 14 excellent participants. As the weeks go by, keep your eye on this Dashboard to track their hard work and new developments.

This month we launched our fourth Wiki Scholars course in partnership with the Society of Family Planning (SFP). As with the previous courses, this course will focus on training SFP members to improve Wikipedia articles related to abortion, contraception and related topics. We know that Wikipedia plays a significant role in the personal research people do about health and medicine, and we are happy to work with SFP to ensure the public has access to the highest quality information about family planning. Visit this Dashboard to keep up with their contributions to free knowledge.

Advancement

Partnerships

In June, we were happy to work with the American Academy of Developmental Medicine & Dentistry (AADMD) to meet their members virtually. Several AADMD members joined our Wiki Scholars course sponsored by the WITH Foundation, and we enjoyed speaking to members about Wikipedia’s missing coverage of healthcare access for people with disabilities. We’re excited to see the work those members will do in our current course to make Wikipedia more equitable. 

We confirmed a Wiki Scientists course with the American Physical Society and will spend July and August recruiting members who are eager to expand Wikipedia’s coverage of physicists. 

Finally, we spent the month recruiting scholars and scientists to participate in our third Wiki Scholars course about state and regional responses to COVID-19. We look forward to seeing the great work this engaged group does to bring regional data to the public.

Communications

Blog posts:

Technology

June was a busy month for Wiki Education technology. We deployed a bevy of updates to the Dashboard and wikiedu.org to prepare for our Wikipedia Classroom Program plans for Fall 2020, and our two Google Summer of Code students, Amit and Shashwat, got off to very productive starts to the official “coding period” of their projects.

For the upcoming Fall 2020 term and the new application process for the Wikipedia Student Program, Chief Technology Officer Sage Ross worked with Student Program Manager Helaine Blumenthal to update our instructor orientation, the Dashboard course creation process and assignment wizard, and many of the automated emails that the Dashboard sends to instructors before, during and after the term. Sage began preparing an FAQ system for the Dashboard, which will replace our earlier question-and-answer system (ask.wikiedu.org) in July. We also finished up and deployed a feature developed by former Outreachy intern and Outreachy mentor Khyati Soneji: tracking contributions to specific sets of articles defined with the PagePile tool, which is an alternative to using Categories or templates to identify a relevant set of pages.

We also fixed several bugs that affected specific browsers, and updated the Dashboard for the domain switch from tools.wmflabs.org to the venerable toolforge.org.

Returning Summer of Code student Amit Joki implemented a set of changes that reduce the amount of JavaScript required to load the Dashboard. This will improve site load times a little bit, and may make a substantial difference for Programs & Events Dashboard users with significant bandwidth limitations. Summer of Code student Shashwat Kathuria integrated better error tracking into the Dashboard stats update processes, so that each course can now show program organizers when there has been a problem importing recent editing activity for their event. These projects continue through August, and we anticipate both students will move on to some of their stretch goals for the final weeks.

Finance & Administration

The total expenditures for the month of June were $167K, ($16K) under the budget of $183K. The Board was under ($9K) by moving the Board Meeting from In-person to Remote. Fundraising was over budget +$3K due to employment costs +$4K while under ($1K) in Travel. General & Administrative were over +$7K due to Indirect overhead allocation change +$5K, Payroll Costs +$1K, and Administrative Costs +$4K while under in Professional Fees ($2K) and Location Expense ($1K). Programs were under by ($17K) including Payroll ($7K), Travel ($2K), Communications ($2K) and Indirect costs ($6K).

Year-to-date expenses of $2,219K were ($58K) under the budget of $2,277K. The Board was under budget by ($16K) due to a combination of +$4K in payroll costs while under ($20K) in Board Meeting expenses. Fundraising was over +$19K due to interim consulting work +$10K and Payroll +$10K, while under ($1K) in Indirect Costs. General & Administrative were over +$129K: +$136K in Indirect Cost allocations, +$8K in payroll, +$7K in Travel, +$4K in Furniture and Office Expenses, and +$4K in Communications, while under budget ($6K) in Professional Fees and ($24K) in Occupancy Costs. Programs were under ($190K), of which ($136K) were Indirect Costs, ($62K) Travel, ($22K) Communications, ($3K) Office Supplies, and ($3K) Professional Fees, while showing an overage of +$36K in payroll.
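
As a quick cross-check of the figures in the two paragraphs above, the per-department variances can be summed and compared with the stated totals. The sketch below is only an illustration, not part of Wiki Education's reporting: the dictionary names and the helper function are hypothetical, amounts are transcribed in thousands of US dollars, and the year-to-date totals are read as thousands of dollars (an assumption, since that is the only reading under which the stated ($58K) difference holds).

```python
# Variances in thousands of USD, transcribed from the report text.
# Negative = under budget, positive = over budget.
june_variances = {
    "Board": -9,
    "Fundraising": 3,
    "General & Administrative": 7,
    "Programs": -17,
}
ytd_variances = {
    "Board": -16,
    "Fundraising": 19,
    "General & Administrative": 129,
    "Programs": -190,
}


def total_variance(by_department):
    """Sum the department-level variances into one overall figure."""
    return sum(by_department.values())


june_total = total_variance(june_variances)
ytd_total = total_variance(ytd_variances)

# Actual minus budget, using the totals stated in the report
# (year-to-date totals assumed to be in thousands of dollars).
june_stated = 167 - 183
ytd_stated = 2219 - 2277

print("June:", june_total, "vs stated", june_stated)
print("Year-to-date:", ytd_total, "vs stated", ytd_stated)
```

Under these assumptions the department-level figures add up to the overall variances quoted for both June and the year to date.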

Office of the ED

Current priorities:

  • Finalizing the annual plan & budget for fiscal year 2020–21
  • Annual Board Meeting (Zoom)
  • Dealing with the effects of the COVID-19 pandemic on our organization

In early June, Frank sent the final version of the annual plan and budget for fiscal year 2020–21 to the members of Wiki Education’s board. Given the severity of the impact of the COVID-19 pandemic on the economy and the uncertainty around the endowments of institutional funders, we don’t expect to generate the revenue that would be necessary for keeping Wiki Education’s operations on the same level as in the past. That’s why the new annual plan calls for moving the organization fully online to save the money we’re currently spending on our office space in the Presidio of San Francisco. Furthermore, the plan for fiscal year 2020–21 calls for reducing Wiki Education’s headcount significantly, yet in a way that will allow the organization to provide our core services to an extent that is reasonable under the current conditions.

On June 5th and 6th, the board held its annual board meeting. Due to the coronavirus pandemic, this year’s board meeting took place virtually through Zoom. On the first day, Frank reflected on the past year, which – until COVID-19 hit the United States – had been on a very good trajectory. Then, Frank presented the annual plan for next fiscal year, followed by LiAnna and Sage who talked about our plans for Programs and Technology. Subsequently, the board approved the plan & the budget. On day two, the board renewed the terms of some of its members and then discussed the current status of Philanthropy and Education in the United States before moving into the Executive Session.

During the rest of the month, Frank, supported by Ozge, dealt with the layoffs due to COVID-19.