The annual Wikimedia Hackathon brings people with different skill sets together to improve the software that powers Wikimedia websites. This edition took place at the Prague National Library of Technology and provided an opportunity for volunteer technologists to meet in person, share ideas, have fun, and work together to improve the software that Wikipedia and its sister projects depend on to ensure free knowledge is available and accessible to anyone with an internet connection.

Each year the Wikimedia Hackathon takes place in a different location. This year, volunteers with WMCZ—an independent, nonprofit affiliate organization—did the work of planning and hosting a diverse group of attendees, who ranged from long-term contributors to Wikimedia projects to brand-new members of the community.

Natalia, WMCZ’s event organizer, describes her experience: “It was both challenging and exciting to work on organizing this technical event. Our ultimate goal was to ensure the best working conditions for participants so they could focus on what they came to Prague for. This would not have been possible if I had not had an amazing team which was well organized, proactive and deeply involved.”

The planning was worth it! For three days, we saw people who are used to collaborating in online spaces working together in a physical one. Attendees shared their knowledge and skills in real time. They discussed and informed each other about ongoing projects. Experienced community members enthusiastically mentored and helped newcomers to get their hands on code.

As Jon describes it, “Everywhere I look, someone is helping somebody solve the latest problem, or sharing a laugh, or furiously coding to wrap up their project so they have something to show.”

It was truly inspiring to observe and participate with others in a real space. Whether people spent their time one-on-one, attended organized sessions on software development, or gathered together in front of their laptops at tables in the hacking area, we know that every collaboration has a real effect, and that we were able to help facilitate that. What happened at the Prague Hackathon will make the lives of readers, editors, translators, multimedia creators, open data enthusiasts, and developers easier.

It’s especially inspiring to see new attendees. Nearly half of this year’s attendees were at their first Wikimedia Hackathon!

Gopa, who attended his first Wikimedia Hackathon and works on an online tool which will allow users to cut videos uploaded to Wikimedia Commons, shares that “hacking through the code for 3 continuous days and developing something productive is a great learning experience.”

More than 300 software projects, trainings, discussion sessions and activities proposed by attendees took place over the three days of the Hackathon, and hundreds of code changes were made to improve the user experience on Wikimedia websites.

On the last day, dozens of achievements were presented by attendees in a showcase session. These included:

  • Tonina worked on improving the search function (see screenshot above). She added a new feature which allows you to sort your search results on Wikimedia websites by the date a wiki page was edited or created.
  • Florian wrote an extension called PasswordlessLogin. His proof of concept allows you to log into a Wikimedia website with your smartphone without having to enter your password.

 
The next large gathering of Wikimedians to work on Wikimedia software will take place at Wikimania Stockholm in August 2019.

Andre Klapper, Developer Advocate (Contractor), Technical Engagement
Sarah Rodlund, Technical Writer, Technical Engagement

Wikimedia Foundation

Last chance to sign up for July Wikidata courses!

17:45, Monday, 10 June 2019 UTC

Like Wikipedia, Wikidata is a collaborative online community that organizes knowledge and presents it to the world for free. This global repository is important for many reasons, chief among them that the data stored in Wikidata is machine readable. That means when you ask Alexa or Siri a question, it’s likely that the answer is coming from Wikidata. By engaging in Wikidata’s community, you have the power to equip millions of people with accurate and comprehensive information.

Wiki Education is offering online courses and in-person workshops for those interested in learning how to contribute to Wikidata and answer their own research questions using its analytical tools. Take a course or attend a workshop with us this July! Enroll one participant for $800, or enroll two or more participants from your institution for $500 each. To reserve your seat, please fill out your registration by June 14th. The payment deadline is June 24th.

Wikidata as a platform allows librarians (and researchers in general!) to do things we’ve all only dreamt of before. What’s so great about it is that once you have a grip on linked data practices and the mechanics of working within Wikidata, the applications are endless. The repository is only expanding (there are already 56 million items). Soon, thousands of institutions around the world will have connected their collections so that we can all query them. Here are just a few applications you can pursue after taking our Wikidata course(s):

  • Elevate the visibility of your collections by mapping your items in the global repository that is Wikidata.
  • Draw new insights about your collections using Wikidata’s customizable visualization and query tools.
  • Gain a comprehensive understanding of existing research by tracking and linking faculty publications.
  • Develop an equitable and inclusive ontology for linked data.
  • Teach students data literacy by incorporating Wikidata / metadata practices into the classroom.

If you’re new to linked data, check out our beginner course. If you have some experience with linked data (not necessarily Wikidata), check out our intermediate course. And if you’re in the DC or New York area, sign up for one of our in-person workshops!

So much is possible with Wikidata. We’ll help you discover how it can best work for your goals.


To explore more options or to sign up to receive updates about future courses, visit data.wikiedu.org.

Ewan McAndrew (centre) at the University of Edinburgh Spy Week Wikipedia edit-a-thon – image by Mihaela Bodlovic CC BY-SA 4.0

Wikimedia UK is very pleased to announce that our partners at the University of Edinburgh have been awarded the Innovative Use of Technology award for their use of Wikipedia in the Curriculum at the Herald Higher Education Awards 2019.

University of Edinburgh Wikimedian in Residence Ewan McAndrew has been leading on this work in Edinburgh, running dozens of Wikimedia events since he began his residency in January 2016, and developing innovative projects and partnerships across the university.

The award is well-deserved recognition for Ewan’s hard work in changing the perception of Wikipedia within academia, and the progress that has been made in the understanding of Wikimedia projects as important teaching tools.

Ewan attended the LILAC information literacy conference at Nottingham University in April, where he saw evidence of the growing recognition of Wikipedia as a learning platform. The University of Edinburgh was also awarded Wikimedia UK’s Partnership of the Year award in 2018.

As of April 2019, Ewan has delivered a total of 156 training sessions, trained 635 students, 419 staff, and 260 members of the public, and helped create 476 Wikipedia articles and improve 1946 articles.

Courses at the University which now include a Wikipedia assignment include: World Christianity MSc, Translation Studies MSc, History MSc (Online), Global Health MSc, Digital Sociology MSc, Data Science for Design MSc, Language Teaching MSc, Psychology in Action MSc, Digital Education MSc, Public Health MSc and Reproductive Biology Honours. Working with the Wikimedia projects not only allows students to improve the skills any university wants to develop (such as critical reading, summarising, paraphrasing, original writing, referencing, citing, publishing, and data handling), but also allows them to have influence beyond the university, with their work reaching and influencing the thousands of people who read Wikipedia.

Wikimedia UK hopes that the long-term success of Ewan’s residency will encourage other universities to also employ Wikimedians to mainstream the use of Wikimedia projects as teaching and learning tools in UK universities. This trend seems to be taking effect already, as Coventry University’s Disruptive Media Learning Lab has recently employed Andy Mabbett as a part-time Wikimedian in Residence, and we hope that other universities will follow suit.

We would like to thank Melissa Highton, the Director of Learning, Teaching and Web Services at the University, for her vision and support of the residency, and Allison Littlejohn, whose research was crucial in showing that it was a worthwhile endeavour. Dr Martin Poulter’s work as Wikimedian in Residence at the Bodleian Libraries, Oxford, was also instrumental in demonstrating the worth of a Wikimedian in Residence in a university setting.

We wish Ewan and the rest of the team at the University of Edinburgh all the best, and look forward to their further recognition as trailblazers of learning innovation in the higher education sector.

 

@Wikipedia: #notability versus #relevance

11:40, Monday, 10 June 2019 UTC
I had a good discussion with a Wikipedia admin who is, in my opinion, a deletionist. For me the biggest takeaway was how notability gets in the way of relevance.

With statements like “There are only two options, one is that the same standards apply, and the other is the perpetuation of prejudice” and “I view our decisions of notability as primarily subjective--decisions based on individual values and understandings of what WP should be like”, little or no room is given for contrary points of view.

The problem with notability is that it enables such a personal POV, while relevance is about what others want to read. For professor Bart O. Roep there is no article. Given two relevant diabetes-related awards he should be notable, and as he started a human study of a vaccine for type 1 diabetes, he should be extremely relevant.

A personal POV that ignores the science in the news has its dangers. It is easy enough for Wikimedians to learn about scientific credentials; the papers are there to read. But what we write is not for us but for our public. Withholding articles opens our public up to fake facts and fake science. An article about Mr Roep is therefore relevant and timely, particularly because people die as they cannot afford their insulin. Articles about the best of what science has to offer on diabetes are of extreme relevance right now.

At Wikidata, there is no notability issue. Given the relevance of diabetes, all that is needed is to concentrate effort for a few days on the subject. New authors and papers are connected to what we already have, genders are added to authors (to document the gender ratio), and as a result more objective facts become available for the subjective Wikipedia admins to consider, particularly when they accept tooling like Scholia to open up the available data.
Thanks,
      GerardM

Introducing the codehealth pipeline beta

06:02, Monday, 10 June 2019 UTC

After many months of discussion, work and consultation across teams and departments[0], and with much gratitude and appreciation for the hard work and patience of @thcipriani and @hashar, the Code-Health-Metrics group is pleased to announce the introduction of the code health pipeline. The pipeline is currently in beta and enabled for GrowthExperiments, soon to be followed by Notifications, PageCuration, and StructuredDiscussions. (If you'd like to enable the pipeline for an extension you maintain or contribute to, please reach out to us via the comments on this post.)

What are we trying to do?

The Code-Health-Metrics group has been working to define a set of common code health metrics. The code health factors we currently consider are simplicity, readability, testability, and buildability. Beyond analyzing a given patch set for these factors, we also want to have a historical view of code as it evolves over time. We want to be able to see which areas of code lack test coverage, where refactoring a class due to excessive complexity might be called for, and where possible bugs exist.

After talking through some options, we settled on a proof of concept that integrates Wikimedia's gerrit patch sets with SonarQube as the hub for analyzing and displaying metrics on our code[1]. SonarQube is a Java project that analyzes code according to a set of rules. SonarQube has a concept of a "Quality Gate", which can be defined organization-wide or overridden on a per-project basis. The default Quality Gate says that over 80% of the code added in a patch set must be covered by tests, less than 3% of it may contain duplicated lines of code, and the maintainability, reliability and security ratings must each be graded as an A. If code passes these criteria then we say it has passed the quality gate; otherwise it has failed.

Here's an example of a patch that failed the quality gate:

screenshot of sonarqube quality gate

If you click through to the report, you can see that it failed because the patch introduced an unused local variable (code smell), so the maintainability score for that patch was graded as a C.

How does it integrate with gerrit?

For projects that have been opted in to the code health pipeline, submitting a new patch or commenting with "check codehealth" will result in the following actions:

  1. The mwext-codehealth-patch job checks out the patch set and installs MediaWiki.
  2. PHPUnit is run and a code coverage report is generated.
  3. npm test:unit is run, which may generate a code coverage report if the package.json file is configured to do so.
  4. The sonar-scanner binary runs, sending 1) the code, 2) the PHP code coverage, and 3) the JavaScript code coverage to Sonar (a rough sketch of this step follows the list).
  5. After Sonar is done analyzing the code and coverage reports, the pipeline reports whether the quality gate passed or failed. A failure does not prevent the patch from being merged.
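To make step 4 above a little more concrete, here is a rough, illustrative sketch of a sonar-scanner invocation that sends code plus PHP and JavaScript coverage reports to SonarQube. The property names are standard sonar-scanner parameters, but the project key, host URL and report paths shown here are placeholders rather than the exact values used by the Wikimedia CI jobs.

# Illustrative only: placeholder project key, host and coverage report paths.
sonar-scanner \
  -Dsonar.projectKey=mediawiki-extensions-GrowthExperiments \
  -Dsonar.sources=. \
  -Dsonar.host.url=https://sonarcloud.io \
  -Dsonar.php.coverage.reportPaths=coverage/php/coverage.xml \
  -Dsonar.javascript.lcov.reportPaths=coverage/js/lcov.info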
pipeline screenshot

If you click the link, you'll be able to view the analysis in SonarQube. From there you can also view the code of a project and see which lines are covered by tests, which lines have issues, etc.

Also, when a patch merges, the mwext-codehealth-master-non-voting job executes, which updates the default view of the project in SonarQube with the latest code coverage and code metrics.[3]

What's next?

We would like to enable the code health pipeline for more projects, and eventually we would like to use it for core. One challenge with core is that it currently takes ~2 hours to generate the PHPUnit coverage report. We also want to gather feedback from the developer community on false positives and unhelpful rules. We have tried to start with a minimal set of rules that we think everyone could agree with but are happy to adjust based on developer feedback[2]. Our current list of rules can be seen in this quality profile.

If you'll be at the Hackathon, we will be presenting on the code health pipeline and SonarQube at the Code health and quality metrics in Wikimedia continuous integration session on Friday at 3 PM. We look forward to your feedback!

Kosta, for the Code-Health-Metrics group


[0] More about the Code Health Metrics group: https://www.mediawiki.org/wiki/Code_Health_Group/projects/Code_Health_Metrics, currently comprised of Guillaume Lederrey (R), Jean-Rene Branaa (A), Kosta Harlan (R), Kunal Mehta (C), Piotr Miazga (C), Željko Filipin (R). Thank you also to @daniel for feedback and review of rules in SonarQube.
[1] While SonarQube is an open source project, we currently use the hosted version at sonarcloud.io. We plan to eventually migrate to our own self-hosted SonarQube instance, so we have full ownership of tools and data.
[2] You can add a topic here https://www.mediawiki.org/wiki/Talk:Code_Health_Group/projects/Code_Health_Metrics
[3] You might have also noticed a post-merge job over the last few months, wmf-sonar-scanner-change. This job did not incorporate code coverage, but it did analyze most of our extensions and MediaWiki core, and as a result there is a set of project data and issues that might be of interest to you. The Issues view in SonarQube might be interesting, for example, as a starting point for new developers who want to contribute to a project and want to make some small fixes.

Tech News issue #24, 2019 (June 10, 2019)

00:00, Monday, 10 June 2019 UTC

Grafana, Graphite and maxDataPoints confusion for totals

The title is a little wordy, but I hope you get the gist. I just spent 10 minutes staring at some data on a Grafana dashboard, comparing it with some other data, and finding the numbers didn’t add up. Here is the story in case it catches you out.

The dashboard

The dashboard in question is the Wikidata Edits dashboard hosted on the Wikimedia Grafana instance that is public for all to see. The top of the dashboard features a panel that shows the total number of edits on Wikidata in the past 7 days. The rest of the dashboard breaks these edits down further, including another general edits panel on the left of the second row. 

The problem

The screenshot above shows that the top edit panel is fixed to show the last 7 days (this can be seen by looking at the blue text in the top right of the panel). The second edits panel on the left of the second row is also currently displaying data for the last 7 days (this can be seen by looking at the range selector on the top right of the dashboard).

The outlines of the two graphs in the panels appear to follow the same general shape. However, the panels show different totals for the edits made in the window. The first panel reports 576k edits in 1 week, but the second panel reports 307k. What on earth is going on?

Double-checking the data against another source, I found that both numbers here are totally off. For a single day the total edits are closer to 700k, which scales up to 4-5 million edits per week.

hive (event)> select count(*)
            > from mediawiki_revision_create
            > where `database` = "wikidatawiki"
            > and meta.dt between "2018-09-09T02:00Z" and "2018-09-10T02:00Z"
            > and year=2018 and month=9 and (day=9 or day=10)
            > ;
.....
_c0
702453
Time taken: 24.991 seconds, Fetched: 1 row(s)

maxDataPoints

The Graphite render API used by Grafana has a parameter called maxDataPoints which determines the maximum number of data points to return. The docs are slightly more detailed, saying:

Set the maximum numbers of datapoints for each series returned when using json content.
If for any output series the number of datapoints in a selected range exceeds the maxDataPoints value then the datapoints over the whole period are consolidated.
The function used to consolidate points can be set using the consolidateBy function.

Graphite 1.14 docs

Reading the documentation of the consolidateBy function, we find the problem:

The consolidateBy() function changes the consolidation function from the default of ‘average’ to one of ‘sum’, ‘max’, ‘min’, ‘first’, or ‘last’.

Graphite 1.14 docs

As the default consolidation function of ‘average’ is used, the total value on the dashboard will never be correct. Instead we get the total of the averages.

Fixes for the dashboard

I could set the maxDataPoints parameter to 9999999 for all panels; that would mean the previous assumptions would hold true. Grafana would be getting ALL of the data points in Graphite and correctly totaling them. I gave it a quick shot, but it probably isn’t what we want. We don’t need that level of granularity.

Adding consolidateBy(sum) should do the trick. And in the screenshot below we can now see that the totals make sense and roughly line up with our estimates.
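For reference, here is roughly how that looks at the Graphite render API level. The metric path below is made up for illustration, but target, from, maxDataPoints and format are real render API parameters, and consolidateBy is the function quoted from the docs above.

# Wrap the series in consolidateBy(..., 'sum') so that when Graphite
# consolidates down to maxDataPoints it sums instead of averaging.
# The metric path is a placeholder.
curl "https://graphite.example.org/render?target=consolidateBy(sumSeries(wikidata.edits.*),'sum')&from=-7d&maxDataPoints=100&format=json"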

For now I have actually set the second panel to have a maxDataPoints value of 9999999. As the data is stored at minutely granularity, this means roughly 19 years of minutely data can be accessed. When looking at the default of 7 days, that equates to 143KB of data.

Continued confusion and misdirection

I have no doubt that Grafana will continue to trip me and others up with little quirks like this. At least the tooltip for the maxDataPoints option explains exactly what the option does, although this is hidden by default on the current Wikimedia version.

Data data everywhere. If only it were all correct.

The post Grafana, Graphite and maxDataPoints confusion for totals appeared first on Addshore.

Wikidata is 6

18:42, Sunday, 09 June 2019 UTC

It was Wikidata’s 6th birthday on the 30th of October 2018. WMUK celebrated this with a meetup on the 7th of November. They also made this great post-event video.

Video from WMUK hosted Wikidata birthday event

Celebrated all over the world

The 6th birthday was celebrated in over 40 different locations around the world, according to the Wikidata item for the birthday.

Presents

A variety of Wikidata-related presents were made by volunteers. The presents can be found on Wikidata:Sixth_Birthday on the right-hand side and include various scripts, tools, dashboards and lists.

Next year

The 7th birthday will again take the form of a WikidataCon conference.

Watch this space…

The post Wikidata is 6 appeared first on Addshore.

Wikidata Map October 2018

18:42, Sunday, 09 June 2019 UTC

It has been another 6 months since my last post in the Wikidata Map series. In that time Wikidata has gained 4 million items, 1 property with the globe-coordinate data type (coordinates of geographic centre) and 1 million items with coordinates [1]. Each Wikidata item with a coordinate is represented on the map with a single dim pixel. Below you can see the areas of change between this new map and the one generated in March. To see the equivalent change in the previous 4 months take a look at the previous post.

Comparison of March 26th and October 1st maps in 2018

Daniel Mietchen believes that lots of the increased coverage could probably be attributed to Cebuano Bot. (link needed).

Areas of increase

Below I have extracted sections of the map that have shown significant increase in items.

If you know why these areas saw an increase, such as a wikiproject or individual working on the area, then please leave a comment below and I’ll be sure to add explanations for each area.

If you think I have missed an area also please leave a comment and I’ll add that too!

Africa

Some areas within Africa can be picked out as having specific increases:

  • Republic of Cameroon
  • Gabonese Republic
  • Democratic Republic of the Congo
  • People’s Democratic Republic of Algeria
  • Republic of Djibouti

The increase in the coverage of the African continent in general could be down to Wikimania 2018, which was held in Cape Town. Cape Town itself doesn’t show any real increase in items in the 6-month period and is not included in the image above. Mexico also saw an increase in the number of items in Wikidata in the area when Wikimania was hosted there in 2015.

Asia

The main areas of increase here appear to be:

  • Jakarta
  • Indonesia
  • Bangkok
  • North Korea

Europe

The main areas of increase here appear to be:

  • Scotland
  • Ireland
  • Norway
  • Finland
  • Latvia
  • Greece
  • Croatia
  • Cyprus (while not in Europe) can be seen in the bottom right

North America

There is a general increase across the whole of North America, most notably the west of the continent and Canada.

The Dominican Republic can also be seen in bright colour to the bottom right of the image.

South America

South America has a general increase throughout, however various areas appear highlighted such as:

  • Colombia
  • Chile
  • São Paulo & Brazil

Smaller snippets

Iceland

Sri Lanka & Maldives

Fiji

Footnotes

[1] Number of items with coordinates based on grepping the generated wdlabel.json file used by the map generation.
addshore@stat1005:~$ grep -o ",\"Q" wdlabel-20181001.json | wc -l
6765828
addshore@stat1005:~$ grep -o ",\"Q" wdlabel-20180326.json | wc -l
5764875

Links

The October 2018 images: https://tools.wmflabs.org/wikidata-analysis/20181001/geo2png/

The post Wikidata Map October 2018 appeared first on Addshore.

wikibase-docker, Mediawiki & Wikibase update

18:42, Sunday, 09 June 2019 UTC

Today on the Wikibase Community User Group Telegram chat I noticed some people discussing issues with upgrading Mediawiki and Wikibase using the docker images provided for Wikibase.

As the wikibase-registry is currently only running Mediawiki 1.30, I should probably update it to 1.31, which is the next long-term stable release.

This blog post was written as I performed the update and is yet to be proofread, so expect some typos. I hope it can help those that were chatting on Telegram today.

Starting state

Documentation

There is a small amount of documentation in the wikibase docker image README file that talks about upgrading, but this simply tells you to run update.php.

Update.php has its own documentation on mediawiki.org.
None of this helps you piece everything together for the docker world.
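For the docker world, the missing piece is simply that update.php has to be run inside the wikibase container, roughly along these lines (the service name here matches the docker-compose file below; the exact invocation I used is shown later in this post):

# Run the MediaWiki schema updater inside the running wikibase container.
docker-compose exec wikibase php ./maintenance/update.php --quick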

Installation

The installation creation process is documented in this blog post, and some customization regarding LocalSettings and extensions was covered here.
The current state of the docker-compose file can be seen below with private details redacted.

This docker-compose file is found in /root/wikibase-registry on the server hosting the installation. (Yes I know that’s a dumb place, but that’s not the point of this post.)

version: '3'

services:
  wikibase:
    image: wikibase/wikibase:1.30-bundle
    restart: always
    links:
      - mysql
    ports:
     - "8181:80"
    volumes:
      - mediawiki-images-data:/var/www/html/images
      - ./LocalSettings.php:/var/www/html/LocalSettings.php:ro
      - ./Nuke:/var/www/html/extensions/Nuke
      - ./ConfirmEdit:/var/www/html/extensions/ConfirmEdit
    depends_on:
    - mysql
    environment:
      MW_ADMIN_NAME: "private"
      MW_ADMIN_PASS: "private"
      MW_SITE_NAME: "Wikibase Registry"
      DB_SERVER: "mysql.svc:3306"
      DB_PASS: "private"
      DB_USER: "private"
      DB_NAME: "private"
      MW_WG_SECRET_KEY: "private"
    networks:
      default:
        aliases:
         - wikibase.svc
         - wikibase-registry.wmflabs.org
  mysql:
    image: mariadb:latest
    restart: always
    volumes:
      - mediawiki-mysql-data:/var/lib/mysql
    environment:
      MYSQL_DATABASE: 'private'
      MYSQL_USER: 'private'
      MYSQL_PASSWORD: 'private'
      MYSQL_RANDOM_ROOT_PASSWORD: 'yes'
    networks:
      default:
        aliases:
         - mysql.svc
  wdqs-frontend:
    image: wikibase/wdqs-frontend:latest
    restart: always
    ports:
     - "8282:80"
    depends_on:
    - wdqs-proxy
    environment:
      BRAND_TITLE: 'Wikibase Registry Query Service'
      WIKIBASE_HOST: wikibase.svc
      WDQS_HOST: wdqs-proxy.svc
    networks:
      default:
        aliases:
         - wdqs-frontend.svc
  wdqs:
    image: wikibase/wdqs:0.3.0
    restart: always
    volumes:
      - query-service-data:/wdqs/data
    command: /runBlazegraph.sh
    environment:
      WIKIBASE_HOST: wikibase-registry.wmflabs.org
    networks:
      default:
        aliases:
         - wdqs.svc
  wdqs-proxy:
    image: wikibase/wdqs-proxy
    restart: always
    environment:
      - PROXY_PASS_HOST=wdqs.svc:9999
    ports:
     - "8989:80"
    depends_on:
    - wdqs
    networks:
      default:
        aliases:
         - wdqs-proxy.svc
  wdqs-updater:
    image: wikibase/wdqs:0.3.0
    restart: always
    command: /runUpdate.sh
    depends_on:
    - wdqs
    - wikibase
    environment:
      WIKIBASE_HOST: wikibase-registry.wmflabs.org
    networks:
      default:
        aliases:
         - wdqs-updater.svc

volumes:
  mediawiki-mysql-data:
  mediawiki-images-data:
  query-service-data:

Backups

docker-compose.yml

So that you can always return to your previous configuration, take a snapshot of your docker-compose file.

If you have any other mounted files it also might be worth taking a quick snapshot of those.
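Something as simple as a dated copy is enough here; the file names below are just an example.

# Keep dated copies of the compose file and any other mounted config.
cp docker-compose.yml docker-compose.yml.bak-20190129
cp LocalSettings.php LocalSettings.php.bak-20190129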

Volumes

The wikibase docker-compose example README has a short section about backing up docker volumes using the loomchild/volume-backup docker image.
So let’s give that a go.

I’ll run the backup command for all 3 volumes used in the docker-compose file, which cover the 3 locations that persist data I care about.

docker run -v wikibase-registry_mediawiki-mysql-data:/volume -v /root/volumeBackups:/backup --rm loomchild/volume-backup backup mediawiki-mysql-data_20190129
docker run -v wikibase-registry_mediawiki-images-data:/volume -v /root/volumeBackups:/backup --rm loomchild/volume-backup backup mediawiki-images-data_20190129
docker run -v wikibase-registry_query-service-data:/volume -v /root/volumeBackups:/backup --rm loomchild/volume-backup backup query-service-data_20190129

Looking in the /root/volumeBackups directory I can see that the backup files have been created.

ls -lahr /root/volumeBackups/ | grep 2019
-rw-r--r-- 1 root root 215K Jan 29 16:40 query-service-data_20190129.tar.bz2
-rw-r--r-- 1 root root  57M Jan 29 16:40 mediawiki-mysql-data_20190129.tar.bz2
-rw-r--r-- 1 root root  467 Jan 29 16:40 mediawiki-images-data_20190129.tar.bz2

I’m not going to bother checking that the backups are actually complete here, but you might want to do that!
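For completeness, restoring one of these archives with the same image would look roughly like this (a sketch only; I have not run it as part of this upgrade):

# Restore the mysql volume from the archive created above.
docker run -v wikibase-registry_mediawiki-mysql-data:/volume -v /root/volumeBackups:/backup --rm loomchild/volume-backup restore mediawiki-mysql-data_20190129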

Prepare the next version

Grab new versions of extensions


The wikibase-registry has a couple of extensions shoehorned into it, mounted through volume mounts in the docker-compose file (see above).

We need new versions of these extensions for Mediawiki 1.31 while leaving the old versions in place for the still running 1.30 version.

I’ll do this by creating a new folder, copying the existing extension code into it, and then fetching and checking out the new branch.

# Make copies of the current 1.30 versions of extensions
root@wbregistry-01:~/wikibase-registry# mkdir mw131
root@wbregistry-01:~/wikibase-registry# cp -r ./Nuke ./mw131/Nuke
root@wbregistry-01:~/wikibase-registry# cp -r ./ConfirmEdit ./mw131/ConfirmEdit

# Update them to the 1.31 branch of code
root@wbregistry-01:~/wikibase-registry# cd ./mw131/Nuke/
root@wbregistry-01:~/wikibase-registry/mw131/Nuke# git fetch origin REL1_31
From https://github.com/wikimedia/mediawiki-extensions-Nuke
 * branch            REL1_31    -> FETCH_HEAD
root@wbregistry-01:~/wikibase-registry/mw131/Nuke# git checkout REL1_31
Branch REL1_31 set up to track remote branch REL1_31 from origin.
Switched to a new branch 'REL1_31'
root@wbregistry-01:~/wikibase-registry/mw131/Nuke# cd ./../ConfirmEdit/
root@wbregistry-01:~/wikibase-registry/mw131/ConfirmEdit# git fetch origin REL1_31
From https://github.com/wikimedia/mediawiki-extensions-ConfirmEdit
 * branch            REL1_31    -> FETCH_HEAD
root@wbregistry-01:~/wikibase-registry/mw131/ConfirmEdit# git checkout REL1_31
Branch REL1_31 set up to track remote branch REL1_31 from origin.
Switched to a new branch 'REL1_31'

Define an updated Wikibase container / service

We can run a container with the new Mediawiki and Wikibase code alongside the old container without causing any problems; it just needs a name.

So below I define this new service, called wikibase-131, using the same general details as my previous wikibase service but pointing to the new versions of my extensions, and add it to my docker-compose file.

Note that no port is exposed, as I don’t want public traffic here yet, and also no network aliases are yet defined. We will switch those from the old service to the new service at a later stage.

wikibase-131:
    image: wikibase/wikibase:1.31-bundle
    restart: always
    links:
      - mysql
    volumes:
      - mediawiki-images-data:/var/www/html/images
      - ./LocalSettings.php:/var/www/html/LocalSettings.php:ro
      - ./mw131/Nuke:/var/www/html/extensions/Nuke
      - ./mw131/ConfirmEdit:/var/www/html/extensions/ConfirmEdit
    depends_on:
    - mysql
    environment:
      MW_ADMIN_NAME: "private"
      MW_ADMIN_PASS: "private"
      MW_SITE_NAME: "Wikibase Registry"
      DB_SERVER: "mysql.svc:3306"
      DB_PASS: "private"
      DB_USER: "private"
      DB_NAME: "private"
      MW_WG_SECRET_KEY: "private"

I tried running this service as is but ran into an issue with the change from 1.30 to 1.31. (Your output will be much more verbose if you need to pull the image.)

root@wbregistry-01:~/wikibase-registry# docker-compose up wikibase-131
wikibase-registry_mysql_1 is up-to-date
Creating wikibase-registry_wikibase-131_1 ... done
Attaching to wikibase-registry_wikibase-131_1
wikibase-131_1   | wait-for-it.sh: waiting 120 seconds for mysql.svc:3306
wikibase-131_1   | wait-for-it.sh: mysql.svc:3306 is available after 0 seconds
wikibase-131_1   | wait-for-it.sh: waiting 120 seconds for mysql.svc:3306
wikibase-131_1   | wait-for-it.sh: mysql.svc:3306 is available after 1 seconds
wikibase-131_1   | /extra-entrypoint-run-first.sh: line 3: MW_ELASTIC_HOST: unbound variable
wikibase-registry_wikibase-131_1 exited with code 1

The wikibase:1.31-bundle docker image includes the Elastica and CirrusSearch extensions, which were not part of the 1.30 bundle, and due to the entrypoint infrastructure added along with them I will need to change some things to continue without using Elastic for now.

Fix MW_ELASTIC_HOST requirement with a custom entrypoint.sh

The above error message shows that the error occurred while running extra-entrypoint-run-first.sh which is provided as part of the bundle.
It is automatically loaded by the base image entry point.
The bundle now also runs some extra steps as part of the install for wikibase that we don’t want if we are not using Elastic.

If you give the entrypoint file a read through you can see that it does a few things:

  • Makes sure the required environment variables are passed in
  • Waits for the DB server to be online
  • Runs extra scripts added by the bundle image
  • Does the Mediawiki / Wikibase install on the first run (if LocalSettings does not exist)
  • Run apache

This is a bit excessive for what the wikibase-registry requires right now, so let’s strip it down, saving the result next to our docker-compose file as /root/wikibase-registry/entrypoint.sh.

#!/bin/bash

REQUIRED_VARIABLES=(MW_ADMIN_NAME MW_ADMIN_PASS MW_WG_SECRET_KEY DB_SERVER DB_USER DB_PASS DB_NAME)
for i in ${REQUIRED_VARIABLES[@]}; do
    eval THISSHOULDBESET=\$$i
    if [ -z "$THISSHOULDBESET" ]; then
    echo "$i is required but isn't set. You should pass it to docker. See: https://docs.docker.com/engine/reference/commandline/run/#set-environment-variables--e---env---env-file";
    exit 1;
    fi
done

set -eu

/wait-for-it.sh $DB_SERVER -t 120
sleep 1
/wait-for-it.sh $DB_SERVER -t 120

docker-php-entrypoint apache2-foreground

And mount it into the wikibase-131 service that we have created by adding a new volume entry.

volumes:
      - ./entrypoint.sh:/entrypoint.sh

Run the new service alongside the old one

Running the service now works as expected.

root@wbregistry-01:~/wikibase-registry# docker-compose up wikibase-131
wikibase-registry_mysql_1 is up-to-date
Recreating wikibase-registry_wikibase-131_1 ... done
Attaching to wikibase-registry_wikibase-131_1
{snip, boring output}

And the service appears in the list of running containers.

root@wbregistry-01:~/wikibase-registry# docker-compose ps
              Name                             Command               State          Ports
-------------------------------------------------------------------------------------------------
wikibase-registry_mysql_1           docker-entrypoint.sh mysqld      Up      3306/tcp
wikibase-registry_wdqs-frontend_1   /entrypoint.sh nginx -g da ...   Up      0.0.0.0:8282->80/tcp
wikibase-registry_wdqs-proxy_1      /bin/sh -c "/entrypoint.sh"      Up      0.0.0.0:8989->80/tcp
wikibase-registry_wdqs-updater_1    /entrypoint.sh /runUpdate.sh     Up      9999/tcp
wikibase-registry_wdqs_1            /entrypoint.sh /runBlazegr ...   Up      9999/tcp
wikibase-registry_wikibase-131_1    /bin/bash /entrypoint.sh         Up      80/tcp
wikibase-registry_wikibase_1        /bin/bash /entrypoint.sh         Up      0.0.0.0:8181->80/tcp

Update.php

From here you should now be able to get into your new container with the new code.

root@wbregistry-01:~/wikibase-registry# docker-compose exec wikibase-131 bash
root@40de55dc62fc:/var/www/html#

And then run update.php

In theory updates to the database, and anything else, will always be backward compatible for at least 1 major version, which is why we can run this update while the site is still being served from Mediawiki 1.30.

root@40de55dc62fc:/var/www/html# php ./maintenance/update.php --quick
MediaWiki 1.31.1 Updater

Your composer.lock file is up to date with current dependencies!
Going to run database updates for wikibase_registry
Depending on the size of your database this may take a while!
{snip boring output}
Purging caches...done.

Done in 0.9 s.

Switching versions

The new service is already running alongside the old one, and the database has already been updated, now all we have to do is switch the services over.

If you want a less big-bang approach you could probably set up a second port exposing the updated version and direct a different domain or subdomain to that location, but I don’t go into that at all here.

Move the “ports” definition and “networks” definition from the “wikibase” service to the “wikibase-131” service. Then recreate the container for each service using the updated configuration. (If you have any other references to the “wikibase” service in the docker-compose.yml file, such as in depends_on, then you will also need to change those.)

root@wbregistry-01:~/wikibase-registry# docker-compose up -d wikibase
wikibase-registry_mysql_1 is up-to-date
Recreating wikibase-registry_wikibase_1 ... done
root@wbregistry-01:~/wikibase-registry# docker-compose up -d wikibase-131
wikibase-registry_mysql_1 is up-to-date
Recreating wikibase-registry_wikibase-131_1 ... done

If everything has worked you should see Special:Version reporting the newer version, which we now see on the wikibase-registry.

Cleanup

Now that everything is updated we can stop and remove the previous “wikibase” service container.

root@wbregistry-01:~/wikibase-registry# docker-compose stop wikibase
Stopping wikibase-registry_wikibase_1 ... done
root@wbregistry-01:~/wikibase-registry# docker-compose rm wikibase
Going to remove wikibase-registry_wikibase_1
Are you sure? [yN] y
Removing wikibase-registry_wikibase_1 ... done

You can then do some cleanup:

  • Remove the “wikibase” service definition from the docker-compose.yml file, leaving “wikibase-131” in place.
  • Remove any files or extensions (older versions) that were only loaded by the old service you have now removed (a rough sketch of this follows below).
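For the wikibase-registry that cleanup might look roughly like the following (paths and image tag as used earlier in this post; double-check before deleting anything):

# Remove the old 1.30 extension checkouts that only the removed service mounted,
# and the now-unused 1.30 image.
rm -rf /root/wikibase-registry/Nuke /root/wikibase-registry/ConfirmEdit
docker rmi wikibase/wikibase:1.30-bundle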

Further notes

There are lots of other things I noticed while writing this blog post:

  • It would be great to move the env vars out of the docker-compose and into env var files.
  • The default entrypoint in the docker images is quite annoying after the initial install and if you don’t use all of the features in the bundle.
  • We need a documentation hub? ;)

The post wikibase-docker, Mediawiki & Wikibase update appeared first on Addshore.

Wikidata Architecture Overview (diagrams)

18:42, Sunday, 09 June 2019 UTC

Over the years diagrams have appeared in a variety of forms covering various areas of the architecture of Wikidata. Now, as the current tech lead for Wikidata, it is my turn.

Wikidata has slowly become a more and more complex system, including multiple extensions, services and storage backends. Those of us that work with it on a day-to-day basis have a pretty good idea of the full system, but it can be challenging for others to get up to speed. Hence, diagrams!

All diagrams can currently be found on Wikimedia Commons using this search, and are released under CC-BY-SA 4.0. The layout of the diagrams with extra whitespace is intended to allow easy comparison of diagrams that feature the same elements.

High level overview

High level overview of the Wikidata architecture

This overview shows the Wikidata website, running Mediawiki with the Wikibase extension in the left blue box. Various other extensions are also run such as WikibaseLexeme, WikibaseQualityConstraints, and PropertySuggester.

Wikidata is accessed through a Varnish caching and load balancing layer provided by the WMF. Users, tools and any 3rd parties interact with Wikidata through this layer.

Off to the right are various other external services provided by the WMF. Hadoop, Hive, Oozie and Spark make up part of the WMF analytics cluster for creating pageview datasets. Graphite and Grafana provide live monitoring. There are many other general WMF services that are not listed in the diagram.

Finally we have our semi-persistent and persistent storage, which is used directly by Mediawiki and Wikibase. This includes Memcached and Redis for caching, SQL (MariaDB) for primary metadata, Blazegraph for triples, Swift for files and Elasticsearch for search indexing.

Getting data into Wikidata

There are two ways to interact with Wikidata, either the UI or the API.

The primary UI is JS based and itself interacts with the API. The JS UI covers most of the core functionality of Wikibase, with the exception of some small features such as merging of entities (T140124, T181910).

A non-JS UI also exists covering most features. This UI is composed of a series of Mediawiki SpecialPages. Due to the complexities around editing statements there is currently no non-JS UI for this.

The API and UIs interact with Wikidata entities stored as Mediawiki pages, saving changes to persistent storage and doing any other necessary work.
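As a quick illustration of the API side, reading an entity goes through the regular MediaWiki action API; Q42 here is just an arbitrary example item.

# Fetch the JSON for a single entity via the Wikibase action API.
curl "https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q42&format=json"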

Wikidata data getting to Wikipedia

Wikidata clients within the Wikimedia cluster can use data from Wikidata in a variety of ways. The most common and automatic way is the generation of the “Languages” sidebar on projects, linking to the same article in other languages.

Data can also be accessed through the property parser function and various Lua functions.

Once entities are updated on wikidata.org that data needs to be pushed to client sites that are subscribed to the entity. This happens using various subscription metadata tables on both the clients and the repo (wikidata.org) itself. The Mediawiki job queue is used to process the updates outside of a regular web request, and the whole process is controlled by a cron job running the dispatchChanges.php maintenance script.

For wikidata.org multiple copies of the dispatchChanges script run simultaneously, looking at the list of client sites and the changes that have happened since updates were last pushed, determining if updates need to be pushed, and queueing jobs to actually update the data where needed, causing a page purge on the client. When these jobs are triggered the changes are also added to the client’s recent changes table so that they appear next to other changes for users of the site.
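As a rough sketch of what that cron job runs (the exact options vary by deployment; the path assumes Wikibase is installed in extensions/Wikibase):

# One dispatcher pass; in production several of these run concurrently from cron,
# each picking client wikis that have pending changes to be pushed.
php extensions/Wikibase/repo/maintenance/dispatchChanges.php --wiki wikidatawiki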

The Query Service

The Wikidata query service, powered by Blazegraph, listens to a stream of changes happening on wikidata.org. There are two possible modes: polling Special:RecentChanges, or using a Kafka queue of EventLogging data. Whenever an entity changes, the query service will request new turtle data for the entity from Special:EntityData, munge it (do further processing) and add it to the triple store.
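The per-entity RDF that the updater fetches is the same data anyone can request from Special:EntityData, for example:

# Fetch the turtle (RDF) representation of an entity, as the query service updater does.
curl "https://www.wikidata.org/wiki/Special:EntityData/Q42.ttl"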

Data can also be loaded into the query service from the RDF dumps. More details can be found here.

Data Dumps

Wikidata data is dumped in a variety of formats using a couple of different PHP-based dump scripts.

More can be read about this here.
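For example, the entity dumps end up published on dumps.wikimedia.org and can be fetched directly (this is a large file):

# Download the latest full JSON entity dump of Wikidata.
curl -O https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.json.gz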

The post Wikidata Architecture Overview (diagrams) appeared first on Addshore.

Hacking vs Editing, Wikipedia & Declan Donnelly

18:42, Sunday, 09 June 2019 UTC

On the 18th of November 2018 the Wikipedia article for Declan Donnelly was edited and vandalised. Vandalism isn’t new on Wikipedia; it happens to all sorts of articles throughout every day. A few minutes after the vandalism the change made its way to Twitter, and from there on to some media outlets such as thesun.co.uk and metro.co.uk the following day, with yet another scaremongering and misleading headline using the word “hack”.

“I’m A Celebrity fans hack Declan Donnelly by changing his height on Wikipedia after Holly Willoughby mocks him”

Hacking has nothing to do with it. One of the definitions of hacking is to “gain unauthorized access to data in a system or computer”. What actually happened is that someone, somewhere, edited the article, which everyone is able and authorized to do. Editing is a feature, and it’s the main action that happens on Wikipedia.

The word ‘hack’ used to mean something, and hackers were known for their technical brilliance and creativity. Now, literally anything is a hack — anything — to the point where the term is meaningless, and should be retired.


The word ‘hack’ is meaningless and should be retired – 15 June 2018 by MATTHEW HUGHES

The edit that triggered the story can be seen below. It added a few words to the lead paragraph of the article at 22:04 and was reverted at 22:19, giving it 15 minutes of life on the site.

The resulting news coverage increased the traffic to the article quite dramatically, going from just 500-1000 views a day to 27,000-29,000 for the 2 days following, then slowly subsiding to 12,000 and 9,800 by day 4. This is similar to the uptick in traffic caused by a YouTube video I spotted some time ago, but realistically these upticks happen pretty much every day for various articles for various reasons.

Wikimedia pageviews tool for Declan Donnelly article

I posted about David Cameron’s Wikipedia page back in 2015 when another vandalism edit made some slightly more dramatic changes to the page. Unfortunately the page views tool for Wikimedia projects doesn’t have readily available data going back that far.

Maybe one day people will stop vandalising Wikipedia… Maybe one day people will stop reporting everything that happens online as a “hack”.

The post Hacking vs Editing, Wikipedia & Declan Donnelly appeared first on Addshore.

Creating a Dockerfile for the Wikibase Registry

18:42, Sunday, 09 June 2019 UTC

Currently the Wikibase Registry (setup post) is deployed using the shoehorning approach described in one of my earlier posts. After continued discussion on the Wikibase User Group Telegram chat about different setups and upgrade woes I have decided to convert the Wikibase Registry to use the preferred approach of a custom Dockerfile, building a layer on top of one of the wikibase images.

I recently updated the Wikibase Registry from Mediawiki version 1.30 to 1.31 and described the process in a recent post, so if you want to see what the current setup and docker-compose file look like, head there.

As a summary the Wikibase Registry uses:

  • The wikibase/wikibase:1.31-bundle image from docker hub
  • Mediawiki extensions:
    • ConfirmEdit
    • Nuke

Creating the Dockerfile

Our Dockerfile will likely end up looking vaguely similar to the wikibase base and bundle docker files, with a fetching stage, a possible composer stage and a final wikibase stage, but we won’t have to do anything that is already done in the base image.

FROM ubuntu:xenial as fetcher
# TODO add logic
FROM composer as composer
# TODO add logic
FROM wikibase/wikibase:1.31-bundle
# TODO add logic

Fetching stage

Modifying the logic that is used in the wikibase Dockerfile, the extra Wikibase Registry extensions can be fetched and extracted.

Note that I am using the convenience script for fetching Mediawiki extensions from the wikibase-docker git repo matching the version of Mediawiki I will be deploying.

FROM ubuntu:xenial as fetcher

RUN apt-get update && \
    apt-get install --yes --no-install-recommends unzip=6.* jq=1.* curl=7.* ca-certificates=201* && \
    apt-get clean && rm -rf /var/lib/apt/lists/*

ADD https://raw.githubusercontent.com/wmde/wikibase-docker/master/wikibase/1.31/bundle/download-extension.sh /download-extension.sh

RUN bash download-extension.sh ConfirmEdit;\
bash download-extension.sh Nuke;\
tar xzf ConfirmEdit.tar.gz;\
tar xzf Nuke.tar.gz

Composer stage

None of these extensions require a composer install, so there will be no composer step in this example. If Nuke, for example, required a composer install, the stage would look like this:

FROM composer as composer
COPY --from=fetcher /Nuke /Nuke
WORKDIR /Nuke
RUN composer install --no-dev

Wikibase stage

The Wikibase stage needs to pull in the two fetched extensions and make any other modifications to the resulting image.

In my previous post I overwrote the entrypoint with something much simpler, removing logic to do with Elasticsearch that the Registry is not currently using. In my Dockerfile I have simplified this even further, inlining the creation of a simple 5-line entrypoint, overwriting what was provided by the wikibase image.

I have left the default LocalSettings.php in the image for now, and I will continue to override this with a docker-compose.yml volume mount over the file. This avoids the need to rebuild the image when all you want to do is tweak a setting.

FROM wikibase/wikibase:1.31-bundle

COPY --from=fetcher /ConfirmEdit /var/www/html/extensions/ConfirmEdit
COPY --from=fetcher /Nuke /var/www/html/extensions/Nuke

RUN echo $'#!/bin/bash\n\
set -eu\n\
/wait-for-it.sh $DB_SERVER -t 120\n\
sleep 1\n\
/wait-for-it.sh $DB_SERVER -t 120\n\
docker-php-entrypoint apache2-foreground\n\
' > /entrypoint.sh

If the composer stage was used to run a composer command on something that was fetched, then you would likely need to COPY that extension --from the composer layer rather than the fetcher layer.

Building the image

I’m going to build the image on the same server that the Wikibase Registry is running on, as this is the simplest option. More complicated options could involve building in some Continuous Integration pipeline and publishing to an image registry such as Docker Hub.

I chose the descriptive name “Dockerfile.wikibase.1.31-bundle” and saved the file alongside my docker-compose.yml file.

There are multiple approaches that could now be used to build and deploy the image.

  1. I could add a build configuration to my docker-compose file specifying the location of the Dockerfile as described here then building the service image using docker-compose as described here.
  2. I could build the image separately from docker-compose, giving it an appropriate name, and then simply use that image name (which will exist on the host) in the docker-compose.yml file.

I’m going with option 2.

docker build --tag wikibase-registry:1.31-bundle-1 --pull --file ./Dockerfile.wikibase.1.31-bundle .

docker build documentation can be found here. The command tells docker to build an image from the “Dockerfile.wikibase.1.31-bundle” file, pulling new versions of any images being used and giving the image the name “wikibase-registry” with tag “1.31-bundle-1”.

The image should now be visible in the docker images list for the machine.

root@wbregistry-01:~/wikibase-registry# docker images | grep wikibase-registry
wikibase-registry         1.31-bundle-1       e5dad76c3975        8 minutes ago       844MB

Deploying the new image

In my previous post I migrated from one image to another having two Wikibase containers running at the same time with different images.

For this image change, however, I’ll be going for more of a “big bang” approach, as I’m pretty confident it will work.

The current wikibase service definition can be seen below. This includes volumes for the entrypoint, extensions, LocalSettings and images, some of which I can now get rid of. I have also removed the requirement for most of these environment variables by using my own entrypoint file and overriding LocalSettings entirely.

wikibase-131:
    image: wikibase/wikibase:1.31-bundle
    restart: always
    links:
      - mysql
    ports:
     - "8181:80"
    volumes:
      - mediawiki-images-data:/var/www/html/images
      - ./LocalSettings.php:/var/www/html/LocalSettings.php:ro
      - ./mw131/Nuke:/var/www/html/extensions/Nuke
      - ./mw131/ConfirmEdit:/var/www/html/extensions/ConfirmEdit
      - ./entrypoint.sh:/entrypoint.sh
    depends_on:
    - mysql
    environment:
      MW_ADMIN_NAME: "XXXX"
      MW_ADMIN_PASS: "XXXX"
      MW_SITE_NAME: "Wikibase Registry"
      DB_SERVER: "XXXX"
      DB_PASS: "XXXX"
      DB_USER: "XXXX"
      DB_NAME: "XXXX"
      MW_WG_SECRET_KEY: "XXXX"
    networks:
      default:
        aliases:
         - wikibase.svc
         - wikibase-registry.wmflabs.org

The new service definition has an updated image name, removed redundant volumes and reduced environment variables (DB_SERVER is still used as it is needed in the entrypoint I added).

wikibase-131:
    image: wikibase-registry:1.31-bundle-1
    restart: always
    links:
      - mysql
    ports:
     - "8181:80"
    volumes:
      - mediawiki-images-data:/var/www/html/images
      - ./LocalSettings.php:/var/www/html/LocalSettings.php:ro
    depends_on:
    - mysql
    environment:
      DB_SERVER: "mysql.svc:3306"
    networks:
      default:
        aliases:
         - wikibase.svc
         - wikibase-registry.wmflabs.org

For the big bang switchover I can simply reload the service.

root@wbregistry-01:~/wikibase-registry# docker-compose up -d wikibase-131
wikibase-registry_mysql_1 is up-to-date
Recreating wikibase-registry_wikibase-131_1 ... done

Using the docker-compose images command I can confirm that it is now running from my new image.

root@wbregistry-01:~/wikibase-registry# docker-compose images | grep wikibase-131
wikibase-registry_wikibase-131_1    wikibase-registry        1.31-bundle-1   e5dad76c3975   805 MB

Final thoughts

  • This should probably be documented in the wikibase-docker git repo which everyone seems to find, and also in the README for the wikibase image.
  • It would be nice if there were a single place to pull the download-extension.sh script from, perhaps with a parameter for version?

The post Creating a Dockerfile for the Wikibase Registry appeared first on Addshore.

weeklyOSM 463

08:27, Sunday, 09 June 2019 UTC

28/05/2019-03/06/2019


Multimapas – a combination of many historical, topographic, satellite and road maps 1 | © CC BY 4.0 – Instituto Geográfico Nacional de España | © Leaflet | map data © OpenStreetMap contributors

Mapping

  • The latest version of “How Did You Contribute” by Pascal Neis now uses Osmose to display information about the quality of the user’s edits.
  • Facebook’s Maps Team announced the RapiD Editor based on iD, which enables mappers to convert data they extracted using machine learning into OSM features.
  • WiGeoGIS, a contractor of the fuel station chain OMV, announced plans to improve and maintain fuel stations of the brands OMV, AVANTI, Petrom, FE Trading and EuroTruck in OSM. See also the discussion on the Talk mailing list (May, June).
  • Valor Naram put a revised version of changing_table=* tagging to a vote.
  • Developers of the Maps.me navigator, which uses OSM data, created a validator for underground railways (subways/metros) all over the world.

Community

  • Simon Poole made some suggestions on the Tagging mailing list on how to deal with the increasing number of messages and proposals there. There were lots of replies.
  • Recently the Kazakhstan OSM community created a Telegram chat.

OpenStreetMap Foundation

  • Joost Schouppe reports about the OSMF Board meeting in Brussels (including discussing the survey that was carried out beforehand) on the OSM Foundation’s blog.

Events

  • OpenStreetMap Argentina announces a local State of the Map to be held on Saturday 27th July in Santa Fe.

Humanitarian OSM

  • HOT shares on their blog about an experimental version of the Tasking Manager which incorporates machine learning assistance for various tasks.
  • Simon Johnson writes on Medium about how “big data” in the humanitarian community is often at odds with the idea of “minimum viable data” – the idea that less data is actually more valuable because it’s better quality and easier to verify.
  • In Better Bombing with Machine Learning Frederik Ramm points out that computer vision/machine learning could be used by military forces for aerial bombing, and the OSM community should consider whether we should be so jubilant regarding companies that use those technologies for improving OSM.

Maps

  • Richard Fairhurst tweets an example of using Lua to automatically transliterate names whilst processing OSM data for rendering. Post-processing OSM data in this way often removes the need for many name:<iso-code> tags.
  • The map of electoral districts on the website of the city of Magnitogorsk (Russia) is based on OSM.
  • A map of roads that are or will be repaired has been posted on the website of the Russian federal program “Safe and high-quality roads”. It is based on OSM but first you need to choose a region.

Releases

  • A new stable JOSM version (15155) was released. Category icons and a field for filtering background image settings have been added. Dynamic entries in the Background Images menu are now displayed in submenus, along with many more improvements.
  • The OsmAnd Online GPS Tracker has been updated to version 0.5. New features are contact search, proxy settings, GPS settings and active markers.
  • Thanks to v2.0 of Babykarte, babies will not lose their way anymore. Or rather, it will become easier for parents to find baby and toddler friendly amenities (map).

Did you know …

OSM in the media

  • This short podcast from BBC Radio 4 asks the question: are there more stars in the universe than grains of beach sand on Earth? A contributor to the programme, using the OSM Planet File, computed an approximation for the amount of beach sand on the planet Earth.
  • An interview with Russian mapper Ilya Zverev has been posted (ru) on Habr. He spoke about what he did during his two years on the OSMF Board, why the American OSM society is the most friendly and why you need to participate in offline conferences. (automatic translation)

Other “geo” things

  • How to make a Simpsons-inspired map with expressions.
  • John Murray has been using the recent release of Rapids AI, a data science library for GPUs, to compute distances to everywhere in Great Britain from a point in a few seconds.
  • An extract from a forthcoming book by Barbara Tversky discusses “What makes a good map?”
  • An outdoor clothing company has been sneakily adding photos of its products to Wikipedia articles, in an attempt to get its brand higher up in Google image results. Just another danger of an open wiki system that we must be aware of.
  • Daniel J-H announced the release of a new version of RoboSat, which can detect roads and buildings in aerial imagery.
  • Die Welt has an article (automatic translation) about the best apps for water sports enthusiasts.

Upcoming Events

Where What When Country
Rennes Réunion mensuelle 2019-06-10 france
Bordeaux Réunion mensuelle 2019-06-10 france
Lyon Rencontre mensuelle pour tous 2019-06-11 france
Salt Lake City SLC Mappy Hour 2019-06-11 united states
Zurich OSM Stammtisch Zurich 2019-06-11 switzerland
Bordeaux Réunion mensuelle 2019-06-11 france
Hamburg Hamburger Mappertreffen 2019-06-11 germany
Wuppertal Wuppertaler Stammtisch im Hutmacher 18 Uhr 2019-06-12 germany
Leoben Stammtisch Obersteiermark 2019-06-13 austria
Munich Münchner Stammtisch 2019-06-13 germany
Bochum Mappertreffen 2019-06-13 germany
Berlin 132. Berlin-Brandenburg Stammtisch 2019-06-14 germany
Montpellier State of the Map France 2019 2019-06-14-2019-06-16 france
Essen 5. OSM-Sommercamp und 12. FOSSGIS-Hackingevent im Linuxhotel 2019-06-14-2019-06-16 germany
Kyoto 京都!街歩き!マッピングパーティ:第9回 光明寺 2019-06-15 japan
Dublin OSM Ireland AGM & Talks 2019-06-15 ireland
Santa Cruz Santa Cruz Ca. Mapping Party 2019-06-15 California
Cologne Bonn Airport Bonner Stammtisch 2019-06-18 germany
Lüneburg Lüneburger Mappertreffen 2019-06-18 germany
Sheffield Sheffield pub meetup 2019-06-18 england
Rostock Rostocker Treffen 2019-06-18 germany
Karlsruhe Stammtisch 2019-06-19 germany
London #geomob London 2019-06-19 england
Leobersdorf Leobersdorfer Stammtisch 2019-06-20 austria
Rennes Préparer ses randos pédestres ou vélos 2019-06-23 france
Bremen Bremer Mappertreffen 2019-06-24 germany
Angra do Heroísmo Erasmus+ EuYoutH_OSM Meeting 2019-06-24-2019-06-29 portugal
Salt Lake City SLC Map Night 2019-06-25 united states
Montpellier Réunion mensuelle 2019-06-26 france
Lübeck Lübecker Mappertreffen 2019-06-27 germany
Mannheim Mannheimer Mapathons e.V. 2019-06-27 germany
Düsseldorf Stammtisch 2019-06-28 germany
London OSMUK Annual Gathering including Wikidata UK Meets OSM 2019-06-29 united kingdom
Kyoto 幕末京都オープンデータソン#11:京の浪士と池田屋事件 2019-06-29 japan
Santa Fe State of the Map Argentina 2019 2019-07-27 argentina
Minneapolis State of the Map US 2019 2019-09-06-2019-09-08 united states
Edinburgh FOSS4GUK 2019 2019-09-18-2019-09-21 united kingdom
Heidelberg Erasmus+ EuYoutH_OSM Meeting 2019-09-18-2019-09-23 germany
Heidelberg HOT Summit 2019 2019-09-19-2019-09-20 germany
Heidelberg State of the Map 2019 (international conference) 2019-09-21-2019-09-23 germany
Grand-Bassam State of the Map Africa 2019 2019-11-22-2019-11-24 ivory coast

Note: If you would like to see your event here, please put it into the calendar. Only data which is in the calendar will appear in weeklyOSM. Please check your event in our public calendar preview and correct it where appropriate.

This weeklyOSM was produced by Polyglot, Rogehm, SK53, Silka123, SomeoneElse, TheFive, TheSwavu, YoViajo, derFred, geologist, jinalfoflia.

#Wikidata - Exposing #Diabetes #Research

07:04, Sunday, 09 June 2019 UTC
People die of diabetes when they cannot afford their insulin. There is not much that I can do about it, but I can work in Wikidata on the scholars, the awards and the published papers that have to do with diabetes. The Wikidata tools that are important in this are Reasonator, Scholia and SourceMD, and the ORCiD, Google Scholar and VIAF websites prove themselves essential as well.

One way to stay focused is to concentrate on awards; at this time it is the Minkowski Prize, which is conferred by the European Association for the Study of Diabetes. The list of award winners was already complete, so I concentrated on their papers and co-authors. The first thing to do is to check whether there is an ORCiD identifier and whether that ORCiD identifier is already known in Wikidata. I found that it often is, and merges of Wikidata items may follow. I then submit a SourceMD job to update that author and their co-authors.

The next (manual) step is about gender ratios. Scholia includes a graphic representation of co-authors, and the "white" ones are those for whom no gender has been entered. The process is as follows: when the gender is "obvious", it is just added. For an "Andrea" you look them up in Google and add what you think you see. When a name is given as "A. Winkowsky", you check ORCiD for a full name and iterate the process.

Once the SourceMD job is done, chances are that you have to start the gender process again because of new co-authors. Thomas Yates is a good example of a new co-author, already with a sizable number of papers (95) to his name in Wikidata, though not yet complete (417 in total). Thomas is a "male".

What I achieve is an increasingly rich coverage of everything related to diabetes. The checks and balances ensure a high quality. And as more data is included in Wikidata, people who query it will get better results.

What I personally do NOT do is add authors without an ORCiD identifier. It takes much more effort, and the chance of getting it wrong makes it unattractive as well. In addition, I care for science, but when people are not "Open" about their work I am quite happy for their colleagues to get the recognition they deserve.
Thanks,
      GerardM

This Month in GLAM: May 2019

01:21, Sunday, 09 June 2019 UTC

We’ve recently published the research on performance perception that we did last year. The micro survey used in this study is still running on multiple Wikipedia languages and gives us insights into perceived performance.

The micro survey simply asks users on Wikipedia articles, in their own language, if they think that the current page loaded fast enough:

Let's look at the results on Spanish and Russian Wikipedias, where we're collecting the most data. We have collected more than 1.1 million survey responses on Spanish Wikipedia and close to 1 million on Russian Wikipedia so far. The survey is displayed to a small fraction of our visitors.

How satisfied are our visitors with our page load performance?

Ignoring neutral responses ("I'm not sure"), we see that consistently across wikis between 85 and 90% of visitors find that the page loaded fast enough. That's an excellent score, one that we can be proud of. And it makes sense, considering that Wikipedia is one of the fastest websites on the Web.
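To make the arithmetic concrete, here is a minimal sketch of how such a satisfaction ratio can be computed once the neutral responses are set aside. The counts below are hypothetical and only illustrate the calculation; they are not the actual survey numbers, and the real pipeline aggregates responses from the survey data rather than hard-coded values.

// Hypothetical survey tallies, for illustration only.
#[derive(Clone, Copy)]
enum Response {
    FastEnough,
    NotFastEnough,
    Neutral, // "I'm not sure" - excluded from the ratio
}

fn satisfaction_ratio(responses: &[Response]) -> f64 {
    let (mut yes, mut no) = (0u64, 0u64);
    for r in responses {
        match r {
            Response::FastEnough => yes += 1,
            Response::NotFastEnough => no += 1,
            Response::Neutral => {} // ignored, as described above
        }
    }
    yes as f64 / (yes + no) as f64
}

fn main() {
    // Made-up sample: 860 positive, 140 negative, 100 neutral answers.
    let mut sample = vec![Response::FastEnough; 860];
    sample.extend(vec![Response::NotFastEnough; 140]);
    sample.extend(vec![Response::Neutral; 100]);
    println!("satisfaction: {:.2}%", 100.0 * satisfaction_ratio(&sample));
}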

Now, a very interesting finding is that this satisfaction ratio varies quite a bit depending on whether you’re logged into the website or if, like most Wikipedia visitors, you’re logged out:

wiki status sample size satisfaction ratio
spanish logged in 1,500 89.70%
spanish logged out 1,109,205 85.82%
russian logged in 7,093 92.28%
russian logged out 885,926 85.82%

It appears that logged-in users are consistently more satisfied with our performance than logged-out visitors.

The contributor performance penalty

Andres Apevalov — Press team of Prima Vista Literature Festival, CC BY-SA 4.0

What’s very surprising about logged-in users being more satisfied is that we know for a fact that the logged-in experience is slower, because our logged-in users have to reach our master datacenter in the US instead of hitting the cache point of presence closest to them. This is a long-standing technical limitation of our architecture, and an issue we intend to resolve one day.

Why could they possibly be happier, then?

The Spanish paradox

Spanish Wikipedia, at first glance, seems to contradict this phenomenon of slower page loads for logged-in users. Looking at the desktop site only (to rule out differences in the mobile/desktop mix):

wiki status median loadEventEnd
spanish logged in 1400.5
spanish logged out 1834
russian logged in 1356
russian logged out 1075

The reason why - contrary to what we see on other wikis and at a global scale - Spanish Wikipedia page loads seem faster for logged-in users is that Spanish Wikipedia traffic has a very peculiar geographic distribution. Logged-in users are much more likely to be based in Spain, rather than in Latin American countries, than their logged-out counterparts (30.04% vs. 22.3%). Since internet connectivity tends to be faster in Spain, this difference explains why the logged-in experience appears to be faster - but isn’t - when looking at RUM data at the website level.

This is a very common pitfall of RUM data, where seemingly contradictory results can emerge depending on how you slice the data. RUM data has to be studied from many angles before drawing conclusions.

Caching differences

Looking at the Navigation Timing data we collect for survey respondents, we see that for logged-in users the median connect time on Spanish Wikipedia is 0 and for logged-out users it’s 144ms. This means that logged-in users view a lot of pages and the survey mostly ends up being displayed on their nth viewed page, where n is more than 1, because their browser is already connected to our domain. Whereas for a lot of logged-out users, we capture their first page load, with a higher probability of a cold cache. This means that logged-in users, despite having a (potential) latency penalty of connecting to the US, tend to have more cached assets, particularly the JS and CSS needed by the page. This doesn’t fully compensate for the performance penalty of connecting to a potentially distant datacenter, but it might reduce the variability of performance between page loads.

In order to further confirm this theory, in the future we could try to record information about how much of the JS and CSS was already available in the browser cache at the time the page load happened. This is not information we currently collect. Such data would allow us to confirm whether or not satisfaction is correlated with how well cached the page’s dependencies are, regardless of the user’s logged-in/logged-out status.

Brand affinity?

Becoming a Wikipedia contributor - and therefore, logging in - requires a certain affinity to the Wikipedia project. It's possible, as a result, that logged-in users have a more favourable view of Wikipedia than logged-out users on average. And that positive outlook might influence how they judge the performance of the website.

This is a theory we will explore in the future by asking more questions in the micro survey, in order to determine whether or not the user who responds has a positive view of our website in general. This would allow us to quantify how large the effect of brand affinity might be on performance perception.

Kitty Quintanilla is a second year medical student at Western Michigan University Homer Stryker M.D. School of Medicine. Here, she shares why she’s passionate about increasing access to free information.

Kitty Quintanilla

Have you ever had your parents ramble at you about the “when I was young, we didn’t have all these computers and the internet and smartphones” thing? Mine have done it countless times, but they often sounded sad when they said it, wistful instead of condescending. A lot of older folk like to give our generation flak for using the internet as much as we do, but some like my parents wish they had this kind of thing growing up.

We have billions of pages of information at our fingertips, in seconds. It’s a modern miracle.

But among my family, a bunch of Latinos from El Salvador, there was a considerable portion of them that I noticed were under-educated, had grown up in rural towns and extreme poverty, in a third-world country that didn’t have a lot of technology at all—like basic electricity—let alone computers of any kind. Even now, in their forties, fifties, and sixties, they struggle with modern technology, and if they manage to figure out how to get Google open, they struggle to find things in the language they understand best.

I grew up translating things for my parents and family members. There were countless people who didn’t speak or read English well enough to navigate this country or find the information and help they needed, so as I grew older, I found myself in another position of translation. I worked at Johns Hopkins doing pediatrics research, trialing a text-alert help system to help new Latino mothers navigate their newborn children’s healthcare visits and needs, something to help break down the language barriers often present in healthcare. We hoped that once we had this system working, we could expand on it and even spread it farther, so hospitals all across the country could use similar templates for multiple languages, helping make the healthcare system a little easier to handle.

Look, healthcare is hard to navigate even for those of us who speak English. It must be even more terrifying and frustrating when you don’t speak English at all!

So I was delighted when I found out that the librarians at my medical school, Liz Lorbeer and Isaac Clark, wanted to create (and had been creating!) an elective and projects to help translate medical resources into Spanish, and Spanish resources into English.

The more information we can make accessible to Spanish-speaking people, the more we can help those who consistently are left floundering in the U.S. healthcare system. My parents would be thrilled to discover that there were pages upon pages of information in their native tongue—and more importantly, at a level they could understand. They didn’t have the benefit of a robust education system, and my father never even finished his equivalent of middle school, while my mother only had a high-school education. They always lament their lack of education, their struggle with English as a second language, the way that the Salvadoran Civil War stole many opportunities and chances that others take for granted. They want to learn! They want to be able to search for information quickly and find what they are looking for. They want to make up for all the lost time.

For them to be able to learn at the click of a button, to open Wikipedia and find that the Spanish Wikipedia had pages on what they were looking for? That would be monumental.

I want to make all kinds of information more accessible for people like my parents: people who maybe do not know all the complicated jargon, or do not feel confident in their English, or want to read something simple and understandable in their native tongue. Wikipedia was created because of a desire to share knowledge and make it possible for anyone to learn, anyone to access and read, at the click of a button.

Our Wikipedia project was a fantastic chance to get to build something to help. I was delighted to get to help with creating the curriculum and syllabus for the elective, which would have students adding more information to Spanish Wikipedia articles, or even creating new ones! The English Wikipedia by far seems to have the most articles, but there is such a vast gap of knowledge between the different Wikipedias, with so many topics not covered in other languages.

There were plenty of things I realized while helping to work on the project, though. For one, I realized how badly I’ve always taken my course syllabi for granted, especially in undergrad. Having now experienced the amount of planning and detail-work that a syllabus requires has given me a completely new appreciation for every professor who has had to make one.

(I’m sorry, every single undergraduate professor whose syllabus I never read.)

The project also required a lot of testing—with me as a guinea pig, oh boy—to make sure it was feasible, and I also had to go about finding resources for students who maybe weren’t super fluent in Spanish. This is a translation course, but I was told to make it accessible and possible even for students who aren’t very fluent or bilingual, which was a challenge.  

If our particular project does become successful, I hope we can share how we’ve adapted the Wiki Education course template with other institutions and encourage their students to help in the endeavor to make medical information available, in even more languages than Spanish. Long story short, hopefully this project can be a step forward in the big grand goal of accessible information for everyone.


If you’re interested in having students write or translate Wikipedia articles as an assignment, use our free assignment templates and management tools! Visit teach.wikiedu.org for all you need to know to get started.

Wikipedia for Peace at Europride

10:14, Friday, 07 June 2019 UTC

Next week I’ll be taking a little time out from my work at Edinburgh to go to Wikipedia for Peace at Europride 2019 in Vienna. Europride promotes lesbian, gay, bisexual, trans (LGBT) and other queer issues on an international level through parades, festivals and other cultural activities.  During the event a group of international editors will be coming together to create and edit LGBT+ articles in a range of European languages.  The event, which is run by Wikimedia Austria, is part of the Wikipedia for Peace movement which aims to strengthen peace and social justice through Wikimedia projects. Wikipedia for Peace organises community projects which bring together Wikipedia editors and people active in social and peace movements.

Although I’m not exactly the world’s most prolific Wikipedia editor, one of my proudest editing achievements is creating a page for Mary Susan McIntosh during one of Ewan McAndrew’s early editathons at the University of Edinburgh.  McIntosh was one of the founders of the Gay Liberation Front in the UK, and a member of the Policy Advisory Committee which advocated for lowering the age of male homosexual consent from 21 to 18.  As an academic criminologist and sociologist, she was one of the first to present evidence that homosexuality was not a psychiatric or clinical pathology but rather influenced by historical and cultural factors, and her paper The Homosexual Role was crucial in shaping the development of social constructionism. 

I had never heard of McIntosh before writing her Wikipedia entry and it was shocking to me that such an important activist and foundational thinker had been omitted from the encyclopedia.  I hope I can use my time in Vienna to create articles for other overlooked individuals from the queer community.  I’m particularly interested in focusing on the creation of articles around bisexual topics and individuals, which are sometimes marginalised in the LGBT+ community.  So if there are any LGBT+ topics or individuals (with emphasis on the B) that you think should be added to the encyclopedia, please let me know!  You can also participate in the event remotely by signing up here.

I’m also looking forward to having an opportunity to photograph the European Pride Parade for Wikimedia Commons.  I think this will be my first Pride since 1998!

I’m immensely grateful to Wikimedia Austria for supporting my attendance at this event, and to Wikimedia UK for funding my travel through one of their project grants. Wikimedia UK’s project grants support volunteers to complete activities that benefit the organisation’s strategic goals including creating and raising awareness of open knowledge, building volunteer communities, releasing information and images under an open licence, and technology innovation. You can find out more information about project grants and how to apply on the Wikimedia UK Project Grants page.

Perspectives on #references, #citations

13:02, Thursday, 06 June 2019 UTC
Wikipedia articles, scientific papers and some books have them: citations. Depending on your outlook, citations serve a different purpose. They exist to prove a point or to enable further reading. These differing purposes are not without friction.

In science, it makes sense to cite the original research establishing a fact. This is important because when such a fact is retracted, the whole chain of citing papers may need to be reconsidered. In a Wikipedia article it is, imho, a bit different. For many people references are next-level reading material, and therefore a well-written text expanding on the article is to be preferred; it helps bring things together.

When you consider the points made in a book to be important, like the (many) points made in Superior, the book by Angela Saini, you can expand the Wikidata item for the book by including its citations. It is one way to underline a point because those who seek such information will find a lot of additional reading and confirmation for the points made.

Adding citations in Wikidata often means that the sources and their authors have to be introduced. It takes some doing, and by adding DOI, ORCiD, VIAF and/or Google Scholar data it is easy to make future connections. If you care to add citations to this book with me, this is my project page.
Thanks,
     GerardM

Welcome interns Amit, Khyati, and Ujjwal!

18:27, Wednesday, 05 June 2019 UTC
Amit Joki

Last week we kicked off three exciting internship projects to improve the Wiki Education Dashboard. Over the next few months, Outreachy and Google Summer of Code students will join the Wiki Education technology team to build new features and tools for both Wiki Education programs and the global Wikimedia community.

Amit Joki, an Information Technology student from Madurai, has a wide-ranging plan for improving the support for tracking cross-wiki programs. Among other things, Amit’s project will make it easier and more intuitive to choose which wikis to track edits for. Amit has been contributing to the Dashboard since spring 2018, and is responsible for a number of key features, including completing the internationalization system for Dashboard training modules. He just finished his second year of college.

Khyati Soneji

Khyati Soneji, a Computer Science student from Gandhinagar, will be working on making the Dashboard a better tool for the #1lib1ref campaign, which focuses on adding citations and other improvements to Wikipedia through outreach to librarians. One big part of this project will be to add ‘references added’ as one of the core statistics that the Dashboard tracks. Khyati also just finished her second year of college.

Ujjwal Agrawal, who just finished his Electronics and Communication Engineering degree from the Indian Institute of Technology, Dhanbad, will be building an Android app for accessing the Dashboard. Ujjwal is a veteran Android developer who spent last summer working on the Wikimedia Commons app for Google Summer of Code.

Ujjwal Agrawal

Wes Reid and I will serve as mentors, and we’re looking forward to seeing what Amit, Khyati, and Ujjwal can accomplish.

To read more about Wiki Education’s open tech project and mentorship, read our blog post about running a newbie-friendly software project.

This week, the Wikimedia Foundation was invited to provide opening remarks for the third annual Global Conference of the Internet Jurisdiction and Policy Network in Berlin, Germany. This conference represents a place for civil society, platforms, elected representatives, policymakers, and other stakeholders to come together and discuss how we can manage tensions between competing national laws that impact information on the internet while elevating our essential rights and freedoms. As advocates of free knowledge, we saw it as an opportunity to share our belief in the importance of policymaking that supports an internet that is open and accessible to all.

This conference comes at a critical moment. The internet is in a moment of change, a testing of the boundaries of the free exchange of information and ideas. In the past year, we have seen increased concern about what information is available on social media and online, and how videos, images and stories are being shared more quickly and with wider audiences than has previously been possible.

This summit is an opportunity for all of us to continue to weigh how potential regulation may impact the promise of the internet to connect people and serve the common good. An overly broad, one-size-fits-all approach to regulation across the internet privileges platforms over people, places limits on knowledge and collaboration online, and effectively builds walls between people and ideas, rather than bridges. As stakeholders consider the very real challenges and responsibilities posed by internet governance and regulation, it is crucial to consider the following:

  • The importance of clearly articulating the norms and values we seek to uphold
  • The responsibility of governments to protect, and platforms to respect human rights
  • The challenges and risks of reactionary responses and one-size-fits-all regulation
  • The need for cross-border collaboration in service of our common humanity, and
  • The need to engage all stakeholders, especially civil society, in these critical dialogues

 
Laws and public policy should promote and preserve the freedom to share and participate in knowledge and exchange. The internet—and Wikipedia—is richer, more useful, and more representative when more people can engage together. That is why, unlike other internet platforms, Wikipedia does not localize knowledge for different countries or target it to individual users. Versions of Wikipedia are differentiated only by language—never by geography, demographic, or personal preference.

That means the information on Wikipedia is the same whether you are in Berlin or Brasilia, and editors from around the world can work together to improve, correct, and advance knowledge. Such a flourishing and competition of ideas and perspectives from different cultures may be a messy process, but it allows people to build consensus on how we see and share the world around us.

Any regulation also needs to consider its impact on international human rights. They are universal, fundamental, and non-negotiable. We should carefully examine all solutions to make sure that we are aware of how potential restrictions could be abused, applied unevenly to different populations, or enforced too broadly in a way that silences or excludes people online. When we are overzealous about limiting knowledge, we risk impacting inclusivity and diversity. Permanent removal of knowledge can have long-term invisible impacts.

So how can we keep knowledge free and the internet open? Our recommendation is that this happens by giving power not to the few but to the many. Wikipedia is often held up as an exception to more traditional models for the consumer web, but we believe it is proof that decentralized models of curation and regulation can work. Wikipedia has shown how effective it can be when we empower communities to uphold a clear mission, purpose, and set of standards. As we look to the future of content moderation, we must similarly devise means to involve broad groups of stakeholders in these discussions, in order to create truly democratic, diverse, and sustainable online spaces.

Wikimedia’s vision is a world where every single human can freely share in the sum of all knowledge. This week’s conference produced some powerful momentum and collaboration between a multitude of stakeholders towards this shared future. The hard work is just beginning, but by meaningfully engaging more people and organizations today and in the future, we can develop standards and principles that are more inclusive, more enforceable, and more effective. We are encouraged by the possibility in front of us.

Together, we can help protect a flourishing and open internet that allows for new forms of culture, science, participation and knowledge.

Batches of Rust

10:47, Wednesday, 05 June 2019 UTC

QuickStatements is a workhorse for Wikidata, but it has had a few problems of late.

One of those is bad performance with batches. Users can submit a batch of commands to the tool, and these commands are then run on the Labs server. This mechanism has been bogged down for several reasons:

  • Batch processing written in PHP
  • Each batch running in a separate process
  • Limitation of 10 database connections per tool (web interface, batch processes, testing etc. together) on Labs
  • Limitation of (16? observed but not validated) simultaneous processes per tool on Labs cloud
  • No good way to auto-start a batch process when it is submitted (currently, a PHP process auto-starts every 5 minutes and exits if there is nothing to do)
  • Large backlog developing

Amongst continued bombardment on Wiki talk pages, Twitter, Telegram etc. that “my batch is not running (fast enough)”, I set out to mitigate the issue. My approach is to do all the batches in a new processing engine, written in Rust. This has several advantages (a rough sketch of the scheduling loop follows this list):

  • Faster and easier on the resources than PHP
  • A single process running on Labs cloud
  • Each batch is a thread within that process
  • Checking for a batch to start every second (if you submit a new batch, it should start almost immediately)
  • Use of a database connection pool (the individual thread might have to wait a few milliseconds to get a connection, but the system never runs out)
  • Limiting simultaneous batch processing for batches from the same user (currently: 2 batches max) to avoid the MediaWiki API “you-edit-too-fast” error
  • Automatic handling of maxlag, bot/OAuth login etc. by using my mediawiki crate
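
To make the scheduling model above concrete, here is a minimal, self-contained sketch of a polling loop with a per-user concurrency limit, using only the Rust standard library. It is not the actual QuickStatements code: the Batch type, the in-memory queue and run_batch are placeholders for illustration, and the real engine reads open batches from the tool database through a connection pool and executes their commands against the MediaWiki API.

use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

const MAX_BATCHES_PER_USER: usize = 2; // mirrors the "2 batches max" limit above

// Hypothetical batch record; the real engine loads these from the tool database.
#[derive(Clone)]
struct Batch {
    id: u64,
    user: String,
    commands: Vec<String>,
}

fn run_batch(batch: &Batch) {
    // Placeholder for executing the batch commands against the API.
    println!("running batch {} ({} commands)", batch.id, batch.commands.len());
    thread::sleep(Duration::from_millis(500));
}

fn main() {
    // Hypothetical queue of open batches; the real engine polls the database instead.
    let queue = Arc::new(Mutex::new(vec![
        Batch { id: 1, user: "alice".into(), commands: vec!["CREATE".into()] },
        Batch { id: 2, user: "alice".into(), commands: vec!["CREATE".into()] },
        Batch { id: 3, user: "bob".into(), commands: vec!["CREATE".into()] },
    ]));
    // How many batches each user currently has running.
    let running: Arc<Mutex<HashMap<String, usize>>> = Arc::new(Mutex::new(HashMap::new()));

    // Check for startable batches once per second, as described above.
    // (A real service would run forever and keep track of its worker threads.)
    for _ in 0..10 {
        {
            let mut queued = queue.lock().unwrap();
            let mut active = running.lock().unwrap();
            let mut i = 0;
            while i < queued.len() {
                let user = queued[i].user.clone();
                if *active.get(&user).unwrap_or(&0) < MAX_BATCHES_PER_USER {
                    let batch = queued.remove(i);
                    *active.entry(user.clone()).or_insert(0) += 1;
                    let running = Arc::clone(&running);
                    // Each batch runs as its own thread within the single process.
                    thread::spawn(move || {
                        run_batch(&batch);
                        *running.lock().unwrap().get_mut(&user).unwrap() -= 1;
                    });
                } else {
                    i += 1;
                }
            }
        } // locks released here before sleeping
        thread::sleep(Duration::from_secs(1));
    }
}

A database-backed version would replace the in-memory Vec with a query for open batches and hand each worker a connection from the pool.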

This is now running on Labs, processing all (~40 at the moment) open batches simultaneously. Grafana shows the spikes in edits, but no increased lag so far. The process is given 4GB of RAM, but could probably do with a lot less (for comparison, each individual PHP process used 2GB).

A few caveats:

  • This is a “first attempt”. It might break in new, fun, unpredicted ways
  • It will currently not process batches that deal with Lexemes. This is mostly a limitation of the wikibase crate I use, and will likely get solved soon. In the meantime, please run Lexeme batches only within the browser!
  • I am aware that I now have code duplication (the PHP and the Rust processing). For me, the solution will be to implement QuickStatements command parsing in Rust as well, and replace PHP completely. I am aware that this will impact third-party use of QuickStatements (e.g. the WikiBase docker container), but the PHP and Rust sources are independent, so there will be no breakage; of course, the Rust code will likely evolve away from PHP in the long run, possibly causing incompatibilities

So far, it seems to be running fine. Please let me know if you encounter any issues (unusual errors in your batch, weird edits etc.)!

History books often focus on the white change-makers of the suffrage movement. Until last month, it was no different on Wikipedia. The article documenting the Nineteenth Amendment, which prohibited governments from discriminating against voters on the basis of sex, not only centered the narrative on white people, but on white men.

In order to ensure the public has access to the best representation of women’s suffrage in the United States, Wiki Education teamed up with the National Archives and Records Administration last fall. We have spent the last 8 months running courses to train researchers and archivists how to improve Wikipedia’s coverage of these topics. Experts spent months expanding articles on suffragists like Ida B. Wells and Mabel Ping-Hua Lee, and events like the Woman suffrage parade of 1913.

After seeing the incredible work these new Wikipedians could accomplish, we invited top performers to come back and focus their efforts on getting the article on the Nineteenth Amendment up to a higher quality prior to the 100th anniversary of its passing. Thanks to these six women, the thousands of people reading about the Nineteenth Amendment today will find a better representation of this important moment in history.

In just four weeks, our Wiki Scholars are responsible for 68.8% of the article’s content. And it’s now close to the second highest quality rating on Wikipedia. The article reaches nearly 2,000 readers a day (not including the influx we expect on today’s anniversary). That’s an astonishing reach to achieve.

Six Wiki Scholars contributed 69% of the article’s content in the last month. Pie chart shows the size of their total edits corresponding to their Wikipedia usernames.

What changed?

Before the six Wiki Scholars began working, two of the three photos in the article featured men (like the Supreme Court justice who presided over the Leser v. Garnett case where the Nineteenth Amendment was constitutionally established). And the presentation of the facts typically focused on men as the catalysts of change.

Before March, the section about the Leser v. Garnett case read: “Oscar Leser sued to stop two women registered to vote in Baltimore, Maryland, because he believed that the Maryland Constitution limited the suffrage to men and the Maryland legislature had refused to vote to ratify the Nineteenth Amendment.” The Wiki Scholars asked, why are we focusing on the men in this story instead of the women they were trying to stop from voting? These women don’t even get names? In fact, one of the women asserting her right to vote was a woman of color, which was completely erased through that limited description. After our course participants got to work, the section now reads,

Maryland citizens Mary D. Randolph, “‘a colored female citizen’ of 331 West Biddle Street,”[1] and Cecilia Street Waters, “a white woman, of 824 North Eutaw Street”,[2] applied for and were granted registration as qualified Baltimore voters on October 12, 1920. To have their names removed from the list of qualified voters, Oscar Leser and others brought suit against the two women on the sole grounds that they were women, arguing that they were not eligible to vote because the Constitution of Maryland limited suffrage to men[2] and the Maryland legislature had refused to vote to ratify the Nineteenth Amendment.

The Nineteenth Amendment article now features a subsection about what it didn’t do to enfranchise women of color — a facet of suffrage history that was previously absent from the article and is often underrepresented in the telling of suffragist history in general. Even after the passing of the Amendment, states used loopholes to prohibit both men and women of color from exercising their right well into the 60s when the Voting Rights Act of 1965 made racial discrimination in voting much more difficult for states to perpetrate.

The change in the table of contents from March 4, 2019 to June 4, 2019

Our NARA Wiki Scholars courses have made strides in bringing women of color to the forefront of public history, recognizing and celebrating their integral contributions to the movement. Other Wikipedia volunteers have praised the work, especially coming from new users. They completed the project just in time for the kickoff of celebrations around the Nineteenth Amendment’s passing. Now, millions worldwide can get a better picture of the rich history of this day and its legacy.

Wiki Education staff members Will Kent and Elysia Webb with four of the Wiki Scholars, who all met weekly throughout the month to coordinate their Wikipedia work.

Header image in the public domain, via Wikimedia Commons.

In short interviews with our employees we illuminate aspects and questions of our daily work with BlueSpice and our customers. Today’s topic: rights management. An interview with Florian Bäckmann, responsible for technical support at Hallo Welt! GmbH.

Florian, wikis are essentially open and transparent. Why suddenly assign rights?

That’s right. Actually, wikis build on the “wisdom of the crowd” and are virtual common property. Basically everyone participates and the contents are mostly public. With our wiki BlueSpice, however, we serve corporate customers in the enterprise segment. In this context rights management is a key issue when it comes to running a web platform like an enterprise wiki.

But shouldn’t a wiki be used to break up “knowledge silos” and store company knowledge centrally?

Absolutely. But central isn’t always central. Our customers usually know where the bottlenecks and pitfalls lie when accessing a company-wide knowledge database. Between the two concepts “everyone may do everything” and “everything lies with one or a few employees” there is a large playground, which we serve with our permission manager.

Could you explain that in more detail?

Sure. On the content level individual rights are often required for departments, work groups or project teams, for example for changing articles. Another example: the management usually requires separate areas (namespaces) which are shielded from view. After all, sensitive information is exchanged here. On top of that, if a corporate wiki is public or partially public, sophisticated regulations are usually necessary to prevent misuse. And then there is the technical-functional level: imagine if every wiki user had admin rights and could change central settings. Chaos would be inevitable. That’s why there are extended rights for appropriately qualified employees or IT administrators.

With BlueSpice 3 the assignment of rights has been revised. Why?

It turned out that the assignment of rights in BlueSpice 2 was not very intuitive and too complex for many customers. Since almost every function in the wiki is associated with a right, at that time more than 200 individual rights could be assigned one by one. With the launch of BlueSpice 3, we introduced the so-called role system. The main goal was to significantly simplify the assignment of rights. Speaking of complicated: while MediaWiki uses a file on the server to assign rights, BlueSpice offers a graphical interface that allows rights to be assigned or adjusted quickly and easily by setting a few checkmarks.

Sounds good to me. But how exactly does the assignment of rights work?

The first step is to distinguish between users, groups, roles and rights. Furthermore, a basic distinction is made between anonymous wiki visitors without an account and registered wiki users with an account. First, a group already existing in the system or created by the customer himself is linked to individual Wiki users, usually employees. In other words: employees are assigned to a group.

With the help of “roles” the groups are then equipped with “rights packages”. Individual roles bundle numerous individual rights under one meaningful roof. The rights can be assigned wiki-wide or for individual namespaces (e.g. different rights for employees in different departments of the company).

Typical roles are (a conceptual sketch follows the list):

  • readers: are allowed to read and comment on content
  • editors: as above, may additionally edit content
  • reviewers: as above, may additionally release content
  • administrators: as above, are allowed to make additional settings on the Wiki
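
As a rough illustration of the cumulative role model described above, the sketch below builds up the rights of each role from the one before it. The role and right names are simplified placeholders, not the actual BlueSpice permission identifiers, and a real setup would also scope roles to groups and namespaces.

use std::collections::BTreeMap;

fn main() {
    // Simplified rights added by each role; real roles bundle many more individual rights.
    let mut added_rights: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    added_rights.insert("reader", vec!["read", "comment"]);
    added_rights.insert("editor", vec!["edit"]);
    added_rights.insert("reviewer", vec!["review"]);
    added_rights.insert("administrator", vec!["manage-settings"]);

    // Roles are cumulative: each role includes everything the previous one may do.
    let order = ["reader", "editor", "reviewer", "administrator"];
    let mut effective: Vec<&str> = Vec::new();
    for role in order {
        effective.extend(added_rights[role].iter().copied());
        println!("{role}: {effective:?}");
    }
}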

 

Screenshot: Overview of the rights management in BlueSpice.

 

OK, got it. So it’s not quite trivial to set up a rights management system, is it?

You’re right. Since the assignment of rights plays a central role in many companies, special attention should be paid to planning and configuration. Even though a Wiki is essentially an open and transparent application, many companies have legal requirements and internal policies that make access restrictions necessary. This applies to reading, but above all to changing information.

 

Screenshot: Insight into the individual permissions of a “role”, in this case the editor.

 

How do you ensure that the rights system is set up properly?

We offer our customers the “rights management workshop”. Together we analyze and specify the individual rights setup of the company wiki: group, read and write rights, or the right to delete pages. The results of the workshop are systematically documented in a rights matrix. After that the customer wiki is configured by our IT experts according to the specifications. After all, we want our customers to start with a wiki that exactly meets their expectations. Rights management included.

Let’s Wiki together!

 

More information about rights management can be found here:
https://en.wiki.bluespice.com/wiki/Reference:PermissionManager

Test BlueSpice pro now for 30 days free of charge and without obligation:
https://bluespice.com/bluespice-pro-evaluation/

Visit our webinar and get to know BlueSpice:
https://bluespice.com/webinar/

Contact us:
Angelika Müller and Florian Müller
Telephone: +49 (0) 941 660 800
E-Mail: sales@bluespice.com

Author: David Schweiger

The post The permission manager: Rights management in BlueSpice pro appeared first on BlueSpice Blog.

Sarah Mojarad teaches a Social Media for Scientists and Engineers course at the University of Southern California where students write and improve Wikipedia articles as an assignment. Here, she shares her pedagogical motivations for doing so and the impact it has on students.


Sarah Mojarad helping to update a Wikipedia page at a specialized conference, 2016

“Chemistry is often elusive but Wikipedia helps to make chemistry topics, and all topics, more accessible. It was also helpful that in the assignment we could choose the topic of information we wanted to update. Not only did it give me more agency, making the assignment more enjoyable, but it required that I update a page that I could see myself using. It is exciting knowing that I contributed to a page that others have viewed since and that I’ve perhaps helped someone learn more about a topic that interests me, and I never even had to directly connect with this person.” – Olivia Harper Wilkins, Caltech Physical Chemistry PhD Candidate

Often overlooked in higher education STEM programs, effective communication skills are highly valued and sought after in both industry and academia. Assignments that leverage STEM expertise and translate technical knowledge to the public can improve a student’s ability to communicate science. Writing for Wikipedia serves this purpose and increases the visibility of a student’s work. The platform fosters a unique, global community of writers, editors, and readers. It’s a place where students can contribute knowledge and where their work has the potential of being seen by millions of people.

Why I Use Wikipedia

The undergraduate and graduate students that enroll in my science communication course, Social Media for Scientists and Engineers, at USC (and Caltech previously) are diverse STEM majors with differing career trajectories. Throughout the term, students learn to articulate complex concepts online to audiences outside their field of study. It’s a skill that will need to be honed regardless of whether or not the student intends to pursue a career in academia or industry. The second time my course was offered at Caltech I included a Wikipedia assignment because it required students to write for non-technical, non-peer groups. The 2016 Year of Science campaign was underway at that point, so participating in the Wikipedia event was a natural fit. I believed that Wikipedia could be a good way for STEM students to write about technical topics for non-academic audiences, and the experience could demonstrate science outreach impact. Wiki Education was an ideal partner. Instructors can design assignments on their robust platform, utilize free support, and track student progress throughout the duration of the activity. The Wiki Education Dashboard also displays useful analytics, like page views on student contributions, that help quantify the impact of work.

With the Dashboard, I can create a list of approved Wikipedia pages for students to review and self-assign. To fit the diverse student audience, I chose STEM-related articles based on the academic background of course enrollees and development status of science Wikipedia pages. Offering articles that have room for improvement and that are relevant to students’ fields of study improves the overall experience for them. Students then write about what they know. Thanks to the increased visibility of Wikipedia, students feel more accountable for their work than they normally would with a traditional writing assignment.

Impactful Science Writing – CRISPR

Wikipedia is a tool that can help STEM students participate in online science communication and develop confidence with their writing. The updates on the “encyclopedia anyone can edit” keep pace with scientific advancements, and science and medical pages are often high quality and accurate. Though the platform can be intimidating to new editors, each semester I see cases where students are willing to challenge themselves and write on advanced topics. For example, biomedical engineering students in my course have self-assigned CRISPR—a well-developed, highly trafficked page. Contributing to the article about CRISPR requires more coordination and interaction with other editors on the Talk page. Because it’s watched by many editors, it’s possible for student edits to be immediately reverted. These challenges have not deterred people in my course. The CRISPR contributions from three students in Social Media for Scientists and Engineers have been viewed a combined 2,946,555 times on Wikipedia.

We often associate millions of views with viral content—not classroom assignments. However, a shift in traditional teaching and learning environments is taking place thanks to the Internet. I anticipate we’ll see more classrooms adopting Wikipedia coursework into existing curriculum. After all, why write for an audience of one when writing for millions is possible?

Returning to Wikipedia Each Academic Year

I continue to use Wikipedia in Social Media for Scientists and Engineers because it is a useful form of science outreach that generates impact. Through their collaborations with other editors, revisions of existing work, and interactions in Talk pages, students become members of digital teams working towards a common goal. When I share the page view analytics with students, I see a classroom full of proud faces. Contributing to Wikipedia is a gratifying experience, and their work helps improve an open-access resource that everyone in the world uses.

Oftentimes, the edits don’t end when the assignment is completed. Though the grading period is over, students continue to write on Wikipedia because it’s fulfilling. In a class of STEM students, it’s tough to spark interest and excitement for writing; however, with Wikipedia this is possible.


Interested in teaching a Wikipedia writing assignment? Use our free tools and assignment templates to best adapt it to your course. Visit teach.wikiedu.org to get started.


Header image by Smojarad, CC BY-SA 4.0, via Wikimedia Commons.
Bio image by Smojarad, CC BY-SA 4.0, via Wikimedia Commons, cropped.

Why is Wikidata important to you?

17:13, Monday, 03 June 2019 UTC


You may not know it yet, but Wikidata is very important to you. For years most people were suspicious or cautious about Wikipedia being a reliable source. Now the Library of Congress tracks items in Wikidata, making it an authority whose reliability has improved significantly in recent years. Wikidata is surging in popularity and is going to occupy a similarly influential space in our lives.

Wikidata is the centralized, linked data repository for all Wikimedia projects. This means that all Wikimedia projects (Commons and Wikipedia for instance) can pull the information from the same central place. This also means that all 300+ language versions of Wikipedia can pull data from Wikidata as well. There is incredible potential for more access to information, more consistency across different languages, and the ability for any language-speaker to contribute more equitably.

Beyond the effect it is having in Wiki-verse, Wikidata is machine readable. This means that digital assistants, AI, bots, and scripts can interact with Wikidata’s structured, linked data. With one of the world’s largest databases of freely licensed (CC-0), open data, software will be able to better answer your questions, provide more context when you search, and link you to related sources in an efficient way. Additionally, this has implications for increased visibility in Google’s search results, elevating more accurate information to above the fold for countless concepts, events, and individuals.

For those in academia, consider the impact linked data has on libraries. More and more collections are being linked through authority control, structured vocabulary, and other identifiers. Wikidata (and the database software it runs on, Wikibase) is allowing institutions to connect their data like never before. The GND in Germany is a great example of how ambitious these projects can be. Other projects like the Sum of All Paintings/Crotos demonstrate how easy it is to share entire collections with anyone, anywhere. Once entire collections are in Wikidata, users can pull specific information from Wikidata using a powerful query service. Queries can reveal new insights like the location of cities with current mayors who identify as female, urban population distribution, and customizable lists of Chemistry Nobel Prize winners.

Building off of these examples, it becomes a logical next step to increase the representation of library collections on Wikidata. The beauty of Wikidata being open is that anyone can pull information from it to enrich a collection, improve research or help illustrate a point in a presentation with a visualization. Imagine the impact on access and visibility integrating an archive, special collections, or general collection on a Wikimedia project could have.

Interested in learning more about Wikidata? Wiki Education is facilitating online courses and in-person workshops this July that embed participants in the possibilities of Wikidata. We’re eager to train new editors and foster a passionate, inclusive community on Wikidata. Find more information and sign up today at data.wikiedu.org.

Bad credentials

10:00, Monday, 03 June 2019 UTC

So there was an issue with QuickStatements on Friday.

As users of that tool will know, you can run QuickStatements either from within your browser, or “in the background” from a Labs server. Originally, these “batch edits” were performed as QuickStatementsBot, mentioning the batch and the user who submitted it in the edit summary. Later, through a pull request, QuickStatements gained the ability to run batch edits as the user who submitted the batch. This is done by storing the OAuth information of the user, and playing it back to the Wikidata API for the edits. So far so good.

However, as with many of my databases on Labs, I made the QuickStatements database open for “public reading”, that is, any Labs tool account could see its contents, including the OAuth login credentials. Thus, since the introduction of the “batch edit as user” feature, up until last Friday, anyone with a login on Labs could, theoretically, perform edits as anyone who had submitted a QuickStatements batch, by copying the OAuth credentials.

We (WMF staff and volunteers, including myself) are not aware that any such user account spoofing has taken place (security issue). If you suspect that this has happened, please contact WMF staff or myself.

Once the issue was reported, the following actions were taken

  • deactivation of the OAuth credentials of QuickStatements, so no more edits via spoofed user OAuth information could take place
  • removal of the “publicly” (Labs-internally) visible OAuth information from the database
  • deactivation of the QuickStatements bots and web interface

Once spoofed edits were no longer possible, I went ahead and moved the OAuth storage to a new database that only the QuickStatements “tool user” (the instance of the tool that is running, and myself) can see. I then got a new OAuth consumer for QuickStatements, and restarted the tool. You can now use QuickStatements as before. Your OAuth information will be secure now. Because of the new OAuth consumer for QuickStatements, you will have to log in again once.

This also means that all the OAuth information that was stored prior to Friday is no longer usable, and was deleted. This means that the batches you submitted until Friday will now fall back on the aforementioned QuickStatementsBot, and no longer edit as your user account. If it is very important to you that your edits appear under your user account, please let me know. All new batches will edit as your user account, as before.

My apologies for this incident. Luckily, there appears to be no actual damage done.

This is a blog in two parts – the first is some session recommendations for the CILIPS conference, and the second is a list of cool stuff about library engagement with Wikimedia….

Tom Murphy VII, CC-BY-SA 3.0

On 3-4 June, in the fair city of Dundee, it’s the CILIPS Annual Conference 2019, where the great and good of the Library and Information Professionals world in Scotland will come to gather.

We are particularly excited to say that our CEO, Lucy Crompton-Reid, will be keynoting on day 1, and there’ll also be a chance to hear from two other members of the Wikimedia community earlier that day.  If you have even a passing interest in how libraries can – and should – engage with open knowledge in general and Wikimedia in particular, then don’t miss these.

So here’s our Wiki-and-friends top sessions to attend:

  • Monday, 12:25, City Suite – Leveraging libraries: Community, open access and Wikimedia – Jason Evans, National Library of Wales Wikimedian in Residence and Dr Sara Thomas, Scotland Programme Coordinator, Wikimedia UK
  • Monday, 15:55, City Suite – Keynote 3 – Creating a more tolerant, informed and democratic society through open knowledge – Lucy Crompton-Reid, Chief Executive of Wikimedia UK
  • Tuesday, 11:40, City Suite – Open Access, Plan S and new models for academic publishing – Dominic Tate, University of Edinburgh
  • Tuesday, 14:15, City Suite – The joy of digital – Exploring digital making and scholarship to enable innovation in research libraries – Kirsty Lingstadt, Head of Digital Library and Deputy Director of Library & University Collections at the University of Edinburgh.

Scotland has hosted three Wikimedians in Residence within the library sector – two at the National Library of Scotland, and one at the Scottish Library and Information Council – and so we’re very happy to be able to continue our relationship with the sector through a presence at the conference.  

Here are a few of our favourite things…

Want to know more about how libraries can engage with Wikimedia?  Here are some of our favourite library things…. We’d recommend that you bookmark these for later reading….

And finally…

We’re also excited to see the release of the Association of Research Libraries’ White Paper on Wikidata, which makes some excellent – and very practical – suggestions as to how libraries can contribute to Wikidata, and indeed, why Wikidata / Linked Open Data in general, is good for libraries in the first place.  Even if you’re unfamiliar with the ins and outs of linked open data, it’s valuable reading, and well explained. https://www.arl.org/resources/arl-whitepaper-on-wikidata/

If you’re at the CILIPS conference, then we hope you have a great couple of days, please do come and say hi!  If you’re not, but would like to follow along on Twitter, you can do so on #CILIPS19
