Semantic MediaWiki 4.1.2 released

Sunday, 30 July 2023 16:04 UTC

July 29, 2023

Semantic MediaWiki 4.1.2 (SMW 4.1.2) has been released today as a new version of Semantic MediaWiki.

It is a maintenance release that improves compatibility with MediaWiki 1.39 and PHP 8.1 and provides bug fixes and translation updates. Please refer to the help pages on installing or upgrading Semantic MediaWiki for detailed instructions.

weeklyOSM 679

Sunday, 30 July 2023 12:12 UTC

18/07/2023-24/07/2023

Several movies and films have featured maps based on OpenStreetMap data. [1]

Mapping

  • Anne-Karoline Distel has been mapping sewer gas vents and wrote a blog post about it.
  • Kamil Monicz (NorthCrab) described a new tool for importing building data into OSM from Polish official sources with the verification performed by automatic pattern matching on orthorectified imagery. His analysis showed that the quality of edits is comparable with humans verifying official building data. He is also willing to run or help run such imports elsewhere, if similar sources are available.

Community

  • Stéphane De Greef single-handedly captured all of Brussels in 360° street view imagery, covering 4,500 km by bike in two years. All of his 600,000 plus images are available on Mapillary. OpenStreetMap Belgium have published a post on their blog about Stéphane’s project.
  • Ed Freyfogle, in episode 190 of the Geomob Podcast, interviewed Roman Tsisyk, co-founder of Organic Maps.
  • Jiri Vlasak blogged on how he sees the respective roles of the core team and the community in the organisation and running of Missing Maps mapathons.

Events

  • Katja Haferkorn, from the FOSSGIS Coordination Office, put out a call for volunteers to work on the 2024 FOSSGIS Conference programme committee. The first meeting of the committee will be held on Saturday 16 September.

Education

  • Vincent de Château-Thierry announced that a video presentation of the Pifomètre v3 tool, an application for managing street, place and address data in France, has been released.

OSM research

  • HeiGIT researchers published an article in Nature Communications on the uneven distribution of OpenStreetMap data worldwide, which has implications for research and humanitarian use.

Software

  • Kamil Monicz asked why the backlog of pull requests for iD appears to be so large. iD’s maintainer, Martin Raifer, explained that the review process takes much effort even for a simple change, and it is not possible to quickly review every suggestion, so he needs to prioritise them. These pending proposals are simply lower priority.

Releases

  • Roland Olbricht announced the release of Overpass version 0.7.61.

Did you know …

  • … that OpenStreetMap is great for creating maps that anyone can use? When film-makers, on the set of Hollywood blockbusters, need a map, they turn to us! There is a wiki page that lists where OSM has been used in movies. Know of any that are missing? Add them to the wiki page.
  • … have you ever heard of ‘The United Territories of the Sovereign Nation of the People’s Republic of Slowjamastan’? If not, have a look at The Guardian’s article, the OSM node, its Wikipedia article, MicroWiki entry, or the official Slowjamastani website.

OSM in the media

  • Pierre Breteau explained how a French citizen can go on vacation ‘abroad’ without leaving France by visiting French cities that share names with more exotic foreign locales.

Other “geo” things

  • Geomob tooted that you can hear from Dario Raijman, co-founder of Textomap, about the challenges of turning words into maps in episode 191 of their podcast.
  • Flora Incognita explained how to export plant observations from their app to process them with Google Maps, QGIS, or R.

Upcoming Events

Where | What | When
- | North and NorthEast India Floods 2023 Disaster Mapathon | 2023-07-29
大安區 | COSCUP 2023 OpenStreetMap x Wikidata 開放內容議程軌 | 2023-07-30
- | Reunião 2 – OSM Brasil – Mapeamento Aberto para Gestão de Riscos de Desastres [Horário 2] | 2023-07-31
- | Missing Maps London Mapathon | 2023-08-01
San Jose | South Bay Map Night | 2023-08-02
Stuttgart | Stuttgarter OpenStreetMap-Treffen | 2023-08-02
Sankt Augustin | FrOSCon 2023 – St. Augustin / Germany | 2023-08-05 – 2023-08-06
- | OSMF Engineering Working Group meeting | 2023-08-09
Budapest | #OSM19 birthday fish and ice cream on the beach | 2023-08-09
München | Münchner OSM-Treffen | 2023-08-09
Montrouge | Rencontre contributeurs Sud de Paris | 2023-08-10
Budapest | #OSM19 birthday QGIS editing, drinks, party games | 2023-08-11
Zürich | OSM-Stammtisch | 2023-08-11
- | Recording milk churn stands for Ireland’s National Heritage Week 2023 | 2023-08-12 – 2023-08-20
Queens | Mapping Access on Austin Street | 2023-08-12

Note:
If you would like to see your event here, please put it into the OSM calendar. Only data that is there will appear in weeklyOSM.

This weeklyOSM was produced by MatthiasMatthias, Nordpfeil, PierZen, TheSwavu, barefootstache, anonymus, rtnf.
We welcome link suggestions for the next issue via this form and look forward to your contributions.

Investigate a PHP segmentation fault

Friday, 28 July 2023 20:46 UTC

Summary


The Beta-Cluster-Infrastructure is a farm of wikis we use for experimentation and integration testing. It is updated continuously: new code is deployed every ten minutes and the databases are updated every hour by running MediaWiki maintenance/update.php. The scheduling and running are driven by Jenkins jobs whose statuses can be seen on the Beta view:

On top of that, Jenkins emits notification messages to IRC whenever one of the update jobs fails. One of them started failing on July 25th, and this is how I saw the alarm (times are for France, UTC+2):

(wmf-insecte is the Jenkins bot; insecte is French for bug, as in the animal, and the wmf- prefix identifies it as a Wikimedia Foundation robot.)

Clicking on the link gives the output of the update script which eventually fails with:

+ /usr/local/bin/mwscript update.php --wiki=wikifunctionswiki --quick --skip-config-validation
20:31:09 ...wikilambda_zlanguages table already exists.
20:31:09 ...have wlzl_label_primary field in wikilambda_zobject_labels table.
20:31:09 ...have wlzl_return_type field in wikilambda_zobject_labels table.
20:31:09 /usr/local/bin/mwscript: line 27:  1822 Segmentation fault      sudo -u "$MEDIAWIKI_WEB_USER" $PHP "$MEDIAWIKI_DEPLOYMENT_DIR_DIR_USE/multiversion/MWScript.php" "$@"

The important bit is Segmentation fault, which indicates the program (php) had a fatal fault and was rightfully killed by the Linux kernel. The kernel messages on the instance, read via dmesg -T, show:

[Mon Jul 24 23:33:55 2023] php[28392]: segfault at 7ffe374f5db8 ip 00007f8dc59fc807 sp 00007ffe374f5da0 error 6 in libpcre2-8.so.0.7.1[7f8dc59b9000+5d000]
[Mon Jul 24 23:33:55 2023] Code: ff ff 31 ed e9 74 fb ff ff 66 2e 0f 1f 84 00 00 00 00 00 41 57 41 56 41 55 41 54 55 48 89 d5 53 44 89 c3 48 81 ec 98 52 00 00 <48> 89 7c 24 18 4c 8b a4 24 d0 52 00 00 48 89 74 24 10 48 89 4c 24
[Mon Jul 24 23:33:55 2023] Core dump to |/usr/lib/systemd/systemd-coredump 28392 33 33 11 1690242166 0 php pipe failed
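
The kernel's segfault line packs several numbers that are worth unpacking. As a rough sketch (Python is used purely for illustration, and the helper name parse_segfault is hypothetical, not part of any tool mentioned here), the offset of the faulting instruction inside the library is the instruction pointer minus the mapping's base address, and the x86 page-fault error code is a small bit field:

```python
import re

# Matches the "segfault at ..." format seen in the dmesg output above.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<lib>[^\s\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_segfault(line):
    """Extract addresses from a kernel segfault line and decode the error code."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a segfault line")
    addr, ip, sp, base, err = (
        int(m[key], 16) for key in ("addr", "ip", "sp", "base", "err")
    )
    return {
        "library": m["lib"],
        # Offset of the faulting instruction within the mapped library.
        "offset_in_library": hex(ip - base),
        # Distance between the faulting address and the stack pointer.
        "fault_minus_sp": addr - sp,
        # x86 page-fault error code bits.
        "page_present": bool(err & 1),  # 0 means the page was not mapped
        "write": bool(err & 2),         # 1 means the access was a write
        "user_mode": bool(err & 4),     # 1 means the fault came from user space
    }
```

For the first line of the output above, this gives an offset of 0x43807 inside libpcre2-8.so.0.7.1 and decodes error 6 as a user-mode write to an unmapped page, 24 bytes above the stack pointer. A write that close to the stack pointer would be consistent with the stack having run out, though that reading is an inference rather than something the kernel states.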

With that data, I had enough to take the most urgent step: file a task (T342769) which can be used as an audit trail and reference for the future. It is the single most important step I take whenever I am debugging an issue: if I have to stop due to time constraints or lack of technical ability, others can step in and continue. It also provides a historical record that can be looked up later, and indeed this specific problem had already been investigated and fully documented a couple of years ago. For PHP segmentation faults, we even have a dedicated project, php-segfault.

With the task filed, I have continued the investigation. The previous successful build had:

19:30:18 ...have wlzl_label_primary field in wikilambda_zobject_labels table.
19:30:18 ...have wlzl_return_type field in wikilambda_zobject_labels table.
19:30:18        ❌ Unable to make a page for Z7138: The provided content's label clashes with Object 'Z10138' for the label in 'Z1002'.
19:30:18        ❌ Unable to make a page for Z7139: The provided content's label clashes with Object 'Z10139' for the label in 'Z1002'.
19:30:18        ❌ Unable to make a page for Z7140: The provided content's label clashes with Object 'Z10140' for the label in 'Z1002'.
19:30:18 ...site_stats is populated...done.

The successful build started at 19:20 UTC and the failing one finished at 20:30 UTC which gives us a short time window to investigate. Since the failure seems to happen after updating the WikiLambda MediaWiki extension, I went to inspect the few commits that got merged at that time. I took advantage of Gerrit adding review actions as git notes, notably the exact time a change got submitted and subsequently merged. The process:

Clone the suspect repository:

git clone https://gerrit.wikimedia.org/r/extensions/WikiLambda
cd WikiLambda

Fetch the Gerrit review notes:

git fetch origin refs/notes/review:refs/notes/review

The review notes can be shown below the commit by passing --notes=review to git log or git show. Here is an example for the current HEAD of the repository:

$ git show -q --notes=review
commit c7f8071647a1aeb2cef6b9310ccbf3a87af2755b (HEAD -> master, origin/master, origin/HEAD)
Author: Genoveva Galarza <ggalarzaheredero@wikimedia.org>
Date:   Thu Jul 27 00:34:03 2023 +0200

    Initialize blank function when redirecting to FunctionEditor from DefaultView
    
    Bug: T342802
    Change-Id: I09d3400db21983ac3176a0bc325dcfe2ddf23238

Notes (review):
    Verified+1: SonarQube Bot <kharlan+sonarqubebot@wikimedia.org>
    Verified+2: jenkins-bot
    Code-Review+2: Jforrester <jforrester@wikimedia.org>
    Submitted-by: jenkins-bot
    Submitted-at: Wed, 26 Jul 2023 22:47:59 +0000
    Reviewed-on: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikiLambda/+/942026
    Project: mediawiki/extensions/WikiLambda
    Branch: refs/heads/master

This shows the change was approved by Jforrester and entered the repository on Wed, 26 Jul 2023 22:47:59 UTC. Then, to find the commits in that range, I ask git log to list:

  • anything that has a commit date for the day (it is not necessarily correct but in this case it is a good enough approximation)
  • from oldest to newest
  • sorted in topological order (that is, in the order the commits entered the repository rather than by commit date)
  • show the review notes to get the Submitted-at field

I can then scroll to the commits having a Submitted-at in the time window of 19:20 UTC - 20:30 UTC. I have trimmed the output below to remove most of the review notes except for the first commit:

$ git log --oneline --since=2023/07/25 --reverse --notes=review --no-merges --topo-order
<scroll>
653ea81a Handle oldid url param to view a particular revision
Notes (review):
    Verified+1: SonarQube Bot <kharlan+sonarqubebot@wikimedia.org>
    Verified+2: jenkins-bot
    Code-Review+2: Jforrester <jforrester@wikimedia.org>
    Submitted-by: jenkins-bot
    Submitted-at: Tue, 25 Jul 2023 19:26:53 +0000
    Reviewed-on: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikiLambda/+/941482
    Project: mediawiki/extensions/WikiLambda
    Branch: refs/heads/master

fe4b0446 AUTHORS: Update for July 2023
Notes (review):
    Submitted-at: Tue, 25 Jul 2023 19:49:43 +0000
    Reviewed-on: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikiLambda/+/941507

73fcb4a4 Update function-schemata sub-module to HEAD (1c01f22)
Notes (review):
    Submitted-at: Tue, 25 Jul 2023 19:59:23 +0000
    Reviewed-on: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikiLambda/+/941384

598f5fcc PageRenderingHandler: Don't make 'read' selected if we're on the edit tab
Notes (review):
    Submitted-at: Tue, 25 Jul 2023 20:16:05 +0000
    Reviewed-on: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikiLambda/+/941456

Or in a Phabricator task and human friendly way:

The commit "Update function-schemata sub-module to HEAD (1c01f22)" has a short log of the changes it introduces:

  • New changes:
  • abc4aa6 definitions: Add Z1908/bug-bugi and Z1909/bug-lant ZNaturalLanguages
  • 0f1941e definitions: Add Z1910/piu ZNaturalLanguage
  • 1c01f22 definitions: Re-label all objects to drop the 'Z' per Amin
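
The manual scroll through Submitted-at values described earlier can also be scripted. The following is a minimal sketch, assuming the text comes from git log --oneline --notes=review as shown above; the function name commits_submitted_between and the window constants are hypothetical and only serve the illustration:

```python
import re
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# A one-line commit entry: abbreviated hash, a space, then the subject.
COMMIT_RE = re.compile(r"^([0-9a-f]{7,40}) (.+)$")
# The indented Gerrit note carrying the merge time.
SUBMITTED_RE = re.compile(r"^\s+Submitted-at: (.+)$")

# The window between the last good build and the first failing one (UTC).
WINDOW_START = datetime(2023, 7, 25, 19, 20, tzinfo=timezone.utc)
WINDOW_END = datetime(2023, 7, 25, 20, 30, tzinfo=timezone.utc)


def commits_submitted_between(log_text, start=WINDOW_START, end=WINDOW_END):
    """Return (hash, subject, submitted_at) for commits merged within [start, end]."""
    found = []
    current = None
    for line in log_text.splitlines():
        commit = COMMIT_RE.match(line)
        if commit:
            current = commit.groups()
            continue
        note = SUBMITTED_RE.match(line)
        if note and current:
            # Submitted-at uses an RFC 2822 style date with a numeric offset.
            submitted = parsedate_to_datetime(note.group(1))
            if start <= submitted <= end:
                found.append((*current, submitted))
    return found
```

Feeding it the git log output above would return the four commits listed, since their Submitted-at values all fall inside the window.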

Since the update script fails on WikiLambda, I reached out to its developers so they could investigate their code and maybe find what triggers the issue.

On the PHP side we need a trace. That can be done by configuring the Linux kernel to take a dump of the program before terminating it and store it on disk. This did not quite work, due to a configuration issue on the machine and because, in the first attempt, we forgot to ask bash to allow dump generation (ulimit -c unlimited) before running the command. Drawing on a past debugging session, I instead ran the command directly under the GNU debugger, gdb.
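
For reference, the ulimit step corresponds to raising the process's core-file size limit before the program starts. Below is a small sketch of the same idea using Python's standard resource module; the wrapper name run_with_core_dumps is hypothetical, and it only raises the soft limit up to the existing hard limit, which is what an unprivileged user is allowed to do:

```python
import resource
import subprocess


def run_with_core_dumps(argv):
    """Run argv with the soft core-file size limit raised to the hard limit."""

    def raise_core_limit():
        # Runs in the child process just before the program is executed.
        _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
        resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))

    return subprocess.run(
        argv, preexec_fn=raise_core_limit, capture_output=True, text=True
    )
```

Raising the limit is only half of the story: where the dump ends up still depends on the kernel's core pattern, and the dmesg output above shows the pipe to systemd-coredump failing on this machine.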

There are a few preliminary steps to debug the PHP program. First, one needs to install the debug symbols, which let the debugger map the binary entries to lines of the original source code. Since the error mentions libpcre2, I also installed its debugging symbols:

$ sudo apt-get -y install php7.4-common-dbgsym php7.4-cli-dbgsym libpcre2-dbg

I then used gdb to start a debugging session:

sudo -s -u www-data gdb --args /usr/bin/php /srv/mediawiki-staging/multiversion/MWScript.php update.php --wiki=wikifunctionswiki --quick --skip-config-validation
(gdb)

Then ask gdb to start the program by entering run at the prompt. After several minutes, it caught the segmentation fault:

(gdb) run
<output>
<output freeze for several minutes while update.php is doing something>

Thread 1 "php" received signal SIGSEGV, Segmentation fault.
0x00007ffff789e807 in pcre2_match_8 (code=0x555555ce1fb0, 
    subject=subject@entry=0x7fffcb410a98 "Z1002", length=length@entry=5, 
    start_offset=start_offset@entry=0, options=0, 
    match_data=match_data@entry=0x555555b023e0, mcontext=0x555555ad5870)
    at src/pcre2_match.c:6001
6001    src/pcre2_match.c: No such file or directory.

I could not find a debugging symbol package containing src/pcre2_match.c, but that was not needed after all.

To retrieve the stack trace, enter bt at the gdb prompt:

(gdb) bt
#0  0x00007ffff789e807 in pcre2_match_8 (code=0x555555ce1fb0, 
    subject=subject@entry=0x7fffcb410a98 "Z1002", length=length@entry=5, 
    start_offset=start_offset@entry=0, options=0, 
    match_data=match_data@entry=0x555555b023e0, mcontext=0x555555ad5870)
    at src/pcre2_match.c:6001
#1  0x00005555556a3b24 in php_pcre_match_impl (pce=0x7fffe83685a0, 
    subject_str=0x7fffcb410a80, return_value=0x7fffcb44b220, subpats=0x0, global=0, 
    use_flags=<optimized out>, flags=0, start_offset=0) at ./ext/pcre/php_pcre.c:1300
#2  0x00005555556a493b in php_do_pcre_match (execute_data=0x7fffcb44b710, 
    return_value=0x7fffcb44b220, global=0) at ./ext/pcre/php_pcre.c:1149
#3  0x00007ffff216a3cb in tideways_xhprof_execute_internal ()
   from /usr/lib/php/20190902/tideways_xhprof.so
#4  0x000055555587ddee in ZEND_DO_FCALL_SPEC_RETVAL_USED_HANDLER ()
    at ./Zend/zend_vm_execute.h:1732
#5  execute_ex (ex=0x555555ce1fb0) at ./Zend/zend_vm_execute.h:53539
#6  0x00007ffff2169c89 in tideways_xhprof_execute_ex ()
   from /usr/lib/php/20190902/tideways_xhprof.so
#7  0x000055555587de4b in ZEND_DO_FCALL_SPEC_RETVAL_USED_HANDLER ()
    at ./Zend/zend_vm_execute.h:1714
#8  execute_ex (ex=0x555555ce1fb0) at ./Zend/zend_vm_execute.h:53539
#9  0x00007ffff2169c89 in tideways_xhprof_execute_ex ()
   from /usr/lib/php/20190902/tideways_xhprof.so
#10 0x000055555587de4b in ZEND_DO_FCALL_SPEC_RETVAL_USED_HANDLER ()
    at ./Zend/zend_vm_execute.h:1714
#11 execute_ex (ex=0x555555ce1fb0) at ./Zend/zend_vm_execute.h:53539
#12 0x00007ffff2169c89 in tideways_xhprof_execute_ex ()
   from /usr/lib/php/20190902/tideways_xhprof.so
#13 0x000055555587de4b in ZEND_DO_FCALL_SPEC_RETVAL_USED_HANDLER ()
    at ./Zend/zend_vm_execute.h:1714
#14 execute_ex (ex=0x555555ce1fb0) at ./Zend/zend_vm_execute.h:53539
#15 0x00007ffff2169c89 in tideways_xhprof_execute_ex ()
   from /usr/lib/php/20190902/tideways_xhprof.so
#16 0x000055555587c63c in ZEND_DO_FCALL_SPEC_RETVAL_UNUSED_HANDLER ()
    at ./Zend/zend_vm_execute.h:1602
#17 execute_ex (ex=0x555555ce1fb0) at ./Zend/zend_vm_execute.h:53535
#18 0x00007ffff2169c89 in tideways_xhprof_execute_ex ()
   from /usr/lib/php/20190902/tideways_xhprof.so
#19 0x000055555587de4b in ZEND_DO_FCALL_SPEC_RETVAL_USED_HANDLER ()
    at ./Zend/zend_vm_execute.h:1714
#20 execute_ex (ex=0x555555ce1fb0) at ./Zend/zend_vm_execute.h:53539
#21 0x00007ffff2169c89 in tideways_xhprof_execute_ex ()
   from /usr/lib/php/20190902/tideways_xhprof.so
#22 0x000055555587de4b in ZEND_DO_FCALL_SPEC_RETVAL_USED_HANDLER ()
    at ./Zend/zend_vm_execute.h:1714
#23 execute_ex (ex=0x555555ce1fb0) at ./Zend/zend_vm_execute.h:53539
#24 0x00007ffff2169c89 in tideways_xhprof_execute_ex ()
   from /usr/lib/php/20190902/tideways_xhprof.so
#25 0x000055555587de4b in ZEND_DO_FCALL_SPEC_RETVAL_USED_HANDLER ()
 at ./Zend/zend_vm_execute.Quit
CONTINUING

That is not very helpful. Thankfully, the PHP project provides a set of gdb macros which let one map the low-level C code to the PHP code that was being executed. They are provided in the source repository as /.gdbinit, and one should use the version from the PHP branch being debugged. Since we use PHP 7.4, I used the version from the latest 7.4 release (7.4.30 at the time of writing): https://raw.githubusercontent.com/php/php-src/php-7.4.30/.gdbinit

Download the file to your home directory (e.g. /home/hashar/gdbinit) and ask gdb to import it with the source command:

(gdb) source /home/hashar/gdbinit

This provides a few new commands to show PHP Zend values and to generate a very helpful stack trace (zbacktrace):

(gdb) zbacktrace
[0x7fffcb44b710] preg_match("\7^Z[1-9]\d*$\7u", "Z1002") [internal function]
[0x7fffcb44aba0] Opis\JsonSchema\Validator->validateString(reference, reference, array(0)[0x7fffcb44ac10], array(7)[0x7fffcb44ac20], object[0x7fffcb44ac30], object[0x7fffcb44ac40], object[0x7fffcb44ac50]) /srv/mediawiki-staging/php-master/vendor/opis/json-schema/src/Validator.php:1219 
[0x7fffcb44a760] Opis\JsonSchema\Validator->validateProperties(reference, reference, array(0)[0x7fffcb44a7d0], array(7)[0x7fffcb44a7e0], object[0x7fffcb44a7f0], object[0x7fffcb44a800], object[0x7fffcb44a810], NULL) /srv/mediawiki-staging/php-master/vendor/opis/json-schema/src/Validator.php:943 
[0x7fffcb44a4c0] Opis\JsonSchema\Validator->validateKeywords(reference, reference, array(0)[0x7fffcb44a530], array(7)[0x7fffcb44a540], object[0x7fffcb44a550], object[0x7fffcb44a560], object[0x7fffcb44a570]) /srv/mediawiki-staging/php-master/vendor/opis/json-schema/src/Validator.php:519 
[0x7fffcb44a310] Opis\JsonSchema\Validator->validateSchema(reference, reference, array(0)[0x7fffcb44a380], array(7)[0x7fffcb44a390], object[0x7fffcb44a3a0], object[0x7fffcb44a3b0], object[0x7fffcb44a3c0]) /srv/mediawiki-staging/php-master/vendor/opis/json-schema/src/Validator.php:332 
[0x7fffcb449350] Opis\JsonSchema\Validator->validateConditionals(reference, reference, array(0)[0x7fffcb4493c0], array(7)[0x7fffcb4493d0], object[0x7fffcb4493e0], object[0x7fffcb4493f0], object[0x7fffcb449400]) /srv/mediawiki-staging/php-master/vendor/opis/json-schema/src/Validator.php:703 
[0x7fffcb4490b0] Opis\JsonSchema\Validator->validateKeywords(reference, reference, array(0)[0x7fffcb449120], array(7)[0x7fffcb449130], object[0x7fffcb449140], object[0x7fffcb449150], object[0x7fffcb449160]) /srv/mediawiki-staging/php-master/vendor/opis/json-schema/src/Validator.php:523 
[0x7fffcb448f00] Opis\JsonSchema\Validator->validateSchema(reference, reference, array(0)[0x7fffcb448f70], array(7)[0x7fffcb448f80], object[0x7fffcb448f90], object[0x7fffcb448fa0], object[0x7fffcb448fb0]) /srv/mediawiki-staging/php-master/vendor/opis/json-schema/src/Validator.php:332 
<loop>

The stack trace shows the code entered an infinite loop while validating a JSON schema, until the point where it was stopped.
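
The interactive session could, in principle, be replayed non-interactively. The sketch below only assembles a command line from gdb's standard options (-batch, -x to source a file, -ex to run a command, --args to pass the program and its arguments); the helper name gdb_batch_command is hypothetical, and the author's session was interactive, so this is an alternative rather than what was actually run:

```python
def gdb_batch_command(gdbinit, program_argv, gdb_commands=("run", "zbacktrace")):
    """Build a gdb invocation that sources gdbinit, runs the commands, then exits."""
    argv = ["gdb", "-batch", "-x", gdbinit]
    for command in gdb_commands:
        # Each -ex command is executed in the order given.
        argv += ["-ex", command]
    # Everything after --args is the program to debug and its own arguments.
    return argv + ["--args", *program_argv]
```

The resulting list can be handed to subprocess.run or printed and pasted into a shell. Batch mode also disables paging, which avoids the pager prompt visible at the end of the bt output above.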

The arguments can be further inspected by using printzv and giving it an object reference as argument. For the line:

[0x7fffcb44aba0] Opis\JsonSchema\Validator->validateString(reference, reference, array(0)[0x7fffcb44ac10], array(7)[0x7fffcb44ac20], object[0x7fffcb44ac30], object[0x7fffcb44ac40], object[0x7fffcb44ac50]) /srv/mediawiki-staging/php-master/vendor/opis/json-schema/src/Validator.php:1219
(gdb) printzv 0x7fffcb44ac10
[0x7fffcb44ac10] (refcount=2) array:     Hash(0)[0x5555559d7f00]: {
}
(gdb) printzv 0x7fffcb44ac20
[0x7fffcb44ac20] (refcount=21) array:     Packed(7)[0x7fffcb486118]: {
      [0] 0 => [0x7fffcb445748] (refcount=17) string: Z2K2
      [1] 1 => [0x7fffcb445768] (refcount=18) string: Z4K2
      [2] 2 => [0x7fffcb445788] long: 1
      [3] 3 => [0x7fffcb4457a8] (refcount=15) string: Z3K3
      [4] 4 => [0x7fffcb4457c8] (refcount=10) string: Z12K1
      [5] 5 => [0x7fffcb4457e8] long: 1
      [6] 6 => [0x7fffcb445808] (refcount=6) string: Z11K1
}
(gdb) printzv 0x7fffcb44ac30
[0x7fffcb44ac30] (refcount=22) object(Opis\JsonSchema\Schema) #485450 {
id => [0x7fffcb40f508] (refcount=3) string: /Z6#
draft => [0x7fffcb40f518] (refcount=1) string: 07
internal => [0x7fffcb40f528] (refcount=1) reference: [0x7fffcb6704e8] (refcount=1) array:     Hash(1)[0x7fffcb4110e0]: {
      [0] "/Z6#" => [0x7fffcb71d280] (refcount=1) object(stdClass) #480576
}
(gdb) printzv 0x7fffcb44ac40
[0x7fffcb44ac40] (refcount=5) object(stdClass) #483827
Properties     Hash(1)[0x7fffcb6aa2a0]: {
      [0] "pattern" => [0x7fffcb67e3c0] (refcount=1) string: ^Z[1-9]\d*$
}
(gdb) printzv 0x7fffcb44ac50
[0x7fffcb44ac50] (refcount=5) object(Opis\JsonSchema\ValidationResult) #486348 {
maxErrors => [0x7fffcb4393e8] long: 1
errors => [0x7fffcb4393f8] (refcount=2) array:     Hash(0)[0x5555559d7f00]: {
}
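
With the pattern and the subject string both recovered, the innermost call is easy to replay in isolation. This is a small sketch using Python's re module as a stand-in for PHP's preg_match (an assumption made for illustration; the two engines differ in details such as how \d treats non-ASCII digits, which does not matter for this input):

```python
import re

PATTERN = r"^Z[1-9]\d*$"  # the "pattern" property shown by printzv
SUBJECT = "Z1002"         # the subject string passed to preg_match


def matches(pattern, subject):
    """Return True when the pattern matches the subject."""
    return re.search(pattern, subject) is not None
```

Calling matches(PATTERN, SUBJECT) succeeds immediately. A pattern and input this small suggest the regular expression itself is unlikely to be the culprit, and that the repeated validateSchema frames in the PHP stack are the more interesting lead.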

Extracting the parameters was enough for the WikiLambda developers to find the immediate root cause: they removed some definitions which triggered the infinite loop and manually ran a script to reload the data in the database. Eventually the Jenkins job managed to update the wiki database:

16:30:26 <wmf-insecte> Project beta-update-databases-eqiad build #69029: FIXED in 10 min: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/69029/

One problem solved!


Greetings, fellow knowledge enthusiasts and advocates of open access! We are thrilled to announce an exciting new project that aims to foster a thriving community of Open Culture Practices for GLAM (Galleries, Libraries, Archives, and Museums) institutions in Nigeria, promoting the sharing, distribution, and access of cultural heritage materials for the benefit of all.

In Nigeria, the concept of open culture and open licensing is relatively new and remains largely untapped, with many GLAM institutions still functioning as physical spaces with restrictions to their collections outside of the four walls of their buildings, limiting their potential reach and impact. This has posed significant challenges for the Creative Commons Nigeria Affiliate, which seeks to encourage the use of Creative Commons (CC) licensing and open models for the dissemination of open knowledge.

According to various studies conducted by researchers such as Okoro (2016), Oluruntimilehin (2020 and 2022), Chuma-Okoro and Ogbeidi (2021), and Ikem (2022), the hurdles to promoting open access in Nigerian cultural institutions are manifold. They include a lack of awareness and understanding of open practices, scarce funding for digitization projects, legal constraints, and limited technical support. Additionally, misconceptions about CC and Open Educational Resources (OERs) have fueled skepticism in the publishing industry, with some fearing revenue loss.

Amidst these challenges, the Creative Commons launched a call for case studies on open access in cultural institutions, revealing that knowledge of open practices in GLAMs is relatively low in Nigeria. This prompted the formulation of a comprehensive proposal focused on Building Open Culture Community for Sustainable Open Licensing Practices in Nigeria through advocacy, awareness campaigns, and capacity building as part of the Creative Commons Activity for 2023.

The main goal of this project is to promote open access and licensing of GLAM collections in Nigeria using Creative Commons licenses, attributions, and open platforms like Wikipedia and Wikimedia Commons. By increasing awareness and understanding of open practices among GLAM professionals, we aim to boost the adoption of open access policies and practices in these institutions. Moreover, through extensive digitization efforts and the utilization of appropriate CC platforms and tools, we seek to enhance the online visibility and accessibility of cultural heritage materials and library collections.

To achieve these ambitious goals, the project will encompass several key activities:

  1. Online Advocacy: Using e-posters and social media platforms such as Twitter and Facebook, we will push for Open Culture Practices and the use of CC tools and licensing, aiming to elevate the importance of open practices in the policy agenda of GLAM institutions.
  2. Network of Open Culture Practices Advocates: We will establish a dedicated network of GLAM professionals who are passionate about open access to foster collaboration, shared learning, and advocacy for openness in GLAM institutions.
  3. Roundtable Discussion: Bringing together 30 heads of GLAM institutions, policymakers, managers, and representatives, we will engage in a constructive dialogue to identify challenges and devise a roadmap for promoting open GLAM practices. The insights gained during this roundtable discussion will be published in a report titled “Open Culture Practices in Nigeria: The Way Forward.”
  4. Workshops and Training: A series of in-person and online workshops will be organized for 30 selected GLAM professionals. These sessions will focus on open practices, leveraging existing open platforms for knowledge dissemination, and the use of appropriate CC licenses to enhance the online presence of GLAM collections using Wikimedia Commons and Wikipedia. Additionally, practical training on using Wikidata will promote linked open data among GLAM institutions in Nigeria.

We firmly believe that this initiative will be a game-changer for the cultural heritage landscape in Nigeria. By uniting GLAM professionals and stakeholders in a community of open-mindedness, we can pave the way for transformative change in the accessibility and preservation of our rich cultural heritage.

We invite you all to embark on this incredible journey with us as we strive to build a more inclusive, open, and connected world of knowledge. Together, we can bring about positive change and ensure that our cultural heritage remains accessible for generations to come.

Stay tuned for updates and progress reports as we embark on this noble endeavor! Let us embrace the power of open access and propel Nigerian GLAMs to the forefront of the global open knowledge movement.

Project Team

Bukola James (project lead): is a fellow of Wikipedian in African Libraries and the first Certified Trainer of the Reading Wikipedia in the Classroom program in Nigeria. She has led several projects, including WikiLovesLibraries Nig., Reading Wikipedia in the Classroom Kwara, Nigeria, 1Lib1Ref Kwara 2022, WikiGLAM Awareness for Librarians in Kwara State, and Wikidata for Libraries and notable Librarians in Kwara State. She is a co-organiser of Wikipedia awareness for Library and Information Students in Nigeria and of Wikidata for novelists and their novels, and the founder of the Wikimedia Fan Club, Kwara State University. She is currently a Special Advisor for the Wikipedia +Education User Group, the Regional Liaison Sub-Saharan Africa, Let’s Connect Team, and community coordinator for CfA WiR Community.

Linason Blessing (facilitator): is an experienced Wikibrarian who has participated in various Wikimedia projects such as Wiki Loves Africa, WPWP, and Wikidata (Media personalities in Nigeria), and was a winner of WPWP Kwara and Wikidata Media personality in Nigeria. She has facilitated several trainings, including WikiGLAM Awareness for Libraries and Librarians in Kwara, Wikidata for Libraries and Librarians in Nigeria, 1lib1ref 2022 Kwara, Wiki Loves Libraries, and the first Reading Wikipedia in the Classroom Nigeria. She led the Wikipedia awareness for Library and Information Science Students in Nigeria.

Miracle James (facilitator): Mijesty is an experienced Wikipedia volunteer who has participated in several Wikipedia projects in Nigeria. She has facilitated several Wikipedia trainings, such as Wiki Loves Libraries and Promoting Nigerian books and authors on Wikipedia, and she led the recently concluded Wikipedia awareness in Offa community, among others.

Rhoda James (facilitator): is an experienced Wikimedian who has participated in various Wikimedia projects as a facilitator as well as an editor. Some of these projects include Wiki loves SDGs, WikiGLAM, 1lib1ref, Wikidata for Libraries and Librarians in Nigeria, Wikidata for Novels and Novelists in Nigeria, WikiLovesLibraries, Reading Wikipedia in the Classroom for Secondary School students, Promoting Nigerian Books and Authors, Celtic Knot Kwara, and more.

Barakat Adegboye (facilitator): is a Wikimedia volunteer from Nigeria who has gained extensive experience as a project lead, facilitator and participant in various Wikimedia projects. Some of these projects include Let’s Connect Peer Learning Program in Kwara, Reading Wikipedia in the Classroom programs in Kwara, Nigeria, Wiki loves SDGs, WikiGLAM, 1lib1ref, Wikidata for Libraries and Librarians in Nigeria, Wikidata for Novels and Novelists in Nigeria, WikiLovesLibraries, Promoting Nigerian Books and Authors, Celtic Knot Kwara, and more. She is also a trained educator and the General Secretary of the University of Ilorin Wikimedia Fan Club.

Dr. Ngozi Osuchukwu (Facilitator): is a certified Librarian and Wikipedian: Reading Wikipedia in the Classroom. She is the Coordinator of Wikimedia Anambra Network. She has facilitated several Wikipedia trainings, including AfLIA Basics of Wikipedia Training, Awareness and Campaign on #1Lib1Ref in Anambra State Nigeria, and World Cancer Day 2023 Edit-a-thon, among others. Ngozi has presented and published papers on Wikimedia on local and international platforms.

Omorodion (Facilitator): is an experienced Librarian and Wikimedian from Nigeria. He has organized and facilitated several Wikimedia events in Nigeria, including the Wikipedia Train-the-Trainer Workshop for Librarians in Edo State, Nigeria, a seminar on Linked Open Data and Wikibase for Nigerian Librarians, Edo History Write-a-thon, and Wiki for Climate and Environmental Literacy in Nigeria. Omorodion is a certified Wikimedia Campaign organizer and Co-founder of the Port Harcourt Wikimedia Hub. He has also published over 40 scholarly works in reputable outlets around the world.

Read this post in Spanish on our website.

It is a pleasure for Wikimedistas de Uruguay to announce that we have been approved to join the Digital Citizenship Working Group (GTCD) of the Agency of Electronic Government and Information and Knowledge Society (AGESIC).

AGESIC is in charge of coordinating this group of organizations committed to digital citizenship, which includes actors such as Unesco Uruguay, UNICEF and other multilateral agencies, public, academic and civil society organizations.

This is a special moment for the GTCD, as it is in the midst of the review process of the Digital Citizenship Strategy for an Information and Knowledge Society, published in 2020. The United Nations defines digital citizen participation as “the process of engaging citizens through ICTs in policy formulation, decision-making and service design and delivery, in ways that are participatory, inclusive and deliberative” (United Nations, 2013). The GTCD Strategy established three lines of work linked to governance, capacity building and research around digital citizenship.

Our incorporation into the GTCD is part of a strategic line of action of Wikimedistas de Uruguay. Since we started our work in 2021, we have been developing actions around digital citizenship, from training workshops on Wikipedia and digital citizenship in partnership with the Department of Wellbeing and Digital Citizenship of Ceibal, to the recent closing panel of the Wiki for Human Rights campaign on Green Citizenship in digital spaces, among other activities. We hope that participation in the GTCD will allow us to align and enhance efforts in the promotion of digital skills for the exercise of citizenship.

Wikipedia is truly inseparable from Johnny Au’s life. This English Wikipedia editor has been editing each day since November 11, 2007, which makes him the Wikimedian with the longest editing streak on English Wikipedia. This month we celebrate his amazing work and his longtime and continuous dedication to free knowledge.

A yellow barnstar leaving a yellow trail as it flies across a white background.

Johnny Au started contributing to Wikipedia in 2006, and at some point editing became a part of his everyday life. “I have a habit of editing every day. I edit when I wake up and I edit before heading to bed to ensure that I maintain my editing streak,” he says.

Each day Johnny checks his extensive watchlist, which includes articles related to his beloved hometown, Toronto. From local sports teams and art galleries to the Toronto transit system, Johnny is passionate about all things Toronto and carefully watches over the Wikipedia articles about them. His favorite is the article about Toronto subway public art, because he really enjoys looking at the subway system’s artworks.

He specializes in minor maintenance edits: correcting spelling and language, reverting vandalism, adding images and correcting mistakes. This kind of tireless, detailed, regular work is important for maintaining Wikipedia’s quality. And it hasn’t gone unnoticed: Johnny’s dedication to keeping the Toronto Blue Jays article vandalism-free even caught some media attention and resulted in an article in the Toronto Star.

The daily habit of editing, a pure love for Wikipedia, and a passion for free knowledge are what keep him going, day after day, for 15 years (and counting). He makes around 10 edits each day, which adds up to an impressive 64,000 edits on English Wikipedia and more than 5,700 days of continuous editing.
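The streak figures above can be cross-checked with a little date arithmetic. The following is only a rough sketch: it counts 11 November 2007 as day 1 of the streak (an assumption), and the per-day average mixes in edits made before the streak began, so it is an approximation rather than an exact rate.

```python
from datetime import date, timedelta

# Figures quoted in the post
streak_start = date(2007, 11, 11)
total_edits = 64_000
streak_days = 5_700

# Rough average; total_edits also includes edits made before the
# streak started, so this slightly overstates the daily rate.
edits_per_day = total_edits / streak_days

# Calendar date on which the streak reached 5,700 days,
# assuming the start date itself counts as day 1.
day_5700 = streak_start + timedelta(days=streak_days - 1)

print(f"about {edits_per_day:.1f} edits per day")
print(f"day {streak_days} of the streak fell on {day_5700.isoformat()}")
```

The average comes out slightly above the "around 10 edits each day" mentioned in the post, which is consistent with the rounded numbers given.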

Johnny Au, photo by Johnny Au, CC BY-SA 3.0

Johnny’s editing streak was not always free of threats and challenges. It came close to a screeching halt at a seminal moment in Wikipedia’s history, the 2012 blackout. Due to a decision of the Wikipedia community, who protested against legislation seen as potentially harmful to their work, English Wikipedia was not accessible for 24 hours – not only for reading, but also for editing. Fortunately, the time zone differences helped Johnny maintain his editing streak. The blackout started at midnight Eastern Standard Time, while Johnny’s streak was based on the UTC time standard. The five-hour difference between the two gave our constant editor a chance to make his daily edit after all.
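The time zone arithmetic can be sketched in a few lines. This is an illustration under stated assumptions: the post only says "the 2012 blackout", so the date used here (18 January 2012) is filled in from general knowledge, EST is modelled as a fixed UTC-5 offset, and `editable_hours` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

# Eastern Standard Time as a fixed offset (UTC-5); no tz database needed.
EST = timezone(timedelta(hours=-5), "EST")

# Assumed blackout date; the post itself only says "the 2012 blackout".
blackout_start = datetime(2012, 1, 18, 0, 0, tzinfo=EST)
blackout_end = blackout_start + timedelta(hours=24)

start_utc = blackout_start.astimezone(timezone.utc)
end_utc = blackout_end.astimezone(timezone.utc)


def editable_hours(utc_day_start):
    """Hours of the given UTC calendar day that fall outside the blackout."""
    utc_day_end = utc_day_start + timedelta(days=1)
    overlap_start = max(utc_day_start, start_utc)
    overlap_end = min(utc_day_end, end_utc)
    blocked = max(overlap_end - overlap_start, timedelta(0))
    return (timedelta(days=1) - blocked) / timedelta(hours=1)


for day in (18, 19):
    utc_day = datetime(2012, 1, day, tzinfo=timezone.utc)
    print(utc_day.date(), editable_hours(utc_day), "hours open for editing")
```

Because the 24-hour blackout straddles two UTC calendar days, each of those days keeps a window in which an edit could still be made, which is what saved the streak.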

While Toronto remains Johnny Au’s most beloved editing topic, it is not the only thing he is passionate about. As a true Wikimedian, Johnny has a wide range of interests: he is a piano player, a science enthusiast, and a person interested in linguistics, cartography, public transport, astronomy and sports. He is also developing an interest in digital art and in photographing buildings and other structures under construction in Toronto.

He would like the world to know that Wikimedia projects are much more reliable than the vast majority of websites, despite being user-generated. “There are plenty of people who distrust Wikimedia projects for ideological reasons (especially during the COVID-19 pandemic and the 2020 American presidential election), preferring completely bogus clickbait websites filled with misinformation and disinformation. It is a major challenge to convince others that Wikimedia projects are not ideologically biased,” he says. At the same time, the community has proven resilient to misinformation, both during the pandemic and in its preparation ahead of contentious elections.

When asked about advice to share with other editors, he says: “Never give up. Fight the good fight. We must fight against misinformation and disinformation.”

Thank you for your work, Johnny, and we wish you 100 years (and more) of daily edits!

In Nelson Mandela’s seminal autobiography, Long Walk to Freedom (1995), he stated: 

“One cannot be prepared for something while secretly believing it will not happen.” 

With climate change impacting weather patterns and environmental degradation rife across the African continent, how can Africa’s populations prepare if they don’t even know what is happening and what to do about it? 

In 2019, the Afrobarometer report found that only 28% of Africans (in the 34 countries surveyed) were aware of climate change as a concept. This worrying lack of knowledge is partly due to the scarcity of factual information about climate, weather, and climate change, which Africa’s media hardly ever carry. Wikipedia is one of the few media platforms that people across Africa can access for free. Yet the information about climate, climate change, and environmental impact related to Africa, where it exists at all, needs to be updated.

In March 2023, Wiki In Africa’s Florence Devouard and Isla Haddow-Flood, in partnership with Wikimedia Usergroup Côte d’Ivoire (WMUG CI), were tasked by the African Knowledge Initiative (an African Union and Wikimedia Foundation partnership) to develop and manage an awareness and content contribution WikiFocus to tackle this monumental continental oversight. The two-month-long African Environment WikiFocus was designed to inspire Africa’s Wikimedia communities to add updated, factual content that fills the gaps in climate and environment knowledge on the Wikimedia projects.

The African Environment WikiFocus launched on the 3rd of March, a day set by the African Union to celebrate one of Africa’s most stalwart and inspiring environmentalists, Wangari Maathai.

Wangari Maathai famously said: “You cannot protect the environment unless you empower people, you inform them, you help them understand that these resources are their own, that they must protect them.”

With Wangari Maathai’s words echoing in our hearts, we released a bi-lingual call to Africa’s Wikimedians, offering microgrants to support training and contribution events and inviting jurors to assess the applications. We received 62 microgrant applications and 49 expressions of interest in being a jury member.

The seven-member jury faced the challenge of whittling down the final selection to 17 microgrant-funded communities in 13 countries. 

Harnessing the Power of WikiFocus to Inspire Environmental Action in Africa

The bi-lingual (English and French) Africa Environment WikiFocus was centrally organized online around the Africa Environment portal on Meta. Activities occurred across Africa through online and on-the-ground activities of 16 microgrant-funded communities (one withdrew). 

“The Africa Environment WikiFocus has been a game-changer in working with the Wikimedia projects to raise awareness about environmental issues and promote sustainable open practices in the region,” says co-organiser Florence Devouard. “Its success is a testament to the power of collective action and the importance of knowledge-sharing in creating a better future for our planet.” 

The 16 communities hosted over 28 events in 13 African countries (Botswana, Benin, Cameroon, Côte d’Ivoire, Democratic Republic of Congo, Ghana, Guinea, Kenya, Malawi, Nigeria, Rwanda, Tanzania, Togo and Zimbabwe). The selected African Environment WikiFocus applications received microgrants totaling 20,125 USD.

“I also got to learn about different individuals because we worked with environmentalists and climate change individuals fighting to protect environments and help keep the environment safe, ranging from a 20-year-old who initiated the school strike initiative. She’s from Osun State. The ‘school strike’ initiative has to do with the fact that every Friday, they will skip classes to go and do a kind of demonstration, telling our governments how to help protect our environment. Every Friday, they come together and do a kind of demonstration. So, it’s very interesting.” — Obiageli Ezeilo (Local Organiser, Nigeria)

Wiki In Africa hosted three online training webinars, two online office hours, and three local organizer meetups online (with the mid-term organized by WMUG CI) to support skills acquisition and experience sharing, provide visibility, and support the activities of the micro-grantees.

There were three external partners (one media partner, two activism partners). A compilation of 51 lists, reports, and references in both English and French and 12 red lists reinforced the factual contributions of each community and individual participant.

“It was an eye-opener because I discovered the range of ecosystems, conservation efforts, and some other sustainable practices being implemented across the continent. I also got to know more about the connections between the environment and the community and the importance of preserving our natural resources, particularly in Africa. You know, it’s really been an exciting one, and it’s really exposed me to a lot of ideas and opportunities.” — Kingsley Nkem (Nigeria)

The African Environment WikiFocus far exceeded expectations. The 17 communities that received microgrants hosted around 400 editors in creating or adding to around 10,500 articles and 2,579 references. There were around 4,500 media uploads to Wikimedia Commons – this figure excludes the 12,961 Wiki Loves Africa contributions under the theme of Climate & Weather.

“I realized that there wasn’t really an article that talked about the climate or the air here, for example. So you can imagine. And today, through this campaign, my greatest joy is to see that when you type on Wikipedia: “Climate of the Democratic Republic of the Congo,” you already have an article presenting the climate of the DRC. So it’s a real pleasure to have articles that talk to you about climates—the climate of our capital, Kinshasa, for example—so it’s a real pleasure for us.” – Valentin Nasibu (D R Congo)

Just a few of the examples of articles created or improved as part of the WikiFocus include these articles in French: 

Just a few of the examples of articles created or improved as part of the WikiFocus include these articles in English: 

Finally, many articles were created in local language Wikipedias across Africa, including these:

Extending the visibility of the project through collaborative synergies

Aligning the focused activities of existing Wiki In Africa initiatives with the African Environment WikiFocus was pivotal in its overall success. WikiAfrica Hour, the live monthly Wikimedia discussion show, was the platform for the launch event (watch the launch on YouTube).

The Wiki Loves Women annual Tell Us About Her image description drive focused on images depicting Women on the Frontline of Climate. This drive resulted in 28 contributors adding 43,835 descriptions to 2,076 images on Wikimedia Commons.

Finally, Africa’s most potent annual photo contest, Wiki Loves Africa, received 12,961 images, video, audio, diagrams, and photo essays depicting Climate & Weather, contributed by 804 photographers (57% new contributors to the Wikimedia projects). 

The intentional alignment by Wiki In Africa between the two projects – Wiki Loves Africa and the Africa Environment WikiFocus – saw the addition of two extra prizes to Wiki Loves Africa 2023. The Africa Environment WikiFocus has sponsored two Wiki Loves Africa media categories for the best representation of the impact of climate change: the best video and the best photo essay. The winners will be announced at Wikimania 2023 in August.

A great beginning – so much still to be achieved

Africa is one of the regions most vulnerable to the impacts of climate change. The continent is responsible for only around 4% of global greenhouse gas emissions. Still, the United Nations has warned that climate change will disproportionately affect Africa. Expected impacts include increased frequency and severity of droughts, floods, and other extreme weather events that can damage crops and infrastructure and cause food shortages. The United Nations Development Programme (UNDP) has released estimates that climate change could reduce crop yields in Africa by up to 30% by 2050.

African Environment WikiFocus is just the beginning! The human and environmental cost of the unusual and extreme weather patterns in the first part of 2023 alone has shown that climate information is indeed a priority. Wiki In Africa is looking forward to ensuring it happens again next year, on a larger scale and with greater impact. We look forward to you joining us and other open movement initiatives focused on making a difference!

Before they knew about Wikimedia Foundation grants, outreachers in Japan used to pay for train tickets and accommodation on their own. The majority of them live in the wider Tokyo area. The public libraries and local governments that invited them did not always cover transportation expenses.

So it was very hard to do outreach in remote areas. But WMF’s grants have changed this situation. Let me show you an example of how a Rapid Fund contributed to outreach on a smaller island.

Place: a junior high school (age 13 – 15)

Audience: 120 students + teachers

Date: Thursday, 6 July 2023, 13:50–14:40 and 14:50–15:40 (100 minutes)

The outreacher: Racco

The important things Racco wanted the students to consider were shared at the beginning:

1, Always be conscious of where the information comes from.

2, Never hurt anyone, including yourself.

3, Your edit must help someone and yourself.

The flow

1, The students formed 12 groups.

2, Racco showed 12 themes about the village, On-na, which were prepared in advance

3, Each group researched its theme using the internet and books.

4, When the students’ outcomes were appropriate for Wikipedia, Racco edited them in the classroom.

The setup

Teachers said in advance that the students were too shy to speak up in the classroom. So to make the session interactive, Racco recommended that the students use the tablets, which were connected to a projector. Each tablet was shared by a few students.

In detail

For beginners, Racco always first shares the importance of information literacy using examples which are familiar to the participants (in this session, customized for students ages 13 – 15), before explaining “What is Wikipedia?”:

  • Quizzes: using Pokémon and others, “Why do you get knowledge?”
  • Deepfake+AI→some movies of a famous person’s body with another famous person’s head
  • A beautiful young girl, but the reality is a middle-aged man (you NEVER know who the person is)
  • “Do you eat food on the road?”  “No”
  • “Have you checked the notice on the food package you would buy in a market?”  “Yes”
  • “Do you trust something on the internet?”   “Yes…No…”
  • The point of view → Google Chrome: z=abs(-y/sqrt(1-x*x)), x is from -1 to 1, y is from -1 to 1, z is from 0 to 1 (change the viewing angle and you’ll get a different view)
  • Time changes evaluation→fossils
  • Expressions “for” misunderstanding→e.g. the amount of VitaminC on a food package
  • A part-time job to click “like”
  • Titles of net news (attention economy)
  • How degenerates find you. Teenagers are targets for perverts.
  • Even if you think you deleted it, it stays on the server.
  • When you think “I don’t like it”, others also think so.
  • Why artists these days hide their profiles.
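The surface from the “point of view” item in the list above can also be evaluated numerically. The snippet below is a minimal sketch, not part of the workshop material: the function names are hypothetical, and clipping the height to 1 is an assumption made to mirror the stated plotting range “z is from 0 to 1”.

```python
import math


def z(x, y):
    """Height of the surface z = abs(-y / sqrt(1 - x*x)).

    Only defined for -1 < x < 1; at x = -1 or x = 1 the
    denominator is zero and a ZeroDivisionError is raised.
    """
    return abs(-y / math.sqrt(1.0 - x * x))


def z_clipped(x, y, z_max=1.0):
    """Same surface, cut off at z_max (assumed plotting range 0..1)."""
    return min(z(x, y), z_max)


# Sample a few points inside the stated ranges for x and y.
for x, y in [(0.0, 0.5), (0.6, 0.4), (0.9, 1.0)]:
    print(f"x={x:+.1f}, y={y:+.1f} -> z={z(x, y):.3f}, clipped={z_clipped(x, y):.3f}")
```

Near x = ±1 the unclipped height grows without bound, so the shape a viewer sees depends heavily on the chosen range and viewing angle, which fits the point the list item makes.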

Wikipedia

  • Access ranking of Japanese Wikipedia top 15
  • Differences from Google and others
  • “share knowledge”
  • Open-data
  • Other Wiki-projects
  • What is an encyclopedia
  • Who edits
  • Donation→The amount of donations 2021 calendar year=The local beer company’s sales
  • Trust→citations and links
  • Free copies
  • Many languages→your contributions could be translated into other languages

“You’re in this classroom with air-conditioning and an internet connection. Let’s think of schools without electricity. You can bring an encyclopedia to students in such an environment via Wikipedia. Equal opportunity, quality education.”

That evening, a two-hour session was held for adults:

Place: On-na Village Information Center

Audience: 10 local people including library staff

Contents:

Some of the content was the same as for the students, but for this group Racco customized it and explained how to understand AI, including ChatGPT, objectively.

Japanese Wikipedia pages related to On-na Village: some of them had no links, and some had been left unfinished for a long time. (Racco picked them out in advance.)

What are appropriate links?

Other people may edit what you edited.

The difference between adults and digital natives.

That concludes my Diff report. This blog post is published with the permission of the outreacher and was supported by JNakayama (WMF). This Rapid Fund covers 6 places, including this village.

Thank you to all the Wikimedians. It is such an honor, and I am very happy to be here, because I spent my childhood on another small, isolated island.

Yumiko Shibata


Announcing the 2023 Community Insights Report

Thursday, 27 July 2023 14:49 UTC

Measuring knowledge gaps through representative survey data of active editors

This month, we published the 2023 Community Insights Report on Meta-Wiki.

This report uses data from the 2022 Community Insights survey, which collected responses from a representative random sample of active editors across Wikimedia projects from June through September 2022. Some of the goals of the Community Insights survey are to gather estimates about the demographics of contributors to understand whether we are progressing towards goals such as the 2030 Strategic Direction’s aim towards Knowledge Equity, and to understand how users across roles, identities, and the globe experience contributing on-wiki.

In introducing the 2023 report, we’ll focus on Knowledge Equity, Knowledge Gaps, and how the Community Insights survey helps us to think about and measure aspects of these concepts.

What is Knowledge Equity?

Knowledge Equity is the concept the Wikimedia Foundation uses to talk about the structural inequalities which exist in the world and how they present within and reflect onto our digital ecosystem of free knowledge. If our aim is to have encyclopedias that reflect the sum of all knowledge, where anyone who shares in our vision will be able to join us, knowledge equity is the goal of counteracting those inequalities to “ensure a just representation of knowledge and people in the Wikimedia movement.”

To understand whether we’re moving towards knowledge equity, we must first identify and measure the presence of Knowledge Gaps, i.e., the disparities in representation of specific groups of readers, contributors, or content topics in Wikimedia spaces. Using gender as an example, where about 50% of the world is made up of women, knowledge gaps can be applied to 

  • content, where we estimate that around 80% of articles about people on large Wikipedias are about men; 
  • readership, where we estimate 72% of pageviews are generated by men; and 
  • contributors, where we estimate that 80% of active editors self-identify as men based on the Community Insights data. 

These data points reflect the inequities of the world, from a history of knowledge perspective on who is deemed important enough to be represented in the historical record and in which ways, to a time-use perspective on who has the time to devote to reading, writing and editing the sum of all knowledge. But they also give us something to aspire to – by knowing a little about ourselves through data, we can start to ask whether and how we can create an environment that can counteract or circumvent these global inequalities and nourish a sustainable movement where anyone can join us – and stick around. 

What can the data tell us?

As you read over the 2023 Report, we invite you to think about the intersections of differences in on-wiki experience across demographics and roles, and how that shapes our ability to reach knowledge equity as a Movement. 

To get us started, we offer an interpretation of gender and on-wiki experience. 

While women are still markedly underrepresented in our Movement, newcomers are substantially more likely to identify as women and with a trans or non-binary gender identity than contributors who have been around longer. 

Active editors who engaged in organizing activities were likewise more likely to identify as women (approximately 30%) compared to those in other editor roles, underlining that we need to look across data to understand where and how women are engaged in our Movement. Not doing so risks reinforcing gender stereotypes and minimizing the contributions and impactful work of women in our ecosystem.

However, women, gender diverse people, and those who preferred not to identify their gender were overall more likely to indicate they’ve felt unsafe or uncomfortable contributing on-wiki in the 12 months preceding the survey. Women were also more likely to respond that they have been harassed in a Wikimedia space compared to those who did not identify as women.

While reaching gender parity can be a long-term aspiration, understanding differences of experience across gender identities today can help us to think about how to make Wikimedia spaces more welcoming and accessible for everyone. This brings an opportunity to ensure that the diverse newcomers who join our Movement stay around and continue to share free knowledge for generations to come. 

What’s next?

In 2022, we shortened the Community Insights survey to reduce the burden on our respondents while retaining the key questions which we have tracked over time to identify contributor knowledge gaps and measure community health across roles and wiki projects. 

This year, we aim to continue improving the survey experience based on respondent feedback and increase its impact by including the Community Insights data and results in other tools or decision-making processes. For example, we expect that in 2024 the gender of contributors in the Knowledge Gap Index will be measured based on the Community Insights survey. We also expect to utilize the data and insights from this survey to support WMF in developing a metric for measuring contributors’ health, in line with WMF’s annual plan goals.

Our goal is to distribute the next Community Insights survey in March 2024.

Reach out to us on our talk page or email us privately at surveys@wikimedia.org for any questions, comments, or feedback.

Building Wikimedia Movement Hubs Together

Thursday, 27 July 2023 13:20 UTC

Following the publication of the Global Council draft, which started our third round of community conversations on Meta-Wiki, we are happy to share the next draft chapter of the Wikimedia Movement Charter, which focuses on Hubs, for your review and feedback. This draft is accessible in Arabic, Chinese, French, German, Hindi, Indonesian, Brazilian Portuguese, Russian, Spanish, and Ukrainian. We are excited to continue this journey with you all. Share your feedback on the Talk page, join our upcoming conversation on July 30, or speak with us at an upcoming regional or thematic community event.

Note-taking during the MCDC Meeting in Utrecht June 2023. CC BY-SA 4.0 via Wikimedia Commons

The Origins of Hubs

The idea of a Wikimedia hub most recently comes from the 2030 Movement Strategy recommendation “Ensure Equity in Decision Making”. Hubs were envisioned as agile and flexible structures to convene communities and affiliates and to empower them with capacities and resources. At their core, Hub structures are about sharing resources and moving decision-making closer to communities. 

Currently, within the movement, there are various conversations taking place at different maturity levels; for example, a pilot hub is developing in the CEE region, while scopes are being discussed in the MENA and LATAM regions, among others. Discussions on Hubs grounded in the Movement Charter and combined with the lived experiences of Wikimedians involved in their piloting and experimenting will be an excellent base for future collaborations to build more intentional and inclusive support structures together.

Hubs Draft content 

The Hubs draft chapter presents the Committee’s attempt to define the purpose of these structures. It articulates their goal of being support structures that empower Wikimedia communities to make and implement their own decisions, fostering knowledge sharing, mutual learning, and collaboration across the movement. 

The Hubs chapter elucidates the core responsibilities by providing guidelines. It recognizes the immense potential of local contributors in understanding and addressing thematic or region-specific challenges. Hubs are encouraged to undertake projects that empower communities and provide support where it was not available previously. The chapter also delves into conversations on fundraising and funds dissemination as well as setup and governance processes.

We invite everyone to let us know what you and your community think about these concepts. Given previous conversations on hubs such as the Hubs co-creation workshop in November 2021, Hubs Dialogue, Hubs Global Conversations in March 2022 and Hubs Global Conversations in June 2022, we are especially interested in hearing from the hub organizers from different communities. 

Participants in an online Hubs discussion in January 2021. CC BY-SA 4.0, via Wikimedia Commons.

How can you share feedback?

The Movement Charter Drafting Committee recognizes the significance of collective wisdom in shaping the Hubs structures. Therefore, everyone in the Wikimedia movement is invited to participate and provide feedback on any part of the draft. Your input is vital in refining the Hubs concept to ensure it resonates with the diverse needs of our communities globally. We hope that it will be something that will empower communities and open many more doors of opportunities. 

You are invited to share feedback on the Hubs draft Talk Page. In addition, we invite you to the open community conversation, a.k.a. the MCDC Launch Party, to celebrate our work so far and exchange ideas on July 30 at 14.00. Please let us know if you need language support by signing up here for the Zoom call. You can also sign up on the Movement Strategy Forum to receive a notification 30 minutes before the call.

Movement Charter Community Conversations Support Grants 

Are you interested in organizing a community gathering with your Wikimedia friends and colleagues on the topic of hubs? Do you need some financial support to cover basic expenses? Modest support grants are available to help you. Please check out the grant page to apply by July 30 and do not hesitate to reach out to the Regional Specialists of the Movement Communications team if you have any questions or require further assistance.

MCDC Meeting in Utrecht – June 2023 Wrap-up of Discussions. CC BY-SA 4.0, via Wikimedia Commons

César de Paço’s lawsuit against the Wikimedia Foundation is an ongoing legal case in Portugal that raises serious concerns about privacy and free expression. We are concerned that this is a strategic lawsuit against public participation (SLAPP) designed to suppress well-sourced public information. We are fighting this case for two reasons: 1) to protect the user data of volunteers contributing to political biographies; and, 2) to set an important precedent protecting the ability to write biographies of living persons.

A photograph of a Portugal border sign, which features the country's name on the blue field and surrounded by the yellow stars of the European Union flag, and has some weathered stickers and graffiti on it
Portugal border sign by the international bridge at Valença do Minho, Viana do Castelo district. Image by Fernando Losada Rodríguez, CC BY-SA 4.0, via Wikimedia Commons.

Over the last several months, the Wikimedia Foundation has faced a lawsuit from an individual named César de Paço (also known as Caesar DePaço), which presents a threat to the Wikimedia projects and users in Portugal. 

The case started in August 2021 with a complaint from de Paço about the Portuguese and English language versions of the articles about him. These contain information about his right-wing political affiliations and past criminal accusations, topics that had been reported in reliable sources as publicly relevant. The lawsuit went to court in Portugal, and the Foundation won the preliminary case. Like most courts around the world, the lower court’s decision protected the ability of volunteers to research and write about notable topics, including biographies. However, the case took a strange turn on de Paço’s appeal. We are filing a series of appeals of our own in Portugal to protect the safety of users who contribute accurate and well-sourced information on important topics to Wikipedia. In our 5 July filing, we asked the Portuguese appellate court to refer several important legal questions to the Court of Justice of the European Union (CJEU). However, the Portuguese court ruled against us on 13 July, and demanded that the Foundation turn over personal data about multiple users who worked on the article.

We believe this latest ruling was not in line with European law. In this case, the information about de Paço is relevant to ongoing Portuguese and global politics, and the information about him as a person is something that recent reliable Portuguese sources as well as Portuguese users on Wikipedia viewed as relevant and important to the public interest. We think the users were right and that their work needs to be protected. We are continuing to explore legal options to oppose the court’s ruling—and will update this blog post with more detail as the case develops.

Societies whose governance is grounded in universal human rights standards try to strike a balance between the right to freedom of expression and the right to privacy. Rules about defamation and information privacy exist to ensure this balance: A person who writes accurate, well-sourced, publicly relevant information is protected, while a person who deliberately writes false or misleading information that significantly hurts someone’s reputation can be sued for doing so. The law works differently for neutral website hosts, such as the Wikimedia Foundation, which hosts Wikipedia but, importantly, doesn’t write, commission, or edit the articles. A neutral website host can be sued when it has notice that a user has added illegal content to its website, and refuses to remove that content.

When people try to abuse the law in order to censor accurate, important information, it’s referred to as a SLAPP: a strategic lawsuit against public participation. This sort of lawsuit is designed to exploit the law, and censor people who are providing information important and valuable to public discourse. In this case, we’re concerned that’s exactly what de Paço is doing. 

De Paço sued the Foundation in relation to the content of the articles about him. These articles state that reporting in Portuguese sources, including SIC (a well-known Portuguese television network), had indicated associations between de Paço and the far-right political party Chega in Portugal. They also cover some of de Paço’s past legal problems, which volunteer Wikipedia editors found relevant to understanding his biography. This reporting, which covered the issue in 2020 and 2021, continues to be available and relevant to the Portuguese public and other readers. We believe that de Paço is abusing the legal system by pursuing this case and that this should be seen as a form of SLAPP effort against Wikipedia. 

The case has gone through several stages very quickly over June and July 2023.

First ruling and first appeal

Initially, the lower court ruled in favor of the Foundation, observing that free expression and the public interest outweighed de Paço’s right to keep reporting about his political associations private. However, on initial appeal, the appellate court overruled the lower court in a surprising decision: It brought up arguments de Paço had not even made, regarding the European Union’s General Data Protection Regulation (GDPR), and used these surprise arguments to order the Foundation to disclose user data to de Paço as well as remove all of the information relating to his political associations and past legal issues.

The original appellate decision determined that under Portugal’s laws implementing the GDPR, only professional journalists were allowed to discuss someone’s political affiliations. It held that Wikipedians did not qualify for this exemption, and it did not consider any other usual exceptions (such as those that allow the sharing of information for education or historical archiving). It also ordered the deletion of material about past criminal accusations, and for the Foundation to identify the volunteer Wikipedia editors who worked on these sections.

Request to nullify first appeal

We then asked the appellate court to reconsider its own ruling (in a process called “nullification”) and it did so, ruling that the Foundation had not had a fair opportunity to properly prepare arguments regarding broader European privacy law issues, which are critical to protecting both user data and the integrity of Wikipedia content.

Second appeal

In our filing on 5 July, we asked the Portuguese appellate court to rule in our favor in order to protect users and their data as well as the integrity of their work. To be clear, by “integrity” we mean wanting to ensure that accurate, good faith information contributed by users is protected. We do not think that article subjects should be able to selectively delete publicly relevant information in order to distort how they are described and perceived. We also identified several questions regarding the EU GDPR that we believed the Portuguese courts needed to refer to the Court of Justice of the European Union (CJEU), since Portuguese law may have incorrectly implemented the GDPR to the detriment of free expression and freedom of information. In particular, we noted that the CJEU should be asked to consider three main questions.

  • The first is whether subjects of biographies on Wikipedia should have the right under the GDPR to demand complete erasure of online material that they wish to be taken out of public knowledge, despite the GDPR’s limited international applicability.
  • The second concerns the extent to which protections for the public interest, journalism, education, and free expression must be implemented in the law of every EU member state. These should allow volunteer Wikipedia editors to write neutral, accurate, and well-sourced information about notable figures, including details about their politics and political associations, even if those articles’ subjects do not like what is being said.
  • The third and last is whether a court should be allowed to order the disclosure of volunteer Wikipedia editors’ personal information so early in a court case, especially when that protection is critical to allowing volunteers to freely and confidently contribute to Wikipedia. 

On 13 July, the Portuguese appellate court ruled on the case. It reversed its previous decision, finding that the EU GDPR did not apply to Wikipedia. This means that Wikipedia articles were therefore allowed to include information about de Paço’s political associations. Having decided that the GDPR did not apply, the court found that it did not need to ask the CJEU to consider the questions we had proposed. However, the court still ruled against the Foundation: It decided that general Portuguese law on privacy and defamation would apply to the discussion of past criminal accusations, so those should be deleted. 

Unusually, the court once again appears to have raised arguments de Paço did not make, this time declaring that the Portuguese implementation of European law for website hosts protects neither the Foundation nor the volunteer Wikipedia editors who wrote about de Paço.

Therefore, it once again ordered the Foundation to delete information about certain historical matters involving de Paço, such as allegations of misconduct, and to identify the volunteer editors who added that content.

At present, we are exploring options for further appeal.

Why the case matters

Wikipedia articles are written and edited entirely by volunteer editors, who also set and enforce the rules for sourcing and notability on different topics and in different languages. Wikipedia articles—and those sourcing and notability rules—cover biographies of notable public figures, including people who become notable for supporting politicians or political parties. De Paço is one such notable figure to both Portuguese and English-speaking audiences: He is a celebrated supporter of public authorities abroad (especially US law enforcement agencies), and a global business leader. 

A ruling that punishes individuals around the globe simply for summarizing and publishing on Wikipedia what was said about such a notable person in news media reporting would have an undue chilling effect on freedom of expression and information worldwide. It is particularly concerning if the ruling in question would expect each individual volunteer editor to review all the previous contributions and sources provided in an article before adding to it. 

Wikipedia works best when each volunteer editor can research their own contributions while relying on the work of other volunteer editors for what is already there, slowly improving articles over time. If the court rules that each volunteer editor who works on an article can become liable for everything that is in the article, contributing to Wikipedia will become much more difficult not only in Portugal, but potentially elsewhere as well. 

We want to reassure Wikipedians as well as readers that the Foundation has not provided any data in this case. We treat user data with a high degree of care and require that demands for data follow our procedures and guidelines for requesting nonpublic data. We are hopeful that the Portuguese courts will refer our case to the CJEU in order to help obtain a ruling that protects volunteer editors who carry out good-faith research on notable subjects and contribute to free knowledge projects such as Wikipedia. Ultimately, we also hope they will agree with us on the importance of protecting freedom of information and good-faith discussion of public interest topics.

Why I document oral culture, and why you should too

Wednesday, 26 July 2023 22:38 UTC

A method I have been using to assert my language and cultural heritage is documenting Angika oral culture and integrating it with different Wikimedia projects. Oral culture serves as a valuable record of people’s voices, making it crucial to create comprehensive archives. As linguist Bidisha Bhattacharjee says: “Different forms of oral traditions express the unique identity of a community, its history, its tradition, its cultural heritage and ethnicity.”

Recently, when I was visiting my village, Akhadi pooja, an annual festival, was taking place. Interestingly, the way this festival is performed in my village differs from the usual practice. Apart from the widespread practice of worshipping the goddess Kali, the villagers worship the nearby hill deity called ‘Biharo Maay’ on the same day. The goddess is believed to reside in Suiya hill, locally known as ‘Biharo Pahaad’; the villagers worship her to seek protection from the wild animals dwelling in the hill. The festival takes its name from the ‘Asadh’ (असाढ़) month of the Hindu calendar (June-July). Another unique aspect of the worship in my village is the involvement of a man who dances in sync with the beat of drums and cymbals. He assumes the role of the goddess’s attendant, and in her presence, he feels a profound connection, transforming into her embodiment, with his dance becoming a manifestation of her divine energy.

Amrit Sufi, CC BY-SA 4.0, Wikimedia Commons

Intrigued by this, I decided to experiment with documenting the event and to interview one of the participants to explore the lore behind this distinct form of worship. Another factor that motivated me is the practice of singing folk songs during the festival. One of the folk songs about the neem tree that I have already documented is also sung during this festival, as it is believed that the goddess guards the worshippers against smallpox.

Integrating oral culture with Wikimedia projects

After recording audio-visuals of Angika folk culture and uploading them on Commons, I transcribe them and upload the transcriptions on Wikisource. The video and transcription are then linked to related articles on Wikipedia. For example, the article on neem in the Angika Wikipedia has a subsection titled ‘In Angika culture’. This section describes the mention of neem in Angika folk songs and celebrations. The related Commons video and Wikisource text are linked to it.

Invitation to join hands to document Indic Oral Culture

It is crucial to encourage documentation efforts for languages and cultures around the world. As Mamadou Kouyate, a West African griot (oral performer) expresses it: “We are vessels of speech, we are the repositories that harbour secrets many centuries old…We are the memory of mankind, by the spoken word we bring life to the deeds and exploits of kings for younger generations.”

In order to make it easier to document oral culture, I worked on the Oral Culture Transcription Toolkit with help from Wikitongues and OpenSpeaks. This toolkit has information on how to begin language documentation, tips on good audio-visual documentation, and the Wikimedia workflow for uploading the files on Commons and Wikisource. In a Wikimedia-funded research project, the urgent needs with regard to documenting language and culture were assessed and reported on.

Based on the learnings of the research, a new project has been proposed on documenting Indic oral culture. This project is dedicated to training volunteers from Indic language communities to document their oral culture on Wikimedia projects, thereby enriching the presence of their language and culture online. If you are interested in participating in any capacity, feel free to join the project on Meta-wiki or contact me at amritsufi2@gmail.com.

Eugene M. DeLoatch founded the School of Engineering at Morgan State University, Maryland’s only HBCU (Historically Black Colleges and Universities). He spent his career educating African American engineers. But, like many STEM figures from historically marginalized communities, DeLoatch lacked a Wikipedia article — until Kimberly Ivy created it.

DeLoatch’s life story is one of championing African American engineers. He gave countless students the opportunity to see him as a role model. That makes him exactly the kind of person whom Wiki Education is empowering students like Kimberly to add to Wikipedia, thanks to an initiative funded by the Broadcom Foundation to support diverse students as they add biographies of the hidden figures of STEM to Wikipedia.

Kimberly Ivy
Image courtesy Kimberly Ivy, all rights reserved.

“If it weren’t for this assignment I’m not sure if I would have ever been introduced to Eugene DeLoatch and all of the other subjects of my classmates’ Wikipedia biographies,” Kimberly says. “Ultimately I gained an abundance of knowledge while writing this article.”

Kimberly is an elementary school teacher in San Jose, California, who’s pursuing her doctorate in educational leadership from Santa Clara University. She constantly uses Wikipedia but had never thought about contributing herself until her instructor this term, Dr. La’Tonya Rease Miles, assigned Kimberly and her classmates to create biographies of diverse figures in STEM.

“I was excited, intrigued, and intimidated all at the same time,” she says. “I knew it was going to take a lot of thought, preparation, and energy to create an article. My initial impression of this assignment was, ‘this is innovative!’ Requiring us to become Wiki editors and create a Wikipedia biography article is unprecedented, and I would choose a Wikipedia assignment over a traditional term paper every time. Not only did we learn a valuable skill, but provided a service to the community.”

Kimberly took Wiki Education’s training and got feedback from Wiki Education staff on her draft. She found the experience meaningful, and intends to contribute content again, as part of her efforts to highlight achievements of African Americans.

“Learning about Wikipedia’s lack of biographies of people of color and women in STEM fields opened my eyes to the privilege and power that news and social media platforms possess,” Kimberly says. “Our nation has a history of presenting information that portrays African Americans and people of color in a negative light. The absence of positive contributions from groups that have been historically marginalized can be equally oppressive. Because of these inequities that exist, I made a conscious decision to choose an African American male as a subject. After learning about DeLoatch’s development of Morgan State University’s engineering program, and that he is responsible for training more African American engineers than anyone else in the world, writing his Wikipedia biography became more than a graded assignment. DeLoatch deserves the type of public recognition that possessing a Wikipedia biography article grants.”

Average Lengths of MediaWiki Translations

Wednesday, 26 July 2023 15:09 UTC

I was wondering: In which languages do user interface translations tend to be longer, and in which are they shorter?

The intuitive answers to these questions are that Chinese and Japanese are very short, English tends to be shorter than the average, Hebrew is shorter than English, and the longest ones are Turkish, Finnish, German, and Tamil. But what if I try to find a more precise answer?

So I made a super-simplistic calculation: I checked the average length of a core MediaWiki user interface message for English and the 150 languages with the highest number of translations.

I sorted them from the shortest average length to the longest. The table is at the end of the post.

Here’s a verbal summary of some interesting points that I found:

  1. The shortest messages are found, unsurprisingly, in Chinese, Japanese, and Korean.
  2. Another group of languages that surprised me by having very short translations are some Arabic-script languages of South Asia: Saraiki, Punjabi, Sindhi, Pashto, Balochi.
  3. Three more languages surprised me by being at the shorter end of the list: Meadow Mari (mhr) and Northern Sami (se), which are Finno-Ugric, a family known for agglutinative grammar that tends to make words longer; and Armenian, about which I, for no particular reason, had the impression that its words are longish.
  4. English is at #23 out of 151, with an average length of 38.
  5. Hebrew is just ahead of English at #22, with 37.95. This surprised me: I was always under the impression that Hebrew tends to be much shorter.
  6. The longest languages are not quite the ones I thought! The longest ones tend to be the Romance languages: Lombard, French, Portuguese, Spanish, Galician, Arpitan, Romanian, Catalan.
  7. Three Germanic languages, namely Colognian, German and Dutch, are on the longer end of the list, but not all of them. (Colognian is the longest in my list. The reason for this is not so natural, though: The most prolific translator into it, User:Purodha, liked writing out abbreviations in full, so it made many strings longer than they could be. He passed away in 2016. May he rest in peace.)
  8. Other language groups that tend to be longer are Slavic (Belarusian, Russian, Bulgarian, Polish, Ukrainian) and Austronesian (Sakizaya, Ilokano, Tagalog, Bikol, Indonesian).
  9. Other notable, but not easily grouped languages that tend to be longer are Scottish Gaelic, Greek, Shan, Quechua, Finnish, Hungarian, Basque, and Malayalam. All of them have an average length between 45 and 53 characters.
  10. Turkish is only slightly above average with 44.1, at #88.
  11. Tamil is a bit longer, with an average length of 44.6, at #94. Strings in its sister language Malayalam are considerably longer, 49.1.
  12. The median length is 43, and the average for everyone is 42. Notable languages at these lengths are Mongolian, Serbian, Welsh, Norwegian, Malaysian, Esperanto, Georgian, Balinese, Tatar, Estonian, and Bashkir. (Esperantistoj, ĉu vi ĝojas aŭdi, ke via lingvo aperas preskaŭ ĝuste en la mezo de ĉi tiu listo?)
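
Summary figures like the median and the mean can be checked with Python's statistics module. This is a minimal sketch that uses only five rows copied from the table below for brevity, so the printed values describe this small sample and not all 151 languages:

```python
import statistics

# Five rows copied from the results table (language code -> average length).
sample = {
    "zh-hans": 17.67324825,
    "he": 37.95259865,
    "krc": 43.54919554,
    "fr": 50.15585031,
    "ksh": 53.36332609,
}

# The median is the middle value once the averages are sorted.
median_length = statistics.median(sample.values())
# The mean is the plain arithmetic average of the same values.
mean_length = statistics.fmean(sample.values())

print("median:", median_length)
print("mean:", mean_length)
```

Feeding the whole table into the same two calls would give the figures for the full list.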

One important factor that I didn’t take into account is that, for various reasons, translators into different languages may choose to translate different messages, and one of those reasons may be that people translate shorter messages first because they are usually easier. I addressed this in a very quick and dirty way, by ignoring strings of 300 characters or more. Some time in the (hopefully near) future, I’ll try to come up with a smarter way to calculate it.
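
One possible direction, sketched here as an assumption rather than as the method behind the results in this post, is to compare two languages only on the messages that both of them have translated. The function name `average_length_on_shared_keys` and the tiny sample dictionaries are made up for illustration; real input would be the parsed contents of two files from languages/i18n/.

```python
def average_length_on_shared_keys(base, other, max_len=300):
    """Return (base_avg, other_avg) over message keys present in both dicts.

    Hypothetical helper: messages of max_len characters or more in either
    language are skipped, mirroring the cut-off used in this post.
    """
    shared = [
        key for key in base.keys() & other.keys()
        if key != "@metadata"
        and len(base[key]) < max_len
        and len(other[key]) < max_len
    ]
    if not shared:
        return None
    base_avg = sum(len(base[key]) for key in shared) / len(shared)
    other_avg = sum(len(other[key]) for key in shared) / len(shared)
    return base_avg, other_avg


# Made-up stand-ins for two parsed i18n JSON files.
english = {"@metadata": {}, "save": "Save", "cancel": "Cancel", "help": "Help"}
german = {"@metadata": {}, "save": "Speichern", "cancel": "Abbrechen"}

# "help" is ignored because the second dictionary has no translation for it.
print(average_length_on_shared_keys(english, german))
```

Because both averages are taken over the same set of messages, a language whose translators only covered the short messages is no longer compared against the full English set.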

And here are the full results. Please don’t take them too seriously, and feel free to write your own, better, calculation code!

# Language code Average translation length
1 zh-hans 17.67324825
2 zh-hant 18.52284388
3 skr-arab 21.81899964
4 ja 24.67007612
5 ko 25.8110372
6 sd 27.71960396
7 mhr 28.95451413
8 ps 32.73647059
9 pnb 33.03592163
10 bgn 34.39934667
11 se 34.69274476
12 hy 35.02317597
13 su 35.37706967
14 th 35.52957892
15 ce 35.6969602
16 mai 36.02093909
17 lv 36.14100906
18 gu 36.59380971
19 bcc 36.64866033
20 fy 37.60139287
21 nqo 37.94138834
22 he 37.95259865
23 en 38.04300371
24 ar 38.18569036
25 ckb 38.66867672
26 min 38.71156958
27 ses 38.87941712
28 jv 38.94753377
29 is 39.0652467
30 alt 39.39977435
31 az 39.4337931
32 kab 39.50967506
33 tk 39.54990758
34 mr 39.72049689
35 as 39.72080166
36 sw 39.73986071
37 km 39.77591036
38 azb 39.92411642
39 nn 39.96771069
40 yo 40.00503291
41 io 40.0528125
42 af 40.1640678
43 blk 40.2813059
44 sco 40.33289474
45 diq 40.33887373
46 yi 40.34033476
47 ur 40.39857651
48 ug-arab 40.53965184
49 da 40.55894826
50 my 40.67551519
51 kk-cyrl 40.87443182
52 guw 41.07080182
53 mg 41.08369028
54 sq 41.23219241
55 fa 41.27007299
56 or 41.27020202
57 ne 41.33971151
58 rue 41.40219378
59 lfn 41.54527278
60 lrc 41.61281337
61 sah 41.63293173
62 vi 41.74578313
63 awa 41.84093291
64 hi 41.9257885
65 si 41.93065693
66 te 41.99780915
67 mn 42.18728223
68 lki 42.21091396
69 bjn 42.57961538
70 sr-ec 42.67730151
71 cy 42.75020408
72 frr 42.92761394
73 vec 43.00573682
74 sr-el 43.13764389
75 nb 43.34987835
76 krc 43.54919554
77 ms 43.5553814
78 hr 43.55564807
79 eo 43.57477789
80 nds-nl 43.59060895
81 ka 43.60108696
82 ban 43.64178033
83 bs 43.681094
84 tt-cyrl 43.78230132
85 xmf 43.80860161
86 et 43.96494239
87 ba 43.99432099
88 tr 44.17996604
89 bn 44.28768449
90 bew 44.44706174
91 sv 44.49027333
92 sa 44.58670931
93 cs 44.59026764
94 ta 44.62803055
95 mt 44.70207417
96 lt 44.7615
97 roa-tara 44.79812466
98 fit 44.79824561
99 dsb 44.9151957
100 hsb 44.96197228
101 br 44.98873461
102 sh-latn 45.00976709
103 fi 45.1222031
104 hu 45.17139303
105 sk 45.35804702
106 lb 45.39073034
107 li 45.5539548
108 id 45.56471159
109 gsw 45.63605209
110 sl 45.75350606
111 be 45.80325
112 oc 45.85709988
113 mk 45.90943939
114 bcl 45.97070064
115 scn 46.11905532
116 an 46.14892665
117 uk 46.22955524
118 qu 46.30301842
119 eu 46.33589404
120 lij 46.660536
121 pl 46.76863316
122 hrx 46.79802761
123 ast 46.87204161
124 nap 46.93783147
125 ru 47.02326139
126 bg 47.03590259
127 be-tarask 47.28525242
128 hif-latn 47.41652614
129 tl 47.51263001
130 rm 47.60741067
131 pms 47.69805527
132 pt-br 47.84063647
133 ca 47.92468307
134 ro 48.22437186
135 nl 48.4175636
136 ia 48.48612816
137 it 48.52347014
138 frp 48.54542755
139 gl 48.57820482
140 ml 49.12108224
141 es 49.21062944
142 pt 49.63085602
143 de 49.77225067
144 szy 49.84650877
145 shn 49.92356241
146 fr 50.15585031
147 lmo 50.85627837
148 ilo 50.9798995
149 el 51.14834894
150 gd 51.72994269
151 ksh 53.36332609

Here is the Python 3 code I’ve used to create the table. You can run it in the root directory of the core MediaWiki source tree. It’s horrible, please improve it!

import json
import os
import re

languages = {}
code_re = re.compile(r"(?P<code>[^/]+)\.json$")


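def process_file(filename):
    # Derive the language code from the file name, e.g. "he" from "he.json".
    code_search = code_re.search(filename)
    code = code_search.group("code")
    # Skip the message documentation (qqq) and a few codes left out of the comparison.
    if code in ('qqq', 'ti', 'lzh', 'yue-hant'):
        return

    with open(filename, "r", encoding="utf-8") as file:
        data = json.load(file)
        # Drop the metadata entry so that only actual messages are counted.
        data.pop('@metadata', None)
        average_unicode_length(code, data)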


def average_unicode_length(language, translations):
    total_translations = len(translations)
    # Only languages with at least 2200 translated messages are compared.
    if total_translations < 2200:
        print('Language ' + language + ' has fewer than 2200 translations')
        return

    total_length = 0

    # Sum the lengths (in Unicode code points) of all strings shorter than 300.
    for translation in translations.values():
        if len(translation) < 300:
            total_length += len(translation)

    # Calculate the average length.
    # Note: the skipped long strings are still counted in the divisor.
    average_length = total_length / total_translations
    languages[language] = average_length

root = "./languages/i18n/"
for file in os.listdir(root):
    if file.endswith(".json"):
        path = os.path.join(root, file)
        process_file(path)

sorted_languages = sorted(
    languages.items(),
    key=lambda item: item[1]
)

# Print the sorted items
for code, length in sorted_languages:
    print(code, '\t', length)
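
In the script above, strings of 300 characters or more are left out of the sum but are still counted in the divisor, which pulls the average down a little for languages with many long strings. As a rough sketch of one alternative (the function name `average_of_counted` is made up for illustration, and using it would change the numbers in the table), the average can be taken over the counted strings only:

```python
def average_of_counted(translations, max_len=300):
    """Average length over the strings shorter than max_len only."""
    lengths = [len(text) for text in translations.values() if len(text) < max_len]
    if not lengths:
        return None
    return sum(lengths) / len(lengths)


# Made-up messages: two short strings and one that exceeds the cut-off.
sample = {"a": "x" * 10, "b": "y" * 20, "c": "z" * 400}

# The 400-character string affects neither the sum nor the divisor.
print(average_of_counted(sample))
```

With this sample the result is 15.0, whereas dividing by all three entries, as the original script does, would give 10.0.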

An Overview 

Mentorship is a relationship between two or more people in which the individual with more experience (the mentor) shares their knowledge with another individual who has less knowledge of the field (the mentee). Mentorship can be a transformative tool, especially within the technical world, where knowledge transfer and the sharing of insights can accelerate learning and skill acquisition. As a mentee, one can tap into a mentor’s knowledge, gaining practical understanding and insight from an expert who has already navigated a similar path. Wiki Mentor Africa (WMA) was designed with this objective in mind.

What is WMA?

Wiki Mentor Africa is a unique program that guides new and inexperienced developers or programmers within African communities. It provides more than just a foundation for making contributions to Wikimedia. It aims to equip African programmers with the skills to build and maintain Wikimedia tools, pairing them with more experienced contributors who serve as mentors.

Generally, WMA’s broad objectives include developing technical capacity within African Wikimedia communities, promoting skill transfer from more equipped individuals to those less capable, increasing the number of Wikidata tools and their maintainers, and recruiting new Wikimedia Tool contributors.

Furthermore, the program is geared towards fostering collaborations aimed at solving issues peculiar to African communities, ensuring Wikimedia technical skills become less Eurocentric, and organising annual events, conferences and contests for mentors, mentees and the general public.

How did WMA come about?

Founded by the Igbo Wikimedia User Group (IG Wiki UG), WMA was conceived by Udeh Benedict, an IG Wiki UG member. Tochi Precious and Eugene Egbe joined the initiative as outreach and technical coordinators, respectively. Their shared goal was to enhance technical capacity in African Wikimedia communities by mentoring developers for Wikimedia Tool Development. Wikimedia Deutschland played a crucial role in the project’s implementation, providing financial support and guidance.

WMA – The Journey so far

Over the past 12 months, the Wiki Mentor Africa project has made substantial progress towards its objectives of developing technical capacity within African Wikimedian communities and diversifying the development and maintenance of Wikimedia tools. The program has successfully hosted 12 live training events, attracting an average of over 110 participants per event while engaging seven mentors. The mentorship program was widely advertised, leading to a growing Telegram group with over 440 members. The project has so far groomed a group of 22 participants who became mentees and have gained hands-on experience and in-depth knowledge of Wikimedia tools. A significant outcome of the project’s first year is the improvement of the Stub Builder tool. This tool, a vital part of Wikipedia’s ecosystem for creating stubs, has been upgraded thanks to the efforts of the WMA team.

Targets versus what has been achieved

The project made impressive progress on the metrics laid out in its proposal:

Following the target metrics, 12 trainings have been organized and implemented as planned. Seven mentors have so far been attached to the Wiki Mentor Africa program, exceeding the goal of five. So far, 22 participants have been adopted as mentees. Over 110 participants became repeat attendees, surpassing the goal of 50. More than 440 people were reached through its Telegram group, exceeding the goal of 100. One existing tool (the Stub Builder tool) was successfully updated. The Wiki Mentor Africa program has so far hosted participants from Nigeria, Ghana, Cameroon, Rwanda, Burkina Faso and other countries, and in the process trained over 150 young African Wikimedians.

Challenge(s) encountered

Like any other project, WMA faced challenges during its first year of operation. These centered particularly on the recruitment and retention of mentors and on ensuring the continuous engagement and productivity of the mentees. The WMA team is, however, putting measures in place to overcome these temporary challenges, as the overall project has made impressive strides in its planned activities and outcomes.

Next goal of the project

WMA’s next goal is to have at least 10 participants actively contributing to Wikimedia technical projects, a target currently in progress. Additionally, the project aspires to develop a new tool within the next year.

WMA Mentees and their Knowledge dissemination

WMA mentees have shared their newfound knowledge by presenting their work at two conferences, with one presentation publicly accessible at Biss Pensoft. This has significantly helped to develop the technical capacity of African Wikimedian communities, promoting a more diverse and inclusive growth of Wikimedia tools.

WMA online session

Who are the mentors?

Within the period of one year, WMA has had seven mentors who have so far adopted 22 participants as mentees; these mentees have successfully presented their work at two conferences. Below are the mentors, their language(s) and their active projects:

WMA mentors

Recognition for Dedication: Mentor and Mentees of the Year

In the past year, WMA has engaged with many mentors and mentees. However, certain individuals have stood out due to their dedication, hard work, and significant contributions.

The Mentor of the Year title goes to Andra Waagmeester. Not only has he been an exceptional mentor, but he has also made significant contributions to several projects. Notably, he led his mentees in upgrading the Wikipedia Stub Builder, a tool critical for creating biodiversity Wikipedia stubs using data from the iNaturalist app. This work is in addition to his active involvement in Wikidata:WikiProject Biodiversity, WikiProject_Schemas, and Wikidata Integrator. His dedication and leadership have had a substantial impact on the Wiki Mentor Africa project and its beneficiaries.

As for the Mentees of the Year, this recognition is awarded to Dnshitobu from Ghana and Agnes Abah from Nigeria. Their enthusiastic participation, eagerness to learn, and their dedication to applying their newfound skills set them apart.

Through their efforts and accomplishments, these individuals have demonstrated the transformative power of the Wiki Mentor Africa project and the broader impact it can have on the African Wikimedian communities.

Summary

The first year of the Wiki Mentor Africa project has laid a strong foundation for future progress and growth. This program serves as a testament to the transformative power of mentorship in empowering communities with the skills and knowledge to shape their own futures. 

Outreachy report #46: July 2023

Tuesday, 25 July 2023 00:00 UTC

Two weeks ago, my carer and I were 11,000 kilometers away from home. We visited five different airports (GYN, GRU, DFW, PDX, CGH) to travel to Portland, OR in order to attend the very first FOSSY, our American 1,000 interns celebration, and meet two of the people I’ve been working with for half a decade. I had so many silly questions: Would people be taller or shorter than me? How do they sound when there isn’t a not-so-stable internet connection intermediating our communications?

Tech News issue #30, 2023 (July 24, 2023)

Monday, 24 July 2023 00:00 UTC

Tech News: 2023-30

weeklyOSM 678

Sunday, 23 July 2023 14:55 UTC

11/07/2023-17/07/2023


Home Assistant now with a map from OSM in the user onboarding step [1] © OpenStreetMap contributors

Mapping

  • deptho wonders why one would map sidewalks/pavements separately from the associated road and not use the sidewalk tag on the road object. Commenters say that it can improve pedestrian routing and mapping kerb drops can help people with limited mobility, but that just mapping sidewalks separately isn’t a panacea.
  • The RapiD editor has allowed users to trace from Google’s Open Buildings layer since 2022. Recent changes have made more data available and the licence is now compatible with OSM. Particularly useful is that this dataset includes a precision threshold, which should help mappers avoid false positives (which have been a problem with the default Microsoft Buildings layer).

Community

  • AngocA crafted a real “hold it in your hands” OSM model, which was inspired by a Wikipedia model.
  • b-unicycling and a friend of hers started a regulars’ table called the “Kilkenny History Mappers”. The local newspaper and radio station reported on it.
  • courtiney (formerly of TomTom, now a freelance writer and a member of OSM’s Communications Working Group) elaborates on how to use OSM channel data for effective communications. She is looking into developing an open-source version of the tool she used, to share with the community.
  • LFF_490_Alexander reflects on the “sustainability of OSM communities”.
  • The German “Focus of the Week” (a weekly mapping activity started during the coronavirus pandemic) has been suspended until further notice, as fewer people have been taking part recently.
  • SGorskiy describes in detail how he finds drinking water with the help of Organic Maps and OsmAnd.
  • ShresthaSR shared his experience while participating in the FOSS4G Prizren 2023 conference.

OpenStreetMap Foundation

  • The next monthly video meeting of the OSMF board will take place on Thursday 27 July 2023 at 15:00 UTC in the video room (opens ~20 minutes before the meeting). The preliminary agenda is on the wiki and this is where the draft minutes will be added. The topics are:
    • Treasurer’s report
    • Secretary’s report
    • Sovereign Tech Fund application
    • Board rules of order regarding circulars
    • Review Strategic Plan revision
    • Corporate Membership revision
    • Change to Courtney Williamson’s status to volunteer, cancellation of contract
    • Support as financial intermediary to regional State of the Map sponsorships
    • EU Transparency Register
    • Vector tiles
    • Monthly presentation – OSM community in the Democratic Republic of the Congo

Local chapter news

  • OpenStreetMap US, in partnership with Mapillary, will be running a 360-degree camera grant programme for OSM community members in the United States.

Events

  • The annual Wikimania conference will take place in Singapore and online from 16 to 19 August. Andy Mabbett indicates that several sessions on maps are planned and asks if any OSM contributors are going.

Education

  • Miku has released a progress report on her Google Summer of Code 2023 project. The project focuses on customising Nominatim’s address lookup algorithm based on the address system used in Japan.

OSM research

  • Val Ismaili, writing on Medium, looked at the validity of OSM data in a region of the UK. In several tables he contrasts data collected by Transport East (Norfolk, Suffolk, Essex, Thurrock and Southend-on-Sea) with the data found in OSM. The approach (actually comparing a range of OSM data with what local people thought was present) is significantly better than some studies that only look at one factor (e.g. “land area occupied by buildings” was one recent one).

Maps

  • Lisa Hornung shared on Mastodon a map of gelaterias in Italy. The publication of the map generated a short discussion on why there are fewer gelaterias in southern Italy on OpenStreetMap.

switch2OSM

  • [1] Home Assistant, an open source home automation application, now uses OpenStreetMap and Nominatim in its user onboarding feature.
  • Andy Townsend updated the “switch2osm” OSM tile server installation guide page for the Debian 12 operating system.

Open Data

  • Google released Open Buildings V3, which contains 1.8 billion building detections in Africa, Latin America, the Caribbean, South Asia, and Southeast Asia. The data is shared under the CC BY 4.0 and ODbL v1.0 licences. An import proposal derived from this dataset is currently under discussion. As noted in that thread, an important feature of this dataset that should help avoid false positives is a confidence score.
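
    As a rough illustration of how a confidence score can help limit false positives, the sketch below keeps only detections at or above a chosen threshold. The record layout, the `confidence` field name and the 0.75 cut-off are assumptions made for this example, not values taken from the dataset documentation or the import proposal.

```python
from __future__ import annotations

from typing import Iterable


def filter_by_confidence(detections: Iterable[dict], threshold: float = 0.75) -> list[dict]:
    """Keep detections whose (assumed) 'confidence' value is at least `threshold`."""
    kept = []
    for row in detections:
        try:
            score = float(row["confidence"])
        except (KeyError, TypeError, ValueError):
            # Skip rows with a missing or unparsable score rather than guessing.
            continue
        if score >= threshold:
            kept.append(row)
    return kept


# Hypothetical sample rows, invented for illustration only.
sample = [
    {"id": "a", "confidence": "0.91"},
    {"id": "b", "confidence": "0.62"},
    {"id": "c", "confidence": ""},
    {"id": "d", "confidence": 0.75},
]
high_confidence = filter_by_confidence(sample)
print([row["id"] for row in high_confidence])
```

    Raising the threshold trades coverage for fewer false positives; a suitable value would have to come from the import discussion itself.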

Software

  • Jochen Topf, the developer of Taginfo, describes the external links he has added to Taginfo’s Tags pages with instructions on how to extract / analyze them from external tools (Overpass, OSM Tag History, and ohsome). ohsome gives the option to select countries to analyze historical OSM data.
  • Eugene describes how to use OsmAnd more efficiently with the help of the “online map plugin”.

Programming

  • Mustapha Mekhatria describes how he creates a map with Highcharts and OSM data.
  • richlv uses the LearnOverpass app to learn how to create Overpass queries.

Releases

  • The GraphHopper Maps Android app released a new version and it can be found in the F-Droid app store.

Did you know …

OSM in the media

  • Darkonus discovered the use of OpenStreetMap maps in the film “Mission Impossible: Fallout (2018)”.

Other “geo” things

  • Anders Sundell used an algorithm to find the countries that look the most like each other.
  • HeiGIT announced that the ohsome dashboard has been extended to compute data quality metrics for OpenStreetMap through the ohsome quality analyst (OQT), which is also developed by HeiGIT.
  • Jake Coppinger wrote an in-depth analysis of the operational management of traffic lights in the Sydney region.
  • Neal Agarwal released the “Wonders of Street View” website to collect and explore unique places in the world captured by Street View photos.

Upcoming Events

Where | What | When
Rēzekne | Kartēšana Rēzeknē | 2023-07-21
City Of Vincent | Social mapping Sunday: Leederville and Lake Monger | 2023-07-23
Bremen | Bremer Mappertreffen (Online) | 2023-07-24
- | Reunião 2 – OSM Brasil – Mapeamento Aberto para Gestão de Riscos de Desastres [Horário 1] | 2023-07-26
- | OSMF Engineering Working Group meeting | 2023-07-26
Richmond | MapRVA Happy Hour | 2023-07-27
Zürich | Missing Maps Zürich Mapathon | 2023-07-26
Düsseldorf | Düsseldorfer OpenStreetMap-Treffen | 2023-07-26
Wien | 68. Wiener Stammtisch | 2023-07-27
大安區 | COSCUP 2023 OpenStreetMap x Wikidata 開放內容議程軌 | 2023-07-30
- | Reunião 2 – OSM Brasil – Mapeamento Aberto para Gestão de Riscos de Desastres [Horário 2] | 2023-07-31
San Jose | South Bay Map Night | 2023-08-02
- | Missing Maps London Mapathon | 2023-08-01
Stuttgart | Stuttgarter OpenStreetMap-Treffen | 2023-08-02

Note:
If you would like to see your event here, please put it into the OSM calendar. Only data that is there will appear in weeklyOSM.

This weeklyOSM was produced by Elizabete, MatthiasMatthias, PierZen, SomeoneElse, TheSwavu, barefootstache, derFred, rtnf.
We welcome link suggestions for the next issue via this form and look forward to your contributions.

I wrote about exploring natural language querying for Wikipedia in my previous two blog posts. In Part 1, I suggested that building a collection of questions and answers can help natural language answering. One missing piece was suggesting an answer for a new question that is not part of the QA set for an article. In Part 2, I tried using distilbert-base-cased-distilled-squad with ONNX optimization to answer the questions.
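
Models fine-tuned on SQuAD, such as the one named above, are extractive: for each token of a passage they produce a start score and an end score, and the answer is the token span with the best combined score. The sketch below shows only that span-selection step in plain Python, using hand-made tokens and scores. The function name, the `max_len` limit and all numbers are assumptions for illustration; it does not run the model or ONNX.

```python
from __future__ import annotations


def best_span(start_scores: list[float], end_scores: list[float], max_len: int = 15) -> tuple[int, int]:
    """Return (start, end) token indices maximising start_scores[s] + end_scores[e],
    subject to s <= e < s + max_len."""
    if len(start_scores) != len(end_scores) or not start_scores:
        raise ValueError("score lists must be non-empty and of equal length")
    best = (0, 0)
    best_score = float("-inf")
    for s, s_score in enumerate(start_scores):
        # Only consider end positions at or after the start, within max_len tokens.
        for e in range(s, min(s + max_len, len(end_scores))):
            score = s_score + end_scores[e]
            if score > best_score:
                best_score = score
                best = (s, e)
    return best


# Invented passage and scores for a hypothetical question "When is the meetup?"
tokens = ["The", "meetup", "takes", "place", "on", "Thursday", "evening"]
start = [0.1, 0.3, -1.0, -0.8, 0.4, 3.0, 0.6]
end = [-0.5, 0.2, -1.0, 0.0, -0.3, 1.8, 2.5]

s, e = best_span(start, end)
answer = " ".join(tokens[s:e + 1])
print(answer)
```

In a real setup the two score lists would come from the model's output for a tokenised question and passage pair.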

Adding Latinx scholars to Wikipedia

Wednesday, 19 July 2023 18:39 UTC

Marina Corrales’s career goal is to become a university librarian. She was excited when she took Dr. La’Tonya Rease Miles’s Social Innovation for Social Impact Leaders class at Santa Clara University this spring and discovered she’d be writing a Wikipedia article as part of the course.

“I’ve referenced Wikipedia extensively throughout my academic career and enjoyed the idea of contributing to it,” says Marina, who just wrapped up her first year of the Social Justice Leadership EdD program.

Her course was participating in an initiative funded by the Broadcom Foundation to create diverse biographies of people in STEM on Wikipedia, so Marina set out to create the biography of Arnaldo Díaz Vázquez. As a self-identified Latina, Marina says she wanted to increase the presence of Latinx scholars on Wikipedia.

Marina Corrales
Marina Corrales.
Image courtesy Marina Corrales, all rights reserved.

But she didn’t stop there. A guest lecture by Dr. Daniel Solórzano in another class she was taking inspired Marina to learn more about his work. She found that many of the scholars whose work she looked up to were connected to him.

“I became curious about what his Wikipedia page might say and how extensive it might be. I was shocked that Dr. Solórzano didn’t have a page despite having hundreds of people study under him for several decades, many of whom have been successful in their own right,” she says. “After my shock subsided, I thought about our assignment and why we were creating the pages in the first place; because of an underrepresentation of people of color on Wikipedia in STEM fields. I realized that underrepresentation might also be the case for scholars of color in Education on Wikipedia.”

Armed with her research mindset and newfound knowledge of how Wikipedia worked, Marina dug deeper. She searched for academics that inspire and inform her work, such as Dr. Angela Valenzuela (aesthetic vs. authentic caring), Dr. William G. Tierney (cultural integrity vs. cultural suicide), Dr. Ricardo D. Stanton-Salazar (school and kin support networks), and Dr. Mark Granovetter (social network theory), among others.

“I was disheartened to find that those who identify as people of color, Dr. Valenzuela and Dr. Stanton-Salazar, did not have Wikipedia pages. Yet those who identify as white, Dr. Tierney and Dr. Granovetter, both did,” Marina says. “I wasn’t surprised, and yet I still felt disappointed. This frustration, added to the disappointment of Dr. Solórzano not having a page, caused me to want to do something beyond simply being upset; I wanted to take action. Creating a Wikipedia article for Dr. Solórzano allows me to expand on whose knowledge we value and whom we consider important enough to have a Wikipedia page.”

Marina says she felt compelled to start a page for Dr. Solórzano after learning how to create and edit Wikipedia articles and is now confident that she can. Given Wikipedia’s neutrality and conflict of interest rules, it’s better for someone like her — who has no personal connection to the subject — to create the article. So she is working on it.

She credits Wiki Education’s staff (with special thanks to Wikipedia Expert Brianda Felix) for helping her learn the ropes of Wikipedia. Marina says her favorite part of the assignment was sharing her course output with friends and family instead of a dense, jargon-heavy academic paper.

As a scholar of education and information science and a future university librarian, Marina sees the value of the assignment in developing open educational resources (OERs) like Wikipedia as a way to support students of color.

“Most people assume that information literacy is intuitive, but it’s not, particularly when finding and utilizing academic resources such as library search engines,” Marina says. “At the beginning of the quarter, we discussed OERs and how people access information. This conversation, combined with my developing interest in the field of library sciences and supporting students of color in expanding their information literacy skills, made this assignment particularly impactful. I considered how many people without access to academic resources such as scholarly journals and peer-reviewed articles use OERs such as Wikipedia. I also thought about the frustration that comes with being a student, trying to complete an assignment, and hitting a wall because the institution you belong to doesn’t have access to a particular journal. Despite my extensive experience searching for academic resources, this continues to happen to me. Now, more than ever, I realize the value of OERs and am excited about contributing to one through Wikipedia.”

And she’s grateful for the skills she gained. She hopes the articles she’s adding to Wikipedia will help raise the profile of other scholars of color.

“Without this assignment, I don’t think I would have thought twice about not finding a Wikipedia page for scholars of color that I admire,” she says. “The skills I learned through the experience of completing this assignment will benefit me long after this class ends. I now know more about OERs, adding to them, and will encourage others to do the same. I look forward to completing an initial draft of Dr. Solórzano’s Wikipedia page so that more people learn about his extensive and powerful work. I hope that creating Dr. Solórzano’s page supports communities of color by amplifying our voices and lived experiences so that the lenses applied to research on our communities aren’t all deficit-based and instead, highlight our strengths.”

Episode 142: Rita Ho

Tuesday, 18 July 2023 17:37 UTC

🕑 1 hour 12 minutes

Rita Ho is the senior groups design manager at the Wikimedia Foundation. She was formerly a member of the Growth Team, and is still actively involved in growth-related projects.

Links for some of the topics discussed:

Tech News issue #29, 2023 (July 17, 2023)

Monday, 17 July 2023 00:00 UTC

Tech News: 2023-29

weeklyOSM 677

Sunday, 16 July 2023 10:34 UTC

04/07/2023-10/07/2023

OpenCampingMap is looking for translators [1] © Sven Geggus © OpenStreetMap contributors

Mapping

  • adreamy wondered about how to map an intersection that is completely covered with pedestrian crossings overlapping in all directions. They proposed using area:highway=pedestrian.
  • pluton_od blogged about high-occupancy vehicle lanes (HOV) and how they updated around 8,000 road segments in the United States from the deprecated tag hov=lane to hov:lanes=*.
  • Brian Sperlongano has proposed a mechanical edit to remove the tag service=driveway2.
  • The proposal to establish the keys wheelchair:portable_ramp=* and wheelchair:portable_lift=*, for placing on public transport platforms to indicate whether a portable ramp or a portable wheelchair lift is present at the platform, is waiting for your comments.

Community

  • [1] After having added more features in the English language version of his OpenCampingMap, Sven Geggus, from Karlsruhe, would like to have people help in translating these to French, Spanish, and Russian.
  • Amanda McCann (user ᚛ᚐᚋᚐᚅᚇᚐ᚜) shared her insights comparing OSM engagement on Twitter with Mastodon/Fediverse.
  • bes199 described how they researched the confusing house number sequence in a residential area.
  • The OpenStreetMap community on Lemmy is searching for moderators.

Imports

  • James Crawford proposed moving the discussion of import proposals from the current mailing list to the OSM Community forum, which passed with a vote of 45-5. However, critics of the vote pointed to violations of the wiki’s tagging proposal process. Frederik Ramm noted that the Data Working Group would consider reviewing imports in either communication channel to be acceptable.

OpenStreetMap Foundation

  • Take a moment to reflect on why State of the Map Conferences are so important to OSM.

Events

  • HeiGIT took part in the US State of the Map Conference for the first time. Marcel Reinmuth gave several talks presenting HeiGIT’s platforms and services to the US OSM community and ran a workshop, together with Charles Hatfield, to introduce the new version of the Sketch Map Tool to a wider audience.
  • It may be worth having a look at the upcoming OSM hack weekend in Karlsruhe, Germany, being held on the weekend of 30 September. Surely you have some unfinished OSM projects or some good ideas for new ones? Bring them with you to Karlsruhe. You will find people there to discuss the details with you, or help build something.

Maps

  • meteoblue helps you to find cooler spaces, for example in München, during high summer. Data is also available for some other European and world cities.

Software

  • Eugene shows how OsmAnd users create their own map styles.
  • SportsTrackLive has released a brand new companion app named SportsTrackLive Unity (iOS, Android). It has a unique 3D terrain map render engine built in-house on Unity. The app supports the recording and replaying of outdoor sport activity. Everyone is invited to try and test it. The developers are actively making efforts to improve the map engine.

Programming

  • M!dgard wanted to use different user accounts with the same JOSM presets. Due to the fact that JOSM doesn’t have built-in support for multiple user accounts, he created a script that lets him switch without pain. It works on a typical Linux setup or other *NIXes, such as macOS, but not on Windows.
  • MapLibre is urging users of MapLibre Native’s custom OpenGL layers to come forward. This feature is about to be refactored, so user feedback is needed to ensure the code remains backwards compatible.
  • Paul Norman has created a table of the 2,713,316 OSM changesets from the last 12 months that referenced Maxar imagery, and listed the countries with the highest numbers of references. From this data, Guillaume Rischard showed that countries from the Global South have a relatively high proportion of Maxar imagery used to edit OSM (93% of changesets from Guatemala used Maxar). Pierre Béland presented a synthesis of changesets and objects edited by continent and sub-continent, again showing a high proportion of Maxar use for countries from the Global South.
  • Kyle Barron wrote a tutorial about developing a fast rendering engine for vector geodata in the browser. The article includes an interactive map with a million vector polygons of buildings in Salt Lake City.
  • Stadia Maps announced the initial release of three SDKs (JavaScript/TypeScript, Python, and Kotlin), along with a MapLibre GL JS autocomplete search box plugin for their services. All are open source and available on their GitHub.

Releases

  • Vespucci version 19.1 beta has been released and is available from the Google Play Store or Vespucci repository. It now features tile layer diagnostics that display technical information about the currently displayed map tile.
  • HeiGIT announced the release of Openrouteservice 7.0, also known as Callisto. The new version provides, among other improvements, a faster routing graph building process.

Other “geo” things

  • The drift of Earth’s pole confirmed groundwater depletion as a significant contributor to global sea level rise between 1993 and 2010.
  • HeiGIT is looking for a Senior Spatial Data Scientist for the Climate Action focus group (m/f/d, up to 100%). Tasks will include, among others, creating high-resolution spatial indicators of climate change readiness and greenhouse gas emission maps, as well as the analysis of crowdsourced geodata such as OSM in combination with other spatial data including remote sensing and government data.
  • HeiGIT is looking for a community engagement, outreach, and partnership manager for the Climate Action focus group (m/f/d, up to 100%). They will work on the interface between communities and scientists to enable citizens and decision makers to identify and take actions related to the climate crisis.

Upcoming Events

Where | What | When
- | 165. Treffen des OSM-Stammtisches Bonn | 2023-07-18
Lüneburg | Lüneburger Mappertreffen (online) | 2023-07-18
The Municipal District of Kilkenny City | Kilkenny History Mappers MeetUp | 2023-07-18
Lorain County | OpenStreetMap Ohio Meetup – Kickoff | 2023-07-20
Karlsruhe | Stammtisch Karlsruhe | 2023-07-19
- | Missing Maps – DRK & MSF Online Mapathon | 2023-07-19
City Of Vincent | Social mapping Sunday: Leederville and Lake Monger | 2023-07-23
Bremen | Bremer Mappertreffen (Online) | 2023-07-24
- | OSMF Engineering Working Group meeting | 2023-07-26
Düsseldorf | Düsseldorfer OpenStreetMap-Treffen | 2023-07-26
Wien | 68. Wiener Stammtisch | 2023-07-27

Note:
If you would like to see your event here, please put it into the OSM calendar. Only data that is there will appear in weeklyOSM.

This weeklyOSM was produced by Elizabete, MatthiasMatthias, Nordpfeil, PierZen, Strubbl, TheSwavu, TrickyFoxy, barefootstache, derFred.
We welcome link suggestions for the next issue via this form and look forward to your contributions.

Wikipedia’s value in the age of generative AI

Wednesday, 12 July 2023 17:21 UTC

If there was a generative artificial intelligence system that could, on its own, write all the information contained in Wikipedia, would it be the same as Wikipedia today?

This might seem like a philosophical question, but it’s now a very practical one due to recent advances in generative artificial intelligence and large language models (LLMs). Because of widespread adoption of generative AI technology designed to predict and mimic human responses, it is now possible to nearly effortlessly create text that seems a lot like it came from Wikipedia.

My answer to the question is simple: No — it would not be the same.

The process of freely creating knowledge, of sharing it, and refining it over time, in public and with the help of hundreds of thousands of volunteers, has for 20 years fundamentally shaped Wikipedia and the many other Wikimedia projects. Wikipedia contains trustworthy, reliably sourced knowledge because it is created, debated, and curated by people. It’s also grounded in an open, noncommercial model, which means that Wikipedia is free to access and share, and it always will be. And in an internet flooded with machine-generated content, this means that Wikipedia becomes even more valuable.

In the past six months, the public has been introduced to dozens of LLMs, trained on vast data sets that can read, summarize, and generate text. Wikipedia is one of the largest open corpuses of information on the internet, with versions in over 300 languages. To date, every LLM is trained on Wikipedia content, and it is almost always the largest source of training data in their data sets.

An obvious thing to do with one of these new systems is to try to generate Wikipedia articles. Of course, people have tried it. And, as I’m sure many readers have experienced firsthand, these attempts highlight many challenges for using LLMs to produce what Wikipedians call knowledge, which is trustworthy, reliably sourced encyclopedic writing and images. Some of these shortcomings include:

  • Output from LLMs isn’t currently fact checked, and there are already well-publicized instances of people using generative AI to try to do their own jobs. There are tons of low-stakes situations, like prompts for thank you notes, plans for a fun vacation, or outlines to start an essay, where the outputs are helpful and not harmful. However, there are other situations where it’s not so good – like in the instance when an LLM fabricated court cases, and the lawyer who used the answers in a real courtroom was ultimately fined. In another situation, a doctor demonstrated that a generative AI system would give poor diagnoses when provided symptoms from patients seen in an emergency room. Over time, my guess is that these systems will get much better, and become more reliably sourced in a variety of contexts. An exciting possibility is demand for better sourcing will improve access to research and books that can be used online. But getting there will take time, and probably significant pressure from regulators and the public to improve in ways that benefit all people.
  • LLMs can’t use information they haven’t been trained on to respond to prompts. This means that all the books of the world that aren’t available in full text online, content from pre-internet research, and information in languages other than English, aren’t part of what a typical LLM “knows”. As a result, the data sets used to train LLMs today can amplify existing inequities and bias in many areas – like hiring, medicine, and criminal sentencing. Maybe someday this will change, but we’re pretty far off from being able to freely access and then train LLMs on all the different kinds of information people in every language currently use to write for Wikipedia. And even then, additional work will be needed to mitigate bias.
  • Finally, it’s been shown that LLMs trained on the output of LLMs become measurably worse, and even forget things they once “knew”, an affliction named “model collapse”. What this means is that for LLMs to be good and to get better, they’ll need a steady supply of original content, written by humans, making Wikipedia and other sources of human generated content even more valuable. It also means the world’s generative AI companies need to figure out how to keep sources of original human content, the most critical element of our information ecosystem, sustainable and growing over time. 

These are just some of the problems that need to be solved as internet users explore how LLMs can be used. We believe that internet users will place increasing value on reliable sources of information that have been vetted by people. Wikipedia’s policies and our experiences from more than a decade of using machine learning to support human volunteers offer worthwhile lessons in this future.

Principles for the use of generative AI

Machine generated content and machine learning tools aren’t new to Wikipedia and other Wikimedia projects. At the Wikimedia Foundation, we have developed machine learning and AI tools around the same principles that have made Wikipedia such a useful resource to so many: by centering human-led content moderation and human governance. We continue to experiment with new ways to meet people’s knowledge needs in responsible ways including with generative AI platforms, aiming to bring human contribution and reciprocity to the forefront. Wikipedia editors are in control of all machine generated content – they edit, improve, and audit any work done by AI – and they create policies and structures to govern machine learning tools that are used to generate content for Wikipedia.

These principles can form a good starting point for the use of current and emerging large language models. To start, LLMs should consider how their models support people in three key ways:

  1. Sustainability. Generative AI technology has the potential to negatively impact human motivation to create content. In order to preserve and encourage more people to contribute their knowledge to the commons, LLMs should look to augment and support human participation in growing and creating knowledge. They should not ever impede or replace the human creation of knowledge. This can be done by always keeping humans in the loop and properly crediting their contributions. Not only is continuing to support humans in sharing their knowledge in line with the strategic mission of the Wikimedia movement, but it will be required to continue expanding our overall information ecosystem, which is what creates up-to-date training data that LLMs rely on.
  2. Equity. At their best, LLMs can expand the accessibility of information and offer innovative ways to deliver information to knowledge seekers. To do so, these platforms need to build in checks and balances that do not perpetuate information biases, widen knowledge gaps, continue to erase traditionally-excluded histories and perspectives, or contribute to human rights harms. LLMs should also consider how to identify, address, and correct biases in training data that can produce inaccurate and wildly inequitable results.
  3. Transparency. LLMs and the interfaces to them should allow humans to understand the source of, verify, and correct model outputs. Increased transparency in how outputs are generated can help us understand and then mitigate harmful systemic biases. By allowing users of these systems to assess causes and consequences of bias that may be present in training data or in outputs, creators and users can be part of understanding and the thoughtful application of these tools.

A vision for a trusted future

Human contributions are an essential part of the internet. People are the engine that has driven online growth and expansion, and created an incredible place for learning, for business, and for connecting with others.

Could a generative AI replace Wikipedia? It could try, but it would result in a replacement that no one really wants. There’s nothing inevitable about new technology. Instead, it’s up to us all to choose what is most important. We can prioritize human understanding and contribution of knowledge back to the world – sustainably, equitably, and transparently – as a key goal of generative AI systems, not as an afterthought. This would help mitigate increasing misinformation and hallucinations from LLMs; ensure human creativity is recognized for the knowledge that’s created; and most importantly, it will ensure that LLMs and people alike can continue to rely on an up-to-date, evolving, and trustworthy information ecosystem for the long term.

Selena Deckelmann is Chief Product and Technology Officer at the Wikimedia Foundation.

The post Wikipedia’s value in the age of generative AI appeared first on Wikimedia Foundation.

This Month in GLAM: June 2023

Monday, 10 July 2023 07:21 UTC
  • Albania report: CEE Spring Campaign 2023, Albania and Kosovo
  • Asia report: Donation of images from the National Centre for Biological Sciences
  • Brazil report: Native Brazilian photographer wins Wiki Loves Folklore Brazil 2023
  • Croatia report: Half done in 2023
  • Germany report: Museum tour, WLM, handouts and image donation
  • India report: Wiki Exploration Programme GLAM activities
  • Indonesia report: Conclusion of Mini Grants; Second #1Lib1Ref Campaign; Wikisource Workshop in Bali
  • Italy report: TCI and Turin Academy of Science
  • Kosovo report: CEE Spring Campaign 2023, Albania and Kosovo
  • Netherlands report: A new book, new Wikipedia articles, videos and further images on Africa
  • New Zealand report: Report on the Society for the Preservation of Natural History Collections Conference 2023 and Auckland suburb updates
  • Philippines report: GLAM outreach activity at University of Nueva Caceres: Digitization, workshops and proofread-a-thons as future collaboration
  • Poland report: GLAM-Wiki workshops for the Czartoryski Library; Work on the GLAM-Wiki Project Page Continues; End of Internship within the “Praktykuj w Kulturze” Program
  • Sweden report: Knowledge overview; Almedalen week
  • Switzerland report: Swiss GLAM Programm
  • UK report: Cultural diversity
  • USA report: WikiWednesday returns to Manhattan; Wikimedia NYC and Art+Feminism; WikiConference North America 2023; GLAM Wiki 2023
  • Special story: Flickr Foundation and Wikimedia Foundation partner to build Flickypedia
  • GLAM Wiki conference report: The call for proposals is now open for the GLAM Wiki Conference
  • Calendar: July’s GLAM events

Tech News issue #28, 2023 (July 10, 2023)

Monday, 10 July 2023 00:00 UTC

Tech News: 2023-28

As the fediverse and Mastodon grow in users and activity, so does the attention and the risk. This week Mastodon developers released a critical upgrade to address a recent security audit. As the word got out about the upcoming security release, the community rushed like never before to get everyone safe. Inspired by the awesome Claire, who contacted many instances and even patched custom codebases to keep users safe, I wondered how to help.

weeklyOSM 676

Sunday, 9 July 2023 11:51 UTC

27/06/2023-03/07/2023

Where are the top 10 corporations contributing? Here DevSeed. [1] © piebro | map data © OpenStreetMap contributors

About us

  • As Chris correctly pointed out, in a comment to our issue #675, we do not link to platforms that are not publicly accessible. Therefore, in future we will no longer link to messages on Twitter. If you want to distribute information coming from Twitter that is relevant to our readers, and you think we should report on it, please provide a non-Twitter link. If you are thinking of joining the Fediverse (especially Mastodon), Jorge Sanz has many tips on where to find accounts that cover geographical topics.

Mapping

  • arjunaraoc published a how-to guide on mapping roadside fuelling stations using OsmAnd to collect the data and JOSM to add the new POIs to the map.
  • CactiStaccingCrane shared his experience of suffering from burnout while mapping landuse in Vietnam. He provided tips for OSM contributors to avoid this.
  • Cascafico explained how to create a simple script to automatically validate the URLs in OSM tags. In a reply, Strubbl referred to his programme on Codeberg, with which one can specify an OSM relation and then all the mapped websites within this boundary will be checked. He then showed an example analysis for München.
  • darkonus shared his technique for creating smooth curves using the CAD Tools plug-in in JOSM.
  • Kai Johnson blogged about his glossary of landforms, which he has put together for contributors interested in mapping natural features.
  • The FlatLaf plug-in lets you personalise the interface of JOSM with nine different themes. You can view these FlatLaf themes on the OSM wiki. The plug-in is maintained by DevCharly on GitHub and it’s also available under the Apache licence for use in other software.
  • The proposal playground=*, to add values to the list of documented playground equipment, is open for vote until Friday 14 July.
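
    A URL check of the kind Cascafico describes could start with a purely syntactic pass before any network request is made. The sketch below is not his script, nor Strubbl's programme; the tag keys inspected and the helper name are assumptions chosen for illustration, and it only tests whether a value parses as an absolute http(s) URL.

```python
from urllib.parse import urlsplit

# Tag keys assumed, for this example, to hold a single URL each.
URL_KEYS = ("website", "contact:website", "url")


def malformed_urls(tags: dict) -> dict:
    """Return the URL-bearing tags whose value is not an absolute http(s) URL."""
    bad = {}
    for key in URL_KEYS:
        value = tags.get(key)
        if value is None:
            continue
        try:
            parts = urlsplit(value.strip())
        except ValueError:
            # urlsplit rejects some inputs outright, e.g. unbalanced IPv6 brackets.
            bad[key] = value
            continue
        if parts.scheme not in ("http", "https") or not parts.hostname:
            bad[key] = value
    return bad


# Hypothetical tags for a single OSM object.
example_tags = {
    "name": "Example Cafe",
    "website": "https://example.org/menu",
    "contact:website": "www.example.org",
    "url": "htp://example.org",
}
print(malformed_urls(example_tags))
```

    Checking whether the remaining URLs actually respond would be a separate and slower second step.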

Mapping campaigns

  • Mateusz Konieczny has published a MapRoulette challenge listing cases where the wikipedia tag links to a non-existing Wikipedia entry. This is an experimental tool and feedback is welcome. As with any QA tool, remember to use your brain and do not change things blindly!
  • Obianinulu reported on a two-day training course, held at the University of Nigeria in Enugu, on using the iD Editor and Tasking Manager.
  • With Maxar imagery being unavailable for several weeks, SColchester published a short guide that outlines the steps you can take to switch your Tasking Manager project to alternative imagery sources.

Community

  • GrapeMapping published a diary post listing their favourite OpenStreetMap sites and tools.
  • OSM Belgium has declared Michel Hebert, from Grenoble, to be their Mapper of the Month.
  • Having moved to Bonaire a couple of months ago, Probelnijs described some of the challenges they have encountered while contributing to OSM.
  • William Edmisten described how he mounted a GoPro MAX 360° camera on his car, processed the videos, and imported them into JOSM to help find missing features in OpenStreetMap.

OpenStreetMap Foundation

  • Maxar cut off access to their imagery a few weeks ago as they try to deal with non-OpenStreetMap related use of the imagery. The OSMF board is in contact with Maxar to try and resolve the situation.
  • The OpenStreetMap Foundation Board reported on their ‘OSMF Board Screen to Screen’ meeting held in June 2023.

Local chapter news

  • Quincy Morgan unveiled the new design of the OpenStreetMap United States website at the State of the Map US 2023.

Education

  • Sebastian Kauer and Petra Sauerborn have written an article on using OpenStreetMap in the classroom, looking at potential benefits and challenges.

Maps

  • [1] Piet Brömmel has updated his OSM statistics to June 2023 and added some statistics on corporate editors. According to the June 2023 statistics, the ten companies adding the most data to OpenStreetMap are: Meta, Kaart, Amazon, Apple, Microsoft, Grab, DigitalEgypt, DevSeed, TomTom, and Lightcypher.
  • Christoph Hormann wrote, in the first of a series, about ‘drawing the line’ on maps, how different styles of lines are used to convey information in cartography.
  • Ruben Mendoza reported that Development Seed have incorporated Meta’s Segment Anything Model into their DS-Annotate tool.

switch2OSM

  • Robert Potter, from the Victorian Department of Transport and Planning, Australia, consulted the OSM-talk mailing list on the issue of ODbL licences on OpenStreetMap. Adam Franco advised them to release their geospatial data under a public domain licence first, before adding it to the OSM database.

Software

  • Meta Reality Labs have released OrienterNet, an AI-based application to automatically estimate a location based on visual information contained in photos and a user-provided coarse location within 100 to 200 meters.
  • autoevolution reviewed OsmAnd on Android Auto as an alternative to Google Maps.

Programming

  • Starting with Debian 12 ‘Bookworm’, the libgdal-perl package, which is based on XS, will no longer be in the repositories. The upstream support period has ended. You can follow these instructions to migrate to the CPAN package Geo::GDAL::FFI. For more details, see the release notes for Debian 12.
  • Jonny J Watson (pixmusix) built a wall hanging board that can display a simulation of pedestrians in Paris. This simulation was created using a Raspberry Pi device and OpenStreetMap data (via the PrettyMaps Python library).

Did you know …

  • … Geoconfirmed? It is a community-based geolocation platform with a global reach, focusing on the current conflict in Ukraine. In addition, Syria and Yemen, among others, are also highlighted.
  • … that the maximum size of a PDF version 7 file is 381 km x 381 km?

OSM in the media

  • The Taj Mahal was incorrectly named ‘Shiva Kshetra’ on OpenStreetMap for 13 days. Mappers from Kerala corrected it.

Other “geo” things

  • HeiGIT has analysed and compared the data in the regions of Maribor, Slovenia and Ngaoundéré, Cameroon using their OSM Element Vectorisation Tool.
  • HeiGIT also analysed the correlations between indicators generated from the OSM data in Heidelberg, first to ensure that these indicators measured distinct data attributes, and then to look for interesting or surprising correlations. This was their third and final example of using the OSM Element Vectorisation Tool.

Upcoming Events

Where What Online When Country
London Missing Maps London Mapathon 2023-07-04 – 2030-07-04 flag
Reunião – OSM Brasil – Mapeamento Aberto para Gestão de Riscos/Desastres [Horário 3] 2023-07-07
København OSMmapperCPH 2023-07-09 flag
臺北市 OpenStreetMap x Wikidata 月聚會 #54 2023-07-10 flag
Zürich OSM-Stammtisch 2023-07-11 flag
München Münchner OSM-Treffen 2023-07-11 flag
OSMF Engineering Working Group meeting 2023-07-12
Salt Lake City Salt Lake City OSM Night 2023-07-13 flag
Berlin 181. Berlin-Brandenburg OpenStreetMap Stammtisch 2023-07-13 flag
Bochum OpenStreetMap-Stammtisch Bochum 2023-07-13 flag
Montrouge Rencontre contributeurs Sud de Paris 2023-07-13 flag
165. Treffen des OSM-Stammtisches Bonn 2023-07-18
Lüneburg Lüneburger Mappertreffen (online) 2023-07-18 flag
The Municipal District of Kilkenny City Kilkenny History Mappers MeetUp 2023-07-18 flag
Karlsruhe Stammtisch Karlsruhe 2023-07-19 – 2030-06-21 flag
Lorain County OpenStreetMap Ohio Meetup – Kickoff 2023-07-20 flag
Missing Maps – DRK & MSF Online Mapathon 2023-07-19

Note:
If you would like to see your event here, please put it into the OSM calendar. Only data which is there will appear in weeklyOSM.

This weeklyOSM was produced by Elizabete, MatthiasMatthias, PierZen, Strubbl, TheSwavu, barefootstache, derFred, rtnf, s8321414.
We welcome link suggestions for the next issue via this form and look forward to your contributions.

Bovington tank museum

Friday, 7 July 2023 22:39 UTC

Or, as it insists on calling itself, The Tank Museum. The good news is that this museum contains a lot of notable items (it has one of the best tank collections on earth). The bad news is that it’s been pretty mined out. A lot of Wikipedians have visited the museum over the years, including as part of a Wikimedia UK project in 2014.

Most things available to be photographed have been photographed and most things to be written about have been written about. The photo environment also isn’t the best. Tanks are large vehicles, which means the usual issues of photographing indoors (not being able to get far enough away, uneven lighting) are compounded. The museum does run a couple of event days where tanks are driven outside, which can provide a much better photographic environment. Indeed, given the right weather it’s possible to take some of the best tank photos Wikipedia is going to get. Another Wikipedian has already written rather more lyrically than I can about the experience there.

There are items in the museum’s collection that Wikipedia could still use photos of (or better photos of in some cases): the Morris-Martel, the T14 Heavy Tank and the Excelsior. Problem is, none of them are currently on display. There are also a few tanks that lack articles, such as the L1E3 amphibious tank.

So overall, a great museum if you like tanks, but as a Wikipedian you are probably going to need to plan in advance if you want to get much out of it.

Last year, 1 out of 7 cars bought around the world was an electric vehicle. That’s a huge uptick from just 6 years ago, when only 1 in 70 were EVs. As consumers seek to understand more about this fast-growing market, it’s likely they’ll turn to Wikipedia for clear explanations of complex topics.

Enoch Rassachack, rights reserved.

Take lithium nickel manganese cobalt oxides, for example. This is an important material in the lithium-ion batteries made for electric vehicles and our phones. Why is that? How do they work? Well, you can ask Enoch Rassachack who wrote the Wikipedia article about it as part of an assignment. He’s completing an honours degree in chemistry at the University of British Columbia and entering his final year of undergraduate study. He drew upon his studies and research experience to update this public resource.

“I have some co-op experience working with batteries which helped me find this article,” Enoch shared. “I also hope to work as a researcher in materials chemistry after (hopefully) going to grad school, and this project helped me practice communicating some of the knowledge I acquired. I see climate change as the key issue defining this century as well, so I hoped to work on an article that would educate people on something related to it, whether it be atmospheric and environmental chemistry, or technology that’s helping counteract the climate crisis. Considering all this, the page on lithium nickel manganese cobalt oxides fit me perfectly.”

Most of Enoch’s improvements to the article focus on adding new sections about the material’s structure and synthesis. The original version of the article touched upon these points a bit, but Enoch knew he could build upon it with the academic sources he had on hand. He also found the introductory lede section to be difficult to follow without already being knowledgeable on battery materials. This is the beauty of students engaging in this science communication work. They remember what it was like learning about these topics for the first time. But they also have developed some subject matter expertise in their studies. Enoch simplified the opening paragraphs for Wikipedia’s general audience.

“While a majority of my edits involved technical writing, my main goal was to help folks who hadn’t heard of NMC materials get a basic understanding of what they are. To this end, the relatively short lead section was the most important part of the article to get right, in my opinion. Of course, my work on the body sections were also valuable, as the summarized technical knowledge would be useful for more savvy chemists/material scientists. But for most readers on Wikipedia, these sections would not be nearly as helpful as the lead.”

For Enoch, distilling a complex topic into digestible and concise explanations was good practice. He knows he’ll draw upon these skills in his future classes and career.

“There is a lot of focus on more complicated details in an undergraduate chemistry program and even in the technical writing course I did this Wikipedia assignment in, so trying to write for a more general audience was a nice change,” he shared. “Being able to generalize my research later on as a scientist will likely be a useful skill, too. I know that public sentiment can potentially be a factor in getting research funding so spreading knowledge about my own work to people without my chemistry training could help with getting grants. Practicing more concise writing will also help me be clearer in my writing in all aspects of any future career.

“I think Wikipedia can be a great reference tool as well as a good starting point for curious individuals to begin looking into certain topics. Being an online encyclopedia makes it really unique because it’s very convenient for finding generally credible information, but can still be scrutinized since anyone can modify articles. Summarizing topics is Wikipedia’s biggest strength so it is most useful for finding general knowledge in a field without going too much into detail.”

All in all, Enoch found the experience to be valuable in many ways. Considering most students say they’ve been told never to use Wikipedia, diving into its inner workings and learning to interact with the resource critically and actively is a great experience.

“This was one of the most unique assignments I’ve ever done and gave me a good glimpse behind one of the best Internet resources available. It really showed me a more balanced side to Wikipedia; I knew how the site operated before doing the assignment, but actually taking part in edits gave me much more appreciation for the anonymous users that edit or write huge parts of articles. It’s a lot more work than it seems! My expectations for information from Wikipedia were tempered down closer to reality, too, after seeing how many pages still needed significant work in project pages. It’s still a super useful resource, but its limitations as an ever-expanding collection of knowledge are much clearer, which will ultimately help me use Wikipedia more effectively.”

Interested in incorporating a Wikipedia assignment into your course? Visit teach.wikiedu.org to learn more about the free assignment templates and resources that Wiki Education offers to instructors in the United States and Canada.