en.planet.wikimedia

October 03, 2016

Wiki Education Foundation

The Roundup: Life in Mexico

In 1839, the Scottish explorer Frances “Fanny” Erskine Inglis, later known as Fanny Calderón, took a somewhat controversial road trip.

Her travels through Mexico, which she documented through letters collected in her 1843 book, Life in Mexico, formed one of the earliest and most influential European travel narratives about Latin America. The book, the only such narrative of its era written by a woman, was controversial in Mexico. Calderón ended up a consultant of sorts to the US government ahead of the Mexican-American War.

The controversy over the book was twofold. Her writing deployed a somewhat proto-feminist critique of Mexico’s male elite and of the violence of the revolution. But it was also decried as imperialist, because it takes the position that Spain was essential to Mexico’s existence.

It’s a fascinating story and discussion, contextualizing some of the themes running through our political discourse today. Thanks to User:Cmartlover, from David Sartorius’s Travel Writing in the Americas course at the University of Maryland, College Park, the world has a deeper understanding of those who shaped the United States’ view of Mexico. That student expanded Calderón’s biography to cover the book in greater depth, and then built a brand new article for Life in Mexico with all of the resources she found.

This is just one great example of the kinds of work students can do that deepen Wikipedia’s coverage of history, literature, and women’s lives. We’re looking to create more great examples! If you’d like to inspire your students and expand the horizons of open knowledge, we’d love to hear from you. Start a conversation by emailing us: contact@wikiedu.org.


Photo: Mexico in 1838 by DigbyDalton, own work, CC BY-SA 3.0

by Eryk Salvaggio at October 03, 2016 04:00 PM

William Beutler

Gene Weingarten Proves Wikipedia Still Needs a Better Way to Deal With Feedback

Wikipedia has two kinds of problems. The first category includes problems it recognizes and knows how to fix, sometimes through a policy change but more often, in recent years especially, by administrative actions or PR activities led by the Wikimedia Foundation. For example, educators once warned students away from Wikipedia, but now editing Wikipedia is an increasingly common pedagogical tool, for which a great deal of credit is owed to the Wiki Education Foundation.

The second type of problem comprises those issues it cannot or will not fix, for reasons as diverse as the problems themselves. This past week brings us another example, highlighted by a September 29 column in the Washington Post Magazine by Gene Weingarten, titled “Dear Wikipedia: Please change my photo!” This comes more than four years after Philip Roth published “An Open Letter to Wikipedia” online at The New Yorker. Both men found fault with their biographical entries on Wikipedia and used their access to the mainstream media to call attention to the changes they wanted.


The problem we are highlighting is that anyone who is written about in a Wikipedia entry typically has no idea what they can or cannot do if they have a problem with said entry. There is some awareness that editing one’s own biography is fraught with peril—“(One is evidently not allowed to alter one’s own entry.)” Weingarten explains in an aside that is effectively true, technically false, and debatable as a matter of Wikipedia guidelines, so who can blame him—but there is little understanding of what one is supposed to do instead:

I tried asking Wikipedia to change or delete this picture. No answer. So I did what any user can do, and deleted it myself, on seven occasions — which, yes, was in blatant and shameful contravention of all Wikimedia Commons policies blah, blah, blah.

Absent a clear path to offering feedback, Weingarten and Roth did the only thing they could imagine: they tried editing the “encyclopedia anyone can edit”. Oddly enough, this didn’t work. Looking at Weingarten’s edits, it’s not hard to see why his attempts to remove the photo were overturned: more than once he simply deleted the entire infobox. He might have been successful if he’d just removed the actual image link (but then again maybe not); however, it stands to reason that a middle-aged newspaper humor columnist might not be the most adept with markup languages. In Roth’s case, he asked his biographer to make the changes for him, which were overturned because available news sources contravened Roth’s preferred version.

New photo for Gene Weingarten’s Wikipedia article, via Simona Combi on Flickr. Whether it’s actually an improvement is a matter on which reasonable people can disagree.

When editing Wikipedia didn’t work, each finally turned their media access to their benefit, and this time they got results. Within hours of Weingarten’s article becoming available, Wikipedia editors gathered on the discussion page of his biography to determine what could or should be done about his plight. Meanwhile on Twitter, longtime Wikipedia contributor (and DC-based journalism professor) Andrew Lih engaged Weingarten in a conversation, trying to get a better photo for him, and explaining why his Washington Post headshot could not be used. Soon, another photo satisfying Wikipedia’s arcane image use policies was identified and added to the article, although it doesn’t seem Weingarten is especially happy with it, either. Lih had previously invited Weingarten out to lunch and a quick photo shoot, and it sounds like this may still happen.

In Roth’s case, it was a more complicated matter: several book reviews had identified a character in Roth’s The Human Stain as “allegedly inspired by” a writer whom Roth denies was the character’s inspiration. In the short term, Roth’s objection was noted, but sometime afterward the entire matter was relocated to a subsection of the novel’s Wikipedia entry, “Anatole Broyard controversy”, which explains the matter more fully. This seems like the right outcome.

So, everything worked itself out, right? That’s just how Wikipedia works? Mostly, and yes, and this is nevertheless somewhat regrettable. The fact is Weingarten and Roth are both able to command a major media audience via a “reliable source” platform that the vast majority of people (and bands, brands, teams, companies, nonprofits, &c.) do not. The method they used to get action not only doesn’t scale, it rarely happens at all due to most article subjects’ fear of a “Streisand effect” bringing undue attention to their article. As Weingarten writes in his piece:

[I]t is also possible that this column will serve as a clarion call to every smart aleck and wisenheimer and cyber-vandal out there. Anyone can make ephemeral changes to my Wikipedia page, any time.

So far, that hasn’t happened, but it isn’t an unreasonable worry. Fortunately for Weingarten, as a white male whose writing doesn’t really take sides on controversial issues, he’s not much of a target for the Internet’s troll armies and political agitators.

The causes of this failure are many. We can assign some blame to Wikipedia’s strict policies regarding copyrights and its reliance on crowdsourced images, which have made its often-poor celebrity headshots a source of both angst and amusement. We can assign some to Wikipedia’s confusing discussion pages, which are forbidding; a project was once in development to overhaul them, only to be mothballed after facing community criticism. We can assign some as well to the contradictory message of Wikipedia as the encyclopedia anyone can edit—just not when the subject is the one you know about best, yourself. And we cannot let Wikipedia’s editing community escape blameless; even as they are not an organized (or organizable) thing, the culture is generally hostile to outsiders, unless of course said outsiders can get their criticism of Wikipedia into a periodical they’ve heard of before.

In the four years since the Roth episode, Wikipedia has had time to come up with a process for accepting, reviewing, and responding to feedback. I’ve argued previously for placing a button on each entry to solicit feedback, feeding into a public queue for editorial review. The reasons not to do this are obvious: most of it would be noise, and there wouldn’t be enough editor time to respond even to those requests which might be actionable.

I still think the feedback button is a good idea, but I recognize it is not sufficient: it would also need an ombuds committee set up to triage this feedback. Perhaps this could be community-run, but it seems too important to be left up to volunteers. This work could be performed by WMF staff even if, for complicated reasons every Wikipedia editor understands but would need a lengthy paragraph to explain, they could not implement the changes outright. And it’s not just a matter of making sure Wikipedia is accurate—though you’d think that would be enough!—it’s also a matter of making sure Wikipedia is responsible and responsive to legitimate criticism.

Of course, Wikipedia already operates on this very model, in a way: it solicits edits from its readership, and then also spends a lot of time reverting unhelpful edits, and the difference between bad edits with good intentions and bad edits with bad intentions is often impossible to tell. Providing a clear option for raising a specific concern, rather than forcing that concern to take the form of an edit, is something Wikipedians should think about again. When someone is unhappy with their Wikipedia entry, the fact that they have no idea what can be done about it isn’t really their fault. Ultimately, it’s Wikipedia’s. And it’s not just an abstract information asymmetry problem—it’s a PR problem, too.

by William Beutler at October 03, 2016 03:23 PM

Tech News

Tech News issue #40, 2016 (October 3, 2016)

← previous | 2016, week 40 (Monday 03 October 2016) | next →
Other languages:
العربية • ‎čeština • ‎Ελληνικά • ‎English • ‎español • ‎suomi • ‎français • ‎עברית • ‎हिन्दी • ‎italiano • ‎norsk bokmål • ‎polski • ‎русский • ‎shqip • ‎svenska • ‎українська • ‎Tiếng Việt • ‎中文

October 03, 2016 12:00 AM

October 01, 2016

Wikimedia Foundation

What motivates a Wikipedia editor to write about the Battersea Power Station?

Photo by Alberto Pascual, CC BY-SA 3.0.

You gotta be crazy, you gotta have a real need
You gotta sleep on your toes, and when you’re on the street
You gotta be able to pick out the easy meat with your eyes closed

So begins the first verse of “Dogs,” a song by progressive rock icon Pink Floyd from their album Animals.

The cover of this album, the visuals that greeted every prospective buyer in record stores around Europe and the United States, featured the now-iconic Battersea Power Station with a forty-foot long inflatable pig floating above it.

The band ran into problems when a gust of wind carried the pig miles away and landed it in a farmer’s field, but the resulting album cover was memorable enough to receive a cameo appearance during the opening ceremony of the 2012 Olympics.

This factoid is just one of many covered in Wikipedia’s “good” article on the power station; “good article” is a quality marker given when an article meets six criteria. Most recently, editors have updated it to include information on Apple’s new London headquarters, which will be located on the site; approximately 1,400 Apple employees will eventually work from there.

The article was nominated for good article status by Wikipedia editor Fintan264, who called the power station “iconic and special”—although in large part only because it is the only station of its type still extant.

“In the 1940s and 1950s, the ‘brick cathedral’ was the go-to design style of electricity generating stations in the UK. That time period also marked the nationalisation of the UK electricity industry and the establishment of the Central Electricity Generating Board. It was a time when electricity was being rolled out to everyone across the country, coinciding with the government of Clement Attlee and his building of a welfare state.”

“The electricity was generated in these enormous, beautifully designed cathedrals, which in my opinion complemented the British landscape perfectly; modern temples to worship a new sort of power. Personally I don’t think there’s anything particularly special about Battersea, other than it was designed by GG Scott and in London, and so sort of ended up the poster boy of the brick cathedrals. As far as I know, it is now the sole remaining example of the style—which makes it very special!”

Fintan has worked on a plethora of power stations around the UK, including Lemington, Blyth, and Stella. The latter is perhaps the most important, as it is what kindled Fintan’s interest in power stations.

“My interest sprung from having Stella power station, a defunct brick cathedral station, on my doorstep growing up.  There was something about the way it looked in the sun, sitting in the Tyne valley, that was really pleasing to me as a kid. Its demolition in 1996-97 was one of my strongest childhood memories, and is still something I’m disappointed about.”

All in all, over 600 editors have made 1,230 edits to the article in its fourteen years of existence. Created with 501 words by editor Maury Markowitz in October 2002, the article is now more than ten times larger, at 5,169 words.

Ed Erhart, Editorial Associate
Wikimedia Foundation

by Ed Erhart at October 01, 2016 11:17 PM

September 30, 2016

Brion Vibber

JavaScript async/await fiddling

I’ve been fiddling with using ECMAScript 2015 (“ES6”) in rewriting some internals for ogv.js, both in order to make use of the Promise pattern for asynchronous code (to reduce “callback hell”) and to get cleaner-looking code with the newer class definitions, arrow functions, etc.

To do that, I’ll need to use babel to convert the code to the older ES5 version to run in older browsers like Internet Explorer and old Safari releases… so why not go one step farther and use new language features like asynchronous functions that are pretty solidly specced but still being implemented natively?

Not yet 100% sure; I like the slightly cleaner code I can get, but we’ll see how it functions once translated…

Here’s an example of an in-progress function from my buffering HTTP streaming abstraction, currently being rewritten to use Promises and support a more flexible internal API that’ll be friendlier to the demuxers and seek operations.

I have three versions of the function: one using provisional ES2017 async/await, one using ES2015 Promises directly, and one written in ES5 assuming a polyfill of ES2015’s Promise class. See the full files or the highlights of ES2017 vs ES2015:

The first big difference is that we don’t have to start with the “new Promise((resolve,reject) => {…})” wrapper. Declaring the function as async is enough.
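To illustrate the difference, here is a minimal sketch; the code embedded in the original post is not reproduced in this copy, so the function and parameter names below are invented stand-ins rather than the actual ogv.js internals.

// ES2015: the whole body lives inside a Promise executor, and completion
// is signalled by calling resolve() or reject() explicitly.
function bufferToOffsetPromise(offset) {
  return new Promise((resolve, reject) => {
    resolve(offset); // ...the real work would happen here...
  });
}

// ES2017: declaring the function async is enough. A plain return resolves
// the implicit promise, and a throw rejects it.
async function bufferToOffsetAsync(offset) {
  return offset; // ...the real work would happen here...
}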

Then we do some synchronous setup which is the same:
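Something along these lines, say; this is again a hedged sketch, and stream.state, stream.offset, and nbytes are stand-ins for whatever the real stream object tracks.

// Shared synchronous setup, identical in both versions (hypothetical names).
function startBuffering(stream, nbytes) {
  if (stream.state !== 'open') {
    throw new Error('cannot buffer in state ' + stream.state);
  }
  stream.state = 'buffering';
  return stream.offset + nbytes; // the target offset we want buffered
}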

Now things get different, as we perform one or two asynchronous sub-operations:
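Here is a side-by-side sketch of that step, with the triggerDownload call discussed below stubbed out and every other name invented for illustration; it is not the real ogv.js code.

// Stand-in for the real asynchronous download step.
function triggerDownload(stream, target) {
  return new Promise((resolve) => setTimeout(resolve, 0));
}

// ES2015 Promise version: the optional second operation is chained with
// .then(), and the final state change has to appear on both the success
// and the failure paths.
function bufferToPromise(stream, target) {
  return Promise.resolve().then(() => {
    if (stream.bytesBuffered < target) {
      return triggerDownload(stream, target); // the chain waits for this
    }
    // else: nothing returned, the chain continues immediately
  }).then(() => {
    stream.state = 'open';
  }).catch((err) => {
    stream.state = 'open';
    throw err;
  });
}

// ES2017 async/await version: the same logic reads top to bottom, and the
// final state change sits in a single finally block.
async function bufferToAsync(stream, target) {
  try {
    if (stream.bytesBuffered < target) {
      await triggerDownload(stream, target);
    }
  } finally {
    stream.state = 'open';
  }
}

The try/finally contrast in the second point below is visible here as well.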

In my opinion the async/await code is cleaner:

First it doesn’t have as much extra “line noise” from parentheses and arrows.

Second, I can use a try/finally block to do the final state change only once instead of on both .then() and .catch(). Many promise libraries will provide an .always() or something but it’s not standard.

Third, I don’t have to mentally think about what the “intermittent return” means in the .then() handler after the triggerDownload call:
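The fragment in question looks something like this; it is a self-contained repetition of the hypothetical Promise-version sketch above, so every name here is a stand-in.

// All names here are hypothetical stand-ins, not the real ogv.js code.
const stream = { bytesBuffered: 0, state: 'buffering' };
const target = 1024;
const triggerDownload = () => new Promise((resolve) => setTimeout(resolve, 0));

Promise.resolve().then(() => {
  if (stream.bytesBuffered < target) {
    return triggerDownload(); // returning a promise: the chain waits for it
  }
  // no return here: the next .then() runs immediately
}).then(() => {
  stream.state = 'open';
});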

Here, returning a promise means that that function gets executed before moving on to the next .then() handler and resolving the outer promise, whereas not returning anything means immediate resolution of the outer promise. It ain’t clear to me without thinking about it every time I see it…

Whereas the async/await version:
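(again a hedged reconstruction reusing the invented names from the sketches above, not the real ogv.js code)

// Same hypothetical stand-ins as above.
const stream = { bytesBuffered: 0, state: 'buffering' };
const target = 1024;
const triggerDownload = () => new Promise((resolve) => setTimeout(resolve, 0));

async function bufferTo() {
  if (stream.bytesBuffered < target) {
    await triggerDownload(); // explicitly wait for the download to finish
  }
  stream.state = 'open';
}

bufferTo();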

makes it clear with the “await” keyword what’s going on.

Updated: I managed to get babel up and running; here’s a github gist with expanded versions after translation to ES5. The ES5 original is unchanged; the ES2015 promise version is very slightly more verbose, and the ES2017 version becomes a state machine monstrosity. 😉 Not sure if this is ideal, but it should preserve the semantics desired.

by brion at September 30, 2016 11:57 PM

Wikimedia Foundation

Editors chronicle the life of baseball pitcher José Fernández after his untimely death

Photo by Arturo Pardavila III, CC BY 2.0.

This week started with news about the tragic death of Cuban-born American baseball pitcher José Fernández, who died alongside two friends in a boating accident last weekend. His Wikipedia page has received 300 edits and 1.5 million visits since his death.

The news came as a bombshell for fans of the Miami Marlins pitcher and many others who followed his brief but accomplished career—including Wikipedia editor User:Muboshgu, who created Fernández’s page on the English Wikipedia in May 2012 and has dedicated a significant amount of time to updating the article after Fernández’s death.

“I was personally shocked, as most people were,” Muboshgu told us. “You don’t expect a 24-year-old athlete to die so suddenly. It’s highly reminiscent of the deaths of Nick Adenhart and Oscar Taveras, who were in their early 20s when they died in car crashes. It’s terribly sad to think of what might have been if he had had a full career, and instead see his life end so tragically.”

Indeed, Slate‘s Josh Levin called Fernández the “future of baseball,” and FiveThirtyEight‘s Neil Paine wrote “Fernández’s loss will leave a void not just of personality, but of Hall of Fame-caliber talent. The game only had a small taste of what Fernández could do.”

Muboshgu said that he and other Wikipedia community members “kept a close eye on the page to make sure it was being edited properly” in the days after the death.

This tragic accident occurred only a short time after Fernández announced that his girlfriend was pregnant by sharing a photo of her on Instagram, using the caption “I’m so glad you came into my life. I’m ready for where this journey is gonna take us together. #familyfirst.”

On the day of his death, Fernández’s page on the English Wikipedia received over 670,000 views, and the number climbed to over 1.5 million views within the first four days, 1,000 times more pageviews than in the four days before his death. Volunteer editors updated the page over 250 times in the four days after the news broke.

Graph of pageviews to José Fernández’s article between September 23 and 28.

Many people trust Wikipedia as a source for up-to-the-minute information when news breaks. Others refer to Wikipedia for historical context behind current events. 

“It’s important to update Wikipedia biographies whenever major news breaks,” Muboshgu explains. “There’s always a surge in article readership after its subject dies—the weekly Wikipedia top 25 pageviews report usually includes famous people who’ve died. Unfortunately, there’s also a surge of misinformation as unconfirmed reports and rumors circulate, which we need to watch for.”

Muboshgu’s statement is backed up by Wikipedia’s statistics, even if you limit them to this year alone. Pageviews of the Wikipedia articles about Prince, David Bowie, and Alan Rickman also soared into the millions in the days following their deaths.

Muboshgu believes that the intrigue surrounding Fernández’s death has also been influenced by his difficult youth a few years before his rise to fame. “The story of his escape from Cuba is remarkable: he was caught three times and spent time in a Cuban prison when he was still a teenager,” Muboshgu said. “Then, on his fourth attempt, he saw someone fall into the water and dove in to save that person, without realizing it was his mother.”

“His personality really shone through on and off the field. I still remember the emotion of his reunion with his grandmother for the first time since his defection. It’s a terrible loss for baseball, and for all of us.”

Samir Elsharbaty, Digital Content Intern
Wikimedia Foundation

by Samir Elsharbaty at September 30, 2016 06:03 AM

September 29, 2016

Weekly OSM

weeklyOSM 323

09/20/2016-09/26/2016

This year’s participants @ SotM 2016 in Brussels 1 | Picture by Tatiana van Campenhout CC-BY-SA 4.0

SotM 2016

  • During State of the Map in Brussels, the 2016 funding drive was started to finance the OSMF’s running costs, including (but not only) the purchase and operation of the OSM servers. The funding goal is 70,000 euros.
  • The videos from State of the Map 2016 are available online as full event streams. They will be split into individual talks shortly.
  • [1] At SotM the winners of the OSM Awards were announced (votes received/votes cast in brackets): Roland Olbricht (213/665), Lukas Martinelli and Manuel Roth (160/500), WeeklyOSM team (377/584), Martin Ždila (186/554), Pascal Neis (306/605), Frederik Ramm (186/576). More detailed results are on the voting platform, and there is an OSMF blog post.
  • You can find some pictures from SotM Brussels under the tag sotm2016 on Flickr.
  • At this year’s SotM there were many wearers of a craftmapper t-shirt. The shirt was distributed at the conference as a reply to Michal Migurski’s blog post (as reported here earlier), which created some speculation on Twitter.
  • Gregory Marler shares his experience of being a Code of Conduct volunteer at the SoTM Brussels.

Mapping

  • BushmanK thinks that the RFC status of proposals is not always respected, and that proposals should not be allowed to pass into the voting phase until all comments and questions have been addressed.
  • On the Tagging mailing list, the tag for cenotaphs is being discussed. It seems that none of these memorials have been added to OSM yet.
  • Warin has discovered an eight-year-old draft of a proposal to tag floodplains (natural=floodplain).
  • Abhishek of the Mapbox data team reports in his user diary about their work to improve the positional accuracy of the road network in Taiwan. (we reported earlier)

Community

  • Bryan Housel suggests in the Github repository of OpenStreetMap to introduce a code of conduct. In Bryan’s simultaneously added pull request there is a discussion about who would be the correct person to address for complaints.
  • About 100 fourth-year students at the Santa Maria high school in Portugalete mapped points that are problematic for people with reduced mobility and suggested more accessible routes to the destinations that matter most to them. According to this blog report (automatic translation), mapping Portugalete is one of many actions developed since 2014 under the Ciudades Amigables (Friendly Cities) program.
  • Yahoo Finance reported that Telenav announced on September 22 the availability of Open Street View, a free open-source platform designed to accelerate the development of OSM. The platform includes free iOS and Android apps.
  • Pascal Neis experiments with a presentation of active mappers on a map.
  • Ben Discoe (in possibly the most polite “rant” ever written) commented on Michal Migurski’s blog post “OSM at Crossroads” (which was mentioned in weeklyOSM previously). The issue that he raises is somewhat different however – money!
  • Pratik Yadav asks for the reason to choose a particular username for OpenStreetMap. First answers can be found in the comments.

Imports

  • Spencer Gardner tries to find out how the lanes=* tag of oneway streets got “calculated” for streets in Massachusetts during the MassGIS Road Import.
  • Highway milestones became open data in France. The community is discussing the tagging, especially the value for the ref tag.
  • Dewi Sulistioningrum, a member of HOT, is interested in importing “infrastructure” (schools, mosques, hospitals etc.) and boundaries in a part of Indonesia.

OpenStreetMap Foundation

  • The Swiss OpenStreetMap Association is now a recognized local chapter (regional office) of OSMF. More local chapters exist in Italy and Iceland.

Humanitarian OSM

  • Biondi Sima of HOT Indonesia reported on three approaches to disaster risk reduction (DRR) in Indonesia.
  • The project Healthsites.io, which collects medical care centers for humanitarian purposes, is now in beta.
  • HOT reports about a week-long pre-assessment trip in Karamoja, Uganda.

Maps

  • OpenCycleMap now supports cycleway:right=track/lane and cycleway:left=track/lane.
  • The Open Transport Net project, with 14 European partners involved, opens and visualizes transport data in Europe. The city of Pilsen (Czech Republic) visualized traffic flow before and during major public works in the city centre. Pilsen’s citizens can see on the web which traffic issues to expect.
  • Sven Geggus has deployed (German) (automatic translation) the new German map style, a fork of OSM Carto with German/transliterated labels and “German” road colours, on openstreetmap.de.
  • Yvecai has created a new base layer for OpenSnowMap, which allows good contrast and good visibility for the display of ski slopes.

Open Data

Software

  • Urbica has developed Galton, a router based on OSM data and the OSRM engine that shows, for any point, the area that can be reached within a fixed time. The source code is on GitHub. The project is described in more detail on their blog.

Programming

  • On AWS there’s a new public and free digital elevation model which was created with Mapzen’s help out of different free data sources.
  • Tim Teulings reports about his work on libosmscout during the SommerCamp in Essen.
  • Lukas Martinelli is looking for supporters on Kickstarter to develop a free editor for Mapbox GL JSON styles. The funding target of 4,000 CHF was reached during the SotM weekend.

Releases

Software Version Release date Comment
Mapbox GL JS v0.24.0 2016-09-19 Eleven new features & improvements, ten bugs fixed.
OsmAnd for iOS * 1.2.5 2016-09-19 Bugs fixed.
Geobuf 3.0.0 2016-09-20 One important breaking change and one bug fixed.
OsmAnd for Android * 2.4 2016-09-21 No actual info.
OsmAnd+ for Android * 2.4.6 2016-09-21 Car audio system integration, better routing, bugfixes.
Vespucci 0.9.8 2016-09-21 Many changes and improvements, please read release info.
Mapillary Android * 2.40 2016-09-23 Login and signup with OpenStreetMap and Google, fixes for more stable browsing.
QGIS 2.16.3 2016-09-23 No infos
Gnome Maps 3.22.0 2016-09-24 Added, updated or fixed translations: Persian, Italian, Scottish Gaelic and Ukrainian.
Maperitive * 2.4.0 2016-09-25 No Info
Komoot Android * var 2016-09-26 No info
Mapillary iOS * 4.4.14 2016-09-26 Login and signup with OpenStreetMap and Google, preserving EXIF tags.
Maps.me Android * var 2016-09-26 GPS error fixed, new map data.
OpenStreetMap Carto Style 2.44.0 2016-09-26 Please read info.

Provided by the OSM Software Watchlist.

(*) unfree software. See freesoftware.

Did you know …

  • … the Taginfo report that indicates which tags are used a lot but do not have a wiki page to describe them?
  • … VROOM (Vehicle Routing Optimization Open-source Machine)? jcoupey provides an extension for OSMBC.
  • … the ‘Binary Bandits’ are stealing house numbers in Philadelphia, USA? Mysterious thieves steal the zeros and ones of the house numbers in Philadelphia. What drives the ‘Binary Bandits’? And what does that mean for OSM? 😉

Other “geo” things

  • Holger Sparr of MacLife lists the best map apps for iPhone/iPad and explains their merits.
  • The web-enabled vehicles from Audi, BMW and Daimler will probably soon be data suppliers for Here.
  • The Spanish magazine NOSOLOSIG reported (Spanish) (automatic translation) in its latest edition that the “Instituto Panamericano de Geografía e Historia (IPGH)” commissioned a study on the quality of geodata in Latin-America.
  • Google explains how to create a custom map style for their vector tiles and therefore tries once more to close the gap with OpenStreetMap.

Upcoming Events

Where What When Country
Tampere OSM kahvit Tampere 29.09.2016 finland
Espoo OSM kahvit Espoo 29.09.2016 finland
Leoben Stammtisch Obersteiermark 29.09.2016 austria
Stazzema Mapping Party Sentiero Alta Versilia: serata introduttiva al Mapping Party del 2 Ottobre 30.09.2016 italy
Düsseldorf Stammtisch 30.09.2016 germany
Sarapiquí https://fundecor.org: Mapatón v3 UNA Sede Río Frío Distrito Horquetas de Sarapiquí, Heredia 30.09.2016 costa rica
Genova Mappalonga Mapathon 01.10.2016 italy
Saint-Herblain Cartopartie 01.10.2016 france
Kyoto 京都オープンデータソン2016 vol.2(吉田神社) with 第1回諸国・浪漫マッピングパーティ 01.10.2016 japan
Trento Mapping party Pieve Tesino 01.10.2016-02.10.2016 italy
Metro Manila State of the Map Asia 2016 01.10.2016-02.10.2016 philippines
Stazzema 2016 Mapping party del sentiero Alta Versilia: escursione e raccolta dati 02.10.2016 italy
Rennes Découverte d’OpenStreetMap pour l’humanitaire 02.10.2016 france
Taipei Taipei Meetup, Mozilla Community Space 03.10.2016 taiwan
Rostock OSM Stammtisch Rostock 04.10.2016 germany
Paris Mapathon Missing Maps Paris 04.10.2016 france
Falmouth Missing Maps by ClearMapping (https://twitter.com/clearmappingco/status/780704843786817536) 04.10.2016 united kingdom
Roma Geobirra Roma 04.10.2016 Italia
Stuttgart Stammtisch 05.10.2016 germany
Vienna 57. Wiener Stammtisch 06.10.2016 austria
Dresden Stammtisch 07.10.2016 germany
Dresden Elbe-Labe-Meeting 08.10.2016-09.10.2016 germany
Lyon Rencontre mensuelle mappeurs 11.10.2016 france
Landshut Landshut Stammtisch 11.10.2016 germany
Munich Stammtisch München 11.10.2016 germany
Berlin 100. Berlin-Brandenburg Stammtisch 14.10.2016 germany
Tokyo 東京!街歩き!マッピングパーティ:第1回 哲学堂公園 15.10.2016 japan
Berlin Hack Weekend 15.10.2016-16.10.2016 germany
Favara Mapping party dei vicoli, cortili, scalinate, archi e orti del centro antico di Favara, Organizzato dalla Molitec, Farm Cultural Park e Tivissima 16.10.2016 italy
Bonn Bonner Stammtisch 18.10.2016 germany
Lüneburg Mappertreffen Lüneburg 18.10.2016 germany
Nottingham Nottingham 18.10.2016 united kingdom
Scotland Edinburgh 18.10.2016 united kingdom

Note: If you would like to see your event here, please put it into the calendar. Only data which is there will appear in weeklyOSM. Please check your event in our public calendar preview and correct it where appropriate.

This weekly was produced by Hakuch, Laura Barroso, Nakaner, Peda, Polyglot, Rogehm, SomeoneElse, TheFive, YoViajo, derFred, escada, jinalfoflia, mgehling, seumas, wambacher.

by weeklyteam at September 29, 2016 09:31 PM

Wiki Education Foundation

SFSU opening access to library resources for a Wikipedian interested in disability studies

Definitions of disability are often cast in medical terms. While important, concentrating on that one aspect of a disability-related topic can mean inadequate coverage of other social, cultural, historical, economic, and political aspects. Writing a high-quality Wikipedia article about the subject thus typically means drawing from research in the sciences, but also in the social sciences and humanities. Getting access to those sources, however, can be a challenge for Wikipedia editors, who may run into barriers like paywalls. When Wikipedians can’t access the necessary materials about a subject, articles and perspectives within articles can be neglected.

For that reason, San Francisco State University (SFSU) is opening access to its library resources for a Wikipedian interested in disability studies.

As with other Visiting Scholars positions, the Wikipedians aren’t required to be physically present at the university. The only expectation is that they bring some of the articles they work on in that subject area to B-class or better over the course of a year. For most Wikipedians who would be applying for such a position, that’s the sort of activity they would be doing anyway, but now with access to high-quality research resources.

The opportunity is supported by SFSU’s Paul K. Longmore Institute on Disability, which works to challenge stereotypes and showcase the strength, ingenuity, and originality of disabled people. For Associate Director Emily Smith Beitiks, the Visiting Scholars program is a way to support the Institute’s mission by helping to improve public knowledge about disability on Wikipedia, using the rich resources collected by SFSU to build well-rounded multidisciplinary articles.

If you’re a passionate Wikipedian with an interest in this field, we’d love to help connect you. You can apply for a Visiting Scholar position here and, if you have questions, drop us a line: visitingscholars@wikiedu.org. For more information about the Visiting Scholars program, see the Visiting Scholars section of our website.


Photo: SFSU Campus Overview Nov2012 by Webbi1987 – Own work, CC BY-SA 3.0

by Ryan McGrady at September 29, 2016 05:21 PM

Shyamal

The many shades of citizen science

Everyone is a citizen but not all have the same kind of grounding in the methods of science. Someone with a training in science should find it especially easy to separate pomp from substance. The phrase "citizen science" is a fairly recent one which has been pompously marketed without enough clarity.

In India, the label of a "scientist" is a status symbol; indeed many actually choose their paths just to earn that status. In many of the key professions (for example, medicine and law), authority is gained mainly by guarded membership, initiation rituals, symbolism and hierarchies. At its roots, science differs in being egalitarian, but the profession is at odds with that ideal, and its institutions are replete with tribal ritual and power hierarchies.

Long before the creation of the profession of science, "Victorian scientists" (who of course never called themselves that) pursued the quest for knowledge (i.e. science) and were for the most part quite good as citizens. In the field of taxonomy, specimens came to be the reliable carriers of information and they became a key aspect of most of zoology and botany. After all, what could you write or talk about if you did not have a name for the subjects under study? Specimens became currency. Victorian scientists collaborated in various ways that involved sharing information, sharing or exchanging specimens, debating ideas, and tapping a network of friends and relatives for more. Learned societies and their journals helped the participants meet and share knowledge across time and geographic boundaries. Specimens, the key carriers of unquestionable information, were acquired for a price, and a niche economy was created with wealthy collectors, not-so-wealthy field collectors and various agencies bridging them. That economy also included the publishers of monographs, field guides and catalogues, who grew in power along with organizations such as museums and, later, universities. Along with political changes, there was also a move of power from private wealthy citizens to state-supported organizations. Power brings disparity, and the Victorian brand of science had its share of issues, but has there been progress in the way of doing science?

Looking at the natural world can be completely absorbing. The kinds of sights, sounds, textures, smells and maybe tastes can keep one completely occupied. The need to communicate our observations and reactions almost immediately makes one look for existing structure and framework, and that is where organized knowledge, a.k.a. science, comes in. While the pursuit of science might be seen by individuals as value neutral and objective, the settings of organized and professional science are decidedly not. There are political and social aspects to science, and at least in India the tendency is to view them as undesirable and not to be talked about, so as to appear "professional".

Being silent so as to appear diplomatic probably adds to the problem. Not engaging in conversation or debate with "outsiders" (a.k.a. mere citizens) probably fuels the growing label of "arrogance" applied to scientists. Once the egalitarian ideal of science is tossed out of the window, you can be sure that "citizen science" moves from useful and harmless territory to a region of conflict and potential danger. Many years ago I saw a bit of this tone in a publication boasting the virtues of Cornell's ebird and commented on it. Ebird was not particularly novel to me (especially as it was not the first either by idea or implementation; lots of us would have tinkered with such ideas, even I did with BirdSpot, which aimed to be federated and peer-to-peer, ideally something like torrent), but Cornell obviously is well-funded. I commented in 2007 that the wording used sounded like "scientists using citizens rather than looking upon citizens as scientists", the latter being in my view the nobler aim to achieve. Over time ebird has gained global coverage, but it has remained "closed", not opening its code or its discussions on software construction, and not engaging with its stakeholders. It has, on the other hand, upheld traditional political hierarchies and processes that ensure low quality in parts of the world where political and cultural systems are particularly based on hierarchies of users. As someone who has watched and appreciated the growth of systems like Wikipedia, it is hard not to see the philosophical differences - almost as stark as right-wing versus left-wing politics.

Do projects like ebird see the politics in "citizen-science"? Arnstein's ladder is a nice guide to judge the philosophy behind a project.
I write this while noting that criticisms of ebird as it currently works are slowly beginning to come out (despite glowing accounts in the past). There are comments on how it is reviewed by self-appointed police (and the problem is not just in the appointment - why could the software designers not have allowed anyone to question any record, put in methods to suggest alternative identifications, and gather measures of confidence based on community queries and opinions?). There is supposedly a class of user who manages something called "filters" (the problem here is not just the idea of creating user classes but also the idea of using manually defined "filters"; to an outsider like me with some insight into software engineering, poor software construction is symptomatic of poor vision, a weak guiding philosophy and probably issues in project governance). There are issues with taxonomic changes (I heard someone complain about a user being asked to verify an identification because of a taxonomic split - and that too a split that allows one to unambiguously relabel older records based on geography; these could have been resolved automatically, but the lazy developers obviously prefer to get users to manage it). And there are now dangers to birds themselves. There are also issues and conflicts associated with licensing, intellectual property and so on. Now, it is easy to fix all these problems piecemeal, but that does not make the system better; fixing the underlying processes and philosophies is the big thing to aim for. So how do you go from a system designed for gathering data to one where the stakeholders are enlightened? Well, a start could be made by first discussing in the open.

I guess many of us who have seen and discussed ebird privately could just say "I told you so", but it is not just a few of us, nor is it new. Many of the problems were and are easily foreseeable. One merely needs to read the history of ornithology to see how conflicts worked out between the center and the periphery (conflicts between museum workers and collectors); the troubles of peer review and openness; the conflicts between the rich and the poor (not just measured by wealth); or perhaps the haves and the have-nots. And then of course there are scientific issues - the conflicts between species concepts, not to mention conservation issues and local versus global thinking. Conflicting aims may not be entirely solved, but you cannot have an isolated software development team, a bunch of "scientists", and citizens at large expected merely to key in data and be gone. There is perhaps a lot to learn from other open-source projects, and I think the lessons in the culture and politics of Wikipedia are especially interesting for citizen science projects like ebird. I am yet to hear of an organization where the head is forced to resign by the long tail that has traditionally been powerless in decision making; allowing for that is where a brighter future lies. Even better would be where the head and tail cannot be told apart.

Postscript: Amazingly, I heard from nobody involved in the activities considered harmful above - the vetting of records by a select few on ebird, etc. But I have heard quite a bit from "victims" since...

There is an interesting study of field guides and their users in Nature which essentially shows that everyone is quite equal in making misidentifications - just another reason why the ebird developers ought to remove this whole system that creates an uber class involved in rating observations and observers.

by Shyamal L. (noreply@blogger.com) at September 29, 2016 07:09 AM

Gerard Meijssen

Trust

I read an article and found what was written astounding; it signalled that I had to read it again to really understand what was said and what it implies. The article was published in a quality newspaper, the Independent. The reply that I got was: "Indeed. And it's Fisk, so you can't just pretend it is an obscure journalist talking about something that may have happened..."

As I did not know Robert Fisk, I looked him up. I checked his Wikipedia article and found that he indeed has a really good reputation. He has received many more awards than were known at Wikidata, so I added several; it is fun to establish the quality of their sources. For the Lannan Cultural Freedom Prize, the Lannan website says it all. It is linked on the item for the award and that should suffice. For the Amnesty International UK Media Award it is not so obvious. It is conferred by the UK branch of Amnesty International, which has no dedicated page for the award. I added the award and the chapter, and had a look at the pages for the award ceremony for each year. These Wikipedia articles refer to webpages that no longer exist.

For the Lannan Cultural Freedom Prize I added the other recipients because it gives some insight into the relevance of the award. I did not do this for the Martha Gellhorn Prize for Journalism.

The point of all this is that reputation creates trust in the message that is written. Read the article; it is likely that you are not familiar with the Wahhabi belief, a subset of Sunni Islam that is practiced in Saudi Arabia. The article is about 200 Sunni scholars who denounce the Wahhabi belief. Several major scholars are involved. Have a read and have a think: the article is by a major journalist, published in a major newspaper, about something that is not without consequences.
Thanks,
       GerardM

by Gerard Meijssen (noreply@blogger.com) at September 29, 2016 06:35 AM

September 28, 2016

Wikimedia Foundation

Discover Egypt with Roland Unger on Wikivoyage

Photo via Roland Unger.

Roland Unger, a volunteer co-founder of Wikivoyage, strives to enrich the content on this free travel guide. His regular trips to Egypt inform hundreds of articles about it on the German Wikivoyage.

During our interview with him, he wore a T-shirt with the Wikivoyage logo and an Arabic inscription that said, “Discover the world, or rather, discover Egypt.” “This is my advice to everyone,” Unger explained, pointing to the slogan on his T-shirt.

So far, Unger has made over 350,000 edits to Wikivoyage and created over 1,000 articles on the German Wikivoyage alone, illustrated with hundreds of freely licensed photos of his own. The vast majority of his contributions are about Egypt; he has visited over 700 towns during thirty trips to the country. Unger told the Wikimedia blog why he has visited so many towns, most of which hold little appeal for tourists.

“Every place is unique for some reason, for example this last visit I was in the town of Fuwwah. Normally, tourists don’t go to this town, but I have been told that there are 300 mosques in this area and that at least 20 of them are historic monuments.”

A pure coincidence led to Unger’s long adventure in Egypt. “I lived in East Germany until 1990,” Unger recalls, “where there were restrictions on traveling to many countries. When that was over, I wanted to visit some of the countries that were formerly off limits to us. I wanted to travel to Greece first, but all the tickets were sold out. The travel agency suggested that I go to Egypt instead.”

Photo via Roland Unger.

During the next few years, Unger made his way to other countries, such as Iran, Iraq, Syria and Morocco. He could have visited a new country or two every year, but he had a new idea. “I wanted to visit fewer countries or just one country in order to experience it more fully. I chose Egypt because you can visit extraordinary Egyptian monuments from different historical epochs and you can participate in many other activities, like diving,” Unger adds.

With all his experience traveling before joining Wikivoyage, Unger contributed to a web-based free travel guide between 2004 and 2006. After that site was sold to a for-profit company, Unger worked with other contributors from Germany and Italy to create a new free website to host their writing about travel. They founded Wikivoyage in 2006, which became a Wikimedia project in 2013.

“I would like to share my knowledge with everyone,” says Unger, “and I can’t imagine a young traveler nowadays holding a big travel book all the time to get the information they need. They usually have a smartphone. I believe information should be available at their fingertips on their phones for free.”

Unger lives in Halle, Saxony-Anhalt, in Germany, where he was born and raised. For his day job, he is a scientific collaborator and lecturer at his city’s historic university, where he earned a PhD in chemical physics.

Samir Elsharbaty, Digital Content Intern
Wikimedia Foundation

by Samir Elsharbaty at September 28, 2016 10:26 PM

Medical school class profiled as case study of Wikipedia Education Program

Photo by Samantha Erickson, CC BY-SA 4.0.

A team of students, academics, researchers, and Wikipedia contributors have produced “Why Medical Schools Should Embrace Wikipedia,” a case study of the Wikipedia Education Program published in the journal Academic Medicine. The research presents the study method and outcomes for several groups of medical students editing Wikipedia health-related articles. This paper is the first academic case study of the Wikipedia Education Program. It models a method for reporting audience reach for Wikipedia editing projects, grants credibility for Wikipedia editing in the sensitive space of medical schools, and presents a thorough classroom outreach and follow-up model which interested instructors can replicate.

In the study, a class on editing Wikipedia was offered between 2013 and 2015 to final-year medical students. Collectively, 43 students edited 43 Wikipedia articles. Student contributions were reviewed by classroom peers, subject matter experts, and the Wikipedia community. Following the class, the Wikipedia articles edited by the students were accessed more than 22 million times by Wikipedia readers. The authors of the paper argue that students met learning goals by editing Wikipedia and that Wikipedia is an efficient way for anyone to share information with a large, relevant audience.

This research is significant because Wikipedia continues to gain popularity as a source of medical information among health science professionals and students. Having a Wikipedia editing case study in a medical school is especially relevant because readers use the information to inform their own healthcare decisions.

Historically, many Wikipedia outreach projects have focused on reporting Wikipedia participation. This study highlighted the impact on readers by tracking Wikipedia pageviews of the articles which were edited by the students. Although a comparison to other publishing channels was out of the scope of the study, the paper does provocatively ask if a student-written article “garners over 100,000 views/month, might those edits constitute the greatest contribution to the medical literature in that student’s nascent career?”

Following the paper’s publication, the authors made the following calls to action:

First, they would like Wikipedians to support instructors in considering class projects which include student Wikipedia editing. When an instructor and students can accept the time involved in the Wikipedia Education Program, students get practical experience in new media publishing; Wikipedia editors get good information to process; professors get knowledge shared more broadly; the school gets prestige for making a real-world impact; and Wikipedia readers get improved information in the articles they read.

Second, they would like to ask whether any method exists that is more efficient than Wikipedia for sharing general-interest information. Right now, Wikipedia’s significance is still doubted in corners of education, publishing, and the media. Despite those doubts, no other organization reaches a larger or more relevant audience than Wikipedia in medicine, or likely any other field for that matter.

Azzam, Amin; Bresler, David; Leon, Armando; Maggio, Lauren; Whitaker, Evans; Heilman, James; Orlowitz, Jake; Swisher, Valerie; Rasberry, Lane; Otoide, Kingsley; Trotter, Fred; Ross, Will; McCue, Jack D. (2016). “Why Medical Schools Should Embrace Wikipedia”. Academic Medicine: 1. doi:10.1097/ACM.0000000000001381. ISSN 1040-2446.

Lane Rasberry, Wikipedian in Residence
Consumer Reports

by Lane Rasberry at September 28, 2016 08:36 PM

Wiki Education Foundation

The Roundup: Happy Birthday, Antibiotics!

On September 28, 1928, Alexander Fleming came into work and discovered that he’d left the window open, changing the course of modern medicine.

The open window had allowed outside air to reach a stack of staphylococci cultures he’d been researching. A fungus had grown and, around that fungus, colonies of staphylococci had been destroyed. Meanwhile, colonies further away from the fungus were doing just fine. Fleming had discovered the effects of penicillin, and inadvertently gave rise to the entire field of antibiotics. Antibiotics are humankind’s great hope against a lifeform that far outnumbers us: bacteria.

To celebrate this serendipitous discovery, we’re sharing great student work on the topic of mold, bacteria, and antibiotics.

From James Scott’s University of Toronto course, “Medical and Veterinary Mycology,” come two examples of student work: a plant fungus, and a mold that can infect animal and human tissue in the central nervous system.

Antibiotics kill bacteria. Thanks to students from Dr. Cameron Thrash’s Prokaryotic Diversity course at LSU, we can learn more about the other side. One student wrote about the bacterium Legionella jordanis, which is found in sewage and can cause respiratory problems, while another student developed an article on Streptobacillus moniliformis, a bacterium spread through contact with rats and linked to “rat-bite fever.”

Also from that course, we find an article about Parachlamydia acanthamoebae, a bacteria that has shown antibiotic resistance.

And, finally, a student shared information about a bacteria that produces antibiotics.

Thanks to Alexander Fleming for his accidental discovery of antibiotics, and to these students for sharing the knowledge that’s been made possible as a result of that discovery.


Photo: Penicilina, by Tomaz Silva/Agência Brasil, CC BY 3.0 (br).

by Eryk Salvaggio at September 28, 2016 04:00 PM

September 27, 2016

Wikimedia Foundation

When it comes to shipwrecks raised off the seafloor, Peter Isotalo may be Wikipedia’s resident aficionado

Photo by JL0312, CC BY-SA 4.0.

On a generally calm late summer day in 1628, a wooden warship—one of the largest to ever sail—left Stockholm on its first voyage. Named Vasa, the vessel was the culmination of an effort to solidify the fledgling Swedish Empire‘s control over the Baltic Sea.

Unfortunately for the Swedes, however, Vasa heeled over and sank in full view of a crowd of people who had gathered to witness the occasion. Thirty people died in the disaster, caused by what the King of Sweden called “imprudence and negligence.”

Vasa spent 333 years at the bottom of the Baltic. It was refloated in 1959–61, after a complex and difficult operation conducted by maritime archaeologists.

Today, Vasa is a museum ship in Stockholm. According to its Wikipedia article, the ship “has become a widely recognized symbol of the Swedish ‘great power period’ and is … a de facto standard in the media and among Swedes for evaluating the historical importance of shipwrecks.”

Map, public domain/CC0.

The man who wrote those words, Peter Isotalo, rewrote the ship’s English-language Wikipedia article with Henrik, another Wikipedia editor. Isotalo says that the ship is a time capsule, one that “offers an insight into a completely different, lost world. It has a physical presence that makes it easy to comprehend and size up even for non-nerds like myself.”

Thanks in large part to Isotalo and Henrik, Vasa is a “featured” article on Wikipedia, a quality marker which recognizes the encyclopedia’s “very best work” and is “distinguished by professional standards of writing, presentation, and sourcing.”

Mary Rose seen in the Anthony Roll, public domain/CC0.

Mary Rose, closer to the present day. Photo via the Mary Rose Trust, CC BY-SA 3.0.

After Vasa, Isotalo continued on to another lost warship raised hundreds of years after being sunk—England’s Mary Rose. Isotalo estimates that he has spent hundreds of hours on the two articles. “When I get hooked on a particular topic and really go for it,” he says, “I tend to pursue pretty much every lead I can get a hold of.”

This sort of obstinacy has helped in ensuring that the articles are complete; he points to the “Causes of sinking/Modern theories” section in Mary Rose to make this point. “I spent quite a lot of time tracking down different perspectives and made sure to check up even on fairly obscure references,” he said, including “a rather minor (but important) critical note about potential eyewitness bias from Maurice de Brossard in a 1984 issue of Mariner’s Mirror.”

Extensive research, however, posed its own set of problems. “The history of [Vasa and Mary Rose] is often subsumed under layers of dramatic storytelling,” Isotalo told me. “There is a tendency to fit into a rather nationalist historical narrative—especially with Vasa, where the history of the ship itself has been presented through the perspective of being the personal property of an absolute monarch, which is clearly not true.” He continued:

Photo, public domain/CC0.

“The modern discovery of Vasa in the 1950s is often portrayed as the work of a single man (Anders Franzén, pictured at right), and previous knowledge of the ship’s location has been largely ignored or glossed over. The decision to salvage the ship is also portrayed as something more or less self-evident, though it really wasn’t. Today, most maritime archaeologists would consider it an unnecessary (and extremely costly) risk to salvage entire shipwrecks. To this day, there aren’t even rough estimates of what it actually cost to salvage Vasa, or who footed the bill.”

Born in 1980 in the then-Soviet Union, Isotalo moved to Sweden as a child. His interest in maritime history was kindled during this time, as he was able to visit the Vasa Museum and later work in its gift shop. The latter experience came in handy when writing Vasa‘s Wikipedia article, as he had easy access to the museum’s staff—including its Director of Research, Fred Hocker, whom Isotalo called “one of the leading experts on Vasa.” These individuals were able to give him assistance with the history of the ship, what was happening in Sweden and its navy around that time, the recovery of the ship from the bottom of the sea, and how it has been preserved since then.

Isotalo’s connections and Wikipedia work were also useful in obtaining a set of 57 images from the Mary Rose Trust, the charitable organization that runs the Mary Rose museum and is charged with preserving the ship’s remains.

Vasa’s stern. Photo by Јакша, public domain/CC0.

Isotalo’s interest in maritime history on Wikipedia has continued even after writing about Vasa and Mary Rose, manifesting itself in several more featured and good-quality articles:

  • Anthony Roll, a preserved inventory of ships in the English Navy in the 1540s, complete with illustrations;
  • Kronan, another sunken Swedish wooden warship discovered in the 1950s (but not raised to the surface);
  • Battle of Öland, where Kronan—the admiral’s flagship—was sunk;
  • Udema and turuma, two ship types used by the Swedish archipelago fleet in the eighteenth to nineteenth centuries.

When he’s not editing Wikipedia, Isotalo is a trained records manager/archivist with a bachelor’s degree in history. He describes himself as a civil servant/bureaucrat of the Weberian variety, and works for the Swedish Committee for Afghanistan, a foreign aid non-governmental organization (NGO).

Ed Erhart, Editorial Associate
Wikimedia Foundation

 

by Ed Erhart at September 27, 2016 09:02 PM

Wiki Loves Monuments

UNESCO and Wikimedia collaborate to promote built cultural heritage


Wiki Loves Monuments is proud to be supported by UNESCO through its Unite4Heritage program.

Unite4Heritage is a global movement powered by UNESCO that aims to celebrate and safeguard cultural heritage and diversity around the world. Launched in response to the unprecedented recent attacks on heritage, the campaign calls on everyone to stand up against sectarian propaganda by celebrating the places, objects and cultural traditions that make the world such a rich and vibrant place.

As part of Unite4Heritage, UNESCO is supporting Wiki Loves Monuments on social media through September by using amazing images entered into previous competitions.

Wiki Loves Monuments is the largest photography competition in the world, giving people in 41 countries the opportunity to share their built cultural heritage. The competition is run by hundreds of volunteers who want to educate and inspire people about built cultural heritage. It aligns with the goals of Unite4Heritage by celebrating and raising awareness of built heritage with the 500 million people who visit Wikipedia each month.

 


Photographs entered into Wiki Loves Monuments are available under open licenses so that they can be used by everyone. UNESCO strongly supports the creation of open license content, giving free access to information and unrestricted use of electronic data for everyone.
Many of the photos will be added to Wikipedia articles by the tens of thousands of Wikimedia community volunteers who create and curate them.

 


You can take part in Wiki Loves Monuments by exploring, photographing, and sharing with the world the built heritage that is important to you; you can also encourage others to take part by sharing social media messages on Facebook, Twitter and Instagram from UNESCO, the Wikimedia Foundation, and others. Wiki Loves Monuments is running in 41 countries around the world from 1 to 30 September 2016.

 

(This blog post was prepared by John Cummings, Wikimedian in Residence at UNESCO.)

 

by Leila at September 27, 2016 04:46 PM

September 26, 2016

Luis Villa

Public licenses and data: So what to do instead?

I just explained why open and copyleft licensing, which work fairly well in the software context, might not be legally workable, or practically a good idea, around data. So what to do instead? tl;dr: say no to licenses, say yes to norms.

"Day 43-Sharing" by A. David Holloway, under CC BY 2.0.
Day 43-Sharing” by A. David Holloway, under CC BY 2.0.

Partial solutions

In this complex landscape, it should be no surprise that there are no perfect solutions. I’ll start with two behaviors that can help.

Education and lawyering: just say no

If you’re reading this post, odds are that, within your organization or community, you’re known as a data geek and might get pulled in when someone asks for a new data (or hardware, or culture) license. The best thing you can do is help explain why restrictive “public” licensing for data is a bad idea. To the extent there is a community of lawyers around open licensing, we also need to be comfortable saying “this is a bad idea”.

These blog posts, to some extent, are my mea culpa for not saying “no” during the drafting of ODbL. At that time, I thought that if only we worked hard enough, and were creative enough, we could make a data license that avoided the pitfalls others had identified. It was only years later that I finally realized there were systemic reasons why we were doomed, despite lots of hard work and thoughtful lawyering. These posts lay out why, so that in the future I can say no more efficiently. Feel free to borrow them when you also need to say no :)

Project structure: collaboration builds on itself

When thinking about what people actually want from open licenses, it is important to remember that how people collaborate is deeply impacted by how your project is structured. (To put it another way, architecture is also law.) For example, many kernel contributors feel that the best reason to contribute your code to the Linux kernel is not because of the license, but because the high velocity of development means that your costs are much lower if you get your features upstream quickly. Similarly, if you can build a big community like Wikimedia’s around your data, the velocity of improvements is likely to reduce the desire to fork. Where possible, consider also offering services and collaboration spaces that encourage people to work in public, rather than providing the bare minimum necessary for your own use. Or more simply, spend money on community people, rather than lawyers! These kinds of tweaks can often have much more of an impact on free-riding and contribution than any license choice. Unfortunately, the details are often project-specific – which makes it hard to talk about in a blog post! Especially one that is already too long.

Solving with norms

So if lawyers should advise against the use of data law, and structuring your project for collaboration might not apply to you, what then? Following Peter Desmet, Science Commons, and others, I think the right tool for building resilient, global communities of sharing (in data and elsewhere) is written norms, combined with a formal release of rights.

Norms are essentially optimistic statements of what should be done, rather than formal requirements of what must be done (with the enforcement power of the state behind them). There is an extensive literature, pioneered by Nobelist Elinor Ostrom, on how norms are actually the way a huge amount of humankind’s work gets done – despite the skepticism of economists and lawyers. Critically, they often work even without the enforcement power of the legal system. For example, academia’s anti-plagiarism norms (when buttressed by appropriate non-legal institutional supports) are fairly successful. While there are still plagiarism problems, they’re fairly comparable to the Linux kernel’s GPL-violation problems – even though, unlike the GPL, there are no legal enforcement mechanisms!

Norms and licenses have similar benefits

In many key ways, norms are not actually significantly different than licenses. Norms and licenses both can help (or hurt) a community reach their goals by:

  • Educating newcomers about community expectations: Collaboration requires shared understanding of the behavior that will guide that collaboration. Written norms can create that shared expectation just as well as licenses, and often better, since they can be flexible and human-readable in ways legally-binding international documents can’t.
  • Serving as the basis for social pressure: For the vast majority of collaborative projects, praise, shame, and other social nudges, not legal threats, are the actual basis for collaboration. (If you need proof of this, consider the decades-long success of open source before any legal enforcement was attempted.) Again, norms can serve this role just as well or even better, since it is often the desire to cooperate and a fear of shaming that actually drive collaboration.
  • Similar levels of enforcement: While you can’t use the legal system to enforce a norm, most people and organizations also don’t have the option to use the legal system to enforce licenses – it is too expensive, or too time consuming, or the violator is in another country, or one of many other reasons why the legal system might not be an option (especially in data!). So instead most projects resort to tools like personal appeals or threats of publicity – tools that are still available with norms.
  • Working in practice (usually): As I mentioned above, basing collaboration on social norms, rather than legal tools, works all the time in real life. The idea that collaboration can’t occur without the threat of legal sanction is really a somewhat recent invention. (I could actually have listed this under differences – since, as Ostrom teaches us, legal mechanisms often fail where norms succeed, and I think that is the case in data too.)

Why are norms better?

Of course, if norms were merely “as good as” licenses in the ways I just listed, I probably wouldn’t recommend them. Here are some ways that they can be better, in ways that address some of the concerns I raised in my earlier posts in this series:

  • Global: While building global norms is not easy (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3038591/), social norms based on appeals to the very human desires for collaboration and partnership can be a lot more global than the current schemes for protecting database or hardware rights, which aren’t international. (You can try to fake internationalization through a license, but as I pointed out in earlier posts, that is likely to fail legally, and be ignored by exactly the largest partners who you most want to get on board.)
  • Flexible: Many of the practical problems with licenses in data space boil down to their inflexibility: if a license presumes something to be true, and it isn’t, you might not be able to do anything about it. Norms can be much more generous – well-intentioned re-users can creatively reinterpret the rules as necessary to get to a good outcome, without having to ask every contributor to change the license. (Copyright law in the US provides some flexibility through fair use, which has been critical in the development of the internet. The EU does not extend such flexibility to data, though member states can add some fair dealing provisions if they choose. In neither case are those exceptions global, so they can’t be relied on by collaborative projects that aim to be global in scope.)
  • Work against, not with, the permission culture: Lessig warned us early on about “permission culture” – the notion that we would always need to ask permission to do anything. Creative Commons was an attempt to fight it, but by being a legal obligation, rather than a normative statement, it made a key concession to the permission culture – that the legal system was the right terrain to have discussions about sharing. The digital world has pretty whole-heartedly rejected this conclusion, sharing freely and constantly. As a result, I suspect a system that appeals to ethical systems has a better chance of long-term sustainability, because it works with the “new” default behavior online rather than bringing in the heavy, and inflexible, hand of the law.

Why you still need a (permissive) license

Norms aren’t enough if the underlying legal system might allow an early contributor to later wield the law as a threat. That’s why the best practice in the data space is to use something like the Creative Commons public domain grant (CC-Zero) to set a clear, reliable, permissive baseline, and then use norms to add flexible requirements on top of that. This uses law to provide reliability and predictability, and then uses norms to address concerns about fairness, free-riding, and effectiveness. CC-Zero still isn’t perfect; most notably it has to try to be both a grant and a license to deal with different international rules around grants.

What next?

In this context, when I say “norms”, I mean not just the general term, but specifically written norms that can act as a reference point for community members. In the data space, some good examples are DPLA’s “CC0-BY” and the Canadensys biodiversity initiative. A more subtle form can be found buried in the terms for NIH’s Clinical Trials database. So, some potential next steps, depending on where your collaborative project is:

  • If your community has informal norms (“attribution good! sharing good!”) consider writing them down like the examples above. If you’re being pressed to adopt a license (hi, Wikidata!), consider writing down norms instead, and thinking creatively about how to name and shame those who violate those norms.
  • If you’re an organization that publishes licenses, consider using your drafting prowess to write some standard norms that encapsulate the same behaviors without the clunkiness of database (or hardware) law. (Open Data Commons made some moves in this direction circa 2010, and other groups could consider doing the same.)
  • If you’re an organization that keeps getting told that people won’t participate in your project because of your license, consider moving towards a more permissive license + a norm, or interpreting your license permissively and reinforcing it with norms.

Good luck! May your data be widely re-used and contributors be excited to join your project.

by Luis Villa at September 26, 2016 03:00 PM

Resident Mario

Joseph Reagle

Nerd vs. 'bro': Geek privilege, triumphalism, and idiosyncrasy

I've been working on geek meritocracy and privilege for a while now and my original draft has now been split into two. The meritocracy piece will be published early next year, and I've just finished the first draft of: Nerd vs. 'bro': Geek privilege, triumphalism, and idiosyncrasy

ABSTRACT: Peggy McIntosh characterized privilege as an “invisible knapsack” of unearned advantages. Although the invisible knapsack is a useful metaphor, the notion of unearned advantage is not readily appreciated, especially by geeks who see their culture as meritocratic. After providing brief cultural histories of geekdom and privilege, I ask: Why are some geeks resistant to the notion of privilege? Beyond the observation that privilege often prompts defensiveness and unproductive comparisons, there is a geek-specific reason. Geek identity is informed by the trope of geek triumphalism: early insecurity is superseded by a sense of superiority. Geeks’ intelligence, unconventional enthusiasms (e.g., technology and fantasy), and idiosyncratic dress were once targets of ridicule, leading triumphant geeks to believe they have no privilege. These same characteristics, later in life, become sources of success and pride, leading them to think they are beyond bias. Nonetheless, I show that even in the seemingly innocuous realm of idiosyncratic dress, there is bias and privilege.

Comments below or in email are welcome and appreciated.

by Joseph Reagle at September 26, 2016 04:00 AM

Tech News

Tech News issue #39, 2016 (September 26, 2016)

2016, week 39 (Monday 26 September 2016)
Other languages:
العربية • ‎български • ‎čeština • ‎English • ‎español • ‎suomi • ‎français • ‎עברית • ‎italiano • ‎Lëtzebuergesch • ‎norsk bokmål • ‎polski • ‎português • ‎português do Brasil • ‎русский • ‎shqip • ‎svenska • ‎українська • ‎Tiếng Việt • ‎中文

September 26, 2016 12:00 AM

September 24, 2016

Resident Mario

September 23, 2016

Wikimedia Foundation

#100wikidays challenge inspires thousands of new Wikipedia articles

Photo by Satdeep Gill, CC BY-SA 4.0.

Wikipedia editor Arminé Aghayan, from Vedi in the Ararat Valley, Armenia, was at Wikimania 2015 in Mexico City when she first heard about the #100wikidays challenge.

Aghayan had already created over 2,700 articles on Wikipedia, but this idea gripped her. She created her first challenge article a few days after returning home. “I love challenges in general because they inspire you to collaborate with other Wikipedians. You get to know them and their interests, and they learn about what drives you,” Aghayan explains.

Aghayan was the first Armenian to complete the challenge. She was proud when she received a message from an Armenian WikiCamp alumnus inquiring about #100wikidays. Surprisingly, many of the alum’s fellow campers had learned about the challenge and followed his lead. It quickly went viral in the Armenian Wikipedia community.

This is just one inspirational example from the over 180 Wikipedians who took on the #100wikidays challenge within the past year and a half. Forty-seven 100wikidayers have met the challenge’s target, and at least 7,700 articles have been created as a result.

“Everybody is free to adapt the challenge according to their lifestyle,” says Vassia Atanassova, a Bulgarian Wikipedian who came up with the #100wikidays challenge. “There is no such thing as failure in the challenge. It is all about fun and creating good content.”

Some participants in the challenge found it entertaining enough that they did not stop contributing articles on a daily basis after the 100-day period concluded. Nat Tymkiv, a Wikipedian from Ukraine and a member of the Wikimedia Foundation’s Board of Trustees, had completed the full course of the challenge on the Ukrainian Wikipedia, then started and completed another 100-day challenge on Wikiquote. Nat wanted to accomplish a third challenge on Wikivoyage, but that proved difficult.

“My life is really busy, so editing Wikipedia can’t always be a priority,” says Tymkiv. “But I still wanted to try to make this happen. It was a real challenge for me, and I enjoyed it very much. It involved a lot more time management, or what felt like pulling off miracles, in some cases.”

But having fun isn’t always easy. “100wikidays really requires total devotion, which I sometimes lack,” Atanassova admits. “Some days I ask myself what kind of idiot invented this nonsense.”

The simple but smart idea has inspired Rebecca O’Neill, from Ireland, to both participate in the challenge and use her experience to inform her autoethnographic PhD research. Her research focuses on how terms like “curation” and “curator” have changed over the years.

“I’m interested in how both professional and citizen curators see their own work and how they evaluate the work of others,” explains O’Neill. “My own 100wikidays experience has allowed me to explore the motivations and emotions behind this engaging work.”

The number of people joining this venture grows rapidly every day. It’s difficult but enjoyable for most contributors. If you feel inspired to participate, you can start today—head to this talk page or this group on Facebook if you would like to engage with others about joining the challenge.

Samir Elsharbaty, Digital Content Intern
Wikimedia Foundation

by Samir Elsharbaty at September 23, 2016 08:59 PM

UNESCO and Wikimedia collaborate to promote built cultural heritage

Photo by Mohammad Reza Domiri Ganji, modified by UNESCO, CC BY-SA 4.0.

Unite4Heritage is a global movement powered by UNESCO that aims to celebrate and safeguard cultural heritage and diversity around the world. Launched in response to the unprecedented recent attacks on heritage, the campaign calls on everyone to stand up against sectarian propaganda by celebrating the places, objects and cultural traditions that make the world such a rich and vibrant place.

As part of Unite4Heritage, UNESCO is supporting Wiki Loves Monuments on social media through September by using amazing images entered into previous competitions.

Wiki Loves Monuments is the largest photography competition in the world, giving people in 41 countries the opportunity to share their built cultural heritage. The competition is run by hundreds of volunteers who want to educate and inspire people about built cultural heritage. It aligns with the goals of Unite4Heritage by celebrating and raising awareness of built heritage with the 500 million people who visit Wikipedia each month.

Photo by Siripatwongpin, modified by UNESCO, CC BY-SA 4.0.

Photographs entered into Wiki Loves Monuments are available under open licenses so that they can be used by everyone. UNESCO strongly supports the creation of open license content, giving free access to information and unrestricted use of electronic data for everyone.

Many of the photos will be added to Wikipedia articles by the tens of thousands of Wikimedia community volunteers who create and curate the sites.

Photo by Elio Pallard, modified by UNESCO, CC BY-SA 4.0.

You can take part in Wiki Loves Monuments by exploring, sharing and photographing the built heritage that is important to you with the world; you can also encourage others to take part by sharing social media messages on Facebook, Twitter and Instagram from UNESCO, the Wikimedia Foundation, and others. Wiki Loves Monuments is running in 41 countries from 1–30 September 2016.

John Cummings, Wikimedian-in-residence
UNESCO

You can see more about UNESCO’s work on Wikimedia sites at WikiProject UNESCO.

by John Cummings at September 23, 2016 08:40 PM

Shyamal

Tracing some ornithological roots

The years 1883-1885 were tumultuous in the history of zoology in India. A group called the Simla Naturalists' Society was formed in the summer of 1885. The founding President of the Simla group was, oddly enough, Courtenay Ilbert - whom some might remember for the Ilbert Bill, which allowed Indian magistrates to pass judgement on British subjects. Another member of this Simla group was Henry Collett, who wrote a flora of the Simla region (Flora Simlensis). This Society vanished without much of a trace. A slightly more stable organization, the Bombay Natural History Society, was begun in 1883. The creation of these organizations was precipitated by a gaping hole: the end of an India-wide correspondence network of naturalists fostered by a one-man force, A. O. Hume. The ornithological chapter of Hume's life begins and ends in Shimla. Hume's serious ornithology began around 1870 and he gave it all up in 1883, after the loss of years of carefully prepared manuscripts for a magnum opus on Indian ornithology, damage to his specimen collections, and a sudden immersion into Theosophy which led him to abjure the killing of animals, take to vegetarianism, and subsequently take up the cause of Indian nationalism. The founders of the BNHS included Eha (E. H. Aitken, also a Hume/Stray Feathers correspondent), J. C. Anderson (a Simla naturalist) and Phipson (who came from a wine merchant family with a strong presence in Simla).

Shimla then was where Hume rose in his career (as Secretary of State, before falling), allowing him to work on his hobby project of Indian ornithology by bringing together a large specimen collection and conducting the publication of Stray Feathers. Through my readings, I had constructed a fairytale picture of the surroundings that he lived in. Richard Bowdler Sharpe, a curator at the British Museum who came to Shimla in 1885, wrote (his description is well worth reading in full):
... Mr. Hume who lives in a most picturesque situation high up on Jakko, the house being about 7800 feet above the level of the sea. From my bedroom window I had a fine view of the snowy range. ... at last I stood in the celebrated museum and gazed at the dozens upon dozens of tin cases which filled the room ... quite three times as large as our meeting-room at the Zoological Society, and, of course, much more lofty. Throughout this large room went three rows of table-cases with glass tops, in which were arranged a series of the birds of India sufficient for the identification of each species, while underneath these table-cases were enormous cabinets made of tin, with trays inside, containing series of the birds represented in the table-cases above. All the specimens were carefully done up in brown-paper cases, each labelled outside with full particulars of the specimen within. Fancy the labour this represents with 60,000 specimens! The tin cabinets were all of materials of the best quality, specially ordered from England, and put together by the best Calcutta workmen. At each end of the room were racks reaching up to the ceiling, and containing immense tin cases full of birds. As one of these racks had to be taken down during the repairs of the north end of the museum, the entire space between the table-cases was taken up by the tin cases formerly housed in it, so that there was literally no space to walk between the rows. On the western side of the museum was the library, reached by a descent of three stops—a cheerful room, furnished with large tables, and containing, besides the egg-cabinets, a well-chosen set of working volumes. ... In a few minutes an immense series of specimens could be spread out on the tables, while all the books were at hand for immediate reference. ... we went below into the basement, which consisted of eight great rooms, six of them full, from floor to ceilings of cases of birds, while at the back of the house two large verandahs were piled high with cases full of large birds, such as Pelicans, Cranes, Vultures, &c.
I was certainly not hoping to find Hume's home as described, but the situation turned out to be a lot worse. The first thing I did was to contact Professor Sriram Mehrotra, a senior historian who has published on the origins of the Indian National Congress. Prof. Mehrotra explained that Rothney Castle had long been altered, with only the front facade retained along with the wood-framed conservatories. He said I could go and ask the caretaker for permission to see the grounds. He was sorry that he could not accompany me as it was physically demanding, and he said that "the place moved him to tears." Professor Mehrotra also told me about how he had decided to live in Shimla simply because of his interest in Hume! I left him and walked to Christ Church and took the left branch going up to Jakhoo with some hopes. I met the caretaker of Rothney Castle in the garden where she was walking her dogs on a flat lawn, probably the same garden at the end of which there once had been a star-shaped flower bed, scene of the infamous brooch incident with Madame Blavatsky (see the theosophy section in Hume's biography on Wikipedia). It was a bit of a disappointment, however, as the caretaker informed me that I could not see the grounds unless the owner, who lived in Delhi, permitted it. Rothney Castle has changed hands so many times that it probably has nothing to match what Bowdler Sharpe saw, and the grounds may very soon be entirely unrecognizable but for the name plaque at the entrance. Another patch of land in front of Rothney Castle was being prepared for what might become a multi-storeyed building. A botanist friend had shown me a 19th century painting of Shimla made by Constance Frederica Gordon-Cumming. In her painting, the only building visible on Jakko Hill behind Christ Church is Rothney Castle. The vegetation on Shimla has definitely become denser, with trees blocking the views.
 
So there ended my hopes of adding good views (free-licensed images are still misunderstood in India) of Rothney Castle to the Wikipedia article on Hume. I did, however, get a couple of photographs from the roadside. In 2014, I managed to visit the South London Botanical Institute, which was the last of Hume's enterprises. This visit enabled the addition of a few pictures of his herbarium collections as well as an illustration of his bookplate, which carries his personal motto.

Clearly Shimla empowered Hume, providing a stimulating environment which included several local collaborators. Who were his local collaborators in Shimla? I have only recently discovered (and notes with references are now added to the Wikipedia entry for R. C. Tytler) that Robert (of Tytler's warbler fame - although named by W E Brooks) and Harriet Tytler (of Mt. Harriet fame) had established a kind of natural history museum at Bonnie Moon in Shimla with Lord Mayo's support. The museum closed down after Robert's death in 1872, and it is said that Harriet offered the bird specimens to the government. It would appear that at least some part of this collection went to Hume. It is said that the collection was packed away in boxes around 1873. The collection later came into the possession of Mr B. Bevan-Petman, who apparently passed it on to the Lahore Central Museum in 1917.

Hume's idea of mapping rainfall
to examine patterns of avian distribution
It was under Lord Mayo that Hume rose in the government hierarchy. Hume was not averse to utilizing his power as Secretary of State to further his interests in birds. He organized the Lakshadweep survey with the assistance of the navy ostensibly to examine sites for a lighthouse. He made use of government machinery in the fisheries department (Francis Day) to help his Sind survey. He used the newly formed meteorological division of his own agricultural department to generate rainfall maps for use in Stray Feathers. He was probably the first to note the connection between rainfall and bird distributions, something that only Sharpe saw any special merit in. Perhaps placing specimens on those large tables described by Sharpe allowed Hume to see geographic trends.

Hume was also able to appreciate geology (in his youth he had studied with Mantell), earth history and avian evolution. Hume had several geologists contributing to ornithology, including Stoliczka and Ball. One wonders if he took an interest in paleontology given his proximity to the Shiwalik ranges. Hume invited Richard Lydekker to publish a major note on avian osteology for the benefit of amateur ornithologists. Hume also had enough time to speculate on matters of avian biology. A couple of years ago I came across this bit that Hume wrote in the first of his Nests and Eggs volumes (published post-ornith-humously in 1889):

Nests and Eggs of Indian birds. Vol 1. p. 199
I wrote immediately to Tim Birkhead, the expert on evolutionary aspects of bird reproduction and someone with an excellent view of ornithological history (his Ten Thousand Birds is a must-read for anyone interested in the subject), and he agreed that Hume had been an early and insightful observer to suggest female sperm storage.

Shimla life was clearly a lot of hob-nobbing, and people like Lord Mayo were spending huge amounts of time and money just hosting parties. It turns out that Lord Mayo even went to Paris to recruit a chef and brought in an Italian, Federico Peliti. (His great-grandson has a nice website!) Unlike Hume, Peliti rose in fame after Lord Mayo's death by setting up a cafe which became the heart of Shimla's social life and gossip. Lady Lytton (Lord Lytton was the one who demoted Hume!) recorded that Simla folk "...foregathered four days a week for prayer meetings, and the rest of the time was spent in writing poisonous official notes about each other." Another observer recorded that "in Simla you could not hear your own voice for the grinding of axes. But in 1884 the grinders were few. In the course of my service I saw much of Simla society, and I think it would compare most favourably with any other town of English-speaking people of the same size. It was bright and gay. We all lived, so to speak, in glass houses. The little bungalows perched on the mountainside wherever there was a ledge, with their winding paths under the pine trees, leading to our only road, the Mall." (Lawrence, Sir Walter Roper (1928) The India We Served.)

A view from Peliti's (1922).
Peliti's other contribution was in photography, and it seems he worked with Felice Beato, who also influenced Harriet Tytler and her photography. I asked a couple of Shimla folks about the historic location of Peliti's cafe and they said it had become the Grand Hotel (now a government guest house). I subsequently found that Peliti did indeed start Peliti's Grand Hotel, which was destroyed in a fire in 1922, but the centre of Shimla's social life, his cafe, was actually next to the Combermere Bridge (it ran over a water storage tank and is today the location of the lift that runs between the Mall and the Cart Road). A photograph taken from "Peliti's" clearly lends support for this location, as do descriptions in Thacker's New Guide to Simla (1925). A poem celebrating Peliti's was published in Punch magazine in 1919. Rudyard Kipling was a fan of Peliti's but Hume was no fan of Kipling (Kipling seems to have held a spiteful view of liberals - "Pagett MP" has been identified by some as being based on W. S. Caine, a friend of Hume; Hume for his part had a lifelong disdain for journalists). Kipling's boss, E. K. Robinson, started the British Naturalists' Association, while E.K.R.'s brother Philip probably influenced Eha.

While Hume most likely stayed well away from Peliti's, we see that a kind of naturalists' social network existed within the government. About Lord Mayo we read:
Lord Mayo and the Natural History of India - His Excellency Lord Mayo, the Viceroy of India, has been making a very valuable collection of natural historical objects, illustrative of the fauna, ornithology, &c., of the Indian Empire. Some portion of these valuable acquisitions, principally birds and some insects, have been brought to England, and are now at 49 Wigmore Street, London, whence they will shortly be removed. - Pertshire Advertiser, 29 December 1870.
Another news report states:
The Earl of Mayo's collection of Indian birds, &c.

Amids the cares of empire, the Earl of Mayo, the present ruler of India, has found time to form a valuable collection of objects illustrative of the natural history of the East, and especially of India. Some of these were brought over by the Countess when she visited England a short time since, and entrusted to the hands of Mr Edwin Ward, F.Z.S., for setting and arrangement, under the particular direction of the Countess herself. This portion, which consists chiefly of birds and insects, was to be seen yesterday at 49, Wigmore street, and, with the other objects accumulated in Mr Ward's establishment, presented a very striking picture. There are two library screens formed from the plumage of the grand argus pheasant- the head forward, the wing feathers extended in circular shape, those of the tail rising high above the rest. The peculiarities of the plumage hae been extremely well preserved. These, though surrounded by other birds of more brilliant covering, preserved in screen pattern also, are most noticeable, and have been much admired. There are likewise two drawing-room screens of smaller Indain birds (thrush size) and insects. They are contained in glass cases, with frames of imitation bamboo, gilt. These birds are of varied and bright colours, and some of them are very rare. The Countess, who returned to India last month, will no doubt,add to the collection when she next comes back to England, as both the Earl and herself appear to take a great interest in Illustrating the fauna and ornithology of India. The most noticeable object, however, in Mr. Ward's establishment is the representation of a fight between two tigers of great size. The gloss, grace, and spirit of the animals are very well preserved. The group is intended as a present to the Prince of Wales. It does not belong to the Mayo Collection. - The Northern Standard, January 7, 1871
And Hume's subsequent superior was Lord Northbrook about whom we read:
University and City Intelligence. - Lord Northbrook has presented to the University a valuable collection of skins of the game birds of India collected for him by Mr. A.O.Hume, C.B., a distinguished Indian ornithologist. Lord Northbrook, in a letter to Dr. Acland, assures him that the collection is very perfec, if not unique. A Decree was passed accepting the offer, and requesting the Vice-Chancellor to convey the thanks of the University to the donor. - Oxford Journal, 10 February 1877
Papilio mayo
Clearly, Lord Mayo and his influence on naturalists in India are not sufficiently well understood. Perhaps that would explain the beautiful butterfly named after him shortly after his murder. It appears that Hume did not have this kind of hobby association with Lord Lytton; little wonder, perhaps, that he fared so badly!

Despite Hume's sharpness on many matters, there are bits that come across as odd. In one article on the flight of birds he observes the soaring of crows and vultures behind his house as he sits in the morning looking towards Mahassu. He points out that these soaring birds would appear early on warm days and late on cold days, but he misses the role of thermals and mixes physics with metaphysics, going for a kind of Grand Unification Theory:

And then he claims that crows, like saints, sages and yogis, are capable of "aethrobacy".
This naturally became a target of ridicule. We have already seen the comments of E. H. Hankin on this. Hankin wrote that if levitation was achieved by "living an absolutely pure life and intense religious concentration", the hill crow must be indulging in "irreligious sentiments when trying to descend to earth without the help of gravity." Hankin, despite his studies, does not give enough credit to the forces of lift produced by thermals, and his own observations were critiqued by Gilbert Walker, the brilliant mathematician who applied his mind to large-scale weather patterns apart from conducting some amazing research on the dynamics of boomerangs. His boomerang research had begun even in his undergraduate years and had earned him the nickname of Boomerang Walker. On my visit to Shimla, I went for a long walk down the quiet road winding through dense woodland and beside streams to Annandale, the only large flat ground in Shimla, where Sir Gilbert Walker conducted his weekend research on boomerangs. Walker's boomerang research mentions a collaboration with Oscar Eckenstein, and there are some strange threads connecting Eckenstein, his collaborator Aleister Crowley, and Hume's daughter Maria Jane Burnley, who would later join the Hermetic Order of the Golden Dawn. But that is just speculation!
1872 Map showing Rothney Castle

The steep road just below Rothney Castle

Excavation for new constructions just below and across the road from Rothney Castle

The embankment collapsing below the guard hut

The lower entrance, concrete constructions replace the old building

The guard hut and home are probably the only heritage structures left


I got back from Annandale and then walked down to Phagli on the southern slope of Shimla to see the place where my paternal grandfather once lived. It is not a coincidence that Shimla and my name are derived from the local deity Shyamaladevi (a version of Kali).


The South London Botanical Institute

After returning to England, Hume took an interest in botany. He made herbarium collections and in 1910 he established the South London Botanical Institute and left money in his will for its upkeep. The SLBI is housed in a quiet residential area. Here are some pictures I took in 2014, most can be found on Wikipedia.


Dr Roy Vickery displaying some of Hume's herbarium specimens

Specially designed cases for storing the herbarium sheets.

The entrance to the South London Botanical Institute

A herbarium sheet from the Hume collection

 
Hume's bookplate with personal motto - Industria et Perseverentia

An ornate clock which apparently adorned Rothney Castle

Further reading
 Postscript

 An antique book shop had a set of Hume's Nests and Eggs (Second edition) and it bore the signature of "R.W.D. Morgan" - it appears that there was a BNHS member of that name from Calcutta c. 1933. It is unclear if it is the same person as Rhodes Morgan, who was a Hume correspondent and forest officer in Wynaad/Malabar who helped William Ruxton Davison.
Update:  Henry Noltie of RBGE has pointed out to me privately that this is not the forester Rhodes Morgan (died 1919!). - September, 2016.

    by Shyamal L. (noreply@blogger.com) at September 23, 2016 09:49 AM

    September 22, 2016

    Wikimedia Foundation

    Why I create beautiful math GIFs

    GIF by Jason Hise, public domain/CC0.

    My first exposure to the concept of higher dimensional space came from reading Flatland in elementary school.  The book used the analogy of beings living in a two dimensional world trying to understand the third dimension in order to convey the type of imagination required for three dimensional beings like us to visualize a fourth.  The concept completely blew my mind; I spent the rest of the day almost in a daze trying to picture a fourth direction extending at a right angle to the three directions I knew.  It would be another decade before I would have the tools needed to visualize the shadows that such four dimensional geometry might project on our 3D world, starting with a humble 4D cube and ranging to the majestic 120-cell, the 4D analog of the dodecahedron.
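
    To give a concrete sense of what such a "shadow" involves, here is a minimal Python sketch (my own illustration, not the code behind these animations) that rotates the sixteen vertices of a 4D cube in the x-w plane and then drops the w coordinate to get a 3D projection:

        import itertools, math

        def tesseract_shadow(angle):
            """Rotate the 4D hypercube in the x-w plane, then project to 3D by dropping w."""
            c, s = math.cos(angle), math.sin(angle)
            shadow = []
            for x, y, z, w in itertools.product((-1, 1), repeat=4):   # the 16 vertices
                xr = c * x - s * w    # rotate in the x-w plane...
                # ...and project orthographically to 3D by discarding the rotated w coordinate
                shadow.append((xr, y, z))
            return shadow

        # one frame; sweeping `angle` over time produces a turning-tesseract animation
        print(tesseract_shadow(math.pi / 6)[:4])

    A perspective projection (scaling x, y and z by something like 1 / (d - w) for a viewing distance d, instead of simply dropping w) gives the nested-cube look familiar from many tesseract animations.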

    GIF by Jason Hise, public domain/CC0.

    My parents raised me to be curious.  Any time I didn’t understand how something worked or something needed to be fixed, it was an opportunity to learn something new.  It was all about creative problem solving; figuring out what worked and what didn’t, and using whatever was available in clever ways.  They made clear that in school, my grades weren’t important to them.  They only cared whether or not I was learning.  They fostered an environment in which I was encouraged to make mistakes in order to fully explore a problem space, and it gave me the confidence to approach everything in my life with that mentality.  Whenever I encountered something that I didn’t fully understand, I would poke and prod at it until I could build a gut level intuition about how it worked.

    In mathematics and geometry, there are often concepts that can be difficult to build intuition about.  Mathematically there is no problem with doing algebra with objects in a higher dimensional space, but using memorized algorithms to manipulate symbols on a page is a far cry from really understanding the underlying concepts that those symbols represent.

    Quaternions are a fantastic example of this.  In computer graphics, programmers often treat them as a magical black box.  They are ‘magic mysterious hyper-dimensional imaginary numbers that can represent 3D rotations’.  This wasn’t acceptable to me—I wanted to know why they worked the way that they did, and what they really represented.  After a significant amount of investigation I was able to build a mental model that was much more simple and intuitive—quaternions are made out of four parts because there are four rotations that are about as different from each other as you can possibly get.  In a way, these rotations really are perpendicular to each other.  You have where you started, where you land after rotating 180 degrees about the x axis, where you land after rotating 180 degrees about the y axis, and where you land after rotating 180 degrees about the z axis.  By taking some blend of each of these rotations you can build any other rotation that you could possibly want.  Quaternions aren’t magic—they are just a list of four blending values that tell you how much of each ‘fundamental’ rotation to use.
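
    As a rough illustration of that mental model, here is a minimal Python sketch (my own, not the author's code): the four "fundamental" rotations correspond to the quaternions (1,0,0,0), (0,1,0,0), (0,0,1,0) and (0,0,0,1), and a rotation by any angle about any axis is a weighted blend of them.

        import math

        def qmul(a, b):
            """Hamilton product of two quaternions given as (w, x, y, z)."""
            aw, ax, ay, az = a
            bw, bx, by, bz = b
            return (aw*bw - ax*bx - ay*by - az*bz,
                    aw*bx + ax*bw + ay*bz - az*by,
                    aw*by - ax*bz + ay*bw + az*bx,
                    aw*bz + ax*by - ay*bx + az*bw)

        def rotate(q, v):
            """Rotate 3D vector v by unit quaternion q (computes q * v * q^-1)."""
            w, x, y, z = qmul(qmul(q, (0.0, *v)), (q[0], -q[1], -q[2], -q[3]))
            return (x, y, z)

        # the four 'fundamental' rotations: identity, and 180-degree turns about x, y and z
        IDENT, FLIP_X, FLIP_Y, FLIP_Z = (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)

        def blend(angle, axis):
            """Build a rotation about a unit axis as a weighted blend of the four basics."""
            ax, ay, az = axis
            c, s = math.cos(angle / 2), math.sin(angle / 2)
            # weight c on the identity, and s * axis on the three 180-degree flips
            return tuple(c*i + s*(ax*fx + ay*fy + az*fz)
                         for i, fx, fy, fz in zip(IDENT, FLIP_X, FLIP_Y, FLIP_Z))

        q = blend(math.pi / 2, (0, 0, 1))   # 90 degrees about the z axis
        print(rotate(q, (1, 0, 0)))         # ~(0, 1, 0), as expected

    The blend weights turn out to be the familiar half-angle terms, cos(θ/2) on the identity and sin(θ/2) times the axis on the three flips, which is exactly the usual (w, x, y, z) form of a unit quaternion.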

    This drive to transform difficult concepts into intuitive tools was one of the major forces that led me to develop mathematical animations for wikipedia.  I wanted to demystify concepts that were too easy to take on faith, and turn them into something that people could really understand.  For instance, take the notion of an object that has ‘spin ½’.  This allegedly describes some sort of object that you have to spin around twice (a full 720 degrees!) before it will get back to where it started.  That just seems like quantum magic unless you have a way to visualize it.
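
    The same quaternion picture also gives a quick numerical way to see the 720-degree effect (again my own sketch, not something from the article): a full 360-degree turn lands on the quaternion -1, which moves vectors exactly as the identity does but is a different element, and only a second full turn returns you to +1.

        import math

        def turn_about_z(angle):
            """Quaternion (w, x, y, z) for a rotation of `angle` radians about the z axis."""
            return (math.cos(angle / 2), 0.0, 0.0, math.sin(angle / 2))

        print(turn_about_z(2 * math.pi))   # ~(-1, 0, 0, 0): one full turn is not the identity quaternion
        print(turn_about_z(4 * math.pi))   # ~( 1, 0, 0, 0): 720 degrees brings you back to the start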

    One day I came across a description in a paper of a device that could be attached to a frame and spin continuously without getting tangled.  It was described in tremendous detail, but I still couldn’t see how it was possible.  So I dug into my closet through some of my childhood toys and tried to build what had been described with k’nex and rubber bands.  I could manipulate it with my hands, rotate it, see that it really did work, and come to understand it.  And amazingly, that little part in the center had to spin around twice to get back to where it started!

    Once I understood how it worked and had a general mental model, I created some animations in an attempt to share that insight.  I made the first one below with manual keyframes and standard animation tools, and then later I wrote code to generate something a little more precise and elegant (the second one).

    GIF by Jason Hise, public domain/CC0.

    GIF by Jason Hise, public domain/CC0.

    Finally, I decided to go all out and see if I could demonstrate that this works with any number of fibers, and that in the limit, a solid piece of continuous space could twist like this without getting tangled.

    GIF by Jason Hise, public domain/CC0.

    GIF by Jason Hise, public domain/CC0.

    I work in the videogame industry as a physics programmer, so I spend a lot of my time using math to make 3D geometry move in an appealing way.  I’m currently working on a game where literally all of the character motion is driven by math—shielding turns you into a block, which I figured out how to do by reading up on alternate ways of measuring distance.  Math and geometry can be both expressive and beautiful, and I want to share that feeling with the world.
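
    One classic example of an "alternate way of measuring distance" (my own illustrative guess at the kind of idea involved, not a description of the game's actual code) is the Chebyshev, or L-infinity, norm: if length is measured as max(|x|, |y|, |z|) instead of the usual Euclidean square root, the set of points at distance 1 is a cube rather than a sphere, so normalizing a shape by that norm squares it off into a block.

        import numpy as np

        rng = np.random.default_rng(0)
        pts = rng.normal(size=(5, 3))                        # a few random 3D directions

        euclid = np.linalg.norm(pts, axis=1, keepdims=True)  # ordinary L2 length
        cheby = np.max(np.abs(pts), axis=1, keepdims=True)   # L-infinity (Chebyshev) length

        on_sphere = pts / euclid   # each point now has L2 length 1: it lies on a round sphere
        on_cube = pts / cheby      # each point now has max-coordinate 1: it lies on a cube's surface

        print(np.max(np.abs(on_cube), axis=1))               # all ones, confirming the cube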

    Jason Hise, Wikimedia community member

    by Jason Hise at September 22, 2016 08:12 PM

    Weekly OSM

    weeklyOSM 322

    09/13/2016-09/19/2016


    Mapping

    • Jothirnadh explains in his user diary how one can revert changesets in JOSM that would normally trigger a timeout.
    • Christoph Hormann analyzed data quality in the Canadian Arctic and compared it to Greenland in his blog post entitled “OpenStreetMap at its worst”.
    • Michael Tsang suggests a new relation in his proposal for “Through Services”.
    • Johnw asks how one could tag an amphitheatre.
    • Greg wants to add some more stats for the upcoming quarterly project and asks for help on Github.
    • Edits by Telenav employees with the quality assurance tool ImproveOSM are viewed very critically by the Canadian community. User Mihai Iepure asks that a short summary of the emails be provided in English, as he does not know French.
    • Adam Old, a member of a “Tree Board” in South Florida, asks whether they may use OpenStreetMap for their collection of trees and the metadata of those trees.
    • Denis Stein asks on the OpenRailwayMap mailing list where exactly and how to map points (switches, turnouts), and makes some suggestions.
    • Jojo4u asks if railway=technical_station is obsoleted by railway=service_station.
    • Analysing MAPS.ME changesets, manoharuss found several typical rookie mistakes but surprisingly little misuse of name=*.
    • Krishna Nammala from the Mapbox data team reported to the German forum on the status of their efforts to add missing turn restrictions in Germany.
    • Jojo4u created a new proposal for tagging mud flat trails.
    • Srividya from Mapbox Data Team reported a notable data offset of 15-20 meters with respect to GPS Traces in Taiwan. Mapbox has therefore provided new satellite images for the larger cities.

    Community

    • Joe Morris is a cyclist who is interested in public drinking water spots and has created a map of them. To increase the data, he’d like to create a dedicated site for mapping such locations and asks for feedback.
    • Søren Johannessen notes that OpenStreetMap reached 200 million buildings in its database.

    Imports

    • Gianmario Mengozzi proposes on the import mailing list a boundary import in the northern Italian region of Emilia-Romagna from a CC-0-licensed source.

    Maps

    • The application Gnome Maps is slowly becoming usable. After the end of MapQuest Open it switched to Mapbox tiles, and it now has aerial imagery, a very basic search function, and routing based on GraphHopper.
    • GeoHipster will again be publishing a calendar in 2017. For that, they have asked people to send in maps that might be included in the official calendar.
    • Hans Hack created a poster of the 104 islands in the German capital Berlin.
    • The OSM Carto developers plan to switch to Noto, a new font. They ask for feedback from readers in Asian countries.
    • [1] Thej from India analyzes in his blog the treatment of Indian languages in OpenStreetMap (Mapnik). Arun Ganesh points, among other things, to the multilingual map by Jochen Topf and to his own experiments.

    switch2OSM

    • It is evident that Pokémon Go uses OSM data.

    Software

    • The OsmAnd maps of September 1st initially ignored turn restrictions. This issue has been fixed; newer maps contain turn restrictions.

    Programming

    • Ircama published tutorials on how to set up your environment if you want to participate in OSM Carto development.
    • Andy Allan asks for help to refactor the original Ruby on Rails code of the OSM API.
    • Paul Norman would like to register MIME types for our OSM file formats.
    • Geofabrik now provides separate extracts of every country in Africa on their download server.

    Releases

    Software Version Release date Comment
    Nominatim 2.5.1 2016-08-02 Bug fix release with minor fixes
    SQLite 3.14.2 2016-09-12 Six fixes
    Osmose Backend v1.0-2016-09-13 2016-09-13 No info
    QMapShack Lin/Mac/Win 1.7.1 2016-09-14 No info
    libosmium 2.9.0 2016-09-15 see below
    Osmium Tool 1.4.0 2016-09-15 Eight extensions, six changes and 2 fixes
    Overpass-Turbo 2016-09-15 2016-09-15 Fixed GPX output format, support for Portuguese, some other fixes
    PyOsmium 2.9.0 2016-09-15 Adjustments to actual libosmium
    Magic Earth * 7.1.16.37 2016-09-16 Eliminates GPS problems, improved audio via Bluetooth, further changes and improvements
    Komoot Android * var 2016-09-17 No info
    Naviki Android * 3.48 2016-09-19 Layout revised
    OsmAnd for Android * 2.4 2016-09-19 Improved user interface, refined POI search
    OsmAnd+ for Android * var 2016-09-19 Improved user interface, refined POI search

    Provided by the OSM Software Watchlist.

    (*) unfree software. See freesoftware.

    Jochen Topf announces the new features of libosmium 2.9.0 and Osmium-Tool 1.4.0 in his blog. The latter now also allows changing tags with sed and generates human-readable diffs.

    Did you know …

    Other “geo” things

    • Both Forbes and the TechCrunch Network report on the technical assistance that, among others, OpenStreetMap Italy has provided after the earthquake.
    • Paul Groves explains how 3D mapping could be used to enhance the accuracy of GNSS in cities. As a data source for the needed 3D models, he suggests using OpenStreetMap data.
    • Peter Richardson of Mapzen describes in his blog post how he generates 3D models with Heightmapper from Mapzen’s high-quality open-source terrain data.

    Upcoming Events

    Where What Date Country
    Brussels HOT Summit 2016 22/09/2016 belgium
    Brussels HOT Summit Missing Maps Mapathon 22/09/2016 belgium
    Brussels State of the Map 2016 23/09/2016-26/09/2016 belgium
    Grenoble Rencontre groupe local 26/09/2016 france
    Nottingham Nottingham 27/09/2016 united kingdom
    Kyoto 京都オープンデータソン2016 vol.2(吉田神社) with 第1回諸国・浪漫マッピングパーティ 01/10/2016 japan
    Trento Mapping Party Pieve Tesino 01/10/2016-02/10/2016 italy
    Genova Mappalonga Mapathon 01/10/2016 italy
    Metro Manila State of the Map Asia 2016 01/10/2016-02/10/2016 philippines
    Taipei Taipei Meetup, Mozilla Community Space 03/10/2016 taiwan
    Dresden Elbe-Labe-Meeting 08.10.2016-09.10.2016 germany
    Lyon Rencontre mensuelle mappeurs 11/10/2016 france
    Berlin Hack Weekend 15.10.2016-16.10.2016 germany
    Tokyo 東京!街歩き!マッピングパーティ:第1回 哲学堂公園 15/10/2016 japan

    Note: If you would like to see your event here, please put it into the calendar. Only data which is entered there will appear in weeklyOSM. Please check your event in our public calendar preview and correct it where appropriate.

    This weekly was produced by Hakuch, Nakaner, Peda, Rogehm, Softgrow, YoViajo, derFred, jinalfoflia, kreuzschnabel, mgehling, wambacher.

    by weeklyteam at September 22, 2016 05:05 PM

    Shyamal

    Crowdsourced Indian geology in the 1800s

    Crowd might be a bit of a stretch for fewer than a hundred contributors, but George Bellas Greenough (1769–1839), one of the founders of the Geological Society of London, produced, posthumously, the first geological map of India, which was published in 1855. Greenough was the first president of the Geological Society of London and was reportedly best known for his ability to compile and synthesize the works of others, and his annual address to the Society was apparently much appreciated. He was, however, entirely against the idea that fossils could be used to differentiate strata, and in that he failed to admire William "Strata" Smith, who produced the first geological map of England. An obituarist noted that Greenough was an outspoken critic of theoretical frameworks and a "drag" on the progress of the science of geology!

    Not much has been written about the history of the making of the Greenough map of Indian geology - it was begun sometime in 1853, was finally published in 1855, and consisted of four sheets measuring 7 by 5¾ feet. A small number of copies were made, which are apparently collectors' items, but hardly any are available online for anyone wishing to study the contents. The University of Minnesota has a set of scanned copies of three-fourths of the map, but if you want to read it you need to download three large files (each of about 300 MB!). I decided to stitch together these images and enhance them a bit, and since the image is legally in the public domain (i.e. copyright expired), I have placed it on Wikimedia Commons. There really is a research need for examining the motivations for making this map and how Greenough went about it. He apparently had officers of the East India Company providing him information and he seems to have sent draft maps on which they commented. There is a very interesting compilation of the correspondence that went into the making of this map. It has numerous errors, both in the geology and in the positions and labelling, but it is definitely something to admire for its period.
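
    For anyone attempting something similar, a minimal Python sketch of the stitch-and-enhance step might look like the following (illustrative only: the filenames are hypothetical, the real sheets need manual alignment, and the actual enhancement was certainly more careful than a single contrast tweak):

        from PIL import Image, ImageEnhance

        # hypothetical filenames for the scanned sheets downloaded from the library
        sheets = [Image.open(f"greenough_sheet_{i}.jpg") for i in (1, 2, 3)]

        # paste the sheets side by side on one canvas
        width = sum(s.width for s in sheets)
        height = max(s.height for s in sheets)
        canvas = Image.new("RGB", (width, height), "white")
        x = 0
        for s in sheets:
            canvas.paste(s, (x, 0))
            x += s.width

        # a mild contrast boost as a simple form of "enhancement"
        enhanced = ImageEnhance.Contrast(canvas).enhance(1.3)
        enhanced.save("greenough_1855_stitched.jpg", quality=90)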

    One has to lament that nobody has made a nice geological map in recent times showing interesting regional formations, fossil localities and so on. So much for our human-centricity and recentism.

    Here is a small overview of the 1855 map. You can find and download the whole image on Wikimedia Commons.

    You can zoom into this image and enjoy the details by using this viewer that uses the Flash plugin or this one that is Flash-free.

    PLEA: If anyone can find a digital version of the northeast sector at a resolution that is readable, please please do let me know. 

    by Shyamal L. (noreply@blogger.com) at September 22, 2016 11:53 AM

    September 21, 2016

    Wikimedia Foundation

    Editing Wikipedia for a decade: Mohamed Ouda

    Photo by Mohamed Ouda, CC BY-SA 4.0.

    Last month, Mohamed Ouda celebrated his tenth year on Wikipedia. It has been an immersive experience, during which he has made every effort to expand the website, help others become as accomplished, and promote the culture of free knowledge.

    His story with the movement began when he was searching for information on the Arabic Wikipedia in August 2006. At the time, the emerging encyclopedia had around 10,000 mostly very short articles, and the article he was reading had a note stating that “This is a stub article. Please help expand it.”

    It was a message that many readers might ignore, but Ouda was different. “I decided to contribute to Wikipedia,” the first entry on Ouda’s personal user page reads, “because the Arabic Wikipedia was very small compared to other major languages on Wikipedia.”

    In the last decade, Mohamed, a native of Cairo, has edited Wikipedia over 21,000 times, created nearly 900 new articles, and expanded many more. Some of the articles he has developed, like Disney’s 1992 film Aladdin, have met the community’s quality standards—qualifying them to be among the Arabic Wikipedia’s “good” articles (English).

In our interview with Ouda, he reaffirmed that enriching digital Arabic content is still his main motivation for contributing to Wikipedia. “The shortage of digital content in Arabic is still what keeps me editing every day,” he says. His efforts and those of other Arabic-language Wikipedians have made a noticeable dent in this shortage. Today, the Arabic Wikipedia has just under 450,000 articles—less than a tenth of the English Wikipedia’s five million, but 200,000 articles stronger than three years ago. Still, there is much room for improvement; Arabic is the fifth most-used language in the world.

Ouda’s persistent efforts to enhance his language’s presence on the internet started with topics close to him. His first few edits included creating an article for the academy from which he obtained his university degree and expanding the article about the neighborhood where he lived. He quickly moved on to writing important missing articles about parts of the body, like the tongue and hair, and psychological disorders like narcissism.

    Within a few months, Ouda was nominated by the community to be an administrator on the Arabic Wikipedia; few opposed the move.

When the Wikipedia Education Program launched in Egypt, Ouda joined it to apply his Wikipedia expertise to helping the program’s students. “The program’s idea was exciting to me,” says Ouda. “Assigning students to edit Wikipedia articles in the field of their study is wonderful, as it increases article quality. I wanted to do anything I could to support that initiative.” Ouda served as a campus ambassador in the first edition of the program at Cairo University in 2012 and has provided needed training to his fellow campus volunteers and university students.

For Ouda, a positive community is key to supporting Wikipedia. “The first time I personally met up with Wikipedians was at Wikimania 2008 in Alexandria,” Ouda recalls. “That was a turning point in my Wikipedian experience. Meeting people offline and online, learning from others’ experiences and even hanging out together is always encouraging in an online community. I’ve seen many Wikipedians quit contributing because of negative communication experiences, namely arguments with trolls and online harassment.”

Before co-founding the Egypt Wikimedians user group, Ouda had been striving to establish a group affiliated with Wikimedia to help “connect people and facilitate holding activities in Egypt.” As an active member and co-founder of the user group, he helps define the needs for the activities and projects held in Egypt, making plans and securing funding. Ouda and the user group have helped organize several writing and photography contests and backed the education program’s expansion. They are currently working on plans to host the third WikiArabia conference early next year.

    Samir Elsharbaty, Digital Content Intern
    Wikimedia Foundation

    “The decade” is a new blog series that profiles Wikipedians who have spent ten years or more editing Wikipedia. If you know of a long-time editor who we should talk to, send us an email at blogteam[at]wikimedia.org.

    by Samir Elsharbaty at September 21, 2016 10:48 PM

    Luis Villa

    Copyleft, attribution, and data: other considerations

    Public licenses for databases don’t work well. Before going into solutions to that problem, though, I wanted to talk briefly about some things that are important to consider when thinking about solutions: real-world examples of the problems; a common, but bad, solution; and a discussion of the motivations behind public licenses.

“Bullfrog map unavailable”, by Peter Desmet, under CC BY 3.0 unported

    Real-world concerns, not just theoretical

    When looking at solutions, it is important to understand that the practical concerns I blogged about aren’t just theoretical — they matter in practice too. For example, Peter Desmet has done a great job showing how overreaching licenses make bullfrog maps (and other data combinations) illegal. Alex Barth of OpenStreetMap has also discussed how ODbL creates problems for OSM users (though he got some Wikipedia-related facts wrong). And I’ve spoken to very well-intentioned organizations (including thoughtful, impactful non-profits) scared off from OSM for similar reasons.

On the flip side, because these rules are based on such flimsy legal grounds, sophisticated corporate legal departments often feel comfortable circumventing the requirements by exploiting loopholes. (Needless to say, they don’t blog about the problems with the licenses – they just go ahead and use the loopholes.) So overreaching attempts to create new rights are, in many ways, the worst of both worlds: they hurt well-intentioned cooperation, and don’t dissuade parties with a significant interest in exploiting the commons.

    What not to do: create new “rights”

When thinking about solutions, it is unfortunately also important to say what isn’t a good idea: creating new rights, or overriding limitations on old ones. The Free Software Foundation, to their great credit, has consistently said that if weakening copyright also weakens the GPL, they’ll take that tradeoff; and that, vice versa, the GPL should not ask for rights that go beyond copyright law. The most recent copyleft licenses from Creative Commons, Mozilla, and the FSF all make this explicit: limitations on copyright, like fair use, are not trumped by our licenses.

    Unfortunately, many people have a good-faith desire to see copyleft-like results in other domains. As a result, they’ve gone the wrong way on this point. ODbL is probably the most blatant example of this: even at the time, Science Commons correctly pointed out that ODbL’s attempt to create database rights by contract outside of the EU was a bad idea. Unfortunately, well-intentioned people (including me!) pushed it through anyway. Similarly, open hardware proponents have tried to stretch copyright to cover functional works, with predictably messy results.

    This is not just practically wrong, for the reasons I’ve explained in earlier posts. It is also ethically wrong for those of us who want to see more data sharing, because any “rights” we create by fiat are going to end up being used primarily to stop sharing, not encourage it.

    Remembering why we do share-alike and attribution

Consider this section a brief sketch for a future post – if I forgot something big, please let me know, but please don’t roast me in comments for being brief or reductive about your favorite motivation.

It is important when writing about public licenses to remember why the idea of placing restrictions on re-use is so intuitively appealing outside of software. If we don’t understand why people want to do less-than-public domain, it’s hard to come up with solutions that actually work. Motivations tend to be some combination (varying from person to person and community to community) of:

    • Recognition: Many people want to at least be recognized for their work, even when they ask for nothing else. (When Creative Commons assessed usage after their 1.0 licenses, [97-98% of people chose attribution](https://creativecommons.org/2004/05/25/announcingandexplainingournew20licenses/).) This sentiment underlies many otherwise “permissive” licenses, as well as academic norms around plagiarism and attribution.
    • Reducing free riding: Lots of people are afraid that commons can be destroyed by people who use the resource without giving back. Historically, this “tragedy of the commons” was about [rivalrous](https://en.wikipedia.org/wiki/Rivalry_(economics)) goods (like fisheries), but the same concern is often raised in the context of collaborative communities, whose labor can be rivalrous even when their goods are non-rivalrous. Some people like share-alike requirements because, pragmatically, they feel such requirements are one way to prevent (or at least reduce) this risk by encouraging people to either participate fully or not participate at all. (If you’re interested in this point, I’ve [written about it before](http://lu.is/blog/2014/12/02/free-riding-and-copyleft-in-cultural-commons-like-flickr/).)
    • “Fairness”: Many people like share-alike out of a deep moral sense that if you take, you should also give back. This often looks the same as the previous point, but with the key distinction that at least some people focused on fairness care more about process and less about outcomes: a smaller, less productive community with more sharing may, for them, be better than a larger, more productive community where not everyone shares perfectly.
    • Access to allow self-help: Another variation on the previous two points is a use of copyleft that focuses less on “is the author helping me by cooperating” and more on “did the author give me materials I can then use to help myself”. In this view, increased access to raw material (like source code, or data) can be good even if the authors are non-cooperative. (To those familiar with the Linux kernel discussions, this is essentially “I got a lousy driver, and the authors hate me, but at least I got *a* driver”.)
    • Ethical: Many people simply think data/source should never be proprietary, and so will use any means possible, like copyleft, to increase the amount of non-proprietary code in the world.

    All of these motivations can be more or less valid at different points in time, in ways that (again) deserve a different post. (For example, automatic attribution may not have the same impact as “human” attribution, which may not be a surprise given the evidence on crowding out of intrinsic motivations.)

    Finally, next (and final?) post: what solutions we’ve got.

    by Luis Villa at September 21, 2016 06:22 PM

    Wiki Education Foundation

    Students can develop statistical literacy through Wikipedia

Outreach Manager Samantha Erickson

    Statistics seem to drive the news lately. It’s an election year, so people are obsessing over polls. Public policy discussions are being driven by research and analysis. Now more than ever, people are thinking and talking about statistics and what they mean.

Wikipedia attracts millions of online queries every day. Search for something on the web, and you’re bound to end up on Wikipedia. So it’s crucial that the information the public finds there is reliable, accurate, and comprehensible.

    Unfortunately, statistical information isn’t always presented in the clearest way on Wikipedia. When it comes to understanding statistics, articles often have a lot of room for improvement. Some articles are great: The Monty Hall Problem, for example.

But what if articles on important statistical topics, such as Deviance or Causal inference, were just as easy for online passersby to understand?

This summer, I’ve been attending conferences such as the Joint Statistical Meetings (JSM) to help identify and remedy these information gaps. We’re asking instructors to assign their students to think deeply about their fields, and then make a change to the way others access that information. In other words: students can help simplify complex language and bring a wider understanding of statistics to the public.

A few instructors raised concerns about whether students could create that kind of high-quality content. But when I mentioned the help of our content experts, our online trainings, and our scaffolded approach to writing for Wikipedia (including our freshly updated course timelines), they saw the possibilities.

    Fittingly, Milo Schield of Augsburg College wants his students to work together to update the Wikipedia article on Statistical literacy. While the page does have some good information, and touches upon the complications in using statistics in advertising, Dr. Schield saw ways for his students to expand the article. With more sections, good citations, and a clear vision, this page could become a great source for individuals who want to learn more about statistics.

Our work at the JSM is just one step in our larger Year of Science – a year-long initiative to improve the way students think about and access science knowledge. By communicating what they know, student editors think about how to share their work with the world. But they’ll also think deeply about how to assess the information they find online. Along the way, millions of readers get access to the world’s most-read open knowledge resource: Wikipedia.

Are you an American or Canadian instructor working in statistics in higher ed? Do you want to work with your students to help improve Wikipedia? Check out more information here, or contact us at contact@wikiedu.org. I hope to hear from you soon!


Photo: Miastootwarte by Sebastian Sikora, CC BY 2.0 via Flickr.

    by Samantha Erickson at September 21, 2016 04:00 PM

    Wiki Loves Monuments

    Two-thirds update

It has been 20 days since the launch of Wiki Loves Monuments 2016, and it is time for an update on what we have achieved together. In the majority of the more than 40 participating countries, photo submissions are accepted until September 30.

    Participation

So far, more than 120,000 photos have been uploaded by 5,741 participants as part of the contest in 41 participating countries. The highest number of uploads comes from Germany, with more than 27,000 photos (more than 50% of them by User:Tilman2007), while India tops the chart with 1,196 unique participants!

Compared to the same time last year, the total number of photo uploads has increased by more than 10%. In the same period, the number of Wikimedia accounts created right before uploading a photo to the contest (a strong signal for the number of accounts created as part of the contest) has increased by almost 80%!

    Usage

    While it is too early to showcase the usage of photos uploaded as part of Wiki Loves Monuments 2016, it is nice to see that some of the participating photos have already been added to more than 3400 pages on Wikimedia projects.

    What’s next?

We know that historically the majority of photos have been uploaded in the last week of the contest. With fewer than 10 days left in most of the participating countries, now is our last chance as WLM local organizers and enthusiasts to get the word out about the contest, organize local tours and photo upload events, and more. Remember that all it takes to participate is to find a nearby monument, photograph it, and submit the photo to the competition. You can win a prize and help Wikipedia.

    For more statistics about the contest, please check wikiloves or wlm-stat.


Photo: Saint Samaan The Tanner Monastery in Cairo, Egypt. By Hoba offendum, CC BY-SA 4.0.

    by Leila at September 21, 2016 01:57 AM

    September 20, 2016

    Wikimedia Foundation

    We need your input: building a shared vision for leadership development in the Wikimedia movement

Photo by María Cruz and Jason Krüger, CC BY-SA 4.0.

Sometimes the simplest of actions can create unexpected change in the world. That is what happened when Vassia Atanassova decided to write 100 Wikipedia articles in 100 days as a challenge to herself. She called it #100wikidays, shared her challenge on Facebook, and quickly inspired dozens of other Wikipedia editors to take on the same challenge.

    In another corner of the world and many years earlier, Liam Wyatt started sending emails to museums to propose a new form of partnership: a “Wikipedian in residence.” The British Museum said yes, which led to the first GLAM-Wiki program of this sort. Five years later, in 2015, he found himself giving a presentation at the Soumaya museum in Mexico City to inspire the local community to start a residency of their own.

    There are now 110 Wikipedians in residence all over the world, and 7,500 articles have been created through the #100WikiDays challenge. Vassia and Liam are only two of the many Wikimedians who have boldly stepped up to help other Wikimedians succeed in our shared educational mission. This support pattern is consistent across the movement, and we, the Wikimedia Foundation, would now like to know how to best support it to help Wikimedia communities thrive.

    What do you mean leadership?

    In the Wikimedia world, almost every contributor has to “be bold” and step up to the challenges of guiding the projects and activities to success. In turn, experienced individuals become models for others and help mentor newcomers to participate in our projects and to continue to grow our community. This form of mentoring, or leadership, or collaborative guiding of the communities is an absolutely crucial part of meeting our vision: to freely share in the sum of all knowledge.

    As we pursue this world-changing mission, leadership is not about something beyond us, it is not about a single person leading, but a great many people: it is a shared practice that lies in the core of our culture. Wikimedia is a movement made of many volunteers leading through everyday acts to liberate knowledge, and help others to do the same.

    What can we do to build leadership?

The Wikimedia Foundation’s Community Engagement department, along with movement affiliates, supports and collaborates with these leaders, mentors, and guides. However, there are many people throughout the movement who don’t get direct support for their leadership development activities. For every one of the community guides that movement organizations identify, there are dozens more within our movement who could lead, if given access to the right skills or encouragement. As a community, we should seek greater engagement of community leaders, and to do so we need a shared vision.

    That is why we are launching the Leadership Development Dialogue. We need your help in refining not only what the Wikimedia Foundation provides in terms of direct training activities for community leaders, but also to refine how we describe those leaders.

Over the last year, we have engaged focus groups to explore shared understandings of the kinds of “leadership” traits we need in new leaders within our communities—and we found that we want very similar things: empathetic community organizers who know how to inspire our communities and make them more sustainable without alienating others. However, the word “leader” does not translate well between languages and cultures, as it can mean anything from inspiring and engaging new participants to dictatorial control over projects or activities. We certainly don’t want that confusion!

    We need your help!

From now through October 16, 2016, we invite you to comment on two items. First, we’d like your thoughts on how we can design for appropriate inclusivity, achieve the goals of peer mentoring and leadership development, and develop additional support infrastructure to reinforce important skills in the Wikimedia communities; and second, how we should describe leaders in our movement, down to the word(s) we use to identify them and the skills that make them who they are.

    We invite you to join our conversation and help us refine what it means to develop leadership for program and community development.

    Alex Stinson, GLAM-Wiki Strategist
    Jaime Anstee, Senior Strategist, Manager
    María Cruz, Communications and Outreach Coordinator
    Wikimedia Foundation

    by Alex Stinson, Jaime Anstee and María Cruz at September 20, 2016 05:09 PM

    Addshore

    The RevisionSlider

The RevisionSlider is an extension for MediaWiki that has just been deployed on all Wikipedias and other Wikimedia websites as a beta feature. The extension was developed by Wikimedia Germany as part of their focus on the technical wishes of the German-speaking Wikimedia community. This post will look at the RevisionSlider’s design, development and use so far.

    What is the RevisionSlider

Once enabled, the slider appears on the diff comparison page of MediaWiki, where it aims to let users more easily find the revision of a page that introduced or removed some text, as well as making navigation of the page’s history easier. Each revision is represented by a vertical bar, extending upward from the centre of the slider for revisions that added content and downward for those that removed content. Two coloured pointers indicate the revisions currently being compared; the colour coding matches the colour of the revision changes in the diff view. Each pointer can be moved by dragging it to a new revision bar or by clicking on a bar, at which point the diff is reloaded using Ajax for the user to review. For pages with many revisions, arrows are enabled at the ends of the slider to move back and forward through revisions. Extra information about the revisions represented by the bars is shown in a tooltip on hover.

    Deployment & Usage

The RevisionSlider was deployed in stages: first to test sites in mid-July 2016, then to the German Wikipedia and a few other sites that had been proactive in requesting the feature in late July 2016, and finally to all Wikimedia sites on 6 September 2016. In the 5 days following the deployment to all sites, the number of users of the feature increased from 1739 to 3721 (more than double), according to the Grafana dashboard https://grafana.wikimedia.org/dashboard/db/mediawiki-revisionslider. This means the beta feature now has more users than the “Flow on user talk page” feature and will soon overtake the number of users with ORES enabled unless we see a sudden slowdown: https://grafana.wikimedia.org/dashboard/db/betafeatures.

    The wish

The wish that resulted in the creation of the RevisionSlider was wish #15 from the 2015 German Community Technical Wishlist; the Phabricator task can be found at https://phabricator.wikimedia.org/T139619. The wish roughly translates as: when viewing a diff, a section of the version history, especially the edit comments, should be shown. Lots of discussion followed to establish the actual issue the community was having with the diff page, and the consensus was that it was generally very hard to move from one diff to another. The standard process within MediaWiki requires the user to start from the history page to select a diff. The diff then allows moving forward or backward revision by revision, but big jumps are not possible without first navigating back to the history page.

The first test version of the slider was inspired by a user script called RevisionJumper. This script provided a drop-down menu in the diff view offering various options to jump to a version of the page considerably before or after the currently shown version. This can be seen in the German example below.

DerHexer (https://commons.wikimedia.org/wiki/File:Gadget-revisionjumper11_de.png), „Gadget-revisionjumper11 de“, https://creativecommons.org/licenses/by-sa/3.0/legalcode

The WMF Community Tech team worked on a prototype during autumn 2015, which was then picked up by WMDE at the Wikimedia Jerusalem hackathon in 2016 and pushed to fruition.

DannyH (WMF) (https://commons.wikimedia.org/wiki/File:Revslider_screenshot.jpg), „Revslider screenshot“, https://creativecommons.org/licenses/by-sa/4.0/legalcode


    by addshore at September 20, 2016 04:35 PM

    Wikimedia Foundation

    Interaction principles for online collaboration

Painting by Harry Wilson Watrous, public domain/CC0.

    Over the past 15 years, Wikimedians have collaboratively built some of the most amazing projects on the Internet and for free knowledge. Editors on Wikipedia, contributors to Commons, and administrators on other sites are united in their goals of collecting human knowledge and making it accessible and reusable for free and for everybody in the world. They work hard towards this goal, contributing an impressive amount of time and effort.

    Wikimedians not only want to collect knowledge, they also want to get that knowledge right. They care a lot about the factual quality of the information on Wikimedia projects—about complying with copyright (e.g. for images that illustrate Wikipedia), about freedom of expression, about neutrality and the strength of underlying sources. In order to collaboratively build and improve content, Wikimedians discuss their views on talk pages, on email lists, and in 1-to-1 conversations. Naturally, when editors, administrators, and other contributors disagree about certain issues, they will argue. Often times, they have passionate debates, fiercely defending their point of view. And at times their disagreements escalate in ways that will lead Wikipedians to use harsh words or be abusive to each other. In some cases, however, bad behavior seemingly comes out of nowhere, for instance when users personally attack or troll others or engage in acts of vandalism. These are issues on the Internet that have been written about in various places and have been researched in our community. The existence of these problems on Wikipedia is something that we are not proud of.

    However, this is not a phenomenon that only exists in Wikimedia discussions: many websites that facilitate user contributions or comments see harsh conversations and personal attacks among users. As most of our communication moves online, including important democratic discourse, speech that threatens sincere conversations and debates increasingly becomes a problem. That’s why we are pleased to see that there are different initiatives that seek to address the issue of harassment online. We try to learn from those initiatives and hope they will succeed. At the same time, we know that we cannot rely on the work of others to make sure that the Wikimedia projects are safe for everyone to access and contribute to free knowledge. Rather, we are determined to create a friendly space ourselves where people can gather to collect encyclopedic information and educational content.

    There are several connected reasons for us to do this. An unfriendly or even toxic environment can be an impenetrable barrier for access to knowledge. We cannot expect people to join our movement and contribute to our mission of collecting free knowledge if they don’t feel comfortable on our websites. Yet, in order to build an exhaustive encyclopedia that covers diverse views and perspectives, we need as many people as possible to contribute to Wikipedia and our other Projects. In today’s world, people have many options for spending their free time and negative experiences would seriously threaten the success of our movement’s work. It has also been argued that there is an ethical obligation for platforms to protect their users from abusive behavior through community management. Finally, we also believe that productivity is diminished by a harsh tone and especially by polemic and aggression.

While it is clear to us that many of these reasons deserve further research, we also recognize that one challenge to our intention is finding the right balance between promoting free speech and curbing harassment. Wikimedia’s values build on democratic decision-making and collaboration. So we started the process of developing principles for interaction on the Projects by asking the community for input. At this year’s Wikimania, the annual gathering of the global Wikimedia movement, together with roughly fifty participants, we discussed Wikimedians’ experiences with existing codes of conduct and policies on Wikipedia, Commons, etc. We discussed participants’ expectations for communication on- and off-Wiki and collected recommendations for behavior in arguments and disputes over facts and compliance with guidelines on the Wikimedia sites.

    Five patterns have emerged:

    1. Offer constructive criticism. Offer options.
    2. Treat people as you would like to be treated. No personal attacks. Be empathetic.
    3. Re-read your contributions. Be patient. Think: this is how x makes me feel.
    4. If you see something bad, say something.
    5. Connect on a human level. Apologize. Get off-Wiki for a second. Rewind.

    We believe these principles for interaction can help us create a friendly space for all contributors and newcomers alike. The Wikimedia Foundation is taking this issue very seriously and working on developing better training for volunteers to discourage abuse and better resolve disputes; you can participate in that project on Meta. We invite you to discuss these principles with your community and to let us know what you think about them in the comments below. This is only the start of a larger conversation that we need to have in order to ensure the continued success of the Wikimedia projects and access to knowledge for everyone.

    Patrick Earley, Senior Community Advocate (International)
    Jan Gerlach, Public Policy Manager
    Wikimedia Foundation

    by Patrick Earley and Jan Gerlach at September 20, 2016 05:14 AM

    September 19, 2016

    Wikimedia Tech Blog

    How Wikimedia helped mobile web readers save on data

A reader uses Wikipedia mobile for the first time to get an overview on a resistor. Photo by Abigail Ripstra, CC BY-SA 4.0.

    As the saying goes, a picture is worth a thousand words. Yet images on mobile devices can translate to more data used. In many parts of the world, high mobile data costs present a significant barrier to accessing knowledge on the Wikimedia sites.

    To address this, the Wikimedia Reading web team has made the article download process on Wikimedia mobile sites more efficient by preventing unnecessary image downloads. We’ve already seen the positive impact of this change on the amount of data used to access Wikimedia mobile content around the world.

    (If you’re a developer who is curious about how the change was made, we have a complete rundown in the last section of this post.)

    Why we made the change

    As of this year, over half of Wikimedia’s traffic comes from mobile devices. Readers access Wikipedia through mobile now more than ever, and we have to continue to understand and build for our readers’ changing needs.

    From the Foundation’s work with the New Readers initiative, we know that in places like Nigeria and India, high data costs are considered one of the largest barriers to accessing and reading Wikipedia. Feature phones and lower-grade Android smartphones are the primary devices for connecting to the internet, and in Nigeria, internet access has been prohibitively expensive. Data is a precious commodity in many countries, due to high bandwidth costs, bandwidth caps, and inconsistent internet connections.

For context, the average web page consumes about 2.3MB of a mobile data plan. A web page is composed of several elements, including the text you read, the CSS code that styles its interface, the JavaScript code that makes the page more interactive, and the images that illustrate it. Browsers do a good job of downloading these elements efficiently, but images, followed by text, remain the biggest consumers of data.

    To illustrate this impact, as of June 2016, the article about Japan on the Japanese Wikipedia contained 1.4MB of images, 195KB of text, 157KB of JavaScript and 8KB of CSS. Without loading any of the images for the article, that would translate to about 0.03USD in mobile data costs (on a post-paid data plan in Japan) rather than 0.15USD with all the images loaded for the article.
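
A rough back-of-the-envelope check of those cost figures, as a minimal Python sketch: the per-megabyte price here is an assumption backed out from the numbers quoted above, not an actual tariff.

images_kb, text_kb, js_kb, css_kb = 1400, 195, 157, 8
usd_per_mb = 0.085  # assumed post-paid rate in Japan, implied by the 0.15 USD figure above

full_mb = (images_kb + text_kb + js_kb + css_kb) / 1024   # about 1.72 MB with images
no_images_mb = (text_kb + js_kb + css_kb) / 1024          # about 0.35 MB without images
print(round(full_mb * usd_per_mb, 2))       # ~0.15 USD per page load with images
print(round(no_images_mb * usd_per_mb, 2))  # ~0.03 USD per page load without images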

    Similar stories can be told for people in Brazil reading the Portuguese article about Brasil or people in the United States reading the Barack Obama article in English.

    We made this change as our research has indicated that many of our mobile users, despite downloading an entire article, do not read every single word. On the mobile site, many people presumably use Wikipedia as a quick fact lookup. Knowing this, we were concerned about the amount of images people downloaded unnecessarily, and how those downloaded images might then impact their ability to consume knowledge.

    Photos are a ubiquitous element of Wikipedia’s most popular and highest quality articles, and this change now means that your phone will only load images as you scroll down a page, rather than on opening a page.

    How much more efficient?

    We wanted to see how this change impacted readers, so we looked at the traffic to our image servers across three language wikis for a week-long period before and after the change was made. We restricted our analysis to images that had been requested by page views—to avoid requests from external websites that we cannot control—by looking for a HTTP referrer header (a piece of information sent by web browsers to describe the context in which the request was made). We analysed the English Wikipedia because it has the highest volume by traffic, as well as the Japanese and Indonesian Wikipedias because these languages are mostly spoken inside a single geographical area—as we were also interested in the impact on speed, we wanted to rule out factors such as distance from the closest data center that would affect our results.
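
For the curious, here is a minimal sketch of the kind of referrer-based filtering described above. It is not the actual analysis pipeline; the log file and field names (image_requests.tsv, referer, response_size) are hypothetical stand-ins.

import csv

total_bytes = 0
with open("image_requests.tsv") as f:  # hypothetical log extract, one image request per row
    for row in csv.DictReader(f, delimiter="\t"):
        # Keep only image requests that carry a referrer header,
        # i.e. those triggered by an actual page view.
        if row.get("referer") and row["referer"] != "-":
            total_bytes += int(row["response_size"])

print(total_bytes / 1e9, "GB served to page views")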

    Our analysis showed that on the mobile site of Indonesian Wikipedia, our data centers served our visitors 187 gigabytes less, a 32% decrease compared to a week before the change. For the same period on English Wikipedia, the decrease in data usage was even greater: we shipped 4.9 terabytes less than normal (that’s enough data to fill 1042 DVDs), resulting in a 47% decrease. On the Japanese Wikipedia, the results were similar—we saw a 51% decrease in data usage. Projecting the savings across all of Wikipedia, we hope to annually save our users 450 terabytes of mobile data!


    This reduction in data usage means web browsers will load Wikipedia pages in less time, because there’s less to load. Certain users on slower connections may even find their web pages display quicker, as there are now fewer requests battling for bandwidth. We’re now looking into whether these changes are significant, which can be challenging due to the limitations of older browsers, the scale of Wikipedia’s traffic and the limited information we collect about our users in keeping with our strong commitment to user privacy.

To further demonstrate the impact of this change, let’s go back to the example of the Japan article on the Japanese Wikipedia, which weighed 1.76MB, and consider a 500MB data plan. Assuming the user accessed the internet for no other purpose, that article could have been consulted 9 times each day for a month before the reader incurred additional charges or lost internet connectivity. After our changes, on that same data plan that particular article weighs only 530KB and could be viewed up to 30 times a day!
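
As a quick sketch of the arithmetic behind those view counts, using the approximate sizes quoted above (so the results are slightly rounder in the text than here):

plan_mb = 500
before_mb = 1.76        # article weight with all images
after_mb = 530 / 1024   # about 0.52 MB once images load lazily

print(round(plan_mb / before_mb / 30, 1))  # ~9.5 views per day over a 30-day month
print(round(plan_mb / after_mb / 30, 1))   # ~32 views per day, which the post rounds to about 30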

    Next steps

    The positive results that we are seeing are just the start. We are currently monitoring our page view traffic to see if this change leads to readers spending more time on our websites. The Wikimedia Foundation is also working on reducing the amount of JavaScript and CSS we serve, as well as thinking about ideas around speeding up their delivery. We are exploring how using new open web technologies such as Service Workers can help get content to our users more quickly. We’re also thinking about offline use cases for those users who, at times, may have no connection at all. Outside mobile, we hope to explore how we might apply similar enhancements for our desktop readers.

    Let us know how these changes have impacted you using this wiki page. Do you notice the difference? How has this changed your mobile reading experience? Have you noticed any bugs? What else could we be doing? We’d love to hear your thoughts.

    How we did it (technical)

    We also wanted to outline exactly how we made this change for technical audiences who might find the information useful. This section details how we prevented images from downloading unnecessarily, and is aimed at a developer audience.

    Any image inside a block of HTML will be loaded unconditionally, so the only way to avoid this was to remove our image tags from the HTML output.

    Rather than outputting an image into our HTML, we wrapped the image inside a <noscript> tag and appended a placeholder element with all the information needed to render the image via JavaScript. Our users who didn’t have JavaScript enabled would see the image inside the <noscript> tag and not benefit from the optimisation. For those with JavaScript, we had enough information to load the image when necessary.

    <noscript>

<img alt="A young boy (preteen), a younger girl (toddler), a woman (about age thirty) and a man (in his mid-fifties) sit on a lawn wearing contemporary c.-1970 attire. The adults wear sunglasses and the boy wears sandals." src="//upload.wikimedia.org/wikipedia/en/thumb/3/33/Ann_Dunham_with_father_and_children.jpg/300px-Ann_Dunham_with_father_and_children.jpg" width="300" height="199" class="thumbimage" data-file-width="320" data-file-height="212">

    </noscript>

<span class="lazy-image-placeholder" style="width: 300px;height: 199px;" data-src="//upload.wikimedia.org/wikipedia/en/thumb/3/33/Ann_Dunham_with_father_and_children.jpg/300px-Ann_Dunham_with_father_and_children.jpg" data-alt="A young boy (preteen), a younger girl (toddler), a woman (about age thirty) and a man (in his mid-fifties) sit on a lawn wearing contemporary c.-1970 attire. The adults wear sunglasses and the boy wears sandals." data-width="300" data-height="199" data-class="thumbimage"></span>

For those with JavaScript enabled, we listened to the window scroll event and, for any unloaded images (those with temporary placeholders), loaded them when they moved close to the viewport. We wanted the experience of loading an image to be seamless, so we used a generous offset to load images before they might be needed. We also checked whether the placeholder was visible, given that it might be in a collapsed section; in that case images are shown when a reader expands the section.

Many websites use a lower-resolution image as a placeholder. We decided against this because we felt it would be detrimental to the goal of avoiding sending unnecessary bytes to our users. Instead we relied on a CSS animation to ease the transition from no image to image.

// Inside the scroll handler: load the image once its placeholder is within
// 1.5 viewport-heights of the screen and is actually visible.
var offset = $( window ).height() * 1.5;
if ( mw.viewport.isElementCloseToViewport( placeholder, offset ) && $placeholder.is( ':visible' ) ) {
    self.loadImage( $placeholder );
}

There was another set of users we had to consider—those with older browsers. To provide a better experience on older browsers, we avoid running the main lazy-loading JavaScript there, even if JavaScript is enabled; instead, for these browsers we injected a small amount of JavaScript that replaces the placeholder with the original image tag, copying across all the necessary attributes. We were careful to use methods that enjoy broad browser support; for example, rather than using getElementsByClassName we used getElementsByTagName, which is supported by virtually all browsers.

// Fallback for older browsers: find each <noscript> block and, if it is
// followed by a lazy-image placeholder, swap the placeholder for a real
// image tag built from the placeholder's data-* attributes.
var ns, i, p, img;
ns = document.getElementsByTagName('noscript');
for (i = 0; i < ns.length; i++) {
    p = ns[i].nextSibling;
    if (p && p.className && p.className.indexOf('lazy-image-placeholder') > -1) {
        img = document.createElement('img');
        img.setAttribute('src', p.getAttribute('data-src'));
        img.setAttribute('width', p.getAttribute('data-width'));
        img.setAttribute('height', p.getAttribute('data-height'));
        img.setAttribute('alt', p.getAttribute('data-alt'));
        p.parentNode.replaceChild(img, p);
    }
}

The biggest challenge we experienced was ensuring that the lazy image placeholders we were adding would not disrupt the presentation of the content. For example, images might be inline or block elements. We spent the majority of our time tweaking CSS rules to keep disruption as minimal as possible. If you happen to find any bugs with our implementation, please raise them!

    Jon Robson, Senior Software Engineer
    Wikimedia Foundation

    by Jon Robson at September 19, 2016 07:27 PM

    Engaging the world’s libraries with Wikipedia—what are the opportunities?

Photo by Alex Stinson/Sadads, CC BY-SA 4.0.

Imagine thousands of librarians from all parts of the world descending on a midwestern town in the United States. What would they talk about?

    To find out, the Wikimedia Foundation’s Wikipedia Library and GLAM-Wiki team traveled to Columbus during August to attend the World Library and Information Conference 2016 (#WLIC2016) hosted by the International Federation of Library Associations (IFLA) and its institutional supporters. We went to the conference hoping to help the library community get excited about the opportunities for collaborating with Wikipedia, by hosting an exhibit booth and giving a presentation.

    And to our delight, we didn’t have to get people excited and start the conversation about Wikipedia in the library communities—librarians from all over the world were already doing it for us! We even found several presentations about Wikipedia editing campaigns hosted by libraries, such as participation in Art + Feminism.

    Wikipedia became a hot topic throughout the conference, creating a backchannel of conversation on Twitter. But one conference only reaches a small community; that is why we have been working with the team at IFLA Headquarters to support the development of two public white papers which we first launched at the WLIC 2016.

    How do we expose the world’s librarians to Wikipedia?

    Enter libraries the world over, and wherever you find patrons using the internet, you will frequently also find patrons browsing Wikipedia. It’s hard to search the internet for information and not end up using Wikipedia information, whether you know it or not. But do the patrons get the best information for the topic they are looking for? Do they have the skills needed to use Wikipedia as part of a research process that helps with learning and advancement of human society?

We spoke to these points and more at our IFLA presentation. The talk introduced two draft papers produced by committees of volunteers and librarians who have explored the opportunities for Wikipedia and libraries to collaborate. These committees, chaired by Wikimedians and library advocates Alex Hinojo of Amical Wikimedia and Mylee Joseph of the State Library of New South Wales, created very strong first drafts of the papers, which survey the opportunities for libraries to be more engaged in the Wikimedia community.

However, these committees were only able to scratch the surface of the experiences libraries have had with the Wikimedia community. For just one example, at the conference we discovered that agricultural libraries at Cornell and Arizona State University had organized an editathon to cover key topic areas in agriculture! The Wikipedia and library communities are vast, and it’s almost impossible for a small group of people to document them all: and that’s where we need your help!

    We need you to help us refine the conversation!

For individual libraries, it’s often hard to find the right skills, models, and tools for collaborating, so that the effort libraries put into improving access to information can work together with the effort of Wikipedians in sharing the sum of the world’s knowledge freely. However, there are hundreds of example partnerships throughout the world that have demonstrated just how effective this collaboration can be.

That’s where the white papers can help: they seek to document the best of these opportunities. And we need your help expanding and refining them so that we can share these examples from around the world with anyone who wants to try them. We invite you to join the conversation by reading and commenting on the following documents:

    Thank you for your feedback, and for working to build the landscape of opportunities for Libraries and Wikipedia!

    Alex Stinson, GLAM-Wiki Strategist, Wikimedia Foundation
    Julia Brungs, Policy and Research Officer, International Federation of Library Associations and Institutions (IFLA)

    by Alex Stinson and Julia Brungs at September 19, 2016 05:06 PM

    Wiki Education Foundation

    The Roundup: Truth and Reconciliation

When a state government owns up to its wrongdoings, the consequences can be severe. The process of uncovering the past is politically fraught, and the findings are often deliberately obscured. Nonetheless, documenting and distributing those findings, often through truth commissions, is an essential part of a government holding itself accountable for human rights abuses and other malfeasance.

David Webster’s Memory, truth and reconciliation in the developing world course at Bishop’s University focuses on bringing those findings to Wikipedia. In that course, student editors gather reports and write an article about a specific truth commission’s findings.

    Students have created nine new articles and expanded 60 more, including commissions in Morocco, Nepal, South Korea, and El Salvador.

    Webster’s students are making an incredible difference in this area. The course is responsible for more than 20% of the articles on Wikipedia’s list of truth and reconciliation commissions.

    It’s an example that fuses human rights, an area many students are passionate about, with the incentive of raising public awareness through Wikipedia. The articles written for this class have been seen 1.83 million times. That’s a stunning impact on the awareness of human rights issues.

But through our online trainings and other tools, students learn to let their passions guide their interest, not their writing. They emphasize a careful presentation and balance of views, considering Wikipedia’s policies about the weight given to sources. Students are encouraged to write for Wikipedia in a way that documents the discussion of these panels rather than weighing in on it. That’s an essential skill, one that encourages students to carefully assess their sources, their writing, and their own positions.

    We’re very proud of these contributions to Wikipedia, and the excellent work carried out by these students!

Do you have an interest in a similar project for your own course? Wiki Ed can help by providing resources and guidance to make sure students understand the value of balanced, encyclopedic writing. Our staff can provide trainings for students that help them stay on the right side of Wikipedia’s policies. Many instructors use this assignment as a stepping stone to a broader, more reflective position or policy paper. That way, students are motivated by sharing their knowledge of history to raise awareness.

    We’d love to hear your ideas. Reach out to us to start a conversation: contact@wikiedu.org.


    Photo: Modified from Declassified by EFF Photos, CC-BY 2.0 via Flickr.

    by Eryk Salvaggio at September 19, 2016 04:00 PM

    Sam Wilson

    My dream job

    So I’ve started a new job: I’m now working for the Wikimedia Foundation in the Community Tech team. It’s really quite amazing, actually: I go to “work” and do things that I really quite like doing and would be attempting to find time to do anyway if I were employed elsewhere. Not that I’m really into the swing of things yet—only two weeks in—but so far it’s pretty great.

    I’m really excited about being part of an organisation that actually means something.

    Imagine a world in which every single human being can freely share in the sum of all knowledge. That’s our commitment.

    It’s a bit cheesy to quote that I know, but still: how nice it is to think that there’s something higher up the orgchart than an ever-increasing concentration of money.

    by Sam Wilson at September 19, 2016 07:29 AM

    Tech News

    Tech News issue #38, 2016 (September 19, 2016)

2016, week 38 (Monday 19 September 2016)
    Other languages:
    العربية • ‎čeština • ‎Deutsch • ‎Ελληνικά • ‎English • ‎español • ‎suomi • ‎français • ‎עברית • ‎italiano • ‎日本語 • ‎norsk bokmål • ‎polski • ‎português • ‎português do Brasil • ‎русский • ‎shqip • ‎svenska • ‎українська • ‎Tiếng Việt • ‎中文

    September 19, 2016 12:00 AM

    September 18, 2016

    Ash Crow

    Sunday Query : use SPARQL and Python to fix typographical errors on Wikidata

My turn to make a #SundayQuery! As Harmonia Amanda just said in her own article, I was about to explain how to make a Python script to fix the results of her query… but I thought I should start with another script, similar but shorter and easier to understand. The script for Harmonia is here, though.

On Thursday, I published an article about medieval battles, and since then I have started to fix battle items on Wikidata. One of the most repetitive fixes is the capitalization of the French labels: as they have been imported from Wikipedia, the labels have an unnecessary capital first letter (“Bataille de Saint-Pouilleux en Binouze” instead of “bataille de Saint-Pouilleux en Binouze”).

    The query

    So first, we need to find all the items that have this typo:

    SELECT ?item ?label WHERE {
      ?item wdt:P31/wdt:P279* wd:Q178561 .
      ?item rdfs:label ?label . FILTER(LANG(?label) = "fr") .
      FILTER(STRSTARTS(?label, "Bataille ")) .
    }

    http://tinyurl.com/jljf6xr

    Some basic explanations :

    • ?item wdt:P31/wdt:P279* wd:Q178561 . looks for items that are battles or subclasses of battles, just to be sure I'm not making changes to some book called "Bataille de Perpète-les-Olivettes"…
    • On the next line, I query the labels for the items (?item rdfs:label ?label .) and filter to keep only those in French (FILTER(LANG(?label) = "fr")). As I need to use the label inside the query and not merely for display (as Harmonia Amanda just explained in her article), I cannot use the wikibase:label service, and so I use the semantic web standard rdfs:label.
    • The last line is a FILTER, which keeps only those results that match the function inside it. Here, STRSTARTS checks whether ?label begins with "Bataille ".

As of the time I write this, running the query returns 3521 results. Far too many to fix by hand, and I know of no existing tool that would fix them for me. So, I guess it’s Python time!

    The Python script

I love Python. I absolutely love Python. The language is great for putting up a useful app within minutes, easily readable (it's basically English, in fact), not cluttered with gorram series of brackets or semicolons, and it generally has great libraries for the things I do the most: scraping webpages, parsing and sorting data, checking ISBNs[1] and making websites. Oh, and making SPARQL queries of course[2].

    Two snake charmers with a python and a couple of cobras.
    Not to mention that the name of the language has a “snake charmer” side ;)

    Preliminary thoughts

If you don’t know Python, this article is not the right place to learn it, but there are numerous resources available online[3]. Just make sure they are up-to-date and for Python 3. The rest of this article assumes that you have a basic understanding of Python (indentation, variables, strings, lists, dictionaries, imports and “for” loops), and that Python 3 and pip are installed on your system.

Why Python 3? Because we’ll handle strings that come from Wikidata and are thus encoded in UTF-8, and Python 2 makes you jump through some hoops to use it. Plus, we are in 2016, for Belenos’ sake.

Why pip? Because we need a non-standard library to make SPARQL queries, called SPARQLWrapper, and the easiest way to install it is with this command:

    pip install sparqlwrapper

    Now, let’s start scripting!

    For a start, let’s just query the full list of the sieges[4]:

    #!/usr/bin/env python3
    
    from SPARQLWrapper import SPARQLWrapper, JSON
    
    endpoint = "https://query.wikidata.org/bigdata/namespace/wdq/sparql"
    
    sparql = SPARQLWrapper(endpoint)
    sparql.setQuery("""
    SELECT ?item ?label WHERE {{
      ?item wdt:P31/wdt:P279* wd:Q178561 .
      ?item rdfs:label ?label . FILTER(LANG(?label) = "fr") .
      FILTER(STRSTARTS(?label, "Siège ")) .
    }}
    """)  # Link to query: http://tinyurl.com/z8bd26h
    
    sparql.setReturnFormat(JSON)
    
    results = sparql.query().convert()
    
    print(results)

    That’s quite a bunch of lines, but what does this script do? As we’ll see, most of this will be included in every script that uses a SPARQL query.

    • First, we import two things from the SPARQLWrapper module: the SPARQLWrapper object itself and a “JSON” constant that it will use later (don’t worry, you won’t have to manipulate JSON files yourself).
    • Next, we create an “endpoint” variable, which contains the full URL of the SPARQL endpoint of Wikidata[5].
    • Next, we create a SPARQLWrapper object that will use this endpoint to make queries, and put it in a variable simply called “sparql”.
    • We apply the setQuery function to this variable, which is where we put the query we used earlier. Notice that { and } have been doubled to {{ and }}: strictly speaking this is only needed when the query string is passed through Python’s .format(), where braces are special characters; it does no harm here, as {{ … }} is still valid SPARQL.
    • sparql.setReturnFormat(JSON) tells the script that what the endpoint returns is formatted in JSON.
    • results = sparql.query().convert() actually makes the query to the server and converts the response to a Python dictionary called “results”.
    • And for now, we just want to print the result on screen, just to see what we get.

    Let’s open a terminal and launch the script:

    $ python3 fix-battle-labels.py 
    {'head': {'vars': ['item', 'label']}, 'results': {'bindings': [{'label': {'value': 'Siège de Pskov', 'type': 'literal', 'xml:lang': 'fr'}, 'item': {'value': 'http://www.wikidata.org/entity/Q815196', 'type': 'uri'}}, {'label': {'value': 'Siège de Silistra', 'type': 'literal', 'xml:lang': 'fr'}, 'item': {'value': 'http://www.wikidata.org/entity/Q815207', 'type': 'uri'}}, {'label': {'value': 'Siège de Tyr', 'type': 'literal', 'xml:lang': 'fr'}, 'item': {'value': 'http://www.wikidata.org/entity/Q815233', 'type': 'uri'}}, {'label': {'value': 'Siège de Cracovie', 'type': 'literal', 'xml:lang': 'fr'}, 'item': {'value': 'http://www.wikidata.org/entity/Q608163', 'type': 'uri'}}, {'label': {'value': 'Siège de Narbonne', 'type': 'literal', 'xml:lang': 'fr'}, 'item': {'value': 'http://www.wikidata.org/entity/Q1098377', 'type': 'uri'}}, {'label': {'value': 'Siège de Hloukhiv', 'type': 'literal', 'xml:lang': 'fr'}, 'item': {'value': 'http://www.wikidata.org/entity/Q2065069', 'type': 'uri'}}, {'label': {'value': "Siège d'Avaricum", 'type': 'literal', 'xml:lang': 'fr'}, 'item': {'value': 'http://www.wikidata.org/entity/Q4087405', 'type': 'uri'}}, {'label': {'value': 'Siège de Fort Pulaski', 'type': 'literal', 'xml:lang': 'fr'}, 'item': {'value': 'http://www.wikidata.org/entity/Q2284279', 'type': 'uri'}}, {'label': {'value': 'Siège de Liakhavitchy', 'type': 'literal', 'xml:lang': 'fr'}, 'item': {'value': 'http://www.wikidata.org/entity/Q4337397', 'type': 'uri'}}, {'label': {'value': 'Siège de Smolensk', 'type': 'literal', 'xml:lang': 'fr'}, 'item': {'value': 'http://www.wikidata.org/entity/Q4337448', 'type': 'uri'}}, {'label': {'value': 'Siège de Rhodes', 'type': 'literal', 'xml:lang': 'fr'}, 'item': {'value': 'http://www.wikidata.org/entity/Q701067', 'type': 'uri'}}, {'label': {'value': 'Siège de Cracovie', 'type': 'literal', 'xml:lang': 'fr'}, 'item': {'value': 'http://www.wikidata.org/entity/Q7510162', 'type': 'uri'}}, {'label': {'value': 'Siège de Péronne', 'type': 'literal', 'xml:lang': 'fr'}, 'item': {'value': 'http://www.wikidata.org/entity/Q23013145', 'type': 'uri'}}, {'label': {'value': 'Siège de Pskov', 'type': 'literal', 'xml:lang': 'fr'}, 'item': {'value': 'http://www.wikidata.org/entity/Q10428014', 'type': 'uri'}}, {'label': {'value': 'Siège du Hōjūjidono', 'type': 'literal', 'xml:lang': 'fr'}, 'item': {'value': 'http://www.wikidata.org/entity/Q3090571', 'type': 'uri'}}, {'label': {'value': 'Siège de Fukuryūji', 'type': 'literal', 'xml:lang': 'fr'}, 'item': {'value': 'http://www.wikidata.org/entity/Q3485893', 'type': 'uri'}}, {'label': {'value': "Siège d'Algésiras", 'type': 'literal', 'xml:lang': 'fr'}, 'item': {'value': 'http://www.wikidata.org/entity/Q4118683', 'type': 'uri'}}, {'label': {'value': 'Siège de Berwick', 'type': 'literal', 'xml:lang': 'fr'}, 'item': {'value': 'http://www.wikidata.org/entity/Q5036985', 'type': 'uri'}}, {'label': {'value': "Siège d'Ilovaïsk", 'type': 'literal', 'xml:lang': 'fr'}, 'item': {'value': 'http://www.wikidata.org/entity/Q17627724', 'type': 'uri'}}, {'label': {'value': "Siège d'Antioche", 'type': 'literal', 'xml:lang': 'fr'}, 'item': {'value': 'http://www.wikidata.org/entity/Q815112', 'type': 'uri'}}]}}

    That’s a lot of output, but we can see that it is a dictionary with two entries:

    • “head”, which contains the names of the two variables returned by the query,
    • and “results”, which itself contains another dictionary with a “bindings” key, associated with a list of the actual results, each of them being a Python dictionary. Phew…

    Let’s examine one of the results:

    {'label': {'value': 'Siège de Pskov', 'type': 'literal', 'xml:lang': 'fr'}, 'item': {'value': 'http://www.wikidata.org/entity/Q815196', 'type': 'uri'}}

    It is a dictionary with two keys (“label” and “item”), each of which has for its value another dictionary whose “value” key holds, this time, the actual value we want to get. Yay, finally!
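
    For instance, pasting that sample binding into a Python shell, we can pick the two values out by hand:

    >>> result = {'label': {'value': 'Siège de Pskov', 'type': 'literal', 'xml:lang': 'fr'},
    ...           'item': {'value': 'http://www.wikidata.org/entity/Q815196', 'type': 'uri'}}
    >>> result['label']['value']
    'Siège de Pskov'
    >>> result['item']['value']
    'http://www.wikidata.org/entity/Q815196'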

    Parsing the results

    Let’s parse the “bindings” list with a Python “for” loop, so that we can extract the value:

    for result in results["results"]["bindings"]:
        qid = result['item']['value'].split('/')[-1]
        label = result['label']['value']
    
        print(qid, label)

    Let me explain the 

    qid = result['item']['value'].split('/')[-1]
      line: since the item is stored as a full URL (“http://www.wikidata.org/entity/Q815196” and not just “Q815196”), we need to split it on the ‘/’ character. For this, we use Python’s “split()” method, which turns the string into a list containing this:

    ['http:', '', 'www.wikidata.org', 'entity', 'Q815196']

    We only want the last item in the list. In Python, that means the item with the index -1, hence the [-1] at the end of the line. We then store this in the qid variable.
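
    Step by step in a Python shell, using one of the URLs from our results, that gives:

    >>> url = 'http://www.wikidata.org/entity/Q815196'
    >>> url.split('/')
    ['http:', '', 'www.wikidata.org', 'entity', 'Q815196']
    >>> url.split('/')[-1]
    'Q815196'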

    Let’s launch the script:

    $ python3 fix-battle-labels.py 
    Q815196 Siège de Pskov
    Q815207 Siège de Silistra
    Q815233 Siège de Tyr
    Q608163 Siège de Cracovie
    Q1098377 Siège de Narbonne
    Q2065069 Siège de Hloukhiv
    Q4087405 Siège d'Avaricum
    Q2284279 Siège de Fort Pulaski
    Q4337397 Siège de Liakhavitchy
    Q4337448 Siège de Smolensk
    Q701067 Siège de Rhodes
    Q7510162 Siège de Cracovie
    Q23013145 Siège de Péronne
    Q10428014 Siège de Pskov
    Q3090571 Siège du Hōjūjidono
    Q3485893 Siège de Fukuryūji
    Q4118683 Siège d'Algésiras
    Q5036985 Siège de Berwick
    Q17627724 Siège d'Ilovaïsk
    Q815112 Siège d'Antioche

    Fixing the issue

    We are nearly there! Now what we need is to replace that first proud capital “S” with a modest “s”:

    label = label[:1].lower() + label[1:]

    What is happening here? A Python string can be sliced like a list, so we take the part of the string from the beginning of “label” up to the position after the first character (“label[:1]”) and force it to lower case (“.lower()”). We then concatenate it with the rest of the string (position 1 to the end, or “label[1:]”) and assign the result back to the “label” variable.
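
    In a Python shell, with one of the labels from our results, that gives:

    >>> label = 'Siège de Pskov'
    >>> label[:1].lower() + label[1:]
    'siège de Pskov'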

    Last thing, print it in a format that is suitable for QuickStatements:

    out = "{}\tLfr\t{}".format(qid, label)
    print(out)

    That first line may seem barbaric, but it’s in fact pretty straightforward:

    "{}\tLfr\t{}"
    is a string that contains a first placeholder for a variable (“{}”), then a tab character (“\t”), then the QuickStatements keyword for the French label (“Lfr”), then another tab and finally a second placeholder. We then use the “format()” method to replace the placeholders with the contents of the “qid” and “label” variables. The final script should look like this:

    #!/usr/bin/env python3
    
    from SPARQLWrapper import SPARQLWrapper, JSON
    
    endpoint = "https://query.wikidata.org/bigdata/namespace/wdq/sparql"
    
    sparql = SPARQLWrapper(endpoint)
    sparql.setQuery("""
    SELECT ?item ?label WHERE {{
      ?item wdt:P31/wdt:P279* wd:Q178561 .
      ?item rdfs:label ?label . FILTER(LANG(?label) = "fr") .
      FILTER(STRSTARTS(?label, "Siège ")) .
    }}
    """)  # Link to query: http://tinyurl.com/z8bd26h
    
    sparql.setReturnFormat(JSON)
    
    results = sparql.query().convert()
    
    for result in results["results"]["bindings"]:
        qid = result['item']['value'].split('/')[-1]
        label = result['label']['value']
    
        label = label[:1].lower() + label[1:]
    
        out = "{}\tLfr\t{}".format(qid, label)
        print(out)

    Let’s run it:

    $ python3 fix-battle-labels.py 
    Q815196	Lfr	siège de Pskov
    Q815207	Lfr	siège de Silistra
    Q815233	Lfr	siège de Tyr
    Q2065069	Lfr	siège de Hloukhiv
    Q2284279	Lfr	siège de Fort Pulaski
    Q1098377	Lfr	siège de Narbonne
    Q608163	Lfr	siège de Cracovie
    Q4087405	Lfr	siège d'Avaricum
    Q4337397	Lfr	siège de Liakhavitchy
    Q4337448	Lfr	siège de Smolensk
    Q701067	Lfr	siège de Rhodes
    Q10428014	Lfr	siège de Pskov
    Q17627724	Lfr	siège d'Ilovaïsk
    Q23013145	Lfr	siège de Péronne
    Q815112	Lfr	siège d'Antioche
    Q3090571	Lfr	siège du Hōjūjidono
    Q3485893	Lfr	siège de Fukuryūji
    Q4118683	Lfr	siège d'Algésiras
    Q5036985	Lfr	siège de Berwick

    Yay! All we have to do now is copy and paste the result into QuickStatements and we are done.

    Title picture: Photograph of typefaces by Andreas Praefcke (public domain)

    1. I hope I’ll be able to write something about it sometime soon.
    2. Plus, the examples in the official documentation are Firefly-based. Yes sir, Captain Tightpants.
    3. For example, https://www.codecademy.com/learn/python or https://docs.python.org/3.5/tutorial/.
    4. I’ve fixed the battles in the meantime 😉
    5. And not the web access to the endpoint, which is just “https://query.wikidata.org/”

    This article, Sunday Query: use SPARQL and Python to fix typographical errors on Wikidata, first appeared on The Ash Tree.

    by Ash_Crow at September 18, 2016 04:30 PM

    Addshore

    Wikidata Map May 2016 (Belarus & Uganda)

    I originally posted about the Wikidata maps back in early 2015 and have followed up with a few posts since, looking at interesting developments. This is another one of those posts, covering the changes from the last post, in late 2015, to now, May 2016.

    The new maps look very similar to the naked eye and the new ‘big’ map can be seen below.

    So while at the 2016 Wikimedia Hackathon in Jerusalem, I teamed up with @valhallasw to generate some diffs of these maps, in a slightly more programmatic way than in my posts following up the 2015 Wikimania!

    In the image below all pixels that are red represent Wikidata items with coordinate locations and pixels that are yellow represent items added between October 27, 2015 and April 2, 2016 with coordinate locations. Click the image to see it full size.
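
    The diffing script itself isn’t included in this post, but the idea is simple enough to sketch: compare two same-sized renderings pixel by pixel and colour the pixels that only appear in the newer one. The file names and the “non-black pixel means an item is plotted there” convention below are illustrative assumptions, not the actual script:

    #!/usr/bin/env python3
    # Sketch only: build a red/yellow diff image from two Wikidata map renderings.
    # File names and the "non-black pixel = item plotted" convention are assumptions.
    from PIL import Image

    old_map = Image.open("wikidata-map-2015-10-27.png").convert("RGB")
    new_map = Image.open("wikidata-map-2016-04-02.png").convert("RGB")

    diff = old_map.copy()
    old_px, new_px, diff_px = old_map.load(), new_map.load(), diff.load()

    width, height = old_map.size
    for x in range(width):
        for y in range(height):
            if old_px[x, y] != (0, 0, 0):      # item already present in the older map
                diff_px[x, y] = (255, 0, 0)    # red
            elif new_px[x, y] != (0, 0, 0):    # item only present in the newer map
                diff_px[x, y] = (255, 255, 0)  # yellow

    diff.save("wikidata-map-diff.png")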

    The area in eastern Europe with many new items is Belarus and the area in eastern Africa is Uganda. Some other smaller clusters of yellow pixels can also be seen in the image.

    All of the generated images from April 2016 can be found on Wikimedia Commons at the links below:

    by addshore at September 18, 2016 01:05 PM

    Ash Crow

    Sunday Query: all surnames marked as disambiguation pages, with an English Wikipedia link and with “(surname)” in their English label

    It’s Sunday again! Time for the queries! Last week I showed you the basics of SPARQL; this week I wanted to show you how we could use SPARQL to do maintenance work. I assume you now understand the use of PREFIX, SELECT, WHERE.

    I have been a member of the WikiProject:Names for years. When I’m not working on Broadway and the Royal Academy of Dramatic Art archives,[1] I am one of the people who ensure that “given name:Christopher (Iowa)” is transformed back to “given name:Christopher (given name)”. Over the last few weeks I’ve corrected thousands of wrong uses of the given name/family name properties, and for this work I used dozens of SPARQL queries. I thought it could be interesting to show how I used SPARQL to create a list of strictly identical errors that I could then treat automatically.

    What do we search?

    If you read the constraint violations reports, you’ll see that the most frequent error for the property “family name” (P734) is the use of a disambiguation page as a value instead of a family name. We can do a query like this:

    SELECT ?person ?personLabel ?name ?nameLabel
    WHERE {
        ?person wdt:P734 ?name . #the person has a family name
        ?name wdt:P31 wd:Q4167410 . #the item used as family name is a disambiguation page
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en,fr" . } #We want to have the results in English or in French if there is no English label
    }

    link to the query. The results are in the thousands. Sigh.

    But then we find something more interesting: there are entities which are both a disambiguation page and a family name. What?! That’s ontologically wrong. Using the wrong value as a family name is human error; but an entity can’t be both a specific type of Wikimedia page and a family name. It’s like saying a person could also be a book. Ontologically absurd. So all items with both P31 values need to be corrected. How many are there?

    SELECT DISTINCT ?name ?nameLabel (LANG(?label) AS ?lang)
    WHERE {
        ?name wdt:P31 wd:Q101352 ; #the entity is a family name
              wdt:P31 wd:Q4167410 . #the entity is also a disambiguation page
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en,fr" . } #We want to have the results in English or in French if there is no English label
    }

    link to the query.
    Several thousands again. Actually, there are more entities which are both a disambiguation page and a family name than there are persons using disambiguation pages as family names. This means there are family name/disambiguation page items in the database which aren’t used. They’re still wrong, but it doesn’t show in the constraint violations reports.

    If we explore, we see that there are different cases out there: some of the family name/disambiguation pages are in reality disambiguation pages, some are family names, some are both (they link to articles on different Wikipedias, some about a disambiguation page and some about a family name; these need to be separated). Too many different possibilities: we can’t automate the correction. Well… at least not all of it at once.

    Narrowing our search

    If we can’t treat all the disambiguation/family name pages in one go, maybe we can ask a more precise question. In our long list of violations, I asked for English labels and found some unbelievable ones. There were items labelled “Poe (surname)”. As disambiguation pages. That’s a wrong use of the label, which shouldn’t carry clarifications about the subject in brackets (that’s what the description is for); and if these items are about a surname, they shouldn’t be disambiguation pages either! So, so wrong.

    Querying labels

    But still, the good news: we can isolate these entries. For that, we’ll have to query not the relations between items but the labels of the items themselves. Until now, we had used the SERVICE wikibase:label workaround, a tool which only exists on the Wikidata endpoint, because it was really easy and we only wanted human-readable results, not to actually query labels. Now that we do, the workaround isn’t enough; we’ll need to do it the real SPARQL way, using rdfs:label.

    Our question now is: can I list all items which are both family names and disambiguation pages, whose English label contains “(surname)”?

    SELECT DISTINCT ?name ?label (LANG(?label) AS ?lang)
    WHERE {
        ?name wdt:P31 wd:Q101352 ; #the entity is a family name
              wdt:P31 wd:Q4167410 ; #the entity is also a disambiguation page
              rdfs:label ?label . #the entity has a label
        FILTER(LANG(?label) IN ("en")). #this label exists in English
        FILTER(CONTAINS(?label, "(surname)")). #this label contains a specific string
    }

    link to the query. We had several hundred results.[2] Note the changes I made in the SELECT DISTINCT, since I don’t use the SERVICE wikibase:label workaround here.

    Querying sitelinks

    Can we automate the correction now? Well… no. There are still problems. In this list, there are items which have links to several Wikipedias: the English one about the surname and the other(s) about a disambiguation page. Worse, there are items which don’t have an English sitelink any longer, because it was deleted or linked to another item (like the “real” family name item) and the wrong English label persisted. So maybe we can filter our list down to items with a link to the English Wikipedia. For this, we’ll use schema.

    SELECT DISTINCT ?name ?label (LANG(?label) AS ?lang)
    WHERE {
        ?name wdt:P31 wd:Q101352 ; #the entity is a family name
              wdt:P31 wd:Q4167410 ; #the entity is also a disambiguation page
              rdfs:label ?label . #the entity has a label
        ?sitelink schema:about ?name ;  #We want the entity to have a sitelink
                  schema:inLanguage "en" ; #this sitelink is in English
                  schema:isPartOf  <https://en.wikipedia.org/> . #and links to the English WP (and not Wikisource or other projects)
        FILTER(LANG(?label) IN ("en")). #the label exists in English
        FILTER(CONTAINS(?label, "(surname)")). #the label contains a specific string
    }

    link to the query. Well, that’s better! But our problem is still here: if an item has several sitelinks, maybe the other sitelink(s) are not about the family name. So we want items with an English sitelink and only an English sitelink. Like this:

    SELECT DISTINCT ?name ?label (LANG(?label) AS ?lang)
    WHERE {
        ?name wdt:P31 wd:Q101352 ; #the entity is a family name
              wdt:P31 wd:Q4167410 ; #the entity is also a disambiguation page
              rdfs:label ?label . #the entity has a label
        ?sitelink schema:about ?name .  #We want ?name to have a sitelink
        ?WParticle schema:about ?name ; #We'll define the characteristics of the sitelink
                   schema:inLanguage "en" ; #this sitelink is in English
                   schema:isPartOf  <https://en.wikipedia.org/> . #and links to the English WP (and not Wikisource or other projects)
        FILTER(LANG(?label) IN ("en")). #the label exists in English
        FILTER(CONTAINS(?label, "(surname)")). #the label contains a specific string
    } GROUP BY ?name ?label HAVING (COUNT(DISTINCT ?sitelink) = 1) #With only one sitelink

    link to the query.

    Several things changed: we separated ?sitelink and ?WParticle. We use ?sitelink to count the number of sitelinks, and ?WParticle to query the particulars of this sitelink. Note that we need to use GROUP BY and HAVING, like last week.

    Polishing the query

    Just to be on the safe side (we are never safe enough before automating corrections), we’ll also check that all items on our list are only family name/disambiguation pages, and not also marked as a location or something equally strange. So we require that they have only two P31 (instance of) statements, those two being Q101352 (family name) and Q4167410 (disambiguation page).

    SELECT DISTINCT ?name ?label (LANG(?label) AS ?lang)
    WHERE {
        ?name wdt:P31 ?type ; #the entity use the property P31 (instance of)
              wdt:P31 wd:Q101352 ; #the entity is a family name
              wdt:P31 wd:Q4167410 ; #the entity is also a disambiguation page
              rdfs:label ?label . #the entity has a label
        ?sitelink schema:about ?name .  #We want ?name to have a sitelink
        ?WParticle schema:about ?name ; #We'll define the characteristics of the sitelink
                   schema:inLanguage "en" ; #this sitelink is in English
                   schema:isPartOf  <https://en.wikipedia.org/> . #and links to the English WP (and not Wikisource or other projects)
        FILTER(LANG(?label) IN ("en")). #the label exists in English
        FILTER(CONTAINS(?label, "(surname)")). #the label contains a specific string
    } GROUP BY ?name ?label HAVING ((COUNT(DISTINCT ?type) = 2) && (COUNT(DISTINCT ?sitelink) = 1)) #With only two P31 and one sitelink

    link to the query.

    It should give you a beautiful “no matching records found”. Yesterday, it gave me 175 items which I knew I could correct automatically. Which I have done, with a Python script made by Ash_Crow. If you are good, he’ll make a #MondayScript in response to this #SundayQuery!
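
    That script isn’t reproduced here, but here is a minimal sketch of how such a cleanup could be wired together, reusing the SPARQLWrapper pattern from the previous article to turn the query results into QuickStatements-style commands. The chosen fix (removing the disambiguation-page P31 claim and stripping “ (surname)” from the English label) and the removal syntax are illustrative assumptions, not necessarily what Ash_Crow’s script does:

    #!/usr/bin/env python3
    # Sketch only: list the offending items and print QuickStatements-style commands.
    # The fix applied here (and the '-' removal syntax) is an assumption for illustration.
    from SPARQLWrapper import SPARQLWrapper, JSON

    endpoint = "https://query.wikidata.org/bigdata/namespace/wdq/sparql"

    sparql = SPARQLWrapper(endpoint)
    sparql.setQuery("""
    SELECT DISTINCT ?name ?label
    WHERE {
        ?name wdt:P31 ?type ;
              wdt:P31 wd:Q101352 ;
              wdt:P31 wd:Q4167410 ;
              rdfs:label ?label .
        ?sitelink schema:about ?name .
        ?WParticle schema:about ?name ;
                   schema:inLanguage "en" ;
                   schema:isPartOf <https://en.wikipedia.org/> .
        FILTER(LANG(?label) IN ("en")).
        FILTER(CONTAINS(?label, "(surname)")).
    } GROUP BY ?name ?label HAVING ((COUNT(DISTINCT ?type) = 2) && (COUNT(DISTINCT ?sitelink) = 1))
    """)
    sparql.setReturnFormat(JSON)

    for result in sparql.query().convert()["results"]["bindings"]:
        qid = result["name"]["value"].split("/")[-1]
        label = result["label"]["value"].replace(" (surname)", "")

        # Remove the "instance of: Wikimedia disambiguation page" claim
        # (QuickStatements accepts removals prefixed with '-'; check the QS docs).
        print("-{}\tP31\tQ4167410".format(qid))
        # Set a cleaned-up English label, same tab-separated format as before.
        print("{}\tLen\t{}".format(qid, label))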

    (Main picture: Name List of Abhiseka – Public Domain, photograph done by Kūkai.)

    1. Yes, I really want you to read this one.
    2. Then. We had several hundred results then. I’m happy to say it isn’t true now.

    This article, Sunday Query: all surnames marked as disambiguation pages, with an English Wikipedia link and with “(surname)” in their English label, first appeared on The Ash Tree.

    by Harmonia Amanda at September 18, 2016 12:51 PM

    September 17, 2016

    Mahmoud Hashemi

    kensamonte: Little thingy I did a while ago. Ken’s an amazing...



    kensamonte:

    Little thingy I did a while ago.

    Ken’s an amazing illustrator who helped us with our banner and logo, both of which you can see prominently on our Twitter page, where we update a bit more frequently, to say the least. Thanks again, Ken!

    September 17, 2016 02:15 AM

    September 16, 2016

    Wiki Education Foundation

    Visualizing article history with Structural Completeness

    You may have noticed a recent addition to the Articles tab of dashboard.wikiedu.org course pages: “structural completeness”. This feature is an experiment in visualizing the history of articles as they develop.

    The evolution of an article over the course of a Wikipedia assignment.

    The structural completeness data comes from the “Objective Revision Evaluation Service” (ORES), a Wikimedia Foundation research project that uses machine learning to analyze Wikipedia articles and individual edits. I started digging into ORES last year to see how well the “wp10” scores — estimates of what score an article would get on the Wikipedia 1.0 scale from Stub to Featured Article at any point in its history — map to the work that student editors do in our classes. What I found was that even small changes in the ORES wp10 score were meaningful in terms of the changes that happened to an article. While the scores don’t account for the intellectual content of articles, they give a great sense of the major — and minor — changes of an individual article over time.

    ORES, a data mining project for Wikipedia articles

    In the dashboard, I’m calling this data “structural completeness”, because the scores are based on how well an article matches the typical structural features of a mature Wikipedia article. The machine learning model calculates scores based on the amount of prose, the number of wikilinks to other articles, the number of references, images, headers, and templates, and a few other basic features. Down the road, we may be able to use this data to give automated suggestions about what aspects of their article an editor should focus on next — whether adding links to related topics, improving the citations, or breaking it into sections as it grows.
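
    For anyone curious about the raw data, ORES exposes these scores over a simple web API. A minimal sketch of fetching a “wp10” score for a single revision is below; the exact endpoint path, parameters, and response layout should be checked against the current ORES documentation, and the revision ID is just a placeholder:

    #!/usr/bin/env python3
    # Sketch only: request the "wp10" article quality score for one revision from ORES.
    # Endpoint path and revision ID are assumptions; check the ORES docs before use.
    import json
    import requests

    ORES_SCORES_URL = "https://ores.wikimedia.org/v3/scores/enwiki"

    response = requests.get(ORES_SCORES_URL, params={"models": "wp10", "revids": "123456789"})
    response.raise_for_status()
    print(json.dumps(response.json(), indent=2))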

    Take a look at how articles by student editors develop. When you spot a big change in the structural completeness score, this usually means something interesting happened to the article that suddenly made it look a lot more (or a lot less) like a typical well-developed Wikipedia article.

    I’ll continue to iterate on these visualizations; our goal is to make it as easy as possible to both get an overview of an article’s history and to drill down to the details of individual edits. If you have ideas, comments, or you notice something really interesting with these visualizations, drop me a line!

    by Sage Ross at September 16, 2016 06:46 PM

    Weekly OSM

    weeklyOSM 321

    09/06/2016-09/12/2016

    “Where do cyclists cycle? An important question not just for city planners”, stated the German business journal “manager magazin” 1 | here: London (Photo: heat map created by Strava cyclists. Map provided by Mapbox/OpenStreetMap)

    About us

    • Do you wish to join the diverse weeklyOSM team or start weeklyOSM in your language? Or are you just curious to know how weeklyOSM works? If so, we’d be delighted to see you at our workshop at SotM. If you look at this map of where weeklyOSM is currently produced, you can see that we are looking for YOU!

    Mapping

    • The Mapbox team now intends to focus on OSM data in French cities. In his English-language blog post, BharataHS summarizes his questions about French road rules.
    • User BushmanK explains why he believes that healthcare=midwife is a poorly designed tag.
    • Mountaineer’s Mailbox has been proposed as a new value for the man_made tag. The proposal was also discussed on the Tagging mailing list.
    • LeTopographeFou suggests an automated correction of typos in tag values. To begin with, he has started a vote in the wiki on the typo tunnel=building_passage.
    • On the talk-GB mailing list, the British community discusses its upcoming quarterly project. Proposals range from opening hours and speed limits to food hygiene ratings. In preparation for these tasks, interesting analysis tools are used.
    • User mapmeld writes a diary about transliterating place names around the world with an open source crowdsourcing tool called CityNamer. This project uses OSM data and account details, but does not save edits yet.

    Community

    • The Gmane website has been offline for a few weeks; Martin writes about reviving the web interface and getting it working.
    • The OSM Awards 2016 are the community awards. Have you voted for your favourites in the six categories? The voting ends on September 22nd. Frederik Ramm explained (Deutsch) (automatic translation) in the forum why you should vote. Nakaner presents his viewpoint on the candidates in his user diary.
    • User PeWu from Poland presents OSM History Viewer – a tool to view the history of OSM nodes, ways and relations. In his post in the OpenStreetMap forum he added some nice examples (Example1, Example2). The source code and the examples are available on GitHub.
    • Mapper of the month, SomeoneElse, shares his OSM journey so far and his upcoming work.
    • The Saarländischer Rundfunk, a German TV channel, broadcast an excellent video (start playing from 16:39 min) by Herbert Mangold on “Mapping with OSM”. The video talks about Mundraub, an OSM-based map, how mapping efforts assist in fighting catastrophes like the Ebola crisis in Africa, and the importance of local knowledge. A few members of the weeklyOSM team are also featured in the video! (Deutsch)
    • There is a survey aimed at OSM users who edit in Argentina, in order to learn more about what they think about the project and to explore the possibility of organizing task groups to help enrich the map.

    Events

    • As a run-up to the State of the Map conference in Brussels, there is a call for informal sessions, including Birds of a Feather (BoF) sessions. Take a look at some of the proposed sessions.
    • The upcoming week is slated to be eventful in Brussels, with five big geo-events lined up, including a hackday and a mapathon! (It surely is Maptember!)
    • Selene Yang writes in her user diary that 25th September is the last date for submitting proposals for State of the Map Latam, which is going to be held from 25th to 27th November in Brazil.
    • At this year’s State of The Map conference, members from the various local communities can share their experience during the State of the Local Map and the Local Chapters Congress.

    Humanitarian OSM

    Maps

    • [1] The German business journal ‘Manager Magazin’ illustrates the routes ridden by cyclists in ten major European cities.
    • During an ongoing system upgrade, the OSMF tile server will be migrated to Mapnik 3. Tom Hughes writes about this on the talk mailing list.
    • Luke Smith writes about grough developing a composite map, obtained by blending OSM data with OS OpenData to fill in the gaps and using public rights of way data directly from the local authorities which have released it.

    switch2OSM

    Open Data

    • The Array of Things (AoT) is an urban sensing project, a network of interactive, modular sensor boxes that will be installed around Chicago to collect real-time data on the city’s environment, infrastructure, and activity for research and public use.
    • East Africa’s commitment to Open Data is evident from its latest initiatives including the recently concluded East Africa Open Data Conference.

    Licences

    • When Toursprung found out that a German TV station used their OSM based map, they billed them for it and donated the amount (200 €) to OSM. 😉

    Programming

    • Mapzen has set up a Personal Package Archive (PPA) for Ubuntu for its routing engine Valhalla. This makes it possible to install Valhalla with a simple apt-get command.

    Releases

    Software Version Release date Comment
    Komoot Android * var 2016-09-05 No info
    Cruiser for Android * 1.4.11 2016-09-06 Routing enhancements, added hill shading
    Cruiser for Desktop * 1.2.11 2016-09-06 No info
    Merkaartor 0.18.3 2016-09-06 Mac OS X, improved download, relation filter and many bugs fixed.
    Naviki Android * 3.47.1 2016-09-06 Fixed bug after pausing recording or navigation.
    OpenStreetMap Carto Style 2.43.0 2016-09-06 Please read info.
    Mapnik 3.0.12 2016-09-08 Many changes. Please read changelog.
    BRouter 1.4.5 2016-09-10 Performance improvements, two bugs fixed.
    Komoot iOS * 8.3.1 2016-09-10 Two bugs fixed.
    Mapillary iOS * 4.4.12 2016-09-10 Updated Russian, disconnectable image stabilization
    Traccar Server 3.7 2016-09-10 No Info
    MapFactor Navigator Free * 2.2 2016-09-12 Many extensions and fixes, see release info.

    Provided by the OSM Software Watchlist.

    (*) unfree software. See freesoftware.

    OSM in the media

    • An interesting article in the Melbourne Age, on how economics can make your dinner taste better! They did some analysis and matched review data about restaurants with geospatial data from OpenStreetMap and found that there is a strong negative link between restaurant quality – as defined by star ratings – and proximity to tourist attractions and street corners.

    Other “geo” things

    • Suppose you do not want to be constantly tracked by Google and hence delete its mapping service from your phone. Would that suffice? Google makes it harder to evade its data collection.
    • Here is a video of generating a 3D city model in LOD1 by extrusion, with the software 3dfier developed by the 3D Geoinformation group at TU Delft and partially funded by Kadaster.
    • Japan is mapping its streets in 3D to support their autonomous taxis for the 2020 Olympic Games.
    • Mapbox opened a development office in Detroit.
    • Tanvi Misra published an article on CityLab with the headline “Gorgeous Maps of an Ugly War” about the ongoing conflict in Ukraine.
    • Owen Powell, a GIS analyst and cartographer, explains his workflow in creating both beautiful and accurate digital 3D maps using Blender and GIS data in great detail.
    • German science magazine “Spektrum” discusses possible effects of the use of electronic navigation aids on our natural sense of direction.
    • User schleuss shares interesting aerial imagery captured during the LA Building import project. A Batman landing rooftop, a hexagonal pool, and a bunch of green cars are among the weird things seen from above in Los Angeles.

    Upcoming Events

    Where What Date Country
    Zaragoza OpenStreetMap Spain Association at the Wikimedia-ES conference 16/09/2016 spain
    Zaragoza Spanish 10th anniversary mapping party 17/09/2016 spain
    Karnali Tikapur Mapping organized by Kathmandu Living Labs and Practical Action Nepal Tikapur 18/09/2016 nepal
    Rennes Rencontres mensuelles 19/09/2016 france
    Nottingham Nottingham 20/09/2016 united kingdom
    Edinburgh Edinburgh 20/09/2016 united kingdom
    Brussels HOT Summit 2016 22/09/2016 belgium
    Brussels HOT Summit Missing Maps Mapathon 22/09/2016 belgium
    Grenoble Rencontre groupe local 26/09/2016 france
    Leoben Stammtisch Obersteiermark 29/09/2016 austria
    Kyoto 京都オープンデータソン2016 vol.2(吉田神社) with 第1回諸国・浪漫マッピングパーティ 01/10/2016 japan
    Metro Manila State of the Map Asia 2016 01/10/2016-02/10/2016 philippines
    Taipei Taipei Meetup, Mozilla Community Space 03/10/2016 taiwan

    Note: If you would like to see your event here, please put it into the calendar. Only data which is there will appear in weeklyOSM. Please check your event in our public calendar preview and correct it where appropriate.

    This weekly was produced by Hakuch, Nakaner, Rogehm, SeleneYang, SomeoneElse, SrrReal, derFred, escada, jinalfoflia, mgehling.

    by weeklyteam at September 16, 2016 10:59 AM

    September 15, 2016

    Wikimedia Foundation

    Wikipedia has all the Emmy facts you want, commercial-free

    Photo by FA2010, altered by Aubrie Johnson, public domain/CC0. Wikipedia screenshot of “68th Primetime Emmy Awards” licensed under CC BY-SA 3.0.

    Wikipedia editors will be updating articles on Emmy categories and nominees in real time, from Game of Thrones and Mr. Robot to Sherlock. You can watch the 68th Primetime Emmy Awards with them this Sunday to get the full rundown on your favorite programs, ad-free and in real-time.

    CAWylie, a Wikipedian who edits many Emmy-related articles, says that Wikipedians have a formula for updating quickly and accurately. “The main advantage Wikipedia has in updating current events is it seems more streamlined. Usually we have templates in place wherein the nominees are already listed, so updating it live is simply a formatting move or boldfacing the winner.”

    That means that if a show you haven’t watched yet or a person you haven’t seen before wins, you’ll be able to jump on a second screen to a Wikipedia article and learn about them, even as they’re being edited.

    You can find almost anything Emmy-related on Wikipedia, including records from every category. Last year, for example, Game of Thrones broke the record for most wins garnered by a single season, and Saturday Night Live has been nominated 209 times in its lifetime.

    Most of the ceremony’s hundreds of categories have detailed Wikipedia pages dating back to the birth of the Emmys in 1949. Learn more about nominations for Outstanding Character Voice-Over Performance and Outstanding Hairstyling for a Single-Camera Series, or dig up retired categories like the Super Emmy Award!

    CAWylie added, “Emmy articles are an ebb and flow type that require a WikiProject to monitor and follow continuity year to year. That’s where the aforementioned formula works best.”

    And it’s this constant ebb and flow that makes Wikipedia a valuable second screen. No matter who wins in the actual program, it’s almost as if Wikipedia is running a competition of its own. Check out the pageview standings below to find out which film or series is Wikipedia’s most popular.

    Have fun watching the Emmy Awards, and don’t forget to take us with you.

    Photo from the/altered by White House staff, public domain/CC0.

    Drama holds Wikipedia’s Iron Throne

    If Wikipedia views are a viable metric, Game of Thrones will run away with an Emmy.

    With over twelve million pageviews from July 2015 to May 2016, which roughly equates to the period during which an Emmy-nominated show must air,* Game of Thrones is far ahead of the other programs. It’s part of a wealth of riches in the drama category, as Mr. Robot and House of Cards—sitting behind Game of Thrones with seven and six million pageviews, respectively—have more pageviews than any program in the other categories.

    Other Emmy wiki-winners include Fargo, Modern Family, and, with the lowest total of any first-place finisher, Last Week Tonight with John Oliver.

    Photo by Universal Television, public domain/CC0.
    Outstanding Comedy Series

    1. Modern Family: 3,552,759
    2. Silicon Valley: 2,448,528
    3. Unbreakable Kimmy Schmidt: 2,296,114
    4. Master of None: 1,734,987
    5. Transparent: 1,350,759
    6. Veep: 1,310,872
    7. Black-ish: 857,052

    Image by DreamWorks Television and/or FX, public domain/CC0.
    Outstanding Drama Series

    1. Game of Thrones: 12,515,992
    2. Mr. Robot: 7,125,310
    3. House of Cards: 6,175,630
    4. Better Call Saul: 4,835,770
    5. Downton Abbey: 4,549,164
    6. Homeland: 3,731,915
    7. The Americans: 1,991,590

    Image, public domain/CC0.
    Outstanding Variety Talk Series

    1. Last Week Tonight with John Oliver: 892,225
    2. The Late Late Show with James Corden: 678,439
    3. The Tonight Show Starring Jimmy Fallon: 564,653
    4. Jimmy Kimmel Live!: 436,162
    5. Comedians in Cars Getting Coffee: 432,823
    6. Real Time with Bill Maher: 202,884

    Outstanding Variety Sketch Series

    1. Saturday Night Live: 2,150,568
    2. Key & Peele: 1,074,838
    3. Drunk History: 736,562
    4. Portlandia: 704,956
    5. Inside Amy Schumer: 538,476
    6. Documentary Now!: 371,930

    Image, public domain/CC0.
    Outstanding Limited Series

    1. Fargo: 5,011,284
    2. The Night Manager: 2,570,953
    3. American Crime: 1,378,060
    4. The People v. O. J. Simpson: American Crime Story: 294,137
    5. Roots (2016 miniseries): 257,956

    Outstanding Television Movie

    1. Luther: 2,243,733
    2. "The Abominable Bride" (Sherlock): 831,843
    3. A Very Murray Christmas: 656,858
    4. All the Way: 318,800
    5. Confirmation: 304,215

    Image, public domain/CC0.
    Outstanding Reality-Competition Program

    1. The Voice: 2,887,390
    2. Dancing with the Stars: 2,259,556
    3. American Ninja Warrior: 1,028,929
    4. Project Runway: 739,489
    5. The Amazing Race: 688,460
    6. Top Chef: 424,176

    *Statistics here are derived from Pageviews Analysis and run from July 2015 to May 2016, which roughly corresponds with the Emmy eligibility criteria (June 2015 to May 2016); reliable and accurate data is not available for June and earlier. The time period does rob some articles of views—Roots, for example, started airing at the end of May 2016 and is possibly the most heavily affected article, with several hundred thousand views lost. Statistics for The People v. O. J. Simpson: American Crime Story were affected by a page move that added a space between “O” and “J”; the numbers, however, should still be accurate.

    Aubrie Johnson, Social Media Associate
    Ed Erhart, Editorial Associate
    Wikimedia Foundation

    by Aubrie Johnson and Ed Erhart at September 15, 2016 11:44 PM

    Editing Wikipedia for a decade: David García

    Members of Wikimedia Spain; García is in the front row, left side, with a black and yellow Pac-Man shirt. Photo by Pedro J Pacheco, CC BY-SA 4.0.

    Way back in August 2003, David García was browsing the Spanish Wikipedia while looking for information about mathematics and computer science, but he found it lacking and decided to fix it.

    Hitting the “edit” button that day was a decision that changed García’s life—he has been editing, with a few breaks, ever since, celebrating thirteen years just a few months ago. Wikipedia has evolved over the years, and what García calls his first “little changes” have since turned into significant contributions that have not gone unnoticed.

    A web developer, García was born and raised in Madrid, Spain; he now lives in Chicago, Illinois, in the United States. García has created 2,242 pages on the Spanish Wikipedia, and he has edited Wikipedia and the Wikimedia projects more than 100,000 times because he loves the idea of sharing.

    “Being able to share what you’ve learned with other people is exciting,” he explains. “I love the idea of helping people and getting help from others. It is an amazing feeling when you are in Spain and can get guidance from someone in South Africa, China, Bolivia, Mexico or anywhere.”

    García’s first article on Wikipedia was about Mahjong. Photo by yui, CC BY 2.0.

    Over the years, Wikipedia’s appearance, structure and content have changed extensively. In the first few years, not only was Wikipedia very different from what it is today, site visitors had different expectations and editors had different motivations for contributing.

    “The Spanish Wikipedia had around 10,000 articles when I started editing,” García recalls. It now has 1.3 million. “The layout was totally different then—it looked like most websites in the 1990s. I remember that important articles were missing. There was no entry for triangular number until I created it in January 2004, and it did not include articles on many famous people like Martin Luther King until others finally created those.”

    García has quit editing Wikipedia twice, once between 2006 and 2008 and then in 2014 for a year. During this time he had used Wikipedia to look for information, and every time he asked himself, “Why don’t you participate again?” The visceral nature of the site, however, means that he feels Wikipedia has changed when he returns; he’s found that he needs to “get used to it and learn about the policy and community nature all over again” each time.

    García believes that being a Wikipedian does not necessarily require experience in the field you plan to write about. Since it is an encyclopedia that “anyone can edit,” anyone is eligible to do so at any time, once they have the references for a topic. If they don’t know where to start or what to write about, they can take on anything that interests them.

    Sometimes, for example, García has decided to cover a topic of professional interest to him, like the article he wrote on prime numbers, a good article on the Spanish Wikipedia (meaning that it has met high-quality standards set by the community). He has also written about topics of personal interest, such as the most recent article he has created—Lapsed Catholic (in Spanish: Católico No Practicante).

    Samir ElSharbaty, Digital Content Intern
    Wikimedia Foundation

    “The decade” is a new blog series that will, as you might expect, profile Wikipedians who have spent ten years or more editing Wikipedia. If you know of a long-time editor who we should talk to, send us an email at blogteam[at]wikimedia.org.

    by Samir Elsharbaty at September 15, 2016 07:19 PM

    Wiki Education Foundation

    Science students become science communicators through Wikipedia

    When students study a new topic, they often turn to a search engine to get a better understanding of the topic. Those search results take them to Wikipedia, where (hopefully) they find a comprehensive and understandable summary. As they begin to understand the concept, they scroll to the bottom to find sources for further reading. Students find links to academic articles within their university libraries and click through for a deeper reading.

    That’s how it works for students. But what about the rest of the world – those who can’t access those journal articles? Wikipedia may be their only source of information.

    That’s one of the reasons we launched the Wikipedia Year of Science. If Wikipedia is the general public’s science primer, we believe it should be as comprehensive and accurate as possible. Most importantly, it should be understandable.

    This year, science students all over the United States and Canada have participated in our initiative to create science content that your typical non-scientist can understand. They’re educating the public while learning how to communicate science. Students are already making Wikipedia better for the world. But we’re not satisfied yet!

    That’s why we attended so many science conferences this summer—to spread the word about teaching with Wikipedia.

    In July, we attended the Allied Genetics Conference, where we met dozens of university instructors who want the public to understand how geneticists’ research is transforming the world. We joined plant biologists at the American Society of Plant Biologists’ annual meeting, where scientists stressed the importance of educating the world about increasing the food supply over the next century. Again, Wikipedia is the place to do so. Later in August, Wiki Ed attended the Botanical Society of America’s conference, the Joint Statistics Meeting, MathFest, the Ecological Society of America’s conference, and the American Chemical Society’s fall meeting.

    The common thread across all of these events? Science communication. In fact, a quick search of these conferences’ programs turns up nearly 100 results for sessions about science communication and public engagement. In a digital world that provides so much information to the curious among us, scientists need to learn how to speak to people without their expertise and rigorous research background. Writing Wikipedia is one way our future scientists can develop this skill.

    Won’t you join us? If you’d like to work with us during the Year of Science and beyond, we’d love to hear from you. Whether you’re a higher education instructor looking to bring Wikipedia into your course, a librarian looking to expand access to your special collections with a Visiting Scholar, or you’re interested in offering financial support, reach out to us: contact@wikiedu.org.

    by Jami Mathewson at September 15, 2016 04:00 PM

    September 14, 2016

    Wiki Education Foundation

    Monthly Report for August 2016

    Highlights

    • The Classroom Program has been busy onboarding instructors for the fall term. The fall marks the second half of the Year of Science, and continues Wiki Ed’s trend of growth. We’ve nearly doubled the number of supported courses compared to this time last fall.
    • Wiki Ed staff was on the road in August, promoting the Year of Science with instructors at the Joint Statistics meeting, MathFest, the Ecological Society of America, and American Chemical Society events. These events create an environment for face-to-face contact with experts in the sciences, and provide an opportunity to raise awareness among the scientific community.
    • We’ve opened new Visiting Scholars positions: Brown University, which focuses on ethnic studies, and Temple University, which is seeking to contribute to the improvement of articles about Philadelphia history and/or the Holocaust.
    • Wiki Ed has produced a new subject-specific brochure for students developing Wikipedia content on topics related to linguistics. The guide is a nod to our partnership with the Linguistics Society of America, and discusses scaffolds and frameworks for articles related to dialects and concepts in linguistics.
    • Wiki Ed’s Student Learning Outcomes research project’s surveys were approved by the University of Massachusetts Amherst’s Human Subjects Research Protection Office and Institutional Review Board. These voluntary surveys were distributed to students via the Wiki Ed Dashboard, and will form the core of Research Fellow Zach McDowell’s analysis.

    Programs

    Educational Partnerships

    An instructor at the Botany 2016 Conference in Savannah, Georgia learns more about Wikipedia as a teaching tool.

    August was a busy month for the Educational Partnerships team. Staff traveled to several academic conferences to promote the Wikipedia Year of Science. Educational Partnerships Manager Jami Mathewson attended the Botanical Society of America’s annual meeting in Savannah, Georgia, where she spoke with botanists about plant physiology and taxonomy. While there, she talked to instructors about the role their students can play in improving Wikipedia articles related to the plants they study.

    Director of Programs LiAnna Davis speaks to a MathFest 2016 attendee.


    Outreach Manager Samantha Erickson attended the Joint Statistics Meeting in Chicago. This visit focused on increasing communication skills in students through Wikipedia, encouraging instructors to see the role our assignments have in elevating the public understanding of statistical concepts.

    Director of Programs LiAnna Davis joined Jami at MathFest in Columbus, Ohio. Math instructors expressed an increased interest in Wikipedia writing assignments, based on the communication experience they provide. Students need to develop communication skills during their studies. Math departments want to help their students be more competitive when they enter the workforce. A Wikipedia assignment is an excellent fit, since math articles on Wikipedia, though often accurate, are difficult for laypeople to comprehend. When students translate that content and make it accessible to the general public, they build skills otherwise overlooked in the math classroom.

    Educational Partnerships Manager Jami Mathewson speaks with an attendee at the Ecological Society of America conference.

    Jami and Samantha went to Fort Lauderdale for the Ecological Society of America conference. There, instructors and students alike were interested in Wiki Ed’s Ecology handbook, which aids ecologists and experts in editing Wikipedia.

    Wrapping up the month of travel and outreach, Jami attended the American Chemical Society’s fall meeting in Philadelphia, PA. There, she presented to attendees about using Wikipedia as a pedagogical tool in the chemistry classroom. She also joined the Simons Foundation’s edit-a-thon, where participants learned how to contribute and focused on articles about chemistry or women chemists.

    Classroom Program

    Status of the Classroom Program for Fall 2016 as of August 31:

    • 138 Wiki Ed-supported courses were in progress (69, or 50%, were led by returning instructors)
    • 720 student editors were enrolled
    • 86% of students were up-to-date with the student training
    • Students edited 32 articles and created 2 new entries.

    The Fall 2016 term has started, and we’re well on our way to supporting our largest number of classes to date. This time last year, Wiki Ed had 86 courses in progress, compared to 138 this term. In Fall 2015 as a whole, we supported 162 courses, just 24 more than where we stand today, with the fall term yet to begin. This growth is due in large part to our outreach team and to our ability to provide instructors and students with meaningful learning experiences.

    As the Fall term begins, we’re also entering the second half of the Year of Science. So far for Fall 2016, we have 82 courses in STEM and social science fields, and we anticipate many more to come on board as the term progresses. In Spring and Summer 2016, we supported 130 courses and over 2,300 students during this year-long initiative to improve science content on Wikipedia and science literacy and communication among our students. Our Year of Science courses have ranged from genetics to archaeology and from sociology to plant biology. With a half year still to go, our students have already made a significant impact on Wikipedia. They’ve added over 2.3 million words, edited over 2,300 articles, and created almost 200 new entries. We’re excited to see what the second half of the Year of Science brings!

    Some examples of article expansions are coming in from summer courses:

    • Tamarins are small monkeys found in Central and South America. The black tamarin, one of the smallest primates, is found exclusively in northeastern Brazil, where it is threatened by habitat destruction. At the start of the summer, the Wikipedia article on the black tamarin was a two-sentence stub. Students in Nancy Clum’s Biology 124 BK class spent the summer expanding the article. They added a section describing the species, and others about its distribution, behavior, feeding, reproduction, and conservation status. And in so doing, they turned a stub into an informative article.
    • Kathryn Grafton’s course from the University of British Columbia took advantage of the shorter term length in the summer to investigate a specific kind of content gap: omissions within articles themselves. Students looked at articles related to knowledge mobilization, a term describing how research is or could be brought out of academia and into public use. Students looked at research, knowledge mobilization, and scholarly analysis, highlighting where Wikipedia did not include important, relevant information, often from outside the West. Because all our student editors are Wikipedians, they didn’t stop with criticism and have proposed changes, with sources from their research, for each article!

    Community Engagement

    This month we are happy to announce two new opportunities for Visiting Scholars. The first is through Brown University’s John Nicholas Brown Center for Public Humanities and Cultural Heritage, which is looking for a Wikipedian to improve articles about ethnic studies. Supporting the position at Brown are Jim McGrath, Postdoctoral Fellow in Digital Public Humanities, and Susan Smulyan, the Center’s Director. The second position is at Temple University, which would like to support a Wikipedian’s work on subjects related to the history of Philadelphia, the history of African Americans in Philadelphia, and/or the history and study of the Holocaust. Associate University Librarian Steven Bell is supporting the position at Temple University Libraries. There’s more information about these positions in our blog posts about them:

    Community Engagement Manager Ryan McGrady is focused on recruiting experienced Wikipedians for the open positions. He also continued to work with several other sponsors at various stages of the onboarding process, and with new contacts made thanks to Jami and Samantha’s outreach at recent conferences.

    The current Visiting Scholars continued to produce some stand-out work. George Mason University’s Gary Greenbaum brought Mr. Dooley up to Featured Article status. At the end of the month, it was also selected as “Today’s Featured Article” on Wikipedia’s main page. Barbara Page’s article, Serial rapist, was also featured on the main page in the Did You Know section with the following: “[Did you know] … that serial rapists are more likely to be strangers to their victims than single-victim rapists?”


    Program Support

    Communications

    Communications Manager Eryk Salvaggio worked with Product Manager for Digital Services Sage Ross to organize a touchup of the Wiki Education Foundation’s website. The new site encourages deeper reads for specific audiences with visual cues directing readers to the blog, teaching resources, and fundraising pages.

    In August, we also announced publication of a new subject-specific brochure, Editing Wikipedia articles on Linguistics. The guide was written with input from Dr. Gretchen McCulloch and Wikipedia editors User:Cnilep, User:Uanfala, and our own Wikipedia Content Expert in Humanities, Adam Hyland. It takes student editors through the process of writing or improving Wikipedia articles, with templates for structuring articles on languages, dialects, and linguistic concepts.

    Eryk worked with Wikipedia Content Expert in the Sciences Ian Ramjohn to complete some updating of our training modules for students and the instructors’ orientation.

    Blog posts:

    External Media:

    1. Students take on role of Wikipedia editors George St. Martin, News @ Northeastern (University) (August 1)
    2. Чому Вікіпедія важлива для жінок у науці Eryk Salvaggio, “Why Wikipedia Matters to Women in Science,” translation by Vira Motorko for Wikimedia Ukraine. (August 1)
    3. When the professor says it’s OK to use Wikipedia Janelle Nanos, Boston Globe (scroll down for story) (August 2)
    4. O ano da ciência se propaga no Brasil Eryk Salvaggio, “The Year of Science Spreads in Brazil,” translated into Portuguese by Victor Barcellos for Ciencia Aberta (August 8)
    5. Open Educational Practice: Unleashing the Potential of OER TJ Bliss, EdSurge (August 9)
    6. Midwest Political Science Association (MPSA) Newsletter (August 18)
    7. Volunteer Expands Pitt’s Reach, One Wikipedia Citation at a Time Sharon S. Blake, Pitt Chronicle (August 22)
    8. 5 razões pelas quais tarefas na Wikipédia podem melhorar habilidades de comunicar ciência Eryk Salvaggio, “5 Reasons Wikipedia Assignments Can Improve Science Communication Skills,” translated by David Alves for the Neuromat Blog (Brazil, in Portuguese) (August 23)

    Digital Infrastructure

    With the Fall term ramping up, Sage spent much of August fixing bugs in the course creation and course cloning features, adding Dashboard features that make it easier for Wiki Ed staff to onboard and monitor new courses, and updating the dashboard survey functionality to meet UMass Amherst’s Institutional Review Board requirements for the Student Learning Outcomes research. Work also continues on making the dashboard codebase easier for new developers to get started.

    Among the more noticeable improvements:

    • The ‘Overview’ tab of each course now shows the number of images uploaded.
    • Students can now see and edit the article(s) they are working on directly from the course Overview. (This feature debuted earlier, but had been disabled for the last few months.)

    Research and Academic Engagement

    Research

    In August, Data Science Intern Kevin Schiroo analyzed the portion of academic content produced by Wiki Ed students. This came in two parts. First, Kevin uncovered and incorporated new “signals” within an article that identified whether it was related to academic content. That classifier pulled information from references, introduction text, templates, and the use of academic words to classify articles as academic or not academic with a high degree of accuracy.

    After constructing the classifier tool, Kevin applied it to a sample of Wikipedia pages to calculate the total productivity of Wiki Ed students within topics deemed academic.

    Percent of general academic content contributed by Wiki Ed student editors.

    When we consider all academic content, we saw some substantial contribution rates. Over the spring term, Wiki Ed student editors averaged 2.6% of all content. However, the entire term can be misleading, since we do not expect significant contributions early in the term, when most classes aren’t active. Shortening the window to 30 days shows that we produced 4.6% of all content between mid-April and mid-May.

    Percent of early-stage academic content contributed by Wiki Ed student editors.

    We also examined contribution rates for early academic content, since this is a focal area for Wiki Ed. Here, we see substantially higher contribution rates. Over the whole of last term, Wiki Ed’s student editors produced 6.6% of this content; during our most active period (between mid-April and mid-May) we produced 10.1% of all early academic content, that is, either new articles or articles that were in a fledgling stage of development when students first encountered them.

    Details can be found on meta and a general overview is available on Wiki Ed’s website.

    Student Learning Outcomes Research

    The Human Subjects Research Protection Office / IRB at the University of Massachusetts Amherst approved the protocol titled “Student Learning Outcomes using Wikipedia Based Assignments” just three weeks after submission. Research Fellow Zach McDowell worked with experts from various fields (particularly information literacy and composition and rhetoric) who have taught with Wikipedia-based assignments to refine the survey’s assessments. Zach constructed the initial assessment and survey tool on the Wiki Ed dashboard, engaged the Wiki Ed team in a round of testing and feedback, and implemented those changes. Additionally, Zach gathered valuable feedback from board members, helping to further improve and shape the research questionnaires. These changes were re-submitted to the IRB and approved.

    Helaine, Eryk, and Zach worked on refining a communications strategy for instructors and students, including a script for an introductory video about the research project, which will be shown to students before they are provided with a consent form.

    Final approval for this phase of the project was received in late August. The survey has been released, with emails sent in waves to students and instructors who had already onboarded. These emails go out multiple times a week to students and instructors as they sign up for classes, informing them of the study and encouraging them to participate.


    Finance & Administration / Fundraising

    Finance & Administration

    Monthly expenses for the Wiki Education Foundation in August 2016.

    For the month of August, expenses were $150,664 versus our planned spending of $208,033. The $57k variance was primarily due to the departure and continued vacancy of two staff positions ($21k), as well as cutbacks and savings in travel expenses ($26k).

    Year to Date expenses for the Wiki Education Foundation as of August 2016.

    Our year-to-date expenses are $335,774 versus our planned expenditures of $436,676. Along with the staff vacancies and cutbacks in travel mentioned above, the $101k variance is also a result of deferring our fundraising and marketing campaigns ($49k) until later in the year.

    Fundraising

    Current priorities:

    • Securing new funding for fall 2016
    • Renewing major institutional funders in early 2017

    Office of the ED

    Current priorities:

    • Securing funding
    • Preparing for the strategic planning process

    In August, Executive Director Frank Schulenburg started the “Executive Director’s Major Gift Campaign,” reaching out to high-net-worth individuals via personalized solicitation letters, followed by phone calls with prospects. The goal of this new initiative is to measure the effectiveness of an in-house mail campaign based on a highly curated list of contacts. Also, using A/B testing, we’ll compare the results of different messages and the returns from different target groups. Inspired by the #100wikidays challenge, we aim to send 100 letters in 100 days.

    Frank began preparing for the upcoming strategic planning exercise. With our current strategic plan expiring in 2017, Frank and the board will create a new strategy for the next two years. In preparation for the kick-off, Frank started drafting a process and gathering materials for distribution among the participants of the planning exercise.

    Finally, Frank and the members of the senior leadership team used the first iteration of the new “Executive Director’s Summary Report,” which aims to make us more effective at tracking organizational performance indicators on a monthly basis. Based on the feedback received this month, we’ll further improve the usefulness of the report in future iterations.


    Visitors and guests

    • Tobias Kleinlercher, Wikipedian from Austria
    • Brenda Laribee, Consultant

    by Eryk Salvaggio at September 14, 2016 07:11 PM

    Wikimedia Foundation

    Victory in Brazil as court rules in favor of Wikimedia Foundation

    Rio de Janeiro. Photo by Claudney Neves, CC BY-SA 3.0.

    We are happy to announce that the 6th Civil Court of Jacarepaguá in Rio de Janeiro, Brazil has ruled in favor of the Wikimedia Foundation in an injunction claim filed by Brazilian musician Rosanah Fienngo.

    Ms. Fienngo filed a lawsuit objecting to information on her personal life in the Portuguese Wikipedia article about her. The court stated that although the information available on her Wikipedia page concerned her private life, Ms. Fienngo had already disclosed that information to the media herself, so its inclusion on Wikipedia was not an invasion of her privacy.

    The Portuguese Wikipedia article about Ms. Fienngo contained information about her as a notable public figure in Brazil. It included some details of her personal life, but those details were derived from public sources, most of which Ms. Fienngo had provided herself, such as an interview she gave to the gossip website O Fuxico.

    In 2014, Ms. Fienngo filed an indemnification claim against Google Brasil and “Wikipedia,” apparently believing that Google was responsible for the content of Wikipedia. In November 2014, the Wikimedia Foundation received word that a Brazilian court had ruled against “Wikipedia” in Ms. Fienngo’s suit. The Wikimedia Foundation was not a party to this action and received no advance notice of the case. The court order required removal of the article about Ms. Fienngo, and imposed a daily fine for as long as the article remained online. The article was then removed from Portuguese Wikipedia by community members.

    In response, the Wikimedia Foundation argued that the article was written using information already publicly available online, including statements Ms. Fienngo had made in published interviews. Additionally, as a public figure, Ms. Fienngo has a reduced “sphere of privacy,” and celebrities do not need to approve articles written about themselves using publicly available information.

    This decision confirms that the information that was in the article about Ms. Fienngo was appropriate to host on Wikipedia in both Brazil and the United States. It should be noted, though, that Ms. Fienngo retains the right to appeal to the Brazilian State Court of Appeals. We believe the decision was strong enough that community members should feel free to make their own editorial decisions about writing articles like the one about Ms. Fienngo, which, as of publishing time, remains deleted.

    Overall, this decision is a positive outcome for Wikimedia. This ruling supports the ability of Wikipedians in Brazil and all around the world to create accurate and well-sourced articles, even if the information in those articles may sometimes be unflattering to the article’s subject. Those who share personal information with the media should expect that it will be available to a large number of people, and may someday appear on Wikipedia.

    The Wikimedia Foundation will continue to support you, the global community, in constructing the best encyclopedia possible to aid in the dissemination of free knowledge.

    Jacob Rogers, Legal Counsel
    Wikimedia Foundation

    We would like to extend our sincerest gratitude to Koury Lopes Advogados for their excellent representation in this matter, especially Tania Liberman, Eloy Rizzo, Tiago Cortez, Daniel Rodrigo Shingai, and Yasmine Maluf. We would also like to extend special thanks to legal fellow Leighanna Mixter for her assistance in preparing this blog post.

    by Jacob Rogers at September 14, 2016 05:25 PM

    Wikimedia UK

    Help shape Wikimedia UK’s delivery plans for 2017 – 18

    Wikimedia UK evaluation panel, June 2016. Photo by Wolliff (WMUK), CC BY-SA 4.0

    Wikimedia UK will soon be applying to the Wikimedia Foundation for an Annual Plan Grant (APG) in 2017 – 18. Longstanding volunteers, members and other stakeholders will be familiar with this process but for those of you who aren’t, an APG enables affiliated organisations around the world – including country ‘chapters’ of the global Wikimedia movement, like Wikimedia UK – to access funds raised by the Foundation through the Wikipedia banner campaign.

    The deadline for proposals is 1st October, and we will need to submit our draft delivery plan for next year as well as the proposal itself. On Saturday 24th September we will be holding a day of meetings to discuss and develop our proposal and our delivery plans for next year alongside the wider Wikimedia UK community. These include a meeting of the Evaluation Panel in the morning, followed by a discussion focused on education from 12pm to 3pm and a Planning Lab from 3pm to 5pm.

    The education meeting will give participants the opportunity to feed into our emerging plans for education and help us to shape an education conference in early 2017. At the Planning Lab we will share our plans for partnerships and programmes in 2017, with a view to incorporating feedback and ideas into our proposal to the Wikimedia Foundation, and enabling volunteers to identify how they might get involved with Wikimedia UK over the next year.

    All meetings will take place at Development House near Old Street, London, and are open to all, but signing up in advance is essential (see below for links). Refreshments, including lunch during the education meeting, will be provided, and support for travel is available if Wikimedia UK is notified in advance by email to karla.marte@wikimedia.org.uk.

    Education eventbrite registration page.

    Planning Lab eventbrite registration page.

    by John Lubbock at September 14, 2016 02:48 PM

    Luis Villa

    Copyleft and data: databases as poor subject

    tl;dr: Open licensing works when you strike a healthy balance between obligations and reuse. Data, and how it is used, is different from software in ways that change that balance, turning compromises that are reasonable in software (like attribution) into insanely difficult barriers.

    In my last post, I wrote about how database law is a poor platform to build a global public copyleft license on top of. Of course, whether you can have copyleft in data only matters if copyleft in data is a good idea. When we compare software (where copyleft has worked reasonably well) to databases, we’ll see that databases are different in ways that make even “minor” obligations like attribution much more onerous.

    Card Puncher from the 1920 US Census.

    How works are combined

    In software copyleft, the most common scenarios to evaluate are merging two large programs, or copying one small file into a much larger program. In these scenarios, understanding how licenses work together is fairly straightforward: you have two licenses. If they can work together, great; if they can’t, then you don’t go forward, or, if it matters enough, you change the license on your own work to make them compatible.

    In contrast, data is often combined in three ways that differ significantly from how software is combined:

    • Scale: Instead of a handful of projects, data is often combined from hundreds of sources, so doing a license-conflict analysis when any of those sources carry conflicting obligations (like copyleft) is impractical; a toy sketch of this pairwise analysis follows the list. Peter Desmet did a great job of analyzing this in the context of an international bio-science dataset, which has 11,000+ data sources.
    • Boundaries: There are some cases where hundreds of pieces of software are combined (like operating systems and modern web services) but they have “natural” places to draw a boundary around the scope of the copyleft. Examples of this include the kernel-userspace boundary (useful when dealing with the GPL and Linux kernel), APIs (useful when dealing with the LGPL), or software-as-a-service (where no software is “distributed” in the classic sense at all). As a result, no one has to do much analysis of how those pieces fit together. In contrast, no natural “lines” have emerged around databases, so either you have copyleft that eats the entire combined dataset, or you have no copyleft. ODbL attempts to manage this with the concept of “independent” databases and produced works, but after this recent case I’m not sure even those tenuous attempts hold as a legal matter anymore.
    • Authorship: When you combine a handful of pieces of software, most of the time you also control the licensing of at least one of those pieces, and you can adjust its licensing as needed. (Widely used exceptions to this rule, like OpenSSL, tend to be rare.) In other words, if you’re writing a Linux kernel driver, or a WordPress theme, you can choose a license that makes the combination compliant. That is not necessarily the case in data combinations: if you’re making use of large public data sets, you’re often combining many data sources of which you aren’t the author. So if some of them have conflicting license obligations, you’re stuck.
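
    To illustrate the scale point flagged above, here is a toy sketch of what a pairwise license-conflict analysis looks like once hundreds of sources are involved. The compatibility table is a crude, invented stand-in (it is not legal advice), and the dataset names are hypothetical:

```python
from itertools import combinations

# Hypothetical compatibility table between a few data licenses (illustrative only).
COMPATIBLE = {
    ("CC0", "CC0"), ("CC0", "CC-BY"), ("CC0", "ODbL"),
    ("CC-BY", "CC-BY"), ("CC-BY", "ODbL"),
    ("ODbL", "ODbL"),
}

def compatible(a: str, b: str) -> bool:
    return (a, b) in COMPATIBLE or (b, a) in COMPATIBLE

def find_conflicts(sources: dict) -> list:
    """Return every pair of sources whose licenses cannot be combined."""
    return [
        (a, b)
        for (a, la), (b, lb) in combinations(sources.items(), 2)
        if not compatible(la, lb)
    ]

# 300 sources, one of which carries an unusual share-alike license.
sources = {f"dataset_{i}": ("ODbL" if i % 7 == 0 else "CC-BY") for i in range(300)}
sources["dataset_5"] = "Custom-ShareAlike"

print(len(list(combinations(sources, 2))), "pairs to check")   # 44850
print(len(find_conflicts(sources)), "conflicting pairs")       # 299
```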

    How attribution is managed

    Attribution in large software projects is painful enough that lawyers have written a lot on it, and open-source operating systems vendors have built somewhat elaborate systems to manage it. This isn’t just a problem for copyleft: it is also a problem for the supposedly easy case of attribution-only licenses.

    Now, again, instead of dozens of authors, often employed by the same copyright owner, imagine hundreds or thousands. And instead of combining these pieces in basically the same way each time you build the software, imagine that every time you run a different query, you have to provide different attribution data (because the relevant slices of data may have different sources or authors). That’s data!

    The least-bad “solution” here is to (1) tag every field (not just data source) with licensing information, and (2) have data-reading software create new, accurate attribution information every time a new view into the data is created. (I actually know of at least one company that does this internally!) This is not impossible, but it is a big burden on data software developers, who must now include a lawyer in their product design team. Most of them will just go ahead and violate the licenses instead, pass the burden on to their users to figure out what the heck is going on, or both.
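
    For concreteness, a minimal sketch of that least-bad solution might look something like the following. The names and record layout are hypothetical; a real implementation would also need provenance for derived values, caching, and far more nuanced license handling:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Field:
    value: object
    source: str      # e.g. "OpenStreetMap"
    license: str     # e.g. "ODbL-1.0"

# Every field, not just the data source, carries its own licensing tag.
record = {
    "name": Field("Café Central", "OpenStreetMap", "ODbL-1.0"),
    "rating": Field(4.5, "SomeReviewSite", "CC-BY-4.0"),
    "population": Field(1_900_000, "NationalCensus", "CC0-1.0"),
}

def view(record: dict, fields: list) -> tuple:
    """Return the requested slice plus attribution generated for that slice only."""
    selected = {k: record[k].value for k in fields}
    attribution = sorted({f"{record[k].source} ({record[k].license})" for k in fields})
    return selected, attribution

data, credits = view(record, ["name", "rating"])
print(data)     # {'name': 'Café Central', 'rating': 4.5}
print(credits)  # ['OpenStreetMap (ODbL-1.0)', 'SomeReviewSite (CC-BY-4.0)']
```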

    Who creates data

    Most software is either under a very standard and well-understood open source license, or is produced by a single entity (or often even a single person!) that retains copyright and can adjust that license based on their needs. So if you find a piece of software that you’d like to use, you can either (1) just read their standard FOSS license, or (2) call them up and ask them to change it. (They might not change it, but at least they can if they want to.) This helps make copyleft problems manageable: if you find a true incompatibility, you can often ask the source of the problem to fix it, or fix it yourself (by changing the license on your software).

    Data sources typically can’t solve problems by relicensing, because many of the most important data sources are not authored by a single company or single author. In particular:

    • Governments: Lots of data is produced by governments, where licensing changes can literally require an act of the legislature. So if you do anything that goes against their license, or two different governments release data under conflicting licenses, you can’t just call up their lawyers and ask for a change.
    • Community collaborations: The biggest open software relicensing that’s ever been done (Mozilla) required getting permission from a few thousand people. Successful online collaboration projects can have 1–2 orders of magnitude more contributors than that, making relicensing hard. Wikidata solved this the right way: by going with CC0.

    What is the bottom line?

    Copyleft (and, to a lesser extent, attribution licenses) works when the obligations placed on a user are in balance with the benefits those users receive. If they aren’t in balance, the materials don’t get used. Ultimately, if the data does not get used, our egos feel good (we released this!) but no one benefits, and regardless of the license, no one gets attributed and no new material is released. Unfortunately, even minor requirements like attribution can throw the balance out of whack. So if we genuinely want to benefit the world with our data, we probably need to let it go.

    So what to do?

    So if data is legally hard to build a license for, and the nature of data makes copyleft (or even attribution!) hard, what to do? I’ll go into that in my next post.

    by Luis Villa at September 14, 2016 01:00 PM

    Resident Mario

    September 13, 2016

    Wikimedia UK

    No article? No problem.

    Generating Article Placeholders on the Welsh Wikipedia

    The Welsh-language Wicipedia already punches above its weight with seventy thousand articles. That’s roughly one article for every eight Welsh speakers. But now a student in Germany has developed a new tool which can fill in the gaps on Wikipedia by borrowing data from another of Wikimedia’s projects – Wikidata.

    The aim of this new feature is to increase access to open and free knowledge on Wikipedia. The Article Placeholder gathers data, images and sources from Wikidata and displays them Wikipedia-style, making them easily readable and accessible.

    Currently the Article Placeholder is being trialled on a few smaller Wikipedias, and after a consultation with the Welsh Wicipedia community it was agreed that we would activate the new extension here in Wales.

    An Article Placeholder for Hobbits on the Welsh Wikipedia

    The most obvious advantage of this functionality is the easy access to information which has not yet been included on Wicipedia, and with 20 million items in Wikidata, it’s not short on information. This in turn should encourage editors to create new articles using the information presented in the Article Placeholder.

    But perhaps the most exciting aspect of using Wikidata to generate Wikipedia content is that Wikidata speaks hundreds of languages, including Welsh! This means that many pages it generates on the Welsh Wikipedia appear entirely in Welsh.

    If the Wikidata entry being used hasn’t yet been translated into Welsh, the Placeholder will display the information in English; however, it is now easier than ever to link from the Placeholder to the Wikidata item and add a Welsh translation. Plans are also underway to hold translate-a-thons with Welsh speakers in order to translate more Wikidata items into Welsh.
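
    As a rough illustration of this fallback behaviour, the sketch below fetches an item’s label from the public Wikidata API and prefers the Welsh (cy) label when one exists, falling back to English otherwise. It is a simplified stand-in, not the ArticlePlaceholder extension’s actual code, and it omits error handling:

```python
import requests

def get_label(qid: str, preferred: str = "cy", fallback: str = "en") -> str:
    """Fetch an item's label from Wikidata, preferring Welsh and falling back to English."""
    resp = requests.get(
        "https://www.wikidata.org/w/api.php",
        params={
            "action": "wbgetentities",
            "ids": qid,
            "props": "labels",
            "languages": f"{preferred}|{fallback}",
            "format": "json",
        },
        timeout=10,
    )
    labels = resp.json()["entities"][qid]["labels"]
    # Use the Welsh label when Wikidata has one, otherwise fall back to English.
    chosen = labels.get(preferred) or labels.get(fallback) or {"value": qid}
    return chosen["value"]

print(get_label("Q2"))  # Q2 = Earth; prints the Welsh label if present, else English
```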

    Welsh can easily be added to any Wikidata label

    It is hoped that embedding this feature into the Welsh language Wicipedia will provide Welsh speakers with a richer Wiki experience and will encourage more editors to create content and add Welsh translations to Wikidata, cementing the place of the Welsh language in the digital realm.

     

    Jason Evans

    Wikimedian in Residence

    National Library of Wales

     

    by Jason.nlw at September 13, 2016 12:07 PM

    Wikimedia Foundation

    In an attempt to modernize copyright laws, the European Commission forgets about users

    Photo by Lin Kristensen, CC BY 2.0.

    Update (15 September 2016): The EC has released their official proposal for the Directive. It differs in some minor ways from the leaked version. Those differences do not substantially affect the analysis and concerns discussed in this post.

    Several documents by the European Commission (EC) leaked during the past couple of weeks, giving us a clear view of the Commission’s plans for EU copyright reform. The EC had great ambitions to modernize copyright and to “ensure wider access to content across the EU.” However, its proposals do not look good for the public’s ability to access and share knowledge on the Internet. The burden is now on the European Parliament and the EU Council to balance the proposal.

    The EC proposes creating a new copyright for publishers that would make it harder for the public to find news articles online and restrict their freedom to share the articles they do find. Another proposal targets online platforms built on user contributions—as the Wikimedia projects are—forcing them to implement technology to monitor for copyright infringement. Just as important are what the EC left out of its proposal, such as an EU-wide freedom of panorama copyright exception to give people the right to share photographs of public spaces. Some of the proposed rules would benefit libraries, museums, schools, and other important institutions for public knowledge. However, these benefits are highly circumscribed and far from outweigh the recommended measures’ harms.

    As part of its public consultation in preparation for its proposal, the EC asked for input specifically on the topic of freedom of panorama. We focused much of our comments on that exception, which has already been adopted in many EU member states. With a full freedom of panorama exception, it does not infringe copyright for people to take and share pictures of art and buildings in public spaces. It is a sensible copyright reform that redounds to the public’s benefit without significantly harming artists’ and architects’ ability to make a living. It is also a major topic in European copyright discussions and within the Wikimedia communities. While the EC does recommend all EU Member States incorporate freedom of panorama into their national law, and it recognizes that “the current situation holds back digital innovation in the areas of education, research, and preservation of cultural heritage”, it does not even consider harmonizing freedom of panorama EU-wide. In its study of possible reforms, it relegates the freedom of panorama issue to a single footnote.

    While failing to propose positive reforms like freedom of panorama harmonization, the EC pushes for regulations that are potentially harmful to Wikimedia. The EC wants to force sites that host “large amounts of works” to enter agreements with rightsholders that would require the services to monitor for copyright infringement on their platforms. The EC seems unconcerned with the difficulty in determining which platforms the law would affect, saying it would be based on “factors including the number of users and visitors and the amount of content uploaded”. Based on those factors, however, the Wikimedia projects may meet the criteria for regulation. There are tens of millions of articles on Wikipedia and media files on Wikimedia Commons, and hundreds of millions of monthly visitors to the Wikimedia sites. However, as seemed to be the consensus at a multistakeholder discussion of similar requirements in the US, it would be absurd to require the Wikimedia Foundation to implement costly and technologically impractical automated systems for detecting copyright infringement. The Wikimedia projects are dedicated to public domain and freely licensed content and they have dedicated volunteers who diligently remove content suspected of infringing copyright. Furthermore, beyond Wikimedia, this proposal would lead to over-removing non-infringing content, with a corresponding chilling effect on free expression and creativity.

    The EC’s recommendations also include the creation of a new 20-year copyright for press publishers—even more extreme than we and others feared. The concern behind the publisher’s right is that sites like Google News that aggregate news articles and list their headlines (accompanied by brief excerpts or summaries) are reducing traffic to news sites and thereby diminishing publishers’ ad revenue. The publisher’s right would force news aggregators to pay fees in order to aggregate articles—potentially including to simply list article headlines—or else be liable for copyright infringement. This proposal would make it more difficult for the public to find and access news articles, because there would be additional financial barriers to providing that access. It would also make it more difficult for new news aggregators to emerge to challenge existing ones. The extraordinarily long term for this right exacerbates these problems. Creating a new copyright for publishers could impair the public’s ability to learn about important events in the world around them.

    The proposal does contain small steps in the direction of positive copyright reform. It grants an exception to “cultural heritage institutions” who make copies of works for preservation purposes. However, “cultural heritage institution” is narrowly defined to include only libraries, archives, museums and film heritage institutions, with apparently no consideration of entities like Wikimedia and Internet Archive that are important for cultural heritage but do not fit a traditional mold. Limiting who is allowed to preserve works makes it more likely that the world will lose them. The proposal also recognizes the value and importance of text and data mining for research. Unfortunately, the proposed exception only covers public interest research institutions.

    These few brighter spots in the EC’s proposal are overshadowed by its many problems. Altogether, the Impact Assessment’s language and focus suggest that the EC’s primary concern is the amount of money legacy publishers are making. They try to frame this as concern for long-term “cultural diversity”, but they offer no support or argument for why it will diminish cultural diversity for businesses built on ink-and-paper revenue models to fail. They appear to give no credit, or even consideration, to the democratic cultural production that has flourished thanks to technological developments and Internet platforms. Instead, they paint these platforms as mostly indifferent to copyright infringement and as obstinate for refusing to capitulate to rightsholders’ demands for overzealous takedown systems.

    There are more issues with this proposal than can be addressed in one blog post, but it should be apparent by now that the EC’s recommendations must not be enacted as legislation. The European Parliament now has the opportunity to amend or reject the proposal.

    Charles M. Roslof, Legal Counsel, Wikimedia Foundation
    John Weitzmann, Legal and Policy Advisor, Wikimedia Germany (Deutschland)

    by Charles M. Roslof and John Weitzmann at September 13, 2016 05:11 AM

    September 12, 2016

    Luis Villa

    Copyleft and data: database law as (poor) platform

    tl;dr: Databases are a very poor fit for any licensing scheme, like copyleft, that (1) is intended to encourage use by the entire world but also (2) wants to place requirements on that use. This is because of broken legal systems and the way data is used. Projects considering copyleft, or even mere attribution, for data, should consider other approaches instead.

    The original database: Hollerith Census Machine Dials, by Marcin Wichary, under CC BY 2.0.

    I’ve been a user of copyleft/share-alike licenses for a long time, and even helped draft several of them, but I’ve come around to the point of view that copyleft is a poor fit for data. Unfortunately, I’ve been explaining this a lot lately, so I want to explain why in writing. This first post will focus on how the legal system around databases is broken. Later posts will focus on how databases are hard to license, and what we might do about it.

    FOSS licensing, and particularly copyleft, relies on legal features database rights lack

    Defenders of copyleft often have to point out that copyleft isn’t necessarily anti-copyright, because copyleft depends on copyright. This is true, of course, but the more I think about databases and open licensing, the more I think “copyleft depends on copyright” almost understates the case – global copyleft depends not just on “copyright”, but on very specific features of the international copyright system which database law lacks.

    To put it in software terms, the underlying legal platform lacks the features necessary to reliably implement copyleft.

    Consider some differences between the copyright system and database law:

    • Maturity: Copyright has had 100 or so years as an international system to work out kinks like “what is a work” or “how do joint authors share rights?” Even software copyright law has existed for about 40 years. In contrast, database law in practice has existed for less than 20 years, pretty much all of that in Europe, and I can count all the high court rulings on it on my fingers and toes. So key terms, like “substantial”, are pretty hard to define – courts and legislatures simply haven’t defined, or refined, the key concepts. This makes it very hard to write a general-purpose public license whose outcomes are predictable.

    • Stability: Related to the previous point, copyright tends to change incrementally, as long-standing concepts are slowly adapted to new circumstances. (The gradual broadening of fair use in the Google era is a good example of this.) In contrast, since there are so few decisions, basically every decision about database law leads to upheaval. Open Source licenses tend to have a shelf-life of about ten years; good luck writing a database license that means the same thing in ten years as it does today!

    • Global nature: Want to share copyrighted works with the entire world? Copyright (through the Berne Convention) has you covered. Want to share a database? Well, you can easily give it away to the whole world (probably!), but want to reliably put any conditions on that sharing? Good luck! You’ve now got to write a single contract that is enforceable in every jurisdiction, plus a license that works in the EU, Japan, South Korea, and Mexico. As an example again, “substantial” – used in both ODbL and CC 4.0 – is a term from the EU’s Database Directive, so good luck figuring out what it means in a contract in the US or within the context of Japan’s database law.

    • Default rights: Eben Moglen has often pointed out that anyone who attacks the GPL is at a disadvantage, because if they somehow show that the license is legally invalid, then they get copyright’s “default”: which is to say, they don’t get anything. So they are forced to fight about the specific terms, rather than the validity of the license as a whole. In contrast, in much of the world (and certainly in the US), if you show that a database license is legally invalid, then you get database’s default: which is to say, you get everything. So someone who doesn’t want to follow the copyleft has very, very strong incentives to demolish your license altogether. (Unless, of course, the entire system shifts from underneath you to create a stronger default – like it may have in the EU with the Ryanair case.)

    With all these differences, what starts off as hard (“write a general-purpose, public-facing license that requires sharing”) becomes insanely difficult in the database context. Key goals of a general-purpose, public license – global, predictable, reliable – are very hard to do.

    In upcoming posts, I’ll try to explain why, even if it were possible to write such a license from a legal perspective, it might not be a good idea because of how databases are used.

    by Luis Villa at September 12, 2016 05:22 PM

    Wiki Education Foundation

    The Roundup: Clouds in your coffee

    Coffee is an essential part of a million morning rituals every day. For millions of bathrobe-clad and bed-headed people, coffee is a cup of pure vitality.

    But there’s a dark side to the dark liquid. For those who experience anxiety symptoms, coffee can encourage the onset of panic attacks. It’s a fascinating and little-discussed side effect, and you can read all about it thanks to students in Dr. Michelle Mynlieff’s Neurobiology course at Marquette University.

    Student editors in that course transformed an article that sat at less than 400 words, expanding it to more than 10 times its original length. The article, “Caffeine-induced anxiety disorder,” discusses how caffeine works, and how it affects anxiety.

    It’s part of a set of articles improved by Dr. Mynlieff’s students, including a major expansion of the article on neuroscientists themselves! The Neuroscientist article sat at just three paragraphs, with two references (and one of them was a dead link). Students expanded the article with historical context and a summary of existing research projects.

    Students expanded the article on Adipsia, the rare decreased sensation of thirst that can be a sign of diabetes. Others expanded Camptocormia, a bent spine often seen among the elderly, and Myoclonus dystonia, a muscle disorder that causes abnormal posture. The camptocormia article was expanded from three sentences into a discussion of the history of the condition, the ways it is diagnosed, and some of its causes, treatments, and current areas of research.

    These students are making large strides toward improving the public’s understanding of how biology and chemistry are part of their lives. That includes detailed improvements to articles on phenomena related to brain tumors, and an overview of how communication between the sympathetic nervous system and the adrenal medulla is involved in anxiety, obesity, and stress.

    They also expanded knowledge available to researchers. For example, students expanded an article on a particular gene, SLC7A11. The absence or impairment of this gene’s expression may play a role in drug addiction and schizophrenia, as well as Alzheimer’s and Parkinson’s diseases.

    These articles have been viewed 610,000 times since these students took them on! Thanks to these students, the entire world has access to knowledge that helps us better understand the way our bodies work.

    Think your students might want to share their knowledge to improve the world? Check out our Year of Science initiative, or send us an e-mail: contact@wikiedu.org.


    Photo: Coffee & Cream by Yuri Ivanov, CC BY 2.0 via Flickr

    by Eryk Salvaggio at September 12, 2016 04:00 PM

    Wikimedia Foundation

    Wikimedia Research Newsletter, August 2016

    AI-generated Wikipedia articles give rise to debate about research ethics

    At the International Joint Conference on Artificial Intelligence (IJCAI) – one of the prime AI conferences, if not the pre-eminent one – Banerjee and Mitra from Penn State published the paper “WikiWrite: Generating Wikipedia Articles Automatically”.[1]

    The system described in the paper looks for red links in Wikipedia and classifies them based on their context. To find section titles, it then looks for similar existing articles. With these titles, the system searches the web for information, and eventually uses content summarization and a paraphrasing algorithm. The researchers uploaded 50 of these automatically created articles to Wikipedia, and found that 47 of them survived. Some were heavily edited after upload, others not so much.
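
    As a small illustration of just one step in that pipeline (inferring section titles from similar existing articles), here is a sketch using a simple frequency heuristic. The sample wikitext and the “appears in at least half of the similar articles” rule are invented for illustration and are not the WikiWrite authors’ implementation:

```python
import re
from collections import Counter

# Toy stand-ins for the wikitext of articles judged "similar" to the red link.
similar_articles = [
    "Intro...\n== Early life ==\ntext\n== Career ==\ntext\n== References ==",
    "Intro...\n== Early life ==\ntext\n== Works ==\ntext\n== References ==",
    "Intro...\n== Career ==\ntext\n== Legacy ==\ntext\n== References ==",
]

def section_titles(wikitext: str) -> list:
    """Extract level-2 headings (== Title ==) from wikitext."""
    return re.findall(r"^==\s*(.+?)\s*==\s*$", wikitext, flags=re.M)

counts = Counter(t for art in similar_articles for t in section_titles(art))
# Keep headings that appear in at least half of the similar articles.
skeleton = [title for title, n in counts.most_common() if n >= len(similar_articles) / 2]
print(skeleton)  # ['References', 'Early life', 'Career']
```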


    While I was enthusiastic about the results, I was surprised by the suboptimal quality of the articles I reviewed – three that were mentioned in the paper. After a brief discussion with the authors, a wider discussion was initiated on the Wiki-research mailing list. This was followed by an entry on the English Wikipedia administrators’ noticeboard (which includes a list of all accounts used for this particular research paper). The discussion led to the removal of most of the remaining articles.

    The discussion concerned the ethical implications of the research, and using Wikipedia for such an experiment without the consent of Wikipedia contributors or readers. The first author of the paper was an active member of the discussion; he showed a lack of awareness of these issues, and appeared to learn a lot from the discussion. He promised to take these lessons to the relevant research community – a positive outcome.

    In general, this sets an example for engineers and computer scientists, who often show a lack of awareness of certain ethical issues in their research. Computer scientists are typically trained to think about bits and complexity, and rarely discuss in depth how their work impacts human lives. Whether it’s social networks experimenting with the mood of their users, current discussions of biases in machine-learned models, or the experimental upload of automatically created content to Wikipedia without community approval, computer science has generally not reached the level of awareness some other sciences have of the possible effects of their research on human subjects, at least as far as this reviewer can tell.

    Even on Wikipedia, there’s no clear-cut, succinct policy I could have pointed the researchers to. The use of sockpuppets was a clear violation of policy, but an incidental component of the research. WP:POINT was a stretch to cover the situation at hand. In the end, what we can suggest to researchers is to check with the Wikimedia Research mailing list: a lot of people there have experience designing research plans with the community in mind, and that can help avoid uncomfortable situations.

    See also our 2015 review of a related paper coauthored by the same authors: “Bot detects theatre play scripts on the web and writes Wikipedia articles about them” and other similarly themed papers they have published since then: “WikiKreator: Automatic Authoring of Wikipedia Content”[2], “WikiKreator: Improving Wikipedia Stubs Automatically”[3], “Filling the Gaps: Improving Wikipedia Stubs”[4]. DV

    Ethics researcher: Vandal fighters should not be allowed to see whether an edit was made anonymously

    A paper[5] in the journal Ethics and Information Technology examines the “system of surveillance” that the English Wikipedia has built up over the years to deal with vandalism edits. The author, Paul B. de Laat from the University of Groningen, presents an interesting application of a theoretical framework by US law scholar Frederick Schauer that focuses on the concepts of rule enforcement and profiling. While providing justification for the system’s efficacy and largely absolving it of some of the objections that are commonly associated with the use of profiling in e.g. law enforcement, de Laat ultimately argues that in its current form, it violates an alleged “social contract” on Wikipedia by not treating anonymous and logged-in edits equally. Although generally well-informed about both the practice and the academic research of vandalism fighting, the paper unfortunately fails to connect to an existing debate about very much the same topic – potential biases of artificial intelligence-based anti-vandalism tools against anonymous edits – that was begun last year[6] by the researchers developing ORES (an edit review tool that was just made available to all English Wikipedia users, see this week’s Technology report) and most recently discussed in the August 2016 WMF research showcase.

    The paper first gives an overview of the various anti-vandalism tools and bots in use, recapping an earlier paper[7] where de Laat had already asked whether these are “eroding Wikipedia’s moral order” (following an even earlier 2014 paper in which he had argued that new-edit patrolling “raises a number of moral questions that need to be answered urgently”). There, de Laat’s concerns included the fact that some stronger tools (rollback, Huggle, and STiki) are available only to trusted users and “cause a loss of the required moral skills in relation to newcomers”, and that there is a lack of transparency about how the tools operate (in particular when more sophisticated artificial intelligence/machine learning algorithms such as neural networks are used). The present paper expands on a separate but related concern, about the use of “profiling” to pre-select which recent edits will be subject to closer human review. The author emphasizes that on Wikipedia this usually does not mean person-based offender profiling (building profiles of individuals committing vandalism), citing only one exception in the form of a 2015 academic paper – cf. our review: “Early warning system identifies likely vandals based on their editing behavior”. Rather, “the anti-vandalism tools exemplify the broader type of profiling” that focuses on actions. Based on Schauer’s work, the author asks the following questions:

    1. “Is this profiling profitable, does it bring the rewards that are usually associated with it?”
    2. “is this profiling approach towards edit selection justified? In particular, do any of the dimensions in use raise moral objections? If so, can these objections be met in a satisfactory fashion, or do such controversial dimensions have to be adapted or eliminated?”

    But snakes are much more dangerous! According to Schauer, while general rules are always less fair than case-by-case decisions, their existence can be justified by other arguments.

    To answer the first question, the author turns to Schauer’s work on rules, in a brief summary that is worth reading for anyone interested in Wikipedia policies and guidelines – although de Laat instead applies the concept to the “procedural rules” implicit in vandalism profiling (such as that anonymous edits are more likely to be worth scrutinizing). First, Schauer “resolutely pushes aside the argument from fairness: decision-making based on rules can only be less just than deciding each case on a particularistic basis”. (For example, a restaurant’s “No Dogs Allowed” rule will unfairly exclude some well-behaved dogs, while not prohibiting much more dangerous animals such as snakes.) Instead, the existence of rules has to be justified by other arguments, of which Schauer presents four:

    • Rules “create reliability/predictability for those affected by the rule: rule-followers as well as rule-enforcers”.
    • Rules “promote more efficient use of resources by rule-enforcers” (e.g. in case of a speeding car driver, traffic police and judges can apply a simple speed limit instead having to prove in detail that an instance of driving was dangerous).
    • Rules, if simple enough, reduce the problem of “risk-aversion” by enforcers, who are much more likely to make mistakes and face repercussions if they have to make case by case decisions.
    • Rules create stability, which however also presents “an impediment to change; it entrenches the status-quo. If change is on a society’s agenda, the stability argument turns into an argument against having (simple) rules.”

    The author cautions that these four arguments have to be reinterpreted when applying them to vandalism profiling, because it consists of “procedural rules” (which edits should be selected for inspection) rather than “substantive rules” (which edits should be reverted as vandalism, which animals should be disallowed from the restaurant). While in the case of substantive rules, their absence would mean having to judge everything on a case-by-case basis, the author asserts that procedural rules arise in a situation where the alternative would be to not judge at all in many cases: because “we have no means at our disposal to check and pass judgment on all of them; a selection of a kind has to be made. So it is here that profiling comes in”. With that qualification, Schauer’s second argument provides justification for “Wikipedian profiling [because it] turns out to be amazingly effective”, starting with the autonomous bots that auto-revert with an (aspired) 1:1000 false-positive rate.
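
    To make the 1:1000 figure concrete, here is a minimal sketch of how an auto-revert threshold could be calibrated against a held-out set of known-good edits so that roughly one in a thousand of them would be reverted in error. The scores are randomly generated stand-ins; this is not ClueBot NG’s actual calibration procedure:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical classifier scores for a held-out set of known-good edits.
good_edit_scores = rng.beta(2, 8, size=100_000)

target_fp_rate = 1 / 1000
# Pick the score threshold that only the top 0.1% of good edits exceed.
threshold = np.quantile(good_edit_scores, 1 - target_fp_rate)

def should_auto_revert(score: float) -> bool:
    """Auto-revert only when the vandalism score clears the calibrated bar."""
    return score >= threshold

print(f"calibrated threshold: {threshold:.3f}")
print(should_auto_revert(0.99), should_auto_revert(0.40))  # True False
```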

    De Laat also interprets “the Schauerian argument of reliability/predictability for those affected by the rule” in favor of vandalism profiling. Here, though, he fails to explain the benefits of vandals being able to predict which kinds of edits will be subject to scrutiny. This also calls into question his subsequent remark that “it is unfortunate that the anti-vandalism system in use remains opaque to ordinary users”. The remaining two of Schauer’s four arguments are judged as less pertinent. But overall the paper concludes that it is possible to justify the existence of vandalism profiling rules as beneficial via Schauer’s theoretical framework.

    Police traffic stops: A good analogy for anti-vandalism patrol on Wikipedia?

    Photo by böhringer, CC BY-SA 3.0

    Next, de Laat turns to question 2, on whether vandalism profiling is also morally justified. Here he relies on later work by Schauer, from a 2003 book, “Profiles, Probabilities, and Stereotypes”, that studies such matters as profiling by tax officials (selecting which taxpayers have to undergo an audit), airport security (selecting passengers for screening) and by police officers (e.g. selecting cars for traffic stops). While profiling of some kind is a necessity for all these officials, the particular characteristics (dimensions) used for profiling can be highly problematic (see e.g. Driving While Black). For de Laat’s study of Wikipedia profiling, “two types of complications are important: (1) possible ‘overuse’ of dimension(s) (an issue of profile effectiveness) and (2) social sensibilities associated with specific dimension(s) (a social and moral issue).” Overuse can mean relying on stereotypes that have no basis in reality, or over-reliance on some dimensions that, while having a non-spurious correlation with the deviant behavior, are over-emphasized at the expense of other relevant characteristics because they are more visible or salient to the profiler. E.g. while Schauer considers that it may be justified for “airport officials looking for explosives [to] single out for inspection the luggage of younger Muslim men of Middle Eastern appearance”, it would be an over-use if “officials ask all Muslim men and all men of Middle Eastern origin to step out of line to be searched”, thus reducing their effectiveness by neglecting other passenger characteristics. This is also an example of the second type of complication in profiling, where the selected dimensions are socially sensitive – indeed, for the specific case of luggage screening in the US, “the factors of race, religion, ethnicity, nationality, and gender have expressly been excluded from profiling” since 1997.

    Applying this to the case of Wikipedia’s anti-vandalism efforts, de Laat first observes that complication (1) (overuse) is not a concern for fully automated tools like ClueBotNG – obviously their algorithm applies the existing profile directly without a human intervention that could introduce this kind of bias. For Huggle and STiki, however, “I see several possibilities for features to be overused by patrollers, thereby spoiling the optimum efficacy achievable by the profile embedded in those tools.” This is because both tools do not just use these features in their automatic pre-selection of edits to be reviewed, but also expose at least whether an edit was anonymous to the human patroller in the edit review interface. (The paper examines this in detail for both tools, also observing that Huggle presents more opportunities for this kind of overuse, while STiki is more restricted. However, there seems to have been no attempt to study empirically whether this overuse actually occurs.)
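
    For readers unfamiliar with how such profiling translates into a review queue, here is a hypothetical sketch of feature-weighted pre-selection with the “anonymous” flag as one removable dimension. The features and weights are invented for illustration and are not those of Huggle, STiki, or ClueBot NG:

```python
from dataclasses import dataclass

@dataclass
class Edit:
    is_anonymous: bool
    chars_removed: int
    added_profanity: bool

# Invented weights; real tools use far richer features and learned models.
WEIGHTS = {"is_anonymous": 0.3, "large_removal": 0.4, "added_profanity": 0.6}

def review_priority(edit: Edit, use_anon_dimension: bool = True) -> float:
    """Higher scores are reviewed first; the anon flag is one removable dimension."""
    score = 0.0
    if use_anon_dimension and edit.is_anonymous:
        score += WEIGHTS["is_anonymous"]
    if edit.chars_removed > 500:
        score += WEIGHTS["large_removal"]
    if edit.added_profanity:
        score += WEIGHTS["added_profanity"]
    return score

edits = [Edit(True, 10, False), Edit(False, 800, True), Edit(True, 600, False)]
# De Laat's proposal amounts to calling review_priority(..., use_anon_dimension=False).
for e in sorted(edits, key=review_priority, reverse=True):
    print(round(review_priority(e), 2), e)
```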

    Regarding complication (2), whether some of the features used for vandalism profiling are socially sensitive, de Laat highlights that they include some amount of discrimination by nationality: IP edits geolocated to the US, Canada, and Australia have been found to contain vandalism more frequently and are thus more likely to be singled out for inspection. However, he does not consider this concern “strong enough to warrant banning the country-dimension and correspondingly sacrifice some profiling efficacy”, chiefly because there do not appear to be a lot of nationalistic tensions within the English Wikipedia community that could be stirred up by this.

    In contrast, de Laat argues that “the targeting of contributors who choose to remain anonymous … is fraught with danger since anons already constitute a controversial group within the Wikipedian community.” Still, he acknowledges the “undisputed fact” that the ratio of vandalism is much higher among anonymous edits. Also, he rejects the concern that they might be more likely to be the victim of false positives:

    normally [IP editors] do not experience any harm when their edits are selected and inspected as a result of anon-powered profiling; they will not even notice that they were surveilled since no digital traces remain of the patrolling. … The only imaginable harm is that patrollers become over focussed on anons and indulge in what I called above ‘overinspection’ of such edits and wrongly classify them as vandalism … As a consequence, they might never contribute to Wikipedia again. … Nevertheless, I estimate this harm to be small. At any rate, the harm involved would seem to be small in comparison with the harassment of racial profiling—let alone that an ‘expressive harm hypothesis’ applies.

    With this said, de Laat still makes the controversial call “that the anonymous-dimension should be banned from all profiling efforts” – including removing it from the scoring algorithms of Huggle, STiki and ClueBotNG. Instead of concerns about individual harm,

    my main argument for the ban is a decidedly moral one. From the very beginning the Wikipedian community has operated on the basis of a ‘social contract’ that makes no distinction between anons and non-anons – all are citizens of equal stature. … In sum, the express profiling of anons turns the anonymity dimension from an access condition into a social distinction; the Wikipedian community should refrain from institutionalizing such a line of division. Notice that I argue, in effect, that the Wikipedian community has only two choices: either accept anons as full citizens or not; but there is no morally defensible social contract in between.

    Sadly, while the paper is otherwise rich in citations and details, it completely fails to provide evidence for the existence of this alleged social contract. While it is true that “the ability of almost anyone to edit (most) articles without registration” forms part of Wikipedia’s founding principles (a principle that this reviewer strongly agrees with), the “equal stature” part seems to be de Laat’s own invention – there is a long list of things that, by longstanding community consensus, require the use of an account (which after all is freely available to everyone, without even requiring an email address). Most of these restrictions – say, the inability to create new articles or being prevented from participating in project governance during admin or arbcom votes – seem much more serious than the vandalism profiling that is the topic of de Laat’s paper. TB

    Briefly

    Conferences and events

    Other recent publications

    A list of other recent publications that could not be covered in time for this issue—contributions are always welcome for reviewing or summarizing newly published research. This month, the list mainly gathers research about the extraction of specific content from Wikipedia.

    • “Large SMT Data-sets Extracted from Wikipedia”[8] From the abstract: “The article presents experiments on mining Wikipedia for extracting SMT [ statistical machine translation ] useful sentence pairs in three language pairs. … The optimized SMT systems were evaluated on unseen test-sets also extracted from Wikipedia. As one of the main goals of our work was to help Wikipedia contributors to translate (with as little post editing as possible) new articles from major languages into less resourced languages and vice-versa, we call this type of translation experiments ‘in-genre’ translation. As in the case of ‘in-domain’ translation, our evaluations showed that using only ‘in-genre’ training data for translating same genre new texts is better than mixing the training data with ‘out-of-genre’ (even) parallel texts.”
    • “Recognizing Biographical Sections in Wikipedia”[9] From the abstract: “Thanks to its coverage and its availability in machine-readable format, [Wikipedia] has become a primary resource for large scale research in historical and cultural studies. In this work, we focus on the subset of pages describing persons, and we investigate the task of recognizing biographical sections from them: given a person’s page, we identify the list of sections where information about her/his life is present [as opposed to nonbiographical sections, e.g. ‘Early Life’ but not ‘Legacy’ or ‘Selected writings’].”
    • “Extraction of lethal events from Wikipedia and a semantic repository”[10] From the abstract and conclusion: “This paper describes the extraction of information on lethal events from the Swedish version of Wikipedia. The information searched includes the persons’ cause of death, origin, and profession. […] We also extracted structured semantic data from the Wikidata store that we combined with the information retrieved from Wikipedia … [The resulting] data could not support the existence of the Club 27”.
    • “Learning Topic Hierarchies for Wikipedia Categories”[11] (from frequently used section headings in a category, e.g. “eligibility”, “endorsements” or “results” for Category:Presidential elections)
    • “‘A Spousal Relation Begins with a Deletion of engage and Ends with an Addition of divorce’: Learning State Changing Verbs from Wikipedia Revision History.”[12] From the abstract: “We propose to learn state changing verbs [such as ‘born’, ‘died’, ‘elected’, ‘married’] from Wikipedia edit history. When a state-changing event, such as a marriage or death, happens to an entity, the infobox on the entity’s Wikipedia page usually gets updated. At the same time, the article text may be updated with verbs either being added or deleted to reflect the changes made to the infobox. … We observe in our experiments that when state-changing verbs are added or deleted from an entity’s Wikipedia page text, we can predict the entity’s infobox updates with 88% precision and 76% recall.”
    • “Extracting Representative Phrases from Wikipedia Article Sections”[13] From the abstract: “Since [Wikipedia’s] long articles are taking time to read, as well as section titles are sometimes too short to capture comprehensive summarization, we aim at extracting informative phrases that readers can refer to.”
    • “Accurate Fact Harvesting from Natural Language Text in Wikipedia with Lector”[14] From the abstract: “Many approaches have been introduced recently to automatically create or augment Knowledge Graphs (KGs) with facts extracted from Wikipedia, particularly its structured components like the infoboxes. Although these structures are valuable, they represent only a fraction of the actual information expressed in the articles. In this work, we quantify the number of highly accurate facts that can be harvested with high precision from the text of Wikipedia articles […]. Our experimental evaluation, which uses Freebase as reference KG, reveals we can augment several relations in the domain of people by more than 10%, with facts whose accuracy are over 95%. Moreover, the vast majority of these facts are missing from the infoboxes, YAGO and DBpedia.”
    • “Extracting Scientists from Wikipedia”[15] From the abstract: “[We] describe a system that gathers information from Wikipedia articles and existing data from Wikidata, which is then combined and put in a searchable database. This system is dedicated to making the process of finding scientists both quicker and easier.”
    • “LeadMine: Disease identification and concept mapping using Wikipedia”[16] From the abstract: “LeadMine, a dictionary/grammar based entity recognizer, was used to recognize and normalize both chemicals and diseases to MeSH [ Medical Subject Headings ] IDs. The lexicon was obtained from 3 sources: MeSH, the Disease Ontology and Wikipedia. The Wikipedia dictionary was derived from pages with a disease/symptom box, or those where the page title appeared in the lexicon.”
    • “Finding Member Articles for Wikipedia Lists”[17] From the abstract: “… for a given Wikipedia article and list, we determine whether the article can be added to the list. Its solution can be utilized on automatic generation of lists, as well as generation of categories based on lists, to help self-organization of knowledge structure. In this paper, we discuss building classifiers for judging on whether an article belongs to a list or not, where features are extracted from various components including list titles, leading sections, as well as texts of member articles. … We report our initial evaluation results based on Bayesian and other classifiers, and also discuss feature selection.”
    • “Study of the content about documentation sciences in the Spanish-language Wikipedia”[18] (in Spanish). From the English abstract: “This study explore how [Wikipedia] addresses the documentation sciences, focusing especially on pages that discuss the discipline, not only the page contents, but the relationships between them, their edit history, Wikipedians who participated and all aspects that can influence on how the image of this discipline is projected” [sic]. TB

    References

    1. Banerjee, Siddhartha; Mitra, Prasenjit. “WikiWrite: Generating Wikipedia Articles Automatically”.
    2. Banerjee, Siddhartha; Mitra, Prasenjit (October 2015). “WikiKreator: Automatic Authoring of Wikipedia Content”. AI Matters 2 (1): 4–6. doi:10.1145/2813536.2813538. ISSN 2372-3483.
    3. Banerjee, Siddhartha; Mitra, Prasenjit (July 2015). “WikiKreator: Improving Wikipedia Stubs Automatically”. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Beijing, China: Association for Computational Linguistics. pp. 867–877.
    4. Banerjee, Siddhartha; Mitra, Prasenjit (2015). “Filling the Gaps: Improving Wikipedia Stubs”. Proceedings of the 2015 ACM Symposium on Document Engineering. DocEng ’15. New York, NY, USA: ACM. pp. 117–120. doi:10.1145/2682571.2797073. ISBN 9781450333078.
    5. Laat, Paul B. de (30 April 2016). “Profiling vandalism in Wikipedia: A Schauerian approach to justification”. Ethics and Information Technology: 1–18. doi:10.1007/s10676-016-9399-8. ISSN 1388-1957.
    6. See e.g. Halfaker, Aaron (December 6, 2015). “Disparate impact of damage-detection on anonymous Wikipedia editors”. Socio-technologist. 
    7. Laat, Paul B. de (2 September 2015). “The use of software tools and autonomous bots against vandalism: eroding Wikipedia’s moral order?”. Ethics and Information Technology 17 (3): 175–188. doi:10.1007/s10676-015-9366-9. ISSN 1388-1957. 
    8. Tufiş, Dan; Ion, Radu; Dumitrescu, Ştefan; Ştefănescu, Dan (26 May 2014). “Large SMT Data-sets Extracted from Wikipedia” (PDF). Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14). TUFI 14.103. ISBN 978-2-9517408-8-4.
    9. Aprosio, Alessio Palmero; Tonelli, Sara (17 September 2015). “Recognizing Biographical Sections in Wikipedia”. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Lisbon, Portugal. pp. 811–816. 
    10. Norrby, Magnus; Nugues, Pierre (2015). Extraction of lethal events from Wikipedia and a semantic repository (PDF). workshop on Semantic resources and semantic annotation for Natural Language Processing and the Digital Humanities at NODALIDA 2015. Vilnius, Lithuania. 
    11. Hu, Linmei; Wang, Xuzhong; Zhang, Mengdi; Li, Juanzi; Li, Xiaoli; Shao, Chao; Tang, Jie; Liu, Yongbin (2015-07-26). “Learning Topic Hierarchies for Wikipedia Categories” (PDF). Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Short Papers). Beijing, China. pp. 346–351. 
    12. Nakashole, Ndapa; Mitchell, Tom; Wijaya, Derry (2015). “‘A Spousal Relation Begins with a Deletion of engage and Ends with an Addition of divorce’: Learning State Changing Verbs from Wikipedia Revision History” (PDF). Proceedings of EMNLP 2015. Lisbon, Portugal. pp. 518–523.
    13. Liu, Shan; Iwaihara, Mizuho: “Extracting Representative Phrases from Wikipedia Article Sections”. DEIM Forum 2016 C3-6. http://db-event.jpn.org/deim2016/papers/314.pdf
    14. Cannaviccio, Matteo; Barbosa, Denilson; Merialdo, Paolo (2016). “Accurate Fact Harvesting from Natural Language Text in Wikipedia with Lector”. Proceedings of the 19th International Workshop on Web and Databases. WebDB ’16. New York, NY, USA: ACM. doi:10.1145/2932194.2932203. ISBN 9781450343107.
    15. Ekenstierna, Gustaf Harari; Lam, Victor Shu-Ming. Extracting Scientists from Wikipedia. Digital Humanities 2016. From Digitization to Knowledge 2016: Resources and Methods for Semantic Processing of Digital Works/Texts, Proceedings of the Workshop, July 11, 2016, Krakow, Poland. 
    16. Lowe, Daniel M.; O’Boyle, Noel M.; Sayle, Roger A. “LeadMine: Disease identification and concept mapping using Wikipedia” (PDF). Proceedings of the fifth BioCreative challenge evaluation workshop. BCV 2015. pp. 240–246.
    17. Sun, Shuang; Iwaihara, Mizuho: “Finding Member Articles for Wikipedia Lists”. DEIM Forum 2016 C3-3. http://db-event.jpn.org/deim2016/papers/184.pdf
    18. Martín Curto, María del Rosario (2016-04-15). “Estudio sobre el contenido de las Ciencias de la Documentación en la Wikipedia en español”. Bachelor’s thesis, University of Salamanca, 2014.

    Wikimedia Research Newsletter
    Vol: 6 • Issue: 8 • August 2016
    This newsletter is brought to you by the Wikimedia Research Committee and The Signpost


    by Tilman Bayer at September 12, 2016 05:44 AM

    Brion Vibber

    Dell P2415Q 24″ UHD monitor review

    Last year I got two Dell P2415Q 24″ Ultra-HD monitors, replacing my old and broken 1080p monitor, to use with my MacBook Pro. Since the model’s still available, I thought I’d finally post my experience.

    tl;dr:

    Picture quality: great.
    Price: good for what you get, and they’re cheaper now than they were last year.
    Functionality: mixed; some problems that need workarounds for me.

    So first the good: the P2415Q is the “right size, right resolution” for me; with an operating system that handles 200% display scaling correctly (Mac OS X, Windows 10, or some Linux environments), it feels like a 24″ 1080p monitor that shows much, much sharper text and images. When using the external monitors with my 13″ MacBook Pro, the display density is about the same as the internal display, and the color reproduction seems consistent enough to my untrained eye that it’s not distracting to move windows between the laptop and external screens.

    Two side by side plus the laptop makes for a vveerryy wwiiddee desktop, which can be very nice when developing & testing stuff since I’ve got chat, documentation, terminal, code, browser window, and debugger all visible at once. 🙂

    The monitor accepts DisplayPort input via either full-size or mini connectors, and also accepts HDMI (limited to 30 Hz at the full resolution, or a full 60 Hz at 1080p), which makes it possible to hook up random devices like phones and game consoles.
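
    If you’re wondering why HDMI tops out at 30 Hz at this resolution, here’s a rough back-of-the-envelope check. It assumes the port is HDMI 1.4-class and uses the standard CTA-861 4K timings; I haven’t verified those details against Dell’s spec sheet, so treat the numbers as illustrative.

```python
# Rough sanity check on the 30 Hz HDMI ceiling, assuming an HDMI 1.4-class
# port (340 MHz max TMDS clock at 8-bit color) and standard CTA-861 4K
# timings. These assumptions are mine, not from Dell's documentation.
HDMI_1_4_MAX_TMDS_MHZ = 340

uhd_pixel_clocks_mhz = {
    "3840x2160 @ 30 Hz": 297,   # fits under the 340 MHz ceiling
    "3840x2160 @ 60 Hz": 594,   # needs an HDMI 2.0-class link
}

for mode, clock in uhd_pixel_clocks_mhz.items():
    verdict = "OK on HDMI 1.4" if clock <= HDMI_1_4_MAX_TMDS_MHZ else "too fast for HDMI 1.4"
    print(f"{mode}: {clock} MHz pixel clock -> {verdict}")
```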

    There’s also a built-in USB hub, which works well enough, but the ports are awkward to reach.

    The bad: there are three major pain points for me, in decreasing order of WTF:

    1. Sometimes the display goes black when using DisplayPort; the only way to resolve it seems to be to disconnect the power and hard-reset the monitor. Unplugging and replugging the DisplayPort cable has no effect. Switching cables has no effect. Rebooting the computer has no effect. Switching the monitor’s power on and off has no effect; I have to reach back and yank out the power cord.
    2. There are neither speakers nor audio passthrough connectors, but when connected over HDMI, devices like game consoles and phones will try to route audio to the monitor, sending all your audio down a black hole. The workaround is to manually re-route audio back to the default output, or to attach a USB audio output path to the connected device.
    3. Even though the monitor can tell if there’s something connected to each input or not, it won’t automatically switch to the only active input. After unplugging my MacBook from the DisplayPort and plugging a tablet in over HDMI, I still have to bring up the on-screen menu and switch inputs.

    The first problem is so severe it can make the unit appear dead, but is easily worked around. The second and third may or may not bother you depending on your needs.

    So, I’m happy enough to use ’em, but there’s real early-adopter pain in this particular model of monitor.

    by brion at September 12, 2016 12:21 AM