On a generally calm late summer day in 1628, a wooden warship—one of the largest to ever sail—left Stockholm on its first voyage. Named Vasa, the vessel was the culmination of an effort to solidify the fledgling Swedish Empire’s control over the Baltic Sea.
Unfortunately for the Swedes, however, Vasa heeled over and sank in full view of a crowd that had gathered to witness the occasion. Thirty people died in the disaster, caused by what the King of Sweden called “imprudence and negligence.”
Vasa spent 333 years at the bottom of the Baltic. It was refloated in 1959–61 in a complex and difficult operation conducted by maritime archaeologists.
Today, Vasa is a museum ship in Stockholm. According to its Wikipedia article, the ship “has become a widely recognized symbol of the Swedish ‘great power period’ and is … a de facto standard in the media and among Swedes for evaluating the historical importance of shipwrecks.”
The man who wrote those words, Peter Isotalo, rewrote the ship’s English-language Wikipedia article with Henrik, another Wikipedia editor. Isotalo says that the ship is a time capsule, one that “offers an insight into a completely different, lost world. It has a physical presence that makes it easy to comprehend and size up even for non-nerds like myself.”
Thanks in large part to Isotalo and Henrik, Vasa is a “featured” article on Wikipedia, a quality marker that recognizes the encyclopedia’s “very best work” and is “distinguished by professional standards of writing, presentation, and sourcing.”
Mary Rose seen in the Anthony Roll, public domain/CC0.
Mary Rose, closer to the present day. Photo via the Mary Rose Trust, CC BY-SA 3.0.
After Vasa, Isotalo moved on to another lost warship raised hundreds of years after it sank—England’s Mary Rose. Isotalo estimates that he has spent hundreds of hours on the two articles. “When I get hooked on a particular topic and really go for it,” he says, “I tend to pursue pretty much every lead I can get a hold of.”
This sort of obstinacy has helped ensure that the articles are complete; he points to the “Causes of sinking/Modern theories” section in Mary Rose to make this point. “I spent quite a lot of time tracking down different perspectives and made sure to check up even on fairly obscure references,” he said, including “a rather minor (but important) critical note about potential eyewitness bias from Maurice de Brossard in a 1984 issue of Mariner’s Mirror.”
Extensive research, however, posed its own set of problems. “The history of [Vasa and Mary Rose] is often subsumed under layers of dramatic storytelling,” Isotalo told me. “There is a tendency to fit into a rather nationalist historical narrative—especially with Vasa, where the history of the ship itself has been presented through the perspective of being the personal property of an absolute monarch, which is clearly not true.” He continued:
The modern discovery of Vasa in the 1950s is often portrayed as the work of a single man (Anders Franzén, pictured at right), and previous knowledge of the ship’s location has been largely ignored or glossed over. The decision to salvage the ship is also portrayed as something more or less self-evident, though it really wasn’t. Today, most maritime archaeologists would consider it an unnecessary (and extremely costly) risk to salvage entire shipwrecks. To this day, there aren’t even rough estimates of what it actually cost to salvage Vasa, or who footed the bill.
Born in 1980 in the then-Soviet Union, Isotalo moved to Sweden as a child. His interest in maritime history was kindled during this time, as he was able to visit the Vasa Museum and later work in its gift shop. The latter experience came in handy when writing Vasa‘s Wikipedia article, as he had easy access to the museum’s staff—including its director of research, Fred Hocker, whom Isotalo calls “one of the leading experts on Vasa.” These individuals helped him with the history of the ship, what was happening in Sweden and its navy around that time, the recovery of the ship from the bottom of the sea, and how it has been preserved since then.
Isotalo’s connections and Wikipedia work were also useful in obtaining a set of 57 images from the Mary Rose Trust, the charitable organization that runs the Mary Rose museum and is charged with preserving the ship’s remains.
Isotalo’s interest in maritime history on Wikipedia has continued even after writing about Vasa and Mary Rose, manifesting itself in several more featured and good-quality articles:
Anthony Roll, a preserved inventory of ships in the English Navy in the 1540s, complete with illustrations;
Kronan, another sunken Swedish wooden warship discovered in the 1950s (but not raised to the surface);
Battle of Öland, where Kronan—the admiral’s flagship—was sunk;
Udema and Turuma, two ship types used by the Swedish archipelago fleet in the eighteenth to nineteenth centuries.
When he’s not editing Wikipedia, Isotalo is a trained records manager/archivist with a bachelor’s degree in history. He describes himself as a civil servant/bureaucrat of the Weberian variety, and works for the Swedish Committee for Afghanistan, a foreign aid non-governmental organization (NGO).
Ed Erhart, Editorial Associate
Wikimedia Foundation
Wiki Loves Monuments is proud to be supported by UNESCO through its Unite4Heritage program.
(This blog post was prepared by John Cummings, Wikimedian in Residence at UNESCO.)
I just explained why open and copyleft licensing, which work fairly well in the software context, might not be legally workable, or practically a good idea, around data. So what to do instead? tl;dr: say no to licenses, say yes to norms.
In this complex landscape, it should be no surprise that there are no perfect solutions. I’ll start with two behaviors that can help.
Education and lawyering: just say no
If you’re reading this post, odds are that, within your organization or community, you’re known as a data geek and might get pulled in when someone asks for a new data (or hardware, or culture) license. The best thing you can do is help explain why restrictive “public” licensing for data is a bad idea. To the extent there is a community of lawyers around open licensing, we also need to be comfortable saying “this is a bad idea”.
These blog posts, to some extent, are my mea culpa for not saying “no” during the drafting of ODbL. At that time, I thought that if only we worked hard enough, and were creative enough, we could make a data license that avoided the pitfalls others had identified. It was only years later that I finally realized there were systemic reasons why we were doomed, despite lots of hard work and thoughtful lawyering. These posts lay out why, so that in the future I can say no more efficiently. Feel free to borrow them when you also need to say no :)
Project structure: collaboration builds on itself
When thinking about what people actually want from open licenses, it is important to remember that how people collaborate is deeply shaped by how your project is structured. (To put it another way, architecture is also law.) For example, many kernel contributors feel that the best reason to contribute your code to the Linux kernel is not the license, but the high velocity of development, which means that your costs are much lower if you get your features upstream quickly. Similarly, if you can build a big community like Wikimedia’s around your data, the velocity of improvements is likely to reduce the desire to fork. Where possible, consider also offering services and collaboration spaces that encourage people to work in public, rather than providing the bare minimum necessary for your own use. Or more simply, spend money on community people, rather than lawyers! These kinds of tweaks can often have much more of an impact on free-riding and contribution than any license choice. Unfortunately, the details are often project-specific – which makes them hard to cover in a blog post! Especially one that is already too long.
Solving with norms
So if lawyers should advise against the use of data law, and structuring your project for collaboration might not apply to you, what then? Following Peter Desmet, Science Commons, and others, I think the right tool for building resilient, global communities of sharing (in data and elsewhere) is written norms, combined with a formal release of rights.
Norms are essentially optimistic statements of what should be done, rather than formal requirements of what must be done (with the enforcement power of the state behind them). There is an extensive literature, pioneered by Nobel laureate Elinor Ostrom, showing that norms are actually how a huge amount of humankind’s work gets done – despite the skepticism of economists and lawyers. Critically, they often work even without the enforcement power of the legal system. For example, academia’s anti-plagiarism norms (when buttressed by appropriate non-legal institutional supports) are fairly successful. While there are still plagiarism problems, they’re fairly comparable to the Linux kernel’s GPL-violation problems – even though, unlike the GPL, there are no legal enforcement mechanisms!
Norms and licenses have similar benefits
In many key ways, norms are not actually significantly different from licenses. Norms and licenses can both help (or hurt) a community reach its goals by:
Educating newcomers about community expectations: Collaboration requires shared understanding of the behavior that will guide that collaboration. Written norms can create that shared expectation just as well as licenses, and often better, since they can be flexible and human-readable in ways legally-binding international documents can’t.
Serving as the basis for social pressure: For the vast majority of collaborative projects, praise, shame, and other social nudges, not legal threats, are the actual basis for collaboration. (If you need proof of this, consider the decades-long success of open source before any legal enforcement was attempted.) Again, norms can serve this role just as well or even better, since it is often the desire to cooperate and the fear of shaming that actually drive collaboration.
Similar levels of enforcement: While you can’t use the legal system to enforce a norm, most people and organizations also don’t have the option to use the legal system to enforce licenses – it is too expensive, or too time-consuming, or the violator is in another country, or one of many other reasons why the legal system might not be an option (especially in data!). So instead most projects resort to tools like personal appeals or threats of publicity – tools that are still available with norms.
Working in practice (usually): As I mentioned above, basing collaboration on social norms, rather than legal tools, works all the time in real life. The idea that collaboration can’t occur without the threat of legal sanction is really a somewhat recent invention. (I could actually have listed this under differences – since, as Ostrom teaches us, legal mechanisms often fail where norms succeed, and I think that is the case in data too.)
Why are norms better?
Of course, if norms were merely “as good as” licenses in the ways I just listed, I probably wouldn’t recommend them. Here are some ways that they can be better, in ways that address some of the concerns I raised in my earlier posts in this series:
Global: While building global norms is not easy (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3038591/), social norms based on appeals to the very human desires for collaboration and partnership can be a lot more global than the current schemes for protecting database or hardware rights, which aren’t international. (You can try to fake internationalization through a license, but as I pointed out in earlier posts, that is likely to fail legally, and be ignored by exactly the largest partners who you most want to get on board.)
Flexible: Many of the practical problems with licenses in data space boil down to their inflexibility: if a license presumes something to be true, and it isn’t, you might not be able to do anything about it. Norms can be much more generous – well-intentioned re-users can creatively reinterpret the rules as necessary to get to a good outcome, without having to ask every contributor to change the license. (Copyright law in the US provides some flexibility through fair use, which has been critical in the development of the internet. The EU does not extend such flexibility to data, though member states can add some fair dealing provisions if they choose. In neither case are those exceptions global, so they can’t be relied on by collaborative projects that aim to be global in scope.)
Work against, not with, the permission culture: Lessig warned us early on about “permission culture” – the notion that we would always need to ask permission to do anything. Creative Commons was an attempt to fight it, but by being a legal obligation, rather than a normative statement, it made a key concession to the permission culture – that the legal system was the right terrain to have discussions about sharing. The digital world has pretty whole-heartedly rejected this conclusion, sharing freely and constantly. As a result, I suspect a system that appeals to ethical systems has a better chance of long-term sustainability, because it works with the “new” default behavior online rather than bringing in the heavy, and inflexible, hand of the law.
Why you still need a (permissive) license
Norms aren’t enough if the underlying legal system might allow an early contributor to later wield the law as a threat. That’s why the best practice in the data space is to use something like the Creative Commons public domain grant (CC-Zero) to set a clear, reliable, permissive baseline, and then use norms to add flexible requirements on top of that. This uses law to provide reliability and predictability, and then uses norms to address concerns about fairness, free-riding, and effectiveness. CC-Zero still isn’t perfect; most notably it has to try to be both a grant and a license to deal with different international rules around grants.
What next?
In this context, when I say “norms”, I mean not just the general term, but specifically written norms that can act as a reference point for community members. In the data space, some good examples are DPLA’s “CC0-BY” and the Canadensys biodiversity initiative. A more subtle form can be found buried in the terms for NIH’s Clinical Trials database. So, some potential next steps, depending on where your collaborative project is:
If your community has informal norms (“attribution good! sharing good!”) consider writing them down like the examples above. If you’re being pressed to adopt a license (hi, Wikidata!), consider writing down norms instead, and thinking creatively about how to name and shame those who violate those norms.
If you’re an organization that publishes licenses, consider using your drafting prowess to write some standard norms that encapsulate the same behaviors without the clunkiness of database (or hardware) law. (Open Data Commons made some moves in this direction circa 2010, and other groups could consider doing the same.)
If you’re an organization that keeps getting told that people won’t participate in your project because of your license, consider moving towards a more permissive license + a norm, or interpreting your license permissively and reinforcing it with norms.
Good luck! May your data be widely re-used and contributors be excited to join your project.
I've been working on geek meritocracy and privilege for a while now, and my original draft has now been split into two.
The meritocracy piece will be published early next year, and I've just finished the first draft of: Nerd vs. 'bro': Geek privilege, triumphalism, and idiosyncrasy
ABSTRACT: Peggy McIntosh characterized privilege as an “invisible knapsack” of unearned advantages. Although the invisible knapsack is a useful metaphor, the notion of unearned advantage is not readily appreciated, especially by geeks who see their culture as meritocratic. After providing brief cultural histories of geekdom and privilege, I ask: Why are some geeks resistant to the notion of privilege? Beyond the observation that privilege often prompts defensiveness and unproductive comparisons, there is a geek-specific reason. Geek identity is informed by the trope of geek triumphalism: early insecurity is superseded by a sense of superiority. Geeks’ intelligence, unconventional enthusiasms (e.g., technology and fantasy), and idiosyncratic dress were once targets of ridicule, leading triumphant geeks to believe they have no privilege. These same characteristics, later in life, become sources of success and pride, leading them to think they are beyond bias. Nonetheless, I show that even in the seemingly innocuous realm of idiosyncratic dress, there is bias and privilege.
Comments below or in email are welcome and appreciated.
The Tech News weekly summaries help you monitor recent software changes likely to impact you and your fellow Wikimedians. Subscribe, contribute and give feedback.
Latest tech news from the Wikimedia technical community. Please tell other users about these changes. Not all changes will affect you. Translations are available.
When you edit text and mention another user, they are now notified as long as you also add your signature. Previously this only happened under certain conditions. [2]
Users are notified if they are mentioned in a section where you add your signature, even if you edit more than one section. Previously, users were not notified if you edited more than one section in a single edit. [3]
Problems
The MediaWiki version that was supposed to come to the wikis two weeks ago was put on hold again because of new problems. The MediaWiki version after it is now on all wikis. [4][5]
Changes this week
There will be no new MediaWiki version this week. [6]
Abandoned tools on Tool Labs could be taken over by other developers. There is a new discussion on Meta about this. It will be discussed until 12 October and then voted on. [7]
Aghayan had already created over 2,700 articles on Wikipedia, but this idea gripped her. She created her first challenge article a few days after returning home. “I love challenges in general because they inspire you to collaborate with other Wikipedians. You get to know them and their interests, and they learn about what drives you,” Aghayan explains.
Aghayan was the first Armenian to complete the challenge. She was proud to receive a message from an Armenian WikiCamp alumnus inquiring about #100wikidays. Many of the alum’s fellows soon learned about the challenge and followed his lead; it quickly went viral in the Armenian Wikipedia community.
This is just one inspirational example of the over 180 Wikipedians who have taken on the #100wikidays challenge within the past year and a half. Forty-seven 100wikidayers have met the challenge’s target, and at least 7,700 articles have been created as a result.
“Everybody is free to adapt the challenge according to their lifestyle,” says Vassia Atanassova, a Bulgarian Wikipedian who came up with the #100wikidays challenge. “There is no such thing as failure in the challenge. It is all about fun and creating good content.”
Some participants in the challenge found it entertaining enough that they did not stop contributing articles on a daily basis after the 100-day period concluded. Nat Tymkiv, a Wikipedian from Ukraine and a member of the Wikimedia Foundation’s Board of Trustees, had completed the full course of the challenge on the Ukrainian Wikipedia, then started and completed another 100-day challenge on Wikiquote. Nat wanted to accomplish a third challenge on Wikivoyage, but that proved difficult.
“My life is really busy, so editing Wikipedia can’t always be a priority,” says Tymkiv. “But I still wanted to try to make this happen. It was a real challenge for me, and I enjoyed it very much. It involved a lot more time management, or what felt like pulling off miracles, in some cases.”
But having fun isn’t always easy. “100wikidays really requires total devotion, which I sometimes lack,” Atanassova admits. “Some days I ask myself what kind of idiot invented this nonsense.”
The simple but smart idea has inspired Rebecca O’Neill, from Ireland, to both participate in the challenge and use her experience to inform her autoethnographic PhD research. Her research focuses on how terms like “curation” and “curator” have changed over the years.
“I’m interested in how both professional and citizen curators see their own work and how they evaluate the work of others,” explains O’Neill. “My own 100wikidays experience has allowed me to explore the motivations and emotions behind this engaging work.”
The number of people joining this venture expands rapidly every day. It’s difficult but enjoyable for most contributors. If you feel inspired to participate, you can start today—head to this talk page or this group on Facebook if you would like to engage with others about joining the challenge.
Samir Elsharbaty, Digital Content Intern
Wikimedia Foundation
Photo by Mohammad Reza Domiri Ganji, modified by UNESCO, CC BY-SA 4.0.
Unite4Heritage is a global movement powered by UNESCO that aims to celebrate and safeguard cultural heritage and diversity around the world. Launched in response to the unprecedented recent attacks on heritage, the campaign calls on everyone to stand up against sectarian propaganda by celebrating the places, objects and cultural traditions that make the world such a rich and vibrant place.
As part of Unite4Heritage, UNESCO is supporting Wiki Loves Monuments on social media through September by using amazing images entered into previous competitions.
Wiki Loves Monuments is the largest photography competition in the world, giving people in 41 countries the opportunity to share their built cultural heritage. The competition is run by hundreds of volunteers who want to educate and inspire people about built cultural heritage. It aligns with the goals of Unite4Heritage by celebrating and raising awareness of built heritage with the 500 million people who visit Wikipedia each month.
Photographs entered into Wiki Loves Monuments are available under open licenses so that they can be used by everyone. UNESCO strongly supports the creation of open license content, giving free access to information and unrestricted use of electronic data for everyone.
Many of the photos will be added to Wikipedia articles by the tens of thousands of Wikimedia community volunteers who create and curate the sites.
You can take part in Wiki Loves Monuments by exploring, sharing and photographing the built heritage that is important to you with the world; you can also encourage others to take part by sharing social media messages on Facebook, Twitter and Instagram from UNESCO, the Wikimedia Foundation, and others. Wiki Loves Monuments is running in 41 countries from 1–30 September 2016.
John Cummings, Wikimedian-in-residence
UNESCO
You can see more about UNESCO’s work on Wikimedia sites at WikiProject UNESCO.
The years 1883–1885 were tumultuous in the history of zoology in India. A group called the Simla Naturalists' Society was formed in the summer of 1885. The founding president of the Simla group was, oddly enough, Courtenay Ilbert - whom some might remember for the Ilbert Bill, which allowed Indian magistrates to pass judgement on British subjects. Another member of this Simla group was Henry Collett, who wrote a flora of the Simla region (Flora Simlensis). This Society vanished without much of a trace. A slightly more stable organization, the Bombay Natural History Society, was begun in 1883. The creation of these organizations was precipitated by a vacuum: the end of an India-wide correspondence network of naturalists fostered by a one-man force, A. O. Hume. The ornithological chapter of Hume's life begins and ends in Shimla. Hume's serious ornithology began around 1870, and he gave it all up in 1883, after the loss of years of carefully prepared manuscripts for a magnum opus on Indian ornithology, damage to his specimen collections, and a sudden immersion in Theosophy, which led him to abjure the killing of animals, take to vegetarianism, and subsequently take up the cause of Indian nationalism. The founders of the BNHS included Eha (E. H. Aitken, also a Hume/Stray Feathers correspondent), J. C. Anderson (a Simla naturalist) and Phipson (from a wine merchant family with a strong presence in Simla).
Shimla, then, was where Hume rose in his career (as Secretary of State, before falling), allowing him to work on his hobby project of Indian ornithology by bringing together a large specimen collection and conducting the publication of Stray Feathers. Through my reading, I had constructed a fairytale picture of the surroundings that he lived in. Richard Bowdler Sharpe, a curator at the British Museum who came to Shimla in 1885, wrote (his description is well worth reading in full):
... Mr. Hume who lives in a most picturesque situation high up on Jakko, the house being about 7800 feet above the level of the sea. From my bedroom window I had a fine view of the snowy range. ... at last I stood in the celebrated museum and gazed at the dozens upon dozens of tin cases which filled the room ... quite three times as large as our meeting-room at the Zoological Society, and, of course, much more lofty. Throughout this large room went three rows of table-cases with glass tops, in which were arranged a series of the birds of India sufficient for the identification of each species, while underneath these table-cases were enormous cabinets made of tin, with trays inside, containing series of the birds represented in the table-cases above. All the specimens were carefully done up in brown-paper cases, each labelled outside with full particulars of the specimen within. Fancy the labour this represents with 60,000 specimens! The tin cabinets were all of materials of the best quality, specially ordered from England, and put together by the best Calcutta workmen. At each end of the room were racks reaching up to the ceiling, and containing immense tin cases full of birds. As one of these racks had to be taken down during the repairs of the north end of the museum, the entire space between the table-cases was taken up by the tin cases formerly housed in it, so that there was literally no space to walk between the rows. On the western side of the museum was the library, reached by a descent of three steps—a cheerful room, furnished with large tables, and containing, besides the egg-cabinets, a well-chosen set of working volumes. ... In a few minutes an immense series of specimens could be spread out on the tables, while all the books were at hand for immediate reference. ... we went below into the basement, which consisted of eight great rooms, six of them full, from floor to ceilings of cases of birds, while at the back of the house two large verandahs were piled high with cases full of large birds, such as Pelicans, Cranes, Vultures, &c.
I was certainly not hoping to find Hume's home as described, but the situation turned out to be a lot worse. The first thing I did was to contact Professor Sriram Mehrotra, a senior historian who has published on the origins of the Indian National Congress. Prof. Mehrotra explained that Rothney Castle had long since been altered, with only the front facade retained along with the wood-framed conservatories. He said I could go and ask the caretaker for permission to see the grounds. He was sorry that he could not accompany me, as it was physically demanding, and he said that "the place moved him to tears." Professor Mehrotra also told me how he had decided to live in Shimla simply because of his interest in Hume! I left him and walked to Christ Church, taking the left branch going up to Jakhoo with some hopes. I met the caretaker of Rothney Castle in the garden, where she was walking her dogs on a flat lawn - probably the same garden at the end of which there had once been a star-shaped flower bed, scene of the infamous brooch incident with Madame Blavatsky (see the theosophy section in Hume's biography on Wikipedia). It was a bit of a disappointment, however, as the caretaker informed me that I could not see the grounds unless the owner, who lived in Delhi, permitted it. Rothney Castle has changed hands so many times that it probably has nothing to match what Bowdler Sharpe saw, and the grounds may very soon be entirely unrecognizable but for the name plaque at the entrance. Another patch of land in front of Rothney Castle was being prepared for what might become a multi-storeyed building. A botanist friend had shown me a 19th-century painting of Shimla by Constance Frederica Gordon-Cumming. In her painting, the only building visible on Jakko Hill behind Christ Church is Rothney Castle. The vegetation on Shimla has definitely become denser, with trees now blocking the views.
So there ended my hopes of adding good views (free-licensed images are still misunderstood in India) of Rothney Castle to the Wikipedia article on Hume. I did, however, get a couple of photographs from the roadside. In 2014, I managed to visit the South London Botanical Institute, the last of Hume's enterprises. This visit enabled the addition of a few pictures of his herbarium collections as well as an illustration of his bookplate, which carries his personal motto.
Clearly Shimla empowered Hume, providing a stimulating environment that included several local collaborators. Who were they? I have only recently discovered (and notes with references have now been added to the Wikipedia entry for R. C. Tytler) that Robert (of Tytler's warbler fame - although it was named by W. E. Brooks) and Harriet Tytler (of Mt. Harriet fame) had established a kind of natural history museum at Bonnie Moon in Shimla with Lord Mayo's support. The museum closed down after Robert's death in 1872, and it is said that Harriet offered the bird specimens to the government; it would appear that at least some part of this collection went to Hume. The collection is said to have been packed away in boxes around 1873. It later came into the possession of Mr B. Bevan-Petman, who apparently passed it on to the Lahore Central Museum in 1917.
Hume's idea of mapping rainfall to examine patterns of avian distribution
It was under Lord Mayo that Hume rose in the government hierarchy. Hume was not averse to utilizing his power as Secretary of State to further his interests in birds. He organized the Lakshadweep survey with the assistance of the navy ostensibly to examine sites for a lighthouse. He made use of government machinery in the fisheries department (Francis Day) to help his Sind survey. He used the newly formed meteorological division of his own agricultural department to generate rainfall maps for use in Stray Feathers. He was probably the first to note the connection between rainfall and bird distributions, something that only Sharpe saw any special merit in. Perhaps placing specimens on those large tables described by Sharpe allowed Hume to see geographic trends.
Hume was also able to appreciate geology (in his youth he had studied with Mantell), earth history, and avian evolution. Hume had several geologists contributing to ornithology, including Stoliczka and Ball. One wonders if he took an interest in paleontology, given his proximity to the Shiwalik ranges. Hume invited Richard Lydekker to publish a major note on avian osteology for the benefit of amateur ornithologists. Hume also had enough time to speculate on matters of avian biology. A couple of years ago I came across this bit that Hume wrote in the first of his Nests and Eggs volumes (published post-ornith-humously in 1889):
I wrote immediately to Tim Birkhead, the expert on evolutionary aspects of bird reproduction and someone with an excellent view of ornithological history (his Ten Thousand Birds is a must read for anyone interested in the subject) and he agreed that Hume had been an early and insightful observer to have suggested female sperm storage.
Shimla life was clearly a lot of hob-nobbing, and people like Lord Mayo were spending huge amounts of time and money just hosting parties. It turns out that Lord Mayo even went to Paris to recruit a chef and brought in an Italian, Federico Peliti. (His great-grandson has a nice website!) Unlike Hume, Peliti rose in fame after Lord Mayo's death by setting up a cafe which became the heart of Shimla's social life and gossip. Lady Lytton (Lord Lytton was the one who demoted Hume!) recorded that Simla folk "...foregathered four days a week for prayer meetings, and the rest of the time was spent in writing poisonous official notes about each other." Another observer recorded that "in Simla you could not hear your own voice for the grinding of axes. But in 1884 the grinders were few. In the course of my service I saw much of Simla society, and I think it would compare most favourably with any other town of English-speaking people of the same size. It was bright and gay. We all lived, so to speak, in glass houses. The little bungalows perched on the mountainside wherever there was a ledge, with their winding paths under the pine trees, leading to our only road, the Mall." (Lawrence, Sir Walter Roper (1928) The India We Served.)
A view from Peliti's (1922).
Peliti's other contribution was in photography, and it seems he worked with Felice Beato, who also influenced Harriet Tytler and her photography. I asked a couple of Shimla folks about the historic location of Peliti's cafe and they said it had become the Grand Hotel (now a government guest house). I subsequently found that Peliti did indeed start Peliti's Grand Hotel, which was destroyed in a fire in 1922, but the centre of Shimla's social life, his cafe, was actually next to the Combermere Bridge (it ran over a water storage tank and is today the location of the lift that runs between the Mall and the Cart Road). A photograph taken from "Peliti's" clearly lends support for this location, as do descriptions in Thacker's New Guide to Simla (1925). A poem celebrating Peliti's was published in Punch magazine in 1919. Rudyard Kipling was a fan of Peliti's, but Hume was no fan of Kipling (Kipling seems to have held a spiteful view of liberals - "Pagett MP" has been identified by some as being based on W. S. Caine, a friend of Hume; Hume for his part had a lifelong disdain for journalists). Kipling's boss, E. K. Robinson, started the British Naturalists' Association, while E. K. R.'s brother Philip probably influenced Eha.
While Hume most likely stayed well away from Peliti's, we see that a kind of naturalists social network existed within the government. About Lord Mayo we read:
Lord Mayo and the Natural History of India - His Excellency Lord Mayo, the Viceroy of India, has been making a very valuable collection of natural historical objects, illustrative of the fauna, ornithology, &c., of the Indian Empire. Some portion of these valuable acquisitions, principally birds and some insects, have been brought to England, and are now at 49 Wigmore Street, London, whence they will shortly be removed. - Perthshire Advertiser, 29 December 1870.
Another news report states:
The Earl of Mayo's collection of Indian birds, &c.
Amidst the cares of empire, the Earl of Mayo, the present ruler of India, has found time to form a valuable collection of objects illustrative of the natural history of the East, and especially of India. Some of these were brought over by the Countess when she visited England a short time since, and entrusted to the hands of Mr Edwin Ward, F.Z.S., for setting and arrangement, under the particular direction of the Countess herself. This portion, which consists chiefly of birds and insects, was to be seen yesterday at 49, Wigmore street, and, with the other objects accumulated in Mr Ward's establishment, presented a very striking picture. There are two library screens formed from the plumage of the grand argus pheasant - the head forward, the wing feathers extended in circular shape, those of the tail rising high above the rest. The peculiarities of the plumage have been extremely well preserved. These, though surrounded by other birds of more brilliant covering, preserved in screen pattern also, are most noticeable, and have been much admired. There are likewise two drawing-room screens of smaller Indian birds (thrush size) and insects. They are contained in glass cases, with frames of imitation bamboo, gilt. These birds are of varied and bright colours, and some of them are very rare. The Countess, who returned to India last month, will no doubt, add to the collection when she next comes back to England, as both the Earl and herself appear to take a great interest in illustrating the fauna and ornithology of India. The most noticeable object, however, in Mr. Ward's establishment is the representation of a fight between two tigers of great size. The gloss, grace, and spirit of the animals are very well preserved. The group is intended as a present to the Prince of Wales. It does not belong to the Mayo Collection. - The Northern Standard, January 7, 1871
Hume's subsequent superior was Lord Northbrook, about whom we read:
University and City Intelligence. - Lord Northbrook has presented to the University a valuable collection of skins of the game birds of India collected for him by Mr. A. O. Hume, C.B., a distinguished Indian ornithologist. Lord Northbrook, in a letter to Dr. Acland, assures him that the collection is very perfect, if not unique. A Decree was passed accepting the offer, and requesting the Vice-Chancellor to convey the thanks of the University to the donor. - Oxford Journal, 10 February 1877
Papilio mayo
Clearly, Lord Mayo's influence on naturalists in India is not sufficiently well understood. Perhaps that would explain the beautiful butterfly named after him shortly after his murder. It appears that Hume did not have this kind of hobby association with Lord Lytton; little wonder, perhaps, that he fared so badly!
Despite Hume's sharpness on many matters, there were bits that came across as odd. In one article on the flight of birds, he observes the soaring of crows and vultures behind his house as he sits in the morning looking towards Mahassu. He points out that these soaring birds would appear early on warm days and late on cold days, but he misses the role of thermals and mixes physics with metaphysics, going for a kind of Grand Unification Theory:
He then claims that crows, like saints, sages and yogis, are capable of "aethrobacy".
This naturally became a target of ridicule. We have already seen the comments of E. H. Hankin on this. Hankin wrote that if levitation was achieved by "living an absolutely pure life and intense religious concentration", the hill crow must be indulging in "irreligious sentiments when trying to descend to earth without the help of gravity." Hankin, despite his studies, did not give enough credit to the lift produced by thermals, and his own observations were critiqued by Gilbert Walker, the brilliant mathematician who applied his mind to large-scale weather patterns apart from conducting some amazing research on the dynamics of boomerangs. His boomerang research had begun in his undergraduate years and had earned him the nickname of Boomerang Walker. On my visit to Shimla, I went for a long walk down the quiet road winding through dense woodland and beside streams to Annandale, the only large flat ground in Shimla, where Sir Gilbert Walker conducted his weekend research on boomerangs. Walker's boomerang research mentions a collaboration with Oscar Eckenstein, and there are some strange threads connecting Eckenstein, his collaborator Aleister Crowley, and Hume's daughter Maria Jane Burnley, who would later join the Hermetic Order of the Golden Dawn. But that is just speculation!
1872 Map showing Rothney Castle
The steep road just below Rothney Castle
Excavation for new constructions just below and across the road from Rothney Castle
The embankment collapsing below the guard hut
The lower entrance, concrete constructions replace the old building
The guard hut and home are probably the only heritage structures left
I got back from Annandale and then walked down to Phagli on the southern slope of Shimla to see the place where my paternal grandfather once lived. It is not a coincidence that Shimla and my name are derived from the local deity Shyamaladevi (a version of Kali).
The South London Botanical Institute
After returning to England, Hume took an interest in botany. He made herbarium collections and in 1910 he established the South London Botanical Institute and left money in his will for its upkeep. The SLBI is housed in a quiet residential area. Here are some pictures I took in 2014, most can be found on Wikipedia.
Dr Roy Vickery displaying some of Hume's herbarium specimens
Specially designed cases for storing the herbarium sheets.
The entrance to the South London Botanical Institute
A herbarium sheet from the Hume collection
Hume's bookplate with personal motto - Industria et Perseverentia
An ornate clock which apparently adorned Rothney Castle
An antique book shop had a set of Hume's Nests and Eggs (Second edition) and it bore the signature of "R.W.D. Morgan" - it appears that there was a BNHS member of that name from Calcutta c. 1933. It is unclear if it is the same person as Rhodes Morgan, who was a Hume correspondent and forest officer in Wynaad/Malabar who helped William Ruxton Davison. Update: Henry Noltie of RBGE has pointed out to me privately that this is not the forester Rhodes Morgan (died 1919!). - September, 2016.
My first exposure to the concept of higher-dimensional space came from reading Flatland in elementary school. The book used the analogy of beings living in a two-dimensional world trying to understand the third dimension in order to convey the type of imagination required for three-dimensional beings like us to visualize a fourth. The concept completely blew my mind; I spent the rest of the day almost in a daze trying to picture a fourth direction extending at a right angle to the three directions I knew. It would be another decade before I had the tools needed to visualize the shadows that such four-dimensional geometry might project on our 3D world, starting with a humble 4D cube and ranging to the majestic 120-cell, the 4D analog of the dodecahedron.
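To give a concrete taste of what such a "shadow" involves, here is a minimal sketch (my own illustration, not from the original post; the function names and viewpoint are made up) that perspective-projects the 16 vertices of a 4D cube into 3D, the 4D analog of a 3D object casting a 2D shadow:

```python
# Minimal sketch: project the 16 vertices of a 4D hypercube (tesseract)
# into 3D by a perspective divide along the fourth axis.
from itertools import product

def project_4d_to_3d(v, viewer_w=3.0):
    # Vertices farther from the hypothetical 4D viewpoint (smaller w)
    # shrink toward the center, giving the familiar cube-within-a-cube
    # shadow of the tesseract.
    x, y, z, w = v
    scale = 1.0 / (viewer_w - w)
    return (x * scale, y * scale, z * scale)

tesseract = list(product((-1.0, 1.0), repeat=4))  # all 16 vertices
for v in tesseract:
    print(v, "->", project_4d_to_3d(v))
```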
My parents raised me to be curious. Any time I didn’t understand how something worked or something needed to be fixed, it was an opportunity to learn something new. It was all about creative problem solving; figuring out what worked and what didn’t, and using whatever was available in clever ways. They made clear that in school, my grades weren’t important to them. They only cared whether or not I was learning. They fostered an environment in which I was encouraged to make mistakes in order to fully explore a problem space, and it gave me the confidence to approach everything in my life with that mentality. Whenever I encountered something that I didn’t fully understand, I would poke and prod at it until I could build a gut level intuition about how it worked.
In mathematics and geometry, there are often concepts that can be difficult to build intuition about. Mathematically there is no problem with doing algebra with objects in a higher dimensional space, but using memorized algorithms to manipulate symbols on a page is a far cry from really understanding the underlying concepts that those symbols represent.
Quaternions are a fantastic example of this. In computer graphics, programmers often treat them as a magical black box: ‘mysterious hyper-dimensional imaginary numbers that can represent 3D rotations’. This wasn’t acceptable to me—I wanted to know why they worked the way they did, and what they really represented. After a significant amount of investigation I was able to build a mental model that was much simpler and more intuitive: quaternions are made of four parts because there are four rotations that are about as different from each other as you can possibly get. In a way, these rotations really are perpendicular to each other. You have where you started, where you land after rotating 180 degrees about the x axis, where you land after rotating 180 degrees about the y axis, and where you land after rotating 180 degrees about the z axis. By taking some blend of these four rotations you can build any other rotation that you could possibly want. Quaternions aren’t magic—they are just a list of four blending values that tell you how much of each ‘fundamental’ rotation to use.
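As a rough illustration of that mental model (my sketch, not Hise's code; the helper names here are invented), the following checks that an equal blend of "where you started" and the 180-degree z-rotation is exactly a 90-degree z-rotation:

```python
import math

def quat_mul(a, b):
    # Hamilton product of quaternions stored as (w, x, y, z).
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw*bw - ax*bx - ay*by - az*bz,
            aw*bx + ax*bw + ay*bz - az*by,
            aw*by - ax*bz + ay*bw + az*bx,
            aw*bz + ax*by - ay*bx + az*bw)

def rotate(q, v):
    # Rotate vector v by unit quaternion q via q * (0, v) * conj(q).
    w, x, y, z = q
    conj = (w, -x, -y, -z)
    rw, rx, ry, rz = quat_mul(quat_mul(q, (0.0,) + v), conj)
    return (rx, ry, rz)

# The four "fundamental" rotations, as unit quaternions:
IDENT  = (1.0, 0.0, 0.0, 0.0)   # where you started
FLIP_X = (0.0, 1.0, 0.0, 0.0)   # 180 degrees about x
FLIP_Y = (0.0, 0.0, 1.0, 0.0)   # 180 degrees about y
FLIP_Z = (0.0, 0.0, 0.0, 1.0)   # 180 degrees about z

# Blend IDENT and FLIP_Z equally (weights cos 45 and sin 45 degrees):
h = math.cos(math.pi / 4)
q = tuple(h * a + h * b for a, b in zip(IDENT, FLIP_Z))
print(rotate(q, (1.0, 0.0, 0.0)))  # ~(0, 1, 0): a 90-degree turn about z
```

The blending weights are the familiar half-angle cosine and sine; sliding them smoothly between the four basis rotations traces out every possible orientation.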
This drive to transform difficult concepts into intuitive tools was one of the major forces that led me to develop mathematical animations for Wikipedia. I wanted to demystify concepts that were too easy to take on faith, and turn them into something that people could really understand. For instance, take the notion of an object that has ‘spin ½’. This allegedly describes some sort of object that you have to spin around twice (a full 720 degrees!) before it gets back to where it started. That just seems like quantum magic unless you have a way to visualize it.
One day I came across a description in a paper of a device that could be attached to a frame and spin continuously without getting tangled. It was described in tremendous detail, but I still couldn’t see how it was possible. So I dug into my closet through some of my childhood toys and tried to build what had been described with k’nex and rubber bands. I could manipulate it with my hands, rotate it, see that it really did work, and come to understand it. And amazingly, that little part in the center had to spin around twice to get back to where it started!
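A quick numerical companion to this toy (again my own hedged illustration, not from the original post): quaternions show the same behavior, in that a 360-degree turn does not return the representation to where it started, but a 720-degree turn does.

```python
import math

def axis_angle(axis, angle):
    # Unit quaternion (w, x, y, z) for a rotation of `angle` radians
    # about the given axis; note the half-angle.
    s = math.sin(angle / 2.0)
    return (math.cos(angle / 2.0), axis[0]*s, axis[1]*s, axis[2]*s)

print(axis_angle((0, 0, 1), 2 * math.pi))  # ~(-1, 0, 0, 0): NOT the identity
print(axis_angle((0, 0, 1), 4 * math.pi))  # ~( 1, 0, 0, 0): back to identity
```

Both q and -q produce the same 3D rotation, which is why ordinary untethered objects never notice the difference; an object connected to its surroundings, like the device above, does.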
Once I understood how it worked and had a general mental model, I created some animations in an attempt to share that insight. I made the first one below with manual keyframes and standard animation tools, and then later I wrote code to generate something a little more precise and elegant (the second one).
Finally, I decided to go all out and see if I could demonstrate that this works with any number of fibers, and that in the limit, a solid piece of continuous space could twist like this without getting tangled.
GIF by Jason Hise, public domain/CC0.
GIF by Jason Hise, public domain/CC0.
I work in the videogame industry as a physics programmer, so I spend a lot of my time using math to make 3D geometry move in an appealing way. I’m currently working on a game where literally all of the character motion is driven by math—shielding turns you into a block, which I figured out how to do by reading up on alternate ways of measuring distance. Math and geometry can be both expressive and beautiful, and I want to share that feeling with the world.
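The post doesn't spell out which distance function is involved, but one plausible reading (my speculation, with hypothetical code) is the p-norm family: the set of points at distance 1 from the origin is a sphere under the usual Euclidean norm (p = 2) and approaches a cube as p grows, so animating p can morph a shape between the two:

```python
def p_norm(v, p):
    # Generalized distance: p=2 is Euclidean; as p grows this approaches
    # the Chebyshev (max-coordinate) norm, whose unit ball is a cube.
    return sum(abs(c) ** p for c in v) ** (1.0 / p)

# The corner direction (1, 1, 1) is at Euclidean distance sqrt(3) ~ 1.73,
# but its distance tends to 1.0 as p grows: the "sphere" bulges into a cube.
for p in (2, 4, 8, 1000):
    print(p, round(p_norm((1.0, 1.0, 1.0), p), 4))
```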
The Canadian community views edits by Telenav employees with the quality assurance tool ImproveOSM very critically. User Mihai Iepure asks for a short summary of the emails in English, as he does not know French.
Adam Old, a member of a “Tree Board” in South Florida, asks whether they may use OpenStreetMap for their collection of trees and the metadata of those trees.
Denis Stein asks on the OpenRailwayMap mailing list where exactly and how to map points (switches, turnouts), and makes some suggestions.
Jojo4u asks if railway=technical_station is obsoleted by railway=service_station.
Analysing MAPS.ME changesets, manoharuss found several typical rookie mistakes but surprisingly little misuse of name=*.
Krishna Nammala from the Mapbox data team reported in the German forum on the status of their efforts to fix “missing turn restrictions” in Germany.
Jojo4u created a new proposal for tagging mud flat trails.
Srividya from the Mapbox data team reported a notable data offset of 15–20 meters with respect to GPS traces in Taiwan. Mapbox has therefore provided new satellite imagery for the larger cities.
Community
Joe Morris, a cyclist interested in public drinking water spots, created a map of them. To grow the data, he would like to create a dedicated site for mapping such locations and asks for feedback.
Søren Johannessen notes that OpenStreetMap reached 200 million buildings in its database.
Imports
Gianmario Mengozzi proposes on the import mailing list a boundary import for the northern Italian region of Emilia-Romagna from a CC0-licensed source.
Maps
The application GNOME Maps is slowly becoming usable. After the end of MapQuest Open it switched to Mapbox tiles, and it now has aerial imagery, a very basic search function, and routing based on GraphHopper.
GeoHipster will again be publishing a calendar in 2017, and has asked mapmakers to send in maps that might be included in the official calendar.
Hans Hack created a poster of the 104 islands in the German capital Berlin.
The OSM Carto developers plan to switch to Noto, a new font. They ask for feedback from readers in Asian countries.
[1] Thej from India analyzes in his blog how Indian languages are handled in OpenStreetMap (Mapnik). Arun Ganesh points, among other things, to the multilingual map by Jochen Topf and to his own experiments.
Jochen Topf announces the new features of libosmium 2.9.0 and Osmium-Tool 1.4.0 in his blog. The latter now also allows changing tags with sed and generates human-readable diffs.
Both Forbes and TechCrunch report on the technical assistance that, among others, OpenStreetMap Italy has provided after the earthquake.
Paul Groves explains how 3D mapping could be used to enhance the accuracy of GNSS in cities. As a data source for the needed 3D models, he suggests using OpenStreetMap data.
Peter Richardson of Mapzen describes in his blog post how he generates 3D models with Heightmapper from Mapzen’s high-quality open-source terrain data.
Note: If you would like to see your event here, please put it into the calendar. Only events that are in the calendar will appear in weeklyOSM. Please check your event in our public calendar preview and correct it where appropriate.
This weekly was produced by Hakuch, Nakaner, Peda, Rogehm, Softgrow, YoViajo, derFred, jinalfoflia, kreuzschnabel, mgehling, wambacher.
“Crowd” might be a bit of a stretch for fewer than a hundred contributors, but George Bellas Greenough (1769–1839), one of the founders of the Geological Society of London, produced, posthumously, the first geological map of India, which was published in 1855. Greenough was the first president of the Geological Society of London and was reportedly best known for his ability to compile and synthesize the works of others; his annual address to the Society was apparently much appreciated. He was, however, entirely against the idea that fossils could be used to differentiate strata, and in that he failed to admire William "Strata" Smith, who produced the first geological map of England. An obituarist noted that Greenough was an outspoken critic of theoretical frameworks and a "drag" on the progress of the science of geology!
Not much has been written about the history of the making of the Greenough map of Indian geology. It was begun sometime in 1853, was finally published in 1855, consisted of four sheets, and measured 7 by 5¾ feet. A small number of copies were made, which are apparently collector's items, but hardly any are available online for anyone wishing to study the contents. The University of Minnesota has a set of scanned copies of three-fourths of the map, but to read it you need to download three large files (each about 300 MB!). I decided to stitch together these images and enhance them a bit, and since the image is legally in the public domain (i.e., copyright expired), I have placed it on Wikimedia Commons. There really is a research need to examine the motivations for making this map and how Greenough went about it. He apparently had officers of the East India Company providing him information, and he seems to have sent draft maps on which they commented. There is a very interesting compilation of the correspondence that went into the making of this map. It has numerous errors, both in the geology and in the positions and labelling, but is definitely something to admire for its period.
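For anyone curious, the basic stitching step can be done in a few lines. Here is a minimal sketch (my own, not the actual workflow used; the filenames and simple side-by-side layout are hypothetical, and real scans would likely need alignment and cropping first) using the Pillow library:

```python
from PIL import Image

# Hypothetical filenames for the three scanned portions of the map.
sheets = [Image.open(f"greenough_sheet_{i}.png") for i in (1, 2, 3)]

# Paste the sheets side by side onto one white canvas.
total_w = sum(s.width for s in sheets)
max_h = max(s.height for s in sheets)
combined = Image.new("RGB", (total_w, max_h), "white")
x = 0
for s in sheets:
    combined.paste(s, (x, 0))
    x += s.width

combined.save("greenough_1855_combined.png")
```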
One has to lament that nobody has made a nice geological map in recent times showing interesting regional formations, fossil localities and so on. So much for our human-centricity and recentism.
Here is a small overview of the 1855 map. You can find and download the whole image on Wikimedia Commons.
Last month, Mohamed Ouda celebrated his tenth year on Wikipedia. It has been an immersive experience, during which he has made every effort to expand the website, help others become as accomplished, and promote the culture of free knowledge.
His story with the movement began when he was searching for information on the Arabic Wikipedia in August 2006. At the time, the emerging encyclopedia had around 10,000 mostly very short articles, and the article he was reading had a note stating that “This is a stub article. Please help expand it.”
It was a message that many readers might ignore, but Ouda was different. “I decided to contribute to Wikipedia,” the first entry on Ouda’s personal user page reads, “because the Arabic Wikipedia was very small compared to other major languages on Wikipedia.”
In the last decade, Mohamed, a native of Cairo, has edited Wikipedia over 21,000 times, created nearly 900 new articles, and expanded many more. Some of the articles he has developed, like Disney’s 1992 film Aladdin, have met the community’s quality standards—qualifying them to be among the Arabic Wikipedia’s “good” articles (English).
In our interview with Ouda, he reaffirmed that enriching digital Arabic content is still his main motivation for contributing to Wikipedia. “The shortage of digital content in Arabic is still what keeps me editing every day,” he says. His efforts and those of other Arabic-language Wikipedians have made a noticeable dent in this shortage. Today, the Arabic Wikipedia has just under 450,000 articles—less than a tenth of the English Wikipedia’s five million, but 200,000 articles stronger than three years ago. Still, there is much room for improvement; Arabic is the fifth most-used language in the world.
Ouda’s persistent efforts to enhance his language’s presence on the internet started with topics close to him. His first few edits included creating an article for the academy from which he obtained his university degree and expanding the article about the neighborhood where he lived. He quickly moved on to writing important missing articles about parts of the body, like the tongue and hair, and about psychological disorders like narcissism.
Within a few months, Ouda was nominated by the community to be an administrator on the Arabic Wikipedia; few opposed the move.
When the Wikipedia Education Program launched in Egypt, Ouda joined it to apply his Wikipedia expertise in helping the program’s students. “The program’s idea was exciting to me,” says Ouda. “Assigning students to edit Wikipedia articles in the field of their study is wonderful, as it increases article quality. I wanted to do anything I could to support that initiative.” Ouda served as a campus ambassador in the first edition of the program at Cairo University in 2012 and has provided needed training to his fellow campus volunteers and university students.
For Ouda, a positive community is key to supporting Wikipedia. “The first time I personally met up with Wikipedians was at Wikimania 2008 in Alexandria,” Ouda recalls. “That was a turning point in my Wikipedian experience. Meeting people offline and online, learning from others’ experiences and even hanging out together is always encouraging in an online community. I’ve seen many Wikipedians quit contributing because of negative communication experiences, namely arguments with trolls and online harassment.”
Before co-founding the Egypt Wikimedians user group, Ouda had long striven to establish a Wikimedia-affiliated group to help “connect people and facilitate holding activities in Egypt.” As an active member and co-founder of the user group, he helps define the needs for the activities and projects held in Egypt, making plans and securing funding. Ouda and the user group have helped organize several writing and photography contests and backed the expansion of the education program. They are currently working on plans to host the third WikiArabia conference early next year.
Samir Elsharbaty, Digital Content Intern, Wikimedia Foundation
“The decade” is a new blog series that profiles Wikipedians who have spent ten years or more editing Wikipedia. If you know of a long-time editor who we should talk to, send us an email at blogteam[at]wikimedia.org.
Public licenses for databases don’t work well. Before going into solutions to that problem, though, I wanted to talk briefly about some things that are important to consider when thinking about solutions: real-world examples of the problems; a common, but bad, solution; and a discussion of the motivations behind public licenses.
On the flip side, because these rules are based on such flimsy legal grounds, sophisticated corporate legal departments often feel comfortable circumventing the requirements by exploiting loopholes. (Needless to say, they don’t blog about the problems with the licenses – they just go ahead and use the loopholes.) So overreaching attempts to create new rights are, in many ways, the worst of both worlds: they hurt well-intentioned cooperation, and don’t dissuade parties with a significant interest in exploiting the commons.
What not to do: create new “rights”
When thinking about solutions, it is unfortunately also important to say what isn’t a good idea: creating new rights, or overriding limitations on old ones. The Free Software Foundation, to their great credit, has consistently said that if weakening copyright also weakens the GPL, they’ll take that tradeoff; and that, vice versa, the GPL should not ask for rights that go beyond copyright law. The most recent copyleft licenses from Creative Commons, Mozilla, and the FSF all make this explicit: limitations on copyright, like fair use, are not trumped by our licenses.
Unfortunately, many people have a good-faith desire to see copyleft-like results in other domains. As a result, they’ve gone the wrong way on this point. ODbL is probably the most blatant example of this: even at the time, Science Commons correctly pointed out that ODbL’s attempt to create database rights by contract outside of the EU was a bad idea. Unfortunately, well-intentioned people (including me!) pushed it through anyway. Similarly, open hardware proponents have tried to stretch copyright to cover functional works, with predictably messy results.
This is not just practically wrong, for the reasons I’ve explained in earlier posts. It is also ethically wrong for those of us who want to see more data sharing, because any “rights” we create by fiat are going to end up being used primarily to stop sharing, not encourage it.
Remembering why we do share-alike and attribution
Consider this section a brief sketch for a future post – if I forgot something big, please let me know, but please don’t roast me in comments for being brief or reductive about your favorite motivation.
It is important when writing about public licenses to remember why the idea of placing restrictions on re-use is so intuitively appealing outside of software. If we don’t understand why people want to do less-than-public-domain, it’s hard to come up with solutions that actually work. Motivations tend to be some combination (varying from person to person and community to community) of:
Recognition: Many people want to at least be recognized for their work, even when they ask for nothing else. (When Creative Commons assessed usage after their 1.0 licenses, [97-98% of people chose attribution](https://creativecommons.org/2004/05/25/announcingandexplainingournew20licenses/).) This sentiment underlies many otherwise “permissive” licenses, as well as academic norms around plagiarism and attribution.
Reducing free riding: Lots of people are afraid that commons can be destroyed by people who use the resource without giving back. Historically, this “tragedy of the commons” was about [rivalrous](https://en.wikipedia.org/wiki/Rivalry_(economics)) goods (like fisheries), but the same concern is often raised in the context of collaborative communities, whose labor can be rivalrous even when their goods are non-rivalrous. Some people like share-alike requirements because, pragmatically, they feel such requirements are one way to prevent (or at least reduce) this risk by encouraging people to either participate fully or not participate at all. (If you’re interested in this point, I’ve [written about it before](http://lu.is/blog/2014/12/02/free-riding-and-copyleft-in-cultural-commons-like-flickr/).)
“Fairness”: Many people like share-alike out of a deep moral sense that if you take, you should also give back. This often looks the same as the previous point, but with the key distinction that at least some people focused on fairness care more about process and less about outcomes: a smaller, less productive community with more sharing may, for them, be better than a larger, more productive community where not everyone shares perfectly.
Access to allow self-help: Another variation on the previous two points is a use of copyleft that focuses less on “is the author helping me by cooperating” and more on “did the author give me materials I can then use to help myself”. In this view, increased access to raw material (like source code, or data) can be good even if the authors are non-cooperative. (To those familiar with the Linux kernel discussions, this is essentially “I got a lousy driver, and the authors hate me, but at least I got *a* driver”.)
Ethical: Many people simply think data/source should never be proprietary, and so will use any means possible, like copyleft, to increase the amount of non-proprietary code in the world.
Statistics seem to drive the news lately. It’s an election year, so people are obsessing over polls. Public policy discussions are being driven by research and analysis. Now more than ever, people are thinking and talking about statistics and what they mean.
Wikipedia attracts millions of online queries every day. Search for something on the web, and you’re bound to end up on Wikipedia. So it’s crucial that the information the public finds is reliable, accurate, and comprehensible.
Unfortunately, statistical information isn’t always presented in the clearest way on Wikipedia. When it comes to understanding statistics, articles often have a lot of room for improvement. Some articles are great: The Monty Hall Problem, for example.
But what if articles on important statistical topics, such as Deviance or Causal inference, were just as easy for online passersby to understand?
This summer, I’ve been attending conferences such as the Joint Statistical Meetings (JSM) to help identify and remedy these information gaps. We’re asking instructors to assign their students to think deeply about their fields, and then make a change to the way others access that information. In other words: students can help simplify complex language and bring a wider understanding of statistics to the public.
A few instructors raised concerns about whether students could create that kind of high-quality content. But when I mentioned the help of our content experts, our online trainings, and our scaffolded approach to writing for Wikipedia (including our freshly updated course timelines), they saw the possibilities.
Fittingly, Milo Schield of Augsburg College wants his students to work together to update the Wikipedia article on Statistical literacy. While the page does have some good information, and touches upon the complications in using statistics in advertising, Dr. Schield saw ways for his students to expand the article. With more sections, good citations, and a clear vision, this page could become a great source for individuals who want to learn more about statistics.
Our work at the JSM is just one step in our larger Year of Science – a year-long initiative to improve the way students think about and access science knowledge. By communicating what they know, student editors think about how to share their work with the world. But they’ll also think deeply about how to assess the information they find online. Along the way, millions of readers get access to the world’s most-read open knowledge resource: Wikipedia.
Are you an American or Canadian instructor working in statistics in higher ed? Do you want to work with your students to help improve Wikipedia? Check out more information here, or contact us at contact@wikiedu.org. I hope to hear from you soon!
It has been 20 days since the launch of Wiki Loves Monuments 2016, and it is time for an update on what we have achieved together. In the majority of the more than 40 participating countries, photo submissions are accepted until September 30.
Participation
So far, more than 120,000 photos have been uploaded by 5,741 participants in the contest’s 41 participating countries. The highest number of uploads comes from Germany, with more than 27,000 photos (more than 50% of them by User:Tilman2007), while India tops the chart with 1,196 unique participants!
Compared to this time last year, the total number of photo uploads has increased by more than 10%. In the same period, the number of Wikimedia accounts created right before uploading a photo to the contest (a strong signal for the number of accounts created as part of the contest) has increased by almost 80%!
Usage
While it is too early to showcase the usage of photos uploaded as part of Wiki Loves Monuments 2016, it is nice to see that some of the participating photos have already been added to more than 3,400 pages on Wikimedia projects.
What’s next?
We know that historically the majority of the photos have been uploaded in the last week of the contest. With less than 10 days left in the majority of the participating countries, now is our last chance as WLM local organizers and enthusiasts to get the word out about the contest, organize local tours and photo upload events, and more. Remember that all it takes to participate is to find a monument nearby, photograph it, and submit the photo to the competition. You can win a prize and help Wikipedia.
For more statistics about the contest, please check wikiloves or wlm-stat.
Photo: Saint Samaan the Tanner Monastery in Cairo, Egypt. By: Hoba offendum, CC BY-SA 4.0.
Sometimes the simplest of actions can create unexpected change in the world. That is what happened when Vassia Atanassova decided to write 100 Wikipedia articles in 100 days as a challenge to herself. She called it #100wikidays, shared her challenge on Facebook, and quickly inspired dozens of other Wikipedia editors to take on the same challenge.
In another corner of the world and many years earlier, Liam Wyatt started sending emails to museums to propose a new form of partnership: a “Wikipedian in residence.” The British Museum said yes, which led to the first GLAM-Wiki program of this sort. Five years later, in 2015, he found himself giving a presentation at the Soumaya museum in Mexico City to inspire the local community to start a residency of their own.
There are now 110 Wikipedians in residence all over the world, and 7,500 articles have been created through the #100WikiDays challenge. Vassia and Liam are only two of the many Wikimedians who have boldly stepped up to help other Wikimedians succeed in our shared educational mission. This support pattern is consistent across the movement, and we, the Wikimedia Foundation, would now like to know how to best support it to help Wikimedia communities thrive.
What do you mean, leadership?
In the Wikimedia world, almost every contributor has to “be bold” and step up to the challenges of guiding the projects and activities to success. In turn, experienced individuals become models for others and help mentor newcomers to participate in our projects and to continue to grow our community. This form of mentoring, or leadership, or collaborative guiding of the communities is an absolutely crucial part of meeting our vision: to freely share in the sum of all knowledge.
As we pursue this world-changing mission, leadership is not something beyond us, and it is not about a single person leading but about a great many people: it is a shared practice that lies at the core of our culture. Wikimedia is a movement made of many volunteers leading through everyday acts to liberate knowledge, and helping others to do the same.
What can we do to build leadership?
The Wikimedia Foundation’s Community Engagement department, along with movement affiliates, supports and collaborates with these leaders, mentors, and guides. However, there are many people throughout the movement who don’t get direct support for their leadership development activities. For every community guide that movement organizations identify, there are dozens more within our movement who could lead, if given access to the right skills or encouragement. As a community, we should seek greater engagement of community leaders, and to do so we need a shared vision.
That is why we are launching the Leadership Development Dialogue. We need your help in refining not only what the Wikimedia Foundation provides in terms of direct training activities for community leaders, but also how we describe those leaders.
Over the last year, we have engaged focus groups to explore shared understandings of the kinds of “leadership” traits we need in new leaders within our communities—and we found that we want very similar things: empathetic community organizers who know how to inspire our communities and make them more sustainable without alienating others. However, the word “leader” does not translate well between languages and cultures; it can mean anything from inspiring and engaging new participants to exercising dictatorial control over projects or activities. We certainly don’t want that confusion!
We need your help!
From now through October 16, 2016, we invite you to comment on two items. First, we’d like your thoughts on how we can design for appropriate inclusivity, achieve the goals of peer mentoring and leadership development, and develop additional support infrastructure to reinforce important skills in the Wikimedia communities; and second, how we should describe leaders in our movement, down to the word(s) we use to identify them and the skills that make them who they are.
We invite you to join our conversation and help us refine what it means to develop leadership for program and community development.
Alex Stinson, GLAM-Wiki Strategist
Jaime Anstee, Senior Strategist, Manager
María Cruz, Communications and Outreach Coordinator
Wikimedia Foundation
The RevisionSlider is an extension for MediaWiki that has just been deployed on all Wikipedias and other Wikimedia websites as a beta feature. The extension was developed by Wikimedia Germany as part of its focus on the technical wishes of the German-speaking Wikimedia community. This post will look at the RevisionSlider’s design, development, and use so far.
What is the RevisionSlider?
Once enabled, the slider appears on MediaWiki’s diff comparison page, where it aims to let users more easily find the revision of a page that introduced or removed some text, as well as making it easier to navigate the page’s history. Each revision is represented by a vertical bar: bars extend upward from the centre of the slider for revisions that added content and downward for those that removed content. Two coloured pointers indicate the revisions currently being compared, and their colour coding matches the colour of the revision changes in the diff view. Each pointer can be moved by dragging it to a new revision bar or by clicking on the bar; the diff is then reloaded using Ajax for the user to review. For pages with many revisions, arrows at the ends of the slider allow moving back and forward through revisions. Extra information about the revisions represented by bars is shown in a tooltip on hover.
Deployment & Usage
The RevisionSlider was deployed in stages: first to test sites in mid-July 2016, then in late July 2016 to the German Wikipedia and a few other sites that had been proactive in requesting the feature, and finally to all Wikimedia sites on 6 September 2016. In the five days following the deployment to all sites, the number of users of the feature more than doubled, from 1,739 to 3,721, according to the Grafana dashboard https://grafana.wikimedia.org/dashboard/db/mediawiki-revisionslider. This means the beta feature now has more users than the “Flow on user talk page” feature and, unless we see a sudden slowdown, will soon overtake the number of users with ORES enabled: https://grafana.wikimedia.org/dashboard/db/betafeatures.
The wish
The wish that resulted in the creation of the RevisionSlider was wish #15 from the 2015 German Community Technical Wishlist; the Phabricator task can be found at https://phabricator.wikimedia.org/T139619. The wish, roughly translated, reads: when viewing a diff, a section of the version history, and especially the edit comments, should be shown. Much discussion followed to establish the actual issue the community was having with the diff page, and the consensus was that it was generally very hard to move from one diff to another. The standard process within MediaWiki requires the user to start from the history page to select a diff. The diff then allows moving forward or backward revision by revision, but big jumps are not possible without first navigating back to the history page.
The first test version of the slider was inspired by a user script called RevisionJumper. This script provides a drop-down menu in the diff view with various options to jump to a version of the page considerably before or after the currently shown version. This can be seen in the German example below.
The WMF Community Tech team worked on a prototype during autumn 2015, which was then picked up by WMDE at the Wikimedia Jerusalem hackathon in 2016 and pushed to fruition.
The initial version of the slider is great and provides functionality that can be easily used by many editors and readers. Further developments will include:
Painting by Harry Wilson Watrous, public domain/CC0.
Over the past 15 years, Wikimedians have collaboratively built some of the most amazing projects on the Internet and for free knowledge. Editors on Wikipedia, contributors to Commons, and administrators on other sites are united in their goals of collecting human knowledge and making it accessible and reusable for free and for everybody in the world. They work hard towards this goal, contributing an impressive amount of time and effort.
Wikimedians not only want to collect knowledge, they also want to get that knowledge right. They care a lot about the factual quality of the information on Wikimedia projects—about complying with copyright (e.g. for images that illustrate Wikipedia), about freedom of expression, about neutrality and the strength of underlying sources. In order to collaboratively build and improve content, Wikimedians discuss their views on talk pages, on email lists, and in one-to-one conversations. Naturally, when editors, administrators, and other contributors disagree about certain issues, they will argue. Oftentimes, they have passionate debates, fiercely defending their point of view. And at times their disagreements escalate in ways that lead Wikipedians to use harsh words or be abusive to each other. In some cases, however, bad behavior seemingly comes out of nowhere, for instance when users personally attack or troll others or engage in acts of vandalism. These are issues on the Internet that have been written about in various places and have been researched in our community. The existence of these problems on Wikipedia is something that we are not proud of.
However, this is not a phenomenon that only exists in Wikimedia discussions: many websites that facilitate user contributions or comments see harsh conversations and personal attacks among users. As most of our communication moves online, including important democratic discourse, speech that threatens sincere conversations and debates increasingly becomes a problem. That’s why we are pleased to see that there are different initiatives that seek to address the issue of harassment online. We try to learn from those initiatives and hope they will succeed. At the same time, we know that we cannot rely on the work of others to make sure that the Wikimedia projects are safe for everyone to access and contribute to free knowledge. Rather, we are determined to create a friendly space ourselves where people can gather to collect encyclopedic information and educational content.
There are several connected reasons for us to do this. An unfriendly or even toxic environment can be an impenetrable barrier to accessing knowledge. We cannot expect people to join our movement and contribute to our mission of collecting free knowledge if they don’t feel comfortable on our websites. Yet, in order to build an exhaustive encyclopedia that covers diverse views and perspectives, we need as many people as possible to contribute to Wikipedia and our other Projects. In today’s world, people have many options for spending their free time, and negative experiences would seriously threaten the success of our movement’s work. It has also been argued that platforms have an ethical obligation to protect their users from abusive behavior through community management. Finally, we also believe that productivity is diminished by a harsh tone, and especially by polemic and aggression.
While it is clear to us that many of these reasons deserve further research, we also recognize that one challenge to our intention is finding the right balance between promoting free speech and curbing harassment. Wikimedia’s values build on democratic decision-making and collaboration. So we started the process of developing principles for interaction on the Projects by asking the community for input. At this year’s Wikimania, the annual gathering of the global Wikimedia movement, together with roughly fifty participants, we discussed Wikimedians’ experiences with existing codes of conduct and policies on Wikipedia, Commons, and other projects. We discussed participants’ expectations for communication on- and off-Wiki, and collected recommendations for behavior in arguments, in disputes over facts, and in questions of compliance with guidelines on the Wikimedia sites.
Five patterns have emerged:
Offer constructive criticism. Offer options.
Treat people as you would like to be treated. No personal attacks. Be empathetic.
Re-read your contributions. Be patient. Think: this is how x makes me feel.
If you see something bad, say something.
Connect on a human level. Apologize. Get off-Wiki for a second. Rewind.
We believe these principles for interaction can help us create a friendly space for all contributors and newcomers alike. The Wikimedia Foundation is taking this issue very seriously and working on developing better training for volunteers to discourage abuse and better resolve disputes; you can participate in that project on Meta. We invite you to discuss these principles with your community and to let us know what you think about them in the comments below. This is only the start of a larger conversation that we need to have in order to ensure the continued success of the Wikimedia projects and access to knowledge for everyone.
Patrick Earley, Senior Community Advocate (International)
Jan Gerlach, Public Policy Manager
Wikimedia Foundation
A reader uses Wikipedia mobile for the first time to get an overview on a resistor. Photo by Abigail Ripstra, CC BY-SA 4.0.
As the saying goes, a picture is worth a thousand words. Yet images on mobile devices can translate to more data used. In many parts of the world, high mobile data costs present a significant barrier to accessing knowledge on the Wikimedia sites.
To address this, the Wikimedia Reading web team has made the article download process on Wikimedia mobile sites more efficient by preventing unnecessary image downloads. We’ve already seen the positive impact of this change on the amount of data used to access Wikimedia mobile content around the world.
(If you’re a developer who is curious about how the change was made, we have a complete rundown in the last section of this post.)
Why we made the change
As of this year, over half of Wikimedia’s traffic comes from mobile devices. Readers access Wikipedia through mobile now more than ever, and we have to continue to understand and build for our readers’ changing needs.
From the Foundation’s work with the New Readers initiative, we know that in places like Nigeria and India, high data costs are considered one of the largest barriers to accessing and reading Wikipedia. Feature phones and lower-grade Android smartphones are the primary devices for connecting to the internet, and in Nigeria, internet access has been prohibitively expensive. Data is a precious commodity in many countries, due to high bandwidth costs, bandwidth caps, and inconsistent internet connections.
For context, the average web page consumes about 2.3MB of a mobile data plan. A web page is composed of several elements, including the text you read, the CSS code that styles its interface, the JavaScript code that makes the page more interactive, and the images that illustrate it. Browsers do a good job of downloading these elements efficiently, but images and text remain the biggest consumers of data, in that order.
To illustrate this impact: as of June 2016, the article about Japan on the Japanese Wikipedia contained 1.4MB of images, 195KB of text, 157KB of JavaScript, and 8KB of CSS. Without loading any of the images, the article would translate to about US$0.03 in mobile data costs (on a post-paid data plan in Japan), rather than US$0.15 with all the images loaded.
Similar stories can be told for people in Brazil reading the Portuguese article about Brasil or people in the United States reading the Barack Obama article in English.
We made this change because our research indicated that many of our mobile users, despite downloading an entire article, do not read every single word. On the mobile site, many people presumably use Wikipedia as a quick fact lookup. Knowing this, we were concerned about the amount of images people downloaded unnecessarily, and how those downloaded images might then impact their ability to consume knowledge.
Photos are a ubiquitous element of Wikipedia’s most popular and highest quality articles, and this change now means that your phone will only load images as you scroll down a page, rather than on opening a page.
How much more efficient?
We wanted to see how this change impacted readers, so we looked at the traffic to our image servers across three language wikis for a week-long period before and after the change was made. We restricted our analysis to images that had been requested by page views—to avoid requests from external websites that we cannot control—by looking for an HTTP referrer header (a piece of information sent by web browsers to describe the context in which a request was made). We analysed the English Wikipedia because it has the highest traffic volume, as well as the Japanese and Indonesian Wikipedias because these languages are mostly spoken inside a single geographical area—as we were also interested in the impact on speed, we wanted to rule out factors such as distance from the closest data center that would affect our results.
Our analysis showed that on the mobile site of Indonesian Wikipedia, our data centers served our visitors 187 gigabytes less, a 32% decrease compared to a week before the change. For the same period on English Wikipedia, the decrease in data usage was even greater: we shipped 4.9 terabytes less than normal (that’s enough data to fill 1042 DVDs), resulting in a 47% decrease. On the Japanese Wikipedia, the results were similar—we saw a 51% decrease in data usage. Projecting the savings across all of Wikipedia, we hope to annually save our users 450 terabytes of mobile data!
This reduction in data usage means web browsers will load Wikipedia pages in less time, because there’s less to load. Certain users on slower connections may even find their web pages display quicker, as there are now fewer requests battling for bandwidth. We’re now looking into whether these changes are significant, which can be challenging due to the limitations of older browsers, the scale of Wikipedia’s traffic and the limited information we collect about our users in keeping with our strong commitment to user privacy.
To further demonstrate the impact of this change, let’s go back to the example of the Japan article on the Japanese Wikipedia, which weighed 1.76MB, and consider a 500MB data plan. Assuming the user accessed the internet for no other purpose, that article could have been consulted 9 times each day for a month before the reader incurred additional charges or lost internet connectivity. After our changes, on that same data plan, that particular article weighs only 530KB and could be viewed up to 30 times a day!
Next steps
The positive results that we are seeing are just the start. We are currently monitoring our page view traffic to see if this change leads to readers spending more time on our websites. The Wikimedia Foundation is also working on reducing the amount of JavaScript and CSS we serve, as well as thinking about ideas around speeding up their delivery. We are exploring how using new open web technologies such as Service Workers can help get content to our users more quickly. We’re also thinking about offline use cases for those users who, at times, may have no connection at all. Outside mobile, we hope to explore how we might apply similar enhancements for our desktop readers.
Let us know how these changes have impacted you using this wiki page. Do you notice the difference? How has this changed your mobile reading experience? Have you noticed any bugs? What else could we be doing? We’d love to hear your thoughts.
How we did it (technical)
We also wanted to outline exactly how we made this change for technical audiences who might find the information useful. This section details how we prevented images from downloading unnecessarily, and is aimed at a developer audience.
Any image inside a block of HTML will be loaded unconditionally, so the only way to avoid this was to remove our image tags from the HTML output.
Rather than outputting an image into our HTML, we wrapped the image inside a <noscript> tag and appended a placeholder element with all the information needed to render the image via JavaScript. Our users who didn’t have JavaScript enabled would see the image inside the <noscript> tag and not benefit from the optimisation. For those with JavaScript, we had enough information to load the image when necessary.
<noscript>
<img alt="A young boy (preteen), a younger girl (toddler), a woman (about age thirty) and a man (in his mid-fifties) sit on a lawn wearing contemporary c.-1970 attire. The adults wear sunglasses and the boy wears sandals." src="//upload.wikimedia.org/wikipedia/en/thumb/3/33/Ann_Dunham_with_father_and_children.jpg/300px-Ann_Dunham_with_father_and_children.jpg" width="300" height="199" class="thumbimage" data-file-width="320" data-file-height="212">
</noscript>
<span class="lazy-image-placeholder" style="width: 300px;height: 199px;" data-src="//upload.wikimedia.org/wikipedia/en/thumb/3/33/Ann_Dunham_with_father_and_children.jpg/300px-Ann_Dunham_with_father_and_children.jpg" data-alt="A young boy (preteen), a younger girl (toddler), a woman (about age thirty) and a man (in his mid-fifties) sit on a lawn wearing contemporary c.-1970 attire. The adults wear sunglasses and the boy wears sandals." data-width="300" data-height="199" data-class="thumbimage"></span>
For those with JavaScript enabled, we listened to the window scroll event and, for any unloaded images (those with temporary placeholders), loaded them when they moved close to the viewport. We wanted the experience of loading an image to be seamless, so we used a generous offset to load images before they might be needed. We also checked whether the placeholder was visible, given that it might be in a collapsed section; in that case, the image is shown when a reader expands the section.
Many websites use a lower-resolution image as a placeholder. We decided against this because we felt it would be detrimental to the goal of avoiding unnecessarily sending bytes to our users. Instead, we relied on a CSS animation to ease the transition from no image to image.
There was another set of users we had to consider—those with older browsers. To provide a better experience on older browsers, we avoid running the main JavaScript, even if it is enabled. For these browsers we instead injected a small amount of JavaScript that replaces the placeholder with the original image tag, copying across all the necessary attributes. We were careful to use methods that enjoy broad browser support: for example, rather than using getElementsByClassName, we used the even more widely supported getElementsByTagName, which is supported by virtually all browsers.
var ns, i, p, img;
ns = document.getElementsByTagName('noscript');
for (i = 0; i < ns.length; i++) {
    img = ns[i].textContent; // sketch of the rest: the original <img> markup kept inside the <noscript>
    p = ns[i].nextSibling; // the placeholder that follows it
    if (p && p.outerHTML !== undefined) { p.outerHTML = img; } // swap placeholder for the image
}
The biggest challenge we experienced was ensuring that the lazy image placeholders we were adding would not disrupt the presentation of the content. For example, images might be inline or block elements. We spent the majority of our time tweaking CSS rules to ensure disruption was as minimal as possible. If you happen to find any bugs with our implementation, please raise them!
Jon Robson, Senior Software Engineer, Wikimedia Foundation
Imagine thousands of librarians from all parts of the world descending on a midwestern town in the United States. What would they talk about?
To find out, the Wikimedia Foundation’s Wikipedia Library and GLAM-Wiki team traveled to Columbus during August to attend the World Library and Information Conference 2016 (#WLIC2016) hosted by the International Federation of Library Associations (IFLA) and its institutional supporters. We went to the conference hoping to help the library community get excited about the opportunities for collaborating with Wikipedia, by hosting an exhibit booth and giving a presentation.
And to our delight, we didn’t have to get people excited and start the conversation about Wikipedia in the library communities—librarians from all over the world were already doing it for us! We even found several presentations about Wikipedia editing campaigns hosted by libraries, such as participation in Art + Feminism.
Wikipedia became a hot topic throughout the conference, creating a backchannel of conversation on Twitter. But one conference only reaches a small community; that is why we have been working with the team at IFLA Headquarters to support the development of two public white papers, which we first launched at WLIC 2016.
How do we expose the world’s librarians to Wikipedia?
Enter libraries the world over: wherever you find patrons using the internet, you will frequently also find patrons browsing Wikipedia. It’s hard to search the internet for information and not end up using Wikipedia’s information, whether you know it or not. But do patrons get the best information for the topic they are looking for? Do they have the skills needed to use Wikipedia as part of a research process that helps with learning and the advancement of human society?
We spoke to these points and more in our IFLA presentation. The talk introduced two draft papers produced by committees of volunteers and librarians who have explored the opportunities for Wikipedia and libraries to collaborate. These committees, chaired by Wikimedians and library advocates Alex Hinojo of Amical Wikimedia and Mylee Joseph of the State Library of New South Wales, created very strong first drafts of the papers, which survey the opportunities for libraries to become more engaged in the Wikimedia community.
However, these committees were only able to scratch the surface of the experiences libraries and Wikipedia communities have with each other. For just one example, at the conference we discovered that agricultural libraries at Cornell and Arizona State University had organized an editathon to cover key topic areas in agriculture! The Wikipedia and library communities are vast, and it’s almost impossible for a small group of people to document everything: that’s where we need your help!
We need you to help us refine the conversation!
For individual libraries, it’s often hard to find the right skills, models, and tools for collaborating, so that libraries’ efforts to improve access to information can work hand in hand with Wikipedians’ efforts to share the sum of the world’s knowledge freely. However, there are hundreds of example partnerships throughout the world that have demonstrated just how effective collaboration can be.
That’s where the white papers can help: they seek to document the best of these opportunities. And we need your help expanding and refining them so that we can share examples from around the world with anyone who wants to try them. We invite you to join the conversation by reading and commenting on the following documents:
Thank you for your feedback, and for working to build the landscape of opportunities for Libraries and Wikipedia!
Alex Stinson, GLAM-Wiki Strategist, Wikimedia Foundation
Julia Brungs, Policy and Research Officer, International Federation of Library Associations and Institutions (IFLA)
When a state government owns up to its wrongdoings, the consequences can be severe. The process of uncovering the past is politically fraught, and the findings are often deliberately obscured. Nonetheless, documenting and distributing those findings, often through truth commissions, is an essential part of a government holding itself accountable for human rights abuses and other malfeasance.
David Webster’s Memory, truth and reconciliation in the developing world course at Bishop’s University focuses on bringing those findings to Wikipedia. In that course, student editors gather reports and write an article about a specific truth commission’s findings.
Webster’s students are making an incredible difference in this area. The course is responsible for more than 20% of the articles on Wikipedia’s list of truth and reconciliation commissions.
It’s an example that fuses human rights, an area many students are passionate about, with the incentive of raising public awareness through Wikipedia. The articles written for this class have been seen 1.83 million times. That’s a stunning impact on the awareness of human rights issues.
But through our online trainings and other tools, students are careful to let their passions guide their interest, but not their writing. Students emphasize careful presentation and balance of views, carefully considering Wikipedia’s policies on the weight given to sources. Students are encouraged to write for Wikipedia in a way that documents the discussion of these panels, not to weigh in on it. That’s an essential skill, one that encourages students to carefully assess their sources, their writing, and their own positions.
We’re very proud of these contributions to Wikipedia, and the excellent work carried out by these students!
Do you have an interest in a similar project for your own course? Wiki Ed can help by providing resources and guidance to make sure students understand the value of balanced, encyclopedic writing. Our staff can provide trainings for students that help them stay on the right side of Wikipedia’s policies. Many instructors use this assignment as a stepping stone to a broader, more reflective position or policy paper. That way, students are motivated by sharing their knowledge of history to raise awareness.
We’d love to hear your ideas. Reach out to us to start a conversation: contact@wikiedu.org.
So I’ve started a new job: I’m now working for the Wikimedia Foundation in the Community Tech team. It’s really quite amazing, actually: I go to “work” and do things that I really quite like doing and would be attempting to find time to do anyway if I were employed elsewhere. Not that I’m really into the swing of things yet—only two weeks in—but so far it’s pretty great.
I’m really excited about being part of an organisation that actually means something.
Imagine a world in which every single human being can freely share in the sum of all knowledge. That’s our commitment.
It’s a bit cheesy to quote that I know, but still: how nice it is to think that there’s something higher up the orgchart than an ever-increasing concentration of money.
The Tech News weekly summaries help you monitor recent software changes likely to impact you and your fellow Wikimedians. Subscribe, contribute and give feedback.
Latest tech news from the Wikimedia technical community. Please tell other users about these changes. Not all changes will affect you. Translations are available.
The new version of MediaWiki will hopefully be on test wikis and MediaWiki.org from 20 September. It will be on non-Wikipedia wikis and some Wikipedias from 21 September. It will be on all wikis from 22 September (calendar). This is the version that was meant to go out last week.
Meetings
You can join the next meeting with the VisualEditor team. During the meeting, you can tell developers which bugs you think are the most important. The meeting will be on 20 September at 19:00 (UTC). See how to join.
My turn to make a #SundayQuery! As Harmonia Amanda just said in her own article, I was about to explain how to make a Python script to fix the results of her query… but I thought I should start with another script, similar but shorter and easier to understand. The script for Harmonia is here, though.
On Thursday, I published an article about medieval battles, and since then, I have started to fix battle items on Wikidata. One of the most repetitive fixes is the capitalization of the French labels: as they have been imported from Wikipedia, the labels have an unnecessary capital first letter (“Bataille de Saint-Pouilleux en Binouze” instead of “bataille de Saint-Pouilleux en Binouze”).
The query
So first, we need to find all the items that have this typo:

SELECT ?item ?label WHERE {
  ?item wdt:P31/wdt:P279* wd:Q178561 .
  ?item rdfs:label ?label .
  FILTER(LANG(?label) = "fr") .
  FILTER(STRSTARTS(?label, "Bataille ")) .
}

The first line of the WHERE clause looks for items that are battles or subclasses of battles, just to be sure I’m not making changes to some book called “Bataille de Perpète-les-Olivettes”… On the next line, I query the labels for the items (?item rdfs:label ?label .) and filter to keep only those in French (FILTER(LANG(?label) = "fr") .). As I need to use the label inside the query and not merely for display (as Harmonia Amanda just explained in her article), I cannot use the wikibase:label service, and so I use the semantic web standard rdfs:label. The last line is a FILTER, which keeps only those results that match the function inside it. Here, STRSTARTS checks whether ?label begins with "Bataille ".
As of the time I write this, running the query returns 3,521 results. Far too many to fix by hand, and I know of no existing tool that would fix that for me. So, I guess it’s Python time!
The Python script
I love Python. I absolutely love Python. The language is great for putting up a useful app within minutes, easily readable (it’s basically English, in fact), not cluttered with gorram series of brackets or semicolons, and it generally has great libraries for the things I do the most: scraping webpages, parsing and sorting data, checking ISBNs[1] and making websites. Oh, and making SPARQL queries, of course[2].
Preliminary thoughts
If you don’t know Python, this article is not the right place to learn it, but there are numerous resources available online[3]. Just make sure they are up to date and for Python 3. The rest of this article assumes that you have a basic understanding of Python (indentation, variables, strings, lists, dictionaries, imports and “for” loops), and that Python 3 and pip are installed on your system.
Why Python 3? Because we’ll handle strings that come from Wikidata and are thus encoded in UTF-8, and Python 2 makes you jump through hoops to use it. Plus, we are in 2016, for Belenos’ sake.
Why pip? Because we need a non-standard library to make SPARQL queries, called SPARQLWrapper, and the easiest way to install it is to use this command:
pip install sparqlwrapper
Now, let’s start scripting!
For a start, let’s just query the full list of the sieges[4]:
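Here it is; this is simply the final script from the end of this post, cut down to stop at printing the raw results:

#!/usr/bin/env python3
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = "https://query.wikidata.org/bigdata/namespace/wdq/sparql"
sparql = SPARQLWrapper(endpoint)
sparql.setQuery("""
SELECT ?item ?label WHERE {{
    ?item wdt:P31/wdt:P279* wd:Q178561 .
    ?item rdfs:label ?label . FILTER(LANG(?label) = "fr") .
    FILTER(STRSTARTS(?label, "Siège ")) .
}}
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

# for now, just dump the raw result on screen to see what we get
print(results)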
That’s quite a bunch of lines, but what does this script do? As we’ll see, most of this will be included in every script that uses a SPARQL query.
First, we import two things from the SPARQLWrapper module: the SPARQLWrapper object itself and a “JSON” constant that it will use later (don’t worry, you won’t have to manipulate JSON files yourself).
Next, we create an “endpoint” variable, which contains the full URL of the SPARQL endpoint of Wikidata[5].
Next, we create a SPARQLWrapper object that will use this endpoint to make queries, and put it in a variable simply called “sparql”.
We apply the setQuery function to this variable, which is where we put the query we used earlier. Notice that we need to replace { and } with {{ and }}: { and } are reserved characters in Python format strings, so we double them to be safe (and SPARQL happily accepts the doubled braces as a nested group).
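If the doubled braces look odd, a quick check in the interpreter shows what a format string does with them:

>>> "WHERE {{ ?item rdfs:label ?label . }}".format()
'WHERE { ?item rdfs:label ?label . }'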
sparql.setReturnFormat(JSON) tells the script that what the endpoint will return is formatted in JSON, and results = sparql.query().convert() actually makes the query to the server and converts the response to a Python dictionary called “results”.
And for now, we just want to print the result on screen, just to see what we get.
That’s a bunch of things, but we can see that it contains a dictionary with two entries:
“head”, which contains the name of the two variables returned by the query,
and “results”, which itself contains another dictionary with a “bindings” key, associated with a list of the actual results, each of them being a Python dictionary. Pfew…
Each of those results is a dictionary that contains two keys (label and item), each of them having for value another dictionary that has a “value” key associated with, this time, the actual value we want to get. Yay, finally!
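Written out as Python data and trimmed to a single result, the whole response looks roughly like this (it follows the standard SPARQL JSON results layout):

{
    "head": {"vars": ["item", "label"]},
    "results": {"bindings": [
        {
            "item": {"type": "uri", "value": "https://www.wikidata.org/entity/Q815196"},
            "label": {"xml:lang": "fr", "type": "literal", "value": "Siège de Pskov"}
        },
        # ... one dictionary like this per result
    ]}
}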
Parsing the results
Let’s parse the “bindings” list with a Python “for” loop, so that we can extract the value:
for result in results["results"]["bindings"]:
qid = result['item']['value'].split('/')[-1]
label = result['label']['value']
print(qid, label)
Let me explain the

qid = result['item']['value'].split('/')[-1]

line: as the item name is stored as a full URL (“https://www.wikidata.org/entity/Q815196” and not just “Q815196”), we need to separate the parts of it that sit between ‘/’ characters. For this, we use Python’s “split()” function, which transforms the string into a Python list containing this:

['https:', '', 'www.wikidata.org', 'entity', 'Q815196']

We only want the last item in the list. In Python, that means the item with the index -1, hence the [-1] at the end of the line. We then store this in the qid variable.
Let’s launch the script:
$ python3 fix-battle-labels.py
Q815196 Siège de Pskov
Q815207 Siège de Silistra
Q815233 Siège de Tyr
Q608163 Siège de Cracovie
Q1098377 Siège de Narbonne
Q2065069 Siège de Hloukhiv
Q4087405 Siège d'Avaricum
Q2284279 Siège de Fort Pulaski
Q4337397 Siège de Liakhavitchy
Q4337448 Siège de Smolensk
Q701067 Siège de Rhodes
Q7510162 Siège de Cracovie
Q23013145 Siège de Péronne
Q10428014 Siège de Pskov
Q3090571 Siège du Hōjūjidono
Q3485893 Siège de Fukuryūji
Q4118683 Siège d'Algésiras
Q5036985 Siège de Berwick
Q17627724 Siège d'Ilovaïsk
Q815112 Siège d'Antioche
Fixing the issue
We are nearly there! Now what we need is to replace the proud capital “S” at the start with a modest “s”:
label = label[:1].lower() + label[1:]
What is happening here? A Python string can be sliced like a list, so we take the part of the string between the beginning of the “label” string and the position after the first character (“label[:1]”) and force it to lower case (“.lower()”). We then concatenate it with the rest of the string (position 1 to the end, or “label[1:]”) and assign all this back to the “label” variable.
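A quick check in the interpreter shows the effect:

>>> label = "Siège de Pskov"
>>> label[:1].lower() + label[1:]
'siège de Pskov'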
Last thing, print it in a format that is suitable for QuickStatements:
out = "{}\tLfr\t{}".format(qid, label)
print(out)
That first line seems barbaric? It’s in fact pretty straightforward: "{}\tLfr\t{}" is a string that contains a first placeholder for a variable (“{}”), then a tabulation (“\t”), then the QuickStatements keyword for the French label (“Lfr”), then another tabulation, and finally a second placeholder. We then use the “format()” function to replace the placeholders with the contents of the “qid” and “label” variables. The final script should look like this:
#!/usr/bin/env python3
from SPARQLWrapper import SPARQLWrapper, JSON
endpoint = "https://query.wikidata.org/bigdata/namespace/wdq/sparql"
sparql = SPARQLWrapper(endpoint)
sparql.setQuery("""
SELECT ?item ?label WHERE {{
?item wdt:P31/wdt:P279* wd:Q178561 .
?item rdfs:label ?label . FILTER(LANG(?label) = "fr") .
FILTER(STRSTARTS(?label, "Siège ")) .
}}
""") # Link to query: http://tinyurl.com/z8bd26h
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
for result in results["results"]["bindings"]:
qid = result['item']['value'].split('/')[-1]
label = result['label']['value']
label = label[:1].lower() + label[1:]
out = "{}\tLfr\t{}".format(qid, label)
print(out)
Let’s run it:
$ python3 fix-battle-labels.py
Q815196 Lfr siège de Pskov
Q815207 Lfr siège de Silistra
Q815233 Lfr siège de Tyr
Q2065069 Lfr siège de Hloukhiv
Q2284279 Lfr siège de Fort Pulaski
Q1098377 Lfr siège de Narbonne
Q608163 Lfr siège de Cracovie
Q4087405 Lfr siège d'Avaricum
Q4337397 Lfr siège de Liakhavitchy
Q4337448 Lfr siège de Smolensk
Q701067 Lfr siège de Rhodes
Q10428014 Lfr siège de Pskov
Q17627724 Lfr siège d'Ilovaïsk
Q23013145 Lfr siège de Péronne
Q815112 Lfr siège d'Antioche
Q3090571 Lfr siège du Hōjūjidono
Q3485893 Lfr siège de Fukuryūji
Q4118683 Lfr siège d'Algésiras
Q5036985 Lfr siège de Berwick
Yay! All we have to do now is copy and paste the result into QuickStatements, and we are done.
I originally posted about the Wikidata maps back in early 2015 and have followed up with a few posts since, looking at interesting developments. This is another of those posts, covering the changes from the last post (late 2015) to now (May 2016).
The new maps look very similar to the naked eye and the new ‘big’ map can be seen below.
So, while at the 2016 Wikimedia Hackathon in Jerusalem, I teamed up with @valhallasw to generate some diffs of these maps, in a slightly more programmatic way than in my posts following up on the 2015 Wikimania!
In the image below all pixels that are red represent Wikidata items with coordinate locations and pixels that are yellow represent items added between October 27, 2015 and April 2, 2016 with coordinate locations. Click the image to see it full size.
The area in eastern Europe with many new items is Belarus and the area in eastern Africa is Uganda. Some other smaller clusters of yellow pixels can also be seen in the image.
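The comparison itself boils down to a pixel-by-pixel pass over the two rendered maps. Our actual tooling isn't shown here, but a minimal sketch of the idea in Python with Pillow (hypothetical filenames, and assuming both renders have identical dimensions) could look like this:

from PIL import Image

# hypothetical filenames for the two rendered maps
old = Image.open("wikidata-map-2015-10-27.png").convert("1")
new = Image.open("wikidata-map-2016-04-02.png").convert("1")

diff = Image.new("RGB", new.size, "black")
px_old, px_new, px_diff = old.load(), new.load(), diff.load()

for x in range(new.size[0]):
    for y in range(new.size[1]):
        if px_new[x, y]:  # a pixel with at least one coordinate-bearing item
            # red if the pixel was already lit in the old map, yellow if it is new
            px_diff[x, y] = (255, 0, 0) if px_old[x, y] else (255, 255, 0)

diff.save("wikidata-map-diff.png")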
All of the generated images from April 2016 can be found on Wikimedia Commons at the links below:
It’s Sunday again! Time for the queries! Last week I showed you the basics of SPARQL; this week I wanted to show you how we could use SPARQL to do maintenance work. I assume you now understand the use of PREFIX, SELECT, WHERE.
I have been a member of the WikiProject:Names for years. When I’m not working on Broadway and the Royal Academy of Dramatic Art archives,[1] I am one of the people who ensure that “given name:Christopher (Iowa)” is transformed back to “given name:Christopher (given name)”. Over the last few weeks I’ve corrected thousands of wrong uses of the given name/family name properties, and for this work I used dozens of SPARQL queries. I thought it could be interesting to show how I used SPARQL to create a list of strictly identical errors that I could then treat automatically.
What do we search?
If you read the constraint violation reports, you’ll see that the most frequent error for the property “family name” (P734) is the use of a disambiguation page as a value instead of a family name. We can do a query like this:
SELECT ?person ?personLabel ?name ?nameLabel
WHERE {
?person wdt:P734 ?name . #the person has a family name
?name wdt:P31 wd:Q4167410 . #the item used as family name is a disambiguation page
SERVICE wikibase:label { bd:serviceParam wikibase:language "en,fr" . } #We want to have the results in English or in French if there is no English label
}
But then we find something more interesting: there are entities which are both a disambiguation page and a family name. What?! That’s ontologically wrong. Using the wrong value as a family name is human error; but an entity can’t be both a specific type of Wikimedia page and a family name. It’s like saying a person could just as well be a book. Ontologically absurd. So all items with both P31 values need to be corrected. How many are there?
SELECT DISTINCT ?name ?nameLabel
WHERE {
?name wdt:P31 wd:Q101352 ; #the entity is a family name
wdt:P31 wd:Q4167410 . #the entity is also a disambiguation page
SERVICE wikibase:label { bd:serviceParam wikibase:language "en,fr" . } #We want to have the results in English or in French if there is no English label
}
link to the query.
Several thousand again. Actually, there are more entities which are both a disambiguation page and a family name than there are persons using disambiguation pages as family names. This means there are family name/disambiguation page items in the database which aren’t used. They’re still wrong, but it doesn’t show in the constraint violation reports.
If we explore, we see that there are different cases out there: some of the family name/disambiguation pages are in reality disambiguation pages, some are family names, and some are both (they link to articles on different Wikipedias, some about a disambiguation page and some about a family name; these need to be separated). Too many different possibilities: we can’t automatize the correction. Well… maybe we can.
Narrowing our search
If we can’t treat all disambiguation/family name pages in one go, maybe we can ask a more precise question. In our long list of violations, I asked for English labels and I found some unbelievable ones. There were items named “Poe (surname)”. As disambiguation pages. That’s a wrong use of labels, which shouldn’t carry clarifications about the subject in brackets (that’s what the description is for); but if these items are about a surname, they shouldn’t be disambiguation pages either! So, so wrong.
Querying labels
But still, good news! We can isolate these entries. For that, we’ll have to query not the relations between items but the labels of the items themselves. Until now, we had used the SERVICE wikibase:label workaround, a tool which only exists on the Wikidata endpoint, because it was really easy and we only wanted human-readable results, not to actually query labels. But now that we want to, the workaround isn’t enough; we’ll need to do it the real SPARQL way, using rdfs.
Our question now is: can I list all items which are both family names and disambiguation pages, whose English label contains “(surname)”?
SELECT DISTINCT ?name ?label (LANG(?label) AS ?lang)
WHERE {
  ?name wdt:P31 wd:Q101352 ; #the entity is a family name
        wdt:P31 wd:Q4167410 ; #the entity is also a disambiguation page
        rdfs:label ?label . #the entity has a label
  FILTER(LANG(?label) IN ("en")). #this label is in English
  FILTER(CONTAINS(?label, "(surname)")). #this label contains a specific string
}
link to the query. We had several hundred results.[2] Note the changes I made in the SELECT DISTINCT clause, now that I no longer use the SERVICE wikibase:label workaround.
Querying sitelinks
Can we automate the correction now? Well… no. There are still problems. In this list, there are items which have links to several Wikipedias: the English one about the surname, and the other(s) about a disambiguation page. Worse, there are items which no longer have an English interwiki at all, because it was deleted or linked to another item (like the “real” family name item), while the wrong English label persisted. So maybe we can filter our list down to items with a link to the English Wikipedia. For this, we’ll use the schema prefix.
SELECT DISTINCT ?name ?label (LANG(?label) AS ?lang)
WHERE {
  ?name wdt:P31 wd:Q101352 ; #the entity is a family name
        wdt:P31 wd:Q4167410 ; #the entity is also a disambiguation page
        rdfs:label ?label . #the entity has a label
  ?sitelink schema:about ?name ; #we want the entity to have a sitelink
            schema:inLanguage "en" ; #this sitelink is in English
            schema:isPartOf <https://en.wikipedia.org/> . #and it links to the English WP (not Wikisource or other projects)
  FILTER(LANG(?label) IN ("en")). #the label is in English
  FILTER(CONTAINS(?label, "(surname)")). #the label contains a specific string
}
link to the query. Well, that’s better! But our problem is still here: if an item has several sitelinks, the other sitelinks may not be about the family name. So we want items with an English interwiki and only an English interwiki. Like this:
SELECT DISTINCT ?name ?label (LANG(?label) AS ?lang)
WHERE {
  ?name wdt:P31 wd:Q101352 ; #the entity is a family name
        wdt:P31 wd:Q4167410 ; #the entity is also a disambiguation page
        rdfs:label ?label . #the entity has a label
  ?sitelink schema:about ?name . #we want ?name to have a sitelink; this variable counts all of them
  ?WParticle schema:about ?name ; #we'll define the characteristics of the sitelink we require
             schema:inLanguage "en" ; #this sitelink is in English
             schema:isPartOf <https://en.wikipedia.org/> . #and it links to the English WP (not Wikisource or other projects)
  FILTER(LANG(?label) IN ("en")). #the label is in English
  FILTER(CONTAINS(?label, "(surname)")). #the label contains a specific string
} GROUP BY ?name ?label HAVING (COUNT(DISTINCT ?sitelink) = 1) #with only one sitelink
Several things here: we separated ?sitelink and ?WParticle. We use ?sitelink to count the sitelinks, and ?WParticle to pin down the characteristics of the English one. Note that we need to use GROUP BY, like last week.
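If the HAVING clause looks opaque, here is a stripped-down sketch, for illustration only, that simply counts how many sitelinks each suspect item has (the variable name ?nbSitelinks is just mine); the full query above then keeps only the items where that count is exactly one:
SELECT ?name (COUNT(DISTINCT ?sitelink) AS ?nbSitelinks)
WHERE {
  ?name wdt:P31 wd:Q101352 ; #the entity is a family name
        wdt:P31 wd:Q4167410 . #the entity is also a disambiguation page
  ?sitelink schema:about ?name . #every sitelink, in any language, on any project
}
GROUP BY ?name
ORDER BY DESC(?nbSitelinks)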
Polishing the query
Just to be on the safe side (we are never safe enough before automating corrections), we’ll also check that all items on our list are only family name/disambiguation pages, and not also marked as a location or something equally strange. So we query items that have exactly two P31 (instance of) values, those two being Q101352 (family name) and Q4167410 (disambiguation page).
SELECT DISTINCT ?name ?label (LANG(?label) AS ?lang)
WHERE {
  ?name wdt:P31 ?type ; #the entity uses the property P31 (instance of)
        wdt:P31 wd:Q101352 ; #the entity is a family name
        wdt:P31 wd:Q4167410 ; #the entity is also a disambiguation page
        rdfs:label ?label . #the entity has a label
  ?sitelink schema:about ?name . #we want ?name to have a sitelink; this variable counts all of them
  ?WParticle schema:about ?name ; #we'll define the characteristics of the sitelink we require
             schema:inLanguage "en" ; #this sitelink is in English
             schema:isPartOf <https://en.wikipedia.org/> . #and it links to the English WP (not Wikisource or other projects)
  FILTER(LANG(?label) IN ("en")). #the label is in English
  FILTER(CONTAINS(?label, "(surname)")). #the label contains a specific string
} GROUP BY ?name ?label HAVING ((COUNT(DISTINCT ?type) = 2) && (COUNT(DISTINCT ?sitelink) = 1)) #with exactly two P31 values and one sitelink
It should give you a beautiful “no matching records found”. Yesterday, it gave me 175 items which I knew I could correct automatically. Which I have done, with a Python script made by Ash_Crow. If you are good, he’ll write a #MondayScript in response to this #SundayQuery!
Ken’s an amazing illustrator who helped us with our banner and logo, both of which you can see prominently on our Twitter page, where we update a bit more frequently, to say the least. Thanks again, Ken!
You may have noticed a recent addition to the Articles tab of dashboard.wikiedu.org course pages: “structural completeness”. This feature is an experiment in visualizing the history of articles as they develop.
The structural completeness data comes from the “Objective Revision Evaluation Service” (ORES), a Wikimedia Foundation research project that uses machine learning to analyze Wikipedia articles and individual edits. I started digging into ORES last year to see how well the “wp10” scores — estimates of what score an article would get on the Wikipedia 1.0 scale from Stub to Featured Article at any point in its history — map to the work that student editors do in our classes. What I found was that even small changes in the ORES wp10 score were meaningful in terms of the changes that happened to an article. While the scores don’t account for the intellectual content of articles, they give a great sense of the major — and minor — changes of an individual article over time.
In the dashboard, I’m calling this data “structural completeness”, because the scores are based on how well an article matches the typical structural features of a mature Wikipedia article. The machine learning model calculates scores based on the amount of prose, the number of wikilinks to other articles, the numbers of references, images, headers and templates, and a few other basic features. Down the road, we may be able to use this data to give automated suggestions about what aspects of their article an editor should focus on next — whether adding links to related topics, improving the citations, or breaking it into sections as it grows.
Take a look at how articles by student editors develop. When you spot a big change in the structural completeness score, this usually means something interesting happened to the article that suddenly made it look a lot more (or a lot less) like a typical well-developed Wikipedia article.
I’ll continue to iterate on these visualizations; our goal is to make it as easy as possible to both get an overview of an article’s history and to drill down to the details of individual edits. If you have ideas, comments, or you notice something really interesting with these visualizations, drop me a line!
“Where do cyclists cycle? An important question, not just for city planners”, stated the German business journal manager magazin [1] | here: London. (Photo: heatmap created by Strava cyclists; map provided by Mapbox/OpenStreetMap)
About us
Do you wish to join the diverse weeklyOSM team or start weeklyOSM in your language? Or are you just curious to know how weeklyOSM works? If so, we’d be delighted to see you at our workshop at SotM. If you look at this map of where weeklyOSM is currently produced, you can see that we are looking for YOU!
Mapping
The Mapbox team now intends to focus on OSM data in French cities. In an English-language blog post, BharataHS summarizes his questions about understanding French road rules.
User BushmanK explains why he believes that healthcare=midwife is a poorly designed tag.
Mountaineer’s Mailbox has been proposed as a new value for the man_made key. The proposal was also discussed on the Tagging mailing list.
On the talk-GB mailing list, the British community discusses its upcoming quarterly project. Proposals range from opening hours and speed limits to food hygiene ratings. In preparation for these tasks, interesting analysis tools are being used.
User mapmeld writes a diary about transliterating place names around the world with an open source crowdsourcing tool called CityNamer. This project uses OSM data and account details, but does not save edits yet.
Community
The Gmane website has been off the air for a few weeks; Martin writes about reviving the web interface and getting it working again.
The OSM Awards 2016 are the community’s awards. Have you voted for your favourites in the six categories? The voting ends on September 22nd. Frederik Ramm explained (automatic translation) in the forum why you should vote. Nakaner presents his viewpoint on the candidates in his user diary.
User PeWu from Poland presents OSM History Viewer, a tool to view the history of OSM nodes, ways and relations. In his post in the OpenStreetMap forum he added some nice examples (Example 1, Example 2). The source code and the examples are available on GitHub.
Mapper of the month, SomeoneElse, shares his OSM journey so far and his upcoming work.
The Saarländischer Rundfunk, a German TV channel, broadcast an excellent video (start playing from 16:39 min) by Herbert Mangold on “Mapping with OSM”. The video talks about Mundraub, an OSM-based map, how mapping efforts assist in fighting catastrophes like the Ebola crisis in Africa, and the importance of local knowledge. A few members of the weeklyOSM team are also featured in the video!
There is a survey aimed at OSM users who edit in Argentina, to learn more about what they think of the project and to explore the possibility of organizing task groups that contribute to the enrichment of the map.
Events
As a run-up to the State of the Map conference in Brussels, there is a call for informal sessions, including Birds of a Feather (BoFs). Take a look at some of the proposed sessions.
The upcoming week is slated to be an eventful one in Brussels, with five big geo-events lined up, including a hackday and a mapathon! (It surely is Maptember!)
Selene Yang writes in her user diary that 25th September is the last date for submitting proposals for State of the Map Latam, which will be held from 25th to 27th November in Brazil.
At this year’s State of The Map conference, members from the various local communities can share their experience during the State of the Local Map and the Local Chapters Congress.
[1] The German business journal ‘Manager Magazin’ illustrates the cycling routes of ten major European cities.
During an ongoing system upgrade, the OSMF tileserver will be migrated to Mapnik 3. Tom Hughes writes about this on the Talk mailing list.
Luke Smith writes about grough’s development of a composite map, created by blending OSM data with OS OpenData to fill in the gaps, and by using public rights-of-way data directly from the local authorities which have released it.
switch2OSM
In its latest version, the social network Diaspora shows locations on an OSM map. (automatic translation)
Open Data
The Array of Things (AoT) is an urban sensing project, a network of interactive, modular sensor boxes that will be installed around Chicago to collect real-time data on the city’s environment, infrastructure, and activity for research and public use.
When Toursprung found out that a German TV station used their OSM based map, they billed them for it and donated the amount (200 €) to OSM.
Programming
Mapzen has set up a Personal Package Archive (PPA) for Ubuntu for its routing engine Valhalla. This makes it possible to install Valhalla with a simple apt-get command.
An interesting article in the Melbourne Age explains how economics can make your dinner taste better! They did some analysis matching review data about restaurants with geospatial data from OpenStreetMap, and found a strong negative link between restaurant quality – as defined by star ratings – and proximity to tourist attractions and street corners.
Other “geo” things
You do not want to be constantly tracked by Google and hence delete its mapping service from your phone. Would that suffice? Google makes it harder to evade its data collection.
Here is a video of generating a 3D city model in LOD1 by extrusion, with the software 3dfier developed by the 3D Geoinformation group at TU Delft and partially funded by Kadaster.
Japan is mapping its streets in 3D to support its autonomous taxis for the 2020 Olympic Games.
Tanvi Misra published an article on CityLab with the headline “Gorgeous Maps of an Ugly War” about the ongoing conflict in Ukraine.
Owen Powell, a GIS analyst and cartographer, explains his workflow in creating both beautiful and accurate digital 3D maps using Blender and GIS data in great detail.
German science magazine “Spektrum” discusses possible effects of the use of electronic navigation aids on our natural sense of direction.
User schleuss shares interesting aerial imagery captured during the LA Building import project. A Batman landing rooftop, a hexagonal pool and a bunch of green cars are among the weird things seen from above Los Angeles.
Note: If you would like to see your event here, please put it into the calendar. Only data which is there will appear in weeklyOSM. Please check your event in our public calendar preview and correct it where appropriate.
This weekly was produced by Hakuch, Nakaner, Rogehm, SeleneYang, SomeoneElse, SrrReal, derFred, escada, jinalfoflia, mgehling.
Wikipedia editors will be updating articles on Emmy categories and nominees in real time, from Game of Thrones and Mr. Robot to Sherlock. You can watch the 68th Primetime Emmy Awards with them this Sunday to get the full rundown on your favorite programs, ad-free and as it happens.
CAWylie, a Wikipedian who edits many Emmy-related articles, says that Wikipedians have a formula for updating quickly and accurately. “The main advantage Wikipedia has in updating current events is it seems more streamlined. Usually we have templates in place wherein the nominees are already listed, so updating it live is simply a formatting move or boldfacing the winner.”
That means that if a show you haven’t watched yet or a person you haven’t seen before wins, you’ll be able to jump on a second screen to a Wikipedia article and learn about them, even as they’re being edited.
You can find almost anything Emmy-related on Wikipedia, including records from every category. Last year, for example, Game of Thrones broke the record for most wins garnered by a single season, and Saturday Night Live has been nominated 209 times in its lifetime.
CAWylie added, “Emmy articles are an ebb and flow type that require a WikiProject to monitor and follow continuity year to year. That’s where the aforementioned formula works best.”
And it’s this constant ebb and flow that makes Wikipedia a valuable second screen. No matter who wins in the actual program, it’s almost as if Wikipedia is running a competition of its own. Check out the pageview standings below to find out which film or series is Wikipedia’s most popular.
Have fun watching the Emmy Awards, and don’t forget to take us with you.
Photo from/altered by White House staff, public domain/CC0.
Drama holds Wikipedia’s Iron Throne
If Wikipedia views are a viable metric, Game of Thrones will run away with an Emmy.
With over twelve million pageviews from July 2015 to May 2016, which roughly equates to the period during which an Emmy-nominated show must air,* Game of Thrones is far ahead of the other programs. It’s part of a wealth of riches in the drama category, as Mr. Robot and House of Cards—sitting behind Game of Thrones with seven and six million pageviews, respectively—have more pageviews than any program in the other categories.
Other Emmy wiki-winners include Fargo, Modern Family, and, with the lowest total of any first-place finisher, Last Week Tonight with John Oliver.
*Statistics here are derived from Pageviews Analysis and run from July 2015 to May 2016, which roughly corresponds with the Emmy eligibility criteria (June 2015 to May 2016); reliable and accurate data is not available for June and earlier. The time period does rob some articles of views—Roots, for example, started airing at the end of May 2016 and is possibly the most heavily affected article, with several hundred thousand views lost. Statistics for The People v. O. J. Simpson: American Crime Story were affected by a page move that added a space between “O” and “J”; the numbers, however, should still be accurate.
Aubrie Johnson, Social Media Associate
Ed Erhart, Editorial Associate
Wikimedia Foundation
Members of Wikimedia Spain; García is in the front row, left side, with a black and yellow Pacman shirt. Photo by Pedro J Pacheco, CC BY-SA 4.0.
Way back in August 2003, David García was browsing the Spanish Wikipedia while looking for information about mathematics and computer science, but he found it lacking and decided to fix it.
Hitting the “edit” button that day was a decision that changed García’s life—he has been editing, with a few breaks, ever since, celebrating thirteen years just a few months ago. Wikipedia has evolved over the years, and what García calls his first “little changes” have since turned into significant contributions that have not gone unnoticed.
A web developer, García was born and raised in Madrid, Spain; he now lives in Chicago, Illinois, in the United States. García has created 2,242 pages on the Spanish Wikipedia, and he has edited Wikipedia and the Wikimedia projects more than 100,000 times because he loves the idea of sharing.
“Being able to share what you’ve learned with other people is exciting,” he explains. “I love the idea of helping people and getting help from others. It is an amazing feeling when you are in Spain and can get guidance from someone in South Africa, China, Bolivia, Mexico or anywhere.”
Over the years, Wikipedia’s appearance, structure and content have changed extensively. In the first few years, not only was Wikipedia very different from what it is today, site visitors had different expectations and editors had different motivations for contributing.
“The Spanish Wikipedia had around 10,000 articles when I started editing,” García recalls. It now has 1.3 million. “The layout was totally different then—it looked like most websites in the 1990s. I remember that important articles were missing. There was no entry for triangular number until I created it in January 2004, and it did not include articles on many famous people like Martin Luther King until others finally created those.”
García has quit editing Wikipedia twice, once between 2006 and 2008 and again in 2014 for a year. During those breaks he still used Wikipedia to look for information, and every time he asked himself, “Why don’t you participate again?” The ever-changing nature of the site, however, means that Wikipedia feels different each time he returns; he’s found that he needs to “get used to it and learn about the policy and community nature all over again” each time.
García believes that being a Wikipedian does not necessarily require experience in the field you plan to write about. Since it is an encyclopedia that “anyone can edit,” anyone is eligible to do so at any time, once they have the references for a topic. If they don’t know where to start or what to write about, they can take on anything that interests them.
Sometimes, for example, García has decided to cover a topic of professional interest to him, like the article he wrote on prime numbers, a good article on the Spanish Wikipedia (meaning that it has met high-quality standards set by the community). He has also written about topics of personal interest, such as the most recent article he has created—Lapsed Catholic (in Spanish: Católico No Practicante).
Samir ElSharbaty, Digital Content Intern, Wikimedia Foundation
“The decade” is a new blog series that will, as you might expect, profile Wikipedians who have spent ten years or more editing Wikipedia. If you know of a long-time editor who we should talk to, send us an email at blogteam[at]wikimedia.org.
When students study a new topic, they often turn to a search engine to get a better understanding of the topic. Those search results take them to Wikipedia, where (hopefully) they find a comprehensive and understandable summary. As they begin to understand the concept, they scroll to the bottom to find sources for further reading. Students find links to academic articles within their university libraries and click through for a deeper reading.
That’s how it works for students. But what about the rest of the world – those who can’t access those journal articles? Wikipedia may be their only source of information.
That’s one of the reasons we launched the Wikipedia Year of Science. If Wikipedia is the general public’s science primer, we believe it should be as comprehensive and accurate as possible. Most importantly, it should be understandable.
This year, science students all over the United States and Canada have participated in our initiative to create science content that your typical non-scientist can understand. They’re educating the public while learning how to communicate science. Students are already making Wikipedia better for the world. But we’re not satisfied yet!
That’s why we attended so many science conferences this summer—to spread the word about teaching with Wikipedia.
In July, we attended the Allied Genetics Conference, where we met dozens of university instructors who want the public to understand how geneticists’ research is transforming the world. We joined plant biologists at the American Society of Plant Biologists’ annual meeting, where scientists stressed the importance of educating the world about increasing the food supply over the next century. Again, Wikipedia is the place to do so. Later in August, Wiki Ed attended the Botanical Society of America’s conference, the Joint Statistics Meeting, MathFest, the Ecological Society of America’s conference, and the American Chemical Society’s fall meeting.
The common thread across all of these events? Science communication. In fact, a quick search of these conferences’ programs turns up nearly 100 results for sessions about science communication and public engagement. In a digital world that provides so much information to the curious among us, scientists need to learn how to speak to people without their expertise and rigorous research background. Writing for Wikipedia is one way our future scientists can develop this skill.
Won’t you join us? If you’d like to work with us during the Year of Science and beyond, we’d love to hear from you. Whether you’re a higher education instructor looking to bring Wikipedia into your course, a librarian looking to expand access to your special collections with a Visiting Scholar, or you’re interested in offering financial support, reach out to us: contact@wikiedu.org.
The Classroom Program has been busy onboarding instructors for the fall term. The fall marks the second half of the Year of Science, and continues Wiki Ed’s trend of growth. We’ve nearly doubled the number of supported courses compared to this time last fall.
Wiki Ed staff was on the road in August, promoting the Year of Science with instructors at the Joint Statistics meeting, MathFest, the Ecological Society of America, and American Chemical Society events. These events create an environment for face-to-face contact with experts in the sciences, and provide an opportunity to raise awareness among the scientific community.
We’ve opened two new Visiting Scholars positions: one at Brown University, focused on ethnic studies, and one at Temple University, which is seeking to improve articles about Philadelphia history and/or the Holocaust.
Wiki Ed has produced a new subject-specific brochure for students developing Wikipedia content on topics related to linguistics. The guide is a nod to our partnership with the Linguistics Society of America, and discusses scaffolds and frameworks for articles related to dialects and concepts in linguistics.
Wiki Ed’s Student Learning Outcomes research project’s surveys were approved by the University of Massachusetts Amherst’s Human Subjects Research Protection Office and Institutional Review Board. These voluntary surveys were distributed to students via the Wiki Ed Dashboard, and will form the core of Research Fellow Zach McDowell’s analysis.
Programs
Educational Partnerships
August was a busy month for the Educational Partnerships team. Staff traveled to several academic conferences to promote the Wikipedia Year of Science. Educational Partnerships Manager Jami Mathewson attended the Botanical Society of America’s annual meeting in Savannah, Georgia, where she spoke with botanists about plant physiology and taxonomy. While there, she talked to instructors about the role their students can play in improving Wikipedia articles related to the plants they study.
Outreach Manager Samantha Erickson attended the Joint Statistics Meeting in Chicago. This visit focused on increasing communication skills in students through Wikipedia, encouraging instructors to see the role our assignments have in elevating the public understanding of statistical concepts.
Director of Programs LiAnna Davis joined Jami at MathFest in Columbus, Ohio. Math instructors expressed an increased interest in Wikipedia writing assignments, based on the communication experience they provide. Students need to develop communication skills during their studies. Math departments want to help their students be more competitive when they enter the workforce. A Wikipedia assignment is an excellent fit, since math articles on Wikipedia, though often accurate, are difficult for laypeople to comprehend. When students translate that content and make it accessible to the general public, they build skills otherwise overlooked in the math classroom.
Jami and Samantha went to Fort Lauderdale for the Ecological Society of America conference. There, instructors and students alike were interested in Wiki Ed’s Ecology handbook, which aids ecologists and experts in editing Wikipedia.
Wrapping up the month of travel and outreach, Jami attended the American Chemical Society’s fall meeting in Philadelphia, PA. There, she presented to attendees about using Wikipedia as a pedagogical tool in the chemistry classroom. She also joined the Simons Foundation’s edit-a-thon, where participants learned how to contribute and focused on articles about chemistry or women chemists.
Classroom Program
Status of the Classroom Program for Fall 2016 as of August 31:
138 Wiki Ed-supported courses were in progress (69, or 50%, were led by returning instructors)
720 student editors were enrolled
86% of students were up-to-date with the student training
Students edited 32 articles and created 2 new entries.
The Fall 2016 term has started, and we’re well on our way to supporting our largest number of classes to date. This time last year, Wiki Ed had 86 courses in progress, compared to 138 this term. In Fall 2015 as a whole, we supported 162 courses, just 24 more than where we stand today, with most of the fall term still ahead. This growth is due in large part to our outreach team and to our ability to provide instructors and students with meaningful learning experiences.
As the Fall term begins, we’re also entering the second half of the Year of Science. So far for Fall 2016, we have 82 courses in STEM and social science fields, and we anticipate many more to come on board as the term progresses. In Spring and Summer 2016, we supported 130 courses and over 2,300 students during this year-long initiative to improve science content on Wikipedia and science literacy and communication among our students. Our Year of Science courses have ranged from genetics to archaeology and from sociology to plant biology. With a half year still to go, our students have already made a significant impact on Wikipedia. They’ve added over 2.3 million words, edited over 2,300 articles, and created almost 200 new entries. We’re excited to see what the second half of the Year of Science brings!
Some examples of article expansions are coming in from summer courses:
Tamarins are small monkeys found in Central and South America. The black tamarin, one of the smallest primates, is found exclusively in northeastern Brazil, where it is threatened by habitat destruction. At the start of the summer, the Wikipedia article on the black tamarin was a two-sentence stub. Students in Nancy Clum’s Biology 124 BK class spent the summer expanding the article. They added a section describing the species, and others about its distribution, behavior, feeding, reproduction, and conservation status. In so doing, they turned a stub into an informative article.
Kathryn Grafton’s course from the University of British Columbia took advantage of the shorter summer term to investigate a specific kind of content gap: omissions within articles themselves. Students looked at articles related to knowledge mobilization, a term describing how research is or could be brought out of academia and into public use. They examined research, knowledge mobilization and scholarly analysis, highlighting where Wikipedia did not include important, relevant information, often from outside the West. And because all our student editors are Wikipedians, they didn’t stop at criticism: they proposed changes, with sources from their research, for each article!
Community Engagement
This month we are happy to announce two new opportunities for Visiting Scholars. The first is through Brown University’s John Nicholas Brown Center for Public Humanities and Cultural Heritage, which is looking for a Wikipedian to improve articles about ethnic studies. Supporting the position at Brown are Jim McGrath, Postdoctoral Fellow in Digital Public Humanities, and Susan Smulyan, the Center’s Director. The second position is at Temple University, which would like to support a Wikipedian’s work on subjects related to the history of Philadelphia, the history of African Americans in Philadelphia, and/or the history and study of the Holocaust. Associate University Librarian Steven Bell is supporting the position at Temple University Libraries. There’s more information about these positions in our blog posts about them:
Community Engagement Manager Ryan McGrady is focused on recruiting experienced Wikipedians for the open positions. He also continued to work with several other sponsors at various stages of the onboarding process, and with new contacts made thanks to Jami and Samantha’s outreach at recent conferences.
The current Visiting Scholars continued to produce some stand-out work. George Mason University’s Gary Greenbaum brought Mr. Dooley up to Featured Article status. At the end of the month, it was also selected as “Today’s Featured Article” on Wikipedia’s main page. Barbara Page’s article, Serial rapist, was also featured on the main page in the Did You Know section with the following: “[Did you know] … that serial rapists are more likely to be strangers to their victims than single-victim rapists?”
Program Support
Communications
Communications Manager Eryk Salvaggio worked with Product Manager for Digital Services Sage Ross to organize a touchup of the Wiki Education Foundation’s website. The new site encourages deeper reads for specific audiences with visual cues directing readers to the blog, teaching resources, and fundraising pages.
In August, we also announced publication of a new subject-specific brochure, Editing Wikipedia articles on Linguistics. The guide was written with input from Dr. Gretchen McCulloch and Wikipedia editors User:Cnilep, User:Uanfala, and our own Wikipedia Content Expert in Humanities, Adam Hyland. It takes student editors through the process of writing or improving Wikipedia articles, with templates for structuring articles on languages, dialects, and linguistic concepts.
Eryk worked with Wikipedia Content Expert in the Sciences Ian Ramjohn to update our training modules for students and the instructors’ orientation.
Чому Вікіпедія важлива для жінок у науці (“Why Wikipedia Matters to Women in Science”), by Eryk Salvaggio; translated by Vira Motorko for Wikimedia Ukraine. (August 1)
With the Fall term ramping up, Sage spent much of August fixing bugs in the course creation and course cloning features, adding Dashboard features that make it easier for Wiki Ed staff to onboard and monitor new courses, and updating the Dashboard survey functionality to meet UMass Amherst’s Institutional Review Board requirements for the Student Learning Outcomes research. Work also continues on making the Dashboard codebase easier for new developers to get started with.
Among the more noticeable improvements:
The ‘Overview’ tab of each course now shows the number of images uploaded.
Students can now see and edit the article(s) they are working on directly from the course Overview. (This feature debuted earlier, but had been disabled for the last few months.)
Research and Academic Engagement
Research
In August, Data Science Intern Kevin Schiroo analyzed the portion of academic content produced by Wiki Ed students. This came in two parts. First, Kevin uncovered and incorporated new “signals” within an article that identify whether it is related to academic content. The resulting classifier pulled information from references, introduction text, templates, and the use of academic words to classify articles as academic or not with a high degree of accuracy.
After constructing the classifier tool, Kevin applied it to a sample of Wikipedia pages to calculate the total productivity of Wiki Ed students within topics deemed academic.
When we consider all academic content, we see some substantial contribution rates. Over the spring term, Wiki Ed student editors produced an average of 2.6% of all such content. Looking at the entire term can be misleading, however, since we do not expect significant contributions early in the term, when most classes aren’t yet active. Shortening the window to 30 days shows that we produced 4.6% of all content between mid-April and mid-May.
We also examined contribution rates for early academic content, since this is a focal area for Wiki Ed. Here, we see substantially higher contribution rates. Over the whole of last term, Wiki Ed’s student editors produced 6.6% of this content; during our most active period (between mid-April and mid-May) we produced 10.1% of all early academic content, that is, either new articles or articles that were in a fledgling stage of development when students first encountered them.
Details can be found on meta and a general overview is available on Wiki Ed’s website.
Student Learning Outcomes Research
The Human Subjects Research Protection Office / IRB at the University of Massachusetts Amherst approved the protocol titled “Student Learning Outcomes using Wikipedia Based Assignments” just three weeks after submission. Research Fellow Zach McDowell worked with experts from various fields (particularly information literacy, and composition and rhetoric) who have taught with Wikipedia-based assignments to refine the survey’s assessments. Zach constructed the initial assessment and survey tool on the Wiki Ed Dashboard, engaged the Wiki Ed team in a round of testing and feedback, and implemented the resulting changes. Additionally, Zach gathered valuable feedback from board members, helping to further improve and shape the research questionnaires. These changes were re-submitted to the IRB and approved.
Helaine, Eryk, and Zach worked on refining a communications strategy to instructors and students, including a script for an introductory video for the research project, which will be shown to students before they are provided with a consent form.
Final approval for this phase of the project was received in late August. The survey has been released, with emails sent in waves to students and instructors who had already onboarded. These emails go out multiple times a week to students and instructors as they sign up for classes, informing them of the study and encouraging them to participate.
Finance & Administration / Fundraising
Finance & Administration
For the month of August, expenses were $150,664 versus our planned spending of $208,033. The variance of $57k was primarily due to the departure and vacancy of two staff positions ($21k), as well as some cutbacks and savings with travel ($26k) expenses.
Our year-to-date expenses are $335,774 versus our planned expenditures of $436,676. Along with the staff vacancies and cutbacks in travel mentioned above, the $101k variance is also a result of deferring our fundraising and marketing campaigns ($49k) until later in the year.
Fundraising
Current priorities:
Securing new funding for fall 2016
Renewing major institutional funders in early 2017
Office of the ED
Current priorities:
Securing funding
Preparing for the strategic planning process
In August, Executive Director Frank Schulenburg started the “Executive Director’s Major Gift Campaign,” reaching out to high net-worth individuals via personalized solicitation letters, followed by phone calls with prospects. The goal of this new initiative is to measure the effectiveness of an in-house mail campaign based on a highly-curated list of contacts. Also, using A/B-testing, we’ll be comparing the results of using different messages and the return from different target groups. Inspired by the #100wikidays challenge, we’re aiming at sending 100 letters in 100 days.
Frank began preparing for the upcoming strategic planning exercise. With our current strategic plan running out in 2017, Frank and the board will create a new strategy for the next two years. In preparation for the kick-off, Frank started drafting a process and gathered materials for distribution among the participants of the planning exercise.
Finally, Frank and the members of the senior leadership team used the first iteration of the new “Executive Director’s Summary Report,” which aims at increasing our effectiveness in keeping track of organizational performance indicators on a monthly basis. Based on the feedback received this month, we’ll further improve the usefulness of the report card in future iterations.
We are happy to announce that the 6th Civil Court of Jacarepaguá in Rio de Janeiro, Brazil has ruled in favor of the Wikimedia Foundation in an injunction claim filed by Brazilian musician Rosanah Fienngo.
Ms. Fienngo filed a lawsuit objecting to information on her personal life in the Portuguese Wikipedia article about her. The court stated that although the information available on her Wikipedia page concerned her private life, Ms. Fienngo had already disclosed that information to the media herself, so its inclusion on Wikipedia was not an invasion of her privacy.
The Portuguese Wikipedia article about Ms. Fienngo contained information about her as a notable public figure in Brazil. This information included some details of her personal life, but this information was derived from public sources, most of which Ms. Fienngo had provided herself, such as an interview Ms. Fienngo gave to the gossip website O Fuxico.
In 2014, Ms. Fienngo filed an indemnification claim against Google Brasil and “Wikipedia,” apparently believing that Google was responsible for the content of Wikipedia. In November 2014, the Wikimedia Foundation received word that a Brazilian court had ruled against “Wikipedia” in Ms. Fienngo’s suit. The Wikimedia Foundation was not a party to this action and received no notice of the case in advance. The court order required removal of the article about Ms. Fienngo, and imposed a daily fine if the article remained intact. The article was then removed from Portuguese Wikipedia by community members.
In response, the Wikimedia Foundation argued that the article was written using information already publicly available online, including statements Ms. Fienngo had made in published interviews. Additionally, as a public figure, Ms. Fienngo has a reduced “sphere of privacy,” and celebrities do not need to approve articles written about themselves using publicly available information.
This decision confirms that the information that was in the article about Ms. Fienngo was appropriate to host on Wikipedia in both Brazil and the United States. It should be noted, though, that Ms. Fienngo retains the right to appeal to the Brazilian State Court of Appeals. We believe, however, that the decision was strong enough that community members should feel free to make editorial decisions to write articles like the one about Ms. Fienngo, which, as of publishing time, is still deleted.
Overall, this decision is a positive outcome for Wikimedia. This ruling supports the ability of Wikipedians in Brazil and all around the world to create accurate and well-sourced articles, even if the information in those articles may sometimes be unflattering to the article’s subject. Those who share personal information with the media should expect that it will be available to a large number of people, and may someday appear on Wikipedia.
The Wikimedia Foundation will continue to support you, the global community, in constructing the best encyclopedia possible to aid in the dissemination of free knowledge.
We would like to extend our sincerest gratitude to Koury Lopes Advogados for their excellent representation in this matter, especially Tania Liberman, Eloy Rizzo, Tiago Cortez, Daniel Rodrigo Shingai, and Yasmine Maluf. We would also like to extend special thanks to legal fellow Leighanna Mixter for her assistance in preparing this blog post.
Wikimedia UK evaluation panel, June 2016. Photo by Wolliff (WMUK), CC BY-SA 4.0.
Wikimedia UK will soon be applying to the Wikimedia Foundation for an Annual Plan Grant (APG) for 2017–18. Longstanding volunteers, members and other stakeholders will be familiar with this process, but for those of you who aren’t, an APG enables affiliated organisations around the world – including country ‘chapters’ of the global Wikimedia movement, like Wikimedia UK – to access funds raised by the Foundation through the Wikipedia banner campaign.
The deadline for proposals is 1st October, and we will need to submit our draft delivery plan for next year as well as the proposal itself. On Saturday 24th September we will be holding a day of meetings to discuss and develop our proposal and our delivery plans for next year alongside the wider Wikimedia UK community. These include a meeting of the Evaluation Panel in the morning, followed by a discussion focused on education from 12pm to 3pm and a Planning Lab from 3pm to 5pm.
The education meeting will give participants the opportunity to feed into our emerging plans for education and help us to shape an education conference in early 2017. At the Planning Lab we will share our plans for partnerships and programmes in 2017, with a view to incorporating feedback and ideas into our proposal to the Wikimedia Foundation, and enabling volunteers to identify how they might get involved with Wikimedia UK over the next year.
All meetings will take place at Development House near Old Street, London and are open to all, but signing up in advance is essential (see below for links). Refreshments including lunch during the education meeting will be provided, and support for travel is available if Wikimedia UK is notified in advance by email to karla.marte@wikimedia.org.uk.
tl;dr: Open licensing works when you strike a healthy balance between obligations and reuse. Data, and how it is used, is different from software in ways that change that balance, making reasonable compromises in software (like attribution) suddenly become insanely difficult barriers.
In my last post, I wrote about how database law is a poor platform to build a global public copyleft license on top of. Of course, whether you can have copyleft in data only matters if copyleft in data is a good idea. When we compare software (where copyleft has worked reasonably well) to databases, we’ll see that databases are different in ways that make even “minor” obligations like attribution much more onerous.
In software copyleft, the most common scenarios to evaluate are merging two large programs, or copying one small file into a much larger program. In this scenario, understanding how licenses work together is fairly straightforward: you have two licenses. If they can work together, great; if they can’t, then you don’t go forward, or, if it matters enough, you change the license on your own work to make it work.
In contrast, data is often combined in three ways that are significantly different than software:
Scale: Instead of a handful of projects, data is often combined from hundreds of sources, so doing a license-conflict analysis across all of those sources to check for conflicting obligations (like copyleft) is impractical. Peter Desmet did a great job of analyzing this in the context of an international bio-science dataset, which has 11,000+ data sources.
Boundaries: There are some cases where hundreds of pieces of software are combined (like operating systems and modern web services) but they have “natural” places to draw a boundary around the scope of the copyleft. Examples of this include the kernel-userspace boundary (useful when dealing with the GPL and Linux kernel), APIs (useful when dealing with the LGPL), or software-as-a-service (where no software is “distributed” in the classic sense at all). As a result, no one has to do much analysis of how those pieces fit together. In contrast, no natural “lines” have emerged around databases, so either you have copyleft that eats the entire combined dataset, or you have no copyleft. ODbL attempts to manage this with the concept of “independent” databases and produced works, but after this recent case I’m not sure even those tenuous attempts hold as a legal matter anymore.
Authorship: When you combine a handful of pieces of software, most of the time you also control the licensing of at least one of those pieces, and you can adjust the licensing of that piece as needed. (Widely-used exceptions to this rule, like OpenSSL, tend to be rare.) In other words, if you’re writing a Linux kernel driver or a WordPress theme, you can choose the license to make sure it complies. That’s not necessarily the case in data combinations: if you’re making use of large public data sets, you’re often combining many other data sources where you aren’t the author. So if some of them have conflicting license obligations, you’re stuck.
How attribution is managed
Attribution in large software projects is painful enough that lawyers have written a lot on it, and open-source operating systems vendors have built somewhat elaborate systems to manage it. This isn’t just a problem for copyleft: it is also a problem for the supposedly easy case of attribution-only licenses.
Now, again, instead of dozens of authors, often employed by the same copyright owner, imagine hundreds or thousands. And instead of combining these pieces in basically the same way each time you build the software, imagine that every time you run a different query, you have to provide different attribution data (because the relevant slices of data may have different sources or authors). That’s data!
The least-bad “solution” here is to (1) tag every field (not just every data source) with licensing information, and (2) have data-reading software create new, accurate attribution information every time a new view into the data is created. (I actually know of at least one company that does this internally!) This is not impossible, but it is a big burden on data software developers, who must now include a lawyer in their product design team. Most of them will just go ahead and violate the licenses instead, pass the burden on to their users to figure out what the heck is going on, or both.
Who creates data
Most software is either under a very standard and well-understood open source license, or is produced by a single entity (or often even a single person!) that retains copyright and can adjust that license based on their needs. So if you find a piece of software that you’d like to use, you can either (1) just read their standard FOSS license, or (2) call them up and ask them to change it. (They might not change it, but at least they can if they want to.) This helps make copyleft problems manageable: if you find a true incompatibility, you can often ask the source of the problem to fix it, or fix it yourself (by changing the license on your software).
Data sources typically can’t solve problems by relicensing, because many of the most important data sources are not authored by a single company or single author. In particular:
Governments: Lots of data is produced by governments, where licensing changes can literally require an act of the legislature. So if you do anything that goes against their license, or two different governments release data under conflicting licenses, you can’t just call up their lawyers and ask for a change.
Community collaborations: The biggest open software relicensing that’s ever been done (Mozilla) required getting permission from a few thousand people. Successful online collaboration projects can have one to two orders of magnitude more contributors than that, making relicensing hard. Wikidata solved this the right way: by going with CC0.
What is the bottom line?
Copyleft (and, to a lesser extent, attribution licenses) works when the obligations placed on a user are in balance with the benefits those users receive. If they aren’t in balance, the materials don’t get used. Ultimately, if the data does not get used, our egos feel good (we released this!) but no one benefits, and regardless of the license, no one gets attributed and no new material is released. Unfortunately, even minor requirements like attribution can throw the balance out of whack. So if we genuinely want to benefit the world with our data, we probably need to let it go.
So what to do?
So if data is legally hard to build a license for, and the nature of data makes copyleft (or even attribution!) hard, what to do? I’ll go into that in my next post.
Generating Article Placeholders on the Welsh Wikipedia
The Welsh Language Wicipedia already punches above its weight with seventy thousand articles. That’s roughly one article for every eight Welsh speakers. But now a student in Germany has developed a new tool which can fill in the gaps on Wikipedia by borrowing data from another of Wikimedia’s projects – Wikidata.
The aim of this new feature is to increase the access to open and free knowledge in Wikipedia. The Article Placeholder will gather data, images and sources from Wikidata and display it Wikipedia style, making it easily readable and accessible.
Currently the Article Placeholder is being trialled on a few smaller Wikipedias, and after a consultation with the Welsh Wicipedia community it was agreed that we would activate the new extension here in Wales.
An Article Placeholder for Hobbits on the Welsh Wikipedia
The most obvious advantage of this functionality is the easy access to information which has not yet been included on Wicipedia, and with 20 million items in Wikidata, it’s not short on information. This in turn should encourage editors to create new articles using the information presented in the Article Placeholder.
But perhaps the most exciting aspect of using Wikidata to generate Wikipedia content is that Wikidata speaks hundreds of languages, including Welsh! This means that many pages it generates on the Welsh Wikipedia appear entirely in Welsh.
If the Wikidata entry being used hasn’t yet been translated into Welsh, the Placeholder will display the information in English; however, it is now easier than ever to link from the Placeholder to the Wikidata item and add a Welsh translation. Plans are also underway to hold translate-a-thons with Welsh speakers in order to translate more Wikidata items into Welsh.
Welsh can easily be added to any Wikidata label
It is hoped that embedding this feature into the Welsh language Wicipedia will provide Welsh speakers with a richer Wiki experience and will encourage more editors to create content and add Welsh translations to Wikidata, cementing the place of the Welsh language in the digital realm.
Update (15 September 2016): The EC has released their official proposal for the Directive. It differs in some minor ways from the leaked version. Those differences do not substantially affect the analysis and concerns discussed in this post.
Several documents by the European Commission (EC) leaked during the past couple of weeks, giving us a clear view of the Commission’s plans for EU copyright reform. The EC had great ambitions to modernize copyright and to “ensure wider access to content across the EU.” However, its proposals do not look good for the public’s ability to access and share knowledge on the Internet. The burden is now on the European Parliament and the EU Council to balance the proposal.
The EC proposes creating a new copyright for publishers that would make it harder for the public to find news articles online and restrict their freedom to share the articles they do find. Another proposal targets online platforms built on user contributions—as the Wikimedia projects are—forcing them to implement technology to monitor for copyright infringement. Just as important are what the EC left out of its proposal, such as an EU-wide freedom of panorama copyright exception to give people the right to share photographs of public spaces. Some of the proposed rules would benefit libraries, museums, schools, and other important institutions for public knowledge. However, these benefits are highly circumscribed and far from outweigh the recommended measures’ harms.
As part of its public consultation in preparation for its proposal, the EC asked for input specifically on the topic of freedom of panorama. We focused much of our comments on that exception, which has already been adopted in many EU member states. With a full freedom of panorama exception, it does not infringe copyright for people to take and share pictures of art and buildings in public spaces. It is a sensible copyright reform that redounds to the public’s benefit without significantly harming artists’ and architects’ ability to make a living. It is also a major topic in European copyright discussions and within the Wikimedia communities. While the EC does recommend all EU Member States incorporate freedom of panorama into their national law, and it recognizes that “the current situation holds back digital innovation in the areas of education, research, and preservation of cultural heritage”, it does not even consider harmonizing freedom of panorama EU-wide. In its study of possible reforms, it relegates the freedom of panorama issue to a single footnote.
While failing to propose positive reforms like freedom of panorama harmonization, the EC pushes for regulations that are potentially harmful to Wikimedia. The EC wants to force sites that host “large amounts of works” to enter agreements with rightsholders that would require the services to monitor for copyright infringement on their platforms. The EC seems unconcerned with the difficulty in determining which platforms the law would affect, saying it would be based on “factors including the number of users and visitors and the amount of content uploaded”. Based on those factors, however, the Wikimedia projects may meet the criteria for regulation. There are tens of millions of articles on Wikipedia and media files on Wikimedia Commons, and hundreds of millions of monthly visitors to the Wikimedia sites. However, as seemed to be the consensus at a multistakeholder discussion of similar requirements in the US, it would be absurd to require the Wikimedia Foundation to implement costly and technologically impractical automated systems for detecting copyright infringement. The Wikimedia projects are dedicated to public domain and freely licensed content and they have dedicated volunteers who diligently remove content suspected of infringing copyright. Furthermore, beyond Wikimedia, this proposal would lead to over-removing non-infringing content, with a corresponding chilling effect on free expression and creativity.
The EC’s recommendations also include the creation of a new 20-year copyright for press publishers—even more extreme than we and others feared. The concern behind the publisher’s right is that sites like Google News that aggregate news articles and list their headlines (accompanied by brief excerpts or summaries) are reducing traffic to news sites and thereby diminishing publishers’ ad revenue. The publisher’s right would force news aggregators to pay fees in order to aggregate articles—potentially including to simply list article headlines—or else be liable for copyright infringement. This proposal would make it more difficult for the public to find and access news articles, because there would be additional financial barriers to providing that access. It would also make it more difficult for new news aggregators to emerge to challenge existing ones. The extraordinarily long term for this right exacerbates these problems. Creating a new copyright for publishers could impair the public’s ability to learn about important events in the world around them.
The proposal does contain small steps in the direction of positive copyright reform. It grants an exception to “cultural heritage institutions” that make copies of works for preservation purposes. However, “cultural heritage institution” is narrowly defined to include only libraries, archives, museums, and film heritage institutions, with apparently no consideration of entities like Wikimedia and the Internet Archive that are important for cultural heritage but do not fit a traditional mold. Limiting who is allowed to preserve works makes it more likely that the world will lose them. The proposal also recognizes the value and importance of text and data mining for research. Unfortunately, the proposed exception only covers public interest research institutions.
These few brighter spots in the EC’s proposal are overshadowed by its many problems. Altogether, the Impact Assessment’s language and focus suggest that the EC’s primary concern is the amount of money legacy publishers are making. The EC tries to frame this as concern for long-term “cultural diversity”, but it offers no support or argument for why cultural diversity will diminish if businesses built on ink-and-paper revenue models fail. It appears to give no credit, or even consideration, to the democratic cultural production that has flourished thanks to technological developments and Internet platforms. Instead, it paints these platforms as mostly indifferent to copyright infringement and as obstinate for refusing to capitulate to rightsholders’ demands for overzealous takedown systems.
There are more issues with this proposal than can be addressed in one blog post, but it should be apparent by now that the EC’s recommendations must not be enacted as legislation. The European Parliament now has the opportunity to amend or reject the proposal.
tl;dr: Databases are a very poor fit for any licensing scheme, like copyleft, that (1) is intended to encourage use by the entire world but also (2) wants to place requirements on that use. This is because of broken legal systems and the way data is used. Projects considering copyleft, or even mere attribution, for data should consider other approaches instead.
I’ve been a user of copyleft/share-alike licenses for a long time, and even helped draft several of them, but I’ve come around to the point of view that copyleft is a poor fit for data. Unfortunately, I’ve been explaining this a lot lately, so I want to explain why in writing. This first post will focus on how the legal system around databases is broken. Later posts will focus on how databases are hard to license, and what we might do about it.
FOSS licensing, and particularly copyleft, relies on legal features database rights lack
Defenders of copyleft often have to point out that copyleft isn’t necessarily anti-copyright, because copyleft depends on copyright. This is true, of course, but the more I think about databases and open licensing, the more I think “copyleft depends on copyright” almost understates the case – global copyleft depends not just on “copyright”, but on very specific features of the international copyright system which database law lacks.
To put it in software terms, the underlying legal platform lacks the features necessary to reliably implement copyleft.
Consider some differences between the copyright system and database law:
Maturity: Copyright has had 100 or so years as an international system to work out kinks like “what is a work” or “how do joint authors share rights?” Even software copyright law has existed for about 40 years. In contrast, database law in practice has existed for less than 20 years, pretty much all of that in Europe, and I can count all the high court rulings on it on my fingers and toes. So key terms, like “substantial”, are pretty hard to define – courts and legislatures simply haven’t defined, or refined, the key concepts. This makes it very hard to write a general-purpose public license whose outcomes are predictable.
Stability: Related to the previous point, copyright tends to change incrementally, as long-standing concepts are slowly adapted to new circumstances. (The gradual broadening of fair use in the Google era is a good example of this.) In contrast, since there are so few decisions, basically every decision about database law leads to upheaval. Open Source licenses tend to have a shelf-life of about ten years; good luck writing a database license that means the same thing in ten years as it does today!
Global nature: Want to share copyrighted works with the entire world? Copyright (through the Berne Convention) has you covered. Want to share a database? Well, you can easily give it away to the whole world (probably!), but want to reliably put any conditions on that sharing? Good luck! You’ve now got to write a single contract that is enforceable in every jurisdiction, plus a license that works in the EU, Japan, South Korea, and Mexico. As an example again, “substantial” – used in both ODbL and CC 4.0 – is a term from the EU’s Database Directive, so good luck figuring out what it means in a contract in the US or within the context of Japan’s database law.
Default rights: Eben Moglen has often pointed out that anyone who attacks the GPL is at a disadvantage, because if they somehow show that the license is legally invalid, then they get copyright’s default – which is to say, they don’t get anything. So they are forced to fight about the specific terms, rather than the validity of the license as a whole. In contrast, in much of the world (and certainly in the US), if you show that a database license is legally invalid, then you get database law’s default – which is to say, you get everything. So someone who doesn’t want to follow the copyleft has very, very strong incentives to demolish your license altogether. (Unless, of course, the entire system shifts from underneath you to create a stronger default – like it may have in the EU with the Ryanair case.)
With all these differences, what starts off as hard (“write a general-purpose, public-facing license that requires sharing”) becomes insanely difficult in the database context. Key goals of a general-purpose, public license – global, predictable, reliable – are very hard to do.
In upcoming posts, I’ll try to explain why, even if it were possible to write such a license from a legal perspective, it might not be a good idea because of how databases are used.
Coffee is an essential part of a million morning rituals every day. For millions of bathrobe-clad and bed-headed people, coffee is a cup of pure vitality.
But there’s a dark side to the dark liquid. For those who experience anxiety symptoms, coffee can encourage the onset of panic attacks. It’s a fascinating and little-discussed side effect, and you can read all about it thanks to students in Dr. Michelle Mynlieff’s Neurobiology course at Marquette University.
Student editors in that course transformed an article that sat at less than 400 words, expanding it to more than 10 times its original length. The article, “Caffeine-induced anxiety disorder,” discusses how caffeine works, and how it affects anxiety.
It’s part of a set of articles improved by Dr. Mynlieff’s students, including a major expansion of the article on neuroscientists themselves! The Neuroscientist article sat at just three paragraphs, with two references (and one of them was a dead link). Students expanded the article with historical context and a summary of existing research projects.
Students expanded the article on Adipsia, the rare decreased sensation of thirst that can be a sign of diabetes. Others expanded Camptocormia, a bent spine often seen among the elderly, and Myoclonus dystonia, a muscle disorder that causes abnormal posture. The camptocormia article was expanded from a three-sentence article to discuss the history of the disease, the ways it is diagnosed, and some of the causes, treatments, and current areas of research.
They also expanded knowledge available to researchers. For example, students expanded an article on a particular gene — SLC7A11. The absence or impairment of this gene’s expression may play a role in drug addiction and schizophrenia, as well as Alzheimer’s and Parkinson’s diseases.
These articles have been viewed 610,000 times since these students took them on! Thanks to these students, the entire world has access to knowledge that helps us better understand the way our bodies work.
Think your students might want to share their knowledge to improve the world? Check out our Year of Science initiative, or send us an e-mail: contact@wikiedu.org.
The system described in the paper looks for red links in Wikipedia and classifies them based on their context. To find section titles, it then looks for similar existing articles. With these titles, the system searches the web for information, and eventually uses content summarization and a paraphrasing algorithm. The researchers uploaded 50 of these automatically created articles to Wikipedia, and found that 47 of them survived. Some were heavily edited after upload, others not so much.
While I was enthusiastic about the results, I was surprised by the suboptimal quality of the articles I reviewed – three that were mentioned in the paper. After a brief discussion with the authors, a wider discussion was initiated on the Wiki-research mailing list. This was followed by an entry on the English Wikipedia administrators’ noticeboard (which includes a list of all accounts used for this particular research paper). The discussion led to the removal of most of the remaining articles.
The discussion concerned the ethical implications of the research, and using Wikipedia for such an experiment without the consent of Wikipedia contributors or readers. The first author of the paper was an active member of the discussion; he showed a lack of awareness of these issues, and appeared to learn a lot from the discussion. He promised to take these lessons to the relevant research community – a positive outcome.
In general, this sets an example for engineers and computer scientists, who often show a lack of awareness of certain ethical issues in their research. Computer scientists are typically trained to think about bits and complexity, and rarely discuss in depth how their work impacts human lives. Whether it’s social networks experimenting with the mood of their users, current discussions of biases in machine-learned models, or the experimental upload of automatically created content to Wikipedia without community approval, computer science has generally not reached the level of awareness that some other sciences have of the possible effects of research on human subjects, at least as far as this reviewer can tell.
Even on Wikipedia, there’s no clear-cut, succinct policy I could have pointed the researchers to. The use of sockpuppets was a clear violation of policy, but only an incidental component of the research. WP:POINT was a stretch to cover the situation at hand. In the end, what we can suggest to researchers is to check in with the Wikimedia Research mailing list. A lot of people there have experience with designing research plans with the community in mind, and that can help to avoid uncomfortable situations.
See also our 2015 review of a related paper coauthored by the same authors: “Bot detects theatre play scripts on the web and writes Wikipedia articles about them”, and other similarly themed papers they have published since then: “WikiKreator: Automatic Authoring of Wikipedia Content”[2], “WikiKreator: Improving Wikipedia Stubs Automatically”[3], “Filling the Gaps: Improving Wikipedia Stubs”[4]. DV
Ethics researcher: Vandal fighters should not be allowed to see whether an edit was made anonymously
A paper[5] in the journal Ethics and Information Technology examines the “system of surveillance” that the English Wikipedia has built up over the years to deal with vandalism edits. The author, Paul B. de Laat from the University of Groningen, presents an interesting application of a theoretical framework by US law scholar Frederick Schauer that focuses on the concepts of rule enforcement and profiling. While providing justification for the system’s efficacy and largely absolving it of some of the objections that are commonly associated with the use of profiling in e.g. law enforcement, de Laat ultimately argues that in its current form, it violates an alleged “social contract” on Wikipedia by not treating anonymous and logged-in edits equally. Although generally well-informed about both the practice and the academic research of vandalism fighting, the paper unfortunately fails to connect to an existing debate about very much the same topic – potential biases of artificial intelligence-based anti-vandalism tools against anonymous edits – that was begun last year[6] by the researchers developing ORES (an edit review tool that was just made available to all English Wikipedia users, see this week’s Technology report) and most recently discussed in the August 2016 WMF research showcase.
The paper first gives an overview of the various anti-vandalism tools and bots in use, recapping an earlier paper[7] where de Laat had already asked whether these are “eroding Wikipedia’s moral order” (following an even earlier 2014 paper in which he had argued that new-edit patrolling “raises a number of moral questions that need to be answered urgently”). There, de Laat’s concerns included the fact that some stronger tools (rollback, Huggle, and STiki) are available only to trusted users and “cause a loss of the required moral skills in relation to newcomers”, and that there is a lack of transparency about how the tools operate (in particular when more sophisticated artificial intelligence/machine learning algorithms such as neural networks are used). The present paper expands on a separate but related concern, about the use of “profiling” to pre-select which recent edits will be subject to closer human review. The author emphasizes that on Wikipedia this usually does not mean person-based offender profiling (building profiles of individuals committing vandalism), citing only one exception in the form of a 2015 academic paper – cf. our review: “Early warning system identifies likely vandals based on their editing behavior“. Rather, “the anti-vandalism tools exemplify the broader type of profiling” that focuses on actions. Based on Schauer’s work, the author asks the following questions:
“Is this profiling profitable, does it bring the rewards that are usually associated with it?”
“is this profiling approach towards edit selection justified? In particular, do any of the dimensions in use raise moral objections? If so, can these objections be met in a satisfactory fashion, or do such controversial dimensions have to be adapted or eliminated?”
But snakes are much more dangerous! According to Schauer, while general rules are always less fair than case-by-case decisions, their existence can be justified by other arguments.
To answer the first question, the author turns to Schauer’s work on rules, in a brief summary that is worth reading for anyone interested in Wikipedia policies and guidelines – although de Laat instead applies the concept to the “procedural rules” implicit in vandalism profiling (such as that anonymous edits are more likely to be worth scrutinizing). First, Schauer “resolutely pushes aside the argument from fairness: decision-making based on rules can only be less just than deciding each case on a particularistic basis”. (For example, a restaurant’s “No Dogs Allowed” rule will unfairly exclude some well-behaved dogs, while not prohibiting much more dangerous animals such as snakes.) Instead, the existence of rules has to be justified by other arguments, of which Schauer presents four:
Rules “create reliability/predictability for those affected by the rule: rule-followers as well as rule-enforcers”.
Rules “promote more efficient use of resources by rule-enforcers” (e.g. in the case of a speeding car driver, traffic police and judges can apply a simple speed limit instead of having to prove in detail that an instance of driving was dangerous).
Rules, if simple enough, reduce the problem of “risk-aversion” by enforcers, who are much more likely to make mistakes and face repercussions if they have to make case by case decisions.
Rules create stability, which however also presents “an impediment to change; it entrenches the status-quo. If change is on a society’s agenda, the stability argument turns into an argument against having (simple) rules.”
The author cautions that these four arguments have to be reinterpreted when applying them to vandalism profiling, because it consists of “procedural rules” (which edits should be selected for inspection) rather than “substantive rules” (which edits should be reverted as vandalism, which animals should be disallowed from the restaurant). While in the case of substantive rules, their absence would mean having to judge everything on a case-by-case basis, the author asserts that procedural rules arise in a situation where the alternative would be to not judge at all in many cases, because “we have no means at our disposal to check and pass judgment on all of them; a selection of a kind has to be made. So it is here that profiling comes in”. With that qualification, Schauer’s second argument provides justification for “Wikipedian profiling [because it] turns out to be amazingly effective”, starting with the autonomous bots that auto-revert with an (aspired-to) 1:1000 false-positive rate.
De Laat also interprets “the Schauerian argument of reliability/predictability for those affected by the rule” in favor of vandalism profiling. Here, though, he fails to explain the benefits of vandals being able to predict which kinds of edits will be subject to scrutiny. This also calls into question his subsequent remark that “it is unfortunate that the anti-vandalism system in use remains opaque to ordinary users”. The remaining two of Schauer’s four arguments are judged as less pertinent. But overall, the paper concludes that it is possible to justify the existence of vandalism profiling rules as beneficial via Schauer’s theoretical framework.
Police traffic stops: A good analogy for anti-vandalism patrol on Wikipedia?
Next, de Laat turns to question 2, on whether vandalism profiling is also morally justified. Here he relies on later work by Schauer, from a 2003 book, “Profiles, Probabilities, and Stereotypes”, that studies such matters as profiling by tax officials (selecting which taxpayers have to undergo an audit), airport security (selecting passengers for screening) and by police officers (e.g. selecting cars for traffic stops). While profiling of some kind is a necessity for all these officials, the particular characteristics (dimensions) used for profiling can be highly problematic (see e.g. Driving While Black). For de Laat’s study of Wikipedia profiling, “two types of complications are important: (1) possible ‘overuse’ of dimension(s) (an issue of profile effectiveness) and (2) social sensibilities associated with specific dimension(s) (a social and moral issue).” Overuse can mean relying on stereotypes that have no basis in reality, or over-reliance on some dimensions that, while having a non-spurious correlation with the deviant behavior, are over-emphasized at the expense of other relevant characteristics because they are more visible or salient to the profiler. E.g. while Schauer considers that it may be justified for “airport officials looking for explosives [to] single out for inspection the luggage of younger Muslim men of Middle Eastern appearance”, it would be overuse if “officials ask all Muslim men and all men of Middle Eastern origin to step out of line to be searched”, thus reducing their effectiveness by neglecting other passenger characteristics. This is also an example of the second type of complication, where the selected dimensions are socially sensitive – indeed, for the specific case of luggage screening in the US, “the factors of race, religion, ethnicity, nationality, and gender have expressly been excluded from profiling” since 1997.
Applying this to the case of Wikipedia’s anti-vandalism efforts, de Laat first observes that complication (1) (overuse) is not a concern for fully automated tools like ClueBotNG – obviously their algorithm applies the existing profile directly, without a human intervention that could introduce this kind of bias. For Huggle and STiki, however, “I see several possibilities for features to be overused by patrollers, thereby spoiling the optimum efficacy achievable by the profile embedded in those tools.” This is because both tools do not just use these features in their automatic pre-selection of edits to be reviewed, but also expose at least whether an edit was anonymous to the human patroller in the edit review interface. (The paper examines this in detail for both tools, observing that Huggle presents more opportunities for this kind of overuse, while STiki is more restricted. However, there seems to have been no attempt to study empirically whether this overuse actually occurs.)
Regarding complication (2), whether some of the features used for vandalism profiling are socially sensitive, de Laat highlights that they include some amount of discrimination by nationality: IP edits geolocated to the US, Canada, and Australia have been found to contain vandalism more frequently and are thus more likely to be singled out for inspection. However, he does not consider this concern “strong enough to warrant banning the country-dimension and correspondingly sacrifice some profiling efficacy”, chiefly because there do not appear to be a lot of nationalistic tensions within the English Wikipedia community that could be stirred up by this.
In contrast, de Laat argues that “the targeting of contributors who choose to remain anonymous … is fraught with danger since anons already constitute a controversial group within the Wikipedian community.” Still, he acknowledges the “undisputed fact” that the ratio of vandalism is much higher among anonymous edits. Also, he rejects the concern that they might be more likely to be the victim of false positives:
“normally [IP editors] do not experience any harm when their edits are selected and inspected as a result of anon-powered profiling; they will not even notice that they were surveilled since no digital traces remain of the patrolling. … The only imaginable harm is that patrollers become over focussed on anons and indulge in what I called above ‘overinspection’ of such edits and wrongly classify them as vandalism … As a consequence, they might never contribute to Wikipedia again. … Nevertheless, I estimate this harm to be small. At any rate, the harm involved would seem to be small in comparison with the harassment of racial profiling—let alone that an ‘expressive harm hypothesis’ applies.”
With this said, de Laat still makes the controversial call “that the anonymous-dimension should be banned from all profiling efforts” – including removing it from the scoring algorithms of Huggle, STiki and ClueBotNG. Instead of concerns about individual harm,
“my main argument for the ban is a decidedly moral one. From the very beginning the Wikipedian community has operated on the basis of a ‘social contract’ that makes no distinction between anons and non-anons – all are citizens of equal stature. … In sum, the express profiling of anons turns the anonymity dimension from an access condition into a social distinction; the Wikipedian community should refrain from institutionalizing such a line of division. Notice that I argue, in effect, that the Wikipedian community has only two choices: either accept anons as full citizens or not; but there is no morally defensible social contract in between.”
Sadly, while the paper is otherwise rich in citations and details, it completely fails to provide evidence for the existence of this alleged social contract. While it is true that “the ability of almost anyone to edit (most) articles without registration” forms part of Wikipedia’s founding principles (a principle that this reviewer strongly agrees with), the “equal stature” part seems to be de Laat’s own invention – there is a long list of things that, by longstanding community consensus, require the use of an account (which after all is freely available to everyone, without even requiring an email address). Most of these restrictions – say, the inability to create new articles or being prevented from participating in project governance during admin or arbcom votes – seem much more serious than the vandalism profiling that is the topic of de Laat’s paper. TB
Briefly
Conferences and events
Registration is open for WikiConference North America October 7-10. The conference will include a track about academic engagement and Wikipedia in education.
A list of other recent publications that could not be covered in time for this issue—contributions are always welcome for reviewing or summarizing newly published research. This month, the list mainly gathers research about the extraction of specific content from Wikipedia.
“Large SMT Data-sets Extracted from Wikipedia”[8] From the abstract: “The article presents experiments on mining Wikipedia for extracting SMT [statistical machine translation] useful sentence pairs in three language pairs. … The optimized SMT systems were evaluated on unseen test-sets also extracted from Wikipedia. As one of the main goals of our work was to help Wikipedia contributors to translate (with as little post editing as possible) new articles from major languages into less resourced languages and vice-versa, we call this type of translation experiments ‘in-genre’ translation. As in the case of ‘in-domain’ translation, our evaluations showed that using only ‘in-genre’ training data for translating same genre new texts is better than mixing the training data with ‘out-of-genre’ (even) parallel texts.”
“Recognizing Biographical Sections in Wikipedia”[9] From the abstract: “Thanks to its coverage and its availability in machine-readable format, [Wikipedia] has become a primary resource for large scale research in historical and cultural studies. In this work, we focus on the subset of pages describing persons, and we investigate the task of recognizing biographical sections from them: given a person’s page, we identify the list of sections where information about her/his life is present [as opposed to nonbiographical sections, e.g. ‘Early Life’ but not ‘Legacy’ or ‘Selected writings’].”
“Extraction of lethal events from Wikipedia and a semantic repository”[10] From the abstract and conclusion: “This paper describes the extraction of information on lethal events from the Swedish version of Wikipedia. The information searched includes the persons’ cause of death, origin, and profession. […] We also extracted structured semantic data from the Wikidata store that we combined with the information retrieved from Wikipedia … [The resulting] data could not support the existence of the Club 27“.
“Learning Topic Hierarchies for Wikipedia Categories”[11] (from frequently used section headings in a category, e.g. “eligibility”, “endorsements” or “results” for Category:Presidential elections)
“‘A Spousal Relation Begins with a Deletion of engage and Ends with an Addition of divorce’: Learning State Changing Verbs from Wikipedia Revision History.”[12] From the abstract: “We propose to learn state changing verbs [such as ‘born’, ‘died’, ‘elected’, ‘married’] from Wikipedia edit history. When a state-changing event, such as a marriage or death, happens to an entity, the infobox on the entity’s Wikipedia page usually gets updated. At the same time, the article text may be updated with verbs either being added or deleted to reflect the changes made to the infobox. … We observe in our experiments that when state-changing verbs are added or deleted from an entity’s Wikipedia page text, we can predict the entity’s infobox updates with 88% precision and 76% recall.”
“Extracting Representative Phrases from Wikipedia Article Sections”[13] From the abstract: “Since [Wikipedia’s] long articles are taking time to read, as well as section titles are sometimes too short to capture comprehensive summarization, we aim at extracting informative phrases that readers can refer to.”
“Accurate Fact Harvesting from Natural Language Text in Wikipedia with Lector”[14] From the abstract: “Many approaches have been introduced recently to automatically create or augment Knowledge Graphs (KGs) with facts extracted from Wikipedia, particularly its structured components like the infoboxes. Although these structures are valuable, they represent only a fraction of the actual information expressed in the articles. In this work, we quantify the number of highly accurate facts that can be harvested with high precision from the text of Wikipedia articles […]. Our experimental evaluation, which uses Freebase as reference KG, reveals we can augment several relations in the domain of people by more than 10%, with facts whose accuracy are over 95%. Moreover, the vast majority of these facts are missing from the infoboxes, YAGO and DBpedia.”
“Extracting Scientists from Wikipedia”[15] From the abstract: “[We] describe a system that gathers information from Wikipedia articles and existing data from Wikidata, which is then combined and put in a searchable database. This system is dedicated to making the process of finding scientists both quicker and easier.”
“LeadMine: Disease identification and concept mapping using Wikipedia”[16] From the abstract: “LeadMine, a dictionary/grammar based entity recognizer, was used to recognize and normalize both chemicals and diseases to MeSH [Medical Subject Headings] IDs. The lexicon was obtained from 3 sources: MeSH, the Disease Ontology and Wikipedia. The Wikipedia dictionary was derived from pages with a disease/symptom box, or those where the page title appeared in the lexicon.”
“Finding Member Articles for Wikipedia Lists”[17] From the abstract: “… for a given Wikipedia article and list, we determine whether the article can be added to the list. Its solution can be utilized on automatic generation of lists, as well as generation of categories based on lists, to help self-organization of knowledge structure. In this paper, we discuss building classifiers for judging on whether an article belongs to a list or not, where features are extracted from various components including list titles, leading sections, as well as texts of member articles. … We report our initial evaluation results based on Bayesian and other classifiers, and also discuss feature selection.”
“Study of the content about documentation sciences in the Spanish-language Wikipedia”[18] (in Spanish). From the English abstract: “This study explore how [Wikipedia] addresses the documentation sciences, focusing especially on pages that discuss the discipline, not only the page contents, but the relationships between them, their edit history, Wikipedians who participated and all aspects that can influence on how the image of this discipline is projected” [sic]. TB
↑ Aprosio, Alessio Palmero; Tonelli, Sara (17 September 2015). “Recognizing Biographical Sections in Wikipedia”. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Lisbon, Portugal. pp. 811–816.
↑ Hu, Linmei; Wang, Xuzhong; Zhang, Mengdi; Li, Juanzi; Li, Xiaoli; Shao, Chao; Tang, Jie; Liu, Yongbin (2015-07-26). “Learning Topic Hierarchies for Wikipedia Categories” (PDF). Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Short Papers). Beijing, China. pp. 346–351.
↑ Ekenstierna, Gustaf Harari; Lam, Victor Shu-Ming. “Extracting Scientists from Wikipedia”. Digital Humanities 2016. From Digitization to Knowledge 2016: Resources and Methods for Semantic Processing of Digital Works/Texts, Proceedings of the Workshop, July 11, 2016, Krakow, Poland.
Last year I got two Dell P2415Q 24″ Ultra-HD monitors, replacing my old and broken 1080p monitor, to use with my MacBook Pro. Since the model’s still available, thought I’d finally post my experience.
tl;dr:
Picture quality: great
Price: good for what you get and they’re cheaper now than they were last year.
Functionality: mixed; some problems that need workarounds for me.
So first the good: the P2415Q is the “right size, right resolution” for me; with an operating system that handles 200% display scaling correctly, such as Mac OS X, Windows 10, or some Linux environments, it feels like a 24″ 1080p monitor that shows much, much sharper text and images. When using the external monitors with my 13″ MacBook Pro, the display density is about the same as the internal display, and the color reproduction seems consistent enough to my untrained eye that it’s not distracting to move windows between the laptop and external screens.
Two side by side plus the laptop makes for a vveerryy wwiiddee desktop, which can be very nice when developing & testing stuff since I’ve got chat, documentation, terminal, code, browser window, and debugger all visible at once.
The monitor accepts DisplayPort input via either full-size or mini, and also accepts HDMI (limited to 30 Hz at the full resolution, or a full 60 Hz at 1080p), which makes it possible to hook up random devices like phones and game consoles.
There is also a built-in USB hub, which works well enough, but the ports are awkward to reach.
The bad: there are three major pain points for me, in decreasing order of WTF:
Sometimes the display goes black when using DisplayPort; the only way to resolve it seems to be to disconnect the power and hard-reset the monitor. Unplugging and replugging the DisplayPort cable has no effect. Switching cables has no effect. Rebooting the computer has no effect. Switching the monitor’s power on and off has no effect. You have to reach back and yank out the power cord.
There are neither speakers nor audio passthrough connectors, but when connecting over HDMI, devices like game consoles and phones will attempt to route audio to the monitor, sending all your audio down a black hole. The workaround is to manually re-route audio back to the default output or attach a USB audio output path to the connected device.
Even though the monitor can tell if there’s something connected to each input or not, it won’t automatically switch to the only active input. After unplugging my MacBook from the DisplayPort and plugging a tablet in over HDMI, I still have to bring up the on-screen menu and switch inputs.
The first problem is so severe it can make the unit appear dead, but is easily worked around. The second and third may or may not bother you depending on your needs.
So, happy enough to use ’em, but there’s real early-adopter pain in this particular model of monitor.
The Tech News weekly summaries help you monitor recent software changes likely to impact you and your fellow Wikimedians. Subscribe, contribute and give feedback.
Latest tech news from the Wikimedia technical community. Please tell other users about these changes. Not all changes will affect you. Translations are available.
Recent changes
The Wikimedia Commons app for Android can now show nearby places that need photos. [1]
<maplink> and <mapframe> can now use geodata from OpenStreetMap if OpenStreetMap has defined a region and given it an ID in Wikidata. You can use this to draw on the map and add information. [2][3]
Changes this week
The RevisionSlider will be available as a beta feature on all wikis from 13 September. This will make it easier to navigate between diffs in the page history. [4]
A new user right will allow most users to change the content model of pages. [5][6]
The new version of MediaWiki will be on test wikis and MediaWiki.org from 13 September. It will be on non-Wikipedia wikis and some Wikipedias from 14 September. It will be on all wikis from 15 September (calendar).
Meetings
You can join the next meeting with the VisualEditor team. During the meeting, you can tell developers which bugs you think are the most important. The meeting will be on 13 September at 19:00 (UTC). See how to join.
Future changes
When you search on the Wikimedia wikis in the future you could see results from sister projects in your language. You can read more and discuss how this could work.
Hello, here’s Harmonia Amanda again (I think Ash_Crow just agreed to let me squat his blog indefinitely). This article shouldn’t be too long, for once.[1] It started as a hashtag on Twitter, #SundayQuery, where I wrote SPARQL queries for people who couldn’t figure out how to ask their question. So this article, and the next ones if I keep this up, are basically a kind of “how to translate a question into SPARQL, step by step” tutorial.
The question this week was asked by Jean-No: who are the oldest French actresses still alive?
To begin: the endpoint and PREFIX…
SPARQL is the language you use to query a semantic database. For that you use an endpoint, a service that accepts SPARQL queries and returns results. There are many SPARQL endpoints around, but we will be using the Wikidata endpoint.
A semantic database is made of triples of information: subject, predicate, object (in Wikidata, we usually call these item, property, value, but it’s the exact same thing). As querying with full URIs all the time would be really tedious, SPARQL needs PREFIX.
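For the Wikidata endpoint, the usual declarations look like this (a minimal sketch listing just the prefixes this tutorial relies on):
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX bd: <http://www.bigdata.com/rdf#>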
These are the standard prefixes used to query Wikidata. If used, they should be stated at the top of the query. You can add other prefixes as needed, too. Actually, the Wikidata endpoint was created specifically to query Wikidata, so all these prefixes are already declared, even if you don’t see them. But just because they aren’t visible doesn’t mean they aren’t there, so always remember: SPARQL NEEDS PREFIX. There.
All French women
The first thing to declare after the prefixes is what we are asking to have as results. We can use the command SELECT or the command SELECT DISTINCT (the latter clears off duplicates) and then list the variables.
As we are looking for humans, we’ll call this variable “person” (but we could choose anything we want).
SELECT ?person
We will then define the conditions we want our variable to satisfy, with WHERE. We are seeking the oldest French actresses still alive. We need to cut that into little understandable bits. So the first step is: we are querying for human beings (and not fictional actresses). Human beings are all items which have the property “instance of” (P31) with the value “human” (Q5).
So:
SELECT ?person
WHERE {
?person wdt:P31 wd:Q5 .
}
This query will return all humans in the database.[2] But we don’t want all humans, we want only the female ones. So we want humans who also have “sex or gender” (P21) with the value “female” (Q6581072).
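At this point, the query sketch looks like this (just the previous query plus the gender condition; the “French” part works the same way, with “country of citizenship” (P27) and the value “France” (Q142), as you’ll see in the full query below):
SELECT ?person
WHERE {
?person wdt:P31 wd:Q5 . #All humans
?person wdt:P21 wd:Q6581072 . #Of female gender
}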
Well, that seems like a good beginning, but we don’t want all women, we only want actresses. It could be as simple as “occupation” (P106) “actor” (Q33999), but it’s not. In reality Q33999 doesn’t cover all actors and actresses: it’s a class. Subclasses, like “stage actor”, “television actor” or “film actor”, all have “subclass of” (P279) “actor” (Q33999). We don’t only want the humans with occupation:actor; we also want those with occupation:stage actor and so on.
So we need to introduce another variable, which I’ll call ?occupation.
SELECT ?person
WHERE {
?person wdt:P31 wd:Q5 . #All humans
?person wdt:P21 wd:Q6581072 . #Of female gender
?person wdt:P27 wd:Q142 . #With France as their citizenship country
?person wdt:P106 ?occupation . #With an occupation
?occupation wdt:P279* wd:Q33999 . #This occupation is or is a subclass of "actor"
}
Actually, if we really wanted to avoid the ?occupation variable, we could have combined the two lines into a single property path:
?person wdt:P106/wdt:P279* wd:Q33999 . #Occupation is "actor" or any subclass of it
It’s the same thing, but I found it less clear for beginners.
Still alive
So now we need to filter by age. The first step is to ensure they have a birth date (P569). As we don’t care (for now) what this birth date is, we’ll introduce a new variable instead of setting a value.
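In triple form, that’s just one more condition with a fresh variable (a one-line sketch; it appears in context in the full query below):
?person wdt:P569 ?birthDate . #With a birth date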
Now we want the French actresses still alive, which we’ll translate to “without a ‘date of death’ (P570)”. For that, we’ll use a filter:
SELECT ?person
WHERE {
?person wdt:P31 wd:Q5 .
?person wdt:P21 wd:Q6581072 .
?person wdt:P27 wd:Q142 .
?person wdt:P106 ?occupation .
?occupation wdt:P279* wd:Q33999 .
?person wdt:P569 ?birthDate . #With a birth date
FILTER NOT EXISTS { ?person wdt:P570 ?deathDate . } #Without a death date
}
Here it is, all French actresses still alive!
The Oldest
To find the oldest, we’ll need to order our results. So after the WHERE section of our query, we’ll add another request: now that you have found my results, SPARQL, can you order them the way I want? Logically, this is called ORDER BY. We can “order by” any variable we want, but to order by a variable, we first need to SELECT that variable. We could order by ?person, but as it’s the only variable we have selected in our query so far, that wouldn’t give us anything new.
The obvious way would be to order by ?birthDate. The thing is, Wikidata sometimes has more than one birth date for people because of conflicting sources, which translates into the same person appearing twice. So we’ll group the people by their Qid (using GROUP BY) so that duplicates now exist as a group. Then we use SAMPLE (in our SELECT) to take only the first birth date we find in each group… and then ORDER BY it:
SELECT ?person (SAMPLE(?birthDate) AS ?date)
WHERE {
?person wdt:P31 wd:Q5 .
?person wdt:P21 wd:Q6581072 .
?person wdt:P27 wd:Q142 .
?person wdt:P106 ?occupation .
?occupation wdt:P279* wd:Q33999 .
?person wdt:P569 ?birthDate .
FILTER NOT EXISTS { ?person wdt:P570 ?deathDate . }
} GROUP BY ?person ORDER BY ?date
This gives us all French actresses ordered by date of birth, with the oldest first. We can already see that we have problems with the data: actresses whose date of birth we don’t know (marked “unknown value”) and actresses manifestly dead for centuries but whose date of death we don’t know. But still! Here is our query, and we answered Jean-No’s question as best we could.
With labels
Well, we certainly answered the query, but this big list of Q-numbers isn’t very human-readable. We should ask to have the labels too! We could do this the proper SPARQL way, using RDFS and such, but we are querying Wikidata on the Wikidata endpoint, so we’ll use the local tool instead.
We add to the query:
SERVICE wikibase:label { bd:serviceParam wikibase:language "fr,en" . }
which means we ask to have the labels in French (as we are querying French people), and if a label doesn’t exist in French, then in English. But just adding that doesn’t work: we need to SELECT it too! (And to add it to the GROUP BY.) So:
SELECT ?person ?personLabel (SAMPLE(?birthDate) AS ?date)
WHERE {
?person wdt:P31 wd:Q5 .
?person wdt:P21 wd:Q6581072 .
?person wdt:P27 wd:Q142 .
?person wdt:P106 ?occupation .
?occupation wdt:P279* wd:Q33999 .
?person wdt:P569 ?birthDate .
FILTER NOT EXISTS { ?person wdt:P570 ?deathDate . }
SERVICE wikibase:label { bd:serviceParam wikibase:language "fr,en" . }
} GROUP BY ?person ?personLabel ORDER BY ?date
Tada! It’s much more understandable now!
Make it pretty
I don’t want all living French actresses, children included; I want the oldest. So I add a LIMIT to the results.[3]
} GROUP BY ?person ?personLabel ORDER BY ?date LIMIT 200
And why should I have a table with identifiers and names when I could have the results as a timeline? We ask that at the top, even before the SELECT:[4]
#defaultView:Timeline
Wait, wait! Can I have pictures too? But only if they have a picture – I still want the results if they don’t have one (and I want only one picture if they happen to have several, so I again use SAMPLE). YES WE CAN:
#defaultView:Timeline
SELECT ?person ?personLabel (SAMPLE(?birthDate) AS ?date) (SAMPLE(?photo) AS ?pic)
WHERE {
?person wdt:P31 wd:Q5 .
?person wdt:P21 wd:Q6581072 .
?person wdt:P27 wd:Q142 .
?person wdt:P106 ?occupation .
?occupation wdt:P279* wd:Q33999 .
?person wdt:P569 ?birthDate .
FILTER NOT EXISTS { ?person wdt:P570 ?deathDate . }
OPTIONAL { ?person wdt:P18 ?photo . }
SERVICE wikibase:label { bd:serviceParam wikibase:language "fr,en" . }
} GROUP BY ?person ?personLabel ORDER BY ?date LIMIT 200
Peter Baldwin—a Professor of History at the University of California, Los Angeles, a Global Distinguished Professor at New York University, and co-founder of the philanthropic Arcadia Foundation—has been appointed to the Wikimedia Endowment Advisory Board.
Baldwin joins Wikipedia founder Jimmy Wales and venture capitalist Annette Campbell-White as the third member of the board that is entrusted with overseeing the Wikimedia Endowment, a permanent source of funding to ensure Wikipedia thrives for generations to come.
Baldwin and his wife, Lisbet Rausing, have a long track record of philanthropic giving. In 2002, they founded Arcadia to focus on preserving cultural heritage, protecting the environment, and supporting open access resources. Baldwin is the chairman of its Donor and Advisory Boards, and as of December 2015, Arcadia has made grant commitments of over $363 million to charities and scholarly institutions around the globe that preserve cultural heritage and the environment and promote open access. These include the Endangered Languages Documentation program at SOAS, University of London, the Endangered Archive Program at the British Library, and Fauna & Flora International’s Halcyon Land and Sea fund.
“Arcadia has long supported open access and other ways of disseminating knowledge broadly,” Baldwin said. “Wikimedia’s efforts have been an example and a lodestar to all of us in the field for decades now. Nothing could be more important than to ensure Wikipedia’s continued health, growth and well-being, and I am pleased to be part of that effort.”
Peter and Lisbet have also been long-time donors and supporters of the Wikimedia Foundation.
“Peter brings a deep commitment to free knowledge and in-depth experience in nonprofit board management,” said Annette Campbell-White, fellow Advisory Board member. “We’re looking forward to having his unique expertise and shared passion for the Wikimedia mission on the Board.”
Endowment Board members are selected based on active involvement in philanthropic endeavors, prior nonprofit board experience, fundraising expertise, and a strong commitment to the Wikimedia Foundation’s mission.
Baldwin is a renowned scholar interested in the historical development of the modern state – a broad field that has led him in many different directions throughout his career. He has published works on the comparative history of the welfare state, on social policy more broadly, and on public health; his latest book is a transnational political history of copyright from 1710 to the present. He also has projects underway on the historical development of privacy, on the history of honor, and on a global history of the state. Baldwin holds a PhD in History from Harvard University and a BA in Philosophy from Yale University.
Marc Brent, Endowment Director
Wikimedia Foundation
These diagrams were created at Wikimedia Deutschland by Jan Dittrich, Charlie Kritschmar, and myself for an upcoming presentation I’m doing on the Clean Architecture. There are plenty of diagrams available already if you include the Onion Architecture and Hexagonal Architecture, which have essentially the same structure, though none I’ve found so far have a permissive license. Furthermore, I’m not so happy with the wording and structure of a lot of them. In particular, some bite off more than they can chew with the “dependencies point inward” rule, glossing over important restrictions which end up not being visualized at all.
These images are SVGs. Click them to go to Wikimedia Commons where you can download them.
The program for SMWCon Fall 2016 in Frankfurt am Main, Germany (September 28–30, 2016) has now been announced. It comprises a variety of interesting talks and presentations about semantic wikis and related topics. The keynote will be delivered by Prof. Dr. Sören Auer of the Fraunhofer Institute for Intelligent Analysis and Information Systems IAIS, Enterprise Information Systems. See all the details on the program pages for the tutorial day and the conference days.
All interested participants are encouraged to register at the ticketing site. Note that the slightly extended Early Bird registration period ends on September 14, 2016.
Wikipedia comes in 294 languages … and counting. It’s a drop in the bucket compared with the number of actual languages in the world (after all, Australia alone has more than 250 native languages), but alongside Google Translate (with 104 languages) it makes Wikipedia one of the most ambitious language projects today.
Where the English Wikipedia has more than 5m articles, there are hundreds of much smaller Wikipedias including Abkhazian, Cherokee, Norfolk and Fijian. Sitting in the site’s incubator program, along with 10 other languages, is Wikipedia’s first Indigenous Australian language, Noongar.
There are roughly 35,000 Noongar people today, according to the Noongar Boodjar Language Centre, making it one of the largest Indigenous groups in Australia. For thousands of years they’ve lived on Noongar boodjar (Noongar country), what is now known as the south-western corner of Western Australia and includes the capital city, Perth. Artefacts likely carried by early Noongar ancestors have been dated as far back as 30,000 years.
The Noongarpedia project was begun in 2014 by a team from the University of Western Australia led by School of Indigenous Studies professor and Noongar elder Leonard Collard, with Curtin University’s John Hartley and the Miles Franklin award-winning novelist Kim Scott. Although not yet formally launched, the site is live, and users can create an account and contribute by writing or editing articles.
Its front page bears a welcome to country, an ancient spiritual greeting practised by many Indigenous Australian nations:
Kaya wanju gnullar NoongarPedia. Gnullar waarnkniy kwop kwop birdiyah wiern, maaman, yorga, koorlinga. Gnullar waarnkiny noonar yoorl koorliny waarnkiny nidja NoongarPedia. / Welcome to our Noongarpedia. We speak in good spirit of our ancestors, spirits, men, women and children. We hope you come and contribute to our Noongarpedia.
It is just one of many subtle but important departures from the larger Wikipedias. For example, Collard says inherent to English Wikipedia is an assumption that “all information is freely available to everybody”. Such a policy would conflict with Noongar knowledge convention and law which places restrictions on who can know what knowledge, so the Noongarpedia community is developing procedures to prevent general access to certain information.
A research associate, Jennifer Buchanan, says Wikipedia convention favours mainstream books, newspapers and scientific journals as credible sources. For the Noongar people, while published works are also important, their elders are considered the greatest authority on knowledge and culture (meanwhile, inaccuracies about their people run rampant in the media and academic circles).
NASA’s official mural for the OSIRIS-REx mission. Image via NASA, public domain/CC0.
Sometime between 7:05pm and 9:05pm EDT tonight, NASA will launch OSIRIS-REx—a spacecraft that will travel to the asteroid 101955 Bennu. If its four-year journey goes as planned, it will be the first spacecraft to bring asteroid samples back to Earth for study.
Wikipedia tells us that this mission, the third in the space agency’s New Frontiers program (behind Juno and New Horizons), will help us gain a greater understanding of the formation and evolution of the Solar System, among other things.
“Astronomers are currently functioning with a model called ‘accretion’ that attempts to explain the formation of the Solar System,” says Wikipedia editor BatteryIncluded, who has worked extensively on the OSIRIS-REx article. “There is very little known, and many hypotheses are advanced. The fact is that the current accretion model cannot explain many phenomena we see today in other solar systems, stars and galaxies, so any in-situ data [like what OSIRIS-REx will obtain] is useful.”
BatteryIncluded and fellow editor Hadron137 have worked on the OSIRIS-REx and Bennu articles. More broadly, they are two of many people who work on Wikipedia’s space-related articles, a topic which has no less than three devoted WikiProjects (focusing on spaceflight, the solar system, and astronomy). Hadron says that many are not experts on the topic, but that isn’t necessarily a negative:
“I don’t think a person needs to … have spacecraft-specific knowledge to contribute; that’s what’s so cool about Wikipedia. It just takes a bit of curiosity. I often read a paragraph in Wikipedia about the spacecraft and it gets me thinking about ‘how does that work,’ or ‘why do they do it like that’? Then I research the answer in credible publications and add the content to Wikipedia.”
Rendering of OSIRIS-REx by NASA/JPL/University of Arizona/Lockheed Martin, public domain/CC0.
This cohort of editors is quick to jump on breaking space stories that are covered in reliable sources; on 25 August, for instance, minutes to hours after scientists announced in Nature that they had found an exoplanet orbiting Proxima Centauri, Wikipedia had a comprehensive article.
But if you do have some knowledge of chemistry, physics, planetary science, or any related field, and you’ve been looking for a good place to jump in and contribute to Wikipedia, your easiest opening may be in the years after these events, when scientists begin to relate what we’ve learned from these missions.
As BatteryIncluded related to us, “In Wikipedia we can do better in highlighting that every space probe is more of a scientific ‘mission’ than a flight, particularly with sample-return missions” like OSIRIS-REx. “I tend to revisit old articles and add the most relevant scientific results, which are overlooked by most other WP editors. … Explaining the mission [in simple terms and in the] context of the objectives is important.”
Ed Erhart, Editorial Associate
Wikimedia Foundation
In T135327, the WMF Technical Collaboration team collected a list of Phabricator bugs and feature requests from the Wikimedia Developer Community. After identifying the most promising requests from the community, these were presented to Phacility (the organization that builds and maintains Phabricator) for sponsored prioritization.
I am very pleased to report that we are already seeing the benefits of this initiative. Several sponsored improvements have landed on https://phabricator.wikimedia.org/ over the past few weeks. For an overview of what's landed recently, read on!
Notice that three of those have task numbers lower than 2000. Those long-standing tasks date from the first months of WMF’s Phabricator evaluation and RFC period. When those tasks were originally filed, Phabricator was just a test install running in WMF Labs. For me, it’s especially satisfying to close so many long-standing issues that have affected many of us for more than a year.
Work in Progress
Several more issues were identified for sponsorship which are still awaiting a complete solution. Some of these are at least partially fixed and some are still pending. You can find out more details by reading the comments on each task linked below.
This very helpful feature displays a graphical representation of a task's Parents and Subtasks.
Initially there was an issue with this feature that made tasks with many relationships unable to load. This was exacerbated by the historical use of "tracking tasks" in the Wikimedia Bugzilla context. Thankfully after a quick patch from @epriestley (the primary author of Phabricator) and lots of help and testing from @Danny_B and @Paladox, @mmodell was able to deploy a fix for the issue a little over 24 hours after it was discovered.
Here's to yet more fruitful collaborations with upstream Phabricator!
When a person thinks about genetics, she’s often interested in how the study of genes applies to her, her family, or her community. From whom did she inherit a particular trait? Did a genetic mutation lead to her grandmother’s medical condition? How does her genetic makeup inform her ancestry? People looking for better context for how genes work can, and often do, find answers on Wikipedia.
At The Allied Genetics Conference in Orlando, Florida, biologists, geneticists, and students spoke to me about their research, and how Wikipedia fits into the classroom. Instructors assign their students to edit Wikipedia because they care about a) the public’s access to reliable, current, comprehensible scientific research and b) students doing real-world assignments that build transferable skills as they research and write.
Boosting the public’s understanding of science
We’re proud to say that students in higher ed classrooms are driving a lot of that new science content. At the end of the first term of our Year of Science, 6% of all new Wikipedia content in the sciences came from Wikipedia writing assignments. That’s an incredible contribution to the public’s awareness of science!
For the majority of the people who go online to broaden their understanding of science topics, that means the information they find is more reliable and more complete. It means less knowledge hidden away behind paywalls, and more information carefully presented to the public.
Building transferable skills
Scientists at the conference told me that they liked how Wikipedia was always evolving. Evaluating different versions of Wikipedia articles can illustrate that science is iterative. Scientists build on the work of those who came before them—just like Wikipedians.
When a student looks to improve a Wikipedia article, or even build a new one, it’s the start of a long project that calls on their critical thinking, reading, and writing skills. They have to weigh the information they see, and whether it holds up to the scrutiny of what they know. To challenge it, they have to draw on the resources of their university. They compile a bibliography of reliable sources, and make contributions in their own words.
Those are skills that help students, be it in their future academic careers or on the job market.
The Year of Science
The Year of Science is in its second half, and we’re still working with new courses in genetics and beyond. If you’re interested in applying your students’ writing and critical thinking skills to a global service learning project, we’d love to hear from you. We provide tools, staff support, and online trainings for you and your students. We even have a guide specifically aimed at students writing Wikipedia articles about genes and proteins. We take care of the Wikipedia side, and let you focus on the material you know best. Get in touch: contact@wikiedu.org.
A car of the Austrian taxi company Bruckbacher with an OpenStreetMap map. Text on the car: “Better safe than sorry.” [1] Picture by Günther Zinsberger, CC0.
About us
After publishing issue 319, we were asked why MAPS.ME was marked as non-free software. MAPS.ME is not free software because it contains the Code2000 font, which is shareware. We mark software as free only if all of its parts are free software as defined by the Debian Free Software Guidelines.
Mapping
Johan Gyllenspetz of Mapillary tweets about their new filter function for traffic signs.
Jochen Topf blogs about a service created by Martin Raifer that shows the history of tags in nice graphs. There is more background in a diary entry from Martin. Matthijs Melissen has already used it to find some interesting statistics.
On the Tagging mailing list, Martin Koppenhoefer initiates a discussion on the tagging of the BND (the German intelligence service) site in Bad Aibling. He points out that landuse=military doesn’t fit and suggests alternatives.
On the Talk-ch mailing list, the Swiss OSM community discusses whether it should map postal code polygons (as boundary relations), similar to what has been done in Germany.
User Nammala from Mapbox gives an update on the progress of reviewing navigation data in Germany in the OSM forum and a diary entry.
Community
This year, for the first time in the history of OpenStreetMap, the OpenStreetMap Awards are being prepared; they will be presented this September at State of the Map 2016 in Brussels. Anyone who is registered at OSM may participate, so don’t forget to vote for your favourites!
On the OSM-Talk mailing list, Matthijs Melissen proposed replacing the wiki page Map Features with a version generated automatically from Taginfo (see the API sketch at the end of this section).
Martin Raifer has visualized anonymized and simplified log files of the OSMF tile server, provided by the OSMF.
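About the Map Features item above: Taginfo already exposes the data such a generated page would need through a public JSON API. The following is a minimal sketch in Python (using the requests library); the endpoint and parameters follow Taginfo’s documented API v4, but check them against the current documentation before building on this.

import requests

def top_values(key, limit=10):
    # Fetch the most-used values for an OSM key from the Taginfo API (v4).
    resp = requests.get(
        "https://taginfo.openstreetmap.org/api/4/key/values",
        params={"key": key, "page": 1, "rp": limit,
                "sortname": "count", "sortorder": "desc"},
        timeout=30,
    )
    resp.raise_for_status()
    return [(v["value"], v["count"]) for v in resp.json()["data"]]

# Example: print the ten most common values of the highway key.
# for value, count in top_values("highway"):
#     print("highway=%s: %d objects" % (value, count))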
Imports
On the Talk-gb mailing list, Greg shares his idea of importing trigonometric points into OSM.
OpenStreetMap Foundation
On the Talk-GB mailing list, Brian Prangle informs the community that the statutes of the OpenStreetMap UK Community Interest Company have been reviewed by solicitors.
Events
Geofabrik invites everyone to be a part of the hack weekend on 29th and 30th October in Karlsruhe.
We are going to have two continental State of the Map conferences: in Asia and Latin America.
The State of the Map Asia will take place in Manila, Philippines, on October 1st and 2nd. The program schedule will be published soon.
São Paulo, Brazil, will host the State of the Map Latam from November 25th to 27th. The call for talks, workshops and working groups is still open: submit yours before September 25th! The event accepts activities in English, Spanish and Portuguese.
SB 79 advertises the Elbe-Labe Meeting, which takes place from October 7th to 9th in Dresden. Register before September 14th to get a T-shirt.
Humanitarian OSM
Submit your scholarship application for SotM Asia.
Microsoft and Mapzen have built an augmented reality app that serves as a navigation system for visually impaired people. It was presented at SotM US as well.
Hurricane Newton struck the Baja California peninsula. A Tasking Manager project has been set up to facilitate mapping by the community; the red zone is the priority area. The whole area is poorly mapped, so the challenge is big. Please help map!
MapSwipe was probably used with success in Myanmar.
Maps
[1] The Austrian taxi company Bruckbacher advertises its regional knowledge and therefore had a huge OSM map painted on one of its cars. (automatic translation)
The Guardian published an interesting quiz: “Can you identify the world cities from their running heat-maps?” Try it.
switch2OSM
The Swedish Civil Contingencies Agency is using OpenStreetMap in a crowdsourcing attempt to learn the real reach of its emergency population warning system, Jan Ainali told us. (automatic translation)
The Overpass API v0.7.53 release comes with a bundle of useful improvements, including changes to the database and the interface.
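For readers who have not tried Overpass yet, here is a minimal sketch of how the API is queried from Python (using the requests library). The query is ordinary Overpass QL and does not depend on any feature new in 0.7.53; the public interpreter endpoint at overpass-api.de is used as an example.

import requests

# Ordinary Overpass QL: all drinking water points within 1 km of
# central Brussels (venue of State of the Map 2016), returned as JSON.
QUERY = """
[out:json][timeout:25];
node["amenity"="drinking_water"](around:1000,50.8467,4.3499);
out;
"""

resp = requests.post("https://overpass-api.de/api/interpreter",
                     data={"data": QUERY}, timeout=60)
resp.raise_for_status()
for element in resp.json()["elements"]:
    print(element["id"], element.get("tags", {}))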
Programming
On the Dev mailing list, Yves asks: “Along with minutely diffs, I wonder if expired tiles lists would be something to be shared”. Andy Allan explained why that is not done.
Due to high load on the OSMF tile server, requests without an HTTP Referer header are treated as low priority (which is too slow for interactive maps). This also happens if you include the tiles over HTTP on an HTTPS page, since browsers then omit the Referer header.
Petr Pridal of Klokan Technologies presents the project TileServer GL, a tile server written in Node.js that uses Mapbox GL Native (C++ and OpenGL).
IEEE Spectrum reports on a new approach from the University of Texas to achieve centimetre-level GPS accuracy with the help of ground stations.
Arthur Brenaut has published a nice city-locating game on GitHub. The game has potential for further development.
Here, which belongs to Audi, BMW and Daimler, is “open for new investors”, says the German IT blog Heise Online. (automatic translation) Golem, another IT news blog, suspects two investors: Microsoft and Amazon. (automatic translation)
Network World shows a map by our competitor visualizing which countries in the world have open source laws.
PlaneMad explains why the prime meridian line marked on the ground at the Royal Observatory in Greenwich is about 100 metres off zero longitude today.
An Icelandic tourist drew a map on an envelope instead of writing an address. The postal service delivered!
Note: If you would like to see your event here, please put it into the calendar. Only data which is in the calendar will appear in weeklyOSM. Please check your event in our public calendar preview and correct it where appropriate.
This weekly was produced by Hakuch, Nakaner, Peda, Polyglot, Rogehm, SrrReal, YoViajo, derFred, jinalfoflia, kreuzschnabel.
“I don’t look for tasks. Usually, the tasks simply come to me,” says Thomas, a volunteer developer for MediaWiki. What does the volunteer work of a developer actually look like? Who is behind the code and the features that many editors use every day? Julia Schuetze sat down with Thomas, aka tpt, to get an insight into the programming work of a volunteer.
An interview by Julia Schuetze with Thomas Pellissier-Tanon aka Tpt
“I work on the software behind Wikipedia!” That’s what Thomas (aka Tpt), a volunteer developer from France, tells his friends when they ask him what he does in his free time. He dedicates up to ten hours per week to free knowledge that way.
In the past two months, I got the chance to talk to some of our volunteer developers about their experience with the Wikimedia movement. I’d like to share Thomas’ story, his views, concerns, ideas and accomplishments with you.
Thomas started in 2009, when he was still in high school. A passion for Egyptian history and pharaohs inspired him to contribute to the French Wikipedia. Back then, programming was new to him. He started by writing templates and by learning how to use the functions around Wikipedia.
Starting is not easy. Wikipedia is a project created, maintained and developed by millions of people; thousands contribute at least once a month. People commit, some stay for longer, some only for a short time. I wondered what made Thomas stick around for over seven years now and become a very innovative volunteer developer in our community.
MediaWiki: “huge, complex and often ugly”
The first few months can be rocky, he says. It was an exploration for him because MediaWiki, the free and open source wiki application that stores the content in databases, “is huge, complex and often ugly.” “It was a lot of reading code to see how it works and how all the pieces are fitting together,” Thomas remembers. “Some of that can act as barriers. Especially for developers who are not familiar with Wikis,” he explains. “It was quite difficult to write code matching MediaWiki standards and conventions and with a good enough level of quality at first.”
Thomas (Tpt) at the Wikimedia Hackathon in Jerusalem. Photo by Deror_avi (own work), CC BY-SA 4.0.
Improvements have been made in the past years to make the start for volunteer developers as smooth as possible. The WMDE engineering page for volunteer developers aims to provide relevant links and explains the processes and tools the developers work with on Wikimedia projects. For MediaWiki, the developer hub aims to give new developers an overview, and the article “How to become a MediaWiki hacker” tries to give advice to beginners. “The documentation of MediaWiki is good enough to get into it,” Thomas says, “because it has nice little schemas.” At this level it is good, but Thomas raises an important point: documentation is not always as good for the extensions. That can be problematic if an extension is unmaintained and someone wants to pick it up again.
What made him stick around for so long?
An overarching theme during our chat was “need”: the need for this tool or that extension to be developed is what made him stick around. When he started, “it was very painful because the Wiki source code was breaking because the extension wasn’t maintained. And the system at this time for deployment was kind of bad. So I have written unit tests in 2013. Unit tests are a kind of automated tests that are written in order to ensure that the software still behaves correctly before the deployments.” This shows how the projects have potential, but it is the people who make the Wikimedia projects what they are today, by developing useful tools and features and by thinking about what the projects could look like in the future.
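To make the idea concrete: a unit test is a small, automated check that fails loudly when behaviour changes. The sketch below uses Python’s unittest module purely as an illustration; MediaWiki’s real test suites are written in PHP, and the function tested here is a made-up example, not actual MediaWiki code.

import unittest

def normalize_title(title):
    # Toy function under test: trim whitespace, capitalize the first letter.
    title = title.strip()
    return title[:1].upper() + title[1:]

class NormalizeTitleTest(unittest.TestCase):
    def test_strips_whitespace(self):
        self.assertEqual(normalize_title("  main page "), "Main page")

    def test_handles_empty_string(self):
        self.assertEqual(normalize_title(""), "")

if __name__ == "__main__":
    unittest.main()

Run automatically before every deployment, a suite of such tests catches regressions before they reach the wikis.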
When Wikidata started in 2012, it was another milestone for Thomas. He was keen to follow the development work and proposed some changes. Questions about how it would be structured, and discussions with Denny Vrandecic about whether links to external sources should be included, got him really involved in the project early on. AskPlatypus, a Wikidata-based search engine that can translate natural language into questions Wikidata can answer, would later become his biggest Wikidata project. It is a good example of how volunteers can and should shape the direction of a Wikimedia project.
Another way Thomas got involved was via the technical wishlist. During the last Wikimedia Hackathon in Jerusalem, he worked on the VisualEditor support for Wikisource, one of the items on the wishlist. That, too, can be a good way to get started and to find one’s first tasks.
About the impact beyond Wikimedia projects
Thomas also notes how impactful his work can be for other open source projects: Wikimedia and MediaWiki work can expand into other projects, whether connected or not. One of the tools Thomas mentioned is WSExport, a tool that exports Wikisource content into EPUB, PDF and similar formats.
This initiative emerged because he and the community found it very painful to read Wikisource content outside of Wikisource. “I think Wikisource was losing a lot of contributors because it was quite complicated compared to Project Gutenberg that has quite a lot of tools to read books in a lot of different formats. And for us it was completely different. It’s not very useful, but it’s quite nice and fancy and very interesting to develop.”
Another reason he is motivated to develop something is that he and others would like the tool, because it would be so useful. So he says: “Hey, I should do it,” or “Hey, it could be amazing to have such a thing.” The community appreciates him for that, and he receives a lot of direct requests. “I don’t have to find tasks. Usually, it is tasks which are coming to me. I just receive an email for each request.” For developer tasks, volunteers use Phabricator and GitHub; some tasks are specially marked with ‘volunteers needed’. Talk pages, too, play an important role. Thomas is known in the community, so there are a lot of people who ask him “You should fix it” or “You should implement this”. “Usually, there are a lot of requests; that is a very good side effect of the talk page,” he says.
By looking at the many ways Thomas has contributed, it becomes clear how diverse the impact of Volunteer Developers on Wikimedia projects can be. Thomas’ hope is to find other Volunteer Developers to work with in the future. “If there were more, some of the extensions could be maintained better.”
Some final advice for us? “If the contribution process is easy enough to make people realise that contributing to MediaWikis is not so difficult and that with a small contribution you can get huge improvement in the Wikis, let’s say contribution of workflows (…) then usually people come up with smart things.”
And wishes for the future?
“In general, for the Wikimedia Movement to have more interest in Wikisource and keeping up the good work for Wikidata.”
At the beginning, I asked him which three words he associates with Wikimedia: knowledge, sharing and community. And his role in the community, I wondered?
“Maintenance,” he says, would be the best word to describe his volunteering position.
I believe “caretaker” is a good one for Thomas, aka Tpt, too.
Thank you, Thomas, for taking the time to share your experience.
It is all over the news: another psychology study debunked. With two-thirds of the repeated studies failing replication, a lot of the psychology literature is no longer valid. The source for the article I read is Eric-Jan Wagenmakers, professor at the University of Amsterdam.
The NWO, the Netherlands Organisation for Scientific Research, is providing three million euros in funding to repeat key research. The problem is that science is in love with what is new and with quick results. Three million is at best a start.
When science cannot be relied on, collaboration with scientists and universities easily becomes controversial. The programs taught are inherently a point of view, and a conflict of interest is easily established. Consider: when doctors prescribe substances that are FDA-approved, it seems obvious that these substances have a positive effect on patients. Then consider that we have a Wikipedian in Residence at Cochrane, an organization that has made its reputation by debunking much of the use of such substances. We provide end-user information, and it seems obvious that just repeating the list of FDA-approved substances without further information is not at all in our users’ best interest. It is even likely that we would be liable for misinformation in several jurisdictions.
There is a need to be sceptical about sources. It is important that we not only improve the technology behind our sources; we also need an ability to mark information as debunked and have that information filter through our projects and into the information we provide. Remember, “debunked” is not a POV; it comes with sources of its own. Thanks, GerardM
Many years ago, I was given a school assignment. Each member of our class was given a different artist’s name and tasked with researching that artist at the local library over the weekend. Some got famous individuals such as Leonardo da Vinci or Van Gogh. Unluckily, I got an obscure early 20th-century cubist painter.
After several hours looking through the library’s very limited stock of books on the subject, I had found two sentences of dull information and a small, grainy black and white photograph of one of his works. I was put off art history pretty much for life, but I remember thinking at the time that there had to be a better approach to education.
Some years later, the internet arrived.
I became excited at the realisation that we were entering a new phase of human history, one in which every person—and in particular, every young adult—could have access to first-class, well-written, factual material.
I was fascinated by the idea that, given a similar assignment today, a student should be able to find exciting articles that would set them alight with a love of the subject… if, of course, we combined our collective skills to write them. One of the topics I chose to write about on Wikipedia is our castles.
The United Kingdom, my home country, is permeated with castles. Almost everyone lives near them; we drive past them on our way to work; we’ve named streets and neighbourhoods after them; our head of state lives in one of the oldest in the land. But normally most of us barely look at them, let alone think about them. Their features blend into the landscape, often covered by the grey haze of our inclement weather.
Norman castles formed part of a terrifying new form of warfare, driving the native nobility deep into our Celtic valleys and hills. For the later English kings, they were the “bones of the kingdom”, combining rugged force with royal symbolism. Styles of architecture came and went, some sites being abandoned, others rebuilt time and time again. Generations of women, men and children lived, loved and toiled within their chambers and walls. If you know how to read them, castles are our crystallised history.
We can retell those stories through good Wikipedia articles.
Stones, bricks and earth become ways of reaching out to invaders, traitors, heroes, gamblers, reclusive old ladies, lovers… I remember discovering one day that a small stone tower in the countryside, built by a nouveau-riche wool merchant, was intended to resemble the massive walls of Edward I’s Caernarvon, which was itself designed to imitate the imagined glories of the Byzantine emperors. A Wikipedia article can be the gateway to new horizons and new ways of seeing.
I find the challenge stimulating: the task is both analytic and artistic, in that as an editor you are constantly choosing which facts to select and how best to present the narrative and characters. It is a bit like completing a cryptic crossword, mixed with writing a bit of historical fiction. Producing good castle articles isn’t necessarily easy though: it takes many hours of research, attention to detail and often a willingness to “go the extra mile” by acquiring access to specialist books and journals.
Of course, it’s made fun by the presence of enthusiastic fellow editors, and nothing can beat finding out that someone has used an article you’ve written when they’ve visited a site, or as part of their own school project.
“Why I …” is an ongoing series on the Wikimedia Blog. We want to hear what motivates you to contribute to Wikimedia sites: send us an email at blogteam[at]wikimedia[dot]org if you know of someone who wants to share their story about what gets them to write articles, take photographs, proofread transcriptions, and beyond.