en.planet.wikimedia

February 21, 2017

Jeroen De Dauw

Simple is not easy

Simplicity is possibly the single most important thing on the technical side of software development. It is crucial to keep development costs down and external quality high. This blog post is about why simplicity is not the same thing as easiness, and common misconceptions around these terms.

Simple is not easy

Simple is the opposite of complex. Both are a measure of complexity, which arises from intertwining things such as concepts and responsibilities. Complexity is objective, and certain aspects of it, such as Cyclomatic Complexity, can be measured with many code quality tools.
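To make the measurement concrete, here is a hedged sketch (the function names and the fee rules are invented for this example): two Python functions with identical behaviour, where the first intertwines its conditions and the second flattens them, lowering the cyclomatic complexity that a code quality tool would report.

```python
# Hypothetical example: computing a shipping fee.
# Both functions implement the same decision table.

def fee_complex(weight, express, member):
    # Every nested branch adds an independent path,
    # so the cyclomatic complexity is high.
    if express:
        if member:
            if weight > 10:
                return 15
            else:
                return 10
        else:
            if weight > 10:
                return 20
            else:
                return 15
    else:
        if weight > 10:
            return 8
        return 5

def fee_simple(weight, express, member):
    # Same rules, untangled: a base fee plus a heavy-item surcharge.
    base = 5 if not express else (10 if member else 15)
    surcharge = (weight > 10) * (3 if not express else 5)
    return base + surcharge
```

Complexity tools count decision points: each `if` in `fee_complex` adds another path that must be understood and tested, while `fee_simple` expresses the same table with far fewer branches.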

Easy is the opposite of hard. Both are a measure of effort, which unlike complexity, is subjective and highly dependent on the context. For instance, it can be quite hard to rename a method in a large codebase if you do not have a tool that allows doing so safely. Similarly, it can be quite hard to understand an OO project if you are not familiar with OO.

Achieving simplicity is hard

I’m sorry I wrote you such a long letter; I didn’t have time to write a short one.

Blaise Pascal

Finding simple solutions, or brief ways to express something clearly, is harder than finding something that works but is more complex. In other words, achieving simplicity is hard. This is unfortunate, since dealing with complexity is so costly.

Since in recent decades the cost of software maintenance has become much greater than the cost of its creation, it makes sense to make maintenance as easy as we can. This means avoiding as much complexity as we can during the creation of the software, which is a hard task. The cost of the complexity does not suddenly appear once the software goes into an official maintenance phase, it is there on day 2, when you need to deal with code from day 1.

Good design requires thought

Some people in the field conflate simple and easy in a particularly unfortunate manner. They reason that if you need to think a lot about how to create a design, it will be hard to understand the design. Clearly, thinking a lot about a design does not guarantee that it is good and minimizes complexity. You can do a good job and create something simple or you can overengineer. The one safe conclusion you can make based on the effort spent is that for most non-trivial problems, if little effort was spent (by going for the easy approach), the solution is going to be more complex than it could have been.

One high-profile case of such conflation can be found in the principles behind the Agile Manifesto. While I don’t fully agree with some of the other principles, this is the only one I strongly disagree with (unless you remove the middle part). Yay Software Craftsmanship manifesto.

Simplicity–the art of maximizing the amount of work not done–is essential

Principles behind the Agile Manifesto

Similarly, we should be careful not to confuse the ease of understanding a system with the ease of understanding how or why it was created the way it was. The latter, though easier than the actual task of creating a simple solution, is still going to be harder than working with that simple solution, especially for those who lack the skills used in its creation.

Again, I found a relatively high-profile example of such confusion:

If the implementation is hard to explain, it’s a bad idea. If the implementation is easy to explain, it may be a good idea.

The Zen of Python

I think this is just wrong.

You can throw all books in a library onto a big pile and then claim it’s easy to explain where a particular book is – in the pile – though actually finding the book is a bigger challenge. It’s true that you need more skills to use a well-organized library effectively than you need to go through a pile of books randomly. You need to know the alphabet, be familiar with the concept of genres, etc. Clearly an organized library is easier to deal with than our pile of books for anyone that has those skills.

It is also true that sometimes it does not make sense to invest in the skill that allows working more effectively, and that sometimes you simply cannot find people with the desired skills. This is where the real bottleneck is: learning. Most of the time these investments are worth it, as they allow you to work both faster and better from that point on.

See also

In my reply to the Big Ball of Mud paper I also talk about how achieving simplicity requires effort.

The main source of inspiration that led me to this blog post is Rich Hickey's 2012 Rails Conf keynote, where he starts by differentiating simple and easy. If you don't know who Rich Hickey is (he created Clojure), go watch all his talks on YouTube now; they are well worth the time. (I don't agree with everything he says, but it tends to be interesting regardless.) You can start with this keynote, which goes into more detail than this blog post and adds a bunch of extra goodies on top. <3 Rich

Following the reasoning in this blog post, you cannot trade software quality for lower cost. You can read more about this in the Tradable Quality Hypothesis and Design Stamina Hypothesis articles.

There is another blog post titled Simple is not easy, which as far as I can tell, differentiates the terms without regard to software development.

by Jeroen at February 21, 2017 07:43 PM

Wikimedia UK

How do Wikipedia editors decide what are reliable sources?

Public domain image by user:Julo

Over the past few weeks, Wikimedia UK has received a large number of press inquiries related to the Guardian’s article ‘Wikipedia bans Daily Mail as ‘unreliable’ source’. Now that the dust has settled on this story a little, we thought it might be helpful to clarify how the community of editors who create Wikipedia and its sister projects came to adopt a policy to generally avoid using references to Daily Mail articles.

Much of the coverage of this editorial decision, both by The Guardian and by other media, referred to Wikipedia at least as often as to Wikipedia editors, although The Guardian did add that ‘The move is likely to stop short of prohibiting linking to the Daily Mail’, because, as many Wikimedians will be fully aware, one of the Five Pillars of Wikipedia is that ‘Wikipedia has no firm rules’.

‘Wikipedia bans the Daily Mail’ is pretty much the headline which every media outlet went with for this story.

We at Wikimedia UK recognise that there is often confusion between the UK charity and the Wikimedia Foundation based in the United States, as well as about the relationship between the Wikimedia movement, chapters like Wikimedia UK, and the open knowledge websites owned by the Foundation, including Wikipedia. Often, people do not even realise that Wikileaks, Julian Assange’s website (which is not in fact based on a wiki structure), has nothing to do with Wikimedia.

Unfortunately, talking about ‘Wikipedia does X’ tends to give the public the impression that Wikipedia is a unitary body run by a company, and this is what will stick in people’s minds, even if the article itself includes a more complex analysis. In a world that is commercialised and run for profit, the very concept of a decentralised, open source encyclopaedia whose infrastructure is administered by a non-profit charity can seem difficult to understand for many.

While it is true to say that it’s rare for publications to be singled out as unreliable, it has generally been the case that established policy guidelines such as those on Identifying Reliable Sources (also known as WP:RS) have served the purpose of ensuring that poor references do not creep in. For example, ‘self-published media [like blogs, forum posts] are largely not acceptable.’

The wider point here is worth noting: Wikipedia editors have long deprecated the use of tabloid articles as references, as set out in the guidelines on potentially unreliable sources (WP:PUS).

The point about notability is particularly important here. Lots of people want to start articles about things that aren’t ‘Wikipedia worthy’, like celebrities, their mum, cat, friend or the time in school where they pulled a prank that was totally rad back in the 90s. Wikipedia is an encyclopaedia. It’s not a place to collect celebrity facts, and there needs to be some filtering out of things that aren’t widely important or influential.

It’s also worth noting that the reliable sources (WP:RS) guidelines state that,

‘Wikipedia articles (and Wikipedia mirrors) are not reliable sources for any purpose (except as sources on themselves per WP:SELFSOURCE). Because Wikipedia forbids original research, there is nothing reliable in it that is not citable with something else.’

When it comes to sources for medical articles, the rules are even more stringent (WP:MEDRS). Wikipedia editors want to protect and ensure the site’s reputation for reliability, and to do that, the standards for referencing need to be very high.

Wikipedia and the other Wikimedia projects are complex creations which have accumulated as the result of millions of hours of human work. As such, they can be quite opaque to most users of Wikipedia, who only engage with the site’s surface by reading the articles. These articles comprise only around 30% of the total number of pages on Wikipedia; the other 70% is made up of the discussion pages, policy documents and guidelines intended to help editors decide what should go into the articles themselves.

It is quite hard to understand how these parts all work together unless you get involved in editing yourself, and particularly in the discussion pages behind the articles. The ethos of the site is based on consensus and discussion with the aim of taking a neutral stance on contentious issues.

Despite this, the content of the site is influenced by the interests of those who edit it. The majority of the editors are white, European or North American and male. This means that the content reflects their interests more than those of, for example, black or Latino women.

If people who read the Daily Mail believe that Wikipedia editors are biased, they are more than welcome to get involved in editing Wikimedia projects. As long as they follow the guidelines on Neutral Point of View (NPoV), good referencing and assuming the good intentions of other editors, they are free to argue that the Daily Mail is in fact an accurate source for referencing factual information.

That’s how Wikipedia works, and we would be more than happy to have the 1.49 million daily readers of the Mail involved in improving the volume and accuracy of content on Wikipedia.

by John Lubbock at February 21, 2017 04:19 PM

Sam Wilson

Editing MediaWiki pages in an external editor

Fremantle.

I've been working on a MediaWiki gadget lately, for editing Wikisource authors' metadata without leaving the author page. It's fun working with and learning more about OOjs-UI, but it's also a pain because gadget code is kept in JavaScript pages in the MediaWiki namespace, and so every single time you want to change something it's a matter of saving the whole page, then clicking 'edit' again, and scrolling back down to find the spot you were at. The other end of things—the re-loading of whatever test page is running the gadget—is annoying and slow enough, without having to do much the same thing at the source end too.

So I've added a feature to the ExternalArticles extension that allows a whole directory full of text files to be imported at once (namespaces are handled as subdirectories). More importantly, it also 'watches' the directories and every time a file is updated (i.e. with Ctrl-S in a text editor or IDE) it is re-imported. So this means I can have MediaWiki:Gadget-Author.js and MediaWiki:Gadget-Author.css open in PhpStorm, and just edit from there. I even have these files open inside a MediaWiki project and so autocompletion and documentation look-up works as usual for all the library code. It's even quite a speedy set-up, luckily: I haven't yet noticed having to wait at any time between saving some code, alt-tabbing to the browser, and hitting F5.

I dare say my bodged-together script has many flaws, but it's working for me for now!
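The core idea, though — map first-level subdirectories to namespaces, and re-import a file whenever its modification time changes — can be sketched with nothing but the standard library. This is a hedged, stdlib-only illustration, not the actual ExternalArticles code: the `import_page` callback, the polling approach, and the title mapping are all assumptions made for the example.

```python
import os
import time

def path_to_title(root, path):
    # Map a file path to a wiki page title: a first-level
    # subdirectory becomes the namespace, so
    # MediaWiki/Gadget-Author.js -> "MediaWiki:Gadget-Author.js".
    parts = os.path.relpath(path, root).split(os.sep)
    if len(parts) > 1:
        return parts[0] + ":" + "/".join(parts[1:])
    return parts[0]

def scan(root):
    # Return {page title: (path, mtime)} for every file under root.
    pages = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            pages[path_to_title(root, full)] = (full, os.path.getmtime(full))
    return pages

def watch(root, import_page, poll_seconds=1.0, max_polls=None):
    # Poll the tree and call import_page(title, text) for any file
    # that is new or changed since the previous poll. import_page
    # is a placeholder for whatever actually pushes the text into
    # the wiki; max_polls=None polls forever.
    known = {}
    polls = 0
    while max_polls is None or polls < max_polls:
        current = scan(root)
        for title, (path, mtime) in current.items():
            if known.get(title, (None, None))[1] != mtime:
                with open(path) as f:
                    import_page(title, f.read())
        known = current
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(poll_seconds)
```

A real setup would want debouncing and error handling, and an OS-level file-watching API would beat polling, but even this shape gives the save-in-IDE, refresh-in-browser loop described above.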


by Sam at February 21, 2017 12:00 AM

February 19, 2017

Gerard Meijssen

#Wikidata - The "first" president of Haiti

When people express a strong interest in a subject, and there is a chance that the subject is finally getting the attention it deserves, it is a good moment to assist, particularly when it is just a matter of concentrating on what you do anyway.

So Haiti presently has my attention. I have added the known members of the Chamber of Deputies, all six of them, and I have added the succession for most of the Presidents of Haiti. The problem here is that I do not know enough to make sense of the early rulers, and I will add the known members of the Senate.

When I am done with this, I hope to get a list of the present members of the Chamber of Deputies. It is easy enough to include them in Wikidata and this may be followed up by generating lists for use in any Wikipedia that will take it.

Lists like this are wonderful because they provide early structure. When someone adds an article, it is already linked in many places in that Wikipedia, and this makes for meaningful early linking in a project. Lists of award winners, lists of politicians for a party or an office: it is all possible when you think in potential, particularly when the objective is to share in the sum of all knowledge.
Thanks,
      GerardM


by Gerard Meijssen (noreply@blogger.com) at February 19, 2017 08:57 AM

February 18, 2017

Shyamal

Research techniques - Wikipedian ways

Over the years, I have been using Wikipedia as a kind of public research notebook. I sometimes fail to keep careful notes and I regret it. For instance, some years ago I was reading through some scanned materials in an archive and came across a record of the Great Indian Hornbill in the Kolli Hills in Tamil Nadu. It was carefully noted by a British medical officer who was visiting the place; he commented on the presence of the species in the region as part of a report he submitted on the sanitary and medical conditions of the district. Google searches did not see or index the document, and I thought I would find the content when I wanted it, but I have never since managed to find it again. Imagine how useful it would have been to me and others if I had put a reference to it in the Wikipedia article on the hornbill species, with a comment on its past distribution.
https://en.wikipedia.org/wiki/Samuel_B._Fairbank

Not long ago, someone on the email list Taxacom-L sought information on Samuel B. Fairbank, a collector of specimens in India. I knew the name, as he was one of the collaborators of Allan Octavian Hume (who even named a species after him), and decided that I knew enough to respond to the request for information. I looked around on the Internet and found that there was enough material scattered around to put together a decent biography (I even found a portrait photo whose copyright had thankfully expired), and it led to a Wikipedia entry that should spare anyone else looking for it the effort that I put in. Of course one follows the normal Wikipedia/research requirements of adding citations to the original sources, so that anyone interested in more information or in verifying the sources can double-check them.

These additions to Wikipedia may strike you as not very different from what an ant does when it (actually usually she) goes out foraging: when she finds food, she eats a bit and then returns to the nest, leaving behind a trail marker on the ground that says "this way for food". Other ants walking by spot the message written on the ground and, if interested, go on and help harvest the food resource. The ants that find the food again add to the trail marker; the strength of the trail chemical indicates veracity and possibly the amount of food available. This kind of one-to-many communication between individuals, mediated via environmental cues, has a term: stigmergy. The ant colony has been termed a "superorganism", a kind of distributed animal, with eyes, legs and even a brain spread across little seemingly independent entities. There is now a lot of research on how superorganisms work; it is an area of considerable interest in computer science because the system is extremely resilient to damage: a colony goes on as if nothing has happened if you crush a whole bunch of ants underfoot. How far this metaphor helps in understanding the organic growth of Wikipedia is uncertain, but it certainly seems a useful way of conveying how contributors work.

I sometimes run workshops to recruit new people to contribute to Wikipedia and my usual spiel does not include any talk on "how to edit" Wikipedia but deals with why contribute and about how to incorporate Wikipedia into one's normal day-to-day activities. I sometimes take pictures from walks, record bird calls and research topics for my own learning. I compare what I learn with what Wikipedia has to say and where it fails, I try and fix defects. This does not actually come in the way of my learning process or work much but I like to think that it helps others who may come looking for the same kinds of things.

Incorporating Wikipedia into normal learning practice - should only need a small incremental effort.

The real problem in some parts of the world, such as in India, is that not everyone has access to good enough routes to learning - experts are often inaccessible and libraries are often poorly stocked even if they happen to be available. Of course there are privileged contributors who do have access to better information sources than others but these are the people that often look at Wikipedia and complain about its shortcomings - it seems likely therefore that the under-privileged might be better at contributing. In recent times, Russian underground sites like sci-hub have altered the ecosystem in a kind of revolution but there are also legal channels like the Wikipedia Reference Exchange that really go a long way to aiding research.

Of course there is an endless array of ways in which one could contribute. If you are proficient in two languages, you can translate from one to another; there is the gap finder, which lets you find which entries exist in one language and are missing in another - http://recommend.wmflabs.org/ . If you are interested in challenging your research abilities and want to see how good you are at telling reliable resources from websites with "alternative facts and news", then you should try finding references for dubious or uncited content at https://tools.wmflabs.org/citationhunt/en .

One of the real problems with Indian editors on Wikipedia is that a large number of them support their additions with newspaper and media mentions and many of them do not know what reliable sources mean. Information literacy is key and having more scholarly information resources is important. I have therefore tried to compile a list of digital libraries and resources (especially those with India related content).

Here they are in no particular order:
Although all of these are accessible, you may need little tricks like finding the right keywords to search, or using the right Google operators in some cases; for some people, finding references for obscure things is fun. And some of us, like me, will be happy to help others in their research. With this idea, I created a Facebook group where you can seek references or content hidden behind a paywall. This assistance is provided in the hope that you will summarize your research findings on Wikipedia and make life easier for the ants that walk by in the future.

by Shyamal L. (noreply@blogger.com) at February 18, 2017 04:30 PM

User:Bluerasberry

A history of the Wikimedia Foundation Support and Safety team

I started following Wikipedia community organization in 2012. At that time a trend was beginning of talking about addressing online harassment, especially harassment of women by men, but each instance of the problem was imagined as an independent anomaly rather than as a systemic problem. By that I mean that big online communities like Facebook, Twitter, Reddit, Wikipedia, and all the others did not organize public discussions about the systemic causes of online harassment as a consequence of the nature of online social interaction. Now, five years later, things have changed, and it is routine for everyone managing online communities to have some shared background context, mutual understanding, and objective distance for discussing the problem. One of the stumbling blocks that had to be overcome was the recognition that harassment plays out differently in online sociology than in offline sociology. This is true for many practical matters of online social interaction, but the digital age has not yet matured, and we are still having to process many broad categories of social interaction and talk through how they differ when done online versus offline. Another of the barriers was mutual acknowledgement that all online communities have this problem. Before 2012, and perhaps not before 2014-15, it was not common knowledge that all online communities had this problem, so the public-facing corporate representatives of every individual online community took action to suppress the fact that their community experienced harassment problems. The logic was that if any online community stepped up and announced that it had a major harassment problem unlike anything in offline socialization, then the managers of that community would have been stigmatized for moral failings and organizational fault, and pressured to attempt impossible tasks to fix the issue.
When everyone had done enough posturing to acknowledge the problem, it became easier for any community to state its own situation and advance the necessary discussion and collaboration. If there was a public imagining of a systemic problem before 2012, it was of low-level general incivility that might target any group which regularly faces conflicts - LGBT+ people, individuals editing contentious political or religious topics - with women included but not seen as particularly in need of special attention.

In April 2014 someone started the Wikipedia article “Gender bias on Wikipedia“, so that marks a point in time when there was enough published journalism on the topic to fill out a Wikipedia article. The existence of the article means that conversation within Wikipedia had been ongoing long enough for journalists and researchers to respond to it, and then finally for someone to notice that enough had been published to summarize it in a Wikipedia article.

In February 2015 the Wikimedia Foundation began what was to become a recurring grants program called “Inspire“, which piloted with a call for project proposals to increase engagement among women. Regardless of whatever else that program accomplished with actual grantmaking, that instance of the Inspire program led the Wikimedia community to discuss the extent to which outreach to women was necessary. Before the program, many people doubted it. After the program, with significant credit to Dorothy, word got around that harassment of women was a serious enough problem to merit an allocation of staffing and grants. I wish to emphasize that at the time, in February 2015, repeatedly and in every way, the prevailing thought and default assumption in any decision-making process was that the Wikimedia Foundation would leave the management of harassment to community volunteers and not have its own staff be involved in policy development, dispute resolution, best practice, or anything similar. Also, people who spoke for the Wikimedia Foundation, publicly or in rumors, had no intention to direct any significant amount of funding into this space. There were significant but quieter gestures before the Inspire campaign, but the Inspire campaign was a public and controversial demonstration that the WMF did care about this issue. Still, despite being a bold push, the campaign was also cautious and reserved, with limited staff time to make it effective enough to meet community demand and limited funding to make the change. There was no fault with the campaign - the model started at an appropriate size and scope for a first iteration that needed to grow with time. The Inspire campaign was just entering an issue much bigger than it was designed to manage, and the intent of the campaign never was to address harassment anyway. That iteration was just supposed to promote participation by women by encouraging positive contributions, rather than imagining how to reduce negative activities.

I had been talking with Dorothy about harassment perhaps since she moved to New York in August 2013. My concern was about LGBT+ harassment, and Dorothy knew more about harassment of women. I came to believe that since there are lots of women, and since they get the most harassment, the best way to sort out the harassment problem for LGBT+ people and for any other minority group experiencing harassment would be to join the effort to solve the problem for women. A solution for harassment against women should also work for LGBT+ harassment or any other kind of harassment. I wrote in a March 2015 post, “Logistics behind promoting and planning a long-term solution to online harassment with partners”, about some of the challenges we were considering. The most significant takeaway that I had from early exploration of the topic was that in 2015, the philosophy at the Wikimedia Foundation was that addressing harassment with staff attention was not likely to happen at any time in the foreseeable future.

“Harassment” is something that should be defined, and it is challenging to summarize the topic, but I mean the perception of being the target of hostility which deters a person from positive participation in the community. If harassment is rated on a scale of 1-10, with 1 being mild and 10 being severe, then I feel that the Wikimedia community addresses harassment at levels 1-3 very well with volunteer infrastructure. The Wikimedia Foundation has always been available to address severe harassment at the 9-10 level, like death or legal threats. The challenge is always that mid-tier, level 4-8 harassment. Problems addressing it include cost, a high failure rate in attempts to make the situation better, high liability in even getting involved, and the high commitment required to serve all situations and languages globally if any cases are taken in any single situation in one region. I understand why the Wikimedia Foundation, along with all other similar organizations, found the idea of getting involved unthinkable; all options anyone had presented would be costly, ineffective, and unsatisfying. However, around 2015 the way that Dorothy and I presented the issue in the IdeaLab was not as a problem which needed a solution, but instead as a problem which needed a commitment of long-term financial investment to address.

I think that no one has an easily comprehensible way to address harassment, but there are some viable ideas in circulation that might, for example, target 1% of Wikipedia’s harassment cases and resolve them with 50% efficacy. When I hear someone’s idea to solve the entire Wikipedia harassment problem, I would usually characterize it as a “1% of the problem, 50% of the time” solution, with all of these ideas being costly. Still, in the longer term Wikipedia has to do something, and piloting some of these ideas and researching their efficacy could be a path to a solution. Something that Dorothy and I were allowed to say, but that Wikimedia Foundation staff were not allowed to say because of conflict of interest, was that we wanted the Wikimedia Foundation to create a documentation repository of complaints. Things developed, and in December 2016 the WMF published some draft advice for Wikimedia volunteer community groups about documenting evidence of harassment as a way to support victims of harassment. I think the advice that someone should do documentation is sound, but I do not agree that crowdsourced volunteers can manage documentation in a way that does not compromise the safety of all involved and risk blame, accusations, and making everything worse. In an office, if there is a harassment complaint, the human resources department does not crowdsource a solution among the employees; instead they document the issue professionally and keep everyone’s trust by limiting access to the information they hold. Meaningful documentation is quite beyond volunteer ability.

Dorothy and I had the idea that the WMF would operate a complaint box. Anyone could report any complaint into that box, and it would be documented safely with a time stamp. Access to any given complaint could be granted by the person who made it, but also, highly vetted data researchers with no interest in any particular case could have access to the complaints for the purpose of discovering fundamental unknown information about harassment. Basic questions which we cannot answer include how often people complain, why they complain, whether they ever find a resolution, how serious the issue is, and whether multiple people all have complaints about any single individual. We started developing the proposal for the Centralised harassment reporting and referral service in October 2014. This grant proposal - which never requested funds for any person or organization in particular, and was always just a call for funding to be allocated to any reputable organization for crisis management - was turned down in April 2015 after review in that first Inspire campaign for women’s issues. The mood at the time, and before it, was that there would never be WMF staff appointed to manage community social or personal crises, and with the denial of the grant, there would be no funding for external support either. I understood the denial, because our grant request was totally beyond the rules of the grant request process. That Inspire grant series was more about funding small projects with small amounts of money, because again, at the time - April 2015 - the Wikimedia Foundation was not ready to say that harassment was among the issues it was responsible for allocating funding to address. I am still glad that Dorothy and I applied. A consequence of our applying was that we raised the point in discussion that this was an issue which merited hundreds of thousands of dollars in funding, soon and in an organized way.
Even if that money did not bring a solution, at least it limited liability. Having a harassment crisis while not investing in a solution is much different from having a harassment crisis when the problem is funded with staff, collaborations, research, and pilot projects. The point of the request was to establish an expectation that funding would go to crisis management, and to that end it succeeded. We were also forcing the conversation, compelling the WMF to consider: if there were a steady stream of hundreds of thousands of dollars going to this issue, would that money go to an external organization, or would the WMF like to keep that money internally and use it to build out its empire? Framing the discussion as a question of who will be funded to address the problem made the conversation different from asking the WMF to hire someone internally to address it. As long as the conversation was about asking the WMF to take responsibility, the response was that the matter was out of scope. When the conversation was about which external university or nonprofit organization was going to get a community mandate to make a territorial claim on a high-profile position in the Wikimedia community, with perpetual approval for disbursement of large amounts of funds managed by the WMF, the conversation started trending toward the WMF feeling more able to provide some services by hiring staff in-house.

Dorothy and I contacted any likely organization we could find through search or referral to be the one to partner with the WMF to address harassment. We had wondered: if there were a large project to address harassment, funded on the order of 300k, then who would manage the project? If a suitable organization existed with expertise in addressing harassment in online communities, then they should get the funds. We contacted various organizations. Some of them asked that we not publicly talk about them and Wikipedia, and with other organizations we thought better of asking them about a public conversation. I would not want anyone in the future to misunderstand this: in that time – 2014-2015 – Wikipedia still had a poor reputation. There is a trend of Wikipedia’s reputation improving, but still Wikipedia is mostly reviled, and in the time that we were asking we were hardly taken seriously. Organizations which I felt were minor communicated with us with a perception that Wikipedia was a fad almost at its end and that there would be no merit in supporting its community base, even if paid with funding. I do not want to name names, but just imagine that Dorothy and I sent emails to about 10 likely big-enough organizations. Lots of them took us seriously for what we said, especially when we were offering them hundreds of thousands of dollars and an ongoing stream of funding. We had phone and video chats with many of them, some of them several times, and by their seeming lack of awareness of what we imagined to be the fundamental background of the issue, we came to believe that there would be no ideal organizational partner with a background in harassment management. For context, the professional backgrounds which we sought were in de-escalation, crisis hotlines, diversity training, and online community management. My assertion is that in 2014-2015 major organizations operating in these fields had not yet seen value in applying their expertise to online communities.
Even organizations which had been recently founded specifically to address online harassment were far from our expectations. Either they wished to force the application of offline techniques in irrelevant ways to online communities, or they simply had what Wikipedians would call no meaningful experience with online communities, or they were just corporate-minded and wanted to be highly funded perpetually to work mostly in secret without community consultation while periodically releasing products. I became satisfied that it would be easier to teach Wikipedia community leaders harassment management than it would be to teach harassment management professionals what they needed to know about online communities.

Following our grant request, I felt that Dorothy and I had succeeded, as the WMF went contrary to everything which we had been told many times by many people before. The position of the WMF switched, and in fact, they did start investing in long-term harassment management, even with the understanding that in the short term this expensive investment would provide no relief.

In April 2014 Patrick accepted an appointment to the WMF’s community advocacy division. Patrick is a colleague and friend of both Dorothy and me, and we were glad to see him in the role. At the time there were no public plans to put more people into community advocacy, and not much news about what community advocacy would become, but Patrick said that first steps would be information collection, sorting what had already been done, doing surveys, and seeking community comment on various proposals. I am sure that Patrick was hired for his neutral and objective outlook and his good sense of personal boundaries and work-life balance, which are essential skills in the crisis business.

A strange thing about Patrick’s hire was that it was a reimagining of the concept of the “Community Advocacy” department, with Patrick given an unusual opportunity to express a lot of personal vision to influence both the direction of the department and the WMF position on harassment as a whole. When he was hired, the position could have been perceived as a chump role, because as I said, professionals in the field of harassment management mostly had no respect for Wikipedia or even online communities. I had been watching the harassment situation play out as an epic, and when Patrick placed himself into this unwanted role, I realized that he would impose the hegemony of the Cascadia ideal of virtue onto Wikipedia’s community values and by extension all human interaction in every context forever after. He was taking an undesirable, humble job opening which also had a greater worth and impact than almost anyone watching realized, and which was going to grow into something history-making. He had a great skill set going into the job, and now he is a world authority in a hot field. I was delighted to see someone familiar and appropriate get this position.

For background context, in February 2012 the Wikimedia Foundation established a group of staff called “Community Advocacy”. I only know the work of this group as it faced the Wikimedia community; I am no insider who knows what private projects it did. Between 2012 and 2014 it offered legal support to Wikipedia editors who were the target of legal threats for editing Wikipedia articles. At the time it was first imagined that legal threats were a major problem. They are, and all kinds of people from the Internet go to Wikipedia to post and email wild lawsuit threats. Dozens of these come in every day, but in contrast to the pre-Internet age when legal threats were more serious, nowadays legal threats to the wiki go into a pile for volunteers to address, as with so many other wiki task queues. The volunteers who address these usually manage them with canned responses and laughs, but volunteers also need to be able to identify which legal threats they should send to the staff legal team at the WMF. Legal@wikimedia.org assists volunteers by issuing an escalated service tier of canned responses and laughs to people who write in requesting a more thorough response. In September 2012 a company targeted two Wikipedia volunteers with a lawsuit. The Wikimedia Foundation funded their legal counsel, and that led to a general commitment that if anyone is doing good on wiki and is the target of a lawsuit as a consequence, then the Wikimedia Foundation will fund that person’s legal counsel. One significance of that lawsuit was that it led to a recognition of the divergence of needs between legal issues and other sorts of issues. I would attribute the special needs of addressing that lawsuit as one of the factors which made the Community Advocacy department more open to forking off to another focus, like harassment.

In 2013-14 there was a major conflict between the Wikimedia Foundation and the Wikimedia community over the rollout of the Media Viewer software. In summary, the sudden software changes without sufficient community input greatly disrupted the workflow of many Wikimedia community volunteers. The initial WMF response was to dismiss complaints or have any random staff person speak publicly about the issue instead of planning an institutional response, and hundreds of contributors consequently felt disenfranchised and mistreated. There are hundreds of pages of text discussing that. Following that, the WMF made commitments to minimize harm in the future, whatever else happened. One change that followed was that in July 2014 the Community Advocacy team revised its page and added a statement saying, “We provide support on the rollout of major changes, such as legal policy updates and OTRS software upgrades.” The idea was that the group was transitioning from being a support service protecting Wikipedians from external threats to also being a support service protecting the community from harms that might come from the WMF’s own software changes. The team wanted to be available to hear any sort of serious problem that Wikimedia community members were experiencing. Software rollouts should never have been imagined as being approximately the same as addressing harassment or providing assistance to someone being sued, but again – it made sense at the time to combine all these things, and this particular issue was the primary cause for later separating software rollouts from harassment.

In January 2016 the Community Advocacy team changed its name to “Support and Safety” and also changed the mission of the team. This was a consequence of Patrick’s April 2014 hire into this role as harassment person. The team hired other people, and different people have different roles, and it is not as if Patrick was a mastermind of the process, but he had a major role and influence as the designated starting point of reform. In the context of legal matters being managed in a different way now, and software rollouts being managed in a different way, an opening was created for “community advocacy” to differentiate itself as “harassment management”.

At WikiConference North America 2016 Patrick and I talked about community guidelines he had been developing to outline the two big concepts – “Dealing with Online Harassment” and “Keeping events safe”. I told him that I would share these with Wikimedia New York City and that our chapter would be the first adopters of these policies, because we had already been trying to draft our own. Now with these well-prepared drafts to start the conversation, whenever we as a Wikimedia community chapter have a problem and someone pressures us to do something about it, we can point to these guides and explain our position. On Wednesday 15 February 2017, after many small-group and interpersonal discussions, we introduced the drafts to our public WikiWednesday meeting. That does not mean that everything is sorted, and neither our chapter board accepting the guides nor acceptance at any public meeting would mean that the last word has been said, but now at least with these guides we will never have to begin a discussion from the beginning. The problem remains that easy problems are easy to resolve, and hard problems are easy to pass on, but those mid-range problems are going to continue to be a challenge, and guides like this are an emotional, social, and organizational support that make a big impact on how volunteers relate to each other and feel about themselves. I am very happy with the pace and output of the Wikimedia Foundation’s investment in this space.

by bluerasberry at February 18, 2017 04:02 PM

Gerard Meijssen

#USA - The Eleanor Roosevelt Award for Human Rights

Just to confuse you: there are two awards by that name. This is about the award established in 1998 by United States President Bill Clinton, honouring outstanding American promoters of rights in the United States. In 2010, Secretary of State Hillary Rodham Clinton revived the Eleanor Roosevelt Award for Human Rights and presented the award on behalf of President Obama.

For whatever reason, this award was not very much on the radar of Wikipedians because, like so many awards, it was not well maintained. The list only includes people for 2010, and there is one other person listed who should instead be on the list for the "other" award. That award is conferred by "Jobs with Justice".

What happened after 2010? Several more years of a United States with President Obama, and now, four weeks into the reign of the present incumbent, the sources for the award are gone. They were at a US Government website. Luckily the disappearance of Internet sources is a well-known phenomenon, and Wikipedians know how to deal with it. The question is whether this award is deemed notable enough, and there is the rub. It is not obvious what went missing, and when the Wikipedia article is not complete, the removal of the data from the web serves its purpose.
Thanks,
        GerardM

by Gerard Meijssen (noreply@blogger.com) at February 18, 2017 08:43 AM

Shyamal

Naturalists in court and courtship

The Bombay Natural History Society offers an interesting case in the history of amateur science in India, and there are many little stories hidden away that have not quite been written about, possibly due to the lack of publicly accessible archival material. Interestingly, two of the founders of the BNHS were Indians, and hardly anything has been written about them, even in the pages of the Journal of the Bombay Natural History Society, where lesser-known British members have obituaries. I suspect that the lack of their obituaries can be traced to the political and social turmoil of the period. Even a major historical two-part piece by Salim Ali in 1978 makes no mention of the Indians involved in the founding of the BNHS. Both of the founders were connected with medicine and medical botany, and connected to some of the other naturalists not just because of their interest in plants but perhaps through their participation in (markedly liberal) social reform movements. The only colleague who could have written their obituaries was the BNHS member Kanhoba Ranchoddas Kirtikar, who probably did not because of his conservative views and a falling-out with the liberals. This is merely my suspicion, and it arises from reading between the lines when I recently started to examine, create and upgrade the relevant entries on them on the English language Wikipedia. There are also some rather interesting connections.

Sakharam Arjun
Dr Sakharam Arjun (Raut) (1839 – 16 April 1885) – This medical doctor with an interest in botanical remedies was for some time a teacher of botany at the Grant Medical College, but his name perhaps became better known after a historic court case dealing with child marriage and women's rights, that of Dadaji vs. Rukhmabai. Rukhmabai had been married off at the age of 11 and stayed with her mother and step-father Sakharam Arjun. When she reached puberty, she was asked by Dadaji to join him. Rukhmabai refused, and Sakharam Arjun supported her. It led to a series of court cases, the first of which was decided in Rukhmabai's favour. This rankled the Hindu patriots who believed that this was a display of the moral superiority of the English. The judge had in reality found fault with English law and had commented on the patriarchal and unfair system of marriage that had already been questioned back in England. A subsequent appeal was ruled in favour of Dadaji, and Rukhmabai was ordered to go to his home or face six months in prison. Rukhmabai was in the meantime writing a series of articles in the Times of India under the pen-name of A Hindoo Lady (wish there were a nice online Indian newspapers archive), and she declared that she would rather take the maximal prison penalty. This led to further worries, with Queen Victoria and the Viceroy jumping into the fray. Max Müller commented on the case, while Behramji Malabari and Allan Octavian Hume (now retired from ornithology; there may be another connection, as Sakharam Arjun seems to have been a member of the Theosophical Society, founded by Hume and others before he quit it) debated various aspects. Somewhat surprisingly, Hume tended to be less radical about reforms than Malabari.

Dr Edith Pechey
Dr Rukhmabai
Dr Sakharam Arjun did not live to see the judgement, and he probably died early thanks to the stress it created. His step-daughter Rukhmabai became one of the earliest Indian women doctors and was supported in her cause by Dr Edith Pechey, another pioneering English woman doctor, who went on to marry H.M. Phipson. Phipson of course was a more famous founder of the BNHS. Rukhmabai's counsel included the lawyer J.D.Inverarity who was a big-game hunter and BNHS member. To add to the mess of BNHS members in court, there was (later Lt.-Col.) Kanhoba Ranchoddas Kirtikar (1850-9 May 1917), a student of Sakharam Arjun and like him interested in medicinal plants. Kirtikar however became a hostile witness in the Rukhmabai case, and supported Dadaji. Rukhmabai, in her writings as a Hindoo Lady, indicated her interest in studying medicine. Dr Pechey and others set up a fund for supporting her medical education in London. The whole case caused a tremendous upheaval in India with a division across multiple axes -  nationalists, reformists, conservatives, liberals, feminists, Indians, Europeans - everyone seems to have got into the debate. The conservative Indians believed that Rukhmabai's defiance of Hindu customs was the obvious result of a western influence.

J.D.Inverarity, Barrister
and Vice President of BNHS (1897-1923)
Counsel for Rukhmabai.
It is somewhat odd that the BNHS journal carries no obituary whatsoever for this Indian founding member. I suspect that the only one who may have been asked to write an obituary would have been Kirtikar, and he may have refused given his stance in court. Another of Sakharam Arjun's students was a Gujarati botanist named Jayakrishna Indraji, who perhaps wrote India's first non-English botanical treatise (at least the first that seems to have been based on the modern scientific tradition). Indraji, rather sadly, seems to be largely forgotten except in some pockets of Kutch, in Bhuj. I recently discovered that the organization GUIDE in Bhuj has tried to bring Indraji back to modern attention.

Atmaram Pandurang
The other Indian founder of the BNHS was Dr Atmaram Pandurang Tarkhadkar (1823-1898). This medical doctor was a founder of the Prarthana Samaj in 1867 in Bombay. He and his theistic reform movement were deeply involved in the Age of Consent debates raised by the Rukhmabai case. His organization seems to have taken Max Müller's suggestion that the ills of society could not be cured by laws but by education and social reform. If Sakharam Arjun is not known enough, even less is known of Atmaram Pandurang (at least online!), but one can find another natural history connection here: his youngest daughter, Annapurna "Ana" Turkhud, tutored Rabindranath Tagore in English, and the latter was smitten. Tagore wrote several poems to her in which she is referred to as "Nalini". Ana however married Harold Littledale (3 October 1853 – 11 May 1930), professor of history and English literature, later principal of the Baroda College (Moreshwar Atmaram Turkhud, Ana's older brother, was a vice-principal at Rajkumar College Baroda, another early natural history hub), and if you remember an earlier post where his name occurs, Littledale was the only person from the educational circle to contribute to Allan Octavian Hume's notes on birds! Littledale also documented bird trapping techniques in Gujarat. Sadly, Ana did not live very long and died in her thirties in Edinburgh somewhere around 1891.

It would appear that many others in the legal profession were associated with natural history; we have already seen the case of Courtenay Ilbert, who founded the Simla Natural History Society in 1885. Ilbert lived at Chapslee House in Simla, still a carefully maintained heritage home (that I had the fortune of visiting recently) owned by the kin of Maharaja Ranjit Singh. Ilbert was involved with the eponymous Ilbert Bill, which allowed Indian judges to preside over cases involving Europeans – a step forward in equality that also led to rancour. Other law professionals in the BNHS included Sir Norman A. Macleod and S. M. Robinson. We know that at least a few marriages were mediated by associations with the BNHS: Norman Boyd Kinnear married a relative of Walter Samuel Millard (the man who kindly showed a child named Salim Ali around the BNHS); R.C. Morris married Heather, daughter of Angus Kinloch (another BNHS member who lived near Longwood Shola, Kotagiri). Even before the BNHS, there were other naturalists connected by marriage: Brian Hodgson's brother William was married to Mary Rosa, the sister of S.R. Tickell (of Tickell's flowerpecker fame); Sir Walter Elliot (of Anathana fame) was married to Maria Dorothea Hunter Blair, while her sister Jane Anne Eliza Hunter Blair was married to Philip Sclater, a leading figure in zoology. The project that led to the Fauna of British India was promoted by Sclater and Jerdon (a good friend of Elliot); these little family ties may have provided additional impetus.


Someone in London asked me in 2014 if I had heard of an India-born naturalist named E.K. Robinson. At that time I did not know of him but it turned out that Edward Kay Robinson (1857?-1928) born in Naini Tal was the founder of the British (Empire) Naturalists' Association. He fostered a young and promising journalist who would later dedicate a work to him - To E.K.R. from R.K. - Rudyard Kipling. Now E.K.R. had an older brother named Phil Robinson who was also in the newspaper line - and became famous for his brand of Anglo-Indian nature writing - a style that was more prominently demonstrated by E.H. Aitken (Eha) of the BNHS. Now Phil - Philip Stewart Robinson - despite the books he wrote like In my Indian Garden and Noah's ark, or, "Mornings in the zoo." Being a contribution to the study of unnatural history is not a well-known name in Indian natural history writing. One reason for his works being unknown may be the infamy that Phil achieved from affairs aboard ships between India and England that led to a scandalous divorce case and bankruptcy.

by Shyamal L. (noreply@blogger.com) at February 18, 2017 07:19 AM

February 17, 2017

Wikimedia Tech Blog

Node 6 at Wikimedia: Stability and substantial memory savings


Albatrosses are highly efficient in the air, able to travel hundreds of miles across the ocean in a single push, and live for up to 60 years. Node 6, the new Long Term Support release, is achieving slightly more modest feats of efficiency and reliability. Photo by JJ Harrison, CC BY-SA 3.0.

Over the last few years, Wikimedia engineers have built significant Node.js services to complement the venerable MediaWiki wiki platform implemented in PHP. Over 10 such services are deployed in our production environment, powering features like VisualEditor, scientific formulae rendering, rich maps display and the REST content API. Individual services are built and owned by specific teams on top of the overall Node.js platform, which is maintained by the Wikimedia Services team. Recently, we upgraded our infrastructure to Node.js v6. This blog post describes how we did it, and what we learned in the process.

Upgrade to Node 6

In large production environments, it is vital that you can trust the stability and security of your infrastructure. While early versions (starting with 0.8) had a fair amount of API churn and regressions, Node has come a long way with each subsequent release, thanks to the hard work and maturity of the community. The first LTS (long term support) release (Node 4) was the best so far from our perspective. Naturally, we were curious to find out whether this trend would continue for the first upgrade between official LTS releases.

Looking at the changelog, the most exciting change in Node 6 LTS looked to be v5.1 of the V8 JavaScript engine, which (in addition to the 5.0 changes) offers various language improvements and now supports over 90% of ES6 features. We are especially excited about the incorporation of several components of the new “Orinoco” garbage collector and the native integration with Chrome’s debugger.
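To illustrate the kind of ES6 support involved, here is a small sketch (not from the original post) of features such as destructuring and default/rest parameters, which run without flags on Node 6:

```javascript
// Object and array destructuring (shipped with V8 5.x; not
// available on Node 4's V8).
const config = { port: 8080, workers: 4 };
const { port, workers } = config;
const [first, ...others] = ['a', 'b', 'c'];

// Default and rest parameters, plus template literals.
function greet(name = 'world', ...rest) {
  return `hello ${name}` + (rest.length ? ` (+${rest.length} more)` : '');
}

console.log(port, workers);   // 8080 4
console.log(first, others);   // a [ 'b', 'c' ]
console.log(greet());         // hello world
```

Most of these constructs were unavailable on Node 4, so code written against Node 6 can be noticeably more concise.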

Before we roll out a new version to production, we naturally need to run thorough tests to catch regressions, ascertain compatibility, and characterize expected performance changes. Node 6 was no exception.

Compatibility

The first step was to assess the effort needed by our developers to port the services to Node 6. We found that Node v6 provides the highest level of compatibility of all the Node releases we have used thus far. For all but one service, the only work needed was recompiling the services’ binary node module dependencies.

Performance

At scale, performance regressions will be noticed quickly by users, and can easily translate to outages. Cost is also a consideration, as our highest-traffic Node services run on clusters with 16–24 servers per datacenter. Performance testing is thus a mandatory part of upgrade preparations.

As a first pass, we ran throughput tests on our most performance-sensitive services. For the Parsoid wikitext processing service, we saw median latency for converting a specific page to HTML drop by about 5%, and (more remarkably) peak memory consumption drop by 30%. Similarly, RESTBase saw a 6% throughput improvement. There were no performance regressions, so we felt comfortable proceeding to longer-term functional testing in staging.

[Figure: promise micro-benchmark results]

Native Promises still slow

All our services use Promises to organize asynchronous computations. In previous profiling, we identified V8’s native Promise implementation as responsible for significant portions of overall service CPU time and allocations, and have since used the highly optimized Bluebird implementation for our Promise needs. Curious about the Node 6 changes, we re-ran a micro-benchmark based on an asynchronous stream transformation pipeline, and found that native Promise performance is still around six times slower than Bluebird. Looking ahead, the V8 team has made some inroads in Node 7.5.0 (with V8 5.3), but is still trailing Bluebird significantly. In a brief exchange we had with the V8 team on this topic last year, they mentioned that their latest efforts are towards moving Promises to C++, and this work has since made it into the tree. We will revisit this benchmark once that code is available in Node.
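The actual benchmark used a stream transformation pipeline; as a rough sketch of the general approach (step count illustrative, and `benchChain` a hypothetical name), one can time a long chain of trivial `.then()` steps under a chosen Promise implementation:

```javascript
// Sketch of a Promise micro-benchmark: time a long chain of
// trivial .then() steps under a given Promise implementation.
function benchChain(PromiseImpl, steps) {
  const start = process.hrtime();
  let p = PromiseImpl.resolve(0);
  for (let i = 0; i < steps; i++) {
    p = p.then(n => n + 1);
  }
  return p.then(n => {
    const [sec, ns] = process.hrtime(start);
    return { result: n, ms: sec * 1e3 + ns / 1e6 };
  });
}

// To compare implementations, run once with the global Promise and
// once with require('bluebird') where it is installed.
benchChain(Promise, 100000).then(({ result, ms }) => {
  console.log(`native Promise: ${result} steps in ${ms.toFixed(1)} ms`);
});
```

A chain this shallow mostly measures allocation and scheduling overhead, which is exactly where Bluebird's optimizations show up.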

Staging environment

With the confidence provided by compatibility and performance benchmarking, we proceeded to test each service thoroughly, first locally, and then left them running in our staging environment for a while. This environment is set up to closely mirror our production site, allowing us to monitor the services and look out for regressions in a more realistic manner. As none were spotted, we decided to upgrade the clusters running Node.js services one by one, over the course of several days at a rate of one per day.

Deployment

In our production environment, we have several clusters on which all of these services are running; most are homogeneous and host only one service per cluster, while one is designated as the miscellaneous cluster hosting 9 stateless Node services.

Thanks to the good compatibility between Node v4 and Node v6, no code changes were needed. Right after upgrading the services, we saw an instant decrease in memory consumption of up to 50%. We routinely measure heap usage for all services using service-runner, a generic service supervisor library in which we have abstracted common service needs such as child process monitoring, configuration management, logging, metrics collection and rate limiting. The following figure shows the amount of memory consumed per worker for the Change Propagation service.

[Figure: per-worker memory usage for the Change Propagation service before and after the Node 6 upgrade]

The green line shows the peak RSS memory consumed by individual workers. On the left-hand side of the graph (until around 23:00 UTC), we have the memory consumption while running on Node v4.6.9; the average peak memory fluctuates around 800 MB. After switching the service to Node v6.9.1, one can see the maximum amount of memory halved to approximately 400 MB. That lower, steadier memory footprint means that workers are respawned less often, which contributes directly to the service’s performance.
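Worker respawning in service-runner is driven by configuration; as a rough sketch (field names per service-runner's documentation, values purely illustrative and not the actual Change Propagation settings), the per-worker heap limit that triggers a respawn looks like:

```yaml
# Illustrative service-runner config fragment (values are not the
# actual Change Propagation settings). When a worker's heap exceeds
# worker_heap_limit_mb, service-runner kills and respawns it.
num_workers: ncpu          # one worker per CPU core
worker_heap_limit_mb: 750  # per-worker heap cap in MB
logging:
  level: warn
metrics:
  type: statsd
  host: localhost
  port: 8125
```

Lower heap pressure under Node 6 means this limit is hit far less frequently.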

Conclusion

Node 6 really delivered on stability and performance, setting a new benchmark for future releases. During the upgrade we encountered no compatibility issues, and found no performance regressions to speak of. The V8 Orinoco garbage collector work has yielded impressive memory savings of up to 50%. As a consequence, strong heap pressure has become rare, and latency for our most demanding services has become more predictable.

Combined with our shared library infrastructure and deployment processes, we are in a good spot with our Node platform. This lets our engineers focus on delivering reliable features for users, and minimizes time spent on unexpected issues.

Marko Obrovac, Senior Software Engineer 
Petr Pchelko, Software Engineer
Gabriel Wicke, Principal Software Engineer
Services team, Wikimedia Foundation

by Marko Obrovac, Petr Pchelko and Gabriel Wicke at February 17, 2017 04:56 PM

Weekly OSM

weeklyOSM 343

02/07/2017-02/13/2017

Logo

Mennonites in Bolivia start mapping with local knowledge 1 |

Mapping

  • The German forum points out (automatic translation) that glass recycling containers for bottles and jars should be tagged as recycling:glass_bottles=yes only, since the simpler key recycling:glass=* includes types of glass which are not accepted.
  • [1] The Mennonite community of Colonia Pinondi in the Bolivian lowlands has started mapping its municipality in OSM with local knowledge. (Via OSM Cochabamba). We think it is a good example for Mennonite communities all over the world. 😉
  • Martijn reports about his visit to an OSMTime event in Cluj, Romania. Beata Jancso from Telenav organises OSMTime, a mapping event, once a month.
  • Arun from the Mapbox data team exhibits the mapping of a street as an area. Specialized maps could get even better rendering results.
  • Jherome Miguel asks for comments on his suggestion of the extended tagging of power poles.
  • Joost Schouppe asks on the Tagging mailing list about how to tag knotted willows and search for English terms.
  • Editing a single node with Level0 built a transcontinental highway from Europe to Brazil.
  • Pascal Neis’ Suspicious Changesets now also displays rough-cut views of the content of suspicious changes. It does that using the Augmented Diffs of the Overpass API (Source: Twitter)

Community

  • Christian Bittner wrote an article on OSM in Israel and Palestine. He notes, among other things, that the OSM data and the maps derived from it are of particular importance in this political conflict (keyword: “ground truth”). Since it is not an open-access publication, Christian shares PDFs upon request by email.
  • Ediyes shares her experience in mapping her hometown of Andahuaylas where she improved existing streets, added names, points of interest, parks, etc.
  • Peter Barth summarizes the results of the last Google Summer of Code and invites students to take part this year. If you are interested, visit the ideas page in the OSM wiki.
  • Martin Dittus (dekstop) presents the results of his research on how feedback on the Tasking Manager affects the further activity of newcomers. Unfortunately, he does not disclose the rules by which he classified the individual personal messages.

OpenStreetMap Foundation

  • On 12 February several OSMF-operated services got new SSL certificates from Let’s Encrypt. Now you also get a valid certificate if you request foobar.osm.org instead of foobar.openstreetmap.org. (source: Chef repository of the OWG)

Humanitarian OSM

  • The Irish MapLesotho group is again visiting Lesotho and there is a report about their work on Twitter too.
  • HOT reports about their new mapping projects that have started in Istanbul and Uganda and is increasingly focusing on crowdsourcing to make high-quality maps available for these regions.
  • LearnOSM has been updated with some interesting sections which include a guide for anyone to organise a Mapathon, tips aimed at new mappers which include an introduction to iD and tasking manager, and much more.
  • MakingMalariaHistory reports on a joint project to fight Malaria. Everyone with an OpenStreetMap account is asked to help.

Maps

  • The LA Times published their code to generate web based maps using Mapzen’s Tangram.

Open Data

  • The Landsat program has a Twitter bot uploading images. Learn more about it.
  • OpenDataSoft published a post discussing the nature and representation of public transport data from heterogeneous sources. French open data providers are listed, and both static and dynamic feeds are aggregated on a single map.

Programming

  • Ilya Zverev from MAPS.ME submitted a pull request which adds MAPS.ME as an authentication provider for the OpenStreetMap website, i.e. users can log in with their MAPS.ME account.
  • Peter Liu from Mapbox, explains how he used satellite imagery and height data to generate an interactive 3D map using Three.js. He plans to make a similar demo with Unity next.
  • The behavior of osm2pgsql when no database is specified with the -d option will change in the next release. Up to now, the default database name has been gis.

Releases

Software Version Release date Comment
PostGIS 2.3.2 2017-01-31 Bugfix release.
iD 2.1.2 2017-02-07 Bugfix release.
Mapillary iOS * 4.6.5 2017-02-07 Notification when entering an area that needs to be mapped.
PostgreSQL 9.6.2 2017-02-09 This release contains a variety of fixes from 9.6.1.
OpenLayers 4.0.0 2017-02-10 Enhancements and fixes from 107 pull requests since the previous release.
Traccar Server 3.10 2017-02-11 No Info.
Mapillary Android * 3.27 2017-02-13 Bugfix release.

Provided by the OSM Software Watchlist. (*) unfree software. See: freesoftware.

OSM in the media

  • Faultlines, black holes and glaciers: mapping uncharted territories, an interesting article by Lois Parshley in which she discusses how certain areas of the world still remain a mystery. HOT is mentioned in the essay.

Other “geo” things

  • Increasing usage of smartphones as navigation aids appears (automatic translation) to impair sales for manufacturers of navigation systems.
  • Sebastian Meier used a laser cutter and moss to generate a nice map of Berlin.
  • Jurij Stare created a map of the light pollution in the world.
  • Mapbox joined 96 companies in filing the Brief of Technology Companies and other Businesses as Amicus Curiae in Support of Appellees in the Ninth Circuit Court of Appeals, opposing the Executive Order barring entry from and revoking visas from citizens of seven Muslim-majority countries.
  • Sebastian Meyer published a script for automatically transforming geojson boundaries into rectangles.

Upcoming Events

Where What When Country
Karlsruhe Hack Weekend 02/18/2017-02/19/2017 germany
Sucre Charla sobre OSM para chicas del HackLab Sucre 02/18/2017-02/19/2017 bolivia
Seattle Mapping UW Sidewalks 02/19/2017 united states
Bonn Bonner Stammtisch 02/21/2017 germany
Scotland Edinburgh 02/21/2017 uk
Derby Derby Pub Meetup 02/21/2017 uk
Lüneburg Mappertreffen Lüneburg 02/21/2017 germany
Brussels Bar meetup 02/21/2017 belgium
Viersen OSM Stammtisch Viersen 02/21/2017 germany
Montpellier Rencontre mensuelle 02/22/2017 france
Wyoming Humanitarian Mapathon University of Wyoming, Laramie 02/22/2017 us
Urspring Stammtisch Ulmer Alb 02/23/2017 germany
Lübeck Lübecker Mappertreffen 02/23/2017 germany
Paris Rencontre mensuelle 02/23/2017 france
Zaragoza Mapeado Colaborativo 02/24/2017 spain
Cardiff OpenDataCamp UK 02/25/2017-02/26/2017 wales
Bremen Bremer Mappertreffen 02/27/2017 germany
Graz Stammtisch Graz 02/27/2017 austria
Stuttgart Stuttgarter Stammtisch 03/01/2017 germany
Montreal Les Mercredis cartographie 03/01/2017 canada
Dresden Stammtisch 03/02/2017 germany
Colorado Springs Humanitarian Mapathon University of Northern Colorado, Greeley 03/02/2017 us
Helsinki Monthly Missing Maps mapathon at Finnish Red Cross HQ 03/03/2017 finland
Lyon Stand OSM Salon Primevère 03/03/2017-03/05/2017 france
Minsk byGIS Conference 03/04/2017 belarus
Graz Map the Change @ Elevate 03/04/2017 austria
Dortmund Stammtisch 03/05/2017 germany
Amagasaki International Open Data Day 2017 in Amagasaki ~そのだ、マッパーになろう~ 03/05/2017 japan
Rostock Rostocker Treffen 03/07/2017 germany
Passau FOSSGIS 2017 03/22/2017-03/25/2017 germany
Avignon State of the Map France 2017 06/02/2017-06/04/2017 france
Kampala State of the Map Africa 2017 07/08/2017-07/10/2017 uganda
Aizu-wakamatsu Shi State of the Map 2017 08/18/2017-08/20/2017 japan
Buenos Aires FOSS4G+SOTM Argentina 2017 10/23/2017-10/28/2017 argentina
Lima State of the Map – LatAm 2017 11/29/2017-12/02/2017 perú

Note: If you would like to see your event here, please put it into the calendar. Only data which is there will appear in weeklyOSM. Please check your event in our public calendar preview and correct it where appropriate.

This weeklyOSM was produced by Hakuch, Nakaner, Peda, Polyglot, Softgrow, Spec80, derFred, jinalfoflia, keithonearth, kreuzschnabel, roirobo, vsandre.

by weeklyteam at February 17, 2017 04:47 PM

Jeroen De Dauw

Why Every Single Argument of Dan North is Wrong

Alternative title: Dan North, the Straw Man That Put His Head in His Ass.

This blog post is a reply to Dan’s presentation Why Every Element of SOLID is Wrong. It is crammed full of straw man argumentation in which he misinterprets what the SOLID principles are about. After refuting each principle he proposes an alternative, typically a well-accepted non-SOLID principle that does not contradict SOLID. If you are not that familiar with the SOLID principles and cannot spot the bullshit in his presentation, this blog post is for you. The same goes if you enjoy bullshit being pointed out and broken down.

What follows are screenshots of select slides with comments on them underneath.

Dan starts by asking “What is a single responsibility anyway”. Perhaps he should have figured that out before giving a presentation about how it is wrong.

A short (non-comprehensive) description of the principle: systems change for various different reasons. Perhaps a database expert changes the database schema for performance reasons, perhaps a User Interface person is reorganizing the layout of a web page, perhaps a developer changes business logic. What the Single Responsibility Principle says is that ideally changes for such disparate reasons do not affect the same code. If they did, different people would get in each other’s way. Possibly worse still, if the concerns are mixed together and you want to change some UI code, suddenly you need to deal with, and thus understand, the business logic and database code.
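As a hedged sketch of that separation (all class names here are illustrative, not taken from the slides or from any real codebase):

```php
<?php

// Before: one class with three unrelated reasons to change. A schema
// change, a business rule change and a layout change all touch it.
class InvoicePage {
    public function loadInvoice( int $id ): array { /* SQL would go here */ return [ 'total' => 1000 ]; }
    public function applyDiscount( int $cents ): int { return intdiv( $cents * 9, 10 ); }
    public function render( array $invoice ): string { return '<p>' . $invoice['total'] . '</p>'; }
}

// After applying the Single Responsibility Principle, each reason to
// change is isolated in its own class.
class InvoiceRepository {
    public function loadInvoice( int $id ): array { return [ 'total' => 1000 ]; }
}

class DiscountPolicy {
    public function applyDiscount( int $cents ): int { return intdiv( $cents * 9, 10 ); }
}

class InvoiceRenderer {
    public function render( array $invoice ): string { return '<p>' . $invoice['total'] . '</p>'; }
}
```

Note that nothing here predicts what will change; the classes merely keep the database, business rule, and UI concerns from affecting the same code.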

How can we predict what is going to change? Clearly you can’t, and this is simply not needed to follow the Single Responsibility Principle or to get value out of it.

Write simple code… no shit. One of the best ways to write simple code is to separate concerns. You can be needlessly vague about it and simply state “write simple code”. I’m going to label this Dan North’s Pointlessly Vague Principle. Congratulations sir.

The idea behind the Open Closed Principle is not that complicated. To partially quote the first line on the Wikipedia Page (my emphasis):

… such an entity can allow its behaviour to be extended without modifying its source code.

In other words, when you ADD behavior, you should not have to change existing code. This is very nice, since you can add new functionality without having to rewrite old code. Contrast this to shotgun surgery, where to make an addition, you need to modify existing code at various places in the codebase.
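A minimal sketch of what “closed for modification, open for extension” can look like (the payment classes are hypothetical examples, not from any real system):

```php
<?php

interface PaymentMethod {
    public function pay( int $amountInCents ): string;
}

class BankTransferPayment implements PaymentMethod {
    public function pay( int $amountInCents ): string {
        return "bank transfer of $amountInCents cents";
    }
}

// Adding a new payment type is pure addition: no existing class changes.
class PayPalPayment implements PaymentMethod {
    public function pay( int $amountInCents ): string {
        return "PayPal payment of $amountInCents cents";
    }
}

class Checkout {
    // Closed for modification: Checkout never needs to change when a
    // new PaymentMethod implementation is added.
    public function process( PaymentMethod $method, int $amountInCents ): string {
        return $method->pay( $amountInCents );
    }
}
```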

In practice, you cannot gain full adherence to this principle, and you will have places where you will need to modify existing code. Full adherence to the principle is not the point. Like with all engineering principles, they are guidelines which live in a complex world of trade offs. Knowing these guidelines is very useful.

Clearly it’s a bad idea to leave in place code that is wrong after a requirement change. That’s not what this principle is about.

Another very informative “simple code is a good thing” slide.

To be honest, I’m not entirely sure what Dan is getting at with his “is-a, has-a” vs “acts-like-a, can-be-used-as-a”. It does make me think of the Interface Segregation Principle, which, coincidentally, is the next principle he misinterprets.

The remainder of this slide is about the “favor composition over inheritance” principle. This is really good advice, which has been well accepted in professional circles for a long time. This principle is about code sharing, which is generally better done via composition than via inheritance (the latter creates very strong coupling). In the last big application I wrote, there are several hundred classes, and less than a handful inherit concrete code. Inheritance has a use completely different from code reuse: subtyping and polymorphism. I won’t go into detail about those here, and will just say that this is at the core of what Object Orientation is about, and that even in the application I mentioned, this is used all over, making the Liskov Substitution Principle very relevant.

Here Dan is slamming the principle for being too obvious? Really?

“Design small, role-based classes”. Here Dan changed “interfaces” into “classes”, which results in a line that makes me think of the Single Responsibility Principle. More importantly, there is a misunderstanding about the meaning of the word “interface” here. This principle is about the abstract concept of an interface, not the language construct that you find in some programming languages such as Java and PHP. A class forms an interface. This principle applies to OO languages that do not have an interface keyword, such as Python, and even to those that do not have a class keyword, such as Lua.

If you follow the Interface Segregation Principle and create interfaces designed for specific clients, it becomes much easier to construct or invoke those clients. You won’t have to provide additional dependencies that your client does not actually care about. In addition, if you are doing something with those extra dependencies, you know this client will not be affected.
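A hedged sketch of such a client-specific interface (all names hypothetical): a mailer that only ever reads donations depends on a narrow read interface rather than on a full repository whose write methods it never uses.

```php
<?php

interface DonationReader {
    public function getDonationAmount( int $id ): ?int;
}

class InMemoryDonations implements DonationReader {
    private $amounts;

    public function __construct( array $amounts ) {
        $this->amounts = $amounts;
    }

    public function getDonationAmount( int $id ): ?int {
        return $this->amounts[$id] ?? null;
    }
}

class ConfirmationMailer {
    private $reader;

    // Depends only on what it uses: changes to storing or deleting
    // donations cannot affect this client.
    public function __construct( DonationReader $reader ) {
        $this->reader = $reader;
    }

    public function hasDonation( int $id ): bool {
        return $this->reader->getDonationAmount( $id ) !== null;
    }
}
```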

This is a bit bizarre. The definition Dan provides is good enough, even though it is incomplete, which can be excused by it being a slide. From the slide it’s clear that the Dependency Inversion Principle is about dependencies (who would have guessed) and coupling. The next slide is about how reuse is overrated. As we’ve already established, this is not what the principle is about.

As to the DIP leading to DI frameworks that you then depend on… this is like saying that if you eat food, you might eat non-nutritious food, which is not healthy. The fix is to not eat non-nutritious food, not to reject food altogether. Remember the application I mentioned? It uses dependency injection all the way, without using any framework or magic. In fact, 95% of the code does not bind to the web framework used, due to adherence to the Dependency Inversion Principle. (Read more about this application)
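Dependency injection without any framework can be as plain as this sketch (names hypothetical): the use case depends on a Mailer abstraction, and ordinary code hands in a concrete implementation.

```php
<?php

interface Mailer {
    public function send( string $to, string $body ): void;
}

// A test double is one of the things the inverted dependency buys you.
class RecordingMailer implements Mailer {
    public $sent = [];

    public function send( string $to, string $body ): void {
        $this->sent[] = [ $to, $body ];
    }
}

class ThankDonorUseCase {
    private $mailer;

    public function __construct( Mailer $mailer ) {
        $this->mailer = $mailer;
    }

    public function thank( string $emailAddress ): void {
        $this->mailer->send( $emailAddress, 'Thank you for your donation!' );
    }
}

// Plain construction: no container, no magic, no framework dependency.
$mailer = new RecordingMailer();
( new ThankDonorUseCase( $mailer ) )->thank( 'donor@example.com' );
```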

That attitude explains a lot about the preceding slides.

Yeah, please do write simple code. The SOLID principles and many others can help you with this difficult task. There is a lot of hard-won knowledge in our industry and many problems are well understood. Frivolously rejecting that knowledge with “I know better” is an act of supreme arrogance and ignorance.

I do hope this is the category Dan falls into, because the alternative of purposefully misleading people for personal profit (attention via controversy) rustles my jimmies.

If you’re not familiar with the SOLID principles, I recommend you start by reading their associated Wikipedia pages. If you are like me, it will take you practice to truly understand the principles and their implications and to find out where they break down or should be superseded. Knowing about them and keeping an open mind is already a good start, which will likely lead you to many other interesting principles and practices.

by Jeroen at February 17, 2017 02:55 PM

Gerard Meijssen

#Sources: the Charles S. Johnson Award

Awards honour the recipients, the organisation that confers them, and often the person the award is named after. The Charles S. Johnson Award is named after Charles S. Johnson. He is notable; he has his own Wikipedia article. The organisation that confers it is notable; the Southern Sociological Society has its own Wikipedia article.

The awardees, however, are a problem, because there is only a partial list of people who received the award. There are many gaps in the list, and no sources are available to indicate why someone is thought to have received the award.

The Wikimedia blog has a post that mentions Lillian Smith. It mentions that she gave an acceptance speech at Fisk University in 1966. But there is no source for it. This does not mean that she did not receive the award, it just means that there is no source in the article.

Once sources are provided by the Sociological Society, Mrs Smith will be connected to a list of other remarkable people, all notable in their own way. It is just a matter of connecting the dots.
Thanks,
       GerardM

by Gerard Meijssen (noreply@blogger.com) at February 17, 2017 07:40 AM

February 16, 2017

Wikimedia Foundation

Digital archivist brings forgotten stories to light on Wikipedia


The Wikipedia article on author Lillian Smith’s novel Strange Fruit is one of Toncray’s favorite contributions to Wikipedia. Photo by C. M. Stieglitz via the Library of Congress, public domain/CC0.

“There are so many people who have sunk into obscurity and had their stories forgotten. We can’t put all of them on Wikipedia, but we can work on ensuring that we do what we can,” says Wikipedia editor Shalor Toncray.

A master’s student in the Library and Information Science program at Drexel University, Toncray has been editing Wikipedia as Tokyogirl79 for over a decade. Through her volunteer work with WikiProject Virginia and the Library of Virginia (LVA), she has helped bring Virginia history to life.

“One of the things that surprised me was how few Wikipedia articles there were for postbellum former slaves turned politicians,” Toncray says. To her, it is a reflection of the racist undertones she sees in the southern U.S. today. Through her editing, however, Toncray has helped to remedy this problem and add the narratives of those previously forgotten to Wikipedia.

“When I edit Wikipedia I feel like I’m helping to make a difference,” she told us. “I’ve learned of more interesting things via Wikipedia than I would have if I hadn’t started editing. I also have that lovely glow of knowing that I’m helping put out information, especially if it’s a topic that might not have otherwise been added.”

That sentiment is what led her to become an active editor on Wikipedia, eventually influencing her academic and professional pursuits. Thanks to Wikipedia, she became interested in digital archiving, the focus of her master’s program.

“Originally, I just read Wikipedia and eventually began wondering why certain articles weren’t created or expanded,” she remarked. “After looking at a fairly obscure article I knew a little bit about … I realized that I could edit it just as easily as the next person—and that there was a distinct possibility that the article could languish on Wikipedia for months or years without being improved. So why not me?”

Toncray even credits her editing experience with helping to land her volunteer position with the LVA, an archival institute dedicated to preserving “Virginia history, culture, and government”. Students in her field are encouraged to gain hands-on experience at local libraries. Given her background with Wikipedia, the LVA was a sound choice.

“The LVA was interested in expanding content on the site and adding sources (primary and secondary) to ensure accuracy,” she says. “I listed Wikipedia experience on my application, [and that] got me in the door.”

The institution’s large archive, as well as its engagement in local activities, is what attracted Toncray to the LVA. One of its biggest goals is to improve the accuracy of articles on Wikipedia by cross-referencing sources to make sure that the information on Wikipedia matches up.

Photograph courtesy of Shalor Toncray.


Working with a library is especially fulfilling for Toncray because she can use her access to primary sources to help verify the accuracy of content on Wikipedia. “How many of us have become frustrated when we found that something was unavailable online in any format?,” she asks.

Another goal for the LVA is to let people know that the Library itself can be used as a source, either as a primary source, which links to the library’s archives, or as a secondary source, using material written by the library’s scholars. It also hopes to “ensure notable people are recognized for their accomplishments—something that fits well with Wikipedia’s own goals”, Toncray remarked.

Through her work with the LVA and GLAM, Toncray has gained a newfound appreciation for people who work with institutions and the pedagogical aspect of such work. Ultimately, she hopes to ensure that LVA has a Wikipedian-in-Residence in their facility or remains active in GLAM through edit-a-thons or other events.

In addition to her work with the LVA, Toncray is a volunteer administrator on Wikipedia, where she also helps guide new contributors and ensure that articles are well-sourced. Most of her work as an administrator focuses on cleaning up Wikipedia by way of speedy deletions.

One of her favorite contributions to Wikipedia is the article on the controversial novel Strange Fruit. “That’s an article I turned from a two sentence stub into what it is today. It was sort of surreal finding all of this coverage for the book—it was a stage play, an Academy Award nominated short film (loosely based on the book and the song) … it was even banned from being mailed at one point in time,” she said.

And her thoughts for new editors looking to emulate her contributions? “It’s not going to be easy, but it’s worth it.”

Katie Koerper, Public Relations Intern
Wikimedia Foundation

by Katie Koerper at February 16, 2017 07:54 PM

Wiki Education Foundation

Join us at AAAS!

This week, Wiki Education Foundation staff will attend the annual meeting of the American Association for the Advancement of Science (AAAS).

In 2016, we attended the meeting in Washington, D.C., where we met dozens of scientists across disciplines. Their enthusiasm for sharing knowledge with the public was clear. Wikipedia has a far reach to non-scientists, and some attendees told us improving its quality should be part of the 21st century job description for anyone who intends to inform the public.

In Wiki Ed’s Classroom Program, university instructors assign students to write Wikipedia articles. In place of a traditional writing assignment, students research course-related topics that are missing or underrepresented, and they synthesize the available literature to share through Wikipedia. After supporting thousands of students, we’ve proven this model brings high-quality academic information to wide audiences. Along the way, students hone their communication skills, learn how to discern between reliable and unreliable sources, and help combat fake news on the internet.

If you’re attending the meeting in Boston, you’ll find Director of Programs LiAnna Davis and Wikipedia Content Expert Ian Ramjohn in the exhibit hall. They’ll showcase student work in the sciences, give a tour of Wiki Ed’s suite of tools for running a Wikipedia assignment, and can discuss how writing Wikipedia can help achieve your student learning objectives.

On Saturday, February 18, LiAnna will join Greg Boustead of the Simons Foundation and Dario Taraborelli of the Wikimedia Foundation to dive deeper into how Wikipedia connects scientists and the public. Join their talk from 12:00–1:00pm in Room 207 of the Hynes Convention Center.

Also on Saturday, Cornell University instructors Mark A. Sarvary, Ashley Downs, and Kelee Pacion will be presenting a poster from 9:30 am to 4:30 pm on their experiences teaching with Wikipedia through our program.

If you’re not attending the conference but would like to join our programs to help the public access reliable information, email us at contact@wikiedu.org.

by Jami Mathewson at February 16, 2017 05:07 PM

Jeroen De Dauw

Implementing the Clean Architecture

Both Domain Driven Design and architectures such as the Clean Architecture and Hexagonal are often talked about. It’s hard to go to a conference on software development and not run into one of these topics. However, it can be challenging to find good real-world examples. In this blog post I’ll introduce you to an application following the Clean Architecture and incorporating a lot of DDD patterns. The focus is on the key concepts of the Clean Architecture, and on the most important lessons we learned implementing it.

The application

The real-world application we’ll be looking at is the Wikimedia Deutschland fundraising software. It is a PHP application written in 2016, replacing an older legacy system. While the application is written in PHP, the patterns followed are by and large language agnostic, and are thus relevant for anyone writing object-oriented software.

I’ve outlined what the application is and why we replaced the legacy system in a blog post titled Rewriting the Wikimedia Deutschland fundraising. I recommend you have a look at least at its “The application” section, as it will give you a rough idea of the domain we’re dealing with.

A family of architectures

Architectures such as Hexagonal and the Clean Architecture are very similar. At their core, they are about separation of concerns. They decouple from mechanisms such as persistence and the frameworks in use, and instead focus on the domain and high-level policies. A nice short read on this topic is Uncle Bob’s blog post on the Clean Architecture. Another recommended post is Hexagonal != Layers, which explains how just creating a bunch of layers misses the point.

The Clean Architecture

[Diagram: The Clean Architecture]

The arrows crossing the circle boundaries represent the allowed direction of dependencies. At the core is the domain. “Entities” here means Entities as in Domain Driven Design, not to be confused with ORM entities. The domain is surrounded by a layer containing use cases (sometimes called interactors) that form an API that the outside world, such as a controller, can use to interact with the domain. The use cases themselves only bind to the domain and certain cross cutting concerns such as logging, and are devoid of binding to the web, the database and the framework.

class CancelDonationUseCase {
    private /* DonationRepository */ $repository;
    private /* Mailer */ $mailer;

    public function cancelDonation( CancelDonationRequest $r ): CancelDonationResponse {
        $this->validateRequest( $r );

        $donation = $this->repository->getDonationById( $r->getDonationId() );
        $donation->cancel();
        $this->repository->storeDonation( $donation );

        $this->sendConfirmationEmail( $donation );

        return new CancelDonationResponse( /* ... */ );
    }
}

In this example you can see how the UC for canceling a donation gets a request object, does some stuff, and then returns a response object. Both the request and response objects are specific to this UC and lack both domain and presentation mechanism binding. The stuff that is actually done is mainly interaction with the domain through Entities, Aggregates and Repositories.

$app->post(
    '/cancel-donation',
    function( Request $httpRequest ) use ( $factory ) {
        $requestModel = new CancelDonationRequest(
            $httpRequest->request->get( 'donation_id' ),
            $httpRequest->request->get( 'update_token' )
        );

        $useCase = $factory->newCancelDonationUseCase();
        $responseModel = $useCase->cancelDonation( $requestModel );

        $presenter = $factory->newNukeLaunchingResultPresenter();
        return new Response( $presenter->present( $responseModel ) );
    }
);

This is a typical way of invoking a UC. The framework we’re using is Silex, which calls the function we provided when the route matches. Inside this function we construct our framework agnostic request model and invoke the UC with it. Then we hand over the response model to a presenter to create the appropriate HTML or other such format. This is all the framework bound code we have for canceling donations. Even the presenter does not bind to the framework, though it does depend on Twig.

If you are familiar with Silex, you might already have noticed that we’re constructing our UC differently than you might expect. We decided to go with our own top level factory, rather than using the dependency injection mechanism provided by Silex: Pimple. Our factory internally actually uses Pimple, though this is not visible from the outside. With this approach we gain nicer access to service construction, since we can have a getLogger() method with a LoggerInterface return type hint, rather than accessing $app['logger'] or some such, which forces us to bind to a string and leaves us without a type hint.
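The idea can be sketched like this (a plain array stands in for Pimple here to keep the example self-contained, and the Logger interface is a stand-in for LoggerInterface; the real factory wraps Pimple itself):

```php
<?php

interface Logger {
    public function log( string $message ): void;
}

class NullLogger implements Logger {
    public function log( string $message ): void {
        // intentionally does nothing
    }
}

class TopLevelFactory {
    private $services = [];

    // Typed accessor: callers get a Logger return type hint instead of
    // having to know a string key such as $app['logger'].
    public function getLogger(): Logger {
        // lazily construct and cache, as a DI container would
        if ( !isset( $this->services['logger'] ) ) {
            $this->services['logger'] = new NullLogger();
        }
        return $this->services['logger'];
    }
}
```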

[Screenshot: list of use cases]

This use case based approach makes it very easy to see what our system is capable of at a glance.

[Screenshot: use case directory structure]

And it makes it very easy to find where certain behavior is located, or to figure out where new behavior should be put.

All code in our src/ directory is framework independent, and all code binding to specific persistence mechanisms resides in src/DataAccess. The only framework bound code we have are our very slim “route handlers” (kinda like controllers), the web entry point and the Silex bootstrap.

For more information on the Clean Architecture I can recommend Robert C. Martin’s NDC 2013 talk. If you watch it, you will hopefully notice how we slightly deviated from the UseCase structure as he presented it. This is because PHP is an interpreted language, and thus does not need certain interfaces that are beneficial in compiled languages.

Lesson learned: bounded contexts

By and large we started with the donation related use cases and then moved on to the membership application related ones. At some point, we had a Donation entity/aggregate in our domain, and a bunch of value objects that it contained.

class Donation {
    private /* int|null */            $id;
    private /* PersonalInfo|null */   $personalInfo;
    /* ... */
}

class PersonalInfo {
    private /* PersonName */          $name;
    private /* PhysicalAddress */     $address;
    private /* string */              $emailAddress;
}

As you can see, one of those value objects is PersonalInfo. Then we needed to add an entity for membership applications. Like donations, membership applications require a name, a physical address and an email address. Hence it was tempting to reuse our existing PersonalInfo class.

class MembershipApplication {
    private /* int|null */            $id;
    private /* PersonalInfo|null */   $personalInfo;
    /* ... */
}

Luckily a complication made us realize that going down this path was not a good idea. This complication was that membership applications also have a phone number and an optional date of birth. We could have forced code sharing by doing something hacky like adding new optional fields to PersonalInfo, or by creating a MorePersonalInfo derivative.

Approaches such as these, while resulting in some code sharing, also result in creating binding between Donation and MembershipApplication. That’s not good, as those two entities don’t have anything to do with each other. Sharing what happens to be the same at present is simply not a good idea. Just imagine that we did not have the phone number and date of birth in our first version, and then needed to add them. We’d either end up with one of those hacky solutions, or need to refactor code that has nothing to do (apart from the bad coupling) with what we want to modify.

What we did is renaming PersonalInfo to Donor and introduce a new Applicant class.

class Donor {
    private /* PersonName */          $name;
    private /* PhysicalAddress */     $address;
    private /* string */              $emailAddress;
}

class Applicant {
    private /* PersonName */          $name;
    private /* PhysicalAddress */     $address;
    private /* EmailAddress */        $email;
    private /* PhoneNumber */         $phone;
    private /* DateTime|null */       $dateOfBirth;
}

These names are better since they are about the domain (see ubiquitous language) rather than some technical terms we needed to come up with.

Amongst other things, this rename made us realize that we were missing some explicit boundaries in our application. The donation related code and the membership application related code were mostly independent from each other, and we agreed this was a good thing. To make it more clear that this is the case and to highlight violations of that rule, we decided to reorganize our code to follow the strategic DDD pattern of Bounded Contexts.

[Screenshot: bounded context directory structure]

This mainly consisted of reorganizing our directory and namespace structure, and a few instances of splitting some code that should not have been bound together.

Based on this we created a new diagram to reflect the high level structure of our application. This diagram, and a version with just one context, are available for use under CC-0.

[Diagram: Clean Architecture + Bounded Contexts]

Lesson learned: validation

A big question we had near the start of our project was where to put validation code. Do we put it in the UCs, or in the controller-like code that calls the UCs?

One of the first UCs we added was the one for adding donations. This one has a request model that contains a lot of information, including the donor’s name, their email, their address, the payment method, payment amount, payment interval, etc. In our domain we had several value objects for representing parts of donations, such as the donor or the payment information.

class Donation {
    private /* int|null */            $id;
    private /* Donor|null */          $donor;
    private /* DonationPayment */     $payment;
    /* ... */
}

class Donor {
    private /* PersonName */          $name;
    private /* PhysicalAddress */     $address;
    private /* string */              $emailAddress;
}

Since we did not want to have one object with two dozen fields, and did not want to duplicate code, we used the value objects from our domain in the request model.

class AddDonationRequest {
    private /* Donor|null */          $donor;
    private /* DonationPayment */     $payment;
    /* ... */
}

If you’ve been paying attention, you’ll have realized that this approach violates one of the earlier outlined rules: nothing outside the UC layer is supposed to access anything from the domain. If value objects from the domain are exposed to whatever constructs the request model, i.e. a controller, this rule is violated. Apart from this abstract objection, we got into real trouble by doing this.

Since we started doing validation in our UCs, this usage of objects from the domain in the request necessarily forced those objects to allow invalid values. For instance, if we’re validating the validity of an email address in the UC (or a service used by the UC), then the request model cannot use an EmailAddress which does sanity checks in its constructor.

We thus refactored our code to avoid using any of our domain objects in the request models (and response models), so that those objects could contain basic safeguards.
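A sketch of what such a basic safeguard can look like (the check shown is deliberately minimal and not the real validation logic of the fundraising application):

```php
<?php

class EmailAddress {
    private $address;

    public function __construct( string $address ) {
        // Minimal sanity check: anything constructed is at least shaped
        // like an email address. Policy validation stays in the use case.
        if ( strpos( $address, '@' ) === false ) {
            throw new InvalidArgumentException( 'Not an email address: ' . $address );
        }
        $this->address = $address;
    }

    public function toString(): string {
        return $this->address;
    }
}
```

Because the request models no longer use this class, they remain free to hold raw, not-yet-validated input.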

We made a similar change by altering which objects get validated. At the start of our project we created a number of validators that worked on objects from the domain. For instance a DonationValidator working with the Donation Entity. This DonationValidator would then be used by the AddDonationUseCase. This is not a good idea, since the validation that needs to happen depends on the context. In the AddDonationUseCase certain restrictions apply that don’t always hold for donations. Hence having a general looking DonationValidator is misleading. What we ended up doing instead is having validation code specific to the UCs, be it as part of the UC, or when too complex, a separate validation service in the same namespace. In both cases the validation code would work on the request model, i.e. AddDonationRequest, and not bind to the domain.

After learning these two lessons, we had a nice approach for policy-based validation. That is not all the validation that needs to be done, though. For instance, if you get a number via a web request, the framework will typically hand it to you as a string, which might thus not be an actual number. As the request model is supposed to be agnostic of the presentation mechanism, certain validation, conversion, and error handling needs to happen before constructing the request model and invoking the UC. This means that you will often have validation in two places: policy-based validation in the UC, and presentation-specific validation in your controllers or equivalent code. If you have string-to-integer conversion, number parsing, or anything internationalization-specific in your UC, you almost certainly messed up.
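The two layers can be sketched side by side. As before, this is an illustrative Python sketch, not the project's PHP code; the comma-as-decimal-separator handling and the minimum-donation policy are invented examples:

```python
def controller_add_donation(raw_form: dict) -> str:
    """Presentation layer: parse strings from the web request.
    Localized number formats and type conversion live here,
    never in the use case."""
    try:
        # e.g. a German-locale "12,50" becomes 1250 cents
        amount_in_cents = int(round(float(raw_form["amount"].replace(",", ".")) * 100))
    except (KeyError, ValueError):
        return "400 Bad Request: amount is not a number"
    return use_case_add_donation(amount_in_cents)

def use_case_add_donation(amount_in_cents: int) -> str:
    """Use case: policy validation only, on already-typed values."""
    if amount_in_cents < 100:  # hypothetical policy: minimum donation of 1.00
        return "rejected: donation below minimum"
    return "accepted"
```

With this split, the use case never sees a string where it expects a number, and the controller never encodes donation policy.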

Closing notes

You can find the Wikimedia Deutschland fundraising application on GitHub and see it running in production. Unfortunately the code of the old application is not available for comparison, as it is not public. If you have questions, you can leave a comment, or contact me. If you find an issue or want to contribute, you can create a pull request. If you are looking for my presentation on this topic, view the slides.

As a team we learned a lot during this project, and we set a number of firsts at Wikimedia Deutschland, or the wider Wikimedia movement for that matter. The new codebase is the cleanest non-trivial application we have, or that I know of in the PHP world. It is fully tested, contains less than 5% framework-bound code, has strong strategic separation between both contexts and layers, has roughly 5% data-access-specific code, and has tests that can be run without any real setup. (I might write another blog post on how we designed our tests and testing environment.)

Many thanks to my colleagues Kai Nissen and Gabriel Birke for being pretty awesome during our rewrite project.

by Jeroen at February 16, 2017 01:46 PM

Gerard Meijssen

#Wikidata - Who is Ann Dale

Ann Dale won the Molson Prize in 2013. In a rare twist, the English Wikipedia does not list the winners of this award but other Wikipedias do. It was therefore easy to import the list from the German Wikipedia into Wikidata.

As there is a link to the website of the award, it was easy to include the more recent winners of the award. Most of the recent winners already had a Wikipedia article so it was easy to add them.

When disambiguating Ann Dale using Reasonator, there was an Ann Marie Dale. Not much was known about her except for her publications. It was possible to find out that she worked at Washington University in St. Louis, but that turned out to be a mismatch; the information on the Molson Prize website provided the answer: it was a different Mrs Ann Dale.

The research by Mrs Dale is on governance, innovation and community vitality and is designed to provide useful knowledge to Canadian decision-makers. There might be something in her work that is of interest to the Wikimedia Foundation as well.

NB Mrs Ann Dale is now registered in Wikidata. More information is left to other interested souls.
Thanks,
      GerardM


by Gerard Meijssen (noreply@blogger.com) at February 16, 2017 09:26 AM

February 15, 2017

Wikimedia Foundation

Community digest: Norwegian-Armenian partnership to improve Wikipedia’s content on the two countries; news in brief

Photo by Astrid Carlsen, CC BY-SA 4.0.

In November and December 2016, Wikimedia Armenia and Wikimedia Norway (Norge)—two independent chapters that work to advance the Wikimedia movement—held parallel editing contests in their home countries. The initiative encouraged Armenian editors to write about Norway and Norwegian editors to write about Armenia on their respective language Wikipedias; the winner in each country is getting a free trip to the country they wrote about as a reward.

The cross-border event is held to show the values of Wikimedia as an international movement. It was kindled when Tove Eivindsen, a former trustee of Wikimedia Norway (pictured above), visited Wikimedia Armenia’s office in 2015.

On the Norwegian Wikipedia, the contest focused on quality. Two active editors read the new articles, and nominated the top 10. Another committee, consisting of a representative from the National Archives in Norway, a nonfiction writer, and a journalist, reviewed the top 10 nominees and gave a final decision. Commentary on each of the top 10 articles was posted to the article discussion pages to inspire further improvement.

Wikipedian Telaneo took first place in this competition with his article about Levon Ter-Petrosian, the first president of Armenia. Telaneo is planning a visit to Armenia in the summer. In total, 65 new Norwegian-language articles have been created as part of the contest.

Fridtjof Nansen, the Norwegian humanitarian and Nobel Peace Prize laureate, spent much time trying to help Armenian refugees and is sometimes seen as a symbol of fellowship between Norwegians and Armenians. As part of the contest, the National Library uploaded a photo collection about Fridtjof Nansen’s trips to Armenia. Photo by Kozak, public domain.

On the Armenian Wikipedia, 34 users participated in the contest, focusing on Norway’s geography, the history of the country, and celebrities; they created 766 articles. After the contest, a jury consisting of Wikimedia Armenia staff assessed and gave points to each article. The Armenian-language winner was Gardmanahay, who received the award for writing the best articles about Norway. The winner announcement and award ceremony were part of the Wikipedia 16 celebration at the Wikimedia Armenia office in Yerevan in January.

We hope to continue our collaboration on cross-chapter work in the coming years!

Astrid Carlsen, Wikimedia Norway (Norge)
David Saroyan, Wikimedia Armenia

In brief

Open science fellows program in Germany: In September 2016, Wikimedia Germany, in collaboration with the Stifterverband, launched an open science fellows program. The program supports ten selected scientists in different research areas; each scientist is provided with the funding and support needed to carry out one project within six months. The program aims at making research more accessible and compatible with the standards of open science and open knowledge sharing. Research findings are being published on Wikimedia Germany’s blog.

Melanie Tietje, whose project concentrates on reproducibility, community feedback, and open publishing, has written about her research in both German and English, as has Nic Schmelling, who is working on the evolution of biological clocks; other fellows are working on copyright law or media studies.

New York City hosts several editathons for Black History Month: Throughout February, the Wikipedia community in New York City is hosting several editing workshops, meetups, and other events, with Black History Month as the main theme. More details about the event venues and schedule are available on Wikipedia.

Editing for gender diversity on Wikipedia: This month WikiProject Women in Red is also celebrating Black History Month with an online editathon on the theme of ‘black women in history’, lasting through the end of February. Wiki Loves Women, another project working on increasing gender diversity on Wikipedia, is joining the effort with two editing events: the 16 African Women Translatathon, which aims at translating the biographies of 16 African women into as many languages as possible, and an editing workshop on Women in Agriculture that took place last week.

Wikimania updates: Montreal is getting ready to host hundreds of Wikipedians for Wikimania, the annual conference of the Wikimedia movement, next summer. However, this week is the last opportunity to apply for a scholarship to attend this year’s conference; the deadline for scholarship applications is 20 February 2017 at 23:59 UTC. For prospective applicants, scholarship committee head Martin Rulsch told us that “it’s always useful in applications to be honest, to provide evidence for what you claim to have done, and not to over- or understate your efforts. Make it realistic and look around you how active other users are. It’s always good to have a plan what to do at Wikimania. It’s also important to take this application seriously, as only two sentences on how you report your activities may not be enough. If you have attended Wikimania before (in particular on a Wikimedia Foundation scholarship), you should explain how Wikimania helped you and your communities, and why/how you want to continue this service.” A public reviewer’s guide and FAQ page are also available.

Moreover, the Wikimania committee is looking for new volunteer Wikipedians to help put together the program and schedule of the conference. More information on the requirements and how to express interest is available on Wikimedia-l.

Wiki Loves Music documents musical instruments on Wikipedia: On the German Wikipedia, user Gnom has started a new project that aims at improving Wikipedia’s coverage of musical instruments and their history, and at collecting freely licensed photos of them.

Wikimedia Taiwan celebrates ten years supporting Wikipedia: Last week, Wikimedia Taiwan, the independent chapter that supports Wikimedia in Taiwan, celebrated its tenth anniversary. The Taiwanese chapter helps with Wikipedia student activities, gender diversity initiatives, and many other online and offline projects.

New Signpost published: A new edition of the English Wikipedia’s community-written news journal was published last week. Stories included an interview with the Wikipedians behind WikiProject Birds; tips on making the most of Wikipedia editing workshops; Media and politics on Wikipedia; Technology; and Foundation updates.

Wikimedia Hong Kong becomes the first revoked Wikimedia chapter: Last week, the Wikimedia Affiliations Committee notified Wikimedia Hong Kong of the suspension of their status as a Wikimedia chapter for long-standing non-compliance with reporting requirements. The chapter has a long movement history, including hosting Wikimania in 2013. More information on Wikimedia-l.

Compiled and edited by Samir Elsharbaty, Digital Content Intern
Wikimedia Foundation

by Astrid Carlsen, David Saroyan and Samir Elsharbaty at February 15, 2017 10:17 PM

Semantic MediaWiki

Semantic MediaWiki 2.4.6 released/en

Semantic MediaWiki 2.4.6 released/en


February 15, 2017

Semantic MediaWiki 2.4.6 (SMW 2.4.6) has been released today as a new version of Semantic MediaWiki.

This new version is a minor release and provides an enhancement for users of MySQL data stores to the current 2.4 branch of Semantic MediaWiki. Please refer to the help page on installing Semantic MediaWiki to get detailed instructions on how to install or upgrade.

by TranslateBot at February 15, 2017 08:04 PM

Semantic MediaWiki 2.4.6 released

Semantic MediaWiki 2.4.6 released
DeutschEnglish

February 15, 2017

Semantic MediaWiki 2.4.6 (SMW 2.4.6) has been released today as a new version of Semantic MediaWiki.

This new version is a minor release and provides an enhancement for users of MySQL data stores to the current 2.4 branch of Semantic MediaWiki. Please refer to the help page on installing Semantic MediaWiki to get detailed instructions on how to install or upgrade.

by Kghbln at February 15, 2017 08:03 PM

Pete Forsyth, Wiki Strategies

Wikipedia’s ban of Daily Mail exposes news publisher flaws

Who doesn’t love a good media feud?

As reported by the Guardian on February 8, the English language Wikipedia has (mostly) banned the Daily Mail as an acceptable source for citation, after declaring it “unreliable”. The report touched a nerve; the Mail swiftly issued a shrill and rambling retort, and the story was quickly picked up in more than a dozen other publications.

But this feud is more than a mere “pass the popcorn” moment. It’s also a learning opportunity, highlighting important dynamics in how media outlets function. The widespread interest in how Wikipedia evaluates its sources is welcome and overdue. In this post, I’ll consider:

  1. What’s the context of Wikipedia’s decision? What exactly was the decision, how was it made, and how binding is it?
  2. Was it the right decision?
  3. What insights does the media response to Wikipedia’s decision offer?

1. What Wikipedia decided, and how

The decision about the Daily Mail may be the first such decision to be widely reported, but Wikipedia editors routinely make decisions about the suitability of sources. We have to! Hundreds of thousands of volunteer editors write and maintain Wikipedia. Evaluating the relative merits of competing sources has been a central approach to resolving the inevitable disagreements that arise, throughout the site’s history. In fact, Wikipedia has a highly active discussion board devoted to the topic. A 2008 discussion about Huffington Post (with cogent arguments for both inclusion and exclusion, though it did not result in a blanket decision one way or another) is just one of hundreds of discussions where Wikipedia editors weigh sources against the site’s criteria.

Journalist Noam Cohen, Wikimedia director Katherine Maher, and journalism professor Tim Wu discussed Wikipedia’s reliability in January 2017. Photo by King of Hearts, licensed CC BY-SA.

With topics like “fake news” and “alternative facts” dominating recent headlines, much has been said about Wikipedia’s diligence in evaluating sources. The central role of human evaluation sets Wikipedia apart from other top web sites like Facebook and Twitter; and the relative transparency of Wikipedia’s process sets it apart from other top publishers. These distinctions have been highlighted in many venues, by prominent Wikipedians including former Wikimedia Foundation (WMF) trustee James Heilman, WMF director Katherine Maher, and Wikipedia cofounder Jimmy Wales.

The discussion, and the formal finding at its conclusion, are available for public review; click above for the details.

In the context of Wikipedia’s usual process, the Daily Mail decision was pretty unremarkable. A few dozen Wikipedians deliberated, and then five site administrators made the decision based on their understanding of the discussion, and its ties to Wikipedia policy and precedent.

Consensus has determined that the Daily Mail (including its online version, dailymail.co.uk) is generally unreliable, and its use as a reference is to be generally prohibited, especially when other more reliable sources exist. …

One angle seems under-reported: the decision was not based on a mere “majority rule” vote of partisan Wikipedians; it was (as always) an effort to determine the best path forward in light of what is publicly known.

Numerous independent evaluations of the Daily Mail’s diligence were considered by Wikipedia editors. That’s at the heart of how we work, in this and similar cases; we consider what reputable publications have had to say.

2. Wikipedia’s decision

Wikipedia editors made their ruling. Public domain image from Wikimedia Commons

Criticism of the Wikipedia decision boils down to two things:

  • The legitimacy of the process: Were there enough Wikipedia editors involved in the decision? Was the discussion open for long enough before a decision was made?
  • The reasoning of the decision: Was the deliberation thorough and sound? Was it infected with partisan bias, or did it ignore important facts?

To properly address the process questions would require a detailed breakdown of how Wikipedia decisions are made. I’ve taken on such questions elsewhere. Without getting into the details, consider: would you want an analysis of a U.S. Senate decision from somebody who knows little of the Senate’s parliamentary rules, or of the U.S. court system from somebody who’s never read a law journal? Views on Wikipedia’s process from arbitrary media commentators should be viewed with a skeptical eye.

As a longtime Wikipedia administrator, I’ll say this: The number of people involved and the length of time were entirely sufficient to satisfy Wikipedia’s requirements. Like nearly every Wikipedia decision, this one can be overturned in the future, if there’s reason to do so. The questions about Wikipedia’s process are without merit.

So, how diligent were those making the decision? Well, I’m not necessarily better qualified to answer that than any other Wikipedian, so I’m not here to give an overall endorsement or rebuttal; instead, I’d mainly encourage you to read the discussion and decision, and decide for yourself. But, here are a few observations:

It seems the Mail had been widely criticized long before Wikipedia took up the question.

I expect you’ll agree that those in the discussion considered a wide variety of evidence — much of it from independent media commentators. Perhaps, if you’re a careful media observer, you know of something they missed. Wikipedians tend to be receptive to new information; so the best thing to do, if you do have further information, is to present it for consideration. You could, for instance, start a new discussion. But before you do so, consider carefully: is your evidence truly likely to sway the decision? You’ll be asking a lot of volunteers for their attention; please exercise that option with appropriate caution.

You might also ask yourself whether there is evidence of partisan bias in what you read. I didn’t see any, but perhaps you disagree. Again, if it’s there, it’s worth pointing out. As a rule, Wikipedians don’t like the idea that politics might influence the site’s content any better than you do. If such a bias can be demonstrated (which is a lot more than a mere accusation), perhaps something can be done about it.

One point, raised in several venues since the decision, does stand out. To quote the Guardian’s Charles Arthur (archive link):

There’s … a distinction to be made between the [Mail’s] website, which churns stories at a colossal rate and doesn’t always check facts first, and the newspaper, which (in my experience, going up against its writers) does. The problem is that you can’t see which derives from which when all you do is go for the online one.

Andrew Orlowski of the Register noted the brand confusion among the Mail’s various properties, as well. A 2012 New Yorker profile delves deeper, offering useful background on the various brands within the Daily Mail brand.

3. The Guardian’s solid analysis, rooted in non-sequitur

The Daily Mail didn’t like the statement. Public domain image from Wikimedia Commons

The Daily Mail issued a rather amusing statement on the matter; there’s no substance to it, but curious readers may enjoy my line-by-line rebuttal.

But some of the coverage the episode sparked contained genuine insights.

The Guardian offered a solid followup piece, delving into many of the issues involved. But while the reporting was accurate and helpful, the Guardian inadvertently further illustrated the great gulf between traditional media and Wikipedia. Without explanation, the reporter declined to build the story around an interview with one of the decision-makers involved in the case.

Five people ultimately made Wikipedia’s decision about the Daily Mail; they would have made worthy sources for the Guardian story. If they weren’t available, there are more than 1,200 of us with the authority to make such a decision, who can therefore provide expert commentary on the matter. But instead, the Guardian centered its story on Wikimedia executive director Katherine Maher. Maher is of course aware of the various issues, and represented Wikipedia admirably. The choice to feature her in an interview, though, was roughly equivalent to seeking out the U.N. secretary general for comment on a domestic decision of the U.S. Supreme Court.

The Guardian story acknowledged the point in its second paragraph, but inexplicably chose to focus on Maher’s views anyway. What’s going on here?

To earn a comment from the top executive at a ~$100 million organization, it helps to have status, and it helps to have connections. The Guardian, of course, has both. It’s one of the world’s top news outlets, and Wikipedia co-founder Jimmy Wales (one of Maher’s bosses on the organization’s board of trustees) sits on the Guardian’s board.

To get a comment from one of 1,200+ volunteer administrators of the world’s most widely read source of original comment, though, takes good old fashioned legwork. You send a message, you pick up the phone, and if you don’t get a decent response, you go on to the next person. It’s not sexy, and it’s not always fun, but if you stick to it, sooner or later you have a real basis for solid reporting.

Wikipedia, of course, is famous for its hordes of detail-oriented volunteers. If the media needed a clear demonstration that Wikipedia might just be better equipped for certain tasks than traditional publications like the Daily Mail or the Guardian, it need look no further.

by Pete Forsyth at February 15, 2017 06:29 PM

Wikimedia Foundation

WikiIndaba 2017: A continent gathers to chart a path forward

Video by Victor Grigas, CC BY-SA 3.0. You can also view it on Vimeo and Youtube.

The African continent supports about 1.1 billion people, nearly a seventh of the world’s population. As of 2014, 19% of the continent’s population used the internet, including ready access in fourteen of its major cities.

This means there are far more Africans who read or interact with Wikipedia than who contribute to it.

This then raises several questions: why are Africans themselves not contributing to Wikipedia?  Who is contributing? What challenges are Wikipedia volunteers facing in the African continent? What are the successes and what can be learned from other African Wikimedians? And what needs to be done going forward?

Bobby Shabangu, the author. Photo by Zachary McCune, CC BY-SA 4.0.

All these were questions raised at the second-ever WikiIndaba conference, held in Accra, Ghana from 20–22 January 2017. A regional conference of African Wikimedians, WikiIndaba is named for the Zulu tradition of gathering the Indunas (chiefs) to determine problems and find ways forward. 49 Wikimedians from 18 countries, including 13 African nations, attended.

At WikiIndaba, African Wikimedians, stakeholders, and non-Africans, all with an interest in the open movement, came together to identify solutions to the challenges facing Wikipedia user groups and chapters on the African continent and in its diaspora. This was in line with the conference’s theme, which was centered around three key areas:

  • Growth within the African continent
  • Building capacity among Africans
  • Connecting Africans both within the continent and in the diaspora.
Photo by Bayelharriet, CC BY-SA 4.0.

The conference covered several diverging themes, from why Africa’s growth depends on cross collaborations of open movements, to how to choose partnerships and communicate with partners, to the potential of Wikidata.

Two sessions, however, stood out to me the most. One came from Peter Gallert, a university professor in Namibia who focused on the use of oral citations on Wikipedia. Peter noted that there is more to oral citations than what is captured on visual or audio recordings, and the recordings can be misinterpreted by those unfamiliar with the specific cultural context. This can go both ways, Peter noted, as the framing of questions is important. I was particularly fascinated by his example of interviewing a village elder, where Peter noted that something as simple as “what year was electricity brought to the village?” could be misunderstood, as “he might not be counting the way you are counting.” In this instance, one needs to ask which chief was in power when electricity came.

The focus of Peter’s presentation was to show the stark difference between Western and African citations, how each can learn from the other, and to argue that Wikipedia needs to accommodate all kinds of knowledge sources. But with the problem compounded by a culture of speedy article deletion, the conference realised that this would be a difficult feat to accomplish.

Photo by Celestinesucess, CC BY-SA 4.0.

The second session that stood out to me was Asaf Bartov’s presentation on conflict engagement, which focused on giving Wikimedians the necessary tools to deal with or handle conflict better. Asaf, who is a Senior Program Officer for emerging Wikimedia communities at the Wikimedia Foundation, argued that as humans we cannot avoid conflict because it is part of human behaviour. He gave the example of an elephant and a mahout, the elephant being your emotions and the mahout being yourself. “As Wikipedians we must strive to control our emotions,” he said. “That’s what the mahout does to the elephant.” Wikimedians are people too; sometimes they lose their cool. When you are confronted by someone who deletes an article you’ve just created or reverts your edit, Asaf said, we should try to deal with the person on the merits of their action rather than with personal attacks. Bring as many facts as you can to the table, and if you win your case, don’t crush that person’s ego by bragging about your win.

An emphasis of this presentation was that conflict can be prevented by clear and open communication. When one person frames what they want to say by clearly letting people know the topic they want to address, they then have to advocate their claim by supporting their statement, illustrate how their idea will work, and obtain other opinions on the idea. “If you don’t reach consensus,” Asaf said, “you can always solve a dispute by a vote, that’s the culture at Wikimedia.” As Asaf was presenting, I realized that none of this is easy to do, but it is something that can be learned over time.

Ghanaian Wikimedians. Photo by Zachary McCune, CC BY-SA 4.0.

The final part of the conference was open to questions from the floor, which helped to give an overall view of what participants thought about the conference and what needed to be done going forward. For example, one point of concern was that not all regions of the African continent were well represented; more activation work needs to be done by participants in their respective countries to recruit volunteers for better representation. Katherine Maher, the Wikimedia Foundation’s Executive Director, gave closing remarks thanking all who participated and the organising team. She ended with a congratulatory note to Tunisia, which will be the host country for Indaba 2018.

Bobby Shabangu, Wikimedian

by Bobby Shabangu at February 15, 2017 05:50 PM

Gerard Meijssen

#Wikidata - recognition matters

It is not the first time that this blog features Raif Badawi. In 2015 he was awarded the Sakharov Prize, and as the poster has it: “CENSORED. JAILED. FLOGGED. BUT NOT FORGOTTEN.”

He is not forgotten. In 2017 he received the Monismanien Prize. Raif was a blogger; what he had to say got him into trouble. Amnesty International recognises him as a prisoner of conscience.

It is important to register the recognition given to a few. They represent so many more people who are equally deserving. As we recognise awards, we recognise people and associate them with people that have been recognised in a similar way. It does not matter what for; it is how it works.

When recognition is given, it means something. Raif is recognised as a blogger; others are reporters, doctors or Wikimedians.
Thanks,
     GerardM

by Gerard Meijssen (noreply@blogger.com) at February 15, 2017 03:57 PM

February 14, 2017

Wiki Education Foundation

Opening a can of bookworms

Mark A. Sarvary is a senior lecturer in the Department of Neurobiology and Behavior at Cornell University in Ithaca, NY. He is the director of the Investigative Biology Teaching Laboratories. You can read his blogs at www.investigativebiology.cornell.edu and follow him @cornellbiolabs.

When you ask students, “When was the last time you walked into a library to pick up a book?”, they often respond with a blank stare. Books in a library? In the “Become a Wikipedian: Write for Wikipedia and contribute to the World’s understanding of biology” course at Cornell University, students were asked to go and find a book in the library, and use that book as a reference for their biology-related Wikipedia entry. There was a mix of confusion and excitement in the air. Some groups walked out of the classroom and then stopped, wondering how to find a book in a library that they frequently visit to use the computers, to print their term papers, or simply to take a quick nap in one of the comfortable chairs.

Mark, far left, with his students.

“This is just like a treasure hunt,” one student said as she disappeared into the stacks on the 3rd floor of Mann Library at Cornell University.

You may be thinking: “what are the benefits of finding a book in this modern age?” A scene from the first Matrix movie may answer that. Do you remember when Trinity and Neo decide to rescue Morpheus? Suddenly, shelves and shelves of ammunition appear, and they stock up. The stacks in the library are just like that. You go and find a book, but you rarely walk out with only one. These books are organized by topic, so when you locate one, you locate an arsenal of books on the same topic, expressing different opinions or the same information digested by different scientists.

These books, among other publications, become important resources for Wikipedia editors, the so-called Wikipedians. Most students first go to Wikipedia to learn about a new phenomenon. Instead of discouraging this habit, it is better to teach students how to edit these Wikipedia entries, so readers can find reliable information when they open this modern, virtual encyclopedia. Wikipedia students at Cornell University decided to edit challenging topics, such as genetically modified sperm, osmoconformers, the fear-avoidance model in child development, and the fusiform gyrus. While these students may still be taking introductory biology courses, they can become experts on their chosen topic by collecting information from many resources and sharing it with the world through Wikipedia. This is a special form of science communication that empowers the author, starts conversations with other Wikipedians all over the world, and engages readers who would like to learn more about the topic. Editing Wikipedia is a form of global peer review that students enjoy and benefit from.

The Wikipedia course was designed following the Wiki Education Foundation’s suggested template in the Dashboard. This tool attempts to standardize the curriculum regardless of subject discipline in which Wikipedia writing and editing is taught. One of the strengths of this course was the collaborative teaching effort by Ashley Downs and Kelee Pacion, two Cornell librarians, and a biology faculty member, Dr. Mark Sarvary.

What the instructors and the students took away from this course in the past three semesters will be presented at the American Association for the Advancement of Science (AAAS) meeting in Boston. The poster will be displayed in the Education category on Saturday, February 18, 9:30 am-4:30 pm.

If you cannot be at the AAAS meeting, but you are interested in joining this treasure hunt for knowledge and becoming a Wikipedian, start by following the inviting vanilla scent of books to the library.

If you’d like to learn more about how to incorporate Wikipedia into your course, visit teach.wikiedu.org or send us an email at contact@wikiedu.org.

by Guest Contributor at February 14, 2017 05:49 PM

Wikimedia Foundation

Quebec City mosque shooting: In the wake of tragedy, Wikipedia responds with facts

A street in Quebec City at night. Photo by Wladyslaw, CC BY-SA 3.0.

A Quebec City street at night. Photo by Wladyslaw, CC BY-SA 3.0.

“We mourn the death of an esteemed member of the faculty and the university, a devoted man beloved by his colleagues and students.”

This is how Denis Briere, the rector of Université Laval in Quebec, publicly mourned the loss of Khaled Belkacemi, a 60-year-old professor and a father of three children.

Belkacemi and five others were killed on January 29 when a gunman opened fire at a Quebec City mosque shortly after Isha (evening) prayers. Among them were Ibrahima Barry and Tanou Barry, two Guinean cousins who lived with their children in Canada and supported their families back home. Nineteen others were injured.

Many were shocked by the news of the attack, especially those who knew Quebec as a city that sees little crime and violence. The Canadian government condemned the attack, with Prime Minister Justin Trudeau describing it as a “despicable act of terror” that took the lives of “a group of innocents targeted for practicing their faith.”

Thousands of Canadians took to the streets in Quebec, Montreal and other cities to hold vigils, express their outrage over the attack, and show their solidarity with the Canadian Muslim community.

Only two hours after the shooting, a small article about it was created on the English Wikipedia by the user Rossbawse. Within the next 24 hours, Wikipedians covered the attack in 14 languages, and those articles received over 87,000 views. As of publishing time, there are 21 articles about the attack in different languages on Wikipedia, which together have received over 240,000 views.

Ahmed Al Elq, a Wikipedian from Saudi Arabia, wanted to add a snippet about the mass shooting to the Arabic Wikipedia’s “In the news” section on the main page, but he could find no article about it in Arabic—so he created it himself. “I have translated parts of the English Wikipedia article into Arabic and added some reliable sources to it,” says Al Elq. “Then, the page was expanded by other Wikipedia volunteers.”

Al Elq believes that documenting momentous news on Wikipedia is crucial: “Wikipedia articles are more like comprehensive reports. They can be categorized and easily accessed. Unlike media websites, where the news is posted in scattered places and pages are not easy to link to, Wikipedia articles are usually coherent and continuously updated.”

In the days following the incident, the Wikipedia page about it was updated hundreds of times to reflect developments in the case and investigation findings.

“Shortly after the shootings, fake news surfaced from both sides of the spectrum: that the shooters were two French Canadians or that the shooters were Syrian refugees,” says Patar Knight, a Canadian editor and administrator on the English Wikipedia and one of the main contributors to the article. He explains:

“Even reliable sources weren’t immune from mistakes during this event. Police leaks to La Presse and Radio Canada (CBC) resulted in many outlets publishing that one of the suspects arrested that night was a Moroccan. By noon the next day, it turned out that that “suspect” was actually a witness who had called 911 and had been administering first aid before being detained for fleeing from police after he mistook them for the gunman returning. Many outlets responsibly deleted/updated their articles to reflect the truth, but others did not, most notably Fox News, which left a tweet with the inaccurate information up for the better part of a day, until the Canadian Prime Minister’s Office formally complained.”

According to Newsweek, sharing fake news about this attack “jeopardized the trust in digital media”—in this case, it fueled negative perceptions of the sequence of events. Al Elq created the Arabic Wikipedia article about the shooting for many reasons, but most importantly for “authenticity,” he says. “Some people question Wikipedia’s authenticity, [but it helps spread] the news with references and neutrality,” he explains.

Like Al Elq, Patar Knight joined the English article editors in order to “increase the quality… since it’s an important topic, and Wikipedia is in a good place to deal with the confusion around incidents like this one.”

In November 2015, when an even more brutal terrorist attack hit Paris, Patar Knight was one of the main contributors to the article about it. Though he finds editing such articles to be “pretty soul-crushing,” as the editor has to “sift through stories, some of which contain very grisly details,” he is comforted by his firm belief that Wikipedia plays an important role when it comes to breaking news:

“Wikipedia has the good fortune of being able to stay up to date as long as we have vigilant editors. Outdated and inaccurate information can be used to spread hate on all sides of the spectrum, so Wikipedia editors have the responsibility of getting the story right.”

Samir Elsharbaty, Digital Content Intern
Wikimedia Foundation

by Samir Elsharbaty at February 14, 2017 05:43 PM

Gerard Meijssen

#Wikidata - Anne Ridley and #AcademiaNet

AcademiaNet is a database of excellent female scientists. It has identifiers for all of them, and thanks to Magnus’ excellent Mix’n’Match tool, many of these notable scientists are now part of Wikidata.

Anne Ridley is one of them. AcademiaNet provides much more information. It would be good if we confirmed information against the AcademiaNet database, and, if they are so inclined, we could use their data to populate entries about these notable women, making it even easier to write Wikipedia articles about them.

Mrs Ridley was the first winner of the Hooke Medal, and she has received additional awards as well.
Thanks,
     GerardM

by Gerard Meijssen (noreply@blogger.com) at February 14, 2017 02:25 PM

February 13, 2017

Wikimedia Foundation

Niels Christian Nielsen appointed to Wikimedia Endowment Advisory Board

Photo courtesy of Niels Christian Nielsen.

Photo courtesy of Niels Christian Nielsen.

Niels Christian Nielsen, an international leader in technology, research, and education, has been appointed to the Wikimedia Endowment Advisory Board as its fourth permanent member. He joins Jimmy Wales, the founder of Wikipedia; Annette Campbell-White, a successful venture capitalist and founder of MedVenture Associates; and Peter Baldwin, professor and co-founder of the Arcadia Foundation.

Niels will join the board that is entrusted with overseeing the Wikimedia Endowment, a permanent source of funding to ensure Wikipedia and the Wikimedia projects thrive for generations to come.

“Wikimedia is increasingly one of the most influential forces in the twenty-first century,” Nielsen says:

“It’s a global community of tens of thousands of contributors, and serves hundreds of millions of users globally. It offers an open model that enhances the commitment to free knowledge and verifiable fact. In a time of fake news, alternative facts and changing roles of technology platforms, the value of Wikimedia is indispensable. We need to ensure its longevity and vitality—its ability to innovate itself—for decades, centuries to come. I want to contribute my part towards that purpose.”

Niels serves or has served on the boards of some thirty companies, including twelve as chairman. These include both public and privately held companies around the world and in multiple sectors: IT, financial services, advanced manufacturing, media, professional services, edtech, biotech and venture capital.

“Niels Christian Nielsen has the kind of global connections and experience that mirror Wikipedia’s range and influence,” says current Endowment Board member Peter Baldwin. “His voice will be crucial to securing Wikipedia’s continuing presence.”

Earlier in his career, Niels was the founder and CEO of Catenas, a venture to create a globally leading professional services corporation. He was part of the team that, through a sequence of mergers, created the Danish Technological Institute (DTI), a technology consultancy and industrial research lab.

Niels has initiated a number of large strategic projects, involving multiple participants and public-private collaboration. He designed the Danish Network Cooperation program, which was subsequently adapted in a number of other countries, often with Niels as strategic advisor. He proposed the creation of the Learning Lab in Denmark and was a member of its Board and briefly its interim CEO, helping to contribute to innovation in education. As Managing Director of World Refugee School he plays a central role in this private sector initiative to leverage technology and bring scalable, affordable quality education to the world’s refugee children.

Niels has also been involved in Green Tech initiatives for decades through his work with the Danish Energy Research Council, the Copenhagen Climate Council (which he helped form), the Turkish government’s Sustainable Growth Strategy for South East Anatolia, and large-scale green urban developments.

Niels speaks frequently on business, and he has contributed to a number of books on network cooperation; the knowledge-based economy; leadership; competitiveness and green growth strategies; technology, services and organizational innovation; and education, including the Danish bestseller Verdens Bedste Uddannelsessystem (The World’s Best Education System). He has advised governments in Scandinavia, Spain, Portugal, the UK, Australia, New Zealand, the US, Canada, and Turkey.

A native of Denmark, Nielsen holds a Master of Arts degree in Philosophy. He is a visiting scholar at The University of California Berkeley and an Adjunct Professor at Copenhagen Business School.

Marc Brent, Endowment Director
Wikimedia Foundation

by Marc Brent at February 13, 2017 07:37 PM

Wikimedia Foundation releases first transparency report of 2017

Photo by Diego Delso, CC BY-SA 4.0.

Photo by Diego Delso, CC BY-SA 4.0.

The Wikimedia Foundation’s mission is to enable everybody to freely share knowledge with others around the world. We don’t create the projects: you do. Thousands of volunteers write and edit content, contribute images, and more. Sometimes, we get requests from private parties or governments to provide information about these volunteers, or to add, remove, or change content on the Wikimedia projects. We work hard to defend user privacy and the freedom to contribute. So if these requests are inappropriate, or don’t meet our standards, we push back.

Twice a year, we publish our Transparency Report, providing details about the number of requests we received, their types and countries of origin, and other information. The report also includes an FAQ and stories about interesting requests. This newly-issued report covers information about requests we received between July and December 2016.

We report on five types of requests:

Content alteration and takedown requests. Wikimedia content is created and controlled by the volunteers who disseminate free knowledge on the projects. When governments, individuals, or companies contact the Foundation in an attempt to change project content, we encourage these requesters to discuss their questions and concerns with experienced project editors. In the last six months of 2016, we received 187 requests to alter or remove project content, and did not grant any of them. Two of these requests came from government entities.

Copyright takedown requests. Most content on the Wikimedia projects is freely licensed or in the public domain. Volunteers are mindful of copyright law, and work hard to keep copyrighted content off the projects. Sometimes, however, we receive Digital Millennium Copyright Act (DMCA) notices from copyright holders who claim that their work has appeared on the projects without their permission. We then evaluate whether or not the material is protected by copyright, if it appears on the projects under a fair use or other copyright exception, and other factors. Between July 1 and December 31, 2016, we received 12 DMCA notices. We granted four of these. Due to the diligence of Wikimedia volunteers, we receive a strikingly low number of DMCA notices compared with many other web platforms.

Right to erasure. In the second half of 2016, the Wikimedia Foundation received one request to remove information from the projects based on the right to erasure, also called the right to be forgotten. We did not grant the request. The right to erasure was established in the European Union by a 2014 Court of Justice of the European Union decision. It allows individuals to request that search engines delist certain pages from search results for their name. The Foundation has previously expressed our concerns about the right to erasure. We believe in a right to remember that protects everyone’s ability to receive and share information. In October 2016, we petitioned to intervene in a case that could expand delisting from EU domains to all domains worldwide. Despite the fact that the Wikimedia projects are not search engines, we do receive a small number of requests to delete information from the projects that cite the right to erasure as the basis for removal.

Requests for user data. Sometimes governments, or private organizations or individuals, submit requests for information about Wikimedia users. In the last six months of 2016, we received 13 such requests for nonpublic user data. These include both informal requests and more formal processes, such as court orders or warrants. We do not disclose user data in response to a request that does not follow our Requests for User Information Procedures and Guidelines (which includes a provision for emergency conditions, as noted below). The legal team carefully evaluates each request for compliance with our policies and applicable law. During this reporting period, we disclosed information in response to one request.

Emergency disclosures. Under extraordinary circumstances, the Foundation may provide information to authorities in order to protect a user or other individuals from serious harm. Sometimes, volunteers will notice something troubling on the projects, and bring it to our attention: for example, if they discover a bomb threat, or comments suggesting that a user plans to commit suicide. In these rare circumstances, we may voluntarily provide information to the proper authorities in order to bring about a peaceful resolution. Additionally, we provide an emergency request procedure for law enforcement to seek information that may prevent imminent harm. From July to December 2016, we voluntarily disclosed information in 17 cases, and provided data in response to two emergency requests.

Our biannual Transparency Report is a window into the types of requests that the Wikimedia Foundation receives from people, governments, and businesses around the world. The Wikimedia movement values transparency, privacy, and freedom of expression, and this report is one of the ways in which we affirm our commitment to those principles, and to defending and supporting the hard work of Wikimedia contributors. We will continue to resist efforts to discover nonpublic information about our users, and will support the communities’ prerogative to determine project content. We invite you to read the entire Transparency Report to learn more about these efforts, and to read our blog posts on the last five transparency reports for historical information.

Jim Buatti, Associate Counsel
Aeryn Palmer, Legal Counsel

This transparency report would not have been possible without assistance from various individuals, including Michelle Paulson, Jacob Rogers, Siddharth Parmar, James Alexander, Rachel Stallman, Ana Maria Acosta, Allison Davenport, Leighanna Mixter, Tarun Krishnakumar, and the entire Wikimedia Communications team.

by Jim Buatti and Aeryn Palmer at February 13, 2017 06:46 PM

Nurunnaby Chowdhury Hasive, building the Bengali Wikipedia

Photo by Victor Grigas, CC BY-SA 3.0.

Photo by Victor Grigas, CC BY-SA 3.0.

“I love Wikipedia,” Nurunnaby Chowdhury Hasive says. “It’s the only encyclopedia where I can get information in different languages and add content in my language.”

Hasive has been editing Wikipedia since 2008. In the last nine years, he has tried to add something new to Wikipedia every day, and has visited over 30 universities across his country, Bangladesh, to tell the students about Wikipedia. He has also co-organized many events to encourage people to edit, and published a style guide book about the Bengali Wikipedia. He is constantly driven to search for new ways to support the Wikimedia movement, the community of people who create and maintain the Wikimedia projects, like Wikipedia.

Hasive’s devotion to the idea of free knowledge led him to join Wikimedia Bangladesh (WMBD), an independent chapter supporting Wikimedia in his country. With the chapter, he’s helped to integrate Wikipedia Zero in Bangladesh, a program that gives cell phone users free access to Wikipedia, and organized the first-ever Wiki Loves Monuments in his country. The international contest encourages photographers to share their photos of their country’s monuments with the world, free for anyone to use.

Hasive now serves as a board member of the Bengali chapter, where he helps with communications and supports the community by coordinating large events. His previous experiences with the Bangladesh Open Source Network (BdOSN), an active volunteer group in his country, helped prepare him for the role.

“We held an anniversary program in 2015 to celebrate the Bengali Wikipedia’s tenth birthday,” Hasive recalls. It was three months long and featured two conferences, each with workshops and seminars, and various activities both online and off. With these, they were able to reach into every division of Bangladesh.

Bengali is spoken by nearly 250 million people, making it one of the most widely spoken languages in the world. With the Bengali Wikipedia at nearly 48,000 articles, representing Bengali culture on the internet feels like a national duty to Hasive.

“We have recently started work on two new projects in my country,” says Hasive. “The first one is called “Women in Wikipedia,” which aims to engage more Bengali women in editing Wikipedia (rather than just reading). The second has seen us working with a group of medical students who have shown interest in addressing the shortage of medical content on the Bengali Wikipedia.”

To get his message to anyone interested in Wikipedia in Bangladesh, he has written a book explaining how Wikipedia works and how to edit it. He published the book under a free license so that anyone can copy and republish it for free.

The time he spends helping other editors does not distract him from editing. Hasive has recently started a 100 Wiki Days challenge, where the participant commits to creating one new Wikipedia article every day for 100 consecutive days.

For his challenge, Hasive is creating and editing articles about soldiers who fought in the Liberation War of 1971, particularly those who received the Bir Sreshtho, Bir Uttom, Bir Bikrom, and Bir Protik medals that were awarded to 767 people for the role they played in the war. But for Hasive, the challenge is not limited to only 100 days. Hasive has a long-term goal of creating and developing articles about each medal recipient.

However, Wikipedia is not the only outlet for Hasive’s writings. He works as a journalist during the day and spends the bulk of his free time thinking about knowledge sharing on Wikipedia.

“My dream is to connect all people from my country with Wikipedia—people who are looking for real content. I try to recruit editors with different professional backgrounds, such as doctors, engineers, etc. They will help build a respected and legitimate encyclopedia.”

Samir Elsharbaty, Digital Content Intern, Wikimedia Foundation

by Samir Elsharbaty at February 13, 2017 05:56 PM

Resident Mario

February 12, 2017

Gerard Meijssen

#Wikimedia Foundation and #Energy

The energy use of the Wikimedia Foundation makes the WMF a dirty supplier of data. The WMF’s position is that it has no data centre of its own and relies on the energy provided by its hosting company.

When you look at energy use in relation to Wikipedia, there are two components: supplying the data and consuming the data. The WMF has a problem with one of them, and it has two problems that it can solve.

The WMF is building a war chest that should enable it to function in times when the current funding effort proves problematic; it is investing for a rainy day. Another problem for the WMF is that many people do not use Wikipedia, and one of the reasons why is a lack of energy. The purpose of what the WMF does is to share the sum of all knowledge.

We could invest in generating energy near the locations of our servers, but we could also invest in increasing our reach and in green energy at the same time. If we provide clean energy and target schools, they would still have to pay us back, including some profit.

When the total amount of energy generated is at least the amount of energy used by our servers, we have an argument for why we are not a dirty data provider. More importantly, we give a boost to a world that uses green energy, and by focusing where it matters most, we ensure that no new dirty energy is used.

There probably are existing funds that make such investments. If the WMF adds its weight to them and promotes this as an open investment project, many other people are likely to invest as well, making the effort mushroom.
Thanks,
      GerardM

by Gerard Meijssen (noreply@blogger.com) at February 12, 2017 09:18 AM

#Wikimedia language committee and #trust

The Wikimedia language committee was originally created to stem the flow of new projects that simply did not work. At the end of a lengthy process, a specialist is sought to certify that the language used is indeed the language the new Wikipedia is meant to be in.

Rehmat Aziz Chitrali is such a specialist for the Khowar language. When you read one of the seventeen Wikipedia articles about him, it is obvious that he is more than qualified for the task.

With so many articles, the awards he received may well be documented in some languages but not in others. On Wikidata, several (though not all) of his awards have been added to do him some justice.

Once the news comes in that the Khowar incubator is indeed in Khowar, we will soon have a new Wikipedia.
Thanks,
      GerardM

by Gerard Meijssen (noreply@blogger.com) at February 12, 2017 08:41 AM

Frank Schulenburg

I dropped out of 100wikiCommonsDays – here’s why I enjoyed it anyway

Is it possible to upload a photo to Wikimedia Commons, Wikipedia’s media repository, every day for 100 consecutive days? The answer is yes. To date, two photographers have completed the challenge, and many more will hopefully follow their lead. I started my own 100wikiCommons challenge and dropped out after uploading 68 photos. Here’s why I […]

by Frank Schulenburg at February 12, 2017 02:34 AM

February 10, 2017

William Beutler

What’s the Truth, What’s the Use? On Wikipedia and the Daily Mail


Earlier this week, Wikipedia editors decided to restrict the use of a publication as a source for information in its articles, and then a funny thing happened: it made international news. First The Guardian, and then many other publications, reported on the outcome of a proposal to prohibit the UK tabloid Daily Mail from reference sections in Wikipedia articles. The first paragraph of the decision summary reads:

Consensus has determined that the Daily Mail (including its online version, dailymail.co.uk) is generally unreliable, and its use as a reference is to be generally prohibited, especially when other more reliable sources exist. As a result, the Daily Mail should not be used for determining notability, nor should it be used as a source in articles. An edit filter should be put in place going forward to warn editors attempting to use the Daily Mail as a reference.

It’s not the first time a source has been “blacklisted”,[1] but most of the time it’s because of spamming efforts, and the Daily Mail is by far the most high-profile recent example. In fact, it has the biggest online reach of any news website around the world, according to comScore. An effort is now under way to replace all existing Daily Mail citations with better sources.[2]

To understand what happened, it’s helpful to know about how Wikipedia considers the various third-party sources it prefers, allows, and prohibits as citations. The official guideline is called “Identifying reliable sources” and, over the course of several thousand words, it seeks to define sources with a “reputation for fact-checking and accuracy”.

The lengthy discussion on the “Daily Mail RfC”[3] includes numerous examples of why editors believe it fails that test. The pithiest take—

Under NO circumstances should the Daily Mail be used for anything, ever. They have proven themselves to be willing to make up fake quotes and to create doctored pictures, and nothing they say or do is to be trusted. Even in the cases that some of the editors in this discussion believe to be OK (sports scores, for example), if it really happened then the Daily Mail won’t be the only source and if the Daily Mail is the only source, it probably didn’t happen.

—was followed by links to numerous examples of falsehoods and outright fabrications from the last few years.[4] As the newspaper’s own Wikipedia article demonstrates, the Daily Mail has been the subject of multiple successful libel suits, not to mention other controversies calling the paper’s trustworthiness into question.

To be sure, the Daily Mail is not the only publication that cares for clicks more than facts, but the determination of editors was that it cares for attention to the exclusion of facts.

An interesting, comparatively ancient[5] example was whether to acknowledge the National Enquirer‘s reporting on then-U.S. presidential candidate John Edwards’ extramarital affair (producing a child, no less) when no one else had confirmed it. Wikipedians argued about it until the story was confirmed by others. The Enquirer received its share of reluctant praise, but it still isn’t generally allowed as a source on Wikipedia.

The media’s fascination with the prohibition may stem from their own trepidation: what if we’re next? They shouldn’t worry—the Daily Mail is an outlier case, and perhaps a useful caution for publications that skirt the line between truth and truthiness in their drive for traffic.

Still, it would be very bad if this became a trend. The difference between “you have to be careful with this publication” and “we simply cannot trust this publication” is hard to define, but important. The latter category should be a very small number. Moreover, Wikipedia should be careful not to apply a political test, even a de facto one, for publications. Wikipedians frequently argue over the political leanings of certain sources, but in all but a few cases, reliability can be established separately from a given point of view. The reasons must be based on trustworthiness of factual reporting, not the gloss added by the writer or by editors.

The reliability of Wikipedia depends on the reliability of reporting in the news media, which is the source of most information in articles about current and recent events. Journalism is in the midst of a long, slow decline accelerated by the internet (first Craigslist, then Facebook) and over the long term, this is bad news for Wikipedia, too. Maybe there is a way for Wikipedia to establish reliability for smaller publications that don’t look like traditional newspapers and lack their reach. But you don’t improve Wikipedia by allowing marginal sources, even if it necessarily limits what can be covered in its virtual pages. Fortunately, the Daily Mail doesn’t report on much that’s encyclopedic to begin with.

P.S. The first part of this post title is taken from a gorgeous non-album track by Radiohead, which happens to be called: “The Daily Mail”. You can listen to it here:

Notes

1. There is no official blacklist.
2. It has a long way to go.
3. short for “Request for comment”
4. Here, here, and here, for example, with a few cited to the Guardian itself.
5. I thought I’d written about this, but apparently this just slightly pre-dated The Wikipedian—I was however quoted by Wired about it.

by William Beutler at February 10, 2017 11:00 PM

Weekly OSM

weeklyOSM 342

01/31/2017-02/06/2017

Mapping

  • Transifex is a tool that helps localize content and reach thousands of people. If you want to help make learnOSM available in your language, start translating using Transifex.

  • Jinal Foflia writes a post about getting started with OpenStreetMap, the importance of open data and why one should contribute to OpenStreetMap.

  • Voonosm asks in the forum how periodically occurring lakes could be mapped.

  • Volker Schmidt asks in the tagging mailing list about the best way to tag beef fattening stations.

  • Pavel Zbytovský would like to extend Simple Indoor Tagging with a scheme he calls CoreIndoor. Sadly, most of his extensions are based on wrong assumptions and misinterpretations of Simple Indoor Tagging, whose documentation should be improved.

  • Joachim asks on the tagging mailing list for comments for an amenity = snow_removal_station where trucks can be freed from ice sheets.

  • Unlike more enlightened places, England and Wales have a system of regulated access across otherwise private land. On the talk-gb mailing list, Dave F mentions that the local government’s record of these routes might not match the actual path on the ground. The discussion then continues about which of these may be considered "correct".

  • In the Swiss mailing list Hans Herzig asks about the tagging of gorges.

Community

  • Wa Mbedmi writes a post introducing OpenStreetMap Senegal and showcasing the importance of OpenStreetMap and its impact on the lives of the citizens.

  • Hernán De Angelis published a post praising OpenStreetMap on his blog; he experiences the project through his Garmin device.

  • The Nicaraguan OSM community invites participants to a workshop they organise about urban cycling in Managua.

  • User RichRico writes about the to-fix plugin and how it, coupled with the TIGER data layer, can be used to align misaligned roads in the US.

  • Sajjad Anwar from Mapbox writes about using OSMcha and OSM-compare to validate and analyse changes happening in OpenStreetMap. He urges the community to join the effort to keep OpenStreetMap the best map out there.

  • Prompted by progress in Belgium, Joost Schouppe asks whether there are examples of other governments incorporating OSM into their data management.

Imports

  • Mappers in the West Midlands are going to import updated NaPTAN data (additional fields and improved coordinate accuracy). A discussion on the Imports mailing list has not been started yet.

Events

  • The call for logo designs for the SOTM Latam 2017 taking place in Lima, Peru, is out.

  • The exact date of State of the Map US 2017 has not been announced yet, only that it will take place in Boulder, Colorado, in October. The date of the international State of the Map conference in Aizuwakamatsu, Japan, however, is known: 18–20 August 2017.

  • Stefan Keller invites everyone to the 8th Micro Mapping Party in Rapperswil, Switzerland on March 10th.

Humanitarian OSM

  • HOT launches its new micro grants program to support the development of local OSM communities, giving them access to equipment and helping them increase skills, capacity, and experience.

  • HOT announced that OpenAerialMap is officially no longer in beta status.

  • The Guardian reported on the number of girls in Tanzania who manage to escape genital mutilation thanks to collaborative crowd-mapping of the places they can flee to and how to get there.

Maps

  • Wheelmap, the map for wheelchair-accessible places based on OSM, added support for 27 additional POI types.

  • GeekWire reports on the Access Map research project at the University of Washington, which, among other things, provides routing for pedestrians and wheelchair users using a mix of different data sources. The project (its import, its procedure and its tagging scheme) met with rejection from some people in the OSM community.

Open Data

  • Citizen scientists in Stuttgart measure the particulates in their city themselves. The project is also active in other areas, as you can see on this map.

Software

Programming

  • Jochen Topf blogs about the new osmium-tool extract command, which creates OSM data extracts faster than any other existing tool while producing several extracts in the same run.

  • Chris Loer reports how Mapbox improved label rendering using better line-breaking algorithms. This can be particularly tricky when displaying right-to-left languages.

  • The maximum size of changesets was restricted to 10,000 (down from 50,000). This should only affect editor developers.

  • Mariusz Rogowski complains on the dev mailing list about the quality of Nominatim and the project’s ability to attract new developers. Sarah and Frederik give detailed answers and, among other things, clarify why erroneous code is not included in the project.

Releases

Did you know …

  • "Trending Places in OSM" by Bhavya Chanra? The project analyses anonymised server logs, derives the places that are most frequently visited, and publishes them on Twitter.

Other “geo” things

  • ITECHPOST reports that the Philippine disaster management initiative "NOAH" ceased at the end of February. OpenStreetMap was heavily involved in this project last year, as this blog post shows.

  • The Guardian featured a small quiz by Alex Szabo-Haslam, in which one should identify cities by their waterways.

Upcoming Events

This weeklyOSM was produced by Hakuch, Nakaner, Peda, Polyglot, Rogehm, SeleneYang, derFred, jcoupey, jinalfoflia, keithonearth, roirobo.

by weeklyteam at February 10, 2017 05:11 PM

February 08, 2017

This month in GLAM

This Month in GLAM: January 2017

by Admin at February 08, 2017 07:07 PM

Pete Forsyth, Wiki Strategies

Andy Mabbett (User:Pigsonthewing)

It was twenty years ago today…

According to Google, it was twenty years ago today that I made my first comment in an on-line forum (Google doesn’t link to my comment itself, which, it seems, has escaped the archives, but to one which quotes it).

Champagne uncorking photographed with a high speed air-gap flash

It was a post to the then-active alt.music.pink-floyd. It includes the obligatory typo (PInk) and an embarrassingly-mangled signature (I shared a dial-up account with my then boss, Graham). The content was relatively trivial.

But even so, I had no idea where it would lead me. It was the first step on a life-changing journey; being online effectively became my career, first as a website manager, then as a freelance consultant, and as a Wikipedian (and Wikimedian) in Residence. It greatly enhanced my life experiences, created opportunities for travel, and is the foundation of many long-lasting friendships with people from all around the world.

So I’m using this anniversary as an excuse to ask you all to call for an open and fair internet. Join the Open Rights Group or a similar organisation. Let your MP or other representative know where you stand. Don’t let vested interests spoil what we have made.

And please, forgive my twenty years of awful typing.

The post It was twenty years ago today… appeared first on Andy Mabbett, aka pigsonthewing.

by Andy Mabbett at February 08, 2017 03:45 PM

February 07, 2017

Wikimedia Tech Blog

Algorithms and insults: Scaling up our understanding of harassment on Wikipedia

3D representation of 30 days of Wikipedia talk page revisions, of which 1,092 contained toxic language (shown as red if live, grey if reverted) and 164,102 were non-toxic (shown as dots). Visualization by Hoshi Ludwig, CC BY-SA 4.0.

“What you need to understand as you are doing the ironing is that Wikipedia is no place for a woman.” –An anonymous comment on a user’s talk page, March 2015

Volunteer Wikipedia editors coordinate many of their efforts through online discussions on “talk pages” which are attached to every article and user-page on the platform. But as the above quote demonstrates, these discussions aren’t always good-faith collaboration and exchanges of ideas—they are also an avenue of harassment and other toxic behavior.

Harassment is not unique to Wikipedia; it is a pervasive issue for many online communities. A 2014 Pew survey found that 73% of internet users have witnessed online harassment and 40% have personally experienced it. To better understand how contributors to Wikimedia projects experience harassment, the Wikimedia Foundation ran an opt-in survey in 2015. About 38% of editors surveyed had experienced some form of harassment, and subsequently, over half of those contributors felt a decrease in their motivation to contribute to the Wikimedia sites in the future.

Early last year, the Wikimedia Foundation kicked off a research collaboration with Jigsaw, a technology incubator for Google’s parent company, Alphabet, to better understand the nature and impact of harassment on Wikipedia and explore technical solutions. In particular, we have been developing models for the automated detection of toxic comments on users’ talk pages, applying machine learning methods. We are using these models to analyze the prevalence and nature of online harassment at scale. This data will help us prototype tools that visually depict harassment, helping administrators respond.

Our initial research has focused on personal attacks, a blatant form of online harassment that usually manifests as insults, slander, obscenity, or other forms of ad hominem attack. To amass sufficient data for a supervised machine learning approach, we collected 100,000 comments on English Wikipedia talk pages and gathered 1 million annotations from 4,000 crowd-workers who judged whether the comments were harassing. Each comment was rated by 10 crowd-workers, whose opinions were aggregated and used to train our model.

This dataset is the largest public annotated dataset of personal attacks that we know of. In addition to this labeled set of comments, we are releasing a corpus of all 95 million user and article talk comments made between 2001 and 2015. Both data sets are available on FigShare, a research repository where users can share data, to support further research.

The machine learning model we developed was inspired by recent research at Yahoo in detecting abusive language. The idea is to use fragments of text extracted from Wikipedia edits and feed them into a machine learning algorithm called logistic regression. This produces a probability estimate of whether an edit is a personal attack. With testing, we found that a fully trained model achieves better performance in predicting whether an edit is a personal attack than the combined average of 3 human crowd-workers.
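The shape of such a model can be sketched in a few lines: extract character n-gram "fragments" from a comment and feed them into logistic regression to get an attack probability. The sketch below is not the authors' actual implementation; the training data, helper names, and hyperparameters are invented for illustration, and the real model was trained on a million crowdsourced annotations rather than four toy sentences.

```python
import math
import re
from collections import Counter

def char_ngrams(text, n=3):
    """Bag of character n-grams: a simple stand-in for the text
    fragments extracted from a comment."""
    text = re.sub(r"\s+", " ", text.lower())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def train(comments, labels, epochs=200, lr=0.5):
    """Fit logistic regression weights over n-gram features with
    plain stochastic gradient descent on the log-loss."""
    feats = [char_ngrams(c) for c in comments]
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for x, y in zip(feats, labels):
            z = bias + sum(weights.get(g, 0.0) * v for g, v in x.items())
            p = 1.0 / (1.0 + math.exp(-z))  # sigmoid: probability of "attack"
            err = p - y                     # gradient of log-loss w.r.t. z
            bias -= lr * err
            for g, v in x.items():
                weights[g] = weights.get(g, 0.0) - lr * err * v
    return weights, bias

def attack_probability(weights, bias, comment):
    """Score a new comment with the trained model."""
    x = char_ngrams(comment)
    z = bias + sum(weights.get(g, 0.0) * v for g, v in x.items())
    return 1.0 / (1.0 + math.exp(-z))

# Tiny invented training set, for illustration only.
comments = ["you are an idiot", "thanks for the helpful edit",
            "what a stupid idiot", "nice work on the article"]
labels = [1, 0, 1, 0]
w, b = train(comments, labels)
print(attack_probability(w, b, "idiot"))         # should score high
print(attack_probability(w, b, "helpful edit"))  # should score low
```

Logistic regression over sparse text features is attractive here because it is cheap enough to score every one of the thousands of discussion edits made each day, while still producing a calibrated probability rather than a bare yes/no label.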

Prior to this work, the primary way to determine whether a comment was an attack was to have it annotated by a human, a costly and time-consuming approach that could only cover a small fraction of the 24,000 edits to discussions that occur on Wikipedia every day. Our model allows us to investigate every edit as it occurs to determine whether it is a personal attack. This also allows us to ask more complex questions around how users experience harassment. Some of the questions we were able to examine include:

  1. How often are attacks moderated? Only 18% of attacks were followed by a warning or a block of the offending user. Even among users who have contributed four or more attacks, moderation occurs for only 60% of them.
  2. What is the role of anonymity in personal attacks? Registered users make two-thirds (67%) of attacks on English Wikipedia, contradicting a widespread assumption that anonymous comments by unregistered contributors are the primary contributor to the problem.
  3. How frequent are attacks from regular vs. occasional contributors? Prolific and occasional editors are both responsible for a large proportion of attacks (see figure below). While half of all attacks come from editors who make fewer than 5 edits a year, a third come from registered users with over 100 edits a year.

Chart by Nithum Thain, CC BY-SA 4.0.

More information on how we performed these analyses and other questions that we investigated can be found in our research paper:

Wulczyn, E., Thain, N., Dixon, L. (2017). Ex Machina: Personal Attacks Seen at Scale (to appear in Proceedings of the 26th International Conference on World Wide Web – WWW 2017).

While we are excited about the contributions of this work, it is just a small step toward a deeper understanding of online harassment and finding ways to mitigate it. The limits of this research include that it only looked at egregious and easily identifiable personal attacks. The data is only in English, so the model we built only understands English. The model does little for other forms of harassment on Wikipedia; for example, it is not very good at identifying threats. There are also important things we do not yet know about our model and data; for example, are there unintended biases that were inadvertently learned from the crowdsourced ratings? We hope to explore these issues by collaborating further on this research.

We also hope that collaborating on these machine-learning methods might help online communities better monitor and address harassment, leading to more inclusive discussions. These methods also enable new ways for researchers to tackle many more questions about harassment at scale—including the impact of harassment on editor retention and whether certain groups are disproportionately silenced by harassers.

Tackling online harassment, like defining it, is a community effort. If you’re interested or want to help, you can get in touch with us and learn more about the project on our wiki page. Help us label more comments via our wikilabels campaign.

Ellery Wulczyn, Data Scientist, Wikimedia Foundation
Dario Taraborelli, Head of Research, Wikimedia Foundation
Nithum Thain, Research Fellow, Jigsaw
Lucas Dixon, Chief Research Scientist, Jigsaw

Editor’s note: This post has been updated to clarify potential misunderstandings in the meaning of “anonymity” under “What is the role of anonymity in personal attacks?”

by Ellery Wulczyn, Dario Taraborelli, Nithum Thain and Lucas Dixon at February 07, 2017 06:30 PM

Sam Wilson

New feature for ia-upload

, Fremantle.

I have been working on an addition to the IA Upload tool these last few days, and it's ready for testing. Hopefully we'll merge it tomorrow or the next day.

This is the first time I've done much work with the internal structure of DjVu files, and really it's all been pretty straight-forward. A couple of odd bits about matching element and page names up between things, but once that was sorted it all seems to be working as it should.

It's a shame that the Internet Archive has discontinued their production of DjVu files, but I guess they've got their reasons, and it's not like anyone's ever heard of DjVu anyway. I don't suppose anyone other than Wikisource was using those files. Thankfully they're still producing the DjVu XML that we need to make our own DjVus, and it sounds like they're going to continue doing so (because they use the XML to produce the text versions of items).

Update two days later: this feature is now live.


by Sam at February 07, 2017 04:18 PM


Gerard Meijssen

#Wikidata - The Rumford Medal

The Rumford Medal is a prestigious award. It has been awarded to European scientists since 1800. The first recipient was Count Rumford himself.

The best information on the award winners was in the Portuguese Wikipedia article. For once the data in the English article was not really usable, because the names were combined in a template that has since been deprecated.

The Portuguese Wikipedia includes more articles, so several Wikidata items only have articles in Portuguese. The 2016 awardee only has an English article, and it does not mention that Mr Ortwin Hess received any awards at all. The article suggests that the award was last given in 2014...

When you add information from Wikipedia, it is important to inspect the data. When everything goes as planned, the items for the winners record that they received at least one award. Someone else may then append more data, for instance the date when each person received their award.
Thanks,
       GerardM

by Gerard Meijssen (noreply@blogger.com) at February 07, 2017 03:50 PM

Wikimedia Foundation

The Metropolitan Museum of Art makes 375,000 images of public domain art freely available under Creative Commons Zero

Entrance to the Met. Photo by Arad, CC BY-SA 3.0.

Today, the Metropolitan Museum of Art in New York City, known by many as the Met, announced that it is placing more than 375,000 images of public-domain works in the museum’s collection under a Creative Commons Zero (CC0) dedication. The release, which covers images of the great majority of the museum’s holdings, is part of the Met’s Open Access initiative and will enable anyone, anywhere to freely access, use, and remix photos of some of the world’s most well-known works of art.

Over the coming months, I will be working in close collaboration with the Met and the Wikimedia community as the museum’s first Wikimedian in Residence to make these images more readily available and integrated within the Wikimedia projects. While much of the Met’s collection consists of historical works in the public domain, the Met is now lifting any licensing restrictions on its own photography of these artworks and unambiguously releasing it under CC0, so the images can be used freely online. With the Met’s CC0 release today and the updating of its licensing policy, images of the Met’s public domain artwork will be freely available online to be reused for any purpose, without restriction under copyright law.

Among the 375,000 newly copyright-free items are the Met’s images of Emanuel Leutze’s famous Washington Crossing the Delaware, a painting by Monet from his Water Lilies series, images and background on Robe à la Française (18th-century French attire), background on expertly carved jade from 3rd century B.C. China, and stunning fragments of a statue depicting an Egyptian queen from the 14th century B.C.

For now, it all begins with a little gold frog. A small gold pre-Columbian pendant of a tree frog from the 11th–16th century is one of several dozen three-dimensional objects, like jewelry, clothing, furniture, and weapons, uploaded to Wikimedia Commons and Wikidata with the thoroughness of a museum and the openness of Wikipedia. Small things can lead to serendipitous discoveries of opportunities and gaps in coverage: only when creating the Wikidata item for this little frog did I find that the culture that created it has an article only on the Polish Wikipedia.

Photo via the Metropolitan Museum of Art, public domain/CC0.

In my role as the Met’s Wikimedian in Residence, I will collaborate with other Wikimedians through projects like WikiProject Metropolitan Museum of Art to add newly available images to Wikimedia Commons, document each artwork’s metadata within Wikidata (the Wikimedia knowledge base that is used by all Wikimedia projects), and facilitate the writing of Wikipedia articles on major artworks and art topics in the collection. My ultimate goal is to “Wiki-fy the Met, and Met-ify the Wiki”, bringing together the complementary strengths of global community and institutional knowledge.

I plan to work with the Met to host online, multilingual edit-a-thons and partner with other Wikimedians and affiliate groups who can use the Met’s collections in their own contributions to the projects. We also plan to work with the Wikimedia technical community on new models of volunteer tools to improve art coverage on Wikidata and beyond, enabling broader utilization of Met collections throughout Wikimedia.

I’m looking forward to starting this work and continuing to work with the Wikimedia community and the Met on future collaborations. I hope the Met’s historic contribution inspires other GLAM institutions (galleries, libraries, archives, and museums) around the world to open up their collections to the world and make them freely available for everyone to learn from, enjoy, and share freely.

Richard Knipel, Wikimedian-in-Residence
Metropolitan Museum of Art, New York

by Richard Knipel at February 07, 2017 03:16 PM

Gerard Meijssen

The #Labor hall of Honour

When you add awards for one person, you will find other awards that are of interest. The Department of Labor’s Hall of Honor is one such award. It is a monument honoring Americans who have made a positive contribution to how people in the United States work and live.

Regularly new people are included. For some awards there are many red links; this time there was only one. Mark Ayers was added in 2012. When you google Mr Ayers, you find several dates of death.

To do the list justice, it is easy to add sources for the more relevant statements about Mr Ayers: his inclusion in the Hall of Honor and his death. It is left to other people to complement the data. As it is, the item fulfils its function; it completes a list.
Thanks,
       GerardM

by Gerard Meijssen (noreply@blogger.com) at February 07, 2017 01:23 PM

Resident Mario

February 06, 2017

Gerard Meijssen

#Wikidata - the Congressional gold medal

The United States Congress awards a gold medal to those who it thinks deserve it. Of interest is that each medal is made for the person involved; as an illustration, you can find the medal made for Rosa Parks.

When information is completed for a person like Mrs Parks in Wikidata, often Wikidata lacks the associated items. For others, like this gold medal, there is a category that makes it easy to add the other recipients of the award.

Adding information for one person adds somewhat to the quality of the information for that item. Another way of looking at quality is finding how connected, for instance, a Nancy Reagan and a Rosa Parks are. They were both honoured by Congress. To enable this, it is important to complete information when possible.

For Mrs Parks, awards were added to Wikidata. Even when an award now has only one recipient, it may establish a link to the organisation that conferred it. That makes it easy to add the award to others in the future, and this is how, slowly but surely, quality at Wikidata improves.

Connecting as much as possible is what will make Wikidata great.
Thanks,
       GerardM


by Gerard Meijssen (noreply@blogger.com) at February 06, 2017 07:26 AM

Wikimedia Foundation

Wikimedia Foundation joins amicus brief supporting challenge to U.S. immigration and travel restrictions

Map of scheduled airline traffic around the world, c. June 2009. Image by Jpatokal, CC BY-SA 3.0.

Today, the Wikimedia Foundation joined more than 90 organizations including Facebook, Levi Strauss & Co., Microsoft, Mozilla, and Paypal in an amicus brief in State of Washington v. Trump. The case challenges the recent executive order establishing restrictions on immigration and international travel based on national origin. The signatories emphasize the importance of international mobility to innovation, and underscore how the executive order does not meet basic constitutional and statutory requirements. The brief details the real and immediate impact these restrictions will have on the Wikimedia Foundation and other signatories’ staff, operations, user communities, and customers.

The Wikimedia Foundation supports Wikipedia, the other Wikimedia projects, and a global movement of volunteers and affiliate organizations committed to free knowledge. Although we are headquartered in the United States, our mission is global. Wikipedia is built by people from every corner of the globe, across a broad spectrum of nationalities, creeds, political beliefs, and identities. The open exchange of ideas, information, community, and culture is an essential part of our vision: a world in which every single person can freely share in the sum of all knowledge.

Wikimedia Foundation staff and contractors support this mission, and our communities and projects, in every part of the world from Boston to Baghdad to Bangalore. International mobility is critical to our work. We operate technology that delivers knowledge to people on every continent in nearly 300 languages. We cross borders to develop and sustain strategic partnerships with organizations and affiliates around the globe. We travel to gatherings and hackathons to support and collaborate with Wikimedians around the world. We represent Wikimedia research and methodologies at conferences with librarians and scientists from across the globe. We meet with community leaders and board members internationally to exercise corporate and community governance and execute strategic oversight.

Restrictions on immigration and international travel such as those implemented by this executive order will limit the ability of Foundation staff and contractors, and Wikimedia community members to participate in these activities, creating a serious impediment to the organization’s operations.

The arbitrary and overly broad restrictions against the citizens of seven countries, and the threat of expanding the restrictions to include any number of additional countries, have created an environment of uncertainty in our ongoing operations. Many people affiliated with the Wikimedia Foundation can no longer travel to the United States, or are now unwilling to leave the United States for fear they will not be able to return. This uncertainty places an unreasonable burden on members of Wikimedia Foundation staff and the Wikimedia communities and makes it difficult for the Foundation to effectively plan for our future programmatic and operational activities.

At the Wikimedia Foundation, we believe firmly that knowledge knows no borders. In support of free knowledge and the international cooperation that makes our work possible, we find it essential to join in this brief today and other cases challenging this executive order as needed. We urge the courts to find this order unlawful and protect the rights of our communities and staff.

Michelle Paulson, Interim General Counsel
Wikimedia Foundation

Special thanks to the law firm Mayer Brown for drafting the brief, to the other signatories of the brief for their collaboration and support in this matter, and to the Wikimedia Foundation Communications, Legal, Talent and Culture, and Travel teams for their work since the order was first issued.

by Michelle Paulson at February 06, 2017 05:56 AM

February 05, 2017

Gerard Meijssen

The woman who bankrupted the #KKK

Her child was lynched. Her child was lynched by members of the Ku Klux Klan. It became a court case, and not only was the guilt of the perpetrators proven, it was also proven that the Klan told its members to go out and lynch. In recognition of this fact the Klan was penalised; it could not afford the damages, and the organisation went bankrupt.

In recognition of all this, Mrs Beulah Mae Donald received the Candace award in 1988 and consequently she is notable enough for Wikidata.

In 2017 Mr Trump removed white supremacist groups from the Terror Watch Program. Does this mean that the definition of terrorism changed or only that the United States does not mind home grown terrorism?

Effectively the KKK is no longer considered a terrorist organisation. It is, however, well documented that more people have died in America because of home-grown terrorists than because of terrorists of any other kind. For a definition of terrorism you may check Amnesty International or the UN. When you consider their definition and apply it to observable facts, it is no longer a matter of political opinion.

My question to US Americans is: when black lives matter, what does this say about your government? It is not only about Muslims being welcome; there is this as well.
Thanks,
       GerardM

by Gerard Meijssen (noreply@blogger.com) at February 05, 2017 06:04 PM

#Wikidata - Including the Candace award


I saw a short video on modern dance. It praised a Mrs Katherine Dunham. As I often do, I checked whether Wikidata knows about her and whether her item is reasonably complete. As always there was more to do, and I added her to the Candace Award. In the process I added some 79 ladies who are "Black role models of uncommon distinction who have set a standard of excellence for young people of all races".

I wanted an illustration, so I turned to Google. Google did not know this award; it confused it with Candace Cameron Bure, an actress who has received some awards. I did provide Google with feedback :)

Google still provided me with relevant information: the Wikipedia article has it that the award was terminated in 1992, but I found that the organisation that confers the award was seeking nominees in 2009. The National Coalition of 100 Black Women is still going strong, by the way.

It is however not obvious what happened after 1992. The one source that can provide some clarity is the NCBW. They can make a difference by adding information to the Wikipedia article. "Red links"  are fine. The point of role models is simple. It is one thing to recognise them, it is another to have people know about them.
Thanks,
        GerardM

by Gerard Meijssen (noreply@blogger.com) at February 05, 2017 08:51 AM

February 03, 2017

Wikimedia Tech Blog

Pre-university students contribute to Wikimedia in Google Code-in 2016

Programming work. Photo by Vajrapani666, CC BY-SA 3.0.

Google Code-in is an annual contest for 14-17-year-old students exploring free and open source software projects via small tasks in the areas of code, documentation, outreach, research, and design. Students who complete their tasks receive a certificate and a T-shirt from Google, while the top students in every participating organization get invited to visit the Google headquarters in Mountain View, California.

For the fourth time, Wikimedia was one of the participating organizations, offering mentors and tasks.

The number of tasks resolved per week. Graph by Andre Klapper, CC BY-SA 4.0.

To list some student achievements:

  • Many improvements were made to the Pywikibot framework, Kiwix (for Wikipedia offline reading), Huggle (to fight vandalism), Wikidata, documentation, etc.
  • 20 improvements were carried out on the Wiki Education dashboard.
  • MediaWiki’s Newsletter extension was subject to extensive code changes.
  • The Pageview API now offers monthly request stats per article title.
  • A dozen MediaWiki extension pages received screenshots.
  • Glossary wiki pages were brought in line with the formatting guidelines.
  • Research on team communication tools was carried out.
  • A lot of deprecated code in MediaWiki core and extensions was removed.
  • Long CREDIT showcase videos were split into ‘one video per topic’ videos on Wikimedia Commons.
  • Proposals were made for a redesign of the Romanian Wikipedia’s main page.
  • jQuery.suggestions now offers suggested reasons in the block, delete, and protect forms.
  • A {{PAGELANGUAGE}} magic word was added.
  • Performance improvements to the importDump.php maintenance script were made.
  • Special:RecentChanges was converted to use the OOUI library.
  • Users can now apply change tags as they make logged actions using the MediaWiki web API.
  • The web service endpoint became configurable, and phpcs checks were added, in MediaWiki’s extension for Ideographic Description Sequences.

Examples of what some participating students wrote about the event:

  • “In 1.5 months, I learned more than in 1.5 years.” — Filip
  • “I know these things will be there forever and it’s a big thing for me to have my name on such a project as MediaWiki.” — Victor
  • “What makes kids like me continue a work is appreciation and what the community did is giving them a lot.” — Subin
  • “I spent my best time of my life during the contest.” — David

Congratulations to our winners (Filip Grzywok, Justin Du), to our finalists (David Siedtmann, Nikita Volobuiev, Yurii Shnitkovskyi), and to the many hard working students for their valuable contributions to make free knowledge available for everybody. We’ll see you around on IRC, mailing lists, tasks, and patch comments.

We would like to thank all our mentors for their commitment: the time they spent on weekends coming up with task ideas, working together with students, and quickly reviewing contributions.

Wikimedia always welcomes contributions to help improve our free and open software. Check out how you can contribute.

Andre Klapper, Bug Wrangler in the Technical Collaboration team
Wikimedia Foundation

by Andre Klapper at February 03, 2017 08:59 PM

Weekly OSM

weeklyOSM 341

01/24/2017-01/30/2017

Mapping

  • Over the last few days, increased activity by newcomers targeting Pokémon Go has been detected – unfortunately, the term "vandalism" applies to many of those changes. Michael explains (de) in his blog how to find Pokémon mappers and deal with them. Mic.com reported as well.

  • MikeN says in his blog that two more MapRoulette tasks for railway fans are waiting to be done: Crossing Ways: Highway-Railway, US and Crossing Type: Highway-Railway, US. These tasks are well thought through and technically solid, compared to the tasks in Rivers going up the mountain and the tasks in Ecuador, which still have some technical issues.

  • Pascal Neis presents the new functions in "Find Suspicious OpenStreetMap Changesets" to group by contributor and see all changesets with sums.

  • Bing Maps has fresh imagery for Brazil: 8.5 million square kilometers, covering the country’s total area including land and water.

  • The German forum discusses (de) (automatic translation) whether roads which show direction arrows yet no lane separation line should be tagged lanes=1.5.

  • Martijn van Exel asks your opinion about the OpenStreetCam plugin for JOSM. Suggestions and any other help are welcome.

  • An interesting OSM diary entry by user nammala shows some common errors and unexplained edits that happened this month in OpenStreetMap.

Community

  • Eric Rodenbeck summarizes the past year for his company Stamen Design and presents all the accomplished projects.

  • Following the discussion about Bing Maps imagery updates, @katpatuka underlines the importance of filling in the gaps, for example, in Turkey.

  • Inspired by the "Belgium Welcome Tool" from user M1dgard, user sabas88 wrote a new "Italian Welcome Tool" from scratch. It is available on GitHub. Regarding different languages sabas88 told us: "Currently the language function is made only for the ‘snippets’. Each snippet is a reusable message template (or part of it) you can drag to make the message. Internationalization of the proper software is in my mind, but actually I still need to do it."

  • On January 25 and for the first time, more than 6000 mappers were active on OpenStreetMap in one day, as Pascal Neis reported on Twitter.

OpenStreetMap Foundation

  • Ilya Zverev published details and statistics on the last OSMF board election.

Humanitarian OSM

  • In early January, 60 students met in Bagamoyo, Tanzania, to learn how to map in OSM.

  • Rebecca Firth, HOT Community Partnerships Manager, participated in the UN World Data Forum in South Africa. The HOT community supports the achievement of the UN Sustainable Development Goals (SDGs); the fight against malaria is one example. The conference participants were very impressed by the work and the results achieved with OSM-based mapping.

  • Missing Maps thanks its 20,000 volunteers for their efforts. We too take the opportunity to congratulate them.

  • The HOT Task Manager is currently very slow because another program regularly consumes a lot of resources on the same server.

Maps

  • A map about the Syrian war in 2017 uses tiles from the German-style OSM tile server as a background, overloading the server. Perhaps the German style, which is maintained by Sven Geggus, was preferred because it labels countries and cities in both Latin and Arabic script.

  • Mapzen reported in their blog about graphical changes that their maps (style) offer and also regarding API keys.

  • This split map by Martijn van Exel compares OSM mapping states of 2007 and today.

  • OpenRouteService introduces "E-bike and Level of Fitness" into its routing application.

  • Joshua Comeau has created a nice map that visualizes the activities of Unsplash, a photo community offering free high-resolution photos under CC0. In an article, Joshua explains the background to his map.

  • User aromatiker asks how to map whiskey distilleries. In his opinion, the Whiskymap is still far from complete.

switch2OSM

  • Florian Lainez points out that the DuckDuckGo search engine has OpenStreetMap as an option for directions.

  • The SNCF, the French National Railway Company, uses OSM in their TGV. (via 0x010C)

  • Mapzen now supplies map data for all smart devices with the operating system Tizen.

  • Chile’s police use OpenStreetMap to show crime and other data. (via OpenStreetMap Chile)

  • Pokémon Go names OSM as a data source in South Korea.

Software

  • The University of Heidelberg experimented with deep learning methods to automatically detect populated areas in rural regions.

Programming

  • Mapbox writes about Project Atlas that turns blueprints into maps – ingesting thousands of drawings and work orders per project, converting them from building information modeling (BIM) and computer-aided design (CAD) formats to Mapbox vector tiles, and making them interactive with Mapbox GL JS.

  • Panier Avide reports in his user blog on the Pic4Carto.js JavaScript library, which places georeferenced images from various sources (Flickr, OpenStreetCam, Mapillary) on a map.

  • Mapbox has released its MapboxNavigation.swift as open source, which provides an SDK for routing applications for iOS.

Releases

Did you know …

  • Disaster Mapping and Management from the University of Heidelberg? Heidelberg supports humanitarian activities in disaster management through innovative technologies and services using OSM.

  • OSMstats by Pascal Neis? The site contains a lot of interesting statistics about OpenStreetMap.

  • this guide for mapping turn restrictions?

Upcoming Events

This weeklyOSM was produced by Nakaner, Peda, Rogehm, Spec80, YoViajo, derFred, jinalfoflia, kreuzschnabel.

by weeklyteam at February 03, 2017 06:37 PM

Wikidata (WMDE - English)

Being a Volunteer Developer for Wikimedia projects: An Interview with Greta Doçi

Interview by Sandra Muellrick. This blog post is also available in German.

As a volunteer, Greta has been active in the Wikimedia movement for only a few years. She gives talks about Wikidata and is involved with Open Source development. In this blog post we want to introduce both her and the many opportunities the Wikimedia movement offers to try out new things, learn, and improve.

“Everything I know I try to put online to share, e.g. work with MediaWiki, queries, or editing in Wikipedia or Wikidata.”

For three years now, Greta has been editing Wiki projects almost every afternoon. She is enthusiastic about bringing Free Knowledge to the world from her native country of Albania. She’s been an editor on Wikipedia for over three years, and for more than one and a half years she’s been active on Wikidata. She also served on the board of the Albanian user group. Apart from her day job as an IT expert at an Albanian state organization, she organizes Wikidata workshops as well as the Albanian edition of the “Wiki Loves Monuments” photo contest, teaches university students how to use Wikipedia, and for 3 months now she’s been teaching herself how to contribute code to MediaWiki.

“I love these things. That’s why I’m volunteering.”

She started to volunteer for charity causes at a young age. It’s important for Greta to produce something meaningful, finish projects, and have an impact on society. She loves to learn things, share knowledge and teach others. This is why she feels right at home at the Wikimedia movement.

“My first article in Wikipedia was about computer security. It was my favourite topic during my studies. It was my first article that I translated from English to Albanian during the first days of the Albanian Wikipedia.”

She started her journey into the world of Open Source as a participant in a Wikipedia workshop at her local hackerspace. This is how she started sharing knowledge about Albania and translating knowledge into Albanian. As a translator from English to Albanian, she has worked on FSFE projects, Nextcloud, and OSM, and participated in toolbar development for Mozilla. Wikipedia proved to be the most efficient project for her, as editing and data updates are easy to integrate into her daily schedule.

“I’m so in love in Wikidata. I’m working more than I used to”

In particular, she fell in love with the Wikidata project. She started to work on Wikidata as she was looking for a more technical Wikimedia project. Then she started adding facts about Albania and began coordinating Albania-related items with other volunteers. She spent most of her time updating data from the Albanian Wiki Loves Monuments contest in Wikidata. But it’s not just the work as an editor that she likes — she’s also happy about the opportunity to teach others about Wikidata. She just came back from the 33rd Chaos Communication Congress (or 33C3), where she stood in front of an audience in a packed room with other volunteers to enthusiastically teach more people about Wikidata.

“The ideology of open source is to learn for yourself and others can take my code.”

At the “Ladies That FOSS” hackathon she got her feet wet with the MediaWiki source code for the first time. Matt Flaschen of the Wikimedia Foundation was her mentor there. She liked the cooperation: “He asked me to google problems first and look through the MediaWiki documentation to find my own solution”, she says. “That helped me not only with the smaller tasks at hand, but also to get a general overview of MediaWiki. There are no events like “Ladies That FOSS” in Albania where you get a mentor who goes over the code with you. There may be a presentation at a tech meetup and then you have to try it all by yourself.”

Ever since “Ladies That FOSS” she’s been contributing code, even though she struggles with finding the right place to get her code reviewed so she doesn’t have to wait for feedback for a long time.

“The community gives me a good feeling. Wikimedia is giving credit to the right people for the work they do. That’s something that motivates me to volunteer.”

Greta would love to see more support for local communities. The movement provides volunteers with many programs to sponsor travelling to international meetings or organizing Wikidata workshops at hackathons. Grants and scholarships are very helpful for her; without them, she couldn’t afford travelling as a volunteer. However, her local Wikimedia user group has only eight active members, who would love to expand their Open Source activities but lack the resources to organize meetups and hackathons. “Offline meetings are where the community gets their motivation from”, says Greta.

In order to get more developers for MediaWiki, she thinks it’s important to have more people like herself at local spaces who share their knowledge with other volunteers and teach them programming or invite staff developers for in-depth workshops in Albania to get the local programmers’ community engaged. In any case, she hopes for more offline events to get more volunteers involved with projects.

by Jens Ohlig at February 03, 2017 12:37 PM

February 02, 2017

Wikimedia UK

#1lib1ref at the University of Edinburgh

I’ve been interested in Wikimedia projects since taking part in the University of Edinburgh’s Women and Medicine editathon in February 2015, when I wrote an article on the Scottish doctor and women’s medical health campaigner Margaret Ida Balfour. I enjoyed researching her life and achievements and found it immensely rewarding and satisfying to see her page appear on Wikipedia (and at the top of Google search results!).

Since then, I have gone on to receive training as a Wikimedia Ambassador from Ewan McAndrew, the University of Edinburgh’s Wikimedian in Residence, and led my own small training session for the Library’s Centre for Research Collections staff and volunteers. At the upcoming History of Medicine editathon, I’m exploring Wikimedia projects beyond Wikipedia, starting by testing out Wikisource with one of our recently digitised, out-of-copyright PhD theses.

Hunting citations

However, it’s not just the big, research-heavy element of Wikipedia that interests me; I also like using the Citation Hunt Tool to improve the quality of existing content. The tool provides the user with a paragraph of text from Wikipedia which contains a statement not backed up by reliable evidence (and therefore labelled with the [citation needed] tag). The challenge is to track down a trustworthy source, such as a peer-reviewed journal article or news article from a reputable publication, in order to back up the statement made in the text. It’s very satisfying when you discover an appropriate source and, as the statements can come from anywhere on Wikipedia, it’s easy to end up researching a range of bizarre and random topics.

In one of the examples I’ve worked on, I used a press release from the official San Francisco 49ers website to confirm the statement that American Footballer Justin Renfrow “signed a contract with San Francisco 49ers on May 18, 2015 along with Michigan State’s Mylan Hicks.”

Citation Hunt

#1Lib1Ref

I first dabbled with this tool last January as part of Wikipedia’s #1lib1ref campaign to mark its 15th birthday. At one of our team meetings, Library staff set about making 32 edits to Wikipedia, some using the Citation Hunt Tool and others using their own knowledge and research. We therefore had a very clear target to beat this year! Wikimedia has added a new feature to the tool, so users can now select citations from a topic of interest, rather than just being provided with a completely random statement from the encyclopaedia. Added to this, many of my colleagues were using the visual editor for the first time and feedback was that this made the whole editing process far easier and more enjoyable.

Despite this, one of the big issues raised by colleagues was how to define exactly what can be considered a reliable source. There is lots of information on Wikipedia’s help pages about this issue but a short one-page guide to using reliable sources would be useful for occasions such as this. I personally got into a spot of bother when I used a source which, although published and available on Google Books, was not considered by the Wikimedia community to be reliable enough…

All in all, library staff and our colleagues from the Learning Teaching and Web division managed a grand total of 63 edits, meaning we almost doubled last year’s effort. There are rumours of a friendly rivalry with our colleagues at the National Library of Scotland… this will certainly encourage me to add a few more citations!

Gavin Willshaw

Digital Curator

Library and University Collections

University of Edinburgh

@gwillshaw

by Gavin Willshaw at February 02, 2017 05:23 PM

Wikimedia Tech Blog

Hiring a data scientist

Photo by NASA, public domain/CC0.

Note: this post applies to employers hiring Data Analysts, Data Scientists, Statisticians, Quantitative Analysts, or any one of the dozen more titles used for descriptions of the job of “turning raw data into understanding, insight, and knowledge” (Wickham & Grolemund, 2016), the only differences being the skills and disciplines emphasized.

We recently needed to backfill a data analyst position at the Wikimedia Foundation. If you’ve hired for this type of position in the past, you know that this is no easy task—both for the candidate and the organization doing the hiring.

Based on our successful hiring process, we’d like to share what we learned, and how we drew on existing resources to synthesize a better approach to interviewing and hiring a new member of our team.

Why interviewing a data scientist is hard

It’s really difficult to structure an interview for data scientist positions. In technical interviews, candidates are often asked to recite or invent algorithms on a whiteboard. In data science interviews specifically, candidates are often asked to solve probability puzzles that resemble homework sets from an advanced probability theory class. This shows that they can memorize formulas and figure out the analytical solution to the birthday problem in 5 minutes, but it doesn’t necessarily indicate whether they can take raw, messy data, tidy it up, visualize it, glean meaningful insights from it, and communicate an interesting, informative story.
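As a purely illustrative sketch (the dataset and column names here are invented, not from any real interview), the tidy-and-summarize workflow described above might start like this in pandas:

```python
import pandas as pd

# Hypothetical raw export: inconsistent casing and missing values.
raw = pd.DataFrame({
    "Country": ["us", "US", "de", None, "de"],
    "pageviews": [120, 80, 95, 40, None],
})

tidy = (
    raw.dropna()  # drop incomplete rows
       .assign(Country=lambda d: d["Country"].str.upper())  # normalize labels
)

# Summarize into something communicable.
summary = tidy.groupby("Country")["pageviews"].sum()
print(summary)  # DE: 95, US: 200
```

The point is not the particular calls, but that the candidate can move from messy input to a defensible summary and explain each step.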

These puzzles, while challenging, often have nothing to do with actual data or the kinds of problems encountered in a real working environment. The experience can be frustrating for candidates and organizations alike—which is why we wanted to think about a better way to hire a data scientist for our team.

We also wanted our process to attract diverse candidates. As Stacy-Marie Ishmael, a John S. Knight Fellow at Stanford University and former Managing Editor at BuzzFeed News, put it: “job descriptions matter… and where they’re posted matter[s] even more.”

In this post we will walk you through the way we structured our job description and interview questions, and how we created a task for candidates to complete to assess their problem-solving skills.

How to write a job post that attracts good, diverse candidates

Defining “data scientist”

The most obvious (but sometimes overlooked) issue in hiring a data scientist is figuring out what kind of skillset you’re actually looking for. The term “data scientist” is not standard; different people have different opinions about what the job entails depending on their background.

Jake VanderPlas, a Senior Data Science Fellow at the University of Washington’s eScience institute, describes data science as “an interdisciplinary subject” that “comprises three distinct and overlapping areas: the skills of a statistician who knows how to model and summarize datasets (which are growing ever larger); the skills of a computer scientist who can design and use algorithms to efficiently store, process, and visualize this data; and the domain expertise—what we might think of as ‘classical’ training in a subject—necessary both to formulate the right questions and to put their answers in context.”

That’s more or less the description I personally subscribe to, and the description I’ll be using for the rest of this piece.

How to ensure you’re attracting a diverse group of candidates

Now that you’ve defined “data scientist,” it’s necessary to move onto the next section of your job description: what a person actually will do!

The exact phrasing of job descriptions is important: research has shown that women feel less inclined to respond to “male-sounding” job ads and regard “required qualifications” as truly required. A study of gendered wording in job posts by Gaucher et al. (2011) found that “job advertisements for male-dominated areas employed greater masculine wording than advertisements within female-dominated areas,” and that “when job advertisements were constructed to include more masculine than feminine wording, participants perceived more men within these occupations and women found these jobs less appealing.”

We had a job description (J.D.) that was previously used for hiring me, but it wasn’t perfect—it included lines like “Experience contributing to open source projects,” which could result in a preference for people who enter and stay in the open source movement because they don’t experience the same levels of harassment that others experience, or a preference for people who have the time to contribute to open source projects (which may skew towards a certain type of person).

We consulted the Geek Feminism wiki’s how-to on recruiting and retaining women in tech workplaces and its solutions for reducing male bias in hiring when rewriting the job description, so as not to alienate any potential candidates. Based on that guidance, we removed the explicit requirement for years of experience, called out specific skills that women are socialized to be comfortable associating with themselves, added time management to the required skills, and placed greater emphasis on collaboration.

Once we finished this draft, we asked for feedback from several colleagues who we knew to be proponents of diversity and intersectionality.

A super important component of this: we did not want to place the burden of diversifying our workforce on the women or people of color in our workplace. Ashe Dryden, an inclusivity activist and an expert on diversity in tech spaces, wrote, “Often the burden of fostering diversity and inclusion falls to marginalized people,” and, “all of this is often done without compensation. People internal to the organization are tasked with these things and expected to do them in addition to the work they’re already performing.” We strongly believe that everyone is responsible for this, and much has been written about how the work of “[diversifying a workplace] becomes a second shift, something [members of an underrepresented group] have to do on top of their regular job.” To remedy this, we asked colleagues to give feedback during their office hours, when/if they had time for it (so it wouldn’t negatively affect their work), and only if they actually wanted to help out.

From the feedback, we rephrased some points and included an encouragement for a diverse range of applicants (“Wikimedia Foundation is an equal opportunity employer, and we encourage people with a diverse range of backgrounds to apply. We also welcome remote and international applicants across all timezones.”). We then felt confident publishing the job description, which our recruiters advertised on services like LinkedIn. In addition, we wanted to advertise the position where women in data science already congregate, so I reached out to a friend at R-Ladies (a network of women using R) who was happy to let the mailing list know about this job opening.

In short: be proactive, go where people already congregate, and ensure your language in a job post is as inclusive as possible, and you will likely attract a wider pool of potential candidates.

Sample Job Description

You might be asking yourself, “So what did this job description actually look like?” Here it is, with important bits bolded and two italicized notes interjected:

———

The Wikimedia Foundation is looking for a pragmatic, detail-oriented Data Analyst to help drive informed product decisions that enable our communities to achieve our Vision: a world in which every single human being can freely share in the sum of all knowledge.

Data Analysts at the Wikimedia Foundation are key members of the Product team who are the experts within the organization on measuring what is going on and using data to inform the decision making process. Their analyses and insights provide a data-driven approach for product owners and managers to envision, scope, and refine features of products and services that hundreds of millions of people use around the world.

You will join the Discovery Department, where we build the anonymous path of discovery to a trusted and relevant source of knowledge. Wikimedia Foundation is an equal opportunity employer, and we encourage people with a diverse range of backgrounds to apply. We also welcome remote and international applicants across all timezones.

As a Data Analyst, you will:   

  • Work closely with product managers to build out and maintain detailed on-going analysis of the department’s products, their usage patterns and performance.
  • Write database queries and code to analyze Wikipedia usage volume, user behaviour and performance data to identify opportunities and areas for improvement.
  • Collaborate with the other analysts in the department to maintain our department’s dashboards, ensuring they are up-to-date, accurate, fair and focussed representations of the efficacy of the products.
  • Support product managers through rapidly surfacing positive and adverse data trends, and complete ad hoc analysis support as needed.
  • Communicate clearly and responsively your findings to a range of departmental, organisational, volunteer and public stakeholders – to inform and educate them.

Notice the emphasis on collaboration and communication—the social aspect, rather than technical aspect of the job.

Requirements:   

  • Bachelor’s degree in Statistics, Mathematics, Computer Science or other scientific fields (or equivalent experience).
  • Experience in an analytical role extracting and surfacing value from quantitative data.
  • Strong eye for detail and a passion for quickly delivering results for rapid action.
  • Excellent written, verbal, scientific communication and time management skills.
  • Comfortable working in a highly collaborative, consensus-oriented environment.
  • Proficiency with SQL and R or Python.

Pluses:  

  • Familiarity with Bayesian inference, MCMC, and/or machine learning.
  • Experience editing Wikipedia or with online volunteers.
  • Familiarity with MediaWiki or other participatory production environments.
  • Experience with version control and peer code review systems.
  • Understanding of free culture / free software / open source principles.
  • Experience with JavaScript.

Notice how we differentiate between requirements and pluses. Other than SQL and R/Python, we don’t place a lot of emphasis on technologies and specific advanced topics in statistics. We hire knowing that the candidate is able to learn Hive and Hadoop and that they can learn about multilevel models and Bayesian structural time series models if a project requires it.

Benefits & Perks *

  • Fully paid medical, dental and vision coverage for employees and their eligible families (yes, fully paid premiums!)
  • The Wellness Program provides reimbursement for mind, body and soul activities such as fitness memberships, massages, cooking classes and much more
  • The 401(k) retirement plan offers matched contributions at 4% of annual salary
  • Flexible and generous time off – vacation, sick and volunteer days
  • Pre-tax savings plans for health care, child care, elder care, public transportation and parking expenses
  • For those emergency moments – long and short term disability, life insurance (2x salary) and an employee assistance program
  • Telecommuting and flexible work schedules available
  • Appropriate fuel for thinking and coding (aka, a pantry full of treats) and monthly massages to help staff relax
  • Great colleagues – international staff speaking dozens of languages from around the world, fantastic intellectual discourse, mission-driven and intensely passionate people

* for benefits eligible staff, benefits may vary by location

———

Take-home task

Many engineering and data science jobs require applicants to complete problems on a whiteboard. We decided not to do this. As Tanya Cashorali, the Founder of TCB Analytics, put it: “[Whiteboard testing] adds unnecessary stress to an environment that’s inherently high stress and not particularly relevant to real-world situations.” Instead, we prefer to give candidates a take-home task. This approach gives candidates the opportunity to perform the necessary background research, get acquainted with the data, thoroughly explore the data, and use the tools they are most familiar with to answer questions.

After our candidates passed an initial screening, they were given 48 hours to complete this task, inspired by this task that I had completed during my interview process. The tasks were designed so the candidate would have to:

  • Develop an understanding and intuition for the provided dataset through exploratory data analysis
  • Demonstrate critical thinking and creativity
  • Deal with real world data and answer actual, potentially-open-ended questions
  • Display knowledge of data visualization fundamentals
  • Write legible, commented code
  • Create a reproducible report (e.g. include all code, list all dependencies) with a summary of findings

We recommend designing a task that uses your own data and a question you’ve answered previously, to give candidates a sample of their future day-to-day work. If your team or organization has worked on a small-scale, data-driven project to answer a particular business question, converting that project into the take-home task is a good starting point.
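As a hypothetical sketch of the first exploratory steps such a take-home submission might contain (the daily API-call dataset below is simulated, not Wikimedia data):

```python
import numpy as np
import pandas as pd

# Simulated stand-in for a take-home dataset: 90 days of API calls.
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "date": pd.date_range("2016-01-01", periods=90, freq="D"),
    "calls": rng.poisson(lam=5000, size=90),
})

# Typical first-pass EDA: check shape, summary statistics, and a
# smoothed trend, before any modeling or hypothesis testing.
print(df.shape)                 # (90, 2)
print(df["calls"].describe())   # count, mean, spread, extremes

weekly = df.set_index("date")["calls"].resample("W").mean()
print(weekly.head())            # weekly means reveal coarse trends
```

A report built around steps like these, with all code included and dependencies listed, also satisfies the reproducibility criterion above.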

Interview questions

Now that you have your candidates, you have to interview them. This, too, can be tricky—but we wanted to judge each candidate on their merits, so we created a matrix ahead of time that could measure their answers.

One of the things we wanted to emphasize was how our prospective applicants thought about privacy and ethics. From how we handle requests for user data, to our public policy on privacy, our guidelines for ethically researching Wikipedia, and our conditions for research efforts, it is clear that privacy and ethical considerations are really important to the Wikimedia Foundation, and we wanted to ensure that final candidates could both handle the data and the privacy concerns that come with this job.

When we thought about the sorts of questions we’ve been asked in previous interviews and the kinds of topics that were important for us, we devised the following goals:

  • Assess candidate’s critical thinking and research ethics
  • Require candidate to interpret, not calculate/generate results
  • Learn about candidate’s approach to analysis
  • Gauge candidate’s awareness/knowledge of important concepts in statistics and machine learning

To that end, I asked the candidates some or all of the following questions within the hour I had with them:

  1. “What do you think are the most important qualities for a data scientist to have?”
  2. Data Analysis:
    1. “What are your first steps when working with a dataset?” (“Exploratory data analysis” is too vague! Inquire about tools they prefer and approaches that have worked for them in the past.)
    2. “Describe a data analysis you had the most fun doing. What was the part that you personally found the most exciting?”
    3. “Describe a data analysis you found the most frustrating. What were the issues you ran into and how did you deal with them?”
  3. I used this question to assess the candidate’s ability to identify ethics violations in a clear case of scientific misconduct, because I wanted to work with someone who understood what was wrong with the case and knew why it was wrong, but could also devise a creative solution that would respect privacy. First, I asked if they had heard about the OkCupid fiasco. If they hadn’t, I briefly caught them up on the situation, described how answers on OkCupid work (if they didn’t know), and specifically mentioned that the usernames were left in the dataset.
    1. “Please discuss the ethical problems with compiling this dataset in the first place and then publicly releasing it.”
    2. “You’re an independent, unaffiliated researcher. Maybe you’re a researcher here at the Foundation but you worked on this project in your personal capacity outside of work. Describe the steps you might take to make the individuals in the dataset less easily re-identifiable and the kinds of steps you might take before releasing the dataset.”
  4. Concepts in Statistics:
    1. Statistical power, p-value, and effect size form an important trio of concepts in classical statistics, which relies on null hypothesis significance testing (NHST). As Andrew Gelman, a professor of Statistics at Columbia University, writes, “naive (or calculating) researchers really do make strong claims based on p-values, claims that can fall apart under theoretical and empirical scrutiny.” I presented the outcome of a large-sample (e.g. 10K subjects) A/B test that yielded a tiny (e.g. odds ratio of 1.0008) but statistically significant (e.g. p < 0.001) result, and then asked whether we should deploy the change to production, and why or why not.
    2. Bootstrapping is a popular and computationally-intensive tool for nontraditional estimation and prediction problems that can’t be solved using classical statistics. While there may be alternative non-parametric solutions to the posed problem, the bootstrap is the simplest and most obvious for the candidate to describe, and we consider it an essential tool in a data scientist’s kit. I asked the candidate how we might approach an A/B test in which we have developed a new metric of success and a similarity measure for which none of the traditional null hypothesis significance tests apply.
    3. In statistical models, unsatisfied assumptions can lead the scientist to wrong conclusions through invalid inferences. It was important for us that the candidate was aware of the assumptions of the most common statistical model and understood if/how the hypothetical example violated those assumptions. Furthermore, we wanted to see whether the candidate could offer a more valid alternative from—for example—time series analysis, to account for temporal correlation. “One of the things we’re interested in doing is detecting trends in the usage of our APIs – interfaces we expose to the public so they can search Wikipedia. Say I’ve got this time series of daily API calls in the millions and I fit a simple linear regression model to it and I get a positive slope estimate of 3,000 from which I infer that use of our services is increasing by 3,000 API calls every day. Was this a correct solution to the problem? What did I do wrong? What would you do to answer the same question?”
  5. Concepts in Machine Learning:
    1. Model Tuning: Many statistical and machine learning models rely on parameters (and hyperparameters) which must be specified by the user. Sometimes software packages include default values, and sometimes those values are calculated from the data using recommended formulas—for example, for a dataset with p features in the example below, m would be √p. A data scientist should not always use the default values, and needs to know how parameter tuning (usually via cross-validation) is used to find a custom, optimal value that results in the smallest errors while avoiding overfitting. First, I asked if they knew how a random forest works in general and how its trees are grown. If not, it was not a big deal, because I’m not interested in their knowledge of a particular algorithm. I reminded them that at every split the algorithm picks a random subset of m features to decide which predictor to split on, and then I asked what m they’d use.
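The cross-validation machinery is the same whatever the model. As an illustration only, here is a toy k-fold CV loop in pure Python tuning k for a hand-rolled nearest-neighbors classifier on synthetic two-class data, standing in for tuning m in a random forest:

```python
import random

# Synthetic two-class data: two Gaussian clusters in 2D.
random.seed(0)
data = [((random.gauss(0, 1), random.gauss(0, 1)), 0) for _ in range(100)] + \
       [((random.gauss(2, 1), random.gauss(2, 1)), 1) for _ in range(100)]
random.shuffle(data)

def knn_predict(train, x, k):
    # Majority label among the k closest training points.
    nearest = sorted(train, key=lambda p: (p[0][0] - x[0]) ** 2 +
                                          (p[0][1] - x[1]) ** 2)[:k]
    return round(sum(label for _, label in nearest) / k)

def cv_accuracy(k, folds=5):
    # k-fold cross-validation: hold out each fold once, train on the rest.
    hits, fold_size = 0, len(data) // folds
    for f in range(folds):
        test = data[f * fold_size:(f + 1) * fold_size]
        train = data[:f * fold_size] + data[(f + 1) * fold_size:]
        hits += sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / (folds * (len(data) // folds))

scores = {k: cv_accuracy(k) for k in (1, 3, 5, 7)}
best_k = max(scores, key=scores.get)
print(scores, "-> best k:", best_k)
```

The answer the interviewers are after is exactly this pattern: evaluate a grid of candidate values by cross-validation rather than trusting the default.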
    2. Model Evaluation: It’s not enough to be able to make a predictive model of the data. Whether forecasting or classifying, the analyst needs to be able to assess whether their model is good, how good it is, and what its weaknesses are. In the example below, the classification model might look good overall (because it’s really good at predicting negatives, since most of the observations are negatives), but it’s actually terrible at predicting positives! The model learned to maximize its overall accuracy by classifying observations as “negative” most of the time. “Let’s say you’ve trained a binary outcome classifier and got the following confusion matrix. This comes out to a misclassification rate of 17%, sensitivity of 18%, specificity of 99%, prevalence of 19%, positive predictive value of 80%, and negative predictive value of 83%. Pretend I’m a not-so-technical executive and I don’t know what any of these numbers mean. Is your model good at predicting? What are its pitfalls, if any?”
                Predicted Positive  Predicted Negative
Actual Positive                 2K                  9K
Actual Negative                500                 45K
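For reference, each metric can be computed directly from the matrix above; note how a high specificity (and decent-looking overall error rate) coexists with a very low sensitivity:

```python
# Metrics computed directly from the confusion matrix:
# TP = 2,000; FN = 9,000; FP = 500; TN = 45,000.
tp, fn, fp, tn = 2_000, 9_000, 500, 45_000
total = tp + fn + fp + tn

misclassification = (fn + fp) / total   # ~0.17 overall error rate
sensitivity = tp / (tp + fn)            # ~0.18 -- poor at positives
specificity = tn / (tn + fp)            # ~0.99 -- great at negatives
prevalence = (tp + fn) / total          # ~0.19 of cases are positive
ppv = tp / (tp + fp)                    # 0.80 positive predictive value
npv = tn / (tn + fn)                    # ~0.83 negative predictive value

print(f"misclassification={misclassification:.2f} "
      f"sensitivity={sensitivity:.2f} specificity={specificity:.2f} "
      f"prevalence={prevalence:.2f} ppv={ppv:.2f} npv={npv:.2f}")
```

The executive-friendly answer: the model rarely raises false alarms, but it misses most of the positives it was built to find.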

It worked!

Based on this process, we successfully hired Chelsy Xie—who writes awesome reports, makes fantastic additions to Discovery’s dashboards (like sparklines and full geographic breakdowns), and (most importantly) is super inquisitive and welcomes a challenge (core traits of a great data scientist).

This process was easier, in part, because Chelsy was not the first data scientist hired by the Wikimedia Foundation; our process was informed by having gone through a previous hiring cycle, and we were able to improve during this iteration.

It’s harder for employers who are hiring a data scientist for the first time because they may not have someone on their team who can put together a data scientist-oriented interview process and design an informative analysis task. Feel free to use this guide as a way to navigate the process for the first time, or for improving your existing process.

This isn’t the only way to interview a candidate for a data scientist position, nor is it the best way. Much of our thinking on how to approach this task was shaped by our own frustrations as applicants, as well as our experience of what data scientists actually do in the workforce. These insights likely also apply to hiring pipelines in other technical disciplines.

We are also interested in continually improving and iterating on this process. If you have additional tips or would like to share best practices from your own data scientist hiring practices, please share them.


Mikhail Popov, Data Analyst
Wikimedia Foundation

Special thanks to Dan Garry (Discovery’s hiring manager for the Data Analyst position), Discovery’s former data analyst Oliver Keyes, our recruiters Sarah Roth and Liz Velarde, our colleagues Moriel Schottlender, Anna Stillwell, and Sarah Malik who provided invaluable feedback on the job description, and our colleagues Chelsy Xie and Melody Kramer for their input on this post.

by Mikhail Popov at February 02, 2017 04:44 PM

February 01, 2017

Wikimedia Foundation

Announcing the Wikimedia Foundation’s updated Donor Privacy Policy

Photo by Phil Roeder, CC BY 2.0.

The Wikimedia Foundation is committed to providing a space where anyone can access and contribute to free knowledge. Privacy is an important value of the Wikimedia movement, and one of our foremost concerns here at the Foundation. To that end, protecting the personal information of users, donors, and community members around the globe is a top priority.

Today, we are pleased to announce the latest update of our Donor Privacy Policy, which provides donors with more information about the data we collect, and how we handle and protect it. We believe that when you entrust us with your data, we have a responsibility to remain transparent about and accountable for our information-handling procedures.

The updated policy enumerates the Foundation’s practices for collecting, using, maintaining, protecting, and disclosing donor information in more detailed and robust terms. Little has changed substantively from the previous policy, which was last updated in 2011, but this version is more comprehensive and clearer.

Some key features of the updated policy are as follows:

  • The policy covers the collection, transfer, processing, storage, disclosure, and use of personal and non-personal information collected during the process of making a donation to the Wikimedia Foundation, or interacting with donation- and fundraising-related sites, emails, and banners.
  • The policy does not cover other uses of Foundation projects or sites (such as Wikipedia) which continue to be covered by our main Privacy Policy. The new policy also does not cover donations to or other interactions with Wikimedia affiliates or chapters, which are separate entities. Their handling of donor data is covered by their respective privacy policies and practices.
  • The policy affirms that the Foundation will never sell, trade, or rent the nonpublic personal information of donors.
  • The policy enumerates the purposes for which donor information may be used and strictly limits the circumstances under which we may share donor information (such as with online payment processors, or with a donor’s permission).
  • The policy recognizes that donors may wish to remain anonymous, and that the Foundation strives to honor such requests and preserve donor anonymity when possible.

With the launch of the new Donor Privacy Policy, which goes into effect today, February 1, 2017, the Wikimedia Foundation reinforces our commitment to transparency and the protection of donor privacy. If you have questions about the policy, please email us at privacy@wikimedia.org.

Aeryn Palmer, Legal Counsel
Michael Beattie, Donor Services Manager

Special thanks to Legal Fellow Tarun Krishnakumar for assistance with the preparation of this blog post.

by Aeryn Palmer and Michael Beattie at February 01, 2017 07:07 PM

Gerard Meijssen

Leupp, Arizona


According to the category system of the English Wikipedia, Leupp, Arizona is a concentration camp. From 1907, Leupp was the headquarters of the Leupp Indian Land, and it was only for a short time during the Second World War that people were imprisoned there.

It is somewhat ironic that a Wikipedia article that explains the involvement of the Navajo also registers it as a concentration camp. That is probably why another project is needed to document more fully what it is all about. Then again, it would be cool to collaborate with them and include all the information in Wikipedia.
Thanks,
       GerardM

by Gerard Meijssen (noreply@blogger.com) at February 01, 2017 06:41 AM

Wiki Loves Monuments

Wiki Loves Monuments 2016 statistics

In 2016, Wiki Loves Monuments (WLM) was a top-ranking Wikimedia community initiative in terms of the attention it raised. In this post, we provide further statistics on the 2016 contest.

Overall contest growth in a nutshell

All key metrics showed significant growth compared to last year: the number of participating countries (more than a 30% increase), the number of participants (62% increase), the number of first-time contributors to Wikimedia projects (78% increase), and the number of photos uploaded (20% increase). This year we also welcomed eight first-time participating countries from very different parts of the world (a 100% increase).

Countries

In 2016, 42 national competitions participated in the WLM contest, 9 more than in 2015; the 2016 contest ranks second after 2013, when 51 national competitions participated (see first table). Eight countries participated for the first time (Bangladesh, Georgia, Greece, Malta, Morocco, Nigeria, Peru, and South Korea), while seven countries participated for the 6th time (Belgium, France, Germany, Norway, Russia, Spain, and Sweden).

Uploads

A total of 277,406 images (and counting) were uploaded in 2016, which is 20% more than in 2015.

WLM – Uploads per year

Germany, with almost 14% of the total number of image uploads, was the top country in terms of the number of uploads (38,809 photos were uploaded for Germany’s contest), closely followed by India and Ukraine. In the following chart you can see the number of uploads by country.

WLM 2016 – Uploads by country

While we are reporting the 2016 upload counts per country, it is also interesting to look at the cumulative number of uploads by country since 2010, when WLM started. See the chart below.

WLM – Cumulative uploads by country

Contributors

In the context of WLM, contributors are those who upload at least one image to the contest. In 2016, India and the United States excelled in the number of uploaders: 1,784 and 1,783 uploaders, respectively. Below you can see the number of contributors by country, and contributors by country, year by year (top 10).

WLM 2016 – Contributors by country

WLM 2016 – Contributors by country, year by year (top 10)

Edit activity on Commons

Every year the WLM contest brings peak activity on Commons in the month of September. A second peak earlier in the year, appearing mostly since 2014, is the result of the Wiki Loves Earth contest.

The plot below shows the overall file upload activity on Commons, broken down into two categories: bot uploads and manual uploads. The spike in manual uploads each September is due to Wiki Loves Monuments, and while bot uploads had been fairly flat over the past three years, we observe a spike in bot uploads starting in September 2016.

———————————————-

  • This post was initially written and posted at http://infodisiac.com/blog/2017/01/wiki-loves-monuments-2016/ by Erik Zachte. The current post is a slight adaptation of that post by Erik for the Wiki Loves Monuments blog.
  • The featured image is by Wolfgang Beyer, published under CC BY-SA 3.0 at https://commons.wikimedia.org/wiki/File:Mandel_zoom_11_satellite_double_spiral.jpg

by Leila at February 01, 2017 04:58 AM

January 31, 2017

Wiki Education Foundation

Welcome, Mahala Stewart!

Over the past few months, we’ve been conducting research to evaluate student learning outcomes of Wikipedia-based assignments. I’m pleased to announce that Mahala Stewart has joined Wiki Ed as a Research Assistant to analyze and interpret survey and focus group data. She will work closely with Research Fellow Zach McDowell, who has been leading the project.

Mahala is a PhD candidate at the University of Massachusetts Amherst in the Department of Sociology, where her dissertation research examines mothers’ experiences of making schooling decisions for their children. She has also been involved with research projects studying interracial couples’ residential and schooling choices and the experiences of childfree adults. She has assisted and taught a range of courses while at UMass, and is currently collaborating on a forthcoming reader, Gendered Lives, Sexual Beings: A Feminist Anthology.

She said she was drawn to Wiki Ed’s student learning outcomes research because of her interest in creative open access approaches to teaching and knowledge production. In addition to analyzing the data, she will be working on written reports and plans to collaborate with others on journal articles based on the research.

Outside of academic work, Mahala enjoys four-season New England hiking and exploring the western Massachusetts live music scene.

Please join me in welcoming Mahala!

by Ryan McGrady at January 31, 2017 09:22 PM

Wikimedia Foundation

Community digest: Dutch Wikipedia rewards prolific contributors with owls; news in brief

Photo by Kolossos, CC BY-SA 4.0.

One of the German WikiEules, which inspired the Dutch Wikipedia community. Photo by Kolossos, CC BY-SA 4.0.

At a new year’s celebration event in Leiden, in the Netherlands, over one hundred Wikipedians gathered to celebrate their success stories from 2016, show appreciation to those who helped make it a success, and discuss what can be improved in 2017.

During the event, the Dutch Wikipedia community awarded seven stone owl statues to community-nominated best contributors of the year:

“Our fellow Wikipedians from Germany had already chosen owls,” says Romaine, one of the main organizers of the event who had the idea and worked on implementing it on the Dutch Wikipedia. “And owls stand for wisdom and knowledge, which seems to us appropriate for Wikipedia contributors.”

The Dutch WikiUilen (WikiOwl) idea is an adaptation of the German Wikipedia’s WikiEule. Romaine saw the WikiEules being awarded to prolific Wikipedians during WikiCon 2014 in Cologne, Germany and he worked with fellow Wikipedian Taketa to make it happen. The prizes have been awarded regularly in the last two years.

Every year in November, any contributor with a minimum of 100 edits to Wikipedia articles can nominate any other user to get an owl. The community reviews the candidate contributions and sends in their votes, on a secret ballot, to a special account the organizers manage.

“To ensure neutrality, we have excluded ourselves completely,” Romaine notes. “We can’t nominate, we can’t vote and we can’t be nominated.”

In January, candidates attend the award ceremony without knowing who the prizes will be awarded to. When the names are announced, WikiOwl holders from the previous year call the selected candidates’ names and hand them the owl statues. In case last year’s holder can’t make it to the event or is nominated again, another Wikipedian takes their place in dispensing the award. The surprise factor, in addition to the effort made to make contributors feel appreciated, plays a vital role in this process.

“Without recognition, people may lose their motivation and quit editing.”

In Brief

Wikipedia Day 2017: On 15 January, celebration events were held in cities around the world to mark Wikipedia’s 16th birthday. Wikipedians, educators, students, and supporters of the projects were there to celebrate the anniversary. Read more about Wikipedia Day on Wikipedia. Photos from the events held in eight countries are available on Wikimedia Commons.

New MediaWiki extension created by a student: Harri Alasi, a student at Tartu University in Estonia, has created a new extension for MediaWiki, the free open-source software used on Wikimedia projects and other websites. The new extension enables users to view and manipulate 3D objects in STL format.

Phabricator updates: On Phabricator, the platform used by Wikimedians to track bug reporting and software development, static items in the top bar’s dropdown menu have been replaced by “Favorites,” where a user can adjust their preferred links.

AffCom welcomes new members: The Wikimedia Affiliations Committee (AffCom), the body charged with advising and recommending the recognition of new movement affiliates, has announced its new members. Camelia Boban, Kirill Lokshin and Satdeep Gill will serve as members of the committee for 2017–19.

Conference grants program deadlines announced: Applications for the newly introduced conference grants program will be open three times a year. Round one applications are now being accepted through 26 February and the decisions will be announced on 24 March. The next application opening will be on 28 May 2017. More information can be found on Meta.

Iraqi Wikimedians kick off a series of editing workshops in Baghdad: Last week, the Iraqi Wikimedians user group held a new workshop on editing basics for new users in the Iraqi capital. Over the past year, the user group organized several introductory workshops in the city of Erbil before heading south to extend the effort to Baghdad.

Wikipedian Pino dies: Pino was an active editor and administrator on the Esperanto Wikipedia. He joined the movement in 2008 and made every effort to support the Esperanto Wikipedia and its community, making over 37,000 edits on Wikimedia projects. Pino died on 19 January in Beaune, France.

Samir Elsharbaty, Digital Content Intern
Wikimedia Foundation

by Samir Elsharbaty at January 31, 2017 07:13 PM

Knowledge knows no boundaries

Photo by NASA, public domain/CC0.

At the Wikimedia Foundation, our mission was born of a belief that everyone, everywhere, has something to contribute to our shared human understanding. We believe in a world that encourages and protects the open exchange of ideas and information, community and culture; where people of every country, language, and culture can freely collaborate without restriction; and where international cooperation leads to common understanding.

The new U.S. administration’s executive order on immigration is an affront to this vision. It impedes the efforts of our colleagues and communities who work together from around the world to make shared, open knowledge accessible to all. When our ability to come together across borders is restricted, the world is poorer for it.

Knowledge knows no borders. Our collective human wisdom has long been built through the exchange of ideas, from our first navigational knowledge of the seas to our ongoing exploration of the heavens. When one society has stumbled and slipped into ignorance, others have preserved our records and archives, and built upon them. Throughout the Early Middle Ages in Europe, scholars in Baghdad kept alive the writings of Greek philosophers. These meticulous studies, along with the discoveries of Persian and Arab mathematicians, would in turn help spark the intellectual renaissance of Europe.

Wikipedia is an example of what is possible when borders do not hinder the exchange of ideas. Today, Wikipedia contains more than 40 million articles across nearly 300 languages. It is built one person at a time, across continent and language. It is built through collaboration in person and in communities, at international gatherings of ordinary individuals from around the world. These collaborative efforts serve hundreds of millions of people every month, opening up opportunity and education to all.

The Wikimedia Foundation is headquartered in the U.S., where we have unique freedoms that are essential to supporting the Wikimedia projects. But our mission is global. We support communities and projects from every corner of the globe. Our staff and community members need to be able to move freely in order to support this global movement and foster the sharing of ideas and knowledge, no matter their country of origin.

We strongly urge the U.S. administration to withdraw the recent executive order restricting travel and immigration from certain nations, and closing the doors to many refugees. It threatens our freedoms of inquiry and exchange, and it infringes on the fundamental rights of our colleagues, our communities, and our families.

Although our individual memories may be short, the arc of history is long, and it unfurls in a continuous progression of openness. At the Wikimedia Foundation, we will continue to stand up for our values of open discourse and international cooperation. We join hands with everyone who does.

Katherine Maher, Executive Director
Wikimedia Foundation

by Katherine Maher at January 31, 2017 02:07 AM

Sébastien Santoro (Dereckson)

One year of contributions to Wikimedia — 2016

Some statistics I’ve computed about my production contributions to Wikimedia:

342 actions logged on the server admin log
573 commits to Wikimedia repos, including:
  5 new wikis created (hello tcy.wikipedia.org)

Thanks to all the people I’ve met or been engaged with during this year of contributions.

by Dereckson at January 31, 2017 01:23 AM