en.planet.wikimedia

February 03, 2017

Wikidata (WMDE - English)

Being a Volunteer Developer for Wikimedia projects: An Interview with Greta Doçi

Interview by Sandra Muellrick. This blog post is also available in German.

As a volunteer, Greta has been active in the Wikimedia movement for only a few years. She gives talks about Wikidata and is involved with Open Source development. In this blog post we want to introduce both her and the many opportunities the Wikimedia movement offers to try out new things, learn, and improve.

“Everything I know I try to put online to share, e.g. work with MediaWiki, queries, or editing in Wikipedia or Wikidata.”

For three years now, Greta has been editing Wikimedia projects almost every afternoon. She is enthusiastic about bringing Free Knowledge from her native Albania to the world. She’s been an editor on Wikipedia for over three years, and for more than a year and a half she’s been active on Wikidata. She also served on the board of the Albanian user group. Apart from her day job as an IT expert at an Albanian state organization, she organizes Wikidata workshops as well as the Albanian edition of the “Wiki Loves Monuments” photo contest, teaches university students how to use Wikipedia, and for three months now she’s been teaching herself how to contribute code to MediaWiki.

“I love these things. That’s why I’m volunteering.”

She started to volunteer for charity causes at a young age. It’s important for Greta to produce something meaningful, finish projects, and have an impact on society. She loves to learn things, share knowledge, and teach others. This is why she feels right at home in the Wikimedia movement.

“My first article on Wikipedia was about computer security. It was my favourite topic during my studies. I translated it from English to Albanian during the first days of the Albanian Wikipedia.”

She started her journey into the world of Open Source as a participant in a Wikipedia workshop at her local hackerspace. This is how she started sharing knowledge about Albania, and translating knowledge into Albanian. As a translator from English to Albanian, she has worked on FSFE projects, Nextcloud, and OSM, and participated in toolbar development for Mozilla. Wikipedia proved to be the most efficient project for her, as editing and data updates are easy to integrate into her daily schedule.

“I’m so in love with Wikidata. I’m working more than I used to.”

In particular, she fell in love with the Wikidata project. She started to work on Wikidata as she was looking for a more technical Wikimedia project. Then she started adding facts about Albania and began coordinating Albania-related items with other volunteers. She spent the biggest amount of time updating data from the Albanian Wiki Loves Monuments contest in Wikidata. But it’s not just the work as an editor that she likes — she’s also happy about the opportunity to teach others about Wikidata. She just came back from the 33rd Chaos Communication Congress (or 33C3), where she stood in front of an audience in a packed room with other volunteers to enthusiastically teach more people about Wikidata.

“The ideology of open source is to learn for yourself and others can take my code.”

At the “Ladies That FOSS” hackathon she got her feet wet with the MediaWiki source code for the first time. Matt Flaschen of the Wikimedia Foundation was her mentor there. She liked the cooperation: “He asked me to google problems first and look through the MediaWiki documentation to find my own solution,” she says. “That helped me not only with the smaller tasks at hand, but also to get a general overview of MediaWiki. There are no events like ‘Ladies That FOSS’ in Albania where you get a mentor who goes over the code with you. There may be a presentation at a tech meetup and then you have to try it all by yourself.”

Ever since “Ladies That FOSS” she’s been contributing code, even though she struggles to find the right place to get her code reviewed quickly, so that she doesn’t have to wait long for feedback.

“The community gives me a good feeling. Wikimedia is giving credit to the right people for the work they do. That’s something that motivates me to volunteer.”

Greta would love to see more support for local communities. The movement provides volunteers with many programs to sponsor travelling to international meetings or organizing Wikidata workshops at hackathons. Grants and scholarships are very helpful for her; without them, she couldn’t afford travelling as a volunteer. However, her local Wikimedia user group has only eight active members, who would love to expand their Open Source activities but lack the resources to organize meetups and hackathons. “Offline meetings are where the community gets their motivation from,” says Greta.

In order to get more developers for MediaWiki, she thinks it’s important to have more people like herself at local spaces who share their knowledge with other volunteers and teach them programming, or to invite staff developers for in-depth workshops in Albania to get the local programmers’ community engaged. In any case, she hopes for more offline events to get more volunteers involved with projects.

by Jens Ohlig at February 03, 2017 12:37 PM

February 02, 2017

Wikimedia UK

#1lib1ref at the University of Edinburgh

I’ve been interested in Wikimedia projects since taking part in the University of Edinburgh’s Women and Medicine editathon in February 2015, when I wrote an article on the Scottish doctor and women’s medical health campaigner Margaret Ida Balfour. I enjoyed researching her life and achievements and found it immensely rewarding and satisfying to see her page appear on Wikipedia (and at the top of Google search results!).

Since then, I have gone on to receive training as a Wikimedia Ambassador from Ewan McAndrew, the University of Edinburgh’s Wikimedian in Residence, and led my own small training session for the Library’s Centre for Research Collections staff and volunteers. At the upcoming History of Medicine editathon, I’m exploring Wikimedia projects beyond Wikipedia, starting by testing out Wikisource with one of our recently digitised, out-of-copyright PhD theses.

Hunting citations

However, it’s not just the big, research-heavy element of Wikipedia that interests me; I also like using the Citation Hunt Tool to improve the quality of existing content. The tool provides the user with a paragraph of text from Wikipedia which contains a statement not backed up by reliable evidence (and therefore labelled with the [citation needed] tag). The challenge is to track down a trustworthy source, such as a peer-reviewed journal article or news article from a reputable publication, in order to back up the statement made in the text. It’s very satisfying when you discover an appropriate source and, as the statements can come from anywhere on Wikipedia, it’s easy to end up researching a range of bizarre and random topics.

In one of the examples I’ve worked on, I used a press release from the official San Francisco 49ers website to confirm the statement that American footballer Justin Renfrow “signed a contract with San Francisco 49ers on May 18, 2015 along with Michigan State’s Mylan Hicks.”

Citation Hunt

#1lib1ref

I first dabbled with this tool last January as part of Wikipedia’s #1lib1ref campaign to mark its 15th birthday. At one of our team meetings, Library staff set about making 32 edits to Wikipedia, some using the Citation Hunt Tool and others using their own knowledge and research. We therefore had a very clear target to beat this year! Wikimedia has added a new feature to the tool, so users can now select citations from a topic of interest, rather than just being provided with a completely random statement from the encyclopaedia. Added to this, many of my colleagues were using the visual editor for the first time and feedback was that this made the whole editing process far easier and more enjoyable.

Despite this, one of the big issues raised by colleagues was how to define exactly what can be considered a reliable source. There is lots of information on Wikipedia’s help pages about this issue but a short one-page guide to using reliable sources would be useful for occasions such as this. I personally got into a spot of bother when I used a source which, although published and available on Google Books, was not considered by the Wikimedia community to be reliable enough…

All in all, library staff and our colleagues from the Learning, Teaching and Web division managed a grand total of 63 edits, meaning we almost doubled last year’s effort. There are rumours of a friendly rivalry with our colleagues at the National Library of Scotland… this will certainly encourage me to add a few more citations!

Gavin Willshaw

Digital Curator

Library and University Collections

University of Edinburgh

@gwillshaw

by Gavin Willshaw at February 02, 2017 05:23 PM

Wikimedia Tech Blog

Hiring a data scientist

Photo by NASA, public domain/CC0.

Note: this post applies to employers hiring Data Analysts, Data Scientists, Statisticians, Quantitative Analysts, or any one of the dozen other titles used to describe the job of “turning raw data into understanding, insight, and knowledge” (Wickham & Grolemund, 2016), the only differences being the skills and disciplines emphasized.

We recently needed to backfill a data analyst position at the Wikimedia Foundation. If you’ve hired for this type of position in the past, you know that this is no easy task—both for the candidate and the organization doing the hiring.

Based on our successful hiring process, we’d like to share what we learned, and how we drew on existing resources to synthesize a better approach to interviewing and hiring a new member of our team.

Why interviewing a data scientist is hard

It’s really difficult to structure an interview for data scientist positions. In technical interviews, candidates are often asked to recite or invent algorithms on a whiteboard. In data science interviews specifically, candidates are often asked to solve probability puzzles that resemble homework sets from an advanced probability theory class. This shows that they can memorize formulas and can figure out the analytical solution to the birthday problem in 5 minutes, but it doesn’t necessarily indicate whether they can take raw, messy data, tidy it up, visualize it, glean meaningful insights from it, and communicate an interesting, informative story.

These puzzles, while challenging, often have nothing to do with actual data or the kinds of problems encountered in an actual working environment. It can be a frustrating experience for candidates and organizations alike—which is why we wanted to think about a better way to hire a data scientist for our team.

We also wanted our process to attract diverse candidates. As Stacy-Marie Ishmael, a John S. Knight Fellow at Stanford University and former Managing Editor at BuzzFeed News, put it: “job descriptions matter… and where they’re posted matter[s] even more.”

In this post we will walk you through the way we structured our job description and interview questions, and how we created a task for candidates to complete to assess their problem-solving skills.

How to write a job post that attracts good, diverse candidates

Defining “data scientist”

The most obvious (but sometimes overlooked) issue in hiring a data scientist is figuring out what kind of skillset you’re actually looking for. The term “data scientist” is not standard; different people have different opinions about what the job entails depending on their background.

Jake VanderPlas, a Senior Data Science Fellow at the University of Washington’s eScience institute, describes data science as “an interdisciplinary subject” that “comprises three distinct and overlapping areas: the skills of a statistician who knows how to model and summarize datasets (which are growing ever larger); the skills of a computer scientist who can design and use algorithms to efficiently store, process, and visualize this data; and the domain expertise—what we might think of as ‘classical’ training in a subject—necessary both to formulate the right questions and to put their answers in context.”

That’s more or less the description I personally subscribe to, and the description I’ll be using for the rest of this piece.

How to ensure you’re attracting a diverse group of candidates

Now that you’ve defined “data scientist,” it’s necessary to move onto the next section of your job description: what a person actually will do!

The exact phrasing of job descriptions is important because research in this area has shown that women feel less inclined to respond to “male-sounding” job ads and truly regard “required qualifications” as required qualifications. In a study of gendered wording in job posts by Gaucher et al. (2011), they found that “job advertisements for male-dominated areas employed greater masculine wording than advertisements within female-dominated areas,” and “when job advertisements were constructed to include more masculine than feminine wording, participants perceived more men within these occupations and women found these jobs less appealing.”

We had a job description (J.D.) that was previously used for hiring me, but it wasn’t perfect—it included lines like “Experience contributing to open source projects,” which could result in a preference for people who enter and stay in the open source movement because they don’t experience the same levels of harassment that others experience, or a preference for people who have the time to contribute to open source projects (which may skew towards a certain type of person).

We consulted the Geek Feminism wiki’s how-to on recruiting and retaining women in tech workplaces, and its solutions for reducing male bias in hiring, when rewriting the job description so as not to alienate any potential candidates. Based on that document, we decided to remove an explicit requirement for years of experience and to call out specific skills that women are socialized to be comfortable associating with themselves, adding time management to the required skills and placing greater emphasis on collaboration.

Once we finished this draft, we asked for feedback from several colleagues who we knew to be proponents of diversity and intersectionality.

A super important component of this: we did not want to place the burden of diversifying our workforce on the women or people of color in our workplace. Ashe Dryden, an inclusivity activist and an expert on diversity in tech spaces, wrote, “Often the burden of fostering diversity and inclusion falls to marginalized people,” and, “all of this is often done without compensation. People internal to the organization are tasked with these things and expected to do them in addition to the work they’re already performing.” We strongly believe that everyone is responsible for this, and much has been written about how the work of “[diversifying a workplace] becomes a second shift, something [members of an underrepresented group] have to do on top of their regular job.” To remedy this, we asked specific colleagues to give feedback during their office hours, when/if they had time for it (so it wouldn’t negatively affect their work), and only if they actually wanted to help out.

From the feedback, we rephrased some points and included an encouragement for a diverse range of applicants (“Wikimedia Foundation is an equal opportunity employer, and we encourage people with a diverse range of backgrounds to apply. We also welcome remote and international applicants across all timezones.”). We then felt confident publishing the job description, which our recruiters advertised on services like LinkedIn. In addition, we wanted to advertise the position where women in data science already congregate, so I reached out to a friend at R-Ladies (a network of women using R) who was happy to let the mailing list know about this job opening.

In short: be proactive, go where people already congregate, and ensure the language in your job post is as inclusive as possible; you will likely attract a wider pool of potential candidates.

Sample Job Description

You might be asking yourself, “So what did this job description actually look like?” Here it is, with important bits bolded and two italicized notes interjected:

———

The Wikimedia Foundation is looking for a pragmatic, detail-oriented Data Analyst to help drive informed product decisions that enable our communities to achieve our Vision: a world in which every single human being can freely share in the sum of all knowledge.

Data Analysts at the Wikimedia Foundation are key members of the Product team who are the experts within the organization on measuring what is going on and using data to inform the decision making process. Their analyses and insights provide a data-driven approach for product owners and managers to envision, scope, and refine features of products and services that hundreds of millions of people use around the world.

You will join the Discovery Department, where we build the anonymous path of discovery to a trusted and relevant source of knowledge. Wikimedia Foundation is an equal opportunity employer, and we encourage people with a diverse range of backgrounds to apply. We also welcome remote and international applicants across all timezones.

As a Data Analyst, you will:   

  • Work closely with product managers to build out and maintain detailed on-going analysis of the department’s products, their usage patterns and performance.
  • Write database queries and code to analyze Wikipedia usage volume, user behaviour and performance data to identify opportunities and areas for improvement.
  • Collaborate with the other analysts in the department to maintain our department’s dashboards, ensuring they are up-to-date, accurate, fair and focussed representations of the efficacy of the products.
  • Support product managers through rapidly surfacing positive and adverse data trends, and complete ad hoc analysis support as needed.
  • Communicate clearly and responsively your findings to a range of departmental, organisational, volunteer and public stakeholders – to inform and educate them.

Notice the emphasis on collaboration and communication—the social aspect, rather than technical aspect of the job.

Requirements:   

  • Bachelor’s degree in Statistics, Mathematics, Computer Science or other scientific fields (or equivalent experience).
  • Experience in an analytical role extracting and surfacing value from quantitative data.
  • Strong eye for detail and a passion for quickly delivering results for rapid action.
  • Excellent written, verbal, scientific communication and time management skills.
  • Comfortable working in a highly collaborative, consensus-oriented environment.
  • Proficiency with SQL and R or Python.

Pluses:  

  • Familiarity with Bayesian inference, MCMC, and/or machine learning.
  • Experience editing Wikipedia or with online volunteers.
  • Familiarity with MediaWiki or other participatory production environments.
  • Experience with version control and peer code review systems.
  • Understanding of free culture / free software / open source principles.
  • Experience with JavaScript.

Notice how we differentiate between requirements and pluses. Other than SQL and R/Python, we don’t place a lot of emphasis on technologies and specific advanced topics in statistics. We hire knowing that the candidate is able to learn Hive and Hadoop and that they can learn about multilevel models and Bayesian structural time series models if a project requires it.

Benefits & Perks *

  • Fully paid medical, dental and vision coverage for employees and their eligible families (yes, fully paid premiums!)
  • The Wellness Program provides reimbursement for mind, body and soul activities such as fitness memberships, massages, cooking classes and much more
  • The 401(k) retirement plan offers matched contributions at 4% of annual salary
  • Flexible and generous time off – vacation, sick and volunteer days
  • Pre-tax savings plans for health care, child care, elder care, public transportation and parking expenses
  • For those emergency moments – long and short term disability, life insurance (2x salary) and an employee assistance program
  • Telecommuting and flexible work schedules available
  • Appropriate fuel for thinking and coding (aka, a pantry full of treats) and monthly massages to help staff relax
  • Great colleagues – international staff speaking dozens of languages from around the world, fantastic intellectual discourse, mission-driven and intensely passionate people

* for benefits eligible staff, benefits may vary by location

———

Take-home task

Many engineering and data science jobs require applicants to complete problems on a whiteboard. We decided not to do this. As Tanya Cashorali, the Founder of TCB Analytics, put it: “[Whiteboard testing] adds unnecessary stress to an environment that’s inherently high stress and not particularly relevant to real-world situations.” Instead, we prefer to give candidates a take-home task. This approach gives candidates the opportunity to perform the necessary background research, get acquainted with the data, thoroughly explore the data, and use the tools they are most familiar with to answer questions.

After our candidates passed an initial screening, they were given 48 hours to complete this task, inspired by the task I had completed during my own interview process. The tasks were designed so that the candidate would have to:

  • Develop an understanding and intuition for the provided dataset through exploratory data analysis
  • Demonstrate critical thinking and creativity
  • Deal with real world data and answer actual, potentially-open-ended questions
  • Display knowledge of data visualization fundamentals
  • Write legible, commented code
  • Create a reproducible report (e.g. include all code, list all dependencies) with a summary of findings

We recommend designing a task that uses your own data and a question you’ve answered previously, to give candidates an example of their future day-to-day work. If your team or organization has worked on a small-scale, data-driven project to answer a particular business question, a good starting point would be to convert that into the take-home task; a minimal sketch of what the reproducible-report deliverable can look like follows below.
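As an illustration, here is a minimal sketch in Python of the shape such a reproducible deliverable can take. It assumes pandas and matplotlib; the file name, column names, and question are placeholders of our own, not the actual task we gave candidates.

    # Minimal skeleton of a reproducible take-home analysis.
    # Dependencies (list them for the reviewer): pandas, matplotlib.
    # Usage: python analysis.py events.csv
    import sys

    import matplotlib.pyplot as plt
    import pandas as pd

    def main(path):
        # Hypothetical schema: one row per event, with a "date" column.
        events = pd.read_csv(path, parse_dates=["date"])

        # Exploratory step: understand the shape of the data first.
        print(events.describe(include="all"))

        # One clearly labeled figure per question asked.
        daily = events.groupby("date").size()
        daily.plot(title="Events per day")
        plt.xlabel("Date")
        plt.ylabel("Events")
        plt.savefig("events_per_day.png", dpi=150)

        # End with a plain-language summary of findings.
        print(f"Median events per day: {daily.median():.0f}")

    if __name__ == "__main__":
        main(sys.argv[1])

The specifics matter less than the property that a reviewer can re-run the entire analysis from a single file with declared dependencies.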

Interview questions

Now that you have your candidates, you have to interview them. This, too, can be tricky—but we wanted to judge each candidate on their merits, so we created a matrix ahead of time against which we could measure their answers.

One of the things we wanted to emphasize was how our prospective applicants thought about privacy and ethics. From how we handle requests for user data, to our public policy on privacy, our guidelines for ethically researching Wikipedia, and our conditions for research efforts, it is clear that privacy and ethical considerations are really important to the Wikimedia Foundation, and we wanted to ensure that final candidates could handle both the data and the privacy concerns that come with this job.

When we thought about the sorts of questions we’ve been asked in previous interviews and the kinds of topics that were important for us, we devised the following goals:

  • Assess candidate’s critical thinking and research ethics
  • Require candidate to interpret, not calculate/generate results
  • Learn about candidate’s approach to analysis
  • Gauge candidate’s awareness/knowledge of important concepts in statistics and machine learning

To that end, I asked the candidates some or all of the following questions within the hour I had with them:

  1. “What do you think are the most important qualities for a data scientist to have?”
  2. Data Analysis:
    1. “What are your first steps when working with a dataset?” (“Exploratory data analysis” is too vague! Inquire about tools they prefer and approaches that have worked for them in the past.)
    2. “Describe a data analysis you had the most fun doing. What was the part that you personally found the most exciting?”
    3. “Describe a data analysis you found the most frustrating. What were the issues you ran into and how did you deal with them?”
  3. I used this question to assess the candidate’s ability to identify ethics violations in a clear case of scientific misconduct, because I wanted to work with someone who understood what was wrong with the case and why it was wrong, but could also devise a creative solution that would respect privacy. First, I asked if they had heard about the OkCupid fiasco. If they hadn’t, I briefly caught them up on the situation, described how answers on OkCupid work (if they didn’t know), and specifically mentioned that the usernames were left in the dataset.
    1. “Please discuss the ethical problems with compiling this dataset in the first place and then publicly releasing it.”
    2. “You’re an independent, unaffiliated researcher. Maybe you’re a researcher here at the Foundation but you worked on this project in your personal capacity outside of work. Describe the steps you might take to make the individuals in the dataset less easily re-identifiable and the kinds of steps you might take before releasing the dataset.”
  4. Concepts in Statistics:
    1. Statistical power, p-value, and effect size is an important trio of concepts in classical statistics that relies on null hypothesis significance testing (NHST). As Andrew Gelman, a professor of statistics at Columbia University, writes, “naive (or calculating) researchers really do make strong claims based on p-values, claims that can fall apart under theoretical and empirical scrutiny.” I presented the outcome of a large sample size (e.g. 10K subjects) A/B test that yielded a tiny (e.g. odds ratio of 1.0008) but statistically significant (e.g. p < 0.001) result, and then asked if we should deploy the change to production. Why or why not? (A numerical sketch of this scenario follows after this list.)
    2. Bootstrapping is a popular and computationally intensive tool for nontraditional estimation and prediction problems that can’t be solved using classical statistics. While there may be alternative non-parametric solutions to the posed problem, the bootstrap is the simplest and most obvious for the candidate to describe, and we consider it an essential tool in a data scientist’s kit. I asked the candidate how we might approach an A/B test where we developed a new metric of success and a similarity measure for which we can’t use any of the traditional null hypothesis significance tests. (See the bootstrap sketch after this list.)
    3. In statistical models, not satisfying the assumptions can lead the scientist to wrong conclusions by making invalid inferences. It was important for us that the candidate was aware of the assumptions in the most common statistical model and that they understood if/how the hypothetical example violated those assumptions. Furthermore, we wanted to see whether the candidate could offer a more valid alternative from—for example—time series analysis, to account for temporal correlation. “One of the things we’re interested in doing is detecting trends in the usage of our APIs – interfaces we expose to the public so they can search Wikipedia. Say I’ve got this time series of daily API calls in the millions and I fit a simple linear regression model to it and I get a positive slope estimate of 3,000, from which I infer that use of our services is increasing by 3,000 API calls every day. Was this a correct solution to the problem? What did I do wrong? What would you do to answer the same question?”
  5. Concepts in Machine Learning:
    1. Model Tuning: Many statistical and machine learning models rely on parameters (and hyperparameters) which must be specified by the user. Sometimes software packages include default values, and sometimes those values are calculated from the data using recommended formulas—for example, for a dataset with p features in the example below, m would be √p. A data scientist should not always use the default values and needs to know how parameter tuning (usually via cross-validation) is used to find a custom, optimal value that results in the smallest errors but also avoids overfitting. First, I asked if they knew how a random forest works in general and how its trees are grown. If not, it was not a big deal, because I’m not interested in their knowledge of a particular algorithm. I reminded them that at every split the algorithm picks a random subset of m features to decide which predictor to split on, and then I asked what m they’d use. (A cross-validation sketch follows after this list.)
    2. Model Evaluation: It’s not enough to be able to make a predictive model of the data. Whether forecasting or classifying, the analyst needs to be able to assess whether their model is good, how good it is, and what its weaknesses are. In the example below, the classification model might look good overall (because it’s really good at predicting negatives, since most of the observations are negatives), but it’s actually terrible at predicting positives! The model learned to maximize its overall accuracy by classifying observations “negative” most of the time. “Let’s say you’ve trained a binary outcome classifier and got the following confusion matrix. This comes out to a misclassification rate of 17%, sensitivity of 18%, specificity of 99%, prevalence of 19%, and positive predictive value of 80%. Pretend I’m a not-so-technical executive and I don’t know what any of these numbers mean. Is your model good at predicting? What are its pitfalls, if any?” (The quoted rates are derived in the sketch after the matrix.)
                  Predicted Positive   Predicted Negative
Actual Positive   2K                   9K
Actual Negative   500                  45K
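To make some of these questions concrete, here are a few minimal sketches in Python, referred to in questions 4.1, 4.2, 5.1, and 5.2 above. These are our own illustrations under stated assumptions — the simulated numbers, placeholder data, and library choices (statsmodels, NumPy, scikit-learn) are ours for demonstration, not the exact answers we expected from candidates.

    # Question 4.1: with a huge sample, a negligible lift can still be
    # "statistically significant." Hypothetical counts; assumes statsmodels.
    from statsmodels.stats.proportion import proportions_ztest

    successes = [500_000, 502_500]   # conversions in groups A and B
    n = [1_000_000, 1_000_000]       # users per group

    z, p = proportions_ztest(successes, n)
    lift = successes[1] / n[1] - successes[0] / n[0]
    print(f"p = {p:.5f}, absolute lift = {lift:.3%}")
    # p < 0.001, yet the lift is a quarter of a percentage point:
    # statistically significant but probably not worth deploying.

    # Question 4.2: bootstrap a confidence interval for a made-up metric
    # that has no classical significance test.
    import numpy as np

    rng = np.random.default_rng(0)

    def novel_metric(x):
        return np.median(x) / np.mean(x)   # stand-in for the new metric

    a = rng.gamma(2.0, 1.0, size=5_000)    # simulated group A measurements
    b = rng.gamma(2.1, 1.0, size=5_000)    # simulated group B measurements

    diffs = np.array([
        novel_metric(rng.choice(b, b.size, replace=True))
        - novel_metric(rng.choice(a, a.size, replace=True))
        for _ in range(10_000)
    ])
    lo, hi = np.percentile(diffs, [2.5, 97.5])
    print(f"95% bootstrap CI for the difference: [{lo:.4f}, {hi:.4f}]")
    # If the interval excludes 0, the difference is unlikely to be noise.

    # Question 5.1: tune m (scikit-learn's max_features) by cross-validation
    # instead of accepting the sqrt(p) default.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV

    X, y = make_classification(n_samples=2_000, n_features=25, random_state=0)
    search = GridSearchCV(
        RandomForestClassifier(n_estimators=200, random_state=0),
        param_grid={"max_features": [2, "sqrt", 10, 15]},
        cv=5,
    )
    search.fit(X, y)
    print("best m:", search.best_params_, "CV accuracy:", search.best_score_)

    # Question 5.2: derive the quoted rates from the confusion matrix above
    # (counts as given, "K" = thousands).
    tp, fn = 2_000, 9_000    # actual positives: predicted +, predicted -
    fp, tn = 500, 45_000     # actual negatives: predicted +, predicted -
    total = tp + fn + fp + tn

    print(f"misclassification: {(fn + fp) / total:.0%}")   # ~17%
    print(f"sensitivity:       {tp / (tp + fn):.0%}")      # ~18%
    print(f"specificity:       {tn / (tn + fp):.0%}")      # ~99%
    print(f"prevalence:        {(tp + fn) / total:.0%}")   # ~19%
    print(f"PPV:               {tp / (tp + fp):.0%}")      # 80%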

It worked!

Based on this process, we successfully hired Chelsy Xie—who writes awesome reports, makes fantastic additions to Discovery’s dashboards (like sparklines and full geographic breakdowns), and (most importantly) is super inquisitive and welcomes a challenge (core traits of a great data scientist).

This process was easier, in part, because Chelsy was not the first data scientist hired by the Wikimedia Foundation; our process was informed by having gone through a previous hiring cycle, and we were able to improve during this iteration.

It’s harder for employers who are hiring a data scientist for the first time because they may not have someone on their team who can put together a data scientist-oriented interview process and design an informative analysis task. Feel free to use this guide as a way to navigate the process for the first time, or for improving your existing process.

This isn’t the only way to interview a candidate for a data scientist position, nor necessarily the best way. Much of our thinking on how to approach this task was shaped by our own frustrations as applicants, as well as our experience of what data scientists actually do in the workforce. These insights likely also apply to hiring pipelines in other technical disciplines.

We are also interested in continually improving and iterating on this process. If you have additional tips or would like to share best practices from your own data scientist hiring practices, please share them.


Mikhail Popov, Data Analyst
Wikimedia Foundation

Special thanks to Dan Garry (Discovery’s hiring manager for the Data Analyst position), Discovery’s former data analyst Oliver Keyes, our recruiters Sarah Roth and Liz Velarde, our colleagues Moriel Schottlender, Anna Stillwell, and Sarah Malik who provided invaluable feedback on the job description, and our colleagues Chelsy Xie and Melody Kramer for their input on this post.

by Mikhail Popov at February 02, 2017 04:44 PM

February 01, 2017

Wikimedia Foundation

Announcing the Wikimedia Foundation’s updated Donor Privacy Policy

Photo by Phil Roeder, CC BY 2.0.

The Wikimedia Foundation is committed to providing a space where anyone can access and contribute to free knowledge. Privacy is an important value of the Wikimedia movement, and one of our foremost concerns here at the Foundation. To that end, protecting the personal information of users, donors, and community members around the globe is a top priority.

Today, we are pleased to announce the latest update of our Donor Privacy Policy, which provides donors with more information about the data we collect, and how we handle and protect it. We believe that when you entrust us with your data, we have a responsibility to remain transparent about and accountable for our information-handling procedures.

The updated policy enumerates the Foundation’s practices for collecting, using, maintaining, protecting, and disclosing donor information in more detailed and robust terms. Little has changed substantively from the previous policy, which was last updated in 2011, but this version is more comprehensive and clearer.

Some key features of the updated policy are as follows:

  • The policy covers the collection, transfer, processing, storage, disclosure, and use of personal and non-personal information collected during the process of making a donation to the Wikimedia Foundation, or interacting with donation- and fundraising-related sites, emails, and banners.
  • The policy does not cover other uses of Foundation projects or sites (such as Wikipedia) which continue to be covered by our main Privacy Policy. The new policy also does not cover donations to or other interactions with Wikimedia affiliates or chapters, which are separate entities. Their handling of donor data is covered by their respective privacy policies and practices.
  • The policy affirms that the Foundation will never sell, trade, or rent the nonpublic personal information of donors.
  • The policy enumerates the purposes for which donor information may be used and strictly limits the circumstances under which we may share donor information (such as with online payment processors, or with a donor’s permission).
  • The policy recognizes that donors may wish to remain anonymous, and that the Foundation strives to honor such requests and preserve donor anonymity when possible.

With the launch of the new Donor Privacy Policy, which goes into effect today, February 1, 2017, the Wikimedia Foundation reinforces our commitment to transparency and the protection of donor privacy. If you have questions about the policy, please email us at privacy@wikimedia.org.

Aeryn Palmer, Legal Counsel
Michael Beattie, Donor Services Manager

Special thanks to Legal Fellow Tarun Krishnakumar for assistance with the preparation of this blog post.

by Aeryn Palmer and Michael Beattie at February 01, 2017 07:07 PM

Gerard Meijssen

Leupp, Arizona


According to the category system of the English Wikipedia, Leupp, Arizona is a concentration camp. From 1907, Leupp was the headquarters of the Leupp Indian Land, and it was only for a short time during the Second World War that people were imprisoned there.

It is kind of ironic that a Wikipedia article that explains the involvement of the Navajo also registers it as a concentration camp. It is probably why another project is necessary to document more fully what it is all about. Then again, it would be cool to collaborate with them and include all the information in Wikipedia.
Thanks,
       GerardM

by Gerard Meijssen (noreply@blogger.com) at February 01, 2017 06:41 AM

Wiki Loves Monuments

Wiki Loves Monuments 2016 statistics

In 2016, Wiki Loves Monuments (WLM) was a top-ranking Wikimedia community initiative in terms of attention raised. In this post, we provide further statistics on the 2016 contest.

Overall contest growth in a nutshell

All key metrics showed significant growth compared to the previous year: the number of participating countries (more than 30% increase), the number of participants (62% increase), the number of first-time contributors to Wikimedia projects (78% increase), and the number of photos uploaded (20% increase). This year, we also welcomed eight first-time participating countries from very different parts of the world (100% increase).

Countries

In 2016, 42 national competitions participated in the WLM contest, 9 more than in 2015; the 2016 contest ranks second after 2013, when 51 national competitions participated (see first table). Eight countries participated for the first time (Bangladesh, Georgia, Greece, Malta, Morocco, Nigeria, Peru and South Korea), while seven countries participated for the 6th time (Belgium, France, Germany, Norway, Russia, Spain, and Sweden).

Uploads

A total of 277,406 (and counting) images were uploaded in 2016, which is 20% more than in 2015.

WLM – Uploads per year

Germany, with almost 14% of the total number of image uploads, was the top country in terms of the number of uploads (38,809 photos were uploaded for Germany’s contest), closely followed by India and Ukraine. In the following chart you can see the number of uploads by country.

WLM 2016 – Uploads by country

While we are reporting the 2016 upload counts per country, it is also interesting to look at the cumulative number of uploads by country since 2010, when WLM started. See the chart below.

WLM – Cumulative uploads by country

Contributors

In the context of WLM, contributors are those who upload at least one image to the contest. In 2016, India and the United States excelled in the number of uploaders: 1,784 and 1,783 uploaders, respectively. Below you can see the number of contributors by country, and contributors by country, year by year (top 10).

WLM 2016 – Contributors by country

WLM 2016 – Contributors by country, year by year (top 10)

Edit activity on Commons

Every year the WLM contest brings peak activity on Commons in the month of September. The second peak earlier in the year, mostly since 2014, is the result of the Wiki Loves Earth contest.

The plot below shows the overall file upload activity on Commons, broken down into two categories: bot uploads and manual uploads. The spike in manual uploads in September is due to Wiki Loves Monuments, and while bot uploads had been pretty flat over the past three years, we observe a spike in bot uploads starting in September 2016.

———————————————-

  • This post was initially written and posted at http://infodisiac.com/blog/2017/01/wiki-loves-monuments-2016/ by Erik Zachte. The current post is a slight adaptation of that post by Erik for the Wiki Loves Monuments blog.
  • The featured image is by Wolfgang Beyer, published under CC BY-SA 3.0, at https://commons.wikimedia.org/wiki/File:Mandel_zoom_11_satellite_double_spiral.jpg

by Leila at February 01, 2017 04:58 AM

January 31, 2017

Wiki Education Foundation

Welcome, Mahala Stewart!

Over the past few months, we’ve been conducting research to evaluate student learning outcomes of Wikipedia-based assignments. I’m pleased to announce that Mahala Stewart has joined Wiki Ed as a Research Assistant to analyze and interpret survey and focus group data. She will work closely with Research Fellow Zach McDowell, who has been leading the project.

Mahala is a PhD candidate at the University of Massachusetts Amherst in the Department of Sociology, where her dissertation research examines mothers’ experiences of making schooling decisions for their children. She has also been involved with research projects studying interracial couples’ residential and schooling choices and the experiences of childfree adults. She has assisted and taught a range of courses while at UMass, and is currently collaborating on a forthcoming reader, Gendered Lives, Sexual Beings: A Feminist Anthology.

She said she was drawn to Wiki Ed’s student learning outcomes research because of her interest in creative open access approaches to teaching and knowledge production. In addition to analyzing the data, she will be working on written reports and plans to collaborate with others on journal articles based on the research.

Outside of academic work, Mahala enjoys four-season New England hiking and exploring the western Massachusetts live music scene.

Please join me in welcoming Mahala!

by Ryan McGrady at January 31, 2017 09:22 PM

Wikimedia Foundation

Community digest: Dutch Wikipedia rewards prolific contributors with owls; news in brief

One of the German WikiEules, which inspired the Dutch Wikipedia community. Photo by Kolossos, CC BY-SA 4.0.

In a new year’s celebration event in Leiden, in the Netherlands, over one hundred Wikipedians gathered to celebrate their success stories from 2016, show appreciation to those who helped make it a success, and discuss what can be improved in 2017.

During the event, the Dutch Wikipedia community awarded seven stone owl statues to the community-nominated best contributors of the year.

“Our fellow Wikipedians from Germany had already chosen owls,” says Romaine, one of the main organizers of the event who had the idea and worked on implementing it on the Dutch Wikipedia. “And owls stand for wisdom and knowledge, which seems to us appropriate for Wikipedia contributors.”

The Dutch WikiUilen (WikiOwl) idea is an adaptation of the German Wikipedia’s WikiEule. Romaine saw the WikiEules being awarded to prolific Wikipedians during WikiCon 2014 in Cologne, Germany, and he worked with fellow Wikipedian Taketa to make it happen. The prizes have been awarded regularly over the last two years.

Every year in November, any contributor with a minimum of 100 edits to Wikipedia articles can nominate any other user to receive an owl. The community reviews the candidates’ contributions and sends in their votes, by secret ballot, to a special account the organizers manage.

“To ensure neutrality, we have excluded ourselves completely,” Romaine notes. “We can’t nominate, we can’t vote and we can’t be nominated.”

In January, candidates attend the award ceremony without knowing who the prizes will be awarded to. When the names are announced, WikiOwl holders from the previous year call the selected candidates’ names and hand them the owl statues. In case last year’s holder can’t make it to the event or is nominated again, another Wikipedian takes their place in dispensing the award. The surprise factor, in addition to the effort made to make contributors feel appreciated, plays a vital role in this process.

“Without recognition, people may lose their motivation and quit editing.”

In Brief

Wikipedia Day 2017: On 15 January, celebration events were held in different cities around the world to mark Wikipedia’s 16th birthday. Wikipedians, educators, students, and supporters of the projects were there to mark Wikipedia’s anniversary. Read more about Wikipedia Day on Wikipedia. Photos from the events held in eight countries are available on Wikimedia Commons.

New MediaWiki extension created by a student: Harri Alasi, a student at Tartu University in Estonia, has created a new extension for MediaWiki, the free open-source software used on Wikimedia projects and other websites. The new extension enables users to view and manipulate 3D objects in STL format.

Phabricator updates: On Phabricator, the platform used by Wikimedians to track bug reporting and software development, static items in the top bar’s dropdown menu have been replaced by “Favorites,” where a user can adjust their preferred links.

AffCom welcomes new members: The Wikimedia Affiliations Committee (AffCom), the body charged with advising and recommending the recognition of new movement affiliates, has announced its new members. Camelia Boban, Kirill Lokshin and Satdeep Gill will serve as members of the committee for 2017–19.

Conference grants program deadlines announced: Applications for the newly introduced conference grants program will be open three times a year. Round one applications are now being accepted through 26 February and the decisions will be announced on 24 March. The next application opening will be on 28 May 2017. More information can be found on Meta.

Iraqi Wikimedians kick off a series of editing workshops in Baghdad: Last week, the Iraqi Wikimedians user group held a new workshop on editing basics for new users in the Iraqi capital. Over the past year, the user group organized several introductory workshops in the city of Erbil before heading south to extend the effort to Baghdad.

Wikipedian Pino dies: Pino was an active editor and administrator on the Esperanto Wikipedia. He joined the movement in 2008 and made every effort to support the Esperanto Wikipedia and its community, making over 37,000 edits on Wikimedia projects. Pino died on 19 January in Beaune, France.

Samir Elsharbaty, Digital Content Intern
Wikimedia Foundation

by Samir Elsharbaty at January 31, 2017 07:13 PM

Knowledge knows no boundaries

Photo by NASA, public domain/CC0.

At the Wikimedia Foundation, our mission was born of a belief that everyone, everywhere, has something to contribute to our shared human understanding. We believe in a world that encourages and protects the open exchange of ideas and information, community and culture; where people of every country, language, and culture can freely collaborate without restriction; and where international cooperation leads to common understanding.

The new U.S. administration’s executive order on immigration is an affront to this vision. It impedes the efforts of our colleagues and communities who work together from around the world to make shared, open knowledge accessible to all. When our ability to come together across borders is restricted, the world is poorer for it.

Knowledge knows no borders. Our collective human wisdom has long been built through the exchange of ideas, from our first navigational knowledge of the seas to our ongoing exploration of the heavens. When one society has stumbled and slipped into ignorance, others have preserved our records and archives, and built upon them. Throughout the Early Middle Ages in Europe, scholars in Baghdad kept alive the writings of Greek philosophers. These meticulous studies, along with the discoveries of Persian and Arab mathematicians, would in turn help spark the intellectual renaissance of Europe.

Wikipedia is an example of what is possible when borders do not hinder the exchange of ideas. Today, Wikipedia contains more than 40 million articles across nearly 300 languages. It is built one person at a time, across continents and languages. It is built through collaboration in person and in communities, at international gatherings of ordinary individuals from around the world. These collaborative efforts serve hundreds of millions of people every month, opening up opportunity and education to all.

The Wikimedia Foundation is headquartered in the U.S., where we have unique freedoms that are essential to supporting the Wikimedia projects. But our mission is global. We support communities and projects from every corner of the globe. Our staff and community members need to be able to move freely in order to support this global movement and foster the sharing of ideas and knowledge, no matter their country of origin.

We strongly urge the U.S. administration to withdraw the recent executive order restricting travel and immigration from certain nations, and closing the doors to many refugees. It threatens our freedoms of inquiry and exchange, and it infringes on the fundamental rights of our colleagues, our communities, and our families.

Although our individual memories may be short, the arc of history is long, and it unfurls in a continuous progression of openness. At the Wikimedia Foundation, we will continue to stand up for our values of open discourse and international cooperation. We join hands with everyone who does.

Katherine Maher, Executive Director
Wikimedia Foundation

by Katherine Maher at January 31, 2017 02:07 AM

January 30, 2017

Wikimedia UK

Wikimedia UK and National Library of Scotland announce new Gaelic post

The Callanish stones, a prehistoric site on the Western Isles of Scotland – Image by lolaire~commonswiki

The Gaelic language is to be promoted through one of the world’s most popular websites thanks to a new role based at the National Library of Scotland.

Dr Susan Ross, who learned Gaelic as a teenager and has since gained a doctorate in Gaelic studies, has been appointed the world’s first Gaelic Wikimedian in Residence. This year-long Wikimedian in Residence post will see her working with the Gaelic community across Scotland to improve and create resources on Uicipeid, the Scottish Gaelic Wikipedia.

Wikipedia is the world’s most popular online encyclopaedia, of which Uicipeid forms one part. Uicipeid has existed since 2004 and currently has more than 14,000 pages of information in Gaelic. Dr Ross will work with the existing community of users to identify priorities for development and encourage new users to begin contributing.

Over the coming year Dr Ross will collaborate with Gaelic speakers, community groups and organisations to improve Uicipeid content by offering training and edit-a-thons. The work will also seek to promote use of the extensive Gaelic resources held by the National Library of Scotland, many of which can be accessed online.

Dr Ross, who has been contributing to Uicipeid since 2010, said: ‘Contributing to Gaelic Wikipedia builds a 21st century information source where knowledge, in Gaelic, about both the Gaelic world and the wider world, can be stored and shared. It is a great opportunity for Gaelic speakers to exercise reading and writing skills in a creative, informal, collaborative environment and I’m excited about the possibilities to get more people involved.’

Gill Hamilton, Digital Access Manager for the National Library, said: ‘We were impressed by the number of high quality applications we received from Gaelic speakers to fill this role, which demonstrates the regard in which it is held. Susan emerged as the best candidate and we look forward to working with her as she develops this exciting role.’

Daria Cybulska, Head of Programmes and Evaluation at Wikimedia UK said: ‘Issues of diversity and equality are central to Wikimedia UK’s vision and we work to enable people from all ethnic and linguistic backgrounds living in the UK and beyond to enjoy increased access to their own heritage. This project will be crucial in addressing this focus, and we are really looking forward to supporting it.’

The initiative is a partnership between the National Library of Scotland and Wikimedia UK, the charity that supports and promotes the free online encyclopaedia. It is supported by grants from Bòrd na Gàidhlig, the agency responsible for promoting Gaelic language throughout Scotland and internationally, and Wikimedia UK.

The National Library has some of the best collections of Gaelic material anywhere in the world and has been working hard in recent years to make much of this material available online. This material demonstrates the key role played by Gaelic in Scottish history and culture.

by John Lubbock at January 30, 2017 04:03 PM

January 29, 2017

Sam Wilson

Wikisource hangout notes

The notes from the Wikisource hangout last night are now on Meta.

by Sam Wilson at January 29, 2017 09:05 AM

Gerard Meijssen

#Wikimedia - We shall overcome

I am a #Wikimedian. I am probably one of the prolific ones, with edits in many projects. I say this not to brag but to make clear that for me, Wikimedia and what we stand for is integral to who I am.

With the authorities of the United States preventing members of our community from coming to the United States, we are fractured in a profound way. It makes our headquarters problematic, because many key people are no longer welcome to visit; they cannot get there.

One aspect of "the enemy" is that for many people it is not clear who "the enemy" is. Given that we are about a neutral point of view, this is an area where we should make a difference. Documenting our foe is not what I seek; that will be done but let us document all the countries and people whose citizens, whose refugees are no longer officially welcome in the USA. Let us document them with articles in Wikipedia, with images in Commons, with data in Wikidata but let us be clear we shall overcome and we are all welcome at any of our offices, our chapters and in our community.
Thanks,
        GerardM

by Gerard Meijssen (noreply@blogger.com) at January 29, 2017 08:34 AM

January 28, 2017

Wikimedia India

Fifth AGM of Wikimedia India Chapter

An AGM is an annual general meeting, or General Body Meeting, conducted once per year, where members of the Wikimedia India chapter gather to discuss, audit, and report on the chapter’s functioning. The Wikimedia India Chapter, which was approved by the Chapters Committee of the Wikimedia Foundation in 2010, is registered under the Karnataka Societies Registration Act 1960, which requires the chapter to conduct an AGM along with an audit of the accounts every year. A notice of an AGM is generally announced 21 days prior to the date of the Annual General Meeting and sent to all the members of the chapter. During the annual general meeting, as per the Karnataka Societies Registration Act 1960, the election of executive committee members of the chapter, submission of annual account reports, and restructuring of rules and policies are generally agreed upon.

The 5th annual general meeting is scheduled to take place on 29 January 2017 at the chapter’s office located at:

Work Adda, No. 98/1, MMR Plaza, 1st Floor, Above DCB Bank, Sarjapur Main Road, Jakkasandra, Koramangala 1st Block, Bangalore

The meeting and discussion during the 5th AGM will reflect the chapter’s work during the fiscal year 2014-15, including approval of the report of the activities of the Wikimedia India Chapter, consideration and approval of the audited accounts, and consideration and approval of the results of the executive committee elections conducted in November 2015. The draft budget (2015-16) and the appointment of auditors for 2015-16 will also be considered. A discussion on extending membership validity from 1 year to 5 years will take place during the meeting. Future projects and the chapter’s learnings from past events will be summarized, and any other subject will be discussed with the permission of the chair.

by Jim Carter at January 28, 2017 05:33 AM

Wiki Education Foundation

Roundup: Food Browning

If the only ingredients in caramel are sugar and water, why does it have a taste and smell different from sugar? Why do bananas get darker as they ripen? How do caramelized onions get so sweet? Why do people have different opinions about the center of a piece of bread as opposed to its crust? Why would someone prefer a browned piece of meat over a similarly cooked, but visibly lighter, piece?

To understand the answers to these questions requires an explanation of the different chemical reactions involved in “food browning”. The brown color of ripe bananas, bread crust, and caramelized onions, for example, is the result of three different processes — the first is enzymatic, and the other two have to do with the rearranging or breakdown of amino acids or sugars at different temperatures.

Food browning is something that affects what we eat every day, but if you had looked for information about it on Wikipedia back in September, you’d have seen a short article with no citations or references. Then it was improved by a student in Heather Tienson’s Introduction to Biochemistry class at UCLA. Now it’s three times the size, complete with citations to reliable scientific sources.

Another student in the class substantially improved Wikipedia’s biography of African American nutritional chemist and former Howard University dean, Cecile Hoover Edwards. One of the more difficult, more frequently underdeveloped parts of science biographies is the section on the person’s research contributions — yet those contributions are typically the very reason we have a biography about them in the first place. For the article on Edwards, the student focused almost entirely on building out the section on her work, such as her extensive efforts to identify low-cost foods for optimal protein production.

Other stand-out work from this class included expansions of articles on the proteins VDAC1 and SCN8A, the process of protein folding, and biographies of pharmacologist Nancy Zahniser and University of Colorado professor Natalie Ahn.

There’s a lot of great chemistry content on Wikipedia, but also a whole lot of room for improvement. Some important topics have articles which are underdeveloped, outdated, missing references, or missing altogether. Students are of an age when they’ve begun to grasp the material, but they also remember what it’s like not to have the necessary vocabulary. In that way, they are well suited to writing on Wikipedia, where they not only research class topics but communicate them to a general audience. To learn more about how to get involved, send us an email at contact@wikiedu.org or visit teach.wikiedu.org.

Photo: Barangan banana Indonesia.JPG by Midori, CC BY-SA 3.0, via Wikimedia Commons.

by Ryan McGrady at January 28, 2017 12:50 AM

January 27, 2017

Wikimedia UK

Wikimedia UK Education Summit #WMUKED17

Wikipedia in Education meetup – Image by Josie Fraser

Blog post by Josie Fraser, educational technologist and trustee of Wikimedia UK

If you would like to attend, please sign up on the Eventbrite page.

The Wikimedia UK Education Summit takes place on February 20th at Middlesex University, London, in partnership with the University’s Department of Media.

It follows on from the successful 2016 Wikimedia UK Education Meetup. Wikimedians and educators working in schools, colleges, higher education and adult education met in Leicester to help inform the work of Wikimedia UK in relation to education, and connect to others using (or wanting to use) Wikimedia projects. The day showcased educators supporting learning and actively engaging learners using a range of projects, including Wikipedia, Wikisource and Wikidata.

This event will continue to build connections and share expertise in relation to Wikimedia UK’s work in formal education. Everyone is welcome – whether you are just getting started and want to find out more about how Wikimedia projects can support education, or you are an established open education champion!

Why should educators attend?

The day will open with two talks. Melissa Highton (Director of Learning, Teaching and Web Services, University of Edinburgh) will talk about the benefits of appointing a Wikimedian in Residence. If your institution is looking for an effective, affordable and innovative way of actively engaging students and supporting staff development through real-world knowledge projects, this is a not-to-be-missed talk!

Stefan Lutschinger (Associate Lecturer in Digital Publishing, Middlesex University) will talk about incorporating Wikipedia editing into the university curriculum. Stefan will cover the practical experience of using Wikimedia projects with formal learning communities.

There will be a range of workshops throughout the day – ideal for those looking for an introduction to specific projects, or to brush up on their skills. Workshops include Wikidata, Wikipedia in the Classroom (and using the Education Dashboard), and how to maximise the potential of a Wikimedian in Residence in a university setting. There will also be a session looking at identifying and curating Wikimedia project resources for educators, helping to support others across the UK. Alongside all of this will be a facilitated unconference space for attendees to discuss subjects not covered by the planned programme.

Please consider signing up here for a lightning talk (of up to five minutes) to share projects and ideas, or email karla.marte@wikimedia.org.uk.

What can Wikimedia UK offer educators?

Wikimedia UK is the national charity for the global Wikimedia movement and enables people and organisations to contribute to a shared understanding of the world through the creation of open knowledge. We recognise the powerful and important role formal education can and does play in relation to this, but also the challenges sometimes faced by educators in relation to institutional adoption and use of Wikimedia projects, including Wikipedia.

This summit offers educators and Wikimedians in the UK the opportunity to work together to help learners and organisations connect and contribute to real world projects and to the global Wikimedia community.

Wikimedia UK can support educators in a wide range of ways: providing events, training, support, connecting communities to volunteers, and helping identify potential project funding.

Can’t make the summit, but want to be involved?

Become a Wikimedia UK member – membership is only £5 per year and provides a range of practical benefits – directly supporting the work of the organisation to make knowledge open and available to all, and keeping you informed about Wikimedia UK events, activities and opportunities. You can join online here.

by Josie Fraser at January 27, 2017 11:35 PM

Wiki Education Foundation

Thanks for the code contributions, GCI students!

The Wiki Ed Dashboard got 20 improvements over the last two months from five young coders participating in Google Code-In, a contest that gets pre-university students involved in open source software development. Their work ranged from new features, to accessibility and performance improvements, to bug fixes, to new automated code tests, to expanded documentation on how to set up the development environment, to “code quality” work that makes the system easier for others to understand and change later. And all of these are live on dashboard.wikiedu.org now!

As a mentor participating in the Code-In alongside others in the Wikimedia tech community, I spent some time in November identifying a few coding tasks that were beginner-friendly. I wasn’t sure what to expect. The Dashboard uses a very different set of technologies from most Wikimedia projects, and in the past, just getting a development environment up and running has been a stumbling block for both newbies and veteran developers. I had recently put some effort into streamlining the setup instructions, but for the Code-In I expected to put a lot of time into simply helping people get set up. But after the first week, I realized that these students were more than capable of getting the system up and running — and that I’d need to find more — and more challenging — tasks for them. I enjoyed seeing young minds exploring Ruby — the programming language I’ve become quite fond of through my work on the Dashboard.

The student contributions didn’t go unnoticed among my Wiki Ed colleagues, either. Jami thanked me the other day for the “LIFE CHANGING” addition of some extra data about not-yet-submitted courses on the Dashboard. I had to tell her she should be thanking two of the Code-In students, who had done that work.

So thanks again to all the students who helped improve the Wiki Ed Dashboard, and thanks as well to the Wikimedia Developer Relations team for facilitating it!

The Wiki Ed Dashboard is free software, which anyone may use, study, modify and share. We develop it in the open, and we welcome anyone with the skills to help us improve it. Our code powers not only Wiki Ed’s dashboard.wikiedu.org, but also the global Wikimedia Programs & Events Dashboard that supports Wikipedia editing projects anywhere, in any language. If you’re looking for an impactful, socially relevant software project to contribute to, give me a ping at sage@wikiedu.org.

Google Code-in and the Google Code-in logo are trademarks of Google Inc.

by Sage Ross at January 27, 2017 09:41 PM

Weekly OSM

weeklyOSM 340

01/17/2017-01/23/2017

Mapping

  • Chethan Gowda from Mapbox writes about the OSMIC JOSM map style, which enables icons on all POIs when editing OpenStreetMap.

  • A user is asking in the OSM Forum if there is an alternative to Bing for Malaysia as large areas are not covered yet.

  • User Wille from Brasilia made available the GPS tracks collected by the Brazilian Environmental Protection Agency. To use this data in ImproveOSM, Martijn van Exel tweaked the algorithm for recognizing missing roads; ImproveOSM now contains many unpaved roads in Brazil. ImproveOSM works with the iD editor, and a JOSM plugin is also available.

  • A tweet from Brundritt informs us that Bing Maps has been updating its imagery. According to the tweet, this process should be completed in a few months.

  • On talk-GB, Andrew Hain reports that a mapper added names to landuse=residential and landuse=commercial polygons in south west London (UK). This mapper did not respond to the changeset comments posted by Hain indicating that such names belong in the description, not on the polygons themselves.

  • Joost Schouppe asks which tagging scheme for dog toilets he should publish as a proposal for voting.

  • On the Tagging mailing list, there is a discussion about OSM tags for public transport cards data, which are gradually replacing transport tickets, according to user Warin.

  • On the Tagging mailing list, Martijn van Exel is asking about destination:street tags, which the Telenav mapping team noticed on (mostly) motorway_link off-ramps in Canada. It’s an undocumented sub-tag of the destination tag. Van Exel is asking how it is being used and whether there is some sort of consensus documented somewhere other than the OSM wiki.

  • Joost Schouppe raises the discussion about shop=fuel, which was already mentioned here. The issue concerns shops that sell fuel but are not fuel stations; Joost proposes explicitly identifying the products such shops sell.

  • Mapbox updated the basemap imagery in Washington, DC with 2015 aerial imagery at 3 inch (7.5 cm) resolution. Great for mapping (make mapping great again), and for counting people when you need alternative facts.

Community

  • Escada interviews Steve All from California (USA), for his Mapper of the Month series. The interview is published on the new website of the Belgian OSM community.

  • According to Pascal Neis, while the number of OSM mappers is increasing around the world, it is decreasing in Germany.

  • Are 25,000 mappers enough for Germany? Enough for the urban areas, but in the rural ones there is still much to do, a weeklyOSM editor says.

Imports

  • Michael Spreng asks on the imports mailing list about the import of addresses in the Canton of Bern (GEBADR list). There has been no feedback concerning it. The import was refined, and the bulk import is starting, which is expected to take some time. Building layouts are going to be improved when possible.

Humanitarian OSM

  • Pascal Neis comments on a tweet by Russell Deffner about the validation process of Missing Maps. This process apparently produces a lot of OSM changes.

  • Logistics Cluster updates its access constraints map for South Sudan every Friday. This should be of special interest for humanitarian deliveries in the area.

Maps

  • Paul Norman suggests a simple extension to CartoCSS which would decrease the size of our main page’s style by about one third.

  • Molly Lloyd from Mapbox teamed up with some organizers from the Women’s March to create a map where you can find all the Women’s March events in different cities and countries.

  • Take back the tech! uses technology to end violence against women and encourages activism against gender-based violence. Using an OSM-based map, people can report cases of violence against women from all over the world.

  • The issues faced by people creating symbols for map styles aren’t always appreciated.

  • Stephan Bösch-Plepelits showcases PTMap on the Dev mailing list, a public transport map rendering based on the Overpass API, which renders route relations according to the ‘new’ PTv2 scheme.

  • Joost Schouppe shows in his diary how road mapping (tag highway=*) evolved in Brussels.

switch2OSM

Open Data

  • The international Open Data Hackathon will take place on March 4, 2017. The map was broken at our editorial deadline.

  • Weather data produced by the German Weather Service (DWD) could be freely available in the future, according to an article (de) in Spiegel Online. (automatic translation)

Licences

  • Due to copyright issues with Mapbox, the OSM2VectorTiles project was discontinued. The authors have created a successor, OpenMapTiles, with its own vector tile scheme and free from legal problems.

Software

Programming

  • For the Google Summer of Code 2017, project proposals are being collected in the Wiki.

  • Geofabrik reports in a blog post how they recently improved referential integrity in their extracts. Switching to the latest osmium version made this possible, with only a marginal impact on file size while cutting down processing time.

Releases

OSM in the media

  • Mapanica, the OSM community in Nicaragua, proposes a new project to help improve the frequency data of public transport, in order to create a system that allows people to better plan their trips in the city of Managua.

  • German TV channel GRIP_RTL2 used stamen’s great-looking #watercolor OpenStreetMap style for Romania in yesterday’s episode (via pascal_n). No attribution was mentioned.

Other “geo” things

  • This is how buildings look when OSM enthusiasts are rebuilding their house.

  • Geospatial World wrote about DigitalGlobe’s AComp: "When a satellite takes an image, the light reflecting from the ground is impacted by the atmosphere and can affect the visual aesthetics of the image. That’s where DigitalGlobe’s Atmospheric Compensation (AComp) steps in."

  • Carlos Felipe Castillo announced: "The new private beta from Blueshift has arrived!" Blueshift is a fun and easy tool for creating dynamic maps.

  • Yorokobu from Spain featured the nice maps from Axis Maps.

  • The European Space Agency’s Galileo satellites have been stricken by mysterious clock failures.

  • Eric Gundersen shows a satellite image of Barack Obama’s presidential inauguration in 2009 by GeoEye, now DigitalGlobe.

  • Open Stats from India notes that Uber’s OpenData Platform is not really Open Data. They call it #openwashing.

Upcoming Events

This weeklyOSM was produced by Hakuch, Peda, Polyglot, Rogehm, SeleneYang, Spec80, YoViajo, derFred, jinalfoflia, keithonearth, vsandre, wambacher, widedangel.

by weeklyteam at January 27, 2017 11:58 AM

Shyamal

Moving Plants

All humans move plants, most often by accident and sometimes with intent. Humans, unfortunately, are only rarely moved by plants. 

The history of plant movements has often been difficult to establish. In the past, the only way to tell a plant's homeland was to look at the number of related species in a region for clues on origin. This idea was firmly established by Nikolai Vavilov before he was sent off to his unfortunate death in a Soviet prison. Today, the genetic relatedness of plants can be examined by comparing chosen DNA sequences and, among individuals of a species, the sequence locations that are most variable. Some recent studies on individual plants and their relatedness have provided some very interesting glimpses into human history. A 2015 study establishing the East African geographic origins of baobabs in India, and a 2011 study of coconuts, are hopefully just the beginning. These demonstrate ancient human movements which have never received much attention in the telling of history.

Unfortunately there are a lot of older crank ideas that untrained readers can find difficult to separate from sound scholarship. I recently stumbled on a book by Grafton Elliot Smith, a Fullerian professor who succeeded J.B.S. Haldane but descended into crankdom. The book "Elephants and Ethnologists" (1924) can be found online, and it is just one among several similar works by Smith. It appears that Smith used a skewed and misapplied cousin of Dollo's Law: according to him, cultural innovations tended to occur only once and were then carried along with human migrations. Smith was subsequently labelled a "hyperdiffusionist", a disparaging term used by ethnologists. When he saw illustrations of Mayan sculpture he envisioned an elephant where others saw at best a stylized tapir. Not only were they elephants, they were Asian elephants, complete with mahouts and Indian-style goads, and he saw this as definite evidence for an ancient connection between India and the Americas! An idea that would please some modern-day cranks and zealots.

Smith's idea of the elephant, as emphasised by him.
The actual stela in question

"Fanciful" is the current consensus view on most of Smith's ideas, but let's get back to plants.

I happened to visit Chikmagalur recently and revisited the beautiful temples of Belur on the way. The "Archaeological Survey of India-approved" guide at the temple did not flinch when he described an object in one of the hands of a carving as being maize. He said maize was a symbol of prosperity. Now maize is a crop that was imported to India, by most accounts only after the Portuguese sea incursions into India beginning in 1498. In the late 1990s, a Swedish researcher identified similar carvings (actually another one, at Somnathpur) from 12th century temples in Karnataka as being maize cobs. This claim was subsequently debunked by several Indian researchers from IARI and from the University of Agricultural Sciences, where I was then studying. An alternate view is that the object is a mukthaphala, an imaginary fruit made up of pearls.
Somnathpur carvings. The figures to the
left and right hold the purported cobs.
(Photo: G41rn8)

The pre-Columbian oceanic trade ideas, however, do not end with these two cases from India. The third story (and historically the first, from 1879) is that of the sitaphal or custard apple. The founder of the Archaeological Survey of India, Alexander Cunningham, described a fruit in one of the carvings from Bharhut, a fruit that he identified as custard-apple. The custard-apple and its relatives are all from the New World. The Bharhut Stupa is dated to 200 BC, and the custard-apple, as quickly pointed out by others, could only have been in India post-1492. The Hobson-Jobson has a long entry on the custard apple that covers the situation well. In 2009, a study raised the possibility of custard apples in ancient India. The ancient carbonized evidence is hard to evaluate unless one has examined all the possible plant seeds and what remains of their microstructure. The researchers, however, establish a date of about 2000 B.C. for the carbonized remains and attempt to demonstrate that they look like the seeds of sitaphal. The jury is still out.
I was quite surprised that there are not many writings that synthesize and comment on the history of these ideas on the Internet, and, somewhat oddly, I found no mention of these three cases in the relevant Wikipedia article on pre-Columbian trans-oceanic contact theories (naturally, now fixed with an entire new section).

There would be value in someone putting together a collation of plant introductions to India, along with sources, dates and locations of introduction. Some of the old specimens of introduced plants may well be worthy of further study.

Introduction dates
  • Pithecellobium dulce - a Portuguese introduction from Mexico to the Philippines, and to India on the way, in the 16th century. The species was described by William Roxburgh from specimens taken from the Coromandel region (i.e. type locality outside the native range).
  • Eucalyptus globulus? - There are some claims that Tipu planted the first of these (see my post on this topic). It appears that the first person to move eucalyptus plants (probably E. globulus) out of Australia was Jacques Labillardière, who was surprised by the size of the trees in Tasmania. The lowest branches were 60 m above the ground and the trunks were 9 m in diameter (27 m circumference). He saw flowers through a telescope and had some flowering branches shot down with guns! (original source in French) His ship was seized by the British in Java around 1795 and released in 1796. All subsequent movements seem to have been post-1800 (i.e. after Tipu's death). If Tipu Sultan did indeed plant the Eucalyptus here, he must have got it via the French through the Labillardière shipment. The Nilgiris were apparently planted up starting with the work of Captain Frederick Cotton (Madras Engineers) at Gayton Park(?)/Woodcote Estate in 1843.
  • Muntingia calabura - when? - I suspect that flowerpecker populations boomed after this.
  • Delonix regia - when?
  • In 1857, Mr New from Kew was made Superintendent of Lalbagh, and in the following years he introduced several Australian plants from Kew, including Araucaria, Eucalyptus, Grevillea, Dalbergia and Casuarina. Mulberry plant varieties were introduced in 1862 by Signor de Vicchy. The Hebbal Butts plantation was established around 1886 by Cameron, along with Mr Rickets, Conservator of Forests, who became Superintendent of Lalbagh after New's death; it included rain trees, ceara rubber (Manihot glaziovii), and shingle trees(?). Apparently Rickets was also involved in introducing a variety of potato (kidney variety) which came to be named "Ricket". - from Krumbiegel's introduction to "Report on the progress of Agriculture in Mysore" (1939)

Further reading
  • Johannessen, Carl L.; Parker, Anne Z. (1989). "Maize ears sculptured in 12th and 13th century A.D. India as indicators of pre-columbian diffusion". Economic Botany 43 (2): 164–180.
  • Payak, M.M.; Sachan, J.K.S (1993). "Maize ears not sculpted in 13th century Somnathpur temple in India". Economic Botany 47 (2): 202–205. 
  • Pokharia, Anil Kumar; Sekar, B.; Pal, Jagannath; Srivastava, Alka (2009). "Possible evidence of pre-Columbian transoceanic voyages based on conventional LSC and AMS 14C dating of associated charcoal and a carbonized seed of custard apple (Annona squamosa L.)". Radiocarbon 51 (3): 923–930.
  • Veena, T.; Sigamani, N. (1991). "Do objects in friezes of Somnathpur temple (1286 AD) in South India represent maize ears?". Current Science 61 (6): 395–397.

by Shyamal L. (noreply@blogger.com) at January 27, 2017 02:47 AM

January 26, 2017

Wikimedia Foundation

A history of the finger, as seen through Wikipedia


The first visually documented use of the finger—look at the man standing in the back row, far left side. Photo by unknown, public domain/CC0.

Last Tuesday, English Wikipedia editor Muboshgu left work and jumped on the highway to get home. Along the way, a driver in the lane next to him decided to merge without checking to see if any other cars—like, say, Muboshgu’s—were occupying the space.

Seeing this, Muboshgu took action. “First he got the horn, then he got the bird,” he told me.

That “bird,” a colloquial term for giving someone the finger, has become an indelible symbol of contempt in Western culture.

Naturally, 24 different-language Wikipedias have an article on it.

Muboshgu played a significant role in expanding the finger’s English Wikipedia article to where it is today. He nominated it for “good article” status, a marker of quality awarded only after a peer review from a fellow editor, and it appeared on Wikipedia’s front page in the “Did you know?” section.

He came across the finger’s article in July 2012 when rewriting the biography on Phil Nevin, a former pro baseball player who gave the finger to a heckling fan in 2002. “In linking to the article, I saw how short it was,” he said. “I knew that such a widely used gesture with the implications and reactions associated with it deserved a longer article.”

Indeed, Muboshgu’s research showed that the finger’s origins go back to Ancient Greece and Rome, where in the latter it was known as digitus impudicus—the “shameless, indecent or offensive finger.” The first modern visual evidence of the finger comes from the United States, where in 1886 a baseball pitcher named Charles “Old Hoss” Radbourn was photographed giving it to his team’s rivals. There is no information available on whether he faced any repercussions, but he did make it into the sport’s Hall of Fame. Ironically, just over a century later, a baseball executive resigned after giving the finger to a fan on Fan Appreciation Night.

Although the use of the finger has been on the rise in recent decades, its use is still controversial. “People have varying opinions about items of cultural phenomena such as this,” Muboshgu says, as “it can produce visceral reactions in people, which can cloud judgment.”

This extends to Wikipedia as well; while writing an article on the finger, Muboshgu anticipated potential backlash from other editors’ dislike of the obscene gesture. While the English Wikipedia has a strong policy against censorship, its exact interpretation can be and has often been debated when controversial material comes up. In this case, however, Muboshgu was faced only with constructive criticism aimed at the article’s content, not its subject.

And what does Muboshgu think of the finger? “I love the First Amendment to the United States Constitution. This gesture is just one of many things that I can do that in another country might result in my being thrown in jail. I don’t take that right for granted.”

Ed Erhart, Editorial Associate
Wikimedia Foundation

by Ed Erhart at January 26, 2017 11:04 PM

Wikimedia Foundation receives $500,000 from the Craig Newmark Foundation and craigslist Charitable Fund to support a healthy and inclusive Wikimedia community


Photo by Daniel McKay, CC BY-SA 2.0.

Today, the Wikimedia Foundation announced the launch of a community health initiative to address harassment and toxic behavior on Wikipedia, with initial funding of US$500,000 from the Craig Newmark Foundation and craigslist Charitable Fund. The two seed grants, each US$250,000, will support the development of tools for volunteer editors and staff to reduce harassment on Wikipedia and block harassers.

Approximately 40% of internet users, and as many as 70% of younger users, have personally experienced harassment online, with regional studies showing rates as high as 76% for young women. While harassment differs across the internet, on Wikipedia and other Wikimedia projects it has been shown to reduce participation: more than 50% of people who reported experiencing harassment also reported decreasing their participation in the Wikimedia community.

Volunteer editors on Wikipedia are often the first line of response for finding and addressing harassment on Wikipedia. “Trolling,” “doxxing,” and other menacing behaviors are burdens to Wikipedia’s contributors, impeding their ability to do the writing and editing that makes Wikipedia so comprehensive and useful. This program seeks to respond to requests from editors over the years for better tools and support for responding to harassment and toxic behavior.

“To ensure Wikipedia’s vitality, people of good will need to work together to prevent trolling, harassment and cyber-bullying from interfering with the common good,” said Craig Newmark, founder of craigslist. “To that end, I’m supporting the work of the Wikimedia Foundation towards the prevention of harassment.”

The initiative is part of a commitment to community health at the Wikimedia Foundation, the non-profit organization that supports Wikipedia and the other Wikimedia projects, in collaboration with the global community of volunteer editors. In 2015, the Foundation published its first Harassment Survey about the nature of the issue in order to identify key areas of concern. In November 2016, the Wikimedia Foundation Board of Trustees issued a statement of support calling for a more “proactive” approach to addressing harassment as a barrier to healthy, inclusive communities on Wikipedia.

“If we want everyone to share in the sum of all knowledge, we need to make sure everyone feels welcome,” said Katherine Maher, Executive Director of the Wikimedia Foundation. “This grant supports a healthy culture for the volunteer editors of Wikipedia, so that more people can take part in sharing knowledge with the world.”

The generous funding from the Craig Newmark Foundation and craigslist Charitable Fund will support the initial phase of a program to strengthen existing tools and develop additional tools to more quickly identify potentially harassing behavior, and help volunteer administrators evaluate harassment reports and respond effectively. These improvements will be made in close collaboration with the Wikimedia community to evaluate, test, and give feedback on the tools as they are developed.

This initiative addresses the major forms of harassment reported on the Wikimedia Foundation’s 2015 Harassment Survey, which covers a wide range of different behaviors: content vandalism, stalking, name-calling, trolling, doxxing, discrimination—anything that targets individuals for unfair and harmful attention.

From research and community feedback, four areas have been identified where new tools could be beneficial in addressing and responding to harassment:

  • Detection and prevention – making it easier and faster for editors to identify and flag harassing behavior
  • Reporting – providing victims and respondents of harassment improved ways to report instances that offer a clearer, more streamlined approach
  • Evaluating – supporting tools that help volunteers better evaluate harassing behavior and inform the best way to respond
  • Blocking – making it more difficult for someone who is blocked from the site to return

For more information, please visit the community health initiative’s main page.

A related press release is available on the Wikimedia Foundation’s website.

by Wikimedia Foundation at January 26, 2017 08:30 PM

Semantic MediaWiki

Help:Embedded format

Embedded format: embed selected articles.

Further information
Available languages: de, en, zh-hans
Provided by: Semantic MediaWiki
Added: 0.7
Removed: still supported
Requirements: none
Format name: embedded
Enabled by default: yes (indicates whether the result format is enabled by default upon installation of the respective extension)
Authors: Markus Krötzsch
Categories: misc

The result format embedded is used to embed the contents of the pages in a query result into a page. The embedding uses MediaWiki transclusion (like when inserting a template), so the tags <includeonly> and <noinclude> work for controlling what is displayed.
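
As a minimal sketch (the page content here is hypothetical, not taken from the documentation), a news page queried with this format could control which of its parts get embedded like this:

<noinclude>This introduction is shown only on the news page itself.</noinclude>
This body text appears both on the page and wherever the page is embedded.
<includeonly>This line appears only where the page is embedded.</includeonly>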

Parameters

General

Parameter   | Type          | Default             | Description
source      | text          | empty               | Alternative query source
limit       | whole number  | 50                  | The maximum number of results to return
offset      | whole number  | 0                   | The offset of the first result
link        | text          | all                 | Show values as links
sort        | list of texts | empty               | Property to sort the query by
order       | list of texts | empty               | Order of the query sort
headers     | text          | show                | Display the headers/property names
mainlabel   | text          | no                  | The label to give to the main page name
intro       | text          | empty               | The text to display before the query results, if there are any
outro       | text          | empty               | The text to display after the query results, if there are any
searchlabel | text          | ... further results | Text for continuing the search
default     | text          | empty               | The text to display if there are no query results

Format specific

Parameter   | Type   | Default | Description
embedformat | text   | h1      | The HTML tag used to define headings
embedonly   | yes/no | no      | Display no headings

The embedded format introduces the following additional parameters:

  • embedformat: this defines which kinds of headings to use when pages are embedded, may be a heading level, i.e. one of h1, h2, h3, h4, h5, h6, or a description of a list format, i.e. one of ul and ol
  • embedonly: if this parameter has any value (e.g. yes), then no headings are used for the embedded pages at all.

Example

The following creates a list of recent news posted on this site (like in a blog):

{{#ask:
 [[News date::+]]
 [[language code::en]]
 |sort=news date
 |order=descending
 |format=embedded
 |embedformat=h3
 |embedonly=yes
 |searchlabel= <br />[view older news]
 |limit=3
}}

This produces the following output:


English

Semantic MediaWiki 2.4.5 (SMW 2.4.5) has been released today as a new version of Semantic MediaWiki.

This new version is a minor release and provides bugfixes for the current 2.4 branch of Semantic MediaWiki. Please refer to the help page on installing Semantic MediaWiki to get detailed instructions on how to install or upgrade.

English

Semantic MediaWiki 2.4.4 (SMW 2.4.4) has been released today as a new version of Semantic MediaWiki.

This new version is a minor release and provides bugfixes for MySQL 5.7 issues of the current 2.4 branch of Semantic MediaWiki. Please refer to the help page on installing Semantic MediaWiki to get detailed instructions on how to install or upgrade.

English

Semantic MediaWiki 2.4.3 (SMW 2.4.3) has been released today as a new version of Semantic MediaWiki.

This new version is a minor release and provides bugfixes for the current 2.4 branch of Semantic MediaWiki. Please refer to the help page on installing Semantic MediaWiki to get detailed instructions on how to install or upgrade.
[view older news]

Note: The newline (<br />) is used to put the further results link on a separate line.

Remarks

Note that by default this result format also adds all annotations from the pages that are being embedded to the page they are embedded to.[1] Starting with Semantic MediaWiki 2.4.0 this can be prevented for annotations made via the #set and #subobject parser functions by setting the embedonly parameter to "yes".[2] In-text annotations will continue to be embedded, so such annotations need to be migrated to the #set parser function to prevent this from happening.
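
For illustration (the property name below is hypothetical), the in-text annotation in the first line would be copied to any page that embeds this one, while the #set version can be kept out of the embedding page as described above:

This news item was released on [[Has publication date::2017-01-26]].

{{#set:Has publication date=2017-01-26}}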

Also note that embedding pages may accidentally include category statements if the embedded articles have any categories. Use <noinclude> to prevent this, e.g. by writing

<noinclude>[[Category:News feed]]</noinclude>

Semantic MediaWiki will take care that embedded articles do not import their semantic annotations, so these need not be treated specifically.

Last but not least, note that printout statements have no effect on embedding queries.

Limitations

You cannot use the embedded format to embed a query from another page if that query relies on the magic word {{PAGENAME}}.



This documentation page applies to all SMW versions from 0.7 to the most current version.
      Other languages: de, fr, zh-hans



by Kghbln at January 26, 2017 07:11 PM

Wiki Education Foundation

Teaching Digital History with Wikipedia

As more archives become digitized, historians are turning to new technologies to delve into the past. Drawing from a variety of disciplines, ranging from computational science to digital mapping, the burgeoning field of Digital History is enabling historians to comb through vast amounts of historical data and visualize the past in new ways. From archiving historical restaurant menus to mapping emancipation, historians are embracing new technologies to reimagine the past, and they’re increasingly making this knowledge available to the general public.

It’s not lost on a growing number of historians that digital history projects can provide their students with exciting new ways to understand and engage with the past. At institutions around the country, historians are using social media, online archives, and a host of digital technologies to help their students think critically about history. For some, however, implementing digital history projects is an appealing, but daunting prospect.

Wikipedia-based assignments are an ideal foray into the world of digital history projects. That’s why Educational Partnerships Manager Jami Mathewson and I visited the American Historical Association’s 2017 annual meeting in Denver at the beginning of January. When it comes to choosing a digital history assignment, it’s easy to be overwhelmed by the variety of new tools and technologies, and as Jami and I heard time and time again, many instructors are simply unsure where to begin.

That’s where the Wiki Education Foundation can step in. When you incorporate a Wikipedia-based assignment into your course, you don’t need to start from scratch. Wiki Ed has recommendations for all stages of your Wikipedia assignment and has developed the technology and resources to make it all come together. Our brochures and handouts and interactive training modules ensure that both instructors and students have the basic knowledge to begin contributing to Wikipedia. Our Dashboard tool keeps track of your assignment from assignment design to peer review to the moment when students make live edits to Wikipedia. And of course, we have an entire staff devoted to supporting Wikipedia assignments.

There are numerous digital history projects from which history instructors can choose, but Wikipedia is particularly well-suited for conveying some of the fundamental principles of the practice of history.

  • Distinguishing sources: Primary sources are the bread and butter of the historian’s trade, but because Wikipedia has a policy of “no original research” that covers users’ analysis, evaluation, interpretation, or synthesis of primary sources, such sources can typically only be used for straightforward statements of fact. Far from being incompatible with the goals of a history course, however, contributing to Wikipedia in the field of history teaches students how to distinguish between primary and secondary sources of information. They become adept at examining a document’s authority and accuracy and learn how primary sources of information form the foundation of secondary literature.
  • History vs. historiography: The difference between these two concepts can be difficult for history students to grasp. The former is the narrative that historians piece together to describe the past, and the latter is the study of historical methodology. When students tackle history articles on Wikipedia, they have to engage with both concepts. They have to convey the facts as captured in the most current secondary literature, but they also have to consider how different schools of historical thought approach a given historical period. In doing so, students learn that facts and methodology are inseparable in the field of history.
  • Bias and perspective: Historians are storytellers above all. They organize a set of facts into a coherent narrative, and they ultimately decide which parts of the story to include and which to discard. Who writes a piece of history matters, and students must face this reality head on when contributing to Wikipedia. Though they should strive to be as objective as possible, they must ultimately decide which facts and sources of information will produce the most robust and well-balanced Wikipedia entry. When others come along and improve upon their work, they can see first-hand that history is an unfolding narrative rather than a static set of facts.
  • Underrepresentation and misrepresentation: In recent decades, the field of history has come to encompass people and voices previously left out of the historical canon. Women, minorities, and other historically disenfranchised groups are now the focus of many historical inquiries, but many of their entries on Wikipedia are either underdeveloped or missing altogether. History students have a unique opportunity to fill in these important content gaps and help Wikipedia reflect the true breadth and depth of history.

As the American Historical Association has well documented, only a small percentage of history students go on to become historians. While Wikipedia assignments are particularly adept at teaching students the tools of the historical trade, they also help students develop critical media literacy and technical skills that they can apply to their future academic and professional lives. Students who contribute to Wikipedia learn to navigate an increasingly complicated media landscape. They develop the skills necessary to make critical judgments about sources of information, such as whether a news headline is real or fake, and come to understand that consuming information and producing it are not the same thing, but two sides of the same coin.

The goal of digital history is to help historians — and, in turn, the public — understand the past in new ways by drawing on new forms of digital analysis. Similarly, Wikipedia assignments can help students grasp the study of history by understanding how to use one of the most prolific digital tools in use today, while at the same time contributing to a historical record millions of people read every day.

There’s no time like the present to begin documenting the past. If you’re interested in incorporating a Wikipedia assignment into your course, please email us at contact@wikiedu.org or visit teach.wikiedu.org.

by Helaine Blumenthal at January 26, 2017 06:52 PM

Wikimedia Foundation

Wikimedia Research Newsletter, December 2016

Getting more female editors may not increase the ratio of articles about women

Reviewed by Reem Al-Kashif

A bachelor’s degree thesis by Feli Nicolaes[1] finds that, contrary to the general perception, male and female editors do not tend to edit biographical articles on people of their own gender.

Previous research suggested that one solution to the lack of biographies of women on Wikipedia could be to increase the number of female editors. This was based on the assumption that women would prefer to edit women’s biographies, and men would prefer to edit men’s biographies. Nicolaes refers to this as homophily in her thesis, “Gender bias on Wikipedia: an analysis of the affiliation network”. However, homophily had so far neither been formally investigated nor proved to exist on Wikipedia. Nicolaes analyzes it using datasets from her research group at the University of Amsterdam, covering English Wikipedia editors and the pages they edit. She tracks the editing behavior of self-identified male and female editors on Wikipedia. Contrary to the mainstream assumption, homophily was not found; in other words, female users’ edits are not focused on female biography pages. In fact, Nicolaes finds “inverted homophily” when considering female users who edit a single biographical article more than 200 times: they are more likely to direct this amount of attention to biography articles about men than male editors are.

This brings to mind an initiative to increase content about women—be it biography articles or other content related to women—that has been running since December 2015 on the Arabic Wikipedia. The initiative takes the form of a contest in which male and female editors try to achieve as much as they can of their self-set goals. Over the four rounds of the contest, only one woman reached the top three, in two rounds. So, if the goal is to add more content about women, bringing in more women might not be enough. However, Nicolaes also argues that the study should be replicated on larger datasets to validate the results, and it remains to be seen whether the same editor behaviour exists in other language editions. Another limitation of the study is its apparent reliance on the gender information that editors publicly state in their user preferences—a method that is widely used but may be susceptible to biases (discussed in more detail in this review).

Theorizing the foundations of the gender gap

Reviewed by Aaron Shaw

In a forthcoming paper, “‘Anyone can edit’ not everyone does: Wikipedia and the gender gap”[2], Heather Ford and Judy Wajcman use some of the theoretical tools of feminist science and technology studies (STS) to describe the underpinnings of the Wikipedia gender gap. The authors argue that three aspects of Wikipedia’s infrastructure define it as a particularly masculine or male-dominated project:

(1) the epistemological foundations of what constitutes valid encyclopedic knowledge,
(2) Wikipedia’s software infrastructure, and
(3) Wikipedia’s policy infrastructure.

The authors argue that each of these arenas represents a space where male activity and masculine norms of truth, scientific fact, legitimacy, and freedom define the boundaries of legitimate contribution and action. These boundaries systematically exclude or devalue perspectives and contributions that could overcome the lack of female participation or perspectives in the Wikipedia projects. The result, according to Ford and Wajcman, is that Wikipedia has created a novel and powerful form of knowledge-production expertise on a foundation that reproduces existing gender hierarchies and inequalities.

How old and new astronomy papers are being cited

Reviewed by Piotr Konieczny

The author analyzes[3] Wikipedia’s citations to academic peer-reviewed articles, finding that “older papers from before 2008 are increasingly less likely to be cited”. The author attempts to use Wikipedia citations as a proxy for public interest in astronomy, though the analysis makes no comparison to other research about public interest in the sciences. The article notes that citations to papers from 2008 are the most common, representing the peak of citations, with fewer and fewer citations for each year since. The analysis is also limited by the cut-off date (1996), “because Scopus indexing of journals changes in this year”. The author concludes that the observed citation pattern is likely “consistent with a moderate tendency towards obsolescence in public interest in research”: as papers become obsolete, newer ones are more likely to be cited; older papers are cited for timeless, uncontroversial facts, and newer ones for newer findings. He also notes that the late 2000s, i.e. the years around 2008, may represent when most of Wikipedia’s content in astronomy was created, though this is not backed up by much besides speculation. Overall, it is an interesting question, but one whose treatment here does not provide any surprising insights.

Wikipedia is not a suitable source for election predictions

Reviewed by Piotr Konieczny

The topic of this conference paper, “Election prediction based on Wikipedia pageviews”,[4] is certainly timely. The authors look at which of Wikipedia’s articles related to the US presidential election registered high popularity, and then ask whether elections can be predicted based “on the number of views the spiking pages have and on the correlation between these pages and the presidential nominees or their political program”. They provide an online visualization showing some “Wikipedia topics that have spiked before, during or after [an] election event.”

The authors limit themselves (reasonably) to the English and Spanish Wikipedias. They do a good job of presenting their methods and outlining problems with gathering data on the popularity of articles—something that would be much easier if Wikipedia’s databases were friendlier when it comes to information about article popularity. Within the limitations described in the paper, the authors conclude that Wikipedia articles about politicians are consulted mostly after, not before or during, debates or other events such as primaries or elections, which suggests that they are not used for fact checking but as an information source after the event. “Wikipedia is not, in fact, a reliable polling source”, write the authors, based on the fact (this could be clarified further) that people check Wikipedia after events, not before them, making Wikipedia’s pageviews problematic for prediction.

“Black Lives Matter in Wikipedia: Collaboration and collective memory around online social movements”

Reviewed by Piotr Konieczny
Protesters lying down over rail tracks with a "Black Lives Matter" banner.

Black Lives Matter die-in protesting alleged police brutality in 2015

In this paper,[5] the researchers look at the relation between the Black Lives Matter (BLM) social movement and its coverage in Wikipedia, asking the following research questions:

  1. “How has Wikipedia editing activity and the coverage of BLM movement events changed over time?”
  2. “How have Wikipedians collaborated across articles about events and the BLM movement?” and
  3. “How are events on Wikipedia re-appraised following new events?”

They aim to contribute to academic discourse on social movements and claim to describe “knowledge production and collective memory in a social computing system as the movement and related events are happening.” They conclude that Wikipedia is a neutral platform, but one that indirectly supports (or hinders) the movement (or its opponents) by virtue of increased visibility, in the same vein as media coverage would. The movement’s history and documentation on Wikipedia are judged to be of higher value, accessibility, and quality than snapshots on social media platforms like Twitter. Wikipedia also provides space for interested editors to work on articles indirectly related to BLM, further increasing the visibility of related topics, as those editors move beyond direct BLM articles to other aspects. Examples include historical articles about events preceding BLM that would probably not have been written or expanded on Wikipedia if not for the rise of the BLM movement. The authors conclude that social movement activists can use Wikipedia to document their activities without compromising Wikipedia’s neutrality or other policies: “Without breaking with community norms like NPOV, Wikipedia became a site of collective memory documenting mourning practices as well as tracing how memories were encoded and re-interpreted.” This is a valuable argument that draws interesting connections between Wikipedia and social movements, particularly considering that some (like this reviewer) consider Wikipedia itself to be a social movement.

Briefly

Conferences and events

The third annual Wiki Workshop will take place on April 4 as part of the WWW2017 conference in Perth, Australia. The workshop serves as a platform for Wikimedia researchers to get together on an annual basis and share their research with each other (see also our overview of the papers from the 2016 edition). All Wikimedia researchers are encouraged to submit papers for the workshop and attend it. More details at the call for papers.

See the research events page on Meta-wiki for other upcoming conferences and events, including submission deadlines.

Other recent publications

Other recent publications that could not be covered in time for this issue include the items listed below. Contributions are always welcome for reviewing or summarizing newly published research.

  • “Facilitating the use of Wikidata in Wikimedia projects with a user-centered design approach”[6] From the abstract: “In its current form, [data from Wikidata] is not used to its full potential [on other Wikimedia projects] for a multitude of reasons, as user acceptance is low and the process of data integration is unintuitive and complicated for users. This thesis aims to develop a concept using user-centered design to facilitate the editing of Wikidata data from Wikipedia. With the involvement of the Wikimedia community, a system is designed which integrates with pre-existing work flows.”
  • “A corpus of Wikipedia discussions: over the years, with topic, power and gender labels”[7] From the abstract: “… we present a large corpus of Wikipedia Talk page discussions that are collected from a broad range of topics, containing discussions that happened over a period of 15 years. The dataset contains 166,322 discussion threads, across 1236 articles/topics that span 15 different topic categories or domains. The dataset also captures whether the post is made by a registered user or not, and whether he/she was an administrator at the time of making the post. It also captures the Wikipedia age of editors in terms of number of months spent as an editor, as well as their gender.”
  • “Wikipedia and the politics of openness” Two reviews of the 2014 book with this title[supp 1], in the journal Information, Communication & Society[8] and in Contemporary Sociology: A Journal of Reviews[9], with the latter summarizing the book as follows: “Tkacz’s text has three main empirical chapters. The first sorts out the ‘politics of openness,’ by which he means how collaboration emerges and forms in an open-ended context. The second empirical contribution is about the possibility that the framing of social interaction might, by itself, be enough to create order and encourage productivity in an environment like Wikipedia. … The third empirical contribution is that project exit has an extremely important role in maintaining the stability of Wikipedia. As people develop projects, they create parallel, break-off versions of a project [forks].”
  • “Derivation of ‘is a’ taxonomy from Wikipedia category graph”[10]
  • “‘En Wikipedia no se escribe jugando’: Identidad y motivación en 10 wikipedistas regiomontanos.”[11] From the English abstract: “This study qualitatively analyses the contributions in the talk pages of the Spanish Wikipedia by the ten most-active registered users in Monterrey, Mexico. Using virtual ethnography … this research finds that these self-styled ‘wikipedistas’ assume the site’s collective identity when interacting with anonymous users, and that their main motivations for ongoing participation are not related to the repository of knowledge in itself, but to their group dynamics and inter-personal relationships within the community.”
  • “Schreiben in der Wikipedia” (“Writing in Wikipedia”)[12] From the book (translated): “From the perspective of Wikipedia research, it can be observed that Wikipedia must not be regarded as a community medium [‘gemeinschaftliches Medium’] per se, but that it reflects a conglomerate of individual and community writing processes, which in turn both influence the text genesis, with differing scopes. This chronological development is laid open here for the first time for some exemplary article texts, and subsequently, specific properties of each article topic are related to the creation of the article that is based on it.”
  • “Beyond the Book: linking books to Wikipedia”[13] From the abstract: “The book translation market is a topic of interest in literary studies, but the reasons why a book is selected for translation are not well understood. The Beyond the Book project investigates whether web resources like Wikipedia can be used to establish the level of cultural bias. This work describes the eScience tools used to estimate the cultural appeal of a book: semantic linking is used to identify key words in the text of the book, and afterwards the revision information from corresponding Wikipedia articles is examined to identify countries that generated a more than average amount of contributions to those articles. … We assume a lack of contributions from a country may indicate a gap in the knowledge of readers from that country. We assume that a book dealing with that concept could be more exotic and therefore more appealing for certain readers … An indication of the ‘level of exoticness’ thus could help a reader/publisher to decide to read/translate the book or not. Experimental results are presented for four selected books from a set of 564 books written in Dutch or translated into Dutch, assessing their potential appeal for a Canadian audience.”
  • “A multilingual approach to discover cross-language links in Wikipedia”[14] From the abstract: “… given a Wikipedia article (the source) EurekaCL uses the multilingual and semantic features of BabelNet 2.0 in order to efficiently identify a set of candidate articles in a target language that are likely to cover the same topic as the source. The Wikipedia graph structure is then exploited both to prune and to rank the candidates. Our evaluation carried out on 42,000 pairs of articles in eight language versions of Wikipedia shows that our candidate selection and pruning procedures allow an effective selection of candidates which significantly helps the determination of the correct article in the target language version.”
  • “Analyzing organizational routines in online knowledge collaborations: a case for sequence analysis in CSCW”[15] From the abstract: “Research into socio-technical systems like Wikipedia has overlooked important structural patterns in the coordination of distributed work. This paper argues for a conceptual reorientation towards sequences as a fundamental unit of analysis for understanding work routines in online knowledge collaboration. Using a data set of 37,515 revisions from 16,616 unique editors to 96 Wikipedia articles as a case study, we analyze the prevalence and significance of different sequences of editing patterns.” See also slides and a separate review by Aaron Halfaker (“This is a weird paper. It isn’t actually a study. It’s more like a methods position paper.”)
  • “Wikipedia: medium and model of collaborative public diplomacy”[16] From the abstract: “Taking a case-study approach, the article posits that Wikipedia holds a dual relevance for public diplomacy 2.0: first as a medium; and second, as a model for public diplomacy’s evolving process. Exploring Wikipedia’s folksonomy, crowd-sourced through intense and organic collaboration, provides insights into the potential of collective agency and symbolic advocacy.”
  • “Enabling fine-grained RDF data completeness assessment”[17] From the abstract: “The idea of the paper is to have completeness information over RDF data sources and use it for checking query completeness. In particular, [for Wikidata,] an indexing technique was developed to allow to scale completeness reasoning to Wikidata-scale data sources. The applicability of the framework was verified using Wikidata, and COOL-WD, a completeness tool for Wikidata, was developed. The tool is available at http://cool-wd.inf.unibz.it/”
  • “Linked data quality of DBpedia, Freebase, OpenCyc, Wikidata, and YAGO”[18] From the abstract: “In recent years, several noteworthy large, cross-domain and openly available knowledge graphs (KGs) have been created. These include DBpedia, Freebase, OpenCyc, Wikidata, and YAGO. Although extensively in use, these KGs have not been subject to an in-depth comparison so far. In this survey, we provide data quality criteria according to which KGs can be analyzed and analyze and compare the above mentioned KGs.” From the paper: “… Wikidata covers all relations of the gold standard, even though it contains considerably less relations [than Freebase] (1,874 vs. 70,802). The Wikidata methodology to let users propose new relations, to discuss about their coverage and reach, and finally to approve or disapprove the relations, seems to be appropriate.”

    Mus musculus had all its genes imported into Wikidata

  • “Wikidata as a semantic framework for the Gene Wiki initiative”[19] From the abstract: “… we imported all human and mouse genes, and all human and mouse proteins into Wikidata. In total, 59 721 human genes and 73 355 mouse genes have been imported from NCBI and 27 306 human proteins and 16 728 mouse proteins have been imported from the Swissprot subset of UniProt. … The first use case for these data is to populate Wikipedia Gene Wiki infoboxes directly from Wikidata with the data integrated above. This enables immediate updates of the Gene Wiki infoboxes as soon as the data in Wikidata are modified. … Apart from the Gene Wiki infobox use case, a SPARQL endpoint and exporting functionality to several standard formats (e.g. JSON, XML) enable use of the data by scientists.”
  • “Connecting every bit of knowledge: The structure of Wikipedia’s First Link Network”[20] From the abstract: “By following the first link in each article, we algorithmically construct a directed network of all 4.7 million articles: Wikipedia’s First Link Network. … By traversing every path, we measure the accumulation of first links, path lengths, groups of path-connected articles, and cycles. … we find scale-free distributions describe path length, accumulation, and influence. Far from dispersed, first links disproportionately accumulate at a few articles—flowing from specific to general and culminating around fundamental notions such as Community, State, and Science. Philosophy directs more paths than any other article by two orders of magnitude. We also observe a gravitation towards topical articles such as Health Care and Fossil Fuel.” (See also media coverage: “All Wikipedia Roads Lead to Philosophy, but Some of Them Go Through Southeast Europe First” and Wikipedia:Getting to Philosophy)
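
The first-link construction the paper describes is simple enough to sketch. The fragment below is a toy illustration, not the authors’ code: the first_link mapping, which in practice would be extracted from a full Wikipedia dump, is invented here, and the traversal simply follows first links until it reaches a dead end or a cycle.

    # Toy data: each article mapped to the first link in its body text.
    # A real run would extract this mapping from a full Wikipedia dump.
    first_link = {
        "Human": "Hominidae",
        "Hominidae": "Primate",
        "Primate": "Mammal",
        "Mammal": "Animal",
        "Animal": "Biology",
        "Biology": "Science",
        "Science": "Knowledge",
        "Knowledge": "Fact",
        "Fact": "Experience",
        "Experience": "Knowledge",  # a cycle, like those the paper measures
    }

    def follow_first_links(start, first_link):
        """Walk the first-link chain from `start` until a dead end or a cycle."""
        path, seen = [start], {start}
        while path[-1] in first_link:
            nxt = first_link[path[-1]]
            path.append(nxt)
            if nxt in seen:  # entered a cycle; stop here
                break
            seen.add(nxt)
        return path

    print(follow_first_links("Human", first_link))
    # ['Human', 'Hominidae', ..., 'Fact', 'Experience', 'Knowledge']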

References

  1. Nicolaes, Feli (2016-06-24). “Gender Bias on Wikipedia: An analysis of the affiliation network” (PDF). Amsterdam: University of Amsterdam. 
  2. Ford, Heather; Wajcman, Judy. “‘Anyone can edit’ not everyone does: Wikipedia and the gender gap” (PDF). Social Studies of Science. ISSN 0306-3127. 
  3. Thelwall, Mike (2016-11-14). “Does astronomy research become too dated for the public? Wikipedia citations to astronomy and astrophysics journal articles 1996–2014”. El Profesional de la Información 25 (6): 893–900. doi:10.3145/epi.2016.nov.06. ISSN 1699-2407. 
  4. Ciocirdel, Georgiana Diana; Varga, Mihai (2016). “Election prediction based on Wikipedia pageviews” (PDF). p. 9. 
  5. Twyman, Marlon; Keegan, Brian C.; Shaw, Aaron (2016-11-03). “Black Lives Matter in Wikipedia: Collaboration and collective memory around online social movements”. arXiv:1611.01257 [physics]. doi:10.1145/2998181.2998232. 
  6. Kritschmar, Charlie (2016-03-03). Facilitating the use of Wikidata in Wikimedia projects with a user-centered design approach (PDF) (Bachelor’s thesis). HTW Berlin, Internationale Medieninformatik. 
  7. Prabhakaran, Vinodkumar; Rambow, Owen (2016). “A corpus of Wikipedia discussions: over the years, with topic, power and gender labels”. p. 5. 
  8. Gotkin, Kevin (2016-02-24). “Wikipedia and the politics of openness”. Information, Communication & Society 0 (0): 1–3. doi:10.1080/1369118X.2016.1151911. ISSN 1369-118X.  Closed access
  9. Rojas, Fabio (2016-03-01). “Wikipedia and the Politics of Openness”. Contemporary Sociology: A Journal of Reviews 45 (2): 251–252. doi:10.1177/0094306116629410. ISSN 0094-3061. 
  10. Ben Aouicha, Mohamed; Hadj Taieb, Mohamed Ali; Ezzeddine, Malek (2016-04-01). “Derivation of “is a” taxonomy from Wikipedia category graph”. Engineering Applications of Artificial Intelligence 50: 265–286. doi:10.1016/j.engappai.2016.01.033. ISSN 0952-1976.  Closed access
  11. Corona Reyes, Sergio Antonio; Muñoz Yáñez, Brenda Azucena (2015-12-29). ““En Wikipedia no se escribe jugando”: Identidad y motivación en 10 wikipedistas regiomontanos”. Global Media Journal México 12 (23).  (in Spanish)
  12. Kallass, Kerstin (2015). Schreiben in der Wikipedia. Springer Fachmedien Wiesbaden. doi:10.1007/978-3-658-08265-9. ISBN 978-3-658-08265-9.  Closed access (in German)
  13. Martinez-Ortiz, C.; Koolen, M.; Buschenhenke, F.; van Dalen-Oskam, K. (2015-08-01). “Beyond the Book: linking books to Wikipedia”. 2015 IEEE 11th International Conference on e-Science (e-Science). pp. 12–21. doi:10.1109/eScience.2015.12.  Closed access
  14. Bennacer, Nacéra; Vioulès, Mia Johnson; López, Maximiliano Ariel; Quercini, Gianluca (2015-11-01). “A multilingual approach to discover cross-language links in Wikipedia”. In Jianyong Wang, Wojciech Cellary, Dingding Wang, Hua Wang, Shu-Ching Chen, Tao Li, Yanchun Zhang (eds.). Web Information Systems Engineering – WISE 2015. Lecture Notes in Computer Science. Springer International Publishing. pp. 539–553. ISBN 9783319261898.  Closed access
  15. Keegan, Brian C.; Lev, Shakked; Arazy, Ofer (2015-08-19). “Analyzing organizational routines in online knowledge collaborations: a case for sequence analysis in CSCW”. arXiv:1508.04819 [physics, stat]. 
  16. Byrne, Caitlin; Johnston, Jane (2015-10-23). “Wikipedia: medium and model of collaborative public diplomacy”. The Hague Journal of Diplomacy 10 (4): 396–419. doi:10.1163/1871191X-12341312. ISSN 1871-191X.  Closed access
  17. Darari, Fariz; Razniewski, Simon; Prasojo, Radityo Eko; Nutt, Werner (2016). “Enabling fine-grained RDF data completeness assessment”. Proceedings of the 16th International Conference on Web Engineering (ICWE ’16). Lugano, Switzerland. Springer International Publishing. doi:10.1007/978-3-319-38791-8_10.  Closed access (preprint freely available online)
  18. Färber, Michael; Ell, Basil; Menne, Carsten; Rettinger, Achim; Bartscherer, Frederic (2016). Linked data quality of DBpedia, Freebase, OpenCyc, Wikidata, and YAGO. 
  19. Burgstaller-Muehlbacher, Sebastian; Waagmeester, Andra; Mitraka, Elvira; Turner, Julia; Putman, Tim; Leong, Justin; Naik, Chinmay; Pavlidis, Paul; Schriml, Lynn; Good, Benjamin M.; Su, Andrew I. (2016-01-01). “Wikidata as a semantic framework for the Gene Wiki initiative”. Database 2016: baw015. doi:10.1093/database/baw015. ISSN 1758-0463. PMID 26989148. 
  20. Ibrahim, Mark; Danforth, Christopher M.; Dodds, Peter Sheridan (2016-05-01). “Connecting every bit of knowledge: The structure of Wikipedia’s First Link Network”. arXiv:1605.00309 [cs]. 
Supplementary references:
  1. Tkacz, Nathaniel (2014-12-19). Wikipedia and the politics of openness. Chicago; London: University of Chicago Press. ISBN 9780226192277. 

Wikimedia Research Newsletter
Vol: 6 • Issue: 12 • December 2016
This newsletter is brought to you by the Wikimedia Research Committee and The Signpost


by Tilman Bayer at January 26, 2017 06:12 AM

January 25, 2017

Wikimedia UK

The first week’s highlights from #1lib1ref

We are just over a week into the second annual #1lib1ref campaign, where we “imagine a world where every librarian adds one more reference to Wikipedia.”

Jerwood Library, Trinity Hall, Cambridge. Photo by Andrew Dunn, CC BY-SA 2.0.

Wikipedia is based on real facts, backed up by citations—and librarians are expert at finding supporting research.

This year’s campaign launched on January 15, to celebrate Wikipedia’s sixteenth birthday.  As of Monday, participants have made over 1,543 contributions on 1,065 articles in 15 different languages.

We know that more librarian meetups, events, editathons, webinars, coffee hours, tweets, photos, sticker-selfies, blog posts and more have happened—share them on social media to help spread the campaign! Here are a few highlights from the week.

IFLA white papers

Following a year-long conversation, the International Federation of Library Associations (IFLA) kicked off #1lib1ref by officially publishing two “Opportunities Papers” emphasizing the potential for collaboration between Wikipedia and academic and public libraries.

Showing the story of a citation

#1lib1ref provides a great opportunity for communities to create resources about how to contribute to Wikimedia projects. Below are great new ones made for the campaign:

Video via Wikimedia Germany and the Simpleshow Foundation, CC BY-SA 4.0.
  1. Wikimedia Deutschland made a great video explainer in both English and German.
  2. NCompass Live hosted a webinar: The Wikimedia Foundation’s Alex Stinson alongside Wiki-Librarians Jessamyn West, Phoebe Ayers, Merrilee Profitt and Kelly Doyle provided an overview of the ways different library communities can improve Wikipedia.
  3. Wikipedian in Residence at the University of Edinburgh, Ewan McAndrew, developed excellent introductory videos for how to contribute to #1lib1ref!

A global story grows bigger

The campaign is already bigger than last year: we’ve surpassed last year’s total contributions, and we’re not even finished yet. To capture the scope and excitement, we created a Storify sharing some of the most interesting of last week’s tweets, which numbered over 1,000.

We still have two more weeks to go! Keep pushing to get your local librarians and libraries involved with the campaign, and help share the gift of a citation with the world.

Alex Stinson, GLAM Strategist
Jake Orlowitz, Head of the Wikipedia Library
Wikimedia Foundation

by Alex Stinson at January 25, 2017 04:01 PM

Gerard Meijssen

#Wikidata - Sultanism anyone?

The definition of "sultanism" is:
In political science, sultanism is a form of authoritarian government characterized by the extreme personal presence of the ruler in all elements of governance. The ruler may or may not be present in economic or social life, and thus there may be pluralism in these areas, but this is never true of political power.
There are prominent scientists who use the term. It therefore must be applicable, and indeed there are some who consider that any sultanate is defined by it. The problem is that the name is very much linked to Islam, yet it applies equally to monarchs like Henry VIII: King Henry started the Church of England, and the way the Church of England came to be makes sultanism applicable.

It does not really matter how the concept of sultanism came to be. The name chosen is extremely prejudicial. The problem we face is that words and facts matter. Both Wikipedia and Wikidata represent a neutral point of view, and therefore a concept like sultanism deserves a place. However, when such a concept is to be applied, it needs to be applied in a neutral way. It means that you cannot point to a country and say “sultanate”. It means that it applies to a ruler, and it therefore applies to Henry as much as to an evil genius like Jafar.
Thanks,
     GerardM

by Gerard Meijssen (noreply@blogger.com) at January 25, 2017 07:09 AM

January 24, 2017

Wikimedia Foundation

The first week’s highlights from #1lib1ref

Jerwood Library, Trinity Hall, Cambridge. Photo by Andrew Dunn, CC BY-SA 2.0.

We are just over a week into the second annual #1lib1ref campaign, where we “imagine a world where every librarian adds one more reference to Wikipedia.”

Wikipedia is based on real facts, backed up by citations—and librarians are expert at finding supporting research.

This year’s campaign launched on January 15, to celebrate Wikipedia’s sixteenth birthday.  As of Monday, participants have made over 1,543 contributions on 1,065 articles in 15 different languages.

We know that more librarian meetups, events, editathons, webinars, coffee hours, tweets, photos, sticker-selfies, blog posts and more have happened—share them on social media to help spread the campaign! Here are a few highlights from the week.

IFLA white papers

Following a year-long conversation, the International Federation of Library Associations (IFLA) kicked off #1lib1ref by officially publishing two “Opportunities Papers” emphasizing the potential for collaboration between Wikipedia and academic and public libraries.

Showing the story of a citation

#1lib1ref provides a great opportunity for communities to create resources about how to contribute to Wikimedia projects. Below are great new ones made for the campaign:

Video via Wikimedia Germany and the Simpleshow Foundation, CC BY-SA 4.0.

  1. Wikimedia Deutschland made a great video explainer in both English and German.
  2. NCompass Live hosted a webinar: The Wikimedia Foundation’s Alex Stinson alongside Wiki-Librarians Jessamyn West, Phoebe Ayers, Merrilee Profitt and Kelly Doyle provided an overview of the ways different library communities can improve Wikipedia.
  3. Wikipedian in Residence at the University of Edinburgh, Ewan McAndrew, developed excellent introductory videos for how to contribute to #1lib1ref!

A global story grows bigger

The campaign is already bigger than last year: we’ve surpassed last year’s total contributions, and we’re not even finished yet. To capture the scope and excitement, we created a Storify sharing some of the most interesting of last week’s tweets, which numbered over 1,000.

We still have two more weeks to go! Keep pushing to get your local librarians and libraries involved with the campaign, and help share the gift of a citation with the world.

Alex Stinson, GLAM Strategist
Jake Orlowitz, Head of the Wikipedia Library
Wikimedia Foundation

Image by Spiritia, public domain/CC0.

by Alex Stinson and Jake Orlowitz at January 24, 2017 08:43 PM

William Beutler

#1Lib1Ref and Adventures in Practical Encyclopedia-Building

The Wikipedian has long been of the opinion, perhaps controversial on Wikipedia, that it is a mistake to think that the entire world can be recruited to become Wikipedia editors. Yet this is the premise upon which so many aspects of Wikipedia’s platform are based.

Start with the fact that anyone can edit (almost) any page at any time. This was Wikipedia’s brilliant original insight, and there is no doubt it made Wikipedia what it is today. But along with scholars and other knowledge-loving contributors comes the riff-raff. The calculation is that the value of the good editors attracted by Wikipedia’s open-editing policy will outweigh the damage done by the vandals and troublemakers. On one hand, it is an article of faith not rigorously tested. On the other hand, Wikipedia’s mere existence is proof that the bet is generally sound.

All of which is preamble to praise Wikipedia’s #1Lib1Ref project, now in its second year, for taking what is to my mind a more sensible approach to building Wikipedia’s editorship: targeting persons and professions that already have more in common with Wikipedia than they might realize, in this case librarians. Whereas the official Wikimedia vision statement calls for “a world in which every single human being can freely share in the sum of all knowledge”, the #1Lib1Ref tagline suggests “a world where every librarian added one more reference to Wikipedia.”

This is great! As much as The Wikipedian strongly supports the big-picture goal of the vision statement, the fact is that asking “every” person to contribute “all” things is no place to begin. But asking a very specific type of person to make just one contribution actually turns out to be massively more powerful, because it is vastly more achievable.

Speaking anecdotally, the greatest hurdle to becoming a Wikipedia contributor is figuring out how to make that very first edit.[1] Encouraging the determination to give it a try, and creating a simple set of steps to help would-be editors get there, will do a lot more than the sum of all lofty rhetoric.

#1Lib1Ref runs January 15 to February 3, and you can learn more about it via The Wikipedia Library. If you decide to get involved, you should also consider posting with the obvious hashtag on Twitter or another social platform of your choice. Oh, and if you don’t get to it before February 3, I’m sure they’ll be happy to have you join in after the fact.

P.S. You have no idea how hard it was to write this without making either a Bob Marley or U2 reference. If you now have one song or the other stuck in your head, you are most welcome.

The Wikipedia Library logo by User:Heatherawalls, licensed under Creative Commons.

Notes

1. The second greatest hurdle is getting that person to figure out what to do next, but that is for another day.

by William Beutler at January 24, 2017 04:09 PM

January 23, 2017

Andy Mabbett (pigsonthewing)

Bromptons in Museums and Art Galleries

Every time I visit London, with my Brompton bicycle of course, I try to find time to take in a museum or art gallery. Some are very accommodating and will cheerfully look after a folded Brompton in a cloakroom (e.g. Tate Modern, Science Museum) or, more informally, in an office or behind the security desk (Bank of England Museum, Petrie Museum, Geffrye Museum; thanks folks).


Brompton bicycle folded

When folded, Brompton bikes take up very little space

Others, without a cloakroom, have lockers for bags and coats, but these are too small for a Brompton (e.g. Imperial War Museum, Museum of London) or they simply refuse to accept one (V&A, British Museum).

A Brompton bike is not something you want to chain up in the street, and carrying a hefty bike-lock would defeat the purpose of the bike’s portability.


Jack Wills, New Street (geograph 4944811)

This Brompton bike hire unit, in Birmingham, can store ten folded bikes each side. The design could be repurposed for use at venues like museums or galleries.

I have an idea. Brompton could work with museums — in London, where Brompton bikes are ubiquitous, and elsewhere, though my Brompton and I have never been turned away from a museum outside London — to install lockers which can take a folded Brompton. These could be inside with the bag lockers (preferred) or outside, using the same units as their bike hire scheme (pictured above).

Where has your Brompton had a good, or bad, reception?

Update

Less than two hours after I posted this, Will Butler-Adams, MD of Brompton, replied to me on Twitter:

So now I’m reaching out to museums, in London to start with, to see who’s interested.

by Andy Mabbett at January 23, 2017 08:24 PM

Wikimedia Foundation

“I knew that once I started, I wouldn’t be able to stop writing”: Başak Tosun

Photo by Muzammil, CC BY-SA 4.0.

Başak Tosun has been editing the Turkish Wikipedia for over a decade, and she still remembers the feeling she got when she received an email inviting her to contribute.

“The moment I read about [Wikipedia], the idea of writing encyclopedic articles sounded like fun,” Tosun recalls. “But I know myself very well. I hesitated about visiting that website. I knew that once I started, I wouldn’t be able to stop writing.”

Tosun successfully held out for a few months but eventually decided to take the plunge. Editing Wikipedia started as a simple hobby, writing articles about her favorite anime characters, before it became more when she decided to channel her efforts into filling content gaps on the Turkish Wikipedia.

“Many artists, writers and scientists were missing on Wikipedia, or not being fairly or adequately represented on the internet,” says Tosun. “I felt empowered knowing I could do something about it.”

Massacre of the Mamluks at the Cairo Citadel, 1805. Painting by Horace Vernet, public domain.

One of the 1,100 articles created by Tosun was on Ibn Taghribirdi, a historian from fifteenth-century Egypt. He lived among Cairo’s Mamluk elite (the Turkish ruling class of slave origins). Ibn Taghribirdi is known for his analytic style in documenting the Mamluk rulers of Egypt and the history of Egypt during the Middle Ages.

Overall, Tosun has a passion for editing history and biographies and has invested much time in developing articles about the history of Turkey, women in art, musician and dancer profiles, and more. This interest has bled over into her professional life as well: “When I started editing history, I recognized that my general knowledge of it was lacking,” she said. “So I started a four-year degree program in history at an open university that I’m completing this year.”

Tosun doesn’t have a utopian view of the Wikipedia community. Mistakes occur, she notes, but assuming good faith makes them tolerable. Even editing conflicts can result in collaboration on developing a topic. “Sometimes there are conflicts on the ethnic origins of people in the biographies I write,” Tosun explains. “Most of the time, the person in question is of mixed origins, and therefore all sides of the conflict have merit. In such cases, I usually try to add as much detailed information and references as I can to support all views.”

Tosun enjoys sharing her experience with others and showing them how to contribute. According to her, “It’s always easy to start contributing to Wikipedia as long as the new user recognizes the edit button.” Based on that, she suggested that her sister, a psychology professor, assign her students editing tasks on Wikipedia as part of their syllabus. Tosun offered to help train the students on how to edit Wikipedia.

The plan worked and the next semester, another professor joined the efforts with 102 students. “Most students do not continue contributing extensively, but at least they become better readers of Wikipedia,” Tosun explains. Together with the Turkish Wikipedia community, Tosun is now helping with several Wikipedia courses in different universities.

When not on Wikipedia, Tosun works for a web hosting and domain registration company. She studied political science before going for a second degree in history.

One of her wishes for 2017 is to help organize an editathon (editing workshop) at the Poetry Library in the city where she lives, where participants would focus on editing Turkish poet profiles.

Interview by Syed Muzammiluddin, Wikimedia Community Volunteer
Profile by Samir Elsharbaty, Digital Content Intern, Wikimedia Foundation

by Syed Muzammiluddin and Samir Elsharbaty at January 23, 2017 08:15 PM

Wiki Education Foundation

Students share linguistics with the world

While visiting the Linguistic Society of America Conference in Austin earlier this month, I asked attendees: why do you think the study of linguistics is so relevant today? Their replies were varied: the election, the rise of fake news, the importance of understanding language bias, and knowing how we use rhetoric to persuade others.

In 2016, linguistics was a topic of interest not just in academic scholarship, but also in popular culture and politics. When Arrival hit theaters in the fall, it challenged us to think about the power of language in shaping our understanding of the world — or other worlds. Throughout the year, news outlets asked us to consider the relevance of the President-elect’s rhetorical devices and speech patterns in shaping public opinion.

Here at Wiki Ed, we agree that the public’s understanding of these issues is paramount. That’s why in November 2015, just as we were starting to promote our Year of Science, we announced our partnership with the Linguistic Society of America to support students as they work to systematically improve coverage of linguistics topics on Wikipedia. And in the last year we’ve done just that.

Since the beginning of our partnership in the spring 2016 term, we’ve supported 25 courses with 348 students as they contributed to language and linguistics articles on Wikipedia. Together, the 373 articles they improved, including 13 new entries, have been viewed over 13.5 million times. These numbers further illustrate the relevance of linguistics to public conversations in 2016. That’s partly why I found myself so glad to be returning to LSA as we kick off the new year, and why Wiki Ed is so proud of our partnership. In 2017, we’d love to continue to grow our support of these classes.

Wiki Ed provides technical tools, training materials, and flexible assignment timelines to make integrating Wikipedia into your courses as simple as possible. Instructors and students also receive staff support throughout the semester. One instructor teaching with us for the second time this spring came by my booth at the conference and said “Wiki Ed’s support will save me 50 hours in prep time!” I hope you’ll join us in using Wikipedia to help the world understand the crucial work of linguists.

For more information about teaching with Wikipedia generally, visit teach.wikiedu.org. If you’d like to talk with someone about setting up an assignment in your next course, reach out at contact@wikiedu.org.

by Samantha Weald at January 23, 2017 08:00 PM

Sam Wilson

Wikisource Hangout

I wonder how long it takes, after someone first starts editing a Wikimedia project, before they figure out that they can read lots of Wikimedia news on https://en.planet.wikimedia.org/ — and how long after that before they realise they can also post news there? (At which point they probably give up if they haven’t already got a blog.)

Anyway, I forgot that I can post news, but then I remembered. So:

There’s going to be a Wikisource meeting next weekend (28 January, on Google Hangouts), if you’re interested in joining:
https://meta.wikimedia.org/wiki/Wikisource_Community_User_Group/January_2017_Hangout

by Sam Wilson at January 23, 2017 11:43 AM

Gerard Meijssen

#Wikipedia - #Sources anyone?

Sources are important. They make it obvious what is correct and what is not. For content in Wikidata, Wikipedia is an important source of information. It aims to be neutral and there are loads of sources.

When you bring the information together in a tree like the one to the right, it follows that all the information has to agree with that interpretation. It all starts with "Duqaq Temür Yalığ" but he is called "Toqaq" in the article on Seljuq.

The article on the Seljuk Empire is quite wonderful because it includes the spouses of the Sultans and their lineage. Really relevant to understand the politics of the time.

I do include information where I can find and understand it. Quite often, information is problematic. Sometimes it is obviously wrong, as in attributing a person to a modern country. As more data is entered, the information becomes more interconnected and coherent, and errors become more glaringly obvious. It becomes more and more a matter of adding the individual statements that make the difference, and not so much long lists of data.

At some stage only the puzzles will be left, and sources will need to be sought to make the right statements, not just the obvious ones.
Thanks,
       GerardM

by Gerard Meijssen (noreply@blogger.com) at January 23, 2017 10:34 AM

Andre Klapper

Wikimedia in Google Code-in 2016

(Google Code-in and the Google Code-in logo are trademarks of Google Inc.)

Google Code-in 2016 has come to an end. Wikimedia was one of the 17 organizations that took part, offering mentors and tasks to 14–17 year old students exploring free and open source software projects via small tasks.

Congratulations to our 192 students and 46 mentors for fixing 424 tasks together!

As one of the organization admins, I can say that deciding on our top five students at the end of the contest always takes time and discussion: many students have provided impressive work, and it hurts to have to put a great contributor in 6th or 7th place.
Google will announce the Grand Prize winners and finalists on January 30th.

Reading the final feedback of students always reassures us that all the effort mentors and organization admins put into GCI is worth it:

  • In 1.5 month, I learned more than in 1.5 year. — Filip
  • I know these things will be there forever and it’s a big thing for me to have my name on such a project as MediaWiki. — Victor
  • What makes kids like me continue a work is appreciation and what the community did is give them a lot. — Subin
  • I spent my best time of my life during the contest — David

Read blogposts by GCI students about their experience with Wikimedia.

To list some of the students’ achievements:

  • Many improvements to Pywikibot, Kiwix (for Wikipedia offline reading), Huggle, WikiEduDashboard, Wikidata, documentation, …
  • MediaWiki’s Newsletter extension received a huge number of code changes
  • The Pageview API offers monthly request stats per article title (see the sketch after this list)
  • jQuery.suggestions now offers reason suggestions in the block, delete, and protect forms
  • A {{PAGELANGUAGE}} magic word was added
  • Changes to number of observations in the Edit Quality Prediction model
  • A dozen MediaWiki extension pages received screenshots
  • Lots of removal of deprecated code in MediaWiki core and extensions
  • Long CREDIT showcase videos got split into ‘one video per topic’ videos on Wikimedia Commons
  • Proposals for a redesign of the Romanian Wikipedia’s main page
  • Performance improvements to the importDump.php maintenance script
  • Converted Special:RecentChanges to use the OOUI library
  • Allow users to apply change tags as they make logged actions using the MediaWiki web API
  • Added some hooks to Special:Unblock
  • Added a $wgHTTPImportTimeout setting for Special:Import
  • Added ability to configure the web service endpoint and added phpcs checks in MediaWiki’s extension for Ideographic Description Sequences
  • Glossary wiki pages follow the formatting guidelines
  • Research on team communication tools
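
One item above is concrete enough to demonstrate. The sketch below is a hedged illustration, not the students’ code: it calls the public Wikimedia Pageview REST API’s per-article endpoint with the monthly granularity mentioned in the list; the article title and date range are arbitrary examples.

    # A sketch of querying the Wikimedia Pageview REST API for monthly,
    # per-article stats. Title and dates are arbitrary examples; titles
    # with spaces must use underscores or percent-encoding.
    import requests

    def monthly_pageviews(project, title, start, end):
        """Fetch monthly pageview counts for one article."""
        url = (
            "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/"
            f"{project}/all-access/all-agents/{title}/monthly/{start}/{end}"
        )
        # The API asks clients to identify themselves via User-Agent.
        resp = requests.get(url, headers={"User-Agent": "pageview-example/0.1"})
        resp.raise_for_status()
        return {item["timestamp"]: item["views"] for item in resp.json()["items"]}

    print(monthly_pageviews("en.wikipedia", "MediaWiki", "20160101", "20161231"))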

We also received valuable feedback from our mentors on what we can improve for the next round.

Thanks to everybody for your friendliness, patience, and help provided.
Thanks for your contributions to free software and free knowledge.
See you around on IRC, mailing lists, tasks, and patch comments!

by aklapper at January 23, 2017 04:47 AM

January 21, 2017

Andy Mabbett (pigsonthewing)

Four Stars of Open Standards

I’m writing this at UKGovCamp, a wonderful unconference. This post constitutes notes, which I will flesh out and polish later.

I’m in a session on open standards in government, convened by my good friend Terence Eden, who is the Open Standards Lead at Government Digital Service, part of the United Kingdom government’s Cabinet Office.

Inspired by Tim Berners-Lee’s “Five Stars of Open Data”, I’ve drafted “Four Stars of Open Standards”.

These are:

  1. Publish your content consistently
  2. Publish your content using a shared standard
  3. Publish your content using an open standard
  4. Publish your content using the best open standard

Bonus points for:

  • making clear which standard you use
  • publishing your content under an open licence
  • contributing your experience to the development of the standard.

Point one, if you like, is about having your own local standard — if you publish three related data sets, for instance, be consistent between them.

Point two could simply mean agreeing a common standard with other parts of your organisation, neighbouring local authorities, or suchlike.

In points three and four, I’ve taken “open” to be the term used in the “Open Definition”:

Open means anyone can freely access, use, modify, and share for any purpose (subject, at most, to requirements that preserve provenance and openness).

Further reading:

by Andy Mabbett at January 21, 2017 03:13 PM

Gerard Meijssen

#Wikipedia - Support understanding the #gender gap

#Wikidata needs to mature. #Wikipedia needs to mature. They both have wishes they aim to fulfil that still escape them. The gender gap is one such issue, and it can be used to illustrate how both will mature when they cooperate.

When you want to know how many articles can be expected to be written at a given point, you need to analyse the red links. They point to articles that are likely notable and indicate a structural need in Wikipedia. To do that you need data and you need a tool.

When every red link is linked to an item in Wikidata, you have both the data and a tool. This will help Wikipedia with its disambiguation, and it will show what a Wikipedia is missing. It is a tool that may drive people to write articles about the missing links.

All the red links will now link to Wikidata and to articles in other Wikipedias. It also allows people to add statements to Wikidata so that facts about those items are known, for instance that an item is about a woman. When statements about awards, professions and events are known, there is added weight to write an article.
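
As a rough sketch of the kind of query this enables (an illustration, not an existing tool): once red links are tied to items, the Wikidata Query Service can already list items about women that still lack an article in a given Wikipedia. The language edition and the extra occupation constraint below are arbitrary examples, the latter added to keep the query fast.

    # A sketch, not an existing tool: ask the Wikidata Query Service for
    # items about women that have no article yet in one Wikipedia.
    # sq.wikipedia and the occupation filter are arbitrary examples.
    import requests

    QUERY = """
    SELECT ?item ?itemLabel WHERE {
      ?item wdt:P31 wd:Q5 ;          # instance of: human
            wdt:P21 wd:Q6581072 ;    # sex or gender: female
            wdt:P106 wd:Q82955 .     # occupation: politician (keeps it fast)
      FILTER NOT EXISTS {
        ?article schema:about ?item ;
                 schema:isPartOf <https://sq.wikipedia.org/> .
      }
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }
    LIMIT 20
    """

    resp = requests.get(
        "https://query.wikidata.org/sparql",
        params={"query": QUERY, "format": "json"},
        headers={"User-Agent": "redlink-gap-example/0.1"},
    )
    resp.raise_for_status()
    for row in resp.json()["results"]["bindings"]:
        print(row["item"]["value"], row["itemLabel"]["value"])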

In this way two purposes are served: researchers have better tools that help them understand the gender gap, and people who care about the gender gap are helped in working to reduce it.

Technically it is not that complicated to achieve. If there is a problem with this proposal, it may be that Wikipedians need to understand that this is not a power grab but a way to improve the quality and efficiency of their project.
Thanks,
      GerardM

by Gerard Meijssen (noreply@blogger.com) at January 21, 2017 08:51 AM

January 20, 2017

Wiki Education Foundation

Monthly Report for December 2016

Highlights

  • At the end of the Wikipedia Year of Science, we tallied what our 287 science courses contributed: 4.93 million words added to 5,640 articles, including 622 new entries, viewed 270 million times during their respective terms alone. That means we added the equivalent of 11% of the last print edition of Encyclopedia Britannica to science content on Wikipedia during the Year of Science.
  • Our fall term wrapped up in December, with us supporting more than 6,300 students in 276 courses. In the fall term, student editors added 4.2 million words of content across all disciplines, providing better content for 253 million readers.
  • We announced new Wikipedia Visiting Scholars that will be working with the University of San Francisco’s Department of Rhetoric and Language and San Francisco State University’s Paul K. Longmore Institute on Disability.
  • We released an addition to our series of subject-specific editing brochures: Editing Wikipedia articles on Political Science.

Programs

Educational Partnerships

Samantha Weald attends the American Geophysical Union conference in San Francisco

In December, Outreach Manager Samantha Weald, Classroom Program Manager Helaine Blumenthal, Director of Programs LiAnna Davis, and Educational Partnerships Manager Jami Mathewson attended the final academic conference during the Wikipedia Year of Science. At the American Geophysical Union’s annual meeting in San Francisco, staff members met earth scientists eager to improve Wikipedia’s content. At the conference, we spoke to dozens of scientists who believe Wikipedia is a valuable website for them, their students, and the world. We’re excited to bring more geophysics, geology, and earth science students to Wikipedia in the coming years, helping us amplify the impact of this year’s Wikipedia Year of Science.

As we wrapped up another year of recruitment, we reflected on our aim to increase the Wiki Education Foundation’s visibility to university and college instructors in the United States and Canada. Over the course of the year, we attended 23 conferences to share Wiki Ed’s mission with university instructors. We also made 12 campus visits, where Wiki Ed’s program participants hosted us to encourage their colleagues to join our efforts. Additionally, we hosted four outreach webinars. Through these outreach initiatives, we brought more instructors than ever into the Classroom Program, supporting a record 515 courses and nearly 11,000 students in 2016.

Classroom Program

Status of the Classroom Program for Fall 2016 in numbers, as of December 31:

  • 276 Wiki Ed-supported courses were in progress (130, or 47%, were led by returning instructors).
  • 6,307 student editors were enrolled.
  • 60% of students were up-to-date with the student training.
  • Students edited 5,700 articles, created 722 new entries, and contributed 4.18 million words.

The Fall 2016 term has come to a close, and we’re busily preparing for Spring 2017. Our most successful term to date was defined by growth, productivity, and experimentation. With 276 courses doing Wikipedia assignments, the Classroom Program has grown to nearly triple the size it was in Fall 2014. And of course with this rapid growth, our students are having an even greater impact on Wikipedia. To ensure that all of our instructors and students get the support they need, we implemented several new programs during the Fall 2016 term, including a series of interactive webinars and a more robust help section built into the Dashboard.

While we’re proud of the above numbers, the true success of the Classroom Program is, in some ways, immeasurable. As recent events have demonstrated, fake news poses a serious threat to an informed citizenry. Students who learn how to contribute to Wikipedia are not only making reliable information available to the public at large, they are also developing critical media literacy skills that enable them to discern real from fake sources of information. In learning Wikipedia’s strict policies around sourcing, our students know to question headlines and to dig deeper. These are lifelong skills that not only serve our students, but society more generally.

The close of 2016 also marked the end of the Wikipedia Year of Science. During this year-long initiative, we strove to improve Wikipedia content in STEM and social science fields, while developing critical science communication skills among our students. Our Year of Science campaign consisted of 287 courses and 6,270 students. Together, they contributed 4.93 million words to 5,640 Wikipedia articles, including 622 new entries, and their work was viewed 270 million times in the spring and fall terms alone. A specific goal of the Year of Science was to improve Wikipedia’s coverage of women scientists, and our students either expanded or created well over 100 articles on important but overlooked women in the sciences. While the Year of Science has come to an end, we recognize that our work in this area has, in many ways, only just begun. Science literacy, along with media literacy, are key components of an accurately informed society, and we will continue to prioritize both going forward.

The Wikipedia article on angel food cake was among those improved in Richard Ludescher’s Food Physical Systems class at Rutgers University.
Image: Angel food cake with strawberries by F_A, CC BY 2.0, via Wikimedia Commons.

We saw some great work from several courses:

When we think about food, we think about taste, preparation, and whether it’s healthy or not. We rarely think about things like hydrogen bonding, electrostatic interactions, or Van der Waals interactions. But these are important aspects, and thanks to students in Richard Ludescher’s Food Physical Systems class at Rutgers University, information of this sort is now available through a number of Wikipedia articles. A student in the class expanded the angel food cake article by adding sections about the manufacturing process, the ingredients used in commercial production, and the physical and biochemical roles played by these ingredients in the final product. Another student expanded the croissant article by adding information about their manufacturing and the changes in the physical and chemical properties of ingredients during manufacturing, baking, and storage. Other students added information on the physical and chemical properties of a number of other foods including marshmallows, mayonnaise and chewing gum. The expansion of the meat analogue article added information about the composition, processing, and physical structure of the product that are required to mimic the texture and taste of meat.

Students in Glenn Dolphin’s Introductory Geology class continued their work expanding biographies of women geologists. Maria Crawford contributed to lunar petrology, continental collisions (on Earth) and the geology of the Pennsylvania Piedmont, but at the beginning of the term her Wikipedia biography was only four sentences long and said nothing of her contributions. A student turned that stub into a substantial article which documented her achievements from a career that spanned four decades. Another student created an article on Virginia Harriett Kline, a stratigrapher who earned a Ph.D. in geology in 1935 and made important contributions to petroleum geology. Other students in the class continued to expand the articles they had worked on earlier in the term.

Early studies of child psychology often focused on conflict and aggression. Lois Barclay Murphy chose instead to focus on normal childhood development; she played an important role in the development of that field. Marvin Zuckerman played an important role in the development of the field of sensation seeking. Mary K. Rothbart is an expert on infant temperament development. Rena R. Wing is an expert on the behavioral treatment of obesity. None of these psychologists had biographies on Wikipedia. Similarly, the field of geriatric psychology had been omitted. These were among the articles created by students in James Council’s History and Systems of Psychology class at North Dakota State University. Other students worked to expand existing articles, like the profile of mood states, which was expanded from a short stub into a substantial article.

One of Wiki Ed’s great successes has been recruiting professors in archaeology and anthropology to expand and improve articles on archaeological sites, artifacts and methods. There are thousands of sites which are notable but not covered in Wikipedia — students in courses like Rice University’s African Prehistory have added to articles like Manyikeni in Mozambique and KM2 and KM3 in Tanzania. They’ve also updated the article on South Africa’s Border Cave, an already substantial article which now covers more modern work on the site and artifacts found within it.

Critical theory is hard work. Explaining it to laypeople is even harder. Doing so on Wikipedia, harder still. The language of critical theory (in nearly any discipline: law, economics, feminism) is often disjoint from or at odds with the main voices in the discipline — otherwise it’s hard to say it’s critical! Students in John Willinsky’s Critical Theory and Pedagogies course outdid themselves in adding to Wikipedia’s coverage of critical mathematics pedagogy and critical pedagogy (a hard phrase to hear for the policy debate veterans in our audience), and in expanding coverage of books like Learning to Labour, a critical educational ethnography. Work on narrow, difficult topics like critical pedagogy of place requires research and preparation, and the students’ work speaks to the hard work they’ve done.

Finally, interim Content Expert Rob Fernandez, who graciously agreed to join our staff temporarily to help out with the rush at the end of the fall term, wrapped up his contract with Wiki Ed in December. Rob’s help to ensure our student editors and instructors got top-notch support was invaluable. Thank you for your contributions, Rob, and best of luck on your new job!

Community Engagement

Barbara Page’s article about thrombosis prevention explains treatments to prevent the formation of blood clots inside a blood vessel.
Image: Blausen 0088 BloodClot.png by Blausen.com staff, CC BY 3.0, via Wikimedia Commons.

Community Engagement Manager Ryan McGrady announced two new Visiting Scholars positions at the beginning of this month. Both got their start at the very end of last month and are already using institutional resources to improve Wikipedia. User:Lingzhi partnered with the University of San Francisco to improve rhetoric and language topics, and Jackie Koerner is developing articles on disability, such as disability in the United States, with the Paul K. Longmore Institute on Disability at San Francisco State University.

Existing Scholars continued to produce great work. George Mason University’s Gary Greenbaum had another article achieve the impressive Featured Article designation, Alabama Centennial half dollar. The University of Pittsburgh’s Barbara Page built up her portfolio of impressive medical editing with substantial improvements to Wikipedia’s entry for thrombosis prevention.

The community is getting ready to start the 11th annual WikiCup competition, in which experienced editors are awarded points for producing high-quality content. For the 2016 event that just ended, Wiki Ed sponsored a side competition with prizes for the two users with the most Good Articles and Featured Articles on scientific topics. In first place was also the overall winner of the competition, User:Casliber, who developed articles like the violet webcap mushroom and the Lynx constellation. In second place, despite sitting out the final round of the competition, was User:Cwmhiraeth, who improved some very big topics like millipede and habitat.

Program Support

Our newest subject-specific editing brochure will help students working on political science articles.

Communications

LiAnna Davis has been working with San Francisco-based media firm PR & Company to pitch stories to national press about the impact Wiki Ed’s programs are having, and especially the impact of the Year of Science.

We announced the newest in our series of subject-specific editing brochures in December: Editing Wikipedia articles on Political Science. Thanks to the Wikipedia editors and partners at the MPSA who provided review and/or feedback for this.

Blog posts:

External media:

Digital Infrastructure

Continuing with the main technical focuses from last month, Product Manager Sage Ross spent December focused on collaboration and mentorship, as well as working on bug fixes and feature development.

Sejal Khatri started her Outreachy internship this month to improve the Dashboard’s user profile pages. She’s already made considerable progress toward our plans for these profile pages, and she’s made several improvements and bug fixes for course pages as well. Check out Sejal’s latest post on her internship blog to see what she’s been up to. December was also a busy month for the high school students participating in Google Code-In. Sage has been mentoring them on Dashboard tasks, including some performance and accessibility improvements, documentation and testing, bug fixes, and new features to help Wiki Ed staff handle new courses more efficiently. This month saw contributions to Wiki Ed’s codebase from nine developers outside of our staff and contractors — a new record.

Sage developed the initial version of an Article Viewer tool that lets you see a full Wikipedia article — as it looks on Wikipedia — without leaving the Dashboard. The Article Viewer is currently available alongside the Diff Viewer when you zoom in on a particular edited article in a course’s Articles tab.

In anticipation of increased Dashboard usage in 2017, in late December — just before Christmas weekend — we migrated the software to a more powerful server. The ensuing 20 minutes were the only downtime for the Dashboard during the entire 2016 term (although a handful of network disruptions and problems with Wikimedia servers did affect Dashboard users at earlier points in the term).

Research and Academic Engagement

In December, Research Fellow Zach McDowell completed the focus group portion of the research program, 13 groups in all. A total of 475 minutes of focus group recordings were sent away for transcription, resulting in more than 250 pages of text for analysis.

Survey research participation continued to grow, with more than 1,200 responses in the pre-assessment as well as more than 850 responses for the post-assessment. Surveys close on January 17, 2017.

Zach spent the remainder of his time beginning a preliminary assessment of the data and a cleanup plan. Additionally, Zach has been seeking out a graduate student to engage as a data science intern to expedite analysis of the data.

Finance & Administration / Fundraising

Wiki Education Foundation’s San Francisco team holiday party.

Finance & Administration

To celebrate the holiday season, San Francisco-based staff gathered at LiAnna’s house for a holiday party. Executive Director Frank Schulenburg led the group in the creation and consumption of a Feuerzangenbowle, and we enjoyed dinner and games.

For the month of December, expenses were $157,772 versus the approved budget of $206,733. The majority of the $49k variance continues to be due to staffing vacancies ($13k), as well as the timing of outside professional services ($22k) and printing ($11k) expenses.

Expenses December 2016 Actual vs. Plan

Our year-to-date expenses of $900,208 were also less than our budgeted expenditures of $1,196,085, by $296k. Like the monthly variance, the year-to-date variance was largely driven by staffing vacancies ($100k). In addition, the timing and deferral of professional services ($69k), marketing and cultivation ($18k), volunteer workshops ($13k), and printing ($18k), as well as savings in staffing-related expenses ($16k) and in travel ($61k), contributed to the variance.

Fundraising

Expenses Year to Date December 2016: Actual vs. Plan

  • Wiki Ed conducted its first-ever individual donor acquisition mailing, which reached more than 11,000 individuals. Appeals were sent via U.S. Mail in late December.

  • Google renewed their support with a $20,000 gift to Wiki Ed.

Office of the ED

Current priorities:

  • Securing funding
  • Developing a plan for next fiscal year
  • Working with the board on additional funding options

In order to be able to share an early outline of our future programmatic work with the board, Frank traditionally starts brainstorming ideas with senior staff in December. That’s why this month we embarked on thinking about the general direction for the upcoming fiscal year 2016–17 and developed a vision for the time ahead to be shared with the board on January 28–29.

Frank also started conversations with existing and prospective funders on some initiatives that are in our project pipeline for 2017. These conversations – as well as our projections of the expected impact – will inform our roadmap for the upcoming year and beyond.

Also in December, Frank prepared a series of documents for an ad hoc board taskforce that will look into additional funding streams prior to the in-person board meeting at the end of January. The board taskforce meetings will start next month via video conference with the goal of coming up with a recommendation to the board as a whole.


Visitors and guests

  • Merrilee Proffitt, OCLC
  • Steve Kaplan, Message LA

by Ryan McGrady at January 20, 2017 11:56 PM

Content Translation Update

January 20 CX Update: More fixes for page loading and template editor

Hello, and welcome to another CX update post, in which I am happy to report about several significant bug fixes.

  • Pages that had full stops (⟨.⟩) in headings couldn’t be loaded after auto-saving and closing the browser tab. This is now fixed; it’s a follow-up to a similar bug, a fix for which was reported last week. If you still have issues with loading saved pages, please report them. (bug report)
  • Adapted infoboxes would often say “Main Page” at the top, no matter what page was being translated or into what language. This could also happen with other kinds of templates: it affected pages whose templates used the {{PAGENAME}} magic word. This is now fixed, and the auto-adapted template now shows the relevant page name. (bug report)
  • An unnecessary horizontal scrollbar was shown on some pages that had wide tables. It was removed. (code change)



by aharoni at January 20, 2017 07:50 PM

Wiki Education Foundation

Rediscovering the “higher” in higher education with a Wikipedia writing assignment

Dr. Joel Parker is Associate Professor of Biological Sciences at SUNY Plattsburgh, where he has incorporated Wikipedia into his Cell Biology courses. In November we featured some of the great work his students did in our Cell Service roundup. In this post, he explains how assigning students to contribute to Wikipedia brings them through the process of discovery.

Joel Parker
Joel Parker

Assigning students to write for Wikipedia achieves the highest outcome of higher education by teaching your students the full process of discovery. This lesson is especially important today as higher education is being debased with lower learning outcomes that overemphasize the practical training of our students for the workplace. What makes higher education “higher” is the opportunity for students to work with scholars to learn how to advance both our own knowledge and knowledge within our scholarly disciplines. Writing for Wikipedia can facilitate the transition from passive learner to active discoverer for your senior students. I make this happen for my senior level Cell Biology class with a writing for Wikipedia assignment that requires my students to go through all of the stages of the academic discovery process.

Discovery is the common objective that defines higher education. This discovery process happens at three levels at universities. The first level of discovery is students discovering for themselves previously learned knowledge about the world. This is mastering the material and background knowledge that one expects of a degree holder. The next level is students and academics working together to discover truths about how the universe works. It involves noticing a gap or flaw in our current knowledge, then imagining and proving a solution. Finally, and no less important, the third level is a personal version of the second: effectively contributing the answer to the discipline and the world by communicating the discovery. This personal discovery is the transformation required of our students to gain the confidence to transition from just being beneficiaries of knowledge to becoming propagators and contributors of new knowledge. This third level is especially important to higher education, as the teens and early twenties are perhaps the most formative years, when our adult personalities and sense of self are formed.

In my senior-level cell biology class I have my students do each step of the discovery process in a Wikipedia writing assignment. The first step begins when I assign my students to search for and critique an existing cell biology Wikipedia article. This means finding mistakes, missing sections, and places where they can improve the article. The next step is actually doing the fixes and creating new content to fill the voids. This technical side includes writing in the encyclopedic style, communicating science at the correct level, and can even include graphic design when figures are called for. The final step is publicly publishing the article in the correct format and style, then dealing with the judgments, suggestions and edits from the rest of the community. I constantly remind my students throughout the assignment that the overriding goal and assessment criterion is their contribution through improving the articles. It is not sufficient to just write the minimum number of sentences and put in some number of new citations. The changes must improve the article, and the citations have to be ones that others will genuinely find helpful or else they do not count. All academics will instantly recognize this outlined process as exactly what we do in our own intellectual work: identifying a question, answering it, and publishing the solution with peer review. The goal, and the measurable outcome for the students who put in the effort, are articles significantly improved in some way.

With writing for Wikipedia, my students have the opportunity to experience the personal transition from being beneficiaries of their most used and appreciated reference source, to becoming contributors to that source. The objective is for them to become confident enough to see themselves as experts with the ability to contribute and improve the world with what they have learned from their university education. They experience this directly because their work is not just about their grade, but also clearly beneficial to future students like themselves who will be using the edited pages. Thus the assignment forces a maturing change in perspective. Even the most incremental of improvements means the world is different and better thanks to the application of their education to the world’s largest and most used encyclopedia.

Facilitating and advancing discovery is what defines higher education. Wikipedia writing assignments are one of the best ways to teach that primary learning outcome, and to remind ourselves of it.

If you’d like to learn more about how to incorporate Wikipedia into your course, visit teach.wikiedu.org or send us an email at contact@wikiedu.org.

Photo: Dr. Joel Parker.jpg, by Joel Parker, CC BY-SA 4.0, via Wikimedia Commons.

by Guest Contributor at January 20, 2017 06:02 PM

Wikimedia UK

Wikidata: the new hub for cultural heritage

This article is by: Dr Martin Poulter, Wikimedian In Residence at the University of Oxford – This post was originally published on the Oxford University Museums blog.

There is a site that lets users create customised and unusual lists of art works: works of art whose title is an alliteration, self-portraits by female artists, watercolour paintings wider than they are tall, and so on. These queries do not use any gallery or museum’s web site or search interface but draw from many collections around the world. The art works can be presented in various ways, perhaps on a map of locations they depict, or in a timeline of their creation, colour-coded by the collection where they are held. The data are incomplete, but these are the early days of an ongoing and ambitious project to share data about cultural heritage—all of it.

Judith with the head of Holofernes, Self Portrait (1610s) Fede Galizia, John and Mable Ringling Museum of Art

Wikimedia is a family of charitable projects that are together building an archive of human knowledge and culture, freely shareable and reusable by anyone for any purpose. Wikipedia, the free encyclopedia, is only the best-known part of this effort. Wikidata is a free knowledge base, with facts and figures about tens of millions of items. These data are offered as freely as possible, with no restriction at all on their copying and reuse.

Already, large amounts of data about artworks are being shared by formal partnerships. The University of Barcelona have worked with Wikimedians to share data about Art Nouveau works, recognising that it is far better to have all these data in one place than scattered across various online and offline sources. The National Library of Wales has employed a Wikidata Visiting Scholar to share data about its artworks, including the people and places they depict. The Finnish National Gallery, the Rijksmuseum in Amsterdam and the National Galleries of Scotland are among the institutions who have either formally uploaded catalogue data to Wikidata, or made data freely available for import. To see the sizes of these shared catalogues, one just has to ask Wikidata.

Wikidata logo – Image CC BY-SA 3.0

Wikidata queries can be built using SPARQL, a database query language not for the faint-of-geek. However, there is an open community of users sharing and improving queries. The visualisations they create can be shared online or embedded inside other sites or apps. Developers can build applications for the public: easy to use, but offering a distinctive view of Wikidata’s web of knowledge.
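
To give a concrete flavour, here is a minimal sketch of such a query (my own illustration, not one from the article; P31, P136, P170 and P21 are Wikidata’s standard property IDs, and the Q-numbers are its item IDs). Pasted into the public query service at query.wikidata.org, it lists self-portraits by female artists:

SELECT ?painting ?paintingLabel ?artistLabel WHERE {
  ?painting wdt:P31 wd:Q3305213 ;    # instance of: painting
            wdt:P136 wd:Q192110 ;    # genre: self-portrait
            wdt:P170 ?artist .       # creator
  ?artist wdt:P21 wd:Q6581072 .      # sex or gender: female
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 100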

One such application is Crotos, a family of tools generating image galleries and maps of art, filtered by format, artist, place depicted and other attributes. Crotos shows images of the art, so it only includes works with a digital image available in Wikimedia Commons. Wikidata itself has no such restriction: it describes art whether or not a freely-shareable scan is available.

So while the Wikidata site itself might not have mass appeal, the service it provides is gradually transforming the online world, providing a single source of data for some of the most popular web sites and apps. Those “infoboxes” summarising key facts and figures at the top of Wikipedia articles are increasingly being driven from Wikidata, so dates, locations and other facts can be entered in one place but appear on hundreds of sites.

The really exciting prospect is that of building visualisations and other interactive educational objects, integrating information from many collections and other data sources. Wikidata would be interesting enough as an art database, but it also shares bibliographic, genealogical, scientific, and other kinds of data, covering modern as well as historical topics. This allows combined queries, such as art by people born in a particular region and time period, or works depicting people described in a particular book.

Wikidata is massively multilingual, using language-independent identifiers and connecting these to names in hundreds of languages as well as to formal identifiers. In a way it is the ultimate authority file; a modern Rosetta Stone connecting identifiers from institutions’ authority files, scholarly databases and other catalogues (Hinojo 2015).

There are thousands of properties that a Wikidata item can have. Just considering a small selection relevant to art and culture, it is clear that the number of possible queries is astronomical (a sample query combining a few of them follows the list below).

  • Many features of an art work can be described:
    • instance of: in other words, the type. Wikidata has many types to choose from, from oil sketch and drawing, via architectural sculpture and stained glass, to aquatint and linocut
    • collection
    • material used
    • height, width
    • genre, movement
    • co-ordinates of the point of view
  • People and places can be connected to an artwork: depicts, creator, attributed to, owned by, after a work by, commissioned by.
  • There are relations between people: parent, sibling, influenced by, school of, author and addressee of a letter.
  • People can also be connected to groups or organisations: member of, founder, employer, educated at.
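
As promised above, here is a sketch of a query combining a few of these properties (again my own illustration, and one that ignores the unit conversion a production query would need). It asks for paintings that are wider than they are tall:

SELECT ?work ?workLabel ?height ?width WHERE {
  ?work wdt:P31 wd:Q3305213 ;    # instance of: painting
        wdt:P2048 ?height ;      # height
        wdt:P2049 ?width .       # width
  FILTER(?width > ?height)       # assumes both values share a unit
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 50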

With so many kinds of data, Wikidata draws in volunteer contributors with varying interests. Just as there are people who will sit down for an evening to improve a Wikipedia article or to categorise images on Wikimedia Commons, there are people fixing and improving Wikidata’s entries and queries. As with Wikipedia, Wikidata benefits from the intersection of different interests. Contributors speak different languages and have different background knowledge. Some are interested in a particular institution’s collection, others in a particular style of art, and others in a given location or historic individual. Hence one entry can attract multiple contributors, each motivated by a different interest.

Over time, Wikidata’s role in Wikipedia will expand. Explore English Wikipedia and you find many list articles, such as List of works by Salvador Dalí or List of Hiberno-Saxon illuminated manuscripts. At the moment, these are all manually maintained, but a program—the ListeriaBot—has been created to turn Wikidata queries into lists suitable for Wikipedia: see for example this (draft) list of paintings of art galleries. Catalan Wikipedia, with a much smaller contributor base than the English language version, is already using the bot to write list articles such as Works of Jacob van Ruisdael, saving many hours of human effort. As automated creation of list articles becomes more widespread, cultural institutions that share catalogue data will help ensure the correctness and completeness of these articles.

Un paisatge del riu amb figures (A river landscape with figures), by Jacob van Ruisdael (1628/1629–1682), Pushkin Museum of Fine Arts

Like Wikipedia, Wikidata depends on Verifiability: any statement of fact is expected to cite or link a credible published source. Hence it has active links to catalogues and other formally vetted sites, which usually supply more scholarly detail and primary research than Wikidata itself. So Wikidata is not a replacement for cultural institutions’ catalogues. The hub metaphor is apt: it is a central point, linking together disparate resources and giving them a useful shape. Its credibility will always depend on the formally vetted sources that it cites, and there will always be users who want to check what they read by following up the citations. In practice, this means that sharing ten thousand records with Wikidata is a way to get ten thousand incoming links to the institution’s own catalogue. What’s more, the free reuse of Wikidata means that other sites will use those links.

Wikidata and its partners have a huge task ahead of them, but the potential reward is vast. We could have data on all artworks, browsable in endless and genuinely new ways, with connections to their official catalogues, their physical locations, and scholarly literature. The sooner the cultural sector as a whole gets involved, the sooner we can bring this about.

References

Note

I am grateful to Wikidata users Jane Darnell (User:Jane023), Magnus Manske (User:Magnus Manske – creator of User:ListeriaBot) and Andy Mabbett (User:Pigsonthewing) for many of the useful links in this article.

 

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International Licence.

by Martin Poulter at January 20, 2017 12:49 PM

User:Geni

The Canon EF 11-24mm f/4 for Wikipedians

It’s a £2700 lens. At that price I suspect anyone buying it can come to their own conclusions. Still, on a full-frame camera it is an extremely useful lens. The width makes it great for urban architecture, larger items in museums, and interiors in general. The short minimum focus distance makes it great for objects in cases, and the lens’s sharpness makes it viable to crop the resulting images.

Obviously if you want to shoot longer than 24mm then you need another lens but for wide angle work the lens is excellent.

Downsides. It’s a £2700 lens. You could buy quite a lot of other gear for that. The Sigma 12-24mm f/4 is about £1000 cheaper and nearly as sharp at the wide end. If you are shooting on a crop sensor then the 10-18mm is under £300, so unless you really, really need the sharpness for some reason I wouldn’t go near this lens for a crop system. On top of that it’s big and it’s heavy. Not something I have an issue with, but for anyone more weight-conscious (but then why shoot full frame?) it may present a problem. The f/4 speed may be less than ideal for indoor work, but that’s becoming less and less of a problem as camera low-light abilities improve.

Overall a very useful bit of kit but also really rather on the expensive side.


by geniice at January 20, 2017 11:48 AM

Shyamal

The many shades of citizen science

Everyone is a citizen, but not all have the same grounding in the methods of science. Someone with training in science should find it especially easy to separate pomp from substance. The phrase "citizen science" is a fairly recent one, and it has been pompously marketed without enough clarity.

In India, the label of "scientist" is a status symbol; indeed, many actually choose their paths just to earn status. In many of the key professions (for example, medicine and law), authority is gained mainly by guarded membership, initiation rituals, symbolism and hierarchies. At its roots, science differs in being egalitarian, but the profession is at odds with that ideal, and its institutions are replete with tribal ritual and power hierarchies.

Long before the creation of the profession of science, "Victorian scientists" (who of course never called themselves that) pursued the quest for knowledge (i.e. science) and were for the most part quite good as citizens. In the field of taxonomy, specimens came to be the reliable carriers of information, and they became a key aspect of most of zoology and botany. After all, what could you write or talk about if you did not have a name for the subject under study? Specimens became currency. Victorian scientists collaborated in various ways that involved sharing information, sharing or exchanging specimens, debating ideas, and tapping a network of friends and relatives to gather more "facts". Learned societies and their journals helped the participants meet and share knowledge across time and geographic boundaries. Specimens, the key carriers of unquestionable information, were acquired for a price, and a niche economy was created with wealthy collectors, not-so-wealthy field collectors, and various agencies bridging them. That economy also included the publishers of monographs, field guides and catalogues, who grew in power along with organizations such as museums and, later, universities. Along with political changes, there was also a move of power from private wealthy citizens to state-supported organizations. Power brings disparity, and the Victorian brand of science had its share of issues, but has there been progress in the way of doing science?

Looking at the natural world can be completely absorbing. The kinds of sights, sounds, textures, smells and maybe tastes can keep one completely occupied. The need to communicate our observations and reactions almost immediately makes one look for existing structure and framework, and that is where organized knowledge, a.k.a. science, comes in. While the pursuit of science might be seen by individuals as value-neutral and objective, the settings of organized and professional science are decidedly not. There are political and social aspects to science, and at least in India the tendency is to view them as undesirable and not to be talked about, so as to appear "professional".

Being silent so as to appear diplomatic probably adds to the problem. Not engaging in conversation or debate with "outsiders" (a.k.a. mere citizens) probably fuels the growing label of "arrogance" applied to scientists. Once the egalitarian ideal of science is tossed out of the window, you can be sure that "citizen science" moves from useful and harmless territory to a region of conflict and potential danger. Many years ago I saw a bit of this tone in a publication boasting the virtues of Cornell's ebird and commented on it. Ebird was not particularly novel to me, especially as it was not the first either by idea or implementation; lots of us had tinkered with such ideas, even I did with BirdSpot, which aimed to be federated and peer-to-peer, ideally something like torrent. But Cornell obviously is well-funded. I commented in 2007 that the wording used sounded like "scientists using citizens rather than looking upon citizens as scientists", the latter being in my view the nobler aim to achieve. Over time ebird has gained global coverage, but it has remained "closed", not opening its code or its discussions on software construction, and not engaging with its stakeholders. It has, on the other hand, upheld traditional political hierarchies and processes that ensure low quality in parts of the world where political and cultural systems are particularly based on hierarchies of users. As someone who has watched and appreciated the growth of systems like Wikipedia, it is hard not to see the philosophical differences - almost as stark as right-wing versus left-wing politics.

Do projects like ebird see the politics in "citizen science"? Arnstein's ladder is a nice guide for judging the philosophy behind a project.
I write this while noting that criticisms of ebird as it currently works are slowly beginning to come out (despite glowing accounts in the past). There are comments on how it is reviewed by self-appointed police (the problem is not just in the appointment - why could the software designers not have allowed anyone to question any record, suggest alternative identifications, and gather measures of confidence based on community queries and opinions?). There is supposedly a class of user who manages something called "filters" (the problem here is not just the idea of creating user classes but also the idea of using manually defined "filters"; to an outsider like me with some insight into software engineering, poor software construction is symptomatic of poor vision, a weak guiding philosophy, and probably issues in project governance). There are issues with taxonomic changes (I heard someone complain about a user being asked to verify an identification because of a taxonomic split - and a split, at that, which allows one to unambiguously relabel older records based on geography; these could have been resolved automatically, but developers tend to avoid fixing problems and obviously prefer to get users to manage them by changing their way of using the system - trust me, I have seen how professional software development works). And there are now dangers to birds themselves. There are also issues and conflicts associated with licensing, intellectual property and so on. Now, it is easy to fix all these problems piecemeal, but that does not make the system better; fixing the underlying processes and philosophies is the big thing to aim for. So how do you go from a system designed for gathering data to one where you want the stakeholders to be enlightened? Well, a start could be made by first discussing in the open.

I guess many of us who have seen and discussed ebird privately could just say "I told you so", but it is not just a few of us, nor is any of this new. Many of the problems were and are easily foreseeable. One merely needs to read the history of ornithology to see how conflicts worked out between the center and the periphery (conflicts between museum workers and collectors); the troubles of peer review and openness; the conflicts between the rich and the poor (not just measured by wealth); or perhaps the haves and the have-nots. And then of course there are scientific issues - the conflicts between species concepts, not to mention conservation issues and local versus global thinking. Conflicting aims may not be entirely solved, but you cannot have an isolated software development team, a bunch of "scientists", and citizens at large expected merely to key in data and be gone. There is perhaps a lot to learn from other open-source projects, and I think the lessons in the culture and politics of Wikipedia are especially interesting for citizen science projects like ebird. I am yet to hear of an organization where the head is forced to resign by the long tail that has traditionally been powerless in decision making; allowing for that is where a brighter future lies. Even better would be where the head and tail cannot be told apart.

Postscript: 

There is an interesting study of field guides and their users in Nature which essentially shows that everyone is quite equal in making misidentifications - just another reason why ebird developers ought to remove this whole system that creates an uber class involved in rating observations and observers.

23 December 2016 - For a refreshingly honest and deep reflection on analyzing a citizen science project, see: Caroline Gottschalk Druschke & Carrie E. Seltzer (2012) "Failures of Engagement: Lessons Learned from a Citizen Science Pilot Study", Applied Environmental Education & Communication 11:178-188.
20 January 2017 - An excellent and very balanced review (unlike my opinions) can be found here: Kimura, Aya H. & Abby Kinchy (2016) "Citizen Science: Probing the Virtues and Contexts of Participatory Research", Engaging Science, Technology, and Society 2:331-361.

by Shyamal L. (noreply@blogger.com) at January 20, 2017 04:49 AM

Wikimedia Foundation

Community digest: Wiki Loves Women, bridging two gaps at a time; news in brief

Photo by Teemages, CC BY-SA 4.0.

Malouma is a Mauritanian singer and songwriter who had to put her career on hold after being forced into marriage. Many of her songs advocate for women’s rights, so much so that she was censored in Mauritania for part of the 1990s.

Hannah Kudjoe was a Ghanaian dressmaker who later became a political activist. She became one of the major figures calling for the independence of her country in the 1940s.

Qut el Kouloub was a writer from Egypt who contributed generously to French literature in the first half of the twentieth century. Many critics were unsure whether her works were fiction or nonfictional historical biography, and like Malouma, Kouloub used her novels to advocate for women’s rights in Egypt.

The work of Malouma, Kudjoe, and Kouloub deserves a place in history, but women of their background and experience are not well represented on the internet. Two of the three women had no article on the English Wikipedia until Wiki Loves Women participants created them; they also helped develop the third one. In addition, over 1,300 other pages have been created or developed as part of the project.

Wiki Loves Women is a project that addresses two content gaps on Wikipedia at the same time. Its aim is to encourage both gender and geographical diversity on Wikipedia by adding content about African women. The project is now active in Côte d’Ivoire, Cameroon, Nigeria and Ghana.

“I … realised recently that many articles on Wikipedia are not being read [often],” says Olaniyan Olushola, the project manager of Wiki Loves Women in Nigeria. Olushola used the Wikimedia user group Nigeria Facebook page to promote the content created as part of the Wiki Loves Women events that he leads.

Olushola is trying to “find a way to honor Nigerian women by bridging gender inequalities and reducing systemic bias on Wikipedia.” He was introduced to Wikipedia and mentored by a woman: Isla Haddow-Flood, a co-founder of Wiki Loves Women.

Together with Florence Devouard, Haddow-Flood worked on developing the idea of a project that could help increase the presence of African women on Wikipedia. “After working together on Kumusha Takes Wiki and Wiki Loves Africa, it was apparent that the content gap relating to women was a real issue,” Devouard and Haddow-Flood wrote in an email to us. They continued:

With less than 20% of (all) Wikipedia contributors being female, the global community has long acknowledged the gender gap as a problem. But in sub-Saharan Africa, when combined with the contributor gap—only 25% of edits to subjects about the Sub-Saharan region come from within the region—the lack of information about women forms an abyss.

 
Wiki Loves Women kicked off in January 2016 with a writing contest that was held as part of Wikipedia’s fifteenth anniversary. Several partners, including the German cultural association Goethe-Institut, and four teams in different African countries joined the initiative. So far, participants of the project have uploaded over 1,000 photos to Wikimedia Commons, the free media repository, in addition to editing and creating a similar number of articles on Wikipedia.

On International Women’s Day in March, Wiki Loves Women will hold a translate-a-thon, an editing event to translate Wikipedia articles about women in different languages. The organizers emphasize that everyone is welcome to join.

“It is time for the people of Africa to tell their own stories, change their narrative, shake up the global stereotypes, and share information about what they value and find interesting and important in the world,” say Haddow-Flood and Devouard.

In brief

Wikimania updates: Scholarship applications for Wikimania 2017, which is being held in Montréal, Canada on 11–13 August, are now being accepted. The deadline is 20 February 2017 at 23:59 UTC. More information is available on the scholarships page and on the FAQ page of the event. Moreover, the Wikimania steering committee has decided to explore Cape Town, South Africa as a host for Wikimania 2018. A final decision will be made by spring 2017.

Three billion edits: This week, the total edit count across all Wikimedia projects reached 3,000,000,000. Around the same time, the Wikispecies community celebrated creating its 500,000th page. The entry is about Pseudocalotes drogon and was created by Wikimedian Burmeister.

Wikimedia developer summit 2017: Last week, many Wikimedia technical contributors, third-party developers, and users of MediaWiki and the Wikimedia APIs gathered at the Golden Gate Club in San Francisco for the Wikimedia developer summit 2017. The event lasted two days, during which attendees discussed a list of main topics selected by the community.

Donating data to Wikidata: Wikimedia Germany has published a tutorial video about Wikidata, the collaboratively edited knowledge base. The short video explores Wikidata and how contributing to the website works.

2016 on the Arabic Wikipedia: Mohamed Badaren, an editor and administrator on the Arabic Wikipedia, has created a video with a summary of the major events in 2016 and their impact on Wikipedia. The video is an adaptation of earlier English-language versions, Edit 2014 and Edit 2015.

New Signpost published: A new edition of the English Wikipedia’s community-written news journal was published this week. Stories included a “surge” in new administrator promotions on the English Wikipedia; an introspective piece looking at the future of the Signpost; coverage of recent research suggesting that women are not more likely to edit about women; an interview with an active Wikipedian who has been blind since birth; and more.

Kurier: New pieces in the Kurier, the German Wikipedia’s “not necessarily neutral [and] non-encyclopedic” news page, include a three-part look back at the year 2016 and an invitation to a Wiki Loves Music event in Hamburg.

Wiki Project Med Foundation is open for members: Wiki Project Med Foundation is a user group that promotes better coverage of medical content on Wikimedia projects. The group is now open for membership applications.

Samir Elsharbaty, Digital Content Intern
Wikimedia Foundation

by Samir Elsharbaty at January 20, 2017 01:29 AM

January 19, 2017

Wikimedia Foundation

Introducing the Wikimedia Resource Center: A hub that helps volunteers find the resources they need

Photo via the Library of the London School of Economics and Political Science, Flickr Commons.

Wikimedia volunteers take on a wide spectrum of work when contributing to Wikimedia projects: from reporting a bug, to developing a tool, to requesting a grant to start a new Wikimedia program, and more. As the movement expands to include more affiliates and more programmatic activities every year, newer Wikimedians often lack experience with the movement and its various channels for requesting support.

Looking for answers leads our new Wikimedian to different pages, from Outreach Wikimedia, to Meta Wikimedia, and MediaWiki.org, as well as connecting them with experienced Wikimedians who may be able to help. In recent user experience research, we learned that the majority of program leaders rely heavily on their personal networks and contacts to find the information they need.

In order to expand Wikimedia communities’ efforts, however, we need to guarantee open access to resources that support this very important work. The Wikimedia Resource Center is a hub designed in response to this issue: it is a single point of entry for Wikimedians all over the world to access the resources and staff support they need to develop new initiatives, and also expand existing ones.


Demo of the new Wikimedia Resource Center.

 

How does it work?

In the Wikimedia Resource Center you will find resources grouped into nine different tabs, according to the goal the resources serve. Let’s imagine you wanted to start a new Wikimedia program. Under the Skills Development tab, you will find evaluation tools, program reports and toolkits, and learning patterns, among other resources. Each tab has an introduction page that describes the area, what each resource means, and who can give you direct support in any given topic. Skills Development, together with Grants Support, Programs Support, Product Development, Global Reach, Legal, and Communications, all follow the same logic.

Contact and Questions, and Consultations Calendar, are slightly different. Under Contact and Questions, you will find frequently asked questions that are searchable by topic. This tab also has a new feature: Ask a question. Wikimedians can use this feature to ask Wikimedia Foundation staff about any topic that is not covered in the FAQ, and they can do so publicly through the Wikimedia Resource Center, or privately via email. Under Contact and Questions, Wikimedians will also find information about the Emergency response system and, in future developments, a network of Wikimedians.

Consultations Calendar is a public schedule of upcoming collaborations between the Wikimedia Foundation and communities. In this tab, you will also find Wikimedia Community News, which transcludes the calendar content from the Meta Wikimedia main page.

If you get lost, you can always find help in the top right corner of every page.

Help us test!

This release constitutes the alpha version of the Wikimedia Resource Center, and at this stage, user feedback is key to improving its functionality. We want to hear from you! If you have comments about the Wikimedia Resource Center, you can submit feedback publicly on the talk page, or privately via a survey hosted by a third party, which shouldn’t take you more than four minutes to complete.

We started small, only including resources developed by the Wikimedia Foundation, in order to launch an initial version of the hub. This way, we can learn what works and what needs to be developed further, and later include features to better connect Wikimedians. Check the project’s progress on Meta by clicking here.

We hope that this hub will better support Wikimedians’ efforts all over the world, and improve findability of the resources that empower them to do their best work.

María Cruz, Communications and Outreach Project Manager, Community Engagement
Wikimedia Foundation

by María Cruz at January 19, 2017 07:26 PM

Weekly OSM

weeklyOSM 339

01/10/2017-01/16/2017

Map with new streets

Data collected by Red Cross volunteers 1 | © OpenStreetMap contributors CC-BY-SA 2.0

Mapping

  • Kartitotp shows in her blog post that the community, together with the Mapbox team in Ayacucho, took a great step toward making the 150,000-inhabitant city in the Peruvian Andes the best-mapped city in Latin America. 20 bus routes from 22 public transport companies are now available in OSM.
  • Martin Koppenhöfer raises once again the question of why monuments are still not clearly distinguished by the two proposed subkeys.

Community

Events

  • Fatouma Harber and Aboul Hassane Cisse hosted a CartoCamp (mapping party) in Tombouctou from 7th-9th January 2017, in collaboration with the OSM community of Mali.
  • Ulf Treger, in his lecture on maps at the 33C3, looks back at the historical development of maps and map projections and at their geopolitical background.
  • Selene Yang published a diary of photos from SotM Latam 2016, which took place in São Paulo, Brazil.
  • The State of the Map Africa Working Group has started a logo contest.

Humanitarian OSM

  • In an OSM diary entry, “everyone_sinks_starco” complained about a HOT mapathon in Indonesia. It turned out to be a very effective rant, because various members of HOT Indonesia posted comments to explain what had happened (if you don’t understand Bahasa Indonesia you’ll need to copy and paste the second half of the comments into an online translation tool). User Iyan, the project manager of the Humanitarian OpenStreetMap Team Indonesia, responded to clarify the situation and explain the project.
  • The Red Cross shows that mapathons can be done well, too: after training local mappers, those new volunteers mapped 7,000 villages in Liberia, Guinea and Sierra Leone and collected GPS traces covering 70,000 km of roads and paths.
  • The blog globalinnovationexchange.org has published a very upbeat post on the topic: Fighting Ebola with Information.

Maps

  • J. Budissin is seeking a volunteer to set up and operate a Sorbian map, as the former admin is no longer willing to do so. Volunteers from Lusatia and the surrounding area are preferred.

switch2OSM

  • The Chilean tax administration uses OSM maps. (Via osmCL)

Open Data

  • Martin Isenburg reports on rapidlasso.com that there is now open and free LiDAR data in Germany. First North Rhine-Westphalia and then Thuringia opened their geoportals for free download of geospatial data at the beginning of 2017. We are full of hope, Martin says, that other federal states will follow their lead. It would simply not make any sense to try to sell this kind of data, as was shown in England recently.

Software

  • Robot8A tries to convince the Telegram developers to use OSM instead of Google Maps. An interesting discussion follows.

Programming

  • Adrien Pavie shows his JS library Pic4Carto, which allows geolocated pictures to be embedded in a website. Right now it supports Flickr, Mapillary and Wikimedia Commons.
  • Karlos shows the newest changes of OSM go, e.g. built-in 3D-models of benches or wind turbines and first impressions from the London tube.

Releases

Software Version Release date Comment
Locus Map Free * 3.21.1 2017-01-10 Bugfix release.
Mapbox GL JS v0.31.0 2017-01-10 One new feature and two bugfixes.
Mapillary iOS * 4.5.12 2017-01-10 Minor fixes.
OSRM 5.5.3 2017-01-11 Two enhancements and three bugfixes.
Naviki iOS * 3.53 2017-01-12 Supporting Apple Watch.
OSM Contributor 3.0.1 2017-01-12 Bugfix release.
QGIS 2.18.3 2017-01-13 No info.
libosmium 2.11.0 2017-01-14 Many changes, please read release info.
Traccar Client Android 4.0 2017-01-14 No info.
pyosmium 2.11 2017-01-15 Use current libosmium.

Provided by the OSM Software Watchlist.

(*) unfree software. See: freesoftware.

Did you know …

  • … Franz-Benjamin Mocnik’s visualizations on OpenStreetMap changeset and wiki tags?
  • … TorFlow? It shows the traffic between the individual nodes of Tor in real time.

OSM in the media

  • The MVV, the local traffic company of Munich, will soon launch (automatic translation) a new service based on OpenStreetMap to show arrivals and delays of local trains. The MVV notes that OpenStreetMap data is not only free but also more current than data from HERE.
  • The Federal Agency for Civic Education published an article on how OpenStreetMap could be used for educational purposes in a public school. (automatic translation)
  • The Herald in Zimbabwe writes about the importance of collaborative mapping initiatives such as Missing Maps in helping to build resilience and better humanitarian response.

Other “geo” things

  • Examples of using OpenStreetMap data and Mapzen tools in news companies.
  • The QuickMapServices (we reported earlier) now contains more than 555 services.
  • Mashable presents jeans from Spinali Design that help you navigate. We hope for the sake of your safety that only OpenStreetMap data is being used.

Upcoming Events

Where What When Country
Tokyo 東京!街歩き!マッピングパーティ:第4回 根津神社 01/21/2017 japan
Manila 【MapAm❤re】OSM Workshop Series 8/8, San Juan 01/23/2017 philippines
Bremen Bremer Mappertreffen 01/23/2017 germany
Graz Stammtisch Graz 01/23/2017 austria
Nottingham Nottingham Pub Meetup 01/24/2017 uk
Dresden Stammtisch 02/02/2017 germany
Lyon Stand OSM Salon Primevère 02/03/2017-02/05/2017 france
Brussels FOSDEM 2017 02/04/2017-02/05/2017 belgium
Genoa OSMit2017 02/08/2017-02/11/2017 italy
Cardiff OpenDataCamp UK 02/25/2017-02/26/2017 wales
Passau FOSSGIS 2017 03/22/2017-03/25/2017 germany
Avignon State of the Map France 2017 06/02/2017-06/04/2017 france
Aizu-wakamatsu Shi State of the Map 2017 08/18/2017-08/20/2017 japan
Buenos Aires FOSS4G+SOTM Argentina 2017 10/23/2017-10/28/2017 argentina

Note: If you would like to see your event here, please put it into the calendar. Only data which is there will appear in weeklyOSM. Please check your event in our public calendar preview and correct it where appropriate.

This weeklyOSM was produced by Peda, Polyglot, Rogehm, SeleneYang, SomeoneElse, TheFive, YoViajo, derFred, jinalfoflia, keithonearth, vsandre.

by weeklyteam at January 19, 2017 03:44 PM

January 18, 2017

Wikimedia Foundation

Why I wrote 100 articles in 100 days about inspiring Jewish women

Ester Rada, an Israeli musician who now has an article on the Spanish Wikipedia. Photo by Oren Rozen, CC BY-SA 3.0.

Seven months ago, I was looking for a new job.

With little else to do after applying for a new one, I browsed Facebook, where I saw Wikimedian friends posting with #100wikidays.

I quickly discovered that the hashtag referred to a challenge undertaken by Wikipedians to write one new article each day for a hundred days. It was the brainchild of my Bulgarian colleague Spiritia, who only a month earlier was the runner-up Wikipedian of the year for coming up with it.

To release the stress of the job hunt, I decided to do it—but my way, by writing articles about Jewish women on the Spanish, Portuguese, English, and Ladino Wikipedias.

I started with a woman from Venezuela, the country I was born in: Margot Benacerraf, a movie director of Moroccan-Jewish origin who received the Cannes Prize in 1959. Who could imagine that in the late 50s, a young woman from a country little-known to many would capture the attention of critics at the Cannes Festival? Benacerraf is now considered the mother of the Venezuelan cinema, founder of the National Cinematheque.

Another woman worthy of mentioning is Houda Nonoo, who served as ambassador of Bahrain to the US from 2008 to 2013. She is the third woman to be an ambassador of Bahrain and the first Jew named as an ambassador from any country in the Arab World.

During these 100 days, I spoke often about these women, telling their stories to everyone who asked about them. One day, I came across an article about a semi-legendary queen in Ethiopia who ended the Axum dynasty, crowned herself, and set the churches of Abyssinia on fire. I asked an Ethiopian friend about her, who immediately replied “Esato? She burned Ethiopia, killed the princes, and took all their gold!”

I don’t know if it’s a legend or not, and neither do historians, but my friend sounded very excited to tell me about her. And now we have an article about Gudit!

On every one of my hundred days, I spent time on the internet looking for another notable Jewish woman whose life would catch my attention. Some were so impressive that I needed to create their articles on the spot.

One was Caterina Tarongí, who was burned alive by the Spanish Inquisition; her words to her brother on the way to the auto-da-fé have survived through folk songs and expressions. Another was Raquel Líberman, born in Poland, who publicly denounced and helped break an Argentine human-trafficking network that specialized in Jewish women, declaring that “I can only die once, I won’t withdraw the complaint.” The organized network involved over 30,000 women over a seventy-year period.

At some point, I started searching for interesting Jewish women in other language Wikipedias, looking to spread awareness of these people across national and linguistic borders. The one that interested me most was Violeta Yakova, a Bulgarian resistance fighter during the Second World War. Along with two fellow Jews, Yakova killed well-known anti-semites and Nazi informers. The only article about her was in Bulgarian, but after I translated it into English, other participants in the challenge translated it into seven more languages! When things like this happen, you get this feeling of accomplishment, of not just contributing to the expansion of free knowledge, but also of engaging other people to do it with you. It’s a win-win situation.

#100wikidays also gave me the opportunity to interact with other Wikipedians, many of whom I had never met before. One of the most remarkable colleagues I befriended through this experience has been Mervat Salman. Mervat lives in Amman, Jordan; I live in Jerusalem, and am a religious Jew who became an Israeli citizen at Ben-Gurion Airport.

At first sight, one would only focus on our differences. But there’s more: we both work in the IT industry, we both like Middle Eastern food and music and—the most important thing—we both believe in freedom of knowledge and the need to make it accessible for everyone.

After I finished the challenge, I was exhausted. #100wikidays took up a good deal of my time over those one hundred days, but it was satisfying and completely worth the effort.

But I couldn’t rest for long. Only days later, Mervat started asking me if I wanted to take on the #100WikiCommonsDays challenge—like #100wikidays but with pictures. Since I didn’t start immediately, she asked again, and again … until I started uploading photos to Commons. And here I am, halfway through it!

Inspiration is essential in life. I was inspired by all these 100 women, and I hope others will be too.

Maor Malul, Wikipedian

by Maor Malul at January 18, 2017 11:18 PM

Greg Sabino Mullane

MediaWiki extension.json change in 1.25

I recently released a new version of the MediaWiki "Request Tracker" extension, which provides a nice interface to your RequestTracker instance, allowing you to view the tickets right inside of your wiki. There are two major changes I want to point out. First, the name has changed from "RT" to "RequestTracker". Second, it is using the brand-new way of writing MediaWiki extensions, featuring the extension.json file.

The name change rationale is easy to understand: I wanted it to be more intuitive and easier to find. A search for "RT" on mediawiki.org ends up finding references to the WikiMedia RequestTracker system, while a search for "RequestTracker" finds the new extension right away. Also, the name was too short and failed to indicate to people what it was. The "rt" tag used by the extension stays the same. However, to produce a table showing all open tickets for user 'alois', you still write:

<rt u='alois'></rt>

The other major change was to modernize it. As of version 1.25 of MediaWiki, extensions are encouraged to use a new system to register themselves with MediaWiki. Previously, an extension would have a PHP file named after the extension that was responsible for doing the registration and setup—usually by mucking with global variables! There was no way for MediaWiki to figure out what the extension was going to do without parsing the entire file, and thereby activating the extension. The new method relies on a standard JSON file called extension.json. Thus, in the RequestTracker extension, the file RequestTracker.php has been replaced with the much smaller and simpler extension.json file.

Before going further, it should be pointed out that this is a big change for extensions, and was not without controversy. However, as of MediaWiki 1.25 it is the new standard for extensions, and I think the project is better for it. The old way will continue to be supported, but extension authors should be using extension.json for new extensions, and converting existing ones over. As an aside, this is another indication that JSON has won the data format war. Sorry, XML, you were too big and bloated. Nice try YAML, but you were a little *too* free-form. JSON isn't perfect, but it is the best solution of its kind. For further evidence, see Postgres, which now has outstanding support for JSON and JSONB. I added support for YAML output to EXPLAIN in Postgres some years back, but nobody (including me!) was excited enough about YAML to do more than that with it. :)

The extension.json file asks you to fill in some standard metadata fields about the extension, which are then used by MediaWiki to register and set up the extension. Another advantage of doing it this way is that you no longer need to add a bunch of ugly require_once() calls to your LocalSettings.php file. Now, you simply pass the name of the extension as an argument to the wfLoadExtension() function. You can even load multiple extensions at once with wfLoadExtensions():

## Old way:
require_once("$IP/extensions/RequestTracker/RequestTracker.php");
$wgRequestTrackerURL = 'https://rt.endpoint.com/Ticket/Display.html?id';

## New way:
wfLoadExtension( 'RequestTracker' );
$wgRequestTrackerURL = 'https://rt.endpoint.com/Ticket/Display.html?id';

## Or even load three extensions at once:
wfLoadExtensions( array( 'RequestTracker', 'Balloons', 'WikiEditor' ) );
$wgRequestTrackerURL = 'https://rt.endpoint.com/Ticket/Display.html?id';

Note that configuration changes specific to the extension still must be defined in the LocalSettings.php file.

So what should go into the extension.json file? The extension development documentation has some suggested fields, and you can also view the canonical extension.json schema. Let's take a quick look at the RequestTracker/extension.json file. Don't worry, it's not too long.

{
    "manifest_version": 1,
    "name": "RequestTracker",
    "type": "parserhook",
    "author": [
        "Greg Sabino Mullane"
    ],
    "version": "2.0",
    "url": "https://www.mediawiki.org/wiki/Extension:RequestTracker",
    "descriptionmsg": "rt-desc",
    "license-name": "PostgreSQL",
    "requires" : {
        "MediaWiki": ">= 1.25.0"
    },
    "AutoloadClasses": {
        "RequestTracker": "RequestTracker_body.php"
    },
    "Hooks": {
        "ParserFirstCallInit" : [
            "RequestTracker::wfRequestTrackerParserInit"
        ]
    },
    "MessagesDirs": {
        "RequestTracker": [
            "i18n"
        ]
    },
    "config": {
        "RequestTracker_URL": "http://rt.example.com/Ticket/Display.html?id",
        "RequestTracker_DBconn": "user=rt dbname=rt",
        "RequestTracker_Formats": [],
        "RequestTracker_Cachepage": 0,
        "RequestTracker_Useballoons": 1,
        "RequestTracker_Active": 1,
        "RequestTracker_Sortable": 1,
        "RequestTracker_TIMEFORMAT_LASTUPDATED": "FMHH:MI AM FMMonth DD, YYYY",
        "RequestTracker_TIMEFORMAT_LASTUPDATED2": "FMMonth DD, YYYY",
        "RequestTracker_TIMEFORMAT_CREATED": "FMHH:MI AM FMMonth DD, YYYY",
        "RequestTracker_TIMEFORMAT_CREATED2": "FMMonth DD, YYYY",
        "RequestTracker_TIMEFORMAT_RESOLVED": "FMHH:MI AM FMMonth DD, YYYY",
        "RequestTracker_TIMEFORMAT_RESOLVED2": "FMMonth DD, YYYY",
        "RequestTracker_TIMEFORMAT_NOW": "FMHH:MI AM FMMonth DD, YYYY"
    }
}

The first field in the file is manifest_version, and simply indicates the extension.json schema version. Right now it is marked as required, and I figure it does no harm to throw it in there. The name field should be self-explanatory, and should match your CamelCase extension name, which will also be the subdirectory where your extension will live under the extensions/ directory. The type field simply tells what kind of extension this is, and is mostly used to determine which section of the Special:Version page an extension will appear under. The author is also self-explanatory, but note that this is a JSON array, allowing for multiple items if needed. The version and url are highly recommended. For the license, I chose the dirt-simple PostgreSQL license, whose only fault is its name. The descriptionmsg is what will appear as the description of the extension on the Special:Version page. As it is user-facing text, it is subject to internationalization, and thus rt-desc is converted to your current language by looking up the language file inside of the extension's i18n directory.

The requires field only supports a "MediaWiki" subkey at the moment. In this case, I have it set to require at least version 1.25 of MediaWiki - as anything lower will not even be able to read this file! The AutoloadClasses key is the new way of loading code needed by the extension. As before, this should be stored in a php file with the name of the extension, an underscore, and the word "body" (e.g. RequestTracker_body.php). This file contains all of the functions that perform the actual work of the extension.

The Hooks field is one of the big advantages of the new extension.json format. Rather than worrying about modifying global variables, you can simply let MediaWiki know what functions are associated with which hooks. In the case of RequestTracker, we need to do some magic whenever a <rt> tag is encountered. To that end, we need to instruct the parser that we will be handling any <rt> tags it encounters, and also tell it what to do when it finds them. Those details are inside the wfRequestTrackerParserInit function:

function wfRequestTrackerParserInit( Parser $parser ) {

    $parser->setHook( 'rt', 'RequestTracker::wfRequestTrackerRender' );

    return true;
}

The config field provides a list of all user-configurable variables used by the extension, along with their default values.
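
To make the wiring concrete, here is a minimal sketch - my illustration, not the extension's actual code - of how the render callback registered above might look inside RequestTracker_body.php, and of how a "config" entry from extension.json surfaces as a $wg-prefixed global:

class RequestTracker {
    // Called by the parser for each <rt> tag it encounters
    public static function wfRequestTrackerRender( $input, array $args, Parser $parser, PPFrame $frame ) {
        global $wgRequestTracker_Active;   // defined by "config" in extension.json

        if ( !$wgRequestTracker_Active ) {
            // User-facing strings come from the i18n message files (see below)
            return wfMessage( 'rt-inactive' )->escaped();
        }

        // The real extension parses $input and $args here, queries the
        // RT database, and returns an HTML table of matching tickets.
        return '';
    }
}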

The MessagesDirs field tells MediaWiki where to find your localization files. This should always be in the standard place, the i18n directory. Inside that directory are localization files, one for each language, as well as a special file named qqq.json, which gives information about each message string as a guide to translators. The language files are of the format "xxx.json", where "xxx" is the language code. For example, RequestTracker/i18n/en.json contains English versions of all the messages used by the extension. The i18n files look like this:

$ cat en.json
{
  "rt-desc"       : "Fancy interface to RequestTracker using <code>&lt;rt&gt;</code> tag",
  "rt-inactive"   : "The RequestTracker extension is not active",
  "rt-badcontent" : "Invalid content args: must be a simple word. You tried: <b>$1</b>",
  "rt-badquery"   : "The RequestTracker extension encountered an error when talking to the RequestTracker database",
  "rt-badlimit"   : "Invalid LIMIT (l) arg: must be a number. You tried: <b>$1</b>",
  "rt-badorderby" : "Invalid ORDER BY (ob) arg: must be a standard field (see documentation). You tried: <b>$1</b>",
  "rt-badstatus"  : "Invalid status (s) arg: must be a standard field (see documentation). You tried: <b>$1</b>",
  "rt-badcfield"  : "Invalid custom field arg: must be a simple word. You tried: <b>$1</b>",
  "rt-badqueue"   : "Invalid queue (q) arg: must be a simple word. You tried: <b>$1</b>",
  "rt-badowner"   : "Invalid owner (o) arg: must be a valud username. You tried: <b>$1</b>",
  "rt-nomatches"  : "No matching RequestTracker tickets were found"
}

$ cat fr.json
{
  "@metadata": {
     "authors": [
         "Josh Tolley"
      ]
  },
  "rt-desc"       : "Interface sophistiquée de RequestTracker avec l'élement <code>&lt;rt&gt;</code>.",
  "rt-inactive"   : "Le module RequestTracker n'est pas actif.",
  "rt-badcontent" : "Paramètre de contenu « $1 » est invalide: cela doit être un mot simple.",
  "rt-badquery"   : "Le module RequestTracker ne peut pas contacter sa base de données.",
  "rt-badlimit"   : "Paramètre à LIMIT (l) « $1 » est invalide: cela doit être un nombre entier.",
  "rt-badorderby  : "Paramètre à ORDER BY (ob) « $1 » est invalide: cela doit être un champs standard. Voir le manuel utilisateur.",
  "rt-badstatus"  : "Paramètre de status (s) « $1 » est invalide: cela doit être un champs standard. Voir le manuel utilisateur.",
  "rt-badcfield"  : "Paramètre de champs personalisé « $1 » est invalide: cela doit être un mot simple.",
  "rt-badqueue"   : "Paramètre de queue (q) « $1 » est invalide: cela doit être un mot simple.",
  "rt-badowner"   : "Paramètre de propriétaire (o) « $1 » est invalide: cela doit être un mot simple.",
  "rt-nomatches"  : "Aucun ticket trouvé"
}
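
The qqq.json file follows the same key/value format, but the values document each message for translators rather than translating it. Hypothetical entries might look like this:

$ cat qqq.json
{
  "rt-desc"      : "Description of the extension, shown on Special:Version",
  "rt-inactive"  : "Error shown when the RequestTracker extension is disabled",
  "rt-nomatches" : "Message shown when a ticket query returns no results"
}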

One other small change I made to the extension was to allow both ticket numbers and queue names to be used inside of the tag. To view a specific ticket, one was always able to do this:

<rt>6567</rt>

This would produce the text "RT #6567", with information on the ticket available on mouseover, and hyperlinked to the ticket inside of RT. However, I often found myself using this extension to view all the open tickets in a certain queue like this:

<rt q="dyson"></rt>

It seems easier to simply add the queue name inside the tags, so in this new version one can simply do this:

<rt>dyson</rt>

If you are running MediaWiki 1.25 or better, try out the new RequestTracker extension! If you are stuck on an older version, use the RT extension and upgrade as soon as you can. :)

by Greg Sabino Mullane (noreply@blogger.com) at January 18, 2017 03:41 AM

Broken wikis due to PHP and MediaWiki "namespace" conflicts

I was recently tasked with resurrecting an ancient wiki: one last updated in 2005, running MediaWiki version 1.5.2, that needed to be transformed into something more modern (version 1.25.3 in this case). The old settings and extensions were not important, but we did want to preserve any content that was made.

The items available to me were a tarball of the mediawiki directory (including the LocalSettings.php file), and a MySQL dump of the wiki database. To import the items to the new wiki (which already had been created and was gathering content), an XML dump needed to be generated. MediaWiki has two simple command-line scripts to export and import your wiki, named dumpBackup.php and importDump.php. So it was simply a matter of getting the wiki up and running enough to run dumpBackup.php.
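
The invocations themselves are simple - something like the following, run from each wiki's top-level directory (the dump file name here is just a placeholder):

## On the old wiki: export every page and revision to XML
$ php maintenance/dumpBackup.php --full > acme-wiki.xml

## On the new wiki: import the XML dump
$ php maintenance/importDump.php acme-wiki.xml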

My first thought was to simply bring the wiki up as it was - all the files were in place, after all, and specifically designed to read the old version of the schema. (Because the database schema changes over time, newer MediaWikis cannot run against older database dumps.) So I unpacked the MediaWiki directory, and prepared to resurrect the database.

Rather than MySQL, the distro I was using defaulted to using the freer and arguably better MariaDB, which installed painlessly.

## Create a quick dummy database:
$ echo 'create database footest' | sudo mysql

## Install the 1.5.2 MediaWiki database into it:
$ cat mysql-acme-wiki.sql | sudo mysql footest

## Sanity test as the output of the above commands is very minimal:
$ echo 'select count(*) from revision' | sudo mysql footest
count(*)
727977

Success! The MariaDB instance was easily able to parse and load the old MySQL file. The next step was to unpack the old 1.5.2 mediawiki directory into Apache's docroot, adjust the LocalSettings.php file to point to the newly created database, and try and access the wiki. Once all that was done, however, both the browser and the command-line scripts spat out the same error:

Parse error: syntax error, unexpected 'Namespace' (T_NAMESPACE), 
  expecting identifier (T_STRING) in 
  /var/www/html/wiki/includes/Namespace.php on line 52

What is this about? Turns out that some years ago, someone added a class to MediaWiki with the terrible name of "Namespace". Years later, PHP finally caved to user demands and added some non-optimal support for namespaces, which means that (surprise) "namespace" is now a reserved word. In short, older versions of MediaWiki cannot run with modern (5.3.0 or greater) versions of PHP. Amusingly, a web search for this error on DuckDuckGo revealed not only many people asking about this error and/or offering solutions, but also many results that were actual wikis currently not working! Thus, their wiki was working fine one moment, and then PHP was (probably automatically) upgraded, and now the wiki is dead. But DuckDuckGo is happy to show you the wiki and its now-single page of output, the error above. :)

There are three groups to blame for this sad situation, as well as three obvious solutions to the problem. The first group to share the blame, and the most culpable, is the MediaWiki developers who chose the word "Namespace" as a class name. As PHP has always had poor-to-nonexistent support for packages, namespaces, and scoping, it is vital that all your PHP variables, class names, etc. are as unique as possible. To that end, the name of the class was changed at some point to "MWNamespace" - but the damage had been done. The second group to share the blame is the PHP developers, both for not having namespace support for so long, and for making it a reserved word knowing full well that one of the poster children for "mature" PHP apps, MediaWiki, was using "Namespace". Still, we cannot blame them too much for picking what is a pretty obvious word choice. The third group to blame is the owners of all those wikis out there that are suffering that syntax error. They ought to be repairing their wikis. The fixes are pretty simple, which leads us to the three solutions to the problem.


MediaWiki's cool install image

The quickest (and arguably worst) solution is to downgrade PHP to something older than 5.3. At that point, the wiki will probably work again. Unless it's a museum (static) wiki, and you do not intend to upgrade anything on the server ever again, this solution will not work long term. The second solution is to upgrade your MediaWiki! The upgrade process is actually very robust and works well even for very old versions of MediaWiki (as we shall see below). The third solution is to make some quick edits to the code to replace all uses of "Namespace" with "MWNamespace". Not a good solution, but ideal when you just need to get the wiki up and running. Thus, it's the solution I tried for the original problem.
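For what it's worth, that third fix can be largely mechanized. Here is a rough, untested sketch of the idea (a whole-word search-and-replace using GNU grep and sed; it will also touch comments and strings, so review the resulting diff, and repeat for any other directories that reference the class):

## Crude sketch only - not the exact commands used for this migration.
$ grep -rlw 'Namespace' includes/ | xargs sed -i 's/\bNamespace\b/MWNamespace/g'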

However, once I solved the Namespace problem by renaming to MWNamespace, some other problems popped up. I will not run through them here - although they were small and quickly solved, it began to feel like a neverending whack-a-mole game, and I decided to cut the Gordian knot with a completely different approach.

As mentioned, MediaWiki has an upgrade process, which means that you can install the software and it will, in theory, transform your database schema and data to the new version. However, version 1.5 of MediaWiki was released in October 2005, almost exactly 10 years before the current release (1.25.3 as of this writing). Ten years is a really, really long time on the Internet. Could MediaWiki really convert something that old? (spoilers: yes!) Only one way to find out. First, I prepared the old database for the upgrade. Note that all of this was done on a private local machine where security was not an issue.

## As before, install mariadb and import into the 'footest' database
$ echo 'create database footest' | sudo mysql
$ cat mysql-acme-wiki.sql | sudo mysql footest
$ echo "set password for 'root'@'localhost' = password('foobar')" | sudo mysql

Next, I grabbed the latest version of MediaWiki, verified it, put it in place, and started up the webserver:

$ wget http://releases.wikimedia.org/mediawiki/1.25/mediawiki-1.25.3.tar.gz
$ wget http://releases.wikimedia.org/mediawiki/1.25/mediawiki-1.25.3.tar.gz.sig

$ gpg --verify mediawiki-1.25.3.tar.gz.sig 
gpg: assuming signed data in `mediawiki-1.25.3.tar.gz'
gpg: Signature made Fri 16 Oct 2015 01:09:35 PM EDT using RSA key ID 23107F8A
gpg: Good signature from "Chad Horohoe "
gpg:                 aka "keybase.io/demon "
gpg:                 aka "Chad Horohoe (Personal e-mail) "
gpg:                 aka "Chad Horohoe (Alias for existing email) "
## Chad's cool. Ignore the below.
gpg: WARNING: This key is not certified with a trusted signature!
gpg:          There is no indication that the signature belongs to the owner.
Primary key fingerprint: 41B2 ABE8 17AD D3E5 2BDA  946F 72BC 1C5D 2310 7F8A

$ tar xvfz mediawiki-1.25.3.tar.gz
$ mv mediawiki-1.25.3 /var/www/html/
$ cd /var/www/html/mediawiki-1.25.3
## Because "composer" is a really terrible idea:
$ git clone https://gerrit.wikimedia.org/r/p/mediawiki/vendor.git 
$ sudo service httpd start

Now, we can call up the web page to install MediaWiki.

  • Visit http://localhost/mediawiki-1.25.3, see the familiar yellow flower
  • Click "set up the wiki"
  • Click next until you find "Database name", and set to "footest"
  • Set the "Database password:" to "foobar"
  • Aha! Look what shows up: "Upgrade existing installation" and "There are MediaWiki tables in this database. To upgrade them to MediaWiki 1.25.3, click Continue"

It worked! The next messages are: "Upgrade complete. You can now start using your wiki. If you want to regenerate your LocalSettings.php file, click the button below. This is not recommended unless you are having problems with your wiki." That message is a little misleading. You almost certainly *do* want to generate a new LocalSettings.php file when doing an upgrade like this. So say yes, leave the database choices as they are, and name your wiki something easily greppable like "ABCD". Create an admin account, save the generated LocalSettings.php file, and move it to your mediawiki directory.

At this point, we can do what we came here for: generate an XML dump of the wiki content in the database, so we can import it somewhere else. We only wanted the actual content, and did not want to worry about the history of the pages, so the command was:

$ php maintenance/dumpBackup.php --current > acme.wiki.2005.xml

It ran without a hitch. However, close examination showed that it had an amazing amount of unwanted stuff from the "MediaWiki:" namespace. While there are probably some clever solutions that could be devised to cut them out of the XML file (either on export, import, or in between), sometimes quick beats clever, and I simply opened the file in an editor and removed all the "page" sections with a title beginning with "MediaWiki:". Finally, the file was shipped to the production wiki running 1.25.3, and the old content was added in a snap:

$ php maintenance/importDump.php acme.wiki.2005.xml

The script will recommend rebuilding the "Recent changes" page by running rebuildrecentchanges.php (can we get consistentCaps please MW devs?). However, this data is at least 10 years old, and Recent changes only goes back 90 days by default in version 1.25.3 (and even shorter in previous versions). So, one final step:

## 20 years should be sufficient
$ echo '$wgRCMaxAge = 20 * 365 * 24 * 3600;' >> LocalSettings.php
$ php maintenance/rebuildrecentchanges.php

Voila! All of the data from this ancient wiki is now in place on a modern wiki!

by Greg Sabino Mullane (noreply@blogger.com) at January 18, 2017 03:23 AM

January 17, 2017

Erik Zachte

Browse winning Wiki Loves Monuments images offline

wlm_2016_in_aks_the_reflection_taj_mahal

Click to show full size (1136×640), e.g. for iPhone 5

 

The pages on Wikimedia Commons which list the winners of the yearly contests [1] contain a feature ‘Watch as Slideshow!’. Works great.

However, wouldn’t it be nice if you could also show these images offline (outside a browser), annotated and resized for minimal footprint?

During most end-of-year vacations I do a hobby project for Wikipedia. This time I worked on a script [2] [3] to make the above happen. The script does the following:

  • Download all images from Wiki Loves Monuments winners pages [1]
  • Collect image, author and license info for each image on those winners pages
  • or if not available there, collect this metadata from the upload pages on Commons
  • Resize the images so they are exactly the required size
  • Annotate the image unobtrusively in a matching font size (a rough sketch of this step follows the list):
    contest year, country, title, author, license
wlm-annotations

Font size used for 2560×1600 image

 

  • Prefix the downloaded image for super easy filtering on year and/or country

wlm-winners-file-list-detail
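To illustrate the resize and annotate steps, here is a minimal sketch using ImageMagick's convert. This is not the actual script; the file names, target size, and caption text below are made up:

## Hypothetical example: crop-to-fill at exactly 1920x1080, then add a
## small caption in the bottom-right corner.
$ convert input.jpg -resize 1920x1080^ -gravity center -extent 1920x1080 \
    -gravity southeast -pointsize 28 -fill white \
    -annotate +20+20 '2016 | Country | Title | Author | License' output.jpg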


I pre-rendered several sets with common image sizes, ready for download. You can request an extra set for other common screen sizes [4] [5]:

wlm_download_folder


For instance the 1920×1080 set is ideal for HDTV (e.g. for Apple TV screensaver) or large iPhones. On TV the texts are readable by themselves; on a phone some manual zooming is needed (but unobtrusiveness is key).

[1] 2010 2011 2012 2013 2014 2015 2016
[2] The script has been tested on Windows 10.
Prerequisites: curl and ImageMagick's convert (in same folder).
[3] I am actually already rewriting the script, separating it into two scripts, to make it more modular and more generally applicable. The first script will extract information from WLM/WLE (WLA?) winners pages and image upload pages, and generate a csv file. The second script will read this csv, then download, resize and annotate the images. I will announce the git url here when done.
[4] 4K is a bit too large for easy upload. I may do that later when the script can also run on WMF servers.
[5] Current sets are optimal for e.g. HDTV and new iPhones (again, others may follow):
1920×1080 HDTV and iPhone 6+/7+
1334×750 iPhone 6/6s/7
1136×640 iPhone 5/5s 

by Erik at January 17, 2017 12:46 PM

Gerard Meijssen

#Wikimedia - What is our mission

Many Wikipedians have a problem with Wikidata. It is very much cultural. One argument is that Wikidata does not comply with their policies and therefore cannot be used. A case in point is "notability": Wikidata knows about much more, and how can that all be good?

To be honest, Wikidata is immature and it needs to be a lot better. When a Wikipedia community does not want to incorporate data from Wikidata at this point, fine. Let us find what it takes to do so in the future. Let us work on approaches that are possible now and add value to everyone.

Many of the arguments that are used show a lack of awareness of Wikipedia's own history. There are no reminders of the times when it was good to be "bold". It is forgotten that content should be allowed to improve over time, and this is still true for all of the Wikimedia content.

The problem is that Wikidata provides a service to every Wikimedia project, and as a consequence there will be parts of any project whose policies Wikidata can never fully comply with. Arguably, the policies of all the projects, including Wikidata, serve what the Wikimedia Foundation is about: ensuring that "every single person on the planet is given free access to the sum of all human knowledge". When the argument is framed in this way, the question becomes a different one: how can we benefit from each other, and how can we strengthen the quality of each other's offerings?

Wikidata got a flying start when it replaced all the interwiki links. When all the wiki links and red links are associated with Wikidata links, it will allow for new ways to improve the consistency of Wikipedia. The problem with culture is that it is resistant to change. So when the entrenched practice is that they do not want Wikidata, let's give them the benefits of Wikidata. In a "phabricator" thingie I tried to describe it.

The proposal is for both red links and wiki links to be associated with Wikidata items. It will make it easier to use the data tools associated with Wikidata to verify, curate and improve the Wikipedia content. Obviously every link could have an associated statement. When more and more Wikipedia links are associated with statements, Wikidata improves; as part of the process, these links are verified and errors will be removed.

The nice thing is that the proposal allows for it to be "opt in". The old school Wikipedians do not have to notice. It will only be for those who understand the premise of using Wikidata to improve content. In the end it will allow Wikidata and even Wikipedia to mature. It will bring another way to look at quality and it will ensure that all the content of the Wikimedia Foundation will get better integrated and be of a higher quality.
Thanks,
      GerardM

by Gerard Meijssen (noreply@blogger.com) at January 17, 2017 09:25 AM

Wikimedia Foundation

Wikipedia is built on the public domain

Image by the US Department of Agriculture, public domain/CC0.


Wikipedia is built to be shared and remixed. This is possible, in part, thanks to the incredible amount of material that is available in the public domain. The public domain refers to a wide range of creations that are not restricted by copyright, and can be used freely by anyone. These works can be copied, translated, or remixed, so the public domain provides a crucial foundation for new creative works. On Wikipedia, some articles are based on text from older public domain encyclopedias or include images no longer protected by copyright. People regularly use public domain material to bring educational content to life and create interesting new ways to share it further.

There are three basic ways that material commonly enters the public domain.

First, when you think of the public domain, you may think of very old creations whose copyright has expired. In the United States and many other countries, copyright lasts for the life of the author plus seventy years. Works published before 1923 are in the public domain, but the rest are governed by complex copyright rules. Peter B. Hirtle of Cornell University created a helpful chart to determine when the copyright terms for various types of works will expire in the U.S. Due to the copyright term extension in the 1976 Copyright Act and later amendments, published works from the United States will not start entering the public domain until 2019. In places outside of the U.S., copyrights expire after shorter terms on January 1, celebrated annually as public domain day.

Second, a valuable contributor to the public domain is the U.S. federal government. Works created by the U.S. government are in the public domain as a matter of law. This means that government websites may provide a rich source of freely usable photographs and other material. A primary purpose of copyright is to promote creation by rewarding people with exclusive rights, but the government does not need this sort of incentive. Government works are already funded directly by taxpayers, and should belong to the public. Putting the government’s creations in the public domain allows everyone to benefit from the government’s work.

Third, some authors choose to dedicate their creations to the public domain. Tools like Creative Commons Zero (CC0) allow people to mark works that the public can freely use without restrictions or conditions. CC0 is used for some highly creative works, like the photographs on Unsplash. Other creators may wish to release their works freely, but still maintain some copyright with minimal conditions attached. These users may adopt a license like Creative Commons Attribution Share-Alike (CC BY-SA) to require other users to provide credit and re-license their works. Most of the photographs on Wikimedia Commons and all the articles on Wikipedia are freely available under CC BY-SA. While these works still have copyright and are not completely in the public domain, they can still be shared and remixed freely alongside public domain material.

In the coming years, legislators in many countries will consider writing new copyright rules to adapt to changes in technology and the economy. One important consideration is how these proposals will protect the public domain to provide room for new creations. The European Parliament has already begun considering a proposed change to the Copyright Directive, including new rights of concern that would make the public domain less accessible to the public. As copyright terms have been extended over the past few decades, works from the 1960s that would otherwise be free of copyright remain expensive and inaccessible. As we consider changing copyright rules, we should remember that everyone, including countless creators, will benefit from a rich and vibrant public domain.

Stephen LaPorte, Senior Legal Counsel
Wikimedia Foundation

Interested in getting more involved? Learn more about the Wikimedia Foundation’s position on copyright, and join the public policy mailing list to discuss how Wikimedia can continue to protect the public domain.

by Stephen LaPorte at January 17, 2017 12:25 AM

January 16, 2017

Wiki Education Foundation

The Roundup: Serious Business

It can be tricky to find publicly accessible, objective information about business-related subjects. It’s more common for there to be monetary incentives to advocate, promote, omit, or underplay particular aspects, points of view, or examples. The concepts can also be complex, weaving together theory, history, law, and a variety of opinions. Effectively writing about business on Wikipedia thus requires neutrality, but also great care in selecting sources and the ability to summarize the best information about a topic. It’s for these reasons that students can make particularly valuable contributions to business topics on Wikipedia. They arrive at the subject without the burden of a conflict of interest that a professional may have, they have access to high-quality sources, and have an expert to guide them on their way.

Students in Amy Carleton’s Advanced Writing in the Business Administration Professions course at Northeastern University made several such contributions.

One student contributed to the article on corporate social responsibility, adding information from academic research on the effects of the business model on things like employee turnover and customer relations.

Another student created the article about the investigation of Apple’s transfer pricing arrangements with Ireland, a three-year investigation into the tax benefits Apple, Inc. received. The result was the “biggest tax claim ever”, though the decision is being appealed.

Overtime is something that affects millions of workers, and which has been a common topic of labor disputes. Wikipedia has an article about overtime in general, but it's largely an overview of relevant laws. What had not been covered, until a student created the article, were the effects of overtime. Similarly, while Wikipedia covers a wide range of immigration topics, it did not yet cover the international entrepreneur rule, a proposed immigration regulation that would admit more foreign entrepreneurs into the United States. As with areas where there are common monetary conflicts of interest, controversial subjects like immigration policy are simultaneously challenging to write about and absolutely crucial to cover.

Some of the other topics covered in the class include philanthropreneurs, the globalization of the football transfer market, peer-to-peer transactions, and risk arbitrage.

Contributing well-written, neutral information about challenging but important topics is a valuable public service. If you’re an instructor who may want to participate, Wiki Ed is here to help. We’re a non-profit organization that can provide you with free tools and staff support for you and your students as you have them contribute to public knowledge on Wikipedia for a class assignment. To learn more, head to teach.wikipedia.org or email contact@wikiedu.org.

Photo: Dodge Hall Northeastern University.jpg, User:Piotrus (original) / User:Rhododendrites (derivative), CC BY-SA 3.0, via Wikimedia Commons.

by Ryan McGrady at January 16, 2017 05:07 PM

Semantic MediaWiki

Semantic MediaWiki 2.4.5 released/en



January 16, 2017

Semantic MediaWiki 2.4.5 (SMW 2.4.5) has been released today as a new version of Semantic MediaWiki.

This new version is a minor release and provides bugfixes for the current 2.4 branch of Semantic MediaWiki. Please refer to the help page on installing Semantic MediaWiki to get detailed instructions on how to install or upgrade.

by TranslateBot at January 16, 2017 01:27 PM

Semantic MediaWiki 2.4.5 released


January 16, 2017

Semantic MediaWiki 2.4.5 (SMW 2.4.5) has been released today as a new version of Semantic MediaWiki.

This new version is a minor release and provides bugfixes for the current 2.4 branch of Semantic MediaWiki. Please refer to the help page on installing Semantic MediaWiki to get detailed instructions on how to install or upgrade.

by Kghbln at January 16, 2017 01:24 PM

User:Legoktm

MediaWiki - powered by Debian

Barring any bugs, the last set of changes to the MediaWiki Debian package for the stretch release landed earlier this month. There are some documentation changes, and updates for changes to other, related packages. One of the other changes is the addition of a "powered by Debian" footer icon (drawn by the amazing Isarra), right next to the default "powered by MediaWiki" one.

Powered by Debian

This will only be added by default to new installs of the MediaWiki package. But existing users can just copy the following code snippet into their LocalSettings.php file (adjust paths as necessary):

# Add a "powered by Debian" footer icon
$wgFooterIcons['poweredby']['debian'] = [
    "src" => "/mediawiki/resources/assets/debian/poweredby_debian_1x.png",
    "url" => "https://www.debian.org/",
    "alt" => "Powered by Debian",
    "srcset" =>
        "/mediawiki/resources/assets/debian/poweredby_debian_1_5x.png 1.5x, " .
        "/mediawiki/resources/assets/debian/poweredby_debian_2x.png 2x",
];

The image files are included in the package itself, or you can grab them from the Git repository. The source SVG is available from Wikimedia Commons.

by legoktm at January 16, 2017 09:18 AM

January 15, 2017

Wikimedia Foundation

Librarians offer the gift of a footnote to celebrate Wikipedia’s birthday: Join #1lib1ref 2017

Photo by Diliff, CC BY-SA 4.0.


Wikipedia has just turned 16, at a time when the need for accurate, reliable information is greater than ever. In a world where social media channels are awash with fake news, and unreliable assertions come from every corner, the Wikimedia communities and Wikipedia in particular have offered a space for that free, accessible and reliable information to be aggregated and shared with the broader world.

Making sure that the public, our patrons, reach the best sources of information is at the heart of the Wikipedia community’s ideals. The concept of all the information on Wikipedia being “verifiable”, connected to an editorially controlled source, like a reputable newspaper or academic journal, has helped focus the massive collaborative effort that Wikipedia represents.

This connection of Wikipedia’s information to sourcing, however, is an ideal; Wikipedia grows through the contributions of thousands of people every month, and we cannot practically expect every new editor to understand how Wikipedia relies on footnotes, how to find the right kinds of research material, or how to add those references to Wikipedia. All of these steps require not only a broader understanding of research, but how those skills apply to our context.

Unlike an average Wikipedia reader, librarians understand these skills intimately: not only do librarians have training and practical experience finding and integrating reference materials into written works, but they teach patrons these vital 21st-century information literacy skills every day. In the face of a flood of bad information, the health of Wikipedia relies not only on contributors, but community educators who can help our readers understand how our content is created. Ultimately, the skills and goals of the library community are aligned with the Wikipedia community.

That is why we are asking librarians to “Imagine a world where every librarian added one more reference to Wikipedia” as part of our second annual “1 Librarian, 1 Reference” (#1lib1ref) campaign. There are plenty of opportunities to get involved: there are over 313,000 “citation needed” statements on Wikipedia and 213,000 articles without any citations at all.

Last year, #1lib1ref spread around the world, helping over 500 librarians contribute thousands of citations, and sparking a conversation among library communities about what role Wikipedia has in the information ecosystem. Still, Wikipedia has over 40 million articles in hundreds of languages; though the hundreds of librarians made many contributions to the English, Catalan and a few other language Wikipedias, we need more to significantly change the experience of Wikipedia’s hundreds of millions of readers.

This year, we are calling on librarians the world over to make #1lib1ref a bigger, better contribution to a real-information based future. We are:

  • Supporting more languages for the campaign
  • Providing a kit to help organize gatherings of librarians to contribute and talk about Wikipedia’s role in librarianship.
  • Extending the campaign for another couple of weeks, from January 15 until February 3.

Share the campaign in your networks and go to your library to ask your librarian to join in the campaign in the coming weeks, to contribute a precious Wikipedia birthday gift to the world: one more citation on Wikipedia!

Alex Stinson, GLAM-Wiki Strategist
Wikimedia Foundation

You can learn more about 1lib1ref at its campaign page.

by Alex Stinson at January 15, 2017 08:31 PM

Gerard Meijssen

#Wikipedia - Who is Fiona Hile?

When you look for Fiona Hile on the English Wikipedia, you will find this. It is a puzzle and there are probably two people by that name that do not have an article (yet).

One of them is an Australian poet. When you google for her you find among other things a picture. When you seek her information on VIAF you find two identifiers and in the near future she will have a third: Wikidata.

From a Wikidata point of view it is relevant to have an item for her because she won two awards. It completes these lists and it connects the two awards to the same person.

When you ask yourself whether Mrs Hile is really "notable", you find that the answer depends on your point of view. Wikipedia already mentions her twice, and surely a discussion on the relative merits of notability is not everyone's cup of tea.

Why is Mrs Hile notable enough to blog about? She is a great example of how Wikipedia and Wikidata together can produce more and better information.
Thanks,
      GerardM

by Gerard Meijssen (noreply@blogger.com) at January 15, 2017 07:40 PM

The Peter Porter Poetry Prize

For me the Peter Porter Poetry Prize is an award like so many others. There is one article; it lists the names of some of the people who are known to have won the prize. Some are linked and some are not. For one winner I linked to a German article and for a few others I created an item.

This list is complete, it has a link to a source so the information can be verified, and I am satisfied with the result up to a point.

What I could do is add more awards and people who have won awards. The article for Tracy Ryan, the 2009 winner, has a category for another award that she won. This award does not have a webpage with all the past winners, so the question is: is Wikipedia good enough as a source? I added the winners to the award, made a mistake, corrected it, and now Wikidata knows about a Nathan Hobby.

Jay Martin is the 2016 winner of the T.A.G. Hungerford Award. It has a source, but it is extremely likely that this will disappear in 2017. The problem I have is that I want to see this information shared, but all the work done to improve Wikidata's data is not seen at Wikipedia. When we share our resources and when we are better in tune with each other's needs as editors, we will be better able to "share in the sum of our available knowledge".
Thanks,
      GerardM

by Gerard Meijssen (noreply@blogger.com) at January 15, 2017 12:20 PM

Is #Wikipedia the new #Britannica?

At the time, the Britannica was best of breed. It was the encyclopaedia to turn to. Then Wikipedia happened, and obviously it was not good enough; people were not convinced. When you read the discussions on why Wikipedia was not good enough, there was however no actual discussion. The points of view were clear, they had consequences, and it was only when research was done that Wikipedia became respectable. Its quality was equally good, and it was more informative and included more subjects. The arguments did not go away; the point of view became irrelevant. People, and particularly students, use Wikipedia.

Today Wikipedia is said to be best of breed. It is where you find encyclopaedic information, and as Google rates Wikipedia content highly, it is seen and used a lot by many people.

The need for information is changing. We have recently experienced a lot of misinformation, and the need to know what is factually correct has never been more important. What has become clear is that arguments and information alone are not what sways people. So the question is: where does that leave Wikipedia?

The question we have to ask is: what does it take to convince people to be open-minded? What to do when people expect a neutral point of view but the facts are unambiguous in one direction? What if the language used is not understood? What are the issues of Wikipedia, what are its weaknesses and what are its strengths?

So far, quality is considered to be found in sources, in the reputation of the writers. When this is not what convinces, how do we show our quality, or better, how do we get people to reconsider and see the other point of view?
Thanks,
      GerardM

by Gerard Meijssen (noreply@blogger.com) at January 15, 2017 08:04 AM

January 13, 2017

Weekly OSM

weeklyOSM 338

01/03/2017-01/09/2017

Logo

New routing possibilities for wheelchairs

Mapping

  • Regio OSM, a completeness checker for addresses, now checks 1702 communities and many cities in Germany, one of the 11 countries where the tool can be used.
  • An interesting combination of OpenData and OSM to improve the OSM data of schools in the UK. One drawback is that a direct link exists only to iD. If iD is open, however, you can open JOSM from there. 😉
  • Pascal Neis describes his tools for QA in OSM in a blog post.
  • Arun Ganesh shows the significance of the wikidata=* tag by an example of the North Indian city of Manali. In his contribution, he also points to possibilities for improving OSM with further information via Wikidata, Wikimedia Commons, WikiVoyage and also points out information about using Wikidata with Mapbox tools.
  • The OSM Operations team announced a new feature on the main map page: Public GPS-Tracks.
  • Tom Pfeifer asks how the quite modern form of cooperation, the sharing of workspaces and equipment in a coworking space, should be tagged.
  • Chris uses AutoHotKey (Windows) and JOSM to optimize his mapping experience. He demonstrates this in a video, while tracing building outlines.
  • User rorym shows why it is useful not to make mechanical edits but “look at the area and look for other mistakes!”

Community

OpenStreetMap Foundation

Events

  • Klaus Torsky reports (de) (automatic translation) on the last FOSS4G in Germany. He links to an interview (en) with Till Adams, the brain behind the organisation of FOSS4G in Bonn.
  • Frederik Ramm invites people for the February hack weekend happening in Karlsruhe.
  • A mapping party took place in Timbuktu from the 7th to the 9th of January.

Humanitarian OSM

  • Kizito Makoye reports on the initiative of the Dar es Salaam city administration in Tanzania to map the floodplains of poor districts such as Tandale using drones. The Ramani-Huria project supports this by incorporating the acquired data into OSM-based maps. This and other measures will improve the living conditions and the infrastructure in the slum areas.

Maps

switch2OSM

  • Uber uses OpenStreetMap. Grant Slater expects Uber to contribute to OSM data.

Software

  • The Wikimedia help explains how to use the Wikidata ID to display the outline of OSM Objects in Wikimedia maps.
  • User Daniel writes a diary on how the latest release of the Open Source Routing Machine (version 5.5) has made it easier to set up your own routing machine, and shares some documentation related to it.

Releases

The Open Source Routing Machine released version 5.5, which comes with some huge enhancements in guidance, tags, API and infrastructure.

Software Version Release date Comment
OSRM 5.5.0 2016-12-16 Navigation, tag interpretation and the API infrastructure have been improved.
JOSM 11427 2016-12-31 No info.
Mapillary Android * 3.14 2017-01-04 Much faster GPX fix.
Mapbox GL JS v0.30.0 2017-01-05 No info.
Naviki Android;* 3.52.4 2017-01-05 Accuracy improved.
Mapillary iOS * 4.5.11 2017-01-06 Improved onboarding.
SQLite 3.16.2 2017-01-06 Four fixes.

Provided by the OSM Software Watchlist.

(*) unfree software. See: freesoftware.

Did you know …

  • … the daily updated extracts by Netzwolf?
  • … your next holiday destination? If yes, then the map with georeferenced images in Wikimedia Commons is ideal for informing yourself in advance.
  • … the GPS navigator uNav for Ubuntu smartphones? This OSM-based Navi-App is now available in version 0.64 for the Ubuntu Mobile Operating System (OTA-14).

OSM in the media

  • Tracy Staedter (Seeker) explained the maps of Geoff Boeing. He calls his visualization tool OSMnx (OSM + NetworkX). The tool can render the street network of any city as a black & white grid, showing impressive historical city developments. Boeing says, “The maps help change opinions by demonstrating to people that the density of a city is not necessarily bad.”

Other “geo” things

  • The Open Traffic Partnership (OTP) is an initiative in Manila, Philippines, which aims to make use of anonymized GPS data to analyze traffic congestion. The partnership has led to an open source platform – OSM is represented by Mapzen – that enables developing countries to record and analyze traffic patterns. Alyssa Wright, President of the US OpenStreetMap Foundation, said: “The partnership seeks to improve the efficiency and effectiveness of global transport use and supply through open data and capacity expansion.”
  • This is how the Mercator Projection distorts the poles.
  • Treepedia, developed by MIT’s Senseable City Lab and World Economic Forum, provides a visualization of tree cover in 12 major cities including New York, Los Angeles and Paris.

Upcoming Events

Where What When Country
Lyon Mapathon Missing Maps pour Ouahigouya 01/16/2017 france
Brussels Brussels Meetup 01/16/2017 belgium
Essen Stammtisch 01/16/2017 germany
Grenoble Rencontre groupe local 01/16/2017 france
Manila 【MapAm❤re】OSM Workshop Series 7/8, San Juan 01/16/2017 philippines
Augsburg Augsburger Stammtisch 01/17/2017 germany
Cologne/Bonn Bonner Stammtisch 01/17/2017 germany
Scotland Edinburgh 01/17/2017 uk
Lüneburg Mappertreffen Lüneburg 01/17/2017 germany
Viersen OSM Stammtisch Viersen 01/17/2017 germany
Osnabrück Stammtisch / OSM Treffen 01/18/2017 germany
Karlsruhe Stammtisch 01/18/2017 germany
Osaka もくもくマッピング! #02 01/18/2017 japan
Leoben Stammtisch Obersteiermark 01/19/2017 austria
Urspring Stammtisch Ulmer Alb 01/19/2017 germany
Tokyo 東京!街歩き!マッピングパーティ:第4回 根津神社 01/21/2017 japan
Manila 【MapAm❤re】OSM Workshop Series 8/8, San Juan 01/23/2017 philippines
Bremen Bremer Mappertreffen 01/23/2017 germany
Graz Stammtisch Graz 01/23/2017 austria
Brussels FOSDEM 2017 02/04/2017-02/05/2017 belgium
Genoa OSMit2017 02/08/2017-02/11/2017 italy
Passau FOSSGIS 2017 03/22/2017-03/25/2017 germany
Avignon State of the Map France 2017 06/02/2017-06/04/2017 france
Aizu-wakamatsu Shi State of the Map 2017 08/18/2017-08/20/2017 japan
Buenos Aires FOSS4G+SOTM Argentina 2017 10/23/2017-10/28/2017 argentina

Note: If you would like to see your event here, please put it into the calendar. Only data which is there will appear in weeklyOSM. Please check your event in our public calendar preview and correct it where appropriate.

This weeklyOSM was produced by Peda, Polyglot, Rogehm, SeleneYang, SomeoneElse, SrrReal, TheFive, YoViajo, derFred, jinalfoflia, keithonearth, wambacher.

by weeklyteam at January 13, 2017 07:00 PM

Wikimedia Tech Blog

Importing JSON into Hadoop via Kafka

Photo by Eric Kilby, CC BY-SA 2.0.

Photo by Eric Kilby, CC BY-SA 2.0.

JSON is…not binary

JSON is awesome.  It is both machine and human readable.  It is concise (at least compared to XML), and is even more concise when represented as YAML. It is well supported in many programming languages.  JSON is text, and works with standard CLI tools.

JSON sucks.  It is verbose.  Every value has a key in every single record.  It is schema-less and fragile. If a JSON producer changes a field name, all downstream consumer code has to be ready.  It is slow.  Languages have to convert JSON strings to binary representations and back too often.

JSON is ubiquitous.  Because it is so easy for developers to work with, it is one of the most common data serialization formats used on the web [citation needed!].  Almost any web based organization out there likely has to work with JSON in some capacity.

Kafka was originally developed by LinkedIn, and is now an open source Apache project with strong support from Confluent.   Both of these organizations prefer to work with strongly typed and schema-ed data.  Their serialization format of choice is Avro.  Organizations like this have tight control over their data formats, as it rarely escapes outside of their internal networks.  There are very good reasons Confluent is pushing Avro instead of JSON, but for many, like Wikimedia, it is impractical to transport data in a binary format that is unparseable without extra information (schemas) or special tools.

The Wikimedia Foundation lives openly on the web and has a commitment to work with volunteer open source contributors.  Mediawiki is used by people of varying technical skill levels in different operating environments.  Forcing volunteers and Wikimedia engineering teams to work with serialization formats other than JSON is just mean!  Wikimedia wants our software and data to be easy.

For better or worse, we are stuck with JSON.  This makes many things easy, but big data processing in Hadoop is not one of them.  Hadoop runs in the JVM, and it works more smoothly if its data is schema-ed and strongly typed.  Hive tables are schema-ed and strongly typed.  They can be mapped onto JSON HDFS files using a JSON SerDe, but if the underlying data changes because someone renames a field, certain queries on that Hive table will break.  Wikimedia imports the latest JSON data from Kafka into HDFS every 10 minutes, and then does a batch transform and load process on each fully imported hour.

Camus, Gobblin, Connect

LinkedIn created Camus to import Avro data from Kafka into HDFS.   JSON support was added by Wikimedia.  Camus’ shining feature is the ability to write data into HDFS directory hierarchies based on configurable time bucketing.  You specify the granularity of the bucket and which field in your data should be used as the event timestamp.

However, both LinkedIn and Confluent have dropped support for Camus.  It is an end-of-life piece of software.  Posited as replacements, LinkedIn has developed Gobblin, and Kafka ships with Kafka Connect.

Gobblin is a generic HDFS import tool.  It should be used if you want to import data from a variety of sources into HDFS.  It does not support timestamp bucketed JSON data out of the box.  You’ll have to provide your own implementation to do this.

Kafka Connect is a generic Kafka import and export tool, and has an HDFS Connector that helps get data into HDFS.  It has limited JSON support, and requires that your JSON data conform to a Kafka Connect specific envelope.  If you don’t want to reformat your JSON data to fit this envelope, you’ll have difficulty using Kafka Connect.

That leaves us with Camus.  For years, Wikimedia has successfully been using Camus to import JSON data from Kafka into HDFS.  Unlike the newer solutions, Camus does not do streaming imports, so it must be scheduled in batches. We’d like to catch up with more current solutions and use something like Kafka Connect, but until JSON is better supported we will continue to use Camus.

So, how is it done?  This question appears often enough on Kafka-related mailing lists that we decided to write this blog post.

Camus with JSON

Camus needs to be told how to read messages from Kafka, and in what format they should be written to HDFS.  JSON should be serialized and produced to Kafka as UTF-8 byte strings, one JSON object per Kafka message.  We want this data to be written as is with no transformation directly to HDFS.  We’d also like to compress this data in HDFS, and still have it be useable by MapReduce.  Hadoop’s SequenceFile format will do nicely.  (If we didn’t care about compression, we could use the StringRecordWriterProvider to write the JSON records \n delimited directly to HDFS text files.)

We’ll now create a camus.properties file that does what we need.

First, we need to tell Camus where to write our data, and where to keep execution metadata about this Camus job.  Camus uses HDFS to store Kafka offsets so that it can keep track of topic partition offsets from which to start during each run:

# Final top-level HDFS data output directory. A sub-directory
# will be dynamically created for each consumed topic.
etl.destination.path=hdfs:///path/to/output/directory

# HDFS location where you want to keep execution files,
# i.e. offsets, error logs, and count files.
etl.execution.base.path=hdfs:///path/to/camus/metadata

# Where completed Camus job output directories are kept,
# usually a sub-dir in the etl.execution.base.path
etl.execution.history.path=hdfs:///path/to/camus/metadata/history

Next, we’ll specify how Camus should read in messages from Kafka, and how it should look for event timestamps in each message.  We’ll use the JsonStringMessageDecoder, which expects each message to be a UTF-8 byte JSON string.  It will deserialize each message using the Gson JSON parser, and look for a configured timestamp field.

# Use the JsonStringMessageDecoder to deserialize JSON messages from Kafka.
camus.message.decoder.class=com.linkedin.camus.etl.kafka.coders.JsonStringMessageDecoder


camus.message.timestamp.field specifies which field in the JSON object should be used as the event timestamp, and camus.message.timestamp.format specifies the timestamp format of that field.  Timestamp parsing is handled by Java’s SimpleDateFormat, so you should set camus.message.timestamp.format to something that SimpleDateFormat understands, unless your timestamp is already an integer UNIX epoch timestamp.  If it is, you should use ‘unix_seconds’ or ‘unix_milliseconds’, depending on the granularity of your UNIX epoch timestamp.

Wikimedia maintains a slight fork of JsonStringMessageDecoder that makes the camus.message.timestamp.field slightly more flexible.  In our fork, you can specify sub-objects using dotted notation, e.g. camus.message.timestamp.field=sub.object.timestamp. If you don’t need this feature, then don’t bother with our fork.

Here are a couple of examples:

Timestamp field is ‘dt’, format is an ISO-8601 string:

# Specify which field in the JSON object will contain our event timestamp.
camus.message.timestamp.field=dt

# Timestamp values look like 2017-01-01T15:40:17
camus.message.timestamp.format=yyyy-MM-dd'T'HH:mm:ss


Timestamp field is ‘meta.sub.object.ts’, format is a UNIX epoch timestamp integer in milliseconds:

# Specify which field in the JSON object will contain our event timestamp.
# E.g. { "meta": { "sub": { "object": { "ts": 1482871710123 } } } }
# Note that this will only work with Wikimedia’s fork of Camus.
camus.message.timestamp.field=meta.sub.object.ts

# Timestamp values are in milliseconds since UNIX epoch.
camus.message.timestamp.format=unix_milliseconds

If the timestamp cannot be read out of the JSON object, JsonStringMessageDecoder will log a warning and fall back to using System.currentTimeMillis().

Now that we’ve told Camus how to read from Kafka, we need to tell it how to write to HDFS. etl.output.file.time.partition.mins is important. It tells Camus the time bucketing granularity to use.  Setting this to 60 minutes will cause Camus to write files into hourly bucket directories, e.g. 2017/01/01/15. Setting it to 1440 minutes will write daily buckets, etc.

# Store output into hourly buckets.
etl.output.file.time.partition.mins=60

# Use UTC as the default timezone.
etl.default.timezone=UTC

# Delimit records by newline.  This is important for MapReduce to be able to split JSON records.
etl.output.record.delimiter=\n


Use SequenceFileRecordWriterProvider if you want to compress data.  To do so, set mapreduce.output.fileoutputformat.compress.codec=SnappyCodec (or another splittable compression codec) either in your mapred-site.xml, or in this camus.properties file.

# SequenceFileRecordWriterProvider writes the records as Hadoop Sequence files
# so that they can be split even if they are compressed.
etl.record.writer.provider.class=com.linkedin.camus.etl.kafka.common.SequenceFileRecordWriterProvider

# Use Snappy to compress output records.
mapreduce.output.fileoutputformat.compress.codec=SnappyCodec


Finally, some basic Camus configs are needed:

# Replace this with your list of Kafka brokers from which to bootstrap.
kafka.brokers=kafka1001:9092,kafka1002:9092,kafka1003:9092

# These are the kafka topics camus brings to HDFS.
# Replace this with the topics you want to pull,
# or alternatively use kafka.blacklist.topics.
kafka.whitelist.topics=topicA,topicB,topicC

# If whitelist has values, only whitelisted topic are pulled.
kafka.blacklist.topics=

There are various other camus properties you can tweak as well.  You can see some of the ones Wikimedia uses here.

Once this camus.properties file is configured, we can launch a Camus Hadoop job to import from Kafka.

hadoop jar camus-etl-kafka.jar com.linkedin.camus.etl.kafka.CamusJob -P /path/to/camus.properties -Dcamus.job.name="my-camus-job"


The first time this job runs, it will import as much data from Kafka as it can, and write its finishing topic-partition offsets to HDFS.  The next time you launch a Camus job with the same camus.properties file, it will read offsets from the configured etl.execution.base.path HDFS directory and start consuming from Kafka at those offsets.  Wikimedia schedules regular Camus jobs using boring ol’ cron, but you could use whatever newfangled job scheduler you like.
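For the record, here is a sketch of what such a cron entry might look like (the jar path, job name, and log file are hypothetical):

## Hypothetical crontab entry: launch the Camus job every 10 minutes.
*/10 * * * * hadoop jar /path/to/camus-etl-kafka.jar com.linkedin.camus.etl.kafka.CamusJob -P /path/to/camus.properties -Dcamus.job.name="my-camus-job" >> /var/log/camus.log 2>&1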

After several Camus runs, you should see time bucketed directories containing Snappy compressed SequenceFiles of JSON data in HDFS stored in etl.destination.path, e.g. hdfs:///path/to/output/directory/topicA/2017/01/01/15/.  You could access this data with custom MapReduce or Spark jobs, or use Hive’s org.apache.hive.hcatalog.data.JsonSerDe and Hadoop’s org.apache.hadoop.mapred.SequenceFileInputFormat.  Wikimedia creates an external Hive table doing just that, and then batch processes this data into a more refined and useful schema stored as Parquet for faster querying.
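To give a flavor of that last step, here is a minimal, untested sketch of such an external Hive table over a single imported hour of one topic. The table name, columns, and location are hypothetical, and you may need to add the hive-hcatalog-core jar to your session so the SerDe is available:

## Hypothetical Hive DDL: map one hour of imported JSON SequenceFiles onto
## an external table, using the SerDe and input format mentioned above.
$ hive -e "
CREATE EXTERNAL TABLE topicA_json (
  dt STRING,
  message STRING
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.SequenceFileInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
LOCATION 'hdfs:///path/to/output/directory/topicA/2017/01/01/15';
"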

Here’s the camus.properties file in full:

#
# Camus properties file for consuming Kafka topics into HDFS.
#

# Final top-level HDFS data output directory. A sub-directory
# will be dynamically created for each consumed topic.
etl.destination.path=hdfs:///path/to/output/directory

# HDFS location where you want to keep execution files,
# i.e. offsets, error logs, and count files.
etl.execution.base.path=hdfs:///path/to/camus/metadata

# Where completed Camus job output directories are kept,
# usually a sub-dir in the etl.execution.base.path
etl.execution.history.path=hdfs:///path/to/camus/metadata/history

# Use the JsonStringMessageDecoder to deserialize JSON messages from Kafka.
camus.message.decoder.class=com.linkedin.camus.etl.kafka.coders.JsonStringMessageDecoder

# Specify which field in the JSON object will contain our event timestamp.
camus.message.timestamp.field=dt

# Timestamp values look like 2017-01-01T15:40:17
camus.message.timestamp.format=yyyy-MM-dd'T'HH:mm:ss

# Store output into hourly buckets.
etl.output.file.time.partition.mins=60

# Use UTC as the default timezone.
etl.default.timezone=UTC

# Delimit records by newline.  This is important for MapReduce to be able to split JSON records.
etl.output.record.delimiter=\n

# Concrete implementation of the Decoder class to use
camus.message.decoder.class=com.linkedin.camus.etl.kafka.coders.JsonStringMessageDecoder

# SequenceFileRecordWriterProvider writes the records as Hadoop Sequence files
# so that they can be split even if they are compressed.
etl.record.writer.provider.class=com.linkedin.camus.etl.kafka.common.SequenceFileRecordWriterProvider

# Use Snappy to compress output records.
mapreduce.output.fileoutputformat.compress.codec=SnappyCodec

# Max hadoop tasks to use, each task can pull multiple topic partitions.
mapred.map.tasks=24

# Connection parameters.
# Replace this with your list of Kafka brokers from which to bootstrap.
kafka.brokers=kafka1001:9092,kafka1002:9092,kafka1003:9092

# These are the kafka topics camus brings to HDFS.
# Replace this with the topics you want to pull,
# or alternatively use kafka.blacklist.topics.
kafka.whitelist.topics=topicA,topicB,topicC

# If whitelist has values, only whitelisted topic are pulled.
kafka.blacklist.topics=

# max historical time that will be pulled from each partition based on event timestamp
#  Note:  max.pull.hrs doesn't quite seem to be respected here.
#  This will take some more sleuthing to figure out why, but in our case
#  here it’s ok, as we hope to never be this far behind in Kafka messages to
#  consume.
kafka.max.pull.hrs=168

# events with a timestamp older than this will be discarded.
kafka.max.historical.days=7

# Max minutes for each mapper to pull messages (-1 means no limit)
# Let each mapper run for no more than 9 minutes.
# Camus creates hourly directories, and we don't want a single
# long running mapper keep other Camus jobs from being launched.
# We run Camus every 10 minutes, so limiting it to 9 should keep
# runs fresh.
kafka.max.pull.minutes.per.task=9

# Name of the client as seen by kafka
kafka.client.name=camus-00

# Fetch Request Parameters
#kafka.fetch.buffer.size=
#kafka.fetch.request.correlationid=
#kafka.fetch.request.max.wait=
#kafka.fetch.request.min.bytes=

kafka.client.buffer.size=20971520
kafka.client.so.timeout=60000

# Controls the submitting of counts to Kafka
# Default value set to true
post.tracking.counts.to.kafka=false

# Stops the mapper from getting inundated with Decoder exceptions for the same topic
# Default value is set to 10
max.decoder.exceptions.to.print=5

log4j.configuration=false

##########################
# Everything below this point can be ignored for the time being,
# will provide more documentation down the road. (LinkedIn/Camus never did! :/ )
##########################

etl.run.tracking.post=false
#kafka.monitor.tier=
kafka.monitor.time.granularity=10

etl.hourly=hourly
etl.daily=daily
etl.ignore.schema.errors=false

etl.keep.count.files=false
#etl.counts.path=
etl.execution.history.max.of.quota=.8

Nuria Ruiz, Lead Software Engineer (Manager)
Andrew Otto, Senior Operations Engineer
Wikimedia Foundation

by Andrew Otto at January 13, 2017 06:05 PM