Shortcut: WD:PC

Wikidata:Project chat

From Wikidata
Wikidata project chat
Place used to discuss any and all aspects of Wikidata: the project itself, policy and proposals, individual data items, technical issues, etc.
Please take a look at the frequently asked questions to see if your question has already been answered.
Also see status updates to keep up-to-date on important things around Wikidata.
Requests for deletions can be made here.
Merging instructions can be found here.

IRC channel: #wikidata
On this page, old discussions are archived. An overview of all archives can be found at this page's archive index. The current archive is located at 2016/10.
SpBot archives all sections tagged with {{Section resolved|1=~~~~}} after 1 day.

Project chat · Administrators' noticeboard · Development team · Translators' noticeboard · Requests for permissions · Interwiki conflicts · Requests for deletions · Property proposal · Properties for deletion · Requests for comment · Partnerships and imports · Request a query · Bot requests

Allow non-existing article in a language to list articles in other languages

Suggestion: Allow indicating that an article which does not exist in the current language exists in other languages, so that those languages are shown in the Languages list when creation of the article is suggested.

Solution (idea): Wikidata must allow language links to be specified for an article that does not exist in a certain language.

Advantage: Clicking on a red link for a non-existing article in a certain language could then give the option to read the article in other languages, in addition to creating the article. The reader can then get the information in another language of their own choice, while the link still shows that the article does not exist in the current language.

Note: Suggestion was also previously posted at [1].  – The preceding unsigned comment was added by MortenZdk (talk • contribs) at 11:35, 11 September 2016 (UTC).

Bot generated data

sv:User:Lsj runs a bot that generates geographical articles (e.g. villages, rivers) in the Swedish and Cebuano wikis using freely available data from NASA and other sources. The bot extracts data about a location, then formats it into text and generates stub articles. Example for en:Abuko, with data items bolded:

Abuko has a savanna climate. The average temperature is {{convert|24|C}}. The hottest month is April, with {{convert|27|C}} and the coldest month is July, with {{convert|22|C}}.<ref name = "nasa">{{Cite web |url= http://neo.sci.gsfc.nasa.gov/dataset_index.php|title= NASA Earth Observations Data Set Index|access-date = 30 January 2016 |publisher= NASA}}</ref> Average annual rainfall is {{convert|1148|mm}}. The wettest month is August, with {{convert|449|mm}} of rain, and the driest month is February, with {{convert|1|mm}} of rain.<ref name = "nasarain">{{Cite web |url= http://neo.sci.gsfc.nasa.gov/view.php?datasetId=TRMM_3B43M&year=2014|title= NASA Earth Observations: Rainfall (1 month - TRMM)|access-date = 30 January 2016 |publisher= NASA/Tropical Rainfall Monitoring Mission}}</ref>
  • Would there be any problem with the bot storing the data in Wikidata?
  • Would there be any problem with articles embedding the wikidata items for a location into standardized text at display time?

Other types of data that could be stored by bots for settlements include census data and election results. Standard templates could then pull the data into chunks of Wikipedia text for articles in all languages, picking up the latest values at display time. Crazy? Aymatth2 (talk) 23:41, 24 September 2016 (UTC)
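The text-generation step the bot performs can be sketched as follows. This is a minimal illustration only: the field names (`avg_temp_c`, `hottest_month`, etc.) are hypothetical placeholders, not actual Wikidata properties, and a real template would pull the values from item statements rather than a dict.

```python
# Hypothetical data record for a place, as a bot might extract it from NASA data.
# Field names are invented for this sketch; they are not Wikidata property names.
CLIMATE = {
    "name": "Abuko",
    "climate": "savanna",
    "avg_temp_c": 24,
    "hottest_month": ("April", 27),
    "coldest_month": ("July", 22),
}

def render_climate(d):
    """Format the stored climate values into a stub sentence at display time."""
    hot_month, hot_temp = d["hottest_month"]
    cold_month, cold_temp = d["coldest_month"]
    return (f"{d['name']} has a {d['climate']} climate. "
            f"The average temperature is {d['avg_temp_c']} °C. "
            f"The hottest month is {hot_month}, at {hot_temp} °C, "
            f"and the coldest is {cold_month}, at {cold_temp} °C.")

print(render_climate(CLIMATE))
```

The point of separating data from rendering is that each Wikipedia could supply its own `render_climate` in its own language, while the underlying values live in one place.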

I think such data *should* be stored in Wikidata, and not as generated text in the Wikipedias, together with the provenance (i.e. whether it is from NASA or other sources). Whether the Wikipedias accept this kind of text, and whether they accept queries in running text, is up to the individual Wikipedias (I am rather skeptical regarding that approach), but that's really up to the local Wikipedia communities.
Even without generating the text via queries from Wikidata, there are many ways that the projects could benefit from having the data stored in Wikidata, e.g. for checking the Wikipedia text whether it corresponds with Wikidata, etc. --Denny (talk) 01:32, 25 September 2016 (UTC)
+1. We don't need GeoNames stubs, in any language.--Sauri-Arabier (talk) 10:15, 25 September 2016 (UTC)
The data in question comes from GeoNames, which is CC-BY licensed. A Wikipedia can import data with fewer copyright (sui generis database) concerns than Wikidata can. There's also a question about data quality. There are people who are concerned about importing a lot of wrong data into Wikidata. ChristianKl (talk) 08:15, 25 September 2016 (UTC)
An alternative to importing only very high quality data is to apply quality heuristics or metrics to Wikidata data, as described in the recent RfC: Data quality framework for Wikidata. The positive side of importing data into Wikidata is that it is a starting point for improving the data through collaborative work, as opposed to someone trying to clean a dataset alone. It also makes it possible to spot inconsistencies between datasets, because Wikidata can store several inconsistent datasets, hence providing a heuristic for where the data should be improved. author  TomT0m / talk page 08:37, 25 September 2016 (UTC)
Another piece of the puzzle is mw:Extension:ArticlePlaceholder, which can generate text from Wikidata data and generate stub articles on the fly. This could basically make the bot useless. author  TomT0m / talk page 08:31, 25 September 2016 (UTC)

Can a bot reliably tell if there is already an item about the village, river, etc.? This can be difficult due to spelling variations, alternate names, and different levels of government. For example, near me, there are Rutland County (Q513878), Rutland (Q25893), and Rutland (Q1008836). Jc3s5h (talk) 12:37, 25 September 2016 (UTC)

  • The Swedish bot is steadily creating articles in the sv and ceb Wikipedias for all locations in the world, mostly using GeoNames/NASA data. I assume these all get Wikidata entries. There may be errors, e.g. not realizing that Paraty and Parati are the same place, but that can be sorted out. The Swedish bot data is in the public domain: nobody can copyright mere facts on average rainfall or temperatures. Can we backtrack from existing Wikidata entries to the corresponding NASA data, then update attributes like "average July rainfall" from the NASA data, giving the source? That would give the Wikipedias a higher level of confidence about importing the data into their articles, and possibly generating articles to match the Wikidata entries. Aymatth2 (talk) 15:09, 25 September 2016 (UTC)
Having duplicate items isn't necessarily an error. It's not ideal, but if someone notices them they can merge. On the other hand, GeoNames often contains wrong coordinates for an item. If the temperature data are then pulled based on the incorrect coordinates, the whole item would have real errors in its data. ChristianKl (talk) 19:08, 25 September 2016 (UTC)
  • @ChristianKl: Do we know how often the GeoNames coordinates are wrong, and how far they are wrong? The Swedish bot seems to be causing entries to be made in Wikidata for a great many places. I assume this includes coordinates. If they are within a kilometer or two, the temperature and rainfall data will be close enough - they are rough values anyway. If only 0.001% of the coordinates are completely wrong, we can live with that. Perfection is the enemy of excellence. But if 10% of the coordinates are completely wrong we have a very serious problem. Aymatth2 (talk) 21:59, 25 September 2016 (UTC)
  • @Aymatth2, ChristianKl: A lot of the data in GeoNames is just garbage, especially for Central America. I have no idea where GeoNames gets their data from, but it definitely isn't reliable. From spot checks of areas I know well, I would estimate that about 5% of their data for Central America is totally bogus. Kaldari (talk) 23:06, 26 September 2016 (UTC)
  • It looks like GeoNames gets their data from 74 other databases, which explains why some of the data is high quality and some of it is garbage. Kaldari (talk) 23:27, 26 September 2016 (UTC)
  • As far as the temperature data goes, there is currently a proposal to add a property for it: https://www.wikidata.org/w/index.php?title=Wikidata:Property_proposal/average_yearly_temperature . Currently there isn't a property for it. ChristianKl (talk) 18:47, 25 September 2016 (UTC)
  • Oppose - A large amount of the data from GeoNames is poor quality (especially outside of Europe and North America). GeoNames is the largest geography database on the internet, not the most accurate. They aggregate data from 74 other databases, some of which are high quality and some of which have no quality control whatsoever. Our species data is already polluted by Lsjbot. I would hate to see the same thing happen with our geographical data. Kaldari (talk) 23:46, 26 September 2016 (UTC)
  • @Kaldari: I get the impression that as Lsjbot churns out geo-articles in the sv and ceb wikipedias, the coordinates from GeoNames get loaded into Wikidata. It would help to have some hard numbers on what percentages of these coordinates in Wikidata are a) accurate b) within 1km c) within 10km d) off by more than 10km. Is there a way to check a random sample of the coordinates against what we would consider reliable sources? Perhaps it could be done on a country-by-country basis. The bot data on climate etc. derived from coords+NASA could then be accepted for countries where coordinates are fairly accurate, rejected for others.
If there are countries where other sources give more accurate coordinates than GeoNames, is there a way to override the GeoNames Wikidata coordinates with data from those sources? Which are those countries? Aymatth2 (talk) 03:04, 27 September 2016 (UTC)
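The spot-check proposed above could be sketched as a script that measures the great-circle distance between a GeoNames coordinate and a trusted reference coordinate, then sorts each place into the accuracy bands a) to d). This is an illustrative sketch only; the band thresholds come from the discussion above, and any sample data fed to it would have to come from a real reliable source.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def accuracy_band(d_km):
    """Classify a coordinate offset into the bands discussed above."""
    if d_km < 0.1:
        return "accurate"
    if d_km <= 1:
        return "within 1 km"
    if d_km <= 10:
        return "within 10 km"
    return "off by more than 10 km"
```

Run over a random sample per country, the band counts would give the percentages needed to decide whether the derived climate data is trustworthy there.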
The problem is that the data quality from GeoNames is essentially random, as it depends mostly on which original database the data came from. Evaluating the quality of such an aggregated meta-database is practically impossible. It's like asking "What is the quality of data in Wikidata?". What Swedish Wikipedia should be doing is evaluating the quality of each of the 74 sources that GeoNames uses, figuring out which ones have high-quality data and importing only that data directly from the original sources. Kaldari (talk) 08:24, 27 September 2016 (UTC)

What is the percentage of errors in Wikidata, Wikipedia and Geonames? The data IS in Wikipedia and consequently it should be in Wikidata. The best thing we can do is work on this data and improve where necessary. Dodging the bullet by making it appear to be outside of what we do is plain silly. It is what we do among other things. Thanks, GerardM (talk) 10:25, 27 September 2016 (UTC)

I disagree with GerardM (talkcontribslogs)'s statement "The data IS in Wikipedia and consequently it should be in Wikidata". The whole idea of importing data from Wikipedia is dicey, since the quality of Wikipedia data is not as good as some other sources. Certainly if I came across some demonstrably wrong data in Wikipedia, and couldn't find a correct replacement, I should delete the data from both Wikipedia and Wikidata. Jc3s5h (talk) 12:25, 27 September 2016 (UTC)
  • Have we talked to the GeoNames people? I assume they have tried to use the most accurate data sources they can access, but in some cases have had to make do with imperfect sources. Spot-checks can give a good measure of the quality of data in GeoNames or, for that matter, in Wikipedia. If we find that GeoNames coordinates for British locations are 99.99% accurate in GeoNames, and 98.4% accurate in Wikipedia, we should replace all the British coordinates in Wikipedia and Wikidata with the GeoNames coordinates. It is possible that one of the 0.01% of inaccurate GeoNames coordinates will replace an accurate Wikipedia coordinate, but the trade-off seems reasonable. We can then use a modified version of the Swedish bot to match the coordinates to the NASA data to get the altitude, temperature and rainfall data for those British locations and store it in Wikidata for use by Wikipedia. Why not? Aymatth2 (talk) 12:48, 27 September 2016 (UTC)
I don't know about all the Wikipedias, but at the English Wikipedia, if a bot repeatedly replaces information that has been individually researched by a human editor, and for which reliable sources have been provided, with incorrect values, that bot will find itself indefinitely blocked. The current compromise on using Wikidata information at the English Wikipedia (other than linking to equivalent articles in other languages) may be found at w:Wikipedia:Requests for comment/Wikidata Phase 2. Jc3s5h (talk) 13:41, 27 September 2016 (UTC)
  • @Jc3s5h: An approach that may work is to have a bot take the coordinates given in a Wikipedia infobox (which may come from Wikidata), and use those coordinates to fetch the temperature and rainfall data from NASA and format them as text in the appropriate language. The chunk of text would be held in a separate Wikipedia file, transcluded into the article like a template, and the text would make it clear that it is NASA data for those coordinates as of the retrieval date. The bot could be rerun occasionally, or on demand, to refresh the data. It would be nice to store the data in Wikidata so all the Wikipedias could use it, but I get the impression that getting the Wikipedias and Wikidata to agree is tough. Aymatth2 (talk) 16:26, 27 September 2016 (UTC)

One large problem with GeoNames is that they have matched data from different databases, but matched them so poorly that a small village with two families near my home got a population of several hundred. This error was introduced because the village shares its name with a small town 1000 kilometers from here. The population data was correct, but GeoNames did not match it to the correct place. Another large problem is that GeoNames has many duplicate items. Both French and English databases have been used for Canada, therefore many Canadian items in GeoNames can be found twice: once with a French name and once with an English name. A lake at the border between the Northern Territory and Western Australia can be found at least twice. Places in Sweden whose names end with the letter Ö are categorised as islands, even if they are not islands. Large parts of the Faroe Islands can be found at the bottom of the Atlantic Ocean. Almost every coordinate is rounded to the nearest minute, locating mountain peaks floating in the air and lakes on dry land. Many items about buildings do not tell very much about the building at all. They only tell that this kind of building has existed here at some point between the Stone Age and today. -- Innocent bystander (talk) 13:37, 27 September 2016 (UTC)

  • The immediate concern that triggered this discussion is with villages, where we need accurate enough coordinates to derive rainfall and temperature data from NASA. Are the GeoNames coordinates usually "good enough" for this purpose? Duplicate names are probably not a huge issue with villages. In Canada a lake, river or mountain might have variants (e.g. Lake Champlain/Lac Champlain), but a village would have the same name in both languages. Aymatth2 (talk) 16:26, 27 September 2016 (UTC)
Duplicate names are an issue with villages. Villages names often aren't unique. ChristianKl (talk) 17:13, 27 September 2016 (UTC)
  • If GeoNames has two entries for one village, St. Jean and Saint John, whatever, and they both have roughly accurate coordinates, good enough for climate data, there is no problem for the purpose being discussed as long as one of them can be matched to the Wikidata entry. The problem is when GeoNames places St. Jean, Quebec somewhere in Alabama. I suspect that wildly inaccurate coordinates are rare. Aymatth2 (talk) 17:28, 27 September 2016 (UTC)
  • I'm not convinced that getting the wrong village in the same county (or similar geographic unit) is good enough. I've hiked in an area where one side of a mountain ridge line is a temperate rain forest, and the other side is an ordinary northern forest. Jc3s5h (talk) 18:07, 27 September 2016 (UTC)
  • Climate data is always an approximation. My garden has different microclimates and vegetation on the dry, sunny slope in front of the house and the moister, shaded hollow behind. The climate data for a village in the Congo may be based on reports from meteorological stations more than 100 kilometers away. If we insist on perfect data we will get no data at all. Aymatth2 (talk) 23:10, 27 September 2016 (UTC)
When data is used to create articles in Wikipedias, we are not talking about English Wikipedia; we are talking about the process whereby new content is created in multiple Wikipedias. When we refuse to acknowledge processes like this and do not include the data, we have no way of improving the data before it is actually used to create articles. What use is it for us to be the data repository for Wikipedia when we refuse to be of service? It is wonderful to disagree, but what does it bring us? NOTHING. We can do better and we should do better. Thanks, GerardM (talk) 20:03, 27 September 2016 (UTC)
  • We should provide the best data we can, then constantly work on improving quality. Spot checks on accuracy must show the data are good and steadily getting better. Surely Wikidata can do a better job of assembling and maintaining accurate bulk data like coordinates, temperatures and rainfall than editors of individual Wikipedia articles. Aymatth2 (talk) 23:10, 27 September 2016 (UTC)
  • @GerardM: The problem here isn't just the accuracy of coordinate data. We're talking about potentially importing data for 10 million place names, many of which don't even exist, are misclassified, are duplicates of other places in GeoNames, are conflations of multiple places, or are duplicates of places with different names in Wikidata. Can we seriously hope to check and fix even a tiny fraction of that? Adding new items is easy. Deleting and merging bogus ones is much more difficult. If we aren't willing to import the data directly from GeoNames, why should we be willing to import it indirectly from Swedish Wikipedia? The real danger here, in my mind, is that in the rush to fill Wikidata (and Swedish Wikipedia) with as much data as possible, we are eroding the trust that the larger Wikipedias have in Wikidata's data quality and thus alienating Wikidata from a huge editor pool, dooming it to die a slow death by data-rot. Kaldari (talk) 04:54, 28 September 2016 (UTC)
  • @Kaldari: We will not do this for all the Chinese places; we already have them. We will import them anyway if they import them into Wikipedias first. Now the question is: what is Wikidata good for? Why are we considering best practices for data quality when we do not make them operational, when we do not use them for the needs that are there? Yes, there will be problems, but we will have them anyway, and it is much better to be in the driver seat and think about how to improve the data before they become Wikipedia articles. Just consider: all these places likely have red links in one of our Wikipedias. Kaldari, use what we have for our mutual benefit and forget about the big Wikipedias. We are there for the smaller ones as much, and data and data quality is what we are there for. Thanks, GerardM (talk) 06:28, 28 September 2016 (UTC)

────────────────────────────────────────────────────────────────────────────────────────────────────

  • A bot developer can combine web searches with AI techniques to check whether a GeoNames place name a) is the name of a populated place, b) is not a duplicate of some more common name and c) has accurate coordinates. The process is iterative: the bot generates a confidence score for a sample of items; the developer checks the high-scoring items; where there is a problem, the developer trains the bot to detect and downgrade such items. Eventually the bot reaches the level where 99.99% of items above a given score are clearly correct. That is, among 10,000 items there is just one error. All other items are discarded or placed in a list for manual attention.
@Kaldari: Would you accept having the bot populate Wikidata on a one-shot basis with the high-scoring names and coordinates if it reached this level of accuracy? If not, what level of accuracy would you accept? Aymatth2 (talk) 13:29, 28 September 2016 (UTC)
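The triage loop described above could look something like the following sketch. The scoring checks and their names are invented placeholders; a real bot would combine web-search hits, gazetteer cross-references and other signals, and the threshold would be tuned against the developer's manual reviews.

```python
def confidence_score(item):
    """Toy score: fraction of hypothetical checks the item passes."""
    checks = [
        item.get("is_populated_place", False),   # placeholder signal
        not item.get("possible_duplicate", True),  # placeholder signal
        item.get("coords_cross_checked", False), # placeholder signal
    ]
    return sum(checks) / len(checks)

def triage(items, threshold=0.99):
    """Split items into an auto-import list and a manual-review list."""
    accept, review = [], []
    for item in items:
        (accept if confidence_score(item) >= threshold else review).append(item)
    return accept, review
```

The key property is that nothing below the threshold is imported silently: it is either discarded or queued for a human, which matches the 99.99%-above-score acceptance criterion proposed above.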
Do you think there's currently a person who wants to write such a bot? ChristianKl (talk) 15:20, 28 September 2016 (UTC)
  • I might write one myself, but would not want to start unless there were clearly defined and agreed acceptance criteria. I would want assurance that I would not run into a stone wall of resistance to implementation after it had been proved to meet these agreed criteria. Let's see what user:Kaldari has to say. Aymatth2 (talk) 15:39, 28 September 2016 (UTC)
  • You are dodging the issue. The developer of this data is able to do a lot of this, if not all of it. The point is, when we do not cooperate he can just opt to add all this data to Wikipedias, and then what? They are obviously articles, and there will be multiples in Wikipedias, so there will be items. When he STARTS with cooperating in Wikidata, we can start with disambiguation. We can add all the rest and compare with other sources and do what is necessary (whatever that is). The articles may be placeholders in any language, and in any language we can seek cooperation. Now ask yourself: is this not a perfect example of how we can leverage Wikidata in a positive way? We would be proactively working on the data quality of Wikipedia. Or do you really want to insist on working after the fact? In all cases we have to deal with this shit. It is in our best interest to cooperate and not be so afraid of what a subset of some Wikipedia communities have to say. Thanks, GerardM (talk) 18:30, 28 September 2016 (UTC)
  • @GerardM: I think you are overestimating the ability of the Wikidata community to clean up this data. No one has cleaned up any of the bogus species data that we imported from Swedish Wikipedia 3 years ago, nor is it even practical to do so. Let's say that I wanted to remove a totally bogus species from Wikidata, like Zygoballus mundus (Q5345040) (which has actually been deleted from the original database it was imported from). With a lot of effort (and Google Translate) I could probably get it deleted from Swedish Wikipedia, but it would still exist on the ceb and war Wikipedias, neither of which I have any clue how to interact with, so it will still persist on Wikidata indefinitely. Multiply that by the thousands of bogus species that need to be deleted and it quickly becomes an impossible task. I'm sure it won't be any easier getting all the abandoned logging camps and real estate developments (see below) removed from Wikidata after they are imported. Kaldari (talk) 19:21, 28 September 2016 (UTC)
  • @Aymatth2: If there was such a way to automatically determine accuracy, I would probably be willing to endorse it, but this sounds like a very challenging goal to accomplish. There is also the issue of notability to consider. GeoNames has no threshold for notability. It classifies neighborhoods, ghost towns, and real estate developments as "populated places" with no way to distinguish them from actual towns and cities. To give you one clear example of the problem, let's look at Chiquibul National Park in Belize. This has been a national park since 1995 and no one is allowed to live within the park except for park rangers. Within the boundaries of Chiquibul, GeoNames includes over a dozen logging camps that haven't existed for at least 20 years. These were never permanent settlements, just camps for loggers, yet GeoNames classifies them as "populated places". If you want to double check that these are in fact abandoned logging camps and not villages or towns, here's a list of some of them: Aguacate Camp, San Pastor Camp, Los Lirios Camp, Cebada Camp, Valentin Camp, Cowboy Camp, Retiro, Puchituk Camp, Mountain Cow, Blue Hole Camp, Cubetas. The reason these are included in GeoNames is because back in the 1970s (when Belize was British Honduras), the British government did a survey of the logging camps, and this survey data eventually ended up in GeoNames. How would you propose training an AI to detect cases where "populated places" were actually just abandoned logging camps or real estate developments? I imagine you would have to give it input from more reliable databases, and if you're already doing that, why not just use those databases to start with rather than GeoNames? Kaldari (talk) 19:04, 28 September 2016 (UTC)
@Kaldari: Well, to me it looks like these places are in fact "Populated places with an end date". There is nothing strange about that. We have many such items already. I started a thread about such items here some time ago.
But we'll never be able to actually supply an end date since there are no reliable sources about these camps. All we know is that they definitely don't exist anymore. And regardless of this specific example, my point is that we shouldn't be creating items for places that aren't covered in reliable sources. As it stands now, I could create 100 totally bogus cities in GeoNames (via their editing interface) and in a few months they would automatically become articles on Swedish Wikipedia complete with official looking NASA references, and then they would be copied to other wikis and imported into Wikidata where they would live forever without anyone ever questioning their existence. Even if someone did discover that one of them was fake, there would be no way to link them to the other fake cities. Doesn't that seem like a problem? Shouldn't we demand some minimum level of quality control and verifiability for the data we import? Kaldari (talk) 22:33, 28 September 2016 (UTC)
I strongly advise against importing any data other than GeoNames ID (P1566) from these svwiki or cebwiki articles. If you want to import any other data from GeoNames, then do it directly from that poor database. We have detected many strange errors in these articles on svwiki. Many of the problems were detected when the bot reached Finland. Finland is a country with a fair share of active users on svwiki, since Finland is partly Swedish-speaking. The articles were found describing savanna (Q42320) in parts of the arctic country; February was the hottest month in some cases. And the data about the lakes were often hilariously wrong. The bot was halted for some time to discuss the quality problems, but it has started again, at full speed I'm afraid. -- Innocent bystander (talk) 20:20, 28 September 2016 (UTC)
  • I feel like someone who has poked a stick into a hornet's nest. It would be useful, if we know the name and coordinates of a populated place, to store that information in Wikidata, and then to also store data derived from the name or coordinates, such as census or NASA climate data, so it could be shared by all the Wikipedias. I had no idea there was so much controversy about GeoNames. Let's forget about that as a source and look at the data sources used by GeoNames in the GeoNames Gazetteer. Some of these look good to me. For example, the Instituto Brasileiro de Geografia e Estatística is a very reputable Brazilian government agency that provides a wealth of data about municipalities such as Cambuci that could be used to enhance the decidedly minimalist en:Cambuci article. I see no reason to treat a source like this with suspicion. This is what the Brazilian government says about their country. Is there a problem, in principle, with importing data from it so the Wikipedias can share it, and share updates? Aymatth2 (talk) 23:42, 28 September 2016 (UTC)
You are still dodging the bullet and, it may miss. When you import all this data you will have a certain percentage of error. It will probably be within the 3% range and that is better than all the work that I have done. I do make mistakes particularly when I am carefully adding content by hand. So when we want a mechanism to both update Wikidata and the Wikipedias, there is a precedent, there are two precedents. Listeria is able to update all the lists we have and, with a little bit of effort it can show the content for an item in a Reasonator kind of way. There is always the Placeholder, it is the official version of all this. The point is that we are thinking in one way; quality must be maintained and each project is an island. Yes and no. Quality must be maintained and refusing this data and having it through the backdoor is absolutely the way of NOT improving data. Improving quality can be done in many ways and YES we have communities. Why not ask our friends in India to verify and complete the data for India, why not ask the same for our Welsh friends. It is then for a part up to them to help us out but they CAN have the same data available to them if they so choose, available in a Listeria / Reasonator / Placeholder kinda way.
For all the naysayers, tell me: what prospect do you have to improve this data that is better? It is not a good idea to say: "You may not import data from any Wikipedia", because I was told that there was no option but to accept erroneous data, so we have a precedent whereby dodgy data is to be accepted. If we do not accept the data I will again open a can of worms. Thanks, GerardM (talk) 04:46, 29 September 2016 (UTC)
  • If the IBGE data is imported mechanically to Wikidata it will be 100% accurate - as a reflection of the IBGE data. It will be safe for any Wikipedia article to say "According to the 2010 census by the Brazilian Institute of Geography and Statistics, the population was 12,456, of which 53% were female and 49% were male." The numbers do not add up, but that is indeed what the census says. The IBGE site is the official publication. IBGE may correct the numbers, and there will be another census in 2020, so we will want to periodically rerun the import to freshen up the data. As for the Wikipedias, there are two options, and I am not sure which is best:
  1. Dynamically pull the data from Wikidata at display time
  2. Periodically pull the data from Wikidata, format it and store it in each Wikipedia as a "template" to be embedded in the article.
The second approach is less immediate, but perhaps gives more control, and may be more efficient. Either way, Wikipedia and Wikidata editors would not update the data, which are identified as the IBGE numbers, not the "true numbers". If an editor finds a better source of population data than the census, they can include that in their article and suggest that it too is held in Wikidata. The Wikipedias may format the data as text or in tables according to editor preference. I see applying this approach to other reliable sources as a huge benefit to all Wikidata consumers, including all the Wikipedias. Aymatth2 (talk) 12:51, 29 September 2016 (UTC)
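The principle that imported figures mirror the source rather than the "true numbers" could be modelled as a record that carries its provenance and is only updated by a re-import. This is a sketch with invented names; IBGE's actual data formats and Wikidata's reference model are more involved.

```python
# Sketch: an imported value that keeps its provenance, so editors see it is
# "the IBGE number" and only a fresh import (or a source correction) changes it.
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class SourcedValue:
    value: int
    source: str        # e.g. "IBGE 2010 census" (hypothetical label)
    retrieved: date

def refresh(old, new_value, retrieved):
    """Re-run of the import: keep the old record unless the source changed."""
    if new_value == old.value:
        return old
    return SourcedValue(new_value, old.source, retrieved)
```

Because the record is immutable and replaced only by `refresh`, a periodic re-import can pick up IBGE corrections or a new census without hand edits drifting away from the published figures.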
When the IBGE data is imported, it still needs a lot of preparatory work. We already have many places of Brazil in Wikidata. It will only bring the current places and not the abandoned places. So yes, it is valuable data, but it is not all the data. Thanks, GerardM (talk) 05:04, 30 September 2016 (UTC)
@Aymatth2: Pulling data directly from the primary databases sounds like a much better idea to me. At least then we have a real source for verifiability and can assume a certain level of reliability (rather than it being a crap-shoot). Kaldari (talk) 18:44, 29 September 2016 (UTC)
Even if we assume that all data at the source is correct, there is still a lot of work to match each item in IBGE with each item here at Wikidata. You will then still get a percentage of errors. By hard work, we can improve that. GeoNames and the Lsjbot-project has here unfortunately made it worse. -- Innocent bystander (talk) 18:53, 29 September 2016 (UTC)
Sticks and stones. You do not address the issue. Lsjbot and GeoNames are realities we have to deal with. There are also other sources that have been imported that are way more problematic. Wikidata is not operating in a vacuum. It is dangerous to think we should ignore an opportunity that allows us to have an influence on the eventual content of multiple Wikipedias. It is discrimination pure and simple. Thanks, GerardM (talk) 05:04, 30 September 2016 (UTC)
@GerardM: I am not here to solve everything. My main opinion here is that we should not use svwiki or cebwiki as a direct source for such things as heights of mountains and surface areas of lakes and some other data, since the methods the bot has used to find such data are very problematic. We found very large mistakes in Finland, and that is the only country we have been able to review. It becomes even worse since the bot a little too often has not been able to match the correct GeoNames item with the correct Wikidata item. That is not a big deal if they are not matched at all. But a little too often "John Doe (city)" has been matched with "John Doe (mountain)" or "John Doe (parish)". The links to Wikipedia inside GeoNames have made it even worse, since those are very often wrong. And that bad data has already been imported here. I used to correct such mistakes daily, but since I cannot see that I will finish before the heat death of the universe (Q139931), I have quit doing so. -- Innocent bystander (talk) 07:19, 30 September 2016 (UTC)
It is not about you. It is about what we face. You propose discrimination based on the fact that Cebuano and Swedish do not matter to you. What this issue brings to the front is that, according to you, a lot of GeoNames data we already hold needs work, and we are already doing that work. You do not quantify the error rate in GeoNames, and you do not compare it to other sources. It is opinion only. Compare that to the fact that 12% of the most prescribed medicines are not proven to be effective, and we are to have all recognised substances approved for medical use in Wikidata. REALLY? I often fix links to people where, according to English Wikipedia, there is a link, only to find that Wikipedia has no link and the linked item is a person with the same name from a different century. Wikidata is as bad as GeoNames, if not worse. But we have more resources than GeoNames to improve our data, and we can help them fix their data. We, not you, but you as well. We cannot say that Swedish does not matter. You can say it, but that is just you. We cannot, because improving the data in the Wikipedias is one of the most important functions of Wikidata, and when we do this well, there is no real argument left not to use Wikidata for its data. Thanks, GerardM (talk) 04:41, 1 October 2016 (UTC)
  • All sources have errors. People are born and die while a census is being taken. Clerks make transcription errors. We cannot expect to record the truth, only what plausible sources like IBGE have said. I see no difficulty matching the IBGE entries for municipalities in each state of Brazil with the Wikipedia / Wikidata entries. There are only a few thousand of them. What is involved in getting accepted definitions in Wikidata of the official census data attributes, and approval to run a bot to load them for the Brazilian municipalities? Aymatth2 (talk) 22:28, 29 September 2016 (UTC)
Does the Brazilian census have IDs that they use to identify Brazilian municipalities? If so, it would make sense to propose a new property for that ID.
In general it makes sense to announce the bot project a few days beforehand in this project chat, offer a few examples and see whether somebody objects. If nobody objects you can go ahead. In the case of the Brazilian census I doubt that anybody will object, but that's a project that has little to do with the GeoNames data. ChristianKl (talk) 17:46, 30 September 2016 (UTC)
  • Yes, Brazil assigns municipal codes. For Cambuci, Rio de Janeiro, it is 3300902. They also participate in the Open Geospatial Consortium, as do many other sources of high-quality geographical data. Perhaps Wikidata should too, as a consumer and distributor of the data. Aymatth2 (talk) 16:16, 1 October 2016 (UTC)
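Joining on the official municipal code is what makes such an import safe to automate. A minimal sketch of that reconciliation step, with purely illustrative data structures (the QID string and the function names are placeholders, not a real bot's code; the population figure reuses the hypothetical 12,456 from the example above):

```python
# Hypothetical sketch: reconcile IBGE census rows with existing Wikidata items
# by the official 7-digit IBGE municipal code, so a bot never matches
# "John Doe (city)" to "John Doe (mountain)". All names here are illustrative.

def reconcile(ibge_rows, items_by_code):
    """Split IBGE rows into (matched, unmatched) against items keyed by code."""
    matched, unmatched = [], []
    for row in ibge_rows:
        item = items_by_code.get(row["code"])
        if item is not None:
            matched.append((item, row))   # safe: joined on the official ID
        else:
            unmatched.append(row)         # needs human review before creation
    return matched, unmatched

ibge_rows = [
    {"code": "3300902", "name": "Cambuci", "population_2010": 12456},
    {"code": "9999999", "name": "Unlisted place", "population_2010": 1},
]
items_by_code = {"3300902": "Q-EXAMPLE"}  # placeholder QID, not a real item

matched, unmatched = reconcile(ibge_rows, items_by_code)
```

Rows that fail the join would go to a human review queue rather than being created blindly, which is exactly the step GeoNames-style imports skipped.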

Bot generated data (break)

Like GerardM, I find the approach of ignoring reality strange. I have on svwp followed Lsjbot closely with a focus on quality, and I was responsible for issuing a pause in August, so that the whole community could discuss different issues that had turned up while the first million articles were being generated. There were some minor adjustments we agreed upon in order to support Lsjbot's continuation, like not including the item type "cliff/stone block", which GeoNames had existing both on land and in the sea.

For the 1.3 M articles on species I have done an extensive analysis, where the input from Wikidata was of great help. I found errors in a few hundred of the generated articles, representing around 1-3 per 10,000 articles. In the same analysis I found that for the manually created ones the error frequency was 1-3 per 100 articles (it is easy to get the letters wrong in a Latin name of 30-40 characters). I also found that of the errors reported in Wikidata, about 1/3 were in fact not errors. I consider this bot creation a 100% success and also see it being repeated in a number (6-8 in total) of other language versions. There are challenges, though, where I believe you with your Wikidata knowledge could help out. Taxa change frequently, meaning a number of taxa have changed since the CoL database of 2012. Could this be handled on the Wikidata level, and how could these updates then be transferred to the different language versions?

For GeoNames the quality issues are much more complex, and I would very much appreciate it if you put your energy and competence into discussing these. For example, we know that duplicates are generated, like, as IB mentions above, in Canada, where often one item is created with an English name and one with a French name. On svwp we have said this is no (real) problem, as it gives the reader value anyway, and a merge template is enough. But how should Wikidata take care of these? We have also discussed how to handle the case where a city and a commune (municipality) are more or less the same. On svwp we always treat these as two items, but we know that on other versions they are represented in only one article. What is the Wikidata view on this? We have had a long discussion on the quality of coordinates, where it seems GeoNames often uses a grid, introducing an error in the precision. But here we see this as better than nothing; when more exact coordinates exist in existing articles these are used, and if more exact coordinates exist on other versions or Wikidata, those ought to be used (you know better how these more precise values can replace the rough ones). And there are issues worth discussing for several other item types, like weather data, which is the start of this thread. I hope to see more of you helping us make the data from the bot generation even more valuable, and not only on a few language versions. Yger (talk) 08:19, 2 October 2016 (UTC)

Wikidata has by default one item for every svwp article. Duplicate Wikidata items aren't a huge deal. For Wikidata it's more important that the data in the items is correct. That said, if svwp merges two articles and Wikidata items exist that map the two concepts of the svwp articles, it might make sense to merge the Wikidata items as well. As far as taxa go, could you give an example of a taxon that recently changed its name, and the data source you have for it changing its name? ChristianKl (talk) 22:33, 2 October 2016 (UTC)

Having far too much time on my hands, I checked the species mentioned in the main source for en:Rio Cautário Federal Extractive Reserve#Environment, the Brazilian Ministry of the Environment (MMA). I was interested in what the Wikipedias had, and what Wikispecies had. This is just one example of a collection of species from a location in western Brazil by someone who clearly favors reptiles over birds, so not "typical", but sort of interesting. Findings are shown below:

MMA name | Alternative | .en | .es | .sv | .species | Comments
Amburana acreana | | Y | Y | Y | Y |
Apuleia leiocarpa | | - | Y | Y | Y |
Bertholletia excelsa | | Y | Y | Y | Y |
Cedrela odorata | | Y | Y | Y | Y |
Dinizia excelsa | | - | Y | Y | Y | The .es and .sv entries are not linked
Dipteryx odorata | | Y | Y | Y | Y |
Erisma bicolor | | - | - | Y | - |
Erisma uncinatum | | - | - | Y | Y |
Hymenolobium petraeum | | - | - | Y | Y |
Mezilaurus itauba | Mezilaurus ita-uba | Y | - | Y | Y | In Wikispecies as Mezilaurus ita-uba
Swietenia macrophylla | | Y | Y | Y | Y |
Atractus insipidus | | - | - | Y | - |
Bothrocophias hyoprora | Bothrops hyoprorus | Y | - | Y | - |
Bothrocophias microphthalmus | Bothrops microphthalmus | Y | - | Y | Y | .sv entry as Bothrocophias
Bothrops mattogrossensis | Bothrops matogrossensis | - | Y | - | - |
Callithrix emiliae | Mico emiliae | Y | Y | Y | Y | .sv entry as Callithrix
Callithrix melanura | Mico melanurus | Y | Y | - | Y |
Chironius flavolineatus | | - | - | Y | Y |
Coluber mentovarius | Masticophis mentovarius | - | Y | Y | Y | .sv and Wikispecies as Masticophis
Crotalus durissus | | Y | Y | - | Y | .sv redirects to Crotalus adamanteus
Drymobius rhombifer | | - | - | Y | Y |
Drymoluber brazili | | - | - | Y | Y |
Enyalioides laticeps | | Y | - | Y | Y |
Enyalius leechii | | - | - | Y | - |
Epicrates crassus | | - | Y | - | - |
Epictia diaplocia | Leptotyphlops diaplocius | Y | - | Y | - | .en and .sv have Leptotyphlops
Erythrolamprus mimus | | - | - | Y | - |
Hoplocercus spinosus | | Y | - | Y | - |
Leposoma osvaldoi | | - | - | Y | - |
Micrablepharus maximiliani | | - | - | Y | - |
Micrurus mipartitus | | - | - | Y | - |
Ninia hudsoni | | - | - | Y | - |
Oxyrhopus formosus | | Y | - | Y | - |
Oxyrhopus rhombifer | | - | Y | Y | - |
Oxyrhopus vanidicus | | - | - | - | - | .fr has a stub
Pseudoboa nigra | | - | - | - | Y |
Saguinus fuscicollis | | Y | Y | Y | Y |
Siagonodon septemstriatus | Leptotyphlops septemstriatus | Y | - | Y | Y | Leptotyphlops in .en, .sv; Siagonodon in .species
Siphlophis worontzowi | | - | - | Y | - |
Tupinambis longilineus | | - | Y | Y | - |
Xenodon merremii | Xenodon merremi / Waglerophis merremi | Y | - | Y | - | .en Xenodon and .sv Waglerophis not linked

The taxonomy sometimes changes, but it takes a while before consensus is reached on the new structure. Mico vs Callithrix seems to still be under debate. Every species mentioned by the source has an article in at least one of the wikis, although apart from .sv and .ceb most individual wikis have less than half the species. A central clearing house for new entries and updates, giving data on taxonomy, IUCN status and range, would be a major step forward. Surely that is what Wikidata is for? Aymatth2 (talk) 11:57, 4 October 2016 (UTC)

Bot generated data (break2)

When there are quality sources such as IUCN, it's certainly the role of Wikidata to host that data. We do have IUCN-ID (P627), IUCN protected areas category (P814) and IUCN conservation status (P141). I doubt anybody would oppose a bot that imports that data directly from IUCN. ChristianKl (talk) 10:22, 5 October 2016 (UTC)
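Once imported, such data becomes queryable. A sketch of a query against the public Wikidata SPARQL endpoint (query.wikidata.org) that would list taxa carrying both of the properties cited above; this is an untested illustration built from those property IDs, not output from the live service:

```python
# Hypothetical sketch: a SPARQL query listing taxa that have both an IUCN
# conservation status (P141) and an IUCN-ID (P627). The query text follows
# standard Wikidata SPARQL conventions but has not been run against the
# live endpoint here; it would be POSTed to https://query.wikidata.org/sparql.
QUERY = """
SELECT ?taxon ?taxonLabel ?status ?iucnId WHERE {
  ?taxon wdt:P141 ?status ;
         wdt:P627 ?iucnId .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 10
"""
```

A bot import that always writes both the status and the source ID together is what makes queries like this trustworthy later.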
  • @ChristianKl: You are an optimist. When the IUCN Redlist does not find a species (e.g. Erisma bicolor), it directs the reader to the Species 2000 & ITIS Catalog of Life. The Catalog of Life seems reputable to me, but it is the primary source for Lsjbot, and to quote User:Kaldari (above) "Our species data is already polluted by Lsjbot." I imagine that introducing Catalog of Life common names / taxa via the IUCN back door would be just as contentious.
It would help if we had some well-defined criteria and process for determining which sources will be considered good enough for a bot to import their data to Wikidata. Then a bot developer who has met the criteria and followed the process can safely invest in the effort of developing the bot to load Wikidata. After that, of course, they have to get the Wikipedias to accept information from Wikidata. Aymatth2 (talk) 18:19, 5 October 2016 (UTC)
I understand the phrase "import data from IUCN" to mean importing information on species that do have an IUCN-ID (P627). If you were to cite IUCN as a source for Catalogue of Life data for species that don't have an IUCN-ID (P627), then I would think people would rightfully object. References are very important for Wikidata. Many Wikipedias don't like to import claims without references, and currently most of the GeoNames and Catalogue of Life imported data doesn't have references on Wikidata about the provenance of the data. It would be good to focus on quality of data and not on quantity.
In general, importing data from its original source, with a link to the original source, is optimal. The Catalogue of Life is a merged data set from 143 taxonomic databases.
As far as I understand, the status quo is that people who import massive amounts of data into Wikidata with a bot are expected to ask beforehand and seek consensus. I don't think there is a history of this community being angry with people who announced what they wanted to do and then did what they announced with a bot. ChristianKl (talk) 19:22, 5 October 2016 (UTC)
To be clear, I strongly support importing data from reliable sources, but neither GeoNames nor Catalog of Life qualify as a reliable source. Both include self-published data that is not vetted or reviewed. For example, at the time of Lsjbot's species project, Catalog of Life used a self-published non-peer-reviewed website as the authoritative source for all data on the animal family Salticidae, which includes over 5000 species. They have since corrected this problem and now use a totally different database for this family, but the damage is done and now 3 different Wikipedias and Wikidata have bogus, idiosyncratic data for this family. With GeoNames the problem is even worse. Anyone can add, edit, or delete data from GeoNames with no oversight whatsoever (similar to Wikipedia or OpenStreetMap but without a community to patrol the changes). They also have an extremely low standard for including data in the database and poor accuracy for place classification in some areas. In both the Catalog of Life and GeoNames cases most people aren't noticing these problems because these problems don't occur with popular items. For example, the Catalog of Life data for birds is pretty impeccable, but for obscure arthropods it's hit or miss. The GeoNames data for Sweden is awesome, but for Belize it's a mess. In both of these cases, high quality data does exist; it just takes more work to find, vet, and import. Using these mega-aggregate-databases is lazy and short-sighted. As admirable as Lsj's goals are, compiling data for all the world's places or species just isn't a task that should be undertaken by a single person or bot. It should be done with careful deliberation and only using vetted reliable sources (or at least sources that have some sort of community that is keeping the data updated and clean). Regardless, I'm not a member of the Swedish Wikipedia community and I have no influence there, so GerardM is probably right. We just have to learn to live with this mess. 
In the meantime, I don't support making it worse by importing any data directly from GeoNames or importing anything from Swedish Wikipedia besides article titles and GeoNames IDs. Kaldari (talk) 22:27, 5 October 2016 (UTC)
@Kaldari, Aymatth2, ChristianKl: Question: is the information in CoL or GeoNames traceable? That is, do they cite their sources? If so, when reimporting data from them we could source the claims to CoL and then add the primary source they use to second the claims. We actually already face problems with alignment to databases that have bugs of their own, as with VIAF data, and the solution seems to be cooperation with them: upstreaming the Wikidata corrections and periodically updating Wikidata from their input. If we could achieve that, and source some of the claims that were directly or indirectly imported from them with up-to-date data, then the data they have since deleted but which was imported here could be left without sources, and we could deal with it (delete? deprecate?) here once we're confident a large part of the taxonomy data is sourced from other databases, for example.
Hallelujah! The problem we face is that a lot of the assumptions we have are wrong. When you want to be inclusive about plant names, consider IPNI as a source. What we consider incorrect names are often scientifically valid names. Once we decide to seriously consider collaboration, it does not follow that what CoL or GeoNames holds is incorrect. What follows is that we continue to source data to multiple sources and compare statements. We will seek understanding about differences, and in this way we contribute to our quality. The point is very much about the point of view we take. We are no longer new; we do provide service to other projects. What we do is not about importing data, it is about how we deal with the data we import. For Wikipedias we MUST accept their data, but that does not mean that what they hold is good. We have been curating their data in Wikidata, and this is largely unnoticed.
There is a distinction between valid and valuable. Our data is no better than that of any of the other user-curated projects. Only when we consider how to narrow down where we spend our time improving the data will our work become more valuable. In such a process our data becomes more valid. Thanks, GerardM (talk) 09:51, 6 October 2016 (UTC)
  • The Catalog of Life gives its sources. See Erisma uncinatum for an example. The database is run by subject experts and is worth more than the sum of its sources since the merging and review process turns up problems to be fixed. The Catalog of Life gives a more complete and accurate overview of species than the Wikipedias, although lacking the depth a Wikipedia article may give on a given species. Yes, it has errors; all data sources have errors.
There are databases on everything from extragalactic objects to shipwrecks that provide more complete data than the Wikipedias, and keep adding entries and making corrections. They still have errors, of course; all data sources have errors. But providing data from these sources, saying where the data came from, is better than providing no data. If two sources give different values for the same data element, we can record both versions.
An update mechanism would be needed, so Wikidata would pick up additions and corrections from the data sources. For example, the accepted scientific name for a plant may change, with the former name now listed as a synonym. If the Catalog of Life scientific name value changes, Wikidata should change the value of scientific name that it shows as sourced from the Catalog of Life.
Perhaps the key is to view Wikidata as a repository of fairly current data from more-or-less reliable sources, with the sources identified, not as a repository of 100% true and accurate data. If we demand perfection we will achieve nothing. Aymatth2 (talk) 14:56, 6 October 2016 (UTC)
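The refresh mechanism described above can be sketched as a simple comparison pass. This is a minimal illustration only: the item keys and the snapshot/dump structures are invented placeholders (reusing the Ebenus/Diospyros merger from the discussion purely as sample data), not an existing Wikidata facility.

```python
# Hypothetical sketch of a source-refresh pass: compare statements we sourced
# to the Catalogue of Life against a fresh dump of that source and report
# values the source has since changed. Keys and values are placeholders.

def stale_statements(stored, fresh):
    """Map item -> (old value, new value) where the source now disagrees."""
    return {item: (old, fresh[item])
            for item, old in stored.items()
            if item in fresh and fresh[item] != old}

stored = {"ITEM-A": "Ebenus quiloensis",     # imported from an older edition
          "ITEM-B": "Diospyros quiloensis"}
fresh  = {"ITEM-A": "Diospyros quiloensis",  # source has merged the entries
          "ITEM-B": "Diospyros quiloensis"}

changed = stale_statements(stored, fresh)
```

The output of such a pass would feed a review or bot-update queue, so statements attributed to a source track that source's current view rather than a snapshot.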
The value of taxon name (P225) of an item should never be changed. Create a new one, move sitelinks. --Succu (talk) 15:08, 6 October 2016 (UTC)
  • If the source corrects a spelling error (e.g. Bothrops mattogrossensis should be Bothrops matogrossensis), presumably Wikidata should reflect the correction. Aymatth2 (talk) 15:28, 6 October 2016 (UTC)
There are very rare cases where this could be carefully done. It was not necessary for Bothrops matogrossensis (Q2911754). The misspelling of Mato Grosso (Q42824) could be found in the original description of Bothrops neuwiedi matogrossensis (Q27118116) and was reintroduced in 2008 ([2]) when the subspecies was raised to a species. --Succu (talk) 16:19, 6 October 2016 (UTC)
  • I think we are violently agreeing. The sources are mostly accurate, but when an error is found it should be fixed. I expect the correction will often come from the source, since they are constantly working on their data. If we find an error in a source like the Catalog of Life we should report it back to them, cooperating in improving data quality. Aymatth2 (talk) 17:30, 6 October 2016 (UTC)
I don't think so. Data aggregators like CoL, GBIF or EOL are bad starting points for enriching Wikidata with reliable data. As far as I'm aware, Lsj failed to correct his early (2012?) CoL import. So we (as Wikidata) had to deal with this. I doubt your analysis „I found errors in a few hundred of the articles generated, representing around 1-3 per 10,000 articles. In the same analysis I found that of the manually created ones the error frequency was 1-3 per 100 articles (it is easy to get the letters wrong in a Latin name of 30-40 characters).“ is well grounded. Expanding the scope of Wikidata (= taxa not treated by any Wikipedia) should be done carefully. The mapping of Flora of North America taxon ID (P1727) is nearly complete; Flora of China ID (P1747) lacks this completeness due to a high number of spelling errors. --Succu (talk) 21:35, 6 October 2016 (UTC)
@GerardM: „consider IPNI as a source“ - International Plant Names Index (Q922063) is far away from being a reliable source. --Succu (talk) 21:35, 6 October 2016 (UTC)
@Succu: „consider IPNI as a source“ - International Plant Names Index (Q922063) is inclusive of all the literature on plant species. Its origins are impeccable, and it is superior at registering all the permutations over time. Trust me, I analysed their data for all the succulents. I ended up with a 60 MB database where I normalised their data to bring all the errors ever produced in the literature down to a more manageable series of imho correct entries. When you talk about IPNI and its errors, you obviously do not know what IPNI is about. Thanks, GerardM (talk) 05:53, 7 October 2016 (UTC)
For „all the succulents“?! - No, I don't trust this statement of yours. I know pretty well what the goals of IPNI are and what they have reached until now. --Succu (talk) 08:16, 7 October 2016 (UTC)
You doubt my word. Why? I am to trust your judgment on succulents based on what? What we do does not conform with nomenclature. Too much is missing. The author, the publication and the publication date are essential parts of a valid name. We do not hold that information, so all the data is deficient in principle. Thanks, GerardM (talk) 17:53, 7 October 2016 (UTC)
The trait „succulence“ is not well defined, so you have to say what kind of definition your dataset is based on. That's all. Yes, too much is missing, but that has nothing to do with nomenclature. It's a titanic workload we have to do, and every help adding taxon authors and publications is welcome. --Succu (talk) 18:23, 7 October 2016 (UTC)
Sorry, why bother. My approach fitted my needs. The current approach to nomenclature is wrong at best, and you think what others have done is of no consequence because you do not understand it. For me most of the arguments used are problematic; what galls me most is the notion that you know best and try to enforce what is not correct in the first place. Thanks, GerardM (talk) 19:22, 7 October 2016 (UTC)
Whatever your „need“ was... I see nothing on your side where Wikidata helps to close those titanic gaps or makes IPNI a better resource. Maybe you blogged about it? --Succu (talk) 19:32, 7 October 2016 (UTC)
I blogged about it in May 2007. You are not really interested beyond your own scope, and that is fine. It is why I did not bother with taxonomy. There is enough to do anyway. My point, the one that you acknowledge, is that our taxonomy data is flawed. IPNI has quality data; it is a reliable source. My problem with Wikidata is that people like you do not appreciate what Wikidata is about, what it can do and why it is so relevant. Thanks, GerardM (talk) 05:31, 8 October 2016 (UTC)
Is IPNI fit enough to be a source for the scientific names described in Descriptions of three hundred new species of South American plants, with an index to previously published South American species by the same author (Q21775025)? --Succu (talk) 05:51, 8 October 2016 (UTC)
@User:Yger: Sorry, missed you. --Succu (talk) 21:55, 6 October 2016 (UTC)
Wikidata can give excellent value at little effort if it distributes information from databases maintained by specialists. We should attribute the data to the sources and refresh the data from the sources to ensure we reflect their current view. We do not have the resources to do it ourselves. User:Yger analyzed the articles on species, found errors in 1-3 per 10,000 bot-generated articles, and errors in 1-3 per 100 manually created articles. If we try to compete with the specialist databases we will fail. Aymatth2 (talk) 23:18, 6 October 2016 (UTC)
Flora of North America (Q1429295) is a series of books accompanied by a website. And what you propose is already done. --Succu (talk) 08:16, 7 October 2016 (UTC)
Aymatth2: ITIS is a lame duck and not an up-to-date resource. ITIS is not a „specialist database”. FishBase (Q837101) or Avibase (Q20749148) are far better. --Succu (talk) 20:46, 7 October 2016 (UTC)
The analysis as run by Yger seems meaningless (to put it kindly). Any analysis depends on input data, and in this case there are no reliable input data. It is also meaningless to use the number of errors recorded in Wikidata as a starting point: finding errors is a difficult and thankless business, and therefore it is mostly not done. At a rough guess, only 1% of the errors in svwiki have been marked as such in Wikidata.
        As Kaldari says, CoL is very variable in the quality of its data at any one time, and this will vary with time (what was hopeless last year is better this year, etc.).
        There are areas in svwiki (based on CoL) where the error rate is something like 50% (CoL does give its sources, and almost invariably these sources make it clear that what ended up in CoL is wrong). Hopefully this 50% is a maximum, found only in limited areas, but there is nobody who can really know how much error there is in svwiki. All I can tell is that the error rate is off the scale.
        And by errors I do not mean taxa that have changed their name (in such cases both names are found in the literature, and are good data), but I mean 'taxa' that do not exist, never have existed, and never will exist. - Brya (talk) 10:56, 7 October 2016 (UTC)
@Yger: You say I consider this Botcreation a 100% success. An example I'm running into today. In Four new species of Hypolytrum Rich. (Cyperaceae) from Costa Rica and Brazil (Q27137125) four new species are described: no label (Q27136913): Hypolytrum espiritosantense (Q15587199), Hypolytrum glomerulatum (Q15588104), Hypolytrum lucennoi (Q15588880) and no label (Q27136913). Lsjbot (Q17430942) failed to create the latter one. Do you include such omissions in your analysis? --Succu (talk) 20:25, 7 October 2016 (UTC)
@Brya: These are serious allegations against organizations that receive significant public funding. Can you point us to examples of errors in items on the Catalogue of Life database? Aymatth2 (talk) 22:47, 7 October 2016 (UTC)
These are not allegations but observations. They are not new either. For heavy rates of error check the Ebenaceae or Apiaceae in svwiki against the current CoL (CoL has realized its error). For an example that illustrates that Yger is personally putting back complete nonsense see here. - Brya (talk) 04:15, 8 October 2016 (UTC)

Bot generated data (break3)

  • @Brya: I am not particularly interested in what has happened in the past, except in what we can learn from it. There were teething problems with the Catalogue of Life, and we have no automated process to refresh our data as they make corrections and additions.
Your example is useful. Maba quiloënsis was described by Hiern in 1873, named Ebenus quiloënsis by Kuntze in his 1891 Revisio generum plantarum vascularium... and named Diospyros quiloënsis by White in 1956. The last is now the accepted name. The Museum of Natural History (Q688704) "Virtual Herbaria" had entry 260951 for Ebenus quiloënsis, and entry 69345 for Diospyros quiloënsis aka Maba quiloënsis. They have since merged the entries so they are identical, giving all three names, but the Catalogue of Life has not yet picked up the merger.
It would be correct for us to record that the Catalogue of Life shows Ebenus quiloënsis and Diospyros quiloënsis as separate species, while the UofV "Virtual Herbaria" shows them as synonyms. This is not "complete nonsense". Then, when the Catalogue of Life makes the correction, we should refresh our data to show their current view. Have you notified the Catalogue of Life of this problem, which may be an oddity or may be systemic? Do you know of other problems with items on the current Catalogue of Life database? Aymatth2 (talk) 13:06, 8 October 2016 (UTC)
The CoL has been going for quite a while, some fifteen years, with a new version every year. It had and has more than teething problems. Quite a few are structural.
        Maba quiloensis, Ebenus quiloensis, Diospyros quiloensis are three different names (three different formal nomenclatural entities), so there can be / should be three items in Wikidata. These names are homotypic, so they can not refer to different species, by definition. They refer to the same species (not necessarily the same circumscription) and in any particular taxonomic viewpoint only one of these names can be used at a time. If one believes in a genus Maba (which nobody has for quite a while) then Maba quiloensis is (likely) the correct name for a species. If one believes in an all-encompassing genus Diospyros (which has been the consensus for quite a while) then Diospyros quiloensis is (likely) the correct name for a species. By definition, Ebenus quiloensis is never the correct name of a species (never has been). If svwiki is aiming to be an encyclopedia, then it should have at most one entry for the species. In fact, dewiki would not allow an entry such as held by svwiki, as it has no meaningful content. But svwiki does hold two entries both claiming that the name is the correct name of a species: a miraculous duplication of species. Or to put it differently, a bold faced lie.
        I see you failed to run even a basic comparison between svwiki and CoL (you are just defending svwiki's wrongdoings?). The CoL has had Diospyros quiloensis as the accepted name for something like a year and a half now. Recording that CoL has held different contents earlier would only be useful in a database that collected metadata on errors in databases. Brya (talk) 14:26, 8 October 2016 (UTC)
  • @Brya: I missed the fact that the Swedish wiki is citing the historical 2014 version of the Catalogue of Life, since corrected. This is a useful example because it shows the danger of importing from a source but not updating. If Wikidata had imported Ebenus quiloënsis from the Museum of Natural History (Q688704) "Virtual Herbaria" in 2014, we would have got the same information as the 2014 Catalogue of Life entry. If we had not refreshed that data, we would still be reflecting the error in the 2014 "Virtual Herbaria", as the Swedish Wikipedia does. If we refreshed from the latest "Virtual Herbaria" or the latest Catalog of Life we would automatically get the correction. Again, can you point us to problems with items in the current Catalogue of Life database? Aymatth2 (talk) 16:06, 8 October 2016 (UTC)
You seem to be increasingly separated from reality? The Swedish Wikipedia / svwiki has not been refreshed but is still showing all the errors it has imported. As I remember the Vienna database, it did not have these errors, but these were generated by CoL. - Brya (talk) 16:52, 8 October 2016 (UTC)
  • @Brya: I am not trying to defend the Swedish Wikipedia, which has not been refreshed but is still showing all the errors it has imported. Can you point us to problems with items in the current Catalogue of Life database? Aymatth2 (talk) 23:47, 8 October 2016 (UTC)
OK, the Swedish Wikipedia has not been refreshed and is still showing all the errors it has imported. A great deal of these errors are also in Wikidata, as eliminating them is very difficult.
        I don't closely follow CoL, and would be quite happy if it had never been published, but the errors are inescapable. The only error that is easily pointed at is the BIG ERROR, whereby the names of cattle, sheep, the goat, etc are wrong (disallowed by the ICZN: CoL has them wrong because ITIS has them wrong, ITIS has them wrong because MSW has them wrong, and MSW has them wrong because they were rushed by an oncoming deadline and they panicked). But an indicator of the degree of error can be found in the amount of homonyms: these can be likened to names that jump up and down shouting something wrong here, please take action. Any time I look (which is not often) I seem to see such homonyms. Of course homonyms are not the only errors, but they are easily visible: the tip of the iceberg. - Brya (talk) 06:45, 9 October 2016 (UTC)
  • @Brya: Can you give a specific example, as in "the current CoL entry for Hypolytrum aymatthii is wrong because ..." ? Aymatth2 (talk) 12:04, 9 October 2016 (UTC)
Like I said cattle (the CoL-entries "Bos taurus indicus Linnaeus, 1758", "Bos taurus primigenius Bojanus, 1827", "Bos taurus taurus Linnaeus, 1758" are wrong), sheep (the CoL-entries "Ovis aries Linnaeus, 1758", "Ovis aries aries Linnaeus, 1758", "Ovis aries orientalis Gmelin, 1774" are wrong), the goat (the CoL-entries "Capra hircus Linnaeus, 1758", "Capra hircus aegagrus Erxleben, 1777" are wrong), etc. In this case because the ICZN has ruled against them (see amongst others here). - Brya (talk) 12:35, 9 October 2016 (UTC)
It would be of more interest if they changed their mind in no label (Q21682705). --Succu (talk) 14:09, 9 October 2016 (UTC)
You are twisting facts. Certainly "the traditional approach of treating the wild goat as a sub-species [of the domesticated goat]" is a wild reversal of fact. MSW in its earlier editions deviated from a very well-established tradition among zoologists treating the domesticated animals as part of their wild predecessors, so several zoologists put in a formal case at the ICZN to put a stop to it. After allowing and evaluating input from zoologists across the world the ICZN decided to follow tradition and made this tradition mandatory for the animals enumerated in the case.
        It did not rule "that wild relatives of domestic animals should be named as if they were separate species," and it never would since whether or not a group of animals represents a taxon, and if so, if this taxon should be given the rank of species or subspecies is a matter of taxonomy, not of nomenclature. It is perfectly all right to recognise the wild and domesticated goat as subspecies, but the ruling is that these then must be named Capra aegagrus aegagrus and Capra aegagrus hircus. No way that Capra hircus aegagrus can be the correct scientific name of an animal. Not in this universe.
        Some of the authors of the book allowed themselves to get panicked by the oncoming deadline into perpetuating their defeated rebellion. It may be possible to feel sympathy for them, but that does not make them less wrong. The fact that there is a book that has these names wrong means very little. If somebody publishes a book that the earth is flat, or that 2 + 3 = 17, this does not make the earth flat, or makes 2 + 3 = 17. - Brya (talk) 14:52, 9 October 2016 (UTC)
  • @Brya: These controversies are very exciting. Do you have any other specific examples? Aymatth2 (talk) 15:13, 9 October 2016 (UTC)
There is a presumably large but indefinite number of cases. You have made it pretty clear that it is pointless to list any of them. - Brya (talk) 15:20, 9 October 2016 (UTC)
  • @Succu: @Kaldari: perhaps you could contribute examples. It is important to understand the issues we face with authorizing the import of data. It would not have occurred to me that the Smithsonian's Mammal Species of the World was a controversial or unreliable source. Are there specific examples of other types of problem with other Catalogue of Life sources? Aymatth2 (talk) 16:13, 9 October 2016 (UTC)
I do not use CoL. The point is to be careful when creating new items about taxa. And if possible double check them with a second reliable source. At least this is what I try to do. --Succu (talk) 16:19, 9 October 2016 (UTC)
@Aymatth2: All of those examples are relatively pointless as they are just showing that the CoL has outdated information and information from reliable sources that don't agree with other sources. The problems with the CoL are more substantial than that. Here is a better example. The CoL previously included the species name Modunda narmadaensis, which was imported to Swedish Wikipedia, and subsequently to Wikidata. The name Modunda narmadaensis originates with a self-published website that cites itself as the source of the name (with no other explanation). The name has never been accepted by any peer-reviewed source or the authoritative catalog for the family. It is purely the speculative opinion of one person on the internet who couldn't be bothered to write a paper about it (or lacked the evidence to do so). Same with Modunda pashanensis and numerous other examples. The Catalog of Life is only as good as the feeder databases that it pulls from, and in some cases it has pulled from very low-quality databases that are not reliable. Kaldari (talk) 20:53, 9 October 2016 (UTC)
To the CoL's credit, they have since deleted Modunda narmadaensis entirely, but it still exists on three different Wikipedias. Kaldari (talk) 20:59, 9 October 2016 (UTC)
It's Bianor narmadaensis (Q3150207). --Succu (talk) 21:37, 9 October 2016 (UTC)
  • @Kaldari: I was hoping for examples of problems with the current Catalogue of Life. Past errors are relevant, but errors in the present stable version would be more relevant. So far all that has been identified is the Smithsonian / ICZN difference on domestic animal names, which may be just a problem with publication dates – although that is a type of problem that must be recognized. Given the level of EU / US government funding, the contributors, curators and consumers of the data, one would expect very high quality – certainly higher than most specialist databases on other types of information from which we might want to import data. There is a trade-off between taking data from an aggregator like the Catalogue of Life, perhaps being selective about originators, and going direct to the originators. The aggregator provides a convenient single interface to a bot, with a single agreement for content reuse, and may add value by vetting the originators. On the other hand, they may somehow introduce errors. Do we have specific, current examples of errors in the Catalog of Life that might illustrate other types of problem? Aymatth2 (talk) 01:12, 10 October 2016 (UTC)
@Aymatth2: I don't have any examples of errors in the current Catalog of Life. In fact, I might be OK with importing data from CoL, if two conditions were met:
  1. The bot that imports the data also updates it once per year (or the maintainer provides source code for doing so)
  2. The updates support not only adding data, but also flagging items that may need to be merged or deleted (which should be done with human review)
FWIW, I personally support some of the more conservative taxonomy in the CoL (via Mammal Species of the World) but I know there are widely differing opinions on that. Kaldari (talk) 02:04, 10 October 2016 (UTC)
The problem is not with the taxonomy of Mammal Species of the World; I have no opinion whatsoever on their taxonomy (a matter of science), but with the fact that in some cases they use names that have explicitly been disallowed by the ICZN (the issue is not taxonomical, but nomenclatural, a matter of 'law'). - Brya (talk) 04:23, 10 October 2016 (UTC)
To illustrate, take a comparable case. There is a Google®; suppose there is a small company selling phones that decides to call themselves Google also, arguing that there can be no confusion since they are selling phones, not web-services. Google® takes them to court, and the judge, after hearing the case, rules that the small company may not use this name. CoL is like a phone directory which continues to use the proscribed name. (The difference is, of course, that Google® is a megabuck company that can enforce such rulings, while the ICZN has no direct means to enforce anything) - Brya (talk) 05:49, 10 October 2016 (UTC)
  1. An annual (or more frequent) update is essential for many bots that import data to Wikidata, whether the data is on taxa, galaxies, municipalities or shipwrecks. Even data that should never change will change as the sources make corrections.
  2. Part of the update process would be to flag items for merge or possibly deletion, although I would be inclined to keep dud entries flagged as obsolete rather than delete them altogether.
  3. After updating Wikidata, perhaps from several sources (e.g. Catalog of Life and IUCN redlist), there should be an extract of the data and then updates to the Wikipedias. en:Wikipedia:Village pump (idea lab)#Bot content with updates explores how we could let a Wikipedia article transclude text generated from Wikidata, picking up updates automatically, while also containing content written by editors.
  4. Our role should not be to decide on the "correct" data, but to record what reputable sources have said. Where there is dispute, we should record both versions. Thus we should be able to say that according to Lloyd's the Santa Isabella sank on March 5, while according to the Admiralty she stayed afloat until March 7. A Wikipedia article can report what reliable independent sources say about the difference.
I think something along these lines, and other concepts, need to be formalized as a bot-generated data policy, so we can ensure that bots follow good practice and give bot developers assurance that if they follow the policy their bot will be accepted. Aymatth2 (talk) 12:08, 10 October 2016 (UTC)
Before that we should define rules for users harvesting data from Wikipedias. It remains unclear what "update process" means: all Create, read, update and delete (Q60500) operations? Why should we rely on CoL? We match scientific names against GBIF and EOL, so why bother with CoL? Are you aware of the gender problem? I have been updating IUCN conservation status (P141) for a while, but I would never create a taxon name (P225) based on data provided by IUCN. If I did not make substantial errors, we have a complete mapping of MSW ID (P959) (=Mammal Species of the World (Q1538807)), or more recently English common names (=taxon common name (P1843)) preferred by IOC World Bird List, Version 6.3 (Q27042747). From time to time I try to close major gaps in our species data. E.g. we had lots of genera of Foraminifera (Q107027) from eswiki, but not a lot of species. With the help of Fossilworks (Q796451) and World Register of Marine Species (Q604063) I changed this, but should we inform the Wikipedias that they missed them (adding redlinks to their genus articles)? We should build a knowledge base of our own, not copy data from aggregators. Options for taxa are Data paper (Q17009938) (examples) or exploiting papers in the TaxPub format. By the way: this would give us more of the references we lack, as would the use of ZooBank's nomenclatural acts (=ZooBank nomenclatural act (P1746)). --Succu (talk) 19:49, 10 October 2016 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── @Succu: I will try to respond to some of your points:

  1. I assume there are already guidelines for importing data to Wikidata from Wikipedias. In the reverse direction, the Wikipedias will develop their own guidelines for importing data from Wikidata. They may decide to generate and later update stub articles that show what Wikidata says, which their editors can then supplement with descriptive text drawing on other sources.
  2. We should not rely on the Catalog of Life, or any other source. But it is a large, reputable database, and it will be useful to record the data that it gives – along with data from other sources. We should clearly identify that this is the Catalog of Life view, by using properties like "CoL synonym", "CoL distribution" or "CoL source". These values may or may not be the same as the equivalent values from other sources. If they are different, that is interesting.
  3. By "update process" I meant a process to synchronize Wikidata with a data source, so that Wikidata accurately reflects what the source now says. That could involve refreshing the values of properties like "CoL distribution", or nulling those values. A Wikidata item where all the properties have been nulled may perhaps be flagged for manual attention as "no longer used".
  4. If the IUCN gave a taxon name that was not found in other sources we could create an entry for it, showing that the name is used only by the IUCN. If possible the entry would point to a "correct" form. That would be useful for users who find the taxon name in IUCN and look it up in Wikidata. Assuming the IUCN later changed the taxon name, the update process would null the IUCN properties, but might leave the entry with its pointer to the "correct" form as a convenience to our readers.
  5. Almost all reference databases or books are aggregations of entries created by many individuals over a period of time. A Catalog of Life entry refers to an entry in the World Porifera database that refers to a 1932 Report on the shallow-water marine sponges in the collections of the Indian Museum, which draws on a description published in an 1885 scientific journal. At great effort we could go back to the 1885 publication, but would we be confident that it was up to date? The aggregator adds value by selecting and vetting sources. They will make mistakes, but should correct them when they are found. The 1885 journal will not correct its mistakes.

Does the above make any sense? The basic concept is that our role should not be to decide on the "correct" data, or to build a knowledge base of our own, but to record what reputable sources have said. We must accept that the scientific community will place more importance on correcting errors in the Catalogue of Life's feeder databases than on correcting errors in Wikidata, and establish a mechanism so we automatically pick up those corrections. Aymatth2 (talk) 13:31, 11 October 2016 (UTC)
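The refresh concept sketched in points 3–5 above could look roughly like the following. This is a hypothetical illustration only (the property names like "CoL distribution" and the record layout are invented, and no such bot exists); statements scoped to one source are overwritten with the source's current values, and entries the source has dropped are nulled and flagged for manual review rather than deleted:

```python
# Hypothetical sketch of the source-synchronization process described above.
# Property names prefixed with the source name (e.g. "CoL distribution") are
# treated as owned by that source: refreshed on each run, nulled if the
# source drops the entry. Names and data shapes are illustrative only.

def refresh(items, source_records, source_prefix="CoL"):
    """Synchronize source-scoped properties with a fresh source extract.

    items          -- dict: item id -> dict of property name -> value
    source_records -- dict: item id -> dict of property name -> value
                      (the latest export from the source)
    Returns a list of item ids flagged for manual review.
    """
    flagged = []
    for item_id, props in items.items():
        record = source_records.get(item_id)
        source_props = [p for p in props if p.startswith(source_prefix)]
        if record is None:
            # The source no longer lists this entry: null its properties
            # and flag the item as "no longer used" for human attention.
            for p in source_props:
                props[p] = None
            flagged.append(item_id)
        else:
            # Overwrite with the source's current values, so corrections
            # made upstream are picked up automatically.
            for p, v in record.items():
                props[p] = v
    return flagged
```

Run once a year (or more often), this would keep the "CoL view" of each item in step with the live database while leaving properties from other sources untouched.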

Catalogue of Life focuses on being inclusive and might include some data we don't want. The views of an 1885 journal are notable in a way that views from a UGC website aren't. You also claim that Catalogue of Life is reputable without linking to any expert in the field making such a statement and speaking about its data quality. ChristianKl (talk) 13:26, 14 October 2016 (UTC)
We disagree. I wrote „we should build a knowledge base of our own“ and showed some points for how this could be achieved. There is no need for another CoL clone. -Succu (talk) 16:03, 11 October 2016 (UTC)
  • Importing and maintaining views of data from the Catalogue of Life, IUCN, BirdLife International, etc. does not prevent us from independently building and maintaining a knowledge base where we have the resources. Importing data adds layers of information and differing viewpoints to the knowledge base. It is not a competition. We can do both. Aymatth2 (talk) 23:10, 13 October 2016 (UTC)
I don't think anybody here spoke against importing IUCN data. I don't see why you still treat it as being in the same category as the Catalogue of Life. It makes me feel like you aren't trying to understand the views of other people but are just trying to convince people to accept Catalogue of Life or GeoNames data.
Apart from data quality issues, the legality of importing the Catalogue of Life is also questionable. It's an EU database project (which means it has sui generis database rights in Europe) and it says on its own website that it requires not only attribution but also noncommercial usage. ChristianKl (talk) 13:26, 14 October 2016 (UTC)
About what „layers of information” are you talking, Aymatth2? What kind of knowledge provided by CoL helps us to be more trustable? --Succu (talk) 21:50, 14 October 2016 (UTC)

Bot generated data (break4) @Lsj[edit]

Mind to comment User:Lsj? --Succu (talk) 21:37, 9 October 2016 (UTC)

Bot generated data (break5)[edit]

The deletion log at svwiki today shows a set of deleted bot-generated articles (university articles describing only the geography around a building). The opposition against the GeoNames-based articles on svwiki is increasing. -- Innocent bystander (talk) 17:26, 15 October 2016 (UTC)

Sounds like good news to me. --Succu (talk) 22:36, 15 October 2016 (UTC)

Unsourced sexual orientation (P91) statements[edit]

sexual orientation (P91) is a property which requires sensitive use, and its talk page particularly states that it should be used “only together with a reference in that the person itself states her/his sexual orientation or for historical persons if most historians agree”. Using this SPARQL query, I can find 4790 violations of this rule (5560 property uses in total, thus 86% violation rate). The issue came to my attention because two days ago a Wikidata user added almost 2000 new violations by a data import from enwiki and eswiki (see this complex violations report diff).

What to do now? It appears unlikely that someone adds the required sources to the statements in question, although parts of the imported data are probably properly sourced at enwiki or eswiki. I would therefore propose to remove all unsourced sexual orientation (P91) statements in the very near future, given the fact that we use this property almost exclusively in case of non-heterosexuality (Q339014) (usage statistics). —MisterSynergy (talk) 15:55, 27 September 2016 (UTC)

This was discussed recently. Such values should not be removed, without first making efforts to source them - not least by checking the originating Wikipedia. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 16:02, 27 September 2016 (UTC)
Could you please provide a link to the recent discussion? This does not seem fair to me. Someone made a mass import, violated a clearly stated rule and now other Wikidata users should do the difficult part of the job, which is a manual verification and addition of sources? The information of the unsourced statements would not be lost after a possible removal, since it is mostly still available in Wikipedias. —MisterSynergy (talk) 16:11, 27 September 2016 (UTC)
+1 It is not the task of other contributors to correct or complete data from previous contributions, especially when the initial import did not respect a constraint. The only way to teach people respect for the rules is to delete their work when it does not comply with the rules. Snipre (talk) 16:44, 27 September 2016 (UTC)
I think more people said this and just a few had the same opinion as Pigsonthewing. Sjoerd de Bruin (talk) 17:23, 27 September 2016 (UTC)
Wikidata:Project chat/Archive/2016/08#Unsourced and Wikipedia sourced P91 statements. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 19:19, 27 September 2016 (UTC)
Thanks for the link. The positions in this discussion were pretty much the same as here, with roughly the same number of users on both sides. If we now went with the "manually check before remove" scenario, how could that ever work out? I don’t see a chance that anybody solves the problem this way, but I am open to hear suggestions by anyone… —MisterSynergy (talk) 19:34, 27 September 2016 (UTC)
User:Thryduulf made a very useful suggestion in the linked discussion about ranking statements without sources. In the case under discussion, the edits would fall under group 1, and should be deleted within a short space of time. Not sure what the definition of a short space of time would be (a couple of days?), but if people want to keep unsourced, potentially libellous and defamatory, controversial data, then provide a reliable source. Robevans123 (talk) 20:44, 27 September 2016 (UTC)
Restating that suggestion with slight refinement, as I think it warrants more discussion than it got. The idea is that the combination of property and class of item (e.g. living people / long-deceased people / inanimate objects / etc) will provide a default level, which can be tweaked based on the claim in the context of the individual item. For example a claim about the religion of a Catholic cardinal is not controversial and doesn't need immediate sourcing (unless it's something other than some flavour of Catholicism), while a claim about the religion of a US presidential candidate very much does require strong sourcing. Because of the way the Wikidata UI works, immediate sourcing is not a reasonable requirement of a manual human editor. I make no comment about editors using bots, etc., as I have no experience. The levels I propose are:
Level 0 - Source is provided
The statement has one or more reliable sources associated with it, no further action is required.
Level 1 - Source always required
The statement will be deleted if a source is not provided within a short time (24 hours?) of the statement being added. This should only be used for a very few properties by default, and almost never for deceased people.
Level 2 - Source almost always required
The statement will normally be deleted if a source is not provided within a reasonably short time (2-3 days?) but exceptions are possible based on common sense, especially for deceased people. This should not be used by default on many properties.
Level 3 - Source should be provided
Statements should be accompanied by a source but they will not be routinely deleted without consideration of the circumstances. Where a source is given as "imported from [a] Wikipedia", the given Wikipedia should be checked before deletion. This should be the default for many, perhaps most, properties that are not external ids or other links to external sources. Where possible, editors should be alerted and given reasonable time to find sources before deletion.
Level 4 - Low priority
Statements should be accompanied by a source but they will not normally be deleted unless verification has failed or the statement is implausible. Only properties that will rarely be controversial should be at this level by default, but it is appropriate for statements such as a person's religion or nationality if their membership of a group open only to people of that religion or nationality is not in doubt (e.g. the religion of a Roman Catholic cardinal is not controversial, nor the nationality of a 20th-century US president).
Level 5 - self-sourcing
No independent source is required. Most (all?) external identifiers should be at this level by default, but it also applies to other self-sourcing properties, e.g. the title of creative works hosted on Wikisource on the items about the work, or "instance of" statements on items about Wikimedia disambiguation pages.
This proposal does require a moderate amount of initial work to assign the levels, and some ongoing work to maintain them (but the latter should not be a big task). I have no idea how to mark them though (would it require developer time?). My feeling is that setting or changing the level of a property should be restricted to established users, at a level similar to rollbacker: granted by administrators on request to anyone who they feel is unlikely to abuse it. Administrators would also have the ability to revoke the right. Maybe the reviewer right could be used for this. Thryduulf (talk) 21:37, 27 September 2016 (UTC)
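The level scheme proposed above amounts to a small decision table. A sketch of how it might work in practice follows; the per-property level assignments and grace periods here are invented for illustration, not an implemented Wikidata feature:

```python
# Sketch of the proposed source-requirement levels as a decision rule.
# Level assignments and grace periods below are illustrative guesses only.
from datetime import timedelta

LEVELS = {
    "P91":  1,  # sexual orientation: source always required
    "P140": 2,  # religion: source almost always required
    "P569": 3,  # date of birth: source should be provided
    "P27":  4,  # country of citizenship: low priority, rarely controversial
    "P214": 5,  # VIAF id: self-sourcing external identifier
}

GRACE = {1: timedelta(days=1), 2: timedelta(days=3)}

def action(prop, has_source, age):
    """Decide what to do with a statement of the given age (a timedelta)."""
    level = LEVELS.get(prop, 3)  # treat unknown properties as level 3
    if has_source or level == 5:
        return "keep"
    if level in GRACE:
        return "delete" if age > GRACE[level] else "wait"
    if level == 3:
        return "check origin wiki, then review"
    return "keep unless verification fails"  # level 4
```

For example, an unsourced P91 statement older than a day would come back as "delete", while an unsourced VIAF id would always be "keep".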
Basically a Source [is] always required. --Succu (talk) 21:56, 27 September 2016 (UTC)
@Succu: No. A source is required for levels 1-4, but with decreasing urgency and increasing requirements to look for sources before deleting unsourced statements. Sources are not required for level 5. Also, sources can be inferred in some cases - e.g. if there is a source statement that Joe Smith is a member of an organisation that only admits bisexuals then it is not at all urgent to provide a source for the statement that Joe Smith is a bisexual. Thryduulf (talk) 00:55, 28 September 2016 (UTC)
It's an unnecessary bureaucratic approach which has the potential to raise conflicts. --Succu (talk) 21:22, 28 September 2016 (UTC)
@Thryduulf: I like the idea of such an approach, but this proposal seems to be too complicated. 2 levels of properties are basically enough: (1) requires sources, and statements must be removed if source is not provided after a short time, and (2) requires sources, but we "permanently" accept data without (I ignore authority control properties here). P91 and very few other things such as P140 would be case (1), the vast majority of properties case (2), regardless of which items are affected. If particular statements require sources for other reasons, we should have another rule.
To provide a path for a solution: we could perhaps use the constraints on the property talk pages to define actions in case of lacking sources. Example: P91 has a complex constraint template, which if mandatory could already be enough to enforce data removal after a waiting time to be defined. We just need to write down such a rule at an appropriate page. —MisterSynergy (talk) 05:04, 28 September 2016 (UTC)
@MisterSynergy: while simplification is obviously superficially attractive, reality is much more complicated than the binary (or trinary) you propose, and it won't scale when you start to think about properties other than simply sexual orientation (P91). Data removal should never happen without a check by a human to determine whether the claim is actually unsourced (e.g. a source hasn't been added since the query results were last updated) and, if it is, whether that can be trivially fixed. For example it took about 2 minutes to source the P91 statement for Mhairi Black (Q19863151), and less than half of that was finding a source. Thryduulf (talk) 10:27, 28 September 2016 (UTC)
We are talking about 5000 unsourced statements here, which amounts to more than 200 man-hours of work if we added sources to all of them at your rate (equivalent to 5 weeks of full-time employment!). Let's face reality: nobody will ever invest this amount of time. We either remove the data, or the constraint of required sources is useless. Whoever wants to have this data at Wikidata can re-add the statement again within a couple of seconds. I really don't like the idea of data removal, but given the fact that P91 is indeed somewhat delicate, I don't see another option. Btw. I don't think our positions are that far apart: I did not talk about your #0 and #5, our #1 has the same intention, and I just did not differentiate your #2–#4. —MisterSynergy (talk) 10:54, 28 September 2016 (UTC)
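The arithmetic behind that estimate checks out, assuming roughly 2.5 minutes per statement as in the Mhairi Black example above:

```python
# Back-of-envelope check of the "200 man-hours / 5 weeks" estimate:
# ~5000 unsourced P91 statements at ~2.5 minutes each.
statements = 5000
minutes_each = 2.5
hours = statements * minutes_each / 60   # about 208 hours
weeks = hours / 40                       # about 5.2 full-time weeks
```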
In this case the problem is that it can be potentially defamatory; maybe this was already said, but I think that we must delete all this data without a source. --ValterVB (talk) 16:46, 27 September 2016 (UTC)
Project:Be bold (Q3916099) MisterSynergy: Remove them. --Succu (talk) 21:41, 27 September 2016 (UTC)
  • Yes, please remove them. Either we require the claims to be sourced immediately or we don't (as we generally do). While the description and the creation of the property is confusing, it seems clear that we require sourcing here and that the contributor has no intention of adding them. -- Jura 02:51, 28 September 2016 (UTC)
But if you do remove them, please, go on removing all unsourced dates of birth, places of residence etc. I am strongly against the special treatment of P91. – Máté (talk) 04:33, 28 September 2016 (UTC)
Sure, any other problematic statements should be removed as well. It's somewhat rare for DOB, but still.
--- Jura 06:06, 28 September 2016 (UTC)
Date of birth is an interesting case indeed, which is publicly debated at the moment due to a new Californian law according to which websites (particularly IMDb) must remove this data upon request of an actor. They found that there is some age discrimination of actors in Hollywood, and one easily comes to the conclusion that in most other businesses of this world the situation is probably similar. However, I still support treating P91 (and a few others) differently. Fact is that basically all sexual orientations other than heterosexuality (and this is what we collect here) lead to massive discrimination and legal/social threats in most parts of this world, way beyond anything which results from public date of birth data, which can typically be inferred from other public information to some extent anyway. We should therefore only accept and permanently keep this P91 information if the person themselves has had a public coming-out, proven by the reference. —MisterSynergy (talk) 06:40, 28 September 2016 (UTC)
I'm against removing unsourced dates of birth and residence statements. Dates of birth and residence statements are often quite important to distinguish different people with the same name.
Furthermore, I don't think "be bold" should apply to mass deletion of content without a consensus that it should be deleted – probably an RfC consensus. ChristianKl (talk) 10:33, 28 September 2016 (UTC)
Now again, deceased people are quite unlikely to mind any discrimination against them resulting from an unsourced Wikidata statement. – Máté (talk) 12:23, 28 September 2016 (UTC)

Okay, we have had no input in this thread for some days now, which means that there is still no concept of how to solve this situation other than data removal. I'd wait another day for input, but from tomorrow on I would suggest being bold and removing unsourced P91 data without further manual checking, due to the large amount of affected data. Properties other than P91 with the same problem should be discussed here at WD:PC first before we start to be bold there as well. —MisterSynergy (talk) 09:07, 2 October 2016 (UTC)

I would agree on the removal, though would suggest that a report is generated of the removals (pre or post).  — billinghurst sDrewth 10:29, 2 October 2016 (UTC)
I think a revision of Wikidata:Database reports/Complex constraint violations/P91 would do. Sjoerd de Bruin (talk) 10:46, 2 October 2016 (UTC)
Good points, billinghurst and Sjoerddebruin. Wikidata:Database reports/Complex constraint violations/P91 contains exactly the items which would be affected by the removal procedure in the very first step. I think we should leave a permanent note on Property talk:P91 after data removal, containing a diff link to the complex constraint violations page after it has been reduced to the purged state. —MisterSynergy (talk) 11:16, 2 October 2016 (UTC)

I will soon start with the removal of unsourced statements, using this results set with Autolist; with only one edit every 10 seconds it will take some hours to work through all affected items. If other users or bot operators want to help, please let me know. If you see the removal of a statement you’d like to keep on your watchlist, feel free to add it again including a valid source according to this rule: Property talk:P91#Rules for Usage. —MisterSynergy (talk) 09:38, 3 October 2016 (UTC)

Note that Autolist doesn't support custom comments in the edit summary; I think that is needed to avoid edit wars. Sjoerd de Bruin (talk) 09:45, 3 October 2016 (UTC)
Correct, the best would be to have a customized edit summary which states the reason for the removal and links to this discussion. However, I don’t know which tool to use other than Autolist or a bot (I don’t have one). If you think we should use a bot to provide reasonable edit summaries, we need to find an operator :-) —MisterSynergy (talk) 09:49, 3 October 2016 (UTC)

Unsourced statement removal is finished now. I omitted items about fictional characters, so that the complex constraint violations page will not be empty tomorrow. At Property talk:P91 you can find a query to identify new violations on items about humans. In total, only three removals have been reverted and subsequently equipped with a source. Other than that there were no complaints until now.

There is now the question of what to do with imported from (P143)-sourced statements. Formally, those are considered "unsourced", yet they at least provide a connection to the Wikipedia the claim stems from. Any ideas? I think there are some hundreds of cases we are talking about… —MisterSynergy (talk) 06:42, 6 October 2016 (UTC)

Is it possible to break down the imported from (P143) statements by language? It is likely going to be easier for e.g. an English speaker to verify a claim (and import the source from) an English language project than e.g. a Russian one. Similarly if we find that there are only a small number imported from say the Maltese Wikipedia, that becomes a much easier task for a Maltese speaker than a list of several hundreds. Thryduulf (talk) 11:31, 6 October 2016 (UTC)
Good idea! This is definitely possible, but we need a SPARQL magician to build that query Face-smile.svg Anyone willing? —MisterSynergy (talk) 11:43, 6 October 2016 (UTC)
WD:Request a query is good place for this. Anyway:
SELECT ?item ?itemLabel ?value ?valueLabel ?wiki ?wikiLabel {
  ?item p:P91 [ ps:P91 ?value; prov:wasDerivedFrom/pr:P143 ?wiki ] .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" } .
} ORDER BY ?wiki
SPARQL query
Matěj Suchánek (talk) 13:31, 6 October 2016 (UTC)
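To answer Thryduulf's per-language question more directly, the query above can be turned into an aggregation. The following is only a sketch along the same lines (not verified against the live endpoint), counting P143-sourced P91 claims per source wiki:
SELECT ?wiki ?wikiLabel (COUNT(?value) AS ?num) {
  ?item p:P91 [ ps:P91 ?value; prov:wasDerivedFrom/pr:P143 ?wiki ] .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" } .
} GROUP BY ?wiki ?wikiLabel
ORDER BY DESC(?num)
SPARQL query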
Thanks, Matěj Suchánek! The data was almost exclusively imported from enwiki (98%), which means that almost all of us could do some work here! I propose to wait another week or so and see whether the number of P143-sourced claims drops significantly from its current value of 608. If not, I’ll suggest removing those as well (adding references to 600 claims is still equivalent to ~25 hrs of work, but let’s see). —MisterSynergy (talk) 14:47, 6 October 2016 (UTC)
I haven't got time to look at the query results atm, but does that list claims that are sourced only by imported from (P143)? I'm sure I've seen claims with both that and an explicit source - those must certainly not be deleted without discussion. Thryduulf (talk) 16:08, 6 October 2016 (UTC)
I have looked at a couple of items from the result set and have not yet found any claim with both a P143 and a real source (and I need much more than 2.5 minutes per added source). However, if it really comes to a removal next week, I will of course look for only-P143-sourced claims. —MisterSynergy (talk) 16:12, 6 October 2016 (UTC)

It is difficult for me to track the progress. Could anyone please provide a query that lists items with P91 statements which are P143-“sourced” and have no other reference? My SPARQL skills are unfortunately very limited (I know Wikidata:Request a query, but let’s keep things simple, in one place). Thanks, MisterSynergy (talk) 05:57, 13 October 2016 (UTC)

@MisterSynergy: That would be this :

# Could anyone please provide a query that lists items with P91 statements, which are P143-“sourced”

select ?item where {
  ?item p:P91 [
    prov:wasDerivedFrom ?ref
  ] .
  ?ref pr:P143 []
  filter not exists { ?ref ?prop [] . filter (?prop != pr:P143)} .
}

SPARQL query I don't have a lot of time right now, so I only checked one result; there might still be mistakes in the query, but at least one result is correct :) author  TomT0m / talk page 06:31, 13 October 2016 (UTC)

Thanks, I will test a bit whether the results are good. —MisterSynergy (talk) 06:38, 13 October 2016 (UTC)
Hm, there’s a problem. We have 603 items with a P143-sourced P91 statement. The query gives 602 results, including Derrick Gordon (Q16466466) (which should not be found) and excluding Catrine Telle (Q4982281) (which has a “malformed” reference). So I guess this query looks for other properties in the same reference that P143 belongs to, rather than looking for additional references. Is this correct? —MisterSynergy (talk) 06:46, 13 October 2016 (UTC)
This one is better:
select ?item where {
  ?item p:P91 [
    prov:wasDerivedFrom ?ref
  ] .
  ?ref pr:P143 [] .
  filter not exists {
    ?item p:P91 [
     prov:wasDerivedFrom ?ref2 
    ] 
    filter (?ref2 != ?ref ) 
    filter not exists { ?ref2 pr:P143 [] } .
  }
}
SPARQL query @MisterSynergy: author  TomT0m / talk page 19:09, 13 October 2016 (UTC)
Yes, this one looks good — thanks a lot! As expected, most (598 of 603) statements have only a P143 source, and nobody has put effort into fixing this during the last week. I’ll wait another day or two for action; otherwise I think we should remove those as well. —MisterSynergy (talk) 19:17, 13 October 2016 (UTC)

Formatter URLs requiring API keys[edit]

We need to agree a model for showing formatter URL (P1630) values that include API keys.

For example, we currently have:

and a proposal for:

using the strings your-API-key and <apikey>, respectively.

We should standardize on a single string instead, for example $2. What string should we use?

Where that string is present in a formatter URL, the values for the property should not be linked. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 16:16, 4 October 2016 (UTC)
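For illustration only (the domain and parameter name here are made up, not taken from any real property): a formatter URL combining the usual $1 placeholder for the identifier with, say, $2 for the API key could look like

https://api.example.org/record/$1?apikey=$2

so a data consumer without a key can detect the $2 and skip generating a link, as suggested above.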

Restored from archives. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 13:31, 16 October 2016 (UTC)

Top goalscorer and tv network[edit]

Hello.

1) Is there a property which I can use to show the top goalscorer of a league?

2) Is there a property which I can use to show the TV network of a league?

Xaris333 (talk) 06:56, 8 October 2016 (UTC)

Regarding the second question, not that I am aware of, but a "broadcast rights holder" property seems like it would be a useful thing to have, e.g.
< Sports league > broadcast rights holder < Sky Sports >
applies to territorial jurisdiction (P1001) < United Kingdom (Q145) >
applies to part (P518) < live television (Q431102) >
< Sports league > broadcast rights holder < BBC >
applies to territorial jurisdiction (P1001) < United Kingdom (Q145) >
applies to part (P518) < highlights >
. 13:57, 8 October 2016 (UTC)


@Thryduulf:, @Edgars2007:. Is there a way to show on a player's page that he was the top goalscorer of a season? For example, Sotiris Kaiafas (Q351862) was the top goalscorer of 1973–74 Cypriot First Division (Q2736753). We also have Cypriot First Division top goalscorers (Q16327504) (which can be renamed to "top goalscorers in Cypriot First Division by season" if that helps). For the same player I can use award received (P166) for winning the European Golden Shoe (Q233454) in 1976. I just want a way to do the same with the top goalscorer honour. Xaris333 (talk) 19:32, 10 October 2016 (UTC)

Maybe simply wait for that property (statistical leader) to be created and place the statement on the season's item, rather than duplicating the info on the player's item? Otherwise use it with P166. --Edgars2007 (talk) 09:14, 11 October 2016 (UTC)

I have done it this way [3]. Xaris333 (talk) 15:02, 11 October 2016 (UTC)

Check consistency of a map[edit]

Hi, I need a help from a coder. I tried by myself but I failed.

A place in Pokémon games shares borders (shares border with (P47)) with other places in four directions: north (Q659), south (Q667), west (Q679) and east (Q684). I need to check the consistency of the maps of Pokémon games.

Some places change from game to game: e.g. Cianwood City (Q3745924) differs between Pokémon Gold and Silver (Q837346) and Pokémon HeartGold and SoulSilver (Q611189). In fact, if a value of shares border with (P47) carries the qualifier present in work (P1441), it means that that border is present only in that game or those games. If there is no such qualifier, the border is present in all the games defined by the item's present in work (P1441) statements.

I need a script that, given a Pokémon game as input (e.g. Pokémon Gold and Silver (Q837346)), checks that the map is consistent: every place is connected to the others with the correct direction (if place A shares its northern border with place B, then place B shares its southern border with place A). If there is an error, the script should stop and report which places don't share the correct border.

Optional: is it possible to visualize the map, in order to check for possibly missing data?

Thank you very much in advance. ----★ → Airon 90 14:54, 8 October 2016 (UTC)

@Airon90: I think writing a script is overkill for this. A query like this:
SELECT ?item ?itemLabel ?other ?otherLabel ?dirLabel ?otherdirLabel {
  ?item wdt:P1080 wd:Q17562848;
        p:P47 [ ps:P47 ?other; pq:P560 ?dir ] .
  MINUS {
    ?other p:P47 [ ps:P47 ?item; pq:P560/wdt:P461 ?dir ] .
  } .
  OPTIONAL { ?other p:P47 [ ps:P47 ?item; pq:P560 ?otherdir ] } .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" } .
}
SPARQL query
prints some items which are inconsistent with others. Matěj Suchánek (talk) 08:33, 9 October 2016 (UTC)
Thank you, Matěj Suchánek, but it seems that it's not enough, as it doesn't recognize the games -- ★ → Airon 90 05:55, 16 October 2016 (UTC)
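A possible refinement, sketched only and not verified against the live data (Q837346 is just the example game from above): restrict border statements to those that either lack a present in work (P1441) qualifier or carry one matching the chosen game. The reverse check inside MINUS is left unrestricted here, so it may still need the same treatment:
SELECT ?item ?itemLabel ?other ?otherLabel ?dirLabel {
  ?item wdt:P1080 wd:Q17562848;
        p:P47 ?st .
  ?st ps:P47 ?other; pq:P560 ?dir .
  OPTIONAL { ?st pq:P1441 ?game } .
  FILTER ( !BOUND(?game) || ?game = wd:Q837346 )
  MINUS {
    ?other p:P47 [ ps:P47 ?item; pq:P560/wdt:P461 ?dir ] .
  } .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" } .
}
SPARQL query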

Distinctions between national park areas and national parks[edit]

I've been scanning the national parks of England and Wales. I notice that several have a single Wikipedia page/Wikidata item representing both the geographic area and the national park:

  • Dartmoor
  • Exmoor
  • Lake District
  • New Forest
  • North York Moors
  • The Broads
  • Peak District

These have just national park pages/items:

  • Northumberland National Park
  • Pembrokeshire Coast National Park

These have Wikipedia pages for both the area and the park separately:

  • Brecon Beacons
  • South Downs
  • Yorkshire Dales

I would appreciate some input on whether there should be Wikidata items simply matching the pages, or whether there is logic in having separate items for the area and the national park in each case. There are cases where there are clear differences (which are mentioned in the Wikipedia articles). Pauljmackay (talk) 10:08, 9 October 2016 (UTC)

Having items for both makes sense, but it's not important. The area exists in its own right even before the inception of the park. ChristianKl (talk) 11:07, 9 October 2016 (UTC)
I don't see why the issue is "not important". The area of a national park is specifically designated and regulated, whereas the geographic area is more loosely defined, and it is quite possible that the area and the park are not identical. For example, the Brecon Beacons National Park includes not only the Brecon Beacons (area) but also the Blorenge, which most people would not regard as part of the Brecon Beacons (area). Conversely, not all of the Pembrokeshire Coast (area) is included in the Pembrokeshire Coast National Park. In addition, there is usually a statutory governing body, for example the Brecon Beacons National Park Authority. These get a passing mention in the Wikipedia articles but have quite wide-ranging powers, contact information, etc. that would be best described in a (third) Wikidata item. Robevans123 (talk) 12:21, 9 October 2016 (UTC)
In a case where the area isn't exactly the same, having items that distinguish them is important. In general, however, that's not true for all parks, so I don't think it's a priority to create a distinguishing item for every park in existence. I have no problem with a bot doing this, though. ChristianKl (talk) 13:07, 9 October 2016 (UTC)

You should create different items and then use located in protected area (P3018) to link mountains, lakes and other things within the park to the park. Thierry Caro (talk) 13:43, 9 October 2016 (UTC)

  • @‎Andrew Gray: seems to think they should be merged. I don't quite see why.
    --- Jura 15:37, 11 October 2016 (UTC)
The only things like this I can remember working on are the Antarctic protected areas, and I've been merging these only where there's a fairly solid 1:1 match between the protected area and a geographical item.
I'd certainly agree that having separate items when the park and the geographical area don't quite match up is the best idea, especially when it's a bit of a fuzzy term applied generally to a region (like the Lake District or the Dales). I think all the UK national parks would fall into this category and thus need two items, but some of the Areas of Outstanding Natural Beauty or SSSIs might match neatly to an existing geographic item - for example, there's one covering the entire archipelago of Isles of Scilly (Q180209), so having one item there makes sense. Conversely, Isle of Wight AONB (Q15228878) only covers half the island, so a separate item is essential.
In general, if there's any ambiguity over the boundaries, then I'd definitely support two items. But if they're clearly the same area defined in the same way, I don't see any reason not to merge them. Happy to be convinced otherwise, though! Andrew Gray (talk) 16:12, 11 October 2016 (UTC)
I think this would be really clear if we had two distinct concepts in each case: one for the geographical region, one for the human jurisdiction (the national park). We definitely need a property to link them. author  TomT0m / talk page 16:27, 11 October 2016 (UTC)

Sources for company data[edit]

The verifiability of company information is crucial for reuse. I thought that statements like total revenue (P2139) or employees (P1128) would automatically need sources, but now I'm in a discussion where someone disputes this. What do you think: is it necessary to fix this as a rule, e.g. "numbers need sources"?--Kopiersperre (talk) 14:28, 10 October 2016 (UTC)

Companies aren't worthy of privacy protection in the same sense that living people are. Wrong numbers about revenue or headcount also don't seem libelous to me, but are simply errors. The justification for the living people policy doesn't apply to companies. ChristianKl (talk) 16:18, 10 October 2016 (UTC)
Changed my question. I never wanted to compare companies with living people.--Kopiersperre (talk) 17:46, 10 October 2016 (UTC)
I don't think deleting the question is a good way to have a conversation as it makes it harder for people who join to follow the discussion. Anybody who wants to reuse information can simply ignore unsourced claims and is not harmed by the existence of unsourced statements. ChristianKl (talk) 21:21, 10 October 2016 (UTC)
It seems weird to me that Wikidata is not putting more emphasis on sources. Without a source, a statement is just an allegation. And I thought Wikidata was meant to solve the problems of Wikipedia.--Kopiersperre (talk) 09:07, 11 October 2016 (UTC)
A Wikipedia that only wants to import sourced statements can do so, and unsourced statements about company information do no harm for that use case. Wikidata doesn't exist for a single use case. If certain data isn't valuable to you, that doesn't mean it isn't valuable for someone else. There's nothing to be gained by being exclusive. ChristianKl (talk) 13:21, 12 October 2016 (UTC)
I agree. Sources are important, but data about company headcount, revenue, assets or owners are not sensitive enough to justify removal when unsourced. Note also that sourcing at Wikidata is unfortunately complicated and painful, so it is worth going through this process for mass imports, but for a single statement it is very time consuming.--Jklamo (talk) 19:40, 11 October 2016 (UTC)

original network (P449)[edit]

original network (P449) seems to be very US-centric: in most other countries, people know programs from their actual channel. According to the constraint report, it is also mostly being used that way: only 3940 of 26521 statements point to television networks. Should we split this property up? Why shouldn't we include re-runs or additional syndication? And what to do with the situation in the Netherlands, where member-based broadcasting associations broadcast on channels? Sjoerd de Bruin (talk) 18:33, 10 October 2016 (UTC)

  • Splitting it would make sense I suppose. I'm from the US-centric perspective, so I'll take your word for what happens elsewhere :-) -- Ajraddatz (talk) 19:33, 10 October 2016 (UTC)
  • why not change it to "original network or channel" and adjust the constraints? ArthurPSmith (talk) 20:01, 10 October 2016 (UTC)
    • Does it make sense to only allow the original ones? Sjoerd de Bruin (talk) 20:21, 10 October 2016 (UTC)
      • For major drama series, the original broadcaster seems to be very important to those working in the field on en.wp, so it seems useful to retain it. A complementary property "broadcast by" would seem to work for reruns, etc., and would also work for the use case discussed at #Top goalscorer and tv network (I suggested a "broadcast rights holder" property there, but the examples would still work if broadened to this). Thryduulf (talk) 10:36, 11 October 2016 (UTC)
The English label for original network (P449) should really be something like "original broadcaster", that is how the property is currently defined, and how it is being used. Danrok (talk) 01:46, 12 October 2016 (UTC)

place of death (P20) when a person died on sea[edit]

What to set as place of death (P20) when a person died on a ship/during a ship accident (or at another somewhat "uncommon" place)? —MisterSynergy (talk) 18:57, 10 October 2016 (UTC)

Atlantic Ocean (Q97), Mediterranean Sea (Q4918), etc.? MisterSynergy (talk) 20:02, 10 October 2016 (UTC)
So fix the constraints. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:40, 10 October 2016 (UTC)
city (Q515) is subclass of (P279) geographic location (Q2221906) via a longer path, and the very first allowed value type in this list. In general this property seems to be quite healthy, with only ~1% of ~500,000 usages having a non-permitted type. —MisterSynergy (talk) 11:19, 11 October 2016 (UTC)
For that case, since the coordinates of the ship are known, you could add them as a qualifier to indicate the exact spot in the sea/ocean. I wonder if vessel (P1876) would also make sense as a qualifier. - Nikki (talk) 11:35, 11 October 2016 (UTC)

There have been at least two previous discussions about this:

Thanks, will look at it (later). —MisterSynergy (talk) 11:19, 11 October 2016 (UTC)

See also property?[edit]

Hello. Is there any "see also" property for items? Or something like that. I want in the page Top Premier League goal scorers by season (Q7824589) to add the Premier League (Q9448). (see also (P1659) is only for properties.) Xaris333 (talk) 19:31, 11 October 2016 (UTC)

Maybe facet of (P1269) … —MisterSynergy (talk) 19:40, 11 October 2016 (UTC)
There is not, and there should not be. Relations like this can be expressed as concrete statements; it is really ambiguous why and when we should use such a property. Thanks, GerardM (talk) 19:58, 11 October 2016 (UTC)

I think facet of (P1269) is ok. Thanks. Xaris333 (talk) 20:02, 11 October 2016 (UTC)

Items such as Top Premier League goal scorers by season (Q7824589) only exist because someone at a Wikipedia split this part of the information out of another article, and we keep items for any kind of Wikipedia article. Considered as a last resort, facet of (P1269) is indeed okay for such cases. —MisterSynergy (talk) 20:11, 11 October 2016 (UTC)
We have many pages about top goalscorers. For example, Cypriot First Division top goalscorers (Q16327504). Xaris333 (talk) 20:18, 11 October 2016 (UTC)

Full citation for reference[edit]

In this edit, is there any (easy) way to indicate that the material in question is found on page 19 of the 10th edition? WhatamIdoing (talk) 20:48, 11 October 2016 (UTC)

Special:Diff/386932425. Sjoerd de Bruin (talk) 21:04, 11 October 2016 (UTC)
Not sure that the diff by Sjoerddebruin is quite right: it should use edition number (P393) rather than volume (P478). I hesitated to insert the values into reasonable accommodation (Q751097), as page 19 seems a bit unlikely for a dictionary of law. Robevans123 (talk) 21:37, 11 October 2016 (UTC)
As noted below, it's a sub-entry under "accommodation". Page 19 runs from accidere to accommodation subpoena; I can send you a scanned copy of the page via e-mail if you'd like to see it. WhatamIdoing (talk) 04:26, 13 October 2016 (UTC)
Thanks - I'd read the explanation below - I get it - I really don't need to get proof! Face-smile.svg Robevans123 (talk) 14:06, 13 October 2016 (UTC)
@WhatamIdoing: Yes - add further statements to the reference using properties edition number (P393) and page(s) (P304). Other useful properties for references include publisher (P123), publication date (P577), author (P50), place of publication (P291), (and volume (P478)). And if there is an online version also, reference URL (P854) and retrieved (P813). Robevans123 (talk) 21:37, 11 October 2016 (UTC)
I would be more inclined to add properties that are always the same for a source—such as edition number (P393), publisher (P123), publication date (P577), author (P50), and place of publication (P291)—to the item for the source. I would confine additional qualifiers for the reference to things that will be different each time the source is cited, such as page(s) (P304). The volume (P478) could go either way, depending on the nature of the work, whether every volume had the same author(s), etc. Jc3s5h (talk) 23:10, 11 October 2016 (UTC)
Absolutely - oops - just got carried away after citing a lot of web pages, rather than books, recently... Robevans123 (talk) 14:06, 13 October 2016 (UTC)
I've made corrections to the OP's item. According to Help:Sources there should be a separate item for each edition of a book, so I created a new item, Black’s Law Dictionary (10th edition) (Q27221803). Too bad Wikidata can't deal with an apostrophe in the name of an item. I have a personal copy of this book; it turns out that it isn't volume 10, it's edition 10. The reason "reasonable accommodation" appears so early in the book is that it is an additional term under "accommodation". Jc3s5h (talk) 23:42, 11 October 2016 (UTC)
How is edition number (P393) supposed to work? An edition number is tied to the same editor; otherwise the book may have been edited by a number of others, and the edition number is then totally ambiguous. Is this an absolute edition number? On the other hand, we have a practice and guidelines of editions having their own items, so that we can point precisely to the relevant one. That definitely removes the ambiguity, as an edition item carries all the relevant information - dates, editor, and so on. It also makes the subsequent use of the same edition for citations on other items a lot easier. Especially with the project of creating an automated tool to make Wikidata sourcing better in mind, I'd say the creation of edition items is one use case that should be really optimized and painless. - I've never used edition number (P393) and I don't intend to: it is not really information you encounter when you search for a book on Google Books, for example, whereas the ISBN, on the other hand, is easier to find. It is scarcely used in items - 7409 statements (query) - and for things like Olympic Games that are not books. The following query shows which properties it is used together with:
#properties used together with "edition number", by number of uses
select ?qual (count(?qual) as ?num) where {
  ?stmt prov:wasDerivedFrom ?ref .
  ?ref pr:P393 ?val .
  ?ref ?qual [] .
  filter ( ?qual != pr:P393 )
} group by ?qual
order by desc(?num)
SPARQL query
It seems that this is mostly used in references together with stated in (P248), which means an item is cited anyway, so an edition item could be used just as well. It is also used a lot with DOI (P356), which is weird because it then seems redundant: each edition should have its own DOI (shouldn't it?). author  TomT0m / talk page 12:35, 12 October 2016 (UTC)
I don't think that "edition" means "same editor". It's more "same event". An editor could produce multiple issues, and an edition could be edited by dozens of people. In this case, the dictionary is more than a century old, and a new edition (mostly the same book, but with more and different words and updated definitions) is put out every ten years or so. WhatamIdoing (talk) 04:26, 13 October 2016 (UTC)
It means "same version" imho. You are right that the work can have multiple authors working on it, like a Wikipedia article, and it stays essentially the same work. An "edition" is a state of the work as published; any exact reprint can be considered the same edition. If something has changed - words, images, colors, page numbers - it is another edition. The editor is usually a company responsible for printing and selling the edition. The edition numbers printed on the copies correspond to the editions issued by the same editor - it is not supposed to know that some other editor has also edited the same work. What identifies an edition - what we want to know to check, for example, whether a page number is correct, or whether something was indeed in this edition and not in an earlier version - is the pair formed by the editor and the number in its sequence of editions. author  TomT0m / talk page 06:20, 13 October 2016 (UTC)
Since Help:Sources says there should be a separate item for each edition, I believe edition number (P393) should be added to the item for the edition, not to the reference where that edition is cited as a source. The benefit that edition number (P393) gives us over the description of the edition in the label, such as "Black's Law Dictionary (10th edition)" is that it is a machine readable number, which could be used to list the editions in order, and is easier for people who aren't familiar with the language of the label to understand. Also, there are strange quirks with some books, such as the second edition having "Revised Edition" on the title page rather than "2nd edition". Edition number would be easier for people not familiar with the book to understand.
ISBN isn't a good substitute for edition number, because the ISBN will be different for a leather-bound copy, a hardbound copy, a paperback, and an e-book, even though they all have the same contents and the same edition number. Also, traditional citations in scholarly books and journals cite edition numbers, not ISBN, so using edition number lets us know if Wikidata is citing the same version of a book that a scholarly journal is. This might not seem important to someone who likes to operate bots and import items by the thousands, but for editors who spend an hour or more researching a single item, it could be important. Jc3s5h (talk) 11:33, 13 October 2016 (UTC)
Yes - it's also used in Wikipedia when citing sources, and would be useful when generating references for use in WP. Basically equivalent to the "edition" parameter of en:template:Cite book. Robevans123 (talk) 14:06, 13 October 2016 (UTC)
But if you're citing a page number, then an ISBN might be handy, as the pagination usually differs between the hardbound, paperback, and e-book versions. WhatamIdoing (talk) 17:50, 13 October 2016 (UTC)
I don't think the previous post was advocating edition number (P393) over ISBN-10 (P957) or ISBN-13 (P212), just that an ISBN is not a substitute for an edition number (P393). Basically, add as much specific information as you can to a specific reference, such as page number and chapter (P792), and where possible useful identifiers for the source, such as edition number (P393), ISBN, OCLC control number (P243), publisher (P123), publication date (P577), place of publication (P291), etc. This way, people can easily find the version that was used, or one close to it. Robevans123 (talk) 20:10, 13 October 2016 (UTC)

Page[edit]

Hey, I'm not sure what the rules are here as far as guidelines for additions go. It looks like notability likely isn't a factor here, at least not in the same way as on the English Wikipedia. I did want to give you a heads-up, though: we recently had someone with an undisclosed COI try to create an article for this person's book on Wikipedia. I also note that there appears to be an attempt to slowly add him into various articles on Wikipedia using the Wikidata page, so I'm concerned that they might be using this as a way to circumvent notability guidelines on Wikipedia. Tokyogirl79 (talk) 05:53, 12 October 2016 (UTC)

It seems like he is notable enough for Wikidata, though. I don't have the feeling that Wikidata is being used to circumvent notability guidelines on Wikipedia. The item only lacks sources for some of its statements, so it would be great if we could improve that. Sjoerd de Bruin (talk) 06:45, 12 October 2016 (UTC)
I am not so sure, since he is linked from two pages about his own books, one of which is currently nominated for deletion and another of which has apparently already been deleted. If the first one gets deleted, we may delete all three items.--Ymblanter (talk) 06:55, 12 October 2016 (UTC)
But the item for the person itself contains 10 identifier properties. Is ISNI also editable by others? Most other identifiers are, which is why I'm asking. Sjoerd de Bruin (talk) 06:58, 12 October 2016 (UTC)
His book is published by Titan Inc. (Q26960468), and on Wikidata he is listed as the CEO of Titan. That makes the book self-published, and likely to be unsuitable as a source for Wikipedia or Wikidata. If an otherwise non-notable author created a work that was cited as a source for an item in Wikidata, then the work and the author should both be added to Wikidata. So unless any citations to the book(s) stand up to the scrutiny of other editors, the references, the books, the publisher, and the author can all be deleted. Jc3s5h (talk) 08:07, 12 October 2016 (UTC)
There is no Wikidata policy that forbids self-published sources. They are not high-quality sources, but they aren't forbidden.
Linking identifiers together is useful for various libraries. VIAF profits from being informed that an ISNI number and an ORCID number describe the same person as an existing VIAF number. ISNI, ORCID and the German National Library (which is the source for the VIAF number) are all serious sources. ChristianKl (talk) 10:34, 12 October 2016 (UTC)
Hmm, ORCID is about as serious as Facebook or Twitter, so I wouldn't count that one for notability. Multichill (talk) 19:25, 12 October 2016 (UTC)
ORCID iDs are used in practice by scientific papers to specify their authors, so they seem serious to me, even if ORCID can also be used for less serious purposes. ChristianKl (talk) 11:19, 15 October 2016 (UTC)

Big data improvement for chemicals[edit]

Just to announce that, thanks to the work of Sebotic, Wikidata has increased its coverage of chemicals, with a total of ~98,000 chemicals now having an item (two months ago we had ~22,000 items about chemicals). All data were imported from the PubChem and ChEBI databases and respect the rules about sources, leading to a considerable improvement in data quality. Now an important curation step has to start, especially to merge duplicated items. You are welcome to take part in this effort; you can get in touch with the Chemistry project in WD for details.

Building on that work, additional imports can start in order to add more identifiers, but please announce your intention before any huge import, so that the work of bots can be coordinated with that of the contributors curating item conflicts. Snipre (talk) 11:39, 12 October 2016 (UTC)

PetScan[edit]

Can anyone help me with PetScan? See [4]. I want the results to show the Spanish label of the Wikidata page for every item; some items don't have Spanish labels. Xaris333 (talk) 12:49, 12 October 2016 (UTC)

In "other sources" (Άλλες πηγές), choose "Wikidata" in "Use wiki" (Χρήση wiki). Keep this in mind, as it is one of the most frequently asked questions about PetScan. Matěj Suchánek (talk) 13:55, 12 October 2016 (UTC)

Thanks! Xaris333 (talk) 19:45, 12 October 2016 (UTC)

Add label[edit]

Hello. I have a column with Q numbers of items and a column of labels in a specific language. Is there a way to add the labels easily? Not by hand one by one? Xaris333 (talk) 13:06, 12 October 2016 (UTC)

Quick statements. --Edgars2007 (talk) 13:25, 12 October 2016 (UTC)
...hidden under QuickStatements (Q20084080). Matěj Suchánek (talk) 13:55, 12 October 2016 (UTC)
Yeah, no idea why Magnus puts noindex on most of his tools. Sjoerd de Bruin (talk) 14:44, 12 October 2016 (UTC)

Thanks! Xaris333 (talk) 19:46, 12 October 2016 (UTC)
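For reference, and if I remember the QuickStatements (version 1) input format correctly, adding labels from such a two-column list is a matter of feeding the tool tab-separated lines of the form item, Lxx (label in language xx), quoted text. A hypothetical example for Spanish labels (the item IDs here are arbitrary):

Q4115189	Les	"etiqueta de ejemplo"
Q13406268	Les	"otra etiqueta"

Each line sets the Spanish (es) label of the item in the first column.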

Support of translatewiki.net[edit]

Hello. Why does Wikidata not support translatewiki.net (unlike Commons)? Thank you --ديفيد عادل وهبة خليل 2 (talk) 15:12, 12 October 2016 (UTC)

What kind of support would you want? ChristianKl (talk) 15:16, 12 October 2016 (UTC)
Linking pages with their counterparts in items --ديفيد عادل وهبة خليل 2 (talk) 15:21, 12 October 2016 (UTC)
Translatewiki.net is not hosted on the Wikimedia servers and isn't officially connected with Wikimedia. Sjoerd de Bruin (talk) 15:34, 12 October 2016 (UTC)
Sjoerddebruin Thank you. I would love to see all wikis merged, which is why I asked this question --ديفيد عادل وهبة خليل 2 (talk) 15:46, 12 October 2016 (UTC)
@ديفيد عادل وهبة خليل 2: We have had discussions about hosting the site but it has never happened. —Justin (koavf)TCM 13:34, 13 October 2016 (UTC)

Two or one villages?[edit]

I first thought there was some conflation, and that I just needed to move a few interwiki links - but now, after looking deeper, I am even more confused, and before I forget it I wanted to raise it here: are Vranduk (Q15924777) and Vranduk (Q1560118) one village or two? --Denny (talk) 18:31, 12 October 2016 (UTC)

Given that the French Wikipedia version has two entries for Vranduk with one being located in Doboj and the other in Zenica it seems like it should be two items. ChristianKl (talk) 18:46, 12 October 2016 (UTC)
I thought so too, but then it seems that Doboj and Zenica are bordering each other, that they are both in the Zenica-Doboj Canton (Q18253), and that the articles all claim that Vranduk lies on the way from Doboj to Zenica. I am still not completely convinced they are two villages. --Denny (talk) 04:04, 14 October 2016 (UTC)
Weird. OSM also has two villages of that name, one in Doboj, one in Zenica. But the one in Doboj, I can't confirm its existence on a satellite map. I am getting curious enough to put some time aside to figure out what's going on, but if anyone else wants to make a stab... --Denny (talk) 04:29, 14 October 2016 (UTC)
Google Maps seem to consider there to be two different villages. The one in Doboj seems to be a handful of houses and a lot of forest but Google still considers it to be a village. ChristianKl (talk) 21:24, 14 October 2016 (UTC)
OK, I guess I should follow Google Maps ;) The census of the Republic and the Federation both list a Vranduk with very different population numbers, one in Zenica (Federation) and one in Doboj (Republic), and the municipalities which they are listed in are very geographically distinct, so yeah, I assume both exist. Thanks for the sanity check! --Denny (talk) 20:25, 17 October 2016 (UTC)

Check out and endorse the GLAMpipe project![edit]

The GLAMpipe metadata manipulation & upload tool is an extensible, open-source web application for cultural metadata. It is aimed at data-savvy Wikimedians and data partners. It gives the user the power of bots without the need to code.

Nodes are the building blocks of the data flow. A node can act as a data source, it can split, combine, create wikitext or process data in other ways, and a node can export data to files or web services like Wikimedia Commons or Wikidata. Nodes can be created, altered and shared by the users, making it possible to build upon work by others.

We are applying for a grant from the Wikimedia Foundation to create an online, collaborative version and the possibility of preparing and importing data to Wikidata. Read more about the project, and endorse it at https://meta.wikimedia.org/wiki/Grants:Project/GLAMpipe

Best regards, Ari, Kimmo and Susanna Ånäs (Susannaanas) (talk) 21:34, 12 October 2016 (UTC)

Are labels always necessary[edit]

Should every item, ultimately, have a set of labels in each language? Or are there some items which should deliberately never have a label, at all? If so, which? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:02, 13 October 2016 (UTC)

I can’t think of any reason why an item should deliberately never have a label right now, but I guess you have an example in mind?! I’m not sure what happens if you search for an item (using the search box) which has no labels. Might be difficult to find then… —MisterSynergy (talk) 10:28, 13 October 2016 (UTC)
Wikimedia duplicated page (Q17362920)? It's the only case I can think of... --Harmonia Amanda (talk) 12:07, 13 October 2016 (UTC)
I'm not sure what you mean with *should*. Could you clarify? ChristianKl (talk) 12:35, 13 October 2016 (UTC)
That it is best practice. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:49, 14 October 2016 (UTC)

Supplementary question: Not even if the item has a birth name (P1477) or native label (P1705)? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:30, 13 October 2016 (UTC)

Why would either or those properties mean that a label has no value? ChristianKl (talk) 12:35, 13 October 2016 (UTC)
Labels and descriptions are what ultimately constitutes the identity of an item. I would expect that in a perfect world - besides for some obscure technical purposes - every item would have a label in each language. --Denny (talk) 04:06, 14 October 2016 (UTC)
In a perfect world, wouldn't it be desirable to have no duplicated data? Thus, if a label is the same for multiple languages, the label would only be set for one language, with all other languages making use of the language fallback mechanism. --Pasleim (talk) 21:49, 14 October 2016 (UTC)

So are we agreed that a taxon with a taxon name (P225) still needs a label, and that, where there is no vernacular name, the label should be (for languages using the western alphabet, at least) the taxonomic name? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:49, 14 October 2016 (UTC)

I think there is a difference between the label being the same as the value of a property and there being no label. I can see benefit though in having some sort of mechanism by which the value of e.g. taxon name is displayed as the item label (including in searches, etc) if no label is explicitly set. Thryduulf (talk) 22:44, 14 October 2016 (UTC)

See all labels of an item[edit]

Hello. How can I see all the labels, in any language, of an item? I only want to see the languages that have a label for the item. Xaris333 (talk) 10:14, 13 October 2016 (UTC)

There is a link below the labels & description box on all item pages, reading “All entered languages”. This unfolds the box and shows all entered labels, descriptions and aliases. There is also a gadget called “labelLister” which you can activate in your preferences. It then shows an additional tab left to the search box with information about labels etc. —MisterSynergy (talk) 10:24, 13 October 2016 (UTC)
Thanks. Do you know why, in the list of labels, I can see a language that has no label? Xaris333 (talk) 10:47, 13 October 2016 (UTC)
It either has a description or an alias (or both) in that language, or it is one of the very few (typically 2–4) languages which the software identifies as best suited for you. In the latter case you should already see it before you unfold the box. —MisterSynergy (talk) 10:50, 13 October 2016 (UTC)
It's the Lithuanian language. It's weird. Xaris333 (talk) 11:22, 13 October 2016 (UTC)
@Xaris333: Which item? (Please always give the example in question; unless there is a specific reason not to.) Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:38, 13 October 2016 (UTC)
In every item. For example 2015–16 Cypriot Cup (Q20645820). Xaris333 (talk) 11:41, 13 October 2016 (UTC)
@Xaris333: I can confirm that Q20645820 has no Lithuanian label, description or alias. Are you in or near Lithuania? In any case, you may be able to resolve this by putting a "WD:Babel" template on your user page, with the languages you do read & write. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:04, 13 October 2016 (UTC)
I am far away. Ok, thanks. Xaris333 (talk) 12:08, 13 October 2016 (UTC)

Fourth Birthday userbox[edit]

This user is celebrating Wikidata's 4th birthday.

Here you go: {{User Wikidata birthday 2016}}. Translations needed. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:11, 13 October 2016 (UTC)

Changes to the wikitext output of the action=wbformatvalue API[edit]

Hey folks,

We will soon change how Wikibase outputs data as wikitext by default. This might affect users of the action=wbformatvalue API that either use generate=text/x-wiki or omit the generate parameter (wikitext output is the default for that API module). I briefly looked through our API logs in order to see how many users would be affected by this, and found that no one uses this feature (T147591).

Please note that for our internal functionality (like the property parser function, or the Lua functionality), we made sure that the output won't change.

Only the output obtained via the wbformatvalue-API module might change! - Hoo man (talk) 13:15, 13 October 2016 (UTC)
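To make the affected case concrete, here is a hedged Python sketch that only builds (and does not send) the kind of wbformatvalue request in question; the datavalue payload below is illustrative, not taken from the announcement:

```python
# Sketch: construct the kind of wbformatvalue request affected by the change -
# one that passes generate=text/x-wiki, or omits generate entirely (wikitext
# is the default). The datavalue JSON is a made-up example; nothing is sent
# to the live API here.
import json
from urllib.parse import urlencode

params = {
    "action": "wbformatvalue",
    "format": "json",
    "generate": "text/x-wiki",   # omitting this key gives the same (default) behaviour
    "datavalue": json.dumps({"type": "string", "value": "example"}),
}
url = "https://www.wikidata.org/w/api.php?" + urlencode(params)
print(url)
```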

Stroke categories in Arabic[edit]

Please could an Arabic speaker explain the difference between Category:Deaths from stroke (Q6509490) and no label (Q7215764), and link them (and label the latter in English) accordingly? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 19:22, 13 October 2016 (UTC)

Item labels that tie to a place[edit]

The topic "unitary authority of England (Q1136601)" seems a bit oddly named. Should it be something like "English unitary authority"? It might be because it's based on a descriptive page about that topic, rather than a more definitive object page like "unitary authority".

Also, is it logical to have subclasses that are country-specific, as in this case? I wondered whether the best practice for a council of that kind would be to say "instance of: unitary authority" and "country: England". Pauljmackay (talk) 21:34, 13 October 2016 (UTC)

How many cans of worms can you open in one go?!
But, just to confuse things a bit more, the Ordnance Survey often refers to principal areas as "unitary authorities"...
In Wales, it's fairly simple, the areas are all principal area of Wales (Q15979307), there are 22 of them covering all of Wales, each governed by a unitary authority (Q1160920). The situation in England is more confusing, firstly because the term "principal area" is not widely used (even though it's defined in legislation). The areas are more commonly referred to as non-metropolitan counties, districts or London boroughs, which are all defined as principal areas. All the councils covering these areas should be referred to as unitary authority (Q1160920).
  • unitary authority (Q1160920) is synonymous with "principal council" (again, as defined in the legislation, but not extensively used). Terms such as borough council, district council, and county council are frequently used.
  • Large parts of England are not part of a principal area, but are covered by metropolitan counties and non-metropolitan districts and other entities.
  • Many Wikipedia articles cover several concepts for one area: the original (historic) county, the administrative county, the ceremonial county, the current (local government) county, etc., and the borders/histories of these can be (and are) all different. Robevans123 (talk) 00:15, 14 October 2016 (UTC)

No label in a specific language[edit]

Hello. 1969–70 Cypriot Second Division (Q22812255). Is there a way (via a tool) to check whether any of the items and properties used in that item have no label in a specific language? For example, in Spanish. I don't want to read the whole page looking for the word "inglés". Xaris333 (talk) 23:39, 13 October 2016 (UTC)

That's a good idea! --Denny (talk) 04:09, 14 October 2016 (UTC)
SELECT ?prop ?propLabel ?id ?idLabel WHERE {
  wd:Q22812255 ?p ?id .
  ?prop wikibase:directClaim ?p .
   SERVICE wikibase:label {
       bd:serviceParam wikibase:language "es" .
   }
}
Something like this. --Edgars2007 (talk) 09:10, 14 October 2016 (UTC)

It's working. Thanks! Xaris333 (talk) 10:31, 14 October 2016 (UTC)
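For anyone wanting only the entities that actually lack a Spanish label, here is a hedged Python sketch that builds an extended version of Edgars2007's query with a FILTER NOT EXISTS clause; the query shape is illustrative and untested against the live Wikidata endpoint:

```python
# Sketch: build SPARQL text for a "missing label in language X" check.
# FILTER NOT EXISTS keeps only linked entities that have no rdfs:label in
# the target language. Illustrative only; not run against the live endpoint.

def missing_label_query(qid, lang):
    return """
SELECT ?prop ?propLabel ?id WHERE {
  wd:%s ?p ?id .
  ?prop wikibase:directClaim ?p .
  FILTER NOT EXISTS {
    ?id rdfs:label ?l .
    FILTER(LANG(?l) = "%s")
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "%s" . }
}
""" % (qid, lang, lang)

print(missing_label_query("Q22812255", "es"))
```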

babel en-gb broken[edit]

"This user has a native understanding of [[Category:User_en-GB|]]."

The en-gb Babel template is outputting this, including that error at the end. I'm not sure what happened, but I can't seem to find Wikidata:User en-gb. -- numbermaniac (talk) 07:37, 14 October 2016 (UTC)

It's not just en-gb, but for several other Babel language templates too. I don't know what the cause is, though. Jared Preston (talk) 08:32, 14 October 2016 (UTC)
@Numbermaniac, Jared Preston: >>. --Liuxinyu970226 (talk) 10:52, 14 October 2016 (UTC)

Rules for classification property[edit]

Hello. I am thinking about a "Rules for classification" property, but I am not sure how to propose it. For a football league there are some common rules, like:

and some others like

  • head-to-head points
  • head-to-head goal difference
  • head-to-head away goals scored

Some other sports leagues use some of these rules. Others have different rules; volleyball, for example, uses sets won/sets lost.

And all these are in a specific order. For example, in 2016–17 Cypriot First Division (Q23756432) the Rules for classification are: 1) Points; 2) Head-to-head points; 3) Head-to-head goal difference; 4) Head-to-head goals scored; 5) Head-to-head away goals scored (only if two teams); 6) Goal difference; 7) Goals scored; 8) Play-off (only if deciding championship round, relegation round or relegation).

I would really appreciate opinions on how to propose this property.

Xaris333 (talk) 10:42, 14 October 2016 (UTC)

Take a look at properties for this type (P1963) Pauljmackay (talk) 11:04, 14 October 2016 (UTC)
How can this help me? Xaris333 (talk) 11:43, 14 October 2016 (UTC)
So you would add properties for this type (P1963) to the football league item. Then add points for (P1358) and number of points/goals/set scored (P1351), etc as values for that property. So that list then defines a template of properties that any instance of football league should have. Pauljmackay (talk) 17:22, 14 October 2016 (UTC)
I don't think that is what I need. Xaris333 (talk) 17:49, 14 October 2016 (UTC)

@Edgars2007: @Thryduulf: any ideas? Xaris333 (talk) 18:04, 14 October 2016 (UTC)

My very first thought is to have a new "rules for classification" property which takes items like "points" (or "points scored"), "goals scored", "goals conceded", etc. as values, each with a mandatory series ordinal (P1545) qualifier to determine order in which they are applied. I don't know whether this will work or if there are better solutions, nor do I know how we would structurally define the necessary items. Thryduulf (talk) 22:52, 14 October 2016 (UTC)
Yes, that was also my idea. --Edgars2007 (talk) 02:19, 15 October 2016 (UTC)
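As a toy sketch of how that modeling would behave (the rule names are made up for illustration; P1545 values are stored as strings on Wikidata, hence the numeric cast):

```python
# Toy sketch of the proposed modeling: "rules for classification" statements,
# each carrying a series ordinal (P1545) qualifier. Since Wikidata stores
# P1545 values as strings, we sort numerically to recover the order in which
# the rules are applied. Values here are made up for illustration.

statements = [
    {"value": "goal difference",     "P1545": "6"},
    {"value": "points",              "P1545": "1"},
    {"value": "head-to-head points", "P1545": "2"},
    {"value": "goals scored",        "P1545": "7"},
]

order = [s["value"] for s in sorted(statements, key=lambda s: int(s["P1545"]))]
print(order)
```

A consumer would apply the first rule, and fall through to the next one only on a tie.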

I have proposed it this way. Wikidata:Property proposal/rules for classification Xaris333 (talk) 11:53, 15 October 2016 (UTC)

How to specify the target of a diplomatic mission when the target is the UN or the EU?[edit]

The Wikiproject on diplomatic relations tells us to fill in, for Embassy of Algeria, Kiev (Q154663), the properties operator (P137) Algeria (Q262) and country (P17) Ukraine (Q212).

This is perfect in most cases, but it does not work when the diplomatic target is an organization:

How should we indicate the diplomatic targets in the cases above? Syced (talk) 11:47, 14 October 2016 (UTC)

Use valid in place (P3005), applies to part (P518) or applies to territorial jurisdiction (P1001)? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:56, 14 October 2016 (UTC)
Andy, thanks for the feedback! Can the UN/EU/NATO really be considered a place, or a part, or a territorial jurisdiction? To help me understand better, would you mind writing the applies to part (P518) statement for the Permanent Representative of France to the United Nations (Q1155320) example? Thanks a lot! Syced (talk) 05:33, 17 October 2016 (UTC)

Commented out constraint still triggering[edit]

The talk page of SoundCloud ID (P3040), which hasn't been edited since 10 August, includes:

<!-- {{Constraint:Type|classes=Q5,Q16334295,Q1076968|relation=instance}} Overly strict? -->

but the corresponding constraint report, updated today, includes "Type human (Q5), group of humans (Q16334295), digital media (Q1076968) violations". Are the comment markers being ignored? Or is the constraint being picked up from elsewhere? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 16:24, 14 October 2016 (UTC)

Seems that the bot doesn't care about markup. Usually <!--{{Disabled Constraint:...}}--> is the way around. Matěj Suchánek (talk) 17:28, 14 October 2016 (UTC)
@Matěj Suchánek: Thank you. I've applied that in this case, but it would be good if Ivan's bot was fixed. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 17:43, 14 October 2016 (UTC)

Issues with deletions[edit]

Q19590854 was just deleted, as not meeting the notability requirements.

As a general issue, I find the deletion problematic (of course I assume good faith on behalf of the deleting admin). The item was on my watchlist, but I don't recall why, or what it was about. There is no way for me to tell, without asking the deleting admin (or another) to look it up for me; as such requests increase that's likely to become burdensome.

Nor was there any advance notification that I saw, that the item was being considered for deletion, and I don't believe the matter was subject of a discussion.

How might we address these issues? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:49, 15 October 2016 (UTC)

General issues of deletion policy[edit]

Again: How might we address these issues? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 13:38, 15 October 2016 (UTC)

Specifics of Q19590854[edit]

In this period I deleted a lot of items, using the report generated by Pasleim, like this. For this specific item, the data was the following:
no source; the sitelink was deleted on 23 February 2016; no backlinks --ValterVB (talk) 12:32, 15 October 2016 (UTC)
I've had the same problem today. I think the latest label of a deleted item should be shown in some way.--Kopiersperre (talk) 12:54, 15 October 2016 (UTC)
That would be risky though, as some deleted items are privacy violations for example. Sjoerd de Bruin (talk) 12:56, 15 October 2016 (UTC)
@ValterVB: Like I said, I wanted to raise a general issue here. However, in this specific case, a Google search for "Craig Silverstein Google" finds plenty of sources. Did you look for any, before deleting? Please restore the item, which clearly meets our notability criteria. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 13:21, 15 October 2016 (UTC)
The item needs to have sources though. Saying that they exist isn't enough. Sjoerd de Bruin (talk) 13:23, 15 October 2016 (UTC)
Exactly. The item was outside of our notability policy for months and no one added sources, so when I deleted the item, it wasn't notable. --ValterVB (talk) 13:28, 15 October 2016 (UTC)
@ValterVB: My question to you was "Did you look for any, before deleting?" Should I take that as a "no"? Anyway, the Wikipedia item has now been restored. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 13:36, 15 October 2016 (UTC)
That's not what our notability policy says. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 13:36, 15 October 2016 (UTC)
If you mean "Have you searched for sources outside of Wikidata before deleting the item?", the answer is "no", and I don't think that I must do so. --ValterVB (talk) 13:42, 15 October 2016 (UTC)
Same here for Lambda the Ultimate (Q16465918). No way to tell what it was, no formal warning ... nothing. author  TomT0m / talk page 13:37, 15 October 2016 (UTC)
Restored, because I hadn't seen the reference URL (P854) --ValterVB (talk) 13:48, 15 October 2016 (UTC)
That property should only be used in the source section, though. Sjoerd de Bruin (talk) 13:51, 15 October 2016 (UTC)
Seriously, just like that? What's going on here? This is an illustration that Wikidata admins' mistakes, which they don't have to justify at all, can silently delete perfectly good content without anybody being notified ... There should at the very least be a double check before deletion. The creators should be notified and there should be a proper formal request. author  TomT0m / talk page 13:52, 15 October 2016 (UTC)

I restored Craig Silverstein (Q19590854) because the English Wikipedia article has been undeleted. --Epìdosis 13:39, 15 October 2016 (UTC)

  • Our notability policy doesn't speak about whether an item has references. Plenty of items on Wikidata don't have references; that doesn't mean they should be deleted. The fact that he's described as Google's first employee clearly illustrates that he's notable.
I think the history of deleted items should by default be visible to anybody for at least a month.
As far as privacy and defamation are concerned, that issue exists for deleted and undeleted items alike. Such material can still be hidden from the item's history.
I don't think withholding the latest labels and descriptions of deleted items makes sense. ChristianKl (talk) 15:49, 15 October 2016 (UTC)
From our policy: « An item is acceptable if it refers to an instance of a clearly identifiable conceptual or material entity. The entity must be notable, in the sense that it can be described using serious and publicly available references ». A description in an item isn't a "serious and publicly available reference". --ValterVB (talk) 15:59, 15 October 2016 (UTC)
If you can find sources with Google, then the item is one that **can** be described with "serious and publicly available references". The standard isn't that the item **is** described with "serious and publicly available references". Even without googling, it should be obvious that there are "serious and publicly available references" about who Google's first employee happens to be. ChristianKl (talk) 16:49, 15 October 2016 (UTC)
Starting from the end: who said that the item is about the first employee? In the item I find nothing about this. As for the "If you can..." part: where is it written that an admin must search for and add sources that users didn't want to add? The item is judged by the state it is in, not by what it will be or could be --ValterVB (talk) 17:30, 15 October 2016 (UTC)
As you yourself note above: "Description (en): Google's first employee ". Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:58, 16 October 2016 (UTC)
  • Just to note that this discussion pretty much means that we stop deletions. Every Wikidata item, with the exception of obvious vandalism, has external references which in principle could be found. In particular, everything which was on Wikipedia long enough to make it to Wikidata and then got deleted (ValterVB, I and some other admins are working through the lists of deleted items) can be referenced. IMO, the requirement that external references must be searched for before deletion clearly dilutes the spirit of the notability requirement, and amounts to saying that whatever made it to Wikidata can not be deleted.--Ymblanter (talk) 16:03, 15 October 2016 (UTC)
If I write a novel and, as I go about my writing process, put items for all of its characters into Wikidata, the creation of those characters is original content for which no external references exist. I don't think that content belongs on Wikidata, and the notability guideline as it stands says that it doesn't.
As far as "spirit" goes, the purpose of Wikipedia is "to compile the sum of all human knowledge". Information about who happens to be the first employee at Google clearly fits that purpose, and I don't see a reason to have policies excluding it. Wikipedia's notability guideline exists because otherwise a lot of false information would enter Wikipedia. Given Wikidata's ability to limit statements to structured claims, less problematic content gets entered and we can be looser.
To use a more timely example of items that I don't consider notable, take a Facebook group with 16 users. That group is likely not described outside of Facebook by serious sources, and thus it's not good to store data about it on Wikidata; if the H+ Wiki wants to store the data, it's better they host their own database. (see the request above) ChristianKl (talk) 17:09, 15 October 2016 (UTC)
Again, we are not talking about items which individuals put on Wikidata about themselves. We are talking about items which were created on Wikipedia, survived there for some time (typically a week) to be transferred to Wikidata by bots, and were then deleted on Wikipedia. In particular, this article was deleted on Wikipedia for not being notable (it was later reinstated, which is a different issue). Then, if we decide it is notable, or even "obviously notable" because it has some Google hits, we just open a backdoor to Wikidata for non-notable content.--Ymblanter (talk) 17:23, 15 October 2016 (UTC)
Wikipedia has a culture that values defending against the creation of articles that aren't notable according to Wikipedia's notability policy. I think that policy makes sense for Wikipedia. Wikidata, on the other hand, doesn't benefit from having the same notability policy, but benefits from being looser in its notability requirements.
If a bot creates an item about a person named "John Doe" without any statements, it's not clear which John Doe the item refers to, as there are multiple people named John Doe. As such I wouldn't say it's an item about a specific person.
In most cases I don't think it's useful for Wikidata to delete items whose statements clearly specify which person is meant just because the Wikipedia article was deleted. What harm do you think would be caused by not deleting items like this? ChristianKl (talk) 18:54, 15 October 2016 (UTC)
Let me put it like this: in the past, the consensus certainly was that the mere possibility of identifying an item was insufficient to keep it on Wikidata. Even an identifier coming from a database everyone can edit, such as IMDb, was not sufficient. The consensus could have changed, but IMO this should be discussed and established. If the consensus has changed since 2012, we need a new workflow for admins. Currently, my workflow when I decide whether an item needs to be deleted does not include internet searches or, indeed, checking external sources. If it did, I would have time to check maybe one or two deletion candidates per day rather than 10-15. Note that I am currently one of the five most active admins here.--Ymblanter (talk) 19:27, 15 October 2016 (UTC)
For the characters of the novel: do you know the third rule of notability? You can use characters (P674) on the novel's item and no one will delete the items about the characters. On Wikidata we have a very "light" notability policy, but Wikidata doesn't collect facts about everything. --ValterVB (talk) 17:32, 15 October 2016 (UTC)
That would need the novel to be published. If it's unpublished it wouldn't work. ChristianKl (talk) 18:54, 15 October 2016 (UTC)
If it's unpublished but notable for Wikidata, you can use it; where is the problem? --ValterVB (talk) 19:19, 15 October 2016 (UTC)
If it's unpublished and there are no references for it, it might not be notable. In addition, I don't think that every link between two items illustrates a "structural need"; in particular, I don't think characters (P674) does. It might be worthwhile to have a more specific policy on what a structural need happens to be. ChristianKl (talk) 12:30, 16 October 2016 (UTC)

I think the problem is that the deletion process is simply too easy. That had a rationale at the time, when we had a lot of deletion requests because of merging. I think we must consider developing a better deletion policy and process, comparable with those on local wikis: separate speedy ("non-controversial") and normal deletions, and set-up rules for normal deletions (minimum time for an open request, obligation to notify the item creator, etc.).--Jklamo (talk) 17:01, 15 October 2016 (UTC)

We could try, but my feeling is that we have too many items to be deleted and too small a community to avoid massive backlogs.--Ymblanter (talk) 17:25, 15 October 2016 (UTC)
I don't think that is possible; there are too many items to start a discussion for each one. Here you can find a list of lists of candidate items for deletion (not complete; they are the ones I use); we are talking about thousands or tens of thousands of items. If the community does not trust its administrators, I think that we have a problem; naturally, errors are always possible. --ValterVB (talk) 17:41, 15 October 2016 (UTC)
Like you said, mistakes are always possible. You may not realize this because you're an admin and can see deleted content, but there are few things more annoying than seeing that an item on your watchlist has been deleted and having no further information about it. Of course you ask yourself if it was a mistake. If for no other reason, it's a matter of politeness to inform the involved parties of what's going on. author  TomT0m / talk page 17:51, 15 October 2016 (UTC)
If thousands of items get deleted in a semi-automated fashion, I don't trust that no mistakes are made, and it would be useful to have a process for spotting them. If someone has an item on their watchlist and sees it getting deleted, they should have a recourse that allows them to see why the deletion was made and to contest it. ChristianKl (talk) 18:54, 15 October 2016 (UTC)
No one said that we delete items automatically; I check every item before deletion: sources, identifiers, history (to check whether the page on the wiki was really deleted) and "What links here". --ValterVB (talk) 19:17, 15 October 2016 (UTC)
But you don't look for sources - not even a cursory Google search - and don't notify the creator of the item. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:44, 16 October 2016 (UTC)
Yes, I don't search for sources; as I already said: « Where is it written that an admin must search for and add sources that users didn't want to add? The item is judged by the state it is in, not by what it will be or could be ». --ValterVB (talk) 11:05, 16 October 2016 (UTC)
It is written that an item may be deleted when "The item does not meet notability requirements". It is also written that an item meets our notability criteria if it "can be described using serious and publicly available references". If you do not look for such references, and you do not ask the item's creator or the community at large if they have any, then you cannot know that the item meets that criterion for deletion. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:16, 16 October 2016 (UTC)
There are certainly some admins whose actions have shown that trust in them would be misplaced; but even the majority to whom that does not apply, which includes you, are not infallible. In any case, admins should be implementing the consensus of the community, not making unilateral decisions. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:44, 16 October 2016 (UTC)

German label of Q27276042[edit]

Additional views are needed on the German label of Elnaz Golrokh (Q27276042), an Iranian woman.

User:Kopiersperre insists that it must be "Elnaz Golroch"; claiming in an edit summary that de:Wikipedia:Namenskonventionen/Arabisch#Persische Transkription "is authoritative". I do not believe that a German Wikipedia page has authority here.

Furthermore the spelling "Elnaz Golrokh" is used by the subject herself, for example as her Twitter and Instagram user names, and by every .de domain page found by Google.

Once again, this is obviously an issue with wider implications than the single example discussed here; please address these wider issues in your replies. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 18:10, 15 October 2016 (UTC)

It doesn't matter what the web says. Using one transliteration system is the only way to ensure uniformity of labels. If Andy gets his way with this, he will have enforced English-language imperialism against a smaller language.--Kopiersperre (talk) 18:15, 15 October 2016 (UTC)
The spelling "Elnaz Golrokh" is used by the subject herself, so any imperialism here is not mine. Your comment is extremely offensive. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 18:40, 15 October 2016 (UTC)
So we are applying en:Wikipedia:Manual of Style/Arabic to all languages? Why? --Succu (talk) 18:26, 15 October 2016 (UTC)
Who is relying on that? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 18:40, 15 October 2016 (UTC)
Some projects have their own transliteration system, see Wikipedia:Manual of Style/Arabic (Q15868552) --Succu (talk) 19:15, 15 October 2016 (UTC)
Who is relying on that in this case? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:34, 16 October 2016 (UTC)
Elnaz Golrokh is an impossible combination in German, it should be Golroch.--Ymblanter (talk) 18:28, 15 October 2016 (UTC)
The spelling "Elnaz Golrokh" is used by the subject herself. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 18:40, 15 October 2016 (UTC)
It is absolutely irrelevant. Nobody cares. In Russian you will be Энди Маббетт, even if you decide to spell yourself, say, Анди Маббет and start suing everybody who disagrees. Internal rules of the language are internal rules of the language.--Ymblanter (talk) 18:43, 15 October 2016 (UTC)
Absolutely irrelevant what an Iranian woman decides to call herself; we know better than she does what her name is? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:37, 16 October 2016 (UTC)
Ever tried to find out how your name is written in Latvian?--Ymblanter (talk) 11:12, 16 October 2016 (UTC)
This is a discussion about how we write a name in German. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:38, 16 October 2016 (UTC)
I am afraid you just do not get what almost everybody here is trying to explain to you.--Ymblanter (talk) 12:37, 16 October 2016 (UTC)
I "get it" very well, thank you. I just don't happen to agree with arguments based on false premises. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:57, 16 October 2016 (UTC)
Support for Kopiersperre’s position, although I wouldn’t see this dewiki page as “authoritative”. However, it reproduces how Persian -> German transcriptions are typically conducted, and the result “Golroch” is indeed something one can read and spell in the German language (unlike “Golrokh”). Please mind that transcription transforms names into different languages with the aim of producing something which sounds somewhat similar; it is not a transformation into another script. That’s why labels are available for languages, not for scripts. Btw. the same issue applies to many Cyrillic names (etc.): Cyrillic “х” is transcribed as “ch” in German and as “kh” in English. You can put the English transcription as an alias, however. —MisterSynergy (talk) 20:28, 15 October 2016 (UTC)
What does Arabic have to do with this? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:39, 16 October 2016 (UTC)
I missed that Golrokh is apparently from Iran and was confused by the fact that Persian name conventions are part of the mentioned pages titled “Arabic” (or similar). I replaced “Arabic” with “Persian”. —MisterSynergy (talk) 11:43, 16 October 2016 (UTC)
I am tempted to agree with those who aren't Andy Mabbett on this issue, though I believe the spelling which he suggests should be an alias in every language written in a Latin script because of its near-ubiquity. If people like Fyodor Dostoyevsky (Q991) and Muammar al-Gaddafi (Q19878) are allowed to have different transcriptions in each language owing to having a name originally in a non-Latin script, this person should too. Mahir256 (talk) 21:56, 15 October 2016 (UTC)
Duden says that family names in German are not subject to the normal orthographic rules. German media use the name Elnaz Golrokh for her, and nobody knows her under the name Elnaz Golroch. Newspapers, which supposedly have editors who know how the German language works, don't call her Elnaz Golroch. I think the primary German label should be the name under which she's known in Germany, and that happens to be Elnaz Golrokh. ChristianKl (talk) 01:29, 16 October 2016 (UTC)
This is why we have "also known as" options. --EncycloPetey (talk) 03:11, 16 October 2016 (UTC)
No, we absolutely do not have "also known as" options so that we can ignore a woman's decision as to what her own name is. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:37, 16 October 2016 (UTC)
German naming law considers official passports more important than what the person themselves chooses when it comes to a person's name, but in this case there's no passport saying she is named Elnaz Golroch. ChristianKl (talk) 11:42, 16 October 2016 (UTC)
"Also known as" contains the word "also". The woman isn't known under the name Elnaz Golroch. Wikidata and Wikipedia aren't supposed to be primary sources for how a person is named. Their purpose is rather to document reality, and the authoritative German sources name the person Elnaz Golrokh. ChristianKl (talk) 11:42, 16 October 2016 (UTC)

Quite apart from the name of this individual, Google finds about 2,630 results for "Golrokh site:.de" and only 39 for "Golroch site:.de". Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:33, 16 October 2016 (UTC)

Were you doing these Google searches before or after you added the label? In general, it's definitely wrong to copy English transliterations of Cyrillic or Arabic names to German! In the case of Golrokh you may have been lucky because all the media are copying her name from her social media channels, but you were making the same edits on Andrey Aleksandrovitsj Snezjko (Q4425493): for Sneschko site:.de, 93 results; for Snezjko site:.de, 0 results. Kopiersperre fixed the German label, but who takes care of the other 70 labels? --Pasleim (talk) 13:44, 16 October 2016 (UTC)
I agree with "Elnaz Golroch" as the German label and with "Elnaz Golrokh" as an alias (or "also known as") for languages using Latin-based scripts. It does not make sense to do it the other way round. --Daniel Mietchen (talk) 17:29, 16 October 2016 (UTC)
By the way, the "o" letters in "Golrokh" could well be spelled "u" (at least in German), judging from the Farsi "الناز گلرخ". --Daniel Mietchen (talk) 17:33, 16 October 2016 (UTC)

This shouldn't be about the transliteration system used, if the person has effectively taken a (romanised) name, presumably using one transliteration system. If people want to include the values from different transliteration systems, they can be added as qualifiers to the native name. It also shouldn't be about whether a letter combination is "impossible" in a given language. I believe that the "ött" letter combination is "impossible" in English, but I wouldn't dream of changing the English label (or alias) of Niels Böttcher (Q1988963) to Bertcher, Butcher, or Burtcher. It is true that well-known people become known by different transliterations in different languages, but to pre-empt this by labelling or aliasing a name before this has happened smacks of "original research". ChristianKl's research seems to show that Elnaz Golrokh is used by the German media, and there are no sources for Elnaz Golroch. Wikidata should reflect the world as it is described by reliable sources, not the way we would like it to be... Robevans123 (talk) 21:48, 16 October 2016 (UTC)

Quite. Thank you for making the point much more eloquently than I did. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 09:45, 17 October 2016 (UTC)

Help:Label says 'the most common name the item would be known by', not the most correct one. We say Czech Republic, not Czechia. If modern reference books widely prefer a certain spelling, or, in case the item is not yet widely included in reference books, if current media are using a certain spelling, this seems to indicate that it is the most common name for the item. I think what a person calls themselves, or what the descriptive orthographic rules of a language would prescribe, should be relegated to aliases and appropriate properties. While I like the politeness of adhering to the person's own choice, in the end we should strive to be most considerate of our readers, directly and indirectly. --Denny (talk) 18:13, 17 October 2016 (UTC)

Use of Rollback[edit]

User:Sjoerddebruin has removed my rollback right, after I used it on my own talk page to remove a post he left there.

Ironically, in that post, he told me not to use rollback on my own talk page to remove an abusive post. As far as I am aware, it is perfectly acceptable to use rollback on one's own talk page (it's common practice to do so, for example, on en.Wikipedia).

Wikidata:Rollbackers says "Rollback should only be used to revert vandalism and test edits" and "occasional exceptions may apply". I contend that one's own talk page is clearly within the latter. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:31, 16 October 2016 (UTC)

P.S. It seems I also accidentally rolled back an edit on Wikidata:Property creators while I was using my mobile phone; I wasn't aware of doing so and it wasn't my intention; I have self-reverted that edit. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:34, 16 October 2016 (UTC)
P.P.S. It is possibly relevant that there is a false claim here, in a discussion involving Sjoerddebruin, immediately preceding his removal of my rollback right, that I used rollback when in fact I simply reverted. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:51, 16 October 2016 (UTC)
On it.wiki it isn't acceptable to delete posts on a talk page. If a post is defamatory you can ask an admin to hide the text; if it is an error you can use <s>...</s> --ValterVB (talk) 11:13, 16 October 2016 (UTC)
I don't think even Sjoerddebruin is claiming that a user may not delete posts on their own talk page. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:21, 16 October 2016 (UTC)
"I confirm that my use of rollback will comply with the guideline at Wikidata:Rollbackers", see here. Please note that the rollback right was removed before, in April, for the same reason, and was added back in June without community consensus. How in the world is it "perfectly acceptable" to rollback someone's edits on your talk page? And once again, this is Wikidata. In every discussion you point to practices and guidelines on the English Wikipedia when they suit your opinion. But this is Wikidata, another project. The discussion here is not related to this, by the way. Sjoerd de Bruin (talk) 11:22, 16 October 2016 (UTC)
My use of rollback on my own talk page has never before been questioned, much less been the cause of its removal. Nor have I ever seen anyone else's use of rollback on their own talk page be an issue. Perhaps you can provide some links to examples? As noted above, my use of rollback on my own talk page very much was in compliance with the guidelines on this wiki, which allow not only for the "occasional exceptions" noted, but "common sense". You have yet to demonstrate otherwise. Your claims about "every discussion" are transparently false. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:33, 16 October 2016 (UTC)
The simple attitude of Pigsonthewing here shows he is not suited to having rollback. Using a mobile device is no excuse. Also, this belongs at WD:AN...--Jasper Deng (talk) 15:49, 16 October 2016 (UTC)
The decision about whether rollback may be used by an editor on their own talk page is a matter for the whole community, not just for admins. Do you have any evidence of a precedent that it may not? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 16:05, 16 October 2016 (UTC)
  • Reverting a message left on your talk page in good faith with rollback is clearly abuse of the permission. Is it really that hard to just get along with other people? :/ -- Ajraddatz (talk) 00:42, 17 October 2016 (UTC)

I don't think that it was right for Sjoerddebruin to be the one to do it as they were the other party engaged in the concurrent dispute/edit war with Andy, and it was their message that Andy rolled back. I tend to agree that removal was correct, but should have been done by someone who was not involved.
The edit summary is not acceptable at all - firstly it's completely unnecessarily offensive and confrontational, and secondly it's clearly factually inaccurate (whatever the rights and wrongs of doing so, Sjoerddebruin did remove Andy's rollback for the reason cited). Edit summaries should be a (reasonably accurate) summary of the edit made, not commentary on a preceding edit* or other editors. (*except if the edit was made solely to correct a previous edit or edit summary). Thryduulf (talk) 08:56, 17 October 2016 (UTC)

  • Using rollback to remove good-faith messages on your (public) talk page means that the sender will be notified that his contribution has been reverted. That is unnecessarily offensive, and a clear abuse of the rollback right, even if not strictly forbidden. Rights have been abused by Pigsonthewing before, for edit warring. He clearly does not have sufficient intrinsic etiquette to bear these rights. Sadly. I think Sjoerddebruin was right to remove the rights, although I agree with Thryduulf that it would have been better if he had left this to someone else. Lymantria (talk) 06:41, 18 October 2016 (UTC)

Scientific Articles[edit]

Hi! Recently I've created three items about scientific articles (Chaotic lensing around boson stars and Kerr black holes with scalar hair (Q27314892), Shadows of Kerr black holes with scalar hair (Q27333160), Kerr-Newman black holes with scalar hair (Q27315881)) linked to a certain author (Carlos Herdeiro (Q9697128)). All the articles are deposited in arXiv. Since I can hardly find anything about it, I would like to know if there is any:

  • WikiProject that would be dedicated to the import of scientific articles' data;
  • tool to automatise that import;
  • rule or role model for what structure those items must have.

edit: I would also like to ask your opinion on the translation of the titles. Should they be translated into other languages, kept in English, or should I do nothing at all about them?

Thanks for your answers in advance and sorry for the inconvenience. - Sarilho1 (talk) 10:39, 16 October 2016 (UTC)

@Daniel Mietchen, James Hare (NIOSH):, though I don't think they've imported articles specifically from arXiv. Mahir256 (talk) 16:28, 16 October 2016 (UTC)
@Sarilho1, James Hare (NIOSH), Mahir256: There is Wikidata:WikiProject Source MetaData, which has a data model for scholarly articles, but the focus of activities so far was on journal articles, for which we have some tools. The dataset that I am working on does have arXiv identifiers, though, which may be a good starting point for you to explore. There is no policy re translation yet, and I have seen both approaches. For non-English articles, a translation to English is often available for the title, whereas English-language articles rarely have that, so their English title is usually used for the other Wikidata languages as well. --Daniel Mietchen (talk) 17:21, 16 October 2016 (UTC)
Thank you both for the help. I will look into it. - Sarilho1 (talk) 17:25, 16 October 2016 (UTC)

External ID proposal[edit]

More eyes are needed on Wikidata:Property proposal/Supermodels.nl. Specifically, is it acceptable to have a property for an identifier derived from a "private/commercial" website? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:29, 16 October 2016 (UTC)

Poland properties template (translations needed)[edit]

I have just created {{Poland properties}}. Please can someone add Polish translations of the template's labels? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 15:29, 16 October 2016 (UTC)

Template help needed[edit]

Please see Wikidata talk:Database reports/Humans with missing claims#Talk page template. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 16:26, 16 October 2016 (UTC)

Wrong nationalities[edit]

I'm removing some 350 country of citizenship (P27)  Italy (Q38) from people who died before 1861, when the modern Italian state was created; at the end of July I did it for thousands of items. I've noticed that most of these new country of citizenship (P27)  Italy (Q38) have been added either through Wikidata Game (this one, another one ...) or through import from Italian Wikipedia. Would it be possible to reduce new wrong statements at least from these two sources? Thank you, --Epìdosis 19:17, 16 October 2016 (UTC)
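For finding such cases systematically, a query on query.wikidata.org along these lines could help; this is only a sketch, using the property and item IDs mentioned above (P27 country of citizenship, Q38 Italy, and P570 date of death):

```sparql
# People with country of citizenship = Italy (Q38)
# whose date of death (P570) falls before 1861
SELECT ?person ?personLabel ?death WHERE {
  ?person wdt:P27 wd:Q38 ;
          wdt:P570 ?death .
  FILTER(YEAR(?death) < 1861)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
ORDER BY ?death
```

The result list could then be reviewed by hand or fed into a batch-editing tool, rather than correcting items one by one.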

I requested an edit filter for the Dutch situation a while ago. Maybe you can request one on Wikidata talk:Abuse filter. Sjoerd de Bruin (talk) 19:25, 16 October 2016 (UTC)
@Epìdosis: Do you have a list of possible citizenships according to location and date for Italy? Nature abhors a vacuum. Snipre (talk) 20:19, 16 October 2016 (UTC)
Just removing them isn't the right approach. Replace it with the more precise country. Otherwise you'll just get into an endless loop. Multichill (talk) 20:31, 16 October 2016 (UTC)
The concepts of nationality and citizenship are not as old as mankind. If there are no suitable predecessors of modern Italy, it might be worth considering adding no value-claims for country of citizenship (P27). The property would then be kind of “occupied” and probably no longer be offered for data import by these tools. —MisterSynergy (talk) 20:48, 16 October 2016 (UTC)
It is the same situation as in Germany, see File:Italy 1843.svg. The second example, Cento, seems to be near Ferrara, which on that map is part of the Papal States. I don't think such information should be added by this tool. --Molarus 21:23, 16 October 2016 (UTC)
The most complete list of historic Italian states is it:Antichi Stati italiani: as you can see, the tables are really complex, so I can't manually correct hundreds of items. I replaced Italy (Q38) with Kingdom of Italy (Q172579) for those who died between 1861 and 1946, but for those who lived before 1861 I don't have a definite solution at the moment. --Epìdosis 21:29, 16 October 2016 (UTC)
So the best is to replace the wrong value with "some value" until correct data can be provided with a source, and I hope that the tool does not consider that value an empty value. The solution of "no value" is not the best one: even if citizenship was not defined before the 19th century, special rights and duties similar to those of citizenship existed in former European states (taxes, the right to bear weapons or to be enrolled in the army, the right to be elected to some councils, ...). In some cases we can find data which can be considered citizenship, and in those cases we should be able to add this information. The main problem is always sources. Snipre (talk) 08:55, 17 October 2016 (UTC)
We also have to think about infoboxes using those statements and how the Wikidata module responds to a "some value" value. I do not know, since I have no experience with that module. By the way, most people born in the Kingdom of Italy were citizens of Italy a few years later. Therefore, to be 100% right, both nationalities plus qualifiers should be in the item. Which might cause problems for the Wikidata module. At least the cycling module has a lot of special Lua code to show the right flag for cycling races, if the data about nationality is complete and formatted correctly. In my view, there are only two cases regarding nationality in Wikidata: the easy cases and the false cases. Maybe Italy as nationality is not that bad after all. I mean, it is obviously wrong for people living before 1861, but everyone knows that. --Molarus 11:08, 17 October 2016 (UTC)
There are also more modern cases; e.g. it is possible for someone to have been a national of Kingdom of Yugoslavia (Q191077), Socialist Federal Republic of Yugoslavia (Q83286), Federal Republic of Yugoslavia (Q838261), Serbia and Montenegro (Q37024) and Serbia (Q403) at various times in their life without having moved. It is not correct to simply say they have been a citizen of Serbia (Q403) their whole life, even if that makes it simple. Thryduulf (talk) 12:04, 17 October 2016 (UTC)

Q38 is just "Italy". If someone wants to use an identifier with more specific meanings, such as the current Republic of Italy or the Kingdom of Italy in 1861 or 1870 (or 476 or 493 or 568 or 800 or 1805 or others), or for some mix thereof, they should create a separate item. Nemo 12:51, 17 October 2016 (UTC)

Comment This is a significant matter and applies to MANY cases that need a resolution: United Kingdom / United Kingdom of Great Britain and Ireland / Kingdom of Great Britain / Kingdom of Scotland; add the complexity of Commonwealth nationals being British subjects prior to 1949, the Irish Republic, etc. There is a significant matter to resolve, and it needs more than removal of incorrect values and piecemeal fixing. It needs discussion, direction, and exception reports to be done properly  — billinghurst sDrewth 05:09, 18 October 2016 (UTC)

German states prior to WW1, those nations that didn't have citizenship until they had a semblance of national government: so many examples  — billinghurst sDrewth 05:12, 18 October 2016 (UTC)
This may be a less important case, but to collect them in one place... For example, Rūdolfs Blaumanis (Q1082044), a real Latvian. He died before Latvia was independent, so I have to put country of citizenship (P27)=Russian Empire (Q34266) and I can't put country of citizenship (P27)=Latvia (Q211). That means that if I query for Latvian guys and girls, Rūdolfs Blaumanis (Q1082044) will be excluded... Not good. OK, I could get him via other properties: languages spoken, written or signed (P1412), but that would also find many false positives, or name in native language (P1559), but hmm... there are some cases where this won't work. And if somebody who isn't so knowledgeable about this makes a query, he will be disappointed. If some bot goes and adds labels from P27 and P106 and converts Russian Empire to Russia, that is also not good, actually very bad. --Edgars2007 (talk) 05:56, 18 October 2016 (UTC)
@Edgars2007 Have you considered ethnic group (P172)? --Njardarlogar (talk) 07:54, 18 October 2016 (UTC)
At en:WP they write born in Ergli, Russian Empire, now Latvia, and died in Punkaharju, Russian Empire, now Finland. To keep it simple, we could create a new property to record the current country where the item is located (P17: Russian Empire, Pnew: Latvia). By the way, take old cities, for example Roman cities. At the moment we say those cities, gone 2000 years ago, have P17 with Italy, Germany or France as value. Actually it should be P17 with the Roman Empire as value. A new property would help a lot. Or we turn things around and say that P17 gives the current country and Pnew the situation at an earlier time. Then Pnew for Rūdolfs Blaumanis would be Russian Empire, and had he lived before 1721, Pnew would be the Swedish Empire. This way we would not have to change a lot. --Molarus 09:55, 18 October 2016 (UTC)
Njardarlogar, yes, I have looked at that property, but it seems one of those sensitive properties. Molarus, I hope you aren't suggesting changing place of birth from city to country, are you? :) But I have seen (probably from ruwiki bot imports) items with the claim "place of birth (P19): city" with qualifiers located in the administrative territorial entity (P131) and country (P17), the latter with the country that existed at that time. I don't think placing both the country at the time and the current country is needed. You can get the current country from SPARQL queries. And anyway, this doesn't resolve my use case of querying Latvians; I already mentioned why in my previous post. --Edgars2007 (talk) 16:22, 18 October 2016 (UTC)

What's the normal mode for addition of new humans & links to en.wikipedia biography articles[edit]

All the best questions, such as this, sound stupid. Other than users manually (or manually via a gadget) adding links from Wikidata to en.wikipedia biography articles, and other than users manually (or via a manually invoked gadget) adding new instances of humans to Wikidata as a concomitant of adding a new Wikipedia biography article... how are new human instances and new links to en.wikipedia added to Wikidata? Do we have one or more bots doing this? (I ask because the en.wikipedia project Women in Red is using a Wikidata-based count to determine how many more articles about women are added each month, and I've yet to understand whether we have a dependency on editors knowing that, in addition to their work on an article, a Wikidata record is required.) I hope that all makes some sort of sense; thanks --Tagishsimon (talk) 22:35, 16 October 2016 (UTC)

The Wikidata record is created automatically if the en.wikipedia article is linked to an article in another-language Wikipedia via the "Add links" link in the left navigation bar. Other than that, yes, there are bots running that may do this, but to be safe it may be best to create the record yourself (or first check if the Wikidata record already exists; there may be articles about this person in other languages already). ArthurPSmith (talk) 18:08, 17 October 2016 (UTC)
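As an aside, the kind of Wikidata-based count mentioned in the question can be approximated with a SPARQL query on query.wikidata.org. This is only a sketch of the general pattern, not necessarily the exact metric Women in Red uses:

```sparql
# Count items that are human (P31 = Q5), female (P21 = Q6581072),
# and have a sitelink to the English Wikipedia
SELECT (COUNT(DISTINCT ?item) AS ?count) WHERE {
  ?item wdt:P31 wd:Q5 ;
        wdt:P21 wd:Q6581072 .
  ?article schema:about ?item ;
           schema:isPartOf <https://en.wikipedia.org/> .
}
```

Note the dependency this implies: an article only enters such a count once its Wikidata item exists and carries the relevant statements, which is exactly the concern raised above.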

Confusing/wrong interwikis[edit]

I was looking at the interwiki at en:Wikipedia:Guidance for younger editors (a), which listed an interwiki to a different topic on ms.wiki, ms:Wikipedia:Melindungi privasi kanak-kanak (b), which is identical to en:Wikipedia:Protecting children's privacy (c). Looking at their Wikidata items, Wikipedia:Guidance for younger editors (Q13575670) has (a) but doesn't have (b), while Wikipedia:Protecting children's privacy (Q13417598) correctly has both (b) and (c). My question is then: why does (a) link to (b)? I don't see any recent edits on either Q13575670 or Q13417598. Bennylin (talk) 07:50, 17 October 2016 (UTC)

Old-style interwikis still exist in some places and should be removed, because they create situations like these. Sjoerd de Bruin (talk) 07:52, 17 October 2016 (UTC)

Ranking order[edit]

Wikidata:Property proposal/rules for classification.

Example by @Thryduulf:: "if person A has 10 points and person B 20 points, which one wins depends on the sorting order - if they are points for wins then person B wins, but if they are penalty points then person A wins."

The question is how we can have a property to show whether a rule ranks in increasing or decreasing order. Sometimes the bigger amount is first, sometimes the bigger amount is last. Xaris333 (talk) 09:09, 17 October 2016 (UTC)

WDQ / SPARQL: Beginner's question[edit]

Would anybody be so kind as to help me with a query? I have no experience with them and don't know SPARQL (yet), but I'd like to learn.

What I'm trying to do is get all Wikidata items that have Teuchos ID (P2018) with a string starting with "P-", then add described by source (P1343)  Philologisches Schriftsteller-Lexikon (Q27357514) to all of them.

How do you translate this into a query? Jonathan Groß (talk) 09:42, 17 October 2016 (UTC)
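For the first half (finding the items), a query roughly like the following should work on query.wikidata.org; a sketch, where STRSTARTS does the "starts with" filtering:

```sparql
# Items whose Teuchos ID (P2018) starts with "P-"
SELECT ?item ?teuchosId WHERE {
  ?item wdt:P2018 ?teuchosId .
  FILTER(STRSTARTS(?teuchosId, "P-"))
}
```

Adding the described by source (P1343) statements to the results would then be a separate batch-editing step, e.g. with a tool such as QuickStatements.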

Wikidata:Request a query is a better place to ask this question. --Pasleim (talk) 09:45, 17 October 2016 (UTC)
I hope the described by source (P1343) statements will contain specific qualifiers, though. Sjoerd de Bruin (talk) 09:47, 17 October 2016 (UTC)

@Pasleim: Thank you for the hint. @Sjoerddebruin: They shall indeed. Jonathan Groß (talk) 10:02, 17 October 2016 (UTC)

Merge problems: Ferrero[edit]

I can't merge en:Ferrero with the other languages, for example. Can someone help? 178.11.10.150 13:25, 17 October 2016 (UTC)

Why do you want to merge a company and a disambiguation page? Sjoerd de Bruin (talk) 13:32, 17 October 2016 (UTC)
It's not the company side, if you have a look at the English page Ferrero. 178.11.10.150 15:27, 17 October 2016 (UTC)
Sorry, no idea how this happened. Sjoerd de Bruin (talk) 17:32, 17 October 2016 (UTC)
So I guess we are talking about Ferrero (Q21493848) (family name, enwiki sitelink; is in fact a disambiguation page) and Ferrero (Q1407854) (disambiguation page, no enwiki sitelink). However, I’m not sure whether merging is the best idea here. —MisterSynergy (talk) 15:48, 17 October 2016 (UTC)
You can't merge because a disambiguation page is different from a page about persons with the same surname. --ValterVB (talk) 17:24, 17 October 2016 (UTC)
But as the English article listed more than people sharing the same surname, it was a disambiguation page. So I moved the English sitelink to Ferrero (Q1407854), and Ferrero (Q21493848) (the family name) is without a sitelink, which isn't a problem. But we need to keep family names and disambiguation pages separate, so no merging the two! --Harmonia Amanda (talk) 06:25, 18 October 2016 (UTC)
No, the English page isn't a disambiguation page. In fact you can't find it in en:Special:DisambiguationPages. It's a rational choice; mixing surname or name pages with disambiguation pages is an error. Just an example to clarify: en:Bacon (name) and en:Bacon (disambiguation). Why are they separated? Because they are different things, and we can't mix them. --ValterVB (talk) 06:42, 18 October 2016 (UTC)
Last thing: in en:Category:Surnames it is clearly written « However, do not use the template on disambiguation pages that contain a list of people by family name ».
As I am the one who is currently disentangling family names and disambiguation pages, I know that quite well... Usually when it's a surname page, other uses are under a "See also" section, not an "other" section at the same level as the "persons" one (as if all uses were equal for the article, as if the article were a disambiguation page). But in this case, it's probably easier to correct it on the English Wikipedia so that it's more clearly about the family name and the other uses are more of a "by the way, that exists too, but it's not the subject here". I mostly try not to modify articles on Wikipedias just to make them cleaner from a Wikidata point of view, but, hey, if in this case you prefer it, not a problem at all; I just don't want family names and disambiguation pages getting mixed. --Harmonia Amanda (talk) 06:55, 18 October 2016 (UTC)

How to get the Wikidata ID from an article[edit]

Well, I can't figure this out. And I can't find it. How can I get the Wikidata ID of an article in order to use it in templates? For example, if I want to know automatically that Belgium is Q31... what would be the query/code/way? Totally stuck with this. -Theklan (talk) 17:50, 17 October 2016 (UTC)

Not possible at the moment. Matěj Suchánek (talk) 17:52, 17 October 2016 (UTC)
Wow! I was going mad! But this is strange, because in the left column of Wikipedia we can find a link to the Wikidata item of each article! -Theklan (talk) 18:04, 17 October 2016 (UTC)

On the Greek Wikipedia there is a preferences gadget for showing the Wikidata ID and label under the title of the article. Xaris333 (talk) 19:43, 17 October 2016 (UTC)

Within an article this is not a problem to my knowledge. Module:Wikidata (Q12069631) is available in many projects and has this functionality via {{#invoke:Wikidata|pageId}}. This can be used in a template to determine the Wikidata-ID of the page which transcludes this template (i.e. on Belgium you’d get "Q31"). However, things are different on an arbitrary page which is not Belgium, on which one wants to have the Wikidata-ID of Belgium… right? —MisterSynergy (talk) 20:01, 17 October 2016 (UTC)
Yes, MisterSynergy is right. On the article about Belgium you can only get "Q31" via templates/modules. With JavaScript there isn't such a problem, but that won't help in this case, I suppose. --Edgars2007 (talk) 06:02, 18 October 2016 (UTC)

Petscan help[edit]

Hello. I want, maybe with PetScan [5]:

1) to use the Wikidata items of the articles of a specific category, for example en:Category:Cypriot First Division seasons;

2) to find which of those items are not using, for example, followed by (P156).

Xaris333 (talk) 19:48, 17 October 2016 (UTC)

1) Use wiki: Wikidata
2) (None) "P156" in "Uses items/props". Of course, you can use SPARQL, but I personally use "Uses items/props" for simple cases. --Edgars2007 (talk) 06:08, 18 October 2016 (UTC)

Wikidata weekly summary #231[edit]

value sorting according to qualifier[edit]

Is there a way to sort the values of a property according to their qualifiers? I am interested in the browser view of item data.

To be specific: ascending/descending sort of Elo ratings according to their date qualifier, not the date they were added (see Vereslav Eingorn (Q2062580) for a chess player whose Elo ratings are unsorted). Thanks. --Wesalius (talk) 07:28, 18 October 2016 (UTC)
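The item view itself cannot be re-sorted, but for reading the data in order a query works. A sketch, assuming the ratings are stored as Elo rating (P1087) statements with point in time (P585) qualifiers:

```sparql
# Elo ratings of Vereslav Eingorn (Q2062580),
# sorted by their "point in time" (P585) qualifier
SELECT ?elo ?date WHERE {
  wd:Q2062580 p:P1087 ?statement .
  ?statement ps:P1087 ?elo ;
             pq:P585 ?date .
}
ORDER BY ASC(?date)
```

The p:/ps:/pq: prefixes walk through the statement node, which is what gives access to the qualifier values.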

No, currently it's not possible. There should be a phab ticket about it, which I can't find at the moment. --Edgars2007 (talk) 16:25, 18 October 2016 (UTC)

Wikimedia Developer Summit[edit]

Hello folks,

The Wikimedia Developer Summit will take place in San Francisco on January 9-11, 2017. All Wikimedia technical contributors, third-party developers, and users of MediaWiki and the Wikimedia APIs are welcome.

If you're interested, please note that the deadline to request travel sponsorship is Monday, October 24th. Lea Lacroix (WMDE) (talk) 09:02, 18 October 2016 (UTC)

Correct claims for a cabinet[edit]

I'm trying to cleanup the items identifying Danish Cabinets, i.e. instance of (P31)  Cabinet of Denmark (Q1503072).

So before I started cleaning them up, the ones that had dates assigned used three different strategies:

  1. start time (P580)/end time (P582) as qualifiers on the instance of (P31)
  2. start time (P580)/end time (P582) as statements directly on the item
  3. inception (P571) as statement on the item directly (should probably be augmented with dissolved or abolished (P576), but that was not the case)

While I'm halfway done streamlining these according to strategy number 2, I started to have doubts and have been searching far and wide to understand what the consensus in this area would be.

I've sort of given up on the project chat archive and therefore bring the question here :-) --VicVal (talk) 11:27, 18 October 2016 (UTC)
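Whichever strategy the consensus turns out to favour affects how the data is queried. As a sketch, with strategy 2 (start time (P580) and end time (P582) as direct statements) the cabinets and their dates can be listed like this:

```sparql
# Danish cabinets (P31 = Q1503072) with their start/end dates
# stored as direct statements (strategy 2)
SELECT ?cabinet ?cabinetLabel ?start ?end WHERE {
  ?cabinet wdt:P31 wd:Q1503072 .
  OPTIONAL { ?cabinet wdt:P580 ?start . }
  OPTIONAL { ?cabinet wdt:P582 ?end . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
ORDER BY ?start
```

With strategy 1, the same dates would instead have to be read from qualifiers on the P31 statement (via the p:/pq: prefixes), which is noticeably more awkward.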