Language codes used by Wikimedia projects

Wikimedia projects use language codes to identify the language-specific editions of a project. The code is often used as a subdomain: for the Wikipedia editions, for example, it is the subdomain below wikipedia.org. Interlanguage links in the English Wikipedia are sorted by that code.
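
As a rough illustration of the subdomain convention, the sketch below builds a host name from a language code and orders a few codes with a plain string sort. The function name `edition_host` and the sample list are hypothetical, and plain string sorting is only an assumption about how "sorted by that code" works; only the `<code>.wikipedia.org` pattern comes from the text above.

```python
def edition_host(code: str, project: str = "wikipedia") -> str:
    """Return the host name for a language edition.

    Assumption: the edition lives at <code>.<project>.org, as described
    for Wikipedia above; other projects may be laid out differently.
    """
    return f"{code.lower()}.{project}.org"


# Sample codes taken from this page; sorted() orders them as plain strings.
codes = ["zh-yue", "als", "sq", "be-x-old", "no"]
hosts = [edition_host(c) for c in sorted(codes)]
```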

The codes mostly correspond to the language codes defined by ISO 639-1 and ISO 639-3, and the decision of which language code to use is mostly in accordance with the IETF language tag policy.

One code, 'be-x-old', is not a language code but refers to a specific orthography.

Codes that do not agree with the ISO 639 meaning or are deprecated

Several codes are used for project editions where the ISO 639 meaning is different from the content contained in the project.

This makes the codes partially "not useful" [1].

Project codes

WP edition name | WP code | ISO 639 code | Meaning of WP code if read as ISO 639 code | Note
Albanian | sq | als | Broader meaning than edition content: Macrolanguage, Albanian with four individual languages, but Tosk Albanian is mainly used. | 'als' is the individual code for Tosk Albanian, but this code is already used for Alemannic Wikipedia.
Alemannic | als | gsw | Unrelated language: Tosk Albanian [2]. | 'gsw' is the ISO 639-2 and -3 code for Swiss German, Alemannic and Alsatian. 'gct' is the ISO 639-3 code for Colonia Tovar dialect, 'swg' is for Swabian German and 'wae' is for Walser German.
Arabic | ar |  | Broader meaning than edition content: Macrolanguage 'ara'/'ar' [3] is Arabic in general. | Egyptian Arabic Wikipedia uses code 'arz'. Modern Standard Arabic has the code 'arb'.
Aromanian | roa-rup | rup | (Not an ISO code.) | 'roa' in ISO is Romance (Other).
Banyumasan | map-bms | - | (Not an ISO code.) | 'map' is Austronesian (Other), 'bms' is Bilma Kanuri, a language of Niger.
Dutch Low Saxon | nds-nl | (several) | (Not an ISO code.) | 'nds' is 'Low Saxon', restricted to Germany in Ethnologue. The Low Saxon dialects in the Netherlands have their own ISO codes.
Belarusian (Taraškievica) | be-x-old | - | (Not an ISO code.) | Valid would be 'be-tarask'.
Bihari | bh | (several) | Broader meaning than edition content: collective code 'ISO 639:bih' includes Bhojpuri 'bho', Maithili 'mai', Magahi 'mag' and nine others [4]. |
Yue | zh-yue | yue | (Not an ISO code.) | This is a Chinese Wikipedia in written Cantonese.
Chinese | zh | cmn | Broader meaning than edition content: Macrolanguage 'zho'/'zh' [5] is Chinese in general. | Chinese Wikipedia is written in modern written vernacular Chinese with four standard forms.[a] Modern written vernacular Chinese is based on modern Mandarin ('cmn').
Classical Chinese | zh-classical | lzh | (Not an ISO code.) |
Min Nan | zh-min-nan | nan | (Not an ISO code.) | 'min' is unrelated. This is a Chinese Wikipedia in written Hokkien POJ.
Malay | ms | - | Broader meaning than edition content: Macrolanguage that includes more than 30 individual languages/dialects. | The wiki excludes Indonesian because Indonesian Wikipedia ('id') exists independently.
Norman | nrm | - | Unrelated language: Narom [6]. |
Norwegian Bokmål | no | nb | Broader meaning than edition content: Norwegian in general, i.e. Bokmål ('nb'/'nob') and Nynorsk ('nn'/'nno'). | Nynorsk correctly uses 'nn'.
Ripuarian | ksh | - | Narrower meaning than edition content: Kölsch, one variety of the Ripuarian languages. |
Samogitian | bat-smg | sgs | (Not an ISO code.) | 'bat' is Baltic (Other), 'smg' is Simbali language.
Serbo-Croatian | sh | hbs | Deprecated: 'sh' for the Serbo-Croatian Wikipedia has been deprecated [7]. |
Simple English | simple | en | (Not an ISO code.) |
Syriac | arc | syc | Not the best match. | 'syc' is better suited for the content [8].
Tarantino | roa-tara | - | (Not an ISO code.) | 'roa' is Romance (Other).
Võro | fiu-vro | vro | (Not an ISO code.) |
Zamboanga Chavacano | cbk-zam | - | (Not an ISO code.) | 'cbk' is Chavacano. 'zam' is the unrelated Miahuatlán Zapotec.
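
The entries marked "(Not an ISO code.)" above all differ in shape from an ISO 639-1 or 639-3 code, which is two or three lowercase letters. A minimal sketch of that shape test follows; the helper name `has_iso_639_shape` is hypothetical, and the check says nothing about whether a code is actually registered or what it means.

```python
import re

# Two or three lowercase ASCII letters: the shape of ISO 639-1 and 639-3 codes.
_ISO_639_SHAPE = re.compile(r"[a-z]{2,3}")


def has_iso_639_shape(wp_code: str) -> bool:
    """True if wp_code is shaped like an ISO 639-1 or 639-3 code.

    This checks form only. It cannot tell that 'als' or 'nrm' are real
    ISO codes whose meaning differs from the edition's content.
    """
    return _ISO_639_SHAPE.fullmatch(wp_code) is not None


# A few WP codes from the table; keep those that cannot be ISO 639 codes.
sample = ["sq", "als", "roa-rup", "simple", "zh-min-nan", "ksh"]
not_iso = [c for c in sample if not has_iso_639_shape(c)]
```

Codes such as 'als' and 'nrm' pass this test even though the table lists them as mismatches, so a shape check can only flag the "(Not an ISO code.)" rows, not the rows whose ISO meaning is unrelated, broader or narrower.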

Internal code changes

The following codes have been changed for the mark-up [9]:

  • bat-smg -> sgs (wikipedia)
  • fiu-vro -> vro (wikipedia)
  • zh-classical -> lzh (wikipedia)
  • zh-min-nan -> nan (wikipedia, wiktionary, wikibooks, wikiquote, wikisource)
  • zh-yue -> yue (wikipedia)
  • be-x-old -> be-tarask (wikipedia)
  • als -> gsw (wikipedia, wiktionary, wikibooks, wikiquote)
  • roa-rup -> rup (wikipedia, wiktionary)
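
The list above can be read as a lookup table. The sketch below is one way to encode it, assuming the projects in parentheses are the only ones where each change applies (the source does not say so explicitly); `MARKUP_CODES` and `markup_code` are hypothetical names.

```python
# WP code -> (mark-up code, projects where the change applies), copied from the list above.
MARKUP_CODES = {
    "bat-smg": ("sgs", {"wikipedia"}),
    "fiu-vro": ("vro", {"wikipedia"}),
    "zh-classical": ("lzh", {"wikipedia"}),
    "zh-min-nan": ("nan", {"wikipedia", "wiktionary", "wikibooks",
                           "wikiquote", "wikisource"}),
    "zh-yue": ("yue", {"wikipedia"}),
    "be-x-old": ("be-tarask", {"wikipedia"}),
    "als": ("gsw", {"wikipedia", "wiktionary", "wikibooks", "wikiquote"}),
    "roa-rup": ("rup", {"wikipedia", "wiktionary"}),
}


def markup_code(wp_code: str, project: str) -> str:
    """Return the code used in mark-up for a WP code on a given project.

    Codes or projects that are not listed are returned unchanged
    (an assumption of this sketch).
    """
    entry = MARKUP_CODES.get(wp_code)
    if entry is None:
        return wp_code
    new_code, projects = entry
    return new_code if project in projects else wp_code
```

Under these assumptions, `markup_code("als", "wikipedia")` gives 'gsw', while `markup_code("als", "wikisource")` leaves 'als' as it is.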

Redirects

  • 'dk' redirects to 'da' [10]
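
A redirect like this can be modelled as a one-step alias lookup applied before the code is used. The names `REDIRECTS` and `resolve_code` below are hypothetical, and the table holds only the single redirect listed here.

```python
# Alias -> target code; only the redirect listed on this page is included.
REDIRECTS = {"dk": "da"}


def resolve_code(code: str) -> str:
    """Follow a known redirect once; return other codes unchanged."""
    return REDIRECTS.get(code, code)


# Build a Wikipedia host name from a possibly redirected code.
host = f"{resolve_code('dk')}.wikipedia.org"
```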

Usage in Wikimedia projects

Project-specific codes

Wikidata

Wikidata has a property named "Wikimedia language code" [12].

References

  a. They are the standards for Mainland Chinese and Singaporean (written in simplified Chinese characters), and Taiwanese and Hong Kong/Macau (in traditional Chinese characters). They are automatically converted by the wiki machine. See Chinese Wikipedia.
