
Special language codes

From Meta, a Wikimedia project coordination wiki

The language of a Wikimedia wiki can be found in the lang="..." and xml:lang="..." attributes of the <html> element of each page (or of other elements, for specific parts of multilingual pages); these attributes are also used for styling through CSS language selectors. These language codes should generally be canonical language tags as defined by BCP 47.
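As an illustration of where these codes live, the following sketch reads the lang and xml:lang attributes of the <html> element using only Python's standard html.parser module. The names HtmlLangParser and page_language are hypothetical, and the sample markup is a made-up, trimmed-down page rather than real wiki output. The lang value found this way is the one a CSS :lang() selector is matched against.

```python
from html.parser import HTMLParser
from typing import Optional, Tuple


class HtmlLangParser(HTMLParser):
    """Records the lang and xml:lang attributes of the first <html> start tag."""

    def __init__(self) -> None:
        super().__init__()
        self.lang: Optional[str] = None
        self.xml_lang: Optional[str] = None
        self._seen_html = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this hook.
        if tag == "html" and not self._seen_html:
            self._seen_html = True
            values = dict(attrs)
            self.lang = values.get("lang")
            self.xml_lang = values.get("xml:lang")


def page_language(html_text: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (lang, xml:lang) of the page's <html> element; None where absent."""
    parser = HtmlLangParser()
    parser.feed(html_text)
    parser.close()
    return parser.lang, parser.xml_lang


# Made-up markup for a wiki whose subdomain (als) differs from its lang value.
sample = '<!DOCTYPE html><html lang="gsw" xml:lang="gsw" dir="ltr"><head></head></html>'
print(page_language(sample))  # ('gsw', 'gsw')
```

Attributes on other elements (for example a lang on a nested <p>) are ignored here; a multilingual page would need those handled separately.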

In most cases, the subdomain names that we use for projects correspond to language codes, but some exceptions remain for historical reasons. For example, in some cases no valid ISO 639 or BCP 47 code was available yet when the project was created.

As BCP 47 must keep existing codes for compatibility, deprecated or removed ISO 639 codes are still considered valid in BCP 47. However, to avoid various kinds of complications, they should not be used in new content. Please follow the recommendations listed in the tables below.

Subdomains that do not match their lang attribute

Subdomain Language Project(s) Notes
als Local name: Alemannisch
English name: Alemannic
Language family: Germanic
Project(s): Wikipedia, Wiktionary, Wikibooks, Wikiquote
Notes: Uses gsw, which matches the language's ISO 639-3 code.
bh Local name: भोजपुरी
English name: Bihari
Language family: Indo-Aryan
Project(s): Wikipedia
Tracked in Phabricator: Task T41968 (stalled)
Notes: Ambiguous legacy code. Uses bho, the ISO 639-3 code for Bhojpuri, which is one of the Bihari languages.
roa-rup Local name: armãneashti
English name: Aromanian
Language family: Italic
Project(s): Wikipedia, Wiktionary
Notes: Uses rup, which matches the language's ISO 639-3 code.
simple Local name: Simple English
English name: Simple English
Language family: Germanic
Project(s): Wikipedia, Wiktionary
Notes: Uses en, the code for ordinary English.
zh-classical Local name: 文言
English name: Classical Chinese
Language family: Sinitic
Project(s): Wikipedia
Notes: Classical Chinese has ISO 639-3 code lzh.
zh-min-nan Local name: 閩南語 / Bân-lâm-gí
English name: Minnan
Language family: Sinitic
Project(s): Wikipedia, Wiktionary, Wikibooks, Wikiquote, Wikisource
Notes: Min Nan has ISO 639-3 code nan.
zh-yue Local name: 粵語
English name: Cantonese
Language family: Sinitic
Project(s): Wikipedia
Notes: Cantonese has ISO 639-3 code yue.

Miscellaneous:

  • All subdomains of wikimedia.org
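The first table amounts to a small lookup from subdomain to lang value. A minimal sketch in Python, transcribed by hand from the rows above and taking the code named in each Notes cell as the lang value; the names LANG_ATTRIBUTE_EXCEPTIONS and lang_attribute are hypothetical, and this is not MediaWiki's own code. It assumes every unlisted language subdomain is used unchanged as its lang value, which does not hold for wikimedia.org subdomains.

```python
# Subdomains whose lang attribute differs from the subdomain itself,
# transcribed from the "Subdomains that do not match their lang attribute" table.
LANG_ATTRIBUTE_EXCEPTIONS = {
    "als": "gsw",
    "bh": "bho",
    "roa-rup": "rup",
    "simple": "en",
    "zh-classical": "lzh",
    "zh-min-nan": "nan",
    "zh-yue": "yue",
}


def lang_attribute(subdomain: str) -> str:
    """Return the lang value expected for a language subdomain (hypothetical helper)."""
    # Assumption: any subdomain not listed is already its own language code.
    # This does not apply to wikimedia.org subdomains such as meta.
    return LANG_ATTRIBUTE_EXCEPTIONS.get(subdomain, subdomain)


print(lang_attribute("als"))  # gsw
print(lang_attribute("fr"))   # fr
```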

Subdomains that do not conform to a valid ISO 639 language code

Subdomain Language Project(s) Notes
als Local name: Alemannisch
English name: Alemannic
Language family: Germanic
Project(s): Wikipedia, Wiktionary, Wikibooks, Wikiquote
Tracked in Phabricator: Task T6793 (stalled)
Notes: Alemannic has ISO 639-3 code gsw. ISO 639-3 code als is assigned to Tosk Albanian instead.
bat-smg Local name: žemaitėška
English name: Samogitian
Language family: Baltic
Project(s): Wikipedia
Tracked in Phabricator: Task T27522 (stalled)
Notes: Samogitian has ISO 639-3 code sgs.
cbk-zam Local name: Chavacano de Zamboanga
English name: Chavacano de Zamboanga
Language family: Pidgin and Creole
Project(s): Wikipedia
Tracked in Phabricator: Task T124657 (stalled)
Notes: Chavacano de Zamboanga has no ISO 639 code as an individual language. ISO 639-3 code cbk is assigned to Chavacano, a superset of Chavacano de Zamboanga.
eml Local name: emiliàn e rumagnòl
English name: Emilian-Romagnol
Language family: Italic
Project(s): Wikipedia
Tracked in Phabricator: Task T36217 (stalled)
Notes: ISO 639-3 code eml for Emilian-Romagnol has been retired and split into egl (Emilian) and rgn (Romagnol).
fiu-vro Local name: võro
English name: Võro
Language family: Finno-Permic
Project(s): Wikipedia
Tracked in Phabricator: Task T31186 (stalled)
Notes: Võro has ISO 639-3 code vro.
iu Local name: ᐃᓄᒃᑎᑐᑦ / inuktitut
English name: Inuktitut
Language family: Eskimo-Aleut
Project(s): Wikipedia
Notes: ISO 639 treats iu/iku not as a single language but as a macrolanguage comprising ike (Eastern Canadian Inuktitut) and ikt (Inuinnaqtun). MediaWiki follows this (see Phabricator), but it falls back to ike under the code ike-cans, adds ike-latn, and has no ikt support. CLDR considers Cans (Unified Canadian Aboriginal Syllabics) an aspirational script.
ksh Local name: Ripoarisch
English name: Ripuarian
Language family: Germanic
Project(s): Wikipedia
Notes: ISO 639-3 code ksh is assigned to Kölsch, a subset of Ripuarian.
map-bms Local name: Basa Banyumasan
English name: Banyumasan
Language family: Sunda-Sulawesi
Project(s): Wikipedia
Notes: Banyumasan has no ISO 639 code as an individual language. ISO 639 code jv/jav is assigned to Javanese, a superset of Banyumasan.
nds-nl Local name: Nedersaksies
English name: Dutch Low Saxon
Language family: Germanic
Project(s): Wikipedia
Notes: Overlaps with nds, the code for Low German.
nrm Local name: Nouormand
English name: Norman
Language family: Italic
Project(s): Wikipedia
Tracked in Phabricator: Task T25216 (stalled)
Notes: Norman has no ISO 639 code as an individual language (however, two dialects of Norman, Guernésiais and Jèrriais, share ISO 639-3 code nrf). ISO 639-3 code nrm is assigned to the Narom language instead. ISO 639-3 lumps Norman with French, as with most varieties of northern France.
roa-rup Local name: armãneashti
English name: Aromanian
Language family: Italic
Project(s): Wikipedia, Wiktionary
Tracked in Phabricator: Task T17988 (stalled)
Notes: Aromanian has ISO 639-3 code rup.
roa-tara Local name: tarandíne
English name: Tarantino
Language family: Italic
Project(s): Wikipedia
Notes: Tarantino has no ISO 639 code as an individual language. ISO 639-3 lumps it with Italian, as with most varieties of northern Italy.
sh Local name: srpskohrvatski / српскохрватски
English name: Serbo-Croatian
Language family: Slavic
Project(s): Wikipedia, Wiktionary
Notes: sh was originally the ISO 639-1 code for Serbo-Croatian, but it was deprecated in 2000. However, it remains a valid BCP 47 language tag. There is also the ISO 639-3 code hbs for Serbo-Croatian.
simple Local name: Simple English
English name: Simple English
Language family: Germanic
Project(s): Wikipedia, Wiktionary
Tracked in Phabricator: Task T110190 (stalled)
Notes: Simple English has no ISO 639 code, but the registered IETF variant subtag simple can be used with any language.
BCP 47-aware applications should have no problem identifying Simple English as part of ordinary English, as long as it is properly tagged as "en-simple" rather than just "simple".
zh-classical Local name: 文言
English name: Classical Chinese
Language family: Sinitic
Project(s): Wikipedia
Tracked in Phabricator: Task T10217 (stalled), Task T30443 (stalled)
Notes: Classical Chinese has ISO 639-3 code lzh.
zh-min-nan Local name: 閩南語 / Bân-lâm-gí
English name: Minnan
Language family: Sinitic
Project(s): Wikipedia, Wiktionary, Wikibooks, Wikiquote, Wikisource
Tracked in Phabricator: Task T10217 (stalled), Task T30442 (stalled)
Notes: Min Nan has ISO 639-3 code nan.
zh-yue Local name: 粵語
English name: Cantonese
Language family: Sinitic
Project(s): Wikipedia
Tracked in Phabricator: Task T10217 (stalled), Task T30441 (stalled)
Notes: Cantonese has ISO 639-3 code yue.

Miscellaneous:

  • tokipona – defunct Wikipedia subdomain; since the wiki's return, it redirects to tok.wikipedia.org.
  • ru-sib – defunct Wikipedia subdomain of a hoax written in a fictional “Siberian” language.
  • be-x-old – fixed; it now redirects to the be-tarask Wikipedia subdomain (see phab:T11823).
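The Simple English entry in the table above relies on BCP 47 fallback: a tag such as "en-simple" can be truncated to "en", whereas a bare "simple" has nothing to fall back to. The sketch below is a simplified, single-tag version of the Lookup scheme from RFC 4647 (it ignores language priority lists and default values); the function name lookup and the list of supported codes are assumptions for illustration.

```python
from typing import Iterable, Optional


def lookup(tag: str, supported: Iterable[str]) -> Optional[str]:
    """Simplified RFC 4647 Lookup for one tag: drop subtags from the right until a match."""
    available = {code.lower() for code in supported}
    subtags = tag.lower().split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in available:
            return candidate
        subtags.pop()
        # RFC 4647 also removes a single-character subtag (such as "x")
        # left at the end, so a singleton is never tried on its own.
        if subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return None


supported_codes = ["en", "de"]  # hypothetical set of available languages
print(lookup("en-simple", supported_codes))  # en
print(lookup("simple", supported_codes))     # None
```

Matching is case-insensitive, as BCP 47 tags are, so the returned code is in lowercase.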

Other distinctions

Subdomain Language Project(s) Notes
ms Local name: Bahasa Melayu
English name: Malay
Language family: Sunda-Sulawesi
Project(s): Wikipedia, Wikibooks, Wiktionary
Notes: The Malay language used to be coded "ms", just as Indonesian is "id", but since the inception of the Malay Wikipedia, "ms" has become the code for a macrolanguage rather than an individual language.
There are many individual languages under "ms"/"msa", including Indonesian ("id"/"ind"), Banjar ("bjn") and Minang ("min"), three living languages with their own Wikimedia projects, as well as Malay as an individual language ("mly", deprecated in 2008; "zlm", Malay; or "zsm", Standard Malay / Malaysian Malay / Malaysian language).
Note that the Malay Wikipedia, Wikibooks, and Wiktionary all predate the language code change of 18 February 2008; the latest of them, Malay Wikibooks, was created on 24 August 2004.
ak Local name: ak
English name: Akan
Language family: Niger-Congo
Project(s): Wikipedia, Wikibooks, Wiktionary (all closed)
Notes: Akan (ak/aka in ISO 639) is a macrolanguage consisting of two separate languages, Twi (tw) and Fante (fat). The Akan Wikipedia was closed in April 2023 as redundant to the Twi and Fante Wikipedias. Akan Wikibooks and Wiktionary also existed but were closed in 2007/2008 because they never had any content.
de-formal Local name: Deutsch
English name: German
Language family: Germanic
Notes (de-formal and nl-informal): Not used as host names, but included as unregistered pseudo-variant subtags for some translations on translatewiki.net. These codes are used on Meta-Wiki for pages such as policies, when addressing wiki users directly according to their preferences. We should have used a private-use extension instead.
nl-informal Local name: Nederlands
English name: Dutch
Language family: Germanic
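For comparison, BCP 47 reserves the singleton x for private-use extensions, so the private-use form of these codes would look like "de-x-formal" rather than "de-formal". A minimal sketch of building and splitting such tags in Python; the helper names are hypothetical, and translatewiki.net and Meta-Wiki actually use the de-formal and nl-informal codes listed above.

```python
import re
from typing import Optional, Tuple

# RFC 5646 grammar: privateuse = "x" 1*("-" (1*8alphanum))
PRIVATE_USE_SUBTAG = re.compile(r"[A-Za-z0-9]{1,8}")


def with_private_use(language: str, *private: str) -> str:
    """Build a tag such as "de-x-formal" from a language code and private-use subtags."""
    if not private or not all(PRIVATE_USE_SUBTAG.fullmatch(p) for p in private):
        raise ValueError("private-use subtags must be 1 to 8 ASCII letters or digits")
    return "-".join([language, "x", *private])


def split_private_use(tag: str) -> Tuple[str, Optional[str]]:
    """Split a tag into its public part and its private-use part (None if absent)."""
    subtags = tag.split("-")
    lowered = [s.lower() for s in subtags]
    if "x" in lowered:
        i = lowered.index("x")
        return "-".join(subtags[:i]), "-".join(subtags[i + 1:])
    return tag, None


print(with_private_use("de", "formal"))    # de-x-formal
print(split_private_use("nl-x-informal"))  # ('nl', 'informal')
print(split_private_use("de-formal"))      # ('de-formal', None)
```

The last line shows why "de-formal" is a pseudo-variant: "formal" has the shape of a variant subtag (5 to 8 characters) but is not registered, and it is not marked as private use.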

Technical language code


The special language code qqx can be used to display the keys (IDs) of all system messages used on a page in place of their text.
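On MediaWiki sites, qqx is commonly requested through the uselang URL parameter, as in ?uselang=qqx. A small Python sketch that rewrites a page URL this way using only urllib.parse; the function name with_qqx is hypothetical and the example URLs are for illustration only.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_qqx(url: str) -> str:
    """Return the URL with uselang=qqx, replacing any uselang value already present."""
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "uselang"
    ]
    query.append(("uselang", "qqx"))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(with_qqx("https://meta.wikimedia.org/wiki/Main_Page"))
# https://meta.wikimedia.org/wiki/Main_Page?uselang=qqx
print(with_qqx("https://meta.wikimedia.org/w/index.php?title=Main_Page&uselang=de"))
# https://meta.wikimedia.org/w/index.php?title=Main_Page&uselang=qqx
```

Dropping an existing uselang before appending avoids a URL that carries two conflicting values.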

See also
