i18n and L10n: 4 - Language Tags
Article about the importance and construction of language tags for multilingual websites
Author: Matthew Wittering | Published: 12th July 2009
What are language tags and why are they needed? Language tags are used in HTML content to describe, semantically, the language in which the main body content of the document is written. Failing to identify that language causes unnecessary complications in a multilingual world.
When constructing a correctly localized (L10n) document in XHTML you only have to identify the language once, on the root element, because of the hierarchical nature of XML and HTML documents: every nested element inherits the language of its parent. This does not stop you adding further language tags deeper in the document, for example to mark a portion of it as Arabic.
Including Language Tags
To include a language tag in XHTML content the developer should add both the "xml:lang" and "lang" attributes to the html tag that opens the document markup. The example below shows both attributes set to British English.
Example
<html xml:lang="en-GB" lang="en-GB">
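Because a language declared on the opening tag is inherited by everything nested inside it, a tool reading the document can work out the language of any element by walking down the tree. The following is a minimal sketch of that idea using Python's standard XML parser. The sample document, the id values and the function name effective_languages are illustrative assumptions, not part of any real page.

```python
import xml.etree.ElementTree as ET

# The parser exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample document: British English overall, one Arabic paragraph.
DOC = """<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-GB" lang="en-GB">
  <body>
    <p id="intro">Hello</p>
    <p id="greeting" xml:lang="ar" lang="ar">مرحبا</p>
  </body>
</html>"""


def effective_languages(root):
    """Map each element id to the language it declares or inherits."""
    result = {}

    def walk(element, inherited):
        # An element's own xml:lang wins; otherwise it takes its parent's.
        lang = element.get(XML_LANG, inherited)
        if element.get("id"):
            result[element.get("id")] = lang
        for child in element:
            walk(child, lang)

    walk(root, None)
    return result


languages = effective_languages(ET.fromstring(DOC))
print(languages)
```

The first paragraph carries no language attribute of its own and so reports the en-GB declared on the html tag, while the second overrides it with ar.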
Constructing Language Tags
When building a language tag you must follow the pattern below, in which every subtag after the language is optional:
language-script-region-variant-extension-privateuse
If you view the source of this page you will see that I have used the language tag en-GB to show that I am using English as used in Great Britain. As a further example, consider constructing the language tag for Chinese written in the Simplified Chinese script as used in Hong Kong. Following the methodology detailed by the W3C, the tag takes the form language-script-region:
Example
zh-Hans-HK
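To make the pattern concrete, here is a rough sketch that splits a tag back into its subtags by looking at the shape of each part (two or three letters for the language, four letters for the script, two letters or three digits for the region). It is a simplification under stated assumptions: the function name parse_language_tag is hypothetical, extension and private-use subtags are not handled, and no subtag is checked against the official registry.

```python
import re

# Simplified shape-based pattern: language, then optional script,
# region and variants. Extensions and private use are left out.
LANGUAGE_TAG = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"
    r"(?P<variants>(?:-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*)$"
)


def parse_language_tag(tag):
    """Split a language tag into subtags, using conventional casing."""
    match = LANGUAGE_TAG.match(tag)
    if match is None:
        raise ValueError(f"not a recognisable language tag: {tag!r}")
    parts = match.groupdict()
    return {
        "language": parts["language"].lower(),
        "script": parts["script"].title() if parts["script"] else None,
        "region": parts["region"].upper() if parts["region"] else None,
        "variants": [v for v in parts["variants"].split("-") if v],
    }


print(parse_language_tag("zh-Hans-HK"))
print(parse_language_tag("en-GB"))
```

For zh-Hans-HK the sketch reports zh as the language, Hans as the script and HK as the region; for en-GB the script is simply absent.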
Summary
Language tags are required in a modern multilingual world. They help computer applications identify the language used to compose human-readable content published on the World Wide Web. Developers and authors should tag their content so that content management systems can return the version or translation of a URL in the user's preferred language.
Yes, this is an altruistic view of large portal websites. However, it does require a change in the metaphor currently used to develop websites: the metaphor of static published documents. Websites are now far more powerful than a glorified rich text document. With the convergence of geographical services and server and client scripting languages, interfaces are very fluid and easy to change to meet the adapting needs of users.
We should be looking to create adaptive websites that resemble fluid, adaptive technologies like Google Wave, not the Rosetta Stone, a classical document etched on a stone tablet and composed in Egyptian hieroglyphic, Demotic and Greek. The texts on the stone are a decree from Ptolemy V, describing the repealing of various taxes and instructions to erect statues in temples.
Developers and architects of international websites should be thinking more about the future than the present, anticipating approaching needs when constructing the portals of tomorrow. This will become increasingly critical in an ever-shrinking world of jumbo jets and the Internet.
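As an illustration of how a content management system might use these tags, the sketch below picks a translation from the Accept-Language header a browser sends. It assumes the translations are keyed by language tag and falls back by dropping subtags from the right (zh-Hans-HK, then zh-Hans, then zh), a simplified take on the lookup scheme in RFC 4647. The function names, the list of available translations and the default are hypothetical.

```python
def parse_accept_language(header):
    """Return the language ranges in a header, most preferred first."""
    ranges = []
    for item in header.split(","):
        parts = item.strip().split(";")
        tag = parts[0].strip()
        if not tag:
            continue
        quality = 1.0  # A range with no q value counts as fully preferred.
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if quality > 0:
            ranges.append((quality, tag))
    # sorted() is stable, so ranges of equal quality keep header order.
    return [tag for quality, tag in sorted(ranges, key=lambda r: -r[0])]


def choose_translation(header, available, default):
    """Pick the best available tag, truncating each range from the right."""
    by_lower = {tag.lower(): tag for tag in available}
    for language_range in parse_accept_language(header):
        subtags = language_range.lower().split("-")
        while subtags:
            candidate = "-".join(subtags)
            if candidate in by_lower:
                return by_lower[candidate]
            subtags.pop()
    return default


available = ["en-GB", "zh-Hans", "ar"]
print(choose_translation("zh-Hans-HK, en-GB;q=0.8", available, "en-GB"))
```

A reader asking for zh-Hans-HK is served the zh-Hans translation because no Hong Kong specific version exists, and a reader whose languages are not available at all receives the default.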
Links
- http://www.w3.org/International/articles/language-tags/Overview.en.php
- http://en.wikipedia.org/wiki/Rosetta_Stone
This work is licensed under a Creative Commons Licence
I am a graduate of Loughborough University where I read Computing and Management BSc (Hons) earning a 2:1 classification.