Zotonic Translation module

Manages language selection and language recognition. It is used to serve pages in multiple languages and to add language codes to URLs.

Language recognition

If a resource is added without a language, this module tries to guess its language using n-grams. It extracts the n-grams from the resource's text and compares their statistics with the n-gram statistics of each language-specific corpus.
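The comparison step can be illustrated with a minimal Python sketch. It is not the module's Erlang code, and this page does not say which distance measure `translation_detect` uses. The sketch therefore assumes the rank-based "out-of-place" measure of Cavnar and Trenkle, a common choice for n-gram language identification. The 1- to 3-gram range, the 300-entry profile size, the `_` word-boundary marker and the function names `ngram_profile`, `out_of_place` and `detect` are all illustrative assumptions.

```python
from collections import Counter


def ngram_profile(text: str, max_n: int = 3, top: int = 300) -> list[str]:
    """Return the `top` most frequent 1..max_n character n-grams, most frequent first."""
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"  # '_' marks the word boundaries (an assumption)
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Highest count first; ties broken alphabetically so the ranking is deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [gram for gram, _ in ranked[:top]]


def out_of_place(doc: list[str], lang: list[str]) -> int:
    """Cavnar-Trenkle style distance: sum of rank differences between two profiles.

    An n-gram that is missing from the language profile gets the maximum penalty.
    """
    lang_rank = {gram: rank for rank, gram in enumerate(lang)}
    max_penalty = len(lang)
    return sum(
        abs(rank - lang_rank[gram]) if gram in lang_rank else max_penalty
        for rank, gram in enumerate(doc)
    )


def detect(text: str, profiles: dict[str, list[str]]) -> str:
    """Return the language code whose profile is closest to the text's profile."""
    doc = ngram_profile(text)
    return min(profiles, key=lambda code: out_of_place(doc, profiles[code]))


if __name__ == "__main__":
    # Toy corpora; real profiles would be built from much longer texts.
    profiles = {
        "en": ngram_profile("the quick brown fox jumps over the lazy dog"),
        "nl": ngram_profile("de snelle bruine vos springt over de luie hond"),
    }
    print(detect("the dog jumps over the fox", profiles))
```

In a full setup, `profiles` would hold one entry per language code in the table below. Comparing ranks instead of raw counts keeps the score independent of text length, which matters because a resource's text is usually much shorter than the corpus a profile is built from.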

The module currently has profiles for the following languages:

Code	Language
`af`	Afrikaans
`ar`	Arabic
`be`	Belarusian
`bg`	Bulgarian
`bs`	Bosnian
`cs`	Czech
`da`	Danish
`de`	German
`el`	Greek
`en`	English
`es`	Spanish
`et`	Estonian
`fa`	Persian
`fi`	Finnish
`fr`	French
`fy`	Frisian
`ga`	Irish
`he`	Hebrew
`hi`	Hindi
`hr`	Croatian
`hu`	Hungarian
`id`	Indonesian
`is`	Icelandic
`it`	Italian
`ja`	Japanese
`ko`	Korean
`lt`	Lithuanian
`ms`	Malay
`nl`	Dutch
`no`	Norwegian
`pl`	Polish
`pt`	Portuguese
`ro`	Romanian
`ru`	Russian
`si`	Sinhalese
`sk`	Slovak
`sl`	Slovenian
`sq`	Albanian
`sr`	Serbian
`sv`	Swedish
`ta`	Tamil
`th`	Thai
`tl`	Tagalog
`tr`	Turkish
`uk`	Ukrainian
`vi`	Vietnamese
`xh`	Xhosa
`zh`	Chinese
`zh-hant`	Chinese (Traditional)

The texts used to build the n-gram statistics are copied from Wikipedia. Typically they are articles about the country where the language is spoken or about the language itself. Using articles about the country has the advantage that names typical for the language are also included in the statistics.

Adding a new language

To add a new language:

1. Add a text file, named after the language code, to `priv/data/texts`.
2. From the Erlang shell, run `translation_detect:generate_profile_data().`
3. A new file is generated in `priv/data/profiles`. Commit this new file to git.
4. Load the new profiles with `translation_detect:load_profile_data().`
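To show what "generating profile data" amounts to, the steps above can be mirrored in a small Python sketch. It does not replace the Erlang functions `translation_detect:generate_profile_data/0` and `translation_detect:load_profile_data/0`, and their actual file format is not documented here. Several details are hypothetical: the one-n-gram-per-line format, the rule that a text file's name minus its extension is the language code, the profile parameters, and the helper names `generate_profiles` and `load_profiles`.

```python
from collections import Counter
from pathlib import Path


def ngram_profile(text: str, max_n: int = 3, top: int = 300) -> list[str]:
    """Return the `top` most frequent 1..max_n character n-grams, most frequent first."""
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"  # '_' marks the word boundaries (an assumption)
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [gram for gram, _ in ranked[:top]]


def generate_profiles(texts_dir: Path, profiles_dir: Path) -> list[str]:
    """Write one profile file per text file; return the language codes processed."""
    profiles_dir.mkdir(parents=True, exist_ok=True)
    codes = []
    for text_file in sorted(texts_dir.iterdir()):
        # Skip subdirectories and hidden files such as .gitkeep.
        if not text_file.is_file() or text_file.name.startswith("."):
            continue
        code = text_file.stem  # hypothetical rule: "nl.txt" -> "nl"
        grams = ngram_profile(text_file.read_text(encoding="utf-8"))
        # Hypothetical format: one n-gram per line, most frequent first.
        # N-grams never contain whitespace, so they round-trip through lines.
        (profiles_dir / code).write_text(
            "".join(gram + "\n" for gram in grams), encoding="utf-8"
        )
        codes.append(code)
    return codes


def load_profiles(profiles_dir: Path) -> dict[str, list[str]]:
    """Read the files written by generate_profiles back into memory."""
    return {
        f.name: f.read_text(encoding="utf-8").splitlines()
        for f in sorted(profiles_dir.iterdir())
        if f.is_file()
    }
```

Keeping generation and loading as separate steps matches the workflow above: the generated files are committed once, and a running system only needs to load them.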