Unidecode

An Elixir implementation of Text::Unidecode, a Perl module that transliterates Unicode characters to US-ASCII.

It doesn't change the encoding: like every string in Elixir, the results are still UTF-8. But they contain only characters that are easy to convert to ASCII. Take the word código, Portuguese for "code", and try converting it to a charlist:

iex> to_charlist("código")
[99, 243, 100, 105, 103, 111]

The charlist contains 243 (the code point for ó), which is outside the ASCII range. Unidecode is designed to give you better results for this kind of operation:

iex> "código" |> Unidecode.decode |> to_charlist
'codigo'

These aren't the exact original characters, but the result is readable and intelligible to anyone who speaks Portuguese.
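To check that a result really is plain ASCII, you can test every code point against the 7-bit range. This is a minimal sketch using only the Elixir standard library; the `AsciiCheck` module and its `ascii?/1` function are hypothetical names for illustration and are not part of Unidecode.

```elixir
defmodule AsciiCheck do
  # Hypothetical helper, not part of Unidecode.
  # Returns true when every code point in the string is in the
  # US-ASCII range (0..127).
  def ascii?(string) when is_binary(string) do
    string
    |> String.to_charlist()
    |> Enum.all?(&(&1 in 0..127))
  end
end

AsciiCheck.ascii?("código") # false, because ó is code point 243
AsciiCheck.ascii?("codigo") # true
```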

Design Philosophy (taken from the original Text::Unidecode Perl library)

Unidecode's ability to transliterate from a given language is limited by two factors:

Unidecode, in other words, is quick and dirty. Sometimes the output is not so dirty at all: Russian and Greek seem to work passably; and while Thaana (Divehi, AKA Maldivian) is a definitely non-Western writing system, setting up a mapping from it to Roman letters seems to work pretty well. But sometimes the output is very dirty: Unidecode does quite badly on Japanese and Thai.

If you want a smarter transliteration for a particular language than Unidecode provides, then you should look for (or write) a transliteration algorithm specific to that language, and apply it instead of (or at least before) applying Unidecode.
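As a sketch of the "apply it before" approach, the module below runs a few language-specific rules and then hands the result to a general-purpose fallback. Everything here is an assumption made for illustration: the `GermanPrePass` module, its rule table, and the `fallback` argument are hypothetical and not part of Unidecode. The fallback defaults to the identity function so the example runs without any dependency.

```elixir
defmodule GermanPrePass do
  # Hypothetical language-specific rules: a common German convention
  # writes umlauts as vowel + "e" and ß as "ss".
  @rules %{
    "ä" => "ae", "ö" => "oe", "ü" => "ue",
    "Ä" => "Ae", "Ö" => "Oe", "Ü" => "Ue",
    "ß" => "ss"
  }

  # Applies the language-specific rules first, then passes the result
  # to the fallback transliterator.
  def transliterate(string, fallback \\ &Function.identity/1) do
    string
    |> String.normalize(:nfc)   # compose "u" + combining diaeresis into "ü"
    |> String.graphemes()
    |> Enum.map_join(&Map.get(@rules, &1, &1))
    |> fallback.()
  end
end

GermanPrePass.transliterate("Grüße") # "Gruesse"
```

With Unidecode installed, passing `&Unidecode.decode/1` as the second argument should let Unidecode handle whatever the language-specific rules leave untouched.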

In other words, Unidecode's approach is broad (knowing about dozens of writing systems), but shallow (not being meticulous about any of them).
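The "broad but shallow" idea can be illustrated with a context-free lookup table: each code point is replaced on its own, with no knowledge of its neighbours or of the source language. This is only a sketch under stated assumptions, not how Unidecode is actually implemented. The `TinyTranslit` module, its tiny table, and the choice of "?" for unknown characters are all hypothetical.

```elixir
defmodule TinyTranslit do
  # Hypothetical, deliberately tiny table covering a few characters
  # from three scripts. A real table would cover far more of Unicode.
  @table %{
    "ó" => "o", "é" => "e", "ç" => "c",   # Latin with diacritics
    "α" => "a", "β" => "b", "γ" => "g",   # Greek
    "д" => "d", "а" => "a"                # Cyrillic
  }

  # Context-free: every code point is looked up independently.
  def decode(string) when is_binary(string) do
    string
    |> String.codepoints()
    |> Enum.map_join(&lookup/1)
  end

  # ASCII passes through unchanged.
  defp lookup(<<c>> = cp) when c < 128, do: cp
  # Anything else comes from the table, or "?" if it is unknown
  # (a choice made for this sketch only).
  defp lookup(cp), do: Map.get(@table, cp, "?")
end

TinyTranslit.decode("código") # "codigo"
TinyTranslit.decode("да")     # "da"
```

Because the lookup never inspects surrounding characters, it is cheap and works across many scripts, but it cannot express rules that depend on context, which is where language-specific algorithms do better.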

Installation

Add unidecode to your dependencies in mix.exs:

def deps do
  [{:unidecode, "~> 1.0.0"}]
end

Changelog

Code of Conduct

License

Unidecode is released under the Apache License 2.0.