Unicode IDNA
Pure-Elixir implementation of UTS #46 (Unicode IDNA Compatibility Processing), with RFC 3492 Punycode encoding/decoding, the RFC 5893 bidi rule, and the CONTEXTJ joiner rules from RFC 5892.
The library converts domain names between their Unicode and ASCII (Punycode xn-- prefixed) representations and validates that each label conforms to IDNA 2008 as relaxed by UTS #46. It passes the full IdnaTestV2.txt conformance suite — 6,389 rows × 3 operations = 19,167 assertions — for Unicode 17.0.
Installation
def deps do
  [
    {:unicode_idna, "~> 0.1"}
  ]
end
Usage
Unicode.IDNA.to_ascii/2 and Unicode.IDNA.to_unicode/2 accept either a full domain name as a String.t or a list of labels. The return value mirrors the input shape: a string in returns a string out (labels are rejoined with .); a list in returns the list of processed labels.
# String in / string out — full domain
iex> Unicode.IDNA.to_ascii("bücher.de")
{:ok, "xn--bcher-kva.de"}
iex> Unicode.IDNA.to_unicode("xn--bcher-kva.de")
{:ok, "bücher.de"}
# Alternate IDNA label separators are recognised
iex> Unicode.IDNA.to_ascii("中文。中国")
{:ok, "xn--fiq228c.xn--fiqs8s"}
# List in / list out — already-split labels
iex> Unicode.IDNA.to_ascii(["bücher", "de"])
{:ok, ["xn--bcher-kva", "de"]}
iex> Unicode.IDNA.to_unicode(["xn--bcher-kva", "de"])
{:ok, ["bücher", "de"]}
# Errors
iex> Unicode.IDNA.to_ascii("not_valid")
{:error, :disallowed}
iex> Unicode.IDNA.to_ascii("not_valid", use_std3_ascii_rules: false)
{:ok, "not_valid"}
Public API
- Unicode.IDNA.to_ascii/2: UTS #46 ToASCII. Accepts a String.t/0 or a list of label strings; returns the same shape.
- Unicode.IDNA.to_unicode/2: UTS #46 ToUnicode, with the same dual-shape semantics.
- Unicode.IDNA.valid_label?/2: predicate for a single label, equivalent to match?({:ok, _}, to_ascii([label], options)).
- Unicode.IDNA.Punycode.encode/1 and Unicode.IDNA.Punycode.decode/1: RFC 3492 primitives.
- Unicode.IDNA.Bidi.validate/1 and validate_in_bidi_domain/1: RFC 5893 bidi rule.
- Unicode.IDNA.Context.validate/1: RFC 5892 Appendix A CONTEXTJ rules for ZWJ / ZWNJ.
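Since to_ascii/2 returns tagged tuples, callers will usually pattern-match on the result. The following is a minimal sketch of one way to do that, not a prescribed pattern. MyApp.Hostname and normalize/2 are hypothetical names that are not part of the library, and the {:error, reason} shape is assumed from the usage examples above.

```elixir
defmodule MyApp.Hostname do
  # Hypothetical helper module; it is not part of unicode_idna.

  # Converts a user-supplied domain to its ASCII form and wraps any
  # failure in an application-level error tuple.
  def normalize(domain, options \\ []) when is_binary(domain) do
    case Unicode.IDNA.to_ascii(domain, options) do
      {:ok, ascii} ->
        {:ok, ascii}

      # Assumption: failures arrive as {:error, reason}, as in the
      # usage examples. The reason is passed through untouched.
      {:error, reason} ->
        {:error, {:invalid_hostname, reason}}
    end
  end
end
```

Wrapping the reason rather than replacing it keeps the library's original error available to callers that want to log or report it.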
Options
| Option | Default | Meaning |
|---|---|---|
| :transitional | false | UTS #46 transitional vs. non-transitional processing. The default false matches modern browsers (Chrome, Firefox, Safari). |
| :check_hyphens | true | Reject leading/trailing hyphens and -- at positions 3–4 (the latter is suppressed for labels that came from a Punycode-decoded ACE form). |
| :check_bidi | true | If any label contains a right-to-left character, every label in the domain must satisfy the RFC 5893 bidi rule. |
| :check_joiners | true | Labels containing ZWJ (U+200D) or ZWNJ (U+200C) must satisfy the CONTEXTJ rules of RFC 5892 Appendix A. |
| :use_std3_ascii_rules | true | Restrict ASCII characters in a label to letters, digits and hyphen. Set false to allow _ and other STD3-disallowed ASCII (e.g. for Twitter-style permissive subdomain rules). |
| :verify_dns_length | true | Reject empty labels, labels longer than 63 octets, and full domains longer than 253 octets per RFC 1035. |
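Options are passed as a keyword list in the second argument, so an application can keep the defaults for ordinary hostnames and relax a single rule where it needs to. The sketch below illustrates that idea. MyApp.DnsName and both function names are hypothetical; only the :use_std3_ascii_rules option name comes from the table above, and no particular return value is claimed for these inputs.

```elixir
defmodule MyApp.DnsName do
  # Hypothetical wrapper module; it is not part of unicode_idna.

  # Ordinary hostnames: rely on the library defaults, so every check
  # listed in the options table stays enabled.
  def hostname_to_ascii(name) when is_binary(name) do
    Unicode.IDNA.to_ascii(name)
  end

  # Names that carry underscores (for example "_dmarc.example.com")
  # contain ASCII that STD3 disallows, so only that rule is relaxed.
  # All other checks keep their default value.
  def record_name_to_ascii(name) when is_binary(name) do
    Unicode.IDNA.to_ascii(name, use_std3_ascii_rules: false)
  end
end
```

Relaxing one option per call site, rather than disabling checks globally, keeps the stricter validation in place wherever it is not known to be a problem.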
Refreshing Unicode data
mix unicode_idna.download
This refreshes data/idna_mapping_table.txt and data/idna_test_v2.txt (the conformance vectors) from unicode.org. The bundled files are committed to source control; the task exists to make version bumps reproducible.