Unicode IDNA

Pure-Elixir implementation of UTS #46 (Unicode IDNA Compatibility Processing), with RFC 3492 Punycode encoding/decoding, the RFC 5893 bidi rule, and the CONTEXTJ joiner rules from RFC 5892.

The library converts domain names between their Unicode and ASCII (Punycode, with the xn-- prefix) representations and validates that each label conforms to IDNA 2008 as relaxed by UTS #46. It passes the full IdnaTestV2.txt conformance suite for Unicode 17.0 (6,389 rows × 3 operations = 19,167 assertions).
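
To make the xn-- form concrete, the following is a minimal sketch of the RFC 3492 encoding step for a single label. It is an illustration written for this README, not the library's own code; the module name PunycodeSketch and its function names are hypothetical, and the sketch skips the mapping and validation that UTS #46 performs before encoding.

```elixir
defmodule PunycodeSketch do
  @moduledoc """
  Illustrative RFC 3492 encoder for a single label. Hypothetical module,
  not part of unicode_idna.
  """

  # Bootstring parameters fixed by RFC 3492 for Punycode.
  @base 36
  @tmin 1
  @tmax 26
  @skew 38
  @damp 700
  @initial_bias 72
  @initial_n 128

  # Encodes one label; the caller adds the "xn--" prefix. An ASCII-only
  # label comes back with a trailing "-", so IDNA leaves such labels as-is.
  def encode(label) do
    cps = String.to_charlist(label)
    basic = Enum.filter(cps, &(&1 < @initial_n))
    b = length(basic)
    out = if b > 0, do: Enum.reverse(basic ++ [?-]), else: []

    %{n: @initial_n, delta: 0, bias: @initial_bias, h: b, out: out}
    |> loop(cps, b)
    |> Map.fetch!(:out)
    |> Enum.reverse()
    |> List.to_string()
  end

  # Done once every code point has been handled.
  defp loop(%{h: h} = state, cps, _b) when h >= length(cps), do: state

  defp loop(state, cps, b) do
    # Next value to insert: the smallest code point not yet handled.
    m = cps |> Enum.filter(&(&1 >= state.n)) |> Enum.min()
    state = %{state | delta: state.delta + (m - state.n) * (state.h + 1), n: m}

    state =
      Enum.reduce(cps, state, fn
        c, s when c < m ->
          %{s | delta: s.delta + 1}

        c, s when c == m ->
          out = emit(s.delta, s.bias, @base, s.out)
          %{s | out: out, bias: adapt(s.delta, s.h + 1, s.h == b), delta: 0, h: s.h + 1}

        _c, s ->
          s
      end)

    loop(%{state | delta: state.delta + 1, n: state.n + 1}, cps, b)
  end

  # Writes delta as a variable-length integer (output is kept reversed).
  defp emit(q, bias, k, out) do
    t = threshold(k, bias)

    if q < t do
      [digit(q) | out]
    else
      d = t + rem(q - t, @base - t)
      emit(div(q - t, @base - t), bias, k + @base, [digit(d) | out])
    end
  end

  defp threshold(k, bias) when k <= bias, do: @tmin
  defp threshold(k, bias) when k >= bias + @tmax, do: @tmax
  defp threshold(k, bias), do: k - bias

  # Digits 0..25 map to a..z, 26..35 map to 0..9.
  defp digit(d) when d < 26, do: ?a + d
  defp digit(d), do: ?0 + d - 26

  # Bias adaptation from RFC 3492 section 6.1.
  defp adapt(delta, numpoints, first?) do
    delta = if first?, do: div(delta, @damp), else: div(delta, 2)
    adapt_loop(delta + div(delta, numpoints), 0)
  end

  defp adapt_loop(delta, k) when delta > div((@base - @tmin) * @tmax, 2),
    do: adapt_loop(div(delta, @base - @tmin), k + @base)

  defp adapt_loop(delta, k),
    do: k + div((@base - @tmin + 1) * delta, delta + @skew)
end
```

With this sketch, "xn--" <> PunycodeSketch.encode("bücher") yields "xn--bcher-kva".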

Installation

def deps do
  [
    {:unicode_idna, "~> 0.1"}
  ]
end

Usage

Unicode.IDNA.to_ascii/2 and Unicode.IDNA.to_unicode/2 accept either a full domain name as a String.t or a list of labels. The return value mirrors the input shape: a string in returns a string out (labels are rejoined with .); a list in returns the list of processed labels.

# String in / string out — full domain
iex> Unicode.IDNA.to_ascii("bücher.de")
{:ok, "xn--bcher-kva.de"}

iex> Unicode.IDNA.to_unicode("xn--bcher-kva.de")
{:ok, "bücher.de"}

# Alternate IDNA label separators are recognised
iex> Unicode.IDNA.to_ascii("中文。中国")
{:ok, "xn--fiq228c.xn--fiqs8s"}

# List in / list out — already-split labels
iex> Unicode.IDNA.to_ascii(["bücher", "de"])
{:ok, ["xn--bcher-kva", "de"]}

iex> Unicode.IDNA.to_unicode(["xn--bcher-kva", "de"])
{:ok, ["bücher", "de"]}

# Errors
iex> Unicode.IDNA.to_ascii("not_valid")
{:error, :disallowed}

iex> Unicode.IDNA.to_ascii("not_valid", use_std3_ascii_rules: false)
{:ok, "not_valid"}
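
Since both functions return tagged tuples, callers can pattern-match on the result. The following is a minimal sketch of one way to wrap the call; HostNormalizer is a hypothetical helper, not part of the library, and it treats the error reason as opaque because :disallowed is the only reason shown here. The converter is passed in as a function so the sketch can be tried without the dependency installed.

```elixir
defmodule HostNormalizer do
  @moduledoc """
  Hypothetical helper (not part of unicode_idna) that converts a host name
  to its ASCII form or returns a readable error message.
  """

  # `to_ascii` defaults to the library function documented above; any
  # two-argument function returning {:ok, _} or {:error, _} will do.
  def normalize(host, opts \\ [], to_ascii \\ &Unicode.IDNA.to_ascii/2) do
    case to_ascii.(host, opts) do
      {:ok, ascii} ->
        {:ok, ascii}

      {:error, reason} ->
        # The reason is only inspected, never matched on, because the full
        # set of error atoms is not listed in this README.
        {:error, "invalid host #{inspect(host)}: #{inspect(reason)}"}
    end
  end
end
```

In application code this would be called as HostNormalizer.normalize("bücher.de"), or with options such as use_std3_ascii_rules: false as the second argument.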

Public API

Options

Option | Default | Meaning
:transitional | false | UTS #46 transitional vs. non-transitional processing. The default false matches modern browsers (Chrome, Firefox, Safari).
:check_hyphens | true | Reject leading/trailing hyphens and -- at positions 3–4 (the latter is suppressed for labels that came from a Punycode-decoded ACE form).
:check_bidi | true | If any label contains a right-to-left character, every label in the domain must satisfy the RFC 5893 bidi rule.
:check_joiners | true | Labels containing ZWJ (U+200D) or ZWNJ (U+200C) must satisfy the CONTEXTJ rules of RFC 5892 Appendix A.
:use_std3_ascii_rules | true | Restrict ASCII characters in a label to letters, digits and hyphen. Set false to allow _ and other STD3-disallowed ASCII (e.g. for Twitter-style permissive subdomain rules).
:verify_dns_length | true | Reject empty labels, labels longer than 63 octets, and full domains longer than 253 octets per RFC 1035.
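
To show what the :check_hyphens and :verify_dns_length options guard against, here is a rough sketch of both rules in plain Elixir. The module LabelChecks and its function names are hypothetical and are not the library's internals. The length check assumes the domain is already in its ASCII form and, as a simplification, treats a trailing root dot as an empty label.

```elixir
defmodule LabelChecks do
  @moduledoc "Illustrative only; hypothetical names, not the library's internals."

  # Hyphen rule: no leading or trailing hyphen, and no "--" at positions 3-4.
  # `from_ace?` marks a label that was decoded from an xn-- form, for which
  # the positions 3-4 check is skipped.
  def hyphens_ok?(label, from_ace? \\ false) do
    double_hyphen? = (label |> String.codepoints() |> Enum.slice(2, 2)) == ["-", "-"]

    not String.starts_with?(label, "-") and
      not String.ends_with?(label, "-") and
      (from_ace? or not double_hyphen?)
  end

  # Length rule on the ASCII form: each label is 1..63 octets and the whole
  # domain is at most 253 octets.
  def dns_length_ok?(ascii_domain) do
    labels = String.split(ascii_domain, ".")

    byte_size(ascii_domain) <= 253 and
      Enum.all?(labels, fn label -> byte_size(label) in 1..63 end)
  end
end
```

For example, LabelChecks.hyphens_ok?("ab--cd") is false, while LabelChecks.hyphens_ok?("ab--cd", true) is true because the label is flagged as coming from a decoded ACE form.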

Refreshing Unicode data

mix unicode_idna.download

This refreshes data/idna_mapping_table.txt and data/idna_test_v2.txt (the conformance vectors) from unicode.org. The bundled files are committed to source control; the task exists to make version bumps reproducible.