Ftfy — fixes text for you

An Elixir port of the Python ftfy library (version 6.3.1). It takes in broken Unicode text and makes it less broken — most importantly, it detects and fixes mojibake (text that was decoded in the wrong encoding).

iex> Ftfy.fix_text("✔ No problems")
"✔ No problems"
iex> Ftfy.fix_text("Broken text… it’s flubberific!")
"Broken text… it's flubberific!"
iex> Ftfy.fix_text("LOUD NOISES")
"LOUD NOISES"
iex> Ftfy.fix_encoding_and_explain("sÃ³")
{"só", [{"encode", "latin-1"}, {"decode", "utf-8"}]}
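A plan such as [{"encode", "latin-1"}, {"decode", "utf-8"}] reads as: turn the text back into bytes using Latin-1, then interpret those bytes as UTF-8. The sketch below reproduces that round trip by hand with Erlang's :unicode module. MojibakeSketch, garble/1, and repair/1 are hypothetical names for illustration only; this is not how Ftfy implements the fix, and it skips the heuristics Ftfy uses to decide whether a fix is warranted.

```elixir
defmodule MojibakeSketch do
  # Hypothetical helper module for illustration; not part of Ftfy.

  # Simulate the damage: treat each UTF-8 byte of `text` as one
  # Latin-1 character and re-encode the result as UTF-8.
  def garble(text) when is_binary(text) do
    :unicode.characters_to_binary(text, :latin1, :utf8)
  end

  # Follow the plan: "encode" as Latin-1 to recover the original
  # bytes, then "decode" by checking that they are valid UTF-8.
  def repair(text) when is_binary(text) do
    case :unicode.characters_to_binary(text, :utf8, :latin1) do
      bytes when is_binary(bytes) ->
        if String.valid?(bytes), do: {:ok, bytes}, else: :error

      # A character outside Latin-1 means this plan does not apply.
      _error_or_incomplete ->
        :error
    end
  end
end

# "s\u00F3" is "s" followed by o-acute; its mojibake form is
# "s", A-tilde (U+00C3), superscript three (U+00B3).
garbled = MojibakeSketch.garble("s\u00F3")
IO.inspect(garbled == "s\u00C3\u00B3")
IO.inspect(MojibakeSketch.repair(garbled) == {:ok, "s\u00F3"})
```

Both comparisons should print true. The sketch returns :error when the text contains a character that Latin-1 cannot represent, since the plan cannot apply to such input.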

What it does

Ftfy.fix_text/2 runs a sequence of fixes, each individually configurable via Ftfy.TextFixerConfig.

Other entry points mirror the Python API: fix_and_explain/2, fix_encoding/2, fix_encoding_and_explain/2, fix_text_segment/2, apply_plan/2, guess_bytes/1, fix_file/2, and explain_unicode/1. The Ftfy.Fixes, Ftfy.Badness, Ftfy.Chardata, Ftfy.Codecs, and Ftfy.Formatting modules expose the lower-level building blocks.

Configuration

Pass a keyword list or a %Ftfy.TextFixerConfig{}:

Ftfy.fix_text(text, uncurl_quotes: false)
Ftfy.fix_text(text, %Ftfy.TextFixerConfig{normalization: "NFKC"})
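One common way for a function to accept either form is to normalize the argument into a struct up front. The following is a minimal sketch of that pattern, not Ftfy's actual code: ConfigSketch is a hypothetical stand-in with only the two fields named above, and its default values are assumptions.

```elixir
defmodule ConfigSketch do
  # Hypothetical stand-in for a config struct; not part of Ftfy.
  # The default values here are assumptions made for illustration.
  defstruct uncurl_quotes: true, normalization: "NFC"

  # A ready-made struct passes through unchanged.
  def normalize(%__MODULE__{} = config), do: config

  # A keyword list overrides the defaults; struct!/2 raises a
  # KeyError on unknown keys, so typos in option names fail loudly.
  def normalize(opts) when is_list(opts), do: struct!(__MODULE__, opts)
end

IO.inspect(ConfigSketch.normalize(uncurl_quotes: false))
IO.inspect(ConfigSketch.normalize(%ConfigSketch{normalization: "NFKC"}))
```

With this design, every downstream function can pattern-match on a single struct shape regardless of how the caller supplied the options.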

Command line

Build the escript and fix text from a file or stdin:

mix escript.build
echo '✔ No problems' | ./ftfy
./ftfy -e latin-1 broken.txt -o fixed.txt

Installation

Add ftfy to your dependencies in mix.exs:

def deps do
  [
    {:ftfy, "~> 0.1.0"}
  ]
end

Notes on the port

License and credits

This library is a port of ftfy ("fixes text for you"), created by Robyn Speer. ftfy is the result of years of careful work on the messy reality of broken Unicode, and this Elixir port exists only because of it — our deepest thanks to Robyn Speer for building and maintaining the original, and for releasing it under a permissive license.

The data tables and test corpus in this repository are generated or ported directly from python-ftfy 6.3.1 and remain the work of the original author. See LICENSE for the full license text and NOTICE for the attribution and change notice required by the Apache License.

If you use ftfy in research, please cite the original author's work as described at https://github.com/rspeer/python-ftfy.