Text

Text & language processing for Elixir.

A toolkit for tokenization, language identification, sentiment analysis, named-entity recognition, word clouds, phonetic encoding, search ranking, and the supporting plumbing — all in pure BEAM, with optional ML backends behind feature flags.

Capabilities

Detection and analysis

Strings

Statistics and search

Inflection

Installation

def deps do
  [
    {:text, "~> 0.3.0"}
  ]
end

For the language identifier, fetch the lid.176.bin model once after install:

mix text.download_lid176

For production environments that use the optional Bumblebee-backed modules, mix text.download_models (note the plural) pre-fetches every external artefact (lid.176.bin plus the default Hugging Face checkpoints), so the first call to each module never hits the network.

A taste

# Sentiment — multilingual, no model download by default.
Text.Sentiment.analyze("J'adore ce livre.", language: :fr).label
#=> :positive

# Language identification — load the fastText model once.
{:ok, model} = Text.Language.Classifier.Fasttext.ModelLoader.load(
  Path.join(:code.priv_dir(:text), "lid_176/lid.176.bin")
)

{:ok, "es"} = Text.Language.Classifier.Fasttext.classify("Hola, ¿cómo estás?", model)

# Word cloud → SVG file in four piped steps.
# `text` is assumed to be a string bound earlier, such as an article body.
text
|> Text.WordCloud.terms(language: :en)
|> Text.WordCloud.Layout.layout(width: 800, height: 600, rotations: :radial)
|> Text.WordCloud.SVG.render(palette: Color.Palette.tonal("#3b82f6"))
|> then(&File.write!("cloud.svg", &1))

Guides

In-depth walkthroughs with worked examples:

Optional dependencies

The package works without any optional deps. Adding them enables progressively heavier capabilities:

Dep Enables
:exla Order-of-magnitude faster inference for Fasttext and the Bumblebee-backed modules. Strongly recommended in production.
:bumblebee Neural sentiment, POS, NER, and the KeyBERT word-cloud backend.
:localize CLDR-canonical locale resolution (fr-Latn-CA, zh-Hans-CN) and Localize.LanguageTag input shapes.
:color Color.Palette.Tonal and Theme palettes for SVG word-cloud rendering.
:text_stemmer Snowball stemming (~30 languages) for word-cloud morphological-variant consolidation.

Calls that need a missing optional dep raise with installation instructions; the rest of the package keeps working.
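
The raise-on-missing-dep behaviour described above can be sketched with a small guard around Code.ensure_loaded?/1. This is a minimal illustration, not the package's actual implementation: the module OptionalDepSketch, the function ensure_dep!/3, the wording of the message, and the ">= 0.0.0" version placeholder are all assumptions made for the example.

```elixir
defmodule OptionalDepSketch do
  @moduledoc false
  # Illustrative only: this module and its function are hypothetical and
  # are not part of the Text package.

  # Returns :ok when `module` can be loaded. Otherwise raises a RuntimeError
  # whose message names the dependency to add to mix.exs.
  def ensure_dep!(module, dep, feature) when is_atom(module) and is_atom(dep) do
    if Code.ensure_loaded?(module) do
      :ok
    else
      # The version requirement below is a placeholder, not a recommendation.
      raise RuntimeError,
            "#{feature} requires the optional dependency #{inspect(dep)}. " <>
              "Add {#{inspect(dep)}, \">= 0.0.0\"} to your deps in mix.exs " <>
              "and run `mix deps.get`."
    end
  end
end

# Enum always ships with Elixir, so this call returns :ok.
OptionalDepSketch.ensure_dep!(Enum, :elixir, "Enumeration")
```

Because a guard like this would run only inside the function that needs the dependency, modules that do not call it are unaffected when the dependency is absent.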

Every public function that takes a :language (or :locale) accepts an atom (:fr), a string ("fr", "fr-CA", "zh-Hans-CN"), or a Localize.LanguageTag struct (when :localize is loaded). See Text.Language for the normalisation helpers.
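
To make the accepted input shapes concrete, here is a rough sketch of how an atom or a string tag could be reduced to its primary language subtag. It is not the Text.Language API: the module LanguageInputSketch and the function primary_subtag/1 are hypothetical, and the sketch ignores the Localize.LanguageTag struct case as well as script and region handling.

```elixir
defmodule LanguageInputSketch do
  @moduledoc false
  # Hypothetical helper for illustration; the real normalisation helpers
  # live in Text.Language and may behave differently.

  # Strings: keep everything before the first "-" or "_" and lowercase it.
  def primary_subtag(language) when is_binary(language) do
    language
    |> String.split(["-", "_"], parts: 2)
    |> hd()
    |> String.downcase()
  end

  # Atoms: convert to a string and reuse the string clause.
  def primary_subtag(language) when is_atom(language) and not is_nil(language) do
    language |> Atom.to_string() |> primary_subtag()
  end
end

LanguageInputSketch.primary_subtag(:fr)          #=> "fr"
LanguageInputSketch.primary_subtag("fr-CA")      #=> "fr"
LanguageInputSketch.primary_subtag("zh-Hans-CN") #=> "zh"
```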

Roadmap

License

Apache 2.0 — see LICENSE.md.