Fuzler

A tiny, Rust‑powered string‑similarity helper for Elixir.

Fuzler gives you one public function:

Fuzler.similarity_score(query :: String.t(), target :: String.t()) :: float

It returns a normalised score in [0.0, 1.0] that tells you how closely two pieces of text match. The score is robust to typos, word-order swaps, case differences and basic punctuation.

Behind the scenes it calls a compiled Rust NIF that mixes:

Hamming distance – for very short, nearly equal‑length strings.
SIMD Levenshtein – fast edit distance from the triple_accel crate.
Token‑bag Jaccard – ignores word order.
Partial‑ratio window – finds the best‑matching snippet when the target is much longer than the query.
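
To make the order-agnostic part concrete, here is a minimal pure-Elixir sketch of a Jaccard score over word tokens. It is an illustration only, not Fuzler's Rust code: the module name TokenBagSketch, the tokenisation rules, and the use of sets rather than true multisets are all assumptions made for brevity.

```elixir
defmodule TokenBagSketch do
  @moduledoc "Illustrative only; not Fuzler's Rust implementation."

  # Lowercase, drop everything except letters, digits and whitespace,
  # split on whitespace, and collect the unique tokens into a set.
  def tokens(text) do
    text
    |> String.downcase()
    |> String.replace(~r/[^\p{L}\p{N}\s]/u, "")
    |> String.split()
    |> MapSet.new()
  end

  # Jaccard index: |A ∩ B| / |A ∪ B|.
  # Two inputs with no tokens at all are treated as identical.
  def jaccard(a, b) do
    {ta, tb} = {tokens(a), tokens(b)}
    union = MapSet.union(ta, tb) |> MapSet.size()

    if union == 0 do
      1.0
    else
      MapSet.size(MapSet.intersection(ta, tb)) / union
    end
  end
end

TokenBagSketch.jaccard("bella ciao", "Ciao, bella!")
# => 1.0
TokenBagSketch.jaccard("ciao bella", "ciao mondo")
# => 0.3333333333333333  (1 shared token out of 3 distinct tokens)
```

Because the tokens end up in a set, word order never enters the calculation, which is why this component on its own cannot tell "bella ciao" from "ciao bella".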

The result is symmetric (score(a, b) ≈ score(b, a)) and length-normalised, and it remains meaningful from single words to multi-sentence paragraphs.

Installation

Add to your mix.exs:

def deps do
  [
    {:fuzler, "~> 0.1.2"}
  ]
end

You need Rust ≥ 1.70 installed; rustler will compile the NIF automatically.

Quick examples

iex> Fuzler.similarity_score("ciao", "ciao")
1.0

iex> Fuzler.similarity_score("bella ciao", "ciao bella")
0.70       # same words, different order

iex> long_text = "bella ciao come va oggi spero che tu stia bene ..."
iex> Fuzler.similarity_score("ciao", long_text)
0.75       # query appears once inside a 40‑token paragraph

iex> Fuzler.similarity_score("bonjour", long_text)
0.12       # word not present

When should I use it?

Use case	Why it works well
typo‑tolerant autocomplete / “did‑you‑mean”	Hamming + Levenshtein catch small edits fast
matching short queries inside long blobs	windowed partial ratio focuses on the best slice
order‑agnostic key comparison	token‑bag Jaccard treats “ciao bella” = “bella ciao”
quick relevance scoring in Elixir	pure NIF call, no external service needed
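
As a rough sketch of the "did-you-mean" use case, the helper below scores a query against a list of candidates and returns the best one above a cut-off. The module name DidYouMean, the 0.6 threshold and the injectable scorer argument are hypothetical choices for this example; only Fuzler.similarity_score/2 comes from the library.

```elixir
defmodule DidYouMean do
  # `scorer` is any 2-arity function returning a float in [0.0, 1.0].
  # It defaults to Fuzler.similarity_score/2; passing another function
  # makes the helper usable (and testable) without the NIF.
  # The 0.6 default threshold is an arbitrary starting point, not a
  # value recommended by Fuzler.
  def suggest(query, candidates, min_score \\ 0.6, scorer \\ &Fuzler.similarity_score/2) do
    scored =
      candidates
      |> Enum.map(fn candidate -> {candidate, scorer.(query, candidate)} end)
      |> Enum.filter(fn {_candidate, score} -> score >= min_score end)

    # Return nil when nothing clears the threshold, otherwise the
    # {candidate, score} pair with the highest score.
    case scored do
      [] -> nil
      list -> Enum.max_by(list, fn {_candidate, score} -> score end)
    end
  end
end

# Usage, assuming Fuzler is installed and its NIF is loaded:
# DidYouMean.suggest("ciaoo", ["ciao", "bella", "bonjour"])
```

Tune the threshold against your own data; the score table in this README gives a feel for which ranges separate matches from non-matches.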

Fuzler is not a full-text search engine or a semantic synonym matcher; use Tantivy or embeddings for those.

API

@doc "Returns a similarity score ∈ [0.0, 1.0]"
@spec similarity_score(String.t(), String.t()) :: float

If the NIF failed to load, calling the function raises the error produced by:

:erlang.nif_error(:nif_not_loaded)

so your code can decide to fall back or skip tests.
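
One possible way to fall back is sketched below, assuming the error surfaces exactly as described above. The wrapper module SafeSimilarity and the choice of String.jaro_distance/2 (from Elixir's standard library) as the fallback are assumptions for this example; the fallback uses a different metric, so its scores are not comparable to Fuzler's.

```elixir
defmodule SafeSimilarity do
  # Hypothetical wrapper: call the scorer (Fuzler by default) and fall
  # back to String.jaro_distance/2 if the NIF was not loaded.
  def score(query, target, scorer \\ &Fuzler.similarity_score/2) do
    scorer.(query, target)
  rescue
    error in ErlangError ->
      case error do
        # :erlang.nif_error(:nif_not_loaded) is normalised by Elixir
        # into an ErlangError whose :original field holds the reason.
        %ErlangError{original: :nif_not_loaded} ->
          String.jaro_distance(query, target)

        # Any other error is not ours to handle; re-raise it unchanged.
        _ ->
          reraise error, __STACKTRACE__
      end
  end
end
```

If mixing two metrics is not acceptable in your application, returning an {:error, :nif_not_loaded} tuple instead of a fallback score is a reasonable alternative.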

How good is the score?

Query / Target	Score ≈
identical strings (any case / punctuation)	1.00
same words, swapped order	0.68 – 0.72
one‑word query present once in 45‑token paragraph	~0.75
one‑word query absent from paragraph	≤ 0.15
80‑token paragraph vs same with 1 typo	≥ 0.90
“ciao bella” with +30 random filler tokens appended	~0.58

Running the test suite

mix test runs a handful of ExUnit cases covering:

case & punctuation variations
word‑order permutations
query present / absent in long paragraph (> 40 tokens)
very long strings with tiny edits
monotonic drop as filler tokens grow

All similarity tests auto‑skip if the NIF isn’t loaded (e.g. on CI without Rust).
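
If you want the same behaviour in your own project, one way to get it is with ExUnit tags. This is a sketch, not a copy of Fuzler's test helper: the :requires_nif tag name and the probe call are assumptions.

```elixir
# test/test_helper.exs (illustrative; the project's real helper may differ)

# Probe the NIF once. Any exception is taken to mean "not available".
nif_loaded? =
  try do
    is_float(Fuzler.similarity_score("probe", "probe"))
  rescue
    _ -> false
  end

# Tests tagged with `@tag :requires_nif` are excluded when the probe fails.
exclude = if nif_loaded?, do: [], else: [:requires_nif]
ExUnit.start(exclude: exclude)
```

Excluded tests are reported as excluded by mix test rather than silently dropped, so a missing Rust toolchain stays visible in the output.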

License

MIT License