Simile


String similarity and distance algorithms for Elixir.

Algorithms

Distance (lower = more similar)
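
To show what "distance" means here, the sketch below implements two textbook algorithms. Levenshtein distance uses the classic two-row dynamic programme. Hamming distance returns {:ok, n} tuples. This is not Simile's source. The following are all assumptions:

- the module name DistanceSketch
- comparing graphemes rather than bytes
- the {:error, :unequal_length} result for strings of different length

```elixir
defmodule DistanceSketch do
  # Hypothetical module: a minimal sketch, not Simile's implementation.
  # Strings are compared grapheme by grapheme (an assumption).

  # Levenshtein distance: minimum number of single-character insertions,
  # deletions and substitutions that turn `a` into `b`.
  def levenshtein(a, b) do
    bs = String.graphemes(b)
    # Row 0: cost of building each prefix of `b` from the empty string.
    first_row = Enum.to_list(0..length(bs))

    a
    |> String.graphemes()
    |> Enum.with_index(1)
    |> Enum.reduce(first_row, fn {ca, i}, prev_row ->
      # Pair each char of `b` with the previous row's cell diagonally
      # above-left (diag) and directly above (up).
      cells = Enum.zip([bs, prev_row, tl(prev_row)])

      # `left` is the cell just computed in the current row.
      {row_rev, _left} =
        Enum.reduce(cells, {[i], i}, fn {cb, diag, up}, {acc, left} ->
          cost = if ca == cb, do: 0, else: 1
          value = Enum.min([up + 1, left + 1, diag + cost])
          {[value | acc], value}
        end)

      Enum.reverse(row_rev)
    end)
    |> List.last()
  end

  # Hamming distance: number of positions at which equal-length strings
  # differ. The {:error, :unequal_length} shape is a hypothetical choice.
  def hamming(a, b) do
    ga = String.graphemes(a)
    gb = String.graphemes(b)

    if length(ga) == length(gb) do
      {:ok, Enum.zip(ga, gb) |> Enum.count(fn {x, y} -> x != y end)}
    else
      {:error, :unequal_length}
    end
  end
end
```

DistanceSketch.levenshtein("kitten", "sitting") returns 3 and DistanceSketch.hamming("karolin", "kathrin") returns {:ok, 3}. These are the same values shown for Simile in the Usage examples.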

Similarity (0.0 to 1.0, higher = more similar)
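
To show how a score ends up on the 0.0 to 1.0 scale, the sketch below computes two similarities:

- Sørensen-Dice over sets of character n-grams (bigrams by default)
- Jaro-Winkler, which adds a common-prefix boost to the Jaro score from Elixir's built-in String.jaro_distance/2

The following are assumptions, not Simile's code:

- the module name SimilaritySketch
- the set-based n-gram counting
- the fallback for strings too short to yield any n-gram
- applying the Winkler boost unconditionally

```elixir
defmodule SimilaritySketch do
  # Hypothetical module: a minimal sketch, not Simile's implementation.

  # Sorensen-Dice coefficient over the *sets* of character n-grams:
  # 2 * |A intersect B| / (|A| + |B|), always between 0.0 and 1.0.
  def sorensen_dice(a, b, n \\ 2) do
    x = ngrams(a, n)
    y = ngrams(b, n)
    total = MapSet.size(x) + MapSet.size(y)

    if total == 0 do
      # Assumed convention when neither string yields any n-gram.
      if a == b, do: 1.0, else: 0.0
    else
      2 * MapSet.size(MapSet.intersection(x, y)) / total
    end
  end

  # Overlapping n-grams, e.g. "night" -> ni, ig, gh, ht for n = 2.
  defp ngrams(s, n) do
    s
    |> String.graphemes()
    |> Enum.chunk_every(n, 1, :discard)
    |> MapSet.new()
  end

  # Jaro-Winkler: boost the Jaro score j by the length l of the common
  # prefix (capped at 4), scaled by p. This sketch always applies the
  # boost; some variants only do so when j exceeds a threshold like 0.7.
  def jaro_winkler(a, b, p \\ 0.1) do
    j = String.jaro_distance(a, b)

    l =
      Enum.zip(String.graphemes(a), String.graphemes(b))
      |> Enum.take(4)
      |> Enum.take_while(fn {x, y} -> x == y end)
      |> length()

    j + l * p * (1 - j)
  end
end
```

For "night" and "nacht" the bigram sets share only "ht", so the Dice score is 2 * 1 / (4 + 4) = 0.25. For "martha" and "marhta", Jaro is about 0.944. The shared prefix "mar" (l = 3) lifts it to 0.944 + 3 * 0.1 * (1 - 0.944), which is about 0.961.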

Usage

Simile.levenshtein("kitten", "sitting")        #=> 3
Simile.damerau_levenshtein("abc", "bac")       #=> 1
Simile.osa_distance("abc", "bac")              #=> 1
Simile.hamming("karolin", "kathrin")           #=> {:ok, 3}
Simile.indel("kitten", "sitting")              #=> 5
Simile.lcs("kitten", "sitting")                #=> 4
Simile.ngram_distance("night", "nacht", 2)     #=> 0.75

Simile.jaro("martha", "marhta")                #=> 0.944...
Simile.jaro_winkler("martha", "marhta")        #=> 0.961...
Simile.sorensen_dice("night", "nacht")         #=> 0.25

Simile.normalized_levenshtein("kitten", "sitting")   #=> 0.428...
Simile.normalized_indel("kitten", "sitting")         #=> 0.714...
Simile.indel_similarity("kitten", "sitting")         #=> 0.285...
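
The indel and lcs results above are linked. If only insertions and deletions are allowed, every character outside a longest common subsequence must be deleted from one string or inserted from the other. So indel(a, b) = |a| + |b| - 2 * lcs(a, b). For kitten and sitting that is 6 + 7 - 2 * 4 = 5.

The sketch below checks this identity with a textbook LCS dynamic programme. The module name IndelSketch and the grapheme-level comparison are assumptions, not Simile's code.

```elixir
defmodule IndelSketch do
  # Hypothetical module: a minimal sketch, not Simile's implementation.

  # Length of the longest common subsequence, via the classic
  # dynamic programme kept to two rows.
  def lcs(a, b) do
    bs = String.graphemes(b)
    first_row = List.duplicate(0, length(bs) + 1)

    a
    |> String.graphemes()
    |> Enum.reduce(first_row, fn ca, prev_row ->
      # diag = prev[j-1], up = prev[j]; `left` is cur[j-1].
      cells = Enum.zip([bs, prev_row, tl(prev_row)])

      {row_rev, _left} =
        Enum.reduce(cells, {[0], 0}, fn {cb, diag, up}, {acc, left} ->
          value = if ca == cb, do: diag + 1, else: max(up, left)
          {[value | acc], value}
        end)

      Enum.reverse(row_rev)
    end)
    |> List.last()
  end

  # Insertion/deletion-only edit distance, derived from the LCS length.
  def indel(a, b) do
    String.length(a) + String.length(b) - 2 * lcs(a, b)
  end
end
```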

Matching

Simile.best_match("elxir", ["elixir", "erlang", "elm"])
#=> [{"elixir", 0.94...}]

Simile.best_match("rb", ["ruby", "rust", "python"], top: 2)
#=> [{"ruby", ...}, {"rust", ...}]

Simile.filter("elxir", ["elixir", "erlang", "elm"], min_score: 0.8)
#=> [{"elixir", 0.94...}]

Both best_match and filter accept a :by option that takes any two-argument scoring function:

Simile.best_match("night", ["nacht", "nite", "day"],
  by: &Simile.sorensen_dice/2
)
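
One way to read the matching helpers is as a four-step pipeline:

1. Score every candidate against the query.
2. Drop candidates that score under :min_score.
3. Sort by descending score.
4. Keep the first :top entries.

The sketch below follows that reading. The following are all assumptions:

- the module name MatchSketch
- the defaults (min_score 0.0, top 1)
- Elixir's built-in String.jaro_distance/2 as the fallback scorer

Simile's own default scorer is not stated here, so the scores this sketch produces will not necessarily match the figures shown above.

```elixir
defmodule MatchSketch do
  # Hypothetical module: a minimal sketch of the matching behaviour,
  # not Simile's implementation.

  # Score every candidate, keep those at or above :min_score,
  # and sort from best to worst.
  def filter(query, candidates, opts \\ []) do
    # Any 2-arity function returning a number can be passed as :by;
    # String.jaro_distance/2 is only an assumed stand-in default.
    scorer = Keyword.get(opts, :by, &String.jaro_distance/2)
    min_score = Keyword.get(opts, :min_score, 0.0)

    candidates
    |> Enum.map(fn candidate -> {candidate, scorer.(query, candidate)} end)
    |> Enum.filter(fn {_candidate, score} -> score >= min_score end)
    |> Enum.sort_by(fn {_candidate, score} -> score end, :desc)
  end

  # Return the :top best-scoring candidates (default: just the best one).
  def best_match(query, candidates, opts \\ []) do
    top = Keyword.get(opts, :top, 1)

    query
    |> filter(candidates, opts)
    |> Enum.take(top)
  end
end
```

For example, MatchSketch.best_match("rb", ["ruby", "rust", "python"], top: 2) ranks ruby first and rust second. Passing by: with a different two-argument function changes the scorer without touching the ranking logic.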

Installation

Add simile to the list of dependencies in mix.exs:

def deps do
  [
    {:simile, "~> 0.1.0"}
  ]
end

Documentation: hexdocs.pm/simile

License

MIT