Snowball
Snowball string-processing language compiler and runtime for Elixir.
Snowball is a small string-processing language designed for creating stemming algorithms in information retrieval. This package compiles .sbl source files into Elixir modules and provides the runtime support functions that the generated modules call into.
This package does not bundle pre-compiled stemmers. For the canonical Snowball algorithms compiled to Elixir modules (English, French, German, and 30+ more), see the companion text_stemmer package.
What's in the box
Snowball.Lexer,Snowball.Preprocessor,Snowball.Analyser,Snowball.Generator— the four stages of the compiler pipeline.Snowball.Runtime— the runtime helpers (string buffer state, character class membership,find_amongdispatch tables, etc.) that every generated stemmer module calls into.mix snowball.gen— Mix task that walks a directory of.sblsources, runs each through the pipeline, and writes generated.exfiles to disk.
Installation
Add :snowball to your mix.exs deps:
def deps do
[
{:snowball, "~> 0.1"}
]
end
Generating stemmers from .sbl sources
Drop your Snowball sources in src/algorithms/ and run:
mix snowball.gen
By default this generates Snowball.Stemmers.<Lang> modules into lib/snowball/stemmers/. Override either with the relevant flag:
mix snowball.gen --module-prefix MyApp.Stemmers \
--output-dir lib/my_app/stemmers \
--algorithms-dir priv/snowballYou can also generate a specific algorithm by name:
mix snowball.gen english french
The generated modules depend only on Snowball.Runtime for their runtime, so adding :snowball to your deps is sufficient. See text_stemmer for the canonical implementation of snowball-based stemmers.
Documentation
Full API documentation is published at https://hexdocs.pm/snowball.
License
Apache-2.0. See LICENSE.md.