motif
Pure Erlang keyword and topic extraction using the RAKE algorithm. Supports French, English, and German with built-in stop-word lists. No external dependencies.
Installation
%% rebar.config
{deps, [{motif, "0.1.0"}]}.
Quick start
%% Extract from English text (language auto-detected)
Results = motif:extract(<<"Red roses are a symbol of love and beauty.">>),
%% [{<<"red roses">>, 4.0}, {<<"symbol">>, 1.0}, {<<"love">>, 1.0}, {<<"beauty">>, 1.0}]
%% Explicit language + max results
Top3 = motif:extract(Text, #{lang => fr, max => 3}),
%% Auto-detect language (samples first 200 words)
Auto = motif:extract(Text, #{lang => auto}),
%% Get the stop-word list for a language
Stops = motif:stop_words(fr).
API
%% Extract keyword candidates. Returns [{Keyword, Score}] sorted by score desc.
-spec extract(binary()) -> [{binary(), float()}].
-spec extract(binary(), #{max => pos_integer(),
                          lang => fr | en | de | auto}) -> [{binary(), float()}].
%% Return the built-in stop-word list for a language.
-spec stop_words(fr | en | de) -> [binary()].
Algorithm
RAKE (Rapid Automatic Keyword Extraction):
- Split text into sentences on . ! ?
- Within each sentence, split into candidate phrases on stop words
- Score each word: degree(word) / frequency(word), where degree(w) = sum of the lengths of the phrases containing w
- Score each candidate: sum of its word scores
- Return sorted by score descending, deduplicated
A word's score is the average length of the phrases it appears in, so long multi-word phrases whose words seldom appear on their own score highest.
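The scoring steps can be sketched in a few dozen lines of Erlang. This is an illustration only, not motif's actual source: the module name `rake_sketch`, its helper functions, and the choice to pass the stop-word list in as an argument are all assumptions made for the example. It also treats commas and similar punctuation as plain whitespace, which the library may handle differently.

```erlang
-module(rake_sketch).
-export([extract/2]).

%% Hypothetical sketch of RAKE scoring; NOT motif's implementation.
%% Stops is a list of lowercase stop-word binaries supplied by the caller.
-spec extract(binary(), [binary()]) -> [{binary(), float()}].
extract(Text, Stops) ->
    StopSet = sets:from_list(Stops),
    %% 1. Split into sentences on . ! ?
    Sentences = binary:split(string:lowercase(Text),
                             [<<".">>, <<"!">>, <<"?">>], [global]),
    %% 2. Split each sentence into candidate phrases at stop words.
    Phrases = lists:flatmap(fun(S) -> phrases(S, StopSet) end, Sentences),
    %% 3. Score words over every phrase occurrence (duplicates included).
    Scores = word_scores(Phrases),
    %% 4. Score each distinct candidate as the sum of its word scores.
    Scored = [{iolist_to_binary(lists:join(<<" ">>, P)),
               lists:sum([maps:get(W, Scores) || W <- P])}
              || P <- lists:usort(Phrases)],
    %% 5. Highest score first.
    lists:sort(fun({_, A}, {_, B}) -> A >= B end, Scored).

%% Tokenize a sentence and cut it into phrases wherever a stop word occurs.
phrases(Sentence, StopSet) ->
    Words = string:lexemes(Sentence, " ,;:\t\r\n"),
    split_on_stops(Words, StopSet, [], []).

split_on_stops([], _Stops, Cur, Acc) ->
    lists:reverse(flush(Cur, Acc));
split_on_stops([W | Rest], Stops, Cur, Acc) ->
    case sets:is_element(W, Stops) of
        true  -> split_on_stops(Rest, Stops, [], flush(Cur, Acc));
        false -> split_on_stops(Rest, Stops, [W | Cur], Acc)
    end.

%% Close the phrase being built, if it has any words.
flush([], Acc)  -> Acc;
flush(Cur, Acc) -> [lists:reverse(Cur) | Acc].

%% Word -> degree / frequency, where degree adds up the length of every
%% phrase the word appears in and frequency counts its occurrences.
word_scores(Phrases) ->
    Counts = lists:foldl(
        fun(Phrase, Acc0) ->
            Len = length(Phrase),
            lists:foldl(
                fun(W, Acc) ->
                    maps:update_with(
                        W, fun({D, F}) -> {D + Len, F + 1} end, {Len, 1}, Acc)
                end, Acc0, Phrase)
        end, #{}, Phrases),
    maps:map(fun(_W, {D, F}) -> D / F end, Counts).
```

With the stop words `are`, `a`, `of`, and `and`, the Quick start sentence splits into the phrases "red roses", "symbol", "love", and "beauty". Both `red` and `roses` have degree 2 and frequency 1, so each scores 2.0 and the phrase scores 4.0; the single-word phrases score 1.0 each.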
Language detection
lang => auto samples the first 200 words, counts stop-word hits per
language, and picks the language with the most hits. Falls back to en
on a tie or empty input.
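The detection rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's code: the module name `lang_detect_sketch`, the `detect/2` function, and the map of stop-word lists passed as an argument are hypothetical, and the sketch assumes the stop-word lists are lowercase.

```erlang
-module(lang_detect_sketch).
-export([detect/2]).

%% Hypothetical sketch; NOT motif's implementation.
%% StopLists maps a language atom to its lowercase stop-word list,
%% e.g. #{en => [...], fr => [...], de => [...]}.
-spec detect(binary(), #{atom() => [binary()]}) -> atom().
detect(Text, StopLists) ->
    %% Sample at most the first 200 words of the lowercased text.
    AllWords = string:lexemes(string:lowercase(Text), " \t\r\n.,;:!?"),
    Words = lists:sublist(AllWords, 200),
    %% Count stop-word hits per language.
    Hits = [{count_hits(Words, sets:from_list(Stops)), Lang}
            || {Lang, Stops} <- maps:to_list(StopLists)],
    %% Pick the strict winner; fall back to en on a tie or when nothing hits.
    case lists:reverse(lists:sort(Hits)) of
        [{Best, Lang}, {Second, _} | _] when Best > Second -> Lang;
        [{Best, Lang}] when Best > 0 -> Lang;
        _ -> en
    end.

%% Number of sampled words that appear in the given stop-word set.
count_hits(Words, StopSet) ->
    length([W || W <- Words, sets:is_element(W, StopSet)]).
```

To try it against the built-in lists, the map could be built with `maps:from_list([{L, motif:stop_words(L)} || L <- [fr, en, de]])`.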
License
Apache 2.0 — see LICENSE.