Bubblescript Matching Language (BML)

BML is a rule language for matching natural language against a rule base. Think of it as regular expressions for sentences. Whereas regular expressions work on individual characters, BML rules primarily work on a tokenized representation of the string.

BML ships with a built-in string tokenizer, but for production use you should consider a language-specific tokenizer, for example by feeding BML the output of spaCy's Doc.to_json function.

This project is still in development, so the BML syntax is subject to change.

To play with BML, check out the demo environment, powered by Phoenix LiveView.

Examples

Matching basic sequences of words:

| Match string | Example           | Matches? |
| ------------ | ----------------- | -------- |
| hello world  | Hello, world!     | yes      |
| hello world  | Well hello world  | yes      |
| hello world  | hello there world | no       |
| hello world  | world hello       | no       |
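
The rows above suggest that a plain word sequence matches when its words appear next to each other, in order, anywhere in the input, ignoring case and punctuation. The following is a minimal sketch of that idea in plain Elixir. It does not use BML and is not how the library is implemented; the module name SequenceSketch and its functions are hypothetical, and the regex-based tokenizer is a simplifying assumption.

```elixir
defmodule SequenceSketch do
  # Hypothetical tokenizer: lowercase the text and keep only runs of
  # letters, digits, and apostrophes, so punctuation is dropped.
  def tokenize(text) do
    Regex.scan(~r/[\p{L}\p{N}']+/u, String.downcase(text))
    |> List.flatten()
  end

  # True when the words of `rule` occur as a contiguous, ordered run
  # somewhere in the tokens of `input`.
  def matches?(rule, input) do
    contiguous?(tokenize(input), tokenize(rule))
  end

  defp contiguous?(_tokens, []), do: true
  defp contiguous?([], _words), do: false

  defp contiguous?([_ | rest] = tokens, words) do
    List.starts_with?(tokens, words) or contiguous?(rest, words)
  end
end
```

With this sketch, SequenceSketch.matches?("hello world", "Well hello world") is true, while "hello there world" and "world hello" do not match, mirroring the table.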

Matching regular expressions:

| Match string | Example | Matches? |
| ------------ | ------- | -------- |
| /[a-z]+/     | abcd    | yes      |

Matching entities, with the help of spaCy and Duckling to preprocess and tokenize the input:

| Match string | Matches                           | Does not match  |
| ------------ | --------------------------------- | --------------- |
| [person]     | George Baker                      | Hello world     |
| [time]       | I walked to the store yesterday   | My name is John |

Rules overview

The match syntax is composed of adjacent, and optionally nested, rules. Each individual rule has the following syntax:

Rule modifiers

Any rule can have a [] block which contains a repetition modifier and/or a capture expression.

Entity blocks are automatically captured under the name of their entity kind.

Sentences

Expression matching works on a per-sentence basis; the idea is that it does not make sense to create expressions that span multiple sentences.

The built-in sentence tokenizer (BubbleMatch.Sentence.Tokenizer) has no concept of sentences, and thus treats each input as a single sentence, even when the input contains periods.

However, the preferred way of using this library is to run the input through an NLP preprocessor like spaCy, which does split the input into individual sentences.
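
To make the difference concrete, here is a minimal sketch in plain Elixir that compares treating the whole input as one sentence with matching each sentence separately. It does not call BML or spaCy; the module SentenceSketch, its function names, and the naive punctuation-based sentence splitter are all hypothetical stand-ins chosen for illustration.

```elixir
defmodule SentenceSketch do
  # Hypothetical splitter: break after ., ! or ? when followed by whitespace.
  # A real NLP preprocessor does this far more carefully.
  def split_sentences(input) do
    String.split(input, ~r/(?<=[.!?])\s+/, trim: true)
  end

  # Lowercased word tokens with punctuation dropped.
  def tokens(text) do
    Regex.scan(~r/[\p{L}\p{N}']+/u, String.downcase(text))
    |> List.flatten()
  end

  # Whole input treated as a single sentence.
  def match_as_one_sentence?(rule, input) do
    contains_run?(tokens(input), tokens(rule))
  end

  # Each sentence is matched on its own, so a rule cannot straddle
  # a sentence boundary.
  def match_per_sentence?(rule, input) do
    rule_tokens = tokens(rule)

    input
    |> split_sentences()
    |> Enum.any?(fn sentence -> contains_run?(tokens(sentence), rule_tokens) end)
  end

  # True when `words` occurs as a contiguous run inside `tokens`.
  defp contains_run?(_tokens, []), do: true
  defp contains_run?([], _words), do: false

  defp contains_run?([_ | rest] = tokens, words) do
    List.starts_with?(tokens, words) or contains_run?(rest, words)
  end
end
```

In this sketch, the rule "hello world" matches the input "Say hello. World peace now." when the input is treated as one sentence, but not when each sentence is matched separately, which is the kind of cross-sentence match that sentence-aware preprocessing avoids.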

Sigil

Within Elixir, you can use the ~m sigil, which parses the given BML query at compile time:

defmodule MyModule do
  use BubbleMatch.Sigil

  def greeting?(input) do
    BubbleMatch.match(~m"hello | hi | howdy", input) != :nomatch
  end
end

Installation

If available in Hex, the package can be installed by adding bubble_match to your list of dependencies in mix.exs:

def deps do
  [
    {:bubble_match, "~> 0.1.0"}
  ]
end

Documentation can be generated with ExDoc and published on HexDocs. Once published, the docs can be found at https://hexdocs.pm/bubble_match.