elasticsearch_filter

License: Apache-2.0

Erlang library for building Emergence filter agents that search Elasticsearch.

elasticsearch_filter is an intermediate library: its elasticsearch_filter_app module provides handle/2 and base_capabilities/0 for use in agent wrappers. It does not register any agent itself.


How it works

┌─────────────┐    WebSocket     ┌──────────────────────┐    HTTP/S    ┌──────────────┐
│  em_disco   │ ◄─────────────── │    your_agent_app    │ ───────────► │Elasticsearch │
│  (broker)   │  query / result  │    uses this lib     │   REST API   │  cluster(s)  │
└─────────────┘                  └──────────────────────┘              └──────────────┘
                                   fan-out per cluster
                                   ┌────────┴────────┐
                                   │ elasticsearch_  │
                                   │ filter_app      │
                                   └─────────────────┘

On each query, the library fans out across all configured clusters and indices in parallel, maps ES hits to Emergence embryos, and returns the merged list.


Requirements

Erlang/OTP 26+ and rebar3.

Add to your rebar.config:

{deps, [
    {elasticsearch_filter, "0.1.0"}
]}.

Quick start

-module(my_elastic_agent_app).
-behaviour(application).
-export([start/2, stop/1]).

start(_StartType, _StartArgs) ->
    %% Register the agent, extending the library's base capabilities.
    em_filter:start_agent(my_elastic_agent, elasticsearch_filter_app, #{
        capabilities => elasticsearch_filter_app:base_capabilities()
                        ++ [<<"my_domain">>, <<"docs">>]
    }),
    {ok, self()}.

stop(_State) ->
    em_filter:stop_agent(my_elastic_agent).

Place elastic_config.json in the working directory — see Configuration.


The Filter interface

handle(Body :: binary(), Memory :: map()) -> {[Embryo], Memory}
base_capabilities()                       -> [binary()]

handle/2 is called for every query frame from em_disco.
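
Because handle/2 takes and returns Memory, an agent wrapper can layer its own state on top of it. The following is a minimal sketch, assuming a wrapper module (here the hypothetical wrapper_sketch) that delegates to elasticsearch_filter_app:handle/2 and keeps a query counter under the made-up key query_count. Whether em_filter accepts such a wrapper module in place of elasticsearch_filter_app is an assumption, not something this README states.

```erlang
-module(wrapper_sketch).
-export([handle/2, handle_with/3]).

%% Delegates to the library, then updates wrapper-specific state.
handle(Body, Memory) ->
    handle_with(fun elasticsearch_filter_app:handle/2, Body, Memory).

%% Inner is any fun with the same shape as handle/2; passing it explicitly
%% makes the wrapper easy to exercise without a running cluster.
handle_with(Inner, Body, Memory) when is_binary(Body), is_map(Memory) ->
    {Embryos, Memory1} = Inner(Body, Memory),
    %% query_count is a hypothetical key owned by this wrapper only.
    Count = maps:get(query_count, Memory1, 0),
    {Embryos, Memory1#{query_count => Count + 1}}.
```

handle_with/3 can be called with a stub fun in place of the real handler to check the wrapper logic in isolation.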

Query auto-detection

| Input | Strategy |
|---|---|
| Plain text: "erlang otp" | multi_match with fuzziness across configured fields |
| ES syntax: "title:erlang", "a AND b", "date:[2024 TO *]" | query_string passed as-is to Elasticsearch |
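
The choice between the two strategies can be pictured with the sketch below. The detection rule shown (a field:value pair, a boolean operator, or a range bracket) is a guess for illustration and not necessarily the heuristic the library uses; the module name query_detect_sketch and the fuzziness value "AUTO" are likewise assumptions.

```erlang
-module(query_detect_sketch).
-export([looks_like_es_syntax/1, build_query/2]).

%% Hypothetical heuristic: field:value, AND/OR/NOT, or a [a TO b] range.
looks_like_es_syntax(Q) when is_binary(Q) ->
    Pattern = <<"(\\w:\\S)|( AND )|( OR )|( NOT )|(\\[.+ TO .+\\])">>,
    re:run(Q, Pattern, [{capture, none}]) =:= match.

%% Builds the body of an Elasticsearch _search request as an Erlang map.
build_query(Q, Fields) when is_binary(Q), is_list(Fields) ->
    case looks_like_es_syntax(Q) of
        true ->
            %% ES syntax is forwarded untouched.
            #{<<"query">> => #{<<"query_string">> => #{<<"query">> => Q}}};
        false ->
            %% Plain text is matched fuzzily across the configured fields.
            #{<<"query">> =>
                  #{<<"multi_match">> =>
                        #{<<"query">> => Q,
                          <<"fields">> => Fields,
                          <<"fuzziness">> => <<"AUTO">>}}}
    end.
```

A simple pattern like this misclassifies some plain-text inputs (a pasted URL contains a colon, for example), which is one reason to treat it only as an illustration.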

Result format

Each hit is mapped to an Emergence embryo:

| embryo_type | Embryo type | Required ES fields |
|---|---|---|
| "url" (default) | url | url_field, title_field, resume_field |
| "text" | text | content_field |

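The mapping from a hit to an embryo might look roughly like the sketch below. The embryo shape (a map with type and properties keys) and the module name hit_to_embryo_sketch are assumptions made for illustration; the field names, defaults, and the 300-character truncation of resume come from the Index options in this README. The sketch also assumes the mapped fields hold binaries.

```erlang
-module(hit_to_embryo_sketch).
-export([to_embryo/2]).

%% Hit is one decoded element of hits.hits; IndexCfg is the index entry
%% from elastic_config.json, decoded to a map with binary keys.
to_embryo(#{<<"_source">> := Src}, IndexCfg) ->
    case maps:get(<<"embryo_type">>, IndexCfg, <<"url">>) of
        <<"text">> ->
            Content = field(<<"content_field">>, <<"content">>, IndexCfg, Src),
            embryo(<<"text">>, #{<<"content">> => Content});
        <<"url">> ->
            Resume = field(<<"resume_field">>, <<"body">>, IndexCfg, Src),
            embryo(<<"url">>,
                   #{<<"url">> => field(<<"url_field">>, <<"url">>, IndexCfg, Src),
                     <<"title">> => field(<<"title_field">>, <<"title">>, IndexCfg, Src),
                     %% string:slice/3 counts grapheme clusters, not bytes.
                     <<"resume">> => string:slice(Resume, 0, 300)})
    end.

%% Hypothetical embryo layout; the real Emergence structure may differ.
embryo(Type, Props) ->
    #{<<"type">> => Type, <<"properties">> => Props}.

%% Looks up which ES field to read, then reads it from _source.
field(CfgKey, Default, IndexCfg, Src) ->
    maps:get(maps:get(CfgKey, IndexCfg, Default), Src, <<>>).
```
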
Configuration

elastic_config.json

{
  "clusters": [
    {
      "url": "https://my-cluster.es.example.com:9200",
      "auth": {
        "type": "api_key",
        "key": "your_encoded_api_key"
      },
      "indices": [
        {
          "name": "articles",
          "search_fields": ["title", "body", "tags"],
          "url_field": "url",
          "title_field": "title",
          "resume_field": "body"
        },
        {
          "name": "logs",
          "embryo_type": "text",
          "search_fields": ["message", "service"],
          "content_field": "message"
        }
      ]
    }
  ],
  "timeout": 10,
  "result_size": 10
}

Authentication

| auth.type | Required fields | Header sent |
|---|---|---|
| "api_key" | key (pre-encoded Elastic API key) | Authorization: ApiKey <key> |
| "basic" | username, password | Authorization: Basic <base64(user:pass)> |
| (absent) | none | No auth header |
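
The table above translates fairly directly into a header-building function. This is a sketch only: the module name auth_header_sketch, the use of undefined for an absent auth block, and the header representation (a list of binary pairs) are assumptions, not the library's internals.

```erlang
-module(auth_header_sketch).
-export([auth_headers/1]).

%% Takes the decoded "auth" object of a cluster (binary keys),
%% or the atom undefined when the cluster has no auth block.
auth_headers(#{<<"type">> := <<"api_key">>, <<"key">> := Key}) ->
    %% The key is already encoded, so it is sent as-is.
    [{<<"Authorization">>, <<"ApiKey ", Key/binary>>}];
auth_headers(#{<<"type">> := <<"basic">>,
               <<"username">> := User,
               <<"password">> := Pass}) ->
    Cred = base64:encode(<<User/binary, ":", Pass/binary>>),
    [{<<"Authorization">>, <<"Basic ", Cred/binary>>}];
auth_headers(undefined) ->
    [].
```

For example, auth_headers(#{<<"type">> => <<"basic">>, <<"username">> => <<"user">>, <<"password">> => <<"pass">>}) yields an Authorization header with the value Basic dXNlcjpwYXNz.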

Index options

| Key | Default | Description |
|---|---|---|
| name | | Elasticsearch index name (required) |
| search_fields | ["title", "body"] | Fields searched by multi_match |
| embryo_type | "url" | "url" or "text" |
| url_field | "url" | ES field mapped to url property |
| title_field | "title" | ES field mapped to title property |
| resume_field | "body" | ES field mapped to resume property (truncated at 300 chars) |
| content_field | "content" | ES field mapped to content property (text embryo only) |

Top-level options

| Key | Default | Description |
|---|---|---|
| timeout | 10 | Per-query timeout in seconds |
| result_size | 10 | Max hits per index (the size parameter of _search) |
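
One way to picture how the documented defaults apply is to merge them under the decoded configuration. The sketch below assumes elastic_config.json has already been decoded into maps with binary keys; the module name config_defaults_sketch and the decision to crash on an index without a name are illustrative choices, not documented behaviour.

```erlang
-module(config_defaults_sketch).
-export([normalize/1]).

%% Fills in top-level and per-index defaults from the tables above.
normalize(Config) when is_map(Config) ->
    Top = maps:merge(#{<<"timeout">> => 10, <<"result_size">> => 10}, Config),
    Clusters = [normalize_cluster(C) || C <- maps:get(<<"clusters">>, Top, [])],
    Top#{<<"clusters">> => Clusters}.

normalize_cluster(Cluster) ->
    Indices = [normalize_index(I) || I <- maps:get(<<"indices">>, Cluster, [])],
    Cluster#{<<"indices">> => Indices}.

%% name is required, so an index without it fails with function_clause.
normalize_index(#{<<"name">> := _} = Index) ->
    maps:merge(index_defaults(), Index).

index_defaults() ->
    #{<<"search_fields">> => [<<"title">>, <<"body">>],
      <<"embryo_type">> => <<"url">>,
      <<"url_field">> => <<"url">>,
      <<"title_field">> => <<"title">>,
      <<"resume_field">> => <<"body">>,
      <<"content_field">> => <<"content">>}.
```

Because maps:merge/2 prefers values from its second argument, any key set explicitly in the file overrides the default.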

Multi-cluster

The library fans out across all clusters in parallel using spawn. Each cluster is searched independently, and the results are merged. A cluster that times out or returns an error contributes an empty list, so it never blocks the other clusters.
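
The fan-out pattern described above can be sketched as follows. This is an illustration under assumptions, not the library's code: fanout_sketch and fan_out/3 are hypothetical names, SearchFun stands in for the real per-cluster HTTP search, and the timeout is given in milliseconds here whereas the configuration file uses seconds.

```erlang
-module(fanout_sketch).
-export([fan_out/3]).

%% Runs SearchFun(Cluster) in one process per cluster and merges the
%% results in cluster order. Failures and timeouts count as [].
fan_out(Clusters, SearchFun, TimeoutMs) ->
    Parent = self(),
    Refs = [start_worker(Parent, SearchFun, C) || C <- Clusters],
    %% A single shared deadline bounds the total wait, not each wait.
    Deadline = erlang:monotonic_time(millisecond) + TimeoutMs,
    lists:append([collect(Ref, Deadline) || Ref <- Refs]).

start_worker(Parent, SearchFun, Cluster) ->
    Ref = make_ref(),
    spawn(fun() ->
              %% Any crash in the search is turned into an empty result.
              Result = try SearchFun(Cluster) catch _:_ -> [] end,
              Parent ! {Ref, Result}
          end),
    Ref.

collect(Ref, Deadline) ->
    Remaining = max(0, Deadline - erlang:monotonic_time(millisecond)),
    receive
        {Ref, Result} when is_list(Result) -> Result;
        {Ref, _Other} -> []
    after Remaining ->
        []
    end.
```

A reply that arrives after the deadline stays in the caller's mailbox in this sketch; a longer-lived caller would want to flush such messages or use monitors instead.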


License

Apache-2.0