em_filter


An Erlang library for building Emergence agents connected to an em_disco discovery service.

Features

Concepts

Every node in the Emergence system is an agent. An agent has two optional features: capabilities, which it announces to em_disco, and memory, which persists state between queries.

Memory is best used for caching expensive operations (HTTP responses, DNS lookups, rate limit state). Do not use memory to deduplicate results — deduplication is handled upstream by the Emquest pipeline.
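Rate limit state is one of the uses named above. The module below is a minimal sketch, not part of em_filter: the module name rate_limit_sketch, the function allow/3 and the memory key last_call_ms are all hypothetical. It permits at most one upstream call per interval and records the time of the last permitted call in the agent's memory map.

```erlang
-module(rate_limit_sketch).
-export([allow/3]).

%% Hypothetical helper: returns {true, NewMemory} if an upstream call is
%% allowed at NowMs, or {false, Memory} if the last permitted call was
%% less than MinIntervalMs ago. The key last_call_ms is an assumption.
-spec allow(map(), integer(), non_neg_integer()) -> {boolean(), map()}.
allow(Memory, NowMs, MinIntervalMs) ->
    case maps:get(last_call_ms, Memory, undefined) of
        Last when is_integer(Last), NowMs - Last < MinIntervalMs ->
            %% Too soon: refuse and leave memory untouched
            {false, Memory};
        _ ->
            %% First call, or interval elapsed: allow and record the time
            {true, Memory#{last_call_ms => NowMs}}
    end.
```

Inside handle/2 you would call it with erlang:system_time(millisecond) and return the resulting map as NewMemory.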

Handler contract

Every handler module must export handle/2:

handle(Body :: binary(), Memory :: map()) ->
    {Result :: term(), NewMemory :: map()}

Body is the raw JSON query, as a binary. Result is typically a list of embryo maps (see Embryo format below). Returning Memory unchanged as NewMemory gives stateless behaviour.
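A complete handler that satisfies this contract can be very small. The sketch below is illustrative only: the module name echo_handler and the "echo" embryo type are hypothetical. It returns no results for an empty query, otherwise wraps the raw query in a single embryo, and in both cases hands Memory back unchanged.

```erlang
-module(echo_handler).
-export([handle/2]).

%% Type spec mirroring the handle/2 contract.
-spec handle(Body :: binary(), Memory :: map()) ->
          {Result :: [map()], NewMemory :: map()}.
handle(<<>>, Memory) ->
    %% Empty query: no embryos, memory kept as-is
    {[], Memory};
handle(Body, Memory) ->
    %% Hypothetical "echo" embryo carrying the raw query as its title
    Embryo = #{<<"type">>       => <<"echo">>,
               <<"properties">> => #{<<"title">> => Body}},
    %% Memory returned unchanged: stateless behaviour
    {[Embryo], Memory}.
```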

Embryo format

Agents return a list of embryo maps:

#{
    <<"type">>       => <<"rss">>,        %% agent-defined type
    <<"properties">> => #{
        <<"url">>    => <<"https://...">>,
        <<"title">>  => <<"...">>,
        <<"resume">> => <<"...">>
    }
}
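Building embryos through a small helper keeps the key names in one place. This is a sketch under assumptions: embryo_sketch, new/4 and is_embryo/1 are hypothetical names, and the check only tests the outer shape shown above (a binary type and a properties map), not any validation em_filter itself may perform.

```erlang
-module(embryo_sketch).
-export([new/4, is_embryo/1]).

%% Build an embryo map in the documented shape.
-spec new(binary(), binary(), binary(), binary()) -> map().
new(Type, Url, Title, Resume) ->
    #{<<"type">>       => Type,
      <<"properties">> => #{<<"url">>    => Url,
                            <<"title">>  => Title,
                            <<"resume">> => Resume}}.

%% Hypothetical sanity check on the outer structure only.
-spec is_embryo(term()) -> boolean().
is_embryo(#{<<"type">> := T, <<"properties">> := P})
  when is_binary(T), is_map(P) ->
    true;
is_embryo(_) ->
    false.
```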

Installation

Add to your rebar.config:

{deps, [
    {em_filter, "1.2.0"}
]}.

Usage

Stateless agent

Announces capabilities but does not persist state between queries.

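%% Start the agent:
em_filter:start_agent(my_agent, my_handler, #{
    capabilities => [<<"search">>, <<"web">>]
}).

%% my_handler.erl
-module(my_handler).
-export([handle/2]).

handle(Body, Memory) ->
    Results = do_search(Body),   %% do_search/1 is your own search code
    {Results, Memory}.           %% Memory returned unchanged: stateless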

Agent with memory (cache)

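Here the handler caches API results in Memory, keyed by the raw query body, and the agent is started with the memory => ets option.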

%% my_handler.erl
-module(my_handler).
-export([handle/2]).

handle(Body, Memory) ->
    Cache = maps:get(cache, Memory, #{}),
    case maps:get(Body, Cache, undefined) of
        undefined ->
            %% Cache miss: call the expensive API (your own fetch_from_api/1)
            Results  = fetch_from_api(Body),
            NewCache = Cache#{Body => Results},
            {Results, Memory#{cache => NewCache}};
        Cached ->
            %% Cache hit: reuse the stored results, memory unchanged
            {Cached, Memory}
    end.

%% Start the agent with memory enabled:
em_filter:start_agent(my_agent, my_handler, #{
    capabilities => [<<"search">>],
    memory       => ets
}).
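The cache above never expires, so stale results are served forever. One possible variant stores a timestamp with each entry and refetches after a fixed window. This is a sketch, not em_filter behaviour: ttl_cache_handler, lookup/5, the 300-second TTL and the fetch_from_api/1 stub are all hypothetical. The caching logic sits in a pure function so it can be exercised without a network.

```erlang
-module(ttl_cache_handler).
-export([handle/2, lookup/5]).

-define(TTL_SECONDS, 300).  %% hypothetical expiry window

handle(Body, Memory) ->
    Now   = erlang:system_time(second),
    Cache = maps:get(cache, Memory, #{}),
    {Results, NewCache} =
        lookup(Body, Cache, Now, ?TTL_SECONDS, fun fetch_from_api/1),
    {Results, Memory#{cache => NewCache}}.

%% Entries are stored as {Results, StoredAt}. Fresh entries are returned
%% as-is; missing or expired ones are fetched again and overwritten.
%% Expired entries are only replaced when looked up, never evicted.
lookup(Key, Cache, Now, Ttl, Fetch) ->
    case maps:get(Key, Cache, undefined) of
        {Results, StoredAt} when Now - StoredAt < Ttl ->
            {Results, Cache};
        _ ->
            Results = Fetch(Key),
            {Results, Cache#{Key => {Results, Now}}}
    end.

%% Placeholder for the expensive call; replace with your own.
fetch_from_api(_Body) -> [].
```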

Multi-disco connectivity

An agent connects to every disco node listed in emergence.conf. Each node gets its own persistent WebSocket connection and worker process.

[em_disco]
nodes = localhost:8080, em-disco.roques.me

With this config, start_agent/3 automatically spawns two workers, one per node.
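To illustrate how the comma-separated nodes value maps to one worker per entry, here is a hypothetical parser. It is not em_filter's own code: nodes_sketch and parse_nodes/1 are invented names, and a node without an explicit port is tagged default because the library's actual port and transport rules are not reproduced here.

```erlang
-module(nodes_sketch).
-export([parse_nodes/1]).

%% Split a "nodes = ..." value into {Host, Port} entries, one per node.
%% Port is `default` when omitted (real resolution rules not modelled).
-spec parse_nodes(string()) -> [{string(), pos_integer() | default}].
parse_nodes(Value) ->
    Entries = [string:trim(E) || E <- string:split(Value, ",", all)],
    [parse_node(E) || E <- Entries, E =/= ""].

parse_node(Entry) ->
    %% Split on the last ':' so only a trailing port is taken
    case string:split(Entry, ":", trailing) of
        [Host, Port] -> {Host, list_to_integer(Port)};
        [Host]       -> {Host, default}
    end.
```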

Port and transport resolution:

Configuration

The em_disco address is resolved in this order:

  1. [em_disco] nodes in emergence.conf (recommended)
  2. EM_DISCO_HOST / EM_DISCO_PORT environment variables (legacy, single node)
  3. Default: localhost:8080
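The three-step order can be expressed as a single fallback function. This is a hypothetical sketch: resolve_sketch and resolve/3 are invented names, ConfNodes is assumed to be the already-parsed [em_disco] nodes list, and filling a missing host or port from the defaults when only one environment variable is set is an assumption, not documented behaviour.

```erlang
-module(resolve_sketch).
-export([resolve/1, resolve/3]).

%% Read the legacy environment variables (os:getenv/1 returns false if unset).
resolve(ConfNodes) ->
    resolve(ConfNodes, os:getenv("EM_DISCO_HOST"), os:getenv("EM_DISCO_PORT")).

%% 1. Nodes from emergence.conf win when present.
resolve([_ | _] = ConfNodes, _Host, _Port) ->
    ConfNodes;
%% 3. Neither env var set: default node.
resolve([], false, false) ->
    [{"localhost", 8080}];
%% 2. Legacy single node from env vars (missing half filled from defaults).
resolve([], Host, Port) ->
    [{or_default(Host, "localhost"), port(Port)}].

or_default(false, Default) -> Default;
or_default(Value, _Default) -> Value.

port(false) -> 8080;
port(Port)  -> list_to_integer(Port).
```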

emergence.conf locations:

Full example:

[em_disco]
nodes = localhost:8080, em-disco.roques.me

HTML utilities

The following helpers are available for agents that scrape HTML:

Function                  Description
strip_scripts/1           Removes <script> tags
extract_elements/2        CSS-style element extraction
get_text/1                Strips all HTML tags
extract_attribute/2       Extracts a tag attribute value
clean_text/3              Strips noise and decodes entities
decode_html_entities/1    Decodes &amp;, &#x...;, &#...;
should_skip_link/2        Filters out unwanted URLs
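To show what entity decoding involves, here is a standalone sketch. It is not em_filter's decode_html_entities/1 implementation; entity_sketch and decode/1 are hypothetical names. It handles &amp;, &lt;, &gt; and numeric references in decimal (&#NN;) and hexadecimal (&#xNN;) form, emitting UTF-8, and copies malformed references through unchanged.

```erlang
-module(entity_sketch).
-export([decode/1]).

%% Illustrative decoder for a UTF-8 binary.
-spec decode(binary()) -> binary().
decode(Bin) -> decode(Bin, <<>>).

decode(<<>>, Acc) -> Acc;
decode(<<"&amp;", Rest/binary>>, Acc) -> decode(Rest, <<Acc/binary, $& >>);
decode(<<"&lt;", Rest/binary>>, Acc)  -> decode(Rest, <<Acc/binary, $< >>);
decode(<<"&gt;", Rest/binary>>, Acc)  -> decode(Rest, <<Acc/binary, $> >>);
%% Hex must be tried before decimal, since "&#x" also starts with "&#".
decode(<<"&#x", Rest/binary>>, Acc)   -> numeric(Rest, 16, <<"&#x">>, Acc);
decode(<<"&#", Rest/binary>>, Acc)    -> numeric(Rest, 10, <<"&#">>, Acc);
decode(<<C, Rest/binary>>, Acc)       -> decode(Rest, <<Acc/binary, C>>).

%% Digits run up to the next ';'. Unparseable digits or invalid code
%% points raise badarg, which we catch to copy the prefix literally.
numeric(Rest, Base, Prefix, Acc) ->
    case binary:split(Rest, <<";">>) of
        [Digits, Tail] when Digits =/= <<>> ->
            try <<(binary_to_integer(Digits, Base))/utf8>> of
                Char -> decode(Tail, <<Acc/binary, Char/binary>>)
            catch
                error:badarg -> decode(Rest, <<Acc/binary, Prefix/binary>>)
            end;
        _ ->
            decode(Rest, <<Acc/binary, Prefix/binary>>)
    end.
```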

License

Apache 2.0 — see LICENSE.md.