em_filter
An Erlang library for building Emergence agents connected to an em_disco discovery service.
Features
- Connects your agent to one or more `em_disco` nodes over persistent WebSockets
- Automatically registers on startup and reconnects on failure
- Announces agent capabilities to the `em_disco` registry via `agent_hello`
- Optional persistent memory (ETS) passed across queries
- Full set of HTML scraping utilities included
Concepts
Every node in the Emergence system is an agent. An agent has two optional features:
- Capabilities: a list of strings (`<<"rss">>`, `<<"dns">>`, …) announced to `em_disco` at startup. Disco uses them to route queries only to relevant agents.
- Memory: a map passed to `handle/2` on every query and updated with the returned value. Two backends are available:
  - `ram` (default): lives in the process state and resets to `#{}` on restart.
  - `ets`: persisted in a local ETS table; survives worker restarts within the same BEAM session.
Memory is best used for caching expensive operations (HTTP responses, DNS lookups, rate limit state). Do not use memory to deduplicate results — deduplication is handled upstream by the Emquest pipeline.
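As an illustration of keeping rate-limit state in memory, here is a minimal fixed-window limiter that a handler could consult before calling an external API. The module name `rate_limit_example`, the `rate_limit` memory key, and the window and limit parameters are all hypothetical; em_filter does not prescribe any of them. The current time is passed in explicitly so the function stays pure and easy to test.

```erlang
-module(rate_limit_example).
-export([allow/4]).

%% Hypothetical fixed-window rate limiter stored in the handler's Memory
%% map under the key rate_limit as {WindowStart, RequestCount}.
%% Now and WindowMs are in milliseconds; Limit is the maximum number of
%% requests allowed per window. Returns {Allowed, NewMemory}.
-spec allow(Now :: integer(), WindowMs :: pos_integer(),
            Limit :: pos_integer(), Memory :: map()) -> {boolean(), map()}.
allow(Now, WindowMs, Limit, Memory) ->
    {Start, Count} = maps:get(rate_limit, Memory, {Now, 0}),
    case Now - Start >= WindowMs of
        true ->
            %% Window expired: open a new one and count this request.
            {true, Memory#{rate_limit => {Now, 1}}};
        false when Count < Limit ->
            %% Still inside the window and under the limit.
            {true, Memory#{rate_limit => {Start, Count + 1}}};
        false ->
            %% Limit reached: reject and leave the state unchanged.
            {false, Memory}
    end.
```

Inside `handle/2` you would pass something like `erlang:monotonic_time(millisecond)` as `Now`. Monotonic time is only meaningful within one BEAM session, which matches the lifetime of `ets` memory.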
Handler contract
Every handler module must export `handle/2` with this signature:

```erlang
-spec handle(Body :: binary(), Memory :: map()) ->
          {Result :: term(), NewMemory :: map()}.
```

`Body` is the raw JSON query binary. `Result` is typically a list of embryo maps.
Returning the incoming `Memory` unchanged as `NewMemory` is valid for stateless behaviour.
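A minimal handler that honours this contract while actually using memory might look like the sketch below. It ignores the body, returns an empty result list, and counts how many queries the worker has seen under a hypothetical `query_count` key; the module name is also illustrative.

```erlang
-module(counting_handler).
-export([handle/2]).

%% Satisfies the handle/2 contract by returning {Result, NewMemory}.
%% Result is an empty list of embryos; NewMemory carries a running
%% query count under the hypothetical key query_count.
-spec handle(Body :: binary(), Memory :: map()) -> {list(), map()}.
handle(_Body, Memory) ->
    Count = maps:get(query_count, Memory, 0),
    {[], Memory#{query_count => Count + 1}}.
```

With `ram` memory the count starts again from zero whenever the worker restarts; with `ets` it survives worker restarts within the same BEAM session.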
Embryo format
Agents return a list of embryo maps:
```erlang
#{
    <<"type">> => <<"rss">>,            %% agent-defined type
    <<"properties">> => #{
        <<"url">>    => <<"https://...">>,
        <<"title">>  => <<"...">>,
        <<"resume">> => <<"...">>
    }
}
```

Installation
Add to your `rebar.config`:

```erlang
{deps, [
    {em_filter, "1.2.0"}
]}.
```

Usage
Stateless agent
Announces capabilities but does not persist state between queries.
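```erlang
em_filter:start_agent(my_agent, my_handler, #{
    capabilities => [<<"search">>, <<"web">>]
}).
```

```erlang
-module(my_handler).
-export([handle/2]).

%% Stateless: the incoming Memory is returned unchanged.
handle(Body, Memory) ->
    Results = do_search(Body),
    {Results, Memory}.

%% Placeholder: replace with your own search logic returning embryo maps.
do_search(_Body) ->
    [].
```

Agent with memory (cache)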
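Memory is useful for caching. The handler below stores results in a map keyed by the raw query body and returns the stored value when the same query arrives again. Entries are never evicted, so a long-running agent may need to bound or expire them. Starting the agent with `memory => ets` keeps the cache across worker restarts.

```erlang
-module(my_handler).
-export([handle/2]).

handle(Body, Memory) ->
    Cache = maps:get(cache, Memory, #{}),
    case maps:get(Body, Cache, undefined) of
        undefined ->
            %% Cache miss: call the API and store the result under Body.
            Results = fetch_from_api(Body),
            NewCache = Cache#{Body => Results},
            {Results, Memory#{cache => NewCache}};
        Cached ->
            %% Cache hit: reuse the stored result; memory is unchanged.
            {Cached, Memory}
    end.

%% Placeholder: replace with your own (expensive) API call.
fetch_from_api(_Body) ->
    [].
```

```erlang
em_filter:start_agent(my_agent, my_handler, #{
    capabilities => [<<"search">>],
    memory => ets
}).
```

Multi-disco connectivity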
An agent connects to every disco node listed in emergence.conf.
Each node gets its own persistent WebSocket connection and worker process.
```ini
[em_disco]
nodes = localhost:8080, em-disco.roques.me
```
With this config, `start_agent/3` spawns two workers automatically:
- `my_agent_localhost_8080_server`: connected to the local disco
- `my_agent_em_disco_roques_me_443_server`: connected to the public disco
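The worker names follow a visible pattern: agent name, host with `.` and `-` turned into `_`, port, and a `_server` suffix. The sketch below reproduces that pattern so you can predict a worker's name. It is inferred from the two examples only, not taken from em_filter's source, so the exact rule used here (every non-alphanumeric host character becomes `_`) is an assumption, as is the module name.

```erlang
-module(worker_name_example).
-export([worker_name/3]).

%% Rebuilds names such as my_agent_em_disco_roques_me_443_server from an
%% agent name, a disco host and a port. Inferred rule (assumption): every
%% non-alphanumeric character in the host is replaced by an underscore.
-spec worker_name(atom(), string(), inet:port_number()) -> atom().
worker_name(Agent, Host, Port) ->
    SafeHost = re:replace(Host, "[^A-Za-z0-9]", "_",
                          [global, {return, list}]),
    list_to_atom(atom_to_list(Agent) ++ "_" ++ SafeHost ++ "_" ++
                 integer_to_list(Port) ++ "_server").
```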
Port and transport resolution:
- `localhost` / `127.0.0.1` without a port → port 8080, plain TCP (default)
- any other host without a port → port 443, TLS (default)
- explicit port 443 → TLS
- any other explicit port → plain TCP
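The four rules above can be expressed as a single function. This is a sketch of the documented behaviour rather than em_filter's internal code: the module name and the `{Host, Port, tcp | tls}` return shape are assumptions, and IPv6 literals (which contain colons) are not handled.

```erlang
-module(node_spec_example).
-export([resolve/1]).

%% Turns a node entry such as "localhost:8080" or "em-disco.roques.me"
%% into {Host, Port, Transport} following the documented rules.
-spec resolve(string()) -> {string(), inet:port_number(), tcp | tls}.
resolve(Spec) ->
    case string:split(string:trim(Spec), ":") of
        [Host] ->
            default_port(Host);
        [Host, PortStr] ->
            Port = list_to_integer(PortStr),
            {Host, Port, transport_for(Port)}
    end.

%% No explicit port: local hosts get 8080/TCP, everything else 443/TLS.
default_port(Host) when Host =:= "localhost"; Host =:= "127.0.0.1" ->
    {Host, 8080, tcp};
default_port(Host) ->
    {Host, 443, tls}.

%% Explicit port: 443 means TLS, any other port means plain TCP.
transport_for(443) -> tls;
transport_for(_) -> tcp.
```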
Configuration
The `em_disco` address is resolved in this order:
1. `[em_disco] nodes` in `emergence.conf` (recommended)
2. `EM_DISCO_HOST` / `EM_DISCO_PORT` environment variables (legacy, single node)
3. Default: `localhost:8080`
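The precedence can be sketched as a pure function over the three sources. The inputs are hypothetical: `ConfNodes` is the raw value of the `nodes` key (or `undefined` when `emergence.conf` has none), and `EnvHost` / `EnvPort` are whatever `os:getenv/1` returned, which is `false` for an unset variable. How em_filter treats `EM_DISCO_HOST` without `EM_DISCO_PORT` is not documented; this sketch assumes the host is used alone, leaving the port to the port and transport rules listed earlier.

```erlang
-module(disco_config_example).
-export([disco_nodes/3]).

%% Returns the node specs to connect to, in priority order:
%% 1. the comma-separated nodes value from emergence.conf,
%% 2. EM_DISCO_HOST / EM_DISCO_PORT (legacy, single node),
%% 3. the default "localhost:8080".
-spec disco_nodes(string() | undefined, string() | false,
                  string() | false) -> [string()].
disco_nodes(ConfNodes, EnvHost, EnvPort) when is_list(ConfNodes) ->
    %% Split on commas, trim spaces, and drop empty entries.
    case [string:trim(N) || N <- string:split(ConfNodes, ",", all),
                            string:trim(N) =/= ""] of
        [] -> disco_nodes(undefined, EnvHost, EnvPort);
        Nodes -> Nodes
    end;
disco_nodes(undefined, false, _EnvPort) ->
    ["localhost:8080"];
disco_nodes(undefined, EnvHost, false) ->
    %% Assumption: a host without a port falls back to the default rules.
    [EnvHost];
disco_nodes(undefined, EnvHost, EnvPort) ->
    [EnvHost ++ ":" ++ EnvPort].
```

At runtime the call would look like `disco_nodes(ConfValue, os:getenv("EM_DISCO_HOST"), os:getenv("EM_DISCO_PORT"))`.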
`emergence.conf` locations:
- Linux/macOS: `~/.config/emergence/emergence.conf`
- Windows: `%APPDATA%\emergence\emergence.conf`
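Resolving that path from Erlang could look like the following sketch (module and function names are hypothetical). It uses `os:type/0` to pick the platform and reads `HOME` or `APPDATA` from the environment; the two-argument form takes both as parameters so it can be tested without touching the real environment.

```erlang
-module(conf_path_example).
-export([conf_path/0, conf_path/2]).

%% Location of emergence.conf for the current OS.
-spec conf_path() -> file:filename().
conf_path() ->
    conf_path(os:type(), fun os:getenv/1).

%% OsType is a value of os:type(); GetEnv looks up an environment
%% variable. If the variable is unset (GetEnv returns false), the
%% filename:join/1 call fails, so callers should ensure it is set.
conf_path({win32, _}, GetEnv) ->
    filename:join([GetEnv("APPDATA"), "emergence", "emergence.conf"]);
conf_path({unix, _}, GetEnv) ->
    %% Covers both Linux ({unix, linux}) and macOS ({unix, darwin}).
    filename:join([GetEnv("HOME"), ".config", "emergence",
                   "emergence.conf"]).
```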
Full example:
```ini
[em_disco]
nodes = localhost:8080, em-disco.roques.me
```

HTML utilities
The following helpers are available for agents that scrape HTML:
| Function | Description |
|---|---|
| `strip_scripts/1` | Removes `<script>` tags |
| `extract_elements/2` | CSS-style element extraction |
| `get_text/1` | Strips all HTML tags |
| `extract_attribute/2` | Extracts a tag attribute value |
| `clean_text/3` | Strips noise and decodes entities |
| `decode_html_entities/1` | Decodes `&amp;`, `&#x...;`, `&#...;` |
| `should_skip_link/2` | Filters out unwanted URLs |
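In an agent you would call the library's `decode_html_entities/1` directly. To illustrate what decoding the listed forms involves, the following standalone sketch (module name hypothetical, not the library's implementation) decodes `&amp;`, decimal `&#NNN;` and hexadecimal `&#xHH;` references in a UTF-8 binary and leaves everything else, including other named entities, untouched.

```erlang
-module(entity_example).
-export([decode/1]).

%% Decodes &amp;, decimal (&#38;) and hexadecimal (&#x26;) character
%% references in a UTF-8 binary; all other bytes are copied unchanged.
-spec decode(binary()) -> binary().
decode(Bin) ->
    decode(Bin, <<>>).

decode(<<>>, Acc) ->
    Acc;
decode(<<"&amp;", Rest/binary>>, Acc) ->
    decode(Rest, <<Acc/binary, $&>>);
decode(<<"&#x", Rest/binary>>, Acc) ->
    numeric(Rest, 16, <<"&#x">>, Acc);
decode(<<"&#X", Rest/binary>>, Acc) ->
    numeric(Rest, 16, <<"&#X">>, Acc);
decode(<<"&#", Rest/binary>>, Acc) ->
    numeric(Rest, 10, <<"&#">>, Acc);
decode(<<Byte, Rest/binary>>, Acc) ->
    decode(Rest, <<Acc/binary, Byte>>).

%% Parses the digits before the next ';' in the given base. If they do
%% not form a valid code point, the prefix is kept literally and
%% decoding resumes right after it.
numeric(Bin, Base, Prefix, Acc) ->
    case binary:split(Bin, <<";">>) of
        [Digits, Rest] when byte_size(Digits) > 0 ->
            try <<(binary_to_integer(Digits, Base))/utf8>> of
                Char -> decode(Rest, <<Acc/binary, Char/binary>>)
            catch
                error:_ -> decode(Bin, <<Acc/binary, Prefix/binary>>)
            end;
        _ ->
            decode(Bin, <<Acc/binary, Prefix/binary>>)
    end.
```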
License
Apache 2.0 — see LICENSE.md.