em_filter
An Erlang library for building Emergence agents connected to an em_disco discovery service.
Philosophy
Emergence is a distributed discovery network, not a search engine with a central index. Any agent can contribute any result type. Emquest (the web gateway) fans out queries across all connected agents in parallel, deduplicates results by URL, and streams cards to the browser in real time.
em_filter is the library side of this: it handles the WebSocket connection to em_disco, receives queries, calls your handler, and sends results back. Your handler focuses entirely on one thing — turning a query into a list of result maps (embryos).
Features
- Connects your agent to one or more em_disco nodes configured in emergence.conf over persistent WebSockets
- Automatically registers on startup and reconnects on failure
- Announces agent capabilities to the em_disco registry via agent_hello
- Optional persistent memory (ETS) passed across queries
- Full set of HTML scraping utilities included
Concepts
Every node in the Emergence system is an agent. An agent has two optional features:
- Capabilities: a list of strings (<<"rss">>, <<"dns">>, …) announced to em_disco at startup. Used by disco to route queries to relevant agents only.
- Memory: a map passed to handle/2 on every query and updated with the returned value.
  - ram (default): lives in the process state, resets to #{} on restart.
  - ets: persisted in a local ETS table, survives worker restarts within the same BEAM session.
Memory is best used for caching expensive operations (HTTP responses, DNS lookups, rate limit state). Do not use memory to deduplicate results — deduplication is handled upstream by the Emquest pipeline.
Handler contract
Every handler module must export handle/2:
    handle(Body :: binary(), Memory :: map()) ->
        {Result :: term(), NewMemory :: map()}

Body is the raw JSON query binary. Result is typically a list of embryo maps.
Returning the same map as NewMemory is valid for stateless behaviour.
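
As a rough illustration of how memory flows from one call to the next, the sketch below threads a memory map through a handler for a list of query bodies. The memory_demo module, run/3 and counting_handler/2 are hypothetical names invented for this example; em_filter's own worker loop is not shown in this README and may work differently.

```erlang
-module(memory_demo).
-export([run/3, counting_handler/2]).

%% Hypothetical driver (not part of em_filter): feeds each query body to
%% Handler in order, passing the memory returned by one call into the next.
%% Returns the results in query order together with the final memory.
run(Handler, Bodies, Memory0) ->
    {RevResults, Memory} =
        lists:foldl(
          fun(Body, {Acc, Mem}) ->
                  {Result, NewMem} = Handler(Body, Mem),
                  {[Result | Acc], NewMem}
          end,
          {[], Memory0},
          Bodies),
    {lists:reverse(RevResults), Memory}.

%% Example handler following the handle/2 contract: returns no embryos
%% and counts how many queries it has seen so far.
counting_handler(_Body, Memory) ->
    Count = maps:get(count, Memory, 0),
    {[], Memory#{count => Count + 1}}.
```

Calling memory_demo:run(fun memory_demo:counting_handler/2, [<<"a">>, <<"b">>], #{}) yields two empty result lists and a final memory of #{count => 2}. A stateless handler would return Memory unchanged, so the final memory would equal the initial one.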
Embryo format
Agents return a list of embryo maps:
    #{
        <<"type">> => <<"rss">>,  %% agent-defined type
        <<"properties">> => #{
            <<"url">> => <<"https://...">>,
            <<"title">> => <<"...">>,
            <<"resume">> => <<"...">>
        }
    }

Installation
Add to your rebar.config:
    {deps, [
        {em_filter, "1.2.4"}
    ]}.

Usage
Stateless agent
Announces capabilities but does not persist state between queries.
    em_filter:start_agent(my_agent, my_handler, #{
        capabilities => [<<"search">>, <<"web">>]
    }).

    -module(my_handler).
    -export([handle/2]).

    %% do_search/1 stands for your own query logic; it is not part of em_filter.
    handle(Body, Memory) ->
        Results = do_search(Body),
        {Results, Memory}.

Agent with memory (cache)
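Memory is useful for caching. The handler below stores results per query body, so a repeated query is answered from the cache instead of calling the API again.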
    -module(my_handler).
    -export([handle/2]).

    handle(Body, Memory) ->
        Cache = maps:get(cache, Memory, #{}),
        case maps:get(Body, Cache, undefined) of
            undefined ->
                Results = fetch_from_api(Body),
                NewCache = Cache#{Body => Results},
                {Results, Memory#{cache => NewCache}};
            Cached ->
                {Cached, Memory}
        end.

    em_filter:start_agent(my_agent, my_handler, #{
        capabilities => [<<"search">>],
        memory => ets
    }).

Multi-disco connectivity
An agent connects to every disco node listed in emergence.conf.
Each node gets its own persistent WebSocket connection and worker process.
    [em_disco]
    nodes = localhost:8080, em-disco.roques.me

With this config, start_agent/3 spawns two workers automatically:
- my_agent_server: connected to local disco (index 1)
- my_agent_server_2: connected to public disco (index 2)

Port and transport resolution:
- localhost / 127.0.0.1 → port 8080, plain TCP (default)
- any other host without port → port 443, TLS (default)
- explicit port 443 → TLS
- any other explicit port → plain TCP
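
The rules above can be sketched as a small pure function. This is only an illustration under stated assumptions, not em_filter's actual code: the disco_addr module, resolve/1, resolve_all/1 and the {Host, Port, Transport} return shape are hypothetical, and IPv6 literals are not handled.

```erlang
-module(disco_addr).
-export([resolve/1, resolve_all/1]).

%% Hypothetical helper: turns one entry of the nodes list, such as
%% "localhost:8080" or "em-disco.roques.me", into {Host, Port, Transport}.
-spec resolve(string()) -> {string(), inet:port_number(), tcp | tls}.
resolve(Entry) ->
    case string:split(string:trim(Entry), ":") of
        [Host] ->
            default_for(Host);
        [Host, PortStr] ->
            Port = list_to_integer(PortStr),
            {Host, Port, transport_for(Port)}
    end.

%% Splits a comma-separated nodes value and resolves each entry in order.
-spec resolve_all(string()) -> [{string(), inet:port_number(), tcp | tls}].
resolve_all(NodesValue) ->
    [resolve(Entry) || Entry <- string:lexemes(NodesValue, ",")].

%% No explicit port: local hosts default to plain TCP on 8080,
%% every other host defaults to TLS on 443.
default_for("localhost") -> {"localhost", 8080, tcp};
default_for("127.0.0.1") -> {"127.0.0.1", 8080, tcp};
default_for(Host)        -> {Host, 443, tls}.

%% Explicit port: only 443 implies TLS.
transport_for(443) -> tls;
transport_for(_)   -> tcp.
```

With the example configuration, disco_addr:resolve_all("localhost:8080, em-disco.roques.me") gives one plain TCP entry on port 8080 and one TLS entry on port 443.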
Configuration
The em_disco address is resolved in this order:
1. [em_disco] nodes in emergence.conf (recommended)
2. EM_DISCO_HOST / EM_DISCO_PORT environment variables (legacy, single node)
3. Default: localhost:8080

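
To make the precedence concrete, here is a minimal sketch of that lookup order. Everything in it is an assumption made for illustration: the disco_config module, the {Host, Port} tuples, and in particular the fallback to port 8080 when EM_DISCO_HOST is set without EM_DISCO_PORT, which this README does not specify.

```erlang
-module(disco_config).
-export([resolve/1, resolve/3]).

%% Hypothetical sketch of the lookup order (not em_filter's code).
%% ConfNodes is the already-parsed nodes value from emergence.conf as a
%% list of {Host, Port} tuples, or [] when the file or key is missing.
resolve(ConfNodes) ->
    resolve(ConfNodes,
            os:getenv("EM_DISCO_HOST"),
            os:getenv("EM_DISCO_PORT")).

%% Pure variant with the environment values passed in.
%% os:getenv/1 returns false when a variable is unset.
resolve([_ | _] = ConfNodes, _EnvHost, _EnvPort) ->
    %% 1. emergence.conf takes precedence.
    ConfNodes;
resolve([], EnvHost, EnvPort) when is_list(EnvHost) ->
    %% 2. Legacy environment variables, single node.
    Port = case EnvPort of
               false -> 8080;  %% assumption: fall back to the default port
               _     -> list_to_integer(EnvPort)
           end,
    [{EnvHost, Port}];
resolve([], false, _EnvPort) ->
    %% 3. Built-in default.
    [{"localhost", 8080}].
```

Keeping the three-argument variant free of side effects makes the precedence easy to check without touching the real environment.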
emergence.conf locations:
- Linux/macOS: ~/.config/emergence/emergence.conf
- Windows: %APPDATA%\emergence\emergence.conf

Full example:
    [em_disco]
    nodes = localhost:8080, em-disco.roques.me

Console output
When running, em_filter logs two events at the notice level:
[em_filter] agent connected: my_agent @ localhost:8080
[em_filter] query: <body>
Connection warnings (auth rejected, timeout, unreachable) are logged at the warning level. OTP startup progress reports are suppressed.
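
If the per-query lines are too noisy, one option is to raise the primary log level. The sketch below assumes em_filter logs through the standard OTP logger and does not override the primary level itself; neither is confirmed by this README. The log_level_demo module and its function names are hypothetical.

```erlang
-module(log_level_demo).
-export([quiet/0, verbose/0]).

%% Assumption: em_filter emits its messages via the OTP logger.
%% Raising the primary level to warning hides the notice-level
%% connection and query lines but keeps connection warnings visible.
quiet() ->
    ok = logger:set_primary_config(level, warning).

%% Restores the OTP default primary level, which lets notice events through.
verbose() ->
    ok = logger:set_primary_config(level, notice).
```

logger:set_primary_config/2 affects every application on the node, so this is best done in the agent's own startup code rather than in a shared library.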
HTML utilities
The following helpers are available for agents that scrape HTML:
| Function | Description |
|---|---|
| strip_scripts/1 | Removes <script> tags |
| extract_elements/2 | CSS-style element extraction |
| get_text/1 | Strips all HTML tags |
| extract_attribute/2 | Extracts a tag attribute value |
| clean_text/3 | Strips noise and decodes entities |
| decode_html_entities/1 | Decodes &amp;, &#x...;, &#...; |
| should_skip_link/2 | Filters out unwanted URLs |
License
Apache 2.0 — see LICENSE.md.