NebulaAPI
Transparent, safe cluster-wide APIs for Elixir: compile-time verified, with near-zero overhead whenever a call resolves locally.
Define your functions once. The compiler decides what runs where. Calls across nodes look and feel like local function calls.
The model in 30 seconds
A NebulaAPI cluster is a set of nodes (each one an Erlang VM, e.g.
db@db.example). Every node carries one or more tags — arbitrary atoms. No atom is
special; a tag can name a role (:db, :worker), a capability (:cache), or a whole
deployment (:mainframe_cluster, with the worker off in another cloud as
:cloud_worker_lambda). You declare the map once, in config:
# config/config.exs
config :nebula_api,
nodes: [
"api@api.example": [:mainframe_cluster, :api, :cache],
"db@db.example": [:mainframe_cluster, :db, :cache],
"worker@worker.example": [:cloud_worker_lambda, :worker]
]
In your code you pick where things run with two sigils — by capability, or by name:
- &tag: any node carrying that tag (picking by capability). &db reads as "wherever the :db tag lives"; the & turns the tag atom :db into a selector. Tags are lowercase atoms: &db, &cache, &mainframe_cluster.
- @node: pick a node by name. @worker is the short name (everything before the @); when several nodes share it, @worker targets them all. That's a feature; see short vs full names for pinning exactly one.
! negates either one: !&legacy is "every node without the :legacy tag",
!@backup is "every node except @backup". These are selectors — they tell
the compiler which nodes get the real code.
Now write functions and tag each with the selector for where its body belongs:
defmodule MyApp.Users do
use NebulaAPI
# `&db` → the body is compiled only on nodes carrying the :db tag.
# On every other node, the same call becomes transparent RPC to a :db node.
defapi &db, find(id) do
Repo.get(User, id) # %User{} or nil — returned verbatim, no wrapping
end
# A different capability, on different nodes: the cache lives on &cache nodes.
defapi &cache, update_cache(id, user) do
Cachex.put(:users, id, user)
end
end
On a node tagged :db, find/1 is a direct Repo.get; on every other node the same call
dispatches over Erlang distribution to a :db node and hands back the identical value. The
caller never knows which node ran it — and never has to. The body's value comes back as-is,
so you branch on it like any local call:
# Same call on any node:
case MyApp.Users.find(42) do
%User{} = user -> MyApp.Users.update_cache(user.id, user)
nil -> :not_found
end
That update_cache/2 call carries &cache, so by default it resolves on exactly one node:
locally if the caller is a &cache node, otherwise on a single &cache node (the first
registered one; it's a unicast, not a broadcast and not a race). The other &cache nodes
still hold a stale copy. When you mean "reach more than one", say so explicitly:
# every &cache node serving the method
call_on_all_nodes do
MyApp.Users.update_cache(user.id, user)
end
# one specific node
call_on_node @db do
MyApp.Users.update_cache(user.id, user)
end
# every &cache node except @db — multicast, space-juxtaposed selector + negation
call_on_nodes &cache !@db do
MyApp.Users.update_cache(user.id, user)
end
What you get from compile time
NebulaAPI resolves all routing decisions at compile time. This is not a runtime router — it's a code generator that produces different bytecode for each node. That buys you four things:
No unnecessary deps. Wrap a use, an import, or a child spec in on_nebula_nodes so
it exists only where it belongs:
defmodule MyApp.Cache do
use NebulaAPI
on_nebula_nodes &cache do
import Cachex, only: [put: 3] # only &cache nodes even reference Cachex
end
defapi &cache, update_cache(id, user), do: put(:users, id, user)
end
The non-matching branch is absent from the bytecode, so a non-&cache node never loads
Cachex (gate the dependency itself the same way and it isn't even pulled in).
Smaller binaries. Code that doesn't belong on a node doesn't exist in its binary — a
defapi body is only emitted on matching nodes. Whole dependencies fall away the same way. The
runnable demo pins Cachex to its
db node (on_nebula_nodes @db plus a conditional dep), so only that build carries Cachex
and its dependency tree (~570 KB); every other node never compiles it and comes out
~38% smaller — ≈860 KB vs the db node's 1.4 MB (measured, per-node _build from
mix compile). Your web node doesn't carry FFmpeg bindings; your worker doesn't carry
Phoenix routes.
Compile-time safety. Reference a tag or node that isn't in your topology and the build stops — no silent RPC into the void:
defapi @nope, f() do ... end
** (CompileError) Unknown nodes in defapi call :
- @nope
Available nodes :
- @api
- @:"api@api.example"
- @db
- @:"db@db.example"
- @worker
- @:"worker@worker.example"
The :nebula compiler goes one further: an app with defapi modules but no
nebula_api_server() wired in fails to compile, instead of silently shipping workers that
never register:
Found 1 module(s) using NebulaAPI with local methods in app :my_app, but no
nebula_api_server() has been found in :my_app's supervisor — their RPC workers
will never start.
App: :my_app
Application: MyApp.Application
^------ hint: add nebula_api_server() to its supervisor's children
Modules using NebulaAPI (with local methods on this node):
- MyApp.Users
Near-zero runtime overhead. A locally resolved call is a direct function call: no routing table, no RPC serialization, just a couple of process-dictionary reads to check for an active routing context. Measured, that's ~60 ns versus ~8 ns for a plain call (see Performance), about 50 ns (0.00005 ms) of overhead and negligible in any practical sense. The routing decision was made once, at compile time.
"Compile per release": the one mental shift. NebulaAPI produces different bytecode per node, so each release is its own build. For Elixir devs used to a single runtime artifact, that's the surprising part. In practice it's one extra
elixir --name node@host -S mix compile per release: a few seconds of CI, paid back many times over in smaller binaries, fewer dependencies, and zero routing overhead.
How it works
Same source, different bytecode. Each release is compiled with its target node name (the
compiler reads node()), so a &db body is real code on a node that has :db and an
RPC stub everywhere else — the stub routes through :pg process groups to a node that
does have the body.
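The core trick is ordinary Elixir: a macro runs while the calling module compiles, so anything it reads, including node(), is frozen into that build. The sketch below is a hypothetical miniature (NodeGate, on_nodes and Demo.Users are illustrative names, not NebulaAPI's API) that emits one function body when compiled as db@db.example and a stand-in stub everywhere else.

```elixir
defmodule NodeGate do
  # Hypothetical miniature of per-node codegen, not NebulaAPI's implementation.
  # `nodes` arrives as AST; a literal list of atoms is its own AST, so it can be
  # compared with node() directly while the macro expands (i.e. at compile time).
  defmacro on_nodes(nodes, clauses) do
    if node() in nodes do
      Keyword.fetch!(clauses, :do)
    else
      Keyword.get(clauses, :else)
    end
  end
end

defmodule Demo.Users do
  require NodeGate

  NodeGate.on_nodes [:"db@db.example"] do
    # Real body: emitted only when this module is compiled as db@db.example.
    def find(id), do: {:local, id}
  else
    # Stand-in for the RPC stub every other node would get instead.
    def find(id), do: {:remote_stub, id}
  end
end
```

Compiled in a VM without --name, node() is :nonode@nohost, so Demo.Users.find(42) returns {:remote_stub, 42}; only one of the two clauses ever exists in the resulting bytecode.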
┌─────────────────────────────────────────────────────────┐
│ Source code (same) │
│ │
│ defapi &db, find_user(id) do │
│ Repo.get(User, id) │
│ end │
└────────────────────┬────────────────────────────────────┘
│
┌──────────┴──────────┐
│ mix compile │
│ --name node@host │
▼ ▼
┌─────────────┐ ┌─────────────┐
│ @alpha │ │ @beta │
│ (has &db) │ │ (no &db) │
├─────────────┤ ├─────────────┤
│ find_user/1 │ │ find_user/1 │
│ → Repo.get │ │ → RPC call │
│ (local) │ │ (remote) │
└─────────────┘ └──────┬──────┘
│
:pg process groups
│
┌──────▼──────┐
│ @alpha │
│ Worker │
│ Repo.get │
└─────────────┘
Reshape your topology without touching code
This is why NebulaAPI exists: the flexibility of umbrella releases, without rewriting code every time you split a node out or stand up a new release. The same source ships as one node or many — you change config and which releases you build, nothing else.
# dev — one node wears every hat, a single release, every call local
nodes: ["dev@localhost": [:api, :db, :worker, :cache]]
# staging — pull the database onto its own node
nodes: [
"app@app.staging": [:staging_cluster, :api, :worker, :cache],
"db@db.staging": [:staging_cluster, :db, :cache]
]
# prod — scale the workers out, keep one db; w3 lives in another cloud
nodes: [
"app@app.prod": [:mainframe_cluster, :api, :cache],
"worker@w1.prod": [:mainframe_cluster, :gpu],
"worker@w2.prod": [:alpha_cluster, :llm],
"worker@w3.prod": [:cloud_worker_lambda, :gpu, :storage],
"db@db.prod": [:mainframe_cluster, :db, :cache]
]
Moving :db off the app node, or fanning workers across three machines, is a config change
and a rebuild — never a code change. And the tags follow how you actually think about the
fleet: the three workers share the short name worker@ (so @worker hits all of them
without any :worker tag), the deployment tag varies by environment and even by node
(worker@w3.prod is :cloud_worker_lambda — off in another cloud), and the capability
tags (:gpu, :llm, :storage) carve out which worker you mean (@worker &gpu). A tag
is just a label; slice the cluster however suits you.
Installation
Add :nebula_api to your deps — from Hex:
def deps do
[
{:nebula_api, "~> 0.5"}
]
end
Or track the repo directly (e.g. for an unreleased fix):
def deps do
[
{:nebula_api, git: "git@github.com:podCloud/NebulaAPI.git", tag: "v0.5.0"}
]
end
Quick start
1. Define your cluster topology
# config/config.exs
config :nebula_api,
nodes: [
"api@api.example": [:mainframe_cluster, :api],
"db@db.example": [:mainframe_cluster, :db],
"worker@worker.example": [:alpha_cluster, :worker]
]
Each key is a full node name (short@host); each value is a list of capability
tags (see the model above). In selectors you can
use the short name: @db matches :"db@db.example", @worker matches
:"worker@worker.example" — when there's no ambiguity, short names are all you
need.
2. Define distributed functions
defmodule MyApp.Users do
use NebulaAPI
# Body compiles on &db nodes. Everywhere else: transparent RPC.
defapi &db, find(id) do
Repo.get!(User, id)
end
end
3. Wire a server into each app's supervision tree
defmodule MyApp.Application do
use Application
use NebulaAPI.Server
def start(_type, _args) do
Supervisor.start_link([nebula_api_server()], strategy: :one_for_one, name: MyApp.Sup)
end
end
use NebulaAPI.Server brings the nebula_api_server/0 macro into scope (plus the
on_nebula_nodes / call_on_* macros) — without the defapi bookkeeping, since the host
module defines none of its own. Use it on the module that wires the server; use
use NebulaAPI on the modules that actually define defapi endpoints.
nebula_api_server() discovers the app's own modules that use NebulaAPI and starts a
supervised GenServer worker for each one that has local methods on this node; each worker
registers in :pg process groups for discovery across nodes. No module list to maintain —
and because the server lives in the app's own tree, its workers die with the app (so :pg
never holds stale entries).
Optional: guard against forgetting it
Add the :nebula compiler to catch a missing nebula_api_server() at compile time:
def project do
[
# ...
compilers: Mix.compilers() ++ [:nebula]
]
end
If an app has modules with local methods but no nebula_api_server() wired into its
supervisor, mix compile fails with an explanatory error — the same spirit as the
compile error raised for a defapi targeting an unknown node.
4. Compile with the target node name
With the code and server in place, compile each release as the node it will run as —
NebulaAPI keys its codegen on node() at compile time, which you set with the --name
flag on mix compile:
elixir --name api@api.example -S mix compile && mix release api
Forget --name and the build stops with a clear CompileError (node() would be
nonode@nohost — the name isn't unknown, it's unset, so allow_unknown_self_node
won't paper over it). Set allow_nonode_nohost: true if you really mean a nameless
generic build.
Build each release in its own stage, pinning the compile-time node name:
# api release — compiled as node api@api.example
RUN elixir --name api@api.example -S mix compile && mix release api
# worker release — separate stage, compiled as node worker@worker.example
RUN elixir --name worker@worker.example -S mix compile && mix release worker
Then each release must boot as that same node name. That's a separate, runtime
concern, handled by Mix release's own env vars
— RELEASE_NODE (the node name) and RELEASE_DISTRIBUTION (name for fully-qualified
names across hosts; the default is sname):
# at run time, in the api container
RELEASE_DISTRIBUTION=name RELEASE_NODE=api@api.example bin/api start
The compile-time --name and the runtime RELEASE_NODE must match — that's the whole
contract: the routing was decided for api@api.example at build, so the release has to
actually be api@api.example when it runs. NebulaAPI enforces it: if the running node
differs from the one the release was compiled as, the server crashes at boot with a
clear message rather than misrouting silently — unless you opt into running it as a
generic node. (RELEASE_NODE defaults to
<release_name>@… with short-name distribution, so set it explicitly to get the
fully-qualified name.)
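Why the names must match follows from how module attributes work: they are evaluated once, when the module compiles. This hypothetical BootCheck module (not NebulaAPI's code) captures the build-time node() in an attribute and compares it with the running node, refusing to start on a mismatch unless ALLOW_RUNTIME_NEBULA_NODE_MISMATCH=1 opts into a generic node. The :generic return value is an illustrative choice.

```elixir
defmodule BootCheck do
  # Frozen at compile time: under `elixir --name api@api.example -S mix compile`
  # this holds :"api@api.example" in the shipped bytecode.
  @compiled_node node()

  def compiled_node, do: @compiled_node

  # Defaults are evaluated on each call, so they see the *running* node and env.
  def check(running \\ node(), allow_mismatch? \\ mismatch_allowed?()) do
    cond do
      running == @compiled_node ->
        :ok

      allow_mismatch? ->
        # Serve nothing, route every call remotely.
        :generic

      true ->
        raise "compiled as #{inspect(@compiled_node)} but running as #{inspect(running)}; " <>
                "set RELEASE_NODE to match, or opt in with ALLOW_RUNTIME_NEBULA_NODE_MISMATCH=1"
    end
  end

  defp mismatch_allowed?, do: System.get_env("ALLOW_RUNTIME_NEBULA_NODE_MISMATCH") == "1"
end
```

Crashing at boot is the safer failure mode: a misnamed node would otherwise run bodies it was never meant to own.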
In dev/test, you typically don't start the VM with --name. Use
default_opts to tell the compiler which node to pretend to be:
# config/dev.exs
config :nebula_api,
default_opts: [self_node: :"api@api.example"]
5. Call it — local or remote, same API
# On @db (has &db) → local Repo.get!
MyApp.Users.find(42)
#=> %User{id: 42, ...}
# On @worker (no &db) → transparent RPC to a &db node
MyApp.Users.find(42)
#=> %User{id: 42, ...}
Selectors
Selectors tell the compiler which nodes get the real implementation. Every other node gets a stub in its place — a generated function that forwards the call over RPC to a node that does have the body.
| Syntax | Meaning |
|---|---|
| &tag | Nodes with this tag |
| !&tag | Nodes without this tag |
| @node | Node(s) by name: a short name matches every node sharing it, a full name exactly one |
| !@node | All nodes except the named one(s) |
| (no selector) | Every node: the body is local everywhere |
Combine selectors by juxtaposing them with a space — no commas between them, no
brackets. This is the canonical NebulaAPI syntax, and it's what keeps the code readable
(&db !@backup reads as "a :db node, but not @backup"):
# Nodes with the :db tag, excluding @backup
defapi &db !@backup, run_migration(version) do
Ecto.Migrator.run(Repo, :up, to: version)
end
# Specific node only
defapi @worker, transcode(input, opts) do
FFmpex.new_command()
|> FFmpex.add_input_file(input)
|> FFmpex.add_output_file(opts[:output])
|> FFmpex.execute()
end
# No selector → the body is local on every node, each returning its own data
defapi get_node_health() do
%{node: node(), uptime: :erlang.statistics(:wall_clock) |> elem(0)}
end
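The selector rules above can be modeled in a few lines of plain Elixir. This is a hypothetical model for reasoning about which nodes a selector picks, not NebulaAPI's parser: it uses tuples instead of the real &tag / @node syntax, and treats a list of juxtaposed selectors as "a node must satisfy all of them".

```elixir
defmodule SelectorModel do
  # Hypothetical encoding of the selector syntax:
  #   {:tag, :db}              ~ &db
  #   {:name, :worker}         ~ @worker (short name: every worker@...)
  #   {:name, :"worker@w2"}    ~ @:"worker@w2" (full name: exactly one)
  #   {:not, selector}         ~ !&tag / !@node
  #   [sel1, sel2]             ~ "sel1 sel2" juxtaposed; [] ~ no selector
  def resolve(selectors, nodes) when is_list(selectors) do
    for {name, tags} <- nodes, Enum.all?(selectors, &satisfies?(&1, name, tags)), do: name
  end

  defp satisfies?({:tag, tag}, _name, tags), do: tag in tags

  defp satisfies?({:name, wanted}, name, _tags) do
    full = Atom.to_string(name)
    short = full |> String.split("@", parts: 2) |> hd()
    Atom.to_string(wanted) in [full, short]
  end

  defp satisfies?({:not, selector}, name, tags), do: not satisfies?(selector, name, tags)
end
```

Against a topology with an api, a db and three worker@ nodes, [{:tag, :cache}, {:not, {:name, :db}}] (that is, &cache !@db) keeps only the api node, while [{:name, :worker}, {:tag, :gpu}] keeps just the GPU-tagged workers.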
Short vs full names
In config, node names are full Erlang names — short@host. In a selector you can use just
the short part (everything before @), which keeps call sites readable:
# Equivalent when only one node is named "db@…":
defapi @db, do_something() do ... end
defapi @:"db@db.example", do_something() do ... end # full name as an atom
The full-name form is @:"name@host" (an atom, because of the @) — and !@:"name@host"
to negate it.
The short name is intentionally "many": that's a feature. A short name matches every
node that shares it, which is usually exactly what you want for a horizontally-scaled role.
Picture three nodes running the same worker release on three hosts (as the
runnable demo does), each kitted out
differently:
"worker@worker1.test": [:alpha_cluster, :gpu, :storage],
"worker@worker2.test": [:beta_server, :llm],
"worker@worker3.test": [:alpha_cluster, :vps]
@worker targets all three — every node whose release name is worker, across hosts,
whatever capability tags they happen to carry. To pin exactly one, reach for its full name:
@:"worker@worker2.test".
What gets generated
For each defapi, the macro generates:
- <name>/N: the public router that callers actually invoke.
- __nbapi_remote_<name>/N: RPC dispatch via APIServer, on every node.
- __nbapi_local_<name>/N: the real body, on matching nodes only. Elsewhere nothing is emitted: the router goes remote there, so there's no local stub to keep.
The remote function is generated on every node, including nodes
that have the local implementation. This is what makes call_on_node
and call_on_nodes work from anywhere — even a &db node can call
other &db nodes remotely for quorum writes, load distribution, etc.
Router and priorities
The public router on each defapi decides where a call goes, from the default outward —
the more explicit you get, the more it wins. Take the same call, MyApp.Cache.get(key):
- Default: MyApp.Cache.get(key) runs locally if this node serves the method, otherwise as a single remote call (unicast).
- Wrapped in a block: the same call inside call_on_nodes &cache do … end routes per the block instead.
- Its own trailing opts win over the block: MyApp.Cache.get(key, multicast: true) routes itself, even inside a block; a routing key set to nil/false opts the call back out to the default.
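The README notes that a local call only does a couple of process-dictionary reads to check for an active routing context. A hypothetical model of that precedence (RouteContext, the :nebula_route key and the stored route shapes are all assumptions) shows how a block and trailing opts can layer: the block stores a route for the duration of its body, and an explicit routing key in the call's own opts wins, with nil/false meaning "back to the default".

```elixir
defmodule RouteContext do
  # Hypothetical model of router precedence; not NebulaAPI's internals.
  @key :nebula_route

  # What a call_on_* block might do: set a route, run the body, restore.
  def with_route(route, fun) do
    previous = Process.get(@key)
    Process.put(@key, route)

    try do
      fun.()
    after
      if previous == nil, do: Process.delete(@key), else: Process.put(@key, previous)
    end
  end

  # Trailing opts > enclosing block > default (local if served, else unicast).
  def resolve(opts \\ []) do
    case Keyword.fetch(opts, :multicast) do
      {:ok, value} when value in [nil, false] -> :default
      {:ok, value} -> {:multicast, value}
      :error -> Process.get(@key, :default)
    end
  end
end
```

Because the context lives in the calling process, it only affects calls made from that process; a Task spawned inside the block would not see it.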
Default unicast goes to the first node on the :pg list that serves the method — never
the others. Concretely that's the first node serving the API that connected to NebulaAPI
(joined the method's :pg group); that's the only node that runs the call. No fan-out, no
load-balancing by default. Membership is live, though: if that node drops, :pg removes it,
so the next call simply lands on whoever is now first among the nodes still connected. (Want
several nodes at once, a specific one, a random one, or a load-aware pick? That's
runtime routing.)
on_nebula_nodes — conditional compilation
Include or exclude entire blocks of code based on the current node.
Unlike defapi, this works at any level — module body, use
directives, supervision trees:
defmodule MyApp.Repo do
use NebulaAPI.AST
# Only connect to the database on &db nodes.
# Other nodes don't even load Ecto.
on_nebula_nodes &db do
use Ecto.Repo, otp_app: :my_app
end
end
defmodule MyApp.Application do
use NebulaAPI.AST
# Start the FFmpeg pool only on worker nodes
on_nebula_nodes &worker do
def extra_children, do: [MyApp.TranscoderPool]
else
def extra_children, do: []
end
end
The non-matching branch is completely absent from the compiled bytecode. A module that
does only this can use NebulaAPI.AST — the lightest entry point, no defapi bookkeeping.
Runtime routing
The selector on a defapi is the default route. Sometimes you need to override it at
runtime — send one call to a specific node, fan it out to several, or pick a node by load.
Three macros wrap a block to do that, named after how far the call goes:
- call_on_node: unicast, run on exactly one node.
- call_on_nodes: multicast, run on every node a selector matches.
- call_on_all_nodes: broadcast, run on every node that serves the method.
call_on_node — unicast
# Force execution on a specific node
call_on_node @worker do
MyApp.Jobs.transcode(file, opts)
end
# Pick a node dynamically based on runtime info — least loaded
call_on_node fn nodes_info ->
nodes_info
|> Enum.filter(fn {_, info} -> info.connected && info.runtime end)
|> Enum.min_by(fn {_, info} -> info.runtime.memory_percent end)
|> elem(0)
end do
MyApp.HeavyTask.run()
end
# Or just pick one at random
call_on_node fn nodes_info -> nodes_info |> Map.keys() |> Enum.random() end do
MyApp.Jobs.transcode(file, opts)
end
call_on_nodes — multicast
# Call all &worker nodes, wait for all results
call_on_nodes &worker, strategy: :all, timeout: 30_000 do
MyApp.Jobs.health_check()
end
# First to respond wins
call_on_nodes &worker, strategy: :first do
MyApp.Jobs.transcode(file, opts)
end
# Quorum: a strict majority of the configured &db nodes must succeed (the default).
# A single live node out of three configured refuses — that's the point of a quorum.
call_on_nodes &db, strategy: :quorum do
MyApp.Users.write_replica(user)
end
# A selector function over live node info — fan out only to nodes seen recently
call_on_nodes fn nodes_info ->
cutoff = DateTime.add(DateTime.utc_now(), -30, :second)
nodes_info
|> Enum.filter(fn {_, i} -> i.last_seen_at && DateTime.compare(i.last_seen_at, cutoff) == :gt end)
|> Enum.map(&elem(&1, 0))
end, strategy: :all do
MyApp.Cache.invalidate(:all)
end
call_on_all_nodes — broadcast
call_on_all_nodes timeout: 5_000 do
MyApp.Cache.invalidate(:all)
end
Multicast strategies
Results are always tagged per node — {node, value} on success,
{node, {:nebula_error, reason}} for a node whose call failed at the transport level.
| Strategy | Behavior |
|---|---|
| :all | Wait for every node (or the timeout). Returns a list of {node, value}. |
| :first | Return the first response that counts as a success, then stop waiting on the rest (the pending tasks are brutal-killed); {:nebula_error, :no_success, results} if none. |
| :quorum | Wait for a strict majority of the quorum set, or an exact at_least: count. The set is either the configured nodes serving the method (quorum: :configured, the default; connected or not, so a single live node can't pass a 3-node quorum) or the connected workers (quorum: :available). The moment the quorum is reached it stops waiting on the rest (same brutal-kill as :first); it fails fast (:quorum_unreachable) when the live set can't reach it. |
"Stops waiting" is exactly that: once you have what you asked for (a first success, or the quorum), the rest is just wasted waiting — so NebulaAPI kills the local tasks still awaiting a reply and discards their late responses. A body that already started running on a remote node isn't aborted — the RPC was already sent.
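A rough model of :first in plain Elixir, with plain spawned processes standing in for remote calls: each "node" runs its function and reports back; the first reply that passes the success predicate wins and everything still running is killed. FirstSuccess and the {:nebula_error, {:raised, e}} reason are hypothetical; only the overall shape ({node, value} or {:nebula_error, :no_success, results}) comes from the table above.

```elixir
defmodule FirstSuccess do
  # Hypothetical model of `strategy: :first`; not NebulaAPI's implementation.
  def run(calls, success?, timeout) do
    parent = self()
    # A unique ref tags this call's replies, so late answers can't be confused.
    ref = make_ref()

    pids =
      for {name, fun} <- calls do
        spawn(fn ->
          reply =
            try do
              fun.()
            rescue
              e -> {:nebula_error, {:raised, e}}
            end

          send(parent, {ref, name, reply})
        end)
      end

    deadline = System.monotonic_time(:millisecond) + timeout
    result = await(ref, length(pids), success?, deadline, [])
    # Stop waiting on the rest: kill whatever is still running.
    Enum.each(pids, &Process.exit(&1, :kill))
    result
  end

  defp await(_ref, 0, _success?, _deadline, results),
    do: {:nebula_error, :no_success, Enum.reverse(results)}

  defp await(ref, pending, success?, deadline, results) do
    remaining = max(deadline - System.monotonic_time(:millisecond), 0)

    receive do
      {^ref, name, reply} ->
        if success?.(reply) do
          {name, reply}
        else
          await(ref, pending - 1, success?, deadline, [{name, reply} | results])
        end
    after
      remaining -> {:nebula_error, :no_success, Enum.reverse(results)}
    end
  end
end
```

Killing a local process does not reach back into a remote body that already started, which matches the caveat above; a long-lived caller would also want to flush any late {ref, _, _} messages from its mailbox.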
:first and :quorum let you define what counts as a success with a success: (or
failure:) predicate — by default, any node that responded counts:
# A write quorum that only accepts {:ok, _} replies
call_on_nodes &replica, strategy: :quorum, success: &match?({:ok, _}, &1) do
MyApp.Store.write(key, value)
end
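The quorum arithmetic is worth seeing explicitly. This hypothetical QuorumModel (not NebulaAPI's code) computes the required count as a strict majority, div(n, 2) + 1, unless an exact at_least: is given, fails fast with the documented :quorum_unreachable shape when too few workers are live, and counts successes with the same kind of predicate as success:. Its default treats any reply that isn't a transport-level {:nebula_error, _} as a response that counts.

```elixir
defmodule QuorumModel do
  # Strict majority of the quorum set, unless an exact count is requested.
  def required(set_size, at_least \\ nil)
  def required(set_size, nil), do: div(set_size, 2) + 1
  def required(_set_size, n) when is_integer(n) and n > 0, do: n

  # Fail fast before sending anything when the live set can't reach it.
  def precheck(live, required) when live < required,
    do: {:nebula_error, :quorum_unreachable, %{workers: live, required: required}}

  def precheck(_live, _required), do: :ok

  # By default any node that actually responded counts as a success.
  def reached?(results, required, success? \\ fn value -> not match?({:nebula_error, _}, value) end) do
    Enum.count(results, fn {_node, value} -> success?.(value) end) >= required
  end
end
```

With three configured &db nodes the requirement is 2, so one live node is rejected up front, exactly the "single live node out of three refuses" behaviour described above.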
Node info and intelligent routing
call_on_node and call_on_nodes accept selector functions that
receive live runtime data about every node:
%{
short_name: :db,
long_name: :"db@db.example",
host: "db.example",
tags: [:mainframe_cluster, :db],
connected: true,
last_seen_at: ~U[2024-06-15 12:00:00Z],
runtime: %{
memory_used_mb: 256,
memory_total_mb: 1024,
memory_percent: 25.0,
process_count: 1542,
schedulers: 8,
otp_release: "26",
uptime_seconds: 86400
}
}
A node whose worker just registered but isn't in the background snapshot yet still appears,
with runtime: nil / last_seen_at: nil until the next refresh — so filter on
info.runtime before reading through it.
# Route to the node with the most headroom
call_on_node fn nodes_info ->
nodes_info
|> Enum.filter(fn {_, info} -> info.connected && info.runtime end)
|> Enum.min_by(fn {_, info} -> info.runtime.memory_percent end)
|> elem(0)
end do
MyApp.HeavyTask.run()
end
# Only call nodes seen in the last 30 seconds
call_on_nodes fn nodes_info ->
cutoff = DateTime.add(DateTime.utc_now(), -30, :second)
nodes_info
|> Enum.filter(fn {_, info} ->
info.last_seen_at && DateTime.compare(info.last_seen_at, cutoff) == :gt
end)
|> Enum.map(&elem(&1, 0))
end do
MyApp.Cache.invalidate(:all)
end
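Selector functions are plain functions over data, so they can be tested without a cluster. The sketch below lifts the two filters above into a hypothetical NodePick module and runs them on a hand-made nodes_info map. The assumptions: the map is keyed by full node name, and the sample entries carry only the fields the filters read.

```elixir
defmodule NodePick do
  # Same logic as the selector functions above, as named functions.
  def least_loaded(nodes_info) do
    nodes_info
    # Skip disconnected nodes and fresh ones whose runtime is still nil.
    |> Enum.filter(fn {_, info} -> info.connected && info.runtime end)
    # Raises Enum.EmptyError if nothing is left; decide what that case should do.
    |> Enum.min_by(fn {_, info} -> info.runtime.memory_percent end)
    |> elem(0)
  end

  # `now` is injectable so the 30-second window can be checked deterministically.
  def seen_recently(nodes_info, now \\ DateTime.utc_now()) do
    cutoff = DateTime.add(now, -30, :second)

    for {name, info} <- nodes_info,
        info.last_seen_at && DateTime.compare(info.last_seen_at, cutoff) == :gt,
        do: name
  end
end
```

With a db node at 25% memory, a w2 worker at 12.5%, a just-registered w1 with runtime: nil and a disconnected api node, least_loaded/1 picks w2 and silently ignores the other two.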
Return values
NebulaAPI never wraps your return value. A defapi body returns exactly what it
computed — local or over RPC, the result is identical:
defapi &db, find(id) do
Repo.get(User, id) # returns %User{} or nil
end
find(1) #=> %User{...}
find(999) #=> nil
# Tuples you return yourself are passed through untouched, including your own
# {:ok, _} / {:error, _}:
defapi &db, create(attrs) do
Repo.insert(User.changeset(attrs)) # {:ok, user} or {:error, changeset}
end
create(%{name: "Ada"}) #=> {:ok, %User{...}}
create(%{}) #=> {:error, %Ecto.Changeset{...}}
The one value the library does inject is a :nebula_error tuple — a library or
transport failure (a timeout, no worker available, a crashing body, a quorum that wasn't
reached), never a business outcome. So any :ok / :error you ever see is yours, and
you never have to guess whether an {:error, _} came from your code or the framework. An
exception, throw or exit escaping a body is reported the same way — identically whether the
body ran locally or remotely.
Its shape depends on the scope of the failure. A single-node failure (unicast, or one
node inside a multicast result) is the 2-tuple {:nebula_error, reason}. A whole-call
multicast failure is a 3-tuple whose extra element carries the details: the partial results for {:nebula_error, :no_success, results} and {:nebula_error, :quorum_not_reached, results},
or the live-versus-required counts for {:nebula_error, :quorum_unreachable, %{workers: n, required: m}} (see
Calling → multicast results). Match the 3-tuples when
you handle a :first / :quorum call's top-level outcome, not just {:nebula_error, _}.
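Since the library only ever injects :nebula_error tuples, a caller can sort every outcome with three clauses. CallResult is a hypothetical helper and :timeout is an illustrative reason; the tuple shapes are the ones listed above. The 3-tuple clause comes first so whole-call failures are never mistaken for business values.

```elixir
defmodule CallResult do
  # Whole-call multicast failure: the third element is the partial results
  # (or the %{workers: n, required: m} map for :quorum_unreachable).
  def classify({:nebula_error, reason, detail}), do: {:call_failed, reason, detail}

  # Single-node library or transport failure (timeout, no worker, crashed body...).
  def classify({:nebula_error, reason}), do: {:library_error, reason}

  # Everything else is the body's own value, including your own {:error, _} and nil.
  def classify(value), do: {:business, value}
end
```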
Wrap any single-node library
Here's the pattern that tends to click: NebulaAPI turns any single-node
library into a cluster-wide one without touching the library. No fork, no
monkey-patch — just a few lines of defapi that delegate to it on a chosen
node.
If you've ever thought "I'd love to use Cachex / a counter / a cron here, but
its state is per-node, so now I need Redis / a shared DB / :global locks…" —
this is the escape hatch. The library stays exactly as it is. You pin it to one
node and wrap it.
# Cachex runs only on the @cache node; every node shares one cache through the wrapper.
defmodule MyApp.Cache do
use NebulaAPI
defapi @cache, get(key), do: Cachex.get(:app_cache, key)
defapi @cache, put(key, value), do: Cachex.put(:app_cache, key, value)
end
Any node calls MyApp.Cache.get/1; it resolves locally on @cache and routes
transparently everywhere else. One shared cache, no Redis. The same trick gives you
cluster-wide rate limiters, counters, run-once-per-cluster schedulers, singleton
coordinators, and feature-flag stores.
An honest caveat. This is great for values read often and invalidated rarely (dynamic config, reference data). But for a hot path doing thousands of reads per second per node, every read becomes an RPC round-trip — that's the wrong use, and a real distributed cache (Redis, or
:mnesia) stays better. NebulaAPI is the right tool when the access pattern fits, not a universal replacement for a distributed cache.
Worked example: a 3-role cluster
Three nodes, three roles — an API front, a database node, and a worker:
config :nebula_api,
nodes: [
"api@api.example": [:mainframe_cluster, :api],
"db@db.example": [:alpha_server, :db],
"worker@worker.example": [:mainframe_cluster, :gpu]
]
Data access — &db nodes only
defmodule MyApp.Users do
use NebulaAPI
alias MyApp.{Repo, User}
defapi &db, get(id) do
Repo.get(User, id)
end
defapi &db, list(filters \\ []) do
User |> where_filters(filters) |> Repo.all()
end
# A plain def — no defapi: keep utils and pure business logic local, on every release.
def user_name(%User{nickname: name}), do: name
# Helper only exists on &db nodes
on_nebula_nodes &db do
import Ecto.Query, only: [where: 3]
defp where_filters(query, filters) do
Enum.reduce(filters, query, fn {k, v}, q -> where(q, [u], field(u, ^k) == ^v) end)
end
end
end
Background jobs — @worker only
defmodule MyApp.Jobs do
use NebulaAPI
# @worker targets the worker node by its (short) name — no :worker tag needed.
defapi @worker, transcode(input, opts) do
FFmpex.new_command()
|> FFmpex.add_input_file(input)
|> FFmpex.add_output_file(opts[:output])
|> FFmpex.execute()
end
# @worker AND &gpu — a faster path that only the GPU-equipped workers carry.
defapi @worker &gpu, quick_transcode(input, opts) do
GpuTranscoder.run(input, opts)
end
end
Conditional application setup
defmodule MyApp.Application do
use Application
use NebulaAPI.Server
def start(_type, _args) do
# Only the &db node starts the Repo; everyone runs the nebula server.
# The Repo goes first, so &db workers register only once the database is up.
repo =
on_nebula_nodes &db do
[MyApp.Repo]
else
[]
end
children = repo ++ [nebula_api_server()]
Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Sup)
end
end
Cross-node calls from a web controller
defmodule MyAppWeb.UserController do
# Phoenix controller helpers (render/3, send_resp/3)
use MyAppWeb, :controller
def show(conn, %{"id" => id}) do
# "Just works" on any node. Local on @db, RPC everywhere else.
# get/1 returns the struct (or nil) directly — no wrapping.
case MyApp.Users.get(id) do
%MyApp.User{} = user -> render(conn, :show, user: user)
nil -> send_resp(conn, 404, "Not found")
end
end
def transcode(conn, %{"path" => path}) do
# Explicitly route to a worker, even if we have the code locally
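result =
call_on_node @worker do
MyApp.Jobs.transcode(path, output: "/tmp/out.mp3")
end
# A Phoenix action must return the conn, so report the outcome explicitly.
send_resp(conn, 200, inspect(result))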
end
end
When NOT to use NebulaAPI
Being honest about the edges:
- External clients. If the caller isn't a node in your Erlang cluster — a public web client, a non-Elixir mobile app — gRPC or REST is still the right boundary. NebulaAPI is for intra-cluster calls.
- Node names unknown at build time. NebulaAPI needs your node names and tags in
config when you compile. The nodes themselves can come up and go down freely at
runtime: workers register and drop through
:pg, and selectors only ever route to what's actually connected. What it can't handle is a node whose name wasn't known at build time: an unbounded fleet of randomly-named pods has no compiled identity to route to (though a fixed, generic caller node is easy; see generic nodes). Scaling the count of known roles is fine; minting brand-new node identities at runtime is not.
- Topologies whose roles change at runtime. Adding a wholly new tag or node name to the cluster means a recompile, because NebulaAPI decided the routing at build time. Bringing more instances of an existing role online needs nothing but starting them, provided their node names are already in config.
Performance
Measured by bench/routing.exs on OTP 26 (run it yourself with
elixir --name bench@127.0.0.1 --cookie nebula_bench -S mix run bench/routing.exs):
| Call | Per call |
|---|---|
| Plain local Elixir call (baseline) | ~8 ns |
| NebulaAPI, resolved local | ~60 ns |
| Cross-node round-trip, same host (loopback) | ~50 µs |
The point: a locally-resolved NebulaAPI call adds roughly 50 ns over a plain call (a
couple of process-dictionary reads and a cond), which is negligible in any
practical sense. A cross-node call is a standard Erlang-distribution round-trip; the ~50 µs
above is loopback (same host), and over a real network you pay link latency on top
(commonly ~0.2–2 ms). Either way the rule of thumb holds: resolve local whenever you can,
and a cross-node hop costs roughly what a distributed GenServer.call costs — no more.
Configuration reference
config :nebula_api,
# Required: cluster topology — tags per node.
# Used at compile time to decide what code goes where.
nodes: [
"api@api.example": [:mainframe_cluster, :api, :cache],
"db@db.example": [:mainframe_cluster, :db, :cache],
"worker@worker.example": [:cloud_worker_lambda, :worker]
],
# Optional: override node identity for dev/test.
# In production, compile with: elixir --name node@host -S mix compile.
# default_opts also accepts inherited defaults for every `use NebulaAPI` module:
# max_concurrent_calls: and default_timeout:.
default_opts: [self_node: :"api@api.example"],
# Optional: global default timeout (ms) for remote calls.
# Per-call timeout: > per-module default_timeout: > this > 5000.
default_timeout: 5_000,
# Optional: how often (ms) each node's background NodesInfoCache rebuilds
# the node-info snapshot served to selector functions.
nodes_info_refresh_interval: 5_000
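The timeout comment above encodes a four-level fallback chain. A hypothetical TimeoutResolution module (not NebulaAPI's code) spells it out; the only assumption beyond the comment is reading the app-wide value with Application.get_env/2.

```elixir
defmodule TimeoutResolution do
  # Per-call timeout: > per-module default_timeout: > app-wide default_timeout > 5000.
  @fallback 5_000

  def resolve(call_opts, module_opts, app_default \\ Application.get_env(:nebula_api, :default_timeout)) do
    Keyword.get(call_opts, :timeout) ||
      Keyword.get(module_opts, :default_timeout) ||
      app_default ||
      @fallback
  end
end
```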
Generic nodes: serve nothing, call everything
A release is normally tied to one node: it must run as the node it was compiled for (see
the boot policy). A generic node is the
exception — a node that serves nothing (no workers, registers nothing in :pg) and routes
every defapi call remotely. To actually reach the cluster it must be distributed (a
real name@host); a nonode@nohost build can't join a cluster (Node.connect is a no-op
there), so it stays inert — safe, but it calls no one. Two ways to get one:
1. A dedicated server-less build (allow_nonode_nohost). Set the flag and compile
without --name, so node() is nonode@nohost and every defapi compiles as a pure
remote stub — no local bodies, no server, the smallest binary:
config :nebula_api, nodes: [ ...the real cluster nodes... ], allow_nonode_nohost: true
mix compile && mix release console # no --name → a generic, server-less build
The flag registers nonode@nohost as an empty, tagless node so the build compiles cleanly
(you can't list it in :nodes yourself — it's reserved; the flag is the only way to admit
it). Run it as nonode@nohost and it's inert; launch it under a real name to make it a
connected, calls-everything client.
2. Any build, repurposed. No dedicated build on hand? Boot an existing release (a
worker, an api) under a node name that isn't the one it was compiled for. It serves
nothing and routes every call remote just the same — you only carry the extra local bodies
that build happens to contain.
Either way, launching under a name that isn't the compiled one is a node mismatch, so you opt
in with ALLOW_RUNTIME_NEBULA_NODE_MISMATCH=1 (keep allow_nonode_nohost in the build that
wants it, not the shared cluster config). The operational recipe — a prod console, a debug
shell — is in
Calling → spawning a generic node.
But wait — how do the nodes actually connect?
NebulaAPI decides what code goes where; it does not form the Erlang cluster. That's
deliberate — clustering is your call, and the library stays agnostic. All it needs is that
the nodes are connected Erlang nodes (so :pg syncs and distribution RPC flows); how they
find each other is entirely up to you. Anything that ends up calling Node.connect/1 works:
- libcluster: the usual answer. Pick a strategy for your environment: Gossip on a flat network, Kubernetes / Kubernetes.DNS on k8s, EpmdDNS behind a headless service, or a static Epmd list for a fixed fleet. Point its topology at the same node names you put in config :nebula_api, :nodes. (The runnable demo does exactly this with libcluster's Epmd strategy over a Docker network.)
- Plain epmd + Node.connect/1: for a handful of known hosts, a few Node.connect calls at boot (or -kernel sync_nodes_mandatory ... in vm.args) are enough.
- Anything else: a custom strategy, a service-discovery hook, manual connects from a release env.sh. NebulaAPI never looks; it only ever reads node() and :pg.
Two practical notes: share the same cookie across the cluster, and use long names
(name@host, RELEASE_DISTRIBUTION=name) so the running node names match what you compiled
for. Once the nodes are connected, NebulaAPI's workers register in :pg and routing just
works.
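For the "plain Node.connect/1" route, a minimal sketch could reuse the topology you already declared. ClusterConnect is a hypothetical helper, not part of NebulaAPI: it reads config :nebula_api, :nodes, skips the local node, and records what Node.connect/1 returned for each peer (true, false, or :ignored when this VM was started without --name/--sname).

```elixir
defmodule ClusterConnect do
  # Hypothetical boot-time helper: connect to every configured node except ourselves.
  def connect_all(nodes_config \\ Application.get_env(:nebula_api, :nodes, [])) do
    for {name, _tags} <- nodes_config, name != node(), into: %{} do
      # Node.connect/1 can block while an unreachable host times out.
      {name, Node.connect(name)}
    end
  end
end
```

You might call it from a one-off Task child early in the supervision tree. It connects once and never retries, which is exactly the gap libcluster's strategies fill for anything beyond a small, fixed fleet.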
Architecture
Two halves: a compile-time code generator (AST.Parser / AST.Builder / Config,
which fail the build on an unknown tag or node) and a small runtime layer
(NebulaAPI.Server per app starting a Worker per locally-served module, APIServer
holding the :pg routing and the node-info ETS cache).
┌─────────────────────────────────────────────────────┐
│ Compile time │
│ │
│ AST.Parser parses selectors (&tag, @node, !&) │
│ AST.Builder generates the defapi functions │
│ Config resolves nodes, validates topology │
│ → CompileError on unknown tag/node │
└─────────────────────┬───────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────┐
│ Runtime │
│ │
│ NebulaAPI.Server per-app supervisor; starts one │
│ Worker per locally-served module │
│ (wired via nebula_api_server()) │
│ APIServer :pg routing + node-info ETS cache │
│ APIServer.Worker per-module GenServer; registers │
│ its methods in :pg │
│ :pg groups worker discovery across nodes │
└─────────────────────────────────────────────────────┘
Documentation
This README is the whole picture. The docs/ pages go deeper, in the order you
meet each theme:
- Configuration: nodes, tags, topology, compile-per-node, dev/test, validation
- Defining APIs: the three use macros, defapi, selectors, return values, on_nebula_nodes, wiring the server
- Calling across nodes: calling endpoints, call_on_*, multicast strategies, node-info routing, wrapping single-node libraries, spawning a generic node
- Gotchas and troubleshooting: trailing opts, process scope, the nil-selector distinction, common errors
Deep dive:
- AST deep-dive — how the per-node code is generated
License
MIT