# BatchServing
BatchServing is a fork of Nx.Serving focused on generic concurrent batch work.
It lets many callers submit work independently, then transparently groups calls arriving in the same time window into one batch, executes that batch once, and returns the right slice of results to each caller.
This is useful when you have high fan-in workloads (for example from web/API requests, jobs, or pipelines) where per-call execution is expensive but batched execution is efficient.
## Installation
Add batch_serving to your dependencies:
```elixir
def deps do
  [
    {:batch_serving, "~> 1.0.0"}
  ]
end
```

## Core idea
- Define a serving function that receives a list of values.
- Start a serving process with `batch_size` and `batch_timeout`.
- Use `BatchServing.dispatch/2` for single items and `BatchServing.dispatch_many/2` for explicit batches.
- Calls close together in time are merged and executed once.
## Quick start
1) Define and start a serving
```elixir
children = [
  BatchServing.create_serving_process_group_spec(),
  {BatchServing,
   serving: BatchServing.new(fn values ->
     Enum.map(values, &(&1 * &1))
   end),
   name: MyServing,
   batch_size: 10,
   batch_timeout: 100}
]

Supervisor.start_link(children, strategy: :one_for_one)
```

`batch_size` is the maximum combined size per execution. `batch_timeout` (in milliseconds) is how long to wait for more calls before dispatching.
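The interplay of the two options can be modeled as a plain decision function (a simplified sketch, not the library's internal scheduler): a pending batch is flushed as soon as it reaches `batch_size`, or once `batch_timeout` milliseconds have passed since the first pending item arrived.

```elixir
# Simplified model of the flush decision; the real scheduling lives
# inside the serving process.
defmodule FlushPolicy do
  @batch_size 10
  @batch_timeout 100

  # Flush when the pending batch is full, or when the oldest pending
  # item has waited at least @batch_timeout milliseconds.
  def flush?(pending, first_arrival_ms, now_ms) do
    length(pending) >= @batch_size or
      now_ms - first_arrival_ms >= @batch_timeout
  end
end

FlushPolicy.flush?(Enum.to_list(1..10), 0, 5)   #=> true (size reached)
FlushPolicy.flush?([1, 2], 0, 120)              #=> true (timeout elapsed)
FlushPolicy.flush?([1, 2], 0, 50)               #=> false (keep waiting)
```

Whichever trigger fires first wins, so small bursts still complete within one `batch_timeout` even when the batch never fills.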
2) Submit work
```elixir
BatchServing.dispatch_many!(MyServing, [2, 3])
#=> [4, 9]

BatchServing.dispatch!(MyServing, 2)
#=> 4

BatchServing.dispatch_many!(MyServing, ["2"])
#=> !!!! exit !!!!!

BatchServing.dispatch!(MyServing, "2")
#=> !!!! exit !!!!!

BatchServing.dispatch_many(MyServing, [2, 3])
#=> {:ok, [4, 9]}

BatchServing.dispatch(MyServing, 2)
#=> {:ok, 4}

BatchServing.dispatch_many(MyServing, ["2"])
#=> {:error, _}

BatchServing.dispatch(MyServing, "2")
#=> {:error, _}
```

From multiple concurrent callers:
```elixir
Task.async_stream(1..20, fn n ->
  BatchServing.dispatch(MyServing, n)
end, max_concurrency: 20)
|> Enum.to_list()
```

Each caller gets its own result, but execution is internally batched.
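Conceptually, handing each caller its own slice of a combined batch works like the sketch below (illustrative plain Elixir, not the library's code): each caller's item count is recorded at enqueue time, and the combined output is split back by those counts.

```elixir
# Illustrative only: reslice one batched output back to its callers.
requests = [{:caller_a, [2, 3]}, {:caller_b, [4]}, {:caller_c, [5, 6]}]

# One combined execution over all pending items.
inputs = Enum.flat_map(requests, fn {_caller, items} -> items end)
outputs = Enum.map(inputs, &(&1 * &1))

# Walk the output once, peeling off each caller's slice in order.
{slices, _rest} =
  Enum.map_reduce(requests, outputs, fn {caller, items}, remaining ->
    {slice, rest} = Enum.split(remaining, length(items))
    {{caller, slice}, rest}
  end)

slices
#=> [caller_a: [4, 9], caller_b: [16], caller_c: [25, 36]]
```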
## API overview
- `BatchServing.new/1` - Build a serving from a function.
- `BatchServing.map_inputs/2` - Map complex input into value lists.
- `BatchServing.map_results/2` - Map execution results into caller-facing shapes.
- `BatchServing.inline/2` - Run a single item inline.
- `BatchServing.inline_many/2` - Run an explicit batch inline.
- `BatchServing.dispatch/2` - Send a single item to a serving process for coalesced execution.
- `BatchServing.dispatch_many/2` - Send an explicit batch to a serving process.
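As a concept sketch of the input/output mapping pair (plain Elixir, not the library's actual callback shapes, which are an assumption here): `map_inputs/2` extracts the value list the serving function expects from richer input, and `map_results/2` reshapes raw results into what callers should see.

```elixir
# Concept only: the mapping that wraps a batch function.
batch_fun = fn values -> Enum.map(values, &(&1 * &1)) end

# What map_inputs/2 conceptually does: pull plain values out.
requests = [%{id: "a", value: 2}, %{id: "b", value: 3}]
values = Enum.map(requests, & &1.value)

# What map_results/2 conceptually does: reattach caller-facing shape.
results =
  values
  |> batch_fun.()
  |> Enum.zip(requests)
  |> Enum.map(fn {out, req} -> %{id: req.id, result: out} end)

results
#=> [%{id: "a", result: 4}, %{id: "b", result: 9}]
```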
## Advanced options
- `partitions: n` - Run multiple partitions for parallel batch execution.
- `streaming/2` - Stream batch results/events.
## Hooks (Streaming Runtime Events)
Hooks are useful when you need live runtime signals in addition to final batch output.
When streaming is enabled:
- `{:batch, output}` events carry result data.
- Hook events carry custom telemetry/progress data in the shape `{hook_name, payload}`.
Example:
```elixir
serving =
  BatchServing.new(MyServingModule, :ok)
  |> BatchServing.streaming(hooks: [:progress])

BatchServing.dispatch_many!(MyServing, inputs)
|> Enum.each(fn
  {:progress, meta} -> IO.inspect(meta, label: "progress")
  {:batch, output} -> IO.inspect(output, label: "batch")
end)
```

Practical use case:
- An embeddings/indexing pipeline where users need live progress, latency, and cost updates while batches are running.
Multi-user note:
- If a single batch contains items from multiple users, batch-level metrics are not directly user-attributable.
- Use one of the three approaches below to keep per-user reporting meaningful.
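One way to keep batch events attributable (an illustrative sketch, not one of the library's documented strategies) is to carry a user id with every item, so that batch-level output can be grouped back per user afterwards.

```elixir
# Illustrative: tag each item with its user so batch-level results
# can be re-attributed per user after execution.
items = [{"alice", 2}, {"bob", 3}, {"alice", 4}]

# One batched execution; the tag rides along with each item.
outputs = Enum.map(items, fn {user, n} -> {user, n * n} end)

# Group the combined output back per user for reporting.
per_user =
  Enum.group_by(outputs, fn {user, _out} -> user end, fn {_user, out} -> out end)

per_user
#=> %{"alice" => [4, 16], "bob" => [9]}
```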
See detailed guide and examples:
## Notes
- Start `BatchServing` early in your supervision tree so dependent processes can use it.
- `inline/2` executes one item immediately in the caller process.
- `inline_many/2` executes one explicit batch immediately in the caller process.
- `dispatch/2` coalesces single-item calls across callers.
- `dispatch_many/2` coalesces explicit batch calls across callers.