

SelfHostedInferenceCore

self_hosted_inference_core is the service-runtime kernel for local and self-hosted inference backends.

It owns the runtime concerns that sit between raw process placement and backend-specific boot logic: endpoint resolution, readiness and health interpretation, and lease lifecycle.

It does not own transport mechanics or client protocol execution. external_runtime_transport owns process placement and IO lifecycle. req_llm remains the data-plane client after an endpoint has been resolved.

Runtime Stack

external_runtime_transport
  -> self_hosted_inference_core
  -> concrete backend package or attach adapter
  -> req_llm consumers through EndpointDescriptor

That split keeps service lifecycle in the runtime stack and keeps request execution in the client layer.
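Concretely, once the kernel has resolved an endpoint, request execution is plain HTTP against it. The sketch below is hypothetical: it assumes an OpenAI-compatible /v1/chat/completions route (suggested by the :openai_chat_completions protocol name, not a documented contract) and uses the Req HTTP client directly in place of req_llm:

```elixir
# `endpoint` is an EndpointDescriptor resolved by the kernel (see Quick Start).
# Route and payload shape are assumptions based on the
# :openai_chat_completions protocol name.
response =
  Req.post!(endpoint.base_url <> "/v1/chat/completions",
    json: %{
      model: "demo-model",
      messages: [%{role: "user", content: "Hello"}]
    }
  )

response.body
```

The point of the split is visible here: nothing in the data-plane call knows whether the service was spawned or attached.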

Backends

Two backend shapes are now proved:

SelfHostedInferenceCore.Ollama proves the first truthful management_mode: :externally_managed path. It attaches to an already running Ollama daemon, owns readiness and health interpretation above the transport seam, and publishes the same northbound endpoint contract used by the spawned path.

llama_cpp_ex proves the spawned path. It plugs into the kernel by implementing SelfHostedInferenceCore.Backend and owns the backend-specific boot logic for a service the kernel launches.
That keeps the kernel generic while proving both ownership shapes on real backends.
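A minimal backend module might look like the following. The callback names, manifest keys, and return shapes here are illustrative assumptions only; the actual SelfHostedInferenceCore.Backend contract is documented in guides/backend_packages.md:

```elixir
defmodule MyBackend do
  @behaviour SelfHostedInferenceCore.Backend

  # Hypothetical callback: name and keys are assumptions for illustration,
  # not the real behaviour contract.
  @impl true
  def manifest do
    %{
      backend: "my_backend",
      startup_kind: :spawned,
      management_mode: :jido_managed,
      protocols: [:openai_chat_completions]
    }
  end

  # Hypothetical callback: boot (or locate) the service process, then
  # report where it listens.
  @impl true
  def ensure_service(_request, _context) do
    {:ok, %{base_url: "http://127.0.0.1:8080"}}
  end
end
```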

Startup Kinds

self_hosted_inference_core treats startup topology as an explicit part of the contract: a backend declares whether it is :spawned (the kernel launches and supervises the service process) or :attach_existing_service (the kernel attaches to a service that is already running, as the Ollama backend does).

Both paths use the same northbound endpoint and lease contracts. The kernel validates that backends keep startup kind, management mode, and transport ownership truthful. It also rejects execution surfaces that are not declared in the backend manifest.

Installation

Add the package to your dependency list:

def deps do
  [
    {:self_hosted_inference_core, "~> 0.1.0"}
  ]
end

Concrete backends register themselves against the kernel by implementing SelfHostedInferenceCore.Backend.

See guides/backend_packages.md for how the kernel expects concrete backend packages to attach. See guides/ollama_attach.md for the built-in attached-local backend.

Quick Start

Define a backend or attach adapter, register it, and ensure a northbound endpoint for a request:

alias SelfHostedInferenceCore.ConsumerManifest

:ok = SelfHostedInferenceCore.register_backend(MyBackend)

consumer =
  ConsumerManifest.new!(
    consumer: :jido_integration_req_llm,
    accepted_runtime_kinds: [:service],
    accepted_management_modes: [:jido_managed, :externally_managed],
    accepted_protocols: [:openai_chat_completions],
    required_capabilities: %{streaming?: true},
    optional_capabilities: %{},
    constraints: %{},
    metadata: %{adapter: :req_llm}
  )

request = %{
  request_id: "req-123",
  target_preference: %{
    target_class: "self_hosted_endpoint",
    backend: "my_backend",
    backend_options: %{model_identity: "demo-model"}
  }
}

context = %{
  run_id: "run-123",
  attempt_id: "run-123:1",
  boundary_ref: "boundary-123",
  observability: %{trace_id: "trace-123"}
}

{:ok, endpoint, compatibility} =
  SelfHostedInferenceCore.ensure_endpoint(
    request,
    consumer,
    context,
    owner_ref: "run-123",
    ttl_ms: 30_000
  )

endpoint.base_url    # base URL for the data-plane client (e.g. req_llm)
endpoint.lease_ref   # lease handle scoped by owner_ref and ttl_ms
compatibility.reason # explanation of the compatibility verdict
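In practice you will also want to branch on failure. The {:error, reason} shape below is an assumption about the non-success return, not a documented contract:

```elixir
case SelfHostedInferenceCore.ensure_endpoint(request, consumer, context,
       owner_ref: "run-123",
       ttl_ms: 30_000
     ) do
  {:ok, endpoint, _compatibility} ->
    endpoint.base_url

  {:error, reason} ->
    # Assumed error shape: surface it to the caller, or retry with a
    # different target_preference.
    {:error, reason}
end
```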

See examples/README.md for runnable demos covering both :spawned and :attach_existing_service.

HexDocs

Full API documentation, including the guides referenced above, is published on HexDocs.

License

Released under the MIT License. See LICENSE.