
LlamaCppSdk

llama_cpp_sdk is the first concrete backend package for the self-hosted inference stack:

external_runtime_transport
  -> self_hosted_inference_core
  -> llama_cpp_sdk
  -> req_llm through published EndpointDescriptor values

It owns the llama-server specifics that do not belong in the shared kernel: boot spec normalization, spawned-process launch and readiness probing, and publication of the endpoint descriptor.

It does not parse OpenAI payloads, token streams, or inference responses. Those stay northbound in req_llm and the calling control plane.

The phase-1 proof fixture also serves /v1/chat/completions with both standard JSON and SSE streaming responses so the published endpoint contract can be exercised honestly by northbound clients.
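
An SSE stream separates events with blank lines, each event carrying a `data: <json>` payload until a `data: [DONE]` sentinel. A minimal parsing sketch of that shape (hypothetical helper, not part of llama_cpp_sdk; the chunk contents are illustrative):

```elixir
defmodule SseSketch do
  # Hypothetical helper: split a raw SSE body into its `data:` payloads.
  # Shown only to illustrate the stream shape the fixture serves.
  def data_lines(raw) when is_binary(raw) do
    raw
    |> String.split("\n\n", trim: true)
    |> Enum.flat_map(fn event ->
      # Keep only `data:` lines, dropping the terminal [DONE] sentinel.
      for "data: " <> payload <- String.split(event, "\n", trim: true),
          payload != "[DONE]",
          do: payload
    end)
  end
end

raw = """
data: {"choices":[{"delta":{"content":"Hel"}}]}

data: {"choices":[{"delta":{"content":"lo"}}]}

data: [DONE]
"""

SseSketch.data_lines(raw)
# two JSON payload strings, with the [DONE] sentinel dropped
```

Each payload string would then be JSON-decoded into an OpenAI-style chat completion chunk by the northbound client.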

Current Release Boundary

The first backend release is intentionally narrow and truthful.

Installation

Add the package to your dependency list:

def deps do
  [
    {:llama_cpp_sdk, "~> 0.1.0"}
  ]
end

llama_cpp_sdk depends on self_hosted_inference_core, which in turn depends on external_runtime_transport.

Quick Start

Resolve a spawned endpoint through the shared kernel:

alias SelfHostedInferenceCore.ConsumerManifest

consumer =
  ConsumerManifest.new!(
    consumer: :jido_integration_req_llm,
    accepted_runtime_kinds: [:service],
    accepted_management_modes: [:jido_managed],
    accepted_protocols: [:openai_chat_completions],
    required_capabilities: %{streaming?: true},
    optional_capabilities: %{tool_calling?: :unknown},
    constraints: %{startup_kind: :spawned},
    metadata: %{}
  )

{:ok, resolution} =
  LlamaCppSdk.resolve_endpoint(
    %{
      model: "/models/qwen3-14b-instruct.gguf",
      alias: "qwen3-14b-instruct",
      host: "127.0.0.1",
      port: 8080,
      ctx_size: 8_192,
      gpu_layers: :all,
      threads: 8,
      parallel: 2,
      flash_attn: :auto
    },
    consumer,
    owner_ref: "run-123",
    ttl_ms: 30_000
  )

resolution.endpoint.base_url  # published base URL for northbound requests
resolution.lease.lease_ref    # lease reference for the resolved runtime

The backend normalizes the boot spec, registers itself with self_hosted_inference_core, and publishes an endpoint descriptor once the service is actually ready.

That published descriptor is the northbound contract used by jido_integration. The caller should execute requests against the base_url and authorization headers published on the descriptor, not against internal process details.
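
Concretely, a northbound call targets the published base_url with an OpenAI-style chat completions payload. A hypothetical sketch of the request construction (the base_url value mirrors the Quick Start spawn above; none of this is llama_cpp_sdk API):

```elixir
# Assumed value: in practice this comes from resolution.endpoint.base_url.
base_url = "http://127.0.0.1:8080"

# The chat completions route served by llama-server and the proof fixture.
url = base_url <> "/v1/chat/completions"

# OpenAI-style payload per the :openai_chat_completions protocol.
payload = %{
  model: "qwen3-14b-instruct",
  messages: [%{role: "user", content: "Hello"}],
  stream: false
}
```

With an HTTP client such as Req the call would look roughly like `Req.post!(url, json: payload)`; a streaming request would set `stream: true` and consume SSE chunks instead.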

Supported Boot Fields

The first release supports normalized fields for the installed llama-server CLI surface.

See guides/boot_spec.md for the full contract. When api_key_file is provided, llama_cpp_sdk reads it to derive the published authorization header for northbound clients.
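
The derived header is a standard bearer token. A minimal sketch of that derivation (hypothetical helper, not the package's actual code):

```elixir
defmodule AuthSketch do
  # Hypothetical derivation: read the key file, trim trailing whitespace,
  # and build a standard bearer Authorization header for northbound use.
  def authorization_header(api_key_file) do
    key = api_key_file |> File.read!() |> String.trim()
    {"authorization", "Bearer " <> key}
  end
end
```

A file containing `secret-token` would yield `{"authorization", "Bearer secret-token"}`.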

Readiness And Health

Readiness is owned here, above the transport seam:

  1. launch the spawned process via external_runtime_transport
  2. probe TCP reachability on the requested host and port
  3. probe HTTP availability on /health or /v1/models
  4. publish the endpoint only after readiness succeeds
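
The probe sequence above amounts to a bounded retry loop. A sketch under stated assumptions (hypothetical helper functions, not the released API; only Erlang stdlib calls):

```elixir
defmodule ReadinessSketch do
  # Hypothetical helper: retry a probe function until it reports :ok
  # or the attempt budget runs out.
  def wait_until_ready(probe, attempts, delay_ms \\ 100)

  def wait_until_ready(_probe, 0, _delay_ms), do: {:error, :not_ready}

  def wait_until_ready(probe, attempts, delay_ms) do
    case probe.() do
      :ok ->
        :ok

      {:error, _reason} ->
        Process.sleep(delay_ms)
        wait_until_ready(probe, attempts - 1, delay_ms)
    end
  end

  # Step 2 as a probe: TCP reachability via the Erlang stdlib.
  def tcp_probe(host, port) do
    case :gen_tcp.connect(String.to_charlist(host), port, [], 500) do
      {:ok, socket} ->
        :gen_tcp.close(socket)
        :ok

      {:error, reason} ->
        {:error, reason}
    end
  end
end
```

Step 3 would layer an HTTP GET of /health or /v1/models on the same retry loop before the endpoint is published.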

Health continues to poll after publication so the shared kernel can expose healthy, degraded, or unavailable runtime truth.

Examples And Guides

See guides/boot_spec.md for the full boot spec contract.

Development

Run the normal quality checks from the repo root when your environment allows Mix to create its local coordination socket:

mix format --check-formatted
mix compile --warnings-as-errors
mix test
MIX_ENV=test mix credo --strict
MIX_ENV=dev mix dialyzer
mix docs

License

This repository is released under the MIT License. See LICENSE for the canonical license text and CHANGELOG.md for release history.