whisper_cpp

A thin Elixir wrapper around whisper-rs, the Rust bindings to whisper.cpp. It exposes whisper.cpp speech-to-text to the BEAM through a Rustler NIF: load a model, hand it 16 kHz mono f32 PCM, get structured segments back. No subprocess, no Python, no temporary files.

Installation

def deps do
  [{:whisper_cpp, "~> 0.1.0"}]
end

Installation downloads a precompiled NIF for your target from the project's GitHub releases - no Rust toolchain needed. Requires Elixir 1.19+.

Usage

{:ok, model} = WhisperCpp.load_model("models/ggml-large-v3.bin")

# Decode upstream (ffmpeg, bumblebee, ...) into 16 kHz mono f32 PCM:
#   ffmpeg -i jfk.wav -f f32le -ac 1 -ar 16000 jfk.pcm
pcm = File.read!("jfk.pcm")

{:ok, %WhisperCpp.Transcription{text: text, segments: segs}} =
  WhisperCpp.transcribe(model, {:pcm_f32, pcm}, language: "en")

IO.puts(text)
for s <- segs, do: IO.puts("[#{s.start}-#{s.end}] #{s.text}")

Audio is always {:pcm_f32, binary} - little-endian f32 samples, mono, 16 kHz, normalised to [-1.0, 1.0]. The library does not decode WAV/MP3/etc; decode upstream. transcribe_slice/4 runs a [start_s, end_s) window of a master PCM buffer and shifts the returned times back into the source timeline.

See the docs for the full option list (:translate, :initial_prompt, :word_timestamps, :beam_size, :n_threads, cancellation, progress messages, ...) and error handling.

Backends

CPU is always available. Pick one accelerator per build; the precompiled Hex package ships CPU plus cuda / hipblas variants for Linux and Metal on Apple Silicon, selected via WHISPER_CPP_VARIANT:

WHISPER_CPP_VARIANT=cuda mix deps.compile whisper_cpp

To build from source with any whisper-rs backend (cuda, hipblas, vulkan, metal, coreml, intel-sycl, openblas, openmp):

WHISPER_CPP_BUILD=1 WHISPER_CPP_FEATURES=cuda mix compile

Source builds need a Rust toolchain, cmake, a C++17 compiler, and the backend's own SDK (CUDA toolkit, ROCm, Vulkan SDK, ...).

Testing

mix test                  # unit tests, no downloads
mix test --include integration  # downloads ggml-tiny.en + JFK sample, real inference

License

MIT. whisper.cpp is MIT-licensed; whisper-rs is public domain (Unlicense) and vendors whisper.cpp, linking it statically.