VoskEx


Elixir bindings for the Vosk API - offline speech recognition toolkit.

VoskEx provides a high-performance interface to Vosk’s speech recognition capabilities, allowing you to recognize speech from audio files or streams entirely offline, with no network connection required.

Features

Installation

VoskEx automatically downloads precompiled Vosk libraries during compilation, so no system dependencies are required!

Simply add vosk_ex to your list of dependencies in mix.exs:

def deps do
  [
    {:vosk_ex, "~> 0.2.1"}
  ]
end

Then run:

mix deps.get
mix compile  # Automatically downloads Vosk library (~2-7 MB) for your platform

Supported platforms:

The library automatically detects your platform and downloads the appropriate precompiled Vosk library on first compilation.

Windows Users - Additional Setup Required

On Windows, you need to add the Vosk DLL directory to PATH before starting your application. This stems from how Windows locates external DLL dependencies at load time.

Why? Unlike bcrypt or other self-contained NIFs, VoskEx depends on external Vosk DLLs (26MB+ of speech recognition libraries). Windows needs to know where to find these at runtime.

Option 1 - Set PATH manually (PowerShell):

# In PowerShell, before running your app
$env:PATH = "_build\dev\lib\vosk_ex\priv\native\windows-x86_64;$env:PATH"

# Then run normally
mix test
mix run
iex -S mix

Option 2 - Use the included helper script:

# Copy scripts/windows/run.ps1 to your project root
.\scripts\windows\run.ps1 mix test
.\scripts\windows\run.ps1 iex -S mix

Option 3 - Create a startup script for your app:

# my_app.ps1
$env:PATH = "_build\dev\lib\vosk_ex\priv\native\windows-x86_64;$env:PATH"
mix run --no-halt

Option 4 - Use Mix releases (recommended for production):

mix release
# Releases automatically bundle all DLLs - no PATH manipulation needed!

Note: For test environment, use _build\test\lib\vosk_ex\priv\native\windows-x86_64 instead.

Configuration

VoskEx logs are disabled by default. To enable Vosk/Kaldi internal logging, add to your config/config.exs:

config :vosk_ex,
  log_level: 0  # -1 = silent (default), 0 = default logging, >0 = more verbose

Usage

1. Download a speech model

Use the built-in Mix task to download a model:

# Download default English model
mix vosk.download_model

# Download Spanish model
mix vosk.download_model es

# Download specific model by name
mix vosk.download_model vosk-model-small-en-us-0.15

Available predefined languages: en-us, es, fr, de, ru, cn, ja, pt, it, and more.

Or download manually from https://alphacephei.com/vosk/models.

2. Basic usage

# Load the model
{:ok, model} = VoskEx.Model.load("vosk-model-small-en-us-0.15")

# Create a recognizer (16kHz sample rate)
{:ok, recognizer} = VoskEx.Recognizer.new(model, 16000.0)

# Optional: Enable word timing
:ok = VoskEx.Recognizer.set_words(recognizer, true)

# Read audio file (PCM 16-bit mono), skipping the 44-byte WAV header
<<_header::binary-size(44), audio::binary>> = File.read!("audio.wav")

# Process audio in chunks
chunk_size = 8000
for <<chunk::binary-size(chunk_size) <- audio>> do
  case VoskEx.Recognizer.accept_waveform(recognizer, chunk) do
    :utterance_ended ->
      {:ok, result} = VoskEx.Recognizer.result(recognizer)
      IO.inspect(result)

    :continue ->
      {:ok, partial} = VoskEx.Recognizer.partial_result(recognizer)
      IO.inspect(partial, label: "Partial")
  end
end

# Get final result
{:ok, final} = VoskEx.Recognizer.final_result(recognizer)
IO.inspect(final, label: "Final")
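Note that the binary comprehension above silently drops a final chunk shorter than `chunk_size`, so the last few milliseconds of audio may never reach the recognizer. If you need every byte fed in, a small chunking helper that keeps the remainder works; a sketch (the `AudioChunks` module name is ours):

```elixir
defmodule AudioChunks do
  # Splits a binary into `size`-byte chunks; the last chunk may be shorter.
  def chunk(binary, size) when byte_size(binary) > size do
    <<head::binary-size(size), rest::binary>> = binary
    [head | chunk(rest, size)]
  end

  def chunk(<<>>, _size), do: []
  def chunk(binary, _size), do: [binary]
end
```

You can then iterate with `Enum.each(AudioChunks.chunk(audio, 8000), &VoskEx.Recognizer.accept_waveform(recognizer, &1))` and the trailing partial chunk is processed too.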

3. Result format

# Simple result
%{"text" => "hello world"}

# With word timing (when set_words is enabled)
%{
  "result" => [
    %{"conf" => 1.0, "end" => 1.110000, "start" => 0.870000, "word" => "hello"},
    %{"conf" => 0.98, "end" => 1.530000, "start" => 1.110000, "word" => "world"}
  ],
  "text" => "hello world"
}

# Partial result
%{"partial" => "hello wor"}
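These results are plain Elixir maps, so post-processing is straightforward. For example, a helper that turns a word-timed result into readable lines; a sketch (the `ResultFormatter` module name is ours):

```elixir
defmodule ResultFormatter do
  # Formats each word entry of a timed result as "word start-end (conf)".
  def word_lines(%{"result" => words}) do
    Enum.map(words, fn %{"word" => w, "start" => s, "end" => e, "conf" => c} ->
      "#{w} #{s}-#{e} (#{c})"
    end)
  end

  # Results without word timing carry no "result" key.
  def word_lines(_result), do: []
end
```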

4. Streaming audio

defmodule AudioProcessor do
  use GenServer

  def start_link(model_path) do
    GenServer.start_link(__MODULE__, model_path)
  end

  def init(model_path) do
    {:ok, model} = VoskEx.Model.load(model_path)
    {:ok, recognizer} = VoskEx.Recognizer.new(model, 16000.0)
    VoskEx.Recognizer.set_words(recognizer, true)

    {:ok, %{model: model, recognizer: recognizer}}
  end

  def handle_call({:process_audio, audio_chunk}, _from, state) do
    result = case VoskEx.Recognizer.accept_waveform(state.recognizer, audio_chunk) do
      :utterance_ended ->
        {:ok, result} = VoskEx.Recognizer.result(state.recognizer)
        {:utterance, result}

      :continue ->
        {:ok, partial} = VoskEx.Recognizer.partial_result(state.recognizer)
        {:partial, partial}

      :error ->
        {:error, :recognition_failed}
    end

    {:reply, result, state}
  end
end
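Each `{:utterance, result}` reply carries one result map, so assembling a full transcript from a session is just joining the `"text"` fields of the collected results. A minimal sketch (the `Transcript` module name is ours):

```elixir
defmodule Transcript do
  # Joins the "text" fields of successive utterance results, skipping empties.
  def join(results) do
    results
    |> Enum.map(& &1["text"])
    |> Enum.reject(&(&1 in [nil, ""]))
    |> Enum.join(" ")
  end
end
```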

Documentation

Full documentation is available at https://hexdocs.pm/vosk_ex or you can generate it locally:

mix docs
open doc/index.html

API Reference

VoskEx.Model

VoskEx.Recognizer

VoskEx (Low-level API)

Audio Format

Vosk expects PCM 16-bit mono audio. Sample rates typically used:

Converting audio with ffmpeg

# Convert any audio to 16kHz mono PCM WAV
ffmpeg -i input.mp3 -ar 16000 -ac 1 -f wav output.wav

# Extract raw PCM (no WAV header)
ffmpeg -i input.mp3 -ar 16000 -ac 1 -f s16le output.raw
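On the Elixir side, the raw `s16le` file can be fed to the recognizer as-is, while a `.wav` file needs its header stripped first. A minimal sketch assuming the canonical 44-byte PCM header (real WAV files can carry extra chunks, so a full RIFF parser is safer; the `WavReader` module name is ours):

```elixir
defmodule WavReader do
  # Strips a canonical 44-byte RIFF/WAVE header: "RIFF" (4) + size (4) +
  # "WAVE" (4) + fmt/data chunk headers (32). Returns the raw PCM samples.
  def strip_header(<<"RIFF", _size::binary-size(4), "WAVE",
                     _fmt_and_data::binary-size(32), pcm::binary>>) do
    {:ok, pcm}
  end

  def strip_header(_other), do: {:error, :not_wav}
end
```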

Performance Considerations

Available Models

Download models from https://alphacephei.com/vosk/models

Languages include:

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Vosk itself is licensed under the Apache License 2.0.

Acknowledgments