OmnivoiceEx


Elixir wrapper for OmniVoice, a unified speech generation model from K2-FSA.

Voice Cloning · Voice Design · Multilingual TTS · 24 kHz Output

Features

Requirements

Installation

Add to your mix.exs:

def deps do
  [
    {:omnivoice_ex, "~> 0.1.0"}
  ]
end

Then install the Python dependencies:

mix omnivoice_ex.setup

Quick Start

# Start the model server
{:ok, pid} = OmnivoiceEx.start_link(device: "cuda")

# Wait for the model to load
:ok = OmnivoiceEx.await_ready(pid)

# Generate speech
{:ok, audio} = OmnivoiceEx.generate(pid, "Hello, world!")

# Save to file
:ok = OmnivoiceEx.save(audio, "output.wav")

# Clean shutdown
OmnivoiceEx.stop(pid)

Voice Design

Describe a voice in natural language with the instruct option, and OmniVoice generates speech in that voice:

{:ok, audio} = OmnivoiceEx.generate(pid,
  "Welcome to our luxury resort.",
  instruct: "A warm, professional female concierge with a British accent"
)

Voice Cloning

Clone a voice from a reference audio file with the ref_audio option:

{:ok, audio} = OmnivoiceEx.generate(pid,
  "This is a cloned voice speaking English.",
  ref_audio: "/path/to/reference.wav",
  ref_text: "Transcript of the reference audio"  # optional, improves quality
)
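When the reference path comes from user input, it can help to check it before sending a request to the model. The following is a minimal sketch that assumes only the ref_audio and ref_text options shown above; CloneOptsSketch is a hypothetical helper, not part of OmnivoiceEx.

```elixir
defmodule CloneOptsSketch do
  # Hypothetical helper, not part of OmnivoiceEx: builds the keyword options
  # for voice cloning and fails early if the reference file is missing.
  def clone_opts(ref_audio, ref_text \\ nil) when is_binary(ref_audio) do
    unless File.regular?(ref_audio) do
      raise ArgumentError, "reference audio not found: #{ref_audio}"
    end

    # ref_text is optional, so it is only included when a transcript is given.
    case ref_text do
      nil -> [ref_audio: ref_audio]
      text when is_binary(text) -> [ref_audio: ref_audio, ref_text: text]
    end
  end
end
```

The result is an ordinary keyword list, so it can be passed straight through as the options argument, e.g. `OmnivoiceEx.generate(pid, text, CloneOptsSketch.clone_opts("/path/to/reference.wav"))`.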

Generation Options

| Option | Type | Default | Description |
|---|---|---|---|
| ref_audio | String.t() | none | Path to reference audio for voice cloning |
| ref_text | String.t() | none | Transcript of the reference audio |
| instruct | String.t() | none | Natural-language voice description for voice design |
| language | String.t() | none | Language code (auto-detected when omitted) |
| duration | float() | none | Target duration in seconds |
| speed | float() | none | Playback speed factor |
| num_step | pos_integer() | 32 | Number of diffusion steps (more steps give higher quality) |
| guidance_scale | float() | 2.0 | Classifier-free guidance (CFG) scale |
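The defaults in this table can be filled in, and misspelled keys rejected, with `Keyword.validate!/2` (available since Elixir 1.13). This is a minimal sketch that assumes OmnivoiceEx accepts exactly these option names; GenerateOptsSketch is hypothetical, and the library may validate its options differently.

```elixir
defmodule GenerateOptsSketch do
  # Hypothetical helper, not part of OmnivoiceEx. It assumes the option
  # names and defaults listed in the Generation Options table.
  @known_opts [
    :ref_audio,
    :ref_text,
    :instruct,
    :language,
    :duration,
    :speed,
    num_step: 32,
    guidance_scale: 2.0
  ]

  @doc "Fills in defaults and raises ArgumentError on unknown or invalid options."
  def normalize!(opts) when is_list(opts) do
    # Raises ArgumentError for any key not in @known_opts and adds the
    # defaults for :num_step and :guidance_scale when they are missing.
    opts = Keyword.validate!(opts, @known_opts)

    num_step = Keyword.fetch!(opts, :num_step)

    unless is_integer(num_step) and num_step > 0 do
      raise ArgumentError, "num_step must be a positive integer, got: #{inspect(num_step)}"
    end

    opts
  end
end
```

For example, `GenerateOptsSketch.normalize!(instruct: "A calm narrator")` returns the options with `num_step: 32` and `guidance_scale: 2.0` added, while a typo such as `num_steps: 64` raises ArgumentError instead of being silently ignored.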

Architecture

Elixir (GenServer) โ†โ†’ Erlang Port โ†โ†’ Python Bridge โ†โ†’ OmniVoice Model
                    (stdin/stdout)   (msgpack framed)

The bridge uses MessagePack binary framing over an Erlang Port: audio is transmitted as raw WAV bytes inside msgpack messages, which avoids the roughly 33% size overhead that base64 encoding adds in JSON-based solutions.
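To make the 33% figure concrete, the sketch below compares a raw PCM payload with its base64 encoding, and builds and parses a length-prefixed frame with Elixir binary pattern matching. It is a hypothetical illustration, not OmnivoiceEx's code: it assumes 16-bit mono PCM at the 24 kHz output rate and a 4-byte big-endian length prefix (the layout produced by Erlang's `{:packet, 4}` port option). The real bridge's framing may differ, and the msgpack encoding itself is left out to avoid a dependency.

```elixir
defmodule FramingSketch do
  # Hypothetical illustration, not OmnivoiceEx's implementation.

  # Assumed audio format: 24 kHz, 16-bit (2 bytes per sample), mono PCM.
  @sample_rate 24_000
  @bytes_per_sample 2

  @doc "Bytes of PCM data for `seconds` of audio (WAV header not included)."
  def pcm_bytes(seconds), do: round(seconds * @sample_rate * @bytes_per_sample)

  @doc "Size of `payload` after base64 encoding (4 output bytes per 3 input bytes)."
  def base64_size(payload) when is_binary(payload), do: byte_size(Base.encode64(payload))

  @doc "Prefixes `payload` with its length as a 32-bit big-endian integer."
  def frame(payload) when is_binary(payload) do
    <<byte_size(payload)::32-big, payload::binary>>
  end

  @doc "Splits one complete frame into its payload and any trailing bytes."
  def unframe(<<len::32-big, payload::binary-size(len), rest::binary>>),
    do: {:ok, payload, rest}

  # Not enough bytes yet for the length prefix or the full payload.
  def unframe(_incomplete), do: :incomplete
end
```

For 10 seconds of audio under these assumptions, `FramingSketch.pcm_bytes(10.0)` is 480,000 bytes, and base64-encoding that many bytes yields 640,000 bytes, one third larger. Binary frames skip that expansion entirely and add only the 4-byte prefix.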

License

Apache 2.0. See LICENSE.

Related