EmbedEx


Vector embeddings service with multiple providers and similarity search


Vector embeddings service for the NSAI (North Shore AI) ecosystem. A unified interface for generating and working with text embeddings across multiple providers with built-in caching, batch processing, and similarity computations.

Features

  - Unified interface across multiple embedding providers (OpenAI today, more planned)
  - Built-in caching of embeddings with configurable TTL and size limit
  - Batch processing with configurable batch size, concurrency, and progress callbacks
  - Vector similarity operations: cosine, Euclidean distance, dot product, top-k search, pairwise matrices

Installation

Add embed_ex to your list of dependencies in mix.exs:

def deps do
  [
    {:embed_ex, "~> 0.1.0"}
  ]
end

Quick Start

# Single embedding
{:ok, embedding} = EmbedEx.embed("Hello world", provider: :openai)

# Batch embeddings
{:ok, embeddings} = EmbedEx.embed_batch([
  "First text",
  "Second text",
  "Third text"
], provider: :openai)

# Compute similarity
similarity = EmbedEx.cosine_similarity(embedding1, embedding2)
# => 0.87

# Find similar embeddings
{:ok, results} = EmbedEx.find_similar(
  query_embedding,
  corpus_embeddings,
  top_k: 5
)
# => [{0.95, 0}, {0.87, 2}, {0.82, 5}, {0.79, 1}, {0.75, 8}]

Configuration

# config/config.exs

config :embed_ex,
  default_provider: :openai

config :embed_ex, :cache,
  enabled: true,
  ttl: :timer.hours(24),
  limit: 10_000

# Provider configuration
config :embed_ex, :openai,
  api_key: System.get_env("OPENAI_API_KEY"),
  default_model: "text-embedding-3-small"

Environment variables:

  - OPENAI_API_KEY - API key for the OpenAI provider

Usage

Single Embeddings

# Using default provider (OpenAI)
{:ok, embedding} = EmbedEx.embed("Hello world")

# Specifying provider and model
{:ok, embedding} = EmbedEx.embed(
  "Hello world",
  provider: :openai,
  model: "text-embedding-3-large"
)

# Disable caching for this request
{:ok, embedding} = EmbedEx.embed("Hello world", use_cache: false)

Batch Embeddings

texts = ["Text 1", "Text 2", "Text 3", ...]

# Basic batch embedding
{:ok, embeddings} = EmbedEx.embed_batch(texts, provider: :openai)

# With progress tracking
{:ok, embeddings} = EmbedEx.embed_batch(
  texts,
  provider: :openai,
  on_progress: fn completed, total ->
    IO.puts("Progress: #{completed}/#{total}")
  end
)

# Control concurrency and batch size
{:ok, embeddings} = EmbedEx.embed_batch(
  texts,
  provider: :openai,
  batch_size: 100,
  concurrency: 10
)
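Options like batch_size and concurrency are typically combined by chunking the input and embedding the chunks in parallel. A minimal self-contained sketch of that pattern (BatchSketch and embed_chunk are illustrative stand-ins, not EmbedEx's actual implementation):

```elixir
defmodule BatchSketch do
  # Hypothetical stand-in for a real provider request.
  def embed_chunk(texts), do: Enum.map(texts, fn _ -> [0.0, 0.0, 0.0] end)

  # Chunk the input, embed chunks concurrently, and flatten the results
  # back into input order (Task.async_stream preserves ordering by default).
  def embed_batch(texts, batch_size, concurrency) do
    texts
    |> Enum.chunk_every(batch_size)
    |> Task.async_stream(&embed_chunk/1, max_concurrency: concurrency)
    |> Enum.flat_map(fn {:ok, vectors} -> vectors end)
  end
end

BatchSketch.embed_batch(["a", "b", "c"], 2, 2)
# => [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
```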

Similarity Computations

# Cosine similarity (returns -1 to 1, where 1 is identical)
similarity = EmbedEx.cosine_similarity(embedding1, embedding2)

# Euclidean distance (lower is more similar)
distance = EmbedEx.euclidean_distance(embedding1, embedding2)

# Dot product
dot = EmbedEx.dot_product(embedding1, embedding2)

# Find top-k most similar
{:ok, results} = EmbedEx.find_similar(
  query_embedding,
  corpus_embeddings,
  top_k: 5,
  metric: :cosine
)

# With similarity threshold
{:ok, results} = EmbedEx.find_similar(
  query_embedding,
  corpus_embeddings,
  top_k: 10,
  threshold: 0.8
)

# Pairwise similarity matrix
matrix = EmbedEx.pairwise_similarity([emb1, emb2, emb3], metric: :cosine)
# Returns Nx.Tensor of shape {3, 3}
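For reference, cosine similarity is the dot product of the two vectors divided by the product of their magnitudes. A self-contained sketch in plain Elixir (EmbedEx computes this with Nx; CosineSketch here is purely illustrative):

```elixir
defmodule CosineSketch do
  # cosine(a, b) = dot(a, b) / (||a|| * ||b||)
  def cosine(a, b) do
    dot = Enum.zip(a, b) |> Enum.map(fn {x, y} -> x * y end) |> Enum.sum()
    norm = fn v -> :math.sqrt(Enum.sum(Enum.map(v, &(&1 * &1)))) end
    dot / (norm.(a) * norm.(b))
  end
end

CosineSketch.cosine([1.0, 0.0], [1.0, 0.0])  # => 1.0 (same direction)
CosineSketch.cosine([1.0, 0.0], [0.0, 1.0])  # => 0.0 (orthogonal)
```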

Cache Management

# Clear all cached embeddings
{:ok, count} = EmbedEx.clear_cache()

# Get cache statistics
{:ok, stats} = EmbedEx.cache_stats()
# => %{hits: 150, misses: 50, ...}

Providers

OpenAI

Supports OpenAI's embedding models via their API.

Supported Models:

  - text-embedding-3-small (1536 dimensions, the default)
  - text-embedding-3-large (3072 dimensions)

Configuration:

{:ok, embedding} = EmbedEx.embed(
  "Hello world",
  provider: :openai,
  model: "text-embedding-3-large",
  api_key: "sk-..." # Optional, defaults to OPENAI_API_KEY env var
)

Batch Limits:

  - The OpenAI embeddings API accepts up to 2048 inputs per request; use batch_size to stay under this limit.

Future Providers

Planned support for:

Architecture

embed_ex/
├── lib/
│   └── embed_ex/
│       ├── embedding.ex        # Embedding struct and utilities
│       ├── provider.ex         # Provider behaviour
│       ├── providers/
│       │   └── openai.ex       # OpenAI implementation
│       ├── cache.ex            # Caching layer (Cachex)
│       ├── similarity.ex       # Vector similarity (Nx)
│       ├── batch.ex            # Batch processing
│       └── application.ex      # OTP application
└── test/
    └── embed_ex/
        ├── embedding_test.exs
        ├── similarity_test.exs
        └── cache_test.exs

Key Components

EmbedEx.Embedding - Struct representing an embedding with metadata:

%EmbedEx.Embedding{
  text: "original text",
  vector: [0.1, 0.2, ...],
  model: "text-embedding-3-small",
  provider: :openai,
  dimensions: 1536,
  metadata: %{}
}

EmbedEx.Provider - Behaviour for implementing providers; each provider module implements callbacks for single and batch embedding requests.

EmbedEx.Cache - Automatic caching layer backed by Cachex, with configurable TTL and entry limit.

EmbedEx.Similarity - Vector operations built on Nx: cosine similarity, Euclidean distance, dot product, top-k search, and pairwise matrices.

EmbedEx.Batch - Parallel batch processing with configurable batch size, concurrency, and progress reporting.
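The provider behaviour might look roughly like the following; the exact callback names and signatures are assumptions for illustration, not confirmed from the source:

```elixir
defmodule EmbedEx.Provider do
  # Assumed shape of the behaviour; the actual callbacks may differ.
  @callback embed(text :: String.t(), opts :: keyword()) ::
              {:ok, EmbedEx.Embedding.t()} | {:error, term()}

  @callback embed_batch(texts :: [String.t()], opts :: keyword()) ::
              {:ok, [EmbedEx.Embedding.t()]} | {:error, term()}
end
```

A new provider module would declare `@behaviour EmbedEx.Provider` and implement these callbacks.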

Integration with NSAI Ecosystem

EmbedEx is designed to integrate seamlessly with other NSAI projects:

With CNS (Critic-Network Synthesis)

# Embed claims for similarity-based retrieval
{:ok, claim_embeddings} = EmbedEx.embed_batch(
  claims,
  provider: :openai
)

# Find similar claims for antagonist
{:ok, similar} = EmbedEx.find_similar(
  query_embedding,
  claim_embeddings,
  top_k: 5,
  threshold: 0.8
)

With Crucible Framework

# Embed experimental results
{:ok, embeddings} = EmbedEx.embed_batch(
  experiment_descriptions,
  provider: :openai
)

# Cluster similar experiments
matrix = EmbedEx.pairwise_similarity(embeddings)

With LlmGuard

# Embed prompts for semantic similarity detection
{:ok, prompt_embedding} = EmbedEx.embed(prompt, provider: :openai)

# Compare against known attack patterns
{:ok, similar_attacks} = EmbedEx.find_similar(
  prompt_embedding,
  attack_pattern_embeddings,
  top_k: 1,
  threshold: 0.9
)

Performance

Benchmarks (OpenAI provider)

Caching Impact

Cache hit rates typically exceed 80% in production workloads, substantially reducing API calls, latency, and provider cost for repeated texts.

Testing

# Run all tests
mix test

# Run with coverage
mix test --cover

# Run specific test file
mix test test/embed_ex/similarity_test.exs

All tests pass:

Finished in 0.1 seconds (0.1s async, 0.02s sync)
38 tests, 0 failures

Development

# Get dependencies
mix deps.get

# Compile
mix compile

# Format code
mix format

# Generate documentation
mix docs

# Run dialyzer (static analysis)
mix dialyzer

Roadmap

v0.2.0

v0.3.0

v0.4.0

License

MIT

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Write tests for your changes
  4. Ensure all tests pass (mix test)
  5. Commit your changes (git commit -m 'Add amazing feature')
  6. Push to the branch (git push origin feature/amazing-feature)
  7. Open a Pull Request

Support

For issues, questions, or contributions, please open an issue or pull request on the repository.

Acknowledgments

Part of the North Shore AI monorepo - an Elixir-based ML reliability research ecosystem.

Related projects:

  - CNS (Critic-Network Synthesis)
  - Crucible Framework
  - LlmGuard