Bloomy 🌸

High-performance Bloom Filter library for Elixir, powered by Nx

Bloomy provides probabilistic data structures for efficient set membership testing with minimal memory usage. Built on Nx tensors for blazing-fast performance with optional EXLA/GPU acceleration.


Features

- Four filter types: standard, counting (supports deletion), scalable (grows automatically), and learned (ML-enhanced)
- Batch operations: add_all, batch_member?, from_list
- Set operations: union, intersection, and similarity metrics
- Binary serialization and file persistence, with optional compression
- Optional EXLA backend for CPU/GPU acceleration

Installation

Add bloomy to your dependencies in mix.exs:

def deps do
  [
    {:bloomy, "~> 0.1.0"},
    {:nx, "~> 0.10.0"},
    {:exla, "~> 0.10.0"}  # Optional but recommended for performance
  ]
end

Quick Start

# Create a bloom filter for 10,000 items with 1% false positive rate
filter = Bloomy.new(10_000, false_positive_rate: 0.01)

# Add items
filter = filter
  |> Bloomy.add("user@example.com")
  |> Bloomy.add("alice@example.com")
  |> Bloomy.add("bob@example.com")

# Check membership
Bloomy.member?(filter, "user@example.com")  # => true
Bloomy.member?(filter, "other@example.com") # => false

# Get statistics
Bloomy.info(filter)
# => %{
#   type: :standard,
#   capacity: 10000,
#   items_count: 3,
#   fill_ratio: 0.0021,
#   false_positive_rate: 0.01,
#   ...
# }

Bloom Filter Types

Standard Bloom Filter

Classic space-efficient probabilistic set membership test.

filter = Bloomy.new(10_000)
filter = Bloomy.add(filter, "apple")
Bloomy.member?(filter, "apple")  # => true

Counting Bloom Filter

Supports deletion operations using counters instead of bits.

filter = Bloomy.new(10_000, type: :counting)

filter = Bloomy.add(filter, "item")
Bloomy.member?(filter, "item")  # => true

filter = Bloomy.remove(filter, "item")
Bloomy.member?(filter, "item")  # => false
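
Conceptually, a counting filter replaces each bit with a small counter: adding an item increments its k counters, removing decrements them, and membership requires all k counters to be nonzero. The toy sketch below (in Python purely for illustration; Bloomy's actual implementation uses Nx tensors and configurable counter widths, and this double-hashing scheme is an assumption) shows the idea:

```python
import hashlib

class ToyCountingBloom:
    """Toy counting bloom filter: counters instead of bits enable removal."""

    def __init__(self, m=1024, k=4):
        self.m, self.k, self.counters = m, k, [0] * m

    def _positions(self, item):
        # Derive k positions via double hashing: (h1 + i*h2) mod m
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for pos in self._positions(item):
            self.counters[pos] += 1

    def remove(self, item):
        # Only safe for items that were actually added; decrementing
        # counters for an absent item can corrupt other entries.
        for pos in self._positions(item):
            if self.counters[pos] > 0:
                self.counters[pos] -= 1

    def member(self, item):
        return all(self.counters[pos] > 0 for pos in self._positions(item))

cbf = ToyCountingBloom()
cbf.add("item")
print(cbf.member("item"))   # True
cbf.remove("item")
print(cbf.member("item"))   # False
```

Note the caveat in remove: deleting an item that was never added can silently decrement counters belonging to other items, which is why counting filters should only remove known members.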

Scalable Bloom Filter

Automatically grows to maintain target false positive rate.

filter = Bloomy.new(1_000, type: :scalable)

# Add millions of items - it will automatically scale!
filter = Enum.reduce(1..1_000_000, filter, fn i, f ->
  Bloomy.add(f, "item_#{i}")
end)

info = Bloomy.info(filter)
info.slices_count  # => Multiple slices created automatically
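
Under the hood, a scalable filter adds a new slice whenever the current one fills, and tightens each slice's error budget by the error tightening ratio r, so slice i targets p0 * r^i and the compound false positive rate stays bounded by the geometric sum p0 / (1 - r). This illustrative Python snippet (not Bloomy's API) demonstrates the bound:

```python
def compound_fpr(p0, r, slices):
    """Compound false positive rate across slices tightened by ratio r."""
    # P(no slice false-positives) is the product of per-slice (1 - p0 * r^i);
    # the compound FPR is its complement.
    prob_clean = 1.0
    for i in range(slices):
        prob_clean *= 1 - p0 * r**i
    return 1 - prob_clean

# With p0 = 0.01 and r = 0.5, the compound rate stays below p0 / (1 - r) = 0.02
print(compound_fpr(0.01, 0.5, 10))
```

This is why the error_tightening_ratio option (shown in the Configuration Options examples) matters: a smaller ratio gives a tighter overall bound at the cost of larger later slices.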

Learned Bloom Filter

ML-enhanced filter with lower false positive rates.

filter = Bloomy.new(10_000, type: :learned)

# Train with positive and negative examples
training_data = %{
  positive: ["valid_user_1", "valid_user_2", "valid_user_3"],
  negative: ["spam_1", "spam_2", "spam_3"]
}

filter = Bloomy.train(filter, training_data)
filter = Bloomy.add(filter, "valid_user_1")

Bloomy.member?(filter, "valid_user_1")  # => true (uses ML model + backup filter)

Performance: Using EXLA Backend

Why EXLA?

EXLA provides significant performance improvements (typically 2-20x faster) by JIT-compiling tensor operations through the XLA compiler, vectorizing work across CPU cores, and optionally offloading computation to the GPU.

Option 1: Set Default Backend (Recommended)

In your config/config.exs:

# Use EXLA with CPU
config :nx, default_backend: EXLA.Backend

# Or use EXLA with GPU (if CUDA is available)
config :nx, default_backend: {EXLA.Backend, client: :cuda}

# Or use EXLA with specific device
config :nx, default_backend: {EXLA.Backend, client: :cuda, device_id: 0}

Option 2: Per-Filter Backend

Specify backend when creating filters:

# Use EXLA backend for this filter
filter = Bloomy.new(100_000, backend: EXLA.Backend)

# Use BinaryBackend (default, no compilation)
filter = Bloomy.new(100_000, backend: Nx.BinaryBackend)

Option 3: Runtime Backend Selection

# Set backend at runtime
Nx.default_backend(EXLA.Backend)

# All filters created after this will use EXLA
filter = Bloomy.new(100_000)

GPU Acceleration

To use GPU acceleration (requires CUDA):

# In config/config.exs
config :nx, default_backend: {EXLA.Backend, client: :cuda}

# Or at runtime
Nx.default_backend({EXLA.Backend, client: :cuda})

# Create filter - will automatically use GPU
filter = Bloomy.new(1_000_000)

Backend Performance Comparison

Run the backend comparison benchmark:

mix run benchmarks/backend_comparison.exs

Typical performance gains fall within the 2-20x range cited above, depending on batch size, hardware, and whether a GPU is available.

Operations

Batch Operations

# Add multiple items efficiently
filter = Bloomy.add_all(filter, ["apple", "banana", "orange"])

# Batch membership testing
results = Bloomy.batch_member?(filter, ["apple", "grape", "banana"])
# => %{"apple" => true, "grape" => false, "banana" => true}

# Create from list
filter = Bloomy.from_list(["a", "b", "c", "d"])

Merge Operations

# Union (merge) filters
f1 = Bloomy.new(10_000) |> Bloomy.add("apple")
f2 = Bloomy.new(10_000) |> Bloomy.add("banana")

merged = Bloomy.union(f1, f2)
Bloomy.member?(merged, "apple")   # => true
Bloomy.member?(merged, "banana")  # => true

# Union multiple filters
filters = [filter1, filter2, filter3]
merged = Bloomy.union_all(filters)

# Intersection
intersected = Bloomy.intersect_all([filter1, filter2])

Similarity Metrics

# Calculate Jaccard similarity
similarity = Bloomy.jaccard_similarity(filter1, filter2)
# => 0.75 (75% similar)

# Calculate overlap coefficient
overlap = Bloomy.Operations.overlap_coefficient(filter1, filter2)
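
Because two filters with identical parameters hash the same item to the same positions, Jaccard similarity can be estimated directly from their bit arrays as popcount(A AND B) / popcount(A OR B). A language-neutral sketch in Python (Bloomy performs the equivalent on Nx tensors):

```python
def jaccard_estimate(bits_a, bits_b):
    """Estimate Jaccard similarity from two equal-length bit arrays."""
    inter = sum(a & b for a, b in zip(bits_a, bits_b))  # popcount(A AND B)
    union = sum(a | b for a, b in zip(bits_a, bits_b))  # popcount(A OR B)
    return inter / union if union else 1.0  # two empty filters are identical

a = [1, 1, 0, 1, 0, 0, 1, 0]
b = [1, 0, 0, 1, 0, 1, 1, 0]
print(jaccard_estimate(a, b))  # 3 shared set bits / 5 total set bits = 0.6
```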

Persistence

Binary Serialization

# Serialize to binary
binary = Bloomy.to_binary(filter)

# With compression (recommended for large filters)
binary = Bloomy.to_binary(filter, compress: true)

# Deserialize
{:ok, loaded_filter} = Bloomy.from_binary(binary)

File I/O

# Save to file
:ok = Bloomy.save(filter, "my_filter.bloom")

# Save with compression
:ok = Bloomy.save(filter, "my_filter.bloom", compress: true)

# Load from file
{:ok, filter} = Bloomy.load("my_filter.bloom")

Benchmarking

Run the included benchmarks to see performance on your system:

# Basic operations (add, member?, batch operations)
mix run benchmarks/basic_operations.exs

# Compare filter types (Standard, Counting, Scalable)
mix run benchmarks/filter_comparison.exs

# Merge operations (union, intersection, similarity)
mix run benchmarks/operations.exs

# Serialization and persistence
mix run benchmarks/serialization.exs

# Backend comparison (BinaryBackend vs EXLA)
mix run benchmarks/backend_comparison.exs

Configuration Options

Creating Filters

Bloomy.new(capacity, opts)

Common Options:

- :type - filter type (:standard, :counting, :scalable, or :learned); defaults to :standard
- :false_positive_rate - target false positive rate (e.g. 0.01)
- :backend - Nx backend to use (e.g. EXLA.Backend or Nx.BinaryBackend)

Counting Filter Options:

- :counter_width - bits per counter (e.g. 16)

Scalable Filter Options:

- :growth_factor - capacity multiplier for each new slice
- :error_tightening_ratio - per-slice false positive tightening ratio

Learned Filter Options:

Examples

# Standard with custom false positive rate
filter = Bloomy.new(10_000, false_positive_rate: 0.001)

# Counting with 16-bit counters and EXLA
filter = Bloomy.new(10_000,
  type: :counting,
  counter_width: 16,
  backend: EXLA.Backend
)

# Scalable with custom growth
filter = Bloomy.new(1_000,
  type: :scalable,
  growth_factor: 3,
  error_tightening_ratio: 0.5
)

Use Cases

Web Application - User Session Tracking

# Track active sessions
defmodule SessionTracker do
  use GenServer

  def start_link(_) do
    GenServer.start_link(__MODULE__, nil, name: __MODULE__)
  end

  def init(_) do
    # 1M sessions, 0.1% false positive rate, use EXLA for speed
    filter = Bloomy.new(1_000_000,
      false_positive_rate: 0.001,
      backend: EXLA.Backend
    )
    {:ok, filter}
  end

  def track_session(session_id) do
    GenServer.cast(__MODULE__, {:track, session_id})
  end

  def active?(session_id) do
    GenServer.call(__MODULE__, {:check, session_id})
  end

  def handle_cast({:track, session_id}, filter) do
    {:noreply, Bloomy.add(filter, session_id)}
  end

  def handle_call({:check, session_id}, _from, filter) do
    {:reply, Bloomy.member?(filter, session_id), filter}
  end
end

Distributed System - Seen Items Cache

# Track which items have been processed across nodes
defmodule DistributedCache do
  def merge_caches(node_filters) do
    # Merge the filters collected from all nodes into one
    merged = Bloomy.union_all(node_filters)

    # Distribute back to all nodes
    distribute_filter(merged)
  end

  def sync_filter(local_filter) do
    # Save filter to shared storage
    Bloomy.save(local_filter, "/shared/cache.bloom", compress: true)
  end

  def load_filter do
    case Bloomy.load("/shared/cache.bloom") do
      {:ok, filter} -> filter
      {:error, _} -> Bloomy.new(1_000_000)
    end
  end
end

Data Pipeline - Deduplication

defmodule Deduplicator do
  def process_stream(stream) do
    initial_filter = Bloomy.new(10_000_000, type: :scalable)

    Stream.transform(stream, initial_filter, fn item, filter ->
      if Bloomy.member?(filter, item.id) do
        # Skip duplicate
        {[], filter}
      else
        # New item, add to filter and pass through
        {[item], Bloomy.add(filter, item.id)}
      end
    end)
  end
end

Performance Tips

  1. Use EXLA Backend - Significant performance improvement (2-20x faster)

    config :nx, default_backend: EXLA.Backend
  2. Choose Appropriate Capacity - Overestimate slightly for better performance

    # If expecting ~10K items, use 15K capacity
    filter = Bloomy.new(15_000)
  3. Batch Operations - Use add_all and batch_member? for multiple items

    # Fast: single batch operation
    filter = Bloomy.add_all(filter, items)
    
    # Slower: many individual operations
    filter = Enum.reduce(items, filter, &Bloomy.add(&2, &1))
  4. Use Scalable for Unknown Sizes - Automatically adapts to data volume

    filter = Bloomy.new(1_000, type: :scalable)
  5. Serialize with Compression - Reduces storage/network overhead

    Bloomy.save(filter, "cache.bloom", compress: true)

How It Works

Bloom Filters

A Bloom filter is a space-efficient probabilistic data structure that tests whether an element is a member of a set:

- membership queries can return false positives, but never false negatives
- items can be added but not removed (the counting variant lifts this restriction)
- memory use is fixed up front and independent of item size
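
The mechanics fit in a few lines. This toy Python sketch (illustration only; the double-hashing scheme here is an assumption, not necessarily what Bloomy uses) sets k bit positions on add and reports membership only when all k are set:

```python
import hashlib

class ToyBloom:
    """Toy bloom filter: k bit positions per item, no removal."""

    def __init__(self, m=1024, k=7):
        self.m, self.k, self.bits = m, k, bytearray(m)  # one byte per "bit" for clarity

    def _positions(self, item):
        # Derive k positions via double hashing: (h1 + i*h2) mod m
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def member(self, item):
        # Every added item always passes (no false negatives); an absent item
        # passes only if all k of its positions happen to be set already.
        return all(self.bits[pos] for pos in self._positions(item))

bf = ToyBloom()
bf.add("apple")
print(bf.member("apple"))  # True
```

False positives arise exactly when an absent item's k positions were all set by other items, which is why the fill ratio reported by Bloomy.info drives the error rate.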

Mathematical Foundation

For a filter with m bits and k hash functions holding n items, the false positive probability is approximately:

p ≈ (1 - e^(-kn/m))^k

which is minimized by choosing m = -n * ln(p) / (ln 2)^2 and k = (m / n) * ln 2.

Where:

- m - number of bits in the filter
- n - number of inserted items
- k - number of hash functions
- p - target false positive rate
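
These formulas are easy to check numerically. The sketch below (Python, purely illustrative) computes the optimal parameters for the Quick Start example of 10,000 items at a 1% false positive rate:

```python
import math

def optimal_parameters(n, p):
    """Optimal bit count m and hash count k for n items at FP rate p."""
    m = math.ceil(-n * math.log(p) / math.log(2) ** 2)  # m = -n ln(p) / (ln 2)^2
    k = round(m / n * math.log(2))                      # k = (m / n) ln 2
    return m, k

m, k = optimal_parameters(10_000, 0.01)
print(m, k)  # => 95851 bits (~11.7 KB) and k = 7 hash functions
```

Note how cheap this is: roughly 9.6 bits per item regardless of how large the items themselves are.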

Implementation Details

API Reference

Main Functions

- Bloomy.new - create a filter
- Bloomy.add, Bloomy.add_all, Bloomy.from_list - insert items
- Bloomy.member?, Bloomy.batch_member? - test membership
- Bloomy.remove - delete items (counting filters only)
- Bloomy.info - inspect filter statistics

Operations

- Bloomy.union, Bloomy.union_all - merge filters
- Bloomy.intersect_all - intersect filters
- Bloomy.jaccard_similarity, Bloomy.Operations.overlap_coefficient - similarity metrics

Persistence

- Bloomy.to_binary, Bloomy.from_binary - binary serialization
- Bloomy.save, Bloomy.load - file I/O

ML (Learned Filters)

- Bloomy.train - train the learned filter's model

Testing

Run the test suite:

mix test

Run the integration tests:

mix run test_bloomy.exs

License

MIT License

Acknowledgments

Resources


Made with ❤️ and Elixir