ExBags

Hex.pmHexDocs

Duplicate bag (multiset) implementation for Elixir with set operations.

## Installation

Add to your mix.exs:

  def deps do
    [
      {:ex_bags, "~> 0.2.0"}
    ]
  end

## Overview

ExBags implements duplicate bags that allow multiple values for the same key. Values are stored as lists, enabling tracking of multiple occurrences for each key.

Use cases:

## Core Bag Operations

### new/0

Creates a new empty bag.

  iex> ExBags.new()
  %{}

### put/3

Adds a value to the bag for the given key.

  iex> bag = ExBags.new()
  iex> bag = ExBags.put(bag, :a, 1)
  iex> bag = ExBags.put(bag, :a, 2)
  iex> ExBags.get(bag, :a)
  [1, 2]

### get/2

Gets all values for a given key from the bag.

  iex> bag = %{a: [1, 2, 3], b: ["hello"]}
  iex> ExBags.get(bag, :a)
  [1, 2, 3]

  iex> ExBags.get(bag, :c)
  []

### keys/1

Gets all keys from the bag.

  iex> bag = %{a: [1, 2], b: [3], c: []}
  iex> ExBags.keys(bag) |> Enum.sort()
  [:a, :b, :c]

### values/1

Gets all values from the bag, flattened into a single list.

  iex> bag = %{a: [1, 2], b: [3, 4]}
  iex> ExBags.values(bag) |> Enum.sort()
  [1, 2, 3, 4]

### update/3

Updates the values for a key in the bag using a function.

  iex> bag = %{a: [1, 2, 3]}
  iex> ExBags.update(bag, :a, fn values -> Enum.map(values, &(&1 * 2)) end)
  %{a: [2, 4, 6]}

## Set Operations

### intersect/2

Returns key-value pairs that exist in both bags. Values are returned as tuples containing values from each bag.

  iex> ExBags.intersect(%{a: [1, 2], b: [2, 3]}, %{b: [2, 4], c: [5]})
  %{b: {[2, 3], [2, 4]}}

  iex> ExBags.intersect(%{a: [1, 1, 2], b: [2, 2, 3]}, %{a: [1, 2], b: [2, 4]})
  %{a: {[1, 1, 2], [1, 2]}, b: {[2, 2, 3], [2, 4]}}

### difference/2

Returns key-value pairs that exist in the first bag but not in the second bag.

  iex> ExBags.difference(%{a: [1, 2], b: [2, 3]}, %{b: [2, 4], c: [5]})
  %{a: [1, 2], b: [3]}

  iex> ExBags.difference(%{a: [1, 1, 2], b: [2, 2, 3]}, %{a: [1], b: [2]})
  %{a: [1, 2], b: [2, 3]}

### symmetric_difference/2

Returns key-value pairs that exist in either bag but not in both.

  iex> ExBags.symmetric_difference(%{a: [1, 2], b: [2, 3]}, %{b: [2, 4], c: [5]})
  %{a: [1, 2], b: [3, 4], c: [5]}

  iex> ExBags.symmetric_difference(%{a: [1, 1, 2]}, %{a: [1, 2, 2]})
  %{a: [1, 2]}

### reconcile/2

Performs reconciliation similar to SQL FULL OUTER JOIN. Returns a tuple of three bags:

  1. Common: Key-value pairs that exist in both bags
  2. Only first: Key-value pairs that exist only in the first bag
  3. Only second: Key-value pairs that exist only in the second bag
  iex> ExBags.reconcile(%{a: [1, 2], b: [2, 3]}, %{b: [2, 4], c: [5]})
  {%{b: [2]}, %{a: [1, 2], b: [3]}, %{b: [4], c: [5]}}

  iex> ExBags.reconcile(%{a: [1]}, %{b: [2]})
  {%{}, %{a: [1]}, %{b: [2]}}

## Stream Functions

Memory-efficient stream versions for large datasets:

### intersect_stream/2, difference_stream/2, symmetric_difference_stream/2, reconcile_stream/2

These functions return streams instead of bags.

  iex> ExBags.intersect_stream(%{a: [1, 2], b: [2, 3]}, %{b: [2, 4], c: [5]}) |> Enum.to_list() |> Enum.sort()
  [{:b, {[2, 3], [2, 4]}}]

  iex> stream = ExBags.intersect_stream(large_bag1, large_bag2)
  iex> first_ten = stream |> Stream.take(10) |> Enum.to_list()

  iex> {common, only_first, only_second} = ExBags.reconcile_stream(bag1, bag2)
  iex> {Enum.to_list(common) |> Enum.sort(), Enum.to_list(only_first) |> Enum.sort(), Enum.to_list(only_second) |> Enum.sort()}

## Use Cases

### Inventory Management

  inventory = ExBags.new()
  inventory = ExBags.put(inventory, :apples, "red")
  inventory = ExBags.put(inventory, :apples, "green")
  inventory = ExBags.put(inventory, :bananas, "yellow")

  ExBags.get(inventory, :apples)
  # ["red", "green"]

  common_items = ExBags.intersect(warehouse_inventory, store_inventory)

### Data Synchronization

  local_data = %{users: ["alice", "bob"], sessions: ["session1", "session2"]}
  remote_data = %{users: ["alice", "charlie"], sessions: ["session2", "session3"]}

  {common, local_only, remote_only} = ExBags.reconcile(local_data, remote_data)
  # common: %{users: ["alice"], sessions: ["session2"]}
  # local_only: %{users: ["bob"], sessions: ["session1"]}
  # remote_only: %{users: ["charlie"], sessions: ["session3"]}

### Event Tracking

  events = ExBags.new()
  events = ExBags.put(events, :user1, "login")
  events = ExBags.put(events, :user1, "view_page")
  events = ExBags.put(events, :user2, "login")

  common_events = ExBags.intersect(user_events, admin_events)

## Performance

ExBags is optimized for duplicate bag operations with competitive performance characteristics:

### Intersect Performance Comparison

Dataset Size MapSet.intersection ExBags.intersect Map.intersect (keys)
100 items 475K ops/sec 203K ops/sec (2.3x slower) 47K ops/sec (10x slower)
1,000 items 697K ops/sec 13K ops/sec (55x slower) 2K ops/sec (315x slower)
10,000 items 702K ops/sec 1K ops/sec (691x slower) 135 ops/sec (5,198x slower)

### Memory Usage Comparison

Dataset Size MapSet ExBags Map (keys)
100 items 3.2 KB 7.0 KB (2.2x) 8.1 KB (2.6x)
1,000 items 2.3 KB 69 KB (31x) 97 KB (43x)
10,000 items 2.3 KB 430 KB (191x) 772 KB (342x)

### Stream vs Eager Performance

For large datasets (50,000 items), streams provide better performance:

### Performance Characteristics

## Testing

  mix test

Test coverage:

Property testing validates functions with various inputs including different value types, empty bags, and large datasets.

### Coverage Reports

Generate HTML coverage report:

  mix coveralls.html

View the report in cover/excoveralls.html.

### Benchmarking

Run performance benchmarks:

  # Run all benchmarks
  mix benchmark

  # Run specific benchmarks
  mix benchmark.intersect    # Compare Map, MapSet, ExBags intersect
  mix benchmark.stream       # Compare ExBags stream vs eager
  mix benchmark.all          # Run all benchmarks

# Update README with current results
mix run priv/scripts/update_readme_benchmarks.exs

# Copy moduledoc to README
mix docs.readme

Compare ExBags with Map and MapSet operations across different dataset sizes.

Available benchmark functions:

## Changelog

See CHANGELOG.md for a detailed list of changes and migration guides.

## License

MIT License - see LICENSE file for details.