xb5
xb5 is an Erlang library that implements
B-tree-based (order 5) sorted
containers. It aims to be a faster alternative to OTP's gb_trees and
gb_sets for most workloads.
It provides three modules:
xb5_sets— ordered set, drop-in replacement forgb_setsxb5_trees— ordered key-value store, drop-in replacement forgb_treesxb5_bag— ordered multiset with order-statistic operations (rank, nth element, percentile)
The APIs of xb5_sets and xb5_trees mirror their stdlib counterparts;
refer to the API documentation for details.
Getting started
Erlang
rebar.config:
{deps, [
% [...]
{xb5, "~> 0.1.0"}
]}your_application.app.src:
{applications, [
kernel,
stdlib,
% [...]
xb5
]}Elixir
xb5_elixir provides an idiomatic
wrapper of xb5.
Benchmarks
Benchmarks compare all three modules against gb_sets/gb_trees across 50+
operations and collection sizes up to 15,000 elements, measuring both runtime
and heap allocation. All tests used small integers (immediate values) as keys.
Each module is tested under three build scenarios: bulk
construction from a sorted input, sequential single-key insertion, and random
insertion order. Tests ran on OTP 28 with JIT enabled on two machines:
AMD Ryzen 7 5700G
and
Intel i5-3550.
The benchmark source is also
available.
xb5_sets vs gb_sets:
-
Mutations, membership tests, set-algebraic operations (difference, union,
intersection,
is_disjoint,is_subset): 1.2–2.2× as fast, with similar heap use - Filter, filtermap, map: ~1.5–2.5× as fast, with up to ~40% less heap
- Bulk construction: similar or faster speed, ~15–65% less heap
- Alternating take-smallest/insert-largest: ~2–4× as fast depending on build scenario
is_equalwith no key overlap: 40–126× as fast —gb_setsconverts both sets to lists for the comparison; with identical keys the results are roughly equal- Iteration: ~25% slower on the AMD machine; near-equal on the i5
xb5_trees vs gb_trees:
- Lookups: 1.6–1.9× as fast, with equal heap
- Mutations (insert, update, delete, take): 1.2–1.7× as fast
- map, keys, values: 1.9–2.5× as fast; map uses ~47% less heap
- Alternating take-smallest/insert-largest: 1.7–3.3× as fast
take_smallestandtake_largest: 14–27% slower in most build scenarios
xb5_bag: runtime profile broadly matches xb5_sets; heap use is up to
~40% lower for filter/map operations and ~15–62% lower for bulk construction,
but ~20–25% higher for mutations.
Technical Details
Key density
In a B-tree of order 5, each non-root node holds between 2 and 4 keys. The average number of keys per node — the key density — turns out to be an important factor for performance.
Early benchmarks showed that:
- When key density drops towards 2, the tree becomes taller with more nodes, increasing the cost of traversals.
- When it rises towards 4, nodes are nearly full, leaving little room to absorb insertions without splitting. This leads to more frequent node reallocations and rotations.
- The sweet spot hovers around 3 keys per node, balancing tree height against structural churn.
Much of xb5's design follows from keeping key density close to that
sweet spot.
Measured key densities for 1000-element sets:
| Build type | Key density |
|---|---|
| Bulk construction or sequential ops | ~3.0 |
| Random insertions | ~2.9 |
| After heavy random deletion | ~2.6 |
| Adversarial deletion (every 5th element) | ~2.4 |
Insertion: spilling before splitting
In a textbook B-tree, when a node overflows (5 keys in an order-5 tree), it splits into two nodes of 2 keys each, pushing the median up. Sequential insertions cause frequent splits, leaving most nodes half full — a key density of ~2.
xb5 adds a step before splitting: if the overflowing node's left sibling
has only 2 keys, the excess key is spilled into that sibling instead.
If the node has no left sibling (i.e., it is the leftmost child), the right
sibling is checked. A split only happens when no adjacent sibling can
absorb the extra key (i.e., it already has 3 or more keys). This keeps
nodes fuller and key density close to 3.
Deletion: reactive rebalancing
Textbook B-tree deletion uses a proactive strategy: before descending into a child, ensure it has more than the minimum number of keys by rotating from a sibling or merging. This guarantees a single top-down pass with no backtracking, which makes sense when nodes are mutable and pre-allocated.
In the BEAM, however, all traversed tuples are reallocated on the way
back up regardless — there is no mutation to avoid. xb5 therefore uses
a reactive approach: it descends freely, allowing a child node to
temporarily hold only 1 key after a deletion, and then checks on the way
back up whether a rotation or merge is needed. This was measured to
perform better, as the proactive key movements are often unnecessary and
amount to wasted reallocations.
Additionally, when rebalancing after a deletion, only the left sibling is checked for a possible rotation (unless the child is the leftmost, in which case the right sibling is checked). This leads to more frequent merges compared to the textbook approach of checking both siblings — which both helps key density and reduces allocations (a merge requires only 1 tuple allocation, while a rotation requires 3).
Trade-off: code verbosity
The node modules (xb5_sets_node, xb5_trees_node, xb5_bag_node) are
large and repetitive. This is a deliberate trade-off.
Nodes are not generic tuples accessed via element/setelement — each
node size has its own explicit tuple layout, packed and unpacked through
macros:
-define(LEAF2(E1, E2), {E1, E2}).
-define(LEAF3(E1, E2, E3), {E1, E2, E3}).
-define(LEAF4(E1, E2, E3, E4), {leaf4, E1, E2, E3, E4}).
-define(INTERNAL2(E1, E2, C1, C2, C3), {internal2, E1, E2, C1, C2, C3}).
-define(INTERNAL3(E1, E2, E3, C1, C2, C3, C4), {E1, E2, E3, C1, C2, C3, C4}).
-define(INTERNAL4(E1, E2, E3, E4, C1, C2, C3, C4, C5), {E1, E2, E3, E4, C1, C2, C3, C4, C5}).
This explicit pattern matching and tuple construction was necessary to
achieve meaningful performance benefits over gb_trees/gb_sets.
The code also relies heavily on inlining (-compile({inline, [...]})) to
keep things both readable and fast: each operation branches across multiple
node sizes, and dispatching the contents of a node through function calls
would mean passing many arguments. Inlining allows separation of concerns
without paying for the dispatch.
The cost is visible in compiled module sizes — the node modules are significantly larger than their stdlib counterparts:
| Module | .beam size |
|---|---|
gb_sets | 34K |
gb_trees | 22K |
xb5_sets_node | 96K |
xb5_trees_node | 136K |
xb5_bag_node | 128K |
License
MIT License
Copyright (c) 2025-2026 Guilherme Andrade
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.