Introduction
Refine is an Elixir library for implementing fast faceted search.
Features
- Fast faceted search - Search performance comparable to established faceted search solutions.
- Works with existing queries – Faceted search can be applied to existing Ecto queries.
- Flexible configuration – Facet values and labels can be extracted from source data, including scalar values, arrays, and JSON objects.
- Backend for frontend - All data required to build facet form inputs in the frontend can be stored in the facets table.
- Simplicity - Provide faceted search as a simple add-on to your existing database, without requiring Docker, an external search service, or a paid faceted search provider.
Motivation
Faceted search is a common feature in data-heavy web applications such as ecommerce sites and catalogs. By offering multi-select filters such as category and price range, users can quickly narrow results to find the most relevant items.
From a technical perspective, calculating filter counts and visibility across all facets is computationally expensive, and a standard PostgreSQL setup is not well suited to this task.
Search engines such as Apache Lucene, Solr, and Elasticsearch address this problem using bitmaps. For each facet option, matching records are represented as sets of integers. Facets aggregation can then be performed efficiently using bitwise operations.
By applying this approach, we can also achieve fast faceted search in Elixir.
Implementation
Refine implements bitmaps using the PostgreSQL extension pg_roaringbitmap ➚. The bitmaps are stored in a sibling facets table, alongside facet option values and labels.
Changes to the source table are written to a sibling facets_deltas table (also automatically generated).
Data flow
Updates to the source table are tracked using a PostgreSQL trigger, which writes changes to the deltas table. Each entry in this table represents an update instruction for the facets table.
A merge function applies these update instructions to the facets table. The timing of this merge is left to the developer: it can be run periodically or immediately after changes are made to the source table.
Requirements
- Elixir
- PostgreSQL
-
PostgreSQL extension
pg_roaringbitmap
Installation and usage
Will be added once published to Hex.
Notes
Facet calculations
To collect all available facet options after a search query, all rows matching the query must be processed (not just the paginated ones). In addition, this aggregation must be repeated for each facet that has an active selection, excluding that facet's selected options from the base query. So both the table size and the number of facets contribute to the high cost of faceted search queries.
Solutions using PostgreSQL ts_stat ➚ don't scale well once the table exceeds 10,000 rows.
Bitmaps
A bitmap is an array of bits. A bit value of 1 at a position n indicates that an object with sequence value n is active.
A large bitmap with many zeroes can consume a lot of unnecessary space. To address memory issues, compressed Roaring bitmaps ➚ were created.