Introduction

Refine is an Elixir library for implementing fast faceted search.

Features

Motivation

Faceted search is a common feature in data-heavy web applications such as ecommerce sites and catalogs. By offering multi-select filters such as category and price range, users can quickly narrow results to find the most relevant items.

From a technical perspective, calculating filter counts and visibility across all facets is computationally expensive, and a standard PostgreSQL setup is not well suited to this task.

Search engines such as Apache Lucene, Solr, and Elasticsearch address this problem using bitmaps. For each facet option, matching records are represented as sets of integers. Facets aggregation can then be performed efficiently using bitwise operations.

By applying this approach, we can also achieve fast faceted search in Elixir.

Implementation

Refine implements bitmaps using the PostgreSQL extension pg_roaringbitmap ➚. The bitmaps are stored in a sibling facets table, alongside facet option values and labels.

Changes to the source table are written to a sibling facets_deltas table (also automatically generated).

Data flow

Updates to the source table are tracked using a PostgreSQL trigger, which writes changes to the deltas table. Each entry in this table represents an update instruction for the facets table.

A merge function applies these update instructions to the facets table. The timing of this merge is left to the developer: it can be run periodically or immediately after changes are made to the source table.

Requirements

Installation and usage

Will be added once published to Hex.

Notes

Facet calculations

To collect all available facet options after a search query, all rows matching the query must be processed (not just the paginated ones). In addition, this aggregation must be repeated for each facet that has an active selection, excluding that facet's selected options from the base query. So both the table size and the number of facets contribute to the high cost of faceted search queries.

Solutions using PostgreSQL ts_stat ➚ don't scale well once the table exceeds 10,000 rows.

Bitmaps

A bitmap is an array of bits. A bit value of 1 at a position n indicates that an object with sequence value n is active. A large bitmap with many zeroes can consume a lot of unnecessary space. To address memory issues, compressed Roaring bitmaps ➚ were created.