glazer


Fast Erlang NIF JSON encoder/decoder backed by the glaze C++ library, with a hand-rolled recursive-descent decoder and direct term-to-JSON encoder that produce/consume native Erlang terms in a single pass.

Features

Installation

Erlang

Add glazer to your rebar.config deps:

{deps, [glazer]}.

Building the NIF requires a C++23 compiler (GCC 12+ or Clang 16+) and CMake; the glaze C++ library is fetched automatically at build time via CMake's FetchContent. The top-level Makefile wires the CMake build into rebar3 compile, so a plain rebar3 compile is all that is needed:

rebar3 compile

This builds priv/glazer.so and compiles the Erlang sources. If the build fails, check that your compiler meets the version requirement above.

Elixir

Add glazer to your mix.exs deps:

def deps do
  [
    {:glazer, "~> 0.1"}
  ]
end

Then fetch and compile as usual:

mix deps.get
mix compile

glazer is an Erlang application with a Rebar-based C++ NIF build; mix invokes the same top-level Makefile/rebar3 compile path described above, so the same C++23 compiler and CMake requirements apply. Once compiled, call it via the :glazer module from Elixir:

iex> :glazer.decode(~s({"a":1,"b":[true,null,3.5]}))
%{"a" => 1, "b" => [true, :null, 3.5]}
iex> :glazer.encode(%{"a" => 1, "b" => [true, :null, 3.5]})
"{\"a\":1,\"b\":[true,null,3.5]}"

Pass the :use_nil or {:null_term, nil} option (see JSON null below) to get idiomatic Elixir nil instead of the atom :null.

Usage

1> glazer:decode(<<"{\"a\":1,\"b\":[true,null,3.5]}">>).
#{<<"a">> => 1, <<"b">> => [true, null, 3.5]}
2> glazer:encode(#{<<"a">> => 1, <<"b">> => [true, null, 3.5]}).
<<"{\"a\":1,\"b\":[true,null,3.5]}">>
3> glazer:encode(#{a => 1}, [pretty]).
<<"{\n  \"a\": 1\n}">>
4> glazer:minify(<<" { \"a\" : 1 } ">>).
{ok, <<"{\"a\":1}">>}
5> glazer:prettify(<<"{\"a\":1}">>).
{ok, <<"{\n  \"a\": 1\n}">>}

Streaming

For input that arrives in chunks — e.g. reading a large document incrementally, or consuming newline-delimited JSON (NDJSON) from a socket or file — stream_decoder/0,1 provides a small stateful wrapper that buffers partial input and decodes each JSON value as soon as it's complete, without re-parsing bytes you've already seen:

1> D0 = glazer:stream_decoder(),
   {Vals1, D1} = glazer:stream_feed(D0, <<"{\"a\":1} {\"b\":">>),
   Vals1.
[#{<<"a">> => 1}]
2> {Vals2, D2} = glazer:stream_feed(D1, <<"2}">>),
   Vals2.
[#{<<"b">> => 2}]
3> glazer:stream_eof(D2).
{ok, []}

stream_feed/2 returns the list of values completed by the chunk just fed (possibly empty, possibly more than one if the chunk completes several values) along with the updated decoder state to pass to the next call. Once the input is exhausted, call stream_eof/1 to flush any trailing bare scalar (numbers, strings, etc. have no closing delimiter of their own) and surface an error if the buffer holds an incomplete value:

1> D0 = glazer:stream_decoder(),
   {[], D1} = glazer:stream_feed(D0, <<" 42">>),
   glazer:stream_eof(D1).
{ok, [42]}

stream_decoder/1 accepts the same options as decode/2 (e.g. {keys, atom}, use_nil) and applies them to every decoded value.
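As a rough illustration of the feed/flush cycle described above, the following sketch folds a list of chunks through a single decoder. The module name glazer_chunks and the function decode_chunks/2 are hypothetical; only stream_decoder/1, stream_feed/2, and stream_eof/1 come from glazer. The sketch assumes an empty option list is accepted, and since the error shape of stream_eof/1 is not shown in this README, any non-ok result is passed through untouched.

```erlang
-module(glazer_chunks).
-export([decode_chunks/2]).

%% Feed every chunk through one stream decoder, then flush at end-of-input.
%% Returns {ok, Values} with values in input order, or whatever non-ok
%% term stream_eof/1 reports (its exact shape is an assumption here).
decode_chunks(Chunks, Opts) ->
    D0 = glazer:stream_decoder(Opts),
    {RevVals, D} =
        lists:foldl(
          fun(Chunk, {Acc, DAcc}) ->
                  {Vals, DNext} = glazer:stream_feed(DAcc, Chunk),
                  %% Prepend this chunk's values in reverse so the
                  %% accumulator stays cheap to extend.
                  {lists:reverse(Vals, Acc), DNext}
          end,
          {[], D0},
          Chunks),
    case glazer:stream_eof(D) of
        {ok, Tail} -> {ok, lists:reverse(RevVals, Tail)};
        Other      -> Other
    end.
```

With the two chunks from the first example above, decode_chunks([<<"{\"a\":1} {\"b\":">>, <<"2}">>], []) should return {ok, [#{<<"a">> => 1}, #{<<"b">> => 2}]}.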

Efficiency

stream_feed/2 only scans for value boundaries incrementally — the scanner carries a small resumable cursor (scan_state()) that remembers how far it has already looked (nesting depth, whether it's inside a string, escape state, …), so each call to scan/2 resumes from where the previous one left off rather than re-walking the whole buffer from byte zero. Once a complete value's end offset is known, that slice is decoded exactly once via the same NIF-backed decoder used by decode/2 — there's no intermediate tokenization or tree representation, and no byte is ever scanned or decoded twice. The only buffering cost is concatenating newly-arrived chunks onto the not-yet-complete tail of the input.

This makes stream_feed/2 well suited to byte-at-a-time or small-chunk feeding (e.g. consuming a gen_tcp/gen_statem socket buffer as it fills) without the quadratic-rescan cost a naive "concatenate and retry full decode" loop would incur on large or slow-arriving documents.
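To make the idea of a resumable cursor concrete, here is a small pure-Erlang sketch of the technique. It is not glazer's scanner: the module name scan_sketch, the cursor layout, and the return shapes are assumptions chosen for illustration, and it only tracks objects, arrays, and strings (bare scalars such as numbers are skipped like any other byte). The caller always passes the whole accumulated buffer; bytes before the saved position are never examined again.

```erlang
-module(scan_sketch).
-export([new/0, scan/2]).

%% Cursor: bytes already examined, container nesting depth, and whether
%% the cursor sits inside a string or directly after a backslash.
new() ->
    #{pos => 0, depth => 0, in_string => false, escaped => false}.

%% Skip the bytes the cursor has already covered, then keep walking.
scan(Buf, #{pos := Pos} = S) ->
    <<_:Pos/binary, Rest/binary>> = Buf,
    walk(Rest, S).

%% Out of input: hand the cursor back so the next call can resume.
walk(<<>>, S) ->
    {incomplete, S};
%% The byte after a backslash is consumed without interpretation.
walk(<<_, T/binary>>, #{escaped := true, pos := P} = S) ->
    walk(T, S#{escaped := false, pos := P + 1});
walk(<<$\\, T/binary>>, #{in_string := true, pos := P} = S) ->
    walk(T, S#{escaped := true, pos := P + 1});
%% Closing quote: a top-level string ends here, a nested one continues.
walk(<<$\", T/binary>>, #{in_string := true, pos := P, depth := D} = S) ->
    case D of
        0 -> {complete, P + 1};
        _ -> walk(T, S#{in_string := false, pos := P + 1})
    end;
walk(<<_, T/binary>>, #{in_string := true, pos := P} = S) ->
    walk(T, S#{pos := P + 1});
walk(<<$\", T/binary>>, #{pos := P} = S) ->
    walk(T, S#{in_string := true, pos := P + 1});
walk(<<C, T/binary>>, #{pos := P, depth := D} = S) when C =:= ${; C =:= $[ ->
    walk(T, S#{depth := D + 1, pos := P + 1});
walk(<<C, T/binary>>, #{pos := P, depth := D} = S) when C =:= $}; C =:= $] ->
    case D of
        0 -> {error, {unbalanced, P}};
        1 -> {complete, P + 1};
        _ -> walk(T, S#{depth := D - 1, pos := P + 1})
    end;
walk(<<_, T/binary>>, #{pos := P} = S) ->
    walk(T, S#{pos := P + 1}).
```

Scanning <<"{\"a\":">> with a fresh cursor yields {incomplete, S} with pos 5 and depth 1; scanning the extended buffer <<"{\"a\":1}">> with that S examines only the last two bytes and returns {complete, 7}.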

Under the hood, stream_feed/2 is built on scan/1,2 — a low-level primitive that scans a buffer for the byte offset where the next JSON value ends (or reports that more input is needed) without doing a full decode. It's exposed directly for callers that want to implement their own framing/buffering strategy:

1> glazer:scan(<<"{\"a\":1} {\"b\":2}">>).
{complete, 7}
2> glazer:scan(<<"{\"a\":">>).
{incomplete, ScanState}
3> glazer:scan(<<"{\"a\":1}">>, ScanState).
{complete, 7}
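One way a caller might build its own framing on top of scan/1,2 is sketched below. The module name glazer_framing, the function take_value/2, and the use of undefined to mean "no scan state yet" are all hypothetical. The sketch assumes, based on the example above, that the offset in {complete, End} counts bytes from the start of the buffer passed in, and that scan/2 expects the whole accumulated buffer together with the state from the previous call. Any other return value (for example an error on malformed input) is not handled and would crash with a case_clause error.

```erlang
-module(glazer_framing).
-export([take_value/2]).

%% take_value(Buffer, ScanState | undefined) ->
%%     {value, Term, RestOfBuffer} | {more, NewScanState}
take_value(Buf, State) ->
    Result =
        case State of
            undefined -> glazer:scan(Buf);
            _         -> glazer:scan(Buf, State)
        end,
    case Result of
        {complete, End} ->
            %% Split off exactly the bytes of the finished value and
            %% decode them once; the remainder starts a fresh scan.
            <<Json:End/binary, Rest/binary>> = Buf,
            {value, glazer:decode(Json), Rest};
        {incomplete, NewState} ->
            %% Caller appends more input to Buf and calls again
            %% with NewState.
            {more, NewState}
    end.
```

After a {value, Term, Rest} result, the caller would continue with take_value(Rest, undefined), since the old scan state described the buffer before it was split.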

JSON null

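By default, JSON null decodes to (and null encodes from) the atom null. This can be overridden with the use_nil or {null_term, Atom} options listed under Options, which are accepted by both decode/2 and encode/2.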

Big integers

JSON numbers that don't fit into a 64-bit integer are decoded as Erlang big integers (and big integers are encoded back to their exact decimal JSON representation):

1> glazer:decode(<<"123456789012345678901234567890">>).
123456789012345678901234567890
2> glazer:encode(123456789012345678901234567890).
<<"123456789012345678901234567890">>

encode_bigint/1 and decode_bigint/1 expose the same conversion routines directly, independent of JSON parsing/encoding:

1> glazer:encode_bigint(123456789012345678901234567890).
{ok, <<"123456789012345678901234567890">>}
2> glazer:decode_bigint(<<"123456789012345678901234567890">>).
{ok, 123456789012345678901234567890}

Options

Decode options (decode/2)

| Option | Description |
| --- | --- |
| return_maps | Decode JSON objects as Erlang maps (default) |
| object_as_tuple | Decode JSON objects as {[{Key, Value}]} proplist tuples (jiffy-style) |
| use_nil | Use the atom nil for JSON null |
| {null_term, Atom} | Use Atom for JSON null |
| {keys, atom} | Decode object keys as atoms (via binary_to_atom/2-equivalent) |
| {keys, existing_atom} | Decode object keys as existing atoms, falling back to binaries for unknown atoms |
| {keys, binary} | Decode object keys as binaries (default) |

1> glazer:decode(<<"{\"a\":1}">>, [object_as_tuple]).
{[{<<"a">>, 1}]}
2> glazer:decode(<<"{\"a\":1}">>, [{keys, atom}]).
#{a => 1}
3> glazer:decode(<<"null">>, [use_nil]).
nil
4> glazer:decode(<<"null">>, [{null_term, undefined}]).
undefined

Encode options (encode/2)

| Option | Description |
| --- | --- |
| pretty | Pretty-print the JSON output with two-space indentation |
| uescape | Escape non-ASCII characters as \uXXXX sequences |
| force_utf8 | Sanitize invalid UTF-8 byte sequences before encoding |
| use_nil | Encode the atom nil as JSON null |
| {null_term, Atom} | Encode Atom as JSON null |

1> glazer:encode(#{a => 1}, [pretty]).
<<"{\n  \"a\": 1\n}">>
2> glazer:encode(<<"héllo"/utf8>>, [uescape]).
<<"\"h\\u00e9llo\"">>
3> glazer:encode(nil, [use_nil]).
<<"null">>

API

| Function | Description |
| --- | --- |
| decode/1, decode/2 | Decode a JSON binary or iolist to an Erlang term |
| encode/1, encode/2 | Encode an Erlang term to a JSON binary |
| minify/1 | Remove unnecessary whitespace from a JSON document |
| prettify/1 | Pretty-print a JSON document with two-space indentation |
| encode_bigint/1 | Encode an integer to its JSON decimal-string representation |
| decode_bigint/1 | Decode a JSON number string to an Erlang integer |
| scan/1, scan/2 | Scan a buffer for the end offset of the next complete JSON value |
| stream_decoder/0, stream_decoder/1 | Create an incremental-decode state for chunked input |
| stream_feed/2 | Feed a chunk to a stream decoder, returning completed values |
| stream_eof/1 | Flush a stream decoder at end-of-input |

See the module's EDoc comments (src/glazer.erl) for full type specs and details.

Benchmarks

A comparison benchmark against other JSON libraries (simdjsone, jiffy, jason, thoas, euneus, OTP's built-in json, and torque) is available via:

$ make bench
Running benchmarks...
(numbers in µs)
              twitter (616.7K)   twitter2 (758.0K)      openrtb (1.2K)         esad (1.3K)        small (0.1K)
              decode    encode    decode    encode    decode    encode    decode    encode    decode    encode
--------------------------------------------------------------------------------------------------------------
glazer        9014.0    3779.4   11771.0    6557.8      15.5      12.1      12.5       8.4       1.4       1.7
torque        9825.0    3883.6   13308.5    6498.1      17.7      14.0      13.8       7.8       2.9       1.5
simdjsone     9739.3    8356.5   18468.7   13936.1      24.8      21.6      17.9      22.0       2.6       5.2
jiffy        29797.7    4485.1   46869.1    8581.4      41.9      23.8      27.8      17.3       6.8       3.0
jason        20765.0   12294.6   37614.5   22681.9      58.5      29.8      32.7      19.0       6.0       3.6
thoas        21184.5   13146.7   38650.0   23221.9      61.6      28.9      38.2      19.6       6.4       4.2
euneus       20953.2   11202.8   29964.1   21124.0      47.7      20.7      26.7      13.7       7.0       3.7
json         20262.7   10722.5   28953.8   20213.8      43.1      25.8      32.3      16.8       5.0       2.1

(requires the bench/dev Mix dependencies — see mix.exs).

Performance

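glazer is roughly on par with torque (a Rust sonic-rs NIF) across the benchmarked workloads: neither library is consistently faster, and in the benchmark table above the gap on most file/operation pairs is within about 15%. The main exception is decoding the small document, where glazer is roughly twice as fast (1.4 µs vs 2.9 µs). Both sit ahead of the other contenders on most workloads (simdjsone, jiffy, and the pure Erlang/Elixir libraries jason, thoas, euneus, and OTP's built-in json), although simdjsone is competitive on twitter decoding.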

Where glazer has an edge over torque:

Performance optimizations

A few implementation techniques in c_src/glaze_nif.cpp account for most of the gap over the slower contenders:

Testing

Run the EUnit test suite (via rebar3 eunit) with:

make test

License

MIT License — see LICENSE for details.