CBOR

Implementation of RFC 8949 CBOR (Concise Binary Object Representation) for Elixir.

This is a fork of excbor which modernizes the codebase, and makes decisions on handling data types that the original library had punted on.

Migrating from the previous version

This library is a fork of the no longer maintained excbor project.

For those migrating from previous versions of this library there are breaking changes that you should be aware of.

The module Cbor has been renamed to CBOR.

CBOR.decode will return a three item tuple of the form {:ok, decoded, rest}, instead of returning the decoded object. In the wild there are APIs that concat CBOR objects together. The rest variable includes any leftover information from the decoding operation in case you need to decode multiple objects.

Atoms will be encoded/decoded as strings, except for the special case of :__undefined__ which has no direct translation to elixir but has semantic meaning in CBOR.

Elixir/Erlang does not have a concept of infinity, negative infinity or NaN. In order to encode or decode these values we will return a struct of the form %CBOR.Tag{tag: :float, value: (:inf|:"-inf"|:nan)}

If you want to encode a raw binary value, you can use the CBOR.Tag struct with a tag of :bytes and the binary as the :value field.

Upgrading from 1.x to 2.0

Version 2.0 brings the library up to RFC 8949 (December 2020) and the related ecosystem RFCs (RFC 8943, the cbor-sets-spec). The wire format is still compatible with 7049, but encoders now produce spec-correct output where there's a clean alternative.

Encoder changes (wire output may differ)

Date encodes as tag 1004 (RFC 8943 full-date string), not tag 0. The old form was technically invalid CBOR per RFC 8949 §3.4.1.
MapSet encodes as tag 258 (cbor-sets-spec) wrapping a CBOR array, preserving set semantics on round-trip. Previously emitted as a bare array.
Floats encode in the shortest IEEE 754 form (binary16/32/64) that exactly preserves the value (RFC 8949 §4.1). Previously always 64-bit.
Map keys are emitted in bytewise lexicographic order of their encoded form (RFC 8949 §4.2.1) — deterministic by default.

The encoder follows §4.2.1 (map-key sort) and §4.1 (shortest float form) but does not implement the §4.2.2 rule that integer-valued floats must be encoded as integers — CBOR.encode(1.0) emits a binary16 float, not the integer 1. Callers that need byte-identical Core Deterministic output for COSE/CWT/content-addressed use cases should encode integer-valued floats manually as integers before calling CBOR.encode/1.

Decoder changes

Tag 1 (epoch-based date/time) auto-decodes to DateTime. Previously passed through as %CBOR.Tag{tag: 1, value: ...}. Pass decode_epoch_time: false to keep the raw integer/float.
New built-in decoders: tag 100 (Date), tag 1004 (Date), tag 258 (MapSet), tag 55799 (self-described CBOR — strip the marker).

Reading 1.x data

The decoder remains backward-compatible with everything 1.x emitted: tag 0 + bare-date strings still decode to Date, and untagged arrays still decode to List.

Maps with colliding keys

CBOR.encode/1 now raises ArgumentError if two distinct Elixir keys encode to the same wire bytes — :foo/"foo" (atoms encode as text strings), {1, 2}/[1, 2] (tuples encode as arrays), or any custom struct whose defimpl CBOR.Encoder produces a key already present. v1.x silently emitted invalid CBOR with both keys, violating RFC 8949 §5.6. The error message names both colliding Elixir keys.

If you hit this on upgrade, the fix is at the call site — pick one shape and stick to it. For atom-vs-string ambiguity, normalize to strings before encoding:

map |> Map.new(fn {k, v} -> {to_string(k), v} end) |> CBOR.encode()

Minimum runtime

Elixir 1.17 and Erlang/OTP 27 are now required.

Requirements

Elixir 1.17 or later
Erlang/OTP 27 or later

Installation

def deps do
  [
    {:cbor, "~> 2.0"}
  ]
end

Usage

This library follows the standard API for CBOR libraries by exposing two methods on the CBOR module CBOR.encode/1 and CBOR.decode/1.

Encoding

iex(1)> CBOR.encode([1, [2, 3]])
<<130, 1, 130, 2, 3>>

Decoding

iex(2)> CBOR.decode(<<130, 1, 130, 2, 3>>)
{:ok, [1, [2, 3]], ""}

Design Notes

Given that Elixir has more available data types than are supported in CBOR, decisions were made so that encoding complex data structures succeed without throwing errors. My thoughts are collected below so you can understand why encoding and decoding of a value does not necessarily return exactly the same value.

Atoms

The only atoms that will be directly encoded are true, falsenil and __undefined__. Every other atom will be converted to a string before being encoded. We surround undefined with double underscores so that you only encode an undefined value when you clearly intend to do so.

Keyword List, Range, Tuple

These structures are converted to Lists before being encoded — CBOR has no native representation for any of them. Round-tripping a tuple gives back a list, etc.

MapSet

MapSet round-trips via tag 258 (the cbor-sets-spec) wrapping a CBOR array. Set semantics are preserved across encode/decode.

Date

Date encodes as tag 1004 (RFC 8943 full-date string). The decoder also accepts the older tag-0-with-bare-date-string form for compatibility with data emitted by 1.x of this library.

Time

CBOR has no IANA-registered tag for time-of-day. Time continues to encode as tag 0 + ISO 8601 partial-time string — technically out-of-spec for tag 0 (which requires a full date-time per RFC 8949 §3.4.1) but preserves round-trip behaviour. Strict-mode decoding rejects this form.

DateTime

Tag 0 (RFC 3339 date-time) decodes to DateTime. For Z-form input, the result is in UTC. For non-Z input (e.g. "2024-01-01T00:00:00+05:00"), the decoded DateTime carries the original offset via its utc_offset and synthetic time_zone fields (e.g. "+05:00") — round-trip through encode produces the same wire bytes.

DateTime.compare/2, DateTime.to_unix/1, and arithmetic operations work equivalently across both forms (same instant, different presentation). Inter-zone conversion via DateTime.shift_zone/2 requires a tz database (e.g. tzdata); the synthetic time_zone field is an ISO 8601 offset string, not an IANA zone name.

NaiveDateTime

NaiveDateTime will be treated as if they are UTC.

Special Values

Elixir and erlang have no concept of infinity, negative infinity and NaN. If you want to encode those values, we have a special struct CBOR.Tag which you can use to represent those values.

%CBOR.Tag{tag: :float, value: :inf}

%CBOR.Tag{tag: :float, value: :"-inf"}

%CBOR.Tag{tag: :float, value: :nan}

CBOR.Tag is also useful if you want to extend CBOR for internal applications

Decoder Options

CBOR.decode/2 accepts a keyword list of options:

:tag_decoders — list of modules implementing CBOR.TagDecoder for tags this library doesn't natively handle. Built-in tag numbers (0, 1, 2, 3, 32, 100, 258, 1004, 55799) are sealed — registering a decoder for one raises ArgumentError at decode time.
:decode_epoch_time — true (default) auto-decodes tag 1 to DateTime. Set false to receive the raw integer/float wrapped in %CBOR.Tag{tag: 1, value: ...}.
:on_duplicate_key — :last_wins (default), :first_wins, or :error. The :error option returns {:error, {:duplicate_key, key}}.
:max_depth — positive integer, default 256. Rejects inputs whose longest root-to-leaf chain of CBOR data items exceeds this, returning {:error, {:max_depth_exceeded, limit}}. Each container, tag wrapper, and primitive on the chain counts as one level. Defends against hostile depth-bomb input (a few-byte payload that allocates super-linearly).
:strict — false (default). Set to true to reject not-well-formed CBOR (per RFC 8949 §3 and Appendix F) as typed errors. Catches reserved two-byte simple values, indefinite-length on major types 0/1/6, stray break codes, nested indefinite-length string chunks, tag 2/3 with non-byte-string content, tag 32 with non-URI-reference content, and tag 24 inner content that is not exactly one well-formed CBOR data item (RFC 8949 §3.4.5.1).

Options are validated up front: unknown option keys, wrong-type values, out-of-set values, and non-positive :max_depth raise ArgumentError naming the option and its expected shape, rather than surfacing as a misleading {:not_well_formed, _} decode error.

Example:

CBOR.decode(bytes, strict: true, on_duplicate_key: :error)

Rendering errors

CBOR.format_error/1 turns a decode_error() term into a human-readable string suitable for logs and operator surfaces. One clause per typed variant, with RFC 8949 section references inline so triagers can reach for the spec without re-parsing the reason atom.

case CBOR.decode(bytes, strict: true) do
  {:ok, value, ""} -> value
  {:error, reason} -> Logger.warning("CBOR decode failed: " <> CBOR.format_error(reason))
end

The returned strings are for human consumption — don't pattern-match on them. Wording may improve in a patch release.

Custom Encoding

If you want to encode something that is not supported out of the box you can implement the CBOR.Encoder protocol for the module. You only have to implement a single CBOR.Encoder.encode_into/2 function. An example for encoding a Money struct is given below.

defimpl CBOR.Encoder, for: Money do
  def encode_into(money, acc) do
    money |> Money.to_string() |> CBOR.Encoder.encode_into(acc)
  end
end

Custom Tag Decoding

For tags this library does not decode natively, implement CBOR.TagDecoder and pass the module via :tag_decoders. Example for binary UUIDs (tag 37):

defmodule MyApp.UUIDDecoder do
  @behaviour CBOR.TagDecoder

  @impl true
  def tag_number, do: 37

  @impl true
  def decode(%CBOR.Tag{tag: :bytes, value: bytes}) when byte_size(bytes) == 16 do
    {:ok, bytes}
  end

  def decode(_), do: :error
end

CBOR.decode(bytes, tag_decoders: [MyApp.UUIDDecoder])

Built-in tag numbers cannot be overridden. Two user modules registering for the same tag also raise ArgumentError.

Built-in: tag 24 (Encoded CBOR data item)

Tag 24 (RFC 8949 §3.4.5.1) wraps a byte string that itself contains CBOR. By default the library leaves it wrapped:

%CBOR.Tag{tag: 24, value: %CBOR.Tag{tag: :bytes, value: <inner_bytes>}}

Pass CBOR.TagDecoders.EncodedCBOR to recursively decode the inner data item:

CBOR.decode(bytes, tag_decoders: [CBOR.TagDecoders.EncodedCBOR])

The inner decode inherits the outer call's options (:max_depth, :strict, :on_duplicate_key, :tag_decoders, :decode_epoch_time), so nested tag-24 wrappers respect the outer depth budget. In strict mode, trailing bytes or malformed inner CBOR surface as {:error, {:tag_decoder_failed, 24, reason}}. Strict mode also validates the inner content even without this decoder registered, in which case errors surface as {:error, {:invalid_tag, 24, reason}} and the success-path result stays wrapped (strict validates without auto-unwrapping — opt into EncodedCBOR for the unwrap).

Why are built-in tags sealed?

Tags 0, 1, 2, 3, 32, 100, 258, 1004, and 55799 have RFC-defined semantics (RFC 8949 §3.4 + RFC 8943 + cbor-sets-spec). Allowing user code to override them would create interop hazards: the same wire bytes would decode differently in different applications. The library raises ArgumentError at registration to surface the collision at the source rather than in production.

If you need custom handling for one of these tag numbers — for example, consuming non-conforming wire data from a legacy peer — there are two paths:

Decode normally and post-process. The library falls back to %CBOR.Tag{tag: N, value: <content>} whenever content fails built-in validation (e.g. tag 1 with a string instead of an epoch number, tag 2 with non-byte-string content). Pattern-match on the wrap and apply the legacy interpretation in your consumer.
Use a non-built-in tag for your own data. CBOR's tag namespace is uint64; pick something outside the IANA-registered range and CBOR.TagDecoder gives you full control.

API stability

This library follows Semantic Versioning. Within a 2.x line:

Stable:

The shape of decode_error() — variants will not be removed or renamed. New variants may be added; pattern-matching consumers should include a default clause.
The exception classes CBOR.encode/1 raises (Protocol.UndefinedError, ArgumentError), as documented in its docstring.

Not stable:

Strings returned by CBOR.format_error/1 are for human consumption. Don't pattern-match on them — wording may improve in a patch release.
Specific reasons within the strict-mode "documented partial-coverage gaps". Inputs currently surfacing as :malformed_header / :malformed / :truncated (BEAM-class catch-alls translated by the public rescue) may reclassify to typed {:invalid_tag, _, _} or more specific {:not_well_formed, _} reasons in a future minor as strict-mode coverage grows. The fence tests in options_test.exs under "strict option (documented partial-coverage gaps)" pin the current boundary; treat the catch-alls as non-final classifications.
Encoder wire output may shift if RFC 8949 §4.2.2 (Core Deterministic integer-valued floats as integers) is later implemented. Decoded values still round-trip, but byte-for-byte equivalence with a prior release is not promised.

Documentation

Documentation can be generated with ExDoc and published on HexDocs. Once published, the docs can be found at https://hexdocs.pm/cbor.

Copyright and License

This work is free. You can redistribute it and/or modify it under the terms of the MIT License. See the LICENSE.md file for more details.