yamleam

A pure-Gleam YAML parser.

The library aims to be a functionally correct implementation of the parts of the spec widely used in, broadly speaking, commercial, service, and operations work. For perspective, we aimed for version 0.1 to cover 95%+ of the YAML files people actually write, including Kubernetes manifests without anchors, GitHub Actions workflows, Helm values files, docker-compose files, ruleset definitions, and ordinary config files.

We were surprised and impressed by the deeper layers of information architecture theory the spec addresses. We believe that depth mostly serves specialized domains, but we urge you to check the coverage matrix against your own use case.

Known unsupported features are intended to fail explicitly instead of producing wrong output.

Why yamleam?

yamleam exists to provide a pure-Gleam implementation that:

Ships as a regular hex package with no FFI or C dependencies
Returns typed YamlNode values that you decode with gleam/dynamic/decode-style decoders
Mirrors the gleam_json API so existing Gleam users can adopt it with muscle memory
Prioritizes correctness over feature completeness — a small supported subset done right, not a large subset done approximately
Fails loudly and helpfully when you hit unsupported features, never silently

Coverage matrix

yamleam covers part of YAML 1.2 and is not planned to reach parity with the full specification. As mentioned above, we believe the covered surface serves the large majority of the format's users. This matrix is the source of truth for what is and isn't supported in the current version.

Supported (v1.0.0)

Feature	Status
Comments (`# ...`)	✓
Block-style mappings (`key: value`)	✓
Block-style sequences (`- item`)	✓
Nested structures (arbitrary depth via indentation)	✓
Plain scalars (unquoted)	✓
Single-quoted strings	✓
Double-quoted strings with basic escapes (`\n`, `\t`, `\"`, `\\`, `\/`, `\r`)	✓
Literal block scalars (`|`, `|-`, `|+`)	✓
Folded block scalars (`>`, `>-`, `>+`)	✓
Multi-document streams (`---`, `...`)	✓
Flow-style sequences (`[1, 2, 3]`)	✓
Flow-style mappings (`{a: 1, b: 2}`)	✓
Anchors and aliases (`&name`, `*name`)	✓
Merge keys (`<<: *base`)	✓
Null (`null`, `~`, or empty value)	✓
Booleans (`true`, `false`)	✓
Integers (decimal)	✓
Floats (decimal with optional exponent)	✓
Strings (fallback for unresolved plain scalars)	✓
Duplicate mapping key rejection (per YAML 1.2 spec)	✓

Not yet supported (returns explicit errors)

Feature	Status	Planned
Multi-line flow collections (flow that spans source lines)	✗ parse-time error (single-line only)	planned
Explicit indent indicators (`|2`, `|+1`)	✗ parse-time error (auto-detect only)	planned
Tags (`!!int`, `!Custom`)	✗ `Unsupported`	planned
Complex keys (map as key)	✗ parse-time error	planned
YAML 1.1 boolean variants (`yes`/`no`/`on`/`off`)	✗ not planned	use `true`/`false`
YAML 1.1 octal (`0777`)	✗ not planned	use `0o777`

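Some features are marked "not planned" because they come from YAML 1.1, whose implicit typing causes the "Norway problem" (the country code `NO` resolving to boolean `false`), ambiguous implicit octals, and loose boolean literals. YAML 1.2 fixes these, and yamleam follows 1.2.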
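To make the YAML 1.2 behaviour concrete, here is a minimal sketch of plain-scalar resolution restricted to the literals listed in the supported matrix. `Scalar` and `resolve_plain` are hypothetical names for illustration only; they are not yamleam's internals or its YamlNode type.

```gleam
import gleam/float
import gleam/int

/// Hypothetical result type for this sketch (not yamleam's YamlNode).
pub type Scalar {
  SNull
  SBool(Bool)
  SInt(Int)
  SFloat(Float)
  SString(String)
}

/// Resolve an unquoted plain scalar using only the literals in the coverage
/// matrix. YAML 1.1 spellings such as `yes`, `no`, `on`, `off` fall through
/// to strings, so the "Norway problem" cannot occur.
/// Note: float.parse may reject some exponent spellings YAML allows (such as
/// `1e3` on the Erlang target); a real resolver needs its own float grammar.
pub fn resolve_plain(text: String) -> Scalar {
  case text {
    "" | "~" | "null" -> SNull
    "true" -> SBool(True)
    "false" -> SBool(False)
    _ ->
      case int.parse(text) {
        Ok(n) -> SInt(n)
        Error(Nil) ->
          case float.parse(text) {
            Ok(x) -> SFloat(x)
            Error(Nil) -> SString(text)
          }
      }
  }
}
```

Under these rules `country: no` yields the string `"no"` rather than `false`.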

Installation


gleam add yamleam

Quick example

import gleam/dynamic/decode
import yamleam

pub type Config {
  Config(name: String, port: Int, debug: Bool)
}

pub fn load_config() -> Result(Config, yamleam.YamlError) {
  let source = "
name: my-service
port: 8080
debug: true
"

  let decoder = {
    use name <- decode.field("name", decode.string)
    use port <- decode.field("port", decode.int)
    use debug <- decode.field("debug", decode.bool)
    decode.success(Config(name:, port:, debug:))
  }

  yamleam.parse(source, decoder)
}

Working with the raw tree

If you need the typed node tree directly without a decoder:

import yamleam

pub fn main() {
  let assert Ok(tree) = yamleam.parse_raw("
title: Example
items:
  - alpha
  - beta
")
  // tree is a YamlNode, here a YamlMap:
  //   YamlMap([
  //     #("title", YamlString("Example")),
  //     #("items", YamlList([YamlString("alpha"), YamlString("beta")])),
  //   ])
}
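Once you have the raw tree, navigating it is ordinary pattern matching. The sketch below defines a hypothetical `Node` type that mirrors only the three constructors shown in the comment in the example above (yamleam's real YamlNode presumably has further variants for the other scalar types), plus a hypothetical `lookup` helper that follows a path of mapping keys.

```gleam
import gleam/list

/// Hypothetical mirror of the node shapes shown in the parse_raw example.
pub type Node {
  YamlMap(List(#(String, Node)))
  YamlList(List(Node))
  YamlString(String)
}

/// Follow a path of mapping keys from the root, e.g. ["items"].
/// Returns Error(Nil) if a key is missing or a non-map is indexed.
pub fn lookup(node: Node, path: List(String)) -> Result(Node, Nil) {
  case path, node {
    [], _ -> Ok(node)
    [key, ..rest], YamlMap(entries) ->
      case list.key_find(entries, key) {
        Ok(child) -> lookup(child, rest)
        Error(Nil) -> Error(Nil)
      }
    _, _ -> Error(Nil)
  }
}
```

Mappings are kept as ordered key/value lists here, so `list.key_find` returns the first match; that is safe because yamleam rejects duplicate keys.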

Multi-document streams

Parse a stream containing several documents separated by ---:

import yamleam

pub fn main() {
  let source = "
---
kind: ConfigMap
name: app-config
---
kind: Service
name: app-svc
---
kind: Deployment
name: app-deploy
"
  let assert Ok(documents) = yamleam.parse_documents_raw(source)
  // documents is List(YamlNode) — one entry per document.
}

parse_documents(source, decoder) runs a decoder against each document and returns List(a).

parse_raw and parse are single-document APIs. If the input contains more than one document, they return a ParseError instead of silently discarding the remainder of the stream.
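For intuition about how a stream divides into documents, here is a deliberately simplified, hypothetical splitter. It is not how yamleam's parse_documents_raw is implemented; it only shows that the leading `---` in the example above does not produce an empty first document, so the stream yields three documents.

```gleam
import gleam/list
import gleam/string

/// Hypothetical, simplified splitter: a line that is exactly `---` or `...`
/// closes the current document, and whitespace-only chunks are dropped.
/// Real YAML also allows content on the `---` line itself and treats an
/// explicitly empty document as null; this sketch ignores both cases.
pub fn split_documents(stream: String) -> List(String) {
  let #(docs, current) =
    stream
    |> string.split("\n")
    |> list.fold(#([], []), fn(acc, line) {
      let #(docs, cur) = acc
      case line {
        "---" | "..." -> #(flush(docs, cur), [])
        _ -> #(docs, [line, ..cur])
      }
    })
  flush(docs, current)
  |> list.reverse
}

// Join the collected lines (stored newest-first) and keep the chunk only
// if it has non-whitespace content.
fn flush(docs: List(String), cur: List(String)) -> List(String) {
  let body = cur |> list.reverse |> string.join("\n")
  case string.trim(body) {
    "" -> docs
    _ -> [body, ..docs]
  }
}
```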

Anchors, aliases, and merge keys

The classic CI / DRY pattern:

defaults: &defaults
  retries: 3
  timeout: 60
  notify: ops@example.com

job_a:
  <<: *defaults
  command: build

job_b:
  <<: *defaults
  command: test
  timeout: 300       # overrides the merged value in place

Parses cleanly: job_a ends up with the three entries from defaults plus its own command. job_b overrides timeout while keeping the other defaults. Local explicit keys always win over merged keys with the same name.
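The precedence rule can be sketched with gleam/dict, whose `dict.merge` keeps the second dict's value when a key appears in both. `apply_merge` is a hypothetical helper, and values are plain strings for brevity; yamleam merges YamlNode values and, unlike a Dict, preserves key order.

```gleam
import gleam/dict.{type Dict}

/// Hypothetical helper illustrating merge-key precedence: entries pulled in
/// through `<<: *defaults` form the base, and the mapping's own explicit
/// keys overwrite them.
pub fn apply_merge(
  merged: Dict(String, String),
  local: Dict(String, String),
) -> Dict(String, String) {
  // dict.merge keeps the value from `local` when a key exists in both.
  dict.merge(merged, local)
}
```

Applied to job_b, `timeout` comes out as `"300"` while `retries` and `notify` keep their default values.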

Handling unsupported features

When yamleam encounters a feature it doesn't yet support, it returns a clear error rather than parsing incorrectly:

import yamleam

pub fn main() {
  // Tags ('!!int', '!Custom') are not yet supported.
  let source = "value: !!str 42"
  case yamleam.parse_raw(source) {
    Ok(_) -> Nil
    Error(yamleam.Unsupported(feature: f, line: _, column: _)) -> {
      // f = "tags ('!type') — planned for v0.6"
      let _ = f
      Nil
    }
    Error(_) -> {
      // parse errors, etc.
      Nil
    }
  }
}

Design philosophy

1. Deliberate partial coverage

The YAML 1.2 specification is large. In our experience, YAML libraries tend either to aim for full spec compliance (a substantial effort that often leaves bugs in rare edge cases) or to implement a subset without documenting which one.

yamleam picks an explicit, documented subset and returns clear errors outside it. We believe that subset serves a large share of real-world use cases.

2. Decoder API mirrors `gleam_json`

Gleam users already know how to decode dynamic JSON, and reusing that mental model for YAML fits the language's philosophy. yamleam's parse(source, decoder) takes a standard gleam/dynamic/decode decoder, the same kind gleam_json.parse takes.

3. Readability before performance

YAML parsing is almost never the performance bottleneck in any real system. yamleam optimizes for clarity of implementation over raw speed. Once the parser is correct and covers a meaningful subset, performance work can happen as a separate effort.

4. Pure Gleam

yamleam is written entirely in Gleam: no Erlang FFI, no C NIFs, no external tools. gleam build is enough. The per-document anchor table is a plain dict.Dict(String, YamlNode) threaded explicitly through the parser as a state parameter. Its lifetime is simple: a fresh table at the start of each document, extended as anchors are encountered, and never escaping the parser.
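Here is a minimal sketch of that state-threading pattern, under simplifying assumptions: the input is a hypothetical flat list of `Item`s instead of YAML text, and anchors hold strings instead of YamlNode values. `Item`, `resolve_document`, and `step` are illustrative names, not yamleam's API.

```gleam
import gleam/dict.{type Dict}
import gleam/list
import gleam/result

/// Hypothetical simplified input: each item either defines an anchor
/// (`&name value`) or refers to one (`*name`).
pub type Item {
  Anchored(name: String, value: String)
  Alias(name: String)
}

/// Resolve one document. The anchor table starts empty here, is threaded
/// through every step, and is discarded when the document is done.
pub fn resolve_document(items: List(Item)) -> Result(List(String), String) {
  list.try_fold(items, #([], dict.new()), step)
  |> result.map(fn(state) { list.reverse(state.0) })
}

// One parser step: takes the current state, returns the next state or an error.
fn step(
  state: #(List(String), Dict(String, String)),
  item: Item,
) -> Result(#(List(String), Dict(String, String)), String) {
  let #(values, anchors) = state
  case item {
    Anchored(name, value) ->
      Ok(#([value, ..values], dict.insert(anchors, name, value)))
    Alias(name) ->
      case dict.get(anchors, name) {
        Ok(value) -> Ok(#([value, ..values], anchors))
        Error(Nil) -> Error("undefined alias: *" <> name)
      }
  }
}
```

Because each call to `resolve_document` starts from `dict.new()`, an alias cannot see an anchor defined in a previous document, and there is no mutable global to reset.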

5. Tested against realistic shapes

The test suite covers the YAML shapes that appear in real-world configs, manifests, and rulesets — block mappings with embedded scripts, sequences of inline mappings, multi-document streams, anchors with merge keys in the CI/template pattern, flow collections inside block context, and so on. As yamleam matures, we'll continue adding fixtures from real production sources to catch the edge cases that synthetic tests miss.

Untrusted input

yamleam is designed for parsing YAML you control or that comes from a trusted source. It is not hardened for parsing arbitrary documents of unverified provenance.

Specifically, the parser does not enforce limits on:

Document size — parse_raw accepts any string and walks it eagerly. A very large document will consume memory and CPU proportional to its size.
Nesting depth — block structure is parsed by recursive descent without a depth budget. A pathologically deeply-nested document can cause stack growth or long parse times.

If you need to parse YAML received from untrusted sources, enforce an input size limit at your trust boundary before calling yamleam, and run the parse in a process with a wall-clock timeout.
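A size check at the trust boundary can be as small as the hypothetical `check_size` helper below. It is a sketch, not part of yamleam. It bounds only input size; the wall-clock timeout still has to come from however your application runs the parse.

```gleam
import gleam/int
import gleam/string

/// Hypothetical guard to run before handing untrusted input to yamleam.
/// Returns the source unchanged if it is within `max_bytes`.
pub fn check_size(source: String, max_bytes: Int) -> Result(String, String) {
  let size = string.byte_size(source)
  case size > max_bytes {
    True ->
      Error(
        "YAML input is "
        <> int.to_string(size)
        <> " bytes; limit is "
        <> int.to_string(max_bytes),
      )
    False -> Ok(source)
  }
}
```

On success, pass the returned string to yamleam.parse or yamleam.parse_raw as usual.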

Roadmap

See ROADMAP.md for the phased implementation plan and long-term coverage goals.

Released

v0.1 ✓ block-style mappings, sequences, plain and quoted scalars, decoder layer
v0.1.1 ✓ scientific-exponent crash fix, duplicate-key rejection
v0.2 ✓ literal block scalars (|, |-, |+)
v0.3 ✓ folded block scalars (>, >-, >+) + multi-document streams (---, ...)
v0.5 ✓ flow-style collections ([…], {…}) + anchors / aliases (&, *) + merge keys (<<)
v1.0 ✓ stable API for the documented subset (YamlNode, YamlError, parse, parse_raw, parse_documents, parse_documents_raw)

Planned

v0.6 — tags (!!str, !Custom), explicit indent indicators (|2, >+1), multi-line flow collections
v0.7 — complex keys (map-as-key), additional double-quoted escapes (\u, \x)

Contributing

Contributions are welcome. Priority areas:

Real-world YAML fixtures — if you have YAML files from production systems that yamleam can't yet parse, add them to test/fixtures/ and open an issue
Error message quality — we want error messages to tell users exactly what went wrong and where
Documentation — examples, edge cases, migration guides from yamerl

Please open an issue before starting significant feature work so we can align on scope and ensure your effort lands in a version we're targeting.

Development

gleam test       # Run the test suite
gleam build      # Compile the library
gleam docs build # Build HTML docs

License

Apache-2.0.

Acknowledgements

Thanks to the maintainers of Gleam, yamerl, yaml-rust, and ocaml-yaml.