dataprep
Composable, type-driven preprocessing and validation combinator library for Gleam.
dataprep is a combinator toolkit, not a rule catalog.
- Built-in and user-defined rules are identical in power.
- No domain-specific rules (email, URL, UUID). Write your own or use a dedicated package.
- No schema, no DSL, no reflection.
- Prep transforms. Validator checks. They do not mix.
- Errors are your types, not ours.
Requirements
- Gleam 1.15 or later
- Erlang/OTP 27 or later (when targeting Erlang)
- Node.js 18 or later (when targeting JavaScript)
Supported targets
- Erlang (BEAM): for server-side use (e.g. wisp, mist)
- JavaScript: for client-side use (e.g. Lustre form validation in the browser). The package contains zero FFI and zero target-specific code, so the same Validator(a, e) can be shared between client and server code.
Install
gleam add dataprep
Quick start
import dataprep/prep
import dataprep/validated.{type Validated}
import dataprep/rules
pub type User {
User(name: String, age: Int)
}
pub type Err {
NameEmpty
AgeTooYoung
}
pub fn validate_user(name: String, age: Int) -> Validated(User, Err) {
let clean = prep.trim() |> prep.then(prep.lowercase())
let check_name = rules.not_empty(NameEmpty)
let check_age = rules.min_int(0, AgeTooYoung)
validated.map2(
User,
name |> clean |> check_name,
check_age(age),
)
}
// validate_user(" Alice ", 25) -> Valid(User("alice", 25))
// validate_user("", -1) -> Invalid([NameEmpty, AgeTooYoung])
Note on composing rules: Each rules.* function returns a validator (fn(a) -> Validated(a, e)), not a transformed value. You cannot pipe one rule into another directly. Use validator.both to run both checks on the same value and accumulate their errors, or validator.guard to short-circuit (skip later checks if an earlier one fails):
import dataprep/rules
import dataprep/validator
// ✗ Won't compile: piping a validator fn into another rule
let check = rules.not_empty(Empty) |> rules.min_length(3, TooShort)
// ✓ Correct: combine validators explicitly
let check = rules.not_empty(Empty) |> validator.guard(rules.min_length(3, TooShort))
Examples
Field validation with structured error context
Attach field names to errors so callers can identify which field failed.
import dataprep/prep
import dataprep/rules
import dataprep/validated.{type Validated}
import dataprep/validator
pub type FormError {
Field(name: String, detail: FieldDetail)
}
pub type FieldDetail {
Empty
TooShort(min: Int)
TooLong(max: Int)
}
pub fn validate_username(raw: String) -> Validated(String, FormError) {
let clean = prep.trim() |> prep.then(prep.lowercase())
let check =
rules.not_empty(Empty)
|> validator.guard(
rules.min_length(3, TooShort(3))
|> validator.both(rules.max_length(20, TooLong(20))),
)
|> validator.label("username", Field)
raw |> clean |> check
}
// validate_username(" Al ")
// -> Invalid([Field("username", TooShort(3))])
// validate_username(" Alice ")
// -> Valid("alice")
Parse then validate
Use validated.and_then to bridge type-changing parsing with
same-type validation. Parsing short-circuits; validation accumulates.
import dataprep/parse
import dataprep/rules
import dataprep/validated.{type Validated}
import dataprep/validator
pub type AgeError {
NotAnInteger(raw: String)
TooYoung(min: Int)
TooOld(max: Int)
}
pub fn validate_age(raw: String) -> Validated(Int, AgeError) {
let check_range =
rules.min_int(0, TooYoung(0))
|> validator.both(rules.max_int(150, TooOld(150)))
parse.int(raw, NotAnInteger)
|> validated.and_then(check_range)
}
// validate_age("abc") -> Invalid([NotAnInteger("abc")])
// validate_age("200") -> Invalid([TooOld(150)])
// validate_age("25") -> Valid(25)
Nested error labeling with map3
Combine multiple fields into a domain type. All errors from all fields are accumulated with their field names.
import dataprep/prep
import dataprep/rules
import dataprep/validated.{type Validated}
import dataprep/validator
pub type SignupForm {
SignupForm(name: String, email: String, age: Int)
}
pub type SignupError {
Field(name: String, detail: Detail)
}
pub type Detail {
Empty
TooShort(min: Int)
OutOfRange(min: Int, max: Int)
}
fn validate_name(raw: String) -> Validated(String, SignupError) {
let clean = prep.trim() |> prep.then(prep.lowercase())
let check =
rules.not_empty(Empty)
|> validator.guard(rules.min_length(2, TooShort(2)))
|> validator.label("name", Field)
raw |> clean |> check
}
fn validate_email(raw: String) -> Validated(String, SignupError) {
let clean = prep.trim() |> prep.then(prep.lowercase())
let check =
rules.not_empty(Empty)
|> validator.label("email", Field)
raw |> clean |> check
}
fn validate_age(age: Int) -> Validated(Int, SignupError) {
let check =
rules.min_int(0, OutOfRange(0, 150))
|> validator.both(rules.max_int(150, OutOfRange(0, 150)))
|> validator.label("age", Field)
check(age)
}
pub fn validate_signup(
name: String,
email: String,
age: Int,
) -> Validated(SignupForm, SignupError) {
validated.map3(
SignupForm,
validate_name(name),
validate_email(email),
validate_age(age),
)
}
// validate_signup("", "", 200)
// -> Invalid([
// Field("name", Empty),
// Field("email", Empty),
// Field("age", OutOfRange(0, 150)),
// ])
Pattern matching with rules.matches / matches_string
matches and matches_string use regexp.check semantics: they
pass as long as the pattern hits anywhere in the input. A
pattern like [0-9]+ will accept "abc123def" because the digit
run matches a substring. For the validation case ("the whole
string must look like an email / slug / number"), use the
matches_fully / matches_fully_string siblings, which compare
the matched span against the entire input.
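The difference between the two semantics can be sketched with the same pattern passed to both helpers. This is a minimal illustration, assuming matches_string takes its arguments in the same (pattern, error) order as matches_fully_string; DigitsError, contains_digits, and only_digits are hypothetical names introduced here.

```gleam
import dataprep/rules
import dataprep/validated.{type Validated}

// Hypothetical error type for this sketch.
pub type DigitsError {
  NotDigits
}

// Substring semantics: passes when the pattern hits anywhere in the input.
// Assumes matches_string accepts (pattern, error) positionally.
pub fn contains_digits(raw: String) -> Validated(String, DigitsError) {
  rules.matches_string("[0-9]+", NotDigits)(raw)
}

// Full-match semantics: the match must span the entire input.
pub fn only_digits(raw: String) -> Validated(String, DigitsError) {
  rules.matches_fully_string("[0-9]+", NotDigits)(raw)
}

// contains_digits("abc123def") -> Valid("abc123def")
// only_digits("abc123def")     -> Invalid([NotDigits])
// only_digits("123")           -> Valid("123")
```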
There are three intended ways to construct a regex-driven validator:
- Pre-compiled Regexp: pass a regexp.Regexp to matches / matches_fully. Pattern errors surface as a regexp.from_string Result at the call site, before the validator is built.
- Literal convenience: matches_string / matches_fully_string. The helper compiles internally and panics on a malformed literal, which is a programmer error there is no useful recovery from. Use this only when the pattern is hard-coded at the call site.
- Checked dynamic pattern: matches_string_checked / matches_fully_string_checked return Result(Validator, RegexRuleError). Use this for config-driven or admin-supplied patterns where a malformed regex is a runtime condition that should be handled rather than crash the process.
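The third option, the checked variant, might be used roughly as follows. This is a sketch under assumptions: it presumes matches_fully_string_checked takes (pattern, error) in the same order as matches_fully_string, and it discards the RegexRuleError details by mapping them to Nil. SlugError and validate_configured are hypothetical names.

```gleam
import dataprep/rules
import dataprep/validated.{type Validated}

// Hypothetical error type for this sketch.
pub type SlugError {
  BadSlug
}

// `pattern` is assumed to come from configuration at runtime, so a
// malformed regex is reported to the caller instead of panicking.
pub fn validate_configured(
  raw: String,
  pattern: String,
) -> Result(Validated(String, SlugError), Nil) {
  // Assumed argument order: (pattern, error).
  case rules.matches_fully_string_checked(pattern, BadSlug) {
    // The pattern compiled: run the resulting validator on the input.
    Ok(check) -> Ok(check(raw))
    // The pattern was malformed: surface that as an ordinary Error.
    Error(_) -> Error(Nil)
  }
}

// validate_configured("ok-1", "[a-z0-9-]+") -> Ok(Valid("ok-1"))
// validate_configured("ok-1", "[")          -> Error(Nil)
```

In real code the Error branch would more likely carry the RegexRuleError through, so the caller can report which pattern was rejected.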
import dataprep/rules
import dataprep/validated.{type Validated}
import gleam/regexp
import gleam/result
pub type TagError {
BadFormat
}
// Literal pattern with full-match semantics — the convenience
// helper compiles once at construction. No `let assert Ok(_)`
// boilerplate at the call site, and a substring hit on a partial
// pattern (like `[a-z0-9-]+`) does NOT silently slip through.
pub fn validate_tag(raw: String) -> Validated(String, TagError) {
let check =
rules.matches_fully_string(pattern: "[a-z0-9-]+", error: BadFormat)
check(raw)
}
// Dynamic pattern — the caller controls the compile error.
pub fn validate_with(
raw: String,
pattern: String,
) -> Result(Validated(String, TagError), regexp.CompileError) {
use re <- result.map(regexp.from_string(pattern))
rules.matches(pattern: re, error: BadFormat)(raw)
}
// validate_tag("ok-1") -> Valid("ok-1")
// validate_tag("BAD!") -> Invalid([BadFormat])
default vs default_when_blank
prep.default(fallback) only fires on the literal empty string "". Whitespace-only inputs (" ", "\t", "\r\n") pass through unchanged. Use prep.default_when_blank(fallback) when "blank" should also include whitespace-only input.
import dataprep/prep
let strict = prep.default("N/A")
strict("") // "N/A"
strict(" ") // " " ← passed through
strict("\t") // "\t" ← passed through
strict("hi") // "hi"
let lenient = prep.default_when_blank("N/A")
lenient("") // "N/A"
lenient(" ") // "N/A" ← whitespace-only treated as blank
lenient("\t\n") // "N/A"
lenient(" hi ") // " hi " ← original input preserved on non-blank
// Want the trimmed form on the non-blank path? Compose explicitly:
let normalised = prep.trim() |> prep.then(first: _, next: prep.default("N/A"))
normalised(" hi ") // "hi"
normalised(" ") // "N/A"
More examples are available in the doc/recipes/ directory of the repository.
Modules
| Module | Responsibility |
|---|---|
| dataprep/prep | Infallible transformations: trim, lowercase, uppercase, collapse_space (ASCII whitespace only), collapse_unicode_space (full Unicode \s), replace, default, default_when_blank. Compose with then or sequence. |
| dataprep/validator | Checks without transformation: check, predicate, both, all, alt, guard, map_error, label, each, optional. |
| dataprep/validated | Applicative error accumulation: map, map_error, and_then, from_result, from_result_map, to_result, map2..map5, sequence, traverse, traverse_indexed. |
| dataprep/non_empty_list | At-least-one guarantee for error lists: single, cons, append, concat, map, flat_map, to_list, from_list. |
| dataprep/rules | Built-in rules: not_empty, not_blank, matches, matches_string, matches_string_checked, matches_fully, matches_fully_string, matches_fully_string_checked, min_length, max_length, length_between, min_int, max_int, min_float, max_float, non_negative_int, non_negative_float, one_of, equals. |
| dataprep/parse | Parse helpers: int, float, float_strict. Bridge String to typed Validated with custom error mapping. |
Composition overview
| Phase | Combinator | Errors | When to use |
|---|---|---|---|
| Prep | prep.then | (none) | Chain infallible transforms |
| Validate | validator.both / all | Accumulate all | Independent checks on same value |
| Validate | validator.alt | Accumulate on full failure | Accept alternative forms |
| Validate | validator.guard | Short-circuit | Skip if prerequisite fails |
| Combine | validated.map2..map5 | Accumulate all | Build domain types from independent fields |
| Bridge | validated.and_then | Short-circuit | Parse then validate (type changes) |
| Bridge | parse.int / parse.float | Short-circuit | String to typed Validated in one step |
| Bridge | raw |> prep |> validator | (prep has none) | Apply infallible transform before validation |
| Collection | validated.sequence / traverse | Accumulate all | Validate a list of values |
| Collection | validator.each | Accumulate all | Apply a validator to every list element |
| Collection | validator.optional | (none if None) | Skip validation for absent values |
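The "Accumulate all" and "Short-circuit" rows can be contrasted by running the same two rules through validator.both and validator.guard. The sketch below uses only calls that appear elsewhere in this README; NameError, check_all, and check_guarded are hypothetical names.

```gleam
import dataprep/rules
import dataprep/validated.{type Validated}
import dataprep/validator

// Hypothetical error type for this sketch.
pub type NameError {
  Empty
  TooShort(min: Int)
}

// Accumulating: both checks run against the same input.
pub fn check_all(raw: String) -> Validated(String, NameError) {
  let check =
    rules.not_empty(Empty)
    |> validator.both(rules.min_length(3, TooShort(3)))
  check(raw)
}

// Short-circuiting: the length check is skipped when the input is empty.
pub fn check_guarded(raw: String) -> Validated(String, NameError) {
  let check =
    rules.not_empty(Empty)
    |> validator.guard(rules.min_length(3, TooShort(3)))
  check(raw)
}

// check_all("")     reports both Empty and TooShort(3)
// check_guarded("") reports only Empty
// Both return Valid("alice") for "alice".
```

guard fits when the later error would only be noise once the prerequisite has failed; both fits when each failure is independently useful to the caller.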
Scope policy
dataprep is a combinator toolkit, not a rule catalog. The library deliberately ships only the building blocks needed to construct typed, error-accumulating validators:
- infallible string transforms (prep),
- generic checks (validator, rules),
- applicative error accumulation (validated, non_empty_list),
- String to Int / Float parsers (parse).
Domain-specific parsers (email, url, uuid, iso_datetime,
ipv4, ...) are intentionally not in scope. The recommended
path is to compose the primitives above into the parser you need,
or to depend on a domain-specific package alongside dataprep.
See "Building your own parser" below for the recipes.
Building your own parser
The recipes below cover the common shapes a caller actually wants.
Each recipe uses only the public API and is verified by the tests in
test/dataprep/cookbook_test.gleam.
The recipes share one error type so they can be combined inside the same form-validation flow:
pub type Err {
NotAnInteger(raw: String)
NotPositive
WrongLength(min: Int, max: Int, got: Int)
NotUuid(raw: String)
NotAllowed(raw: String)
}
Recipe 1: positive_int
Parse to Int, then enforce > 0. Uses validated.and_then to
short-circuit when the parse itself fails.
import dataprep/parse
import dataprep/validated.{type Validated}
import dataprep/validator
fn positive_int(raw: String) -> Validated(Int, Err) {
use n <- validated.and_then(parse.int(raw, NotAnInteger))
validator.predicate(fn(x) { x > 0 }, NotPositive)(n)
}
Recipe 2: bounded_string
Trim, then enforce that the length is in [min, max].
import dataprep/prep
import dataprep/rules
import dataprep/validated.{type Validated}
import gleam/string
fn bounded_string(
min: Int,
max: Int,
) -> fn(String) -> Validated(String, Err) {
fn(raw: String) {
let trimmed = prep.run(prep: prep.trim(), value: raw)
rules.length_between(
minimum: min,
maximum: max,
error: WrongLength(min, max, string.length(trimmed)),
)(trimmed)
}
}
Recipe 3: uuid_v4_lowercase
Trim, lowercase, then regex match. Demonstrates prep.then for
chained infallible normalisation before validation.
import dataprep/prep
import dataprep/rules
import dataprep/validated.{type Validated}
fn uuid_v4_lowercase(raw: String) -> Validated(String, Err) {
let normalized =
prep.run(
prep: prep.then(first: prep.trim(), next: prep.lowercase()),
value: raw,
)
rules.matches_fully_string(
pattern: "[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}",
error: NotUuid(raw),
)(normalized)
}
Recipe 4: enum_of_strings_ci
Case-insensitive match against a fixed allow-list. Returns a parameterised validator so callers can vary the allowed set.
import dataprep/prep
import dataprep/rules
import dataprep/validated.{type Validated}
fn enum_of_strings_ci(
allowed: List(String),
) -> fn(String) -> Validated(String, Err) {
fn(raw: String) {
let normalized = prep.run(prep: prep.lowercase(), value: raw)
rules.one_of(allowed: allowed, error: NotAllowed(raw))(normalized)
}
}
The composition pattern is the same in every case: prep.run to
normalise the raw input, then a rules or validator combinator
to check the normalised value, then optionally validated.and_then
to chain a follow-up step that depends on a successful prior step.
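Because the recipes share one error type, they can be fed into validated.map2 like any other field validator. The sketch below repeats Recipes 1 and 2 with a trimmed-down Err so that it is self-contained; LineItem and validate_line_item are hypothetical names used only for illustration.

```gleam
import dataprep/parse
import dataprep/prep
import dataprep/rules
import dataprep/validated.{type Validated}
import dataprep/validator
import gleam/string

// Subset of the shared error type used by the two recipes below.
pub type Err {
  NotAnInteger(raw: String)
  NotPositive
  WrongLength(min: Int, max: Int, got: Int)
}

// Hypothetical domain type for this sketch.
pub type LineItem {
  LineItem(sku: String, quantity: Int)
}

// Recipe 1: parse to Int, then enforce > 0.
fn positive_int(raw: String) -> Validated(Int, Err) {
  use n <- validated.and_then(parse.int(raw, NotAnInteger))
  validator.predicate(fn(x) { x > 0 }, NotPositive)(n)
}

// Recipe 2: trim, then enforce that the length is in [min, max].
fn bounded_string(min: Int, max: Int) -> fn(String) -> Validated(String, Err) {
  fn(raw: String) {
    let trimmed = prep.run(prep: prep.trim(), value: raw)
    rules.length_between(
      minimum: min,
      maximum: max,
      error: WrongLength(min, max, string.length(trimmed)),
    )(trimmed)
  }
}

// Both fields are validated independently, so errors from each accumulate.
pub fn validate_line_item(
  sku: String,
  quantity: String,
) -> Validated(LineItem, Err) {
  validated.map2(
    LineItem,
    bounded_string(3, 12)(sku),
    positive_int(quantity),
  )
}

// validate_line_item("abc-123", "2") -> Valid(LineItem("abc-123", 2))
// validate_line_item(" ab ", "0")
// -> Invalid([WrongLength(3, 12, 2), NotPositive])
```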
Out of scope, by design
The following are intentionally not provided by dataprep, even though some of them appear in adjacent libraries in other languages:
- email, url, uri parsing (use a URI- or email-specific package)
- iso_datetime / time arithmetic (use a gleam_time-shaped package)
- uuid / ulid generation (use a UUID-shaped package)
- JSON shape validation (use a JSON-schema package)
- HTML / XML sanitisation (use a sanitiser package)
These have implementation-defining standards or substantial spec surface that would push dataprep from "small primitives" toward "kitchen sink." Keeping the scope tight is what lets the combinators stay composable.
Development
This project uses mise to manage Gleam and Erlang versions, and just as a task runner.
mise install # install Gleam and Erlang
just ci # format check, typecheck, build, test
just test # gleam test
just format # gleam format
just check # all checks without deps download
Contributing
Contributions are welcome. See CONTRIBUTING.md for details.