mesv

A type-focused CSV parsing and formatting library with extensive configuration options and documentation examples.

I wanted the name to be short and a single word, because I dislike snake_case. Adding the fact that one of the (planned) features of this library was adding frontmatter metadata to the generated CSV file, I somehow mangled the two words metadata and csv, and eventually arrived at mesv.

Important!

This library is still in active development, and although it's functional (check the unit tests to make sure), the types, function names, signatures, and behaviours are liable to change between minor versions until version 1.0.0.

If you do end up using this library, be careful when updating to avoid sudden changes!

Installation

To use in your project, use the gleam command line tool to add mesv.

gleam add mesv

Examples

Basic examples of both formatting (converting data to String) and parsing (reading String and converting to some data type).

Formatting

As formatting is the simpler use case of the two available, I will go over it first.

To generate a default CSV file from some data, you first need to create a Formatter. To do so, simply call the format.build function and pass in a function that converts your data type to a List(String).

import gleam/int
import mesv/format

const data: List(#(String, Int, Bool)) = [
  #("Adam", 20, True),
  #("Beatrice", 25, True),
  #("Colin", 2, False),
]

pub fn main() -> Nil {
  let formatter =
    // First create a formatter
    format.build(fn(val: #(String, Int, Bool)) -> List(String) {
      let #(name, age, adult) = val
      [
        name,
        int.to_string(age),
        case adult {
          True -> "true"
          False -> "false"
        },
      ]
    })

  // Then, use that formatter on the data you want to format
  let formatted_data = format.format(formatter, data)

  // By default, the formatter uses the comma as a column separator,
  // newline as the row separator, and doublequotes for escaping cells
  assert formatted_data == "Adam,20,true\nBeatrice,25,true\nColin,2,false"
}

Then just call the format.format function using your Formatter to convert a List of data into a String.

Setting custom separators and escapers

As mentioned in the comments of the above example, the Formatter is initialized with some default values.

Specifically:

, is used as the column separator
\n is used as the row separator
" is used as the escaper, to wrap cells that contain separators or escapers inside

However, all of these can be changed using the set_ functions.

// Setup code is hidden to reduce bloat

pub fn main() -> Nil {
  let formatter =
    // [...]
    |> format.set_col_sep("|")
    |> format.set_row_sep(";")
    |> format.set_escaper("'") // Effect not visible here

  let formatted_data = format.format(formatter, data)

  // As you can see, the formatted CSV is different,
  // though the data is identical.
  assert formatted_data == "Adam|20|true;Beatrice|25|true;Colin|2|false"
}

When are cells escaped?

By default, cells are only escaped (wrapped in escapers) if they contain a separator or an escaper.

In addition, if a cell contains an escaper, it is then replaced with two escapers, to maintain an even number.

import mesv/format

const data: List(#(String, String)) = [
  #("Adam", "Cool"),
  #("Beatrice", "It's |neat|"),
  #("Colin", "I guess it's fine;"),
]

pub fn main() -> Nil {
  let formatted_data =
    format.build(fn(val: #(String, String)) -> List(String) { [val.0, val.1] })
    |> format.set_col_sep("|")
    |> format.set_row_sep(";")
    |> format.set_escaper("'")
    |> format.format(data)

  // Only cells that need to be escaped will be wrapped in escapers.
  assert formatted_data
    == "Adam|Cool;Beatrice|'It''s |neat|';Colin|'I guess it''s fine;'"
}

Using the function format.set_escape_all, it's possible to make a formatter that escapes all cells. However, I don't know where this would be useful - I added it because I was on a roll, without thinking too much.

Headers

By default, the output will not contain headers - however, you can set headers by using the format.set_headers function and passing in a List(String), which will be prepended to the CSV String, joined with the column separators and appropriately escaped if necessary.

// [...]

const data: List(#(String, Int, Bool)) = [
  #("Adam", 20, True),
  #("Beatrice", 25, True),
  #("Colin", 2, False),
]

pub fn main() -> Nil {
  let formatted_data =
    // [...]
    |> format.set_headers(["Name", "Age", "Is an adult"])
    // ^ Setting headers 
    |> format.format(data)

  // The specified headers are prepended to the CSV string
  assert formatted_data
    == "Name,Age,Is an adult\nAdam,20,true\nBeatrice,25,true\nColin,2,false"
}

Parsing

Parsing CSV is a bit more complicated, as for all data types other than String, it's necessary to account for failing to parse a specific cell.

However, most of this behaviour is hidden away in the internal functions, so usage is relatively easy.

import gleam/int
import mesv
import mesv/parse

const expected_data: List(#(String, Int, Bool)) = [
  #("Adam", 20, True),
  #("Beatrice", 25, True),
  #("Colin", 2, False),
]

pub fn main() -> Nil {
  let parsed_data =
    parse.build({
      // Create a parsing function using `mesv.parsed`
      // to construct a curried parsing function
      use name <- mesv.parsed
      use age <- mesv.parsed
      use adult <- mesv.parsed

      // If any value fails (ie, returns Error(Nil)),
      // the parsing of a row will stop.
      // However, if it reaches here,
      // it returns the following data type
      #(name, age, adult)
    })
    |> parse.column(Ok)
    |> parse.column(int.parse)
    |> parse.column(fn(val: String) -> Result(Bool, Nil) {
      case val {
        "true" -> Ok(True)
        "false" -> Ok(False)
        _ -> Error(Nil)
      }
    })
    // Pass in the CSV String to parse
    |> parse.parse(
      "Adam,20,true\nBeatrice,25,true\nColin,2,false",
    )

  // The returned data is wrapped a bit weirdly, which I'm considering changing
  assert parsed_data == Ok(#(expected_data, []))
}

First, use the parse.build function to construct a Parser. This function expects a curried function - that is, a function that returns a function - with as many arguments as there are inputs to construct the final output.

Then, call parse.column to specify a function to use when parsing that column, and transform the Parser into one where one argument is filled.

It is possible to call parse.parse using a Parser without any columns specified - however, doing so would result in a List of curried functions without any data inside.

In the above example, the succesive functions do this:

First, parse.build creates a Parser(fn(String) -> fn(Int) -> fn(Bool) -> #(String, Int, Bool)).
Then, parse.column specifies that to transform the first element of a row from String to Result(String, Nil), use the function Ok.
The next parse.column says that to turn the second element from String to Result(Int, Nil), use the function int.parse.
The last parse.column is a custom one for parsing Bool - Return Ok(Bool) for true or false, and Error(Nil) for anything else.
Then, call the parse.parse function to use the above specified functions on each row, and if they succeed transform them into the specified data type, #(String, Int, Bool).

Headers

As in the Formatting section, handling headers is one of the features of mesv.

Specifically, if you know that the first row is going to be headers, you can specify what you expect they will be - then, if they don't match, the parser will return an Error(ExpectedHeadersMismatch).

// expected_data is identical
pub fn main() -> Nil {
  let parsed_data =
    // [...] parser is the exact same
    // Specify that the first row is the headers,
    // and if they don't match what is specified, 
    // the parsing will fail
    |> parse.expect_headers(["Name", "Age", "Is an adult"])
    |> parse.parse(
      "Name,Age,Is an adult\n"
      <> "Adam,20,true\n"
      <> "Beatrice,25,true\n"
      <> "Colin,2,false",
    )

  assert parsed_data == Ok(#(expected_data, []))
}

To add to the above explanation, this would be inserting a step between 4 and 5: 5. The parse.expect_headers says to the parser that the first row will be headers, and that they should be identical to what is passed in. If they aren't, the parsing returns an Error.

Setting custom separators and escapers

Just as with a formatter, a parser can also be configured to expect custom colum and row separators, and escapers.

It is important to mention however, that while row and column separators can be whatever character or word you want, for escapers, there is some ambiguity.

According to the CSV specification, the grammar of a CSV file specifies that if a cell (or field, as it's referred to) is escaped, it means that there are single doublequotes (escapers) on both sides, and if any doublequotes(escapers) appear inside of the cell, they should appear doubled.

And here lies the issue - when changing the escaper to be a single character, there is no ambiguity - when doubling a single character, there are two characters.

But imagine that you set the escaper to be ==. Now, as per the specification, if an instance of the escaper appears within a cell, it must be duplicated. So, take for example the cell a==b. Then, to escape it, wrap it in escapers and duplicate the instance of the escaper inside.

So, should that be ==a====b==, or ==a===b==? When trying to count instances of the escaper in the first one, there are 5, because there are three ways to find == in ====. But then, if it were the second, it doesn't really follow the specification either.

I'm sure there does exist a foolproof and reliable method of escaping cells with multi-character escapers, but I haven't found one, and I don't think there is need of it.

Tldr; use single character escapers.

Notes

The parse.parse function returns a rather strange type, that being Result(#(List(a), List(parse.ParsingError)), parse.ParsingError). This structure is liable to change, as I'm not very happy with how I handled parsing errors.

However, to explain it a bit - if the Parser had headers specified using expect_headers function and the first row didn't match the expected values, an Error(ExpectedHeadersMismatch) value is returned. If they did match or were not specified, the function will always return an Ok value.

In that Ok value is a tuple of lists. The first list is the list of successfully parsed rows, and the second list is the errors returned for each row that failed to parse.

Specifically, these two lists are obtained through calling the result.partition function on a List(Result(a, parse.ParsingError)), then reversing the two lists (since partition returns them in reverse order).

Documentation

Further documentation can be found at hexdocs/mesv, in the descriptions of the relevant modules.

Roadmap

As mentioned several times in this README, this project is still rather young, and there are several features I still want to implement - first among them the namesake of this library, adding metadata.

Here is a roadmap, describing the planned features:

Alpha (0.1.0) - Done! Basic features, a usable library for parsing CSV
Beta (0.2.0) - Typed frontmatter support, features to change parsing behaviour depending on metadata
Release (1.0.0) - Finalized types and error handling, two-in-one constructors (create formatter and parser using the same builder), feature for column-based cell parsing in addition to order-based
Streaming (1.1.0) - Support for stream-based inputs and outputs to reduce the memory requirements for large files