CSV

RFC 4180 compliant CSV parsing and encoding for Elixir.

Installation

Add

{:csv, "~> 3.0"}

to your deps in mix.exs like so:

defp deps do
  [{:csv, "~> 3.0"}]
end

Getting all correctly formatted rows

CSV is a notoriously fickle format, with many implementations and files interpreting it differently.

For that reason, CSV implements a normal mode, CSV.decode, that returns a stream of ok: ["field1", "field2"] and error: "Message" tuples. It also reparses lines after a previous line has opened an unterminated escape sequence, ensuring you get all correctly formatted rows.

The goal of this library is to let you extract all correctly formatted rows while reporting descriptive errors for incorrectly formatted ones.

In strict mode using CSV.decode! the library will raise an exception when it encounters the first error, aborting the operation.
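As a rough illustration of working with normal mode, the sketch below separates good rows from error messages. `DecodeReport` is a hypothetical helper, not part of the library, and the snippet assumes the :csv dependency from the Installation section is available.

```elixir
defmodule DecodeReport do
  # Hypothetical helper, not part of the CSV library.
  # Splits a decoded stream into {rows, error_messages}.
  def partition(decoded) do
    {oks, errors} = Enum.split_with(decoded, &match?({:ok, _}, &1))

    {Enum.map(oks, fn {:ok, row} -> row end),
     Enum.map(errors, fn {:error, message} -> message end)}
  end
end

{rows, errors} =
  ["a,b\n", "c,d\n"]
  |> Stream.map(& &1)
  |> CSV.decode()
  |> DecodeReport.partition()

IO.inspect(rows, label: "rows")
IO.inspect(errors, label: "errors")
```

Enum.split_with/2 reads the whole stream into memory, so for large files you may prefer to handle each tuple as it arrives, for example with Stream.each/2 or Enum.reduce/3.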

Performance

In version 3.x, parallelism has been replaced by a single-process binary matching parser with better performance. This library can parse about half a million rows of a moderately complex CSV file per second in a single process, so parsing is unlikely to become a bottleneck.

If you are reading from a large file, CSV will perform best when streaming with :read_ahead in byte mode:

File.stream!("data.csv", [read_ahead: 100_000], 1000) |> CSV.decode()

While 1000 is usually a good default number of bytes to stream, you should measure performance and fine-tune byte size according to your environment.
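One possible way to compare byte sizes is to time a full decode with :timer.tc/1. The sketch below is only a starting point: `ChunkSizeBench`, the sample file and the candidate sizes are assumptions made for illustration, and a real measurement should use your own data and several runs.

```elixir
defmodule ChunkSizeBench do
  # Hypothetical helper, not part of the CSV library.
  # Returns {microseconds, row_count} for decoding `path` in chunks of `bytes`.
  def time_decode(path, bytes) do
    :timer.tc(fn ->
      path
      # Same argument order as the example above. Elixir 1.16 and later
      # prefer File.stream!(path, bytes, modes) and warn about this order.
      |> File.stream!([read_ahead: 100_000], bytes)
      |> CSV.decode!()
      |> Enum.count()
    end)
  end
end

# Write a small throwaway sample file; replace it with your real data.
path = Path.join(System.tmp_dir!(), "csv_chunk_bench.csv")
File.write!(path, Enum.map(1..10_000, fn i -> "#{i},name #{i},#{i * 2}\r\n" end))

for bytes <- [500, 1_000, 4_000, 16_000] do
  {micros, rows} = ChunkSizeBench.time_decode(path, bytes)
  IO.puts("#{bytes} bytes per chunk: #{rows} rows in #{micros} microseconds")
end

File.rm!(path)
```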

Upgrading from 2.x

The main goal for 3.x has been to streamline the library API and leverage binary matching.

Upgrading should require few to no changes in most cases:

That’s it! Please open an issue if you see any other non-backward compatible behaviour so it can be documented.

Elixir version requirements

Usage

CSV can decode and encode from and to a stream of bytes or lines.

Decoding

Do this to decode data:

# Decode file line by line
File.stream!("data.csv")
|> CSV.decode()

# Decode file in chunks of 1000 bytes
File.stream!("data.csv", [], 1000) 
  |> CSV.decode()

# Decode a csv formatted string
["long,csv,string\nwith,multiple,lines"]
  |> Stream.map(&(&1))
  |> CSV.decode()

# Decode a list of arbitrarily chunked csv data
["list,", "of,arbitrarily", "\nchun", "ked,csv,data\n"]
  |> Stream.map(&(&1))
  |> CSV.decode()

And you’ll get a stream of row tuples:

[ok: ["a", "b"], ok: ["c", "d"]]

And potentially error tuples:

[error: "", ok: ["c", "d"]]

Use strict mode decode! to get a stream of plain rows (a two-dimensional list once enumerated) instead of tuples. It raises an error as soon as one occurs, aborting the operation:

File.stream!("data.csv") |> CSV.decode!
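If you want strict mode but still need to recover from a failure, one option is to wrap the call and turn the exception into a tuple. `StrictDecode` below is a hypothetical wrapper, not part of the library, and it rescues any exception rather than naming specific error modules.

```elixir
defmodule StrictDecode do
  # Hypothetical wrapper, not part of the CSV library.
  # Returns {:ok, rows} or {:error, message} for the first failure.
  def rows(enumerable) do
    # The stream is lazy, so it has to be enumerated inside this function
    # for the rescue clause to see errors raised while parsing.
    {:ok, enumerable |> CSV.decode!() |> Enum.to_list()}
  rescue
    error -> {:error, Exception.message(error)}
  end
end

["a,b\n", "c,d\n"]
|> Stream.map(& &1)
|> StrictDecode.rows()
|> IO.inspect()
```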

Options

For all available options, check the docs on decode and decode!.

Specify a semicolon separator:

stream |> CSV.decode(separator: ?;)

Specify a custom escape character:

stream |> CSV.decode(escape_character: ?@)

Apply a transformation to a field when parsed, e.g. trimming the field:

stream |> CSV.decode(field_transform: &String.trim/1)

Unescape formulas that have been escaped:

stream |> CSV.decode(unescape_formulas: true)
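Options can be combined in a single call. As a small sketch with made-up input data, this decodes semicolon-separated lines and trims each field:

```elixir
rows =
  [" a ; b \n", " c ; d \n"]
  |> Stream.map(& &1)
  |> CSV.decode!(separator: ?;, field_transform: &String.trim/1)
  |> Enum.to_list()

IO.inspect(rows)
# [["a", "b"], ["c", "d"]]
```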

Encoding

Do this to encode a table (two-dimensional list):

table_data |> CSV.encode

And you’ll get a stream of lines ready to be written to an IO device. For example, to write to a file:

file = File.open!("test.csv", [:write, :utf8])
table_data |> CSV.encode |> Enum.each(&IO.write(file, &1))
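An alternative sketch that avoids managing the file handle yourself is to collect the encoded lines into File.stream!/1, which also works as a collectable. The file name and table contents below are made up for illustration; the second half reads the file back to show the round trip.

```elixir
path = Path.join(System.tmp_dir!(), "csv_encode_example.csv")
table_data = [["name", "qty"], ["apple", "3"], ["pear, green", "5"]]

# Encode the table and write each line to the file.
table_data
|> CSV.encode()
|> Enum.into(File.stream!(path))

# The field containing the separator is escaped with double quotes.
IO.inspect(File.read!(path))
# "name,qty\r\napple,3\r\n\"pear, green\",5\r\n"

# Decoding the file again yields the original table.
decoded = path |> File.stream!() |> CSV.decode!() |> Enum.to_list()
IO.inspect(decoded == table_data)
# true

File.rm!(path)
```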

Options

Use a semicolon separator:

your_data |> CSV.encode(separator: ?;)

Use a specific escape character:

your_data |> CSV.encode(escape_character: ?@)

You can also specify headers when encoding, which will encode map values into the right place:

[%{"a" => "value!"}] |> CSV.encode(headers: ["z", "a"])
# ["z,a\r\n", ",value!\r\n"]

You can also specify a keyword list. Its keys are used to look up the values in each row, and its values become the column names in the header row of the CSV output:

[%{a: "value!"}] |> CSV.encode(headers: [a: "x", b: "y"])
# ["x,y\r\n", "value!,\r\n"]

You’ll surely appreciate some more info on encode.

Polymorphic encoding

Make sure your data gets encoded the way you want - implement the CSV.Encode protocol for whatever you wish to encode:

defimpl CSV.Encode, for: MyData do
  def encode(%MyData{has: fun}, env \\ []) do
    "so much #{fun}" |> CSV.Encode.encode(env)
  end
end

Or similar.
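For a fuller picture, here is a sketch that includes the struct definition and an encode call. `MyData` and its `has` field are the same illustrative names as above, not part of the library, and the code is assumed to live in a Mix project so the implementation is compiled and consolidated along with everything else.

```elixir
defmodule MyData do
  # Illustrative struct with a single field.
  defstruct [:has]
end

defimpl CSV.Encode, for: MyData do
  # Build a string from the struct, then delegate to the string
  # implementation so escaping is still applied.
  def encode(%MyData{has: fun}, env \\ []) do
    "so much #{fun}" |> CSV.Encode.encode(env)
  end
end

[[%MyData{has: "fun"}, "x"]]
|> CSV.encode()
|> Enum.to_list()
|> IO.inspect()
# ["so much fun,x\r\n"]
```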

Ensure performant encoding

The encoding protocol implements a fallback to Any for types where a simple call to to_string will provide unambiguous results. Protocol dispatch for the fallback to Any is very slow when protocols are not consolidated, so make sure you have consolidate_protocols: true in your mix.exs, or consolidate protocols manually for production, in order to get good performance.

There is more to know about everything™: check the docs.
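As a sketch of where this setting lives, a mix.exs could look like the following. The app name `:my_app`, the module name and the version are placeholders; Mix already defaults consolidate_protocols to true, so the line mostly matters if it has been switched off.

```elixir
defmodule MyApp.MixProject do
  use Mix.Project

  def project do
    [
      app: :my_app,
      version: "0.1.0",
      # Keep protocol consolidation on so CSV.Encode dispatch stays fast.
      consolidate_protocols: true,
      deps: deps()
    ]
  end

  defp deps do
    [{:csv, "~> 3.0"}]
  end
end
```

To check a running system, Protocol.consolidated?(CSV.Encode) returns true when the protocol has been consolidated.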

Contributions & Bugfixes are most welcome!

Please make sure to add tests. I will not look at PRs that are either failing or lowering coverage. Also, solve one problem at a time.

Copyright and License

Copyright (c) 2022 Beat Richartz

CSV source code is licensed under the MIT License.