Csv Schema

Hex pmBuild StatusLicense

Csv schema is a library helping you to build Ecto.Schema-like modules having a csv file as source.

The idea behind this library is give the possibility to create, at compile-time, a self-contained module exposing functions to retrieve data starting from a CSV.

Installation

If available in Hex, the package can be installed by adding csv_schema to your list of dependencies in mix.exs:

def deps do
  [
    {:csv_schema, "~> 0.2.0"}
  ]
end

Usage

Supposing you have a CSV file looking like this:

id | first_name | last_name | email | gender | ip_address | date_of_birth :--:|:----------:|:----------:|:-----------------------------:|:------:|:---------------:|:------------: 1 | Ivory | Overstreet | ioverstreet0@businessweek.com | Female | 30.138.91.62 | 10/22/2018 2 | Ulick | Vasnev | uvasnev1@vkontakte.ru | Male | 35.15.164.70 | 01/19/2018 3 | Chloe | Freemantle | cfreemantle2@parallels.com | Female | 133.133.113.255 | 08/13/2018 ... | ... | ... | ... | ... | ... | ...

Is possible to create an Ecto.Schema-like repository using Csv.Schema macro:

defmodule Person do
  use Csv.Schema
  alias Csv.Schema.Parser

  schema "path/to/person.csv" do
    field :id, "id"
    field :first_name, "first_name", filter_by: true
    field :last_name, "last_name", sort: :asc
    field :identifier, ["first_name", "last_name"], key: true, join: " "
    field :email, "email", unique: true
    field :gender, "gender", filter_by: true, sort: :desc
    field :ip_address, "ip_address"
    field :date_of_birth, "date_of_birth", parser: &Parser.date!(&1, "{0M}/{0D}/{0YYYY}")
  end
end

Note that it's not a requirement to map all fields, but every field mapped must have a column in csv file. For example the following field configuration will result in a compilation error:

field :id, "non_existing_id", ....

Schema could be configured using a custom separator

use Csv.Schema, separator: ?,

Moreover it's possible to configure if csv file has or has not an header. Depending on header param value field config changes:

# Csv with header
schema "path/to/person.csv" do
  field :id, "id", key: true
  ...
end

# Csv without header. Note that field 1 is binded with the first csv column.
# Index goes from 1 to N
schema "path/to/person.csv" do
  field :id, 1, key: true
  ...
end

Now Person module is a struct, defined like this:

defmodule Person do
  defstruct id: nil,
            first_name: nil,
            last_name: nil,
            email: nil,
            gender: nil,
            ip_address: nil,
            date_of_birth: nil
end

This macro creates for you inside Person module those functions:

def by_id(integer_key), do: ...

def filter_by_first_name(string_value), do: ...

def by_email(string_value), do: ...

def filter_by_gender(string_value), do: ...

def get_all, do: ...

Where:

Field configuration

Every field should be formed like this:

field {struct_field}, {csv_header}, {opts}

where:

opts:

Note that every configuration is optional

Keep in mind

Compilation time increase in an exponential manner if csv contains lots of lines and you configure multiple fields candidate for method creation (flags key, unique and/or filter_by set to true).

Because "without data you're just another person with an opinion" here some data:

csv rows | key | unique | filter_by | compile time ms --------:|:---:|:------:|:---------:|----------------: 1_000 | no | 0 | 0 | 419 ms 1_000 | yes | 1 | 1 | 1_980 ms 1_000 | yes | 2 | 2 | 2_542 ms 1_000 | yes | 2 | 4 | 3_565 ms 1_000 | yes | 2 | 0 | 1_758 ms 1_000 | yes | 0 | 4 | 2_090 ms 1_000 | no | 2 | 0 | 1_634 ms 1_000 | no | 0 | 4 | 1_971 ms 5_000 | no | 0 | 0 | 2_410 ms 5_000 | yes | 1 | 1 | 15_282 ms 5_000 | yes | 2 | 2 | 22_478 ms 5_000 | yes | 2 | 4 | 28_060 ms 5_000 | yes | 2 | 0 | 16_254 ms 5_000 | yes | 0 | 4 | 15_043 ms 5_000 | no | 2 | 0 | 14_518 ms 5_000 | no | 0 | 4 | 12_931 ms 10_000 | no | 0 | 0 | 4_962 ms 10_000 | yes | 1 | 1 | 28_995 ms 10_000 | yes | 2 | 2 | 42_817 ms 10_000 | yes | 2 | 4 | 54_759 ms 10_000 | yes | 2 | 0 | 37_166 ms 10_000 | yes | 0 | 4 | 29_913 ms 10_000 | no | 2 | 0 | 33_578 ms 10_000 | no | 0 | 4 | 29_096 ms

5 compilations average time.

Executed on my machine:

Lenovo Thinkpad T480
CPU: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
RAM: 32GB