Csv Schema


Csv Schema is a library that helps you build Ecto.Schema-like modules that use a CSV file as their data source.

The idea behind this library is to make it possible to create, at compile time, a self-contained module that exposes functions for retrieving data from a CSV file.

Installation

If available in Hex, the package can be installed by adding csv_schema to your list of dependencies in mix.exs:

def deps do
  [
    {:csv_schema, "~> 0.1.0"}
  ]
end

Usage

Suppose you have a CSV file like this:

id,first_name,last_name,email,gender,ip_address,date_of_birth
1,Ivory,Overstreet,ioverstreet0@businessweek.com,Female,30.138.91.62,10/22/2018
2,Ulick,Vasnev,uvasnev1@vkontakte.ru,Male,35.15.164.70,01/19/2018
3,Chloe,Freemantle,cfreemantle2@parallels.com,Female,133.133.113.255,08/13/2018
...

You can then create an Ecto.Schema-like repository using the Csv.Schema macro:

defmodule Person do
  use Csv.Schema
  alias Csv.Schema.Parser

  schema "path/to/person.csv" do
    field :id, "id", key: true
    field :first_name, "first_name", filter_by: true
    field :last_name, "last_name"
    field :email, "email", unique: true
    field :gender, "gender", filter_by: true
    field :ip_address, "ip_address"
    field :date_of_birth, "date_of_birth", parser: &Parser.date!(&1, "{0M}/{0D}/{0YYYY}")
  end
end

Note that you do not have to map every column, but every mapped field must correspond to a column in the CSV file. For example, the following field configuration results in a compilation error:

field :id, "non_existing_id", ....
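The check itself happens inside the library at compile time. As a rough idea of what it involves, here is a hypothetical helper (`HeaderCheck.validate!/2`, not part of Csv.Schema) that compares the mapped headers with the CSV header row and raises when one is missing:

```elixir
defmodule HeaderCheck do
  # Hypothetical helper, NOT part of Csv.Schema.
  # Returns :ok when every mapped header exists in the CSV header row,
  # otherwise raises an ArgumentError listing the missing ones.
  def validate!(csv_headers, mapped_headers) do
    case mapped_headers -- csv_headers do
      [] ->
        :ok

      missing ->
        raise ArgumentError,
              "columns not found in CSV header: " <> Enum.join(missing, ", ")
    end
  end
end
```

Calling something like this while a module body is being compiled would turn the exception into a compilation failure, which is the behavior described above.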

You can also pass a :separator option to use Csv.Schema, which makes the macro split each CSV line using that separator.
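To illustrate what splitting on a separator means, here is a hypothetical, deliberately naive sketch (`SeparatorSketch.to_maps/2` is not the library's code) that splits a header line and data lines on a string separator and pairs each value with its header. The type the real :separator option expects is not shown here, and quoted fields containing the separator are not handled:

```elixir
defmodule SeparatorSketch do
  # Hypothetical helper, NOT Csv.Schema's implementation.
  # Splits the header line and every data line on `separator`
  # and zips each row's values with the headers into a map.
  def to_maps([header_line | data_lines], separator) do
    headers = String.split(header_line, separator)

    Enum.map(data_lines, fn line ->
      headers
      |> Enum.zip(String.split(line, separator))
      |> Map.new()
    end)
  end
end
```

For example, `SeparatorSketch.to_maps(["id;first_name", "1;Ivory"], ";")` returns `[%{"id" => "1", "first_name" => "Ivory"}]`.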

The Person module now defines a struct equivalent to:

defmodule Person do
  defstruct id: nil,
            first_name: nil,
            last_name: nil,
            email: nil,
            gender: nil,
            ip_address: nil,
            date_of_birth: nil
end

Inside the Person module, the macro also generates these functions:

def by_id(integer_key), do: ...
def filter_by_first_name(string_value), do: ...
def by_email(string_value), do: ...
def filter_by_gender(string_value), do: ...
def get_all, do: ...

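Where:

- by_id/1 is generated because id has key: true, and by_email/1 because email has unique: true; each looks up a single record by that field's value
- filter_by_first_name/1 and filter_by_gender/1 are generated because those fields have filter_by: true; they return the records matching the given value
- get_all/0 returns every record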
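To picture the shape of this API, here is a hypothetical hand-written module (`PersonSketch`, not the code Csv.Schema generates) with a few rows embedded at compile time in a module attribute. Returning `nil` when no record matches is a choice made for this sketch only; the library's not-found behavior is not documented here:

```elixir
defmodule PersonSketch do
  # Hypothetical illustration, NOT the module Csv.Schema generates.
  defstruct [:id, :first_name, :email, :gender]

  # Rows are embedded at compile time in a module attribute.
  @rows [
    %{id: 1, first_name: "Ivory", email: "ioverstreet0@businessweek.com", gender: "Female"},
    %{id: 2, first_name: "Ulick", email: "uvasnev1@vkontakte.ru", gender: "Male"},
    %{id: 3, first_name: "Chloe", email: "cfreemantle2@parallels.com", gender: "Female"}
  ]

  # Key field: a single record, or nil in this sketch when absent.
  def by_id(id), do: find(&(&1.id == id))

  # Unique field: a single record, or nil in this sketch when absent.
  def by_email(email), do: find(&(&1.email == email))

  # filter_by field: a list of every matching record.
  def filter_by_gender(gender), do: Enum.filter(get_all(), &(&1.gender == gender))

  # All rows, converted to structs at runtime.
  def get_all, do: Enum.map(@rows, &struct!(__MODULE__, &1))

  defp find(predicate), do: Enum.find(get_all(), predicate)
end
```

For example, `PersonSketch.by_id(2).first_name` returns `"Ulick"`, and `PersonSketch.filter_by_gender("Female")` returns the records with ids 1 and 3.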

Note: if @auto_primary_key is set to true, the macro automatically creates a new column called id (and a corresponding by_id function) whose value is a progressive integer. Otherwise, you have to set the key option on the field that should act as the key.
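If it helps to picture the progressive integer, here is a hypothetical helper (`AutoKeySketch.add_ids/1`, not the library's code) that numbers rows in file order. Starting the count at 1 is an assumption, since the README does not say which value the library starts from:

```elixir
defmodule AutoKeySketch do
  # Hypothetical helper, NOT Csv.Schema's implementation.
  # Adds an :id to each row: 1, 2, 3, ... in the order the rows appear.
  # The starting value of 1 is an assumption made for this sketch.
  def add_ids(rows) do
    rows
    |> Enum.with_index(1)
    |> Enum.map(fn {row, index} -> Map.put(row, :id, index) end)
  end
end
```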

Field configuration

Every field is declared like this:

field {struct_field}, {csv_header}, {opts}

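where:

- struct_field is the atom naming the field in the generated struct (for example :first_name)
- csv_header is the header string of the CSV column the field is read from (for example "first_name")
- opts is a keyword list of options

opts:

- key: true marks the field as the key and generates a by_{struct_field} function
- unique: true declares the field's values unique and generates a by_{struct_field} function
- filter_by: true generates a filter_by_{struct_field} function
- parser: a function applied to the raw CSV value, for example &Parser.date!(&1, "{0M}/{0D}/{0YYYY}")

Note that every option is optional.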
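The parser in the Person example is a one-argument capture, which suggests any one-argument function can be used. As a sketch under that assumption, here is a hypothetical, standard-library-only parser for the "MM/DD/YYYY" dates in the sample CSV (`DateParserSketch.us_date!/1` is not part of the library):

```elixir
defmodule DateParserSketch do
  # Hypothetical, stdlib-only parser for "MM/DD/YYYY" strings such as
  # "10/22/2018"; it is NOT Csv.Schema.Parser.date!/2.
  # Malformed input raises (MatchError or ArgumentError).
  def us_date!(value) do
    [month, day, year] =
      value
      |> String.split("/")
      |> Enum.map(&String.to_integer/1)

    {:ok, date} = Date.new(year, month, day)
    date
  end
end
```

In a schema it could then be passed as `parser: &DateParserSketch.us_date!/1`, assuming the option accepts any one-argument function.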

Keep in mind

Compilation time grows with the number of CSV rows and with the number of fields configured for function generation (key, unique and/or filter_by set to true); in the measurements below it grows faster than linearly with the row count. Because "without data you're just another person with an opinion", here are some numbers:

CSV rows  key   unique  filter_by  compile time
1_000     no    0       0          22 ms
1_000     yes   1       1          19 ms
1_000     yes   2       2          21 ms
1_000     yes   2       4          29 ms
1_000     yes   2       0          15 ms
1_000     yes   0       4          26 ms
1_000     no    2       0          12 ms
1_000     no    0       4          22 ms
5_000     no    0       0          555 ms
5_000     yes   1       1          1_695 ms
5_000     yes   2       2          2_341 ms
5_000     yes   2       4          3_273 ms
5_000     yes   2       0          1_976 ms
5_000     yes   0       4          2_698 ms
5_000     no    2       0          1_559 ms
5_000     no    0       4          2_146 ms
10_000    no    0       0          1_701 ms
10_000    yes   1       1          3_624 ms
10_000    yes   2       2          5_169 ms
10_000    yes   2       4          6_988 ms
10_000    yes   2       0          4_279 ms
10_000    yes   0       4          5_638 ms
10_000    no    2       0          3_278 ms
10_000    no    0       4          4_846 ms

Each time is the average of 5 compilations.

Measured on my machine:

Lenovo Thinkpad T480
CPU: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
RAM: 32GB
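We have not confirmed how Csv.Schema generates its functions. One common Elixir technique, sketched below under that caveat, would explain why compile time grows with both the row count and the number of flagged fields: generating one function clause per CSV row for each flagged field with unquote fragments. The module name `ClauseGenSketch` and its data are hypothetical:

```elixir
defmodule ClauseGenSketch do
  # Hypothetical illustration, NOT necessarily how Csv.Schema works.
  # For each flagged field, one clause per row is generated at compile
  # time, so R rows and F flagged fields give roughly R * F clauses.
  @rows [
    %{id: 1, email: "ioverstreet0@businessweek.com"},
    %{id: 2, email: "uvasnev1@vkontakte.ru"},
    %{id: 3, email: "cfreemantle2@parallels.com"}
  ]

  # Key field: one by_id/1 clause per row, matching the literal id.
  for row <- @rows do
    def by_id(unquote(row.id)), do: unquote(Macro.escape(row))
  end

  def by_id(_), do: nil

  # Unique field: one by_email/1 clause per row, matching the literal email.
  for row <- @rows do
    def by_email(unquote(row.email)), do: unquote(Macro.escape(row))
  end

  def by_email(_), do: nil
end
```

Lookups then become plain pattern matches at runtime, at the cost of the compiler having to process every generated clause.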