erlarg - v1.0.0

An Erlang lib that parses a list of arguments into structured data.
Useful for handling the options/parameters of an escript.

Installation

Add erlarg to the deps of your rebar.config:

{deps, [{erlarg, "1.0.0"}]}.
% or
{deps, [{erlarg, {git, "https://github.com/Eptwalabha/erlarg.git", {tag, "v1.0.0"}}}]}.

If you're building an escript, add erlarg to the list of apps to include in the binary:

{escript_incl_apps, [erlarg]}.

Fetch and compile the dependencies of your project:

rebar3 compile --deps_only

That's it, you're good to go.

How does it work?

Imagine this command:

./my-script --limit=20 -m 0.25 --format "%s%t" -o output.tsv -

The main/1 function of my-script will receive this list of arguments:

["--limit=20", "-m", "0.25", "--format", "%s%t", "-o", "output.tsv", "-"]

The function erlarg:parse will help you convert them into structured data:

main(Args) ->
    Syntax = {any, [erlarg:opt({"-l", "--limit"}, limit, int),
                    erlarg:opt({"-f", "--format"}, format, binary),
                    erlarg:opt("-o", file, string),
                    erlarg:opt("-", stdin),
                    erlarg:opt({"-m", "--max"}, max, float)
                   ]},
    {ok, {Result, RemainingArgs}} = erlarg:parse(Args, Syntax),
    ...

For this example, parse will return this proplist:

 % Result
[{limit, 20},
 {max, 0.25},
 {format, <<"%s%t">>},
 {file, "output.tsv"},
 stdin].

The functions erlarg:parse/2 and erlarg:parse/3 transform a list of arguments into structured data.
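
Because the result is a proplist, it can be read with the standard proplists module. The following is a minimal sketch: the module name read_result, the function summary/1 and the fallback limit of 10 are illustrative assumptions and are not part of erlarg.

```erlang
-module(read_result).
-export([summary/1]).

%% Result is a proplist shaped like the one returned in the example above.
summary(Result) ->
    #{limit => proplists:get_value(limit, Result, 10), % 10 is an arbitrary default
      file  => proplists:get_value(file, Result),      % undefined when -o is absent
      stdin => proplists:get_bool(stdin, Result)}.     % a bare atom counts as true
```

proplists:get_bool/2 is convenient for options without parameters, since they appear in the result as a bare atom such as stdin.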

Syntax

The syntax describes to the parser how to handle each argument (Args). The parser consumes the arguments one by one while building the structured data.

A syntax can be a type, a named parameter, an option, an alias, an operator or a custom type.

It can be pretty complex, but for now, let's go simple.

Imagine this fictional script print_n_time that takes a string and an integer as arguments:

# this will print the string "hello" 3 times
$ print_n_time hello 3

Here's the simplest spec needed to handle the arguments:

Syntax = [string, int].
erlarg:parse(Args, Syntax). % here Args = ["hello", "3"]
{ok, {["hello", 3], []}} % erlarg:parse/2 result

We explicitly asked the parser to handle two arguments: the first must be a string, the second must be an int.
If the parsing is successful, it will return the following tuple:

{ok, {Data, RemainingArgs}}.

Where Data is the structured data generated by the parser (["hello", 3]) and RemainingArgs is the list of arguments not consumed by the parser ([]).

Parsing failure

If the parser encounters a problem with an argument, it will fail and return the nature of the problem:

> erlarg:parse(["world"], [int]).
{error, {not_int, "world"}} % it failed to convert the word "world" into an int

or

> erlarg:parse(["one"], [string, string]). % expect two strings but only got one
{error, {missing, arg}}

[!TIP] These errors can be used to explain to users what's wrong with the command they typed.
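
One way to turn these error reasons into messages is sketched below. The module name arg_errors and the function explain/1 are hypothetical helpers, not part of erlarg, and only the two reasons shown above are handled explicitly; any other reason is printed as a raw term.

```erlang
-module(arg_errors).
-export([explain/1]).

%% Turn the Reason of an {error, Reason} tuple into a message for the user.
explain({not_int, Arg}) ->
    lists:flatten(io_lib:format("expected an integer but got '~ts'", [Arg]));
explain({missing, arg}) ->
    "an argument is missing";
explain(Reason) ->
    %% Fallback for reasons not documented above.
    lists:flatten(io_lib:format("invalid arguments: ~p", [Reason])).

%% Possible use in an escript's main/1 (assuming erlarg is on the code path):
%%   case erlarg:parse(Args, [string, int]) of
%%       {ok, {[Text, Count], _Remaining}} ->
%%           print_n_time(Text, Count); % hypothetical function
%%       {error, Reason} ->
%%           io:format(standard_error, "~ts~n", [arg_errors:explain(Reason)]),
%%           halt(1)
%%   end.
```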

Remaining Args

Remaining args are the arguments not consumed by the parser when it terminates successfully.
If we add some extra arguments at the end of our command:

$ print_n_time hello 3 some extra arguments

This time, calling erlarg:parse/2 with the same syntax as before will give this result:

Syntax = [string, int].
{ok, {_, RemainingArgs}} = erlarg:parse(Args, Syntax).
["some", "extra", "arguments"] % RemainingArgs

The parser consumes the first two arguments; the remaining arguments are returned in RemainingArgs.

[!NOTE] Having unconsumed arguments does not generate an error

Types

The parser can convert the argument to more types than just string and int.
Here are all the types currently available:

| syntax | arg | result | note |
|--------|-----|--------|------|
| int | "1" | 1 | - |
| int | "1.2" | error | not an int |
| float | "1.2" | 1.2 | - |
| float | "1" | 1.0 | cast int into float |
| float | "1.234e2" | 123.4 | - |
| number | "1" | 1 | - |
| number | "1.2" | 1.2 | - |
| string | "abc" | "abc" | - |
| binary | "äbc" | <<"äbc"/utf8>> | use unicode:characters_to_binary |
| atom | "super-top" | 'super-top' | - |

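The bool conversion:

| arg | bool | note |
|-----|------|------|
| "true" | true | case insensitive |
| "yes" | true | |
| "abcd" | true | any non-empty string |
| "1" | true | |
| "0.00001" | true | |
| "false" | false | case insensitive |
| "no" | false | |
| "" | false | empty-string |
| "0" | false | |
| "0.0" | false | |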

[!TIP] Converting an argument into string, binary, bool or atom will always succeed.
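
To make the bool rules concrete, here is a standalone sketch that reproduces the rows of the table using only the standard library. It is not erlarg's actual implementation: the module name bool_sketch and the function to_bool/1 are hypothetical, and treating "yes" and "no" as case insensitive is an assumption (the table only states it for "true" and "false").

```erlang
-module(bool_sketch).
-export([to_bool/1]).

%% Reproduces the table above: "", "false", "no" and numeric zero are false,
%% every other string is true.
to_bool(Arg) ->
    case string:lowercase(Arg) of
        ""      -> false;
        "false" -> false;
        "no"    -> false;
        Lower   -> not is_zero(Lower)
    end.

%% True only when the whole string is a number equal to zero ("0", "0.0").
is_zero(Str) ->
    case string:to_float(Str) of
        {Float, []} when is_float(Float) ->
            Float == 0;
        _ ->
            case string:to_integer(Str) of
                {Int, []} when is_integer(Int) -> Int =:= 0;
                _ -> false
            end
    end.
```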

If you need a more complicated "type", see the chapter on Custom types.

Naming parameters

Converting an argument into a specific type is important, but it doesn't really help us understand what these values are for:

> Syntax = [string, int].
> {ok, {Result, _}} = erlarg:parse(["hello", "3"], Syntax).
["hello", 3]. % Result

To avoid this issue, you can give a name to the parsed parameters with the following syntax:

{Name :: atom(), Type :: base_type()}

If we rewrite the syntax as such:

Syntax = [{text, string}, {nbr, int}].
{ok, {Result, _}} = erlarg:parse(["hello", "3"], Syntax).
[{text, "hello"}, {nbr, 3}] % Result

You can even name a list of parameters if you want:

Syntax = [{a, [string, {a2, float}]}, {b, binary}],
{ok, {Result, _}} = erlarg:parse(["abc", "2.3", "bin"], Syntax).
[{a, ["abc", {a2, 2.3}]}, {b, <<"bin">>}] % Result

Options

Naming and casting parameters into types is neat, but most programs use options. An option is an argument that usually (not always…) starts with a dash and has zero or more parameters.

$ date -d --utc --date=STRING

An option can have several formats: a short one (a dash followed by a letter, e.g. -v) and/or a long one (a double dash followed by a word, e.g. --version).

This table summarizes the formats handled/recognized by the parser:

| format | note |
|--------|------|
| -s | |
| -s VALUE | |
| -sVALUE | same as -s VALUE |
| -abc VALUE | same as -a -b -c VALUE |
| -abcVALUE | same as -a -b -c VALUE |
| --long | |
| --long VALUE | |
| --long=VALUE | |

In this chapter, we'll see how to tell the parser to recognise three kinds of options: options without parameters, options with parameters, and options with sub-options.

option without parameter

$ grep -v "bad"
$ grep --invert-match "bad"

We can define this option with erlarg:opt like so:

> Syntax = [erlarg:opt({"-v", "--invert-match"}, invert_match)].
> {ok, {Result, _}} = erlarg:parse(["-v"], Syntax).
[invert_match] % Result

The first parameter of erlarg:opt is the option:

{"-s", "--long"} % short and long options
"-s" % only short option
{"-s", undefined} % same as above
{undefined, "--long"} % only long option

The second parameter is the name of the option, in this case invert_match

option with parameter(s)

Options can have parameters:

$ date --date 'now -3 days'
$ date --date='now -3 days'
$ date -d'now -3 days'

> Syntax = [erlarg:opt({"-d", "--date"}, date, string)].
> {ok, {Result, _}} = erlarg:parse(["--date", "now -3 days"], Syntax).
[{date, "now -3 days"}] % Result

The third parameter is the syntax of the parameters expected by the option. In this case after matching the argument --date this option is expecting a string ("now -3 days").

Maybe one of the options of your program expects two parameters? No problem:

erlarg:opt({"-d", "--dimension"}, dimension, [int, string]).
[{dimension, [3, "inch"]}] % Result for "-d 3 inch"

You can even name the parameters:

erlarg:opt({"-d", "--dimension"}, dimension, [{size, int}, {unit, string}]).
[{dimension, [{size, 3}, {unit, "inch"}]}] % Result for "-d 3 inch"

option with sub-option(s):

Because the third parameter is a syntax, and because an option is itself a syntax, you can nest options inside an option:

$ my-script --opt1 -a "param of a" -b "param of opt1" --opt2 …

In this fictional program, the option --opt1 has two sub-options (-a, which expects a parameter, and -b, which doesn't). We can define opt1 this way:

Opt1 = erlarg:opt({"-o", "--opt1"}, % option
                  opt1, % option's name
                  [erlarg:opt("-a", a, string), % sub-option 1
                   erlarg:opt("-b", b),  % sub-option 2
                   {value, string} % the param under the name 'value'
                  ]).
{ok, {Result, _}} = erlarg:parse(["--opt1", "-a", "abc", "-b", "def"], Opt1).
[{opt1, [{a, "abc"}, b, {value, "def"}]}] % Result

Well… that's quite unreadable… fortunately, you can use Aliases to avoid this mess.

Aliases

Aliases let you define all your options, sub-syntaxes and custom types in a map. This helps keep the Syntax clear and readable.

Aliases = #{
    option1 => erlarg:opt({"-o", "--opt1"}, opt1, [opt_a, opt_b, {value, string}]),
    option2 => erlarg:opt({undefined, "--opt2"}, opt2),
    opt_a => erlarg:opt("-a", a, string),
    opt_b => erlarg:opt("-b", b)
},
Syntax = [option1, option2],
{ok, {Result, _}} = erlarg:parse(["--opt1", "-a", "abc", "-b", "def", "--opt2"],
                                 Syntax, Aliases).
[{opt1, [{a, "abc"}, b, {value, "def"}]}, opt2] % Result

Here Syntax is a list of two aliases, option1 and option2.

Syntax operators

An operator tells the parser how to handle a list of syntaxes.

sequence operator

Take the following syntax:

[opt({"-d", "--date"}, date, string), opt({"-u", "--utc"}, utc)]

It would parse this command without problem:

$ date -d "now -3 days" --utc # yay!

But it will fail with this one:

$ date --utc --date="now -3 days" # boom !

Why? Aren't these two commands identical?
That's because the parser treats a list of syntaxes as a sequence operator:

[syntax1, syntax2, …]

A sequence expects the arguments to match in the same order as the elements of the list: the first argument must match syntax1, the second syntax2, and so on. If any element fails, the whole sequence fails.

All elements of the list must succeed in order for the operator to succeed.

| syntax | args | result | note |
|--------|------|--------|------|
| [int, string] | ["1", "a"] | [1, "a"] | |
| [int] | ["1", "a"] | [1] | remaining: ["a"] |
| [int, int] | ["1", "a"] | error | "a" isn't an int |
| [int, string, int] | ["1", "a"] | error | missing a third argument |

So how do we parse arguments when we're not sure of their order? Moreover, some options are… optional! That's where the any operator comes into play.

any operator

format:

{any, [syntax1, syntax2, …]}

The parser will try to consume arguments as long as one of the syntaxes matches. If an element of the syntax fails, the operator fails.

| syntax | args | result | note |
|--------|------|--------|------|
| {any, [int]} | ["1", "2", "abc"] | [1, 2] | remaining: ["abc"] |
| {any, [{key, int}]} | ["1", "2"] | [{key, 1}, {key, 2}] | |
| {any, [int, {s, string}]} | ["1", "2", "abc", "3"] | [1, 2, {s, "abc"}, 3] | |
| {any, [string]} | ["1", "-o", "abc", "3"] | ["1", "-o", "abc", "3"] | even if "-o" is an option |

No matter how many elements match, any will always succeed. If nothing matches, no arguments will be consumed.

[!NOTE] Keep in mind that if the list given to any contains types like string or binary, it will consume all the remaining arguments.
For example, in {any, [string, custom_type]}, custom_type will never be executed because the string type always consumes the argument.
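
Going back to the date example from the sequence section, wrapping the same two options in any should let them appear in either order, or be left out. This is a sketch based on the description above rather than a verified session; the module name date_syntax and the function parse/1 are illustrative, and erlarg is assumed to be on the code path.

```erlang
-module(date_syntax).
-export([parse/1]).

%% Same two options as in the sequence example, wrapped in `any`
%% so that the parser no longer depends on their order.
parse(Args) ->
    Syntax = {any, [erlarg:opt({"-d", "--date"}, date, string),
                    erlarg:opt({"-u", "--utc"}, utc)]},
    erlarg:parse(Args, Syntax).

%% Both of these calls are expected to return {ok, {Result, Remaining}}:
%%   date_syntax:parse(["-d", "now -3 days", "--utc"]).
%%   date_syntax:parse(["--utc", "--date=now -3 days"]).
```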

first operator

format:

{first, [syntax1, syntax2, …]}

The parser will return the first element of the syntax to succeed. It'll fail if no element matches.
The following table uses Args = ["1", "2", "a", "3", "b"]:

| syntax | result | remaining |
|--------|--------|-----------|
| {first, [int]} | [1] | ["2", "a", "3", "b"] |
| {first, [{opt, int}]} | [{opt, 1}] | ["2", "a", "3", "b"] |
| {any, [int, {b, binary}]} | [1, 2, {b, <<"a">>}, 3, {b, <<"b">>}] | [] |
| {any, [string]} | ["1", "2", "a", "3", "b"] | [] |

Custom types

Sometimes you need to perform some operations on an argument or do more complex checks. This is what custom types are for.
A custom type is a function that takes a list of arguments and returns the formatted / checked value to the parser:

-spec custom_type(Args) -> {ok, Value, RemainingArgs} | Failure when
    Args :: args(),
    Value :: any(),
    RemainingArgs :: args(),
    Failure :: any().

Example 1:
Let's say your script has an option -f FILE where FILE must be an existing file. In this case the type string won't be enough. You could write your own function to perform this check:

existing_file([File | RemainingArgs]) ->
    case filelib:is_regular(File) of
        true -> {ok, File, RemainingArgs};
        _ -> {not_a_file, File}
    end.

To use your custom type, register it as an alias:

Aliases = #{
    file => erlarg:opt({"-f", "--file"}, file, existing_file),
    existing_file => fun existing_file/1
},
Syntax = {any, [file]},
erlarg:parse(Args, Syntax, Aliases).

or directly as a syntax:

Syntax = {any, [erlarg:opt({"-f", "--file"}, file, fun existing_file/1)]}.

Example 2:
In this case, your script needs to fetch the information of a particular user from a config file with the option --consult USERS_FILE USER_ID, where USERS_FILE is the file containing the users' data and USER_ID is the id of the user:

get_user_config([DatabaseFile, UserID | RemainingArgs]) ->
    case file:consult(DatabaseFile) of
        {ok, Users} ->
            case proplists:get_value(UserID, Users, not_found) of
                not_found -> {user_not_found, UserID};
                UserData -> {ok, UserData, RemainingArgs}
            end;
        Error -> {cannot_consult, DatabaseFile, Error}
    end;
get_user_config(_) ->
    {badarg, missing_arguments}.
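
Because a custom type is a plain function, it can be written and tested on its own, without going through the parser. As a further illustration of the contract described above, here is a hypothetical type that accepts a TCP port number. The module name port_type, the function port/1 and the failure terms are assumptions made for this sketch and are not part of erlarg.

```erlang
-module(port_type).
-export([port/1]).

%% Consumes one argument and checks that it is an integer in 1..65535.
port([Arg | RemainingArgs]) ->
    case string:to_integer(Arg) of
        {Port, []} when is_integer(Port), Port >= 1, Port =< 65535 ->
            {ok, Port, RemainingArgs};
        {Port, []} when is_integer(Port) ->
            {out_of_range, Port};
        _ ->
            %% Not a number, or trailing characters after the digits.
            {not_a_port, Arg}
    end;
port([]) ->
    {missing, port}.
```

It could then be used like existing_file above, for example as the parameter syntax of an option.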