nested_access
Introduction
This library contains Erlang functions to access and update deeply nested data structures.
Lets assume a deeply nested data structure like this:
Data = {1,
2,
{
a,
b,
#{a => 42, b => 0},
d
},
4}.
To retrieve the value 42 from Data,
we would typically use lookup functions or pattern matching as shown below:
> D2 = element(3, Data).
> M3 = element(3, D2).
> maps:get(a, M3).
42
> {_, _, D2, _} = Data.
> {_, _, M3, _} = D2.
> #{a := Val} = M3.
> Val.
42With the current library this can be simplified to a "path" based lookup:
> na:get(Data, [3, 3, a]).
42Replacing, updating and removing content also becomes similarly terse. This comes with the added benefit that we no longer need to explicitly re-build the data structure to propagate the change up along the path (see Example 2 - 4).
Paths
All functions in the na module, take a "path" as input,
e.g. [3, 3, a] as in the example above. This path is used to locate the
desired leaf element or branch, in a nested data structure.
The path is a list, typically containing a mix of index and key values, needed to traverse the elements of a nested data structure.
The na functions support traversal of any
proplist(), map(), tuple() and record(), found in a nested data
structure, using keys or indexes (as appropriate for the type).
Note that the na module relies on Erlang runtime type checks to detect
the types of elements.
This detection is somewhat limited and coarse-grained, so an accessor()
can be supplied as a path element.
This allows traversal of unsupported types and custom traversal of
supported types.
An accessor() consist of one or more funs, that specify
how to access and/or update a specific piece of data. See the na docs
for details.
In practice it should be uncommon to have to use accessors, as most nested data structures are built from maps, proplists, tuples and records - lists, sets, dicts, arrays and the like are typically located at the end of a branch, so will normally only show up as the destination of a path, rather than being something that needs to be traversed.
Note that any list() is assumed to be a proplist().
Use the na_lists module to create accessors to do "normal" index based
list access.
Note: It may seem somewhat odd to treat all lists as proplists,
but this is done because type checks can't distinguish between the two,
so the most common type is used as the default.
When dealing with paths you're more likely to traverse a
[{Key, Value}, ...] style lists (i.e. a proplist) rather than a "regular"
[V1, V2, ...] style list, so Key based lookup is more useful than
index based lookup.
Example 1 - na:get()
Example of accessing a proplist() where each entry contains a map():
> Users = [{john, #{name => "John", age => 27, languages => ["Erlang", "Ruby", "Elixir"]}},
{mary, #{name => "Mary", age => 29, languages => ["Elixir", "F#", "Clojure"]}}].
> na:get(Users, [john, age]).
27
> na:get(Users, [john, languages, na_lists:get_at(1)]).
"Erlang"
Note the use of na_lists:get_at(1) to get an accessor() to access the
list() data.
The data structure above is fairly shallow, but we can still see that each path entry replaces one line of lookup code, so a call like:
na:get(Users, [john, languages, na_lists:get_at(1)])replaces a multi-line lookup like:
User = proplists:get_value(john, Users),
#{languages := Lang} = User,
lists:nth(1, Lang)Example 2 - na:replace()
With the following record definitions:
-record(level3, {a, b, c}).
-record(level2, {a, b, c = #level3{}}).
-record(level1, {a, b = #level2{}, c}).
Where Rec is set as Rec = #level1{} and we want to do "Rec.b.c.a = 42",
we can either do:
na:replace(Rec, [#level1.b, #level2.c, #level3.a], 42)or the much more verbose:
%% "unpack" data structure
RL2 = Rec#level1.b,
RL3 = RL2#level2.c,
%% insert 42 and rebuild data structure
UL3 = RL3#level3{a = 42},
UL2 = RL2#level2{c = UL3},
Rec#level1{b = UL2}Reminder: Erlang implements records as tuples.
The syntax #<record name>.<field name> is used to get the tuple index of a
record field. The index is generated at compile time, so there no runtime cost.
Note: for a debugging one-off, if might be simpler to just supply the indexes directly.
na:replace(Rec, [3, 4, 2], 42)Example 3 - na:remove()
na:remove() removes a specific leaf or branch from a data structure, in the
example below the age field is removed from mary:
> Users = [{john, #{name => "John", age => 27, languages => ["Erlang", "Ruby", "Elixir"]}},
{mary, #{name => "Mary", age => 29, languages => ["Elixir", "F#", "Clojure"]}}].
> na:remove(Users, [mary, age]).
[{john, #{name => "John", age => 27, languages => ["Erlang","Ruby","Elixir"]}},
{mary, #{name => "Mary", languages => ["Elixir","F#","Clojure"]}}]Note: be aware that na:remove() can't be applied to all data types,
see the na module documentation for details.
Example 4 - na:modify()
na:modify() is a more flexible version of na:replace(), as the last argument
is a fun(), rather than a simple replacement value.
This allows for arbitrary transformations of the data referenced by the path.
Assuming the following data structure:
Users = [{john, #{name => "John", age => 27, languages => ["Erlang", "Ruby", "Elixir"]}},
{mary, #{name => "Mary", age => 29, languages => ["Elixir", "F#", "Clojure"]}}].Then the following (and more) can be done:
Add "C" to the to the programming languages known by john:
> na:modify(Users, [john, languages], fun(L) -> ["C" | L] end).
[{john,#{name => "John",age => 27, languages => ["C", "Erlang", "Ruby", "Elixir"]}},
{mary,#{name => "Mary",age => 29, languages => ["Elixir", "F#", "Clojure"]}}]
Sort the languages of mary:
> na:modify(Users, [mary, languages], fun(L) -> lists:sort(L) end).
[{john, #{name => "John", age => 27, languages => ["Erlang", "Ruby", "Elixir"]}},
{mary, #{name => "Mary", age => 29, languages => ["Clojure", "Elixir", "F#"]}}]na:modify() can also be used to remove elements, though na:remove() would
generally be more convenient:
> na:modify(Users, [mary], fun(M) -> maps:remove(age, M) end).
[{john, #{name => "John", age => 27, languages => ["Erlang","Ruby","Elixir"]}},
{mary, #{name => "Mary", languages => ["Elixir","F#","Clojure"]}}]Note: it's recommended to use na:replace() and na:remove() rather
than na:modify() when possible. This will be slightly terser and make the
intent more obvious. It should also be slightly faster, as no fun needs to be
created or invoked.
Example 5 - combining calls
In some cases data needs to be read from one place and written to another.
To do this na:get() and na:replace() / na:modify() can be combined,
as in the example below:
> M = #{c => #{scores => [1, 5, 3, 2]}, d => #{avg => undefined}}.
> Scores = na:get(M, [c, scores]).
> Avg = lists:sum(Scores) / length(Scores).
> na:replace(M, [d, avg], Avg).
#{c => #{scores => [1, 5, 3, 2]}, d => #{avg => 2.75}}Usage Recommendations
The na functions are designed to act on select parts, of a
deeply nested data structure.
If the whole data structure needs to be "walked",
then it's best to write custom code to do this, to ensure that each tree node
is only visited once. Iterating over a number of mostly identical paths,
using na functions, will result in repeated visits to the same tree nodes.
It's preferable to act on "larger" elements (e.g. a list of test scores as in
the previous example), as this minimises the cost of the na function call.
While na handles proplists more correctly than many Erlang/OTP functions
(used to manipulate proplists) it still assumes that proplists
are essentially maps, causing some limitations:
- Only the first matching key-val gets accessed, even if the same key occurs more than once.
-
Only
{Key, Val}tuples and atoms (treated as{Atom, true}) are matched against when accessing the proplist. Other elements are ignored but preserved.
As documented in the na_proplists_helper docs, proplists can also have
other tuples in their lists.
[{Key1}, {Key2, 1, 2}] is e.g. a valid proplist().
Note: a custom accessor() would need to be supplied,
to pick between multiple key matches in a proplist or
between multiple values in a tuple.
It should be noted that the na functions usually come with a notable overhead,
especially when compared to using traditional lookup/update code run
directly inside of the current function
(see the Running the Benchmarks
section for details).
This may limit the viability of na functions in performance critical code.
Note: performance should be a non-issue (except in very special cases)
when using na functions for debugging, runtime exploration or in test suites.
Design choices
This library aims to be easy to use from the Erlang shell - this is why
the na functions don't rely on include files, macros,
parse transforms or similar Erlang compile time features.
The na API is intentionally kept simple, to make it easy to learn.
The na.erl implementation aims to be as performant as possible.
This is also why the functions in na.erl tend to avoid abstractions e.g.
helper functions, as this would add to the already notably overhead of the
na functions.
Working with the repo
An optional Makefile is included to cut down on rebar3 command verbosity.
The Makefile also acts as an informal reference, for useful rebar
commands.
Running the Benchmarks
The benchmark directory contains several benchmark modules.
These can be run to see the performance difference between
na functions vs various forms of traditional access code.
To compile the benchmark files and start an interactive Erlang shell, do:
make bcompile
make bshellTo run the individual benchmark modules, use:
na_benchmark_get:run().
na_benchmark_replace:run().
Note that there are also run/1 and run/2 variants, to control how the tests
are run - see the benchmark modules for details.
In regards to the benchmarks:
na_benchmark_get.erlbenchmarksna:get()i.e. the performance of accessing, an element within a nested data structure.na_benchmark_replace.erlbenchmarksna:replace()i.e. the performance of accessing an element and then propagating the change up through the data structure.na:modify()andna:remove()don't have benchmarks, but should perform similar tona:replace()as they mostly work the same way.
Note that the benchmarks generally represent a worst case scenario for the
na functions:
- The traditional code is normally run in the local scope.
-
In comparison the
nafunctions do one recursivenafunction call per path element. Each function call also has to do a number of type checks, to determine what kind of access to do, at each path step. - Also note that the traditional code can often make more assumptions, about the the data that it is working on, so may end up being able to access/rebuild data in a more efficient way.
The benchmarks result below are from running on a
2019 MacBook Pro 16", 2.6GHz 6 Core Intel i7, with macOS Sonoma 14.7,
using Erlang/OTP 28.0.2 (with JIT), where the benchmark run/1 functions where used
with Dups = 100 (default is 10):
| test | na:get() | na:replace() |
|---|---|---|
| Mixed nested types | 2.3 times slower | 1.2 times slower. |
| Mixed (only accessors in path) | 2.8 times slower | - |
| Nested lists | 2.4 to 13.6 times slower | 1.1 to 20.6 times slower |
| Nested records/tuples | 8.2 to 10.3 times slower | 27.8 to 36.0 times slower |
| Nested maps | 3.1 to 3.2 times slower | x4.2 faster to x3.2 slower |
| Nested proplists | 1.02 times slower | 1.02 times slower |
Using Erlang/OTP 28.0, compiled without JIT support, yields somewhat different numbers:
| test | na:get() | na:replace() |
|---|---|---|
| Mixed nested types | 2.6 times slower | 1.8 times slower. |
| Mixed (only accessors in path) | 3.6 times slower | - |
| Nested lists | 2.2 to 8.7 times slower | 1.3 to 17.1 times slower |
| Nested records/tuples | 8.2 times slower | 24.7 to 24.8 times slower |
| Nested maps | 2.3 to 3.6 times slower | 2.2 to 3.5 times slower |
| Nested proplists | 1.4 times slower | 1.6 times slower |
While the specific benchmark results should be taken with a grain of salt, as they depend on the specifics of individual tests, there are are few general take-aways:
nafunctions will most likely be several times slower than traditional code.-
lists, records, tuples and maps are generally accessed via BIFs or similar
low level operators, that take very little time, so the
nafunction call and type check overhead tends to be very noticable. -
Using
accessor()path elements (rather than regular indexes and keys), as done inna_benchmark_get:get_mixed_accessor_only()will add additional slowdown, due to the added fun() call overhead. -
Note that proplist access mostly uses regular Erlang functions - this makes
the
naoverhead less noticeable.
History
This project was initially inspired by the Elixir (non-macro) access functions,
like Kernel.get_in/2 and Kernel.update_in/3.
The initial Erlang API mimicked the Elixir API. This changed over time, reflecting various differences between the languages and the implementations (the Erlang implementation is not based on the Elixir one).
Certain features like being able to return multiple elements in
Kernel.get_in/2 or returning nil on failed lookup (rather than crash),
where never considered for the Erlang version. Supporting these would have
made the main use-case ("access single element at path") slower,
due to the extra code that would have had to be executed.
Future ideas
Parse transforms could be used to support Lisp/Elixir style macros.
This would allow for a function call like:
Result = nap:get(Data, [{tuple, 2}, {list, N}, {map, "foo"}]).to be replaced by a series of local lookup calls, i.e. something like:
Temp1 = element(2, Data),
Temp2 = lists:nth(N, Temp1),
Result = maps:get("foo", Temp2),
This allows for Erlang "native" performance, but comes at the cost
of having to know the types and path elements at compile time.
This means that the types (tuple, list and map in the example above)
must be literals. This also holds true for the list elements.
This is so that the parse transform can inspect
the code (at compile time) to pick the appropriate replacement code.
Note that the lookup key/index can still be a variable - see the use of
N in the example above.
Also note that a large number of types could easily be supported, as types are user supplied, rather than detected at runtime.
A unique module name (e.g. nap) should be used, to make it easier to identify
what function calls should be parse transformed.
Elvis
This project also contains an elvis.config file that is used by Elvis
(an Erlang linter).
The config is configured to run as many Elvis rules as possible, while also removing the need to use Elvis exceptions to mute undesirable warnings. This is achieved by omitting specific Elvis rules, that are prone to false positives.
For more on Elvis see:
- https://github.com/inaka/elvis
- https://github.com/inaka/elvis_core
- https://github.com/project-fifo/rebar3_lint
Wrap Up
Thanks goes out to Erlang Solutions for letting me worked on random projects (like this one), while being on the "bench" between consultancy projects.
This project has also given me the time to delve into ANSI escape codes. These allow for the fancy formatting and progress-bar animations, as seen in the shell, when running the benchmarks.