Aja

Hex VersiondocsCI

Extension of the Elixir standard library focused on data stuctures and data manipulation.

Data structures

"there is one aspect of functional programming that no amount of cleverness on the part of the compiler writer is likely to mitigate — the use of inferior or inappropriate data structures." -- Chris Okasaki

Persistent vectors: A.Vector

Clojure-like persistent vectors are an efficient alternative to lists, supporting many operations like appends and random access in effective constant time.

iex> vector = A.Vector.new(1..10)
#A<vec([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])>
iex> A.Vector.append(vector, :foo)
#A<vec([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, :foo])>
iex> vector[3]
4
iex> A.Vector.replace_at(vector, -1, :bar)
#A<vec([1, 2, 3, 4, 5, 6, 7, 8, 9, :bar])>
iex> 3 in vector
true

A.Vector is blazing fast and easier to use from Elixir than Erlang's :array module.

A.Vector reimplements many of the functions from the Enum module specifically for vectors, with efficiency in mind.

The A.vec/1 and A.vec_size/1 macros, while being totally optional, can make it easier to work with vectors and make pattern-matching possible:

iex> import A
iex> vec([a, 2, c, _d, e]) = A.Vector.new(1..5)
#A<vec([1, 2, 3, 4, 5])>
iex> {a, c, e}
{1, 3, 5}
iex> match?(v when vec_size(v) > 9, vec(1..10))
true

The A.+++/2 operator can make appending to a vector more explicit:

iex> vec([1, 2, 3]) +++ vec([4, 5])
#A<vec([1, 2, 3, 4, 5])>

Ordered maps: A.OrdMap

The standard library does not offer any similar functionality:

iex> %{"one" => 1, "two" => 2, "three" => 3}
%{"one" => 1, "three" => 3, "two" => 2}
iex> ord_map = A.OrdMap.new([{"one", 1}, {"two", 2}, {"three", 3}])
#A<ord(%{"one" => 1, "two" => 2, "three" => 3})>
iex> ord_map["two"]
2
iex> Enum.to_list(ord_map)
[{"one", 1}, {"two", 2}, {"three", 3}]

Ordered maps behave pretty much like regular maps, and the A.OrdMap module offers the same API as Map. The convenience macro A.ord/1 make them a breeze to instantiate or patter-match upon:

iex> import A
iex> ord_map = ord(%{"一" => 1, "二" => 2, "三" => 3})
#A<ord(%{"一" => 1, "二" => 2, "三" => 3})>
iex> ord(%{"三" => three, "一" => one}) = ord_map
iex> {one, three}
{1, 3}

Red-Black Trees: A.RBMap and A.RBSet

Trees are useful when map keys or set elements need to be kept sorted.

iex> A.RBMap.new([b: "Bat", a: "Ant", c: "Cat", b: "Buffalo"])
#A.RBMap<%{a: "Ant", b: "Buffalo", c: "Cat"}>
iex> A.RBSet.new([5, 3, 4, 1, 2, 3, 1, 5])
#A.RBSet<[1, 2, 3, 4, 5]>

They offer similar functionalities as general balanced trees (:gb_trees and :gb_sets) included in the Erlang standard library. A.RBMap and A.RBSet should however be safer and more convenient to use while offering similar performance.

All data structures offer:

Utility functions

Sigil i for IO data

iex> import A
iex> ~i"atom: #{:foo}, charlist: #{&#39;abc&#39;}, number: #{12 + 2.35}\n"
["atom: ", "foo", ", charlist: ", &#39;abc&#39;, ", number: ", "14.35", 10]

Exclusive ranges: A.ExRange

iex> A.ExRange.new(0, 10) |> Enum.to_list()
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
iex> import A
iex> Enum.map(0 ~> 5, &"id_#{&1}")
["id_0", "id_1", "id_2", "id_3", "id_4"]

Don't Break The Pipe!

iex> %{foo: "bar"} |> A.Pair.wrap(:noreply)
{:noreply, %{foo: "bar"}}
iex> {:ok, 55} |> A.Pair.unwrap!(:ok)
55

Various other convenience helpers

iex> A.String.slugify("> \"It Was Me, Dio!!!\"\n")
"it-was-me-dio"
iex> A.Integer.decimal_format(1234567)
"1,234,567"
iex> A.Integer.div_rem(7, 3)
{2, 1}
iex> A.Enum.sort_uniq([1, 4, 2, 2, 3, 1, 4, 3])
[1, 2, 3, 4]
iex> A.List.repeatedly(&:rand.uniform/0, 3)
[0.40502929729990744, 0.45336720247823126, 0.04094511692041057]
iex> A.IO.iodata_empty?(["", []])
true

Nothing groundbreaking, but having these helpers to hand might save you the implementation and the testing, or bringing over a library just for this one thing.

Browse the API documentation for more details.

Installation

Aja can be installed by adding aja to your list of dependencies in mix.exs:

def deps do
  [
    {:aja, "~> 0.4.5"}
  ]
end

Documentation can be found at https://hexdocs.pm/aja.

About Aja

Inspirations

Goals

(* while fast compile time is a target, A.Vector, which is optimized for fast runtime at the expense of compile time, slows it down)

Non-goals

Resources

FAQ

How stable is it?

Aja is still pretty early stage. Some breaking changes are still to be expected.

However, many of its APIs are based on the standard library and should therefore remain fairly stable.

Besides, Aja is tested quite thoroughly both with unit tests and property-based testing (especially for data structures). This effort is far from perfect, but increases our confidence in the overall stability.

How is the performance?

Vectors

Most operations from A.Vector are much faster than Erlang's :array equivalents, and in some cases are even slightly faster than equivalent list operations (map, folds, join, sum...).

There is one exception where A.Vector can be slightly slower than :array: random access for small collections. For bigger collections however, the higher branching factor for vectors (16 vs 10) should close this gap.

Maps / sets

Performance for alternative maps/sets cannot match native maps or ETS (mutable state) which are written in native code.

However:

Aja data structures should work fine in most cases, but if you're considering them for performance-critical sections of your code, make sure to benchmark them and also consider alternatives, typically ETS if mutable state is acceptable.

Benchmarking is still a work in progress, but you can check the bench folder for more detailed figures.

Why is there a convenience macro for A.OrdMap but not for other structures?

There are actually two reasons for this:

  1. ordered maps would be unconvenient to initialize otherwise
  2. ordered maps can be pattern-matched upon due to their internal representation, tree-based structures cannot

1. Initialization with new/1:

Ordered maps are tricky to initialize, and A.OrdMap.new/1 is not convenient to do so. We cannot simply pass it a map, because the map will reorder the keys. We have to pass it a list of tuples, which is fine if keys are atoms, but feels messy and not readable otherwise.

Being a macro, A.ord/1 is able to read the code and preserve the order, without ever instanciating a map that would lose the order:

iex> A.OrdMap.new(%{"one" => 1, "two" => 2, "three" => 3})
#A<ord(%{"one" => 1, "three" => 3, "two" => 2})>
iex> ord(%{"one" => 1, "two" => 2, "three" => 3})
#A<ord(%{"one" => 1, "two" => 2, "three" => 3})>

A.RBMap.new/1, A.RBSet.new/1 ... do not face any similar constraints and wouldn't benefit from a macro.

2. Pattern-matching

Short answer: because the internal representation of ordered maps happens to use a map, it is possible to make A.ord/1 work as it does. Tree-based A.RBMaps cannot enjoy this treatment.

Longer answer: Elixir (Erlang) is limited in what can be pattern-matched upon, because it does not offer active patterns. While this is a fine decision that helps keeping the language simpler, it has the drawback of being tied to the internal representation of data structures.

Quoting Okasaki again, describing what might be called pattern-matching induced damage:

"Ironically, pattern matching — one of the most popular features in functional programming languages — is also one of the biggest obstacles to the widespread use of efficient functional data structures. The problem is that pattern matching can only be performed on data structures whose representation is known, yet the basic software-engineering principle of abstraction tells us that the representation of non-trivial data structures should be hidden. The seductive allure of pattern matching leads many functional programmers to abandon sophisticated data structures in favor of simple, known representations such as lists, even when doing so causes an otherwise linear algorithm to explode to quadratic or even exponential time."

Making pattern-matching work for trees would probably need to implement some kind of active pattern, that would imply to redefine alternative versions of def, case and =/2.

Does Aja try to do too much?

The Unix philosophy of "Do one thing and do it well" is arguably the right approach in many cases. Aja doesn't really follow it, but there are conscious reasons for going that direction.

While it might be possible later down the road to split some of its components, there is no plan to do so at the moment.

First, we don't think there is any real downside of shipping "too much": Aja is and aims to remain lightweight and keep a modular structure. You can just use what you need without suffering from what you don't.

This lodash-like approach has benefits too: it aims to ship with a lot of convenience while introducing only one flat dependency. This can help staying out of two extreme paths:

Finally, data structures can work more efficiently together than if they were separated libraries.

What are the next steps?

Nothing is set in stone, but the next steps will probably be:

Copyright and License

Aja is licensed under the MIT License.