Arrays

Arrays is a library to work with well-structured Arrays with fast random-element-access for Elixir, offering a common interface with multiple implementations with varying performance guarantees that can be switched in your configuration.

hex.pm versionDocumentationciCoverage Status

Installation

Arrays is available in Hex and can be installed by adding arrays to your list of dependencies in mix.exs:

def deps do
[
  {:arrays, "~> 2.1"}
]
end

Documentation can be found at https://hexdocs.pm/arrays.


Using Arrays

Some simple examples:

Constructing Arrays

By calling Arrays.new or Arrays.empty:

    iex> Arrays.new(["Dvorak", "Tchaikovsky", "Bruch"])
    #Arrays.Implementations.MapArray<["Dvorak", "Tchaikovsky", "Bruch"]>

    iex> Arrays.new(["Dvorak", "Tchaikovsky", "Bruch"], implementation: Arrays.Implementations.ErlangArray)
    #Arrays.Implementations.ErlangArray<["Dvorak", "Tchaikovsky", "Bruch"]>

By using Collectable:

    iex> [1, 2, 3] |> Enum.into(Arrays.new())
    #Arrays.Implementations.MapArray<[1, 2, 3]>
    iex> for x <- 1..2, y <- 4..5, into: Arrays.new(), do: {x, y}
    #Arrays.Implementations.MapArray<[{1, 4}, {1, 5}, {2, 4}, {2, 5}]>

Some common array operations:

    iex> words = Arrays.new(["the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"])
    #Arrays.Implementations.MapArray<["the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"]>
    iex> Arrays.size(words) # Runs in constant-time
    9
    iex> words[3] # Indexing is fast
    "fox"
    iex> words = put_in(words[2], "purple") # All of `Access` is supported
    #Arrays.Implementations.MapArray<["the", "quick", "purple", "fox", "jumps", "over", "the", "lazy", "dog"]>
    iex> # Common operations are available without having to turn the array back into a list (as `Enum` functions would do):
    iex> Arrays.map(words, &String.upcase/1) # Map a function, keep result an array
    #Arrays.Implementations.MapArray<["THE", "QUICK", "PURPLE", "FOX", "JUMPS", "OVER", "THE", "LAZY", "DOG"]>
    iex> lengths = Arrays.map(words, &String.length/1)
    #Arrays.Implementations.MapArray<[3, 5, 6, 3, 5, 4, 3, 4, 3]>
    iex> Arrays.reduce(lengths, 0, &Kernel.+/2) # `reduce_right` is supported as well.
    36

Concatenating arrays:

    iex> Arrays.new([1, 2, 3]) |> Arrays.concat(Arrays.new([4, 5, 6]))
    #Arrays.Implementations.MapArray<[1, 2, 3, 4, 5, 6]>

Slicing arrays:

    iex> ints = Arrays.new(1..100)
    iex> Arrays.slice(ints, 9..19)
    #Arrays.Implementations.MapArray<[10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]>

Rationale

Algorithms that use arrays can be used while abstracting away from the underlying representation. Which array implementation/representation is actually used, can then later be configured/compared, to make a trade-off between ease-of-use and time/memory efficiency.

Arrays itself comes with two built-in implementations:

By default, the MapArray implementation is used when creating new array objects, but this can be configured by either changing the default in your whole application, or by passing an option to a specific invocation of new/0-2, or empty/0-1.

Implementations provided by other libraries:

    iex> words = Arrays.new(["the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"])
    #Arrays.Implementations.MapArray<["the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"]>

Protocols

Besides being able to use all functions in the Arrays module, one can use the following protocols and behaviours with them:

Note: FunLand is an optional dependency of this library, so its functionality will only be available if :fun_land is also added to your mix.exs dependencies list.

Enumerable

    iex> myarray = Arrays.new([2, 1, 4, 2, 0])
    iex> Enum.sort(myarray)
    [0, 1, 2, 2, 4]
    iex> Enum.count(myarray)
    5
    iex> Enum.with_index(myarray)
    [{2, 0}, {1, 1}, {4, 2}, {2, 3}, {0, 4}]
    iex> Enum.slice(myarray, 1, 3)
    [1, 4, 2]

    iex> names = Arrays.new(["Ernie", "Bert", "Kermit"])
    iex> names |> Stream.map(&String.upcase/1) |> Enum.into(Arrays.new())
    #Arrays.Implementations.MapArray<["ERNIE", "BERT", "KERMIT"]>

    iex> foods = Arrays.new(["Cheese", "Strawberries", "Cookies"])
    iex> foods |> Enum.take(2)
    ["Cheese", "Strawberries"]

Collectable

    iex> [10, 20, 30, 40] |> Enum.into(Arrays.new())
    #Arrays.Implementations.MapArray<[10, 20, 30, 40]>

Access

    iex> arr = Arrays.new([1, 2, 3, 4])
    iex> arr = put_in(arr[2], 33)
    #Arrays.Implementations.MapArray<[1, 2, 33, 4]>
    iex> arr = update_in(arr[1], (&(&1 * -2)))
    #Arrays.Implementations.MapArray<[1, -4, 33, 4]>
    iex> arr = update_in(arr[-1], (&(&1 + 1)))
    #Arrays.Implementations.MapArray<[1, -4, 33, 5]>
    iex> {33, arr} = pop_in(arr[-2])
    iex> arr
    #Arrays.Implementations.MapArray<[1, -4, 5]>
    iex> {1, arr} = pop_in(arr[0])
    iex> arr
    #Arrays.Implementations.MapArray<[-4, 5]>
    iex> {5, arr} = pop_in(arr[-1])
    iex> arr
    #Arrays.Implementations.MapArray<[-4]>

    iex> arr2 = Arrays.new([10, 20, 30])
    iex> {20, arr2} = get_and_update_in(arr2[1], fn _ -> :pop end)
    iex> arr2
    #Arrays.Implementations.MapArray<[10, 30]>

square-bracket access, get_in, put_in and update_in are very fast operations. Unless pop/pop_in is used for the last element in the array, is a very slow operation, as it requires moving of all elements after the given index in the array.

Both positive indexes (counting from zero) and negative indexes (-1 is the last element, -2 the second-to-last element, etc.) are supported.

However, if positive_index > Arrays.size(array) or negative_index < -Arrays.size(array), an ArgumentError is raised:

    iex> arr = Arrays.new([1,2,3,4])
    iex> pop_in(arr[4])
    ** (ArgumentError) argument error

    iex> arr = Arrays.new([1,2,3,4])
    iex> pop_in(arr[-5])
    ** (ArgumentError) argument error

    iex> arr = Arrays.new([1,2,3,4])
    iex> Access.fetch(arr, 4)
    :error
    iex> Access.fetch(arr, -5)
    :error

    iex> arr = Arrays.new([1,2,3,4])
    iex> update_in(arr[8], fn x -> x * 2 end)
    ** (ArgumentError) argument error

    iex> arr = Arrays.new([1,2,3,4])
    iex> update_in(arr[-8], fn x -> x * 2 end)
    ** (ArgumentError) argument error

Insertable

    iex> arr = Arrays.new()
    iex> {:ok, arr} = Insertable.insert(arr, 42)
    iex> {:ok, arr} = Insertable.insert(arr, 100)
    iex> arr
    #Arrays.Implementations.MapArray<[42, 100]>

Extractable

    iex> Extractable.extract(Arrays.new())
    {:error, :empty}
    iex> {:ok, {3, arr}} = Extractable.extract(Arrays.new([1, 2, 3]))
    iex> arr
    #Arrays.Implementations.MapArray<[1, 2]>

FunLand.Reducible

Note: FunLand is an optional dependency of this library.

    iex> Arrays.new([1,2,3,4]) |> FunLand.reduce(0, &(&1+&2))
    10

FunLand.Mappable

    iex> Arrays.new([1, 2, 3, 4]) |> FunLand.Mappable.map(fn x -> x * 2 end)
    #Arrays.Implementations.MapArray<[2, 4, 6, 8]>

Arrays vs Lists

Elixir widely uses List as default collection type. Arrays have the folowing differences:

¹: Depending on the implementation, ‘fast’ is either O(1) (constant time, regardless of array size) or O(log(n)) (logarithmic time, becoming a constant amount slower each time the array doubles in size.)

The linear time many operations on lists take, means that the operation becomes twice as slow when the list doubles in size.

Implementing a new Array type

To add array-functionality to a custom datastructure, you’ll need to implement the Arrays.Protocol.

Besides these, you probably want to implement the above-mentioned protocols as well. You can look at the source code of Arrays.CommonProtocolImplementations for some hints as to how those protocols can be easily implemented, as many functions can be defined as simple wrappers on top of the functions that Arrays.Protocol itself already provides.


Changelog

Roadmap


Benchmarks

You can run the benchmarks locally by running mix run benchmarks/benchmarks.exs, which will also output the HTML format with nice graphs.

From below benchmarks, we know (caveat emptor):

Append a single element

appendappend_focus

Appending a single element is very fast on arrays, even as sizes grow. MapArray and ErlangArray perform similarly.

For extra comparison, we look at lists both to see how slow list ++ [val] becomes as baseline, but also how fast [val | list] still is:

In certain situations where a list can be treated as ‘backwards’, this can be a very simple way to append elements. As doing this is built-in, it will always be faster than our arrays. Thus, it serves as a ‘maxline’.

Random element access

random_read

Accessing a random element is very fast on arrays, even as sizes grow.

Arrays start beating lists significantly once the collection has more than 256 elements.

MapArray and ErlangArray seem to perform similarly < 8192 elements.

For larger sizes, ErlangArray seems to be a factor ~2 slower than MapArray again.

Random element update

random_update

Arrays start beating lists once the collection has more than 128 elements.

For sizes up to 131072 elements, MapArray seems to be between 100% and 30% slower than ErlangArray. For longer arrays, MapArray wins out, with ErlangArray being ~30% slower.

It seems like put_in has some overhead w.r.t. calling Arrays.replace. This warrants more investigation. Maybe Access has some overhead for its calls, or maybe the implementations of get_and_update_in could be further optimized.

Concatenate two equally-large collections

concatconcat_focusconcat_focus_log

Strangely, concatenation of large collections is very fast on lists. Probably because all of it happens in a single built-in function?

Lists outperform arrays 20x-100x for this task.

Between ErlangArray and MapArray, ErlangArray seems to handle this task 50% faster when concatenating two 4068-element arrays, and twice as fast for larger collections.