GenStage
GenStage is a specification for exchanging events between producers and consumers.
This project currently provides the following functionality:
- GenStage (docs) - a behaviour for implementing producer and consumer stages
- DynamicSupervisor (docs) - a supervisor designed for starting children dynamically. Besides being a replacement for the :simple_one_for_one strategy in the regular Supervisor, a DynamicSupervisor can also be used as a stage consumer, making it straightforward to spawn a new process for every event in a stage pipeline
You can find examples on how to use the modules above in the examples directory:
- DynamicSupervisor consumer - an example of how to use one or more DynamicSupervisors as consumers of a producer that works as a counter
- GenEvent - an example of how to use GenStage to implement a GenEvent replacement that leverages concurrency and provides more flexibility regarding buffer size and back-pressure
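The counter producer mentioned above can be sketched roughly as follows. This is a minimal sketch, not the code from the examples directory: it assumes the GenStage behaviour callbacks init/1, handle_demand/2 and handle_events/3 and the GenStage.sync_subscribe/2 function as documented for GenStage, and a given release may differ in details. The module names Counter and Forwarder are hypothetical.

```elixir
defmodule Counter do
  use GenStage

  # The state is the next integer to emit.
  def init(initial), do: {:producer, initial}

  # Emit exactly `demand` consecutive integers. Nothing is produced until a
  # consumer asks for it, which is where back-pressure comes from.
  def handle_demand(demand, counter) when demand > 0 do
    events = Enum.to_list(counter..(counter + demand - 1))
    {:noreply, events, counter + demand}
  end
end

defmodule Forwarder do
  use GenStage

  # The state is the pid that should receive every batch of events.
  def init(parent), do: {:consumer, parent}

  def handle_events(events, _from, parent) do
    send(parent, {:events, events})
    # Consumers never emit events of their own.
    {:noreply, [], parent}
  end
end

{:ok, counter} = GenStage.start_link(Counter, 0)
{:ok, forwarder} = GenStage.start_link(Forwarder, self())

# Subscribing is what triggers the first demand from the consumer.
GenStage.sync_subscribe(forwarder, to: counter, max_demand: 10)

# Wait for the first batch and print it.
receive do
  {:events, events} -> IO.inspect(events)
end
```

Because the counter is unbounded, the forwarder keeps asking for more events as it finishes each batch. The max_demand option caps how many events are in flight at once.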
Installation
GenStage requires Elixir v1.3.
Add :gen_stage to your list of dependencies in mix.exs:

    def deps do
      [{:gen_stage, "~> 0.1.0"}]
    end

Ensure :gen_stage is started before your application:

    def application do
      [applications: [:gen_stage]]
    end
Future research
Here is a list of potential topics to be explored by this project (in no particular order and with no guarantee):
- Provide examples with delivery guarantees
- Consider using DynamicSupervisor to implement Task.Supervisor (as a consumer)
- TCP and UDP acceptors as producers
- Ability to attach filtering to producers - today, if we need to filter events from a producer, we need an intermediary process, which incurs extra copying
- Connecting stages across nodes - because there is no guarantee that demand is delivered, the demand-driven approach in GenStage won't perform well in a distributed setup
- Integration with streams - how does the GenStage foundation integrate with Elixir streams? In particular, stream composition is still purely functional while stages introduce asynchronicity
- Exploit parallelism - how can the GenStage foundation help us implement conveniences like pmap, chunked map, farming and so on? See Patterns for Parallel Programming (1), the Eden project (2) and the skel project (3, 4, 5)
- Explore different windowing strategies - the ideas behind the Apache Beam project are interesting, especially the mechanism that divides operations between what/where/when/how (6, 7), as well as windowing from the perspective of aggregation (8)
- Introduce key-based functions - after a group_by is performed, there are many operations that can be inlined, like map_by_key, reduce_by_key and so on. The Spark project, for example, provides many functions for key-based functionality (9)
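The key-based item above can be made concrete with a small sketch in plain Elixir. The KeyOps module and both function signatures are hypothetical and not part of this project. The list only names map_by_key and reduce_by_key, so the semantics below (transform each value, fold the values that share a key) are an assumption, loosely modeled on the per-key operations Spark offers.

```elixir
defmodule KeyOps do
  # Hypothetical helpers that operate on a list of {key, value} pairs.

  # Applies `fun` to every value and leaves the keys untouched.
  def map_by_key(pairs, fun) do
    Enum.map(pairs, fn {key, value} -> {key, fun.(value)} end)
  end

  # Folds all values that share a key into one accumulator per key,
  # without building the intermediate groups first.
  def reduce_by_key(pairs, initial, fun) do
    Enum.reduce(pairs, %{}, fn {key, value}, acc ->
      Map.update(acc, key, fun.(value, initial), fn current ->
        fun.(value, current)
      end)
    end)
  end
end

pairs = [{:a, 1}, {:b, 2}, {:a, 3}]

KeyOps.map_by_key(pairs, fn value -> value * 10 end)
# => [a: 10, b: 20, a: 30]

KeyOps.reduce_by_key(pairs, 0, fn value, acc -> value + acc end)
# => %{a: 4, b: 2}
```

The reduction never materializes a list of values per key, which is the kind of inlining the item refers to.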
Other research topics include Naiad's Differential Dataflow engine (10) and Lasp (11).
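The filtering item in the list above mentions an intermediary process. A minimal sketch of such a stage is shown below, assuming the GenStage behaviour supports a :producer_consumer stage type with a handle_events/3 callback. The module name EvenFilter and the even-number predicate are hypothetical and chosen only for illustration.

```elixir
defmodule EvenFilter do
  use GenStage

  # A producer_consumer sits between a producer and a consumer.
  def init(:ok), do: {:producer_consumer, :ok}

  # Events that fail the predicate are dropped here, after they have already
  # been copied from the producer into this process. That copy is the extra
  # cost the list item refers to.
  def handle_events(events, _from, state) do
    kept = Enum.filter(events, fn event -> rem(event, 2) == 0 end)
    {:noreply, kept, state}
  end
end

# The callback is an ordinary function, so it can be exercised directly:
EvenFilter.handle_events([1, 2, 3, 4], nil, :ok)
# => {:noreply, [2, 4], :ok}
```

In a pipeline, the filter would be started with GenStage.start_link/2, subscribed to the producer, and then the consumer would subscribe to the filter.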
Links
1. https://www.amazon.com/Patterns-Parallel-Programming-paperback-Paperback/dp/0321940784
2. http://www.mathematik.uni-marburg.de/~eden/
3. https://github.com/ParaPhrase/skel
4. http://paraphrase-ict.eu/Deliverables/d2.5/at_download/file
5. http://paraphrase-ict.eu/Deliverables/d2-10-pattern-amenability/at_download/file
6. https://cloud.google.com/blog/big-data/2016/05/why-apache-beam-a-google-perspective
7. http://www.vldb.org/pvldb/vol8/p1792-Akidau.pdf
8. http://www.vldb.org/pvldb/vol8/p702-tangwongsan.pdf
9. http://spark.apache.org/docs/latest/programming-guide.html#working-with-key-value-pairs
10. http://research-srv.microsoft.com/pubs/176693/differentialdataflow.pdf
11. https://lasp-lang.org/
License
Same as Elixir.