Exoplanet

Exoplanet is an Elixir library that aggregates multiple RSS and Atom feeds into a single, unified feed sorted by publication date (descending).

It downloads each source concurrently, applies optional per-feed content filters (category allow/block lists, image stripping, summary excerpts), caps the contribution from any one feed, and returns a flat list of %Exoplanet.Post{} structs ready to render.

Inspired by Planet Venus and NimblePublisher.

Installation

Add :exoplanet to your dependencies in mix.exs:

def deps do
[
{:exoplanet, "~> 0.6"}
]
end

Documentation is published at https://hexdocs.pm/exoplanet.

Quick start

Describe your feeds in an .exs file that returns a map:

# planet.exs
%{
items: 60,
new_feed_items: 4,
default_filters: %{
allow_categories: ["elixir", "erlang"],
block_categories: [],
strip_images: false,
excerpt_length: nil,
sanitize_html: true
# `drop_tags` / `drop_attrs` are omitted so they inherit the secure
# built-in defaults — see `Exoplanet.Filters` to customize them.
},
sources: %{
"https://milmazz.uno/atom.xml" => %{name: "Milton Mazzarri"},
"https://www.theerlangelist.com/rss" => %{name: "Saša Jurić"}
}
}

Load it and build the merged feed:

config = Exoplanet.Config.from_file("planet.exs")
posts = Exoplanet.build(config)

posts is a list of Exoplanet.Post structs sorted newest-first. See example/planet_beam.exs for a fuller config that exercises the supported fields (it leaves drop_tags / drop_attrs at their secure defaults), and Exoplanet.Filters for the filter semantics.

A note on ordering and time zones

Post.published/Post.updated are NaiveDateTime values, and the merged feed is sorted by published descending. Both date paths discard the original UTC offset: RSS dates go through Exoplanet.DateTimeParser (RFC 822) and Atom dates through NaiveDateTime.from_iso8601/1, neither of which keeps the zone. This is a deliberate trade-off — feeds publish in many zones, and Exoplanet treats every timestamp as wall-clock time.

The practical consequence: when two posts from feeds in different zones are published close together, their relative order can be off by as much as the offset difference (up to ~24h). For a human-readable aggregated feed this is usually fine; if you need globally-correct chronological ordering, normalise the timestamps to UTC yourself before relying on the order.

A note on HTML sanitization

With sanitize_html: true (the default), Exoplanet removes the tags in drop_tags, the attributes in drop_attrs, all on* event-handler attributes, and any URL-bearing attribute (href, src, srcset, action, formaction, poster, xlink:href) whose URL scheme is not http, https, or mailto. This is defense-in-depth for feed content, not a guarantee — if you render feed HTML in a security-sensitive context, consider pairing it with a dedicated sanitizer such as html_sanitize_ex.

To delegate sanitization to such a library, implement the Exoplanet.Sanitizer behaviour and configure it:

defmodule MyApp.FeedSanitizer do
@behaviour Exoplanet.Sanitizer
@impl true
def sanitize(html), do: HtmlSanitizeEx.basic_html(html)
end
# config/config.exs
config :exoplanet, sanitizer_adapter: MyApp.FeedSanitizer

When configured (and sanitize_html is true), the adapter replaces the built-in sanitizer. html_sanitize_ex is not a dependency of Exoplanet — add it to your own application.

Note that Exoplanet.Config.from_file/1 evaluates the config file with Code.eval_file/1 — treat that file as trusted code, like any other .exs in your project.

HTTP client options

Each feed request is bounded by the config's feed_timeout (seconds), which is passed to Req as :receive_timeout. You can forward additional options to Req.get/2 (user-agent, proxy, retry policy, …) via the application environment:

Application.put_env(:exoplanet, :req_options, headers: [user_agent: "my-planet/1.0"])

The old key name :planet_req_options still works but is deprecated; use :req_options.

HTTP caching (optional)

Exoplanet can issue conditional If-None-Match / If-Modified-Since requests and fall back to a cached body on transient errors. Implement the Exoplanet.Cache behaviour and register the adapter:

Application.put_env(:exoplanet, :cache_adapter, MyApp.FeedCache)

See the Exoplanet.Cache module docs for the callback contract and the optional on_success/2 / on_error/3 notification hooks.

Migration from 0.2.x

Exoplanet.Config was slimmed down in 0.3.0 to hold only fields the library itself reads. Site-presentation metadata (name, link, owner_name, owner_email, about, code_of_conduct, activity_threshold, related_sites) and vestigial Venus-era settings (cache_directory, output_dir, output_theme, log_level) are no longer part of the struct.

Exoplanet.Config.from_file/1 ignores keys it doesn't recognise, so a single .exs file can still serve both Exoplanet and a consumer-side config (for example a static-site generator) — keep the extra fields in the same map and read them yourself.

See the CHANGELOG for the full list of changes.

Contributing

Contributions are welcome! See CONTRIBUTING.md for development setup, the test conventions, and how the generated DateTimeParser is regenerated. Run mix precommit before opening a pull request — it mirrors the CI checks.

License

Apache-2.0. See LICENSE.