ElixirDatasets

ElixirDatasets is a library for accessing and managing datasets from the Hugging Face Hub in Elixir. Inspired by the Python datasets library, it brings dataset management to the Elixir ecosystem and integrates with Explorer DataFrames.

✨ Features

📦 Installation

Add elixir_datasets to your list of dependencies in mix.exs:

def deps do
  [
    {:elixir_datasets, "~> 0.1.0"}
  ]
end

🚀 Quick Start

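# Load the "train" split of a dataset from the Hugging Face Hub
{:ok, [train_df]} = ElixirDatasets.load_dataset(
  {:hf, "cornell-movie-review-data/rotten_tomatoes"},
  split: "train"
)

# Load datasets from a local directory
{:ok, datasets} = ElixirDatasets.load_dataset({:local, "./data"})

# Request a stream instead of a fully loaded dataset
{:ok, stream} = ElixirDatasets.load_dataset(
  {:hf, "stanfordnlp/imdb", subdir: "plain_text"},
  split: "train",
  streaming: true
)

# Take the first 100 rows from the stream and print them
stream |> Enum.take(100) |> IO.inspect()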
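
The streaming call above ends with Enum.take(100). The sketch below uses only the Elixir standard library to show why that pattern is cheap for a lazy enumerable. It assumes the value returned with streaming: true behaves like an ordinary lazy Stream, which the Quick Start suggests but does not state. The rows and the counter are hypothetical stand-ins and not part of ElixirDatasets.

```elixir
# Counter that records how many rows were actually produced.
{:ok, counter} = Agent.start_link(fn -> 0 end)

# Hypothetical stand-in for a streamed dataset of one million rows.
# Stream.map/2 is lazy: nothing runs until the stream is consumed.
rows =
  Stream.map(1..1_000_000, fn i ->
    Agent.update(counter, &(&1 + 1))
    %{id: i, text: "review #{i}"}
  end)

# Enum.take/2 halts the stream once it has collected 100 rows.
first_rows = Enum.take(rows, 100)
produced = Agent.get(counter, & &1)

IO.inspect(length(first_rows), label: "rows taken")
IO.inspect(produced, label: "rows produced")
```

Both numbers should match, since a lazy stream produces only the rows that are requested. If the library's stream works the same way, taking a small prefix would avoid loading the whole dataset.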

📚 Examples

All examples can be found in the examples directory.

🔧 Configuration

Environment Variables

📖 Documentation

Full documentation is available on HexDocs. A copy hosted on GitHub Pages shows the current status of features still under development. To generate the documentation locally, run:

mix docs

🧪 Testing

MIX_ENV=test mix test

📄 License

This project is licensed under the MIT License. See the LICENSE file for details.

Copyright (c) 2025 Radosław Rolka, Weronika Wojtas