
HfHub


Elixir client for HuggingFace Hub — dataset/model metadata, file downloads, caching, and authentication. An Elixir port of Python's huggingface_hub.

hf_hub_ex provides a robust, production-ready interface to the HuggingFace Hub API, enabling Elixir applications to seamlessly access models, datasets, and spaces. This library is designed to be the foundational layer for porting Python HuggingFace libraries (datasets, evaluate, transformers) to the BEAM ecosystem.

Features

Installation

Add hf_hub to your dependencies in mix.exs:

def deps do
  [
    {:hf_hub, "~> 0.2.0"}
  ]
end

Then run:

mix deps.get

Quick Start

Authentication

Set your HuggingFace token as an environment variable or in config:

export HF_TOKEN="hf_..."

Or in config/config.exs:

config :hf_hub,
  token: System.get_env("HF_TOKEN"),
  cache_dir: Path.expand("~/.cache/huggingface")

Fetching Model Metadata

# Get model information
{:ok, model_info} = HfHub.Api.model_info("bert-base-uncased")

IO.inspect(model_info.id)          # "bert-base-uncased"
IO.inspect(model_info.downloads)   # 123456789
IO.inspect(model_info.tags)        # ["pytorch", "bert", "fill-mask"]
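Since a missing or gated repository makes the `{:ok, ...}` match fail, production code should branch on the result instead. This is a sketch that assumes `model_info/1` returns an `{:error, reason}` tuple on failure, as is conventional for this library's API:

```elixir
# Handle failure instead of crashing on an unmatched {:ok, ...} pattern.
case HfHub.Api.model_info("bert-base-uncased") do
  {:ok, model_info} ->
    IO.puts("#{model_info.id} has #{model_info.downloads} downloads")

  {:error, reason} ->
    IO.puts("Could not fetch model info: #{inspect(reason)}")
end
```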

Downloading Files

# Download a model file
{:ok, path} = HfHub.Download.hf_hub_download(
  repo_id: "bert-base-uncased",
  filename: "config.json",
  repo_type: :model
)

# Read the downloaded file
{:ok, config} = File.read(path)

# Download and extract an archive (returns extracted path)
{:ok, extracted_path} = HfHub.Download.hf_hub_download(
  repo_id: "albertvillanova/tmp-tests-zip",
  filename: "ds.zip",
  repo_type: :dataset,
  extract: true
)

# Download with progress tracking
{:ok, path} = HfHub.Download.hf_hub_download(
  repo_id: "some/model",
  filename: "model.bin",
  progress_callback: fn downloaded, total ->
    if total, do: IO.puts("#{round(downloaded / total * 100)}%")
  end
)

# Download with SHA256 verification
{:ok, path} = HfHub.Download.hf_hub_download(
  repo_id: "some/model",
  filename: "model.bin",
  verify_checksum: true,
  expected_sha256: "abc123..."  # Optional: fails if hash doesn't match
)

Offline Mode

# Check if offline mode is enabled (via HF_HUB_OFFLINE=1 or config)
if HfHub.offline_mode?() do
  IO.puts("Running in offline mode - only cached files available")
end

# Try to load a file from cache without network requests
case HfHub.try_to_load_from_cache("bert-base-uncased", "config.json") do
  {:ok, path} ->
    # File is cached, use it directly
    File.read!(path)
  {:error, :not_cached} ->
    # File not cached, decide whether to download
    {:ok, path} = HfHub.Download.hf_hub_download(
      repo_id: "bert-base-uncased",
      filename: "config.json"
    )
    File.read!(path)
end

Accessing Datasets

# Get dataset information
{:ok, dataset_info} = HfHub.Api.dataset_info("squad")

# Download dataset files
{:ok, path} = HfHub.Download.hf_hub_download(
  repo_id: "squad",
  filename: "train-v1.1.json",
  repo_type: :dataset
)

# Discover configs and splits
{:ok, configs} = HfHub.Api.dataset_configs("dpdl-benchmark/caltech101")
{:ok, splits} = HfHub.Api.dataset_splits("dpdl-benchmark/caltech101", config: "default")

# Resolve file paths for a config + split
{:ok, files} =
  HfHub.DatasetFiles.resolve("dpdl-benchmark/caltech101", "default", "train")
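The three discovery calls compose naturally. Below is a hedged sketch, using only the `dataset_configs/1`, `dataset_splits/2`, and `DatasetFiles.resolve/3` calls shown above, that resolves files for every config/split combination of a dataset:

```elixir
# Walk all configs and splits of a dataset and count the resolved files.
repo_id = "dpdl-benchmark/caltech101"

{:ok, configs} = HfHub.Api.dataset_configs(repo_id)

for config <- configs do
  {:ok, splits} = HfHub.Api.dataset_splits(repo_id, config: config)

  for split <- splits do
    {:ok, files} = HfHub.DatasetFiles.resolve(repo_id, config, split)
    IO.puts("#{config}/#{split}: #{length(files)} file(s)")
  end
end
```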

Bumblebee-Compatible API

Use the tuple-based repository API to integrate with Bumblebee and other Elixir ML libraries:

# Repository reference types
repo = {:hf, "bert-base-uncased"}
repo_with_opts = {:hf, "bert-base-uncased", revision: "v1.0", auth_token: "hf_xxx"}
local_repo = {:local, "/path/to/model"}

# List files with ETags for cache validation
{:ok, files} = HfHub.get_repo_files({:hf, "bert-base-uncased"})
# => %{"config.json" => "\"abc123\"", "pytorch_model.bin" => "\"def456\"", ...}

# ETag-based cached download
{:ok, path} = HfHub.cached_download(
  "https://huggingface.co/bert-base-uncased/resolve/main/config.json"
)

# Build file URLs
url = HfHub.file_url("bert-base-uncased", "config.json", "main")

Repository Management

# Create a new repository
{:ok, url} = HfHub.Repo.create("my-org/my-model", private: true)

# Create a Space with Gradio
{:ok, url} = HfHub.Repo.create("my-space", repo_type: :space, space_sdk: "gradio")

# Delete a repository
:ok = HfHub.Repo.delete("my-org/old-model")

# Update settings
:ok = HfHub.Repo.update_settings("my-model", private: true, gated: :auto)

# Move/rename
{:ok, url} = HfHub.Repo.move("old-name", "new-org/new-name")

# Check existence
true = HfHub.Repo.exists?("bert-base-uncased")
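A common pattern is to combine `exists?/1` and `create/2` into an idempotent setup step. A minimal sketch (the repo name is illustrative):

```elixir
# Create the repository only if it does not already exist.
repo_id = "my-org/my-model"

unless HfHub.Repo.exists?(repo_id) do
  {:ok, _url} = HfHub.Repo.create(repo_id, private: true)
end
```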

File Upload

# Upload a small file (< 10MB uses base64, >= 10MB uses LFS automatically)
{:ok, info} = HfHub.Commit.upload_file(
  "/path/to/model.bin",
  "model.bin",
  "my-org/my-model",
  token: token,
  commit_message: "Add model weights"
)

# Upload from binary content
{:ok, info} = HfHub.Commit.upload_file(
  Jason.encode!(%{hidden_size: 768}),
  "config.json",
  "my-model",
  token: token
)

# Delete a file
{:ok, info} = HfHub.Commit.delete_file("old_model.bin", "my-model", token: token)

# Multiple operations in one commit
alias HfHub.Commit.Operation

{:ok, info} = HfHub.Commit.create("my-model", [
  Operation.add("config.json", config_content),
  Operation.add("model.bin", "/path/to/model.bin"),
  Operation.delete("old_config.json")
], token: token, commit_message: "Update model")

Folder Upload

# Upload entire folder
{:ok, info} = HfHub.Commit.upload_folder(
  "/path/to/model_dir",
  "my-org/my-model",
  token: token,
  commit_message: "Upload model"
)

# With pattern filtering
{:ok, info} = HfHub.Commit.upload_folder(
  "/path/to/model_dir",
  "my-model",
  token: token,
  ignore_patterns: ["*.pyc", "__pycache__/**"],
  allow_patterns: ["*.safetensors", "*.json"]
)

# Large folder with automatic batching
{:ok, infos} = HfHub.Commit.upload_large_folder(
  "/path/to/huge_model",
  "my-model",
  token: token,
  multi_commits: true
)

Git Operations

# Create a branch
{:ok, info} = HfHub.Git.create_branch("my-org/my-model", "feature-branch", token: token)

# Create branch from specific revision
{:ok, info} = HfHub.Git.create_branch("my-model", "hotfix", revision: "v1.0", token: token)

# Delete a branch
:ok = HfHub.Git.delete_branch("my-model", "old-branch", token: token)

# Create a tag
{:ok, info} = HfHub.Git.create_tag("my-model", "v1.0", token: token)

# Create annotated tag with message
{:ok, info} = HfHub.Git.create_tag("my-model", "v2.0",
  revision: "abc123",
  message: "Release v2.0",
  token: token
)

# List all refs (branches, tags)
{:ok, refs} = HfHub.Git.list_refs("bert-base-uncased")
refs.branches  # [%BranchInfo{name: "main", ...}]
refs.tags      # [%TagInfo{name: "v1.0", ...}]

# List commits
{:ok, commits} = HfHub.Git.list_commits("bert-base-uncased", revision: "main")

# Super squash (destructive - squashes all commits)
:ok = HfHub.Git.super_squash("my-model", message: "Squashed history", token: token)
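These calls compose into a simple release flow: branch from the default revision, then tag the branch head. A sketch using only the functions above, and assuming `revision:` also accepts a branch name (the tag example earlier passes a commit hash):

```elixir
# Cut a release branch, then tag its head.
{:ok, _} = HfHub.Git.create_branch("my-model", "release-1.0", token: token)

{:ok, _} = HfHub.Git.create_tag("my-model", "v1.0",
  revision: "release-1.0",
  message: "Release 1.0",
  token: token
)
```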

User & Organization Profiles

# Get user profile
{:ok, user} = HfHub.Users.get("username")
IO.inspect(user.num_followers)

# List followers/following
{:ok, followers} = HfHub.Users.list_followers("username")
{:ok, following} = HfHub.Users.list_following("username")

# Like/unlike repos
:ok = HfHub.Users.like("bert-base-uncased")
:ok = HfHub.Users.unlike("bert-base-uncased")

# Organization info
{:ok, org} = HfHub.Organizations.get("huggingface")
{:ok, members} = HfHub.Organizations.list_members("huggingface")

Model & Dataset Cards

# Load and parse cards
{:ok, card} = HfHub.Cards.load_model_card("bert-base-uncased")
card.data.license  # "apache-2.0"
card.data.tags     # ["pytorch", "bert", "fill-mask"]

{:ok, card} = HfHub.Cards.load_dataset_card("squad")
card.data.task_categories  # ["question-answering"]

# Parse from content
{:ok, card} = HfHub.Cards.parse_model_card(readme_content)

# Create and render cards
card = HfHub.Cards.create_model_card(%{
  language: "en",
  license: "mit",
  tags: ["text-classification"]
})
markdown = HfHub.Cards.render(card)

Cache Management

# Check if a file is cached
cached? = HfHub.Cache.cached?(
  repo_id: "bert-base-uncased",
  filename: "pytorch_model.bin"
)

# Clear cache for a specific repo
:ok = HfHub.Cache.clear_cache(repo_id: "bert-base-uncased")

# Get cache statistics
{:ok, stats} = HfHub.Cache.cache_stats()
IO.inspect(stats.total_size)  # Total bytes in cache
IO.inspect(stats.file_count)  # Number of cached files
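`cache_stats/0` and `clear_cache/1` can be combined into a simple size-budget check. A sketch, assuming only the two calls documented above; the 5 GB threshold and the evicted repo are arbitrary choices for illustration:

```elixir
# Clear one repo's cache entries when the total cache exceeds a budget.
max_bytes = 5 * 1024 * 1024 * 1024

{:ok, stats} = HfHub.Cache.cache_stats()

if stats.total_size > max_bytes do
  :ok = HfHub.Cache.clear_cache(repo_id: "bert-base-uncased")
end
```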

Examples

The examples/ directory contains runnable scripts demonstrating common use cases:

# Run all examples at once
./examples/run_all.sh

# Or run individual examples:
mix run examples/list_datasets.exs      # List top datasets
mix run examples/list_models.exs        # List popular models
mix run examples/dataset_info.exs       # Get dataset metadata
mix run examples/list_repo_tree.exs     # List repo tree entries
mix run examples/dataset_configs_splits.exs  # Dataset configs + splits
mix run examples/dataset_files_resolver.exs  # Resolve dataset files by config + split
mix run examples/download_file.exs      # Download a single file
mix run examples/download_with_extract.exs   # Download + extract archives
mix run examples/cache_demo.exs         # Cache management demo
mix run examples/stream_download.exs    # Stream large files
mix run examples/snapshot_download.exs  # Download entire repo
mix run examples/auth_demo.exs          # Authentication flow

See the examples README for detailed documentation.

API Overview

HfHub.Api

Interact with the HuggingFace Hub API:

HfHub.Download

Download files from HuggingFace repositories:

HfHub.DatasetFiles

Resolve dataset files by config and split:

HfHub.Cache

Manage local file cache:

HfHub.FS

Filesystem utilities for HuggingFace cache:

HfHub.Config

Configuration utilities:

HfHub.Auth

Authentication and authorization:

HfHub.Hub

Bumblebee-compatible ETag-based caching:

HfHub.Repository

Repository reference types and helpers:

HfHub.RepoFiles

Repository file listing with ETags:

HfHub.Constants

Constants matching Python's huggingface_hub.constants:

HfHub.Errors

Structured exceptions for error handling:

HfHub.LFS

LFS (Large File Storage) utilities:

HfHub.Commit

Commit operations for file uploads:

HfHub.Git

Git operations for branch, tag, and commit management:

HfHub.Users

User profile and activity API:

HfHub.Organizations

Organization profile API:

HfHub.Cards

Model and Dataset card parsing and creation:

Configuration

Configure hf_hub in your config/config.exs:

config :hf_hub,
  # Authentication token (defaults to HF_TOKEN env var)
  token: System.get_env("HF_TOKEN"),

  # Cache directory (defaults to ~/.cache/huggingface)
  cache_dir: Path.expand("~/.cache/huggingface"),

  # Hub endpoint (defaults to https://huggingface.co)
  endpoint: "https://huggingface.co",

  # HTTP client options
  http_opts: [
    receive_timeout: 30_000,
    pool_timeout: 5_000
  ],

  # Cache options
  cache_opts: [
    max_size: 10 * 1024 * 1024 * 1024,  # 10 GB
    eviction_policy: :lru
  ]

Comparison to Python's huggingface_hub

hf_hub_ex aims for feature parity with the Python library while embracing Elixir idioms:

| Feature               | Python huggingface_hub | Elixir hf_hub_ex |
|-----------------------|------------------------|------------------|
| API Client            | ✅                     | ✅               |
| File Downloads        | ✅                     | ✅               |
| Caching               | ✅                     | ✅ (OTP-based)   |
| Authentication        | ✅                     | ✅               |
| Repository Management | ✅                     | ✅               |
| Upload Files          | ✅                     | ✅               |
| Inference API         | ✅                     | 🚧 (Planned)     |

Key Differences

Roadmap

See docs/ROADMAP.md for detailed feature parity status with Python huggingface_hub.

Contributing

Contributions are welcome! Please follow these guidelines:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/my-new-feature)
  3. Write tests for your changes
  4. Ensure all tests pass (mix test)
  5. Run code quality checks (mix format && mix credo && mix dialyzer)
  6. Commit your changes (git commit -am 'Add new feature')
  7. Push to the branch (git push origin feature/my-new-feature)
  8. Create a Pull Request

Testing

# Run all tests
mix test

# Run with coverage
mix test --cover

# Run specific test file
mix test test/hf_hub/api_test.exs

License

MIT License - See LICENSE for details.

Acknowledgments

Links


Built with ❤️ by the North-Shore-AI team