# ImageVision

ImageVision is a simple, opinionated image vision library for Elixir. It sits alongside the `image` library and answers three questions about any image — what's in it, where are the objects, and which pixels belong to which object — with strong defaults and no ML expertise required.
## Quick start

```elixir
# Classification — what is in this image?
iex> puppy = Image.open!("puppy.jpg")
iex> Image.Classification.labels(puppy)
["Blenheim spaniel"]

# Detection — where are the objects and what are they?
iex> street = Image.open!("street.jpg")
iex> detections = Image.Detection.detect(street)
iex> hd(detections)
%{label: "person", score: 0.94, box: {120, 45, 60, 180}}

# Draw bounding boxes on the image
iex> Image.Detection.draw_bbox_with_labels(detections, street)

# Segmentation — which pixels belong to which object?
iex> segments = Image.Segmentation.segment_panoptic(street)
iex> Enum.map(segments, & &1.label)
["person", "car", "road", "sky"]

# Colour-coded overlay of all segments
iex> Image.Segmentation.compose_overlay(street, segments)

# Promptable segmentation — isolate the object at a specific point
iex> %{mask: mask} = Image.Segmentation.segment(puppy, prompt: {:point, 320, 240})
iex> {:ok, cutout} = Image.Segmentation.apply_mask(puppy, mask)

# Embedding — 768-dim feature vector for similarity search
iex> Image.Classification.embed(puppy)
#Nx.Tensor<f32[768]>
```

## Installation
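Embeddings are intended for similarity search. As an illustration, here is a minimal cosine-similarity sketch over plain lists of floats — the `Similarity` module is hypothetical, not part of ImageVision; in practice you would first convert the `Nx` tensors returned by `Image.Classification.embed/1` with `Nx.to_flat_list/1`:

```elixir
# Hypothetical helper: cosine similarity between two embedding vectors,
# represented as plain lists of floats.
defmodule Similarity do
  def cosine(a, b) do
    # Dot product of the two vectors
    dot = Enum.zip_with(a, b, &(&1 * &2)) |> Enum.sum()
    # Euclidean norm of a single vector
    norm = fn v -> :math.sqrt(Enum.reduce(v, 0.0, fn x, acc -> acc + x * x end)) end
    dot / (norm.(a) * norm.(b))
  end
end

# Parallel vectors score 1.0; orthogonal vectors score 0.0.
Similarity.cosine([1.0, 2.0], [2.0, 4.0])
Similarity.cosine([1.0, 0.0], [0.0, 1.0])
```

Ranking a corpus of embeddings by this score against a query embedding is the usual nearest-neighbour search; for large corpora you would reach for a vector index rather than a full scan.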
Add `:image_vision` to `mix.exs` along with whichever optional ML backends you need:

```elixir
def deps do
  [
    {:image_vision, "~> 0.1"},

    # Required for Image.Classification and Image.Classification.embed/2
    {:bumblebee, "~> 0.6"},
    {:nx, "~> 0.10"},
    {:exla, "~> 0.10"}, # or {:torchx, "~> 0.10"} for the Torch backend

    # Required for Image.Detection and Image.Segmentation
    {:ortex, "~> 0.1"}
  ]
end
```

All ML deps are optional — omit any you do not use. The library compiles cleanly without them.
## Prerequisites

For the vast majority of users on Linux x86_64, macOS (Intel and Apple Silicon), and Windows x86_64, no native toolchain is required. The libraries used here ship precompiled native binaries for those platforms, and `mix deps.get` is all you need.

If your platform isn't covered by precompiled binaries — uncommon Linux distros, ARM Linux, glibc mismatches — you'll need:

- A Rust toolchain for `:ortex` (the ONNX runtime wrapper used by detection and segmentation) and for `:tokenizers` (pulled in transitively by `:bumblebee`). Install via rustup: `curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh`
- A C compiler for `:vix` (the libvips wrapper used by `:image`). On Linux, install `build-essential` (Debian/Ubuntu) or `gcc` (Fedora/RHEL); on macOS, install the Xcode Command Line Tools (`xcode-select --install`).
- `libvips`, if you need advanced libvips features beyond what the precompiled NIF includes. On macOS: `brew install vips`. On Linux: your distro's `libvips-dev`/`vips-devel` package. Most users don't need this.

If you see Cargo or cc errors during `mix deps.compile`, you've likely landed on a platform without precompiled coverage — install the toolchain above and re-run.
## Disk space and first-call latency

Model weights are downloaded on first call and cached on disk. Across the default models below, the total is approximately 610 MB:
| Task | Default model | Size |
|---|---|---|
| Classification | facebook/convnext-tiny-224 | ~110 MB |
| Detection | onnx-community/rtdetr_r50vd | ~175 MB |
| Segmentation (SAM 2) | SharpAI/sam2-hiera-tiny-onnx | ~150 MB |
| Segmentation (panoptic) | Xenova/detr-resnet-50-panoptic | ~175 MB |
The first call to each task therefore appears to "hang" while weights download — that's expected, not a bug.
To pre-download all default models before first use (recommended for production deployments and CI):

```shell
mix image_vision.download_models
```

Pass `--classify`, `--detect`, or `--segment` to limit scope.
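For containerized deployments, one option is to bake the weights into the image at build time so the first request never blocks on a download. A hypothetical Dockerfile fragment — the base image tag and project layout are assumptions, not prescribed by the library:

```dockerfile
FROM elixir:1.17

WORKDIR /app
RUN mix local.hex --force && mix local.rebar --force

COPY mix.exs mix.lock ./
RUN mix deps.get && mix deps.compile

COPY . .
# Pre-download model weights into the image's cache directory at build time
RUN mix image_vision.download_models
```

Note this enlarges the container image by the size of the cached weights; alternatively, mount a persistent volume at the cache directory and download at boot.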
## Livebook Desktop
Livebook Desktop launches as a GUI application and does not inherit your shell's PATH. Tools installed via rustup, mise, asdf, or Homebrew aren't visible to it by default — even if cargo works fine in your terminal.
If you hit "cargo: command not found" or similar during `Mix.install` inside Livebook Desktop, create `~/.livebookdesktop.sh` and add the relevant directories to `PATH`. A reasonable starting point:
```shell
# ~/.livebookdesktop.sh

# Rust (rustup)
export PATH="$HOME/.cargo/bin:$PATH"

# Homebrew (Apple Silicon)
export PATH="/opt/homebrew/bin:$PATH"

# mise — uncomment if you use it
# eval "$(mise activate bash)"

# asdf — uncomment if you use it
# . "$HOME/.asdf/asdf.sh"
```

Restart Livebook Desktop after creating this file. See the Livebook Desktop documentation for details.
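With `PATH` sorted out, a Livebook setup cell might look like the following sketch — the dependency set simply mirrors the Installation section above; trim it to the tasks you actually use:

```elixir
# Livebook setup cell: install ImageVision plus the optional ML backends
Mix.install([
  {:image_vision, "~> 0.1"},
  {:bumblebee, "~> 0.6"},
  {:nx, "~> 0.10"},
  {:exla, "~> 0.10"},
  {:ortex, "~> 0.1"}
])
```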
## Default models
All models are permissively licensed. Weights are downloaded automatically on first call and cached on disk — no manual setup required.
| Task | Model | License | Size |
|---|---|---|---|
| Classification | facebook/convnext-tiny-224 | Apache 2.0 | ~110 MB |
| Embedding | facebook/dinov2-base | Apache 2.0 | ~340 MB |
| Object detection | onnx-community/rtdetr_r50vd | Apache 2.0 | ~175 MB |
| Promptable segmentation | SharpAI/sam2-hiera-tiny-onnx | Apache 2.0 | ~150 MB |
| Panoptic segmentation | Xenova/detr-resnet-50-panoptic | Apache 2.0 | ~175 MB |
## Configuration

### Model cache

ONNX model weights are cached in a per-user directory by default (`~/.cache/image_vision` on Linux, `~/Library/Caches/image_vision` on macOS). Override in `config/runtime.exs`:
```elixir
config :image_vision, :cache_dir, "/var/lib/my_app/models"
```

### Classification serving
`Image.Classification` runs a supervised Bumblebee serving. It does not autostart by default. Start it in your application's supervision tree:

```elixir
# application.ex
def start(_type, _args) do
  children = [
    Image.Classification.classifier(),
    Image.Classification.embedder() # omit if you do not need embeddings
  ]

  Supervisor.start_link(children, strategy: :one_for_one)
end
```
Or enable autostart via config so `ImageVision.Supervisor` handles it:

```elixir
# config/runtime.exs
config :image_vision, :classifier, autostart: true
```

To use a different model:

```elixir
config :image_vision, :classifier,
  model: {:hf, "facebook/convnext-large-224-22k-1k"},
  featurizer: {:hf, "facebook/convnext-large-224-22k-1k"},
  autostart: true
```

## Guides
- Classification — classifying images and computing embeddings
- Detection — bounding-box object detection
- Segmentation — promptable and panoptic segmentation