ExTorch.Vision

TorchVision ops for ExTorch -- detection, segmentation, and image I/O operators running on the BEAM.

ExTorch.Vision builds libtorchvision.so from source at compile time (against ExTorch's libtorch) and exposes all torchvision C++ operators through ExTorch's generic dispatcher. The package itself contains no Rust or C++ code -- every call goes through ExTorch.Native.dispatch_op/3.

Requirements

ExTorch (provides libtorch)
CMake >= 3.18
C++17 compiler (gcc >= 7 or clang >= 5)
CUDA toolkit (optional, for GPU support and NVJPEG)

Installation

Add extorch_vision to your dependencies in mix.exs:

def deps do
  [
    {:extorch, "~> 0.3.0"},
    {:extorch_vision, "~> 0.1.0"}
  ]
end

Then fetch dependencies and compile. libtorchvision.so is built from source automatically during mix compile, so no manual build step is needed:

mix deps.get
mix compile

Usage

# Initialization is lazy: libtorchvision.so is loaded on the first call.
# To load it eagerly, call ExTorch.Vision.setup!() explicitly.

# Non-maximum suppression
boxes = ExTorch.tensor([[0.0, 0.0, 10.0, 10.0], [0.5, 0.5, 10.5, 10.5], [20.0, 20.0, 30.0, 30.0]])
scores = ExTorch.tensor([0.9, 0.8, 0.7])
keep = ExTorch.Vision.nms(boxes, scores, 0.5)

# ROI Align (detection models)
# Each ROI row is [batch_index, x1, y1, x2, y2]; this call pools to 7x7 with spatial_scale 1.0
features = ExTorch.rand({1, 256, 14, 14})
rois = ExTorch.tensor([[0.0, 0.0, 0.0, 7.0, 7.0]])
pooled = ExTorch.Vision.roi_align(features, rois, 1.0, 7, 7)

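# Deformable Convolution v2
# 3x3 kernel, stride 1, no padding: an 8x8 input gives a 6x6 output (8 - 3 + 1),
# so offset needs 2 * 3 * 3 = 18 channels and mask needs 3 * 3 = 9 channels.
# The trailing 1, 1, 0, 0 follow torchvision's order: stride_h, stride_w, pad_h, pad_w.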
input = ExTorch.rand({1, 3, 8, 8})
weight = ExTorch.rand({8, 3, 3, 3})
offset = ExTorch.zeros({1, 18, 6, 6})
mask = ExTorch.ones({1, 9, 6, 6})
bias = ExTorch.zeros({8})
out = ExTorch.Vision.deform_conv2d(input, weight, offset, mask, bias, 1, 1, 0, 0)

# Image I/O -- encode/decode without leaving the BEAM
image = ExTorch.randint(0, 255, {3, 224, 224}, dtype: :uint8)
png_bytes = ExTorch.Vision.encode_png(image)
decoded = ExTorch.Vision.decode_png(png_bytes)

# GPU-accelerated JPEG decode (requires NVJPEG)
# The second argument is the image read mode; 0 is torchvision's ImageReadMode.UNCHANGED
jpeg_data = ExTorch.Vision.encode_jpeg(image)
gpu_images = ExTorch.Vision.decode_jpegs_cuda([jpeg_data], 0, :cuda)
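
To make the nms call above concrete, here is a minimal pure-Elixir sketch of greedy non-maximum suppression on plain tuples. It is only a reference for the semantics (drop any box whose IoU with an already kept, higher-scoring box exceeds the threshold), not how ExTorch.Vision computes it. The module name NmsSketch and its functions are hypothetical and not part of the package.

```elixir
defmodule NmsSketch do
  # Hypothetical reference module, not part of ExTorch.Vision.
  # Boxes are {x1, y1, x2, y2} tuples of floats.

  def area({x1, y1, x2, y2}), do: max(x2 - x1, 0.0) * max(y2 - y1, 0.0)

  # Intersection over union of two boxes.
  def iou({ax1, ay1, ax2, ay2} = a, {bx1, by1, bx2, by2} = b) do
    iw = max(min(ax2, bx2) - max(ax1, bx1), 0.0)
    ih = max(min(ay2, by2) - max(ay1, by1), 0.0)
    inter = iw * ih
    union = area(a) + area(b) - inter
    if union > 0.0, do: inter / union, else: 0.0
  end

  # Returns the indices of the kept boxes, highest score first.
  def nms(boxes, scores, iou_threshold) do
    boxes
    |> Enum.zip(scores)
    |> Enum.with_index()
    |> Enum.sort_by(fn {{_box, score}, _idx} -> score end, :desc)
    |> Enum.reduce([], fn {{box, _score}, idx}, kept ->
      overlaps? = Enum.any?(kept, fn {kept_box, _} -> iou(box, kept_box) > iou_threshold end)
      if overlaps?, do: kept, else: [{box, idx} | kept]
    end)
    |> Enum.reverse()
    |> Enum.map(fn {_box, idx} -> idx end)
  end
end

# Same boxes and scores as the tensor example above.
boxes = [{0.0, 0.0, 10.0, 10.0}, {0.5, 0.5, 10.5, 10.5}, {20.0, 20.0, 30.0, 30.0}]
scores = [0.9, 0.8, 0.7]
IO.inspect(NmsSketch.nms(boxes, scores, 0.5))  # prints [0, 2]
```

The first two boxes overlap with an IoU of about 0.82 (90.25 / 109.75), so the lower-scoring one is suppressed, while the third box does not overlap either and is kept.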

Available operators

Detection / segmentation

Function	Description
`nms/3`	Non-maximum suppression
`roi_align/7`	Region of Interest Align (bilinear)
`roi_pool/5`	Region of Interest Pooling (max)
`ps_roi_align/6`	Position-sensitive ROI Align (R-FCN)
`ps_roi_pool/5`	Position-sensitive ROI Pooling
`deform_conv2d/14`	Deformable Convolution v2

Image I/O

Function	Description
`decode_jpeg/3`	Decode JPEG from uint8 tensor
`encode_jpeg/2`	Encode to JPEG bytes
`decode_png/3`	Decode PNG from uint8 tensor
`encode_png/2`	Encode to PNG bytes
`decode_webp/2`	Decode WebP
`decode_gif/1`	Decode GIF (animated supported)
`decode_image/3`	Auto-detect format and decode
`decode_jpegs_cuda/3`	Batch JPEG decode on GPU (NVJPEG)
`encode_jpegs_cuda/2`	Batch JPEG encode on GPU

ExTorch.Export integration

All ops are automatically registered with ExTorch.Export.OpRegistry when setup!/0 is called. This means exported PyTorch models that use torchvision operators (e.g., Faster R-CNN with torchvision::roi_align and torchvision::nms) can be loaded and run via ExTorch.Export.forward/2 without any additional configuration:

ExTorch.Vision.setup!()

model = ExTorch.Export.load("faster_rcnn.pt2", device: :cuda)
output = ExTorch.Export.forward(model, [input_tensor])

How it works

ExTorch.Vision contains zero C++ or Rust code. It works by:

Building libtorchvision.so from source via CMake (the mix torchvision.build step, which mix compile triggers automatically)
Loading it at runtime via ExTorch.Native.load_torch_library/1 (which calls dlopen)
Letting TorchVision register its ops with PyTorch's c10::Dispatcher via TORCH_LIBRARY blocks when the library loads
Calling the ops from Elixir through ExTorch.Native.dispatch_op/3, which invokes the dispatcher

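This architecture means any op that torchvision registers with the dispatcher is reachable through ExTorch.Native.dispatch_op/3, so ops added in future torchvision releases become available without code changes -- just rebuild libtorchvision.so.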
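
The register-then-dispatch pattern described above can be illustrated with a toy model in plain Elixir. This is only an analogy: the real registry is PyTorch's c10::Dispatcher inside libtorch, and the ToyDispatcher module, its functions, and the "toy::add" op name are all hypothetical. The point is that callers look ops up by name, so anything registered at load time is callable without changes on the calling side.

```elixir
defmodule ToyDispatcher do
  # Hypothetical stand-in for a name-keyed op registry; not ExTorch's implementation.
  use Agent

  def start_link(_opts \\ []), do: Agent.start_link(fn -> %{} end, name: __MODULE__)

  # Plays the role of a TORCH_LIBRARY block that runs when the shared library loads.
  def register(name, fun) when is_binary(name) and is_function(fun) do
    Agent.update(__MODULE__, &Map.put(&1, name, fun))
  end

  # Plays the role of a name-based dispatch call.
  def dispatch(name, args) when is_binary(name) and is_list(args) do
    case Agent.get(__MODULE__, &Map.fetch(&1, name)) do
      {:ok, fun} -> {:ok, apply(fun, args)}
      :error -> {:error, {:unknown_op, name}}
    end
  end
end

{:ok, _pid} = ToyDispatcher.start_link()

# "Loading the library" registers an op under a namespaced name.
ToyDispatcher.register("toy::add", fn a, b -> a + b end)

IO.inspect(ToyDispatcher.dispatch("toy::add", [1, 2]))   # {:ok, 3}
IO.inspect(ToyDispatcher.dispatch("toy::missing", []))   # {:error, {:unknown_op, "toy::missing"}}
```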

Configuration

Library path override

Skip the CMake build by pointing to a pre-built libtorchvision.so:

# In config/config.exs
config :extorch_vision, library_path: "/path/to/libtorchvision.so"

Or via environment variable:

TORCHVISION_LIB_PATH=/path/to/libtorchvision.so mix compile

Local development with extorch

By default, extorch_vision pulls ExTorch from Hex. For local development against a checkout of ExTorch, set the EXTORCH_PATH environment variable:

# Point to your local extorch checkout
export EXTORCH_PATH=../extorch

mix deps.get
mix test

This overrides the Hex dependency with a local path dependency, so changes to ExTorch are picked up immediately without publishing.
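
A common way to implement this kind of switch in a mix.exs is a small helper that inspects the environment variable. The sketch below shows the idea under that assumption; it is not taken from the package's actual mix.exs, and the DepsSketch module and extorch_dep/1 function are hypothetical names. The environment is passed in as a map so the helper can be exercised without changing real environment variables.

```elixir
defmodule DepsSketch do
  # Hypothetical helper, not the package's real mix.exs code.
  # Returns a path dependency when EXTORCH_PATH is set and non-empty,
  # otherwise the Hex requirement shown in the Installation section.
  def extorch_dep(env \\ System.get_env()) do
    case Map.get(env, "EXTORCH_PATH") do
      nil -> {:extorch, "~> 0.3.0"}
      "" -> {:extorch, "~> 0.3.0"}
      path -> {:extorch, path: path}
    end
  end
end

IO.inspect(DepsSketch.extorch_dep(%{}))
IO.inspect(DepsSketch.extorch_dep(%{"EXTORCH_PATH" => "../extorch"}))
```

In a real mix.exs the result of such a helper would be placed in the list returned by deps/0.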

Force rebuild

To rebuild libtorchvision.so from scratch (e.g., after upgrading CUDA or switching libtorch versions):

mix torchvision.build --force

License

MIT