# ExTorch.Vision

TorchVision ops for ExTorch -- detection, segmentation, and image I/O operators running on the BEAM.

ExTorch.Vision builds `libtorchvision.so` from source at compile time (against ExTorch's libtorch) and exposes all torchvision C++ operators through ExTorch's generic dispatcher. There is no Rust or C++ code in this package -- everything goes through `ExTorch.Native.dispatch_op/3`.

## Requirements

- ExTorch (provides libtorch)
- CMake >= 3.18
- C++17 compiler (gcc >= 7 or clang >= 5)
- CUDA toolkit (optional, for GPU support and NVJPEG)

## Installation

Add `extorch_vision` to your dependencies in `mix.exs`:

```elixir
def deps do
  [
    {:extorch, "~> 0.3.0"},
    {:extorch_vision, "~> 0.1.0"}
  ]
end
```

Then compile. `libtorchvision.so` is built from source automatically during `mix compile` (no manual steps):

```shell
mix deps.get
mix compile
```

## Usage

```elixir
# Lazy initialization: libtorchvision.so is loaded on the first call;
# alternatively, call ExTorch.Vision.setup!() explicitly

# Non-maximum suppression
boxes = ExTorch.tensor([[0.0, 0.0, 10.0, 10.0], [0.5, 0.5, 10.5, 10.5], [20.0, 20.0, 30.0, 30.0]])
scores = ExTorch.tensor([0.9, 0.8, 0.7])
# keep: indices of the retained boxes, highest score first
keep = ExTorch.Vision.nms(boxes, scores, 0.5)
```
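
With these inputs, the first two boxes overlap with an IoU of 90.25 / 109.75 (about 0.82), which is above the 0.5 threshold, so `keep` should hold indices 0 and 2. That follows torchvision's greedy NMS semantics: boxes in `(x1, y1, x2, y2)` format are visited in descending score order, a box is dropped when its IoU with an already-kept box exceeds the threshold, and the kept indices come back sorted by decreasing score. The pure-Elixir sketch below reproduces that logic on plain lists for reference only; the `NmsSketch` module is hypothetical and is not how ExTorch.Vision computes NMS (the real op runs inside libtorchvision via the dispatcher).

```elixir
defmodule NmsSketch do
  # Hypothetical reference implementation, not ExTorch.Vision's code.
  # Boxes are {x1, y1, x2, y2} tuples with x1 < x2 and y1 < y2.

  # Intersection-over-union of two boxes (0.0 if the union is empty).
  def iou({ax1, ay1, ax2, ay2}, {bx1, by1, bx2, by2}) do
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    if union > 0, do: inter / union, else: 0.0
  end

  # Greedy NMS: visit boxes by descending score and keep a box only if
  # its IoU with every already-kept box is <= iou_threshold.
  # Returns the kept indices, highest score first.
  def nms(boxes, scores, iou_threshold) do
    boxes
    |> Enum.zip(scores)
    |> Enum.with_index()
    |> Enum.sort_by(fn {{_box, score}, _idx} -> score end, :desc)
    |> Enum.reduce([], fn {{box, _score}, idx}, kept ->
      if Enum.all?(kept, fn {kept_box, _} -> iou(box, kept_box) <= iou_threshold end) do
        [{box, idx} | kept]
      else
        kept
      end
    end)
    |> Enum.reverse()
    |> Enum.map(fn {_box, idx} -> idx end)
  end
end
```

Called as `NmsSketch.nms(boxes, [0.9, 0.8, 0.7], 0.5)` with the three boxes from the usage example written as tuples, it returns `[0, 2]`.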

```elixir
# ROI Align (detection models)
features = ExTorch.rand({1, 256, 14, 14})
# Each ROI row is [batch_index, x1, y1, x2, y2]
rois = ExTorch.tensor([[0.0, 0.0, 0.0, 7.0, 7.0]])
pooled = ExTorch.Vision.roi_align(features, rois, 1.0, 7, 7)
```
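
Each row of `rois` above is `[batch_index, x1, y1, x2, y2]`, the 5-column format torchvision's `roi_align` op expects. Assuming the argument order follows torchvision's `roi_align(input, rois, spatial_scale, pooled_height, pooled_width, ...)` schema, the `1.0` is `spatial_scale`, which maps box coordinates onto the feature map: for boxes given in image pixels on a feature map downsampled 16x, it would be `1 / 16`. The result has one `C x pooled_h x pooled_w` map per ROI, so `pooled` should be a 1 x 256 x 7 x 7 tensor. The hypothetical `RoiSketch` helper below shows how per-image box lists could be flattened into that row format before being passed to `ExTorch.tensor/1`; it is an illustration, not part of ExTorch.Vision.

```elixir
defmodule RoiSketch do
  # Hypothetical helper, not part of ExTorch.Vision.
  # Flattens a list of per-image box lists ([x1, y1, x2, y2] each) into
  # [batch_index, x1, y1, x2, y2] rows, with the batch index as a float.
  def to_rois(boxes_per_image) do
    boxes_per_image
    |> Enum.with_index()
    |> Enum.flat_map(fn {boxes, batch_idx} ->
      Enum.map(boxes, fn [x1, y1, x2, y2] -> [batch_idx * 1.0, x1, y1, x2, y2] end)
    end)
  end

  # Expected roi_align output shape: one {channels, pooled_h, pooled_w} map per ROI.
  def output_shape(rois, channels, pooled_h, pooled_w) do
    {length(rois), channels, pooled_h, pooled_w}
  end
end
```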

```elixir
# Deformable Convolution v2
input = ExTorch.rand({1, 3, 8, 8})
weight = ExTorch.rand({8, 3, 3, 3})
offset = ExTorch.zeros({1, 18, 6, 6})
mask = ExTorch.ones({1, 9, 6, 6})
bias = ExTorch.zeros({8})
out = ExTorch.Vision.deform_conv2d(input, weight, offset, mask, bias, 1, 1, 0, 0)
```
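
The shapes in the deformable convolution example follow standard convolution arithmetic. With a 3x3 kernel, stride 1, no padding, and dilation 1, each spatial output size is (8 + 2*0 - 1*(3 - 1) - 1) / 1 + 1 = 6, and torchvision expects `offset` to carry 2 * offset_groups * kh * kw channels (18 here) and `mask` offset_groups * kh * kw channels (9 here), both at the 6x6 output resolution. The sketch below computes those shapes. The `DeformShapeSketch` module and its `:stride`, `:pad`, `:dilation`, and `:offset_groups` options are hypothetical, and it uses one value per option for both spatial dimensions, whereas the op itself takes separate height and width arguments.

```elixir
defmodule DeformShapeSketch do
  # Hypothetical helper; uses one value per option for both spatial dims.

  # Standard convolution output size along one spatial dimension.
  def out_size(size, kernel, stride, pad, dilation) do
    div(size + 2 * pad - dilation * (kernel - 1) - 1, stride) + 1
  end

  # Expected offset, mask and output shapes for deform_conv2d, given the
  # input shape {n, c_in, h, w} and weight shape {c_out, c_in_per_group, kh, kw}.
  def shapes({n, _c_in, h, w}, {c_out, _c_in_per_group, kh, kw}, opts \\ []) do
    stride = Keyword.get(opts, :stride, 1)
    pad = Keyword.get(opts, :pad, 0)
    dilation = Keyword.get(opts, :dilation, 1)
    offset_groups = Keyword.get(opts, :offset_groups, 1)

    out_h = out_size(h, kh, stride, pad, dilation)
    out_w = out_size(w, kw, stride, pad, dilation)

    %{
      offset: {n, 2 * offset_groups * kh * kw, out_h, out_w},
      mask: {n, offset_groups * kh * kw, out_h, out_w},
      output: {n, c_out, out_h, out_w}
    }
  end
end
```

For the example above, `DeformShapeSketch.shapes({1, 3, 8, 8}, {8, 3, 3, 3})` gives an offset of `{1, 18, 6, 6}`, a mask of `{1, 9, 6, 6}`, and an output of `{1, 8, 6, 6}`.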

```elixir
# Image I/O -- encode/decode without leaving the BEAM
image = ExTorch.randint(0, 255, {3, 224, 224}, dtype: :uint8)
png_bytes = ExTorch.Vision.encode_png(image)
decoded = ExTorch.Vision.decode_png(png_bytes)
```

```elixir
# GPU-accelerated JPEG decode (requires NVJPEG)
jpeg_data = ExTorch.Vision.encode_jpeg(image)
gpu_images = ExTorch.Vision.decode_jpegs_cuda([jpeg_data], 0, :cuda)
```

## Available operators

### Detection / segmentation

| Function | Description |
|---|---|
| `nms/3` | Non-maximum suppression |
| `roi_align/7` | Region of Interest Align (bilinear) |
| `roi_pool/5` | Region of Interest Pooling (max) |
| `ps_roi_align/6` | Position-sensitive ROI Align (R-FCN) |
| `ps_roi_pool/5` | Position-sensitive ROI Pooling |
| `deform_conv2d/14` | Deformable Convolution v2 |

### Image I/O

| Function | Description |
|---|---|
| `decode_jpeg/3` | Decode JPEG from uint8 tensor |
| `encode_jpeg/2` | Encode to JPEG bytes |
| `decode_png/3` | Decode PNG from uint8 tensor |
| `encode_png/2` | Encode to PNG bytes |
| `decode_webp/2` | Decode WebP |
| `decode_gif/1` | Decode GIF (animated supported) |
| `decode_image/3` | Auto-detect format and decode |
| `decode_jpegs_cuda/3` | Batch JPEG decode on GPU (NVJPEG) |
| `encode_jpegs_cuda/2` | Batch JPEG encode on GPU |
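
Format auto-detection of the kind `decode_image/3` performs typically works by recognizing the data's leading "magic" bytes. The pure-Elixir sketch below shows the standard signatures for the formats in this table: JPEG starts with `FF D8 FF`, PNG with the 8-byte `89 50 4E 47 0D 0A 1A 0A` signature, GIF with `GIF87a` or `GIF89a`, and WebP with `RIFF`, a 4-byte size, then `WEBP`. The `FormatSniff` module is a hypothetical illustration (for example, for rejecting unsupported files on the Elixir side before decoding), not ExTorch.Vision's or torchvision's detection code.

```elixir
defmodule FormatSniff do
  # Hypothetical magic-byte detection; not ExTorch.Vision's code.
  def detect(<<0xFF, 0xD8, 0xFF, _::binary>>), do: :jpeg
  def detect(<<0x89, "PNG", 0x0D, 0x0A, 0x1A, 0x0A, _::binary>>), do: :png
  def detect(<<"GIF87a", _::binary>>), do: :gif
  def detect(<<"GIF89a", _::binary>>), do: :gif
  # RIFF container: 4-byte little-endian size, then the "WEBP" form type
  def detect(<<"RIFF", _size::binary-size(4), "WEBP", _::binary>>), do: :webp
  def detect(_other), do: :unknown
end
```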

## ExTorch.Export integration

All ops are automatically registered with `ExTorch.Export.OpRegistry` when `setup!/0` is called. As a result, exported PyTorch models that use torchvision operators (e.g., Faster R-CNN with `torchvision::roi_align` and `torchvision::nms`) can be loaded and run via `ExTorch.Export.forward/2` without any additional configuration:

```elixir
# Register the torchvision ops before loading the exported model
ExTorch.Vision.setup!()
model = ExTorch.Export.load("faster_rcnn.pt2", device: :cuda)
output = ExTorch.Export.forward(model, [input_tensor])
```

## How it works

ExTorch.Vision contains zero C++ or Rust code. It works by:

- Building `libtorchvision.so` from source via CMake (at `mix torchvision.build` time)
- Loading it at runtime via `ExTorch.Native.load_torch_library/1` (which calls `dlopen`)
- Letting TorchVision register its ops with PyTorch's `c10::Dispatcher` via `TORCH_LIBRARY` blocks
- Calling ops from Elixir through `ExTorch.Native.dispatch_op/3`, which invokes the dispatcher

This architecture means any future torchvision ops are automatically available through `ExTorch.Native.dispatch_op/3` without code changes -- just rebuild `libtorchvision.so`.
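
The usage comments note that `libtorchvision.so` is loaded lazily on the first call, or eagerly via `setup!/0`. One common BEAM pattern for a load-once step like this is a guard stored in `:persistent_term`; the sketch below is hypothetical (the `LazyLoadSketch` module, its key, and the `loader` argument are illustrative, and ExTorch.Vision's actual mechanism is not described here). In real use, `loader` would be a function that calls `ExTorch.Native.load_torch_library/1`.

```elixir
defmodule LazyLoadSketch do
  # Hypothetical once-only loader; not ExTorch.Vision's implementation.
  @key {__MODULE__, :loaded}

  # Runs `loader` on the first call and caches its result; later calls
  # return the cached result without running `loader` again.
  # Note: concurrent first calls may both run `loader`; a GenServer or
  # :global.trans/2 would be needed to serialize them strictly.
  def ensure_loaded!(loader) when is_function(loader, 0) do
    case :persistent_term.get(@key, :not_loaded) do
      :not_loaded ->
        result = loader.()
        :persistent_term.put(@key, result)
        result

      result ->
        result
    end
  end
end
```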

## Configuration

### Library path override

Skip the CMake build by pointing to a pre-built `libtorchvision.so`:

```elixir
# In config/config.exs
config :extorch_vision, library_path: "/path/to/libtorchvision.so"
```

Or via environment variable:

```shell
TORCHVISION_LIB_PATH=/path/to/libtorchvision.so mix compile
```

### Local development with extorch

By default, `extorch_vision` pulls ExTorch from Hex. For local development against a checkout of ExTorch, set the `EXTORCH_PATH` environment variable:

```shell
# Point to your local extorch checkout
export EXTORCH_PATH=../extorch
mix deps.get
mix test
```

This overrides the Hex dependency with a local path dependency, so changes to ExTorch are picked up immediately without publishing.
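
The README does not show how `EXTORCH_PATH` is wired into the dependency list. One common `mix.exs` pattern is to choose between a path dependency and the Hex requirement when the project file is evaluated; the `DepSwitchSketch` module below is a hypothetical illustration of that pattern, not extorch_vision's actual `mix.exs`. It takes the environment as a map so the choice can be checked without touching real environment variables.

```elixir
defmodule DepSwitchSketch do
  # Hypothetical helper: use a local ExTorch checkout when EXTORCH_PATH is
  # set and non-empty, otherwise fall back to the Hex release.
  def extorch_dep(env \\ System.get_env()) do
    case Map.get(env, "EXTORCH_PATH") do
      nil -> {:extorch, "~> 0.3.0"}
      "" -> {:extorch, "~> 0.3.0"}
      path -> {:extorch, path: path}
    end
  end
end
```

In an actual `mix.exs`, the same logic would typically be a private `extorch_dep/0` function whose result is listed in `deps/0` in place of the fixed `{:extorch, "~> 0.3.0"}` entry.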

### Force rebuild

To rebuild `libtorchvision.so` from scratch (e.g., after upgrading CUDA or switching libtorch versions):

```shell
mix torchvision.build --force
```

## License

MIT