ExBashkit

Elixir NIF wrapper for bashkit — a sandboxed, virtual bash interpreter written in Rust.

Run bash scripts safely from Elixir: ~150 builtins (echo, grep, sed, awk, jq, cat, find, sort, …) are reimplemented in Rust, file I/O hits an in-memory virtual filesystem, and there is no fork/exec escape hatch. Nothing touches the host OS unless you explicitly grant it. That makes it safe to run untrusted scripts — for example, bash written by an LLM agent.

⚠️ Early days. Stateless ExBashkit.exec/1, persistent ExBashkit.Session sandboxes, virtual-filesystem and host mounts, resource limits, a network allowlist, Elixir-defined custom builtins, and snapshot/resume are all wired up (see Roadmap), but the project is young. See PORTING.md for the plan and current status.

Installation

Add ex_bashkit to your dependencies in mix.exs:

def deps do
  [
    {:ex_bashkit, "~> 0.1"}
  ]
end

A precompiled NIF is downloaded for your platform — no Rust toolchain required to use the library. Supported targets: {x86_64,aarch64}-apple-darwin and {x86_64,aarch64}-unknown-linux-gnu.

Quick start

iex> ExBashkit.exec("echo hello | tr a-z A-Z")
{:ok, %ExBashkit.Result{stdout: "HELLO\n", stderr: "", exit_code: 0}}

iex> ExBashkit.exec("for i in 1 2 3; do echo $((i * i)); done")
{:ok, %ExBashkit.Result{stdout: "1\n4\n9\n", exit_code: 0}}

# A non-zero exit is still {:ok, ...}: the script ran and chose to fail,
# exactly like in a real shell. (/etc/passwd doesn't exist in the empty virtual FS.)
iex> ExBashkit.exec("test -f /etc/passwd")
{:ok, %ExBashkit.Result{exit_code: 1}}
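Because a script's own non-zero exit and a sandbox-level failure arrive in different shapes, callers often want a single tagged result. A minimal sketch: MyApp.Shell is a hypothetical module, and the last clause assumes exec/1 reports sandbox-level failures as {:error, message}, the way Session.exec/2 does for exceeded limits.

```elixir
defmodule MyApp.Shell do
  # Hypothetical helper, not part of ExBashkit: folds "the script exited
  # non-zero" and "the sandbox refused to run it" into one tagged tuple.
  def run(script) do
    case ExBashkit.exec(script) do
      # The script ran and exited 0.
      {:ok, %ExBashkit.Result{exit_code: 0, stdout: out}} ->
        {:ok, out}

      # The script ran but chose to fail; keep the exit code and stderr.
      {:ok, %ExBashkit.Result{exit_code: code, stderr: err}} ->
        {:error, {:exit, code, err}}

      # Assumed shape for sandbox-level failures (see the lead-in).
      {:error, message} ->
        {:error, {:sandbox, message}}
    end
  end
end

MyApp.Shell.run("echo hello | tr a-z A-Z")
# => {:ok, "HELLO\n"}
```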

Persistent sessions

ExBashkit.exec/1 is stateless: each call gets a fresh sandbox. When you want state to carry across calls (like an interactive shell), use an ExBashkit.Session. Environment variables, the working directory, the in-memory filesystem, shell functions, and aliases all persist.

session = ExBashkit.Session.new()

ExBashkit.Session.exec(session, "export GREETING=hello")
ExBashkit.Session.exec(session, "cd /tmp && echo world > note.txt")

{:ok, result} = ExBashkit.Session.exec(session, "echo $GREETING $(cat /tmp/note.txt)")
result.stdout
# => "hello world\n"

Seed the initial state with options:

session =
  ExBashkit.Session.new(
    env: %{"LANG" => "C"},
    cwd: "/tmp",
    username: "alice",
    hostname: "my-server"
  )

ExBashkit.Session.exec(session, "whoami")   # => {:ok, %ExBashkit.Result{stdout: "alice\n", ...}}
ExBashkit.Session.exec(session, "pwd")      # => {:ok, %ExBashkit.Result{stdout: "/tmp\n", ...}}

A session serializes its own calls: concurrent exec/2 calls on the same session run one at a time. Separate sessions are fully independent.
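Since calls on one session queue up behind each other, parallel work should use separate sessions. A sketch that fans independent scripts out with Task.async_stream/2, one fresh session per task (the scripts themselves are placeholders):

```elixir
scripts = ["echo a", "echo b", "echo c"]

stdouts =
  scripts
  |> Task.async_stream(fn script ->
    # One session per task: separate sessions never wait on each other.
    session = ExBashkit.Session.new()
    {:ok, result} = ExBashkit.Session.exec(session, script)
    result.stdout
  end)
  # async_stream yields {:ok, value} per task, in input order by default.
  |> Enum.map(fn {:ok, stdout} -> stdout end)

stdouts
# => ["a\n", "b\n", "c\n"]
```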

Virtual filesystem

A session's filesystem is in-memory and shared between its scripts and your Elixir code. You can seed inputs, then pull results back out, without going through a script:

session = ExBashkit.Session.new(files: %{"/in/data.csv" => "a,1\nb,2\n"})

{:ok, _} = ExBashkit.Session.exec(session, "cut -d, -f1 /in/data.csv | sort > /out.txt")

ExBashkit.Session.read_file(session, "/out.txt")
# => {:ok, "a\nb\n"}

ExBashkit.Session.new(files: %{path => content}) seeds files up front (content is any iodata; parent dirs are created).
ExBashkit.Session.write_file(session, path, content) places a file at any time.
ExBashkit.Session.read_file(session, path) returns {:ok, binary} — including files a script wrote — round-tripping arbitrary (even non-UTF-8) bytes.
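A round trip through write_file/3, a script, and read_file/2 might look like the sketch below. The file names are arbitrary, and write_file/3's return value is deliberately not matched because this README doesn't specify it.

```elixir
session = ExBashkit.Session.new()

# Host -> sandbox: place an input file after the session already exists.
ExBashkit.Session.write_file(session, "/words.txt", "pear\napple\nfig\n")

# The script sees it like any other file and writes its output back to the FS.
{:ok, _} = ExBashkit.Session.exec(session, "sort /words.txt > /sorted.txt")

# Sandbox -> host: read the script's output without running another command.
{:ok, sorted} = ExBashkit.Session.read_file(session, "/sorted.txt")
sorted
# => "apple\nfig\npear\n"

# Arbitrary bytes round-trip too; this blob is not valid UTF-8.
ExBashkit.Session.write_file(session, "/blob.bin", <<0xFF, 0xFE, 0x00>>)
{:ok, <<0xFF, 0xFE, 0x00>>} = ExBashkit.Session.read_file(session, "/blob.bin")
```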

By default the filesystem is fully virtual — no host path is reachable.

Host mounts

To give a sandbox controlled access to real host directories, map them in with explicit access modes:

session =
  ExBashkit.Session.new(
    mounts: [
      {"/data", "/srv/app/data", :read_only},
      {"/work", "/tmp/sandbox-work", :read_write}
    ]
  )

{:ok, _} = ExBashkit.Session.exec(session, "wc -l /data/*.csv > /work/counts.txt")
# /tmp/sandbox-work/counts.txt now exists on the real disk.

:read_only — scripts read host files; writes fail.
:read_write — scripts read and modify real host files (a footgun — use a dedicated directory).

bashkit enforces the isolation: paths are canonicalized, and .. traversal or symlinks that escape the mounted directory are rejected — a mount of /srv/app/data can't reach /srv/app/secrets. Sensitive host locations (/etc, /home, /Users, /private, paths with .ssh/.aws, …) are refused by default; pass :allowed_mount_paths to opt in (note: setting it switches bashkit from the built-in denylist to allowlist-only gating). On macOS, temp dirs under /var/folders canonicalize beneath /private, so mounting them needs an allowlist entry. A refused or misconfigured mount raises from new/1.
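Because a refused mount raises from new/1 instead of returning an error, code that builds mounts from config or user input may prefer to rescue. A sketch with MyApp.Sandbox as a hypothetical wrapper; the README doesn't name the exception type, so it rescues any exception.

```elixir
defmodule MyApp.Sandbox do
  # Hypothetical wrapper: turns a refused or misconfigured mount into an
  # error tuple instead of letting Session.new/1 raise into the caller.
  def open(mounts) do
    {:ok, ExBashkit.Session.new(mounts: mounts)}
  rescue
    e -> {:error, Exception.message(e)}
  end
end

# /etc is on bashkit's default sensitive-path denylist, so new/1 refuses it.
MyApp.Sandbox.open([{"/cfg", "/etc", :read_only}])
```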

:overlay mounts (host-backed, copy-on-write) are intentionally not supported: bashkit has no real-FS overlay mode, and ExBashkit only exposes what bashkit does. For copy-on-write behavior, use the in-memory filesystem.

Resource limits

bashkit bounds execution with safe defaults; tighten them per session for untrusted scripts. Exceeding a limit returns {:error, message}.

session = ExBashkit.Session.new(limits: [max_commands: 1_000, timeout_ms: 2_000])

ExBashkit.Session.exec(session, "for i in {1..1000000}; do :; done")
# => {:error, "resource limit exceeded: maximum command count exceeded (1000)"}

Available limits: :max_commands, :max_loop_iterations, :max_total_loop_iterations, :max_function_depth, :max_input_bytes, :timeout_ms. Each is optional and defaults to bashkit's value.
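Limit errors come back as {:error, message}, separate from a script's own non-zero exit, so a caller can tell "the sandbox stopped it" from "it failed". A sketch with deliberately tight values chosen for illustration, not as recommendations; it uses a fresh session per script because this README doesn't say whether a session stays usable after hitting a limit.

```elixir
limits = [max_loop_iterations: 100, max_function_depth: 10, timeout_ms: 500]

classify = fn script ->
  # Fresh session per script, so one limit hit can't affect the next example.
  session = ExBashkit.Session.new(limits: limits)

  case ExBashkit.Session.exec(session, script) do
    {:ok, %ExBashkit.Result{exit_code: code}} -> {:finished, code}
    {:error, message} -> {:limit, message}
  end
end

# A runaway loop: stopped by the loop-iteration limit (or the timeout).
classify.("while true; do :; done")

# Unbounded recursion: stopped by the function-depth limit.
classify.("f() { f; }; f")

# A script that simply fails is not a limit error.
classify.("false")
# => {:finished, 1}
```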

Network access

A session cannot reach the network until you grant it an allowlist. :allow_net is default-deny — only requests matching a pattern's scheme, host, port, and path-prefix are permitted, and redirects are not followed.

session = ExBashkit.Session.new(allow_net: ["https://api.example.com"])

ExBashkit.Session.exec(session, "curl -s https://api.example.com/v1/health")
# => {:ok, %ExBashkit.Result{exit_code: 0, ...}}

ExBashkit.Session.exec(session, "curl -s https://evil.example")
# => blocked (non-zero exit) — not on the allowlist

Requests to private/reserved IPs (loopback, RFC 1918, link-local, …) are blocked by default to prevent SSRF, even when the URL is allowlisted; pass block_private_ips: false to reach a localhost service deliberately. Use allow_net: :all only for fully trusted scripts.
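A sketch of deliberately reaching a local development service. It assumes block_private_ips is a top-level Session.new/1 option alongside :allow_net, and the service on port 4000 is hypothetical.

```elixir
session =
  ExBashkit.Session.new(
    # Only this origin is allowlisted...
    allow_net: ["http://localhost:4000"],
    # ...and loopback is no longer blocked for this session.
    block_private_ips: false
  )

# Reaches the (hypothetical) local service.
ExBashkit.Session.exec(session, "curl -s http://localhost:4000/health")

# Anything off the allowlist is still denied and exits non-zero.
ExBashkit.Session.exec(session, "curl -s https://api.example.com/")
```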

Custom builtins

Register Elixir functions as virtual executables that a script can call. When a script runs one of them as name args…, the call goes back into your application, which returns the command's output. This is how you expose capabilities you control (a database query, a lookup, an approval step) without real process or network access.

session =
  ExBashkit.Session.new(
    builtins: %{
      "kv_get" => fn call ->
        case Map.fetch(%{"answer" => "42"}, hd(call.args)) do
          {:ok, value} -> {:ok, value <> "\n"}
          :error -> {:error, "no such key\n"}
        end
      end
    }
  )

ExBashkit.Session.exec(session, "echo \"the answer is $(kv_get answer)\"")
# => {:ok, %ExBashkit.Result{stdout: "the answer is 42\n", exit_code: 0}}

A builtin receives %{args:, stdin:, env:} and returns {:ok, iodata} (stdout, exit 0), {:error, iodata} (stderr, exit 1), or a full %ExBashkit.Result{}. A handler that raises or exceeds :builtin_timeout_ms fails only that command, not the session.
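A second sketch exercising the other parts of that contract: reading piped stdin, and returning a full %ExBashkit.Result{} to choose a custom exit code. The command names shout and approve are hypothetical, and it assumes stdin arrives as a binary.

```elixir
session =
  ExBashkit.Session.new(
    builtins: %{
      # Hypothetical: uppercases whatever is piped in.
      "shout" => fn %{stdin: stdin} -> {:ok, String.upcase(stdin)} end,
      # Hypothetical: exits 2 with a usage message when called without an ID.
      "approve" => fn
        %{args: [id | _]} ->
          {:ok, "approved #{id}\n"}

        %{args: []} ->
          %ExBashkit.Result{stdout: "", stderr: "usage: approve ID\n", exit_code: 2}
      end
    }
  )

ExBashkit.Session.exec(session, "echo hi | shout")
# => {:ok, %ExBashkit.Result{stdout: "HI\n", exit_code: 0, ...}}

# $? shows the exit code the handler chose.
ExBashkit.Session.exec(session, "approve; echo $?")
```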

Virtual filesystem backends

Mount an Elixir-backed filesystem at a path: the script's reads and writes under it are serviced by your application, so "files" can be generated on demand or proxied to a real store. A backend is a module implementing the ExBashkit.VirtualFs behaviour (as module or {module, arg}), or a single dispatch function for inline use.

session =
  ExBashkit.Session.new(
    virtual_fs: %{
      "/api" => fn
        %{op: :read, path: "/" <> name} -> {:ok, "generated: #{name}\n"}
        _ -> {:error, :enotsup}
      end
    }
  )

ExBashkit.Session.exec(session, "cat /api/widget")
# => {:ok, %ExBashkit.Result{stdout: "generated: widget\n", exit_code: 0}}

Reads and writes are both supported (read/write/append/mkdir/remove/list/stat); paths arrive rooted at the mount. A backend composes with the in-memory FS, :files, and host :mounts, and reuses the same back-call machinery (and failure isolation) as custom builtins.
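A dispatch function can serve data your application already holds. A sketch that exposes a map as read-only files and shows that a script path like /config/a.txt arrives as "/a.txt". The document contents are made up, and returning :enoent for unknown names is an assumption (the example above only shows :enotsup).

```elixir
# Hypothetical documents, keyed by their path *inside* the mount.
docs = %{"/a.txt" => "alpha\n", "/b.txt" => "beta\n"}

session =
  ExBashkit.Session.new(
    virtual_fs: %{
      "/config" => fn
        # A script's /config/a.txt arrives here as "/a.txt".
        %{op: :read, path: path} ->
          case Map.fetch(docs, path) do
            {:ok, content} -> {:ok, content}
            # Assumed error atom for a missing file.
            :error -> {:error, :enoent}
          end

        # Writes, listings, etc. are refused: this backend is read-only.
        _ ->
          {:error, :enotsup}
      end
    }
  )

ExBashkit.Session.exec(session, "cat /config/b.txt /config/a.txt")
# => {:ok, %ExBashkit.Result{stdout: "beta\nalpha\n", exit_code: 0}}
```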

Python (optional)

With the optional ex_monty dependency, a session can run sandboxed Python that shares the bash filesystem — so a file one step writes, the next step reads, across the bash/Python boundary, just like a real shell.

# add {:ex_monty, "~> ..."} to your deps, then:
session = ExBashkit.Session.new(python: true)

ExBashkit.Session.exec(session, """
  printf '1\\n2\\n3\\n' > /nums.txt
  python -c "from pathlib import Path; \\
             print(sum(int(x) for x in Path('/nums.txt').read_text().split()))"
""")
# => {:ok, %ExBashkit.Result{stdout: "6\n", exit_code: 0}}

python: true registers python and python3. A script runs python file.py, python -c "…", or a program piped on stdin; Python's pathlib/os filesystem operations are routed to the same virtual filesystem (cat, >, mounts, and :virtual_fs all interoperate). Python runs fully sandboxed — every effect except the filesystem and os.getenv is denied (no network, no clock) — and a Python error or timeout fails only that command, never the session.

It's an Elixir-defined builtin over the same back-call bridge as :builtins, so there's no change to the precompiled NIF; you opt in purely by adding ex_monty to your deps. (Current limits: no sys.argv; pathlib.Path I/O, not open().)

Without ex_monty, ExBashkit still compiles and runs normally, because ex_monty is an optional dependency gated at runtime. The only difference is that python: true then raises a clear ArgumentError at Session.new/1 telling you to add the dep (fail fast, never a mysterious crash mid-script). A session created without python: is unaffected: a script that runs python simply gets command-not-found, exactly as if the executable weren't installed.
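Since python: true raises when ex_monty is missing, code that wants Python only when it's available can check at runtime. A sketch, assuming ex_monty's top-level module is named ExMonty (after the sibling project):

```elixir
# Enable the python builtin only if the optional dependency is loaded.
session =
  if Code.ensure_loaded?(ExMonty) do
    ExBashkit.Session.new(python: true)
  else
    ExBashkit.Session.new()
  end

# With ex_monty this prints "2\n"; without it, the script gets
# command-not-found, as described above.
ExBashkit.Session.exec(session, "python3 -c 'print(1 + 1)'")
```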

Snapshot & resume

Capture a session's state to a binary and reload it later — after a restart, or on another node. snapshot/2 serializes the shell state (variables, env, cwd, aliases, functions) and in-memory filesystem contents; restore/3 loads it back.

session = ExBashkit.Session.new()
{:ok, _} = ExBashkit.Session.exec(session, "x=42; echo data > /work.txt")

{:ok, bytes} = ExBashkit.Session.snapshot(session)
# ...persist `bytes`, restart, come back later...

resumed = ExBashkit.Session.new()
{:ok, resumed} = ExBashkit.Session.restore(resumed, bytes)
ExBashkit.Session.exec(resumed, "echo $x; cat /work.txt")
# => {:ok, %ExBashkit.Result{stdout: "42\ndata\n", exit_code: 0}}

A snapshot carries interpreter state, not session configuration: custom :builtins, :virtual_fs backends, host :mounts, and :limits are live Elixir processes / builder config, not bytes. To resume a session that used them, rebuild it with the same capabilities, then restore — the backends re-attach live and only the shell + in-memory FS travel in the snapshot. restore/3 preserves the target session's capabilities and validates the whole snapshot before mutating, so a bad snapshot returns {:error, _} and leaves the session usable.

For snapshots that cross a trust boundary (network, shared storage, untrusted input), pass key: — an HMAC secret that must match on restore; a wrong key or tampered bytes are rejected. Without a key, the embedded digest detects accidental corruption only (it is public, not a forgery defense). :exclude_filesystem and :exclude_functions trim what is captured.
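A sketch of a keyed snapshot. The randomly generated key stands in for a real secret from your configuration, and treating :exclude_filesystem as a boolean option is an assumption.

```elixir
# Stand-in secret; in practice load it from your app's configuration.
key = :crypto.strong_rand_bytes(32)

session = ExBashkit.Session.new()
{:ok, _} = ExBashkit.Session.exec(session, "x=1")

# Sign the snapshot and skip the filesystem (assumed boolean option).
{:ok, bytes} = ExBashkit.Session.snapshot(session, key: key, exclude_filesystem: true)

# The same key restores successfully...
{:ok, resumed} = ExBashkit.Session.restore(ExBashkit.Session.new(), bytes, key: key)

# ...while a different key is rejected, leaving the target session usable.
other = ExBashkit.Session.new()
{:error, _} = ExBashkit.Session.restore(other, bytes, key: :crypto.strong_rand_bytes(32))
```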

Using a session as an LLM tool

ExBashkit deliberately ships no Tool module. Wiring a sandbox to an LLM takes a handful of plain data (a JSON schema, a system prompt, and a function that runs a tool call and formats the result), and every agent framework wants that data in its own shape. So it's a short recipe rather than a dependency:

session = ExBashkit.Session.new(python: true)

# 1. The tool's input schema (mirrors bashkit's BashTool contract):
schema = %{
  "type" => "object",
  "required" => ["commands"],
  "properties" => %{"commands" => %{"type" => "string"}}
}

# 2. Run one tool call -> the string the model sees:
run = fn %{"commands" => commands} ->
  case ExBashkit.Session.exec(session, commands) do
    {:ok, %ExBashkit.Result{stdout: out, stderr: err, exit_code: code}} ->
      out <> (if err == "", do: "", else: "\n[stderr]\n" <> err) <>
              (if code == 0, do: "", else: "\n[exit #{code}]")
    {:error, message} -> "tool error: #{message}"
  end
end

Because a session persists state across calls, the model can build up a workspace over a multi-step turn (write a file, process it, run python3 on it) — exactly what you want from an agentic shell. Plug run into any framework, e.g. ReqLLM:

{:ok, tool} =
  ReqLLM.Tool.new(
    name: "bash",
    description: "Run bash in a sandboxed virtual shell.",
    parameter_schema: [commands: [type: :string, required: true]],
    callback: fn args -> {:ok, run.(args)} end
  )

A complete, runnable version (with a system prompt and a simulated agent turn) is in examples/llm_tool.exs.

Why a virtual bash?

|  | Real `System.cmd/3` | ExBashkit |
|---|---|---|
| Spawns OS processes | yes (`fork`/`exec`) | no, pure in-process |
| Host filesystem | full access | virtual, empty by default |
| Network | unrestricted | denied by default; opt-in per-URL allowlist |
| Safe for untrusted input | no | yes |
| Determinism / reproducibility | depends on host | high |

It's the same design philosophy as its sibling ExMonty (sandboxed Python): the guest language runs inert, and the host grants capabilities. bashkit even embeds monty for its optional python builtin.

Security model

Filesystem: in-memory virtual FS; no host paths are reachable unless you explicitly mount them (:read_only / :read_write), with canonicalization, escape rejection, and a sensitive-path default-deny enforced by bashkit.
Processes: none. All commands are reimplemented Rust builtins.
Network: off by default; opt-in per-URL allowlist (:allow_net) with redirect-blocking and private-IP/SSRF protection enforced by bashkit.
Resource limits: command count, loop iterations, recursion depth, input size, and a wall-clock timeout — tunable per session via :limits.
Isolation: each exec/1 runs in an independent sandbox; a Session is an independent sandbox that persists across its own calls.
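Putting those defaults together, a locked-down session for untrusted scripts might look like this sketch; the input file and limit values are illustrative, not recommendations.

```elixir
# No :mounts, so no host path is reachable; no :allow_net, so the network
# stays denied; no :builtins or :virtual_fs, so no back-calls into the app.
session =
  ExBashkit.Session.new(
    files: %{"/in/task.txt" => "untrusted input\n"},
    limits: [max_commands: 5_000, max_loop_iterations: 1_000, timeout_ms: 1_000]
  )

{:ok, result} = ExBashkit.Session.exec(session, "cat /in/task.txt")
result.stdout
# => "untrusted input\n"
```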

Development

To build the NIF from source (instead of downloading a precompiled one):

export EXBASHKIT_BUILD=1
mix deps.get
mix test

This requires a Rust toolchain. The first build is slow — bashkit and its dependencies are large.

CI runs mix format --check-formatted, cargo fmt --check, cargo clippy -- -D warnings, and mix test on every push/PR.

Roadmap

See PORTING.md for the staged plan. In brief:

✅ Stateless exec/1 (skeleton, proves the toolchain)
✅ Persistent sessions (state across calls)
✅ Virtual filesystem — in-memory seed/read/write, plus :read_only / :read_write host-directory mounts
✅ Resource limits (:limits — commands, loops, recursion, input size, timeout)
✅ Network allowlist (:allow_net — default-deny per-URL, SSRF protection)
✅ Elixir-defined custom builtins (:builtins — call back into your app)
✅ Dynamic Elixir-backed filesystem (:virtual_fs — same back-call bridge)
✅ Sandboxed python builtin (optional ex_monty; shares the session FS). sqlite/typescript dropped (use a back-call); native bashkit interpreters not pursued (not on crates.io, would break the pin)
✅ Snapshot / resume (snapshot/2 + restore/3, keyed or plain)
✅ LLM tool contract — a documented recipe (examples/llm_tool.exs), not a module: a session is a tool in ~10 lines, framework-agnostic

Relationship to bashkit

ExBashkit pins an exact bashkit version and vendors no logic — all execution semantics come from upstream. Version bumps follow UPDATE_PROCEDURE.md.

Releasing

Releases are automated. Pushing a vX.Y.Z tag builds the precompiled NIFs, creates a GitHub release, and publishes to Hex — pausing for a manual approval before anything ships. You never hand-build checksums or re-tag.

One-time setup. Hex no longer mints API keys from the CLI (auth is OAuth); generate one at hex.pm/dashboard/keys with the api permission, then store it scoped to the hex environment:

gh secret set HEX_API_KEY --env hex --repo jtippett/ex_bashkit

To cut a release, run the release assistant from master and follow the prompts:

just release          # or, without just:  elixir scripts/release.exs

It shows the current and published versions, asks for a patch / minor / major bump (you pick the level; no version numbers to type), rolls the [Unreleased] section of CHANGELOG.md into the new version, then commits, tags, and pushes. That kicks off release.yml, which builds NIFs for all four targets and creates the GitHub release.

Then approve the publish: open the workflow run → Review deployments → approve the hex environment. On approval it generates checksum-Elixir.ExBashkit.Native.exs from the released artifacts and runs mix hex.publish.

Keep notes under ## [Unreleased] in CHANGELOG.md as you work — the assistant rolls them into each release. Don't commit the checksum file or move a published tag by hand; the pipeline owns both. See UPDATE_PROCEDURE.md for bumping the pinned bashkit version.