Automator
Chrome DevTools Protocol (CDP) scraper for Elixir. Spawn headless Chromium, navigate pages, evaluate JavaScript, and extract data — all through a clean, idiomatic Elixir API.
Installation
Add :automator to your dependencies:
def deps do
[
{:automator, "~> 0.1.0"}
]
end
Requires Chromium installed and available on PATH as chromium.
Quick Start
# Start a scraper (spawns Chromium + connects automatically)
{:ok, scraper} = Automator.Scraper.start_link()
# Navigate to a page
Automator.Scraper.navigate(scraper, "https://example.com")
# Evaluate JavaScript
title = Automator.Scraper.eval(scraper, "document.title")
# => "Example Domain"
# Wait for an element to appear
Automator.Scraper.wait_for_selector(scraper, "h1")
# Click an element
Automator.Scraper.click(scraper, "a")
# Take a screenshot (returns base64)
%{"data" => base64} = Automator.Scraper.screenshot(scraper)
File.write!("page.png", Base.decode64!(base64))
# Set cookies
Automator.Scraper.set_cookie(scraper, "session", "abc123", ".example.com")
# Cleanup
Automator.Scraper.stop(scraper)Architecture
Automator has three layers, from high-level to low-level:
┌─────────────────────────────────────────┐
│ Automator.Scraper (GenServer) │ ← Primary API
│ Manages browser + page, simple fns │
├─────────────────────────────────────────┤
│ Automator.Client (WebSockex) │ ← Raw CDP commands
│ WebSocket JSON-RPC client │
├─────────────────────────────────────────┤
│ Automator.Chromium (Process mgmt) │ ← Browser lifecycle
│ Spawns/kills headless Chromium │
└─────────────────────────────────────────┘
Most users only need Automator.Scraper. Use Client when you need direct access to CDP domains not exposed by the scraper. Use Chromium when you want to manage the browser lifecycle yourself.
API Reference
Automator.Scraper
High-level scraping API. A GenServer that owns a Chromium instance and a page-level WebSocket connection.
start_link/0
Spawns headless Chromium and connects to a blank page.
{:ok, scraper} = Automator.Scraper.start_link()
Returns {:ok, pid}.
navigate/2
Navigates to a URL. Waits ~1 second for the page to load before returning.
Automator.Scraper.navigate(scraper, "https://example.com")Parameters:
| Param | Type | Description |
|---|---|---|
pid | pid | Scraper process |
url | String.t() | URL to navigate to |
eval/2
Evaluates JavaScript in the page context. Supports async/await — promises are awaited automatically.
Automator.Scraper.eval(scraper, "document.title")
# => "Example Domain"
Automator.Scraper.eval(scraper, "document.querySelectorAll('a').length")
# => 1
Automator.Scraper.eval(scraper, "Array.from(document.querySelectorAll('a')).map(a => a.href)")
# => ["https://www.iana.org/domains/example"]
# Async example
Automator.Scraper.eval(scraper, """
await fetch('/api/data').then(r => r.json())
""")Parameters:
| Param | Type | Description |
|---|---|---|
pid | pid | Scraper process |
js | String.t() | JavaScript expression |
Returns: The JavaScript result value, converted to an Elixir term.
click/2
Clicks an element matching a CSS selector.
Automator.Scraper.click(scraper, "button.submit")
# => true
Automator.Scraper.click(scraper, ".nonexistent")
# => falseParameters:
| Param | Type | Description |
|---|---|---|
pid | pid | Scraper process |
selector | String.t() | CSS selector |
Returns:true if element found and clicked, false otherwise.
wait_for_selector/3
Waits for an element to appear in the DOM using a MutationObserver (not polling).
Automator.Scraper.wait_for_selector(scraper, "h1")
# => :ok
Automator.Scraper.wait_for_selector(scraper, ".dynamic-content", 5000)
# => :ok
Automator.Scraper.wait_for_selector(scraper, ".nonexistent", 1000)
# => {:error, "selector .nonexistent not found within 1000ms"}Parameters:
| Param | Type | Default | Description |
|---|---|---|---|
pid | pid | — | Scraper process |
selector | String.t() | — | CSS selector |
timeout | integer() | 10_000 | Max wait time in ms |
Returns::ok or {:error, reason}.
screenshot/1
Captures a screenshot of the current page as a base64-encoded PNG.
%{"data" => base64} = Automator.Scraper.screenshot(scraper)
File.write!("screenshot.png", Base.decode64!(base64))Parameters:
| Param | Type | Description |
|---|---|---|
pid | pid | Scraper process |
Returns:%{"data" => base64_string}.
set_cookie/4
Sets a cookie for the given domain.
Automator.Scraper.set_cookie(scraper, "session", "abc123", ".example.com")
# => %{"success" => true}Parameters:
| Param | Type | Description |
|---|---|---|
pid | pid | Scraper process |
name | String.t() | Cookie name |
value | String.t() | Cookie value |
domain | String.t() |
Cookie domain (e.g., ".example.com") |
stop/1
Stops the scraper and kills the Chromium process.
Automator.Scraper.stop(scraper)
# => :okAutomator.Chromium
Low-level browser process management. Use this when you want to manage the Chromium lifecycle yourself and connect multiple clients.
spawn/0
Launches headless Chromium on an available port.
browser = Automator.Chromium.spawn()
# => %{
# chromium: #Port<0.5>,
# os_pid: 12345,
# port: 9222,
# ws_url: "ws://localhost:9222/devtools/browser/..."
# }Flags used:
| Flag | Value |
|---|---|
--headless | new |
--no-sandbox | — |
--disable-gpu | — |
--window-size | 1920,1080 |
--remote-debugging-port | auto-detected |
Returns: A map with :chromium (port ref), :os_pid, :port, and :ws_url.
kill/1
Kills the Chromium process by OS PID.
browser = Automator.Chromium.spawn()
Automator.Chromium.kill(browser)Parameters:
| Param | Type | Description |
|---|---|---|
browser | map() |
Map returned by spawn/0 |
Automator.Client
Low-level WebSocket client for sending raw CDP commands. Use this when you need access to CDP domains not exposed by Scraper.
start_link/1
Connects to a Chromium WebSocket debugger URL.
{:ok, client} = Automator.Client.start_link("ws://localhost:9222/devtools/browser/...")Parameters:
| Param | Type | Description |
|---|---|---|
ws_url | String.t() |
WebSocket URL from Chromium.spawn().ws_url or /json endpoint |
send_command/3
Sends a CDP command and blocks until the response arrives.
# Browser-level command
{:ok, result} = Automator.Client.send_command(client, "Browser.getVersion")
IO.inspect(result["product"])
# => "Chrome/145.0.7632.159"
# Page-level command
{:ok, page_client} = Automator.Client.start_link(page_ws_url)
{:ok, _} = Automator.Client.send_command(page_client, "Page.navigate", %{url: "https://example.com"})
# With parameters
{:ok, result} = Automator.Client.send_command(page_client, "Runtime.evaluate", %{
expression: "document.title",
returnByValue: true
})Parameters:
| Param | Type | Default | Description |
|---|---|---|---|
pid | pid | — | Client process |
method | String.t() | — | CDP method name |
params | map() | %{} | Command parameters |
Returns:{:ok, result} or {:error, error}.
See the CDP protocol documentation for all available domains and methods.
Common Patterns
Scraping a list of items
{:ok, scraper} = Automator.Scraper.start_link()
Automator.Scraper.navigate(scraper, "https://example.com/products")
items = Automator.Scraper.eval(scraper, """
Array.from(document.querySelectorAll('.product')).map(el => ({
name: el.querySelector('.name').textContent,
price: el.querySelector('.price').textContent,
url: el.querySelector('a').href
}))
""")
Automator.Scraper.stop(scraper)Waiting for dynamic content
{:ok, scraper} = Automator.Scraper.start_link()
Automator.Scraper.navigate(scraper, "https://example.com")
# Wait for SPA to render
Automator.Scraper.wait_for_selector(scraper, ".app-root", 15_000)
# Interact
Automator.Scraper.click(scraper, "button.load-more")
Automator.Scraper.wait_for_selector(scraper, ".item:nth-child(20)", 10_000)
# Extract
data = Automator.Scraper.eval(scraper, "window.__INITIAL_STATE__")
Automator.Scraper.stop(scraper)Using cookies for authenticated sessions
{:ok, scraper} = Automator.Scraper.start_link()
# Set auth cookie
Automator.Scraper.set_cookie(scraper, "auth_token", "secret", ".example.com")
# Navigate — already authenticated
Automator.Scraper.navigate(scraper, "https://example.com/dashboard")
profile = Automator.Scraper.eval(scraper, "document.querySelector('.profile').textContent")
Automator.Scraper.stop(scraper)Raw CDP access for advanced use cases
# Start scraper for browser management
{:ok, scraper} = Automator.Scraper.start_link()
Automator.Scraper.navigate(scraper, "https://example.com")
# Access performance metrics via CDP
Automator.Scraper.eval(scraper, "performance.getEntriesByType('navigation')[0]")
# Or use Client directly for any CDP domain
# (e.g., Network, DOM, CSS, Accessibility, etc.)
Automator.Scraper.stop(scraper)CDP Domains
Through Automator.Client.send_command/3, you have access to the full Chrome DevTools Protocol. Commonly useful domains:
| Domain | Use case |
|---|---|
Page | Navigation, screenshots, lifecycle events |
Runtime | JavaScript evaluation, object inspection |
DOM | DOM tree traversal, node manipulation |
Network | Request/response interception, cookies |
CSS | Stylesheet inspection, computed styles |
Input | Mouse/keyboard simulation |
Emulation | Device emulation, viewport, geolocation |
Browser | Browser info, window management |
Target | Tab/page management |
See the full CDP reference for every method.
License
MIT