Chroxy

A proxy service that mediates access to Chrome running in headless mode, for use in high-frequency application load testing, end-user behaviour simulation and programmatic access to Chrome DevTools.

Chroxy automatically initialises an underlying Chrome browser page when a connection is requested, and closes that page once the WebSocket connection is closed.

This project was born out of necessity, as we needed to orchestrate a large number of concurrent browser scenario executions, with low-level control and advanced introspection capabilities.

Features

Cowboy Compatibility

Cowboy is a major dependency of Phoenix, so this notice records which versions of Cowboy are hard dependencies of each Chroxy release. It will be removed at version 1.0 of Chroxy.

Cowboy 1.x: Chroxy version <= 0.5.1
Cowboy 2.x: Chroxy version > 0.6.0

Project Goals

The objective of this project is to enable connections to headless Chrome instances with minimal overhead and abstraction. Unlike browser testing frameworks such as Hound and Wallaby, Chroxy aims to provide direct, unfettered access to the underlying browser over the Chrome Debug Protocol, while supporting many thousands of concurrent connections and channelling them to an underlying pool of Chrome browser resources.

Elixir Supervision of Chrome OS Processes - Resiliency

Chroxy uses Elixir processes and OTP supervision to manage the Chrome instances. It also includes a transparent proxy that automatically initialises and terminates the underlying Chrome page based on the lifetime of the upstream connection.
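To illustrate the restart behaviour that OTP supervision provides, here is a minimal sketch. It does not reproduce Chroxy's own supervision tree: `BrowserStub` and `SupervisionSketch` are hypothetical names, and the stub only holds a port number where a real worker would own a Chrome OS process.

```elixir
defmodule BrowserStub do
  # Hypothetical stand-in for a process that would own a Chrome OS process.
  use GenServer

  def start_link(opts) do
    GenServer.start_link(__MODULE__, opts, name: __MODULE__)
  end

  @impl true
  def init(opts), do: {:ok, opts}
end

defmodule SupervisionSketch do
  # Starts a supervisor that restarts the stub whenever it terminates
  # (children are :permanent by default).
  def start do
    children = [{BrowserStub, port: 9222}]
    Supervisor.start_link(children, strategy: :one_for_one)
  end
end
```

After `SupervisionSketch.start()`, killing the registered `BrowserStub` process should cause the supervisor to start a replacement under the same name.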

Getting Started

Get dependencies and compile:

$ mix do deps.get, compile

Run the Chroxy Server:

$ mix run --no-halt

Run with an attached session:

$ iex -S mix

Run Docker image

The image exposes ports 1330 and 1331 (the default ports for the connection API and the Chrome proxy endpoint, respectively).

$ docker build . -t chroxy
$ docker run -p 1330:1330 -p 1331:1331 chroxy

Operation Examples:

Using Chroxy Client & ChromeRemoteInterface

Establish 100 Browser Connections:

# Each call asks Chroxy for a page and opens a session to it.
clients = Enum.map(1..100, fn _ ->
  ChroxyClient.page_session!(%{host: "localhost", port: 1330})
end)

Run 100 Asynchronous browser operations:

# Navigate every page concurrently; Stream.run/1 forces the lazy stream.
clients
|> Task.async_stream(fn client ->
  url = "https://github.com/holsee"
  {:ok, _} = ChromeRemoteInterface.RPC.Page.navigate(client, %{url: url})
end, timeout: :infinity)
|> Stream.run()

You can then use any Page-related functionality provided by ChromeRemoteInterface.

Use any client that speaks Chrome Debug Protocol:

Get the address for a connection:

$ curl http://localhost:1330/api/v1/connection

ws://localhost:1331/devtools/page/2CD7F0BC05863AB665D1FB95149665AF

With this address you can establish a connection to the Chrome instance (routed via a transparent proxy).
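A client usually needs the host, port and page identifier from that address before opening the WebSocket. The following sketch splits the address using only the Elixir standard library; `WsAddressSketch` is a hypothetical helper and is not part of Chroxy or ChromeRemoteInterface.

```elixir
defmodule WsAddressSketch do
  # Splits a Chroxy connection address into the parts a WebSocket client needs.
  def parse(address) when is_binary(address) do
    uri = address |> String.trim() |> URI.parse()

    case uri do
      %URI{scheme: scheme, host: host, port: port, path: path}
      when scheme in ["ws", "wss"] and is_binary(host) and is_binary(path) ->
        # The last path segment is treated as the page identifier.
        {:ok, %{scheme: scheme, host: host, port: port, page_id: Path.basename(path)}}

      _ ->
        {:error, :invalid_ws_address}
    end
  end
end
```

Anything that is not a `ws://` or `wss://` address is rejected, so an accidental HTTP URL is caught before a client tries to connect.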

Configuration

The configuration is designed to be container-friendly and is therefore driven by environment variables.

Chroxy as a Library

def deps do
  [{:chroxy, "~> 0.3"}]
end

If you use Chroxy as a dependency of another Mix project, you may wish to reuse Chroxy's configuration implementation by replicating the configuration found in "../deps/chroxy/config/config.exs".

Example: Create a Page Session, Registering for Event and Navigating to URL

ws_addr = Chroxy.connection()
{:ok, page} = ChromeRemoteInterface.PageSession.start_link(ws_addr)
ChromeRemoteInterface.RPC.Page.enable(page)
ChromeRemoteInterface.PageSession.subscribe(page, "Page.loadEventFired", self())
url = "https://github.com/holsee"
{:ok, _} = ChromeRemoteInterface.RPC.Page.navigate(page, %{url: url})
# Message Received by self() => {:chrome_remote_interface, "Page.loadEventFired", _}

Configuration Variables

Ports, Proxy Host and Endpoint Scheme are managed via Env Vars.

| Variable | Default | Description |
| --- | --- | --- |
| CHROXY_CHROME_PORT_FROM | 9222 | Starting port in the Chrome Browser port range |
| CHROXY_CHROME_PORT_TO | 9223 | Last port in the Chrome Browser port range |
| CHROXY_PROXY_HOST | "127.0.0.1" | Host which is substituted to route connections via proxy |
| CHROXY_PROXY_PORT | 1331 | Port which proxy listener will accept connections on |
| CHROXY_ENDPOINT_SCHEME | :http | HTTP or HTTPS |
| CHROXY_ENDPOINT_PORT | 1330 | HTTP API will register on this port |
| CHROXY_CHROME_SERVER_PAGE_WAIT_MS | 200 | Milliseconds to wait after asking chrome to create a page |
| CHROME_CHROME_SERVER_CRASH_DUMPS_DIR | "/tmp" | Directory to which chrome will write crash dumps |
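As an illustration of how such variables can be read with fallbacks, here is a small sketch covering a subset of the table above. It is not Chroxy's actual config.exs: `EnvConfigSketch` is a hypothetical module, and treating the FROM/TO pair as an inclusive range is an assumption.

```elixir
defmodule EnvConfigSketch do
  # Defaults copied from the table above; values are strings, as in the OS env.
  @defaults %{
    "CHROXY_CHROME_PORT_FROM" => "9222",
    "CHROXY_CHROME_PORT_TO" => "9223",
    "CHROXY_PROXY_HOST" => "127.0.0.1",
    "CHROXY_PROXY_PORT" => "1331",
    "CHROXY_ENDPOINT_PORT" => "1330"
  }

  # Overlays the given environment on the defaults, ignoring unrelated variables.
  def read(env \\ System.get_env()) do
    Map.merge(@defaults, Map.take(env, Map.keys(@defaults)))
  end

  # Expands the FROM/TO pair into a list of ports (assumes an inclusive range).
  def chrome_ports(config) do
    from = String.to_integer(config["CHROXY_CHROME_PORT_FROM"])
    to = String.to_integer(config["CHROXY_CHROME_PORT_TO"])
    Enum.to_list(from..to)
  end
end
```

Passing the environment as a map keeps the sketch easy to exercise without touching the real OS environment, for example `EnvConfigSketch.read(%{"CHROXY_CHROME_PORT_TO" => "9225"})`.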

Components

Proxy

An intermediary TCP proxy monitors both the upstream client connection and the downstream Chrome RSP WebSocket connection, so that resources can be cleaned up after connections are closed.

Chroxy.ProxyListener - Incoming Connection Management & Delegation

Chroxy.ProxyServer - Dynamically Configured Transparent Proxy

Chroxy.ProxyServer.Hook - Behaviour for ProxyServer hooks. Example: ChromeProxy
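The cleanup-on-close idea behind the proxy can be pictured with plain process monitors. This is only a sketch of the general OTP technique, not the code of Chroxy.ProxyServer or its hooks: `CleanupSketch` is a hypothetical module, and an ordinary process stands in for a connection.

```elixir
defmodule CleanupSketch do
  # Spawns a watcher that monitors `conn_pid` and calls `cleanup_fun`
  # with the exit reason once the monitored process is down.
  def watch(conn_pid, cleanup_fun) when is_pid(conn_pid) and is_function(cleanup_fun, 1) do
    spawn(fn ->
      ref = Process.monitor(conn_pid)

      receive do
        {:DOWN, ^ref, :process, ^conn_pid, reason} -> cleanup_fun.(reason)
      end
    end)
  end
end
```

Because the watcher is a separate process, the cleanup still runs if the connection process crashes. If the connection has already exited when the monitor is set up, the callback receives `:noproc` as the reason.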

Chrome Browser Management

Chrome is the first browser supported, and the following server processes manage the communication and lifetime of the Chrome Browsers and Tabs.

Chroxy.ChromeProxy - Implements ProxyServer.Hook for Chrome resource management

Chroxy.ChromeServer - Wraps Chrome Browser OS Process

Chroxy.BrowserPool - Inits & Controls access to pool of browser processes

Chroxy.BrowserPool.Chrome - Chrome Process Pool

HTTP API - Chroxy.Endpoint

GET /api/v1/connection

Returns a WebSocket URI (ws://) for a Chrome browser page, routed via the proxy. This is the first port of call for an external client connecting to the service.

Request:

$ curl http://localhost:1330/api/v1/connection

Response:

ws://localhost:1331/devtools/page/2CD7F0BC05863AB665D1FB95149665AF
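The same request can be made from Elixir without extra dependencies by using Erlang's built-in `:httpc` client. The sketch below assumes the endpoint answers with status 200 and the WebSocket address as a plain-text body, as in the curl example above; `ConnectionApiSketch` is a hypothetical module name.

```elixir
defmodule ConnectionApiSketch do
  # Builds the URL of the connection endpoint for a given host and port.
  def url(host \\ "localhost", port \\ 1330) do
    "http://#{host}:#{port}/api/v1/connection"
  end

  # Requests a connection and returns the WebSocket address from the body.
  def fetch(host \\ "localhost", port \\ 1330) do
    {:ok, _started} = Application.ensure_all_started(:inets)
    request = {String.to_charlist(url(host, port)), []}

    case :httpc.request(:get, request, [], body_format: :binary) do
      {:ok, {{_http_version, 200, _reason}, _headers, body}} ->
        {:ok, String.trim(body)}

      # Any other status is reported as an error (an assumption of this sketch).
      {:ok, {{_http_version, status, _reason}, _headers, _body}} ->
        {:error, {:http_status, status}}

      {:error, reason} ->
        {:error, reason}
    end
  end
end
```

With a Chroxy server running locally, `ConnectionApiSketch.fetch()` should return `{:ok, address}`, where `address` is a `ws://` URI that can be handed to any Chrome Debug Protocol client.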

Kubernetes

The following is an example configuration which can be used to run Chroxy on Kubernetes.

deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: crawler
  namespace: default
  labels:
    app: myApp
    tier: crawler

spec:
  replicas: 1
  revisionHistoryLimit: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
  selector:
    matchLabels:
      app: myApp
      tier: crawler
  template:
    metadata:
      labels:
        app: myApp
        tier: crawler
    spec:
      containers:
        - image: eu.gcr.io/..../...:latest # your consumer
          name: api
          imagePullPolicy: Always
          resources:
            requests:
              cpu: 30m
              memory: 100Mi
          ports:
            - containerPort: 4000
          env:
          - name: USER_AGENT
            value: ...
          - name: INSTANCE_NAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name

        # [START chroxy]
        - name: headless-chrome
          image: eu.gcr.io/..../chroxy:latest # chroxy
          imagePullPolicy: Always
          resources:
            requests:
              cpu: 30m
              memory: 100Mi
          env:
            - name: CHROXY_CHROME_PORT_FROM
              value: "9222"
            - name: CHROXY_CHROME_PORT_TO
              value: "9223"
          ports:
            - containerPort: 1331
            - containerPort: 1330
        # [END chroxy]

service.yaml

apiVersion: v1
kind: Service
metadata:
  namespace: default
  name: crawler-api
  labels:
    app: myApp
    tier: crawler
spec:
  selector:
    app: myApp
    tier: crawler
  ports:
  - port: 4000
    protocol: TCP