
Note: This library is under active development and the API may change.

AshScylla

An Ash Framework data layer for ScyllaDB/Apache Cassandra



Overview

AshScylla enables you to use ScyllaDB or Apache Cassandra as a persistence layer for your Ash Framework resources. It implements the Ash.DataLayer behaviour using Xandra (a native Elixir CQL driver) to communicate via CQL (Cassandra Query Language).

Key Benefits


Quick Start

Prerequisites

Installation

Add ash_scylla to your dependencies in mix.exs:

def deps do
  [
    {:ash_scylla, "~> 0.7.0"}
  ]
end

Minimal Setup

1. Configure a Repo:

# lib/my_app/repo.ex
defmodule MyApp.Repo do
  use AshScylla.Repo,
    otp_app: :my_app
end

2. Configure the Repo in config/config.exs:

config :my_app, MyApp.Repo,
  nodes: ["127.0.0.1:9042"],
  keyspace: "my_app_dev",
  pool_size: 10

3. Add the Repo to your supervision tree:

# lib/my_app/application.ex
children = [
  MyApp.Repo,
  # ...
]

4. Generate a Resource Template:

# Simple resource
mix ash_scylla.new_template User name:string, email:string
# Resource with domain (auto-prefixes module name)
mix ash_scylla.new_template User name:string --domain MyApp.Domain
# Resource with fully-qualified module name
mix ash_scylla.new_template User name:string --resource MyApp.Domain.User

This creates lib/my_app/resources/user.ex with a starter template. Or define it manually:

# lib/my_app/resources/user.ex
defmodule MyApp.User do
  use Ash.Resource,
    data_layer: AshScylla.DataLayer,
    repo: MyApp.Repo
  attributes do
    uuid_primary_key :id
    attribute :name, :string
    attribute :email, :string
  end
  actions do
    defaults [:create, :read, :update, :destroy]
  end
end

5. Create a Domain:

# lib/my_app/domain.ex
defmodule MyApp.Domain do
  use Ash.Domain
  resources do
    resource MyApp.User
  end
end

6. Create Keyspace and Tables:

# Create keyspace (using the mix task)
mix ash_scylla.setup
# Or programmatically
MyApp.Repo.create_keyspace()
# Run migrations (includes schema files from priv/migrations)
mix ash_scylla.migrate
# Or run only schema files
mix ash_scylla.migrate --schemas-only
# Or run resource migrations only (skip schema files)
mix ash_scylla.migrate --resource MyApp.User

6a. Generate Schema Migrations from Ash DSL:

# Auto-generate schema file from all AshScylla resources
mix ash_scylla.gen --dev
# Generate with a specific schema module name
mix ash_scylla.gen AddUserTable
# Generate for a specific resource only
mix ash_scylla.gen --resource MyApp.User

This scans your project for Ash resources using AshScylla.DataLayer and produces a priv/migrations/<timestamp>_schema.ex file containing CREATE TABLE and CREATE INDEX CQL statements derived from each resource's attributes and secondary indexes.

Schema migration files in priv/migrations use AshScylla.Schema and implement change/0 to return a list of CQL statements. They are executed before resource-driven migrations when running mix ash_scylla.migrate.
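As an illustration of the shape described above, a hand-written schema file might look like the following sketch. Only `use AshScylla.Schema` and `change/0` come from the description; the module name, the timestamp in the file name, the table layout, and the index name are assumptions made for the example, and the generator's real output may differ.

```elixir
# priv/migrations/20240101000000_schema.ex (file name and timestamp are illustrative)
defmodule MyApp.Migrations.AddUserTable do
  # ASSUMPTION: the module name is hypothetical; use whatever the generator emits.
  use AshScylla.Schema

  # Returns the CQL statements to execute, in order.
  def change do
    [
      """
      CREATE TABLE IF NOT EXISTS users (
        id uuid PRIMARY KEY,
        name text,
        email text
      )
      """,
      "CREATE INDEX IF NOT EXISTS users_email_idx ON users (email)"
    ]
  end
end
```

The `IF NOT EXISTS` clauses keep the statements safe to re-run, which is convenient if a migration is applied more than once during development.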

7. Start Using It:

# Ash.Query.filter/2 is a macro, so the module must be required first
require Ash.Query
# Create
{:ok, user} = Ash.create(MyApp.User, %{name: "John", email: "john@example.com"})
# Read
users =
  MyApp.User
  |> Ash.Query.filter(email == "john@example.com")
  |> Ash.read!()
# Update
{:ok, updated} =
  user
  |> Ash.Changeset.for_update(:update, %{name: "John Doe"})
  |> Ash.update()
# Delete
:ok = Ash.destroy(user)

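Or through code interface functions on the domain (these exist only if create_user and read_users are defined in the domain's code interface):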

# Create via domain
{:ok, user} = MyApp.Domain.create_user(%{name: "John", email: "john@example.com"})
# Read via domain
users = MyApp.Domain.read_users!()

Features

Core Ash Features ✅

| Feature | Status | Description |
| --- | --- | --- |
| Create | ✅ | Insert records with TTL support |
| Read | ✅ | Query with filtering and sorting |
| Update | ✅ | Update existing records |
| Destroy | ✅ | Delete records |
| Filter | ✅ | Powerful filter syntax with CQL WHERE conversion |
| Sort | ⚠️ | ORDER BY on clustering columns only (within a partition) |
| Keyset pagination | ✅ | Token-based pagination via paging_state (preferred over OFFSET) |
| Limit | ✅ | LIMIT is natively supported |
| Offset | ⚠️ | Not natively supported in ScyllaDB; the data layer silently drops OFFSET. Use keyset pagination instead. |
| Select | ✅ | Select specific fields |
| Multitenancy | ✅ | Keyspace-based multitenancy |
| Bulk Create | ✅ | Batch INSERT operations |

ScyllaDB-Specific Features 🚀

TTL (Time To Live)

Automatically expire data after a specified time:

defmodule MyApp.Session do
  use Ash.Resource,
    data_layer: AshScylla.DataLayer
  ash_scylla do
    ttl 3600 # Expire after 1 hour
  end
end

Consistency Levels

Configure read/write consistency per resource:

ash_scylla do
  consistency :quorum # :any, :one, :two, :three, :quorum, :all, :local_quorum
end

Secondary Indexes

Query non-primary key columns efficiently:

ash_scylla do
  secondary_index :email # Single column
  secondary_index [:name, :age] # Composite index
end

Materialized Views

Create alternative query patterns with automatic view maintenance:

ash_scylla do
  materialized_view :users_by_email,
    primary_key: [:email, :id],
    include_columns: [:name, :age]
end

Batch Operations

Reduce network round-trips with BATCH statements:

# Bulk create (uses BATCH internally); Ash.bulk_create/4 returns an Ash.BulkResult struct
%Ash.BulkResult{status: :success, records: users} =
  Ash.bulk_create(user_data_list, MyApp.User, :create, return_records?: true)
# Async partition-aware batching for large datasets
AshScylla.DataLayer.Batch.batch_insert_async(repo, statements,
  resource: MyApp.User,
  max_concurrency: 8
)

Token-Based Pagination

Efficient pagination without OFFSET:

ash_scylla do
  pagination :token # Use token-based pagination instead of OFFSET
end

Per-Action Consistency

Configure consistency levels per action:

ash_scylla do
  consistency :quorum # Default consistency
  per_action_consistency read: :one, create: :quorum # Per-action overrides
end

Data Modeling Best Practices

ScyllaDB is a wide-column store optimized for specific query patterns. Follow these principles:

1. Query-First Design 🎯

Design your tables around your queries, not the other way around:

# Good: Partition key supports your main query
defmodule MyApp.User do
  attributes do
    attribute :email, :string, primary_key?: true # Partition key
    attribute :name, :string
  end
end
# Query by partition key (efficient)
MyApp.User
|> Ash.Query.filter(email == "user@example.com")
|> Ash.read_one()

2. Denormalization is Normal 📦

Duplicate data across tables to support different query patterns:

# Table for querying posts by author
defmodule MyApp.PostByAuthor do
  attributes do
    attribute :author_id, :uuid, primary_key?: true
    attribute :post_id, :uuid, primary_key?: true
    attribute :title, :string
    attribute :content, :string
  end
end
# Table for querying posts by date
defmodule MyApp.PostByDate do
  attributes do
    attribute :date, :date, primary_key?: true
    attribute :post_id, :uuid, primary_key?: true
    attribute :title, :string
    attribute :author_name, :string # Denormalized
  end
end

3. Choose Partition Keys Wisely 🔑

# Good: User ID has high cardinality
attribute :user_id, :uuid, primary_key?: true
# Avoid: Status has low cardinality (creates hotspots)
attribute :status, :string, primary_key?: true # Don't do this

Configuration

Resource Configuration

defmodule MyApp.User do
  use Ash.Resource,
    data_layer: AshScylla.DataLayer
  ash_scylla do
    table "users" # Override table name
    keyspace "custom_keyspace" # Override keyspace
    consistency :quorum # Consistency level
    ttl 3600 # Default TTL (seconds)
    # Secondary indexes
    secondary_index :email
    secondary_index [:name, :age]
    # Materialized views
    materialized_view :users_by_email,
      primary_key: [:email, :id],
      include_columns: [:name, :age]
  end
end

Repo Configuration

Single-node connection:

config :my_app, MyApp.Repo,
  nodes: ["127.0.0.1:9042"],
  keyspace: "my_app_dev"

Multi-node cluster connection (all nodes must use the same port):

config :my_app, MyApp.Repo,
  nodes: ["scylla-1:9042", "scylla-2:9042", "scylla-3:9042"],
  keyspace: "my_app_prod",
  pool_size: 10,
  connect_timeout: 5_000

Cluster with a non-standard port (autodiscovered_nodes_port is auto-detected):

config :my_app, MyApp.Repo,
  nodes: ["scylla-1:9043", "scylla-2:9043", "scylla-3:9043"],
  keyspace: "my_app_prod"

Or set it explicitly:

config :my_app, MyApp.Repo,
  nodes: ["scylla-1:9043", "scylla-2:9043"],
  autodiscovered_nodes_port: 9043,
  keyspace: "my_app_prod"

NOTE: Xandra.Cluster uses a single autodiscovered_nodes_port for all discovered peers because ScyllaDB/Cassandra system.peers does not advertise ports. All cluster nodes must use the same port, or the connection falls back to single-node mode with a warning.

Pool Size Guidelines:

ScyllaDB works best with a connections-per-shard approach: pool_size = num_nodes * num_cores_per_node
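As a worked example of the formula above, the sketch below computes a starting pool size from the node count and the cores per node. The module and function names are hypothetical and not part of AshScylla; treat the result as a starting point to tune, not a guaranteed optimum.

```elixir
defmodule PoolSizeSketch do
  @moduledoc "Hypothetical helper, not part of AshScylla."

  # Applies pool_size = num_nodes * num_cores_per_node.
  def recommended(num_nodes, cores_per_node)
      when is_integer(num_nodes) and num_nodes > 0 and
             is_integer(cores_per_node) and cores_per_node > 0 do
    num_nodes * cores_per_node
  end
end

# A 3-node cluster with 8 cores per node:
PoolSizeSketch.recommended(3, 8)
# => 24
```

The resulting value would go into the repo's pool_size option.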


Limitations

Since ScyllaDB/Cassandra is a NoSQL wide-column store, some features are not supported:

| Limitation | Reason | Workaround |
| --- | --- | --- |
| No JOINs | No relational joins | Denormalize or application-side joins |
| No complex aggregations | No GROUP BY, COUNT across partitions | Materialized views or custom aggregation |
| No ACID transactions | Only lightweight transactions (LWT) | Use LWT for single-partition operations |
| Limited WHERE clauses | Without indexes, only PK queries are efficient; filtering on non-indexed columns raises errors | Create secondary indexes or materialized views for non-PK query patterns |
| No OR conditions | CQL limitation | Multiple queries or UNION-like patterns |
| No foreign keys | No relational integrity | Application-level validation |
| OFFSET not supported | ScyllaDB has no native OFFSET; it would require full table scan | Use keyset pagination with pagination :token. The data layer silently drops OFFSET to prevent performance disasters. |
| Cluster requires same port | Xandra.Cluster uses one autodiscovered_nodes_port for all peers | Configure all ScyllaDB nodes on the same port, or use single-node connection |
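The "multiple queries" workaround for OR conditions can be sketched as follows: run one read per branch of the OR, then merge the results and drop duplicates by primary key. The helper below is hypothetical (not part of AshScylla) and uses plain maps in place of records so that it runs on its own.

```elixir
defmodule OrQuerySketch do
  @moduledoc "Hypothetical helper, not part of AshScylla."

  # Concatenates the result lists of several single-condition queries and
  # keeps the first occurrence of each record, identified by `key`.
  def merge_results(result_lists, key \\ :id) do
    result_lists
    |> Enum.concat()
    |> Enum.uniq_by(&Map.fetch!(&1, key))
  end
end

# In an application, each list would come from its own read, for example one
# filtered on email and one filtered on name. Plain maps stand in for records.
by_email = [%{id: 1, name: "John"}]
by_name = [%{id: 1, name: "John"}, %{id: 2, name: "John"}]

OrQuerySketch.merge_results([by_email, by_name])
# => [%{id: 1, name: "John"}, %{id: 2, name: "John"}]
```

Merging in the application keeps each individual query on an efficient path, at the cost of one round-trip per branch.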

Observability

Telemetry

AshScylla emits standard :telemetry events for all query and batch operations, enabling integration with LiveDashboard, Datadog, OpenTelemetry, and other observability tools.

Query events:

Batch events:

Attaching a handler:

:telemetry.attach(
  "ash_scylla-logger",
  [:ash_scylla, :query, :stop],
  &MyApp.Telemetry.handle_event/4,
  nil
)
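The handler referenced in the attach call is not defined anywhere in this README. A minimal sketch of what MyApp.Telemetry could look like follows. It assumes the :stop event carries a :duration measurement in native time units, which is the :telemetry.span/3 convention; the actual measurement and metadata keys emitted by AshScylla are not listed here, so the handler logs the metadata as-is rather than relying on specific keys.

```elixir
defmodule MyApp.Telemetry do
  require Logger

  # Handler with the 4-arity shape expected by :telemetry.attach/4.
  def handle_event(event, measurements, metadata, _config) do
    Logger.info(
      "#{Enum.join(event, ".")} took #{inspect(duration_ms(measurements))} ms " <>
        "metadata=#{inspect(metadata)}"
    )

    :ok
  end

  # ASSUMPTION: :duration is in native time units (the :telemetry.span/3 convention).
  def duration_ms(%{duration: duration}) when is_integer(duration) do
    System.convert_time_unit(duration, :native, :millisecond)
  end

  # Measurements without an integer :duration yield nil instead of crashing.
  def duration_ms(_measurements), do: nil
end
```

Keeping the handler free of pattern matches on unknown keys matters because :telemetry detaches a handler that raises.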

Prepared Statement Caching

For high-throughput workloads, enable the prepared statement cache to eliminate repeated query parsing overhead on ScyllaDB:

# In your supervision tree
children = [
  AshScylla.PreparedStatementCache,
  # ... other children
]

Documentation

For detailed documentation, see:


Testing

Run the test suite:

# All tests (unit + integration; requires Podman for testcontainers)
mix test
# Unit tests only (no ScyllaDB required)
mix test --exclude integration
# Integration tests only (requires Podman)
mix test --only integration
# CI pipeline (unit tests + credo)
mix test.ci

Test Structure

Tests are organized by feature domain under test/unit/ and test/integration/.

Unit Tests (test/unit/)

No ScyllaDB instance required. All tests use fake/mock repos or inline resources.

| Directory | Feature |
| --- | --- |
| unit/autogenerate/ | UUID autogeneration |
| unit/batch/ | Batch operations, bulk create, partition grouping |
| unit/connection/ | Xandra connection, prepared statement cache |
| unit/data_layer/ | CRUD, callbacks, feature flags, upsert |
| unit/dsl/ | DSL options, resource definition, repo/migration |
| unit/error/ | Error handling, edge cases |
| unit/filter/ | Filter validation, edge cases |
| unit/identifier/ | Identifier sanitization consistency |
| unit/mix_helpers/ | Mix helper utilities |
| unit/query/ | Query builder, optimizer, edge cases |
| unit/schema/ | Schema behaviour, schema loader |
| unit/security/ | CQL injection prevention |
| unit/source_cache/ | Table name resolution, caching |
| unit/telemetry/ | Span/batch_span telemetry |
| unit/types/ | Type conversion, type pipeline |
| unit/workload/ | Concurrent workload stress tests |

Integration Tests (test/integration/)

Require a running ScyllaDB instance. Can use either testcontainers (Podman) or a direct connection.

| File | Description | Multi-node? |
| --- | --- | --- |
| integration/scylla_integration_test.exs | Full ScyllaDB integration (CRUD, secondary indexes, clustering keys, consistency levels, concurrent operations) | No |
| integration/data_layer_integration_test.exs | DataLayer pipeline against real ScyllaDB | No |
| integration/pipeline_integration_test.exs | DSL → DataLayer → QueryBuilder → ScyllaDB end-to-end | No |
| integration/basic_integration_test.exs | Basic integration placeholder | No |
| integration/cluster_integration_test.exs | Multi-node cluster topology, cluster formation, cross-node reads/writes | Yes |

Running Integration Tests With a Local ScyllaDB

Integration tests can run against a pre-existing ScyllaDB instance — no container runtime needed. Set the SCYLLA_DIRECT environment variable and optionally override the host/port:

# Connect to ScyllaDB at localhost:9042 (defaults)
SCYLLA_DIRECT=1 mix test --only integration
# Connect to a remote ScyllaDB instance
SCYLLA_DIRECT=1 SCYLLA_HOST=db.example.com SCYLLA_PORT=9042 mix test --only integration
# Run a specific integration test file
SCYLLA_DIRECT=1 mix test test/integration/scylla_integration_test.exs
SCYLLA_DIRECT=1 mix test test/integration/data_layer_integration_test.exs

Note: The cluster integration test (cluster_integration_test.exs) is automatically skipped when SCYLLA_DIRECT is set. It requires multi-node container orchestration and cannot run against a single ScyllaDB instance.

ScyllaDB Configuration for Direct Connection

| Env Var | Default | Description |
| --- | --- | --- |
| SCYLLA_DIRECT | | Set to 1 to enable direct connection mode |
| SCYLLA_HOST | 127.0.0.1 | ScyllaDB hostname or IP address |
| SCYLLA_PORT | 9042 | ScyllaDB CQL transport port |

If you have authentication enabled on your ScyllaDB cluster, configure the repo in your config/test.exs:

config :my_app, MyApp.Repo,
  nodes: ["db.example.com:9042"],
  keyspace: "my_app_test",
  authentication: {Xandra.Auth.Password, username: "cassandra", password: "cassandra"}

Running Cluster Integration Tests

The cluster integration test supports three modes:

Container Mode (default)

Spins up a 3-node ScyllaDB cluster using testcontainers (Podman). Tests cluster formation, cross-node reads/writes, and concurrent operations.

# Run cluster integration test (requires Podman)
mix test test/integration/cluster_integration_test.exs --only integration
# Run with increased timeout (first run may take longer to pull images)
MIX_ENV=test mix test test/integration/cluster_integration_test.exs --only integration --timeout 300000

Prerequisites:

Cluster Mode (multi-node)

Connects to an already-running multi-node ScyllaDB cluster. Requires SCYLLA_NODES with comma-separated host:port pairs.

# Connect to a multi-node cluster
TEST_CLUSTER=true SCYLLA_NODES="node1:9042,node2:9042,node3:9042" \
  mix test test/integration/cluster_integration_test.exs --only integration

Single-Node Direct Mode

Connects to a single ScyllaDB instance at SCYLLA_HOST:SCYLLA_PORT.

# Connect to a single-node at localhost:9042 (defaults)
SCYLLA_DIRECT=1 mix test test/integration/cluster_integration_test.exs --only integration
# Connect to a single-node with custom host/port
SCYLLA_DIRECT=1 SCYLLA_HOST=db.example.com SCYLLA_PORT=9042 \
  mix test test/integration/cluster_integration_test.exs --only integration

Configuration:

| Env Var | Default | Description |
| --- | --- | --- |
| TEST_CLUSTER | false | Set to true for multi-node cluster mode |
| SCYLLA_DIRECT | | Set to 1 for single-node direct connection |
| SCYLLA_NODES | | Comma-separated host:port pairs (cluster mode) |
| SCYLLA_HOST | 127.0.0.1 | Single host (single-node mode) |
| SCYLLA_PORT | 9042 | Single port (single-node mode) |

Note: TEST_CLUSTER=true connects to each node directly for concurrent multi-node operations. SCYLLA_DIRECT=1 without TEST_CLUSTER connects to a single node only.
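To make the precedence between these variables concrete, the sketch below resolves a connection mode from a map of environment variables. It is a hypothetical helper written for illustration, not the project's actual test support code; only the variable names, defaults, and precedence are taken from the table and note above.

```elixir
defmodule ClusterTestEnvSketch do
  @moduledoc "Hypothetical helper, not the project's actual test support code."

  # Resolves the test mode from a map of environment variables.
  # TEST_CLUSTER=true wins over SCYLLA_DIRECT=1; with neither set,
  # the container mode is used.
  def mode(env) when is_map(env) do
    cond do
      env["TEST_CLUSTER"] == "true" ->
        {:cluster, parse_nodes(Map.get(env, "SCYLLA_NODES", ""))}

      env["SCYLLA_DIRECT"] == "1" ->
        host = Map.get(env, "SCYLLA_HOST", "127.0.0.1")
        port = Map.get(env, "SCYLLA_PORT", "9042")
        {:direct, ["#{host}:#{port}"]}

      true ->
        :container
    end
  end

  # Splits "a:9042, b:9042" into ["a:9042", "b:9042"].
  defp parse_nodes(value) do
    value
    |> String.split(",", trim: true)
    |> Enum.map(&String.trim/1)
    |> Enum.reject(&(&1 == ""))
  end
end

ClusterTestEnvSketch.mode(%{"TEST_CLUSTER" => "true", "SCYLLA_NODES" => "node1:9042, node2:9042"})
# => {:cluster, ["node1:9042", "node2:9042"]}
```

In a real test helper the map would come from System.get_env/0, which returns all environment variables as a map of strings.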

What it tests:

Integration tests use testcontainer_ex to spin up ScyllaDB instances automatically via Podman (container mode).


Contributing

Contributions are welcome! Here's how to get started:

  1. Fork the repository
  2. Clone your fork: git clone https://github.com/your-username/ash_scylla.git
  3. Create a feature branch: git checkout -b feature/my-feature
  4. Make your changes
  5. Run tests: mix test
  6. Commit your changes: git commit -am 'Add some feature'
  7. Push to the branch: git push origin feature/my-feature
  8. Create a Pull Request

Development Setup

# Install dependencies
mix deps.get
# Start ScyllaDB via Podman Compose (includes health checks)
podman-compose -f podman-compose.yml up -d
# Or start ScyllaDB manually
podman run -p 9042:9042 docker.io/scylladb/scylla:latest
# Run tests
mix test

Dev Container

A .devcontainer/devcontainer.json is provided for VS Code Dev Containers. It brings up both Elixir and ScyllaDB together via Podman Compose.

Integration Test

export CONTAINER_ENGINE=podman
export CONTAINER_ENGINE_HOST='unix:///private/var/folders/76/xt0kl9zj2ks6wsl1q13513h40000gn/T/podman/podman-machine-default-api.sock'
MIX_ENV=test mix test.integration

Note: The socket path shown above is machine-specific, so check the actual value on your local machine. Automatic detection will be added in the future.


License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.


Acknowledgments


Made with ❤️ for the Elixir and Ash communities