Khafra Search
Khafra has changed from a deployment handler for search clusters into a manager of real-time search data in a cluster, built on Ecto and SQL schemas and behaviours.
Features:
- Implements core functionality of Manticore & Sphinx Search through Giza
- Behaviour to back your Ecto Schemas with search tables
- Behaviour to back SQL tables with search tables
- Automated distributed table creation
- Timed and executable tasks for refreshing search data
- RabbitMQ queue supported for mass distributed updates
Khafra includes the Giza Sphinx Client for Elixir
Installation
    def deps do
      [
        {:khafra_search, "~> 0.3"}
      ]
    end

    # Add to your application or supervisor
    def start(_type, _args) do
      import Supervisor.Spec

      # List all child processes to be supervised
      children = [
        ...,
        Khafra.Supervisor
      ]

      opts = [strategy: :one_for_one, name: YourApp.Supervisor]
      Supervisor.start_link(children, opts)
    end

Optional
Install RabbitMQ to enable queued operations. It is the only supported queue for now.
By default, operations are streamed or applied immediately, so a queue is not required.
Search Behaviours
Khafra exposes two behaviours that mark a module as backable by a real-time
search table. Implement one of them on the schema/module that represents your
data, and Khafra will create a matching Manticore table on startup and keep it
in sync as rows are inserted or updated through Khafra.insert/2 and
Khafra.update/2.
Khafra.SearchBehaviour (Ecto)
For modules using Ecto.Schema. The behaviour requires a single callback,
index_fields/0, returning the list of schema fields that should be indexed
for full-text search. The schema's @source (table name) is used to derive
the Manticore table and its _dist distributed alias.
lib/sample/test_schema.ex:

    defmodule Khafra.Sample.TestSchema do
      use Ecto.Schema
      import Ecto.Changeset

      @behaviour Khafra.SearchBehaviour

      schema "test" do
        field :city, :string
        field :temp_lo, :integer
        field :temp_hi, :integer
        field :score, :float
        field :desc, :string
        timestamps()
      end

      def changeset(test, attrs) do
        cast(test, attrs, [:city, :temp_lo, :temp_hi, :score, :desc])
      end

      @impl Khafra.SearchBehaviour
      def index_fields, do: [:city, :desc]
    end
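To illustrate the naming convention described above, here is a minimal sketch of how a table name and its _dist alias could be derived from a schema module. It uses neither Khafra nor Ecto: NamingSketch and FakeSchema are hypothetical stand-ins, and FakeSchema only mimics the `__schema__(:source)` reflection function that Ecto generates for real schemas. Khafra's actual derivation may differ.

```elixir
defmodule FakeSchema do
  # Hypothetical stand-in for an Ecto schema declared with `schema "test"`.
  # Real Ecto schemas expose the same reflection call.
  def __schema__(:source), do: "test"

  # Mirrors the index_fields/0 callback from the example above.
  def index_fields, do: [:city, :desc]
end

defmodule NamingSketch do
  # Illustrative helper, not part of Khafra's API.
  def table_names(schema) do
    source = schema.__schema__(:source)

    %{
      table: source,
      distributed: source <> "_dist",
      fields: schema.index_fields()
    }
  end
end

IO.inspect(NamingSketch.table_names(FakeSchema))
```

With the sample schema, the distributed alias comes out as "test_dist", which is the table name queried in the Ecto example further down.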
Khafra.SearchBehaviourSQL (~SQL sigil)
For modules built on the elixir-dbvisor/sql
library instead of Ecto. Two callbacks are required:
- table_name/0: the underlying SQL table name as an atom
- index_fields/0: a Keyword.t() of field: type pairs to index
lib/sample/test_sql.ex:

    defmodule Khafra.Sample.TestSql do
      @behaviour Khafra.SearchBehaviourSQL

      @impl Khafra.SearchBehaviourSQL
      def table_name, do: :book

      @impl Khafra.SearchBehaviourSQL
      def index_fields, do: [id: :integer, title: :string, description: :string]
    end
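As a rough illustration of why both callbacks are needed, the sketch below turns a table name and a field: type keyword list into a Manticore CREATE TABLE statement. The CreateTableSketch module and its type mapping are assumptions made for this example; the statement Khafra actually generates may look different.

```elixir
defmodule CreateTableSketch do
  # Assumed mapping from the behaviour's field types to Manticore column types.
  # Illustration only; Khafra's real mapping may differ.
  @type_map %{string: "text", integer: "integer", float: "float"}

  def create_statement(table, fields) when is_atom(table) do
    columns =
      fields
      # Manticore tables carry an implicit id column, so it is skipped here.
      |> Keyword.delete(:id)
      |> Enum.map_join(", ", fn {name, type} ->
        "#{name} #{Map.fetch!(@type_map, type)}"
      end)

    "CREATE TABLE #{table} (#{columns})"
  end
end

:book
|> CreateTableSketch.create_statement(id: :integer, title: :string, description: :string)
|> IO.puts()
```

An unknown type raises a KeyError in this sketch, which is one simple way to surface a misconfigured index_fields/0 early.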
Search Examples
The modules under lib/sample/ show how Khafra is intended to be wired into
an application. Two complete samples are included: one driven by Ecto
(Khafra.Sample) and one driven by ~SQL (Khafra.Sample.SampleSQL).
Ecto example
Khafra.insert/2 and Khafra.update/2 accept the result of a normal Ecto
operation and, when the schema implements Khafra.SearchBehaviour,
transparently mirror the row into the Manticore real-time table.
lib/sample/sample.ex (excerpt):

    def add_city(%{} = attrs, opts) do
      %TestSchema{}
      |> TestSchema.changeset(attrs)
      |> @repo.insert()
      |> Khafra.insert(opts)
    end

    def update_city(city, %{} = attrs, opts) do
      city
      |> TestSchema.changeset(attrs)
      |> @repo.update()
      |> Khafra.update(opts)
    end

    def find_cities(search_string) do
      ManticoreQL.new()
      |> ManticoreQL.from("test_dist")
      |> ManticoreQL.match("*#{search_string}*")
      |> Giza.send()
    end
A convenience path is also available through Khafra.match/2, which accepts
either an Ecto query struct or a %SQL{} struct and dispatches to the right
distributed table:
    Khafra.match(from t in TestSchema, where: t.city == "Tokyo")
~SQL example
When using the ~SQL sigil, Khafra.SearchBehaviourSQL is paired with
Giza.SearchTables.replace/3 to push rows into Manticore alongside the
primary write. Khafra.match/2 understands %SQL{} structs directly and
translates WHERE field = 'value' predicates into Manticore MATCH
expressions against the _dist table.
lib/sample/sampl_sql.ex (excerpt):

    def add_book(%{id: id, title: title, description: description}, _opts) do
      Enum.to_list(
        ~SQL"INSERT INTO book (id, title, description) VALUES ({{id}}, {{title}}, {{description}})"
      )

      SearchTables.replace(@table, ["id", "title", "description"], [id, title, description])
    end

    def find_books(search_string) do
      ManticoreQL.new()
      |> ManticoreQL.from("#{@table}_dist")
      |> ManticoreQL.match("*#{search_string}*")
      |> Giza.send!()
    end
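The translation of WHERE field = 'value' predicates into MATCH expressions can be pictured with the following sketch. It is not Khafra's implementation: MatchSketch is a hypothetical module that takes predicates as a keyword list and builds a query string using Manticore's `@field term` full-text operator against the _dist table.

```elixir
defmodule MatchSketch do
  # Illustrative only: turns `field = value` pairs into a Manticore query
  # string using the `@field term` full-text field operator.
  def to_query(table, predicates) when is_list(predicates) do
    expression =
      Enum.map_join(predicates, " ", fn {field, value} ->
        "@#{field} #{escape(value)}"
      end)

    "SELECT * FROM #{table}_dist WHERE MATCH('#{expression}')"
  end

  # Escape backslashes and single quotes so the value stays inside the
  # SQL string literal. Full-text operators in the value are NOT escaped.
  defp escape(value) do
    value
    |> to_string()
    |> String.replace("\\", "\\\\")
    |> String.replace("'", "\\'")
  end
end

IO.puts(MatchSketch.to_query("book", title: "Dune"))
```

The sketch only protects the string literal. A real translation would also need to decide how to treat characters that have meaning inside a MATCH expression.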
Other useful entry points
- Khafra.create_table/2: create the Manticore table for a schema using the configured strategy
- Khafra.refresh_table/2: rebuild the search table from the underlying datastore (batched/streamed; see Khafra.SearchTable.batch_replace/2)
- Khafra.trigger_maintenance/0: force a maintenance pass on every registered table server (also runs daily via Khafra.Scheduler)
- Khafra.peek/1 and Khafra.peek/2: inspect observer and per-table state
- Khafra.destroy_all/0: drop every managed table and its distributed index on the current node (intended for tests)
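For readers unfamiliar with timed tasks on the BEAM, the sketch below shows one common way a recurring maintenance pass can be scheduled with a GenServer and Process.send_after/3. MaintenanceTimerSketch and its :interval and :task options are hypothetical names for this example; nothing here describes how Khafra.Scheduler is actually built.

```elixir
defmodule MaintenanceTimerSketch do
  use GenServer

  # Hypothetical options: `:interval` in milliseconds and `:task`,
  # a zero-arity function to run on every tick.
  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(opts) do
    state = %{
      interval: Keyword.get(opts, :interval, :timer.hours(24)),
      task: Keyword.fetch!(opts, :task)
    }

    schedule(state.interval)
    {:ok, state}
  end

  @impl true
  def handle_info(:run, state) do
    state.task.()
    # Re-arm the timer so the task keeps running at the same interval.
    schedule(state.interval)
    {:noreply, state}
  end

  defp schedule(interval), do: Process.send_after(self(), :run, interval)
end

# Short interval for demonstration; a daily pass would keep the default.
parent = self()

{:ok, _pid} =
  MaintenanceTimerSketch.start_link(
    interval: 50,
    task: fn -> send(parent, :maintenance_ran) end
  )

receive do
  :maintenance_ran -> IO.puts("maintenance pass triggered")
after
  1_000 -> IO.puts("timed out")
end
```

In an application, the :task option could be `&Khafra.trigger_maintenance/0` if an extra pass outside the daily schedule is wanted.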
Live Dashboard
Khafra ships with two Phoenix LiveDashboard
pages for observing search activity in real time. Mount them through the
additional_pages option of live_dashboard in your router:
    live_dashboard "/dashboard",
      additional_pages: [
        search_tables: Khafra.LiveDashboard.SearchTablesPage,
        query_metrics: Khafra.LiveDashboard.QueryMetricsPage
      ]
Khafra.LiveDashboard.SearchTablesPage
Lists every Manticore table managed by Khafra on the selected node, with
sortable/searchable columns for the table name, the backing schema, indexed
document count, RAM footprint and on-disk size. Data is sourced live from
the Khafra.Observer registry of TableServer processes via :rpc, so the
page reflects whichever node you are inspecting.
Khafra.LiveDashboard.QueryMetricsPage
Renders live charts driven by Giza's [:giza, :query, :stop] and
[:giza, :query, :exception] telemetry events:
- Query Count (counter)
- Query Duration (summary, ms)
- Query Errors (counter)
- Duration by Source (summary, ms, broken down by query source tag)
Metrics are scoped per LiveDashboard session: the page attaches its
own telemetry handler on mount/3 and tears it down when the socket
disconnects.
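The attach and tear-down cycle can be sketched with the :telemetry library directly. QueryCounterSketch, the "session-1" handler id, and the Agent-based counter are hypothetical and are not how the dashboard page is implemented. The event names come from the description above, while the measurement maps passed to :telemetry.execute/3 are made up for the demonstration.

```elixir
# Only needed when running this sketch as a standalone script.
Mix.install([{:telemetry, "~> 1.0"}])

defmodule QueryCounterSketch do
  @events [[:giza, :query, :stop], [:giza, :query, :exception]]

  # Attach one handler for both events; the handler id must be unique.
  def attach(handler_id, counter) do
    :telemetry.attach_many(handler_id, @events, &__MODULE__.handle_event/4, counter)
  end

  def detach(handler_id), do: :telemetry.detach(handler_id)

  def handle_event([:giza, :query, :stop], _measurements, _metadata, counter) do
    Agent.update(counter, fn counts -> Map.update(counts, :queries, 1, &(&1 + 1)) end)
  end

  def handle_event([:giza, :query, :exception], _measurements, _metadata, counter) do
    Agent.update(counter, fn counts -> Map.update(counts, :errors, 1, &(&1 + 1)) end)
  end
end

{:ok, counter} = Agent.start_link(fn -> %{} end)
:ok = QueryCounterSketch.attach("session-1", counter)

# Simulate the events Giza would emit.
:telemetry.execute([:giza, :query, :stop], %{duration: 1_000}, %{})
:telemetry.execute([:giza, :query, :exception], %{duration: 500}, %{})

:ok = QueryCounterSketch.detach("session-1")

# After detaching, further events no longer reach the handler.
:telemetry.execute([:giza, :query, :stop], %{duration: 1_000}, %{})

IO.inspect(Agent.get(counter, & &1))
```

Keying the handler id to the session is what keeps one viewer's counters separate from another's in this sketch.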