KittAgent

KittAgent is an advanced AI agent platform built with Elixir and Phoenix LiveView, designed to bring physical robots to life using Large Language Models (LLMs).

It currently specializes in controlling mBot2 robots, enabling them to engage in natural language conversations while autonomously deciding on physical actions like moving and turning based on context.

🚀 Key Features

1. Real-time Interaction Dashboard

Direct Communication: Chat directly with your agents via the web interface to test personality and responsiveness.
Multimedia Feedback:
- Audio Playback: Listen to generated voice responses directly in the feed (integrated with TTS engines like Zonos).
- Visual Logs: See immediate feedback on role, message content, and mood.
Transaction Management: View and clean up recent interaction logs.
Action Inspection: Inspect complex SystemActions in a formatted, scrollable code view to verify parameters.

2. Advanced Agent Management (“Kitts”)

Comprehensive Profiles: Manage multiple agents with distinct profiles including Name, Model, Vendor, Birthday, and Hometown.
Localization Support:
- Language Selection: Choose from 18+ languages (Japanese, English, Swahili, etc.) for your agent’s communication.
- Timezone Awareness: Selectable timezone support (via Tzdata) to ground the agent in a specific locale.
Personality Engine: Define complex biographies and behavioral traits (“Personality”). Long descriptions are easily managed via pop-up modals.
Smart Defaults: Automatically applies system-wide default language and timezone settings to new agents.

3. LLM-Driven Intelligence & Control

Flexible Model Selection:
- Main Conversation Model: Choose any model supported by your provider (e.g., Gemini 2.0 Flash) for agent dialogue.
- Summarization Model: Select a specialized model for high-quality memory consolidation.
Custom LLM Providers: Configure custom API endpoints and keys (e.g., OpenRouter) directly from the settings menu.
Structured Outputs: Enforces strict JSON Schema for all LLM responses to ensure reliable parsing.
Physical Capabilities (SystemActions):
- Direct Code Generation: The agent generates MicroPython code dynamically to control the robot, allowing for complex and adaptive behaviors beyond simple preset commands.
- Hardware Control:
  - Core (CyberPi): Control LEDs, speaker, display, and read inputs (buttons, gyro, mic).
  - Chassis (mBot2): Precise movement control (speed, duration, turning).
  - Sensors (mBuild): Access external modules like Ultrasonic Sensor 2 and Quad RGB Sensor.
- Flexible Logic: Supports conditional logic and loops within the generated code (e.g., “Forward until obstacle < 10cm, then turn”).

4. Comprehensive Activity Monitoring (“Activities”)

Live Audit Log: A dedicated “Activities” dashboard to track all historical agent responses and decisions.
Status Management: Monitor and manually override action statuses (pending, processing, completed, failed).
- Formatted Code Blocks: Ensure long parameters are readable without breaking the layout.
- Queue Maintenance: Monitor real-time queue depths and manually clear “Talk” or “System Action” queues (globally or per-agent) to prevent stale task accumulation.
- Advanced Filtering: Filter logs by Kitt, Status, or Role to pinpoint specific events.

5. Dual-Layer Memory Architecture

Short-term Memory (Events): Maintains a log of recent interactions for immediate context.
Long-term Memory (Memories):
- Auto-Summarization: A background process condense new events into narrative summaries.
- Persistent Context: Summaries are injected into the system prompt, providing a persistent sense of history.

6. Centralized Configuration (“Settings”)

System Defaults: Set global defaults for new agent creation.
LLM Provider Setup: Update API keys and Base URLs without touching environment variables or restarting the server.
Model Management: Switch between different LLM models for conversation and summarization on the fly.

🤖 Physical Action Architecture & Client Design

KittAgent employs a distributed client architecture designed to overcome the resource constraints of microcontroller-based robots like mBot2.

Dual-Queue System

To handle different types of agent outputs efficiently, the system maintains two separate, independent queues:

Talk Queue (Talks.Queue): Stores audio response data (TTS-generated WAV files).
System Action Queue (SystemActions.Queue): Stores physical command data (MicroPython code).

Client Roles

mBot2 (Robot Client):
- Primary Role: Physical execution.
- Mechanism: Polls the System Action Queue, retrieves generated MicroPython code, and executes it locally to perform actions (move, turn, LED control).
- Constraint: Due to limited memory and processing power, it does not handle audio playback.
Companion Device (Mobile/PC Client):
- Primary Role: Voice interaction.
- Mechanism: Polls the Talk Queue and plays back the audio responses.
- Benefit: This offloads the heavy lifting of audio streaming/decoding from the robot, ensuring smooth, uninterrupted movement and clear voice output.

🛠 Tech Stack

Core: Elixir, Phoenix Framework (LiveView)
Database: PostgreSQL (with pgvector support planned/ready)
AI Provider: OpenRouter (default), or any OpenAI-compatible API
TTS Provider: Zonos (Gradio) / Custom Audio Pipelines
Styling: Tailwind CSS + DaisyUI
Infrastructure: Docker & Docker Compose

🗄 Database Schema Overview

kitts: Core agent metadata.
biographies: Detailed “Personality” text.
events: Raw log of interactions.
contents: Structured data for each event (message, action, mood, status, audio paths).
system_actions: Specific parameters for physical movements.
memories: Narrative summaries generated by the agent.
configs: Global key-value settings (LLM models, API credentials, defaults).

📦 Installation & Setup

Prerequisites

Docker & Docker Compose

Quick Start

Clone & Setup:

git clone <repository_url>
cd kitt_agent
docker compose run --rm app mix setup

Initial Configuration:
1. Start the server: docker compose up
2. Visit http://localhost:4000/kitt-web/settings
3. Configure your API Key and API Base URL (OpenRouter default: https://openrouter.ai/api/v1/chat/completions).
4. Select your preferred Main and Summary models.
Create your first Kitt: Navigate to the KITTs page and click New Kitt.

📝 License

This project is licensed under the MIT License.