Brain — Telemetry & Usage Analytics for Deployed AI Agents

Status: Draft (MVP scope) Date: 2026-04-29 Owner: M8TRX.AI platform team

1. Goal & non-goals

Goal. Stand up a phone-home server ("the brain") that collects usage data from hundreds of M8TRX.AI agents deployed on customer EC2 instances, so we can monitor adoption and design product iterations from real usage.

Top questions the brain must answer (priorities, in order):

Engagement / churn risk — "Is customer X using their agent enough to justify renewal?"
Upsell signals — "What is customer X using their agent for most, and where could we expand?"
Cross-customer patterns — "What is common across customers — what should we productize next?"

Cost-to-serve and QA-at-scale are explicit secondary goals: captured by the same data, not the design driver.

Non-goals for MVP (additive later, do not block shipping):

Real-time alerting / anomaly detection.
Customer-facing analytics dashboards.
Automated churn prediction or scoring beyond saved Metabase questions.
Multi-region deployment.
A web admin UI.

2. Architecture overview

[Customer EC2]                        [Brain — single AWS region]
┌──────────────────┐                  ┌────────────────────────────┐
│ Agent process    │  HTTPS POST      │  ALB (TLS terminator)      │
│  (Python /       │  /v1/events      │   │                        │
│   Claude Code /  │ ───────────────► │   ▼                        │
│   paperclip)     │  Bearer api_key  │  FastAPI on EC2 (uvicorn)  │
│   + brain SDK    │                  │   │ validate, redact, write │
└──────────────────┘                  │   ▼                        │
                                      │  Postgres (RDS)            │
                                      │   events / agents /        │
                                      │   customers / api_keys     │
                                      └────────────┬───────────────┘
                                                   │ read-only role
                                                   ▼
                                            Metabase (queries)

Single-region, AWS-native, intentionally boring. We can horizontally scale FastAPI by adding another EC2 behind the ALB; we can introduce ClickHouse later when Postgres analytics start to hurt.

3. Components

Component	Responsibility	Tech
Brain SDK	Buffer + flush events from inside the agent process; never raise.	Python package `m8trx_brain` (pip-installable from a private index or git URL).
Ingestion API	`POST /v1/events`, bearer auth, schema validation, redaction, DB write. Stateless.	FastAPI + uvicorn in a Docker container on EC2 (t3.small) behind an ALB.
Redaction pipeline	In-process module. Routes raw summaries through Claude Haiku to strip PII based on the customer's privacy tier.	`anthropic` SDK; Haiku (`claude-haiku-4-5`).
Postgres (RDS)	Primary store. JSONB payload column for schema flexibility during iteration.	RDS Postgres 16, db.t4g.medium, gp3 storage, 7-day automated snapshots.
Metabase	Read-only analytics UI for the team.	Open-source Metabase on a small EC2 / ECS task; uses a read-only DB role.

Out-of-MVP components (additive later, no design blocker): real-time stream consumer, alerting service, customer-facing dashboard.

4. Event model & schema

4.1 Common envelope

Every event sent by the SDK has the same outer shape:

{
  "event_id": "uuid-v4",
  "ts": "2026-04-29T10:30:00Z",
  "event_type": "session.end",
  "session_id": "sess_abc123",
  "payload": { ... }
}

agent_id and customer_id are never sent by the client. They are resolved server-side from the bearer token. This means a leaked key can only attribute events to the agent that owns it.

4.2 Event types (MVP — four)

`event_type`	Emitted when	Payload fields
`session.start`	Agent picks up a unit of work.	`category` (free string, agent-tagged, e.g. `"email_reply"`); `source` (optional, e.g. `"inbox"`).
`session.end`	Unit of work completes, fails, or escalates.	`status: "success"\|"failed"\|"escalated"`; `duration_ms`; `tool_calls: [{name, ms, ok}]`; `llm_usage: {model, input_tokens, output_tokens, cost_cents}`; `summary_raw?` (redaction input).
`error`	Uncaught error in the agent.	`message`, `kind`, `stack_hash` (sha256 of stack trace, no body) — groupable without leaking code paths.
`heartbeat`	SDK background thread, every 5 min.	`agent_version`, `pid_uptime_s`, `dropped_events_since_last`.

We deliberately do not model tool.call or llm.call as separate top-level events for MVP. Rolling them into session.end keeps query patterns simple (one row per session). Splitting later is an additive, not breaking, change.

4.3 Categories

category on session.start is a free string, agent-tagged. We accept that this means cross-customer comparison will need a normalization step (likely a periodic job that maps free strings → a curated taxonomy). Forcing a fixed enum on day one would either constrain real workflows or be ignored. The cost of free-string is one normalization job; the cost of premature enum is wrong data.

5. SDK contracts

5.1 Python (reference implementation)

import brain
brain.init(api_key=os.environ["BRAIN_API_KEY"])  # endpoint defaults to prod URL

with brain.session(category="email_reply") as s:
    s.tool("read_inbox", duration_ms=150, ok=True)
    s.llm(model="claude-sonnet-4-6", input_tokens=2400,
          output_tokens=350, cost_cents=2.1)
    s.set_summary("Replied to a billing dispute about an unpaid invoice.")
    # context exit emits session.end; status inferred from raised exception or set explicitly via s.fail("...") / s.escalate()

brain.track("error", {"message": "...", "kind": "TimeoutError",
                      "stack_hash": "sha256:..."})  # raw escape hatch

Behavior:

Background thread flushes the in-memory queue every 2 s or every 50 events, whichever comes first.
A heartbeat is emitted every 5 min.
Network errors are caught; events that fail to send go to a 1 MB on-disk ring buffer at ~/.m8brain/buffer.ndjson and replay on the next successful flush.
If the buffer fills, oldest events are dropped and a counter (dropped_events_since_last) is reported on the next heartbeat.
The SDK never raises into the agent, ever. All internal errors are caught and logged.

5.2 Claude Code adapter

A thin wrapper script claude-code-brain-hook.py plus a settings.json snippet customers add to their Claude Code install. Hooks used:

SessionStart → session.start
Stop / session-end equivalent → session.end
PostToolUse → appended to the in-flight session's tool_calls

The wrapper imports the same Python SDK; no separate code path.

5.3 paperclip

paperclip's runtime is not yet known to the brain team. For MVP, paperclip integrates via the raw HTTP contract (Section 4.1). Once we know its language, we wrap it as a thin adapter over the same wire format. Open question — see §11.

6. Privacy & redaction

6.1 Privacy tiers

Per-customer config (stored on the customers row) chooses one of:

Tier A — metadata only. No summary_raw is accepted; if sent, it is dropped at the ingestion boundary before the row is written.
Tier B — metadata + redacted summaries (DEFAULT). summary_raw is accepted, passed through Haiku, and only summary_redacted is persisted; the raw value never reaches durable storage.
Tier C — full transcripts. Tier B plus summary_raw is persisted alongside the redacted version. Requires a signed DPA and explicit customer opt-in. Not enabled for any customer at MVP launch.

6.2 Redaction prompt

Haiku call uses the following user prompt (system prompt sets the role):

Rewrite the following one-line agent task summary in 120 characters or fewer. Strip every personal name, email address, phone number, postal address, account number, and order/ticket ID. Preserve the business intent (what kind of task it was, what the outcome was). Reply with only the rewritten line.

Wrapped in a 5-second timeout. On failure, the event is still stored, summary_redacted is null, and the payload gains summary_failed: true so we can re-run later. The raw summary is never persisted on a redaction failure for tier B customers — it is dropped.

7. Auth model

API key per agent (not per customer). Format: m8brain_<env>_<32 base32 chars>, e.g. m8brain_prod_AB3F.... The m8brain_ prefix makes leaked keys greppable in code/logs.
Stored as sha256(key) only. Plaintext shown once at creation.
Authorization: Bearer <key> header. Lookup is where key_hash = sha256($1) and revoked_at is null.
Revocation is setting revoked_at. Rotation is "issue new key, dual-run, revoke old."
Admin endpoints (/admin/*) gated by a separate admin token (env var on the server). Not exposed to the public ALB.

8. Storage schema

create table customers (
  id            text primary key,            -- "cust_<slug>"
  name          text not null,
  privacy_tier  text not null check (privacy_tier in ('a','b','c')),
  created_at    timestamptz not null default now()
);

create table agents (
  id            text primary key,            -- "agent_<slug>"
  customer_id   text not null references customers(id),
  kind          text,                        -- 'booking'|'inbox'|'sales'|'ops'|'other' (advisory)
  version       text,                        -- updated from heartbeats
  last_seen_at  timestamptz,                 -- updated on every event
  created_at    timestamptz not null default now()
);

create table api_keys (
  id            text primary key,            -- "key_<slug>"
  agent_id      text not null references agents(id),
  key_hash      bytea not null unique,
  label         text,
  created_at    timestamptz not null default now(),
  revoked_at    timestamptz
);

create table events (
  id               bigserial primary key,
  ts               timestamptz not null,
  ingested_at      timestamptz not null default now(),
  customer_id      text not null,             -- denormalized for fast filtering
  agent_id         text not null,
  event_type       text not null,
  session_id       text,
  payload          jsonb not null,
  summary_redacted text                       -- null for tier-A or pre-redaction
);

create index events_customer_ts on events (customer_id, ts desc);
create index events_agent_ts    on events (agent_id, ts desc);
create index events_type_ts     on events (event_type, ts desc);
create index events_session     on events (session_id) where session_id is not null;

Tier-C raw summaries (when in scope) live inside payload->>'summary_raw'; we deliberately do not promote them to a column to avoid an accidental SELECT * leaking them.

Partition events by month once we cross ~10 M rows. Not needed at MVP volume.

9. Saved analytics (Metabase)

Six saved questions, one for each priority signal, ship with the brain:

Engagement — sessions/day per customer, last 30 days, sparkline. (Priority 1.)
Last-seen — now() - max(ts) per agent; agents with no event in > 1 h flagged. (Priority 1.)
Tool-call leaderboard — jsonb_array_elements(payload->'tool_calls') grouped by tool name × customer. (Priority 2.)
Token spend per customer — daily sum(payload->'llm_usage'->>'cost_cents'). (Unit economics, secondary.)
Category mix — session.start.category distribution per customer + cross-customer. (Priorities 2 & 3.)
Escalation/failure rate — session.end.status ratios per customer; leading churn indicator. (Priority 1.)

These are not the final analytics — they are the smallest set that demonstrably answers the three priority questions on day one.

10. Error handling

SDK side. Every internal error is caught. Network failures replay from the on-disk ring buffer. Buffer overflow drops oldest events and bumps the counter on the next heartbeat. The SDK is allowed to be lossy; it is never allowed to crash the agent.

Server side.

400 on schema validation failure (logged, not retried by SDK).
401 on bad / revoked / missing bearer token.
429 reserved for future rate limiting (not enforced at MVP).
503 on DB or downstream errors (SDK retries with exponential backoff: 1 s, 2 s, 4 s, 8 s, max 60 s).
Redaction Haiku failure: see §6.2 — the event still lands, redaction is marked failed.

Backpressure. Not enforced at MVP. If Postgres falls behind, the SDK will see 503s and retry; we will see it in events_rejected_total before customers do.

11. Ops & deployment

Brain server. One EC2 t3.small, Docker container of the FastAPI app, behind an ALB with TLS terminated at the ALB.
Deploy. GitHub Action builds image → pushes to ECR → SSH-and-docker compose pull && up (or aws ecs update-service if/when we adopt ECS).
DB. RDS Postgres 16, db.t4g.medium, gp3, daily automated snapshots, 7-day retention. Private subnet; brain-EC2 security group allowed.
Secrets. BRAIN_DB_URL, ANTHROPIC_API_KEY, BRAIN_ADMIN_TOKEN in AWS SSM Parameter Store, fetched at container start.
Logs/metrics. Structured JSON logs → CloudWatch. Three counters: events_ingested_total{event_type, customer_id}, events_rejected_total{reason}, redaction_latency_seconds.
Customer onboarding. A small admin CLI (brain admin create-customer, create-agent, issue-key) hits the server's /admin/* routes. Output: API key shown once. No web admin UI in MVP.

12. Testing

Unit tests: schema validation, key hashing/lookup, redaction prompt construction, SDK batching/buffer, ring-buffer overflow behavior.
Integration test: spin up Postgres (testcontainers), run FastAPI, post sample events, assert rows + redaction output. One end-to-end test in CI.
No load testing at MVP. Real volume from real customers will tell us where the next optimization lives.

13. Open questions

paperclip runtime. We assume HTTP-only integration until we know its language. If it turns out to be a runtime we can target with an SDK (Python/Node/Go), we wrap it post-MVP.
Category taxonomy. Free string at MVP. The normalization job (free string → curated taxonomy) is post-MVP work; the question is whether it's a periodic batch job or an LLM-based classifier at ingest time.
Tier-C launch criteria. No customer is at Tier C on day one. Before enabling, we need DPA template, a per-customer kill switch, and an audit log of every Tier-C row read.
Multi-tenant isolation. All customers share one Postgres instance with customer_id-scoped queries. If a single large customer's volume becomes disruptive, we partition the events table by customer_id or move them to a dedicated DB. Not needed at MVP.