Phase B.2 — Claude Code hooks for agent-runtime

Date: 2026-05-03
Phase: B.2 (telemetry, agent-side hooks)
Predecessor: B.1 (m8trx-claude-isolate wrapper telemetry patch)
Successor: B.3 (host-side heartbeat)

Goal

Install Claude Code hooks inside the M8trx agent-runtime container that emit one tool_call event to the brain ingestion API for every tool Claude invokes. Operators get fleet-wide visibility into "what tools are being called, by which agent, how often, with what payload sizes" — without ever seeing tool argument or output content.

Why now

Phase B.1 instrumented the wrapper (m8trx-claude-isolate), which runs outside the agent-runtime container and emits session.start / session.end from the bash EXIT trap. The wrapper can't see anything that happens inside Claude — including which tools fire. Phase B.2 closes that gap with hooks running inside the container.

Architecture

                     ┌─ paperclipai container ─┐
                     │  m8trx-claude-isolate    │
                     │  (B.1 wrapper)           │ ── session.start ───┐
                     │  trap EXIT → session.end │ ── session.end  ────┼──► brain /v1/events
                     │                          │                     │
                     │  docker run -e BRAIN_*   │                     │
                     │     │                    │                     │
                     │     ▼                    │                     │
                     │  ┌─ agent-runtime ────┐  │                     │
                     │  │  claude (Node)     │  │                     │
                     │  │   ↓ PostToolUse    │  │                     │
                     │  │  brain-hook (sh)   │ ─┼─ tool_call ─────────┘
                     │  │  reads stdin JSON  │  │
                     │  │  curl POST 2s t/o  │  │
                     │  │  exit 0 always     │  │
                     │  └────────────────────┘  │
                     └──────────────────────────┘

Env vars are passed by the wrapper via docker run -e (already wired in B.1, see agent-artifacts/m8trx-claude-isolate.modified lines 245–254): BRAIN_URL, BRAIN_API_KEY, BRAIN_AGENT_ID, BRAIN_RUN_ID. The hook reads these from its env. If BRAIN_URL, BRAIN_API_KEY, or BRAIN_AGENT_ID is unset, the hook is a silent no-op, the same opt-in pattern as the wrapper.

Hook coverage

PostToolUse only, matcher *. Fires once per completed tool call.

The Stop hook was considered (originally listed in docs/RESUME.md) but rejected: it would only emit session.end events that duplicate what the wrapper's EXIT trap already produces. Adding more hook types (UserPromptSubmit, PreToolUse) is out of scope for MVP — easy to add later when fleet data tells us what we're missing.

Components

Three files, all under agent-artifacts/claude-hooks/:

1. `settings.json`

The Claude Code hooks config. Installed at /etc/claude-code/managed-settings.json inside the agent-runtime container — the highest-priority settings tier, which the customer cannot override by dropping a .claude/settings.json into their workspace.

{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "*",
        "hooks": [
          { "type": "command", "command": "/usr/local/bin/brain-hook" }
        ]
      }
    ]
  }
}

2. `brain-hook`

POSIX sh script. Installed at /usr/local/bin/brain-hook, mode 0755, root-owned, world-readable+executable (the node user inside the container needs to execute it).

Behaviour:

Read stdin (Claude Code's PostToolUse JSON).
If BRAIN_URL, BRAIN_API_KEY, or BRAIN_AGENT_ID is unset → exit 0 immediately (no POST attempted).
Parse stdin with jq:
- tool_name from .tool_name
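- input_bytes from .tool_input | tostring | utf8bytelength
- output_bytes from .tool_response | tostring | utf8bytelength (jq's length on a string counts Unicode code points, not bytes, so utf8bytelength is needed for a true byte count)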
Build event JSON (see Event payload contract below).
POST with curl --silent --max-time 2 --retry 0 to ${BRAIN_URL%/}/v1/events, with Authorization: Bearer ${BRAIN_API_KEY} and Content-Type: application/json.
On any failure (jq parse, curl exit non-zero, brain non-2xx): if BRAIN_DEBUG=1, write one line to stderr; otherwise silent.
Always exit 0.
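
The parse step above can be sketched as a single jq filter. This is a minimal sketch rather than the shipped hook: the function name parse_tool_call is hypothetical, and it counts bytes with jq's utf8bytelength builtin because length on a string counts Unicode code points.

```sh
#!/bin/sh
# Hypothetical helper, for illustration only.
# Reads one PostToolUse JSON document on stdin and prints
# "<tool_name><TAB><input_bytes><TAB><output_bytes>" on one line.
# Returns non-zero when stdin is not valid JSON.
parse_tool_call() {
  jq -r '[
    .tool_name,
    (.tool_input    | tostring | utf8bytelength),
    (.tool_response | tostring | utf8bytelength)
  ] | @tsv'
}
```

For example, piping `{"tool_name":"Bash","tool_input":{"command":"ls"},"tool_response":"ok"}` into parse_tool_call prints Bash, 16 and 2 separated by tabs: the input object serialises to the 16-byte string `{"command":"ls"}`, and only those counts leave the container, never the content.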

Implementation notes:

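Use set -e plus an EXIT trap that forces exit 0. Guard the jq and curl calls with if or || so that an expected failure reaches the BRAIN_DEBUG logging path instead of aborting the script first. Do not use set -u: the script must tolerate missing optional env vars.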
event_id from cat /proc/sys/kernel/random/uuid (matches B.1 wrapper, no extra dep).
ts from date -u +"%Y-%m-%dT%H:%M:%S.000Z". Brain validates with Date.parse, which accepts this; fixed .000 ms is fine for tool-call resolution.
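
The notes above translate to a few lines of POSIX sh. This is a rough sketch under stated assumptions: the function names are illustrative, and reading /proc/sys/kernel/random/uuid assumes a Linux kernel, which holds inside the container.

```sh
#!/bin/sh
# Illustrative names only; not the shipped hook.

# Fresh random UUID from the Linux kernel, no extra dependency.
new_event_id() {
  cat /proc/sys/kernel/random/uuid
}

# UTC timestamp with a fixed .000 millisecond field.
now_ts() {
  date -u +"%Y-%m-%dT%H:%M:%S.000Z"
}

# Runs a command in a subshell whose EXIT trap rewrites any exit
# status to 0. There is deliberately no `set -u`, so unset optional
# variables expand to the empty string instead of aborting.
run_guarded() (
  trap 'exit 0' EXIT
  set -e
  "$@"
)
```

With this shape, `run_guarded false` still returns 0 to the caller, which is the property the hook needs from its own top-level trap.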

3. `README.md`

Short doc covering:

Where each artifact installs to inside the agent-runtime image.
The exact diff M8trxAgent's Dockerfile will need to apply to integrate these (add jq to the existing apt-get install line; COPY the two files into the right paths; chmod +x the hook).
How to run the standalone test suite (bin/test-brain-hook.sh).

Event payload contract

{
  "event_id": "<uuid v4>",
  "ts": "2026-05-03T19:42:11.000Z",
  "event_type": "tool_call",
  "agent_id": "<$BRAIN_AGENT_ID>",
  "run_id":   "<$BRAIN_RUN_ID, omitted if empty>",
  "payload": {
    "tool_name":    "<from stdin .tool_name>",
    "input_bytes":  <int>,
    "output_bytes": <int>
  }
}

agent_id is required by server/src/routes/events.js's validateEvent. The wrapper guarantees BRAIN_AGENT_ID is set whenever BRAIN_URL and BRAIN_API_KEY are set; the hook treats BRAIN_AGENT_ID as required and exits 0 if missing.
run_id is optional in brain. The hook omits the field when BRAIN_RUN_ID is empty (does not send "run_id": "").
customer_id is not in the payload — brain derives it from the bearer key via requireCustomerAuth injecting req.customerId (server/src/auth.js).
Privacy: tool argument and output content are never sent. Only tool name and byte counts. This matches the design decision in 2026-05-03-brain-mvp-ingestion-design.md ("managed metadata only").
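
One way to satisfy this contract is to let jq assemble the event, so that quoting is handled by jq and run_id can be dropped when empty. This is a sketch under assumptions: build_event and its positional arguments are hypothetical, not the hook's actual interface.

```sh
#!/bin/sh
# Hypothetical helper, for illustration only.
# build_event TOOL_NAME INPUT_BYTES OUTPUT_BYTES EVENT_ID TS
# Prints the event as compact JSON. agent_id and run_id come from
# BRAIN_AGENT_ID and BRAIN_RUN_ID in the environment.
build_event() {
  jq -cn \
    --arg event_id "$4" \
    --arg ts "$5" \
    --arg agent_id "${BRAIN_AGENT_ID:-}" \
    --arg run_id "${BRAIN_RUN_ID:-}" \
    --arg tool_name "$1" \
    --argjson input_bytes "$2" \
    --argjson output_bytes "$3" \
    '{event_id: $event_id, ts: $ts, event_type: "tool_call", agent_id: $agent_id}
     # Omit run_id entirely when empty rather than sending "run_id": "".
     + (if $run_id == "" then {} else {run_id: $run_id} end)
     + {payload: {tool_name: $tool_name,
                  input_bytes: $input_bytes,
                  output_bytes: $output_bytes}}'
}
```

Passing the byte counts with --argjson keeps them as JSON integers; --arg would turn them into strings.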

Env contract

| Var | Set by | Required for telemetry | Hook behaviour if unset |
|---|---|---|---|
| `BRAIN_URL` | wrapper (`docker run -e`) | yes | silent no-op |
| `BRAIN_API_KEY` | wrapper | yes | silent no-op |
| `BRAIN_AGENT_ID` | wrapper (from `$PAPERCLIP_AGENT_ID`) | yes | silent no-op |
| `BRAIN_RUN_ID` | wrapper (from `$PAPERCLIP_RUN_ID`) | no | field omitted from payload |
| `BRAIN_DEBUG` | operator override (`docker run -e BRAIN_DEBUG=1`) | no | suppress all stderr (default) |
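
The env contract reduces to two small checks: an opt-in guard and a debug gate. A minimal sketch, assuming helper names (telemetry_enabled, debug_log) that are not taken from the real hook:

```sh
#!/bin/sh
# Hypothetical helpers, for illustration only.

# Succeeds only when all three required variables are non-empty.
telemetry_enabled() {
  [ -n "${BRAIN_URL:-}" ] &&
    [ -n "${BRAIN_API_KEY:-}" ] &&
    [ -n "${BRAIN_AGENT_ID:-}" ]
}

# Writes one line to stderr only when BRAIN_DEBUG=1; silent otherwise.
debug_log() {
  if [ "${BRAIN_DEBUG:-}" = "1" ]; then
    printf 'brain-hook: %s\n' "$*" >&2
  fi
  return 0
}
```

A hook built this way would start with `telemetry_enabled || exit 0`, so an unconfigured container pays only the cost of starting the shell.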

Error handling

| Failure | Behaviour |
|---|---|
| `BRAIN_URL` or `BRAIN_API_KEY` unset | Exit 0 immediately, no POST. |
| `BRAIN_AGENT_ID` unset | Exit 0 immediately. (Should never happen with the wrapper; defensive.) |
| `jq` parse fails on stdin | Exit 0. If `BRAIN_DEBUG=1`, log "brain-hook: stdin parse failed". |
| `curl` fails at the transport layer (timeout, DNS, connection refused) | Exit 0. If `BRAIN_DEBUG=1`, log "brain-hook: POST failed (curl exit N)". |
| Brain returns non-2xx (4xx bad event shape, 5xx server error) | Exit 0. If `BRAIN_DEBUG=1`, log status + response body. This is the "we shipped a bad hook" canary. The hook must explicitly check HTTP status: `curl --silent` alone exits 0 for any received response, so the implementation needs `--fail`/`--fail-with-body` or an explicit `--write-out '%{http_code}'` capture. |
| Hook crashes (syntax error) | `set -e` + `EXIT` trap forces exit 0. |

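The exit 0-always invariant matters because Claude Code treats a non-zero hook exit as an error: exit code 2 is the blocking signal whose stderr is fed back to the model, and other non-zero codes are reported as hook failures. Telemetry must never break the customer's task.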

Stderr behaviour: silent by default. With BRAIN_DEBUG=1, each failure writes one short line. Claude Code surfaces hook stderr in its transcript log; it does not dump it into the model's context or the user's terminal, so debug-mode output is operator-visible only.
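
The transport-versus-status distinction can be sketched with the --write-out capture mentioned above. This is one possible shape, not the shipped implementation: is_2xx, dbg, and post_event are hypothetical names, and the response body is buffered in a temp file only so that it can be logged in debug mode.

```sh
#!/bin/sh
# Hypothetical helpers, for illustration only.

# Succeeds only for HTTP status codes 200-299.
is_2xx() {
  case "$1" in
    2[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# One stderr line, only when BRAIN_DEBUG=1.
dbg() {
  if [ "${BRAIN_DEBUG:-}" = "1" ]; then
    printf 'brain-hook: %s\n' "$*" >&2
  fi
}

# post_event JSON: POSTs the event and always returns 0.
post_event() {
  body_file=$(mktemp) || return 0
  rc=0
  # `|| rc=$?` keeps a curl failure from tripping `set -e`.
  status=$(curl --silent --max-time 2 --retry 0 \
    --output "$body_file" --write-out '%{http_code}' \
    --header "Authorization: Bearer ${BRAIN_API_KEY:-}" \
    --header "Content-Type: application/json" \
    --data "$1" "${BRAIN_URL%/}/v1/events") || rc=$?
  if [ "$rc" -ne 0 ]; then
    dbg "POST failed (curl exit $rc)"          # transport-layer failure
  elif ! is_2xx "$status"; then
    dbg "brain returned $status: $(cat "$body_file")"   # bad-hook canary
  fi
  rm -f "$body_file"
  return 0
}
```

Capturing the status explicitly, rather than relying on --fail, keeps the response body available for the debug line on 4xx and 5xx responses.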

Testing

Standalone hook test (in scope for this phase)

A bin/test-brain-hook.sh script in the brain repo, runnable against the local brain server, covering:

Happy path — pipe a valid PostToolUse JSON to the hook with correct env, then verify the event landed by querying postgres directly: docker compose exec -T postgres psql -U brain brain -c "select event_type, agent_id, run_id, payload from events order by ts desc limit 1". Expect tool_call, the right agent_id, the right run_id (or NULL), and payload matching {"tool_name":..., "input_bytes":..., "output_bytes":...}. Direct psql is used because the dashboard API exposes only aggregated rollups (/customers, /agents, /tools, /overview), not individual event lookups.
No telemetry config — BRAIN_URL unset, no event posted.
Bad URL with debug on — BRAIN_URL=http://192.0.2.1:1 (RFC 5737 unroutable) + BRAIN_DEBUG=1, verify one stderr line within ~2s and exit 0.
Malformed stdin with debug on — echo "not json" | … + BRAIN_DEBUG=1, verify one "stdin parse failed" stderr line and exit 0.
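
The failure cases above share one shape: run the hook, then check that it exited 0 and wrote the expected number of stderr lines. A possible helper for bin/test-brain-hook.sh is sketched below; check_case is a hypothetical name, and the real script may be structured differently.

```sh
#!/bin/sh
# Hypothetical test helper, for illustration only.
# check_case EXPECTED_STDERR_LINES CMD [ARGS...]
# Feeds this function's stdin to CMD, then succeeds only if CMD
# exited 0 and wrote exactly EXPECTED_STDERR_LINES lines to stderr.
check_case() {
  expected=$1
  shift
  err_file=$(mktemp) || return 1
  rc=0
  "$@" >/dev/null 2>"$err_file" || rc=$?
  lines=$(wc -l <"$err_file" | tr -d ' ')
  rm -f "$err_file"
  [ "$rc" -eq 0 ] && [ "$lines" -eq "$expected" ]
}
```

With BRAIN_DEBUG=1 exported, the malformed-stdin case would then read `echo "not json" | check_case 1 /usr/local/bin/brain-hook`, and the no-telemetry case would use an expected count of 0.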

Bootstrap: the script mints a fresh test key via docker compose exec brain-api node bin/mint-key.js cust_m8trx_test "M8TRX Test" (the original key from yesterday's session was shown once and is no longer recoverable).

End-to-end test inside agent-runtime container — deferred

Real in-container e2e requires editing M8trxAgent's images/agent-runtime/Dockerfile to add jq and COPY the two files. The RESUME plan explicitly defers that integration step (this phase only documents the Dockerfile diff in README.md). True e2e lands when M8trxAgent's Dockerfile takes its first non-placeholder commit.

Out of scope

Editing the M8trxAgent Dockerfile. Documented here, applied later.
Other hook events (UserPromptSubmit, PreToolUse, Stop, SessionStart). Add when fleet data justifies.
Capturing tool argument or output content (privacy-tier work was dropped per yesterday's design call).
Retrofitting BRAIN_DEBUG to the B.1 wrapper for consistency. Worth doing in a small follow-up; not part of this phase.
A proper "no events from agent X for N minutes" operator alert. That needs B.3 (heartbeats) plus dashboard work.

Open questions

None at design-approval time. All five clarifying questions were resolved interactively in the brainstorming session before this spec was written.