Date: 2026-05-03
Phase: C (operator-facing onboarding documentation)
Predecessor: B.4 (Tailscale + cloud-init bootstrap)
Successor: none (final MVP phase)
Ship docs/runbook-connect-customer-ec2.md: a single-page operator
runbook that synthesizes the four B.x phases (wrapper telemetry,
Claude Code hooks, host heartbeat, cloud-init bootstrap) into an
end-to-end "from zero to telemetry-arriving" sequence for connecting
a brand-new customer EC2 to the M8trx brain.
The runbook is the single entry point an internal operator opens when a new customer needs onboarding. It covers what to do, in what order, and what must already be in place. The details (script behaviour, debug recipes, IAM policy text) live in the per-phase READMEs the runbook links out to.
All four B.x phases shipped artifacts that work in isolation. The
per-phase READMEs (agent-artifacts/<phase>/README.md) cover their
own slice well. What's missing is the operator-facing "where do I
start" entry point — the doc you open the first time you need to
connect a real customer EC2 and want a single sequenced checklist
instead of stitching four READMEs together yourself.
This is the last MVP phase. Once it's in, "first customer connect" is operationally documented end-to-end.
Internal M8trx operator only. Assumes:
mint-key.js.

The runbook does not explain what SSM, Tailscale, or Terraform are. If/when M8trx ever needs a customer-DevOps-facing variant (for customers running their own AWS accounts), it'll fork from this internal version. Building both now is YAGNI.
Onboard only.
Out of scope (each handled separately when needed):
docs/superpowers/specs/2026-04-29-brain-design.md).

Single file docs/runbook-connect-customer-ec2.md. Six h2 sections, in operator-execution order:
What must already exist before any customer onboarding. Each item is a checklist line with a "verify by" hint and a pointer to the source-of-truth doc:

- curl http://<brain-tailscale-ip>:8080/v1/healthz).
- tag:m8trx-customer-host ACL role scoped to brain-only egress (Tailscale admin console).
- /m8trx/brain-url (String, the brain Tailscale URL).
- /m8trx/tailscale/auth-key (SecureString, reusable+ephemeral key tagged tag:m8trx-customer-host).
- agent-artifacts/cloud-init/iam-policy.json (Terraform snippet in agent-artifacts/cloud-init/README.md § Terraform launch snippet).

The 4-step sequence for every new customer:

```bash
docker compose -f /home/ubuntu/brain/server/docker-compose.yml \
  exec -T brain-api node bin/mint-key.js cust_<id> "<Display Name>"
```

```bash
aws ssm put-parameter --name /m8trx/cust_<id>/brain-key \
  --type SecureString --value "$KEY"
```

tag:m8trx-customer-host rule is in prerequisites).

agent-artifacts/cloud-init/README.md § Terraform launch snippet.

The 3-command success check:
m8trx-bootstrap: complete for cust_<id>.

```bash
docker compose -f /home/ubuntu/brain/server/docker-compose.yml \
  exec -T postgres psql -U brain brain -tAc \
  "select payload->>'hostname' from events
   where customer_id='cust_<id>'
     and event_type='heartbeat'
   order by ts desc limit 1"
```

Expect a real hostname.

Pointer-only section. Don't restate the full failure → fix mapping; just link to it:
See agent-artifacts/cloud-init/README.md § Operator debug recipe for the full failure → fix mapping (Customer tag missing, AccessDeniedException, ParameterNotFound, tailscale auth fail, no events arriving).
Restate the Out-of-scope items above so a reader landing on the runbook knows what to look for elsewhere. Particularly important: "updating existing customer EC2" → terminate + relaunch.
Per-phase docs, README first (most operationally relevant), spec second:

- agent-artifacts/m8trx-claude-isolate.{patch,modified} (the patch + the post-patch script for inspection; no dedicated README). Upstream design context: docs/superpowers/specs/2026-05-03-brain-mvp-ingestion-design.md (the MVP ingestion design that motivated the wrapper telemetry).
- agent-artifacts/claude-hooks/README.md. Spec: docs/superpowers/specs/2026-05-03-brain-claude-hooks-design.md.
- agent-artifacts/heartbeat/README.md. Spec: docs/superpowers/specs/2026-05-03-brain-host-heartbeat-design.md.
- agent-artifacts/cloud-init/README.md. Spec: docs/superpowers/specs/2026-05-03-brain-cloud-init-bootstrap-design.md.

Plans (docs/superpowers/plans/*) are intentionally omitted from the runbook: they're implementation history, not operational reference.
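
The prerequisite checks and the first two onboarding steps described above are manual by design. Their deterministic parts can still be scripted as a dry run. The sketch below assumes bash 4+ (for associative arrays) and the AWS CLI. The function names and the `cust_[a-z0-9][a-z0-9_-]*` id pattern are hypothetical and do not come from the spec or the per-phase artifacts. `onboard_plan` only prints commands; it never runs them.

```bash
#!/usr/bin/env bash
# Dry-run helpers for the runbook's prerequisite check and onboarding steps 1-2.
# ASSUMPTION: bash 4+ (associative arrays); function names and the id pattern
# below are hypothetical, not taken from the per-phase artifacts.

COMPOSE_FILE=/home/ubuntu/brain/server/docker-compose.yml

# Prerequisite SSM parameters and the type each one must have.
declare -A PREREQ_PARAMS=(
  [/m8trx/brain-url]=String
  [/m8trx/tailscale/auth-key]=SecureString
)

# Read only the parameter's Type, so the secret value is never fetched.
ssm_param_type() {
  aws ssm get-parameter --name "$1" --query Parameter.Type --output text
}

# Return non-zero if any prerequisite parameter is missing or mistyped.
# The fetcher is injectable so the check can be exercised offline.
preflight_ssm() {
  local fetch=${1:-ssm_param_type} name got rc=0
  for name in "${!PREREQ_PARAMS[@]}"; do
    got=$("$fetch" "$name" 2>/dev/null) || got=missing
    if [[ "$got" != "${PREREQ_PARAMS[$name]}" ]]; then
      echo "FAIL $name: want ${PREREQ_PARAMS[$name]}, got ${got:-missing}" >&2
      rc=1
    fi
  done
  return "$rc"
}

# ASSUMPTION: hypothetical id rule; rejects typos like "acme" or "cust_../x".
validate_customer_id() {
  [[ "$1" =~ ^cust_[a-z0-9][a-z0-9_-]*$ ]]
}

# Per-customer SSM path holding the brain key.
brain_key_param() {
  printf '/m8trx/%s/brain-key\n' "$1"
}

# Print (never run) onboarding steps 1-2 with <id> filled in.
# Step 2 expects the operator to have put the minted key in $KEY.
onboard_plan() {
  local cust_id=$1 display_name=$2
  validate_customer_id "$cust_id" || { echo "bad customer id: $cust_id" >&2; return 1; }
  printf 'docker compose -f %s exec -T brain-api node bin/mint-key.js %s %q\n' \
    "$COMPOSE_FILE" "$cust_id" "$display_name"
  printf 'aws ssm put-parameter --name %s --type SecureString --value "$KEY"\n' \
    "$(brain_key_param "$cust_id")"
}
```

For example, `onboard_plan cust_acme 'Acme Corp'` prints both commands with `<id>` filled in. Printing rather than executing keeps the runbook, not a script, as the single source of truth.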
In addition to the new runbook file, one small change to
docs/RESUME.md: add a top-level pointer at the start of "What's
running right now" or as a new "Operator runbook" section so a
returning operator/contributor sees:
For onboarding a new customer EC2, read docs/runbook-connect-customer-ec2.md first.
This is the runbook's discoverability hook from the doc operators already know to open.
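
The RESUME.md pointer is the runbook's only discoverability hook, so a small guard can catch it going stale. The sketch below assumes it runs from the repository root. `check_runbook_pointer` is a hypothetical helper, not an existing CI job.

```bash
#!/usr/bin/env bash
# Guard for the runbook's discoverability hook in docs/RESUME.md.
# ASSUMPTION: hypothetical helper, run from the repo root; not an existing CI job.

# check_runbook_pointer [RESUME_PATH] [RUNBOOK_PATH]
check_runbook_pointer() {
  local resume=${1:-docs/RESUME.md}
  local runbook=${2:-docs/runbook-connect-customer-ec2.md}
  [[ -f "$runbook" ]] || { echo "missing runbook: $runbook" >&2; return 1; }
  [[ -f "$resume" ]] || { echo "missing $resume" >&2; return 1; }
  # Match on the file name so a relative link from inside docs/ also counts.
  grep -qF -- "$(basename "$runbook")" "$resume" \
    || { echo "$resume does not point at $(basename "$runbook")" >&2; return 1; }
}
```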
The runbook itself is doc-only; no test suite runs against it.
Validation is empirical: the first real-customer connect IS the
runbook's validation. If a reasonably careful operator follows the
runbook end to end and gets cust_<id> heartbeats arriving at the
brain, the runbook works. If they get stuck, that's a runbook bug
to fix in a follow-up.
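
The acceptance criterion (cust_<id> heartbeats arriving at the brain) can be polled instead of re-running the success-check query by hand. The sketch below assumes it runs on the brain host, using the same docker compose file and psql invocation as the runbook's success check. `wait_for_heartbeat`, its 600 s default timeout, its 15 s poll interval, and the id pattern are hypothetical choices, not part of the spec.

```bash
#!/usr/bin/env bash
# Poll the brain's events table until a heartbeat for one customer shows up.
# ASSUMPTION: runs on the brain host; wait_for_heartbeat and its default
# timeout (600 s) and interval (15 s) are hypothetical, not from the spec.

COMPOSE_FILE=/home/ubuntu/brain/server/docker-compose.yml

# Same query as the runbook's success check, built only for well-formed ids
# so the value spliced into the SQL string cannot carry quotes.
heartbeat_sql() {
  [[ "$1" =~ ^cust_[a-z0-9][a-z0-9_-]*$ ]] || return 1
  printf "select payload->>'hostname' from events where customer_id='%s' and event_type='heartbeat' order by ts desc limit 1" "$1"
}

# Newest heartbeat hostname for a customer; empty output if none yet.
latest_heartbeat_host() {
  local sql
  sql=$(heartbeat_sql "$1") || return 1
  docker compose -f "$COMPOSE_FILE" exec -T postgres psql -U brain brain -tAc "$sql"
}

# wait_for_heartbeat CUST_ID [TIMEOUT_S] [INTERVAL_S] [QUERY_FN]
# Prints the hostname and returns 0 once a heartbeat arrives, 1 on timeout.
wait_for_heartbeat() {
  local cust_id=$1 timeout=${2:-600} interval=${3:-15} query=${4:-latest_heartbeat_host}
  local deadline=$((SECONDS + timeout)) host
  while ((SECONDS < deadline)); do
    host=$("$query" "$cust_id" 2>/dev/null) || host=
    if [[ -n "$host" ]]; then
      printf '%s\n' "$host"
      return 0
    fi
    sleep "$interval"
  done
  echo "no heartbeat for $cust_id after ${timeout}s" >&2
  return 1
}
```

The query function is a parameter so the loop can be exercised without a running brain. In real use the default `latest_heartbeat_host` is used, e.g. `wait_for_heartbeat cust_acme`.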
To set the runbook up for that test, the spec mandates:
- <id> placeholder is filled in by the operator).
- agent-artifacts/cloud-init/README.md § Manual smoke procedure (which has the cleanup steps).

None at design-approval time. All three clarifying questions were resolved interactively before this spec was written.