Authentication, authorization, secrets, audit, and Swiss data-protection considerations.
Authentication — who can call what
- Public: no auth required. Reads on /api/v1/assess/{registry, framework/.../resolved, framework/.../calibrate} and /api/v1/assess/slots.
- Authenticated user: Clerk JWT (browser) OR static bearer (server). Required for all PATCH endpoints and audit-log-writing actions.
- Tenant-scoped admin: Clerk JWT with org_id matching the tenant, OR static bearer scoped per tenant (BEARER_MMC, BEARER_HSP).
- Bridge integrations: X-MMC-Bridge-Secret header (gateway → WP); X-Deal-Secret header (quiz app → WP).
Clerk JWT (dashboard sessions)
The CoachPilot dashboard authenticates via Clerk. Clerk emits short-lived JWTs that the gateway validates via auth.py::get_current_user.
The JWT carries:
- sub: Clerk user id
- org_id: Clerk org id (maps to a tenant in our system via a manual mapping today)
- tier: subscription tier (Starter / Professional / Business)
- role: admin / developer / operator / viewer / service
Token verification happens via Clerk's JWKS endpoint, cached locally.
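Once the signature has been verified, the remaining checks operate on the decoded claims. The following is a minimal sketch of that step, not the contents of `auth.py`: the `ORG_TO_TENANT` mapping, the `Principal` type, the function name, and the example org id are all hypothetical.

```python
from dataclasses import dataclass

VALID_ROLES = {"admin", "developer", "operator", "viewer", "service"}

# Hypothetical manual mapping from Clerk org id to tenant slug.
ORG_TO_TENANT = {"org_example_mmc": "mmc"}


@dataclass(frozen=True)
class Principal:
    user_id: str
    tenant: str
    tier: str
    role: str


def principal_from_claims(claims: dict) -> Principal:
    """Turn already-verified JWT claims into a Principal, or raise ValueError."""
    missing = [k for k in ("sub", "org_id", "tier", "role") if k not in claims]
    if missing:
        raise ValueError(f"missing claims: {missing}")
    tenant = ORG_TO_TENANT.get(claims["org_id"])
    if tenant is None:
        raise ValueError("org_id is not mapped to a tenant")
    if claims["role"] not in VALID_ROLES:
        raise ValueError("unknown role")
    return Principal(claims["sub"], tenant, claims["tier"], claims["role"])
```

Rejecting an unmapped `org_id` outright, rather than falling back to a default tenant, keeps a mapping gap from turning into cross-tenant access.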
For service-to-service calls inside the cluster (e.g. agent → gateway), we use static bearer tokens.
Static bearers — BEARER_<tenant>
For inter-pod calls and CLI smoke testing, each tenant has a static bearer in the k8s secret coachpilot-secrets:
| Secret | Tenant | Purpose |
|---|---|---|
| BEARER_MMC | mmc | Service-level access to MMC's data |
| BEARER_HSP | hsp | (Reserved) |
Sent as Authorization: Bearer <token>. The gateway validates against the secret store.
Rotation: change the secret in k8s + redeploy gateway. For up to 5 minutes afterwards, requests presenting the old token still validate (cached); after that the old token is rejected.
Static bearers are NOT for end-user authentication. They have full tenant-scope access. Never expose them client-side.
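A minimal sketch of how a static bearer could be matched to its tenant. It assumes the values from `coachpilot-secrets` are exposed to the gateway as environment variables with the same names; the function name and the lookup table are hypothetical, and the gateway's actual validation code may differ.

```python
import hmac
import os
from typing import Mapping, Optional

# Tenant slug -> name of the variable holding that tenant's static bearer.
TENANT_BEARER_ENV = {"mmc": "BEARER_MMC", "hsp": "BEARER_HSP"}


def tenant_for_bearer(authorization: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the tenant whose static bearer matches the header, else None."""
    scheme, _, token = authorization.partition(" ")
    if scheme != "Bearer" or not token:
        return None
    matched = None
    for tenant, var in TENANT_BEARER_ENV.items():
        expected = env.get(var, "")
        # compare_digest avoids leaking how many leading characters matched.
        if expected and hmac.compare_digest(token.encode(), expected.encode()):
            matched = tenant
    return matched
```

An unset or empty secret never matches, so a reserved tenant such as `hsp` cannot be reached with an empty token.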
RBAC — five roles
| Role | What they can do |
|---|---|
| admin | Everything within their tenant; cross-tenant for platform admins |
| developer | All PATCH/POST/DELETE except billing + role changes |
| operator | PATCH config + restore snapshots; no new framework registration |
| viewer | Read-only across their tenant |
| service | Internal cluster calls (agent, cron); no human owns this role |
Feature gating per-tier is separate from RBAC. A starter tier admin can still PATCH but cannot use AI authoring (the feature gate blocks the action). See feature_gate.py.
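The two layers can be pictured as independent checks that must both pass. The sketch below is illustrative only: the action names and both lookup tables are hypothetical, the `service` role is left out, and the real checks live per endpoint and in `feature_gate.py`.

```python
# Hypothetical action names, loosely following the role table above.
ROLE_ACTIONS = {
    "developer": {"read", "patch_config", "restore_snapshot", "register_framework", "ai_authoring"},
    "operator": {"read", "patch_config", "restore_snapshot"},
    "viewer": {"read"},
}

# Assumption: only the Starter restriction on AI authoring is stated in this document.
TIER_BLOCKED = {"Starter": {"ai_authoring"}}


def is_allowed(role: str, tier: str, action: str) -> bool:
    """RBAC first, then the tier feature gate; both must pass."""
    role_ok = role == "admin" or action in ROLE_ACTIONS.get(role, set())
    tier_ok = action not in TIER_BLOCKED.get(tier, set())
    return role_ok and tier_ok
```

With these tables, `is_allowed("admin", "Starter", "patch_config")` passes while `is_allowed("admin", "Starter", "ai_authoring")` does not, which mirrors the Starter-admin case described above.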
Bridge secrets
Two distinct shared secrets — see 09 MMC bridge for the full pattern.
MMC_BRIDGE_INTEGRATION_SECRET
Gateway → WP. WP gates coachpilot/v1/* endpoints via mmc_bridge_integration_auth permission_callback using hash_equals for constant-time comparison.
archetype_scores_secret (per-tenant)
Quiz app → tenant CRM. Stored in tenant_brand.archetype_scores_secret (per-tenant). Quiz app sends as X-Deal-Secret header. MMC validates against WP option sg_deal_webhook_secret.
Rotation: 24h dual-validate window. Update both ends in lockstep, then restart pods.
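A dual-validate window means the receiver accepts either the new secret or the previous one until the window closes. A minimal sketch of that rule in Python, for illustration (the WP side is PHP); the function and parameter names are hypothetical:

```python
import hmac
import time
from typing import Optional


def secret_is_valid(presented: str, current: str, previous: Optional[str],
                    previous_expires_at: float, now: Optional[float] = None) -> bool:
    """Accept the current secret, or the previous one until its window closes."""
    now = time.time() if now is None else now
    # Constant-time comparison, the Python counterpart of PHP's hash_equals.
    if hmac.compare_digest(presented.encode(), current.encode()):
        return True
    if previous is not None and now < previous_expires_at:
        return hmac.compare_digest(presented.encode(), previous.encode())
    return False
```

For a 24h window, `previous_expires_at` would be the rotation timestamp plus `24 * 3600` seconds.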
Audit log
assess_audit_log records every state-changing operation. See 04 Config schema for the schema.
Critical implementation note: audit-log INSERTs MUST use their OWN short-lived autocommit connection (not the per-request transactional one). Otherwise an HTTPException rolls back the audit row alongside the operation, and we lose the trace of who attempted what.
Pattern:
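import psycopg

def write_audit_row(...):
    # Own short-lived autocommit connection: a rollback of the request transaction cannot drop this row.
    with psycopg.connect(DATABASE_URL, autocommit=True) as conn:
        conn.execute("INSERT INTO assess_audit_log (...) VALUES (...)", [...])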
See feedback_dashboard_runner_defensive.md.
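The effect can be reproduced without Postgres. The runnable sketch below uses the standard-library `sqlite3` module as a stand-in for psycopg, with a simplified two-table schema; the `action` column and the sample values are assumptions. The audit row is written through its own autocommit connection, the request then fails and rolls back, and only the audit row remains.

```python
import os
import sqlite3
import tempfile
from typing import Tuple


def demo_audit_survives_rollback() -> Tuple[int, int]:
    """Return (config_rows, audit_rows) after a failed, rolled-back request."""
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    setup = sqlite3.connect(path)
    setup.execute("CREATE TABLE framework_configs (id INTEGER PRIMARY KEY, title TEXT)")
    setup.execute(
        "CREATE TABLE assess_audit_log (id INTEGER PRIMARY KEY, actor_user_id TEXT, action TEXT)"
    )
    setup.commit()
    setup.close()

    request_conn = sqlite3.connect(path)  # transactional, like the per-request connection
    try:
        # The audit row goes through its own autocommit connection (isolation_level=None).
        audit_conn = sqlite3.connect(path, isolation_level=None)
        audit_conn.execute(
            "INSERT INTO assess_audit_log (actor_user_id, action) VALUES (?, ?)",
            ("user_123", "PATCH framework_configs"),
        )
        audit_conn.close()
        request_conn.execute("INSERT INTO framework_configs (title) VALUES (?)", ("draft",))
        raise PermissionError("stand-in for HTTPException")
    except PermissionError:
        request_conn.rollback()  # undoes the config write only
    configs = request_conn.execute("SELECT COUNT(*) FROM framework_configs").fetchone()[0]
    audits = request_conn.execute("SELECT COUNT(*) FROM assess_audit_log").fetchone()[0]
    request_conn.close()
    return configs, audits
```

Had the audit INSERT shared `request_conn`, the rollback would have removed it together with the config write.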
What gets audited
- PATCH framework_configs (every column, every change)
- PATCH tenant_brand
- POST snapshot restore
- POST AI authoring (amend / generate)
- POST tenant document upload (Personas)
- DELETE tenant document
- POST run followups (the cron-triggered action)
What doesn't get audited (public-readable endpoints):
- GET /resolved
- GET /calibrate (no state change)
- GET tenant brand
PII handling
The platform stores PII in three places:
- assess_sessions.user_email, user_name: full PII
- tenant_brand.coach_*: coach's own PII (their info, not players')
- assess_client_documents (Personas, Phase 10): bio PDFs may contain detailed PII
Swiss nDSG + GDPR compliance:
- All databases run in Exoscale Zurich (Swiss data residency).
- Encryption at rest via Exoscale managed storage.
- TLS 1.2+ on every endpoint.
- Right-to-delete: DELETE FROM tenants WHERE id = ... cascades through the ON DELETE CASCADE foreign keys, so PII is gone from the live database within minutes. The backup window is 7 days, after which it is purged from snapshots too.
For analytics events (D3 deferred): payloads sent to analytics_webhook MUST hash player identifiers (NEVER raw email). The CRM payload to archetype_scores_webhook is the only place raw PII goes outbound, and only when the tenant has explicitly configured the URL.
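A minimal sketch of hashing a player identifier before it enters an analytics payload. Using a keyed hash (HMAC-SHA256 with a per-tenant key) instead of a bare hash is a choice of this sketch, made because unkeyed hashes of email addresses can be reversed by guessing; the function names, the `player_id` field, and the key handling are hypothetical.

```python
import hashlib
import hmac


def pseudonymize_email(email: str, tenant_key: bytes) -> str:
    """Keyed SHA-256 of the normalized email; stable per tenant."""
    normalized = email.strip().lower().encode("utf-8")
    return hmac.new(tenant_key, normalized, hashlib.sha256).hexdigest()


def analytics_payload(event: str, email: str, tenant_key: bytes) -> dict:
    # Only the hashed identifier is included; the raw email never enters the payload.
    return {"event": event, "player_id": pseudonymize_email(email, tenant_key)}
```

Normalizing case and whitespace first keeps the identifier stable when the same player types their address differently.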
Frame-ancestors / iframe whitelist
The public quiz allows iframing only from approved tenant domains. CSP set in middleware.ts:
Content-Security-Policy: frame-ancestors 'self'
https://mindfulmoneycoaching.online
https://*.mindfulmoneycoaching.online
https://*.thesynergygroup.ch
...
When onboarding a new white-label tenant who needs iframe hosting:
- Add their domain to the CSP frame-ancestors list.
- Build + deploy new moneyquiz-admin image.
- Test from their site.
Don't use a wildcard * — that opens up clickjacking risks on the booking flow.
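The no-wildcard rule can be enforced where the directive is assembled. The real header is set in `middleware.ts`; the sketch below is a Python illustration of the same check, with a hypothetical function name, and it still permits subdomain wildcards such as `https://*.mindfulmoneycoaching.online`.

```python
from typing import List


def frame_ancestors_directive(origins: List[str]) -> str:
    """Build a frame-ancestors directive from explicit https origins."""
    sources = ["'self'"]
    for origin in origins:
        host = origin[len("https://"):] if origin.startswith("https://") else ""
        # A bare "*" (or anything that is not an https origin) is refused.
        if host in ("", "*"):
            raise ValueError(f"refusing frame-ancestors source: {origin!r}")
        sources.append(origin)
    return "frame-ancestors " + " ".join(sources)
```

Failing at build time means a mistyped or overly broad entry is caught before the image is deployed.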
Secret discovery / rotation playbook
If a secret leaks (committed to git, posted in a Slack channel, etc.):
- Immediately rotate the affected secret. Don't wait to "verify"; assume worst case.
  - For archetype_scores_secret: dual-validate 24h window; update WP + tenant_brand row; restart gateway.
  - For BEARER_<tenant>: update k8s secret; redeploy gateway; revoke old.
  - For Clerk: rotate via Clerk dashboard; tokens with the old signing key invalidate immediately.
  - For Hostinger SSH: rotate password via Hostinger panel; update vault.
- Audit what was accessed during the leak window. Check assess_audit_log for unusual actor_user_id values.
- Document the incident in docs/incidents/<date>-<short-desc>.md.
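For the audit step, a small filter over exported audit rows is often enough to surface unexpected actors. This sketch assumes each row is a dict with `actor_user_id` and a `created_at` datetime; `created_at` and the function name are assumptions, so check 04 Config schema for the real column names.

```python
from datetime import datetime
from typing import List, Set


def suspicious_rows(rows: List[dict], start: datetime, end: datetime,
                    known_actors: Set[str]) -> List[dict]:
    """Audit rows inside the leak window whose actor is not on the expected list."""
    return [
        r for r in rows
        if start <= r["created_at"] <= end and r["actor_user_id"] not in known_actors
    ]
```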
What we don't have yet (deferred)
- MFA on dashboard admin sessions — Clerk supports it; not enforced.
- IP allowlist on gateway PATCH endpoints — anyone with a valid bearer can call them. Adding allowlist would be Cloudflare-level.
- Rate limiting on /calibrate: public endpoint; today protected only by reasonable HTTP timeouts. If misused for DoS, would need per-IP rate limiting.
- Secret expiration policies: static bearers don't expire. Rotate manually on a schedule (proposed: quarterly).
- WAF rules — no WAF in front of the gateway today.
See 13 Open / deferred.
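If per-IP rate limiting on /calibrate is ever picked up, a token bucket is one common shape for it. The sketch below is illustrative and not a planned design: the class name and parameters are hypothetical, and state is held in memory per process, so a multi-pod gateway would need a shared store or an edge-level limit instead.

```python
import time
from typing import Dict, Optional, Tuple


class PerIpRateLimiter:
    """Token bucket per client IP: `burst` requests at once, refilled at `rate` per second."""

    def __init__(self, rate: float, burst: int) -> None:
        self.rate = rate
        self.burst = burst
        self._buckets: Dict[str, Tuple[float, float]] = {}  # ip -> (tokens, last_seen)

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        tokens, last = self._buckets.get(ip, (float(self.burst), now))
        # Refill for the elapsed time, capped at the burst size.
        tokens = min(float(self.burst), tokens + (now - last) * self.rate)
        if tokens < 1.0:
            self._buckets[ip] = (tokens, now)
            return False
        self._buckets[ip] = (tokens - 1.0, now)
        return True
```

Passing `now` explicitly makes the limiter deterministic in tests; in production the monotonic clock is used.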
OWASP touchpoints
Where each OWASP Top 10 category is handled (or known-deferred):
| OWASP | Where |
|---|---|
| A01 Broken Access Control | auth.py::get_current_user + role checks per endpoint |
| A02 Cryptographic Failures | TLS termination at Exoscale NLB; secrets in k8s; hash_equals for bridge auth |
| A03 Injection | All Postgres queries parameterised (no string concat); WP uses $wpdb->prepare |
| A04 Insecure Design | Audit log on every state change; snapshot history; explicit opt-in for webhooks |
| A05 Security Misconfiguration | Pre-deploy checklist; no env vars in image |
| A06 Vulnerable Components | pip-audit + npm audit runs in CI (not in repo for now); manual review of new deps |
| A07 Authentication Failures | Clerk handles brute-force protection; static bearers should be high-entropy |
| A08 Software/Data Integrity | Snapshots make config changes auditable + reversible |
| A09 Logging Failures | assess_audit_log for state changes; pod logs for runtime events |
| A10 SSRF | Gateway only calls known per-tenant webhook URLs (not user-supplied per-request) |