
Authentication, authorization, secrets, audit, and Swiss data-protection considerations.


Authentication — who can call what

Public                  Reads /api/v1/assess/{registry, framework/.../resolved, framework/.../calibrate}
                        Reads /api/v1/assess/slots
                        No auth required

Authenticated user      Clerk JWT (browser) OR static bearer (server)
                        All PATCH endpoints, audit-log-writing actions

Tenant-scoped admin     Clerk JWT with org_id matching the tenant
                        OR static bearer scoped per tenant (BEARER_MMC, BEARER_HSP)

Bridge integrations     X-MMC-Bridge-Secret header (gateway → WP)
                        X-Deal-Secret header (quiz app → WP)

Clerk JWT (dashboard sessions)

The CoachPilot dashboard authenticates via Clerk. Clerk emits short-lived JWTs that the gateway validates via auth.py::get_current_user.

The JWT carries:

  • sub — Clerk user id
  • org_id — Clerk org id (maps to a tenant in our system via a manual mapping today)
  • tier — subscription tier (Starter / Professional / Business)
  • role: admin / developer / operator / viewer / service

Token verification happens via Clerk's JWKS endpoint, cached locally.
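
The local JWKS caching could look roughly like the sketch below. This is not the code in auth.py: the class name JwksCache, the 10-minute TTL, and the injected fetch callable are assumptions made for illustration, and the signature check itself is left to a JWT library.

```python
import time
from typing import Callable, Optional


class JwksCache:
    """Caches a JWKS document and looks up signing keys by key id (kid)."""

    def __init__(self, fetch: Callable[[], dict], ttl_seconds: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch        # hypothetical: performs the HTTP GET of the JWKS URL
        self._ttl = ttl_seconds    # assumed TTL; the real value is not documented here
        self._clock = clock
        self._keys: dict = {}
        self._fetched_at: Optional[float] = None

    def _refresh(self) -> None:
        jwks = self._fetch()
        self._keys = {k["kid"]: k for k in jwks.get("keys", [])}
        self._fetched_at = self._clock()

    def get_key(self, kid: str) -> Optional[dict]:
        expired = (self._fetched_at is None
                   or self._clock() - self._fetched_at > self._ttl)
        # Refetch when the cache is stale, or when an unknown kid shows up
        # (for example right after a signing-key rotation).
        if expired or kid not in self._keys:
            self._refresh()
        return self._keys.get(kid)
```

A production version would probably also rate-limit refetches triggered by unknown kid values, since a caller can otherwise force a network round trip per request.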

For service-to-service calls inside the cluster (e.g. agent → gateway), we use static bearer tokens.


Static bearers — BEARER_<tenant>

For inter-pod calls and CLI smoke testing, each tenant has a static bearer in the k8s secret coachpilot-secrets:

Secret        Tenant   Purpose
BEARER_MMC    mmc      Service-level access to MMC's data
BEARER_HSP    hsp      (Reserved)

Sent as Authorization: Bearer <token>. The gateway validates against the secret store.

Rotation: change the secret in k8s + redeploy gateway. Because validated tokens are cached, requests carrying the old token still validate for a 5-minute window after the redeploy; after that the old token is rejected.

Static bearers are NOT for end-user authentication. They have full tenant-scope access. Never expose them client-side.
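
As an illustration of the validation step, here is a minimal sketch that maps an Authorization header to a tenant. The function name tenant_for_bearer and the in-memory bearers mapping are hypothetical; the gateway's actual lookup against the secret store (and its 5-minute cache) is not shown.

```python
import hmac
from typing import Mapping, Optional


def tenant_for_bearer(authorization: Optional[str],
                      bearers: Mapping[str, str]) -> Optional[str]:
    """Return the tenant whose static bearer matches the header, else None.

    `bearers` maps tenant id -> token, e.g. {"mmc": <value of BEARER_MMC>}.
    """
    if not authorization:
        return None
    scheme, _, token = authorization.partition(" ")
    token = token.strip()
    if scheme.lower() != "bearer" or not token:
        return None
    matched = None
    for tenant, expected in bearers.items():
        # compare_digest avoids leaking how many leading characters matched.
        if expected and hmac.compare_digest(token.encode(), expected.encode()):
            matched = tenant
    return matched
```

The loop deliberately checks every configured tenant instead of returning on the first match, so response time does not depend on which tenant's token was presented.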


RBAC — five roles

Role        What they can do
admin       Everything within their tenant; cross-tenant for platform admins
developer   All PATCH/POST/DELETE except billing + role changes
operator    PATCH config + restore snapshots; no new framework registration
viewer      Read-only across their tenant
service     Internal cluster calls (agent, cron); no human owns this role

Feature gating per tier is separate from RBAC. A Starter-tier admin can still PATCH but cannot use AI authoring (the feature gate blocks the action). See feature_gate.py.
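
The separation between the two checks can be sketched as follows. Everything named here is hypothetical: the action names, the ROLE_ACTIONS and TIER_FEATURES mappings, and in particular the assumption that AI authoring is unlocked from the Professional tier upward. The real rules live in feature_gate.py and the per-endpoint role checks.

```python
from typing import Dict, Set

# Hypothetical action names and role mapping, for illustration only.
ROLE_ACTIONS: Dict[str, Set[str]] = {
    "admin": {"read", "patch_config", "restore_snapshot",
              "register_framework", "ai_authoring"},
    "developer": {"read", "patch_config", "restore_snapshot",
                  "register_framework", "ai_authoring"},
    "operator": {"read", "patch_config", "restore_snapshot"},
    "viewer": {"read"},
}

# Assumption: which tiers unlock AI authoring is not stated in this document.
TIER_FEATURES: Dict[str, Set[str]] = {
    "starter": set(),
    "professional": {"ai_authoring"},
    "business": {"ai_authoring"},
}

GATED_ACTIONS: Set[str] = {"ai_authoring"}


def is_allowed(role: str, tier: str, action: str) -> bool:
    """An action needs both the role permission and, if gated, the tier feature."""
    role_ok = action in ROLE_ACTIONS.get(role, set())
    gate_ok = (action not in GATED_ACTIONS
               or action in TIER_FEATURES.get(tier.lower(), set()))
    return role_ok and gate_ok
```

With these assumed mappings, is_allowed("admin", "Starter", "patch_config") passes while is_allowed("admin", "Starter", "ai_authoring") is blocked by the gate, not by the role.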


Bridge secrets

Two distinct shared secrets — see 09 MMC bridge for the full pattern.

MMC_BRIDGE_INTEGRATION_SECRET

Gateway → WP. WP gates coachpilot/v1/* endpoints via mmc_bridge_integration_auth permission_callback using hash_equals for constant-time comparison.

archetype_scores_secret (per-tenant)

Quiz app → tenant CRM. Stored in tenant_brand.archetype_scores_secret (per-tenant). Quiz app sends as X-Deal-Secret header. MMC validates against WP option sg_deal_webhook_secret.

Rotation: 24h dual-validate window. Update both ends in lockstep, then restart pods.
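
The dual-validate window can be sketched as a check that always accepts the current secret and accepts the previous one only for 24 hours after rotation. The function name, its parameters, and the idea of storing a rotated_at timestamp next to the previous secret are assumptions for illustration; the receiving side in WP is PHP, so this Python version shows the logic only.

```python
import hmac
from datetime import datetime, timedelta, timezone
from typing import Optional

DUAL_VALIDATE_WINDOW = timedelta(hours=24)


def secret_is_valid(presented: str, current: str,
                    previous: Optional[str], rotated_at: Optional[datetime],
                    now: Optional[datetime] = None) -> bool:
    """Accept the current secret always, the previous one for 24h after rotation."""
    now = now or datetime.now(timezone.utc)
    if hmac.compare_digest(presented.encode(), current.encode()):
        return True
    if previous is None or rotated_at is None:
        return False
    within_window = now - rotated_at <= DUAL_VALIDATE_WINDOW
    # Constant-time comparison, mirroring hash_equals on the WP side.
    return within_window and hmac.compare_digest(presented.encode(),
                                                 previous.encode())
```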


Audit log

assess_audit_log records every state-changing operation. See 04 Config schema for the schema.

Critical implementation note: audit-log INSERTs MUST use their OWN short-lived autocommit connection (not the per-request transactional one). Otherwise an HTTPException rolls back the audit row alongside the operation, and we lose the trace of who attempted what.

Pattern:

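import psycopg

def write_audit_row(...):
    # Dedicated autocommit connection: the row is committed even if the request transaction rolls back.
    with psycopg.connect(DATABASE_URL, autocommit=True) as conn:
        conn.execute("INSERT INTO assess_audit_log (...) VALUES (...)", [...])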

See feedback_dashboard_runner_defensive.md.

What gets audited

  • PATCH framework_configs (every column, every change)
  • PATCH tenant_brand
  • POST snapshot restore
  • POST AI authoring (amend / generate)
  • POST tenant document upload (Personas)
  • DELETE tenant document
  • POST run followups (the cron-triggered action)

What doesn't get audited (public-readable endpoints):

  • GET /resolved
  • GET /calibrate (no state change)
  • GET tenant brand

PII handling

The platform stores PII in three places:

  1. assess_sessions.user_email, user_name — full PII
  2. tenant_brand.coach_* — coach's own PII (their info, not players')
  3. assess_client_documents (Personas, Phase 10) — bio PDFs may contain detailed PII

Swiss nDSG + GDPR compliance:

  • All databases run in Exoscale Zurich (Swiss data residency).
  • Encryption at rest via Exoscale managed storage.
  • TLS 1.2+ on every endpoint.
  • Right-to-delete: DELETE FROM tenants WHERE id = ... cascades through the ON DELETE CASCADE foreign keys, so PII is gone from the live database within minutes. Backups are retained for 7 days, after which the data is purged from snapshots too.
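
To make the cascade behaviour concrete, the sketch below reproduces it on a toy schema. It uses SQLite in memory as a stand-in for Postgres, and the tenant_id column on assess_sessions is an assumption; only the table names and user_email come from this document.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite needs this pragma; Postgres enforces foreign keys by default.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE tenants (id TEXT PRIMARY KEY);
    CREATE TABLE assess_sessions (
        id INTEGER PRIMARY KEY,
        tenant_id TEXT NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
        user_email TEXT
    );
    INSERT INTO tenants (id) VALUES ('mmc'), ('hsp');
    INSERT INTO assess_sessions (tenant_id, user_email)
        VALUES ('mmc', 'a@example.com'), ('hsp', 'b@example.com');
""")

# Deleting the tenant row removes its dependent session rows as well.
conn.execute("DELETE FROM tenants WHERE id = ?", ("mmc",))
conn.commit()

remaining = [row[0] for row in conn.execute("SELECT tenant_id FROM assess_sessions")]
```

Any table holding tenant PII without a cascading foreign key would survive the delete, so the guarantee is only as complete as the FK coverage.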

For analytics events (D3 deferred): payloads sent to analytics_webhook MUST hash player identifiers (NEVER raw email). The CRM payload to archetype_scores_webhook is the only place raw PII goes outbound, and only when the tenant has explicitly configured the URL.
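
One way to satisfy the hashing requirement is a keyed hash of the normalised email, sketched below. The function names, the player_id field, and the use of HMAC-SHA-256 with a server-side key are assumptions; the document only requires that identifiers be hashed. A keyed hash is chosen here because plain SHA-256 of an email address can be reversed by guessing candidate addresses.

```python
import hashlib
import hmac


def hash_player_identifier(email: str, key: bytes) -> str:
    """Return a keyed SHA-256 digest of a normalised email address."""
    normalised = email.strip().lower()
    return hmac.new(key, normalised.encode("utf-8"), hashlib.sha256).hexdigest()


def analytics_payload(event: str, email: str, key: bytes) -> dict:
    # Only the digest goes into the payload; the raw email is never included.
    return {"event": event, "player_id": hash_player_identifier(email, key)}
```

Normalising before hashing keeps " Alice@Example.com " and "alice@example.com" mapped to the same identifier, so events for one player still join up downstream.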


Frame-ancestors / iframe whitelist

The public quiz allows iframing only from approved tenant domains. CSP set in middleware.ts:

Content-Security-Policy: frame-ancestors 'self'
  https://mindfulmoneycoaching.online
  https://*.mindfulmoneycoaching.online
  https://*.thesynergygroup.ch
  ...

When onboarding a new white-label tenant who needs iframe hosting:

  1. Add their domain to the CSP frame-ancestors list.
  2. Build + deploy new moneyquiz-admin image.
  3. Test from their site.

Don't use a wildcard * — that opens up clickjacking risks on the booking flow.
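
The no-wildcard rule can be enforced mechanically when the header is assembled. The sketch below shows the idea in Python; the real header is set in middleware.ts, and the function name build_frame_ancestors and its https-only check are assumptions for illustration.

```python
from typing import Iterable


def build_frame_ancestors(origins: Iterable[str]) -> str:
    """Build a frame-ancestors directive from an explicit list of https origins."""
    sources = ["'self'"]
    for origin in origins:
        if origin.strip() == "*":
            raise ValueError("bare wildcard would allow any site to frame the quiz")
        if not origin.startswith("https://"):
            raise ValueError(f"expected an https origin, got {origin!r}")
        sources.append(origin)
    return "frame-ancestors " + " ".join(sources)
```

Subdomain patterns such as https://*.thesynergygroup.ch still pass, since they are scoped to a single tenant domain; only the bare * is rejected.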


Secret discovery / rotation playbook

If a secret leaks (committed to git, posted in a Slack channel, etc.):

  1. Immediately rotate the affected secret. Don't wait to "verify" — assume worst case.
  2. For archetype_scores_secret: dual-validate 24h window; update WP + tenant_brand row; restart gateway.
  3. For BEARER_<tenant>: update k8s secret; redeploy gateway; revoke old.
  4. For Clerk: rotate via Clerk dashboard; tokens signed with the old key become invalid immediately.
  5. For Hostinger SSH: rotate password via Hostinger panel; update vault.
  6. Audit what was accessed during the leak window. Check assess_audit_log for unusual actor_user_id values.
  7. Document the incident in docs/incidents/<date>-<short-desc>.md.
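
For the audit step in the playbook above, a first pass over exported audit rows could look like the sketch below. The function name, the created_at field, and the notion of an expected-actors set are hypothetical; only actor_user_id is named in this document.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, Mapping, Set


def unusual_actors(rows: Iterable[Mapping], start: datetime, end: datetime,
                   expected_actors: Set[str]) -> Counter:
    """Count audit rows inside [start, end] whose actor is not on the expected list."""
    counts: Counter = Counter()
    for row in rows:
        in_window = start <= row["created_at"] <= end
        if in_window and row["actor_user_id"] not in expected_actors:
            counts[row["actor_user_id"]] += 1
    return counts
```

The result is a starting point for manual review, not a verdict: a leaked bearer used under a legitimate actor id would not show up here.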

What we don't have yet (deferred)

  • MFA on dashboard admin sessions — Clerk supports it; not enforced.
  • IP allowlist on gateway PATCH endpoints — anyone with a valid bearer can call them. Adding allowlist would be Cloudflare-level.
  • Rate limiting on /calibrate — public endpoint; today protected only by reasonable HTTP timeouts. If misused for DoS, would need per-IP rate limiting.
  • Secret expiration policies — static bearers don't expire. Rotate manually on a schedule (proposed: quarterly).
  • WAF rules — no WAF in front of the gateway today.

See 13 Open / deferred.


OWASP touchpoints

Where each OWASP Top 10 category is handled (or known-deferred):

OWASP                            Where
A01 Broken Access Control        auth.py::get_current_user + role checks per endpoint
A02 Cryptographic Failures       TLS termination at Exoscale NLB; secrets in k8s; hash_equals for bridge auth
A03 Injection                    All Postgres queries parameterised (no string concat); WP uses $wpdb->prepare
A04 Insecure Design              Audit log on every state change; snapshot history; explicit opt-in for webhooks
A05 Security Misconfiguration    Pre-deploy checklist; no env vars in image
A06 Vulnerable Components        pip-audit + npm audit runs in CI (not in repo for now); manual review of new deps
A07 Authentication Failures      Clerk handles brute-force protection; static bearers should be high-entropy
A08 Software/Data Integrity      Snapshots make config changes auditable + reversible
A09 Logging Failures             assess_audit_log for state changes; pod logs for runtime events
A10 SSRF                         Gateway only calls known per-tenant webhook URLs (not user-supplied per-request)

Next

13 Open / deferred — what's not done, what's intentionally deferred, the future-work backlog.