CoachPilot — Your AI-Powered Coaching Business


Authentication, authorization, secrets, audit, and Swiss data-protection considerations.

Authentication — who can call what

Caller	Credential	Access
Public	None (no auth required)	Reads `/api/v1/assess/{registry, framework/.../resolved, framework/.../calibrate}` and `/api/v1/assess/slots`
Authenticated user	Clerk JWT (browser) or static bearer (server)	All PATCH endpoints and audit-log-writing actions
Tenant-scoped admin	Clerk JWT whose `org_id` matches the tenant, or a static bearer scoped to that tenant (`BEARER_MMC`, `BEARER_HSP`)	Administration within that tenant
Bridge integration	`X-MMC-Bridge-Secret` header	Gateway → WP
Bridge integration	`X-Deal-Secret` header	Quiz app → WP

Clerk JWT (dashboard sessions)

The CoachPilot dashboard authenticates via Clerk. Clerk issues short-lived JWTs, which the gateway validates in auth.py::get_current_user.

The JWT carries:

sub — Clerk user id
org_id — Clerk org id (maps to a tenant in our system via a manual mapping today)
tier — subscription tier (Starter / Professional / Business)
role — admin / developer / operator / viewer / service

Tokens are verified against the signing keys published at Clerk's JWKS endpoint; the keys are cached locally.
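Once the signature has been checked, the claims still have to be mapped to a tenant before the tenant-scoped rules apply. The following is a minimal sketch of that step. It assumes the claims arrive as a dict with the four keys listed above and that tier and role use the literal strings shown in this section. `user_from_claims`, `require_tenant`, `CurrentUser`, `AuthError` and the `ORG_TO_TENANT` entry are hypothetical names, not the code in auth.py.

```python
from dataclasses import dataclass
from typing import Mapping

# Hypothetical stand-in for today's manual org_id -> tenant mapping.
ORG_TO_TENANT: Mapping[str, str] = {"org_mmc_example": "mmc"}

ROLES = frozenset({"admin", "developer", "operator", "viewer", "service"})
TIERS = frozenset({"Starter", "Professional", "Business"})


class AuthError(Exception):
    """Raised when verified claims cannot be mapped to a tenant-scoped user."""


@dataclass(frozen=True)
class CurrentUser:
    user_id: str
    tenant_id: str
    tier: str
    role: str


def user_from_claims(claims: Mapping[str, object],
                     org_to_tenant: Mapping[str, str] = ORG_TO_TENANT) -> CurrentUser:
    """Map claims from an already signature-checked Clerk JWT to a CurrentUser."""
    sub, org_id = claims.get("sub"), claims.get("org_id")
    role, tier = claims.get("role"), claims.get("tier")
    if not isinstance(sub, str) or not sub:
        raise AuthError("missing sub claim")
    tenant_id = org_to_tenant.get(org_id) if isinstance(org_id, str) else None
    if tenant_id is None:
        raise AuthError("org_id is not mapped to a tenant")
    # isinstance guards keep unhashable JSON values (lists, dicts) out of the set lookups.
    if not (isinstance(role, str) and role in ROLES):
        raise AuthError(f"unknown role: {role!r}")
    if not (isinstance(tier, str) and tier in TIERS):
        raise AuthError(f"unknown tier: {tier!r}")
    return CurrentUser(user_id=sub, tenant_id=tenant_id, tier=tier, role=role)


def require_tenant(user: CurrentUser, tenant_id: str) -> None:
    """Tenant-scoped check: the token's org must map to the tenant being accessed."""
    if user.tenant_id != tenant_id:
        raise AuthError("token is not scoped to this tenant")
```

Unknown orgs, roles and tiers are rejected outright. A missing mapping therefore fails closed and never falls through to a default tenant.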

For service-to-service calls inside the cluster (e.g. agent → gateway), we use static bearer tokens.

Static bearers — `BEARER_<tenant>`

For inter-pod calls and CLI smoke testing, each tenant has a static bearer in the k8s secret coachpilot-secrets:

Secret	Tenant	Purpose
`BEARER_MMC`	`mmc`	Service-level access to MMC's data
`BEARER_HSP`	`hsp`	(Reserved)

The token is sent as Authorization: Bearer <token>, and the gateway validates it against the secret store.

Rotation: change the secret in k8s and redeploy the gateway. Validated tokens are cached, so for up to 5 minutes requests presenting the old token still validate; after that the old token is rejected.
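The 5-minute overlap falls out of caching. This minimal sketch assumes the gateway reloads a `{tenant: token}` map from the secret store at most once every 300 seconds. The `BearerValidator` class and its `load_tokens` callback are hypothetical. The sketch shows why the old token keeps working until the next reload.

```python
import hmac
import time
from typing import Callable, Mapping, Optional

CACHE_TTL_SECONDS = 300.0  # the 5-minute window described above


class BearerValidator:
    """Maps a presented static bearer to its tenant id, using a TTL-cached token map."""

    def __init__(self, load_tokens: Callable[[], Mapping[str, str]],
                 clock: Callable[[], float] = time.monotonic,
                 ttl: float = CACHE_TTL_SECONDS) -> None:
        self._load_tokens = load_tokens  # returns {tenant_id: token} from the secret store
        self._clock = clock
        self._ttl = ttl
        self._tokens: Mapping[str, str] = {}
        self._loaded_at: Optional[float] = None

    def _current_tokens(self) -> Mapping[str, str]:
        now = self._clock()
        # Reload only when the cached copy is older than the TTL; until then a
        # rotated-out token is still in the cache and still validates.
        if self._loaded_at is None or now - self._loaded_at >= self._ttl:
            self._tokens = dict(self._load_tokens())
            self._loaded_at = now
        return self._tokens

    def tenant_for(self, authorization: str) -> Optional[str]:
        scheme, _, token = authorization.partition(" ")
        if scheme != "Bearer" or not token:
            return None
        match = None
        for tenant, expected in self._current_tokens().items():
            # Constant-time compare; check every entry rather than stopping at the first match.
            if hmac.compare_digest(token.encode(), expected.encode()):
                match = tenant
        return match
```

A shorter TTL narrows the overlap, but the secret store is then read more often.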

Static bearers are NOT for end-user authentication. They have full tenant-scope access. Never expose them client-side.

RBAC — five roles

Role	What they can do
`admin`	Everything within their tenant; cross-tenant for platform admins
`developer`	All PATCH/POST/DELETE except billing + role changes
`operator`	PATCH config + restore snapshots; no new framework registration
`viewer`	Read-only across their tenant
`service`	Internal cluster calls (agent, cron); no human owns this role

Per-tier feature gating is separate from RBAC. An admin on the Starter tier can still PATCH but cannot use AI authoring, because the feature gate blocks that action. See feature_gate.py.
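The two layers compose as a logical AND: the role must grant the action, and the tier must not gate it. The following is a sketch under stated assumptions. The action names are hypothetical. The operator's grants are read narrowly from the table. The `service` role and cross-tenant platform admins are not modelled. Starter blocking AI authoring is the only tier rule the text states. None of this is the actual contents of feature_gate.py.

```python
from typing import FrozenSet, Mapping

# Hypothetical action names; the real endpoint-to-permission wiring lives in the gateway.
ALL_ACTIONS: FrozenSet[str] = frozenset({
    "read", "patch_config", "restore_snapshot", "register_framework",
    "ai_authoring", "billing", "change_roles",
})

# Role -> permitted actions, following the RBAC table. The service role is
# omitted here, so this sketch denies it by default.
ROLE_PERMISSIONS: Mapping[str, FrozenSet[str]] = {
    "admin": ALL_ACTIONS,
    "developer": ALL_ACTIONS - {"billing", "change_roles"},
    "operator": frozenset({"read", "patch_config", "restore_snapshot"}),
    "viewer": frozenset({"read"}),
}

# Tier -> actions blocked by the feature gate (only the Starter rule is documented).
TIER_BLOCKED: Mapping[str, FrozenSet[str]] = {
    "Starter": frozenset({"ai_authoring"}),
}


def is_allowed(role: str, tier: str, action: str) -> bool:
    """An action needs both an RBAC grant and a tier that does not gate it."""
    role_ok = action in ROLE_PERMISSIONS.get(role, frozenset())
    tier_ok = action not in TIER_BLOCKED.get(tier, frozenset())
    return role_ok and tier_ok
```

Keeping the tables separate mirrors the separation in the text. A tier change never edits role grants, and a role change never unlocks a gated feature.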

Bridge secrets

Two distinct shared secrets — see 09 MMC bridge for the full pattern.

`MMC_BRIDGE_INTEGRATION_SECRET`

Gateway → WP. WP gates the coachpilot/v1/* endpoints with the mmc_bridge_integration_auth permission_callback, which uses hash_equals for a constant-time comparison.

`archetype_scores_secret` (per-tenant)

Quiz app → tenant CRM. The secret is stored per tenant in tenant_brand.archetype_scores_secret, and the quiz app sends it in the X-Deal-Secret header. MMC validates it against the WP option sg_deal_webhook_secret.

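Rotation: during a 24h dual-validate window, both the old and the new secret are accepted. Update both ends (the tenant_brand row and the WP option) in lockstep, then restart the gateway pods.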
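The dual-validate window amounts to keeping the previous secret plus a rotation timestamp. The real check runs in WP (hash_equals against sg_deal_webhook_secret), so the following is only a Python sketch of the same logic. `SecretState`, `rotate` and `deal_secret_ok` are hypothetical names.

```python
import hmac
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

DUAL_VALIDATE_WINDOW = timedelta(hours=24)


@dataclass(frozen=True)
class SecretState:
    current: str
    previous: Optional[str] = None      # still accepted until the window closes
    rotated_at: Optional[datetime] = None


def rotate(state: SecretState, new_secret: str, now: datetime) -> SecretState:
    """Promote a new secret and keep the old one for the dual-validate window."""
    return SecretState(current=new_secret, previous=state.current, rotated_at=now)


def deal_secret_ok(presented: str, state: SecretState, now: datetime) -> bool:
    """Constant-time check of X-Deal-Secret against the current and, briefly, the previous secret."""
    ok = hmac.compare_digest(presented.encode(), state.current.encode())
    if state.previous is not None and state.rotated_at is not None:
        if now - state.rotated_at < DUAL_VALIDATE_WINDOW:
            ok = hmac.compare_digest(presented.encode(), state.previous.encode()) or ok
    return ok
```

Because the old value expires on its own after 24 hours, there is no separate revocation step to forget.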

Audit log

assess_audit_log records every state-changing operation. See 04 Config schema for the schema.

Critical implementation note: audit-log INSERTs MUST use their OWN short-lived autocommit connection (not the per-request transactional one). Otherwise an HTTPException rolls back the audit row alongside the operation, and we lose the trace of who attempted what.

Pattern:

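import psycopg

def write_audit_row(...):
    # Own autocommit connection, so a rollback of the request transaction can't undo this row.
    with psycopg.connect(DATABASE_URL, autocommit=True) as conn:
        conn.execute("INSERT INTO assess_audit_log (...) VALUES (...)", [...])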

See feedback_dashboard_runner_defensive.md.
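The separate connection pays off only if the audit write is also reached when the operation fails. This sketch shows a hypothetical context manager, `audited`, that records the attempt on both paths and then re-raises. In the gateway the `write` callback would be something like write_audit_row above. The `outcome` field is an assumption, since the real columns are defined in 04 Config schema.

```python
from contextlib import contextmanager
from typing import Callable, Iterator

# (actor_user_id, action, outcome); outcome is a hypothetical column.
AuditWriter = Callable[[str, str, str], None]


@contextmanager
def audited(write: AuditWriter, actor_user_id: str, action: str) -> Iterator[None]:
    """Record who attempted what, whether or not the wrapped operation succeeds."""
    try:
        yield
    except Exception:
        # Written before the exception propagates and the request transaction rolls back.
        write(actor_user_id, action, "failed")
        raise
    write(actor_user_id, action, "ok")
```

Used as `with audited(write_audit_row, user.user_id, "patch_framework_config"): ...`, an HTTPException raised inside the block still leaves a "failed" row behind.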

What gets audited

PATCH framework_configs (every column, every change)
PATCH tenant_brand
POST snapshot restore
POST AI authoring (amend / generate)
POST tenant document upload (Personas)
DELETE tenant document
POST run followups (the cron-triggered action)

What doesn't get audited (public-readable endpoints):

GET /resolved
GET /calibrate (no state change)
GET tenant brand

PII handling

The platform stores PII in three places:

assess_sessions.user_email, user_name — full PII
tenant_brand.coach_* — coach's own PII (their info, not players')
assess_client_documents (Personas, Phase 10) — bio PDFs may contain detailed PII

Swiss nDSG + GDPR compliance:

All databases run in Exoscale Zurich (Swiss data residency).
Encryption at rest via Exoscale managed storage.
TLS 1.2+ on every endpoint.
Right-to-delete: DELETE FROM tenants WHERE id = ... propagates through the ON DELETE CASCADE foreign keys, so the PII is gone from the live database within minutes. Backups are retained for 7 days, so the data is also purged from backup snapshots once that window has passed.
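The cascade behaviour can be checked in isolation. This sketch uses an in-memory SQLite database as a stand-in for Postgres, and the assumption that assess_sessions references tenants through a `tenant_id` column is hypothetical. Unlike Postgres, SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`. The sketch covers the live tables only, not the 7-day backups.

```python
import sqlite3

SCHEMA = """
CREATE TABLE tenants (id TEXT PRIMARY KEY);
CREATE TABLE assess_sessions (
    id         INTEGER PRIMARY KEY,
    tenant_id  TEXT NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
    user_email TEXT,
    user_name  TEXT
);
"""


def connect() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    return conn


def delete_tenant(conn: sqlite3.Connection, tenant_id: str) -> None:
    """Delete a tenant; child rows with PII go with it via ON DELETE CASCADE."""
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute("DELETE FROM tenants WHERE id = ?", (tenant_id,))
```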

For analytics events (D3 deferred): payloads sent to analytics_webhook MUST hash player identifiers (NEVER raw email). The CRM payload to archetype_scores_webhook is the only place raw PII goes outbound, and only when the tenant has explicitly configured the URL.
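The text requires hashed identifiers but does not name a scheme. A plain, unkeyed SHA-256 of an email can be reversed by hashing candidate addresses, so this sketch uses a keyed hash (HMAC-SHA256) with a per-tenant secret instead. `pseudonymise_player` and `tenant_key` are hypothetical. Under GDPR the output is still pseudonymous personal data, not anonymous data.

```python
import hashlib
import hmac


def pseudonymise_player(email: str, tenant_key: bytes) -> str:
    """Keyed hash (HMAC-SHA256) of a normalised email, for analytics payloads only."""
    normalised = email.strip().lower()  # same player, same id, regardless of case/whitespace
    return hmac.new(tenant_key, normalised.encode("utf-8"), hashlib.sha256).hexdigest()
```

The id is stable per tenant, so events can still be joined. Without the key, nobody can test a guessed email against it.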

Frame-ancestors / iframe whitelist

The public quiz can be iframed only from approved tenant domains. The CSP header is set in middleware.ts:

Content-Security-Policy: frame-ancestors 'self'
  https://mindfulmoneycoaching.online
  https://*.mindfulmoneycoaching.online
  https://*.thesynergygroup.ch
  ...

When onboarding a new white-label tenant who needs iframe hosting:

Add their domain to the CSP frame-ancestors list.
Build and deploy a new moneyquiz-admin image.
Test the embed from their site.

Don't use a bare wildcard *: it would let any site frame the quiz, which opens up clickjacking risks on the booking flow.
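The header is set in middleware.ts. The allowlist rule itself is small enough to state as code, shown here in Python to match the rest of this section. `frame_ancestors_header` is a hypothetical helper. It builds the frame-ancestors value from explicit tenant origins and refuses any source that would allow arbitrary sites to frame the quiz.

```python
from typing import Iterable

# Sources that would let any site (or any HTTPS site) embed the quiz.
_FORBIDDEN = {"*", "https://*", "https:"}


def frame_ancestors_header(origins: Iterable[str]) -> str:
    """Build a CSP frame-ancestors value from an explicit allowlist of tenant origins."""
    sources = ["'self'"]
    for origin in origins:
        origin = origin.strip()
        if origin in _FORBIDDEN:
            raise ValueError(f"wildcard source not allowed: {origin!r}")
        if not origin.startswith("https://"):
            raise ValueError(f"tenant origins must use https: {origin!r}")
        sources.append(origin)
    return "frame-ancestors " + " ".join(sources)
```

Subdomain wildcards such as `https://*.thesynergygroup.ch` are still accepted, because they stay scoped to one tenant's domain.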

Secret discovery / rotation playbook

If a secret leaks (committed to git, posted in a Slack channel, etc.):

Immediately rotate the affected secret. Don't wait to "verify" — assume worst case.
For archetype_scores_secret: dual-validate 24h window; update WP + tenant_brand row; restart gateway.
For BEARER_<tenant>: update k8s secret; redeploy gateway; revoke old.
For Clerk: rotate via the Clerk dashboard; tokens signed with the old key stop validating once the gateway's locally cached JWKS is refreshed.
For Hostinger SSH: rotate password via Hostinger panel; update vault.
Audit what was accessed during the leak window. Check assess_audit_log for unusual actor_user_id values.
Document the incident in docs/incidents/<date>-<short-desc>.md.
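When a BEARER_<tenant> is rotated, the replacement should be high-entropy (see A07 below). This is a minimal sketch using the standard library's `secrets` module. The helper name `new_bearer_token` is hypothetical.

```python
import secrets


def new_bearer_token(nbytes: int = 32) -> str:
    """Return a URL-safe random token carrying `nbytes` bytes (256 bits by default) of entropy."""
    return secrets.token_urlsafe(nbytes)
```

32 random bytes encode to a 43-character URL-safe string, which is safe to drop into an Authorization header unescaped.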

What we don't have yet (deferred)

MFA on dashboard admin sessions: Clerk supports it, but it is not enforced.
IP allowlist on gateway PATCH endpoints: anyone with a valid bearer can call them. An allowlist would be added at the Cloudflare level.
Rate limiting on /calibrate: a public endpoint, protected today only by reasonable HTTP timeouts. If it is misused for DoS, it will need per-IP rate limiting.
Secret expiration policies: static bearers don't expire. Rotate them manually on a schedule (proposed: quarterly).
WAF rules: there is no WAF in front of the gateway today.
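If per-IP rate limiting on /calibrate is picked up, a token bucket is one common shape for it. This is a sketch, not a decided design. `PerIpRateLimiter` is hypothetical. State lives in memory in a single process and is never evicted. Behind several gateway replicas each pod would count separately, so a global limit would need a shared store or an edge-level limiter.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class _Bucket:
    tokens: float      # requests currently available to this IP
    updated_at: float  # clock reading at the last refill


class PerIpRateLimiter:
    """Token bucket per client IP: `rate` requests/second, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._rate = rate
        self._capacity = capacity
        self._clock = clock
        self._buckets: Dict[str, _Bucket] = {}  # in-memory, per process, never evicted

    def allow(self, ip: str) -> bool:
        now = self._clock()
        bucket = self._buckets.setdefault(ip, _Bucket(self._capacity, now))
        # Refill in proportion to elapsed time, capped at the burst size.
        elapsed = now - bucket.updated_at
        bucket.tokens = min(self._capacity, bucket.tokens + elapsed * self._rate)
        bucket.updated_at = now
        if bucket.tokens >= 1.0:
            bucket.tokens -= 1.0
            return True
        return False
```

A handler would answer 429 Too Many Requests whenever `allow(client_ip)` returns False.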

See 13 Open / deferred.

OWASP touchpoints

Where each OWASP Top 10 is handled (or known-deferred):

OWASP	Where
A01 Broken Access Control	`auth.py::get_current_user` + role checks per endpoint
A02 Cryptographic Failures	TLS termination at Exoscale NLB; secrets in k8s; `hash_equals` for bridge auth
A03 Injection	All Postgres queries parameterised (no string concat); WP uses `$wpdb->prepare`
A04 Insecure Design	Audit log on every state change; snapshot history; explicit opt-in for webhooks
A05 Security Misconfiguration	Pre-deploy checklist; no env vars in image
A06 Vulnerable Components	`pip-audit` + `npm audit` runs in CI (not in repo for now); manual review of new deps
A07 Authentication Failures	Clerk handles brute-force protection; static bearers should be high-entropy
A08 Software/Data Integrity	Snapshots make config changes auditable + reversible
A09 Logging Failures	`assess_audit_log` for state changes; pod logs for runtime events
A10 SSRF	Gateway only calls known per-tenant webhook URLs (not user-supplied per-request)

→ 13 Open / deferred — what's not done, what's intentionally deferred, the future-work backlog.