Spec-Version: 1.0 (2026-05-19)
Status: Living document; items move between sections as priorities shift
Sibling: ARCHITECTURE.md, the architectural reference this roadmap operates against
What this is
A prioritised, dated backlog of work remaining on the Money Quiz platform after the 2026-05-19 bundle (Tracks B, C, D, A, plus polish + post-shipment fixes). Each item has:
- Goal — what success looks like
- Effort — rough engineering days
- Trigger — when this becomes urgent
- Outline — high-level approach (not a step-by-step plan)
- Dependencies — what must land first
When you start an item, move it from its current section to Section 7 (In flight) with the start date. When you ship it, move it to 13 Open / deferred under "Recently shipped" with the ship date.
Section 0 — Done (recent shipment, this entry mostly for context)
| Date | What |
|---|---|
| 2026-05-18 | C-phase format rename complete (9 phases in one day, zero breakage) |
| 2026-05-19 | Track D: D4 migration + universal archetype-scores webhook + per-tenant coach notification + brand editor with 4 new fields |
| 2026-05-19 | Track C: agent-side + gateway-side calibration; 33 regression tests; MMC Likert calibrated via WP bridge |
| 2026-05-19 | Track B: TenantThemeProvider + TenantBrandContext + dashboard brand preview card |
| 2026-05-19 | Track A: 14-file architecture doc + 4 Mermaid diagrams + dashboard /docs/ route + sidebar entry + help-panel deep links |
| 2026-05-19 | Polish: D3 GA4 mq_variant propagation; production smoke (5/5 pass) |
| 2026-05-19 | Two §7 fixes: middleware writes x-resolved-tenant header; BookingCTA hides when no coach + no booking URL |
| 2026-05-19 | 3 new memory entries; MEMORY.md index updated |
Section 1 — Open issues found during the smoke
1.1 Mirror completions don't appear in MMC Quiz Leads dashboard
Goal: Mirror sessions surface in Ilana's wp-admin → Quiz Leads alongside Likert.
Why this matters: D2 wired the webhook so MMC Quiz Leads RECEIVES Mirror completions. But the receiver gates the dual-write to mq_* tables on source.startsWith("traditional-quiz") (archetype-scores.php:110). Mirror payloads (source: "money-mirror") land in WP transients + user_meta _sg_archetype_snapshots — but the Quiz Leads dashboard queries mq_prospects + mq_taken + mq_results exclusively. So Mirror prospects are visible in user LMS profiles but NOT in the leads dashboard.
Effort: 0.5 day.
Trigger: Ilana notices Mirror completions don't appear in Quiz Leads.
Outline:
- Option (1): write Mirror payloads to mq_* (would need a different schema since there are no per-trait answers; mq_taken.Response_Format = 'qa' works but mq_results per-trait rows don't apply).
- Option (2): extend the Quiz Leads dashboard data loader to ALSO read from _sg_archetype_snapshots user_meta + sg_archetype_pending_* transients.
- Either way: add a "Source" column to the Quiz Leads UI so Ilana can filter by Likert vs Mirror vs Game vs Deal vs Realm.
Recommended: option (2) — leave the legacy mq_* schema as Likert-only, extend the dashboard reader. Cleaner separation.
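A minimal sketch of how the "Source" column value could be derived from a payload's source field. Only the traditional-quiz prefix and the money-mirror value come from this roadmap; the Game/Deal/Realm strings and the function name are hypothetical placeholders.

```python
# Known from this roadmap: Likert payloads start with "traditional-quiz",
# Mirror payloads use "money-mirror".
KNOWN_SOURCES = {"money-mirror": "Mirror"}

# ASSUMPTION: these source strings are invented for illustration only.
HYPOTHETICAL_SOURCES = {
    "money-game": "Game",
    "money-deal": "Deal",
    "money-realm": "Realm",
}


def source_label(source: str) -> str:
    """Map a webhook payload's `source` field to a Quiz Leads "Source" label."""
    if source.startswith("traditional-quiz"):
        return "Likert"
    if source in KNOWN_SOURCES:
        return KNOWN_SOURCES[source]
    # Anything unrecognised is surfaced rather than silently dropped.
    return HYPOTHETICAL_SOURCES.get(source, "Unknown")
```

Returning "Unknown" instead of raising keeps the dashboard usable if a new format ships before the mapping is updated.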
1.2 Brand re-theme TTL is closer to 75s than 30s
Goal: Brand changes visible on the public quiz within 30s (the documented TTL).
Why this matters: Smoke 5 changed TSG primary; first curl after 35s still showed old colour. Took ~75s for the new colour to appear across replicas.
Cause: Next.js fetch cache TTL is per-pod. With 2 replicas, the second pod can serve stale data for up to 30s after the first pod has refreshed. Combined with HTTP response caching at the edge, the observed delay is longer.
Effort: 0.25 day (mostly diagnosis + a small write-up).
Trigger: An admin complains a publish didn't take effect. Or fix proactively by lowering the TTL.
Outline: Either accept a 30-90s window (and update the doc to say so), or drop the TTL to 15s and accept the doubled gateway load. Or add a ?_ts= cache-buster on the brand fetch when the dashboard saves (active invalidation).
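The cache-buster option can be sketched as a small URL helper. This is an illustration only: the function name and the example URL are assumptions, and the real brand fetch lives in the Next.js app rather than in Python.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_cache_buster(url: str, saved_at_ms: int) -> str:
    """Return `url` with a single `_ts` query parameter set to the save time."""
    parts = urlsplit(url)
    # Drop any previous `_ts` so repeated saves do not stack parameters.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "_ts"
    ]
    query.append(("_ts", str(saved_at_ms)))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Using the save timestamp (rather than a random value) means every replica that sees the same save computes the same URL, so the new entry is cached once per pod instead of on every request.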
Section 2 — Customer-impacting gaps
2.1 Per-tenant Sessions drilldown in CoachPilot dashboard
Goal: When a non-MMC tenant (TSG, coachpilot, future HSP) wants to see "who took my quiz", they have a CoachPilot-native view comparable to MMC's WP Quiz Leads.
Effort: 2 days.
Trigger: First non-MMC tenant onboards as a paying customer.
Outline: New route /[locale]/coach/quiz-sessions/?tenant=<id> mirroring Quiz Leads UX:
- Paginated list of assess_sessions for the tenant
- Filters: date range, format, archetype combo, email contains
- Per-session slide-out (responses, scores, calibration trace if applicable)
- Hot/Warm/Cool/Cold likelihood scoring (port from MMC quiz-leads-likelihood.php)
- Notes table (similar to dashboard tasks notes)
Dependencies: None blocking. Data is already in assess_sessions.
2.2 Federation: unified Ilana view (MMC + platform-native)
Goal: One page shows Ilana's Mirror/Game/Deal/Realm sessions (in assess_sessions) alongside her Legacy Likert sessions (in WP mq_*).
Effort: 3 days, after 1.1 is decided.
Trigger: Once 1.1 is resolved (Mirror → Quiz Leads or via platform reader) the federation pattern is clear.
Outline:
- If platform reader (1.1 option 2): query both Postgres + WP bridge in a single dashboard route.
- If WP receiver expanded (1.1 option 1): leave MMC's WP dashboard as the sole leads source; CoachPilot dashboard shows aggregated counts only.
Section 3 — Soft-launched items needing a cutover decision
3.1 TSG / cutover from Mirror funnel to chooser
Status: /chooser has the 5-card vertical selector. / still serves Mirror directly.
Decision needed: Cut / over to render /chooser content (or 308 redirect), or keep current Mirror-as-default and surface the chooser via a button/link.
Effort: 0.25 day either direction.
Recommendation: Surface chooser via a small "Try other formats" link on Mirror's landing. Don't break Ilana's main funnel.
3.2 MMC traditional-quiz-engine.js → money-quiz-engine.js rename
Status: Deferred from C9 (2026-05-18). Touches MMC theme + 10+ refs.
Effort: 0.5 day (rename + cache purge + sweep refs).
Trigger: Next MMC theme deploy that touches quiz assets.
Outline: Rename file, sweep traditional-quiz-engine references in theme + mu-plugin + any docs, bump theme version, nuclear cache purge.
3.3 ?skip=1 cookie (deferred)
Status: Designed (skip the chooser to land directly on a format) but never built.
Recommendation: Drop entirely. Not needed — direct format URLs (/mirror, /game, etc.) already serve as the deep links.
Section 4 — Bigger architectural initiatives
4.1 Phase 4 — Likert WP → Postgres migration
Goal: MMC Money Quiz (Likert) sessions live in platform assess_sessions instead of WP mq_* tables.
Why: Brings Likert to parity with the other 4 formats. Unlocks unified analytics. Retires WP mq_* dependency long-term.
Effort: 1 week.
Trigger: We commit to retiring mq_* (compliance, ops cost, or a customer ask).
Outline:
- ETL mq_prospects + mq_taken + mq_results into assess_sessions (read-only mirror). Build conversion: STRIDE-4 master_id → archetype id, per-archetype sum → 0..100.
- Wire WP receiver to dual-write for 7 days (write to BOTH mq_* AND post to platform).
- Cut reads: Quiz Leads dashboard data loader reads from assess_sessions (or platform API).
- Decommission WP-side writes; archive mq_* tables; keep schema.
- Update the reference_mmc_money_quiz_schema memory entry to mark mq_* as decommissioned.
Risks: Data shape mismatch (mq_results is per-trait; assess_sessions is per-session). Need a schema decision for whether to keep per-trait or aggregate.
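The "per-archetype sum → 0..100" conversion in the ETL step could look like the sketch below. It assumes a linear min-max rescale and Likert items scored 1 to 5; neither the item range nor the scaling rule is stated in this roadmap, so treat both as placeholders. The master_id → archetype id mapping is not sketched because its values are not given here.

```python
def normalise_score(
    raw_sum: int,
    n_items: int,
    min_per_item: int = 1,  # ASSUMPTION: lowest Likert answer value
    max_per_item: int = 5,  # ASSUMPTION: highest Likert answer value
) -> int:
    """Linearly rescale a per-archetype answer sum onto 0..100."""
    if n_items <= 0 or max_per_item <= min_per_item:
        raise ValueError("need at least one item and a non-empty answer range")
    lowest = n_items * min_per_item
    highest = n_items * max_per_item
    if not lowest <= raw_sum <= highest:
        raise ValueError(f"raw_sum {raw_sum} outside [{lowest}, {highest}]")
    return round(100 * (raw_sum - lowest) / (highest - lowest))
```

Rejecting out-of-range sums instead of clamping them makes bad legacy rows visible during the read-only mirror phase, before reads are cut over.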
4.2 Sandbox preview (Phase 7)
Goal: Admin previews a config change in a sandboxed Next.js render before publishing to the public quiz.
Effort: 2 days.
Trigger: An admin pushes a bad config that's visible to live players for 30s while they roll back. We've avoided this so far via snapshots, but it's a known sharp edge.
Outline:
- POST /api/v1/assess/framework/{id}/preview returns a draft_id + signed URL.
- quiz.thesynergygroup.ch/?config=draft:<id>&tenant=<t> renders against the draft state (read from a Redis key, not Postgres).
- Sandbox is read-only; player completions in sandbox mode never write to assess_sessions or fire webhooks.
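A rough sketch of the read-only rule, assuming sandbox mode is detected purely from the config=draft:<id> query parameter described above. The function names are hypothetical, and signature verification of the signed URL is deliberately left out.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

DRAFT_PREFIX = "draft:"


def draft_id_from_url(url: str) -> Optional[str]:
    """Return the draft id if the URL requests a sandbox render, else None."""
    for value in parse_qs(urlsplit(url).query).get("config", []):
        if value.startswith(DRAFT_PREFIX) and len(value) > len(DRAFT_PREFIX):
            return value[len(DRAFT_PREFIX):]
    return None


def should_persist_completion(url: str) -> bool:
    """Sandbox completions must never be stored or trigger webhooks."""
    return draft_id_from_url(url) is None
```

Routing both the session write and the webhook dispatch through one predicate like should_persist_completion keeps the two side effects from drifting apart.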
4.3 tenant_brand snapshot system
Goal: Roll back a brand change atomically (like framework_configs already supports).
Effort: 0.5 day.
Trigger: Someone breaks a tenant's brand and can't recall the previous values.
Outline: Extend framework_config_snapshots schema with target_kind discriminator (framework_config / tenant_brand) OR add a sibling tenant_brand_snapshots table. PATCH endpoint writes snapshot + restore endpoint exists.
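The snapshot-on-PATCH and restore behaviour can be illustrated with an in-memory stand-in. The class, method names, and tuple layout are assumptions for illustration; only the target_kind discriminator idea comes from the outline above.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class BrandStore:
    """In-memory stand-in for tenant_brand plus a shared snapshots table."""

    current: Dict[str, Dict[str, Any]] = field(default_factory=dict)
    # Each row: (target_kind, tenant_id, values before the change).
    snapshots: List[Tuple[str, str, Dict[str, Any]]] = field(default_factory=list)

    def patch(self, tenant_id: str, changes: Dict[str, Any]) -> int:
        """Record the previous values, apply the changes, return a snapshot id."""
        before = copy.deepcopy(self.current.get(tenant_id, {}))
        self.snapshots.append(("tenant_brand", tenant_id, before))
        self.current[tenant_id] = {**before, **changes}
        return len(self.snapshots) - 1

    def restore(self, snapshot_id: int) -> None:
        """Replace the tenant's brand with the values held in the snapshot."""
        target_kind, tenant_id, values = self.snapshots[snapshot_id]
        if target_kind != "tenant_brand":
            raise ValueError("snapshot does not belong to tenant_brand")
        self.current[tenant_id] = copy.deepcopy(values)
```

In a real database the snapshot insert and the brand update would need to share one transaction to get the atomic rollback the goal asks for.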
4.4 SG-drop auto-remedy CronJob (Lesson #42)
Goal: SG drop after nodepool resize is auto-healed by a periodic CronJob instead of requiring manual repair_worker_sgs.py.
Effort: 1 day.
Trigger: Anytime — proactively prevent the next outage.
Outline: k8s CronJob runs every 15min. Python script lists workers, checks SG attachments, re-attaches the default SG if any are missing. Idempotent — no-op when state is clean. Uses the claude_fix_nlb API key from vault.
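The idempotent core of such a script might look like this. It is a sketch only: the worker listing and the attach call are injected because the cloud provider API is not described here, and every name below is hypothetical.

```python
from typing import Callable, Dict, List, Set


def reconcile_default_sg(
    workers: Dict[str, Set[str]],
    default_sg: str,
    attach: Callable[[str, str], None],
) -> List[str]:
    """Re-attach `default_sg` to every worker that lacks it.

    `workers` maps worker id -> set of attached security-group ids.
    `attach(worker_id, sg_id)` is a hypothetical callable wrapping the real API.
    Returns the ids of the workers that were repaired (empty when state is clean).
    """
    repaired: List[str] = []
    for worker_id in sorted(workers):
        if default_sg not in workers[worker_id]:
            attach(worker_id, default_sg)
            repaired.append(worker_id)
    return repaired
```

Because the function only acts on workers that are missing the group, a second run against healthy state makes no API calls, which is what makes a 15-minute schedule safe.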
4.5 Live Mermaid render in /docs/ route
Goal: Mermaid diagrams in the architecture doc render visually in the dashboard, not just as code blocks.
Effort: 1 hour.
Trigger: Anyone says "the diagrams aren't rendering on app.coachpilot.ch".
Outline: Add mermaid npm dep. In DocRenderer.tsx, detect ```mermaid code blocks via a custom code component override; render via mermaid's runtime API in a useEffect. Lazy-load mermaid to keep initial bundle small.
Section 5 — Security + hardening
5.1 MFA enforcement on admin sessions
Effort: 1 day. Trigger: before onboarding a non-internal admin to the dashboard.
Clerk supports MFA enforcement per-org via dashboard config. Action: turn on; communicate to existing admins; test.
5.2 IP allowlist on PATCH endpoints
Effort: 0.5 day. Trigger: exposing API tokens to a tenant.
Cloudflare or nginx Ingress rule: PATCH endpoints accept requests only from a tenant-configured allowlist. Stored in tenants.metadata.ip_allowlist[].
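If the check ends up in application code rather than in the ingress, the matching logic could be sketched with the standard ipaddress module. The function name is hypothetical, and treating an empty allowlist as "deny all" is an assumption that should be confirmed before rollout.

```python
import ipaddress
from typing import Iterable


def ip_allowed(client_ip: str, allowlist: Iterable[str]) -> bool:
    """Return True if `client_ip` falls inside any allowlist entry.

    Entries may be single addresses or CIDR ranges. Malformed entries are
    skipped, and a malformed client address is always rejected.
    """
    try:
        address = ipaddress.ip_address(client_ip)
    except ValueError:
        return False
    for entry in allowlist:
        try:
            network = ipaddress.ip_network(entry, strict=False)
        except ValueError:
            continue  # ignore a bad entry instead of failing the whole check
        if address.version == network.version and address in network:
            return True
    return False
```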
5.3 WAF (Cloudflare or AWS)
Effort: 1 day setup + ongoing tuning. Trigger: high-traffic launch or security incident.
Cloudflare WAF in front of api.coachpilot.ch + app.coachpilot.ch + quiz.thesynergygroup.ch. Standard OWASP rule set + per-route custom rules (e.g. /calibrate rate limiting).
5.4 Rate limit /calibrate endpoint
Effort: 0.5 day. Trigger: anyone abuses it for DoS.
Public endpoint with no auth today. Add per-IP rate limit (10 req/s) at gateway level using FastAPI middleware or upstream nginx.
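One way to express the 10 req/s per-IP limit is a sliding-window counter. The sketch below is framework-agnostic and in-memory; the class name is hypothetical, and wiring it into FastAPI middleware or nginx is left out.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class PerIpRateLimiter:
    """Sliding-window limiter: at most `limit` hits per `window_s` per IP."""

    def __init__(
        self,
        limit: int = 10,
        window_s: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window_s = window_s
        self.clock = clock  # injectable so tests can control time
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, ip: str) -> bool:
        now = self.clock()
        hits = self._hits[ip]
        # Forget hits that have aged out of the window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

State held in process memory is per-pod, so with 2 replicas the effective limit is up to twice the configured value; a shared store or an upstream nginx limit avoids that.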
5.5 Secret rotation policy
Effort: 0.5 day setup + quarterly action.
Document quarterly rotation cadence for BEARER_*, archetype_scores_secret per-tenant, MMC_BRIDGE_INTEGRATION_SECRET. Automate where possible (k8s External Secrets + Vault). Manual today.
Section 6 — Polish (deferred until triggered)
6.1 Per-tenant CSP frame-ancestors generation
Effort: 0.5 day. Trigger: new white-label iframe tenant onboards.
Today CSP frame-ancestors is hardcoded in middleware. Make it dynamic: read tenant_brand.website_url server-side, emit per-tenant CSP. Validate URL against a regex (only https://, no wildcards).
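The validation and header construction can be sketched as follows. The production middleware is Next.js, so this Python version only illustrates the logic; the function names, the host regex, and the choice to always include 'self' are assumptions.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Lowercase DNS hostname with at least one dot; rejects "*" wildcards.
HOST_RE = re.compile(r"^(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z]{2,}$")


def tenant_origin(website_url: Optional[str]) -> Optional[str]:
    """Return an https origin for the tenant site, or None if it is not valid."""
    if not website_url:
        return None
    try:
        parts = urlsplit(website_url.strip())
        host = parts.hostname or ""
    except ValueError:
        return None
    if parts.scheme != "https" or not HOST_RE.match(host):
        return None
    # Path, port, and userinfo are dropped; only scheme + host is emitted.
    return f"https://{host}"


def frame_ancestors_header(website_url: Optional[str]) -> str:
    """Build the CSP directive; fall back to 'self' only when the URL is bad."""
    sources = ["'self'"]
    origin = tenant_origin(website_url)
    if origin:
        sources.append(origin)
    return "frame-ancestors " + " ".join(sources)
```

Rebuilding the origin from the parsed hostname, rather than echoing the stored string, keeps stray characters in tenant_brand.website_url out of the response header.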
6.2 assess_sessions retention policy
Effort: 0.5 day. Trigger: compliance / cost.
Schedule purge of assess_sessions rows older than 24 months (configurable). Implement as a Postgres function + CronJob.
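The configurable cutoff can be computed with calendar-aware month arithmetic, as in this sketch. The function name is hypothetical, and the actual purge would still run as the Postgres function described above.

```python
import calendar
from datetime import date


def retention_cutoff(today: date, months: int = 24) -> date:
    """Return the date `months` calendar months before `today`.

    The day is clamped to the last day of the target month, so 29 February
    maps to 28 February in a non-leap year.
    """
    total_months = today.year * 12 + (today.month - 1) - months
    year, month_index = divmod(total_months, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(today.day, last_day))
```

Rows whose completion date is earlier than the returned cutoff would be eligible for purge.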
6.3 CLAUDE.md cross-references to architecture doc
Effort: 15 minutes.
MMC and CoachPilot project CLAUDE.md files should reference 10 Projects/MoneyQuiz/docs/ARCHITECTURE.md so new agents discover it.
6.4 BookingCTA hardcoded hex literal sweep
Effort: 0.5 day. Trigger: a tenant configures brand colours and notices BookingCTA stays Ilana-coloured.
BookingCTA has ~20 hardcoded #401405 / #B39F65 / #919C82 literals in inline styles. Convert to Tailwind utilities (bg-brown, text-gold, etc.) so the TenantThemeProvider's CSS variable overrides take effect.
6.5 Background completion of E2E test suite
Effort: 2 days. Trigger: regression confidence drops.
Playwright suite covering: complete each format end-to-end, verify Quiz Leads receives webhook (for tenants that have one), verify GA4 events fire, verify calibration applies. Replace today's manual smoke.
Section 7 — In flight
(none — start populating when work resumes)
Sequencing recommendation
If we had a single engineer-week to invest, in priority order:
- Section 1.1 (Mirror in Quiz Leads, 0.5 d): the smoke surfaced this. Customer-visible bug.
- Section 2.1 (per-tenant Sessions, 2 d): biggest customer-blocker for non-MMC tenants.
- Section 4.4 (SG-drop CronJob, 1 d): eliminates a class of incidents.
- Section 3.1 (TSG / cutover decision, 0.25 d): pick a path and move on.
- Section 6.4 (BookingCTA hardcoded hex sweep, 0.5 d): completes the Track B story.
- Section 1.2 (re-theme TTL, 0.25 d): small but high-visibility.
- Section 5.1 (MFA enforcement, 1 d): security floor before more tenants.
Total: ~5.5 days. Leaves slack for review + smoke per item.
If we have customer demand for a unified analytics view: re-prioritise to Section 4.1 (Phase 4 Likert migration, 1 week) — biggest single architectural step forward.
How to use this doc
- Adding work: append under the appropriate section. Include Goal / Effort / Trigger / Outline. Date it.
- Starting work: move to Section 7. Update todos in the active session.
- Shipping work: remove from Section 7, append to 13 Open / deferred §Recently shipped with date.
- Re-prioritising: edit the Sequencing recommendation section.
- Closing items as won't-do: move to 13 Open / deferred §Removed / dropped scope with reason.
Production state at time of writing
Image registry: Docker Hub synergygroup/
| Component | Tag |
|---|---|
| coachpilot-gateway | p10e-mmc-20260519-c5-calibrate |
| coachpilot-dashboard | p10e-mmc-20260519-docs-v2 |
| moneyquiz-admin (public quiz) | v1.6.8 |
| az-adaptive-assessment | v1.5-c2-calibration-20260519 |
Migrations applied: through 0009_tenant_brand_webhooks.sql.
Smoke verified 2026-05-19 (5/5 passing). Calibration tests passing (33/33).