Spec-Version: 1.0 (2026-05-19)
Status: Living document; items move between sections as priorities shift
Sibling: ARCHITECTURE.md, the architectural reference this roadmap operates against


What this is

A prioritised, dated backlog of work remaining on the Money Quiz platform after the 2026-05-19 bundle (Tracks B, C, D, A, plus polish + post-shipment fixes). Each item has:

  • Goal — what success looks like
  • Effort — rough engineering days
  • Trigger — when this becomes urgent
  • Outline — high-level approach (not a step-by-step plan)
  • Dependencies — what must land first

When you start an item, move it from its current section to Section 7 (In flight) with the start date. When you ship it, move it to the "Recently shipped" list in 13 Open / deferred with the ship date.


Section 0 — Done (recent shipment, this entry mostly for context)

| Date | What |
|---|---|
| 2026-05-18 | C-phase format rename complete (9 phases in one day, zero breakage) |
| 2026-05-19 | Track D: D4 migration + universal archetype-scores webhook + per-tenant coach notification + brand editor with 4 new fields |
| 2026-05-19 | Track C: agent-side + gateway-side calibration; 33 regression tests; MMC Likert calibrated via WP bridge |
| 2026-05-19 | Track B: TenantThemeProvider + TenantBrandContext + dashboard brand preview card |
| 2026-05-19 | Track A: 14-file architecture doc + 4 Mermaid diagrams + dashboard /docs/ route + sidebar entry + help-panel deep links |
| 2026-05-19 | Polish: D3 GA4 mq_variant propagation; production smoke (5/5 pass) |
| 2026-05-19 | Two §7 fixes: middleware writes x-resolved-tenant header; BookingCTA hides when no coach + no booking URL |
| 2026-05-19 | 3 new memory entries; MEMORY.md index updated |

Section 1 — Open issues found during the smoke

1.1 Mirror completions don't appear in MMC Quiz Leads dashboard

Goal: Mirror sessions surface in Ilana's wp-admin → Quiz Leads alongside Likert.

Why this matters: D2 wired the webhook so MMC Quiz Leads RECEIVES Mirror completions. But the receiver gates the dual-write to mq_* tables on source.startsWith("traditional-quiz") (archetype-scores.php:110). Mirror payloads (source: "money-mirror") land in WP transients + user_meta _sg_archetype_snapshots — but the Quiz Leads dashboard queries mq_prospects + mq_taken + mq_results exclusively. So Mirror prospects are visible in user LMS profiles but NOT in the leads dashboard.

Effort: 0.5 day.

Trigger: Ilana notices Mirror completions don't appear in Quiz Leads.

Outline:

  1. Decide whether to write Mirror payloads to mq_* (would need a different schema since there are no per-trait answers; mq_taken.Response_Format = 'qa' works but mq_results per-trait rows don't apply).
  2. OR extend Quiz Leads dashboard data loader to ALSO read from _sg_archetype_snapshots user_meta + sg_archetype_pending_* transients.
  3. Either way: add a "Source" column to Quiz Leads UI so Ilana can filter by Likert vs Mirror vs Game vs Deal vs Realm.

Recommended: option (2) — leave the legacy mq_* schema as Likert-only, extend the dashboard reader. Cleaner separation.
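Whichever option wins, the Source column in step 3 needs a stable mapping from the webhook payload's source string to a filterable label. A minimal sketch of that mapping, written in Python only to show the logic (the real Quiz Leads loader is PHP inside WordPress). Only the "traditional-quiz" prefix (Likert) and "money-mirror" (Mirror) are confirmed in this roadmap; the Game, Deal and Realm source strings are assumed placeholders to check against real payloads.

```python
# Hypothetical helper: derive the Quiz Leads "Source" label from a payload's
# source field. Entries marked "assumed" are placeholders, not confirmed values.
SOURCE_PREFIXES: list[tuple[str, str]] = [
    ("traditional-quiz", "Likert"),  # same prefix the WP receiver gates on
    ("money-mirror", "Mirror"),
    ("money-game", "Game"),          # assumed
    ("money-deal", "Deal"),          # assumed
    ("money-realm", "Realm"),        # assumed
]


def source_label(source: str) -> str:
    """Map a raw payload source string to the label shown in the Source column."""
    for prefix, label in SOURCE_PREFIXES:
        if source.startswith(prefix):
            return label
    return "Unknown"  # surface unmapped sources instead of silently dropping them
```

Keeping the mapping in one table means adding a format is a one-line change, and "Unknown" rows make a new, unmapped source visible in the dashboard.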

1.2 Brand re-theme TTL is closer to 75s than 30s

Goal: Brand changes visible on the public quiz within 30s (the documented TTL).

Why this matters: Smoke 5 changed TSG's primary colour, and the first curl after 35s still showed the old colour. The new colour took ~75s to appear across replicas.

Cause: The Next.js fetch cache TTL is per-pod. With 2 replicas, the second pod can keep serving the stale brand for up to 30s after the first pod has refreshed. HTTP response caching at the edge stacks on top of that, so the delay users actually see is longer than the TTL alone.

Effort: 0.25 day (mostly diagnosis + a small write-up).

Trigger: An admin complains a publish didn't take effect. Or fix proactively by lowering the TTL.

Outline: Either accept a real-world window of 30-90s (and update the doc to say so), or drop the TTL to 15s and accept double the gateway load for brand fetches, or add a ?_ts= cache-buster to the brand fetch when the dashboard saves (active invalidation).
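The trade-off can be put in numbers. A rough sketch, assuming steady traffic so every pod re-fetches once per TTL window; the edge cache TTL is not recorded in this roadmap and is left as an input. The function names are illustrative only.

```python
def worst_case_staleness_s(fetch_ttl_s: float, edge_ttl_s: float = 0.0) -> float:
    """Approximate upper bound on how long viewers can still see the old brand.

    A pod that refreshed just before the save keeps its copy for up to one
    fetch TTL; an edge cache can then hold the rendered page for up to its
    own TTL on top. Assumes steady traffic (ignores idle-pod revalidation lag).
    """
    return fetch_ttl_s + edge_ttl_s


def brand_fetches_per_minute(replicas: int, fetch_ttl_s: float) -> float:
    """Gateway brand fetches per tenant per minute, assuming every pod
    re-fetches exactly once per TTL window."""
    return replicas * 60 / fetch_ttl_s
```

With 2 replicas, a 30s TTL costs 4 brand fetches per tenant per minute and a 15s TTL costs 8, which is the doubled gateway load mentioned above, while the pod-side bound drops from 30s to 15s. The edge term is unaffected by the pod TTL, so the cache-buster option is the only one of the three that addresses both layers.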


Section 2 — Customer-impacting gaps

2.1 Per-tenant Sessions drilldown in CoachPilot dashboard

Goal: When a non-MMC tenant (TSG, coachpilot, future HSP) wants to see "who took my quiz", they have a CoachPilot-native view comparable to MMC's WP Quiz Leads.

Effort: 2 days.

Trigger: First non-MMC tenant onboards as a paying customer.

Outline: New route /[locale]/coach/quiz-sessions/?tenant=<id> mirroring Quiz Leads UX:

  • Paginated list of assess_sessions for the tenant
  • Filters: date range, format, archetype combo, email contains
  • Per-session slide-out (responses, scores, calibration trace if applicable)
  • Hot/Warm/Cool/Cold likelihood scoring (port from MMC quiz-leads-likelihood.php)
  • Notes table (similar to dashboard tasks notes)

Dependencies: None blocking. Data is already in assess_sessions.
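The list view is mostly a filtered, paginated query over assess_sessions. A minimal sketch of the query builder, assuming psycopg-style %s placeholders and assumed column names (id, tenant_id, created_at, format, email); the archetype-combo filter is omitted because its storage shape is not documented here.

```python
from datetime import datetime
from typing import Optional, Tuple


def _like_escape(text: str) -> str:
    # Escape LIKE wildcards so "50%" matches literally (backslash is Postgres's default escape).
    return text.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def build_sessions_query(
    tenant_id: str,
    *,
    date_from: Optional[datetime] = None,
    date_to: Optional[datetime] = None,
    fmt: Optional[str] = None,
    email_contains: Optional[str] = None,
    after: Optional[Tuple[datetime, str]] = None,
    page_size: int = 50,
) -> Tuple[str, list]:
    """Return (sql, params) for one page of a tenant's sessions, newest first.

    `after` is the (created_at, id) of the last row on the previous page
    (keyset pagination, so deep pages don't pay for a large OFFSET).
    """
    clauses = ["tenant_id = %s"]
    params: list = [tenant_id]
    if date_from is not None:
        clauses.append("created_at >= %s")
        params.append(date_from)
    if date_to is not None:
        clauses.append("created_at < %s")
        params.append(date_to)
    if fmt is not None:
        clauses.append("format = %s")
        params.append(fmt)
    if email_contains:
        clauses.append("email ILIKE %s")
        params.append(f"%{_like_escape(email_contains)}%")
    if after is not None:
        clauses.append("(created_at, id) < (%s, %s)")
        params.extend(after)
    sql = (
        "SELECT id, created_at, format, email FROM assess_sessions WHERE "
        + " AND ".join(clauses)
        + " ORDER BY created_at DESC, id DESC LIMIT %s"
    )
    params.append(page_size)
    return sql, params
```

Every user-supplied value travels as a bound parameter, never spliced into the SQL string, and the tenant filter is always the first clause so it cannot be left out.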

2.2 Federation: unified Ilana view (MMC + platform-native)

Goal: One page shows Ilana's Mirror/Game/Deal/Realm sessions (in assess_sessions) alongside her Legacy Likert sessions (in WP mq_*).

Effort: 3 days, after 1.1 is decided.

Trigger: Once 1.1 is resolved (Mirror → Quiz Leads or via platform reader) the federation pattern is clear.

Outline:

  • If platform reader (1.1 option 2): query both Postgres + WP bridge in a single dashboard route.
  • If WP receiver expanded (1.1 option 1): leave MMC's WP dashboard as the sole leads source; CoachPilot dashboard shows aggregated counts only.
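For the platform-reader path, the core of the route is a merge of two row sources. A minimal sketch, assuming both sides have already been normalised into a hypothetical LeadRow shape; it deduplicates on (email, source, completed_at) in case the same completion is visible in both stores (for example during the 4.1 dual-write window), preferring the platform copy.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable


@dataclass(frozen=True)
class LeadRow:
    email: str
    completed_at: datetime
    source: str  # "Likert" for WP mq_* rows, the format name for assess_sessions rows
    origin: str  # "wp" or "platform": which store the row came from


def federate(platform_rows: Iterable[LeadRow], wp_rows: Iterable[LeadRow]) -> list[LeadRow]:
    """Merge both stores into one newest-first list; platform rows win on duplicates."""
    merged: dict[tuple[str, str, datetime], LeadRow] = {}
    for row in wp_rows:
        merged[(row.email.lower(), row.source, row.completed_at)] = row
    for row in platform_rows:
        # Written second, so a platform row overwrites a WP row with the same key.
        merged[(row.email.lower(), row.source, row.completed_at)] = row
    return sorted(merged.values(), key=lambda r: r.completed_at, reverse=True)
```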

Section 3 — Soft-launched items needing a cutover decision

3.1 TSG root (/) cutover from Mirror funnel to chooser

Status: /chooser has the 5-card vertical selector. The root route / still serves Mirror directly.

Decision needed: Switch / to render the /chooser content (or 308-redirect it to /chooser), or keep Mirror as the default and surface the chooser via a button or link.

Effort: 0.25 day either direction.

Recommendation: Surface chooser via a small "Try other formats" link on Mirror's landing. Don't break Ilana's main funnel.

3.2 MMC traditional-quiz-engine.js → money-quiz-engine.js rename

Status: Deferred from C9 (2026-05-18). Touches MMC theme + 10+ refs.

Effort: 0.5 day (rename + cache purge + sweep refs).

Trigger: Next MMC theme deploy that touches quiz assets.

Outline: Rename file, sweep traditional-quiz-engine references in theme + mu-plugin + any docs, bump theme version, nuclear cache purge.
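The reference sweep is easy to script so nothing is missed before the cache purge. A minimal sketch, assuming the set of text file suffixes listed in the code; it only rewrites file contents (the file itself is renamed separately, e.g. with git mv) and skips anything that isn't valid UTF-8 rather than risk corrupting it.

```python
from pathlib import Path

OLD = "traditional-quiz-engine"
NEW = "money-quiz-engine"
# Assumed set of text file types in the MMC theme, mu-plugin and docs.
TEXT_SUFFIXES = {".php", ".js", ".css", ".md", ".json", ".html", ".txt"}


def sweep(root: Path, apply: bool = False) -> list[tuple[Path, int]]:
    """Return (file, hit_count) for every text file under root that mentions OLD.

    With apply=True, also rewrite OLD -> NEW in place.
    """
    hits: list[tuple[Path, int]] = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in TEXT_SUFFIXES:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue  # not UTF-8 text; review by hand instead of rewriting it
        count = text.count(OLD)
        if count:
            hits.append((path, count))
            if apply:
                path.write_text(text.replace(OLD, NEW), encoding="utf-8")
    return hits
```

Running it once without apply gives the list to review; a second dry run after applying should return an empty list.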

3.3 ?skip=1 cookie (deferred)

Status: Designed (skip the chooser to land directly on a format) but never built.

Recommendation: Drop entirely. Not needed — direct format URLs (/mirror, /game, etc.) already serve as the deep links.


Section 4 — Bigger architectural initiatives

4.1 Phase 4 — Likert WP → Postgres migration

Goal: MMC Money Quiz (Likert) sessions live in platform assess_sessions instead of WP mq_* tables.

Why: Brings Likert to parity with the other 4 formats. Unlocks unified analytics. Retires WP mq_* dependency long-term.

Effort: 1 week.

Trigger: We commit to retiring mq_* (compliance, ops cost, or a customer ask).

Outline:

  1. ETL mq_prospects + mq_taken + mq_results into assess_sessions (read-only mirror). Build conversion: STRIDE-4 master_id → archetype id, per-archetype sum → 0..100.
  2. Wire WP receiver to dual-write for 7 days (write to BOTH mq_* AND post to platform).
  3. Cut reads: Quiz Leads dashboard data loader reads from assess_sessions (or platform API).
  4. Decommission WP-side writes; archive mq_* tables; keep schema.
  5. Update the reference_mmc_money_quiz_schema memory entry to mark mq_* as decommissioned.

Risks: Data shape mismatch (mq_results is per-trait; assess_sessions is per-session). Need a schema decision for whether to keep per-trait or aggregate.
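The conversion in step 1 (master_id → archetype id, per-archetype sum → 0..100) can be pinned down with a small, testable function. A sketch under stated assumptions: the MASTER_TO_ARCHETYPE entries are hypothetical placeholders, every item is assumed to be answered on a 1..5 scale, and the rows are assumed to be pre-aggregated per archetype as (master_id, sum, item count).

```python
# Hypothetical mapping: STRIDE-4 master_id values to platform archetype ids.
# The real table must come from the MMC schema; these entries are placeholders.
MASTER_TO_ARCHETYPE: dict[int, str] = {1: "archetype_a", 2: "archetype_b"}


def normalise(trait_sum: float, n_items: int, lo: int = 1, hi: int = 5) -> int:
    """Scale a per-archetype answer sum onto 0..100.

    Assumes each item is answered on a lo..hi scale, so the sum ranges from
    n_items*lo (maps to 0) to n_items*hi (maps to 100).
    """
    if n_items <= 0 or hi <= lo:
        raise ValueError("need at least one item and hi > lo")
    pct = 100 * (trait_sum - n_items * lo) / (n_items * (hi - lo))
    return round(min(100.0, max(0.0, pct)))  # clamp out-of-range legacy data


def convert(rows: list[tuple[int, float, int]]) -> dict[str, int]:
    """rows: (master_id, per-archetype answer sum, item count), aggregated from mq_results."""
    return {MASTER_TO_ARCHETYPE[m]: normalise(s, n) for m, s, n in rows}
```

An unknown master_id raises KeyError on purpose, so the ETL fails loudly instead of writing sessions with a missing archetype.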

4.2 Sandbox preview (Phase 7)

Goal: Admin previews a config change in a sandboxed Next.js render before publishing to the public quiz.

Effort: 2 days.

Trigger: An admin pushes a bad config that's visible to live players for 30s while they roll back. We've avoided this so far via snapshots, but it's a known sharp edge.

Outline:

  • POST /api/v1/assess/framework/{id}/preview returns a draft_id + signed URL.
  • quiz.thesynergygroup.ch/?config=draft:<id>&tenant=<t> renders against the draft state (read from a Redis key, not Postgres).
  • Sandbox is read-only; player completions in sandbox mode never write to assess_sessions or fire webhooks.
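The "signed URL" part can be an HMAC over the query parameters plus an expiry. A minimal sketch: the base URL and the config=draft:<id>&tenant=<t> parameters come from the outline, while the exp and sig parameter names, the 15-minute default lifetime and the shared-secret scheme are assumptions.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit

PREVIEW_BASE = "https://quiz.thesynergygroup.ch/"


def _signature(secret: bytes, config: str, tenant: str, exp: int) -> str:
    message = f"{config}|{tenant}|{exp}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def sign_preview_url(secret: bytes, draft_id: str, tenant: str,
                     ttl_s: int = 900, now: Optional[float] = None) -> str:
    """Build a preview link that expires ttl_s seconds from now."""
    exp = int((time.time() if now is None else now) + ttl_s)
    config = f"draft:{draft_id}"
    query = urlencode({"config": config, "tenant": tenant, "exp": exp,
                       "sig": _signature(secret, config, tenant, exp)})
    return f"{PREVIEW_BASE}?{query}"


def verify_preview_url(secret: bytes, url: str, now: Optional[float] = None) -> bool:
    """Reject tampered, expired or malformed preview links."""
    q = {k: v[0] for k, v in parse_qs(urlsplit(url).query).items()}
    try:
        exp = int(q["exp"])
        expected = _signature(secret, q["config"], q["tenant"], exp)
    except (KeyError, ValueError):
        return False
    if (time.time() if now is None else now) > exp:
        return False
    # Constant-time comparison avoids leaking how much of the signature matched.
    return hmac.compare_digest(expected.encode(), q.get("sig", "").encode())
```

Because tenant is inside the signature, a link issued for one tenant cannot be replayed against another by editing the query string.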

4.3 tenant_brand snapshot system

Goal: Roll back a brand change atomically (like framework_configs already supports).

Effort: 0.5 day.

Trigger: Someone breaks a tenant's brand and can't recall the previous values.

Outline: Extend framework_config_snapshots schema with target_kind discriminator (framework_config / tenant_brand) OR add a sibling tenant_brand_snapshots table. PATCH endpoint writes snapshot + restore endpoint exists.
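The sibling-table option can be sketched end to end. This uses sqlite3 as a stand-in so it runs anywhere, and assumes a simplified tenant_brand layout with the whole brand in one JSON column (the real table from 0009_tenant_brand_webhooks.sql has its own columns). The point it shows: the snapshot and the update commit in the same transaction, and a restore snapshots the current state first, so a restore can itself be undone.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS tenant_brand (
    tenant_id  TEXT PRIMARY KEY,
    brand_json TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS tenant_brand_snapshots (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    tenant_id  TEXT NOT NULL,
    brand_json TEXT NOT NULL,
    taken_at   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""


def _snapshot(db: sqlite3.Connection, tenant_id: str) -> None:
    # Copy the current brand row (if any) into the snapshot table.
    db.execute(
        "INSERT INTO tenant_brand_snapshots (tenant_id, brand_json) "
        "SELECT tenant_id, brand_json FROM tenant_brand WHERE tenant_id = ?",
        (tenant_id,),
    )


def patch_brand(db: sqlite3.Connection, tenant_id: str, changes: dict) -> None:
    """Snapshot the current brand, then merge `changes` into it, atomically."""
    with db:  # snapshot and update commit (or roll back) together
        _snapshot(db, tenant_id)
        row = db.execute("SELECT brand_json FROM tenant_brand WHERE tenant_id = ?",
                         (tenant_id,)).fetchone()
        merged = {**(json.loads(row[0]) if row else {}), **changes}
        db.execute(
            "INSERT INTO tenant_brand (tenant_id, brand_json) VALUES (?, ?) "
            "ON CONFLICT(tenant_id) DO UPDATE SET brand_json = excluded.brand_json",
            (tenant_id, json.dumps(merged)),
        )


def restore_brand(db: sqlite3.Connection, snapshot_id: int) -> str:
    """Restore a snapshot and return its tenant id; the pre-restore state is snapshotted too."""
    with db:
        row = db.execute("SELECT tenant_id, brand_json FROM tenant_brand_snapshots WHERE id = ?",
                         (snapshot_id,)).fetchone()
        if row is None:
            raise KeyError(snapshot_id)
        _snapshot(db, row[0])
        db.execute("UPDATE tenant_brand SET brand_json = ? WHERE tenant_id = ?", (row[1], row[0]))
        return row[0]
```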

4.4 SG-drop auto-remedy CronJob (Lesson #42)

Goal: When a nodepool resize drops the default security group (SG) from worker nodes, a periodic CronJob re-attaches it automatically instead of someone running repair_worker_sgs.py by hand.

Effort: 1 day.

Trigger: Anytime — proactively prevent the next outage.

Outline: k8s CronJob runs every 15min. Python script lists workers, checks SG attachments, re-attaches the default SG if any are missing. Idempotent — no-op when state is clean. Uses the claude_fix_nlb API key from vault.
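The reconcile loop itself is small; most of the risk is in the cloud calls. A minimal sketch in which CloudClient is a hypothetical interface, not a real SDK: in the CronJob it would wrap the provider's SDK, authenticated with the claude_fix_nlb key from vault, and run on a */15 * * * * schedule. The dry_run flag is an addition meant for the first rollout.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Worker:
    id: str
    security_group_ids: set[str]


class CloudClient(Protocol):
    """Hypothetical wrapper over the provider SDK; method names are illustrative."""

    def list_workers(self) -> list[Worker]: ...
    def attach_security_group(self, worker_id: str, sg_id: str) -> None: ...


def reconcile(client: CloudClient, default_sg_id: str, dry_run: bool = False) -> list[str]:
    """Re-attach default_sg_id to every worker missing it; return the ids touched.

    Idempotent: when every worker already has the SG, no attach calls are made
    and the result is an empty list.
    """
    repaired: list[str] = []
    for worker in client.list_workers():
        if default_sg_id in worker.security_group_ids:
            continue
        if not dry_run:
            client.attach_security_group(worker.id, default_sg_id)
        repaired.append(worker.id)
    return repaired
```

Returning the repaired ids gives the CronJob something concrete to log or alert on, so a silent SG drop becomes a visible event rather than just a fix.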

4.5 Live Mermaid render in /docs/ route

Goal: Mermaid diagrams in the architecture doc render visually in the dashboard, not just as code blocks.

Effort: 1 hour.

Trigger: Anyone says "the diagrams aren't rendering on app.coachpilot.ch".

Outline: Add the mermaid npm dependency. In DocRenderer.tsx, detect fenced code blocks tagged mermaid via a custom code-component override and render them with mermaid's runtime API inside a useEffect. Lazy-load mermaid to keep the initial bundle small.


Section 5 — Security + hardening

5.1 MFA enforcement on admin sessions

Effort: 1 day. Trigger: before onboarding a non-internal admin to the dashboard.

Clerk supports per-org MFA enforcement via dashboard config. Action: turn it on, notify existing admins, then test.

5.2 IP allowlist on PATCH endpoints

Effort: 0.5 day. Trigger: exposing API tokens to a tenant.

Cloudflare or nginx Ingress rule: PATCH endpoints accept requests only from a tenant-configured allowlist. Stored in tenants.metadata.ip_allowlist[].
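If the check ends up in the gateway rather than in a Cloudflare or Ingress rule, the membership test is a few lines with the standard ipaddress module. A minimal sketch, assuming ip_allowlist entries are single addresses or CIDR ranges. Two design choices are assumptions: an empty allowlist denies everything (fail closed), and client_ip must be the address Cloudflare or the Ingress reports, never a header the client can set.

```python
import ipaddress
from typing import Iterable


def ip_allowed(client_ip: str, allowlist: Iterable[str]) -> bool:
    """True if client_ip falls inside any tenants.metadata.ip_allowlist entry.

    Entries may be single addresses ("203.0.113.7") or CIDR ranges
    ("198.51.100.0/24"). An empty allowlist denies everything.
    """
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        return False
    for entry in allowlist:
        try:
            net = ipaddress.ip_network(entry, strict=False)
        except ValueError:
            continue  # skip a malformed entry rather than failing the whole check
        if addr in net:  # False when IPv4/IPv6 versions differ
            return True
    return False
```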

5.3 WAF (Cloudflare or AWS)

Effort: 1 day setup + ongoing tuning. Trigger: high-traffic launch or security incident.

Cloudflare WAF in front of api.coachpilot.ch + app.coachpilot.ch + quiz.thesynergygroup.ch. Standard OWASP rule set + per-route custom rules (e.g. /calibrate rate limiting).

5.4 Rate limit /calibrate endpoint

Effort: 0.5 day. Trigger: anyone abuses it for DoS.

Public endpoint with no auth today. Add per-IP rate limit (10 req/s) at gateway level using FastAPI middleware or upstream nginx.
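A per-IP token bucket is one way to express "10 req/s" while still allowing a short burst. A minimal in-process sketch with an injectable clock for testing; in FastAPI it would be called from an @app.middleware("http") function that returns a 429 response when allow() is False. The bucket state lives in one process, so each gateway replica enforces its own limit, and the dict is never pruned here (a real version needs eviction of idle IPs).

```python
import time
from typing import Callable, Dict, Tuple


class PerIpRateLimiter:
    """Token bucket per client IP: refills at `rate` tokens/s, holds at most `burst`."""

    def __init__(self, rate: float = 10.0, burst: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.burst = burst
        self.clock = clock
        self._buckets: Dict[str, Tuple[float, float]] = {}  # ip -> (tokens, last_seen)

    def allow(self, ip: str) -> bool:
        now = self.clock()
        tokens, last = self._buckets.get(ip, (self.burst, now))
        # Refill for the time elapsed since this IP was last seen, capped at burst.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self._buckets[ip] = (tokens - 1.0, now)
            return True
        self._buckets[ip] = (tokens, now)
        return False
```

If the limit has to hold across replicas, the upstream nginx option (its limit_req module) or a shared store is the better fit.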

5.5 Secret rotation policy

Effort: 0.5 day setup + quarterly action.

Document quarterly rotation cadence for BEARER_*, archetype_scores_secret per-tenant, MMC_BRIDGE_INTEGRATION_SECRET. Automate where possible (k8s External Secrets + Vault). Manual today.


Section 6 — Polish (deferred until triggered)

6.1 Per-tenant CSP frame-ancestors generation

Effort: 0.5 day. Trigger: new white-label iframe tenant onboards.

Today CSP frame-ancestors is hardcoded in middleware. Make it dynamic: read tenant_brand.website_url server-side, emit per-tenant CSP. Validate URL against a regex (only https://, no wildcards).
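The validation rule (https:// only, no wildcards) is easiest to get right by reducing the URL to an origin first and then matching the whole origin. The real middleware is TypeScript; this Python sketch only illustrates the logic, and the 'self' default source is an assumption.

```python
import re
from typing import Optional, Tuple
from urllib.parse import urlsplit

# https://host[:port] only: no wildcards, no userinfo, no path.
_ORIGIN_RE = re.compile(r"https://[A-Za-z0-9.-]+(:\d{1,5})?")


def tenant_frame_ancestor(website_url: str) -> Optional[str]:
    """Reduce tenant_brand.website_url to a safe CSP source, or None if it fails validation."""
    parts = urlsplit(website_url.strip())
    origin = f"{parts.scheme}://{parts.netloc}"
    return origin if _ORIGIN_RE.fullmatch(origin) else None


def frame_ancestors_header(website_url: str, always_allowed: Tuple[str, ...] = ("'self'",)) -> str:
    """Build the frame-ancestors directive, dropping the tenant URL if it is invalid."""
    sources = list(always_allowed)
    origin = tenant_frame_ancestor(website_url)
    if origin:
        sources.append(origin)
    return "frame-ancestors " + " ".join(sources)
```

An invalid URL degrades to the default sources instead of raising, so a bad brand value blocks embedding on that site rather than breaking the page.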

6.2 assess_sessions retention policy

Effort: 0.5 day. Trigger: compliance / cost.

Schedule purge of assess_sessions rows older than 24 months (configurable). Implement as a Postgres function + CronJob.
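The roadmap plans a Postgres function; as an alternative illustration, here is the same batching logic driven from a Python job, with sqlite3 as a stand-in database so it runs anywhere. Column names (id, created_at) are assumptions. Deleting in small batches keeps each transaction short, and the DELETE ... WHERE id IN (SELECT ... LIMIT n) form also works on Postgres (with %s placeholders, or with the cutoff computed server-side as now() - interval '24 months').

```python
import calendar
import sqlite3
from datetime import datetime


def months_ago(now: datetime, months: int) -> datetime:
    """Calendar-month subtraction, clamping the day (31 March minus 1 month is 28/29 Feb)."""
    total = now.year * 12 + (now.month - 1) - months
    year, month0 = divmod(total, 12)
    day = min(now.day, calendar.monthrange(year, month0 + 1)[1])
    return now.replace(year=year, month=month0 + 1, day=day)


def purge_old_sessions(db: sqlite3.Connection, now: datetime,
                       retention_months: int = 24, batch: int = 1000) -> int:
    """Delete assess_sessions rows older than the retention window in batches; return rows deleted."""
    cutoff = months_ago(now, retention_months).isoformat(sep=" ")
    deleted = 0
    while True:
        with db:  # one short transaction per batch
            cur = db.execute(
                "DELETE FROM assess_sessions WHERE id IN "
                "(SELECT id FROM assess_sessions WHERE created_at < ? LIMIT ?)",
                (cutoff, batch),
            )
        deleted += cur.rowcount
        if cur.rowcount < batch:
            return deleted
```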

6.3 CLAUDE.md cross-references to architecture doc

Effort: 15 minutes.

MMC and CoachPilot project CLAUDE.md files should reference 10 Projects/MoneyQuiz/docs/ARCHITECTURE.md so new agents discover it.

6.4 BookingCTA hardcoded hex literal sweep

Effort: 0.5 day. Trigger: a tenant configures brand colours and notices BookingCTA stays Ilana-coloured.

BookingCTA has ~20 hardcoded #401405 / #B39F65 / #919C82 literals in inline styles. Convert to Tailwind utilities (bg-brown, text-gold, etc.) so the TenantThemeProvider's CSS variable overrides take effect.
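Before converting, a quick audit shows how many literals remain and which theme token each should become, and it can be re-run afterwards to confirm none are left. A minimal sketch: "brown" and "gold" come from the utilities named above, while mapping #919C82 to "sage" is a guess; anything else is reported as unmapped.

```python
import re
from collections import Counter

# Ilana's palette -> Tailwind colour names. "sage" for #919c82 is an assumption.
HEX_TO_TOKEN = {"#401405": "brown", "#b39f65": "gold", "#919c82": "sage"}
HEX_RE = re.compile(r"#[0-9a-fA-F]{6}\b")


def audit_hex_literals(source: str) -> Counter:
    """Count hard-coded 6-digit hex colours in a component's source,
    keyed by target token, or 'UNMAPPED:<hex>' when none is known."""
    counts: Counter = Counter()
    for match in HEX_RE.finditer(source):
        hex_lower = match.group(0).lower()
        counts[HEX_TO_TOKEN.get(hex_lower, f"UNMAPPED:{hex_lower}")] += 1
    return counts
```

The actual change from inline style props to className utilities stays manual, since it alters component structure rather than just swapping strings.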

6.5 Background completion of E2E test suite

Effort: 2 days. Trigger: regression confidence drops.

Playwright suite covering: complete each format end-to-end, verify Quiz Leads receives webhook (for tenants that have one), verify GA4 events fire, verify calibration applies. Replace today's manual smoke.


Section 7 — In flight

(none — start populating when work resumes)


Sequencing recommendation

If we had a single engineer-week to invest, in priority order:

  1. Section 1.1 (Mirror in Quiz Leads, 0.5 d) — the smoke surfaced this. Customer-visible bug.
  2. Section 2.1 (per-tenant Sessions, 2 d) — biggest customer-blocker for non-MMC tenants.
  3. Section 4.4 (SG-drop CronJob, 1 d) — eliminates a class of incidents.
  4. Section 3.1 (TSG root / cutover decision, 0.25 d): pick a path and move on.
  5. Section 6.4 (BookingCTA hardcoded hex sweep, 0.5 d) — completes the Track B story.
  6. Section 1.2 (re-theme TTL, 0.25 d) — small but high-visibility.
  7. Section 5.1 (MFA enforcement, 1 d) — security floor before more tenants.

Total: ~5.5 days of build effort, slightly over a five-day engineer-week before review + smoke time per item is added.

If we have customer demand for a unified analytics view: re-prioritise to Section 4.1 (Phase 4 Likert migration, 1 week) — biggest single architectural step forward.


How to use this doc

  • Adding work: append under the appropriate section. Include Goal / Effort / Trigger / Outline. Date it.
  • Starting work: move to Section 7. Update todos in the active session.
  • Shipping work: remove from Section 7, append to 13 Open / deferred §Recently shipped with date.
  • Re-prioritising: edit the Sequencing recommendation section.
  • Closing items as won't-do: move to 13 Open / deferred §Removed / dropped scope with reason.

Production state at time of writing

Image registry: Docker Hub synergygroup/

| Component | Tag |
|---|---|
| coachpilot-gateway | p10e-mmc-20260519-c5-calibrate |
| coachpilot-dashboard | p10e-mmc-20260519-docs-v2 |
| moneyquiz-admin (public quiz) | v1.6.8 |
| az-adaptive-assessment | v1.5-c2-calibration-20260519 |

Migrations applied: through 0009_tenant_brand_webhooks.sql.

Smoke verified 2026-05-19 (5/5 passing). Calibration tests passing (33/33).

