CoachPilot — Your AI-Powered Coaching Business

How archetype scores are computed, calibrated, and finalised across all five formats. Reflects post-D-phase state (2026-05-19) where calibration parity is achieved end-to-end.

Three implementations, one contract

flowchart TB
    JSON["framework_definitions/money_archetypes.json<br/>scoring.engine.archetype_calibration.rules"]

    subgraph TS["TypeScript (moneyquiz-app)"]
        TS_FN["applyArchetypeCalibration()<br/>score-analyser.ts"]
    end
    subgraph AG["Agent Python (adaptive-assessment)"]
        AG_FN["apply_archetype_calibration()<br/>scoring_calibration.py"]
    end
    subgraph GW["Gateway Python (api-gateway)"]
        GW_FN["apply_archetype_calibration()<br/>scoring_calibration.py"]
    end

    JSON --> TS_FN
    JSON --> AG_FN
    JSON --> GW_FN

    MIRROR["Money Mirror /api/quiz/results"] --> TS_FN
    GAME["Money Game /api/quiz/results"] --> TS_FN
    DEAL["Money Deal generate_results"] --> AG_FN
    REALM["Money Realm generate_results"] --> AG_FN
    LIKERT["Money Quiz (WP) archetype-scores.php"] --> GW_FN

    TS_FN -.->|"calibrated scores<br/>(must be byte-identical<br/>for same rule set + input)"| EQ((Identical<br/>output))
    AG_FN -.-> EQ
    GW_FN -.-> EQ

    classDef src fill:#fff3e0,stroke:#f57c00
    classDef impl fill:#e3f2fd,stroke:#1976d2
    classDef format fill:#e8f5e9,stroke:#2e7d32
    classDef contract fill:#f3e5f5,stroke:#7b1fa2
    class JSON src
    class TS_FN,AG_FN,GW_FN impl
    class MIRROR,GAME,DEAL,REALM,LIKERT format
    class EQ contract

The contract is cross-language byte-equivalence: the same input scores + rule set must produce the same output regardless of which implementation runs. Any change to the rule semantics must update all three in lockstep and add a fixture test that compares all three.
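A fixture comparison along the following lines could enforce this contract. It is a minimal sketch under stated assumptions: the `canonical_bytes` and `find_mismatches` helpers and the captured outputs are hypothetical, not the project's actual test harness. The sketch serialises each implementation's output to one canonical JSON form, so that key order and the int/float distinction (Python's `50.0` versus JavaScript's `50`) cannot produce spurious mismatches.

```python
import json
from typing import Mapping


def canonical_bytes(scores: Mapping[str, float]) -> bytes:
    """Serialise a score map to a canonical byte string.

    Keys are sorted and integral floats are collapsed to ints, so 50.0 and 50
    serialise identically (an assumption about how parity should be judged).
    """
    normalised = {
        name: int(value) if float(value).is_integer() else float(value)
        for name, value in scores.items()
    }
    return json.dumps(normalised, sort_keys=True, separators=(",", ":")).encode("utf-8")


def find_mismatches(outputs: Mapping[str, Mapping[str, float]]) -> list[str]:
    """Return the names of implementations whose output differs from the first one."""
    names = list(outputs)
    reference = canonical_bytes(outputs[names[0]])
    return [n for n in names[1:] if canonical_bytes(outputs[n]) != reference]


# Hypothetical captured outputs for one fixture (same input, same rule set).
outputs = {
    "typescript": {"victim": 65, "martyr": 32, "hero": 40},
    "agent": {"hero": 40.0, "martyr": 32.0, "victim": 65.0},
    "gateway": {"hero": 40, "victim": 65, "martyr": 32},
}
assert find_mismatches(outputs) == []
```

Whether integral floats should be normalised before comparison, or treated as a genuine parity failure, is a design choice the sketch makes for illustration only.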

The three score-derivation pipelines

Format(s)	Score derivation	Where it lives
Money Mirror (`qa`)	The agent handles each `process_answer` call to update session scores; the client then POSTs the final scores to `/api/quiz/results`	Agent `quiz_engine` (raw scores) + TS `analyseScores` (calibration + analysis)
Money Game (`game`)	Game session in moneyquiz-app accumulates per-archetype during play	TS `analyseScores`
Money Deal + Realm (`deal`, `realm`)	Agent NATS session accumulates during card interactions; `generate_results` returns final scores	Agent `quiz_engine.generate_results`
Money Quiz (`likert`)	WordPress JS sums weighted Likert values per trait, maps to archetypes	WP JS (raw) + gateway `/calibrate` (calibration)

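Three different paths (Mirror and Game share the TypeScript route). One calibration engine, implemented three times: once in TypeScript and twice in Python (agent and gateway).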

Severity bands

Each archetype has a set of severity bands defined in framework_definitions/money_archetypes.json (and copied into the agent's scoring_config.py for runtime use):

{
  "hero": {
    "idealScore": 60,
    "classification": "good_high",
    "bands": [
      {"min":0,  "max":19,  "level":"critical", "meaning":"Your inner drive is waiting to be awakened"},
      {"min":20, "max":39,  "level":"red",      "meaning":"Your courage has more to give"},
      {"min":40, "max":49,  "level":"amber",    "meaning":"You're building momentum"},
      {"min":50, "max":70,  "level":"green",    "meaning":"Strong drive and determination"},
      {"min":71, "max":80,  "level":"amber",    "meaning":"Your competitive drive is very strong"},
      {"min":81, "max":90,  "level":"red",      "meaning":"Drive may be overshadowing other areas"},
      {"min":91, "max":100, "level":"critical", "meaning":"Your intensity is exceptional"}
    ]
  },
  ...
}

Severity levels, ordered from worst to best: critical < red < amber < green. The meaning strings feed the LLM narrative.

classification is one of good_high / good_low / neutral — used by the analyser to decide whether to flag deviation above or below ideal.
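A band lookup over this structure might look like the sketch below. The `find_band` helper is hypothetical, and rounding the score before matching is an assumption: the bands use integer bounds, so an unrounded fractional score could otherwise fall between two bands.

```python
from typing import Optional, TypedDict


class Band(TypedDict):
    min: int
    max: int
    level: str
    meaning: str


# Hero bands copied from the excerpt above.
HERO_BANDS: list[Band] = [
    {"min": 0, "max": 19, "level": "critical", "meaning": "Your inner drive is waiting to be awakened"},
    {"min": 20, "max": 39, "level": "red", "meaning": "Your courage has more to give"},
    {"min": 40, "max": 49, "level": "amber", "meaning": "You're building momentum"},
    {"min": 50, "max": 70, "level": "green", "meaning": "Strong drive and determination"},
    {"min": 71, "max": 80, "level": "amber", "meaning": "Your competitive drive is very strong"},
    {"min": 81, "max": 90, "level": "red", "meaning": "Drive may be overshadowing other areas"},
    {"min": 91, "max": 100, "level": "critical", "meaning": "Your intensity is exceptional"},
]


def find_band(score: float, bands: list[Band]) -> Optional[Band]:
    """Return the band whose inclusive [min, max] range contains the score.

    The score is rounded first (an assumption, see the lead-in).
    Returns None when no band matches, e.g. for an out-of-range score.
    """
    rounded = round(score)
    for band in bands:
        if band["min"] <= rounded <= band["max"]:
            return band
    return None


print(find_band(55, HERO_BANDS)["level"])  # green
print(find_band(75, HERO_BANDS)["level"])  # amber
```

Because amber, red, and critical each appear on both sides of the green band, the level alone does not say whether a score is too low or too high. Comparing the score with idealScore is one way to recover the direction.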

Tier-2 archetype calibration

Tier-2 calibration is a post-scoring rule engine. It adjusts raw scores based on co-occurrence conditions BEFORE they are sorted into primary/secondary/tertiary.

Rule format

In framework_definitions/money_archetypes.json under scoring.engine.archetype_calibration.rules:

{
  "id": "victim_martyr_cooccurrence",
  "description": "Real bios show Victim and Martyr almost always co-occur. Q&A Strategy A captures Victim well but Martyr can sit just below the top-3 cutoff. +10 Martyr when both signals present pulls the right top-3. Validated against n=26 dataset (bio-recall 0.854 -> 0.875).",
  "conditions": [
    {"archetype": "victim", "min": 50},
    {"archetype": "martyr", "min": 15}
  ],
  "effect": {"archetype": "martyr", "delta": 10}
}

Rule semantics

conditions is AND: ALL must be met. Each condition checks one archetype's current score against an inclusive min and/or max; a missing score is treated as 0.
effect.delta is added to effect.archetype's score (a missing score starts from 0).
The result is clamped to [0, 100].
Rules are evaluated in array order. Later rules see the post-adjustment state from earlier rules.
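The semantics above can be sketched in a few lines of Python. This is an illustrative port, not the shipped agent or gateway code: `apply_rules` is a hypothetical name, and `followup_rule` is an invented rule that exists only to show why array order matters.

```python
from typing import Any


def apply_rules(scores: dict[str, float], rules: list[dict[str, Any]]) -> dict[str, float]:
    """Apply calibration rules in array order and return a new score map."""
    out = dict(scores)  # work on a copy so the caller's scores are untouched
    for rule in rules:
        all_met = True
        for cond in rule["conditions"]:
            value = out.get(cond["archetype"], 0)  # missing score counts as 0
            if "min" in cond and value < cond["min"]:
                all_met = False
            if "max" in cond and value > cond["max"]:
                all_met = False
        if all_met:
            target = rule["effect"]["archetype"]
            current = out.get(target, 0)
            out[target] = max(0, min(100, current + rule["effect"]["delta"]))
    return out


shipped_rule = {
    "id": "victim_martyr_cooccurrence",
    "conditions": [
        {"archetype": "victim", "min": 50},
        {"archetype": "martyr", "min": 15},
    ],
    "effect": {"archetype": "martyr", "delta": 10},
}

# Hypothetical second rule, used only to demonstrate order dependence.
followup_rule = {
    "id": "example_followup",
    "conditions": [{"archetype": "martyr", "min": 30}],
    "effect": {"archetype": "hero", "delta": -5},
}

raw = {"hero": 40, "victim": 65, "martyr": 22}
print(apply_rules(raw, [shipped_rule]))                 # martyr: 22 -> 32
print(apply_rules(raw, [shipped_rule, followup_rule]))  # follow-up sees martyr 32, hero: 40 -> 35
print(apply_rules(raw, [followup_rule, shipped_rule]))  # follow-up sees martyr 22, hero stays 40
```

Swapping the two rules changes the result, which is why the order of the rules array is part of the contract and not an implementation detail.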

Rules in production today

Rule id	Conditions	Effect
`victim_martyr_cooccurrence`	victim ≥ 50, martyr ≥ 15	martyr +10

(only one rule shipped today; the framework permits N)

The rule lives in the platform default JSON, so it applies to ALL tenants. Tenants can override by patching their scoring_overrides.engine.archetype_calibration.rules (wholesale replace — see 05 Configuration flow).
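"Wholesale replace" can be illustrated with the sketch below. The `resolve_rules` helper and the exact shape of the override object are assumptions made for illustration; 05 Configuration flow remains the authority on how overrides are actually merged.

```python
from typing import Any, Optional


def resolve_rules(
    platform_rules: list[dict[str, Any]],
    scoring_overrides: Optional[dict[str, Any]],
) -> list[dict[str, Any]]:
    """Pick the rule list for a tenant using wholesale-replace semantics.

    Assumed shape: scoring_overrides["engine"]["archetype_calibration"]["rules"].
    When the tenant supplies a rules list, it replaces the platform list
    entirely; the two lists are never merged rule by rule.
    """
    engine = (scoring_overrides or {}).get("engine") or {}
    calibration = engine.get("archetype_calibration") or {}
    tenant_rules = calibration.get("rules")
    if tenant_rules is not None:
        return list(tenant_rules)
    return list(platform_rules)


platform = [{"id": "victim_martyr_cooccurrence"}]
tenant = {"engine": {"archetype_calibration": {"rules": [{"id": "tenant_rule"}]}}}

print(resolve_rules(platform, None))    # platform default applies
print(resolve_rules(platform, tenant))  # only the tenant rule remains
```

Under this reading, a tenant that overrides the list and still wants the platform rule has to copy it into its own list. An empty override list would switch calibration off for that tenant, which is again an assumption of the sketch.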

Three implementations, one contract

The rule engine is implemented three times. All three MUST produce byte-identical outputs for the same input + rule set.

1. TypeScript — `applyArchetypeCalibration`

Used by Mirror + Game (any format that POSTs to /api/quiz/results).

moneyquiz-app/src/lib/score-analyser.ts:

export function applyArchetypeCalibration(
  scores: Record<string, number>,
  engineConfig?: EngineConfig,
): Record<string, number> {
  const rules = engineConfig?.archetype_calibration?.rules;
  if (!rules || rules.length === 0) return scores;
  const out: Record<string, number> = { ...scores };
  for (const rule of rules) {
    const allMet = rule.conditions.every(c => {
      const v = out[c.archetype] ?? 0;
      if (c.min !== undefined && v < c.min) return false;
      if (c.max !== undefined && v > c.max) return false;
      return true;
    });
    if (allMet) {
      const cur = out[rule.effect.archetype] ?? 0;
      out[rule.effect.archetype] = Math.max(0, Math.min(100, cur + rule.effect.delta));
    }
  }
  return out;
}

2. Agent Python — `apply_archetype_calibration`

Used by Deal + Realm (any format that goes through agent generate_results).

agents/adaptive-assessment/scoring_calibration.py:

def apply_archetype_calibration(
    scores: dict[str, float],
    framework: dict[str, Any] | None,
) -> dict[str, float]:
    # ... mirrors the TS implementation exactly

Hooked into quiz_engine.py::generate_results:

from scoring_calibration import apply_archetype_calibration
scores = apply_archetype_calibration(raw_scores, framework)
session["scores"] = scores  # persist back so CRM sync sees calibrated values
sorted_dims = sorted(dimensions, key=lambda d: scores.get(d["id"], 0), reverse=True)
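The reason calibration runs before the sort can be shown with a small example. The scores and the `example_archetype` id below are hypothetical, and `top_three` is an illustrative helper, not part of quiz_engine.py.

```python
def top_three(scores: dict[str, float]) -> list[str]:
    """Return the three highest-scoring archetype ids, highest first."""
    return sorted(scores, key=lambda name: scores[name], reverse=True)[:3]


# Hypothetical scores: martyr sits just below the top-3 cutoff before calibration.
raw = {"victim": 65, "hero": 40, "example_archetype": 30, "martyr": 22}
calibrated = {**raw, "martyr": 32}  # effect of victim_martyr_cooccurrence (+10)

print(top_three(raw))         # ['victim', 'hero', 'example_archetype']
print(top_three(calibrated))  # ['victim', 'hero', 'martyr']
```

Sorting the raw scores first and calibrating afterwards would leave martyr out of the top three, even though its calibrated score is higher than the third-placed archetype.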

3. Gateway Python — `apply_archetype_calibration` (+ rule trace)

Used by the MMC WP Likert bridge. Exposed via POST /api/v1/assess/framework/{id}/calibrate.

api-gateway/scoring_calibration.py:

def apply_archetype_calibration(
    scores: dict[str, float],
    framework: dict[str, Any] | None,
) -> tuple[dict[str, float], list[str]]:
    # ... mirrors TS + agent exactly
    # PLUS returns the list of rule IDs that fired (for observability)
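A trace-returning variant could be sketched as follows. This is not the gateway's source: `calibrate_with_trace` is a hypothetical name, the lookup path follows the scoring.engine.archetype_calibration.rules location given earlier, and recording a rule as fired even when clamping leaves the score unchanged is an assumption.

```python
from typing import Any, Optional


def calibrate_with_trace(
    scores: dict[str, float],
    framework: Optional[dict[str, Any]],
) -> tuple[dict[str, float], list[str]]:
    """Return calibrated scores plus the ids of the rules that fired, in order."""
    # Tolerate a missing framework or missing intermediate keys.
    engine = ((framework or {}).get("scoring") or {}).get("engine") or {}
    rules = (engine.get("archetype_calibration") or {}).get("rules") or []
    out = dict(scores)
    fired: list[str] = []
    for rule in rules:
        met = all(
            ("min" not in c or out.get(c["archetype"], 0) >= c["min"])
            and ("max" not in c or out.get(c["archetype"], 0) <= c["max"])
            for c in rule["conditions"]
        )
        if met:
            target = rule["effect"]["archetype"]
            out[target] = max(0, min(100, out.get(target, 0) + rule["effect"]["delta"]))
            fired.append(rule["id"])
    return out, fired


framework = {
    "scoring": {"engine": {"archetype_calibration": {"rules": [{
        "id": "victim_martyr_cooccurrence",
        "conditions": [{"archetype": "victim", "min": 50}, {"archetype": "martyr", "min": 15}],
        "effect": {"archetype": "martyr", "delta": 10},
    }]}}}
}
scores, applied = calibrate_with_trace({"hero": 40, "victim": 65, "martyr": 22}, framework)
print(scores)   # {'hero': 40, 'victim': 65, 'martyr': 32}
print(applied)  # ['victim_martyr_cooccurrence']
```

The first element of the returned tuple is what has to stay identical to the other two implementations; the trace is extra observability on top.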

Why three?

Each runtime has its own boundary. Mirror is server-side TS in moneyquiz-app. Deal is server-side Python in the agent (NATS). Likert is in WordPress PHP that calls out to the gateway via HTTP.

Cross-language equivalence is the contract. Anyone changing the engine MUST update all three in lockstep AND add a regression test covering the change.

Per-format coverage (post-D-phase)

Format	Calibration applied?	Where	Verified
Money Quiz (Likert, MMC WP)	✅ Yes	Gateway `/calibrate` via WP bridge	C5 smoke 2026-05-19 (martyr 0.22 → 0.32)
Money Mirror (qa)	✅ Yes	TS `analyseScores`	C1 production data 2026-05-17
Money Game	✅ Yes	TS `analyseScores` (same route)	C1 production data 2026-05-17
Money Deal	✅ Yes	Agent `generate_results`	C2-C3 agent rollout 2026-05-19
Money Realm	✅ Yes	Same as Deal	Same as Deal

Scoring engine knobs

Beyond calibration, the engine block can configure:

{
  "engine": {
    "archetype_calibration": { "rules": [...] },
    "severity_band_offset": 0,
    "act_weights": { "self_awareness": 1.0, "decision_making": 1.2 },
    "slider_positions": { ... },
    "confidence_boost": 5,
    "coverage_target": 0.85,
    "framework_status": "active"
  }
}

These are read by score-analyser.ts and the agent's quiz_engine. Today most tenants leave them at defaults. The Scoring tab in the dashboard exposes them for advanced tuning.

Adding a new rule

1. Validate the hypothesis against the Tier-2 Personas dataset (Personas/tier2_eval.json). The rule must improve bio-recall without regressing other personas.
2. Add to platform default JSON: edit framework_definitions/money_archetypes.json under scoring.engine.archetype_calibration.rules. Add id, description (with the validation evidence), conditions, effect.
3. Mirror into agent's JSON: agents/adaptive-assessment/framework_definitions/money_archetypes.json (the agent has its own copy synced via infra/scripts/sync-framework-jsons.sh).
4. No code change needed: all three engines read rules from the framework JSON.
5. Build + push new gateway + agent images so the baked-in JSON has the rule.
6. Smoke with the calibrate endpoint to verify the rule fires for a Tier-2 fixture.
7. Document in this section under §Rules in production today.

Per the methodology in reference_money_quiz_design_strategy.md: add ≤2 rules per validation cycle. Tier-3 v3 regressed bio-recall from 0.875 to 0.854 when 7 items were added at once.
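Before a new rule is added to the JSON, a small structural check can catch typos that would otherwise only surface in the smoke test. The sketch below is a hypothetical pre-commit style helper, not an existing project tool. The field names follow the rule format shown earlier, and the set of known archetype ids is assumed to be supplied by the caller.

```python
from typing import Any


def validate_rule(rule: dict[str, Any], known_archetypes: set[str]) -> list[str]:
    """Return a list of problems found in a calibration rule (empty means OK)."""
    problems: list[str] = []
    for key in ("id", "description", "conditions", "effect"):
        if key not in rule:
            problems.append(f"missing field: {key}")
    for i, cond in enumerate(rule.get("conditions", [])):
        if cond.get("archetype") not in known_archetypes:
            problems.append(f"conditions[{i}]: unknown archetype {cond.get('archetype')!r}")
        if "min" not in cond and "max" not in cond:
            problems.append(f"conditions[{i}]: needs min and/or max")
        if "min" in cond and "max" in cond and cond["min"] > cond["max"]:
            problems.append(f"conditions[{i}]: min exceeds max")
    effect = rule.get("effect", {})
    if effect:
        if effect.get("archetype") not in known_archetypes:
            problems.append(f"effect: unknown archetype {effect.get('archetype')!r}")
        if not isinstance(effect.get("delta"), (int, float)):
            problems.append("effect: delta must be a number")
    return problems


known = {"hero", "victim", "martyr"}  # subset of archetype ids, for illustration
candidate = {
    "id": "victim_martyr_cooccurrence",
    "description": "Validation evidence goes here.",
    "conditions": [{"archetype": "victim", "min": 50}, {"archetype": "martyr", "min": 15}],
    "effect": {"archetype": "martyr", "delta": 10},
}
print(validate_rule(candidate, known))  # []
```

A structural check of this kind complements the Tier-2 validation in step 1 and does not replace it; it says nothing about whether the rule improves bio-recall.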

Smoke endpoint

Public, no auth:

curl -X POST "https://api.coachpilot.ch/api/v1/assess/framework/money_archetypes/calibrate?client=<tenant>" \
  -H "Content-Type: application/json" \
  -d '{"scores": {"hero": 40, "victim": 65, "martyr": 22, ...}}'

Response:

{
  "scores": {"hero": 40, "victim": 65, "martyr": 32, ...},
  "applied_rules": ["victim_martyr_cooccurrence"],
  "framework": "money_archetypes"
}

Use this to debug per-tenant rule behaviour without running a real quiz.
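The same call can be scripted. The sketch below uses only the Python standard library; `build_calibrate_request` and `calibrate` are hypothetical helper names, and the response is assumed to have the shape shown above.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.coachpilot.ch/api/v1/assess/framework"


def build_calibrate_request(
    framework_id: str, tenant: str, scores: dict[str, float]
) -> urllib.request.Request:
    """Build (but do not send) the POST request for the calibrate endpoint."""
    query = urllib.parse.urlencode({"client": tenant})
    url = f"{BASE_URL}/{urllib.parse.quote(framework_id)}/calibrate?{query}"
    body = json.dumps({"scores": scores}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def calibrate(framework_id: str, tenant: str, scores: dict[str, float]) -> dict:
    """Send the request and return the decoded JSON response."""
    request = build_calibrate_request(framework_id, tenant, scores)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


# Example call (performs a real HTTP request, so it is left commented out):
# result = calibrate("money_archetypes", "<tenant>", {"hero": 40, "victim": 65, "martyr": 22})
# print(result["applied_rules"])
```

Keeping request construction separate from sending makes the URL and payload easy to inspect or unit-test without touching the network.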

Next

→ 07 Quiz app runtime: public quiz Next.js runtime, theme provider, brand context.
