How archetype scores are computed, calibrated, and finalised across all five formats. Reflects post-D-phase state (2026-05-19) where calibration parity is achieved end-to-end.
Three implementations, one contract
flowchart TB
JSON["framework_definitions/money_archetypes.json<br/>scoring.engine.archetype_calibration.rules"]
subgraph TS["TypeScript (moneyquiz-app)"]
TS_FN["applyArchetypeCalibration()<br/>score-analyser.ts"]
end
subgraph AG["Agent Python (adaptive-assessment)"]
AG_FN["apply_archetype_calibration()<br/>scoring_calibration.py"]
end
subgraph GW["Gateway Python (api-gateway)"]
GW_FN["apply_archetype_calibration()<br/>scoring_calibration.py"]
end
JSON --> TS_FN
JSON --> AG_FN
JSON --> GW_FN
MIRROR["Money Mirror /api/quiz/results"] --> TS_FN
GAME["Money Game /api/quiz/results"] --> TS_FN
DEAL["Money Deal generate_results"] --> AG_FN
REALM["Money Realm generate_results"] --> AG_FN
LIKERT["Money Quiz (WP) archetype-scores.php"] --> GW_FN
TS_FN -.->|"calibrated scores<br/>(must be byte-identical<br/>for same rule set + input)"| EQ((Identical<br/>output))
AG_FN -.-> EQ
GW_FN -.-> EQ
classDef src fill:#fff3e0,stroke:#f57c00
classDef impl fill:#e3f2fd,stroke:#1976d2
classDef format fill:#e8f5e9,stroke:#2e7d32
classDef contract fill:#f3e5f5,stroke:#7b1fa2
class JSON src
class TS_FN,AG_FN,GW_FN impl
class MIRROR,GAME,DEAL,REALM,LIKERT format
class EQ contract
The contract is cross-language byte-equivalence: the same input scores + rule set must produce the same output regardless of which implementation runs. Any change to the rule semantics must update all three in lockstep and add a fixture test that compares all three.
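A minimal sketch of what such a fixture comparison could look like. It assumes each implementation's output for a shared fixture has already been captured as a plain dict (how the TypeScript result is collected is left out). The names `canonical_bytes` and `assert_parity` are hypothetical, and collapsing whole-number floats to ints is an assumption made here so that `32` from JavaScript and `32.0` from Python compare equal.

```python
import json


def canonical_bytes(scores: dict) -> bytes:
    """Serialise scores deterministically so outputs can be compared byte-for-byte.

    Assumption: whole-number floats are collapsed to ints (32.0 -> 32), because
    JavaScript's JSON output prints 32 where Python's json module prints 32.0.
    """
    normalised = {
        name: int(value) if float(value).is_integer() else float(value)
        for name, value in scores.items()
    }
    return json.dumps(normalised, sort_keys=True, separators=(",", ":")).encode("utf-8")


def assert_parity(outputs: dict) -> None:
    """Raise AssertionError if any implementation diverges from the first one."""
    items = list(outputs.items())
    ref_name, ref_scores = items[0]
    reference = canonical_bytes(ref_scores)
    for name, scores in items[1:]:
        assert canonical_bytes(scores) == reference, f"{name} diverges from {ref_name}"


# Hypothetical captured outputs for one fixture (victim 65, martyr 22 -> 32).
outputs = {
    "typescript": {"hero": 40, "victim": 65, "martyr": 32},
    "agent_python": {"hero": 40.0, "victim": 65.0, "martyr": 32.0},
    "gateway_python": {"victim": 65, "martyr": 32, "hero": 40},
}
assert_parity(outputs)
```

Comparing canonical bytes rather than dicts keeps the check honest about key order and number formatting, which is where cross-language outputs tend to drift first.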
The three score-derivation pipelines
| Format(s) | Score derivation | Where it lives |
|---|---|---|
| Money Mirror (qa) | Agent answers process_answer to update session scores; final scores POSTed to /api/quiz/results from the client | Agent quiz_engine (raw scores) + TS analyseScores (calibration + analysis) |
| Money Game (game) | Game session in moneyquiz-app accumulates per-archetype during play | TS analyseScores |
| Money Deal + Realm (deal, realm) | Agent NATS session accumulates during card interactions; generate_results returns final scores | Agent quiz_engine.generate_results |
| Money Quiz (likert) | WordPress JS sums weighted Likert values per trait, maps to archetypes | WP JS (raw) + gateway /calibrate (calibration) |
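The derivation paths differ, but every one of them feeds the same calibration engine, which is implemented three times: once in TypeScript and twice in Python (agent and gateway).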
Severity bands
Each archetype has a set of severity bands defined in framework_definitions/money_archetypes.json (and copied into the agent's scoring_config.py for runtime use):
{
  "hero": {
    "idealScore": 60,
    "classification": "good_high",
    "bands": [
      {"min":0, "max":19, "level":"critical", "meaning":"Your inner drive is waiting to be awakened"},
      {"min":20, "max":39, "level":"red", "meaning":"Your courage has more to give"},
      {"min":40, "max":49, "level":"amber", "meaning":"You're building momentum"},
      {"min":50, "max":70, "level":"green", "meaning":"Strong drive and determination"},
      {"min":71, "max":80, "level":"amber", "meaning":"Your competitive drive is very strong"},
      {"min":81, "max":90, "level":"red", "meaning":"Drive may be overshadowing other areas"},
      {"min":91, "max":100, "level":"critical", "meaning":"Your intensity is exceptional"}
    ]
  },
  ...
}
Severity levels: critical < red < amber < green. The meaning strings feed the LLM narrative.
classification is one of good_high / good_low / neutral — used by the analyser to decide whether to flag deviation above or below ideal.
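As an illustration of how a score maps onto these bands, here is a small lookup sketch. The function name `find_band` is hypothetical, the `meaning` strings are omitted for brevity, and how the real analyser treats fractional scores that fall between integer band edges is not stated here; this sketch simply returns `None` for them.

```python
from typing import Optional

# Hero bands copied from the JSON above (meaning strings omitted).
HERO = {
    "idealScore": 60,
    "classification": "good_high",
    "bands": [
        {"min": 0, "max": 19, "level": "critical"},
        {"min": 20, "max": 39, "level": "red"},
        {"min": 40, "max": 49, "level": "amber"},
        {"min": 50, "max": 70, "level": "green"},
        {"min": 71, "max": 80, "level": "amber"},
        {"min": 81, "max": 90, "level": "red"},
        {"min": 91, "max": 100, "level": "critical"},
    ],
}


def find_band(score: float, archetype: dict) -> Optional[dict]:
    """Return the first band whose inclusive [min, max] range contains score."""
    for band in archetype["bands"]:
        if band["min"] <= score <= band["max"]:
            return band
    return None  # e.g. 49.5 sits in the gap between the 40-49 and 50-70 bands


print(find_band(60, HERO)["level"])  # green
print(find_band(75, HERO)["level"])  # amber
```

Note that the bands are symmetric around the green zone: both very low and very high scores land in red or critical.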
Tier-2 archetype calibration
A post-scoring rule engine. Adjusts raw scores based on co-occurrence conditions BEFORE sorting into primary/secondary/tertiary.
Rule format
In framework_definitions/money_archetypes.json under scoring.engine.archetype_calibration.rules:
{
  "id": "victim_martyr_cooccurrence",
  "description": "Real bios show Victim and Martyr almost always co-occur. Q&A Strategy A captures Victim well but Martyr can sit just below the top-3 cutoff. +10 Martyr when both signals present pulls the right top-3. Validated against n=26 dataset (bio-recall 0.854 -> 0.875).",
  "conditions": [
    {"archetype": "victim", "min": 50},
    {"archetype": "martyr", "min": 15}
  ],
  "effect": {"archetype": "martyr", "delta": 10}
}
Rule semantics
- conditions is AND: ALL must be met (each archetype value satisfies min and/or max).
- effect.delta is added to effect.archetype's score.
- Result is clamped to [0, 100].
- Rules are evaluated in array order. Later rules see the post-adjustment state from earlier rules.
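The semantics above can be sketched in a few lines of Python. This is an illustrative re-statement, not the shipped code: `apply_rules` is a hypothetical name, and the second rule (`martyr_dampens_hero`) is invented purely to show that array order matters.

```python
def apply_rules(scores: dict, rules: list) -> dict:
    """Apply calibration rules in array order; later rules see earlier adjustments."""
    out = dict(scores)  # copy so the caller's scores are not mutated
    for rule in rules:
        all_met = True
        for cond in rule["conditions"]:
            value = out.get(cond["archetype"], 0)  # missing archetype counts as 0
            if "min" in cond and value < cond["min"]:
                all_met = False
            if "max" in cond and value > cond["max"]:
                all_met = False
        if all_met:
            target = rule["effect"]["archetype"]
            adjusted = out.get(target, 0) + rule["effect"]["delta"]
            out[target] = max(0, min(100, adjusted))  # clamp to [0, 100]
    return out


cooccurrence = {
    "id": "victim_martyr_cooccurrence",
    "conditions": [{"archetype": "victim", "min": 50}, {"archetype": "martyr", "min": 15}],
    "effect": {"archetype": "martyr", "delta": 10},
}
# Hypothetical rule, for illustration only.
followup = {
    "id": "martyr_dampens_hero",
    "conditions": [{"archetype": "martyr", "min": 30}],
    "effect": {"archetype": "hero", "delta": -5},
}

scores = {"hero": 40, "victim": 65, "martyr": 22}
print(apply_rules(scores, [cooccurrence, followup]))  # hero 35, martyr 32
print(apply_rules(scores, [followup, cooccurrence]))  # hero 40, martyr 32
```

In the first ordering the follow-up rule fires only because the co-occurrence rule has already lifted martyr from 22 to 32; reversed, it sees martyr at 22 and does nothing.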
Rules in production today
| Rule id | Conditions | Effect |
|---|---|---|
| victim_martyr_cooccurrence | victim ≥ 50, martyr ≥ 15 | martyr +10 |
(only one rule shipped today; the framework permits N)
The rule lives in the platform default JSON, so it applies to ALL tenants. Tenants can override by patching their scoring_overrides.engine.archetype_calibration.rules (wholesale replace — see 05 Configuration flow).
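To make "wholesale replace" concrete, here is a hedged sketch of how the effective rule list could be resolved. The function names and the shape of the tenant config dict are assumptions; treating an explicitly empty override list as "no rules" is also an assumption of this sketch, and 05 Configuration flow remains the authority on the actual merge behaviour.

```python
def _dig(data, *keys):
    """Walk nested dicts, returning None as soon as a key is missing."""
    for key in keys:
        if not isinstance(data, dict) or key not in data:
            return None
        data = data[key]
    return data


def effective_rules(platform_framework: dict, tenant_config: dict) -> list:
    """Tenant rules replace the platform list entirely; they are never merged."""
    override = _dig(
        tenant_config, "scoring_overrides", "engine", "archetype_calibration", "rules"
    )
    if override is not None:
        return list(override)
    default = _dig(
        platform_framework, "scoring", "engine", "archetype_calibration", "rules"
    )
    return list(default or [])


platform = {
    "scoring": {"engine": {"archetype_calibration": {"rules": [
        {"id": "victim_martyr_cooccurrence"},
    ]}}}
}
# Hypothetical tenant that ships its own single rule.
tenant = {
    "scoring_overrides": {"engine": {"archetype_calibration": {"rules": [
        {"id": "tenant_specific_rule"},
    ]}}}
}

print([r["id"] for r in effective_rules(platform, {})])      # platform default
print([r["id"] for r in effective_rules(platform, tenant)])  # tenant rule only
```

The practical consequence is that a tenant who wants the platform rule plus one of their own has to list both in the override.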
Three implementations, one contract
The rule engine is implemented three times. All three MUST produce byte-identical outputs for the same input + rule set.
1. TypeScript — applyArchetypeCalibration
Used by Mirror + Game (any format that POSTs to /api/quiz/results).
moneyquiz-app/src/lib/score-analyser.ts:
export function applyArchetypeCalibration(
  scores: Record<string, number>,
  engineConfig?: EngineConfig,
): Record<string, number> {
  const rules = engineConfig?.archetype_calibration?.rules;
  // No rules configured: return the input unchanged.
  if (!rules || rules.length === 0) return scores;
  // Work on a copy so later rules see earlier adjustments without mutating the input.
  const out: Record<string, number> = { ...scores };
  for (const rule of rules) {
    // Conditions are ANDed; a missing archetype counts as 0.
    const allMet = rule.conditions.every(c => {
      const v = out[c.archetype] ?? 0;
      if (c.min !== undefined && v < c.min) return false;
      if (c.max !== undefined && v > c.max) return false;
      return true;
    });
    if (allMet) {
      const cur = out[rule.effect.archetype] ?? 0;
      // Apply the delta and clamp to [0, 100].
      out[rule.effect.archetype] = Math.max(0, Math.min(100, cur + rule.effect.delta));
    }
  }
  return out;
}
2. Agent Python — apply_archetype_calibration
Used by Deal + Realm (any format that goes through agent generate_results).
agents/adaptive-assessment/scoring_calibration.py:
from typing import Any

def apply_archetype_calibration(
    scores: dict[str, float],
    framework: dict[str, Any] | None,
) -> dict[str, float]:
    # ... mirrors the TS implementation exactly
    ...
Hooked into quiz_engine.py::generate_results:
from scoring_calibration import apply_archetype_calibration
scores = apply_archetype_calibration(raw_scores, framework)
session["scores"] = scores # persist back so CRM sync sees calibrated values
sorted_dims = sorted(dimensions, key=lambda d: scores.get(d["id"], 0), reverse=True)
3. Gateway Python — apply_archetype_calibration (+ rule trace)
Used by the MMC WP Likert bridge. Exposed via POST /api/v1/assess/framework/{id}/calibrate.
api-gateway/scoring_calibration.py:
from typing import Any

def apply_archetype_calibration(
    scores: dict[str, float],
    framework: dict[str, Any] | None,
) -> tuple[dict[str, float], list[str]]:
    # ... mirrors TS + agent exactly
    # PLUS returns the list of rule IDs that fired (for observability)
    ...
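A rough sketch of what a trace-returning variant might look like, since the gateway body is elided above. This is not the gateway's code: `calibrate_with_trace` and `_met` are hypothetical names, and the sketch takes the rule list directly instead of a full framework dict.

```python
from typing import Any, Dict, List, Optional, Tuple


def _met(cond: Dict[str, Any], scores: Dict[str, float]) -> bool:
    """True when the archetype's score respects the optional min and max bounds."""
    value = scores.get(cond["archetype"], 0)
    too_low = "min" in cond and value < cond["min"]
    too_high = "max" in cond and value > cond["max"]
    return not (too_low or too_high)


def calibrate_with_trace(
    scores: Dict[str, float],
    rules: Optional[List[Dict[str, Any]]],
) -> Tuple[Dict[str, float], List[str]]:
    """Return (calibrated scores, ids of the rules that fired, in firing order)."""
    out = dict(scores)
    fired: List[str] = []
    for rule in rules or []:
        if all(_met(cond, out) for cond in rule["conditions"]):
            target = rule["effect"]["archetype"]
            out[target] = max(0, min(100, out.get(target, 0) + rule["effect"]["delta"]))
            fired.append(rule["id"])
    return out, fired


rules = [{
    "id": "victim_martyr_cooccurrence",
    "conditions": [{"archetype": "victim", "min": 50}, {"archetype": "martyr", "min": 15}],
    "effect": {"archetype": "martyr", "delta": 10},
}]

calibrated, applied = calibrate_with_trace({"hero": 40, "victim": 65, "martyr": 22}, rules)
print(calibrated["martyr"], applied)  # 32 ['victim_martyr_cooccurrence']
```

Returning the fired ids alongside the scores lets a caller expose them as a field such as `applied_rules` without re-deriving which conditions matched.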
Why three?
Each runtime has its own boundary. Mirror is server-side TS in moneyquiz-app. Deal is server-side Python in the agent (NATS). Likert is in WordPress PHP that calls out to the gateway via HTTP.
Cross-language equivalence is the contract. Anyone changing the engine MUST update all three in lockstep AND add a regression test covering the change.
Per-format coverage (post-D-phase)
| Format | Calibration applied? | Where | Verified |
|---|---|---|---|
| Money Quiz (Likert, MMC WP) | ✅ Yes | Gateway /calibrate via WP bridge | C5 smoke 2026-05-19 (martyr 0.22 → 0.32) |
| Money Mirror (qa) | ✅ Yes | TS analyseScores | C1 production data 2026-05-17 |
| Money Game | ✅ Yes | TS analyseScores (same route) | C1 production data 2026-05-17 |
| Money Deal | ✅ Yes | Agent generate_results | C2-C3 agent rollout 2026-05-19 |
| Money Realm | ✅ Yes | Same as Deal | Same as Deal |
Scoring engine knobs
Beyond calibration, the engine block can configure:
{
  "engine": {
    "archetype_calibration": { "rules": [...] },
    "severity_band_offset": 0,
    "act_weights": { "self_awareness": 1.0, "decision_making": 1.2 },
    "slider_positions": { ... },
    "confidence_boost": 5,
    "coverage_target": 0.85,
    "framework_status": "active"
  }
}
These are read by score-analyser.ts and the agent's quiz_engine. Today most tenants leave them at defaults. The Scoring tab in the dashboard exposes them for advanced tuning.
Adding a new rule
- Validate the hypothesis against the Tier-2 Personas dataset (Personas/tier2_eval.json). The rule must improve bio-recall without regressing other personas.
- Add to platform default JSON: edit framework_definitions/money_archetypes.json under scoring.engine.archetype_calibration.rules. Add id, description (with the validation evidence), conditions, effect.
- Mirror into agent's JSON: agents/adaptive-assessment/framework_definitions/money_archetypes.json (the agent has its own copy synced via infra/scripts/sync-framework-jsons.sh).
- No code change needed: all three engines read rules from the framework JSON.
- Build + push new gateway + agent images so the baked-in JSON has the rule.
- Smoke with the calibrate endpoint to verify the rule fires for a Tier-2 fixture.
- Document in this section under §Rules in production today.
Per the methodology in reference_money_quiz_design_strategy.md: add ≤2 rules per validation cycle. Tier-3 v3 regressed bio-recall from 0.875 to 0.854 when 7 items were added at once.
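Before a new rule entry goes into the JSON, a quick structural check can catch typos early. The sketch below is a hypothetical pre-flight helper, not part of the repository: `check_rule` and its parameters are assumed names, and the caller supplies the set of valid archetype ids. It rejects an empty conditions list because, under the `every()` semantics of the TypeScript engine, such a rule would fire unconditionally.

```python
def check_rule(rule: dict, existing_ids: set, archetypes: set) -> list:
    """Return a list of human-readable problems; an empty list means the rule looks sane."""
    problems = [
        f"missing field: {key}"
        for key in ("id", "description", "conditions", "effect")
        if key not in rule
    ]
    if problems:
        return problems
    if rule["id"] in existing_ids:
        problems.append(f"duplicate id: {rule['id']}")
    if not rule["conditions"]:
        problems.append("conditions must not be empty")
    for index, cond in enumerate(rule["conditions"]):
        if cond.get("archetype") not in archetypes:
            problems.append(f"condition {index}: unknown archetype")
        if "min" not in cond and "max" not in cond:
            problems.append(f"condition {index}: needs min and/or max")
    effect = rule["effect"]
    if effect.get("archetype") not in archetypes:
        problems.append("effect: unknown archetype")
    if not isinstance(effect.get("delta"), (int, float)):
        problems.append("effect: delta must be a number")
    return problems


known = {"hero", "victim", "martyr"}  # subset of archetypes, for illustration only
candidate = {
    "id": "victim_martyr_cooccurrence",
    "description": "Validated against n=26 dataset (bio-recall 0.854 -> 0.875).",
    "conditions": [{"archetype": "victim", "min": 50}, {"archetype": "martyr", "min": 15}],
    "effect": {"archetype": "martyr", "delta": 10},
}
print(check_rule(candidate, existing_ids=set(), archetypes=known))  # []
```

A check like this covers structure only; the bio-recall validation against the Tier-2 dataset still has to happen separately.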
Smoke endpoint
Public, no auth:
curl -X POST "https://api.coachpilot.ch/api/v1/assess/framework/money_archetypes/calibrate?client=<tenant>" \
  -H "Content-Type: application/json" \
  -d '{"scores": {"hero": 40, "victim": 65, "martyr": 22, ...}}'
Response:
{
  "scores": {"hero": 40, "victim": 65, "martyr": 32, ...},
  "applied_rules": ["victim_martyr_cooccurrence"],
  "framework": "money_archetypes"
}
Use this to debug per-tenant rule behaviour without running a real quiz.
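For scripted debugging, the same call can be made from Python with only the standard library. This is a sketch under assumptions: the helper names are hypothetical, the request and response shapes are taken from the curl example above, and the tenant id used in the usage comment (`acme`) is a placeholder.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.coachpilot.ch/api/v1/assess/framework"


def build_calibrate_request(framework_id: str, tenant: str, scores: dict) -> urllib.request.Request:
    """Build the POST request for the calibrate endpoint without sending it."""
    query = urllib.parse.urlencode({"client": tenant})
    url = f"{BASE_URL}/{framework_id}/calibrate?{query}"
    body = json.dumps({"scores": scores}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def calibrate(framework_id: str, tenant: str, scores: dict, timeout: float = 10.0) -> dict:
    """Send the request and return the decoded JSON response."""
    request = build_calibrate_request(framework_id, tenant, scores)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def score_changes(before: dict, after: dict) -> dict:
    """Map each archetype whose score changed to a (before, after) pair."""
    return {k: (before.get(k), v) for k, v in after.items() if before.get(k) != v}


# Usage (performs a real HTTP call, so it is left commented out):
# raw = {"hero": 40, "victim": 65, "martyr": 22}
# result = calibrate("money_archetypes", "acme", raw)
# print(result["applied_rules"], score_changes(raw, result["scores"]))
```

Diffing the returned scores against the input makes it easy to see at a glance which archetypes a tenant's rule set actually moved.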
Next
→ 07 Quiz app runtime — public quiz Next.js runtime, theme provider, brand context.