The PSA's verdict and the meta-analysts' verdicts — every record, every layer.
Every pairwise verdict ships with a confidence value (0.55 coin flip → 1.0 unambiguous). This page shows two voices judging the same 96 sessions and 12 journeys: the PSA agent's verdict (rendered on the Gemini substrate, the original pipeline default — one verdict per record), and the meta-analyst verdicts (independently rendered across three judge families: Google Gemini, OpenAI GPT-5.4, Anthropic Claude-opus-4-6 — three verdicts per record). Colors: blue = gpt-5.4 won, yellow = Claude won. Color intensity tracks confidence (light = coin flip, saturated = unambiguous).
§1 · Headline · the cross-family aggregate
Same evidence. Three meta-analyst families. Sessions split, journeys converge.
Four judging layers, two in the PSA voice and two in the meta-analyst voice, each independently rendered by three judge families. Per session, the three families call different winners; per journey, they converge on gpt-5.4. (The per-record views further down show only the Gemini rendering of the PSA voice.)
Judging layer | Gemini (gpt / claude) | OpenAI (gpt / claude) | Anthropic (gpt / claude) | Consensus (gpt / claude) | Unanimous
Session pairwise · PSA voice (n=48) | 21 / 27 | 30 / 18 | 15 / 33 | 24 / 24 | 24 / 48
Session meta-analyst (n=48) | 21 / 27 | 30 / 18 | 15 / 33 | 24 / 24 | 24 / 48
Journey pairwise · PSA voice (n=12) | 7 / 5 | 8 / 4 | 8 / 4 | 8 / 4 | 11 / 12
Journey meta-analyst (n=12) | 6 / 6 | 7 / 5 | 7 / 5 | 7 / 5 | 9 / 12
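The Consensus and Unanimous columns appear to follow from a per-record majority vote over the three family verdicts. A minimal sketch under that assumption; the `Record` type and the toy records are hypothetical illustrations, not data from this report:

```python
from collections import Counter
from dataclasses import dataclass

GPT = "gpt-5.4"
CLAUDE = "claude-opus-4-6"


@dataclass(frozen=True)
class Record:
    """Hypothetical per-record shape: one winner label per judge family."""
    gemini: str
    openai: str
    anthropic: str

    def votes(self) -> list[str]:
        return [self.gemini, self.openai, self.anthropic]


def consensus(record: Record) -> str:
    # Three voters and two labels can never tie, so most_common(1) is well defined.
    winner, _ = Counter(record.votes()).most_common(1)[0]
    return winner


def is_unanimous(record: Record) -> bool:
    return len(set(record.votes())) == 1


def layer_summary(records: list[Record]) -> dict[str, int]:
    """Consensus wins per model plus the unanimous count for one judging layer."""
    wins = Counter(consensus(r) for r in records)
    return {
        "gpt": wins[GPT],
        "claude": wins[CLAUDE],
        "unanimous": sum(is_unanimous(r) for r in records),
        "n": len(records),
    }


# Toy input for illustration only (not records from this report).
example = [
    Record(GPT, GPT, GPT),
    Record(CLAUDE, GPT, CLAUDE),
    Record(GPT, GPT, CLAUDE),
]
print(layer_summary(example))  # {'gpt': 2, 'claude': 1, 'unanimous': 1, 'n': 3}
```

Applied to the 48 session records of one layer, the same tally would yield one row of the table above.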
Judge confidence personalities. Across the 48 session pairwise records:
Gemini is bimodal: 43/48 verdicts at ≥0.9 and 5 forced-choice fallbacks at session pairwise (plus 10 fallbacks at session meta). It is either confident or openly signals that it cannot differentiate.
OpenAI is mid-band: only 9/48 verdicts at ≥0.9, but 0 fallbacks. It always commits.
Anthropic is cautious: only 5/48 verdicts at ≥0.9, and 0 fallbacks across both session layers.
Same evidence, three different judging dispositions.
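Each disposition reduces to two counts per family: verdicts at or above 0.9, and forced-choice fallbacks. A minimal sketch, assuming a fallback can be recognized by its 0.55 confidence (the page renders every fallback as "0.55 · coin flip"). The input is the Gemini session-pairwise confidences transcribed from the heat map; only Gemini's per-record values appear on this page, so only its line can be reproduced this way:

```python
import math

# Gemini session-pairwise confidences, transcribed from the heat map (4 sessions per PSA).
GEMINI_SESSION_PAIRWISE = [
    0.95, 0.95, 0.90, 0.95,  # EDGE-01
    0.90, 0.95, 0.90, 0.95,  # EDGE-02
    0.55, 1.00, 0.55, 1.00,  # EDGE-03
    0.95, 0.95, 0.90, 1.00,  # EDGE-04
    0.90, 0.90, 0.95, 0.90,  # EDGE-05
    0.95, 0.90, 0.95, 0.95,  # EDGE-06
    0.55, 1.00, 0.90, 0.95,  # EDGE-07
    1.00, 1.00, 0.95, 0.90,  # EDGE-08
    0.95, 1.00, 0.55, 0.90,  # EDGE-09
    0.90, 0.55, 1.00, 0.90,  # EDGE-10
    0.90, 0.90, 1.00, 0.95,  # EDGE-11
    1.00, 0.90, 0.95, 0.90,  # EDGE-12
]

FALLBACK_CONFIDENCE = 0.55  # assumption: fallbacks are exactly the 0.55 "coin flip" verdicts
HIGH_CONFIDENCE = 0.9


def confidence_profile(confidences: list[float]) -> dict[str, int]:
    """Count verdicts at or above 0.9 and forced-choice fallbacks for one family."""
    fallbacks = sum(math.isclose(c, FALLBACK_CONFIDENCE) for c in confidences)
    high = sum(c >= HIGH_CONFIDENCE for c in confidences)
    return {"n": len(confidences), "high": high, "fallbacks": fallbacks}


print(confidence_profile(GEMINI_SESSION_PAIRWISE))  # {'n': 48, 'high': 43, 'fallbacks': 5}
```

The result matches the Gemini line above: 43/48 at ≥0.9 plus 5 fallbacks, which together cover all 48 records.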
Every PSA × every scenario × every voice, at a glance.
Rows are the 12 PSAs. Columns are the 4 session slots per PSA (two Shopping sessions followed by two Support sessions; the scenario IDs vary by PSA, for example S1, S2, Support-S2 and Support-S3 for EDGE-01) plus the ▶ Journey verdict on the right. The journey cell aggregates the 4 session experiences and is visually elevated (wider cell, rust accent border) to signal that role.

Each cell carries two stacked views. The top half shows the PSA agent's verdict (Gemini substrate): winner model and confidence, with cell fill color = winner and intensity = confidence. The bottom strip shows the 3 meta-analyst verdicts on the same record as three mini squares (Ge / Op / An), each colored by that judge family's vote. Blue = gpt-5.4 won; yellow = claude-opus-4-6 won. Coin-flip picks (the judge could not differentiate and was forced to pick at 0.55 confidence with no reasoning) appear as the lightest tint, with "0.55 · coin flip" as the confidence value.
Meta-analyst family squares (bottom strip of each cell): Ge = Google Gemini · Op = OpenAI GPT-5.4 · An = Anthropic claude-opus-4-6
PSA | Session 1 | Session 2 | Session 3 | Session 4 | ▶ Journey
EDGE-01 David Chen | S1 gpt-5.4 0.95 | S2 claude-opus-4-6 0.95 | Support-S2 gpt-5.4 0.90 | Support-S3 claude-opus-4-6 0.95 | gpt-5.4 0.90
EDGE-02 Marisol Vega | S1 gpt-5.4 0.90 | S4 claude-opus-4-6 0.95 | Support-S2 gpt-5.4 0.90 | Support-S4 gpt-5.4 0.95 | claude-opus-4-6 0.55 (coin flip)
EDGE-03 Priya Sharma | S2 gpt-5.4 0.55 (coin flip) | S3 claude-opus-4-6 1.00 | Support-S2 claude-opus-4-6 0.55 (coin flip) | Support-S4 claude-opus-4-6 1.00 | claude-opus-4-6 0.90
EDGE-04 Carlos Mendoza | S1 claude-opus-4-6 0.95 | S5 claude-opus-4-6 0.95 | Support-S1 claude-opus-4-6 0.90 | Support-S3 claude-opus-4-6 1.00 | claude-opus-4-6 0.55 (coin flip)
EDGE-05 Omar Abboud | S2 gpt-5.4 0.90 | S5 claude-opus-4-6 0.90 | Support-S3 gpt-5.4 0.95 | Support-S5 gpt-5.4 0.90 | gpt-5.4 0.90
EDGE-06 Yuki Tanaka | S4 gpt-5.4 0.95 | S5 claude-opus-4-6 0.90 | Support-S1 claude-opus-4-6 0.95 | Support-S2 gpt-5.4 0.95 | gpt-5.4 0.55 (coin flip)
EDGE-07 Victor Nakamura | S3 gpt-5.4 0.55 (coin flip) | S5 gpt-5.4 1.00 | Support-S1 claude-opus-4-6 0.90 | Support-S5 claude-opus-4-6 0.95 | gpt-5.4 0.95
EDGE-08 Tasha Bell | S1 gpt-5.4 1.00 | S3 gpt-5.4 1.00 | Support-S2 claude-opus-4-6 0.95 | Support-S5 claude-opus-4-6 0.90 | gpt-5.4 1.00
EDGE-09 Brandon Reilly | S1 gpt-5.4 0.95 | S5 gpt-5.4 1.00 | Support-S3 claude-opus-4-6 0.55 (coin flip) | Support-S5 claude-opus-4-6 0.90 | gpt-5.4 1.00
EDGE-10 Denise Harper | S3 gpt-5.4 0.90 | S5 claude-opus-4-6 0.55 (coin flip) | Support-S4 claude-opus-4-6 1.00 | Support-S5 gpt-5.4 0.90 | gpt-5.4 0.95
EDGE-11 Nicole Walker | S1 claude-opus-4-6 0.90 | S4 gpt-5.4 0.90 | Support-S1 claude-opus-4-6 1.00 | Support-S2 claude-opus-4-6 0.95 | claude-opus-4-6 0.55 (coin flip)
EDGE-12 Raylene Begay | S3 claude-opus-4-6 1.00 | S4 claude-opus-4-6 0.90 | Support-S1 claude-opus-4-6 0.95 | Support-S2 gpt-5.4 0.90 | claude-opus-4-6 0.85
Legend: blue = gpt-5.4 winning, yellow = Claude winning; intensity runs from 0.55 (coin flip, lightest) to 1.0 (unambiguous, saturated).
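The page only says that intensity tracks confidence from 0.55 (lightest) to 1.0 (saturated). A linear ramp is one plausible reading; the sketch below assumes it, and the RGB endpoints and the minimum tint `floor` are hypothetical placeholders, not the page's actual palette:

```python
def intensity(confidence: float, lo: float = 0.55, hi: float = 1.0) -> float:
    """Linear position of `confidence` between coin flip (0.0) and unambiguous (1.0), clamped."""
    t = (confidence - lo) / (hi - lo)
    return max(0.0, min(1.0, t))


def blend(base: tuple[int, int, int], t: float, floor: float = 0.15) -> tuple[int, int, int]:
    """Mix white toward `base`; `floor` keeps a coin flip faintly tinted rather than white (assumed)."""
    alpha = floor + (1.0 - floor) * t
    return tuple(round(255 + (channel - 255) * alpha) for channel in base)


# Hypothetical RGB endpoints; the page's actual palette is not documented.
BLUE = (37, 99, 235)    # gpt-5.4 won
YELLOW = (234, 179, 8)  # claude-opus-4-6 won


def cell_color(winner: str, confidence: float) -> tuple[int, int, int]:
    base = BLUE if winner == "gpt-5.4" else YELLOW
    return blend(base, intensity(confidence))


print(cell_color("gpt-5.4", 0.55), cell_color("gpt-5.4", 1.0))
```

Keeping a nonzero floor means a coin flip still shows which model was picked, matching the "lightest cell tint" described above rather than rendering as blank white.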
Reading the heat map. The top of each cell (large text + cell fill) is what the PSA agent said. The bottom strip (3 colored mini-squares: Ge / Op / An) is what each meta-analyst judge family said about that same record.
Where the cell fill and the meta-strip agree, the cell is visually solid: the PSA verdict was independently confirmed by all three judge families.
Where they diverge, the fill is one color but some or all of the meta-strip squares show the other. When the strip's majority is the other color, the PSA persona's read of the session was overridden by what the cross-family meta-analysts saw structurally.
Pattern at a glance: session cells often have mixed meta-strips (judges split on session-level taste); the journey column mostly shows solid, same-colored meta-strips (judges converge on the structural evidence).
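The agree/diverge reading can be made mechanical by comparing the PSA verdict with the three meta-analyst votes on the same record. A sketch; the labels `confirmed`, `majority-confirmed` and `overridden` are names chosen here for illustration, not categories the pipeline emits:

```python
GPT = "gpt-5.4"
CLAUDE = "claude-opus-4-6"


def classify_cell(psa_winner: str, meta_votes: list[str]) -> str:
    """Compare the PSA verdict (cell fill) with the Ge / Op / An strip on the same record."""
    if len(meta_votes) != 3:
        raise ValueError("expected exactly one vote per family: Ge, Op, An")
    agree = sum(vote == psa_winner for vote in meta_votes)
    if agree == 3:
        return "confirmed"           # visually solid cell
    if agree == 2:
        return "majority-confirmed"  # one family dissents
    return "overridden"              # the meta majority picked the other model


print(classify_cell(GPT, [GPT, CLAUDE, CLAUDE]))  # overridden
```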
Analyst-voice meta-pass per session. Independent objective UX-critic verdict per family.
Gemini: gpt 21 · cl 27 · ≥0.9: 38/48 · fallbacks: 10
OpenAI: gpt 30 · cl 18 · ≥0.9: 28/48 · fallbacks: 0
Anthropic: gpt 15 · cl 33 · ≥0.9: 12/48 · fallbacks: 0
Strip plot · confidence per session per PSA per family
Each PSA row has 3 stacked lanes (top: Gemini · middle: OpenAI · bottom: Anthropic). X position = confidence, color = winner, dashed ring = Gemini fallback. Sessions where the families agree show tight, same-colored dot clusters; sessions where they split show mixed colors at different X positions.
Same lane convention. The analyst voice surfaces structural disagreement that the PSA persona voice can mask.
What this shows. Within most PSA rows at session level, the 3 family lanes carry different colors at different X positions: same evidence, different verdicts. Gemini's lane sometimes shows a (dashed) fallback ring; the other families' lanes usually commit at mid-confidence. Tight same-direction clusters at high confidence do exist (EDGE-08 Tasha, EDGE-11 Nicole); those are the cases that are unanimous at session level.
The meta-analyst re-reads the full journey arc. Gemini emitted 2 forced-choice fallbacks (EDGE-07, EDGE-09); OpenAI and Anthropic independently resolved both as gpt-5.4 wins.
Gemini: gpt 6 · cl 6 · ≥0.9: 8/12 · fallbacks: 2
OpenAI: gpt 7 · cl 5 · ≥0.9: 4/12 · fallbacks: 0
Anthropic: gpt 7 · cl 5 · ≥0.9: 2/12 · fallbacks: 0
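One way to read "independently resolved": set a fallback vote aside as uninformative and take the majority of the families that did commit. A sketch under that assumption; the `Vote` type and the `resolve` rule are hypothetical, not the pipeline's documented behavior, and the votes in the usage line are illustrative values:

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Vote:
    """Hypothetical vote record; `fallback` marks a forced-choice 0.55 pick."""
    family: str
    winner: str
    confidence: float
    fallback: bool = False


def resolve(votes: list[Vote]) -> Optional[str]:
    """Majority winner among committed (non-fallback) votes; None if they tie or none committed."""
    committed = Counter(v.winner for v in votes if not v.fallback)
    if not committed:
        return None
    ranked = committed.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # committed families split evenly
    return ranked[0][0]


votes = [
    Vote("Gemini", "claude-opus-4-6", 0.55, fallback=True),  # illustrative forced pick
    Vote("OpenAI", "gpt-5.4", 0.90),
    Vote("Anthropic", "gpt-5.4", 0.80),
]
print(resolve(votes))  # gpt-5.4
```

Returning `None` on an even split keeps a lone fallback from silently acting as the tiebreaker.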
Strip plot · confidence per journey per PSA per family
At journey level, the 3 family lanes usually point the same direction — tight same-colored clusters.
Same lane convention. EDGE-07 and EDGE-09 show Gemini fallback rings; the other two families produced substantive gpt-5.4 verdicts.
What this shows. Within a single PSA row at journey level, the 3 family lanes usually agree — same color, similar X. 11 of 12 journey pairwise verdicts are unanimous across the 3 families; 9 of 12 journey meta-analyst verdicts are unanimous. The journey arc carries enough concrete evidence (role-break events, consistency under pressure, structural failures) to force convergence regardless of judge taste.
§5 · Per-PSA distribution map
Every record, side-by-side: PSA verdict and meta-analyst verdicts.
For each of the 12 PSAs, one card with 5 record rows: the journey verdict plus the 4 session verdicts. Each row shows the PSA agent's verdict (single chip, Gemini substrate) next to the meta-analyst verdicts (three chips for Gemini · OpenAI · Anthropic) and an agreement badge: 3/3 green = unanimous · 2/3 amber = majority · 1/3 red = full split. With three families choosing between two models, at least two always agree, so only 3/3 and 2/3 badges occur in this data.
Legend: chip color = winner (claude-opus-4-6 or gpt-5.4) · dashed border = forced-choice fallback · Ge = Gemini · Op = OpenAI · An = Anthropic
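A sketch of the agreement badge as the size of the largest voting bloc, with an exhaustive check over every Ge / Op / An combination (the `badge` helper is illustrative, not the page's code):

```python
from collections import Counter
from itertools import product

MODELS = ("gpt-5.4", "claude-opus-4-6")


def badge(votes: tuple[str, ...]) -> str:
    """Agreement badge: size of the largest bloc among the family votes."""
    largest = Counter(votes).most_common(1)[0][1]
    return f"{largest}/{len(votes)}"


# Enumerate all 2**3 = 8 possible Ge / Op / An vote combinations.
possible = {badge(combo) for combo in product(MODELS, repeat=3)}
print(sorted(possible))  # ['2/3', '3/3']
```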
EDGE-01 · David Chen
Journey | PSA gpt-5.4 0.90 | meta Ge 0.95 · Op 0.90 · An 0.72 | 3/3
S1 | PSA gpt-5.4 0.95 | meta Ge 0.55 (coin flip) · Op 0.95 · An 0.82 | 2/3
S2 | PSA claude-opus-4-6 0.95 | meta Ge 1.00 · Op 0.94 · An 0.88 | 3/3
Support-S2 | PSA gpt-5.4 0.90 | meta Ge 0.95 · Op 0.90 · An 0.82 | 2/3
Support-S3 | PSA claude-opus-4-6 0.95 | meta Ge 0.95 · Op 0.82 · An 0.78 | 3/3
EDGE-02 · Marisol Vega
Journey | PSA claude-opus-4-6 0.55 (coin flip) | meta Ge 0.95 · Op 0.79 · An 0.72 | 3/3
S1 | PSA gpt-5.4 0.90 | meta Ge 0.95 · Op 0.88 · An 0.78 | 3/3
S4 | PSA claude-opus-4-6 0.95 | meta Ge 0.55 (coin flip) · Op 0.78 · An 0.72 | 2/3
Support-S2 | PSA gpt-5.4 0.90 | meta Ge 1.00 · Op 0.95 · An 0.78 | 2/3
Support-S4 | PSA gpt-5.4 0.95 | meta Ge 1.00 · Op 0.97 · An 0.75 | 2/3
EDGE-03 · Priya Sharma
Journey | PSA claude-opus-4-6 0.90 | meta Ge 0.95 · Op 0.84 · An 0.62 | 2/3
S2 | PSA gpt-5.4 0.55 (coin flip) | meta Ge 0.55 (coin flip) · Op 0.99 · An 0.75 | 2/3
S3 | PSA claude-opus-4-6 1.00 | meta Ge 1.00 · Op 0.98 · An 0.93 | 3/3
Support-S2 | PSA claude-opus-4-6 0.55 (coin flip) | meta Ge 0.55 (coin flip) · Op 0.88 · An 0.62 | 2/3
Support-S4 | PSA claude-opus-4-6 1.00 | meta Ge 1.00 · Op 0.88 · An 0.90 | 2/3
EDGE-04 · Carlos Mendoza
Journey | PSA claude-opus-4-6 0.55 (coin flip) | meta Ge 0.85 · Op 0.72 · An 0.62 | 3/3
S1 | PSA claude-opus-4-6 0.95 | meta Ge 0.95 · Op 0.86 · An 0.72 | 2/3
S5 | PSA claude-opus-4-6 0.95 | meta Ge 0.95 · Op 0.88 · An 0.78 | 2/3
Support-S1 | PSA claude-opus-4-6 0.90 | meta Ge 1.00 · Op 0.88 · An 0.82 | 2/3
Support-S3 | PSA claude-opus-4-6 1.00 | meta Ge 1.00 · Op 0.95 · An 0.97 | 3/3
EDGE-05 · Omar Abboud
Journey | PSA gpt-5.4 0.90 | meta Ge 0.90 · Op 0.84 · An 0.72 | 2/3
S2 | PSA gpt-5.4 0.90 | meta Ge 0.95 · Op 0.85 · An 0.72 | 3/3
S5 | PSA claude-opus-4-6 0.90 | meta Ge 0.95 · Op 0.90 · An 0.88 | 3/3
Support-S3 | PSA gpt-5.4 0.95 | meta Ge 1.00 · Op 0.91 · An 0.78 | 2/3
Support-S5 | PSA gpt-5.4 0.90 | meta Ge 1.00 · Op 0.92 · An 0.78 | 2/3
EDGE-06 · Yuki Tanaka
Journey | PSA gpt-5.4 0.55 (coin flip) | meta Ge 0.90 · Op 0.78 · An 0.78 | 3/3
S4 | PSA gpt-5.4 0.95 | meta Ge 0.95 · Op 0.89 · An 0.72 | 3/3
S5 | PSA claude-opus-4-6 0.90 | meta Ge 0.55 (coin flip) · Op 0.82 · An 0.75 | 2/3
Support-S1 | PSA claude-opus-4-6 0.95 | meta Ge 1.00 · Op 0.88 · An 0.88 | 2/3
Support-S2 | PSA gpt-5.4 0.95 | meta Ge 0.95 · Op 0.88 · An 0.78 | 2/3
EDGE-07 · Victor Nakamura
Journey | PSA gpt-5.4 0.95 | meta Ge 0.55 (coin flip) · Op 0.84 · An 0.65 | 2/3
S3 | PSA gpt-5.4 0.55 (coin flip) | meta Ge 0.95 · Op 0.95 · An 0.82 | 2/3
S5 | PSA gpt-5.4 1.00 | meta Ge 1.00 · Op 0.97 · An 0.95 | 3/3
Support-S1 | PSA claude-opus-4-6 0.90 | meta Ge 1.00 · Op 0.91 · An 0.88 | 3/3
Support-S5 | PSA claude-opus-4-6 0.95 | meta Ge 1.00 · Op 0.90 · An 0.82 | 3/3
EDGE-08 · Tasha Bell
Journey | PSA gpt-5.4 1.00 | meta Ge 1.00 · Op 0.98 · An 0.92 | 3/3
S1 | PSA gpt-5.4 1.00 | meta Ge 0.55 (coin flip) · Op 0.99 · An 0.97 | 3/3
S3 | PSA gpt-5.4 1.00 | meta Ge 0.55 (coin flip) · Op 0.99 · An 0.97 | 3/3
Support-S2 | PSA claude-opus-4-6 0.95 | meta Ge 1.00 · Op 0.88 · An 0.85 | 2/3
Support-S5 | PSA claude-opus-4-6 0.90 | meta Ge 0.95 · Op 0.79 · An 0.78 | 3/3
EDGE-09 · Brandon Reilly
Journey | PSA gpt-5.4 1.00 | meta Ge 0.55 (coin flip) · Op 0.96 · An 0.88 | 3/3
S1 | PSA gpt-5.4 0.95 | meta Ge 1.00 · Op 0.95 · An 0.82 | 3/3
S5 | PSA gpt-5.4 1.00 | meta Ge 1.00 · Op 0.99 · An 0.97 | 3/3
Support-S3 | PSA claude-opus-4-6 0.55 (coin flip) | meta Ge 0.90 · Op 0.90 · An 0.72 | 2/3
Support-S5 | PSA claude-opus-4-6 0.90 | meta Ge 0.95 · Op 0.87 · An 0.78 | 3/3
EDGE-10 · Denise Harper
Journey | PSA gpt-5.4 0.95 | meta Ge 0.95 · Op 0.91 · An 0.85 | 3/3
S3 | PSA gpt-5.4 0.90 | meta Ge 1.00 · Op 0.95 · An 0.95 | 3/3
S5 | PSA claude-opus-4-6 0.55 (coin flip) | meta Ge 0.55 (coin flip) · Op 0.94 · An 0.95 | 2/3
Support-S4 | PSA claude-opus-4-6 1.00 | meta Ge 1.00 · Op 0.94 · An 0.95 | 3/3
Support-S5 | PSA gpt-5.4 0.90 | meta Ge 0.95 · Op 0.84 · An 0.78 | 2/3
EDGE-11 · Nicole Walker
Journey | PSA claude-opus-4-6 0.55 (coin flip) | meta Ge 1.00 · Op 0.88 · An 0.92 | 3/3
S1 | PSA claude-opus-4-6 0.90 | meta Ge 1.00 · Op 0.84 · An 0.78 | 3/3
S4 | PSA gpt-5.4 0.90 | meta Ge 0.55 (coin flip) · Op 0.95 · An 0.72 | 2/3
Support-S1 | PSA claude-opus-4-6 1.00 | meta Ge 1.00 · Op 0.90 · An 0.92 | 3/3
Support-S2 | PSA claude-opus-4-6 0.95 | meta Ge 1.00 · Op 0.91 · An 0.92 | 3/3
EDGE-12 · Raylene Begay
Journey | PSA claude-opus-4-6 0.85 | meta Ge 0.85 · Op 0.88 · An 0.82 | 3/3
S3 | PSA claude-opus-4-6 1.00 | meta Ge 0.55 (coin flip) · Op 0.93 · An 0.88 | 3/3
S4 | PSA claude-opus-4-6 0.90 | meta Ge 0.95 · Op 0.89 · An 0.88 | 3/3
Support-S1 | PSA claude-opus-4-6 0.95 | meta Ge 1.00 · Op 0.90 · An 0.82 | 2/3
Support-S2 | PSA gpt-5.4 0.90 | meta Ge 1.00 · Op 0.89 · An 0.75 | 2/3
§6 · The structural read
What the cross-family distribution tells us
Session-level disagreement is real. Only 24 of 48 session pairwise records and 24 of 48 session meta-analyst records are unanimous across the three families. Cross-family consensus at session is a literal 24-24 tie at both layers.
Journey-level convergence is real. 11 of 12 journey pairwise verdicts and 9 of 12 journey meta-analyst verdicts are unanimous. Cross-family consensus: gpt-5.4 wins 8-4 (pairwise) and 7-5 (meta-analyst).
Self-preference is visible at session level and gone at journey level. The OpenAI judge gives gpt-5.4 30/48 session pairwise wins (62.5%); the Anthropic judge gives Claude 33/48 (68.75%). At journey level both families back gpt-5.4 (8-4 pairwise, 7-5 meta-analyst), so the Anthropic judge votes against its own vendor's model. Aggregating to the journey washes out this judge bias.
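Self-preference here is a judge family's share of records won by its own vendor's model. A short sketch over the counts in the §1 table (Gemini is left out because no Google model competes):

```python
# Own-vendor wins per judging layer, read from the §1 table.
OWN_WINS = {
    ("OpenAI -> gpt-5.4", "session pairwise"): (30, 48),
    ("Anthropic -> claude", "session pairwise"): (33, 48),
    ("OpenAI -> gpt-5.4", "journey pairwise"): (8, 12),
    ("Anthropic -> claude", "journey pairwise"): (4, 12),
    ("OpenAI -> gpt-5.4", "journey meta-analyst"): (7, 12),
    ("Anthropic -> claude", "journey meta-analyst"): (5, 12),
}


def self_preference(wins: int, n: int) -> float:
    """Share of records in which a judge family picked its own vendor's model."""
    return wins / n


for (judge, layer), (wins, n) in OWN_WINS.items():
    print(f"{judge:<20} {layer:<21} {wins:>2}/{n:<2} = {self_preference(wins, n):.1%}")
```

Anthropic's own-vendor share drops below one half at both journey layers (4/12 and 5/12), which is the reversal described above.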
Why the journey is the production-relevant unit. Per-session judging surfaces what each judge prizes (taste). Per-journey judging surfaces concrete evidence (role-break events, consistency under pressure, structural failures) that pushes all three families toward the same answer. For a deployment that the same customer returns to repeatedly, the journey is the lens that holds up when the judge family is swapped.