Diagnostic · audit-wide distribution map

The PSA's verdict and the meta-analysts' verdicts — every record, every layer.

Every pairwise verdict ships with a confidence value from 0.55 (coin flip) to 1.0 (unambiguous). This page shows two voices judging the same 96 sessions and 12 journeys: the PSA agent's verdict (rendered on the Gemini substrate, the original pipeline default; one verdict per record) and the meta-analyst verdicts (independently rendered across three judge families: Google Gemini, OpenAI GPT-5.4, Anthropic claude-opus-4-6; three verdicts per record). Colors: blue = gpt-5.4 won, yellow = claude-opus-4-6 won. Color intensity tracks confidence (light = coin flip, saturated = unambiguous).

§1 · Headline · the cross-family aggregate

Same evidence. Three meta-analyst families. Sessions split, journeys converge.

Four judging layers — two PSA voice, two meta-analyst voice — each independently rendered by three judge families. Per-session the three families call different winners; per-journey they converge on gpt-5.4.

Judging layer | Gemini (gpt / claude) | OpenAI (gpt / claude) | Anthropic (gpt / claude) | Consensus (gpt / claude) | Unanimous
Session pairwise · PSA voice (n=48) | 21 / 27 | 30 / 18 | 15 / 33 | 24 / 24 | 24 / 48
Session meta-analyst (n=48) | 21 / 27 | 30 / 18 | 15 / 33 | 24 / 24 | 24 / 48
Journey pairwise · PSA voice (n=12) | 7 / 5 | 8 / 4 | 8 / 4 | 8 / 4 | 11 / 12
Journey meta-analyst (n=12) | 6 / 6 | 7 / 5 | 7 / 5 | 7 / 5 | 9 / 12

Judge confidence personalities. Across the 48 session pairwise records, Gemini is bimodal: 43/48 verdicts at ≥0.9 plus 5 fallbacks (at session meta, 38/48 plus 10 fallbacks). It is either confident or honestly refusing. OpenAI is mid-band: only 9/48 verdicts at ≥0.9, but 0 fallbacks. It always commits. Anthropic is cautious: 5/48 at ≥0.9 and 0 fallbacks at either session layer. Same evidence, three different judging dispositions.
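
The Gemini profile above can be reproduced from the raw confidences. The following is a minimal sketch, assuming that a fallback is any verdict reported at exactly 0.55 and that "high" means ≥0.9; the function name `confidence_profile` is hypothetical and not part of the audit pipeline. The confidence counts are tallied from the Gemini-substrate session cells of the heat map in §2.

```python
from collections import Counter

FALLBACK_CONFIDENCE = 0.55  # assumption: forced-choice fallbacks are reported at exactly 0.55
HIGH_CONFIDENCE = 0.90      # threshold used by the "≥0.9" figures on this page


def confidence_profile(confidences):
    """Return (high_count, fallback_count, total) for one judge's verdicts."""
    high = sum(1 for c in confidences if c >= HIGH_CONFIDENCE)
    fallbacks = sum(1 for c in confidences if abs(c - FALLBACK_CONFIDENCE) < 1e-9)
    return high, fallbacks, len(confidences)


# Gemini-substrate session pairwise confidences, tallied from the 48 session cells in §2.
gemini_session_counts = Counter({0.55: 5, 0.90: 17, 0.95: 16, 1.00: 10})
gemini_session = list(gemini_session_counts.elements())

high, fallbacks, total = confidence_profile(gemini_session)
print(f"≥0.9: {high}/{total} · fallbacks: {fallbacks}")
```

Every non-fallback Gemini verdict sits at 0.90 or above, which is what makes the distribution bimodal.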
§2 · PSA + meta-analyst heat map · winner × confidence × cross-family

Every PSA × every scenario × every voice, at a glance.

Rows are the 12 PSAs. Columns are the 4 sessions per PSA (two Shopping sessions followed by two Support sessions; scenario numbers vary by PSA) plus the ▶ Journey verdict on the right. The journey cell is the aggregate of the 4 session experiences and is visually elevated (wider cell, rust accent border) to signal that role. Each cell carries two stacked views. The top half shows the PSA agent's verdict (Gemini substrate): winner model and confidence, with cell fill color = winner and intensity = confidence. The bottom strip shows the 3 meta-analyst verdicts on the same record as three mini squares (Ge / Op / An), each colored by what that judge family voted. Blue = gpt-5.4 won; yellow = claude-opus-4-6 won. Coin-flip picks (the judge couldn't differentiate and was forced to pick at 0.55 confidence with no reasoning) appear as the lightest cell tint, with "0.55 · coin flip" in the confidence value.

Meta-analyst family squares (bottom strip of each cell): Ge = Google Gemini · Op = OpenAI GPT-5.4 · An = Anthropic claude-opus-4-6
PSA | Session 1 | Session 2 | Session 3 | Session 4 | Journey
EDGE-01 David Chen | S1 · gpt-5.4 · 0.95 | S2 · claude-opus-4-6 · 0.95 | Support-S2 · gpt-5.4 · 0.90 | Support-S3 · claude-opus-4-6 · 0.95 | gpt-5.4 · 0.90
EDGE-02 Marisol Vega | S1 · gpt-5.4 · 0.90 | S4 · claude-opus-4-6 · 0.95 | Support-S2 · gpt-5.4 · 0.90 | Support-S4 · gpt-5.4 · 0.95 | claude-opus-4-6 · 0.55 coin flip
EDGE-03 Priya Sharma | S2 · gpt-5.4 · 0.55 coin flip | S3 · claude-opus-4-6 · 1.00 | Support-S2 · claude-opus-4-6 · 0.55 coin flip | Support-S4 · claude-opus-4-6 · 1.00 | claude-opus-4-6 · 0.90
EDGE-04 Carlos Mendoza | S1 · claude-opus-4-6 · 0.95 | S5 · claude-opus-4-6 · 0.95 | Support-S1 · claude-opus-4-6 · 0.90 | Support-S3 · claude-opus-4-6 · 1.00 | claude-opus-4-6 · 0.55 coin flip
EDGE-05 Omar Abboud | S2 · gpt-5.4 · 0.90 | S5 · claude-opus-4-6 · 0.90 | Support-S3 · gpt-5.4 · 0.95 | Support-S5 · gpt-5.4 · 0.90 | gpt-5.4 · 0.90
EDGE-06 Yuki Tanaka | S4 · gpt-5.4 · 0.95 | S5 · claude-opus-4-6 · 0.90 | Support-S1 · claude-opus-4-6 · 0.95 | Support-S2 · gpt-5.4 · 0.95 | gpt-5.4 · 0.55 coin flip
EDGE-07 Victor Nakamura | S3 · gpt-5.4 · 0.55 coin flip | S5 · gpt-5.4 · 1.00 | Support-S1 · claude-opus-4-6 · 0.90 | Support-S5 · claude-opus-4-6 · 0.95 | gpt-5.4 · 0.95
EDGE-08 Tasha Bell | S1 · gpt-5.4 · 1.00 | S3 · gpt-5.4 · 1.00 | Support-S2 · claude-opus-4-6 · 0.95 | Support-S5 · claude-opus-4-6 · 0.90 | gpt-5.4 · 1.00
EDGE-09 Brandon Reilly | S1 · gpt-5.4 · 0.95 | S5 · gpt-5.4 · 1.00 | Support-S3 · claude-opus-4-6 · 0.55 coin flip | Support-S5 · claude-opus-4-6 · 0.90 | gpt-5.4 · 1.00
EDGE-10 Denise Harper | S3 · gpt-5.4 · 0.90 | S5 · claude-opus-4-6 · 0.55 coin flip | Support-S4 · claude-opus-4-6 · 1.00 | Support-S5 · gpt-5.4 · 0.90 | gpt-5.4 · 0.95
EDGE-11 Nicole Walker | S1 · claude-opus-4-6 · 0.90 | S4 · gpt-5.4 · 0.90 | Support-S1 · claude-opus-4-6 · 1.00 | Support-S2 · claude-opus-4-6 · 0.95 | claude-opus-4-6 · 0.55 coin flip
EDGE-12 Raylene Begay | S3 · claude-opus-4-6 · 1.00 | S4 · claude-opus-4-6 · 0.90 | Support-S1 · claude-opus-4-6 · 0.95 | Support-S2 · gpt-5.4 · 0.90 | claude-opus-4-6 · 0.85

gpt-5.4 winning · confidence intensity: 0.55 coin flip → 1.0 unambiguous
Claude winning · confidence intensity: 0.55 coin flip → 1.0 unambiguous

Reading the heat map. The top of each cell (large text + cell fill) is what the PSA agent said. The bottom strip (3 colored mini-squares: Ge / Op / An) is what each meta-analyst judge family said about that same record. Where the cell fill and the meta-strip agree, the row is visually solid — the PSA verdict was independently confirmed across all three judge families. Where they diverge, the cell is one color but the meta-strip shows the other color, sometimes split — these are the records where the PSA persona's read of the session was overridden by what the cross-family meta-analysts saw structurally. Pattern at a glance: session rows often have mixed meta-strips (judges split on session-level taste); the journey column mostly shows solid same-colored meta-strips (judges converge on the structural evidence).
§3 · Session-level distribution · 3 judge families × 2 layers

Per-session pairwise (PSA voice) · n=48

First-pass PSA persona-voice verdict per session. Same persona prompts rendered through three different judge LLMs.

Legend: claude-opus-4-6 wins · gpt-5.4 wins · 0.55 = forced choice · 1.00 = unambiguous
[Histograms, one per judge family: verdict counts per confidence bin (0.55–0.60 coin · 0.60–0.70 low · 0.70–0.80 low-mid · 0.80–0.90 mod · 0.90–0.95 high · 0.95–1.00 v-high), split by winner]
GEMINI: gpt 21 · cl 27 · ≥0.9: 43/48 · fallbacks: 5
OPENAI: gpt 30 · cl 18 · ≥0.9: 9/48 · fallbacks: 0
ANTHROPIC: gpt 15 · cl 33 · ≥0.9: 5/48 · fallbacks: 0

Per-session meta-analyst · n=48

Analyst-voice meta-pass per session. Independent objective UX-critic verdict per family.

[Histograms, one per judge family: verdict counts per confidence bin (0.55–0.60 coin · 0.60–0.70 low · 0.70–0.80 low-mid · 0.80–0.90 mod · 0.90–0.95 high · 0.95–1.00 v-high), split by winner]
GEMINI: gpt 21 · cl 27 · ≥0.9: 38/48 · fallbacks: 10
OPENAI: gpt 30 · cl 18 · ≥0.9: 28/48 · fallbacks: 0
ANTHROPIC: gpt 15 · cl 33 · ≥0.9: 12/48 · fallbacks: 0

Strip plot · confidence per session per PSA per family

SESSION PAIRWISE · 3 FAMILY LANES (x-axis: confidence, 0.5 coin flip to 1.0 high confidence; rows: EDGE-01 through EDGE-12)
Confidence band | n | gpt-5.4 | claude
0.5–0.6 | 6 | 3 | 3
0.6–0.7 | 12 | 4 | 8
0.7–0.8 | 42 | 13 | 29
0.8–0.9 | 27 | 14 | 13
0.9–1.0 | 57 | 32 | 25

Each PSA row has 3 vertical lanes — top: Gemini · middle: OpenAI · bottom: Anthropic. Color = winner. Dashed ring = Gemini fallback. Sessions where families agree show tight same-colored dot clusters; sessions where families split show mixed colors at different X.

SESSION META-ANALYST · 3 FAMILY LANES (x-axis: confidence, 0.5 coin flip to 1.0 high confidence; rows: EDGE-01 through EDGE-12)
Confidence band | n | gpt-5.4 | claude
0.5–0.6 | 10 | 5 | 5
0.6–0.7 | 1 | 1 | 0
0.7–0.8 | 23 | 7 | 16
0.8–0.9 | 32 | 14 | 18
0.9–1.0 | 78 | 39 | 39

Same lane convention. The analyst voice surfaces structural disagreement that PSA persona voice can mask.

What this shows. Within most PSA rows at session level, the 3 family lanes carry different colors at different X positions: same evidence, different verdicts. Gemini's lane often shows a fallback ring (dashed); the other families' lanes usually commit at mid-confidence. Tight same-direction clusters at high confidence do exist (EDGE-08 Tasha, EDGE-11 Nicole); those are the unanimous-at-session cases.

§4 · Journey-level distribution · 3 judge families × 2 layers

Per-journey pairwise (PSA voice) · n=12

After all 4 sessions per PSA, a per-journey judge picks a winner for the whole arc.

Legend: claude-opus-4-6 wins · gpt-5.4 wins · dashed = forced-choice fallback (Gemini)
[Histograms, one per judge family: verdict counts per confidence bin (0.55–0.60 coin · 0.60–0.70 low · 0.70–0.80 low-mid · 0.80–0.90 mod · 0.90–0.95 high · 0.95–1.00 v-high), split by winner]
GEMINI: gpt 7 · cl 5 · ≥0.9: 7/12 · fallbacks: 4
OPENAI: gpt 8 · cl 4 · ≥0.9: 4/12 · fallbacks: 0
ANTHROPIC: gpt 8 · cl 4 · ≥0.9: 3/12 · fallbacks: 0

Per-journey meta-analyst · n=12

Meta-analyst re-reads the full journey arc. Gemini emitted 2 forced-choice fallbacks (EDGE-07, EDGE-09); both were independently resolved as gpt-5.4 wins by OpenAI and Anthropic.

[Histograms, one per judge family: verdict counts per confidence bin (0.55–0.60 coin · 0.60–0.70 low · 0.70–0.80 low-mid · 0.80–0.90 mod · 0.90–0.95 high · 0.95–1.00 v-high), split by winner]
GEMINI: gpt 6 · cl 6 · ≥0.9: 8/12 · fallbacks: 2
OPENAI: gpt 7 · cl 5 · ≥0.9: 4/12 · fallbacks: 0
ANTHROPIC: gpt 7 · cl 5 · ≥0.9: 2/12 · fallbacks: 0

Strip plot · confidence per journey per PSA per family

JOURNEY PAIRWISE · 3 FAMILY LANES (x-axis: confidence, 0.5 coin flip to 1.0 high confidence; rows: EDGE-01 through EDGE-12)
Confidence band | n | gpt-5.4 | claude
0.5–0.6 | 5 | 1 | 4
0.6–0.7 | 3 | 2 | 1
0.7–0.8 | 5 | 2 | 3
0.8–0.9 | 9 | 5 | 4
0.9–1.0 | 14 | 13 | 1

At journey level, the 3 family lanes usually point the same direction — tight same-colored clusters.

JOURNEY META-ANALYST · 3 FAMILY LANES (x-axis: confidence, 0.5 coin flip to 1.0 high confidence; rows: EDGE-01 through EDGE-12)
Confidence band | n | gpt-5.4 | claude
0.5–0.6 | 2 | 1 | 1
0.6–0.7 | 3 | 2 | 1
0.7–0.8 | 7 | 3 | 4
0.8–0.9 | 10 | 4 | 6
0.9–1.0 | 14 | 10 | 4

Same lane convention. EDGE-07 and EDGE-09 show Gemini fallback rings; the other two families produced substantive gpt-5.4 verdicts.

What this shows. Within a single PSA row at journey level, the 3 family lanes usually agree — same color, similar X. 11 of 12 journey pairwise verdicts are unanimous across the 3 families; 9 of 12 journey meta-analyst verdicts are unanimous. The journey arc carries enough concrete evidence (role-break events, consistency under pressure, structural failures) to force convergence regardless of judge taste.
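
To put the session and journey unanimity figures on the same scale, the counts reported on this page can be turned into rates. This is a small arithmetic sketch only; the dictionary layout and the helper name `unanimity_rate` are illustrative assumptions, while the counts themselves are the ones quoted in the headline table.

```python
from fractions import Fraction

# (unanimous records, total records) per judging layer, as reported on this page
unanimous = {
    "session pairwise": (24, 48),
    "session meta-analyst": (24, 48),
    "journey pairwise": (11, 12),
    "journey meta-analyst": (9, 12),
}


def unanimity_rate(layer):
    """Exact share of records on which all three judge families agree."""
    hits, total = unanimous[layer]
    return Fraction(hits, total)


for layer in unanimous:
    print(f"{layer}: {float(unanimity_rate(layer)):.1%}")
```

Both session layers sit at exactly one half, while both journey layers sit at three quarters or higher.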
§5 · Per-PSA distribution map

Every record, side-by-side: PSA verdict and meta-analyst verdicts.

For each of the 12 PSAs, one card with 5 record rows: the journey verdict plus 4 session verdicts. Each row shows the PSA agent's verdict (single chip, Gemini substrate) next to the meta-analyst verdicts (three chips for Gemini · OpenAI · Anthropic, plus an agreement badge: 3/3 green = unanimous · 2/3 amber = majority · 1/3 red = full split).
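
The agreement badge can be derived from the three family votes on a record. A minimal sketch follows, assuming each vote is simply the name of the winning model; the function name `agreement_badge` and the example record are hypothetical and not taken from the audit data.

```python
from collections import Counter


def agreement_badge(votes):
    """votes: mapping of judge family -> winning model.

    Returns (majority_winner, badge), where badge is "k/n" for the
    k families that voted for the majority winner out of n families.
    """
    tally = Counter(votes.values())
    winner, count = tally.most_common(1)[0]
    return winner, f"{count}/{len(votes)}"


# Hypothetical record for illustration only
example = {"Ge": "claude-opus-4-6", "Op": "gpt-5.4", "An": "gpt-5.4"}
print(agreement_badge(example))  # ('gpt-5.4', '2/3')
```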

claude-opus-4-6 wins gpt-5.4 wins dashed border = forced-choice fallback Ge = Gemini · Op = OpenAI · An = Anthropic
EDGE-01 · David Chen
Record | PSA verdict (pairwise) | Ge | Op | An | Agreement
Journey | gpt-5.4 · 0.90 | 0.95 | 0.90 | 0.72 | 3/3
S1 | gpt-5.4 · 0.95 | 0.55 coin | 0.95 | 0.82 | 2/3
S2 | claude-opus-4-6 · 0.95 | 1.00 | 0.94 | 0.88 | 3/3
Support-S2 | gpt-5.4 · 0.90 | 0.95 | 0.90 | 0.82 | 2/3
Support-S3 | claude-opus-4-6 · 0.95 | 0.95 | 0.82 | 0.78 | 3/3

EDGE-02 · Marisol Vega
Record | PSA verdict (pairwise) | Ge | Op | An | Agreement
Journey | claude-opus-4-6 · 0.55 coin flip | 0.95 | 0.79 | 0.72 | 3/3
S1 | gpt-5.4 · 0.90 | 0.95 | 0.88 | 0.78 | 3/3
S4 | claude-opus-4-6 · 0.95 | 0.55 coin | 0.78 | 0.72 | 2/3
Support-S2 | gpt-5.4 · 0.90 | 1.00 | 0.95 | 0.78 | 2/3
Support-S4 | gpt-5.4 · 0.95 | 1.00 | 0.97 | 0.75 | 2/3

EDGE-03 · Priya Sharma
Record | PSA verdict (pairwise) | Ge | Op | An | Agreement
Journey | claude-opus-4-6 · 0.90 | 0.95 | 0.84 | 0.62 | 2/3
S2 | gpt-5.4 · 0.55 coin flip | 0.55 coin | 0.99 | 0.75 | 2/3
S3 | claude-opus-4-6 · 1.00 | 1.00 | 0.98 | 0.93 | 3/3
Support-S2 | claude-opus-4-6 · 0.55 coin flip | 0.55 coin | 0.88 | 0.62 | 2/3
Support-S4 | claude-opus-4-6 · 1.00 | 1.00 | 0.88 | 0.90 | 2/3

EDGE-04 · Carlos Mendoza
Record | PSA verdict (pairwise) | Ge | Op | An | Agreement
Journey | claude-opus-4-6 · 0.55 coin flip | 0.85 | 0.72 | 0.62 | 3/3
S1 | claude-opus-4-6 · 0.95 | 0.95 | 0.86 | 0.72 | 2/3
S5 | claude-opus-4-6 · 0.95 | 0.95 | 0.88 | 0.78 | 2/3
Support-S1 | claude-opus-4-6 · 0.90 | 1.00 | 0.88 | 0.82 | 2/3
Support-S3 | claude-opus-4-6 · 1.00 | 1.00 | 0.95 | 0.97 | 3/3

EDGE-05 · Omar Abboud
Record | PSA verdict (pairwise) | Ge | Op | An | Agreement
Journey | gpt-5.4 · 0.90 | 0.90 | 0.84 | 0.72 | 2/3
S2 | gpt-5.4 · 0.90 | 0.95 | 0.85 | 0.72 | 3/3
S5 | claude-opus-4-6 · 0.90 | 0.95 | 0.90 | 0.88 | 3/3
Support-S3 | gpt-5.4 · 0.95 | 1.00 | 0.91 | 0.78 | 2/3
Support-S5 | gpt-5.4 · 0.90 | 1.00 | 0.92 | 0.78 | 2/3

EDGE-06 · Yuki Tanaka
Record | PSA verdict (pairwise) | Ge | Op | An | Agreement
Journey | gpt-5.4 · 0.55 coin flip | 0.90 | 0.78 | 0.78 | 3/3
S4 | gpt-5.4 · 0.95 | 0.95 | 0.89 | 0.72 | 3/3
S5 | claude-opus-4-6 · 0.90 | 0.55 coin | 0.82 | 0.75 | 2/3
Support-S1 | claude-opus-4-6 · 0.95 | 1.00 | 0.88 | 0.88 | 2/3
Support-S2 | gpt-5.4 · 0.95 | 0.95 | 0.88 | 0.78 | 2/3

EDGE-07 · Victor Nakamura
Record | PSA verdict (pairwise) | Ge | Op | An | Agreement
Journey | gpt-5.4 · 0.95 | 0.55 coin | 0.84 | 0.65 | 2/3
S3 | gpt-5.4 · 0.55 coin flip | 0.95 | 0.95 | 0.82 | 2/3
S5 | gpt-5.4 · 1.00 | 1.00 | 0.97 | 0.95 | 3/3
Support-S1 | claude-opus-4-6 · 0.90 | 1.00 | 0.91 | 0.88 | 3/3
Support-S5 | claude-opus-4-6 · 0.95 | 1.00 | 0.90 | 0.82 | 3/3

EDGE-08 · Tasha Bell
Record | PSA verdict (pairwise) | Ge | Op | An | Agreement
Journey | gpt-5.4 · 1.00 | 1.00 | 0.98 | 0.92 | 3/3
S1 | gpt-5.4 · 1.00 | 0.55 coin | 0.99 | 0.97 | 3/3
S3 | gpt-5.4 · 1.00 | 0.55 coin | 0.99 | 0.97 | 3/3
Support-S2 | claude-opus-4-6 · 0.95 | 1.00 | 0.88 | 0.85 | 2/3
Support-S5 | claude-opus-4-6 · 0.90 | 0.95 | 0.79 | 0.78 | 3/3

EDGE-09 · Brandon Reilly
Record | PSA verdict (pairwise) | Ge | Op | An | Agreement
Journey | gpt-5.4 · 1.00 | 0.55 coin | 0.96 | 0.88 | 3/3
S1 | gpt-5.4 · 0.95 | 1.00 | 0.95 | 0.82 | 3/3
S5 | gpt-5.4 · 1.00 | 1.00 | 0.99 | 0.97 | 3/3
Support-S3 | claude-opus-4-6 · 0.55 coin flip | 0.90 | 0.90 | 0.72 | 2/3
Support-S5 | claude-opus-4-6 · 0.90 | 0.95 | 0.87 | 0.78 | 3/3

EDGE-10 · Denise Harper
Record | PSA verdict (pairwise) | Ge | Op | An | Agreement
Journey | gpt-5.4 · 0.95 | 0.95 | 0.91 | 0.85 | 3/3
S3 | gpt-5.4 · 0.90 | 1.00 | 0.95 | 0.95 | 3/3
S5 | claude-opus-4-6 · 0.55 coin flip | 0.55 coin | 0.94 | 0.95 | 2/3
Support-S4 | claude-opus-4-6 · 1.00 | 1.00 | 0.94 | 0.95 | 3/3
Support-S5 | gpt-5.4 · 0.90 | 0.95 | 0.84 | 0.78 | 2/3

EDGE-11 · Nicole Walker
Record | PSA verdict (pairwise) | Ge | Op | An | Agreement
Journey | claude-opus-4-6 · 0.55 coin flip | 1.00 | 0.88 | 0.92 | 3/3
S1 | claude-opus-4-6 · 0.90 | 1.00 | 0.84 | 0.78 | 3/3
S4 | gpt-5.4 · 0.90 | 0.55 coin | 0.95 | 0.72 | 2/3
Support-S1 | claude-opus-4-6 · 1.00 | 1.00 | 0.90 | 0.92 | 3/3
Support-S2 | claude-opus-4-6 · 0.95 | 1.00 | 0.91 | 0.92 | 3/3

EDGE-12 · Raylene Begay
Record | PSA verdict (pairwise) | Ge | Op | An | Agreement
Journey | claude-opus-4-6 · 0.85 | 0.85 | 0.88 | 0.82 | 3/3
S3 | claude-opus-4-6 · 1.00 | 0.55 coin | 0.93 | 0.88 | 3/3
S4 | claude-opus-4-6 · 0.90 | 0.95 | 0.89 | 0.88 | 3/3
Support-S1 | claude-opus-4-6 · 0.95 | 1.00 | 0.90 | 0.82 | 2/3
Support-S2 | gpt-5.4 · 0.90 | 1.00 | 0.89 | 0.75 | 2/3

§6 · The structural read

What the cross-family distribution tells us

  1. Session-level disagreement is real. Only 24 of 48 session pairwise records and 24 of 48 session meta-analyst records are unanimous across the three families. Cross-family consensus at session is a literal 24-24 tie at both layers.
  2. Journey-level convergence is real. 11 of 12 journey pairwise verdicts and 9 of 12 journey meta-analyst verdicts are unanimous. Cross-family consensus: gpt-5.4 wins 8-4 (pairwise) and 7-5 (meta-analyst).
  3. Self-preference visible at session, gone at journey. OpenAI judge gives gpt-5.4 30/48 session pairwise wins (62.5%); Anthropic gives Claude 33/48 (68.75%). At journey level both land on gpt-5.4 winning (8-4 pairwise, 7-5 meta-analyst), so Anthropic actively votes against its own family. Aggregation washes out judge bias.
  4. Why the journey is the production-relevant unit. Per-session judging surfaces what each judge prizes (taste). Per-journey judging surfaces concrete evidence — role-break events, consistency under pressure, structural failures — that forces all three families to the same answer. For a deployment that the same customer returns to repeatedly, journey-level is the lens that holds up under judge family substitution.
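
The self-preference shares in point 3 follow directly from the headline table. The sketch below recomputes them under one stated assumption: that gpt-5.4 counts as the OpenAI judge's "own" candidate and claude-opus-4-6 as the Anthropic judge's. The names `wins`, `own_model_index` and `own_family_share` are illustrative, not part of the audit tooling.

```python
# (gpt-5.4 wins, claude-opus-4-6 wins) per judge family, copied from the headline table
wins = {
    "session pairwise": {"Gemini": (21, 27), "OpenAI": (30, 18), "Anthropic": (15, 33)},
    "journey pairwise": {"Gemini": (7, 5), "OpenAI": (8, 4), "Anthropic": (8, 4)},
    "journey meta-analyst": {"Gemini": (6, 6), "OpenAI": (7, 5), "Anthropic": (7, 5)},
}

# Assumption for illustration: index of each judge's own-family candidate in the tuples above
own_model_index = {"OpenAI": 0, "Anthropic": 1}


def own_family_share(layer, judge):
    """Fraction of records in `layer` where `judge` picked its own family's model."""
    counts = wins[layer][judge]
    return counts[own_model_index[judge]] / sum(counts)


for layer in wins:
    for judge in own_model_index:
        print(f"{layer} · {judge}: {own_family_share(layer, judge):.1%}")
```

At session level both judges favor their own family in well over half of the records; at journey level the Anthropic judge's own-family share drops below one half in both layers.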