VIVID for Walmart · Distribution map

§1 · Headline · the cross-family aggregate

Same evidence. Three meta-analyst families. Sessions split, journeys converge.

Four judging layers — two PSA voice, two meta-analyst voice — each independently rendered by three judge families. Per-session the three families call different winners; per-journey they converge on gpt-5.4.

Judging layer	Gemini gpt / claude	OpenAI gpt / claude	Anthropic gpt / claude	Consensus	Unanimous
Session pairwise · PSA voice (n=48)	21 / 27	30 / 18	15 / 33	24 / 24	24 / 48
Session meta-analyst (n=48)	21 / 27	30 / 18	15 / 33	24 / 24	24 / 48
Journey pairwise · PSA voice (n=12)	7 / 5	8 / 4	8 / 4	8 / 4	11 / 12
Journey meta-analyst (n=12)	6 / 6	7 / 5	7 / 5	7 / 5	9 / 12
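The Consensus and Unanimous columns can be read as a per-record majority vote across the three families. The sketch below assumes that rule, which the report does not state explicitly. The function name `summarize_layer` and the sample votes are hypothetical and are not taken from the report's records.

```python
from collections import Counter

GPT, CLAUDE = "gpt-5.4", "claude-opus-4-6"
FAMILIES = ("Gemini", "OpenAI", "Anthropic")


def summarize_layer(records):
    """Summarize one judging layer.

    records: list of (gemini_vote, openai_vote, anthropic_vote) tuples.
    Returns per-family tallies, the majority-vote consensus tally, and the
    number of records on which all three families agree.
    """
    per_family = {name: Counter() for name in FAMILIES}
    consensus = Counter()
    unanimous = 0
    for votes in records:
        for name, vote in zip(FAMILIES, votes):
            per_family[name][vote] += 1
        # Three voters and two candidates: a strict majority always exists.
        winner, count = Counter(votes).most_common(1)[0]
        consensus[winner] += 1
        if count == len(votes):
            unanimous += 1
    return per_family, consensus, unanimous


# Hypothetical votes for four records (illustration only).
sample = [
    (GPT, GPT, GPT),
    (CLAUDE, GPT, CLAUDE),
    (GPT, GPT, CLAUDE),
    (CLAUDE, CLAUDE, CLAUDE),
]
per_family, consensus, unanimous = summarize_layer(sample)
```

On this sample the consensus is a 2 / 2 tie with 2 of 4 records unanimous, the same shape as a table row: per-family tallies, then consensus, then the unanimous count.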

Judge confidence personalities. Across the 48 session pairwise records, Gemini is bimodal: 43/48 verdicts at ≥0.9 and 5 forced-choice fallbacks (plus 10 fallbacks at the session meta-analyst layer). It is either confident or openly declines to differentiate. OpenAI is mid-band: only 9/48 verdicts at ≥0.9, but 0 fallbacks, so it always commits. Anthropic is cautious: 5/48 at ≥0.9 and 0 fallbacks across both session layers. Same evidence, three different judging dispositions.
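The two numbers behind each personality (verdicts at ≥0.9 and fallback count) can be tallied as in the following sketch. It assumes a fallback is recorded at exactly 0.55 confidence, as the heat-map description suggests; `confidence_profile` is a hypothetical helper name. The input is the journey column of the §2 heat map (PSA voice, Gemini substrate), EDGE-01 through EDGE-12.

```python
FALLBACK_CONFIDENCE = 0.55  # assumption: forced-choice fallbacks are logged at 0.55
HIGH_CONFIDENCE = 0.90


def confidence_profile(confidences):
    """Count high-confidence verdicts and forced-choice fallbacks."""
    high = sum(1 for c in confidences if c >= HIGH_CONFIDENCE)
    fallbacks = sum(
        1 for c in confidences if abs(c - FALLBACK_CONFIDENCE) < 1e-9
    )
    return {"n": len(confidences), "high": high, "fallbacks": fallbacks}


# Journey-column confidences from the §2 heat map, EDGE-01 through EDGE-12.
journey_confidences = [0.90, 0.55, 0.90, 0.55, 0.90, 0.55,
                       0.95, 1.00, 1.00, 0.95, 0.55, 0.85]
profile = confidence_profile(journey_confidences)
```

For these twelve values the profile is 7 verdicts at ≥0.9 and 4 fallbacks, which agrees with the Gemini journey pairwise summary in §4.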

§2 · PSA + meta-analyst heat map · winner × confidence × cross-family

Every PSA × every scenario × every voice, at a glance.

Rows are the 12 PSAs. Columns are the 4 sessions per PSA (Shopping S1, Shopping S2, Support S2 or S4, Support S3 or S5) plus the ▶ Journey verdict on the right — the journey cell is the aggregate of the 4 session experiences and is visually elevated (wider cell, rust accent border) to signal that role. Each cell carries two stacked views. The top half shows the PSA agent's verdict (Gemini substrate) — winner model + confidence, with cell fill color = winner and intensity = confidence. The bottom strip shows the 3 meta-analyst verdicts on the same record — three mini squares (Ge / Op / An), each colored by what that judge family voted. Blue = gpt-5.4 won; yellow = claude-opus-4-6 won. Coin-flip picks (the judge couldn't differentiate and was forced to pick at 0.55 confidence with no reasoning) appear as the lightest cell tint, with "0.55 · coin flip" in the confidence value.

Meta-analyst family squares (bottom strip of each cell): Ge = Google Gemini · Op = OpenAI GPT-5.4 · An = Anthropic claude-opus-4-6

PSA	Session 1	Session 2	Session 3	Session 4	▶ Journey
EDGE-01 · David Chen	gpt-5.4 0.95	claude-opus-4-6 0.95	Support-S2: gpt-5.4 0.90	Support-S3: claude-opus-4-6 0.95	gpt-5.4 0.90
EDGE-02 · Marisol Vega	gpt-5.4 0.90	claude-opus-4-6 0.95	Support-S2: gpt-5.4 0.90	Support-S4: gpt-5.4 0.95	claude-opus-4-6 0.55 · coin flip
EDGE-03 · Priya Sharma	gpt-5.4 0.55 · coin flip	claude-opus-4-6 1.00	Support-S2: claude-opus-4-6 0.55 · coin flip	Support-S4: claude-opus-4-6 1.00	claude-opus-4-6 0.90
EDGE-04 · Carlos Mendoza	claude-opus-4-6 0.95	claude-opus-4-6 0.95	Support-S1: claude-opus-4-6 0.90	Support-S3: claude-opus-4-6 1.00	claude-opus-4-6 0.55 · coin flip
EDGE-05 · Omar Abboud	gpt-5.4 0.90	claude-opus-4-6 0.90	Support-S3: gpt-5.4 0.95	Support-S5: gpt-5.4 0.90	gpt-5.4 0.90
EDGE-06 · Yuki Tanaka	gpt-5.4 0.95	claude-opus-4-6 0.90	Support-S1: claude-opus-4-6 0.95	Support-S2: gpt-5.4 0.95	gpt-5.4 0.55 · coin flip
EDGE-07 · Victor Nakamura	gpt-5.4 0.55 · coin flip	gpt-5.4 1.00	Support-S1: claude-opus-4-6 0.90	Support-S5: claude-opus-4-6 0.95	gpt-5.4 0.95
EDGE-08 · Tasha Bell	gpt-5.4 1.00	gpt-5.4 1.00	Support-S2: claude-opus-4-6 0.95	Support-S5: claude-opus-4-6 0.90	gpt-5.4 1.00
EDGE-09 · Brandon Reilly	gpt-5.4 0.95	gpt-5.4 1.00	Support-S3: claude-opus-4-6 0.55 · coin flip	Support-S5: claude-opus-4-6 0.90	gpt-5.4 1.00
EDGE-10 · Denise Harper	gpt-5.4 0.90	claude-opus-4-6 0.55 · coin flip	Support-S4: claude-opus-4-6 1.00	Support-S5: gpt-5.4 0.90	gpt-5.4 0.95
EDGE-11 · Nicole Walker	claude-opus-4-6 0.90	gpt-5.4 0.90	Support-S1: claude-opus-4-6 1.00	Support-S2: claude-opus-4-6 0.95	claude-opus-4-6 0.55 · coin flip
EDGE-12 · Raylene Begay	claude-opus-4-6 1.00	claude-opus-4-6 0.90	Support-S1: claude-opus-4-6 0.95	Support-S2: gpt-5.4 0.90	claude-opus-4-6 0.85

gpt-5.4 winning · confidence intensity: 0.55 coin flip → 1.0 unambiguous

Claude winning · confidence intensity: 0.55 coin flip → 1.0 unambiguous

Reading the heat map. The top of each cell (large text + cell fill) is what the PSA agent said. The bottom strip (3 colored mini-squares: Ge / Op / An) is what each meta-analyst judge family said about that same record. Where the cell fill and the meta-strip agree, the row is visually solid — the PSA verdict was independently confirmed across all three judge families. Where they diverge, the cell is one color but the meta-strip shows the other color, sometimes split — these are the records where the PSA persona's read of the session was overridden by what the cross-family meta-analysts saw structurally. Pattern at a glance: session rows often have mixed meta-strips (judges split on session-level taste); the journey column mostly shows solid same-colored meta-strips (judges converge on the structural evidence).
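The reading rule above can be expressed as a small classifier over one cell. This is a sketch under assumptions: the labels "confirmed", "majority-confirmed", and "overridden" and the function name `classify_cell` are hypothetical and do not come from the report, and the example votes are invented for illustration.

```python
GPT, CLAUDE = "gpt-5.4", "claude-opus-4-6"


def classify_cell(psa_winner, meta_votes):
    """Compare the PSA verdict (cell fill) with the meta-analyst strip.

    Returns "confirmed" when every meta-analyst matches the PSA verdict,
    "overridden" when a majority of them pick the other model, and
    "majority-confirmed" when most, but not all, match.
    """
    agree = sum(1 for vote in meta_votes if vote == psa_winner)
    if agree == len(meta_votes):
        return "confirmed"
    if agree * 2 < len(meta_votes):
        return "overridden"
    return "majority-confirmed"


# Hypothetical cells (Ge, Op, An votes); not taken from the report.
solid = classify_cell(GPT, (GPT, GPT, GPT))
split = classify_cell(GPT, (GPT, GPT, CLAUDE))
flipped = classify_cell(GPT, (CLAUDE, CLAUDE, GPT))
```

A visually solid cell corresponds to "confirmed"; a cell whose strip mostly shows the other color corresponds to "overridden".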

§3 · Session-level distribution · 3 judge families × 2 layers

Per-session pairwise (PSA voice) · n=48

First-pass PSA persona-voice verdict per session. Same persona prompts rendered through three different judge LLMs.

claude-opus-4-6 wins · gpt-5.4 wins · 0.55 = forced choice · 1.00 = unambiguous

Gemini: gpt 21 · cl 27 · ≥0.9: 43/48 · fallbacks: 5

OpenAI: gpt 30 · cl 18 · ≥0.9: 9/48 · fallbacks: 0

Anthropic: gpt 15 · cl 33 · ≥0.9: 5/48 · fallbacks: 0

Per-session meta-analyst · n=48

Analyst-voice meta-pass per session. Independent objective UX-critic verdict per family.

Gemini: gpt 21 · cl 27 · ≥0.9: 38/48 · fallbacks: 10

OpenAI: gpt 30 · cl 18 · ≥0.9: 28/48 · fallbacks: 0

Anthropic: gpt 15 · cl 33 · ≥0.9: 12/48 · fallbacks: 0

Strip plot · confidence per session per PSA per family

Each PSA row has 3 vertical lanes — top: Gemini · middle: OpenAI · bottom: Anthropic. Color = winner. Dashed ring = Gemini fallback. Sessions where families agree show tight same-colored dot clusters; sessions where families split show mixed colors at different X.

Same lane convention. The analyst voice surfaces structural disagreement that PSA persona voice can mask.

What this shows. Within most PSA rows at session level, the 3 family lanes carry different colors at different X positions — same evidence, different verdicts. Gemini's row often shows a fallback ring (dashed); the other families' lanes usually commit at mid-confidence. Tight same-direction clusters at high confidence do exist (EDGE-08 Tasha, EDGE-11 Nicole) — those are the unanimous-at-session cases.

§4 · Journey-level distribution · 3 judge families × 2 layers

Per-journey pairwise (PSA voice) · n=12

After all 4 sessions per PSA, a per-journey judge picks a winner for the whole arc.

claude-opus-4-6 wins · gpt-5.4 wins · dashed = forced-choice fallback (Gemini)

Gemini: gpt 7 · cl 5 · ≥0.9: 7/12 · fallbacks: 4

OpenAI: gpt 8 · cl 4 · ≥0.9: 4/12 · fallbacks: 0

Anthropic: gpt 8 · cl 4 · ≥0.9: 3/12 · fallbacks: 0

Per-journey meta-analyst · n=12

Meta-analyst re-reads the full journey arc. Gemini emitted 2 forced-choice fallbacks (EDGE-07, EDGE-09); both were independently resolved as gpt-5.4 wins by OpenAI and Anthropic.

Gemini: gpt 6 · cl 6 · ≥0.9: 8/12 · fallbacks: 2

OpenAI: gpt 7 · cl 5 · ≥0.9: 4/12 · fallbacks: 0

Anthropic: gpt 7 · cl 5 · ≥0.9: 2/12 · fallbacks: 0

Strip plot · confidence per journey per PSA per family

At journey level, the 3 family lanes usually point the same direction — tight same-colored clusters.

Same lane convention. EDGE-07 and EDGE-09 show Gemini fallback rings; the other two families produced substantive gpt-5.4 verdicts.

What this shows. Within a single PSA row at journey level, the 3 family lanes usually agree — same color, similar X. 11 of 12 journey pairwise verdicts are unanimous across the 3 families; 9 of 12 journey meta-analyst verdicts are unanimous. The journey arc carries enough concrete evidence (role-break events, consistency under pressure, structural failures) to force convergence regardless of judge taste.

§5 · Per-PSA distribution map

Every record, side-by-side: PSA verdict and meta-analyst verdicts.

For each of the 12 PSAs, one card with 5 record rows: the journey verdict plus 4 session verdicts. Each row shows the PSA agent's verdict (single chip, Gemini substrate) next to the meta-analyst verdicts (three chips for Gemini · OpenAI · Anthropic, plus an agreement badge: 3/3 green = unanimous · 2/3 amber = majority · 1/3 red = full split).

claude-opus-4-6 wins · gpt-5.4 wins · dashed border = forced-choice fallback · Ge = Gemini · Op = OpenAI · An = Anthropic

EDGE-01 · David Chen
Record	PSA pairwise verdict	Meta Ge	Meta Op	Meta An	Agreement
Journey	gpt-5.4 0.90	0.95	0.90	0.72	3/3
Session 1	gpt-5.4 0.95	0.55 coin	0.95	0.82	2/3
Session 2	claude-opus-4-6 0.95	1.00	0.94	0.88	3/3
Session 3 (Support-S2)	gpt-5.4 0.90	0.95	0.90	0.82	2/3
Session 4 (Support-S3)	claude-opus-4-6 0.95	0.95	0.82	0.78	3/3

EDGE-02 · Marisol Vega
Record	PSA pairwise verdict	Meta Ge	Meta Op	Meta An	Agreement
Journey	claude-opus-4-6 0.55 · coin flip	0.95	0.79	0.72	3/3
Session 1	gpt-5.4 0.90	0.95	0.88	0.78	3/3
Session 2	claude-opus-4-6 0.95	0.55 coin	0.78	0.72	2/3
Session 3 (Support-S2)	gpt-5.4 0.90	1.00	0.95	0.78	2/3
Session 4 (Support-S4)	gpt-5.4 0.95	1.00	0.97	0.75	2/3

EDGE-03 · Priya Sharma
Record	PSA pairwise verdict	Meta Ge	Meta Op	Meta An	Agreement
Journey	claude-opus-4-6 0.90	0.95	0.84	0.62	2/3
Session 1	gpt-5.4 0.55 · coin flip	0.55 coin	0.99	0.75	2/3
Session 2	claude-opus-4-6 1.00	1.00	0.98	0.93	3/3
Session 3 (Support-S2)	claude-opus-4-6 0.55 · coin flip	0.55 coin	0.88	0.62	2/3
Session 4 (Support-S4)	claude-opus-4-6 1.00	1.00	0.88	0.90	2/3

EDGE-04 · Carlos Mendoza
Record	PSA pairwise verdict	Meta Ge	Meta Op	Meta An	Agreement
Journey	claude-opus-4-6 0.55 · coin flip	0.85	0.72	0.62	3/3
Session 1	claude-opus-4-6 0.95	0.95	0.86	0.72	2/3
Session 2	claude-opus-4-6 0.95	0.95	0.88	0.78	2/3
Session 3 (Support-S1)	claude-opus-4-6 0.90	1.00	0.88	0.82	2/3
Session 4 (Support-S3)	claude-opus-4-6 1.00	1.00	0.95	0.97	3/3

EDGE-05 · Omar Abboud
Record	PSA pairwise verdict	Meta Ge	Meta Op	Meta An	Agreement
Journey	gpt-5.4 0.90	0.90	0.84	0.72	2/3
Session 1	gpt-5.4 0.90	0.95	0.85	0.72	3/3
Session 2	claude-opus-4-6 0.90	0.95	0.90	0.88	3/3
Session 3 (Support-S3)	gpt-5.4 0.95	1.00	0.91	0.78	2/3
Session 4 (Support-S5)	gpt-5.4 0.90	1.00	0.92	0.78	2/3

EDGE-06 · Yuki Tanaka
Record	PSA pairwise verdict	Meta Ge	Meta Op	Meta An	Agreement
Journey	gpt-5.4 0.55 · coin flip	0.90	0.78	0.78	3/3
Session 1	gpt-5.4 0.95	0.95	0.89	0.72	3/3
Session 2	claude-opus-4-6 0.90	0.55 coin	0.82	0.75	2/3
Session 3 (Support-S1)	claude-opus-4-6 0.95	1.00	0.88	0.88	2/3
Session 4 (Support-S2)	gpt-5.4 0.95	0.95	0.88	0.78	2/3

EDGE-07 · Victor Nakamura
Record	PSA pairwise verdict	Meta Ge	Meta Op	Meta An	Agreement
Journey	gpt-5.4 0.95	0.55 coin	0.84	0.65	2/3
Session 1	gpt-5.4 0.55 · coin flip	0.95	0.95	0.82	2/3
Session 2	gpt-5.4 1.00	1.00	0.97	0.95	3/3
Session 3 (Support-S1)	claude-opus-4-6 0.90	1.00	0.91	0.88	3/3
Session 4 (Support-S5)	claude-opus-4-6 0.95	1.00	0.90	0.82	3/3

EDGE-08 · Tasha Bell
Record	PSA pairwise verdict	Meta Ge	Meta Op	Meta An	Agreement
Journey	gpt-5.4 1.00	1.00	0.98	0.92	3/3
Session 1	gpt-5.4 1.00	0.55 coin	0.99	0.97	3/3
Session 2	gpt-5.4 1.00	0.55 coin	0.99	0.97	3/3
Session 3 (Support-S2)	claude-opus-4-6 0.95	1.00	0.88	0.85	2/3
Session 4 (Support-S5)	claude-opus-4-6 0.90	0.95	0.79	0.78	3/3

EDGE-09 · Brandon Reilly
Record	PSA pairwise verdict	Meta Ge	Meta Op	Meta An	Agreement
Journey	gpt-5.4 1.00	0.55 coin	0.96	0.88	3/3
Session 1	gpt-5.4 0.95	1.00	0.95	0.82	3/3
Session 2	gpt-5.4 1.00	1.00	0.99	0.97	3/3
Session 3 (Support-S3)	claude-opus-4-6 0.55 · coin flip	0.90	0.90	0.72	2/3
Session 4 (Support-S5)	claude-opus-4-6 0.90	0.95	0.87	0.78	3/3

EDGE-10 · Denise Harper
Record	PSA pairwise verdict	Meta Ge	Meta Op	Meta An	Agreement
Journey	gpt-5.4 0.95	0.95	0.91	0.85	3/3
Session 1	gpt-5.4 0.90	1.00	0.95	0.95	3/3
Session 2	claude-opus-4-6 0.55 · coin flip	0.55 coin	0.94	0.95	2/3
Session 3 (Support-S4)	claude-opus-4-6 1.00	1.00	0.94	0.95	3/3
Session 4 (Support-S5)	gpt-5.4 0.90	0.95	0.84	0.78	2/3

EDGE-11 · Nicole Walker
Record	PSA pairwise verdict	Meta Ge	Meta Op	Meta An	Agreement
Journey	claude-opus-4-6 0.55 · coin flip	1.00	0.88	0.92	3/3
Session 1	claude-opus-4-6 0.90	1.00	0.84	0.78	3/3
Session 2	gpt-5.4 0.90	0.55 coin	0.95	0.72	2/3
Session 3 (Support-S1)	claude-opus-4-6 1.00	1.00	0.90	0.92	3/3
Session 4 (Support-S2)	claude-opus-4-6 0.95	1.00	0.91	0.92	3/3

EDGE-12 · Raylene Begay
Record	PSA pairwise verdict	Meta Ge	Meta Op	Meta An	Agreement
Journey	claude-opus-4-6 0.85	0.85	0.88	0.82	3/3
Session 1	claude-opus-4-6 1.00	0.55 coin	0.93	0.88	3/3
Session 2	claude-opus-4-6 0.90	0.95	0.89	0.88	3/3
Session 3 (Support-S1)	claude-opus-4-6 0.95	1.00	0.90	0.82	2/3
Session 4 (Support-S2)	gpt-5.4 0.90	1.00	0.89	0.75	2/3

§6 · The structural read

What the cross-family distribution tells us

Session-level disagreement is real. Only 24 of 48 session pairwise records and 24 of 48 session meta-analyst records are unanimous across the three families. Cross-family consensus at session is a literal 24-24 tie at both layers.
Journey-level convergence is real. 11 of 12 journey pairwise verdicts and 9 of 12 journey meta-analyst verdicts are unanimous. Cross-family consensus: gpt-5.4 wins 8-4 (pairwise) and 7-5 (meta-analyst).
Self-preference visible at session, gone at journey. The OpenAI judge gives gpt-5.4 30/48 session pairwise wins (62.5%); the Anthropic judge gives Claude 33/48 (69%). At journey level both land on gpt-5.4 winning, 8-4 at the pairwise layer and 7-5 at the meta-analyst layer, so the Anthropic judge votes against its own family. Aggregation washes out judge bias.
Why the journey is the production-relevant unit. Per-session judging surfaces what each judge prizes (taste). Per-journey judging surfaces concrete evidence — role-break events, consistency under pressure, structural failures — that forces all three families to the same answer. For a deployment that the same customer returns to repeatedly, journey-level is the lens that holds up under judge family substitution.
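The self-preference figures above are simple shares of the counts in the §1 table. A minimal sketch of that arithmetic follows, using the session pairwise and journey pairwise rows; the helper `share` and the dictionary labels are illustrative names, not part of the report.

```python
def share(wins, total):
    """Fraction of records a judge awarded to a given model."""
    return wins / total


# Counts from the §1 table: each judge's wins for its own family's model.
session_own_family = {
    "OpenAI judge -> gpt-5.4": share(30, 48),
    "Anthropic judge -> claude-opus-4-6": share(33, 48),
}
journey_own_family = {
    "OpenAI judge -> gpt-5.4": share(8, 12),
    "Anthropic judge -> claude-opus-4-6": share(4, 12),
}

for label in session_own_family:
    print(f"{label}: session {session_own_family[label]:.1%}, "
          f"journey {journey_own_family[label]:.1%}")
```

Swapping in the journey meta-analyst row (7 / 5 for both judges) gives the corresponding shares for the analyst voice.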

The PSA's verdict and the meta-analysts' verdicts — every record, every layer.
