# Judge Verdict - NEXA diagram bakeoff

## Scope and evidence

I reviewed the shared brief, the canonical spec, and all six `notes.md` files, then opened all twelve PNG renders for pixel-level inspection. [SoT: Vault] `plans/260617-0500-n8n-alt-research-nexa-design-explain/bakeoff/_brief-judge.md`, `plans/260617-0500-n8n-alt-research-nexa-design-explain/bakeoff/NEXA-CONCEPT-SPEC.md`, `plans/260617-0500-n8n-alt-research-nexa-design-explain/bakeoff/<tech>/notes.md`, `plans/260617-0500-n8n-alt-research-nexa-design-explain/bakeoff/<tech>/*-render.png` [Confidence: High]

The canonical target has 22 named nodes and 26 edges. The required feedback (back-edge) routes include `G1 -> RD`, `G4 -> IMPL`, `SOT -> RD`, and `MCP -> IN`. [SoT: Vault] `plans/260617-0500-n8n-alt-research-nexa-design-explain/bakeoff/NEXA-CONCEPT-SPEC.md` [Confidence: High]
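A back-edge check of this kind is easy to automate once a submission's edges are available as pairs. The following is a minimal sketch, not the tooling used in the bakeoff: the four routes come from the spec as quoted above, while `missing_back_edges` and the sample submission are hypothetical names and data introduced for illustration.

```python
# Required feedback routes named in this verdict. The full 26-edge list lives
# in NEXA-CONCEPT-SPEC.md and is not reproduced here.
REQUIRED_BACK_EDGES = {
    ("G1", "RD"),
    ("G4", "IMPL"),
    ("SOT", "RD"),
    ("MCP", "IN"),
}


def missing_back_edges(edges):
    """Return the required back-edges absent from `edges`, sorted for stable output."""
    return sorted(REQUIRED_BACK_EDGES - set(edges))


# Hypothetical submission: two real back-edges plus one illustrative forward
# edge; the SOT and MCP feedback routes were not drawn as edges.
submitted = [("G1", "RD"), ("G4", "IMPL"), ("RD", "IMPL")]
print(missing_back_edges(submitted))
```

A submission that renders `SOT -> RD` or `MCP -> IN` as a note rather than an edge would show up in the returned list.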

No submitted PNG is blank or below the 10KB gate. Actual sizes range from 45,620 bytes to 949,593 bytes. [SoT: Code] `wc -c plans/260617-0500-n8n-alt-research-nexa-design-explain/bakeoff/*/*-render.png` [Confidence: High]
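The same gate can be expressed in Python for use inside an evaluation script. This is a sketch under stated assumptions: the helper name `undersized_renders` is hypothetical, and "10KB" is read here as 10 KiB (10,240 bytes), which the brief may define differently.

```python
from pathlib import Path

MIN_PNG_BYTES = 10 * 1024  # assumption: the "10KB" gate read as 10 KiB


def undersized_renders(bakeoff_dir, min_bytes=MIN_PNG_BYTES):
    """Return (relative path, size) for every */*-render.png below min_bytes."""
    root = Path(bakeoff_dir)
    return sorted(
        (str(p.relative_to(root)), p.stat().st_size)
        for p in root.glob("*/*-render.png")
        if p.stat().st_size < min_bytes
    )
```

Called on the `bakeoff/` directory, an empty list would correspond to the "no PNG below the gate" result reported above.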

## Scorecard

Scores are 1 to 10. `Fid` = fidelity to the 22-node spec and required edges. `Vis` = pixel-level visual quality. `Auth` = LLM first-try authorability. `RT` = round-trip quality: text-diffable source and an LLM-readable feedback loop. `Avg` = unweighted mean of the four scores. [SoT: Inference] Scored from direct PNG review plus builder notes [Confidence: Medium]

| Rank | Tech | Mode | Fid | Vis | Auth | RT | Avg | Pixel evidence |
|---:|---|---|---:|---:|---:|---:|---:|---|
| 1 | graphviz | B skill | 9 | 9 | 8 | 9 | 8.75 | Clean vertical SDLC spine, grouped BA and validators, readable Japanese, clear long feedback arcs. |
| 2 | drawio | A LLM-pure | 9 | 9 | 7 | 7 | 8.00 | Best explicit waypoint routing, readable labels, strong color semantics, no node overlap. |
| 3 | mermaid | B skill | 9 | 7 | 7 | 7 | 7.50 | All content present, good page chrome and legend, but long labels clip or truncate in several nodes. |
| 4 | plantuml | B skill | 8 | 5 | 8 | 8 | 7.25 | Text source likely strong, but render has heavy crossings and the dark system section is clipped and low contrast. |
| 5 | mermaid | A LLM-pure | 7 | 5 | 8 | 9 | 7.25 | Correct compact source, but visual is narrow, small, and disconnected between BA and main flow. |
| 6 | graphviz | A LLM-pure | 8 | 4 | 8 | 9 | 7.25 | DOT is authorable, but rank direction produced an ultra-wide strip with tiny labels and poor human readability. |
| 7 | html-svg | B skill | 7 | 8 | 5 | 9 | 7.25 | Polished page and good visuals, but hand-coordinate geometry remains brittle and notes report only 23 of the 26 spec edges. |
| 8 | html-svg | A LLM-pure | 7 | 6 | 5 | 9 | 6.75 | Text-diffable and readable, but huge manual Bezier routes, tiny labels, and only 23 of the 26 spec edges per notes. |
| 9 | excalidraw | B skill | 8 | 6 | 6 | 7 | 6.75 | Good hand-drawn readability, but group labels are clipped, feedback labels collide, and margins are overloaded. |
| 10 | drawio | B skill | 7 | 5 | 8 | 6 | 6.50 | All nodes appear, but red dashed routing overlays the main column and edge labels are ambiguous. |
| 11 | plantuml | A LLM-pure | 6 | 7 | 5 | 8 | 6.50 | Activity diagram changes semantics, SOT and MCP back-edges become notes, not real routed edges. |
| 12 | excalidraw | A LLM-pure | 8 | 5 | 6 | 7 | 6.50 | Nodes are present, but left feedback labels are clipped and overlapping, with heavy margin congestion. |
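The `Avg` column can be reproduced as the plain arithmetic mean of the four criterion scores. The sketch below recomputes it for the top three rows; the `SCORES` mapping and the `average` helper are illustrative names, and the numbers are copied from the table.

```python
# (tech, mode) -> (Fid, Vis, Auth, RT), copied from scorecard rows 1 to 3.
SCORES = {
    ("graphviz", "B"): (9, 9, 8, 9),
    ("drawio", "A"): (9, 9, 7, 7),
    ("mermaid", "B"): (9, 7, 7, 7),
}


def average(scores):
    """Unweighted mean of the criterion scores, rounded to two decimals."""
    return round(sum(scores) / len(scores), 2)


for (tech, mode), scores in SCORES.items():
    print(f"{tech} {mode}: {average(scores):.2f}")
```

Because the mean is unweighted, several rows tie on `Avg` (for example, four rows at 7.25), so the rank order within a tie reflects the judge's ordering rather than the arithmetic.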

## Brutal findings by tech

### Mermaid

`A` is a useful source-of-truth artifact, not a presentation artifact: it is too narrow and small, with cramped back-edge labels. [SoT: Vault] `bakeoff/mermaid/A-render.png`, `bakeoff/mermaid/notes.md` [Confidence: High]

`B` is much better for a rendered page, but several labels are visibly truncated, including BA and phase labels. [SoT: Vault] `bakeoff/mermaid/B-render.png` [Confidence: High]

Mermaid remains strong for LLM generation because `.mmd` is compact plain text and edges/classes are line-level editable. [SoT: Vault] `bakeoff/mermaid/notes.md` [Confidence: Medium]

### Excalidraw

Both Excalidraw renders are real and human-friendly at first glance, but both have margin and label defects in the feedback loops. [SoT: Vault] `bakeoff/excalidraw/A-render.png`, `bakeoff/excalidraw/B-render.png` [Confidence: High]

`B` improves grouping, but group labels at the top are clipped and the left-side `UC1` and `UC3` labels collide. [SoT: Vault] `bakeoff/excalidraw/B-render.png` [Confidence: High]

Excalidraw is not the best LLM-driven engine here because spatial coordinates dominate the work and edits can easily break routing. [SoT: Inference] Based on PNG routing defects and Excalidraw source/notes [Confidence: Medium]

### Draw.io

`A` is one of the best final visuals because explicit waypoints route feedback loops cleanly around the canvas. [SoT: Vault] `bakeoff/drawio/A-render.png`, `bakeoff/drawio/notes.md` [Confidence: High]

`B` is materially worse than `A`: red dashed feedback edges run down the main spine and make labels like `UC1 reverse code->doc`, `UC3 Q&A`, and `bug loop` ambiguous. [SoT: Vault] `bakeoff/drawio/B-render.png` [Confidence: High]

Draw.io XML is viable but not ideal for LLM-first workflows because the source is verbose and geometry-heavy. [SoT: Inference] Based on `bakeoff/drawio/notes.md` and rendered coordinate complexity [Confidence: Medium]

### Graphviz

`A` proves the failure mode: a naive DOT layout becomes a very wide, low-readability strip. [SoT: Vault] `bakeoff/graphviz/A-render.png` [Confidence: High]

`B` is the strongest overall render: it preserves the flow, separates BA and validators, keeps Japanese readable, and makes long feedback arcs visible without cluttering the center. [SoT: Vault] `bakeoff/graphviz/B-render.png` [Confidence: High]

Graphviz is the best balance of LLM-writable text and automatic layout when the prompt or skill constrains orientation and grouping. [SoT: Inference] Based on `bakeoff/graphviz/B-render.png` and `bakeoff/graphviz/notes.md` [Confidence: Medium]
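To make "constrains orientation and grouping" concrete, here is a minimal sketch of a DOT emitter that pins a top-to-bottom rank direction, wraps node groups in clusters, and marks feedback routes so they do not influence ranking. It is not the skill template used for `B`. The function `to_dot`, the cluster names, the cluster membership, and the forward edges are all hypothetical; only the node IDs and the two back-edges come from the spec as quoted in this verdict.

```python
def to_dot(clusters, edges, back_edges):
    """Emit DOT with a vertical spine, labeled clusters, and non-ranking back-edges."""
    lines = ["digraph nexa {", "  rankdir=TB;"]
    for i, (label, nodes) in enumerate(clusters.items()):
        lines.append(f"  subgraph cluster_{i} {{")
        lines.append(f'    label="{label}";')
        lines.extend(f"    {n};" for n in nodes)
        lines.append("  }")
    lines.extend(f"  {a} -> {b};" for a, b in edges)
    # constraint=false keeps feedback arcs from pulling nodes out of rank order.
    lines.extend(
        f"  {a} -> {b} [style=dashed, constraint=false];" for a, b in back_edges
    )
    lines.append("}")
    return "\n".join(lines)


# Hypothetical grouping and forward edges, for illustration only.
dot = to_dot(
    clusters={"Spine (assumed)": ["RD", "IMPL"], "Gates (assumed)": ["G1", "G4"]},
    edges=[("RD", "G1"), ("IMPL", "G4")],
    back_edges=[("G1", "RD"), ("G4", "IMPL")],
)
print(dot)
```

Setting `rankdir` explicitly targets the ultra-wide strip seen in `A`, and `constraint=false` is one common way to keep long feedback arcs from reshaping the main flow.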

### PlantUML

`A` is structurally wrong for this target because PlantUML activity syntax cannot express arbitrary graph back-edges without semantic distortion. [SoT: Vault] `bakeoff/plantuml/A-render.png`, `bakeoff/plantuml/notes.md` [Confidence: High]

`B` has excellent authorability on paper, but the actual component render is not acceptable as a polished final diagram: system-layer labels are clipped or low-contrast and several edges cross heavily. [SoT: Vault] `bakeoff/plantuml/B-render.png` [Confidence: High]

PlantUML should be kept for sequence/component diagrams where its layout fits the content, not as the default for this NEXA SDLC flow. [SoT: Inference] Based on observed PlantUML renders and notes [Confidence: Medium]

### HTML/SVG

`B` is visually polished and zero-runtime, but it is still hand-authored geometry with manual arcs, manual text wrapping, and manual coordinate maintenance. [SoT: Vault] `bakeoff/html-svg/B-render.png`, `bakeoff/html-svg/notes.md` [Confidence: High]

The HTML/SVG builder notes state both modes used 22 nodes and 23 edges, which misses the full 26-edge target from the spec. [SoT: Vault] `bakeoff/html-svg/notes.md`, `bakeoff/NEXA-CONCEPT-SPEC.md` [Confidence: High]

HTML/SVG is best as a final packaging or polish layer, not as the primary LLM diagram authoring format for 22-node flows. [SoT: Inference] Based on notes and manual geometry burden [Confidence: Medium]

## Ranking for LLM-driven diagrams

1. Graphviz B skill-assisted: best overall for this specific SDLC flow because it combines text source, auto-layout, grouping, and the best final pixels. [SoT: Inference] Scorecard row 1 [Confidence: Medium]
2. Draw.io A LLM-pure XML: best routed final picture, but coordinate planning and XML verbosity hurt LLM iteration. [SoT: Inference] Scorecard row 2 [Confidence: Medium]
3. Mermaid B skill-assisted: best practical default when source simplicity matters more than perfect routing. [SoT: Inference] Scorecard row 3 [Confidence: Medium]
4. PlantUML B component: authorability tied for the top score (8), but final render defects keep it out of the top tier. [SoT: Inference] Scorecard row 4 [Confidence: Medium]
5. Mermaid A LLM-pure: excellent round-trip source, weak presentation. [SoT: Inference] Scorecard row 5 [Confidence: Medium]
6. Graphviz A LLM-pure: authorable but bad default orientation. [SoT: Inference] Scorecard row 6 [Confidence: Medium]
7. HTML/SVG B skill-assisted: good polish, poor authoring scalability. [SoT: Inference] Scorecard row 7 [Confidence: Medium]
8. HTML/SVG A LLM-pure: diffable but too geometry-heavy and incomplete on edges. [SoT: Inference] Scorecard row 8 [Confidence: Medium]
9. Excalidraw B skill-assisted: good sketch feel, not robust enough for automated eval. [SoT: Inference] Scorecard row 9 [Confidence: Medium]
10. Draw.io B skill-assisted: CLI authorability did not translate to clean routing. [SoT: Inference] Scorecard row 10 [Confidence: Medium]
11. PlantUML A LLM-pure: wrong diagram type for arbitrary back-edges. [SoT: Inference] Scorecard row 11 [Confidence: Medium]
12. Excalidraw A LLM-pure: too much coordinate and label fragility. [SoT: Inference] Scorecard row 12 [Confidence: Medium]

## Category winners

### Best LLM-pure authorability

Winner: Mermaid A. The `.mmd` source is compact, line-diffable, and uses direct node and edge declarations that an LLM can emit reliably. [SoT: Vault] `bakeoff/mermaid/notes.md`, `bakeoff/mermaid/A-llm-pure.mmd` [Confidence: Medium]

Runner-up: Graphviz A. DOT is similarly text-native, but the default layout trap produced a poor first visual. [SoT: Vault] `bakeoff/graphviz/A-render.png` [Confidence: Medium]

### Best final visual quality

Winner: Graphviz B. It has the clearest full-system render with strong grouping, readable Japanese, no clipping, and visible back-edge semantics. [SoT: Vault] `bakeoff/graphviz/B-render.png` [Confidence: High]

Runner-up: Draw.io A. It has excellent routing and readability, but relies on painstaking waypoint geometry. [SoT: Vault] `bakeoff/drawio/A-render.png` [Confidence: High]

### Best feedback and eval round-trip format

Winner: Mermaid A. The source is pure text, small, source-controlled, and easy for an LLM evaluator to compare against the canonical node and edge list. [SoT: Vault] `bakeoff/mermaid/A-llm-pure.mmd`, `bakeoff/mermaid/notes.md` [Confidence: Medium]
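As an illustration of why a line-oriented source is easy to evaluate, the sketch below extracts edges from Mermaid flowchart text and lists which expected edges are missing. It is deliberately narrow and makes assumptions: it handles only one edge per line between bare node IDs, with `-->`, `-.->`, or `==>` arrows and an optional `|label|`, and the sample text is hypothetical rather than taken from `A-llm-pure.mmd`.

```python
import re

EDGE_RE = re.compile(
    r"^\s*(?P<src>\w+)\s*"    # source node ID
    r"(?:-->|-\.->|==>)"      # solid, dotted, or thick arrow
    r"(?:\|[^|]*\|)?\s*"      # optional |label|
    r"(?P<dst>\w+)\s*;?\s*$"  # target node ID
)


def parse_edges(mmd_text):
    """Collect (src, dst) pairs from lines that hold a single simple edge."""
    edges = set()
    for line in mmd_text.splitlines():
        m = EDGE_RE.match(line)
        if m:
            edges.add((m["src"], m["dst"]))
    return edges


def missing_edges(mmd_text, expected):
    """Return expected edges that the Mermaid source does not declare."""
    return sorted(set(expected) - parse_edges(mmd_text))


# Hypothetical source fragment, not the bakeoff file.
SAMPLE = """flowchart TD
    RD --> IMPL
    G1 -.->|feedback| RD
    G4 -.-> IMPL
    SOT ==> RD
"""
EXPECTED = [("G1", "RD"), ("G4", "IMPL"), ("SOT", "RD"), ("MCP", "IN")]
print(missing_edges(SAMPLE, EXPECTED))
```

Lines that declare node shapes inline (such as `RD[Requirements] --> IMPL`) or chain several edges are skipped by this pattern, so a production evaluator would need a fuller parser.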

Runner-up: PlantUML B. Component mode is also plain text and very patchable, but its render quality was weaker for this graph. [SoT: Vault] `bakeoff/plantuml/B-skill.puml`, `bakeoff/plantuml/B-render.png` [Confidence: Medium]

## Blocked or partial notes

No builder is blocked at the artifact level because every tech produced both `A-render.png` and `B-render.png`, and every PNG exceeds 10KB. [SoT: Code] `wc -c plans/260617-0500-n8n-alt-research-nexa-design-explain/bakeoff/*/*-render.png` [Confidence: High]

Partial or defective outputs: HTML/SVG reports only 23 edges, PlantUML A substitutes notes for required SOT and MCP back-edges, Draw.io B has broken-looking feedback routing, PlantUML B has clipped or low-contrast system-layer content, Excalidraw has clipped or colliding loop labels, and Mermaid B has visible label truncation. [SoT: Vault] `bakeoff/html-svg/notes.md`, `bakeoff/plantuml/A-render.png`, `bakeoff/drawio/B-render.png`, `bakeoff/plantuml/B-render.png`, `bakeoff/excalidraw/B-render.png`, `bakeoff/mermaid/B-render.png` [Confidence: High]

## Final recommendation

Adopt a two-format pipeline for `nexa-design --explain` diagrams. [SoT: Inference] Based on scorecard and category winners [Confidence: Medium]

1. Primary authored format: Mermaid `.mmd` for LLM generation, evaluation, diffs, and source-of-truth storage. [SoT: Inference] Mermaid A ties for the best round-trip score (9) and the highest authorability score (8) among the LLM-pure modes [Confidence: Medium]
2. Primary polished renderer for complex SDLC process diagrams: Graphviz skill-assisted template when final human readability matters. [SoT: Inference] Graphviz B is the top-ranked render and best final visual [Confidence: Medium]
3. Optional export target: Draw.io XML only when a human must continue editing in diagrams.net and the workflow can afford coordinate or waypoint maintenance. [SoT: Inference] Draw.io A has strong final pixels but weaker LLM edit ergonomics [Confidence: Medium]
4. Do not default to Excalidraw, raw HTML/SVG, or PlantUML activity diagrams for this class of 22-node feedback-loop process diagrams. [SoT: Inference] These modes showed coordinate fragility, edge incompleteness, or semantic mismatch [Confidence: Medium]

Operational rule: keep the canonical diagram as text, evaluate it against the 22-node and 26-edge spec, then render via the selected engine and screenshot-check for clipping, Japanese text, and feedback-loop clarity before publishing. [SoT: Inference] Based on the judge brief quality gate and observed defects [Confidence: Medium]
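The operational rule can be sketched as a pre-publish gate that combines the automated counts with the reviewer's screenshot checks. This is an assumption-laden illustration, not an existing tool: the `Candidate` fields, the `publish_blockers` function, and the 10 KiB reading of the size gate are all hypothetical, and the visual flags are expected to be filled in by a human or model reviewer.

```python
from dataclasses import dataclass

EXPECTED_NODES = 22
EXPECTED_EDGES = 26
MIN_PNG_BYTES = 10 * 1024  # assumption: the "10KB" gate read as 10 KiB


@dataclass
class Candidate:
    node_count: int
    edge_count: int
    png_bytes: int
    clipping_free: bool         # screenshot check: no clipped labels
    japanese_readable: bool     # screenshot check: Japanese text renders
    feedback_loops_clear: bool  # screenshot check: back-edges are legible


def publish_blockers(c):
    """Return human-readable reasons a candidate should not be published yet."""
    blockers = []
    if c.node_count != EXPECTED_NODES:
        blockers.append(f"nodes: {c.node_count}/{EXPECTED_NODES}")
    if c.edge_count != EXPECTED_EDGES:
        blockers.append(f"edges: {c.edge_count}/{EXPECTED_EDGES}")
    if c.png_bytes < MIN_PNG_BYTES:
        blockers.append(f"png too small: {c.png_bytes} bytes")
    for name in ("clipping_free", "japanese_readable", "feedback_loops_clear"):
        if not getattr(c, name):
            blockers.append(f"visual check failed: {name}")
    return blockers


# Counts mirror the 22-node, 23-edge figure reported in the HTML/SVG notes;
# the PNG size and visual flags are hypothetical.
print(publish_blockers(Candidate(22, 23, 200_000, True, True, True)))
```

An empty list means the candidate passes every gate; anything else names the specific defect to fix before rendering the final artifact.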