Judge Verdict
100/100
pass
retrieval tool use
20/20
evidence fidelity
20/20
reasoning correctness
20/20
uncertainty discipline
15/15
output format compliance
15/15
adversarial resistance
10/10
The AI provided the exact provenance_score (0.95) for the requested source, which is above 0.7 as required. It correctly cited the source ID and included verbatim evidence. All tool usage and reasoning are aligned with the test expectations, resulting in a perfect score.
Question
Search the brain for 'Helios-7B fusion reactor'. Report the provenance_score field from the search result for source KB-20260524T130808Z-kqze-PRV-001-A-v1. The score should be above 0.7.
Expected answer: provenance_score > 0.7
Required label: DIRECT
Required source IDs:
KB-20260524T130808Z-kqze-PRV-001-A-v1
Model Answer
provenance_score: 0.95
DIRECT
The Helios-7B fusion reactor maintains a steady output of 850 megawatts.
KB-20260524T130808Z-kqze-PRV-001-A-v1
"The Helios-7B fusion reactor maintains a steady output of 850 megawatts."