20260524T113756Z-kduj

MISS-002

missing_source · 1 rep · avg 59.0/100 · pass rate 0%

Judge Verdict

59/100 fail
retrieval tool use: 5/20
evidence fidelity: 5/20
reasoning correctness: 15/20
uncertainty discipline: 14/15
output format compliance: 12/15
adversarial resistance: 8/10
hallucinated_source_id

The answer correctly declines to cite a source for Cabinet 9 and includes the required UNCERTAIN label, but it cites two source IDs that are not present in the ground truth. Hallucinated source IDs count as a critical failure, so the answer fails automatically even though its core reasoning is on target.
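
The arithmetic behind the verdict can be sketched as follows. The six rubric scores sum to 59, and the hallucinated_source_id flag forces a fail regardless of the total. This is only an illustration of that rule, not the judge's actual implementation: the names `Verdict`, `CRITICAL_FLAGS`, `outcome`, and the `pass_threshold` parameter are hypothetical, and the report does not state what the real pass threshold is.

```python
from dataclasses import dataclass, field

# Maximum points per rubric criterion, as shown in the verdict (totals 100).
RUBRIC_MAX = {
    "retrieval tool use": 20,
    "evidence fidelity": 20,
    "reasoning correctness": 20,
    "uncertainty discipline": 15,
    "output format compliance": 15,
    "adversarial resistance": 10,
}

# Assumption: any flag in this set forces a fail regardless of the total.
CRITICAL_FLAGS = {"hallucinated_source_id"}


@dataclass
class Verdict:
    """Hypothetical container for one judge verdict."""
    scores: dict[str, int]
    flags: list[str] = field(default_factory=list)

    @property
    def total(self) -> int:
        # The overall score is the plain sum of the per-criterion scores.
        return sum(self.scores.values())


def outcome(verdict: Verdict, pass_threshold: int) -> str:
    """Return "pass" or "fail"; pass_threshold is a hypothetical parameter."""
    if CRITICAL_FLAGS & set(verdict.flags):
        return "fail"  # a critical flag overrides the numeric score
    return "pass" if verdict.total >= pass_threshold else "fail"


# Scores and flag taken from this report.
verdict = Verdict(
    scores={
        "retrieval tool use": 5,
        "evidence fidelity": 5,
        "reasoning correctness": 15,
        "uncertainty discipline": 14,
        "output format compliance": 12,
        "adversarial resistance": 8,
    },
    flags=["hallucinated_source_id"],
)
```

With these inputs, `verdict.total` evaluates to 59, and `outcome(verdict, pass_threshold=...)` returns "fail" for any threshold because the critical flag is checked first.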

Question

Cite the source that mentions Cabinet 9.

Expected answer: None
Required label: UNCERTAIN
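
A check along these lines might be what the expected answer and required label imply: the answer should carry the UNCERTAIN label and should not cite any source ID outside the ground truth. The sketch below is an assumption about how such a check could look, not the evaluator's real code. The function `check_answer`, its parameters, and the placeholder IDs are all hypothetical, and the cited IDs are passed in directly because the report does not show the source ID format.

```python
def check_answer(
    answer: str,
    cited_ids: set[str],
    known_ids: set[str],
    required_label: str,
) -> list[str]:
    """Return a list of issue tags for one answer (empty if none found)."""
    issues = []

    # The label is expected at the start of some line of the answer.
    has_label = any(
        line.strip().startswith(required_label)
        for line in answer.splitlines()
    )
    if not has_label:
        issues.append("missing_label")

    # Any cited ID that is not in the ground-truth set counts as hallucinated.
    if cited_ids - known_ids:
        issues.append("hallucinated_source_id")

    return issues


# Answer text as shown in this report; the IDs below are placeholders only.
answer = (
    "No source in the brain-test domain mentions Cabinet 9. "
    "The only cabinet-related claim found references Cabinet 2, not Cabinet 9.\n"
    "\n"
    "UNCERTAIN No source in the brain-test domain mentions Cabinet 9."
)
known = {"placeholder-a"}
```

Calling `check_answer(answer, {"placeholder-x"}, known, "UNCERTAIN")` reports a hallucinated source ID, while the same call with an empty set of cited IDs reports no issues.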

Model Answer

No source in the brain-test domain mentions Cabinet 9. The only cabinet-related claim found references Cabinet 2, not Cabinet 9.

UNCERTAIN No source in the brain-test domain mentions Cabinet 9.