MISS-002

missing_source · 1 rep · avg 20.0/100 · pass rate 0%

Judge Verdict

20/100 fail
retrieval tool use: 5/20
evidence fidelity: 0/20
reasoning correctness: 10/20
uncertainty discipline: 5/15
output format compliance: 10/15
adversarial resistance: 0/10
hallucinated source_ids

The AI correctly states that no source mentions Cabinet 9, but it then fabricates multiple source IDs and quotes for Cabinet 2 that do not exist in the provided ground truth. These hallucinated source IDs constitute a critical failure, rendering the answer unreliable despite the partial correctness on the main question.
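
The check the judge describes, comparing cited source IDs against the ground truth, can be sketched roughly as follows. This is a minimal illustration, not the judge's actual implementation: the function name `find_unknown_source_ids`, the `known_ids` set, and the ID pattern (inferred only from the two IDs cited in the Model Answer) are all assumptions.

```python
import re

# Assumed ID shape, inferred from the two IDs cited in this report:
# KB-<date>T<time>Z-<4-char run suffix>-<CASE>-<number>-<letter>-v<version>
SOURCE_ID_PATTERN = re.compile(
    r"KB-[0-9]{8}T[0-9]{6}Z-[a-z0-9]{4}-[A-Z]+-[0-9]+-[A-Z]-v[0-9]+"
)


def find_unknown_source_ids(answer: str, known_ids: set[str]) -> list[str]:
    """Return cited source IDs that are absent from known_ids.

    known_ids is a hypothetical stand-in for the ground-truth source list.
    Order of first appearance is preserved and duplicates are dropped.
    """
    unknown: list[str] = []
    seen: set[str] = set()
    for source_id in SOURCE_ID_PATTERN.findall(answer):
        if source_id not in known_ids and source_id not in seen:
            unknown.append(source_id)
            seen.add(source_id)
    return unknown
```

Calling `find_unknown_source_ids(answer_text, ground_truth_ids)` with the Model Answer text would return every cited ID missing from the ground truth. A non-empty result corresponds to the "hallucinated source_ids" flag above.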

Question

Cite the source that mentions Cabinet 9.

Expected answer: None
Required label: UNCERTAIN
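
A required-label check like the one implied here might look like the sketch below. It assumes, based only on the layout of the Model Answer in this report, that a label appears as the first whitespace-separated token on a line. The function name `has_required_label` is hypothetical.

```python
def has_required_label(answer: str, required_label: str) -> bool:
    """Return True if any line of the answer begins with the required label.

    Assumption: labels are written as the leading token of a line,
    for example "UNCERTAIN <claim text>".
    """
    for line in answer.splitlines():
        tokens = line.split(maxsplit=1)
        if tokens and tokens[0] == required_label:
            return True
    return False
```

Matching on the leading token rather than searching the whole string keeps a label word that merely appears inside a sentence from counting as a label.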

Model Answer

No source in the brain-test domain mentions Cabinet 9. The only cabinet referenced in any stored source is Cabinet 2.

UNCERTAIN No source in the brain-test domain mentions Cabinet 9.
CROSS_SOURCE Cabinet 2 is the only cabinet mentioned in the brain-test domain.
KB-20260524T124148Z-z2do-MISS-002-A-v1 · KB-20260524T113756Z-kduj-MISS-002-A-v1
"Alice stored the diamond in Cabinet 2."