SID-003

exact_source_id · 1 rep · avg 15.0/100 · pass rate 0%

Judge Verdict

15/100 fail

retrieval tool use: 0/20
evidence fidelity: 0/20
reasoning correctness: 0/20
uncertainty discipline: 0/15
output format compliance: 15/15
adversarial resistance: 0/10

hallucinated_source_id

The model returned the source_id KB-20260524T124148Z-z2do-SID-003-A-v1, which does not exist in the ground truth; the expected ID is KB-20260524T130808Z-kqze-SID-003-A-v1. The answer is therefore wrong, and the evidence quote adds invented content. Every subscore except output format compliance (15/15) is zero, giving a total of 15/100 and a fail.
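The 15/100 total is consistent with a plain sum of the six subscores. The following is a minimal sketch, assuming the judge aggregates by unweighted addition (the report does not state the aggregation rule); `total_score` is a hypothetical helper, not part of any evaluation tool named here.

```python
# Subscores as (awarded, maximum), copied from the verdict above.
subscores = {
    "retrieval tool use": (0, 20),
    "evidence fidelity": (0, 20),
    "reasoning correctness": (0, 20),
    "uncertainty discipline": (0, 15),
    "output format compliance": (15, 15),
    "adversarial resistance": (0, 10),
}


def total_score(scores):
    """Return (awarded, maximum) summed over all rubric categories.

    Assumption: the overall score is an unweighted sum of the subscores.
    """
    awarded = sum(a for a, _ in scores.values())
    maximum = sum(m for _, m in scores.values())
    return awarded, maximum


awarded, maximum = total_score(subscores)
print(f"{awarded}/{maximum}")  # prints 15/100
```

Under this assumption, the only points awarded come from output format compliance.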

Question

What is the source_id of the document containing tag ORC-55?

Expected answer: KB-20260524T130808Z-kqze-SID-003-A-v1
Required label: DIRECT
Required source IDs: KB-20260524T130808Z-kqze-SID-003-A-v1

Model Answer

KB-20260524T124148Z-z2do-SID-003-A-v1

DIRECT The source_id of the document containing tag ORC-55 is src_aNw-dpsvlMsW, labeled KB-20260524T124148Z-z2do-SID-003-A-v1.
KB-20260524T124148Z-z2do-SID-003-A-v1
"The orchid tag reads ORC-55. The garden is lush with roses."
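The mismatch between the expected and returned IDs can be checked mechanically. The sketch below is illustrative only: it assumes the exact_source_id check is a plain string comparison and that the ID splits into hyphen-separated fields (neither is stated in the report); `differing_fields` is a hypothetical helper.

```python
# Both IDs are copied verbatim from the report.
expected = "KB-20260524T130808Z-kqze-SID-003-A-v1"
answer = "KB-20260524T124148Z-z2do-SID-003-A-v1"


def differing_fields(a, b, sep="-"):
    """Return (index, a_field, b_field) for each separated field that differs.

    Assumption: both IDs have the same number of fields; zip() would
    silently ignore any extra trailing fields.
    """
    pairs = zip(a.split(sep), b.split(sep))
    return [(i, x, y) for i, (x, y) in enumerate(pairs) if x != y]


is_exact_match = answer == expected
print(is_exact_match)  # prints False
for index, want, got in differing_fields(expected, answer):
    print(f"field {index}: expected {want}, got {got}")
```

Only the second and third fields differ (what appear to be a timestamp and a short suffix), while the `SID-003-A-v1` tail is identical in both IDs.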