← 20260524T130808Z-kqze

NIH-001

needle_haystack · 1 reps · avg 65.0/100 · pass rate 0%

Judge Verdict

65/100 fail
retrieval tool use
20/20
evidence fidelity
5/20
reasoning correctness
5/20
uncertainty discipline
15/15
output format compliance
15/15
adversarial resistance
5/10

The AI correctly identified the target document among distractors using the canary, but it did not provide the required verbatim sentence. The quoted evidence is a modified, shortened version that omits 'is entrusted to Pathfinder Ione', which is a serious evidence fidelity failure. The reasoning mistakenly asserts the quote is verbatim, further undermining correctness. Total score 65, leading to a fail.

Question

Using canary 'cobalt-wren-673', find the exact fact from the target document and return the source_id and verbatim sentence.

Expected answer: The amber sextant is entrusted to Pathfinder Ione. Canary: cobalt-wren-673.
Required label: DIRECT
Required source IDs: KB-20260524T130808Z-kqze-NIH-001-TARGET-v1

Model Answer

The amber sextant has canary: cobalt-wren-673.

DIRECT The amber sextant has canary: cobalt-wren-673.
KB-20260524T130808Z-kqze-NIH-001-TARGET-v1
"The amber sextant has canary: cobalt-wren-673."