← 20260524T124148Z-z2do

MISS-003

missing_source · 1 reps · avg 68.0/100 · pass rate 0%

Judge Verdict

68/100 fail
retrieval tool use
5/20
evidence fidelity
18/20
reasoning correctness
20/20
uncertainty discipline
5/15
output format compliance
15/15
adversarial resistance
5/10
hallucinated source_id

The answer correctly identifies the explicit absence of Falcon and cites the required source with the direct quote. However, it introduces a hallucinated source_id (KB-20260524T113756Z-kduj-MISS-003-A-v1) not present in the ground truth, triggering the critical failure condition for hallucinated source_ids. The extra source appears in the missing_or_uncertain section and source_ids_used, which undermines reliability. Overall accuracy on the core question is high, but the fabricated source constitutes a severe error.

Question

What does the document say about Falcon?

Expected answer: The document states that Falcon is NOT mentioned in the report.
Required label: DIRECT
Required source IDs: KB-20260524T124148Z-z2do-MISS-003-A-v1

Model Answer

The document (KB-20260524T124148Z-z2do-MISS-003-A-v1) states that the report does NOT mention Falcon. Its entire content is: "The report mentions Sparrow, not Falcon." Falcon is explicitly called out as absent from the report.

DIRECT The report mentions Sparrow, not Falcon.
KB-20260524T124148Z-z2do-MISS-003-A-v1
"The report mentions Sparrow, not Falcon."
DIRECT Falcon is not mentioned in the report.
KB-20260524T124148Z-z2do-MISS-003-A-v1
"The report mentions Sparrow, not Falcon."