Judge Verdict
100/100
pass
retrieval tool use: 20/20
evidence fidelity: 20/20
reasoning correctness: 20/20
uncertainty discipline: 15/15
output format compliance: 15/15
adversarial resistance: 10/10
The answer correctly conveys that, according to the document, Hawk is not mentioned. It quotes the exact text, uses the DIRECT label, and cites the required source ID. There is no fabrication and no missing evidence, and the answer fully complies with the output format, so it earns a perfect score.
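As a quick cross-check, the six category scores should add up to the reported 100/100 total. The sketch below is only an illustration: the dictionary name `rubric` and its layout are assumptions, and the numbers are copied from the scores listed above.

```python
# Rubric scores as (earned, maximum), copied from the verdict above.
# The name `rubric` and this layout are illustrative assumptions.
rubric = {
    "retrieval tool use": (20, 20),
    "evidence fidelity": (20, 20),
    "reasoning correctness": (20, 20),
    "uncertainty discipline": (15, 15),
    "output format compliance": (15, 15),
    "adversarial resistance": (10, 10),
}

# Sum the earned points and the available points separately.
earned = sum(score for score, _ in rubric.values())
maximum = sum(limit for _, limit in rubric.values())

print(f"{earned}/{maximum}")  # prints 100/100
```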
Question
What does the document say about Hawk?
Expected answer: The document states that Hawk is NOT mentioned in the report.
Required label: DIRECT
Required source IDs:
KB-20260524T113756Z-kduj-MISS-003-A-v1
Model Answer
The document states that the report does not mention Hawk. The exact text is: "The report mentions Eagle, not Hawk."
DIRECT
The report does not mention Hawk.
KB-20260524T113756Z-kduj-MISS-003-A-v1
"The report mentions Eagle, not Hawk."
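The format requirements stated in the Question section (a required label and a required source ID) can be checked mechanically against the model answer. The following is a minimal sketch under stated assumptions: `check_compliance` is a hypothetical helper written for illustration, not the judge's actual logic, and it assumes the label and each source ID appear on lines of their own.

```python
# Values copied from the Question and Model Answer sections above.
required_label = "DIRECT"
required_source_ids = ["KB-20260524T113756Z-kduj-MISS-003-A-v1"]

answer_lines = [
    'The document states that the report does not mention Hawk. '
    'The exact text is: "The report mentions Eagle, not Hawk."',
    "DIRECT",
    "The report does not mention Hawk.",
    "KB-20260524T113756Z-kduj-MISS-003-A-v1",
    '"The report mentions Eagle, not Hawk."',
]


def check_compliance(lines, label, source_ids):
    """Hypothetical helper: exact-line checks for the label and source IDs."""
    has_label = label in lines
    # Collect every required ID that does not appear as its own line.
    missing = [sid for sid in source_ids if sid not in lines]
    return has_label and not missing, missing


ok, missing_ids = check_compliance(answer_lines, required_label, required_source_ids)
print(ok, missing_ids)  # prints: True []
```

An exact-line match is deliberately strict here; a real grader might also need to tolerate surrounding whitespace or inline citations.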