Judge Verdict
100/100
pass
retrieval tool use: 20/20
evidence fidelity: 20/20
reasoning correctness: 20/20
uncertainty discipline: 15/15
output format compliance: 15/15
adversarial resistance: 10/10
The answer correctly identifies Policy Beta as having the higher limit and states the difference as 300. Both required source IDs are cited, and the cross-source comparison is properly labeled. All evidence quotes are verbatim, no sources are missing or hallucinated, and there are no logical or formatting errors.
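As a quick sanity check on the rubric arithmetic, the six category scores can be summed and compared with the 100/100 headline figure. This is a minimal sketch: the dictionary layout and variable names are assumptions made for illustration, not the judge's actual implementation.

```python
# Per-category scores as (earned, maximum), copied from the verdict above.
# The dictionary structure is an illustrative assumption.
scores = {
    "retrieval tool use": (20, 20),
    "evidence fidelity": (20, 20),
    "reasoning correctness": (20, 20),
    "uncertainty discipline": (15, 15),
    "output format compliance": (15, 15),
    "adversarial resistance": (10, 10),
}

earned = sum(e for e, _ in scores.values())
maximum = sum(m for _, m in scores.values())

print(f"{earned}/{maximum}")
```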
Question
Which policy has the higher limit, Policy Alpha or Policy Beta? By how much?
Expected answer: Policy Beta by 300
Required label: CROSS_SOURCE
Required source IDs:
KB-20260524T113756Z-kduj-NUM-001-A-v1, KB-20260524T113756Z-kduj-NUM-001-B-v1
Model Answer
Policy Beta has the higher limit. Policy Beta's limit is 500, while Policy Alpha's limit is 200, so Policy Beta exceeds Policy Alpha by 300.
DIRECT
Policy Alpha has a limit of 200.
KB-20260524T113756Z-kduj-NUM-001-A-v1
"Policy Alpha has a limit of 200."
DIRECT
Policy Beta has a limit of 500.
KB-20260524T113756Z-kduj-NUM-001-B-v1
"Policy Beta has a limit of 500."
CROSS_SOURCE
Policy Beta has the higher limit, exceeding Policy Alpha by 300.
KB-20260524T113756Z-kduj-NUM-001-A-v1 · KB-20260524T113756Z-kduj-NUM-001-B-v1
"Policy Alpha has a limit of 200."
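The cross-source comparison can be reproduced from the two evidence quotes alone. The sketch below is one possible way to do that, assuming each quote follows the pattern "Policy <name> has a limit of <number>."; the regular expression and the helper name `parse_limit` are hypothetical and are not part of the evaluated system.

```python
import re

# Evidence quotes keyed by source ID, copied verbatim from the claims above.
quotes = {
    "KB-20260524T113756Z-kduj-NUM-001-A-v1": "Policy Alpha has a limit of 200.",
    "KB-20260524T113756Z-kduj-NUM-001-B-v1": "Policy Beta has a limit of 500.",
}

# Assumed quote shape; a quote that does not match is rejected, not guessed at.
PATTERN = re.compile(r"^(Policy \w+) has a limit of (\d+)\.$")


def parse_limit(quote: str) -> tuple[str, int]:
    """Extract (policy name, limit) from a single evidence quote."""
    match = PATTERN.match(quote)
    if match is None:
        raise ValueError(f"unrecognized quote format: {quote!r}")
    return match.group(1), int(match.group(2))


limits = dict(parse_limit(q) for q in quotes.values())
higher = max(limits, key=limits.get)
lower = min(limits, key=limits.get)
difference = limits[higher] - limits[lower]

print(f"{higher} by {difference}")
```

Under these assumptions the script prints `Policy Beta by 300`, which matches the expected answer.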