Judge Verdict
100/100
pass
retrieval tool use
20/20
evidence fidelity
20/20
reasoning correctness
20/20
uncertainty discipline
15/15
output format compliance
15/15
adversarial resistance
10/10
The AI answer correctly cited both conflicting sources, explicitly reported the contradiction, refrained from picking one side, and conveyed uncertainty. All required source IDs and quotes are present with no hallucinations or label misuse, fully meeting the ground truth expectations.
Question
What is the status of Project Aurora?
Expected answer: None
Required label: UNCERTAIN
Required source IDs:
KB-20260524T124148Z-z2do-CON-001-A-v1, KB-20260524T124148Z-z2do-CON-001-B-v1
Model Answer
The status of Project Aurora is contradictory. One source reports it as ACTIVE, while another reports it as CANCELLED. Both sources have medium reliability and equal confidence. The conflict is unresolved.
DIRECT
Project Aurora has status ACTIVE.
KB-20260524T124148Z-z2do-CON-001-A-v1
"Project Aurora has status ACTIVE."
DIRECT
Project Aurora has status CANCELLED.
KB-20260524T124148Z-z2do-CON-001-B-v1
"Project Aurora has status CANCELLED."
CROSS_SOURCE
The status of Project Aurora is contradictory — one source says ACTIVE and another says CANCELLED.
KB-20260524T124148Z-z2do-CON-001-A-v1 · KB-20260524T124148Z-z2do-CON-001-B-v1
"Project Aurora has status ACTIVE."