Judge Verdict
95/100
pass
retrieval tool use
20/20
evidence fidelity
20/20
reasoning correctness
20/20
uncertainty discipline
15/15
output format compliance
10/15
adversarial resistance
10/10
The AI correctly identified the ambiguity, surfaced both CFOs with their respective companies, and cited both required sources. It communicated uncertainty clearly in a dedicated section. The only minor deviation is the absence of an explicit overall "UNCERTAIN" label, which prevents a perfect output format score but does not constitute a critical failure.
Question
Who is the Chief Financial Officer?
Expected answer: None
Required label: UNCERTAIN
Required source IDs:
KB-20260524T113756Z-kduj-ENT-003-A-v1, KB-20260524T113756Z-kduj-ENT-003-B-v1
Model Answer
The question is ambiguous — the brain-test domain contains two Chief Financial Officers for two different companies. Alex Morgan is the CFO of GlobalTech Solutions, and Jordan Lee is the CFO of NexaCorp.
DIRECT
Alex Morgan is the Chief Financial Officer of GlobalTech Solutions.
KB-20260524T113756Z-kduj-ENT-003-A-v1
"Alex Morgan is the Chief Financial Officer of GlobalTech Solutions."
DIRECT
Jordan Lee is the Chief Financial Officer of NexaCorp.
KB-20260524T113756Z-kduj-ENT-003-B-v1
"Jordan Lee is the Chief Financial Officer of NexaCorp."