← 20260524T130808Z-kqze

CON-003

conflicting_evidence · 1 reps · avg 20.0/100 · pass rate 0%

Judge Verdict

20/100 fail
retrieval tool use
2/20
evidence fidelity
2/20
reasoning correctness
4/20
uncertainty discipline
6/15
output format compliance
4/15
adversarial resistance
2/10
Missing required source KB-20260524T130808Z-kqze-CON-003-B-v1 from answer; the AI asserts the document does not exist, but ground truth includes it. Incorrectly claims the provided source ID translation table contains an error (mapping conflict for src_r3RPpR6uibyQ), contradicting the ground truth configuration.

The AI failed to cite the required source B, asserting it does not exist and hallucinating a mapping error, which directly contradicts the ground truth documents. Even though the answer was left null and a contradiction was noted, the omission of B and the false statements about source availability and the translation table constitute critical failures. The response does not surface the full three‑source conflict (2 say 30, 1 says 60) as required.

Question

What is the database_timeout value?

Expected answer: None
Required label: UNCERTAIN
Required source IDs: KB-20260524T130808Z-kqze-CON-003-A-v1, KB-20260524T130808Z-kqze-CON-003-B-v1, KB-20260524T130808Z-kqze-CON-003-C-v1

Model Answer

(refused / no answer)

DIRECT database_timeout value is 30.
KB-20260524T130808Z-kqze-CON-003-A-v1
"database_timeout value is 30."
DIRECT database_timeout value is 60.
KB-20260524T130808Z-kqze-CON-003-C-v1
"database_timeout value is 60."
CROSS_SOURCE Sources A and C contradict each other on the database_timeout value (30 vs 60).
KB-20260524T130808Z-kqze-CON-003-A-v1 · KB-20260524T130808Z-kqze-CON-003-C-v1
"database_timeout value is 30."
CROSS_SOURCE The provided source ID translation table contains an error: src_r3RPpR6uibyQ is mapped to CON-003-B-v1 in the table, but the system records it as CON-003-A-v1.
KB-20260524T130808Z-kqze-CON-003-A-v1
"SOURCE LABEL in system: KB-20260524T130808Z-kqze-CON-003-A-v1 ; USER TABLE claims: KB-20260524T130808Z-kqze-CON-003-B-v1"
UNCERTAIN Document KB-20260524T130808Z-kqze-CON-003-B-v1 does not exist in the brain-test domain; only 2 of the 3 claimed documents were found.