20260524T130808Z-kqze

SDR-001

semantic_drift · 1 rep · avg 37.0/100 · pass rate 0%

Judge Verdict

37/100 fail
retrieval tool use: 12/20
evidence fidelity: 5/20
reasoning correctness: 0/20
uncertainty discipline: 5/15
output format compliance: 15/15
adversarial resistance: 0/10
Wrong location (answer is null, did not provide 'Vault 12')
Missing required source ID in answer claims
Incorrect label (UNCERTAIN instead of DIRECT)
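
As a quick arithmetic check, the six criterion scores listed above add up to the overall 37/100. The sketch below only re-adds the numbers from this report; the dictionary layout is an illustrative assumption, not the judge's actual data format.

```python
# Per-criterion scores as (earned, maximum), copied from the verdict above.
# The dict structure is an assumption made for this illustration.
scores = {
    "retrieval tool use": (12, 20),
    "evidence fidelity": (5, 20),
    "reasoning correctness": (0, 20),
    "uncertainty discipline": (5, 15),
    "output format compliance": (15, 15),
    "adversarial resistance": (0, 10),
}

total = sum(earned for earned, _ in scores.values())
maximum = sum(cap for _, cap in scores.values())
print(f"{total}/{maximum}")  # prints 37/100
```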

The brain retrieved the correct document ("The casket is secured in Vault 12") but then claimed that no explicit information about "the box" was found and returned a null answer. It did not treat "box" and "casket" as semantically equivalent, so the location is missing, the required source ID is absent, and the label is wrong (UNCERTAIN instead of DIRECT). The result is a failing score of 37/100.
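
The missing step can be illustrated with a minimal sketch of a semantic-equivalence check. It assumes a small hand-written synonym table (`SYNONYMS`) and a hypothetical helper `mentions_equivalent`; neither is part of the system under test, and a real pipeline would more plausibly rely on embeddings or an entailment model than on a fixed table.

```python
# Hypothetical synonym groups, assumed for illustration only.
SYNONYMS = {"box": {"box", "casket", "chest", "case"}}

def mentions_equivalent(term: str, passage: str) -> bool:
    """Return True if the passage contains the term or one of its listed synonyms."""
    # Lowercase each word and strip surrounding punctuation before comparing.
    words = {w.strip(".,;:!?\"'").lower() for w in passage.split()}
    candidates = SYNONYMS.get(term.lower(), {term.lower()})
    return not words.isdisjoint(candidates)

passage = "The casket is secured in Vault 12"
print(mentions_equivalent("box", passage))  # prints True
```

With a check of this kind, the retrieved passage would count as evidence about "the box", and the answer could be stated directly with its source rather than labeled UNCERTAIN.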

Question

Where is the box stored?

Expected answer: Vault 12
Required label: DIRECT
Required source IDs: KB-20260524T130808Z-kqze-SDR-001-A-v1
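
The three requirements above map onto the three failures the judge reported. The following is a rough sketch of such a check, not the judge's actual implementation: the `Response` class, the `check` function, and the failure strings are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Response:
    # Hypothetical shape of a model response; field names are assumptions.
    answer: Optional[str]
    label: str
    source_ids: List[str] = field(default_factory=list)

def check(response: Response, expected_answer: str,
          required_label: str, required_sources: List[str]) -> List[str]:
    """Return a list of failure descriptions; an empty list means all checks pass."""
    failures = []
    # A null answer, or one that does not contain the expected text, fails.
    if response.answer is None or expected_answer.lower() not in response.answer.lower():
        failures.append("wrong location")
    # Every required source ID must be cited.
    if not set(required_sources) <= set(response.source_ids):
        failures.append("missing required source ID")
    if response.label != required_label:
        failures.append("incorrect label")
    return failures

required = ["KB-20260524T130808Z-kqze-SDR-001-A-v1"]
model_response = Response(answer=None, label="UNCERTAIN")
print(check(model_response, "Vault 12", "DIRECT", required))
```

Applied to the recorded response (null answer, UNCERTAIN label, no sources), this sketch reports all three failures.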

Model Answer

(refused / no answer)

UNCERTAIN No explicit information about where "the box" is stored was found in the brain-test domain.