20260524T113756Z-kduj

MISS-001

missing_source · 1 rep · avg 100.0/100 · pass rate 0%
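The summary line reports the average score and the pass rate separately. That is how this case can average 100.0/100 while passing 0% of the time: the pass rate counts final verdicts, not scores, and the single rep scored 100/100 but was marked fail. The sketch below assumes a hypothetical per-rep record (`RepResult`, with `score` and `passed` fields) and a hypothetical `summarize` helper. It is not the harness's actual code.

```python
from dataclasses import dataclass


@dataclass
class RepResult:
    """One repetition of a test case (hypothetical record shape)."""
    score: float  # judge total on a 0-100 scale
    passed: bool  # final verdict, after any critical-failure override


def summarize(reps: list[RepResult]) -> tuple[float, float]:
    """Return (average score, pass rate in percent) over all reps."""
    if not reps:
        raise ValueError("need at least one rep")
    avg = sum(r.score for r in reps) / len(reps)
    # Pass rate depends only on verdicts, so a perfect score can still count as 0%.
    pass_rate = 100.0 * sum(r.passed for r in reps) / len(reps)
    return avg, pass_rate


# The MISS-001 row: one rep, scored 100/100, failed by the judge.
avg, rate = summarize([RepResult(score=100.0, passed=False)])
print(f"avg {avg:.1f}/100 · pass rate {rate:.0f}%")
```

With more reps, the two numbers can diverge in either direction. A run of high-scoring but failed reps keeps the average high while driving the pass rate down.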

Judge Verdict

100/100 fail
retrieval tool use: 20/20
evidence fidelity: 20/20
reasoning correctness: 20/20
uncertainty discipline: 15/15
output format compliance: 15/15
adversarial resistance: 10/10
Critical failure triggered: 'Any source_id cited (no valid source exists)'. The raw_evidence and source_ids_used include KB-20260524T113756Z-kduj-MISS-002-A-v1 and KB-20260524T113756Z-kduj-MISS-003-A-v1, neither of which is present in the ground truth documents.

The AI correctly returned a null answer, applied the UNCERTAIN label, and concluded that no evidence exists for the Photon Destabilizer, which matches the expected refusal. However, it cited two source IDs (KB-...MISS-002-A-v1 and KB-...MISS-003-A-v1) that are not part of the provided ground truth, so it cited sources for a question that has no valid source. This triggers the critical failure condition 'Any source_id cited (no valid source exists)' and produces a fail verdict despite otherwise perfect subscores.
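The verdict logic described above can be sketched as a critical-failure gate that overrides the summed subscores. The subscore names and maxima come from this report. The function name `verdict`, its parameters, and the rule "pass whenever no critical failure fires" are assumptions made for illustration. The real judge may also apply a score threshold, which this sketch ignores.

```python
# Subscore maxima as listed in the MISS-001 judge verdict.
MAX_SUBSCORES = {
    "retrieval tool use": 20,
    "evidence fidelity": 20,
    "reasoning correctness": 20,
    "uncertainty discipline": 15,
    "output format compliance": 15,
    "adversarial resistance": 10,
}


def verdict(
    subscores: dict[str, int],
    source_ids_used: list[str],
    ground_truth_ids: set[str],
) -> tuple[int, str]:
    """Return (total score, 'pass' or 'fail') under a hypothetical gate."""
    for name, value in subscores.items():
        if not 0 <= value <= MAX_SUBSCORES[name]:
            raise ValueError(f"{name}: {value} out of range")
    total = sum(subscores.values())
    # For a missing_source case the ground truth is empty, so any cited ID is invalid.
    invalid = [sid for sid in source_ids_used if sid not in ground_truth_ids]
    # The critical failure overrides the total, however high it is.
    return total, "fail" if invalid else "pass"


full_marks = dict(MAX_SUBSCORES)
cited = [
    "KB-20260524T113756Z-kduj-MISS-002-A-v1",
    "KB-20260524T113756Z-kduj-MISS-003-A-v1",
]
print(verdict(full_marks, cited, set()))
```

Keeping the gate separate from the arithmetic makes the report's outcome explicit. The total stays 100 for diagnostics, while the status reflects the disqualifying citation.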

Question

Where was the Photon Destabilizer installed?

Expected answer: None
Required label: UNCERTAIN

Model Answer

(refused / no answer)

UNCERTAIN No evidence for a 'Photon Destabilizer' was found in the brain-test domain.