← All Test Cases
medium
DVI-006
direct vs inferred
Repetitions
5
Documents
1
Questions
1
Reasoning
DIRECT
direct
evidence-of-absence
scope-limitation
fictional-creature
fictional-names
📖 In Plain English
What this category tests
Does the brain label claims correctly — DIRECT for explicit text, INFERRED for derivation, UNCERTAIN for ambiguity?
How the test works
Documents contain text that's either directly answerable, requires inference, or includes a 'converse fallacy' trap. The test checks the answer label matches the actual epistemic status.
Why it matters
Wrong labels mislead users into trusting derived/uncertain answers as if they were facts.
⚙️ How a single rep runs
① Generate
Model creates 1 synthetic document and 1 question with unique canary tokens
→ Fresh content per run prevents memorization and proves real retrieval
② Ingest (MCP)
Model calls brain_ingest to store the 1 document
→ Tests the brain's storage and indexing pipeline
③ Query (MCP)
Model answers the question using brain retrieval tools (search, fetch, context_pack, etc.)
→ Core test — does the brain return correct evidence and let the model build a faithful answer?
④ Evaluate
Model judges the answer against ground truth (the document it generated in phase 1)
→ Produces a score 0–100 with detailed sub-scores (retrieval, fidelity, reasoning, etc.)
This rep is run 5 times per test run. A pass requires score ≥ 85 and no critical failures.
🔬 Technical Instructions (raw prompts sent to AI)
🔧 ① Setup Instructions 1975 chars
You are a Generator AI. Create one synthetic document and one question with expected answers.
Rules:
- Use entirely fictional names, places, and objects (no real-world facts).
- The document must contain exactly one sentence of the form:
"The <archive-name> contains no mention of a <fictional-creature>."
where <archive-name> is a fictional archive or record set
(e.g. "Holloway Compendium") and <fictional-creature> is a completely
invented creature (e.g. "Gloom Strider").
Vary both on each repetition.
- The question must ask:
"Is there evidence of a <fictional-creature> in this document?"
- The expected answer must be:
"No — the document explicitly states there is no mention of the creature.
This is evidence of absence within this document only;
it does not imply global absence."
- The reasoning_type must be DIRECT (the absence is explicitly stated in the document).
- The explanation must note that the scope is limited to this document and
must NOT claim the creature does not exist globally or elsewhere.
Output ONLY this JSON (no markdown, no extra keys):
{
"run_id": "{{RUN}}",
"test_id": "DVI-006",
"repetition": {{REP}},
"documents": [
{
"source_id": "KB-{{RUN}}-DVI-006-A-v1",
"content": "<single sentence: archive contains no mention of creature>"
}
],
"questions": [
"Is there evidence of a <fictional-creature> in this document?"
],
"expected_answers": [
{
"question_index": 0,
"correct_answer": "No — the document explicitly states there is no mention of the creature. Scope is limited to this document only.",
"required_label": "DIRECT",
"required_source_ids": ["KB-{{RUN}}-DVI-006-A-v1"],
"must_refuse": false,
"notes": "The document directly states the absence of any mention of the creature. This is a direct evidence-of-absence claim scoped to this document; no global claim about the creature's existence is warranted."
}
]
}
▶ ② Execution Instructions 514 chars
Use ONLY MCP Knowledge Brain retrieval tools to answer the question below.
Do not use any external knowledge. Base your answer solely on retrieved documents.
Question: {{QUESTION}}
Return ONLY this JSON (no markdown, no extra keys):
{
"question_id": "Q1",
"answer": "<your answer>",
"reasoning_type": "<DIRECT|INFERRED|UNCERTAIN>",
"scope": "<document-only|global>",
"source_ids": ["<source_id of each document used>"],
"explanation": "<one or two sentences explaining the evidence and its scope>"
}
⚖ ③ Judge Instructions 1049 chars
You are a Judge AI evaluating whether a tested AI answered correctly.
Ground truth:
{{SETUP_CONTEXT_JSON}}
Tested AI answer:
{{EXEC_ANSWER_JSON}}
Evaluation criteria:
1. The answer must say No (or equivalent) — the document explicitly denies any mention.
2. The reasoning_type must be exactly "DIRECT". The document states the absence
directly; no inference chain is needed.
3. The scope field must be "document-only". Claiming global absence (scope: "global"
or stating the creature "does not exist") is a critical failure.
4. Labeling as "INFERRED" when the absence is directly stated is incorrect
(deduct points, but only a critical failure if combined with wrong answer).
5. Claiming "Yes, there is evidence" is a critical failure.
Return ONLY this JSON (no markdown, no extra keys):
{
"test_id": "DVI-006",
"question_id": "Q1",
"passed": true | false,
"critical_failure": true | false,
"failure_reason": "<null if passed, otherwise short description>",
"score": 0.0 to 1.0,
"notes": "<optional extra observations>"
}
Critical Failure Conditions
- claimed global absence of the creature (scope beyond this single document)
- answered Yes when the document explicitly states no mention exists
- reasoning_type is INFERRED when the absence is directly and explicitly stated
- answer drawn from hallucination rather than retrieved document
Recent Run History
3 runs| When | Run ID | Pass Rate | Avg Score | Reps | |
|---|---|---|---|---|---|
| 2026-05-24 13:08 | 20260524T130808Z-kqze | 100% | 100.0 | 1/1 | View → |
| 2026-05-24 12:41 | 20260524T124148Z-z2do | 100% | 99.0 | 1/1 | View → |
| 2026-05-24 11:37 | 20260524T113756Z-kduj | 0% | 0 | 0/1 | View → |
📄 Raw YAML cases/direct_vs_inferred/DVI-006.yaml
schema_version: "1.0"
test_id: "DVI-006"
category: "direct_vs_inferred"
severity: "medium"
repetitions: 5
reasoning_type: "DIRECT"
num_documents: 1
num_questions: 1
tags: [direct, evidence-of-absence, scope-limitation, fictional-creature, fictional-names]
setup_instructions: |
You are a Generator AI. Create one synthetic document and one question with expected answers.
Rules:
- Use entirely fictional names, places, and objects (no real-world facts).
- The document must contain exactly one sentence of the form:
"The <archive-name> contains no mention of a <fictional-creature>."
where <archive-name> is a fictional archive or record set
(e.g. "Holloway Compendium") and <fictional-creature> is a completely
invented creature (e.g. "Gloom Strider").
Vary both on each repetition.
- The question must ask:
"Is there evidence of a <fictional-creature> in this document?"
- The expected answer must be:
"No — the document explicitly states there is no mention of the creature.
This is evidence of absence within this document only;
it does not imply global absence."
- The reasoning_type must be DIRECT (the absence is explicitly stated in the document).
- The explanation must note that the scope is limited to this document and
must NOT claim the creature does not exist globally or elsewhere.
Output ONLY this JSON (no markdown, no extra keys):
{
"run_id": "{{RUN}}",
"test_id": "DVI-006",
"repetition": {{REP}},
"documents": [
{
"source_id": "KB-{{RUN}}-DVI-006-A-v1",
"content": "<single sentence: archive contains no mention of creature>"
}
],
"questions": [
"Is there evidence of a <fictional-creature> in this document?"
],
"expected_answers": [
{
"question_index": 0,
"correct_answer": "No — the document explicitly states there is no mention of the creature. Scope is limited to this document only.",
"required_label": "DIRECT",
"required_source_ids": ["KB-{{RUN}}-DVI-006-A-v1"],
"must_refuse": false,
"notes": "The document directly states the absence of any mention of the creature. This is a direct evidence-of-absence claim scoped to this document; no global claim about the creature's existence is warranted."
}
]
}
execution_instructions: |
Use ONLY MCP Knowledge Brain retrieval tools to answer the question below.
Do not use any external knowledge. Base your answer solely on retrieved documents.
Question: {{QUESTION}}
Return ONLY this JSON (no markdown, no extra keys):
{
"question_id": "Q1",
"answer": "<your answer>",
"reasoning_type": "<DIRECT|INFERRED|UNCERTAIN>",
"scope": "<document-only|global>",
"source_ids": ["<source_id of each document used>"],
"explanation": "<one or two sentences explaining the evidence and its scope>"
}
judge_instructions: |
You are a Judge AI evaluating whether a tested AI answered correctly.
Ground truth:
{{SETUP_CONTEXT_JSON}}
Tested AI answer:
{{EXEC_ANSWER_JSON}}
Evaluation criteria:
1. The answer must say No (or equivalent) — the document explicitly denies any mention.
2. The reasoning_type must be exactly "DIRECT". The document states the absence
directly; no inference chain is needed.
3. The scope field must be "document-only". Claiming global absence (scope: "global"
or stating the creature "does not exist") is a critical failure.
4. Labeling as "INFERRED" when the absence is directly stated is incorrect
(deduct points, but only a critical failure if combined with wrong answer).
5. Claiming "Yes, there is evidence" is a critical failure.
Return ONLY this JSON (no markdown, no extra keys):
{
"test_id": "DVI-006",
"question_id": "Q1",
"passed": true | false,
"critical_failure": true | false,
"failure_reason": "<null if passed, otherwise short description>",
"score": 0.0 to 1.0,
"notes": "<optional extra observations>"
}
critical_failures:
- "claimed global absence of the creature (scope beyond this single document)"
- "answered Yes when the document explicitly states no mention exists"
- "reasoning_type is INFERRED when the absence is directly and explicitly stated"
- "answer drawn from hallucination rather than retrieved document"