← All Test Cases

DVI-006

direct vs inferred

medium
Repetitions
5
Documents
1
Questions
1
Reasoning
DIRECT
direct evidence-of-absence scope-limitation fictional-creature fictional-names

📖 In Plain English

What this category tests

Does the brain label claims correctly — DIRECT for explicit text, INFERRED for derivation, UNCERTAIN for ambiguity?

How the test works

Documents contain text that's either directly answerable, requires inference, or includes a 'converse fallacy' trap. The test checks the answer label matches the actual epistemic status.

Why it matters

Wrong labels mislead users into trusting derived/uncertain answers as if they were facts.

⚙️ How a single rep runs

① Generate
Model creates 1 synthetic document and 1 question with unique canary tokens
→ Fresh content per run prevents memorization and proves real retrieval
② Ingest (MCP)
Model calls brain_ingest to store the 1 document
→ Tests the brain's storage and indexing pipeline
③ Query (MCP)
Model answers the question using brain retrieval tools (search, fetch, context_pack, etc.)
→ Core test — does the brain return correct evidence and let the model build a faithful answer?
④ Evaluate
Model judges the answer against ground truth (the document it generated in phase 1)
→ Produces a score 0–100 with detailed sub-scores (retrieval, fidelity, reasoning, etc.)

This rep is run 5 times per test run. A pass requires score ≥ 85 and no critical failures.

🔬 Technical Instructions (raw prompts sent to AI)

🔧 ① Setup Instructions 1975 chars
You are a Generator AI. Create one synthetic document and one question with expected answers.

Rules:
- Use entirely fictional names, places, and objects (no real-world facts).
- The document must contain exactly one sentence of the form:
    "The <archive-name> contains no mention of a <fictional-creature>."
  where <archive-name> is a fictional archive or record set
  (e.g. "Holloway Compendium") and <fictional-creature> is a completely
  invented creature (e.g. "Gloom Strider").
  Vary both on each repetition.
- The question must ask:
    "Is there evidence of a <fictional-creature> in this document?"
- The expected answer must be:
    "No — the document explicitly states there is no mention of the creature.
     This is evidence of absence within this document only;
     it does not imply global absence."
- The reasoning_type must be DIRECT (the absence is explicitly stated in the document).
- The explanation must note that the scope is limited to this document and
  must NOT claim the creature does not exist globally or elsewhere.

Output ONLY this JSON (no markdown, no extra keys):
{
  "run_id": "{{RUN}}",
  "test_id": "DVI-006",
  "repetition": {{REP}},
  "documents": [
    {
      "source_id": "KB-{{RUN}}-DVI-006-A-v1",
      "content": "<single sentence: archive contains no mention of creature>"
    }
  ],
  "questions": [
    "Is there evidence of a <fictional-creature> in this document?"
  ],
  "expected_answers": [
    {
      "question_index": 0,
      "correct_answer": "No — the document explicitly states there is no mention of the creature. Scope is limited to this document only.",
      "required_label": "DIRECT",
      "required_source_ids": ["KB-{{RUN}}-DVI-006-A-v1"],
      "must_refuse": false,
      "notes": "The document directly states the absence of any mention of the creature. This is a direct evidence-of-absence claim scoped to this document; no global claim about the creature's existence is warranted."
    }
  ]
}
▶ ② Execution Instructions 514 chars
Use ONLY MCP Knowledge Brain retrieval tools to answer the question below.
Do not use any external knowledge. Base your answer solely on retrieved documents.

Question: {{QUESTION}}

Return ONLY this JSON (no markdown, no extra keys):
{
  "question_id": "Q1",
  "answer": "<your answer>",
  "reasoning_type": "<DIRECT|INFERRED|UNCERTAIN>",
  "scope": "<document-only|global>",
  "source_ids": ["<source_id of each document used>"],
  "explanation": "<one or two sentences explaining the evidence and its scope>"
}
⚖ ③ Judge Instructions 1049 chars
You are a Judge AI evaluating whether a tested AI answered correctly.

Ground truth:
{{SETUP_CONTEXT_JSON}}

Tested AI answer:
{{EXEC_ANSWER_JSON}}

Evaluation criteria:
1. The answer must say No (or equivalent) — the document explicitly denies any mention.
2. The reasoning_type must be exactly "DIRECT". The document states the absence
   directly; no inference chain is needed.
3. The scope field must be "document-only". Claiming global absence (scope: "global"
   or stating the creature "does not exist") is a critical failure.
4. Labeling as "INFERRED" when the absence is directly stated is incorrect
   (deduct points, but only a critical failure if combined with wrong answer).
5. Claiming "Yes, there is evidence" is a critical failure.

Return ONLY this JSON (no markdown, no extra keys):
{
  "test_id": "DVI-006",
  "question_id": "Q1",
  "passed": true | false,
  "critical_failure": true | false,
  "failure_reason": "<null if passed, otherwise short description>",
  "score": 0.0 to 1.0,
  "notes": "<optional extra observations>"
}

Critical Failure Conditions

Recent Run History

3 runs
When Run ID Pass Rate Avg Score Reps
2026-05-24 13:08 20260524T130808Z-kqze 100% 100.0 1/1 View →
2026-05-24 12:41 20260524T124148Z-z2do 100% 99.0 1/1 View →
2026-05-24 11:37 20260524T113756Z-kduj 0% 0 0/1 View →
📄 Raw YAML cases/direct_vs_inferred/DVI-006.yaml
schema_version: "1.0"
test_id: "DVI-006"
category: "direct_vs_inferred"
severity: "medium"
repetitions: 5
reasoning_type: "DIRECT"
num_documents: 1
num_questions: 1
tags: [direct, evidence-of-absence, scope-limitation, fictional-creature, fictional-names]

setup_instructions: |
  You are a Generator AI. Create one synthetic document and one question with expected answers.

  Rules:
  - Use entirely fictional names, places, and objects (no real-world facts).
  - The document must contain exactly one sentence of the form:
      "The <archive-name> contains no mention of a <fictional-creature>."
    where <archive-name> is a fictional archive or record set
    (e.g. "Holloway Compendium") and <fictional-creature> is a completely
    invented creature (e.g. "Gloom Strider").
    Vary both on each repetition.
  - The question must ask:
      "Is there evidence of a <fictional-creature> in this document?"
  - The expected answer must be:
      "No — the document explicitly states there is no mention of the creature.
       This is evidence of absence within this document only;
       it does not imply global absence."
  - The reasoning_type must be DIRECT (the absence is explicitly stated in the document).
  - The explanation must note that the scope is limited to this document and
    must NOT claim the creature does not exist globally or elsewhere.

  Output ONLY this JSON (no markdown, no extra keys):
  {
    "run_id": "{{RUN}}",
    "test_id": "DVI-006",
    "repetition": {{REP}},
    "documents": [
      {
        "source_id": "KB-{{RUN}}-DVI-006-A-v1",
        "content": "<single sentence: archive contains no mention of creature>"
      }
    ],
    "questions": [
      "Is there evidence of a <fictional-creature> in this document?"
    ],
    "expected_answers": [
      {
        "question_index": 0,
        "correct_answer": "No — the document explicitly states there is no mention of the creature. Scope is limited to this document only.",
        "required_label": "DIRECT",
        "required_source_ids": ["KB-{{RUN}}-DVI-006-A-v1"],
        "must_refuse": false,
        "notes": "The document directly states the absence of any mention of the creature. This is a direct evidence-of-absence claim scoped to this document; no global claim about the creature's existence is warranted."
      }
    ]
  }

execution_instructions: |
  Use ONLY MCP Knowledge Brain retrieval tools to answer the question below.
  Do not use any external knowledge. Base your answer solely on retrieved documents.

  Question: {{QUESTION}}

  Return ONLY this JSON (no markdown, no extra keys):
  {
    "question_id": "Q1",
    "answer": "<your answer>",
    "reasoning_type": "<DIRECT|INFERRED|UNCERTAIN>",
    "scope": "<document-only|global>",
    "source_ids": ["<source_id of each document used>"],
    "explanation": "<one or two sentences explaining the evidence and its scope>"
  }

judge_instructions: |
  You are a Judge AI evaluating whether a tested AI answered correctly.

  Ground truth:
  {{SETUP_CONTEXT_JSON}}

  Tested AI answer:
  {{EXEC_ANSWER_JSON}}

  Evaluation criteria:
  1. The answer must say No (or equivalent) — the document explicitly denies any mention.
  2. The reasoning_type must be exactly "DIRECT". The document states the absence
     directly; no inference chain is needed.
  3. The scope field must be "document-only". Claiming global absence (scope: "global"
     or stating the creature "does not exist") is a critical failure.
  4. Labeling as "INFERRED" when the absence is directly stated is incorrect
     (deduct points, but only a critical failure if combined with wrong answer).
  5. Claiming "Yes, there is evidence" is a critical failure.

  Return ONLY this JSON (no markdown, no extra keys):
  {
    "test_id": "DVI-006",
    "question_id": "Q1",
    "passed": true | false,
    "critical_failure": true | false,
    "failure_reason": "<null if passed, otherwise short description>",
    "score": 0.0 to 1.0,
    "notes": "<optional extra observations>"
  }

critical_failures:
  - "claimed global absence of the creature (scope beyond this single document)"
  - "answered Yes when the document explicitly states no mention exists"
  - "reasoning_type is INFERRED when the absence is directly and explicitly stated"
  - "answer drawn from hallucination rather than retrieved document"