
CAU-002

Category: causal
Severity: high
Repetitions: 3
Documents: 2
Questions: 1
Reasoning: INFERRED
Tags: causal-chain, 2-hop-causal, inference, leads_to

📖 In Plain English

What this category tests

Does the brain extract and reason over causal relationships?

How the test works

Documents state 'X leads to Y' relationships. The brain should store these with predicate=leads_to and let the user trace causes/effects, including 2-hop chains (A→B→C).

Why it matters

Causal reasoning is a key step above pure retrieval — it lets the brain answer 'why' and 'what if' questions.

Specifically for CAU-002

Tests 2-hop causal chains: A→B and B→C are ingested as separate documents, and the query asks what A ultimately leads to. The expected answer is C.
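
A minimal sketch of the 2-hop trace this test exercises, assuming claims are held as (subject, predicate, object) triples. The concept names and the trace_chain helper are hypothetical illustrations, not part of the brain's API.

```python
# Hypothetical claims, as they might look after ingesting the two documents.
CLAIMS = [
    ("glimmer drift", "leads_to", "vault hum"),  # Doc A: A leads to B
    ("vault hum", "leads_to", "tide lock"),      # Doc B: B leads to C
]


def trace_chain(start: str, claims: list[tuple[str, str, str]], max_hops: int = 2) -> list[str]:
    """Follow leads_to edges from `start` and return the chain of concepts visited."""
    edges = {subj: obj for subj, pred, obj in claims if pred == "leads_to"}
    chain = [start]
    while len(chain) - 1 < max_hops and chain[-1] in edges:
        nxt = edges[chain[-1]]
        if nxt in chain:  # stop on a cycle instead of looping forever
            break
        chain.append(nxt)
    return chain


chain = trace_chain("glimmer drift", CLAIMS)
print(" -> ".join(chain))
```

The last element of the chain is the answer the test expects (C). Stopping after the first hop would return B, which is the failure the test is designed to catch.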

⚙️ How a single rep runs

① Generate

Model creates 2 synthetic documents and 1 question with unique canary tokens

→ Fresh content per run prevents memorization and proves real retrieval

② Ingest (MCP)

Model calls brain_ingest to store the 2 documents

→ Tests the brain's storage and indexing pipeline

③ Query (MCP)

Model answers the question using brain retrieval tools (search, fetch, context_pack, etc.)

→ Core test — does the brain return correct evidence and let the model build a faithful answer?

④ Evaluate

Model judges the answer against ground truth (the documents and expected answer it generated in phase 1)

→ Produces a score 0–100 with detailed sub-scores (retrieval, fidelity, reasoning, etc.)

This rep is run 3 times per test run. A pass requires score ≥ 85 and no critical failures.
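
The pass rule can be sketched as follows. The threshold of 85 comes from the rule above; the record field names and the way a pass rate is aggregated over reps are assumptions made for illustration.

```python
PASS_THRESHOLD = 85  # from the pass rule: score >= 85 and no critical failures


def rep_passes(score: float, critical_failures: list[str]) -> bool:
    """A rep passes only if it meets the threshold and has no critical failures."""
    return score >= PASS_THRESHOLD and not critical_failures


def pass_rate(reps: list[dict]) -> float:
    """Fraction of reps that pass (assumed aggregation, hypothetical field names)."""
    passed = sum(rep_passes(r["score"], r["critical_failures"]) for r in reps)
    return passed / len(reps)


# Invented example data for three reps.
reps = [
    {"score": 95, "critical_failures": []},
    {"score": 85, "critical_failures": []},
    {"score": 97, "critical_failures": ["only_answered_first_hop"]},
]
print(pass_rate(reps))
```

Note that a high score does not rescue a rep with a critical failure: the third rep above fails despite scoring 97.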

🔬 Technical Instructions (raw prompts sent to AI)

🔧 ① Setup Instructions 1491 chars

Generate TWO documents forming a causal chain: A leads to B, B leads to C.
The query asks what A ultimately leads to (the answer requires chaining A→B→C).

Requirements:
- Invent 3 fictional concepts: A, B, C.
- Doc 1: "A leads to B" with predicate=leads_to, subject=A, object=B
- Doc 2: "B leads to C" with predicate=leads_to, subject=B, object=C
- source_ids: KB-{{RUN}}-CAU-002-A-v1 and KB-{{RUN}}-CAU-002-B-v1
- Canary token in each document.
- Question: what does A ultimately lead to? (answer: C, via B)

Output ONLY this JSON:
{
  "run_id": "{{RUN}}",
  "test_id": "CAU-002",
  "repetition": {{REP}},
  "documents": [
    {
      "source_id": "KB-{{RUN}}-CAU-002-A-v1",
      "content": "<sentence: A leads to B [canary1]>",
      "title": "<title>",
      "version": 1
    },
    {
      "source_id": "KB-{{RUN}}-CAU-002-B-v1",
      "content": "<sentence: B leads to C [canary2]>",
      "title": "<title>",
      "version": 1
    }
  ],
  "questions": [
    "Following the causal chain: what does '<A>' ultimately lead to? Search the brain and trace the full chain A→B→C."
  ],
  "expected_answers": [
    {
      "question_index": 0,
      "correct_answer": "<C>",
      "required_source_ids": ["KB-{{RUN}}-CAU-002-A-v1", "KB-{{RUN}}-CAU-002-B-v1"],
      "required_label": "INFERRED",
      "must_refuse": false,
      "notes": "Requires retrieving both documents and chaining A→B→C. Final answer is C. Label INFERRED because it requires multi-hop causal reasoning."
    }
  ]
}
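
A rough sketch of how a harness might sanity-check the setup output before ingesting anything. The check_setup helper is hypothetical and the sample values are invented placeholders; only the source_id pattern, the INFERRED label, and must_refuse=false come from the instructions above.

```python
def expected_source_ids(run_id: str) -> list[str]:
    """Fill the KB-{{RUN}}-CAU-002-<part>-v1 pattern for both documents."""
    return [f"KB-{run_id}-CAU-002-{part}-v1" for part in ("A", "B")]


def check_setup(setup: dict) -> list[str]:
    """Return a list of problems; an empty list means the output looks well formed."""
    problems = []
    want = expected_source_ids(setup["run_id"])
    got = [doc["source_id"] for doc in setup["documents"]]
    if got != want:
        problems.append(f"unexpected source_ids: {got}")
    answer = setup["expected_answers"][0]
    if sorted(answer["required_source_ids"]) != want:
        problems.append("required_source_ids must list both documents")
    if answer["required_label"] != "INFERRED":
        problems.append("required_label must be INFERRED")
    if answer["must_refuse"]:
        problems.append("must_refuse must be false")
    return problems


# Invented sample output; run ID, concepts, and canaries are placeholders.
sample = {
    "run_id": "RUN123",
    "test_id": "CAU-002",
    "repetition": 1,
    "documents": [
        {"source_id": "KB-RUN123-CAU-002-A-v1", "title": "Doc A", "version": 1,
         "content": "Glimmer drift leads to vault hum [canary-1]."},
        {"source_id": "KB-RUN123-CAU-002-B-v1", "title": "Doc B", "version": 1,
         "content": "Vault hum leads to tide lock [canary-2]."},
    ],
    "questions": ["What does 'glimmer drift' ultimately lead to?"],
    "expected_answers": [{
        "question_index": 0,
        "correct_answer": "tide lock",
        "required_source_ids": ["KB-RUN123-CAU-002-A-v1", "KB-RUN123-CAU-002-B-v1"],
        "required_label": "INFERRED",
        "must_refuse": False,
    }],
}
print(check_setup(sample))
```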

▶ ② Execution Instructions 368 chars

You have access to MCP Knowledge Brain tools.

When ingesting, use brain_ingest with extracted.claims for each document:
- Doc 1: predicate="leads_to", subject=<A>, object=<B>
- Doc 2: predicate="leads_to", subject=<B>, object=<C>

Then search the brain to trace the causal chain from A to C.

Question: {{QUESTION}}

Return ONLY this JSON:
{{REQUIRED_ANSWER_SCHEMA}}
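
For illustration, the arguments for one brain_ingest call might be assembled as below. Only the extracted.claims path and the subject/predicate/object fields come from the instructions above; the remaining argument names and the build_ingest_args helper are assumptions, and the real tool schema may differ.

```python
import json


def build_ingest_args(document: dict, subject: str, obj: str) -> dict:
    """Build a hypothetical argument payload for one brain_ingest call."""
    return {
        # Assumed top-level fields, mirroring the setup document.
        "source_id": document["source_id"],
        "title": document["title"],
        "content": document["content"],
        "version": document["version"],
        # From the instructions: one leads_to claim under extracted.claims.
        "extracted": {
            "claims": [
                {"subject": subject, "predicate": "leads_to", "object": obj}
            ]
        },
    }


# Invented document for Doc 1 (A leads to B).
doc_a = {
    "source_id": "KB-RUN123-CAU-002-A-v1",
    "title": "Doc A",
    "content": "Glimmer drift leads to vault hum [canary-1].",
    "version": 1,
}
args = build_ingest_args(doc_a, subject="glimmer drift", obj="vault hum")
print(json.dumps(args, indent=2))
```

Doc 2 would be built the same way with subject=B and object=C, so that the object of the first claim matches the subject of the second and the chain can be joined at B.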

⚖ ③ Judge Instructions 786 chars

Ground truth:
{{SETUP_CONTEXT_JSON}}

Tested AI answer:
{{EXEC_ANSWER_JSON}}

Two causal claims were ingested: A→B and B→C. The AI must chain them to answer
what A ultimately leads to (C). This tests 2-hop causal reasoning.

Scoring:
- retrieval_tool_use (0-20): Retrieved both source documents?
- evidence_fidelity (0-20): Cited both required source_ids? Claims are accurate?
- reasoning_correctness (0-20): Final answer is C (the end of the chain)?
- uncertainty_discipline (0-15): Used INFERRED label (multi-hop reasoning)?
- output_format_compliance (0-15): Valid JSON?
- adversarial_resistance (0-10): Full marks.

Critical failures: answered only the first hop (B instead of C), cited wrong source_ids, did not chain the reasoning.

Return ONLY this JSON:
{{JUDGE_RESULT_SCHEMA}}
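
The sub-score maxima in the rubric above add up to 100, which matches the 0–100 score range. A small sketch of combining them, assuming the total is a plain sum of sub-scores (the page does not state the aggregation explicitly) and using a hypothetical total_score helper:

```python
# Maximum points per sub-score, taken from the judge rubric.
MAX_POINTS = {
    "retrieval_tool_use": 20,
    "evidence_fidelity": 20,
    "reasoning_correctness": 20,
    "uncertainty_discipline": 15,
    "output_format_compliance": 15,
    "adversarial_resistance": 10,
}


def total_score(sub_scores: dict[str, int]) -> int:
    """Sum the sub-scores, rejecting values outside each rubric range."""
    total = 0
    for name, cap in MAX_POINTS.items():
        value = sub_scores.get(name, 0)  # a missing sub-score counts as 0
        if not 0 <= value <= cap:
            raise ValueError(f"{name} must be between 0 and {cap}, got {value}")
        total += value
    return total


# Invented judge output: correct answer, but the INFERRED label was missing.
example = {
    "retrieval_tool_use": 20,
    "evidence_fidelity": 20,
    "reasoning_correctness": 20,
    "uncertainty_discipline": 0,
    "output_format_compliance": 15,
    "adversarial_resistance": 10,
}
print(total_score(example))
```

Under this assumption, dropping the INFERRED label alone costs 15 points and brings the rep to exactly the pass threshold of 85.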

Critical Failure Conditions

only_answered_first_hop
missing_required_source_ids
wrong_final_answer
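
A simplified sketch of how these three conditions could be checked mechanically. In the actual test the judging is done by a model, so the exact-match comparison and the function signature here are assumptions for illustration only.

```python
def critical_failures(
    final_answer: str,
    cited_source_ids: list[str],
    first_hop: str,            # concept B
    chain_end: str,            # concept C
    required_source_ids: list[str],
) -> list[str]:
    """Return the critical failure identifiers triggered by an answer."""
    failures = []
    answer = final_answer.strip().lower()
    if answer == first_hop.lower():
        failures.append("only_answered_first_hop")
    if not set(required_source_ids) <= set(cited_source_ids):
        failures.append("missing_required_source_ids")
    if answer != chain_end.lower():
        failures.append("wrong_final_answer")
    return failures


required = ["KB-RUN123-CAU-002-A-v1", "KB-RUN123-CAU-002-B-v1"]

# Invented answer that stops at B and cites only the first document.
print(critical_failures("vault hum", required[:1], "vault hum", "tide lock", required))
```

In this sketch an answer that stops at B triggers both only_answered_first_hop and wrong_final_answer, since B is by definition not the end of the chain.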

Recent Run History

3 runs

When	Run ID	Pass Rate	Avg Score	Reps
2026-05-24 13:08	20260524T130808Z-kqze	100%	95.0	1/1
2026-05-24 12:41	20260524T124148Z-z2do	100%	92.0	1/1
2026-05-24 11:37	20260524T113756Z-kduj	100%	85.0	1/1

📄 Raw YAML cases/causal/CAU-002.yaml

schema_version: "1.0"
test_id: "CAU-002"
category: "causal"
severity: "high"
repetitions: 3
reasoning_type: "INFERRED"
num_documents: 2
num_questions: 1
tags: ["causal-chain", "2-hop-causal", "inference", "leads_to"]

setup_instructions: |
  Generate TWO documents forming a causal chain: A leads to B, B leads to C.
  The query asks what A ultimately leads to (the answer requires chaining A→B→C).

  Requirements:
  - Invent 3 fictional concepts: A, B, C.
  - Doc 1: "A leads to B" with predicate=leads_to, subject=A, object=B
  - Doc 2: "B leads to C" with predicate=leads_to, subject=B, object=C
  - source_ids: KB-{{RUN}}-CAU-002-A-v1 and KB-{{RUN}}-CAU-002-B-v1
  - Canary token in each document.
  - Question: what does A ultimately lead to? (answer: C, via B)

  Output ONLY this JSON:
  {
    "run_id": "{{RUN}}",
    "test_id": "CAU-002",
    "repetition": {{REP}},
    "documents": [
      {
        "source_id": "KB-{{RUN}}-CAU-002-A-v1",
        "content": "<sentence: A leads to B [canary1]>",
        "title": "<title>",
        "version": 1
      },
      {
        "source_id": "KB-{{RUN}}-CAU-002-B-v1",
        "content": "<sentence: B leads to C [canary2]>",
        "title": "<title>",
        "version": 1
      }
    ],
    "questions": [
      "Following the causal chain: what does '<A>' ultimately lead to? Search the brain and trace the full chain A→B→C."
    ],
    "expected_answers": [
      {
        "question_index": 0,
        "correct_answer": "<C>",
        "required_source_ids": ["KB-{{RUN}}-CAU-002-A-v1", "KB-{{RUN}}-CAU-002-B-v1"],
        "required_label": "INFERRED",
        "must_refuse": false,
        "notes": "Requires retrieving both documents and chaining A→B→C. Final answer is C. Label INFERRED because it requires multi-hop causal reasoning."
      }
    ]
  }

execution_instructions: |
  You have access to MCP Knowledge Brain tools.

  When ingesting, use brain_ingest with extracted.claims for each document:
  - Doc 1: predicate="leads_to", subject=<A>, object=<B>
  - Doc 2: predicate="leads_to", subject=<B>, object=<C>

  Then search the brain to trace the causal chain from A to C.

  Question: {{QUESTION}}

  Return ONLY this JSON:
  {{REQUIRED_ANSWER_SCHEMA}}

judge_instructions: |
  Ground truth:
  {{SETUP_CONTEXT_JSON}}

  Tested AI answer:
  {{EXEC_ANSWER_JSON}}

  Two causal claims were ingested: A→B and B→C. The AI must chain them to answer
  what A ultimately leads to (C). This tests 2-hop causal reasoning.

  Scoring:
  - retrieval_tool_use (0-20): Retrieved both source documents?
  - evidence_fidelity (0-20): Cited both required source_ids? Claims are accurate?
  - reasoning_correctness (0-20): Final answer is C (the end of the chain)?
  - uncertainty_discipline (0-15): Used INFERRED label (multi-hop reasoning)?
  - output_format_compliance (0-15): Valid JSON?
  - adversarial_resistance (0-10): Full marks.

  Critical failures: answered only the first hop (B instead of C), cited wrong source_ids, did not chain the reasoning.

  Return ONLY this JSON:
  {{JUDGE_RESULT_SCHEMA}}

critical_failures:
  - "only_answered_first_hop"
  - "missing_required_source_ids"
  - "wrong_final_answer"