
CAU-002

Category: causal
Severity: high
Repetitions: 3
Documents: 2
Questions: 1
Reasoning: INFERRED
Tags: causal-chain, 2-hop-causal, inference, leads_to

📖 In Plain English

What this category tests

Does the brain extract and reason over causal relationships?

How the test works

Documents state 'X leads to Y' relationships. The brain should store each one as a claim with predicate=leads_to and let the user trace causes and effects, including 2-hop chains (A→B→C).
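A minimal sketch of what "store with predicate=leads_to and trace causes and effects" could look like, assuming a toy in-memory triple index. The class and method names (`CausalStore`, `ingest_claim`, `effects_of`, `causes_of`) are hypothetical and are not the brain's actual API.

```python
from collections import defaultdict


class CausalStore:
    """Hypothetical in-memory index of (subject, leads_to, object) claims."""

    def __init__(self):
        # Forward index: cause -> effects. Reverse index: effect -> causes.
        self._effects = defaultdict(set)
        self._causes = defaultdict(set)

    def ingest_claim(self, subject, predicate, obj):
        # This sketch only indexes causal claims; other predicates are ignored.
        if predicate != "leads_to":
            return
        self._effects[subject].add(obj)
        self._causes[obj].add(subject)

    def effects_of(self, concept):
        # .get() avoids creating empty entries in the defaultdict.
        return sorted(self._effects.get(concept, ()))

    def causes_of(self, concept):
        return sorted(self._causes.get(concept, ()))


store = CausalStore()
store.ingest_claim("A", "leads_to", "B")  # claim from Doc 1
store.ingest_claim("B", "leads_to", "C")  # claim from Doc 2
print(store.effects_of("A"))  # ['B']
print(store.causes_of("C"))   # ['B']
```

Keeping both a forward and a reverse index lets the same store answer "what does X lead to?" and "what causes Y?" without scanning every claim.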

Why it matters

Causal reasoning is a key step beyond pure retrieval: it lets the brain answer 'why' and 'what if' questions.

Specifically for CAU-002

Tests 2-hop causal chains: A→B and B→C are ingested as separate documents, and the query asks what A ultimately leads to. The correct answer is C, reached via B.
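The chaining step can be sketched as following leads_to edges from A until a concept has no further effect. This assumes each concept has at most one effect (true for CAU-002's two documents); `trace_chain` is a hypothetical helper, not part of the test harness.

```python
def trace_chain(edges, start, max_hops=10):
    """Follow leads_to edges from `start` to the end of the chain.

    edges: dict mapping subject -> object (assumes one effect per concept).
    Returns the full path, e.g. ["A", "B", "C"]. Stops after max_hops.
    """
    path = [start]
    seen = {start}
    current = start
    for _ in range(max_hops):
        nxt = edges.get(current)
        if nxt is None:
            break  # no further effect: `current` is the end of the chain
        if nxt in seen:
            raise ValueError(f"cycle detected at {nxt!r}")
        path.append(nxt)
        seen.add(nxt)
        current = nxt
    return path


# Claims from Doc 1 (A -> B) and Doc 2 (B -> C), ingested separately.
edges = {"A": "B", "B": "C"}
path = trace_chain(edges, "A")
print(" -> ".join(path))  # A -> B -> C
print(path[-1])           # C
```

The final answer is `path[-1]`. Answering `path[1]` (B) is exactly the "answered only the first hop" critical failure this test is designed to catch.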

⚙️ How a single rep runs

① Generate
Model creates 2 synthetic documents and 1 question, with a unique canary token in each document
→ Fresh content per run prevents memorization and shows that answers come from real retrieval
② Ingest (MCP)
Model calls brain_ingest to store the 2 documents
→ Tests the brain's storage and indexing pipeline
③ Query (MCP)
Model answers the question using brain retrieval tools (search, fetch, context_pack, etc.)
→ Core test: does the brain return the correct evidence and let the model build a faithful answer?
④ Evaluate
Model judges the answer against the ground truth (the setup output from phase ①, including the expected answer)
→ Produces a score 0–100 with detailed sub-scores (retrieval, fidelity, reasoning, etc.)

This rep is run 3 times per test run. A pass requires score ≥ 85 and no critical failures.
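The pass rule can be expressed as a small predicate plus a per-run summary. This is a sketch assuming each rep yields a numeric score and a list of critical-failure IDs; `RepResult`, `rep_passes`, and `summarize` are hypothetical names, and averaging over all reps (passing or not) is an assumption about how "Avg Score" is computed.

```python
from dataclasses import dataclass, field

PASS_THRESHOLD = 85  # a rep passes only with score >= 85 ...


@dataclass
class RepResult:
    score: float
    critical_failures: list = field(default_factory=list)


def rep_passes(rep):
    # ... AND no critical failures, regardless of how high the score is.
    return rep.score >= PASS_THRESHOLD and not rep.critical_failures


def summarize(reps):
    if not reps:
        raise ValueError("no repetitions to summarize")
    passed = sum(1 for r in reps if rep_passes(r))
    return {
        "passed": passed,
        "total": len(reps),
        "pass_rate": passed / len(reps),
        "avg_score": sum(r.score for r in reps) / len(reps),
    }


reps = [
    RepResult(95),
    RepResult(88, ["only_answered_first_hop"]),  # high score, still fails
    RepResult(84),                               # just below threshold
]
print(summarize(reps))
```

The second rep shows why the critical-failure check matters: a score of 88 clears the threshold, but answering B instead of C still fails the rep.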

🔬 Technical Instructions (raw prompts sent to AI)

🔧 ① Setup Instructions
Generate TWO documents forming a causal chain: A leads to B, B leads to C.
The query asks what A ultimately leads to (the answer requires chaining A→B→C).

Requirements:
- Invent 3 fictional concepts: A, B, C.
- Doc 1: "A leads to B" with predicate=leads_to, subject=A, object=B
- Doc 2: "B leads to C" with predicate=leads_to, subject=B, object=C
- source_ids: KB-{{RUN}}-CAU-002-A-v1 and KB-{{RUN}}-CAU-002-B-v1
- Canary token in each document.
- Question: what does A ultimately lead to? (answer: C, via B)

Output ONLY this JSON:
{
  "run_id": "{{RUN}}",
  "test_id": "CAU-002",
  "repetition": {{REP}},
  "documents": [
    {
      "source_id": "KB-{{RUN}}-CAU-002-A-v1",
      "content": "<sentence: A leads to B [canary1]>",
      "title": "<title>",
      "version": 1
    },
    {
      "source_id": "KB-{{RUN}}-CAU-002-B-v1",
      "content": "<sentence: B leads to C [canary2]>",
      "title": "<title>",
      "version": 1
    }
  ],
  "questions": [
    "Following the causal chain: what does '<A>' ultimately lead to? Search the brain and trace the full chain A→B→C."
  ],
  "expected_answers": [
    {
      "question_index": 0,
      "correct_answer": "<C>",
      "required_source_ids": ["KB-{{RUN}}-CAU-002-A-v1", "KB-{{RUN}}-CAU-002-B-v1"],
      "required_label": "INFERRED",
      "must_refuse": false,
      "notes": "Requires retrieving both documents and chaining A→B→C. Final answer is C. Label INFERRED because it requires multi-hop causal reasoning."
    }
  ]
}
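Before ingestion, a harness could sanity-check the generated setup JSON against the requirements above. This is a sketch only: `validate_setup` is a hypothetical helper, and it does not check canary tokens because their format is not specified here.

```python
import json


def validate_setup(raw, run_id, rep):
    """Return a list of problems with a CAU-002 setup output (empty = OK)."""
    data = json.loads(raw)
    expected_ids = [f"KB-{run_id}-CAU-002-A-v1", f"KB-{run_id}-CAU-002-B-v1"]
    problems = []
    if data.get("test_id") != "CAU-002":
        problems.append("wrong test_id")
    if data.get("run_id") != run_id or data.get("repetition") != rep:
        problems.append("run_id/repetition mismatch")
    # Doc 1 must carry the -A- source_id and Doc 2 the -B- source_id.
    ids = [d.get("source_id") for d in data.get("documents", [])]
    if ids != expected_ids:
        problems.append("unexpected source_ids")
    if len(data.get("questions", [])) != 1:
        problems.append("expected exactly one question")
    answers = data.get("expected_answers", [])
    if not answers or answers[0].get("required_label") != "INFERRED":
        problems.append("required_label must be INFERRED")
    elif sorted(answers[0].get("required_source_ids", [])) != sorted(expected_ids):
        problems.append("required_source_ids must list both documents")
    return problems
```

Checking `required_source_ids` against both documents mirrors the judge's evidence_fidelity criterion, so a malformed setup is caught before it can make a correct answer look wrong.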
▶ ② Execution Instructions
You have access to MCP Knowledge Brain tools.

When ingesting, use brain_ingest with extracted.claims for each document:
- Doc 1: predicate="leads_to", subject=<A>, object=<B>
- Doc 2: predicate="leads_to", subject=<B>, object=<C>

Then search the brain to trace the causal chain from A to C.

Question: {{QUESTION}}

Return ONLY this JSON:
{{REQUIRED_ANSWER_SCHEMA}}
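The ingestion step above could produce arguments shaped roughly like this. Only the `extracted.claims` entries with `predicate`, `subject`, and `object` come from the instructions. The other field names are assumptions copied from the setup JSON, not the documented brain_ingest schema, and `build_ingest_payload` is a hypothetical helper.

```python
import json


def build_ingest_payload(doc, subject, obj):
    """Hypothetical brain_ingest arguments for one CAU-002 document.

    Assumption: brain_ingest accepts the setup document fields directly,
    plus an `extracted.claims` list. Verify against the real tool schema.
    """
    return {
        "source_id": doc["source_id"],
        "title": doc["title"],
        "content": doc["content"],
        "version": doc["version"],
        "extracted": {
            "claims": [
                {"predicate": "leads_to", "subject": subject, "object": obj}
            ]
        },
    }


doc1 = {"source_id": "KB-R1-CAU-002-A-v1", "title": "Doc 1",
        "content": "A leads to B.", "version": 1}
print(json.dumps(build_ingest_payload(doc1, "A", "B"), indent=2))
```

Doc 2 would be ingested the same way with `subject="B"` and `object="C"`, giving the brain two separate claims that only connect through B.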
⚖ ③ Judge Instructions
Ground truth:
{{SETUP_CONTEXT_JSON}}

Tested AI answer:
{{EXEC_ANSWER_JSON}}

Two causal claims were ingested: A→B and B→C. The AI must chain them to answer
what A ultimately leads to (C). This tests 2-hop causal reasoning.

Scoring:
- retrieval_tool_use (0-20): Retrieved both source documents?
- evidence_fidelity (0-20): Cited both required source_ids? Claims are accurate?
- reasoning_correctness (0-20): Final answer is C (the end of the chain)?
- uncertainty_discipline (0-15): Used INFERRED label (multi-hop reasoning)?
- output_format_compliance (0-15): Valid JSON?
- adversarial_resistance (0-10): Full marks.

Critical failures: answered only the first hop (B instead of C), cited wrong source_ids, did not chain the reasoning.

Return ONLY this JSON:
{{JUDGE_RESULT_SCHEMA}}
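The six sub-score caps in the judge rubric add up to 100, so a judge result can be checked by range-checking each sub-score and summing. A minimal sketch assuming the judge returns the sub-scores as a name-to-number mapping; `total_score` is a hypothetical helper.

```python
SUBSCORE_CAPS = {
    "retrieval_tool_use": 20,
    "evidence_fidelity": 20,
    "reasoning_correctness": 20,
    "uncertainty_discipline": 15,
    "output_format_compliance": 15,
    "adversarial_resistance": 10,
}
assert sum(SUBSCORE_CAPS.values()) == 100  # caps cover the full 0-100 scale


def total_score(subscores):
    """Validate each sub-score against its cap and return the total."""
    missing = SUBSCORE_CAPS.keys() - subscores.keys()
    if missing:
        raise ValueError(f"missing sub-scores: {sorted(missing)}")
    total = 0
    for name, cap in SUBSCORE_CAPS.items():
        value = subscores[name]
        if not 0 <= value <= cap:
            raise ValueError(f"{name}={value} outside 0-{cap}")
        total += value
    return total


# Perfect answer except for a missing INFERRED label.
scores = dict(SUBSCORE_CAPS)
scores["uncertainty_discipline"] = 5
print(total_score(scores))  # 90
```

A total of 90 clears the 85 threshold, so in this example the pass/fail outcome would still hinge on whether any critical failure was recorded.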

Critical Failure Conditions

- only_answered_first_hop
- missing_required_source_ids
- wrong_final_answer

Recent Run History (3 runs)

When              Run ID                   Pass Rate   Avg Score   Reps
2026-05-24 13:08  20260524T130808Z-kqze    100%        95.0        1/1
2026-05-24 12:41  20260524T124148Z-z2do    100%        92.0        1/1
2026-05-24 11:37  20260524T113756Z-kduj    100%        85.0        1/1
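The run IDs above appear to embed a UTC start timestamp followed by a short random suffix, which matches the When column to the minute. A sketch assuming the format `YYYYMMDDTHHMMSSZ-suffix`, inferred from these three IDs rather than from a documented spec; `run_started_at` is a hypothetical helper.

```python
from datetime import datetime, timezone


def run_started_at(run_id):
    """Split a run ID like '20260524T130808Z-kqze' into (UTC datetime, suffix).

    Assumption: the part before the first '-' is a compact ISO 8601 UTC stamp.
    """
    stamp, _, suffix = run_id.partition("-")
    started = datetime.strptime(stamp, "%Y%m%dT%H%M%SZ").replace(tzinfo=timezone.utc)
    return started, suffix


started, suffix = run_started_at("20260524T130808Z-kqze")
print(started.isoformat(), suffix)  # 2026-05-24T13:08:08+00:00 kqze
```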
📄 Raw YAML: cases/causal/CAU-002.yaml
schema_version: "1.0"
test_id: "CAU-002"
category: "causal"
severity: "high"
repetitions: 3
reasoning_type: "INFERRED"
num_documents: 2
num_questions: 1
tags: ["causal-chain", "2-hop-causal", "inference", "leads_to"]

setup_instructions: |
  Generate TWO documents forming a causal chain: A leads to B, B leads to C.
  The query asks what A ultimately leads to (the answer requires chaining A→B→C).

  Requirements:
  - Invent 3 fictional concepts: A, B, C.
  - Doc 1: "A leads to B" with predicate=leads_to, subject=A, object=B
  - Doc 2: "B leads to C" with predicate=leads_to, subject=B, object=C
  - source_ids: KB-{{RUN}}-CAU-002-A-v1 and KB-{{RUN}}-CAU-002-B-v1
  - Canary token in each document.
  - Question: what does A ultimately lead to? (answer: C, via B)

  Output ONLY this JSON:
  {
    "run_id": "{{RUN}}",
    "test_id": "CAU-002",
    "repetition": {{REP}},
    "documents": [
      {
        "source_id": "KB-{{RUN}}-CAU-002-A-v1",
        "content": "<sentence: A leads to B [canary1]>",
        "title": "<title>",
        "version": 1
      },
      {
        "source_id": "KB-{{RUN}}-CAU-002-B-v1",
        "content": "<sentence: B leads to C [canary2]>",
        "title": "<title>",
        "version": 1
      }
    ],
    "questions": [
      "Following the causal chain: what does '<A>' ultimately lead to? Search the brain and trace the full chain A→B→C."
    ],
    "expected_answers": [
      {
        "question_index": 0,
        "correct_answer": "<C>",
        "required_source_ids": ["KB-{{RUN}}-CAU-002-A-v1", "KB-{{RUN}}-CAU-002-B-v1"],
        "required_label": "INFERRED",
        "must_refuse": false,
        "notes": "Requires retrieving both documents and chaining A→B→C. Final answer is C. Label INFERRED because it requires multi-hop causal reasoning."
      }
    ]
  }

execution_instructions: |
  You have access to MCP Knowledge Brain tools.

  When ingesting, use brain_ingest with extracted.claims for each document:
  - Doc 1: predicate="leads_to", subject=<A>, object=<B>
  - Doc 2: predicate="leads_to", subject=<B>, object=<C>

  Then search the brain to trace the causal chain from A to C.

  Question: {{QUESTION}}

  Return ONLY this JSON:
  {{REQUIRED_ANSWER_SCHEMA}}

judge_instructions: |
  Ground truth:
  {{SETUP_CONTEXT_JSON}}

  Tested AI answer:
  {{EXEC_ANSWER_JSON}}

  Two causal claims were ingested: A→B and B→C. The AI must chain them to answer
  what A ultimately leads to (C). This tests 2-hop causal reasoning.

  Scoring:
  - retrieval_tool_use (0-20): Retrieved both source documents?
  - evidence_fidelity (0-20): Cited both required source_ids? Claims are accurate?
  - reasoning_correctness (0-20): Final answer is C (the end of the chain)?
  - uncertainty_discipline (0-15): Used INFERRED label (multi-hop reasoning)?
  - output_format_compliance (0-15): Valid JSON?
  - adversarial_resistance (0-10): Full marks.

  Critical failures: answered only the first hop (B instead of C), cited wrong source_ids, did not chain the reasoning.

  Return ONLY this JSON:
  {{JUDGE_RESULT_SCHEMA}}

critical_failures:
  - "only_answered_first_hop"
  - "missing_required_source_ids"
  - "wrong_final_answer"
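The instruction templates in this YAML use `{{NAME}}` placeholders (`{{RUN}}`, `{{REP}}`, `{{QUESTION}}`, `{{SETUP_CONTEXT_JSON}}`, and so on). A minimal renderer sketch, assuming plain string substitution that fails loudly on any placeholder left unset; `render` is a hypothetical helper, not the harness's actual templating code.

```python
import json
import re

# Matches {{NAME}} where NAME is upper-case letters and underscores.
PLACEHOLDER = re.compile(r"\{\{([A-Z_]+)\}\}")


def render(template, values):
    """Substitute every {{NAME}} placeholder; raise KeyError if one is unset."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value for placeholder {name}")
        return str(values[name])
    return PLACEHOLDER.sub(substitute, template)


# {{REP}} sits unquoted in the setup JSON, so it must render as a bare number.
fragment = '{"run_id": "{{RUN}}", "repetition": {{REP}}}'
rendered = render(fragment, {"RUN": "20260524T130808Z-kqze", "REP": 1})
print(json.loads(rendered))  # {'run_id': '20260524T130808Z-kqze', 'repetition': 1}
```

Raising on unknown placeholders matters here because a leftover `{{REP}}` would silently produce invalid JSON in the setup output.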