CAU-002
Severity: high
Category: causal
Repetitions: 3
Documents: 2
Questions: 1
Reasoning: INFERRED
Tags: causal-chain, 2-hop-causal, inference, leads_to
📖 In Plain English
What this category tests
Does the brain extract and reason over causal relationships?
How the test works
Documents state 'X leads to Y' relationships. The brain should store these with predicate=leads_to and let the user trace causes/effects, including 2-hop chains (A→B→C).
Why it matters
Causal reasoning goes a step beyond pure retrieval: it lets the brain answer 'why' and 'what if' questions.
Specifically for CAU-002
Tests 2-hop causal chains: A→B and B→C are ingested as separate documents, and the query asks what A ultimately leads to. The expected answer is C.
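The chaining this test expects can be illustrated with a minimal sketch. It is not the brain's implementation: it assumes claims are available as plain (subject, predicate, object) triples and that each cause has at most one effect. The function name `trace_chain` and the concept names are invented for illustration.

```python
# Each claim is a (subject, predicate, object) triple, mirroring the
# predicate=leads_to claims described above.
Claim = tuple[str, str, str]


def trace_chain(claims: list[Claim], start: str, max_hops: int = 10) -> list[str]:
    """Follow leads_to edges from `start` and return every node visited, in order."""
    # Assumption: one effect per cause, so a dict is enough to hold the edges.
    edges = {subj: obj for subj, pred, obj in claims if pred == "leads_to"}
    chain = [start]
    seen = {start}
    while chain[-1] in edges and len(chain) <= max_hops:
        nxt = edges[chain[-1]]
        if nxt in seen:  # stop instead of looping forever on a cycle
            break
        chain.append(nxt)
        seen.add(nxt)
    return chain


# Fictional concepts A, B, C, ingested as two separate claims.
claims = [
    ("glimmer drift", "leads_to", "lattice fatigue"),   # Doc 1: A leads to B
    ("lattice fatigue", "leads_to", "vault silence"),   # Doc 2: B leads to C
]
chain = trace_chain(claims, "glimmer drift")
print(" → ".join(chain))
```

The last element of `chain` is the 2-hop answer (C); stopping at the second element would be the first-hop-only mistake the test is designed to catch.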
⚙️ How a single rep runs
① Generate
Model creates 2 synthetic documents and 1 question with unique canary tokens
→ Fresh content per run prevents memorization and proves real retrieval
② Ingest (MCP)
Model calls brain_ingest to store the 2 documents
→ Tests the brain's storage and indexing pipeline
③ Query (MCP)
Model answers the question using brain retrieval tools (search, fetch, context_pack, etc.)
→ Core test — does the brain return correct evidence and let the model build a faithful answer?
④ Evaluate
Model judges the answer against ground truth (the documents and expected answer it generated in step ①)
→ Produces a score 0–100 with detailed sub-scores (retrieval, fidelity, reasoning, etc.)
These four steps make up one rep, and each test run executes 3 reps. A pass requires score ≥ 85 and no critical failures.
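The pass rule can be written down as a short sketch. It assumes the rule applies per rep and that a run's pass rate is the fraction of passing reps; neither detail is spelled out here, and the function names are illustrative only.

```python
PASS_THRESHOLD = 85  # from the rule above: a pass requires score >= 85


def rep_passes(score: float, critical_failures: list[str]) -> bool:
    """A rep passes only if it meets the threshold and has no critical failures."""
    return score >= PASS_THRESHOLD and not critical_failures


def pass_rate(reps: list[tuple[float, list[str]]]) -> float:
    """Fraction of passing reps (assumed aggregation, not confirmed by the page)."""
    if not reps:
        return 0.0
    return sum(rep_passes(score, failures) for score, failures in reps) / len(reps)


# Three hypothetical reps: the third scores well but trips a critical failure.
reps = [(95, []), (85, []), (92, ["only_answered_first_hop"])]
print(f"{pass_rate(reps):.0%}")
```

Note that a high score does not rescue a rep with a critical failure: both conditions must hold.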
🔬 Technical Instructions (raw prompts sent to AI)
🔧 ① Setup Instructions 1491 chars
Generate TWO documents forming a causal chain: A leads to B, B leads to C.
The query asks what A ultimately leads to (the answer requires chaining A→B→C).
Requirements:
- Invent 3 fictional concepts: A, B, C.
- Doc 1: "A leads to B" with predicate=leads_to, subject=A, object=B
- Doc 2: "B leads to C" with predicate=leads_to, subject=B, object=C
- source_ids: KB-{{RUN}}-CAU-002-A-v1 and KB-{{RUN}}-CAU-002-B-v1
- Canary token in each document.
- Question: what does A ultimately lead to? (answer: C, via B)
Output ONLY this JSON:
{
  "run_id": "{{RUN}}",
  "test_id": "CAU-002",
  "repetition": {{REP}},
  "documents": [
    {
      "source_id": "KB-{{RUN}}-CAU-002-A-v1",
      "content": "<sentence: A leads to B [canary1]>",
      "title": "<title>",
      "version": 1
    },
    {
      "source_id": "KB-{{RUN}}-CAU-002-B-v1",
      "content": "<sentence: B leads to C [canary2]>",
      "title": "<title>",
      "version": 1
    }
  ],
  "questions": [
    "Following the causal chain: what does '<A>' ultimately lead to? Search the brain and trace the full chain A→B→C."
  ],
  "expected_answers": [
    {
      "question_index": 0,
      "correct_answer": "<C>",
      "required_source_ids": ["KB-{{RUN}}-CAU-002-A-v1", "KB-{{RUN}}-CAU-002-B-v1"],
      "required_label": "INFERRED",
      "must_refuse": false,
      "notes": "Requires retrieving both documents and chaining A→B→C. Final answer is C. Label INFERRED because it requires multi-hop causal reasoning."
    }
  ]
}
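A harness could sanity-check the generated setup JSON before ingesting anything. The sketch below is one possible check under stated assumptions: the helper names, the `demo-run` run id, and the sample concepts are hypothetical, and only the field names and the source_id pattern come from the template above.

```python
def expected_source_ids(run_id: str) -> list[str]:
    """Fill the {{RUN}} placeholder for the two CAU-002 documents."""
    return [f"KB-{run_id}-CAU-002-{suffix}-v1" for suffix in ("A", "B")]


def check_setup(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the payload looks sane."""
    problems: list[str] = []
    want = expected_source_ids(str(payload.get("run_id", "")))

    got = [doc.get("source_id") for doc in payload.get("documents", [])]
    if got != want:
        problems.append(f"document source_ids {got} do not match {want}")

    if len(payload.get("questions", [])) != 1:
        problems.append("expected exactly one question")

    expected = payload.get("expected_answers", [])
    if len(expected) != 1:
        problems.append("expected exactly one expected_answers entry")
    for entry in expected:
        if entry.get("required_label") != "INFERRED":
            problems.append("required_label must be INFERRED")
        if sorted(entry.get("required_source_ids", [])) != sorted(want):
            problems.append("required_source_ids must list both documents")
        if entry.get("must_refuse") is not False:
            problems.append("must_refuse must be false")
    return problems


run_id = "demo-run"  # hypothetical run id, for illustration only
ids = expected_source_ids(run_id)
payload = {
    "run_id": run_id,
    "test_id": "CAU-002",
    "repetition": 1,
    "documents": [
        {"source_id": ids[0], "content": "A leads to B [canary1]", "title": "Doc 1", "version": 1},
        {"source_id": ids[1], "content": "B leads to C [canary2]", "title": "Doc 2", "version": 1},
    ],
    "questions": ["What does 'A' ultimately lead to?"],
    "expected_answers": [
        {
            "question_index": 0,
            "correct_answer": "C",
            "required_source_ids": ids,
            "required_label": "INFERRED",
            "must_refuse": False,
        }
    ],
}
print(check_setup(payload))
```

Returning a list of problems rather than raising on the first one makes it easier to see everything wrong with a malformed generation in a single pass.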
▶ ② Execution Instructions 368 chars
You have access to MCP Knowledge Brain tools.
When ingesting, use brain_ingest with extracted.claims for each document:
- Doc 1: predicate="leads_to", subject=<A>, object=<B>
- Doc 2: predicate="leads_to", subject=<B>, object=<C>
Then search the brain to trace the causal chain from A to C.
Question: {{QUESTION}}
Return ONLY this JSON:
{{REQUIRED_ANSWER_SCHEMA}}
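For illustration, the ingest arguments for one document might be assembled as below. The real `brain_ingest` input schema is not shown on this page, so the argument layout here is a guess that mirrors the setup JSON fields plus an `extracted.claims` list; `build_ingest_args` and the sample values are hypothetical.

```python
def build_ingest_args(source_id: str, title: str, content: str,
                      subject: str, obj: str) -> dict:
    """Assemble arguments for one brain_ingest call (assumed layout, not the real schema)."""
    return {
        "source_id": source_id,
        "title": title,
        "content": content,
        "version": 1,
        "extracted": {
            "claims": [
                # One causal claim per document, as the instructions above require.
                {"subject": subject, "predicate": "leads_to", "object": obj},
            ]
        },
    }


# Hypothetical run id and fictional concepts.
doc1 = build_ingest_args("KB-demo-run-CAU-002-A-v1", "Doc 1",
                         "Glimmer drift leads to lattice fatigue. [canary1]",
                         "glimmer drift", "lattice fatigue")
doc2 = build_ingest_args("KB-demo-run-CAU-002-B-v1", "Doc 2",
                         "Lattice fatigue leads to vault silence. [canary2]",
                         "lattice fatigue", "vault silence")

# The object of the first claim must match the subject of the second,
# otherwise the two hops cannot be joined at query time.
assert doc1["extracted"]["claims"][0]["object"] == doc2["extracted"]["claims"][0]["subject"]
```

The closing assertion captures the property that makes the chain traceable: both documents must name the intermediate concept B identically.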
⚖ ③ Judge Instructions 786 chars
Ground truth:
{{SETUP_CONTEXT_JSON}}
Tested AI answer:
{{EXEC_ANSWER_JSON}}
Two causal claims were ingested: A→B and B→C. The AI must chain them to answer
what A ultimately leads to (C). This tests 2-hop causal reasoning.
Scoring:
- retrieval_tool_use (0-20): Retrieved both source documents?
- evidence_fidelity (0-20): Cited both required source_ids? Claims are accurate?
- reasoning_correctness (0-20): Final answer is C (the end of the chain)?
- uncertainty_discipline (0-15): Used INFERRED label (multi-hop reasoning)?
- output_format_compliance (0-15): Valid JSON?
- adversarial_resistance (0-10): Full marks.
Critical failures: answered only the first hop (B instead of C), cited wrong source_ids, did not chain the reasoning.
Return ONLY this JSON:
{{JUDGE_RESULT_SCHEMA}}
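The six sub-score maxima in the rubric add up to 100, which matches the 0–100 overall score. A small sketch of validating and totalling sub-scores follows; it assumes the overall score is the plain sum of the sub-scores, which the judge result schema (not shown here) may or may not confirm.

```python
# Maximum points per rubric dimension, copied from the scoring list above.
RUBRIC_MAX = {
    "retrieval_tool_use": 20,
    "evidence_fidelity": 20,
    "reasoning_correctness": 20,
    "uncertainty_discipline": 15,
    "output_format_compliance": 15,
    "adversarial_resistance": 10,
}


def total_score(sub_scores: dict[str, float]) -> float:
    """Validate each sub-score against its range and return the sum (assumed aggregation)."""
    missing = set(RUBRIC_MAX) - set(sub_scores)
    if missing:
        raise ValueError(f"missing sub-scores: {sorted(missing)}")
    for name, value in sub_scores.items():
        if name not in RUBRIC_MAX:
            raise ValueError(f"unknown sub-score: {name}")
        if not 0 <= value <= RUBRIC_MAX[name]:
            raise ValueError(f"{name}={value} outside 0-{RUBRIC_MAX[name]}")
    return sum(sub_scores.values())


# Hypothetical judged rep: everything correct except the INFERRED label was omitted.
scores = dict(RUBRIC_MAX, uncertainty_discipline=0)
print(total_score(scores))
```

Under this reading, forgetting the INFERRED label costs at most 15 points, so a rep with an otherwise perfect answer would sit exactly at the 85-point pass threshold.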
Critical Failure Conditions
- only_answered_first_hop
- missing_required_source_ids
- wrong_final_answer
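The three conditions could be approximated mechanically as follows. This is only a sketch: in the actual pipeline a model judges the answer, whereas this version assumes exact (case-insensitive) string matching, and the function and parameter names are invented for illustration.

```python
def _norm(text: str) -> str:
    """Normalise for comparison; a real judge would be far more lenient than this."""
    return text.strip().casefold()


def detect_critical_failures(answer: str, cited_ids: list[str], *,
                             first_hop: str, final: str,
                             required_ids: list[str]) -> list[str]:
    """Return the critical failure codes that apply to one answer."""
    failures = []
    if _norm(answer) == _norm(first_hop):
        failures.append("only_answered_first_hop")
    if not set(required_ids) <= set(cited_ids):
        failures.append("missing_required_source_ids")
    if _norm(answer) != _norm(final):
        failures.append("wrong_final_answer")
    return failures


required = ["KB-demo-run-CAU-002-A-v1", "KB-demo-run-CAU-002-B-v1"]  # hypothetical run id
common = dict(first_hop="lattice fatigue", final="vault silence", required_ids=required)

print(detect_critical_failures("Vault silence", required, **common))
print(detect_critical_failures("lattice fatigue", required[:1], **common))
```

In this sketch an answer that stops at B is flagged both as a first-hop-only answer and as a wrong final answer; whether the real judge reports both codes is an assumption.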
Recent Run History
3 runs
| When | Run ID | Pass Rate | Avg Score | Reps |
|---|---|---|---|---|
| 2026-05-24 13:08 | 20260524T130808Z-kqze | 100% | 95.0 | 1/1 |
| 2026-05-24 12:41 | 20260524T124148Z-z2do | 100% | 92.0 | 1/1 |
| 2026-05-24 11:37 | 20260524T113756Z-kduj | 100% | 85.0 | 1/1 |
📄 Raw YAML cases/causal/CAU-002.yaml
schema_version: "1.0"
test_id: "CAU-002"
category: "causal"
severity: "high"
repetitions: 3
reasoning_type: "INFERRED"
num_documents: 2
num_questions: 1
tags: ["causal-chain", "2-hop-causal", "inference", "leads_to"]
setup_instructions: |
  Generate TWO documents forming a causal chain: A leads to B, B leads to C.
  The query asks what A ultimately leads to (the answer requires chaining A→B→C).
  Requirements:
  - Invent 3 fictional concepts: A, B, C.
  - Doc 1: "A leads to B" with predicate=leads_to, subject=A, object=B
  - Doc 2: "B leads to C" with predicate=leads_to, subject=B, object=C
  - source_ids: KB-{{RUN}}-CAU-002-A-v1 and KB-{{RUN}}-CAU-002-B-v1
  - Canary token in each document.
  - Question: what does A ultimately lead to? (answer: C, via B)
  Output ONLY this JSON:
  {
    "run_id": "{{RUN}}",
    "test_id": "CAU-002",
    "repetition": {{REP}},
    "documents": [
      {
        "source_id": "KB-{{RUN}}-CAU-002-A-v1",
        "content": "<sentence: A leads to B [canary1]>",
        "title": "<title>",
        "version": 1
      },
      {
        "source_id": "KB-{{RUN}}-CAU-002-B-v1",
        "content": "<sentence: B leads to C [canary2]>",
        "title": "<title>",
        "version": 1
      }
    ],
    "questions": [
      "Following the causal chain: what does '<A>' ultimately lead to? Search the brain and trace the full chain A→B→C."
    ],
    "expected_answers": [
      {
        "question_index": 0,
        "correct_answer": "<C>",
        "required_source_ids": ["KB-{{RUN}}-CAU-002-A-v1", "KB-{{RUN}}-CAU-002-B-v1"],
        "required_label": "INFERRED",
        "must_refuse": false,
        "notes": "Requires retrieving both documents and chaining A→B→C. Final answer is C. Label INFERRED because it requires multi-hop causal reasoning."
      }
    ]
  }
execution_instructions: |
  You have access to MCP Knowledge Brain tools.
  When ingesting, use brain_ingest with extracted.claims for each document:
  - Doc 1: predicate="leads_to", subject=<A>, object=<B>
  - Doc 2: predicate="leads_to", subject=<B>, object=<C>
  Then search the brain to trace the causal chain from A to C.
  Question: {{QUESTION}}
  Return ONLY this JSON:
  {{REQUIRED_ANSWER_SCHEMA}}
judge_instructions: |
  Ground truth:
  {{SETUP_CONTEXT_JSON}}
  Tested AI answer:
  {{EXEC_ANSWER_JSON}}
  Two causal claims were ingested: A→B and B→C. The AI must chain them to answer
  what A ultimately leads to (C). This tests 2-hop causal reasoning.
  Scoring:
  - retrieval_tool_use (0-20): Retrieved both source documents?
  - evidence_fidelity (0-20): Cited both required source_ids? Claims are accurate?
  - reasoning_correctness (0-20): Final answer is C (the end of the chain)?
  - uncertainty_discipline (0-15): Used INFERRED label (multi-hop reasoning)?
  - output_format_compliance (0-15): Valid JSON?
  - adversarial_resistance (0-10): Full marks.
  Critical failures: answered only the first hop (B instead of C), cited wrong source_ids, did not chain the reasoning.
  Return ONLY this JSON:
  {{JUDGE_RESULT_SCHEMA}}
critical_failures:
  - "only_answered_first_hop"
  - "missing_required_source_ids"
  - "wrong_final_answer"