← All Test Cases
high
CAU-001
causal
Repetitions
3
Documents
1
Questions
1
Reasoning
INFERRED
causal-extraction
predicate
leads_to
inference
📖 In Plain English
What this category tests
Does the brain extract and reason over causal relationships?
How the test works
Documents state 'X leads to Y' relationships. The brain should store these with predicate=leads_to and let the user trace causes/effects, including 2-hop chains (A→B→C).
Why it matters
Causal reasoning is a key step above pure retrieval — it lets the brain answer 'why' and 'what if' questions.
Specifically for CAU-001
Tests basic causal extraction — ingest a document with predicate='leads_to' and verify it's stored and retrievable.
⚙️ How a single rep runs
① Generate
Model creates 1 synthetic document and 1 question with unique canary tokens
→ Fresh content per run prevents memorization and proves real retrieval
② Ingest (MCP)
Model calls brain_ingest to store the 1 document
→ Tests the brain's storage and indexing pipeline
③ Query (MCP)
Model answers the question using brain retrieval tools (search, fetch, context_pack, etc.)
→ Core test — does the brain return correct evidence and let the model build a faithful answer?
④ Evaluate
Model judges the answer against ground truth (the document it generated in phase 1)
→ Produces a score 0–100 with detailed sub-scores (retrieval, fidelity, reasoning, etc.)
This rep is run 3 times per test run. A pass requires score ≥ 85 and no critical failures.
🔬 Technical Instructions (raw prompts sent to AI)
🔧 ① Setup Instructions 1641 chars
Generate a document containing a clear causal relationship using "leads to" phrasing.
Ingest it using brain_ingest with the causal claim extracted as a structured claim
with predicate="leads_to".
Requirements:
- Invent a fictional process chain: "X leads to Y" (e.g. "Poor inventory tracking leads to stock shortages").
- Use a unique canary token so the claim is uniquely identifiable.
- source_id: KB-{{RUN}}-CAU-001-A-v1
- MANDATORY: ingest using brain_ingest with extracted.claims containing:
predicate="leads_to", subject="<X>", object="<Y>", confidence=1.0
Without this structured extraction the test will fail — plain ingestion is not enough.
- Question: search for Y and verify the causal claim linking X→Y is stored.
Output ONLY this JSON:
{
"run_id": "{{RUN}}",
"test_id": "CAU-001",
"repetition": {{REP}},
"documents": [
{
"source_id": "KB-{{RUN}}-CAU-001-A-v1",
"content": "<one sentence: 'X leads to Y [canary: token]'>",
"title": "<causal process title>",
"version": 1
}
],
"questions": [
"Search the brain for '<Y>' and verify a causal claim exists showing '<X>' leads to '<Y>'. Return the claim text and its predicate if present in the search result."
],
"expected_answers": [
{
"question_index": 0,
"correct_answer": "<X> leads to <Y>",
"required_source_ids": ["KB-{{RUN}}-CAU-001-A-v1"],
"required_label": "INFERRED",
"must_refuse": false,
"notes": "The brain must store and retrieve a causal claim. The claim text should mention both X and Y. Label is INFERRED because the relationship is a causal inference."
}
]
}
▶ ② Execution Instructions 1119 chars
You have access to MCP Knowledge Brain tools.
══════════════════════════════════════════════════
MANDATORY INGEST FORMAT — DO NOT SKIP THIS STEP
══════════════════════════════════════════════════
You MUST call brain_ingest with the `extracted` parameter structured EXACTLY as shown:
brain_ingest(
text="<document content from setup_context>",
metadata={"source_label": "<source_id from setup_context>", "domain": "brain-test"},
extracted={
"claims": [
{
"text": "<the full causal sentence: X leads to Y>",
"subject": "<X — the cause>",
"predicate": "leads_to",
"object": "<Y — the effect>",
"confidence": 1.0
}
]
}
)
WITHOUT the extracted.claims.predicate="leads_to" field this test FAILS.
A plain brain_ingest without `extracted` does NOT store the causal structure.
══════════════════════════════════════════════════
After ingesting, use brain_search to find the claim by searching for Y (the effect).
Report what was returned, including any predicate field if visible.
Question: {{QUESTION}}
Return ONLY this JSON:
{{REQUIRED_ANSWER_SCHEMA}}
⚖ ③ Judge Instructions 844 chars
Ground truth:
{{SETUP_CONTEXT_JSON}}
Tested AI answer:
{{EXEC_ANSWER_JSON}}
The AI ingested a causal claim "X leads to Y" with predicate=leads_to and then searched for it.
Evaluate whether the causal claim was successfully stored and retrieved.
Scoring:
- retrieval_tool_use (0-20): Used brain_ingest with extracted.claims AND brain_search?
- evidence_fidelity (0-20): Correct source_id? Claim text mentions both cause and effect?
- reasoning_correctness (0-20): Causal relationship correctly identified and reported?
- uncertainty_discipline (0-15): Used INFERRED label appropriately?
- output_format_compliance (0-15): Valid JSON?
- adversarial_resistance (0-10): Full marks.
Critical failures: causal claim not found in brain, wrong source_id, did not use extracted.claims with predicate.
Return ONLY this JSON:
{{JUDGE_RESULT_SCHEMA}}
Critical Failure Conditions
- causal_claim_not_stored
- causal_claim_not_retrieved
- hallucinated_source_id
Recent Run History
3 runs| When | Run ID | Pass Rate | Avg Score | Reps | |
|---|---|---|---|---|---|
| 2026-05-24 13:08 | 20260524T130808Z-kqze | 0% | 77.0 | 0/1 | View → |
| 2026-05-24 12:41 | 20260524T124148Z-z2do | 100% | 85.0 | 1/1 | View → |
| 2026-05-24 11:37 | 20260524T113756Z-kduj | 0% | 75.0 | 0/1 | View → |
📄 Raw YAML cases/causal/CAU-001.yaml
schema_version: "1.0"
test_id: "CAU-001"
category: "causal"
severity: "high"
repetitions: 3
reasoning_type: "INFERRED"
num_documents: 1
num_questions: 1
tags: ["causal-extraction", "predicate", "leads_to", "inference"]
setup_instructions: |
Generate a document containing a clear causal relationship using "leads to" phrasing.
Ingest it using brain_ingest with the causal claim extracted as a structured claim
with predicate="leads_to".
Requirements:
- Invent a fictional process chain: "X leads to Y" (e.g. "Poor inventory tracking leads to stock shortages").
- Use a unique canary token so the claim is uniquely identifiable.
- source_id: KB-{{RUN}}-CAU-001-A-v1
- MANDATORY: ingest using brain_ingest with extracted.claims containing:
predicate="leads_to", subject="<X>", object="<Y>", confidence=1.0
Without this structured extraction the test will fail — plain ingestion is not enough.
- Question: search for Y and verify the causal claim linking X→Y is stored.
Output ONLY this JSON:
{
"run_id": "{{RUN}}",
"test_id": "CAU-001",
"repetition": {{REP}},
"documents": [
{
"source_id": "KB-{{RUN}}-CAU-001-A-v1",
"content": "<one sentence: 'X leads to Y [canary: token]'>",
"title": "<causal process title>",
"version": 1
}
],
"questions": [
"Search the brain for '<Y>' and verify a causal claim exists showing '<X>' leads to '<Y>'. Return the claim text and its predicate if present in the search result."
],
"expected_answers": [
{
"question_index": 0,
"correct_answer": "<X> leads to <Y>",
"required_source_ids": ["KB-{{RUN}}-CAU-001-A-v1"],
"required_label": "INFERRED",
"must_refuse": false,
"notes": "The brain must store and retrieve a causal claim. The claim text should mention both X and Y. Label is INFERRED because the relationship is a causal inference."
}
]
}
execution_instructions: |
You have access to MCP Knowledge Brain tools.
══════════════════════════════════════════════════
MANDATORY INGEST FORMAT — DO NOT SKIP THIS STEP
══════════════════════════════════════════════════
You MUST call brain_ingest with the `extracted` parameter structured EXACTLY as shown:
brain_ingest(
text="<document content from setup_context>",
metadata={"source_label": "<source_id from setup_context>", "domain": "brain-test"},
extracted={
"claims": [
{
"text": "<the full causal sentence: X leads to Y>",
"subject": "<X — the cause>",
"predicate": "leads_to",
"object": "<Y — the effect>",
"confidence": 1.0
}
]
}
)
WITHOUT the extracted.claims.predicate="leads_to" field this test FAILS.
A plain brain_ingest without `extracted` does NOT store the causal structure.
══════════════════════════════════════════════════
After ingesting, use brain_search to find the claim by searching for Y (the effect).
Report what was returned, including any predicate field if visible.
Question: {{QUESTION}}
Return ONLY this JSON:
{{REQUIRED_ANSWER_SCHEMA}}
judge_instructions: |
Ground truth:
{{SETUP_CONTEXT_JSON}}
Tested AI answer:
{{EXEC_ANSWER_JSON}}
The AI ingested a causal claim "X leads to Y" with predicate=leads_to and then searched for it.
Evaluate whether the causal claim was successfully stored and retrieved.
Scoring:
- retrieval_tool_use (0-20): Used brain_ingest with extracted.claims AND brain_search?
- evidence_fidelity (0-20): Correct source_id? Claim text mentions both cause and effect?
- reasoning_correctness (0-20): Causal relationship correctly identified and reported?
- uncertainty_discipline (0-15): Used INFERRED label appropriately?
- output_format_compliance (0-15): Valid JSON?
- adversarial_resistance (0-10): Full marks.
Critical failures: causal claim not found in brain, wrong source_id, did not use extracted.claims with predicate.
Return ONLY this JSON:
{{JUDGE_RESULT_SCHEMA}}
critical_failures:
- "causal_claim_not_stored"
- "causal_claim_not_retrieved"
- "hallucinated_source_id"