← All Test Cases

CAU-001

causal

high

Repetitions

Documents

Questions

Reasoning

INFERRED

causal-extraction predicate leads_to inference

📖 In Plain English

What this category tests

Does the brain extract and reason over causal relationships?

How the test works

Documents state 'X leads to Y' relationships. The brain should store these with predicate=leads_to and let the user trace causes/effects, including 2-hop chains (A→B→C).

Why it matters

Causal reasoning is a key step above pure retrieval — it lets the brain answer 'why' and 'what if' questions.

Specifically for CAU-001

Tests basic causal extraction — ingest a document with predicate='leads_to' and verify it's stored and retrievable.

⚙️ How a single rep runs

① Generate

Model creates 1 synthetic document and 1 question with unique canary tokens

→ Fresh content per run prevents memorization and proves real retrieval

② Ingest (MCP)

Model calls brain_ingest to store the 1 document

→ Tests the brain's storage and indexing pipeline

③ Query (MCP)

Model answers the question using brain retrieval tools (search, fetch, context_pack, etc.)

→ Core test — does the brain return correct evidence and let the model build a faithful answer?

④ Evaluate

Model judges the answer against ground truth (the document it generated in phase 1)

→ Produces a score 0–100 with detailed sub-scores (retrieval, fidelity, reasoning, etc.)

This rep is run 3 times per test run. A pass requires score ≥ 85 and no critical failures.

🔬 Technical Instructions (raw prompts sent to AI)

🔧 ① Setup Instructions 1641 chars

Generate a document containing a clear causal relationship using "leads to" phrasing.
Ingest it using brain_ingest with the causal claim extracted as a structured claim
with predicate="leads_to".

Requirements:
- Invent a fictional process chain: "X leads to Y" (e.g. "Poor inventory tracking leads to stock shortages").
- Use a unique canary token so the claim is uniquely identifiable.
- source_id: KB-{{RUN}}-CAU-001-A-v1
- MANDATORY: ingest using brain_ingest with extracted.claims containing:
    predicate="leads_to", subject="<X>", object="<Y>", confidence=1.0
  Without this structured extraction the test will fail — plain ingestion is not enough.
- Question: search for Y and verify the causal claim linking X→Y is stored.

Output ONLY this JSON:
{
  "run_id": "{{RUN}}",
  "test_id": "CAU-001",
  "repetition": {{REP}},
  "documents": [
    {
      "source_id": "KB-{{RUN}}-CAU-001-A-v1",
      "content": "<one sentence: 'X leads to Y [canary: token]'>",
      "title": "<causal process title>",
      "version": 1
    }
  ],
  "questions": [
    "Search the brain for '<Y>' and verify a causal claim exists showing '<X>' leads to '<Y>'. Return the claim text and its predicate if present in the search result."
  ],
  "expected_answers": [
    {
      "question_index": 0,
      "correct_answer": "<X> leads to <Y>",
      "required_source_ids": ["KB-{{RUN}}-CAU-001-A-v1"],
      "required_label": "INFERRED",
      "must_refuse": false,
      "notes": "The brain must store and retrieve a causal claim. The claim text should mention both X and Y. Label is INFERRED because the relationship is a causal inference."
    }
  ]
}

▶ ② Execution Instructions 1119 chars

You have access to MCP Knowledge Brain tools.

══════════════════════════════════════════════════
MANDATORY INGEST FORMAT — DO NOT SKIP THIS STEP
══════════════════════════════════════════════════
You MUST call brain_ingest with the `extracted` parameter structured EXACTLY as shown:

brain_ingest(
  text="<document content from setup_context>",
  metadata={"source_label": "<source_id from setup_context>", "domain": "brain-test"},
  extracted={
    "claims": [
      {
        "text": "<the full causal sentence: X leads to Y>",
        "subject": "<X — the cause>",
        "predicate": "leads_to",
        "object": "<Y — the effect>",
        "confidence": 1.0
      }
    ]
  }
)

WITHOUT the extracted.claims.predicate="leads_to" field this test FAILS.
A plain brain_ingest without `extracted` does NOT store the causal structure.
══════════════════════════════════════════════════

After ingesting, use brain_search to find the claim by searching for Y (the effect).
Report what was returned, including any predicate field if visible.

Question: {{QUESTION}}

Return ONLY this JSON:
{{REQUIRED_ANSWER_SCHEMA}}

⚖ ③ Judge Instructions 844 chars

Ground truth:
{{SETUP_CONTEXT_JSON}}

Tested AI answer:
{{EXEC_ANSWER_JSON}}

The AI ingested a causal claim "X leads to Y" with predicate=leads_to and then searched for it.
Evaluate whether the causal claim was successfully stored and retrieved.

Scoring:
- retrieval_tool_use (0-20): Used brain_ingest with extracted.claims AND brain_search?
- evidence_fidelity (0-20): Correct source_id? Claim text mentions both cause and effect?
- reasoning_correctness (0-20): Causal relationship correctly identified and reported?
- uncertainty_discipline (0-15): Used INFERRED label appropriately?
- output_format_compliance (0-15): Valid JSON?
- adversarial_resistance (0-10): Full marks.

Critical failures: causal claim not found in brain, wrong source_id, did not use extracted.claims with predicate.

Return ONLY this JSON:
{{JUDGE_RESULT_SCHEMA}}

Critical Failure Conditions

causal_claim_not_stored
causal_claim_not_retrieved
hallucinated_source_id

Recent Run History

3 runs

When	Run ID	Pass Rate	Avg Score	Reps
2026-05-24 13:08	20260524T130808Z-kqze	0%	77.0	0/1	View →
2026-05-24 12:41	20260524T124148Z-z2do	100%	85.0	1/1	View →
2026-05-24 11:37	20260524T113756Z-kduj	0%	75.0	0/1	View →

📄 Raw YAML cases/causal/CAU-001.yaml

schema_version: "1.0"
test_id: "CAU-001"
category: "causal"
severity: "high"
repetitions: 3
reasoning_type: "INFERRED"
num_documents: 1
num_questions: 1
tags: ["causal-extraction", "predicate", "leads_to", "inference"]

setup_instructions: |
  Generate a document containing a clear causal relationship using "leads to" phrasing.
  Ingest it using brain_ingest with the causal claim extracted as a structured claim
  with predicate="leads_to".

  Requirements:
  - Invent a fictional process chain: "X leads to Y" (e.g. "Poor inventory tracking leads to stock shortages").
  - Use a unique canary token so the claim is uniquely identifiable.
  - source_id: KB-{{RUN}}-CAU-001-A-v1
  - MANDATORY: ingest using brain_ingest with extracted.claims containing:
      predicate="leads_to", subject="<X>", object="<Y>", confidence=1.0
    Without this structured extraction the test will fail — plain ingestion is not enough.
  - Question: search for Y and verify the causal claim linking X→Y is stored.

  Output ONLY this JSON:
  {
    "run_id": "{{RUN}}",
    "test_id": "CAU-001",
    "repetition": {{REP}},
    "documents": [
      {
        "source_id": "KB-{{RUN}}-CAU-001-A-v1",
        "content": "<one sentence: 'X leads to Y [canary: token]'>",
        "title": "<causal process title>",
        "version": 1
      }
    ],
    "questions": [
      "Search the brain for '<Y>' and verify a causal claim exists showing '<X>' leads to '<Y>'. Return the claim text and its predicate if present in the search result."
    ],
    "expected_answers": [
      {
        "question_index": 0,
        "correct_answer": "<X> leads to <Y>",
        "required_source_ids": ["KB-{{RUN}}-CAU-001-A-v1"],
        "required_label": "INFERRED",
        "must_refuse": false,
        "notes": "The brain must store and retrieve a causal claim. The claim text should mention both X and Y. Label is INFERRED because the relationship is a causal inference."
      }
    ]
  }

execution_instructions: |
  You have access to MCP Knowledge Brain tools.

  ══════════════════════════════════════════════════
  MANDATORY INGEST FORMAT — DO NOT SKIP THIS STEP
  ══════════════════════════════════════════════════
  You MUST call brain_ingest with the `extracted` parameter structured EXACTLY as shown:

  brain_ingest(
    text="<document content from setup_context>",
    metadata={"source_label": "<source_id from setup_context>", "domain": "brain-test"},
    extracted={
      "claims": [
        {
          "text": "<the full causal sentence: X leads to Y>",
          "subject": "<X — the cause>",
          "predicate": "leads_to",
          "object": "<Y — the effect>",
          "confidence": 1.0
        }
      ]
    }
  )

  WITHOUT the extracted.claims.predicate="leads_to" field this test FAILS.
  A plain brain_ingest without `extracted` does NOT store the causal structure.
  ══════════════════════════════════════════════════

  After ingesting, use brain_search to find the claim by searching for Y (the effect).
  Report what was returned, including any predicate field if visible.

  Question: {{QUESTION}}

  Return ONLY this JSON:
  {{REQUIRED_ANSWER_SCHEMA}}

judge_instructions: |
  Ground truth:
  {{SETUP_CONTEXT_JSON}}

  Tested AI answer:
  {{EXEC_ANSWER_JSON}}

  The AI ingested a causal claim "X leads to Y" with predicate=leads_to and then searched for it.
  Evaluate whether the causal claim was successfully stored and retrieved.

  Scoring:
  - retrieval_tool_use (0-20): Used brain_ingest with extracted.claims AND brain_search?
  - evidence_fidelity (0-20): Correct source_id? Claim text mentions both cause and effect?
  - reasoning_correctness (0-20): Causal relationship correctly identified and reported?
  - uncertainty_discipline (0-15): Used INFERRED label appropriately?
  - output_format_compliance (0-15): Valid JSON?
  - adversarial_resistance (0-10): Full marks.

  Critical failures: causal claim not found in brain, wrong source_id, did not use extracted.claims with predicate.

  Return ONLY this JSON:
  {{JUDGE_RESULT_SCHEMA}}

critical_failures:
  - "causal_claim_not_stored"
  - "causal_claim_not_retrieved"
  - "hallucinated_source_id"