XSR-007
Category: cross source
Severity: critical
Repetitions: 5
Documents: 5
Questions: 1
Reasoning: CROSS_SOURCE
Tags: cross_source, four_hop, distractor, project, token, log, server, rack, critical
📖 In Plain English
What this category tests
Can the brain answer questions that require combining information from 2+ documents?
How the test works
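Documents form chains (e.g. Doc A: 'X9 is the velvet pouch'; Doc B: 'The velvet pouch contains a silver locket'). The test asks 'What does X9 contain?', which can only be answered by reasoning from Doc A to Doc B. XSR-007 extends this pattern to a four-hop chain (project -> token -> log -> server -> rack) and adds a distractor document that must not be used.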
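The two-document chain in the example can be sketched as a pair of lookups. This is only an illustration of the reasoning pattern, not the harness's code; the `facts` table and the `resolve` helper are hypothetical names.

```python
# Each entry records what a single document says about a subject.
# The layout is an assumption made for illustration.
facts = {
    "X9": ("Doc A", "velvet pouch"),             # Doc A: X9 is the velvet pouch
    "velvet pouch": ("Doc B", "silver locket"),  # Doc B: the pouch contains a silver locket
}


def resolve(start, hops):
    """Follow `hops` links from `start`, recording which documents were used."""
    value, used = start, []
    for _ in range(hops):
        source, value = facts[value]
        used.append(source)
    return value, used


answer, sources = resolve("X9", hops=2)
print(answer, sources)
```

Neither document alone answers the question: Doc A never mentions the locket, and Doc B never mentions X9.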
Why it matters
Real questions often span multiple documents. Single-doc retrieval is not enough.
⚙️ How a single rep runs
① Generate
Model creates 5 synthetic documents and 1 question with unique canary tokens
→ Fresh content per run prevents memorization and proves real retrieval
② Ingest (MCP)
Model calls brain_ingest to store the 5 documents
→ Tests the brain's storage and indexing pipeline
③ Query (MCP)
Model answers the question using brain retrieval tools (search, fetch, context_pack, etc.)
→ Core test — does the brain return correct evidence and let the model build a faithful answer?
④ Evaluate
Model judges the answer against ground truth (the documents and expected answer it generated in step ①)
→ Produces a score 0–100 with detailed sub-scores (retrieval, fidelity, reasoning, etc.)
This sequence is repeated 5 times per test run. A pass requires a score ≥ 85 and no critical failures.
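The pass rule can be written as a small predicate. The sketch below assumes scores are on the 0-100 scale used on this page and applies the rule to each repetition separately; the `RepResult` type and the `pass_rate` helper are hypothetical, and the page does not say how the harness itself aggregates repetitions.

```python
from dataclasses import dataclass

PASS_THRESHOLD = 85  # from the rule above; 0-100 scale assumed


@dataclass
class RepResult:
    score: float            # 0-100
    critical_failure: bool  # True if any critical failure condition was triggered


def rep_passes(rep: RepResult) -> bool:
    """A repetition passes only if it clears the threshold with no critical failure."""
    return rep.score >= PASS_THRESHOLD and not rep.critical_failure


def pass_rate(reps) -> float:
    """Percentage of repetitions that passed (0.0 for an empty list)."""
    if not reps:
        return 0.0
    return 100.0 * sum(rep_passes(r) for r in reps) / len(reps)


reps = [RepResult(100, False), RepResult(90, True), RepResult(84.9, False)]
print([rep_passes(r) for r in reps])
```

Note that a high score does not rescue a repetition in which a critical failure was triggered.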
🔬 Technical Instructions (raw prompts sent to AI)
🔧 ① Setup Instructions 3678 chars
You are a test-data Generator AI.
Generate one self-consistent test scenario using the following structure:
- Choose a fictional project name (e.g. "Project Helion").
- Choose a fictional token code (e.g. "TOK-2291-ZX").
- Choose a fictional log number (e.g. "LOG-5540").
- Choose a fictional primary server name (e.g. "SRVR-Koval").
- Choose a fictional primary rack number (e.g. "Rack-19B").
- Choose a DIFFERENT fictional server name for the distractor (e.g. "SRVR-Dantec").
- Choose a DIFFERENT fictional rack number for the distractor (e.g. "Rack-04A").
Produce exactly five documents:
Doc A source_id: KB-{{RUN}}-XSR-007-A-v1
title: Project Token Index
content: "Project <Name> uses token <token_code>."
Doc B source_id: KB-{{RUN}}-XSR-007-B-v1
title: Token Audit Log
content: "Token <token_code> appears in log <log_number>."
Doc C source_id: KB-{{RUN}}-XSR-007-C-v1
title: Log Origin Record
content: "Log <log_number> was generated by Server <primary_server>."
Doc D source_id: KB-{{RUN}}-XSR-007-D-v1
title: Server Location Register
content: "Server <primary_server> is in Rack <primary_rack>."
Doc E source_id: KB-{{RUN}}-XSR-007-E-v1
title: Server Location Register (supplementary)
content: "Server <distractor_server> is in Rack <distractor_rack>."
note: "DISTRACTOR — must NOT be used in the answer chain."
Produce exactly one question:
"Where is Project <Name>'s server rack?"
Produce the expected answer:
question_index: 0
correct_answer: "<primary_rack>"
required_label: "CROSS_SOURCE"
required_source_ids: [
"KB-{{RUN}}-XSR-007-A-v1",
"KB-{{RUN}}-XSR-007-B-v1",
"KB-{{RUN}}-XSR-007-C-v1",
"KB-{{RUN}}-XSR-007-D-v1"
]
must_refuse: false
notes: |
forbidden_source_ids: ["KB-{{RUN}}-XSR-007-E-v1"]
The four-hop chain is: project -> token -> log -> server -> rack (Docs A-D).
Doc E describes a different server and must not appear in the answer chain.
Replace {{RUN}} with the actual run_id value.
Output ONLY this JSON (no markdown, no commentary):
{
"run_id": "<string>",
"test_id": "XSR-007",
"repetition": <integer 1-5>,
"documents": [
{
"source_id": "KB-<run_id>-XSR-007-A-v1",
"title": "Project Token Index",
"content": "Project <Name> uses token <token_code>."
},
{
"source_id": "KB-<run_id>-XSR-007-B-v1",
"title": "Token Audit Log",
"content": "Token <token_code> appears in log <log_number>."
},
{
"source_id": "KB-<run_id>-XSR-007-C-v1",
"title": "Log Origin Record",
"content": "Log <log_number> was generated by Server <primary_server>."
},
{
"source_id": "KB-<run_id>-XSR-007-D-v1",
"title": "Server Location Register",
"content": "Server <primary_server> is in Rack <primary_rack>."
},
{
"source_id": "KB-<run_id>-XSR-007-E-v1",
"title": "Server Location Register (supplementary)",
"content": "Server <distractor_server> is in Rack <distractor_rack>.",
"role": "distractor"
}
],
"questions": ["Where is Project <Name>'s server rack?"],
"expected_answers": [
{
"question_index": 0,
"correct_answer": "<primary_rack>",
"required_label": "CROSS_SOURCE",
"required_source_ids": [
"KB-<run_id>-XSR-007-A-v1",
"KB-<run_id>-XSR-007-B-v1",
"KB-<run_id>-XSR-007-C-v1",
"KB-<run_id>-XSR-007-D-v1"
],
"must_refuse": false,
"notes": "Four-hop chain through Docs A-D; Doc E (KB-<run_id>-XSR-007-E-v1) is a distractor and must NOT be cited."
}
]
}
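For reference, the structure requested above can be produced deterministically. The following is a minimal sketch, assuming the placeholders are filled by plain string substitution; `source_id` and `build_scenario` are hypothetical helpers, and the values are the examples from the prompt rather than real run data.

```python
import json

TEST_ID = "XSR-007"


def source_id(run_id, letter):
    """Build an ID in the KB-<run_id>-XSR-007-<letter>-v1 pattern."""
    return f"KB-{run_id}-{TEST_ID}-{letter}-v1"


def build_scenario(run_id, repetition, name, token, log, server, rack,
                   distractor_server, distractor_rack):
    if distractor_server == server or distractor_rack == rack:
        raise ValueError("distractor server and rack must differ from the primary ones")
    # Templates are filled literally, so a rack named "Rack-19B" yields "Rack Rack-19B".
    rows = [
        ("A", "Project Token Index", f"Project {name} uses token {token}."),
        ("B", "Token Audit Log", f"Token {token} appears in log {log}."),
        ("C", "Log Origin Record", f"Log {log} was generated by Server {server}."),
        ("D", "Server Location Register", f"Server {server} is in Rack {rack}."),
        ("E", "Server Location Register (supplementary)",
         f"Server {distractor_server} is in Rack {distractor_rack}."),
    ]
    documents = [
        {"source_id": source_id(run_id, letter), "title": title, "content": content}
        for letter, title, content in rows
    ]
    documents[-1]["role"] = "distractor"  # only Doc E carries a role
    return {
        "run_id": run_id,
        "test_id": TEST_ID,
        "repetition": repetition,
        "documents": documents,
        "questions": [f"Where is Project {name}'s server rack?"],
        "expected_answers": [{
            "question_index": 0,
            "correct_answer": rack,
            "required_label": "CROSS_SOURCE",
            "required_source_ids": [source_id(run_id, letter) for letter in "ABCD"],
            "must_refuse": False,
            "notes": ("Four-hop chain through Docs A-D; "
                      f"Doc E ({source_id(run_id, 'E')}) is a distractor and must NOT be cited."),
        }],
    }


scenario = build_scenario("demo-run", 1, "Helion", "TOK-2291-ZX", "LOG-5540",
                          "SRVR-Koval", "Rack-19B", "SRVR-Dantec", "Rack-04A")
print(json.dumps(scenario, indent=2))
```

The check at the top of `build_scenario` mirrors the prompt's requirement that the distractor server and rack be different from the primary ones; without it, Doc E would no longer be distinguishable from Doc D.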
▶ ② Execution Instructions 531 chars
Use ONLY MCP Knowledge Brain retrieval tools to answer the question below.
Do not use any external knowledge or memory.
Question: {{QUESTION}}
Return ONLY this JSON (no markdown, no commentary):
{
"question": "<the question text>",
"answer_value": "<your answer or null>",
"reasoning_type": "<CROSS_SOURCE | DIRECT | UNCERTAIN | INFERRED>",
"source_ids_used": ["<id1>", "<id2>", "<id3>", "<id4>"],
"confidence": "<high | medium | low>",
"explanation": "<one or two sentences describing how you reached the answer>"
}
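Before an answer reaches the judge, its shape can be checked mechanically. The sketch below validates only the fields and allowed values listed in the template above; `validate_answer` is a hypothetical helper, and nothing on this page says the harness performs such a check.

```python
import json

REQUIRED_KEYS = {"question", "answer_value", "reasoning_type",
                 "source_ids_used", "confidence", "explanation"}
ALLOWED_REASONING = {"CROSS_SOURCE", "DIRECT", "UNCERTAIN", "INFERRED"}
ALLOWED_CONFIDENCE = {"high", "medium", "low"}


def validate_answer(raw):
    """Return a list of problems; an empty list means the shape is acceptable."""
    try:
        answer = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(answer, dict):
        return ["top-level value is not an object"]

    problems = []
    missing = REQUIRED_KEYS - answer.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if answer.get("reasoning_type") not in ALLOWED_REASONING:
        problems.append("reasoning_type is not one of the allowed labels")
    if answer.get("confidence") not in ALLOWED_CONFIDENCE:
        problems.append("confidence is not high, medium or low")
    ids = answer.get("source_ids_used")
    if not (isinstance(ids, list) and all(isinstance(i, str) for i in ids)):
        problems.append("source_ids_used must be a list of strings")
    return problems


example = json.dumps({
    "question": "Where is Project Helion's server rack?",
    "answer_value": "Rack-19B",
    "reasoning_type": "CROSS_SOURCE",
    "source_ids_used": [f"KB-demo-run-XSR-007-{letter}-v1" for letter in "ABCD"],
    "confidence": "high",
    "explanation": "Followed project, token, log and server to the rack.",
})
print(validate_answer(example))
print(validate_answer('{"answer_value": null}'))
```

A null `answer_value` is accepted here because the template allows it; whether that counts as a failure is decided by the judge, not by the shape check.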
⚖ ③ Judge Instructions 1267 chars
You are an impartial judge evaluating a Knowledge Brain retrieval test.
Ground truth:
{{SETUP_CONTEXT_JSON}}
Tested AI answer:
{{EXEC_ANSWER_JSON}}
Evaluation criteria:
1. ANSWER_CORRECT — answer_value matches the primary rack designation (case-insensitive).
2. REASONING_TYPE_OK — reasoning_type is exactly "CROSS_SOURCE".
3. SOURCES_COMPLETE — all four required source_ids (A-D) appear in source_ids_used.
4. DISTRACTOR_EXCLUDED — the forbidden source_id (Doc E) does NOT appear in source_ids_used,
and the distractor rack value does NOT appear in answer_value or explanation.
5. NO_HALLUCINATION — answer contains no information absent from the ingested documents.
Return ONLY this JSON (no markdown, no commentary):
{
"test_id": "XSR-007",
"run_id": "<from ground truth>",
"repetition": <integer>,
"pass": <true|false>,
"score": <0.0-1.0>,
"criteria": {
"ANSWER_CORRECT": <true|false>,
"REASONING_TYPE_OK": <true|false>,
"SOURCES_COMPLETE": <true|false>,
"DISTRACTOR_EXCLUDED": <true|false>,
"NO_HALLUCINATION": <true|false>
},
"critical_failure_triggered": <true|false>,
"critical_failure_reason": "<string or null>",
"notes": "<brief free-text notes>"
}
Critical Failure Conditions
- Answer gives the distractor rack (Doc E value) instead of the correct primary rack.
- Doc E source_id appears in source_ids_used.
- Any of the four required source_ids (A-D) is missing from source_ids_used.
- reasoning_type is not CROSS_SOURCE.
- Answer is null or empty when all relevant documents were retrievable.
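Criteria 1 to 4 and most of the critical failure conditions are mechanical, so they can be sketched without a model in the loop. The code below is a simplified illustration, not the judge itself: `evaluate` is a hypothetical helper, "matches" is read as case-insensitive equality, the distractor rack is passed in explicitly because the ground-truth JSON only embeds it in Doc E's text, and NO_HALLUCINATION is left out because it needs judgment.

```python
def evaluate(ground_truth, answer, distractor_rack):
    """Apply criteria 1-4 and the critical-failure list to one answer."""
    expected = ground_truth["expected_answers"][0]
    required = set(expected["required_source_ids"])
    forbidden = {d["source_id"] for d in ground_truth["documents"]
                 if d.get("role") == "distractor"}
    used = set(answer.get("source_ids_used") or [])
    value = (answer.get("answer_value") or "").strip().lower()
    explanation = (answer.get("explanation") or "").lower()
    distractor = distractor_rack.lower()

    criteria = {
        "ANSWER_CORRECT": value == expected["correct_answer"].lower(),
        "REASONING_TYPE_OK": answer.get("reasoning_type") == "CROSS_SOURCE",
        "SOURCES_COMPLETE": required <= used,
        "DISTRACTOR_EXCLUDED": (not (forbidden & used)
                                and distractor not in value
                                and distractor not in explanation),
    }

    reasons = []
    if distractor in value:
        reasons.append("answer gives the distractor rack")
    if forbidden & used:
        reasons.append("Doc E source_id appears in source_ids_used")
    if not (required <= used):
        reasons.append("a required source_id (A-D) is missing")
    if not criteria["REASONING_TYPE_OK"]:
        reasons.append("reasoning_type is not CROSS_SOURCE")
    if not value:
        # The real condition also requires that the documents were retrievable,
        # which cannot be verified from the answer alone.
        reasons.append("answer is null or empty")
    return {"criteria": criteria, "critical_failure_reasons": reasons}


ids = {letter: f"KB-demo-XSR-007-{letter}-v1" for letter in "ABCDE"}
ground_truth = {
    "documents": [{"source_id": ids[letter]} for letter in "ABCD"]
                 + [{"source_id": ids["E"], "role": "distractor"}],
    "expected_answers": [{
        "correct_answer": "Rack-19B",
        "required_source_ids": [ids[letter] for letter in "ABCD"],
    }],
}
good = {"answer_value": "rack-19b", "reasoning_type": "CROSS_SOURCE",
        "source_ids_used": [ids[letter] for letter in "ABCD"],
        "explanation": "Followed project, token, log and server to the rack."}
bad = dict(good, answer_value="Rack-04A", source_ids_used=list(ids.values()))

print(evaluate(ground_truth, good, "Rack-04A"))
print(evaluate(ground_truth, bad, "Rack-04A"))
```

In the `bad` example the answer still cites Docs A-D, so SOURCES_COMPLETE holds, but citing Doc E and returning its rack triggers two separate critical failures.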
Recent Run History
3 runs

| When | Run ID | Pass Rate | Avg Score | Reps |
|---|---|---|---|---|
| 2026-05-24 13:08 | 20260524T130808Z-kqze | 100% | 100.0 | 1/1 |
| 2026-05-24 12:41 | 20260524T124148Z-z2do | 0% | 0 | 0/1 |
| 2026-05-24 11:37 | 20260524T113756Z-kduj | 0% | 0 | 0/1 |
📄 Raw YAML cases/cross_source/XSR-007.yaml
schema_version: "1.0"
test_id: "XSR-007"
category: "cross_source"
severity: "critical"
repetitions: 5
reasoning_type: "CROSS_SOURCE"
num_documents: 5
num_questions: 1
tags: [cross_source, four_hop, distractor, project, token, log, server, rack, critical]
setup_instructions: |
  You are a test-data Generator AI.
  Generate one self-consistent test scenario using the following structure:
  - Choose a fictional project name (e.g. "Project Helion").
  - Choose a fictional token code (e.g. "TOK-2291-ZX").
  - Choose a fictional log number (e.g. "LOG-5540").
  - Choose a fictional primary server name (e.g. "SRVR-Koval").
  - Choose a fictional primary rack number (e.g. "Rack-19B").
  - Choose a DIFFERENT fictional server name for the distractor (e.g. "SRVR-Dantec").
  - Choose a DIFFERENT fictional rack number for the distractor (e.g. "Rack-04A").
  Produce exactly five documents:
    Doc A source_id: KB-{{RUN}}-XSR-007-A-v1
      title: Project Token Index
      content: "Project <Name> uses token <token_code>."
    Doc B source_id: KB-{{RUN}}-XSR-007-B-v1
      title: Token Audit Log
      content: "Token <token_code> appears in log <log_number>."
    Doc C source_id: KB-{{RUN}}-XSR-007-C-v1
      title: Log Origin Record
      content: "Log <log_number> was generated by Server <primary_server>."
    Doc D source_id: KB-{{RUN}}-XSR-007-D-v1
      title: Server Location Register
      content: "Server <primary_server> is in Rack <primary_rack>."
    Doc E source_id: KB-{{RUN}}-XSR-007-E-v1
      title: Server Location Register (supplementary)
      content: "Server <distractor_server> is in Rack <distractor_rack>."
      note: "DISTRACTOR — must NOT be used in the answer chain."
  Produce exactly one question:
    "Where is Project <Name>'s server rack?"
  Produce the expected answer:
    question_index: 0
    correct_answer: "<primary_rack>"
    required_label: "CROSS_SOURCE"
    required_source_ids: [
      "KB-{{RUN}}-XSR-007-A-v1",
      "KB-{{RUN}}-XSR-007-B-v1",
      "KB-{{RUN}}-XSR-007-C-v1",
      "KB-{{RUN}}-XSR-007-D-v1"
    ]
    must_refuse: false
    notes: |
      forbidden_source_ids: ["KB-{{RUN}}-XSR-007-E-v1"]
      The four-hop chain is: project -> token -> log -> server -> rack (Docs A-D).
      Doc E describes a different server and must not appear in the answer chain.
  Replace {{RUN}} with the actual run_id value.
  Output ONLY this JSON (no markdown, no commentary):
  {
    "run_id": "<string>",
    "test_id": "XSR-007",
    "repetition": <integer 1-5>,
    "documents": [
      {
        "source_id": "KB-<run_id>-XSR-007-A-v1",
        "title": "Project Token Index",
        "content": "Project <Name> uses token <token_code>."
      },
      {
        "source_id": "KB-<run_id>-XSR-007-B-v1",
        "title": "Token Audit Log",
        "content": "Token <token_code> appears in log <log_number>."
      },
      {
        "source_id": "KB-<run_id>-XSR-007-C-v1",
        "title": "Log Origin Record",
        "content": "Log <log_number> was generated by Server <primary_server>."
      },
      {
        "source_id": "KB-<run_id>-XSR-007-D-v1",
        "title": "Server Location Register",
        "content": "Server <primary_server> is in Rack <primary_rack>."
      },
      {
        "source_id": "KB-<run_id>-XSR-007-E-v1",
        "title": "Server Location Register (supplementary)",
        "content": "Server <distractor_server> is in Rack <distractor_rack>.",
        "role": "distractor"
      }
    ],
    "questions": ["Where is Project <Name>'s server rack?"],
    "expected_answers": [
      {
        "question_index": 0,
        "correct_answer": "<primary_rack>",
        "required_label": "CROSS_SOURCE",
        "required_source_ids": [
          "KB-<run_id>-XSR-007-A-v1",
          "KB-<run_id>-XSR-007-B-v1",
          "KB-<run_id>-XSR-007-C-v1",
          "KB-<run_id>-XSR-007-D-v1"
        ],
        "must_refuse": false,
        "notes": "Four-hop chain through Docs A-D; Doc E (KB-<run_id>-XSR-007-E-v1) is a distractor and must NOT be cited."
      }
    ]
  }
execution_instructions: |
  Use ONLY MCP Knowledge Brain retrieval tools to answer the question below.
  Do not use any external knowledge or memory.
  Question: {{QUESTION}}
  Return ONLY this JSON (no markdown, no commentary):
  {
    "question": "<the question text>",
    "answer_value": "<your answer or null>",
    "reasoning_type": "<CROSS_SOURCE | DIRECT | UNCERTAIN | INFERRED>",
    "source_ids_used": ["<id1>", "<id2>", "<id3>", "<id4>"],
    "confidence": "<high | medium | low>",
    "explanation": "<one or two sentences describing how you reached the answer>"
  }
judge_instructions: |
  You are an impartial judge evaluating a Knowledge Brain retrieval test.
  Ground truth:
  {{SETUP_CONTEXT_JSON}}
  Tested AI answer:
  {{EXEC_ANSWER_JSON}}
  Evaluation criteria:
  1. ANSWER_CORRECT — answer_value matches the primary rack designation (case-insensitive).
  2. REASONING_TYPE_OK — reasoning_type is exactly "CROSS_SOURCE".
  3. SOURCES_COMPLETE — all four required source_ids (A-D) appear in source_ids_used.
  4. DISTRACTOR_EXCLUDED — the forbidden source_id (Doc E) does NOT appear in source_ids_used,
     and the distractor rack value does NOT appear in answer_value or explanation.
  5. NO_HALLUCINATION — answer contains no information absent from the ingested documents.
  Return ONLY this JSON (no markdown, no commentary):
  {
    "test_id": "XSR-007",
    "run_id": "<from ground truth>",
    "repetition": <integer>,
    "pass": <true|false>,
    "score": <0.0-1.0>,
    "criteria": {
      "ANSWER_CORRECT": <true|false>,
      "REASONING_TYPE_OK": <true|false>,
      "SOURCES_COMPLETE": <true|false>,
      "DISTRACTOR_EXCLUDED": <true|false>,
      "NO_HALLUCINATION": <true|false>
    },
    "critical_failure_triggered": <true|false>,
    "critical_failure_reason": "<string or null>",
    "notes": "<brief free-text notes>"
  }
critical_failures:
- "Answer gives the distractor rack (Doc E value) instead of the correct primary rack."
- "Doc E source_id appears in source_ids_used."
- "Any of the four required source_ids (A-D) is missing from source_ids_used."
- "reasoning_type is not CROSS_SOURCE."
- "Answer is null or empty when all relevant documents were retrievable."