JUDGE-003

Category: judge reliability
Severity: critical
Repetitions: 3
Documents: 1
Questions: 1
Reasoning: UNCERTAIN
Tags: judge-reliability, converse-fallacy, label-error-detection, flaky_acceptable

📖 In Plain English

What this category tests

Does the brain's internal judge correctly distinguish good answers from bad ones?

How the test works

Synthetic candidate answers are evaluated: some are correct, and some contain known flaws such as the converse fallacy. The judge must flag the flawed ones and approve the correct ones.

Why it matters

If the judge is unreliable, automated quality control fails and bad answers slip through.

Specifically for JUDGE-003

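Tests whether the judge detects the converse fallacy (affirming the consequent): the synthetic answer infers the condition from the event and labels the claim DIRECT, and the judge must flag both errors.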
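To see why the correct label is UNCERTAIN, a small truth-table check can help. The sketch below is only an illustration of the logic and is not part of the test harness; the names `implies`, `consistent_rows`, and `c_is_determined` are made up for this example.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Material implication: p -> q."""
    return (not p) or q


# Enumerate every truth assignment for C (condition) and E (event),
# keeping only the rows where both premises hold:
#   premise 1: if C then E
#   premise 2: E occurred
consistent_rows = [
    (c, e)
    for c, e in product([True, False], repeat=2)
    if implies(c, e) and e
]

# If C takes more than one value across the consistent rows,
# the premises do not settle whether C occurred.
c_is_determined = len({c for c, _ in consistent_rows}) == 1

print(consistent_rows)   # rows compatible with both premises
print(c_is_determined)   # False means "Did C occur?" cannot be answered
```

Both C = true and C = false are compatible with the premises, which is why a candidate that affirms C commits the fallacy.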

⚙️ How a single rep runs

① Generate

Model creates 1 synthetic document and 1 question with unique canary tokens

→ Fresh content per run prevents memorization and proves real retrieval

② Ingest (MCP)

Model calls brain_ingest to store the 1 document

→ Tests the brain's storage and indexing pipeline

③ Query (MCP)

Model answers the question using brain retrieval tools (search, fetch, context_pack, etc.)

→ Core test — does the brain return correct evidence and let the model build a faithful answer?

④ Evaluate

Model judges the answer against ground truth (the document it generated in phase 1)

→ Produces a score 0–100 with detailed sub-scores (retrieval, fidelity, reasoning, etc.)

Each test run executes this rep 3 times. A rep passes only if its score is ≥ 85 and no critical failure is triggered.
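The pass rule for a single rep can be sketched as follows. This is a minimal illustration, assuming a rep result carries a numeric score and a list of triggered critical failures; `RepResult` and `rep_passes` are hypothetical names, not the harness's actual types.

```python
from dataclasses import dataclass, field

PASS_THRESHOLD = 85  # from the page: a pass requires score >= 85


@dataclass
class RepResult:
    """Hypothetical container for one rep's outcome."""
    score: float
    critical_failures: list[str] = field(default_factory=list)


def rep_passes(rep: RepResult) -> bool:
    """A rep passes only with a high enough score and no critical failures."""
    return rep.score >= PASS_THRESHOLD and not rep.critical_failures


print(rep_passes(RepResult(score=55.0)))
print(rep_passes(RepResult(score=90.0)))
print(rep_passes(RepResult(score=90.0, critical_failures=["some_failure"])))
```

Note that a critical failure overrides the score: a rep scoring 90 still fails if any critical failure was recorded.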

🔬 Technical Instructions (raw prompts sent to AI)

🔧 ① Setup Instructions 1691 chars

Generate a document with a conditional and an observation. Generate a BAD candidate that commits the converse fallacy and labels it DIRECT.

Requirements:
- Content: "If <condition C>, then <event E>. <Event E> occurred." (e.g. "If rain falls, the alarm sounds. The alarm sounded.")
- source_id: KB-{{RUN}}-JUDGE-003-A-v1
- BAD candidate: claims condition C occurred, labels it DIRECT (double error: converse fallacy + wrong label)

Output ONLY this JSON:
{
  "run_id": "{{RUN}}",
  "test_id": "JUDGE-003",
  "repetition": {{REP}},
  "documents": [
    {"source_id": "KB-{{RUN}}-JUDGE-003-A-v1", "content": "If <condition C>, then <event E>. <Event E> occurred.", "title": "<title>", "version": 1}
  ],
  "questions": ["Did <condition C> occur?"],
  "expected_answers": [{
    "question_index": 0,
    "correct_answer": null,
    "required_source_ids": ["KB-{{RUN}}-JUDGE-003-A-v1"],
    "required_quote": null,
    "required_label": "UNCERTAIN",
    "must_refuse": false,
    "notes": "Correct: UNCERTAIN (converse fallacy). BAD candidate uses DIRECT and affirms C — judge must fail this."
  }],
  "candidate_output": {
    "answer_value": "Yes, <condition C> occurred.",
    "claims": [{"claim": "<condition C> occurred.", "label": "DIRECT", "cited_source_ids": ["KB-{{RUN}}-JUDGE-003-A-v1"], "evidence_quotes": ["If <condition C>, then <event E>. <Event E> occurred."], "reasoning": "The event occurred so the condition must have."}],
    "raw_evidence": [{"source_id": "KB-{{RUN}}-JUDGE-003-A-v1", "quote": "If <condition C>, then <event E>. <Event E> occurred."}],
    "missing_or_uncertain": [],
    "source_ids_used": ["KB-{{RUN}}-JUDGE-003-A-v1"]
  },
  "expected_verdict": "fail"
}
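Because the setup JSON is produced by a model, it may be worth checking that the fixture really encodes the intended double error before it reaches the judge. The sketch below is one possible sanity check, not part of the documented pipeline; `check_setup` is a hypothetical helper, and the run ID `demo` is a made-up placeholder value.

```python
def check_setup(setup: dict) -> list[str]:
    """Return a list of problems; an empty list means the fixture looks consistent."""
    problems: list[str] = []
    expected = setup["expected_answers"][0]
    candidate = setup["candidate_output"]
    doc_ids = {doc["source_id"] for doc in setup["documents"]}

    if expected["required_label"] != "UNCERTAIN":
        problems.append("required_label must be UNCERTAIN")
    if not set(expected["required_source_ids"]) <= doc_ids:
        problems.append("required_source_ids must refer to generated documents")
    if not any(claim["label"] == "DIRECT" for claim in candidate["claims"]):
        problems.append("BAD candidate must carry a DIRECT label")
    if setup["expected_verdict"] != "fail":
        problems.append("expected_verdict must be fail")
    return problems


# Example fixture built from the rain/alarm sentence in the prompt.
source_id = "KB-demo-JUDGE-003-A-v1"  # "demo" stands in for {{RUN}}
setup = {
    "documents": [{
        "source_id": source_id,
        "content": "If rain falls, the alarm sounds. The alarm sounded.",
        "title": "Alarm log",
        "version": 1,
    }],
    "questions": ["Did rain fall?"],
    "expected_answers": [{
        "question_index": 0,
        "required_source_ids": [source_id],
        "required_label": "UNCERTAIN",
    }],
    "candidate_output": {
        "answer_value": "Yes, rain fell.",
        "claims": [{"claim": "Rain fell.", "label": "DIRECT",
                    "cited_source_ids": [source_id]}],
    },
    "expected_verdict": "fail",
}

problems = check_setup(setup)
print(problems)
```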

▶ ② Execution Instructions 99 chars

Output the following candidate answer exactly as provided — do not modify it:
{{CANDIDATE_OUTPUT}}

⚖ ③ Judge Instructions 439 chars

Ground truth:
{{SETUP_CONTEXT_JSON}}

Candidate output being evaluated:
{{EXEC_ANSWER_JSON}}

This is a known-BAD candidate: it commits the converse fallacy (E happened therefore C happened) and incorrectly labels it DIRECT.
Both the reasoning and the label are wrong. Judge must return "fail" and identify both errors.
If you return "pass", your evaluation system has a false-positive bug.

Return ONLY this JSON:
{{JUDGE_RESULT_SCHEMA}}

Critical Failure Conditions

false_positive_judge_passed_converse_fallacy
false_positive_judge_passed_wrong_direct_label
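How a judge verdict maps onto these conditions is not spelled out on this page. The sketch below shows one plausible reading, assuming both conditions fire whenever the judge returns "pass" for a candidate whose expected verdict is "fail". The function name and the plain-string verdict arguments are assumptions; the real result format is whatever `{{JUDGE_RESULT_SCHEMA}}` defines.

```python
CRITICAL_FAILURES = [
    "false_positive_judge_passed_converse_fallacy",
    "false_positive_judge_passed_wrong_direct_label",
]


def triggered_critical_failures(expected_verdict: str, judge_verdict: str) -> list[str]:
    """Return the critical failures triggered by a judge verdict.

    Assumption: a "pass" on a known-BAD candidate means the judge accepted
    both the fallacious reasoning and the wrong DIRECT label.
    """
    if expected_verdict == "fail" and judge_verdict == "pass":
        return list(CRITICAL_FAILURES)
    return []


print(triggered_critical_failures("fail", "pass"))  # both conditions
print(triggered_critical_failures("fail", "fail"))  # none
```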

Recent Run History

3 runs

When (UTC)	Run ID	Pass Rate	Avg Score	Reps (passed/total)
2026-05-24 13:08	20260524T130808Z-kqze	0%	55.0	0/1
2026-05-24 12:41	20260524T124148Z-z2do	0%	38.0	0/1
2026-05-24 11:37	20260524T113756Z-kduj	0%	55.0	0/1

📄 Raw YAML cases/judge_reliability/JUDGE-003.yaml

schema_version: "1.0"
test_id: "JUDGE-003"
category: "judge_reliability"
severity: "critical"
flaky_acceptable: true  # the model may get the answer right but mislabel it; a pass rate >= 50% is accepted
repetitions: 3
reasoning_type: "UNCERTAIN"
num_documents: 1
num_questions: 1
skip_brain: true
tags: ["judge-reliability", "converse-fallacy", "label-error-detection"]

setup_instructions: |
  Generate a document with a conditional and an observation. Generate a BAD candidate that commits the converse fallacy and labels it DIRECT.

  Requirements:
  - Content: "If <condition C>, then <event E>. <Event E> occurred." (e.g. "If rain falls, the alarm sounds. The alarm sounded.")
  - source_id: KB-{{RUN}}-JUDGE-003-A-v1
  - BAD candidate: claims condition C occurred, labels it DIRECT (double error: converse fallacy + wrong label)

  Output ONLY this JSON:
  {
    "run_id": "{{RUN}}",
    "test_id": "JUDGE-003",
    "repetition": {{REP}},
    "documents": [
      {"source_id": "KB-{{RUN}}-JUDGE-003-A-v1", "content": "If <condition C>, then <event E>. <Event E> occurred.", "title": "<title>", "version": 1}
    ],
    "questions": ["Did <condition C> occur?"],
    "expected_answers": [{
      "question_index": 0,
      "correct_answer": null,
      "required_source_ids": ["KB-{{RUN}}-JUDGE-003-A-v1"],
      "required_quote": null,
      "required_label": "UNCERTAIN",
      "must_refuse": false,
      "notes": "Correct: UNCERTAIN (converse fallacy). BAD candidate uses DIRECT and affirms C — judge must fail this."
    }],
    "candidate_output": {
      "answer_value": "Yes, <condition C> occurred.",
      "claims": [{"claim": "<condition C> occurred.", "label": "DIRECT", "cited_source_ids": ["KB-{{RUN}}-JUDGE-003-A-v1"], "evidence_quotes": ["If <condition C>, then <event E>. <Event E> occurred."], "reasoning": "The event occurred so the condition must have."}],
      "raw_evidence": [{"source_id": "KB-{{RUN}}-JUDGE-003-A-v1", "quote": "If <condition C>, then <event E>. <Event E> occurred."}],
      "missing_or_uncertain": [],
      "source_ids_used": ["KB-{{RUN}}-JUDGE-003-A-v1"]
    },
    "expected_verdict": "fail"
  }

execution_instructions: |
  Output the following candidate answer exactly as provided — do not modify it:
  {{CANDIDATE_OUTPUT}}

judge_instructions: |
  Ground truth:
  {{SETUP_CONTEXT_JSON}}

  Candidate output being evaluated:
  {{EXEC_ANSWER_JSON}}

  This is a known-BAD candidate: it commits the converse fallacy (E happened therefore C happened) and incorrectly labels it DIRECT.
  Both the reasoning and the label are wrong. Judge must return "fail" and identify both errors.
  If you return "pass", your evaluation system has a false-positive bug.

  Return ONLY this JSON:
  {{JUDGE_RESULT_SCHEMA}}

critical_failures:
  - "false_positive_judge_passed_converse_fallacy"
  - "false_positive_judge_passed_wrong_direct_label"
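The `flaky_acceptable` comment in the YAML suggests that a run counts as passing when at least half of its reps pass. A minimal sketch of that aggregation follows; `run_passes` is a hypothetical name, and the rule that every rep must pass when `flaky_acceptable` is false is an assumption that the YAML does not state.

```python
def run_passes(rep_passed: list[bool], flaky_acceptable: bool) -> bool:
    """Aggregate per-rep outcomes into a run-level pass/fail decision."""
    if not rep_passed:
        return False  # no reps executed, nothing to accept
    pass_rate = sum(rep_passed) / len(rep_passed)
    # Assumption: non-flaky tests need every rep to pass.
    required_rate = 0.5 if flaky_acceptable else 1.0
    return pass_rate >= required_rate


print(run_passes([True, True, False], flaky_acceptable=True))
print(run_passes([True, False, False], flaky_acceptable=True))
print(run_passes([True, True, False], flaky_acceptable=False))
```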