Review verification

Same-turn evidence before a verdict

Beagle review skills use a shared verification protocol before reporting findings. The protocol is defined in beagle-core/skills/review-verification-protocol and copied into stack plugins so a stack review can load it locally.

Gate 0 is anti-confabulation. Before a reviewer can issue a verdict, it must echo the exact file, line, and code it read in the current turn. A verdict without same-turn evidence for its target is invalid.
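
A minimal sketch of the Gate 0 check, assuming evidence and verdicts are tracked as simple records tagged with a turn number. The names Evidence, Verdict, and passes_gate_0 are hypothetical and do not come from the Beagle skills; matching on the exact line is also an assumption about how "target" is defined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """Code the reviewer echoed: where it was read and in which turn."""
    file: str
    line: int
    code: str
    turn: int


@dataclass(frozen=True)
class Verdict:
    """A verdict about one location, issued in a given turn."""
    file: str
    line: int
    turn: int
    text: str


def passes_gate_0(verdict: Verdict, evidence: list[Evidence]) -> bool:
    """True only if non-empty evidence from the same turn targets the verdict's file and line."""
    return any(
        e.turn == verdict.turn
        and e.file == verdict.file
        and e.line == verdict.line
        and e.code.strip() != ""
        for e in evidence
    )


# Hypothetical data: the code was read in turn 3 only.
evidence = [Evidence("app/auth.py", 42, "if user is None:", turn=3)]
fresh = Verdict("app/auth.py", 42, turn=3, text="Missing session check")
stale = Verdict("app/auth.py", 42, turn=4, text="Missing session check")

print(passes_gate_0(fresh, evidence))  # True
print(passes_gate_0(stale, evidence))  # False: evidence is from an earlier turn
```

The stale verdict fails even though the same file and line were read earlier, which is the point of the gate: evidence from a previous turn does not count.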

Finding gates

A candidate finding must pass evidence checks before it appears in the final report:

  1. Anchor: read the full enclosing symbol, module, component, handler, or route, not only the diff hunk.
  2. Evidence: run type-specific checks and keep an artifact: tool output, file and line reference, or an explicit zero-match or N-match search result.
  3. Severity: classify the issue using a stack-specific severity table. Critical findings cover security, data corruption, or breaking API behavior. Major findings cover logic errors, missing error handling, and production-impacting behavior. Minor findings cover clarity and maintenance issues.
  4. Format: report verified findings as [FILE:LINE] ISSUE_TITLE.

If a gate fails, the reviewer omits the finding, downgrades it, or asks a question instead of presenting a verdict. Style-only items fail unless the repository's own tools or conventions support them.
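
The gates above can be sketched as a single filter over candidate findings. This is an illustration under stated assumptions, not the protocol's implementation: Candidate, Severity, and format_finding are hypothetical names, and for brevity every failed gate results in omission, whereas the protocol also allows downgrading or asking a question.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Severity(Enum):
    CRITICAL = "critical"  # security, data corruption, breaking API behavior
    MAJOR = "major"        # logic errors, missing error handling
    MINOR = "minor"        # clarity and maintenance


@dataclass(frozen=True)
class Candidate:
    file: str
    line: int
    title: str
    severity: Severity                      # gate 3: classified before reporting
    read_full_symbol: bool                  # gate 1: anchor
    artifact: Optional[str]                 # gate 2: tool output or search result
    style_only: bool = False
    repo_supports_style_rule: bool = False  # repo tools or conventions back the rule


def format_finding(c: Candidate) -> Optional[str]:
    """Return '[FILE:LINE] ISSUE_TITLE' for a verified finding, or None if a gate fails."""
    if not c.read_full_symbol:
        return None  # anchor gate: only the diff hunk was read
    if not c.artifact:
        return None  # evidence gate: nothing to show for the claim
    if c.style_only and not c.repo_supports_style_rule:
        return None  # style-only item without repository support
    return f"[{c.file}:{c.line}] {c.title}"


# Hypothetical candidates.
verified = Candidate(
    "lib/app_web/router.ex", 17, "Route skips auth pipeline",
    Severity.CRITICAL, read_full_symbol=True,
    artifact="search: 0 matches for pipe_through :auth",
)
hunk_only = Candidate(
    "src/Form.tsx", 88, "Stale closure in handler",
    Severity.MAJOR, read_full_symbol=False, artifact="tsc output",
)

print(format_finding(verified))   # [lib/app_web/router.ex:17] Route skips auth pipeline
print(format_finding(hunk_only))  # None
```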

Stack-specific gates

Each stack adds rules around the shared protocol:

  • Python review treats ruff and mypy configuration as authoritative before flagging style or type findings.
  • Elixir review checks Phoenix, LiveView, ExUnit, and ExDoc patterns from the Elixir plugin references.
  • React review detects framework and UI libraries before loading additional rules for React Flow, shadcn/ui, Tailwind, Vitest, or Remix v2.
  • Go, Rust, and iOS reviews load language and framework references before writing findings.

The shared result shape is intentional: different stack reviewers produce comparable findings even when their evidence gates differ.

Feedback logs

Beagle includes a feedback schema for review outcomes. Each row records:

  • rule source: skill name, section, and rule id;
  • category and severity;
  • issue text;
  • human verdict: accept, reject, defer, or acknowledge;
  • rationale.

review-skill-improver aggregates these logs. It flags rules whose rejection rate exceeds the 30 percent threshold and groups their rejections by reason, such as linter-covered behavior, valid framework patterns, intentional design, or wrong code-path assumptions. The output is an evidence-bound set of recommendations for skill updates.
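
A rough sketch of this aggregation, assuming each log row is reduced to a rule id, a human verdict, and a rationale. FeedbackRow, flag_rules, and the rule ids are hypothetical. Two details are assumptions rather than documented behavior: the rejection rate is computed over all rows for a rule (including defer and acknowledge), and "above" is read as strictly greater than 30 percent.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass

REJECTION_THRESHOLD = 0.30  # from the text; strict comparison is an assumption


@dataclass(frozen=True)
class FeedbackRow:
    rule_id: str
    verdict: str    # "accept", "reject", "defer", or "acknowledge"
    rationale: str  # used here as the grouping reason


def flag_rules(rows, threshold=REJECTION_THRESHOLD):
    """Map each rule whose rejection rate exceeds the threshold to its rejection reasons."""
    totals = Counter()
    rejected = defaultdict(Counter)
    for row in rows:
        totals[row.rule_id] += 1
        if row.verdict == "reject":
            rejected[row.rule_id][row.rationale] += 1

    flagged = {}
    for rule_id, reasons in rejected.items():
        rate = sum(reasons.values()) / totals[rule_id]
        if rate > threshold:
            flagged[rule_id] = reasons
    return flagged


# Hypothetical log: py-001 is rejected 2 of 3 times, py-002 only 1 of 4 times.
rows = [
    FeedbackRow("py-001", "reject", "linter-covered behavior"),
    FeedbackRow("py-001", "reject", "linter-covered behavior"),
    FeedbackRow("py-001", "accept", "real bug"),
    FeedbackRow("py-002", "reject", "intentional design"),
    FeedbackRow("py-002", "accept", "real bug"),
    FeedbackRow("py-002", "accept", "real bug"),
    FeedbackRow("py-002", "defer", "needs follow-up"),
]

flagged = flag_rules(rows)
for rule_id, reasons in flagged.items():
    print(rule_id, dict(reasons))
```

Only py-001 is flagged here; py-002 sits at 25 percent and stays below the threshold.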

PR feedback workflows

fetch-pr-feedback, receive-feedback, and respond-pr-feedback handle bot or human review comments on pull requests:

  1. Fetch unresolved comments and review threads.
  2. Verify each claim against the current code.
  3. Apply fixes only for valid findings.
  4. Reply with the evidence or fix, then resolve addressed conversations where the host supports it.

This keeps the workflow from blindly accepting bot feedback. A comment is work input, not proof.
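
The workflow can be sketched as a triage loop in which verification is a separate, pluggable step. Comment, Action, triage, and still_present are hypothetical names, not the skills' interfaces. The verifier shown is a deliberately simple stand-in that only checks whether the quoted code still exists at the cited location; real verification would read the enclosing code and test the claim itself.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Comment:
    thread_id: str
    file: str
    line: int
    quoted_code: str  # code the comment claims is problematic
    resolved: bool = False


@dataclass(frozen=True)
class Action:
    thread_id: str
    apply_fix: bool
    reply: str


def triage(comments, verify: Callable[[Comment], tuple[bool, str]]) -> list[Action]:
    """Verify each unresolved comment and decide whether to fix or push back."""
    actions = []
    for c in comments:
        if c.resolved:
            continue  # step 1: only unresolved threads are fetched
        valid, evidence = verify(c)  # step 2: check the claim against current code
        verb = "Fixed" if valid else "Not applied"
        actions.append(Action(c.thread_id, valid, f"{verb}. Evidence: {evidence}"))
    return actions


# Hypothetical snapshot of the current code, keyed by (file, line).
current_code = {
    ("api/users.py", 10): "return db.get(user_id)",
    ("api/users.py", 22): "data = json.loads(payload)",
}


def still_present(c: Comment) -> tuple[bool, str]:
    """Stand-in verifier: the claim is kept only if the quoted code is still there."""
    actual = current_code.get((c.file, c.line), "")
    return c.quoted_code in actual, f"{c.file}:{c.line} reads {actual!r}"


comments = [
    Comment("t1", "api/users.py", 10, "db.get(user_id)"),
    Comment("t2", "api/users.py", 22, "eval(payload)"),
    Comment("t3", "api/users.py", 30, "print(secret)", resolved=True),
]

actions = triage(comments, still_present)
for a in actions:
    print(a.thread_id, a.apply_fix, a.reply)
```

Every reply carries the evidence that was checked, whether or not a fix was applied, so the reviewer can see why a comment was accepted or declined.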

Source files