Review verification
Same-turn evidence before a verdict
Beagle review skills use a shared verification protocol before reporting findings. The protocol is defined in beagle-core/skills/review-verification-protocol and copied into stack plugins so a stack review can load it locally.
Gate 0 is anti-confabulation. Before a reviewer can issue a verdict, it must echo the exact file, line, and code it read in the current turn. A verdict without same-turn evidence for its target is invalid.
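A minimal sketch of how the Gate 0 check could be expressed, assuming evidence is tracked as simple records. The `Evidence` type, the `turn` counter, and `verdict_is_valid` are hypothetical names for illustration and are not part of the Beagle protocol files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """One piece of code the reviewer read and echoed back verbatim."""
    file: str
    line: int
    code: str
    turn: int  # hypothetical counter for the turn in which the read happened


def verdict_is_valid(target_file: str, target_line: int,
                     evidence: list[Evidence], current_turn: int) -> bool:
    """Return True only if evidence from the current turn covers the target."""
    return any(
        e.turn == current_turn
        and e.file == target_file
        and e.line == target_line
        and e.code.strip() != ""
        for e in evidence
    )
```

Under this sketch, evidence read in an earlier turn does not count: the same file and line must be re-read before a new verdict is issued.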
Finding gates
A candidate finding must pass evidence checks before it appears in the final report:
- Anchor: read the full enclosing symbol, module, component, handler, or route, not only the diff hunk.
- Evidence: run type-specific checks and keep an artifact: tool output, file and line reference, or an explicit zero-match or N-match search result.
- Severity: classify the issue using a stack-specific severity table. Critical findings cover security, data corruption, or breaking API behavior. Major findings cover logic errors, missing error handling, and production-impacting behavior. Minor findings cover clarity and maintenance issues.
- Format: report verified findings as [FILE:LINE] ISSUE_TITLE.
If a gate fails, the reviewer omits the finding, downgrades it, or asks a question instead of presenting a verdict. Style-only items fail unless the repository's own tools or conventions support them.
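The gates above can be sketched as a single filter, shown here under simplifying assumptions. The `Candidate` fields and `report_line` are hypothetical names, and the sketch models only the "omit" outcome; downgrading a finding or turning it into a question would need extra branches.

```python
from dataclasses import dataclass
from typing import Optional

SEVERITIES = ("critical", "major", "minor")


@dataclass
class Candidate:
    file: str
    line: int
    title: str
    severity: str
    read_enclosing_symbol: bool  # Anchor gate: full symbol was read, not only the hunk
    artifact: Optional[str]      # Evidence gate: tool output, reference, or search result


def report_line(c: Candidate) -> Optional[str]:
    """Return the formatted finding, or None if any gate fails."""
    if not c.read_enclosing_symbol:
        return None  # Anchor gate failed
    if not c.artifact:
        return None  # Evidence gate failed: no artifact was kept
    if c.severity not in SEVERITIES:
        return None  # Severity gate failed: not classified
    return f"[{c.file}:{c.line}] {c.title}"
```

A candidate that passes all three checks yields a line such as `[lib/cart.ex:17] Missing error handling`; anything else is dropped from the report.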
Stack-specific gates
Each stack adds rules around the shared protocol:
- Python review treats ruff and mypy configuration as authoritative before flagging style or type findings.
- Elixir review checks Phoenix, LiveView, ExUnit, and ExDoc patterns from the Elixir plugin references.
- React review detects framework and UI libraries before loading additional rules for React Flow, shadcn/ui, Tailwind, Vitest, or Remix v2.
- Go, Rust, and iOS reviews load language and framework references before writing findings.
The shared result shape is intentional: different stack reviewers produce comparable findings even when their evidence gates differ.
Feedback logs
Beagle includes a feedback schema for review outcomes. Each row records:
- rule source: skill name, section, and rule id;
- category and severity;
- issue text;
- human verdict: accept, reject, defer, or acknowledge;
- rationale.
review-skill-improver aggregates these logs. Rules whose rejection rate exceeds 30 percent are grouped by rejection reason, such as linter-covered behavior, valid framework patterns, intentional design, or wrong code-path assumptions. The output is an evidence-bound set of recommendations for skill updates.
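The aggregation step might look roughly like the following. This is a sketch, not the review-skill-improver implementation: the row keys (`rule_id`, `verdict`, `rationale`) are assumed names for the schema fields, rationales are assumed to be normalized to short reason labels, and every verdict (including defer and acknowledge) is assumed to count toward the denominator.

```python
from collections import Counter, defaultdict

REJECTION_THRESHOLD = 0.30  # from the text: 30 percent rejection threshold


def flag_rules(rows: list[dict]) -> dict[str, Counter]:
    """Map each rule id above the threshold to a count of its rejection reasons."""
    totals: Counter = Counter()
    reasons: dict[str, Counter] = defaultdict(Counter)
    for row in rows:
        totals[row["rule_id"]] += 1
        if row["verdict"] == "reject":
            reasons[row["rule_id"]][row["rationale"]] += 1

    flagged: dict[str, Counter] = {}
    for rule_id, total in totals.items():
        rejected = sum(reasons[rule_id].values())
        if rejected / total > REJECTION_THRESHOLD:
            flagged[rule_id] = reasons[rule_id]
    return flagged
```

Grouping by reason matters because the remedy differs: a rule rejected mostly as linter-covered can probably be removed, while one rejected for wrong code-path assumptions likely needs a stricter evidence check.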
PR feedback workflows
fetch-pr-feedback, receive-feedback, and respond-pr-feedback handle bot or human review comments on pull requests:
- Fetch unresolved comments and review threads.
- Verify each claim against the current code.
- Apply fixes only for valid findings.
- Reply with the evidence or fix, then resolve addressed conversations where the host supports it.
This keeps the workflow from blindly accepting bot feedback. A comment is work input, not proof.
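The verify-before-fix step can be illustrated with a small triage function. This is a sketch under assumptions: comments are modeled as plain dicts with a `resolved` flag, and `claim_holds` is a hypothetical callback standing in for whatever check compares a comment's claim against the current code.

```python
from typing import Callable, Iterable


def triage(comments: Iterable[dict],
           claim_holds: Callable[[dict], bool]) -> tuple[list[dict], list[dict]]:
    """Split unresolved comments into ones to fix and ones to answer with evidence."""
    to_fix: list[dict] = []
    to_rebut: list[dict] = []
    for comment in comments:
        if comment.get("resolved"):
            continue  # already-resolved threads are not fetched as work
        if claim_holds(comment):
            to_fix.append(comment)    # valid finding: apply a fix, then reply
        else:
            to_rebut.append(comment)  # invalid claim: reply with the evidence
    return to_fix, to_rebut
```

Both lists still get a reply; only the first leads to a code change.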