feat: preflight #57
No reviewers
Labels
No labels
No milestone
No project
No assignees
2 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
brooktrails/pr-reviewer!57
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "feat/preflight-step-0"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Storage layer for the upcoming preflight (step-0) pass. Each event gets an observations array of {tag, message} objects emitted by the preflight model — short tag (currently free-form) plus a human message describing what was noticed. NOT NULL with DEFAULT '[]' so existing rows backfill automatically. JSONB chosen over a normalized review_observations table because the tag vocabulary is currently free-form. Migrating to the relational shape is coupled to moving to a deterministic tag vocabulary — the relational query benefit only pays off once "show me reviews where tag = X" is a useful question against a closed enum, and a closed enum is what unlocks per-repo halt rules. Note captured in the column comment so future-us sees it on schema spelunks. ReviewEvent gains an Observations slice; Record marshals it to JSONB before insert; scanEvent reads it back. Malformed JSONB (only possible from hand-edited rows) reads as empty rather than failing the load. No code that produces observations yet — that lands with the preflight backend method in a follow-up commit on this branch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>Adds a cheap, breadth-first pre-review pass that runs on every review regardless of tier. Output is a complexity tag and a list of {tag, message} observations; the runner threads these into the event row before the heavier review fan-out, where they replace complexity as a side-output of the review pass. Step 0 (always) preflight: complexity + observations ↓ Step 1 (full only) discovery + extract: investigative context ↓ Step 2 (always) review: verdict + body, no longer emits complexity Why a separate pass: - Complexity is now known *before* the heavier review runs, which enables routing decisions (skip-for-trivial, model selection, future per-repo halt rules from #44) without changing the review pass's behavior. - The "always-context" — guidance files + directory listings — lives in a tier-agnostic helper now, so step 0 and step 1 share it. Pulled out of the existing discovery wiring as repocontext.BuildAlways. - submit_review's tool schema sheds complexity (single source of truth), and the body-first / verdict-last reorder lands here too: body is the model's scratchpad, verdict is the final commitment. Implementation: - tracker: ReviewEvent gains Observations []Observation; populated by the runner from preflight before fan-out, copied to shadow rows by shadowEventFrom so the comparison group sees consistent complexity / observations across configs. - reviewer: submit_preflight tool schema, PreflightPrompt, Backend.Preflight method on both Anthropic and vLLM backends. Result no longer carries Complexity. - runner: PreflightModel field, runPreflight stage between diff fetch and changed-file fetch. Falls open on any failure — review still runs, just without step-0 signals. - config: PREFLIGHT_MODEL env var, defaults to claude-haiku-4-5-20251001. cmd/webhook plumbs it through. Storage migration for observations landed in a previous commit on this branch; this commit produces and consumes the data. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>Surfaces the step-0 observations the runner now writes to review_events. On /reviews/{id}, a new "Preflight observations" card lists each {tag, message} with the tag rendered as a code-pill. Hidden when the review has no observations (the common case for trivial diffs where the model correctly emits an empty array). The /admin/comparisons/{id} compare page lifts the same observations to the page header alongside complexity, since preflight runs once per comparison group and the data is identical across configs — duplicating it inside each config column would be noise. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>Automated review by pr-reviewer v0.25.0 | Full Review | Claude Sonnet 4 | tracking id
r-f5458e-2fe57dThis is an AI-generated review and may contain mistakes.
Status: ✅ Completed
✅ Verdict: Approve — Well-architected feature implementation that adds preflight analysis capabilities with proper error handling, good separation of concerns, and comprehensive test coverage.
Complexity: complex
Preflight Step-0 Implementation Review
This PR implements a preflight analysis step that runs before the main review to emit complexity assessments and observations. The implementation is well-architected and follows good Go practices.
Architecture & Design ✅
Excellent separation of concerns: The preflight functionality is cleanly separated from the main review pipeline while sharing common infrastructure. The step-0 pass uses a cheaper model to provide early signals that can drive routing decisions.
Smart data flow: The preflight results are captured on the primary event and inherited by shadow events, ensuring consistency across A/B comparisons while avoiding duplicate expensive calls.
Robust error handling: Preflight failures are non-fatal and logged appropriately - the main review continues without these signals, providing good resilience.
Implementation Quality ✅
Database schema: The migration properly adds the
observationsJSONB column with appropriate defaults and comments. The choice to use JSONB over a normalized table is well-justified for the current free-form tag vocabulary.Prompt engineering: The
PreflightPromptis well-crafted, clearly defining the two jobs (complexity estimation and observation surfacing) with concrete examples and appropriate constraints.Tool schema design: The preflight tool schema correctly orders fields (observations first, complexity last) to encourage chain-of-thought reasoning. The 20-item limit on observations provides reasonable bounds.
Code Organization ✅
Consistent patterns: The implementation follows established patterns from the existing codebase - backend interface extensions, tool definition structures, and error handling approaches are all consistent.
Good abstractions: The
AlwaysContextabstraction cleanly captures the deterministic context shared between preflight and main review passes.Proper configuration: The
PreflightModelconfig with sensible defaults (Haiku for Anthropic, fallback to primary model for vLLM) provides good operational flexibility.Testing & Safety ✅
Comprehensive coverage: Tests cover the happy path, error scenarios, fallback behavior, and shadow event inheritance. The mock backends properly implement all interface methods including the new
Preflightmethod.Backwards compatibility: Existing code paths are preserved - the preflight step is additive and doesn't break existing functionality.
Safe defaults: When preflight fails, the system gracefully continues without these signals rather than failing the entire review.
Dashboard Integration ✅
The UI properly displays observations and complexity on both the individual review and admin comparison pages. The styling is clean and the information hierarchy is appropriate.
Minor Considerations
The complexity field migration from review result to preflight result is handled cleanly, with appropriate comments explaining the ownership change. The tool schema changes (removing complexity, reordering fields) are well-reasoned and documented.
The long line in
internal/reviewer/tools.go:51(525 characters) is flagged by the automated analysis, but it's a descriptive JSON schema field that benefits from being on one line for readability.Verdict
This is a solid implementation that adds valuable preflight capabilities while maintaining system reliability and following established architectural patterns. The feature is well-designed, properly tested, and ready for production use.
📚 Context used in this review
Project guidance files:
CLAUDE.mdRepository fragments:
internal/tracker/dashboard.go(internal/tracker/dashboard.go:118-158) — extracted in answer to: "any existing methods that handle observations or complexity data for dashboard display"internal/runner/format.go(internal/runner/format.go:59-85) — extracted in answer to: "the FormatVerdict function implementation to see how complexity is displayed in comments"