feat: preflight #57

Merged
rcsheets merged 3 commits from feat/preflight-step-0 into main 2026-05-02 00:34:27 +00:00
Owner
No description provided.
Storage layer for the upcoming preflight (step-0) pass. Each event
gets an observations array of {tag, message} objects emitted by the
preflight model — short tag (currently free-form) plus a human
message describing what was noticed. NOT NULL with DEFAULT '[]' so
existing rows backfill automatically.

JSONB chosen over a normalized review_observations table because the
tag vocabulary is currently free-form. Migrating to the relational
shape is coupled to moving to a deterministic tag vocabulary — the
relational query benefit only pays off once "show me reviews where
tag = X" is a useful question against a closed enum, and a closed
enum is what unlocks per-repo halt rules. Note captured in the
column comment so future-us sees it on schema spelunks.

ReviewEvent gains an Observations slice; Record marshals it to JSONB
before insert; scanEvent reads it back. Malformed JSONB (only
possible from hand-edited rows) reads as empty rather than failing
the load.

No code that produces observations yet — that lands with the
preflight backend method in a follow-up commit on this branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a cheap, breadth-first pre-review pass that runs on every review
regardless of tier. Output is a complexity tag and a list of
{tag, message} observations; the runner threads these into the event
row before the heavier review fan-out, where they replace complexity
as a side-output of the review pass.

  Step 0  (always)         preflight: complexity + observations
                            ↓
  Step 1  (full only)      discovery + extract: investigative context
                            ↓
  Step 2  (always)         review: verdict + body, no longer emits complexity

Why a separate pass:

  - Complexity is now known *before* the heavier review runs, which
    enables routing decisions (skip-for-trivial, model selection,
    future per-repo halt rules from #44) without changing the review
    pass's behavior.
  - The "always-context" — guidance files + directory listings — lives
    in a tier-agnostic helper now, so step 0 and step 1 share it.
    Pulled out of the existing discovery wiring as repocontext.BuildAlways.
  - submit_review's tool schema sheds complexity (single source of
    truth), and the body-first / verdict-last reorder lands here too:
    body is the model's scratchpad, verdict is the final commitment.

Implementation:

  - tracker: ReviewEvent gains Observations []Observation; populated
    by the runner from preflight before fan-out, copied to shadow
    rows by shadowEventFrom so the comparison group sees consistent
    complexity / observations across configs.
  - reviewer: submit_preflight tool schema, PreflightPrompt,
    Backend.Preflight method on both Anthropic and vLLM backends.
    Result no longer carries Complexity.
  - runner: PreflightModel field, runPreflight stage between diff
    fetch and changed-file fetch. Falls open on any failure — review
    still runs, just without step-0 signals.
  - config: PREFLIGHT_MODEL env var, defaults to
    claude-haiku-4-5-20251001. cmd/webhook plumbs it through.

Storage migration for observations landed in a previous commit on
this branch; this commit produces and consumes the data.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
feat(dashboard): render preflight observations on review-detail and compare
All checks were successful
ci / check (pull_request) Successful in 45s
c1c42a7dcd
Surfaces the step-0 observations the runner now writes to
review_events. On /reviews/{id}, a new "Preflight observations"
card lists each {tag, message} with the tag rendered as a code-pill.
Hidden when the review has no observations (the common case for
trivial diffs where the model correctly emits an empty array).

The /admin/comparisons/{id} compare page lifts the same observations
to the page header alongside complexity, since preflight runs once
per comparison group and the data is identical across configs —
duplicating it inside each config column would be noise.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Collaborator

Automated review by pr-reviewer v0.25.0 | Full Review | Claude Sonnet 4 | tracking id r-f5458e-2fe57d
This is an AI-generated review and may contain mistakes.

Status: Completed


Verdict: Approve — Well-architected feature implementation that adds preflight analysis capabilities with proper error handling, good separation of concerns, and comprehensive test coverage.
Complexity: complex

Preflight Step-0 Implementation Review

This PR implements a preflight analysis step that runs before the main review to emit complexity assessments and observations. The implementation is well-architected and follows good Go practices.

Architecture & Design

Excellent separation of concerns: The preflight functionality is cleanly separated from the main review pipeline while sharing common infrastructure. The step-0 pass uses a cheaper model to provide early signals that can drive routing decisions.

Smart data flow: The preflight results are captured on the primary event and inherited by shadow events, ensuring consistency across A/B comparisons while avoiding duplicate expensive calls.

Robust error handling: Preflight failures are non-fatal and logged appropriately - the main review continues without these signals, providing good resilience.

Implementation Quality

Database schema: The migration properly adds the observations JSONB column with appropriate defaults and comments. The choice to use JSONB over a normalized table is well-justified for the current free-form tag vocabulary.

Prompt engineering: The PreflightPrompt is well-crafted, clearly defining the two jobs (complexity estimation and observation surfacing) with concrete examples and appropriate constraints.

Tool schema design: The preflight tool schema correctly orders fields (observations first, complexity last) to encourage chain-of-thought reasoning. The 20-item limit on observations provides reasonable bounds.

Code Organization

Consistent patterns: The implementation follows established patterns from the existing codebase - backend interface extensions, tool definition structures, and error handling approaches are all consistent.

Good abstractions: The AlwaysContext abstraction cleanly captures the deterministic context shared between preflight and main review passes.

Proper configuration: The PreflightModel config with sensible defaults (Haiku for Anthropic, fallback to primary model for vLLM) provides good operational flexibility.

Testing & Safety

Comprehensive coverage: Tests cover the happy path, error scenarios, fallback behavior, and shadow event inheritance. The mock backends properly implement all interface methods including the new Preflight method.

Backwards compatibility: Existing code paths are preserved - the preflight step is additive and doesn't break existing functionality.

Safe defaults: When preflight fails, the system gracefully continues without these signals rather than failing the entire review.

Dashboard Integration

The UI properly displays observations and complexity on both the individual review and admin comparison pages. The styling is clean and the information hierarchy is appropriate.

Minor Considerations

The complexity field migration from review result to preflight result is handled cleanly, with appropriate comments explaining the ownership change. The tool schema changes (removing complexity, reordering fields) are well-reasoned and documented.

The long line in internal/reviewer/tools.go:51 (525 characters) is flagged by the automated analysis, but it's a descriptive JSON schema field that benefits from being on one line for readability.

Verdict

This is a solid implementation that adds valuable preflight capabilities while maintaining system reliability and following established architectural patterns. The feature is well-designed, properly tested, and ready for production use.

📚 Context used in this review

Project guidance files:

  • CLAUDE.md

Repository fragments:

  • internal/tracker/dashboard.go (internal/tracker/dashboard.go:118-158) — extracted in answer to: "any existing methods that handle observations or complexity data for dashboard display"
  • internal/runner/format.go (internal/runner/format.go:59-85) — extracted in answer to: "the FormatVerdict function implementation to see how complexity is displayed in comments"
*Automated review by [pr-reviewer](https://git.brooktrails.org/brooktrails/pr-reviewer) v0.25.0 | Full Review | Claude Sonnet 4 | tracking id `r-f5458e-2fe57d`* *This is an AI-generated review and may contain mistakes.* **Status:** ✅ Completed --- **✅ Verdict: Approve** — Well-architected feature implementation that adds preflight analysis capabilities with proper error handling, good separation of concerns, and comprehensive test coverage. **Complexity:** complex # Preflight Step-0 Implementation Review This PR implements a preflight analysis step that runs before the main review to emit complexity assessments and observations. The implementation is well-architected and follows good Go practices. ## Architecture & Design ✅ **Excellent separation of concerns**: The preflight functionality is cleanly separated from the main review pipeline while sharing common infrastructure. The step-0 pass uses a cheaper model to provide early signals that can drive routing decisions. **Smart data flow**: The preflight results are captured on the primary event and inherited by shadow events, ensuring consistency across A/B comparisons while avoiding duplicate expensive calls. **Robust error handling**: Preflight failures are non-fatal and logged appropriately - the main review continues without these signals, providing good resilience. ## Implementation Quality ✅ **Database schema**: The migration properly adds the `observations` JSONB column with appropriate defaults and comments. The choice to use JSONB over a normalized table is well-justified for the current free-form tag vocabulary. **Prompt engineering**: The `PreflightPrompt` is well-crafted, clearly defining the two jobs (complexity estimation and observation surfacing) with concrete examples and appropriate constraints. **Tool schema design**: The preflight tool schema correctly orders fields (observations first, complexity last) to encourage chain-of-thought reasoning. The 20-item limit on observations provides reasonable bounds. ## Code Organization ✅ **Consistent patterns**: The implementation follows established patterns from the existing codebase - backend interface extensions, tool definition structures, and error handling approaches are all consistent. **Good abstractions**: The `AlwaysContext` abstraction cleanly captures the deterministic context shared between preflight and main review passes. **Proper configuration**: The `PreflightModel` config with sensible defaults (Haiku for Anthropic, fallback to primary model for vLLM) provides good operational flexibility. ## Testing & Safety ✅ **Comprehensive coverage**: Tests cover the happy path, error scenarios, fallback behavior, and shadow event inheritance. The mock backends properly implement all interface methods including the new `Preflight` method. **Backwards compatibility**: Existing code paths are preserved - the preflight step is additive and doesn't break existing functionality. **Safe defaults**: When preflight fails, the system gracefully continues without these signals rather than failing the entire review. ## Dashboard Integration ✅ The UI properly displays observations and complexity on both the individual review and admin comparison pages. The styling is clean and the information hierarchy is appropriate. ## Minor Considerations The complexity field migration from review result to preflight result is handled cleanly, with appropriate comments explaining the ownership change. The tool schema changes (removing complexity, reordering fields) are well-reasoned and documented. The long line in `internal/reviewer/tools.go:51` (525 characters) is flagged by the automated analysis, but it's a descriptive JSON schema field that benefits from being on one line for readability. ## Verdict This is a solid implementation that adds valuable preflight capabilities while maintaining system reliability and following established architectural patterns. The feature is well-designed, properly tested, and ready for production use. <details> <summary>📚 Context used in this review</summary> **Project guidance files:** - `CLAUDE.md` **Repository fragments:** - `internal/tracker/dashboard.go` (internal/tracker/dashboard.go:118-158) — extracted in answer to: "any existing methods that handle observations or complexity data for dashboard display" - `internal/runner/format.go` (internal/runner/format.go:59-85) — extracted in answer to: "the FormatVerdict function implementation to see how complexity is displayed in comments" </details>
rcsheets deleted branch feat/preflight-step-0 2026-05-02 00:34:27 +00:00
Sign in to join this conversation.
No reviewers
No labels
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
brooktrails/pr-reviewer!57
No description provided.