feat: add LLM fixture suite for regression testing #60
No reviewers
Labels
No labels
No milestone
No project
No assignees
2 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
brooktrails/pr-reviewer!60
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "feat/fixture-suite"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Seven obvious-bug Go diffs in internal/reviewer/testdata/fixtures, one per
category (logic, security/creds, error handling, concurrency, resource leak,
performance, security/injection). TestFixtures runs each through the live
Anthropic backend's quick-scan path and asserts Result.Verdict matches
want.json.
Gated on ANTHROPIC_API_KEY so
go test ./...and per-PR CI skip cleanly.A dedicated .forgejo/workflows/fixtures.yaml runs the suite on
workflow_dispatch + nightly cron with the API key wired from secrets.
This is v1; controls (clean-refactor, docs-only) and harder difficulty
tiers are tracked as follow-ups so we can also measure false-positive rate.
Co-Authored-By: Claude Opus 4.7 (1M context) noreply@anthropic.com
Automated review by pr-reviewer v0.27.1 | Full Review | Claude Sonnet 4 | tracking id
r-f9a59a-1d418dThis is an AI-generated reveiw and may contain mistakes.
Status: ✅ Completed
✅ Verdict: Approve — Well-designed regression test suite with comprehensive fixtures covering key bug categories, proper CI integration, and clean implementation that validates the core LLM review functionality.
Complexity: trivial
LLM Fixture Suite for Regression Testing
This PR introduces a comprehensive test suite for validating the LLM reviewer's ability to detect obvious bugs across seven key categories. Let me work through the implementation:
Overall Approach ✅
The design is well-thought-out:
ANTHROPIC_API_KEYkeeps regular CI free while enabling dedicated fixture runs via workflow dispatch and nightly cronwant.jsonwith expected verdict and explanatory notesTest Implementation (
internal/reviewer/fixtures_test.go) ✅The test logic is solid:
DefaultAnthropicModelwhenANTHROPIC_MODELunset)TierQuickwith appropriate contextFixture Quality ✅
Each of the seven fixtures represents a genuinely obvious bug that should be caught:
!user.IsAdmin→user.IsAdmin- classic auth bypassThe diffs are minimal and focused, making them good test cases for the reviewer's ability to spot critical issues without noise.
CI Workflow (
.forgejo/workflows/fixtures.yaml) ✅The workflow design addresses the key requirements:
ANTHROPIC_API_KEYfrom secretsThe checkout logic properly handles the Forgejo token for private repo access.
Architecture Alignment ✅
This fits well with the project's testing philosophy:
reviewer.Requestand backend interfacesEdge Cases Considered ✅
Potential Concerns
None significant. The automated observations flagged "possible secrets" in the fixture files, but these are intentionally crafted test data (the hardcoded token fixture contains
fjt_2bSk9aRpLnQwVxYzM7HvBcDeFgT3kJq8xPmW1zRywhich is clearly a test token pattern).Minor suggestion: The fixtures could benefit from a README explaining their purpose and how to add new ones, but this isn't blocking.
Value Proposition
This suite provides:
The implementation is clean, focused, and addresses a real need for validating the core value proposition of the PR reviewer system.
📚 Context used in this review
Project guidance files:
CLAUDE.mdRepository fragments:
internal/reviewer/reviewer.go(internal/reviewer/reviewer.go:16-77) — extracted in answer to: "The definitions of TierQuick constant, Request struct, PRContext struct, and any verdict constants or expected values"internal/reviewer/backend_anthropic.go(internal/reviewer/backend_anthropic.go:12-28) — extracted in answer to: "The DefaultAnthropicModel constant value and NewAnthropic function signature"