feat: per-repo review policy (gating, auto-triggers, behavior overrides) #44

Open
opened 2026-05-01 07:29:32 +00:00 by rcsheets · 2 comments
Owner

Captured from a design conversation; not yet a ready-to-build spec, but enough structure to come back to.

Goal

Let repo admins customize how the bot behaves and how its outcomes interact with the PR, on a per-repo basis. Today everything is global (one set of model_configs, one tier-prompts pair, no concept of "this repo wants something different"). The proposal collects three orthogonal axes under one per-repo policy row.

Three orthogonal axes

1. Gating policy — what commit status the bot posts on completion

Single status context: pr-reviewer. Branch protection on the target repo lists pr-reviewer as a required status; merge is blocked while pending and depends on policy at completion.

Initial menu (one of):

  • Off — never post a status. Bot is purely informational on this repo.
  • Presencepending while reviewing → success regardless of verdict. "Block until the bot finishes."
  • Verdictsuccess on lgtm/approve, failure on flag/request_changes.
  • Verdict-non-trivial — same as Verdict, but trivial-complexity reviews always pass even on a flag.

PR-level overrides (e.g. labels like pr-reviewer:advisory to soften gating on a specific PR) is a logical follow-up once the per-repo machinery exists.

2. Auto-trigger — when to escalate quick → full without a human asking

Boolean per trigger (checkbox UI), evaluated at quick-scan completion:

  • auto_full_on_complex — quick scan returned complexity=complex.
  • auto_full_on_flag — quick scan returned verdict=flag.
  • ...future: path match, author-trust signal, diff-size threshold, etc.

UX wrinkle to decide when implementing: today a quick-then-full produces two separate bot comments. Either keep the chatty behavior, or edit the quick comment to "escalating to full review" and start a new pending → completed cycle for the full. The latter reads better.

3. Behavior overrides — per-repo customization of what "quick" and "full" mean

Examples to anchor the shape:

  • "For repo X, quick reviews should include context." Today the repocontext discovery+extraction pipeline only runs for full reviews; this would enable it for quick reviews on a specific repo.
  • "For repo Y, full reviews should never use as context any files under customer_data/*." A per-repo path-exclusion list applied during context fetch (both for the LLM-discovery whitelist and for any guidance/listing reads).

This axis implies an architectural pivot worth flagging: the current "all enabled model_configs fan out on every PR globally" model assumes uniform behavior across repos. Once customization lands, you may also want per-repo config selection (repo X uses configs A+B, repo Y uses C). Doable but a bigger change than just adding columns.

Storage sketch

One new table:

CREATE TABLE repo_policy (
  repo               TEXT PRIMARY KEY,        -- e.g. "brooktrails/infra"
  gating_policy      TEXT NOT NULL DEFAULT 'off',
  auto_full_on_complex BOOLEAN NOT NULL DEFAULT FALSE,
  auto_full_on_flag    BOOLEAN NOT NULL DEFAULT FALSE,
  -- behavior overrides (axis 3) added later as columns or a separate table
  context_for_quick    BOOLEAN NOT NULL DEFAULT FALSE,
  context_path_excludes TEXT,                  -- newline- or comma-separated globs
  prompt_quick_override TEXT,
  prompt_full_override  TEXT,
  max_tokens_quick_override INT,
  max_tokens_full_override  INT,
  updated_at         TIMESTAMPTZ NOT NULL DEFAULT now()
);

Repos with no row default to current global behavior. Admins manage rows from the dashboard; primary edit surface is a card per repo with checkbox/radio groups.

Open questions

  1. Default policy on a fresh row: probably "off" + no auto-triggers + no overrides — same as no row.
  2. Behavior overrides vs. model_configs: a per-repo prompt_full_override would shadow the model_configs-level override. Which wins? Probably repo override > config override > global default. Worth explicit precedence rules.
  3. Per-PR override mechanism: PR labels (pr-reviewer:advisory, pr-reviewer:strict)? PR description directives? Repo-level config that allows a specific label set?
  4. Fan-out interaction: when repo policy customizes behavior for repo X, do all enabled model_configs see that customization, or do we move to per-repo config selection? Latter changes A/B-comparison semantics.
  5. Migration: should we backfill any rows on first deploy, or rely entirely on dashboard input?

Suggested implementation order

If/when this is built, the natural sequence is roughly:

  1. Status posting + per-repo gating policy (axis 1, simplest, immediate value).
  2. Auto-trigger machinery (axis 2, builds on the gating UI surface).
  3. Path-exclusion override (smallest piece of axis 3, biggest immediate user value per the customer_data example).
  4. Prompt/token overrides (rest of axis 3, more invasive).
  5. Per-PR overrides via labels (last, after per-repo machinery proves out).
Captured from a design conversation; not yet a ready-to-build spec, but enough structure to come back to. ## Goal Let repo admins customize how the bot behaves and how its outcomes interact with the PR, on a per-repo basis. Today everything is global (one set of `model_configs`, one tier-prompts pair, no concept of "this repo wants something different"). The proposal collects three orthogonal axes under one per-repo policy row. ## Three orthogonal axes ### 1. Gating policy — what commit status the bot posts on completion Single status context: `pr-reviewer`. Branch protection on the target repo lists `pr-reviewer` as a required status; merge is blocked while pending and depends on policy at completion. Initial menu (one of): - **Off** — never post a status. Bot is purely informational on this repo. - **Presence** — `pending` while reviewing → `success` regardless of verdict. "Block until the bot finishes." - **Verdict** — `success` on `lgtm`/`approve`, `failure` on `flag`/`request_changes`. - **Verdict-non-trivial** — same as Verdict, but trivial-complexity reviews always pass even on a flag. PR-level overrides (e.g. labels like `pr-reviewer:advisory` to soften gating on a specific PR) is a logical follow-up once the per-repo machinery exists. ### 2. Auto-trigger — when to escalate quick → full without a human asking Boolean per trigger (checkbox UI), evaluated at quick-scan completion: - `auto_full_on_complex` — quick scan returned complexity=complex. - `auto_full_on_flag` — quick scan returned verdict=flag. - ...future: path match, author-trust signal, diff-size threshold, etc. UX wrinkle to decide when implementing: today a quick-then-full produces two separate bot comments. Either keep the chatty behavior, or edit the quick comment to "escalating to full review" and start a new pending → completed cycle for the full. The latter reads better. ### 3. Behavior overrides — per-repo customization of what "quick" and "full" mean Examples to anchor the shape: - **"For repo X, quick reviews should include context."** Today the repocontext discovery+extraction pipeline only runs for full reviews; this would enable it for quick reviews on a specific repo. - **"For repo Y, full reviews should never use as context any files under `customer_data/*`."** A per-repo path-exclusion list applied during context fetch (both for the LLM-discovery whitelist and for any guidance/listing reads). This axis implies an architectural pivot worth flagging: the current "all enabled `model_configs` fan out on every PR globally" model assumes uniform behavior across repos. Once customization lands, you may also want per-repo *config selection* (repo X uses configs A+B, repo Y uses C). Doable but a bigger change than just adding columns. ## Storage sketch One new table: ```sql CREATE TABLE repo_policy ( repo TEXT PRIMARY KEY, -- e.g. "brooktrails/infra" gating_policy TEXT NOT NULL DEFAULT 'off', auto_full_on_complex BOOLEAN NOT NULL DEFAULT FALSE, auto_full_on_flag BOOLEAN NOT NULL DEFAULT FALSE, -- behavior overrides (axis 3) added later as columns or a separate table context_for_quick BOOLEAN NOT NULL DEFAULT FALSE, context_path_excludes TEXT, -- newline- or comma-separated globs prompt_quick_override TEXT, prompt_full_override TEXT, max_tokens_quick_override INT, max_tokens_full_override INT, updated_at TIMESTAMPTZ NOT NULL DEFAULT now() ); ``` Repos with no row default to current global behavior. Admins manage rows from the dashboard; primary edit surface is a card per repo with checkbox/radio groups. ## Open questions 1. **Default policy on a fresh row**: probably "off" + no auto-triggers + no overrides — same as no row. 2. **Behavior overrides vs. model_configs**: a per-repo `prompt_full_override` would shadow the model_configs-level override. Which wins? Probably repo override > config override > global default. Worth explicit precedence rules. 3. **Per-PR override mechanism**: PR labels (`pr-reviewer:advisory`, `pr-reviewer:strict`)? PR description directives? Repo-level config that allows a specific label set? 4. **Fan-out interaction**: when repo policy customizes behavior for repo X, do all enabled `model_configs` see that customization, or do we move to per-repo config selection? Latter changes A/B-comparison semantics. 5. **Migration**: should we backfill any rows on first deploy, or rely entirely on dashboard input? ## Suggested implementation order If/when this is built, the natural sequence is roughly: 1. Status posting + per-repo gating policy (axis 1, simplest, immediate value). 2. Auto-trigger machinery (axis 2, builds on the gating UI surface). 3. Path-exclusion override (smallest piece of axis 3, biggest immediate user value per the customer_data example). 4. Prompt/token overrides (rest of axis 3, more invasive). 5. Per-PR overrides via labels (last, after per-repo machinery proves out).
Author
Owner

Addition to axis 3 (behavior overrides): the discovery + extract pipeline itself has knobs that are reasonable per-repo customization targets, beyond just on/off and path exclusions. The current globals all live in internal/repocontext/ and internal/reviewer/tools.go; any of them could become a per-repo override:

  • Caps: MaxFragmentsPerRequest (5), MaxExcerptChars (8k), MaxListingDirs (10), MaxListingEntriesPerDir (50). Repos with deep monorepos may want larger listing budgets; repos with tight token budgets may want smaller fragment caps.
  • Token budgets: DiscoveryMaxTokens (1024) and ExtractMaxTokens (4096) for the sub-passes.
  • Prompts: per-repo discovery_prompt_override and extract_prompt_override so the discovery LLM can be told repo-specific norms ("this is a Kubernetes manifests repo; prefer pulling sibling YAML files for comparison" vs. "this is a Go monorepo; prefer pulling type definitions and interface declarations").
  • Guidance probe list: today GuidanceProbeList is hard-coded to CLAUDE.md, AGENTS.md, AGENT.md, .cursorrules, .github/copilot-instructions.md. Repos with house standards under different filenames (e.g. STYLE.md, docs/REVIEW_RULES.md) could extend this list per-repo.
  • Always-listed directories: a repo could want certain directories (e.g. a top-level schema/, docs/conventions/) included in every discovery prompt's listings regardless of whether the diff touches them, so the LLM always has that orientation context.

These aren't all needed up front — most repos will work fine with the current globals — but the per-repo policy table should be designed knowing this category exists, so the schema doesn't paint us into a corner. Concretely: keep repocontext-related overrides nullable, add them lazily as need surfaces, and document the precedence rule (repo override > global default) consistently with the prompt/token overrides above.

**Addition to axis 3 (behavior overrides):** the discovery + extract pipeline itself has knobs that are reasonable per-repo customization targets, beyond just on/off and path exclusions. The current globals all live in `internal/repocontext/` and `internal/reviewer/tools.go`; any of them could become a per-repo override: - **Caps**: `MaxFragmentsPerRequest` (5), `MaxExcerptChars` (8k), `MaxListingDirs` (10), `MaxListingEntriesPerDir` (50). Repos with deep monorepos may want larger listing budgets; repos with tight token budgets may want smaller fragment caps. - **Token budgets**: `DiscoveryMaxTokens` (1024) and `ExtractMaxTokens` (4096) for the sub-passes. - **Prompts**: per-repo `discovery_prompt_override` and `extract_prompt_override` so the discovery LLM can be told repo-specific norms ("this is a Kubernetes manifests repo; prefer pulling sibling YAML files for comparison" vs. "this is a Go monorepo; prefer pulling type definitions and interface declarations"). - **Guidance probe list**: today `GuidanceProbeList` is hard-coded to `CLAUDE.md`, `AGENTS.md`, `AGENT.md`, `.cursorrules`, `.github/copilot-instructions.md`. Repos with house standards under different filenames (e.g. `STYLE.md`, `docs/REVIEW_RULES.md`) could extend this list per-repo. - **Always-listed directories**: a repo could want certain directories (e.g. a top-level `schema/`, `docs/conventions/`) included in every discovery prompt's listings regardless of whether the diff touches them, so the LLM always has that orientation context. These aren't all needed up front — most repos will work fine with the current globals — but the per-repo policy table should be designed knowing this category exists, so the schema doesn't paint us into a corner. Concretely: keep `repocontext`-related overrides nullable, add them lazily as need surfaces, and document the precedence rule (repo override > global default) consistently with the prompt/token overrides above.
Author
Owner

Storage decision pinned during preflight (step-0) work: observations land in a JSONB column (review_events.observations, migration 000008). The migration's column comment captures the coupling we agreed on: the move from JSONB to a normalized review_observations table is tied to the move from a free-form to a deterministic tag vocabulary — those decisions are linked, since the relational shape only pays off once "show me reviews where tag = X" is a useful question against a closed enum, and a closed enum is what unlocks per-repo halt rules in the first place. Captured here so the trail's in the issue people will look at when picking up the rule layer.

**Storage decision pinned during preflight (step-0) work:** observations land in a JSONB column (`review_events.observations`, migration 000008). The migration's column comment captures the coupling we agreed on: the move from JSONB to a normalized `review_observations` table is tied to the move from a free-form to a deterministic tag vocabulary — those decisions are linked, since the relational shape only pays off once "show me reviews where tag = X" is a useful question against a closed enum, and a closed enum is what unlocks per-repo halt rules in the first place. Captured here so the trail's in the issue people will look at when picking up the rule layer.
Sign in to join this conversation.
No labels
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
brooktrails/pr-reviewer#44
No description provided.