feat(ci): add the possibility of live tests against anthropic api #7

Merged
rcsheets merged 4 commits from feat/llm-and-controlloop into main 2026-04-13 06:07:25 +00:00
Owner
No description provided.
The control loop's list_universes tool needs to enumerate every
configured universe so the LLM can pick the right one for an
operator's intent. Add a typed lookup that returns rows ordered by
name. Integration test covers ordering behavior.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The control loop drives Claude through a multi-turn tool-use
conversation: take operator intent, decide what to propose, call
tools to insert the proposal into the approval store. This package
provides the framework — concrete tool implementations live in
cmd/controlloop.

internal/llm/tools.go — ToolDef + Dispatcher. Dispatcher routes
tool invocations by name; unit-testable with no API access.

internal/llm/prompt.go — BuildSystemPrompt assembles a deterministic
system prompt (sorted operations, framing text, advisory rules) so
the prompt cache hits on repeated requests. Universes are
deliberately NOT in the prompt — they change too often and would
invalidate the cached prefix; the LLM discovers them via the
list_universes tool instead.

internal/llm/client.go — Client wraps the official anthropic-sdk-go
with control-loop defaults: claude-opus-4-6, adaptive thinking,
prompt-cache breakpoint on the system prompt, MaxToolIterations
guard. Run() executes the manual tool-use loop and returns a Result
with the final text and a tool-call trace. The MessageNewer
interface lets tests fake the API without a network call.

internal/llm/integration_test.go — opt-in live API test, gated on
ANTHROPIC_API_KEY. Verified end-to-end: model called the supplied
tool and returned a coherent answer. Cache reads were zero on the
second request because the test's system prompt is below Opus 4.6's
4096-token minimum cacheable prefix; this resolves itself once the
operations registry and advisory rules grow.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Second runnable binary. controlloop authenticates to Postgres as the
control_loop role, accepts operator intent over HTTP, drives the LLM
tool-use loop to propose a rollout plan + expand operations, and
returns the proposal to the caller for the operator to approve via
the dp TUI (forthcoming).

intent.go — intentProcessor wires three tools onto the LLM:
list_universes, propose_rollout_plan, expand_operations. Every tool
validates aggressively before touching the store and surfaces
problems back to the model as tool errors so it can recover (bad
UUID, unregistered op_type, invalid params, empty subjects, etc.).
The store/ops/llm dependencies are interfaces so tests can fake them
end-to-end.

handler.go — minimal http.ServeMux: POST /intent and GET /healthz.
Handler tests cover happy path, malformed JSON, empty text, server
errors, and method validation.

main.go — env-based config, pgxpool to the approval store, advisory
rules loaded from disk, system prompt assembled at startup so the
prompt-cache prefix is stable. Graceful shutdown with a 30s grace
period for in-flight LLM requests.

Process tests use a scriptedRunner that pretends to be the LLM and
walks the dispatcher through a sequence of canned tool calls — that
way the agentic logic gets exercised without an Anthropic round-trip.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pass ANTHROPIC_API_KEY into go-test job for live LLM test
All checks were successful
test / go-test (pull_request) Successful in 21s
test / schema-test (pull_request) Successful in 52s
test / go-test (push) Successful in 21s
test / schema-test (push) Successful in 52s
3048d17657
The live LLM integration test in internal/llm auto-skips when
ANTHROPIC_API_KEY is empty, so wiring the secret in is opt-in: if
ANTHROPIC_API_KEY is defined as a Forgejo secret, the test runs and
exercises the tool-use loop end-to-end; if not, the env var
substitutes to empty and the test skips silently.

Cost is small (a few cents per run, capped by
MaxToolIterations=12 and the test's tiny prompt).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Collaborator

Automated review by pr-reviewer v0.15.0 | Safety Check | anthropic | tracking id r-dc87e2-42940d
This is an AI-generated review and may contain mistakes.

Status: Completed


Verdict: LGTM — This is a well-structured addition of LLM functionality with proper testing, error handling, and security practices.
Complexity: complex

Safety Check Review

Summary

This PR adds LLM integration for natural language intent processing in the control loop. The implementation includes proper HTTP handlers, intent processing with tool dispatch, and comprehensive testing.

Key Changes Examined

  • New controlloop service: HTTP server with /intent endpoint for processing natural language requests
  • LLM client integration: Anthropic API client with tool-use loop and prompt caching
  • Intent processor: Converts operator text into rollout plans via LLM tool calls
  • Database integration: New ListUniverses method in approval store
  • CI enhancement: Optional live API testing when ANTHROPIC_API_KEY is available

Security & Safety Assessment

API key handling: Properly uses environment variable, gracefully skips tests when unset
Input validation: JSON parsing, UUID validation, parameter validation for operations
Error handling: Distinguishes client cancellation (499) from server errors (500)
Loop protection: MaxToolIterations = 12 prevents infinite LLM loops
SQL injection: Uses parameterized queries in new ListUniverses method

Technical Concerns Reviewed

  • High entropy in go.sum: Expected for cryptographic hashes and dependency checksums
  • Missing tests for some files: Acceptable for main.go (integration) and simple handlers
  • Version compatibility: Anthropic SDK version appears reasonable for a 2024 release

Code Quality Observations

  • Proper interface abstractions for testability
  • Comprehensive unit test coverage for core logic
  • Good separation of HTTP layer from business logic
  • Reasonable error messages and logging

No blocking issues identified. The implementation follows good practices for LLM integration with appropriate safeguards.

Reference
brooktrails/directory-pacifier!7