A ready-to-run example is available in the Ready-to-run Example section below.
Overview
A plain conversation.run() stops as soon as the agent thinks it is done. The /goal command is stricter: after each run, it asks a second judge LLM to audit the transcript for authoritative evidence (file contents, command output, test results) that the objective is provably complete. If something is still missing, the loop re-prompts the agent with the judge’s feedback and runs again, until the goal is genuinely done or a hard iteration cap is reached.
That makes it a good fit for verifiable objectives like “make the tests pass”, “produce a working CLI”, or “publish a passing migration”: the agent cannot finish just by claiming success — the judge has to see the green output first.
Use cases:
- Test-driven objectives: finish only when pytest (or any command) actually passes
- Multi-step deliverables: keep the agent going until every requirement is verified
- Long-running tasks: combine with a critic and stop hooks for full control over termination
/goal is an extension applied to a conversation: it composes with whatever agent, tools, or critic you already have. The critic governs each inner run(); the /goal loop governs the overall objective.
How It Works
run_goal drives the conversation you pass in; it does not fork it or spin up a sidecar. Every turn (the objective, the agent’s work, and the judge-driven follow-ups) lands in the same conversation.state.events history.
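The control flow described above can be sketched as a small loop. This is a self-contained toy, not the SDK implementation: ToyConversation, goal_loop, and judge_after_two_runs are hypothetical names, and GoalVerdict/GoalOutcome are local stand-ins that only mirror the fields documented under Understanding the Result. It shows the three key properties: one shared event history, the judge’s missing text fed back as the next turn, and a hard cap.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Literal


@dataclass
class GoalVerdict:
    # Local stand-in with the documented fields; not imported from the SDK.
    score: float
    complete: bool
    missing: str


@dataclass
class GoalOutcome:
    status: Literal["complete", "capped"]
    iterations: int
    verdict: GoalVerdict


@dataclass
class ToyConversation:
    # Hypothetical stand-in: one shared event list playing the role of
    # conversation.state.events. Nothing is forked.
    events: List[str] = field(default_factory=list)

    def send_message(self, text: str) -> None:
        self.events.append(f"user: {text}")

    def run(self) -> None:
        self.events.append("agent: did some work")


Judge = Callable[[str, List[str]], GoalVerdict]


def goal_loop(conversation: ToyConversation, objective: str,
              judge: Judge, max_iterations: int = 10) -> GoalOutcome:
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    conversation.send_message(objective)
    verdict = GoalVerdict(score=0.0, complete=False, missing=objective)
    for round_no in range(1, max_iterations + 1):
        conversation.run()                               # agent works
        verdict = judge(objective, conversation.events)  # audit the transcript
        if verdict.complete:
            return GoalOutcome("complete", round_no, verdict)
        if round_no < max_iterations:
            # Re-prompt with the judge's feedback in the same conversation.
            conversation.send_message(f"Still missing: {verdict.missing}")
    return GoalOutcome("capped", max_iterations, verdict)


def judge_after_two_runs(objective: str, events: List[str]) -> GoalVerdict:
    # Toy judge: it only "sees evidence" after the agent has run twice.
    runs = sum(e.startswith("agent:") for e in events)
    if runs >= 2:
        return GoalVerdict(1.0, True, "")
    return GoalVerdict(0.3, False, "no passing test output in the transcript")


conv = ToyConversation()
outcome = goal_loop(conv, "make the tests pass", judge_after_two_runs)
print(outcome.status, outcome.iterations)  # complete 2
```

The real loop calls an LLM judge on the actual event history; the toy judge here just counts agent turns so the control flow is easy to follow.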
Quick Start
Use a separate LLM instance (distinct usage_id) for the judge, even if you reuse the same model. Keeping the judge isolated from the agent’s LLM lets you account for its cost separately and avoids accidentally sharing streaming or callback state.
Understanding the Result
run_goal returns a GoalOutcome that reports whether the loop ended cleanly or was capped, plus the judge’s final verdict.
| Field | Type | Description |
|---|---|---|
| status | "complete" \| "capped" | Whether the judge confirmed completion, or the loop hit max_iterations. |
| iterations | int | Number of audit rounds performed (≥ 1). |
| verdict | GoalVerdict | The judge’s last verdict. |
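A caller typically branches on status. The sketch below assumes only the three documented fields; report and exit_code are hypothetical helpers, and the dataclasses are local stand-ins rather than the SDK’s own classes. Mapping "capped" to a non-zero exit code is one reasonable choice for CI, not a library convention.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass
class GoalVerdict:  # local stand-in with the documented fields
    score: float
    complete: bool
    missing: str


@dataclass
class GoalOutcome:  # local stand-in with the documented fields
    status: Literal["complete", "capped"]
    iterations: int
    verdict: GoalVerdict


def report(outcome: GoalOutcome) -> str:
    """Turn an outcome into a one-line summary (hypothetical helper)."""
    if outcome.status == "complete":
        return (f"Goal met after {outcome.iterations} round(s), "
                f"score {outcome.verdict.score:.2f}")
    # "capped": the judge never confirmed completion, so surface what it wanted.
    return (f"Stopped at the cap after {outcome.iterations} round(s); "
            f"still missing: {outcome.verdict.missing}")


def exit_code(outcome: GoalOutcome) -> int:
    # Treat a capped run as a failure, e.g. for a CI job.
    return 0 if outcome.status == "complete" else 1
```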
GoalVerdict is what the judge LLM produces every round:
| Field | Type | Description |
|---|---|---|
| score | float (0.0–1.0) | Probability that the full objective is provably done. |
| complete | bool | Whether the judge considers the objective complete. |
| missing | str | Concise description of what remains, or empty if complete. |
The missing field is what the loop feeds back to the agent in the next follow-up turn, so the agent knows exactly which requirements still need verifiable evidence.
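The hand-off from verdict to follow-up turn can be sketched as follows. The exact wording run_goal uses for its follow-up prompt is not documented here, so the message text in follow_up_message (a hypothetical helper) is invented; the score range check in __post_init__ is also an added assumption based on the 0.0–1.0 range in the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GoalVerdict:
    # Local stand-in with the documented fields; the range check is an assumption.
    score: float      # probability the objective is provably done
    complete: bool
    missing: str      # empty when complete

    def __post_init__(self) -> None:
        if not 0.0 <= self.score <= 1.0:
            raise ValueError(f"score out of range: {self.score}")


def follow_up_message(objective: str, verdict: GoalVerdict) -> Optional[str]:
    """Build the next user turn from the judge's `missing` text, or None if done."""
    if verdict.complete:
        return None
    return (
        f"Objective: {objective}\n"
        f"The judge could not yet verify: {verdict.missing}\n"
        "Show authoritative evidence (file contents, command output, test results)."
    )
```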
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| conversation | BaseConversation | — | The conversation to drive. Any agent/tools/critic config is supported. |
| objective | str | — | The goal to pursue and audit against. Must be non-empty. |
| judge_llm | LLM | — | The second LLM that grades completion. Should be independent from the agent’s LLM. |
| max_iterations | int | 10 | Hard cap on audit rounds before the loop returns status="capped". |
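Before calling run_goal, you may want a cheap preflight check. The sketch below is a hypothetical helper, not part of the SDK: the non-empty objective rule comes from the table, while treating whitespace-only objectives as empty, requiring max_iterations ≥ 1, and comparing usage_id strings (following the Quick Start advice to give the judge its own usage_id) are assumptions.

```python
from typing import List


def preflight_problems(objective: str, max_iterations: int,
                       agent_usage_id: str, judge_usage_id: str) -> List[str]:
    """Hypothetical checks to run before run_goal; returns a list of problems."""
    problems: List[str] = []
    if not objective.strip():
        problems.append("objective must be non-empty")
    if max_iterations < 1:
        problems.append("max_iterations must be at least 1")
    if agent_usage_id == judge_usage_id:
        # Separate usage_ids keep judge cost accounting apart from the agent's.
        problems.append("judge LLM should have its own usage_id")
    return problems
```

Returning a list instead of raising on the first failure lets a CLI or UI show every configuration problem at once.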
Composing With a Critic
/goal and a Critic operate at different layers:
- A critic governs each inner run(): it can refine the agent’s work mid-run via iterative refinement.
- The /goal loop governs the overall objective: it decides whether to re-prompt the agent at all.
To use both, attach the critic as you normally would, then pass the conversation to run_goal. Every inner run() still consults the critic; the outer loop still re-runs until the judge is satisfied.
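The two layers can be illustrated with a toy model. Everything here is hypothetical and unrelated to the SDK’s Critic API: each run_once handles one requirement and keeps refining its draft until a toy critic accepts it (the inner layer), while goal_loop keeps calling run_once until a toy judge sees that nothing is pending (the outer layer).

```python
from dataclasses import dataclass, field
from typing import List

CRITIC_THRESHOLD = 0.9  # hypothetical acceptance bar for the inner critic


@dataclass
class ToyState:
    pending: List[str]
    done: List[str] = field(default_factory=list)
    refinements: int = 0


def run_once(state: ToyState) -> None:
    """Inner layer: one run(); the critic triggers refinement turns inside it."""
    if not state.pending:
        return
    task = state.pending.pop(0)
    quality = 0.5  # first draft
    while quality < CRITIC_THRESHOLD:  # toy critic: the score is the draft quality
        quality += 0.25
        state.refinements += 1
    state.done.append(task)


def judge_complete(state: ToyState) -> bool:
    """Outer layer: the goal judge only asks whether the whole objective is done."""
    return not state.pending


def goal_loop(state: ToyState, max_iterations: int = 10) -> int:
    for round_no in range(1, max_iterations + 1):
        run_once(state)
        if judge_complete(state):
            return round_no
    return max_iterations


state = ToyState(pending=["tests pass", "CLI works", "docs updated"])
rounds = goal_loop(state)
print(rounds, state.refinements)  # 3 6
```

The critic alone would stop after the first requirement is polished; the judge alone would accept rough drafts. Together, each run is refined and the right number of runs happen.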
Lower-Level Building Blocks
run_goal is a thin synchronous driver over a transport-agnostic controller. If you need to integrate the loop into a custom driver (async, agent-server, UI progress reporting), reach for the building blocks directly.
GoalController
GoalController owns the continue-vs-stop decision logic and the iteration cap. It does no conversation transport I/O — the driver owns sending messages and running the agent — but it does own the judge call: on_run_finished() synchronously invokes the judge LLM, so treat that call as blocking.
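Because on_run_finished() blocks on the judge call, an async driver should keep it off the event loop. The sketch below assumes a hypothetical ToyGoalController that mirrors only the division of labor described above; the real GoalController’s constructor and the return value of on_run_finished() are not documented here, so do not treat these signatures as the SDK’s. asyncio.to_thread (Python 3.9+) is one standard way to run the blocking call.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Verdict:
    complete: bool
    missing: str


class ToyGoalController:
    """Hypothetical stand-in: owns the stop decision and the cap, no transport I/O."""

    def __init__(self, objective: str,
                 judge: Callable[[str, List[str]], Verdict],
                 max_iterations: int = 10) -> None:
        self.objective = objective
        self.judge = judge
        self.max_iterations = max_iterations
        self.iterations = 0

    def on_run_finished(self, events: List[str]) -> Optional[str]:
        """Blocking: calls the judge. Returns a follow-up message, or None to stop."""
        self.iterations += 1
        verdict = self.judge(self.objective, events)
        if verdict.complete or self.iterations >= self.max_iterations:
            return None
        return f"Still missing: {verdict.missing}"


async def drive(controller: ToyGoalController, events: List[str]) -> int:
    """Async driver: owns sending messages and running the agent."""
    events.append(f"user: {controller.objective}")
    while True:
        await asyncio.sleep(0)  # stand-in for an async agent run
        events.append("agent: worked")
        # Off-load the blocking judge call so the event loop stays responsive.
        follow_up = await asyncio.to_thread(controller.on_run_finished, list(events))
        if follow_up is None:
            return controller.iterations
        events.append(f"user: {follow_up}")


def slow_judge(objective: str, events: List[str]) -> Verdict:
    time.sleep(0.01)  # simulate LLM latency
    runs = sum(e.startswith("agent:") for e in events)
    return Verdict(complete=runs >= 2, missing="" if runs >= 2 else "test output")


history: List[str] = []
print(asyncio.run(drive(ToyGoalController("make the tests pass", slow_judge), history)))  # 2
```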
judge_goal
judge_goal is the reusable kernel: a synchronous, LLM-backed evaluator with signature judge_goal(judge_llm, objective, events) → GoalVerdict and no dependency on the loop. It calls the judge LLM each time, so it is not a pure function. Use it directly to build a /status command, a stop hook, or a server endpoint.
Internally, it renders the events as a role: text transcript and asks the LLM for a strict-JSON verdict. The agent’s system prompt is intentionally excluded from the transcript to keep judge token cost low, since it carries no goal-specific evidence.
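The transcript step can be sketched as below, assuming a hypothetical Event shape with role and text fields (real SDK events are richer) and an illustrative prompt wording; neither is taken from judge_goal itself. The point is the filtering: system-role events are dropped before the transcript reaches the judge.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Event:
    # Hypothetical event shape; real SDK events are richer.
    role: str  # e.g. "system", "user", "assistant", "tool"
    text: str


def render_transcript(events: List[Event]) -> str:
    """Render events as `role: text` lines, skipping the system prompt."""
    return "\n".join(f"{e.role}: {e.text}" for e in events if e.role != "system")


def build_judge_prompt(objective: str, events: List[Event]) -> str:
    # Prompt wording is illustrative only.
    return (
        f"Objective:\n{objective}\n\n"
        f"Transcript:\n{render_transcript(events)}\n\n"
        'Answer with strict JSON only: {"score": <0.0-1.0>, "complete": <true|false>, "missing": "<text>"}'
    )


history = [
    Event("system", "You are a coding agent..."),
    Event("user", "make the tests pass"),
    Event("tool", "pytest: 12 passed"),
]
print(render_transcript(history))
# user: make the tests pass
# tool: pytest: 12 passed
```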
Notes
- Goal vs. Critic. A critic scores each run() and triggers refinement turns inside one run. The /goal loop drives the overall objective from the outside. The two compose: the critic improves each turn; the goal loop ensures the right number of turns happen.
- No fork. run_goal drives the conversation you pass in; it does not create a sidecar conversation. All goal-related events land in the same conversation.state.events history.
- Conservative parsing. If the judge response cannot be parsed as JSON, the verdict falls back to score=0.0, complete=False so the loop keeps working rather than falsely finishing.
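The conservative-parsing rule can be sketched like this. Only the fallback values score=0.0 and complete=False come from the note above; parse_verdict is a hypothetical helper, and its extra checks (score must lie in 0.0–1.0, complete must be a real boolean) and the fallback’s missing text are assumptions.

```python
import json
from dataclasses import dataclass


@dataclass
class GoalVerdict:  # local stand-in with the documented fields
    score: float
    complete: bool
    missing: str


def parse_verdict(raw: str) -> GoalVerdict:
    """Parse the judge's JSON; on any problem, fall back to 'not done'."""
    fallback = GoalVerdict(score=0.0, complete=False,
                           missing="judge response could not be parsed")
    try:
        data = json.loads(raw)
        score = float(data["score"])
        complete = data["complete"]
        missing = str(data.get("missing", ""))
    except (KeyError, TypeError, ValueError):  # json.JSONDecodeError is a ValueError
        return fallback
    if not isinstance(complete, bool) or not 0.0 <= score <= 1.0:
        return fallback
    return GoalVerdict(score=score, complete=complete, missing=missing)
```

Failing closed means a chatty or malformed judge reply costs at most one extra round, whereas failing open could end the loop before the objective is verified.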
Ready-to-run Example
This example is available on GitHub: examples/01_standalone_sdk/54_goal_completion_loop.py
The model name should follow the LiteLLM convention: provider/model_name (e.g., anthropic/claude-sonnet-4-5-20250929, openai/gpt-4o).
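If you want to catch a malformed model string before the example starts, a shape check is enough. split_model_name is a hypothetical helper, not part of the example script or LiteLLM; it only verifies the provider/model_name form and does not check that the provider or model actually exists.

```python
from typing import Tuple


def split_model_name(model: str) -> Tuple[str, str]:
    """Split a LiteLLM-style 'provider/model_name' string (shape check only)."""
    provider, sep, name = model.partition("/")
    if not sep or not provider or not name:
        raise ValueError(f"expected 'provider/model_name', got {model!r}")
    return provider, name


print(split_model_name("anthropic/claude-sonnet-4-5-20250929"))
# ('anthropic', 'claude-sonnet-4-5-20250929')
```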
The LLM_API_KEY should be the API key for your chosen provider.
Next Steps
- Critic — Score and refine individual agent runs in real time
- Iterative Refinement — Multi-agent feedback loop for quality-bound tasks
- Hooks — Customize start/stop semantics on every run
- Persistence — Save and restore conversation state across goal runs

