The LLM system provides a unified interface to language model providers through LiteLLM. It handles model configuration, request orchestration, retry logic, telemetry, and cost tracking across all providers.
Source: openhands-sdk/openhands/sdk/llm/
Core Responsibilities
The LLM system has five primary responsibilities:
- Provider Abstraction - Uniform interface to OpenAI, Anthropic, Google, and 100+ providers
- Request Pipeline - Dual API support: Chat Completions (`completion()`) and Responses API (`responses()`)
- Configuration Management - Load from environment, JSON, or programmatic configuration
- Telemetry & Cost - Track usage, latency, and costs across providers
- Enhanced Reasoning - Support for OpenAI Responses API with encrypted thinking and reasoning summaries
Architecture
Key Components
| Component | Purpose | Design |
| --- | --- | --- |
| `LLM` | Configuration model | Pydantic model with provider settings |
| `completion()` | Chat Completions API | Handles retries, timeouts, streaming |
| `responses()` | Responses API | Enhanced reasoning with encrypted thinking |
| LiteLLM | Provider adapter | Unified API for 100+ providers |
| Configuration loaders | Config hydration | `load_from_env()`, `load_from_json()` |
| Telemetry | Usage tracking | Token counts, costs, latency |
Configuration
See the `LLM` source for the complete list of supported fields.
Programmatic Configuration
Create LLM instances directly in code:
Example:

```python
from pydantic import SecretStr

from openhands.sdk import LLM

llm = LLM(
    model="anthropic/claude-sonnet-4.1",
    api_key=SecretStr("sk-ant-123"),
    temperature=0.1,
    timeout=120,
)
```
Environment Variable Configuration
Load from environment using naming convention:
Environment Variable Pattern:
- Prefix: All variables start with `LLM_`
- Mapping: `LLM_FIELD` → `field` (lowercased)
- Types: Values are auto-cast to int, float, bool, JSON, or SecretStr
Common Variables:
```bash
export LLM_MODEL="anthropic/claude-sonnet-4.1"
export LLM_API_KEY="sk-ant-123"
export LLM_USAGE_ID="primary"
export LLM_TIMEOUT="120"
export LLM_NUM_RETRIES="5"
```
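Hydrating from those variables might look like this (a minimal sketch; it assumes `load_from_env()` is exposed as a classmethod on `LLM`, per the components table above):

```python
import os

from openhands.sdk import LLM

# Illustrative only; in practice these come from your shell environment.
os.environ["LLM_MODEL"] = "anthropic/claude-sonnet-4.1"
os.environ["LLM_API_KEY"] = "sk-ant-123"
os.environ["LLM_NUM_RETRIES"] = "5"  # auto-cast to int

llm = LLM.load_from_env()  # reads the LLM_-prefixed variables
```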
JSON Configuration
Serialize and load from JSON files:
Example:

```python
# Save
llm.model_dump_json(exclude_none=True, indent=2)

# Load
llm = LLM.load_from_json("config/llm.json")
```
Security: Secrets are redacted in the serialized JSON (combine with environment variables for sensitive data). If you need to include secrets in the JSON, use `llm.model_dump_json(exclude_none=True, context={"expose_secrets": True})`.
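A full round trip might look like this (a sketch; the file path is illustrative, and the real API key is expected to be supplied separately, e.g. via `LLM_API_KEY`):

```python
from pathlib import Path

from openhands.sdk import LLM

# Save: secrets are redacted unless expose_secrets is set in the context.
Path("config/llm.json").write_text(llm.model_dump_json(exclude_none=True, indent=2))

# Load: rehydrate later and provide the key through the environment.
llm = LLM.load_from_json("config/llm.json")
```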
Request Pipeline
Completion Flow
Pipeline Stages:
- Validation: Check required fields (model, messages)
- Request: Call LiteLLM with provider-specific formatting
- Retry Logic: Exponential backoff on failures (configurable)
- Telemetry: Record tokens, cost, latency
- Response: Return completion or raise error
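A typical call through this pipeline (a minimal sketch; the `Message` and `TextContent` types are assumed to be importable from `openhands.sdk.llm`):

```python
from pydantic import SecretStr

from openhands.sdk import LLM
from openhands.sdk.llm import Message, TextContent

llm = LLM(
    model="anthropic/claude-sonnet-4.1",
    api_key=SecretStr("sk-ant-123"),
    num_retries=5,  # retried with exponential backoff on transient failures
)

# Validation, the LiteLLM call, retries, and telemetry all happen inside completion().
response = llm.completion(
    messages=[Message(role="user", content=[TextContent(text="Hello!")])]
)
```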
Responses API Support
In addition to the standard Chat Completions API, the LLM system supports OpenAI's Responses API as an alternative invocation path for models that benefit from this newer interface (for example, GPT-5-Codex supports only the Responses API). The Responses API provides enhanced reasoning capabilities with encrypted thinking and detailed reasoning summaries.
Supported Models
Models that automatically use the Responses API path:
| Pattern | Examples | Documentation |
| --- | --- | --- |
| `gpt-5*` | gpt-5, gpt-5-mini, gpt-5-codex | OpenAI GPT-5 family |
Detection: The SDK automatically detects whether a model supports the Responses API using pattern matching in `model_features.py`.
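You normally do not choose the path yourself: construct the `LLM` as usual and matching models are routed through `responses()`. Calling it directly might look like this (a sketch that assumes `responses()` accepts the same message list as `completion()`):

```python
from pydantic import SecretStr

from openhands.sdk import LLM
from openhands.sdk.llm import Message, TextContent

# gpt-5* models match the Responses API pattern in model_features.py.
llm = LLM(model="openai/gpt-5-codex", api_key=SecretStr("sk-proj-123"))

# Sketch: assumes responses() mirrors the completion() message interface.
response = llm.responses(
    messages=[Message(role="user", content=[TextContent(text="Review this diff.")])]
)
```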
Provider Integration
LiteLLM Abstraction
The Software Agent SDK uses LiteLLM for provider abstraction:
Benefits:
- 100+ Providers: OpenAI, Anthropic, Google, Azure, AWS Bedrock, local models, etc.
- Unified API: Same interface regardless of provider
- Format Translation: Provider-specific request/response formatting
- Error Handling: Normalized error codes and messages
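Because LiteLLM normalizes request and response formats, switching providers is typically just a change to the `model` string (a minimal sketch; the model identifiers below are illustrative):

```python
from pydantic import SecretStr

from openhands.sdk import LLM

# Same interface regardless of provider; only the model string and key change.
claude = LLM(model="anthropic/claude-sonnet-4.1", api_key=SecretStr("sk-ant-123"))
gpt = LLM(model="openai/gpt-5-mini", api_key=SecretStr("sk-proj-123"))
bedrock = LLM(model="bedrock/anthropic.claude-sonnet", api_key=SecretStr("aws-key"))
```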
LLM Providers
Provider integrations are shared between the Software Agent SDK and the OpenHands Application. The pages linked below live under the OpenHands app section but apply verbatim to SDK applications because both layers wrap the same `openhands.sdk.llm.LLM` interface.

When you follow any of those guides while building with the SDK, create an `LLM` object using the documented parameters (for example, API keys, base URLs, or custom headers) and pass it into your agent or registry. The OpenHands UI is simply a convenience layer on top of the same configuration model.
Telemetry and Cost Tracking
Telemetry Collection
LLM requests automatically collect metrics:
Tracked Metrics:
- Token Usage: Input tokens, output tokens, total
- Cost: Per-request cost using configured rates
- Latency: Request duration in milliseconds
- Errors: Failure types and retry counts
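Reading the accumulated metrics after a few requests might look like this (a sketch; the `metrics` attribute and its field names are assumptions here, not confirmed by this page):

```python
# Sketch: assumes the LLM instance exposes a `metrics` accumulator.
metrics = llm.metrics
print(f"Accumulated cost: ${metrics.accumulated_cost:.4f}")
print(f"Accumulated tokens: {metrics.accumulated_token_usage}")
```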
Cost Configuration
Configure per-token costs for custom models:
```python
llm = LLM(
    model="custom/my-model",
    input_cost_per_token=0.00001,   # $0.01 per 1K tokens
    output_cost_per_token=0.00003,  # $0.03 per 1K tokens
)
```
Built-in Costs: LiteLLM includes costs for major providers (updated regularly)
Custom Costs: Override for:
- Internal models
- Custom pricing agreements
- Cost estimation for budgeting
Component Relationships
How LLM Integrates
Relationship Characteristics:
- Agent → LLM: Agent uses LLM for reasoning and tool calls
- LLM → Events: LLM requests/responses recorded as events
- Security → LLM: Optional security analyzer can use separate LLM
- Condenser → LLM: Optional context condenser can use separate LLM
- Configuration: LLM configured independently, passed to agent
- Telemetry: LLM metrics flow through event system to UI/logging
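A sketch of the configuration relationship (the `Agent` constructor and its `llm`/`tools` parameters are assumed here):

```python
from pydantic import SecretStr

from openhands.sdk import LLM, Agent

# The LLM is configured independently, then passed to the agent.
llm = LLM(model="anthropic/claude-sonnet-4.1", api_key=SecretStr("sk-ant-123"))
agent = Agent(llm=llm, tools=[])
# Requests/responses are recorded as events; telemetry flows with them.
```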
See Also