openhands-sdk/openhands/sdk/llm/
Core Responsibilities
The LLM system has five primary responsibilities:
- Provider Abstraction - Uniform interface to OpenAI, Anthropic, Google, and 100+ providers
- Request Pipeline - Dual API support: Chat Completions (completion()) and Responses API (responses())
- Configuration Management - Load from environment, JSON, or programmatic configuration
- Telemetry & Cost - Track usage, latency, and costs across providers
- Enhanced Reasoning - Support for OpenAI Responses API with encrypted thinking and reasoning summaries
Architecture
Key Components
| Component | Purpose | Design |
|---|---|---|
| LLM | Configuration model | Pydantic model with provider settings |
| completion() | Chat Completions API | Handles retries, timeouts, streaming |
| responses() | Responses API | Enhanced reasoning with encrypted thinking |
| LiteLLM | Provider adapter | Unified API for 100+ providers |
| Configuration Loaders | Config hydration | load_from_env(), load_from_json() |
| Telemetry | Usage tracking | Token counts, costs, latency |
Configuration
See the LLM source for the complete list of supported fields.
Programmatic Configuration
Create LLM instances directly in code, for example:
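A minimal sketch, assuming the SDK is installed; the model string and api_key field follow the component table above, while any additional required fields should be checked against the LLM source.

```python
from pydantic import SecretStr

from openhands.sdk.llm import LLM

# Direct, in-code configuration. model and api_key are shown here; the full
# set of supported fields (and any additional required identifiers in newer
# SDK versions) is defined on the LLM Pydantic model.
llm = LLM(
    model="anthropic/claude-sonnet-4-20250514",
    api_key=SecretStr("your-api-key"),
)
```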
Environment Variable Configuration
Load configuration from environment variables using a naming convention.

Environment Variable Pattern:
- Prefix: All variables start with LLM_
- Mapping: LLM_FIELD → field (lowercased)
- Types: Auto-cast to int, float, bool, JSON, or SecretStr
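For example, a hedged sketch of the environment-variable path; the loader name load_from_env() comes from the component table above, while its exact signature and the specific field names used here (temperature, num_retries) are assumptions to verify against the source.

```python
import os

from openhands.sdk.llm import LLM

# Variables follow the LLM_FIELD -> field (lowercased) convention.
os.environ["LLM_MODEL"] = "openai/gpt-5-mini"
os.environ["LLM_API_KEY"] = "your-api-key"   # stored as SecretStr
os.environ["LLM_TEMPERATURE"] = "0.0"        # auto-cast to float
os.environ["LLM_NUM_RETRIES"] = "5"          # auto-cast to int

# load_from_env() is the loader named in the component table; whether it is a
# classmethod and which arguments it accepts should be verified in the source.
llm = LLM.load_from_env()
```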
JSON Configuration
Serialize and load LLM configurations from JSON files. To serialize a configuration with secrets included, use llm.model_dump_json(exclude_none=True, context={"expose_secrets": True}).
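A hedged sketch of the round trip; the load_from_json() name comes from the component table above, and its exact signature is an assumption.

```python
from pathlib import Path

from pydantic import SecretStr

from openhands.sdk.llm import LLM

llm = LLM(model="openai/gpt-5-mini", api_key=SecretStr("your-api-key"))

# Serialize with secrets included, using the call quoted above.
Path("llm.json").write_text(
    llm.model_dump_json(exclude_none=True, context={"expose_secrets": True})
)

# load_from_json() is the loader named in the component table; its exact
# signature (path vs. file object) is an assumption to verify in the source.
restored = LLM.load_from_json("llm.json")
```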
Request Pipeline
Completion Flow
Pipeline Stages:
- Validation: Check required fields (model, messages)
- Request: Call LiteLLM with provider-specific formatting
- Retry Logic: Exponential backoff on failures (configurable)
- Telemetry: Record tokens, cost, latency
- Response: Return completion or raise error
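A hedged sketch of a single pass through this pipeline; the Message and TextContent types and the num_retries field mirror the wider OpenHands codebase and should be verified against the SDK source.

```python
from pydantic import SecretStr

from openhands.sdk.llm import LLM, Message, TextContent

llm = LLM(
    model="anthropic/claude-sonnet-4-20250514",
    api_key=SecretStr("your-api-key"),
    num_retries=3,  # retry stage: exponential backoff on transient failures
)

# Validation, the LiteLLM request, retries, and telemetry all happen inside
# completion(); the caller only sees the final response or a raised error.
response = llm.completion(
    messages=[Message(role="user", content=[TextContent(text="Hello!")])]
)
print(response)
```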
Responses API Support
In addition to the standard Chat Completions API, the LLM system supports OpenAI's Responses API as an alternative invocation path for models that benefit from this newer interface (e.g., GPT-5-Codex only supports the Responses API). The Responses API provides enhanced reasoning capabilities with encrypted thinking and detailed reasoning summaries.

Architecture
Supported Models
Models that automatically use the Responses API path:

| Pattern | Examples | Documentation |
|---|---|---|
| gpt-5* | gpt-5, gpt-5-mini, gpt-5-codex | OpenAI GPT-5 family |
The matching patterns are defined in model_features.py.
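As a purely hypothetical illustration (not the SDK's actual code), pattern-based routing such as the gpt-5* rule can be expressed with simple glob matching; model_features.py remains the authoritative implementation.

```python
from fnmatch import fnmatch

# Pattern from the table above; the authoritative list lives in model_features.py.
RESPONSES_API_PATTERNS = ["gpt-5*"]


def uses_responses_api(model: str) -> bool:
    """Return True when the model name should be routed to responses()."""
    # Strip an optional provider prefix such as "openai/" before matching.
    name = model.split("/")[-1].lower()
    return any(fnmatch(name, pattern) for pattern in RESPONSES_API_PATTERNS)


print(uses_responses_api("openai/gpt-5-codex"))         # True
print(uses_responses_api("anthropic/claude-sonnet-4"))  # False
```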
Provider Integration
LiteLLM Abstraction
The Agent SDK uses LiteLLM for provider abstraction.

Benefits:
- 100+ Providers: OpenAI, Anthropic, Google, Azure, AWS Bedrock, local models, etc.
- Unified API: Same interface regardless of provider
- Format Translation: Provider-specific request/response formatting
- Error Handling: Normalized error codes and messages
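Because LiteLLM normalizes providers behind one interface, switching providers amounts to changing the model string (and the matching key); the model identifiers below are illustrative examples, not endorsements of specific versions.

```python
from pydantic import SecretStr

from openhands.sdk.llm import LLM

# The same configuration model covers every provider: only the model string
# and the corresponding API key change; the calling code stays identical.
openai_llm = LLM(model="openai/gpt-5-mini", api_key=SecretStr("openai-key"))
anthropic_llm = LLM(model="anthropic/claude-sonnet-4-20250514", api_key=SecretStr("anthropic-key"))
gemini_llm = LLM(model="gemini/gemini-2.5-pro", api_key=SecretStr("google-key"))
```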
LLM Providers
Provider integrations remain shared between the Agent SDK and the OpenHands Application. The pages linked below live under the OpenHands app section but apply verbatim to SDK applications because both layers wrap the same openhands.sdk.llm.LLM interface.
| Provider / scenario | Documentation |
|---|---|
| OpenHands hosted models | /openhands/usage/llms/openhands-llms |
| OpenAI | /openhands/usage/llms/openai-llms |
| Azure OpenAI | /openhands/usage/llms/azure-llms |
| Google Gemini / Vertex | /openhands/usage/llms/google-llms |
| Groq | /openhands/usage/llms/groq |
| OpenRouter | /openhands/usage/llms/openrouter |
| Moonshot | /openhands/usage/llms/moonshot |
| LiteLLM proxy | /openhands/usage/llms/litellm-proxy |
| Local LLMs (Ollama, SGLang, vLLM, LM Studio) | /openhands/usage/llms/local-llms |
| Custom LLM configurations | /openhands/usage/llms/custom-llm-configs |
When following any of these guides from the SDK, construct the LLM object using the
documented parameters (for example, API keys, base URLs, or custom headers) and pass
it into your agent or registry. The OpenHands UI surfacing is simply a convenience
layer on top of the same configuration model.
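For instance, pointing the SDK at a LiteLLM proxy or a local OpenAI-compatible server is a matter of setting the endpoint alongside the model; base_url is assumed here to be the relevant field name, mirroring the main OpenHands configuration, and the model name is illustrative.

```python
from pydantic import SecretStr

from openhands.sdk.llm import LLM

# A local or proxied OpenAI-compatible endpoint; base_url is assumed to be
# the field that carries the endpoint (see the LLM source for exact names).
local_llm = LLM(
    model="openai/qwen3-coder",           # model name as exposed by the local server
    base_url="http://localhost:8000/v1",
    api_key=SecretStr("dummy"),
)
```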
Telemetry and Cost Tracking
Telemetry Collection
LLM requests automatically collect metrics.

Tracked Metrics:
- Token Usage: Input tokens, output tokens, total
- Cost: Per-request cost using configured rates
- Latency: Request duration in milliseconds
- Errors: Failure types and retry counts
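A hedged sketch of reading the collected metrics after a request; the metrics attribute and its fields (accumulated_cost, accumulated_token_usage) mirror the main OpenHands Metrics object and are assumptions to verify against the SDK source.

```python
from pydantic import SecretStr

from openhands.sdk.llm import LLM, Message, TextContent

llm = LLM(model="openai/gpt-5-mini", api_key=SecretStr("your-api-key"))
llm.completion(messages=[Message(role="user", content=[TextContent(text="Hi")])])

# Attribute names below are assumptions based on the main OpenHands metrics
# object; verify against the SDK's telemetry source.
print("accumulated cost (USD):", llm.metrics.accumulated_cost)
print("accumulated token usage:", llm.metrics.accumulated_token_usage)
```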
Cost Configuration
Configure per-token costs for custom models. This is useful for:
- Internal models
- Custom pricing agreements
- Cost estimation for budgeting
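A hedged sketch of supplying custom rates, e.g. for an internal model LiteLLM has no pricing data for; input_cost_per_token and output_cost_per_token follow the naming used in the wider OpenHands configuration and are assumptions here, as are the internal model name and endpoint.

```python
from pydantic import SecretStr

from openhands.sdk.llm import LLM

# Custom per-token pricing for a model LiteLLM has no rates for; the cost
# stage then computes input_tokens * input_rate + output_tokens * output_rate.
internal_llm = LLM(
    model="openai/internal-coder-v1",            # hypothetical internal model
    base_url="https://llm.internal.example/v1",  # hypothetical endpoint
    api_key=SecretStr("internal-key"),
    input_cost_per_token=0.000002,
    output_cost_per_token=0.000006,
)
```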
Component Relationships
How LLM Integrates
Relationship Characteristics:
- Agent → LLM: Agent uses LLM for reasoning and tool calls
- LLM → Events: LLM requests/responses recorded as events
- Security → LLM: Optional security analyzer can use separate LLM
- Condenser → LLM: Optional context condenser can use separate LLM
- Configuration: LLM configured independently, passed to agent
- Telemetry: LLM metrics flow through event system to UI/logging
See Also
- Agent Architecture - How agents use LLMs to reason and perform actions
- Events - LLM request/response event types
- Security - Optional LLM-based security analysis
- Provider Setup Guides - Provider-specific configuration

