class ImageContent

Bases: BaseContent

Properties

  • cache_prompt: bool
  • image_urls: list[str]
  • model_config: ClassVar[ConfigDict] Pydantic model configuration; a dictionary conforming to pydantic's ConfigDict.
  • type: Literal['image']

Methods

to_llm_dict()

Convert to LLM API format.
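
A minimal usage sketch (assuming ImageContent is exported from openhands.sdk; the URL is illustrative):

>>> from openhands.sdk import ImageContent
>>> img = ImageContent(image_urls=["https://example.com/screenshot.png"])
>>> payload = img.to_llm_dict()  # provider-ready content dict for the LLM API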

class LLM

Bases: BaseModel, RetryMixin, NonNativeToolCallingMixin

Language model interface for OpenHands agents. The LLM class provides a unified interface for interacting with various language models through the litellm library. It handles model configuration, API authentication, retry logic, and tool calling capabilities.

Example

>>> from openhands.sdk import LLM
>>> from pydantic import SecretStr
>>> llm = LLM(
...     model="claude-sonnet-4-20250514",
...     api_key=SecretStr("your-api-key"),
...     usage_id="my-agent"
... )
>>> # Use with agent or conversation

Properties

  • OVERRIDE_ON_SERIALIZE: tuple[str, ...]
  • api_key: str | SecretStr | None
  • api_version: str | None
  • aws_access_key_id: str | SecretStr | None
  • aws_region_name: str | None
  • aws_secret_access_key: str | SecretStr | None
  • base_url: str | None
  • caching_prompt: bool
  • custom_llm_provider: str | None
  • custom_tokenizer: str | None
  • disable_stop_word: bool | None
  • disable_vision: bool | None
  • drop_params: bool
  • enable_encrypted_reasoning: bool
  • extended_thinking_budget: int | None
  • extra_headers: dict[str, str] | None
  • input_cost_per_token: float | None
  • litellm_extra_body: dict[str, Any]
  • log_completions: bool
  • log_completions_folder: str
  • max_input_tokens: int | None
  • max_message_chars: int
  • max_output_tokens: int | None
  • metrics: Metrics Get usage metrics for this LLM instance.
    • Returns: Metrics object containing token usage, costs, and other statistics.
  • model: str
  • model_config: ClassVar[ConfigDict] Pydantic model configuration; a dictionary conforming to pydantic's ConfigDict.
  • model_info: dict | None Returns the model info dictionary.
  • modify_params: bool
  • native_tool_calling: bool
  • num_retries: int
  • ollama_base_url: str | None
  • openrouter_app_name: str
  • openrouter_site_url: str
  • output_cost_per_token: float | None
  • reasoning_effort: Literal['low', 'medium', 'high', 'none'] | None
  • reasoning_summary: Literal['auto', 'concise', 'detailed'] | None
  • retry_listener: SkipJsonSchema[Callable[[int, int], None] | None]
  • retry_max_wait: int
  • retry_min_wait: int
  • retry_multiplier: float
  • safety_settings: list[dict[str, str]] | None
  • seed: int | None
  • service_id: str
  • temperature: float | None
  • timeout: int | None
  • top_k: float | None
  • top_p: float | None
  • usage_id: str

Methods

completion()

Generate a completion from the language model. This is the primary method for getting responses from the model via the Completions API. It handles message formatting, tool calling, and response processing.
  • Returns: LLMResponse containing the model’s response and metadata.
  • Raises: ValueError – If streaming is requested (not supported).
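
A hedged usage sketch (the messages keyword is an assumption; llm is the instance from the example above, and Message/TextContent are documented below):

>>> from openhands.sdk import Message, TextContent
>>> response = llm.completion(
...     messages=[Message(role="user", content=[TextContent(text="Hello")])]
... )
>>> response.message   # OpenHands-native Message
>>> response.metrics   # MetricsSnapshot for this call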

format_messages_for_llm()

Formats Message objects for LLM consumption.

format_messages_for_responses()

Prepare (instructions, input[]) for the OpenAI Responses API.
  • Skips prompt caching flags and string serializer concerns
  • Uses Message.to_responses_value to get either instructions (system) or input items (others)
  • Concatenates system instructions into a single instructions string
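
A sketch of the expected shape (the exact signature is an assumption based on the description above):

>>> instructions, input_items = llm.format_messages_for_responses(messages)
>>> # instructions: concatenated system text; input_items: Responses API input list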

get_token_count()

is_caching_prompt_active()

Check if prompt caching is supported and enabled for the current model.
  • Returns: True if prompt caching is supported and enabled for the given model.
  • Return type: bool

classmethod load_from_env()

classmethod load_from_json()

model_post_init()

This function is meant to behave like a BaseModel method to initialise private attributes. It takes context as an argument since that’s what pydantic-core passes when calling it.
  • Parameters:
    • self – The BaseModel instance.
    • context – The context.

resolve_diff_from_deserialized()

Resolve differences between a deserialized LLM and the current instance. Fields like api_key are serialized to a masked placeholder (e.g. "**********") in dumps, so when loading from a file we still want to use the runtime-provided api_key from the self instance. Returns a new LLM instance equivalent to the persisted one, but with explicitly whitelisted fields (e.g. api_key) taken from self.
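
A round-trip sketch (the path argument to load_from_json is an assumption; see load_from_json above):

>>> persisted = LLM.load_from_json("llm.json")        # api_key is masked in the file
>>> restored = llm.resolve_diff_from_deserialized(persisted)
>>> assert restored.api_key == llm.api_key            # runtime secret wins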

responses()

Alternative invocation path using the OpenAI Responses API via LiteLLM. Maps Message[] -> (instructions, input[]) and returns an LLMResponse. Non-streaming only for v1.

restore_metrics()

uses_responses_api()

Whether this model uses the OpenAI Responses API path.

vision_is_active()

class LLMRegistry

Bases: object

A minimal LLM registry for managing LLM instances by usage ID. This registry provides a simple way to manage multiple LLM instances, avoiding the need to recreate LLMs with the same configuration.
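
Example

A minimal sketch (reusing the llm instance from the LLM example above, whose usage_id is "my-agent"):

>>> from openhands.sdk import LLMRegistry
>>> registry = LLMRegistry()
>>> registry.add(llm)
>>> same_llm = registry.get("my-agent")  # raises KeyError for unknown usage IDs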

Properties

  • registry_id: str
  • retry_listener: Callable[[int, int], None] | None
  • service_to_llm: dict[str, LLM]
  • subscriber: Callable[[RegistryEvent], None] | None
  • usage_to_llm: dict[str, LLM] Access the internal usage-ID-to-LLM mapping.

Methods

__init__()

Initialize the LLM registry.
  • Parameters: retry_listener – Optional callback for retry events.

add()

Add an LLM instance to the registry.
  • Parameters: llm – The LLM instance to register.
  • Raises: ValueError – If llm.usage_id already exists in the registry.

get()

Get an LLM instance from the registry.
  • Parameters: usage_id – Unique identifier for the LLM usage slot.
  • Returns: The LLM instance.
  • Raises: KeyError – If usage_id is not found in the registry.

list_services()

Deprecated alias for list_usage_ids().

list_usage_ids()

List all registered usage IDs.

notify()

Notify subscribers of registry events.
  • Parameters: event – The registry event to notify about.

subscribe()

Subscribe to registry events.
  • Parameters: callback – Function to call when LLMs are created or updated.

class LLMResponse

Bases: BaseModel

Result of an LLM completion request. This type provides a clean interface for LLM completion results, exposing only OpenHands-native types to consumers while preserving access to the raw LiteLLM response for internal use.

Properties

  • id: str Get the response ID from the underlying LLM response. This property provides a clean interface to access the response ID, supporting both completion mode (ModelResponse) and response API modes (ResponsesAPIResponse).
    • Returns: The response ID from the LLM response
  • message: Message The completion message converted to the OpenHands Message type.
  • metrics: MetricsSnapshot Snapshot of metrics from the completion request.
  • model_config: ClassVar[ConfigDict] Pydantic model configuration; a dictionary conforming to pydantic's ConfigDict.
  • raw_response: ModelResponse | ResponsesAPIResponse The original LiteLLM response (litellm.types.utils.ModelResponse or litellm.types.llms.openai.ResponsesAPIResponse), retained for internal use.

class Message

Bases: BaseModel
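
Example

A minimal sketch (assuming Message and TextContent are exported from openhands.sdk, and that to_chat_dict() takes no arguments):

>>> from openhands.sdk import Message, TextContent
>>> msg = Message(role="user", content=[TextContent(text="What does this repo do?")])
>>> chat_payload = msg.to_chat_dict()  # Chat Completions message dict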

Properties

  • cache_enabled: bool
  • contains_image: bool
  • content: Sequence[TextContent | ImageContent]
  • force_string_serializer: bool
  • function_calling_enabled: bool
  • model_config: ClassVar[ConfigDict] Pydantic model configuration; a dictionary conforming to pydantic's ConfigDict.
  • name: str | None
  • reasoning_content: str | None
  • responses_reasoning_item: ReasoningItemModel | None
  • role: Literal['user', 'system', 'assistant', 'tool']
  • thinking_blocks: Sequence[ThinkingBlock | RedactedThinkingBlock]
  • tool_call_id: str | None
  • tool_calls: list[MessageToolCall] | None
  • vision_enabled: bool

Methods

classmethod from_llm_chat_message()

Convert a LiteLLMMessage (Chat Completions) to our Message class. Provider-agnostic mapping for reasoning:
  • Prefer message.reasoning_content if present (LiteLLM normalized field)
  • Extract thinking_blocks from content array (Anthropic-specific)

classmethod from_llm_responses_output()

Convert OpenAI Responses API output items into a single assistant Message. Policy (non-stream):
  • Collect assistant text by concatenating output_text parts from message items
  • Normalize function_call items to MessageToolCall list

to_chat_dict()

Serialize message for OpenAI Chat Completions. Chooses the appropriate content serializer and then injects threading keys:
  • Assistant tool call turn: role == "assistant" and self.tool_calls
  • Tool result turn: role == "tool" and self.tool_call_id (with name)

to_responses_dict()

Serialize message for OpenAI Responses (input parameter). Produces a list of "input" items for the Responses API:
  • system: returns []; system content is expected in 'instructions'
  • user: one 'message' item with content parts -> input_text / input_image (when vision enabled)
  • assistant: emits prior assistant content as input_text, and function_call items for tool_calls
  • tool: emits function_call_output items (one per TextContent) with matching call_id

to_responses_value()

Return serialized form. Either an instructions string (for system) or input items (for other roles).

class MessageToolCall

Bases: BaseModel

Transport-agnostic tool call representation. One canonical id is used for linking across actions/observations and for the Responses function_call_output call_id.
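
A construction sketch using the documented fields (values are illustrative; the import path is an assumption):

>>> from openhands.sdk.llm import MessageToolCall
>>> call = MessageToolCall(
...     id="call_abc123",
...     name="execute_bash",
...     arguments='{"command": "ls"}',  # arguments travel as a JSON string
...     origin="completion",
... )
>>> call.to_chat_dict()  # Chat Completions tool_calls entry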

Properties

  • arguments: str
  • id: str
  • model_config: ClassVar[ConfigDict] Pydantic model configuration; a dictionary conforming to pydantic's ConfigDict.
  • name: str
  • origin: Literal['completion', 'responses']

Methods

classmethod from_chat_tool_call()

Create a MessageToolCall from a Chat Completions tool call.

classmethod from_responses_function_call()

Create a MessageToolCall from a typed OpenAI Responses function_call item. Note: OpenAI Responses function_call.arguments is already a JSON string.

to_chat_dict()

Serialize to OpenAI Chat Completions tool_calls format.

to_responses_dict()

Serialize to the OpenAI Responses 'function_call' input item format.

class Metrics

Bases: MetricsSnapshot

The Metrics class records various metrics during runs and evaluation. We track:
  • accumulated_cost and costs
  • max_budget_per_task (budget limit)
  • A list of ResponseLatency
  • A list of TokenUsage (one per call).

Properties

  • costs: list[Cost]
  • model_config: ClassVar[ConfigDict] Pydantic model configuration; a dictionary conforming to pydantic's ConfigDict.
  • response_latencies: list[ResponseLatency]
  • token_usages: list[TokenUsage]

Methods

add_cost()

add_response_latency()

add_token_usage()

Add a single usage record.

deep_copy()

Create a deep copy of the Metrics object.

diff()

Calculate the difference between current metrics and a baseline. This is useful for tracking metrics for specific operations like delegates.
  • Parameters: baseline – A metrics object representing the baseline state
  • Returns: A new Metrics object containing only the differences since the baseline
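
A sketch of scoping metrics to a single operation (llm.metrics and deep_copy are documented above):

>>> baseline = llm.metrics.deep_copy()
>>> # ... run a delegate or a single completion ...
>>> delta = llm.metrics.diff(baseline)  # only the cost/tokens incurred since baseline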

get()

Return the metrics in a dictionary.

get_snapshot()

Get a snapshot of the current metrics without the detailed lists.

initialize_accumulated_token_usage()

log()

Log the metrics.

merge()

Merge ‘other’ metrics into this one.

classmethod validate_accumulated_cost()

class MetricsSnapshot

Bases: BaseModel

A snapshot of metrics at a point in time. Does not include the lists of individual costs, latencies, or token usages.

Properties

  • accumulated_cost: float
  • accumulated_token_usage: TokenUsage | None
  • max_budget_per_task: float | None
  • model_config: ClassVar[ConfigDict] Pydantic model configuration; a dictionary conforming to pydantic's ConfigDict.
  • model_name: str

class ReasoningItemModel

Bases: BaseModel

OpenAI Responses reasoning item (non-stream; the subset we consume). Do not log or render encrypted_content.

Properties

  • content: list[str] | None
  • encrypted_content: str | None
  • id: str | None
  • model_config: ClassVar[ConfigDict] Pydantic model configuration; a dictionary conforming to pydantic's ConfigDict.
  • status: str | None
  • summary: list[str]

class RedactedThinkingBlock

Bases: BaseModel

Redacted thinking block for previous responses without extended thinking. This is used as a placeholder for assistant messages that were generated before extended thinking was enabled.

Properties

  • data: str
  • model_config: ClassVar[ConfigDict] Pydantic model configuration; a dictionary conforming to pydantic's ConfigDict.
  • type: Literal['redacted_thinking']

class RegistryEvent

Bases: BaseModel

Properties

  • llm: LLM
  • model_config: ClassVar[ConfigDict] Pydantic model configuration; a dictionary conforming to pydantic's ConfigDict.

class RouterLLM

Bases: LLM

Base class for multiple LLMs acting as a unified LLM. This class provides a foundation for implementing model routing by inheriting from LLM, allowing routers to work with multiple underlying LLM models while presenting a unified LLM interface to consumers. Key features:
  • Works with multiple LLMs configured via llms_for_routing
  • Delegates all other operations/properties to the selected LLM
  • Provides a routing interface through the select_llm() method

Properties

  • active_llm: LLM | None
  • llms_for_routing: dict[str, LLM]
  • model_config: ClassVar[ConfigDict] Pydantic model configuration; a dictionary conforming to pydantic's ConfigDict.
  • router_name: str

Methods

completion()

This method intercepts completion calls and routes them to the appropriate underlying LLM based on the routing logic implemented in select_llm().

model_post_init()

This function is meant to behave like a BaseModel method to initialise private attributes. It takes context as an argument since that’s what pydantic-core passes when calling it.
  • Parameters:
    • self – The BaseModel instance.
    • context – The context.

abstractmethod select_llm()

Select which LLM to use based on messages and events. This method implements the core routing logic for the RouterLLM. Subclasses should analyze the provided messages to determine which LLM from llms_for_routing is most appropriate for handling the request.
  • Parameters: messages – List of messages in the conversation that can be used to inform the routing decision.
  • Returns: The key/name of the LLM to use from llms_for_routing dictionary.
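
A hypothetical subclass sketch (the router name, routing keys, and the cheap_llm/strong_llm instances are illustrative):

>>> class CheapFirstRouter(RouterLLM):
...     router_name: str = "cheap_first"
...
...     def select_llm(self, messages):
...         # Illustrative routing logic: escalate long conversations
...         return "strong" if len(messages) > 10 else "cheap"
>>> router = CheapFirstRouter(
...     llms_for_routing={"cheap": cheap_llm, "strong": strong_llm}
... )
>>> # router.completion(...) now delegates to the selected underlying LLM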

classmethod set_placeholder_model()

Guarantee model exists before LLM base validation runs.

classmethod validate_llms_not_empty()

class TextContent

Bases: BaseContent

Properties

  • cache_prompt: bool
  • model_config: ClassVar[ConfigDict] Pydantic model configuration; a dictionary conforming to pydantic's ConfigDict.
  • text: str
  • type: Literal['text']

Methods

to_llm_dict()

Convert to LLM API format.

class ThinkingBlock

Bases: BaseModel

Anthropic thinking block for the extended thinking feature. This represents the raw thinking blocks returned by Anthropic models when extended thinking is enabled. These blocks must be preserved and passed back to the API in tool use scenarios.

Properties

  • model_config: ClassVar[ConfigDict] Pydantic model configuration; a dictionary conforming to pydantic's ConfigDict.
  • signature: str
  • thinking: str
  • type: Literal['thinking']