OpenAI-Compatible Endpoint

The agent-server exposes an OpenAI-compatible /v1/chat/completions endpoint so clients that already speak the OpenAI protocol can call an OpenHands agent. Use this when you want an existing chat UI, IDE integration, evaluation harness, voice platform, or another agent to treat OpenHands as an OpenAI-style backend while still getting the full agent runtime behind the request.

What to Configure

Most OpenAI-compatible clients ask for the same three fields:

Client Field	Value
Base URL	`https://YOUR_AGENT_SERVER/v1`
API key	Your agent-server session API key
Model	`openhands_<profile_name>`

For example, a saved LLM profile named gateway_demo appears as the OpenAI model openhands_gateway_demo. The gateway accepts the same session key in either OpenHands or OpenAI-compatible form:

X-Session-API-Key: <key>
Authorization: Bearer <key>

Prepare a Profile

OpenAI-compatible traffic is backed by an agent-server LLM profile. Create one with the native profile API first:

export AGENT_SERVER_URL="http://localhost:8000"
export SESSION_API_KEY="your-session-api-key"
export PROFILE_NAME="gateway_demo"
export OPENHANDS_MODEL="openhands_${PROFILE_NAME}"

curl -X POST "$AGENT_SERVER_URL/api/profiles/$PROFILE_NAME" \
  -H "X-Session-API-Key: $SESSION_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "llm": {
      "model": "gpt-5-nano",
      "api_key": "YOUR_LLM_API_KEY"
    },
    "include_secrets": true
  }'

Then confirm the profile is visible to OpenAI clients:

curl "$AGENT_SERVER_URL/v1/models" \
  -H "Authorization: Bearer $SESSION_API_KEY"

Client Recipes

curl
Python SDK
JavaScript SDK
Chat UIs
Voice or Webhook

curl -i "$AGENT_SERVER_URL/v1/chat/completions" \
  -H "Authorization: Bearer $SESSION_API_KEY" \
  -H "Content-Type: application/json" \
  -d "{
    \"model\": \"$OPENHANDS_MODEL\",
    \"messages\": [
      {
        \"role\": \"system\",
        \"content\": \"Answer directly unless you need to inspect files.\"
      },
      {
        \"role\": \"user\",
        \"content\": \"Explain what this OpenHands endpoint does in one sentence.\"
      }
    ]
  }"

The response includes X-OpenHands-ServerConversation-ID. Save that header if you want a later request to continue the same agent conversation.

import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["SESSION_API_KEY"],
    base_url=f"{os.environ['AGENT_SERVER_URL']}/v1",
)

response = client.chat.completions.with_raw_response.create(
    model=os.environ["OPENHANDS_MODEL"],
    messages=[
        {"role": "user", "content": "Summarize this repository."},
    ],
)
completion = response.parse()
conversation_id = response.headers["X-OpenHands-ServerConversation-ID"]
print(completion.choices[0].message.content)

follow_up = client.chat.completions.create(
    model=os.environ["OPENHANDS_MODEL"],
    messages=[{"role": "user", "content": "Now list the main packages."}],
    extra_headers={"X-OpenHands-ServerConversation-ID": conversation_id},
)
print(follow_up.choices[0].message.content)

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.SESSION_API_KEY,
  baseURL: `${process.env.AGENT_SERVER_URL}/v1`,
});

const first = await client.chat.completions
  .create({
    model: process.env.OPENHANDS_MODEL,
    messages: [
      { role: "user", content: "Summarize this repository." },
    ],
  })
  .withResponse();

const conversationId = first.response.headers.get(
  "x-openhands-serverconversation-id",
);
console.log(first.data.choices[0].message.content);

const followUp = await client.chat.completions.create(
  {
    model: process.env.OPENHANDS_MODEL,
    messages: [{ role: "user", content: "Now list the main packages." }],
  },
  {
    headers: { "X-OpenHands-ServerConversation-ID": conversationId },
  },
);
console.log(followUp.choices[0].message.content);

For Open WebUI, LibreChat, Chatbot UI, and similar OpenAI-compatible frontends, configure a custom OpenAI provider with:

Base URL: https://YOUR_AGENT_SERVER/v1
API key: your agent-server session API key
Model: openhands_<profile_name>
Streaming: disabled for now

If the UI can store a response header and send a custom request header, persist X-OpenHands-ServerConversation-ID per chat thread and send it on follow-up turns. If it cannot, each request starts a new OpenHands conversation and works best for one-shot tasks.

Voice platforms and webhook integrations usually have their own session or call ID. Store a mapping from that external ID to the OpenHands conversation ID:

import os

# Initialize this once at app startup, or replace it with durable session storage.
conversation_ids: dict[str, str] = {}

conversation_id = conversation_ids.get(platform_session_id)
headers = {}
if conversation_id:
    headers["X-OpenHands-ServerConversation-ID"] = conversation_id

response = client.chat.completions.with_raw_response.create(
    model=os.environ.get("OPENHANDS_MODEL", "openhands_gateway_demo"),
    messages=[{"role": "user", "content": transcript_text}],
    extra_headers=headers,
)

conversation_ids[platform_session_id] = response.headers[
    "X-OpenHands-ServerConversation-ID"
]
reply_text = response.parse().choices[0].message.content

Return reply_text to the voice or webhook platform. Keep the mapping for as long as that external session should continue.

Conversation State

The OpenAI Chat Completions protocol usually sends full message history on every request. The OpenHands gateway does not reconstruct agent history from prior assistant messages. Instead:

Omit X-OpenHands-ServerConversation-ID to start a new OpenHands conversation.
Read X-OpenHands-ServerConversation-ID from the response.
Send that header on follow-up requests to continue the same OpenHands conversation.

When reusing a conversation, send the newest user turn in messages. The server-side OpenHands conversation owns the previous agent state, tool activity, and workspace context.

Current Limitations

Only non-streaming Chat Completions requests are supported. Requests with stream: true return 400 until streaming support is added.
The response contains the final assistant text only. Internal OpenHands tool activity is not exposed as OpenAI tool calls.
OpenAI request fields that are not needed by the gateway are ignored or rejected intentionally by the server implementation.

Ready-to-run example

This example is available on GitHub: examples/02_remote_agent_server/15_openai_compatible_gateway.py

examples/02_remote_agent_server/15_openai_compatible_gateway.py

"""Use the agent-server through an OpenAI-compatible Chat Completions client.

This example starts a local agent-server, stores an LLM profile, lists it through
``GET /v1/models``, then calls ``POST /v1/chat/completions`` with the OpenAI
Python SDK. The returned ``X-OpenHands-ServerConversation-ID`` header is passed
back on a second call to continue the same OpenHands conversation.
"""

import os
from uuid import UUID

import httpx
from openai import OpenAI
from scripts.utils import ManagedAPIServer


# The gateway runs a full OpenHands agent, but OpenAI clients still need a
# normal model-like name. We create an LLM profile below and expose it as
# `openhands_<profile_name>` through `/v1/models`.

api_key = os.getenv("LLM_API_KEY") or os.getenv("OPENAI_API_KEY")
assert api_key is not None, "Set LLM_API_KEY or OPENAI_API_KEY."

llm_model = os.getenv("LLM_MODEL", "gpt-5-nano")
llm_base_url = os.getenv("LLM_BASE_URL")
profile_name = "gateway_demo"
gateway_model = f"openhands_{profile_name}"

# Start a local agent-server for the demo. `use_session_api_key=True` turns on
# authentication; the same key works as both `X-Session-API-Key` for native
# agent-server routes and `Authorization: Bearer ...` for OpenAI SDK calls.

with ManagedAPIServer(
    port=8770,
    use_session_api_key=True,
    extra_env={
        "OH_ENABLE_VNC": "0",
        "OH_ENABLE_VSCODE": "0",
        "OH_PRELOAD_TOOLS": "0",
        "OH_SECRET_KEY": "example-secret-key-for-demo-only-32b",
        "OH_WEBHOOKS": "[]",
    },
    health_request_timeout=2.0,
) as server:
    session_api_key = (
        os.getenv("SESSION_API_KEY")
        or os.getenv("OH_SESSION_API_KEYS_0")
        or server.session_api_key
    )
    assert session_api_key is not None

    # Use the native REST API once to create the profile that backs the gateway
    # model. After that, normal OpenAI SDK calls are enough for chat traffic.
    api_client = httpx.Client(
        base_url=server.base_url,
        headers={"X-Session-API-Key": session_api_key},
        timeout=120.0,
    )
    openai_client = OpenAI(
        api_key=session_api_key,
        base_url=f"{server.base_url}/v1",
        timeout=120.0,
    )

    llm_config = {"model": llm_model, "api_key": api_key}
    if llm_base_url:
        llm_config["base_url"] = llm_base_url

    # `gateway_demo` becomes visible to OpenAI clients as `openhands_gateway_demo`.
    profile_response = api_client.post(
        f"/api/profiles/{profile_name}",
        json={"llm": llm_config, "include_secrets": True},
    )
    assert profile_response.status_code == 201, profile_response.text

    models = openai_client.models.list()
    model_ids = [model.id for model in models.data]
    assert gateway_model in model_ids
    print(f"Gateway models include: {gateway_model}")

    # Ask through the OpenAI SDK. `with_raw_response` lets us read the custom
    # response header that identifies the OpenHands conversation created behind
    # this otherwise OpenAI-shaped request.

    first_response = openai_client.chat.completions.with_raw_response.create(
        model=gateway_model,
        messages=[
            {
                "role": "system",
                "content": "Answer directly and do not use tools.",
            },
            {
                "role": "user",
                "content": (
                    "In one sentence, explain what an OpenAI-compatible "
                    "agent-server gateway does."
                ),
            },
        ],
    )
    first_completion = first_response.parse()
    conversation_id = first_response.headers.get("X-OpenHands-ServerConversation-ID")
    assert conversation_id is not None
    UUID(conversation_id)

    first_answer = first_completion.choices[0].message.content
    print(f"First answer: {first_answer}")
    print(f"OpenHands conversation ID: {conversation_id}")

    persisted_response = api_client.get(f"/api/conversations/{conversation_id}")
    assert persisted_response.status_code == 200, persisted_response.text

    # The gateway keeps conversations by default. Passing the header back lets
    # another OpenAI-compatible request continue the same server-side agent
    # conversation instead of starting over.

    second_completion = openai_client.chat.completions.create(
        model=gateway_model,
        messages=[
            {
                "role": "user",
                "content": "Now answer in five words or fewer: what did I ask about?",
            }
        ],
        extra_headers={"X-OpenHands-ServerConversation-ID": conversation_id},
    )
    second_answer = second_completion.choices[0].message.content
    print(f"Second answer using same conversation: {second_answer}")

    conversation_response = api_client.get(f"/api/conversations/{conversation_id}")
    assert conversation_response.status_code == 200, conversation_response.text
    stats = conversation_response.json().get("stats") or {}
    usage_to_metrics = stats.get("usage_to_metrics") or {}
    accumulated_cost = sum(
        metrics.get("accumulated_cost", 0.0) for metrics in usage_to_metrics.values()
    )

    # Clean up the demo resources. Real applications can keep the conversation
    # ID and inspect it later through the native agent-server API.
    api_client.delete(f"/api/conversations/{conversation_id}")
    api_client.delete(f"/api/profiles/{profile_name}")
    api_client.close()

    print(f"EXAMPLE_COST: {accumulated_cost}")

You can run the example code as-is.

The model name should follow the LiteLLM convention: provider/model_name (e.g., anthropic/claude-sonnet-4-5-20250929, openai/gpt-4o). The LLM_API_KEY should be the API key for your chosen provider.

ChatGPT Plus/Pro subscribers: You can use LLM.subscription_login() to authenticate with your ChatGPT account and access Codex models without consuming API credits. See the LLM Subscriptions guide for details.

Guides

Architecture

API Reference

OpenAI-Compatible Endpoint

What to Configure

Prepare a Profile

Client Recipes

Conversation State

Current Limitations

Ready-to-run example

​What to Configure

​Prepare a Profile

​Client Recipes

​Conversation State

​Current Limitations

​Ready-to-run example

What to Configure

Prepare a Profile

Client Recipes

Conversation State

Current Limitations

Ready-to-run example