This example is available on GitHub: examples/01_standalone_sdk/15_browser_use.py
The BrowserToolSet integration lets your agent interact with web pages through automated browser control. Built on top of browser-use, it can navigate websites, click elements, fill in forms, and extract content, all driven by natural language instructions.
examples/01_standalone_sdk/15_browser_use.py
import os

from pydantic import SecretStr

from openhands.sdk import (
    LLM,
    Agent,
    Conversation,
    Event,
    LLMConvertibleEvent,
    get_logger,
)
from openhands.sdk.tool import Tool, register_tool
from openhands.tools.browser_use import BrowserToolSet
from openhands.tools.execute_bash import BashTool
from openhands.tools.file_editor import FileEditorTool


logger = get_logger(__name__)

# Configure LLM
api_key = os.getenv("LLM_API_KEY")
assert api_key is not None, "LLM_API_KEY environment variable is not set."
model = os.getenv("LLM_MODEL", "openhands/claude-sonnet-4-5-20250929")
base_url = os.getenv("LLM_BASE_URL")
llm = LLM(
    usage_id="agent",
    model=model,
    base_url=base_url,
    api_key=SecretStr(api_key),
)

# Tools
cwd = os.getcwd()
register_tool("BashTool", BashTool)
register_tool("FileEditorTool", FileEditorTool)
register_tool("BrowserToolSet", BrowserToolSet)
tools = [
    Tool(
        name="BashTool",
    ),
    Tool(name="FileEditorTool"),
    Tool(name="BrowserToolSet"),
]

# If you need fine-grained browser control, you can manually register individual browser
# tools by creating a BrowserToolExecutor and providing factories that return customized
# Tool instances before constructing the Agent.

# Agent
agent = Agent(llm=llm, tools=tools)

llm_messages = []  # collect raw LLM messages


def conversation_callback(event: Event):
    if isinstance(event, LLMConvertibleEvent):
        llm_messages.append(event.to_llm_message())


conversation = Conversation(
    agent=agent, callbacks=[conversation_callback], workspace=cwd
)

conversation.send_message(
    "Could you go to https://openhands.dev/ blog page and summarize main "
    "points of the latest blog?"
)
conversation.run()


print("=" * 100)
print("Conversation finished. Got the following LLM messages:")
for i, message in enumerate(llm_messages):
    print(f"Message {i}: {str(message)[:200]}")
Running the Example
export LLM_API_KEY="your-api-key"
cd agent-sdk
uv run python examples/01_standalone_sdk/15_browser_use.py
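
Besides LLM_API_KEY, the script also reads two optional variables, LLM_MODEL and LLM_BASE_URL. The following is a minimal sketch of a pre-flight check that mirrors those lookups using only the standard library. The helper name `read_llm_settings` and the `DEFAULT_MODEL` constant are hypothetical and not part of the SDK; the default model string is copied from the example script.

```python
import os
from typing import Mapping, Optional

# Default copied from the example script; override it with LLM_MODEL.
DEFAULT_MODEL = "openhands/claude-sonnet-4-5-20250929"


def read_llm_settings(environ: Optional[Mapping[str, str]] = None) -> dict:
    """Hypothetical helper: collect the variables the example script reads.

    Raises RuntimeError when LLM_API_KEY is missing or empty. This is slightly
    stricter than the script's assert, which only rejects an unset variable.
    """
    env = os.environ if environ is None else environ
    api_key = env.get("LLM_API_KEY")
    if not api_key:
        raise RuntimeError("LLM_API_KEY environment variable is not set.")
    return {
        "api_key": api_key,
        "model": env.get("LLM_MODEL", DEFAULT_MODEL),
        "base_url": env.get("LLM_BASE_URL"),  # None when unset
    }


if __name__ == "__main__":
    try:
        settings = read_llm_settings()
    except RuntimeError as exc:
        print(f"Not ready: {exc}")
    else:
        # Never print the API key itself.
        print(f"Model: {settings['model']}")
        print(f"Base URL: {settings['base_url'] or '(not set)'}")
```

Running a check like this before the example makes a missing key fail with a clear message instead of partway through agent setup.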

How It Works

The example combines three tools to build a web research agent:
  1. BrowserToolSet: Provides automated browser control for web interaction
  2. FileEditorTool: Allows the agent to read and write files if needed
  3. BashTool: Enables command-line operations for additional functionality
The agent uses these tools to:
  • Navigate to specified URLs
  • Interact with web page elements (clicking, scrolling, etc.)
  • Extract and analyze content from web pages
  • Summarize information from multiple sources
In this example, the agent visits the openhands.dev blog, finds the latest blog post, and provides a summary of its main points.
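
The script also records every LLM-convertible event through a conversation callback and prints the first 200 characters of each collected message. The sketch below reproduces that collect-then-preview pattern without the SDK, so it can be run on its own. `StubEvent`, `make_collector`, and `preview` are hypothetical stand-ins for illustration only; in the real script the filter is `isinstance(event, LLMConvertibleEvent)` and the stored value is `event.to_llm_message()`.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StubEvent:
    """Hypothetical stand-in for an SDK event; not part of openhands.sdk."""

    text: str
    convertible: bool = True  # plays the role of the isinstance() check


def make_collector(sink: List[str]) -> Callable[[StubEvent], None]:
    """Return a callback that appends convertible events to `sink`."""

    def callback(event: StubEvent) -> None:
        if event.convertible:
            sink.append(event.text)

    return callback


def preview(messages: List[str], width: int = 200) -> List[str]:
    """Format each message the way the script prints it, truncated to `width`."""
    return [f"Message {i}: {str(m)[:width]}" for i, m in enumerate(messages)]


collected: List[str] = []
on_event = make_collector(collected)

# Simulate a short run: two convertible events and one that is skipped.
for event in (
    StubEvent("user: summarize the latest blog post"),
    StubEvent("internal state update", convertible=False),
    StubEvent("assistant: " + "x" * 500),
):
    on_event(event)

for line in preview(collected):
    print(line)
```

Keeping the callback limited to appending keeps it cheap, and truncating only at print time leaves the full messages available for later inspection.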

Customization

For advanced use cases that need only a subset of browser tools or a custom configuration, you can register individual browser tools manually. See the BrowserToolSet definition for the list of available tools, then create a BrowserToolExecutor with your customized tool configurations before constructing the Agent. This gives you fine-grained control over which browser capabilities the agent can use.

Next Steps