What is a Context Condenser?

A context condenser addresses one of the most persistent challenges in AI agent development: keeping conversation context manageable. As a conversation with an AI agent grows longer, the accumulated history leads to:
  • 💰 Increased API Costs: More tokens in the context means higher costs per API call
  • ⏱️ Slower Response Times: Larger contexts take longer to process
  • 📉 Reduced Effectiveness: LLMs become less effective when dealing with excessive irrelevant information
The context condenser solves this by intelligently summarizing older parts of the conversation while preserving essential information needed for the agent to continue working effectively.

Default Implementation: LLMSummarizingCondenser

OpenHands SDK provides LLMSummarizingCondenser as the default condenser implementation. This condenser uses an LLM to generate summaries of conversation history when it exceeds the configured size limit.

How It Works

When conversation history exceeds a defined threshold, the LLM-based condenser:
  1. Keeps recent messages intact - The most recent exchanges remain unchanged for immediate context
  2. Preserves key information - Important details like user goals, technical specifications, and critical files are retained
  3. Summarizes older content - Earlier parts of the conversation are condensed into concise, LLM-generated summaries
  4. Maintains continuity - The agent retains awareness of past progress without processing every historical interaction
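Concretely, with a size limit of 10 and the first 2 events pinned (the max_size and keep_first settings used in the example below), the history the agent sees after condensation consists of a pinned head, a single summary, and the most recent tail. A minimal sketch of that shape, using plain strings to stand in for SDK event objects (the labels and summary text are illustrative only):

# Illustrative only: plain strings stand in for SDK event objects.
history_before = [
    "system prompt",       # pinned (keep_first=2)
    "user: initial goal",  # pinned (keep_first=2)
    "agent: step 1", "tool: output 1",
    "agent: step 2", "tool: output 2",
    "agent: step 3", "tool: output 3",
    "agent: step 4", "tool: output 4",
    "agent: step 5",       # 11 events > max_size=10, so condensation triggers
]

history_after = [
    "system prompt",
    "user: initial goal",
    "summary: steps 1-3 created the utility functions the user asked for",
    "agent: step 4", "tool: output 4",
    "agent: step 5",
]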
This approach achieves substantial efficiency gains:
  • Up to 2x reduction in per-turn API costs
  • Consistent response times even in long sessions
  • Equivalent or better performance on software engineering tasks
Learn more about the implementation and benchmarks in our blog post on context condensation.

Extensibility

The LLMSummarizingCondenser extends the RollingCondenser base class, which provides a framework for condensers that operate on a rolling conversation history. You can create custom condensers by extending either of the base classes (source code):
  • RollingCondenser - For condensers that apply condensation to rolling history
  • CondenserBase - For more specialized condensation strategies
This architecture allows you to implement custom condensation logic tailored to your specific needs while leveraging the SDK’s conversation management infrastructure.
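For example, a custom condenser that drops older events outright instead of summarizing them might look like the sketch below. This is a hypothetical illustration: it assumes RollingCondenser exposes should_condense / get_condensation hooks over a view of the rolling history, so verify the exact method names, signatures, and return types against the SDK source for your version.

from openhands.sdk.context.condenser import RollingCondenser


class DroppingCondenser(RollingCondenser):
    """Hypothetical condenser: forgets middle events instead of summarizing them."""

    max_size: int = 10
    keep_first: int = 2

    def should_condense(self, view) -> bool:  # hook name assumed; see note above
        # Trigger once the rolling history grows past max_size events.
        return len(view.events) > self.max_size

    def get_condensation(self, view):  # hook name and return type assumed
        # Keep the pinned head and the most recent tail; everything in between
        # is simply dropped -- no LLM call, no summary.
        head = view.events[: self.keep_first]
        tail = view.events[self.keep_first :][-(self.max_size // 2) :]
        return head + tail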

Example Usage

This example is available on GitHub: examples/01_standalone_sdk/14_context_condenser.py
Automatically condense conversation history when context length exceeds limits, reducing token usage while preserving important information:
examples/01_standalone_sdk/14_context_condenser.py
"""
To manage context in long-running conversations, the agent can use a context condenser
that keeps the conversation history within a specified size limit. This example
demonstrates using the `LLMSummarizingCondenser`, which automatically summarizes
older parts of the conversation when the history exceeds a defined threshold.
"""

import os

from pydantic import SecretStr

from openhands.sdk import (
    LLM,
    Agent,
    Conversation,
    Event,
    LLMConvertibleEvent,
    get_logger,
)
from openhands.sdk.context.condenser import LLMSummarizingCondenser
from openhands.sdk.tool import Tool, register_tool
from openhands.tools.execute_bash import BashTool
from openhands.tools.file_editor import FileEditorTool
from openhands.tools.task_tracker import TaskTrackerTool


logger = get_logger(__name__)

# Configure LLM
api_key = os.getenv("LLM_API_KEY")
assert api_key is not None, "LLM_API_KEY environment variable is not set."
model = os.getenv("LLM_MODEL", "openhands/claude-sonnet-4-5-20250929")
base_url = os.getenv("LLM_BASE_URL")
llm = LLM(
    usage_id="agent",
    model=model,
    base_url=base_url,
    api_key=SecretStr(api_key),
)

# Tools
cwd = os.getcwd()
register_tool("BashTool", BashTool)
register_tool("FileEditorTool", FileEditorTool)
register_tool("TaskTrackerTool", TaskTrackerTool)
tools = [
    Tool(name="BashTool"),
    Tool(name="FileEditorTool"),
    Tool(name="TaskTrackerTool"),
]

# Create a condenser to manage the context. The condenser automatically truncates
# conversation history when it exceeds max_size, replacing the dropped events with an
# LLM-generated summary. This condenser triggers when there are more than ten events in
# the conversation history, and always keeps the first two events (system prompts,
# initial user messages) to preserve important context.
condenser = LLMSummarizingCondenser(
    llm=llm.model_copy(update={"usage_id": "condenser"}), max_size=10, keep_first=2
)

# Agent with condenser
agent = Agent(llm=llm, tools=tools, condenser=condenser)

llm_messages = []  # collect raw LLM messages


def conversation_callback(event: Event):
    if isinstance(event, LLMConvertibleEvent):
        llm_messages.append(event.to_llm_message())


conversation = Conversation(
    agent=agent,
    callbacks=[conversation_callback],
    persistence_dir="./.conversations",
    workspace=".",
)

# Send multiple messages to demonstrate condensation
print("Sending multiple messages to demonstrate LLM Summarizing Condenser...")

conversation.send_message(
    "Hello! Can you create a Python file named math_utils.py with functions for "
    "basic arithmetic operations (add, subtract, multiply, divide)?"
)
conversation.run()

conversation.send_message(
    "Great! Now add a function to calculate the factorial of a number."
)
conversation.run()

conversation.send_message("Add a function to check if a number is prime.")
conversation.run()

conversation.send_message(
    "Add a function to calculate the greatest common divisor (GCD) of two numbers."
)
conversation.run()

conversation.send_message(
    "Now create a test file to verify all these functions work correctly."
)
conversation.run()


print("=" * 100)
print("Conversation finished. Got the following LLM messages:")
for i, message in enumerate(llm_messages):
    print(f"Message {i}: {str(message)[:200]}")

# Conversation persistence
print("Serializing conversation...")

del conversation

# Deserialize the conversation
print("Deserializing conversation...")
conversation = Conversation(
    agent=agent,
    callbacks=[conversation_callback],
    persistence_dir="./.conversations",
    workspace=".",
)

print("Sending message to deserialized conversation...")
conversation.send_message("Finally, clean up by deleting both files.")
conversation.run()


print("=" * 100)
print("Conversation finished with LLM Summarizing Condenser.")
print(f"Total LLM messages collected: {len(llm_messages)}")
print("\nThe condenser automatically summarized older conversation history")
print("when the conversation exceeded the configured max_size threshold.")
print("This helps manage context length while preserving important information.")

Running the Example

export LLM_API_KEY="your-api-key"
cd agent-sdk
uv run python examples/01_standalone_sdk/14_context_condenser.py

Setting Up Condensing

Create an LLMSummarizingCondenser to manage the context. The condenser automatically truncates conversation history when it exceeds max_size, replacing the dropped events with an LLM-generated summary. It triggers when there are more than max_size events in the conversation history, and always keeps the first keep_first events (system prompts, initial user messages) to preserve important context.
from openhands.sdk.context.condenser import LLMSummarizingCondenser

condenser = LLMSummarizingCondenser(
    llm=llm.model_copy(update={"usage_id": "condenser"}), max_size=10, keep_first=2
)

# Agent with condenser
agent = Agent(llm=llm, tools=tools, condenser=condenser)
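
When tuning these parameters, note the tradeoff: a larger max_size triggers condensation less often (fewer summarization calls) but lets each turn carry more context, while keep_first controls how many of the earliest events, typically the system prompt and the initial user message, are always preserved verbatim. Giving the condenser's LLM its own usage_id (here "condenser") also lets you track its token usage separately from the agent's.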

Next Steps

  • LLM Metrics - Track token usage reduction and analyze cost savings