The Agent Server (openhands.agent_server) provides an HTTP API server for remote agent execution. It enables building multi-user systems, SaaS products, and distributed agent platforms.
Source: openhands/agent_server/
Purpose
The Agent Server enables:
- Remote execution: Clients interact with agents via HTTP API
- Multi-user isolation: Each user gets isolated workspace
- Container orchestration: Manages Docker containers for workspaces
- Centralized management: Monitor and control all agents
- Scalability: Horizontal scaling with multiple servers
Architecture Overview
Key Components
1. FastAPI Server
- HTTP REST API endpoints
- Authentication and authorization
- Request validation
- WebSocket support for streaming
- Creates and manages Docker containers
- Isolates workspaces per user
- Handles container lifecycle
- Manages resource limits
- Routes requests to appropriate workspace
- Manages conversation state
- Handles concurrent requests
- Supports streaming responses
- Interfaces with Docker daemon
- Builds and pulls images
- Creates and destroys containers
- Monitors container health
Design Decisions
Why HTTP API?
Alternative approaches considered:
- gRPC: More efficient but harder for web clients
- WebSockets only: Good for streaming but not RESTful
- HTTP + WebSockets: Best of both worlds (chosen)
- ✅ Works from any client (web, mobile, CLI)
- ✅ Easy to debug (curl, Postman)
- ✅ Standard authentication (API keys, OAuth)
- ✅ Streaming where needed
Why Container Per User?
Alternative approaches:
- Shared container: Multiple users in one container
- Container per session: New container each conversation
- Container per user: One container per user (chosen)
- ✅ Strong isolation between users
- ✅ Persistent workspace across sessions
- ✅ Better resource management
- ⚠️ More containers, but worth it for isolation
Why FastAPI?
Alternative frameworks:
- Flask: Simpler but less type-safe
- Django: Too heavyweight
- FastAPI: Modern, fast, type-safe (chosen)
- ✅ Automatic API documentation (OpenAPI)
- ✅ Type validation with Pydantic
- ✅ Async support for performance
- ✅ WebSocket support built-in
API Design
Key Endpoints
Workspace Management
Authentication
API Key Authentication
- API key → user ID mapping
- Each user gets separate workspace
- Users can't access each other's workspaces
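As a minimal sketch of the key-to-user mapping described above, the following resolves an API key to a user ID and checks workspace ownership. The in-memory key store, the `workspace-<user_id>` naming scheme, and the function names are assumptions made for illustration; they are not taken from the Agent Server source.

```python
import hashlib
import hmac
from typing import Optional

# Hypothetical key store: SHA-256 digest of each API key -> user ID.
# Storing digests avoids keeping raw keys in memory or on disk.
API_KEY_DIGESTS = {
    hashlib.sha256(b"key-alice").hexdigest(): "alice",
    hashlib.sha256(b"key-bob").hexdigest(): "bob",
}


def resolve_user(api_key: str) -> Optional[str]:
    """Return the user ID for an API key, or None if the key is unknown."""
    digest = hashlib.sha256(api_key.encode()).hexdigest()
    for known_digest, user_id in API_KEY_DIGESTS.items():
        # compare_digest performs a constant-time comparison.
        if hmac.compare_digest(digest, known_digest):
            return user_id
    return None


def can_access(api_key: str, workspace_id: str) -> bool:
    """Allow access only to the caller's own workspace (assumed naming scheme)."""
    user_id = resolve_user(api_key)
    return user_id is not None and workspace_id == f"workspace-{user_id}"
```

With this shape, a request handler would call `can_access` before routing to a workspace and reject the request when it returns `False`.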
Streaming Responses
WebSocket for real-time updates
- Real-time feedback to users
- Show agent thinking process
- Better UX for long-running tasks
Deployment Models
1. Local Development
Run server locally for testing.
2. Single-Server Deployment
Deploy on one server (VPS, EC2, etc.).
3. Multi-Server Deployment
Scale horizontally with load balancer.
4. Kubernetes Deployment
Container orchestration with Kubernetes.
Resource Management
Container Limits
Set per-workspace resource limits. Benefits:
- Prevent one user from consuming all resources
- Fair usage across users
- Protect server from runaway processes
- Cost control
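One way to express such limits is a small configuration object translated into standard `docker run` resource flags (`--memory`, `--cpus`, `--pids-limit`). The `WorkspaceLimits` class, its field names, and its defaults are hypothetical; how the Agent Server actually configures containers may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class WorkspaceLimits:
    """Hypothetical per-workspace limits; names and defaults are illustrative."""
    memory_gb: float = 2.0
    cpus: float = 1.0
    max_pids: int = 256


def docker_run_flags(limits: WorkspaceLimits) -> List[str]:
    """Translate limits into `docker run` resource flags."""
    return [
        f"--memory={int(limits.memory_gb * 1024)}m",  # memory cap in MiB
        f"--cpus={limits.cpus}",                      # fractional CPU cores
        f"--pids-limit={limits.max_pids}",            # guards against fork bombs
    ]
```

The resulting flags could be appended to a `docker run` command line when a workspace container is created.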
Cleanup & Garbage Collection
Container lifecycle:
- Containers created on first use
- Kept alive between requests (warm)
- Cleaned up after inactivity timeout
- Force cleanup on server shutdown
- Old workspaces deleted automatically
- Disk usage monitored
- Alerts when approaching limits
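The idle-timeout part of this lifecycle can be sketched as a tracker that records the last use of each workspace and reports which ones have expired. The `IdleTracker` class and the injectable clock are assumptions for illustration; a real server would also stop the corresponding container when reaping.

```python
import time
from typing import Callable, Dict, List


class IdleTracker:
    """Tracks last-use times so idle workspaces can be cleaned up."""

    def __init__(self, idle_timeout_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.idle_timeout_s = idle_timeout_s
        self._clock = clock
        self._last_used: Dict[str, float] = {}

    def touch(self, workspace_id: str) -> None:
        """Record activity; the first call registers the workspace."""
        self._last_used[workspace_id] = self._clock()

    def reap(self) -> List[str]:
        """Remove and return workspaces idle for longer than the timeout."""
        now = self._clock()
        expired = [w for w, t in self._last_used.items()
                   if now - t > self.idle_timeout_s]
        for workspace_id in expired:
            # A real server would stop the workspace's container here.
            del self._last_used[workspace_id]
        return expired
```

Calling `touch` on every request keeps warm containers alive, while a periodic `reap` call implements the inactivity cleanup.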
Security Considerations
Multi-Tenant Isolation
Container isolation:
- Each user gets separate container
- Containers can't communicate
- Network isolation (optional)
- File system isolation
- API keys mapped to users
- Users can only access their workspaces
- Server validates all permissions
Input Validation
Server validates:
- API request schemas
- Command injection attempts
- Path traversal attempts
- File size limits
- API validation
- Container validation
- Docker security features
- OS-level security
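As an illustration of one check from the list above, path traversal can be rejected by resolving a user-supplied path and confirming it still lies inside the workspace root. The function name and the error type are assumptions; this is a sketch of the general technique, not the server's actual validator.

```python
from pathlib import Path


def resolve_in_workspace(workspace_root: str, user_path: str) -> Path:
    """Resolve user_path under workspace_root, rejecting any escape."""
    root = Path(workspace_root).resolve()
    # Joining with an absolute user_path discards root, so absolute
    # paths outside the workspace are caught by the check below too.
    candidate = (root / user_path).resolve()
    # After ".." segments and symlinks are resolved, the result must
    # be the root itself or one of its descendants.
    if candidate != root and root not in candidate.parents:
        raise ValueError(f"path escapes workspace: {user_path!r}")
    return candidate
```

Checking the resolved path, rather than scanning the string for `..`, also covers symlinks that point outside the workspace.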
Network Security
Best practices:
- HTTPS only (TLS certificates)
- Firewall rules (only port 443/8000)
- Rate limiting
- DDoS protection
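Rate limiting is commonly implemented with a token bucket. The sketch below shows that technique in isolation; the class name, parameters, and per-instance usage are assumptions, and production deployments often delegate this to a reverse proxy or API gateway instead.

```python
import time
from typing import Callable


class TokenBucket:
    """Token-bucket limiter: refills `rate` tokens per second up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity
        self._updated = clock()

    def allow(self) -> bool:
        """Consume one token if available; return False when over the limit."""
        now = self._clock()
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = now - self._updated
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        self._updated = now
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False
```

Keeping one bucket per API key would let bursts up to `capacity` through while holding each user to `rate` requests per second on average.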
Monitoring & Observability
Health Checks
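A health endpoint typically aggregates a few dependency probes into a single status. The function below is a hypothetical sketch of that aggregation; the probe names and the response shape are assumptions, not the Agent Server's actual health API.

```python
from typing import Callable, Dict


def health_report(checks: Dict[str, Callable[[], bool]]) -> dict:
    """Run each probe and summarize; a probe that raises counts as failed."""
    results = {}
    for name, probe in checks.items():
        try:
            results[name] = bool(probe())
        except Exception:
            results[name] = False
    status = "ok" if all(results.values()) else "degraded"
    return {"status": status, "checks": results}
```

A load balancer or orchestrator could poll an endpoint that returns this report and stop routing traffic when the status is not `ok`.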
Metrics
Prometheus metrics:
- Request count and latency
- Active workspaces
- Container resource usage
- Error rates
- Structured JSON logs
- Per-request tracing
- Workspace events
- Error tracking
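Structured JSON logs with per-request fields can be produced with the standard `logging` module alone. In this sketch the `request_id` and `workspace_id` field names and the logger name are assumptions chosen for illustration.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "message": record.getMessage(),
            # Hypothetical tracing fields, supplied through `extra=`.
            "request_id": getattr(record, "request_id", None),
            "workspace_id": getattr(record, "workspace_id", None),
        }
        return json.dumps(payload)


def make_logger(stream) -> logging.Logger:
    """Build a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("agent_server_sketch")
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


buf = io.StringIO()
log = make_logger(buf)
log.info("workspace created", extra={"request_id": "r1", "workspace_id": "w1"})
print(buf.getvalue().strip())
```

Emitting one JSON object per line keeps the logs easy to ship to an aggregator and to filter by request or workspace.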
Alerting
Alert on:
- Server down
- High error rate
- Resource exhaustion
- Container failures
Client SDK
Python SDK for interacting with Agent Server:
- Authentication
- Request/response serialization
- Error handling
- Streaming
- Retries
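The retry behavior listed above can be sketched as a generic helper with exponential backoff. The helper name, its defaults, and the choice to retry only on `ConnectionError` are assumptions; the SDK's real retry policy is not described in this document.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    func: Callable[[], T],
    max_attempts: int = 3,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on ConnectionError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Delay doubles each time: base, 2*base, 4*base, ...
            sleep(base_delay_s * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the helper testable without real delays.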
Cost Considerations
Server Costs
Compute: CPU and memory for containers
- Each active workspace = 1 container
- Typically 1-2 GB RAM per workspace
- 0.5-1 CPU core per workspace
- ~1-10 GB per workspace (depends on usage)
- Conversation history in database
- Minimal (mostly text)
- Streaming adds bandwidth
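The per-workspace figures above translate into a rough capacity estimate for a single host. This sketch is back-of-the-envelope arithmetic only: the function name is hypothetical, the defaults use the upper end of the stated ranges, and it ignores overhead from the operating system and the server process itself.

```python
import math


def max_workspaces(host_ram_gb: float, host_cpus: float,
                   ram_per_ws_gb: float = 2.0, cpus_per_ws: float = 1.0) -> int:
    """Estimate how many active workspaces fit on one host."""
    by_ram = math.floor(host_ram_gb / ram_per_ws_gb)
    by_cpu = math.floor(host_cpus / cpus_per_ws)
    # The scarcer resource determines capacity.
    return min(by_ram, by_cpu)
```

For example, a host with 64 GB of RAM and 16 cores is CPU-bound at the default settings, so adding memory alone would not raise its capacity.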
Cost Optimization
1. Idle timeout: Shut down containers after inactivity
When to Use Agent Server
Use Agent Server When:
✅ Multi-user system: Web app with many users
✅ Remote clients: Mobile app, web frontend
✅ Centralized management: Need to monitor all agents
✅ Workspace isolation: Users shouldn't interfere
✅ SaaS product: Building agent-as-a-service
✅ Scaling: Need to handle concurrent users
Examples:
- Chatbot platforms
- Code assistant web apps
- Agent marketplaces
- Enterprise agent deployments
Use Standalone SDK When:
✅ Single-user: Personal tool or script
✅ Local execution: Running on your machine
✅ Full control: Need programmatic access
✅ Simpler deployment: No server management
✅ Lower latency: No network overhead
Examples:
- CLI tools
- Automation scripts
- Local development
- Desktop applications
Hybrid Approach
Use SDK locally but RemoteAPIWorkspace for execution:
- Agent logic in your Python code
- Execution happens on remote server
- Best of both worlds
Building Custom Agent Server
The server is extensible for custom needs, such as custom authentication.
Next Steps
For Usage Examples
- Local Agent Server - Run locally
- Docker Sandboxed Server - Docker setup
- API Sandboxed Server - Remote API
- Remote Agent Server Overview - All options
For Related Architecture
- Workspace Architecture - RemoteAPIWorkspace details
- SDK Architecture - Core framework
- Architecture Overview - System design
For Implementation Details
- openhands/agent_server/ - Server source
- examples/ - Working examples

