# Cremote MCP Server
This is a Model Context Protocol (MCP) server that exposes cremote's web automation capabilities to LLMs and AI agents. Instead of using CLI commands, this server provides a structured API that maintains state and provides intelligent abstractions.
## 🎉 Complete Web Automation Platform
**31 tools** spanning a core toolkit and 6 enhancement phases provide a complete web automation toolkit for LLM agents:
- **Core Tools**: Essential web automation capabilities (10 tools)
- **Phase 1**: Element state checking and conditional logic (2 tools)
- **Phase 2**: Enhanced data extraction and batch operations (4 tools)
- **Phase 3**: Form analysis and bulk operations (3 tools)
- **Phase 4**: Page state and metadata tools (4 tools)
- **Phase 5**: Enhanced screenshots and file management (4 tools)
- **Phase 6**: Accessibility tree support (3 tools)
- **Version Information**: MCP server and daemon version reporting (1 tool)
### 🚀 **NEW: Multi-Client Support**
The Cremote MCP server now supports **multiple concurrent clients** with isolated browser sessions:
- **Concurrent Agents**: Multiple AI agents can use the same browser simultaneously
- **Session Isolation**: Each client maintains independent browser state (tabs, history, iframe context)
- **Transport Flexibility**: Choose between stdio (single client) or HTTP (multiple clients)
- **Backward Compatible**: Existing stdio clients continue to work unchanged
See the [Multi-Client Guide](MULTI_CLIENT_GUIDE.md) for detailed setup and usage instructions.
## Features
- **State Management**: Automatically tracks current tab, tab history, and iframe context
- **Intelligent Abstractions**: High-level tools that combine multiple cremote operations
- **Batch Operations**: Reduce round trips with bulk operations and multi-selector extraction
- **Form Intelligence**: Complete form analysis and bulk filling capabilities
- **Rich Context**: Page metadata, performance metrics, and content verification
- **Enhanced Screenshots**: Element-specific and metadata-rich screenshot capture
- **File Management**: Bulk file operations and automated cleanup
- **Accessibility Tree**: Chrome accessibility tree interface for semantic understanding
- **Automatic Screenshots**: Optional screenshot capture for debugging and documentation
- **Error Recovery**: Better error handling and context for LLMs
- **Resource Management**: Automatic cleanup and connection management
## Quick Start for LLMs
**For LLM agents**: See the comprehensive [LLM Usage Guide](LLM_USAGE_GUIDE.md) for detailed usage instructions, examples, and best practices.
## Available Tools (31 Total)
### Version Information
#### `version_cremotemcp`
Get version information for MCP server and daemon.
```json
{
"name": "version_cremotemcp",
"arguments": {}
}
```
Returns version information for both the MCP server and the connected daemon.
### Core Web Automation Tools (10 tools)
#### 1. `web_navigate_cremotemcp`
Navigate to URLs with optional screenshot capture.
```json
{
"name": "web_navigate_cremotemcp",
"arguments": {
"url": "https://example.com",
"screenshot": true,
"timeout": 10
}
}
```
#### 2. `web_interact_cremotemcp`
Interact with web elements (click, fill, submit, upload, select).
```json
{
"name": "web_interact_cremotemcp",
"arguments": {
"action": "fill",
"selector": "#username",
"value": "testuser",
"timeout": 5
}
}
```
For select dropdowns:
```json
{
"name": "web_interact_cremotemcp",
"arguments": {
"action": "select",
"selector": "#country",
"value": "United States",
"timeout": 5
}
}
```
#### 3. `web_extract_cremotemcp`
Extract data from pages (source, element HTML, JavaScript execution).
```json
{
"name": "web_extract_cremotemcp",
"arguments": {
"type": "javascript",
"code": "document.title",
"timeout": 5
}
}
```
#### 4. `web_screenshot_cremotemcp`
Take screenshots of the current page.
```json
{
"name": "web_screenshot_cremotemcp",
"arguments": {
"output": "/tmp/page.png",
"full_page": true,
"timeout": 5
}
}
```
#### 5. `web_manage_tabs_cremotemcp`
Manage browser tabs (open, close, list, switch).
```json
{
"name": "web_manage_tabs_cremotemcp",
"arguments": {
"action": "open",
"timeout": 5
}
}
```
#### 6. `web_iframe_cremotemcp`
Switch iframe context for subsequent operations.
```json
{
"name": "web_iframe_cremotemcp",
"arguments": {
"action": "enter",
"selector": "iframe#payment-form"
}
}
```
#### 7. `file_upload_cremotemcp`
Upload files from client to container for use in form uploads.
```json
{
"name": "file_upload_cremotemcp",
"arguments": {
"local_path": "/local/file.txt",
"container_path": "/tmp/file.txt"
}
}
```
**Note**: The CLI `cremote upload-file` command now automatically transfers files to the daemon container first, making file uploads seamless even when the daemon runs in a container.
#### 8. `file_download_cremotemcp`
Download files from container to client (e.g., downloaded files from browser).
```json
{
"name": "file_download_cremotemcp",
"arguments": {
"container_path": "/tmp/downloaded-file.pdf",
"local_path": "/local/downloaded-file.pdf"
}
}
```
#### 9. `console_logs_cremotemcp`
Get console logs from the browser tab.
```json
{
"name": "console_logs_cremotemcp",
"arguments": {
"tab": "tab-123",
"timeout": 5
}
}
```
#### 10. `console_command_cremotemcp`
Execute commands in the browser console.
```json
{
"name": "console_command_cremotemcp",
"arguments": {
"command": "document.getElementById('test').innerHTML = 'Hello World'",
"tab": "tab-123",
"timeout": 5
}
}
```
### Phase 1: Element State and Checking Tools (2 tools)
#### 11. `web_element_check_cremotemcp`
Check element existence, visibility, enabled state, and other properties without interaction.
```json
{
"name": "web_element_check_cremotemcp",
"arguments": {
"selector": "#submit-button",
"check_type": "all",
"timeout": 5
}
}
```
**Check Types:**
- `exists`: Check if element exists in DOM
- `visible`: Check if element is visible (not hidden)
- `enabled`: Check if element is enabled (not disabled)
- `focused`: Check if element has focus
- `selected`: Check if element is selected (checkboxes, radio buttons)
- `all`: Check all states above
**Response includes:**
```json
{
"exists": true,
"visible": true,
"enabled": false,
"focused": false,
"selected": true,
"count": 1
}
```
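The fields in this response are enough to gate an interaction before attempting it. The following is a minimal Python sketch, assuming the response has already been parsed into a dict with the keys shown above; `can_click` is a hypothetical helper for illustration and is not part of cremote.

```python
def can_click(check: dict) -> bool:
    """Return True only if the element exists, is visible, and is enabled.

    Missing keys are treated as False so a partial response never passes.
    """
    return all(check.get(key, False) for key in ("exists", "visible", "enabled"))


# The example response above describes a disabled element, so it is not clickable.
example = {
    "exists": True,
    "visible": True,
    "enabled": False,
    "focused": False,
    "selected": True,
    "count": 1,
}
print(can_click(example))  # False
```

An agent could run this check after `web_element_check_cremotemcp` and skip the click, or wait and re-check, when it returns `False`.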
#### 12. `web_element_attributes_cremotemcp`
Get element attributes, properties, and computed styles.
```json
{
"name": "web_element_attributes_cremotemcp",
"arguments": {
"selector": "#user-profile",
"attributes": "all",
"timeout": 5
}
}
```
**Attribute Options:**
- `all`: Get common attributes, properties, and styles
- `"id,class,href"`: Comma-separated list of specific attributes
- `"style_display,style_color"`: Computed styles (prefix with `style_`)
- `"prop_textContent,prop_value"`: JavaScript properties (prefix with `prop_`)
**Example Response:**
```json
{
"id": "user-profile",
"class": "profile-card active",
"data-user-id": "12345",
"textContent": "John Doe",
"style_display": "block",
"style_color": "rgb(0, 0, 0)"
}
```
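The `attributes` string can be assembled programmatically from the prefixes listed above. This Python sketch assumes plain attributes, `style_` names, and `prop_` names may be mixed in one comma-separated list, which the option list suggests but does not state; `build_attributes` is a hypothetical helper.

```python
def build_attributes(attrs=(), styles=(), props=()) -> str:
    """Build the comma-separated `attributes` value.

    Computed styles get the `style_` prefix and JavaScript properties get
    the `prop_` prefix. With no names at all, fall back to "all".
    """
    parts = list(attrs)
    parts += [f"style_{name}" for name in styles]
    parts += [f"prop_{name}" for name in props]
    return ",".join(parts) if parts else "all"


spec = build_attributes(attrs=["id", "class"], styles=["display"], props=["textContent"])
print(spec)  # id,class,style_display,prop_textContent
```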
### Phase 2: Enhanced Data Extraction Tools (4 tools)
#### 13. `web_extract_multiple_cremotemcp`
Extract data from multiple selectors in a single call for improved efficiency.
```json
{
"name": "web_extract_multiple_cremotemcp",
"arguments": {
"selectors": {
"title": "h1",
"price": ".price",
"description": ".product-description"
},
"timeout": 5
}
}
```
#### 14. `web_extract_links_cremotemcp`
Extract all links from a page with powerful filtering options.
```json
{
"name": "web_extract_links_cremotemcp",
"arguments": {
"container_selector": "nav",
"href_pattern": "https://.*",
"text_pattern": ".*Download.*",
"timeout": 5
}
}
```
#### 15. `web_extract_table_cremotemcp`
Extract table data as structured JSON with optional header processing.
```json
{
"name": "web_extract_table_cremotemcp",
"arguments": {
"selector": "#data-table",
"include_headers": true,
"timeout": 5
}
}
```
#### 16. `web_extract_text_cremotemcp`
Extract text content with optional pattern matching and different extraction types.
```json
{
"name": "web_extract_text_cremotemcp",
"arguments": {
"selector": ".content",
"pattern": "\\d{3}-\\d{3}-\\d{4}",
"extract_type": "textContent",
"timeout": 5
}
}
```
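The doubled backslashes in `pattern` are JSON string escaping; the regular expression itself is `\d{3}-\d{3}-\d{4}`. The Python sketch below shows the escaping and what the pattern matches. It uses Python's `re` module purely for illustration; the regex engine the server uses is not stated here and may differ in details.

```python
import json
import re

# The pattern as a plain regex: three digits, three digits, four digits.
pattern = r"\d{3}-\d{3}-\d{4}"

arguments = {
    "selector": ".content",
    "pattern": pattern,
    "extract_type": "textContent",
    "timeout": 5,
}

# json.dumps doubles each backslash, producing the escaped form used in the request.
encoded = json.dumps(arguments)
print(encoded)

# Local check of what the pattern matches (sample text is made up).
sample = "Call 555-123-4567 or 555-987-6543 today."
print(re.findall(pattern, sample))  # ['555-123-4567', '555-987-6543']
```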
### Phase 3: Form Analysis and Bulk Operations (3 tools)
#### 17. `web_form_analyze_cremotemcp`
Analyze forms completely to understand their structure, fields, and submission requirements.
```json
{
"name": "web_form_analyze_cremotemcp",
"arguments": {
"selector": "#registration-form",
"timeout": 10
}
}
```
#### 18. `web_interact_multiple_cremotemcp`
Perform multiple interactions in a single call for efficient batch operations.
```json
{
"name": "web_interact_multiple_cremotemcp",
"arguments": {
"interactions": [
{"selector": "#username", "action": "fill", "value": "testuser"},
{"selector": "#password", "action": "fill", "value": "testpass"},
{"selector": "#remember-me", "action": "check"},
{"selector": "#login-btn", "action": "click"}
],
"timeout": 10
}
}
```
#### 19. `web_form_fill_bulk_cremotemcp`
Fill entire forms with key-value pairs in a single operation.
```json
{
"name": "web_form_fill_bulk_cremotemcp",
"arguments": {
"form_selector": "#contact-form",
"fields": {
"name": "John Doe",
"email": "john@example.com",
"message": "Hello, this is a test message."
},
"timeout": 10
}
}
```
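Conceptually, a bulk fill replaces one `fill` interaction per field. The Python sketch below expands a `fields` mapping into an `interactions` list in the shape accepted by `web_interact_multiple_cremotemcp`. It assumes each key corresponds to a `name` attribute inside the form; how the server actually resolves field keys is not documented in this section, so the selector format is hypothetical.

```python
def fields_to_interactions(form_selector: str, fields: dict) -> list:
    """Expand {field: value} into one `fill` interaction per field.

    ASSUMPTION: fields are located by their `name` attribute within the form.
    """
    interactions = []
    for name, value in fields.items():
        interactions.append({
            "selector": f'{form_selector} [name="{name}"]',
            "action": "fill",
            "value": value,
        })
    return interactions


interactions = fields_to_interactions(
    "#contact-form",
    {"name": "John Doe", "email": "john@example.com"},
)
print(len(interactions))  # 2
```

The bulk tool is still preferable when it fits, since it sends the whole mapping in one call.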
### Phase 4: Page State and Metadata Tools (4 tools)
#### 20. `web_page_info_cremotemcp`
Get comprehensive page metadata and state information.
```json
{
"name": "web_page_info_cremotemcp",
"arguments": {
"tab": "tab-123",
"timeout": 5
}
}
```
Returns detailed page information including title, URL, loading state, domain, protocol, and browser status.
#### 21. `web_viewport_info_cremotemcp`
Get viewport and scroll information.
```json
{
"name": "web_viewport_info_cremotemcp",
"arguments": {
"tab": "tab-123",
"timeout": 5
}
}
```
Returns viewport dimensions, scroll position, device pixel ratio, and orientation.
#### 22. `web_performance_metrics_cremotemcp`
Get page performance metrics.
```json
{
"name": "web_performance_metrics_cremotemcp",
"arguments": {
"tab": "tab-123",
"timeout": 5
}
}
```
Returns performance data including load times, resource counts, and memory usage.
#### 23. `web_content_check_cremotemcp`
Check for specific content types and loading states.
```json
{
"name": "web_content_check_cremotemcp",
"arguments": {
"type": "images",
"tab": "tab-123",
"timeout": 5
}
}
```
Supported content types: `images`, `scripts`, `styles`, `forms`, `links`, `iframes`, `errors`.
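A client can validate the `type` argument against this list before sending the request. The Python sketch below does that; `build_content_check` is a hypothetical helper, and `tab` is treated as optional because the workflow examples in this README omit it.

```python
CONTENT_TYPES = ("images", "scripts", "styles", "forms", "links", "iframes", "errors")


def build_content_check(content_type: str, tab=None, timeout: int = 5) -> dict:
    """Build a web_content_check_cremotemcp request, rejecting unknown types early."""
    if content_type not in CONTENT_TYPES:
        raise ValueError(f"unsupported content type: {content_type!r}")
    arguments = {"type": content_type, "timeout": timeout}
    if tab is not None:
        arguments["tab"] = tab
    return {"name": "web_content_check_cremotemcp", "arguments": arguments}


request = build_content_check("images", tab="tab-123")
print(request["arguments"])  # {'type': 'images', 'timeout': 5, 'tab': 'tab-123'}
```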
### Phase 5: Enhanced Screenshot and File Management (4 tools)
#### 24. `web_screenshot_element_cremotemcp`
Take a screenshot of a specific element on the page.
```json
{
"name": "web_screenshot_element_cremotemcp",
"arguments": {
"selector": "#main-content",
"output": "/tmp/element-screenshot.png",
"tab": "tab-123",
"timeout": 5
}
}
```
Automatically scrolls the element into view and captures a screenshot of just that element.
#### 25. `web_screenshot_enhanced_cremotemcp`
Take an enhanced screenshot with metadata.
```json
{
"name": "web_screenshot_enhanced_cremotemcp",
"arguments": {
"output": "/tmp/enhanced-screenshot.png",
"full_page": true,
"tab": "tab-123",
"timeout": 5
}
}
```
Returns screenshot metadata including timestamp, URL, title, viewport size, and file information.
#### 26. `file_operations_bulk_cremotemcp`
Perform bulk file operations (upload/download multiple files).
```json
{
"name": "file_operations_bulk_cremotemcp",
"arguments": {
"operation": "upload",
"files": [
{
"local_path": "/local/file1.txt",
"container_path": "/tmp/file1.txt"
},
{
"local_path": "/local/file2.txt",
"container_path": "/tmp/file2.txt"
}
],
"timeout": 30
}
}
```
Supports both "upload" and "download" operations with detailed success/failure reporting.
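For more than a few files, the `files` array is easier to generate than to write by hand. This Python sketch builds an upload request that places each local file under a container directory using its base name. The helper name and the `/tmp` default are assumptions chosen to match the example above.

```python
import posixpath


def build_upload_request(local_paths, container_dir="/tmp", timeout=30) -> dict:
    """Map each local path to <container_dir>/<basename> and wrap it in a request."""
    files = [
        {
            "local_path": path,
            "container_path": posixpath.join(container_dir, posixpath.basename(path)),
        }
        for path in local_paths
    ]
    return {
        "name": "file_operations_bulk_cremotemcp",
        "arguments": {"operation": "upload", "files": files, "timeout": timeout},
    }


request = build_upload_request(["/local/file1.txt", "/local/file2.txt"])
print(request["arguments"]["files"][1]["container_path"])  # /tmp/file2.txt
```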
#### 27. `file_management_cremotemcp`
Manage files (cleanup, list, get info).
```json
{
"name": "file_management_cremotemcp",
"arguments": {
"operation": "cleanup",
"pattern": "/tmp/cremote-*",
"max_age": "24"
}
}
```
Operations: `cleanup` (remove old files), `list` (list files), `info` (get file details).
## 🎉 Complete Enhancement Summary
Phases 1 through 5 of the MCP enhancement plan have been implemented. Together with the 10 core tools they account for **27 tools**; the version tool and the 3 Phase 6 accessibility tools bring the total to 31. The five phases cover the following capabilities:
### ✅ Phase 1: Element State and Checking (2 tools)
**Enables conditional logic without timing issues**
- `web_element_check_cremotemcp`: Check existence, visibility, enabled state, count elements
- `web_element_attributes_cremotemcp`: Get attributes, properties, computed styles
**Benefits**: LLMs can make decisions based on page state, prevent errors from trying to interact with non-existent elements, enable conditional workflows.
### ✅ Phase 2: Enhanced Data Extraction (4 tools)
**Dramatically improves data gathering efficiency**
- `web_extract_multiple_cremotemcp`: Extract from multiple selectors in one call
- `web_extract_links_cremotemcp`: Extract all links with filtering options
- `web_extract_table_cremotemcp`: Extract table data as structured JSON
- `web_extract_text_cremotemcp`: Extract text with pattern matching
**Benefits**: Reduces multiple round trips to single calls, provides structured data ready for LLM processing, enables comprehensive page analysis.
### ✅ Phase 3: Form Analysis and Bulk Operations (3 tools)
**Streamlines form handling by replacing many per-field calls with one or two bulk calls**
- `web_form_analyze_cremotemcp`: Analyze forms completely
- `web_interact_multiple_cremotemcp`: Batch interactions
- `web_form_fill_bulk_cremotemcp`: Fill entire forms with key-value pairs
**Benefits**: Complete forms in 1-2 calls instead of 10+, form intelligence provides complete understanding before interaction, error prevention through field validation.
### ✅ Phase 4: Page State and Metadata Tools (4 tools)
**Provides rich context about page state for better debugging and monitoring**
- `web_page_info_cremotemcp`: Get page metadata and loading state
- `web_viewport_info_cremotemcp`: Get viewport and scroll information
- `web_performance_metrics_cremotemcp`: Get performance data
- `web_content_check_cremotemcp`: Check for specific content types
**Benefits**: Better debugging and monitoring capabilities, performance optimization insights, content loading verification, rich page state context for LLM decision making.
### ✅ Phase 5: Enhanced Screenshot and File Management (4 tools)
**Improves debugging and file handling**
- `web_screenshot_element_cremotemcp`: Screenshot specific elements
- `web_screenshot_enhanced_cremotemcp`: Screenshots with metadata
- `file_operations_bulk_cremotemcp`: Bulk file operations
- `file_management_cremotemcp`: File cleanup, listing, and info
**Benefits**: Better debugging with targeted screenshots, improved file handling workflows, automatic resource management, enhanced visual debugging capabilities.
## Key Benefits for LLM Agents
### 🚀 **Efficiency Gains**
- **10x Form Efficiency**: Complete forms in 1-2 calls instead of 10+ individual interactions
- **Batch Operations**: Multiple data extractions and interactions in single calls
- **Reduced Round Trips**: Comprehensive tools minimize API call overhead
### 🧠 **Intelligence & Context**
- **Conditional Logic**: Element checking enables smart decision making without timing issues
- **Rich Page Context**: Complete page state, performance metrics, and content verification
- **Form Intelligence**: Complete form analysis before interaction prevents errors
### 🛠 **Enhanced Capabilities**
- **Visual Debugging**: Element-specific screenshots and enhanced metadata
- **File Management**: Bulk operations and automated cleanup
- **Error Prevention**: State checking and validation before actions
- **Resource Management**: Automatic cleanup and connection handling
## Installation & Usage
### Prerequisites
1. **Cremote daemon must be running**:
```bash
cremotedaemon
```
2. **Chrome/Chromium with remote debugging**:
```bash
chromium --remote-debugging-port=9222 --user-data-dir=/tmp/chromium-debug
```
### Build the MCP Server
```bash
cd mcp/
go build -o cremote-mcp .
```
### Configuration
#### Basic Configuration (Single Client - stdio)
Set environment variables to configure the cremote connection:
```bash
export CREMOTE_HOST=localhost
export CREMOTE_PORT=8989
export CREMOTE_TRANSPORT=stdio # Default
```
#### Multi-Client Configuration (HTTP Transport)
For multiple concurrent clients:
```bash
export CREMOTE_HOST=localhost
export CREMOTE_PORT=8989
export CREMOTE_TRANSPORT=http
export CREMOTE_HTTP_HOST=localhost
export CREMOTE_HTTP_PORT=8990
```
#### Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| `CREMOTE_TRANSPORT` | `stdio` | Transport mode: `stdio` or `http` |
| `CREMOTE_HOST` | `localhost` | Cremote daemon host |
| `CREMOTE_PORT` | `8989` | Cremote daemon port |
| `CREMOTE_HTTP_HOST` | `localhost` | HTTP server host (HTTP mode only) |
| `CREMOTE_HTTP_PORT` | `8990` | HTTP server port (HTTP mode only) |
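The defaults in the table can be mirrored in a client-side or wrapper script. The following Python sketch reads the same variables with the same defaults. It is an illustration of the table only, not the server's own configuration code; `CremoteConfig` and `load_config` are hypothetical names.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class CremoteConfig:
    transport: str
    host: str
    port: int
    http_host: str
    http_port: int


def load_config(env=None) -> CremoteConfig:
    """Read CREMOTE_* variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    transport = env.get("CREMOTE_TRANSPORT", "stdio")
    if transport not in ("stdio", "http"):
        raise ValueError(f"unsupported CREMOTE_TRANSPORT: {transport!r}")
    return CremoteConfig(
        transport=transport,
        host=env.get("CREMOTE_HOST", "localhost"),
        port=int(env.get("CREMOTE_PORT", "8989")),
        http_host=env.get("CREMOTE_HTTP_HOST", "localhost"),
        http_port=int(env.get("CREMOTE_HTTP_PORT", "8990")),
    )


# Passing an explicit mapping keeps the example independent of the real environment.
config = load_config({"CREMOTE_TRANSPORT": "http"})
print(config.transport, config.http_port)  # http 8990
```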
### Running with Claude Desktop
Add to your Claude Desktop configuration (`~/Library/Application Support/Claude/claude_desktop_config.json` on macOS):
```json
{
"mcpServers": {
"cremote": {
"command": "/path/to/cremote-mcp",
"env": {
"CREMOTE_HOST": "localhost",
"CREMOTE_PORT": "8989"
}
}
}
}
```
### Running with Other MCP Clients
In the default stdio mode, the server communicates via JSON-RPC over stdin/stdout, so it can be used with any MCP-compatible client:
```bash
echo '{"jsonrpc":"2.0","method":"tools/list","params":{},"id":1}' | ./cremote-mcp
```
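Tool invocations use the MCP `tools/call` method, with the tool `name` and `arguments` from the examples in this README placed under `params`. The Python sketch below builds one such request line. It assumes the server follows the standard MCP JSON-RPC 2.0 framing; a full MCP client normally performs an `initialize` handshake before calling tools, which is omitted here. `build_tool_call` is a hypothetical helper.

```python
import json


def build_tool_call(name: str, arguments: dict, request_id: int = 1) -> str:
    """Serialize a JSON-RPC 2.0 `tools/call` request as a single line."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request)


line = build_tool_call("web_navigate_cremotemcp", {"url": "https://example.com"})
print(line)
```

In stdio mode, a line like this would be written to the server's standard input, one request per line.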
## Response Format
All tool responses include:
```json
{
"success": true,
"data": "...",
"screenshot": "/tmp/screenshot.png",
"current_tab": "tab-id-123",
"tab_history": ["tab-id-123", "tab-id-456"],
"iframe_mode": false,
"error": null,
"metadata": {}
}
```
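Because every response carries `success` and `error`, a client can handle failures in one place. This Python sketch assumes the response arrives as a JSON string in the shape above; `ToolError` and `unwrap` are hypothetical names, and the error text in the example is made up.

```python
import json


class ToolError(RuntimeError):
    """Raised when a tool response reports success == false."""


def unwrap(raw: str):
    """Return the `data` field, or raise ToolError carrying the `error` text."""
    response = json.loads(raw)
    if not response.get("success", False):
        raise ToolError(response.get("error") or "tool call failed")
    return response.get("data")


ok = '{"success": true, "data": "Example Domain", "error": null}'
print(unwrap(ok))  # Example Domain

failed = '{"success": false, "data": null, "error": "element not found"}'
try:
    unwrap(failed)
except ToolError as exc:
    print(f"tool failed: {exc}")  # tool failed: element not found
```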
## Example Workflows
### Basic Login Workflow
```json
// 1. Navigate to a page
{
"name": "web_navigate_cremotemcp",
"arguments": {
"url": "https://example.com/login",
"screenshot": true
}
}
// 2. Check if login form exists
{
"name": "web_element_check_cremotemcp",
"arguments": {
"selector": "#login-form",
"check_type": "exists"
}
}
// 3. Fill login form using bulk operations
{
"name": "web_form_fill_bulk_cremotemcp",
"arguments": {
"form_selector": "#login-form",
"fields": {
"username": "testuser",
"password": "password123"
}
}
}
// 4. Submit and verify
{
"name": "web_interact_cremotemcp",
"arguments": {
"action": "click",
"selector": "#login-button"
}
}
// 5. Extract multiple results at once
{
"name": "web_extract_multiple_cremotemcp",
"arguments": {
"selectors": {
"welcome_message": ".welcome-message",
"user_name": ".user-profile .name",
"last_login": ".user-info .last-login"
}
}
}
// 6. Take enhanced screenshot with metadata
{
"name": "web_screenshot_enhanced_cremotemcp",
"arguments": {
"output": "/tmp/login-success.png",
"full_page": true
}
}
```
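A workflow like this is a list of tool calls executed in order, stopping when one fails. The Python sketch below shows that control flow. `call_tool` stands in for whatever transport the client uses (it is a hypothetical callable, not a cremote API), and the fake implementation here only simulates a missing login form.

```python
def run_workflow(steps, call_tool):
    """Run steps in order; stop after the first response whose success is false."""
    results = []
    for step in steps:
        response = call_tool(step["name"], step.get("arguments", {}))
        results.append(response)
        if not response.get("success", False):
            break
    return results


login_steps = [
    {"name": "web_navigate_cremotemcp",
     "arguments": {"url": "https://example.com/login", "screenshot": True}},
    {"name": "web_element_check_cremotemcp",
     "arguments": {"selector": "#login-form", "check_type": "exists"}},
    {"name": "web_form_fill_bulk_cremotemcp",
     "arguments": {"form_selector": "#login-form",
                   "fields": {"username": "testuser", "password": "password123"}}},
]


def fake_call_tool(name, arguments):
    """Stand-in transport: pretends the login form check fails."""
    return {"success": name != "web_element_check_cremotemcp", "data": None}


results = run_workflow(login_steps, fake_call_tool)
print(len(results))  # 2: the form fill is never attempted
```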
### Advanced E-commerce Data Extraction Workflow
```json
// 1. Navigate and check page state
{
"name": "web_navigate_cremotemcp",
"arguments": {
"url": "https://shop.example.com/products",
"screenshot": true
}
}
// 2. Get page performance metrics
{
"name": "web_performance_metrics_cremotemcp",
"arguments": {}
}
// 3. Extract all product data in one call
{
"name": "web_extract_multiple_cremotemcp",
"arguments": {
"selectors": {
"product_titles": ".product-card h3",
"prices": ".product-card .price",
"ratings": ".product-card .rating",
"availability": ".product-card .stock-status"
}
}
}
// 4. Extract all product links with filtering
{
"name": "web_extract_links_cremotemcp",
"arguments": {
"container_selector": ".product-grid",
"href_pattern": ".*/product/.*",
"text_pattern": ".*"
}
}
// 5. Check if more products are loading
{
"name": "web_content_check_cremotemcp",
"arguments": {
"type": "scripts"
}
}
```
### Phase 6: Accessibility Tree Support (3 tools)
#### `get_accessibility_tree_cremotemcp`
Get the accessibility tree for a page, either in full or limited to a given depth.
```json
{
"name": "get_accessibility_tree_cremotemcp",
"arguments": {
"tab": "optional-tab-id",
"depth": 3,
"timeout": 10
}
}
```
#### `get_partial_accessibility_tree_cremotemcp`
Get accessibility tree for a specific element and its relatives.
```json
{
"name": "get_partial_accessibility_tree_cremotemcp",
"arguments": {
"selector": "form",
"tab": "optional-tab-id",
"fetch_relatives": true,
"timeout": 10
}
}
```
#### `query_accessibility_tree_cremotemcp`
Query accessibility tree for nodes matching specific criteria.
```json
{
"name": "query_accessibility_tree_cremotemcp",
"arguments": {
"tab": "optional-tab-id",
"selector": "form",
"accessible_name": "Submit",
"role": "button",
"timeout": 10
}
}
```
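A query usually needs only some of these criteria. The Python sketch below builds the request from whichever criteria are set. It assumes unset criteria can simply be omitted, which this README does not confirm; `build_ax_query` is a hypothetical helper.

```python
def build_ax_query(role=None, accessible_name=None, selector=None, tab=None, timeout=10):
    """Build a query_accessibility_tree_cremotemcp request from the given criteria.

    ASSUMPTION: criteria left as None may be dropped from the arguments.
    """
    optional = {
        "tab": tab,
        "selector": selector,
        "accessible_name": accessible_name,
        "role": role,
    }
    arguments = {key: value for key, value in optional.items() if value is not None}
    arguments["timeout"] = timeout
    return {"name": "query_accessibility_tree_cremotemcp", "arguments": arguments}


query = build_ax_query(role="button", accessible_name="Submit")
print(query["arguments"])  # {'accessible_name': 'Submit', 'role': 'button', 'timeout': 10}
```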
## Benefits Over CLI
### 🎯 **Enhanced Efficiency**
- **State Management**: No need to manually track tab IDs
- **Batch Operations**: 10x efficiency with bulk form filling and multi-selector extraction
- **Intelligent Defaults**: Smart parameter handling and fallbacks
- **Resource Cleanup**: Automatic management of tabs and files
### 🔍 **Better Intelligence**
- **Conditional Logic**: Element checking enables smart decision making
- **Rich Context**: Page state, performance metrics, and content verification
- **Form Intelligence**: Complete form analysis before interaction
- **Error Prevention**: State validation before actions
### 🛠 **Advanced Capabilities**
- **Enhanced Screenshots**: Element-specific and metadata-rich capture
- **File Management**: Bulk operations and automated cleanup
- **Better Error Context**: Rich error information for debugging
- **Structured Responses**: Consistent, parseable response format
## 🎉 Production Ready
This comprehensive web automation platform is **production ready** with:
- **31 Tools**: Complete coverage of web automation needs
- **6 Enhancement Phases**: Systematic capability building from basic to advanced
- **Extensive Testing**: All tools validated and documented
- **LLM Optimized**: Designed specifically for AI agent workflows
- **Backward Compatible**: All existing tools continue to work unchanged
### 📊 **Capability Matrix**
| Category | Tools | Key Benefits |
|----------|-------|--------------|
| **Core Web Automation** | 10 tools | Navigation, interaction, extraction, screenshots, tabs, iframes, files, console |
| **Element Intelligence** | 2 tools | Conditional logic, state checking, attribute inspection |
| **Data Extraction** | 4 tools | Batch extraction, structured data, pattern matching, table processing |
| **Form Automation** | 3 tools | Form analysis, bulk filling, batch interactions |
| **Page Intelligence** | 4 tools | Page state, performance metrics, content verification, viewport info |
| **Enhanced Capabilities** | 4 tools | Element screenshots, enhanced metadata, bulk file ops, file management |
| **Accessibility Tree** | 3 tools | Semantic understanding, accessibility testing, screen reader simulation |
## Development
To extend the MCP server with new tools:
1. Add the tool definition to `handleToolsList()`
2. Add a case in `handleToolCall()`
3. Implement the handler function following the pattern of existing handlers
4. Update this documentation
The server is designed to be easily extensible while maintaining consistency with the cremote client library.
---
**🚀 Ready for Production**: Complete web automation platform with 31 tools across 6 enhancement phases, optimized for LLM agents and production workflows.