# LLM Agent Guide: Using Cremote MCP Server for Web Automation

This document provides comprehensive guidance for LLM agents on how to use the **Cremote MCP Server** for intelligent web automation. The MCP server provides a structured, stateful interface that's optimized for AI-driven web testing and automation workflows.

## What is the Cremote MCP Server?

The **Cremote MCP Server** is a Model Context Protocol implementation that wraps cremote's web automation capabilities in a structured API designed specifically for LLMs. Unlike CLI commands, the MCP server provides:

- **Automatic State Management**: Tracks current tab, tab history, and iframe context
- **Intelligent Abstractions**: High-level tools that combine multiple operations
- **Rich Error Context**: Detailed error information for better debugging
- **Automatic Screenshots**: Built-in screenshot capture for documentation
- **Structured Responses**: Consistent, parseable JSON responses

## Prerequisites

Before using the MCP server, ensure the cremote infrastructure is running:

1. **Check if everything is already running:**
   ```bash
   cremote status
   ```

2. **Start Chromium with remote debugging (if needed):**
   ```bash
   chromium --remote-debugging-port=9222 --user-data-dir=/tmp/chromium-debug &
   ```

3. **Start cremote daemon (if needed):**
   ```bash
   cremotedaemon &
   ```

4. **The MCP server should be configured in your MCP client** (e.g., Claude Desktop)

## Available MCP Tools

### 1. `web_navigate` - Smart Navigation

Navigate to URLs with automatic tab management and optional screenshot capture.

**Parameters:**

- `url` (required): URL to navigate to
- `tab` (optional): Specific tab ID (uses current tab if not specified)
- `screenshot` (optional): Take screenshot after navigation (default: false)
- `timeout` (optional): Timeout in seconds (default: 5)

**Example:**

```json
{
  "name": "web_navigate",
  "arguments": { "url": "https://example.com/login", "screenshot": true, "timeout": 10 }
}
```

**Smart Behavior:**

- Automatically opens a new tab if none exists
- Updates current tab tracking
- Adds tab to history for easy switching

### 2. `web_interact` - Element Interactions

Interact with web elements through a unified interface.

**Parameters:**

- `action` (required): "click", "fill", "submit", or "upload"
- `selector` (required): CSS selector for the target element
- `value` (optional): Value for fill/upload actions
- `tab` (optional): Tab ID (uses current tab if not specified)
- `timeout` (optional): Timeout in seconds (default: 5)

**Examples:**

```json
// Fill a form field
{
  "name": "web_interact",
  "arguments": { "action": "fill", "selector": "#username", "value": "testuser" }
}

// Click a button
{
  "name": "web_interact",
  "arguments": { "action": "click", "selector": "#login-button" }
}

// Submit a form
{
  "name": "web_interact",
  "arguments": { "action": "submit", "selector": "form#login-form" }
}

// Upload a file
{
  "name": "web_interact",
  "arguments": { "action": "upload", "selector": "input[type=file]", "value": "/path/to/file.pdf" }
}
```

### 3. `web_extract` - Data Extraction

Extract information from web pages through multiple methods.

**Parameters:**

- `type` (required): "source", "element", or "javascript"
- `selector` (optional): CSS selector (required for "element" type)
- `code` (optional): JavaScript code (required for "javascript" type)
- `tab` (optional): Tab ID (uses current tab if not specified)
- `timeout` (optional): Timeout in seconds (default: 5)

**Examples:**

```json
// Get page source
{
  "name": "web_extract",
  "arguments": { "type": "source" }
}

// Get specific element HTML
{
  "name": "web_extract",
  "arguments": { "type": "element", "selector": ".error-message" }
}

// Execute JavaScript and get result
{
  "name": "web_extract",
  "arguments": { "type": "javascript", "code": "document.title" }
}

// Check form validation
{
  "name": "web_extract",
  "arguments": { "type": "javascript", "code": "document.getElementById('email').validity.valid" }
}
```

### 4. `web_screenshot` - Screenshot Capture

Take screenshots for documentation and debugging.

**Parameters:**

- `output` (required): File path for the screenshot
- `full_page` (optional): Capture full page vs viewport (default: false)
- `tab` (optional): Tab ID (uses current tab if not specified)
- `timeout` (optional): Timeout in seconds (default: 5)

**Examples:**

```json
// Viewport screenshot
{
  "name": "web_screenshot",
  "arguments": { "output": "/tmp/login-page.png" }
}

// Full page screenshot
{
  "name": "web_screenshot",
  "arguments": { "output": "/tmp/full-page.png", "full_page": true }
}
```

### 5. `web_manage_tabs` - Tab Management

Manage browser tabs with automatic state tracking.

**Parameters:**

- `action` (required): "open", "close", "list", or "switch"
- `tab` (optional): Tab ID (required for "close" and "switch" actions)
- `timeout` (optional): Timeout in seconds (default: 5)

**Examples:**

```json
// Open new tab
{
  "name": "web_manage_tabs",
  "arguments": { "action": "open" }
}

// List all tabs
{
  "name": "web_manage_tabs",
  "arguments": { "action": "list" }
}

// Switch to specific tab
{
  "name": "web_manage_tabs",
  "arguments": { "action": "switch", "tab": "tab-id-123" }
}

// Close current tab
{
  "name": "web_manage_tabs",
  "arguments": { "action": "close" }
}
```

### 6. `web_iframe` - Iframe Context Management

Switch between main page and iframe contexts for testing embedded content.

**Parameters:**

- `action` (required): "enter" or "exit"
- `selector` (optional): Iframe CSS selector (required for "enter" action)
- `tab` (optional): Tab ID (uses current tab if not specified)

**Examples:**

```json
// Enter iframe context
{
  "name": "web_iframe",
  "arguments": { "action": "enter", "selector": "iframe#payment-form" }
}

// Exit iframe context (return to main page)
{
  "name": "web_iframe",
  "arguments": { "action": "exit" }
}
```

## Response Format

All MCP tools return a consistent response structure:

```json
{
  "success": true,
  "data": "...",                   // Tool-specific response data
  "screenshot": "/tmp/shot.png",   // Screenshot path (if captured)
  "current_tab": "tab-id-123",     // Current active tab
  "tab_history": ["tab-id-123"],   // Tab history stack
  "iframe_mode": false,            // Whether in iframe context
  "error": null,                   // Error message (if failed)
  "metadata": {}                   // Additional context information
}
```

## Common Automation Patterns

### 1. Login Flow Testing

```json
// 1. Navigate to login page with screenshot
{
  "name": "web_navigate",
  "arguments": { "url": "https://myapp.com/login", "screenshot": true }
}

// 2. Fill credentials
{
  "name": "web_interact",
  "arguments": { "action": "fill", "selector": "#email", "value": "user@example.com" }
}
{
  "name": "web_interact",
  "arguments": { "action": "fill", "selector": "#password", "value": "password123" }
}

// 3. Submit login
{
  "name": "web_interact",
  "arguments": { "action": "click", "selector": "#login-button" }
}

// 4. Verify success
{
  "name": "web_extract",
  "arguments": { "type": "javascript", "code": "document.querySelector('.welcome-message')?.textContent" }
}

// 5. Document result
{
  "name": "web_screenshot",
  "arguments": { "output": "/tmp/login-success.png" }
}
```

### 2. Form Validation Testing

```json
// 1. Navigate to form
{
  "name": "web_navigate",
  "arguments": { "url": "https://myapp.com/register" }
}

// 2. Test empty form submission
{
  "name": "web_interact",
  "arguments": { "action": "click", "selector": "#submit-button" }
}

// 3. Check for validation errors
{
  "name": "web_extract",
  "arguments": { "type": "element", "selector": ".error-message" }
}

// 4. Test invalid email
{
  "name": "web_interact",
  "arguments": { "action": "fill", "selector": "#email", "value": "invalid-email" }
}

// 5. Verify JavaScript validation
{
  "name": "web_extract",
  "arguments": { "type": "javascript", "code": "document.getElementById('email').validity.valid" }
}
```

### 3. Multi-Tab Workflow

```json
// 1. Open multiple tabs for comparison
{
  "name": "web_manage_tabs",
  "arguments": { "action": "open" }
}
{
  "name": "web_navigate",
  "arguments": { "url": "https://app.com/admin" }
}
{
  "name": "web_manage_tabs",
  "arguments": { "action": "open" }
}
{
  "name": "web_navigate",
  "arguments": { "url": "https://app.com/user" }
}

// 2. List tabs to see current state
{
  "name": "web_manage_tabs",
  "arguments": { "action": "list" }
}

// 3. Switch between tabs as needed
{
  "name": "web_manage_tabs",
  "arguments": { "action": "switch", "tab": "first-tab-id" }
}
```

### 4. Iframe Testing (Payment Forms, Widgets)

```json
// 1. Navigate to page with iframe
{
  "name": "web_navigate",
  "arguments": { "url": "https://shop.com/checkout" }
}

// 2. Enter iframe context
{
  "name": "web_iframe",
  "arguments": { "action": "enter", "selector": "iframe.payment-frame" }
}

// 3. Interact with iframe content
{
  "name": "web_interact",
  "arguments": { "action": "fill", "selector": "#card-number", "value": "4111111111111111" }
}
{
  "name": "web_interact",
  "arguments": { "action": "fill", "selector": "#expiry", "value": "12/25" }
}

// 4. Exit iframe context
{
  "name": "web_iframe",
  "arguments": { "action": "exit" }
}

// 5. Continue with main page
{
  "name": "web_interact",
  "arguments": { "action": "click", "selector": "#complete-order" }
}
```

## Best Practices for LLMs

### 1. State Awareness

- The MCP server tracks state automatically, but always check each response for the current context
- Use the `current_tab` and `iframe_mode` fields to confirm which tab and which context (main page or iframe) you are in
- Use `tab_history` to see which tabs are available

### 2. Error Handling

- Always check the `success` field in responses
- Use the `error` field for detailed error information
- Take screenshots when errors occur for debugging, for example with `"screenshot": true` on `web_navigate` or with a `web_screenshot` call

### 3. Timeout Management

- Use longer timeouts for slow-loading pages or complex interactions
- The default 5-second timeout works for most scenarios
- Increase timeouts for file uploads or heavy JavaScript applications

### 4. Screenshot Strategy

- Take screenshots at key points for documentation
- Use `"full_page": true` for comprehensive page captures
- Screenshot before and after critical actions for debugging

### 5. Verification Patterns

- Always verify that actions completed successfully
- Use JavaScript extraction to check application state
- Combine element extraction with JavaScript validation

## Debugging Failed Tests

### 1. Capture Current State

```json
// Get page source for analysis
{
  "name": "web_extract",
  "arguments": { "type": "source" }
}

// Take screenshot to see visual state
{
  "name": "web_screenshot",
  "arguments": { "output": "/tmp/debug-state.png", "full_page": true }
}

// Check whether console.error has been replaced or wrapped
// (returns the function's source text; it does not list previously logged errors)
{
  "name": "web_extract",
  "arguments": { "type": "javascript", "code": "console.error.toString()" }
}
```

### 2. Element Debugging

```json
// Check if element exists
{
  "name": "web_extract",
  "arguments": { "type": "javascript", "code": "document.querySelector('#my-element') !== null" }
}

// Get element properties
{
  "name": "web_extract",
  "arguments": { "type": "javascript", "code": "JSON.stringify({visible: document.querySelector('#my-element')?.offsetParent !== null, text: document.querySelector('#my-element')?.textContent})" }
}
```

### 3. Network and Loading Issues

```json
// Check if page is still loading
{
  "name": "web_extract",
  "arguments": { "type": "javascript", "code": "document.readyState" }
}

// Check whether a global error handler is installed
// (window.onerror is a handler, so this does not report past errors)
{
  "name": "web_extract",
  "arguments": { "type": "javascript", "code": "window.onerror ? 'onerror handler installed' : 'no onerror handler'" }
}
```

## Advantages Over CLI Commands

### 1. **Automatic State Management**

- No need to manually track tab IDs
- Automatic current tab resolution
- Persistent iframe context tracking

### 2. **Rich Error Context**

- Detailed error messages with context
- Automatic screenshot capture on failures
- Structured error responses for better debugging

### 3. **Intelligent Abstractions**

- Combined operations in single tools
- Smart parameter defaults and validation
- Automatic resource management

### 4. **Better Performance**

- Direct library integration (no subprocess overhead)
- Persistent connections to cremote daemon
- Efficient state tracking

### 5. **Structured Responses**

- Consistent JSON format for all responses
- Rich metadata for decision making
- Easy parsing and error handling

## Key Differences from CLI Usage

| Aspect | CLI Commands | MCP Server |
|--------|-------------|------------|
| **State Tracking** | Manual tab ID management | Automatic state management |
| **Error Handling** | Text parsing required | Structured error objects |
| **Screenshots** | Manual command execution | Automatic capture options |
| **Performance** | Subprocess overhead | Direct library calls |
| **Response Format** | Text output | Structured JSON |
| **Context Management** | Manual iframe tracking | Automatic context switching |
| **Resource Cleanup** | Manual tab management | Automatic resource tracking |

The Cremote MCP Server transforms web automation from a series of CLI commands into an intelligent, stateful API that's optimized for AI-driven testing and automation workflows.
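
The response-handling advice in this guide (check `success`, read `error`, track `current_tab` and `iframe_mode`, capture state on failure) can be sketched as a small client-side helper. This is a minimal illustration in Python and not part of cremote: `ToolResponse`, `parse_response`, and `debug_calls` are hypothetical names, and the sketch assumes the response arrives as a plain JSON string carrying the fields listed under Response Format.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ToolResponse:
    """Hypothetical container mirroring the fields under Response Format."""
    success: bool
    data: Any = None
    screenshot: Optional[str] = None
    current_tab: Optional[str] = None
    tab_history: list = field(default_factory=list)
    iframe_mode: bool = False
    error: Optional[str] = None
    metadata: dict = field(default_factory=dict)


def parse_response(raw: str) -> ToolResponse:
    """Parse a raw JSON response string, tolerating missing or null fields."""
    payload = json.loads(raw)
    return ToolResponse(
        success=bool(payload.get("success", False)),
        data=payload.get("data"),
        screenshot=payload.get("screenshot"),
        current_tab=payload.get("current_tab"),
        tab_history=list(payload.get("tab_history") or []),
        iframe_mode=bool(payload.get("iframe_mode", False)),
        error=payload.get("error"),
        metadata=dict(payload.get("metadata") or {}),
    )


def debug_calls(resp: ToolResponse) -> list:
    """Return follow-up tool calls to capture state after a failed call.

    A successful response needs no follow-up, so the list is empty.
    The output path is an illustrative assumption.
    """
    if resp.success:
        return []
    return [
        {
            "name": "web_screenshot",
            "arguments": {"output": "/tmp/debug-state.png", "full_page": True},
        },
        {
            "name": "web_extract",
            "arguments": {"type": "javascript", "code": "document.readyState"},
        },
    ]
```

A caller would pass each tool result through `parse_response`, then issue whatever `debug_calls` returns before deciding whether to retry. Keeping the follow-up calls as plain dictionaries means they can be serialized directly into the same `name` / `arguments` shape used throughout this guide.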