LLM Agent Guide: Using Cremote MCP Server for Web Automation
This document provides comprehensive guidance for LLM agents on how to use the Cremote MCP Server for intelligent web automation. The MCP server provides a structured, stateful interface that's optimized for AI-driven web testing and automation workflows.
What is the Cremote MCP Server?
The Cremote MCP Server is a Model Context Protocol implementation that wraps cremote's web automation capabilities in a structured API designed specifically for LLMs. Unlike CLI commands, the MCP server provides:
- Automatic State Management: Tracks current tab, tab history, and iframe context
- Intelligent Abstractions: High-level tools that combine multiple operations
- Rich Error Context: Detailed error information for better debugging
- Automatic Screenshots: Built-in screenshot capture for documentation
- Structured Responses: Consistent, parseable JSON responses
Prerequisites
Before using the MCP server, ensure the cremote infrastructure is running:
- Check if everything is already running:
  cremote status
- Start Chromium with remote debugging (if needed):
  chromium --remote-debugging-port=9222 --user-data-dir=/tmp/chromium-debug &
- Start cremote daemon (if needed):
  cremotedaemon &
- The MCP server should be configured in your MCP client (e.g., Claude Desktop)
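As a rough pre-flight check, a client could confirm that something is listening on the remote debugging port before issuing any tool calls. The sketch below assumes Chromium was started with --remote-debugging-port=9222 on the local machine, as in the step above. The function name port_is_open is hypothetical, and the check only proves that a TCP listener exists; it does not confirm that the listener is Chromium or that cremotedaemon is running.

```python
import socket


def port_is_open(host: str = "127.0.0.1", port: int = 9222, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if port_is_open():
        print("Something is listening on 127.0.0.1:9222")
    else:
        print("Nothing is listening on 127.0.0.1:9222; start Chromium first")
```

The cremote status command remains the authoritative check; this sketch is only useful when a script needs a quick yes/no answer.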
Available MCP Tools
1. web_navigate - Smart Navigation
Navigate to URLs with automatic tab management and optional screenshot capture.
Parameters:
- url (required): URL to navigate to
- tab (optional): Specific tab ID (uses current tab if not specified)
- screenshot (optional): Take screenshot after navigation (default: false)
- timeout (optional): Timeout in seconds (default: 5)
Example:
{
"name": "web_navigate",
"arguments": {
"url": "https://example.com/login",
"screenshot": true,
"timeout": 10
}
}
Smart Behavior:
- Automatically opens a new tab if none exists
- Updates current tab tracking
- Adds tab to history for easy switching
2. web_interact - Element Interactions
Interact with web elements through a unified interface.
Parameters:
- action (required): "click", "fill", "submit", or "upload"
- selector (required): CSS selector for the target element
- value (optional): Value for fill/upload actions
- tab (optional): Tab ID (uses current tab if not specified)
- timeout (optional): Timeout in seconds (default: 5)
Examples:
// Fill a form field
{
"name": "web_interact",
"arguments": {
"action": "fill",
"selector": "#username",
"value": "testuser"
}
}
// Click a button
{
"name": "web_interact",
"arguments": {
"action": "click",
"selector": "#login-button"
}
}
// Submit a form
{
"name": "web_interact",
"arguments": {
"action": "submit",
"selector": "form#login-form"
}
}
// Upload a file
{
"name": "web_interact",
"arguments": {
"action": "upload",
"selector": "input[type=file]",
"value": "/path/to/file.pdf"
}
}
3. web_extract - Data Extraction
Extract information from web pages through multiple methods.
Parameters:
- type (required): "source", "element", or "javascript"
- selector (optional): CSS selector (required for "element" type)
- code (optional): JavaScript code (required for "javascript" type)
- tab (optional): Tab ID (uses current tab if not specified)
- timeout (optional): Timeout in seconds (default: 5)
Examples:
// Get page source
{
"name": "web_extract",
"arguments": {
"type": "source"
}
}
// Get specific element HTML
{
"name": "web_extract",
"arguments": {
"type": "element",
"selector": ".error-message"
}
}
// Execute JavaScript and get result
{
"name": "web_extract",
"arguments": {
"type": "javascript",
"code": "document.title"
}
}
// Check form validation
{
"name": "web_extract",
"arguments": {
"type": "javascript",
"code": "document.getElementById('email').validity.valid"
}
}
4. web_screenshot - Screenshot Capture
Take screenshots for documentation and debugging.
Parameters:
- output (required): File path for the screenshot
- full_page (optional): Capture full page vs viewport (default: false)
- tab (optional): Tab ID (uses current tab if not specified)
- timeout (optional): Timeout in seconds (default: 5)
Examples:
// Viewport screenshot
{
"name": "web_screenshot",
"arguments": {
"output": "/tmp/login-page.png"
}
}
// Full page screenshot
{
"name": "web_screenshot",
"arguments": {
"output": "/tmp/full-page.png",
"full_page": true
}
}
5. web_manage_tabs - Tab Management
Manage browser tabs with automatic state tracking.
Parameters:
- action (required): "open", "close", "list", or "switch"
- tab (optional): Tab ID (required for "close" and "switch" actions)
- timeout (optional): Timeout in seconds (default: 5)
Examples:
// Open new tab
{
"name": "web_manage_tabs",
"arguments": {
"action": "open"
}
}
// List all tabs
{
"name": "web_manage_tabs",
"arguments": {
"action": "list"
}
}
// Switch to specific tab
{
"name": "web_manage_tabs",
"arguments": {
"action": "switch",
"tab": "tab-id-123"
}
}
// Close current tab
{
"name": "web_manage_tabs",
"arguments": {
"action": "close"
}
}
6. web_iframe - Iframe Context Management
Switch between main page and iframe contexts for testing embedded content.
Parameters:
- action (required): "enter" or "exit"
- selector (optional): Iframe CSS selector (required for "enter" action)
- tab (optional): Tab ID (uses current tab if not specified)
Examples:
// Enter iframe context
{
"name": "web_iframe",
"arguments": {
"action": "enter",
"selector": "iframe#payment-form"
}
}
// Exit iframe context (return to main page)
{
"name": "web_iframe",
"arguments": {
"action": "exit"
}
}
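Several parameters above are optional in general but required for a particular action or type. A client could check these conditional rules before sending a call. The following is a minimal sketch, not part of the MCP server: RULES and validate_call are hypothetical names, and the table only encodes what the parameter lists in this guide state. The "close" action is deliberately left unchecked, because the example above closes the current tab without passing tab.

```python
from typing import Any

# tool -> (discriminating argument, {allowed value: [arguments required for it]})
# Transcribed from the parameter lists in this guide; not read from the server.
RULES: dict[str, tuple[str, dict[str, list[str]]]] = {
    "web_interact": ("action", {"click": ["selector"], "fill": ["selector"],
                                "submit": ["selector"], "upload": ["selector"]}),
    "web_extract": ("type", {"source": [], "element": ["selector"],
                             "javascript": ["code"]}),
    # "close" is not checked for "tab": the guide's own example omits it.
    "web_manage_tabs": ("action", {"open": [], "list": [], "close": [],
                                   "switch": ["tab"]}),
    "web_iframe": ("action", {"enter": ["selector"], "exit": []}),
}


def validate_call(call: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a tool call; an empty list means none found."""
    name = call.get("name")
    args = call.get("arguments", {})
    if name not in RULES:
        return []  # tools without conditional rules are not checked here
    key, allowed = RULES[name]
    choice = args.get(key)
    if choice not in allowed:
        return [f"{name}: {key} must be one of {sorted(allowed)}"]
    return [f"{name}: '{arg}' is required when {key} is '{choice}'"
            for arg in allowed[choice] if arg not in args]


if __name__ == "__main__":
    print(validate_call({"name": "web_extract", "arguments": {"type": "element"}}))
```

Keeping the rules in a table rather than in branching code makes it easy to adjust them if the server's actual behavior differs from this guide.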
Response Format
All MCP tools return a consistent response structure:
{
"success": true,
"data": "...", // Tool-specific response data
"screenshot": "/tmp/shot.png", // Screenshot path (if captured)
"current_tab": "tab-id-123", // Current active tab
"tab_history": ["tab-id-123"], // Tab history stack
"iframe_mode": false, // Whether in iframe context
"error": null, // Error message (if failed)
"metadata": {} // Additional context information
}
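On the client side, the response structure above maps naturally onto a small typed record. The sketch below assumes the response arrives as a plain JSON string with the field names shown (the // comments above are annotations, not part of the payload). ToolResponse and parse_response are hypothetical names used only for illustration.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ToolResponse:
    """Client-side view of the response fields listed in this guide."""
    success: bool
    data: Any = None
    screenshot: Optional[str] = None
    current_tab: Optional[str] = None
    tab_history: list[str] = field(default_factory=list)
    iframe_mode: bool = False
    error: Optional[str] = None
    metadata: dict[str, Any] = field(default_factory=dict)


def parse_response(raw: str) -> ToolResponse:
    """Parse a JSON response; unknown keys are ignored, null or missing keys get defaults."""
    obj = json.loads(raw)
    known = ToolResponse.__dataclass_fields__
    return ToolResponse(**{k: v for k, v in obj.items() if k in known and v is not None})


if __name__ == "__main__":
    raw = '{"success": true, "data": "Example Domain", "current_tab": "tab-id-123"}'
    print(parse_response(raw))
```

A response without a success field raises TypeError here, which is intentional in this sketch: a payload that omits it is not something the rest of the client should silently accept.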
Common Automation Patterns
1. Login Flow Testing
// 1. Navigate to login page with screenshot
{
"name": "web_navigate",
"arguments": {
"url": "https://myapp.com/login",
"screenshot": true
}
}
// 2. Fill credentials
{
"name": "web_interact",
"arguments": {
"action": "fill",
"selector": "#email",
"value": "user@example.com"
}
}
{
"name": "web_interact",
"arguments": {
"action": "fill",
"selector": "#password",
"value": "password123"
}
}
// 3. Submit login
{
"name": "web_interact",
"arguments": {
"action": "click",
"selector": "#login-button"
}
}
// 4. Verify success
{
"name": "web_extract",
"arguments": {
"type": "javascript",
"code": "document.querySelector('.welcome-message')?.textContent"
}
}
// 5. Document result
{
"name": "web_screenshot",
"arguments": {
"output": "/tmp/login-success.png"
}
}
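A flow like the one above is a fixed sequence in which later steps only make sense if earlier ones succeeded. One way to drive it from a script is sketched below. The send argument is a hypothetical stand-in for however your MCP client dispatches a tool call and returns the parsed response as a dict; run_steps is likewise an illustrative name, not a cremote API.

```python
from typing import Any, Callable

Call = dict[str, Any]
Response = dict[str, Any]


def run_steps(send: Callable[[Call], Response], steps: list[Call]) -> list[Response]:
    """Send calls in order, stopping after the first response whose success is not true."""
    results: list[Response] = []
    for step in steps:
        response = send(step)
        results.append(response)
        if not response.get("success"):
            break  # later steps depend on this one, so do not continue
    return results


if __name__ == "__main__":
    # Stub transport for demonstration only: pretends every call succeeds.
    def demo_send(call: Call) -> Response:
        return {"success": True, "data": call["name"], "error": None}

    login_steps = [
        {"name": "web_navigate", "arguments": {"url": "https://myapp.com/login"}},
        {"name": "web_interact", "arguments": {"action": "click", "selector": "#login-button"}},
    ]
    for result in run_steps(demo_send, login_steps):
        print(result)
```

The last element of the returned list is either the final step's response or the failing one, so the caller can inspect its error field directly.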
2. Form Validation Testing
// 1. Navigate to form
{
"name": "web_navigate",
"arguments": {
"url": "https://myapp.com/register"
}
}
// 2. Test empty form submission
{
"name": "web_interact",
"arguments": {
"action": "click",
"selector": "#submit-button"
}
}
// 3. Check for validation errors
{
"name": "web_extract",
"arguments": {
"type": "element",
"selector": ".error-message"
}
}
// 4. Test invalid email
{
"name": "web_interact",
"arguments": {
"action": "fill",
"selector": "#email",
"value": "invalid-email"
}
}
// 5. Verify JavaScript validation
{
"name": "web_extract",
"arguments": {
"type": "javascript",
"code": "document.getElementById('email').validity.valid"
}
}
3. Multi-Tab Workflow
// 1. Open multiple tabs for comparison
{
"name": "web_manage_tabs",
"arguments": {
"action": "open"
}
}
{
"name": "web_navigate",
"arguments": {
"url": "https://app.com/admin"
}
}
{
"name": "web_manage_tabs",
"arguments": {
"action": "open"
}
}
{
"name": "web_navigate",
"arguments": {
"url": "https://app.com/user"
}
}
// 2. List tabs to see current state
{
"name": "web_manage_tabs",
"arguments": {
"action": "list"
}
}
// 3. Switch between tabs as needed
{
"name": "web_manage_tabs",
"arguments": {
"action": "switch",
"tab": "first-tab-id"
}
}
4. Iframe Testing (Payment Forms, Widgets)
// 1. Navigate to page with iframe
{
"name": "web_navigate",
"arguments": {
"url": "https://shop.com/checkout"
}
}
// 2. Enter iframe context
{
"name": "web_iframe",
"arguments": {
"action": "enter",
"selector": "iframe.payment-frame"
}
}
// 3. Interact with iframe content
{
"name": "web_interact",
"arguments": {
"action": "fill",
"selector": "#card-number",
"value": "4111111111111111"
}
}
{
"name": "web_interact",
"arguments": {
"action": "fill",
"selector": "#expiry",
"value": "12/25"
}
}
// 4. Exit iframe context
{
"name": "web_iframe",
"arguments": {
"action": "exit"
}
}
// 5. Continue with main page
{
"name": "web_interact",
"arguments": {
"action": "click",
"selector": "#complete-order"
}
}
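The enter and exit calls in this pattern must stay paired, otherwise later main-page selectors would be resolved inside the iframe. A context manager is one way to guarantee the exit call is sent even when a step in between fails. This is a sketch under assumptions: send is a hypothetical callable that dispatches a tool call and returns the response as a dict, and iframe_context is an illustrative name.

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator

Send = Callable[[dict[str, Any]], dict[str, Any]]


@contextmanager
def iframe_context(send: Send, selector: str) -> Iterator[None]:
    """Enter an iframe, then always send the exit call, even if the body raises."""
    entered = send({"name": "web_iframe",
                    "arguments": {"action": "enter", "selector": selector}})
    if not entered.get("success"):
        raise RuntimeError(f"could not enter iframe {selector!r}: {entered.get('error')}")
    try:
        yield
    finally:
        send({"name": "web_iframe", "arguments": {"action": "exit"}})


if __name__ == "__main__":
    # Stub transport for demonstration only: prints each call and reports success.
    def demo_send(call: dict[str, Any]) -> dict[str, Any]:
        print(call)
        return {"success": True}

    with iframe_context(demo_send, "iframe.payment-frame"):
        demo_send({"name": "web_interact",
                   "arguments": {"action": "fill", "selector": "#card-number",
                                 "value": "4111111111111111"}})
```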
Best Practices for LLMs
1. State Awareness
- The MCP server automatically tracks state, but always check the response for current context
- Use the current_tab and iframe_mode fields to understand your current position
- The tab_history helps you understand available tabs
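To make the state fields easier to reason about between calls, a client might reduce each response to a one-line summary. The helper below is a hypothetical sketch that reads only the current_tab, iframe_mode, and tab_history fields described in this guide, and assumes the response has already been parsed into a dict.

```python
from typing import Any


def describe_context(response: dict[str, Any]) -> str:
    """Summarize where the session is, based on the state fields of a response."""
    current = response.get("current_tab")
    tab = current or "no active tab"
    where = "inside an iframe" if response.get("iframe_mode") else "on the main page"
    history = response.get("tab_history") or []
    others = [t for t in history if t != current]
    return f"{tab}, {where}, {len(others)} other tab(s) in history"


if __name__ == "__main__":
    print(describe_context({
        "current_tab": "tab-id-123",
        "tab_history": ["tab-id-123"],
        "iframe_mode": False,
    }))
```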
2. Error Handling
- Always check the success field in responses
- Use the error field for detailed error information
- Take screenshots when errors occur for debugging: "screenshot": true
3. Timeout Management
- Use longer timeouts for slow-loading pages or complex interactions
- Default 5-second timeouts work for most scenarios
- Increase timeouts for file uploads or heavy JavaScript applications
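One way to apply this advice is to start with the default timeout and retry with longer ones only when a call fails. The sketch below assumes a hypothetical send callable that dispatches a tool call and returns the response as a dict. The timeout values (5, 15, 30) are illustrative, and the function name is not part of cremote. It retries on any failure, since this guide does not specify how a timeout is distinguished from other errors.

```python
from typing import Any, Callable, Sequence

Call = dict[str, Any]
Response = dict[str, Any]


def call_with_escalating_timeout(
    send: Callable[[Call], Response],
    call: Call,
    timeouts: Sequence[int] = (5, 15, 30),
) -> Response:
    """Retry the same call with each timeout in turn until one attempt succeeds."""
    response: Response = {"success": False, "error": "no timeouts given"}
    for seconds in timeouts:
        # Copy the arguments so the caller's dict is not modified
        attempt = {"name": call["name"],
                   "arguments": {**call["arguments"], "timeout": seconds}}
        response = send(attempt)
        if response.get("success"):
            break
    return response


if __name__ == "__main__":
    # Stub transport for demonstration only: succeeds once the timeout reaches 15s.
    def demo_send(call: Call) -> Response:
        return {"success": call["arguments"]["timeout"] >= 15}

    print(call_with_escalating_timeout(
        demo_send, {"name": "web_navigate", "arguments": {"url": "https://example.com"}}))
```

Retrying is safest for navigation and extraction. Repeating a click, submit, or upload can repeat its side effects, so for those actions it is usually better to choose a generous timeout up front.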
4. Screenshot Strategy
- Take screenshots at key points for documentation
- Use full_page: true for comprehensive page captures
- Screenshot before and after critical actions for debugging
5. Verification Patterns
- Always verify actions completed successfully
- Use JavaScript extraction to check application state
- Combine element extraction with JavaScript validation
Debugging Failed Tests
1. Capture Current State
// Get page source for analysis
{
"name": "web_extract",
"arguments": {
"type": "source"
}
}
// Take screenshot to see visual state
{
"name": "web_screenshot",
"arguments": {
"output": "/tmp/debug-state.png",
"full_page": true
}
}
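// Collect uncaught errors from this point on. Messages already logged to the console cannot be
// read back from page JavaScript, so run this once early and again later to read what was captured.
// window.__capturedErrors is a hypothetical name chosen for this example.
{
"name": "web_extract",
"arguments": {
"type": "javascript",
"code": "(() => { if (!window.__capturedErrors) { window.__capturedErrors = []; window.addEventListener('error', e => window.__capturedErrors.push(e.message)); } return JSON.stringify(window.__capturedErrors); })()"
}
}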
2. Element Debugging
// Check if element exists
{
"name": "web_extract",
"arguments": {
"type": "javascript",
"code": "document.querySelector('#my-element') !== null"
}
}
// Get element properties (visible is false when the element is missing)
{
"name": "web_extract",
"arguments": {
"type": "javascript",
"code": "JSON.stringify({visible: document.querySelector('#my-element')?.offsetParent != null, text: document.querySelector('#my-element')?.textContent})"
}
}
3. Network and Loading Issues
// Check if page is still loading
{
"name": "web_extract",
"arguments": {
"type": "javascript",
"code": "document.readyState"
}
}
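// Check whether the page has installed a global error handler
// (window.onerror holds a handler function, not a record of past errors)
{
"name": "web_extract",
"arguments": {
"type": "javascript",
"code": "window.onerror ? 'Handler installed' : 'No handler installed'"
}
}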
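The document.readyState check above is most useful when repeated until the page finishes loading. The following sketch polls it through web_extract. It assumes a hypothetical send callable that dispatches a tool call and returns the response as a dict, and it assumes the data field carries the JavaScript result as a string (possibly wrapped in quotes); adjust the comparison if the server returns it differently.

```python
import time
from typing import Any, Callable

Call = dict[str, Any]
Response = dict[str, Any]


def wait_until_loaded(send: Callable[[Call], Response],
                      attempts: int = 10, delay: float = 0.5) -> bool:
    """Poll document.readyState until it is 'complete' or the attempts run out."""
    call = {"name": "web_extract",
            "arguments": {"type": "javascript", "code": "document.readyState"}}
    for _ in range(attempts):
        response = send(call)
        # Strip surrounding quotes in case the result is JSON-encoded (an assumption)
        state = str(response.get("data")).strip('"')
        if response.get("success") and state == "complete":
            return True
        time.sleep(delay)
    return False


if __name__ == "__main__":
    # Stub transport for demonstration only: reports the page as already loaded.
    print(wait_until_loaded(lambda call: {"success": True, "data": "complete"}))
```

A readyState of "complete" only means the initial document and its resources have loaded; single-page applications may still be rendering, so an application-specific check is often needed as well.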
Advantages Over CLI Commands
1. Automatic State Management
- No need to manually track tab IDs
- Automatic current tab resolution
- Persistent iframe context tracking
2. Rich Error Context
- Detailed error messages with context
- Automatic screenshot capture on failures
- Structured error responses for better debugging
3. Intelligent Abstractions
- Combined operations in single tools
- Smart parameter defaults and validation
- Automatic resource management
4. Better Performance
- Direct library integration (no subprocess overhead)
- Persistent connections to cremote daemon
- Efficient state tracking
5. Structured Responses
- Consistent JSON format for all responses
- Rich metadata for decision making
- Easy parsing and error handling
Key Differences from CLI Usage
| Aspect | CLI Commands | MCP Server |
|---|---|---|
| State Tracking | Manual tab ID management | Automatic state management |
| Error Handling | Text parsing required | Structured error objects |
| Screenshots | Manual command execution | Automatic capture options |
| Performance | Subprocess overhead | Direct library calls |
| Response Format | Text output | Structured JSON |
| Context Management | Manual iframe tracking | Automatic context switching |
| Resource Cleanup | Manual tab management | Automatic resource tracking |
The Cremote MCP Server transforms web automation from a series of CLI commands into an intelligent, stateful API that's optimized for AI-driven testing and automation workflows.