14 KiB

Raw Blame History

LLM Agent Guide: Using Cremote MCP Server for Web Automation

This document provides comprehensive guidance for LLM agents on how to use the Cremote MCP Server for intelligent web automation. The MCP server provides a structured, stateful interface that's optimized for AI-driven web testing and automation workflows.

What is the Cremote MCP Server?

The Cremote MCP Server is a Model Context Protocol implementation that wraps cremote's web automation capabilities in a structured API designed specifically for LLMs. Unlike CLI commands, the MCP server provides:

Automatic State Management: Tracks current tab, tab history, and iframe context
Intelligent Abstractions: High-level tools that combine multiple operations
Rich Error Context: Detailed error information for better debugging
Automatic Screenshots: Built-in screenshot capture for documentation
Structured Responses: Consistent, parseable JSON responses

Prerequisites

Before using the MCP server, ensure the cremote infrastructure is running:

Check if everything is already running:
```
cremote status
```

Start Chromium with remote debugging (if needed):

chromium --remote-debugging-port=9222 --user-data-dir=/tmp/chromium-debug &

Start cremote daemon (if needed):
```
cremotedaemon &
```
The MCP server should be configured in your MCP client (e.g., Claude Desktop)

Available MCP Tools

1. `web_navigate` - Smart Navigation

Navigate to URLs with automatic tab management and optional screenshot capture.

Parameters:

url (required): URL to navigate to
tab (optional): Specific tab ID (uses current tab if not specified)
screenshot (optional): Take screenshot after navigation (default: false)
timeout (optional): Timeout in seconds (default: 5)

Example:

{
  "name": "web_navigate",
  "arguments": {
    "url": "https://example.com/login",
    "screenshot": true,
    "timeout": 10
  }
}

Smart Behavior:

Automatically opens a new tab if none exists
Updates current tab tracking
Adds tab to history for easy switching

2. `web_interact` - Element Interactions

Interact with web elements through a unified interface.

Parameters:

action (required): "click", "fill", "submit", or "upload"
selector (required): CSS selector for the target element
value (optional): Value for fill/upload actions
tab (optional): Tab ID (uses current tab if not specified)
timeout (optional): Timeout in seconds (default: 5)

Examples:

// Fill a form field
{
  "name": "web_interact",
  "arguments": {
    "action": "fill",
    "selector": "#username",
    "value": "testuser"
  }
}

// Click a button
{
  "name": "web_interact",
  "arguments": {
    "action": "click",
    "selector": "#login-button"
  }
}

// Submit a form
{
  "name": "web_interact",
  "arguments": {
    "action": "submit",
    "selector": "form#login-form"
  }
}

// Upload a file
{
  "name": "web_interact",
  "arguments": {
    "action": "upload",
    "selector": "input[type=file]",
    "value": "/path/to/file.pdf"
  }
}

3. `web_extract` - Data Extraction

Extract information from web pages through multiple methods.

Parameters:

type (required): "source", "element", or "javascript"
selector (optional): CSS selector (required for "element" type)
code (optional): JavaScript code (required for "javascript" type)
tab (optional): Tab ID (uses current tab if not specified)
timeout (optional): Timeout in seconds (default: 5)

Examples:

// Get page source
{
  "name": "web_extract",
  "arguments": {
    "type": "source"
  }
}

// Get specific element HTML
{
  "name": "web_extract",
  "arguments": {
    "type": "element",
    "selector": ".error-message"
  }
}

// Execute JavaScript and get result
{
  "name": "web_extract",
  "arguments": {
    "type": "javascript",
    "code": "document.title"
  }
}

// Check form validation
{
  "name": "web_extract",
  "arguments": {
    "type": "javascript",
    "code": "document.getElementById('email').validity.valid"
  }
}

4. `web_screenshot` - Screenshot Capture

Take screenshots for documentation and debugging.

Parameters:

output (required): File path for the screenshot
full_page (optional): Capture full page vs viewport (default: false)
tab (optional): Tab ID (uses current tab if not specified)
timeout (optional): Timeout in seconds (default: 5)

Examples:

// Viewport screenshot
{
  "name": "web_screenshot",
  "arguments": {
    "output": "/tmp/login-page.png"
  }
}

// Full page screenshot
{
  "name": "web_screenshot",
  "arguments": {
    "output": "/tmp/full-page.png",
    "full_page": true
  }
}

5. `web_manage_tabs` - Tab Management

Manage browser tabs with automatic state tracking.

Parameters:

action (required): "open", "close", "list", or "switch"
tab (optional): Tab ID (required for "close" and "switch" actions)
timeout (optional): Timeout in seconds (default: 5)

Examples:

// Open new tab
{
  "name": "web_manage_tabs",
  "arguments": {
    "action": "open"
  }
}

// List all tabs
{
  "name": "web_manage_tabs",
  "arguments": {
    "action": "list"
  }
}

// Switch to specific tab
{
  "name": "web_manage_tabs",
  "arguments": {
    "action": "switch",
    "tab": "tab-id-123"
  }
}

// Close current tab
{
  "name": "web_manage_tabs",
  "arguments": {
    "action": "close"
  }
}

6. `web_iframe` - Iframe Context Management

Switch between main page and iframe contexts for testing embedded content.

Parameters:

action (required): "enter" or "exit"
selector (optional): Iframe CSS selector (required for "enter" action)
tab (optional): Tab ID (uses current tab if not specified)

Examples:

// Enter iframe context
{
  "name": "web_iframe",
  "arguments": {
    "action": "enter",
    "selector": "iframe#payment-form"
  }
}

// Exit iframe context (return to main page)
{
  "name": "web_iframe",
  "arguments": {
    "action": "exit"
  }
}

Response Format

All MCP tools return a consistent response structure:

{
  "success": true,
  "data": "...",                    // Tool-specific response data
  "screenshot": "/tmp/shot.png",    // Screenshot path (if captured)
  "current_tab": "tab-id-123",      // Current active tab
  "tab_history": ["tab-id-123"],    // Tab history stack
  "iframe_mode": false,             // Whether in iframe context
  "error": null,                    // Error message (if failed)
  "metadata": {}                    // Additional context information
}

Common Automation Patterns

// 1. Navigate to login page with screenshot
{
  "name": "web_navigate",
  "arguments": {
    "url": "https://myapp.com/login",
    "screenshot": true
  }
}

// 2. Fill credentials
{
  "name": "web_interact",
  "arguments": {
    "action": "fill",
    "selector": "#email",
    "value": "user@example.com"
  }
}

{
  "name": "web_interact",
  "arguments": {
    "action": "fill",
    "selector": "#password",
    "value": "password123"
  }
}

// 3. Submit login
{
  "name": "web_interact",
  "arguments": {
    "action": "click",
    "selector": "#login-button"
  }
}

// 4. Verify success
{
  "name": "web_extract",
  "arguments": {
    "type": "javascript",
    "code": "document.querySelector('.welcome-message')?.textContent"
  }
}

// 5. Document result
{
  "name": "web_screenshot",
  "arguments": {
    "output": "/tmp/login-success.png"
  }
}

2. Form Validation Testing

// 1. Navigate to form
{
  "name": "web_navigate",
  "arguments": {
    "url": "https://myapp.com/register"
  }
}

// 2. Test empty form submission
{
  "name": "web_interact",
  "arguments": {
    "action": "click",
    "selector": "#submit-button"
  }
}

// 3. Check for validation errors
{
  "name": "web_extract",
  "arguments": {
    "type": "element",
    "selector": ".error-message"
  }
}

// 4. Test invalid email
{
  "name": "web_interact",
  "arguments": {
    "action": "fill",
    "selector": "#email",
    "value": "invalid-email"
  }
}

// 5. Verify JavaScript validation
{
  "name": "web_extract",
  "arguments": {
    "type": "javascript",
    "code": "document.getElementById('email').validity.valid"
  }
}

3. Multi-Tab Workflow

// 1. Open multiple tabs for comparison
{
  "name": "web_manage_tabs",
  "arguments": {
    "action": "open"
  }
}

{
  "name": "web_navigate",
  "arguments": {
    "url": "https://app.com/admin"
  }
}

{
  "name": "web_manage_tabs",
  "arguments": {
    "action": "open"
  }
}

{
  "name": "web_navigate",
  "arguments": {
    "url": "https://app.com/user"
  }
}

// 2. List tabs to see current state
{
  "name": "web_manage_tabs",
  "arguments": {
    "action": "list"
  }
}

// 3. Switch between tabs as needed
{
  "name": "web_manage_tabs",
  "arguments": {
    "action": "switch",
    "tab": "first-tab-id"
  }
}

4. Iframe Testing (Payment Forms, Widgets)

// 1. Navigate to page with iframe
{
  "name": "web_navigate",
  "arguments": {
    "url": "https://shop.com/checkout"
  }
}

// 2. Enter iframe context
{
  "name": "web_iframe",
  "arguments": {
    "action": "enter",
    "selector": "iframe.payment-frame"
  }
}

// 3. Interact with iframe content
{
  "name": "web_interact",
  "arguments": {
    "action": "fill",
    "selector": "#card-number",
    "value": "4111111111111111"
  }
}

{
  "name": "web_interact",
  "arguments": {
    "action": "fill",
    "selector": "#expiry",
    "value": "12/25"
  }
}

// 4. Exit iframe context
{
  "name": "web_iframe",
  "arguments": {
    "action": "exit"
  }
}

// 5. Continue with main page
{
  "name": "web_interact",
  "arguments": {
    "action": "click",
    "selector": "#complete-order"
  }
}

Best Practices for LLMs

1. State Awareness

The MCP server automatically tracks state, but always check the response for current context
Use the current_tab and iframe_mode fields to understand your current position
The tab_history helps you understand available tabs

2. Error Handling

Always check the success field in responses
Use the error field for detailed error information
Take screenshots when errors occur for debugging: "screenshot": true

3. Timeout Management

Use longer timeouts for slow-loading pages or complex interactions
Default 5-second timeouts work for most scenarios
Increase timeouts for file uploads or heavy JavaScript applications

4. Screenshot Strategy

Take screenshots at key points for documentation
Use full_page: true for comprehensive page captures
Screenshot before and after critical actions for debugging

5. Verification Patterns

Always verify actions completed successfully
Use JavaScript extraction to check application state
Combine element extraction with JavaScript validation

Debugging Failed Tests

1. Capture Current State

// Get page source for analysis
{
  "name": "web_extract",
  "arguments": {
    "type": "source"
  }
}

// Take screenshot to see visual state
{
  "name": "web_screenshot",
  "arguments": {
    "output": "/tmp/debug-state.png",
    "full_page": true
  }
}

// Check JavaScript console errors
{
  "name": "web_extract",
  "arguments": {
    "type": "javascript",
    "code": "console.error.toString()"
  }
}

2. Element Debugging

// Check if element exists
{
  "name": "web_extract",
  "arguments": {
    "type": "javascript",
    "code": "document.querySelector('#my-element') !== null"
  }
}

// Get element properties
{
  "name": "web_extract",
  "arguments": {
    "type": "javascript",
    "code": "JSON.stringify({visible: document.querySelector('#my-element')?.offsetParent !== null, text: document.querySelector('#my-element')?.textContent})"
  }
}

3. Network and Loading Issues

// Check if page is still loading
{
  "name": "web_extract",
  "arguments": {
    "type": "javascript",
    "code": "document.readyState"
  }
}

// Check for JavaScript errors
{
  "name": "web_extract",
  "arguments": {
    "type": "javascript",
    "code": "window.onerror ? 'Errors detected' : 'No errors'"
  }
}

Advantages Over CLI Commands

1. Automatic State Management

No need to manually track tab IDs
Automatic current tab resolution
Persistent iframe context tracking

2. Rich Error Context

Detailed error messages with context
Automatic screenshot capture on failures
Structured error responses for better debugging

3. Intelligent Abstractions

Combined operations in single tools
Smart parameter defaults and validation
Automatic resource management

4. Better Performance

Direct library integration (no subprocess overhead)
Persistent connections to cremote daemon
Efficient state tracking

5. Structured Responses

Consistent JSON format for all responses
Rich metadata for decision making
Easy parsing and error handling

Key Differences from CLI Usage

Aspect	CLI Commands	MCP Server
State Tracking	Manual tab ID management	Automatic state management
Error Handling	Text parsing required	Structured error objects
Screenshots	Manual command execution	Automatic capture options
Performance	Subprocess overhead	Direct library calls
Response Format	Text output	Structured JSON
Context Management	Manual iframe tracking	Automatic context switching
Resource Cleanup	Manual tab management	Automatic resource tracking

The Cremote MCP Server transforms web automation from a series of CLI commands into an intelligent, stateful API that's optimized for AI-driven testing and automation workflows.

14 KiB Raw Blame History

LLM Agent Guide: Using Cremote MCP Server for Web Automation

What is the Cremote MCP Server?

Prerequisites

Available MCP Tools

1. web_navigate - Smart Navigation

2. web_interact - Element Interactions

3. web_extract - Data Extraction

4. web_screenshot - Screenshot Capture

5. web_manage_tabs - Tab Management

6. web_iframe - Iframe Context Management

Response Format

Common Automation Patterns

1. Login Flow Testing

2. Form Validation Testing

3. Multi-Tab Workflow

4. Iframe Testing (Payment Forms, Widgets)

Best Practices for LLMs

1. State Awareness

2. Error Handling

3. Timeout Management

4. Screenshot Strategy

5. Verification Patterns

Debugging Failed Tests

1. Capture Current State

2. Element Debugging

3. Network and Loading Issues

Advantages Over CLI Commands

1. Automatic State Management

2. Rich Error Context

3. Intelligent Abstractions

4. Better Performance

5. Structured Responses

Key Differences from CLI Usage

14 KiB

Raw Blame History

1. `web_navigate` - Smart Navigation

2. `web_interact` - Element Interactions

3. `web_extract` - Data Extraction

4. `web_screenshot` - Screenshot Capture

5. `web_manage_tabs` - Tab Management

6. `web_iframe` - Iframe Context Management