
Cremote MCP Server

This is a Model Context Protocol (MCP) server that exposes cremote's web automation capabilities to LLMs and AI agents. Instead of issuing CLI commands, agents call a structured API that maintains state between calls and provides higher-level abstractions over cremote operations.

🎉 Complete Web Automation Platform

31 comprehensive tools across 6 enhancement phases, providing a complete web automation toolkit for LLM agents:

  • Phase 1: Element state checking and conditional logic (2 tools)
  • Phase 2: Enhanced data extraction and batch operations (4 tools)
  • Phase 3: Form analysis and bulk operations (3 tools)
  • Phase 4: Page state and metadata tools (4 tools)
  • Phase 5: Enhanced screenshots and file management (4 tools)
  • Phase 6: Accessibility tree support (3 tools)
  • Core Tools: Essential web automation capabilities (10 tools)
  • Version Information: Version reporting for the MCP server and daemon (1 tool)

🚀 NEW: Multi-Client Support

The Cremote MCP server now supports multiple concurrent clients with isolated browser sessions:

  • Concurrent Agents: Multiple AI agents can use the same browser simultaneously
  • Session Isolation: Each client maintains independent browser state (tabs, history, iframe context)
  • Transport Flexibility: Choose between stdio (single client) or HTTP (multiple clients)
  • Backward Compatible: Existing stdio clients continue to work unchanged

See the Multi-Client Guide for detailed setup and usage instructions.

Features

  • State Management: Automatically tracks current tab, tab history, and iframe context
  • Intelligent Abstractions: High-level tools that combine multiple cremote operations
  • Batch Operations: Reduce round trips with bulk operations and multi-selector extraction
  • Form Intelligence: Complete form analysis and bulk filling capabilities
  • Rich Context: Page metadata, performance metrics, and content verification
  • Enhanced Screenshots: Element-specific and metadata-rich screenshot capture
  • File Management: Bulk file operations and automated cleanup
  • Accessibility Tree: Chrome accessibility tree interface for semantic understanding
  • Automatic Screenshots: Optional screenshot capture for debugging and documentation
  • Error Recovery: Better error handling and context for LLMs
  • Resource Management: Automatic cleanup and connection management

Quick Start for LLMs

For LLM agents: See the comprehensive LLM Usage Guide for detailed usage instructions, examples, and best practices.

Available Tools (31 Total)

Version Information

version_cremotemcp

Get version information for MCP server and daemon.

{
  "name": "version_cremotemcp",
  "arguments": {}
}

Returns version information for both the MCP server and the connected daemon.
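Each tool-call example in this README has the shape `{"name": ..., "arguments": ...}`, which is the `params` object of an MCP `tools/call` request. Claude Desktop handles this wrapping for you. If you write your own client, the minimal Go sketch below shows one way to wrap a call in a JSON-RPC 2.0 envelope. The `toolCall` helper and its types are illustrative only and are not part of cremote.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// callParams mirrors the {"name": ..., "arguments": ...} objects in this README.
type callParams struct {
	Name      string         `json:"name"`
	Arguments map[string]any `json:"arguments"`
}

// rpcRequest is a JSON-RPC 2.0 request envelope.
type rpcRequest struct {
	JSONRPC string     `json:"jsonrpc"`
	ID      int        `json:"id"`
	Method  string     `json:"method"`
	Params  callParams `json:"params"`
}

// toolCall (hypothetical helper) wraps one tool invocation in an MCP
// tools/call request and returns the encoded JSON.
func toolCall(id int, name string, args map[string]any) ([]byte, error) {
	if args == nil {
		args = map[string]any{} // encode as {} like the "arguments": {} example above, not null
	}
	return json.Marshal(rpcRequest{
		JSONRPC: "2.0",
		ID:      id,
		Method:  "tools/call",
		Params:  callParams{Name: name, Arguments: args},
	})
}

func main() {
	req, err := toolCall(1, "version_cremotemcp", nil)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(req))
	// {"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"version_cremotemcp","arguments":{}}}
}
```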

Core Web Automation Tools (10 tools)

1. web_navigate_cremotemcp

Navigate to URLs with optional screenshot capture.

{
  "name": "web_navigate_cremotemcp",
  "arguments": {
    "url": "https://example.com",
    "screenshot": true,
    "timeout": 10
  }
}

2. web_interact_cremotemcp

Interact with web elements (click, fill, submit, upload, select).

{
  "name": "web_interact_cremotemcp",
  "arguments": {
    "action": "fill",
    "selector": "#username",
    "value": "testuser",
    "timeout": 5
  }
}

For select dropdowns:

{
  "name": "web_interact_cremotemcp",
  "arguments": {
    "action": "select",
    "selector": "#country",
    "value": "United States",
    "timeout": 5
  }
}

3. web_extract_cremotemcp

Extract data from pages (source, element HTML, JavaScript execution).

{
  "name": "web_extract_cremotemcp",
  "arguments": {
    "type": "javascript",
    "code": "document.title",
    "timeout": 5
  }
}

4. web_screenshot_cremotemcp

Take screenshots of the current page.

{
  "name": "web_screenshot_cremotemcp",
  "arguments": {
    "output": "/tmp/page.png",
    "full_page": true,
    "timeout": 5
  }
}

5. web_manage_tabs_cremotemcp

Manage browser tabs (open, close, list, switch).

{
  "name": "web_manage_tabs_cremotemcp",
  "arguments": {
    "action": "open",
    "timeout": 5
  }
}

6. web_iframe_cremotemcp

Switch iframe context for subsequent operations.

{
  "name": "web_iframe_cremotemcp",
  "arguments": {
    "action": "enter",
    "selector": "iframe#payment-form"
  }
}

7. file_upload_cremotemcp

Upload files from client to container for use in form uploads.

{
  "name": "file_upload_cremotemcp",
  "arguments": {
    "local_path": "/local/file.txt",
    "container_path": "/tmp/file.txt"
  }
}

Note: The CLI cremote upload-file command now automatically transfers files to the daemon container first, making file uploads seamless even when the daemon runs in a container.

8. file_download_cremotemcp

Download files from container to client (e.g., downloaded files from browser).

{
  "name": "file_download_cremotemcp",
  "arguments": {
    "container_path": "/tmp/downloaded-file.pdf",
    "local_path": "/local/downloaded-file.pdf"
  }
}

9. console_logs_cremotemcp

Get console logs from the browser tab.

{
  "name": "console_logs_cremotemcp",
  "arguments": {
    "tab": "tab-123",
    "timeout": 5
  }
}

10. console_command_cremotemcp

Execute commands in the browser console.

{
  "name": "console_command_cremotemcp",
  "arguments": {
    "command": "document.getElementById('test').innerHTML = 'Hello World'",
    "tab": "tab-123",
    "timeout": 5
  }
}

Phase 1: Element State and Checking Tools (2 tools)

11. web_element_check_cremotemcp

Check element existence, visibility, enabled state, and other properties without interaction.

{
  "name": "web_element_check_cremotemcp",
  "arguments": {
    "selector": "#submit-button",
    "check_type": "all",
    "timeout": 5
  }
}

Check Types:

  • exists: Check if element exists in DOM
  • visible: Check if element is visible (not hidden)
  • enabled: Check if element is enabled (not disabled)
  • focused: Check if element has focus
  • selected: Check if element is selected (checkboxes, radio buttons)
  • all: Check all states above

Response includes:

{
  "exists": true,
  "visible": true,
  "enabled": false,
  "focused": false,
  "selected": true,
  "count": 1
}
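Check results are plain booleans, so an agent can gate an interaction on them. The Go sketch below assumes that the tool's data payload is the object shown above. The `canClick` helper and its rule ("exactly one visible, enabled match") are hypothetical choices, not cremote behavior.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// elementCheck mirrors the check_type "all" response shown above.
type elementCheck struct {
	Exists   bool `json:"exists"`
	Visible  bool `json:"visible"`
	Enabled  bool `json:"enabled"`
	Focused  bool `json:"focused"`
	Selected bool `json:"selected"`
	Count    int  `json:"count"`
}

// canClick reports whether a click is worth attempting:
// exactly one matching element that is visible and enabled.
func canClick(raw []byte) (bool, error) {
	var c elementCheck
	if err := json.Unmarshal(raw, &c); err != nil {
		return false, err
	}
	return c.Exists && c.Visible && c.Enabled && c.Count == 1, nil
}

func main() {
	resp := []byte(`{"exists":true,"visible":true,"enabled":false,"focused":false,"selected":true,"count":1}`)
	ok, err := canClick(resp)
	if err != nil {
		panic(err)
	}
	fmt.Println("click #submit-button:", ok) // false: the example button is disabled
}
```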

12. web_element_attributes_cremotemcp

Get element attributes, properties, and computed styles.

{
  "name": "web_element_attributes_cremotemcp",
  "arguments": {
    "selector": "#user-profile",
    "attributes": "all",
    "timeout": 5
  }
}

Attribute Options:

  • all: Get common attributes, properties, and styles
  • "id,class,href": Comma-separated list of specific attributes
  • "style_display,style_color": Computed styles (prefix with style_)
  • "prop_textContent,prop_value": JavaScript properties (prefix with prop_)
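The prefixes above can be generated rather than typed by hand. The README shows each form separately, so mixing plain attributes, `style_` entries, and `prop_` entries in one comma-separated list is an assumption here. The `attributeSpec` helper is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// attributeSpec (hypothetical helper) builds the comma-separated "attributes"
// argument: plain names are HTML attributes, computed styles get the style_
// prefix, and JavaScript properties get the prop_ prefix.
func attributeSpec(attrs, styles, props []string) string {
	parts := make([]string, 0, len(attrs)+len(styles)+len(props))
	parts = append(parts, attrs...)
	for _, s := range styles {
		parts = append(parts, "style_"+s)
	}
	for _, p := range props {
		parts = append(parts, "prop_"+p)
	}
	return strings.Join(parts, ",")
}

func main() {
	fmt.Println(attributeSpec([]string{"id", "class"}, []string{"display"}, []string{"textContent"}))
	// id,class,style_display,prop_textContent
}
```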

Example Response:

{
  "id": "user-profile",
  "class": "profile-card active",
  "data-user-id": "12345",
  "textContent": "John Doe",
  "style_display": "block",
  "style_color": "rgb(0, 0, 0)"
}

Phase 2: Enhanced Data Extraction Tools (4 tools)

13. web_extract_multiple_cremotemcp

Extract data from multiple selectors in a single call for improved efficiency.

{
  "name": "web_extract_multiple_cremotemcp",
  "arguments": {
    "selectors": {
      "title": "h1",
      "price": ".price",
      "description": ".product-description"
    },
    "timeout": 5
  }
}

14. web_extract_links_cremotemcp

Extract all links from a page, with optional filtering by container, URL pattern, and link text.

{
  "name": "web_extract_links_cremotemcp",
  "arguments": {
    "container_selector": "nav",
    "href_pattern": "https://.*",
    "text_pattern": ".*Download.*",
    "timeout": 5
  }
}
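To reason about what `href_pattern` and `text_pattern` select, you can model the filter on the client side. The Go sketch below treats both patterns as unanchored regular expressions that must both match. This is an assumption: the README does not specify anchoring or the regex dialect. The `link` type is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// link is a hypothetical shape for one extracted link.
type link struct {
	Href, Text string
}

// filterLinks keeps links whose href and text both match their patterns.
// Patterns are treated as unanchored regular expressions (an assumption).
func filterLinks(links []link, hrefPattern, textPattern string) ([]link, error) {
	hrefRE, err := regexp.Compile(hrefPattern)
	if err != nil {
		return nil, err
	}
	textRE, err := regexp.Compile(textPattern)
	if err != nil {
		return nil, err
	}
	var out []link
	for _, l := range links {
		if hrefRE.MatchString(l.Href) && textRE.MatchString(l.Text) {
			out = append(out, l)
		}
	}
	return out, nil
}

func main() {
	links := []link{
		{"https://example.com/app.zip", "Download app"},
		{"http://example.com/docs", "Download docs"},
		{"https://example.com/about", "About us"},
	}
	kept, err := filterLinks(links, "https://.*", ".*Download.*")
	if err != nil {
		panic(err)
	}
	fmt.Println(kept) // [{https://example.com/app.zip Download app}]
}
```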

15. web_extract_table_cremotemcp

Extract table data as structured JSON with optional header processing.

{
  "name": "web_extract_table_cremotemcp",
  "arguments": {
    "selector": "#data-table",
    "include_headers": true,
    "timeout": 5
  }
}

16. web_extract_text_cremotemcp

Extract text content with optional pattern matching and different extraction types.

{
  "name": "web_extract_text_cremotemcp",
  "arguments": {
    "selector": ".content",
    "pattern": "\\d{3}-\\d{3}-\\d{4}",
    "extract_type": "textContent",
    "timeout": 5
  }
}
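The `pattern` value is a regular expression inside a JSON string, so every backslash is doubled: the JSON `"\\d{3}-\\d{3}-\\d{4}"` decodes to `\d{3}-\d{3}-\d{4}`. The Go sketch below shows that encoding and previews the matches locally. It assumes the server's regex dialect agrees with Go's RE2 syntax for `\d` and `{n}`. The helper names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// patternArg JSON-encodes a raw regular expression for the "pattern" argument.
// json.Marshal escapes each backslash as \\, giving the form used in the example above.
func patternArg(re string) (string, error) {
	b, err := json.Marshal(re)
	return string(b), err
}

// findAll applies the decoded pattern locally, as a preview of what it selects.
// Assumption: the server's regex dialect agrees with Go's RE2 syntax for \d and {n}.
func findAll(pattern, text string) []string {
	return regexp.MustCompile(pattern).FindAllString(text, -1)
}

func main() {
	raw := `\d{3}-\d{3}-\d{4}` // raw string literal: single backslashes
	enc, err := patternArg(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(enc)                                                 // "\\d{3}-\\d{3}-\\d{4}"
	fmt.Println(findAll(raw, "Call 555-123-4567 or 555-987-6543.")) // [555-123-4567 555-987-6543]
}
```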

Phase 3: Form Analysis and Bulk Operations (3 tools)

17. web_form_analyze_cremotemcp

Analyze a form to understand its structure, fields, and submission requirements.

{
  "name": "web_form_analyze_cremotemcp",
  "arguments": {
    "selector": "#registration-form",
    "timeout": 10
  }
}

18. web_interact_multiple_cremotemcp

Perform multiple interactions in a single call for efficient batch operations.

{
  "name": "web_interact_multiple_cremotemcp",
  "arguments": {
    "interactions": [
      {"selector": "#username", "action": "fill", "value": "testuser"},
      {"selector": "#password", "action": "fill", "value": "testpass"},
      {"selector": "#remember-me", "action": "check"},
      {"selector": "#login-btn", "action": "click"}
    ],
    "timeout": 10
  }
}
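When an agent builds batches programmatically, a typed struct keeps each entry well-formed. The Go sketch below reproduces the example above. The `interaction` type and `loginBatchJSON` helper are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// interaction mirrors one entry of the "interactions" array above.
// Value is omitted from the JSON for actions such as "click" that take none.
type interaction struct {
	Selector string `json:"selector"`
	Action   string `json:"action"`
	Value    string `json:"value,omitempty"`
}

// loginBatchJSON (hypothetical helper) encodes the arguments object for
// web_interact_multiple_cremotemcp from a pair of credentials.
func loginBatchJSON(user, pass string) (string, error) {
	args := map[string]any{
		"interactions": []interaction{
			{Selector: "#username", Action: "fill", Value: user},
			{Selector: "#password", Action: "fill", Value: pass},
			{Selector: "#remember-me", Action: "check"},
			{Selector: "#login-btn", Action: "click"},
		},
		"timeout": 10,
	}
	b, err := json.Marshal(args) // map keys are emitted in sorted order
	return string(b), err
}

func main() {
	s, err := loginBatchJSON("testuser", "testpass")
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```

One design caveat: with `omitempty`, a `fill` whose intended value is the empty string would lose its `value` field entirely. Use a `*string` if you need to send empty values.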

19. web_form_fill_bulk_cremotemcp

Fill entire forms with key-value pairs in a single operation.

{
  "name": "web_form_fill_bulk_cremotemcp",
  "arguments": {
    "form_selector": "#contact-form",
    "fields": {
      "name": "John Doe",
      "email": "john@example.com",
      "message": "Hello, this is a test message."
    },
    "timeout": 10
  }
}

Phase 4: Page State and Metadata Tools (4 tools)

20. web_page_info_cremotemcp

Get comprehensive page metadata and state information.

{
  "name": "web_page_info_cremotemcp",
  "arguments": {
    "tab": "tab-123",
    "timeout": 5
  }
}

Returns detailed page information including title, URL, loading state, domain, protocol, and browser status.

21. web_viewport_info_cremotemcp

Get viewport and scroll information.

{
  "name": "web_viewport_info_cremotemcp",
  "arguments": {
    "tab": "tab-123",
    "timeout": 5
  }
}

Returns viewport dimensions, scroll position, device pixel ratio, and orientation.

22. web_performance_metrics_cremotemcp

Get page performance metrics.

{
  "name": "web_performance_metrics_cremotemcp",
  "arguments": {
    "tab": "tab-123",
    "timeout": 5
  }
}

Returns performance data including load times, resource counts, and memory usage.

23. web_content_check_cremotemcp

Check for specific content types and loading states.

{
  "name": "web_content_check_cremotemcp",
  "arguments": {
    "type": "images",
    "tab": "tab-123",
    "timeout": 5
  }
}

Supported content types: images, scripts, styles, forms, links, iframes, errors.

Phase 5: Enhanced Screenshot and File Management (4 tools)

24. web_screenshot_element_cremotemcp

Take a screenshot of a specific element on the page.

{
  "name": "web_screenshot_element_cremotemcp",
  "arguments": {
    "selector": "#main-content",
    "output": "/tmp/element-screenshot.png",
    "tab": "tab-123",
    "timeout": 5
  }
}

Automatically scrolls the element into view and captures a screenshot of just that element.

25. web_screenshot_enhanced_cremotemcp

Take an enhanced screenshot with metadata.

{
  "name": "web_screenshot_enhanced_cremotemcp",
  "arguments": {
    "output": "/tmp/enhanced-screenshot.png",
    "full_page": true,
    "tab": "tab-123",
    "timeout": 5
  }
}

Returns screenshot metadata including timestamp, URL, title, viewport size, and file information.

26. file_operations_bulk_cremotemcp

Perform bulk file operations (upload/download multiple files).

{
  "name": "file_operations_bulk_cremotemcp",
  "arguments": {
    "operation": "upload",
    "files": [
      {
        "local_path": "/local/file1.txt",
        "container_path": "/tmp/file1.txt"
      },
      {
        "local_path": "/local/file2.txt",
        "container_path": "/tmp/file2.txt"
      }
    ],
    "timeout": 30
  }
}

Supports both "upload" and "download" operations with detailed success/failure reporting.

27. file_management_cremotemcp

Manage files (cleanup, list, get info).

{
  "name": "file_management_cremotemcp",
  "arguments": {
    "operation": "cleanup",
    "pattern": "/tmp/cremote-*",
    "max_age": "24"
  }
}

Operations: cleanup (remove old files), list (list files), info (get file details).
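Conceptually, a cleanup removes files that match a glob and are older than `max_age`. The README does not state the unit of `max_age`. The Go sketch below assumes hours and uses Go's `filepath.Glob` semantics. It illustrates the intent and is not the tool's implementation. The `demo` function exercises it in a throwaway directory rather than touching real `/tmp/cremote-*` files.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// cleanup removes regular files matching the glob pattern whose modification
// time is older than maxAge, and returns the paths it removed.
// Assumption: this mirrors the intent of the "cleanup" operation; the real
// tool's matching rules and max_age unit may differ.
func cleanup(pattern string, maxAge time.Duration, now time.Time) ([]string, error) {
	matches, err := filepath.Glob(pattern)
	if err != nil {
		return nil, err
	}
	var removed []string
	for _, path := range matches {
		info, err := os.Stat(path)
		if err != nil || !info.Mode().IsRegular() {
			continue // skip vanished files and directories
		}
		if now.Sub(info.ModTime()) > maxAge {
			if err := os.Remove(path); err != nil {
				return removed, err
			}
			removed = append(removed, path)
		}
	}
	return removed, nil
}

// demo creates one stale and one fresh file in a temporary directory,
// runs cleanup with a 24-hour limit, and returns how many files were removed.
func demo() (int, error) {
	dir, err := os.MkdirTemp("", "cleanup-demo")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)
	now := time.Now()
	ages := map[string]time.Duration{"cremote-old.png": 48 * time.Hour, "cremote-new.png": time.Hour}
	for name, age := range ages {
		p := filepath.Join(dir, name)
		if err := os.WriteFile(p, nil, 0o644); err != nil {
			return 0, err
		}
		t := now.Add(-age)
		if err := os.Chtimes(p, t, t); err != nil {
			return 0, err
		}
	}
	removed, err := cleanup(filepath.Join(dir, "cremote-*"), 24*time.Hour, now)
	return len(removed), err
}

func main() {
	n, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println("removed", n, "stale file(s)") // removed 1 stale file(s)
}
```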

🎉 Complete Enhancement Summary

The 10 core tools and the first five phases of the MCP enhancement plan account for 27 of the 31 tools. The remaining four are the three Phase 6 accessibility tree tools and version_cremotemcp. Phases 1 through 5 add the following capabilities:

Phase 1: Element State and Checking (2 tools)

Enables conditional logic without timing issues

  • web_element_check_cremotemcp: Check existence, visibility, enabled state, count elements
  • web_element_attributes_cremotemcp: Get attributes, properties, computed styles

Benefits: LLMs can base decisions on page state, avoid errors caused by interacting with non-existent elements, and run conditional workflows.

Phase 2: Enhanced Data Extraction (4 tools)

Dramatically improves data gathering efficiency

  • web_extract_multiple_cremotemcp: Extract from multiple selectors in one call
  • web_extract_links_cremotemcp: Extract all links with filtering options
  • web_extract_table_cremotemcp: Extract table data as structured JSON
  • web_extract_text_cremotemcp: Extract text with pattern matching

Benefits: Reduces multiple round trips to single calls, provides structured data ready for LLM processing, enables comprehensive page analysis.

Phase 3: Form Analysis and Bulk Operations (3 tools)

Streamlines form handling workflows with 10x efficiency

  • web_form_analyze_cremotemcp: Analyze form structure, fields, and submission requirements
  • web_interact_multiple_cremotemcp: Batch interactions
  • web_form_fill_bulk_cremotemcp: Fill entire forms with key-value pairs

Benefits: Forms can be completed in 1-2 calls instead of 10+, form analysis reveals the full form structure before any interaction, and field validation helps prevent errors.

Phase 4: Page State and Metadata Tools (4 tools)

Provides rich context about page state for better debugging and monitoring

  • web_page_info_cremotemcp: Get page metadata and loading state
  • web_viewport_info_cremotemcp: Get viewport and scroll information
  • web_performance_metrics_cremotemcp: Get performance data
  • web_content_check_cremotemcp: Check for specific content types

Benefits: Better debugging and monitoring capabilities, performance optimization insights, content loading verification, rich page state context for LLM decision making.

Phase 5: Enhanced Screenshot and File Management (4 tools)

Improves debugging and file handling

  • web_screenshot_element_cremotemcp: Screenshot specific elements
  • web_screenshot_enhanced_cremotemcp: Screenshots with metadata
  • file_operations_bulk_cremotemcp: Bulk file operations
  • file_management_cremotemcp: File cleanup, listing, and file info

Benefits: Better debugging with targeted screenshots, improved file handling workflows, automatic resource management, enhanced visual debugging capabilities.

Key Benefits for LLM Agents

🚀 Efficiency Gains

  • 10x Form Efficiency: Complete forms in 1-2 calls instead of 10+ individual interactions
  • Batch Operations: Multiple data extractions and interactions in single calls
  • Reduced Round Trips: Comprehensive tools minimize API call overhead

🧠 Intelligence & Context

  • Conditional Logic: Element checking enables smart decision making without timing issues
  • Rich Page Context: Complete page state, performance metrics, and content verification
  • Form Intelligence: Complete form analysis before interaction prevents errors

🛠 Enhanced Capabilities

  • Visual Debugging: Element-specific screenshots and enhanced metadata
  • File Management: Bulk operations and automated cleanup
  • Error Prevention: State checking and validation before actions
  • Resource Management: Automatic cleanup and connection handling

Installation & Usage

Prerequisites

  1. Cremote daemon must be running:

    cremotedaemon
    
  2. Chrome/Chromium with remote debugging:

    chromium --remote-debugging-port=9222 --user-data-dir=/tmp/chromium-debug
    

Build the MCP Server

cd mcp/
go build -o cremote-mcp .

Configuration

Basic Configuration (Single Client - stdio)

Set environment variables to configure the cremote connection:

export CREMOTE_HOST=localhost
export CREMOTE_PORT=8989
export CREMOTE_TRANSPORT=stdio  # Default

Multi-Client Configuration (HTTP Transport)

For multiple concurrent clients:

export CREMOTE_HOST=localhost
export CREMOTE_PORT=8989
export CREMOTE_TRANSPORT=http
export CREMOTE_HTTP_HOST=localhost
export CREMOTE_HTTP_PORT=8990

Environment Variables

Variable           Default    Description
CREMOTE_TRANSPORT  stdio      Transport mode: stdio or http
CREMOTE_HOST       localhost  Cremote daemon host
CREMOTE_PORT       8989       Cremote daemon port
CREMOTE_HTTP_HOST  localhost  HTTP server host (HTTP mode only)
CREMOTE_HTTP_PORT  8990       HTTP server port (HTTP mode only)
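The defaults in the table can be read as "use the variable if set, else the default". The Go sketch below illustrates that resolution. It is not the server's code, and treating an empty value as unset is an assumption. `getenv` is injected so the logic can be checked without touching the real environment.

```go
package main

import (
	"fmt"
	"os"
)

// config holds the settings from the environment-variable table above.
type config struct {
	Transport, Host, Port, HTTPHost, HTTPPort string
}

// loadConfig (hypothetical) applies the documented default for any variable
// that is unset or empty.
func loadConfig(getenv func(string) string) config {
	get := func(key, def string) string {
		if v := getenv(key); v != "" {
			return v
		}
		return def
	}
	return config{
		Transport: get("CREMOTE_TRANSPORT", "stdio"),
		Host:      get("CREMOTE_HOST", "localhost"),
		Port:      get("CREMOTE_PORT", "8989"),
		HTTPHost:  get("CREMOTE_HTTP_HOST", "localhost"),
		HTTPPort:  get("CREMOTE_HTTP_PORT", "8990"),
	}
}

func main() {
	fmt.Printf("%+v\n", loadConfig(os.Getenv))
}
```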

Running with Claude Desktop

Add to your Claude Desktop configuration (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):

{
  "mcpServers": {
    "cremote": {
      "command": "/path/to/cremote-mcp",
      "env": {
        "CREMOTE_HOST": "localhost",
        "CREMOTE_PORT": "8989"
      }
    }
  }
}

Running with Other MCP Clients

In stdio mode (the default), the server communicates via JSON-RPC over stdin and stdout, so it can be used with any MCP-compatible client. For a quick manual check:

echo '{"jsonrpc":"2.0","method":"tools/list","params":{},"id":1}' | ./cremote-mcp
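MCP's stdio transport carries one JSON-RPC message per line. A spec-following client also opens with an `initialize` request and a `notifications/initialized` notification before calling `tools/list`. The README does not say whether cremote-mcp requires that handshake. The Go sketch below writes the full sequence to any `io.Writer`, such as a pipe to the server's stdin. The protocol version `2024-11-05` and the client name are example values.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
	"strings"
)

// message is a minimal JSON-RPC 2.0 message; notifications omit the id.
type message struct {
	JSONRPC string `json:"jsonrpc"`
	ID      *int   `json:"id,omitempty"`
	Method  string `json:"method"`
	Params  any    `json:"params,omitempty"`
}

// writeSession writes the MCP handshake followed by a tools/list request,
// one JSON object per line.
func writeSession(w io.Writer) error {
	one, two := 1, 2
	msgs := []message{
		{JSONRPC: "2.0", ID: &one, Method: "initialize", Params: map[string]any{
			"protocolVersion": "2024-11-05",
			"capabilities":    map[string]any{},
			"clientInfo":      map[string]string{"name": "example-client", "version": "0.1"},
		}},
		{JSONRPC: "2.0", Method: "notifications/initialized"},
		{JSONRPC: "2.0", ID: &two, Method: "tools/list", Params: map[string]any{}},
	}
	enc := json.NewEncoder(w) // Encode appends '\n' after each message
	for _, m := range msgs {
		if err := enc.Encode(m); err != nil {
			return err
		}
	}
	return nil
}

// sessionText returns the session as a string, for inspection.
func sessionText() (string, error) {
	var sb strings.Builder
	err := writeSession(&sb)
	return sb.String(), err
}

func main() {
	if err := writeSession(os.Stdout); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```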

Response Format

All tool responses include:

{
  "success": true,
  "data": "...",
  "screenshot": "/tmp/screenshot.png",
  "current_tab": "tab-id-123",
  "tab_history": ["tab-id-123", "tab-id-456"],
  "iframe_mode": false,
  "error": null,
  "metadata": {}
}
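A client can decode this common shape once and turn `"success": false` into an ordinary error. The Go sketch below assumes you already have the JSON object shown above. It also assumes that `error` holds a string message on failure, which the README does not confirm. `data` and `metadata` are kept raw because their shape varies by tool.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// toolResponse mirrors the common response fields listed above.
type toolResponse struct {
	Success    bool            `json:"success"`
	Data       json.RawMessage `json:"data"`
	Screenshot string          `json:"screenshot"`
	CurrentTab string          `json:"current_tab"`
	TabHistory []string        `json:"tab_history"`
	IframeMode bool            `json:"iframe_mode"`
	Error      *string         `json:"error"` // nil when the JSON value is null
	Metadata   json.RawMessage `json:"metadata"`
}

// parseResponse decodes a response and converts success=false into a Go error.
func parseResponse(raw []byte) (*toolResponse, error) {
	var r toolResponse
	if err := json.Unmarshal(raw, &r); err != nil {
		return nil, err
	}
	if !r.Success {
		msg := "tool reported failure"
		if r.Error != nil {
			msg = *r.Error
		}
		return &r, errors.New(msg)
	}
	return &r, nil
}

func main() {
	raw := []byte(`{"success":true,"data":"...","current_tab":"tab-id-123","tab_history":["tab-id-123","tab-id-456"],"iframe_mode":false,"error":null,"metadata":{}}`)
	r, err := parseResponse(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(r.CurrentTab, len(r.TabHistory)) // tab-id-123 2
}
```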

Example Workflows

Basic Login Workflow (Traditional Approach)

// 1. Navigate to a page
{
  "name": "web_navigate_cremotemcp",
  "arguments": {
    "url": "https://example.com/login",
    "screenshot": true
  }
}

// 2. Check if login form exists
{
  "name": "web_element_check_cremotemcp",
  "arguments": {
    "selector": "#login-form",
    "check_type": "exists"
  }
}

// 3. Fill login form using bulk operations
{
  "name": "web_form_fill_bulk_cremotemcp",
  "arguments": {
    "form_selector": "#login-form",
    "fields": {
      "username": "testuser",
      "password": "password123"
    }
  }
}

// 4. Submit the form
{
  "name": "web_interact_cremotemcp",
  "arguments": {
    "action": "click",
    "selector": "#login-button"
  }
}

// 5. Extract multiple results at once
{
  "name": "web_extract_multiple_cremotemcp",
  "arguments": {
    "selectors": {
      "welcome_message": ".welcome-message",
      "user_name": ".user-profile .name",
      "last_login": ".user-info .last-login"
    }
  }
}

// 6. Take enhanced screenshot with metadata
{
  "name": "web_screenshot_enhanced_cremotemcp",
  "arguments": {
    "output": "/tmp/login-success.png",
    "full_page": true
  }
}
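The first four steps of this workflow form a linear sequence in which the existence check gates everything after it. The Go sketch below expresses those steps as data and runs them through a hypothetical `caller` function, which in a real client would send a `tools/call` request. It assumes the check step's data payload contains the `exists` field documented for web_element_check_cremotemcp.

```go
package main

import (
	"errors"
	"fmt"
)

// step is one tool call from the workflow above.
type step struct {
	Name string
	Args map[string]any
}

// caller sends one tool call and returns its data payload (hypothetical interface).
type caller func(name string, args map[string]any) (map[string]any, error)

// runLogin executes steps 1-4 and stops early if the login form is missing.
func runLogin(call caller) ([]string, error) {
	steps := []step{
		{"web_navigate_cremotemcp", map[string]any{"url": "https://example.com/login", "screenshot": true}},
		{"web_element_check_cremotemcp", map[string]any{"selector": "#login-form", "check_type": "exists"}},
		{"web_form_fill_bulk_cremotemcp", map[string]any{"form_selector": "#login-form",
			"fields": map[string]any{"username": "testuser", "password": "password123"}}},
		{"web_interact_cremotemcp", map[string]any{"action": "click", "selector": "#login-button"}},
	}
	var done []string
	for _, s := range steps {
		data, err := call(s.Name, s.Args)
		if err != nil {
			return done, fmt.Errorf("%s: %w", s.Name, err)
		}
		done = append(done, s.Name)
		// Gate the rest of the workflow on the existence check (step 2).
		if s.Name == "web_element_check_cremotemcp" && data["exists"] != true {
			return done, errors.New("login form not found")
		}
	}
	return done, nil
}

func main() {
	// A fake caller that logs each call and reports that the form exists.
	fake := func(name string, args map[string]any) (map[string]any, error) {
		fmt.Println("call", name)
		return map[string]any{"exists": true}, nil
	}
	done, err := runLogin(fake)
	fmt.Println(len(done), err) // 4 <nil>
}
```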

Advanced E-commerce Data Extraction Workflow

// 1. Navigate and check page state
{
  "name": "web_navigate_cremotemcp",
  "arguments": {
    "url": "https://shop.example.com/products",
    "screenshot": true
  }
}

// 2. Get page performance metrics
{
  "name": "web_performance_metrics_cremotemcp",
  "arguments": {}
}

// 3. Extract all product data in one call
{
  "name": "web_extract_multiple_cremotemcp",
  "arguments": {
    "selectors": {
      "product_titles": ".product-card h3",
      "prices": ".product-card .price",
      "ratings": ".product-card .rating",
      "availability": ".product-card .stock-status"
    }
  }
}

// 4. Extract all product links with filtering
{
  "name": "web_extract_links_cremotemcp",
  "arguments": {
    "container_selector": ".product-grid",
    "href_pattern": ".*/product/.*",
    "text_pattern": ".*"
  }
}

// 5. Check if more products are loading
{
  "name": "web_content_check_cremotemcp",
  "arguments": {
    "type": "scripts"
  }
}

Phase 6: Accessibility Tree Support (3 tools)

28. get_accessibility_tree_cremotemcp

Get the full accessibility tree for a page, optionally limited to a given depth.

{
  "name": "get_accessibility_tree_cremotemcp",
  "arguments": {
    "tab": "optional-tab-id",
    "depth": 3,
    "timeout": 10
  }
}

29. get_partial_accessibility_tree_cremotemcp

Get the accessibility tree for a specific element and, when fetch_relatives is true, its related nodes.

{
  "name": "get_partial_accessibility_tree_cremotemcp",
  "arguments": {
    "selector": "form",
    "tab": "optional-tab-id",
    "fetch_relatives": true,
    "timeout": 10
  }
}

30. query_accessibility_tree_cremotemcp

Query the accessibility tree for nodes matching specific criteria, such as accessible name and role.

{
  "name": "query_accessibility_tree_cremotemcp",
  "arguments": {
    "tab": "optional-tab-id",
    "selector": "form",
    "accessible_name": "Submit",
    "role": "button",
    "timeout": 10
  }
}

Benefits Over CLI

🎯 Enhanced Efficiency

  • State Management: No need to manually track tab IDs
  • Batch Operations: 10x efficiency with bulk form filling and multi-selector extraction
  • Intelligent Defaults: Smart parameter handling and fallbacks
  • Resource Cleanup: Automatic management of tabs and files

🔍 Better Intelligence

  • Conditional Logic: Element checking enables smart decision making
  • Rich Context: Page state, performance metrics, and content verification
  • Form Intelligence: Complete form analysis before interaction
  • Error Prevention: State validation before actions

🛠 Advanced Capabilities

  • Enhanced Screenshots: Element-specific and metadata-rich capture
  • File Management: Bulk operations and automated cleanup
  • Better Error Context: Rich error information for debugging
  • Structured Responses: Consistent, parseable response format

🎉 Production Ready

This comprehensive web automation platform is production ready with:

  • 31 Tools: Complete coverage of web automation needs
  • 6 Enhancement Phases: Systematic capability building from basic to advanced
  • Extensive Testing: All tools validated and documented
  • LLM Optimized: Designed specifically for AI agent workflows
  • Backward Compatible: All existing tools continue to work unchanged

📊 Capability Matrix

Category               Tools     Key Benefits
Core Web Automation    10 tools  Navigation, interaction, extraction, screenshots, tabs, iframes, files, console
Element Intelligence   2 tools   Conditional logic, state checking, attribute inspection
Data Extraction        4 tools   Batch extraction, structured data, pattern matching, table processing
Form Automation        3 tools   Form analysis, bulk filling, batch interactions
Page Intelligence      4 tools   Page state, performance metrics, content verification, viewport info
Enhanced Capabilities  4 tools   Element screenshots, enhanced metadata, bulk file ops, file management
Accessibility Tree     3 tools   Semantic understanding, accessibility testing, screen reader simulation

Development

To extend the MCP server with new tools:

  1. Add the tool definition to handleToolsList()
  2. Add a case in handleToolCall()
  3. Implement the handler function following the pattern of existing handlers
  4. Update this documentation
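The four steps above follow a list-then-dispatch pattern. The self-contained Go sketch below mirrors it with a hypothetical `echo_example_cremotemcp` tool. `toolDefs` and `callTool` only stand in for handleToolsList() and handleToolCall(); they do not reproduce those functions' real signatures or types.

```go
package main

import (
	"errors"
	"fmt"
)

// toolDef is a simplified tool definition (name plus description).
type toolDef struct {
	Name, Description string
}

// toolDefs stands in for the definitions returned by handleToolsList().
var toolDefs = []toolDef{
	{"echo_example_cremotemcp", "Hypothetical example tool that echoes its text argument"},
}

// callTool stands in for handleToolCall(): one case per tool.
func callTool(name string, args map[string]any) (string, error) {
	switch name {
	case "echo_example_cremotemcp":
		return handleEcho(args)
	default:
		return "", fmt.Errorf("unknown tool: %s", name)
	}
}

// handleEcho validates its arguments before doing any work, so bad input
// produces a clear error instead of a failed browser operation.
func handleEcho(args map[string]any) (string, error) {
	text, ok := args["text"].(string)
	if !ok || text == "" {
		return "", errors.New("text argument is required")
	}
	return text, nil
}

func main() {
	for _, d := range toolDefs {
		fmt.Println(d.Name, "-", d.Description)
	}
	out, err := callTool("echo_example_cremotemcp", map[string]any{"text": "hello"})
	fmt.Println(out, err) // hello <nil>
}
```

Keeping validation inside each handler, rather than in the dispatcher, keeps every tool's argument rules next to the code that uses them.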

The server is designed to be easily extensible while maintaining consistency with the cremote client library.


🚀 Ready for Production: Complete web automation platform with 31 tools across 6 enhancement phases, optimized for LLM agents and production workflows.