# Cremote MCP Server
This is a Model Context Protocol (MCP) server that exposes cremote's web automation capabilities to LLMs and AI agents. Rather than issuing individual CLI commands, clients call a structured API that keeps browser state between calls and offers higher-level abstractions.
## Features
- **State Management**: Automatically tracks current tab, tab history, and iframe context
- **Intelligent Abstractions**: High-level tools that combine multiple cremote operations
- **Automatic Screenshots**: Optional screenshot capture for debugging and documentation
- **Error Recovery**: Failures are reported in a structured form, with context an LLM can act on
- **Resource Management**: Automatic cleanup and connection management
## Quick Start for LLMs
**For LLM agents**: See the comprehensive [LLM MCP Guide](LLM_MCP_GUIDE.md) for detailed usage instructions, examples, and best practices.
## Available Tools
### 1. `web_navigate`
Navigate to URLs with optional screenshot capture.
```json
{
"name": "web_navigate",
"arguments": {
"url": "https://example.com",
"screenshot": true,
"timeout": 10
}
}
```
### 2. `web_interact`
Interact with web elements (click, fill, submit, upload).
```json
{
"name": "web_interact",
"arguments": {
"action": "fill",
"selector": "#username",
"value": "testuser",
"timeout": 5
}
}
```
### 3. `web_extract`
Extract data from pages (source, element HTML, JavaScript execution).
```json
{
"name": "web_extract",
"arguments": {
"type": "javascript",
"code": "document.title",
"timeout": 5
}
}
```
### 4. `web_screenshot`
Take screenshots of the current page.
```json
{
"name": "web_screenshot",
"arguments": {
"output": "/tmp/page.png",
"full_page": true,
"timeout": 5
}
}
```
### 5. `web_manage_tabs`
Manage browser tabs (open, close, list, switch).
```json
{
"name": "web_manage_tabs",
"arguments": {
"action": "open",
"timeout": 5
}
}
```
### 6. `web_iframe`
Switch iframe context for subsequent operations.
```json
{
"name": "web_iframe",
"arguments": {
"action": "enter",
"selector": "iframe#payment-form"
}
}
```
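The tool payloads above all share the same `name` plus `arguments` shape, so a client can build them with one small helper. The following is a minimal Python sketch, not part of cremote: `build_call` and `ACTIONS` are hypothetical names, and the action strings are assumed to match the words used in the tool descriptions (only `fill`, `click`, `open`, and `enter` appear verbatim in the examples).

```python
from typing import Any

# Assumption: the action strings match the words used in the tool descriptions.
ACTIONS = {
    "web_interact": {"click", "fill", "submit", "upload"},
    "web_manage_tabs": {"open", "close", "list", "switch"},
}


def build_call(name: str, **arguments: Any) -> dict[str, Any]:
    """Return a tool-call payload, rejecting unknown actions early."""
    allowed = ACTIONS.get(name)
    if allowed is not None and arguments.get("action") not in allowed:
        raise ValueError(
            f"{name}: action must be one of {sorted(allowed)}, "
            f"got {arguments.get('action')!r}"
        )
    return {"name": name, "arguments": arguments}


# Hypothetical usage: the same payload as the `web_interact` example.
fill_username = build_call(
    "web_interact", action="fill", selector="#username", value="testuser", timeout=5
)
```

Checking the action on the client side is optional; it only surfaces typos before a request is sent.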
## Installation & Usage
### Prerequisites
1. **Cremote daemon must be running**:
```bash
cremotedaemon
```
2. **Chrome/Chromium with remote debugging**:
```bash
chromium --remote-debugging-port=9222 --user-data-dir=/tmp/chromium-debug
```
### Build the MCP Server
```bash
cd mcp/
go build -o cremote-mcp .
```
### Configuration
Set environment variables to configure the cremote connection:
```bash
export CREMOTE_HOST=localhost
export CREMOTE_PORT=8989
```
### Running with Claude Desktop
Add to your Claude Desktop configuration (`~/Library/Application Support/Claude/claude_desktop_config.json` on macOS):
```json
{
"mcpServers": {
"cremote": {
"command": "/path/to/cremote-mcp",
"env": {
"CREMOTE_HOST": "localhost",
"CREMOTE_PORT": "8989"
}
}
}
}
```
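If the configuration file already lists other MCP servers, the `cremote` entry should be merged in rather than pasted over the whole file. A minimal Python sketch of that merge follows; the function names are hypothetical, and the binary path remains the same placeholder as above.

```python
import json
from pathlib import Path
from typing import Any


def with_cremote_server(
    config: dict[str, Any], command: str, host: str = "localhost", port: str = "8989"
) -> dict[str, Any]:
    """Return a copy of `config` with a `cremote` entry under `mcpServers`."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))  # keep any other servers
    servers["cremote"] = {
        "command": command,
        "env": {"CREMOTE_HOST": host, "CREMOTE_PORT": port},
    }
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path, command: str) -> None:
    """Read the config if it exists, add the entry, and write it back."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(with_cremote_server(config, command), indent=2) + "\n")
```

Calling `update_config_file` with the path to `claude_desktop_config.json` and the path to the built binary leaves every other key in the file untouched.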
### Running with Other MCP Clients
The server communicates via JSON-RPC over stdio, so it can be used with any MCP-compatible client:
```bash
echo '{"jsonrpc":"2.0","method":"tools/list","params":{},"id":1}' | ./cremote-mcp
```
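The same exchange can be driven from a script. The sketch below is a minimal Python illustration, assuming the server reads one JSON-RPC 2.0 message per line on stdin and writes one reply per line on stdout; `make_request`, `tool_call`, and `send` are hypothetical helper names, not part of cremote.

```python
import json
import subprocess
from typing import Any


def make_request(method: str, params: dict[str, Any], request_id: int) -> str:
    """Serialize one JSON-RPC 2.0 request as a newline-terminated line."""
    message = {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}
    return json.dumps(message) + "\n"


def tool_call(name: str, arguments: dict[str, Any], request_id: int) -> str:
    """Wrap a tool invocation in the MCP `tools/call` method."""
    return make_request("tools/call", {"name": name, "arguments": arguments}, request_id)


def send(server_path: str, request_line: str) -> dict[str, Any]:
    """Start the server, send one request, and return the first reply line.

    Assumption: replies are newline-delimited JSON objects.
    """
    with subprocess.Popen(
        [server_path], stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True
    ) as proc:
        try:
            proc.stdin.write(request_line)
            proc.stdin.flush()
            reply = proc.stdout.readline()
        finally:
            proc.terminate()
    return json.loads(reply)
```

For example, `send("./cremote-mcp", make_request("tools/list", {}, 1))` mirrors the shell command above. Because `send` starts a fresh process per request, tab and iframe state is lost between calls; a real client would keep one process open for the whole session.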
## Response Format
Every tool response uses the same JSON structure:
```json
{
"success": true,
"data": "...",
"screenshot": "/tmp/screenshot.png",
"current_tab": "tab-id-123",
"tab_history": ["tab-id-123", "tab-id-456"],
"iframe_mode": false,
"error": null,
"metadata": {}
}
```
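On the client side, this structure maps naturally onto a small typed record. The following Python sketch assumes the response has already been decoded into a dictionary with the fields shown above; `ToolResponse` and its methods are hypothetical names, and the field types are inferred from the sample values.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Optional


@dataclass
class ToolResponse:
    success: bool
    data: Any = None
    screenshot: Optional[str] = None
    current_tab: Optional[str] = None
    tab_history: list[str] = field(default_factory=list)
    iframe_mode: bool = False
    error: Optional[str] = None
    metadata: dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_dict(cls, raw: dict[str, Any]) -> "ToolResponse":
        """Build a response, ignoring any keys this sketch does not know."""
        known = {f.name for f in fields(cls)}
        return cls(**{key: value for key, value in raw.items() if key in known})

    def unwrap(self) -> Any:
        """Return `data` on success, otherwise raise with the server's error."""
        if not self.success:
            raise RuntimeError(self.error or "tool call failed without an error message")
        return self.data
```

Checking `success` before reading `data` keeps a failed call from being mistaken for an empty result.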
## Example Workflow
Each object below is a separate tool call, issued in order:
```jsonc
// 1. Navigate to a page
{
"name": "web_navigate",
"arguments": {
"url": "https://example.com/login",
"screenshot": true
}
}
// 2. Fill login form
{
"name": "web_interact",
"arguments": {
"action": "fill",
"selector": "#username",
"value": "testuser"
}
}
{
"name": "web_interact",
"arguments": {
"action": "fill",
"selector": "#password",
"value": "password123"
}
}
// 3. Submit form
{
"name": "web_interact",
"arguments": {
"action": "click",
"selector": "#login-button"
}
}
// 4. Extract result
{
"name": "web_extract",
"arguments": {
"type": "javascript",
"code": "document.querySelector('.welcome-message')?.textContent"
}
}
// 5. Take final screenshot
{
"name": "web_screenshot",
"arguments": {
"output": "/tmp/login-success.png",
"full_page": true
}
}
```
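The workflow above can also be expressed as data and run step by step, stopping at the first failure. This is a minimal Python sketch under stated assumptions: `login_workflow` and `run_steps` are hypothetical names, and `send` stands for whatever function the client uses to deliver one tool call and return the decoded response.

```python
from typing import Any, Callable

Step = dict[str, Any]


def step(name: str, **arguments: Any) -> Step:
    """Build one tool-call payload."""
    return {"name": name, "arguments": arguments}


def login_workflow(base_url: str, username: str, password: str) -> list[Step]:
    """Return the login sequence shown above as an ordered list of tool calls."""
    return [
        step("web_navigate", url=f"{base_url}/login", screenshot=True),
        step("web_interact", action="fill", selector="#username", value=username),
        step("web_interact", action="fill", selector="#password", value=password),
        step("web_interact", action="click", selector="#login-button"),
        step(
            "web_extract",
            type="javascript",
            code="document.querySelector('.welcome-message')?.textContent",
        ),
        step("web_screenshot", output="/tmp/login-success.png", full_page=True),
    ]


def run_steps(steps: list[Step], send: Callable[[Step], dict[str, Any]]) -> list[dict[str, Any]]:
    """Send each step in order and stop after the first unsuccessful response."""
    results = []
    for current in steps:
        result = send(current)
        results.append(result)
        if not result.get("success"):
            break
    return results
```

Stopping early avoids, for instance, taking a "success" screenshot after the login click has already failed.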
## Benefits Over CLI
- **State Management**: No need to manually track tab IDs
- **Better Error Context**: Rich error information for debugging
- **Automatic Screenshots**: Built-in screenshot capture for documentation
- **Intelligent Defaults**: Smart parameter handling and fallbacks
- **Resource Cleanup**: Automatic management of tabs and files
- **Structured Responses**: Consistent, parseable response format
## Development
To extend the MCP server with new tools:
1. Add the tool definition to `handleToolsList()`
2. Add a case in `handleToolCall()`
3. Implement the handler function following the pattern of existing handlers
4. Update this documentation
The server is designed to be easily extensible while maintaining consistency with the cremote client library.