
# Cremote MCP Tools - LLM Usage Guide
This guide explains how LLMs can use the cremote MCP (Model Context Protocol) tools for web automation tasks.
## Available Tools
The cremote MCP server provides six comprehensive web automation tools:
### 1. `web_navigate_cremotemcp`
Navigate to URLs and optionally take screenshots.
**Parameters:**
- `url` (required): The URL to navigate to
- `screenshot` (optional): Boolean, take a screenshot after navigation
- `tab` (optional): Specific tab ID to use
- `timeout` (optional): Timeout in seconds (default: 5)

**Example Usage:**
```
web_navigate_cremotemcp:
  url: "https://example.com"
  screenshot: true
```
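
The call notation in these examples can be read as a tool name followed by its arguments. An MCP client sends each such call to the server as a JSON-RPC 2.0 `tools/call` request whose `params` hold the tool `name` and its `arguments`. The minimal Python sketch below builds that message for the navigate example. `build_tool_call` and the request `id` are illustrative only; in practice an MCP client library assembles and sends this message for you.

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize one MCP tools/call request (JSON-RPC 2.0).

    build_tool_call is a hypothetical helper for illustration.
    """
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(request)


# The navigate example above, expressed as a raw request.
payload = build_tool_call(1, "web_navigate_cremotemcp",
                          {"url": "https://example.com", "screenshot": True})
print(payload)
```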
### 2. `web_interact_cremotemcp`
Interact with web elements through various actions.
**Parameters:**
- `action` (required): One of "click", "fill", "submit", "upload"
- `selector` (required): CSS selector for the target element
- `value` (optional): Value for fill/upload actions
- `tab` (optional): Specific tab ID to use
- `timeout` (optional): Timeout in seconds (default: 5)

**Example Usage:**
```
web_interact_cremotemcp:
  action: "click"
  selector: "button.submit"

web_interact_cremotemcp:
  action: "fill"
  selector: "input[name='email']"
  value: "user@example.com"
```
### 3. `web_extract_cremotemcp`
Extract data from web pages (HTML source, element content, or JavaScript execution).
**Parameters:**
- `type` (required): One of "source", "element", "javascript"
- `selector` (optional): CSS selector (required for "element" type)
- `code` (optional): JavaScript code (required for "javascript" type)
- `tab` (optional): Specific tab ID to use
- `timeout` (optional): Timeout in seconds (default: 5)

**Example Usage:**
```
web_extract_cremotemcp:
  type: "source"

web_extract_cremotemcp:
  type: "element"
  selector: "div.content"

web_extract_cremotemcp:
  type: "javascript"
  code: "document.title"
```
### 4. `web_screenshot_cremotemcp`
Take screenshots of web pages.
**Parameters:**
- `output` (required): File path where screenshot will be saved
- `full_page` (optional): Capture full page (default: false)
- `tab` (optional): Specific tab ID to use
- `timeout` (optional): Timeout in seconds (default: 5)

**Example Usage:**
```
web_screenshot_cremotemcp:
  output: "/tmp/page-screenshot.png"
  full_page: true
```
### 5. `web_manage_tabs_cremotemcp`
Manage browser tabs (open, close, list, switch).
**Parameters:**
- `action` (required): One of "open", "close", "list", "switch"
- `tab` (optional): Tab ID (required for "close" and "switch" actions)
- `timeout` (optional): Timeout in seconds (default: 5)

**Example Usage:**
```
web_manage_tabs_cremotemcp:
  action: "open"

web_manage_tabs_cremotemcp:
  action: "list"

web_manage_tabs_cremotemcp:
  action: "switch"
  tab: "ABC123"
```
### 6. `web_iframe_cremotemcp`
Switch iframe context for subsequent operations.
**Parameters:**
- `action` (required): One of "enter", "exit"
- `selector` (optional): Iframe CSS selector (required for "enter" action)
- `tab` (optional): Specific tab ID to use

**Example Usage:**
```
web_iframe_cremotemcp:
  action: "enter"
  selector: "iframe#payment-form"

web_iframe_cremotemcp:
  action: "exit"
```
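
The per-tool rules above (required parameters, allowed `action`/`type` values, and parameters that become required only for particular values) can be checked client-side before a call is sent. The table-driven sketch below encodes those rules. `RULES` and `validate_call` are hypothetical names, not part of cremote, and treating `value` as required for `fill`/`upload` is an assumption, since the parameter list marks it optional.

```python
# Rules transcribed from the parameter lists above. RULES and validate_call are
# hypothetical client-side helpers, not part of cremote.
RULES = {
    "web_navigate_cremotemcp": {"required": {"url"}},
    "web_interact_cremotemcp": {
        "required": {"action", "selector"},
        "choices": ("action", {"click", "fill", "submit", "upload"}),
        # Assumption: `value` is listed as optional, but fill/upload need one.
        "needs": {"fill": {"value"}, "upload": {"value"}},
    },
    "web_extract_cremotemcp": {
        "required": {"type"},
        "choices": ("type", {"source", "element", "javascript"}),
        "needs": {"element": {"selector"}, "javascript": {"code"}},
    },
    "web_screenshot_cremotemcp": {"required": {"output"}},
    "web_manage_tabs_cremotemcp": {
        "required": {"action"},
        "choices": ("action", {"open", "close", "list", "switch"}),
        "needs": {"close": {"tab"}, "switch": {"tab"}},
    },
    "web_iframe_cremotemcp": {
        "required": {"action"},
        "choices": ("action", {"enter", "exit"}),
        "needs": {"enter": {"selector"}},
    },
}


def validate_call(tool: str, args: dict) -> list[str]:
    """Return the problems found in a tool call; an empty list means it looks valid."""
    rule = RULES.get(tool)
    if rule is None:
        return [f"unknown tool: {tool}"]
    problems = [f"missing required parameter: {name}"
                for name in sorted(rule["required"]) if name not in args]
    if "choices" in rule:
        key, allowed = rule["choices"]
        value = args.get(key)
        if value is not None and value not in allowed:
            problems.append(f"invalid {key}: {value!r}")
        # Parameters that only become required for certain action/type values.
        for name in sorted(rule.get("needs", {}).get(value, set())):
            if name not in args:
                problems.append(f"{key}={value!r} requires parameter: {name}")
    return problems


print(validate_call("web_extract_cremotemcp", {"type": "element"}))
print(validate_call("web_manage_tabs_cremotemcp", {"action": "switch", "tab": "ABC123"}))
```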
## Common Usage Patterns
### 1. Basic Web Navigation
```
# Navigate to a website
web_navigate_cremotemcp:
  url: "https://example.com"
  screenshot: true
```
### 2. Form Interaction Sequence
```
# 1. Navigate to the page
web_navigate_cremotemcp:
  url: "https://example.com/login"

# 2. Fill username field
web_interact_cremotemcp:
  action: "fill"
  selector: "input[name='username']"
  value: "myusername"

# 3. Fill password field
web_interact_cremotemcp:
  action: "fill"
  selector: "input[name='password']"
  value: "mypassword"

# 4. Submit the form
web_interact_cremotemcp:
  action: "submit"
  selector: "form"
```
### 3. Clicking Elements
```
# Click a button
web_interact_cremotemcp:
  action: "click"
  selector: "button#submit-btn"

# Click a link
web_interact_cremotemcp:
  action: "click"
  selector: "a[href='/dashboard']"
```
## Best Practices for LLMs
### 1. Always Start with Navigation
Before interacting with elements, navigate to the target page:
```
web_navigate_cremotemcp:
  url: "https://target-website.com"
```
### 2. Use Specific CSS Selectors
Be as specific as possible with selectors to avoid ambiguity:
- Good: `input[name='email']`, `button.primary-submit`
- Avoid: `input`, `button`
### 3. Take Screenshots for Debugging
When troubleshooting or documenting, use screenshots:
```
web_navigate_cremotemcp:
  url: "https://example.com"
  screenshot: true
```
### 4. Handle Timeouts Appropriately
For slow-loading pages, increase timeout:
```
web_navigate_cremotemcp:
  url: "https://slow-website.com"
  timeout: 10
```
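
If a page is only sometimes slow, retrying with a progressively larger `timeout` avoids paying for a long timeout on every call. The Python sketch below doubles the timeout after each timeout failure. It assumes a caller-supplied `call_tool(name, args)` that returns the response text and raises `RuntimeError` with the tool's error message; `call_with_timeout_backoff` is a hypothetical helper, and recognizing a timeout by the text `context deadline exceeded` is an assumption based on the example error response shown later in this guide.

```python
from typing import Callable


def call_with_timeout_backoff(call_tool: Callable[[str, dict], str], tool: str,
                              args: dict, start: int = 5, attempts: int = 3) -> str:
    """Call a tool, doubling its `timeout` (seconds) after each timeout error.

    Assumes call_tool raises RuntimeError carrying the tool's error text.
    Non-timeout errors, and a timeout on the final attempt, are re-raised.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    timeout = start
    for attempt in range(1, attempts + 1):
        try:
            return call_tool(tool, {**args, "timeout": timeout})
        except RuntimeError as err:
            if "context deadline exceeded" not in str(err) or attempt == attempts:
                raise
            timeout *= 2
    raise RuntimeError("unreachable: the loop always returns or raises")


# Usage with a stand-in for a real MCP client call: this fake page needs 20 seconds.
seen = []


def fake_call_tool(tool: str, args: dict) -> str:
    seen.append(args["timeout"])
    if args["timeout"] < 20:
        raise RuntimeError("failed to load URL: context deadline exceeded")
    return f"Successfully navigated to {args['url']}"


result = call_with_timeout_backoff(fake_call_tool, "web_navigate_cremotemcp",
                                   {"url": "https://slow-website.com"})
print(seen)  # [5, 10, 20]
```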
### 5. Sequential Operations
Perform operations in logical sequence:
1. Navigate to page
2. Fill required fields
3. Submit forms
4. Navigate to next page if needed
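
Expressing a task as an ordered list of calls makes the sequence explicit and lets a client stop at the first failure, so later steps never act on an unexpected page. The sketch below assumes a caller-supplied `call_tool(name, args)` that returns response text and raises `RuntimeError` on error; `run_steps` is a hypothetical helper, and the fake client only stands in for a real MCP connection.

```python
from typing import Callable


def run_steps(call_tool: Callable[[str, dict], str],
              steps: list[tuple[str, dict]]) -> list[str]:
    """Run tool calls in order and stop at the first failure, naming the failed step."""
    results = []
    for index, (tool, args) in enumerate(steps, start=1):
        try:
            results.append(call_tool(tool, args))
        except RuntimeError as err:
            raise RuntimeError(f"step {index} ({tool}) failed: {err}") from err
    return results


login_steps = [
    ("web_navigate_cremotemcp", {"url": "https://example.com/login"}),
    ("web_interact_cremotemcp",
     {"action": "fill", "selector": "input[name='username']", "value": "myusername"}),
    ("web_interact_cremotemcp", {"action": "submit", "selector": "form"}),
]


def fake_call_tool(tool: str, args: dict) -> str:
    # Stand-in client: pretend the page has no <form>, so the submit step fails.
    if args.get("selector") == "form":
        raise RuntimeError("element not found")
    return f"{tool} ok"


error_text = ""
try:
    run_steps(fake_call_tool, login_steps)
except RuntimeError as err:
    error_text = str(err)
print(error_text)  # step 3 (web_interact_cremotemcp) failed: element not found
```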
## Error Handling
### Common Error Scenarios:
1. **Element not found**: Selector doesn't match any elements
2. **Timeout**: Page takes too long to load or element to appear
3. **Navigation failed**: URL is invalid or unreachable
### Troubleshooting Tips:
1. Verify the URL is correct and accessible
2. Check CSS selectors using browser developer tools
3. Increase timeout for slow-loading content
4. Take screenshots to see current page state
## Tab Management
The tools automatically manage browser tabs:
- If no tab is specified, a new tab is created automatically
- Tab IDs are returned in responses for reference
- Multiple tabs can be managed by specifying tab IDs
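
Since responses include the tab ID, a client can read it from the first response and pass it as `tab` on later calls so every step targets the same tab. The sketch below assumes responses word the ID as in the navigate example under Tool Response Format (`... in tab ABC123 ...`); the regular expression and the `extract_tab_id`/`pin_tab` helpers are hypothetical, and other tools may phrase their responses differently.

```python
import re
from typing import Optional

# Assumed wording, taken from the navigate success example in this guide.
TAB_PATTERN = re.compile(r"\bin tab ([\w-]+)")


def extract_tab_id(response: str) -> Optional[str]:
    """Return the tab ID mentioned in a response such as '... in tab ABC123 ...'."""
    match = TAB_PATTERN.search(response)
    return match.group(1) if match else None


def pin_tab(args: dict, tab_id: Optional[str]) -> dict:
    """Copy `args`, adding the tab ID unless the call already names a tab."""
    if tab_id is None or "tab" in args:
        return dict(args)
    return {**args, "tab": tab_id}


response = ("Successfully navigated to https://example.com in tab ABC123 "
            "(screenshot saved to /tmp/navigate-1234567890.png)")
tab_id = extract_tab_id(response)
next_args = pin_tab({"action": "click", "selector": "button#submit-btn"}, tab_id)
print(tab_id)  # ABC123
print(next_args)
```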
## Security Considerations
### Safe Practices:
- Only navigate to trusted websites
- Be cautious with form submissions
- Avoid entering sensitive information in examples
- Use screenshots sparingly to avoid exposing sensitive data
### Limitations:
- Cannot bypass CAPTCHA or other anti-automation measures
- Subject to same-origin policy restrictions
- May not work with heavily JavaScript-dependent sites
## Example: Complete Web Automation Task
Here's a complete example of automating a web form:
```
# Step 1: Navigate to the form page
web_navigate_cremotemcp:
  url: "https://example.com/contact"
  screenshot: true

# Step 2: Fill out the contact form
web_interact_cremotemcp:
  action: "fill"
  selector: "input[name='name']"
  value: "John Doe"

web_interact_cremotemcp:
  action: "fill"
  selector: "input[name='email']"
  value: "john@example.com"

web_interact_cremotemcp:
  action: "fill"
  selector: "textarea[name='message']"
  value: "Hello, this is a test message."

# Step 3: Submit the form
web_interact_cremotemcp:
  action: "submit"
  selector: "form#contact-form"

# Step 4: Take a screenshot of the result without navigating away
web_screenshot_cremotemcp:
  output: "/tmp/contact-result.png"
```
## Integration Notes
- Tools use the `_cremotemcp` suffix to avoid naming conflicts
- Responses include success status and descriptive messages
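- Screenshots taken with `web_navigate_cremotemcp` (`screenshot: true`) are saved to `/tmp/` with timestamped filenames; `web_screenshot_cremotemcp` saves to the `output` path you supply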
- The underlying cremote daemon handles browser management
## Advanced Usage Examples
### Testing Web Applications
```
# Navigate to application
web_navigate_cremotemcp:
  url: "https://myapp.com/login"
  screenshot: true

# Test login functionality
web_interact_cremotemcp:
  action: "fill"
  selector: "#username"
  value: "testuser"

web_interact_cremotemcp:
  action: "fill"
  selector: "#password"
  value: "testpass"

web_interact_cremotemcp:
  action: "click"
  selector: "button[type='submit']"

# Verify successful login by capturing the resulting page
web_screenshot_cremotemcp:
  output: "/tmp/login-result.png"
```
### Data Extraction Workflows
```
# Navigate to data source
web_navigate_cremotemcp:
  url: "https://data-site.com/table"

# Click through pagination or filters
web_interact_cremotemcp:
  action: "click"
  selector: ".filter-button"

# Extract the filtered data
web_extract_cremotemcp:
  type: "element"
  selector: "table#data-table"

# Take a screenshot to document the current state
web_screenshot_cremotemcp:
  output: "/tmp/data-table.png"
```
### File Upload Testing
```
# Navigate to upload form
web_navigate_cremotemcp:
  url: "https://example.com/upload"

# Upload a file
web_interact_cremotemcp:
  action: "upload"
  selector: "input[type='file']"
  value: "/path/to/test-file.pdf"

# Submit the upload form
web_interact_cremotemcp:
  action: "click"
  selector: "button.upload-submit"
```
## Tool Response Format
All tools return a descriptive text response. The examples below come from `web_navigate_cremotemcp`:

**Success Response:**
```
"Successfully navigated to https://example.com in tab ABC123 (screenshot saved to /tmp/navigate-1234567890.png)"
```
**Error Response:**
```
"failed to load URL: context deadline exceeded"
```
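
To act on these responses programmatically, a client can match the success wording and treat anything else as an error. The sketch below parses the navigate success message shown above into its URL, tab ID, and optional screenshot path. The regular expression is written against this single example, so it is an assumption that real responses always follow this exact wording; `parse_navigate_response` and `NavigateResult` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Pattern for the navigate success message shown above; other tools' wording may differ.
NAVIGATE_OK = re.compile(
    r"^Successfully navigated to (?P<url>\S+) in tab (?P<tab>[\w-]+)"
    r"(?: \(screenshot saved to (?P<screenshot>[^)]+)\))?$"
)


@dataclass
class NavigateResult:
    url: str
    tab: str
    screenshot: Optional[str]


def parse_navigate_response(text: str) -> NavigateResult:
    """Parse a navigate response; raise RuntimeError with the text if it does not report success."""
    match = NAVIGATE_OK.match(text.strip())
    if match is None:
        raise RuntimeError(text)
    return NavigateResult(match["url"], match["tab"], match["screenshot"])


ok = parse_navigate_response(
    "Successfully navigated to https://example.com in tab ABC123 "
    "(screenshot saved to /tmp/navigate-1234567890.png)")
print(ok)

try:
    parse_navigate_response("failed to load URL: context deadline exceeded")
except RuntimeError as err:
    print("error:", err)
```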
## CSS Selector Best Practices
### Recommended Selector Types:
1. **ID selectors**: `#unique-id` (most reliable)
2. **Name attributes**: `input[name='fieldname']`
3. **Class combinations**: `.primary.button`
4. **Attribute selectors**: `button[data-action='submit']`
### Avoid These Selectors:
1. **Generic tags**: `div`, `span`, `input` (too broad)
2. **Position-based**: `:nth-child()` (fragile)
3. **Text-based**: `:contains()` (not standard CSS)
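
These guidelines can be approximated with a quick check before a selector is used. The sketch below flags the three patterns listed above. It is a rough string heuristic rather than a CSS parser, `selector_warnings` is a hypothetical helper, and the generic-tag set contains only the tags this guide names.

```python
# Generic tags named in this guide; a bare tag selector like these is too broad.
GENERIC_TAGS = {"div", "span", "input", "button"}


def selector_warnings(selector: str) -> list[str]:
    """Flag selector patterns this guide recommends avoiding (heuristic, not a CSS parser)."""
    warnings = []
    if selector.strip().lower() in GENERIC_TAGS:
        warnings.append("generic tag selector; add an id, name, class, or attribute")
    if ":nth-child(" in selector:
        warnings.append("position-based :nth-child() breaks when the layout changes")
    if ":contains(" in selector:
        warnings.append(":contains() is not standard CSS; browsers' querySelector rejects it")
    return warnings


for sel in ["input[name='email']", "input", "ul li:nth-child(3)", "a:contains('Next')"]:
    print(sel, "->", selector_warnings(sel))
```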

This documentation should help LLMs use the cremote MCP tools effectively for web automation tasks.