accessibility

Josh at WLTechBlog
2025-08-29 12:11:54 -05:00
parent 6bad614f9e
commit 7f4d8b8e84
12 changed files with 2708 additions and 1577 deletions


@@ -4,7 +4,7 @@ This guide explains how LLMs can use the cremote MCP (Model Context Protocol) to
## 🎉 Complete Web Automation Platform
The cremote MCP server provides **27 comprehensive web automation tools** organized across 5 enhancement phases:
The cremote MCP server provides **30 comprehensive web automation tools** organized across 6 enhancement phases:
- **Core Tools (10)**: Essential web automation capabilities
- **Phase 1 (2)**: Element state checking and conditional logic
@@ -12,8 +12,9 @@ The cremote MCP server provides **27 comprehensive web automation tools** organi
- **Phase 3 (3)**: Form analysis and bulk operations
- **Phase 4 (4)**: Page state and metadata tools
- **Phase 5 (4)**: Enhanced screenshots and file management
- **Phase 6 (3)**: Accessibility tree support for semantic understanding
## Available Tools (27 Total)
## Available Tools (30 Total)
### 1. `web_navigate_cremotemcp`
Navigate to URLs and optionally take screenshots.
@@ -1337,12 +1338,89 @@ web_extract_multiple_cremotemcp:
footer_text: "footer"
```
### Phase 6: Accessibility Tree Support (3 Tools)
#### `get_accessibility_tree_cremotemcp`
Get the accessibility tree for a page, either in full or limited to a maximum depth, for semantic understanding of its structure.
**Parameters:**
- `tab` (optional): Tab ID, uses current tab if not specified
- `depth` (optional): Maximum depth to retrieve, omit for full tree
- `timeout` (optional): Timeout in seconds, default 5
**Use Cases:**
- Accessibility testing and compliance verification
- Understanding page structure for screen readers
- Semantic element discovery and analysis
**Example:**
```
get_accessibility_tree_cremotemcp:
depth: 3
timeout: 10
```
#### `get_partial_accessibility_tree_cremotemcp`
Get the accessibility tree for a specific element and its relatives (ancestors, siblings, children).
**Parameters:**
- `selector`: CSS selector for the target element (required)
- `tab` (optional): Tab ID, uses current tab if not specified
- `fetch_relatives` (optional): Include relatives, default true
- `timeout` (optional): Timeout in seconds, default 5
**Use Cases:**
- Focused accessibility analysis of specific components
- Form accessibility structure analysis
- Widget accessibility verification
**Example:**
```
get_partial_accessibility_tree_cremotemcp:
selector: "form.login-form"
fetch_relatives: true
timeout: 10
```
#### `query_accessibility_tree_cremotemcp`
Query the accessibility tree for nodes matching specific criteria (accessible name, role, or scope).
**Parameters:**
- `tab` (optional): Tab ID, uses current tab if not specified
- `selector` (optional): CSS selector to limit search scope
- `accessible_name` (optional): Accessible name to match
- `role` (optional): ARIA role to match (e.g., "button", "textbox", "link")
- `timeout` (optional): Timeout in seconds, default 5
**Use Cases:**
- Find elements by their accessible names (what screen readers announce)
- Locate elements by ARIA roles for semantic interaction
- Accessibility-aware element discovery and testing
**Examples:**
```
# Find all buttons on the page
query_accessibility_tree_cremotemcp:
role: "button"
# Find submit button by accessible name
query_accessibility_tree_cremotemcp:
accessible_name: "Submit"
role: "button"
# Find form controls within a specific form
query_accessibility_tree_cremotemcp:
selector: "form.checkout"
role: "textbox"
```
## Integration Notes
- Tools use the `_cremotemcp` suffix to avoid naming conflicts
- Responses include success status and descriptive messages
- Screenshots are saved to `/tmp/` directory with timestamps
- The underlying cremote daemon handles browser management
- Accessibility tree tools provide semantic understanding of page structure
## Advanced Usage Examples
@@ -1483,6 +1561,40 @@ web_screenshot_enhanced_cremotemcp:
full_page: true
```
### Accessibility Testing and Semantic Automation
```
# Navigate to page for accessibility testing
web_navigate_cremotemcp:
url: "https://myapp.com/form"
screenshot: true
# Get full accessibility tree to analyze structure
get_accessibility_tree_cremotemcp:
depth: 3
timeout: 10
# Find form elements by accessible names (more robust than CSS selectors)
query_accessibility_tree_cremotemcp:
accessible_name: "Email Address"
role: "textbox"
# Fill form using accessibility-aware approach
web_interact_cremotemcp:
action: "fill"
selector: "[aria-label='Email Address']"
value: "user@example.com"
# Find the submit button by accessible name (this query only locates it; it does not click)
query_accessibility_tree_cremotemcp:
accessible_name: "Submit Form"
role: "button"
# Verify form accessibility structure
get_partial_accessibility_tree_cremotemcp:
selector: "form"
fetch_relatives: true
```
## 🎯 Best Practices for LLM Agents
### 1. **Use Batch Operations**
@@ -1505,10 +1617,16 @@ web_screenshot_enhanced_cremotemcp:
- Leverage `console_logs_cremotemcp` for JavaScript error detection
- Take `web_screenshot_enhanced_cremotemcp` with metadata for comprehensive documentation
### 5. **Accessibility-Aware Automation**
- Use `query_accessibility_tree_cremotemcp` to find elements by accessible names instead of fragile selectors
- Verify accessibility compliance with `get_accessibility_tree_cremotemcp`
- Test screen reader compatibility by analyzing semantic structure
- Build more robust automation using ARIA roles and accessible names
## 🎉 Production Ready
This comprehensive web automation platform provides **27 tools** across 5 enhancement phases, optimized specifically for LLM agents and production workflows. All tools include proper error handling, timeout management, and structured responses for reliable automation.
This comprehensive web automation platform provides **30 tools** across 6 enhancement phases, optimized specifically for LLM agents and production workflows. All tools include proper error handling, timeout management, and structured responses for reliable automation.
---
**Ready for Production**: Complete web automation platform with 27 tools, designed for maximum efficiency and reliability in LLM-driven workflows.
**Ready for Production**: Complete web automation platform with 30 tools, designed for maximum efficiency and reliability in LLM-driven workflows.

278
mcp/MULTI_CLIENT_GUIDE.md Normal file

@@ -0,0 +1,278 @@
# Cremote MCP Multi-Client Support
The Cremote MCP server now supports multiple concurrent clients with isolated browser sessions. This allows multiple AI agents or applications to use the same cremote daemon simultaneously without interfering with each other's browser state.
## Architecture Overview
```
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Client A │ │ Client B │ │ Client N │
│ (Agent 1) │ │ (Agent 2) │ │ (Agent N) │
└─────────┬───────┘ └─────────┬───────┘ └─────────┬───────┘
│ │ │
│ HTTP/Session │ HTTP/Session │ HTTP/Session
│ │ │
└──────────────────────┼──────────────────────┘
┌─────────────┴─────────────┐
│ Cremote MCP Server │
│ (Session Manager) │
└─────────────┬─────────────┘
│ TCP
┌─────────────┴─────────────┐
│ Cremote Daemon │
│ (Shared Browser) │
└─────────────┬─────────────┘
│ DevTools Protocol
┌─────────────┴─────────────┐
│ Chrome/Chromium │
│ (All tabs accessible) │
└───────────────────────────┘
```
## Transport Modes
### 1. stdio Transport (Single Client - Legacy)
**Default mode** - Maintains backward compatibility with existing clients.
```bash
# Default mode (stdio)
./cremote-mcp
# Or explicitly set
CREMOTE_TRANSPORT=stdio ./cremote-mcp
```
- **Clients**: 1 concurrent client
- **Communication**: stdin/stdout JSON-RPC
- **Session Management**: Single global state
- **Use Case**: Existing integrations, single-agent scenarios
### 2. HTTP Transport (Multiple Clients - New)
**Multi-client mode** - Supports concurrent clients with session isolation.
```bash
# Enable HTTP transport
CREMOTE_TRANSPORT=http ./cremote-mcp
```
- **Clients**: Multiple concurrent clients
- **Communication**: HTTP POST/GET with JSON-RPC
- **Session Management**: Per-client isolated sessions
- **Use Case**: Multiple agents, concurrent automation
## Configuration
### Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| `CREMOTE_TRANSPORT` | `stdio` | Transport mode: `stdio` or `http` |
| `CREMOTE_HOST` | `localhost` | Cremote daemon host |
| `CREMOTE_PORT` | `8989` | Cremote daemon port |
| `CREMOTE_HTTP_HOST` | `localhost` | HTTP server host (HTTP mode only) |
| `CREMOTE_HTTP_PORT` | `8990` | HTTP server port (HTTP mode only) |
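A client or wrapper script may want to resolve these settings the same way the table describes. The following is a minimal Python sketch, assuming only the variable names and defaults listed above; `McpConfig` and `load_config` are hypothetical names for illustration and are not part of cremote, and rejecting unknown transport values is an assumption of this sketch.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class McpConfig:
    transport: str
    daemon_host: str
    daemon_port: int
    http_host: str
    http_port: int


def load_config(env=None):
    """Resolve settings from the environment, falling back to the documented defaults."""
    env = os.environ if env is None else env
    transport = env.get("CREMOTE_TRANSPORT", "stdio")
    # Assumption: only the two documented modes are valid.
    if transport not in ("stdio", "http"):
        raise ValueError(f"unsupported CREMOTE_TRANSPORT: {transport!r}")
    return McpConfig(
        transport=transport,
        daemon_host=env.get("CREMOTE_HOST", "localhost"),
        daemon_port=int(env.get("CREMOTE_PORT", "8989")),
        http_host=env.get("CREMOTE_HTTP_HOST", "localhost"),
        http_port=int(env.get("CREMOTE_HTTP_PORT", "8990")),
    )
```

Passing a plain dictionary instead of `os.environ` keeps the function easy to exercise in tests without touching the real environment.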
### Example Configurations
#### Single Client (Legacy)
```bash
export CREMOTE_HOST=localhost
export CREMOTE_PORT=8989
export CREMOTE_TRANSPORT=stdio
./cremote-mcp
```
#### Multiple Clients
```bash
export CREMOTE_HOST=localhost
export CREMOTE_PORT=8989
export CREMOTE_TRANSPORT=http
export CREMOTE_HTTP_HOST=localhost
export CREMOTE_HTTP_PORT=8990
./cremote-mcp
```
## Session Management
### Session Lifecycle
1. **Initialization**: The client sends an `initialize` request and receives an `Mcp-Session-Id` header in the response
2. **Operations**: All subsequent requests include the session ID header
3. **Isolation**: Each session maintains independent browser state
4. **Cleanup**: Sessions auto-expire after 30 minutes of inactivity
5. **Termination**: Clients can explicitly terminate a session with a DELETE request
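The cleanup and termination steps above can be modeled as an idle-timeout registry. This is an illustrative Python sketch only, not the server's actual implementation; `SessionRegistry` and its methods are hypothetical names, and the only value taken from this guide is the 30-minute inactivity timeout.

```python
import time

IDLE_TIMEOUT_SECONDS = 30 * 60  # 30 minutes of inactivity, per the lifecycle above


class SessionRegistry:
    """Tracks last-activity timestamps and drops sessions that have been idle too long."""

    def __init__(self, clock=time.monotonic, idle_timeout=IDLE_TIMEOUT_SECONDS):
        self._clock = clock
        self._idle_timeout = idle_timeout
        self._last_seen = {}

    def touch(self, session_id):
        """Record activity for a session, creating it on first use."""
        self._last_seen[session_id] = self._clock()

    def terminate(self, session_id):
        """Explicit termination, for example in response to a DELETE request."""
        self._last_seen.pop(session_id, None)

    def is_active(self, session_id):
        return session_id in self._last_seen

    def expire_idle(self):
        """Remove sessions idle for longer than the timeout and return their IDs."""
        now = self._clock()
        expired = [sid for sid, seen in self._last_seen.items()
                   if now - seen > self._idle_timeout]
        for sid in expired:
            del self._last_seen[sid]
        return expired
```

The clock is injectable so that expiry can be tested without waiting; a monotonic clock is used by default so wall-clock adjustments do not expire sessions early.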
### Session State
Each client session maintains isolated state:
- **Current Tab**: Independent active tab tracking
- **Tab History**: Per-client tab navigation history
- **Iframe Context**: Independent iframe mode state
- **Screenshots**: Per-client screenshot collection
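To make the isolation concrete, the four pieces of per-session state listed above could be grouped as follows. This is a hedged Python sketch; the `SessionState` class and its field names are hypothetical and do not reflect the server's real data structures.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SessionState:
    current_tab: Optional[str] = None                      # active tab for this client
    tab_history: List[str] = field(default_factory=list)   # tabs this client has used
    iframe_mode: bool = False                              # whether the client is inside an iframe context
    screenshots: List[str] = field(default_factory=list)   # screenshot paths for this client

    def open_tab(self, tab_id):
        """Make tab_id the active tab and remember it in this session's history."""
        self.current_tab = tab_id
        self.tab_history.append(tab_id)
```

Each list uses `default_factory` so that two sessions never share the same underlying list, which is the property the isolation guarantee depends on.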
### Session Headers
HTTP clients must include session headers:
```http
POST /mcp HTTP/1.1
Content-Type: application/json
Accept: application/json
Mcp-Session-Id: a1b2c3d4e5f6g7h8
MCP-Protocol-Version: 2025-06-18
```
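A client can attach these headers programmatically. The sketch below uses Python's standard `urllib.request` to build (but not send) such a request; the `build_mcp_request` helper is a hypothetical name, and the `/mcp` path and protocol version are taken from the example above.

```python
import json
import urllib.request

PROTOCOL_VERSION = "2025-06-18"


def build_mcp_request(base_url, session_id, payload):
    """Build a POST request to /mcp that carries the session headers."""
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json",
        "Mcp-Session-Id": session_id,
        "MCP-Protocol-Version": PROTOCOL_VERSION,
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/mcp", data=body, headers=headers, method="POST"
    )
```

The returned object can be passed to `urllib.request.urlopen` once the server is running. Note that `urllib` normalizes the capitalization of header names it stores; HTTP header names are case-insensitive, so this does not change their meaning.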
## Testing Multi-Client Setup
### Prerequisites
1. **Start Cremote Daemon**:
```bash
cremotedaemon
```
2. **Start Chrome with Remote Debugging**:
```bash
chromium --remote-debugging-port=9222 --user-data-dir=/tmp/chromium-debug
```
### Run Multi-Client Test
```bash
cd mcp/
./test_multiclient.sh
```
This test:
- Starts the MCP server in HTTP mode
- Creates 3 concurrent test clients
- Verifies each gets a unique session ID
- Tests session isolation
- Cleans up all sessions
### Manual Testing
1. **Start HTTP Server**:
```bash
CREMOTE_TRANSPORT=http ./cremote-mcp
```
2. **Test with curl**:
```bash
# Initialize session
curl -X POST http://localhost:8990/mcp \
-H "Content-Type: application/json" \
-H "Accept: application/json" \
-d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-06-18"}}'
# Use returned Mcp-Session-Id for subsequent requests
```
## Benefits
### For AI Agents
- **Concurrent Operations**: Multiple agents can browse simultaneously
- **State Isolation**: No interference between agent sessions
- **Resource Sharing**: Shared browser instance reduces memory usage
- **Session Cleanup**: Automatic expiry of idle sessions prevents resource leaks
### For Developers
- **Scalability**: Support multiple concurrent automations
- **Debugging**: Isolated sessions simplify troubleshooting
- **Flexibility**: Choose transport mode based on use case
- **Compatibility**: Backward compatible with existing stdio clients
## Limitations
### Current Implementation
- **Tool Coverage**: Not all 30 tools are session-aware yet (work in progress)
- **SSE Streaming**: Server-Sent Events not implemented yet
- **Advanced Features**: Some MCP protocol features pending
### Planned Improvements
- Complete tool migration to session-aware handlers
- SSE support for real-time notifications
- Enhanced session management features
- Performance optimizations
## Migration Guide
### From Single Client to Multi-Client
1. **Update Environment**:
```bash
# Old
./cremote-mcp
# New
CREMOTE_TRANSPORT=http ./cremote-mcp
```
2. **Update Client Code**:
- Switch from stdio to HTTP transport
- Handle session ID headers
- Implement proper session cleanup
3. **Test Thoroughly**:
- Verify session isolation
- Test concurrent operations
- Monitor resource usage
### Backward Compatibility
Existing stdio clients continue to work unchanged:
- No code changes required
- Same tool interface
- Same behavior and performance
## Troubleshooting
### Common Issues
1. **Session Not Found (404)**:
- Session expired (30-minute inactivity timeout)
- Invalid session ID
- Server restart cleared sessions
2. **Port Conflicts**:
- Change `CREMOTE_HTTP_PORT` if 8990 is in use
- Ensure cremote daemon port (8989) is available
3. **CORS Issues**:
- Server includes CORS headers for web clients
- Send the `Accept: application/json` header in requests
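For the Session Not Found (404) case in the list above, one possible client-side strategy is to re-initialize and retry once. The following Python sketch assumes the caller supplies two hypothetical callables: `send(session_id, payload)`, which raises `SessionExpired` when the server answers 404, and `initialize()`, which returns a fresh session ID. None of these names come from cremote.

```python
class SessionExpired(Exception):
    """Raised by the transport when the server answers 404 for a session ID."""


def call_with_reinit(send, initialize, payload, session_id):
    """Send payload; if the session is gone, obtain a new one and retry once.

    Returns a tuple of (result, session_id_in_use) so the caller can keep
    using the replacement session for later requests.
    """
    try:
        return send(session_id, payload), session_id
    except SessionExpired:
        new_id = initialize()
        # A second SessionExpired here propagates to the caller on purpose.
        return send(new_id, payload), new_id
```

Because session state is kept per session, a replacement session presumably starts without the previous current tab or iframe context, so a caller may need to navigate again after a retry.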
### Debug Mode
Enable debug logging:
```bash
CREMOTE_TRANSPORT=http ./cremote-mcp 2>&1 | tee mcp-debug.log
```
## Next Steps
1. **Complete Tool Migration**: Migrate remaining tools to session-aware handlers
2. **Add SSE Support**: Implement Server-Sent Events for streaming
3. **Enhanced Testing**: Add comprehensive integration tests
4. **Performance Tuning**: Optimize session management and cleanup
5. **Documentation**: Complete API documentation and examples


@@ -4,7 +4,18 @@ This is a Model Context Protocol (MCP) server that exposes cremote's web automat
## 🎉 Complete Web Automation Platform
**27 comprehensive tools** across 5 enhancement phases, providing a complete web automation toolkit for LLM agents:
**30 comprehensive tools** across 6 enhancement phases, providing a complete web automation toolkit for LLM agents:
### 🚀 **NEW: Multi-Client Support**
The Cremote MCP server now supports **multiple concurrent clients** with isolated browser sessions:
- **Concurrent Agents**: Multiple AI agents can use the same browser simultaneously
- **Session Isolation**: Each client maintains independent browser state (tabs, history, iframe context)
- **Transport Flexibility**: Choose between stdio (single client) or HTTP (multiple clients)
- **Backward Compatible**: Existing stdio clients continue to work unchanged
See the [Multi-Client Guide](MULTI_CLIENT_GUIDE.md) for detailed setup and usage instructions.
- **Phase 1**: Element state checking and conditional logic (2 tools)
- **Phase 2**: Enhanced data extraction and batch operations (4 tools)
@@ -22,6 +33,7 @@ This is a Model Context Protocol (MCP) server that exposes cremote's web automat
- **Rich Context**: Page metadata, performance metrics, and content verification
- **Enhanced Screenshots**: Element-specific and metadata-rich screenshot capture
- **File Management**: Bulk file operations and automated cleanup
- **Accessibility Tree**: Chrome accessibility tree interface for semantic understanding
- **Automatic Screenshots**: Optional screenshot capture for debugging and documentation
- **Error Recovery**: Better error handling and context for LLMs
- **Resource Management**: Automatic cleanup and connection management
@@ -30,7 +42,7 @@ This is a Model Context Protocol (MCP) server that exposes cremote's web automat
**For LLM agents**: See the comprehensive [LLM Usage Guide](LLM_USAGE_GUIDE.md) for detailed usage instructions, examples, and best practices.
## Available Tools (27 Total)
## Available Tools (30 Total)
### Version Information
@@ -608,13 +620,38 @@ go build -o cremote-mcp .
### Configuration
#### Basic Configuration (Single Client - stdio)
Set environment variables to configure the cremote connection:
```bash
export CREMOTE_HOST=localhost
export CREMOTE_PORT=8989
export CREMOTE_TRANSPORT=stdio # Default
```
#### Multi-Client Configuration (HTTP Transport)
For multiple concurrent clients:
```bash
export CREMOTE_HOST=localhost
export CREMOTE_PORT=8989
export CREMOTE_TRANSPORT=http
export CREMOTE_HTTP_HOST=localhost
export CREMOTE_HTTP_PORT=8990
```
#### Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| `CREMOTE_TRANSPORT` | `stdio` | Transport mode: `stdio` or `http` |
| `CREMOTE_HOST` | `localhost` | Cremote daemon host |
| `CREMOTE_PORT` | `8989` | Cremote daemon port |
| `CREMOTE_HTTP_HOST` | `localhost` | HTTP server host (HTTP mode only) |
| `CREMOTE_HTTP_PORT` | `8990` | HTTP server port (HTTP mode only) |
### Running with Claude Desktop
Add to your Claude Desktop configuration (`~/Library/Application Support/Claude/claude_desktop_config.json` on macOS):
@@ -772,6 +809,53 @@ All tool responses include:
}
```
### Phase 6: Accessibility Tree Support (3 Tools)
#### `get_accessibility_tree_cremotemcp`
Get the accessibility tree for a page, either in full or limited to a maximum depth.
```json
{
"name": "get_accessibility_tree_cremotemcp",
"arguments": {
"tab": "optional-tab-id",
"depth": 3,
"timeout": 10
}
}
```
#### `get_partial_accessibility_tree_cremotemcp`
Get the accessibility tree for a specific element and its relatives.
```json
{
"name": "get_partial_accessibility_tree_cremotemcp",
"arguments": {
"selector": "form",
"tab": "optional-tab-id",
"fetch_relatives": true,
"timeout": 10
}
}
```
#### `query_accessibility_tree_cremotemcp`
Query the accessibility tree for nodes matching specific criteria.
```json
{
"name": "query_accessibility_tree_cremotemcp",
"arguments": {
"tab": "optional-tab-id",
"selector": "form",
"accessible_name": "Submit",
"role": "button",
"timeout": 10
}
}
```
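When calling these tools from code rather than through an MCP client library, the name and arguments shown above are wrapped in a JSON-RPC 2.0 `tools/call` request. The sketch below shows one way to build that envelope in Python while leaving out optional parameters that were not set; `build_tool_call` is a hypothetical helper, not part of cremote.

```python
def build_tool_call(request_id, tool_name, **arguments):
    """Wrap a tool invocation in a JSON-RPC 2.0 tools/call request.

    Arguments passed as None are treated as "not specified" and omitted,
    so the server can apply its own defaults (for example, the current tab).
    """
    args = {key: value for key, value in arguments.items() if value is not None}
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": args},
    }
```

For example, `build_tool_call(7, "query_accessibility_tree_cremotemcp", role="button", accessible_name="Submit", tab=None)` produces a request whose arguments contain only `role` and `accessible_name`.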
## Benefits Over CLI
### 🎯 **Enhanced Efficiency**
@@ -796,8 +880,8 @@ All tool responses include:
This comprehensive web automation platform is **production ready** with:
- **27 Tools**: Complete coverage of web automation needs
- **5 Enhancement Phases**: Systematic capability building from basic to advanced
- **30 Tools**: Complete coverage of web automation needs
- **6 Enhancement Phases**: Systematic capability building from basic to advanced
- **Extensive Testing**: All tools validated and documented
- **LLM Optimized**: Designed specifically for AI agent workflows
- **Backward Compatible**: All existing tools continue to work unchanged
@@ -811,6 +895,7 @@ This comprehensive web automation platform is **production ready** with:
| **Form Automation** | 3 tools | Form analysis, bulk filling, batch interactions |
| **Page Intelligence** | 4 tools | Page state, performance metrics, content verification, viewport info |
| **Enhanced Capabilities** | 4 tools | Element screenshots, enhanced metadata, bulk file ops, file management |
| **Accessibility Tree** | 3 tools | Semantic understanding, accessibility testing, screen reader simulation |
## Development
@@ -825,4 +910,4 @@ The server is designed to be easily extensible while maintaining consistency wit
---
**🚀 Ready for Production**: Complete web automation platform with 27 tools across 5 enhancement phases, optimized for LLM agents and production workflows.
**🚀 Ready for Production**: Complete web automation platform with 30 tools across 6 enhancement phases, optimized for LLM agents and production workflows.

File diff suppressed because it is too large