# Chrome Remote Daemon (cremote)

A command-line utility for automating browser interactions using Chrome's remote debugging protocol. The tool uses a daemon-client architecture to maintain persistent connections to the browser.

## Architecture

The tool consists of two main components:

1. **Daemon (`cremotedaemon`)**: A long-running process that connects to Chrome and manages browser state
2. **Client (`cremote`)**: A command-line client that sends commands to the daemon

This architecture provides several benefits:
- Persistent browser connection across multiple commands
- Reliable tab management
- No need to reconnect for each command
- Better performance

## MCP Server

Cremote includes a Model Context Protocol (MCP) server that provides a structured API for LLMs and AI agents. Instead of using CLI commands, the MCP server offers:

- **State Management**: Automatic tracking of tabs, history, and iframe context
- **Intelligent Abstractions**: High-level tools that combine multiple operations
- **Better Error Handling**: Rich error context for debugging
- **Automatic Screenshots**: Built-in screenshot capture for documentation

See the [MCP Server Documentation](mcp/README.md) for setup and usage instructions.

## Prerequisites

- Go 1.16 or higher
- A running instance of Chromium/Chrome with remote debugging enabled

### Starting Chromium with Remote Debugging

Before using this tool, you **must** start Chromium with remote debugging enabled on port 9222:

```bash
chromium --remote-debugging-port=9222 --user-data-dir=/tmp/chromium-debug
```

or for Chrome:

```bash
google-chrome --remote-debugging-port=9222 --user-data-dir=/tmp/chrome-debug
```

**Important**: The `--user-data-dir` flag is required to prevent conflicts with existing browser instances.

## Usage

### Starting the Daemon

First, start the daemon:

```bash
cremotedaemon
```

By default, the daemon listens on port 8989. You can specify a different port:

```bash
cremotedaemon --port=9090
```

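When the daemon is started from a script, the first client command can run before the daemon is ready. The following is a minimal sketch of a readiness loop. `wait_for_daemon` is a hypothetical helper name, and the loop assumes that `cremote status` exits with a non-zero status while the daemon is unreachable, which this README does not state explicitly.

```bash
# Hypothetical helper: poll `cremote status` until it succeeds or we give up.
# Assumption: `cremote status` exits non-zero while the daemon is unreachable.
wait_for_daemon() {
  local attempts="${1:-10}"   # number of polls before giving up
  local delay="${2:-1}"       # seconds to sleep between polls
  local i=0
  while [ "$i" -lt "$attempts" ]; do
    if cremote status >/dev/null 2>&1; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "daemon did not become ready after $attempts attempts" >&2
  return 1
}
```

A script could then run `cremotedaemon &` followed by `wait_for_daemon 10 1 || exit 1` before issuing any other command.
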
### Using the Client

Once the daemon is running, you can use the client to send commands:

```
cremote <command> [options]
```

### Commands

- `version`: Show version information for CLI and daemon
- `open-tab`: Open a new tab and return its ID
- `load-url`: Load a URL in a tab
- `fill-form`: Fill a form field with a value (also handles checkboxes, radio buttons, and dropdown selections)
- `upload-file`: Upload a file to a file input
- `submit-form`: Submit a form
- `get-source`: Get the source code of a page
- `get-element`: Get the HTML of an element
- `click-element`: Click on an element
- `close-tab`: Close a tab
- `wait-navigation`: Wait for a navigation event
- `eval-js`: Execute JavaScript code in a tab
- `screenshot`: Take a screenshot of a tab
- `switch-iframe`: Switch to iframe context for subsequent commands
- `switch-main`: Switch back to main page context
- `list-tabs`: List all open tabs
- `status`: Check if the daemon is running

### Current Tab Feature

The tool tracks the current tab, so you can omit the `--tab` flag to use the most recently used tab. This makes interactive use more convenient.

For example, after opening a tab:

```bash
# Open a tab
TAB_ID=$(cremote open-tab)

# Load a URL in the current tab (no need to specify --tab)
cremote load-url --url="https://example.com"

# Click an element in the current tab
cremote click-element --selector="a.button"
```

You can still specify a tab ID explicitly if you need to work with multiple tabs.

Run `cremote <command> -h` for more information on a specific command.

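When a script drives several tabs at once, relying on the current tab becomes fragile, so passing `--tab` explicitly is the safer pattern. A minimal sketch follows. `open_and_load` is a hypothetical helper, and it assumes that `cremote open-tab` prints only the tab ID on stdout, as the `TAB_ID=$(cremote open-tab)` usage above suggests.

```bash
# Hypothetical helper: open a tab, load a URL in it, and print the tab ID.
# Assumption: `cremote open-tab` prints only the tab ID on stdout.
open_and_load() {
  local url="$1"
  local tab_id
  tab_id=$(cremote open-tab) || return 1
  # Pass --tab explicitly so the call does not depend on the current tab.
  cremote load-url --tab="$tab_id" --url="$url" >/dev/null || return 1
  printf '%s\n' "$tab_id"
}
```

For example, `DOCS_TAB=$(open_and_load "https://example.com/docs")` and `NEWS_TAB=$(open_and_load "https://example.com/news")` give two independent IDs that later commands can target with `--tab="$DOCS_TAB"` or `--tab="$NEWS_TAB"`.
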
### Examples

#### Check Daemon Status

```bash
cremote status
```

#### Open a new tab

```bash
cremote open-tab [--timeout=5]
```

This will return a tab ID that you can use in subsequent commands. The `--timeout` parameter specifies how many seconds to wait for the tab to open (default: 5 seconds).

#### Load a URL in a tab

```bash
cremote load-url --tab="<tab-id>" --url="https://example.com" [--timeout=5]
```

The `--timeout` parameter specifies how many seconds to wait for the URL to load (default: 5 seconds).

#### Fill a form field

```bash
cremote fill-form --tab="<tab-id>" --selector="#username" --value="user123" [--timeout=5]
```

The `--timeout` parameter specifies how many seconds to wait for the fill operation to complete (default: 5 seconds).

#### Check/uncheck a checkbox or select a radio button

The same `fill-form` command can be used to check/uncheck checkboxes or select radio buttons:

```bash
# Check a checkbox
cremote fill-form --tab="<tab-id>" --selector="#agree" --value="true"

# Uncheck a checkbox
cremote fill-form --tab="<tab-id>" --selector="#agree" --value="false"

# Select a radio button
cremote fill-form --tab="<tab-id>" --selector="#option2" --value="true"
```

Accepted values for checking a checkbox or selecting a radio button: `true`, `1`, `yes`, `on`, `checked`.
Any other value will uncheck the checkbox or deselect the radio button.

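Because any value outside that list unchecks the control, a misspelled value such as `ture` would uncheck a box rather than raise an error. A script can guard against this by validating the value first. The sketch below uses a hypothetical helper, `is_checked_value`, and assumes the comparison is made against the exact lowercase strings listed above; whether cremote itself matches case-insensitively is not stated here.

```bash
# Hypothetical helper mirroring the accepted "checked" values listed above.
# Assumption: matching uses the exact lowercase strings shown in this README.
is_checked_value() {
  case "$1" in
    true|1|yes|on|checked) return 0 ;;
    *) return 1 ;;
  esac
}
```

A caller might write `is_checked_value "$VALUE" || echo "note: '$VALUE' will uncheck the box" >&2` before invoking `fill-form`.
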
#### Select dropdown options

The `fill-form` command can also be used to select options in dropdown elements:

```bash
# Select by option text (visible text)
cremote fill-form --tab="<tab-id>" --selector="#country" --value="United States"

# Select by option value (value attribute)
cremote fill-form --tab="<tab-id>" --selector="#state" --value="CA"
```

The command automatically detects dropdown elements and tries both option text and option value matching. This works with both `<select>` elements and custom dropdown implementations.

#### Upload a file

```bash
cremote upload-file --tab="<tab-id>" --selector="input[type=file]" --file="/path/to/file.jpg" [--timeout=5]
```

This command automatically:
1. **Transfers the file** from your local machine to the daemon container (if running in a container)
2. **Uploads the file** to the specified file input element on the web page

The `--timeout` parameter specifies how many seconds to wait for the upload operation to complete (default: 5 seconds).

**Note**: The file path should be the local path on your machine. The command will handle transferring it to the daemon container automatically.

#### Submit a form

```bash
cremote submit-form --tab="<tab-id>" --selector="form#login" [--timeout=5]
```

The `--timeout` parameter specifies how many seconds to wait for the form submission to complete (default: 5 seconds).

#### Get the source code of a page

```bash
cremote get-source --tab="<tab-id>" [--timeout=5]
```

The `--timeout` parameter specifies how many seconds to wait for getting the page source (default: 5 seconds).

#### Get the HTML of an element

```bash
cremote get-element --tab="<tab-id>" --selector=".content" [--timeout=5]
```

The `--timeout` parameter specifies how many seconds to wait for the element to appear in the DOM (default: 5 seconds).

#### Click on an element

```bash
cremote click-element --tab="<tab-id>" --selector="button.submit" [--timeout=5]
```

The `--timeout` parameter specifies how many seconds to wait for the click operation to complete (default: 5 seconds).

#### Close a tab

```bash
cremote close-tab --tab="<tab-id>" [--timeout=5]
```

The `--timeout` parameter specifies how many seconds to wait for the tab to close (default: 5 seconds).

#### Wait for navigation to complete

```bash
cremote wait-navigation --tab="<tab-id>" [--timeout=5]
```

The `--timeout` parameter specifies how many seconds to wait for navigation to complete (default: 5 seconds).

**Note**: `wait-navigation` detects whether a navigation is actually in progress and returns immediately if the page is already stable, which avoids unnecessary waiting.

#### Execute JavaScript code

```bash
cremote eval-js --tab="<tab-id>" --code="document.getElementById('myElement').innerHTML = 'Hello World!'" [--timeout=5]
```

The `--timeout` parameter specifies how many seconds to wait for the JavaScript execution to complete (default: 5 seconds).

This command allows you to execute arbitrary JavaScript code in a tab. Examples:
- Set element content: `--code="document.getElementById('tinymce').innerHTML='Foo!'"`
- Get element text: `--code="document.querySelector('.result').textContent"`
- Trigger events: `--code="document.getElementById('button').click()"`
- Manipulate DOM: `--code="document.body.style.backgroundColor = 'red'"`

The command handles both JavaScript expressions and statements:
- **Expressions** (return values): `document.title`, `2 + 3`, `element.textContent`
- **Statements** (assignments/actions): `document.title = 'New Title'`, `element.click()`

For statements, the command returns "undefined". For expressions, it returns the result as a string.

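Since expression results come back as strings, a shell script can capture them with command substitution and compare them as text. The sketch below shows one way to do that. `js_value` and `element_exists` are hypothetical helpers, and the code assumes that `eval-js` writes only the result to stdout and that a boolean result is printed as `true` or `false`.

```bash
# Hypothetical helper: evaluate a JavaScript expression in the current tab
# and print its result. Assumption: eval-js writes only the result to stdout.
js_value() {
  cremote eval-js --code="$1"
}

# Hypothetical helper: succeed only if a CSS selector matches an element.
# Note: the selector is pasted into a single-quoted JavaScript string, so it
# must not contain single quotes itself.
element_exists() {
  local selector="$1"
  local result
  result=$(js_value "document.querySelector('$selector') !== null") || return 1
  [ "$result" = "true" ]
}
```

A login check could then read `if element_exists ".welcome-message"; then echo "Login successful!"; fi`.
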
#### Take a screenshot

```bash
cremote screenshot --tab="<tab-id>" --output="/path/to/screenshot.png" [--full-page] [--timeout=5]
```

The `--output` parameter specifies where to save the screenshot (PNG format).
The `--full-page` flag captures the entire page instead of just the viewport (default: viewport only).
The `--timeout` parameter specifies how many seconds to wait for the screenshot to complete (default: 5 seconds).

#### Working with iframes

To interact with content inside an iframe, you need to switch the context:

```bash
# Switch to iframe context
cremote switch-iframe --tab="<tab-id>" --selector="iframe#payment-form"

# Now all subsequent commands will operate within the iframe
cremote fill-form --selector="#card-number" --value="4111111111111111"
cremote fill-form --selector="#expiry" --value="12/25"
cremote click-element --selector="#submit-payment"

# Switch back to main page context
cremote switch-main --tab="<tab-id>"

# Now commands operate on the main page again
cremote get-element --selector=".success-message"
```

**Important Notes:**
- Once you switch to an iframe, all subsequent commands (fill-form, click-element, eval-js, etc.) operate within that iframe
- You must use `switch-main` to return to the main page context
- Each tab maintains its own iframe context independently
- Iframe context persists until explicitly switched back to main or the tab is closed

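Because the iframe context persists until `switch-main` is called, a script that exits an iframe step early can leave later commands running in the wrong context. One way to guard against that is a wrapper that always switches back. The sketch below operates on the current tab; `in_iframe` is a hypothetical helper, and it assumes cremote commands exit with a non-zero status on failure.

```bash
# Hypothetical helper: run one command inside an iframe, then always
# switch back to the main page, even if the command fails.
# Usage: in_iframe <iframe-selector> <command> [args...]
in_iframe() {
  local selector="$1"
  local rc
  shift
  cremote switch-iframe --selector="$selector" || return 1
  "$@"
  rc=$?
  # Restore the main page context regardless of the command's outcome.
  cremote switch-main
  return "$rc"
}
```

For example, `in_iframe "iframe#payment-form" cremote click-element --selector="#submit-payment"` clicks inside the iframe and returns to the main page afterwards.
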
#### List all open tabs

```bash
cremote list-tabs
```

This will display all open tabs with their IDs and URLs. The current tab is marked with an asterisk (`*`).

### Connecting to a Remote Daemon

By default, the client connects to a daemon running on localhost. To connect to a daemon running on a different host:

```bash
cremote open-tab --host="remote-host" --port=8989
```

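Repeating `--host` and `--port` on every call gets tedious in longer scripts. A small wrapper can add them once. This is only a sketch: `rcremote`, `CREMOTE_HOST`, and `CREMOTE_PORT` are illustrative names that cremote itself does not read, and the wrapper assumes that every subcommand accepts `--host` and `--port` the way `open-tab` does in the example above.

```bash
# Hypothetical wrapper: send every command to a remote daemon.
# CREMOTE_HOST and CREMOTE_PORT are illustrative variable names, not
# settings that cremote itself reads.
# Assumption: every subcommand accepts --host and --port like open-tab does.
rcremote() {
  local cmd="$1"
  shift
  cremote "$cmd" --host="${CREMOTE_HOST:-localhost}" --port="${CREMOTE_PORT:-8989}" "$@"
}
```

With `CREMOTE_HOST=remote-host` set, `rcremote load-url --url="https://example.com"` expands to the equivalent `cremote load-url --host="remote-host" --port=8989 --url="https://example.com"` call.
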
## Automation Example

Here's an example of how to use cremote in a shell script to automate a login process:

```bash
#!/bin/bash

# Make sure Chromium is running with remote debugging enabled
# chromium --remote-debugging-port=9222 --user-data-dir=/tmp/chromium-debug &

# Make sure the daemon is running
# cremotedaemon &

# Open a new tab
TAB_ID=$(cremote open-tab)

# Load the login page (using the current tab)
cremote load-url --url="https://example.com/login"

# Fill in the username and password (using the current tab)
cremote fill-form --selector="#username" --value="user123"
cremote fill-form --selector="#password" --value="password123"

# Check the 'Remember me' checkbox
cremote fill-form --selector="#remember" --value="true"

# Accept the terms and conditions
cremote fill-form --selector="#terms" --value="true"

# Either submit the form using the form selector (using the current tab)
cremote submit-form --selector="form#login"

# Or click the login button (using the current tab)
# cremote click-element --selector="#login-button"

# You can still specify a tab ID explicitly if needed
# cremote load-url --tab="$TAB_ID" --url="https://example.com/login"

# Wait for navigation to complete (using the current tab)
cremote wait-navigation --timeout=30

# Execute JavaScript to check if login was successful
LOGIN_STATUS=$(cremote eval-js --code="document.querySelector('.welcome-message') !== null")
if [ "$LOGIN_STATUS" = "true" ]; then
    echo "Login successful!"
fi

# Example: Working with an iframe (e.g., payment form)
# Switch to iframe context
cremote switch-iframe --selector="iframe.payment-frame"

# Fill payment form inside iframe
cremote fill-form --selector="#card-number" --value="4111111111111111"
cremote fill-form --selector="#expiry-date" --value="12/25"
cremote click-element --selector="#pay-button"

# Switch back to main page
cremote switch-main

# Get the source code of the page after login (using the current tab)
cremote get-source

# Take a screenshot of the logged-in page
cremote screenshot --output="/tmp/login-success.png"

# Take a full-page screenshot for documentation
cremote screenshot --output="/tmp/full-page.png" --full-page

# Close the current tab
cremote close-tab
```

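The script above keeps going even if an individual command fails. In longer automations it can help to stop at the first failure and report which step broke. The following is a minimal sketch of that idea. `step` is a hypothetical helper, and it assumes cremote exits with a non-zero status when a command fails, which this README does not confirm.

```bash
# Hypothetical helper: run one step and report which step failed.
# Assumption: cremote exits with a non-zero status when a command fails.
# Usage: step <description> <command> [args...]
step() {
  local description="$1"
  shift
  if ! "$@"; then
    echo "FAILED: $description" >&2
    return 1
  fi
}
```

A script might then write `step "load login page" cremote load-url --url="https://example.com/login" || exit 1` for each stage of the flow.
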
## Troubleshooting

### Daemon Not Running

If you see an error like "connection refused", make sure the daemon is running:

```bash
cremote status
```

If the daemon is not running, start it:

```bash
cremotedaemon
```

### Connection Issues

If the daemon can't connect to Chromium, check the following:

1. Make sure Chromium/Chrome is running with remote debugging enabled on port 9222
2. Verify that Chromium was started with the correct flags: `--remote-debugging-port=9222 --user-data-dir=/tmp/chromium-debug`
3. Check if you can access the Chromium DevTools Protocol by opening `http://localhost:9222/json/version` in your browser

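The third check can also be run from a terminal, which is handy on headless machines. The sketch below wraps it in a hypothetical helper, `chrome_debug_ready`, and assumes `curl` is installed; it only tests that the endpoint answers, not that the daemon can attach to it.

```bash
# Hypothetical helper: check that Chrome's DevTools HTTP endpoint answers.
# Uses the /json/version URL mentioned above; the port defaults to 9222.
chrome_debug_ready() {
  local port="${1:-9222}"
  # -s: silent, -f: fail on HTTP errors, --max-time: do not hang forever.
  curl -sf --max-time 2 "http://localhost:${port}/json/version" >/dev/null
}
```

For example, `chrome_debug_ready || echo "Chromium is not listening on port 9222" >&2` gives a quick pass/fail signal before starting the daemon.
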
### Tab Management

The daemon manages tab IDs for you, so you don't need to worry about tab persistence between commands. However:

1. Tab IDs are only valid for the duration of the browser session
2. If Chromium is restarted, you'll need to get new tab IDs
3. Store the tab ID returned by `open-tab` in a variable for use in subsequent commands
4. If a tab is closed by Chromium (not through the tool), you may need to run `open-tab` again

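A script that holds on to a tab ID for a long time can check whether the ID is still known before reusing it. The sketch below is one possible approach. `tab_exists` is a hypothetical helper, and it assumes the tab ID appears verbatim somewhere in the `list-tabs` output, whose exact format is not specified in this README.

```bash
# Hypothetical helper: succeed if a stored tab ID still appears in list-tabs.
# Assumption: the ID is printed verbatim somewhere in the list-tabs output.
tab_exists() {
  local tab_id="$1"
  [ -n "$tab_id" ] || return 1
  # -F: match the ID as a fixed string; -q: only report via exit status.
  cremote list-tabs 2>/dev/null | grep -qF -- "$tab_id"
}
```

A caller could recover from a stale ID with `tab_exists "$TAB_ID" || TAB_ID=$(cremote open-tab)`.
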
## License

MIT