# Chrome Remote Daemon (cremote)

A command line utility for automating browser interactions using Chrome's remote debugging protocol. The tool uses a daemon-client architecture to maintain persistent connections to the browser.

## Architecture

The tool consists of two main components:

1. **Daemon (`cremotedaemon`)**: A long-running process that connects to Chrome and manages browser state
2. **Client (`cremote`)**: A command-line client that sends commands to the daemon

This architecture provides several benefits:

- Persistent browser connection across multiple commands
- Reliable tab management
- No need to reconnect for each command
- Better performance

## MCP Server

Cremote includes a Model Context Protocol (MCP) server that provides a structured API for LLMs and AI agents. Instead of using CLI commands, the MCP server offers:

- **State Management**: Automatic tracking of tabs, history, and iframe context
- **Intelligent Abstractions**: High-level tools that combine multiple operations
- **Better Error Handling**: Rich error context for debugging
- **Automatic Screenshots**: Built-in screenshot capture for documentation

See the [MCP Server Documentation](mcp/README.md) for setup and usage instructions.

## Prerequisites

- Go 1.16 or higher
- A running instance of Chromium/Chrome with remote debugging enabled

### Starting Chromium with Remote Debugging

Before using this tool, you **must** start Chromium with remote debugging enabled on port 9222:

```bash
chromium --remote-debugging-port=9222 --user-data-dir=/tmp/chromium-debug
```

or for Chrome:

```bash
google-chrome --remote-debugging-port=9222 --user-data-dir=/tmp/chrome-debug
```

**Important**: The `--user-data-dir` flag is required to prevent conflicts with existing browser instances.

## Usage

### Starting the Daemon

First, start the daemon:

```bash
cremotedaemon
```

By default, the daemon listens on port 8989.
You can specify a different port:

```bash
cremotedaemon --port=9090
```

### Using the Client

Once the daemon is running, you can use the client to send commands:

```
cremote [options]
```

### Commands

- `open-tab`: Open a new tab and return its ID
- `load-url`: Load a URL in a tab
- `fill-form`: Fill a form field with a value
- `upload-file`: Upload a file to a file input
- `submit-form`: Submit a form
- `get-source`: Get the source code of a page
- `get-element`: Get the HTML of an element
- `click-element`: Click on an element
- `close-tab`: Close a tab
- `wait-navigation`: Wait for a navigation event
- `eval-js`: Execute JavaScript code in a tab
- `screenshot`: Take a screenshot of a tab
- `switch-iframe`: Switch to iframe context for subsequent commands
- `switch-main`: Switch back to main page context
- `list-tabs`: List all open tabs
- `status`: Check if the daemon is running

### Current Tab Feature

The tool tracks the current tab, so you can omit the `--tab` flag to use the most recently used tab. This makes interactive use more convenient. For example, after opening a tab:

```bash
# Open a tab
TAB_ID=$(cremote open-tab)

# Load a URL in the current tab (no need to specify --tab)
cremote load-url --url="https://example.com"

# Click an element in the current tab
cremote click-element --selector="a.button"
```

You can still specify a tab ID explicitly if you need to work with multiple tabs.

Run `cremote -h` for more information on a specific command.

### Examples

#### Check Daemon Status

```bash
cremote status
```

#### Open a new tab

```bash
cremote open-tab [--timeout=5]
```

This will return a tab ID that you can use in subsequent commands. The `--timeout` parameter specifies how many seconds to wait for the tab to open (default: 5 seconds).

#### Load a URL in a tab

```bash
cremote load-url --tab="" --url="https://example.com" [--timeout=5]
```

The `--timeout` parameter specifies how many seconds to wait for the URL to load (default: 5 seconds).

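The `open-tab` and `load-url` steps above can be combined into a small shell helper. This is only a sketch: `open_and_load` is a hypothetical function name, and it assumes that `cremote` exits with a non-zero status when a command fails or times out, which this README does not confirm.

```bash
#!/bin/bash
# Sketch: open a tab and load a URL, stopping at the first failure.
# Assumption (not confirmed by this README): cremote exits with a
# non-zero status when a command fails or times out.

# open_and_load is a hypothetical helper, not part of cremote.
# Usage: open_and_load URL [TIMEOUT_SECONDS]
open_and_load() {
    local url="$1"
    local timeout="${2:-5}"   # 5 seconds is the documented default
    local tab_id

    # open-tab prints the new tab ID on stdout
    tab_id=$(cremote open-tab --timeout="$timeout") || {
        echo "open-tab failed" >&2
        return 1
    }

    # Discard load-url's stdout so only the tab ID is printed below
    cremote load-url --tab="$tab_id" --url="$url" --timeout="$timeout" >/dev/null || {
        echo "load-url failed for $url" >&2
        return 1
    }

    # Print the tab ID so callers can capture it
    echo "$tab_id"
}
```

A caller might then write `TAB_ID=$(open_and_load "https://example.com" 10) || exit 1` and pass `--tab="$TAB_ID"` to later commands.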
#### Fill a form field

```bash
cremote fill-form --tab="" --selector="#username" --value="user123" [--selection-timeout=5] [--action-timeout=5]
```

The `--selection-timeout` parameter specifies how many seconds to wait for the element to appear in the DOM (default: 5 seconds).
The `--action-timeout` parameter specifies how many seconds to wait for the fill action to complete (default: 5 seconds).

#### Check/uncheck a checkbox or select a radio button

The same `fill-form` command can be used to check/uncheck checkboxes or select radio buttons:

```bash
# Check a checkbox
cremote fill-form --tab="" --selector="#agree" --value="true"

# Uncheck a checkbox
cremote fill-form --tab="" --selector="#agree" --value="false"

# Select a radio button
cremote fill-form --tab="" --selector="#option2" --value="true"
```

Accepted values for checking a checkbox or selecting a radio button: `true`, `1`, `yes`, `on`, `checked`.
Any other value will uncheck the checkbox or deselect the radio button.

#### Upload a file

```bash
cremote upload-file --tab="" --selector="input[type=file]" --file="/path/to/file.jpg" [--selection-timeout=5] [--action-timeout=5]
```

The `--selection-timeout` parameter specifies how many seconds to wait for the element to appear in the DOM (default: 5 seconds).
The `--action-timeout` parameter specifies how many seconds to wait for the upload action to complete (default: 5 seconds).

#### Submit a form

```bash
cremote submit-form --tab="" --selector="form#login" [--selection-timeout=5] [--action-timeout=5]
```

The `--selection-timeout` parameter specifies how many seconds to wait for the element to appear in the DOM (default: 5 seconds).
The `--action-timeout` parameter specifies how many seconds to wait for the form submission to complete (default: 5 seconds).

#### Get the source code of a page

```bash
cremote get-source --tab="" [--timeout=5]
```

The `--timeout` parameter specifies how many seconds to wait for getting the page source (default: 5 seconds).

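One way to use `get-source` in a script is to check whether the page contains a given string. The helper below is a sketch under assumptions: `page_contains` is a hypothetical name, and the code assumes that `get-source` writes the page source to stdout and exits non-zero on failure.

```bash
#!/bin/bash
# page_contains is a hypothetical helper, not part of cremote.
# Usage: page_contains TEXT [TIMEOUT_SECONDS]
# Returns 0 if the current tab's source contains TEXT as a literal
# string, 1 if it does not, and 2 if the source could not be fetched.
page_contains() {
    local needle="$1"
    local timeout="${2:-5}"   # 5 seconds is the documented default
    local html

    # Assumption: get-source prints the page source on stdout and
    # exits non-zero on failure.
    html=$(cremote get-source --timeout="$timeout") || return 2

    # -F: match a fixed string, -q: no output;
    # "--" protects needles that start with a dash
    printf '%s\n' "$html" | grep -Fq -- "$needle"
}
```

For example, `page_contains 'welcome-message' && echo "Logged in"` checks the current tab without parsing the HTML.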
#### Get the HTML of an element

```bash
cremote get-element --tab="" --selector=".content" [--selection-timeout=5]
```

The `--selection-timeout` parameter specifies how many seconds to wait for the element to appear in the DOM (default: 5 seconds).

#### Click on an element

```bash
cremote click-element --tab="" --selector="button.submit" [--selection-timeout=5] [--action-timeout=5]
```

The `--selection-timeout` parameter specifies how many seconds to wait for the element to appear in the DOM (default: 5 seconds).
The `--action-timeout` parameter specifies how many seconds to wait for the click action to complete (default: 5 seconds).

#### Close a tab

```bash
cremote close-tab --tab="" [--timeout=5]
```

The `--timeout` parameter specifies how many seconds to wait for the tab to close (default: 5 seconds).

#### Wait for navigation to complete

```bash
cremote wait-navigation --tab="" [--timeout=5]
```

The `--timeout` parameter specifies how many seconds to wait for navigation to complete (default: 5 seconds).

**Note**: `wait-navigation` intelligently detects if navigation is actually happening and returns immediately if the page is already stable, preventing unnecessary waiting.

#### Execute JavaScript code

```bash
cremote eval-js --tab="" --code="document.getElementById('myElement').innerHTML = 'Hello World!'" [--timeout=5]
```

The `--timeout` parameter specifies how many seconds to wait for the JavaScript execution to complete (default: 5 seconds).

This command allows you to execute arbitrary JavaScript code in a tab.

Examples:

- Set element content: `--code="document.getElementById('tinymce').innerHTML='Foo!'"`
- Get element text: `--code="document.querySelector('.result').textContent"`
- Trigger events: `--code="document.getElementById('button').click()"`
- Manipulate DOM: `--code="document.body.style.backgroundColor = 'red'"`

The command handles both JavaScript expressions and statements:

- **Expressions** (return values): `document.title`, `2 + 3`, `element.textContent`
- **Statements** (assignments/actions): `document.title = 'New Title'`, `element.click()`

For statements, the command returns "undefined". For expressions, it returns the result as a string.

#### Take a screenshot

```bash
cremote screenshot --tab="" --output="/path/to/screenshot.png" [--full-page] [--timeout=5]
```

The `--output` parameter specifies where to save the screenshot (PNG format).
The `--full-page` flag captures the entire page instead of just the viewport (default: viewport only).
The `--timeout` parameter specifies how many seconds to wait for the screenshot to complete (default: 5 seconds).

#### Working with iframes

To interact with content inside an iframe, you need to switch the context:

```bash
# Switch to iframe context
cremote switch-iframe --tab="" --selector="iframe#payment-form"

# Now all subsequent commands will operate within the iframe
cremote fill-form --selector="#card-number" --value="4111111111111111"
cremote fill-form --selector="#expiry" --value="12/25"
cremote click-element --selector="#submit-payment"

# Switch back to main page context
cremote switch-main --tab=""

# Now commands operate on the main page again
cremote get-element --selector=".success-message"
```

**Important Notes:**

- Once you switch to an iframe, all subsequent commands (fill-form, click-element, eval-js, etc.) operate within that iframe
- You must use `switch-main` to return to the main page context
- Each tab maintains its own iframe context independently
- Iframe context persists until explicitly switched back to main or the tab is closed

#### List all open tabs

```bash
cremote list-tabs
```

This will display all open tabs with their IDs and URLs. The current tab is marked with an asterisk (*).

### Connecting to a Remote Daemon

By default, the client connects to a daemon running on localhost. To connect to a daemon running on a different host:

```bash
cremote open-tab --host="remote-host" --port=8989
```

## Automation Example

Here's an example of how to use cremote in a shell script to automate a login process:

```bash
#!/bin/bash

# Make sure Chromium is running with remote debugging enabled
# chromium --remote-debugging-port=9222 --user-data-dir=/tmp/chromium-debug &

# Make sure the daemon is running
# cremotedaemon &

# Open a new tab
TAB_ID=$(cremote open-tab)

# Load the login page (using the current tab)
cremote load-url --url="https://example.com/login"

# Fill in the username and password (using the current tab)
cremote fill-form --selector="#username" --value="user123"
cremote fill-form --selector="#password" --value="password123"

# Check the 'Remember me' checkbox
cremote fill-form --selector="#remember" --value="true"

# Accept the terms and conditions
cremote fill-form --selector="#terms" --value="true"

# Either submit the form using the form selector (using the current tab)
cremote submit-form --selector="form#login"

# Or click the login button (using the current tab)
# cremote click-element --selector="#login-button"

# You can still specify a tab ID explicitly if needed
# cremote load-url --tab="$TAB_ID" --url="https://example.com/login"

# Wait for navigation to complete (using the current tab)
cremote wait-navigation --timeout=30

# Execute JavaScript to check if login was successful
LOGIN_STATUS=$(cremote eval-js --code="document.querySelector('.welcome-message') !== null")
if [ "$LOGIN_STATUS" = "true" ]; then
    echo "Login successful!"
fi

# Example: Working with an iframe (e.g., payment form)
# Switch to iframe context
cremote switch-iframe --selector="iframe.payment-frame"

# Fill payment form inside iframe
cremote fill-form --selector="#card-number" --value="4111111111111111"
cremote fill-form --selector="#expiry-date" --value="12/25"
cremote click-element --selector="#pay-button"

# Switch back to main page
cremote switch-main

# Get the source code of the page after login (using the current tab)
cremote get-source

# Take a screenshot of the logged-in page
cremote screenshot --output="/tmp/login-success.png"

# Take a full-page screenshot for documentation
cremote screenshot --output="/tmp/full-page.png" --full-page

# Close the current tab
cremote close-tab
```

## Troubleshooting

### Daemon Not Running

If you see an error like "connection refused", make sure the daemon is running:

```bash
cremote status
```

If the daemon is not running, start it:

```bash
cremotedaemon
```

### Connection Issues

If the daemon can't connect to Chromium, check the following:

1. Make sure Chromium/Chrome is running with remote debugging enabled on port 9222
2. Verify that Chromium was started with the correct flags: `--remote-debugging-port=9222 --user-data-dir=/tmp/chromium-debug`
3. Check if you can access the Chromium DevTools Protocol by opening `http://localhost:9222/json/version` in your browser

### Tab Management

The daemon manages tab IDs for you, so you don't need to worry about tab persistence between commands. However:

1. Tab IDs are only valid for the duration of the browser session
2. If Chromium is restarted, you'll need to get new tab IDs
3. Store the tab ID returned by `open-tab` in a variable for use in subsequent commands
4. If a tab is closed by Chromium (not through the tool), you may need to run `open-tab` again

## License

MIT