shortcut/cremote

Josh at WLTechBlog f09cdc2973 crash fix

2025-12-24 11:00:37 -07:00


Chrome Remote Daemon (cremote)

A command-line utility for automating browser interactions using Chrome's remote debugging protocol. The tool uses a daemon-client architecture to maintain a persistent connection to the browser.

Architecture

The tool consists of two main components:

Daemon (cremotedaemon): A long-running process that connects to Chrome and manages browser state
Client (cremote): A command-line client that sends commands to the daemon

This architecture provides several benefits:

Persistent browser connection across multiple commands
Reliable tab management
No need to reconnect for each command
Better performance

MCP Server

Cremote includes a Model Context Protocol (MCP) server that provides a structured API for LLMs and AI agents. Instead of using CLI commands, the MCP server offers:

State Management: Automatic tracking of tabs, history, and iframe context
Intelligent Abstractions: High-level tools that combine multiple operations
Better Error Handling: Rich error context for debugging
Automatic Screenshots: Built-in screenshot capture for documentation

See the MCP Server Documentation for setup and usage instructions.

Prerequisites

Go 1.16 or higher
A running instance of Chromium/Chrome with remote debugging enabled

Starting Chromium with Remote Debugging

Before using this tool, you must start Chromium with remote debugging enabled on port 9222:

chromium --remote-debugging-port=9222 --user-data-dir=/tmp/chromium-debug

or for Chrome:

google-chrome --remote-debugging-port=9222 --user-data-dir=/tmp/chrome-debug

Important: The --user-data-dir flag is required to prevent conflicts with existing browser instances.

Usage

Starting the Daemon

First, start the daemon:

cremotedaemon

By default, the daemon listens on port 8989. You can specify a different port:

cremotedaemon --port=9090

Using the Client

Once the daemon is running, you can use the client to send commands:

cremote <command> [options]

Commands

version: Show version information for CLI and daemon
open-tab: Open a new tab and return its ID
load-url: Load a URL in a tab
fill-form: Fill a form field with a value (also handles checkboxes, radio buttons, and dropdown selections)
upload-file: Upload a file to a file input
submit-form: Submit a form
get-source: Get the source code of a page
get-element: Get the HTML of an element
click-element: Click on an element
close-tab: Close a tab
wait-navigation: Wait for a navigation event
eval-js: Execute JavaScript code in a tab
switch-iframe: Switch to iframe context for subsequent commands
switch-main: Switch back to main page context
list-tabs: List all open tabs
disable-cache: Disable browser cache for a tab
enable-cache: Enable browser cache for a tab
clear-cache: Clear browser cache for a tab
clear-all-site-data: Clear all site data (cookies, storage, cache, etc.)
clear-cookies: Clear cookies for a tab
clear-storage: Clear web storage (localStorage, sessionStorage, etc.)
drag-and-drop: Drag and drop from source element to target element
drag-and-drop-coordinates: Drag and drop from source element to specific coordinates
drag-and-drop-offset: Drag and drop from source element by relative offset
right-click: Right-click on an element to open context menus
double-click: Double-click on an element for file operations or text selection
middle-click: Middle-click on an element (typically opens links in new tabs)
hover: Hover over an element to trigger tooltips or dropdowns
mouse-move: Move mouse to specific coordinates without clicking
scroll-wheel: Scroll with mouse wheel at specific coordinates
key-combination: Send key combinations (Ctrl+C, Alt+Tab, Shift+Enter, etc.)
special-key: Send special keys (Enter, Escape, Tab, F1-F12, Arrow keys, etc.)
modifier-click: Click with modifier keys (Ctrl+click, Shift+click for multi-selection)
status: Check if the daemon is running
screenshot: Take a screenshot of a tab (viewport or full page)

Current Tab Feature

The tool tracks the current tab, so you can omit the --tab flag to use the most recently used tab. This makes interactive use more convenient.

For example, after opening a tab:

# Open a tab
TAB_ID=$(cremote open-tab)

# Load a URL in the current tab (no need to specify --tab)
cremote load-url --url="https://example.com"

# Click an element in the current tab
cremote click-element --selector="a.button"

You can still specify a tab ID explicitly if you need to work with multiple tabs.

Run cremote <command> -h for more information on a specific command.

Examples

Check Daemon Status

cremote status

Open a new tab

cremote open-tab [--timeout=5]

This will return a tab ID that you can use in subsequent commands. The --timeout parameter specifies how many seconds to wait for the tab to open (default: 5 seconds).
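Because later commands depend on the ID that open-tab prints, a script may want to stop early when no ID comes back. The following is a minimal sketch, not part of cremote: the helper name open_tab_checked is hypothetical, and it assumes that cremote exits with a non-zero status on failure and prints only the tab ID on standard output.

```shell
# Hypothetical helper: open a tab and fail loudly if no ID is returned.
# Assumes `cremote open-tab` prints only the tab ID and exits non-zero on error.
open_tab_checked() {
    timeout="${1:-5}"   # seconds, same default as the CLI
    tab_id=$(cremote open-tab --timeout="$timeout") || {
        echo "open-tab failed" >&2
        return 1
    }
    if [ -z "$tab_id" ]; then
        echo "open-tab returned an empty tab ID" >&2
        return 1
    fi
    printf '%s\n' "$tab_id"
}
```

A script could then write TAB_ID=$(open_tab_checked 10) || exit 1 before issuing any further commands.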

Load a URL in a tab

cremote load-url --tab="<tab-id>" --url="https://example.com" [--timeout=5]

The --timeout parameter specifies how many seconds to wait for the URL to load (default: 5 seconds).

Fill a form field

cremote fill-form --tab="<tab-id>" --selector="#username" --value="user123" [--timeout=5]

The --timeout parameter specifies how many seconds to wait for the fill operation to complete (default: 5 seconds).

Check/uncheck a checkbox or select a radio button

The same fill-form command can be used to check/uncheck checkboxes or select radio buttons:

# Check a checkbox
cremote fill-form --tab="<tab-id>" --selector="#agree" --value="true"

# Uncheck a checkbox
cremote fill-form --tab="<tab-id>" --selector="#agree" --value="false"

# Select a radio button
cremote fill-form --tab="<tab-id>" --selector="#option2" --value="true"

Accepted values for checking a checkbox or selecting a radio button: true, 1, yes, on, checked. Any other value will uncheck the checkbox or deselect the radio button.
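The rule above can be restated as a small shell function, which may help when a script needs to predict what a given --value will do. This is only a sketch of the documented list, not the daemon's actual code; the function name checkbox_state is hypothetical, and exact lowercase matching is an assumption, since the README does not say whether values are case-sensitive.

```shell
# Hypothetical helper mirroring the documented rule for --value on
# checkboxes and radio buttons. Case-sensitive matching is an assumption.
checkbox_state() {
    case "$1" in
        true|1|yes|on|checked) echo "checked" ;;
        *)                     echo "unchecked" ;;
    esac
}
```

For example, checkbox_state yes prints checked, while checkbox_state false and checkbox_state 0 both print unchecked.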

Select dropdown options

The fill-form command can also be used to select options in dropdown elements:

# Select by option text (visible text)
cremote fill-form --tab="<tab-id>" --selector="#country" --value="United States"

# Select by option value (value attribute)
cremote fill-form --tab="<tab-id>" --selector="#state" --value="CA"

The command automatically detects dropdown elements and tries both option text and option value matching. This works with both <select> elements and custom dropdown implementations.

Upload a file

cremote upload-file --tab="<tab-id>" --selector="input[type=file]" --file="/path/to/file.jpg" [--timeout=5]

This command automatically:

Transfers the file from your local machine to the daemon container (if running in a container)
Uploads the file to the specified file input element on the web page

The --timeout parameter specifies how many seconds to wait for the upload operation to complete (default: 5 seconds).

Note: The file path should be the local path on your machine. The command will handle transferring it to the daemon container automatically.
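Since the path is resolved on the local machine, a script can check that the file is readable before asking the daemon to upload it. The sketch below is illustrative only: upload_checked is a hypothetical helper name, and it relies on the current-tab behavior described earlier by omitting --tab.

```shell
# Hypothetical helper: verify the local file is readable, then upload it
# to the given file input in the current tab.
upload_checked() {
    selector="$1"
    file="$2"
    if [ ! -r "$file" ]; then
        echo "cannot read local file: $file" >&2
        return 1
    fi
    cremote upload-file --selector="$selector" --file="$file"
}
```

A call might look like upload_checked "input[type=file]" "/path/to/file.jpg".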

Submit a form

cremote submit-form --tab="<tab-id>" --selector="form#login" [--timeout=5]

The --timeout parameter specifies how many seconds to wait for the form submission to complete (default: 5 seconds).

Get the source code of a page

cremote get-source --tab="<tab-id>" [--timeout=5]

The --timeout parameter specifies how many seconds to wait for getting the page source (default: 5 seconds).

Get the HTML of an element

cremote get-element --tab="<tab-id>" --selector=".content" [--timeout=5]

The --timeout parameter specifies how many seconds to wait for the element to appear in the DOM (default: 5 seconds).

Click on an element

cremote click-element --tab="<tab-id>" --selector="button.submit" [--timeout=5]

The --timeout parameter specifies how many seconds to wait for the click operation to complete (default: 5 seconds).

Close a tab

cremote close-tab --tab="<tab-id>" [--timeout=5]

The --timeout parameter specifies how many seconds to wait for the tab to close (default: 5 seconds).

Wait for navigation to complete

cremote wait-navigation --tab="<tab-id>" [--timeout=5]

The --timeout parameter specifies how many seconds to wait for navigation to complete (default: 5 seconds).

Note: wait-navigation first checks whether a navigation is actually in progress and returns immediately if the page is already stable, so it does not wait unnecessarily.

Execute JavaScript code

cremote eval-js --tab="<tab-id>" --code="document.getElementById('myElement').innerHTML = 'Hello World!'" [--timeout=5]

The --timeout parameter specifies how many seconds to wait for the JavaScript execution to complete (default: 5 seconds).

This command allows you to execute arbitrary JavaScript code in a tab. Examples:

Set element content: --code="document.getElementById('tinymce').innerHTML='Foo!'"
Get element text: --code="document.querySelector('.result').textContent"
Trigger events: --code="document.getElementById('button').click()"
Manipulate DOM: --code="document.body.style.backgroundColor = 'red'"

The command handles both JavaScript expressions and statements:

Expressions (return values): document.title, 2 + 3, element.textContent
Statements (assignments/actions): document.title = 'New Title', element.click()

For statements, the command returns "undefined". For expressions, it returns the result as a string.
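Because expression results come back as strings, a script can poll a boolean expression until it prints true. The helper below is a sketch under stated assumptions: wait_for_js is a hypothetical name, the attempt count and delay are illustrative defaults, and it assumes a boolean result is printed exactly as true or false with a non-zero exit status on errors.

```shell
# Hypothetical helper: poll a JavaScript expression in the current tab
# until it evaluates to "true" or the attempts run out.
# Returns 0 on success, 1 on timeout, 2 if eval-js itself fails.
wait_for_js() {
    code="$1"
    attempts="${2:-10}"   # how many times to evaluate the expression
    delay="${3:-1}"       # seconds to sleep between attempts
    i=0
    while [ "$i" -lt "$attempts" ]; do
        result=$(cremote eval-js --code="$code") || return 2
        if [ "$result" = "true" ]; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

For instance, wait_for_js "document.querySelector('.result') !== null" 20 1 would check roughly once per second for up to 20 attempts.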

Take a screenshot

cremote screenshot --tab="<tab-id>" --output="/path/to/screenshot.png" [--full-page] [--timeout=5]

The --output parameter specifies where to save the screenshot (PNG format). The --full-page flag captures the entire page instead of just the viewport (default: viewport only). The --timeout parameter specifies how many seconds to wait for the screenshot to complete (default: 5 seconds).

Working with iframes

To interact with content inside an iframe, you need to switch the context:

# Switch to iframe context
cremote switch-iframe --tab="<tab-id>" --selector="iframe#payment-form"

# Now all subsequent commands will operate within the iframe
cremote fill-form --selector="#card-number" --value="4111111111111111"
cremote fill-form --selector="#expiry" --value="12/25"
cremote click-element --selector="#submit-payment"

# Switch back to main page context
cremote switch-main --tab="<tab-id>"

# Now commands operate on the main page again
cremote get-element --selector=".success-message"

Important Notes:

Once you switch to an iframe, all subsequent commands (fill-form, click-element, eval-js, etc.) operate within that iframe
You must use switch-main to return to the main page context
Each tab maintains its own iframe context independently
Iframe context persists until explicitly switched back to main or the tab is closed
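Since the iframe context persists until switch-main is called, a script that exits an iframe block early can leave later commands pointed at the wrong document. One way to guard against that is a wrapper that always switches back. This is a sketch only: with_iframe is a hypothetical helper, and it uses the current tab by omitting --tab.

```shell
# Hypothetical helper: run one command inside an iframe and always
# switch back to the main page, even if the command fails.
with_iframe() {
    selector="$1"
    shift
    cremote switch-iframe --selector="$selector" || return 1
    rc=0
    "$@" || rc=$?                    # remember the wrapped command's status
    cremote switch-main || return 1  # always restore the main context
    return "$rc"
}
```

For example, with_iframe "iframe#payment-form" cremote click-element --selector="#submit-payment" clicks inside the iframe and then returns to the main page.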

List all open tabs

cremote list-tabs

This will display all open tabs with their IDs and URLs. The current tab is marked with an asterisk (*).

Cache and Site Data Management

You can control browser cache and site data for testing, performance optimization, and privacy:

# Cache Management
# Disable cache for current tab (useful for testing)
cremote disable-cache [--tab="<tab-id>"] [--timeout=5]

# Enable cache for current tab
cremote enable-cache [--tab="<tab-id>"] [--timeout=5]

# Clear browser cache for current tab
cremote clear-cache [--tab="<tab-id>"] [--timeout=5]

# Site Data Management
# Clear ALL site data (cookies, storage, cache, etc.)
cremote clear-all-site-data [--tab="<tab-id>"] [--timeout=10]

# Clear only cookies
cremote clear-cookies [--tab="<tab-id>"] [--timeout=5]

# Clear only web storage (localStorage, sessionStorage, IndexedDB, etc.)
cremote clear-storage [--tab="<tab-id>"] [--timeout=5]

Use Cases:

Testing: Disable cache to ensure fresh page loads without cached resources
Performance Testing: Clear cache to test cold load performance
Debugging: Clear cache to resolve cache-related issues
Development: Disable cache during development to see changes immediately
Authentication Testing: Clear cookies to test login/logout flows
Privacy Testing: Clear all site data to test clean state scenarios
Storage Testing: Clear web storage to test application state management

The --timeout parameter specifies how many seconds to wait for the operation to complete (default: 5 seconds; use a longer timeout when clearing all site data).
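For test runs that should start from a clean state, the commands above can be combined into one step. The following sketch is an illustration rather than a recommended workflow: reset_browser_state is a hypothetical name, and it assumes each command exits non-zero on failure.

```shell
# Hypothetical helper: clear all site data for the current tab, then
# disable the cache so subsequent loads fetch fresh resources.
reset_browser_state() {
    cremote clear-all-site-data --timeout=10 || return 1
    cremote disable-cache || return 1
}
```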

Drag and Drop Operations

You can perform drag and drop operations for testing interactive web applications:

# Drag and Drop Between Elements
# Drag from source element to target element
cremote drag-and-drop --source=".draggable-item" --target=".drop-zone" [--tab="<tab-id>"] [--timeout=5]

# Drag and Drop to Specific Coordinates
# Drag from source element to specific x,y coordinates
cremote drag-and-drop-coordinates --source=".draggable-item" --x=300 --y=200 [--tab="<tab-id>"] [--timeout=5]

# Drag and Drop by Relative Offset
# Drag from source element by relative pixel offset
cremote drag-and-drop-offset --source=".draggable-item" --offset-x=100 --offset-y=50 [--tab="<tab-id>"] [--timeout=5]

Use Cases:

File Upload: Drag files to upload areas
Sortable Lists: Reorder items in sortable lists
Kanban Boards: Move cards between columns
Image Galleries: Rearrange images or media
Form Builders: Drag form elements to build layouts
Dashboard Widgets: Rearrange dashboard components
Game Testing: Test drag-based game mechanics
UI Component Testing: Test custom drag and drop components

Technical Details:

Enhanced HTML5 Support: Automatically injects JavaScript helpers to trigger proper HTML5 drag and drop events (dragstart, dragover, drop, dragend)
Smart Target Detection: For coordinate/offset drags, automatically detects and targets valid drop zones at destination coordinates
Hybrid Approach: Tries HTML5 drag events first, falls back to Chrome DevTools Protocol mouse events if needed
Intelligent Fallback: Automatically switches between element-to-element and coordinate-based approaches for optimal compatibility
Realistic Event Simulation: Performs drag operations with proper timing and intermediate mouse movements
Automatic Element Detection: Calculates element center points automatically for accurate targeting
Robust Error Handling: Supports timeout handling and graceful degradation for complex drag operations
Broad Compatibility: Designed to work with common drag and drop implementations (HTML5 Drag and Drop, jQuery UI, custom implementations)

The --timeout parameter specifies how many seconds to wait for the drag and drop operation to complete (default: 5 seconds).

Advanced Input Operations

Cremote provides sophisticated mouse and keyboard interactions for comprehensive testing of modern web applications:

Mouse Operations

# Right-click to open context menus
cremote right-click --selector=".file-item" [--tab="<tab-id>"] [--timeout=5]

# Double-click for file operations or text selection
cremote double-click --selector=".file-icon" [--tab="<tab-id>"] [--timeout=5]

# Middle-click to open links in new tabs
cremote middle-click --selector="a[href='/dashboard']" [--tab="<tab-id>"] [--timeout=5]

# Hover to trigger tooltips or dropdowns
cremote hover --selector=".tooltip-trigger" [--tab="<tab-id>"] [--timeout=5]

# Move mouse to specific coordinates without clicking
cremote mouse-move --x=400 --y=300 [--tab="<tab-id>"] [--timeout=5]

# Scroll with mouse wheel at specific coordinates
cremote scroll-wheel --x=400 --y=300 --delta-y=-120 [--delta-x=0] [--tab="<tab-id>"] [--timeout=5]

Keyboard Operations

# Send key combinations (Ctrl+C, Alt+Tab, Shift+Enter, etc.)
cremote key-combination --keys="Ctrl+C" [--tab="<tab-id>"] [--timeout=5]
cremote key-combination --keys="Alt+Tab" [--tab="<tab-id>"] [--timeout=5]
cremote key-combination --keys="Ctrl+Shift+T" [--tab="<tab-id>"] [--timeout=5]

# Send special keys (Enter, Escape, Tab, F1-F12, Arrow keys, etc.)
cremote special-key --key="Enter" [--tab="<tab-id>"] [--timeout=5]
cremote special-key --key="Escape" [--tab="<tab-id>"] [--timeout=5]
cremote special-key --key="ArrowUp" [--tab="<tab-id>"] [--timeout=5]
cremote special-key --key="F1" [--tab="<tab-id>"] [--timeout=5]

# Click with modifier keys (Ctrl+click, Shift+click for multi-selection)
cremote modifier-click --selector=".selectable-item" --modifiers="Ctrl" [--tab="<tab-id>"] [--timeout=5]
cremote modifier-click --selector=".list-item" --modifiers="Shift" [--tab="<tab-id>"] [--timeout=5]
cremote modifier-click --selector=".table-row" --modifiers="Ctrl+Shift" [--tab="<tab-id>"] [--timeout=5]

Advanced Use Cases:

Context Menu Testing: Right-click to test context menus and their functionality
Accessibility Testing: Full keyboard navigation support for accessibility compliance
Tooltip/Dropdown Testing: Hover interactions for UI elements that appear on mouse over
Multi-Selection Testing: Ctrl+click and Shift+click for testing selection interfaces
Copy/Paste Workflows: Test clipboard operations with Ctrl+A, Ctrl+C, Ctrl+V
Precise Mouse Control: Pixel-perfect mouse positioning and scrolling
Function Key Testing: Test application shortcuts using F1-F12 keys
Arrow Key Navigation: Test keyboard navigation in lists, tables, and forms

Technical Details:

Uses Chrome DevTools Protocol's Input domain for precise control
Supports all modifier keys: Ctrl, Alt, Shift, Meta/Cmd
Comprehensive key mapping for 60+ keys including letters, numbers, function keys, special keys
Proper modifier key sequencing (key down → action → key up)
Element positioning using content quads for pixel-perfect accuracy
Mouse button differentiation (Left, Right, Middle)
Realistic interaction patterns matching human behavior

Connecting to a Remote Daemon

By default, the client connects to a daemon running on localhost. To connect to a daemon running on a different host:

cremote open-tab --host="remote-host" --port=8989
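When every command in a script targets the same remote daemon, repeating the flags can be avoided with a thin wrapper. This is a sketch with several assumptions: the wrapper name rcremote and the environment variables CREMOTE_HOST and CREMOTE_PORT are hypothetical, and it assumes that every subcommand accepts --host and --port the way open-tab does above.

```shell
# Hypothetical wrapper: add --host/--port to any cremote subcommand.
# CREMOTE_HOST and CREMOTE_PORT are illustrative variable names, not
# settings that cremote itself is known to read.
rcremote() {
    subcommand="$1"
    shift
    cremote "$subcommand" \
        --host="${CREMOTE_HOST:-localhost}" \
        --port="${CREMOTE_PORT:-8989}" \
        "$@"
}
```

With CREMOTE_HOST=remote-host exported, rcremote load-url --url="https://example.com" would expand to the full command with both flags.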

Automation Example

Here's an example of how to use cremote in a shell script to automate a login process:

#!/bin/bash

# Make sure Chromium is running with remote debugging enabled
# chromium --remote-debugging-port=9222 --user-data-dir=/tmp/chromium-debug &

# Make sure the daemon is running
# cremotedaemon &

# Open a new tab
TAB_ID=$(cremote open-tab)

# Load the login page (using the current tab)
cremote load-url --url="https://example.com/login"

# Fill in the username and password (using the current tab)
cremote fill-form --selector="#username" --value="user123"
cremote fill-form --selector="#password" --value="password123"

# Check the 'Remember me' checkbox
cremote fill-form --selector="#remember" --value="true"

# Accept the terms and conditions
cremote fill-form --selector="#terms" --value="true"

# Either submit the form using the form selector (using the current tab)
cremote submit-form --selector="form#login"

# Or click the login button (using the current tab)
# cremote click-element --selector="#login-button"

# You can still specify a tab ID explicitly if needed
# cremote load-url --tab="$TAB_ID" --url="https://example.com/login"

# Wait for navigation to complete (using the current tab)
cremote wait-navigation --timeout=30

# Execute JavaScript to check if login was successful
LOGIN_STATUS=$(cremote eval-js --code="document.querySelector('.welcome-message') !== null")
if [ "$LOGIN_STATUS" = "true" ]; then
    echo "Login successful!"
fi

# Example: Working with an iframe (e.g., payment form)
# Switch to iframe context
cremote switch-iframe --selector="iframe.payment-frame"

# Fill payment form inside iframe
cremote fill-form --selector="#card-number" --value="4111111111111111"
cremote fill-form --selector="#expiry-date" --value="12/25"
cremote click-element --selector="#pay-button"

# Switch back to main page
cremote switch-main

# Get the source code of the page after login (using the current tab)
cremote get-source

# Take a screenshot of the logged-in page
cremote screenshot --output="/tmp/login-success.png"

# Take a full-page screenshot for documentation
cremote screenshot --output="/tmp/full-page.png" --full-page

# Close the current tab
cremote close-tab

Troubleshooting

Daemon Not Running

If you see an error like "connection refused", make sure the daemon is running:

cremote status

If the daemon is not running, start it:

cremotedaemon
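The two steps above can be combined so that a script starts the daemon only when it is not already running. The sketch below assumes that cremote status exits with a non-zero status when the daemon is unreachable, which the README does not state explicitly; ensure_daemon and the ten one-second retries are illustrative choices.

```shell
# Hypothetical helper: start cremotedaemon in the background if
# `cremote status` fails, then wait up to about 10 seconds for it.
# Assumes `cremote status` exits non-zero when the daemon is down.
ensure_daemon() {
    cremote status >/dev/null 2>&1 && return 0
    cremotedaemon >/dev/null 2>&1 &
    tries=0
    while [ "$tries" -lt 10 ]; do
        cremote status >/dev/null 2>&1 && return 0
        tries=$((tries + 1))
        sleep 1
    done
    echo "daemon did not become ready" >&2
    return 1
}
```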

Connection Issues

If the daemon can't connect to Chromium, check the following:

Make sure Chromium/Chrome is running with remote debugging enabled on port 9222
Verify that Chromium was started with the correct flags: --remote-debugging-port=9222 --user-data-dir=/tmp/chromium-debug
Check if you can access the Chromium DevTools Protocol by opening http://localhost:9222/json/version in your browser
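The last check can also be done from a script instead of a browser. This sketch uses curl against the /json/version endpoint mentioned above; the function name devtools_reachable and the three-second limit are illustrative choices.

```shell
# Hypothetical helper: succeed if Chrome's DevTools HTTP endpoint
# answers on the given port (default 9222).
devtools_reachable() {
    port="${1:-9222}"
    curl --silent --fail --max-time 3 \
        "http://localhost:${port}/json/version" >/dev/null
}
```

A script might call devtools_reachable || echo "Chromium is not listening on port 9222" before starting the daemon.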

Tab Management

The daemon manages tab IDs for you, so you don't need to worry about tab persistence between commands. However:

Tab IDs are only valid for the duration of the browser session
If Chromium is restarted, you'll need to get new tab IDs
Store the tab ID returned by open-tab in a variable for use in subsequent commands
If a tab is closed by Chromium (not through the tool), you may need to run open-tab again

License

MIT
