# CREMOTE ADA AUTOMATION ENHANCEMENT PLAN

**Date:** October 2, 2025
**Status:** APPROVED FOR IMPLEMENTATION
**Goal:** Increase automated testing coverage from 70% to 85%
**Timeline:** 6-8 weeks
**Philosophy:** KISS - Keep It Simple, Stupid

---

## EXECUTIVE SUMMARY

This plan outlines practical enhancements to the cremote MCP accessibility testing suite. We will implement six new automated testing capabilities using proven, simple tools. Caption accuracy validation using speech-to-text is **EXCLUDED** because it is beyond our current platform capabilities.

**Target Coverage Increase:** 70% → 85% (15 percentage point improvement)

---

## SCOPE EXCLUSIONS

### ❌ NOT INCLUDED IN THIS PLAN:

1. **Speech-to-Text Caption Accuracy Validation**
   - Reason: Requires external services (Whisper API, Google Speech-to-Text)
   - Complexity: High (video processing, audio extraction, STT integration)
   - Cost: Ongoing API costs or significant compute resources
   - Alternative: Manual review or future enhancement

2. **Real-time Live Caption Testing**
   - Reason: Requires live streaming infrastructure
   - Complexity: Very high (real-time monitoring, streaming protocols)
   - Alternative: Manual testing during live events

3. **Complex Video Content Analysis**
   - Reason: Determining whether visual content requires audio description needs human judgment
   - Alternative: Flag all videos without descriptions for manual review

---

## IMPLEMENTATION PHASES

### **PHASE 1: FOUNDATION (Weeks 1-2)**

**Goal:** Implement high-impact, low-effort enhancements
**Effort:** 28-36 hours

#### 1.1 Gradient Contrast Analysis (ImageMagick)

**Priority:** CRITICAL
**Effort:** 8-12 hours
**Solves:** "Incomplete" findings for text on gradient backgrounds

**Deliverables:**
- New MCP tool: `web_gradient_contrast_check_cremotemcp_cremotemcp`
- Takes an element selector and analyzes the background gradient
- Returns the worst-case contrast ratio
- Integrates with the existing contrast checker

**Technical Approach:**
```bash
# 1.
# Screenshot element
web_screenshot_element(selector=".hero-section")
# 2. Extract text color from computed styles
text_color = getComputedStyle(element).color
# 3. Sample 100 points across background using ImageMagick
convert screenshot.png -resize 10x10! -depth 8 txt:- | parse_colors
# 4. Calculate contrast against darkest/lightest points
# 5. Return worst-case ratio
```

**Files to Create/Modify:**
- `mcp/tools/gradient_contrast.go` (new)
- `mcp/server.go` (register new tool)
- `docs/llm_ada_testing.md` (document usage)

---

#### 1.2 Time-Based Media Validation (Basic)

**Priority:** CRITICAL
**Effort:** 8-12 hours
**Solves:** WCAG 1.2.2, 1.2.3, 1.2.5, 1.4.2 violations

**Deliverables:**
- New MCP tool: `web_media_validation_cremotemcp_cremotemcp`
- Detects all video/audio elements
- Checks for caption tracks, audio description tracks, transcripts
- Validates that track files are accessible
- Checks for autoplay violations

**What We Test:**
- ✅ Presence of `<track kind="captions">` elements
- ✅ Presence of `<track kind="descriptions">` elements
- ✅ Presence of transcript links
- ✅ Caption file accessibility (HTTP fetch)
- ✅ `controls` attribute present
- ✅ Autoplay detection
- ✅ Embedded player detection (YouTube, Vimeo)

**What We DON'T Test:**
- ❌ Caption accuracy (requires speech-to-text)
- ❌ Audio description quality (requires human judgment)
- ❌ Transcript completeness (requires human judgment)

**Technical Approach:**
```javascript
// JavaScript injection via console_command
const mediaInventory = {
  videos: Array.from(document.querySelectorAll('video')).map(v => ({
    src: v.src,
    hasCaptions: !!v.querySelector('track[kind="captions"], track[kind="subtitles"]'),
    hasDescriptions: !!v.querySelector('track[kind="descriptions"]'),
    hasControls: v.hasAttribute('controls'),
    autoplay: v.hasAttribute('autoplay'),
    captionTracks: Array.from(v.querySelectorAll('track')).map(t => ({
      kind: t.kind,
      src: t.src,
      srclang: t.srclang
    }))
  })),
  audios: Array.from(document.querySelectorAll('audio')).map(a => ({
    src: a.src,
    hasControls: a.hasAttribute('controls'),
    autoplay:
a.hasAttribute('autoplay')
  })),
  embeds: Array.from(document.querySelectorAll('iframe[src*="youtube"], iframe[src*="vimeo"]')).map(i => ({
    src: i.src,
    type: i.src.includes('youtube') ? 'youtube' : 'vimeo'
  }))
};

// For each video, validate caption files
for (const video of mediaInventory.videos) {
  for (const track of video.captionTracks) {
    const response = await fetch(track.src);
    track.accessible = response.ok;
  }
}

// Check for transcript links near videos
const transcriptLinks = Array.from(document.querySelectorAll('a[href*="transcript"]'));

return {mediaInventory, transcriptLinks};
```

**Files to Create/Modify:**
- `mcp/tools/media_validation.go` (new)
- `mcp/server.go` (register new tool)
- `docs/llm_ada_testing.md` (document usage)

---

#### 1.3 Hover/Focus Content Persistence Testing

**Priority:** HIGH
**Effort:** 12-16 hours
**Solves:** WCAG 1.4.13 violations (tooltips, dropdowns, popovers)

**Deliverables:**
- New MCP tool: `web_hover_focus_test_cremotemcp_cremotemcp`
- Identifies elements with hover/focus-triggered content
- Tests dismissibility (Esc key)
- Tests hoverability (mouse can move to the triggered content)
- Tests persistence (content doesn't disappear immediately)

**Technical Approach:**
```javascript
// 1. Find all elements with hover/focus handlers
const interactiveElements = Array.from(document.querySelectorAll('*')).filter(el => {
  const events = getEventListeners(el);
  return events.mouseover || events.mouseenter || events.focus;
});

// 2.
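// NOTE (assumption): getEventListeners() above is a Chrome DevTools
// console-only API, not standard DOM. If this script is injected outside
// the DevTools console, element discovery will need another strategy
// (e.g. scanning on* attributes or instrumenting addEventListener).
// The sleep() helper used below is not built into JavaScript; a minimal sketch:
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));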
// Test each element
const results = [];
for (const el of interactiveElements) {
  // Trigger hover
  el.dispatchEvent(new MouseEvent('mouseover', {bubbles: true}));
  await sleep(100);

  // Check for new content
  const tooltip = document.querySelector('[role="tooltip"], .tooltip, .popover');
  if (tooltip) {
    // Test hoverability first (proxy: the triggered content has a real
    // hit area), since the Escape test below may remove it from the DOM
    const rect = tooltip.getBoundingClientRect();
    const hoverable = rect.width > 0 && rect.height > 0;

    // Test dismissibility
    document.dispatchEvent(new KeyboardEvent('keydown', {key: 'Escape'}));
    const dismissed = !document.contains(tooltip);

    // Test persistence
    el.dispatchEvent(new MouseEvent('mouseout', {bubbles: true}));
    await sleep(500);
    const persistent = document.contains(tooltip);

    results.push({element: el, dismissed, hoverable, persistent});
  }
}
```

**Files to Create/Modify:**
- `mcp/tools/hover_focus_test.go` (new)
- `mcp/server.go` (register new tool)
- `docs/llm_ada_testing.md` (document usage)

---

### **PHASE 2: EXPANSION (Weeks 3-4)**

**Goal:** Add medium-complexity enhancements
**Effort:** 32-44 hours

#### 2.1 Text-in-Images Detection (OCR)

**Priority:** HIGH
**Effort:** 12-16 hours
**Solves:** WCAG 1.4.5 violations (images of text)

**Deliverables:**
- New MCP tool: `web_text_in_images_check_cremotemcp_cremotemcp`
- Downloads all images from the page
- Runs Tesseract OCR on each image
- Flags images containing significant text (>5 words)
- Compares detected text with alt text
- Excludes logos (configurable)

**Technical Approach:**
```bash
# 1. Extract all image URLs
images=$(console_command "Array.from(document.querySelectorAll('img')).map(img => ({src: img.src, alt: img.alt}))")

# 2. Download each image to the container
# ($img.src and $img.alt below are pseudocode for fields of each JSON entry)
i=0
for img in $images; do
  i=$((i + 1))
  curl -o "/tmp/img_$i.png" $img.src
  # 3. Run OCR
  tesseract "/tmp/img_$i.png" "/tmp/img_${i}_text" --psm 6
  # 4. Count words
  word_count=$(wc -w < "/tmp/img_${i}_text.txt")
  # 5.
  # If >5 words, flag for review
  if [ "$word_count" -gt 5 ]; then
    echo "WARNING: Image contains text ($word_count words)"
    echo "Image: $img.src"
    echo "Alt text: $img.alt"
    echo "Detected text: $(cat "/tmp/img_${i}_text.txt")"
    echo "MANUAL REVIEW: Verify if this should be HTML text instead"
  fi
done
```

**Dependencies:**
- Tesseract OCR (install in container)
- curl or wget for image download

**Files to Create/Modify:**
- `mcp/tools/text_in_images.go` (new)
- `Dockerfile` (add tesseract-ocr)
- `mcp/server.go` (register new tool)
- `docs/llm_ada_testing.md` (document usage)

---

#### 2.2 Cross-Page Consistency Analysis

**Priority:** MEDIUM
**Effort:** 16-24 hours
**Solves:** WCAG 3.2.3, 3.2.4 violations (consistent navigation/identification)

**Deliverables:**
- New MCP tool: `web_consistency_check_cremotemcp_cremotemcp`
- Crawls multiple pages (configurable limit)
- Extracts the navigation structure from each page
- Compares navigation order across pages
- Identifies common elements (search, login, cart)
- Verifies consistent labeling

**Technical Approach:**
```javascript
// 1. Crawl site (limit to 20 pages for performance)
const pages = [];
const visited = new Set();

async function crawlPage(url, depth = 0) {
  if (depth > 2 || visited.has(url)) return;
  visited.add(url);
  await navigateTo(url); // stand-in for the cremote navigation tool

  pages.push({
    url,
    navigation: Array.from(document.querySelectorAll('nav a, header a')).map(a => ({
      text: a.textContent.trim(),
      href: a.href,
      order: Array.from(a.parentElement.children).indexOf(a)
    })),
    commonElements: {
      search: document.querySelector('[type="search"], [role="search"]')?.outerHTML,
      login: document.querySelector('a[href*="login"]')?.textContent,
      cart: document.querySelector('a[href*="cart"]')?.textContent
    }
  });

  // Find more pages
  const links = Array.from(document.querySelectorAll('a[href]'))
    .map(a => a.href)
    .filter(href => href.startsWith(window.location.origin))
    .slice(0, 10);

  for (const link of links) {
    await crawlPage(link, depth + 1);
  }
}

// 2.
// Analyze consistency
const navOrders = pages.map(p => p.navigation.map(n => n.text).join('|'));
const uniqueOrders = [...new Set(navOrders)];
if (uniqueOrders.length > 1) {
  // Navigation order varies - FAIL WCAG 3.2.3
}

// Check common element consistency
const searchLabels = pages.map(p => p.commonElements.search).filter(Boolean);
if (new Set(searchLabels).size > 1) {
  // Search identified inconsistently - FAIL WCAG 3.2.4
}
```

**Files to Create/Modify:**
- `mcp/tools/consistency_check.go` (new)
- `mcp/server.go` (register new tool)
- `docs/llm_ada_testing.md` (document usage)

---

#### 2.3 Sensory Characteristics Detection (Pattern Matching)

**Priority:** MEDIUM
**Effort:** 8-12 hours
**Solves:** WCAG 1.3.3 violations (instructions relying on sensory characteristics)

**Deliverables:**
- New MCP tool: `web_sensory_check_cremotemcp_cremotemcp`
- Scans page text for sensory-only instructions
- Flags phrases like "click the red button", "square icon", "on the right"
- Uses regex pattern matching
- Provides context for manual review

**Technical Approach:**
```javascript
// Pattern matching for sensory-only instructions
const sensoryPatterns = [
  // Color-only
  /click (the )?(red|green|blue|yellow|orange|purple|pink|gray|grey) (button|link|icon)/gi,
  /the (red|green|blue|yellow|orange|purple|pink|gray|grey) (button|link|icon)/gi,
  // Shape-only
  /(round|square|circular|rectangular|triangular) (button|icon|shape)/gi,
  /click (the )?(circle|square|triangle|rectangle)/gi,
  // Position-only (the positional phrase is required here; making it
  // optional would flag every bare "left"/"right" on the page)
  /(on the |at the )(left|right|top|bottom|above|below)/gi,
  /button (on the |at the )?(left|right|top|bottom)/gi,
  // Size-only
  /(large|small|big|little) (button|icon|link)/gi,
  // Sound-only
  /when you hear (the )?(beep|sound|tone|chime)/gi
];

const pageText = document.body.innerText;
const violations = [];

for (const pattern of sensoryPatterns) {
  const matches = pageText.matchAll(pattern);
  for (const match of matches) {
    // Get context (50 chars before and after)
    const index = match.index;
    const
context = pageText.substring(Math.max(0, index - 50), index + match[0].length + 50);
    violations.push({
      text: match[0],
      context,
      pattern: pattern.source,
      wcag: '1.3.3 Sensory Characteristics'
    });
  }
}

return violations;
```

**Files to Create/Modify:**
- `mcp/tools/sensory_check.go` (new)
- `mcp/server.go` (register new tool)
- `docs/llm_ada_testing.md` (document usage)

---

### **PHASE 3: ADVANCED (Weeks 5-6)**

**Goal:** Add complex but valuable enhancements
**Effort:** 24-32 hours

#### 3.1 Animation & Flash Detection (Video Analysis)

**Priority:** MEDIUM
**Effort:** 16-24 hours
**Solves:** WCAG 2.3.1 violations (three flashes or below threshold)

**Deliverables:**
- New MCP tool: `web_flash_detection_cremotemcp_cremotemcp`
- Records the page for 10 seconds using the CDP screencast
- Analyzes frames for brightness changes
- Counts flashes per second
- Flags if >3 flashes/second are detected

**Technical Approach:**
```go
// Use the Chrome DevTools Protocol to capture a screencast
// (simplified sketch; the screencast-frame channel stands in for real CDP client wiring)
func (t *FlashDetectionTool) Execute(params map[string]interface{}) (interface{}, error) {
	// 1. Start screencast
	err := t.cdp.Page.StartScreencast(&page.StartScreencastArgs{
		Format:    "png",
		Quality:   80,
		MaxWidth:  1280,
		MaxHeight: 800,
	})
	if err != nil {
		return nil, err
	}

	// 2. Collect frames for 10 seconds
	frames := [][]byte{}
	timeout := time.After(10 * time.Second)
	for {
		select {
		case frame := <-t.cdp.Page.ScreencastFrame:
			frames = append(frames, frame.Data)
		case <-timeout:
			goto analyze
		}
	}

analyze:
	// 3. Analyze brightness changes between consecutive frames
	flashes := 0
	for i := 1; i < len(frames); i++ {
		brightness1 := calculateBrightness(frames[i-1])
		brightness2 := calculateBrightness(frames[i])
		// If the brightness change is >20%, count it as a flash
		if math.Abs(brightness2-brightness1) > 0.2 {
			flashes++
		}
	}

	// 4.
// Calculate flashes per second
	flashesPerSecond := float64(flashes) / 10.0

	return map[string]interface{}{
		"flashes_detected":   flashes,
		"flashes_per_second": flashesPerSecond,
		"passes":             flashesPerSecond <= 3.0,
		"wcag":               "2.3.1 Three Flashes or Below Threshold",
	}, nil
}
```

**Dependencies:**
- Chrome DevTools Protocol screencast API
- Image processing library (Go image package)

**Files to Create/Modify:**
- `mcp/tools/flash_detection.go` (new)
- `mcp/server.go` (register new tool)
- `docs/llm_ada_testing.md` (document usage)

---

#### 3.2 Enhanced Accessibility Tree Analysis

**Priority:** MEDIUM
**Effort:** 8-12 hours
**Solves:** Better detection of ARIA issues and role/name/value problems

**Deliverables:**
- Enhance the existing `get_accessibility_tree_cremotemcp_cremotemcp` tool
- Add validation rules for common ARIA mistakes
- Check for invalid role combinations
- Verify required ARIA properties
- Detect orphaned ARIA references

**Technical Approach:**
```javascript
// Validate ARIA usage
const ariaValidation = {
  // Check for invalid roles
  invalidRoles: Array.from(document.querySelectorAll('[role]')).filter(el => {
    const role = el.getAttribute('role');
    const validRoles = ['button', 'link', 'navigation', 'main', 'complementary' /* …full WAI-ARIA role list… */];
    return !validRoles.includes(role);
  }),

  // Check for required ARIA properties (e.g. an accessible name on buttons)
  missingProperties: Array.from(document.querySelectorAll('[role="button"]')).filter(el => {
    return !el.hasAttribute('aria-label') && !el.hasAttribute('aria-labelledby') && !el.textContent.trim();
  }),

  // Check for orphaned aria-describedby/labelledby references
  // (both attributes may hold a space-separated list of IDs)
  orphanedReferences: Array.from(document.querySelectorAll('[aria-describedby], [aria-labelledby]')).filter(el => {
    const ids = `${el.getAttribute('aria-describedby') || ''} ${el.getAttribute('aria-labelledby') || ''}`
      .trim().split(/\s+/);
    return ids.some(id => id && !document.getElementById(id));
  })
};
```

**Files to Create/Modify:**
- `mcp/tools/accessibility_tree.go` (enhance existing)
- `docs/llm_ada_testing.md` (document new validations)
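Several of the tools above (the existing contrast checker and the Phase 1 gradient tool) depend on the WCAG contrast-ratio computation. As a reference, a minimal sketch of that math; the function names here are illustrative, not an existing cremote API:

```javascript
// WCAG 2.x relative luminance of an sRGB color (r, g, b each 0-255).
function relativeLuminance([r, g, b]) {
  const lin = c => {
    const s = c / 255;
    // Linearize each sRGB channel per the WCAG definition
    return s <= 0.03928 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
  };
  return 0.2126 * lin(r) + 0.7152 * lin(g) + 0.0722 * lin(b);
}

// Contrast ratio between two colors: (L_lighter + 0.05) / (L_darker + 0.05).
function contrastRatio(fg, bg) {
  const l1 = relativeLuminance(fg);
  const l2 = relativeLuminance(bg);
  const [hi, lo] = l1 >= l2 ? [l1, l2] : [l2, l1];
  return (hi + 0.05) / (lo + 0.05);
}

// Worst case over sampled background points, as in the gradient tool sketch.
function worstCaseContrast(textColor, samples) {
  return Math.min(...samples.map(s => contrastRatio(textColor, s)));
}

contrastRatio([0, 0, 0], [255, 255, 255]); // 21 (black on white)
```

The gradient tool's "worst-case ratio" is then simply the minimum ratio between the text color and each sampled background point.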
---

## IMPLEMENTATION SCHEDULE

### Week 1-2: Phase 1 Foundation
- [ ] Day 1-3: Gradient contrast analysis (ImageMagick)
- [ ] Day 4-6: Time-based media validation (basic)
- [ ] Day 7-10: Hover/focus content testing

### Week 3-4: Phase 2 Expansion
- [ ] Day 11-14: Text-in-images detection (OCR)
- [ ] Day 15-20: Cross-page consistency analysis
- [ ] Day 21-23: Sensory characteristics detection

### Week 5-6: Phase 3 Advanced
- [ ] Day 24-30: Animation/flash detection
- [ ] Day 31-35: Enhanced accessibility tree analysis

### Week 7-8: Testing & Documentation
- [ ] Day 36-40: Integration testing
- [ ] Day 41-45: Documentation updates
- [ ] Day 46-50: User acceptance testing

---

## TECHNICAL REQUIREMENTS

### Container Dependencies
```dockerfile
# Add to Dockerfile
RUN apt-get update && apt-get install -y \
    imagemagick \
    tesseract-ocr \
    tesseract-ocr-eng \
    && rm -rf /var/lib/apt/lists/*
```

### Go Dependencies
```go
// Add to go.mod
require (
    github.com/chromedp/cdproto v0.0.0-20231011050154-1d073bb38998
    github.com/disintegration/imaging v1.6.2 // Image processing
)
```

### Configuration
```yaml
# Add to cremote config
automation_enhancements:
  gradient_contrast:
    enabled: true
    sample_points: 100
  media_validation:
    enabled: true
    check_embedded_players: true
    youtube_api_key: ""  # Optional
  text_in_images:
    enabled: true
    min_word_threshold: 5
    exclude_logos: true
  consistency_check:
    enabled: true
    max_pages: 20
    max_depth: 2
  flash_detection:
    enabled: true
    recording_duration: 10
    brightness_threshold: 0.2
```

---

## SUCCESS METRICS

### Coverage Targets
- **Current:** 70% automated coverage
- **After Phase 1:** 78% automated coverage (+8%)
- **After Phase 2:** 83% automated coverage (+5%)
- **After Phase 3:** 85% automated coverage (+2%)

### Quality Metrics
- **False Positive Rate:** <10%
- **False Negative Rate:** <5%
- **Test Execution Time:** <5 minutes per page
- **Report Clarity:** 100% actionable findings

### Performance Targets
- Gradient contrast: <2 seconds per element
- Media
  validation: <5 seconds per page
- Text-in-images: <1 second per image
- Consistency check: <30 seconds for 20 pages
- Flash detection: 10 seconds (fixed recording time)

---

## RISK MITIGATION

### Technical Risks
1. **ImageMagick performance on large images**
   - Mitigation: Resize images before analysis
   - Fallback: Skip images >5MB
2. **Tesseract OCR accuracy**
   - Mitigation: Set a confidence threshold
   - Fallback: Flag low-confidence results for manual review
3. **CDP screencast reliability**
   - Mitigation: Implement retry logic
   - Fallback: Skip flash detection if the screencast fails
4. **Cross-page crawling performance**
   - Mitigation: Limit to 20 pages, depth 2
   - Fallback: Allow the user to specify a page list

### Operational Risks
1. **Container size increase**
   - Mitigation: Use multi-stage Docker builds
   - Monitor: Keep the container <500MB
2. **Increased test execution time**
   - Mitigation: Make all enhancements optional
   - Allow: Users to enable/disable specific tests

---

## DELIVERABLES

### Code
- [ ] 6 new MCP tools (gradient, media, hover, OCR, consistency, flash)
- [ ] 1 enhanced tool (accessibility tree)
- [ ] Updated Dockerfile with dependencies
- [ ] Updated configuration schema
- [ ] Integration tests for all new tools

### Documentation
- [ ] Updated `docs/llm_ada_testing.md` with new tools
- [ ] Updated `enhanced_chromium_ada_checklist.md` with automation notes
- [ ] New `docs/AUTOMATION_TOOLS.md` with technical details
- [ ] Updated README with new capabilities
- [ ] Example usage for each new tool

### Testing
- [ ] Unit tests for each new tool
- [ ] Integration tests with real websites
- [ ] Performance benchmarks
- [ ] Accuracy validation against manual testing

---

## MAINTENANCE PLAN

### Ongoing Support
- Monitor false positive/negative rates
- Update pattern matching rules (sensory characteristics)
- Keep dependencies updated (ImageMagick, Tesseract)
- Add new ARIA validation rules as the spec evolves

### Future Enhancements (Post-Plan)
- LLM-assisted semantic analysis
  (if budget allows)
- Speech-to-text caption validation (if an external service becomes available)
- Real-time live caption testing (if streaming infrastructure is added)
- Advanced video content analysis (if AI/ML resources become available)

---

## APPROVAL & SIGN-OFF

**Plan Status:** READY FOR APPROVAL
**Estimated Total Effort:** 84-112 hours (10-14 business days)
**Estimated Timeline:** 6-8 weeks (with testing and documentation)
**Budget Impact:** Minimal (only open-source dependencies)
**Risk Level:** LOW (all technologies proven and stable)

---

**Next Steps:**
1. Review and approve this plan
2. Set up the development environment with new dependencies
3. Begin Phase 1 implementation
4. Schedule weekly progress reviews

---

**Document Prepared By:** Cremote Development Team
**Date:** October 2, 2025
**Version:** 1.0