CREMOTE ADA AUTOMATION ENHANCEMENT PLAN
Date: October 2, 2025
Status: APPROVED FOR IMPLEMENTATION
Goal: Increase automated testing coverage from 70% to 85%
Timeline: 6-8 weeks
Philosophy: KISS - Keep it Simple, Stupid
EXECUTIVE SUMMARY
This plan outlines practical enhancements to the cremote MCP accessibility testing suite. We will implement 6 new automated testing capabilities using proven, simple tools. The caption accuracy validation using speech-to-text is EXCLUDED as it's beyond our current platform capabilities.
Target Coverage Increase: 70% → 85% (15 percentage point improvement)
SCOPE EXCLUSIONS
❌ NOT INCLUDED IN THIS PLAN:
- Speech-to-Text Caption Accuracy Validation
  - Reason: Requires external services (Whisper API, Google Speech-to-Text)
  - Complexity: High (video processing, audio extraction, STT integration)
  - Cost: Ongoing API costs or significant compute resources
  - Alternative: Manual review or future enhancement
- Real-time Live Caption Testing
  - Reason: Requires live streaming infrastructure
  - Complexity: Very high (real-time monitoring, streaming protocols)
  - Alternative: Manual testing during live events
- Complex Video Content Analysis
  - Reason: Determining if visual content requires audio description needs human judgment
  - Alternative: Flag all videos without descriptions for manual review
IMPLEMENTATION PHASES
PHASE 1: FOUNDATION (Weeks 1-2)
Goal: Implement high-impact, low-effort enhancements
Effort: 28-36 hours
1.1 Gradient Contrast Analysis (ImageMagick)
Priority: CRITICAL
Effort: 8-12 hours
Solves: "Incomplete" findings for text on gradient backgrounds
Deliverables:
- New MCP tool: web_gradient_contrast_check_cremotemcp_cremotemcp
- Takes element selector, analyzes background gradient
- Returns worst-case contrast ratio
- Integrates with existing contrast checker
Technical Approach:
# 1. Screenshot element
web_screenshot_element(selector=".hero-section")
# 2. Extract text color from computed styles
text_color = getComputedStyle(element).color
# 3. Sample 100 points across background using ImageMagick
convert screenshot.png -resize 10x10! -depth 8 txt:- | parse_colors
# 4. Calculate contrast against darkest/lightest points
# 5. Return worst-case ratio
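For step 4, the worst-case ratio can be computed from the sampled background colors with the standard WCAG relative-luminance formula. The Go sketch below is illustrative only; the function names are placeholders, not existing cremote code (requires "math" from the standard library):
// channel converts an 8-bit sRGB channel to linear light per WCAG 2.x.
func channel(c uint8) float64 {
    s := float64(c) / 255.0
    if s <= 0.03928 {
        return s / 12.92
    }
    return math.Pow((s+0.055)/1.055, 2.4)
}

// luminance returns the WCAG relative luminance of an sRGB color.
func luminance(r, g, b uint8) float64 {
    return 0.2126*channel(r) + 0.7152*channel(g) + 0.0722*channel(b)
}

// contrastRatio returns the WCAG contrast ratio of two luminances.
func contrastRatio(l1, l2 float64) float64 {
    lighter, darker := math.Max(l1, l2), math.Min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)
}

// worstCaseContrast returns the lowest ratio of the text color against every
// sampled background pixel (e.g. the 100 points from the 10x10 resize above).
func worstCaseContrast(text [3]uint8, samples [][3]uint8) float64 {
    textLum := luminance(text[0], text[1], text[2])
    worst := math.Inf(1)
    for _, s := range samples {
        if r := contrastRatio(textLum, luminance(s[0], s[1], s[2])); r < worst {
            worst = r
        }
    }
    return worst
}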
Files to Create/Modify:
- mcp/tools/gradient_contrast.go (new)
- mcp/server.go (register new tool)
- docs/llm_ada_testing.md (document usage)
1.2 Time-Based Media Validation (Basic)
Priority: CRITICAL
Effort: 8-12 hours
Solves: WCAG 1.2.2, 1.2.3, 1.2.5, 1.4.2 violations
Deliverables:
- New MCP tool: web_media_validation_cremotemcp_cremotemcp
- Detects all video/audio elements
- Checks for caption tracks, audio description tracks, transcripts
- Validates track files are accessible
- Checks for autoplay violations
What We Test:
✅ Presence of <track kind="captions">
✅ Presence of <track kind="descriptions">
✅ Presence of transcript links
✅ Caption file accessibility (HTTP fetch)
✅ Controls attribute present
✅ Autoplay detection
✅ Embedded player detection (YouTube, Vimeo)
What We DON'T Test:
❌ Caption accuracy (requires speech-to-text)
❌ Audio description quality (requires human judgment)
❌ Transcript completeness (requires human judgment)
Technical Approach:
// JavaScript injection via console_command (wrapped in an async IIFE so the
// awaits and the final return are valid when evaluated)
(async () => {
  const mediaInventory = {
    videos: Array.from(document.querySelectorAll('video')).map(v => ({
      src: v.src,
      hasCaptions: !!v.querySelector('track[kind="captions"], track[kind="subtitles"]'),
      hasDescriptions: !!v.querySelector('track[kind="descriptions"]'),
      hasControls: v.hasAttribute('controls'),
      autoplay: v.hasAttribute('autoplay'),
      captionTracks: Array.from(v.querySelectorAll('track')).map(t => ({
        kind: t.kind,
        src: t.src,
        srclang: t.srclang
      }))
    })),
    audios: Array.from(document.querySelectorAll('audio')).map(a => ({
      src: a.src,
      hasControls: a.hasAttribute('controls'),
      autoplay: a.hasAttribute('autoplay')
    })),
    embeds: Array.from(document.querySelectorAll('iframe[src*="youtube"], iframe[src*="vimeo"]')).map(i => ({
      src: i.src,
      type: i.src.includes('youtube') ? 'youtube' : 'vimeo'
    }))
  };

  // For each video, validate that caption files are reachable
  for (const video of mediaInventory.videos) {
    for (const track of video.captionTracks) {
      try {
        const response = await fetch(track.src);
        track.accessible = response.ok;
      } catch (e) {
        track.accessible = false;
      }
    }
  }

  // Check for transcript links near videos (return plain data, not DOM nodes)
  const transcriptLinks = Array.from(document.querySelectorAll('a[href*="transcript"]'))
    .map(a => ({text: a.textContent.trim(), href: a.href}));

  return {mediaInventory, transcriptLinks};
})();
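If the Go side of the tool also wants to confirm that a fetched caption file actually looks like WebVTT rather than merely returning HTTP 200 (an extra check beyond the reachability test above), a minimal sketch could be (requires "net/http", "io", "strings"):
// looksLikeWebVTT fetches a caption track URL and checks that the body
// starts with the WEBVTT signature required by the WebVTT format.
func looksLikeWebVTT(url string) (bool, error) {
    resp, err := http.Get(url)
    if err != nil {
        return false, err
    }
    defer resp.Body.Close()
    if resp.StatusCode != http.StatusOK {
        return false, nil
    }
    head := make([]byte, 16)
    n, _ := io.ReadFull(resp.Body, head)
    // Tolerate an optional UTF-8 BOM before the signature.
    body := strings.TrimPrefix(string(head[:n]), "\ufeff")
    return strings.HasPrefix(body, "WEBVTT"), nil
}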
Files to Create/Modify:
- mcp/tools/media_validation.go (new)
- mcp/server.go (register new tool)
- docs/llm_ada_testing.md (document usage)
1.3 Hover/Focus Content Persistence Testing
Priority: HIGH
Effort: 12-16 hours
Solves: WCAG 1.4.13 violations (tooltips, dropdowns, popovers)
Deliverables:
- New MCP tool: web_hover_focus_test_cremotemcp_cremotemcp
- Identifies elements with hover/focus-triggered content
- Tests dismissibility (Esc key)
- Tests hoverability (can mouse move to triggered content)
- Tests persistence (doesn't disappear immediately)
Technical Approach:
// Note: getEventListeners() is only available through the DevTools
// command-line API, so this snippet must be evaluated with that API enabled.
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));
const results = [];

// 1. Find all elements with hover/focus handlers
const interactiveElements = Array.from(document.querySelectorAll('*')).filter(el => {
  const events = getEventListeners(el);
  return events.mouseover || events.mouseenter || events.focus;
});

// 2. Test each element
for (const el of interactiveElements) {
  // Trigger hover
  el.dispatchEvent(new MouseEvent('mouseover', {bubbles: true}));
  await sleep(100);
  // Check for new content
  const tooltip = document.querySelector('[role="tooltip"], .tooltip, .popover');
  if (tooltip) {
    // Test dismissibility (Esc should remove the triggered content)
    document.dispatchEvent(new KeyboardEvent('keydown', {key: 'Escape'}));
    const dismissed = !document.contains(tooltip);
    // Test hoverability (triggered content has a real on-screen box)
    const rect = tooltip.getBoundingClientRect();
    const hoverable = rect.width > 0 && rect.height > 0;
    // Test persistence (content does not vanish as soon as the pointer leaves)
    el.dispatchEvent(new MouseEvent('mouseout', {bubbles: true}));
    await sleep(500);
    const persistent = document.contains(tooltip);
    results.push({element: el, dismissed, hoverable, persistent});
  }
}
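On the Go side, mapping the three collected booleans to a WCAG 1.4.13 verdict could look like the sketch below; the type and method names are illustrative, not existing cremote code:
// HoverResult mirrors the per-element output of the injected script.
type HoverResult struct {
    Selector   string
    Dismissed  bool // Esc removed the triggered content
    Hoverable  bool // the pointer can be moved over the triggered content
    Persistent bool // content stays visible until dismissed or hover ends
}

// Passes reports whether the element satisfies WCAG 1.4.13, which requires
// all three behaviors: dismissible, hoverable, and persistent.
func (r HoverResult) Passes() bool {
    return r.Dismissed && r.Hoverable && r.Persistent
}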
Files to Create/Modify:
- mcp/tools/hover_focus_test.go (new)
- mcp/server.go (register new tool)
- docs/llm_ada_testing.md (document usage)
PHASE 2: EXPANSION (Weeks 3-4)
Goal: Add medium-complexity enhancements
Effort: 32-44 hours
2.1 Text-in-Images Detection (OCR)
Priority: HIGH
Effort: 12-16 hours
Solves: WCAG 1.4.5 violations (images of text)
Deliverables:
- New MCP tool: web_text_in_images_check_cremotemcp_cremotemcp
- Downloads all images from page
- Runs Tesseract OCR on each image
- Flags images containing significant text (>5 words)
- Compares detected text with alt text
- Excludes logos (configurable)
Technical Approach:
# 1. Extract image URLs and alt text (one tab-separated line per image)
images=$(console_command "Array.from(document.querySelectorAll('img')).map(img => img.src + '\t' + (img.alt || '')).join('\n')")

# 2. Download each image to the container
i=0
printf '%s\n' "$images" | while IFS=$'\t' read -r src alt; do
  [ -n "$src" ] || continue
  i=$((i + 1))
  curl -s -o "/tmp/img_$i.png" "$src"

  # 3. Run OCR
  tesseract "/tmp/img_$i.png" "/tmp/img_${i}_text" --psm 6

  # 4. Count words
  word_count=$(wc -w < "/tmp/img_${i}_text.txt")

  # 5. If >5 words, flag for review
  if [ "$word_count" -gt 5 ]; then
    echo "WARNING: Image contains text ($word_count words)"
    echo "Image: $src"
    echo "Alt text: $alt"
    echo "Detected text: $(cat "/tmp/img_${i}_text.txt")"
    echo "MANUAL REVIEW: Verify if this should be HTML text instead"
  fi
done
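To support the "compares detected text with alt text" deliverable, a simple word-overlap heuristic may be enough to decide whether the alt text conveys the rendered text. The Go function below is an illustrative sketch, not existing cremote code (requires "strings"):
// altTextCoverage returns the fraction of OCR-detected words that also
// appear in the image's alt text (case-insensitive). A low value suggests
// the alt text does not convey the text rendered in the image.
func altTextCoverage(ocrText, altText string) float64 {
    ocrWords := strings.Fields(strings.ToLower(ocrText))
    if len(ocrWords) == 0 {
        return 1.0 // no detected text, nothing to cover
    }
    altWords := map[string]bool{}
    for _, w := range strings.Fields(strings.ToLower(altText)) {
        altWords[w] = true
    }
    covered := 0
    for _, w := range ocrWords {
        if altWords[w] {
            covered++
        }
    }
    return float64(covered) / float64(len(ocrWords))
}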
Dependencies:
- Tesseract OCR (install in container)
- curl or wget for image download
Files to Create/Modify:
- mcp/tools/text_in_images.go (new)
- Dockerfile (add tesseract-ocr)
- mcp/server.go (register new tool)
- docs/llm_ada_testing.md (document usage)
2.2 Cross-Page Consistency Analysis
Priority: MEDIUM
Effort: 16-24 hours
Solves: WCAG 3.2.3, 3.2.4 violations (consistent navigation/identification)
Deliverables:
- New MCP tool: web_consistency_check_cremotemcp_cremotemcp
- Crawls multiple pages (configurable limit)
- Extracts navigation structure from each page
- Compares navigation order across pages
- Identifies common elements (search, login, cart)
- Verifies consistent labeling
Technical Approach:
// 1. Crawl site (limited to 20 pages and depth 2 for performance)
// navigateTo() is a placeholder for the tool's own page-navigation step.
const MAX_PAGES = 20;
const pages = [];
const visited = new Set();

async function crawlPage(url, depth = 0) {
  if (depth > 2 || pages.length >= MAX_PAGES || visited.has(url)) return;
  visited.add(url);
  await navigateTo(url);
  pages.push({
    url,
    navigation: Array.from(document.querySelectorAll('nav a, header a')).map(a => ({
      text: a.textContent.trim(),
      href: a.href,
      order: Array.from(a.parentElement.children).indexOf(a)
    })),
    commonElements: {
      search: document.querySelector('[type="search"], [role="search"]')?.outerHTML,
      login: document.querySelector('a[href*="login"]')?.textContent,
      cart: document.querySelector('a[href*="cart"]')?.textContent
    }
  });
  // Find more same-origin pages to visit
  const links = Array.from(document.querySelectorAll('a[href]'))
    .map(a => a.href)
    .filter(href => href.startsWith(window.location.origin))
    .slice(0, 10);
  for (const link of links) {
    await crawlPage(link, depth + 1);
  }
}

// 2. Analyze consistency
const navOrders = pages.map(p => p.navigation.map(n => n.text).join('|'));
const uniqueOrders = [...new Set(navOrders)];
if (uniqueOrders.length > 1) {
  // Navigation order varies across pages - FAIL WCAG 3.2.3
}

// Check common element consistency
const searchLabels = pages.map(p => p.commonElements.search).filter(Boolean);
if (new Set(searchLabels).size > 1) {
  // Search identified inconsistently - FAIL WCAG 3.2.4
}
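On the Go side, one way to report exactly which pages deviate is to group the crawled pages by their navigation signature. The sketch below is illustrative; the type and function names are assumptions, not existing cremote code (requires "strings"):
// PageNav holds the navigation link labels found on one crawled page.
type PageNav struct {
    URL    string
    Labels []string // navigation link text, in document order
}

// navGroups buckets pages by their navigation signature so the report can
// show exactly which pages break WCAG 3.2.3 consistency.
func navGroups(pages []PageNav) map[string][]string {
    groups := map[string][]string{}
    for _, p := range pages {
        sig := strings.Join(p.Labels, "|")
        groups[sig] = append(groups[sig], p.URL)
    }
    return groups // more than one key means navigation order varies
}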
Files to Create/Modify:
- mcp/tools/consistency_check.go (new)
- mcp/server.go (register new tool)
- docs/llm_ada_testing.md (document usage)
2.3 Sensory Characteristics Detection (Pattern Matching)
Priority: MEDIUM
Effort: 8-12 hours
Solves: WCAG 1.3.3 violations (instructions relying on sensory characteristics)
Deliverables:
- New MCP tool: web_sensory_check_cremotemcp_cremotemcp
- Scans page text for sensory-only instructions
- Flags phrases like "click the red button", "square icon", "on the right"
- Uses regex pattern matching
- Provides context for manual review
Technical Approach:
// Pattern matching for sensory-only instructions
const sensoryPatterns = [
  // Color-only
  /click (the )?(red|green|blue|yellow|orange|purple|pink|gray|grey) (button|link|icon)/gi,
  /the (red|green|blue|yellow|orange|purple|pink|gray|grey) (button|link|icon)/gi,
  // Shape-only
  /(round|square|circular|rectangular|triangular) (button|icon|shape)/gi,
  /click (the )?(circle|square|triangle|rectangle)/gi,
  // Position-only (deliberately broad; expect false positives for manual review)
  /(on the |at the )?(left|right|top|bottom|above|below)/gi,
  /button (on the |at the )?(left|right|top|bottom)/gi,
  // Size-only
  /(large|small|big|little) (button|icon|link)/gi,
  // Sound-only
  /when you hear (the )?(beep|sound|tone|chime)/gi
];

const pageText = document.body.innerText;
const violations = [];

for (const pattern of sensoryPatterns) {
  const matches = pageText.matchAll(pattern);
  for (const match of matches) {
    // Get context (50 chars before and after the match)
    const index = match.index;
    const context = pageText.substring(Math.max(0, index - 50), index + match[0].length + 50);
    violations.push({
      text: match[0],
      context,
      pattern: pattern.source,
      wcag: '1.3.3 Sensory Characteristics'
    });
  }
}

return violations;
Files to Create/Modify:
- mcp/tools/sensory_check.go (new)
- mcp/server.go (register new tool)
- docs/llm_ada_testing.md (document usage)
PHASE 3: ADVANCED (Weeks 5-6)
Goal: Add complex but valuable enhancements
Effort: 24-32 hours
3.1 Animation & Flash Detection (Video Analysis)
Priority: MEDIUM
Effort: 16-24 hours
Solves: WCAG 2.3.1 violations (three flashes or below threshold)
Deliverables:
- New MCP tool: web_flash_detection_cremotemcp_cremotemcp
- Records page for 10 seconds using CDP screencast
- Analyzes frames for brightness changes
- Counts flashes per second
- Flags if >3 flashes/second detected
Technical Approach:
// Use the Chrome DevTools Protocol to capture a screencast
func (t *FlashDetectionTool) Execute(params map[string]interface{}) (interface{}, error) {
    // 1. Start screencast
    err := t.cdp.Page.StartScreencast(&page.StartScreencastArgs{
        Format:    "png",
        Quality:   80,
        MaxWidth:  1280,
        MaxHeight: 800,
    })
    if err != nil {
        return nil, err
    }

    // 2. Collect frames for 10 seconds
    frames := [][]byte{}
    timeout := time.After(10 * time.Second)
collect:
    for {
        select {
        case frame := <-t.cdp.Page.ScreencastFrame:
            frames = append(frames, frame.Data)
        case <-timeout:
            break collect
        }
    }

    // 3. Analyze brightness changes between consecutive frames
    flashes := 0
    for i := 1; i < len(frames); i++ {
        brightness1 := calculateBrightness(frames[i-1])
        brightness2 := calculateBrightness(frames[i])
        // If brightness changes by more than 20%, count it as a flash
        if math.Abs(brightness2-brightness1) > 0.2 {
            flashes++
        }
    }

    // 4. Calculate average flashes per second over the 10-second recording
    //    (WCAG 2.3.1 strictly limits flashes within any one-second period)
    flashesPerSecond := float64(flashes) / 10.0
    return map[string]interface{}{
        "flashes_detected":   flashes,
        "flashes_per_second": flashesPerSecond,
        "passes":             flashesPerSecond <= 3.0,
        "wcag":               "2.3.1 Three Flashes or Below Threshold",
    }, nil
}
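The calculateBrightness helper referenced above is not defined; a minimal sketch, assuming each collected frame is a decodable PNG and using only the standard library ("bytes", "image/png"), could be:
// calculateBrightness decodes a PNG frame and returns its mean luma in [0, 1].
func calculateBrightness(frame []byte) float64 {
    img, err := png.Decode(bytes.NewReader(frame))
    if err != nil {
        return 0
    }
    bounds := img.Bounds()
    var sum, count float64
    // Sample every 4th pixel in each direction to keep analysis fast.
    for y := bounds.Min.Y; y < bounds.Max.Y; y += 4 {
        for x := bounds.Min.X; x < bounds.Max.X; x += 4 {
            r, g, b, _ := img.At(x, y).RGBA() // 16-bit channels
            luma := 0.2126*float64(r) + 0.7152*float64(g) + 0.0722*float64(b)
            sum += luma / 65535.0
            count++
        }
    }
    if count == 0 {
        return 0
    }
    return sum / count
}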
Dependencies:
- Chrome DevTools Protocol screencast API
- Image processing library (Go image package)
Files to Create/Modify:
- mcp/tools/flash_detection.go (new)
- mcp/server.go (register new tool)
- docs/llm_ada_testing.md (document usage)
3.2 Enhanced Accessibility Tree Analysis
Priority: MEDIUM
Effort: 8-12 hours
Solves: Better detection of ARIA issues, role/name/value problems
Deliverables:
- Enhance existing tool: get_accessibility_tree_cremotemcp_cremotemcp
- Add validation rules for common ARIA mistakes
- Check for invalid role combinations
- Verify required ARIA properties
- Detect orphaned ARIA references
Technical Approach:
// Validate ARIA usage
const ariaValidation = {
  // Check for invalid (unknown) role values
  invalidRoles: Array.from(document.querySelectorAll('[role]')).filter(el => {
    const role = el.getAttribute('role');
    const validRoles = ['button', 'link', 'navigation', 'main', 'complementary', /* ... full ARIA role list ... */];
    return !validRoles.includes(role);
  }),
  // Check for elements missing an accessible name
  missingProperties: Array.from(document.querySelectorAll('[role="button"]')).filter(el => {
    return !el.hasAttribute('aria-label') &&
           !el.hasAttribute('aria-labelledby') &&
           !el.textContent.trim();
  }),
  // Check for orphaned aria-describedby/aria-labelledby references
  // (each attribute may contain a space-separated list of IDs)
  orphanedReferences: Array.from(document.querySelectorAll('[aria-describedby], [aria-labelledby]')).filter(el => {
    const ids = [
      ...(el.getAttribute('aria-describedby') || '').split(/\s+/),
      ...(el.getAttribute('aria-labelledby') || '').split(/\s+/)
    ].filter(Boolean);
    return ids.some(id => !document.getElementById(id));
  })
};
Files to Create/Modify:
- mcp/tools/accessibility_tree.go (enhance existing)
- docs/llm_ada_testing.md (document new validations)
IMPLEMENTATION SCHEDULE
Week 1-2: Phase 1 Foundation
- Day 1-3: Gradient contrast analysis (ImageMagick)
- Day 4-6: Time-based media validation (basic)
- Day 7-10: Hover/focus content testing
Week 3-4: Phase 2 Expansion
- Day 11-14: Text-in-images detection (OCR)
- Day 15-20: Cross-page consistency analysis
- Day 21-23: Sensory characteristics detection
Week 5-6: Phase 3 Advanced
- Day 24-30: Animation/flash detection
- Day 31-35: Enhanced accessibility tree analysis
Week 7-8: Testing & Documentation
- Day 36-40: Integration testing
- Day 41-45: Documentation updates
- Day 46-50: User acceptance testing
TECHNICAL REQUIREMENTS
Container Dependencies
# Add to Dockerfile
RUN apt-get update && apt-get install -y \
imagemagick \
tesseract-ocr \
tesseract-ocr-eng \
&& rm -rf /var/lib/apt/lists/*
Go Dependencies
// Add to go.mod
require (
github.com/chromedp/cdproto v0.0.0-20231011050154-1d073bb38998
github.com/disintegration/imaging v1.6.2 // Image processing
)
Configuration
# Add to cremote config
automation_enhancements:
gradient_contrast:
enabled: true
sample_points: 100
media_validation:
enabled: true
check_embedded_players: true
youtube_api_key: "" # Optional
text_in_images:
enabled: true
min_word_threshold: 5
exclude_logos: true
consistency_check:
enabled: true
max_pages: 20
max_depth: 2
flash_detection:
enabled: true
recording_duration: 10
brightness_threshold: 0.2
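For reference, the YAML above could map onto a Go configuration struct along these lines; the struct and field names are an illustrative assumption, not the final schema:
// AutomationEnhancements mirrors the proposed YAML configuration block.
type AutomationEnhancements struct {
    GradientContrast struct {
        Enabled      bool `yaml:"enabled"`
        SamplePoints int  `yaml:"sample_points"`
    } `yaml:"gradient_contrast"`
    MediaValidation struct {
        Enabled              bool   `yaml:"enabled"`
        CheckEmbeddedPlayers bool   `yaml:"check_embedded_players"`
        YouTubeAPIKey        string `yaml:"youtube_api_key"`
    } `yaml:"media_validation"`
    TextInImages struct {
        Enabled          bool `yaml:"enabled"`
        MinWordThreshold int  `yaml:"min_word_threshold"`
        ExcludeLogos     bool `yaml:"exclude_logos"`
    } `yaml:"text_in_images"`
    ConsistencyCheck struct {
        Enabled  bool `yaml:"enabled"`
        MaxPages int  `yaml:"max_pages"`
        MaxDepth int  `yaml:"max_depth"`
    } `yaml:"consistency_check"`
    FlashDetection struct {
        Enabled             bool    `yaml:"enabled"`
        RecordingDuration   int     `yaml:"recording_duration"`
        BrightnessThreshold float64 `yaml:"brightness_threshold"`
    } `yaml:"flash_detection"`
}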
SUCCESS METRICS
Coverage Targets
- Current: 70% automated coverage
- After Phase 1: 78% automated coverage (+8%)
- After Phase 2: 83% automated coverage (+5%)
- After Phase 3: 85% automated coverage (+2%)
Quality Metrics
- False Positive Rate: <10%
- False Negative Rate: <5%
- Test Execution Time: <5 minutes per page
- Report Clarity: 100% actionable findings
Performance Targets
- Gradient contrast: <2 seconds per element
- Media validation: <5 seconds per page
- Text-in-images: <1 second per image
- Consistency check: <30 seconds for 20 pages
- Flash detection: 10 seconds (fixed recording time)
RISK MITIGATION
Technical Risks
- ImageMagick performance on large images
  - Mitigation: Resize images before analysis
  - Fallback: Skip images >5MB
- Tesseract OCR accuracy
  - Mitigation: Set confidence threshold
  - Fallback: Flag low-confidence results for manual review
- CDP screencast reliability
  - Mitigation: Implement retry logic
  - Fallback: Skip flash detection if screencast fails
- Cross-page crawling performance
  - Mitigation: Limit to 20 pages, depth 2
  - Fallback: Allow user to specify page list
Operational Risks
- Container size increase
  - Mitigation: Use multi-stage Docker builds
  - Monitor: Keep container <500MB
- Increased test execution time
  - Mitigation: Make all enhancements optional
  - Allow users to enable/disable specific tests
DELIVERABLES
Code
- 6 new MCP tools (gradient, media, hover, OCR, consistency, flash)
- 1 enhanced tool (accessibility tree)
- Updated Dockerfile with dependencies
- Updated configuration schema
- Integration tests for all new tools
Documentation
- Updated docs/llm_ada_testing.md with new tools
- Updated enhanced_chromium_ada_checklist.md with automation notes
- New docs/AUTOMATION_TOOLS.md with technical details
- Updated README with new capabilities
- Example usage for each new tool
Testing
- Unit tests for each new tool
- Integration tests with real websites
- Performance benchmarks
- Accuracy validation against manual testing
MAINTENANCE PLAN
Ongoing Support
- Monitor false positive/negative rates
- Update pattern matching rules (sensory characteristics)
- Keep dependencies updated (ImageMagick, Tesseract)
- Add new ARIA validation rules as spec evolves
Future Enhancements (Post-Plan)
- LLM-assisted semantic analysis (if budget allows)
- Speech-to-text caption validation (if external service available)
- Real-time live caption testing (if streaming infrastructure added)
- Advanced video content analysis (if AI/ML resources available)
APPROVAL & SIGN-OFF
Plan Status: READY FOR APPROVAL
Estimated Total Effort: 84-112 hours (10-14 business days)
Estimated Timeline: 6-8 weeks (with testing and documentation)
Budget Impact: Minimal (only open-source dependencies)
Risk Level: LOW (all technologies proven and stable)
Next Steps:
- Review and approve this plan
- Set up development environment with new dependencies
- Begin Phase 1 implementation
- Schedule weekly progress reviews
Document Prepared By: Cremote Development Team
Date: October 2, 2025
Version: 1.0