AUTOMATED TESTING ENHANCEMENTS FOR CREMOTE ADA SUITE
Date: October 2, 2025
Purpose: Propose creative solutions to automate currently manual accessibility tests
Philosophy: KISS - Keep it Simple, Stupid. Practical solutions using existing tools.
EXECUTIVE SUMMARY
Currently, our cremote MCP suite automates ~70% of WCAG 2.1 AA testing. This document proposes practical solutions to increase automation coverage to ~85-90% by leveraging:
- ImageMagick for gradient contrast analysis
- Screenshot-based analysis for visual testing
- OCR tools for text-in-images detection
- Video frame analysis for animation/flash testing
- Enhanced JavaScript injection for deeper DOM analysis
CATEGORY 1: GRADIENT & COMPLEX BACKGROUND CONTRAST
Current Limitation
Problem: Axe-core reports "incomplete" for text on gradient backgrounds because it cannot calculate contrast ratios for non-solid colors.
Example from our assessment:
- Navigation menu links (background color could not be determined due to overlap)
- Gradient backgrounds on hero section (contrast cannot be automatically calculated)
Proposed Solution: ImageMagick Gradient Analysis
Approach:
- Take screenshot of specific element using web_screenshot_element_cremotemcp
- Use ImageMagick to analyze color distribution
- Calculate contrast ratio against darkest/lightest points in gradient
- Report worst-case contrast ratio
Implementation:
# Step 1: Take element screenshot
web_screenshot_element_cremotemcp(selector=".hero-section", output="/tmp/hero.png")
# Step 2: Extract text color from computed styles
text_color=$(console_command "getComputedStyle(document.querySelector('.hero-section h1')).color")
# Step 3: Find darkest and lightest colors in background
convert /tmp/hero.png -format "%[fx:minima]" info: > darkest.txt
convert /tmp/hero.png -format "%[fx:maxima]" info: > lightest.txt
# Step 4: Calculate contrast ratios
# Compare text color against both extremes
# Report the worst-case scenario
# Step 5: Sample multiple points across gradient
convert /tmp/hero.png -resize 10x10! -depth 8 txt:- | tail -n +2 | awk '{print $3}'
# This gives us 100 sample points (hex colors) across the gradient
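The contrast math itself is the standard WCAG relative-luminance formula. A minimal JavaScript sketch follows; the helper names and hard-coded values are illustrative only and would be fed from the computed text color (Step 2) and the sampled background pixels (Step 5):
// WCAG 2.1 relative luminance and contrast ratio (illustrative helper names)
function relativeLuminance([r, g, b]) {
  const [rs, gs, bs] = [r, g, b].map(c => {
    const s = c / 255;
    return s <= 0.03928 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
  });
  return 0.2126 * rs + 0.7152 * gs + 0.0722 * bs;
}
function contrastRatio(rgb1, rgb2) {
  const [light, dark] = [relativeLuminance(rgb1), relativeLuminance(rgb2)].sort((a, b) => b - a);
  return (light + 0.05) / (dark + 0.05);
}
// Worst case: text color vs. every sampled background pixel from the 10x10 grid
const textRgb = [255, 255, 255];                   // parsed from the computed style in Step 2
const samples = [[10, 20, 120], [230, 230, 240]];  // parsed from the ImageMagick output in Step 5
const worst = Math.min(...samples.map(bg => contrastRatio(textRgb, bg)));
console.log(`Worst-case contrast: ${worst.toFixed(2)}:1 (AA requires 4.5:1 for normal text)`);
Reporting the minimum ratio across all sampled background pixels gives the worst-case value the assessment needs.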
Tools Required:
- ImageMagick (already available in most containers)
- Basic shell scripting
- Color contrast calculation library (can use existing cremote contrast checker)
Accuracy: ~95% - Will catch most gradient contrast issues
Implementation Effort: 8-16 hours
CATEGORY 2: TEXT IN IMAGES DETECTION
Current Limitation
Problem: WCAG 1.4.5 requires text to be presented as actual text rather than images of text, except where a particular presentation is essential (e.g., logos). Currently requires manual visual inspection.
Proposed Solution: OCR-Based Text Detection
Approach:
- Screenshot all images on page
- Run OCR (Tesseract) on each image
- If text detected, flag for manual review
- Cross-reference with alt text to verify equivalence
Implementation:
# Step 1: Extract all image URLs (one per line)
urls=$(console_command "Array.from(document.querySelectorAll('img')).map(img => img.src).join('\n')")
# Step 2: Download each image
i=0
for url in $urls; do
  i=$((i + 1))
  curl -s -o /tmp/img_$i.png "$url"
  # Step 3: Run OCR
  tesseract /tmp/img_$i.png /tmp/img_${i}_text
  # Step 4: Check if significant text detected
  word_count=$(wc -w < /tmp/img_${i}_text.txt)
  if [ "$word_count" -gt 5 ]; then
    alt=$(console_command "document.querySelectorAll('img')[$((i - 1))].alt")
    echo "WARNING: Image contains text: $url"
    echo "Detected text: $(cat /tmp/img_${i}_text.txt)"
    echo "Alt text: $alt"
    echo "MANUAL REVIEW REQUIRED: Verify if this should be HTML text instead"
  fi
done
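For step 4 of the approach (cross-referencing detected text with alt text), a rough word-overlap check is usually enough to decide whether the alt text conveys the same information. The sketch below is illustrative; the 70% threshold is an assumption to tune against real pages, and the inputs come from the OCR output and alt attribute captured above:
// Rough equivalence check between OCR'd image text and the image's alt attribute (sketch)
function altCoversImageText(ocrText, altText) {
  const words = s => s.toLowerCase().match(/[a-z0-9]+/g) || [];
  const ocrWords = new Set(words(ocrText));
  const altWords = new Set(words(altText));
  if (ocrWords.size === 0) return true;            // no meaningful text detected
  let covered = 0;
  for (const w of ocrWords) if (altWords.has(w)) covered++;
  return covered / ocrWords.size >= 0.7;           // assumed threshold: 70% word overlap
}
console.log(altCoversImageText('Summer Sale 50% off', 'Summer sale: 50% off all items')); // true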
Tools Required:
- Tesseract OCR (open source, widely available)
- curl or wget for image download
- Basic shell scripting
Accuracy: ~80% - Will catch obvious text-in-images, may miss stylized text
False Positives: Logos, decorative text (acceptable - requires manual review anyway)
Implementation Effort: 8-12 hours
CATEGORY 3: ANIMATION & FLASH DETECTION
Current Limitation
Problem: WCAG 2.3.1 requires no content flashing more than 3 times per second. Currently requires manual observation.
Proposed Solution: Video Frame Analysis
Approach:
- Record video of page for 10 seconds using Chrome DevTools Protocol
- Extract frames using ffmpeg
- Compare consecutive frames for brightness changes
- Count flashes per second
- Flag if >3 flashes/second detected
Implementation:
# Step 1: Start video recording via CDP
# Page.startScreencast is a DevTools Protocol command sent from the automation
# client (not page JavaScript). Parameters:
#   Page.startScreencast {"format": "png", "quality": 80, "maxWidth": 1280, "maxHeight": 800}
# Frames arrive as Page.screencastFrame events; save them to e.g. /tmp/frames/
# Step 2: Record for ~10 seconds, saving frames, then assemble them into a video, e.g.:
ffmpeg -framerate 30 -i /tmp/frames/frame_%04d.png /tmp/recording.mp4
# Step 3: Analyze frames with ffmpeg
ffmpeg -i /tmp/recording.mp4 -vf "select='gt(scene,0.3)',showinfo" -f null - 2>&1 | \
grep "Parsed_showinfo" | wc -l
# Step 4: Calculate flashes per second
# WCAG 2.3.1 limits content to no more than 3 flashes in any one-second period;
# more than 30 scene changes over the 10-second capture is a strong failure signal
# Step 5: For brightness-based flashing
ffmpeg -i /tmp/recording.mp4 -vf "signalstats,metadata=print:key=lavfi.signalstats.YAVG" -f null - 2>&1 | \
grep "lavfi.signalstats.YAVG" | \
awk -F= '{print $NF}' > brightness.txt
# Analyze brightness.txt for rapid changes
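Turning the per-frame brightness values into a pass/fail signal is a sliding-window count. The sketch below (Node) assumes roughly 30 captured frames per second and treats a large average-luminance swing as one transition; both thresholds are assumptions to tune, not values taken from the WCAG flash definition:
// Count flashes per one-second window from the YAVG values in brightness.txt (sketch)
const fs = require('fs');
const yavg = fs.readFileSync('/tmp/brightness.txt', 'utf8').trim().split('\n').map(Number);
const FPS = 30;    // assumed capture rate (frames per second)
const SWING = 40;  // assumed luma delta (0-255 scale) counted as one transition
let worst = 0;
for (let start = 0; start + FPS <= yavg.length; start++) {
  let transitions = 0;
  for (let i = start + 1; i < start + FPS; i++) {
    if (Math.abs(yavg[i] - yavg[i - 1]) >= SWING) transitions++;
  }
  worst = Math.max(worst, Math.floor(transitions / 2)); // two opposing transitions = one flash
}
console.log(worst > 3 ? `FAIL: ${worst} flashes in one second` : `PASS: max ${worst} flashes per second`);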
Tools Required:
- ffmpeg (video processing)
- Chrome DevTools Protocol screencast API
- Python/shell script for analysis
Accuracy: ~90% - Will catch most flashing content
Implementation Effort: 16-24 hours (more complex)
CATEGORY 4: HOVER/FOCUS CONTENT PERSISTENCE
Current Limitation
Problem: WCAG 1.4.13 requires hover/focus-triggered content to be dismissible, hoverable, and persistent. Currently requires manual testing.
Proposed Solution: Automated Interaction Testing
Approach:
- Identify all elements with hover/focus event listeners
- Programmatically trigger hover/focus
- Measure how long content stays visible
- Test if Esc key dismisses content
- Test if mouse can move to triggered content
Implementation:
// Step 1: Find elements that have :hover / :focus rules in the page's stylesheets
// (getComputedStyle(el, ':hover') does not return hover styles, so scan CSS rules instead)
const hoverSelectors = [];
for (const sheet of Array.from(document.styleSheets)) {
  let rules;
  try { rules = Array.from(sheet.cssRules); } catch (e) { continue; } // skip cross-origin sheets
  for (const rule of rules) {
    if (rule.selectorText && /:hover|:focus/.test(rule.selectorText)) {
      hoverSelectors.push(rule.selectorText.replace(/:hover|:focus/g, ''));
    }
  }
}
const elementsWithHover = [...new Set(hoverSelectors.flatMap(sel => {
  try { return Array.from(document.querySelectorAll(sel)); } catch (e) { return []; }
}))];
// Step 2: Test each element (run inside an async function so the awaits below are valid)
for (const el of elementsWithHover) {
// Trigger hover
el.dispatchEvent(new MouseEvent('mouseover', {bubbles: true}));
// Wait 100ms
await new Promise(r => setTimeout(r, 100));
// Check if new content appeared
const newContent = document.querySelector('[role="tooltip"], .tooltip, .popover');
if (newContent) {
// Test 1: Can we hover over the new content?
const rect = newContent.getBoundingClientRect();
const canHover = rect.width > 0 && rect.height > 0;
// Test 2: Does Esc dismiss it?
document.dispatchEvent(new KeyboardEvent('keydown', {key: 'Escape'}));
await new Promise(r => setTimeout(r, 100));
const dismissed = !document.contains(newContent);
// Test 3: Does it persist when we move the mouse away briefly?
// (re-trigger hover first, since the Escape test above may have dismissed the content;
// if dismissal removed the node, look it up again)
el.dispatchEvent(new MouseEvent('mouseover', {bubbles: true}));
await new Promise(r => setTimeout(r, 100));
const reopened = document.querySelector('[role="tooltip"], .tooltip, .popover') || newContent;
el.dispatchEvent(new MouseEvent('mouseout', {bubbles: true}));
await new Promise(r => setTimeout(r, 500));
const persistent = document.contains(reopened);
console.log({
element: el,
canHover,
dismissible: dismissed,
persistent
});
}
}
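Because 1.4.13 also covers focus-triggered content, the same persistence checks can be repeated with keyboard focus. A short sketch, reusing the same tooltip/popover selector heuristic and run inside an async function like the loop above:
// Step 5 (sketch): repeat the checks for focus-triggered content
const focusable = document.querySelectorAll('a[href], button, input, select, textarea, [tabindex]');
for (const el of focusable) {
  el.focus();
  await new Promise(r => setTimeout(r, 100));
  const shown = document.querySelector('[role="tooltip"], .tooltip, .popover');
  if (shown) {
    document.dispatchEvent(new KeyboardEvent('keydown', {key: 'Escape'}));
    await new Promise(r => setTimeout(r, 100));
    console.log({element: el, dismissibleOnFocus: !document.contains(shown)});
  }
  el.blur();
}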
Tools Required:
- JavaScript injection via cremote
- Chrome DevTools Protocol for event simulation
- Timing and state tracking
Accuracy: ~85% - Will catch most hover/focus issues
Implementation Effort: 12-16 hours
CATEGORY 5: SEMANTIC MEANING & COGNITIVE LOAD
Current Limitation
Problem: Some WCAG criteria require human judgment (e.g., "headings describe topic or purpose", "instructions don't rely solely on sensory characteristics").
Proposed Solution: LLM-Assisted Analysis
Approach:
- Extract all headings, labels, and instructions
- Use LLM (Claude, GPT-4) to analyze semantic meaning
- Check for sensory-only instructions (e.g., "click the red button")
- Verify heading descriptiveness
- Flag potential issues for manual review
Implementation:
// Step 1: Extract content for analysis
const analysisData = {
headings: Array.from(document.querySelectorAll('h1,h2,h3,h4,h5,h6')).map(h => ({
level: h.tagName,
text: h.textContent.trim(),
context: h.parentElement.textContent.substring(0, 200)
})),
instructions: Array.from(document.querySelectorAll('label, .instructions, [role="note"]')).map(el => ({
text: el.textContent.trim(),
context: el.parentElement.textContent.substring(0, 200)
})),
links: Array.from(document.querySelectorAll('a')).map(a => ({
text: a.textContent.trim(),
href: a.href,
context: a.parentElement.textContent.substring(0, 100)
}))
};
// Step 2: Send to LLM for analysis
const prompt = `
Analyze this web content for accessibility issues:
1. Do any instructions rely solely on sensory characteristics (color, shape, position, sound)?
Examples: "click the red button", "the square icon", "button on the right"
2. Are headings descriptive of their section content?
Flag generic headings like "More Information", "Click Here", "Welcome"
3. Are link texts descriptive of their destination?
Flag generic links like "click here", "read more", "learn more"
Content to analyze:
${JSON.stringify(analysisData, null, 2)}
Return JSON with:
{
"sensory_instructions": [{element, issue, suggestion}],
"generic_headings": [{heading, issue, suggestion}],
"unclear_links": [{link, issue, suggestion}]
}
`;
// Step 3: Parse LLM response and generate report
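A minimal sketch of the round trip, run from the test harness (Node) rather than injected into the page. The endpoint and headers follow the Anthropic Messages API; the model id and the assumption that the model returns bare JSON are placeholders to adapt:
// Send the prompt to an LLM and parse the JSON findings (illustrative sketch)
const res = await fetch('https://api.anthropic.com/v1/messages', {
  method: 'POST',
  headers: {
    'x-api-key': process.env.ANTHROPIC_API_KEY,
    'anthropic-version': '2023-06-01',
    'content-type': 'application/json'
  },
  body: JSON.stringify({
    model: 'claude-sonnet-4-5',   // example model id; use whichever model is approved
    max_tokens: 2048,
    messages: [{role: 'user', content: prompt}]
  })
});
const data = await res.json();
const findings = JSON.parse(data.content[0].text);  // assumes the model returned bare JSON
console.log(findings.sensory_instructions, findings.generic_headings, findings.unclear_links);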
Tools Required:
- LLM API access (Claude, GPT-4, or local model)
- JSON parsing
- Integration with cremote reporting
Accuracy: ~75% - LLM can catch obvious issues, but still requires human review
Implementation Effort: 16-24 hours
CATEGORY 6: TIME-BASED MEDIA (VIDEO/AUDIO)
Current Limitation
Problem: WCAG 1.2.x criteria require captions, audio descriptions, and transcripts. Currently requires manual review of media content.
Proposed Solution: Automated Media Inventory & Validation
Approach:
- Detect all video/audio elements
- Check for caption tracks
- Verify caption files are accessible
- Use speech-to-text to verify caption accuracy (optional)
- Check for audio description tracks
Implementation:
// Step 1: Find all media elements
const mediaElements = {
videos: Array.from(document.querySelectorAll('video')).map(v => ({
src: v.src,
tracks: Array.from(v.querySelectorAll('track')).map(t => ({
kind: t.kind,
src: t.src,
srclang: t.srclang,
label: t.label
})),
controls: v.hasAttribute('controls'),
autoplay: v.hasAttribute('autoplay'),
duration: v.duration
})),
audios: Array.from(document.querySelectorAll('audio')).map(a => ({
src: a.src,
controls: a.hasAttribute('controls'),
autoplay: a.hasAttribute('autoplay'),
duration: a.duration
}))
};
// Step 2: Validate each video (run inside an async function so the awaits below are valid)
for (const video of mediaElements.videos) {
const issues = [];
// Check for captions
const captionTrack = video.tracks.find(t => t.kind === 'captions' || t.kind === 'subtitles');
if (!captionTrack) {
issues.push('FAIL: No caption track found (WCAG 1.2.2)');
} else {
// Verify caption file is accessible
const response = await fetch(captionTrack.src);
if (!response.ok) {
issues.push(`FAIL: Caption file not accessible: ${captionTrack.src}`);
}
}
// Check for audio description
const descriptionTrack = video.tracks.find(t => t.kind === 'descriptions');
if (!descriptionTrack) {
issues.push('WARNING: No audio description track found (WCAG 1.2.5)');
}
// Check for transcript link
const transcriptLink = document.querySelector(`a[href*="transcript"]`);
if (!transcriptLink) {
issues.push('WARNING: No transcript link found (WCAG 1.2.3)');
}
console.log({video: video.src, issues});
}
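Beyond checking the HTTP status, a quick content check catches empty or mislabeled caption files. A sketch, assuming WebVTT captions (adjust the header check for SRT):
// Sanity-check the caption file content, not just its HTTP status (sketch)
async function checkCaptionFile(url) {
  const res = await fetch(url);
  if (!res.ok) return `FAIL: caption file not accessible (${res.status})`;
  const text = await res.text();
  if (!text.trim().startsWith('WEBVTT')) return 'WARNING: file does not look like WebVTT';
  const cues = text.split('-->').length - 1;   // rough cue count
  return cues > 0 ? `PASS: ${cues} cues found` : 'FAIL: caption file contains no cues';
}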
Enhanced with Speech-to-Text (Optional):
# Download video
youtube-dl -o /tmp/video.mp4 $video_url
# Extract audio
ffmpeg -i /tmp/video.mp4 -vn -acodec pcm_s16le -ar 16000 /tmp/audio.wav
# Run speech-to-text (using Whisper or similar)
whisper /tmp/audio.wav --model base --output_format txt --output_dir /tmp
# Compare with caption file
diff /tmp/audio.txt /tmp/captions.vtt
# Calculate accuracy percentage
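A raw diff is hard to score; a rough accuracy percentage can instead be computed by stripping the VTT cue timing lines and measuring word overlap between the Whisper transcript and the captions. A sketch in Node, using the file paths from the commands above:
// Rough caption coverage: word overlap between the Whisper transcript and the VTT cues (sketch)
const fs = require('fs');
const words = s => s.toLowerCase().match(/[a-z0-9']+/g) || [];
const spoken = words(fs.readFileSync('/tmp/audio.txt', 'utf8'));
const captionText = fs.readFileSync('/tmp/captions.vtt', 'utf8')
  .split('\n')
  .filter(line => line.trim() && !line.includes('-->') && !line.startsWith('WEBVTT'))
  .join(' ');
const captionWords = new Set(words(captionText));
const covered = spoken.filter(w => captionWords.has(w)).length;
console.log(`Caption coverage: ${(100 * covered / Math.max(spoken.length, 1)).toFixed(1)}%`);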
Tools Required:
- JavaScript for media detection
- fetch API for caption file validation
- Optional: Whisper (OpenAI) or similar for speech-to-text
- ffmpeg for audio extraction
Accuracy:
- Media detection: ~100%
- Caption presence: ~100%
- Caption accuracy (with STT): ~70-80%
Implementation Effort:
- Basic validation: 8-12 hours
- With speech-to-text: 24-32 hours
CATEGORY 7: MULTI-PAGE CONSISTENCY
Current Limitation
Problem: WCAG 3.2.3 (Consistent Navigation) and 3.2.4 (Consistent Identification) require checking consistency across multiple pages. Currently requires manual comparison.
Proposed Solution: Automated Cross-Page Analysis
Approach:
- Crawl all pages on site
- Extract navigation structure from each page
- Compare navigation order across pages
- Extract common elements (search, login, cart, etc.)
- Verify consistent labeling and identification
Implementation:
// Step 1: Crawl site and extract navigation
const siteMap = [];
async function crawlPage(url, visited = new Set()) {
if (visited.has(url)) return;
visited.add(url);
await navigateTo(url);
const pageData = {
url,
navigation: Array.from(document.querySelectorAll('nav a, header a')).map(a => ({
text: a.textContent.trim(),
href: a.href,
order: Array.from(a.parentElement.children).indexOf(a)
})),
commonElements: {
search: document.querySelector('[type="search"], [role="search"]')?.outerHTML,
login: (document.querySelector('a[href*="login"]') ||
  Array.from(document.querySelectorAll('button')).find(b => /login/i.test(b.textContent)))?.outerHTML,
cart: document.querySelector('a[href*="cart"], .cart')?.outerHTML
}
};
siteMap.push(pageData);
// Find more pages to crawl
const links = Array.from(document.querySelectorAll('a[href]'))
.map(a => a.href)
.filter(href => href.startsWith(window.location.origin));
for (const link of links.slice(0, 50)) { // Limit the number of pages crawled
await crawlPage(link, visited);
}
}
// Step 2: Analyze consistency
function analyzeConsistency(siteMap) {
const issues = [];
// Check navigation order consistency
const navOrders = siteMap.map(page =>
page.navigation.map(n => n.text).join('|')
);
const uniqueOrders = [...new Set(navOrders)];
if (uniqueOrders.length > 1) {
issues.push({
criterion: 'WCAG 3.2.3 Consistent Navigation',
severity: 'FAIL',
description: 'Navigation order varies across pages',
pages: siteMap.filter((p, i) => navOrders[i] !== navOrders[0]).map(p => p.url)
});
}
// Check common element consistency
const searchElements = siteMap.map(p => p.commonElements.search).filter(Boolean);
if (new Set(searchElements).size > 1) {
issues.push({
criterion: 'WCAG 3.2.4 Consistent Identification',
severity: 'FAIL',
description: 'Search functionality identified inconsistently across pages'
});
}
return issues;
}
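A minimal driver tying the two steps together; it assumes navigateTo wraps the existing cremote navigation tool and that the whole crawl runs from the test harness:
// Example driver (sketch): crawl from the home page, then report consistency issues
await crawlPage(window.location.origin + '/');
const issues = analyzeConsistency(siteMap);
console.log(JSON.stringify(issues, null, 2));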
Tools Required:
- Web crawler (can use existing cremote navigation)
- DOM extraction and comparison
- Pattern matching algorithms
Accuracy: ~90% - Will catch most consistency issues
Implementation Effort: 16-24 hours
IMPLEMENTATION PRIORITY
Phase 1: High Impact, Low Effort (Weeks 1-2)
- Gradient Contrast Analysis (ImageMagick) - 8-16 hours
- Hover/Focus Content Testing (JavaScript) - 12-16 hours
- Media Inventory & Validation (Basic) - 8-12 hours
Total Phase 1: 28-44 hours
Phase 2: Medium Impact, Medium Effort (Weeks 3-4)
- Text-in-Images Detection (OCR) - 8-12 hours
- Cross-Page Consistency (Crawler) - 16-24 hours
- LLM-Assisted Semantic Analysis - 16-24 hours
Total Phase 2: 40-60 hours
Phase 3: Lower Priority, Higher Effort (Weeks 5-6)
- Animation/Flash Detection (Video analysis) - 16-24 hours
- Speech-to-Text Caption Validation - 24-32 hours
Total Phase 3: 40-56 hours
Grand Total: 108-160 hours (13-20 business days)
EXPECTED OUTCOMES
Current State:
- Automated Coverage: ~70% of WCAG 2.1 AA criteria
- Manual Review Required: ~30%
After Phase 1:
- Automated Coverage: ~78%
- Manual Review Required: ~22%
After Phase 2:
- Automated Coverage: ~85%
- Manual Review Required: ~15%
After Phase 3:
- Automated Coverage: ~90%
- Manual Review Required: ~10%
Remaining Manual Tests (~10%):
- Cognitive load assessment
- Content quality and readability
- User experience with assistive technologies
- Real-world usability testing
- Complex user interactions requiring human judgment
TECHNICAL REQUIREMENTS
Software Dependencies:
- ImageMagick - Image analysis (usually pre-installed)
- Tesseract OCR - Text detection in images
- ffmpeg - Video/audio processing
- Whisper (optional) - Speech-to-text for caption validation
- LLM API (optional) - Semantic analysis
Installation:
# Ubuntu/Debian
apt-get install imagemagick tesseract-ocr ffmpeg
# For Whisper (Python)
pip install openai-whisper
# For LLM integration
# Use existing API keys for Claude/GPT-4
Container Considerations:
- All tools should be installed in cremote container
- File paths must account for container filesystem
- Use file_download_cremotemcp for retrieving analysis results
CONCLUSION
By implementing these creative automated solutions, we can increase our accessibility testing coverage from 70% to 90%, significantly reducing manual review burden while maintaining high accuracy.
Key Principles:
- ✅ Use existing, proven tools (ImageMagick, Tesseract, ffmpeg)
- ✅ Keep solutions simple and maintainable (KISS philosophy)
- ✅ Prioritize high-impact, low-effort improvements first
- ✅ Accept that some tests will always require human judgment
- ✅ Focus on catching obvious violations automatically
Next Steps:
- Review and approve proposed solutions
- Prioritize implementation based on business needs
- Start with Phase 1 (high impact, low effort)
- Iterate and refine based on real-world testing
- Document all new automated tests in enhanced_chromium_ada_checklist.md
Document Prepared By: Cremote Development Team
Date: October 2, 2025
Status: PROPOSAL - Awaiting Approval