# AUTOMATED TESTING ENHANCEMENTS FOR CREMOTE ADA SUITE
**Date:** October 2, 2025
**Purpose:** Propose creative solutions to automate currently manual accessibility tests
**Philosophy:** KISS - Keep it Simple, Stupid. Practical solutions using existing tools.
---
## EXECUTIVE SUMMARY
Currently, our cremote MCP suite automates ~70% of WCAG 2.1 AA testing. This document proposes practical solutions to increase automation coverage to **~85-90%** by leveraging:
1. **ImageMagick** for gradient contrast analysis
2. **Screenshot-based analysis** for visual testing
3. **OCR tools** for text-in-images detection
4. **Video frame analysis** for animation/flash testing
5. **Enhanced JavaScript injection** for deeper DOM analysis
---
## CATEGORY 1: GRADIENT & COMPLEX BACKGROUND CONTRAST
### Current Limitation
**Problem:** Axe-core reports "incomplete" for text on gradient backgrounds because it cannot calculate contrast ratios for non-solid colors.
**Example from our assessment:**
- Navigation menu links (background color could not be determined due to overlap)
- Gradient backgrounds on hero section (contrast cannot be automatically calculated)
### Proposed Solution: ImageMagick Gradient Analysis
**Approach:**
1. Take a screenshot of the specific element using `web_screenshot_element_cremotemcp`
2. Use ImageMagick to analyze color distribution
3. Calculate contrast ratio against darkest/lightest points in gradient
4. Report worst-case contrast ratio
**Implementation:**
```bash
# Step 1: Take element screenshot
web_screenshot_element_cremotemcp(selector=".hero-section", output="/tmp/hero.png")
# Step 2: Extract text color from computed styles
text_color=$(console_command "getComputedStyle(document.querySelector('.hero-section h1')).color")
# Step 3: Find the darkest and lightest values in the background (normalized 0-1)
convert /tmp/hero.png -format "%[fx:minima]" info: > darkest.txt
convert /tmp/hero.png -format "%[fx:maxima]" info: > lightest.txt
# Step 4: Calculate contrast ratios
# Compare the text color against both extremes and report the worst case
# (see the JavaScript sketch after this block for the WCAG contrast math)
# Step 5: Sample multiple points across the gradient as hex colors
convert /tmp/hero.png -resize 10x10! -depth 8 txt:- | tail -n +2 | awk '{print $3}'
# This gives us 100 sample points across the gradient
```
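Step 4 above is left as comments; the contrast math itself is small enough to inline. A minimal JavaScript sketch of the WCAG 2.1 relative-luminance and contrast-ratio formulas, applied to the text color from Step 2 and the background samples from Step 5 (the example colors are illustrative; in practice this would reuse the existing cremote contrast checker):
```javascript
// WCAG 2.1 relative luminance and contrast ratio
// https://www.w3.org/TR/WCAG21/#dfn-contrast-ratio
function relativeLuminance([r, g, b]) {
  const [R, G, B] = [r, g, b].map(c => {
    const s = c / 255;
    return s <= 0.03928 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
  });
  return 0.2126 * R + 0.7152 * G + 0.0722 * B;
}

function contrastRatio(fg, bg) {
  const [l1, l2] = [relativeLuminance(fg), relativeLuminance(bg)].sort((a, b) => b - a);
  return (l1 + 0.05) / (l2 + 0.05);
}

// Worst case across the sampled gradient points from Step 5 (array of [r, g, b] triples)
function worstCaseContrast(textColor, backgroundSamples) {
  return Math.min(...backgroundSamples.map(bg => contrastRatio(textColor, bg)));
}

// Example: white text over two samples from a blue-to-teal gradient
// Report FAIL for normal-size text if the result is below 4.5
console.log(worstCaseContrast([255, 255, 255], [[20, 60, 120], [0, 150, 160]]));
```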
**Tools Required:**
- ImageMagick (already available in most containers)
- Basic shell scripting
- Color contrast calculation library (can use existing cremote contrast checker)
**Accuracy:** ~95% - Will catch most gradient contrast issues
**Implementation Effort:** 8-16 hours
---
## CATEGORY 2: TEXT IN IMAGES DETECTION
### Current Limitation
**Problem:** WCAG 1.4.5 requires text to be actual text, not images of text (except logos). Currently requires manual visual inspection.
### Proposed Solution: OCR-Based Text Detection
**Approach:**
1. Screenshot all images on page
2. Run OCR (Tesseract) on each image
3. If text detected, flag for manual review
4. Cross-reference with alt text to verify equivalence
**Implementation:**
```bash
# Step 1: Extract all image URLs and alt text (tab-separated, one image per line)
console_command "Array.from(document.querySelectorAll('img')).map(img => img.src + '\t' + img.alt).join('\n')" > /tmp/images.tsv
# Step 2: Download each image
i=0
while IFS=$'\t' read -r src alt; do
  i=$((i + 1))
  curl -s -o "/tmp/img_$i.png" "$src"
  # Step 3: Run OCR (Tesseract writes /tmp/img_${i}_text.txt)
  tesseract "/tmp/img_$i.png" "/tmp/img_${i}_text"
  # Step 4: Check whether significant text was detected
  word_count=$(wc -w < "/tmp/img_${i}_text.txt")
  if [ "$word_count" -gt 5 ]; then
    echo "WARNING: Image contains text: $src"
    echo "Detected text: $(cat "/tmp/img_${i}_text.txt")"
    echo "Alt text: $alt"
    echo "MANUAL REVIEW REQUIRED: Verify if this should be HTML text instead"
  fi
done < /tmp/images.tsv
```
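Step 4 of the approach (cross-referencing detected text with the alt text) can also be automated as a rough filter before manual review. A hedged Node.js sketch using normalized word overlap; the file path and the 80% threshold are assumptions to tune:
```javascript
// Rough equivalence check between OCR output and an image's alt text.
const fs = require('fs');

// Lowercase, strip punctuation, split into words
const words = s => s.toLowerCase().replace(/[^a-z0-9\s]/g, ' ').split(/\s+/).filter(Boolean);

function altCoversOcrText(ocrFile, altText) {
  const ocrWords = words(fs.readFileSync(ocrFile, 'utf8'));
  if (ocrWords.length === 0) return true; // no text detected, nothing to cover
  const altSet = new Set(words(altText));
  const covered = ocrWords.filter(w => altSet.has(w)).length;
  return covered / ocrWords.length >= 0.8; // assumed threshold: 80% of OCR words in alt text
}

// Example usage with the files produced by the loop above
console.log(altCoversOcrText('/tmp/img_1_text.txt', 'Summer sale: 20% off all plans'));
```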
**Tools Required:**
- Tesseract OCR (open source, widely available)
- curl or wget for image download
- Basic shell scripting
**Accuracy:** ~80% - Will catch obvious text-in-images, may miss stylized text
**False Positives:** Logos, decorative text (acceptable - requires manual review anyway)
**Implementation Effort:** 8-12 hours
---
## CATEGORY 3: ANIMATION & FLASH DETECTION
### Current Limitation
**Problem:** WCAG 2.3.1 requires that nothing flash more than three times in any one-second period. Currently requires manual observation.
### Proposed Solution: Video Frame Analysis
**Approach:**
1. Record video of page for 10 seconds using Chrome DevTools Protocol
2. Extract frames using ffmpeg
3. Compare consecutive frames for brightness changes
4. Count flashes per second
5. Flag if >3 flashes/second detected
**Implementation:**
```bash
# Step 1: Start a screencast via the Chrome DevTools Protocol
# (Page.startScreencast is a CDP method issued from the client session, not from
#  page JavaScript; each Page.screencastFrame event delivers a base64 PNG frame
#  that must be acknowledged and saved to disk)
# Step 2: Record for 10 seconds, save frames, then assemble them into a video
# (frame path below is illustrative)
ffmpeg -framerate 30 -i /tmp/frames/frame_%04d.png -pix_fmt yuv420p /tmp/recording.mp4
# Step 3: Analyze frames with ffmpeg
ffmpeg -i /tmp/recording.mp4 -vf "select='gt(scene,0.3)',showinfo" -f null - 2>&1 | \
grep "Parsed_showinfo" | wc -l
# Step 4: Calculate flashes per second
# If scene changes > 30 in 10 seconds = 3+ per second = FAIL
# Step 5: For brightness-based flashing, print per-frame average luma (YAVG)
ffmpeg -i /tmp/recording.mp4 \
  -vf "signalstats,metadata=print:key=lavfi.signalstats.YAVG" \
  -f null - 2>&1 | \
  grep "lavfi.signalstats.YAVG=" | \
  awk -F= '{print $NF}' > /tmp/brightness.txt
# Analyze /tmp/brightness.txt for rapid changes (see the sketch after this block)
```
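The final step ("analyze /tmp/brightness.txt for rapid changes") can be scripted. A minimal Node.js sketch, assuming the frames were captured at roughly 30 fps and treating a per-frame average-luma swing of more than 25 (out of 255) as a flash transition; both numbers are assumptions to tune, and this is only a rough proxy for the full WCAG 2.3.1 general and red flash thresholds:
```javascript
// Count flash-like luma transitions per second from the per-frame YAVG values.
const fs = require('fs');

const FPS = 30;    // assumed capture rate
const SWING = 25;  // assumed luma delta (0-255) that counts as a flash transition

const yavg = fs.readFileSync('/tmp/brightness.txt', 'utf8')
  .trim().split('\n').map(Number);

let worstPerSecond = 0;
for (let start = 0; start + FPS <= yavg.length; start += FPS) {
  let transitions = 0;
  for (let i = start + 1; i < start + FPS; i++) {
    if (Math.abs(yavg[i] - yavg[i - 1]) > SWING) transitions++;
  }
  // Two transitions (dark -> bright -> dark) make one flash
  worstPerSecond = Math.max(worstPerSecond, Math.floor(transitions / 2));
}

console.log(worstPerSecond > 3
  ? `FAIL: up to ${worstPerSecond} flashes/second detected (WCAG 2.3.1)`
  : `PASS: max ${worstPerSecond} flashes/second`);
```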
**Tools Required:**
- ffmpeg (video processing)
- Chrome DevTools Protocol screencast API
- Python/shell script for analysis
**Accuracy:** ~90% - Will catch most flashing content
**Implementation Effort:** 16-24 hours (more complex)
---
## CATEGORY 4: HOVER/FOCUS CONTENT PERSISTENCE
### Current Limitation
**Problem:** WCAG 1.4.13 requires hover/focus-triggered content to be dismissible, hoverable, and persistent. Currently requires manual testing.
### Proposed Solution: Automated Interaction Testing
**Approach:**
1. Identify all elements with hover/focus event listeners
2. Programmatically trigger hover/focus
3. Measure how long content stays visible
4. Test if Esc key dismisses content
5. Test if mouse can move to triggered content
**Implementation:**
```javascript
// Step 1: Find elements targeted by :hover rules in the page's stylesheets
// (getComputedStyle(el, ':hover') does not expose hover-state styles, so scan CSS rules instead)
const hoverSelectors = [];
for (const sheet of document.styleSheets) {
  let rules;
  try { rules = sheet.cssRules; } catch (e) { continue; } // cross-origin stylesheets throw
  for (const rule of rules) {
    if (rule.selectorText && rule.selectorText.includes(':hover')) {
      hoverSelectors.push(rule.selectorText.split(':hover')[0].trim());
    }
  }
}
const elementsWithHover = [...new Set(hoverSelectors)].filter(Boolean).flatMap(sel => {
  try { return Array.from(document.querySelectorAll(sel)); } catch (e) { return []; }
});
// Step 2: Test each element
for (const el of elementsWithHover) {
  // Trigger hover
  el.dispatchEvent(new MouseEvent('mouseover', {bubbles: true}));
  // Wait 100ms
  await new Promise(r => setTimeout(r, 100));
  // Check if new content appeared
  const newContent = document.querySelector('[role="tooltip"], .tooltip, .popover');
  if (newContent) {
    // Test 1: Can we hover over the new content?
    const rect = newContent.getBoundingClientRect();
    const canHover = rect.width > 0 && rect.height > 0;
    // Test 2: Does it persist when the mouse briefly leaves the trigger?
    el.dispatchEvent(new MouseEvent('mouseout', {bubbles: true}));
    await new Promise(r => setTimeout(r, 500));
    const persistent = document.contains(newContent);
    // Test 3: Does Esc dismiss it? (checked after persistence so the content is still present)
    document.dispatchEvent(new KeyboardEvent('keydown', {key: 'Escape'}));
    await new Promise(r => setTimeout(r, 100));
    const dismissed = !document.contains(newContent);
    console.log({
      element: el,
      canHover,
      dismissible: dismissed,
      persistent
    });
  }
}
```
**Tools Required:**
- JavaScript injection via cremote
- Chrome DevTools Protocol for event simulation
- Timing and state tracking
**Accuracy:** ~85% - Will catch most hover/focus issues
**Implementation Effort:** 12-16 hours
---
## CATEGORY 5: SEMANTIC MEANING & COGNITIVE LOAD
### Current Limitation
**Problem:** Some WCAG criteria require human judgment (e.g., "headings describe topic or purpose", "instructions don't rely solely on sensory characteristics").
### Proposed Solution: LLM-Assisted Analysis
**Approach:**
1. Extract all headings, labels, and instructions
2. Use LLM (Claude, GPT-4) to analyze semantic meaning
3. Check for sensory-only instructions (e.g., "click the red button")
4. Verify heading descriptiveness
5. Flag potential issues for manual review
**Implementation:**
```javascript
// Step 1: Extract content for analysis
const analysisData = {
  headings: Array.from(document.querySelectorAll('h1,h2,h3,h4,h5,h6')).map(h => ({
    level: h.tagName,
    text: h.textContent.trim(),
    context: h.parentElement.textContent.substring(0, 200)
  })),
  instructions: Array.from(document.querySelectorAll('label, .instructions, [role="note"]')).map(el => ({
    text: el.textContent.trim(),
    context: el.parentElement.textContent.substring(0, 200)
  })),
  links: Array.from(document.querySelectorAll('a')).map(a => ({
    text: a.textContent.trim(),
    href: a.href,
    context: a.parentElement.textContent.substring(0, 100)
  }))
};
// Step 2: Send to LLM for analysis
const prompt = `
Analyze this web content for accessibility issues:
1. Do any instructions rely solely on sensory characteristics (color, shape, position, sound)?
   Examples: "click the red button", "the square icon", "button on the right"
2. Are headings descriptive of their section content?
   Flag generic headings like "More Information", "Click Here", "Welcome"
3. Are link texts descriptive of their destination?
   Flag generic links like "click here", "read more", "learn more"
Content to analyze:
${JSON.stringify(analysisData, null, 2)}
Return JSON with:
{
  "sensory_instructions": [{element, issue, suggestion}],
  "generic_headings": [{heading, issue, suggestion}],
  "unclear_links": [{link, issue, suggestion}]
}
`;
// Step 3: Parse LLM response and generate report
```
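Step 3 above is left as a comment. A hedged sketch of sending the prompt and parsing the reply, assuming the Anthropic Messages API; the model name, token limit, and environment variable are placeholders, and any LLM endpoint that returns the requested JSON would work:
```javascript
// Send the prompt to an LLM and parse the JSON verdict into a report structure.
async function analyzeSemantics(prompt) {
  const response = await fetch('https://api.anthropic.com/v1/messages', {
    method: 'POST',
    headers: {
      'x-api-key': process.env.ANTHROPIC_API_KEY, // placeholder: supply your own key
      'anthropic-version': '2023-06-01',
      'content-type': 'application/json'
    },
    body: JSON.stringify({
      model: 'claude-sonnet-4-20250514', // placeholder model name
      max_tokens: 2048,
      messages: [{ role: 'user', content: prompt }]
    })
  });
  const data = await response.json();
  const text = data.content[0].text;
  // The prompt asks for JSON; strip any surrounding prose before parsing
  const json = JSON.parse(text.slice(text.indexOf('{'), text.lastIndexOf('}') + 1));
  return {
    sensoryInstructions: json.sensory_instructions || [],
    genericHeadings: json.generic_headings || [],
    unclearLinks: json.unclear_links || []
  };
}
```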
**Tools Required:**
- LLM API access (Claude, GPT-4, or local model)
- JSON parsing
- Integration with cremote reporting
**Accuracy:** ~75% - LLM can catch obvious issues, but still requires human review
**Implementation Effort:** 16-24 hours
---
## CATEGORY 6: TIME-BASED MEDIA (VIDEO/AUDIO)
### Current Limitation
**Problem:** WCAG 1.2.x criteria require captions, audio descriptions, and transcripts. Currently requires manual review of media content.
### Proposed Solution: Automated Media Inventory & Validation
**Approach:**
1. Detect all video/audio elements
2. Check for caption tracks
3. Verify caption files are accessible
4. Use speech-to-text to verify caption accuracy (optional)
5. Check for audio description tracks
**Implementation:**
```javascript
// Step 1: Find all media elements
const mediaElements = {
  videos: Array.from(document.querySelectorAll('video')).map(v => ({
    src: v.src,
    tracks: Array.from(v.querySelectorAll('track')).map(t => ({
      kind: t.kind,
      src: t.src,
      srclang: t.srclang,
      label: t.label
    })),
    controls: v.hasAttribute('controls'),
    autoplay: v.hasAttribute('autoplay'),
    duration: v.duration
  })),
  audios: Array.from(document.querySelectorAll('audio')).map(a => ({
    src: a.src,
    controls: a.hasAttribute('controls'),
    autoplay: a.hasAttribute('autoplay'),
    duration: a.duration
  }))
};
// Step 2: Validate each video
for (const video of mediaElements.videos) {
  const issues = [];
  // Check for captions
  const captionTrack = video.tracks.find(t => t.kind === 'captions' || t.kind === 'subtitles');
  if (!captionTrack) {
    issues.push('FAIL: No caption track found (WCAG 1.2.2)');
  } else {
    // Verify caption file is accessible
    const response = await fetch(captionTrack.src);
    if (!response.ok) {
      issues.push(`FAIL: Caption file not accessible: ${captionTrack.src}`);
    }
  }
  // Check for audio description
  const descriptionTrack = video.tracks.find(t => t.kind === 'descriptions');
  if (!descriptionTrack) {
    issues.push('WARNING: No audio description track found (WCAG 1.2.5)');
  }
  // Check for transcript link
  const transcriptLink = document.querySelector('a[href*="transcript"]');
  if (!transcriptLink) {
    issues.push('WARNING: No transcript link found (WCAG 1.2.3)');
  }
  console.log({video: video.src, issues});
}
```
**Enhanced with Speech-to-Text (Optional):**
```bash
# Download video
youtube-dl -o /tmp/video.mp4 $video_url
# Extract audio
ffmpeg -i /tmp/video.mp4 -vn -acodec pcm_s16le -ar 16000 /tmp/audio.wav
# Run speech-to-text (using Whisper or similar); write the transcript to /tmp/audio.txt
whisper /tmp/audio.wav --model base --output_format txt --output_dir /tmp
# Compare with the caption file (see the sketch below for a less noisy comparison)
diff /tmp/audio.txt /tmp/captions.vtt
# Calculate accuracy percentage
```
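The raw `diff` above will mostly report timestamp noise, since a `.vtt` file carries cue timings and settings alongside the text. A minimal Node.js sketch that strips the WebVTT structure and estimates caption coverage by word overlap (paths and the 80% threshold are assumptions):
```javascript
// Compare the Whisper transcript with the caption file as bags of words.
const fs = require('fs');

const words = s => s.toLowerCase().replace(/[^a-z0-9'\s]/g, ' ').split(/\s+/).filter(Boolean);

// Keep only cue text: drop the WEBVTT header, NOTE blocks, cue numbers, and timestamp lines
const vttText = fs.readFileSync('/tmp/captions.vtt', 'utf8')
  .split('\n')
  .filter(line => {
    const t = line.trim();
    return t && !t.startsWith('WEBVTT') && !t.startsWith('NOTE') &&
      !t.includes('-->') && !/^\d+$/.test(t);
  })
  .join(' ');

const spoken = words(fs.readFileSync('/tmp/audio.txt', 'utf8'));
const captioned = new Set(words(vttText));

const matched = spoken.filter(w => captioned.has(w)).length;
const coverage = spoken.length ? matched / spoken.length : 1;

console.log(`Caption coverage of spoken words: ${(coverage * 100).toFixed(1)}%`);
if (coverage < 0.8) {
  console.log('WARNING: captions may be missing significant spoken content (WCAG 1.2.2)');
}
```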
**Tools Required:**
- JavaScript for media detection
- fetch API for caption file validation
- Optional: Whisper (OpenAI) or similar for speech-to-text
- ffmpeg for audio extraction
**Accuracy:**
- Media detection: ~100%
- Caption presence: ~100%
- Caption accuracy (with STT): ~70-80%
**Implementation Effort:**
- Basic validation: 8-12 hours
- With speech-to-text: 24-32 hours
---
## CATEGORY 7: MULTI-PAGE CONSISTENCY
### Current Limitation
**Problem:** WCAG 3.2.3 (Consistent Navigation) and 3.2.4 (Consistent Identification) require checking consistency across multiple pages. Currently requires manual comparison.
### Proposed Solution: Automated Cross-Page Analysis
**Approach:**
1. Crawl all pages on site
2. Extract navigation structure from each page
3. Compare navigation order across pages
4. Extract common elements (search, login, cart, etc.)
5. Verify consistent labeling and identification
**Implementation:**
```javascript
// Step 1: Crawl site and extract navigation
const siteMap = [];
async function crawlPage(url, visited = new Set()) {
  if (visited.has(url)) return;
  visited.add(url);
  await navigateTo(url); // cremote navigation helper (pseudocode)
  const pageData = {
    url,
    navigation: Array.from(document.querySelectorAll('nav a, header a')).map(a => ({
      text: a.textContent.trim(),
      href: a.href,
      order: Array.from(a.parentElement.children).indexOf(a)
    })),
    commonElements: {
      search: document.querySelector('[type="search"], [role="search"]')?.outerHTML,
      // :contains() is not valid CSS, so match login buttons by text instead
      login: (document.querySelector('a[href*="login"]') ||
              Array.from(document.querySelectorAll('button'))
                .find(b => /log\s*in/i.test(b.textContent)))?.outerHTML,
      cart: document.querySelector('a[href*="cart"], .cart')?.outerHTML
    }
  };
  siteMap.push(pageData);
  // Find more pages to crawl
  const links = Array.from(document.querySelectorAll('a[href]'))
    .map(a => a.href)
    .filter(href => href.startsWith(window.location.origin));
  for (const link of links.slice(0, 50)) { // Limit the number of links followed per page
    await crawlPage(link, visited);
  }
}
// Step 2: Analyze consistency
function analyzeConsistency(siteMap) {
  const issues = [];
  // Check navigation order consistency
  const navOrders = siteMap.map(page =>
    page.navigation.map(n => n.text).join('|')
  );
  const uniqueOrders = [...new Set(navOrders)];
  if (uniqueOrders.length > 1) {
    issues.push({
      criterion: 'WCAG 3.2.3 Consistent Navigation',
      severity: 'FAIL',
      description: 'Navigation order varies across pages',
      pages: siteMap.filter((p, i) => navOrders[i] !== navOrders[0]).map(p => p.url)
    });
  }
  // Check common element consistency
  const searchElements = siteMap.map(p => p.commonElements.search).filter(Boolean);
  if (new Set(searchElements).size > 1) {
    issues.push({
      criterion: 'WCAG 3.2.4 Consistent Identification',
      severity: 'FAIL',
      description: 'Search functionality identified inconsistently across pages'
    });
  }
  return issues;
}
```
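One caveat on the order comparison above: joining link texts and comparing exact strings will also flag pages that legitimately add or drop a single menu item. Since WCAG 3.2.3 is about relative order, a pairwise-order check against a reference page is a less noisy alternative; a sketch that assumes the `siteMap` structure built above:
```javascript
// Flag a page only if two nav items present on both pages appear in a different relative order.
function hasConsistentRelativeOrder(referenceNav, pageNav) {
  const refIndex = new Map(referenceNav.map((n, i) => [n.text, i]));
  const shared = pageNav.filter(n => refIndex.has(n.text));
  for (let i = 1; i < shared.length; i++) {
    if (refIndex.get(shared[i].text) < refIndex.get(shared[i - 1].text)) return false;
  }
  return true;
}

// Use the first crawled page as the reference ordering
const reference = siteMap[0].navigation;
const inconsistentPages = siteMap
  .slice(1)
  .filter(page => !hasConsistentRelativeOrder(reference, page.navigation))
  .map(page => page.url);
```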
**Tools Required:**
- Web crawler (can use existing cremote navigation)
- DOM extraction and comparison
- Pattern matching algorithms
**Accuracy:** ~90% - Will catch most consistency issues
**Implementation Effort:** 16-24 hours
---
## IMPLEMENTATION PRIORITY
### Phase 1: High Impact, Low Effort (Weeks 1-2)
1. **Gradient Contrast Analysis** (ImageMagick) - 8-16 hours
2. **Hover/Focus Content Testing** (JavaScript) - 12-16 hours
3. **Media Inventory & Validation** (Basic) - 8-12 hours
**Total Phase 1:** 28-44 hours
### Phase 2: Medium Impact, Medium Effort (Weeks 3-4)
4. **Text-in-Images Detection** (OCR) - 8-12 hours
5. **Cross-Page Consistency** (Crawler) - 16-24 hours
6. **LLM-Assisted Semantic Analysis** - 16-24 hours
**Total Phase 2:** 40-60 hours
### Phase 3: Lower Priority, Higher Effort (Weeks 5-6)
7. **Animation/Flash Detection** (Video analysis) - 16-24 hours
8. **Speech-to-Text Caption Validation** - 24-32 hours
**Total Phase 3:** 40-56 hours
**Grand Total:** 108-160 hours (13-20 business days)
---
## EXPECTED OUTCOMES
### Current State:
- **Automated Coverage:** ~70% of WCAG 2.1 AA criteria
- **Manual Review Required:** ~30%
### After Phase 1:
- **Automated Coverage:** ~78%
- **Manual Review Required:** ~22%
### After Phase 2:
- **Automated Coverage:** ~85%
- **Manual Review Required:** ~15%
### After Phase 3:
- **Automated Coverage:** ~90%
- **Manual Review Required:** ~10%
### Remaining Manual Tests (~10%):
- Cognitive load assessment
- Content quality and readability
- User experience with assistive technologies
- Real-world usability testing
- Complex user interactions requiring human judgment
---
## TECHNICAL REQUIREMENTS
### Software Dependencies:
- **ImageMagick** - Image analysis (usually pre-installed)
- **Tesseract OCR** - Text detection in images
- **ffmpeg** - Video/audio processing
- **Whisper** (optional) - Speech-to-text for caption validation
- **LLM API** (optional) - Semantic analysis
### Installation:
```bash
# Ubuntu/Debian
apt-get install imagemagick tesseract-ocr ffmpeg
# For Whisper (Python)
pip install openai-whisper
# For LLM integration
# Use existing API keys for Claude/GPT-4
```
### Container Considerations:
- All tools should be installed in cremote container
- File paths must account for container filesystem
- Use file_download_cremotemcp for retrieving analysis results
---
## CONCLUSION
By implementing these creative automated solutions, we can increase our accessibility testing coverage from **70% to 90%**, significantly reducing manual review burden while maintaining high accuracy.
**Key Principles:**
- ✅ Use existing, proven tools (ImageMagick, Tesseract, ffmpeg)
- ✅ Keep solutions simple and maintainable (KISS philosophy)
- ✅ Prioritize high-impact, low-effort improvements first
- ✅ Accept that some tests will always require human judgment
- ✅ Focus on catching obvious violations automatically
**Next Steps:**
1. Review and approve proposed solutions
2. Prioritize implementation based on business needs
3. Start with Phase 1 (high impact, low effort)
4. Iterate and refine based on real-world testing
5. Document all new automated tests in enhanced_chromium_ada_checklist.md
---
**Document Prepared By:** Cremote Development Team
**Date:** October 2, 2025
**Status:** PROPOSAL - Awaiting Approval