AUTOMATED TESTING ENHANCEMENTS FOR CREMOTE ADA SUITE

Date: October 2, 2025
Purpose: Propose creative solutions to automate currently manual accessibility tests
Philosophy: KISS - Keep it Simple, Stupid. Practical solutions using existing tools.


EXECUTIVE SUMMARY

Currently, our cremote MCP suite automates ~70% of WCAG 2.1 AA testing. This document proposes practical solutions to increase automation coverage to ~85-90% by leveraging:

  1. ImageMagick for gradient contrast analysis
  2. Screenshot-based analysis for visual testing
  3. OCR tools for text-in-images detection
  4. Video frame analysis for animation/flash testing
  5. Enhanced JavaScript injection for deeper DOM analysis

CATEGORY 1: GRADIENT & COMPLEX BACKGROUND CONTRAST

Current Limitation

Problem: Axe-core reports "incomplete" for text on gradient backgrounds because it cannot calculate contrast ratios for non-solid colors.

Example from our assessment:

  • Navigation menu links (background color could not be determined due to overlap)
  • Gradient backgrounds on hero section (contrast cannot be automatically calculated)

Proposed Solution: ImageMagick Gradient Analysis

Approach:

  1. Take a screenshot of the specific element using web_screenshot_element_cremotemcp
  2. Use ImageMagick to analyze color distribution
  3. Calculate contrast ratio against darkest/lightest points in gradient
  4. Report worst-case contrast ratio

Implementation:

# Step 1: Take element screenshot
web_screenshot_element_cremotemcp(selector=".hero-section", output="/tmp/hero.png")

# Step 2: Extract text color from computed styles
text_color=$(console_command "getComputedStyle(document.querySelector('.hero-section h1')).color")

# Step 3: Find darkest and lightest colors in background
convert /tmp/hero.png -format "%[fx:minima]" info: > darkest.txt
convert /tmp/hero.png -format "%[fx:maxima]" info: > lightest.txt

# Step 4: Calculate contrast ratios
# Compare text color against both extremes
# Report the worst-case scenario

# Step 5: Sample multiple points across gradient
convert /tmp/hero.png -resize 10x10! -depth 8 txt:- | grep -v "^#" | awk '{print $3}'
# This gives us 100 sample points (hex colors) across the gradient
# ("^#" filters only the header line; every pixel line contains a hex value)
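
Step 4 is left as comments above; the ratio itself is the standard WCAG relative-luminance formula. A minimal JavaScript sketch, assuming the text color and a sampled background color have already been parsed into [r, g, b] triplets (the example values are placeholders):

// Sketch: WCAG contrast ratio between two RGB colors, e.g. the computed text
// color and the darkest/lightest gradient sample (assumed 0-255 triplets)
function luminance([r, g, b]) {
  const [rs, gs, bs] = [r, g, b].map(c => {
    c /= 255;
    return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
  });
  return 0.2126 * rs + 0.7152 * gs + 0.0722 * bs;
}

function contrastRatio(fg, bg) {
  const [lighter, darker] = [luminance(fg), luminance(bg)].sort((a, b) => b - a);
  return (lighter + 0.05) / (darker + 0.05);
}

// Example: white text against the darkest sampled gradient point (placeholder values)
console.log(contrastRatio([255, 255, 255], [90, 60, 140]) >= 4.5 ? 'PASS' : 'FAIL');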

Tools Required:

  • ImageMagick (already available in most containers)
  • Basic shell scripting
  • Color contrast calculation library (can use existing cremote contrast checker)

Accuracy: ~95% - Will catch most gradient contrast issues

Implementation Effort: 8-16 hours


CATEGORY 2: TEXT IN IMAGES DETECTION

Current Limitation

Problem: WCAG 1.4.5 requires text to be actual text, not images of text (except logos). Currently requires manual visual inspection.

Proposed Solution: OCR-Based Text Detection

Approach:

  1. Screenshot all images on page
  2. Run OCR (Tesseract) on each image
  3. If text detected, flag for manual review
  4. Cross-reference with alt text to verify equivalence

Implementation:

# Step 1: Extract all image URLs and alt text as JSON
images_json=$(console_command "JSON.stringify(Array.from(document.querySelectorAll('img')).map(img => ({src: img.src, alt: img.alt})))")

# Step 2: Download each image (jq splits the JSON list into one object per line)
i=0
echo "$images_json" | jq -c '.[]' | while read -r img; do
  i=$((i + 1))
  src=$(echo "$img" | jq -r '.src')
  alt=$(echo "$img" | jq -r '.alt')
  curl -s -o "/tmp/img_$i.png" "$src"

  # Step 3: Run OCR
  tesseract "/tmp/img_$i.png" "/tmp/img_${i}_text"

  # Step 4: Check if significant text detected
  word_count=$(wc -w < "/tmp/img_${i}_text.txt")

  if [ "$word_count" -gt 5 ]; then
    echo "WARNING: Image contains text: $src"
    echo "Detected text: $(cat /tmp/img_${i}_text.txt)"
    echo "Alt text: $alt"
    echo "MANUAL REVIEW REQUIRED: Verify if this should be HTML text instead"
  fi
done
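
Step 4 of the approach also calls for cross-referencing the detected text with the image's alt attribute. A rough sketch of that comparison (hypothetical helper; simple word overlap rather than true equivalence checking), which could run in the page context or in Node:

// Hypothetical check: does the alt text cover the words Tesseract found in the image?
function altCoversOcrText(ocrText, altText) {
  const words = s => s.toLowerCase().match(/[a-z0-9]+/g) || [];
  const altWords = new Set(words(altText));
  const ocrWords = words(ocrText);
  if (ocrWords.length === 0) return true;                 // no text detected in the image
  const covered = ocrWords.filter(w => altWords.has(w)).length;
  return covered / ocrWords.length >= 0.6;                // assumed threshold for "equivalent"
}

// Example: flags for manual review because the alt text omits most of the rendered text
console.log(altCoversOcrText('Summer Sale 50% off all items', 'Summer sale banner')); // false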

Tools Required:

  • Tesseract OCR (open source, widely available)
  • curl or wget for image download
  • jq for parsing the extracted image list
  • Basic shell scripting

Accuracy: ~80% - Will catch obvious text-in-images, may miss stylized text

False Positives: Logos, decorative text (acceptable - requires manual review anyway)

Implementation Effort: 8-12 hours


CATEGORY 3: ANIMATION & FLASH DETECTION

Current Limitation

Problem: WCAG 2.3.1 requires no content flashing more than 3 times per second. Currently requires manual observation.

Proposed Solution: Video Frame Analysis

Approach:

  1. Record video of page for 10 seconds using Chrome DevTools Protocol
  2. Extract frames using ffmpeg
  3. Compare consecutive frames for brightness changes
  4. Count flashes per second
  5. Flag if >3 flashes/second detected

Implementation:

# Step 1: Start video recording via CDP
# Page.startScreencast is a DevTools Protocol command; it is sent from the CDP
# client side (cremote), not from in-page JavaScript:
#   Page.startScreencast {format: "png", quality: 80, maxWidth: 1280, maxHeight: 800}
#   then collect Page.screencastFrame events, acknowledging each with Page.screencastFrameAck

# Step 2: Record for 10 seconds, save the frames, and assemble them into a video
#   ffmpeg -framerate 10 -i /tmp/frame_%04d.png /tmp/recording.mp4

# Step 3: Analyze frames with ffmpeg
ffmpeg -i /tmp/recording.mp4 -vf "select='gt(scene,0.3)',showinfo" -f null - 2>&1 | \
  grep "Parsed_showinfo" | wc -l

# Step 4: Calculate flashes per second
# If scene changes > 30 in 10 seconds = 3+ per second = FAIL

# Step 5: For brightness-based flashing
ffmpeg -i /tmp/recording.mp4 -vf "signalstats,metadata=print" -f null - 2>&1 | \
  grep "lavfi.signalstats.YAVG" | \
  awk -F'=' '{print $NF}' > brightness.txt
# (metadata=print emits the per-frame YAVG values; each value follows an "=")

# Analyze brightness.txt for rapid changes
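
A sketch of that analysis, here in Node.js (the tools list below suggests Python or shell; the logic ports directly). It assumes brightness.txt holds one YAVG value per frame and that the screencast was assembled at a known frame rate; both the frame rate and the brightness-swing threshold are tuning assumptions, not WCAG constants:

// Post-process /tmp/brightness.txt (one YAVG value per line, 0-255 scale)
const fs = require('fs');

const FPS = 10;     // assumed screencast frame rate
const SWING = 25;   // assumed per-frame brightness delta that counts as a transition

const values = fs.readFileSync('/tmp/brightness.txt', 'utf8')
  .split('\n').filter(Boolean).map(Number);

// Count large brightness transitions in each one-second window;
// two transitions (light -> dark -> light) make one flash
let worstPerSecond = 0;
for (let start = 0; start + FPS <= values.length; start += FPS) {
  let transitions = 0;
  for (let i = start + 1; i < start + FPS; i++) {
    if (Math.abs(values[i] - values[i - 1]) > SWING) transitions++;
  }
  worstPerSecond = Math.max(worstPerSecond, transitions / 2);
}

console.log(worstPerSecond > 3 ? 'FAIL: more than 3 flashes per second' : 'PASS');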

Tools Required:

  • ffmpeg (video processing)
  • Chrome DevTools Protocol screencast API
  • Python/shell script for analysis

Accuracy: ~90% - Will catch most flashing content

Implementation Effort: 16-24 hours (more complex)


CATEGORY 4: HOVER/FOCUS CONTENT PERSISTENCE

Current Limitation

Problem: WCAG 1.4.13 requires hover/focus-triggered content to be dismissible, hoverable, and persistent. Currently requires manual testing.

Proposed Solution: Automated Interaction Testing

Approach:

  1. Identify all elements with hover/focus event listeners
  2. Programmatically trigger hover/focus
  3. Measure how long content stays visible
  4. Test if Esc key dismisses content
  5. Test if mouse can move to triggered content

Implementation:

// Step 1: Find elements targeted by :hover/:focus rules.
// Note: getComputedStyle(el, ':hover') does not apply pseudo-class styles, so we
// scan the CSSOM for :hover/:focus selectors and resolve them to elements instead.
const hoverSelectors = [];
for (const sheet of document.styleSheets) {
  let rules;
  try { rules = sheet.cssRules; } catch (e) { continue; } // skip cross-origin stylesheets
  for (const rule of rules) {
    if (rule.selectorText && /:hover|:focus/.test(rule.selectorText)) {
      hoverSelectors.push(rule.selectorText.replace(/:hover|:focus/g, ''));
    }
  }
}
const elementsWithHover = [...new Set(hoverSelectors.flatMap(sel => {
  try { return Array.from(document.querySelectorAll(sel)); } catch (e) { return []; }
}))];

// Step 2: Test each element
for (const el of elementsWithHover) {
  // Trigger hover
  el.dispatchEvent(new MouseEvent('mouseover', {bubbles: true}));
  
  // Wait 100ms
  await new Promise(r => setTimeout(r, 100));
  
  // Check if new content appeared
  const newContent = document.querySelector('[role="tooltip"], .tooltip, .popover');
  
  if (newContent) {
    // Test 1: Does the triggered content occupy a hoverable area?
    const rect = newContent.getBoundingClientRect();
    const canHover = rect.width > 0 && rect.height > 0;
    
    // Test 2: Does it persist when the pointer briefly leaves the trigger?
    // (run before the dismissal test, since dismissing removes the content)
    el.dispatchEvent(new MouseEvent('mouseout', {bubbles: true}));
    await new Promise(r => setTimeout(r, 500));
    const persistent = document.contains(newContent);
    
    // Test 3: Does Esc dismiss it?
    // (synthetic key events only exercise JS handlers, not browser defaults)
    el.dispatchEvent(new MouseEvent('mouseover', {bubbles: true}));
    await new Promise(r => setTimeout(r, 100));
    document.dispatchEvent(new KeyboardEvent('keydown', {key: 'Escape'}));
    await new Promise(r => setTimeout(r, 100));
    const dismissed = !document.contains(newContent);
    
    console.log({
      element: el,
      canHover,
      dismissible: dismissed,
      persistent
    });
  }
}

Tools Required:

  • JavaScript injection via cremote
  • Chrome DevTools Protocol for event simulation
  • Timing and state tracking

Accuracy: ~85% - Will catch most hover/focus issues

Implementation Effort: 12-16 hours


CATEGORY 5: SEMANTIC MEANING & COGNITIVE LOAD

Current Limitation

Problem: Some WCAG criteria require human judgment (e.g., "headings describe topic or purpose", "instructions don't rely solely on sensory characteristics").

Proposed Solution: LLM-Assisted Analysis

Approach:

  1. Extract all headings, labels, and instructions
  2. Use LLM (Claude, GPT-4) to analyze semantic meaning
  3. Check for sensory-only instructions (e.g., "click the red button")
  4. Verify heading descriptiveness
  5. Flag potential issues for manual review

Implementation:

// Step 1: Extract content for analysis
const analysisData = {
  headings: Array.from(document.querySelectorAll('h1,h2,h3,h4,h5,h6')).map(h => ({
    level: h.tagName,
    text: h.textContent.trim(),
    context: h.parentElement.textContent.substring(0, 200)
  })),
  
  instructions: Array.from(document.querySelectorAll('label, .instructions, [role="note"]')).map(el => ({
    text: el.textContent.trim(),
    context: el.parentElement.textContent.substring(0, 200)
  })),
  
  links: Array.from(document.querySelectorAll('a')).map(a => ({
    text: a.textContent.trim(),
    href: a.href,
    context: a.parentElement.textContent.substring(0, 100)
  }))
};

// Step 2: Send to LLM for analysis
const prompt = `
Analyze this web content for accessibility issues:

1. Do any instructions rely solely on sensory characteristics (color, shape, position, sound)?
   Examples: "click the red button", "the square icon", "button on the right"
   
2. Are headings descriptive of their section content?
   Flag generic headings like "More Information", "Click Here", "Welcome"
   
3. Are link texts descriptive of their destination?
   Flag generic links like "click here", "read more", "learn more"

Content to analyze:
${JSON.stringify(analysisData, null, 2)}

Return JSON with:
{
  "sensory_instructions": [{element, issue, suggestion}],
  "generic_headings": [{heading, issue, suggestion}],
  "unclear_links": [{link, issue, suggestion}]
}
`;

// Step 3: Parse LLM response and generate report
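
A minimal sketch of Step 3, assuming the LLM call sits behind a hypothetical callLLM() wrapper and that the model returns the JSON shape requested in the prompt (models sometimes wrap JSON in prose, hence the defensive extraction):

// callLLM() is a hypothetical wrapper around whichever LLM API is chosen
async function analyzeSemantics(prompt) {
  const raw = await callLLM(prompt);

  // Extract the first JSON object in case the model adds explanatory text around it
  const match = raw.match(/\{[\s\S]*\}/);
  if (!match) throw new Error('LLM response contained no JSON');
  const result = JSON.parse(match[0]);

  // Flatten into report entries; every finding is marked for human review
  return [
    ...(result.sensory_instructions || []).map(i => ({criterion: 'WCAG 1.3.3', severity: 'REVIEW', ...i})),
    ...(result.generic_headings || []).map(i => ({criterion: 'WCAG 2.4.6', severity: 'REVIEW', ...i})),
    ...(result.unclear_links || []).map(i => ({criterion: 'WCAG 2.4.4', severity: 'REVIEW', ...i}))
  ];
}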

Tools Required:

  • LLM API access (Claude, GPT-4, or local model)
  • JSON parsing
  • Integration with cremote reporting

Accuracy: ~75% - LLM can catch obvious issues, but still requires human review

Implementation Effort: 16-24 hours


CATEGORY 6: TIME-BASED MEDIA (VIDEO/AUDIO)

Current Limitation

Problem: WCAG 1.2.x criteria require captions, audio descriptions, and transcripts. Currently requires manual review of media content.

Proposed Solution: Automated Media Inventory & Validation

Approach:

  1. Detect all video/audio elements
  2. Check for caption tracks
  3. Verify caption files are accessible
  4. Use speech-to-text to verify caption accuracy (optional)
  5. Check for audio description tracks

Implementation:

// Step 1: Find all media elements
const mediaElements = {
  videos: Array.from(document.querySelectorAll('video')).map(v => ({
    src: v.src,
    tracks: Array.from(v.querySelectorAll('track')).map(t => ({
      kind: t.kind,
      src: t.src,
      srclang: t.srclang,
      label: t.label
    })),
    controls: v.hasAttribute('controls'),
    autoplay: v.hasAttribute('autoplay'),
    duration: v.duration
  })),
  
  audios: Array.from(document.querySelectorAll('audio')).map(a => ({
    src: a.src,
    controls: a.hasAttribute('controls'),
    autoplay: a.hasAttribute('autoplay'),
    duration: a.duration
  }))
};

// Step 2: Validate each video
for (const video of mediaElements.videos) {
  const issues = [];
  
  // Check for captions
  const captionTrack = video.tracks.find(t => t.kind === 'captions' || t.kind === 'subtitles');
  if (!captionTrack) {
    issues.push('FAIL: No caption track found (WCAG 1.2.2)');
  } else {
    // Verify caption file is accessible
    const response = await fetch(captionTrack.src);
    if (!response.ok) {
      issues.push(`FAIL: Caption file not accessible: ${captionTrack.src}`);
    }
  }
  
  // Check for audio description
  const descriptionTrack = video.tracks.find(t => t.kind === 'descriptions');
  if (!descriptionTrack) {
    issues.push('WARNING: No audio description track found (WCAG 1.2.5)');
  }
  
  // Check for transcript link
  const transcriptLink = document.querySelector(`a[href*="transcript"]`);
  if (!transcriptLink) {
    issues.push('WARNING: No transcript link found (WCAG 1.2.3)');
  }
  
  console.log({video: video.src, issues});
}

Enhanced with Speech-to-Text (Optional):

# Download video
youtube-dl -o /tmp/video.mp4 $video_url

# Extract audio
ffmpeg -i /tmp/video.mp4 -vn -acodec pcm_s16le -ar 16000 /tmp/audio.wav

# Run speech-to-text (using Whisper or similar)
whisper /tmp/audio.wav --model base --output_format txt --output_dir /tmp

# Compare with caption file
diff /tmp/audio.txt /tmp/captions.vtt

# Calculate accuracy percentage
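
A rough sketch of the accuracy step, in Node.js for consistency with the other examples. It assumes Whisper's output landed at /tmp/audio.txt and treats accuracy as simple word coverage rather than a true word-error-rate calculation:

// Compare the Whisper transcript with the caption file's text content
const fs = require('fs');

const words = s => s.toLowerCase().match(/[a-z0-9']+/g) || [];

const spoken = words(fs.readFileSync('/tmp/audio.txt', 'utf8'));

// Strip the WEBVTT header, timestamps, cue numbers, and blank lines before comparing
const captionText = fs.readFileSync('/tmp/captions.vtt', 'utf8')
  .split('\n')
  .filter(line => !/^WEBVTT|-->|^\d+$|^\s*$/.test(line))
  .join(' ');
const captioned = new Set(words(captionText));

const covered = spoken.filter(w => captioned.has(w)).length;
console.log(`Caption coverage: ${(100 * covered / spoken.length).toFixed(1)}%`);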

Tools Required:

  • JavaScript for media detection
  • fetch API for caption file validation
  • Optional: Whisper (OpenAI) or similar for speech-to-text
  • ffmpeg for audio extraction

Accuracy:

  • Media detection: ~100%
  • Caption presence: ~100%
  • Caption accuracy (with STT): ~70-80%

Implementation Effort:

  • Basic validation: 8-12 hours
  • With speech-to-text: 24-32 hours


CATEGORY 7: MULTI-PAGE CONSISTENCY

Current Limitation

Problem: WCAG 3.2.3 (Consistent Navigation) and 3.2.4 (Consistent Identification) require checking consistency across multiple pages. Currently requires manual comparison.

Proposed Solution: Automated Cross-Page Analysis

Approach:

  1. Crawl all pages on site
  2. Extract navigation structure from each page
  3. Compare navigation order across pages
  4. Extract common elements (search, login, cart, etc.)
  5. Verify consistent labeling and identification

Implementation:

// Step 1: Crawl site and extract navigation
const siteMap = [];

async function crawlPage(url, visited = new Set()) {
  if (visited.has(url)) return;
  visited.add(url);
  
  await navigateTo(url);
  
  const pageData = {
    url,
    navigation: Array.from(document.querySelectorAll('nav a, header a')).map(a => ({
      text: a.textContent.trim(),
      href: a.href,
      order: Array.from(a.parentElement.children).indexOf(a)
    })),
    commonElements: {
      search: document.querySelector('[type="search"], [role="search"]')?.outerHTML,
      // ':contains()' is jQuery-only, so match login buttons by their visible text
      login: (document.querySelector('a[href*="login"]') ||
              Array.from(document.querySelectorAll('button')).find(b => /login/i.test(b.textContent)))?.outerHTML,
      cart: document.querySelector('a[href*="cart"], .cart')?.outerHTML
    }
  };
  
  siteMap.push(pageData);
  
  // Find more pages to crawl
  const links = Array.from(document.querySelectorAll('a[href]'))
    .map(a => a.href)
    .filter(href => href.startsWith(window.location.origin));
  
  for (const link of links.slice(0, 50)) { // Limit links followed per page
    await crawlPage(link, visited);
  }
}

// Step 2: Analyze consistency
function analyzeConsistency(siteMap) {
  const issues = [];
  
  // Check navigation order consistency
  const navOrders = siteMap.map(page => 
    page.navigation.map(n => n.text).join('|')
  );
  
  const uniqueOrders = [...new Set(navOrders)];
  if (uniqueOrders.length > 1) {
    issues.push({
      criterion: 'WCAG 3.2.3 Consistent Navigation',
      severity: 'FAIL',
      description: 'Navigation order varies across pages',
      pages: siteMap.filter((p, i) => navOrders[i] !== navOrders[0]).map(p => p.url)
    });
  }
  
  // Check common element consistency
  const searchElements = siteMap.map(p => p.commonElements.search).filter(Boolean);
  if (new Set(searchElements).size > 1) {
    issues.push({
      criterion: 'WCAG 3.2.4 Consistent Identification',
      severity: 'FAIL',
      description: 'Search functionality identified inconsistently across pages'
    });
  }
  
  return issues;
}
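
Putting the two pieces together might look like the following (the entry URL is a placeholder; crawlPage and analyzeConsistency are the sketches above):

// Crawl from the site root, then report any consistency failures
await crawlPage(window.location.origin + '/');
const consistencyIssues = analyzeConsistency(siteMap);
console.log(JSON.stringify(consistencyIssues, null, 2));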

Tools Required:

  • Web crawler (can use existing cremote navigation)
  • DOM extraction and comparison
  • Pattern matching algorithms

Accuracy: ~90% - Will catch most consistency issues

Implementation Effort: 16-24 hours


IMPLEMENTATION PRIORITY

Phase 1: High Impact, Low Effort (Weeks 1-2)

  1. Gradient Contrast Analysis (ImageMagick) - 8-16 hours
  2. Hover/Focus Content Testing (JavaScript) - 12-16 hours
  3. Media Inventory & Validation (Basic) - 8-12 hours

Total Phase 1: 28-44 hours

Phase 2: Medium Impact, Medium Effort (Weeks 3-4)

  1. Text-in-Images Detection (OCR) - 8-12 hours
  2. Cross-Page Consistency (Crawler) - 16-24 hours
  3. LLM-Assisted Semantic Analysis - 16-24 hours

Total Phase 2: 40-60 hours

Phase 3: Lower Priority, Higher Effort (Weeks 5-6)

  1. Animation/Flash Detection (Video analysis) - 16-24 hours
  2. Speech-to-Text Caption Validation - 24-32 hours

Total Phase 3: 40-56 hours

Grand Total: 108-160 hours (13-20 business days)


EXPECTED OUTCOMES

Current State:

  • Automated Coverage: ~70% of WCAG 2.1 AA criteria
  • Manual Review Required: ~30%

After Phase 1:

  • Automated Coverage: ~78%
  • Manual Review Required: ~22%

After Phase 2:

  • Automated Coverage: ~85%
  • Manual Review Required: ~15%

After Phase 3:

  • Automated Coverage: ~90%
  • Manual Review Required: ~10%

Remaining Manual Tests (~10%):

  • Cognitive load assessment
  • Content quality and readability
  • User experience with assistive technologies
  • Real-world usability testing
  • Complex user interactions requiring human judgment

TECHNICAL REQUIREMENTS

Software Dependencies:

  • ImageMagick - Image analysis (usually pre-installed)
  • Tesseract OCR - Text detection in images
  • ffmpeg - Video/audio processing
  • Whisper (optional) - Speech-to-text for caption validation
  • LLM API (optional) - Semantic analysis

Installation:

# Ubuntu/Debian
apt-get install imagemagick tesseract-ocr ffmpeg

# For Whisper (Python)
pip install openai-whisper

# For LLM integration
# Use existing API keys for Claude/GPT-4

Container Considerations:

  • All tools should be installed in cremote container
  • File paths must account for container filesystem
  • Use file_download_cremotemcp for retrieving analysis results

CONCLUSION

By implementing these creative automated solutions, we can increase our accessibility testing coverage from 70% to 90%, significantly reducing manual review burden while maintaining high accuracy.

Key Principles:

  • Use existing, proven tools (ImageMagick, Tesseract, ffmpeg)
  • Keep solutions simple and maintainable (KISS philosophy)
  • Prioritize high-impact, low-effort improvements first
  • Accept that some tests will always require human judgment
  • Focus on catching obvious violations automatically

Next Steps:

  1. Review and approve proposed solutions
  2. Prioritize implementation based on business needs
  3. Start with Phase 1 (high impact, low effort)
  4. Iterate and refine based on real-world testing
  5. Document all new automated tests in enhanced_chromium_ada_checklist.md

Document Prepared By: Cremote Development Team
Date: October 2, 2025
Status: PROPOSAL - Awaiting Approval