AUTOMATED TESTING ENHANCEMENTS FOR CREMOTE ADA SUITE

Date: October 2, 2025
Purpose: Propose creative solutions to automate currently manual accessibility tests
Philosophy: KISS - Keep it Simple, Stupid. Practical solutions using existing tools.


EXECUTIVE SUMMARY

Currently, our cremote MCP suite automates ~70% of WCAG 2.1 AA testing. This document proposes practical solutions to increase automation coverage to ~85-90% by leveraging:

  1. ImageMagick for gradient contrast analysis
  2. Screenshot-based analysis for visual testing
  3. OCR tools for text-in-images detection
  4. Video frame analysis for animation/flash testing
  5. Enhanced JavaScript injection for deeper DOM analysis

CATEGORY 1: GRADIENT & COMPLEX BACKGROUND CONTRAST

Current Limitation

Problem: Axe-core reports "incomplete" for text on gradient backgrounds because it cannot calculate contrast ratios for non-solid colors.

Example from our assessment:

  • Navigation menu links (background color could not be determined due to overlap)
  • Gradient backgrounds on hero section (contrast cannot be automatically calculated)

Proposed Solution: ImageMagick Gradient Analysis

Approach:

  1. Take a screenshot of the specific element using web_screenshot_element_cremotemcp
  2. Use ImageMagick to analyze color distribution
  3. Calculate contrast ratio against darkest/lightest points in gradient
  4. Report worst-case contrast ratio

Implementation:

# Step 1: Take element screenshot
web_screenshot_element_cremotemcp(selector=".hero-section", output="/tmp/hero.png")

# Step 2: Extract text color from computed styles
text_color=$(console_command "getComputedStyle(document.querySelector('.hero-section h1')).color")

# Step 3: Find darkest and lightest colors in background
convert /tmp/hero.png -format "%[fx:minima]" info: > darkest.txt
convert /tmp/hero.png -format "%[fx:maxima]" info: > lightest.txt

# Step 4: Calculate contrast ratios
# Compare text color against both extremes
# Report the worst-case scenario

# Step 5: Sample multiple points across gradient
convert /tmp/hero.png -resize 10x10! -depth 8 txt:- | grep -v "^#" | awk '{print $3}'
# This gives us 100 sample points (hex colors) across the gradient
# ("^#" filters only the header line; every pixel line contains a hex value)
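
Step 4 is left as comments above; the ratio itself is the standard WCAG relative-luminance formula. A minimal JavaScript sketch, assuming the text color and a sampled background color have already been parsed into [r, g, b] triplets (the example values are placeholders):

// Sketch: WCAG contrast ratio between two RGB colors, e.g. the computed text
// color and the darkest/lightest gradient sample (assumed 0-255 triplets)
function luminance([r, g, b]) {
  const [rs, gs, bs] = [r, g, b].map(c => {
    c /= 255;
    return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
  });
  return 0.2126 * rs + 0.7152 * gs + 0.0722 * bs;
}

function contrastRatio(fg, bg) {
  const [lighter, darker] = [luminance(fg), luminance(bg)].sort((a, b) => b - a);
  return (lighter + 0.05) / (darker + 0.05);
}

// Example: white text against the darkest sampled gradient point (placeholder values)
console.log(contrastRatio([255, 255, 255], [90, 60, 140]) >= 4.5 ? 'PASS' : 'FAIL');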

Tools Required:

  • ImageMagick (already available in most containers)
  • Basic shell scripting
  • Color contrast calculation library (can use existing cremote contrast checker)

Accuracy: ~95% - Will catch most gradient contrast issues

Implementation Effort: 8-16 hours


CATEGORY 2: TEXT IN IMAGES DETECTION

Current Limitation

Problem: WCAG 1.4.5 requires text to be actual text, not images of text (except logos). Currently requires manual visual inspection.

Proposed Solution: OCR-Based Text Detection

Approach:

  1. Screenshot all images on page
  2. Run OCR (Tesseract) on each image
  3. If text detected, flag for manual review
  4. Cross-reference with alt text to verify equivalence

Implementation:

# Step 1: Extract all image URLs and alt text as JSON
images_json=$(console_command "JSON.stringify(Array.from(document.querySelectorAll('img')).map(img => ({src: img.src, alt: img.alt})))")

# Step 2: Download each image (jq splits the JSON list into one object per line)
i=0
echo "$images_json" | jq -c '.[]' | while read -r img; do
  i=$((i + 1))
  src=$(echo "$img" | jq -r '.src')
  alt=$(echo "$img" | jq -r '.alt')
  curl -s -o "/tmp/img_$i.png" "$src"

  # Step 3: Run OCR
  tesseract "/tmp/img_$i.png" "/tmp/img_${i}_text"

  # Step 4: Check if significant text detected
  word_count=$(wc -w < "/tmp/img_${i}_text.txt")

  if [ "$word_count" -gt 5 ]; then
    echo "WARNING: Image contains text: $src"
    echo "Detected text: $(cat /tmp/img_${i}_text.txt)"
    echo "Alt text: $alt"
    echo "MANUAL REVIEW REQUIRED: Verify if this should be HTML text instead"
  fi
done
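
Step 4 of the approach also calls for cross-referencing the detected text with the image's alt attribute. A rough sketch of that comparison (hypothetical helper; simple word overlap rather than true equivalence checking), which could run in the page context or in Node:

// Hypothetical check: does the alt text cover the words Tesseract found in the image?
function altCoversOcrText(ocrText, altText) {
  const words = s => s.toLowerCase().match(/[a-z0-9]+/g) || [];
  const altWords = new Set(words(altText));
  const ocrWords = words(ocrText);
  if (ocrWords.length === 0) return true;                 // no text detected in the image
  const covered = ocrWords.filter(w => altWords.has(w)).length;
  return covered / ocrWords.length >= 0.6;                // assumed threshold for "equivalent"
}

// Example: flags for manual review because the alt text omits most of the rendered text
console.log(altCoversOcrText('Summer Sale 50% off all items', 'Summer sale banner')); // false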

Tools Required:

  • Tesseract OCR (open source, widely available)
  • curl or wget for image download
  • jq for parsing the extracted image list
  • Basic shell scripting

Accuracy: ~80% - Will catch obvious text-in-images, may miss stylized text

False Positives: Logos, decorative text (acceptable - requires manual review anyway)

Implementation Effort: 8-12 hours


CATEGORY 3: ANIMATION & FLASH DETECTION

Current Limitation

Problem: WCAG 2.3.1 requires no content flashing more than 3 times per second. Currently requires manual observation.

Proposed Solution: Video Frame Analysis

Approach:

  1. Record video of page for 10 seconds using Chrome DevTools Protocol
  2. Extract frames using ffmpeg
  3. Compare consecutive frames for brightness changes
  4. Count flashes per second
  5. Flag if >3 flashes/second detected

Implementation:

# Step 1: Start video recording via CDP
# Page.startScreencast is a DevTools Protocol command; it is sent from the CDP
# client side (cremote), not from in-page JavaScript:
#   Page.startScreencast {format: "png", quality: 80, maxWidth: 1280, maxHeight: 800}
#   then collect Page.screencastFrame events, acknowledging each with Page.screencastFrameAck

# Step 2: Record for 10 seconds, save the frames, and assemble them into a video
#   ffmpeg -framerate 10 -i /tmp/frame_%04d.png /tmp/recording.mp4

# Step 3: Analyze frames with ffmpeg
ffmpeg -i /tmp/recording.mp4 -vf "select='gt(scene,0.3)',showinfo" -f null - 2>&1 | \
  grep "Parsed_showinfo" | wc -l

# Step 4: Calculate flashes per second
# If scene changes > 30 in 10 seconds = 3+ per second = FAIL

# Step 5: For brightness-based flashing
ffmpeg -i /tmp/recording.mp4 -vf "signalstats,metadata=print" -f null - 2>&1 | \
  grep "lavfi.signalstats.YAVG" | \
  awk -F'=' '{print $NF}' > brightness.txt
# (metadata=print emits the per-frame YAVG values; each value follows an "=")

# Analyze brightness.txt for rapid changes
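
A sketch of that analysis, here in Node.js (the tools list below suggests Python or shell; the logic ports directly). It assumes brightness.txt holds one YAVG value per frame and that the screencast was assembled at a known frame rate; both the frame rate and the brightness-swing threshold are tuning assumptions, not WCAG constants:

// Post-process /tmp/brightness.txt (one YAVG value per line, 0-255 scale)
const fs = require('fs');

const FPS = 10;     // assumed screencast frame rate
const SWING = 25;   // assumed per-frame brightness delta that counts as a transition

const values = fs.readFileSync('/tmp/brightness.txt', 'utf8')
  .split('\n').filter(Boolean).map(Number);

// Count large brightness transitions in each one-second window;
// two transitions (light -> dark -> light) make one flash
let worstPerSecond = 0;
for (let start = 0; start + FPS <= values.length; start += FPS) {
  let transitions = 0;
  for (let i = start + 1; i < start + FPS; i++) {
    if (Math.abs(values[i] - values[i - 1]) > SWING) transitions++;
  }
  worstPerSecond = Math.max(worstPerSecond, transitions / 2);
}

console.log(worstPerSecond > 3 ? 'FAIL: more than 3 flashes per second' : 'PASS');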

Tools Required:

  • ffmpeg (video processing)
  • Chrome DevTools Protocol screencast API
  • Python/shell script for analysis

Accuracy: ~90% - Will catch most flashing content

Implementation Effort: 16-24 hours (more complex)


CATEGORY 4: HOVER/FOCUS CONTENT PERSISTENCE

Current Limitation

Problem: WCAG 1.4.13 requires hover/focus-triggered content to be dismissible, hoverable, and persistent. Currently requires manual testing.

Proposed Solution: Automated Interaction Testing

Approach:

  1. Identify all elements with hover/focus event listeners
  2. Programmatically trigger hover/focus
  3. Measure how long content stays visible
  4. Test if Esc key dismisses content
  5. Test if mouse can move to triggered content

Implementation:

// Step 1: Find elements targeted by :hover/:focus rules.
// Note: getComputedStyle(el, ':hover') does not apply pseudo-class styles, so we
// scan the CSSOM for :hover/:focus selectors and resolve them to elements instead.
const hoverSelectors = [];
for (const sheet of document.styleSheets) {
  let rules;
  try { rules = sheet.cssRules; } catch (e) { continue; } // skip cross-origin stylesheets
  for (const rule of rules) {
    if (rule.selectorText && /:hover|:focus/.test(rule.selectorText)) {
      hoverSelectors.push(rule.selectorText.replace(/:hover|:focus/g, ''));
    }
  }
}
const elementsWithHover = [...new Set(hoverSelectors.flatMap(sel => {
  try { return Array.from(document.querySelectorAll(sel)); } catch (e) { return []; }
}))];

// Step 2: Test each element
for (const el of elementsWithHover) {
  // Trigger hover
  el.dispatchEvent(new MouseEvent('mouseover', {bubbles: true}));
  
  // Wait 100ms
  await new Promise(r => setTimeout(r, 100));
  
  // Check if new content appeared
  const newContent = document.querySelector('[role="tooltip"], .tooltip, .popover');
  
  if (newContent) {
    // Test 1: Does the triggered content occupy a hoverable area?
    const rect = newContent.getBoundingClientRect();
    const canHover = rect.width > 0 && rect.height > 0;
    
    // Test 2: Does it persist when the pointer briefly leaves the trigger?
    // (run before the dismissal test, since dismissing removes the content)
    el.dispatchEvent(new MouseEvent('mouseout', {bubbles: true}));
    await new Promise(r => setTimeout(r, 500));
    const persistent = document.contains(newContent);
    
    // Test 3: Does Esc dismiss it?
    // (synthetic key events only exercise JS handlers, not browser defaults)
    el.dispatchEvent(new MouseEvent('mouseover', {bubbles: true}));
    await new Promise(r => setTimeout(r, 100));
    document.dispatchEvent(new KeyboardEvent('keydown', {key: 'Escape'}));
    await new Promise(r => setTimeout(r, 100));
    const dismissed = !document.contains(newContent);
    
    console.log({
      element: el,
      canHover,
      dismissible: dismissed,
      persistent
    });
  }
}

Tools Required:

  • JavaScript injection via cremote
  • Chrome DevTools Protocol for event simulation
  • Timing and state tracking

Accuracy: ~85% - Will catch most hover/focus issues

Implementation Effort: 12-16 hours


CATEGORY 5: SEMANTIC MEANING & COGNITIVE LOAD

Current Limitation

Problem: Some WCAG criteria require human judgment (e.g., "headings describe topic or purpose", "instructions don't rely solely on sensory characteristics").

Proposed Solution: LLM-Assisted Analysis

Approach:

  1. Extract all headings, labels, and instructions
  2. Use LLM (Claude, GPT-4) to analyze semantic meaning
  3. Check for sensory-only instructions (e.g., "click the red button")
  4. Verify heading descriptiveness
  5. Flag potential issues for manual review

Implementation:

// Step 1: Extract content for analysis
const analysisData = {
  headings: Array.from(document.querySelectorAll('h1,h2,h3,h4,h5,h6')).map(h => ({
    level: h.tagName,
    text: h.textContent.trim(),
    context: h.parentElement.textContent.substring(0, 200)
  })),
  
  instructions: Array.from(document.querySelectorAll('label, .instructions, [role="note"]')).map(el => ({
    text: el.textContent.trim(),
    context: el.parentElement.textContent.substring(0, 200)
  })),
  
  links: Array.from(document.querySelectorAll('a')).map(a => ({
    text: a.textContent.trim(),
    href: a.href,
    context: a.parentElement.textContent.substring(0, 100)
  }))
};

// Step 2: Send to LLM for analysis
const prompt = `
Analyze this web content for accessibility issues:

1. Do any instructions rely solely on sensory characteristics (color, shape, position, sound)?
   Examples: "click the red button", "the square icon", "button on the right"
   
2. Are headings descriptive of their section content?
   Flag generic headings like "More Information", "Click Here", "Welcome"
   
3. Are link texts descriptive of their destination?
   Flag generic links like "click here", "read more", "learn more"

Content to analyze:
${JSON.stringify(analysisData, null, 2)}

Return JSON with:
{
  "sensory_instructions": [{element, issue, suggestion}],
  "generic_headings": [{heading, issue, suggestion}],
  "unclear_links": [{link, issue, suggestion}]
}
`;

// Step 3: Parse LLM response and generate report
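
A minimal sketch of Step 3, assuming the LLM call sits behind a hypothetical callLLM() wrapper and that the model returns the JSON shape requested in the prompt (models sometimes wrap JSON in prose, hence the defensive extraction):

// callLLM() is a hypothetical wrapper around whichever LLM API is chosen
async function analyzeSemantics(prompt) {
  const raw = await callLLM(prompt);

  // Extract the first JSON object in case the model adds explanatory text around it
  const match = raw.match(/\{[\s\S]*\}/);
  if (!match) throw new Error('LLM response contained no JSON');
  const result = JSON.parse(match[0]);

  // Flatten into report entries; every finding is marked for human review
  return [
    ...(result.sensory_instructions || []).map(i => ({criterion: 'WCAG 1.3.3', severity: 'REVIEW', ...i})),
    ...(result.generic_headings || []).map(i => ({criterion: 'WCAG 2.4.6', severity: 'REVIEW', ...i})),
    ...(result.unclear_links || []).map(i => ({criterion: 'WCAG 2.4.4', severity: 'REVIEW', ...i}))
  ];
}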

Tools Required:

  • LLM API access (Claude, GPT-4, or local model)
  • JSON parsing
  • Integration with cremote reporting

Accuracy: ~75% - LLM can catch obvious issues, but still requires human review

Implementation Effort: 16-24 hours


CATEGORY 6: TIME-BASED MEDIA (VIDEO/AUDIO)

Current Limitation

Problem: WCAG 1.2.x criteria require captions, audio descriptions, and transcripts. Currently requires manual review of media content.

Proposed Solution: Automated Media Inventory & Validation

Approach:

  1. Detect all video/audio elements
  2. Check for caption tracks
  3. Verify caption files are accessible
  4. Use speech-to-text to verify caption accuracy (optional)
  5. Check for audio description tracks

Implementation:

// Step 1: Find all media elements
const mediaElements = {
  videos: Array.from(document.querySelectorAll('video')).map(v => ({
    src: v.src,
    tracks: Array.from(v.querySelectorAll('track')).map(t => ({
      kind: t.kind,
      src: t.src,
      srclang: t.srclang,
      label: t.label
    })),
    controls: v.hasAttribute('controls'),
    autoplay: v.hasAttribute('autoplay'),
    duration: v.duration
  })),
  
  audios: Array.from(document.querySelectorAll('audio')).map(a => ({
    src: a.src,
    controls: a.hasAttribute('controls'),
    autoplay: a.hasAttribute('autoplay'),
    duration: a.duration
  }))
};

// Step 2: Validate each video
for (const video of mediaElements.videos) {
  const issues = [];
  
  // Check for captions
  const captionTrack = video.tracks.find(t => t.kind === 'captions' || t.kind === 'subtitles');
  if (!captionTrack) {
    issues.push('FAIL: No caption track found (WCAG 1.2.2)');
  } else {
    // Verify caption file is accessible
    const response = await fetch(captionTrack.src);
    if (!response.ok) {
      issues.push(`FAIL: Caption file not accessible: ${captionTrack.src}`);
    }
  }
  
  // Check for audio description
  const descriptionTrack = video.tracks.find(t => t.kind === 'descriptions');
  if (!descriptionTrack) {
    issues.push('WARNING: No audio description track found (WCAG 1.2.5)');
  }
  
  // Check for transcript link
  const transcriptLink = document.querySelector(`a[href*="transcript"]`);
  if (!transcriptLink) {
    issues.push('WARNING: No transcript link found (WCAG 1.2.3)');
  }
  
  console.log({video: video.src, issues});
}

Enhanced with Speech-to-Text (Optional):

# Download video
youtube-dl -o /tmp/video.mp4 $video_url

# Extract audio
ffmpeg -i /tmp/video.mp4 -vn -acodec pcm_s16le -ar 16000 /tmp/audio.wav

# Run speech-to-text (using Whisper or similar)
whisper /tmp/audio.wav --model base --output_format txt --output_dir /tmp

# Compare with caption file
diff /tmp/audio.txt /tmp/captions.vtt

# Calculate accuracy percentage
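
A rough sketch of the accuracy step, in Node.js for consistency with the other examples. It assumes Whisper's output landed at /tmp/audio.txt and treats accuracy as simple word coverage rather than a true word-error-rate calculation:

// Compare the Whisper transcript with the caption file's text content
const fs = require('fs');

const words = s => s.toLowerCase().match(/[a-z0-9']+/g) || [];

const spoken = words(fs.readFileSync('/tmp/audio.txt', 'utf8'));

// Strip the WEBVTT header, timestamps, cue numbers, and blank lines before comparing
const captionText = fs.readFileSync('/tmp/captions.vtt', 'utf8')
  .split('\n')
  .filter(line => !/^WEBVTT|-->|^\d+$|^\s*$/.test(line))
  .join(' ');
const captioned = new Set(words(captionText));

const covered = spoken.filter(w => captioned.has(w)).length;
console.log(`Caption coverage: ${(100 * covered / spoken.length).toFixed(1)}%`);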

Tools Required:

  • JavaScript for media detection
  • fetch API for caption file validation
  • Optional: Whisper (OpenAI) or similar for speech-to-text
  • ffmpeg for audio extraction

Accuracy:

  • Media detection: ~100%
  • Caption presence: ~100%
  • Caption accuracy (with STT): ~70-80%

Implementation Effort:

  • Basic validation: 8-12 hours
  • With speech-to-text: 24-32 hours


CATEGORY 7: MULTI-PAGE CONSISTENCY

Current Limitation

Problem: WCAG 3.2.3 (Consistent Navigation) and 3.2.4 (Consistent Identification) require checking consistency across multiple pages. Currently requires manual comparison.

Proposed Solution: Automated Cross-Page Analysis

Approach:

  1. Crawl all pages on site
  2. Extract navigation structure from each page
  3. Compare navigation order across pages
  4. Extract common elements (search, login, cart, etc.)
  5. Verify consistent labeling and identification

Implementation:

// Step 1: Crawl site and extract navigation
const siteMap = [];

async function crawlPage(url, visited = new Set()) {
  if (visited.has(url)) return;
  visited.add(url);
  
  await navigateTo(url);
  
  const pageData = {
    url,
    navigation: Array.from(document.querySelectorAll('nav a, header a')).map(a => ({
      text: a.textContent.trim(),
      href: a.href,
      order: Array.from(a.parentElement.children).indexOf(a)
    })),
    commonElements: {
      search: document.querySelector('[type="search"], [role="search"]')?.outerHTML,
      // ':contains()' is jQuery-only, so match login buttons by their visible text
      login: (document.querySelector('a[href*="login"]') ||
              Array.from(document.querySelectorAll('button')).find(b => /login/i.test(b.textContent)))?.outerHTML,
      cart: document.querySelector('a[href*="cart"], .cart')?.outerHTML
    }
  };
  
  siteMap.push(pageData);
  
  // Find more pages to crawl
  const links = Array.from(document.querySelectorAll('a[href]'))
    .map(a => a.href)
    .filter(href => href.startsWith(window.location.origin));
  
  for (const link of links.slice(0, 50)) { // Limit links followed per page
    await crawlPage(link, visited);
  }
}

// Step 2: Analyze consistency
function analyzeConsistency(siteMap) {
  const issues = [];
  
  // Check navigation order consistency
  const navOrders = siteMap.map(page => 
    page.navigation.map(n => n.text).join('|')
  );
  
  const uniqueOrders = [...new Set(navOrders)];
  if (uniqueOrders.length > 1) {
    issues.push({
      criterion: 'WCAG 3.2.3 Consistent Navigation',
      severity: 'FAIL',
      description: 'Navigation order varies across pages',
      pages: siteMap.filter((p, i) => navOrders[i] !== navOrders[0]).map(p => p.url)
    });
  }
  
  // Check common element consistency
  const searchElements = siteMap.map(p => p.commonElements.search).filter(Boolean);
  if (new Set(searchElements).size > 1) {
    issues.push({
      criterion: 'WCAG 3.2.4 Consistent Identification',
      severity: 'FAIL',
      description: 'Search functionality identified inconsistently across pages'
    });
  }
  
  return issues;
}
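
Putting the two pieces together might look like the following (the entry URL is a placeholder; crawlPage and analyzeConsistency are the sketches above):

// Crawl from the site root, then report any consistency failures
await crawlPage(window.location.origin + '/');
const consistencyIssues = analyzeConsistency(siteMap);
console.log(JSON.stringify(consistencyIssues, null, 2));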

Tools Required:

  • Web crawler (can use existing cremote navigation)
  • DOM extraction and comparison
  • Pattern matching algorithms

Accuracy: ~90% - Will catch most consistency issues

Implementation Effort: 16-24 hours


IMPLEMENTATION PRIORITY

Phase 1: High Impact, Low Effort (Weeks 1-2)

  1. Gradient Contrast Analysis (ImageMagick) - 8-16 hours
  2. Hover/Focus Content Testing (JavaScript) - 12-16 hours
  3. Media Inventory & Validation (Basic) - 8-12 hours

Total Phase 1: 28-44 hours

Phase 2: Medium Impact, Medium Effort (Weeks 3-4)

  1. Text-in-Images Detection (OCR) - 8-12 hours
  2. Cross-Page Consistency (Crawler) - 16-24 hours
  3. LLM-Assisted Semantic Analysis - 16-24 hours

Total Phase 2: 40-60 hours

Phase 3: Lower Priority, Higher Effort (Weeks 5-6)

  1. Animation/Flash Detection (Video analysis) - 16-24 hours
  2. Speech-to-Text Caption Validation - 24-32 hours

Total Phase 3: 40-56 hours

Grand Total: 108-160 hours (13-20 business days)


EXPECTED OUTCOMES

Current State:

  • Automated Coverage: ~70% of WCAG 2.1 AA criteria
  • Manual Review Required: ~30%

After Phase 1:

  • Automated Coverage: ~78%
  • Manual Review Required: ~22%

After Phase 2:

  • Automated Coverage: ~85%
  • Manual Review Required: ~15%

After Phase 3:

  • Automated Coverage: ~90%
  • Manual Review Required: ~10%

Remaining Manual Tests (~10%):

  • Cognitive load assessment
  • Content quality and readability
  • User experience with assistive technologies
  • Real-world usability testing
  • Complex user interactions requiring human judgment

TECHNICAL REQUIREMENTS

Software Dependencies:

  • ImageMagick - Image analysis (usually pre-installed)
  • Tesseract OCR - Text detection in images
  • ffmpeg - Video/audio processing
  • Whisper (optional) - Speech-to-text for caption validation
  • LLM API (optional) - Semantic analysis

Installation:

# Ubuntu/Debian
apt-get install imagemagick tesseract-ocr ffmpeg

# For Whisper (Python)
pip install openai-whisper

# For LLM integration
# Use existing API keys for Claude/GPT-4

Container Considerations:

  • All tools should be installed in cremote container
  • File paths must account for container filesystem
  • Use file_download_cremotemcp for retrieving analysis results

CONCLUSION

By implementing these creative automated solutions, we can increase our accessibility testing coverage from 70% to 90%, significantly reducing manual review burden while maintaining high accuracy.

Key Principles:

  • Use existing, proven tools (ImageMagick, Tesseract, ffmpeg)
  • Keep solutions simple and maintainable (KISS philosophy)
  • Prioritize high-impact, low-effort improvements first
  • Accept that some tests will always require human judgment
  • Focus on catching obvious violations automatically

Next Steps:

  1. Review and approve proposed solutions
  2. Prioritize implementation based on business needs
  3. Start with Phase 1 (high impact, low effort)
  4. Iterate and refine based on real-world testing
  5. Document all new automated tests in enhanced_chromium_ada_checklist.md

Document Prepared By: Cremote Development Team
Date: October 2, 2025
Status: PROPOSAL - Awaiting Approval