# AUTOMATED TESTING ENHANCEMENTS FOR CREMOTE ADA SUITE

**Date:** October 2, 2025
**Purpose:** Propose creative solutions to automate currently manual accessibility tests
**Philosophy:** KISS - Keep it Simple, Stupid. Practical solutions using existing tools.

---

## EXECUTIVE SUMMARY

Currently, our cremote MCP suite automates ~70% of WCAG 2.1 AA testing. This document proposes practical solutions to increase automation coverage to **~85-90%** by leveraging:

1. **ImageMagick** for gradient contrast analysis
2. **Screenshot-based analysis** for visual testing
3. **OCR tools** for text-in-images detection
4. **Video frame analysis** for animation/flash testing
5. **Enhanced JavaScript injection** for deeper DOM analysis

---

## CATEGORY 1: GRADIENT & COMPLEX BACKGROUND CONTRAST

### Current Limitation
**Problem:** Axe-core reports "incomplete" for text on gradient backgrounds because it cannot calculate contrast ratios for non-solid colors.

**Example from our assessment:**
- Navigation menu links (background color could not be determined due to overlap)
- Gradient backgrounds on hero section (contrast cannot be automatically calculated)

### Proposed Solution: ImageMagick Gradient Analysis

**Approach:**
1. Take a screenshot of the specific element using `web_screenshot_element_cremotemcp`
2. Use ImageMagick to analyze the color distribution
3. Calculate contrast ratios against the darkest and lightest points in the gradient
4. Report the worst-case contrast ratio (see the sketch after the script below)

**Implementation:**

```bash
# Step 1: Take element screenshot (cremote MCP tool call)
web_screenshot_element_cremotemcp(selector=".hero-section", output="/tmp/hero.png")

# Step 2: Extract text color from computed styles
text_color=$(console_command "getComputedStyle(document.querySelector('.hero-section h1')).color")

# Step 3: Find the darkest and lightest values in the background
# (fx:minima/maxima return normalized 0-1 extremes; Step 5 samples actual colors)
convert /tmp/hero.png -format "%[fx:minima]" info: > darkest.txt
convert /tmp/hero.png -format "%[fx:maxima]" info: > lightest.txt

# Step 4: Calculate contrast ratios
# Compare the text color against both extremes
# Report the worst-case scenario (see the JavaScript sketch below)

# Step 5: Sample multiple points across the gradient
# ('^#' drops only the header comment line, not the #RRGGBB hex values)
convert /tmp/hero.png -resize 10x10! -depth 8 txt:- | grep -v "^#" | awk '{print $3}'
# This gives us 100 sample points across the gradient
```
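
Step 4 above is deliberately left as comments; the math itself is the standard WCAG 2.x luminance/contrast formula. A minimal JavaScript sketch (function names are ours, not part of cremote) that takes the text color and the sampled background colors as `[r, g, b]` triples:

```javascript
// WCAG 2.x relative luminance for an sRGB color given as [r, g, b] 0-255.
function relativeLuminance([r, g, b]) {
  const [R, G, B] = [r, g, b].map(c => {
    c /= 255;
    return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
  });
  return 0.2126 * R + 0.7152 * G + 0.0722 * B;
}

// Contrast ratio between two colors, always >= 1.
function contrastRatio(a, b) {
  const [hi, lo] = [relativeLuminance(a), relativeLuminance(b)].sort((x, y) => y - x);
  return (hi + 0.05) / (lo + 0.05);
}

// Worst case across the gradient samples from Step 5: the minimum ratio
// must still meet 4.5:1 (normal text) or 3:1 (large text).
function worstCaseContrast(textColor, backgroundSamples) {
  return Math.min(...backgroundSamples.map(bg => contrastRatio(textColor, bg)));
}
```

Feeding the 100 samples from Step 5 through `worstCaseContrast` yields the single number to report.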

**Tools Required:**
- ImageMagick (already available in most containers)
- Basic shell scripting
- Color contrast calculation library (can use the existing cremote contrast checker)

**Accuracy:** ~95% - will catch most gradient contrast issues

**Implementation Effort:** 8-16 hours

---

## CATEGORY 2: TEXT IN IMAGES DETECTION

### Current Limitation
**Problem:** WCAG 1.4.5 requires text to be actual text, not images of text (except logos). Currently requires manual visual inspection.

### Proposed Solution: OCR-Based Text Detection

**Approach:**
1. Download (or screenshot) all images on the page
2. Run OCR (Tesseract) on each image
3. If text is detected, flag for manual review
4. Cross-reference with alt text to verify equivalence (see the sketch after the script below)

**Implementation:**

```bash
# Step 1: Extract all image URLs and alt text as JSON via the browser console
images_json=$(console_command "JSON.stringify(Array.from(document.querySelectorAll('img')).map(img => ({src: img.src, alt: img.alt})))")

# Step 2: Download each image (jq parses the JSON entries)
i=0
echo "$images_json" | jq -c '.[]' | while read -r entry; do
    src=$(echo "$entry" | jq -r '.src')
    alt=$(echo "$entry" | jq -r '.alt')
    i=$((i + 1))
    curl -s -o "/tmp/img_$i.png" "$src"

    # Step 3: Run OCR (writes /tmp/img_${i}_text.txt)
    tesseract "/tmp/img_$i.png" "/tmp/img_${i}_text"

    # Step 4: Check if significant text detected
    word_count=$(wc -w < "/tmp/img_${i}_text.txt")

    if [ "$word_count" -gt 5 ]; then
        echo "WARNING: Image contains text: $src"
        echo "Detected text: $(cat "/tmp/img_${i}_text.txt")"
        echo "Alt text: $alt"
        echo "MANUAL REVIEW REQUIRED: Verify if this should be HTML text instead"
    fi
done
```
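
Step 4 of the approach (cross-referencing with alt text) can be scored rather than eyeballed. A rough word-overlap heuristic, sketched in JavaScript (the function and any threshold you pick are our assumptions, not an established metric):

```javascript
// Fraction of OCR-detected words that also appear in the alt text.
// Low coverage suggests the image shows text the alt text doesn't convey.
function altCoverage(ocrText, altText) {
  const words = s => s.toLowerCase().match(/[a-z0-9]+/g) || [];
  const ocrWords = words(ocrText);
  if (ocrWords.length === 0) return 1; // no detected text, nothing to cover
  const altSet = new Set(words(altText));
  return ocrWords.filter(w => altSet.has(w)).length / ocrWords.length;
}

// Example: flag for manual review below some coverage threshold
altCoverage('Sale ends Friday', 'decorative banner'); // 0 - likely a problem
```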

**Tools Required:**
- Tesseract OCR (open source, widely available)
- curl or wget for image download
- jq for parsing the JSON image list
- Basic shell scripting

**Accuracy:** ~80% - will catch obvious text-in-images, may miss stylized text

**False Positives:** Logos, decorative text (acceptable - these require manual review anyway)

**Implementation Effort:** 8-12 hours

---

## CATEGORY 3: ANIMATION & FLASH DETECTION

### Current Limitation
**Problem:** WCAG 2.3.1 requires that no content flashes more than 3 times per second. Currently requires manual observation.

### Proposed Solution: Video Frame Analysis

**Approach:**
1. Record video of the page for 10 seconds using the Chrome DevTools Protocol
2. Extract frames using ffmpeg
3. Compare consecutive frames for brightness changes
4. Count flashes per second
5. Flag if >3 flashes/second are detected (see the sketch after the script below)

**Implementation:**

```bash
# Step 1: Start video recording via CDP.
# Note: Page.startScreencast is a protocol command sent from the CDP
# client (e.g. the cremote connection), not from in-page JavaScript.
# `cdp_command` below is a hypothetical helper for that client-side call.
cdp_command 'Page.startScreencast' '{
    "format": "png",
    "quality": 80,
    "maxWidth": 1280,
    "maxHeight": 800
}'

# Step 2: Record for 10 seconds, saving each screencastFrame event,
# then assemble the frames into a video, e.g.:
# ffmpeg -framerate 30 -i /tmp/frames/%04d.png /tmp/recording.mp4

# Step 3: Count large scene changes with ffmpeg
ffmpeg -i /tmp/recording.mp4 -vf "select='gt(scene,0.3)',showinfo" -f null - 2>&1 | \
    grep "Parsed_showinfo" | wc -l

# Step 4: Calculate flashes per second
# If scene changes > 30 in 10 seconds = 3+ per second = FAIL

# Step 5: For brightness-based flashing, dump per-frame luma averages
# (metadata=print is required to emit the signalstats values)
ffmpeg -i /tmp/recording.mp4 -vf "signalstats,metadata=print:key=lavfi.signalstats.YAVG" \
    -an -f null - 2>&1 | \
    grep "lavfi.signalstats.YAVG" | \
    awk -F= '{print $NF}' > brightness.txt

# Analyze brightness.txt for rapid changes
```
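
Step 5 ends with "analyze brightness.txt for rapid changes"; one concrete way to do that is to count direction reversals in the per-frame luma series. A JavaScript sketch (the 25-unit threshold and 30 fps are tuning assumptions, not values from the WCAG definition):

```javascript
// Count flashes in a per-frame brightness series (one YAVG value per
// frame, parsed from brightness.txt) and normalize to flashes/second.
function flashesPerSecond(yavg, fps, threshold = 25) {
  let flashes = 0;
  let lastDir = 0;
  for (let i = 1; i < yavg.length; i++) {
    const delta = yavg[i] - yavg[i - 1];
    if (Math.abs(delta) < threshold) continue; // ignore small fluctuations
    const dir = Math.sign(delta);
    if (lastDir !== 0 && dir !== lastDir) flashes++; // reversal = one flash
    lastDir = dir;
  }
  return flashes / (yavg.length / fps);
}

// WCAG 2.3.1: fail when content flashes more than 3 times per second
const fails = samples => flashesPerSecond(samples, 30) > 3;
```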

**Tools Required:**
- ffmpeg (video processing)
- Chrome DevTools Protocol screencast API
- Python/shell script for analysis

**Accuracy:** ~90% - will catch most flashing content

**Implementation Effort:** 16-24 hours (more complex)

---

## CATEGORY 4: HOVER/FOCUS CONTENT PERSISTENCE

### Current Limitation
**Problem:** WCAG 1.4.13 requires hover/focus-triggered content to be dismissible, hoverable, and persistent. Currently requires manual testing.

### Proposed Solution: Automated Interaction Testing

**Approach:**
1. Identify all elements with hover/focus behavior
2. Programmatically trigger hover/focus
3. Measure how long the content stays visible
4. Test whether the Esc key dismisses the content
5. Test whether the mouse can move to the triggered content

**Implementation:**

```javascript
// Step 1: Find all elements with :hover style rules.
// Note: getComputedStyle(el, ':hover') does NOT return hover styles
// (the second argument is for pseudo-elements only), so we scan the
// stylesheets for :hover selectors instead. Rules nested inside media
// queries are skipped for brevity.
const hoverSelectors = [];
for (const sheet of document.styleSheets) {
  let rules;
  try { rules = sheet.cssRules; } catch (e) { continue; } // skip cross-origin sheets
  for (const rule of rules) {
    if (rule.selectorText && rule.selectorText.includes(':hover')) {
      hoverSelectors.push(rule.selectorText.replace(/:hover/g, ''));
    }
  }
}
const elementsWithHover = [...new Set(hoverSelectors.flatMap(sel => {
  try { return Array.from(document.querySelectorAll(sel)); }
  catch (e) { return []; } // some stripped selectors may be invalid
}))];

// Step 2: Test each element (run inside an async context, e.g. the console)
for (const el of elementsWithHover) {
  // Trigger hover
  el.dispatchEvent(new MouseEvent('mouseover', {bubbles: true}));

  // Wait 100ms
  await new Promise(r => setTimeout(r, 100));

  // Check if new content appeared
  const newContent = document.querySelector('[role="tooltip"], .tooltip, .popover');

  if (newContent) {
    // Test 1: Does the new content have a hoverable area?
    const rect = newContent.getBoundingClientRect();
    const canHover = rect.width > 0 && rect.height > 0;

    // Test 2: Does Esc dismiss it?
    document.dispatchEvent(new KeyboardEvent('keydown', {key: 'Escape'}));
    await new Promise(r => setTimeout(r, 100));
    const dismissed = !document.contains(newContent);

    // Re-trigger hover so Test 3 starts from a visible state
    el.dispatchEvent(new MouseEvent('mouseover', {bubbles: true}));
    await new Promise(r => setTimeout(r, 100));

    // Test 3: Does it persist long enough to move the mouse onto it?
    el.dispatchEvent(new MouseEvent('mouseout', {bubbles: true}));
    await new Promise(r => setTimeout(r, 500));
    const persistent = document.contains(newContent);

    console.log({
      element: el,
      canHover,
      dismissible: dismissed,
      persistent
    });
  }
}
```
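
The loop above only exercises hover. Focus-triggered content can be probed the same way; a companion sketch under the same selector and timing assumptions (again run in an async context):

```javascript
// Probe focus-triggered content: move focus through focusable elements
// and watch for tooltip/popover nodes appearing.
for (const el of document.querySelectorAll('a[href], button, input, select, textarea, [tabindex]')) {
  el.focus();
  await new Promise(r => setTimeout(r, 100));
  const shown = document.querySelector('[role="tooltip"], .tooltip, .popover');
  if (shown) {
    console.log('Focus-triggered content:', {trigger: el, content: shown});
  }
  el.blur();
}
```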

**Tools Required:**
- JavaScript injection via cremote
- Chrome DevTools Protocol for event simulation
- Timing and state tracking

**Accuracy:** ~85% - will catch most hover/focus issues

**Implementation Effort:** 12-16 hours

---

## CATEGORY 5: SEMANTIC MEANING & COGNITIVE LOAD

### Current Limitation
**Problem:** Some WCAG criteria require human judgment (e.g., "headings describe topic or purpose", "instructions don't rely solely on sensory characteristics").

### Proposed Solution: LLM-Assisted Analysis

**Approach:**
1. Extract all headings, labels, and instructions
2. Use an LLM (Claude, GPT-4) to analyze semantic meaning
3. Check for sensory-only instructions (e.g., "click the red button")
4. Verify heading descriptiveness
5. Flag potential issues for manual review

**Implementation:**

```javascript
// Step 1: Extract content for analysis
const analysisData = {
  headings: Array.from(document.querySelectorAll('h1,h2,h3,h4,h5,h6')).map(h => ({
    level: h.tagName,
    text: h.textContent.trim(),
    context: h.parentElement.textContent.substring(0, 200)
  })),

  instructions: Array.from(document.querySelectorAll('label, .instructions, [role="note"]')).map(el => ({
    text: el.textContent.trim(),
    context: el.parentElement.textContent.substring(0, 200)
  })),

  links: Array.from(document.querySelectorAll('a')).map(a => ({
    text: a.textContent.trim(),
    href: a.href,
    context: a.parentElement.textContent.substring(0, 100)
  }))
};

// Step 2: Send to LLM for analysis
const prompt = `
Analyze this web content for accessibility issues:

1. Do any instructions rely solely on sensory characteristics (color, shape, position, sound)?
   Examples: "click the red button", "the square icon", "button on the right"

2. Are headings descriptive of their section content?
   Flag generic headings like "More Information", "Click Here", "Welcome"

3. Are link texts descriptive of their destination?
   Flag generic links like "click here", "read more", "learn more"

Content to analyze:
${JSON.stringify(analysisData, null, 2)}

Return JSON with:
{
  "sensory_instructions": [{element, issue, suggestion}],
  "generic_headings": [{heading, issue, suggestion}],
  "unclear_links": [{link, issue, suggestion}]
}
`;

// Step 3: Parse the LLM response and generate a report
// (see the parsing sketch after this block)
```
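
Step 3 is left as a comment above. Since the prompt pins down the JSON schema, parsing can be a small defensive routine (a sketch; models sometimes wrap JSON in prose or code fences, hence the regex):

```javascript
// Extract the outermost JSON object from the model's reply and flatten
// the three issue lists declared in the prompt into one report.
function parseLlmReport(reply) {
  const match = reply.match(/\{[\s\S]*\}/);
  if (!match) throw new Error('No JSON object found in LLM reply');
  const report = JSON.parse(match[0]);
  return [
    ...(report.sensory_instructions || []).map(i => ({type: 'sensory', ...i})),
    ...(report.generic_headings || []).map(i => ({type: 'heading', ...i})),
    ...(report.unclear_links || []).map(i => ({type: 'link', ...i}))
  ];
}
```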

**Tools Required:**
- LLM API access (Claude, GPT-4, or a local model)
- JSON parsing
- Integration with cremote reporting

**Accuracy:** ~75% - an LLM can catch obvious issues, but results still require human review

**Implementation Effort:** 16-24 hours

---

## CATEGORY 6: TIME-BASED MEDIA (VIDEO/AUDIO)

### Current Limitation
**Problem:** WCAG 1.2.x criteria require captions, audio descriptions, and transcripts. Currently requires manual review of media content.

### Proposed Solution: Automated Media Inventory & Validation

**Approach:**
1. Detect all video/audio elements
2. Check for caption tracks
3. Verify caption files are accessible
4. Use speech-to-text to verify caption accuracy (optional)
5. Check for audio description tracks

**Implementation:**

```javascript
// Step 1: Find all media elements
const mediaElements = {
  videos: Array.from(document.querySelectorAll('video')).map(v => ({
    src: v.src || v.querySelector('source')?.src, // src may live on a <source> child
    tracks: Array.from(v.querySelectorAll('track')).map(t => ({
      kind: t.kind,
      src: t.src,
      srclang: t.srclang,
      label: t.label
    })),
    controls: v.hasAttribute('controls'),
    autoplay: v.hasAttribute('autoplay'),
    duration: v.duration
  })),

  audios: Array.from(document.querySelectorAll('audio')).map(a => ({
    src: a.src || a.querySelector('source')?.src,
    controls: a.hasAttribute('controls'),
    autoplay: a.hasAttribute('autoplay'),
    duration: a.duration
  }))
};

// Step 2: Validate each video (run inside an async context)
for (const video of mediaElements.videos) {
  const issues = [];

  // Check for captions
  const captionTrack = video.tracks.find(t => t.kind === 'captions' || t.kind === 'subtitles');
  if (!captionTrack) {
    issues.push('FAIL: No caption track found (WCAG 1.2.2)');
  } else {
    // Verify the caption file is accessible
    const response = await fetch(captionTrack.src);
    if (!response.ok) {
      issues.push(`FAIL: Caption file not accessible: ${captionTrack.src}`);
    }
  }

  // Check for audio description
  const descriptionTrack = video.tracks.find(t => t.kind === 'descriptions');
  if (!descriptionTrack) {
    issues.push('WARNING: No audio description track found (WCAG 1.2.5)');
  }

  // Check for a transcript link (page-level heuristic, not tied to this video)
  const transcriptLink = document.querySelector('a[href*="transcript"]');
  if (!transcriptLink) {
    issues.push('WARNING: No transcript link found (WCAG 1.2.3)');
  }

  console.log({video: video.src, issues});
}
```
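
The accessibility check above only verifies the HTTP status; a caption file can exist yet be empty or malformed. A small follow-up check (a sketch; per spec, WebVTT files must begin with the `WEBVTT` signature and cues contain `-->`):

```javascript
// Fetch the caption file body and verify it looks like WebVTT with at
// least one cue (a line containing "-->").
async function captionFileLooksValid(url) {
  const response = await fetch(url);
  if (!response.ok) return false;
  const body = await response.text();
  return body.trimStart().startsWith('WEBVTT') && body.includes('-->');
}
```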

**Enhanced with Speech-to-Text (Optional):**

```bash
# Download the video
youtube-dl -o /tmp/video.mp4 $video_url

# Extract the audio track
ffmpeg -i /tmp/video.mp4 -vn -acodec pcm_s16le -ar 16000 /tmp/audio.wav

# Run speech-to-text (using Whisper or similar)
whisper /tmp/audio.wav --model base --output_format txt --output_dir /tmp

# Compare with the caption file
# (strip the VTT timestamps first - see the sketch below - a raw diff
# against the .vtt file is mostly timestamp noise)
diff /tmp/audio.txt /tmp/captions.vtt

# Calculate accuracy percentage
```
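
For that final comparison, the caption file needs its cue numbers and timestamps stripped, after which a word-level overlap gives the accuracy percentage. A JavaScript sketch (the overlap measure is our simplification of a proper word-error-rate calculation):

```javascript
// Drop WebVTT headers, cue numbers, and timestamp lines, keeping only
// the spoken-text payload of the caption file.
function vttToPlainText(vtt) {
  return vtt.split('\n')
    .map(line => line.trim())
    .filter(line =>
      line &&
      line !== 'WEBVTT' &&
      !/^\d+$/.test(line) &&   // cue numbers
      !line.includes('-->'))   // timestamp lines
    .join(' ');
}

// Fraction of speech-to-text words that appear in the captions.
function captionAccuracy(sttText, vttText) {
  const words = s => s.toLowerCase().match(/[a-z0-9']+/g) || [];
  const capSet = new Set(words(vttToPlainText(vttText)));
  const spoken = words(sttText);
  return spoken.length ? spoken.filter(w => capSet.has(w)).length / spoken.length : 1;
}
```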

**Tools Required:**
- JavaScript for media detection
- fetch API for caption file validation
- Optional: Whisper (OpenAI) or similar for speech-to-text
- ffmpeg for audio extraction

**Accuracy:**
- Media detection: ~100%
- Caption presence: ~100%
- Caption accuracy (with STT): ~70-80%

**Implementation Effort:**
- Basic validation: 8-12 hours
- With speech-to-text: 24-32 hours

---

## CATEGORY 7: MULTI-PAGE CONSISTENCY

### Current Limitation
**Problem:** WCAG 3.2.3 (Consistent Navigation) and 3.2.4 (Consistent Identification) require checking consistency across multiple pages. Currently requires manual comparison.

### Proposed Solution: Automated Cross-Page Analysis

**Approach:**
1. Crawl all pages on the site
2. Extract the navigation structure from each page
3. Compare navigation order across pages
4. Extract common elements (search, login, cart, etc.)
5. Verify consistent labeling and identification

**Implementation:**

```javascript
// Step 1: Crawl the site and extract navigation.
// This runs in the test harness, not in-page: navigating replaces the
// page context, so each page's DOM is read after navigateTo() (the
// cremote navigation helper) completes.
const siteMap = [];

async function crawlPage(url, visited = new Set()) {
  if (visited.has(url)) return;
  visited.add(url);

  await navigateTo(url);

  const pageData = {
    url,
    navigation: Array.from(document.querySelectorAll('nav a, header a')).map(a => ({
      text: a.textContent.trim(),
      href: a.href,
      order: Array.from(a.parentElement.children).indexOf(a)
    })),
    commonElements: {
      search: document.querySelector('[type="search"], [role="search"]')?.outerHTML,
      // :contains() is jQuery-only, so match button text manually
      login: (document.querySelector('a[href*="login"]') ||
              Array.from(document.querySelectorAll('button'))
                .find(b => /login/i.test(b.textContent)))?.outerHTML,
      cart: document.querySelector('a[href*="cart"], .cart')?.outerHTML
    }
  };

  siteMap.push(pageData);

  // Find more same-origin pages to crawl
  const links = Array.from(document.querySelectorAll('a[href]'))
    .map(a => a.href)
    .filter(href => href.startsWith(window.location.origin));

  for (const link of links.slice(0, 50)) { // Limit crawl breadth
    await crawlPage(link, visited);
  }
}

// Step 2: Analyze consistency
function analyzeConsistency(siteMap) {
  const issues = [];

  // Check navigation order consistency
  const navOrders = siteMap.map(page =>
    page.navigation.map(n => n.text).join('|')
  );

  const uniqueOrders = [...new Set(navOrders)];
  if (uniqueOrders.length > 1) {
    issues.push({
      criterion: 'WCAG 3.2.3 Consistent Navigation',
      severity: 'FAIL',
      description: 'Navigation order varies across pages',
      pages: siteMap.filter((p, i) => navOrders[i] !== navOrders[0]).map(p => p.url)
    });
  }

  // Check common element consistency
  const searchElements = siteMap.map(p => p.commonElements.search).filter(Boolean);
  if (new Set(searchElements).size > 1) {
    issues.push({
      criterion: 'WCAG 3.2.4 Consistent Identification',
      severity: 'FAIL',
      description: 'Search functionality identified inconsistently across pages'
    });
  }

  return issues;
}
```
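
One caveat for Step 2: comparing raw `outerHTML` strings will flag pages as inconsistent whenever an id, nonce, or data attribute differs. A small normalization pass before comparison helps (a sketch; the attribute list is an assumption to extend as needed):

```javascript
// Strip per-page attributes (ids, data-*, nonces) and collapse
// whitespace so only meaningful markup differences remain.
function normalizeElementHtml(html) {
  return html
    .replace(/\s(?:id|nonce|data-[\w-]+)="[^"]*"/g, '')
    .replace(/\s+/g, ' ')
    .trim();
}

// Use in analyzeConsistency:
// const searchElements = siteMap.map(p => p.commonElements.search)
//   .filter(Boolean).map(normalizeElementHtml);
```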

**Tools Required:**
- Web crawler (can use existing cremote navigation)
- DOM extraction and comparison
- Pattern-matching algorithms

**Accuracy:** ~90% - will catch most consistency issues

**Implementation Effort:** 16-24 hours

---

## IMPLEMENTATION PRIORITY

### Phase 1: High Impact, Low Effort (Weeks 1-2)
1. **Gradient Contrast Analysis** (ImageMagick) - 8-16 hours
2. **Hover/Focus Content Testing** (JavaScript) - 12-16 hours
3. **Media Inventory & Validation** (Basic) - 8-12 hours

**Total Phase 1:** 28-44 hours

### Phase 2: Medium Impact, Medium Effort (Weeks 3-4)
4. **Text-in-Images Detection** (OCR) - 8-12 hours
5. **Cross-Page Consistency** (Crawler) - 16-24 hours
6. **LLM-Assisted Semantic Analysis** - 16-24 hours

**Total Phase 2:** 40-60 hours

### Phase 3: Lower Priority, Higher Effort (Weeks 5-6)
7. **Animation/Flash Detection** (Video analysis) - 16-24 hours
8. **Speech-to-Text Caption Validation** - 24-32 hours

**Total Phase 3:** 40-56 hours

**Grand Total:** 108-160 hours (13-20 business days)

---

## EXPECTED OUTCOMES

### Current State:
- **Automated Coverage:** ~70% of WCAG 2.1 AA criteria
- **Manual Review Required:** ~30%

### After Phase 1:
- **Automated Coverage:** ~78%
- **Manual Review Required:** ~22%

### After Phase 2:
- **Automated Coverage:** ~85%
- **Manual Review Required:** ~15%

### After Phase 3:
- **Automated Coverage:** ~90%
- **Manual Review Required:** ~10%

### Remaining Manual Tests (~10%):
- Cognitive load assessment
- Content quality and readability
- User experience with assistive technologies
- Real-world usability testing
- Complex user interactions requiring human judgment

---

## TECHNICAL REQUIREMENTS

### Software Dependencies:
- **ImageMagick** - Image analysis (usually pre-installed)
- **Tesseract OCR** - Text detection in images
- **ffmpeg** - Video/audio processing
- **jq** - JSON parsing in shell scripts (used by the OCR script above)
- **Whisper** (optional) - Speech-to-text for caption validation
- **LLM API** (optional) - Semantic analysis

### Installation:
```bash
# Ubuntu/Debian
apt-get install imagemagick tesseract-ocr ffmpeg jq

# For Whisper (Python)
pip install openai-whisper

# For LLM integration
# Use existing API keys for Claude/GPT-4
```

### Container Considerations:
- All tools should be installed in the cremote container
- File paths must account for the container filesystem
- Use file_download_cremotemcp for retrieving analysis results

---

## CONCLUSION

By implementing these creative automated solutions, we can increase our accessibility testing coverage from **70% to 90%**, significantly reducing the manual review burden while maintaining high accuracy.

**Key Principles:**
- ✅ Use existing, proven tools (ImageMagick, Tesseract, ffmpeg)
- ✅ Keep solutions simple and maintainable (KISS philosophy)
- ✅ Prioritize high-impact, low-effort improvements first
- ✅ Accept that some tests will always require human judgment
- ✅ Focus on catching obvious violations automatically

**Next Steps:**
1. Review and approve the proposed solutions
2. Prioritize implementation based on business needs
3. Start with Phase 1 (high impact, low effort)
4. Iterate and refine based on real-world testing
5. Document all new automated tests in enhanced_chromium_ada_checklist.md

---

**Document Prepared By:** Cremote Development Team
**Date:** October 2, 2025
**Status:** PROPOSAL - Awaiting Approval