
Josh at WLTechBlog 34a512e278 bump

2025-12-16 12:26:36 -07:00

6.0 KiB


Divi Extraction Tools - Implementation Plan & Status

Executive Summary

Successfully implemented 3 new MCP tools for extracting Divi page structure, images, and content from external websites using cremote browser automation. These tools address the need for competitive analysis and page recreation for sites where WordPress API access is unavailable.

Problem Analysis (from feedback/)

The Challenge

Goal: Extract Divi page structure from external websites
Constraint: No WordPress API access (external sites)
Reality: Can only extract from rendered HTML/CSS (roughly 60-70% structural accuracy)
Use Case: Competitive analysis, external site recreation, quick prototyping

Key Insights from Feedback

CREMOTE_EXTRACTION_SUMMARY.md: Confirmed feasibility with realistic expectations
CREMOTE_REALITY_CHECK.md: Documented limitations and boundaries
CREMOTE_VS_WORDPRESS_API.md: Compared approaches and use cases

Implementation Complete ✅

Tools Implemented

1. web_extract_divi_structure_cremotemcp

Status: ✅ Implemented & Compiled
Location: mcp/main.go (lines 5627-5687), daemon/daemon.go (lines 2243-2260, 12947-13147)
Accuracy: 60-70% (approximation from CSS classes)
Extracts: Sections, rows, columns, modules with types and basic styling

2. web_extract_divi_images_cremotemcp

Status: ✅ Implemented & Compiled
Location: mcp/main.go (lines 5689-5749), daemon/daemon.go (lines 2262-2279, 13149-13252)
Accuracy: 100% for visible images
Extracts: All images with URLs, dimensions, alt text, context

3. web_extract_divi_content_cremotemcp

Status: ✅ Implemented & Compiled
Location: mcp/main.go (lines 5751-5811), daemon/daemon.go (lines 2281-2298, 13254-13400)
Accuracy: 90-100% for visible content
Extracts: All modules with content, images, and metadata
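The image tool reports visible images. Divi also places many images in CSS backgrounds, which `window.getComputedStyle()` exposes as a `background-image` value such as `url("https://example.com/bg.jpg")`, often layered with a gradient. The Go sketch below pulls the URLs out of such a value. It is illustrative only: `backgroundURLs` is a hypothetical helper, not part of the cremote code. The regular expression covers the common quoted and unquoted `url()` forms, not the full CSS grammar.

```go
package main

import (
	"fmt"
	"regexp"
)

// urlPattern matches url(...) tokens written with double quotes, single quotes, or no quotes.
// Hypothetical helper logic; it does not implement the full CSS url() grammar.
var urlPattern = regexp.MustCompile(`url\(\s*(?:"([^"]*)"|'([^']*)'|([^)'"\s]+))\s*\)`)

// backgroundURLs returns every image URL referenced in a computed background-image value.
// Gradient layers such as linear-gradient(...) contain no url() token and are skipped.
func backgroundURLs(value string) []string {
	var urls []string
	for _, m := range urlPattern.FindAllStringSubmatch(value, -1) {
		// At most one of the three capture groups is non-empty for each match.
		for _, g := range m[1:] {
			if g != "" {
				urls = append(urls, g)
				break
			}
		}
	}
	return urls
}

func main() {
	value := `linear-gradient(180deg, rgba(0,0,0,0.5) 0%, rgba(0,0,0,0) 100%), url("https://example.com/hero.jpg")`
	fmt.Println(backgroundURLs(value))  // [https://example.com/hero.jpg]
	fmt.Println(backgroundURLs("none")) // []
}
```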

Code Changes

Files Modified

mcp/main.go (+200 lines)
- 3 new MCP tool registrations
- Integration with handleOptionalNavigation
- JSON marshaling and error handling
client/client.go (+170 lines)
- Type definitions for Divi structures
- 3 client methods for command execution
- Response parsing and error handling
daemon/daemon.go (+580 lines)
- 3 command handlers in switch statement
- Type definitions matching client
- 3 extraction methods with JavaScript execution
- DOM traversal and CSS class analysis

Compilation Status

✅ Daemon compiles successfully
✅ Client compiles successfully
✅ No compilation errors
⏳ MCP server build pending: it requires its dependencies to be installed first (expected)

Technical Architecture

Data Flow

MCP Tool Call → Client Method → Daemon Command Handler → JavaScript Execution → DOM Analysis → JSON Response
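The daemon end of this chain can be sketched in a few lines. A command arrives with an action name and parameters. A switch routes it to an extraction method, and the result is wrapped and marshaled to JSON. The action strings, the `Command` and `Response` types, and the stub extractors are all hypothetical stand-ins. In cremote, the extractors would run JavaScript in the browser tab rather than return canned data.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Command is a hypothetical stand-in for a client-to-daemon request.
type Command struct {
	Action string            `json:"action"`
	Params map[string]string `json:"params"`
}

// Response wraps either a result or an error message.
type Response struct {
	Success bool   `json:"success"`
	Data    any    `json:"data,omitempty"`
	Error   string `json:"error,omitempty"`
}

// Stub extractors; the real methods would execute JavaScript in the target tab.
func extractStructure(tab string) (any, error) { return map[string]int{"sections": 3}, nil }
func extractImages(tab string) (any, error)    { return []string{"https://example.com/a.jpg"}, nil }
func extractContent(tab string) (any, error)   { return map[string]int{"modules": 12}, nil }

// handle routes one command through the switch and builds the JSON response.
func handle(cmd Command) []byte {
	var (
		data any
		err  error
	)
	switch cmd.Action {
	case "extract-divi-structure":
		data, err = extractStructure(cmd.Params["tab"])
	case "extract-divi-images":
		data, err = extractImages(cmd.Params["tab"])
	case "extract-divi-content":
		data, err = extractContent(cmd.Params["tab"])
	default:
		err = fmt.Errorf("unknown action: %s", cmd.Action)
	}

	resp := Response{Success: err == nil, Data: data}
	if err != nil {
		resp.Error = err.Error()
	}
	out, _ := json.Marshal(resp) // Response holds only JSON-marshalable values
	return out
}

func main() {
	fmt.Println(string(handle(Command{Action: "extract-divi-images", Params: map[string]string{"tab": "tab-id-123"}})))
	fmt.Println(string(handle(Command{Action: "bogus"})))
}
```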

JavaScript Extraction Strategy

Query DOM for Divi-specific CSS classes
Extract computed styles using window.getComputedStyle()
Traverse section → row → column → module hierarchy
Identify module types from CSS classes
Extract content using innerHTML, textContent, attributes
Return structured JSON data
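The hierarchy walk in these steps can be sketched outside the browser. The Go code below walks a simplified, hypothetical `Node` tree that stands in for the DOM. It collects modules under their section, row, and column ancestors by matching the `et_pb_section`, `et_pb_row`, `et_pb_column`, and `et_pb_module` class tokens. The in-browser version would perform the same walk with DOM queries. This sketch illustrates the traversal and is not the daemon's JavaScript.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a hypothetical, simplified stand-in for a DOM element.
type Node struct {
	Class    string // space-separated class attribute
	Children []*Node
}

// hasClass reports whether the class attribute contains the exact token.
func hasClass(n *Node, token string) bool {
	for _, c := range strings.Fields(n.Class) {
		if c == token {
			return true
		}
	}
	return false
}

// findAll returns descendants carrying token without descending into matches,
// so nested elements are attributed to their nearest matching ancestor.
func findAll(n *Node, token string) []*Node {
	var out []*Node
	for _, child := range n.Children {
		if hasClass(child, token) {
			out = append(out, child)
		} else {
			out = append(out, findAll(child, token)...)
		}
	}
	return out
}

// outline walks section -> row -> column -> module and counts modules per column.
func outline(root *Node) [][][]int {
	var sections [][][]int
	for _, s := range findAll(root, "et_pb_section") {
		var rows [][]int
		for _, r := range findAll(s, "et_pb_row") {
			var cols []int
			for _, c := range findAll(r, "et_pb_column") {
				cols = append(cols, len(findAll(c, "et_pb_module")))
			}
			rows = append(rows, cols)
		}
		sections = append(sections, rows)
	}
	return sections
}

func main() {
	mod := func() *Node { return &Node{Class: "et_pb_module et_pb_text et_pb_text_0"} }
	page := &Node{Children: []*Node{
		{Class: "et_pb_section et_pb_section_0", Children: []*Node{
			{Class: "et_pb_row et_pb_row_0", Children: []*Node{
				{Class: "et_pb_column et_pb_column_1_2", Children: []*Node{mod(), mod()}},
				{Class: "et_pb_column et_pb_column_1_2", Children: []*Node{mod()}},
			}},
		}},
	}}
	fmt.Println(outline(page)) // [[[2 1]]]
}
```

Not descending into a match keeps counts from doubling when Divi nests elements inside one another. In a specialty section, for example, modules inside inner rows are counted under the outer column. That is one reason the structure output remains an approximation.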

Module Type Detection

Supported module types:

text (et_pb_text)
image (et_pb_image)
button (et_pb_button)
blurb (et_pb_blurb)
cta (et_pb_cta)
slider (et_pb_slider)
gallery (et_pb_gallery)
video (et_pb_video)
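Detection relies on class tokens, so a lookup table over the classes listed above is sufficient. The sketch below assumes the class attribute is read as a single space-separated string. `moduleType` is a hypothetical helper. Any class not in the list is reported as `unknown` rather than guessed.

```go
package main

import (
	"fmt"
	"strings"
)

// moduleClasses maps the supported Divi class tokens to module type names.
var moduleClasses = map[string]string{
	"et_pb_text":    "text",
	"et_pb_image":   "image",
	"et_pb_button":  "button",
	"et_pb_blurb":   "blurb",
	"et_pb_cta":     "cta",
	"et_pb_slider":  "slider",
	"et_pb_gallery": "gallery",
	"et_pb_video":   "video",
}

// moduleType returns the module type for a class attribute, or "unknown".
// Tokens are compared exactly, so indexed classes such as "et_pb_text_0"
// or helper classes such as "et_pb_text_inner" cannot cause false matches.
func moduleType(classAttr string) string {
	for _, token := range strings.Fields(classAttr) {
		if t, ok := moduleClasses[token]; ok {
			return t
		}
	}
	return "unknown"
}

func main() {
	fmt.Println(moduleType("et_pb_module et_pb_text et_pb_text_0 et_pb_bg_layout_light")) // text
	fmt.Println(moduleType("et_pb_module et_pb_countdown_timer"))                         // unknown
}
```

Exact token matching is the important design choice here. A substring check such as `strings.Contains(classAttr, "et_pb_image")` would also match unrelated classes that merely share the prefix.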

Usage Examples

Basic Structure Extraction

{
  "tool": "web_extract_divi_structure_cremotemcp",
  "arguments": {
    "url": "https://example.com/divi-page",
    "clear_cache": true,
    "timeout": 30
  }
}

Image Extraction

{
  "tool": "web_extract_divi_images_cremotemcp",
  "arguments": {
    "tab": "tab-id-123",
    "timeout": 30
  }
}

Content Extraction

{
  "tool": "web_extract_divi_content_cremotemcp",
  "arguments": {
    "url": "https://example.com/divi-page",
    "timeout": 30
  }
}

Known Limitations

Cannot Extract

❌ Original Divi shortcode/JSON
❌ Builder settings (animations, responsive, custom CSS)
❌ Advanced module configurations
❌ Dynamic content sources (e.g. ACF fields)
❌ Exact responsive layouts

Accuracy Levels

Structure: 60-70% (approximation from CSS classes)
Content: 90-100% (visible content)
Styling: 50-60% (computed styles only)
Images: 100% (visible images)
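These percentages are estimates. One way to check the structure figure during testing is to extract a page whose Divi layout is already known, such as one on a site you control. You can then compare the sequence of extracted module types with the known sequence. The sketch below uses a longest-common-subsequence ratio for that comparison. The metric and the `similarity` helper are assumptions for illustration, not the method behind the figures above.

```go
package main

import "fmt"

// lcsLength returns the length of the longest common subsequence of a and b.
func lcsLength(a, b []string) int {
	// prev and cur hold two rows of the standard dynamic-programming table.
	prev := make([]int, len(b)+1)
	cur := make([]int, len(b)+1)
	for i := 1; i <= len(a); i++ {
		for j := 1; j <= len(b); j++ {
			if a[i-1] == b[j-1] {
				cur[j] = prev[j-1] + 1
			} else if prev[j] >= cur[j-1] {
				cur[j] = prev[j]
			} else {
				cur[j] = cur[j-1]
			}
		}
		prev, cur = cur, prev
	}
	return prev[len(b)]
}

// similarity returns the LCS length divided by the longer sequence length (1.0 = identical).
func similarity(extracted, reference []string) float64 {
	longest := len(extracted)
	if len(reference) > longest {
		longest = len(reference)
	}
	if longest == 0 {
		return 1.0
	}
	return float64(lcsLength(extracted, reference)) / float64(longest)
}

func main() {
	reference := []string{"text", "image", "button", "blurb", "blurb", "cta"}
	extracted := []string{"text", "image", "unknown", "blurb", "blurb", "cta"}
	fmt.Printf("%.2f\n", similarity(extracted, reference)) // 0.83
}
```

An LCS ratio tolerates a single missed or extra module without penalizing everything after it, unlike a position-by-position comparison.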

Appropriate Use Cases

✅ Competitive analysis
✅ External site recreation (with manual refinement)
✅ Quick prototyping
✅ Content extraction
❌ Production migrations (too inaccurate)
❌ Exact recreations (impossible without API)

Next Steps

Immediate (Ready for Testing)

Integration testing with real Divi sites
Verify extraction accuracy on various page types
Test timeout handling and error recovery
Document edge cases and failure modes

Phase 2 (Future Enhancements)

web_download_images_bulk_cremotemcp - Bulk image download
web_extract_divi_backgrounds_cremotemcp - Enhanced background extraction
WordPress MCP integration for page recreation
Image upload orchestration
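A bulk image download tool would first need to turn the list returned by the image tool into a download plan. That means deduplicating repeated URLs and giving each file a unique local name, because WordPress uploads in different month folders often share a file name. The sketch below shows one way to do this. `planDownloads` and its `-1`, `-2` naming scheme are hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"strings"
)

// planDownloads dedupes image URLs and assigns each a unique local filename.
// Hypothetical helper; the naming scheme is illustrative only.
func planDownloads(rawURLs []string) (map[string]string, error) {
	plan := make(map[string]string) // image URL -> local filename
	taken := make(map[string]bool)  // filenames already assigned
	for _, raw := range rawURLs {
		if _, seen := plan[raw]; seen {
			continue // the same image is often referenced more than once on a page
		}
		u, err := url.Parse(raw)
		if err != nil {
			return nil, fmt.Errorf("bad image URL %q: %w", raw, err)
		}
		base := path.Base(u.Path)
		if base == "." || base == "/" {
			base = "image" // the URL path has no usable file name
		}
		// Append -1, -2, ... when files from different directories share a name.
		ext := path.Ext(base)
		name := base
		for i := 1; taken[name]; i++ {
			name = fmt.Sprintf("%s-%d%s", strings.TrimSuffix(base, ext), i, ext)
		}
		taken[name] = true
		plan[raw] = name
	}
	return plan, nil
}

func main() {
	urls := []string{
		"https://example.com/wp-content/uploads/2024/01/hero.jpg",
		"https://example.com/wp-content/uploads/2023/05/hero.jpg",
		"https://example.com/wp-content/uploads/2024/01/hero.jpg",
	}
	plan, err := planDownloads(urls)
	if err != nil {
		panic(err)
	}
	for _, u := range urls[:2] {
		fmt.Println(u, "->", plan[u])
	}
}
```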

Phase 3 (Advanced Features)

Contact form extraction
Gallery extraction with full metadata
Slider extraction with slide data
WooCommerce module extraction

Testing Checklist

Test on simple Divi page (1-2 sections)
Test on complex Divi page (10+ sections)
Test with specialty sections
Test with various module types
Test with background images
Test with gradients
Test timeout handling
Test error recovery
Verify JSON output format
Test with slow-loading pages
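Part of the JSON-format check can be automated. The sketch below decodes an image-tool response strictly, rejecting unknown fields, and then checks the fields every entry must have. The `ImageResult` shape is hypothetical. It is based only on the data the image tool is described as returning: URLs, dimensions, alt text, and context.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// ImageResult is a hypothetical shape for one entry returned by the image tool.
type ImageResult struct {
	URL     string `json:"url"`
	Width   int    `json:"width"`
	Height  int    `json:"height"`
	Alt     string `json:"alt"`
	Context string `json:"context"` // e.g. which module or section contained the image
}

// validateImages decodes strictly and checks the fields every entry must have.
func validateImages(raw []byte) ([]ImageResult, error) {
	dec := json.NewDecoder(bytes.NewReader(raw))
	dec.DisallowUnknownFields() // surface schema drift between daemon and client
	var images []ImageResult
	if err := dec.Decode(&images); err != nil {
		return nil, err
	}
	for i, img := range images {
		if img.URL == "" {
			return nil, fmt.Errorf("image %d: missing url", i)
		}
		if img.Width < 0 || img.Height < 0 {
			return nil, fmt.Errorf("image %d: negative dimensions", i)
		}
	}
	return images, nil
}

func main() {
	good := []byte(`[{"url":"https://example.com/a.jpg","width":800,"height":600,"alt":"Team","context":"image module"}]`)
	if imgs, err := validateImages(good); err == nil {
		fmt.Println("valid entries:", len(imgs))
	}
	_, err := validateImages([]byte(`[{"url":"https://example.com/a.jpg","src_set":"x"}]`))
	fmt.Println("unknown field rejected:", err != nil)
}
```

An empty `alt` is deliberately allowed, since decorative images legitimately have none.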

Documentation

✅ Implementation summary (DIVI_EXTRACTION_IMPLEMENTATION.md)
✅ Implementation plan (this file)
✅ Feedback analysis (feedback/ directory)
⏳ User guide (to be created after testing)
⏳ API documentation (to be created after testing)

Conclusion

Implementation is complete, and the daemon and client compile successfully. The tools are ready for integration testing on real Divi sites. All three extraction methods follow the same pattern and integrate cleanly with the existing cremote architecture.

Status: ✅ READY FOR TESTING
