cremote/docs/DIVI_EXTRACTION_TOOLS.md
Josh at WLTechBlog 34a512e278 bump
2025-12-16 12:26:36 -07:00


Divi Extraction Tools - User Guide

Overview

The Divi extraction tools let you extract page structure, images, and content from any Divi-powered website using browser automation. They are designed for competitive analysis, recreating external sites, and quick prototyping.

Important Limitations

⚠️ These tools extract from the rendered HTML only, so results are approximate (roughly 60-70% accurate).

What You CAN Extract

  • Section, row, and column structure (from CSS classes)
  • Module types and visible content
  • Images with metadata (URLs, dimensions, alt text)
  • Background colors and images (computed styles)
  • Text content and button URLs

What You CANNOT Extract

  • Original Divi shortcode/JSON
  • Builder settings (animations, responsive settings, custom CSS)
  • Advanced module configurations
  • Dynamic content sources (ACF fields)
  • Exact responsive layouts

Tools Available

1. web_extract_divi_structure_cremotemcp

Extracts the complete page structure including sections, rows, columns, and modules.

Parameters:

  • url (optional): URL to navigate to before extraction
  • tab (optional): Tab ID to use (uses current tab if not specified)
  • clear_cache (optional): Clear browser cache before extraction (default: false)
  • timeout (optional): Timeout in seconds (default: 30)
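All three tools accept the same four optional parameters. If you build calls from a script instead of writing them by hand, a small helper can apply the documented defaults and leave out `url` and `tab` when they are not set. The following is a minimal Python sketch. `ExtractionArgs` and `to_call` are hypothetical names that are not part of cremote, and treating `tab` as a string is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ExtractionArgs:
    """Optional parameters shared by the Divi extraction tools (hypothetical helper)."""
    url: Optional[str] = None   # navigate here first, if given
    tab: Optional[str] = None   # assumption: tab IDs are strings
    clear_cache: bool = False   # documented default
    timeout: int = 30           # documented default, in seconds

    def to_call(self, tool: str) -> dict[str, Any]:
        """Build a payload shaped like the JSON examples in this guide.

        url and tab are omitted when unset, so the tool falls back to the current tab.
        """
        if self.timeout <= 0:
            raise ValueError("timeout must be a positive number of seconds")
        args: dict[str, Any] = {"clear_cache": self.clear_cache, "timeout": self.timeout}
        if self.url is not None:
            args["url"] = self.url
        if self.tab is not None:
            args["tab"] = self.tab
        return {"tool": tool, "arguments": args}
```

For example, `ExtractionArgs(url="https://example.com/divi-page", clear_cache=True).to_call("web_extract_divi_structure_cremotemcp")` yields the same payload as the example that follows, with the default `timeout` of 30 filled in.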

Example:

{
  "tool": "web_extract_divi_structure_cremotemcp",
  "arguments": {
    "url": "https://example.com/divi-page",
    "clear_cache": true,
    "timeout": 30
  }
}

Output Structure:

{
  "url": "https://example.com/page",
  "sections": [
    {
      "type": "regular",
      "has_parallax": false,
      "background_color": "rgb(255,255,255)",
      "background_image": "url(...)",
      "background_style": "image",
      "rows": [
        {
          "column_structure": "1_2,1_2",
          "columns": [
            {
              "type": "1_2",
              "modules": [
                {
                  "type": "text",
                  "content": "<p>...</p>",
                  "attributes": {},
                  "css_classes": ["et_pb_text", "et_pb_module"]
                }
              ],
              "css_classes": ["et_pb_column", "et_pb_column_1_2"]
            }
          ],
          "css_classes": ["et_pb_row"]
        }
      ],
      "css_classes": ["et_pb_section"]
    }
  ],
  "metadata": {
    "extraction_date": "2025-01-16T...",
    "accuracy": "60-70% (approximation from CSS classes)",
    "limitations": "Cannot access original Divi shortcode/JSON or builder settings"
  }
}
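Columns are reported as Divi fraction names (`1_2`, `1_3`, and so on), so the structure output converts easily into a readable outline with column widths. The following is a minimal Python sketch. It assumes the output has already been parsed into a dict with the keys shown above. `outline` and `column_width` are hypothetical helpers.

```python
from fractions import Fraction
from typing import Any


def column_width(col_type: str) -> Fraction:
    """Convert a Divi column type such as "1_2" into a fraction of the row width."""
    num, den = col_type.split("_")
    return Fraction(int(num), int(den))


def outline(structure: dict[str, Any]) -> list[str]:
    """Flatten the structure output into indented lines: section > row > column."""
    lines: list[str] = []
    for s_idx, section in enumerate(structure.get("sections", [])):
        lines.append(f"section {s_idx} ({section.get('type', '?')})")
        for r_idx, row in enumerate(section.get("rows", [])):
            lines.append(f"  row {r_idx} [{row.get('column_structure', '')}]")
            for col in row.get("columns", []):
                width = column_width(col["type"])
                # List the module types in each column, in page order.
                module_types = [m.get("type", "unknown") for m in col.get("modules", [])]
                lines.append(f"    column {col['type']} ({float(width):.0%}): {', '.join(module_types)}")
    return lines
```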

2. web_extract_divi_images_cremotemcp

Extracts all images from the page, including both regular images and CSS background images.

Parameters:

  • url (optional): URL to navigate to before extraction
  • tab (optional): Tab ID to use (uses current tab if not specified)
  • clear_cache (optional): Clear browser cache (default: false)
  • timeout (optional): Timeout in seconds (default: 30)

Example:

{
  "tool": "web_extract_divi_images_cremotemcp",
  "arguments": {
    "url": "https://example.com/divi-page",
    "timeout": 30
  }
}

Output Structure:

[
  {
    "url": "https://example.com/image.jpg",
    "alt": "Image description",
    "title": "Image title",
    "width": 1920,
    "height": 1080,
    "context": "image 0",
    "is_background": false
  },
  {
    "url": "https://example.com/bg.jpg",
    "alt": "",
    "title": "",
    "width": 0,
    "height": 0,
    "context": "background 1",
    "is_background": true
  }
]
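In the sample output, background entries report `width` and `height` as 0, so treat those dimensions as unknown rather than literal. The following minimal Python sketch post-processes the image list, assuming it has been parsed into a list of dicts as shown above. It separates content images from backgrounds, drops duplicate URLs, and flags content images that have empty alt text. The helper names are hypothetical.

```python
from typing import Any


def split_images(images: list[dict[str, Any]]) -> tuple[list[str], list[str]]:
    """Return (content image URLs, background image URLs), de-duplicated in page order."""
    content: list[str] = []
    backgrounds: list[str] = []
    seen: set[str] = set()
    for img in images:
        url = img.get("url", "")
        if not url or url in seen:
            continue  # skip empty entries and images that appear more than once
        seen.add(url)
        (backgrounds if img.get("is_background") else content).append(url)
    return content, backgrounds


def missing_alt(images: list[dict[str, Any]]) -> list[str]:
    """URLs of non-background images whose alt text is empty."""
    return [i["url"] for i in images if not i.get("is_background") and not i.get("alt")]
```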

3. web_extract_divi_content_cremotemcp

Extracts all module content and images with comprehensive metadata.

Parameters:

  • url (optional): URL to navigate to before extraction
  • tab (optional): Tab ID to use (uses current tab if not specified)
  • clear_cache (optional): Clear browser cache (default: false)
  • timeout (optional): Timeout in seconds (default: 30)

Example:

{
  "tool": "web_extract_divi_content_cremotemcp",
  "arguments": {
    "url": "https://example.com/divi-page",
    "timeout": 30
  }
}

Output Structure:

{
  "url": "https://example.com/page",
  "modules": [
    {
      "type": "text",
      "content": "<p>Text content</p>",
      "attributes": {},
      "css_classes": ["et_pb_text", "et_pb_module"]
    },
    {
      "type": "button",
      "content": "Click Here",
      "attributes": {
        "href": "https://example.com/link",
        "target": "_blank"
      },
      "css_classes": ["et_pb_button", "et_pb_module"]
    }
  ],
  "images": [...],
  "metadata": {
    "extraction_date": "2025-01-16T...",
    "total_modules": 15,
    "total_images": 8
  }
}
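The following minimal Python sketch summarizes the content output. It counts modules by `type`, collects button labels and links from `attributes.href`, and cross-checks the count against `metadata.total_modules`. It assumes `total_modules` counts the entries in `modules`, which the output above suggests but does not state. `summarize_content` is a hypothetical name.

```python
from collections import Counter
from typing import Any


def summarize_content(result: dict[str, Any]) -> dict[str, Any]:
    """Count modules by type and collect button links from the content output."""
    modules = result.get("modules", [])
    counts = Counter(m.get("type", "unknown") for m in modules)
    # (label, href) pairs for button modules that carry a link.
    links = [
        (m.get("content", ""), m["attributes"]["href"])
        for m in modules
        if m.get("type") == "button" and "href" in m.get("attributes", {})
    ]
    # Assumption: total_modules should equal the number of module entries.
    expected = result.get("metadata", {}).get("total_modules")
    return {
        "counts": dict(counts),
        "button_links": links,
        "count_matches_metadata": expected is None or expected == len(modules),
    }
```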

Workflow Examples

Extract Complete Page Data

// 1. Navigate to page
{
  "tool": "web_navigate_cremotemcp",
  "arguments": {
    "url": "https://example.com/divi-page",
    "clear_cache": true
  }
}

// 2. Extract structure
{
  "tool": "web_extract_divi_structure_cremotemcp",
  "arguments": {
    "timeout": 30
  }
}

// 3. Extract images
{
  "tool": "web_extract_divi_images_cremotemcp",
  "arguments": {
    "timeout": 30
  }
}

// 4. Extract content
{
  "tool": "web_extract_divi_content_cremotemcp",
  "arguments": {
    "timeout": 30
  }
}
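If you run these steps from a script, the workflow comes down to four calls made in order. The following minimal Python sketch assumes a hypothetical `call_tool(name, arguments)` function, supplied by your MCP client, that returns the decoded result. cremote does not define a function under that name.

```python
from typing import Any, Callable

# Hypothetical hook: sends one tool call through your MCP client and returns
# the decoded JSON result. The name and signature are assumptions.
ToolCaller = Callable[[str, dict[str, Any]], Any]


def extract_page(call_tool: ToolCaller, url: str, timeout: int = 30) -> dict[str, Any]:
    """Run the four workflow steps in order and bundle the three extraction results."""
    # Step 1: navigate with a cache clear so the extractors work on the current tab.
    call_tool("web_navigate_cremotemcp", {"url": url, "clear_cache": True})
    # Steps 2-4: no url is passed, so each tool extracts from the already-loaded page.
    return {
        "url": url,
        "structure": call_tool("web_extract_divi_structure_cremotemcp", {"timeout": timeout}),
        "images": call_tool("web_extract_divi_images_cremotemcp", {"timeout": timeout}),
        "content": call_tool("web_extract_divi_content_cremotemcp", {"timeout": timeout}),
    }
```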

Quick Single-Call Extraction

// Extract structure with automatic navigation
{
  "tool": "web_extract_divi_structure_cremotemcp",
  "arguments": {
    "url": "https://example.com/divi-page",
    "clear_cache": true,
    "timeout": 30
  }
}

Module Types Detected

The tools can identify the following Divi module types:

  • text - Text modules
  • image - Image modules
  • button - Button modules
  • blurb - Blurb modules
  • cta - Call-to-action modules
  • slider - Slider modules
  • gallery - Gallery modules
  • video - Video modules
  • unknown - Unrecognized modules
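Detection appears to rely on Divi's module CSS classes; the outputs above list `et_pb_text` and `et_pb_button`, for instance. The Python sketch below illustrates that idea with an assumed class-to-type mapping. It is not the tools' actual implementation. Classes it does not recognize, such as fullwidth module variants, fall back to `unknown`.

```python
# Assumed mapping from Divi module CSS classes to the type names listed above.
# This illustrates the approach; it is not taken from the tools' source.
MODULE_CLASSES = {
    "et_pb_text": "text",
    "et_pb_image": "image",
    "et_pb_button": "button",
    "et_pb_blurb": "blurb",
    "et_pb_cta": "cta",
    "et_pb_slider": "slider",
    "et_pb_gallery": "gallery",
    "et_pb_video": "video",
}


def detect_module_type(css_classes: list[str]) -> str:
    """Return the first recognized module type in a module's class list, else "unknown"."""
    for cls in css_classes:
        if cls in MODULE_CLASSES:
            return MODULE_CLASSES[cls]
    return "unknown"
```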

Best Practices

  1. Set a longer timeout for slow-loading pages (the default is 30 seconds)
  2. Clear cache when extracting from a new site
  3. Use structure extraction first to understand page layout
  4. Extract images separately if you need detailed image metadata
  5. Combine with WordPress MCP tools for page recreation on your own sites

Troubleshooting

Timeout Errors

  • Increase the timeout parameter
  • Check if the page is loading slowly
  • Verify the URL is accessible
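When a page loads slowly, you can retry with progressively larger `timeout` values instead of guessing a single one. The following minimal Python sketch assumes a hypothetical `call_tool(name, arguments)` hook from your MCP client that raises `TimeoutError` when a call times out. Adjust `is_timeout` to match however your client actually reports timeouts.

```python
from typing import Any, Callable, Optional


def call_with_backoff(
    call_tool: Callable[[str, dict[str, Any]], Any],  # hypothetical MCP client hook
    tool: str,
    arguments: dict[str, Any],
    timeouts: tuple[int, ...] = (30, 60, 120),
    is_timeout: Callable[[Exception], bool] = lambda e: isinstance(e, TimeoutError),
) -> Any:
    """Retry a tool call with progressively larger timeout values."""
    if not timeouts:
        raise ValueError("timeouts must not be empty")
    last_error: Optional[Exception] = None
    for t in timeouts:
        try:
            # Override only the timeout; url, tab, and clear_cache pass through unchanged.
            return call_tool(tool, {**arguments, "timeout": t})
        except Exception as exc:
            if not is_timeout(exc):
                raise  # not a timeout, so a longer timeout will not help
            last_error = exc
    raise last_error  # every attempt timed out
```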

Empty Results

  • Verify the page uses Divi (check for et_pb_ CSS classes)
  • Check if JavaScript is enabled
  • Try navigating to the page first with web_navigate_cremotemcp
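To confirm that a page is built with Divi before extracting, you can scan its HTML for `et_pb_` classes. The following minimal Python sketch uses the standard library's `html.parser`. How you obtain the rendered HTML is up to you; saving the page source from the browser is one option. `looks_like_divi` is a hypothetical helper. If you scan the raw server response, markup injected later by JavaScript will be missing.

```python
from html.parser import HTMLParser
from typing import Optional


class DiviClassFinder(HTMLParser):
    """Collect the distinct et_pb_* CSS classes found in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.classes: set[str] = set()

    def handle_starttag(self, tag: str, attrs: list[tuple[str, Optional[str]]]) -> None:
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(c for c in value.split() if c.startswith("et_pb_"))


def looks_like_divi(html: str) -> bool:
    """Heuristic from the checklist above: any et_pb_ class suggests a Divi page."""
    finder = DiviClassFinder()
    finder.feed(html)
    finder.close()
    return bool(finder.classes)
```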

Incomplete Data

  • This is expected: the tools are only about 60-70% accurate
  • Plan on refining the results manually
  • Use the output as a starting point, not for exact recreation

See Also