DIVI_EXTRACTION_IMPLEMENTATION.md
# Divi Extraction Tools - Implementation Summary

## Overview

Based on the feedback analysis in `feedback/`, I've implemented three new MCP tools to extract Divi page structure from external websites using cremote browser automation. These tools enable competitive analysis and page recreation from sites where WordPress API access is not available.

## Implementation Status: ✅ COMPLETE

### Phase 1: Core Extraction Tools (IMPLEMENTED)

#### 1. `web_extract_divi_structure_cremotemcp`
**Purpose:** Extract page structure from CSS classes and DOM analysis
**Accuracy:** 60-70% (approximation from rendered HTML)
**Returns:** Sections, rows, columns, modules with types and basic styling

**What it extracts:**
- Section types (regular, specialty, fullwidth) from CSS classes
- Parallax flags
- Background colors and images (computed styles)
- Row and column structures
- Module types (text, image, button, blurb, CTA, slider, gallery, video)
- Module content and attributes

**Limitations:**
- Cannot access original Divi shortcode/JSON
- No builder settings (animations, responsive, custom CSS)
- Approximation based on CSS classes only

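The class-based detection described above can be sketched as a few pure functions over an element's class list. This is a minimal illustration, not the code in `daemon/daemon.go`: only `et_pb_text` and `et_pb_module` appear in this document, so every other class name in the lookup table, and the function names themselves, are assumptions about Divi's rendered markup.

```javascript
// Assumed mapping from Divi CSS classes to module types.
// Only et_pb_text is confirmed by this document; the rest are assumptions.
const MODULE_CLASS_TO_TYPE = new Map([
  ["et_pb_text", "text"],
  ["et_pb_image", "image"],
  ["et_pb_button", "button"],
  ["et_pb_blurb", "blurb"],
  ["et_pb_promo", "cta"], // assumption: CTA modules may render with this class
  ["et_pb_cta", "cta"],
  ["et_pb_slider", "slider"],
  ["et_pb_gallery", "gallery"],
  ["et_pb_video", "video"],
]);

// Return the first recognised module type, or "unknown".
function detectModuleType(classes) {
  for (const cls of classes) {
    if (MODULE_CLASS_TO_TYPE.has(cls)) return MODULE_CLASS_TO_TYPE.get(cls);
  }
  return "unknown";
}

// Section class names here are assumptions, not taken from this document.
function detectSectionType(classes) {
  if (classes.includes("et_pb_fullwidth_section")) return "fullwidth";
  if (classes.includes("et_section_specialty")) return "specialty";
  return "regular";
}

// Extract a column type such as "1_2" from a class like et_pb_column_1_2.
// Index classes such as et_pb_column_0 do not match and are ignored.
function detectColumnType(classes) {
  for (const cls of classes) {
    const match = /^et_pb_column_(\d+_\d+)$/.exec(cls);
    if (match) return match[1];
  }
  return null;
}
```

Because the result depends entirely on class names, a theme that renames or strips these classes yields `"unknown"` or `"regular"`, which is one reason the structure accuracy is stated as an approximation.
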
#### 2. `web_extract_divi_images_cremotemcp`
**Purpose:** Extract all images with metadata
**Accuracy:** 100% for visible images
**Returns:** Array of images with URLs, dimensions, alt text, context

**What it extracts:**
- Regular `<img>` elements with src, alt, title, dimensions
- Background images from CSS (computed styles)
- Context information for each image
- Background vs foreground image classification

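Background images arrive as a computed `background-image` string rather than a plain URL, so the value has to be parsed before it can be reported. The helper below is a hypothetical sketch of that step (the name `parseBackgroundImageUrls` is not from the implementation); it pulls every `url(...)` entry out of a value that may also contain gradients.

```javascript
// Extract all url(...) entries from a computed background-image value.
// Handles double-quoted, single-quoted, and unquoted URLs; returns []
// for "none" or for values that contain only gradients.
function parseBackgroundImageUrls(value) {
  if (!value || value === "none") return [];
  const urls = [];
  const pattern = /url\(\s*(?:"([^"]*)"|'([^']*)'|([^)\s]*))\s*\)/g;
  let match;
  while ((match = pattern.exec(value)) !== null) {
    const url = match[1] ?? match[2] ?? match[3];
    if (url) urls.push(url); // skip empty url("") entries
  }
  return urls;
}
```

In a browser the input would come from `window.getComputedStyle(el).backgroundImage`; an element with a non-empty result could then be classified as a background image, and `<img>` elements as foreground images.
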
#### 3. `web_extract_divi_content_cremotemcp`
**Purpose:** Extract text content and module data
**Accuracy:** 90-100% for visible content
**Returns:** All modules with content, images, and metadata

**What it extracts:**
- All Divi modules with type detection
- Text content (innerHTML)
- Button text and URLs
- Blurb titles and content
- CTA titles and content
- Image sources and attributes
- Total counts and extraction metadata

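The "total counts" in the result can be thought of as a tally of modules per detected type. The sketch below assumes each module object carries a `type` field, as in the structure output shown later in this document; the exact shape of the content result and the function name are assumptions.

```javascript
// Tally modules by their detected type.
// Assumes each module has a `type` string; missing types count as "unknown".
function countModulesByType(modules) {
  const counts = new Map();
  for (const mod of modules) {
    const type = mod.type || "unknown";
    counts.set(type, (counts.get(type) || 0) + 1);
  }
  return Object.fromEntries(counts);
}
```

A tally like this gives a quick sanity check when comparing an extraction against a known page, for example confirming that the number of buttons found matches the number visible on the page.
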
## Technical Implementation

### Files Modified

1. **mcp/main.go** (Lines 5627-5827)
   - Added 3 new MCP tool registrations
   - Integrated with existing `handleOptionalNavigation` helper
   - Proper error handling and JSON marshaling

2. **client/client.go** (Lines 4580-4750)
   - Added type definitions: `DiviStructure`, `DiviSection`, `DiviRow`, `DiviColumn`, `DiviModule`, `DiviImage`, `DiviContent`
   - Added 3 client methods: `ExtractDiviStructure`, `ExtractDiviImages`, `ExtractDiviContent`
   - Proper command sending and response parsing

3. **daemon/daemon.go** (Lines 2243-2303, 12882-13400)
   - Added 3 command handlers in switch statement
   - Added type definitions matching client types
   - Implemented 3 extraction methods with JavaScript execution
   - Comprehensive DOM traversal and CSS class analysis
   - Timeout handling and error recovery

### JavaScript Extraction Logic

Each extraction method uses browser-side JavaScript to:
1. Query DOM for Divi-specific CSS classes
2. Extract computed styles using `window.getComputedStyle()`
3. Traverse section → row → column → module hierarchy
4. Identify module types from CSS classes
5. Extract content using innerHTML, textContent, and attributes
6. Return structured JSON data

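The steps above can be sketched as a single traversal function. This is an illustrative outline rather than the script embedded in `daemon/daemon.go`: the selectors `.et_pb_section`, `.et_pb_row`, and `.et_pb_column` are assumed Divi class names (only `et_pb_module` appears in this document), and `extractStructure` is a hypothetical name. The style lookup is passed in as a parameter so the function can be exercised outside a browser.

```javascript
// Walk section -> row -> column -> module and collect classes, styles, content.
// `root` needs querySelectorAll; `getStyle` returns a computed-style-like object.
function extractStructure(root, getStyle) {
  const classesOf = (el) => Array.from(el.classList);
  const all = (el, selector) => Array.from(el.querySelectorAll(selector));

  const sections = all(root, ".et_pb_section").map((section) => {
    const style = getStyle(section);
    return {
      css_classes: classesOf(section),
      background_color: style.backgroundColor,
      background_image: style.backgroundImage,
      // Note: a descendant selector also matches nested inner rows;
      // this sketch does not try to separate them.
      rows: all(section, ".et_pb_row").map((row) => ({
        columns: all(row, ".et_pb_column").map((column) => ({
          css_classes: classesOf(column),
          modules: all(column, ".et_pb_module").map((mod) => ({
            css_classes: classesOf(mod),
            content: mod.innerHTML,
          })),
        })),
      })),
    };
  });

  return { sections };
}

// In a browser this would be called as:
//   extractStructure(document, (el) => window.getComputedStyle(el));
```

Returning plain objects and arrays keeps the result serialisable with `JSON.stringify`, which matches the final step of handing structured JSON back to the caller.
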
## Usage Examples

### Extract Structure
```json
{
  "tool": "web_extract_divi_structure_cremotemcp",
  "arguments": {
    "url": "https://example.com/divi-page",
    "clear_cache": true,
    "timeout": 30
  }
}
```

### Extract Images
```json
{
  "tool": "web_extract_divi_images_cremotemcp",
  "arguments": {
    "url": "https://example.com/divi-page",
    "timeout": 30
  }
}
```

### Extract Content
```json
{
  "tool": "web_extract_divi_content_cremotemcp",
  "arguments": {
    "url": "https://example.com/divi-page",
    "timeout": 30
  }
}
```

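The tool/arguments pairs shown in these examples are a shorthand. On the wire, an MCP client sends a JSON-RPC 2.0 `tools/call` request whose `params` carry the tool name and arguments. The helper below is a hypothetical sketch of that wrapping; how cremote's own client or a given MCP host builds and transports the request is not described in this document.

```javascript
// Wrap a { tool, arguments } pair in an MCP JSON-RPC 2.0 tools/call request.
function toToolsCallRequest(id, call) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: call.tool, arguments: call.arguments },
  };
}

// Example: the structure extraction call from this section.
const request = toToolsCallRequest(1, {
  tool: "web_extract_divi_structure_cremotemcp",
  arguments: { url: "https://example.com/divi-page", clear_cache: true, timeout: 30 },
});
```
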
## Expected Output Format

### Structure Output
```json
{
  "url": "https://example.com/page",
  "sections": [
    {
      "type": "regular",
      "has_parallax": false,
      "background_color": "rgb(255,255,255)",
      "background_image": "url(...)",
      "rows": [
        {
          "column_structure": "1_2,1_2",
          "columns": [
            {
              "type": "1_2",
              "modules": [
                {
                  "type": "text",
                  "content": "<p>...</p>",
                  "css_classes": ["et_pb_text", "et_pb_module"]
                }
              ]
            }
          ]
        }
      ]
    }
  ],
  "metadata": {
    "extraction_date": "2025-01-16T...",
    "accuracy": "60-70% (approximation from CSS classes)",
    "limitations": "Cannot access original Divi shortcode/JSON or builder settings"
  }
}
```

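A consumer of the structure output usually needs the modules as a flat list rather than a four-level tree. The sketch below walks the `sections`, `rows`, `columns`, and `modules` fields shown above and records each module's position; the function name and the `path` format are illustrative assumptions.

```javascript
// Flatten the nested structure output into one entry per module.
// `path` is "section.row.column.module" using zero-based indexes.
function flattenModules(structure) {
  const out = [];
  (structure.sections || []).forEach((section, s) => {
    (section.rows || []).forEach((row, r) => {
      (row.columns || []).forEach((column, c) => {
        (column.modules || []).forEach((mod, m) => {
          out.push({
            path: `${s}.${r}.${c}.${m}`,
            section_type: section.type,
            column_type: column.type,
            type: mod.type,
          });
        });
      });
    });
  });
  return out;
}
```

Missing arrays are treated as empty, so a partially populated result (for example a section with no rows) does not throw.
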
## Next Steps (Future Enhancements)

### Phase 2: Supporting Tools (Not Yet Implemented)
- `web_download_images_bulk_cremotemcp` - Download multiple images from URLs
- `web_extract_divi_backgrounds_cremotemcp` - Extract background images and gradients with enhanced metadata

### Phase 3: Integration Tools (Future)
- WordPress MCP integration for page recreation
- Image upload orchestration
- Content mapping and transformation

## Testing Recommendations

1. Test on multiple Divi sites with different structures
2. Verify extraction accuracy against known pages
3. Test with various module types (text, image, button, blurb, CTA, slider, gallery)
4. Test with complex layouts (specialty sections, nested rows)
5. Test with background images and gradients
6. Verify timeout handling with slow-loading pages

## Known Limitations

As documented in `feedback/CREMOTE_REALITY_CHECK.md`:

1. **Cannot Extract:**
   - Original Divi shortcode/JSON
   - Builder settings (animations, responsive, custom CSS)
   - Advanced module configurations
   - Dynamic content sources (ACF fields)

2. **Accuracy:**
   - Structure: 60-70% (approximation from CSS classes)
   - Content: 90-100% (visible content)
   - Styling: 50-60% (computed styles only)

3. **Use Cases:**
   - ✅ Competitive analysis
   - ✅ External site recreation (with manual refinement)
   - ✅ Quick prototyping
   - ❌ Production migrations (too inaccurate)
   - ❌ Exact recreations (impossible without API)

## Conclusion

The three extraction tools are implemented and ready for testing. They provide a foundation for extracting Divi page data from external sites, with the limitations and expected accuracy levels documented above.
