ct
2025-08-11 02:35:35 +08:00
parent 4a80723243
commit f3c91b9a64
24 changed files with 2035 additions and 214 deletions

@@ -57,8 +57,7 @@ ### Supported Parameters (mapped to Browsershot capabilities)
 - `url`: Target URL to screenshot
 - `viewport_width`: Viewport width (via `windowSize()` method)
 - `viewport_height`: Viewport height (via `windowSize()` method)
-- `format`: jpg, png, webp (via Imagick post-processing)
-- `quality`: Image quality 1-100 for JPEG (via `setScreenshotType('jpeg', quality)`)
+- `quality`: WebP image quality 1-100 (via `setScreenshotType('webp', quality)`)
 - `block_ads`: true/false - Uses EasyList filter for ad blocking
 - `block_cookie_banners`: true/false - Uses cookie banner blocking patterns
 - `block_trackers`: true/false - Uses tracker blocking patterns
@@ -164,7 +163,7 @@ ### Directory Structure
 storage/app/crawlshot/ # Temporary result storage (24h TTL)
 ├── html/ # HTML crawl results
-└── images/ # Screenshot files (JPEG/PNG/WebP)
+└── images/ # Screenshot files (.webp)
 routes/
 └── api.php # /crawl endpoints with Sanctum auth
@@ -173,10 +172,10 @@ ### Directory Structure
 ### Browsershot Configuration
 ```php
-// Basic screenshot configuration with EasyList ad blocking
+// Basic screenshot configuration with EasyList ad blocking
 $browsershot = Browsershot::url($url)
 ->windowSize($width, $height)
-->setScreenshotType('png') // Save as PNG first for Imagick processing
+->setScreenshotType('webp', $quality) // Always WebP format
 ->setDelay($delayInMs)
 ->waitUntilNetworkIdle()
 ->timeout($timeoutInSeconds);
@@ -188,17 +187,9 @@ ### Browsershot Configuration
 $browsershot->blockDomains($blockedDomains)->blockUrls($blockedUrls);
 }
-$tempPath = storage_path('temp_screenshot.png');
+$tempPath = storage_path('temp_screenshot.webp');
 $browsershot->save($tempPath);
-// Convert to desired format using Imagick if needed
-if ($format === 'webp') {
-$imagick = new Imagick($tempPath);
-$imagick->setImageFormat('webp');
-$imagick->writeImage($finalPath);
-unlink($tempPath);
-}
 // HTML crawling configuration with EasyList filtering
 $browsershot = Browsershot::url($url)
 ->setDelay($delayInMs)
@@ -225,7 +216,7 @@ ### Job States
 ### Storage Strategy
 - HTML results: `storage/app/crawlshot/html/{uuid}.html`
-- Image results: `storage/app/crawlshot/images/{uuid}.jpg`, `.png`, or `.webp`
+- Image results: `storage/app/crawlshot/images/{uuid}.webp` (WebP format only)
 - Auto-cleanup scheduled job removes files after 24 hours
 - Database tracks job metadata and file paths
@@ -238,7 +229,7 @@ ### Authentication & Security
 ### System Requirements
-- PHP 8.3+ with extensions: gd, imagick (required for WebP format)
+- PHP 8.3+ with extensions: gd (WebP support built into Puppeteer)
 - Node.js and npm for Puppeteer
 - Chrome/Chromium browser (headless)
 - Sufficient disk space for temporary file storage
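
The PHP-side requirements listed after this change (PHP 8.3+ and the `gd` extension) could be checked at startup with something like the sketch below. It is not part of the commit: `missingRequirements()` is a hypothetical helper name, and the sketch deliberately leaves out the Node.js, Puppeteer, and Chrome checks, which would need shell access.

```php
<?php
// Hypothetical helper (not from the repository): returns a list of
// unmet PHP-side requirements, or an empty array when all are met.
function missingRequirements(string $phpVersion, array $loadedExtensions): array
{
    $missing = [];

    // PHP 8.3 or newer is required.
    if (version_compare($phpVersion, '8.3.0', '<')) {
        $missing[] = 'PHP 8.3+';
    }

    // Compare extension names case-insensitively.
    $extensions = array_map('strtolower', $loadedExtensions);
    if (!in_array('gd', $extensions, true)) {
        $missing[] = 'gd extension';
    }

    return $missing;
}

// Usage: check the running interpreter.
$missing = missingRequirements(PHP_VERSION, get_loaded_extensions());
echo $missing === []
    ? "All PHP requirements met\n"
    : 'Missing: ' . implode(', ', $missing) . "\n";
```

Passing the version and extension list in as arguments keeps the check easy to exercise without changing the environment.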