# Crawlshot PHP Client Library Documentation

The Crawlshot PHP Client Library provides a clean, fluent interface for interacting with Crawlshot API services. Designed specifically for Laravel applications, it offers typed responses, method chaining, and comprehensive webhook support.

## Installation & Setup

### 1. Install via Composer

```bash
composer require crawlshot/laravel
```

### 2. Configuration

**Option A: Direct instantiation**

```php
use Crawlshot\Laravel\CrawlshotClient;

$client = new CrawlshotClient('https://crawlshot.test', 'your-api-token');
```

**Option B: Environment variables (recommended)**

```php
# .env
CRAWLSHOT_BASE_URL=https://crawlshot.test
CRAWLSHOT_TOKEN=1|rrWUM5ZkmLfGipkm1oIusYX45KbukIekUwMjgB3Nd1121a5c

# In your code
$client = new CrawlshotClient(
    env('CRAWLSHOT_BASE_URL'),
    env('CRAWLSHOT_TOKEN')
);
```

### 3. Service Provider (Optional)

For application-wide configuration, create a service provider:

```php
// app/Providers/CrawlshotServiceProvider.php
namespace App\Providers;

use Crawlshot\Laravel\CrawlshotClient;
use Illuminate\Support\ServiceProvider;

class CrawlshotServiceProvider extends ServiceProvider
{
    public function register()
    {
        // Bind a single shared client instance into the container
        $this->app->singleton(CrawlshotClient::class, function ($app) {
            return new CrawlshotClient(
                config('services.crawlshot.base_url'),
                config('services.crawlshot.token')
            );
        });
    }
}

// config/services.php
'crawlshot' => [
    'base_url' => env('CRAWLSHOT_BASE_URL'),
    'token' => env('CRAWLSHOT_TOKEN'),
],
```

---

## Basic Usage

### Simple HTML Crawling

```php
use Crawlshot\Laravel\CrawlshotClient;

$client = new CrawlshotClient('https://crawlshot.test', 'your-token');

// Create crawl job
$response = $client->createCrawl('https://example.com');
echo "Job UUID: " . $response['uuid']; // Raw array response

// Check status
$status = $client->getCrawlStatus($response['uuid']);
echo "Status: " . $status->getStatus(); // Typed response object

if ($status->isCompleted()) {
    $html = $status->getResultRaw();
    echo "HTML content: " . substr($html, 0, 200) . "...";
}
```

### Simple Screenshot Capture

```php
// Create screenshot job
$response = $client->createShot('https://example.com');

// Check status
$status = $client->getShotStatus($response['uuid']);

if ($status->isCompleted()) {
    echo "Format: " . $status->getFormat(); // webp
    echo "Size: " . implode('x', $status->getDimensions()); // [1920, 1080]

    // Get image data
    $imageData = $status->getImageData(); // base64
    $imageFile = $status->downloadImage(); // binary data
}
```

---

## Fluent Interface

The client provides a fluent interface for building complex requests with method chaining.

### Fluent HTML Crawling

```php
$crawl = $client->crawl('https://example.com')
    ->timeout(60)
    ->delay(2000)
    ->blockAds(true)
    ->blockCookieBanners(true)
    ->blockTrackers(true)
    ->waitUntilNetworkIdle(true)
    ->webhookUrl('https://myapp.com/webhooks/crawlshot')
    ->webhookEventsFilter(['completed', 'failed'])
    ->create(); // Returns CrawlResponse

echo "Job created: " . $crawl->getUuid();
echo "Status: " . $crawl->getStatus();

// Wait for completion
while ($crawl->isProcessing() || $crawl->isQueued()) {
    sleep(2);
    $crawl->refresh(); // Updates from API
}

if ($crawl->isCompleted()) {
    $html = $crawl->getResultRaw();
    file_put_contents('page.html', $html);
}
```

### Fluent Screenshot Capture

```php
$screenshot = $client->shot('https://example.com')
    ->viewportSize(1200, 800)
    ->quality(85)
    ->timeout(30)
    ->delay(1000)
    ->blockAds(true)
    ->webhookUrl('https://myapp.com/webhooks/crawlshot')
    ->webhookEventsFilter(['completed'])
    ->create(); // Returns ShotResponse

echo "Screenshot job: " . $screenshot->getUuid();

// Poll until complete
while (!$screenshot->isCompleted() && !$screenshot->isFailed()) {
    sleep(3);
    $screenshot->refresh();
}

if ($screenshot->isCompleted()) {
    // Save image
    $imageData = $screenshot->downloadImage();
    file_put_contents('screenshot.webp', $imageData);

    echo "Saved {$screenshot->getWidth()}x{$screenshot->getHeight()} image";
}
```

### Available Fluent Methods

#### CrawlJobBuilder Methods

```php
$client->crawl($url)
    ->webhookUrl(string $url)                  // Webhook notification URL
    ->webhookEventsFilter(array $events)       // ['queued', 'processing', 'completed', 'failed']
    ->timeout(int $seconds)                    // Request timeout (5-300)
    ->delay(int $milliseconds)                 // Delay before capture (0-30000)
    ->blockAds(bool $block = true)             // Block ads via EasyList
    ->blockCookieBanners(bool $block = true)   // Block cookie banners
    ->blockTrackers(bool $block = true)        // Block tracking scripts
    ->waitUntilNetworkIdle(bool $wait = true)  // Wait for network idle
    ->create();                                // Execute and return CrawlResponse
```

#### ShotJobBuilder Methods

```php
$client->shot($url)
    ->webhookUrl(string $url)                  // Webhook notification URL
    ->webhookEventsFilter(array $events)       // ['queued', 'processing', 'completed', 'failed']
    ->viewportSize(int $width, int $height)    // Viewport dimensions
    ->quality(int $quality)                    // Image quality 1-100
    ->timeout(int $seconds)                    // Request timeout (5-300)
    ->delay(int $milliseconds)                 // Delay before capture (0-30000)
    ->blockAds(bool $block = true)             // Block ads via EasyList
    ->blockCookieBanners(bool $block = true)   // Block cookie banners
    ->blockTrackers(bool $block = true)        // Block tracking scripts
    ->create();                                // Execute and return ShotResponse
```

---

## Response Objects

The client library provides typed response objects that make it easy to work with job results.
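
To make the idea of a typed response concrete, here is a minimal sketch of a wrapper over a raw status payload. `JobStatusView` is a hypothetical class written purely for illustration and is not part of the Crawlshot library; it assumes only the `uuid`, `status`, and `error` keys that appear in the payloads shown in this document, and the empty-string defaults are also an assumption.

```php
<?php

// Hypothetical illustration only: NOT the library's CrawlResponse/ShotResponse.
// It wraps a raw status array and exposes typed accessors and status checks.
final class JobStatusView
{
    /** @param array<string, mixed> $data Raw payload, e.g. a decoded status response */
    public function __construct(private array $data)
    {
    }

    public function getUuid(): string
    {
        return (string) ($this->data['uuid'] ?? '');
    }

    public function getStatus(): string
    {
        return (string) ($this->data['status'] ?? '');
    }

    public function getError(): ?string
    {
        return isset($this->data['error']) ? (string) $this->data['error'] : null;
    }

    public function isCompleted(): bool
    {
        return $this->getStatus() === 'completed';
    }

    public function isFailed(): bool
    {
        return $this->getStatus() === 'failed';
    }

    // True once the job has reached a terminal state (completed or failed).
    public function isFinished(): bool
    {
        return $this->isCompleted() || $this->isFailed();
    }
}

$view = new JobStatusView(['uuid' => 'abc-123', 'status' => 'failed', 'error' => 'Timeout']);
echo $view->getUuid(), ' finished: ', var_export($view->isFinished(), true), PHP_EOL;
```

Compared with indexing into a raw array, accessors like these give the caller a declared return type and a single place to handle missing keys, which is the convenience the response objects below are described as offering.
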
### Common Methods (Both CrawlResponse & ShotResponse)

```php
// Job information
$response->getUuid(): string             // Job UUID
$response->getStatus(): string           // queued|processing|completed|failed
$response->getUrl(): string              // Original URL
$response->getCreatedAt(): \DateTime     // Job creation time
$response->getStartedAt(): ?\DateTime    // Processing start time (null if not started)
$response->getCompletedAt(): ?\DateTime  // Completion time (null if not completed)
$response->getError(): ?string           // Error message (null if no error)

// Status checks
$response->isQueued(): bool              // Job waiting to start
$response->isProcessing(): bool          // Job currently running
$response->isCompleted(): bool           // Job finished successfully
$response->isFailed(): bool              // Job encountered error

// Utility methods
$response->refresh(): static             // Refresh from API
$response->getRawResponse(): array       // Original API response
$response->getResult(): ?array           // Result data (null if not completed)
```

### CrawlResponse Specific Methods

```php
// HTML content access
$crawl->getResultRaw(): ?string          // Raw HTML content
$crawl->getResultUrl(): ?string          // Download URL (/api/crawl/{uuid}.html)
$crawl->downloadHtml(): ?string          // Direct download HTML content

// Example usage
if ($crawl->isCompleted()) {
    $html = $crawl->getResultRaw();
    $downloadUrl = $crawl->getResultUrl();

    // Or download directly
    $htmlContent = $crawl->downloadHtml();
    file_put_contents('page.html', $htmlContent);
}
```

### ShotResponse Specific Methods

```php
// Image data access
$shot->getImageData(): ?string           // Base64 encoded image
$shot->getImageUrl(): ?string            // Download URL (/api/shot/{uuid}.webp)
$shot->downloadImage(): ?string          // Direct download binary data

// Image metadata
$shot->getMimeType(): ?string            // image/webp
$shot->getFormat(): ?string              // webp
$shot->getWidth(): ?int                  // Image width in pixels
$shot->getHeight(): ?int                 // Image height in pixels
$shot->getSize(): ?int                   // File size in bytes
$shot->getDimensions(): ?array           // [width, height] or null

// Example usage
if ($shot->isCompleted()) {
    $imageData = $shot->getImageData();     // Base64
    $imageBinary = $shot->downloadImage();  // Binary
    $dimensions = $shot->getDimensions();   // [1920, 1080]

    echo "Format: {$shot->getFormat()}";                 // webp
    echo "Size: {$dimensions[0]}x{$dimensions[1]}";      // 1920x1080
    echo "File size: {$shot->getSize()} bytes";          // 45678 bytes
}
```

---

## Webhook Integration

Webhooks provide real-time notifications when job statuses change, eliminating the need for constant polling.

### Basic Webhook Setup

```php
// Configure webhook when creating jobs
$crawl = $client->crawl('https://example.com')
    ->webhookUrl('https://myapp.com/webhooks/crawlshot')
    ->webhookEventsFilter(['completed', 'failed'])
    ->create();

// Your webhook endpoint receives the same data as status APIs
```

### Webhook Event Filtering

Control which status changes trigger webhooks:

```php
// Only notify on completion
->webhookEventsFilter(['completed'])

// Only notify on completion or failure
->webhookEventsFilter(['completed', 'failed'])

// Notify on all status changes (default)
->webhookEventsFilter(['queued', 'processing', 'completed', 'failed'])

// Disable webhooks entirely
->webhookEventsFilter([])
```

### Webhook Handler Example

```php
// routes/web.php or routes/api.php
use Illuminate\Http\Request;
use Illuminate\Support\Facades\Log;
use Illuminate\Support\Facades\Route;

Route::post('/webhooks/crawlshot', function (Request $request) {
    $jobData = $request->all();

    // The webhook payload is identical to GET /api/crawl/{uuid} response
    $uuid = $jobData['uuid'];
    $status = $jobData['status'];
    $url = $jobData['url'];

    switch ($status) {
        case 'completed':
            if (isset($jobData['result']['html'])) {
                // Handle crawl completion
                $html = $jobData['result']['html']['raw'];
                // Process HTML content...
            } elseif (isset($jobData['result']['image'])) {
                // Handle screenshot completion
                $imageUrl = $jobData['result']['image']['url'];
                $dimensions = [$jobData['result']['width'], $jobData['result']['height']];
                // Process screenshot...
            }
            break;

        case 'failed':
            $error = $jobData['error'];
            Log::error("Crawlshot job {$uuid} failed: {$error}");
            break;

        case 'processing':
            Log::info("Crawlshot job {$uuid} started processing");
            break;
    }

    return response('OK', 200);
});
```

### Webhook Error Management

When webhooks fail, you can manage them through the client:

```php
// List all jobs with failed webhooks
$errors = $client->listWebhookErrors();

foreach ($errors['jobs'] as $job) {
    echo "Job {$job['uuid']} webhook failed: {$job['webhook_last_error']}\n";
    echo "Attempts: {$job['webhook_attempts']}\n";

    // Retry immediately
    $client->retryWebhook($job['uuid']);

    // Or clear the error without retrying
    // $client->clearWebhookError($job['uuid']);
}
```

---

## Advanced Configuration

### Custom Options

```php
// Advanced crawling options
$crawl = $client->crawl('https://spa-website.com')
    ->timeout(120)                  // Long timeout for slow sites
    ->delay(3000)                   // Wait 3 seconds for JS
    ->waitUntilNetworkIdle(true)    // Wait for AJAX requests
    ->blockAds(false)               // Allow ads for testing
    ->blockCookieBanners(true)      // But block cookie banners
    ->webhookUrl('https://myapp.com/webhook')
    ->create();

// High-quality screenshots
$shot = $client->shot('https://dashboard.example.com')
    ->viewportSize(2560, 1440)      // High resolution
    ->quality(95)                   // High quality
    ->delay(5000)                   // Wait for dashboard to load
    ->blockAds(true)                // Clean screenshot
    ->create();
```

### Batch Processing

```php
$urls = ['https://site1.com', 'https://site2.com', 'https://site3.com'];
$jobs = [];

// Create multiple jobs
foreach ($urls as $url) {
    $job = $client->crawl($url)
        ->webhookUrl('https://myapp.com/webhook')
        ->create();

    $jobs[] = $job;
    echo "Created job: {$job->getUuid()}\n";
}

// Monitor all jobs
while (true) {
    $completed = 0;
    $failed = 0;

    foreach ($jobs as $job) {
        $job->refresh();

        if ($job->isCompleted()) $completed++;
        if ($job->isFailed()) $failed++;
    }

    echo "Progress: {$completed} completed, {$failed} failed\n";

    if ($completed + $failed === count($jobs)) {
        break; // All jobs done
    }

    sleep(5);
}

// Process results
foreach ($jobs as $job) {
    if ($job->isCompleted()) {
        $html = $job->getResultRaw();
        // Process HTML...
    }
}
```

---

## Error Handling

### Exception Handling

```php
use Crawlshot\Laravel\CrawlshotClient;

try {
    $client = new CrawlshotClient('https://crawlshot.test', 'invalid-token');
    $response = $client->createCrawl('https://example.com');
} catch (\Exception $e) {
    if (str_contains($e->getMessage(), 'Unauthenticated')) {
        echo "Invalid API token\n";
    } elseif (str_contains($e->getMessage(), '422')) {
        echo "Validation error: " . $e->getMessage();
    } else {
        echo "API error: " . $e->getMessage();
    }
}
```

### Response Validation

```php
$shot = $client->getShotStatus($uuid);

// Always check status before accessing results
if ($shot->isCompleted()) {
    $imageData = $shot->getImageData();

    if ($imageData) {
        file_put_contents('screenshot.webp', base64_decode($imageData));
    } else {
        echo "No image data available\n";
    }
} elseif ($shot->isFailed()) {
    echo "Screenshot failed: " . $shot->getError();
} else {
    echo "Still processing... Status: " . $shot->getStatus();
}
```

### Common Issues & Solutions

**1. Connection Timeout**

```php
// Increase timeout for slow networks
$crawl = $client->crawl($url)->timeout(300)->create(); // 5 minutes
```

**2. Invalid URLs**

```php
// Validate URLs before sending
if (filter_var($url, FILTER_VALIDATE_URL)) {
    $crawl = $client->crawl($url)->create();
} else {
    echo "Invalid URL: {$url}";
}
```

**3. Large Files**

```php
// Handle large responses
$shot = $client->getShotStatus($uuid);

if ($shot->isCompleted()) {
    $size = $shot->getSize();

    if ($size > 10 * 1024 * 1024) { // 10MB
        echo "Large file ({$size} bytes), downloading directly...";
        $imageData = $shot->downloadImage(); // More memory efficient
    } else {
        $imageData = $shot->getImageData(); // Base64
    }
}
```

---

## Best Practices

### 1. Use Webhooks for Production

```php
// ❌ Polling (inefficient)
do {
    sleep(5);
    $status = $client->getCrawlStatus($uuid);
} while ($status->isQueued() || $status->isProcessing()); // keep waiting while the job is not yet finished

// ✅ Webhooks (efficient)
$crawl = $client->crawl($url)
    ->webhookUrl('https://myapp.com/webhook')
    ->create();
```

### 2. Handle Failures Gracefully

```php
$crawl = $client->crawl($url)
    ->timeout(60)
    ->webhookEventsFilter(['completed', 'failed']) // Include 'failed' events
    ->create();

// In webhook handler
if ($jobData['status'] === 'failed') {
    // Log error and potentially retry with different settings
    Log::error("Crawl failed for {$jobData['url']}: {$jobData['error']}");

    // Maybe retry with longer timeout
    $retry = $client->crawl($jobData['url'])
        ->timeout(120)
        ->create();
}
```

### 3. Use Environment-Specific Configuration

```php
# .env.production
CRAWLSHOT_BASE_URL=https://crawlshot.production.com
CRAWLSHOT_TOKEN=prod_token_here

# .env.development
CRAWLSHOT_BASE_URL=https://crawlshot.test
CRAWLSHOT_TOKEN=dev_token_here

# .env.testing
CRAWLSHOT_BASE_URL=https://crawlshot.staging.com
CRAWLSHOT_TOKEN=test_token_here
```

### 4. Implement Proper Error Logging

```php
try {
    $crawl = $client->crawl($url)->create();
} catch (\Exception $e) {
    Log::channel('crawlshot')->error('Crawl creation failed', [
        'url' => $url,
        'error' => $e->getMessage(),
        'trace' => $e->getTraceAsString()
    ]);

    throw $e; // Re-throw if needed
}
```

### 5. Monitor Webhook Failures

```php
use Crawlshot\Laravel\CrawlshotClient;
use Illuminate\Support\Facades\Log;
use Illuminate\Support\Facades\Schedule;

// Scheduled job to check webhook failures
Schedule::call(function () {
    $client = app(CrawlshotClient::class);
    $errors = $client->listWebhookErrors();

    if ($errors['pagination']['total_items'] > 0) {
        Log::warning('Webhook failures detected', [
            'count' => $errors['pagination']['total_items']
        ]);

        // Optionally retry recent failures
        foreach ($errors['jobs'] as $job) {
            if ($job['webhook_attempts'] < 3) { // Don't retry too many times
                $client->retryWebhook($job['uuid']);
            }
        }
    }
})->hourly();
```

---

## Complete Examples

### Content Monitoring System

```php
use Crawlshot\Laravel\CrawlshotClient;
use Illuminate\Support\Facades\Mail;

class ContentMonitor
{
    private CrawlshotClient $client;

    public function __construct(CrawlshotClient $client)
    {
        $this->client = $client;
    }

    public function monitorWebsite(string $url): void
    {
        $crawl = $this->client->crawl($url)
            ->blockAds(true)
            ->blockCookieBanners(true)
            ->timeout(60)
            ->webhookUrl(route('webhook.crawlshot'))
            ->webhookEventsFilter(['completed', 'failed'])
            ->create();

        // Store job info for later processing
        MonitorJob::create([
            'uuid' => $crawl->getUuid(),
            'url' => $url,
            'status' => 'queued',
            'created_at' => now()
        ]);
    }

    public function handleWebhook(array $data): void
    {
        $monitorJob = MonitorJob::where('uuid', $data['uuid'])->first();

        if (!$monitorJob) return;

        $monitorJob->update(['status' => $data['status']]);

        if ($data['status'] === 'completed') {
            $html = $data['result']['html']['raw'];

            // Check for changes against the most recent stored hash for this URL.
            // Each crawl creates a new MonitorJob row, so the current row has
            // no hash yet and is excluded by whereNotNull().
            $previousHash = MonitorJob::where('url', $monitorJob->url)
                ->whereNotNull('content_hash')
                ->latest()
                ->value('content_hash');
            $currentHash = md5($html);

            if ($previousHash && $previousHash !== $currentHash) {
                // Content changed, send notification
                Mail::to('admin@example.com')->send(
                    new ContentChangedNotification($monitorJob->url, $html)
                );
            }

            $monitorJob->update(['content_hash' => $currentHash]);
        }
    }
}
```

### Screenshot Gallery Generator

```php
use Crawlshot\Laravel\CrawlshotClient;
use Illuminate\Support\Facades\Storage;

class ScreenshotGallery
{
    private CrawlshotClient $client;

    // The typed property must be initialized before use
    public function __construct(CrawlshotClient $client)
    {
        $this->client = $client;
    }

    public function generateGallery(array $urls): array
    {
        $jobs = [];

        // Create all screenshot jobs
        foreach ($urls as $url) {
            $shot = $this->client->shot($url)
                ->viewportSize(1200, 800)
                ->quality(80)
                ->blockAds(true)
                ->delay(2000)
                ->webhookUrl(route('webhook.screenshot'))
                ->create();

            $jobs[] = [
                'uuid' => $shot->getUuid(),
                'url' => $url,
                'response' => $shot
            ];
        }

        return $jobs;
    }

    public function handleScreenshotWebhook(array $data): void
    {
        if ($data['status'] === 'completed') {
            // Save screenshot to permanent storage
            $imageData = base64_decode($data['result']['image']['raw']);
            $filename = $data['uuid'] . '.webp';

            Storage::disk('public')->put("screenshots/{$filename}", $imageData);

            // Update database
            Screenshot::updateOrCreate(['uuid' => $data['uuid']], [
                'url' => $data['url'],
                'filename' => $filename,
                'width' => $data['result']['width'],
                'height' => $data['result']['height'],
                'size' => $data['result']['size'],
                'completed_at' => now()
            ]);
        }
    }
}
```

The Crawlshot PHP Client Library provides a developer-friendly interface for web crawling and screenshot capture. Its fluent interface, typed responses, and webhook support are designed to keep integration simple while leaving the advanced options accessible.