Crawlshot PHP Client Library Documentation
The Crawlshot PHP Client Library provides a clean, fluent interface for interacting with Crawlshot API services. Designed specifically for Laravel applications, it offers typed responses, method chaining, and comprehensive webhook support.
Installation & Setup
1. Install via Composer
composer require crawlshot/laravel
2. Configuration
Option A: Direct instantiation
use Crawlshot\Laravel\CrawlshotClient;
$client = new CrawlshotClient('https://crawlshot.test', 'your-api-token');
Option B: Environment variables (recommended)
# .env
CRAWLSHOT_BASE_URL=https://crawlshot.test
CRAWLSHOT_TOKEN=1|rrWUM5ZkmLfGipkm1oIusYX45KbukIekUwMjgB3Nd1121a5c
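// In your code. Note that env() may return null outside config files once
// the configuration is cached, so prefer config() in cached deployments.
$client = new CrawlshotClient(
    env('CRAWLSHOT_BASE_URL'),
    env('CRAWLSHOT_TOKEN')
);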
3. Service Provider (Optional)
For application-wide configuration, create a service provider:
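// app/Providers/CrawlshotServiceProvider.php
namespace App\Providers;

use Crawlshot\Laravel\CrawlshotClient;
use Illuminate\Support\ServiceProvider;

class CrawlshotServiceProvider extends ServiceProvider
{
    public function register(): void
    {
        // Bind one shared client instance, configured from config/services.php
        $this->app->singleton(CrawlshotClient::class, function ($app) {
            return new CrawlshotClient(
                config('services.crawlshot.base_url'),
                config('services.crawlshot.token')
            );
        });
    }
}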
// config/services.php
'crawlshot' => [
'base_url' => env('CRAWLSHOT_BASE_URL'),
'token' => env('CRAWLSHOT_TOKEN'),
],
Basic Usage
Simple HTML Crawling
use Crawlshot\Laravel\CrawlshotClient;
$client = new CrawlshotClient('https://crawlshot.test', 'your-token');
// Create crawl job
$response = $client->createCrawl('https://example.com');
echo "Job UUID: " . $response['uuid']; // Raw array response
// Check status
$status = $client->getCrawlStatus($response['uuid']);
echo "Status: " . $status->getStatus(); // Typed response object
if ($status->isCompleted()) {
$html = $status->getResultRaw();
echo "HTML content: " . substr($html, 0, 200) . "...";
}
Simple Screenshot Capture
// Create screenshot job
$response = $client->createShot('https://example.com');
// Check status
$status = $client->getShotStatus($response['uuid']);
if ($status->isCompleted()) {
echo "Format: " . $status->getFormat(); // webp
echo "Size: " . implode('x', $status->getDimensions()); // e.g. 1920x1080
// Get image data
$imageData = $status->getImageData(); // base64
$imageFile = $status->downloadImage(); // binary data
}
Fluent Interface
The client provides a powerful fluent interface for building complex requests with method chaining.
Fluent HTML Crawling
$crawl = $client->crawl('https://example.com')
->timeout(60)
->delay(2000)
->blockAds(true)
->blockCookieBanners(true)
->blockTrackers(true)
// Network idle waiting is always enabled for optimal rendering
->webhookUrl('https://myapp.com/webhooks/crawlshot')
->webhookEventsFilter(['completed', 'failed'])
->create(); // Returns CrawlResponse
echo "Job created: " . $crawl->getUuid();
echo "Status: " . $crawl->getStatus();
// Wait for completion
while ($crawl->isProcessing() || $crawl->isQueued()) {
sleep(2);
$crawl->refresh(); // Updates from API
}
if ($crawl->isCompleted()) {
$html = $crawl->getResultRaw();
file_put_contents('page.html', $html);
}
Fluent Screenshot Capture
$screenshot = $client->shot('https://example.com')
->viewportSize(1200, 800)
->quality(85)
->timeout(30)
->delay(1000)
->blockAds(true)
->webhookUrl('https://myapp.com/webhooks/crawlshot')
->webhookEventsFilter(['completed'])
->create(); // Returns ShotResponse
echo "Screenshot job: " . $screenshot->getUuid();
// Poll until complete
while (!$screenshot->isCompleted() && !$screenshot->isFailed()) {
sleep(3);
$screenshot->refresh();
}
if ($screenshot->isCompleted()) {
// Save image
$imageData = $screenshot->downloadImage();
file_put_contents('screenshot.webp', $imageData);
echo "Saved {$screenshot->getWidth()}x{$screenshot->getHeight()} image";
}
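The polling loops above keep running for as long as the job has not finished, with no upper bound on the wait. A minimal sketch of a bounded polling helper follows. `pollUntil` and its parameters are hypothetical names introduced here for illustration and are not part of the Crawlshot client; the helper only assumes that "is the job done yet?" can be expressed as a callable.

```php
<?php

/**
 * Calls $check repeatedly until it returns true or the deadline passes.
 * Hypothetical helper: not part of the Crawlshot client.
 *
 * @param callable(): bool $check          Returns true once the job is finished.
 * @param float            $timeoutSeconds Maximum total time to wait.
 * @param int              $intervalMs     Pause between checks, in milliseconds.
 * @return bool True if $check succeeded, false if the deadline was reached.
 */
function pollUntil(callable $check, float $timeoutSeconds, int $intervalMs = 2000): bool
{
    $deadline = microtime(true) + $timeoutSeconds;

    while (true) {
        if ($check()) {
            return true;
        }
        if (microtime(true) >= $deadline) {
            return false;
        }
        usleep($intervalMs * 1000);
    }
}
```

With the screenshot job above, the check could be a closure that calls `$screenshot->refresh()` and then returns `$screenshot->isCompleted() || $screenshot->isFailed()`. A `false` return value then means the job was still pending when the deadline passed.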
Available Fluent Methods
CrawlJobBuilder Methods
$client->crawl($url)
->webhookUrl(string $url) // Webhook notification URL
->webhookEventsFilter(array $events) // ['queued', 'processing', 'completed', 'failed']
->timeout(int $seconds) // Request timeout (5-300)
->delay(int $milliseconds) // Delay before capture (0-30000)
->blockAds(bool $block = true) // Block ads via EasyList
->blockCookieBanners(bool $block = true) // Block cookie banners
->blockTrackers(bool $block = true) // Block tracking scripts
// waitUntilNetworkIdle is always enabled server-side for optimal rendering
->create(); // Execute and return CrawlResponse
ShotJobBuilder Methods
$client->shot($url)
->webhookUrl(string $url) // Webhook notification URL
->webhookEventsFilter(array $events) // ['queued', 'processing', 'completed', 'failed']
->viewportSize(int $width, int $height) // Viewport dimensions
->quality(int $quality) // Image quality 1-100
->timeout(int $seconds) // Request timeout (5-300)
->delay(int $milliseconds) // Delay before capture (0-30000)
->blockAds(bool $block = true) // Block ads via EasyList
->blockCookieBanners(bool $block = true) // Block cookie banners
->blockTrackers(bool $block = true) // Block tracking scripts
->create(); // Execute and return ShotResponse
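The ranges in the comments above (timeout 5-300 seconds, delay 0-30000 milliseconds, quality 1-100) can be checked locally before a job is created. The sketch below is one way to do that. `validateJobOptions` and the option keys it reads are hypothetical, chosen to mirror the builder method names, and how the API itself reports out-of-range values is not covered here.

```php
<?php

/**
 * Checks job options against the ranges listed in the method reference above.
 * Hypothetical helper; the option keys are illustrative, not the library's API.
 *
 * @param array<string, mixed> $options e.g. ['timeout' => 60, 'delay' => 2000]
 * @return string[] Human-readable problems; empty when everything is in range.
 */
function validateJobOptions(array $options): array
{
    // [min, max] per option, taken from the comments in the builder reference.
    $ranges = [
        'timeout' => [5, 300],     // seconds
        'delay'   => [0, 30000],   // milliseconds
        'quality' => [1, 100],     // screenshots only
    ];

    $problems = [];
    foreach ($ranges as $name => [$min, $max]) {
        if (!array_key_exists($name, $options)) {
            continue; // Option not set, nothing to check.
        }
        $value = $options[$name];
        if (!is_int($value) || $value < $min || $value > $max) {
            $problems[] = "{$name} must be an integer between {$min} and {$max}";
        }
    }

    return $problems;
}
```

An empty return value means the options are within the documented ranges. Anything else can be shown to the user or logged before any request is sent.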
Response Objects
The client library provides typed response objects that make it easy to work with job results.
Common Methods (Both CrawlResponse & ShotResponse)
// Job information
$response->getUuid(): string // Job UUID
$response->getStatus(): string // queued|processing|completed|failed
$response->getUrl(): string // Original URL
$response->getCreatedAt(): \DateTime // Job creation time
$response->getStartedAt(): ?\DateTime // Processing start time (null if not started)
$response->getCompletedAt(): ?\DateTime // Completion time (null if not completed)
$response->getError(): ?string // Error message (null if no error)
// Status checks
$response->isQueued(): bool // Job waiting to start
$response->isProcessing(): bool // Job currently running
$response->isCompleted(): bool // Job finished successfully
$response->isFailed(): bool // Job encountered error
// Utility methods
$response->refresh(): static // Refresh from API
$response->getRawResponse(): array // Original API response
$response->getResult(): ?array // Result data (null if not completed)
CrawlResponse Specific Methods
// HTML content access
$crawl->getResultRaw(): ?string // Raw HTML content
$crawl->getResultUrl(): ?string // Download URL (/api/crawl/{uuid}.html)
$crawl->downloadHtml(): ?string // Direct download HTML content
// Example usage
if ($crawl->isCompleted()) {
$html = $crawl->getResultRaw();
$downloadUrl = $crawl->getResultUrl();
// Or download directly
$htmlContent = $crawl->downloadHtml();
file_put_contents('page.html', $htmlContent);
}
ShotResponse Specific Methods
// Image data access
$shot->getImageData(): ?string // Base64 encoded image
$shot->getImageUrl(): ?string // Download URL (/api/shot/{uuid}.webp)
$shot->downloadImage(): ?string // Direct download binary data
// Image metadata
$shot->getMimeType(): ?string // image/webp
$shot->getFormat(): ?string // webp
$shot->getWidth(): ?int // Image width in pixels
$shot->getHeight(): ?int // Image height in pixels
$shot->getSize(): ?int // File size in bytes
$shot->getDimensions(): ?array // [width, height] or null
// Example usage
if ($shot->isCompleted()) {
$imageData = $shot->getImageData(); // Base64
$imageBinary = $shot->downloadImage(); // Binary
$dimensions = $shot->getDimensions(); // [1920, 1080]
echo "Format: {$shot->getFormat()}"; // webp
echo "Size: {$dimensions[0]}x{$dimensions[1]}"; // 1920x1080
echo "File size: {$shot->getSize()} bytes"; // 45678 bytes
}
Webhook Integration
Webhooks provide real-time notifications when job statuses change, eliminating the need for constant polling.
Basic Webhook Setup
// Configure webhook when creating jobs
$crawl = $client->crawl('https://example.com')
->webhookUrl('https://myapp.com/webhooks/crawlshot')
->webhookEventsFilter(['completed', 'failed'])
->create();
// Your webhook endpoint receives the same data as status APIs
Webhook Event Filtering
Control which status changes trigger webhooks:
// Only notify on completion
->webhookEventsFilter(['completed'])
// Only notify on completion or failure
->webhookEventsFilter(['completed', 'failed'])
// Notify on all status changes (default)
->webhookEventsFilter(['queued', 'processing', 'completed', 'failed'])
// Disable webhooks entirely
->webhookEventsFilter([])
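A typo in an event name (for example `complete` instead of `completed`) is easy to miss. As a rough sketch, the filter can be checked against the four statuses listed above before it is passed to `webhookEventsFilter()`. `unknownWebhookEvents` is a hypothetical helper, not a library function.

```php
<?php

/**
 * Returns the entries of $events that are not one of the four job statuses
 * named in this section. Hypothetical helper, not part of the client.
 *
 * @param string[] $events
 * @return string[] Unknown event names, in their original order.
 */
function unknownWebhookEvents(array $events): array
{
    $allowed = ['queued', 'processing', 'completed', 'failed'];

    // array_diff keeps the original keys, so reindex for a clean list.
    return array_values(array_diff($events, $allowed));
}
```

An empty result means every entry is a known status name.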
Webhook Handler Example
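// routes/web.php or routes/api.php
// Note: routes in web.php need a CSRF exemption for this path;
// routes in api.php get an /api prefix by default.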
Route::post('/webhooks/crawlshot', function (Request $request) {
$jobData = $request->all();
// The webhook payload is identical to GET /api/crawl/{uuid} response
$uuid = $jobData['uuid'];
$status = $jobData['status'];
$url = $jobData['url'];
switch ($status) {
case 'completed':
if (isset($jobData['result']['html'])) {
// Handle crawl completion
$html = $jobData['result']['html']['raw'];
// Process HTML content...
} elseif (isset($jobData['result']['image'])) {
// Handle screenshot completion
$imageUrl = $jobData['result']['image']['url'];
$dimensions = [$jobData['result']['width'], $jobData['result']['height']];
// Process screenshot...
}
break;
case 'failed':
$error = $jobData['error'];
Log::error("Crawlshot job {$uuid} failed: {$error}");
break;
case 'processing':
Log::info("Crawlshot job {$uuid} started processing");
break;
}
return response('OK', 200);
});
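The handler above tells crawl and screenshot jobs apart by looking for an `html` or `image` entry under `result`. That branching can be pulled into a small pure function that is easy to unit test without a running application. The sketch below assumes the payload keys shown in the handler; `summarizeWebhookPayload` and the shape it returns are hypothetical.

```php
<?php

/**
 * Reduces a webhook payload to the fields a handler typically acts on.
 * The array keys mirror the handler example above; the function itself
 * and the returned shape are hypothetical.
 *
 * @param array<string, mixed> $jobData Decoded webhook body.
 * @return array{uuid: ?string, status: ?string, type: string, error: ?string}
 */
function summarizeWebhookPayload(array $jobData): array
{
    $result = $jobData['result'] ?? [];
    $result = is_array($result) ? $result : [];

    // A crawl result carries an "html" entry, a screenshot an "image" entry.
    if (isset($result['html'])) {
        $type = 'crawl';
    } elseif (isset($result['image'])) {
        $type = 'shot';
    } else {
        $type = 'unknown'; // e.g. queued, processing, or failed jobs
    }

    return [
        'uuid'   => $jobData['uuid'] ?? null,
        'status' => $jobData['status'] ?? null,
        'type'   => $type,
        'error'  => $jobData['error'] ?? null,
    ];
}
```

Because missing keys fall back to `null` rather than raising notices, the function also tolerates partial or unexpected payloads.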
Webhook Error Management
When webhooks fail, you can manage them through the client:
// List all jobs with failed webhooks
$errors = $client->listWebhookErrors();
foreach ($errors['jobs'] as $job) {
echo "Job {$job['uuid']} webhook failed: {$job['webhook_last_error']}\n";
echo "Attempts: {$job['webhook_attempts']}\n";
// Retry immediately
$client->retryWebhook($job['uuid']);
// Or clear the error without retrying
// $client->clearWebhookError($job['uuid']);
}
Advanced Configuration
Custom Options
// Advanced crawling options
$crawl = $client->crawl('https://spa-website.com')
->timeout(120) // Long timeout for slow sites
->delay(3000) // Wait 3 seconds for JS
// Network idle waiting is always enabled for AJAX/dynamic content
->blockAds(false) // Allow ads for testing
->blockCookieBanners(true) // But block cookie banners
->webhookUrl('https://myapp.com/webhook')
->create();
// High-quality screenshots
$shot = $client->shot('https://dashboard.example.com')
->viewportSize(2560, 1440) // High resolution
->quality(95) // High quality
->delay(5000) // Wait for dashboard to load
->blockAds(true) // Clean screenshot
->create();
Batch Processing
$urls = ['https://site1.com', 'https://site2.com', 'https://site3.com'];
$jobs = [];
// Create multiple jobs
foreach ($urls as $url) {
$job = $client->crawl($url)
->webhookUrl('https://myapp.com/webhook')
->create();
$jobs[] = $job;
echo "Created job: {$job->getUuid()}\n";
}
// Monitor all jobs
while (true) {
$completed = 0;
$failed = 0;
foreach ($jobs as $job) {
$job->refresh();
if ($job->isCompleted()) $completed++;
if ($job->isFailed()) $failed++;
}
echo "Progress: {$completed} completed, {$failed} failed\n";
if ($completed + $failed === count($jobs)) {
break; // All jobs done
}
sleep(5);
}
// Process results
foreach ($jobs as $job) {
if ($job->isCompleted()) {
$html = $job->getResultRaw();
// Process HTML...
}
}
Error Handling
Exception Handling
use Crawlshot\Laravel\CrawlshotClient;
try {
$client = new CrawlshotClient('https://crawlshot.test', 'invalid-token');
$response = $client->createCrawl('https://example.com');
} catch (\Exception $e) {
if (str_contains($e->getMessage(), 'Unauthenticated')) {
echo "Invalid API token\n";
} elseif (str_contains($e->getMessage(), '422')) {
echo "Validation error: " . $e->getMessage();
} else {
echo "API error: " . $e->getMessage();
}
}
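If the same message checks are needed in several places, they can be collected in one function. The sketch below reuses the substring tests from the example above. It assumes the exception messages contain `Unauthenticated` or `422` exactly as that example implies, and `classifyApiError` is a hypothetical name. `str_contains()` requires PHP 8.

```php
<?php

/**
 * Maps an exception message to a coarse error category using the same
 * substring checks as the example above. Hypothetical helper: the exact
 * messages the client produces are an assumption carried over from that
 * example.
 */
function classifyApiError(\Throwable $e): string
{
    $message = $e->getMessage();

    if (str_contains($message, 'Unauthenticated')) {
        return 'auth';
    }
    if (str_contains($message, '422')) {
        return 'validation';
    }

    return 'other';
}
```

Callers can then branch on the returned category instead of repeating string comparisons.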
Response Validation
$shot = $client->getShotStatus($uuid);
// Always check status before accessing results
if ($shot->isCompleted()) {
$imageData = $shot->getImageData();
if ($imageData) {
file_put_contents('screenshot.webp', base64_decode($imageData));
} else {
echo "No image data available\n";
}
} elseif ($shot->isFailed()) {
echo "Screenshot failed: " . $shot->getError();
} else {
echo "Still processing... Status: " . $shot->getStatus();
}
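By default `base64_decode()` silently skips invalid characters, so a damaged payload could still be written to disk as a broken image. A minimal sketch of a stricter decode step follows. `decodeImageData` is a hypothetical helper, not a client method.

```php
<?php

/**
 * Decodes a base64 string, returning null instead of corrupt bytes when
 * the input is missing or malformed. Hypothetical helper.
 */
function decodeImageData(?string $base64): ?string
{
    if ($base64 === null || $base64 === '') {
        return null;
    }

    // Strict mode makes base64_decode() return false on invalid characters.
    $binary = base64_decode($base64, true);

    return $binary === false ? null : $binary;
}
```

In the example above, the result of `$shot->getImageData()` could be passed through this function and the file written only when the return value is not `null`.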
Common Issues & Solutions
1. Connection Timeout
// Increase timeout for slow networks
$crawl = $client->crawl($url)->timeout(300)->create(); // 5 minutes
2. Invalid URLs
// Validate URLs before sending
if (filter_var($url, FILTER_VALIDATE_URL)) {
$crawl = $client->crawl($url)->create();
} else {
echo "Invalid URL: {$url}";
}
3. Large Files
// Handle large responses
$shot = $client->getShotStatus($uuid);
if ($shot->isCompleted()) {
$size = $shot->getSize();
if ($size > 10 * 1024 * 1024) { // 10MB
echo "Large file ({$size} bytes), downloading directly...";
$imageData = $shot->downloadImage(); // More memory efficient
} else {
$imageData = $shot->getImageData(); // Base64
}
}
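One caveat on the URL check under "Invalid URLs": `FILTER_VALIDATE_URL` also accepts schemes such as `ftp://` and `mailto:`, which are unlikely to be useful crawl targets. The sketch below adds a scheme and host check on top of it. `isCrawlableUrl` is a hypothetical helper, and whether the API itself rejects other schemes is not stated in this document.

```php
<?php

/**
 * Returns true only for absolute http:// or https:// URLs with a host.
 * Hypothetical helper; whether the API itself rejects other schemes is
 * not stated in this document.
 */
function isCrawlableUrl(string $url): bool
{
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }

    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    $host = parse_url($url, PHP_URL_HOST);

    return in_array($scheme, ['http', 'https'], true)
        && is_string($host)
        && $host !== '';
}
```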
Best Practices
1. Use Webhooks for Production
// ❌ Polling (inefficient)
do {
    sleep(5);
    $status = $client->getCrawlStatus($uuid);
} while ($status->isQueued() || $status->isProcessing());
// ✅ Webhooks (efficient)
$crawl = $client->crawl($url)
->webhookUrl('https://myapp.com/webhook')
->create();
2. Handle Failures Gracefully
$crawl = $client->crawl($url)
->timeout(60)
->webhookEventsFilter(['completed', 'failed']) // Include 'failed' events
->create();
// In webhook handler
if ($jobData['status'] === 'failed') {
// Log error and potentially retry with different settings
Log::error("Crawl failed for {$jobData['url']}: {$jobData['error']}");
// Maybe retry with longer timeout
$retry = $client->crawl($jobData['url'])
->timeout(120)
->create();
}
3. Use Environment-Specific Configuration
# .env.production
CRAWLSHOT_BASE_URL=https://crawlshot.production.com
CRAWLSHOT_TOKEN=prod_token_here
# .env.development
CRAWLSHOT_BASE_URL=https://crawlshot.test
CRAWLSHOT_TOKEN=dev_token_here
# .env.testing
CRAWLSHOT_BASE_URL=https://crawlshot.staging.com
CRAWLSHOT_TOKEN=test_token_here
4. Implement Proper Error Logging
try {
$crawl = $client->crawl($url)->create();
} catch (\Exception $e) {
Log::channel('crawlshot')->error('Crawl creation failed', [
'url' => $url,
'error' => $e->getMessage(),
'trace' => $e->getTraceAsString()
]);
throw $e; // Re-throw if needed
}
5. Monitor Webhook Failures
// Scheduled job to check webhook failures
Schedule::call(function () {
$client = app(CrawlshotClient::class);
$errors = $client->listWebhookErrors();
if ($errors['pagination']['total_items'] > 0) {
Log::warning('Webhook failures detected', [
'count' => $errors['pagination']['total_items']
]);
// Optionally retry recent failures
foreach ($errors['jobs'] as $job) {
if ($job['webhook_attempts'] < 3) { // Don't retry too many times
$client->retryWebhook($job['uuid']);
}
}
}
})->hourly();
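The retry rule in the scheduled job above (skip jobs that have already been attempted several times) can be isolated into a pure function, which makes it testable without calling the API. This is a sketch only: the `uuid` and `webhook_attempts` keys follow the examples in this document, and `selectRetryableJobs` is a hypothetical name.

```php
<?php

/**
 * Picks the UUIDs of jobs whose webhook has failed fewer than $maxAttempts
 * times. The 'uuid' and 'webhook_attempts' keys follow the examples above;
 * the function itself is hypothetical.
 *
 * @param array<int, array<string, mixed>> $jobs
 * @return string[]
 */
function selectRetryableJobs(array $jobs, int $maxAttempts = 3): array
{
    $uuids = [];
    foreach ($jobs as $job) {
        $attempts = (int) ($job['webhook_attempts'] ?? 0);
        if (isset($job['uuid']) && $attempts < $maxAttempts) {
            $uuids[] = (string) $job['uuid'];
        }
    }

    return $uuids;
}
```

The scheduled closure could then loop over `selectRetryableJobs($errors['jobs'])` and call `retryWebhook()` for each UUID.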
Complete Examples
Content Monitoring System
class ContentMonitor
{
private CrawlshotClient $client;
public function __construct(CrawlshotClient $client)
{
$this->client = $client;
}
public function monitorWebsite(string $url): void
{
$crawl = $this->client->crawl($url)
->blockAds(true)
->blockCookieBanners(true)
->timeout(60)
->webhookUrl(route('webhook.crawlshot'))
->webhookEventsFilter(['completed', 'failed'])
->create();
// Store job info for later processing
MonitorJob::create([
'uuid' => $crawl->getUuid(),
'url' => $url,
'status' => 'queued',
'created_at' => now()
]);
}
public function handleWebhook(array $data): void
{
$monitorJob = MonitorJob::where('uuid', $data['uuid'])->first();
if (!$monitorJob) return;
$monitorJob->update(['status' => $data['status']]);
if ($data['status'] === 'completed') {
$html = $data['result']['html']['raw'];
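// Check for changes against the most recent earlier crawl of the same URL
// (this job's own content_hash is still empty at this point)
$previousHash = MonitorJob::where('url', $monitorJob->url)
    ->where('uuid', '!=', $monitorJob->uuid)
    ->whereNotNull('content_hash')
    ->latest()
    ->value('content_hash');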
$currentHash = md5($html);
if ($previousHash && $previousHash !== $currentHash) {
// Content changed, send notification
Mail::to('admin@example.com')->send(
new ContentChangedNotification($monitorJob->url, $html)
);
}
$monitorJob->update(['content_hash' => $currentHash]);
}
}
}
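Hashing the raw HTML means that any whitespace difference counts as a change. The sketch below shows one possible refinement: collapse whitespace before hashing and compare fingerprints in a helper that treats a missing previous hash as "no change". Both function names are hypothetical, the normalization rule is an illustrative choice, and SHA-256 is swapped in here for the `md5()` call used above.

```php
<?php

/**
 * Fingerprints HTML for change detection. Collapsing whitespace first is an
 * illustrative choice to ignore formatting-only differences; SHA-256 is
 * swapped in here for the md5() used in the example above.
 */
function contentFingerprint(string $html): string
{
    $normalized = trim((string) preg_replace('/\s+/', ' ', $html));

    return hash('sha256', $normalized);
}

/**
 * Returns true when a previous fingerprint exists and differs from the new one.
 */
function hasContentChanged(?string $previousHash, string $html): bool
{
    if ($previousHash === null || $previousHash === '') {
        return false; // First observation: nothing to compare against.
    }

    return !hash_equals($previousHash, contentFingerprint($html));
}
```

Pages that embed timestamps, nonces, or rotating ads will still register as changed on every crawl, so real monitoring usually needs a page-specific normalization step as well.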
Screenshot Gallery Generator
class ScreenshotGallery
{
private CrawlshotClient $client;
public function __construct(CrawlshotClient $client)
{
$this->client = $client;
}
public function generateGallery(array $urls): array
{
$jobs = [];
// Create all screenshot jobs
foreach ($urls as $url) {
$shot = $this->client->shot($url)
->viewportSize(1200, 800)
->quality(80)
->blockAds(true)
->delay(2000)
->webhookUrl(route('webhook.screenshot'))
->create();
$jobs[] = [
'uuid' => $shot->getUuid(),
'url' => $url,
'response' => $shot
];
}
return $jobs;
}
public function handleScreenshotWebhook(array $data): void
{
if ($data['status'] === 'completed') {
// Save screenshot to permanent storage
$imageData = base64_decode($data['result']['image']['raw']);
$filename = $data['uuid'] . '.webp';
Storage::disk('public')->put("screenshots/{$filename}", $imageData);
// Update database
Screenshot::updateOrCreate(['uuid' => $data['uuid']], [
'url' => $data['url'],
'filename' => $filename,
'width' => $data['result']['width'],
'height' => $data['result']['height'],
'size' => $data['result']['size'],
'completed_at' => now()
]);
}
}
}
The Crawlshot PHP Client Library wraps the Crawlshot crawl and screenshot APIs in a fluent builder, typed response objects, and webhook support, so a Laravel application can create jobs, track their status, and process results through a single client.