Crawlshot
High-performance web crawling and screenshot service built with Laravel, featuring intelligent ad blocking, webhook notifications, and a powerful fluent PHP client.
🎯 Perfect for: Content monitoring • Screenshot automation • QA testing • Social media previews • Compliance archival
✨ Key Features
- 🚀 Dual Deployment: Standalone API service or Laravel package
- 🔗 Webhook Notifications: Real-time updates with progressive retry
- 🎨 Fluent Interface: $client->crawl($url)->webhookUrl($webhook)->create()
- 📦 Typed Responses: $result->isCompleted(), $shot->getDimensions()
- 🛡️ Smart Blocking: EasyList ad/tracker/cookie banner filtering
- ⚡ Background Processing: Laravel Horizon queue management
- 🔄 Auto-cleanup: 24-hour file retention with scheduled cleanup
- 🔐 Secure: Laravel Sanctum API authentication
📚 Documentation
- 📖 API Documentation - Complete REST API reference with webhook system
- 🔧 Client Documentation - PHP client library guide with fluent interface
- ⚙️ Setup Guide - Detailed installation and configuration
🚀 Quick Start
Option 1: Standalone API Service
Deploy your own Crawlshot API server:
git clone [repository]
cd crawlshot
composer install && npm install puppeteer
php artisan migrate && php artisan serve
Option 2: Laravel Package
Use as a client library in your Laravel app:
composer require crawlshot/laravel
Then create a client in your application code:
use Crawlshot\Laravel\CrawlshotClient;
$client = new CrawlshotClient('https://crawlshot.test', 'your-token');
⚡ Modern Usage Examples
Fluent Interface with Webhooks
use Crawlshot\Laravel\CrawlshotClient;
$client = new CrawlshotClient('https://crawlshot.test', 'your-token');
// HTML Crawling with webhook notifications
$crawl = $client->crawl('https://example.com')
    ->webhookUrl('https://myapp.com/webhook')
    ->webhookEventsFilter(['completed', 'failed'])
    ->blockAds(true)
    ->timeout(60)
    ->create();
echo "Job: {$crawl->getUuid()} - Status: {$crawl->getStatus()}";
// Screenshot with custom dimensions
$shot = $client->shot('https://dashboard.example.com')
    ->viewportSize(1920, 1080)
    ->quality(90)
    ->webhookUrl('https://myapp.com/webhook')
    ->create();
if ($shot->isCompleted()) {
    $dimensions = $shot->getDimensions(); // [1920, 1080]
    $imageData = $shot->downloadImage(); // Binary data
}
Webhook Handler Example
use Illuminate\Http\Request;
use Illuminate\Support\Facades\Route;
Route::post('/webhook', function (Request $request) {
    // The payload mirrors the status API response for the job
    $job = $request->all();
    if ($job['status'] === 'completed') {
        if (isset($job['result']['html'])) {
            // Process HTML crawl result
            $html = $job['result']['html']['raw'];
        } elseif (isset($job['result']['image'])) {
            // Process screenshot result
            $imageUrl = $job['result']['image']['url'];
        }
    }
    // Acknowledge receipt so the delivery is not retried
    return response('OK', 200);
});
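The branching in the handler can also be pulled into a plain function, which makes it testable without a Laravel request. The following is a minimal sketch under assumptions: the field names (status, result.html.raw, result.image.url) are taken from the handler above, while the function name summarizeWebhookPayload and its return shape are hypothetical and not part of the Crawlshot client.

```php
<?php

declare(strict_types=1);

/**
 * Summarise a decoded Crawlshot webhook payload.
 * Hypothetical helper: only the payload field names come from the README.
 *
 * @param array<string, mixed> $job Decoded webhook payload.
 * @return array{type: string, value: ?string}
 */
function summarizeWebhookPayload(array $job): array
{
    // Anything other than a completed job carries no result to process.
    if (($job['status'] ?? null) !== 'completed') {
        return ['type' => 'ignored', 'value' => null];
    }

    // HTML crawl result: the raw markup sits under result.html.raw.
    if (isset($job['result']['html'])) {
        return ['type' => 'html', 'value' => $job['result']['html']['raw'] ?? null];
    }

    // Screenshot result: the image location sits under result.image.url.
    if (isset($job['result']['image'])) {
        return ['type' => 'image', 'value' => $job['result']['image']['url'] ?? null];
    }

    // Completed, but with a result shape this sketch does not recognise.
    return ['type' => 'unknown', 'value' => null];
}
```

Inside the route closure this would be called as summarizeWebhookPayload($request->all()), with the caller deciding what to do for each type.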
Direct API Usage
# HTML crawl with webhook
curl -X POST "https://crawlshot.test/api/crawl" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "webhook_url": "https://myapp.com/webhook",
    "webhook_events_filter": ["completed"],
    "block_ads": true
  }'
# Screenshot with custom viewport
curl -X POST "https://crawlshot.test/api/shot" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "viewport_width": 1200,
    "viewport_height": 800,
    "webhook_url": "https://myapp.com/webhook"
  }'
🎯 Core APIs
HTML Crawling
- POST /api/crawl - Create HTML crawl job with ad blocking
- GET /api/crawl/{uuid} - Get crawl status and results
- GET /api/crawl/{uuid}.html - Download HTML file directly
Screenshot Capture
- POST /api/shot - Create screenshot job (always WebP format)
- GET /api/shot/{uuid} - Get screenshot status and results
- GET /api/shot/{uuid}.webp - Download image file directly
Webhook Management
- GET /api/webhook-errors - List failed webhook deliveries
- POST /api/webhook-errors/{uuid}/retry - Retry failed webhook
- DELETE /api/webhook-errors/{uuid}/clear - Clear webhook error
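The direct-download routes follow a simple pattern: the job UUID plus an extension that depends on the job kind. As an illustration only, a small helper could build those URLs. The function name crawlshotDownloadUrl is hypothetical and not part of the client library; only the route shapes come from the lists above.

```php
<?php

declare(strict_types=1);

/**
 * Build the direct-download URL for a finished job.
 * Hypothetical helper based on the routes /api/crawl/{uuid}.html and /api/shot/{uuid}.webp.
 */
function crawlshotDownloadUrl(string $baseUrl, string $kind, string $uuid): string
{
    // File extensions mirror the download routes listed above.
    $extensions = ['crawl' => 'html', 'shot' => 'webp'];

    if (!isset($extensions[$kind])) {
        throw new InvalidArgumentException("Unknown job kind: {$kind}");
    }

    // Trim a trailing slash so the base URL and path join cleanly.
    return sprintf(
        '%s/api/%s/%s.%s',
        rtrim($baseUrl, '/'),
        $kind,
        rawurlencode($uuid),
        $extensions[$kind]
    );
}
```

For example, crawlshotDownloadUrl('https://crawlshot.test/', 'shot', $uuid) yields a URL ending in /api/shot/{uuid}.webp. Requests to it would still need the Bearer token shown in the curl examples.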
Client Library Methods
| Method | Returns | Description |
|---|---|---|
| $client->crawl($url)->create() | CrawlResponse | Fluent crawl job creation |
| $client->getCrawlStatus($uuid) | CrawlResponse | Typed crawl status |
| $client->shot($url)->create() | ShotResponse | Fluent screenshot creation |
| $client->getShotStatus($uuid) | ShotResponse | Typed screenshot status |
| $client->listWebhookErrors() | array | Failed webhook list |
🔧 Architecture & Features
Webhook System
- Event Filtering - Choose which status changes trigger webhooks (queued, processing, completed, failed)
- Progressive Retry - Automatic retry with exponential backoff (1, 2, 4, 8, 16, 32 minutes)
- Error Management - List, retry, and clear failed webhook deliveries
- Consistent Payload - Webhook data matches status API responses exactly
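The retry schedule listed above (1, 2, 4, 8, 16, 32 minutes) doubles on every attempt. A minimal sketch of that arithmetic follows. It is not the service's actual implementation: the function name webhookRetryDelayMinutes is hypothetical, and treating six attempts as the cap is an assumption drawn from the six delays listed.

```php
<?php

declare(strict_types=1);

/**
 * Delay in minutes before a given retry attempt (1-based).
 * Returns null once the assumed attempt cap is exceeded.
 */
function webhookRetryDelayMinutes(int $attempt, int $maxAttempts = 6): ?int
{
    if ($attempt < 1) {
        throw new InvalidArgumentException('Attempt numbers start at 1.');
    }

    // Past the cap there is no further automatic retry in this sketch.
    if ($attempt > $maxAttempts) {
        return null;
    }

    // Attempt 1 waits 2^0 = 1 minute, attempt 2 waits 2^1 = 2 minutes, and so on.
    return 2 ** ($attempt - 1);
}
```

Under these assumptions, a delivery that keeps failing would be retried over roughly an hour in total (1 + 2 + 4 + 8 + 16 + 32 = 63 minutes) before it is left for manual retry through the webhook-errors endpoints.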
Smart Filtering
- EasyList Integration - Automatic ad/tracker/cookie banner blocking
- Custom Blocking - Fine-grained control over content filtering
- Performance Optimized - Cached filter lists with 24-hour updates
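The 24-hour refresh of cached filter lists comes down to a timestamp comparison. The sketch below shows one way such a check could look. The helper name isFilterListStale and the timestamp-based approach are assumptions for illustration, not a description of how Crawlshot stores or refreshes its lists.

```php
<?php

declare(strict_types=1);

/**
 * Decide whether a cached filter list is due for a refresh.
 * Hypothetical helper: both arguments are Unix timestamps in seconds.
 */
function isFilterListStale(int $fetchedAt, int $now, int $ttlSeconds = 86400): bool
{
    // 86400 seconds = 24 hours, matching the update interval mentioned above.
    return ($now - $fetchedAt) >= $ttlSeconds;
}
```

A caller would pass the stored fetch time and time(), then download a fresh EasyList copy only when the function returns true.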
Developer Experience
- Fluent Interface - Method chaining for clean, readable code
- Typed Responses - CrawlResponse and ShotResponse classes with helpful methods
- Laravel Integration - Service providers, facades, auto-discovery
- Comprehensive Docs - Complete API and client documentation
🛠️ Requirements & Setup
System Requirements
- PHP 8.3+ with ImageMagick extension
- Laravel 12.0+ framework
- Node.js with Puppeteer for browser automation
- Database (SQLite included, MySQL/PostgreSQL supported)
Quick Setup
# Clone and install
git clone [repository] && cd crawlshot
composer install && npm install puppeteer
# Configure and run
cp .env.example .env
php artisan key:generate
php artisan migrate
php artisan serve
# Start queue processing (separate terminal)
php artisan horizon
Key Dependencies
- Spatie Browsershot - Puppeteer wrapper for browser automation
- Laravel Horizon - Queue monitoring and management
- Laravel Sanctum - API authentication
- ProtonMail AdBlock Parser - EasyList filter processing
📄 License
MIT License - see LICENSE file for details.