# Crawlshot

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Laravel](https://img.shields.io/badge/Laravel-12-red.svg)](https://laravel.com)
[![PHP](https://img.shields.io/badge/PHP-8.3+-blue.svg)](https://php.net)

**High-performance web crawling and screenshot service** built with Laravel, featuring EasyList-based ad blocking, webhook notifications, and a fluent PHP client.

🎯 **Perfect for:** Content monitoring • Screenshot automation • QA testing • Social media previews • Compliance archival

## ✨ Key Features

- 🚀 **Dual Deployment**: Standalone API service or Laravel package
- 🔗 **Webhook Notifications**: Real-time status updates with progressive retry
- 🎨 **Fluent Interface**: `$client->crawl($url)->webhookUrl($webhook)->create()`
- 📦 **Typed Responses**: `$result->isCompleted()`, `$shot->getDimensions()`
- 🛡️ **Smart Blocking**: EasyList ad/tracker/cookie banner filtering
- ⚡ **Background Processing**: Laravel Horizon queue management
- 🔄 **Auto-cleanup**: 24-hour file retention with scheduled cleanup
- 🔐 **Secure**: Laravel Sanctum API authentication

## 📚 Documentation

- 📖 **[API Documentation](API_DOCUMENTATION.md)** - Complete REST API reference, including the webhook system
- 🔧 **[Client Documentation](CLIENT_DOCUMENTATION.md)** - PHP client library guide with fluent interface
- ⚙️ **[Setup Guide](SETUP.md)** - Detailed installation and configuration

## 🚀 Quick Start

### Option 1: Standalone API Service

Deploy your own Crawlshot API server:

```bash
git clone [repository]
cd crawlshot
composer install && npm install puppeteer
cp .env.example .env && php artisan key:generate
php artisan migrate && php artisan serve

# In a separate terminal, start the queue workers that process jobs
php artisan horizon
```

### Option 2: Laravel Package

Use Crawlshot as a client library in your Laravel app:

```bash
composer require crawlshot/laravel
```

```php
use Crawlshot\Laravel\CrawlshotClient;

$client = new CrawlshotClient('https://crawlshot.test', 'your-token');
```

## ⚡ Modern Usage Examples

### Fluent Interface with Webhooks

```php
use Crawlshot\Laravel\CrawlshotClient;

$client = new CrawlshotClient('https://crawlshot.test', 'your-token');

// HTML crawling with webhook notifications
$crawl = $client->crawl('https://example.com')
    ->webhookUrl('https://myapp.com/webhook')
    ->webhookEventsFilter(['completed', 'failed'])
    ->blockAds(true)
    ->timeout(60)
    ->create();

echo "Job: {$crawl->getUuid()} - Status: {$crawl->getStatus()}";

// Screenshot with custom dimensions
$shot = $client->shot('https://dashboard.example.com')
    ->viewportSize(1920, 1080)
    ->quality(90)
    ->webhookUrl('https://myapp.com/webhook')
    ->create();

// Jobs run in the background, so a freshly created job is usually still
// queued: rely on the webhook, or re-check later with $client->getShotStatus($uuid).
if ($shot->isCompleted()) {
    $dimensions = $shot->getDimensions(); // [1920, 1080]
    $imageData = $shot->downloadImage();  // Binary data
}
```

### Webhook Handler Example

```php
use Illuminate\Http\Request;
use Illuminate\Support\Facades\Route;

// If this route lives in routes/web.php, exclude its URI from CSRF verification,
// otherwise external webhook calls are rejected.
Route::post('/webhook', function (Request $request) {
    $job = $request->all();

    if (($job['status'] ?? null) === 'completed') {
        if (isset($job['result']['html'])) {
            // Process HTML crawl result
            $html = $job['result']['html']['raw'];
        } elseif (isset($job['result']['image'])) {
            // Process screenshot result
            $imageUrl = $job['result']['image']['url'];
        }
    }

    return response('OK', 200);
});
```

### Direct API Usage

```bash
# HTML crawl with webhook
curl -X POST "https://crawlshot.test/api/crawl" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "webhook_url": "https://myapp.com/webhook",
    "webhook_events_filter": ["completed"],
    "block_ads": true
  }'

# Screenshot with custom viewport
curl -X POST "https://crawlshot.test/api/shot" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "viewport_width": 1200,
    "viewport_height": 800,
    "webhook_url": "https://myapp.com/webhook"
  }'
```

## 🎯 Core APIs

### HTML Crawling

- `POST /api/crawl` - Create HTML crawl job with ad blocking
- `GET /api/crawl/{uuid}` - Get crawl status and results
- `GET /api/crawl/{uuid}.html` - Download HTML file directly

### Screenshot Capture

- `POST /api/shot` - Create screenshot job (always WebP
  format)
- `GET /api/shot/{uuid}` - Get screenshot status and results
- `GET /api/shot/{uuid}.webp` - Download image file directly

### Webhook Management

- `GET /api/webhook-errors` - List failed webhook deliveries
- `POST /api/webhook-errors/{uuid}/retry` - Retry failed webhook
- `DELETE /api/webhook-errors/{uuid}/clear` - Clear webhook error

### Client Library Methods

| Method | Returns | Description |
|--------|---------|-------------|
| `$client->crawl($url)->create()` | `CrawlResponse` | Fluent crawl job creation |
| `$client->getCrawlStatus($uuid)` | `CrawlResponse` | Typed crawl status |
| `$client->shot($url)->create()` | `ShotResponse` | Fluent screenshot creation |
| `$client->getShotStatus($uuid)` | `ShotResponse` | Typed screenshot status |
| `$client->listWebhookErrors()` | `array` | Failed webhook list |

## 🔧 Architecture & Features

### Webhook System

- **Event Filtering** - Choose which status changes trigger webhooks (`queued`, `processing`, `completed`, `failed`)
- **Progressive Retry** - Automatic retry with exponential backoff (1, 2, 4, 8, 16, 32 minutes)
- **Error Management** - List, retry, and clear failed webhook deliveries
- **Consistent Payload** - Webhook data matches status API responses exactly

### Smart Filtering

- **EasyList Integration** - Automatic ad/tracker/cookie banner blocking
- **Custom Blocking** - Fine-grained control over content filtering
- **Performance Optimized** - Cached filter lists with 24-hour updates

### Developer Experience

- **Fluent Interface** - Method chaining for clean, readable code
- **Typed Responses** - `CrawlResponse` and `ShotResponse` classes with helpful methods
- **Laravel Integration** - Service providers, facades, auto-discovery
- **Comprehensive Docs** - Complete API and client documentation

## 🛠️ Requirements & Setup

### System Requirements

- **PHP 8.3+** with ImageMagick extension
- **Laravel 12.0+** framework
- **Node.js** with Puppeteer for browser automation
- **Database** (SQLite included,
  MySQL/PostgreSQL supported)

### Quick Setup

```bash
# Clone and install
git clone [repository] && cd crawlshot
composer install && npm install puppeteer

# Configure and run
cp .env.example .env
php artisan key:generate
php artisan migrate
php artisan serve

# Start queue processing (separate terminal)
php artisan horizon
```

### Key Dependencies

- **[Spatie Browsershot](https://github.com/spatie/browsershot)** - Puppeteer wrapper for browser automation
- **[Laravel Horizon](https://laravel.com/docs/horizon)** - Queue monitoring and management
- **[Laravel Sanctum](https://laravel.com/docs/sanctum)** - API authentication
- **ProtonMail AdBlock Parser** - EasyList filter processing

## 📄 License

MIT License - see [LICENSE](LICENSE) file for details.

---

**[Get Started →](CLIENT_DOCUMENTATION.md)** | **[View API Docs →](API_DOCUMENTATION.md)** | **[Setup Guide →](SETUP.md)**
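
## ⏱️ Polling Without Webhooks

If your application cannot receive webhooks, you can instead poll a job's status until it reaches a terminal state. The following is a minimal, self-contained sketch, not part of the Crawlshot client: `waitForJob` is a hypothetical helper, and it assumes only that a callable returns the job's current status string (`queued`, `processing`, `completed`, or `failed`, the states listed under Webhook System).

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of the Crawlshot client): calls $fetchStatus
 * until it returns a terminal status or $maxAttempts is reached.
 *
 * @param callable(): string $fetchStatus Returns the job's current status
 */
function waitForJob(callable $fetchStatus, int $maxAttempts = 30, int $delaySeconds = 2): string
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $status = $fetchStatus();

        // "completed" and "failed" are terminal; anything else means keep waiting.
        if ($status === 'completed' || $status === 'failed') {
            return $status;
        }

        // Skip the pause after the final attempt.
        if ($attempt < $maxAttempts) {
            sleep($delaySeconds);
        }
    }

    throw new RuntimeException("Job not finished after {$maxAttempts} attempts");
}

// Stubbed status source for illustration. In a real app the closure might
// return $client->getShotStatus($uuid)->getStatus() (assumed API).
$statuses = ['queued', 'processing', 'completed'];
$final = waitForJob(
    function () use (&$statuses): string {
        return array_shift($statuses);
    },
    maxAttempts: 5,
    delaySeconds: 0,
);

echo $final, PHP_EOL; // completed
```

`getStatus()` appears on `CrawlResponse` in the fluent example; its availability on `ShotResponse` is an assumption. Webhooks remain the better fit for slow pages, since polling keeps a worker or request busy while it sleeps.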