Crawlshot

High-performance web crawling and screenshot service built with Laravel, featuring intelligent ad blocking, webhook notifications, and a powerful fluent PHP client.

🎯 Perfect for: Content monitoring • Screenshot automation • QA testing • Social media previews • Compliance archival

Key Features

  • 🚀 Dual Deployment: Standalone API service or Laravel package
  • 🔗 Webhook Notifications: Real-time updates with progressive retry
  • 🎨 Fluent Interface: $client->crawl($url)->webhookUrl($webhook)->create()
  • 📦 Typed Responses: $result->isCompleted(), $shot->getDimensions()
  • 🛡️ Smart Blocking: EasyList ad/tracker/cookie banner filtering
  • Background Processing: Laravel Horizon queue management
  • 🔄 Auto-cleanup: 24-hour file retention with scheduled cleanup
  • 🔐 Secure: Laravel Sanctum API authentication

🚀 Quick Start

Option 1: Standalone API Service

Deploy your own Crawlshot API server:

git clone [repository]
cd crawlshot
composer install && npm install puppeteer
php artisan migrate && php artisan serve

Option 2: Laravel Package

Use as a client library in your Laravel app:

composer require crawlshot/laravel

Then create a client in your PHP code:

$client = new CrawlshotClient('https://crawlshot.test', 'your-token');

Modern Usage Examples

Fluent Interface with Webhooks

use Crawlshot\Laravel\CrawlshotClient;

$client = new CrawlshotClient('https://crawlshot.test', 'your-token');

// HTML Crawling with webhook notifications
$crawl = $client->crawl('https://example.com')
    ->webhookUrl('https://myapp.com/webhook')
    ->webhookEventsFilter(['completed', 'failed'])
    ->blockAds(true)
    ->timeout(60)
    ->create();

echo "Job: {$crawl->getUuid()} - Status: {$crawl->getStatus()}";

// Screenshot with custom dimensions
$shot = $client->shot('https://dashboard.example.com')
    ->viewportSize(1920, 1080)
    ->quality(90)
    ->webhookUrl('https://myapp.com/webhook')
    ->create();

// Jobs run in the background, so this is true only once processing has finished
if ($shot->isCompleted()) {
    $dimensions = $shot->getDimensions(); // [1920, 1080]
    $imageData = $shot->downloadImage();  // Binary data
}
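
Because jobs are queued, a response returned by create() may not be completed yet. When no webhook is configured, one option is to poll the status endpoint. The sketch below assumes a hypothetical helper, waitForCompletion(), that is not part of the Crawlshot client. It takes a callback returning the current status string, so it runs on its own; in real use the callback would presumably wrap $client->getShotStatus($uuid)->getStatus(). The terminal states 'completed' and 'failed' are taken from the webhook event names in this README.

```php
<?php
/**
 * Poll a status callback until the job reaches a terminal state.
 * Hypothetical helper, not part of the Crawlshot client.
 *
 * @param callable(): string $fetchStatus returns the current job status
 * @return string the last status seen (non-terminal if attempts ran out)
 */
function waitForCompletion(callable $fetchStatus, int $maxAttempts = 30, int $delaySeconds = 2): string
{
    $status = 'queued';
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $status = $fetchStatus();
        if (in_array($status, ['completed', 'failed'], true)) {
            return $status;
        }
        // Wait before the next attempt, but not after the last one
        if ($attempt < $maxAttempts && $delaySeconds > 0) {
            sleep($delaySeconds);
        }
    }
    return $status;
}

// Stand-in for the real status call: a scripted sequence of statuses.
$statuses = ['queued', 'processing', 'completed'];
$final = waitForCompletion(function () use (&$statuses) {
    return array_shift($statuses);
}, 5, 0);

echo $final, PHP_EOL; // completed
```

Webhooks remain the better fit for long-running jobs; polling like this is mainly useful in scripts and tests.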

Webhook Handler Example

use Illuminate\Http\Request;
use Illuminate\Support\Facades\Route;

Route::post('/webhook', function (Request $request) {
    // The payload mirrors the status API response for the job
    $job = $request->all();

    if (($job['status'] ?? null) === 'completed') {
        if (isset($job['result']['html'])) {
            // Process HTML crawl result
            $html = $job['result']['html']['raw'];
        } elseif (isset($job['result']['image'])) {
            // Process screenshot result
            $imageUrl = $job['result']['image']['url'];
        }
    }

    return response('OK', 200);
});
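
The branching in the handler can be pulled into a plain function, which makes it easy to test without an HTTP request. The following is a minimal sketch: extractJobResult() is a hypothetical helper, the key names (status, result.html.raw, result.image.url) are the ones used in the handler above, and the sample payload values are invented for illustration.

```php
<?php
/**
 * Reduce a webhook payload to the single value a handler needs.
 * Hypothetical helper; key names follow the handler example in this README.
 *
 * @return array{type: string, value: ?string}
 */
function extractJobResult(array $job): array
{
    if (($job['status'] ?? null) !== 'completed') {
        return ['type' => 'not_completed', 'value' => null];
    }

    $result = $job['result'] ?? [];
    if (isset($result['html']['raw'])) {
        return ['type' => 'html', 'value' => $result['html']['raw']];
    }
    if (isset($result['image']['url'])) {
        return ['type' => 'image', 'value' => $result['image']['url']];
    }

    return ['type' => 'unknown', 'value' => null];
}

// Invented sample payloads, shaped like the handler expects
$crawlJob = ['status' => 'completed', 'result' => ['html' => ['raw' => '<html></html>']]];
$failedJob = ['status' => 'failed'];

echo extractJobResult($crawlJob)['type'], PHP_EOL;  // html
echo extractJobResult($failedJob)['type'], PHP_EOL; // not_completed
```

Inside the route closure, this would be called as extractJobResult($request->all()).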

Direct API Usage

# HTML crawl with webhook
curl -X POST "https://crawlshot.test/api/crawl" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "webhook_url": "https://myapp.com/webhook", 
    "webhook_events_filter": ["completed"],
    "block_ads": true
  }'

# Screenshot with custom viewport
curl -X POST "https://crawlshot.test/api/shot" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "viewport_width": 1200,
    "viewport_height": 800,
    "webhook_url": "https://myapp.com/webhook"
  }'
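
For PHP projects that do not use the client library, the same screenshot request can be sketched with PHP's curl extension. Both functions below are hypothetical helpers. The endpoint, headers, and field names are copied from the curl example above; the shape of the response body is not documented here, so it is returned as a raw string.

```php
<?php
/**
 * Build the pieces of a POST /api/shot request.
 * Hypothetical helper; field names are those used in the curl example.
 */
function buildShotRequest(string $baseUrl, string $token, array $params): array
{
    return [
        'url' => rtrim($baseUrl, '/') . '/api/shot',
        'headers' => [
            'Authorization: Bearer ' . $token,
            'Content-Type: application/json',
        ],
        'body' => json_encode($params, JSON_THROW_ON_ERROR | JSON_UNESCAPED_SLASHES),
    ];
}

/** Send the request with the curl extension and return the raw response body. */
function sendRequest(array $request): string
{
    $ch = curl_init($request['url']);
    curl_setopt_array($ch, [
        CURLOPT_POST => true,
        CURLOPT_HTTPHEADER => $request['headers'],
        CURLOPT_POSTFIELDS => $request['body'],
        CURLOPT_RETURNTRANSFER => true,
    ]);
    $response = curl_exec($ch);
    if ($response === false) {
        throw new RuntimeException('Request failed: ' . curl_error($ch));
    }
    return $response;
}

$request = buildShotRequest('https://crawlshot.test', 'YOUR_TOKEN', [
    'url' => 'https://example.com',
    'viewport_width' => 1200,
    'viewport_height' => 800,
    'webhook_url' => 'https://myapp.com/webhook',
]);

echo $request['url'], PHP_EOL; // https://crawlshot.test/api/shot
// To actually send it against a running server: $json = sendRequest($request);
```

Keeping request construction separate from sending means the payload can be checked without a network call.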

🎯 Core APIs

HTML Crawling

  • POST /api/crawl - Create HTML crawl job with ad blocking
  • GET /api/crawl/{uuid} - Get crawl status and results
  • GET /api/crawl/{uuid}.html - Download HTML file directly

Screenshot Capture

  • POST /api/shot - Create screenshot job (always WebP format)
  • GET /api/shot/{uuid} - Get screenshot status and results
  • GET /api/shot/{uuid}.webp - Download image file directly

Webhook Management

  • GET /api/webhook-errors - List failed webhook deliveries
  • POST /api/webhook-errors/{uuid}/retry - Retry failed webhook
  • DELETE /api/webhook-errors/{uuid}/clear - Clear webhook error

Client Library Methods

| Method | Returns | Description |
| --- | --- | --- |
| $client->crawl($url)->create() | CrawlResponse | Fluent crawl job creation |
| $client->getCrawlStatus($uuid) | CrawlResponse | Typed crawl status |
| $client->shot($url)->create() | ShotResponse | Fluent screenshot creation |
| $client->getShotStatus($uuid) | ShotResponse | Typed screenshot status |
| $client->listWebhookErrors() | array | Failed webhook list |

🔧 Architecture & Features

Webhook System

  • Event Filtering - Choose which status changes trigger webhooks (queued, processing, completed, failed)
  • Progressive Retry - Automatic retry with exponential backoff (1, 2, 4, 8, 16, 32 minutes)
  • Error Management - List, retry, and clear failed webhook deliveries
  • Consistent Payload - Webhook data matches status API responses exactly
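
The retry schedule listed above doubles the delay on each attempt. As an illustration only (retryDelayMinutes() is a hypothetical helper, not Crawlshot's actual implementation), the sequence can be reproduced like this:

```php
<?php
/**
 * Delay in minutes before retry number $attempt (1-based), doubling each time.
 * Returns null once the schedule is exhausted.
 * Hypothetical helper mirroring the documented 1, 2, 4, 8, 16, 32 sequence.
 */
function retryDelayMinutes(int $attempt, int $maxAttempts = 6): ?int
{
    if ($attempt < 1 || $attempt > $maxAttempts) {
        return null;
    }
    return 2 ** ($attempt - 1);
}

$schedule = array_map('retryDelayMinutes', range(1, 6));

echo implode(', ', $schedule), PHP_EOL;      // 1, 2, 4, 8, 16, 32
echo array_sum($schedule), ' minutes total', PHP_EOL; // 63 minutes total
```

With six retries, the waits add up to 63 minutes after the first failed delivery, on top of the time each attempt takes.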

Smart Filtering

  • EasyList Integration - Automatic ad/tracker/cookie banner blocking
  • Custom Blocking - Fine-grained control over content filtering
  • Performance Optimized - Cached filter lists with 24-hour updates
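
To give a sense of what EasyList-based blocking involves, here is a deliberately simplified sketch of matching a request URL against EasyList's domain-anchor rules, written as ||domain^. It does not show how Crawlshot applies its filter lists. The isBlocked() function and the sample rules are assumptions made for illustration, and all other rule types (paths, wildcards, options, exceptions) are ignored.

```php
<?php
/**
 * Check a URL against EasyList-style domain-anchor rules ("||domain^").
 * Simplified sketch: every other rule type is skipped.
 */
function isBlocked(string $url, array $rules): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return false; // no host to match against
    }
    $host = strtolower($host);

    foreach ($rules as $rule) {
        if (!preg_match('/^\|\|([a-z0-9.-]+)\^$/i', $rule, $m)) {
            continue; // not a plain domain-anchor rule
        }
        $domain = strtolower($m[1]);
        // Match the domain itself or any of its subdomains
        if ($host === $domain || str_ends_with($host, '.' . $domain)) {
            return true;
        }
    }
    return false;
}

// Invented sample rules in EasyList syntax
$rules = ['||ads.example.com^', '||tracker.test^'];

var_dump(isBlocked('https://ads.example.com/banner.js', $rules));  // bool(true)
var_dump(isBlocked('https://cdn.tracker.test/pixel.gif', $rules)); // bool(true)
var_dump(isBlocked('https://example.com/index.html', $rules));     // bool(false)
```

Real filter lists contain tens of thousands of rules, which is why caching the parsed list matters for performance.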

Developer Experience

  • Fluent Interface - Method chaining for clean, readable code
  • Typed Responses - CrawlResponse and ShotResponse classes with helpful methods
  • Laravel Integration - Service providers, facades, auto-discovery
  • Comprehensive Docs - Complete API and client documentation

🛠️ Requirements & Setup

System Requirements

  • PHP 8.3+ with ImageMagick extension
  • Laravel 12.0+ framework
  • Node.js with Puppeteer for browser automation
  • Database (SQLite included, MySQL/PostgreSQL supported)

Quick Setup

# Clone and install
git clone [repository] && cd crawlshot
composer install && npm install puppeteer

# Configure and run
cp .env.example .env
php artisan key:generate
php artisan migrate
php artisan serve

# Start queue processing (separate terminal)
php artisan horizon

📄 License

MIT License - see LICENSE file for details.

