Crawlshot
A Laravel web crawling and screenshot service with dual deployment options:
- Standalone API Service - Full Laravel application with REST API endpoints
- Laravel Package - HTTP client package for use in other Laravel applications
Architecture Overview
Standalone API Service
The main Laravel application provides a complete web crawling and screenshot service:
- Spatie Browsershot Integration - Uses Puppeteer for browser automation
- EasyList Ad Blocking - Automatic ad/tracker blocking using EasyList filters
- Queue Processing - Laravel Horizon for async job processing
- 24-hour Cleanup - Files and database records are removed automatically after 24 hours
- Sanctum Authentication - API token-based authentication
- SQLite Database - Stores job metadata and processing status
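The 24-hour cleanup rule amounts to an age check on each stored job. The sketch below illustrates the idea only and is not the service's actual cleanup job: the function name `expiredJobIds`, the job array shape, and the `created_at` Unix-timestamp key are all assumptions.

```php
const RETENTION_SECONDS = 24 * 60 * 60; // 24 hours, per the cleanup rule above

/**
 * Return the UUIDs of jobs older than the retention window.
 *
 * @param array<int, array{uuid: string, created_at: int}> $jobs hypothetical job rows
 * @param int $now current Unix timestamp
 */
function expiredJobIds(array $jobs, int $now, int $retention = RETENTION_SECONDS): array
{
    $expired = [];

    foreach ($jobs as $job) {
        // A job expires once its age exceeds the retention window.
        if ($now - $job['created_at'] > $retention) {
            $expired[] = $job['uuid'];
        }
    }

    return $expired;
}
```

A scheduled task could call a function like this and then delete the matching files and database rows.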
Laravel Package
Simple HTTP client package that provides a clean interface to the API:
- 8 Methods for 8 Endpoints - Each method maps 1:1 to a REST endpoint
- Facade Support - Clean Laravel integration
- Auto-discovery - Automatic service provider registration
Deployment Options
Option 1: Standalone API Service
Deploy as a complete Laravel application:
git clone [repository]
cd crawlshot
composer install
npm install puppeteer
php artisan migrate
php artisan serve
API Endpoints:
- POST /api/crawl - Create HTML crawl job
- GET /api/crawl/{uuid} - Get crawl status/result
- GET /api/crawl - List all crawl jobs
- POST /api/shot - Create screenshot job
- GET /api/shot/{uuid} - Get screenshot status/result
- GET /api/shot/{uuid}/download - Download screenshot file
- GET /api/shot - List all screenshot jobs
- GET /api/health - Health check
Example API Usage:
# Create crawl job
curl -X POST "https://crawlshot.test/api/crawl" \
  -H "Authorization: Bearer {token}" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "block_ads": true}'
# Check status
curl -H "Authorization: Bearer {token}" \
  "https://crawlshot.test/api/crawl/{uuid}"
Option 2: Laravel Package
Install as a package in your Laravel application:
composer require crawlshot/laravel
php artisan vendor:publish --tag=crawlshot-config
Configuration:
CRAWLSHOT_BASE_URL=https://your-crawlshot-api.com
CRAWLSHOT_TOKEN=your-sanctum-token
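These two values are what a client needs to address and authenticate a request. As a rough illustration of how they combine, the sketch below assembles the same request as the curl example above without sending it. The function name `buildCrawlRequest` and its return shape are assumptions made for this example, not the package's internals.

```php
/**
 * Build the parts of a "create crawl job" request without sending it.
 * The name and return shape are illustrative assumptions.
 */
function buildCrawlRequest(string $baseUrl, string $token, string $targetUrl, array $options = []): array
{
    return [
        'method'  => 'POST',
        // Tolerate a trailing slash on the configured base URL.
        'url'     => rtrim($baseUrl, '/') . '/api/crawl',
        'headers' => [
            'Authorization: Bearer ' . $token,
            'Content-Type: application/json',
        ],
        // The target URL is sent alongside the options, as in the curl example.
        'body'    => json_encode(['url' => $targetUrl] + $options, JSON_UNESCAPED_SLASHES),
    ];
}

// Literal values stand in for CRAWLSHOT_BASE_URL and CRAWLSHOT_TOKEN.
$request = buildCrawlRequest(
    'https://your-crawlshot-api.com',
    'your-sanctum-token',
    'https://example.com',
    ['block_ads' => true]
);
```

The resulting parts can be passed to any HTTP client, for example PHP's curl extension through `CURLOPT_HTTPHEADER` and `CURLOPT_POSTFIELDS`.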
Package Usage:
use Crawlshot\Laravel\Facades\Crawlshot;
// Create crawl job
$response = Crawlshot::createCrawl('https://example.com', [
    'block_ads' => true,
    'timeout' => 30
]);
// Check status
$status = Crawlshot::getCrawlStatus($response['uuid']);
// Create screenshot
$response = Crawlshot::createShot('https://example.com', [
    'format' => 'jpg',
    'width' => 1920,
    'height' => 1080
]);
// Download screenshot
$imageData = Crawlshot::downloadShot($response['uuid']);
file_put_contents('screenshot.jpg', $imageData);
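Because jobs are processed asynchronously on a queue, a result may not be ready right after a job is created, so callers will usually poll the status endpoint before downloading. The following is a minimal sketch of such a loop. The helper `waitForJob` is hypothetical, and the `status` key with the terminal values `completed` and `failed` is an assumption, since the response format is not documented here.

```php
/**
 * Poll a status callback until the job reaches a terminal state.
 *
 * @param callable(): array $fetchStatus returns the decoded status response
 * @param int $maxAttempts give up after this many checks
 * @param int $delayMs pause between checks, in milliseconds
 * @return array|null the final status, or null if the job is still pending
 */
function waitForJob(callable $fetchStatus, int $maxAttempts = 10, int $delayMs = 1000): ?array
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $status = $fetchStatus();

        // ASSUMPTION: the response carries a 'status' key with these terminal values.
        if (in_array($status['status'] ?? null, ['completed', 'failed'], true)) {
            return $status;
        }

        // Sleep only when another attempt will follow.
        if ($attempt < $maxAttempts) {
            usleep($delayMs * 1000);
        }
    }

    return null;
}
```

With the facade, the callback could be `fn () => Crawlshot::getCrawlStatus($uuid)`.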
API Reference
Available Methods (Package)
| Method | API Endpoint | Description |
|---|---|---|
| createCrawl(string $url, array $options = []) | POST /api/crawl | Create crawl job |
| getCrawlStatus(string $uuid) | GET /api/crawl/{uuid} | Get crawl status |
| listCrawls() | GET /api/crawl | List all crawl jobs |
| createShot(string $url, array $options = []) | POST /api/shot | Create screenshot job |
| getShotStatus(string $uuid) | GET /api/shot/{uuid} | Get screenshot status |
| downloadShot(string $uuid) | GET /api/shot/{uuid}/download | Download screenshot file |
| listShots() | GET /api/shot | List all screenshot jobs |
| health() | GET /api/health | Health check |
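The 1:1 mapping between methods and endpoints can be written down as a lookup table. The sketch below copies the method names and paths from the table above. The constant `CRAWLSHOT_ENDPOINTS` and the function `resolveEndpoint` are hypothetical names for illustration and are not part of the package.

```php
// Method name => [HTTP verb, path template], copied from the table above.
const CRAWLSHOT_ENDPOINTS = [
    'createCrawl'    => ['POST', '/api/crawl'],
    'getCrawlStatus' => ['GET',  '/api/crawl/{uuid}'],
    'listCrawls'     => ['GET',  '/api/crawl'],
    'createShot'     => ['POST', '/api/shot'],
    'getShotStatus'  => ['GET',  '/api/shot/{uuid}'],
    'downloadShot'   => ['GET',  '/api/shot/{uuid}/download'],
    'listShots'      => ['GET',  '/api/shot'],
    'health'         => ['GET',  '/api/health'],
];

/**
 * Resolve a method name to its HTTP verb and concrete path.
 *
 * @return array{0: string, 1: string} [verb, path]
 */
function resolveEndpoint(string $method, ?string $uuid = null): array
{
    if (!array_key_exists($method, CRAWLSHOT_ENDPOINTS)) {
        throw new InvalidArgumentException("Unknown method: {$method}");
    }

    [$verb, $path] = CRAWLSHOT_ENDPOINTS[$method];

    // Fill in the job UUID for endpoints that address a single job.
    if (str_contains($path, '{uuid}')) {
        if ($uuid === null) {
            throw new InvalidArgumentException("{$method} requires a UUID");
        }
        $path = str_replace('{uuid}', rawurlencode($uuid), $path);
    }

    return [$verb, $path];
}
```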
Crawl Options
[
    'block_ads' => true,            // Block ads using EasyList
    'block_trackers' => true,       // Block tracking scripts
    'timeout' => 30,                // Request timeout in seconds
    'user_agent' => 'Custom UA',    // Custom user agent
    'wait_until' => 'networkidle0'  // Wait condition
]
Screenshot Options
[
    'format' => 'jpg',      // jpg, png, webp
    'quality' => 90,        // 1-100 for jpg/webp
    'width' => 1920,        // Viewport width
    'height' => 1080,       // Viewport height
    'full_page' => true,    // Capture full page
    'block_ads' => true,    // Block ads
    'timeout' => 30         // Request timeout in seconds
]
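It can help to check options on the client side before submitting a job. The sketch below validates the documented constraints (format is one of jpg, png, webp; quality is 1-100). The function `validateShotOptions` is hypothetical, and the positive-integer rule for `width`, `height`, and `timeout` is an assumption rather than documented behavior.

```php
/**
 * Check screenshot options and return a list of error messages.
 * An empty array means no problems were found.
 */
function validateShotOptions(array $options): array
{
    $errors = [];

    // Documented formats: jpg, png, webp.
    if (isset($options['format']) && !in_array($options['format'], ['jpg', 'png', 'webp'], true)) {
        $errors[] = 'format must be jpg, png, or webp';
    }

    // Documented range: 1-100.
    if (isset($options['quality'])) {
        $quality = $options['quality'];
        if (!is_int($quality) || $quality < 1 || $quality > 100) {
            $errors[] = 'quality must be an integer from 1 to 100';
        }
    }

    // ASSUMPTION: dimensions and timeout must be positive integers.
    foreach (['width', 'height', 'timeout'] as $key) {
        if (isset($options[$key]) && (!is_int($options[$key]) || $options[$key] <= 0)) {
            $errors[] = "{$key} must be a positive integer";
        }
    }

    return $errors;
}
```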
Features
Core Functionality
- HTML Crawling - Extract clean HTML content from web pages
- Screenshot Capture - Generate high-quality screenshots (JPG, PNG, WebP)
- Ad Blocking - Built-in EasyList integration for ad/tracker blocking
- Queue Processing - Async job processing with Laravel Horizon
- File Management - Automatic cleanup after 24 hours
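To give a feel for how filter-list blocking works, the sketch below handles only the simplest EasyList rule form, `||domain^`, which matches a domain and its subdomains. It is a hand-rolled illustration and does not use or reproduce the API of the adblock parser the service depends on. The function names `parseDomainRules` and `isBlocked` are hypothetical.

```php
/**
 * Extract plain domain rules of the form "||domain^" from filter-list lines.
 * Comments ("!"), exceptions ("@@"), rules with options, and cosmetic rules
 * are skipped; a real parser handles many more rule types.
 */
function parseDomainRules(array $lines): array
{
    $domains = [];

    foreach ($lines as $line) {
        if (preg_match('/^\|\|([a-z0-9.-]+)\^$/i', trim($line), $match)) {
            $domains[] = strtolower($match[1]);
        }
    }

    return $domains;
}

/**
 * Report whether a URL's host equals a blocked domain or is a subdomain of one.
 */
function isBlocked(string $url, array $domains): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return false; // no host to compare (relative or malformed URL)
    }
    $host = strtolower($host);

    foreach ($domains as $domain) {
        if ($host === $domain || str_ends_with($host, '.' . $domain)) {
            return true;
        }
    }

    return false;
}
```

In a browser-automation setup, a check like this would typically run against each outgoing request URL so that matching requests can be aborted.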
Technical Features
- Laravel 12 support with PHP 8.3+
- Puppeteer Integration via Spatie Browsershot
- Sanctum Authentication for API security
- SQLite Database with migrations
- Auto-discovery for package installation
- Environment Configuration via .env variables
Development
Requirements
- PHP 8.3+
- Laravel 12.0+
- Node.js with Puppeteer
- SQLite (or other database)
- ImageMagick extension
Key Dependencies
- spatie/browsershot - Browser automation
- protonlabs/php-adblock-parser - EasyList parsing
- laravel/horizon - Queue monitoring (standalone)
- laravel/sanctum - API authentication (standalone)
File Structure
├── app/ # Laravel application (standalone)
│ ├── Http/Controllers/Api/ # API controllers
│ ├── Jobs/ # Queue jobs
│ ├── Models/ # Eloquent models
│ └── Services/ # Core services
├── src/ # Package source (both modes)
│ ├── CrawlshotClient.php # HTTP client (package mode)
│ ├── CrawlshotServiceProvider.php
│ ├── Facades/Crawlshot.php
│ └── config/crawlshot.php
├── routes/api.php # API routes (standalone)
├── database/migrations/ # Database schema
└── composer.json # Package definition
License
MIT