12 KiB
Crawlshot API Documentation
Crawlshot is a self-hosted web crawling and screenshot service built with Laravel and Spatie Browsershot. This API provides endpoints for capturing web content and generating screenshots with advanced filtering capabilities.
Base URL
https://crawlshot.test
Authentication
All API endpoints (except health check) require authentication using Laravel Sanctum API tokens.
Authentication Header
Authorization: Bearer {your-api-token}
Example API Token
1|rrWUM5ZkmLfGipkm1oIusYX45KbukIekUwMjgB3Nd1121a5c
Health Check
GET /api/health
Check if the Crawlshot service is running and healthy.
Authentication: Not required
Request Example
curl -X GET "https://crawlshot.test/api/health" \
-H "Accept: application/json"
Response Example
{
"status": "healthy",
"timestamp": "2025-08-10T09:54:52.195383Z",
"service": "crawlshot"
}
Web Crawling APIs
POST /api/crawl
Initiate a web crawling job to extract HTML content from a URL.
Authentication: Required
Request Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
url |
string | ✅ | - | Target URL to crawl (max 2048 chars) |
timeout |
integer | ❌ | 30 | Request timeout in seconds (5-300) |
delay |
integer | ❌ | 0 | Wait time before capture in milliseconds (0-30000) |
block_ads |
boolean | ❌ | true | Block ads using EasyList filters |
block_cookie_banners |
boolean | ❌ | true | Block cookie consent banners |
block_trackers |
boolean | ❌ | true | Block tracking scripts |
wait_until_network_idle |
boolean | ❌ | false | Wait for network activity to cease |
Request Example
curl -X POST "https://crawlshot.test/api/crawl" \
-H "Accept: application/json" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer 1|rrWUM5ZkmLfGipkm1oIusYX45KbukIekUwMjgB3Nd1121a5c" \
-d '{
"url": "https://example.com",
"timeout": 30,
"delay": 2000,
"block_ads": true,
"block_cookie_banners": true,
"block_trackers": true,
"wait_until_network_idle": true
}'
Response Example
{
"uuid": "b5dc483b-f62d-4e40-8b9e-4715324a8cbb",
"status": "queued",
"message": "Crawl job initiated successfully"
}
GET /api/crawl/{uuid}
Check the status and retrieve results of a crawl job.
Authentication: Required
Path Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
uuid |
string | ✅ | Job UUID returned from crawl initiation |
Request Example
curl -X GET "https://crawlshot.test/api/crawl/b5dc483b-f62d-4e40-8b9e-4715324a8cbb" \
-H "Accept: application/json" \
-H "Authorization: Bearer 1|rrWUM5ZkmLfGipkm1oIusYX45KbukIekUwMjgB3Nd1121a5c"
Response Examples
Queued Status:
{
"uuid": "b5dc483b-f62d-4e40-8b9e-4715324a8cbb",
"status": "queued",
"url": "https://example.com",
"created_at": "2025-08-10T10:00:42.000000Z"
}
Processing Status:
{
"uuid": "b5dc483b-f62d-4e40-8b9e-4715324a8cbb",
"status": "processing",
"url": "https://example.com",
"created_at": "2025-08-10T10:00:42.000000Z",
"started_at": "2025-08-10T10:00:45.000000Z"
}
Completed Status:
{
"uuid": "b5dc483b-f62d-4e40-8b9e-4715324a8cbb",
"status": "completed",
"url": "https://example.com",
"created_at": "2025-08-10T10:00:42.000000Z",
"started_at": "2025-08-10T10:00:45.000000Z",
"completed_at": "2025-08-10T10:01:12.000000Z",
"result": "<!doctype html>\n<html>\n<head>\n <title>Example Domain</title>\n</head>\n<body>\n <h1>Example Domain</h1>\n <p>This domain is for use in illustrative examples...</p>\n</body>\n</html>"
}
Failed Status:
{
"uuid": "b5dc483b-f62d-4e40-8b9e-4715324a8cbb",
"status": "failed",
"url": "https://example.com",
"created_at": "2025-08-10T10:00:42.000000Z",
"started_at": "2025-08-10T10:00:45.000000Z",
"completed_at": "2025-08-10T10:00:50.000000Z",
"error": "Timeout: Navigation failed after 30 seconds"
}
GET /api/crawl
List all crawl jobs with pagination (optional endpoint for debugging).
Authentication: Required
Request Example
curl -X GET "https://crawlshot.test/api/crawl" \
-H "Accept: application/json" \
-H "Authorization: Bearer 1|rrWUM5ZkmLfGipkm1oIusYX45KbukIekUwMjgB3Nd1121a5c"
Response Example
{
"jobs": [
{
"uuid": "b5dc483b-f62d-4e40-8b9e-4715324a8cbb",
"type": "crawl",
"url": "https://example.com",
"status": "completed",
"created_at": "2025-08-10T10:00:42.000000Z",
"completed_at": "2025-08-10T10:01:12.000000Z"
}
],
"pagination": {
"current_page": 1,
"total_pages": 5,
"total_items": 100,
"per_page": 20
}
}
Screenshot APIs
POST /api/shot
Initiate a screenshot job to capture an image of a webpage.
Authentication: Required
Request Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
url |
string | ✅ | - | Target URL to screenshot (max 2048 chars) |
viewport_width |
integer | ❌ | 1920 | Viewport width in pixels (320-3840) |
viewport_height |
integer | ❌ | 1080 | Viewport height in pixels (240-2160) |
format |
string | ❌ | "jpg" | Image format: "jpg", "png", "webp" |
quality |
integer | ❌ | 90 | Image quality 1-100 (for JPEG/WebP) |
timeout |
integer | ❌ | 30 | Request timeout in seconds (5-300) |
delay |
integer | ❌ | 0 | Wait time before capture in milliseconds (0-30000) |
block_ads |
boolean | ❌ | true | Block ads using EasyList filters |
block_cookie_banners |
boolean | ❌ | true | Block cookie consent banners |
block_trackers |
boolean | ❌ | true | Block tracking scripts |
Request Example
curl -X POST "https://crawlshot.test/api/shot" \
-H "Accept: application/json" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer 1|rrWUM5ZkmLfGipkm1oIusYX45KbukIekUwMjgB3Nd1121a5c" \
-d '{
"url": "https://example.com",
"viewport_width": 1920,
"viewport_height": 1080,
"format": "webp",
"quality": 90,
"timeout": 30,
"delay": 2000,
"block_ads": true,
"block_cookie_banners": true,
"block_trackers": true
}'
Response Example
{
"uuid": "fe37d511-99cb-4295-853b-6d484900a851",
"status": "queued",
"message": "Screenshot job initiated successfully"
}
GET /api/shot/{uuid}
Check the status and retrieve results of a screenshot job. When completed, returns base64 image data and download URL.
Authentication: Required
Path Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
uuid |
string | ✅ | Job UUID returned from screenshot initiation |
Request Example
curl -X GET "https://crawlshot.test/api/shot/fe37d511-99cb-4295-853b-6d484900a851" \
-H "Accept: application/json" \
-H "Authorization: Bearer 1|rrWUM5ZkmLfGipkm1oIusYX45KbukIekUwMjgB3Nd1121a5c"
Response Examples
Queued Status:
{
"uuid": "fe37d511-99cb-4295-853b-6d484900a851",
"status": "queued",
"url": "https://example.com",
"created_at": "2025-08-10T10:05:42.000000Z"
}
Processing Status:
{
"uuid": "fe37d511-99cb-4295-853b-6d484900a851",
"status": "processing",
"url": "https://example.com",
"created_at": "2025-08-10T10:05:42.000000Z",
"started_at": "2025-08-10T10:05:45.000000Z"
}
Completed Status:
{
"uuid": "fe37d511-99cb-4295-853b-6d484900a851",
"status": "completed",
"url": "https://example.com",
"created_at": "2025-08-10T10:05:42.000000Z",
"started_at": "2025-08-10T10:05:45.000000Z",
"completed_at": "2025-08-10T10:06:12.000000Z",
"result": {
"image_data": "iVBORw0KGgoAAAANSUhEUgAAAHgAAAAyCAYAAACXpx/Y...",
"download_url": "https://crawlshot.test/api/shot/fe37d511-99cb-4295-853b-6d484900a851/download",
"mime_type": "image/webp",
"format": "webp",
"width": 1920,
"height": 1080,
"size": 45678
}
}
Failed Status:
{
"uuid": "fe37d511-99cb-4295-853b-6d484900a851",
"status": "failed",
"url": "https://example.com",
"created_at": "2025-08-10T10:05:42.000000Z",
"started_at": "2025-08-10T10:05:45.000000Z",
"completed_at": "2025-08-10T10:05:50.000000Z",
"error": "Timeout: Navigation failed after 30 seconds"
}
GET /api/shot/{uuid}/download
Download the screenshot file directly. Returns the actual image file with appropriate headers.
Authentication: Required
Path Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
uuid |
string | ✅ | Job UUID of a completed screenshot job |
Request Example
curl -X GET "https://crawlshot.test/api/shot/fe37d511-99cb-4295-853b-6d484900a851/download" \
-H "Authorization: Bearer 1|rrWUM5ZkmLfGipkm1oIusYX45KbukIekUwMjgB3Nd1121a5c" \
--output screenshot.webp
Response
Returns the image file directly with appropriate Content-Type headers:
Content-Type: image/jpegfor JPEG filesContent-Type: image/pngfor PNG filesContent-Type: image/webpfor WebP files
GET /api/shot
List all screenshot jobs with pagination (optional endpoint for debugging).
Authentication: Required
Request Example
curl -X GET "https://crawlshot.test/api/shot" \
-H "Accept: application/json" \
-H "Authorization: Bearer 1|rrWUM5ZkmLfGipkm1oIusYX45KbukIekUwMjgB3Nd1121a5c"
Response Example
{
"jobs": [
{
"uuid": "fe37d511-99cb-4295-853b-6d484900a851",
"type": "shot",
"url": "https://example.com",
"status": "completed",
"created_at": "2025-08-10T10:05:42.000000Z",
"completed_at": "2025-08-10T10:06:12.000000Z"
}
],
"pagination": {
"current_page": 1,
"total_pages": 3,
"total_items": 50,
"per_page": 20
}
}
Job Status Flow
Both crawl and screenshot jobs follow the same status progression:
queued- Job created and waiting for processingprocessing- Job is currently being executed by a workercompleted- Job finished successfully, results availablefailed- Job encountered an error and could not complete
Error Responses
401 Unauthorized
{
"message": "Unauthenticated."
}
404 Not Found
{
"error": "Job not found"
}
422 Validation Error
{
"message": "The given data was invalid.",
"errors": {
"url": [
"The url field is required."
],
"timeout": [
"The timeout must be between 5 and 300."
]
}
}
Features
Ad & Tracker Blocking
- EasyList Integration: Automatically downloads and applies EasyList filters
- Cookie Banner Blocking: Removes cookie consent prompts
- Tracker Blocking: Blocks Google Analytics, Facebook Pixel, and other tracking scripts
- Custom Domain Blocking: Blocks common advertising and tracking domains
Image Processing
- Multiple Formats: Support for JPEG, PNG, and WebP
- Quality Control: Adjustable compression quality (1-100)
- Imagick Integration: High-quality image processing and format conversion
- Responsive Sizing: Custom viewport dimensions up to 4K resolution
Storage & Cleanup
- 24-Hour TTL: All files automatically deleted after 24 hours
- Scheduled Cleanup: Daily automated cleanup of expired files
- Manual Cleanup:
php artisan crawlshot:prune-storagecommand available
Performance
- Background Processing: All jobs processed asynchronously via Laravel Horizon
- Queue Management: Built-in retry logic and failure handling
- Caching: EasyList filters cached for optimal performance
- Monitoring: Horizon dashboard for real-time job monitoring at
/horizon
Rate Limiting
API endpoints include rate limiting to prevent abuse. Contact your system administrator for current rate limit settings.
Support
For technical support or questions about the Crawlshot API, please refer to the system documentation or contact your administrator.