ct 583a804073 Update

2025-08-10 21:10:33 +08:00

Crawlshot API Documentation

Crawlshot is a self-hosted web crawling and screenshot service built with Laravel and Spatie Browsershot. Its API exposes endpoints for extracting page HTML and capturing screenshots, with optional blocking of ads, cookie banners, and trackers.

Base URL

https://crawlshot.test

Authentication

All API endpoints except the health check require authentication with a Laravel Sanctum API token.

Authentication Header

Authorization: Bearer {your-api-token}

Example API Token

1|rrWUM5ZkmLfGipkm1oIusYX45KbukIekUwMjgB3Nd1121a5c
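
As a minimal sketch, the headers can be assembled in Python as shown here. `build_headers` is a hypothetical helper name, not part of Crawlshot; the header names are the ones used in the curl examples in this document.

```python
def build_headers(token: str, json_body: bool = False) -> dict[str, str]:
    """Return headers for an authenticated Crawlshot request.

    Hypothetical helper for illustration only.
    """
    if not token:
        raise ValueError("an API token is required")
    headers = {
        "Accept": "application/json",
        "Authorization": f"Bearer {token}",
    }
    if json_body:
        # Only needed for requests that send a JSON body (the POST endpoints).
        headers["Content-Type"] = "application/json"
    return headers
```

In practice the token would be read from an environment variable or a secrets store rather than hard-coded.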

Health Check

GET `/api/health`

Check if the Crawlshot service is running and healthy.

Authentication: Not required

Request Example

curl -X GET "https://crawlshot.test/api/health" \
  -H "Accept: application/json"

Response Example

{
  "status": "healthy",
  "timestamp": "2025-08-10T09:54:52.195383Z",
  "service": "crawlshot"
}
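
A monitoring script could interpret this response roughly as follows. This is a sketch using only the Python standard library; `is_healthy` and `check_health` are hypothetical names, and treating any network error as "unhealthy" is an assumption, not documented behaviour.

```python
import json
from urllib.request import Request, urlopen

BASE_URL = "https://crawlshot.test"  # base URL from this document


def is_healthy(body: str) -> bool:
    """Return True when a health response body reports status "healthy"."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("status") == "healthy"


def check_health(base_url: str = BASE_URL, timeout: float = 5.0) -> bool:
    """Call GET /api/health (no token needed) and interpret the response."""
    request = Request(f"{base_url}/api/health", headers={"Accept": "application/json"})
    try:
        with urlopen(request, timeout=timeout) as response:
            return is_healthy(response.read().decode("utf-8"))
    except OSError:
        # URLError and socket timeouts are OSError subclasses.
        return False
```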

Web Crawling APIs

POST `/api/crawl`

Initiate a web crawling job to extract HTML content from a URL.

Authentication: Required

Request Parameters

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `url` | string | ✅ | - | Target URL to crawl (max 2048 chars) |
| `timeout` | integer | ❌ | 30 | Request timeout in seconds (5-300) |
| `delay` | integer | ❌ | 0 | Wait time before capture in milliseconds (0-30000) |
| `block_ads` | boolean | ❌ | true | Block ads using EasyList filters |
| `block_cookie_banners` | boolean | ❌ | true | Block cookie consent banners |
| `block_trackers` | boolean | ❌ | true | Block tracking scripts |
| `wait_until_network_idle` | boolean | ❌ | false | Wait for network activity to cease |
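
The defaults and ranges in this table can be mirrored on the client so that obviously invalid requests fail before they are sent. The sketch below is an illustration only: `build_crawl_payload` and `CRAWL_DEFAULTS` are hypothetical names, and the server remains the authority on validation.

```python
# Defaults copied from the parameter table in this document.
CRAWL_DEFAULTS = {
    "timeout": 30,
    "delay": 0,
    "block_ads": True,
    "block_cookie_banners": True,
    "block_trackers": True,
    "wait_until_network_idle": False,
}


def build_crawl_payload(url: str, **options) -> dict:
    """Merge caller options over the documented defaults and check ranges."""
    unknown = set(options) - set(CRAWL_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")
    if not url or len(url) > 2048:
        raise ValueError("url is required and limited to 2048 characters")
    payload = {"url": url, **CRAWL_DEFAULTS, **options}
    if not 5 <= payload["timeout"] <= 300:
        raise ValueError("timeout must be between 5 and 300 seconds")
    if not 0 <= payload["delay"] <= 30000:
        raise ValueError("delay must be between 0 and 30000 milliseconds")
    return payload
```

For example, `build_crawl_payload("https://example.com", delay=2000)` returns a payload with `delay` set to 2000 and every other option at its documented default.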

Request Example

curl -X POST "https://crawlshot.test/api/crawl" \
  -H "Accept: application/json" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer 1|rrWUM5ZkmLfGipkm1oIusYX45KbukIekUwMjgB3Nd1121a5c" \
  -d '{
    "url": "https://example.com",
    "timeout": 30,
    "delay": 2000,
    "block_ads": true,
    "block_cookie_banners": true,
    "block_trackers": true,
    "wait_until_network_idle": true
  }'

Response Example

{
  "uuid": "b5dc483b-f62d-4e40-8b9e-4715324a8cbb",
  "status": "queued",
  "message": "Crawl job initiated successfully"
}
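
The same request can be prepared from Python. The following is a sketch built on the standard library's `urllib.request`; `make_crawl_request` and `job_uuid` are hypothetical helper names introduced for illustration.

```python
import json
from urllib.request import Request


def make_crawl_request(base_url: str, token: str, payload: dict) -> Request:
    """Build (but do not send) the POST /api/crawl request."""
    body = json.dumps(payload).encode("utf-8")
    return Request(
        f"{base_url}/api/crawl",
        data=body,
        method="POST",
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


def job_uuid(response_body: str) -> str:
    """Extract the job UUID from the initiation response shown above."""
    return json.loads(response_body)["uuid"]
```

Passing the returned object to `urllib.request.urlopen` sends the request; the UUID from the response body is then used with `GET /api/crawl/{uuid}`.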

GET `/api/crawl/{uuid}`

Check the status and retrieve results of a crawl job.

Authentication: Required

Path Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `uuid` | string | ✅ | Job UUID returned from crawl initiation |

Request Example

curl -X GET "https://crawlshot.test/api/crawl/b5dc483b-f62d-4e40-8b9e-4715324a8cbb" \
  -H "Accept: application/json" \
  -H "Authorization: Bearer 1|rrWUM5ZkmLfGipkm1oIusYX45KbukIekUwMjgB3Nd1121a5c"

Response Examples

Queued Status:

{
  "uuid": "b5dc483b-f62d-4e40-8b9e-4715324a8cbb",
  "status": "queued",
  "url": "https://example.com",
  "created_at": "2025-08-10T10:00:42.000000Z"
}

Processing Status:

{
  "uuid": "b5dc483b-f62d-4e40-8b9e-4715324a8cbb",
  "status": "processing",
  "url": "https://example.com",
  "created_at": "2025-08-10T10:00:42.000000Z",
  "started_at": "2025-08-10T10:00:45.000000Z"
}

Completed Status:

{
  "uuid": "b5dc483b-f62d-4e40-8b9e-4715324a8cbb",
  "status": "completed",
  "url": "https://example.com",
  "created_at": "2025-08-10T10:00:42.000000Z",
  "started_at": "2025-08-10T10:00:45.000000Z",
  "completed_at": "2025-08-10T10:01:12.000000Z",
  "result": "<!doctype html>\n<html>\n<head>\n    <title>Example Domain</title>\n</head>\n<body>\n    <h1>Example Domain</h1>\n    <p>This domain is for use in illustrative examples...</p>\n</body>\n</html>"
}

Failed Status:

{
  "uuid": "b5dc483b-f62d-4e40-8b9e-4715324a8cbb",
  "status": "failed",
  "url": "https://example.com",
  "created_at": "2025-08-10T10:00:42.000000Z",
  "started_at": "2025-08-10T10:00:45.000000Z",
  "completed_at": "2025-08-10T10:00:50.000000Z",
  "error": "Timeout: Navigation failed after 30 seconds"
}
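
Because a job is processed asynchronously, a client typically polls this endpoint until the status is `completed` or `failed`. The loop below is a sketch of that pattern. `wait_for_job` is a hypothetical helper, `fetch` stands for any callable that performs the GET request and returns the decoded JSON, and the 2-second interval and 60-attempt cap are arbitrary illustration values rather than documented recommendations.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_job(
    fetch: Callable[[], dict],
    interval: float = 2.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch() until the job reaches a terminal status, then return it."""
    for attempt in range(max_attempts):
        job = fetch()
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if attempt < max_attempts - 1:
            sleep(interval)  # pause between polls, but not after the last one
    raise TimeoutError(f"job still not finished after {max_attempts} polls")
```

The caller still has to inspect the returned `status`: a `failed` job carries an `error` string instead of a `result`.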

GET `/api/crawl`

List all crawl jobs with pagination (optional endpoint for debugging).

Authentication: Required

Request Example

curl -X GET "https://crawlshot.test/api/crawl" \
  -H "Accept: application/json" \
  -H "Authorization: Bearer 1|rrWUM5ZkmLfGipkm1oIusYX45KbukIekUwMjgB3Nd1121a5c"

Response Example

{
  "jobs": [
    {
      "uuid": "b5dc483b-f62d-4e40-8b9e-4715324a8cbb",
      "type": "crawl",
      "url": "https://example.com",
      "status": "completed",
      "created_at": "2025-08-10T10:00:42.000000Z",
      "completed_at": "2025-08-10T10:01:12.000000Z"
    }
  ],
  "pagination": {
    "current_page": 1,
    "total_pages": 5,
    "total_items": 100,
    "per_page": 20
  }
}
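
A listing response can be summarised on the client as sketched below. Both helper names are hypothetical. This document does not specify a query parameter for requesting other pages, so the sketch only reads the `pagination` object and does not attempt to fetch further pages.

```python
def remaining_pages(pagination: dict) -> int:
    """Number of pages after the current one, according to the response."""
    return max(pagination["total_pages"] - pagination["current_page"], 0)


def uuids_by_status(listing: dict, status: str) -> list[str]:
    """Collect the UUIDs of listed jobs that are in the given status."""
    return [job["uuid"] for job in listing["jobs"] if job["status"] == status]
```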

Screenshot APIs

POST `/api/shot`

Initiate a screenshot job to capture an image of a webpage.

Authentication: Required

Request Parameters

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `url` | string | ✅ | - | Target URL to screenshot (max 2048 chars) |
| `viewport_width` | integer | ❌ | 1920 | Viewport width in pixels (320-3840) |
| `viewport_height` | integer | ❌ | 1080 | Viewport height in pixels (240-2160) |
| `format` | string | ❌ | "jpg" | Image format: "jpg", "png", "webp" |
| `quality` | integer | ❌ | 90 | Image quality 1-100 (for JPEG/WebP) |
| `timeout` | integer | ❌ | 30 | Request timeout in seconds (5-300) |
| `delay` | integer | ❌ | 0 | Wait time before capture in milliseconds (0-30000) |
| `block_ads` | boolean | ❌ | true | Block ads using EasyList filters |
| `block_cookie_banners` | boolean | ❌ | true | Block cookie consent banners |
| `block_trackers` | boolean | ❌ | true | Block tracking scripts |
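
A client can pre-check screenshot options against the ranges in this table and collect every problem at once, in a shape similar to the API's 422 response. This is a sketch only: `shot_option_errors`, `SHOT_RANGES`, and `SHOT_FORMATS` are hypothetical names, and the error wording is invented for illustration.

```python
# Inclusive ranges copied from the parameter table in this document.
SHOT_RANGES = {
    "viewport_width": (320, 3840),
    "viewport_height": (240, 2160),
    "quality": (1, 100),
    "timeout": (5, 300),
    "delay": (0, 30000),
}
SHOT_FORMATS = ("jpg", "png", "webp")


def shot_option_errors(options: dict) -> dict[str, str]:
    """Return {option name: problem} for every out-of-range option."""
    errors = {}
    for name, (low, high) in SHOT_RANGES.items():
        value = options.get(name)
        # Options that are omitted fall back to the server-side default.
        if value is not None and not low <= value <= high:
            errors[name] = f"must be between {low} and {high}"
    if options.get("format", "jpg") not in SHOT_FORMATS:
        errors["format"] = f"must be one of {', '.join(SHOT_FORMATS)}"
    return errors
```

An empty dictionary means no documented range was violated; the `url` field is deliberately left to the server to validate.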

Request Example

curl -X POST "https://crawlshot.test/api/shot" \
  -H "Accept: application/json" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer 1|rrWUM5ZkmLfGipkm1oIusYX45KbukIekUwMjgB3Nd1121a5c" \
  -d '{
    "url": "https://example.com",
    "viewport_width": 1920,
    "viewport_height": 1080,
    "format": "webp",
    "quality": 90,
    "timeout": 30,
    "delay": 2000,
    "block_ads": true,
    "block_cookie_banners": true,
    "block_trackers": true
  }'

Response Example

{
  "uuid": "fe37d511-99cb-4295-853b-6d484900a851",
  "status": "queued",
  "message": "Screenshot job initiated successfully"
}

GET `/api/shot/{uuid}`

Check the status and retrieve the results of a screenshot job. Once the job has completed, the response includes the base64-encoded image data and a download URL.

Authentication: Required

Path Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `uuid` | string | ✅ | Job UUID returned from screenshot initiation |

Request Example

curl -X GET "https://crawlshot.test/api/shot/fe37d511-99cb-4295-853b-6d484900a851" \
  -H "Accept: application/json" \
  -H "Authorization: Bearer 1|rrWUM5ZkmLfGipkm1oIusYX45KbukIekUwMjgB3Nd1121a5c"

Response Examples

Queued Status:

{
  "uuid": "fe37d511-99cb-4295-853b-6d484900a851",
  "status": "queued",
  "url": "https://example.com",
  "created_at": "2025-08-10T10:05:42.000000Z"
}

Processing Status:

{
  "uuid": "fe37d511-99cb-4295-853b-6d484900a851",
  "status": "processing",
  "url": "https://example.com",
  "created_at": "2025-08-10T10:05:42.000000Z",
  "started_at": "2025-08-10T10:05:45.000000Z"
}

Completed Status:

{
  "uuid": "fe37d511-99cb-4295-853b-6d484900a851",
  "status": "completed",
  "url": "https://example.com",
  "created_at": "2025-08-10T10:05:42.000000Z",
  "started_at": "2025-08-10T10:05:45.000000Z",
  "completed_at": "2025-08-10T10:06:12.000000Z",
  "result": {
    "image_data": "iVBORw0KGgoAAAANSUhEUgAAAHgAAAAyCAYAAACXpx/Y...",
    "download_url": "https://crawlshot.test/api/shot/fe37d511-99cb-4295-853b-6d484900a851/download",
    "mime_type": "image/webp",
    "format": "webp",
    "width": 1920,
    "height": 1080,
    "size": 45678
  }
}
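
The `image_data` field of a completed job can be turned back into image bytes as sketched below. `decode_screenshot` is a hypothetical helper, and the sketch assumes `image_data` is plain standard base64 without a `data:` URI prefix, which matches the example above but is not stated explicitly in this document.

```python
import base64


def decode_screenshot(job: dict) -> tuple[str, bytes]:
    """Return (suggested filename, raw image bytes) for a completed job."""
    if job.get("status") != "completed":
        raise ValueError(f"job is {job.get('status')!r}, not completed")
    result = job["result"]
    # validate=True rejects characters outside the base64 alphabet
    # instead of silently skipping them.
    image = base64.b64decode(result["image_data"], validate=True)
    return f"{job['uuid']}.{result['format']}", image
```

The returned bytes can be written to disk with `pathlib.Path(name).write_bytes(image)`. For large images, the download endpoint avoids the base64 overhead.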

Failed Status:

{
  "uuid": "fe37d511-99cb-4295-853b-6d484900a851",
  "status": "failed",
  "url": "https://example.com",
  "created_at": "2025-08-10T10:05:42.000000Z",
  "started_at": "2025-08-10T10:05:45.000000Z",
  "completed_at": "2025-08-10T10:05:50.000000Z",
  "error": "Timeout: Navigation failed after 30 seconds"
}

GET `/api/shot/{uuid}/download`

Download the screenshot file directly. Returns the actual image file with appropriate headers.

Authentication: Required

Path Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `uuid` | string | ✅ | Job UUID of a completed screenshot job |

Request Example

curl -X GET "https://crawlshot.test/api/shot/fe37d511-99cb-4295-853b-6d484900a851/download" \
  -H "Authorization: Bearer 1|rrWUM5ZkmLfGipkm1oIusYX45KbukIekUwMjgB3Nd1121a5c" \
  --output screenshot.webp

Response

Returns the image file directly with appropriate Content-Type headers:

Content-Type: image/jpeg for JPEG files
Content-Type: image/png for PNG files
Content-Type: image/webp for WebP files
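
When saving a download, the file extension can be derived from the `Content-Type` header rather than guessed. A small sketch follows; `extension_for` is a hypothetical helper, and the mapping covers only the three types listed above.

```python
# Content types listed in this document, mapped to the API's format names.
CONTENT_TYPES = {
    "image/jpeg": "jpg",
    "image/png": "png",
    "image/webp": "webp",
}


def extension_for(content_type: str) -> str:
    """Map a Content-Type header value to a file extension."""
    # Drop any parameters after ";" and normalise case before the lookup.
    media_type = content_type.split(";", 1)[0].strip().lower()
    try:
        return CONTENT_TYPES[media_type]
    except KeyError:
        raise ValueError(f"unexpected Content-Type: {content_type!r}") from None
```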

GET `/api/shot`

List all screenshot jobs with pagination (optional endpoint for debugging).

Authentication: Required

Request Example

curl -X GET "https://crawlshot.test/api/shot" \
  -H "Accept: application/json" \
  -H "Authorization: Bearer 1|rrWUM5ZkmLfGipkm1oIusYX45KbukIekUwMjgB3Nd1121a5c"

Response Example

{
  "jobs": [
    {
      "uuid": "fe37d511-99cb-4295-853b-6d484900a851",
      "type": "shot",
      "url": "https://example.com",
      "status": "completed",
      "created_at": "2025-08-10T10:05:42.000000Z",
      "completed_at": "2025-08-10T10:06:12.000000Z"
    }
  ],
  "pagination": {
    "current_page": 1,
    "total_pages": 3,
    "total_items": 50,
    "per_page": 20
  }
}

Job Status Flow

Both crawl and screenshot jobs move through the same statuses, starting at queued and ending in either completed or failed:

queued - Job created and waiting for processing
processing - Job is currently being executed by a worker
completed - Job finished successfully, results available
failed - Job encountered an error and could not complete
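
For client-side sanity checks, the progression can be modelled as a small state table. This is a sketch: the names are hypothetical, and it assumes that a job only fails after it has started processing. Whether a job can go from `queued` straight to `failed` is not stated in this document.

```python
from enum import Enum


class JobStatus(str, Enum):
    QUEUED = "queued"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"


# Assumed transitions; completed and failed have no successors.
ALLOWED_NEXT = {
    JobStatus.QUEUED: {JobStatus.PROCESSING},
    JobStatus.PROCESSING: {JobStatus.COMPLETED, JobStatus.FAILED},
    JobStatus.COMPLETED: set(),
    JobStatus.FAILED: set(),
}


def is_terminal(status: str) -> bool:
    """True for statuses that will not change again."""
    return not ALLOWED_NEXT[JobStatus(status)]


def can_transition(current: str, new: str) -> bool:
    """True if moving from current to new matches the assumed flow."""
    return JobStatus(new) in ALLOWED_NEXT[JobStatus(current)]
```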

Error Responses

401 Unauthorized

{
  "message": "Unauthenticated."
}

404 Not Found

{
  "error": "Job not found"
}

422 Validation Error

{
  "message": "The given data was invalid.",
  "errors": {
    "url": [
      "The url field is required."
    ],
    "timeout": [
      "The timeout must be between 5 and 300."
    ]
  }
}
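
The three error shapes above differ slightly: 401 uses a `message` key, 404 uses an `error` key, and 422 adds a per-field `errors` object. A client could normalise them into a flat list of messages as sketched here; `describe_error` is a hypothetical helper, and the fallback text is invented for illustration.

```python
def describe_error(status_code: int, body: dict) -> list[str]:
    """Flatten an error response body into human-readable messages."""
    if status_code == 422:
        # One "field: message" line per validation failure.
        return [
            f"{field}: {message}"
            for field, messages in body.get("errors", {}).items()
            for message in messages
        ]
    # 401 responses carry "message", 404 responses carry "error".
    text = body.get("message") or body.get("error") or "unknown error"
    return [text]
```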

Features

Ad & Tracker Blocking

EasyList Integration: Automatically downloads and applies EasyList filters
Cookie Banner Blocking: Removes cookie consent prompts
Tracker Blocking: Blocks Google Analytics, Facebook Pixel, and other tracking scripts
Custom Domain Blocking: Blocks common advertising and tracking domains

Image Processing

Multiple Formats: Support for JPEG, PNG, and WebP
Quality Control: Adjustable compression quality (1-100)
Imagick Integration: High-quality image processing and format conversion
Responsive Sizing: Custom viewport dimensions up to 4K resolution

Storage & Cleanup

24-Hour TTL: All files automatically deleted after 24 hours
Scheduled Cleanup: Daily automated cleanup of expired files
Manual Cleanup: `php artisan crawlshot:prune-storage` command available

Performance

Background Processing: All jobs processed asynchronously via Laravel Horizon
Queue Management: Built-in retry logic and failure handling
Caching: EasyList filters cached for optimal performance
Monitoring: Horizon dashboard for real-time job monitoring at `/horizon`

Rate Limiting

API endpoints include rate limiting to prevent abuse. Contact your system administrator for current rate limit settings.

Support

For technical support or questions about the Crawlshot API, please refer to the system documentation or contact your administrator.
