Extracting structured data with schemas
The Structured extraction schema field lets you define a JSON schema so FireScraper extracts specific, typed fields from every page it crawls — instead of returning raw text.
When to use a schema
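Use structured extraction when you need specific data points from pages, not just their text content. Common use cases include product pricing, job listings, and blog article metadata. Each of these has an example schema below.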
How it works
FireScraper applies your schema to every page it crawls and collects the results in a corpus-extracted.json file you can download.
Schema format
The schema must be a valid JSON object with a properties field. Each key under properties is a field name, and its value is an object that sets the field's type.
Supported types
| Type | Description | Example value |
|------|-------------|---------------|
| string | Text value | "Widget Pro" |
| number | Numeric value | 49.99 |
| boolean | True or false | true |
| array | List of values | ["red", "blue"] |
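A quick pre-flight check can catch schema mistakes before you start a crawl. The sketch below applies only the rules stated on this page: the schema is a JSON object, it has a properties object, and each property uses one of the four supported types. FireScraper's own validation may be stricter or looser, and `validate_schema` is a hypothetical helper, not part of FireScraper.

```python
import json

# The four property types listed in the Supported types table.
SUPPORTED_TYPES = {"string", "number", "boolean", "array"}


def validate_schema(schema_text: str) -> list:
    """Return a list of problems; an empty list means the schema passes these checks."""
    try:
        schema = json.loads(schema_text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(schema, dict):
        return ["schema must be a JSON object"]
    properties = schema.get("properties")
    if not isinstance(properties, dict):
        return ["schema needs a 'properties' object"]
    problems = []
    for name, spec in properties.items():
        # Each property should be an object whose "type" is one supported type name.
        field_type = spec.get("type") if isinstance(spec, dict) else None
        if not isinstance(field_type, str) or field_type not in SUPPORTED_TYPES:
            problems.append(f"field '{name}' has unsupported type {field_type!r}")
    return problems


print(validate_schema('{"type": "object", "properties": {"price": {"type": "number"}}}'))
# []
print(validate_schema('{"properties": {"price": {"type": "integer"}}}'))
# ["field 'price' has unsupported type 'integer'"]
```

Returning every problem at once, rather than stopping at the first bad field, lets you fix a schema in one pass.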
Example schemas
Product pricing
```json
{
  "type": "object",
  "properties": {
    "title": { "type": "string" },
    "price": { "type": "number" },
    "description": { "type": "string" }
  }
}
```
Output per page:
```json
{
  "url": "https://store.example.com/widget-pro",
  "extracted": {
    "title": "Widget Pro",
    "price": 49.99,
    "description": "Professional-grade widget with 3-year warranty"
  }
}
```
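Before relying on the output, it can help to count how often each field comes back empty. This sketch assumes that corpus-extracted.json is a JSON array of per-page records shaped like the one above, and that a missing value appears as an absent key, null, or an empty string or list. Both are assumptions rather than documented behavior, and `load_records` and `missing_field_report` are hypothetical helper names.

```python
import json
from collections import Counter


def load_records(path: str = "corpus-extracted.json") -> list:
    """Load per-page records (assumed: a JSON array of {url, extracted} objects)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def missing_field_report(records: list, fields: list) -> dict:
    """Count, for each schema field, how many pages lack a usable value."""
    missing = Counter()
    for record in records:
        extracted = record.get("extracted") or {}
        for field in fields:
            value = extracted.get(field)
            # Treat absent keys, nulls, and empty strings or lists as missing.
            if value is None or value == "" or value == []:
                missing[field] += 1
    return {field: missing[field] for field in fields}


# The product pricing record from this article, used as sample input.
sample = [{
    "url": "https://store.example.com/widget-pro",
    "extracted": {
        "title": "Widget Pro",
        "price": 49.99,
        "description": "Professional-grade widget with 3-year warranty",
    },
}]
print(missing_field_report(sample, ["title", "price", "description"]))
# {'title': 0, 'price': 0, 'description': 0}
```

On a real crawl you would pass `load_records()` and the field names from your own schema; a field that is missing on most pages usually points to a label that does not match the page text.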
Job listings
```json
{
  "type": "object",
  "properties": {
    "title": { "type": "string" },
    "company": { "type": "string" },
    "location": { "type": "string" },
    "salary": { "type": "string" },
    "remote": { "type": "boolean" }
  }
}
```
Blog articles
```json
{
  "type": "object",
  "properties": {
    "title": { "type": "string" },
    "author": { "type": "string" },
    "tags": { "type": "array" }
  }
}
```
Array values are split from text using commas, semicolons, or pipe characters. For example, "AI, Machine Learning, NLP" becomes ["AI", "Machine Learning", "NLP"].
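The splitting rule can be approximated in a few lines. This is a sketch, not FireScraper's code: trimming whitespace around each item matches the example above, while dropping empty items (for input such as "a,,b") is an assumption. `split_array_value` is a hypothetical name.

```python
import re

# Separators named in this article: commas, semicolons, and pipes.
ARRAY_SEPARATORS = re.compile(r"[,;|]")


def split_array_value(text: str) -> list:
    """Split a text value into array items, trimming whitespace and skipping empties."""
    return [part.strip() for part in ARRAY_SEPARATORS.split(text) if part.strip()]


print(split_array_value("AI, Machine Learning, NLP"))
# ['AI', 'Machine Learning', 'NLP']
```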
How extraction works internally
FireScraper uses three strategies to match your schema fields to page content:
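1. A field named title maps to the page's HTML <title> tag, and a field named url maps to the page URL.
2. Text such as "Price: $50" or "Author: Jane Smith" is matched using the field name as a label.
3. Matched values are converted to the field's declared type (for example, a number field strips the $ and converts "49.99" to the number 49.99).
Tips
- Use descriptive field names: price works better than p or field_3, because the field name is the label FireScraper looks for on the page.
- Use string as the default type when unsure. It's the most forgiving.
- Use number for prices. FireScraper automatically strips currency symbols and extracts the numeric value.
- Check the corpus-extracted.json download to verify extraction quality before building a pipeline around it.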
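To make the three matching strategies concrete, here is a rough sketch of how they could fit together for a single page. It is not FireScraper's implementation. The function names, the separate `page_html` and `page_text` inputs, the boolean rule, and the handling of thousands separators are all assumptions made for illustration.

```python
import html
import re
from typing import Optional


def page_title(page_html: str) -> Optional[str]:
    """Return the text of the HTML <title> tag, if any."""
    match = re.search(r"<title[^>]*>(.*?)</title>", page_html, re.IGNORECASE | re.DOTALL)
    return html.unescape(match.group(1)).strip() if match else None


def find_labeled_value(field: str, page_text: str) -> Optional[str]:
    """Label matching: find a line such as "Price: $50" whose label is the field name."""
    pattern = rf"^\s*{re.escape(field)}\s*:\s*(.+?)\s*$"
    match = re.search(pattern, page_text, re.IGNORECASE | re.MULTILINE)
    return match.group(1) if match else None


def coerce(raw: str, field_type: str):
    """Convert a matched string to the schema type."""
    if field_type == "number":
        # Skip currency symbols and keep the first number; dropping thousands commas is assumed.
        match = re.search(r"-?\d[\d,]*(?:\.\d+)?", raw)
        return float(match.group(0).replace(",", "")) if match else None
    if field_type == "boolean":
        # Hypothetical rule: this article does not document how booleans are parsed.
        return raw.strip().lower() in {"true", "yes", "y", "1"}
    if field_type == "array":
        # Split on commas, semicolons, or pipes, as described for array fields.
        return [part.strip() for part in re.split(r"[,;|]", raw) if part.strip()]
    return raw.strip()


def extract(schema: dict, url: str, page_html: str, page_text: str) -> dict:
    """Apply the three strategies to one page and return a per-page record."""
    extracted = {}
    for field, spec in schema["properties"].items():
        if field == "title":
            raw = page_title(page_html)  # title -> HTML <title> tag
        elif field == "url":
            raw = url  # url -> page URL
        else:
            raw = find_labeled_value(field, page_text)
        extracted[field] = None if raw is None else coerce(raw, spec["type"])
    return {"url": url, "extracted": extracted}
```

Keeping each strategy in its own function makes label matching and type conversion easy to test separately. A real crawler would also need to derive `page_text` from the HTML, which this sketch leaves to the caller.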
