
How crawl templates work

Projects & Crawling · May 20, 2026

When you create a new project in FireScraper, you'll see crawl templates at the top of the dialog. Templates are pre-configured starting points that set sensible defaults for common scraping scenarios.

What templates control

Each template adjusts these settings automatically:

  • Crawl depth — how many levels of links to follow from each start URL
  • Minimum text length — pages with fewer words than this threshold are skipped
  • Scraper mode — whether to extract article text, full-page text, or structured data
  • Deduplication — whether to skip pages with duplicate content

Available templates

Documentation & knowledge bases

Best for building RAG datasets and knowledge bases. Sets depth to 3, extracts article text, enables deduplication, and requires a minimum of 50 words per page. This is the default template.
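To make the word threshold and deduplication settings concrete, here is a minimal sketch of a page filter with this template's values (50 words, deduplication on). It assumes words are counted by splitting on whitespace and that a duplicate is a page whose normalized text hashes to a value already seen. FireScraper's actual word counting and duplicate detection aren't described in this article, and `PageFilter` is a hypothetical name.

```python
import hashlib


class PageFilter:
    """Hypothetical page filter: skips thin pages and exact-duplicate content.

    Assumptions (not FireScraper's documented behavior): words are counted by
    splitting on whitespace, and a page is a duplicate when its normalized text
    hashes to a value already seen in this crawl.
    """

    def __init__(self, min_words: int = 50, dedupe: bool = True) -> None:
        self.min_words = min_words
        self.dedupe = dedupe
        self._seen: set[str] = set()  # hashes of pages already accepted

    def accept(self, text: str) -> bool:
        words = text.split()
        # Minimum text length: skip pages below the word threshold.
        if len(words) < self.min_words:
            return False
        if self.dedupe:
            # Collapse whitespace and case so trivially different copies match.
            normalized = " ".join(words).lower()
            digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
            if digest in self._seen:
                return False
            self._seen.add(digest)
        return True


# Usage with the Documentation template's values (50 words, deduplication on).
page_filter = PageFilter(min_words=50, dedupe=True)
page = "word " * 60
print(page_filter.accept("Too short."))  # False: 2 words
print(page_filter.accept(page))          # True: 60 words, first time seen
print(page_filter.accept(page.upper()))  # False: same text after normalization
```

Exact hashing only catches pages whose text is identical after normalization. Pages that differ by a date or a sidebar would need near-duplicate detection, and this article doesn't say which approach FireScraper uses.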

Blog & articles

Best for content monitoring and LLM fine-tuning datasets. Uses depth 1 (follows links only one level away from the start URLs), sets a higher minimum of 80 words to skip thin pages, and enables deduplication.

Product & pricing pages

Best for pricing monitors and competitive analysis. Uses depth 2, captures full-page text (not just article content), and works well with structured data extraction.

Shallow / quick extract

Best for quick tests and one-off extractions. Sets depth to 0 (only scrapes the exact URLs you provide, with no link following) and applies no content filtering.
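The depth values used by these templates (0, 1, 2, and 3) are easiest to compare with a small breadth-first crawl. This is a minimal sketch, assuming depth is counted from each start URL as the article describes; the `crawl` function and the in-memory `links` map are hypothetical stand-ins for fetching pages and extracting their links, not FireScraper's crawler.

```python
from collections import deque


def crawl(start_urls: list[str], links: dict[str, list[str]], max_depth: int) -> list[str]:
    """Breadth-first crawl that follows links at most max_depth levels from the start URLs.

    `links` is a hypothetical in-memory stand-in for fetching a page and
    extracting its outgoing links. Returns URLs in the order they are visited.
    """
    starts = list(dict.fromkeys(start_urls))  # drop repeated start URLs, keep order
    visited = set(starts)
    queue = deque((url, 0) for url in starts)
    order: list[str] = []
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if depth >= max_depth:
            continue  # at the depth limit: scrape this page but follow none of its links
        for target in links.get(url, []):
            if target not in visited:
                visited.add(target)
                queue.append((target, depth + 1))
    return order


site = {
    "/docs": ["/docs/a", "/docs/b"],
    "/docs/a": ["/docs/a/1"],
    "/docs/a/1": ["/docs/a/1/x"],
}
print(crawl(["/docs"], site, 0))  # ['/docs']  (Shallow: start URLs only)
print(crawl(["/docs"], site, 1))  # ['/docs', '/docs/a', '/docs/b']  (Blog)
print(crawl(["/docs"], site, 3))  # all five pages  (Documentation)
```

Breadth-first order means every page at depth 1 is visited before any page at depth 2, and the `visited` set keeps a page reachable by several paths from being scraped twice.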

Templates are starting points, not constraints

After selecting a template, you can change any setting it applied. The template only saves you from configuring everything from scratch. If you change the depth or minimum text length after selecting a template, your custom values are used instead of the template's.
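The override rule boils down to "template values first, your values on top." Here is a minimal sketch of that idea, assuming settings can be modeled as an immutable record. `CrawlSettings`, `TEMPLATES`, `apply_template`, and the scraper-mode labels are hypothetical names, not FireScraper's API. Fields the article doesn't specify for a template are left as `None`, and reading "no content filtering" as "no word minimum and no deduplication" is an assumption.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class CrawlSettings:
    # None marks a value this article does not specify for a template.
    depth: Optional[int] = None
    min_words: Optional[int] = None
    scraper_mode: Optional[str] = None  # hypothetical labels: "article", "full_page", "structured"
    dedupe: Optional[bool] = None


# Values taken from the template descriptions in this article.
TEMPLATES = {
    "documentation": CrawlSettings(depth=3, min_words=50, scraper_mode="article", dedupe=True),
    "blog": CrawlSettings(depth=1, min_words=80, dedupe=True),
    "product": CrawlSettings(depth=2, scraper_mode="full_page"),
    # Assumption: "no content filtering" read as no word minimum and no deduplication.
    "shallow": CrawlSettings(depth=0, min_words=0, dedupe=False),
}


def apply_template(name: str = "documentation", **overrides) -> CrawlSettings:
    """Start from a template's values, then let any user-supplied values win."""
    # dataclasses.replace returns a new object and raises TypeError for unknown fields.
    return replace(TEMPLATES[name], **overrides)


settings = apply_template("documentation", depth=1)
print(settings)
# CrawlSettings(depth=1, min_words=50, scraper_mode='article', dedupe=True)
print(TEMPLATES["documentation"].depth)  # 3: the template itself is unchanged
```

Keeping the presets immutable means customizing one project's settings can never change what the template gives the next project.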

Tips

  • Start with the Documentation template if you're unsure — it works well for most sites
  • Use Shallow mode to test a single URL before committing to a deep crawl
  • Switch to Product mode when you need pricing tables or structured page content that article extraction would miss