# Module 01 — Web Crawler & Discovery

Icon: Magnifying Glass | Colour: Blue
## Overview
The crawler is the first module to run and forms the foundation for all subsequent tests. It discovers pages, endpoints, forms, API routes, and JavaScript files across the target application.
## How It Works
- Well-known path probing — checks approximately 40 common paths (e.g. `/robots.txt`, `/sitemap.xml`, `/.env`, `/wp-admin`, `/api/docs`).
- Breadth-first crawl — follows links from the start URL up to the configured depth and page limits.
- Dual-mode fetching — crawls both anonymously and with the provided Bearer token (if supplied) to discover authenticated-only content.
- Form & link extraction — identifies all HTML forms, anchor links, and script tags.
- OPTIONS requests — probes discovered endpoints to check for available HTTP methods.
- JavaScript analysis — extracts API endpoint patterns from JavaScript files (e.g. `fetch('/api/...')`).
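The breadth-first crawl with depth and page limits can be sketched as below. `bfs_crawl` and its parameters are illustrative, not the module's actual API; the page-fetching step is injected as a callable so the sketch stays self-contained:

```python
from collections import deque
from typing import Callable, Iterable, Set

def bfs_crawl(start_url: str,
              fetch_links: Callable[[str], Iterable[str]],
              max_depth: int = 2,
              max_pages: int = 100) -> Set[str]:
    """Breadth-first crawl from start_url, honouring depth and page limits.

    fetch_links(url) should return the outbound links found on that page.
    """
    visited = {start_url}
    queue = deque([(start_url, 0)])  # (url, depth) pairs
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not expand pages at the depth limit
        for link in fetch_links(url):
            if link not in visited and len(visited) < max_pages:
                visited.add(link)
                queue.append((link, depth + 1))
    return visited
```

With an in-memory site map, `bfs_crawl("/", lambda u: site.get(u, []), max_depth=2)` returns every page reachable within two link hops, stopping early once the page limit is hit.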
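Form & link extraction can be illustrated with Python's standard `html.parser`; the class below is a hypothetical sketch of collecting anchors, form actions, and script sources, not the module's implementation:

```python
from html.parser import HTMLParser

class PageExtractor(HTMLParser):
    """Collects anchor hrefs, form actions, and script srcs from one HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []
        self.forms: list[str] = []
        self.scripts: list[str] = []

    def handle_starttag(self, tag, attrs) -> None:
        attrs = dict(attrs)  # attrs arrives as a list of (name, value) tuples
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "form":
            self.forms.append(attrs.get("action", ""))
        elif tag == "script" and attrs.get("src"):
            self.scripts.append(attrs["src"])
```

Feeding a page via `extractor.feed(html)` populates the three lists, which a crawler can then queue or report on.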
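The JavaScript analysis step might rely on a simple pattern match. The regex below is an illustrative assumption: it only covers literal `fetch('/...')` calls with a quoted path, whereas the real module presumably recognises more patterns:

```python
import re

# Matches fetch('/path') and fetch("/path") calls in JavaScript source.
API_CALL_RE = re.compile(r"""fetch\(\s*['"](/[^'"]+)['"]""")

def extract_api_endpoints(js_source: str) -> list[str]:
    """Return unique endpoint paths referenced via fetch() in a JS file."""
    seen: list[str] = []
    for path in API_CALL_RE.findall(js_source):
        if path not in seen:  # deduplicate while preserving order
            seen.append(path)
    return seen
```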
## Expected Findings
| Finding | Severity |
|---|---|
| Crawl Summary | Info |
| Sensitive file discovered | Medium |
| Sensitive HTML comment | Low |
## Tips
!!! tip
    Providing a Bearer token when creating the scan significantly improves crawl coverage on applications with authenticated areas.
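Assuming the scanner sends the token as a standard `Authorization: Bearer` header (an assumption, not confirmed by this page), dual-mode fetching amounts to crawling once per header set:

```python
from typing import Dict, List, Optional

def build_header_sets(bearer_token: Optional[str]) -> List[Dict[str, str]]:
    """Header sets for dual-mode fetching: anonymous always,
    plus an authenticated set when a token was supplied."""
    modes: List[Dict[str, str]] = [{}]  # anonymous mode
    if bearer_token:
        modes.append({"Authorization": f"Bearer {bearer_token}"})
    return modes
```

A crawler would then repeat each request once per header set and diff the results to flag authenticated-only content.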
!!! tip
    Use Skip Paths to prevent the crawler from visiting URLs that might cause side effects (e.g. `/logout`, `/delete-account`).
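One plausible way Skip Paths could be applied is path-prefix matching; the exact matching rules are an assumption here, and `is_skipped` is a hypothetical helper:

```python
from urllib.parse import urlparse

def is_skipped(url: str, skip_paths: list[str]) -> bool:
    """True if the URL's path equals a skip entry or sits beneath it.

    '/logout' skips '/logout' and '/logout/confirm' but not '/logoutx'.
    """
    path = urlparse(url).path
    return any(path == p or path.startswith(p.rstrip("/") + "/")
               for p in skip_paths)
```

A crawler would call this check before enqueuing any discovered link, so side-effecting URLs are never fetched at all.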