# Module 01 — Web Crawler & Discovery

Icon: Magnifying Glass | Colour: Blue
## Overview
The crawler is the first module to run and forms the foundation for all subsequent tests. It discovers pages, endpoints, forms, API routes, and JavaScript files across the target application.
## How It Works
- Well-known path probing — checks approximately 40 common paths (e.g. `/robots.txt`, `/sitemap.xml`, `/.env`, `/wp-admin`, `/api/docs`).
- Breadth-first crawl — follows links from the start URL up to the configured depth and page limits.
- Dual-mode fetching — crawls both anonymously and with the provided Bearer token (if supplied) to discover authenticated-only content.
- Form & link extraction — identifies all HTML forms, anchor links, and script tags.
- OPTIONS requests — probes discovered endpoints to check for available HTTP methods.
- JavaScript analysis — extracts API endpoint patterns from JavaScript files (e.g. `fetch('/api/...')`).
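The breadth-first crawl with depth and page limits can be sketched as below. `bfs_crawl` and its parameters are illustrative, not the module's actual API; the page-fetching step is injected as a callable so the sketch stays self-contained:

```python
from collections import deque
from typing import Callable, Iterable, Set

def bfs_crawl(start_url: str,
              fetch_links: Callable[[str], Iterable[str]],
              max_depth: int = 2,
              max_pages: int = 100) -> Set[str]:
    """Breadth-first crawl from start_url, honouring depth and page limits.

    fetch_links(url) should return the outbound links found on that page.
    """
    visited = {start_url}
    queue = deque([(start_url, 0)])  # (url, depth) pairs
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not expand pages at the depth limit
        for link in fetch_links(url):
            if link not in visited and len(visited) < max_pages:
                visited.add(link)
                queue.append((link, depth + 1))
    return visited
```

With an in-memory site map, `bfs_crawl("/", lambda u: site.get(u, []), max_depth=2)` returns every page reachable within two link hops, stopping early once the page limit is hit.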
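Form & link extraction can be illustrated with Python's standard `html.parser`; the class below is a hypothetical sketch of collecting anchors, form actions, and script sources, not the module's implementation:

```python
from html.parser import HTMLParser

class PageExtractor(HTMLParser):
    """Collects anchor hrefs, form actions, and script srcs from one HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []
        self.forms: list[str] = []
        self.scripts: list[str] = []

    def handle_starttag(self, tag, attrs) -> None:
        attrs = dict(attrs)  # attrs arrives as a list of (name, value) tuples
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "form":
            self.forms.append(attrs.get("action", ""))
        elif tag == "script" and attrs.get("src"):
            self.scripts.append(attrs["src"])
```

Feeding a page via `extractor.feed(html)` populates the three lists, which a crawler can then queue or report on.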
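The JavaScript analysis step might rely on a simple pattern match. The regex below is an illustrative assumption: it only covers literal `fetch('/...')` calls with a quoted path, whereas the real module presumably recognises more patterns:

```python
import re

# Matches fetch('/path') and fetch("/path") calls in JavaScript source.
API_CALL_RE = re.compile(r"""fetch\(\s*['"](/[^'"]+)['"]""")

def extract_api_endpoints(js_source: str) -> list[str]:
    """Return unique endpoint paths referenced via fetch() in a JS file."""
    seen: list[str] = []
    for path in API_CALL_RE.findall(js_source):
        if path not in seen:  # deduplicate while preserving order
            seen.append(path)
    return seen
```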
## Expected Findings
| Finding | Severity |
|---|---|
| Crawl Summary | Info |
| Sensitive file discovered | Medium |
| Sensitive HTML comment | Low |
## Tips
!!! tip
    Providing a Bearer token when creating the scan significantly improves crawl coverage on applications with authenticated areas.
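Assuming the scanner sends the token as a standard `Authorization: Bearer` header (an assumption, not confirmed by this page), dual-mode fetching amounts to crawling once per header set:

```python
from typing import Dict, List, Optional

def build_header_sets(bearer_token: Optional[str]) -> List[Dict[str, str]]:
    """Header sets for dual-mode fetching: anonymous always,
    plus an authenticated set when a token was supplied."""
    modes: List[Dict[str, str]] = [{}]  # anonymous mode
    if bearer_token:
        modes.append({"Authorization": f"Bearer {bearer_token}"})
    return modes
```

A crawler would then repeat each request once per header set and diff the results to flag authenticated-only content.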
!!! tip
    Use Skip Paths to prevent the crawler from visiting URLs that might cause side effects (e.g. `/logout`, `/delete-account`).
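One plausible way Skip Paths could be applied is path-prefix matching; the exact matching rules are an assumption here, and `is_skipped` is a hypothetical helper:

```python
from urllib.parse import urlparse

def is_skipped(url: str, skip_paths: list[str]) -> bool:
    """True if the URL's path equals a skip entry or sits beneath it.

    '/logout' skips '/logout' and '/logout/confirm' but not '/logoutx'.
    """
    path = urlparse(url).path
    return any(path == p or path.startswith(p.rstrip("/") + "/")
               for p in skip_paths)
```

A crawler would call this check before enqueuing any discovered link, so side-effecting URLs are never fetched at all.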