A wordlist for content discovery, optimized for Burp Suite Professional and dirsearch, built by scraping and analyzing `/robots.txt` from the top 100k most visited domains in February 2026.
| File | Tool | Description |
|---|---|---|
| `dirsearch-robots.txt` | dirsearch | Combined wordlist with %EXT% placeholders for dirsearch's extension handling |
| `burp-robots-files.txt` | Burp Suite | Files only (required by Burp's Content Discovery) |
| `burp-robots-directories.txt` | Burp Suite | Directories only (required by Burp's Content Discovery) |
The underlying content is the same; the Burp lists are simply the dirsearch list split into files and directories.
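For illustration, such a split can be sketched in a few lines of Python. The extension-based heuristic below is an assumption made for demonstration; the repository's actual split logic may differ:

```python
# Minimal sketch: split the combined dirsearch list into the two Burp lists.
# Assumption: an entry whose last path segment contains a dot (or the
# dirsearch-specific %EXT% placeholder) is a file; everything else is a
# directory.
from pathlib import Path

files, dirs = [], []
for entry in Path("dirsearch-robots.txt").read_text().splitlines():
    entry = entry.strip()
    if not entry:
        continue
    last = entry.rstrip("/").rsplit("/", 1)[-1]
    (files if ("." in last or "%EXT%" in last) else dirs).append(entry)

Path("burp-robots-files.txt").write_text("\n".join(files) + "\n")
Path("burp-robots-directories.txt").write_text("\n".join(dirs) + "\n")
```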
The wordlist contains one entry per line and is optimized for recursive scanning:
```bash
python3 dirsearch.py --random-agent -u https://target.com \
  -w dirsearch-robots.txt \
  --recursive -R 3
```

The wordlist uses %EXT% placeholders for server-side files. Define extensions based on the target stack to keep scans efficient and avoid testing irrelevant file types:
```bash
python3 dirsearch.py --random-agent -u https://target.com \
  -w dirsearch-robots.txt \
  --recursive -R 3 \
  -e php,html
```

With `-e php,html`, for example, an entry such as `admin.%EXT%` is requested as both `admin.php` and `admin.html`.

The wordlist is primarily lowercase. Let dirsearch handle case transformations automatically:
```bash
python3 dirsearch.py -u https://target.com \
  -w dirsearch-robots.txt \
  --recursive -R 3 \
  -e php \
  --capital
```

- Choose extensions based on the target stack to avoid unnecessary requests.
- Adjust case transformations depending on the target environment.
- Use recursion for deeper discovery.
- Refer to the dirsearch and Burp documentation for additional tuning options.
In pentests, a common question is: Which wordlist should I use for content discovery?
For many testers, the go-to choice is `SecLists/Discovery/Web-Content`. However, many of those wordlists come with practical limitations:
- **Outdated coverage:** Some lists are up to 9 years old and don't reflect modern applications and technologies.
- **Redundant extensions:** Entries like `file.php`, `file.html`, and `file.json` test the same resource with multiple extensions, many of which may not exist due to the target's technology stack, unnecessarily increasing scan time.
- **Overlap between lists:** The same entries appear across multiple wordlists, leading to duplicate requests.
- **Noisy entries:** Static assets (e.g. JavaScript files) and questionable entries (looking at you, `raft-*.txt`) add bulk without value.
The result is unnecessary requests, increased brute-force time, and less focused testing.
This project aims to create a universal and (relatively) compact wordlist that captures the most common directories and files while leveraging dirsearch's built-in features.
- Crawl `/robots.txt` from the top 100,000 most visited domains.
- Extract and clean paths from `Disallow`/`Allow`/`Noindex` directives.
- Remove noise (see below).
- Sort entries by frequency of occurrence across domains.
- Keep only entries that occur at least 10 times overall.
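A compressed sketch of this pipeline in Python could look as follows. The input file `top-100k-domains.txt` is an assumed name; concurrency, retries, and the full noise filters are omitted:

```python
# Simplified sketch of the collection pipeline described above.
import re
from collections import Counter

import requests

DIRECTIVE = re.compile(r"^(?:Disallow|Allow|Noindex):\s*(\S+)", re.IGNORECASE)

counts = Counter()
for domain in open("top-100k-domains.txt"):  # assumed input: one domain per line
    domain = domain.strip()
    try:
        resp = requests.get(f"https://{domain}/robots.txt", timeout=5)
    except requests.RequestException:
        continue
    if resp.status_code != 200:
        continue
    # Collect each unique path once per domain, so the count reflects how
    # many domains use it, not how often it appears within one file.
    paths = set()
    for line in resp.text.splitlines():
        m = DIRECTIVE.match(line.strip())
        if m:
            # Strip wildcards and query strings to get a clean path.
            path = m.group(1).split("?")[0].replace("*", "").rstrip("$")
            if path.startswith("/") and len(path) > 1:
                paths.add(path)
    counts.update(paths)

# Sort by frequency across domains; keep entries seen at least 10 times.
wordlist = [p for p, n in counts.most_common() if n >= 10]
print("\n".join(wordlist))
```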
To reduce noise and improve scan efficiency, the following categories are removed:
- Sex-related terms
- Non-English/German language-specific words
- Site-specific or highly contextual paths (e.g. product filter URLs from individual shops)
- Language and country codes
- City and brand names
- Static content (JavaScript files, images, fonts, etc.)
- Entries that don't meaningfully contribute to discovery
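Two of these categories, static content and language/country codes, lend themselves to mechanical filtering, as in the illustrative sketch below. The term-based filters (sex-related words, brand and city names, etc.) would additionally require curated blocklists that aren't reproduced here:

```python
# Illustrative noise filters: drop static assets by file extension and
# drop path segments that are bare language/country codes.
import re

STATIC_EXT = re.compile(r"\.(js|css|png|jpe?g|gif|svg|ico|woff2?|ttf|eot)$", re.I)
LANG_CODE = re.compile(r"^/[a-z]{2}(-[a-z]{2})?/?$", re.I)  # e.g. /en/, /de-de/

def keep(path: str) -> bool:
    return not STATIC_EXT.search(path) and not LANG_CODE.match(path)

entries = ["/admin/", "/en-us/", "/logo.png", "/wp-login.php"]
print([p for p in entries if keep(p)])  # ['/admin/', '/wp-login.php']
```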