audit: BreadcrumbList, robots policy, og:title, sitemap priorities #832
Changes from all commits
0ba5264
4b143f0
758cff5
3ffe847
56cae9e
5de8526
0af8f3c
3c8ac01
b56c813
69e5eee
3622c50
| Original file line number | Diff line number | Diff line change | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| @@ -1,12 +1,98 @@ | ||||||||||
| # Block specific bot | ||||||||||
| # Keploy docs robots.txt | ||||||||||
| # Policy: allow AI search/answer engines, block training-only crawlers, | ||||||||||
| # block Bytespider. Search bots drive visibility in ChatGPT, Claude, | ||||||||||
| # Perplexity, Copilot, Gemini answers. Training bots feed future model | ||||||||||
| # weights and provide nothing back. | ||||||||||
| # Reference: Speedscale / Katalon / Testsigma split policy (2026 competitor audit) | ||||||||||
|
|
||||||||||
|
|
||||||||||
| # ============================================================================= | ||||||||||
| # ALLOW — AI search / answer engines | ||||||||||
| # Legacy-version disallows are repeated inside this group because a bot that | ||||||||||
| # matches a named User-agent group only reads rules from THAT group; it does | ||||||||||
| # not fall through to `User-agent: *`. Without these lines, Perplexity/ | ||||||||||
| # Applebot/OAI-SearchBot/etc. would still crawl /docs/{1,2,3}.0.0/ despite | ||||||||||
| # the global block further below. | ||||||||||
| # ============================================================================= | ||||||||||
|
|
||||||||||
| User-agent: OAI-SearchBot | ||||||||||
| User-agent: ChatGPT-User | ||||||||||
| User-agent: Claude-SearchBot | ||||||||||
| User-agent: Claude-User | ||||||||||
| User-agent: PerplexityBot | ||||||||||
| User-agent: Perplexity-User | ||||||||||
| User-agent: Gemini-Deep-Research | ||||||||||
| User-agent: GoogleOther | ||||||||||
| User-agent: Applebot | ||||||||||
| User-agent: DuckAssistBot | ||||||||||
| User-agent: Amazonbot | ||||||||||
| Allow: / | ||||||||||
|
||||||||||
| Allow: / | |
| Allow: / | |
| Crawl-delay: 5 | |
| Disallow: /cgi-bin/ |
Addressed in 56cae9e. Added Crawl-delay: 5 and Disallow: /cgi-bin/ inside the named AI-search User-agent group so the allowed bots (OAI-SearchBot, ChatGPT-User, Claude-SearchBot, Claude-User, PerplexityBot, Perplexity-User, Gemini-Deep-Research, GoogleOther, Applebot, DuckAssistBot, Amazonbot) get the same rate-limit and global disallow as User-agent: *. The legacy-version disallows (/docs/1.0.0/, /docs/2.0.0/, /docs/3.0.0/) were already duplicated in this group for the same inheritance reason — this extends that pattern to the two global rules you flagged.
Fixed in 56cae9e: Crawl-delay: 5 and Disallow: /cgi-bin/ are now mirrored inside the AI-search allow group alongside the legacy-version disallows, so the group is a proper superset of the User-agent: * defaults. Named AI search bots (Perplexity/Applebot/OAI-SearchBot/etc.) now see the same crawl-rate limit and /cgi-bin/ block as fall-through bots.
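The effect of mirroring the two global rules can be checked locally. The following is a minimal sketch using Python's standard-library `urllib.robotparser`, not the project's own tooling. The two robots.txt strings are trimmed stand-ins for the real file, and `SomeOtherBot` and `/cgi-bin/test` are hypothetical names chosen for illustration. The stdlib parser applies the first matching rule in a group, so the specific `Disallow` is listed before `Allow: /` here.

```python
from urllib.robotparser import RobotFileParser


def load(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# Named group WITHOUT the global rules mirrored (trimmed stand-in).
unmirrored = load("""\
User-agent: PerplexityBot
Allow: /

User-agent: *
Crawl-delay: 5
Disallow: /cgi-bin/
""")

# Named group WITH Crawl-delay and /cgi-bin/ mirrored (trimmed stand-in).
mirrored = load("""\
User-agent: OAI-SearchBot
User-agent: PerplexityBot
Disallow: /cgi-bin/
Allow: /
Crawl-delay: 5

User-agent: *
Crawl-delay: 5
Disallow: /cgi-bin/
""")

# A named bot only sees its own group, so the unmirrored file gives it
# no crawl delay and lets it into /cgi-bin/.
print(unmirrored.crawl_delay("PerplexityBot"))
print(unmirrored.can_fetch("PerplexityBot", "/cgi-bin/test"))

# After mirroring, the named bot and a fall-through bot see the same limits.
print(mirrored.crawl_delay("PerplexityBot"))
print(mirrored.crawl_delay("SomeOtherBot"))
print(mirrored.can_fetch("PerplexityBot", "/cgi-bin/test"))
```

Real crawlers may resolve `Allow`/`Disallow` conflicts by longest match rather than rule order, so this only approximates how a given bot reads the file.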
Copilot AI, Apr 14, 2026
The legacy-version Disallow: /docs/1.0.0/ (and 2.0.0/3.0.0) rules are only under User-agent: *, so they will not apply to crawlers that match one of the explicit allow groups above (e.g., PerplexityBot, Applebot, OAI-SearchBot). If the intent is to block those legacy versions for all crawlers, either move the legacy disallows into each explicit allow group (and keep Allow: /), or remove the explicit allow groups entirely and let those bots fall through to User-agent: * (while keeping explicit disallow groups for training bots).
Fixed in 758cff5. Went with option (a) but consolidated: the 11 AI-search-bot allow groups are now a single block that uses multiple User-agent: headers sharing one rule set, with the three legacy-version Disallow lines (/docs/1.0.0/, /docs/2.0.0/, /docs/3.0.0/) applied directly inside it. Same intent ("allow these AI search bots everywhere except legacy versions") but now actually enforced, and only 8 lines of net change instead of 33.
Addressed in 758cff5 (the earlier commit that moved the legacy disallows inside the named allow group). The /docs/1.0.0/, /docs/2.0.0/, /docs/3.0.0/ lines now sit directly under the User-agent: OAI-SearchBot / ChatGPT-User / Claude-SearchBot / ... / Amazonbot block so every allowed AI bot gets the legacy-version block, not just crawlers that fall through to User-agent: *. 56cae9e just now extended the same pattern to Crawl-delay: 5 and Disallow: /cgi-bin/ per your other comment — both global rules are now duplicated inside the named group as well.
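The fall-through behavior discussed in this thread can be reproduced with a short sketch, again using Python's standard-library `urllib.robotparser`. The `BEFORE` and `AFTER` strings are reduced stand-ins for the real robots.txt, and `SomeOtherBot` and the path `/docs/1.0.0/intro` are hypothetical. Because the stdlib parser takes the first matching rule, the `Disallow` line is placed ahead of `Allow: /` in the consolidated group.

```python
from urllib.robotparser import RobotFileParser


def load(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# Before: the legacy disallow lives only under `User-agent: *`.
BEFORE = """\
User-agent: PerplexityBot
Allow: /

User-agent: *
Disallow: /docs/1.0.0/
"""

# After: one consolidated group with several User-agent headers sharing
# a single rule set that includes the legacy disallow.
AFTER = """\
User-agent: PerplexityBot
User-agent: Applebot
Disallow: /docs/1.0.0/
Allow: /

User-agent: *
Disallow: /docs/1.0.0/
"""

LEGACY_URL = "/docs/1.0.0/intro"  # hypothetical legacy page

before = load(BEFORE)
after = load(AFTER)

# Named bot ignores the `*` group, so the legacy page is still crawlable.
print(before.can_fetch("PerplexityBot", LEGACY_URL))
# An unnamed bot falls through to `*` and is blocked.
print(before.can_fetch("SomeOtherBot", LEGACY_URL))
# With the disallow inside the named group, both named bots are blocked,
# while the rest of the site stays allowed.
print(after.can_fetch("PerplexityBot", LEGACY_URL))
print(after.can_fetch("Applebot", LEGACY_URL))
print(after.can_fetch("Applebot", "/docs/"))
```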
Remove the comments that reference internal task ticket numbers from all files.
Done in b56c813 — stripped Task 35, LIVE-12, LIVE-13, and LIVE-20 ticket references from comments across docusaurus.config.js, src/theme/DocBreadcrumbs/index.js, and src/theme/DocItem/index.js. Kept the explanatory comments that describe why each piece of logic exists, just without the internal ticket numbers.