Robots.txt Generator

Create precise crawler directives for search engines, SEO tools, and AI bots with our comprehensive robots.txt generator. Save hours of manual configuration with pre-configured options for 40+ bots, including recently added AI crawlers like GPTBot, Claude-Web, and PerplexityBot. Control exactly how your content is accessed and indexed with just a few clicks.

1. Select User Agents

2. Configure Access Rules
  • Disallow (Block Access): one path per line; use * for wildcards.
  • Allow (Override Blocks): exceptions to the Disallow rules.

3. Advanced Settings (Optional)

Generated robots.txt

# Robots.txt generated by CL SEO Tools
# Generated: 2025-10-06
# Select user agents and configure rules above

Quick Guide

  • Select bots you want to control
  • Add paths to block in the Disallow section
  • Override blocks with Allow rules if needed
  • Generate and download your robots.txt file

Detailed Guide to Using Robots.txt

Understanding Robots.txt

The robots.txt file is a plain text file that tells search engine crawlers which pages or sections of your site they may and may not access. It's typically the first file a well-behaved bot requests when visiting your website, making it crucial for SEO and site management. Keep in mind that the rules are advisory: compliant crawlers honor them, but nothing technically enforces them.
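
At its core, the file is just plain text: one or more User-agent lines naming a crawler, followed by path rules for that group. A minimal sketch (the path is a placeholder, not a recommendation):

User-agent: *
# Block one directory; everything else stays crawlable
Disallow: /private/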

How to Use This Tool

  1. Select User Agents: Choose which bots you want to control. Use the search bar to find specific bots or filter by category. The "All Bots" (*) option applies rules to every crawler that doesn't have its own named group; a bot with its own group follows that group instead.
  2. Configure Access Rules:
    • Disallow: Blocks access to specified paths. Example: /admin/ blocks everything in the admin directory.
    • Allow: Creates exceptions to disallow rules. Example: If you disallow /scripts/ but allow /scripts/public.js, only that file will be accessible.
  3. Use Wildcards:
    • * matches any sequence of characters
    • $ matches the end of a URL
    • Example: /*.pdf$ blocks all PDF files
  4. Add Sitemap: Include your XML sitemap URL to help search engines discover your content more efficiently.
  5. Set Crawl Delay: Use sparingly - this tells bots to wait the given number of seconds between requests. Google ignores this directive, though some engines (such as Bing) still honor it. A combined sketch of steps 2-5 follows this list.
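
Putting the steps together, here is a sketch of the kind of file the tool generates (example.com and all paths are placeholders):

User-agent: *
# Step 2: block a directory, then carve out one exception
Disallow: /scripts/
Allow: /scripts/public.js
# Step 3: wildcard plus end-of-URL anchor blocks every PDF
Disallow: /*.pdf$
# Step 5: ask for a 10-second gap between requests (Google ignores this)
Crawl-delay: 10

# Step 4: help crawlers find your content
Sitemap: https://example.com/sitemap.xml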

⚠️ Critical SEO Warnings

Risks of Blocking Search Engine Bots

  • Googlebot: Blocking Google's crawler will:
    • Stop Google from crawling your site; pages drop from Search over time or linger as bare URLs with no description
    • Stop indexing of new content
    • Potentially impact Google Ads and other Google services
    • Result in the loss of nearly all organic traffic from Google (typically 50-90% of total search traffic)
  • Bingbot: Blocking Bing's crawler will:
    • Remove your site from Bing and Yahoo search results (Yahoo's results are powered by Bing)
    • Impact visibility across the Microsoft ecosystem (Cortana, Windows Search)
    • Cost you roughly 10-30% of search traffic, depending on your audience
  • Other Search Engines: Each blocked search engine means lost visibility on that platform, reducing your potential audience reach.
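
For reference, this is the pattern the warnings above describe - only deploy it if you genuinely want your site out of Google and Bing:

# Anti-pattern: removes your site from the two largest search engines
User-agent: Googlebot
User-agent: Bingbot
Disallow: /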

Best Practices for Robots.txt

✅ Do's

  • Block sensitive areas: Admin pages, internal search results, duplicate content, and development areas.
  • Use specific paths: Be precise with your rules to avoid accidentally blocking important content.
  • Include your sitemap: Always add your XML sitemap URL for better crawling efficiency.
  • Block resource-heavy bots: Consider blocking aggressive SEO tool bots if they're consuming too much bandwidth.
  • Test before deploying: Use the robots.txt report in Google Search Console (the replacement for the retired robots.txt Tester) to verify your rules.

❌ Don'ts

  • Don't block search engines unless you want to hide your entire site.
  • Don't use robots.txt for security: It's publicly visible - use proper authentication instead.
  • Don't block CSS/JS files that search engines need to render your pages properly.
  • Don't use comments with sensitive information - robots.txt is public.
  • Don't forget the file location: robots.txt must be in your root directory (example.com/robots.txt).
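
If a broad rule accidentally catches rendering assets, explicit Allow lines can carve them back out. A sketch, assuming assets live under a hypothetical /assets/ directory:

User-agent: *
Disallow: /assets/
# Re-allow the stylesheets and scripts crawlers need to render pages
Allow: /assets/*.css$
Allow: /assets/*.js$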

Common User Agent Guidelines

AI Crawlers

With the rise of AI, many websites choose to block AI training bots to protect their content:

  • GPTBot, ChatGPT-User: OpenAI's crawlers (GPTBot gathers training data; ChatGPT-User fetches pages on behalf of user requests)
  • CCBot: Common Crawl's bot; its public dataset is used by many AI companies
  • Claude-Web, Claude-User: Anthropic's AI crawlers
  • Consider blocking these if you don't want your content used for AI training; a sketch follows below
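
A sketch of an opt-out group covering the AI crawlers listed above; search engine bots are unaffected:

# Block AI crawlers site-wide
User-agent: GPTBot
User-agent: ChatGPT-User
User-agent: CCBot
User-agent: Claude-Web
User-agent: Claude-User
Disallow: /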

SEO Tool Bots

These bots analyze backlinks and SEO metrics:

  • AhrefsBot, SemrushBot, MJ12bot: Popular SEO analysis tools
  • Block if: They're consuming excessive bandwidth or you want to keep your competitive data out of their indexes (see the sketch after this list)
  • Allow if: You use these tools yourself or want accurate data in their databases
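
A sketch of the block-if case, shutting out the three tools named above while leaving every other crawler on its existing rules:

# Block backlink/SEO analysis crawlers
User-agent: AhrefsBot
User-agent: SemrushBot
User-agent: MJ12bot
Disallow: /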

Social Media Bots

These bots generate link previews when content is shared:

  • facebookexternalhit, Twitterbot: Create rich previews for shared links
  • LinkedInBot, WhatsApp: Generate link cards in messages
  • Always allow these so link previews keep working - the sketch below shows how to grant them access explicitly
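
Even a site that blocks most crawlers can grant preview bots access explicitly, since a crawler follows the most specific group that matches it. A sketch:

# Lock down the site by default...
User-agent: *
Disallow: /

# ...but let social preview bots fetch pages for link cards
User-agent: facebookexternalhit
User-agent: Twitterbot
User-agent: LinkedInBot
Allow: /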

Example Configurations

Standard WordPress Site

# Block the WordPress admin, but keep the AJAX endpoint themes and plugins rely on
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-login.php
Disallow: /wp-register.php
# Keep internal search results out of the index
Disallow: /?s=
Disallow: /search/

Sitemap: https://example.com/sitemap.xml

E-commerce Site

User-agent: *
# Keep cart, checkout, and account pages out of search results
Disallow: /cart/
Disallow: /checkout/
Disallow: /my-account/
# Block faceted-navigation URLs that create duplicate content
Disallow: /*?sort=
Disallow: /*?filter=
Allow: /products/

# Opt out of AI training crawlers
User-agent: GPTBot
User-agent: CCBot
Disallow: /

Sitemap: https://example.com/sitemap.xml

Development Site

User-agent: *
# Block every crawler from the entire site; robots.txt is not security, so add authentication as well
Disallow: /