Robots.txt Generator
Create precise crawler directives for search engines, SEO tools, and AI bots with our comprehensive robots.txt generator. Save hours of manual configuration with pre-configured options for 40+ bots, including recently added AI crawlers like GPTBot, Claude-Web, and PerplexityBot. Control exactly how your content is accessed and indexed with just a few clicks.
1 Select User Agents
2 Configure Access Rules
3 Advanced Settings (Optional)
Generated robots.txt
Quick Guide
- Select bots you want to control
- Add paths to block in the Disallow section
- Override blocks with Allow rules if needed
- Generate and download your robots.txt file
Detailed Guide to Using Robots.txt
Understanding Robots.txt
The robots.txt file is a powerful tool that tells search engine crawlers which pages or sections of your site they can and cannot access. It's the first file bots check when visiting your website, making it crucial for SEO and site management.
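At its core, the file is just a plain-text list of user-agent groups followed by path rules. A minimal sketch (the /private/ path and sitemap URL are placeholders):

# Applies to every crawler
User-agent: *
# Keep bots out of one directory; everything else stays crawlable
Disallow: /private/

Sitemap: https://example.com/sitemap.xml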
How to Use This Tool
- Select User Agents: Choose which bots you want to control. Use the search bar to find specific bots or filter by category. The "All Bots" (*) option applies rules to every crawler that doesn't match a more specific group.
- Configure Access Rules:
  - Disallow: Blocks access to specified paths. Example: /admin/ blocks everything in the admin directory.
  - Allow: Creates exceptions to disallow rules. Example: If you disallow /scripts/ but allow /scripts/public.js, only that file will be accessible. (Both directives appear in the combined example after this list.)
- Use Wildcards:
  - * matches any sequence of characters
  - $ matches the end of a URL
  - Example: /*.pdf$ blocks all PDF files
- Add Sitemap: Include your XML sitemap URL to help search engines discover your content more efficiently.
- Set Crawl Delay: Use sparingly - this tells bots to wait X seconds between requests. Google ignores this directive entirely, and support among other search engines varies.
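Putting the steps above together, a generated file might look like the following sketch; every path, the delay value, and the sitemap URL are placeholders to adapt to your own site:

# Rules for all crawlers
User-agent: *
# Block the admin area and scripts, but keep one public script reachable
Disallow: /admin/
Disallow: /scripts/
Allow: /scripts/public.js
# Wildcard plus end-of-URL anchor: blocks every PDF
Disallow: /*.pdf$
# Ask bots to wait 10 seconds between requests (widely ignored)
Crawl-delay: 10

Sitemap: https://example.com/sitemap.xml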
⚠️ Critical SEO Warnings
Never Block Major Search Engines Unless Absolutely Necessary
Blocking search engine bots can have severe consequences for your website's visibility:
Risks of Blocking Search Engine Bots
- Googlebot: Blocking Google's crawler will:
  - Remove your site from Google Search results entirely
  - Stop indexing of new content
  - Potentially impact Google Ads and other Google services
  - Result in complete loss of organic traffic from Google (typically 50-90% of total search traffic)
- Bingbot: Blocking Bing's crawler will:
  - Remove your site from Bing and Yahoo search results
  - Impact visibility on the Microsoft ecosystem (Cortana, Windows search)
  - Lose approximately 10-30% of search traffic
- Other Search Engines: Each blocked search engine means lost visibility on that platform, reducing your potential audience reach.
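For reference, this is the exact pattern that triggers the Googlebot consequences above. Never publish it unless you genuinely intend to remove your site from Google:

# DANGER: deindexes the entire site from Google Search
User-agent: Googlebot
Disallow: /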
Best Practices for Robots.txt
✅ Do's
- Block sensitive areas: Admin pages, internal search results, duplicate content, and development areas.
- Use specific paths: Be precise with your rules to avoid accidentally blocking important content.
- Include your sitemap: Always add your XML sitemap URL for better crawling efficiency.
- Block resource-heavy bots: Consider blocking aggressive SEO tool bots if they're consuming too much bandwidth.
- Test before deploying: Use Google Search Console's robots.txt report (which replaced the standalone robots.txt Tester) to confirm Google can fetch and parse your rules.
❌ Don'ts
- Don't block search engines unless you want to hide your entire site.
- Don't use robots.txt for security: It's publicly visible - use proper authentication instead.
- Don't block CSS/JS files that search engines need to render your pages properly.
- Don't use comments with sensitive information - robots.txt is public.
- Don't forget the file location: robots.txt must be in your root directory (example.com/robots.txt).
Common User Agent Guidelines
AI Crawlers
With the rise of AI, many websites choose to block AI training bots to protect their content:
- GPTBot, ChatGPT-User: OpenAI's AI crawlers
- CCBot: Common Crawl's bot; its dataset is used by many AI companies
- Claude-Web, Claude-User: Anthropic's AI crawlers
- Consider blocking these if you don't want your content used for AI training
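One common pattern, using only the user agents named above, blocks these crawlers site-wide while leaving the rest of the file untouched:

# Opt out of the AI crawlers listed above;
# grouped User-agent lines share one rule set
User-agent: GPTBot
User-agent: ChatGPT-User
User-agent: CCBot
User-agent: Claude-Web
User-agent: Claude-User
Disallow: /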
SEO Tool Bots
These bots analyze backlinks and SEO metrics:
- AhrefsBot, SemrushBot, MJ12bot: Popular SEO analysis tools
- Block if: They're using excessive bandwidth or you want to hide competitive data
- Allow if: You use these tools yourself or want accurate data in their databases
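If you do decide to block them, a sketch using the bots named above:

# Block bandwidth-heavy SEO analysis bots
User-agent: AhrefsBot
User-agent: SemrushBot
User-agent: MJ12bot
Disallow: /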
Social Media Bots
These bots generate link previews when content is shared:
- facebookexternalhit, Twitterbot: Create rich previews for shared links
- LinkedInBot, WhatsApp: Generate link cards in messages
- Always allow these for proper social media sharing functionality
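Because a crawler follows the most specific user-agent group that matches it, you can keep restrictions on * while explicitly giving preview bots the run of the site. A sketch (the /private/ path is a placeholder):

# General restrictions for everyone else
User-agent: *
Disallow: /private/

# Social preview bots get full access; having their own
# group means they ignore the * group above
User-agent: facebookexternalhit
User-agent: Twitterbot
User-agent: LinkedInBot
Allow: /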
💡 Pro Tip
Start with minimal restrictions and add more as needed. It's easier to open access later than to recover from accidentally blocking important crawlers. Always monitor your search console after making changes to ensure your site is being crawled properly.
Example Configurations
Standard WordPress Site
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-login.php
Disallow: /wp-register.php
Disallow: /?s=
Disallow: /search/

Sitemap: https://example.com/sitemap.xml
E-commerce Site
User-agent: *
Disallow: /cart/
Disallow: /checkout/
Disallow: /my-account/
Disallow: /*?sort=
Disallow: /*?filter=
Allow: /products/

User-agent: GPTBot
User-agent: CCBot
Disallow: /

Sitemap: https://example.com/sitemap.xml
Development Site
User-agent: *
Disallow: /