GPTBot
What is GPTBot?
GPTBot is OpenAI's official web crawler. It collects publicly available web content for training and improving GPT models, including those behind ChatGPT. Launched in August 2023, the bot identifies itself in server logs with the user agent string shown below and respects robots.txt directives, so website owners can allow it, block it, or limit it to parts of a site. It is also described as respecting crawl delays and rate limits, although support for the Crawl-delay directive varies between bots.

Website owners who block GPTBot are opting out of having their content used to train future GPT models, which could affect how well those models understand and reference that content. The bot focuses on publicly accessible content and filters out paywall-restricted content, personally identifiable information, and content that violates OpenAI's policies.
User Agent String
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)
How to Control GPTBot
Block Completely
To prevent GPTBot from accessing your entire website, add a User-agent: GPTBot group with the rule Disallow: / to your robots.txt file.
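As a minimal sketch, the two lines inside ROBOTS_TXT below are the rule group that goes into robots.txt. The surrounding Python is only a pre-deployment check using the standard library's urllib.robotparser. The example.com URL is a placeholder, and Googlebot stands in for any crawler that has no group of its own.

```python
from urllib.robotparser import RobotFileParser

# These two lines are the robots.txt content itself.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# GPTBot is denied everywhere; crawlers without a matching group are unaffected.
gptbot_allowed = parser.can_fetch("GPTBot", "https://example.com/any/page")
other_allowed = parser.can_fetch("Googlebot", "https://example.com/any/page")
print(gptbot_allowed, other_allowed)  # False True
```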
Block Specific Directories
To restrict access to certain parts of your site while allowing others, list each restricted path in its own Disallow line under the User-agent: GPTBot group.
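The sketch below shows one possible rule group. The paths /private/ and /drafts/ are hypothetical and should be replaced with your own directories. The Python wrapper is only there to check which paths the rules would let GPTBot fetch.

```python
from urllib.robotparser import RobotFileParser

# robots.txt content; /private/ and /drafts/ are hypothetical example paths.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /private/
Disallow: /drafts/
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Report the decision for a few sample paths.
for path in ("/blog/post-1", "/private/report.html", "/drafts/idea.txt"):
    verdict = "allowed" if parser.can_fetch("GPTBot", path) else "blocked"
    print(f"{path}: {verdict}")
```

Rule order matters for some parsers. Python's urllib.robotparser applies the first matching rule, so the specific Disallow lines are placed before the catch-all Allow.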
Set Crawl Delay
To slow down the crawl rate, add a Crawl-delay line, in seconds, to the User-agent: GPTBot group (note: not all bots respect this directive).
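A minimal sketch of such a group follows. The value of 10 seconds is an arbitrary example. The check only confirms that the directive parses as intended; it says nothing about whether a given crawler will honor it.

```python
from urllib.robotparser import RobotFileParser

# robots.txt content; the 10-second delay is an arbitrary example value.
ROBOTS_TXT = """\
User-agent: GPTBot
Crawl-delay: 10
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# crawl_delay() returns the delay for a matching group, or None otherwise.
gptbot_delay = parser.crawl_delay("GPTBot")
other_delay = parser.crawl_delay("Googlebot")
print(gptbot_delay, other_delay)  # 10 None
```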
How to Verify GPTBot
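A user agent string can be spoofed, so confirm the source of the request: a reverse DNS lookup on the requesting IP address should resolve to a hostname ending in openai.com.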
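The check described above could be sketched as follows, assuming the reverse DNS rule holds as stated. The function names are hypothetical. The forward-confirmation step (resolving the hostname back to the same IP) is a common hardening practice added here as an assumption, not something the text above requires.

```python
import socket


def hostname_matches(hostname: str, suffix: str = "openai.com") -> bool:
    """True if hostname is the suffix itself or a subdomain of it."""
    host = hostname.rstrip(".").lower()
    return host == suffix or host.endswith("." + suffix)


def verify_gptbot_ip(ip: str) -> bool:
    """Reverse-resolve ip, check the domain, then forward-confirm the result."""
    try:
        hostname, _aliases, _addrs = socket.gethostbyaddr(ip)
    except OSError:
        return False  # no PTR record or lookup failure
    if not hostname_matches(hostname):
        return False
    try:
        # gethostbyname_ex only returns IPv4 addresses.
        _name, _aliases, addresses = socket.gethostbyname_ex(hostname)
    except OSError:
        return False
    return ip in addresses
```

Matching on "." + suffix rather than a bare endswith("openai.com") avoids accepting lookalike hosts such as notopenai.com.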
Learn more in the official documentation.
This bot may collect and use your website content for AI model training. Consider whether you want your content used for this purpose before allowing access.
Detection Patterns
Multiple ways to detect GPTBot in your application:
Basic Pattern
/GPTBot/i
Strict Pattern
/^Mozilla\/5\.0 AppleWebKit\/537\.36 \(KHTML, like Gecko; compatible; GPTBot\/1\.0; \+https:\/\/openai\.com\/gptbot\)$/
Flexible Pattern
/GPTBot[\s\/]?[\d.]*/i
Vendor Match
/.*OpenAI.*GPTBot/i
Implementation Examples
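As one possible implementation, the basic and flexible patterns above translate to Python's re module as shown below. The function names is_gptbot and gptbot_version are hypothetical. A user agent match alone does not prove the request came from OpenAI, so treat it as a first filter.

```python
import re
from typing import Optional

# Python equivalents of the basic and flexible patterns.
BASIC = re.compile(r"GPTBot", re.IGNORECASE)
FLEXIBLE = re.compile(r"GPTBot[\s/]?([\d.]*)", re.IGNORECASE)


def is_gptbot(user_agent: Optional[str]) -> bool:
    """True if the User-Agent header mentions GPTBot (case-insensitive)."""
    return bool(user_agent) and BASIC.search(user_agent) is not None


def gptbot_version(user_agent: Optional[str]) -> Optional[str]:
    """Return the advertised version such as '1.0', or None if absent."""
    match = FLEXIBLE.search(user_agent or "")
    if match is None:
        return None
    return match.group(1).rstrip(".") or None


UA = ("Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
      "GPTBot/1.0; +https://openai.com/gptbot)")
print(is_gptbot(UA), gptbot_version(UA))  # True 1.0
```

The strict pattern only matches the exact string listed above, so it will miss any other version number. The basic or flexible pattern is the more forgiving choice for logging and routing.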
Should You Block This Bot?
Recommendations based on your website type:
| Site Type | Recommendation | Reasoning |
|---|---|---|
| E-commerce | Limit Access | Protect pricing and inventory data from AI training |
| Blog/News | Consider Blocking | Your content may be used for AI training without compensation |
| SaaS Application | Block | No benefit for application interfaces; preserve resources |
| Documentation | Selective | Allow for public docs, block for internal docs |
| Corporate Site | Limit Access | Allow public pages; block sensitive areas such as intranets |
Advanced robots.txt Configurations
E-commerce Site Configuration
Publishing/Blog Configuration
SaaS/Application Configuration
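The three configurations named in this section could look like the sketch below, which follows the recommendations in the table above. Every path (/pricing/, /inventory/, /about/) is a hypothetical placeholder, and the allowed helper exists only to check the rules with urllib.robotparser.

```python
from urllib.robotparser import RobotFileParser

# One robots.txt body per site type; all paths are hypothetical examples.
CONFIGS = {
    # E-commerce: keep pricing and inventory data away from GPTBot.
    "ecommerce": """\
User-agent: GPTBot
Disallow: /pricing/
Disallow: /inventory/
Allow: /
""",
    # Publishing/blog: block articles, leave an informational section open.
    "publishing": """\
User-agent: GPTBot
Allow: /about/
Disallow: /
""",
    # SaaS/application: block GPTBot from the whole site.
    "saas": """\
User-agent: GPTBot
Disallow: /
""",
}


def allowed(site_type: str, path: str, agent: str = "GPTBot") -> bool:
    """Check whether agent may fetch path under the given configuration."""
    parser = RobotFileParser()
    parser.parse(CONFIGS[site_type].splitlines())
    return parser.can_fetch(agent, path)


print(allowed("ecommerce", "/products/shoes"))  # True
print(allowed("publishing", "/2024/story"))     # False
print(allowed("saas", "/login"))                # False
```

Each group names only GPTBot, so other crawlers keep whatever access the rest of your robots.txt grants them.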
Quick Reference
| Field | Value |
|---|---|
| User Agent Match | GPTBot |
| Robots.txt Name | GPTBot |
| Category | ai |
| Respects robots.txt | Yes |