User Agent String Database & Bot Directory
A comprehensive, regularly updated database of 189 verified user agent strings used by web crawlers, bots, and spiders. Identify AI crawlers like GPTBot and ClaudeBot, search engine spiders like Googlebot and Bingbot, SEO tools, social media preview bots, and more.
Use this directory to look up any bot's user agent string, find its robots.txt name for blocking or allowing access, verify its vendor, and understand its crawling behavior. Each entry includes the full user agent string you can copy directly into your server configuration or analytics filters.
Recently Added Bots & Crawlers
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ChatGPT-User/1.0; +https://openai.com/bot)
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; OAI-SearchBot/1.0; +https://openai.com/searchbot)
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Claude-Web/1.0; +https://www.anthropic.com)
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; PerplexityBot/1.0; +https://perplexity.ai/bot)
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Google-Extended/1.0; +https://developers.google.com/search/docs/crawling-indexing/google-common-crawlers)
Meta-ExternalAgent/1.0 (+https://developers.facebook.com/docs/sharing/bot)
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534+ (KHTML, like Gecko) BingPreview/1.0b
Mozilla/5.0 (compatible; DuckAssistBot/1.0; +https://duckduckgo.com/duckassist)
Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.1 Safari/605.1.15 (Applebot/0.1; +http://www.apple.com/go/applebot)
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.1 Safari/605.1.15 (Applebot-Extended/0.1; +http://www.apple.com/go/applebot)
What Is a User Agent String?
A user agent string is an HTTP header that identifies the client making a request to a web server. Every time a browser, bot, or crawler visits a web page, it sends a user agent string that reveals what software is making the request. This allows web servers to serve appropriate content, log traffic sources, and enforce access rules.
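Seen from code, the user agent is just one header on the outgoing request. The sketch below uses Python's standard library to attach a custom user agent string; the bot name ExampleBot and its documentation URL are hypothetical, chosen only to mirror the format bots use:

```python
from urllib.request import Request

# Build a request that announces a custom user agent string.
# "ExampleBot" and its documentation URL are hypothetical.
req = Request(
    "https://example.com/",
    headers={"User-Agent": "ExampleBot/1.0 (+https://example.com/bot)"},
)

# The server receiving this request would see the header below in its logs.
# (urllib.request.urlopen(req) would send it; omitted to keep this offline.)
print(req.get_header("User-agent"))  # ExampleBot/1.0 (+https://example.com/bot)
```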
Anatomy of a Bot User Agent String
Bot user agent strings generally follow a common format. Here is the structure of Googlebot's user agent string:
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
- Mozilla/5.0 is the platform token, a compatibility string inherited from early web browsers.
- compatible is the flag indicating this is a bot, not a regular browser.
- Googlebot/2.1 is the bot name and version, the identity used in robots.txt rules.
- +http://www.google.com/bot.html is the documentation URL, linking to the bot's official information page.
Not all bots follow this format. Some use minimal identifiers (e.g., Bytespider), while others mimic full browser user agent strings. The key identifier for robots.txt rules is the bot name portion, which is listed as the "robots.txt name" in each entry of this directory.
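As a rough illustration, the bot name and version can be pulled out of a compatible-format string with a small regular expression. This sketch handles only the format described above; minimal identifiers like Bytespider do not match and return None:

```python
import re

def parse_bot_token(user_agent: str):
    """Extract (name, version) from a 'compatible'-format user agent
    string, or return None when the pattern is absent."""
    match = re.search(r"compatible;\s*([A-Za-z0-9_.-]+)/([\d.]+)", user_agent)
    return (match.group(1), match.group(2)) if match else None

print(parse_bot_token(
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
))  # ('Googlebot', '2.1')
print(parse_bot_token("Bytespider"))  # None
```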
Types of Web Crawlers & Bots
Web bots serve many different purposes. Understanding which category a bot belongs to helps you decide whether to allow or block it. This directory organizes bots into the following categories:
AI & LLM Crawlers
Bots operated by AI companies to collect training data for large language models, or to fetch web content in real-time when users interact with AI assistants. Includes bots from OpenAI (GPTBot), Anthropic (ClaudeBot), Meta (FacebookBot), Google (Gemini-Deep-Research), and others.
Search Engine Bots
Crawlers from search engines that index web pages to serve search results. These are the most well-known bots on the web and include Googlebot, Bingbot, Yandex, Baiduspider, DuckDuckBot, and others. Blocking these bots will remove your site from their search results.
SEO & Marketing Tools
Bots from SEO platforms that crawl websites to analyze backlinks, track rankings, audit technical issues, and monitor competitor sites. Includes crawlers from Ahrefs, Semrush, Moz, Screaming Frog, Majestic, and others commonly seen in server logs.
Social Media Bots
Fetchers that retrieve page metadata when URLs are shared on social platforms. They generate the link preview cards showing titles, descriptions, and thumbnails. Includes bots from Facebook, Twitter/X, LinkedIn, Discord, Slack, Telegram, and others.
Monitoring & Performance
Synthetic monitoring agents that test website availability, performance, and functionality. They run scheduled checks from global locations to detect outages and measure response times. Includes tools like UptimeRobot, Pingdom, Datadog, and others.
Security Scanners
Vulnerability scanners and security auditing tools that test web applications for known weaknesses. These tools typically do not respect robots.txt as they need to test all accessible endpoints. Includes Nessus, Qualys, Nikto, OpenVAS, and CensysInspect.
How to Manage Bot Access with robots.txt
The robots.txt file is the standard mechanism for controlling which bots can crawl your website. Place it at the root of your domain (e.g., https://example.com/robots.txt) and define rules for each bot using its robots.txt name. Here is an example configuration:
User-agent: Googlebot
Allow: /
User-agent: Bingbot
Allow: /
# Block AI training crawlers
User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: Bytespider
Disallow: /
# Allow everything else
User-agent: *
Allow: /
Key Points About robots.txt
- Voluntary compliance: robots.txt is a protocol, not a security measure. Legitimate bots from major companies respect it, but malicious scrapers may ignore it entirely.
- Bot name matching: The User-agent value must match the bot's robots.txt name exactly. Use the "robots.txt name" field shown in each bot entry in this directory.
- Specificity matters: More specific rules take precedence over general ones. A rule for Googlebot-Image overrides a wildcard rule.
- Crawl-delay: Some bots support a Crawl-delay directive that limits how frequently they request pages, reducing server load.
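One way to sanity-check a rule set before deploying it is Python's built-in urllib.robotparser, which evaluates allow/disallow decisions against a parsed robots.txt body. This sketch tests an abbreviated version of the example configuration above:

```python
from urllib.robotparser import RobotFileParser

# An abbreviated robots.txt body: block GPTBot, allow everyone else.
rules = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())
parser.modified()  # mark the rules as loaded so can_fetch() evaluates them

print(parser.can_fetch("GPTBot", "/articles/page.html"))     # False
print(parser.can_fetch("Googlebot", "/articles/page.html"))  # True
```

Note the call to modified(): without it, a parser that was fed rules directly (rather than via read()) treats the file as unloaded and denies every request.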
AI Crawlers: Understanding the New Wave of Web Bots
Since 2023, a significant new category of web crawlers has emerged: AI training bots. Companies like OpenAI, Anthropic, Google, Meta, and Cohere now operate crawlers that collect web content to train and improve their large language models (LLMs). This has created new challenges for website owners who need to decide whether their content should be used for AI training.
Types of AI Crawlers
AI-related bots generally fall into three categories:
- Training data crawlers collect web content at scale for model training. Examples include GPTBot (OpenAI), ClaudeBot (Anthropic), CCBot (Common Crawl), Bytespider (ByteDance), and FacebookBot (Meta). Blocking these prevents your content from being included in future training datasets.
- User-action fetchers retrieve web pages in real-time when a user asks an AI assistant for current information. Examples include ChatGPT-User, Claude-User, Perplexity-User, and MistralAI-User. Blocking these prevents AI assistants from accessing your content during conversations.
- AI search crawlers index content specifically for AI-powered search products. Examples include OAI-SearchBot (OpenAI), Claude-SearchBot (Anthropic), and Google-CloudVertexBot. Blocking these may affect your visibility in AI search results.
How to Control AI Crawler Access
Most reputable AI companies respect robots.txt directives. You can selectively block training crawlers while still allowing user-action fetchers if you want your content to be referenceable but not used for training. Each AI bot entry in this directory specifies whether it respects robots.txt and what its specific purpose is, helping you make informed decisions about which bots to allow.
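As an example of such a selective policy, the fragment below blocks two training-data crawlers listed in this directory while explicitly allowing OpenAI's user-action fetcher and AI search crawler; adjust the bot list to your own preferences:

```
# Block AI training-data crawlers
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

# Allow real-time fetchers and AI search
User-agent: ChatGPT-User
Allow: /

User-agent: OAI-SearchBot
Allow: /
```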
Frequently Asked Questions About User Agents & Web Crawlers
What is a user agent string?
A user agent string is an identifier that web browsers, bots, and crawlers send to web servers with every HTTP request. It tells the server what software is making the request, including the application name, version, and sometimes the operating system. For example, Googlebot identifies itself as "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" so that website owners can recognize it as Google's search engine crawler.
How can I tell which bots are crawling my website?
You can identify bots crawling your site by checking your server access logs, which record the user agent string for every request. Look for known bot identifiers like Googlebot, Bingbot, GPTBot, or ClaudeBot. You can also use analytics tools that filter bot traffic, or set up server-side logging to flag requests from known crawler user agent strings. This directory catalogs all major bots to help you identify them.
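As a minimal sketch of log-based identification, the function below substring-matches the user agent field of an access log line against a handful of bot tokens from this directory. The token list is deliberately short and the sample log line is fabricated:

```python
# A few known bot tokens; extend this list from the directory entries.
KNOWN_BOTS = ["Googlebot", "bingbot", "GPTBot", "ClaudeBot", "AhrefsBot"]

def classify_line(log_line: str):
    """Return the first matching bot name, or None for apparent human traffic."""
    lowered = log_line.lower()
    for bot in KNOWN_BOTS:
        if bot.lower() in lowered:
            return bot
    return None

# Fabricated combined-log-format line for illustration.
line = ('66.249.66.1 - - [10/May/2025:13:55:36 +0000] "GET / HTTP/1.1" 200 5123 '
        '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"')
print(classify_line(line))  # Googlebot
```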
How do I block AI crawlers from my website?
To block AI crawlers, add disallow rules to your robots.txt file. For example, to block OpenAI's GPTBot, add "User-agent: GPTBot" followed by "Disallow: /" on the next line. You can block multiple AI crawlers by adding separate rules for each bot (GPTBot, ClaudeBot, CCBot, Bytespider, etc.). Note that robots.txt relies on voluntary compliance — most reputable AI companies respect these directives, but it is not a guaranteed enforcement mechanism.
What is robots.txt?
robots.txt is a plain text file placed in the root directory of a website (e.g., example.com/robots.txt) that provides instructions to web crawlers about which pages or sections of the site they are allowed or not allowed to access. It uses a simple syntax with "User-agent" to specify which bot the rule applies to and "Disallow" or "Allow" to define access permissions. While it is an industry standard respected by major search engines and legitimate bots, it is advisory rather than enforceable.
Do web crawlers actually respect robots.txt?
Reputable crawlers from established companies like Google, Microsoft, and most AI companies typically respect robots.txt. You can verify compliance by adding a disallow rule for a specific bot and then monitoring your server logs to see if that bot continues to access blocked URLs. Each bot entry in this directory includes a "respects robots.txt" field to help you understand the expected behavior. You can also use the verification methods listed for each bot to confirm the crawler's identity is genuine.
What is the difference between a web crawler and a web scraper?
A web crawler (or spider) systematically browses the web to index content, typically following links from page to page. Search engines like Google use crawlers to discover and index web pages. A web scraper extracts specific data from web pages for a particular purpose such as price monitoring or data analysis. The key difference is intent: crawlers aim to index and discover content broadly, while scrapers target specific data from specific pages. Many AI bots function as crawlers that collect training data at scale.
What are Google's different crawlers?
Google uses specialized crawlers for different purposes: Googlebot handles general web search indexing, Googlebot-Image focuses on image search, Googlebot-Video targets video content, AdsBot-Google checks ad landing page quality, Google-Extended collects data for AI training, and Storebot-Google indexes product and shopping content. Each crawler can be independently controlled via robots.txt, giving website owners granular control over which Google services can access their content.
Should I block all bots from my website?
Blocking all bots is generally not recommended, as it would prevent search engines from indexing your site, remove you from search results, and stop social media platforms from generating link previews. Instead, take a selective approach: allow search engine bots and social media crawlers that benefit your visibility, while blocking unwanted bots such as aggressive scrapers or AI training crawlers if you prefer not to contribute to their datasets. Review each bot's purpose before deciding whether to allow or block it.
How to Use This User Agent Database
For Website Owners
- Create robots.txt rules: Look up the robots.txt name for any bot and add allow or disallow rules to control access to your content.
- Filter analytics data: Exclude known bot user agent strings from your analytics reports to get accurate visitor counts.
- Configure rate limiting: Set up server-side rate limits for aggressive crawlers to protect your server resources.
- Audit your traffic: Cross-reference your server logs with this database to identify which bots are crawling your site and how often.
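The audit step above can be sketched as a simple tally: count requests per known bot token across log lines to see which bots crawl your site and how often. Tokens and sample lines here are illustrative:

```python
from collections import Counter

# A few known bot tokens; extend from the directory entries.
BOT_TOKENS = ("Googlebot", "bingbot", "GPTBot", "ClaudeBot")

def tally_bots(log_lines):
    """Count requests per bot token across access log lines."""
    counts = Counter()
    for line in log_lines:
        lowered = line.lower()
        for bot in BOT_TOKENS:
            if bot.lower() in lowered:
                counts[bot] += 1
                break  # count each request line once
    return counts

sample = [
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '"Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"',
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]
print(tally_bots(sample))  # Counter({'Googlebot': 2, 'bingbot': 1})
```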
For Developers & Security Teams
- Bot detection: Use the user agent strings in this database to build detection logic that identifies and classifies bot traffic in your applications.
- Firewall rules: Create WAF rules to block unwanted bots at the network level before they reach your application servers.
- Verification: Use the verification methods listed for each bot to confirm that a request actually comes from the claimed bot and not a spoofed user agent.
- Monitoring: Track the appearance of new or unknown bots in your logs to identify potential scraping or scanning activity.
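Verification by forward-confirmed reverse DNS, which Google and Microsoft document for their crawlers, can be sketched with the standard library as follows. The hostname suffixes shown are the ones Google publishes for Googlebot; real use is network-dependent, so this is a sketch rather than production code:

```python
import socket

def verify_crawler_ip(ip: str, suffixes=(".googlebot.com", ".google.com")) -> bool:
    """Forward-confirmed reverse DNS: reverse-resolve the IP, check the
    hostname's domain, then forward-resolve the hostname and confirm it
    maps back to the original IP."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)            # reverse lookup
        if not hostname.endswith(suffixes):
            return False
        _, _, addresses = socket.gethostbyname_ex(hostname)  # forward lookup
        return ip in addresses
    except OSError:  # no PTR record, lookup failure, etc.
        return False
```

A spoofed user agent fails this check: the claimed bot's name appears in the string, but the source IP does not reverse-resolve into the vendor's domain.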
