User Agent Database | Bot & Crawler User Agent Strings

User Agent String Database & Bot Directory

A comprehensive, regularly updated database of 189 verified user agent strings used by web crawlers, bots, and spiders. Identify AI crawlers like GPTBot and ClaudeBot, search engine spiders like Googlebot and Bingbot, SEO tools, social media preview bots, and more.

Use this directory to look up any bot's user agent string, find its robots.txt name for blocking or allowing access, verify its vendor, and understand its crawling behavior. Each entry includes the full user agent string you can copy directly into your server configuration or analytics filters.

189 Total User Agents
40 AI Crawlers
49 Search Engine Bots
69 SEO, Social & Monitoring

Found 17 user agents in category "Monitoring"

monitoring
Vendor: UptimeRobot
Mozilla/5.0+(compatible; UptimeRobot/2.0; http://www.uptimerobot.com/)
#monitoring #uptime #health-check #alerts
robots.txt: UptimeRobot
monitoring
Vendor: Pingdom
Mozilla/5.0 (compatible; Pingdom.com_bot_version_1.4_(http://www.pingdom.com/))
#monitoring #uptime #performance #alerts
robots.txt: Pingdom.com_bot
monitoring
Vendor: StatusCake
Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36 StatusCake
#monitoring #uptime #performance #testing
robots.txt: StatusCake
monitoring
Vendor: Site24x7
Mozilla/5.0 (compatible; Site24x7/1.0; +https://www.site24x7.com/)
#monitoring #uptime #performance #apm
robots.txt: Site24x7
monitoring
Vendor: GTmetrix
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 GTmetrix
#performance #pagespeed #monitoring #testing
robots.txt: GTmetrix
monitoring
Vendor: Google
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 Chrome-Lighthouse
#performance #audit #pagespeed #google
robots.txt: Chrome-Lighthouse
monitoring
Vendor: Google
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko; Google Page Speed Insights) Chrome/120.0.0.0 Safari/537.36
#performance #pagespeed #google #testing
robots.txt: Google Page Speed Insights
monitoring
Vendor: Siteimprove
Mozilla/5.0 (compatible; SITEIMPROVE)
#accessibility #seo #quality #crawler
robots.txt: SITEIMPROVE
monitoring
Vendor: ContentKing
Mozilla/5.0 (compatible; ContentKing/1.0; +https://www.contentkingapp.com)
#seo #monitoring #real-time #crawler
robots.txt: ContentKing
monitoring
Vendor: Datadog
Mozilla/5.0 (X11; Linux x86_64; DatadogSynthetics) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36
#monitoring #synthetics #apm #observability
robots.txt: DatadogSynthetics
monitoring
Vendor: New Relic
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.60 Safari/537.36 NewRelicSynthetics/1.0
#monitoring #synthetics #apm #performance
robots.txt: NewRelicSynthetics
monitoring
Vendor: Catchpoint Systems
Mozilla/5.0 (compatible; Windows NT 6.1; Catchpoint) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36
#monitoring #performance #synthetic #uptime
robots.txt: Catchpoint
monitoring
Vendor: Cisco ThousandEyes
ThousandEyes
#monitoring #network #performance #cisco
robots.txt: ThousandEyes
monitoring
Vendor: Checkly
Checkly/1.0 (https://www.checklyhq.com)
#monitoring #synthetic #api #playwright
robots.txt: Checkly
monitoring
Vendor: Better Stack
Better Uptime Bot
#monitoring #uptime #betterstack
robots.txt: Better Uptime Bot
monitoring
Vendor: HetrixTools
HetrixTools
#monitoring #uptime #blacklist
robots.txt: HetrixTools
monitoring
Vendor: Freshworks
Freshping
#monitoring #uptime #freshworks
robots.txt: Freshping

What Is a User Agent String?

A user agent string is an HTTP header that identifies the client making a request to a web server. Every time a browser, bot, or crawler visits a web page, it sends a user agent string that reveals what software is making the request. This allows web servers to serve appropriate content, log traffic sources, and enforce access rules.
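
As a minimal illustration using Python's standard library, a well-behaved bot sets a descriptive User-Agent header on each request. The bot name and documentation URL below are placeholders, not a real bot:

```python
import urllib.request

# Bots identify themselves to servers via the User-Agent request header.
# "MyBot/1.0" and the info URL are illustrative placeholders.
req = urllib.request.Request(
    "https://example.com/",
    headers={"User-Agent": "MyBot/1.0 (+https://example.com/bot-info)"},
)

# urllib stores header names with only the first letter capitalized.
print(req.get_header("User-agent"))  # MyBot/1.0 (+https://example.com/bot-info)
```

On the server side, the same value arrives as the User-Agent header of each HTTP request and is what access logs record.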

Anatomy of a Bot User Agent String

Bot user agent strings generally follow a common format. Here is the structure of Googlebot's user agent string:

Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

Platform token (Mozilla/5.0) — compatibility string inherited from early web browsers

Compatible flag (compatible;) — indicates this is a bot, not a regular browser

Bot name & version (Googlebot/2.1) — the crawler's identity used in robots.txt rules

Documentation URL (+http://www.google.com/bot.html) — link to the bot's official information page

Not all bots follow this format. Some use minimal identifiers (e.g., Bytespider), while others mimic full browser user agent strings. The key identifier for robots.txt rules is the bot name portion, which is listed as the "robots.txt name" in each entry of this directory.

Types of Web Crawlers & Bots

Web bots serve many different purposes. Understanding which category a bot belongs to helps you decide whether to allow or block it. This directory organizes bots into the following categories:

AI & LLM Crawlers

Bots operated by AI companies to collect training data for large language models, or to fetch web content in real-time when users interact with AI assistants. Includes bots from OpenAI (GPTBot), Anthropic (ClaudeBot), Meta (FacebookBot), Google (Gemini-Deep-Research), and others.

40 bots in this category

Search Engine Bots

Crawlers from search engines that index web pages to serve search results. These are the most well-known bots on the web and include Googlebot, Bingbot, YandexBot, Baiduspider, DuckDuckBot, and others. Blocking these bots will remove your site from their search results.

49 bots in this category

SEO & Marketing Tools

Bots from SEO platforms that crawl websites to analyze backlinks, track rankings, audit technical issues, and monitor competitor sites. Includes crawlers from Ahrefs, Semrush, Moz, Screaming Frog, Majestic, and others commonly seen in server logs.

30 bots in this category

Social Media Bots

Fetchers that retrieve page metadata when URLs are shared on social platforms. They generate the link preview cards showing titles, descriptions, and thumbnails. Includes bots from Facebook, Twitter/X, LinkedIn, Discord, Slack, Telegram, and others.

15 bots in this category

Monitoring & Performance

Synthetic monitoring agents that test website availability, performance, and functionality. They run scheduled checks from global locations to detect outages and measure response times. Includes tools like UptimeRobot, Pingdom, Datadog, and others.

17 bots in this category

Security Scanners

Vulnerability scanners and security auditing tools that test web applications for known weaknesses. These tools typically do not respect robots.txt as they need to test all accessible endpoints. Includes Nessus, Qualys, Nikto, OpenVAS, and CensysInspect.

7 bots in this category

How to Manage Bot Access with robots.txt

The robots.txt file is the standard mechanism for controlling which bots can crawl your website. Place it at the root of your domain (e.g., https://example.com/robots.txt) and define rules for each bot using its robots.txt name. Here is an example configuration:

# Allow search engines full access
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

# Block AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

# Allow everything else
User-agent: *
Allow: /

Key Points About robots.txt

  • Voluntary compliance: robots.txt is a protocol, not a security measure. Legitimate bots from major companies respect it, but malicious scrapers may ignore it entirely.
  • Bot name matching: The User-agent value must match the bot's robots.txt name exactly. Use the "robots.txt name" field shown in each bot entry in this directory.
  • Specificity matters: More specific rules take precedence over general ones. A rule for Googlebot-Image overrides a wildcard rule.
  • Crawl-delay: Some bots support a Crawl-delay directive that limits how frequently they request pages, reducing server load.
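
Because compliance is voluntary, bots that ignore robots.txt can only be stopped at the server itself. A minimal sketch of user-agent-based blocking as WSGI middleware; the blocked tokens are examples for illustration, not a recommendation of which bots to block:

```python
# Server-side enforcement for bots that ignore robots.txt.
# Example tokens only; substring matching on User-Agent is easy to
# spoof, so pair this with identity verification for sensitive cases.
BLOCKED_TOKENS = ("Bytespider", "CCBot")

def block_bots(app):
    """Wrap a WSGI app, returning 403 for requests from blocked bots."""
    def middleware(environ, start_response):
        ua = environ.get("HTTP_USER_AGENT", "")
        if any(token in ua for token in BLOCKED_TOKENS):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden"]
        return app(environ, start_response)
    return middleware
```

The same idea translates directly to nginx map rules or WAF policies if you prefer to block before requests reach the application.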

AI Crawlers: Understanding the New Wave of Web Bots

Since 2023, a significant new category of web crawlers has emerged: AI training bots. Companies like OpenAI, Anthropic, Google, Meta, and Cohere now operate crawlers that collect web content to train and improve their large language models (LLMs). This has created new challenges for website owners who need to decide whether their content should be used for AI training.

Types of AI Crawlers

AI-related bots generally fall into three categories:

  • Training data crawlers collect web content at scale for model training. Examples include GPTBot (OpenAI), ClaudeBot (Anthropic), CCBot (Common Crawl), Bytespider (ByteDance), and FacebookBot (Meta). Blocking these prevents your content from being included in future training datasets.
  • User-action fetchers retrieve web pages in real-time when a user asks an AI assistant for current information. Examples include ChatGPT-User, Claude-User, Perplexity-User, and MistralAI-User. Blocking these prevents AI assistants from accessing your content during conversations.
  • AI search crawlers index content specifically for AI-powered search products. Examples include OAI-SearchBot (OpenAI), Claude-SearchBot (Anthropic), and Google-CloudVertexBot. Blocking these may affect your visibility in AI search results.

How to Control AI Crawler Access

Most reputable AI companies respect robots.txt directives. You can selectively block training crawlers while still allowing user-action fetchers if you want your content to be referenceable but not used for training. Each AI bot entry in this directory specifies whether it respects robots.txt and what its specific purpose is, helping you make informed decisions about which bots to allow.
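
For example, a site that wants to stay referenceable in AI conversations while opting out of training datasets could combine rules like these, using the robots.txt names cataloged in this directory:

```
# Block training-data crawlers
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

# Allow real-time user-action fetchers
User-agent: ChatGPT-User
Allow: /

User-agent: Claude-User
Allow: /
```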

Frequently Asked Questions About User Agents & Web Crawlers

What is a user agent string?

A user agent string is an identifier that web browsers, bots, and crawlers send to web servers with every HTTP request. It tells the server what software is making the request, including the application name, version, and sometimes the operating system. For example, Googlebot identifies itself as "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" so that website owners can recognize it as Google's search engine crawler.

How do I identify which bots are crawling my website?

You can identify bots crawling your site by checking your server access logs, which record the user agent string for every request. Look for known bot identifiers like Googlebot, Bingbot, GPTBot, or ClaudeBot. You can also use analytics tools that filter bot traffic, or set up server-side logging to flag requests from known crawler user agent strings. This directory catalogs all major bots to help you identify them.
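
A quick way to tally bot hits from a combined-format access log; the token list and sample log lines below are illustrative:

```python
from collections import Counter

# Small sample of bot tokens; extend with names from this directory.
BOT_TOKENS = ["Googlebot", "Bingbot", "GPTBot", "ClaudeBot", "AhrefsBot"]

def count_bot_hits(lines):
    """Count requests per known bot token across access-log lines."""
    hits = Counter()
    for line in lines:
        for token in BOT_TOKENS:
            if token in line:
                hits[token] += 1
    return hits

# Illustrative combined-format log lines (timestamps elided).
sample = [
    '1.2.3.4 - - [.] "GET / HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '5.6.7.8 - - [.] "GET /a HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)"',
]
print(count_bot_hits(sample))
```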

How do I block AI crawlers like GPTBot or ClaudeBot from my website?

To block AI crawlers, add disallow rules to your robots.txt file. For example, to block OpenAI's GPTBot, add "User-agent: GPTBot" followed by "Disallow: /" on the next line. You can block multiple AI crawlers by adding separate rules for each bot (GPTBot, ClaudeBot, CCBot, Bytespider, etc.). Note that robots.txt relies on voluntary compliance — most reputable AI companies respect these directives, but it is not a guaranteed enforcement mechanism.

What is robots.txt and how does it work?

robots.txt is a plain text file placed in the root directory of a website (e.g., example.com/robots.txt) that provides instructions to web crawlers about which pages or sections of the site they are allowed or not allowed to access. It uses a simple syntax with "User-agent" to specify which bot the rule applies to and "Disallow" or "Allow" to define access permissions. While it is an industry standard respected by major search engines and legitimate bots, it is advisory rather than enforceable.

How do I know if a crawler respects robots.txt?

Reputable crawlers from established companies like Google, Microsoft, and most AI companies typically respect robots.txt. You can verify compliance by adding a disallow rule for a specific bot and then monitoring your server logs to see if that bot continues to access blocked URLs. Each bot entry in this directory includes a "respects robots.txt" field to help you understand the expected behavior. You can also use the verification methods listed for each bot to confirm the crawler's identity is genuine.
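
The standard identity check for Googlebot is reverse DNS plus forward confirmation: resolve the client IP to a hostname, check the hostname suffix, then resolve that hostname back and confirm it maps to the same IP. A sketch following Google's published verification guidance; `verify_googlebot` performs live DNS lookups:

```python
import socket

# Hostname suffixes Google documents for genuine Googlebot hosts.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")

def has_google_suffix(hostname: str) -> bool:
    """Pure suffix check, with any trailing dot normalized away."""
    return hostname.rstrip(".").endswith(GOOGLE_SUFFIXES)

def verify_googlebot(ip: str) -> bool:
    """Reverse-resolve the IP, check the suffix, then forward-confirm."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)            # reverse DNS
        if not has_google_suffix(hostname):
            return False
        forward_ips = socket.gethostbyname_ex(hostname)[2]   # forward DNS
        return ip in forward_ips
    except OSError:
        return False
```

The suffix check alone is not enough: any host can claim a misleading reverse record, which is why the forward lookup must map back to the original IP.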

What is the difference between a web crawler and a web scraper?

A web crawler (or spider) systematically browses the web to index content, typically following links from page to page. Search engines like Google use crawlers to discover and index web pages. A web scraper extracts specific data from web pages for a particular purpose such as price monitoring or data analysis. The key difference is intent: crawlers aim to index and discover content broadly, while scrapers target specific data from specific pages. Many AI bots function as crawlers that collect training data at scale.

Why are there so many different Google crawlers?

Google uses specialized crawlers for different purposes: Googlebot handles general web search indexing, Googlebot-Image focuses on image search, Googlebot-Video targets video content, AdsBot-Google checks ad landing page quality, Google-Extended collects data for AI training, and Storebot-Google indexes product and shopping content. Each crawler can be independently controlled via robots.txt, giving website owners granular control over which Google services can access their content.
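
A hypothetical robots.txt fragment illustrating that granular control; the paths are placeholders:

```
# Baseline rule for all crawlers
User-agent: *
Disallow: /private/

# More specific group overrides the wildcard for image crawling
User-agent: Googlebot-Image
Disallow: /photos/

# Opt out of Google AI training without affecting Search
User-agent: Google-Extended
Disallow: /
```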

Should I block all bots from my website?

Blocking all bots is generally not recommended, as it would prevent search engines from indexing your site, remove you from search results, and stop social media platforms from generating link previews. Instead, take a selective approach: allow search engine bots and social media crawlers that benefit your visibility, while blocking unwanted bots such as aggressive scrapers or AI training crawlers if you prefer not to contribute to their datasets. Review each bot's purpose before deciding whether to allow or block it.

How to Use This User Agent Database

For Website Owners

  • Create robots.txt rules: Look up the robots.txt name for any bot and add allow or disallow rules to control access to your content.
  • Filter analytics data: Exclude known bot user agent strings from your analytics reports to get accurate visitor counts.
  • Configure rate limiting: Set up server-side rate limits for aggressive crawlers to protect your server resources.
  • Audit your traffic: Cross-reference your server logs with this database to identify which bots are crawling your site and how often.

For Developers & Security Teams

  • Bot detection: Use the user agent strings in this database to build detection logic that identifies and classifies bot traffic in your applications.
  • Firewall rules: Create WAF rules to block unwanted bots at the network level before they reach your application servers.
  • Verification: Use the verification methods listed for each bot to confirm that a request actually comes from the claimed bot and not a spoofed user agent.
  • Monitoring: Track the appearance of new or unknown bots in your logs to identify potential scraping or scanning activity.
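
A classification sketch along the lines of the detection use case above, using a small sample of tokens from this directory rather than the full database:

```python
# Map known user agent tokens to this directory's categories.
# Token lists are small samples; extend them from the full database.
CATEGORIES = {
    "ai": ["GPTBot", "ClaudeBot", "CCBot", "Bytespider"],
    "search": ["Googlebot", "Bingbot", "DuckDuckBot", "Baiduspider"],
    "seo": ["AhrefsBot", "SemrushBot", "MJ12bot"],
    "monitoring": ["UptimeRobot", "Pingdom", "StatusCake"],
}

def classify(user_agent: str) -> str:
    """Return the first category whose token appears in the UA string."""
    for category, tokens in CATEGORIES.items():
        if any(token in user_agent for token in tokens):
            return category
    return "unknown"

print(classify("Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)"))
# ai
```

Substring matching is a pragmatic first pass for log analysis; since user agents can be spoofed, combine it with the verification methods above before making trust decisions.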