Googlebot-News User Agent - Google Bot Details | CL SEO

Googlebot-News

Operated by Google · Active since 2002
Category: Search · Respects robots.txt
#search #google #news #crawler

What is Googlebot-News?

Googlebot-News is a specialized crawler that indexes content specifically for Google News. This bot focuses on news websites and blogs, crawling them more frequently than standard Googlebot to capture breaking news and time-sensitive content. It looks for news articles, press releases, and editorial content that meets Google News content policies. The crawler pays special attention to article metadata, publication dates, and news-specific structured data. Publishers can optimize for Googlebot-News by following Google News guidelines and submitting their sites to Google News Publisher Center.
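Publishers typically expose that metadata as schema.org NewsArticle structured data on each article page. A minimal sketch in Python that renders such a JSON-LD block (the field values are placeholders, not taken from any real site):

import json

# Hypothetical article metadata; in practice these values come from your CMS.
article = {
    "@context": "https://schema.org",
    "@type": "NewsArticle",
    "headline": "Example Breaking Story",
    "datePublished": "2024-01-15T08:00:00+00:00",
    "dateModified": "2024-01-15T09:30:00+00:00",
    "author": [{"@type": "Person", "name": "Jane Reporter"}],
}

# Embed the output in the page head inside a
# <script type="application/ld+json"> element.
print(json.dumps(article, indent=2))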

User Agent String

Googlebot-News

Note that Google's crawler documentation lists Googlebot-News as a user agent token for robots.txt rules; actual Google News crawling is performed with the standard Googlebot user agent strings, so the literal string "Googlebot-News" may not appear in your server logs.

How to Control Googlebot-News

Block Completely

To prevent Googlebot-News from accessing your entire website, add this to your robots.txt file:

# Block Googlebot-News
User-agent: Googlebot-News
Disallow: /

Block Specific Directories

To restrict access to certain parts of your site while allowing others:

User-agent: Googlebot-News
Disallow: /admin/
Disallow: /private/
Disallow: /wp-admin/
Allow: /public/

Set Crawl Delay

To slow down the crawl rate (note: Google's crawlers, including Googlebot-News, ignore the Crawl-delay directive; it only affects third-party bots that honor it):

User-agent: Googlebot-News
Crawl-delay: 10

How to Verify Googlebot-News

Verification Method:
Googlebot-News is verified the same way as standard Googlebot: run a reverse DNS lookup on the requesting IP address, confirm the hostname ends in googlebot.com or google.com, then run a forward DNS lookup on that hostname and confirm it resolves back to the original IP.

Learn more in the official documentation.
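That two-step DNS check can be scripted. A minimal sketch in Python (the helper name and sample IP are illustrative, not an official API):

import socket

def verify_googlebot(ip):
    """Reverse-DNS the IP, check the domain, then confirm with a forward lookup."""
    try:
        host, _, _ = socket.gethostbyaddr(ip)  # step 1: reverse DNS
    except socket.herror:
        return False
    # Step 2: hostname must belong to Google's crawler domains.
    if not host.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        # Step 3: forward DNS must resolve back to the original IP.
        _, _, addresses = socket.gethostbyname_ex(host)
    except socket.gaierror:
        return False
    return ip in addresses

print(verify_googlebot("66.249.66.1"))  # an IP in Google's published crawler range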

Detection Patterns

Multiple ways to detect Googlebot-News in your application:

Basic Pattern

/Googlebot\-News/i

Strict Pattern

/^Googlebot\-News$/

Flexible Pattern

/Googlebot\-News[\s\/]?[\d\.]*?/i

Vendor Match

/.*Google.*Googlebot\-News/i
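These patterns can be sanity-checked before deployment. A short Python harness (the sample user agent is hypothetical; see the note under "User Agent String" about how Googlebot-News appears in practice):

import re

patterns = {
    "basic": re.compile(r"Googlebot-News", re.IGNORECASE),
    "strict": re.compile(r"^Googlebot-News$"),
    "flexible": re.compile(r"Googlebot-News[\s/]?[\d.]*", re.IGNORECASE),
}

sample_ua = "Googlebot-News"  # hypothetical header value for testing
for name, pattern in patterns.items():
    print(f"{name}: {bool(pattern.search(sample_ua))}")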

Implementation Examples

// PHP Detection for Googlebot-News
function detect_googlebot_news() {
    $user_agent = $_SERVER['HTTP_USER_AGENT'] ?? '';
    $pattern = '/Googlebot\-News/i';
    if (preg_match($pattern, $user_agent)) {
        // Log the detection
        error_log('Googlebot-News detected from IP: ' . $_SERVER['REMOTE_ADDR']);
        // Set cache headers
        header('Cache-Control: public, max-age=3600');
        header('X-Robots-Tag: noarchive');
        // Optional: Serve cached version
        if (file_exists('cache/' . md5($_SERVER['REQUEST_URI']) . '.html')) {
            readfile('cache/' . md5($_SERVER['REQUEST_URI']) . '.html');
            exit;
        }
        return true;
    }
    return false;
}
# Python/Flask Detection for Googlebot-News
import re

from flask import request, make_response

def detect_googlebot_news():
    user_agent = request.headers.get('User-Agent', '')
    pattern = r'Googlebot-News'
    if re.search(pattern, user_agent, re.IGNORECASE):
        # Create response with caching
        response = make_response()
        response.headers['Cache-Control'] = 'public, max-age=3600'
        response.headers['X-Robots-Tag'] = 'noarchive'
        return True
    return False

# Django Middleware
class GooglebotNewsMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        if self.detect_bot(request):
            # Handle bot traffic (e.g., logging or cache headers)
            pass
        return self.get_response(request)

    def detect_bot(self, request):
        # Minimal user-agent check for the middleware
        user_agent = request.META.get('HTTP_USER_AGENT', '')
        return bool(re.search(r'Googlebot-News', user_agent, re.IGNORECASE))
// JavaScript/Node.js Detection for Googlebot-News
const express = require('express');
const app = express();

// Middleware to detect Googlebot-News
function detectGooglebotNews(req, res, next) {
    const userAgent = req.headers['user-agent'] || '';
    const pattern = /Googlebot-News/i;
    if (pattern.test(userAgent)) {
        // Log bot detection
        console.log('Googlebot-News detected from IP:', req.ip);
        // Set cache headers
        res.set({
            'Cache-Control': 'public, max-age=3600',
            'X-Robots-Tag': 'noarchive'
        });
        // Mark request as bot
        req.isBot = true;
        req.botName = 'Googlebot-News';
    }
    next();
}

app.use(detectGooglebotNews);
# Apache .htaccess rules for Googlebot-News

# Block completely
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Googlebot\-News [NC]
RewriteRule .* - [F,L]

# Or redirect to a static version
RewriteCond %{HTTP_USER_AGENT} Googlebot\-News [NC]
RewriteCond %{REQUEST_URI} !^/static/
RewriteRule ^(.*)$ /static/$1 [L]

# Or set environment variable for PHP
SetEnvIfNoCase User-Agent "Googlebot\-News" is_bot=1

# Add cache headers for this bot
<If "%{HTTP_USER_AGENT} =~ /Googlebot\-News/i">
    Header set Cache-Control "public, max-age=3600"
    Header set X-Robots-Tag "noarchive"
</If>
# Nginx configuration for Googlebot-News

# Map user agent to variable
map $http_user_agent $is_googlebot_news {
    default 0;
    ~*Googlebot\-News 1;
}

server {
    # Option 1: block the bot completely
    if ($is_googlebot_news) {
        return 403;
    }

    # Option 2: serve cached content (try_files is not valid inside
    # an "if" block, so switch the root instead)
    location / {
        root /var/www/html;
        if ($is_googlebot_news) {
            root /var/www/cached;
        }
        try_files $uri $uri.html $uri/index.html @backend;
    }

    # Add headers for bot requests
    location @backend {
        if ($is_googlebot_news) {
            add_header Cache-Control "public, max-age=3600";
            add_header X-Robots-Tag "noarchive";
        }
        proxy_pass http://backend;
    }
}

Should You Block This Bot?

Recommendations based on your website type:

Site Type        | Recommendation | Reasoning
-----------------|----------------|----------------------------------------------------
E-commerce       | Allow          | Essential for product visibility in search results
Blog/News        | Allow          | Increases content reach and discoverability
SaaS Application | Block          | No benefit for application interfaces; preserve resources
Documentation    | Allow          | Improves documentation discoverability for developers
Corporate Site   | Allow          | Allow for public pages, block sensitive areas like intranets

Advanced robots.txt Configurations

E-commerce Site Configuration

User-agent: Googlebot-News
Crawl-delay: 5
Disallow: /cart/
Disallow: /checkout/
Disallow: /my-account/
Disallow: /api/
Disallow: /*?sort=
Disallow: /*?filter=
Disallow: /*&page=
Allow: /products/
Allow: /categories/

Sitemap: https://example.com/sitemap.xml

Publishing/Blog Configuration

User-agent: Googlebot-News
Crawl-delay: 10
Disallow: /wp-admin/
Disallow: /drafts/
Disallow: /preview/
Disallow: /*?replytocom=
Allow: /

SaaS/Application Configuration

User-agent: Googlebot-News
Disallow: /app/
Disallow: /api/
Disallow: /dashboard/
Disallow: /settings/
Allow: /
Allow: /pricing/
Allow: /features/
Allow: /docs/
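Before deploying a configuration like these, the rules can be checked against the Googlebot-News token with Python's standard urllib.robotparser (the rules and URLs below are illustrative):

from urllib.robotparser import RobotFileParser

rules = """\
User-agent: Googlebot-News
Disallow: /app/
Disallow: /api/
Allow: /docs/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# Each call asks: may Googlebot-News fetch this URL?
for path in ("/docs/intro", "/app/dashboard"):
    print(path, parser.can_fetch("Googlebot-News", "https://example.com" + path))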

Quick Reference

User Agent Match: Googlebot-News
Robots.txt Name: Googlebot-News
Category: Search
Respects robots.txt: Yes