Googlebot-News is a specialized crawler that indexes content specifically for Google News. It focuses on news websites and blogs, crawling them more frequently than the standard Googlebot to capture breaking news and other time-sensitive content. It looks for news articles, press releases, and editorial content that meet Google News content policies, paying special attention to article metadata, publication dates, and news-specific structured data. Publishers can optimize for Googlebot-News by following the Google News guidelines and managing their publications in the Google Publisher Center.
User Agent String
Googlebot-News
Note that Googlebot-News is the user agent token honored in robots.txt; per Google's crawler documentation, Google News crawling is performed with the standard Googlebot user agent strings, so the token may not appear verbatim in access logs and header-based detection can miss some of this traffic.
How to Control Googlebot-News
Block Completely
To prevent Googlebot-News from accessing your entire website, add this to your robots.txt file:
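User-agent: Googlebot-News
Disallow: /
This removes content from Google News only; regular Google Search crawling by the standard Googlebot is unaffected unless it is blocked separately.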
Detection Patterns
Multiple ways to detect Googlebot-News in your application:
Basic Pattern
/Googlebot\-News/i
Strict Pattern
/^Googlebot\-News$/
Flexible Pattern
/Googlebot\-News[\s\/]?[\d.]*/i
Vendor Match
/.*Google.*Googlebot\-News/i
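The differences between these patterns are easiest to see against sample strings; here is a small Node.js sketch (the user agent values are illustrative):
// How the patterns behave against sample user agent values
const basic = /Googlebot-News/i;
const strict = /^Googlebot-News$/;
const flexible = /Googlebot-News[\s\/]?[\d.]*/i;

console.log(basic.test('Mozilla/5.0 (compatible; Googlebot-News)'));  // true: token may appear anywhere
console.log(strict.test('Mozilla/5.0 (compatible; Googlebot-News)')); // false: whole string must equal the token
console.log(strict.test('Googlebot-News'));                           // true
console.log(flexible.test('Googlebot-News/2.1'));                     // true: tolerates an optional version suffix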
Implementation Examples
// PHP Detection for Googlebot-News
function detect_googlebot_news() {
    $user_agent = $_SERVER['HTTP_USER_AGENT'] ?? '';
    $pattern = '/Googlebot-News/i';
    if (preg_match($pattern, $user_agent)) {
        // Log the detection
        error_log('Googlebot-News detected from IP: ' . ($_SERVER['REMOTE_ADDR'] ?? 'unknown'));
        // Set cache headers
        header('Cache-Control: public, max-age=3600');
        header('X-Robots-Tag: noarchive');
        // Optional: serve a cached version if one exists
        $cache_file = 'cache/' . md5($_SERVER['REQUEST_URI']) . '.html';
        if (file_exists($cache_file)) {
            readfile($cache_file);
            exit;
        }
        return true;
    }
    return false;
}
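Because any client can send this User-Agent header, matching the string alone does not prove a request came from Google. Google's documented verification method is a reverse DNS lookup followed by a forward confirmation; below is a minimal PHP sketch of that check (verify_googlebot_ip is an illustrative helper name):
// Verify a claimed Googlebot IP via reverse DNS plus forward confirmation
function verify_googlebot_ip($ip) {
    $host = gethostbyaddr($ip);
    // The reverse name should end in googlebot.com or google.com
    if ($host === false || !preg_match('/\.(googlebot|google)\.com$/i', $host)) {
        return false;
    }
    // Forward-confirm: the host name must resolve back to the same IP
    return gethostbyname($host) === $ip;
}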
# Python/Flask Detection for Googlebot-News
import re

from flask import Flask, request

app = Flask(__name__)

def detect_googlebot_news():
    user_agent = request.headers.get('User-Agent', '')
    pattern = r'Googlebot-News'
    return bool(re.search(pattern, user_agent, re.IGNORECASE))

# Attach caching headers for bot requests to the actual outgoing response
@app.after_request
def add_bot_headers(response):
    if detect_googlebot_news():
        response.headers['Cache-Control'] = 'public, max-age=3600'
        response.headers['X-Robots-Tag'] = 'noarchive'
    return response

# Django Middleware
class GooglebotNewsMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        if self.detect_bot(request):
            # Handle bot traffic, e.g. flag the request for downstream views
            request.is_bot = True
        return self.get_response(request)

    def detect_bot(self, request):
        # Django exposes the User-Agent header via request.META
        user_agent = request.META.get('HTTP_USER_AGENT', '')
        return bool(re.search(r'Googlebot-News', user_agent, re.IGNORECASE))
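For the Django middleware to take effect, it must be registered in the project's settings; the module path below is a placeholder for wherever the class actually lives:
# settings.py
MIDDLEWARE = [
    # ... existing middleware ...
    'yourapp.middleware.GooglebotNewsMiddleware',  # hypothetical path, adjust to your project
]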
// JavaScript/Node.js Detection for Googlebot-News
const express = require('express');
const app = express();

// Middleware to detect Googlebot-News
function detectGooglebotNews(req, res, next) {
    const userAgent = req.headers['user-agent'] || '';
    const pattern = /Googlebot-News/i;
    if (pattern.test(userAgent)) {
        // Log bot detection
        console.log('Googlebot-News detected from IP:', req.ip);
        // Set cache headers
        res.set({
            'Cache-Control': 'public, max-age=3600',
            'X-Robots-Tag': 'noarchive'
        });
        // Mark request as bot
        req.isBot = true;
        req.botName = 'Googlebot-News';
    }
    next();
}

app.use(detectGooglebotNews);
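Downstream route handlers can then branch on the req.isBot flag set by the middleware; the route below is purely illustrative:
// Example route that adapts its behavior for the detected bot
app.get('/news/:slug', (req, res) => {
    if (req.isBot) {
        // e.g. serve fully rendered, cache-friendly markup to the crawler
    }
    res.send('article content');
});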
# Apache .htaccess rules for Googlebot-News

# Block completely
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Googlebot\-News [NC]
RewriteRule .* - [F,L]

# Or redirect to a static version
RewriteCond %{HTTP_USER_AGENT} Googlebot\-News [NC]
RewriteCond %{REQUEST_URI} !^/static/
RewriteRule ^(.*)$ /static/$1 [L]

# Or set an environment variable for PHP
SetEnvIfNoCase User-Agent "Googlebot\-News" is_bot=1

# Add cache headers for this bot (requires Apache 2.4+ with mod_headers)
<If "%{HTTP_USER_AGENT} =~ /Googlebot\-News/i">
    Header set Cache-Control "public, max-age=3600"
    Header set X-Robots-Tag "noarchive"
</If>
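The is_bot variable set by SetEnvIfNoCase is exposed to PHP through $_SERVER (after internal redirects Apache may add a REDIRECT_ prefix), so application code can branch on it without re-matching the user agent:
// Read the environment variable set by the .htaccess rules above
$is_bot = !empty($_SERVER['is_bot']) || !empty($_SERVER['REDIRECT_is_bot']);
if ($is_bot) {
    // e.g. skip personalization and serve cache-friendly markup
}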
# Nginx configuration for Googlebot-News

# Map the user agent to a variable
map $http_user_agent $is_googlebot_news {
    default 0;
    ~*Googlebot\-News 1;
}

server {
    # Block the bot completely
    if ($is_googlebot_news) {
        return 403;
    }

    # Or serve cached content; try_files is not allowed inside an if block,
    # so switch the document root instead
    location / {
        root /var/www/html;
        if ($is_googlebot_news) {
            root /var/www/cached;
        }
        try_files $uri $uri.html $uri/index.html @backend;
    }

    # Add headers for bot requests
    location @backend {
        if ($is_googlebot_news) {
            add_header Cache-Control "public, max-age=3600";
            add_header X-Robots-Tag "noarchive";
        }
        proxy_pass http://backend;
    }
}
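Any of these rules can be smoke-tested by sending a request with the bot's user agent string, for example with curl (example.com stands in for your host):
curl -I -A "Googlebot-News" https://example.com/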
Should You Block This Bot?
Recommendations based on your website type:
Site Type          Recommendation   Reasoning
E-commerce         Allow            Essential for product visibility in search results
Blog/News          Allow            Increases content reach and discoverability
SaaS Application   Block            No benefit for application interfaces; preserve resources
Documentation      Allow            Improves documentation discoverability for developers
Corporate Site     Allow            Allow for public pages, block sensitive areas like intranets