Sogou web spider User Agent - Sogou Bot Details | CL SEO

Sogou web spider

Operated by Sogou | Active since 2004
Category: Search | Respects robots.txt
#search #chinese #sogou #crawler

What is Sogou web spider?

Sogou web spider is the crawler for Sogou, a major Chinese search engine owned by Tencent. Sogou is best known for its integration with WeChat (China's dominant messaging app), which enables search within the WeChat ecosystem. The crawler indexes Chinese-language content and makes it discoverable across Tencent's services. For businesses targeting Chinese markets, especially those using WeChat for marketing, being crawlable by Sogou is increasingly important.

User Agent String

Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)

How to Control Sogou web spider

Block Completely

To prevent Sogou web spider from accessing your entire website, add this to your robots.txt file:

# Block Sogou web spider
User-agent: Sogou web spider
Disallow: /

Block Specific Directories

To restrict access to certain parts of your site while allowing others:

User-agent: Sogou web spider
Disallow: /admin/
Disallow: /private/
Disallow: /wp-admin/
Allow: /public/

Set Crawl Delay

To slow down the crawl rate (note: not all bots respect this directive):

User-agent: Sogou web spider
Crawl-delay: 10
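Since compliance with Crawl-delay is voluntary, server-side throttling is the only reliable enforcement. Below is a minimal sketch in Python (a hypothetical helper, not tied to any framework or library) that allows one request per crawl-delay window per user agent:

# Minimal sketch of server-side throttling for crawlers that ignore
# Crawl-delay. Hypothetical helper, not part of any library.
import time

CRAWL_DELAY = 10  # seconds, mirroring the robots.txt value above
_last_seen: dict[str, float] = {}

def allow_crawler_request(user_agent: str) -> bool:
    """Return False if this crawler requested within the last CRAWL_DELAY seconds."""
    now = time.monotonic()
    last = _last_seen.get(user_agent)
    if last is not None and now - last < CRAWL_DELAY:
        return False  # caller should respond with HTTP 429
    _last_seen[user_agent] = now
    return True

A production setup would key on verified IP ranges rather than the raw user agent string, and use a shared store such as Redis instead of process-local memory.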

How to Verify Sogou web spider

Verification Method:
Reverse DNS lookup

Because the user agent string is trivial to spoof, requests claiming to be Sogou web spider should be verified by IP address, the standard approach for major crawlers: run a reverse DNS lookup on the requesting IP, then a forward lookup on the returned hostname to confirm it resolves back to the same IP. Learn more in the official documentation.
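The snippet below sketches that two-step DNS check in Python. The .sogou.com hostname suffix is an assumption about how Sogou's crawler hosts are named; confirm the exact hostnames in the official documentation before enforcing this.

import socket

# Assumed hostname suffix -- verify against Sogou's official docs.
EXPECTED_SUFFIX = ".sogou.com"

def is_verified_sogou_spider(ip: str) -> bool:
    """Reverse-DNS the IP, check the hostname suffix, then
    forward-resolve the hostname and confirm it maps back to the IP."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)  # reverse lookup
        if not hostname.endswith(EXPECTED_SUFFIX):
            return False
        _, _, addresses = socket.gethostbyname_ex(hostname)  # forward lookup
        return ip in addresses
    except socket.herror:    # no PTR record for this IP
        return False
    except socket.gaierror:  # forward lookup failed
        return False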

Detection Patterns

Multiple ways to detect Sogou web spider in your application:

Basic Pattern

/Sogou web spider/i

Strict Pattern

/^Sogou web spider\/4\.0\(\+http:\/\/www\.sogou\.com\/docs\/help\/webmasters\.htm#07\)$/

Flexible Pattern

/Sogou web spider[\s\/]?[\d.]*/i

Vendor Match

/Sogou/i
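To sanity-check the patterns above, the following Python snippet runs each one against the published user agent string; all four should report a match:

import re

USER_AGENT = "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"

PATTERNS = {
    "basic":    re.compile(r"Sogou web spider", re.IGNORECASE),
    "strict":   re.compile(r"^Sogou web spider/4\.0\(\+http://www\.sogou\.com/docs/help/webmasters\.htm#07\)$"),
    "flexible": re.compile(r"Sogou web spider[\s/]?[\d.]*", re.IGNORECASE),
    "vendor":   re.compile(r"Sogou", re.IGNORECASE),
}

for name, pattern in PATTERNS.items():
    # search() matches anywhere in the string; the strict pattern is anchored with ^...$
    print(f"{name}: {'match' if pattern.search(USER_AGENT) else 'no match'}")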

Implementation Examples

// PHP Detection for Sogou web spider
function detect_sogou_web_spider() {
    $user_agent = $_SERVER['HTTP_USER_AGENT'] ?? '';
    $pattern = '/Sogou web spider/i';

    if (preg_match($pattern, $user_agent)) {
        // Log the detection
        error_log('Sogou web spider detected from IP: ' . $_SERVER['REMOTE_ADDR']);

        // Set cache headers
        header('Cache-Control: public, max-age=3600');
        header('X-Robots-Tag: noarchive');

        // Optional: serve a cached version if one exists
        $cache_file = 'cache/' . md5($_SERVER['REQUEST_URI']) . '.html';
        if (file_exists($cache_file)) {
            readfile($cache_file);
            exit;
        }
        return true;
    }
    return false;
}
# Python/Flask Detection for Sogou web spider
import re
from flask import request

def detect_sogou_web_spider():
    user_agent = request.headers.get('User-Agent', '')
    return bool(re.search(r'Sogou web spider', user_agent, re.IGNORECASE))

# Set caching headers on the actual response, e.g. in an after_request hook:
#   response.headers['Cache-Control'] = 'public, max-age=3600'
#   response.headers['X-Robots-Tag'] = 'noarchive'

# Django Middleware
class SogouWebSpiderMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        if self.detect_bot(request):
            # Handle bot traffic (throttle, serve cached pages, etc.)
            pass
        return self.get_response(request)

    def detect_bot(self, request):
        user_agent = request.META.get('HTTP_USER_AGENT', '')
        return bool(re.search(r'Sogou web spider', user_agent, re.IGNORECASE))
// JavaScript/Node.js Detection for Sogou web spider
const express = require('express');
const app = express();

// Middleware to detect Sogou web spider
function detectSogouWebSpider(req, res, next) {
    const userAgent = req.headers['user-agent'] || '';
    const pattern = /Sogou web spider/i;

    if (pattern.test(userAgent)) {
        // Log bot detection
        console.log('Sogou web spider detected from IP:', req.ip);

        // Set cache headers
        res.set({
            'Cache-Control': 'public, max-age=3600',
            'X-Robots-Tag': 'noarchive'
        });

        // Mark request as bot
        req.isBot = true;
        req.botName = 'Sogou web spider';
    }
    next();
}

app.use(detectSogouWebSpider);
# Apache .htaccess rules for Sogou web spider
# (patterns containing spaces must be quoted)

# Block completely
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} "Sogou web spider" [NC]
RewriteRule .* - [F,L]

# Or redirect to a static version
RewriteCond %{HTTP_USER_AGENT} "Sogou web spider" [NC]
RewriteCond %{REQUEST_URI} !^/static/
RewriteRule ^(.*)$ /static/$1 [L]

# Or set an environment variable for PHP
SetEnvIfNoCase User-Agent "Sogou web spider" is_bot=1

# Add cache headers for this bot (Apache 2.4+ <If> syntax)
<If "%{HTTP_USER_AGENT} =~ /Sogou web spider/i">
    Header set Cache-Control "public, max-age=3600"
    Header set X-Robots-Tag "noarchive"
</If>
# Nginx configuration for Sogou web spider

# Map the user agent to a flag (quotes required: the pattern contains spaces)
map $http_user_agent $is_sogou_web_spider {
    default                 0;
    "~*Sogou web spider"    1;
}

# Pick a document root based on the flag; nginx does not allow
# try_files inside an "if" block, so select the root via map instead.
map $is_sogou_web_spider $doc_root {
    0   /var/www/html;
    1   /var/www/cached;
}

server {
    # Option 1: block the bot completely
    if ($is_sogou_web_spider) {
        return 403;
    }

    # Option 2: serve cached content to the bot, live content to everyone else
    location / {
        root $doc_root;
        try_files $uri $uri.html $uri/index.html @backend;
    }

    # Add headers for bot requests that reach the backend
    location @backend {
        if ($is_sogou_web_spider) {
            add_header Cache-Control "public, max-age=3600";
            add_header X-Robots-Tag "noarchive";
        }
        proxy_pass http://backend;
    }
}

Should You Block This Bot?

Recommendations based on your website type:

Site Type          Recommendation   Reasoning
E-commerce         Allow            Essential for product visibility in search results
Blog/News          Allow            Increases content reach and discoverability
SaaS Application   Block            No benefit for application interfaces; preserves resources
Documentation      Allow            Improves documentation discoverability for developers
Corporate Site     Allow            Allow public pages; block sensitive areas like intranets

Advanced robots.txt Configurations

E-commerce Site Configuration

User-agent: Sogou web spider
Crawl-delay: 5
Disallow: /cart/
Disallow: /checkout/
Disallow: /my-account/
Disallow: /api/
Disallow: /*?sort=
Disallow: /*?filter=
Disallow: /*&page=
Allow: /products/
Allow: /categories/

Sitemap: https://example.com/sitemap.xml

Publishing/Blog Configuration

User-agent: Sogou web spider
Crawl-delay: 10
Disallow: /wp-admin/
Disallow: /drafts/
Disallow: /preview/
Disallow: /*?replytocom=
Allow: /

SaaS/Application Configuration

User-agent: Sogou web spider
Disallow: /app/
Disallow: /api/
Disallow: /dashboard/
Disallow: /settings/
Allow: /
Allow: /pricing/
Allow: /features/
Allow: /docs/

Quick Reference

User Agent Match: Sogou web spider
Robots.txt Name: Sogou web spider
Category: search
Respects robots.txt: Yes