Algolia Crawler User Agent - Algolia Bot Details | CL SEO

Algolia Crawler

Operator: Algolia · Since 2016
Category: Other · Respects robots.txt
#search #indexing #algolia #crawler

What is Algolia Crawler?

Algolia Crawler is used by Algolia's hosted search service to index website content for their search-as-a-service platform. Many websites and applications use Algolia to power their internal search functionality, and this crawler helps keep search indices up to date. The bot is typically configured to crawl specific websites that have implemented Algolia search, focusing on content that needs to be searchable. Algolia's crawler can be customized to extract specific data and follow custom crawling rules, making it flexible for various search implementations.
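To illustrate what "custom crawling rules" can look like, here is a rough configuration sketch modeled on Algolia's documented crawler config format. Treat it as illustrative only: the URLs, index name, and CSS selectors are hypothetical placeholders, and the exact fields available depend on your Algolia setup.

// Illustrative crawler configuration object (sketch only). Field names are
// modeled on Algolia's crawler config; URLs, index name, and selectors are
// hypothetical placeholders for this example.
const crawlerConfig = {
  startUrls: ['https://example.com/'],            // assumption: site entry points
  sitemaps: ['https://example.com/sitemap.xml'],
  actions: [
    {
      indexName: 'example_pages',                 // assumption: target search index
      pathsToMatch: ['https://example.com/docs/**'],
      // Build one search record per crawled page ($ is a Cheerio-style DOM helper)
      recordExtractor: ({ url, $ }) => [
        {
          objectID: url.href,
          title: $('h1').first().text(),
          content: $('article').text().trim().slice(0, 5000),
        },
      ],
    },
  ],
};

module.exports = crawlerConfig;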

User Agent String

Algolia Crawler

How to Control Algolia Crawler

Block Completely

To prevent Algolia Crawler from accessing your entire website, add this to your robots.txt file:

# Block Algolia Crawler
User-agent: Algolia Crawler
Disallow: /

Block Specific Directories

To restrict access to certain parts of your site while allowing others:

User-agent: Algolia Crawler
Disallow: /admin/
Disallow: /private/
Disallow: /wp-admin/
Allow: /public/

Set Crawl Delay

To slow down the crawl rate (note: not all bots respect this directive):

User-agent: Algolia Crawler
Crawl-delay: 10
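Because Crawl-delay is not universally honored, you can also enforce a limit at the application level. The sketch below is a minimal in-memory rate limiter for Express; the 60-second window and 30-request cap are arbitrary example values you would tune for your own traffic.

// Minimal per-bot rate limiter (sketch). Assumes Express; the window size
// and request cap are illustrative values, not recommendations.
const express = require('express');
const app = express();

const WINDOW_MS = 60 * 1000;   // measurement window
const MAX_REQUESTS = 30;       // allowed crawler requests per window
let windowStart = Date.now();
let requestCount = 0;

app.use((req, res, next) => {
  const userAgent = req.headers['user-agent'] || '';
  if (!/Algolia Crawler/i.test(userAgent)) {
    return next();             // only throttle the crawler
  }
  const now = Date.now();
  if (now - windowStart > WINDOW_MS) {
    windowStart = now;         // start a new window
    requestCount = 0;
  }
  requestCount++;
  if (requestCount > MAX_REQUESTS) {
    res.set('Retry-After', '60');
    return res.status(429).send('Too Many Requests');
  }
  next();
});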

How to Verify Algolia Crawler

Verification Method:
Configured for specific Algolia customers

Learn more in the official documentation.
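Since the crawler is configured per Algolia customer rather than verified through a public mechanism, one practical check is to pair the user-agent match with an allowlist of the crawler IPs shown in your own Algolia crawler dashboard. The sketch below assumes you maintain that allowlist yourself; the address used is a documentation placeholder, not a real Algolia IP.

// Sketch: verify Algolia Crawler requests against a self-maintained IP allowlist.
// Assumption: ALLOWED_IPS is populated from your own Algolia crawler dashboard;
// 203.0.113.10 is a placeholder from the documentation range, not a real Algolia IP.
const ALLOWED_IPS = new Set(['203.0.113.10']);

function isVerifiedAlgoliaCrawler(req) {
  const userAgent = req.headers['user-agent'] || '';
  if (!/Algolia Crawler/i.test(userAgent)) {
    return false;                       // user agent does not claim to be the bot
  }
  const clientIp = req.ip || req.socket.remoteAddress || '';
  return ALLOWED_IPS.has(clientIp);     // trust the claim only from known IPs
}

module.exports = { isVerifiedAlgoliaCrawler };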

Detection Patterns

Multiple ways to detect Algolia Crawler in your application:

Basic Pattern

/Algolia Crawler/i

Strict Pattern

/^Algolia Crawler$/

Flexible Pattern

/Algolia Crawler[\s\/]?[\d\.]*?/i

Vendor Match

/Algolia/i

Implementation Examples

// PHP Detection for Algolia Crawler
function detect_algolia_crawler() {
    $user_agent = $_SERVER['HTTP_USER_AGENT'] ?? '';
    $pattern = '/Algolia Crawler/i';

    if (preg_match($pattern, $user_agent)) {
        // Log the detection
        error_log('Algolia Crawler detected from IP: ' . $_SERVER['REMOTE_ADDR']);

        // Set cache headers
        header('Cache-Control: public, max-age=3600');
        header('X-Robots-Tag: noarchive');

        // Optional: Serve cached version
        if (file_exists('cache/' . md5($_SERVER['REQUEST_URI']) . '.html')) {
            readfile('cache/' . md5($_SERVER['REQUEST_URI']) . '.html');
            exit;
        }

        return true;
    }

    return false;
}
# Python/Flask Detection for Algolia Crawler
import re
from flask import request, make_response

def detect_algolia_crawler():
    user_agent = request.headers.get('User-Agent', '')
    pattern = r'Algolia Crawler'

    if re.search(pattern, user_agent, re.IGNORECASE):
        # Create response with caching
        response = make_response()
        response.headers['Cache-Control'] = 'public, max-age=3600'
        response.headers['X-Robots-Tag'] = 'noarchive'
        return True

    return False

# Django Middleware
class AlgoliaCrawlerMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        if self.detect_bot(request):
            # Handle bot traffic
            pass
        return self.get_response(request)

    def detect_bot(self, request):
        # Check the User-Agent header for the crawler signature
        user_agent = request.META.get('HTTP_USER_AGENT', '')
        return bool(re.search(r'Algolia Crawler', user_agent, re.IGNORECASE))
// JavaScript/Node.js Detection for Algolia Crawler
const express = require('express');
const app = express();

// Middleware to detect Algolia Crawler
function detectAlgoliaCrawler(req, res, next) {
  const userAgent = req.headers['user-agent'] || '';
  const pattern = /Algolia Crawler/i;

  if (pattern.test(userAgent)) {
    // Log bot detection
    console.log('Algolia Crawler detected from IP:', req.ip);

    // Set cache headers
    res.set({
      'Cache-Control': 'public, max-age=3600',
      'X-Robots-Tag': 'noarchive'
    });

    // Mark request as bot
    req.isBot = true;
    req.botName = 'Algolia Crawler';
  }

  next();
}

app.use(detectAlgoliaCrawler);
# Apache .htaccess rules for Algolia Crawler

# Block completely
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} "Algolia Crawler" [NC]
RewriteRule .* - [F,L]

# Or redirect to a static version
RewriteCond %{HTTP_USER_AGENT} "Algolia Crawler" [NC]
RewriteCond %{REQUEST_URI} !^/static/
RewriteRule ^(.*)$ /static/$1 [L]

# Or set environment variable for PHP
SetEnvIfNoCase User-Agent "Algolia Crawler" is_bot=1

# Add cache headers for this bot
<If "%{HTTP_USER_AGENT} =~ /Algolia Crawler/i">
    Header set Cache-Control "public, max-age=3600"
    Header set X-Robots-Tag "noarchive"
</If>
# Nginx configuration for Algolia Crawler

# Map user agent to variable
map $http_user_agent $is_algolia_crawler {
    default 0;
    "~*Algolia Crawler" 1;
}

server {
    # Block the bot completely
    if ($is_algolia_crawler) {
        return 403;
    }

    # Or serve cached content (root may be set inside "if"; try_files may not)
    location / {
        if ($is_algolia_crawler) {
            root /var/www/cached;
        }
        try_files $uri $uri.html $uri/index.html @backend;
    }

    # Add headers for bot requests
    location @backend {
        if ($is_algolia_crawler) {
            add_header Cache-Control "public, max-age=3600";
            add_header X-Robots-Tag "noarchive";
        }
        proxy_pass http://backend;
    }
}

Should You Block This Bot?

Recommendations based on your website type:

Site Type | Recommendation | Reasoning
E-commerce | Optional | Evaluate based on bandwidth usage vs. benefits
Blog/News | Allow | Increases content reach and discoverability
SaaS Application | Block | No benefit for application interfaces; preserve resources
Documentation | Selective | Allow for public docs, block for internal docs
Corporate Site | Limit | Allow for public pages, block sensitive areas like intranets

Advanced robots.txt Configurations

E-commerce Site Configuration

User-agent: Algolia Crawler
Crawl-delay: 5
Disallow: /cart/
Disallow: /checkout/
Disallow: /my-account/
Disallow: /api/
Disallow: /*?sort=
Disallow: /*?filter=
Disallow: /*&page=
Allow: /products/
Allow: /categories/

Sitemap: https://example.com/sitemap.xml

Publishing/Blog Configuration

User-agent: Algolia Crawler
Crawl-delay: 10
Disallow: /wp-admin/
Disallow: /drafts/
Disallow: /preview/
Disallow: /*?replytocom=
Allow: /

SaaS/Application Configuration

User-agent: Algolia Crawler
Disallow: /app/
Disallow: /api/
Disallow: /dashboard/
Disallow: /settings/
Allow: /
Allow: /pricing/
Allow: /features/
Allow: /docs/

Quick Reference

User Agent Match

Algolia Crawler

Robots.txt Name

Algolia Crawler

Category

Other

Respects robots.txt

Yes