October 11, 2025


Sitemaps often get treated as an afterthought. One of those jobs that gets done at the start of a project, ticked off, and then ignored. But when a site starts growing — whether that’s new products, new regions, or new content types — the way your sitemaps are structured starts to matter.
That’s where sitemap index files come into their own. Quietly doing their job in the background, making sure search engines know what’s there, how it’s grouped, and where to start crawling.
Before XML sitemaps existed, websites used to have HTML sitemaps. Literal pages of links. More for people than bots. That worked fine back when sites were small and search engines were just starting to figure things out.
Then in 2005, Google introduced the XML Sitemap Protocol. It gave webmasters a structured way to list all their URLs and tell crawlers when things were last updated. Yahoo and Microsoft got involved soon after, and it became the standard.
That worked well for a while. But as websites got bigger, sitemap files started hitting the protocol's limits: 50,000 URLs per file, and originally just 10MB uncompressed (later raised to 50MB). That's when the sitemap index format was introduced, a file that lists multiple sitemaps so everything stays scalable.
It made a lot of sense. Instead of stuffing everything into one file, you split it up and keep it organised. That hasn’t changed.
A sitemap index file is just a list of other sitemap files. It doesn’t contain page URLs, just links to the sitemaps that do. Think of it like a contents page for the search engines.
When Googlebot hits that file, it goes off and fetches each sitemap listed. It’s cleaner, easier to manage, and lets you scale things up properly without having to constantly rebuild one giant sitemap.
Technically, a sitemap can hold up to 50,000 URLs or 50MB uncompressed. But in reality, that’s a bit much.
Personally, I like to keep each sitemap under 10,000 URLs. Smaller files are easier to audit, easier to generate, and less likely to hit parser issues. If something goes wrong, you don’t want to be trawling through tens of thousands of lines to figure it out.
Breaking it down by content type, region, or update frequency gives you more control and better visibility when something starts to slip.
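If you're generating your own sitemaps, that split is easy to automate. Here's a minimal sketch in Python: it writes one sitemap file per 10,000 URLs, then an index pointing at them. The base URL, file names, and the urls list are placeholder assumptions, not anything a particular platform produces.

from datetime import date
from xml.sax.saxutils import escape

BASE = "https://www.example.com"  # hypothetical site root
CHUNK = 10_000                    # comfortable ceiling, well under the 50,000-URL limit

def write_sitemaps(urls):
    # Write one sitemap file per chunk of URLs, collecting an index entry for each.
    today = date.today().isoformat()
    entries = []
    for i in range(0, len(urls), CHUNK):
        name = f"sitemap-{i // CHUNK + 1}.xml"
        with open(name, "w", encoding="utf-8") as f:
            f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
            f.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
            for url in urls[i:i + CHUNK]:
                f.write(f"  <url><loc>{escape(url)}</loc></url>\n")
            f.write("</urlset>\n")
        entries.append(f"  <sitemap><loc>{BASE}/{name}</loc><lastmod>{today}</lastmod></sitemap>\n")
    # Write the index file that points at every sitemap just generated.
    with open("sitemap_index.xml", "w", encoding="utf-8") as f:
        f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        f.write('<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
        f.writelines(entries)
        f.write("</sitemapindex>\n")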
Big sites need structure. A flat sitemap with every URL lumped together doesn’t tell you much. But split sitemaps do.
You can have one for product pages, one for blog content, one for international versions, and so on. If Google suddenly stops indexing one section, you’ll know exactly where the issue is. No guesswork. No digging through the weeds.
And when you submit those sitemaps through Google Search Console, you’ll get a clear view of what’s been discovered, what’s been indexed, and what’s potentially broken.
Most platforms and plugins generate sitemap index files by default. If you’re using something like WordPress with RankMath or Yoast, you’ll probably already have one at /sitemap_index.xml.
Inside that file, you'll find entries along these lines (the exact file names depend on the plugin):
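<sitemap>
  <loc>https://www.example.com/post-sitemap.xml</loc>
  <lastmod>2024-03-24</lastmod>
</sitemap>
<sitemap>
  <loc>https://www.example.com/page-sitemap.xml</loc>
  <lastmod>2024-03-24</lastmod>
</sitemap>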
If you’re building your own, or want a bit more control, here’s the basic format:
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.example.com/sitemap-products.xml</loc>
    <lastmod>2024-03-24</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/sitemap-blog.xml</loc>
    <lastmod>2024-03-24</lastmod>
  </sitemap>
</sitemapindex>
It’s simple. Just make sure all the sitemap files listed are accessible and actually return a 200 status. Submit the index to Google and you’re good.
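If you want to sanity-check those status codes yourself before submitting, a few lines of Python using only the standard library will do it. This is a rough sketch, and the index URL is a placeholder:

import urllib.error
import urllib.request
import xml.etree.ElementTree as ET

INDEX_URL = "https://www.example.com/sitemap_index.xml"  # placeholder
NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

# Fetch the index and parse out every <loc> entry.
with urllib.request.urlopen(INDEX_URL) as resp:
    root = ET.fromstring(resp.read())

# Request each listed sitemap and print its status; anything not 200 needs a look.
for loc in root.iter(NS + "loc"):
    url = loc.text.strip()
    try:
        status = urllib.request.urlopen(url).status
    except urllib.error.HTTPError as err:
        status = err.code
    print(f"{status}  {url}")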
There are best practices, and then there’s what actually works when you’re managing large or awkward sites. Here’s what I tend to follow:
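- Split sitemaps by content type, region, or update frequency, so each file maps to one section of the site.
- Keep each file to around 10,000 URLs, well under the 50,000-URL and 50MB limits.
- Make sure every sitemap listed in the index is accessible and returns a 200.
- Keep lastmod dates honest; only update them when the page actually changes.
- Submit the index through Search Console and keep an eye on the per-sitemap stats.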
And don’t forget to link the index file in your robots.txt. It still helps bots find it faster, especially if it hasn’t been submitted directly to Google yet.
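It's a single line, usually placed at the end of the file:

User-agent: *
Allow: /

Sitemap: https://www.example.com/sitemap_index.xml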
No one’s going to give you a pat on the back for setting up a good sitemap structure. But when your indexing is solid, crawl budget is being used efficiently, and problems are easier to diagnose, you’ll feel the difference.
Search engines don’t need perfect sitemaps to crawl your site. But when they’re well structured, they make the whole process smoother.
If you’ve not looked at your sitemap index setup for a while, or you’re still lumping everything into one flat file, it’s probably time to rethink it. It’s not exciting work, but it’s the kind of thing that keeps everything else running smoothly behind the scenes.