Optimising XML Sitemaps: Getting It Right Without Sending the Wrong Signals

XML sitemaps should be one of the easiest wins in technical SEO, but too often I see them working against a site rather than helping it. They are meant to help search engines discover and prioritise content, but a poorly managed sitemap can waste crawl budget, reduce trust, and send all the wrong signals.

Over the years, I have seen countless sitemaps bloated with irrelevant URLs, lastmod dates that update daily for no reason, and Search Console failing to process files that are technically valid but practically useless. Optimising sitemaps properly is about more than just making sure they exist. It is about making sure they help search engines do their job efficiently.

Lastmod Dates Should Be Accurate or Not Used at All

One of the first things I check in an XML sitemap is whether the lastmod dates actually mean anything. Too many sites let their CMS auto-generate lastmod values so every single URL gets refreshed daily, even when nothing has changed. I have seen this happen across e-commerce, news sites, and blogs, and in every case, it reduces trust in the sitemap.

Google treats lastmod as a signal, not a guarantee. If every page in a sitemap is marked as updated every day, but Google finds that nothing has actually changed, it stops believing the timestamps. The result is that even when a real update happens, Google might not act on it quickly.

The right way to handle lastmod is to update it only when meaningful content changes. A price update, a rewritten product description, or a major layout change is a legitimate reason. Fixing a typo or adjusting internal links is not. If there is no way to ensure lastmod is accurate, it is better to leave it out of the sitemap altogether.
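
As a rough sketch of that logic, the generator below only writes a lastmod value when the CMS can supply a date for the last meaningful content change. Here content_updated_at is an assumed field, not something every platform exposes; when it is missing, the tag is simply left out.

    # Minimal sketch: only emit <lastmod> when a genuine content-change date exists.
    # `content_updated_at` is an assumed CMS field (a datetime.date or None), not a
    # value every platform provides out of the box.
    from xml.sax.saxutils import escape

    def url_entry(loc, content_updated_at=None):
        lines = ["  <url>", f"    <loc>{escape(loc)}</loc>"]
        if content_updated_at is not None:
            lines.append(f"    <lastmod>{content_updated_at.isoformat()}</lastmod>")
        lines.append("  </url>")
        return "\n".join(lines)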

Keep Sitemaps Lean and Focused on Indexable Pages

An XML sitemap is not a dumping ground for every URL on a website. It should only contain pages that are intended for indexing. I have audited too many sites where the sitemap includes:

  • Canonicalised URLs pointing elsewhere
  • Noindex pages that Google is being told not to index
  • Redirected URLs that do not belong in a crawl queue
  • Expired pages that return 404 or 410

This kind of clutter does not just waste Google’s time. It makes the sitemap unreliable. When Google finds enough URLs that should not be there, it starts ignoring the sitemap altogether. I always make sure a sitemap only contains URLs that are meant to be crawled, indexed, and ranked.
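
The same rules are easy to express as a pre-filter when generating the sitemap. This is only a sketch, and it assumes `pages` is an iterable of objects exposing status_code, is_noindex, canonical, and url attributes from whatever crawl or CMS export is available.

    def sitemap_urls(pages):
        """Yield only URLs that are meant to be crawled, indexed and ranked."""
        for page in pages:
            if page.status_code != 200:                 # drop 404s, 410s and redirects
                continue
            if page.is_noindex:                         # drop pages excluded via noindex
                continue
            if page.canonical and page.canonical != page.url:
                continue                                # drop canonicalised duplicates
            yield page.url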

Breaking Large Sitemaps Into Smaller Files Improves Reliability

Google allows a single XML sitemap to contain up to 50,000 URLs and up to 50MB of uncompressed data. While that is generous in theory, in practice large sitemaps can be difficult for Google to process. I have seen Search Console struggle with big sitemaps, sometimes failing to detect all URLs, even when they are well within Google’s limits.

To avoid this, I break sitemaps into smaller, more manageable files. I usually aim for around 10,000 URLs per sitemap, which makes it easier to track indexation issues and keeps things organised. A segmented sitemap structure also means that if Google has trouble processing one file, it does not impact the entire site.
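
As a rough illustration of that chunking, assuming you already have a clean list of indexable URLs:

    # Split a URL list into sitemap files of roughly 10,000 URLs each.
    from xml.sax.saxutils import escape

    CHUNK_SIZE = 10_000

    def write_sitemap_files(urls, prefix="sitemap"):
        filenames = []
        for i in range(0, len(urls), CHUNK_SIZE):
            name = f"{prefix}-{i // CHUNK_SIZE + 1}.xml"
            with open(name, "w", encoding="utf-8") as f:
                f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
                f.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
                for url in urls[i:i + CHUNK_SIZE]:
                    f.write(f"  <url><loc>{escape(url)}</loc></url>\n")
                f.write("</urlset>\n")
            filenames.append(name)
        return filenames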

Using Sitemap Index Files to Improve Organisation

For sites with multiple sitemaps, I always use a sitemap index file. Instead of submitting each sitemap separately, a single index file tells Google where to find them all. This keeps things structured and ensures Google processes them efficiently. It also means you do not need to declare every individual sitemap in the robots.txt file, only the index.

I segment sitemaps based on content type so product pages, categories, blog posts, and location pages all have their own dedicated sitemap. This makes it easier to track what is getting indexed and spot patterns in Search Console. If one segment has a lower indexation rate, it is easier to diagnose the issue without affecting everything else.
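
To make that concrete, here is a sketch that writes a sitemap index referencing segmented sitemaps. The segment names and file naming convention are placeholders for illustration, not a fixed standard.

    # Write a sitemap index pointing at one sitemap per content type.
    SEGMENTS = ["products", "categories", "blog", "locations"]

    def write_sitemap_index(base_url, segments=SEGMENTS, out="sitemap_index.xml"):
        with open(out, "w", encoding="utf-8") as f:
            f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
            f.write('<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
            for segment in segments:
                f.write(f"  <sitemap><loc>{base_url}/sitemap-{segment}.xml</loc></sitemap>\n")
            f.write("</sitemapindex>\n")

Only the index file then needs to be submitted in Search Console; Google discovers the individual segment sitemaps from it.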

Compressing Large Sitemaps for Faster Processing

Google supports compressed XML sitemaps in .gz format. While this is often overlooked, I have found that gzipping large sitemaps improves reliability. Uncompressed sitemaps can sometimes fail to load properly in Search Console, especially when they are close to the size limit, and compression reduces bandwidth usage and speeds up processing. Bear in mind that the 50,000 URL and 50MB limits still apply to the uncompressed file.

I have seen cases where an uncompressed sitemap was being partially processed or not detected at all, but once compressed, it was handled without issues. It is a small but effective optimisation.
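
It is also straightforward to automate. A minimal sketch using Python's standard library, assuming the sitemap already exists on disk:

    # Gzip an existing sitemap; the resulting .gz file is what gets referenced
    # in the sitemap index or submitted to Search Console.
    import gzip
    import shutil

    def gzip_sitemap(path):
        gz_path = path + ".gz"
        with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        return gz_path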

Handling XML Sitemaps During a Website Migration

Website migrations can be a nightmare for SEO if not handled correctly. One of the most overlooked aspects is how XML sitemaps should be managed before, during, and after the migration.

If a site is moving to a new domain, changing its URL structure, or switching platforms, I always create a temporary migration sitemap that includes the old URLs mapped to their new equivalents. This helps search engines process redirects faster and ensures they do not waste time crawling outdated pages.

For major migrations, I follow a structured approach:

  • Before the migration: Generate a full sitemap of existing URLs and store a backup
  • During the migration: Submit a temporary XML sitemap containing the old URLs so Google recrawls them quickly and follows the 301 redirects to their new locations
  • After the migration: Replace the old sitemap with the new one, making sure only the latest structure is reflected

I also monitor Search Console closely post-migration to check for indexing drops, redirect chains, and crawl errors. A poorly handled sitemap update can slow down recovery and cause unnecessary ranking fluctuations. Around two months after the migration, once the redirects have been crawled and processed, I take down the temporary sitemap containing the old URLs.
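
One check worth automating is confirming that every old URL actually 301s to its mapped destination. A rough sketch, assuming the old-to-new mapping is held as a dictionary and that the Location header carries absolute URLs:

    # Flag any old URL that does not 301 to its expected new location.
    import requests

    def check_redirects(url_map):
        problems = []
        for old, new in url_map.items():
            resp = requests.head(old, allow_redirects=False, timeout=10)
            location = resp.headers.get("Location")
            if resp.status_code != 301 or location != new:
                problems.append((old, resp.status_code, location))
        return problems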

Tracking Indexation and Adjusting Based on Search Console Data

Submitting a sitemap is not enough. I regularly check Google Search Console’s reports to see what is being indexed and what is being ignored. There are a few common patterns I watch for:

  • Pages marked as discovered but not indexed, which usually means Google does not see value in crawling them
  • Discrepancies between submitted and indexed URLs, which could indicate sitemap bloat or weak content signals
  • Errors like 404s or redirects, which should never be in a sitemap

If a sitemap is not leading to improved indexation, I adjust it. That could mean tightening up which URLs are included, restructuring how sitemaps are segmented, or addressing underlying content issues. More often than not, the content is the real problem.
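
For the simpler failures, such as 404s and redirects sitting in a sitemap, a quick automated pass is usually enough to find them. This sketch assumes a single urlset sitemap reachable over HTTP and flags anything that does not return a plain 200:

    import requests
    import xml.etree.ElementTree as ET

    NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

    def audit_sitemap(sitemap_url):
        root = ET.fromstring(requests.get(sitemap_url, timeout=10).content)
        for loc in root.findall(".//sm:loc", NS):
            resp = requests.head(loc.text, allow_redirects=False, timeout=10)
            if resp.status_code != 200:
                print(loc.text, resp.status_code)   # redirects and errors do not belong here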

A Well-Optimised Sitemap Strengthens a Site’s SEO Foundation

A properly managed XML sitemap is not just about getting pages crawled. It reinforces trust in a site’s structure, ensures Google is prioritising the right content, and eliminates unnecessary noise from the indexation process.

I recommend making sure that:

  • Lastmod dates are accurate or removed entirely
  • Only indexable URLs are included
  • Large sitemaps are broken down into smaller files for better processing
  • Sitemap index files are used to keep everything organised
  • Large sitemaps are .gz compressed to improve reliability
  • Search Console data is used to inform ongoing optimisation

XML sitemaps are one of the easiest ways to communicate a site’s structure to Google, but they need to be done right. When optimised properly, they do not just help with indexing. They improve efficiency, increase trust, and ensure the most valuable content gets the attention it deserves.

If in doubt, refer to Google’s best practices documentation: https://developers.google.com/search/docs/crawling-indexing/sitemaps/build-sitemap
