Cookie Consent by Free Privacy Policy Generator

Scrape a XML Sitemap and store the data in a MySQL Database

Below is a PHP script that scrapes all the URLs in a single standard XML sitemap file and then stores all the sitemap data in a MySQL database table.

Assuming that you have already created the database. Run the following SQL statement. This will create a table for you to store the data scraped from the XML Sitemap.

Code:
CREATE TABLE xmlscraper(
`id` bigint unsigned NOT NULL AUTO_INCREMENT,
`page_id` bigint unsigned NOT NULL,
`url` text NOT NULL,
`lastmod` text NOT NULL,
`changefreq` text NOT NULL,
PRIMARY KEY  (`id`),
KEY `page_id` (`page_id`)
) DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci;

For the next part create a PHP, name the PHP file xml-scraper.php, and past in the PHP script below. Ensure that the database access details match the details of your newly created database.

PHP:
<?php
// get sitemap url
$sitemap = 'https://exampleshop.com/sitemap.xml';

// get sitemap contents
$content = file_get_contents($sitemap);

// parse the sitemap contents
$xml = simplexml_load_string($content);

// retrieve elements from sitemap contents
foreach ($xml->url as $urlElement) {
 
    // get elements
    $url = $urlElement->loc;
    $lastmod = $urlElement->lastmod;
    $changefreq = $urlElement->changefreq;
    $priority = $urlElement->priority;
    $page_id = ++$char;

    // echo elements
    echo '<strong>URL</strong>: '. $url . '<br>';
    echo '<strong>Lastmod</strong>: '. $lastmod . '<br>';
    echo '<strong>Changefreq</strong>: '. $changefreq . '<br>';
    echo '<strong>Priority</strong>: '. $priority . '<br>';
    echo '<strong>Page ID</strong>: '.   $page_id . '<br>';
    echo '<br>---<br>'; // separator

    //  connect to database
    $servername = "localhost";
    $username = "database_user";
    $password = "passwordxyz";
    $dbname = "database_name";

// Create connection
$conn = new mysqli($servername, $username, $password, $dbname);

// Check connection
if ($conn->connect_error) {
  die("Connection failed: " . $conn->connect_error);
}

// start sql statement
$sql = "INSERT INTO xmlscraper(page_id, url, lastmod, changefreq )
VALUES ('$page_id', '$url', '$lastmod', '$changefreq')";

if ($conn->query($sql) === TRUE) {
 echo "New record created successfully";
} else {
  echo "Error: " . $sql . "<br>" . $conn->error;
}

$conn->close();
 
}
?>

A word of caution, scraping and storing large XML files can take a number of minutes. You may need to increase the PHP execution time on the server.

Let me know how you get on. Feel free to comment below.
 
Last edited:
Back
Top