Below is a PHP script that scrapes all the URLs in a single standard XML sitemap file and then stores all the sitemap data in a MySQL database table.
Assuming that you have already created the database. Run the following SQL statement. This will create a table for you to store the data scraped from the XML Sitemap.
For the next part create a PHP, name the PHP file xml-scraper.php, and past in the PHP script below. Ensure that the database access details match the details of your newly created database.
A word of caution, scraping and storing large XML files can take a number of minutes. You may need to increase the PHP execution time on the server.
Let me know how you get on. Feel free to comment below.
Assuming that you have already created the database. Run the following SQL statement. This will create a table for you to store the data scraped from the XML Sitemap.
Code:
CREATE TABLE xmlscraper(
`id` bigint unsigned NOT NULL AUTO_INCREMENT,
`page_id` bigint unsigned NOT NULL,
`url` text NOT NULL,
`lastmod` text NOT NULL,
`changefreq` text NOT NULL,
PRIMARY KEY (`id`),
KEY `page_id` (`page_id`)
) DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci;
For the next part create a PHP, name the PHP file xml-scraper.php, and past in the PHP script below. Ensure that the database access details match the details of your newly created database.
PHP:
<?php
// get sitemap url
$sitemap = 'https://exampleshop.com/sitemap.xml';
// get sitemap contents
$content = file_get_contents($sitemap);
// parse the sitemap contents
$xml = simplexml_load_string($content);
// retrieve elements from sitemap contents
foreach ($xml->url as $urlElement) {
// get elements
$url = $urlElement->loc;
$lastmod = $urlElement->lastmod;
$changefreq = $urlElement->changefreq;
$priority = $urlElement->priority;
$page_id = ++$char;
// echo elements
echo '<strong>URL</strong>: '. $url . '<br>';
echo '<strong>Lastmod</strong>: '. $lastmod . '<br>';
echo '<strong>Changefreq</strong>: '. $changefreq . '<br>';
echo '<strong>Priority</strong>: '. $priority . '<br>';
echo '<strong>Page ID</strong>: '. $page_id . '<br>';
echo '<br>---<br>'; // separator
// connect to database
$servername = "localhost";
$username = "database_user";
$password = "passwordxyz";
$dbname = "database_name";
// Create connection
$conn = new mysqli($servername, $username, $password, $dbname);
// Check connection
if ($conn->connect_error) {
die("Connection failed: " . $conn->connect_error);
}
// start sql statement
$sql = "INSERT INTO xmlscraper(page_id, url, lastmod, changefreq )
VALUES ('$page_id', '$url', '$lastmod', '$changefreq')";
if ($conn->query($sql) === TRUE) {
echo "New record created successfully";
} else {
echo "Error: " . $sql . "<br>" . $conn->error;
}
$conn->close();
}
?>
A word of caution, scraping and storing large XML files can take a number of minutes. You may need to increase the PHP execution time on the server.
Let me know how you get on. Feel free to comment below.
Last edited: