What Does the Script Do?
This script filters out specific status codes from the bulk all_inlinks CSV file exported from Screaming Frog. Whether you're looking for 404s, 301s, or any other codes, this script makes it a breeze by deleting the unwanted status codes.
It's convenient for large datasets where manual sorting becomes impractical. For example, when Excel says, "This data set is too large for the Excel grid. If you save this workbook, you'll lose data that wasn't loaded."
It's simple! You can just run the script on the exported CSV file from Screaming Frog's bulk inlinks export tool, and it filters out the rows with the status codes you've specified. Plus, it counts how many rows were removed for each status code, helping you gain a clearer picture of your site's internal link profile.
Here is the basic version for filtering out a single status code. In this example, I'm just filtering out any rows that match the status code of "200".
This script filters out specific status codes from the bulk all_inlinks CSV file exported from Screaming Frog. Whether you're looking for 404s, 301s, or any other codes, this script makes it a breeze by deleting the unwanted status codes.
It's convenient for large datasets where manual sorting becomes impractical. For example, when Excel says, "This data set is too large for the Excel grid. If you save this workbook, you'll lose data that wasn't loaded."
It's simple! You can just run the script on the exported CSV file from Screaming Frog's bulk inlinks export tool, and it filters out the rows with the status codes you've specified. Plus, it counts how many rows were removed for each status code, helping you gain a clearer picture of your site's internal link profile.
Code:
import pandas as pd
# Load the CSV file with low_memory set to False
# Screaming Frog > Bulk Exports > Links > All Inlinks
df = pd.read_csv('all_inlinks.csv', low_memory=False)
# Filter out rows where the "Status Code" column (Column G) is 200 or 404
# update with your desired status codes
filtered_df = df[(df.iloc[:, 6] != 200) & (df.iloc[:, 6] != 404)]
# Count the number of rows with Status Code 200 and 201
# update with your own values
count_200 = len(df[df.iloc[:, 6] == 200])
count_404 = len(df[df.iloc[:, 6] == 404])
# Save the filtered data to a new CSV file
filtered_df.to_csv('filtered_inlinks.csv', index=False)
# Print out the number of rows removed for each status code
# Remember to update the below, if you update any values above
print(f"Filtered CSV file has been saved as 'filtered_inlinks.csv'.")
print(f"Number of rows removed with Status Code 200: {count_200}")
print(f"Number of rows removed with Status Code 404: {count_404}")
Here is the basic version for filtering out a single status code. In this example, I'm just filtering out any rows that match the status code of "200".
Code:
import pandas as pd
# Load the CSV file with low_memory set to False
df = pd.read_csv('all_inlinks.csv', low_memory=False)
# Filter out rows where the "Status Code" column (Column G) is 200
filtered_df = df[df.iloc[:, 6] != 200]
# Save the filtered data to a new CSV file
filtered_df.to_csv('filtered_inlinks.csv', index=False)
print("Filtered CSV file has been saved as 'filtered_inlinks.csv'.")
Last edited: