How I Integrated Google NLP with Screaming Frog Using a PHP Proxy

Google’s Natural Language API is seriously underrated in SEO. It gives you insight into how Google might interpret a piece of content beyond just keywords: sentiment, topical category, and key entities. The catch? It’s not built for SEOs. It’s built for developers. Getting access involves service accounts, bearer tokens, and APIs that are not exactly plug-and-play.

Why I built a Google NLP Proxy for Screaming Frog

I already have working Python scripts that use Google’s Natural Language API. They do the job. Drop in a list of URLs, run the script, and out comes sentiment scores, entity extraction, and category classification. It’s accurate and reliable.

But it’s still a process. I have to collect the URLs first, prepare the input, then run the script separately from the crawl. It works fine for me, but it’s not built for the way most teams work.

Most SEOs live in Screaming Frog. It’s familiar, it’s flexible, and it fits how we audit and analyse. The idea of having NLP analysis run directly inside the crawl, without jumping between tools, just made sense. Less friction. More insight in the same workflow.

I was challenged to see if I could take what I’d already built and make it work in Screaming Frog. Not just for me, but in a way that others in the team could use. People who aren’t writing Python or setting up service accounts. People who just want to run a crawl and see what Google might understand about the content.

Why Direct Access to Google NLP Doesn’t Work

To use the Google Cloud Natural Language API, you need to create a service account within the Google Cloud Console. This service account acts as a non-human identity that allows applications to authenticate securely. Once the account is created, you generate a private key file, specifically a JSON key that contains the client_email, private_key, and other required fields used in token-based authentication.

Accessing the NLP API with this key isn’t straightforward. You must first construct a JSON Web Token (JWT), sign it using the service account’s private key, then exchange that signed token for an OAuth 2.0 access token. Only once you have that access token can you make a proper API call to the NLP endpoint.
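To make that concrete, here’s roughly what the flow looks like in PHP. This is a minimal sketch rather than the production script from the forum post: the key path is a placeholder, and the cloud-language scope is my assumption, so check both against your own Google Cloud setup.

```php
<?php
// Minimal sketch: sign a JWT with the service account key, then swap it
// for an OAuth 2.0 access token. Path and scope are placeholders.
$key = json_decode(file_get_contents('/home/youruser/secure/nlp-key.json'), true);

$b64url = fn(string $s) => rtrim(strtr(base64_encode($s), '+/', '-_'), '=');

$now    = time();
$header = $b64url(json_encode(['alg' => 'RS256', 'typ' => 'JWT']));
$claims = $b64url(json_encode([
    'iss'   => $key['client_email'],
    'scope' => 'https://www.googleapis.com/auth/cloud-language',
    'aud'   => $key['token_uri'],
    'iat'   => $now,
    'exp'   => $now + 3600,
]));

// Sign "header.claims" with the private key from the JSON file.
openssl_sign("$header.$claims", $sig, $key['private_key'], OPENSSL_ALGO_SHA256);
$jwt = "$header.$claims." . $b64url($sig);

// Exchange the signed JWT for a short-lived access token.
$ch = curl_init($key['token_uri']);
curl_setopt_array($ch, [
    CURLOPT_POST           => true,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_POSTFIELDS     => http_build_query([
        'grant_type' => 'urn:ietf:params:oauth:grant-type:jwt-bearer',
        'assertion'  => $jwt,
    ]),
]);
$accessToken = json_decode(curl_exec($ch), true)['access_token'] ?? null;
curl_close($ch);
```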

That’s a lot of overhead just to send some text and receive analysis in return.

My first attempt to simplify this was using Google Apps Script. I wrote a script that would take text input, handle the authentication steps, and call the API on my behalf. The plan was to publish it as a web app and then connect to it from Screaming Frog using a custom JavaScript snippet. While the idea made sense on paper, the implementation failed repeatedly. There were limitations around service account scopes, token exchanges, and how Apps Script handles authentication with external APIs. It wasn’t stable enough to rely on.

Next, I explored using Pipedream, a low-code workflow platform that could act as a middleware relay between Screaming Frog and the NLP API. That did work technically, but it came with an unavoidable problem: cost. Since Google NLP already charges per character analysed, layering a paid relay service on top of it would push costs up quickly, especially when crawling large volumes of content.

What I needed was something stable, free to run, and entirely under my control. That’s when I turned to a more traditional solution, a secure PHP proxy running on my own hosting.

Building a PHP Proxy for Google NLP

With Apps Script and Pipedream ruled out, I went back to basics. I knew I could run PHP on my shared cPanel hosting, and I knew it gave me just enough flexibility to manually build the authentication flow required by Google’s NLP API.

The proxy script accepts a POST request containing two values: a shared secret and the text to be analysed. It reads the Google service account JSON key from a secure, non-public directory on the server and uses it to create a signed JWT. That JWT is then exchanged for an OAuth 2.0 access token using Google’s token_uri. Once authenticated, the script makes direct API calls to Google’s Natural Language endpoints for entity analysis, sentiment scoring, and content classification.
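Each of those endpoint calls is just an authenticated POST. As a sketch, here’s what the entity call might look like, assuming $accessToken came out of the token exchange and $text holds the content Screaming Frog posted. The :analyzeSentiment endpoint takes the same payload, and :classifyText needs only the document object.

```php
<?php
// Sketch of one of the three NLP calls. $accessToken and $text are
// assumed to exist from the earlier steps in proxy.php.
$payload = json_encode([
    'document'     => ['type' => 'PLAIN_TEXT', 'content' => $text],
    'encodingType' => 'UTF8',
]);

$ch = curl_init('https://language.googleapis.com/v1/documents:analyzeEntities');
curl_setopt_array($ch, [
    CURLOPT_POST           => true,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_HTTPHEADER     => [
        'Content-Type: application/json',
        'Authorization: Bearer ' . $accessToken,
    ],
    CURLOPT_POSTFIELDS     => $payload,
]);
$entities = json_decode(curl_exec($ch), true);
curl_close($ch);
```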

To keep things secure, the proxy is locked down in two ways. First, it requires a hardcoded secret key that must be sent with every request. If the secret is missing or incorrect, the request is blocked. Second, the script is deployed in a subdirectory and protected from direct exposure; only POST requests with the correct payload are processed.
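The secret check itself only takes a few lines at the top of proxy.php. A sketch, with illustrative field names; hash_equals() keeps the comparison constant-time so response timing can’t leak the secret:

```php
<?php
// Illustrative lockdown at the top of proxy.php. Replace the secret with
// something long and random, and never commit it anywhere public.
const SHARED_SECRET = 'change-me-to-a-long-random-string';

if ($_SERVER['REQUEST_METHOD'] !== 'POST') {
    http_response_code(405); // only POST requests are processed
    exit;
}

$secret = $_POST['secret'] ?? '';
$text   = trim($_POST['text'] ?? '');

// hash_equals() compares in constant time, so the response timing
// doesn't reveal how much of the secret matched.
if ($text === '' || !hash_equals(SHARED_SECRET, $secret)) {
    http_response_code(403);
    exit;
}
```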

Because everything runs server-side, no API keys or credentials are exposed in the browser or in Screaming Frog’s JavaScript console. It behaves just like any other private endpoint, but it runs entirely on infrastructure I already manage.

This approach gave me exactly what I wanted: a secure, cost-free, and reliable way to make Google NLP available directly from Screaming Frog, without handing over control to a third-party tool or writing throwaway workarounds.

How I built it and how you can too

This was all built on standard cPanel shared hosting: no SSH, no Composer, no background workers, just raw PHP in a public subdirectory and a secure folder outside the web root to hold the private key. In other words, no devs needed.
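On a typical cPanel account that layout might look something like this; the names are illustrative, and the only rule that matters is that the key file sits above public_html:

```
public_html/
└── nlp/
    └── proxy.php        # the public endpoint Screaming Frog talks to
secure/
└── nlp-key.json         # service account key, outside the web root
```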

It’s not a fancy setup, but that’s the point. You don’t need cloud functions, containers or virtual environments to make this work.

I’ve been using cPanel for nearly twenty years, and despite its quirks, it’s more than capable of handling this kind of integration. You can easily replicate the same process on Plesk, DirectAdmin, or any other hosting control panel that allows you to upload files, run PHP, and manage permissions. If you’ve got access to a file manager and a way to store a non-public JSON key securely, that’s enough.

The entire flow runs off a single proxy.php file. Once it receives the POST request, it handles authentication, communicates with Google’s NLP API, and returns a JSON response with the entities, sentiment score, and category. The only moving part on the server is that JSON key, which you should keep well away from your web root to avoid any risk of exposure.
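What that final response contains is up to you. A simplified sketch of the last step, assuming $sentiment, $entities and $categories hold the decoded responses from the three endpoint calls; the output field names are illustrative:

```php
<?php
// Final step of proxy.php: merge the three Google responses into one
// JSON object for Screaming Frog to pick up.
header('Content-Type: application/json');
echo json_encode([
    'sentiment' => $sentiment['documentSentiment'] ?? null,          // score + magnitude
    'entities'  => array_slice($entities['entities'] ?? [], 0, 10),  // top 10 by salience
    'category'  => $categories['categories'][0]['name'] ?? 'uncategorised',
]);
```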

I didn’t need to modify server configs, install extra libraries, or make use of anything my hosting provider didn’t already support. This is lightweight, repeatable, and easy to maintain.

If you’re comfortable with basic PHP and know your way around your hosting panel, you can have something similar up and running in less than an hour. All it takes is access to a Google Cloud project, a service account with NLP permissions, and a safe place to store your key.

Full Replication Steps Shared

If you want to replicate this setup for yourself or your team, I’ve put together a full walkthrough on my forum. The post covers everything you need to get it running, including the proxy.php file and the Screaming Frog custom JavaScript snippet. You’ll likely want to adapt that snippet to your own needs and requirements; it’s just a basis to get you up and running.

It explains how to securely obtain your Google service account key, store it away from public access, and connect it to a custom-built PHP relay script. I also walk through how to configure Screaming Frog to send each page’s content to that script and retrieve the NLP response directly into your crawl output.

You’ll find all the replication instructions, plus copy-paste ready code, here:
https://chrisleverseo.com/forum/t/google-nlp-api-php-proxy-and-screaming-frog-custom-javascript.142/

Whether you’re technical or just want a plug-and-play way to bring Google’s NLP analysis into Screaming Frog, it’s all in that post.

Watch Your Crawl Speed

Once the integration is live, it’s tempting to run Screaming Frog across your whole site at full speed. But keep in mind that every single page triggers a POST request to your proxy, which in turn fires three API calls to Google’s NLP service.

If you crawl too aggressively, two things will happen. First, you’ll quickly run into rate limits or quota issues on the Google side. Second, your hosting server might start to time out or throttle PHP processes if it’s shared or under load.

I strongly recommend slowing the crawl rate down. Head into Screaming Frog’s configuration and reduce the number of threads and crawl speed to something reasonable, especially if you’re working on shared hosting or using Google’s free tier. This will keep things stable and help avoid sporadic errors like 429s or 5xx responses from your proxy.
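If the occasional 429 still slips through, a small retry with backoff inside the proxy can smooth it over. A rough sketch, where callNlp() is a hypothetical stand-in for whatever function performs the actual cURL request and returns the HTTP status alongside the body:

```php
<?php
// Hypothetical retry wrapper. callNlp() is a stand-in for the function
// that performs the actual cURL request and returns [status, body].
function callWithRetry(string $endpoint, string $payload, int $attempts = 3): array
{
    $status = 0;
    $body   = '';
    for ($i = 0; $i < $attempts; $i++) {
        [$status, $body] = callNlp($endpoint, $payload);
        if ($status !== 429 && $status < 500) {
            break; // success, or an error that retrying won't fix
        }
        sleep(2 ** $i); // back off: 1s, 2s, 4s...
    }
    return [$status, $body];
}
```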

If you want to analyse large websites, do it in batches. Crawl selected folders or use a custom extraction crawl setup. It’s more manageable, and it’s how you’ll keep the NLP responses clean and consistent.

Run It, Pair It, Expand It

Once your Screaming Frog setup is talking to the Google NLP API, you can start doing more than just sentiment scoring or entity recognition. Pair it with the embeddings model, cluster intent types, or use the NLP response as input for supporting internal linking logic. You can even pipe it into custom dashboards or use it to highlight which pages need editorial rework based on tone, named entity density, or thematic categorisation.

This experience has really opened my eyes to what else could be proxied and made accessible through Screaming Frog’s JavaScript snippets. It’s not just about NLP; you could apply this method to any API that needs authentication or sensitive keys. Think translation APIs, classification tools, custom content classifiers, even OpenAI functions if you’re careful.

The flexibility of combining trusted APIs with Screaming Frog’s crawl logic is seriously powerful, especially when you’re working in mixed teams where not everyone writes code. I’ve now got a framework in place that I can reuse, adapt, and expand.
