Google NLP API, PHP Proxy and Screaming Frog Custom JavaScript

1. How to Obtain and Secure Your Google NLP API Key
To connect securely to the Google Cloud Natural Language API, first make sure the API is enabled for your project in the Google Cloud Console, then create a service account there. Once created, assign it the Cloud Natural Language API role. From there, generate a new JSON key file. This file contains your client_email, private_key, token_uri, and other fields required for authenticated server-to-server access.

Once downloaded, do not upload this file directly into your public web directory. Instead, use your hosting provider’s File Manager or an FTP client to place the .json file into a directory outside of public_html (or www) so it’s not accessible via a browser. For example, on cPanel hosting, you can upload it to a folder like /home/youraccount/secure/.

After uploading, update your proxy.php script so the $keyFile variable points to the full path of this secure location. This ensures the file remains protected while still allowing PHP to access it during runtime.

This small step is critical. Leaving the key anywhere publicly accessible, even by accident, lets anyone who finds it call the API as your service account, which eats into your API quota and risks unauthorised usage.
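
As a quick sanity check on the "outside of public_html" rule, here is a minimal sketch in JavaScript (Node.js) that tests whether a key path falls outside a web root. The helper name isOutsideWebRoot and the key.json file name are made up for illustration, and the paths simply follow the cPanel example above. It only compares path strings and does not touch your server.

```javascript
const path = require('node:path');

// Hypothetical helper: true when keyPath resolves to a location outside webRoot.
function isOutsideWebRoot(keyPath, webRoot) {
  const rel = path.posix.relative(
    path.posix.resolve(webRoot),
    path.posix.resolve(keyPath)
  );
  // A relative path that climbs out of webRoot starts with a '..' segment.
  return rel === '..' || rel.startsWith('../');
}

// Example paths modelled on the cPanel layout (assumed, not taken from a real account).
const webRoot = '/home/youraccount/public_html';
console.log(isOutsideWebRoot('/home/youraccount/secure/key.json', webRoot));
// true
console.log(isOutsideWebRoot('/home/youraccount/public_html/nlp-api/key.json', webRoot));
// false
```

If the second form applies to your key file, move it before going any further.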

2. proxy.php – A Secure Relay to Google NLP

This is the full PHP script that acts as a secure relay between Screaming Frog and the Google Cloud Natural Language API. It handles JWT creation, access token exchange, and routes the content to three NLP endpoints: entity recognition, sentiment analysis, and content classification. It requires a Google service account JSON key stored in a private directory on your hosting account.
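
If the JWT step feels opaque, the sketch below reproduces the same assembly in JavaScript (Node.js 16+) so you can see the three parts: header, claims, signature. It is an illustration under assumptions, not part of the proxy: buildJwt is a made-up helper name, the key pair is generated on the spot rather than read from your JSON file, and the client_email and token_uri values are placeholders.

```javascript
const crypto = require('node:crypto');

// base64url: base64 with '+/' swapped for '-_' and the '=' padding removed.
const b64url = (input) => Buffer.from(input).toString('base64url');

// Hypothetical helper: builds header.claims.signature, signed with RS256.
function buildJwt(clientEmail, tokenUri, privateKey, now) {
  const header = { alg: 'RS256', typ: 'JWT' };
  const claims = {
    iss: clientEmail,
    scope: 'https://www.googleapis.com/auth/cloud-language',
    aud: tokenUri,
    iat: now,
    exp: now + 3600, // one hour, the same lifetime proxy.php uses
  };
  const unsigned = `${b64url(JSON.stringify(header))}.${b64url(JSON.stringify(claims))}`;
  const signature = crypto.createSign('RSA-SHA256').update(unsigned).sign(privateKey);
  return `${unsigned}.${b64url(signature)}`;
}

// Throwaway key pair for the demo only; the real private key lives in the JSON file.
const { privateKey, publicKey } = crypto.generateKeyPairSync('rsa', { modulusLength: 2048 });

const jwt = buildJwt(
  'demo@example.iam.gserviceaccount.com', // placeholder client_email
  'https://oauth2.googleapis.com/token',  // assumed token_uri value
  privateKey,
  Math.floor(Date.now() / 1000)
);

// Check the signature with the matching public key.
const [h, c, s] = jwt.split('.');
const valid = crypto
  .createVerify('RSA-SHA256')
  .update(`${h}.${c}`)
  .verify(publicKey, Buffer.from(s, 'base64url'));
console.log(valid);
```

The PHP script does the same thing with openssl_sign and then posts the resulting string to the token endpoint as the assertion.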


To replicate this setup:
  • Place this script somewhere within your web root (e.g. /public_html/nlp-api/proxy.php)
  • Store your service account key (the .json file) outside of your public directories for security
  • Update the path to that key file in the script
  • Replace $secretKey = 'helloworld123'; with your own secure string
  • Protect the endpoint by ensuring only POST requests with the correct secret are processed

Code:
<?php
// Shared secret for Screaming Frog
$secretKey = 'helloworld123';

// Only POST allowed
if ($_SERVER['REQUEST_METHOD'] !== 'POST') {
    http_response_code(405);
    exit('Only POST allowed');
}

// Read JSON input
$input = json_decode(file_get_contents('php://input'), true);

// Debug log input (if you enable this, keep the log outside public_html: the input contains the secret)
// file_put_contents('/home/xxx/secure/debug.log', print_r($input, true));

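// Reject requests without the correct shared secret (hash_equals is a timing-safe comparison)
if (!isset($input['secret']) || !is_string($input['secret']) || !hash_equals($secretKey, $input['secret'])) {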
    http_response_code(403);
    exit('Invalid secret');
}

if (!isset($input['text'])) {
    http_response_code(400);
    exit('Missing text parameter');
}

$text = $input['text'];

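// Load the service account JSON key (stored outside public_html)
$keyFile = '/home/xxx/secure/dashboard-project-json-key.json';
$key = json_decode((string) @file_get_contents($keyFile), true);
if (!is_array($key) || !isset($key['client_email'], $key['private_key'], $key['token_uri'])) {
    http_response_code(500);
    exit('Could not load service account key');
}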

// Prepare JWT
$now = time();
$header = ['alg'=>'RS256','typ'=>'JWT'];
$claims = [
  'iss' => $key['client_email'],
  'scope' => 'https://www.googleapis.com/auth/cloud-language',
  'aud' => $key['token_uri'],
  'iat' => $now,
  'exp' => $now + 3600
];
function b64url($d){return rtrim(strtr(base64_encode($d),'+/','-_'),'=');}
$jwt = b64url(json_encode($header)).'.'.b64url(json_encode($claims));
openssl_sign($jwt, $sig, $key['private_key'], 'sha256WithRSAEncryption');
$jwt .= '.'.b64url($sig);

// Exchange for access token
$postdata = http_build_query([
  'grant_type' => 'urn:ietf:params:oauth:grant-type:jwt-bearer',
  'assertion' => $jwt
]);
$opts = ['http'=>[
  'method'=>'POST',
  'header'=>"Content-Type:application/x-www-form-urlencoded\r\n",
  'content'=>$postdata
]];
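$response = @file_get_contents($key['token_uri'], false, stream_context_create($opts));
// A failed request returns false; a response without access_token leaves $token null
$token = $response === false ? null : (json_decode($response, true)['access_token'] ?? null);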
if (!$token) {
    http_response_code(500);
    exit('Failed to get access token');
}

// Function to call Google NLP
function callNlp($path, $body, $token) {
  $url = "https://language.googleapis.com/v1/documents:$path";
  $ch = curl_init($url);
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
  curl_setopt($ch, CURLOPT_HTTPHEADER, [
    "Authorization: Bearer $token",
    "Content-Type: application/json"
  ]);
  curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($body));
  $res = curl_exec($ch);
  curl_close($ch);
  return json_decode($res, true);
}

$doc = ['document'=>['type'=>'PLAIN_TEXT','content'=>$text]];

// Run NLP calls
$entities = callNlp('analyzeEntities', array_merge($doc, ['encodingType'=>'UTF8']), $token);
$sentiment = callNlp('analyzeSentiment', $doc, $token);
$categories = callNlp('classifyText', $doc, $token);

// Return JSON result
header('Content-Type: application/json');
echo json_encode([
  'entities' => $entities['entities'] ?? [],
  'sentiment' => $sentiment['documentSentiment'] ?? [],
  'categories' => $categories['categories'] ?? []
]);
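
Before pointing Screaming Frog at the proxy, it can help to probe it by hand and confirm each guard responds as intended. The following JavaScript sketch (Node.js 18+ for the built-in fetch) is one way to do that. The helper names buildProbe, expectedStatus and probe are made up for illustration, and expectedStatus simply mirrors the order of the checks in the script above.

```javascript
// Hypothetical helper: builds the fetch options the proxy expects.
// Fields left undefined are dropped by JSON.stringify, which lets you
// simulate a missing secret or a missing text parameter.
function buildProbe(secret, text) {
  return {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ secret, text }),
  };
}

// Hypothetical helper: the status code the proxy should return,
// following the same order of checks as proxy.php.
function expectedStatus(method, payload, secretKey) {
  if (method !== 'POST') return 405;                                  // Only POST allowed
  if (!payload || payload.secret !== secretKey) return 403;           // Invalid secret
  if (payload.text === undefined || payload.text === null) return 400; // Missing text
  return 200; // assumes the token exchange with Google succeeds; otherwise 500
}

// Sends one probe and resolves with the HTTP status the proxy returned.
async function probe(proxyUrl, secret, text) {
  const response = await fetch(proxyUrl, buildProbe(secret, text));
  return response.status;
}

// Usage (placeholder URL and secret, swap in your own):
// probe('https://example.com/nlp-api/proxy.php', 'wrong-secret', 'Hello').then(console.log);

console.log(expectedStatus('POST', { secret: 'wrong-secret', text: 'Hello' }, 'helloworld123'));
// 403
```

If a probe with the wrong secret comes back with anything other than 403, fix the proxy before crawling.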

3. Screaming Frog Custom JavaScript Snippet

This is the JavaScript snippet to paste into Screaming Frog’s Custom JavaScript configuration. It grabs the page’s body text, sends it to your PHP proxy, and returns a concise summary with sentiment score, top entity, and top category.


To replicate:
  • Make sure your proxy.php script is live and responding securely
  • Replace const proxyUrl = 'https://example.com/nlp-api/proxy.php'; with your actual URL
  • Replace the secret value with the same one used in your proxy script
  • Screaming Frog will run this script on every crawled page and append the result to a column in your crawl data

Code:
const proxyUrl = 'https://example.com/nlp-api/proxy.php';
const secret = 'helloworld123';
const text = document.body.innerText;

function callProxy() {
  return fetch(proxyUrl, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      secret: secret,
      text: text
    })
  })
  .then(response => {
    if (!response.ok) {
      return response.text().then(text => { throw new Error(text); });
    }
    return response.json();
  })
  .then(data => {
    const sentimentScore = data.sentiment?.score ?? 'null';
    const topEntity = data.entities?.[0]?.name ?? 'none';
    const topCategory = data.categories?.[0]?.name ?? 'none';

    // Combine the result into one cell
    return `Sentiment: ${sentimentScore} | Entity: ${topEntity} | Category: ${topCategory}`;
  });
}

return callProxy()
  .then(result => seoSpider.data(result))
  .catch(error => seoSpider.error(error));
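
To preview what ends up in the crawl column, the formatting step of the snippet above can be run on its own outside Screaming Frog. In this sketch, formatSummary is a made-up wrapper around the same three lines, and the sample response values are invented for illustration rather than real API output.

```javascript
// Same fallback logic as the snippet: missing fields become 'null' or 'none'.
function formatSummary(data) {
  const sentimentScore = data.sentiment?.score ?? 'null';
  const topEntity = data.entities?.[0]?.name ?? 'none';
  const topCategory = data.categories?.[0]?.name ?? 'none';
  return `Sentiment: ${sentimentScore} | Entity: ${topEntity} | Category: ${topCategory}`;
}

// Invented sample in the shape proxy.php returns.
const sample = {
  sentiment: { magnitude: 1.2, score: 0.3 },
  entities: [{ name: 'Screaming Frog', type: 'ORGANIZATION', salience: 0.41 }],
  categories: [{ name: '/Internet & Telecom', confidence: 0.8 }],
};
console.log(formatSummary(sample));
// Sentiment: 0.3 | Entity: Screaming Frog | Category: /Internet & Telecom

// When an NLP call returns nothing, the proxy falls back to empty arrays.
const empty = { sentiment: [], entities: [], categories: [] };
console.log(formatSummary(empty));
// Sentiment: null | Entity: none | Category: none
```

A row reading "null | none | none" therefore means the proxy answered but Google returned no usable data for that page, which is different from an error reported through seoSpider.error.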
 