How to block fake Googlebot clients in NGINX

Lately, some of my websites have had serious leeching problems caused by clients pretending to be Googlebot.
They send the correct user agent string, ‘Googlebot/2.1 (+http://www.googlebot.com/bot.html)’, but they come from IPs which have not been assigned to Google.

This got me thinking… There must be a way to configure NGINX so that it makes sure clients saying they’re Googlebot are actually coming from an IP assigned to Google.

I’ve come up with the following, which seems to be working.

Add the following to your http-block (probably in /etc/nginx/nginx.conf, or somewhere similar):


geo $googlebotip {
    default 0;
    64.18.0.0/20 1;
    64.233.160.0/19 1;
    66.102.0.0/20 1;
    66.249.80.0/20 1;
    72.14.192.0/18 1;
    74.125.0.0/16 1;
    108.177.8.0/21 1;
    172.217.0.0/19 1;
    173.194.0.0/16 1;
    207.126.144.0/20 1;
    209.85.128.0/17 1;
    216.58.192.0/19 1;
    216.239.32.0/19 1;

    2001:4860:4000::/36 1;
    2404:6800:4000::/36 1;
    2607:f8b0:4000::/36 1;
    2800:3f0:4000::/36 1;
    2a00:1450:4000::/36 1;
    2c0f:fb50:4000::/36 1;
}


This sets $googlebotip to 1 when the client comes from an IP which has been assigned to Google.
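
Since the ranges may change over time, the list could also live in a file of its own, so that it can be updated without touching nginx.conf. A minimal sketch, assuming a hypothetical file /etc/nginx/googlebot-ranges.conf (the path and file name are my own choice) that holds the same "network value;" lines as the block above. This would replace the geo block above rather than sit next to it:

```nginx
# Goes in the http block, instead of the geo block with the inline list.
geo $googlebotip {
    default 0;

    # Hypothetical file; each line has the same form as in the inline list,
    # for example:
    #   66.249.80.0/20 1;
    #   2001:4860:4000::/36 1;
    include /etc/nginx/googlebot-ranges.conf;
}
```

After changing the ranges file, NGINX has to reload its configuration before the new list takes effect.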

On the site where you want to block the fakers, add the following to the server-block:


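# Start empty so the append below never reads an unset variable.
set $GoogleBAD "";

# Mark requests that do not come from a Google IP.
if ($googlebotip = 0) {
    set $GoogleBAD A;
}

# Mark requests whose user agent claims to be Googlebot.
if ($http_user_agent ~ "Googlebot") {
    set $GoogleBAD "${GoogleBAD}B";
}

# Both marks present: a non-Google IP pretending to be Googlebot.
if ($GoogleBAD = AB) {
    return 410;
}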


(Instead of 410, you’re of course welcome to use another HTTP status code.)

This is most likely not the most efficient way to do it. Neither is it very future-proof with the hard-coded Google IPs. But it does work! 😉
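
One possibly tidier variant, which I have not benchmarked, is to let a map do the combining instead of the three if blocks. This is only a sketch: the variable name $fake_googlebot is made up for the example, and it assumes the geo block for $googlebotip from above is already in place.

```nginx
# In the http block, next to the geo block.
# The key is "<0 or 1>:<user agent>"; the regex matches only when the IP
# is not a Google one (leading 0) and the user agent contains "Googlebot".
map "$googlebotip:$http_user_agent" $fake_googlebot {
    default 0;
    "~^0:.*Googlebot" 1;
}

# In the server block, replacing the three if blocks.
server {
    # ... the rest of the site configuration ...

    if ($fake_googlebot) {
        return 410;
    }
}
```

The match is case-sensitive, like the original check. Starting the pattern with ~* instead of ~ would make it case-insensitive.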
