Filtering Googlebot IPs from a list of IP addresses
Aug 03, 2018
I recently needed a simple script to filter Googlebot IPs from a list of IP addresses so that I could extract genuine Googlebot visits from an access log.
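Thankfully, Google documents a way to verify that a visitor really is Googlebot. First, run a reverse DNS lookup on the IP and check that the resulting hostname is in the googlebot.com or google.com domain. Then run a forward DNS lookup on that hostname and confirm that it resolves back to the original IP.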
If you have a similar need, here you go:
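Here is a minimal bash sketch of such a filter that follows the verification steps Google documents. It assumes the `host` command (from the dnsutils or bind-utils package) is installed and that its output uses the usual "domain name pointer" / "has address" wording. The function names `is_googlebot_ip` and `main` are illustrative.

```bash
#!/usr/bin/env bash
# filter-googlebot-ips.sh (sketch): print the IPs from a file (one per line)
# that pass a reverse + forward DNS check for Googlebot.
# Assumes the `host` utility (dnsutils / bind-utils) is installed.
set -euo pipefail

# Succeeds (exit 0) if the given IP verifies as Googlebot.
is_googlebot_ip() {
  local ip=$1 name

  # 1. Reverse lookup. `host` prints e.g.
  #    "1.66.249.66.in-addr.arpa domain name pointer crawl-66-249-66-1.googlebot.com."
  #    Keep the last field of the first PTR line.
  name=$(host "$ip" 2>/dev/null | awk '/domain name pointer/ && !seen++ { print $NF }') || return 1
  name=${name%.}  # drop the trailing dot of the fully qualified name

  # 2. The hostname must be in googlebot.com or google.com.
  case $name in
    *.googlebot.com | *.google.com) ;;
    *) return 1 ;;
  esac

  # 3. Forward lookup: the hostname must resolve back to the same IP.
  host "$name" 2>/dev/null | awk -v ip="$ip" '$NF == ip { found = 1 } END { exit !found }'
}

main() {
  local file=$1 ip
  # `|| [[ -n $ip ]]` also handles a last line without a trailing newline.
  while IFS= read -r ip || [[ -n $ip ]]; do
    ip=${ip//[[:space:]]/}  # strip stray spaces and CRs (e.g. Windows line endings)
    if [[ -z $ip ]]; then continue; fi
    if is_googlebot_ip "$ip"; then
      printf '%s\n' "$ip"
    fi
  done < "$file"
}

# When executed (not sourced), process the file given as the first argument,
# or print a usage hint if none was given.
if [[ ${BASH_SOURCE[0]:-} == "$0" ]]; then
  if [[ $# -gt 0 ]]; then
    main "$1"
  else
    echo "usage: $0 ip-list-file" >&2
  fi
fi
```

The forward lookup is the step that actually defeats spoofing. Anyone who controls the reverse DNS for their own IP can point it at a name like crawl.googlebot.com. However, the forward lookup for that name is answered by Google's DNS, and it won't resolve back to the spoofer's IP.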
Save it as something like filter-googlebot-ips.sh, make it executable (chmod +x filter-googlebot-ips.sh), and pass it a file containing the IP addresses to filter, one per line, as its only argument, like so:
$ ./filter-googlebot-ips.sh access-log-ips.txt > googlebot-ips.txt
This performs a reverse and a forward DNS lookup for each IP address and prints only the verified Googlebot IPs to STDOUT, which you can redirect to a file as in the example above.
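Since the goal was to pull actual Googlebot visits out of an access log, the steps before and after the script can also be sketched with awk. This assumes a log format such as the common/combined format Apache and nginx typically use, where the client IP is the first whitespace-separated field. Both helper functions are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers around filter-googlebot-ips.sh.
# Assumes the client IP is the first field of each access log line.
set -euo pipefail

# Unique IPs of requests whose line mentions "googlebot" (case-insensitive),
# i.e. visitors that *claim* to be Googlebot. LC_ALL=C keeps the sort order
# independent of the current locale.
extract_claimed_googlebot_ips() {
  awk 'tolower($0) ~ /googlebot/ { print $1 }' "$1" | LC_ALL=C sort -u
}

# Print the log lines whose client IP appears in the verified list.
# FILENAME == ARGV[1] (rather than NR == FNR) stays correct even when the
# verified list is empty.
googlebot_visits() {
  awk 'FILENAME == ARGV[1] { verified[$1]; next } $1 in verified' "$1" "$2"
}

# Example pipeline (file names are illustrative):
#   extract_claimed_googlebot_ips access.log > access-log-ips.txt
#   ./filter-googlebot-ips.sh access-log-ips.txt > googlebot-ips.txt
#   googlebot_visits googlebot-ips.txt access.log > googlebot-visits.log
```

Pre-filtering on the user agent keeps the number of DNS lookups down, because only visitors claiming to be Googlebot need to be checked.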
Hope it helps someone out there! 🙌
PS: Here is a GitHub Gist if you prefer that.