A Python script to check Google rankings for a specific domain and search term

Using Python’s pycurl (cURL) and re (Regular Expression) libraries, it’s possible to write a script that will check the Google ranking of a specific domain for a specific search term.

To check for and install Python 2.4 and the py-curl library on Mac OS X:

Follow these instructions to install MacPorts if it hasn’t been installed yet, then open a new Terminal window and enter the following command to see a listing of all installed ports:

sudo port installed

If ‘python24’ and ‘py-curl’ are not listed among the installed ports, install them by entering:

sudo port install python24
sudo port install py-curl

To check for and install Python 2.4 and the pycurl library on Ubuntu Linux:

Open a new Terminal window and enter the following command to install Python and the pycurl library (you’ll be notified if they’ve already been installed):

sudo apt-get install python2.4
sudo apt-get install python-pycurl
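
Whichever platform you're on, a quick way to confirm that the library is visible to the interpreter is to try importing it. This is only a minimal sketch: `pycurl_status` is a name made up for this example, and it needs to be run with the same Python interpreter that will run the script.

```python
import sys


def pycurl_status():
    """Describe whether pycurl can be imported by this interpreter."""
    try:
        import pycurl
    except ImportError:
        return "pycurl is NOT available for Python %s" % sys.version.split()[0]
    # pycurl.version is a string naming the PycURL and libcurl versions
    return "pycurl is available: %s" % pycurl.version


print(pycurl_status())
```

If the message says pycurl is not available, the package was probably installed for a different Python version than the one you ran.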

To run the rankcheck.py script:

Download Geekology’s version of this script here, or copy the code below to create your own rankcheck.py script file:

#!/usr/bin/python
 
"""
 
 This script accepts Domain, Search String and Google Locale arguments, then returns
 which Search String results page for the Google Locale the Domain appears on.
 
 
 Usage example:
 
  rankcheck {domain} {searchstring} {locale}
 
 
 Output example:
 
  rankcheck geekology.co.za 'bash scripting' .co.za
   - The domain 'geekology.co.za' is listed in position 2 (page 1) for the search 'bash+scripting' on google.co.za
 
"""
 
__author__    = "Willem van Zyl (willem@geekology.co.za)"
__version__   = "$Revision: 1.5 $"
__date__      = "$Date: 2009/02/10 12:10:24 $"
__license__   = "GPLv3"
 
import sys, pycurl, re
 
# check if all arguments were specified and whether help was requested:
if len(sys.argv) < 4:
  if len(sys.argv) > 1 and sys.argv[1] == '--help':
    print "usage: rankcheck DOMAIN SEARCHSTRING LOCALE"
    print "Check the Search String page ranking of a Domain on a specific Google Locale"
    print "\nExample: rankcheck geekology.co.za 'bash scripting' .co.za"
    print "\nReport bugs to <willem@geekology.co.za>."
  else:
    print "usage: rankcheck DOMAIN SEARCHSTRING LOCALE"
    print "`rankcheck --help' for more information."
  sys.exit()
 
 
# some initial setup:
USER_AGENT = 'Mozilla/4.0 (compatible; MSIE 6.0)'
FIND_DOMAIN = sys.argv[1]
SEARCH_STRING = sys.argv[2].replace(' ', '+')
LOCALE = sys.argv[3]
 
# check if the locale is valid:
if sys.argv[3] == '.co.za':
  SEARCH_COUNTRY = '&meta=cr%3DcountryZA'
elif sys.argv[3] == '.co.uk':
  SEARCH_COUNTRY = '&meta=cr%3DcountryUK'
elif sys.argv[3] == '.com':
  SEARCH_COUNTRY = ''
else:
  print "Only the '.com', '.co.uk' and '.co.za' locales are allowed."
  sys.exit()
 
ENGINE_URL = 'http://www.google' + LOCALE + '/search?q=' + SEARCH_STRING + SEARCH_COUNTRY
 
 
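# define class to store result:
class RankCheck:
  def __init__(self):
    self.contents = ''
 
  def body_callback(self, buf):
    # curl calls this once per chunk of the response body, so append
    # each chunk rather than overwriting what was already received:
    self.contents = self.contents + buf
 
 
# instantiate curl and result objects:
rankRequest = pycurl.Curl()
rankCheck = RankCheck()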
 
 
# set up curl:
rankRequest.setopt(pycurl.USERAGENT, USER_AGENT)
rankRequest.setopt(pycurl.FOLLOWLOCATION, 1)
rankRequest.setopt(pycurl.AUTOREFERER, 1)
rankRequest.setopt(pycurl.WRITEFUNCTION, rankCheck.body_callback)
rankRequest.setopt(pycurl.COOKIEFILE, '')
rankRequest.setopt(pycurl.HTTPGET, 1)
rankRequest.setopt(pycurl.REFERER, '')
 
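# run curl (fetch the first 10 results pages, 10 results each; every
# response is appended to rankCheck.contents by body_callback):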
for i in range(0, 10):
  rankRequest.setopt(pycurl.URL, ENGINE_URL + '&start=' + str(i * 10))
  rankRequest.perform()
 
# close curl:
rankRequest.close()
 
 
# collect the search results
html = rankCheck.contents
counter = 0
result = 0
 
url=unicode(r'(<h3 class=r><a href=")((https?):((//))+[\w\d:#@%/;$()~_?\+-=\\\.&]*)')
 
# match result URLs that contain the domain; re.escape() keeps the dots in
# FIND_DOMAIN literal, and the pattern is compiled once, outside the loop:
google_url_regex = re.compile(r"https?://[\w\d:#@%/;$()~_?+\-=\\.&]*" + re.escape(FIND_DOMAIN))
 
for google_result in re.finditer(url, html):
  # group 2 is the URL that follows the '<h3 class=r><a href="' prefix:
  this_url = google_result.group(2)
  counter += 1
 
  if google_url_regex.match(this_url):
    result = counter
    break
 
 
# show results
if result == 0:
  print " - The domain '" + FIND_DOMAIN + "' wasn't listed in the first 10 pages for the search '" + SEARCH_STRING + "' on google" + LOCALE
else:
  # positions 1-10 are on page 1, 11-20 on page 2, and so on:
  print " - The domain '" + FIND_DOMAIN + "' is listed in position " + str(result) + " (page " + str(((result - 1) // 10) + 1) + ") for the search '" + SEARCH_STRING + "' on google" + LOCALE
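
The matching logic can be tried without touching the network. The sketch below is not the script itself: `SAMPLE_HTML` is invented markup that imitates the `<h3 class=r><a href="` pattern the script looks for (Google's real markup may differ and changes over time), and `find_position` and `page_for_position` are hypothetical helper names. It assumes 10 results per page, as the script does, and it only looks for the domain in the host part of each URL, which is a slightly stricter check than the script's.

```python
import re

# Invented markup imitating the pattern the script searches for.
SAMPLE_HTML = (
    '<h3 class=r><a href="http://www.example.com/bash">Bash</a></h3>'
    '<h3 class=r><a href="http://www.geekology.co.za/blog/">Geekology</a></h3>'
    '<h3 class=r><a href="http://example.org/scripting">Scripting</a></h3>'
)

RESULT_LINK = re.compile(r'<h3 class=r><a href="(https?://[^"]+)"')


def find_position(html, domain):
    """Return the 1-based position of the first result on domain, or 0."""
    # re.escape() keeps the dots in the domain from matching any character
    domain_regex = re.compile(r'https?://[^/]*' + re.escape(domain))
    position = 0
    for link in RESULT_LINK.finditer(html):
        position += 1
        if domain_regex.match(link.group(1)):
            return position
    return 0


def page_for_position(position, per_page=10):
    """Map a 1-based position to a 1-based page number."""
    return (position - 1) // per_page + 1


position = find_position(SAMPLE_HTML, 'geekology.co.za')
print("position %d, page %d" % (position, page_for_position(position)))
```

Subtracting one before dividing keeps position 10 on page 1 and moves position 11 to page 2.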

Open a new Terminal window and navigate to the folder containing the script, then execute it by entering:

python ./rankcheck.py {domain} '{search string}' {locale}

… filling in the Domain, Search String and Locale that you want to check.

Because the Python script file starts with ‘#!/usr/bin/python’, you can execute it from the command line without invoking the python executable, provided you set execute permissions on the file:

sudo chmod 744 rankcheck.py
 
./rankcheck.py {domain} '{search string}' {locale}
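
The search string and locale arguments end up in the request URL. The script only swaps spaces for plus signs, so a search string containing characters such as `&` or `#` would produce a malformed query. One possible alternative is to let the standard library do the encoding. In this sketch `build_search_url` and `COUNTRY_PARAMS` are hypothetical names, not part of the script; the `cr` values are copied from the script's locale check, and the import fallback is only there so the same code runs on Python 2 and Python 3.

```python
try:
    from urllib import quote_plus        # Python 2
except ImportError:
    from urllib.parse import quote_plus  # Python 3

# Country restrictions copied from the script's locale check.
COUNTRY_PARAMS = {
    '.co.za': '&meta=cr%3DcountryZA',
    '.co.uk': '&meta=cr%3DcountryUK',
    '.com': '',
}


def build_search_url(search_string, locale, page=0):
    """Build the results URL for a 0-based page of 10 results."""
    if locale not in COUNTRY_PARAMS:
        raise ValueError("unsupported locale: %r" % locale)
    return ('http://www.google' + locale + '/search?q='
            + quote_plus(search_string) + COUNTRY_PARAMS[locale]
            + '&start=' + str(page * 10))


print(build_search_url('bash scripting', '.co.za'))
```

For the example arguments this produces the same URL the script builds for its first request; the difference only shows up with search strings that contain reserved characters.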

 


7 Responses to “A Python script to check Google rankings for a specific domain and search term”

  1. Never used Python before; PHP is as far as I go… some homework to do before I understand how to even execute this :D Appreciate you sharing the code…

  2. Hey Sean

    If you’re using a Linux / Mac machine the instructions at the start of the article should set you up with a working Python installation, but if you get stuck please let me know!

  3. Seems like a task like this could be done much more easily in a language such as REBOL. For example:

    http://www.flippingsweet.com/scrape.html

  4. Hi Edoc

    Thanks for the link! Yes, there are several simpler solutions, but my main purpose with this post was to demonstrate re (Regular Expression) and pycurl (cURL) usage in Python.

  5. Thanks, I’ve used a variation of this code in a new seo reporting app I’m writing.

  6. Hey Jonathan

    Sure! If you’re going to release the app, I’d love to see it when it’s done. :)

  7. Hi Willem, this post was pretty useful for me to understand how to use pycurl. Just want to make sure that others like me understand that if the response data is greater than 16 KB in size, the callback function (body_callback) is called multiple times, with a chunk of up to 16 KB each time.
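
The chunked delivery described in the last comment is why `body_callback` appends to `self.contents` instead of assigning to it. A minimal sketch of that behaviour, with no network involved: `deliver_in_chunks` is a made-up stand-in for what libcurl does, the chunk size is just a parameter here rather than a value read from libcurl, and this `RankCheck` variant collects the pieces in a list, which is an alternative to the string concatenation used in the script.

```python
class RankCheck:
    """Collects a response body that arrives in several pieces."""

    def __init__(self):
        self.chunks = []

    def body_callback(self, buf):
        # Called once per chunk, so collect the pieces instead of overwriting.
        self.chunks.append(buf)

    def contents(self):
        return ''.join(self.chunks)


def deliver_in_chunks(data, callback, chunk_size):
    """Stand-in for libcurl: hand data to callback one chunk at a time."""
    calls = 0
    for start in range(0, len(data), chunk_size):
        callback(data[start:start + chunk_size])
        calls += 1
    return calls


body = 'x' * 40000
collector = RankCheck()
calls = deliver_in_chunks(body, collector.body_callback, 16 * 1024)
print("%d calls, %d characters collected" % (calls, len(collector.contents())))
```

With pycurl on Python 3 the callback receives bytes rather than str, so the pieces would need to be joined with `b''.join` and decoded afterwards.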
