A Python script to check Google rankings for a specific domain and search term
Author: willem
In: apple, coding, linux, python, seo, tools, unix

Using Python’s pycurl (cURL) and re (Regular Expression) libraries, it’s possible to write a script that will check the Google ranking of a specific domain for a specific search term.
To check for and install Python 2.4 and the py-curl library on Mac OS X:
Follow these instructions to install MacPorts if it hasn’t been installed yet, then open a new Terminal window and enter the following command to see a listing of all installed ports:
sudo port installed
If ‘python24’ and ‘py-curl’ are not listed among the installed ports, install them by entering:
sudo port install python24
sudo port install py-curl
To check for and install Python 2.4 and the pycurl library on Ubuntu Linux:
Open a new Terminal window and enter the following command to install Python and the pycurl library (you’ll be notified if they’ve already been installed):
sudo apt-get install python2.4
sudo apt-get install python-pycurl
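Once the packages are installed, a quick way to confirm that Python can see them is to try importing each module. The snippet below is a minimal sketch rather than part of the rankcheck script: the helper name module_available is made up for illustration, and it is written for a current Python 3 interpreter, not the Python 2.4 that the article installs.

```python
def module_available(name):
    """Return True if the named module can be imported, False otherwise."""
    try:
        __import__(name)
        return True
    except ImportError:
        return False


if __name__ == "__main__":
    # 're' ships with Python; 'pycurl' is the separately installed package.
    for name in ("re", "pycurl"):
        status = "found" if module_available(name) else "missing"
        print("%s: %s" % (name, status))
```

If pycurl is reported as missing, the interpreter being run is probably not the one the package was installed for.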
To run the rankcheck.py script:
Download Geekology’s version of this script here, or copy the code below to create your own rankcheck.py script file:
#!/usr/bin/python

"""
This script accepts Domain, Search String and Google Locale arguments, then
returns which Search String results page for the Google Locale the Domain
appears on.

Usage example:

rankcheck {domain} {searchstring} {locale}

Output example:

rankcheck geekology.co.za 'bash scripting' .co.za
 - The domain 'geekology.co.za' is listed in position 2 (page 1) for the search 'bash+scripting' on google.co.za
"""

__author__ = "Willem van Zyl (willem@geekology.co.za)"
__version__ = "$Revision: 1.5 $"
__date__ = "$Date: 2009/02/10 12:10:24 $"
__license__ = "GPLv3"

import sys, pycurl, re

# check if all arguments were specified and whether help was requested:
if len(sys.argv) < 4:
    if len(sys.argv) == 1:
        print "usage: rankcheck DOMAIN SEARCHSTRING LOCALE"
        print "`rankcheck --help' for more information."
        sys.exit()
    elif sys.argv[1] == '--help':
        print "usage: rankcheck DOMAIN SEARCHSTRING LOCALE"
        print "Check the Search String page ranking of a Domain on a specific Google Locale"
        print "\nExample: rankcheck geekology.co.za 'bash scripting' .co.za"
        print "\nReport bugs to <willem@geekology.co.za>."
        sys.exit()
    else:
        print "usage: rankcheck DOMAIN SEARCHSTRING LOCALE"
        print "`rankcheck --help' for more information."
        sys.exit()

# some initial setup:
USER_AGENT = 'Mozilla/4.0 (compatible; MSIE 6.0)'
FIND_DOMAIN = sys.argv[1]
SEARCH_STRING = sys.argv[2].replace(' ', '+')
LOCALE = sys.argv[3]

# check if the locale is valid:
if sys.argv[3] == '.co.za':
    SEARCH_COUNTRY = '&meta=cr%3DcountryZA'
elif sys.argv[3] == '.co.uk':
    SEARCH_COUNTRY = '&meta=cr%3DcountryUK'
elif sys.argv[3] == '.com':
    SEARCH_COUNTRY = ''
else:
    print "Only the '.com', '.co.uk' and '.co.za' locales are allowed."
    sys.exit()

ENGINE_URL = 'http://www.google' + LOCALE + '/search?q=' + SEARCH_STRING + SEARCH_COUNTRY

# define class to store result:
class RankCheck:
    def __init__(self):
        self.contents = ''

    def body_callback(self, buf):
        # called once per chunk of response data, so append instead of assigning
        self.contents = self.contents + buf

# instantiate curl and result objects:
rankRequest = pycurl.Curl()
rankCheck = RankCheck()

# set up curl:
rankRequest.setopt(pycurl.USERAGENT, USER_AGENT)
rankRequest.setopt(pycurl.FOLLOWLOCATION, 1)
rankRequest.setopt(pycurl.AUTOREFERER, 1)
rankRequest.setopt(pycurl.WRITEFUNCTION, rankCheck.body_callback)
rankRequest.setopt(pycurl.COOKIEFILE, '')
rankRequest.setopt(pycurl.HTTPGET, 1)
rankRequest.setopt(pycurl.REFERER, '')

# run curl (fetch the first 10 result pages, 10 results per page):
for i in range(0, 10):
    rankRequest.setopt(pycurl.URL, ENGINE_URL + '&start=' + str(i * 10))
    rankRequest.perform()

# close curl:
rankRequest.close()

# collect the search results
html = rankCheck.contents
counter = 0
result = 0

url = unicode(r'(<h3 class=r><a href=")((https?):((//))+[\w\d:#@%/;$()~_?\+-=\\\.&]*)')

for google_result in re.finditer(url, html):
    # strip the 21-character '<h3 class=r><a href="' prefix to leave the URL
    this_url = google_result.group()
    this_url = this_url[21:]
    counter += 1

    google_url_regex = re.compile("((https?):((//))+([\w\d:#@%/;$()~_?\+-=\\\.&])*" + FIND_DOMAIN + "+([\w\d:#@%/;$()~_?\+-=\\\.&])*)")
    google_url_regex_result = google_url_regex.match(this_url)

    if google_url_regex_result:
        result = counter
        break

# show results
if result == 0:
    print " - The domain '" + FIND_DOMAIN + "' wasn't listed in the first 10 pages for the search '" + SEARCH_STRING + "' on google" + LOCALE
else:
    # subtract 1 before dividing so that position 10 is reported on page 1, not page 2
    print " - The domain '" + FIND_DOMAIN + "' is listed in position " + str(result) + " (page " + str(((result - 1) / 10) + 1) + ") for the search '" + SEARCH_STRING + "' on google" + LOCALE
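The matching step of the script can be tried without any network access. The following is a simplified sketch, not the script's own code: SAMPLE_HTML is invented markup in the 2009-era format the script's regular expression expects, the names find_position and page_for are hypothetical, the URL is pulled out with a capture group, and the domain is checked with a plain substring test where the script builds a second regular expression. It assumes Python 3 and ten results per page.

```python
import re

# Invented sample markup; real Google result pages no longer look like this.
SAMPLE_HTML = (
    '<h3 class=r><a href="http://www.example.com/bash">Bash</a></h3>'
    '<h3 class=r><a href="http://www.geekology.co.za/blog/">Geekology</a></h3>'
    '<h3 class=r><a href="http://example.org/">Other</a></h3>'
)

# Capture the URL that follows the result heading's opening tags.
RESULT_LINK = re.compile(r'<h3 class=r><a href="(https?://[^"]+)"')


def find_position(html, domain):
    """Return the 1-based position of the first result URL containing domain, or 0."""
    for position, match in enumerate(RESULT_LINK.finditer(html), start=1):
        if domain in match.group(1):
            return position
    return 0


def page_for(position):
    """Map a 1-based position to its results page, assuming 10 results per page."""
    return (position - 1) // 10 + 1


if __name__ == "__main__":
    position = find_position(SAMPLE_HTML, "geekology.co.za")
    print(position, page_for(position))
```

A return value of 0 plays the same role as in the script: the domain was not found in the HTML that was searched.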
Open a new Terminal window and navigate to the folder containing the script, then execute it by entering:
python ./rankcheck.py {domain} '{search string}' {locale}
… filling in the Domain, Search String and Locale that you want to check.
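To see how those arguments turn into requests, the sketch below rebuilds the list of URLs that the script fetches for a given Search String and Locale. It is illustrative only: build_urls and COUNTRY_FILTERS are hypothetical names, the country filters are copied from the script, and the code assumes Python 3.

```python
# Country filters copied from the script; it rejects any other locale.
COUNTRY_FILTERS = {
    ".co.za": "&meta=cr%3DcountryZA",
    ".co.uk": "&meta=cr%3DcountryUK",
    ".com": "",
}


def build_urls(search_string, locale, pages=10):
    """Return the result-page URLs requested for this search, in order."""
    if locale not in COUNTRY_FILTERS:
        raise ValueError("unsupported locale: %r" % locale)
    # Spaces in the search string become '+', as in the script.
    query = search_string.replace(" ", "+")
    base = ("http://www.google" + locale + "/search?q=" + query
            + COUNTRY_FILTERS[locale])
    # Each page holds 10 results, so the offsets are 0, 10, ..., 90.
    return [base + "&start=" + str(page * 10) for page in range(pages)]


if __name__ == "__main__":
    for url in build_urls("bash scripting", ".co.za"):
        print(url)
```

This also shows why the Search String needs to be quoted on the command line: an unquoted 'bash scripting' would arrive as two separate arguments.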
Because the Python script file starts with ‘#!/usr/bin/python’, you’ll be able to execute it from the command line without invoking the python executable if you set execute permissions on the file:
sudo chmod 744 rankcheck.py
./rankcheck.py {domain} '{search string}' {locale}

Sean / Marketing
February 26th, 2009 at 14:57
Never used Python before, .php is as far as I go… some homework to do before I understand how to even execute this
appreciate you sharing the code…
willem
February 26th, 2009 at 15:01
Hey Sean
If you’re using a Linux / Mac machine the instructions at the start of the article should set you up with a working Python installation, but if you get stuck please let me know!
Edoc
March 10th, 2009 at 00:04
Seems like a task like this could be done much more easily in a language such as REBOL. For example:
http://www.flippingsweet.com/scrape.html
willem
March 10th, 2009 at 08:12
Hi Edoc
Thanks for the link! Yes, there are several simpler solutions, but my main purpose with this post was to demonstrate re (Regular Expression) and pycurl (cURL) usage in Python.
Jonathan Bydendyk
April 17th, 2009 at 14:31
Thanks, I’ve used a variation of this code in a new seo reporting app I’m writing.
willem
April 17th, 2009 at 19:39
Hey Jonathan
Sure! If you’re going to release the app, I’d love to see it when it’s done.
raghu
November 22nd, 2009 at 14:01
Hi willem, this post was pretty useful for me to understand how to use pycurl. Just want to make sure that others like me understand that if the response data is larger than 16 KB, the callback function (body_callback) is called multiple times, with a chunk of up to 16 KB each time.
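The behaviour raghu describes can be illustrated without pycurl or a network connection. The sketch below is an assumption-laden simulation, not pycurl itself: ResponseBuffer and deliver are hypothetical names, the 16 KB chunk size is taken from the comment above (real transfers may hand over smaller pieces), and the code assumes Python 3, where the callback receives bytes.

```python
class ResponseBuffer:
    """Collects body data handed over in pieces, like a WRITEFUNCTION target."""

    def __init__(self):
        self.chunks = []
        self.calls = 0

    def body_callback(self, buf):
        # Called once per chunk, so append instead of overwriting.
        self.calls += 1
        self.chunks.append(buf)

    def contents(self):
        # Join the pieces once at the end to rebuild the full body.
        return b"".join(self.chunks)


def deliver(data, callback, chunk_size=16 * 1024):
    """Simulate a transfer by passing data to callback in chunk_size pieces."""
    for start in range(0, len(data), chunk_size):
        callback(data[start:start + chunk_size])


if __name__ == "__main__":
    body = b"x" * 40000  # stand-in for a response larger than one chunk
    buffer = ResponseBuffer()
    deliver(body, buffer.body_callback)
    print(buffer.calls, len(buffer.contents()))
```

A callback that assigned buf to its contents, where this one appends, would keep only the final chunk, which is why the script's body_callback concatenates.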