Convert IP addresses to Countries using Python on Windows, Linux, Unix.

IP address to Country program


Why convert IP addresses to Countries?

If you use any kind of log management, intrusion prevention or intrusion detection system, it can probably import or make use of a list of known bad IP addresses. If so, there might be a need to create known entities from that list and enrich the data with geographic attributes such as the country.

As an example, there is a list kept at http://www.spamhaus.org/drop/drop.txt which is a list of networks and IP addresses that are known 'bad'.

The format is:

AA.BB.CC.DD/nn ; <name> <newline>

We would like to turn this into a CSV or similar and add the location.
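As a rough sketch of that conversion, the snippet below splits one line of the format above into its network address, prefix length and name. The function name parse_drop_line is made up for illustration, and the sample line is taken from the Spamhaus listing shown later in this article. print is written with parentheses so that the snippet also runs on Python 3.

```python
def parse_drop_line(line):
    """Split 'AA.BB.CC.DD/nn ; <name>' into (address, prefix_length, name)."""
    # Everything before the first semi-colon is the network, the rest is the name.
    network_part, _, name = line.partition(';')
    # The network itself is written as 'address/prefix-length'.
    address, _, prefix = network_part.strip().partition('/')
    return address, int(prefix), name.strip()


address, prefix, name = parse_drop_line('2.56.0.0/14 ; SBL102988')
# One CSV row: network address, prefix length, quoted name
print('{0},{1},"{2}"'.format(address, prefix, name))
```

The location column is still missing at this point; adding it is what the rest of the article is about.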

Another source is found at http://www.blocklist.de/lists/all.txt

Warning: tabs tend to get eaten by blogging software, so some of these examples might be missing code indentation. Python needs this indentation. Download the final code at the end of this article.

GeoIP data source

There is a source of IP address to Location mapping at http://www.maxmind.com/ with a limit of 25 lookups per day. For example, the shot below is for an IP address currently in the 'bad' list: 2.56.0.5


Demo from maxmind

Get the data.

You can see that this is in Ukraine, which is a country notorious for naughty net hijinks.

The maxmind database is a subscription service with either a site license or a site license with optional updates.

However, there is a free version called 'GeoLite' which is under a Creative Commons Attribution-ShareAlike 3.0 Unported License and you have to include the following as attribution:

"This product includes GeoLite data created by MaxMind, available from http://www.maxmind.com"

Download the ZIP for Country or Country and City.

Don't open this in Excel: it is more than 65535 rows long, which is more than older versions of Excel can hold, so it won't completely load. Also note there is a binary version for use with the API and databases. I also discovered that some of the names contain non-ASCII characters.

Here is the format of the file called GeoLiteCity-Blocks.csv

startIpNum, endIpNum   ,locId
"7602176" , "7864319"  ,"16"

This is the format of the file called GeoLiteCity-Location.csv

locId,country,region,city         ,postalCode,latitude,longitude,metroCode,areaCode
562  ,"ZA"   ,"02"  ,"Mpumalanga" ,""        ,-27.8818,31.5340  ,         ,


Not all fields are used in all rows.

How do we interpret this data?

Let's do this by example. Let's take the line

"7602176","7864319","16"

from GeoLiteCity-Blocks.csv

and

16,"AT","","","",47.3333,13.3333,,
from GeoLiteCity-Location.csv


"16" is the location ID, which matches in both files and acts as an index. The code "AT" is the two-letter country code for Austria (the same letters as its top-level domain, .at), and you can verify the location by pasting the coordinates 47.3333,13.3333 directly into the search bar at maps.google.com.

These country codes are also listed in a table.
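The matching of the two files on the location ID can be sketched with Python's built-in csv module. This is only an illustration: the two lists below hold the sample rows quoted above (with the header spacing tidied up), not the real files, and the variable names are invented here.

```python
import csv

# Sample rows copied from the article; the real files are much longer.
blocks_csv = ['startIpNum,endIpNum,locId',
              '"7602176","7864319","16"']
location_csv = ['locId,country,region,city,postalCode,latitude,longitude,metroCode,areaCode',
                '16,"AT","","","",47.3333,13.3333,,']

# Index the location rows by locId so that each block can find its location.
locations = dict((row['locId'], row) for row in csv.DictReader(location_csv))

for block in csv.DictReader(blocks_csv):
    place = locations[block['locId']]
    print('{0}-{1} is in {2} ({3}, {4})'.format(
        block['startIpNum'], block['endIpNum'],
        place['country'], place['latitude'], place['longitude']))
```

Note that the csv module returns every field as text, so the numbers would still need converting with int() or float() before doing arithmetic on them.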

What are those odd-looking IP addresses?

Normally, decimal-dotted IP addresses are used, but in this case, it's a single (quoted) string of digits. Let's assume that it's a single integer that can be computed from an IP address. Let's take a simple address like 1.2.3.4 and create a single integer.

Each of the four numbers in the decimal-dotted notation is multiplied by a power of 256 (because each number is 8 bits and 2^8 = 256), so the formula to convert 1.2.3.4 into a single integer is:

(1*256^3)+(2*256^2)+(3*256^1)+(4*256^0) = 16909060

(You can make Google do that just by "searching" for that formula - cut and paste (1*256^3)+(2*256^2)+(3*256^1)+(4*256^0) into the Google search bar to find out.)
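The same formula can be written as a few lines of Python. This is a minimal sketch; the function name ip_to_int is invented for this example, and it does no checking that the input really is a valid address.

```python
def ip_to_int(dotted):
    """Convert a decimal-dotted IPv4 address such as '1.2.3.4' to one integer."""
    total = 0
    for part in dotted.split('.'):
        # Move what we have so far up by one base-256 place, then add the next number.
        total = total * 256 + int(part)
    return total


print(ip_to_int('1.2.3.4'))   # 16909060, the same value as the formula above
```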

Conversely, to go from an integer to decimal-dotted notation you need to use the following formula:

Given an IP address x as a decimal integer (x being a named spreadsheet cell), the four numbers of the decimal-dotted form C4.D4.E4.F4 are computed in the spreadsheet cells C4, D4, E4 and F4 as follows:

C4=INT(x/(256^3))

D4=INT((x-(C4*(256^3)))/(256^2))

E4=INT((x-(C4*(256^3)+D4*(256^2)))/(256))

F4=x-(C4*(256^3)+D4*(256^2)+(E4*256))

e.g. The integer 41735425 = 2.124.213.1
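For readers who prefer code to spreadsheet cells, here is a sketch of the reverse conversion in Python. It uses repeated division by 256 rather than the cell-by-cell subtraction above, which gives the same four numbers; the function name int_to_ip is invented for this example.

```python
def int_to_ip(number):
    """Convert an integer such as 41735425 to decimal-dotted notation."""
    parts = []
    for _ in range(4):
        # The remainder after dividing by 256 is the right-most number still left.
        parts.append(str(number % 256))
        number = number // 256
    # The numbers were collected right to left, so reverse them.
    return '.'.join(reversed(parts))


print(int_to_ip(41735425))   # 2.124.213.1
```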

For a spot-check, there is an online calculator.
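Once an address is a single integer, finding its row in GeoLiteCity-Blocks.csv becomes a range search. The sketch below is one possible way to do it, using the standard bisect module on a list that holds only the single sample row quoted earlier; the names blocks, starts and find_loc_id are invented for illustration, and a real list would have to be loaded from the file and sorted by its first column.

```python
import bisect


def ip_to_int(dotted):
    total = 0
    for part in dotted.split('.'):
        total = total * 256 + int(part)
    return total


# (startIpNum, endIpNum, locId) tuples, sorted by startIpNum.
# Only the sample row from the article is included here.
blocks = [(7602176, 7864319, '16')]
starts = [block[0] for block in blocks]


def find_loc_id(dotted):
    """Return the locId of the block containing the address, or None."""
    number = ip_to_int(dotted)
    # Find the last block whose start is not greater than the address.
    index = bisect.bisect_right(starts, number) - 1
    if index >= 0 and blocks[index][0] <= number <= blocks[index][1]:
        return blocks[index][2]
    return None


print(find_loc_id('0.116.0.1'))   # 16
print(find_loc_id('1.2.3.4'))     # None, outside the only block in this sample
```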


Now we need a scripting language.

We could use several languages to perform lookups; it depends what you want to do. A front-end could be written in HTML using PHP, JavaScript or similar, or we could use a command-line scripting language like Python or Perl, both of which are popular choices. We could also use C.

Python seems like a good choice. It's stable and well documented and has several Integrated Development Environments. It's also platform independent.

We could also use Visual Basic (VB) or one of the other GUI-based integrated environments to make a nice GUI front end. But let's stick with a command-line version because this will let us suck in IP addresses in batch mode, which can then be imported into whatever system you need them for.

Python for Windows.

Let's go for this freeware http://www.activestate.com/activepython/downloads version.

Download the installer for your operating system and launch it. This will install the files, and after that you can launch the interactive shell from the program menu (what used to be the Start menu).



Interactive shell

Note there is no module named pygeoip

pygeoip

There just happens to be a "pure Python API for MaxMind GeoIP database" documented at code.google.com and we should use this and save ourselves a lot of coding.

We need to find out how to install that module because, as you can see above, it's not seen by the interactive Python shell.
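A quick way to check whether a module can be imported, without reading an error message in the shell, is sketched below. The helper name module_available is invented for this example.

```python
def module_available(name):
    """Return True if 'import <name>' would succeed, False otherwise."""
    try:
        __import__(name)
    except ImportError:
        return False
    return True


print(module_available('sys'))       # sys is always supplied with Python
print(module_available('pygeoip'))   # False until the module has been installed
```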

There is a download. Get the file called pygeoip-<version>.tar.gz and the *apidocs.zip for the documentation. It's probably worth reading some documentation on python modules first. (Also here.)

In the interactive shell import the "sys" module (This is always supplied during install).

>>> import sys

Now you can find out which DOS paths are checked when trying to import a module:

>>> print sys.path
['', 'C:\\Windows\\system32\\python27.zip', 
'C:\\Python27\\DLLs', 
'C:\\Python27\\lib', 
'C:\\Python27\\lib\\plat-win', 
'C:\\Python27\\lib\\lib-tk', 
'C:\\Python27', 
'C:\\Python27\\lib\\site-packages', 
'C:\\Python27\\lib\\site-packages\\win32', 
'C:\\Python27\\lib\\site-packages\\win32\\lib', 
'C:\\Python27\\lib\\site-packages\\Pythonwin', 
'C:\\Python27\\lib\\site-packages\\setuptools-0.6c11-py2.7.egg-info']

You can see there are several places where you could put the module called pygeoip. One of them (highlighted in the original listing) contains a lot of *.py files and we could place the contents of the pygeoip archive there. However, the archive contains a file called setup.py, and in there we find Python code that will do the module installation for us.

Extract the archive somewhere, open a DOS prompt and navigate to that directory.

Issue this help command to see what can be done:

C:\Python27\python.exe setup.py --help

This tells us that setup.py install will install the file and that's just what we need.

C:\Python27\python.exe setup.py install

The README

Pure Python GeoIP API. The API is based off of MaxMind's C-based Python API [1],

but the code itself is based on the pure PHP5 API [2] by Jim Winstead and Hans Lellelid.


It is mostly a drop-in replacement, except the

`new` and `open` methods are gone. You should instantiate the GeoIP class yourself:


gi = GeoIP('/path/to/GeoIP.dat', pygeoip.MEMORY_CACHE)


The only supported flags are STANDARD, MMAP_CACHE, and MEMORY_CACHE


If you have any questions or find a bug, have a look at the project page [3] or

contact Jennifer Ennis <zaylea at gmail dot com>


[1] http://www.maxmind.com/app/python

[2] http://pear.php.net/package/Net_GeoIP/

[3] http://code.google.com/p/pygeoip/

Test your module

RELAUNCH the interactive shell and type:

>>> import sys,pygeoip

This should not return any errors. If there are no errors, then the new module is installed and is ready for use.

Before we can use it, let's revisit the MaxMind download site and get the binary version (not the CSV) because this is cleaner. It's too big for a spreadsheet anyway, and the pygeoip module works with the binary (*.dat) file directly.

Make a folder called C:\GeoIP

Unzip the .dat file into C:\GeoIP

Now we can test it in the interactive Python window.

You should already have imported the sys and pygeoip modules. Check the name of the *.dat file in C:\GeoIP. Mine was called GeoLiteCity.dat and so we can do this (the r prefix makes the path a raw string, so the backslashes are taken literally):

>>> geo = pygeoip.GeoIP(r'C:\GeoIP\GeoLiteCity.dat')


This makes a 'handle' called geo and we can use it to access the data.



Here is the whole interactive test

>>> import sys,pygeoip
>>> geo = pygeoip.GeoIP(r'C:\GeoIP\GeoLiteCity.dat')
>>> print geo.record_by_addr('20.2.3.4')['country_name']
United States



Now we can write a python script only a few lines long that will match most IP addresses to a country. (The database is not foolproof).

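Open a text editor, put this code inside it, and save it as ip2C.py:

#!/usr/bin/env /c:/Python27/python
import pygeoip, sys
geo = pygeoip.GeoIP(r'C:\GeoIP\GeoLiteCity.dat')
for row in sys.stdin:
        # Remove the trailing newline and skip blank lines
        ip = row.strip()
        if not ip:
                continue
        rec = geo.record_by_addr(ip)
        # rec is empty when no record was found for the address
        if rec:
                print rec['country_name']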


NOTE Python uses white-space indentation as demarcation for code blocks and you have to consistently use the right number of tabs or spaces as indentation. It's one of very few annoying 'features' of Python.

Then create or obtain a list of IP addresses (one per line) and put them into a file called ip.txt and try this from your command prompt:

python.exe ip2C.py < ip.txt


You should get a list of countries for each of the IP addresses.


What can we do with this?

There are several internet sites that keep lists of IP addresses involved in 'bad' things. We could pull that data in and add country-data to it. This extra data could be useful for raising alerts on incoming or outgoing internet traffic.

Since the list contents could change frequently, it would be nice to have a way to automatically retrieve them. An ideal program for this is called wget and is available for Windows and *nix. It certainly comes with the excellent cygwin package for Windows.

This is an install for Windows wget; follow the obvious links to find the latest version. It installs into C:\Program Files (x86)\GnuWin32

With wget, you can download a website or data. Here is an example:

>"C:\Program Files (x86)\GnuWin32\bin\wget.exe" http://data.phishtank.com/data/online-valid.csv

This is a website that generates a list of phishing sites and makes it available for download. wget lets you get this from a script. Please visit the website for its rules about automated downloading.

This is the format of the .csv version of the data:

phish_id,url,phish_detail_url,submission_time,verified,verification_time,online,target
123456,http://www.example.com/,http://www.phishtank.com/phish_detail.php?phish_id=123456,2009-06-19T15:15:47+00:00,yes,2009-06-19T15:37:31+00:00,yes,1st National Example Bank
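A file in this format can be read with Python's built-in csv module. The sketch below parses the sample row above and pulls the host name out of the url column; note that this list holds URLs rather than IP addresses, so the host name would still have to be resolved to an IP address before a GeoIP lookup. The variable names are invented for illustration.

```python
import csv
try:
    from urllib.parse import urlparse   # Python 3
except ImportError:
    from urlparse import urlparse       # Python 2

# Header and sample row copied from the article.
sample = [
    'phish_id,url,phish_detail_url,submission_time,verified,verification_time,online,target',
    '123456,http://www.example.com/,http://www.phishtank.com/phish_detail.php?phish_id=123456,'
    '2009-06-19T15:15:47+00:00,yes,2009-06-19T15:37:31+00:00,yes,1st National Example Bank',
]

rows = list(csv.DictReader(sample))
for row in rows:
    # urlparse splits the URL into its parts; hostname is the server name.
    host = urlparse(row['url']).hostname
    print('{0},{1},"{2}"'.format(row['phish_id'], host, row['target']))
```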

wget http://www.blocklist.de/lists/all.txt

--2012-07-12 10:34:06--  http://www.blocklist.de/lists/all.txt
Resolving www.blocklist.de (www.blocklist.de)... 176.9.54.236
Connecting to www.blocklist.de (www.blocklist.de)|176.9.54.236|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 302961 (296K) [text/plain]
Saving to: `all.txt.1'
100%[======================================>] 302,961     85.9K/s   in 3.4s
2012-07-12 10:34:17 (85.9 KB/s) - `all.txt.1' saved [302961/302961]

ip2C.py

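#!/usr/bin/env /c:/Python27/python
import pygeoip, sys
geo = pygeoip.GeoIP(r'C:\GeoIP\GeoLiteCity.dat')
for row in sys.stdin:
        # Remove the trailing newline and skip blank lines
        ip = row.strip()
        if not ip:
                continue
        rec = geo.record_by_addr(ip)
        # Skip addresses that have no record in the database
        if rec:
                print '{},\"{}\"'.format(ip, rec['country_name'])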

A simple example

For now, let's use wget to pull down a simple list of IP addresses, as in the wget session for all.txt shown above. This could be done on a daily basis.

Now let's modify the script to print the IP address and the country as a CSV (this is the ip2C.py listing shown above). Note that we have to use the format method and wrap the country in double quotes because some countries are listed like "Korea, Republic of" and we need to protect the comma inside that string. In the Python code, the double quotes are 'escaped' with a backslash.
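An alternative to hand-written quotes, which the article's script does not use, is to let Python's built-in csv module decide when quoting is needed. The sketch below is only an illustration; LineCollector and csv_line are names invented here, and the sample values are taken from the output further down.

```python
import csv


class LineCollector(object):
    """Minimal file-like object: csv.writer only needs a write() method."""
    def __init__(self):
        self.text = ''

    def write(self, chunk):
        self.text += chunk


def csv_line(fields):
    """Return the fields as one line of CSV text, quoted where necessary."""
    sink = LineCollector()
    csv.writer(sink, lineterminator='\n').writerow(fields)
    return sink.text.rstrip('\n')


print(csv_line(['119.46.101.34', 'Thailand']))
print(csv_line(['119.66.129.77', 'Korea, Republic of']))
```

With the default settings only fields that contain a comma or a quote are wrapped in double quotes, so Thailand is written without quotes while Korea, Republic of is protected.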


Try it out

Here is the command, and some sample output.

> python.exe ip2C.py < all.txt

  • 119.46.101.34,"Thailand"
  • 119.46.133.214,"Thailand"
  • 119.46.56.75,"Thailand"
  • 119.46.90.28,"Thailand"
  • 119.53.196.76,"China"
  • 119.57.37.9,"China"
  • 119.57.77.163,"China"
  • 119.6.7.27,"China"
  • 119.62.48.23,"China"
  • 119.66.129.77,"Korea, Republic of"
  • 119.66.144.93,"Korea, Republic of"
  • 119.70.227.139,"Korea, Republic of"
  • 119.71.213.76,"Korea, Republic of"
  • 119.74.243.150,"Singapore"
  • 119.82.252.240,"Cambodia"
  • 119.82.73.242,"India"
  • 119.84.117.74,"China"
  • 119.84.117.75,"China"
  • 119.92.225.242,"Philippines"

Let's get more information

In the Python code above, the variable rec is what other languages call an associative array and Python calls a dictionary. That simply means that the index is not a number; in this case the index is a string of characters. Another term used in this context is 'key-value pair'. The key that we used is 'country_name'.

There are other values in this database and we can get them all by printing the whole record as follows:

print rec

That's rather simple. Here is a single sample output:

{'city': '', 'time_zone': 'Asia/Taipei', 'longitude': 121.0, 'metro_code': '', 'country_code3': 'TWN', 'latitude': 23.5, 'postal_code': None, 'country_code': 'TW', 'country_name': 'Taiwan'}

From this, we can list the available indexes:

  • city
  • time_zone
  • longitude
  • metro_code
  • country_code3
  • latitude
  • postal_code
  • country_code
  • country_name

... and modify our script to also print out the city ( or any of the other data ).

print '{},\"{}\",\"{}\"'.format(row.strip() , rec['country_name'], rec['city'])
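Some values in a record can be empty or None (see postal_code in the sample output above), and a key might not be present at all. A small helper, sketched here with the invented name field, turns those cases into an empty string so that the CSV does not end up containing the word None. The key region_name is used only as an example of a key that is absent from the sample record.

```python
# Sample record copied from the output shown above.
rec = {'city': '', 'time_zone': 'Asia/Taipei', 'longitude': 121.0,
       'metro_code': '', 'country_code3': 'TWN', 'latitude': 23.5,
       'postal_code': None, 'country_code': 'TW', 'country_name': 'Taiwan'}


def field(record, key):
    """Return the value for key as text, or '' when it is missing or None."""
    value = record.get(key)
    return '' if value is None else str(value)


print('"{0}","{1}","{2}"'.format(field(rec, 'country_name'),
                                 field(rec, 'postal_code'),
                                 field(rec, 'region_name')))
```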

Extracting the Spamhaus IP addresses

Here is sample data from the Spamhaus site mentioned at the start of this article.

; Spamhaus DROP List 07/12/12 - (c) 2012 The Spamhaus Project
; Last-Modified: Tue, 10 Jul 2012 22:45:16 GMT
; Expires: Fri, 13 Jul 2012 01:02:23 GMT
2.56.0.0/14 ; SBL102988
14.192.0.0/19 ; SBL123577
14.192.48.0/21 ; SBL131019
14.192.56.0/22 ; SBL131020

If we pulled this in using wget and then tried to process it with the existing Python script, it would fail because the IP address is not isolated. We need a way to extract it. We also need to ignore the lines that start with a semi-colon. There are several ways to do this. It would be nice to find the most flexible method so that our script works for other formats too. In fact, the Python module called "re" gives us the power of "regular expressions" (I wrote a full hub about this). We should be able to extract valid IP addresses from each line, no matter where they appear.

The module is imported like this:

import re

It's installed by default so you won't need to find and download a special module.

A better IP address finder

s='kj lkj lkjsdfjj 2.3.4.5/23;ll kjdskj 4.5.6.7 45.76.345.32 1.-6.34. 100.200.300.400./234'

ip=re.findall(r'\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b',s)

print ip

['2.3.4.5', '4.5.6.7']

Use the interactive shell to test it.

import re

s='1.2.3.4/34;sksdfjkl'
ip = re.findall( r'[0-9]+(?:\.[0-9]+){3}', s )
print ip[0]

If you run these lines, then the output is a clean IP address.

But the regex used above will find invalid IP addresses too so we need a better pattern.
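To see the problem concretely, the sketch below runs the simple pattern against the longer test string from the "A better IP address finder" example and then filters the candidates with a range check. The range check in looks_valid (a name invented here) is just one illustrative way to reject bad candidates; it is not the single-pattern approach used in the rest of this article.

```python
import re

s = 'kj lkj lkjsdfjj 2.3.4.5/23;ll kjdskj 4.5.6.7 45.76.345.32 1.-6.34. 100.200.300.400./234'

# The simple pattern accepts any four dot-separated groups of digits.
candidates = re.findall(r'[0-9]+(?:\.[0-9]+){3}', s)
print(candidates)


def looks_valid(address):
    """True when each of the four numbers is in the range 0 to 255."""
    return all(int(part) <= 255 for part in address.split('.'))


valid = [c for c in candidates if looks_valid(c)]
print(valid)
```

The first print shows four candidates, including 45.76.345.32 and 100.200.300.400; only 2.3.4.5 and 4.5.6.7 survive the range check.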

The example under "A better IP address finder" above shows not only that a single line of regex will find only valid IP addresses, but also that it will find them no matter how they are buried in the string.

The output is a list, and we can pick the addresses out one by one.

for i in ip:
      print i



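import re

# A line that starts with a comment character: nothing is left before it
s=';123//;456#999'
r = re.split(';|#|//',s)
if r[0]=='':
    print 'Nothing'

# A line with data before the comment: keep only the first element
s='This is an IP that is valid 1.2.3.4 see!; and a comment 1.4.5.6'
r = re.split(';|#|//',s)
if r[0]=='':
    print 'Nothing'
else:
    print r[0]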

Ignoring the comment lines in spamhaus

The Spamhaus file uses a semi-colon at the start of the line to indicate that the line should be ignored. We also need to prepare each line by stripping everything to the right of a comment character. This is because the comment might contain a valid IP address and we should ignore it. There are several characters used for comments, so our script should be able to pick from a list of them. Luckily, in Python, there is an easy way to split up a string based on a set of delimiters. All we need to do is split the string on some common comment delimiters, and throw away everything except what comes before the comment.

Here is a start, using a single delimiter option of a semi-colon.

import re
s='123;456'
print re.split(';',s)


... and the result:

$ python.exe x.py
['123', '456']


We can add more delimiters and test it thus:

import re
s='123;456'
print re.split(';',s)
s=';123//;456#999'
print re.split(';|#|//',s)

with the result:

$ python.exe x.py
['123', '456']
['', '123', '', '456', '999']

Note that the first element of the output list is an empty string. We can simply use the first element as the input string and ignore the line if that element is empty. This takes care of all the comment lines.

The script shown just before this section contains an example of what we need to extract the non-comment portion of an input string.
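The same idea can be wrapped in a small function so that it is easy to reuse and test. This is a sketch only; the name data_part is invented here, and the two sample lines are taken from the Spamhaus data shown earlier.

```python
import re


def data_part(line):
    """Return the text before the first comment marker, without surrounding spaces."""
    return re.split(';|#|//', line)[0].strip()


print(repr(data_part('; Spamhaus DROP List 07/12/12')))   # '' so the line is ignored
print(repr(data_part('2.56.0.0/14 ; SBL102988')))         # '2.56.0.0/14'
```

One thing to keep in mind with this choice of delimiters is that treating // as a comment marker would also cut a line at the // of a URL such as http://www.example.com/, so it only suits lists that contain plain addresses.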

wget and portability

wget is great, but there is a way to do what we need directly from Python, and this is an advantage for portability. Instead of separately installing wget, possibly needing to set up paths, and finding a way to call wget from Python, we can use a module called urllib2 (the Python 2 companion to urllib). Read about it first of course.

This is another module that comes built-in to Python which is great because it's less to worry about when installing onto a new system. Many environments use http proxies to access the internet. The module deals with this transparently if there is no authentication. In a Windows environment, if no proxy environment variables are set, proxy settings are obtained from the registry’s Internet Settings section. Under Linux etc it will take note of the environment variable as in this example:

export http_proxy="http://192.0.2.5:3128"
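To check which proxy settings Python has picked up, the standard library offers a getproxies() function. In the sketch below the environment variable is set from inside Python only so that the example is self-contained; normally it would already be set in the shell as shown above. The address is the example value from this article, not a real proxy.

```python
import os
try:
    from urllib.request import getproxies   # Python 3
except ImportError:
    from urllib import getproxies           # Python 2

# Set here only for the demonstration; normally done with 'export' in the shell.
os.environ['http_proxy'] = 'http://192.0.2.5:3128'

proxies = getproxies()
# The result is a dictionary keyed by scheme, for example 'http'.
print(proxies.get('http'))
```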

Example - obtain Spamhaus data

Here is sample code and output to pull in data from Spamhaus

import urllib2
response = urllib2.urlopen('http://www.spamhaus.org/drop/drop.txt')
html = response.read()
print html

Output

; Spamhaus DROP List 07/12/12 - (c) 2012 The Spamhaus Project
; Last-Modified: Tue, 10 Jul 2012 22:45:16 GMT
; Expires: Fri, 13 Jul 2012 02:38:21 GMT
2.56.0.0/14 ; SBL102988
14.192.0.0/19 ; SBL123577
14.192.48.0/21 ; SBL131019
(etc)

This is very good because we can use this in our ip2C.py script to get the data directly from the internet.
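Before wiring the download into the script, the text processing can be tried out on a fixed sample, with no network access needed. The sketch below combines the comment splitting and the strict IP pattern from earlier in this article and applies them to the sample output shown above; the names IP_PATTERN and addresses_in are invented for illustration.

```python
import re

# Sample text copied from the output shown above.
text = """; Spamhaus DROP List 07/12/12 - (c) 2012 The Spamhaus Project
; Last-Modified: Tue, 10 Jul 2012 22:45:16 GMT
; Expires: Fri, 13 Jul 2012 02:38:21 GMT
2.56.0.0/14 ; SBL102988
14.192.0.0/19 ; SBL123577
14.192.48.0/21 ; SBL131019
"""

IP_PATTERN = re.compile(r'\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}'
                        r'(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b')


def addresses_in(text):
    """Return every valid-looking IP address found outside the comments."""
    found = []
    for line in text.splitlines():
        # Keep only what comes before the first comment marker.
        data = re.split(';|#|//', line)[0]
        found.extend(IP_PATTERN.findall(data))
    return found


print(addresses_in(text))
```

This prints the three network addresses 2.56.0.0, 14.192.0.0 and 14.192.48.0. The prefix length after the slash is discarded, so each network is represented only by its first address.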

It is important to gracefully deal with errors. Python will trap errors using the 'try' keyword. Once we trap an error, it would be a good idea to save the error somewhere. However, for now, let's just report this to the standard error channel.

Sample error handling code

#!/usr/bin/env /c:/Python27/python
import sys, urllib2
someurl='http://www.blocklist.de/lists/all.txt'
req = urllib2.Request(someurl)
try:
    response = urllib2.urlopen(req)
except urllib2.URLError as e:
    # HTTPError is a subclass of URLError that carries the HTTP status
    # in 'code', so check for that first.
    if hasattr(e, 'code'):
        sys.stderr.write('The server couldn\'t fulfill the request.\n')
        sys.stderr.write('Error code: %s\n' % e.code)
    elif hasattr(e, 'reason'):
        sys.stderr.write('We failed to reach a server.\n')
        sys.stderr.write('Reason: %s\n' % e.reason)
else:
    print 'everything is fine'

Putting it all together

A fully working, copyrighted, licensed script is available called ip2Cdir.py from this link.

https://docs.google.com/open?id=0BxOqkVO4vQsqVlprT2xjM0NiZ28

You may use it with attribution of the author (me), and of course the maxmind database. See the comments in the file for details.

Run it from the command line on a Windows system with interactive Python installed in the default directory and also with the pygeoip python module installed. You will need internet access and either no proxy, or one that does not require authentication.

To run open a command prompt and type

python ip2Cdir.py

(Assuming your path variable will find python, and the script is in the current directory).

If you have any problems - hit the comment section below. I can't promise a fast response as I probably won't have time to maintain the script. After all, this was just done on a day off feeling sick. I don't get that kind of time often.

#!/usr/bin/env /c:/Python27/python
#
# "This product includes GeoLite data created by MaxMind, available from http://www.maxmind.com"
#
# The first line works for 'interactive Python' as installed on Windows.
# For unix/linux/mac etc this line will need to be changed to suit.
#
# This script pulls a list of IP addresses (Assumed one per line)
# from the internet at the specified variable 'someurl' below.
# Then it uses an IP address to country database to print a
# csv list containing
# <IP> , <"Country"> , <"City">
#
# The script may be used to obtain other fields indexed by any of:
#
# country_name
# city
# longitude
# latitude
# time_zone
# metro_code
# country_code
# country_code3
# postal_code
#
# To modify the output, change the output format string
#
#
# obtain and install the module pygeoip before using this script.
# http://code.google.com/p/pygeoip/downloads/list
# ( download pygeoip-0.2.2.tar.gz and run the Python install script )
#
# obtain GeoLiteCity.dat on a regular basis and put it into C:\GeoIP
# http://geolite.maxmind.com/download/geoip/database/GeoLiteCity.dat.gz
#
#
# Author: Jeremy Lee 12/07/2012 version 0.1 beta
#
# License: Use for any purpose under the condition that the Author is
# referenced as above, and the maxmind attribution remains intact.
#
#
# potential enhancements:
# 1. Use a .ini file for configuration.
# 2. Provide a GUI to configure the .ini file
# 3. Detect the presence and date of the geoIP data and install automatically
# 4. Make an installer script
# 5. Merge multiple sources of potentially malicious IP address lists.
# 6. Write the output to a file
# 7. Update the output file on a periodic basis
# 8. Parameterise the output based on .ini information using a format string
#
#

import pygeoip, sys, urllib2, re


# The r prefix makes this a raw string so the backslashes are taken literally
geo = pygeoip.GeoIP(r'C:\GeoIP\GeoLiteCity.dat')

#
#
# Choose a source of malicious IP addresses here
#
#someurl='http://www.blocklist.de/lists/all.txt'
someurl='http://www.spamhaus.org/drop/drop.txt'

# Pull in the data from the internet
req = urllib2.Request(someurl)
# Try to open it and report any errors to stderr
try:
    response = urllib2.urlopen(req)
except urllib2.URLError as e:
    # HTTPError is a subclass of URLError that carries the HTTP status
    # in 'code', so check for that first.
    if hasattr(e, 'code'):
        sys.stderr.write('The server couldn\'t fulfill the request.\n')
        sys.stderr.write('Error code: %s\n' % e.code)
    elif hasattr(e, 'reason'):
        sys.stderr.write('We failed to reach a server.\n')
        sys.stderr.write('Reason: %s\n' % e.reason)
    # At this point, the script terminates.
else:
    # At this point, we have the internet data in 'response'
    while True:
        line = response.readline()
        if not line:
            # readline() returns an empty string only when there is no more
            # data in 'response', so a blank line does not end the loop.
            break
        s = line.strip()
        # This next line splits the row where we find delimiters that are used as comments.
        # Presently, a semi-colon, hash or two slashes are considered comments. You can
        # add more if needed.
        r = re.split(';|#|//', s)
        # Now we only use the first element in the split list as data. If the comment
        # was at the beginning, then this element is empty.
        if r[0] != '':
            # The data is not empty - so look for a valid ip address
            ip = re.findall(r'\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b', r[0])
            # Skip the line if no IP address was found
            if not ip:
                continue
            rec = geo.record_by_addr(ip[0])
            # If a record was found, then 'rec' is True and we can continue
            if rec:
                # Modify the following output format as needed
                print '{},\"{}\",\"{}\"'.format(ip[0], rec['country_name'], rec['city'])

