ArtsAutosBooksBusinessEducationEntertainmentFamilyFashionFoodGamesGenderHealthHolidaysHomeHubPagesPersonal FinancePetsPoliticsReligionSportsTechnologyTravel

Benchmarking Python, Cython, and Cython+MPI

Updated on July 12, 2012

Introduction

For better or worse, I'm primarily a Python programmer. However, I occasionally run into a severe need for speed. Luckily, I'm not alone and many great features and libraries are available to help get my code "closer to metal". In this article, I'll detail the performance of pure Python, Cython, and distributed Cython (via mpi4py) in numerical processing. The machine I'm running on is a Toshiba laptop with a quad-core i7 (8 threads).

The Test: Naïve Numerical Integration

To test the approaches, I'll calculate the Riemann integral of sin(x2) over [0,1]. I'll vary the number of sub-intervals from 10,000,000 to 100,000,000.

First Approach: Pure Python

For the base case, we'll use pure Python with the math module's sine function. Code is shown below.

Pure Python Implementation

import time, math

def f(x):
	return math.sin(x**2)

def integrate(a,b,N):
	s = 0
	dx = (b-a)/float(N)
	for i in range(N):
		s += f(a + i*dx)
	return s

for n in range(10000000,100000001,10000000):
	start = time.time()
	integrate(0,1,n)
	tme = time.time() - start
	print '%s,%s'%(n, tme)

Second Approach: Cython

Cython is an excellent and simple method for writing and calling C-functions in Python. Even if you have never programmed C before, you should be able to pick up the basics quickly and immediate add value to your more computationally intense tasks (assuming they can't be solved via Numpy).

To implement, I wrote a "a.pyx" file with C definitions and directly call C's math library. To build a Python-accessible module, I've supplied a setup.py file. To compile, run the following at the command-line:

python setup.py build_ext --inplace

Cython File, a.pyx

cdef extern from "math.h":
	double sin(double)

cdef double f(double x) except *:
	return sin(x**2)

def integrate(double a, double b, int N):
	cdef int i
	cdef double s, dx
	s = 0
	dx = (b-a)/N
	for i in range(N):
		s += f(a + i*dx)
	return s*dx

Cython Build File, setup.py

from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

ext_modules = [Extension("a", ["a.pyx"])]

setup(
name = 'Cython Example',
cmdclass = {'build_ext': build_ext},
ext_modules = ext_modules
)

Cython Testing Script

import a as App # integration is in "a.so"
import time

for n in range(10000000,100000001,10000000):
	start = time.time()
	v = App.integrate(0,1,n)
	tme = time.time() - start
	print '%s,%s,%s'%(n, tme, v)

Third Approach: Cython with MPI

MPI stands for Message Passing Interface and is a standard library for running parallel tasks on high performance clusters and supercomputers. In this case, I'll use it to distribute the integration workload across 7 threads on my laptop. I'll use one thread as the "master"; the seven "slaves" will carry out integration over a sub-interval of the domain.

The specific MPI Python library to be used is called mpi4py, written by Lisandro Dalcin. MPI is very powerful, but involves a significant amount of I/O overhead, so we should expect relative performance to improve as the work volume increases.

Cython + MPI: Master Script

import time, sys, numpy
from mpi4py import MPI

PROC = 7 # seven slaves

for n in range(10000000,100000001,10000000):

	start = time.time()

	# we include the establishment/closing of connections within the timer
	comm = MPI.COMM_SELF.Spawn(sys.executable, args=['slave.py'], maxprocs=PROC)
	N = numpy.array(int(n/PROC+.5), 'i')
	comm.Bcast([N, MPI.INT], root=MPI.ROOT)
	AREA = numpy.array(0.0, 'd')
	comm.Reduce(None, [AREA, MPI.DOUBLE], op=MPI.SUM, root=MPI.ROOT)
	comm.Disconnect()

	tme = time.time() - start

	print '%s,%s'%(n, tme)

Cython + MPI: Slave Script

#!/usr/bin/env python
from mpi4py import MPI
import a as App # again, we use Cython integration from "a.so"
import numpy

comm = MPI.Comm.Get_parent()
size = comm.Get_size()
rank = comm.Get_rank()
N = numpy.array(0, dtype='i')
comm.Bcast([N, MPI.INT], root=0)
AREA = numpy.array(App.integrate(rank*0.1428571, (rank+1)*0.1428571, N), dtype='d')
comm.Reduce([AREA, MPI.DOUBLE], None, op=MPI.SUM, root=0)
comm.Disconnect()

Performance Comparison

Performance Comparison - Log Time

Analysis & Conclusion

We can see from the figures above that pure Python is, predictably, the slowest. For a little more effort, we can implement via Cython and gain a massive performance increase (a full order of magnitude). Finally, for large interval values, message passing can squeeze out even more performance. However, for smaller problems, the overhead introduced by setting up communication can be expensive.

Overall, while clearly a toy problem, I hope this illustrates how versatile Python can be and some of the lesser known tools for boosting performance.

Comments

    0 of 8192 characters used
    Post Comment

    • electronician profile image

      Dean Walsh 

      4 years ago from Birmingham, England

      Bookmarked for future reference - I'm still learning Python so I don't want to have to learn C at the same time, but I know I'm going to end up needing to improve the performance of Python code at some point.

    working

    This website uses cookies

    As a user in the EEA, your approval is needed on a few things. To provide a better website experience, hubpages.com uses cookies (and other similar technologies) and may collect, process, and share personal data. Please choose which areas of our service you consent to our doing so.

    For more information on managing or withdrawing consents and how we handle data, visit our Privacy Policy at: https://hubpages.com/privacy-policy#gdpr

    Show Details
    Necessary
    HubPages Device IDThis is used to identify particular browsers or devices when the access the service, and is used for security reasons.
    LoginThis is necessary to sign in to the HubPages Service.
    Google RecaptchaThis is used to prevent bots and spam. (Privacy Policy)
    AkismetThis is used to detect comment spam. (Privacy Policy)
    HubPages Google AnalyticsThis is used to provide data on traffic to our website, all personally identifyable data is anonymized. (Privacy Policy)
    HubPages Traffic PixelThis is used to collect data on traffic to articles and other pages on our site. Unless you are signed in to a HubPages account, all personally identifiable information is anonymized.
    Amazon Web ServicesThis is a cloud services platform that we used to host our service. (Privacy Policy)
    CloudflareThis is a cloud CDN service that we use to efficiently deliver files required for our service to operate such as javascript, cascading style sheets, images, and videos. (Privacy Policy)
    Google Hosted LibrariesJavascript software libraries such as jQuery are loaded at endpoints on the googleapis.com or gstatic.com domains, for performance and efficiency reasons. (Privacy Policy)
    Features
    Google Custom SearchThis is feature allows you to search the site. (Privacy Policy)
    Google MapsSome articles have Google Maps embedded in them. (Privacy Policy)
    Google ChartsThis is used to display charts and graphs on articles and the author center. (Privacy Policy)
    Google AdSense Host APIThis service allows you to sign up for or associate a Google AdSense account with HubPages, so that you can earn money from ads on your articles. No data is shared unless you engage with this feature. (Privacy Policy)
    Google YouTubeSome articles have YouTube videos embedded in them. (Privacy Policy)
    VimeoSome articles have Vimeo videos embedded in them. (Privacy Policy)
    PaypalThis is used for a registered author who enrolls in the HubPages Earnings program and requests to be paid via PayPal. No data is shared with Paypal unless you engage with this feature. (Privacy Policy)
    Facebook LoginYou can use this to streamline signing up for, or signing in to your Hubpages account. No data is shared with Facebook unless you engage with this feature. (Privacy Policy)
    MavenThis supports the Maven widget and search functionality. (Privacy Policy)
    Marketing
    Google AdSenseThis is an ad network. (Privacy Policy)
    Google DoubleClickGoogle provides ad serving technology and runs an ad network. (Privacy Policy)
    Index ExchangeThis is an ad network. (Privacy Policy)
    SovrnThis is an ad network. (Privacy Policy)
    Facebook AdsThis is an ad network. (Privacy Policy)
    Amazon Unified Ad MarketplaceThis is an ad network. (Privacy Policy)
    AppNexusThis is an ad network. (Privacy Policy)
    OpenxThis is an ad network. (Privacy Policy)
    Rubicon ProjectThis is an ad network. (Privacy Policy)
    TripleLiftThis is an ad network. (Privacy Policy)
    Say MediaWe partner with Say Media to deliver ad campaigns on our sites. (Privacy Policy)
    Remarketing PixelsWe may use remarketing pixels from advertising networks such as Google AdWords, Bing Ads, and Facebook in order to advertise the HubPages Service to people that have visited our sites.
    Conversion Tracking PixelsWe may use conversion tracking pixels from advertising networks such as Google AdWords, Bing Ads, and Facebook in order to identify when an advertisement has successfully resulted in the desired action, such as signing up for the HubPages Service or publishing an article on the HubPages Service.
    Statistics
    Author Google AnalyticsThis is used to provide traffic data and reports to the authors of articles on the HubPages Service. (Privacy Policy)
    ComscoreComScore is a media measurement and analytics company providing marketing data and analytics to enterprises, media and advertising agencies, and publishers. Non-consent will result in ComScore only processing obfuscated personal data. (Privacy Policy)
    Amazon Tracking PixelSome articles display amazon products as part of the Amazon Affiliate program, this pixel provides traffic statistics for those products (Privacy Policy)