Writing numerical code using C and Python

2009-11-28

The main requirement for numerical code is that it must be fast. This means that numerical code is usually written in some “low-level” language such as C or Fortran. Such code can be highly optimized but low-level languages are not very expressive (where expressiveness is defined as the inverse of the number of lines of code that is necessary in order to express a program). Thus they are not very well suited to other tasks, such as data I/O, command line argument parsing, graphics, and user interaction that are peripheral to, but nevertheless important for, the computation and where speed does not play the primary role.

C++ is being often used in numerical programming since, due to its multi-paradigm nature, it is expressive enough to support high-level tasks (a library like Boost can be extremely helpful here) while at the same time it can produce highly optimized code. Despite these benefits, C++'s complexity, static typing, and the edit-compile-link cycle create severe impediments to exploratory programming.

Exploratory programming is possible in dynamic (often called “scripting”) languages such as Perl, Python, and Ruby. Such languages offer great, easy to use libraries for high-level tasks. Moreover they offer very expressive language constructs and a rich repertoire of built-in types (lists, dictionaries, etc.) that make exploratory programming not only possible but even a joy. Of course the problem with using such languages for numerical programming is that they are abysmally slow.

But since we have languages that are very well suited to low-level tasks and languages that are very well suited to high-level tasks why not combine them and program each task in the appropriate language? This is a reasonable idea but its success or failure in the real world depends strongly on how easily this can be done. Python has had for several years a number of solutions in this field. At one side the official, but rather low-level, Python/C API. At the other side projects like SWIG and SIP that help in writing Python bindings to C libraries. Although they work, and I have used SWIG in some of my previous projects, I was never quite satisfied.

A recent addition to this field is ctypes which is the easiest way to write Python bindings to low-level (C or Fortran) libraries. The bindings are actually written in Python so we get all the benefits of using a beautiful, expressive language. Using ctypes has changed the way I write numerical code. Usually, I would just write the whole thing in C++ and learn to accept the pain. Now, I start by writing the most computationally expensive part of the code, usually the ODE integrator, in C. Then I build my programs in Python and I call the integrator through ctypes. Since I write the programs in Python I enjoy all the benefits that the language provides. After I am satisfied that things work correctly, I may translate some other computationally intensive parts of the code to C and call them from my Python program through ctypes. This type of development allows me to test several ideas during an exploratory phase and then optimize for speed by rewriting the critical parts of the code in C. This method of building programs through successive rounds of experimentation and optimization is extremely productive.

Writing the bindings is actually trivial. For example, the C function

double hatom_2d_energy(double x[4]);

which has been compiled into the dynamic library libhatom_2d.dylib, can be imported in a Python program with the following code

from ctypes import *
lib = CDLL("libhatom_2d.dylib")
point_t = c_double * 4
lib.hatom_2d_energy.argtypes = [point_t]
lib.hatom_2d_energy.restype = c_double

and then called as

x = [0.0, 0.12, 1.1, 0.2]
print lib.hatom_2d_energy(point_t(*x))

The binding itself is defined through the two member variables argtypes and restype. The whole code is still ugly, but it can be hidden in a module and a wrapper function can be defined which is then imported from the module. The only real problem with ctypes is that argtypes and restype must be defined for each C function to be used in the Python program. This can be a problem for writing bindings to a C library with a few hundred functions but it´s not a problem I had to face.