Monday, November 8, 2010

Getting Started With Cython

Getting Started With Cython



Quote about Cython:

Andrew Tipton says "I'm honestly never going back to writing C again. Cython gives me all the expressiveness of Python combined with all the performance and close-to-the-metal-godlike-powers of C. I've been using it to implement high-performance graph traversal and routing algorithms and to interface with C/C++ libraries, and it's been an absolute amazing productivity boost."  Yep.


Cython has two major use cases

  1. Extending the CPython interpreter with fast compiled modules,
  2. Interfacing Python code with external C/C++ libraries.

Cython supports type declarations

  1. For changing code from having dynamic Python semantics into having static-and-fast (but less generic) C semantics.
  2. Directly manipulating C data types defined in external libraries.

Tutorial: Building Your First Cython Code by Hand

It happens in two stages:
  1. A .pyx file is compiled by Cython to a .c or .cpp file.
  2. The .c or .cpp file is compild by a C compiler (such as GCC) to a .so file.
Let's try it now:
First, create a file sum.pyx that contains the following code (see this directory for original code files):

def sum_cython(long n):
    cdef long i, s = 0
    for i in range(n):
        s += i
    return s
Then use Cython to compile it:


Since we're using Sage, you can do

bash$ sage -cython sum.pyx
bash$ ls
sum.c  sum.pyx
Notice the new file sum.c.  Compile it with gcc as follows on OS X: 

bash$ sage -sh
bash$ gcc -I$SAGE_ROOT/local/include/python2.6 -bundle -undefined dynamic_lookup sum.c -o sum.so 

On Linux, do:

bash$ sage -sh
bash$ gcc -I$SAGE_ROOT/local/include/python2.6 -shared -fPIC sum.c -o sum.so
Finally, try it out.


You must run Sage from the same directory that contains the file sum.so. When you type import sum below, the Python interpreter sees the file sum.so, opens it, and it contains functions and data that define a compiled "Python C-extension module", so Python can load it (like it would like a module like sum.py).

bash$ sage
-------------------------------------------------
| Sage Version 4.6, Release Date: 2010-10-30     
| Type notebook() for the GUI, and license() for
-------------------------------------------------
sage: import sum
sage: sum.sum_cython(101)
5050
sage: timeit('sum.sum_cython(101)')
625 loops, best of 3: 627 ns per loop
sage: timeit('sum.sum_cython(101)', number=10^6)    # better quality timing
1000000 loops, best of 3: 539 ns per loop

Finally, take a look at the (more than 1000 line) autogenerated C file sum.c:
bash$ wc -l sum.c
    1178 sum.c
bash$ less sum.c
...

Notice code like this, which illustrates that Cython generates code that supports both Python2 and Python3:


#if PY_MAJOR_VERSION < 3
  #define __Pyx_BUILTIN_MODULE_NAME "__builtin__"
#else
  #define __Pyx_BUILTIN_MODULE_NAME "builtins"
#endif

#if PY_MAJOR_VERSION >= 3
  #define Py_TPFLAGS_CHECKTYPES 0
  #define Py_TPFLAGS_HAVE_INDEX 0
#endif
The official Python docs say: "If you are writing a new extension module, you might consider Cython. It translates a Python-like language to C. The extension modules it creates are compatible with Python 3.x and 2.x."  

If you scroll down further you'll get past the boilerplate and see the actual code:

...
  /* "/Users/wstein/edu/2010-2011/581d/notes/2010-11-08/sum.pyx":2
 * def sum_cython(long n):
 *     cdef long i, s = 0             # <<<<<<<<<<<<<<
 *     for i in range(n):
 *         s += i
 */
  __pyx_v_s = 0;

  /* "/Users/wstein/edu/2010-2011/581d/notes/2010-11-08/sum.pyx":3
 * def sum_cython(long n):
 *     cdef long i, s = 0
 *     for i in range(n):             # <<<<<<<<<<<<<<
 *         s += i
 *     return s
 */
  __pyx_t_1 = __pyx_v_n;
  for (__pyx_t_2 = 0; __pyx_t_2 < __pyx_t_1; __pyx_t_2+=1) {
    __pyx_v_i = __pyx_t_2;

    /* "/Users/wstein/edu/2010-2011/581d/notes/2010-11-08/sum.pyx":4
 *     cdef long i, s = 0
 *     for i in range(n):
 *         s += i             # <<<<<<<<<<<<<<
 *     return s
 */
    __pyx_v_s += __pyx_v_i;
  }
...
There is a big comment that shows the original Cython code with context and a little arrow pointing at the current line (these comment blocks with context were I think the first thing I personally added to Pyrex... before, it just gave that first line with the .pyx filename and line number, but nothing else).  Below that big comment, there is the actual C code that Cython generates.  For example, the Cython code  s += i is turned into the C code __pyx_v_s += __pyx_v_i;.  


The Same Extension From Scratch, for Comparison

If you read Extending and Embedding Python you'll see how you could write a C extension module from scratch that does the same thing as sum.so above. Let's see what this is like, for comparison. Given how simple sum.pyx is, this isn't so hard. When creating more complicated Cython code---e.g., new extension classes, more complicated type conversions, and memory management---writing C code directly quickly becomes unwieldy.
First, create a file sum2.c as follows:


#include <Python.h>

static PyObject * 
sum2_sum_c(PyObject *self, PyObject *n_arg)
{
    long i, s=0, n = PyInt_AsLong(n_arg);
    
    for (i=0; i<n; i++)  {
 s += i;
    }
    PyObject* t = PyInt_FromLong(s);
    return t;
}

static PyMethodDef Sum2Methods[] = {
    {"sum_c", sum2_sum_c, METH_O, "Sum the numbers up to n."},
    {NULL, NULL, 0, NULL} /* Sentinel */
};

PyMODINIT_FUNC
initsum2(void)
{
    PyObject *m;
    m = Py_InitModule("sum2", Sum2Methods);
}
Now compile and run it as before: 
bash$ sage -sh
bash$ gcc -I$SAGE_ROOT/local/include/python2.6 -bundle -undefined dynamic_lookup sum2.c -o sum2.so 
bash$ sage
...
sage: import sum2
sage: sum2.sum_c(101)
5050
sage: import sum
sage: sum.sum_cython(101)
5050
sage: timeit('sum.sum_cython(1000000r)')
125 loops, best of 3: 2.54 ms per loop
sage: timeit('sum2.sum_c(1000000r)')
125 loops, best of 3: 2.03 ms per loop
Note that this is a little faster than the corresponding Cython code. This is because the Cython code is more careful, checking various error conditions, etc.  

Note that the C code is 5 times as long as the Cython code.

Building Extensions using Setuptools Instead

In nontrivial projects, the Cython step of transforming your code from .pyx to .c is typically done by explicitly calling cython somehow (this will change in the newest version of Cython), but the step of running the C compiler is usually done using either distutils or setuptools. To use the tools, one creates a file "setup.py" which defines the extensions in your project, and Python itself then runs a C compiler for you, with the proper options, includes paths, etc.

Let's create a new setuptools project that includes the sum and sum2 extensions that we defined above. First, create the following file and call it setup.py. This should be in the same directory as sum.c and sum2.c.
from setuptools import setup, Extension

ext_modules = [
    Extension("sum", ["sum.c"]),
    Extension("sum2", ["sum2.c"])
    ]

setup(
    name = 'sum',
    version = '0.1',
    ext_modules = ext_modules)
Then type 
bash$ rm *.so  # make sure something happens
bash$ sage setup.py develop
...
bash$ ls *.so
sum.so sum2.so


Notice that running
setup.py develop

resulted in Python generating the right gcc commmand lines for your platform. You don't have to do anything differently on Linux, OS X, etc.

If you change sum2.c, and want to rebuild it, just type sage setup.py develop again to rebuild sum2.so If you change sum.pyx, you have to manually run Cython:
sage -cython sum.pyx

then again do sage setup.py develop to rebuild sum.so. Try this now. In sum.pyx, change
for i in range(n):

to
for i in range(1,n+1):

then rebuild:
 
bash$ sage -cython sum.pyx
...
bash$ sage setup.py develop
...
bash$ sage
...
sage: import sum
sage: sum.sum_cython(100)
5050

There are ways to make setup.py automatically notice when sum.pyx changes, and run Cython. A nice implementation of this will be in the next Cython release. See the setup.py and build_system.py files of Purple sage for an example of how to write a little build system write now (before the new version of Cython).

An Automated Way to Experiment

Given any single Cython file such as sum.pyx, in Sage you can do
sage: load sum.pyx
Compiling sum.pyx...
sage: sum_cython(100)
5050
Behind the scenes, Sage created a setup.py file, ran Cython, made a new module, compiled it, and imported everything it defines into the global namespace.   If you look in the spyx subdirectory of the directory listed below, before you exit Sage (!), then you'll see all this. 
sage: SAGE_TMP
'/Users/wstein/.sage//temp/deep.local/14837/'

You can also do
sage: attach sum.pyx

Then every time sum.pyx changes, Sage will notice this and reload it. This can be useful for development of small chunks of Cython code.

You can also use the Sage notebook, and put %cython as the first line of a notebook cell. The rest of the cell will be compiled exactly as if it were written to a .pyx file and loaded as above. In fact, that is almost exactly what happens behind the scenes.

Next Time

Now that we understand at a reasonably deep level what Cython really is and does, it is time to learn about the various constructs of the language:
  1. How to create extension classes using Cython.
  2. How to call external C/C++ library code.

We will rewrite our sum.pyx file first to use a class. Then we'll rewrite it again to make use of the MPIR (or GMP) C library for arithmetic, and again to make use of the C++ NTL library.