Numba support

Warning

This is an experimental feature, available starting with release 2.4.0. It is still incomplete (see listing below) and has only been tested on Linux on x86_64.

Numba is a JIT compiler for Python functions that can be statically typed based on their input arguments. Since C++ objects are always statically typed and already implemented at the machine level, they can be dynamically integrated into the Numba type tracing and lowering by exposing type details through C++ reflection at runtime.

JIT-compiling traces of mixed Python/bound C++ code reduces (or even removes) the overhead of boxing/unboxing native data into their Python proxies and vice versa. It can also remove temporaries, especially for template expressions. Thus, there can be a significant speedup for mixed code, beyond the compilation of Python code itself. Note that the current implementation integrates compiled C++ into the intermediate representation (IR) as generated by Numba. A future version may integrate C++ (Cling-generated) IR with Numba IR, if the C++ code is exposed from (precompiled) header files, thus allowing inlining of C++ code into Numba traces, for further expected speedups.

Usage

cppyy exposes extension hooks that automatically make it available to Numba. If, however, you do not use setuptools (through pip or otherwise), you can load the extensions explicitly:

>>> import cppyy.numba_ext
>>>

After that, Numba is able to trace cppyy bound code. Note that the numba_ext module will register only the proxy base classes, to keep overheads to a minimum. The pre-registered base classes will, lazily and automatically, register further type information for the concrete classes and overloads on actual use in Numba traces, not from other uses.

Examples

The following, non-exhaustive, set of examples gives an idea of the current level of support. The overall goal, however, is to be able to use cppyy bound code in Numba traces in the same as it can be used in normal Python code.

C++ free (global) functions can be called and overloads will be selected, or a template will be instantiated, based on the provided types, assuming all types match explicitly (thus e.g. typedefs, implicit conversions, and default arguments are not yet supported). Example:

>>> import cppyy
>>> import numba
>>> import numpy as np
>>>
>>> cppyy.cppdef("""
... template<typename T>
... T square(T t) { return t*t; }
... """)
True
>>> @numba.jit(nopython=True)
... def tsa(a):
...     total = type(a[0])(0)
...     for i in range(len(a)):
...         total += cppyy.gbl.square(a[i])
...     return total
...
>>> a = np.array(range(10), dtype=np.float32)
>>> print(tsa(a))
285.0
>>> a = np.array(range(10), dtype=np.int64)
>>> print(tsa(a))
285
>>>

Instances of C++ classes can be passed into Numba traces. They can be returned from functions called _within_ the trace, but can not yet be returned _from_ the trace. Their public data is accessible (read-only) if of builtin type and their public methods can be called, for which overload selection works. Example:

>>> import cppyy
>>> import numba
>>> import numpy as np
>>>
>>> cppyy.cppdef("""\
... class MyData {
... public:
...     MyData(int i, int j) : fField1(i), fField2(j) {}
...
... public:
...     int get_field1() { return fField1; }
...     int get_field2() { return fField2; }
...
...     MyData copy() { return *this; }
...
... public:
...     int fField1;
...     int fField2;
... };""")
True
>>> @numba.jit(nopython=True)
>>> def tsdf(a, d):
...     total = type(a[0])(0)
...     for i in range(len(a)):
...         total += a[i] + d.fField1 + d.fField2
...     return total
...
>>> d = cppyy.gbl.MyData(5, 6)
>>> a = np.array(range(10), dtype=np.int32)
>>> print(tsdf(a, d))
155
>>> # example of method calls
>>> @numba.jit(nopython=True)
>>> def tsdm(a, d):
...     total = type(a[0])(0)
...     for i in range(len(a)):
...         total += a[i] +  d.get_field1() + d.get_field2()
...     return total
...
>>> print(tsdm(a, d))
155
>>> # example of object return by-value
>>> @numba.jit(nopython=True)
>>> def tsdcm(a, d):
...     total = type(a[0])(0)
...     for i in range(len(a)):
...         total += a[i] + d.copy().fField1 + d.get_field2()
...     return total
...
>>> print(tsdcm(a, d))
155
>>>

Performance

The main overhead of JITing Numba traces is in Numba itself; optimization of the IR and assembly by the backend plays a much smaller role. The use of bound C++ does not change that, since its introspection by and large relies on the same mechanisms as that of Python code. For example, it takes the same amount of wall clock time to JIT a trace using Numba’s included math functions (from module math or numpy) as one using C++ bound ones whether from the standard library or templated versions from e.g. Eigen. Use of very complex template expressions may change this balance, but in principle, wherever it makes sense in the first place to use Numba JITing, it is also fine, performance-wise, to use cppyy bound C++ inside the trace.

A second important overhead is in unboxing Python proxies of C++ objects, in particular when passed as an argument to a Numba-JITed function. The main costs are in the lookup (types are matched at every invocation) and to a lesser extent the subsequent copying of the instance data. Thus, functions that take a C++ object as an argument will require more time spent in the function body for JITing to be worth it than functions that do not.

The current implementation invokes C++ callables through function pointers and accesses data through offsets calculations from the object’s base address. A future implementation will be able to inline C++ into the Numba trace if code is available in headers files or was JITed.