<h1>Compiler Documentation</h1>
<p>
By Eric Jones eric@enthought.com
<p>
<h2>Outline</h2>
<dl>
<dd> <A href="#Introduction">Introduction</a>
<dd> <A href="#Requirements">Requirements</a>
<dd> <A href="#Installation">Installation and Testing</a>
<dd> <A href="#Inline">Inline</a>
    <dl>
    <dd><A href="#More with printf">More with printf</a>
    <dd>
    <A href="#More examples">More examples</a>
        <dl>
        <dd><A href="#Binary search">Binary search</a>
        <dd><A href="#Dictionary sort">Dictionary sort</a>
        <dd><A href="#Numeric -- cast/copy/transpose">Numeric -- cast/copy/transpose</a>
        <dd><A href="#wxPython">wxPython</a></dd>
        </dl>
    <dd><A href="#Keyword Options">Keyword options</a>
    <dd><A href="#Returning Values">Returning values</a>
        <dl>
        <dd><A href="#The issue with locals()">
            The issue with <code>locals()</code></a></dd>
        </dl>
    <dd><A href="#inline_quick_look_at_code">A quick look at the code</a>
    <dd>
    <A href="#inline_technical_details">Technical Details</a>
        <dl>
        <dd><A href="#Converting Types">Converting Types</a>
            <dl>
            <dd><A href="#inline_numeric_argument_conversion">
                Numeric Argument Conversion</a>
            <dd><A href="#inline_python_argument_conversion">
                String, List, Tuple, and Dictionary Conversion</a>
            <dd><A href="#inline_callable_argument_conversion">File Conversion</a>
            <dd><A href="#inline_callable_argument_conversion">
                Callable, Instance, and Module Conversion</a>
            <dd><A href="#Customizing Conversions">Customizing Conversions</a>
            </dl>
        <dd><A href="#Compiling Code">Compiling Code</a>
        <dd><a href="#The Catalog">"Cataloging" functions</a>
            <dl>
            <dd><a href="#function storage">Function Storage</a>
            <dd><a href="#PYTHONCOMPILED">The PYTHONCOMPILED environment variable</a></dd>
            </dl>
        </dd>
        </dl>
    </dd>
    </dl>
<dd><A href="#Blitz">Blitz</a>
    <dl>
    <dd><a href="#blitz_requirements">Requirements</a>
    <dd><a href="#blitz_limitations">Limitations</a>
    <dd><a href="#Numeric Efficiency">Numeric Efficiency Issues</a>
    <dd><a href="#blitz_tools">The Tools</a>
        <dl>
        <dd><a href="#blitz_parser">Parser</a>
        <dd><a href="#blitz_blitz">Blitz and Numeric</a>
        </dl>
    <dd><a href="#blitz_type_conversions">Type definitions and coercion</a>
    <dd><a href="#blitz_catalog">Cataloging Compiled Functions</a>
    <dd><a href="#blitz_array_sizes">Checking Array Sizes</a>
    <dd><a href="#blitz_extension_module">Creating the Extension Module</a>
    </dl>
<dd> <a href="#Extension Modules"> Extension Modules</a>
    <dl>
    <dd><a href="#A Simple Example">A Simple Example</a>
    <dd><a href="#Fibonacci Example">Fibonacci Example</a>
    </dl>
<dd> <a href="#Type Factories"> Customizing Type Conversions -- Type Factories (not written)</a>
    <dl>
    <dd>Type Specifications
    <dd>Type Information
    <dd>The Conversion Process
    </dl>
</dl>
<a name="Introduction"></a>
<h1>Introduction</h1>

<p>
The <code>compiler</code> package allows the inclusion of C/C++ code within
Python code. This offers both another level of optimization to those who need
it, and an easy way to modify and extend any supported extension libraries such
as wxPython and hopefully VTK soon. Inlining C/C++ code within Python generally
results in speed-ups of 1.5x to 30x over algorithms written in pure Python
(however, it is also possible to slow things down...). Generally, algorithms
that require a large number of calls to the Python API don't benefit as much
from the conversion to C/C++ as algorithms that have inner loops completely
convertible to C.
<p>
There are three basic ways to use <code>compiler</code>. The
<code>compiler.inline()</code> function executes C code directly within Python,
and <code>compiler.blitz()</code> translates Python Numeric expressions to C++
for fast execution. This was the original functionality for which
<code>compiler</code> was built. For those interested in building extension
libraries, the <code>ext_tools</code> module provides classes for building
extension modules within Python. (A quick sketch of <code>inline</code> and
<code>blitz</code> in action closes out this introduction.)
<p>
Most of <code>compiler's</code> functionality should work on Windows and Unix,
although some of its functionality requires <code>gcc</code> or a similarly
modern C++ compiler that handles templates well. Up to now, most testing has
been done on Windows 2000 with Microsoft's C++ compiler (MSVC) and with gcc
(mingw32 2.95.2 and 2.95.3-6). All tests also seem to pass on Linux (RH 7.1
with gcc 2.96).
<p>
The <code>inline</code> and <code>blitz</code> functions provide new
functionality to Python (although I've recently learned about the <a
href="http://pyinline.sourceforge.net/">PyInline</a> project, which may offer
similar functionality to <code>inline</code>). On the other hand, tools for
building Python extension modules already exist (SWIG, SIP, pycpp, CXX, and
others). As of yet, I'm not sure where <code>compiler</code> fits in this
spectrum. It is closest in flavor to CXX in that it makes creating new C/C++
extension modules pretty easy. However, if you're wrapping a gaggle of legacy
functions or classes, SWIG and friends are definitely the better choice.
<code>compiler</code> is set up so that you can customize how Python types are
converted to C types. This is great for <code>inline()</code>, but, for
wrapping legacy code, it is generally better to specify things the other way
around -- that is, how C types map to Python types. This, <code>compiler</code>
does not do. I guess it would be possible to build such a tool on top of
<code>compiler</code>, but with good tools like SWIG around, I'm not sure the
effort produces any new capabilities. Things like function overloading are
probably easily implemented in <code>compiler</code>, and it might be easier to
mix Python/C code in function calls, but nothing beyond this comes to mind. So,
if you're developing new extension modules, or just want to optimize a few
functions in C, <code>compiler</code> might be the tool for you. If you're
wrapping legacy code, stick with SWIG.
<p>
The next several sections give the basics of how to use <code>compiler</code>.
We'll discuss what's happening under the covers in more detail later on.
Serious users will need to at least look at the type conversion section to
understand how Python variables map to C/C++ types and how to customize this
behavior. One other note: if you don't know C or C++, then these docs are
probably of very little help to you. Further, it'd be helpful if you know
something about writing Python extensions. <code>compiler</code> does quite a
bit for you, but for anything complex, you'll need to do some conversions,
reference counting, etc.
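<p>
Here is that promised sketch of <code>inline</code> and <code>blitz</code>.
Treat it as a teaser only: both calls compile on first use (so expect a pause),
and <code>blitz()</code> additionally requires Numeric. <code>ext_tools</code>
needs more setup and gets its own section later in the document.

    <blockquote><pre><code>
    >>> import compiler
    >>> a = 1
    >>> compiler.inline('printf("%d\\n",a);',['a'])   # run C code in place
    1
    >>> from Numeric import arange, zeros, Float64
    >>> x = arange(10.)
    >>> y = arange(10.)
    >>> z = zeros(10, Float64)
    >>> compiler.blitz("z = x + y * 2.")              # compile a Numeric expression
    </code></pre></blockquote>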
<em>
Note: </em><code>compiler</code><em> is actually part of the <a
href="http://www.scipy.org">SciPy</a> package. However, it works fine as a
stand-alone package. The examples here are given as if it is used as a
stand-alone package. If you are using it from within scipy, you can use
<code>from scipy import compiler</code> and the examples will work
identically.</em>

<a name="Requirements"></a>
<h1>Requirements</h1>
<ul>
  <li> Python
  <p>
  I use 2.1.1. Probably 2.0 or higher should work.
  <p>
  </li>

  <li> C++ compiler
  <p>
  compiler uses <code>distutils</code> to actually build extension modules,
  so it uses whatever compiler was originally used to build Python.
  compiler itself requires a C++ compiler. If you used a C++ compiler
  to build Python, you're probably fine.
  <p>
  On Unix, gcc is the preferred choice because I've done a little testing
  with it. All testing has been done with gcc, but I expect the majority of
  compilers should work for <code>inline</code> and <code>ext_tools</code>.
  The one issue I'm not sure about is that I've hard-coded things so that
  compilations are linked with the <code>stdc++</code> library. Is this
  standard across Unix compilers, or is it a gcc-ism?
  <p>
  For <code>blitz()</code>, you'll need a reasonably recent version of gcc.
  2.95.2 works on Windows and 2.96 looks fine on Linux. Other versions are
  likely to work. It's likely that KAI's C++ compiler and maybe some others
  will work, but I haven't tried. My advice is to use gcc for now unless
  you're willing to tinker with the code some.
  <p>
  On Windows, either MSVC or gcc (<a href="http://www.mingw.org">mingw32</a>)
  should work. Again, you'll need gcc for <code>blitz()</code> as the MSVC
  compiler doesn't handle templates well.
  <p>
  I have not tried Cygwin, so please report success if it works for you.
  <p>
  </li>

  <li> Numeric (optional)
  <p>
  The Python Numeric module, available <a
  href="http://www.pfdubois.com/numpy/">here</a>, is required for
  <code>blitz()</code> to work. Be sure to get NumPy, not NumArray,
  which is the "next generation" implementation.
  <p>
  </li>
  <li> scipy_distutils and scipy_test (packaged with compiler)
  <p>
  These two modules are packaged with <code>compiler</code> in both
  the Windows installer and the source distributions. If you are using
  CVS, however, you'll need to download these separately (also available
  through CVS at SciPy).
  <p>
  </li>
</ul>
<p>

<a name="Installation"></a>
<h1>Installation and Testing</h1>
<p>
There are currently two ways to get <code>compiler</code>. First,
<code>compiler</code> is part of SciPy and is installed automatically (as a
sub-package) whenever SciPy is installed (although the latest version isn't in
SciPy yet, so use this one for now). Second, since compiler is useful outside
of the scientific community, it has been set up so that it can be used as a
stand-alone module.

<p>
The stand-alone version can be downloaded from <a
href="http://www.scipy.org/site_content/compiler">here</a>. Unix users should
grab the tar ball (.tgz file) and install it using the following commands.

    <blockquote><pre><code>
    tar -xzvf compiler.tgz
    cd compiler
    python setup.py install
    </code></pre></blockquote>

This will also install two other packages, <code>scipy_distutils</code> and
<code>scipy_test</code>. The first is needed by the setup process itself, and
both are used in the unit-testing process. For Windows users, it's even easier:
they can download the click-install .exe file and run it for automatic
installation.
Numeric is required if you want to use <code>blitz()</code>, but isn't
necessary for <code>inline()</code> or <code>ext_tools</code>.
<p>
If you're using the CVS version, you'll need to install the scipy_distutils
and scipy_test modules (also available from CVS) on your own.
<p>
<em> Note: The dependency issue here is a little sticky. I hate to make people
download more than one file (and so I haven't), but distutils doesn't have a
way to do conditional installation -- at least not that I know about. This can
lead to undesired clobbering of modules. What to do, what to do...</em>
<p>
Once <code>compiler</code> is installed, fire up Python and run its unit tests.

    <blockquote><pre><code>
    >>> import compiler
    >>> compiler.test()
    runs long time... spews tons of output
    </code></pre></blockquote>

This takes a loooong time. On Windows, it is usually several minutes. On Unix
with remote file systems, I've had it take 15 or so minutes. In the end, it
should run about 150 tests and spew some speed results along the way. If you
get errors, please let me know.

If you don't have Numeric installed, you'll get some module import errors
during the test setup phase for modules that are Numeric-specific (blitz_spec,
blitz_tools, size_check, standard_array_spec, ast_tools), but all tests should
pass (about 60), and the run time should be quite a bit less.
<p>
If you only want to test a single module of the package, you can do this by
running test() for that specific module.

    <blockquote><pre><code>
    >>> import compiler.scalar_spec
    >>> compiler.scalar_spec.test()
    .......
    ----------------------------------------------------------------------
    Ran 7 tests in 23.284s
    </code></pre></blockquote>
<em>
Note: I've had some tests fail on Windows machines where I have msvc,
gcc-2.95.2 (in c:\gcc-2.95.2), and gcc-2.95.3-6 (in c:\gcc) all installed. My
environment has c:\gcc in the path and does not have c:\gcc-2.95.2 in the
path. The test process runs very smoothly until the end, where several tests
using gcc fail with cpp0 not found by g++. If I check os.system('gcc -v')
before running tests, I get gcc-2.95.3-6. If I check after running tests (and
after failure), I get gcc-2.95.2. ??huh??. The os.environ['PATH'] still has
c:\gcc first in it and is not corrupted (msvc/distutils messes with the
environment variables, so we have to undo its work in some places). If anyone
else sees this, let me know -- it may just be a quirk on my machine
(unlikely). Testing with the gcc-2.95.2 installation always works.
</em>

<a name="Inline"></a>
<h1>Inline</h1>
<p>
<code>inline()</code> compiles and executes C/C++ code on the fly. Variables
in the local and global Python scope are also available in the C/C++ code.
Values are passed to the C/C++ code by assignment, much like variables are
passed into a standard Python function. Values are returned from the C/C++
code through a special argument called return_val. Also, the contents of
mutable objects can be changed within the C/C++ code, and the changes remain
after the C code exits and returns to Python. (More on this later.)
<p>
Here's a trivial <code>printf</code> example using <code>inline()</code>:

    <blockquote><pre><code>
    >>> import compiler
    >>> a = 1
    >>> compiler.inline('printf("%d\\n",a);',['a'])
    1
    </code></pre></blockquote>
<p>
In its most basic form, <code>inline(c_code, var_list)</code> requires two
arguments. <code>c_code</code> is a string of valid C/C++ code.
<code>var_list</code> is a list of the names of variables that are passed from
Python into C/C++. Here we have a simple <code>printf</code> statement that
writes the Python variable <code>a</code> to the screen.
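<p>
Any number of variables can be passed in the same way; each name in the list
must exist in the calling scope. A trivial sketch:

    <blockquote><pre><code>
    >>> a = 1
    >>> b = 2
    >>> compiler.inline('printf("%d\\n",a + b);',['a','b'])
    3
    </code></pre></blockquote>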
<p>
The first time you run a given snippet, there will be a pause while the code
is written to a .cpp file, compiled into an extension module, loaded into
Python, cataloged for future use, and executed. On Windows (850 MHz PIII),
this takes about 1.5 seconds when using Microsoft's C++ compiler (MSVC) and
6-12 seconds using gcc (mingw32 2.95.2). All subsequent executions of the code
will happen very quickly because the code only needs to be compiled once. If
you kill and restart the interpreter and then execute the same code fragment
again, there will be a much shorter delay, in the fractions-of-seconds range.
This is because <code>compiler</code> stores a catalog of all previously
compiled functions in an on-disk cache. When it sees a string that has been
compiled, it loads the already compiled module and executes the appropriate
function.
<p>
<em>
Note: If you try the <code>printf</code> example in a GUI shell such as IDLE,
PythonWin, PyShell, etc., you're unlikely to see the output. This is because
the C code is writing to stdout, instead of to the GUI window. This doesn't
mean that inline doesn't work in these environments -- it only means that
standard out in C is not the same as the standard out for Python in these
cases. Non-input/output functions will work as expected.
</em>
<p>
Although effort has been made to reduce the overhead associated with calling
inline, it is still less efficient for simple code snippets than using
equivalent Python code. The simple <code>printf</code> example is actually
slower by 30% or so than using the Python <code>print</code> statement. And it
is not difficult to create code fragments that are 8-10 times slower using
inline than equivalent Python. However, for more complicated algorithms, the
speed-up can be worthwhile -- anywhere from 1.5-30 times faster. Algorithms
that have to manipulate Python objects (sorting a list) usually only see a
factor of 2 or so improvement. Algorithms that are highly computational or
manipulate Numeric arrays can see much larger improvements. The examples/vq.py
file shows a factor of 30 or more improvement on the vector quantization
algorithm that is used heavily in information theory and classification
problems.
<p>

<a name="More with printf"></a>
<h2>More with printf</h2>
<p>
MSVC users will actually see a bit of compiler output that distutils does not
suppress the first time the code executes:

    <blockquote><pre><code>
    >>> compiler.inline(r'printf("%d\n",a);',['a'])
    sc_e013937dbc8c647ac62438874e5795131.cpp
    Creating library C:\DOCUME~1\eric\LOCALS~1\Temp\python21_compiled\temp
    \Release\sc_e013937dbc8c647ac62438874e5795131.lib and object C:\DOCUME
    ~1\eric\LOCALS~1\Temp\python21_compiled\temp\Release\sc_e013937dbc8c64
    7ac62438874e5795131.exp
    1
    </code></pre></blockquote>
<p>
Nothing bad is happening; it's just a bit annoying. <em>Anyone know how to
turn this off?</em>
<p>
This example also demonstrates using 'raw strings'. The <code>r</code>
preceding the code string in the last example denotes that this is a 'raw
string'. In raw strings, the backslash character is not interpreted as an
escape character, and so it isn't necessary to use a double backslash to
indicate that the '\n' is meant to be interpreted in the C
<code>printf</code> statement instead of by Python. If your C code contains a
lot of strings and control characters, raw strings might make things easier.
Most of the time, however, standard strings work just as well.
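<p>
For example, these two calls hand exactly the same source to the C compiler:

    <blockquote><pre><code>
    >>> compiler.inline('printf("%d\\n",a);',['a'])   # escaped backslash
    1
    >>> compiler.inline(r'printf("%d\n",a);',['a'])   # raw string
    1
    </code></pre></blockquote>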
If your C code contains a lot +of strings and control characters, raw strings might make things easier. +Most of the time, however, standard strings work just as well. + +<p> +The <code>printf</code> statement in these examples is formatted to print +out integers. What happens if <code>a</code> is a string? <code>inline</code> +will happily, compile a new version of the code to accept strings as input, +and execute the code. The result? + + <blockquote><pre><code> + >>> a = 'string' + >>> compiler.inline(r'printf("%d\n",a);',['a']) + 32956972 + </code></pre></blockquote> +<p> +In this case, the result is non-sensical, but also non-fatal. In other +situations, it might produce a compile time error because <code>a</code> is +required to be an integer at some point in the code, or it could produce a +segmentation fault. Its possible to protect against passing +<code>inline</code> arguments of the wrong data type by using asserts in +Python. + + <blockquote><pre><code> + >>> a = 'string' + >>> def protected_printf(a): + ... assert(type(a) == type(1)) + ... compiler.inline(r'printf("%d\n",a);',['a']) + >>> protected_printf(1) + 1 + >>> protected_printf('string') + AssertError... + </code></pre></blockquote> + +<p> +For printing strings, the format statement needs to be changed. + + <blockquote><pre><code> + >>> a = 'string' + >>> compiler.inline(r'printf("%s\n",a);',['a']) + string + </code></pre></blockquote> + +<p> +As in this case, C/C++ code fragments often have to change to accept different +types. For the given printing task, however, C++ streams provide a way of a +single statement that works for integers and strings. By default, the stream +objects live in the std (standard) namespace and thus require the use of +<code>std::</code>. + + <blockquote><pre><code> + >>> compiler.inline('std::cout << a << std::endl;',['a']) + 1 + >>> a = 'string' + >>> compiler.inline('std::cout << a << std::endl;',['a']) + string + </code></pre></blockquote> + +<p> +Examples using <code>printf</code> and <code>cout</code> are included in +examples/print_example.py. + +<a name="More examples"></a> +<h2> More examples </h2> + +This section shows several more advanced uses of <code>inline</code>. It +includes a few algorithms from the <a +href="http://aspn.activestate.com/ASPN/Cookbook/Python">Python Cookbook</a> +that have been re-written in inline C to improve speed as well as a couple +examples using Numeric and wxPython. + +<a name="Binary search"></a> +<h3> Binary search</h3> +Lets look at the example of searching a sorted list of integers for a value. +For inspiration, we'll use Kalle Svensson's <a +href="http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/81188"> +binary_search()</a> algorithm from the Python Cookbook. His recipe follows: + + <blockquote><pre><code> + def binary_search(seq, t): + min = 0; max = len(seq) - 1 + while 1: + if max < min: + return -1 + m = (min + max) / 2 + if seq[m] < t: + min = m + 1 + elif seq[m] > t: + max = m - 1 + else: + return m + </blockquote></PRE></CODE> + +This Python version works for arbitrary Python data types. The C version below is +specialized to handle integer values. There is a little type checking done in +Python to assure that we're working with the correct data types before heading +into C. The variables <code>seq</code> and <code>t</code> don't need to be +declared beacuse <code>compiler</code> handles converting and declaring them in +the C code. All other temporary variables such as <code>min, max</code>, etc. 
must be declared -- it is C after all. Here's the new mixed Python/C function:

    <blockquote><pre><code>
    def c_int_binary_search(seq,t):
        # do a little type checking in Python
        assert(type(t) == type(1))
        assert(type(seq) == type([]))

        # now the C code
        code = """
               #line 29 "binary_search.py"
               int val, m, min = 0;
               int max = seq.length() - 1;
               PyObject *py_val;
               for(;;)
               {
                   if (max < min )
                   {
                       return_val = Py::new_reference_to(Py::Int(-1));
                       break;
                   }
                   m = (min + max) /2;
                   val = py_to_int(PyList_GetItem(seq.ptr(),m),"val");
                   if (val < t)
                       min = m + 1;
                   else if (val > t)
                       max = m - 1;
                   else
                   {
                       return_val = Py::new_reference_to(Py::Int(m));
                       break;
                   }
               }
               """
        return inline(code,['seq','t'])
    </code></pre></blockquote>
<p>
We have two variables <code>seq</code> and <code>t</code> passed in.
<code>t</code> is guaranteed (by the <code>assert</code>) to be an integer.
Python integers are converted to C int types in the transition from Python to
C. <code>seq</code> is a Python list. By default, it is translated to a CXX
list object. Full documentation for the CXX library can be found at its <a
href="http://cxx.sourceforge.net/">website</a>. The basics are that CXX
provides C++ class equivalents for Python objects that simplify, or at least
object orientify, working with Python objects in C/C++. For example,
<code>seq.length()</code> returns the length of the list. A little more about
CXX and its class methods, etc. is in the ** type conversions ** section.
<p>
Most of the algorithm above looks similar in C to the original Python code.
There are two main differences. The first is the setting of
<code>return_val</code> instead of directly returning from the C code with a
<code>return</code> statement. <code>return_val</code> is an automatically
defined variable of type <code>PyObject*</code> that is returned from the C
code back to Python. You'll have to handle reference counting issues when
setting this variable. In this example, CXX classes and functions handle the
dirty work. All CXX functions and classes live in the namespace
<code>Py::</code>. The following code converts the integer <code>m</code> to a
CXX <code>Int()</code> object and then to a <code>PyObject*</code> with an
incremented reference count using <code>Py::new_reference_to()</code>.

    <blockquote><pre><code>
    return_val = Py::new_reference_to(Py::Int(m));
    </code></pre></blockquote>
<p>
The second big difference shows up in the retrieval of integer values from the
Python list. The simple Python <code>seq[i]</code> call balloons into a C
Python API call to grab the value out of the list and then a separate call to
<code>py_to_int()</code> that converts the PyObject* to an integer.
<code>py_to_int()</code> includes both a NULL check and a
<code>PyInt_Check()</code> call as well as the conversion call. If either of
the checks fails, an exception is raised. The entire C++ code block is
executed within a <code>try/catch</code> block that handles exceptions much
like Python does. This removes the need for most error checking code.
<p>
It is worth noting that CXX lists do have indexing operators that result in
code that looks much like Python. However, the overhead in using them appears
to be relatively high, so the standard Python API was used on
<code>seq.ptr()</code>, which is the underlying <code>PyObject*</code> of the
List object.
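<p>
Calling the mixed function is no different from calling the pure Python
version. A usage sketch, assuming the definition above has been entered in the
current session and <code>inline</code> has been imported from
<code>compiler</code>:

    <blockquote><pre><code>
    >>> from compiler import inline
    >>> haystack = range(1000000)          # a sorted list of ints
    >>> c_int_binary_search(haystack, 42)  # first call compiles the C code
    42
    >>> c_int_binary_search(haystack, -5)  # not present, so we get -1 back
    -1
    </code></pre></blockquote>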
<p>
The <code>#line</code> directive that is the first line of the C code block
isn't necessary, but it's nice for debugging. If the compilation fails because
of a syntax error in the code, the error will be reported as an error in the
Python file "binary_search.py" with an offset from the given line number (29
here).
<p>
So what was all our effort worth in terms of efficiency? Well, not a lot in
this case. The examples/binary_search.py file runs both the Python and C
versions of the functions, as well as the standard <code>bisect</code>
module. If we run it on a 1 million element list and run the search 3000 times
(for 0-2999), here are the results we get:

    <blockquote><pre><code>
    C:\home\ej\wrk\scipy\compiler\examples> python binary_search.py
    Binary search for 3000 items in 1000000 length list of integers:
    speed in python: 0.159999966621
    speed of bisect: 0.121000051498
    speed up: 1.32
    speed in c: 0.110000014305
    speed up: 1.45
    speed in c(no asserts): 0.0900000333786
    speed up: 1.78
    </code></pre></blockquote>
<p>
So, we get roughly a 50-75% improvement depending on whether we use the Python
asserts in our C version. If we move down to searching a 10000 element list,
the advantage evaporates. Even smaller lists might result in the Python
version being faster. I'd like to say that moving to Numeric lists (and
getting rid of the GetItem() call) offers a substantial speed-up, but my
preliminary efforts didn't produce one. I think the log(N) algorithm is to
blame. Because the algorithm is nice, there just isn't much time spent
computing things, so moving to C isn't that big of a win. If there are ways to
reduce the conversion overhead of values, this may improve the C/Python
speed-up. If anyone has other explanations or faster code, please let me know.

<a name="Dictionary sort"></a>
<h3> Dictionary Sort</h3>
<p>
The demo in examples/dict_sort.py is another example from the Python Cookbook.
<a href="http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/52306">This
submission</a>, by Alex Martelli, demonstrates how to return the values from a
dictionary sorted by their keys:

    <blockquote><pre><code>
    def sortedDictValues3(adict):
        keys = adict.keys()
        keys.sort()
        return map(adict.get, keys)
    </code></pre></blockquote>
<p>
Alex provides 3 algorithms, and this is the 3rd and fastest of the set. The C
version of this same algorithm follows:

    <blockquote><pre><code>
    def c_sort(adict):
        assert(type(adict) == type({}))
        code = """
               #line 21 "dict_sort.py"
               Py::List keys = adict.keys();
               Py::List items(keys.length());
               keys.sort();
               PyObject* item = NULL;
               for(int i = 0; i < keys.length();i++)
               {
                   item = PyList_GET_ITEM(keys.ptr(),i);
                   item = PyDict_GetItem(adict.ptr(),item);
                   Py_XINCREF(item);
                   PyList_SetItem(items.ptr(),i,item);
               }
               return_val = Py::new_reference_to(items);
               """
        return inline_tools.inline(code,['adict'],verbose=1)
    </code></pre></blockquote>
<p>
Like the original Python function, the C++ version can handle any Python
dictionary regardless of the key/value pair types. It uses CXX objects for the
most part to declare Python types in C++, but uses Python API calls to
manipulate their contents. Again, this choice is made for speed. The C++
version, while more complicated, is about a factor of 2 faster than Python.

    <blockquote><pre><code>
    C:\home\ej\wrk\scipy\compiler\examples> python dict_sort.py
    Dict sort of 1000 items for 300 iterations:
    speed in python: 0.319999933243
    [0, 1, 2, 3, 4]
    speed in c: 0.151000022888
    speed up: 2.12
    [0, 1, 2, 3, 4]
    </code></pre></blockquote>
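<p>
A quick sanity check of the C version (a sketch only; it assumes
<code>c_sort</code> above has been defined, that compiler's
<code>inline_tools</code> module imports as shown, and that the first call
pays the compile cost):

    <blockquote><pre><code>
    >>> from compiler import inline_tools
    >>> d = {'b': 2, 'c': 3, 'a': 1}
    >>> c_sort(d)          # values ordered by sorted key
    [1, 2, 3]
    </code></pre></blockquote>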
<p>
<a name="Numeric -- cast/copy/transpose"></a>
<h3>Numeric -- cast/copy/transpose</h3>

CastCopyTranspose is a function called quite heavily by Linear Algebra
routines in the Numeric library. It's needed in part because of the row-major
memory layout of multi-dimensional Python (and C) arrays vs. the column-major
order of the underlying Fortran algorithms. For small matrices (say 100x100 or
less), a significant portion of the time in common routines such as LU
decomposition or singular value decomposition is spent in this setup routine.
This shouldn't happen. Here is the Python version of the function using
standard Numeric operations.

    <blockquote><pre><code>
    def _castCopyAndTranspose(type, a):
        if a.typecode() == type:
            cast_array = copy.copy(Numeric.transpose(a))
        else:
            cast_array = copy.copy(Numeric.transpose(a).astype(type))
        return cast_array
    </code></pre></blockquote>

And the following is an inline C version of the same function:

    <blockquote><pre><code>
    from compiler.blitz_tools import blitz_type_factories
    from compiler import scalar_spec
    from compiler import inline
    def _cast_copy_transpose(type,a_2d):
        assert(len(shape(a_2d)) == 2)
        new_array = zeros(shape(a_2d),type)
        numeric_type = scalar_spec.numeric_to_blitz_type_mapping[type]
        code = \
        """
        for(int i = 0;i < _Na_2d[0]; i++)
            for(int j = 0; j < _Na_2d[1]; j++)
                new_array(i,j) = (%s) a_2d(j,i);
        """ % numeric_type
        inline(code,['new_array','a_2d'],
               type_factories = blitz_type_factories,compiler='gcc')
        return new_array
    </code></pre></blockquote>

This example uses blitz++ arrays instead of the standard representation of
Numeric arrays so that indexing is simpler to write. This is accomplished by
passing in the blitz++ "type factories" to override the standard Python to C++
type conversions. Blitz++ arrays allow you to write clean, fast code, but they
also are sloooow to compile (20 seconds or more for this snippet). This is why
they aren't the default type used for Numeric arrays (and also because most
compilers can't compile blitz arrays...). <code>inline()</code> is also forced
to use 'gcc' as the compiler because the default compiler on Windows (MSVC)
will not compile blitz code. <em>I think 'gcc' will use the standard compiler
on Unix machines instead of explicitly forcing gcc (check this).</em>

Comparisons of the Python vs. inline C++ code show a factor of 3 speed-up.
Also shown are the results of an "inplace" transpose routine that can be used
if the output of the linear algebra routine can overwrite the original matrix
(this is often appropriate). This provides another factor of 2 improvement.

    <blockquote><pre><code>
    #C:\home\ej\wrk\scipy\compiler\examples> python cast_copy_transpose.py
    # Cast/Copy/Transposing (150,150)array 1 times
    # speed in python: 0.870999932289
    # speed in c: 0.25
    # speed up: 3.48
    # inplace transpose c: 0.129999995232
    # speed up: 6.70
    </code></pre></blockquote>
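<p>
A usage sketch for the inline version (it assumes the definitions above, a
<code>from Numeric import *</code> for <code>zeros</code> and
<code>shape</code>, and gcc on the path; remember that the first call triggers
the slow blitz compile):

    <blockquote><pre><code>
    >>> from Numeric import array, Float64
    >>> a = array([[1.,2.],[3.,4.]])
    >>> _cast_copy_transpose(Float64, a)
    array([[ 1.,  3.],
           [ 2.,  4.]])
    </code></pre></blockquote>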
<a name="wxPython"></a>
<h3>wxPython</h3>

<code>inline</code> knows how to handle wxPython objects. That's nice in and
of itself, but it also demonstrates that the type conversion mechanism is
reasonably flexible. Chances are, it won't take a ton of effort to support
special types you might have. The examples/wx_example.py borrows the scrolled
window example from the wxPython demo, except that it mixes inline C code in
the middle of the drawing function.

    <blockquote><pre><code>
    def DoDrawing(self, dc):

        red = wxNamedColour("RED");
        blue = wxNamedColour("BLUE");
        grey_brush = wxLIGHT_GREY_BRUSH;
        code = \
        """
        #line 108 "wx_example.py"
        dc->BeginDrawing();
        dc->SetPen(wxPen(*red,4,wxSOLID));
        dc->DrawRectangle(5,5,50,50);
        dc->SetBrush(*grey_brush);
        dc->SetPen(wxPen(*blue,4,wxSOLID));
        dc->DrawRectangle(15, 15, 50, 50);
        """
        inline(code,['dc','red','blue','grey_brush'])

        dc.SetFont(wxFont(14, wxSWISS, wxNORMAL, wxNORMAL))
        dc.SetTextForeground(wxColour(0xFF, 0x20, 0xFF))
        te = dc.GetTextExtent("Hello World")
        dc.DrawText("Hello World", 60, 65)

        dc.SetPen(wxPen(wxNamedColour('VIOLET'), 4))
        dc.DrawLine(5, 65+te[1], 60+te[0], 65+te[1])
        ...
    </code></pre></blockquote>

Here, some of the Python calls to wx objects were just converted to C++ calls.
There isn't any benefit; it just demonstrates the capabilities. You might want
to use this if you have a computationally intensive loop in your drawing code
that you want to speed up.

On Windows, you'll have to use the MSVC compiler if you use the standard
wxPython DLLs distributed by Robin Dunn. That's because MSVC and gcc, while
binary compatible in C, are not binary compatible for C++. In fact, it's
probably best, no matter what platform you're on, to specify that
<code>inline</code> use the same compiler that was used to build wxPython, to
be on the safe side. There isn't currently a way to learn this info from the
library -- you just have to know. Also, at least on the Windows platform,
you'll need to install the wxWindows libraries and link to them. I think there
is a way around this, but I haven't found it yet -- I get some linking errors
dealing with wxString. One final note: you'll probably have to tweak
compiler/wx_spec.py or compiler/wx_info.py for your machine's configuration to
point at the correct directories, etc. There. That should sufficiently scare
people into not even looking at this... :)

<a name="Keyword Options"></a>
<h2> Keyword Options </h2>
<p>
The basic definition of the <code>inline()</code> function has a slew of
optional variables. It also takes keyword arguments that are passed to
<code>distutils</code> as compiler options. The following is a formatted
cut/paste of the argument section of <code>inline's</code> doc-string. It
explains all of the variables. Some examples using various options will
follow.

    <blockquote><pre><code>
    def inline(code,arg_names,local_dict = None, global_dict = None,
               force = 0,
               compiler='',
               verbose = 0,
               support_code = None,
               customize=None,
               type_factories = None,
               auto_downcast=1,
               **kw):
    </code></pre></blockquote>

<code>inline</code> has quite a few options as listed below. In addition, the
keyword arguments for distutils extension modules are accepted to specify
extra information needed for compiling.
<h4>inline Arguments:</h4>
<blockquote>
<dl>
<dt>code </dt>

<dd>
string. A string of valid C++ code. It should not specify a return statement.
Instead, it should assign results that need to be returned to Python to the
return_val variable.
</dd>

<dt>arg_names </dt>

<dd>
list of strings. A list of Python variable names that should be transferred
from Python into the C/C++ code.
</dd>
<dt>local_dict </dt>

<dd>
optional. dictionary. If specified, it is a dictionary of values that should
be used as the local scope for the C/C++ code. If local_dict is not specified,
the local dictionary of the calling function is used.
</dd>

<dt>global_dict </dt>

<dd>
optional. dictionary. If specified, it is a dictionary of values that should
be used as the global scope for the C/C++ code. If global_dict is not
specified, the global dictionary of the calling function is used.
</dd>

<dt>force </dt>

<dd>
optional. 0 or 1. default 0. If 1, the C++ code is compiled every time inline
is called. This is really only useful for debugging, and probably only useful
if you're editing support_code a lot.
</dd>

<dt>compiler </dt>

<dd>
optional. string. The name of the compiler to use when compiling. On Windows,
it understands 'msvc' and 'gcc' as well as all the compiler names understood
by distutils. On Unix, it'll only understand the values understood by
distutils. (I should add 'gcc' to this, though.)
<p>
On Windows, the compiler defaults to the Microsoft C++ compiler. If this isn't
available, it looks for mingw32 (the gcc compiler).
<p>
On Unix, it'll probably use the same compiler that was used when compiling
Python. Cygwin's behavior should be similar.</p>
</dd>

<dt>verbose </dt>

<dd>
optional. 0, 1, or 2. default 0. Specifies how much information is printed
during the compile phase of inlining code. 0 is silent (except on Windows with
msvc, where it still prints some garbage). 1 informs you when compiling starts
and finishes, and how long it took. 2 prints out the command lines for the
compilation process and can be useful if you're having problems getting code
to work. It's handy for finding the name of the .cpp file if you need to
examine it. verbose has no effect if the compilation isn't necessary.
</dd>

<dt>support_code </dt>

<dd>
optional. string. A string of valid C++ code declaring extra code that might
be needed by your compiled function. This could be declarations of functions,
classes, or structures.
</dd>

<dt>customize </dt>

<dd>
optional. base_info.custom_info object. An alternative way to specify
support_code, headers, etc. needed by the function. See the compiler.base_info
module for more details. (I'm not sure this'll be used much.)
</dd>

<dt>type_factories </dt>

<dd>
optional. list of type specification factories. These guys are what convert
Python data types to C/C++ data types. If you'd like to use a different set of
type conversions than the default, specify them here. Look in the type
conversions section of the main documentation for examples.
</dd>

<dt>auto_downcast </dt>

<dd>
optional. 0 or 1. default 1. This only affects functions that have Numeric
arrays as input variables. Setting this to 1 will cause all floating point
values to be cast as float instead of double if all the Numeric arrays are of
type float. If even one of the arrays has type double or double complex, all
variables maintain their standard types.
</dd>
</dl>
</blockquote>

<h4> Distutils keywords:</h4>
<blockquote>
<code>inline()</code> also accepts a number of <code>distutils</code> keywords
for controlling how the code is compiled.
The following descriptions have been copied from Greg Ward's
<code>distutils.extension.Extension</code> class doc-strings for convenience:

<dl>
<dt>sources </dt>

<dd>
[string] list of source filenames, relative to the distribution root (where
the setup script lives), in Unix form (slash-separated) for portability.
Source files may be C, C++, SWIG (.i), platform-specific resource files, or
whatever else is recognized by the "build_ext" command as source for a Python
extension. Note: The module_path file is always appended to the front of this
list.
</dd>

<dt>include_dirs </dt>

<dd>
[string] list of directories to search for C/C++ header files (in Unix form
for portability)
</dd>

<dt>define_macros </dt>

<dd>
[(name : string, value : string|None)] list of macros to define; each macro is
defined using a 2-tuple, where 'value' is either the string to define it to or
None to define it without a particular value (equivalent of "#define FOO" in
source or -DFOO on Unix C compiler command line)
</dd>
<dt>undef_macros </dt>

<dd>
[string] list of macros to undefine explicitly
</dd>
<dt>library_dirs </dt>
<dd>
[string] list of directories to search for C/C++ libraries at link time
</dd>
<dt>libraries </dt>
<dd>
[string] list of library names (not filenames or paths) to link against
</dd>
<dt>runtime_library_dirs </dt>
<dd>
[string] list of directories to search for C/C++ libraries at run time (for
shared extensions, this is when the extension is loaded)
</dd>

<dt>extra_objects </dt>

<dd>
[string] list of extra files to link with (eg. object files not implied by
'sources', static library that must be explicitly specified, binary resource
files, etc.)
</dd>

<dt>extra_compile_args </dt>

<dd>
[string] any extra platform- and compiler-specific information to use when
compiling the source files in 'sources'. For platforms and compilers where
"command line" makes sense, this is typically a list of command-line
arguments, but for other platforms it could be anything.
</dd>
<dt>extra_link_args </dt>

<dd>
[string] any extra platform- and compiler-specific information to use when
linking object files together to create the extension (or to create a new
static Python interpreter). Similar interpretation as for
'extra_compile_args'.
</dd>
<dt>export_symbols </dt>

<dd>
[string] list of symbols to be exported from a shared extension. Not used on
all platforms, and not generally necessary for Python extensions, which
typically export exactly one symbol: "init" + extension_name.
</dd>
</dl>
</blockquote>
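<p>
For instance, linking a snippet against an external C library can be done by
passing these keywords straight through. The library name, paths, and the
<code>my_lib_sum</code> function below are purely hypothetical -- this is only
a sketch of the call shape:

    <blockquote><pre><code>
    >>> from compiler import inline
    >>> a = 1; b = 2
    >>> code = 'return_val = Py::new_reference_to(Py::Int(my_lib_sum(a,b)));'
    >>> inline(code, ['a','b'],
    ...        support_code = 'extern "C" int my_lib_sum(int, int);',
    ...        include_dirs = ['/usr/local/include'],
    ...        library_dirs = ['/usr/local/lib'],
    ...        libraries = ['my_lib'])
    3
    </code></pre></blockquote>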
""" + >>> inline(code,['a']) + sc_86e98826b65b047ffd2cd5f479c627f12.cpp + Creating + library C:\DOCUME~1\eric\LOCALS~1\Temp\python21_compiled\temp\Release\sc_86e98826b65b047ffd2cd5f479c627f12.lib + and object C:\DOCUME~ 1\eric\LOCALS~1\Temp\python21_compiled\temp\Release\sc_86e98826b65b047ff + d2cd5f479c627f12.exp + 6 + >>> inline(code,['a']) + 6 + </code></pre></blockquote> + +When <code>inline</code> is first run, you'll notice that pause and some +trash printed to the screen. The "trash" is acutually part of the compilers +output that distutils does not supress. On Unix or windows machines with only +gcc installed, the trash will not appear. On the second call, the code +fragment is not compiled since it already exists, and only the answer is +returned. Now kill the interpreter and restart, and run the same code with +a different string. + + <blockquote><pre><code> + >>> from compiler import inline + >>> a = 'a longer string' + >>> code = """ + ... int l = a.length(); + ... return_val = Py::new_reference_to(Py::Int(l)); + ... """ + >>> inline(code,['a']) + 15 + </code></pre></blockquote> +<p> +Notice this time, <code>inline()</code> did not recompile the code because it +found the compiled function in the persistent catalog of functions. There is +a short pause as it looks up and loads the function, but it is much shorter +than compiling would require. +<p> +You can specify the local and global dictionaries if you'd like (much like +<code>exec</code> or <code>eval()</code> in Python), but if they aren't +specified, the "expected" ones are used -- i.e. the ones from the function that +called <code>inline() </code>. This is accomplished through a little call +frame trickery. Here is an example where the local_dict is specified using +the same code example from above: + + <blockquote><pre><code> + >>> a = 'a longer string' + >>> b = 'an even longer string' + >>> my_dict = {'a':b} + >>> inline(code,['a']) + 15 + >>> inline(code,['a'],my_dict) + 21 + </code></pre></blockquote> + +<p> +Everytime, the <code>code</code> is changed, <code>inline</code> does a +recompile. However, changing any of the other options in inline does not +force a recompile. The <code>force</code> option was added so that one +could force a recompile when tinkering with other variables. In practice, +it is just as easy to change the <code>code</code> by a single character +(like adding a space some place) to force the recompile. <em>Note: It also +might be nice to add some methods for purging the cache and on disk +catalogs.</em> +<p> +I use <code>verbose</code> sometimes for debugging. When set to 2, it'll +output all the information (including the name of the .cpp file) that you'd +expect from running a make file. This is nice if you need to examine the +generated code to see where things are going haywire. Note that error +messages from failed compiles are printed to the screen even if <code>verbose +</code> is set to 0. +<p> +The following example demonstrates using gcc instead of the standard msvc +compiler on windows using same code fragment as above. Because the example has +already been compiled, the <code>force=1</code> flag is needed to make +<code>inline()</code> ignore the previously compiled version and recompile +using gcc. 
The verbose flag is added to show what is printed out:

    <blockquote><pre><code>
    >>>inline(code,['a'],compiler='gcc',verbose=2,force=1)
    running build_ext
    building 'sc_86e98826b65b047ffd2cd5f479c627f13' extension
    c:\gcc-2.95.2\bin\g++.exe -mno-cygwin -mdll -O2 -w -Wstrict-prototypes -IC:
    \home\ej\wrk\scipy\compiler -IC:\Python21\Include -c C:\DOCUME~1\eric\LOCAL
    S~1\Temp\python21_compiled\sc_86e98826b65b047ffd2cd5f479c627f13.cpp -o C:\D
    OCUME~1\eric\LOCALS~1\Temp\python21_compiled\temp\Release\sc_86e98826b65b04
    7ffd2cd5f479c627f13.o
    skipping C:\home\ej\wrk\scipy\compiler\CXX\cxxextensions.c (C:\DOCUME~1\eri
    c\LOCALS~1\Temp\python21_compiled\temp\Release\cxxextensions.o up-to-date)
    skipping C:\home\ej\wrk\scipy\compiler\CXX\cxxsupport.cxx (C:\DOCUME~1\eric
    \LOCALS~1\Temp\python21_compiled\temp\Release\cxxsupport.o up-to-date)
    skipping C:\home\ej\wrk\scipy\compiler\CXX\IndirectPythonInterface.cxx (C:\
    DOCUME~1\eric\LOCALS~1\Temp\python21_compiled\temp\Release\indirectpythonin
    terface.o up-to-date)
    skipping C:\home\ej\wrk\scipy\compiler\CXX\cxx_extensions.cxx (C:\DOCUME~1\
    eric\LOCALS~1\Temp\python21_compiled\temp\Release\cxx_extensions.o up-to-da
    te)
    writing C:\DOCUME~1\eric\LOCALS~1\Temp\python21_compiled\temp\Release\sc_86
    e98826b65b047ffd2cd5f479c627f13.def
    c:\gcc-2.95.2\bin\dllwrap.exe --driver-name g++ -mno-cygwin -mdll -static -
    -output-lib C:\DOCUME~1\eric\LOCALS~1\Temp\python21_compiled\temp\Release\l
    ibsc_86e98826b65b047ffd2cd5f479c627f13.a --def C:\DOCUME~1\eric\LOCALS~1\Te
    mp\python21_compiled\temp\Release\sc_86e98826b65b047ffd2cd5f479c627f13.def
    -s C:\DOCUME~1\eric\LOCALS~1\Temp\python21_compiled\temp\Release\sc_86e9882
    6b65b047ffd2cd5f479c627f13.o C:\DOCUME~1\eric\LOCALS~1\Temp\python21_compil
    ed\temp\Release\cxxextensions.o C:\DOCUME~1\eric\LOCALS~1\Temp\python21_com
    piled\temp\Release\cxxsupport.o C:\DOCUME~1\eric\LOCALS~1\Temp\python21_com
    piled\temp\Release\indirectpythoninterface.o C:\DOCUME~1\eric\LOCALS~1\Temp
    \python21_compiled\temp\Release\cxx_extensions.o -LC:\Python21\libs -lpytho
    n21 -o C:\DOCUME~1\eric\LOCALS~1\Temp\python21_compiled\sc_86e98826b65b047f
    fd2cd5f479c627f13.pyd
    15
    </code></pre></blockquote>

That's quite a bit of output. <code>verbose=1</code> just prints the compile
time.

    <blockquote><pre><code>
    >>>inline(code,['a'],compiler='gcc',verbose=1,force=1)
    Compiling code...
    finished compiling (sec): 6.00800001621
    15
    </code></pre></blockquote>

<p>
<em> Note: I've only used the <code>compiler</code> option for switching
between 'msvc' and 'gcc' on Windows. It may have use on Unix also, but I don't
know yet.
</em>

<p>
The <code>support_code</code> argument is likely to be used a lot. It allows
you to specify extra code fragments such as function, structure, or class
definitions that you want to use in the <code>code</code> string. Note that
changes to <code>support_code</code> do <em>not</em> force a recompile. The
catalog only relies on <code>code</code> (for performance reasons) to
determine whether recompiling is necessary. So, if you make a change to
support_code, you'll need to alter <code>code</code> in some way or use the
<code>force</code> argument to get the code to recompile. I usually just add
some innocuous whitespace to the end of one of the lines in <code>code</code>
somewhere.
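<p>
Both recompile triggers look like this in practice (a sketch reusing the
string-length snippet from the earlier examples):

    <blockquote><pre><code>
    >>> inline(code,['a'],force=1)   # explicit recompile
    15
    >>> code = code + ' '            # any change to the string also works
    >>> inline(code,['a'])           # recompiles because the code text differs
    15
    </code></pre></blockquote>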
Here's an example of defining a separate method for calculating the string
length:

    <blockquote><pre><code>
    >>> from compiler import inline
    >>> a = 'a longer string'
    >>> support_code = """
    ...                PyObject* length(Py::String a)
    ...                {
    ...                    int l = a.length();
    ...                    return Py::new_reference_to(Py::Int(l));
    ...                }
    ...                """
    >>> inline("return_val = length(a);",['a'],
    ...        support_code = support_code)
    15
    </code></pre></blockquote>
<p>
<code>customize</code> is a leftover from a previous way of specifying
compiler options. It is a <code>custom_info</code> object that can specify
quite a bit of information about how a file is compiled. These
<code>info</code> objects are the standard way of defining compile information
for type conversion classes. However, I don't think they are as handy here,
especially since we've exposed all the keyword arguments that distutils can
handle. Between these keywords and the <code>support_code</code> option, I
think <code>customize</code> may be obsolete. We'll see if anyone cares to use
it. If not, it'll get axed in the next version.
<p>
The <code>type_factories</code> variable is important to people who want to
customize the way arguments are converted from Python to C. We'll talk about
this in the next chapter **xx** of this document when we discuss type
conversions.
<p>
<code>auto_downcast</code> handles one of the big type conversion issues that
is common when using Numeric arrays in conjunction with Python scalar values.
If you have an array of single precision values and multiply that array by a
Python scalar, the result is upcast to a double precision array because the
scalar value is double precision. This is not usually the desired behavior
because it can double your memory usage. <code>auto_downcast</code> goes some
distance towards changing the casting precedence of arrays and scalars. If
you're only using single precision arrays, it will automatically downcast all
scalar values from double to single precision when they are passed into the
C++ code. This is the default behavior. If you want all values to keep their
default type, set <code>auto_downcast</code> to 0.
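<p>
The upcasting problem itself is easy to see in plain Numeric, without any
<code>inline</code> involved:

    <blockquote><pre><code>
    >>> from Numeric import array, Float32
    >>> a = array([1., 2., 3.], Float32)
    >>> a.typecode()
    'f'
    >>> (a * 2.5).typecode()   # the double precision scalar upcasts the result
    'd'
    </code></pre></blockquote>
<p>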
<a name="Returning Values"></a>
<h3> Returning Values</h3>

Python variables in the local and global scope transfer seamlessly from Python
into the C++ snippets. And, if <code>inline</code> were to completely live up
to its name, any modifications to variables in the C++ code would be reflected
in the Python variables when control was passed back to Python. For example,
the desired behavior would be something like:

    <blockquote><pre><code>
    # THIS DOES NOT WORK
    >>> a = 1
    >>> compiler.inline("a++;",['a'])
    >>> a
    2
    </code></pre></blockquote>

Instead you get:

    <blockquote><pre><code>
    >>> a = 1
    >>> compiler.inline("a++;",['a'])
    >>> a
    1
    </code></pre></blockquote>

Variables are passed into C++ as if you are calling a Python function.
Python's calling convention is sometimes called "pass by assignment". This
means it's as if a <code>c_a = a</code> assignment is made right before the
<code>inline</code> call is made, and the <code>c_a</code> variable is used
within the C++ code. Thus, any changes made to <code>c_a</code> are not
reflected in Python's <code>a</code> variable. Things do get a little more
confusing, however, when looking at variables with mutable types. Changes made
in C++ to the contents of mutable types <em>are</em> reflected in the Python
variables.

    <blockquote><pre><code>
    >>> a = [1,2]
    >>> compiler.inline("PyList_SetItem(a.ptr(),0,PyInt_FromLong(3));",['a'])
    >>> print a
    [3, 2]
    </code></pre></blockquote>

So modifications to the contents of mutable types in C++ are seen when control
is returned to Python. Modifications to immutable types such as tuples,
strings, and numbers do not alter the Python variables.

If you need to make changes to an immutable variable, you'll need to assign
the new value to the "magic" variable <code>return_val</code> in C++. This
value is returned by the <code>inline()</code> function:

    <blockquote><pre><code>
    >>> a = 1
    >>> a = compiler.inline("return_val = Py::new_reference_to(Py::Int(a+1));",['a'])
    >>> a
    2
    </code></pre></blockquote>

The <code>return_val</code> variable can also be used to return newly created
values. This is possible by returning a tuple. The following trivial example
illustrates how this can be done:

    <blockquote><pre><code>
    # python version
    def multi_return():
        return 1, '2nd'

    # C version.
    def c_multi_return():
        code = """
               Py::Tuple results(2);
               results[0] = Py::Int(1);
               results[1] = Py::String("2nd");
               return_val = Py::new_reference_to(results);
               """
        return inline_tools.inline(code,[])
    </code></pre></blockquote>
<p>
The example is available in <code>examples/tuple_return.py</code>. It also has
the dubious honor of demonstrating how much <code>inline()</code> can slow
things down. The C version here is about 10 times slower than the Python
version. Of course, something so trivial has no reason to be written in C
anyway.

<a name="The issue with locals()"></a>
<h4> The issue with <code>locals()</code></h4>
<p>
<code>inline</code> passes the <code>locals()</code> and
<code>globals()</code> dictionaries from the calling function into the C++
code. It extracts the variables that are used in the C++ code from these
dictionaries, converts them to C++ variables, and then calculates using them.
It seems like it would be trivial, then, after the calculations are finished,
to insert the new values back into the <code>locals()</code> and
<code>globals()</code> dictionaries so that the modified values are reflected
in Python. Unfortunately, as pointed out by the Python manual, the
<code>locals()</code> dictionary is not writable.
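<p>
The behavior is easy to demonstrate without any C at all; inside a function,
assignments into <code>locals()</code> simply don't stick:

    <blockquote><pre><code>
    >>> def demo():
    ...     a = 1
    ...     locals()['a'] = 2   # modifies a temporary dict, not the real local
    ...     return a
    >>> demo()
    1
    </code></pre></blockquote>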
<p>
<em>
I suspect <code>locals()</code> is not writable because there are some
optimizations done to speed lookups of the local namespace. I'm guessing local
lookups don't always look at a dictionary to find values. Can someone "in the
know" confirm or correct this? Another thing I'd like to know is whether there
is a way to write to the local namespace of another stack frame from C/C++. If
so, it would be possible to have some clean-up code in compiled functions that
wrote final values of variables in C++ back to the correct Python stack frame.
I think this goes a long way toward making <code>inline</code> truly live up
to its name. I don't think we'll get to the point of creating variables in
Python for variables created in C -- although I suppose with a C/C++ parser
you could do that also.
</em>
<p>

<a name="inline_quick_look_at_code"></a>
<h3>A quick look at the code</h3>

<code>compiler</code> generates a C++ file holding an extension function for
each <code>inline</code> code snippet. These file names are generated from the
md5 signature of the code snippet and saved to a location specified by the
PYTHONCOMPILED environment variable (discussed later). The .cpp files are
generally about 200-400 lines long and include quite a few functions to
support type conversions, etc. However, the actual compiled function is pretty
simple. Below is the familiar <code>printf</code> example:

    <blockquote><pre><code>
    >>> import compiler
    >>> a = 1
    >>> compiler.inline('printf("%d\\n",a);',['a'])
    1
    </code></pre></blockquote>

And here is the extension function generated by <code>inline</code>:

    <blockquote><pre><code>
    static PyObject* compiled_func(PyObject*self, PyObject* args)
    {
        // The Py_None needs an incref before returning
        PyObject *return_val = NULL;
        int exception_occured = 0;
        PyObject *py__locals = NULL;
        PyObject *py__globals = NULL;
        PyObject *py_a;
        py_a = NULL;

        if(!PyArg_ParseTuple(args,"OO:compiled_func",&py__locals,&py__globals))
            return NULL;
        try
        {
            PyObject* raw_locals = py_to_raw_dict(py__locals,"_locals");
            PyObject* raw_globals = py_to_raw_dict(py__globals,"_globals");
            int a = py_to_int (get_variable("a",raw_locals,raw_globals),"a");
            /* Here is the inline code */
            printf("%d\n",a);
            /* I would like to fill in changed locals and globals here... */
        }
        catch( Py::Exception& e)
        {
            return_val = Py::Null();
            exception_occured = 1;
        }
        if(!return_val && !exception_occured)
        {
            Py_INCREF(Py_None);
            return_val = Py_None;
        }
        /* clean up code */

        /* return */
        return return_val;
    }
    </code></pre></blockquote>

Every inline function takes exactly two arguments -- the local and global
dictionaries for the current scope. All variable values are looked up out of
these dictionaries. The lookups, along with all <code>inline</code> code
execution, are done within a C++ <code>try</code> block. If the variables
aren't found, or there is an error converting a Python variable to the
appropriate type in C++, an exception is raised. The C++ exception is
automatically converted to a Python exception by CXX and returned to Python.

The <code>py_to_int()</code> function illustrates how the conversions and
exception handling work. <code>py_to_int()</code> first checks that the given
PyObject* pointer is not NULL and is a Python integer. If all is well, it
calls the Python API to convert the value to an <code>int</code>. Otherwise,
it calls <code>handle_bad_type()</code>, which gathers information about what
went wrong and then raises a CXX TypeError which returns to Python as a
TypeError.
  <blockquote><pre><code>
  int py_to_int(PyObject* py_obj,char* name)
  {
      if (!py_obj || !PyInt_Check(py_obj))
          handle_bad_type(py_obj,"int", name);
      return (int) PyInt_AsLong(py_obj);
  }
  </code></pre></blockquote>

  <blockquote><pre><code>
  void handle_bad_type(PyObject* py_obj, char* good_type, char* var_name)
  {
      char msg[500];
      sprintf(msg,"received '%s' type instead of '%s' for variable '%s'",
              find_type(py_obj),good_type,var_name);
      throw Py::TypeError(msg);
  }

  char* find_type(PyObject* py_obj)
  {
      if(py_obj == NULL) return "C NULL value";
      if(PyCallable_Check(py_obj)) return "callable";
      if(PyString_Check(py_obj)) return "string";
      if(PyInt_Check(py_obj)) return "int";
      if(PyFloat_Check(py_obj)) return "float";
      if(PyDict_Check(py_obj)) return "dict";
      if(PyList_Check(py_obj)) return "list";
      if(PyTuple_Check(py_obj)) return "tuple";
      if(PyFile_Check(py_obj)) return "file";
      if(PyModule_Check(py_obj)) return "module";

      //should probably do more interrogation (and thinking) on these.
      if(PyCallable_Check(py_obj) && PyInstance_Check(py_obj)) return "callable";
      if(PyInstance_Check(py_obj)) return "instance";
      if(PyCallable_Check(py_obj)) return "callable";
      return "unknown type";
  }
  </code></pre></blockquote>

Since the <code>inline</code> code is also executed within the
<code>try/catch</code> block, you can use CXX exceptions within your code.  It
is usually a bad idea to directly <code>return</code> from your code, even if
an error occurs, because this skips the clean up section of the extension
function.  In this simple example there isn't any clean up code, but in more
complicated examples there may be some reference counting that needs to be
taken care of there on converted variables.  To avoid this, either use
exceptions, or set <code>return_val</code> to NULL and use
<code>if/then</code>s to skip code after errors.

<a name="inline_technical_details"></a>
<h2> Technical Details </h2>
<p>
There are four major steps to an <code>inline</code> call:
<ol>
  <li>Type conversion
  <li>Generating C/C++ code
  <li>Compiling the code to an extension module
  <li>Cataloging (and caching) the function for future use</li>
</ol>
<p>
Items 1 and 2 above are related, but most easily discussed separately.  Type
conversions are customizable by the user if needed.  Understanding them is
pretty important for anything beyond trivial uses of <code>inline</code>.
Generating the C/C++ code is handled by the <code>ext_function</code> and
<code>ext_module</code> classes.  For the most part, compiling the code is
handled by distutils.  Some customizations were needed, but they were
relatively minor and do not require changes to distutils itself (although a few
changes would be nice...).  Cataloging is pretty simple in concept, but
surprisingly required the most code to implement (and still likely needs some
work).  So, this section covers items 1 and 4 from the list.  Item 2 is covered
later in the chapter covering the <code>ext_tools</code> module, and distutils
is covered by a completely separate document xxx.

<h2>Passing Variables in/out of the C/C++ code</h2>
<em>
Note: Passing variables into the C code is pretty straightforward, but there
are subtleties to how variable modifications in C are returned to Python.  See
xxx for a more thorough discussion of this issue.
</em>

<A name="Converting Types"></a>
<h2>Type Conversions</h2>

<em>
Note: Maybe xxx_converter instead of xxx_specification is a more descriptive
name.
</em>

<p>
By default, <code>inline()</code> makes the following type conversions between
Python and C++ types.
<p>

<center>
<table border=1 style="WIDTH: 420px; HEIGHT: 395px">
<tr><td colspan="2" width="100%">
    <P align=center>Default Data Type Conversions</P> </td></tr>
<tr><td>
    <P align=center>Python</P></td><td>
    <P align=center>C++</P></td></tr>
<tr><td> int</td><td> int</td></tr>
<tr><td> float</td><td> double</td></tr>
<tr><td> complex</td><td> std::complex<double></td></tr>
<tr><td> string</td><td> Py::String</td></tr>
<tr><td> list</td><td> Py::List</td></tr>
<tr><td> dict</td><td> Py::Dict</td></tr>
<tr><td> tuple</td><td> Py::Tuple</td></tr>
<tr><td> file</td><td> FILE*</td></tr>
<tr><td> callable</td><td> PyObject*</td></tr>
<tr><td> instance</td><td> PyObject*</td></tr>
<tr><td> Numeric.array</td><td> PyArrayObject*</td></tr>
<tr><td> wxXXX</td><td> wxXXX*</td></tr>
</table>
</center>
<p>
The <code>Py::</code> namespace is defined by the
<a href="http://cxx.sourceforge.net/">CXX</a> library, which has C++ class
equivalents for many Python types.  <code>std::</code> is the namespace of the
standard library in C++.
<p>
<em>
Note:
<ul>
<li>I haven't figured out how to handle <code>long int</code> yet (I think
    they are currently converted to int -- check this).
<li>Hopefully VTK will be added to the list soon.</li>
</ul>
</em>
<p>

Python to C++ conversions fill in code in several locations in the generated
<code>inline</code> extension function.  Below is the basic template for the
function.  This is actually the exact code that is generated by calling
<code>compiler.inline("",[])</code>.

  <blockquote><pre><code>
  static PyObject* compiled_func(PyObject*self, PyObject* args)
  {
      PyObject *return_val = NULL;
      int exception_occured = 0;
      PyObject *py__locals = NULL;
      PyObject *py__globals = NULL;
      PyObject *py_a;
      py_a = NULL;

      if(!PyArg_ParseTuple(args,"OO:compiled_func",&py__locals,&py__globals))
          return NULL;
      try
      {
          PyObject* raw_locals = py_to_raw_dict(py__locals,"_locals");
          PyObject* raw_globals = py_to_raw_dict(py__globals,"_globals");
          /* argument conversion code */
          /* inline code */
          /* I would like to fill in changed locals and globals here... */

      }
      catch( Py::Exception& e)
      {
          return_val = Py::Null();
          exception_occured = 1;
      }
      /* cleanup code */
      if(!return_val && !exception_occured)
      {

          Py_INCREF(Py_None);
          return_val = Py_None;
      }

      return return_val;
  }
  </code></pre></blockquote>

The <code>/* inline code */</code> section is filled with the code passed to
the <code>inline()</code> function call.  The
<code>/* argument conversion code */</code> and <code>/* cleanup code */</code>
sections are filled with code that handles conversion from Python to C++
types and code that deallocates memory or manipulates reference counts before
the function returns.  The following sections demonstrate how these two areas
are filled in by the default conversion methods.

<em>
Note: I'm not sure I have reference counting correct on a few of these.  The
only thing I increase/decrease the ref count on is Numeric arrays.  If you
see an issue, please let me know.
</em>

<a name="inline_numeric_argument_conversion"></a>
<h3> Numeric Argument Conversion </h3>

Integer, floating point, and complex arguments are handled in a very similar
fashion.
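<p>
For example, each of the following calls generates a separate extension
function whose argument conversion code uses the corresponding C++ type from
the table above (a sketch -- any snippet that uses the variable will do):

  <blockquote><pre><code>
  >>> a = 1                                  # int -> int
  >>> compiler.inline('printf("%d\\n",a);',['a'])
  >>> a = 1.0                                # float -> double
  >>> compiler.inline('printf("%f\\n",a);',['a'])
  >>> a = 1+1j                               # complex -> std::complex<double>
  >>> compiler.inline('printf("%f\\n",a.real());',['a'])
  </code></pre></blockquote>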
Consider the following inline function that has a single integer
variable passed in:

  <blockquote><pre><code>
  >>> a = 1
  >>> inline("",['a'])
  </code></pre></blockquote>

The argument conversion code inserted for <code>a</code> is:

  <blockquote><pre><code>
  /* argument conversion code */
  int a = py_to_int (get_variable("a",raw_locals,raw_globals),"a");
  </code></pre></blockquote>

<code>get_variable()</code> reads the variable <code>a</code>
from the local and global namespaces.  <code>py_to_int()</code> has the
following form:

  <blockquote><pre><code>
  static int py_to_int(PyObject* py_obj,char* name)
  {
      if (!py_obj || !PyInt_Check(py_obj))
          handle_bad_type(py_obj,"int", name);
      return (int) PyInt_AsLong(py_obj);
  }
  </code></pre></blockquote>

Similarly, the float and complex conversion routines look like:

  <blockquote><pre><code>
  static double py_to_float(PyObject* py_obj,char* name)
  {
      if (!py_obj || !PyFloat_Check(py_obj))
          handle_bad_type(py_obj,"float", name);
      return PyFloat_AsDouble(py_obj);
  }

  static std::complex<double> py_to_complex(PyObject* py_obj,char* name)
  {
      if (!py_obj || !PyComplex_Check(py_obj))
          handle_bad_type(py_obj,"complex", name);
      return std::complex<double>(PyComplex_RealAsDouble(py_obj),
                                  PyComplex_ImagAsDouble(py_obj));
  }
  </code></pre></blockquote>

Numeric conversions do not require any clean up code.

<a name="inline_python_argument_conversion"></a>
<h3> String, List, Tuple, and Dictionary Conversion </h3>

Strings, lists, tuples, and dictionaries are all converted to CXX types by
default.

For the following code,

  <blockquote><pre><code>
  >>> a = [1]
  >>> inline("",['a'])
  </code></pre></blockquote>

the argument conversion code inserted for <code>a</code> is:

  <blockquote><pre><code>
  /* argument conversion code */
  Py::List a = py_to_list (get_variable("a",raw_locals,raw_globals),"a");
  </code></pre></blockquote>

<code>get_variable()</code> reads the variable <code>a</code>
from the local and global namespaces.  <code>py_to_list()</code> and its
friends have the following forms:

  <blockquote><pre><code>
  static Py::List py_to_list(PyObject* py_obj,char* name)
  {
      if (!py_obj || !PyList_Check(py_obj))
          handle_bad_type(py_obj,"list", name);
      return Py::List(py_obj);
  }

  static Py::String py_to_string(PyObject* py_obj,char* name)
  {
      if (!py_obj || !PyString_Check(py_obj))
          handle_bad_type(py_obj,"string", name);
      return Py::String(py_obj);
  }

  static Py::Dict py_to_dict(PyObject* py_obj,char* name)
  {
      if (!py_obj || !PyDict_Check(py_obj))
          handle_bad_type(py_obj,"dict", name);
      return Py::Dict(py_obj);
  }

  static Py::Tuple py_to_tuple(PyObject* py_obj,char* name)
  {
      if (!py_obj || !PyTuple_Check(py_obj))
          handle_bad_type(py_obj,"tuple", name);
      return Py::Tuple(py_obj);
  }
  </code></pre></blockquote>

CXX handles reference counting for strings, lists, tuples, and dictionaries,
so clean up code isn't necessary.

<a name="inline_file_argument_conversion"></a>
<h3> File Conversion </h3>

For the following code,

  <blockquote><pre><code>
  >>> a = open("bob",'w')
  >>> inline("",['a'])
  </code></pre></blockquote>

the argument conversion code is:

  <blockquote><pre><code>
  /* argument conversion code */
  PyObject* py_a = get_variable("a",raw_locals,raw_globals);
  FILE* a = py_to_file(py_a,"a");
  </code></pre></blockquote>

<code>get_variable()</code> reads the variable <code>a</code>
from the local and global namespaces.
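<p>
Before looking at the conversion routine itself, here is how the converted
FILE* might be used inside the inline code (a sketch -- keep in mind that
C-level writes and Python-level writes to the same file are buffered
separately):

  <blockquote><pre><code>
  >>> a = open("bob",'w')
  >>> compiler.inline('fprintf(a,"hello from C\\n");',['a'])
  >>> a.close()
  </code></pre></blockquote>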
The <code>py_to_file()</code> routine itself converts the PyObject* to a FILE*
and increments the reference count of the PyObject*:

  <blockquote><pre><code>
  FILE* py_to_file(PyObject* py_obj, char* name)
  {
      if (!py_obj || !PyFile_Check(py_obj))
          handle_bad_type(py_obj,"file", name);

      Py_INCREF(py_obj);
      return PyFile_AsFile(py_obj);
  }
  </code></pre></blockquote>

Because the PyObject* was incremented, the clean up code needs to decrement
the counter:

  <blockquote><pre><code>
  /* cleanup code */
  Py_XDECREF(py_a);
  </code></pre></blockquote>

It's important to understand that file conversion only works on actual files --
i.e. ones created using the <code>open()</code> command in Python.  It does
not support converting arbitrary objects that support the file interface into
C <code>FILE*</code> pointers.  This can affect many things.  For example, in
the initial <code>printf()</code> examples, one might be tempted to solve the
problem of C and Python writing to different stdout and stderr streams in IDEs
(PythonWin, PyCrust, etc.) by using <code>fprintf()</code> and passing in
<code>sys.stdout</code> or <code>sys.stderr</code>.  For example, instead of

  <blockquote><pre><code>
  >>> compiler.inline('printf("hello\\n");',[])
  </code></pre></blockquote>

you might try:

  <blockquote><pre><code>
  >>> buf = sys.stdout
  >>> compiler.inline('fprintf(buf,"hello\\n");',['buf'])
  </code></pre></blockquote>

This will work as expected from a standard Python interpreter, but in
PythonWin, the following occurs:

  <blockquote><pre><code>
  >>> buf = sys.stdout
  >>> compiler.inline('fprintf(buf,"hello\\n");',['buf'])
  Traceback (most recent call last):
    File "<interactive input>", line 1, in ?
    File "C:\Python21\compiler\inline_tools.py", line 315, in inline
      auto_downcast = auto_downcast,
    File "C:\Python21\compiler\inline_tools.py", line 386, in compile_function
      type_factories = type_factories)
    File "C:\Python21\compiler\ext_tools.py", line 197, in __init__
      auto_downcast, type_factories)
    File "C:\Python21\compiler\ext_tools.py", line 390, in assign_variable_types
      raise TypeError, format_error_msg(errors)
  TypeError: {'buf': "Unable to convert variable 'buf' to a C++ type."}
  </code></pre></blockquote>

The traceback tells us that <code>inline()</code> was unable to convert 'buf'
to a C++ type (if instance conversion were implemented, the error would occur
at runtime instead).  Why is this?  Let's look at what the <code>buf</code>
object really is:

  <blockquote><pre><code>
  >>> buf
  pywin.framework.interact.InteractiveView instance at 00EAD014
  </code></pre></blockquote>

PythonWin has reassigned <code>sys.stdout</code> to a special object that
implements the Python file interface.  This works great in Python, but since
the special object doesn't have a FILE* pointer underlying it, fprintf doesn't
know what to do with it (well, this will be the problem once instance
conversion is implemented...).

<a name="inline_callable_argument_conversion"></a>
<h3> Callable, Instance, and Module Conversion </h3>

<em>Note: Need to look into how ref counts should be handled.  Also,
instance and module conversion are not currently implemented.
</em>

  <blockquote><pre><code>
  >>> def a():
  ...     pass
  >>> inline("",['a'])
  </code></pre></blockquote>

Callable and instance variables are converted to PyObject*.  Nothing is done
to their reference counts.
  <blockquote><pre><code>
  /* argument conversion code */
  PyObject* a = py_to_callable(get_variable("a",raw_locals,raw_globals),"a");
  </code></pre></blockquote>

<code>get_variable()</code> reads the variable <code>a</code>
from the local and global namespaces.  The <code>py_to_callable()</code> and
<code>py_to_instance()</code> functions don't currently increment the ref
count:

  <blockquote><pre><code>
  PyObject* py_to_callable(PyObject* py_obj, char* name)
  {
      if (!py_obj || !PyCallable_Check(py_obj))
          handle_bad_type(py_obj,"callable", name);
      return py_obj;
  }

  PyObject* py_to_instance(PyObject* py_obj, char* name)
  {
      if (!py_obj || !PyInstance_Check(py_obj))
          handle_bad_type(py_obj,"instance", name);
      return py_obj;
  }
  </code></pre></blockquote>

There is no cleanup code for callables, modules, or instances.

<a name="Customizing Conversions"></a>
<h3> Customizing Conversions </h3>
<p>
Converting from Python to C++ types is handled by xxx_specification classes.
A type specification class actually serves in two related but different roles.
The first is determining whether a Python variable that needs to be converted
should be represented by the given class.  The second is as a code generator
that generates the C++ code needed to convert from Python to C++ types for a
specific variable.
<p>
When

  <blockquote><pre><code>
  >>> a = 1
  >>> compiler.inline('printf("%d",a);',['a'])
  </code></pre></blockquote>

is called for the first time, the code snippet has to be compiled.  In this
process, the variable 'a' is tested against a list of type specifications (the
default list is stored in compiler/ext_tools.py).  The <em>first</em>
specification in the list that matches is used to represent the variable.

<p>
Examples of <code>xxx_specification</code> are scattered throughout numerous
"xxx_spec.py" files in the <code>compiler</code> package.  Closely related to
the <code>xxx_specification</code> classes are <code>yyy_info</code> classes.
These classes contain compiler, header, and support code information necessary
for including a certain set of capabilities (such as blitz++ or CXX support)
in a compiled module.  <code>xxx_specification</code> classes have one or more
<code>yyy_info</code> classes associated with them.

If you'd like to define your own set of type specifications, the current best
route is to examine some of the existing spec and info files.  Looking over
sequence_spec.py and cxx_info.py is probably a good place to start.  After
defining specification classes, you'll need to pass them into
<code>inline</code> using the <code>type_factories</code> argument.

Often you will just want to change how a specific variable type is
represented.  Say you'd rather have Python strings converted to
<code>std::string</code>, or maybe <code>char*</code>, instead of using the
CXX string object, but would like all other type conversions to have the
default behavior.  This requires writing a new specification class that
handles strings and then prepending it to the list of default type
specifications.  Since it is closer to the front of the list, it effectively
overrides the default string specification.

The following code demonstrates how this is done:

...

<a name="The Catalog"></a>
<h2> The Catalog </h2>
<p>
<code>catalog.py</code> has a class called <code>catalog</code> that helps
keep track of previously compiled functions.
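<p>
The effect of this caching is easy to see from the interpreter.  The sketch
below times two identical calls -- the times shown are purely illustrative and
will vary widely by machine and compiler; the point is the orders-of-magnitude
difference between the first call, which compiles, and the second, which hits
the cache:

  <blockquote><pre><code>
  >>> import time, compiler
  >>> a = 1
  >>> t = time.time(); compiler.inline('printf("%d\\n",a);',['a']); print time.time() - t
  1
  22.3          # first ever call: generate, compile, and catalog the function
  >>> t = time.time(); compiler.inline('printf("%d\\n",a);',['a']); print time.time() - t
  1
  0.00089       # subsequent call: found in the in-memory cache
  </code></pre></blockquote>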
As the sketch above suggests, this saves <code>inline()</code> and related
functions from having to compile functions every time they are called.
Instead, the catalog first checks an in-memory cache to see if the function
has already been loaded into Python.  If it hasn't, it searches through
persistent catalogs on disk for an entry for the given function.  By saving
information about compiled functions to disk, it isn't necessary to re-compile
functions every time you stop and restart the interpreter.  Functions are
compiled once and stored for future use.

<p>
When <code>inline(cpp_code)</code> is called, the following things happen:
<ol>
  <li>
  A fast local cache of functions is checked for the last function called for
  <code>cpp_code</code>.  If an entry for <code>cpp_code</code> doesn't exist
  in the cache, or the cached function call fails (perhaps because the function
  doesn't have compatible types), then the next step is to check the catalog.
  <li>
  The catalog class also keeps an in-memory cache with a list of all the
  functions compiled for <code>cpp_code</code>.  If <code>cpp_code</code> has
  ever been called during this session, this cache will already be present;
  otherwise it is loaded from disk.
  <p>
  If the cache is present, each function in the cache is
  called until one is found that was compiled for the correct argument types.
  If none of the functions work, a new function is compiled with the given
  argument types.  This function is written to the on-disk catalog as well as
  into the in-memory cache.</p>
  <li>
  When a lookup for <code>cpp_code</code> fails, the catalog looks through
  the on-disk function catalogs for the entries.  The PYTHONCOMPILED variable
  determines where to search for these catalogs and in what order.  If
  PYTHONCOMPILED is not present, several platform dependent locations are
  searched.  All functions found for <code>cpp_code</code> in the path are
  loaded into the in-memory cache, with functions found earlier in the search
  path closer to the front of the call list.
  <p>
  If the function isn't found in the on-disk catalog,
  then the function is compiled, written to the first writable directory in
  the PYTHONCOMPILED path, and also loaded into the in-memory cache.</p>
  </li>
</ol>

<a name="function storage"></a>
<h3> Function Storage: How functions are stored in caches and on disk </h3>
<p>
Function caches are stored as dictionaries where the key is the entire C++
code string and the value is either a single function (as in the "level 1"
cache) or a list of functions (as in the main catalog cache).  On-disk
catalogs are stored in the same manner using standard Python shelves.
<p>
Early on, there was a question as to whether md5 checksums of the C++
code strings should be used instead of the actual code strings.  I think this
is the route Perl's Inline module took.  Some (admittedly quick) tests of the
md5 vs. the entire string showed that using the entire string was at least a
factor of 3 or 4 faster for Python.  I think this is because it is more
time consuming to compute the md5 value than it is to do look-ups of long
strings in the dictionary.  Look at the examples/md5_speed.py file for the
test run.

<a name="PYTHONCOMPILED"></a>
<h3> Catalog search paths and the PYTHONCOMPILED variable</h3>
<p>
The default location for catalog files on Unix is ~/.pythonXX_compiled, where
XX is the version of Python being used.  If this directory doesn't exist, it
is created the first time a catalog is used.
The directory must be writable.  If,
for any reason, it isn't, then the catalog attempts to create a directory based
on your user id in the /tmp directory.  The directory permissions are set so
that only you have access to the directory.  If this fails, I think you're out
of luck.  I don't think either of these should ever fail, though.  On Windows,
a directory called pythonXX_compiled is created in the user's temporary
directory.
<p>
The actual catalog file that lives in this directory is a Python shelve with
a platform specific name such as "nt21compiled_catalog" so that multiple OSes
can share the same file systems without trampling on each other.  Along with
the catalog file, the .cpp and .so or .pyd files created by inline will live
in this directory.  The catalog file simply contains keys, which are the C++
code strings, with values that are lists of functions.  The function lists
point at functions within these compiled modules.  Each function in the lists
executes the same C++ code string, but compiled for different input variables.
<p>
You can use the PYTHONCOMPILED environment variable to specify alternative
locations for compiled functions.  On Unix this is a colon (':') separated
list of directories.  On Windows, it is a semicolon (';') separated list of
directories.  These directories will be searched prior to the default
directory for a compiled function catalog.  Also, the first writable directory
in the list is where all new compiled function catalogs, .cpp and .so or .pyd
files are written.  Relative directory paths ('.' and '..') should work fine
in the PYTHONCOMPILED variable, as should environment variables.
<p>
There is a "special" path variable called MODULE that can be placed in the
PYTHONCOMPILED variable.  It specifies that the compiled catalog should
reside in the same directory as the module that called it.  This is useful
if an admin wants to build a lot of compiled functions during the build
of a package and then install them in site-packages along with the package.
Users who specify MODULE in their PYTHONCOMPILED variable will have access
to these compiled functions.  Note, however, that if they call the function
with a set of argument types that it hasn't previously been built for, the
new function will be stored in their default directory (or some other writable
directory in the PYTHONCOMPILED path) because the user will not have write
access to the site-packages directory.
<p>
An example of setting the PYTHONCOMPILED path in bash follows:

  <blockquote><pre><code>
  PYTHONCOMPILED=MODULE:/some/path;export PYTHONCOMPILED;
  </code></pre></blockquote>

If you are using Python 2.1 on Linux, and the module bob.py in site-packages
has a compiled function in it, then the catalog search order when calling that
function for the first time in a Python session would be:

  <blockquote><pre><code>
  /usr/lib/python21/site-packages/linuxpython_compiled
  /some/path/linuxpython_compiled
  ~/.python21_compiled/linuxpython_compiled
  </code></pre></blockquote>

The default location is always included in the search path.
<p>
<em>
Note: hmmm.  I see a possible problem here.  I should probably make a sub-
directory such as
/usr/lib/python21/site-packages/python21_compiled/linuxpython_compiled so that
library files compiled with python21 aren't linked with python22 files in some
strange scenarios.  Need to check this.
</em>

<p>
The in-module cache (in <code>compiler.inline_tools</code>) reduces the
overhead of calling inline functions by about a factor of 2.  It could be
reduced a little more, for tight loops where the same function is called over
and over again, if the cache were a single value instead of a dictionary, but
the benefit is very small (less than 5%) and the utility is quite a bit less.
So, we'll stick with a dictionary as the cache.
<p></p>

<a name="Blitz"></a>
<h1>Blitz</h1>
<em> Note: most of this section is lifted from old documentation.  It should
be pretty accurate, but there may be a few discrepancies.</em>
<p>
<code>compiler.blitz()</code> compiles Numeric Python expressions for fast
execution.  For most applications, compiled expressions should provide a
factor of 2-10 speed-up over Numeric arrays.  Using compiled
expressions is meant to be as unobtrusive as possible and works much like
Python's <code>exec</code> statement.  As an example, the following code
fragment takes a 5 point average of the 512x512 2d image, b, and stores it in
array, a:

  <blockquote><pre><code>
  from scipy import * # or from Numeric import *
  a = ones((512,512), Float64)
  b = ones((512,512), Float64)
  # ...do some stuff to fill in b...
  # now average
  a[1:-1,1:-1] =  (b[1:-1,1:-1] + b[2:,1:-1] + b[:-2,1:-1] \
                   + b[1:-1,2:] + b[1:-1,:-2]) / 5.
  </code></pre></blockquote>

To compile the expression, convert the expression to a string by putting
quotes around it and then use <code>compiler.blitz</code>:

  <blockquote><pre><code>
  import compiler
  expr = "a[1:-1,1:-1] =  (b[1:-1,1:-1] + b[2:,1:-1] + b[:-2,1:-1]" \
                          "+ b[1:-1,2:] + b[1:-1,:-2]) / 5."
  compiler.blitz(expr)
  </code></pre></blockquote>

The first time <code>compiler.blitz</code> is run for a given expression and
set of arguments, C++ code that accomplishes the exact same task as the Python
expression is generated and compiled to an extension module.  This can take up
to a couple of minutes depending on the complexity of the function.
Subsequent calls to the function are very fast.  Further, the generated module
is saved between program executions so that the compilation is only done once
for a given expression and associated set of array types.  If the given
expression is executed with a new set of array types, the code must be
compiled again.  This does not overwrite the previously compiled function --
both of them are saved and available for execution.
<p>
The following table compares the run times for standard Numeric code and
compiled code for the 5 point averaging.
<p>
<center>
<table border=1 >
<tr><td>Method</td> <td>Run Time (seconds)</td></tr>
<tr><td>Standard Numeric</td> <td>0.46349</td></tr>
<tr><td>blitz (1st time compiling)</td> <td> 78.95526</td></tr>
<tr><td>blitz (subsequent calls)</td> <td>0.05843 (factor of 8 speedup)</td></tr>
</table>
</center>
<p>
These numbers are for a 512x512 double precision image run on a 400 MHz
Celeron processor under RedHat Linux 6.2.
<p>
Because of the slow compile times, it's probably most effective to develop
algorithms as you usually do, using the capabilities of scipy or the Numeric
module.  Once the algorithm is perfected, put quotes around it and execute it
using <code>compiler.blitz</code>.  This provides the standard rapid
prototyping strengths of Python and results in algorithms that run close to
the speed of hand coded C or Fortran.
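<p>
To reproduce the comparison above on your own machine, a timing harness along
these lines works (a sketch -- times will vary, and the first
<code>blitz</code> call includes the compile time):

  <blockquote><pre><code>
  import time
  import compiler
  from Numeric import ones, Float64

  a = ones((512,512), Float64)
  b = ones((512,512), Float64)
  expr = "a[1:-1,1:-1] = (b[1:-1,1:-1] + b[2:,1:-1] + b[:-2,1:-1]" \
         "+ b[1:-1,2:] + b[1:-1,:-2]) / 5."

  t1 = time.time()
  a[1:-1,1:-1] = (b[1:-1,1:-1] + b[2:,1:-1] + b[:-2,1:-1]
                  + b[1:-1,2:] + b[1:-1,:-2]) / 5.
  print 'Numeric:', time.time() - t1

  compiler.blitz(expr)               # first call: compiles (slow)
  t1 = time.time()
  compiler.blitz(expr)               # subsequent calls: cached (fast)
  print 'blitz:  ', time.time() - t1
  </code></pre></blockquote>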
<a name="blitz_requirements"></a>
<h2>Requirements</h2>

Currently, <code>compiler.blitz</code> has only been tested under Linux
with gcc-2.95-3 and on Windows with Mingw32 (2.95.2).  Its compiler
requirements are pretty heavy duty (see the
<a href="http://www.oonumerics.org/blitz/">blitz++ home page</a>), so it won't
work with just any compiler.  In particular, MSVC++ isn't up to snuff.  A
number of other compilers such as KAI++ will also work, but my suspicion is
that gcc will get the most use.

<a name="blitz_limitations"></a>
<h2>Limitations</h2>
<ol>
<li>
Currently, <code>compiler.blitz</code> handles all standard mathematical
operators except for the ** power operator.  The built-in trigonometric, log,
floor/ceil, and fabs functions might work (but haven't been tested).  It also
handles all types of array indexing supported by the Numeric module.
<p>
<code>compiler.blitz</code> does not currently support operations that use
array broadcasting, nor have any of the special purpose functions in Numeric
such as take, compress, etc. been implemented.  Note that there are no obvious
reasons why most of this functionality cannot be added to scipy.compiler, so
it will likely trickle into future versions.  Using <code>slice()</code>
objects directly instead of <code>start:stop:step</code> is also not
supported.
</li>
<li>
Currently <code>compiler.blitz</code> only works on expressions that include
assignment, such as

  <blockquote><pre><code>
  >>> result = b + c + d
  </code></pre></blockquote>

This means that the result array must exist before calling
<code>compiler.blitz</code>.  Future versions will allow the following:

  <blockquote><pre><code>
  >>> result = compiler.blitz_eval("b + c + d")
  </code></pre></blockquote>
</li>
<li>
<code>compiler.blitz</code> works best when algorithms can be expressed in a
"vectorized" form.  Algorithms that have a large number of if/thens and other
conditions are better hand written in C or Fortran.  Further, the restrictions
imposed by requiring vectorized expressions sometimes preclude the use of more
efficient data structures or algorithms.  For maximum speed in these cases,
hand-coded C or Fortran code is the only way to go.
</li>
<li>
One other point deserves mention lest people be confused.
<code>compiler.blitz</code> is not a general purpose Python->C compiler.  It
only works for expressions that contain Numeric arrays and/or
Python scalar values.  This focused scope concentrates effort on the
computationally intensive regions of the program and sidesteps the difficult
issues associated with a general purpose Python->C compiler.
</li>
</ol>

<a name="Numeric Efficiency"></a>
<h2>Numeric efficiency issues: What compilation buys you</h2>

Some might wonder why compiling Numeric expressions to C++ is beneficial,
since operations on Numeric arrays are already executed within C loops.  The
problem is that anything other than the simplest expressions is executed in a
less than optimal fashion.  Consider the following Numeric expression:

  <blockquote><pre><code>
  a = 1.2 * b + c * d
  </code></pre></blockquote>

When Numeric calculates the value for the 2d array, <code>a</code>, it does
the following steps:

  <blockquote><pre><code>
  temp1 = 1.2 * b
  temp2 = c * d
  a = temp1 + temp2
  </code></pre></blockquote>

Two things to note.  Since <code>b</code> is a (perhaps large) array, a large
temporary array must be created to store the results of <code>1.2 * b</code>.
The same is true for <code>temp2</code>.  Allocation is slow.  The second
thing is that we have three loops executing, one to calculate
<code>temp1</code>, one for <code>temp2</code>, and one for adding them up.
A C loop for the same problem might look like:

  <blockquote><pre><code>
  for(int i = 0; i < M; i++)
      for(int j = 0; j < N; j++)
          a[i][j] = 1.2 * b[i][j] + c[i][j] * d[i][j];
  </code></pre></blockquote>

Here, the three loops have been fused into a single loop and there is no
longer a need for a temporary array.  This provides a significant speed
improvement over the above example (write me and tell me what you get).
<p>
So, converting Numeric expressions into C/C++ loops that fuse the loops and
eliminate temporary arrays can provide big gains.  The goal, then, is to
convert a Numeric expression to C/C++ loops, compile them in an extension
module, and then call the compiled extension function.  The good news is that
there is an obvious correspondence between the Numeric expression above and
the C loop.  The bad news is that Numeric is generally much more powerful than
this simple example illustrates, and handling all possible indexing
possibilities results in loops that are less than straightforward to write.
(Take a peek at the Numeric source for confirmation.)  Luckily, there are
several available tools that simplify the process.

<a name="blitz_tools"></a>
<h2>The Tools</h2>

<code>compiler.blitz</code> relies heavily on several remarkable tools.  On
the Python side, the main facilitators are Jeremy Hylton's parser module and
Jim Hugunin's Numeric module.  On the compiled language side, Todd
Veldhuizen's blitz++ array library, written in C++ (shhhh. don't tell David
Beazley), does the heavy lifting.  Don't assume that, because it's C++, it's
much slower than C or Fortran.  Blitz++ uses a jaw dropping array of template
techniques (metaprogramming, expression templates, etc.) to convert innocent
looking and readable C++ expressions into code that usually executes within a
few percentage points of Fortran code for the same problem.  This is good.
Unfortunately all the template razzmatazz is very expensive to compile, so the
200 line extension modules often take two or more minutes to compile.  This
isn't so good.  <code>compiler.blitz</code> works to minimize this issue by
remembering where compiled modules live and reusing them instead of
re-compiling every time a program is re-run.

<a name="blitz_parser"></a>
<h3>Parser</h3>
Tearing Numeric expressions apart, examining the pieces, and then rebuilding
them as C++ (blitz) expressions requires a parser of some sort.  I can imagine
someone attacking this problem with regular expressions, but it'd likely be
ugly and fragile.  Amazingly, Python solves this problem for us.  It actually
exposes its parsing engine to the world through the <code>parser</code>
module.  The following fragment creates an Abstract Syntax Tree (AST) object
for the expression and then converts it to a (rather unpleasant looking)
deeply nested list representation of the tree.
  <blockquote><pre><code>
  >>> import parser
  >>> import pprint
  >>> import scipy.compiler.misc
  >>> ast = parser.suite("a = b * c + d")
  >>> ast_list = ast.tolist()
  >>> sym_list = scipy.compiler.misc.translate_symbols(ast_list)
  >>> pprint.pprint(sym_list)
  ['file_input',
   ['stmt',
    ['simple_stmt',
     ['small_stmt',
      ['expr_stmt',
       ['testlist',
        ['test',
         ['and_test',
          ['not_test',
           ['comparison',
            ['expr',
             ['xor_expr',
              ['and_expr',
               ['shift_expr',
                ['arith_expr',
                 ['term',
                  ['factor', ['power', ['atom', ['NAME', 'a']]]]]]]]]]]]]]],
       ['EQUAL', '='],
       ['testlist',
        ['test',
         ['and_test',
          ['not_test',
           ['comparison',
            ['expr',
             ['xor_expr',
              ['and_expr',
               ['shift_expr',
                ['arith_expr',
                 ['term',
                  ['factor', ['power', ['atom', ['NAME', 'b']]]],
                  ['STAR', '*'],
                  ['factor', ['power', ['atom', ['NAME', 'c']]]]],
                 ['PLUS', '+'],
                 ['term',
                  ['factor', ['power', ['atom', ['NAME', 'd']]]]]]]]]]]]]]]]],
     ['NEWLINE', '']]],
   ['ENDMARKER', '']]
  </code></pre></blockquote>

Despite its looks, with some tools developed by Jeremy H. it's possible
to search these trees for specific patterns (sub-trees), extract a
sub-tree, manipulate it, converting Python specific code fragments
to blitz code fragments, and then re-insert it in the parse tree.  The parser
module documentation has some details on how to do this.  Traversing the
new blitzified tree, writing out the terminal symbols as you go, creates
our new blitz++ expression string.

<a name="blitz_blitz"></a>
<h3> Blitz and Numeric </h3>
The other nice discovery in the project is that the data structure used
for Numeric arrays and blitz arrays is nearly identical.  Numeric stores
"strides" as byte offsets and blitz stores them as element offsets, but
other than that, they are the same.  Further, most of the concepts and
capabilities of the two libraries are remarkably similar.  It is satisfying
that two completely different implementations solved the problem with
similar basic architectures.  It is also fortuitous.  The work involved in
converting Numeric expressions to blitz expressions was greatly diminished.
As an example, consider the code for slicing an array in Python with a
stride:

  <blockquote><pre><code>
  >>> b = arange(10)
  >>> c = zeros(2)
  >>> a = b[0:4:2] + c
  >>> a
  array([0, 2])
  </code></pre></blockquote>

In blitz it is as follows:

  <blockquote><pre><code>
  Array<int,1> b(10);
  Array<int,1> c(2);
  // ...
  Array<int,1> a = b(Range(0,3,2)) + c;
  </code></pre></blockquote>

Here the Range object works exactly like Python slice objects, with the
exception that the top index (3) is inclusive whereas Python's (4) is
exclusive.  Other differences include the type declarations in C++ and
parentheses instead of brackets for indexing arrays.  Currently,
<code>compiler.blitz</code> handles the inclusive/exclusive issue by
subtracting one from upper indices during the translation.  An alternative
that is likely more robust/maintainable in the long run is to write a PyRange
class that behaves like Python's range.  This is likely very easy.
<p>
The stock blitz also doesn't handle negative indices in ranges.  The current
implementation of the compiler has a partial solution to this problem.  It
calculates an index that starts with a '-' sign by subtracting it from the
maximum index in the array, so that:

  <blockquote><pre><code>
              upper index limit
                  /-----\
  b[:-1] -> b(Range(0,Nb[0]-1-1))
  </code></pre></blockquote>

This approach fails, however, when the top index is calculated from other
values.
In the following scenario, if <code>i+j</code> evaluates to a negative
value, the compiled code will produce incorrect results and could even core
dump.  Right now, all calculated indices are assumed to be positive.

  <blockquote><pre><code>
  b[:i+j] -> b(Range(0,i+j-1))
  </code></pre></blockquote>

A solution is to calculate all indices up front, using if/then to handle the
+/- cases.  This is a little work and results in more code, so it hasn't been
done.  I'm holding out to see if blitz++ can be modified to handle negative
indexing, but haven't looked into how much effort is involved yet.  While it
needs fixin', I don't think there is a ton of code where this is an issue.
<p>
The actual translation of the Python expressions to blitz expressions is
currently a two part process.  First, all x:y:z slicing expressions are
removed from the AST, converted to slice(x,y,z) and re-inserted into the tree.
Any math needed on these expressions (subtracting from the
maximum index, etc.) is also performed here.  _beg and _end are used as
special variables that are defined as blitz::fromBegin and blitz::toEnd.

  <blockquote><pre><code>
  a[i+j:i+j+1,:] = b[2:3,:]
  </code></pre></blockquote>

becomes the more verbose:

  <blockquote><pre><code>
  a[slice(i+j,i+j+1),slice(_beg,_end)] = b[slice(2,3),slice(_beg,_end)]
  </code></pre></blockquote>

The second part does a simple string search/replace to convert to a blitz
expression with the following translations:

  <blockquote><pre><code>
  slice(_beg,_end) -> _all  # not strictly needed, but cuts down on code.
  slice            -> blitz::Range
  [                -> (
  ]                -> )
  _stp             -> 1
  </code></pre></blockquote>

<code>_all</code> is defined in the compiled function as
<code>blitz::Range::all()</code>.  These translations could of course happen
directly in the syntax tree, but the string replacement is slightly easier.
Note that namespaces are maintained in the C++ code to lessen the likelihood
of name clashes.  Currently no effort is made to detect name clashes.  A good
rule of thumb is: don't use names that start with '_' or 'py_' in compiled
expressions and you'll be fine.

<a name="blitz_type_conversions"></a>
<h2>Type definitions and coercion</h2>

So far we've glossed over the dynamic vs. static typing issue between Python
and C++.  In Python, the type of value that a variable holds can change
through the course of program execution.  C/C++, on the other hand, forces you
to declare the type of value a variable will hold at compile time.
<code>compiler.blitz</code> handles this issue by examining the types of the
variables in the expression being executed, and compiling a function for those
explicit types.  For example:

  <blockquote><pre><code>
  a = ones((5,5),Float32)
  b = ones((5,5),Float32)
  compiler.blitz("a = a + b")
  </code></pre></blockquote>

When compiling this expression to C++, <code>compiler.blitz</code> sees that
the values for a and b in the local scope have type <code>Float32</code>, or
'float' on a 32 bit architecture.  As a result, it compiles the function using
the float type (no attempt has been made to deal with 64 bit issues).
It also goes one step further: if all arrays have the same type, a templated
version of the function is made and instantiated for float, double,
complex<float>, and complex<double> arrays.  <em> Note: This feature has been
removed from the current version of the code.
Each version will be compiled
separately. </em>
<p>
What happens if you call a compiled function with array types that are
different than the ones for which it was originally compiled?  No biggie,
you'll just have to wait while a new version is compiled for your new types.
This doesn't overwrite the old functions, as they are still accessible.  See
the catalog section in the inline() documentation to see how this is handled.
Suffice it to say, the mechanism is transparent to the user and behaves
like dynamic typing with the occasional wait for compiling newly typed
functions.
<p>
When working with combined scalar/array operations, the type of the array is
<em>always</em> used.  This is similar to the savespace flag that was recently
added to Numeric.  It prevents the following expression from unexpectedly
being calculated at a higher (more expensive) precision, as can occur in
Python:

  <blockquote><pre><code>
  >>> a = array((1,2,3),typecode = Float32)
  >>> b = a * 2.1 # results in b being a Float64 array.
  </code></pre></blockquote>

In this example,

  <blockquote><pre><code>
  >>> a = ones((5,5),Float32)
  >>> b = ones((5,5),Float32)
  >>> compiler.blitz("b = a * 2.1")
  </code></pre></blockquote>

the <code>2.1</code> is cast down to a <code>float</code> before carrying out
the operation.  If you really want to force the calculation to be a
<code>double</code>, define <code>a</code> and <code>b</code> as
<code>double</code> arrays.
<p>
One other point of note: currently, you must include both the right hand side
and left hand side (assignment side) of your equation in the compiled
expression.  Also, the array being assigned to must be created prior to
calling <code>compiler.blitz</code>.  I'm pretty sure this is easily changed
so that a compiled_eval expression can be defined, but no effort has been made
to allocate new arrays (and discern their type) on the fly.

<a name="blitz_catalog"></a>
<h2>Cataloging Compiled Functions</h2>

See the <a href="#The Catalog">Cataloging functions</a> section in the
<code>compiler.inline()</code> documentation.

<a name="blitz_array_sizes"></a>
<h2>Checking Array Sizes</h2>

Surprisingly, one of the big initial problems with compiled code was making
sure all the arrays in an operation were of compatible size.  The following
case is, of course, trivially easy:

  <blockquote><pre><code>
  a = b + c
  </code></pre></blockquote>

It only requires that arrays <code>a</code>, <code>b</code>, and
<code>c</code> have the same shape.  However, expressions like:

  <blockquote><pre><code>
  a[i+j:i+j+1,:] = b[2:3,:] + c
  </code></pre></blockquote>

are not so trivial.  Since slicing is involved, the sizes of the slices, not
of the input arrays, must be checked.  Broadcasting complicates things further
because arrays and slices with different dimensions and shapes may be
compatible for math operations (broadcasting isn't yet supported by
<code>compiler.blitz</code>).  Reductions have a similar effect, as their
results are different shapes than their input operands.  The binary operators
in Numeric compare the shapes of their two operands just before they operate
on them.  This is possible because Numeric treats each operation
independently.  The intermediate (temporary) arrays created during
sub-operations in an expression are tested for the correct shape before they
are combined by another operation.  Because <code>compiler.blitz</code> fuses
all operations into a single loop, this isn't possible.
The shape comparisons must be done, and
guaranteed compatible, before the expression is evaluated.
<p>
The solution chosen converts input arrays to "dummy arrays" that only
represent the dimensions of the arrays, not the data.  Binary operations on
dummy arrays check that input array sizes are compatible and return a dummy
array with the correct size.  Evaluating an expression of dummy arrays traces
the changing array sizes through all operations and fails if incompatible
array sizes are ever found.
<p>
The machinery for this is housed in <code>compiler.size_check</code>.  It
basically involves writing a new class (dummy array) and overloading its math
operators to calculate the new sizes correctly.  All the code is in Python and
there is a fair amount of logic (mainly to handle indexing and slicing), so
the operation does impose some overhead.  For large arrays (e.g. 50x50x50),
the overhead is negligible compared to evaluating the actual expression.  For
small arrays (e.g. 16x16), the overhead imposed for checking the shapes with
this method can cause <code>compiler.blitz</code> to be slower than evaluating
the expression in Python.
<p>
What can be done to reduce the overhead?  (1) The size checking code could be
moved into C.  This would likely remove most of the overhead penalty compared
to Numeric (although there is also some calling overhead), but no effort has
been made to do this.  (2) You can also call <code>compiler.blitz</code> with
<code>check_size=0</code> and the size checking isn't done.  However, if the
sizes aren't compatible, it can cause a core dump.  So, foregoing size
checking isn't advisable until your code is well debugged.

<a name="blitz_extension_module"></a>
<h2>Creating the Extension Module</h2>

<code>compiler.blitz</code> uses the same machinery as
<code>compiler.inline</code> to build the extension module.  The only
difference is that the code included in the function is automatically
generated from the Numeric array expression instead of supplied by the user.

<a name="Extension Modules"></a>
<h1>Extension Modules</h1>
<code>compiler.inline</code> and <code>compiler.blitz</code> are high level
tools that generate extension modules automatically.  Under the covers, they
use several classes from <code>compiler.ext_tools</code> to help generate the
extension module.  The two main classes are <code>ext_module</code> and
<code>ext_function</code> (I'd like to add <code>ext_class</code> and
<code>ext_method</code> as well).  These classes simplify the process of
generating extension modules by handling most of the "boilerplate" code
automatically.

<em>
Note: <code>inline</code> actually sub-classes
<code>compiler.ext_tools.ext_function</code> to generate slightly different
code than the standard <code>ext_function</code>.  The main difference is that
the standard class converts function arguments to C types, while inline always
takes two arguments, the local and global dicts, and grabs the variables that
need to be converted to C from them.
</em>

<a name="A Simple Example"></a>
<h2> A Simple Example </h2>
The following simple example demonstrates how to build an extension module
within a Python function:

  <blockquote><pre><code>
  # examples/increment_example.py
  from compiler import ext_tools

  def build_increment_ext():
      """ Build a simple extension with functions that increment numbers.
          The extension will be built in the local directory.
      """
      mod = ext_tools.ext_module('increment_ext')

      a = 1 # effectively a type declaration for 'a' in the
            # following functions.

      ext_code = "return_val = Py::new_reference_to(Py::Int(a+1));"
      func = ext_tools.ext_function('increment',ext_code,['a'])
      mod.add_function(func)

      ext_code = "return_val = Py::new_reference_to(Py::Int(a+2));"
      func = ext_tools.ext_function('increment_by_2',ext_code,['a'])
      mod.add_function(func)

      mod.compile()
  </code></pre></blockquote>

The function <code>build_increment_ext()</code> creates an extension module
named <code>increment_ext</code> and compiles it to a shared library (.so or
.pyd) that can be loaded into Python.  <code>increment_ext</code> contains two
functions, <code>increment</code> and <code>increment_by_2</code>.

The first line of <code>build_increment_ext()</code>,

  <blockquote><pre><code>
  mod = ext_tools.ext_module('increment_ext')
  </code></pre></blockquote>

creates an <code>ext_module</code> instance that is ready to have
<code>ext_function</code> instances added to it.  <code>ext_function</code>
instances are created with a calling convention similar to that of
<code>compiler.inline()</code>.  The most common call includes a C/C++ code
snippet and a list of the arguments for the function.  The following

  <blockquote><pre><code>
  ext_code = "return_val = Py::new_reference_to(Py::Int(a+1));"
  func = ext_tools.ext_function('increment',ext_code,['a'])
  </code></pre></blockquote>

creates a C/C++ extension function that is equivalent to the following Python
function:

  <blockquote><pre><code>
  def increment(a):
      return a + 1
  </code></pre></blockquote>

A second function is also added to the module, and then

  <blockquote><pre><code>
  mod.compile()
  </code></pre></blockquote>

is called to build the extension module.  By default, the module is created
in the current working directory.

This example is available in the <code>examples/increment_example.py</code>
file found in the <code>compiler</code> directory.  At the bottom of the file,
in the module's "main" program, an attempt to import <code>increment_ext</code>
without building it is made.  If this fails (the module doesn't exist on the
PYTHONPATH), the module is built by calling <code>build_increment_ext()</code>.
This approach only incurs the time consuming (a few seconds for this example)
process of building the module if it hasn't been built before.

  <blockquote><pre><code>
  if __name__ == "__main__":
      try:
          import increment_ext
      except ImportError:
          build_increment_ext()
          import increment_ext
      a = 1
      print 'a, a+1:', a, increment_ext.increment(a)
      print 'a, a+2:', a, increment_ext.increment_by_2(a)
  </code></pre></blockquote>

<em>
Note: If we were willing to always pay the penalty of building the C++ code
for a module, we could store the md5 checksum of the C++ code along with some
information about the compiler, platform, etc.  Then,
<code>ext_module.compile()</code> could try importing the module before it
actually compiles it, check the md5 checksum and other meta-data in the
imported module against the meta-data of the code it just produced, and only
compile the code if the module didn't exist or the meta-data didn't match.
This would reduce the
above code to:
</em>
  <blockquote><pre><code>
  if __name__ == "__main__":
      build_increment_ext()
      import increment_ext

      a = 1
      print 'a, a+1:', a, increment_ext.increment(a)
      print 'a, a+2:', a, increment_ext.increment_by_2(a)
  </code></pre></blockquote>
<em>
Note: There would always be the overhead of generating the C++ code, but it
would only actually compile the code once.  You pay a little in overhead and
get cleaner "import" code.  Needs some thought.
</em>
<p>

If you run <code>increment_example.py</code> from the command line, you get
the following:

  <blockquote><pre><code>
  [eric@n0]$ python increment_example.py
  a, a+1: 1 2
  a, a+2: 1 3
  </code></pre></blockquote>

If the module didn't exist before it was run, the module is created.  If it
did exist, it is just imported and used.

<a name="Fibonacci Example"></a>
<h2> Fibonacci Example </h2>
<code>examples/fibonacci.py</code> provides a slightly more complex example of
how to use <code>ext_tools</code>.  Fibonacci numbers are a series of numbers
where each number in the series is the sum of the previous two: 1, 1, 2, 3, 5,
8, etc.  Here, the first two numbers in the series are taken to be 1.  One
approach to calculating Fibonacci numbers uses recursive function calls.  In
Python, it might be written as:

  <blockquote><pre><code>
  def fib(a):
      if a <= 2:
          return 1
      else:
          return fib(a-2) + fib(a-1)
  </code></pre></blockquote>

In C, the same function would look something like this:

  <blockquote><pre><code>
  int fib(int a)
  {
      if(a <= 2)
          return 1;
      else
          return fib(a-2) + fib(a-1);
  }
  </code></pre></blockquote>

Recursion is much faster in C than in Python, so it would be beneficial
to use the C version for Fibonacci number calculations instead of the
Python version.  To do this, we need an extension function that calls the
C version.  This is possible by including the above code snippet as
"support code" and then calling it from the extension function.  Support
code snippets (usually structure definitions, helper functions, and the like)
are inserted into the extension module C/C++ file before the extension
function code.  Here is how to build the C version of the Fibonacci number
generator:

  <blockquote><pre><code>
  def build_fibonacci():
      """ Builds an extension module with fibonacci calculators.
      """
      mod = ext_tools.ext_module('fibonacci_ext')
      a = 1 # this is effectively a type declaration

      # recursive fibonacci in C
      fib_code = """
                 int fib1(int a)
                 {
                     if(a <= 2)
                         return 1;
                     else
                         return fib1(a-2) + fib1(a-1);
                 }
                 """
      ext_code = """
                 int val = fib1(a);
                 return_val = Py::new_reference_to(Py::Int(val));
                 """
      fib = ext_tools.ext_function('fib',ext_code,['a'])
      fib.customize.add_support_code(fib_code)
      mod.add_function(fib)

      mod.compile()
  </code></pre></blockquote>

XXX More about custom_info, and what xxx_info instances are good for.

<p>
<em>
Note: recursion is not the fastest way to calculate Fibonacci numbers, but
this approach serves nicely for this example.
</em>
<p>
<a name="Type Factories"></a>
<h1>Customizing Type Conversions -- Type Factories</h1>
not written

<h1>Things I wish compiler did</h1>
not written