Diffstat (limited to 'scipy/weave/doc/tutorial.html')
-rw-r--r-- scipy/weave/doc/tutorial.html 2900
1 files changed, 2900 insertions, 0 deletions
diff --git a/scipy/weave/doc/tutorial.html b/scipy/weave/doc/tutorial.html
new file mode 100644
index 000000000..fc5d7cf43
--- /dev/null
+++ b/scipy/weave/doc/tutorial.html
@@ -0,0 +1,2900 @@
+
+<h1>Weave Documentation</h1>
+<p>
+By Eric Jones eric@enthought.com
+<p>
+<h2>Outline</h2>
+<dl>
+<dd> <A href="#Introduction">Introduction</a>
+<dd> <A href="#Requirements">Requirements</a>
+<dd> <A href="#Installation">Installation</a>
+<dd> <A href="#Testing">Testing</a>
+<dd> <A href="#Benchmarks">Benchmarks</a>
+<dd> <A href="#Inline">Inline</a>
+ <dl>
+ <dd><A href="#More with printf">More with printf</a>
+ <dd>
+ <A href="#More examples">More examples</a>
+ <dl>
+ <dd><A href="#Binary search">Binary search</a>
+ <dd><A href="#Dictionary sort">Dictionary sort</a>
+ <dd><A href="#Numeric -- cast/copy/transpose">Numeric -- cast/copy/transpose</a>
+ <dd><A href="#wxPython">wxPython</a></dd>
+ </dl>
+ <dd><A href="#Keyword options">Keyword options</a>
+ <dd><A href="#Returning values">Returning values</a>
+ <dl>
+ <dd><A href="#The issue with locals()">
+ The issue with <code>locals()</code></a></dd>
+ </dl>
+ <dd><A href="#inline_quick_look_at_code">A quick look at the code</a>
+ <dd>
+ <A href="#inline_technical_details">Technical Details</a>
+ <dl>
+ <dd><A href="#Converting Types">Converting Types</a>
+ <dl>
+ <dd><A href="#inline_numeric_argument_conversion">
+ Numeric Argument Conversion</a>
+ <dd><A href="#inline_python_argument_conversion">
+ String, List, Tuple, and Dictionary Conversion</a>
+ <dd><A href="#inline_callable_argument_conversion">File Conversion</a>
+ <dd><A href="#inline_callable_argument_conversion">
+ Callable, Instance, and Module Conversion</a>
+ <dd><A href="#Customizing Conversions">Customizing Conversions</a>
+ </dl>
+ <dd><A href="#Compiling Code">Compiling Code</a>
+ <dd><a href="#The Catalog">"Cataloging" functions</a>
+ <dl>
+ <dd><a href="#function storage">Function Storage</a>
+ <dd><a href="#PYTHONCOMPILED">The PYTHONCOMPILED environment variable</a></dd>
+ </dl>
+ </dd>
+ </dl>
+ </dd>
+ </dl>
+<dd><A href="#Blitz">Blitz</a>
+ <dl>
+ <dd><a href="#blitz_requirements">Requirements</a>
+ <dd><a href="#blitz_limitations">Limitations</a>
+ <dd><a href="#Numeric Efficiency">Numeric Efficiency Issues</a>
+ <dd><a href="#blitz_tools">The Tools</a>
+ <dl>
+ <dd><a href="#blitz_parser">Parser</a>
+ <dd><a href="#blitz_blitz">Blitz and Numeric</a>
+ </dl>
+ <dd><a href="#blitz_type_conversions">Type definitions and coercion</a>
+ <dd><a href="#blitz_catalog">Cataloging Compiled Functions</a>
+ <dd><a href="#blitz_array_sizes">Checking Array Sizes</a>
+ <dd><a href="#blitz_extension_module">Creating the Extension Module</a>
+ </dl>
+<dd> <a href="#Extension Modules"> Extension Modules</a>
+ <dl>
+ <dd><a href="#A Simple Example">A Simple Example</a>
+ <dd><a href="#Fibonacci Example">Fibonacci Example</a>
+ </dl>
+<dd> <a href="#Type Factories"> Customizing Type Conversions -- Type Factories (not written)</a>
+ <dl>
+ <dd>Type Specifications
+ <dd>Type Information
+ <dd>The Conversion Process
+ </dl>
+</dl>
+<a name="Introduction"></a>
+<h1>Introduction</h1>
+
+<p>
+The <code>weave</code> package provides tools for including C/C++ code within
+Python code. This offers both another level of optimization to those who need
+it, and an easy way to modify and extend any supported extension libraries such
+as wxPython and, hopefully soon, VTK. Inlining C/C++ code within Python generally
+results in speed-ups of 1.5x to 30x over algorithms written in pure
+Python (however, it is also possible to slow things down...). Generally,
+algorithms that require a large number of calls to the Python API don't benefit
+as much from the conversion to C/C++ as algorithms whose inner loops are
+completely convertible to C.
+<p>
+There are three basic ways to use <code>weave</code>. The
+<code>weave.inline()</code> function executes C code directly within Python,
+and <code>weave.blitz()</code> translates Python Numeric expressions to C++
+for fast execution. <code>blitz()</code> was the original reason
+<code>weave</code> was built. For those interested in building extension
+libraries, the <code>ext_tools</code> module provides classes for building
+extension modules within Python.
+<p>
+Most of <code>weave's</code> functionality should work on Windows and Unix,
+although some of its functionality requires <code>gcc</code> or a similarly
+modern C++ compiler that handles templates well. Up to now, most testing has
+been done on Windows 2000 with Microsoft's C++ compiler (MSVC) and with gcc
+(mingw32 2.95.2 and 2.95.3-6). All tests also pass on Linux (RH 7.1
+with gcc 2.96), and I've had reports that it works on Debian also (thanks
+Pearu).
+<p>
+The <code>inline</code> and <code>blitz</code> functions provide new functionality to
+Python (although I've recently learned about the <a
+href="http://pyinline.sourceforge.net/" >PyInline</a> project which may offer
+similar functionality to <code>inline</code>). On the other hand, tools for
+building Python extension modules already exist (SWIG, SIP, pycpp, CXX, and
+others). As of yet, I'm not sure where <code>weave</code> fits in this
+spectrum. It is closest in flavor to CXX in that it makes creating new C/C++
+extension modules pretty easy. However, if you're wrapping a gaggle of legacy
+functions or classes, SWIG and friends are definitely the better choice.
+<code>weave</code> is set up so that you can customize how Python types are
+converted to C types. This is great for
+<code>inline()</code>, but, for wrapping legacy code, it is more flexible to
+specify things the other way around, that is, how C types map to Python types.
+<code>weave</code> does not do this. I guess it would be possible to build
+such a tool on top of <code>weave</code>, but with good tools like SWIG around,
+I'm not sure the effort produces any new capabilities. Things like function
+overloading are probably easily implemented in <code>weave</code> and it might
+be easier to mix Python/C code in function calls, but nothing beyond this comes
+to mind. So, if you're developing new extension modules or optimizing Python
+functions in C, <code>weave.ext_tools</code> might be the tool
+for you. If you're wrapping legacy code, stick with SWIG.
+<p>
+The next several sections give the basics of how to use <code>weave</code>.
+We'll discuss what's happening under the covers in more detail later
+on. Serious users will need to at least look at the type conversion section to
+understand how Python variables map to C/C++ types and how to customize this
+behavior. One other note. If you don't know C or C++ then these docs are
+probably of very little help to you. Further, it'd be helpful if you know
+something about writing Python extensions. <code>weave</code> does quite a
+bit for you, but for anything complex, you'll need to do some conversions,
+reference counting, etc.
+<p>
+<em>
+Note: </em><code>weave</code><em> is actually part of the <a
+href="http://www.scipy.org">SciPy</a> package. However, it works fine as a
+standalone package. The examples here are given as if it is used as a
+standalone package. If you are using it from within SciPy, you can use <code>from
+scipy import weave</code> and the examples will work identically.</em>
+
+<a name="Requirements"></a>
+<h1>Requirements</h1>
+<ul>
+ <li> Python
+ <p>
+ I use 2.1.1. Probably 2.0 or higher should work.
+ <p>
+ </li>
+
+ <li> C++ compiler
+ <p>
+ <code>weave</code> uses <code>distutils</code> to actually build
+ extension modules, so it uses whatever compiler was originally used to
+ build Python. <code>weave</code> itself requires a C++ compiler. If
+ you used a C++ compiler to build Python, you're probably fine.
+ <p>
+ On Unix gcc is the preferred choice because I've done a little
+ testing with it. All testing has been done with gcc, but I expect the
+ majority of compilers should work for <code>inline</code> and
+ <code>ext_tools</code>. The one issue I'm not sure about is that I've
+ hard coded things so that compilations are linked with the
+ <code>stdc++</code> library. <em>Is this standard across
+ Unix compilers, or is this a gcc-ism?</em>
+ <p>
+ For <code>blitz()</code>, you'll need a reasonably recent version of
+ gcc. 2.95.2 works on Windows and 2.96 looks fine on Linux. Other
+ versions will probably work too. It's likely that KAI's C++ compiler and
+ maybe some others will work, but I haven't tried. My advice is to use
+ gcc for now unless you're willing to tinker with the code some.
+ <p>
+ On Windows, either MSVC or gcc (<a
+ href="http://www.mingw.org">mingw32</a>) should work. Again,
+ you'll need gcc for <code>blitz()</code> as the
+ MSVC compiler doesn't handle templates well.
+ <p>
+ I have not tried Cygwin, so please report success if it works for you.
+ <p>
+ </li>
+
+ <li> Numeric or numarray (optional)
+ <p>
+ The Python Numeric module, available <a
+ href="http://numeric.scipy.org/">here</a>, is required for
+ <code>blitz()</code> to work. Weave now also works with the
+ second-generation array package numarray.
+ <p>
+ </li>
+ <li> scipy_distutils and scipy_test (packaged with <code>weave</code>)
+ <p>
+ These two modules are packaged with <code>weave</code> in both
+ the windows installer and the source distributions. If you are using
+ CVS, however, you'll need to download these separately (also available
+ through CVS at SciPy).
+ <p>
+ </li>
+</ul>
+<p>
+
+<a name="Installation"></a>
+<h1>Installation</h1>
+<p>
+There are currently two ways to get <code>weave</code>. First,
+<code>weave</code> is part of SciPy and installed automatically (as a
+sub-package) whenever SciPy is installed (although the latest version isn't in
+SciPy yet, so use this one for now). Second, since <code>weave</code> is
+useful outside of the scientific community, it has been set up so that it can be
+used as a stand-alone module.
+
+<p>
+The stand-alone version can be downloaded from <a
+href="http://www.scipy.org/weave">here</a>. Unix users should grab the
+tar ball (.tgz file) and install it using the following commands.
+
+ <blockquote><pre><code>
+ tar -xzvf weave-0.2.tar.gz
+ cd weave-0.2
+ python setup.py install
+ </code></pre></blockquote>
+
+This will also install two other packages, <code>scipy_distutils</code> and
+<code>scipy_test</code>. The first is needed by the setup process itself and
+both are used in the unit-testing process. Numeric is required if you want to
+use <code>blitz()</code>, but isn't necessary for <code>inline()</code> or
+<code>ext_tools</code>.
+<p>
+For Windows users, it's even easier. You can download the click-install .exe
+file and run it for automatic installation. There is also a .zip file of the
+source for those interested. It also includes a setup.py file to simplify
+installation.
+<p>
+If you're using the CVS version, you'll need to install
+<code>scipy_distutils</code> and <code>scipy_test</code> packages (also
+available from CVS) on your own.
+<p>
+<em>
+Note: The dependency issue here is a little sticky. I hate to make people
+download more than one file (and so I haven't), but distutils doesn't have a
+way to do conditional installation -- at least that I know about. This can
+lead to undesired clobbering of the scipy_test and scipy_distutils modules.
+What to do, what to do... Right now it is a very minor issue.
+</em>
+<p>
+<a name="Testing"></a>
+<h1>Testing</h1>
+Once <code>weave</code> is installed, fire up python and run its unit tests.
+
+ <blockquote><pre><code>
+ >>> import weave
+ >>> weave.test()
+ runs long time... spews tons of output and a few warnings
+ .
+ .
+ .
+ ..............................................................
+ ................................................................
+ ..................................................
+ ----------------------------------------------------------------------
+ Ran 184 tests in 158.418s
+
+ OK
+ &lt;unittest.TextTestRunner instance at 01562934&gt;
+ >>>
+ </code></pre></blockquote>
+
+This takes a loooong time. On Windows, it is usually several minutes. On Unix
+with remote file systems, I've had it take 15 or so minutes. In the end, it
+should run about 180 tests and spew some speed results along the way. If you
+get errors, they'll be reported at the end of the output. Please let me know
+if this occurs.
+
+<p>
+If you don't have Numeric installed, you'll get some module import errors
+during the test setup phase for modules that are Numeric specific (blitz_spec,
+blitz_tools, size_check, standard_array_spec, ast_tools), but all tests should
+pass (about 100, and they should complete in several minutes).
+<p>
+If you only want to test a single module of the package, you can do this by
+running test() for that specific module.
+
+ <blockquote><pre><code>
+ >>> import weave.scalar_spec
+ >>> weave.scalar_spec.test()
+ .......
+ ----------------------------------------------------------------------
+ Ran 7 tests in 23.284s
+ </code></pre></blockquote>
+<em>
+Testing Notes:
+<ul>
+ <li>
+ Windows 1
+ <p>
+ I've had some tests fail on Windows machines where I have msvc, gcc-2.95.2
+ (in c:\gcc-2.95.2), and gcc-2.95.3-6 (in c:\gcc) all installed. My
+ environment has c:\gcc in the path and does not have c:\gcc-2.95.2 in the
+ path. The test process runs very smoothly until the end, where several tests
+ using gcc fail with cpp0 not found by g++. If I check os.system('gcc -v')
+ before running tests, I get gcc-2.95.3-6. If I check after running tests
+ (and after failure), I get gcc-2.95.2. ??huh??. The os.environ['PATH']
+ still has c:\gcc first in it and is not corrupted (msvc/distutils messes
+ with the environment variables, so we have to undo its work in some
+ places). If anyone else sees this, let me know; it may just be a quirk
+ on my machine (unlikely). Testing with the gcc-2.95.2 installation always
+ works.
+ <p>
+ </li>
+ <li>
+ Windows 2
+ <p>
+ If you run the tests from PythonWin or some other GUI tool, you'll get a
+ ton of DOS windows popping up periodically as <code>weave</code> spawns
+ the compiler multiple times. Very annoying. Anyone know how to fix this?
+ <p>
+ </li>
+ <li>
+ wxPython
+ <p>
+ wxPython tests are not enabled by default because importing wxPython on a
+ Unix machine without access to an X display will cause the program to exit.
+ Anyone know of a safe way to detect whether wxPython can be imported and
+ whether a display exists on a machine?
+ <p>
+ </li>
+<p>
+</ul>
+</em>
+
+
+<A name="Benchmarks"></a>
+<h1>Benchmarks</h1>
+This section has a few benchmarks -- that's all people want to see anyway, right?
+These are mostly taken from running files in the <code>weave/examples</code>
+directory and also from the test scripts. Without more information about what
+the tests actually do, their value is limited. Still, they're here for the
+curious. Look at the example scripts for more specifics about what problem was
+actually solved by each run. These examples are run under Windows 2000 using
+Microsoft Visual C++ and Python 2.1 on an 850 MHz PIII laptop with 320 MB of RAM.
+Speed up is the improvement (degradation) factor of <code>weave</code> compared to
+conventional Python functions. The <code>blitz()</code> comparisons are shown
+compared to Numeric.
+<p>
+<center>
+<table border=1 width="100%">
+<tr><td colspan="2" width="100%">
+ <P align=center>inline and ext_tools</P> </td></tr>
+ <tr><td><p align=center>Algorithm</td> <td><p align=center>Speed up </td> </tr>
+ <tr><td>binary search</td> <td> &nbsp;&nbsp;1.50 </td> </tr>
+ <tr><td>fibonacci (recursive)</td> <td> &nbsp;82.10 </td> </tr>
+ <tr><td>fibonacci (loop)</td> <td> &nbsp;&nbsp;9.17 </td> </tr>
+ <tr><td>return None</td> <td> &nbsp;&nbsp;0.14 </td> </tr>
+ <tr><td>map</td> <td> &nbsp;&nbsp;1.20 </td> </tr>
+ <tr><td>dictionary sort</td> <td> &nbsp;&nbsp;2.54 </td> </tr>
+ <tr><td>vector quantization</td> <td> &nbsp;37.40 </td> </tr>
+<tr><td colspan="2" width="100%">
+ <P align=center>blitz -- double precision</P> </td></tr>
+ <tr><td><p align=center>Algorithm</td> <td><p align=center>Speed up </td> </tr>
+ <tr><td>a = b + c 512x512</td> <td> &nbsp;&nbsp;3.05 </td> </tr>
+ <tr><td>a = b + c + d 512x512</td> <td> &nbsp;&nbsp;4.59 </td> </tr>
+ <tr><td>5 pt avg. filter, 2D Image 512x512</td> <td> &nbsp;&nbsp;9.01 </td> </tr>
+ <tr><td>Electromagnetics (FDTD) 100x100x100</td> <td> &nbsp;&nbsp;8.61 </td> </tr>
+
+</table>
+</center>
+<p>
+
+The benchmarks show <code>blitz</code> in the best possible light. Numeric
+(at least on my machine) is significantly worse for double precision than it is
+for single precision calculations. If you're interested in single precision
+results, you can pretty much divide the double precision speed up by 3 and you'll
+be close.
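+<p>
+Each speed-up figure above is a ratio of run times: the baseline (pure Python or Numeric) time divided by the
+<code>weave</code> time. Values below 1, such as the 0.14 for "return None", therefore mean the
+<code>weave</code> version was slower. The sketch below shows one way to measure such a ratio with
+the standard <code>timeit</code> module. <code>time_call</code>, <code>speed_up</code> and
+<code>python_sum</code> are hypothetical names, and the built-in <code>sum()</code> stands in for a compiled
+version. This is not necessarily how the example scripts time their runs.
```python
import timeit


def time_call(func, *args, number=1000):
    """Total wall-clock seconds for `number` calls of func(*args)."""
    return timeit.timeit(lambda: func(*args), number=number)


def speed_up(baseline_seconds, candidate_seconds):
    """Baseline time divided by candidate time.

    A result above 1 means the candidate is faster; below 1 means it is
    slower (a degradation).
    """
    if candidate_seconds <= 0:
        raise ValueError("candidate time must be positive")
    return baseline_seconds / candidate_seconds


def python_sum(seq):
    # Deliberately naive pure-Python loop used as the baseline.
    total = 0
    for x in seq:
        total += x
    return total


if __name__ == "__main__":
    data = list(range(10000))
    t_baseline = time_call(python_sum, data, number=200)
    t_candidate = time_call(sum, data, number=200)  # built-in sum() as the "fast" version
    print("speed up: %.2f" % speed_up(t_baseline, t_candidate))
```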
+
+<a name="Inline"></a>
+<h1>Inline</h1>
+<p>
+<code>inline()</code> compiles and executes C/C++ code on the fly. Variables
+in the local and global Python scope are also available in the C/C++ code.
+Values are passed to the C/C++ code by assignment much like variables
+are passed into a standard Python function. Values are returned from the C/C++
+code through a special variable called <code>return_val</code>. Also, the contents of
+mutable objects can be changed within the C/C++ code and the changes remain
+after the C code exits and returns to Python. (more on this later)
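+<p>
+One way to picture how a name in the argument list becomes a value is a lookup in the calling
+frame's local scope, with a fallback to its global scope. The pure-Python sketch below illustrates that
+idea using the CPython-specific <code>sys._getframe()</code>. <code>resolve_args</code> is a
+hypothetical name, and this is not <code>weave</code>'s actual implementation.
```python
import sys


def resolve_args(arg_names, depth=1):
    """Collect values for arg_names from a caller's locals, then its globals.

    Hypothetical helper for illustration only (not weave's code). `depth`
    selects the stack frame: 1 means the function that called resolve_args.
    sys._getframe() is CPython-specific.
    """
    frame = sys._getframe(depth)
    local_dict, global_dict = frame.f_locals, frame.f_globals
    values = {}
    for name in arg_names:
        if name in local_dict:          # local scope wins
            values[name] = local_dict[name]
        elif name in global_dict:       # fall back to module-level names
            values[name] = global_dict[name]
        else:
            raise NameError("variable '%s' not found in local or global scope" % name)
    return values


b = 2.5  # a module-level (global) variable


def caller():
    a = 1  # a local variable
    return resolve_args(['a', 'b'])


if __name__ == "__main__":
    print(caller())  # {'a': 1, 'b': 2.5}
```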
+<p>
+Here's a trivial <code>printf</code> example using <code>inline()</code>:
+
+ <blockquote><pre><code>
+ >>> import weave
+ >>> a = 1
+ >>> weave.inline('printf("%d\\n",a);',['a'])
+ 1
+ </code></pre></blockquote>
+<p>
+In this, its most basic form, <code>inline(c_code, var_list)</code> requires two
+arguments. <code>c_code</code> is a string of valid C/C++ code.
+<code>var_list</code> is a list of variable names that are passed from
+Python into C/C++. Here we have a simple <code>printf</code> statement that
+writes the Python variable <code>a</code> to the screen. The first time you run
+this, there will be a pause while the code is written to a .cpp file, compiled
+into an extension module, loaded into Python, cataloged for future use, and
+executed. On Windows (850 MHz PIII), this takes about 1.5 seconds when using
+Microsoft's C++ compiler (MSVC) and 6-12 seconds using gcc (mingw32 2.95.2).
+All subsequent executions of the code will happen very quickly because the code
+only needs to be compiled once. If you kill and restart the interpreter and then
+execute the same code fragment again, there will be a much shorter delay, in the
+fraction-of-a-second range. This is because <code>weave</code> stores a
+catalog of all previously compiled functions in an on-disk cache. When it sees
+a string that has been compiled, it loads the already compiled module and
+executes the appropriate function.
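+<p>
+The caching described above can be pictured as a lookup table keyed by the code string. It is also keyed
+by the argument types, since, as the next section shows, passing a new type triggers a new compile. The
+sketch below is a minimal in-memory version. <code>FunctionCatalog</code> and <code>fake_compile</code>
+are hypothetical stand-ins for the real write/compile/load step, and unlike <code>weave</code>'s catalog
+nothing is persisted to disk.
```python
class FunctionCatalog:
    """Minimal in-memory sketch of a compiled-function cache (not weave's API).

    Keys combine the code string with the argument types, so the same code
    called with a different type is "compiled" again.
    """

    def __init__(self, compile_func):
        self._compile = compile_func  # stand-in for "write .cpp, compile, load"
        self._cache = {}
        self.compile_count = 0

    def call(self, code, arg_names, values):
        key = (code, tuple(type(values[name]) for name in arg_names))
        func = self._cache.get(key)
        if func is None:  # cache miss: compile once and remember the result
            self.compile_count += 1
            func = self._compile(code, arg_names)
            self._cache[key] = func
        return func(*[values[name] for name in arg_names])


def fake_compile(code, arg_names):
    # Pretend "compilation": return a Python function that reports its args.
    return lambda *args: "%s -> %r" % (code, args)


if __name__ == "__main__":
    catalog = FunctionCatalog(fake_compile)
    code = 'printf("%d\\n",a);'
    catalog.call(code, ['a'], {'a': 1})         # first call: compiles
    catalog.call(code, ['a'], {'a': 2})         # same type: cached
    catalog.call(code, ['a'], {'a': 'string'})  # new type: compiles again
    print(catalog.compile_count)                # 2
```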
+<p>
+<em>
+Note: If you try the <code>printf</code> example in a GUI shell such as IDLE,
+PythonWin, PyShell, etc., you're unlikely to see the output. This is because the
+C code is writing to stdout, instead of to the GUI window. This doesn't mean
+that inline doesn't work in these environments -- it only means that standard
+out in C is not the same as the standard out for Python in these cases. Functions
+that don't do input/output will work as expected.
+</em>
+<p>
+Although effort has been made to reduce the overhead associated with calling
+inline, it is still less efficient for simple code snippets than using
+equivalent Python code. The simple <code>printf</code> example is actually
+about 30% slower than using Python's <code>print</code> statement. And, it is
+not difficult to create code fragments that are 8-10 times slower using inline
+than equivalent Python. However, for more complicated algorithms,
+the speed up can be worthwhile -- anywhere from 1.5 to 30 times faster.
+Algorithms that have to manipulate Python objects (sorting a list) usually only
+see a factor of 2 or so improvement. Algorithms that are highly computational
+or manipulate Numeric arrays can see much larger improvements. The
+examples/vq.py file shows a factor of 30 or more improvement on the vector
+quantization algorithm that is used heavily in information theory and
+classification problems.
+<p>
+
+<a name="More with printf"></a>
+<h2>More with printf</h2>
+<p>
+MSVC users will actually see a bit of compiler output that distutils does not
+suppress the first time the code executes:
+
+ <blockquote><pre><code>
+ >>> weave.inline(r'printf("%d\n",a);',['a'])
+ sc_e013937dbc8c647ac62438874e5795131.cpp
+ Creating library C:\DOCUME~1\eric\LOCALS~1\Temp\python21_compiled\temp
+ \Release\sc_e013937dbc8c647ac62438874e5795131.lib and object C:\DOCUME
+ ~1\eric\LOCALS~1\Temp\python21_compiled\temp\Release\sc_e013937dbc8c64
+ 7ac62438874e5795131.exp
+ 1
+ </code></pre></blockquote>
+<p>
+Nothing bad is happening; it's just a bit annoying. <em>Anyone know how to
+turn this off?</em>
+<p>
+This example also demonstrates using 'raw strings'. The <code>r</code>
+preceding the code string in the last example denotes that this is a 'raw
+string'. In raw strings, the backslash character is not interpreted as an
+escape character, and so it isn't necessary to use a double backslash to
+indicate that the '\n' is meant to be interpreted in the C <code>printf</code>
+statement instead of by Python. If your C code contains a lot
+of strings and control characters, raw strings might make things easier.
+Most of the time, however, standard strings work just as well.
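+<p>
+You can check directly in Python that the two spellings hand <code>inline</code> the same C source.
+With a single unescaped backslash in a normal string, Python would instead embed a real newline
+character, which C does not allow inside a string literal.
```python
# A raw string keeps the backslash, so C (not Python) sees the "\n" escape.
raw = r'printf("%d\n",a);'
escaped = 'printf("%d\\n",a);'  # doubled backslash in a normal string

# A normal string with a single backslash embeds a real newline character.
interpreted = 'printf("%d\n",a);'

print(raw == escaped)                 # True: identical C source text
print(raw == interpreted)             # False: Python already turned \n into a newline
print(len(raw) - len(interpreted))    # 1: backslash+n (two chars) vs. one newline char
```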
+
+<p>
+The <code>printf</code> statement in these examples is formatted to print
+out integers. What happens if <code>a</code> is a string? <code>inline</code>
+will happily compile a new version of the code to accept strings as input,
+and execute the code. The result?
+
+ <blockquote><pre><code>
+ >>> a = 'string'
+ >>> weave.inline(r'printf("%d\n",a);',['a'])
+ 32956972
+ </code></pre></blockquote>
+<p>
+In this case, the result is nonsensical, but also non-fatal. In other
+situations, it might produce a compile time error because <code>a</code> is
+required to be an integer at some point in the code, or it could produce a
+segmentation fault. It's possible to protect against passing
+<code>inline</code> arguments of the wrong data type by using asserts in
+Python.
+
+ <blockquote><pre><code>
+ >>> a = 'string'
+ >>> def protected_printf(a):
+ ...     assert(type(a) == type(1))
+ ...     weave.inline(r'printf("%d\n",a);',['a'])
+ ...
+ >>> protected_printf(1)
+ 1
+ >>> protected_printf('string')
+ AssertionError...
+ </code></pre></blockquote>
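+<p>
+One caveat with <code>assert</code> is that Python removes assert statements when run with
+<code>-O</code>, so the guard silently disappears. Below is a sketch of an explicit check that survives
+optimization. <code>require_types</code> is a hypothetical helper, and a plain Python
+<code>print</code> replaces the <code>weave.inline</code> call so the sketch runs without a compiler.
```python
import functools
import inspect


def require_types(**expected):
    """Decorator: raise TypeError when a named argument has the wrong type.

    Hypothetical helper. Unlike `assert`, these checks survive `python -O`.
    """
    def decorate(func):
        sig = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            bound = sig.bind(*args, **kwargs)
            bound.apply_defaults()  # include defaulted parameters in the check
            for name, typ in expected.items():
                value = bound.arguments[name]
                if not isinstance(value, typ):
                    raise TypeError("%s must be %s, got %s"
                                    % (name, typ.__name__, type(value).__name__))
            return func(*args, **kwargs)
        return wrapper
    return decorate


@require_types(a=int)
def protected_printf(a):
    # With weave installed this body would be:
    #     weave.inline(r'printf("%d\n",a);',['a'])
    # A plain Python print stands in so the sketch runs without a compiler.
    print("%d" % a)
```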
+
+<p>
+For printing strings, the format statement needs to be changed. Also, weave
+doesn't convert strings to char*. Instead it uses the CXX Py::String type, so
+you have to do a little more work. Here we convert it to a C++ std::string
+and then ask for the char* version.
+
+ <blockquote><pre><code>
+ >>> a = 'string'
+ >>> weave.inline(r'printf("%s\n",std::string(a).c_str());',['a'])
+ string
+ </code></pre></blockquote>
+<p>
+<em>
+This is a little convoluted. Perhaps strings should convert to std::string
+objects instead of CXX objects. Or maybe to char*.
+</em>
+
+<p>
+As in this case, C/C++ code fragments often have to change to accept different
+types. For the given printing task, however, C++ streams make it possible to write
+a single statement that works for both integers and strings. By default, the stream
+objects live in the std (standard) namespace and thus require the use of
+<code>std::</code>.
+
+ <blockquote><pre><code>
+ >>> a = 1
+ >>> weave.inline('std::cout << a << std::endl;',['a'])
+ 1
+ >>> a = 'string'
+ >>> weave.inline('std::cout << a << std::endl;',['a'])
+ string
+ </code></pre></blockquote>
+
+<p>
+Examples using <code>printf</code> and <code>cout</code> are included in
+examples/print_example.py.
+
+<a name="More examples"></a>
+<h2> More examples </h2>
+
+This section shows several more advanced uses of <code>inline</code>. It
+includes a few algorithms from the <a
+href="http://aspn.activestate.com/ASPN/Cookbook/Python">Python Cookbook</a>
+that have been rewritten in inline C to improve speed, as well as a couple of
+examples using Numeric and wxPython.
+
+<a name="Binary search"></a>
+<h3> Binary search</h3>
+Let's look at the example of searching a sorted list of integers for a value.
+For inspiration, we'll use Kalle Svensson's <a
+href="http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/81188">
+binary_search()</a> algorithm from the Python Cookbook. His recipe follows:
+
+ <blockquote><pre><code>
+ def binary_search(seq, t):
+     min = 0; max = len(seq) - 1
+     while 1:
+         if max < min:
+             return -1
+         m = (min + max) / 2
+         if seq[m] < t:
+             min = m + 1
+         elif seq[m] > t:
+             max = m - 1
+         else:
+             return m
+ </code></pre></blockquote>
+
+This Python version works for arbitrary Python data types. The C version below is
+specialized to handle integer values. There is a little type checking done in
+Python to assure that we're working with the correct data types before heading
+into C. The variables <code>seq</code> and <code>t</code> don't need to be
+declared because <code>weave</code> handles converting and declaring them in
+the C code. All other temporary variables such as <code>min, max</code>, etc.
+must be declared -- it is C after all. Here's the new mixed Python/C function:
+
+ <blockquote><pre><code>
+ import weave
+
+ def c_int_binary_search(seq,t):
+     # do a little type checking in Python
+     assert(type(t) == type(1))
+     assert(type(seq) == type([]))
+
+     # now the C code
+     code = """
+            #line 29 "binary_search.py"
+            int val, m, min = 0;
+            int max = seq.length() - 1;
+            PyObject *py_val;
+            for(;;)
+            {
+                if (max < min)
+                {
+                    return_val = Py::new_reference_to(Py::Int(-1));
+                    break;
+                }
+                m = (min + max) / 2;
+                val = py_to_int(PyList_GetItem(seq.ptr(),m),"val");
+                if (val < t)
+                    min = m + 1;
+                else if (val > t)
+                    max = m - 1;
+                else
+                {
+                    return_val = Py::new_reference_to(Py::Int(m));
+                    break;
+                }
+            }
+            """
+     return weave.inline(code,['seq','t'])
+ </code></pre></blockquote>
+<p>
+We have two variables <code>seq</code> and <code>t</code> passed in.
+<code>t</code> is guaranteed (by the <code>assert</code>) to be an integer.
+Python integers are converted to C int types in the transition from Python to
+C. <code>seq</code> is a Python list. By default, it is translated to a CXX
+list object. Full documentation for the CXX library can be found at its <a
+href="http://cxx.sourceforge.net/">website</a>. The basics are that CXX
+provides C++ class equivalents for Python objects that simplify, or at
+least object orientify, working with Python objects in C/C++. For example,
+<code>seq.length()</code> returns the length of the list. A little more about
+CXX and its class methods, etc. is in the <a href="#Converting Types">type conversions</a> section.
+<p>
+<em>
+Note: CXX uses templates and therefore may be a little less portable than
+another alternative by Gordon McMillan called SCXX, which was inspired by
+CXX. It doesn't use templates, so it should compile faster and be more portable.
+SCXX has a few fewer features, but it appears to me that it would mesh with
+the needs of weave quite well. Hopefully xxx_spec files will be written
+for SCXX in the future, and we'll be able to compare on a more empirical
+basis. Both sets of spec files will probably stick around; it's just a question
+of which becomes the default.
+</em>
+<p>
+Most of the algorithm above looks similar in C to the original Python code.
+There are two main differences. The first is the setting of
+<code>return_val</code> instead of directly returning from the C code with a
+<code>return</code> statement. <code>return_val</code> is an automatically
+defined variable of type <code>PyObject*</code> that is returned from the C
+code back to Python. You'll have to handle reference counting issues when
+setting this variable. In this example, CXX classes and functions handle the
+dirty work. All CXX functions and classes live in the namespace
+<code>Py::</code>. The following code converts the integer <code>m</code> to a
+CXX <code>Int()</code> object and then to a <code>PyObject*</code> with an
+incremented reference count using <code>Py::new_reference_to()</code>.
+
+ <blockquote><pre><code>
+ return_val = Py::new_reference_to(Py::Int(m));
+ </code></pre></blockquote>
+<p>
+The second big difference shows up in the retrieval of integer values from the
+Python list. The simple Python <code>seq[i]</code> call balloons into a C
+Python API call to grab the value out of the list and then a separate call to
+<code>py_to_int()</code> that converts the PyObject* to an integer.
+<code>py_to_int()</code> includes both a NULL check and a
+<code>PyInt_Check()</code> call as well as the conversion call. If either of
+the checks fails, an exception is raised. The entire C++ code block is executed
+within a <code>try/catch</code> block that handles exceptions much like Python
+does. This removes the need for most error checking code.
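+<p>
+In Python terms, the checks described for <code>py_to_int()</code> amount roughly to the code below.
+This is only an analogy written in Python. The names <code>py_to_int_checked</code> and
+<code>lookup_int</code> are hypothetical, and <code>None</code> stands in for a NULL
+<code>PyObject*</code> (which is what <code>PyList_GetItem()</code> returns for an out-of-range index).
+The real function is C++ inside <code>weave</code>.
```python
def py_to_int_checked(obj, name):
    """Python analogy of py_to_int(): NULL check, int type check, then convert.

    `None` stands in for a NULL PyObject*; this is illustrative only.
    """
    if obj is None:
        raise TypeError("conversion of '%s' failed: NULL object" % name)
    if not isinstance(obj, int):  # plays the role of PyInt_Check()
        raise TypeError("conversion of '%s' to int failed: got %s"
                        % (name, type(obj).__name__))
    return int(obj)


def lookup_int(seq, m):
    # Mirrors: val = py_to_int(PyList_GetItem(seq.ptr(),m),"val");
    item = seq[m] if 0 <= m < len(seq) else None  # out of range -> NULL-like
    return py_to_int_checked(item, "val")
```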
+<p>
+It is worth noting that CXX lists do have indexing operators that result
+in code that looks much like Python. However, the overhead in using them
+appears to be relatively high, so the standard Python API was used on the
+<code>seq.ptr()</code> which is the underlying <code>PyObject*</code> of the
+List object.
+<p>
+The <code>#line</code> directive that is the first line of the C code
+block isn't necessary, but it's nice for debugging. If the compilation fails
+because of a syntax error in the code, the error will be reported as an error
+in the Python file "binary_search.py" with an offset from the given line number
+(29 here).
+<p>
+So what was all our effort worth in terms of efficiency? Well, not a lot in
+this case. The examples/binary_search.py file runs both Python and C versions
+of the function, as well as the standard <code>bisect</code> module. If
+we run it on a 1 million element list and run the search 3000 times (for
+0-2999), here are the results we get:
+
+ <blockquote><pre><code>
+ C:\home\ej\wrk\scipy\weave\examples> python binary_search.py
+ Binary search for 3000 items in 1000000 length list of integers:
+ speed in python: 0.159999966621
+ speed of bisect: 0.121000051498
+ speed up: 1.32
+ speed in c: 0.110000014305
+ speed up: 1.45
+ speed in c(no asserts): 0.0900000333786
+ speed up: 1.78
+ </code></pre></blockquote>
+<p>
+So, we get roughly a 50-75% improvement depending on whether we use the Python
+asserts in our C version. If we move down to searching a 10000 element list,
+the advantage evaporates. Even smaller lists might result in the Python
+version being faster. I'd like to say that moving to Numeric arrays (and
+getting rid of the GetItem() call) offers a substantial speed up, but my
+preliminary efforts didn't produce one. I think the log(N) algorithm is to
+blame. Because the algorithm is efficient, each search makes only about
+log2(N) comparisons (roughly 20 for a million elements), so there just isn't much
+time spent computing things, and moving to C isn't that big of a win. If there are
+ways to reduce conversion overhead of values, this may improve the C/Python speed
+up. If anyone has other explanations or faster code, please let me know.
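+<p>
+To put a number on the log(N) argument, the sketch below ports the cookbook <code>binary_search()</code>
+to Python 3. There, <code>//</code> replaces <code>/</code>, which returns a float. The sketch also uses
+<code>lo</code>/<code>hi</code> instead of shadowing the built-ins <code>min</code>/<code>max</code>.
+It counts loop iterations and cross-checks each result against the standard <code>bisect</code> module.
+The iteration counter and <code>bisect_search</code> are additions for illustration. Each search of a
+1 million element list needs at most 20 probes. That fits the idea that per-call overhead, rather than
+the search itself, dominates the run time.
```python
import bisect


def binary_search(seq, t):
    """Cookbook binary search, returning (index or -1, loop iterations)."""
    lo, hi = 0, len(seq) - 1
    iterations = 0
    while lo <= hi:
        iterations += 1
        m = (lo + hi) // 2  # integer division; plain / gives a float in Python 3
        if seq[m] < t:
            lo = m + 1
        elif seq[m] > t:
            hi = m - 1
        else:
            return m, iterations
    return -1, iterations


def bisect_search(seq, t):
    """Same lookup using the standard library's bisect module."""
    i = bisect.bisect_left(seq, t)
    return i if i < len(seq) and seq[i] == t else -1


if __name__ == "__main__":
    data = list(range(1000000))
    worst = 0
    for t in range(3000):
        index, iterations = binary_search(data, t)
        assert index == bisect_search(data, t)
        worst = max(worst, iterations)
    print("worst-case iterations:", worst)
```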
+
+<a name="Dictionary sort"></a>
+<h3> Dictionary Sort</h3>
+<p>
+The demo in examples/dict_sort.py is another example from the Python Cookbook.
+<a href="http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/52306">This
+submission</a>, by Alex Martelli, demonstrates how to return the values from a
+dictionary sorted by their keys:
+
+ <blockquote><pre><code>
+ def sortedDictValues3(adict):
+     keys = adict.keys()
+     keys.sort()
+     return map(adict.get, keys)
+ </code></pre></blockquote>
+<p>
+Alex provides 3 algorithms and this is the 3rd and fastest of the set. The C
+version of this same algorithm follows:
+
+ <blockquote><pre><code>
+ def c_sort(adict):
+     assert(type(adict) == type({}))
+     code = """
+            #line 21 "dict_sort.py"
+            Py::List keys = adict.keys();
+            Py::List items(keys.length()); keys.sort();
+            PyObject* item = NULL;
+            for(int i = 0; i < keys.length();i++)
+            {
+                item = PyList_GET_ITEM(keys.ptr(),i);
+                item = PyDict_GetItem(adict.ptr(),item);
+                Py_XINCREF(item);
+                PyList_SetItem(items.ptr(),i,item);
+            }
+            return_val = Py::new_reference_to(items);
+            """
+     return weave.inline(code,['adict'],verbose=1)
+ </code></pre></blockquote>
+<p>
Like the original Python function, the C++ version can handle any Python
dictionary regardless of the key/value pair types. It mostly uses CXX objects
to declare Python types in C++, but uses Python API calls to manipulate
their contents. Again, this choice is made for speed. The C++ version, while
more complicated, is about a factor of 2 faster than Python.
+
+ <blockquote><pre><code>
+ C:\home\ej\wrk\scipy\weave\examples> python dict_sort.py
+ Dict sort of 1000 items for 300 iterations:
+ speed in python: 0.319999933243
+ [0, 1, 2, 3, 4]
+ speed in c: 0.151000022888
+ speed up: 2.12
+ [0, 1, 2, 3, 4]
+ </code></pre></blockquote>
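<p>
On Python 2.4 and later (including Python 3), the same recipe can be written
without the in-place <code>keys.sort()</code> call by using the built-in
<code>sorted()</code>. A minimal sketch follows; the name
<code>sorted_dict_values</code> is ours, not Alex's, and it uses the same
algorithm: sort the keys, then look up each value.

 <blockquote><pre><code>
def sorted_dict_values(adict):
    """Return the values of adict ordered by their keys.

    Same idea as sortedDictValues3, written with sorted() so it also runs on
    Python 3, where keys() returns a view without a sort() method and map()
    returns an iterator rather than a list.
    """
    return [adict[key] for key in sorted(adict)]

sample = dict((i, i * i) for i in range(1000))
first_five = sorted_dict_values(sample)[:5]
 </code></pre></blockquote>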
+<p>
<a name="Numeric -- cast/copy/transpose"></a>
+<h3>Numeric -- cast/copy/transpose</h3>
+
CastCopyTranspose is a function called quite heavily by linear algebra routines
in the Numeric library. It's needed in part because of the row-major memory
layout of multi-dimensional Python (and C) arrays vs. the column-major order of
the underlying Fortran algorithms. For small matrices (say 100x100 or less), a
significant portion of the run time of common routines such as LU decomposition
or singular value decomposition is spent in this setup routine. This shouldn't
happen. Here is the Python version of the function using standard Numeric
operations.
+
+ <blockquote><pre><code>
 import copy
 import Numeric

 def _castCopyAndTranspose(type, a):
     # Transpose, cast only if the typecode differs, and return a copy.
     if a.typecode() == type:
         cast_array = copy.copy(Numeric.transpose(a))
     else:
         cast_array = copy.copy(Numeric.transpose(a).astype(type))
     return cast_array
+ </code></pre></blockquote>
+
And the following is an inline C++ version of the same function:
+
+ <blockquote><pre><code>
 from Numeric import shape, zeros
 from weave.blitz_tools import blitz_type_factories
 from weave import scalar_spec
 from weave import inline

 def _cast_copy_transpose(type,a_2d):
     assert(len(shape(a_2d)) == 2)
     # The result has the transposed shape: (columns, rows) of a_2d.
     new_array = zeros((shape(a_2d)[1], shape(a_2d)[0]),type)
     numeric_type = scalar_spec.numeric_to_blitz_type_mapping[type]
     code = \
     """
     for(int i = 0;i < _Na_2d[1]; i++)
         for(int j = 0; j < _Na_2d[0]; j++)
             new_array(i,j) = (%s) a_2d(j,i);
     """ % numeric_type
     inline(code,['new_array','a_2d'],
            type_factories = blitz_type_factories,compiler='gcc')
     return new_array
+ </code></pre></blockquote>
+
This example uses blitz++ arrays instead of the standard representation of
Numeric arrays so that indexing is simpler to write. This is accomplished by
passing in the blitz++ "type factories" to override the standard Python to C++
type conversions. Blitz++ arrays allow you to write clean, fast code, but they
also are sloooow to compile (20 seconds or more for this snippet). This is why
they aren't the default type used for Numeric arrays (and also because most
compilers can't compile blitz arrays...). <code>inline()</code> is also forced
to use 'gcc' as the compiler because the default compiler on Windows (MSVC)
will not compile blitz code. <em>I think 'gcc' will use the standard compiler
on Unix machines instead of explicitly forcing gcc (check this).</em>
+
+Comparisons of the Python vs inline C++ code show a factor of 3 speed up. Also
+shown are the results of an "inplace" transpose routine that can be used if the
+output of the linear algebra routine can overwrite the original matrix (this is
+often appropriate). This provides another factor of 2 improvement.
+
+ <blockquote><pre><code>
+ #C:\home\ej\wrk\scipy\weave\examples> python cast_copy_transpose.py
+ # Cast/Copy/Transposing (150,150)array 1 times
+ # speed in python: 0.870999932289
+ # speed in c: 0.25
+ # speed up: 3.48
+ # inplace transpose c: 0.129999995232
+ # speed up: 6.70
+ </code></pre></blockquote>
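<p>
To make the semantics concrete without Numeric or blitz++, here is a
pure-Python sketch on nested lists. Both helper names are hypothetical. The
inplace routine from cast_copy_transpose.py isn't shown here, so the second
function is only our assumption of what an in-place transpose has to do: for a
square matrix it can swap elements across the diagonal instead of allocating
and filling a new array, which is presumably where the extra saving comes from.

 <blockquote><pre><code>
def cast_copy_transpose(cast, a):
    """Return a new transposed copy of the 2-D list a, converting each
    element with cast (for example float). a itself is not modified."""
    rows, cols = len(a), len(a[0])
    return [[cast(a[j][i]) for j in range(rows)] for i in range(cols)]


def transpose_inplace(m):
    """Transpose the square 2-D list m in place by swapping across the diagonal."""
    n = len(m)
    if any(len(row) != n for row in m):
        raise ValueError("an in-place transpose needs a square matrix")
    for i in range(n):
        for j in range(i + 1, n):
            m[i][j], m[j][i] = m[j][i], m[i][j]
    return m
 </code></pre></blockquote>

The in-place variant only works when the output may overwrite the input and
the matrix is square; otherwise the copying version is required.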
+
<a name="wxPython"></a>
+<h3>wxPython</h3>
+
<code>inline</code> knows how to handle wxPython objects. That's nice in and of
itself, but it also demonstrates that the type conversion mechanism is reasonably
flexible. Chances are, it won't take a ton of effort to support special types
you might have. The examples/wx_example.py borrows the scrolled window
example from the wxPython demo, except that it mixes inline C++ code in the middle
of the drawing function.
+
+ <blockquote><pre><code>
 def DoDrawing(self, dc):

     red = wxNamedColour("RED")
     blue = wxNamedColour("BLUE")
     grey_brush = wxLIGHT_GREY_BRUSH
     code = \
     """
     #line 108 "wx_example.py"
     dc->BeginDrawing();
     dc->SetPen(wxPen(*red,4,wxSOLID));
     dc->DrawRectangle(5,5,50,50);
     dc->SetBrush(*grey_brush);
     dc->SetPen(wxPen(*blue,4,wxSOLID));
     dc->DrawRectangle(15, 15, 50, 50);
     """
     inline(code,['dc','red','blue','grey_brush'])

     dc.SetFont(wxFont(14, wxSWISS, wxNORMAL, wxNORMAL))
     dc.SetTextForeground(wxColour(0xFF, 0x20, 0xFF))
     te = dc.GetTextExtent("Hello World")
     dc.DrawText("Hello World", 60, 65)

     dc.SetPen(wxPen(wxNamedColour('VIOLET'), 4))
     dc.DrawLine(5, 65+te[1], 60+te[0], 65+te[1])
     ...
+ </code></pre></blockquote>
+
Here, some of the Python calls to wx objects were just converted to C++ calls.
There isn't any benefit in this case; it just demonstrates the capabilities. You
might want to use this if you have a computationally intensive loop in your
drawing code that you want to speed up.
+
On Windows, you'll have to use the MSVC compiler if you use the standard wxPython
DLLs distributed by Robin Dunn. That's because MSVC and gcc, while binary
compatible in C, are not binary compatible for C++. In fact, no matter what
platform you're on, it's probably best to specify that <code>inline</code> use
the same compiler that was used to build wxPython, just to be on the safe side.
There isn't currently a way to learn this info from the library -- you just have
to know. Also, at least on the Windows platform, you'll need to install the
wxWindows libraries and link to them. I think there is a way around this, but I
haven't found it yet -- I get some linking errors dealing with wxString. One
final note: you'll probably have to tweak weave/wx_spec.py or weave/wx_info.py
for your machine's configuration to point at the correct directories etc. There.
That should sufficiently scare people into not even looking at this... :)
+
+<a name="Keyword Options"></a>
+<h2> Keyword Options </h2>
+<p>
The basic definition of the <code>inline()</code> function has a slew of
optional arguments. It also takes keyword arguments that are passed to
<code>distutils</code> as compiler options. The following is a formatted
cut/paste of the argument section of <code>inline's</code> doc-string. It
explains all of the arguments. Some examples using various options will
follow.
+
+ <blockquote><pre><code>
 def inline(code, arg_names, local_dict=None, global_dict=None,
            force=0,
            compiler='',
            verbose=0,
            support_code=None,
            customize=None,
            type_factories=None,
            auto_downcast=1,
            **kw):
+ </code></pre></blockquote>
+
+
+<code>inline</code> has quite
+a few options as listed below. Also, the keyword arguments for distutils
+extension modules are accepted to specify extra information needed for
+compiling.
+<h4>inline Arguments:</h4>
+<blockquote>
+<dl>
+<dt>code </dt>
+
+<dd>
string. A string of valid C++ code. It should not
 specify a return statement. Instead, it should assign results that need to be
 returned to Python to the <code>return_val</code> variable.
+</dd>
+
+<dt>arg_names </dt>
+
+<dd>
+list of strings. A list of Python variable names
+ that should be transferred from Python into the C/C++ code.
+</dd>
+
+<dt>local_dict </dt>
+
+<dd>
+optional. dictionary. If specified, it is a
+ dictionary of values that should be used as the local scope for the C/C++
+ code. If local_dict is not specified the local dictionary of the calling
+ function is used.
+</dd>
+
+<dt>global_dict </dt>
+
+<dd>
+optional. dictionary. If specified, it is a
+ dictionary of values that should be used as the global scope for the C/C++
+ code. If global_dict is not specified the global dictionary of the calling
+ function is used.
+</dd>
+
+<dt>force </dt>
+
+<dd>
+optional. 0 or 1. default 0. If 1, the C++ code is
+ compiled every time inline is called. This is really only useful for
+ debugging, and probably only useful if you're editing support_code a lot.
+</dd>
+
+<dt>compiler </dt>
+
+<dd>
optional. string. The name of the compiler to use when compiling. On Windows, it
understands 'msvc' and 'gcc' as well as all the compiler names understood by
distutils. On Unix, it'll only understand the values understood by distutils.
(I should add 'gcc' though to this).
<p>
On Windows, the compiler defaults to the Microsoft C++ compiler. If this isn't
available, it looks for mingw32 (the gcc compiler).
+<p>
+On Unix, it'll probably use the same compiler that was used when compiling
+Python. Cygwin's behavior should be similar.</p>
+</dd>
+
+<dt>verbose </dt>
+
+<dd>
optional. 0, 1, or 2. default 0. Specifies how much
 information is printed during the compile phase of inlining code. 0 is
 silent (except on Windows with msvc where it still prints some garbage). 1
 informs you when compiling starts, finishes, and how long it took. 2 prints
 out the command lines for the compilation process and can be useful if you're
 having problems getting code to work. It's handy for finding the name of the
 .cpp file if you need to examine it. verbose has no effect if the
 compilation isn't necessary.
+</dd>
+
+<dt>support_code </dt>
+
+<dd>
+optional. string. A string of valid C++ code
+ declaring extra code that might be needed by your compiled function. This
+ could be declarations of functions, classes, or structures.
+</dd>
+
+<dt>customize </dt>
+
+<dd>
optional. base_info.custom_info object. An
 alternative way to specify support_code, headers, etc. needed by the
 function; see the weave.base_info module for more details. (not sure
 this'll be used much).
+
+</dd>
+<dt>type_factories </dt>
+
+<dd>
+optional. list of type specification factories. These guys are what convert
+Python data types to C/C++ data types. If you'd like to use a different set of
+type conversions than the default, specify them here. Look in the type
+conversions section of the main documentation for examples.
+</dd>
+<dt>auto_downcast </dt>
+
+<dd>
+optional. 0 or 1. default 1. This only affects functions that have Numeric
+arrays as input variables. Setting this to 1 will cause all floating point
+values to be cast as float instead of double if all the Numeric arrays are of
+type float. If even one of the arrays has type double or double complex, all
variables maintain their standard types.
+</dd>
+</dl>
+</blockquote>
+
+<h4> Distutils keywords:</h4>
+<blockquote>
+<code>inline()</code> also accepts a number of <code>distutils</code> keywords
+for controlling how the code is compiled. The following descriptions have been
copied from Greg Ward's <code>distutils.extension.Extension</code> class
doc-strings for convenience:
+
+<dl>
+<dt>sources </dt>
+
+<dd>
+[string] list of source filenames, relative to the
+ distribution root (where the setup script lives), in Unix form
+ (slash-separated) for portability. Source files may be C, C++, SWIG (.i),
+ platform-specific resource files, or whatever else is recognized by the
 "build_ext" command as source for a Python extension. Note: The module_path
 file is always inserted at the front of this list.
+</dd>
+
+<dt>include_dirs </dt>
+
+<dd>
+[string] list of directories to search for C/C++
+ header files (in Unix form for portability)
+</dd>
+
+<dt>define_macros </dt>
+
+<dd>
+[(name : string, value : string|None)] list of
+ macros to define; each macro is defined using a 2-tuple, where 'value' is
+ either the string to define it to or None to define it without a particular
+ value (equivalent of "#define FOO" in source or -DFOO on Unix C compiler
+ command line)
+</dd>
+<dt>undef_macros </dt>
+
+<dd>
+[string] list of macros to undefine explicitly
+</dd>
+<dt>library_dirs </dt>
+<dd>
+[string] list of directories to search for C/C++ libraries at link time
+</dd>
+<dt>libraries </dt>
+<dd>
+[string] list of library names (not filenames or paths) to link against
+</dd>
+<dt>runtime_library_dirs </dt>
+<dd>
+[string] list of directories to search for C/C++ libraries at run time (for
+shared extensions, this is when the extension is loaded)
+</dd>
+
+<dt>extra_objects </dt>
+
+<dd>
[string] list of extra files to link with (e.g.
+ object files not implied by 'sources', static library that must be
+ explicitly specified, binary resource files, etc.)
+</dd>
+
+<dt>extra_compile_args </dt>
+
+<dd>
+[string] any extra platform- and compiler-specific
+ information to use when compiling the source files in 'sources'. For
+ platforms and compilers where "command line" makes sense, this is typically
+ a list of command-line arguments, but for other platforms it could be
+ anything.
+</dd>
+<dt>extra_link_args </dt>
+
+<dd>
+[string] any extra platform- and compiler-specific
+ information to use when linking object files together to create the
+ extension (or to create a new static Python interpreter). Similar
+ interpretation as for 'extra_compile_args'.
+</dd>
+<dt>export_symbols </dt>
+
+<dd>
+[string] list of symbols to be exported from a shared extension. Not used on
+all platforms, and not generally necessary for Python extensions, which
+typically export exactly one symbol: "init" + extension_name.
+</dd>
+</dl>
+</blockquote>
+
+<a name="Keyword Option Examples"></a>
+<h3> Keyword Option Examples</h3>
+We'll walk through several examples here to demonstrate the behavior of
+<code>inline</code> and also how the various arguments are used.
+
In the simplest (and most common) cases, <code>code</code> and <code>arg_names</code>
are the only arguments that need to be specified. Here's a simple example
run on a Windows machine that has Microsoft VC++ installed.
+
+ <blockquote><pre><code>
+ >>> from weave import inline
+ >>> a = 'string'
+ >>> code = """
+ ... int l = a.length();
+ ... return_val = Py::new_reference_to(Py::Int(l));
+ ... """
+ >>> inline(code,['a'])
 sc_86e98826b65b047ffd2cd5f479c627f12.cpp
    Creating library C:\DOCUME~1\eric\LOCALS~1\Temp\python21_compiled\temp\Release\sc_86e98826b65b047ffd2cd5f479c627f12.lib
 and object C:\DOCUME~1\eric\LOCALS~1\Temp\python21_compiled\temp\Release\sc_86e98826b65b047ffd2cd5f479c627f12.exp
+ 6
+ >>> inline(code,['a'])
+ 6
+ </code></pre></blockquote>
+
When <code>inline</code> is first run, you'll notice a pause and some
trash printed to the screen. The "trash" is actually part of the compiler's
output that distutils does not suppress. The name of the extension file,
<code>sc_bighonkingnumber.cpp</code>, is generated from the md5 checksum
of the C/C++ code fragment. On Unix or Windows machines with only
gcc installed, the trash will not appear. On the second call, the code
fragment is not compiled since it already exists, and only the answer is
returned. Now kill the interpreter and restart, and run the same code with
a different string.
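<p>
The naming idea can be sketched with Python's standard <code>hashlib</code>
module. This only illustrates the principle and is not weave's code: the names
printed above have one more character than a 32-digit md5 hex digest, so weave
evidently appends something that this hypothetical <code>extension_name</code>
helper leaves out.

 <blockquote><pre><code>
import hashlib

def extension_name(code):
    """Hypothetical: derive a module name from the md5 digest of the code."""
    return "sc_" + hashlib.md5(code.encode("utf-8")).hexdigest()

code = """
int l = a.length();
return_val = Py::new_reference_to(Py::Int(l));
"""
name = extension_name(code)
 </code></pre></blockquote>

Because the digest depends only on the text of the snippet, the same code maps
to the same name in every session, while changing even one character yields a
different name.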
+
+ <blockquote><pre><code>
+ >>> from weave import inline
+ >>> a = 'a longer string'
+ >>> code = """
+ ... int l = a.length();
+ ... return_val = Py::new_reference_to(Py::Int(l));
+ ... """
+ >>> inline(code,['a'])
+ 15
+ </code></pre></blockquote>
+<p>
Notice that this time, <code>inline()</code> did not recompile the code because it
+found the compiled function in the persistent catalog of functions. There is
+a short pause as it looks up and loads the function, but it is much shorter
+than compiling would require.
+<p>
+You can specify the local and global dictionaries if you'd like (much like
+<code>exec</code> or <code>eval()</code> in Python), but if they aren't
+specified, the "expected" ones are used -- i.e. the ones from the function that
called <code>inline()</code>. This is accomplished through a little call
+frame trickery. Here is an example where the local_dict is specified using
+the same code example from above:
+
+ <blockquote><pre><code>
+ >>> a = 'a longer string'
+ >>> b = 'an even longer string'
+ >>> my_dict = {'a':b}
+ >>> inline(code,['a'])
+ 15
+ >>> inline(code,['a'],my_dict)
+ 21
+ </code></pre></blockquote>
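<p>
The "call frame trickery" can be sketched in pure Python with CPython's
<code>sys._getframe()</code>, which returns a caller's frame object along with
its <code>f_locals</code> and <code>f_globals</code> dictionaries. The
<code>resolve_args</code> helper below is hypothetical and only gathers values;
it is not weave's implementation, and it assumes local names take precedence
over global ones, as in ordinary Python lookup.

 <blockquote><pre><code>
import sys

def resolve_args(arg_names, local_dict=None, global_dict=None):
    """Hypothetical: collect the values inline() would hand to C++.

    Missing dictionaries default to those of the calling frame.
    """
    caller = sys._getframe(1)          # frame of the function that called us
    if local_dict is None:
        local_dict = caller.f_locals
    if global_dict is None:
        global_dict = caller.f_globals
    values = {}
    for name in arg_names:
        if name in local_dict:         # local scope first ...
            values[name] = local_dict[name]
        elif name in global_dict:      # ... then global scope
            values[name] = global_dict[name]
        else:
            raise NameError("name '%s' is not defined" % name)
    return values

def demo():
    a = 'a longer string'
    b = 'an even longer string'
    my_dict = {'a': b}
    # Mirrors the two inline() calls above: caller's scope vs. explicit dict.
    return (len(resolve_args(['a'])['a']),
            len(resolve_args(['a'], my_dict)['a']))
 </code></pre></blockquote>

<code>sys._getframe()</code> is a CPython implementation detail, which is one
reason this kind of lookup is best kept inside a library rather than spread
through user code.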
+
+<p>
Every time the <code>code</code> is changed, <code>inline</code> does a
recompile. However, changing any of the other options in inline does not
force a recompile. The <code>force</code> option was added so that one
could force a recompile when tinkering with other variables. In practice,
it is just as easy to change the <code>code</code> by a single character
(like adding a space someplace) to force the recompile. <em>Note: It also
might be nice to add some methods for purging the cache and on-disk
catalogs.</em>
+<p>
+I use <code>verbose</code> sometimes for debugging. When set to 2, it'll
+output all the information (including the name of the .cpp file) that you'd
+expect from running a make file. This is nice if you need to examine the
+generated code to see where things are going haywire. Note that error
+messages from failed compiles are printed to the screen even if <code>verbose
+</code> is set to 0.
+<p>
The following example demonstrates using gcc instead of the standard msvc
compiler on Windows, using the same code fragment as above. Because the example has
+already been compiled, the <code>force=1</code> flag is needed to make
+<code>inline()</code> ignore the previously compiled version and recompile
+using gcc. The verbose flag is added to show what is printed out:
+
+ <blockquote><pre><code>
+ >>>inline(code,['a'],compiler='gcc',verbose=2,force=1)
+ running build_ext
+ building 'sc_86e98826b65b047ffd2cd5f479c627f13' extension
+ c:\gcc-2.95.2\bin\g++.exe -mno-cygwin -mdll -O2 -w -Wstrict-prototypes -IC:
+ \home\ej\wrk\scipy\weave -IC:\Python21\Include -c C:\DOCUME~1\eric\LOCAL
+ S~1\Temp\python21_compiled\sc_86e98826b65b047ffd2cd5f479c627f13.cpp -o C:\D
+ OCUME~1\eric\LOCALS~1\Temp\python21_compiled\temp\Release\sc_86e98826b65b04
+ 7ffd2cd5f479c627f13.o
+ skipping C:\home\ej\wrk\scipy\weave\CXX\cxxextensions.c (C:\DOCUME~1\eri
+ c\LOCALS~1\Temp\python21_compiled\temp\Release\cxxextensions.o up-to-date)
+ skipping C:\home\ej\wrk\scipy\weave\CXX\cxxsupport.cxx (C:\DOCUME~1\eric
+ \LOCALS~1\Temp\python21_compiled\temp\Release\cxxsupport.o up-to-date)
+ skipping C:\home\ej\wrk\scipy\weave\CXX\IndirectPythonInterface.cxx (C:\
+ DOCUME~1\eric\LOCALS~1\Temp\python21_compiled\temp\Release\indirectpythonin
+ terface.o up-to-date)
+ skipping C:\home\ej\wrk\scipy\weave\CXX\cxx_extensions.cxx (C:\DOCUME~1\
+ eric\LOCALS~1\Temp\python21_compiled\temp\Release\cxx_extensions.o up-to-da
+ te)
+ writing C:\DOCUME~1\eric\LOCALS~1\Temp\python21_compiled\temp\Release\sc_86
+ e98826b65b047ffd2cd5f479c627f13.def
+ c:\gcc-2.95.2\bin\dllwrap.exe --driver-name g++ -mno-cygwin -mdll -static -
+ -output-lib C:\DOCUME~1\eric\LOCALS~1\Temp\python21_compiled\temp\Release\l
+ ibsc_86e98826b65b047ffd2cd5f479c627f13.a --def C:\DOCUME~1\eric\LOCALS~1\Te
+ mp\python21_compiled\temp\Release\sc_86e98826b65b047ffd2cd5f479c627f13.def
+ -s C:\DOCUME~1\eric\LOCALS~1\Temp\python21_compiled\temp\Release\sc_86e9882
+ 6b65b047ffd2cd5f479c627f13.o C:\DOCUME~1\eric\LOCALS~1\Temp\python21_compil
+ ed\temp\Release\cxxextensions.o C:\DOCUME~1\eric\LOCALS~1\Temp\python21_com
+ piled\temp\Release\cxxsupport.o C:\DOCUME~1\eric\LOCALS~1\Temp\python21_com
+ piled\temp\Release\indirectpythoninterface.o C:\DOCUME~1\eric\LOCALS~1\Temp
+ \python21_compiled\temp\Release\cxx_extensions.o -LC:\Python21\libs -lpytho
+ n21 -o C:\DOCUME~1\eric\LOCALS~1\Temp\python21_compiled\sc_86e98826b65b047f
+ fd2cd5f479c627f13.pyd
+ 15
+ </code></pre></blockquote>
+
+That's quite a bit of output. <code>verbose=1</code> just prints the compile
+time.
+
+ <blockquote><pre><code>
+ >>>inline(code,['a'],compiler='gcc',verbose=1,force=1)
+ Compiling code...
+ finished compiling (sec): 6.00800001621
+ 15
+ </code></pre></blockquote>
+
+<p>
<em> Note: I've only used the <code>compiler</code> option for switching between 'msvc'
and 'gcc' on Windows. It may have use on Unix also, but I don't know yet.
+</em>
+
+<p>
+The <code>support_code</code> argument is likely to be used a lot. It allows
+you to specify extra code fragments such as function, structure or class
+definitions that you want to use in the <code>code</code> string. Note that
+changes to <code>support_code</code> do <em>not</em> force a recompile. The
+catalog only relies on <code>code</code> (for performance reasons) to determine
+whether recompiling is necessary. So, if you make a change to support_code,
+you'll need to alter <code>code</code> in some way or use the
+<code>force</code> argument to get the code to recompile. I usually just add
some innocuous whitespace to the end of one of the lines in <code>code</code>.
Here's an example of defining a separate function for calculating
+the string length:
+
+ <blockquote><pre><code>
+ >>> from weave import inline
+ >>> a = 'a longer string'
+ >>> support_code = """
 ... PyObject* length(Py::String a)
 ... {
 ...     int l = a.length();
 ...     return Py::new_reference_to(Py::Int(l));
 ... }
+ ... """
+ >>> inline("return_val = length(a);",['a'],
+ ... support_code = support_code)
+ 15
+ </code></pre></blockquote>
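<p>
Why editing <code>support_code</code> alone doesn't trigger a rebuild is easy
to see in a toy model. The <code>ToyCatalog</code> class below is hypothetical
and keeps everything in memory; it keys "compiled" results only on an md5
digest of <code>code</code>, as described above, and honors a
<code>force</code> flag. The real catalog is persistent and far more involved.

 <blockquote><pre><code>
import hashlib

class ToyCatalog(object):
    """Hypothetical in-memory catalog keyed only on the code string."""

    def __init__(self):
        self._functions = {}
        self.compile_count = 0

    def _compile(self, code, support_code):
        # Stand-in for the expensive C++ build; records what was "built".
        self.compile_count += 1
        return (code, support_code)

    def get(self, code, support_code="", force=False):
        key = hashlib.md5(code.encode("utf-8")).hexdigest()
        if force or key not in self._functions:
            self._functions[key] = self._compile(code, support_code)
        return self._functions[key]

catalog = ToyCatalog()
catalog.get("return_val = length(a);", support_code="/* v1 */")
# Same code, new support_code: the cached /* v1 */ build is returned.
stale = catalog.get("return_val = length(a);", support_code="/* v2 */")
 </code></pre></blockquote>

Adding a trailing space to <code>code</code>, or passing
<code>force=True</code>, is what makes the toy catalog build again.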
+<p>
<code>customize</code> is a leftover from a previous way of specifying
+compiler options. It is a <code>custom_info</code> object that can specify
+quite a bit of information about how a file is compiled. These
+<code>info</code> objects are the standard way of defining compile information
+for type conversion classes. However, I don't think they are as handy here,
+especially since we've exposed all the keyword arguments that distutils can
+handle. Between these keywords, and the <code>support_code</code> option, I
+think <code>customize</code> may be obsolete. We'll see if anyone cares to use
+it. If not, it'll get axed in the next version.
+<p>
+The <code>type_factories</code> variable is important to people who want to
+customize the way arguments are converted from Python to C. We'll talk about
+this in the next chapter **xx** of this document when we discuss type
+conversions.
+<p>
+<code>auto_downcast</code> handles one of the big type conversion issues that
+is common when using Numeric arrays in conjunction with Python scalar values.
+If you have an array of single precision values and multiply that array by a
+Python scalar, the result is upcast to a double precision array because the
+scalar value is double precision. This is not usually the desired behavior
+because it can double your memory usage. <code>auto_downcast</code> goes
+some distance towards changing the casting precedence of arrays and scalars.
If you're only using single precision arrays, it will automatically downcast all
scalar values from double to single precision when they are passed into the
C++ code. This is the default behavior. If you want all values to keep their
default type, set <code>auto_downcast</code> to 0.
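<p>
Below is a hypothetical Python model of the rule, using Numeric-style typecodes
('f' float, 'd' double, 'F'/'D' the complex versions). It reads the
description strictly, so scalars are downcast only when every array is 'f',
and with no arrays at all they are left alone (our assumption). The standard
<code>struct</code> and <code>array</code> modules show the two costs being
traded: single precision loses digits, and double precision takes twice the
memory.

 <blockquote><pre><code>
import struct
from array import array

def scalar_typecode(array_typecodes, auto_downcast=True):
    """Hypothetical: typecode used for Python float scalars passed to C++."""
    if auto_downcast and array_typecodes and all(tc == 'f' for tc in array_typecodes):
        return 'f'      # every array is single precision: downcast scalars too
    return 'd'          # otherwise scalars keep their default double type

def as_single(x):
    """Round a Python float (a C double) to the nearest C float value."""
    return struct.unpack('f', struct.pack('f', x))[0]

bytes_per_float = array('f').itemsize    # 4 on common platforms
bytes_per_double = array('d').itemsize   # 8 on common platforms
 </code></pre></blockquote>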
+<p>
+
+
+<a name="Returning Values"></a>
+<h3> Returning Values</h3>
+
Python variables in the local and global scope transfer seamlessly from Python
+into the C++ snippets. And, if <code>inline</code> were to completely live up
+to its name, any modifications to variables in the C++ code would be reflected
+in the Python variables when control was passed back to Python. For example,
+the desired behavior would be something like:
+
+ <blockquote><pre><code>
+ # THIS DOES NOT WORK
+ >>> a = 1
+ >>> weave.inline("a++;",['a'])
+ >>> a
+ 2
+ </code></pre></blockquote>
+
+Instead you get:
+
+ <blockquote><pre><code>
+ >>> a = 1
+ >>> weave.inline("a++;",['a'])
+ >>> a
+ 1
+ </code></pre></blockquote>
+
Variables are passed into C++ as if you are calling a Python function. Python's
calling convention is sometimes called "pass by assignment". This means it's as
if a <code>c_a = a</code> assignment is made right before the <code>inline</code>
call and the <code>c_a</code> variable is used within the C++ code.
+Thus, any changes made to <code>c_a</code> are not reflected in Python's
+<code>a</code> variable. Things do get a little more confusing, however, when
+looking at variables with mutable types. Changes made in C++ to the contents
+of mutable types <em>are</em> reflected in the Python variables.
+
+ <blockquote><pre><code>
+ >>> a= [1,2]
+ >>> weave.inline("PyList_SetItem(a.ptr(),0,PyInt_FromLong(3));",['a'])
+ >>> print a
+ [3, 2]
+ </code></pre></blockquote>
+
+So modifications to the contents of mutable types in C++ are seen when control
+is returned to Python. Modifications to immutable types such as tuples,
+strings, and numbers do not alter the Python variables.
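<p>
The same rules can be seen in plain Python, since <code>inline</code> passes
variables with the same convention. The two helpers below are hypothetical
stand-ins for C++ snippets: one rebinds its argument, the other mutates the
object it refers to.

 <blockquote><pre><code>
def increment(c_a):
    # c_a starts as another name for the caller's object. Ints are immutable,
    # so += builds a new int and rebinds only the local name c_a.
    c_a += 1
    return c_a

def set_first_item(c_a):
    # Lists are mutable: changing an element alters the object the caller
    # also refers to, so the change is visible after the call.
    c_a[0] = 3

a = 1
result = increment(a)   # a is still 1; the new value comes back as result
lst = [1, 2]
set_first_item(lst)     # lst is now [3, 2]
 </code></pre></blockquote>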
+
+If you need to make changes to an immutable variable, you'll need to assign
+the new value to the "magic" variable <code>return_val</code> in C++. This
+value is returned by the <code>inline()</code> function:
+
+ <blockquote><pre><code>
+ >>> a = 1
+ >>> a = weave.inline("return_val = Py::new_reference_to(Py::Int(a+1));",['a'])
+ >>> a
+ 2
+ </code></pre></blockquote>
+
The <code>return_val</code> variable can also be used to return several newly
created values at once by returning a tuple. The following trivial example
illustrates how this can be done:
+
+ <blockquote><pre><code>
 from weave import inline_tools

 # python version
 def multi_return():
     return 1, '2nd'

 # C version.
 def c_multi_return():
     code = """
     Py::Tuple results(2);
     results[0] = Py::Int(1);
     results[1] = Py::String("2nd");
     return_val = Py::new_reference_to(results);
     """
     # No Python variables are used, so arg_names is an empty list.
     return inline_tools.inline(code, [])
+ </code></pre></blockquote>
+<p>
+The example is available in <code>examples/tuple_return.py</code>. It also
+has the dubious honor of demonstrating how much <code>inline()</code> can
+slow things down. The C version here is about 10 times slower than the Python
+version. Of course, something so trivial has no reason to be written in
+C anyway.
+
+<a name="The issue with locals()"></a>
+<h4> The issue with <code>locals()</code></h4>
+<p>
<code>inline</code> passes the <code>locals()</code> and <code>globals()</code>
dictionaries of the calling function from Python into the C++ function. It
extracts the variables that are used in the C++ code from these dictionaries,
converts them to C++ variables, and then calculates using them. It seems like
it would be trivial, after the calculations were finished, to insert
the new values back into the <code>locals()</code> and <code>globals()</code>
dictionaries so that the modified values were reflected in Python.
Unfortunately, as the Python manual points out, the dictionary returned by
locals() should not be modified; changes to it may not affect the values of
the local variables used by the interpreter.
+<p>
+<em>
+I suspect <code>locals()</code> is not writable because there are some
+optimizations done to speed lookups of the local namespace. I'm guessing local
+lookups don't always look at a dictionary to find values. Can someone "in the
+know" confirm or correct this? Another thing I'd like to know is whether there
+is a way to write to the local namespace of another stack frame from C/C++. If
+so, it would be possible to have some clean up code in compiled functions that
+wrote final values of variables in C++ back to the correct Python stack frame.
I think this goes a long way toward making <code>inline</code> truly live up
+to its name. I don't think we'll get to the point of creating variables in
+Python for variables created in C -- although I suppose with a C/C++ parser you
+could do that also.
+</em>
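<p>
The behavior is easy to observe. In CPython, function locals live in a
fixed-size array of "fast locals" on the frame, and <code>locals()</code> hands
back a dictionary built from them, so writing to that dictionary doesn't change
the variable. <code>globals()</code>, by contrast, returns the module's actual
namespace dictionary. The function and variable names below are hypothetical.

 <blockquote><pre><code>
def write_through_locals():
    x = 1
    locals()['x'] = 2      # changes the dictionary, not the local variable x
    return x

def write_through_globals():
    globals()['weave_demo_value'] = 2   # globals() is the module namespace
    return weave_demo_value             # looked up as a global at run time
 </code></pre></blockquote>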
+<p>
+
+<a name="inline_quick_look_at_code"></a>
+<h3>A quick look at the code</h3>
+
+<code>weave</code> generates a C++ file holding an extension function for
each <code>inline</code> code snippet. These file names are generated from
the md5 signature of the code snippet and saved to a location specified by
+the PYTHONCOMPILED environment variable (discussed later). The cpp files are
+generally about 200-400 lines long and include quite a few functions to support
+type conversions, etc. However, the actual compiled function is pretty simple.
+Below is the familiar <code>printf</code> example:
+
+ <blockquote><pre><code>
+ >>> import weave
+ >>> a = 1
+ >>> weave.inline('printf("%d\\n",a);',['a'])
+ 1
+ </code></pre></blockquote>
+
+And here is the extension function generated by <code>inline</code>:
+
+ <blockquote><pre><code>
 static PyObject* compiled_func(PyObject*self, PyObject* args)
 {
     // The Py_None needs an incref before returning
     PyObject *return_val = NULL;
     int exception_occured = 0;
     PyObject *py__locals = NULL;
     PyObject *py__globals = NULL;
     PyObject *py_a;
     py_a = NULL;

     if(!PyArg_ParseTuple(args,"OO:compiled_func",&py__locals,&py__globals))
         return NULL;
     try
     {
         PyObject* raw_locals = py_to_raw_dict(py__locals,"_locals");
         PyObject* raw_globals = py_to_raw_dict(py__globals,"_globals");
         int a = py_to_int (get_variable("a",raw_locals,raw_globals),"a");
         /* Here is the inline code */
         printf("%d\n",a);
         /* I would like to fill in changed locals and globals here... */
     }
     catch( Py::Exception& e)
     {
         return_val = Py::Null();
         exception_occured = 1;
     }
     if(!return_val && !exception_occured)
     {

         Py_INCREF(Py_None);
         return_val = Py_None;
     }
     /* clean up code */

     /* return */
     return return_val;
 }
+ </code></pre></blockquote>
+
+Every inline function takes exactly two arguments -- the local and global
+dictionaries for the current scope. All variable values are looked up out
+of these dictionaries. The lookups, along with all <code>inline</code> code
+execution, are done within a C++ <code>try</code> block. If the variables
+aren't found, or there is an error converting a Python variable to the
+appropriate type in C++, an exception is raised. The C++ exception
+is automatically converted to a Python exception by CXX and returned to Python.
+
The <code>py_to_int()</code> function illustrates how the conversions and
exception handling work. <code>py_to_int</code> first checks that the given PyObject*
+pointer is not NULL and is a Python integer. If all is well, it calls the
+Python API to convert the value to an <code>int</code>. Otherwise, it calls
+<code>handle_bad_type()</code> which gathers information about what went wrong
+and then raises a CXX TypeError which returns to Python as a TypeError.
+
+ <blockquote><pre><code>
 int py_to_int(PyObject* py_obj,char* name)
 {
     if (!py_obj || !PyInt_Check(py_obj))
         handle_bad_type(py_obj,"int", name);
     return (int) PyInt_AsLong(py_obj);
 }
+ </code></pre></blockquote>
+
+ <blockquote><pre><code>
 void handle_bad_type(PyObject* py_obj, char* good_type, char* var_name)
 {
     char msg[500];
     sprintf(msg,"received '%s' type instead of '%s' for variable '%s'",
             find_type(py_obj),good_type,var_name);
     throw Py::TypeError(msg);
 }

 char* find_type(PyObject* py_obj)
 {
     if(py_obj == NULL) return "C NULL value";
     if(PyCallable_Check(py_obj)) return "callable";
     if(PyString_Check(py_obj)) return "string";
     if(PyInt_Check(py_obj)) return "int";
     if(PyFloat_Check(py_obj)) return "float";
     if(PyDict_Check(py_obj)) return "dict";
     if(PyList_Check(py_obj)) return "list";
     if(PyTuple_Check(py_obj)) return "tuple";
     if(PyFile_Check(py_obj)) return "file";
     if(PyModule_Check(py_obj)) return "module";

     //should probably do more interrogation (and thinking) on these.
     if(PyCallable_Check(py_obj) && PyInstance_Check(py_obj)) return "callable";
     if(PyInstance_Check(py_obj)) return "instance";
     if(PyCallable_Check(py_obj)) return "callable";
     return "unknown type";
 }
+ </code></pre></blockquote>
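<p>
For readers more comfortable in Python, here is a rough Python analogue of the
conversion path above. It is a hypothetical sketch, not code that weave
generates: <code>find_type</code> maps a few built-in types to the same names
(checking callables first, as the C version does), and <code>py_to_int</code>
raises a TypeError with the same message format that
<code>handle_bad_type()</code> builds. The C NULL case has no Python
equivalent and is omitted.

 <blockquote><pre><code>
def find_type(obj):
    """Return a short type name, checking callables first like the C version."""
    if callable(obj):
        return "callable"
    for py_type, name in ((str, "string"), (int, "int"), (float, "float"),
                          (dict, "dict"), (list, "list"), (tuple, "tuple")):
        if isinstance(obj, py_type):
            return name
    return "unknown type"

def py_to_int(obj, name):
    """Return obj as an int, or raise TypeError naming the offending variable."""
    if not isinstance(obj, int):
        raise TypeError("received '%s' type instead of 'int' for variable '%s'"
                        % (find_type(obj), name))
    return int(obj)
 </code></pre></blockquote>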
+
Since the <code>inline</code> code is also executed within the <code>try/catch</code>
block, you can use CXX exceptions within your code. It is usually a bad idea
to directly <code>return</code> from your code, even if an error occurs, because
this skips the clean up section of the extension function. In this simple example,
there isn't any clean up code, but in more complicated examples, there may
be some reference counting on converted variables that needs to be taken care
of here. To avoid this, either use exceptions or set
<code>return_val</code> to NULL and use <code>if/then's</code> to skip code
after errors.
+
+<a name="inline_technical_details"></a>
+<h2> Technical Details </h2>
+<p>
+There are several main steps to using C/C++ code within Python:
+<ol>
+ <li>Type conversion
+ <li>Generating C/C++ code
+ <li>Compiling the code to an extension module
+ <li>Cataloging (and caching) the function for future use</li>
+</ol>
+<p>
+Items 1 and 2 above are related, but are most easily discussed separately.
+Type conversions are customizable by the user if needed, and understanding them
+is pretty important for anything beyond trivial uses of <code>inline</code>.
+Generating the C/C++ code is handled by the <code>ext_function</code> and
+<code>ext_module</code> classes. For the most part, compiling the code is
+handled by distutils. Some customizations were needed, but they were
+relatively minor and do not require changes to distutils itself. Cataloging is
+pretty simple in concept, but surprisingly required the most code to implement
+(and still likely needs some work). So, this section covers items 1 and 4 from
+the list. Item 2 is covered later in the chapter on the
+<code>ext_tools</code> module, and distutils is covered by a completely
+separate document xxx.
+
+<h2>Passing Variables in/out of the C/C++ code</h2>
+<em>
+Note: Passing variables into the C code is pretty straightforward, but there
+are subtleties to how variable modifications in C are returned to Python. See <a
+href="#Returning values">Returning Values</a> for a more thorough discussion of
+this issue.
+</em>
+
+<A name="Converting Types"></a>
+<h2>Type Conversions</h2>
+
+<em>
+Note: <code>xxx_converter</code> might be a more descriptive name than
+<code>xxx_specification</code>. This might change in a future version.
+</em>
+
+<p>
+By default, <code>inline()</code> makes the following type conversions between
+Python and C++ types.
+<p>
+
+<center>
+<table border=1 width="100%">
+<tr><td colspan="2" width="100%">
+ <P align=center>Default Data Type Conversions</P> </td></tr>
+<tr><td>
+ <P align=center>Python</P></td><td>
+ <P align=center>C++</P></td></tr>
+<tr><td>&nbsp;&nbsp; int</td><td>&nbsp;&nbsp; int</td></tr>
+<tr><td>&nbsp;&nbsp; float</td><td>&nbsp;&nbsp; double</td></tr>
+<tr><td>&nbsp;&nbsp; complex</td><td>&nbsp;&nbsp; std::complex&lt;double&gt;</td></tr>
+<tr><td>&nbsp;&nbsp; string</td><td>&nbsp;&nbsp; Py::String</td></tr>
+<tr><td>&nbsp;&nbsp; list</td><td>&nbsp;&nbsp; Py::List</td></tr>
+<tr><td>&nbsp;&nbsp; dict</td><td>&nbsp;&nbsp; Py::Dict</td></tr>
+<tr><td>&nbsp;&nbsp; tuple</td><td>&nbsp;&nbsp; Py::Tuple</td></tr>
+<tr><td>&nbsp;&nbsp; file</td><td>&nbsp;&nbsp; FILE*</td></tr>
+<tr><td>&nbsp;&nbsp; callable</td><td>&nbsp;&nbsp; PyObject*</td></tr>
+<tr><td>&nbsp;&nbsp; instance</td><td>&nbsp;&nbsp; PyObject*</td></tr>
+<tr><td>&nbsp;&nbsp; Numeric.array</td><td>&nbsp;&nbsp; PyArrayObject*</td></tr>
+<tr><td>&nbsp;&nbsp; wxXXX</td><td>&nbsp;&nbsp; wxXXX*</td></tr>
+</table>
+</center>
+<p>
+The <code>Py::</code> namespace is defined by the
+<a href="http://cxx.sourceforge.net/">CXX</a> library which has C++ class
+equivalents for many Python types. <code>std::</code> is the namespace of the
+standard library in C++.
+<p>
+<em>
+Note:
+<ul>
+<li>I haven't figured out how to handle <code>long int</code> yet (I think they are currently converted
+ to <code>int</code>; check this).
+
+<li>
+Hopefully VTK will be added to the list soon</li>
+ </ul>
+</em>
+<p>
+
+Python to C++ conversions fill in code in several locations in the generated
+<code>inline</code> extension function. Below is the basic template for the
+function. This is actually the exact code that is generated by calling
+<code>weave.inline("")</code>.
+
+ <blockquote><pre><code>
+ static PyObject* compiled_func(PyObject*self, PyObject* args)
+ {
+ PyObject *return_val = NULL;
+ int exception_occured = 0;
+ PyObject *py__locals = NULL;
+ PyObject *py__globals = NULL;
+ PyObject *py_a;
+ py_a = NULL;
+
+ if(!PyArg_ParseTuple(args,"OO:compiled_func",&py__locals,&py__globals))
+ return NULL;
+ try
+ {
+ PyObject* raw_locals = py_to_raw_dict(py__locals,"_locals");
+ PyObject* raw_globals = py_to_raw_dict(py__globals,"_globals");
+ /* argument conversion code */
+ /* inline code */
+ /*I would like to fill in changed locals and globals here...*/
+
+ }
+ catch( Py::Exception& e)
+ {
+ return_val = Py::Null();
+ exception_occured = 1;
+ }
+ /* cleanup code */
+ if(!return_val && !exception_occured)
+ {
+
+ Py_INCREF(Py_None);
+ return_val = Py_None;
+ }
+
+ return return_val;
+ }
+ </code></pre></blockquote>
+
+The <code>/* inline code */</code> section is filled with the code passed to
+the <code>inline()</code> function call. The
+<code>/* argument conversion code */</code> and <code>/* cleanup code */</code>
+sections are filled with code that handles conversion from Python to C++
+types and code that deallocates memory or manipulates reference counts before
+the function returns. The following sections demonstrate how these two areas
+are filled in by the default conversion methods.
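+<p>
+To make the fill-in-the-blanks idea concrete, here is a hypothetical Python 3
+sketch of splicing the three sections into a slightly trimmed copy of the
+template above. <code>TEMPLATE</code> and <code>build_wrapper()</code> are
+illustrative names only; the real code generation lives in weave's
+<code>ext_function</code> and <code>ext_module</code> classes.

```python
import textwrap

# Trimmed copy of the wrapper template; {{ and }} are literal braces for str.format.
TEMPLATE = """\
static PyObject* compiled_func(PyObject* self, PyObject* args)
{{
    PyObject *return_val = NULL;
    int exception_occured = 0;
    PyObject *py__locals = NULL;
    PyObject *py__globals = NULL;

    if(!PyArg_ParseTuple(args,"OO:compiled_func",&py__locals,&py__globals))
        return NULL;
    try
    {{
        PyObject* raw_locals = py_to_raw_dict(py__locals,"_locals");
        PyObject* raw_globals = py_to_raw_dict(py__globals,"_globals");
        /* argument conversion code */
{conversion}
        /* inline code */
{inline}
    }}
    catch( Py::Exception& e)
    {{
        return_val = Py::Null();
        exception_occured = 1;
    }}
    /* cleanup code */
{cleanup}
    if(!return_val && !exception_occured)
    {{
        Py_INCREF(Py_None);
        return_val = Py_None;
    }}
    return return_val;
}}
"""


def build_wrapper(conversion, inline, cleanup):
    """Splice generated conversion, user, and cleanup code into TEMPLATE."""
    return TEMPLATE.format(
        conversion=textwrap.indent(conversion, " " * 8),
        inline=textwrap.indent(inline, " " * 8),
        cleanup=textwrap.indent(cleanup, " " * 4),
    )


print(build_wrapper(
    'int a = py_to_int(get_variable("a",raw_locals,raw_globals),"a");',
    'printf("%d\\n", a);',
    "",
))
```

+Because <code>str.format</code> does not re-parse the substituted values, braces
+inside the user's C++ code pass through untouched.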
+
+<em>
+Note: I'm not sure I have reference counting correct on a few of these. The
+only thing I increase/decrease the ref count on is Numeric arrays. If you
+see an issue, please let me know.
+</em>
+
+<a name="inline_numeric_argument_conversion"></a>
+<h3> Numeric Argument Conversion </h3>
+
+Integer, floating point, and complex arguments are handled in a very similar
+fashion. Consider the following inline function that has a single integer
+variable passed in:
+
+ <blockquote><pre><code>
+ >>> a = 1
+ >>> inline("",['a'])
+ </code></pre></blockquote>
+
+The argument conversion code inserted for <code>a</code> is:
+
+ <blockquote><pre><code>
+ /* argument conversion code */
+ int a = py_to_int (get_variable("a",raw_locals,raw_globals),"a");
+ </code></pre></blockquote>
+
+<code>get_variable()</code> reads the variable <code>a</code>
+from the local and global namespaces. <code>py_to_int()</code> has the following
+form:
+
+ <blockquote><pre><code>
+ static int py_to_int(PyObject* py_obj,char* name)
+ {
+ if (!py_obj || !PyInt_Check(py_obj))
+ handle_bad_type(py_obj,"int", name);
+ return (int) PyInt_AsLong(py_obj);
+ }
+ </code></pre></blockquote>
+
+Similarly, the float and complex conversion routines look like:
+
+ <blockquote><pre><code>
+ static double py_to_float(PyObject* py_obj,char* name)
+ {
+ if (!py_obj || !PyFloat_Check(py_obj))
+ handle_bad_type(py_obj,"float", name);
+ return PyFloat_AsDouble(py_obj);
+ }
+
+ static std::complex&lt;double&gt; py_to_complex(PyObject* py_obj,char* name)
+ {
+ if (!py_obj || !PyComplex_Check(py_obj))
+ handle_bad_type(py_obj,"complex", name);
+ return std::complex&lt;double&gt;(PyComplex_RealAsDouble(py_obj),
+ PyComplex_ImagAsDouble(py_obj));
+ }
+ </code></pre></blockquote>
+
+Numeric conversions do not require any clean up code.
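+<p>
+Because the three numeric conversions differ only in the C++ type and the
+converter function, a code generator can emit them from a small table. A
+minimal Python 3 sketch, assuming a hypothetical <code>NUMERIC_CONVERTERS</code>
+table and <code>conversion_code()</code> helper (weave's own specification
+classes do more than this):

```python
# Hypothetical table: Python type -> (C++ type, converter function name).
NUMERIC_CONVERTERS = {
    int: ("int", "py_to_int"),
    float: ("double", "py_to_float"),
    complex: ("std::complex<double>", "py_to_complex"),
}


def conversion_code(name, value):
    """Return the C++ declaration that converts the Python variable `name`."""
    cpp_type, converter = NUMERIC_CONVERTERS[type(value)]
    return '%s %s = %s(get_variable("%s",raw_locals,raw_globals),"%s");' % (
        cpp_type, name, converter, name, name)


print(conversion_code("a", 1))
```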
+
+<a name="inline_python_argument_conversion"></a>
+<h3> String, List, Tuple, and Dictionary Conversion </h3>
+
+Strings, lists, tuples, and dictionaries are all converted to
+CXX types by default.
+
+For the following code,
+
+ <blockquote><pre><code>
+ >>> a = [1]
+ >>> inline("",['a'])
+ </code></pre></blockquote>
+
+The argument conversion code inserted for <code>a</code> is:
+
+ <blockquote><pre><code>
+ /* argument conversion code */
+ Py::List a = py_to_list (get_variable("a",raw_locals,raw_globals),"a");
+ </code></pre></blockquote>
+
+<code>get_variable()</code> reads the variable <code>a</code>
+from the local and global namespaces. <code>py_to_list()</code> and its
+friends have the following form:
+
+ <blockquote><pre><code>
+ static Py::List py_to_list(PyObject* py_obj,char* name)
+ {
+ if (!py_obj || !PyList_Check(py_obj))
+ handle_bad_type(py_obj,"list", name);
+ return Py::List(py_obj);
+ }
+
+ static Py::String py_to_string(PyObject* py_obj,char* name)
+ {
+ if (!py_obj || !PyString_Check(py_obj))
+ handle_bad_type(py_obj,"string", name);
+ return Py::String(py_obj);
+ }
+
+ static Py::Dict py_to_dict(PyObject* py_obj,char* name)
+ {
+ if (!py_obj || !PyDict_Check(py_obj))
+ handle_bad_type(py_obj,"dict", name);
+ return Py::Dict(py_obj);
+ }
+
+ static Py::Tuple py_to_tuple(PyObject* py_obj,char* name)
+ {
+ if (!py_obj || !PyTuple_Check(py_obj))
+ handle_bad_type(py_obj,"tuple", name);
+ return Py::Tuple(py_obj);
+ }
+ </code></pre></blockquote>
+
+CXX handles reference counts for strings, lists, tuples, and dictionaries,
+so clean up code isn't necessary.
+
+<a name="inline_file_argument_conversion"></a>
+<h3> File Conversion </h3>
+
+For the following code,
+
+ <blockquote><pre><code>
+ >>> a = open("bob",'w')
+ >>> inline("",['a'])
+ </code></pre></blockquote>
+
+The argument conversion code is:
+
+ <blockquote><pre><code>
+ /* argument conversion code */
+ PyObject* py_a = get_variable("a",raw_locals,raw_globals);
+ FILE* a = py_to_file(py_a,"a");
+ </code></pre></blockquote>
+
+<code>get_variable()</code> reads the variable <code>a</code>
+from the local and global namespaces. <code>py_to_file()</code> converts
+PyObject* to a FILE* and increments the reference count of the PyObject*:
+
+ <blockquote><pre><code>
+ FILE* py_to_file(PyObject* py_obj, char* name)
+ {
+ if (!py_obj || !PyFile_Check(py_obj))
+ handle_bad_type(py_obj,"file", name);
+
+ Py_INCREF(py_obj);
+ return PyFile_AsFile(py_obj);
+ }
+ </code></pre></blockquote>
+
+Because the reference count of the PyObject* was incremented, the clean up
+code needs to decrement it:
+
+ <blockquote><pre><code>
+ /* cleanup code */
+ Py_XDECREF(py_a);
+ </code></pre></blockquote>
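+<p>
+Unlike the numeric case, file conversion produces code for two sections: the
+conversion itself and the matching decrement in the cleanup section. A
+hypothetical Python 3 sketch of a generator returning both halves, reproducing
+the two snippets above; <code>file_conversion()</code> is an illustrative name,
+not a weave function:

```python
def file_conversion(name):
    """Return (conversion_code, cleanup_code) for a Python file variable `name`."""
    py_name = "py_" + name
    conversion = "\n".join([
        'PyObject* %s = get_variable("%s",raw_locals,raw_globals);' % (py_name, name),
        'FILE* %s = py_to_file(%s,"%s");' % (name, py_name, name),
    ])
    # py_to_file() calls Py_INCREF, so cleanup must release that reference.
    # Py_XDECREF is the NULL-tolerant form of Py_DECREF.
    cleanup = "Py_XDECREF(%s);" % py_name
    return conversion, cleanup


conversion, cleanup = file_conversion("a")
print(conversion)
print(cleanup)
```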
+
+It's important to understand that file conversion only works on actual files,
+i.e. ones created using the built-in <code>open()</code> function in Python. It
+does not support converting arbitrary objects that support the file interface
+into C <code>FILE*</code> pointers. This can affect many things. For example,
+in the initial <code>printf()</code> examples, one might be tempted to solve
+the problem of C and Python IDEs (PythonWin, PyCrust, etc.) writing to
+different stdout and stderr streams by using <code>fprintf()</code> and passing
+in <code>sys.stdout</code> and <code>sys.stderr</code>. For example, instead of
+
+ <blockquote><pre><code>
+ >>> weave.inline('printf("hello\\n");')
+ </code></pre></blockquote>
+
+You might try:
+
+ <blockquote><pre><code>
+ >>> buf = sys.stdout
+ >>> weave.inline('fprintf(buf,"hello\\n");',['buf'])
+ </code></pre></blockquote>
+
+This will work as expected from a standard Python interpreter, but in
+PythonWin, the following occurs:
+
+ <blockquote><pre><code>
+ >>> buf = sys.stdout
+ >>> weave.inline('fprintf(buf,"hello\\n");',['buf'])
+ Traceback (most recent call last):
+ File "&lt;interactive input&gt;", line 1, in ?
+ File "C:\Python21\weave\inline_tools.py", line 315, in inline
+ auto_downcast = auto_downcast,
+ File "C:\Python21\weave\inline_tools.py", line 386, in compile_function
+ type_factories = type_factories)
+ File "C:\Python21\weave\ext_tools.py", line 197, in __init__
+ auto_downcast, type_factories)
+ File "C:\Python21\weave\ext_tools.py", line 390, in assign_variable_types
+ raise TypeError, format_error_msg(errors)
+ TypeError: {'buf': "Unable to convert variable 'buf' to a C++ type."}
+ </code></pre></blockquote>
+
+The traceback tells us that <code>inline()</code> was unable to convert 'buf'
+to a C++ type. (If instance conversion were implemented, the error would have
+occurred at runtime instead.) Why is this? Let's look at what the
+<code>buf</code> object really is:
+
+ <blockquote><pre><code>
+ >>> buf
+ pywin.framework.interact.InteractiveView instance at 00EAD014
+ </code></pre></blockquote>
+
+PythonWin has reassigned <code>sys.stdout</code> to a special object that
+implements the Python file interface. This works great in Python, but since
+the special object doesn't have a FILE* pointer underlying it, fprintf doesn't
+know what to do with it (or, more precisely, this will be the problem once
+instance conversion is implemented).
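+<p>
+The same distinction can be seen from Python itself by asking an object for an
+operating-system file descriptor. This is an analogy rather than what weave
+does: a working <code>fileno()</code> stands in for "has a real C-level file
+underneath", and <code>has_os_file()</code> is a hypothetical helper written for
+Python 3. In-memory file-like objects such as <code>io.StringIO</code> implement
+<code>write()</code> but have no descriptor, much like PythonWin's replacement
+<code>sys.stdout</code>.

```python
import io
import tempfile


def has_os_file(obj):
    """Return True if obj is backed by an operating-system file descriptor."""
    try:
        obj.fileno()
    except (AttributeError, OSError, ValueError):
        # io.UnsupportedOperation derives from both OSError and ValueError.
        return False
    return True


fake_stdout = io.StringIO()           # implements write(), lives only in memory
real_file = tempfile.TemporaryFile()  # backed by a real file
print(has_os_file(fake_stdout))       # False
print(has_os_file(real_file))         # True
real_file.close()
```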
+
+<a name="inline_callable_argument_conversion"></a>
+<h3> Callable, Instance, and Module Conversion </h3>
+
+<em>Note: Need to look into how ref counts should be handled. Also,
+Instance and Module conversion are not currently implemented.
+</em>
+
+ <blockquote><pre><code>
+ >>> def a():
+ ...     pass
+ ...
+ >>> inline("",['a'])
+ </code></pre></blockquote>
+
+Callable and instance variables are converted to PyObject*. Nothing is done
+to their reference counts.
+
+ <blockquote><pre><code>
+ /* argument conversion code */
+ PyObject* a = py_to_callable(get_variable("a",raw_locals,raw_globals),"a");
+ </code></pre></blockquote>
+
+<code>get_variable()</code> reads the variable <code>a</code>
+from the local and global namespaces. Neither <code>py_to_callable()</code> nor
+<code>py_to_instance()</code> currently increments the reference count.
+
+ <blockquote><pre><code>
+ PyObject* py_to_callable(PyObject* py_obj, char* name)
+ {
+ if (!py_obj || !PyCallable_Check(py_obj))
+ handle_bad_type(py_obj,"callable", name);
+ return py_obj;
+ }
+
+ PyObject* py_to_instance(PyObject* py_obj, char* name)
+ {
+ if (!py_obj || !PyInstance_Check(py_obj))
+ handle_bad_type(py_obj,"instance", name);
+ return py_obj;
+ }
+ </code></pre></blockquote>
+
+There is no cleanup code for callables, modules, or instances.
+
+<a name="Customizing Conversions"></a>
+<h3> Customizing Conversions </h3>
+<p>
+Converting from Python to C++ types is handled by xxx_specification classes. A
+type specification class actually serves in two related but different
+roles. The first is in determining whether a Python variable that needs to be
+converted should be represented by the given class. The second is as a code
+generator that generates the C++ code needed to convert a specific variable
+from Python to C++ types.
+<p>
+When
+
+ <blockquote><pre><code>
+ >>> a = 1
+ >>> weave.inline('printf("%d",a);',['a'])
+ </code></pre></blockquote>
+
+is called for the first time, the code snippet has to be compiled. In this
+process, the variable 'a' is tested against a list of type specifications (the
+default list is stored in weave/ext_tools.py). The <em>first</em>
+specification in the list that matches the variable is used to represent it.
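+<p>
+The first-match rule is what makes overriding possible: placing a custom
+specification in front of the defaults shadows the default one for the same
+Python type. The following Python 3 sketch only illustrates that rule;
+<code>Spec</code>, <code>DEFAULT_SPECS</code>, and <code>choose_spec()</code> are
+hypothetical names, and real specification classes also generate code and
+carry <code>yyy_info</code> build information.

```python
class Spec:
    """Minimal stand-in for an xxx_specification class."""

    def __init__(self, py_type, cpp_type):
        self.py_type = py_type
        self.cpp_type = cpp_type

    def matches(self, value):
        # Role 1: decide whether this spec can represent the variable.
        return isinstance(value, self.py_type)


DEFAULT_SPECS = [Spec(int, "int"), Spec(float, "double"),
                 Spec(str, "Py::String"), Spec(list, "Py::List")]


def choose_spec(value, specs):
    """Return the first spec in `specs` that accepts `value`."""
    for spec in specs:
        if spec.matches(value):
            return spec
    raise TypeError("Unable to convert variable to a C++ type.")


# Prepending a string spec overrides the default one without touching the rest.
custom = [Spec(str, "std::string")] + DEFAULT_SPECS
print(choose_spec("hi", DEFAULT_SPECS).cpp_type)  # Py::String
print(choose_spec("hi", custom).cpp_type)         # std::string
print(choose_spec(3, custom).cpp_type)            # int
```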
+
+<p>
+Examples of <code>xxx_specification</code> are scattered throughout numerous
+"xxx_spec.py" files in the <code>weave</code> package. Closely related to
+the <code>xxx_specification</code> classes are <code>yyy_info</code> classes.
+These classes contain compiler, header, and support code information necessary
+for including a certain set of capabilities (such as blitz++ or CXX support)
+in a compiled module. <code>xxx_specification</code> classes have one or more
+<code>yyy_info</code> classes associated with them.
+
+If you'd like to define your own set of type specifications, the current best route
+is to examine some of the existing spec and info files. Looking over
+sequence_spec.py and cxx_info.py is probably a good place to start. After defining
+specification classes, you'll need to pass them into <code>inline</code> using the
+<code>type_factories</code> argument.
+
+Often you may just want to change how a specific variable type is
+represented. Say you'd rather have Python strings converted to
+<code>std::string</code> or maybe <code>char*</code> instead of using the CXX
+string object, but would like all other type conversions to have default
+behavior. This requires writing a new specification class that handles strings
+and prepending it to the list of default type specifications. Since it is
+closer to the front of the list, it effectively overrides the default string
+specification.
+
+The following code demonstrates how this is done:
+
+...
+
+<a name="The Catalog"></a>
+<h2> The Catalog </h2>
+<p>
+<code>catalog.py</code> has a class called <code>catalog</code> that helps keep
+track of previously compiled functions. This prevents <code>inline()</code>
+and related functions from having to compile functions every time they are
+called. Instead, catalog will check an in-memory cache to see if the function
+has already been loaded into Python. If it hasn't, then it starts searching
+through persistent catalogs on disk to see if it finds an entry for the given
+function. By saving information about compiled functions to disk, it isn't
+necessary to re-compile functions every time you stop and restart the interpreter.
+Functions are compiled once and stored for future use.
+
+<p>
+When <code>inline(cpp_code)</code> is called, the following things happen:
+<ol>
+ <li>
+ A fast local cache of functions is checked for the last function called for
+ <code>cpp_code</code>. If an entry for <code>cpp_code</code> doesn't exist in the
+ cache or the cached function call fails (perhaps because the function doesn't
+ have compatible types) then the next step is to check the catalog.
+ <li>
+ The catalog class also keeps an in-memory cache with a list of all the
+ functions compiled for <code>cpp_code</code>. If <code>cpp_code</code> has
+ been called before, this cache will already be present. If the cache isn't
+ present, then it is loaded from disk.
+ <p>
+ If the cache is present, each function in the cache is
+ called until one is found that was compiled for the correct argument types. If
+ none of the functions work, a new function is compiled with the given argument
+ types. This function is written to the on-disk catalog as well as into the
+ in-memory cache.</p>
+ <li>
+ When a lookup for <code>cpp_code</code> fails, the catalog looks through
+ the on-disk function catalogs for the entries. The PYTHONCOMPILED variable
+ determines where to search for these catalogs and in what order. If
+ PYTHONCOMPILED is not present, several platform-dependent locations are
+ searched. All functions found for <code>cpp_code</code> in the path are
+ loaded into the in-memory cache with functions found earlier in the search
+ path closer to the front of the call list.
+ <p>
+ If the function isn't found in the on-disk catalog,
+ then the function is compiled, written to the first writable directory in the
+ PYTHONCOMPILED path, and also loaded into the in-memory cache.</p>
+ </li>
+</ol>
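+<p>
+The lookup sequence above can be modeled in plain Python 3. This is a
+simplified, hypothetical sketch: the "compiled functions" are ordinary
+closures that raise <code>TypeError</code> for argument types they were not
+built for, a dict stands in for the on-disk shelves, and
+<code>compile_for()</code> stands in for the real C++ compile step. Appending
+newly compiled functions to the end of the list is a choice made here; the text
+does not specify where they are placed.

```python
class Catalog:
    def __init__(self, disk):
        self.level1 = {}    # code -> last function that worked (fast cache)
        self.cache = {}     # code -> list of functions (in-memory catalog)
        self.disk = disk    # code -> list of functions (stands in for shelves)
        self.compiles = 0

    def compile_for(self, code, arg):
        """Pretend to compile `code` for the type of `arg`."""
        self.compiles += 1
        arg_type = type(arg)

        def func(x):
            if type(x) is not arg_type:
                raise TypeError("wrong argument type")
            return (code, x)

        return func

    def call(self, code, arg):
        # Step 1: try the fast level-1 cache.
        func = self.level1.get(code)
        if func is not None:
            try:
                return func(arg)
            except TypeError:
                pass
        # Step 2: try every function in the in-memory list, loading it on first use.
        if code not in self.cache:
            self.cache[code] = list(self.disk.get(code, []))
        for func in self.cache[code]:
            try:
                result = func(arg)
            except TypeError:
                continue
            self.level1[code] = func
            return result
        # Step 3: nothing matched, so compile, persist, and cache a new function.
        func = self.compile_for(code, arg)
        self.disk.setdefault(code, []).append(func)
        self.cache[code].append(func)
        self.level1[code] = func
        return func(arg)
```

+A second <code>Catalog</code> built on the same <code>disk</code> dict behaves
+like a new interpreter session: it finds both functions without compiling.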
+
+<a name="function storage"></a>
+<h3> Function Storage: How functions are stored in caches and on disk </h3>
+<p>
+Function caches are stored as dictionaries where the key is the entire C++
+code string and the value is either a single function (as in the "level 1"
+cache) or a list of functions (as in the main catalog cache). On-disk
+catalogs are stored in the same manner using standard Python shelves.
+<p>
+Early on, there was a question as to whether md5 checksums of the C++
+code strings should be used instead of the actual code strings. I think this
+is the route inline Perl took. Some (admittedly quick) tests of the md5 vs.
+the entire string showed that using the entire string was at least a
+factor of 3 or 4 faster for Python. I think this is because it is more
+time-consuming to compute the md5 value than it is to do look-ups of long
+strings in the dictionary. Look at the examples/md5_speed.py file for the
+test run.
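+<p>
+A Python 3 sketch of that storage scheme using the standard <code>shelve</code>
+module, keyed by the complete C++ code string. Real catalog entries point at
+functions inside compiled extension modules; here each entry is a hypothetical
+<code>(module_name, function_name)</code> pair so that it can be pickled. The
+md5 alternative would change only how the key is computed, as
+<code>md5_key()</code> shows.

```python
import hashlib
import os
import shelve
import tempfile


def add_function(catalog_path, code, entry):
    """Append `entry` to the list stored under the full C++ code string."""
    with shelve.open(catalog_path) as db:
        functions = db.get(code, [])
        functions.append(entry)
        db[code] = functions   # reassign: shelve does not track in-place edits


def md5_key(code):
    """The rejected alternative: key entries on an md5 digest of the code."""
    return hashlib.md5(code.encode("utf-8")).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "compiled_catalog")
    code = 'printf("%d",a);'
    add_function(path, code, ("module_for_int", "compiled_func"))
    add_function(path, code, ("module_for_float", "compiled_func"))
    with shelve.open(path) as db:
        print(db[code])
```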
+
+<a name="PYTHONCOMPILED"></a>
+<h3> Catalog search paths and the PYTHONCOMPILED variable</h3>
+<p>
+The default location for catalog files on Unix is ~/.pythonXX_compiled, where
+XX is the version of Python being used. If this directory doesn't exist, it is
+created the first time a catalog is used. The directory must be writable. If,
+for any reason, it isn't, then the catalog attempts to create a directory based
+on your user id in the /tmp directory. The directory permissions are set so
+that only you have access to the directory. If this fails, I think you're out of
+luck. I don't think either of these should ever fail though. On Windows, a
+directory called pythonXX_compiled is created in the user's temporary
+directory.
+<p>
+The actual catalog file that lives in this directory is a Python shelve with
+a platform specific name such as "nt21compiled_catalog" so that multiple OSes
+can share the same file systems without trampling on each other. Along with
+the catalog file, the .cpp and .so or .pyd files created by inline will live
+in this directory. The catalog file simply contains keys which are the C++
+code strings with values that are lists of functions. The function lists point
+at functions within these compiled modules. Each function in the lists
+executes the same C++ code string, but compiled for different input variables.
+<p>
+You can use the PYTHONCOMPILED environment variable to specify alternative
+locations for compiled functions. On Unix this is a colon (':') separated
+list of directories. On Windows, it is a semicolon (';') separated list of
+directories. These directories will be searched prior to the default directory
+for a compiled function catalog. Also, the first writable directory in the list
+is where all new compiled function catalogs, .cpp and .so or .pyd files are
+written. Relative directory paths ('.' and '..') should work fine in the
+PYTHONCOMPILED variable, as should environment variables.
+<p>
+There is a "special" path variable called MODULE that can be placed in the
+PYTHONCOMPILED variable. It specifies that the compiled catalog should
+reside in the same directory as the module that called it. This is useful
+if an admin wants to build a lot of compiled functions during the build
+of a package and then install them in site-packages along with the package.
+Users who specify MODULE in their PYTHONCOMPILED variable will have access
+to these compiled functions. Note, however, that if they call the function
+with a set of argument types that it hasn't previously been built for, the
+new function will be stored in their default directory (or some other writable
+directory in the PYTHONCOMPILED path) because the user will not have write
+access to the site-packages directory.
+<p>
+An example of using the PYTHONCOMPILED path on bash follows:
+
+ <blockquote><pre><code>
+ PYTHONCOMPILED=MODULE:/some/path;export PYTHONCOMPILED;
+ </code></pre></blockquote>
+
+If you are using Python 2.1 on Linux, and the module bob.py in site-packages
+has a compiled function in it, then the catalog search order when calling that
+function for the first time in a Python session would be:
+
+ <blockquote><pre><code>
+ /usr/lib/python21/site-packages/linuxpython_compiled
+ /some/path/linuxpython_compiled
+ ~/.python21_compiled/linuxpython_compiled
+ </code></pre></blockquote>
+
+The default location is always included in the search path.
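+<p>
+The search-order rules can be summarized in a short Python 3 sketch. The
+function names are hypothetical, and the caller supplies the calling module's
+directory and the per-user default directory rather than having them detected
+automatically; the returned list holds the directories in which the catalog
+files would be looked up.

```python
import os


def catalog_dirs(pythoncompiled, module_dir, default_dir, sep=os.pathsep):
    """Return catalog directories in search order.

    pythoncompiled -- value of PYTHONCOMPILED ("" if unset)
    module_dir     -- directory of the calling module (substituted for MODULE)
    default_dir    -- per-user default directory, always searched last
    """
    dirs = []
    for entry in pythoncompiled.split(sep):
        if not entry:
            continue                      # tolerate empty entries like "a::b"
        if entry == "MODULE":
            entry = module_dir
        dirs.append(os.path.expanduser(os.path.expandvars(entry)))
    if default_dir not in dirs:
        dirs.append(default_dir)          # the default is always included
    return dirs


def first_writable(dirs):
    """Return the first existing, writable directory (where new files go)."""
    for d in dirs:
        if os.path.isdir(d) and os.access(d, os.W_OK):
            return d
    return None


print(catalog_dirs("MODULE:/some/path", "/usr/lib/python21/site-packages",
                   "/home/bob/.python21_compiled", sep=":"))
```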
+<p>
+<em>
+Note: hmmm. I see a possible problem here. I should probably make a
+subdirectory such as
+/usr/lib/python21/site-packages/python21_compiled/linuxpython_compiled so that
+library files compiled with python21 aren't tried for linking with python22
+files in some strange scenarios. Need to check this.
+</em>
+
+<p>
+The in-module cache (in <code>weave.inline_tools</code>) reduces the overhead
+of calling inline functions by about a factor of 2. It could be reduced a
+little more for tight loop calls, where the same function is called over and
+over again, if the cache were a single value instead of a dictionary, but the
+benefit is very small (less than 5%) and the utility is quite a bit less. So,
+we'll stick with a dictionary as the cache.
+<p></p>
+
+<a name="Blitz"></a>
+<h1>Blitz</h1>
+<em> Note: most of this section is lifted from old documentation. It should be
+pretty accurate, but there may be a few discrepancies.</em>
+<p>
+<code>weave.blitz()</code> compiles Numeric Python expressions for fast
+execution. For most applications, compiled expressions should provide a
+factor of 2-10 speed-up over Numeric arrays. Using compiled
+expressions is meant to be as unobtrusive as possible and works much like
+Python's exec statement. As an example, the following code fragment takes a 5
+point average of the 512x512 2d image, b, and stores it in array, a:
+
+ <blockquote><pre><code>
+ from scipy import * # or from Numeric import *
+ a = ones((512,512), Float64)
+ b = ones((512,512), Float64)
+ # ...do some stuff to fill in b...
+ # now average
+ a[1:-1,1:-1] = (b[1:-1,1:-1] + b[2:,1:-1] + b[:-2,1:-1] \
+ + b[1:-1,2:] + b[1:-1,:-2]) / 5.
+ </code></pre></blockquote>
+
+To compile the expression, convert the expression to a string by putting
+quotes around it and then use <code>weave.blitz</code>:
+
+ <blockquote><pre><code>
+ import weave
+ expr = "a[1:-1,1:-1] = (b[1:-1,1:-1] + b[2:,1:-1] + b[:-2,1:-1]" \
+ "+ b[1:-1,2:] + b[1:-1,:-2]) / 5."
+ weave.blitz(expr)
+ </code></pre></blockquote>
+
+The first time <code>weave.blitz</code> is run for a given expression and
+set of arguments, C++ code that accomplishes the exact same task as the Python
+expression is generated and compiled to an extension module. This can take up
+to a couple of minutes depending on the complexity of the function. Subsequent
+calls to the function are very fast. Further, the generated module is saved
+between program executions so that the compilation is only done once for a
+given expression and associated set of array types. If the given expression
+is executed with a new set of array types, the code must be compiled again. This
+does not overwrite the previously compiled function; both of them are saved and
+available for execution.
+<p>
+The following table compares the run times for standard Numeric code and
+compiled code for the 5 point averaging.
+<p>
+<center>
+<table border=1 >
+<tr><td>Method</td> <td>Run Time (seconds)</td></tr>
+<tr><td>Standard Numeric</td> <td>0.46349</td></tr>
+<tr><td>blitz (1st time compiling)</td> <td> 78.95526</td></tr>
+<tr><td>blitz (subsequent calls)</td> <td>0.05843 (factor of 8 speedup)</td></tr>
+</table>
+</center>
+<p>
+These numbers are for a 512x512 double precision image run on a 400 MHz Celeron
+processor under RedHat Linux 6.2.
+<p>
+Because of the slow compile times, it's probably most effective to develop
+algorithms as you usually do using the capabilities of scipy or the Numeric
+module. Once the algorithm is perfected, put quotes around it and execute it
+using <code>weave.blitz</code>. This provides the standard rapid
+prototyping strengths of Python and results in algorithms that run nearly as
+fast as hand-coded C or Fortran.
+
+<a name="blitz_requirements"></a>
+<h2>Requirements</h2>
+
+Currently, <code>weave.blitz</code> has only been tested under Linux
+with gcc-2.95-3 and on Windows with Mingw32 (2.95.2). Its compiler
+requirements are pretty heavy duty (see the
+<a href="http://www.oonumerics.org/blitz/">blitz++ home page</a>), so it won't
+work with just any compiler. In particular, MSVC++ isn't up to snuff. A number
+of other compilers such as KAI++ should also work, but my suspicion is that gcc
+will get the most use.
+
+<a name="blitz_limitations"></a>
+<h2>Limitations</h2>
+<ol>
+<li>
+Currently, <code>weave.blitz</code> handles all standard mathematical
+operators except for the ** power operator. The built-in
+trigonometric, log, floor/ceil, and fabs functions might work (but
+haven't been tested). It also handles all types of array indexing
+supported by the Numeric module. numarray's Numeric-compatible array
+indexing modes are likewise supported, but numarray's enhanced
+(array based) indexing modes are not supported.
+
+<p>
+<code>weave.blitz</code> does not currently support operations that use
+array broadcasting, nor have any of the special purpose functions in Numeric
+such as take, compress, etc. been implemented. Note that there are no obvious
+reasons why most of this functionality cannot be added to scipy.weave, so it
+will likely trickle into future versions. Using <code>slice()</code> objects
+directly instead of <code>start:stop:step</code> is also not supported.
+</li>
+<li>
+Currently, <code>weave.blitz</code> only works on expressions that include assignment such as
+
+ <blockquote><pre><code>
+ >>> result = b + c + d
+ </code></pre></blockquote>
+
+This means that the result array must exist before calling
+<code>weave.blitz</code>. Future versions will allow the following:
+
+ <blockquote><pre><code>
+ >>> result = weave.blitz_eval("b + c + d")
+ </code></pre></blockquote>
+</li>
+<li>
+<code>weave.blitz</code> works best when algorithms can be expressed in a
+"vectorized" form. Algorithms that have a large number of if/thens and other
+conditions are better hand written in C or Fortran. Further, the restrictions
+imposed by requiring vectorized expressions sometimes preclude the use of more
+efficient data structures or algorithms. For maximum speed in these cases,
+hand-coded C or Fortran code is the only way to go.
+</li>
+<li>
+<code>weave.blitz</code> can produce different results than Numeric in certain
+situations. It can happen when the array receiving the results of a
+calculation is also used during the calculation. The Numeric behavior is to
+carry out the entire calculation on the right hand side of an equation and
+store it in a temporary array. This temprorary array is assigned to the array
+on the left hand side of the equation. blitz, on the other hand, does a
+"running" calculation of the array elements assigning values from the right hand
+side to the elements on the left hand side immediately after they are calculated.
+Here is an example, provided by Prabhu Ramachandran, where this happens:
+
+ <blockquote><pre><code>
+ # 4 point average.
+ >>> expr = "u[1:-1, 1:-1] = (u[0:-2, 1:-1] + u[2:, 1:-1] + "\
+ ... "u[1:-1,0:-2] + u[1:-1, 2:])*0.25"
+ >>> u = zeros((5, 5), 'd'); u[0,:] = 100
+ >>> exec (expr)
+ >>> u
+ array([[ 100., 100., 100., 100., 100.],
+ [ 0., 25., 25., 25., 0.],
+ [ 0., 0., 0., 0., 0.],
+ [ 0., 0., 0., 0., 0.],
+ [ 0., 0., 0., 0., 0.]])
+
+ >>> u = zeros((5, 5), 'd'); u[0,:] = 100
+ >>> weave.blitz (expr)
+ >>> u
+ array([[ 100. , 100. , 100. , 100. , 100. ],
+ [ 0. , 25. , 31.25 , 32.8125 , 0. ],
+ [ 0. , 6.25 , 9.375 , 10.546875 , 0. ],
+ [ 0. , 1.5625 , 2.734375 , 3.3203125, 0. ],
+ [ 0. , 0. , 0. , 0. , 0. ]])
+ </code></pre></blockquote>
+
+ You can prevent this behavior by using a temporary array.
+
+ <blockquote><pre><code>
+ >>> u = zeros((5, 5), 'd'); u[0,:] = 100
+ >>> temp = zeros((3, 3), 'd')
+ >>> expr = "temp = (u[0:-2, 1:-1] + u[2:, 1:-1] + "\
+ ... "u[1:-1,0:-2] + u[1:-1, 2:])*0.25;"\
+ ... "u[1:-1,1:-1] = temp"
+ >>> weave.blitz (expr)
+ >>> u
+ array([[ 100., 100., 100., 100., 100.],
+ [ 0., 25., 25., 25., 0.],
+ [ 0., 0., 0., 0., 0.],
+ [ 0., 0., 0., 0., 0.],
+ [ 0., 0., 0., 0., 0.]])
+ </code></pre></blockquote>
+
+</li>
+<li>
+One other point deserves mention lest people be confused.
+<code>weave.blitz</code> is not a general purpose Python->C compiler. It
+only works for expressions that contain Numeric arrays and/or
+Python scalar values. This focused scope concentrates effort on the
+computationally intensive regions of the program and sidesteps the difficult
+issues associated with a general purpose Python->C compiler.
+</li>
+</ol>
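+<p>
+The difference in item 4 can be reproduced without Numeric or a compiler. In
+this pure Python 3 sketch, <code>four_point_average()</code> is a hypothetical
+helper that either reads from a temporary copy (Numeric's behavior) or updates
+the array in place during a row-major sweep (the running update described
+above). Assuming that sweep order, it reproduces the numbers in the example.

```python
def four_point_average(u, in_place):
    """Apply the 4-point average to the interior of the 2-D list `u`.

    in_place=False mimics Numeric: read every value from a temporary copy.
    in_place=True mimics a running update: later elements see values that
    were already overwritten earlier in the row-major sweep.
    """
    n, m = len(u), len(u[0])
    src = u if in_place else [row[:] for row in u]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            u[i][j] = (src[i - 1][j] + src[i + 1][j] +
                       src[i][j - 1] + src[i][j + 1]) * 0.25
    return u


def start():
    """5x5 zeros with the top row set to 100, as in the example above."""
    u = [[0.0] * 5 for _ in range(5)]
    u[0] = [100.0] * 5
    return u


numeric_like = four_point_average(start(), in_place=False)
blitz_like = four_point_average(start(), in_place=True)
print(numeric_like[1])   # [0.0, 25.0, 25.0, 25.0, 0.0]
print(blitz_like[1])     # [0.0, 25.0, 31.25, 32.8125, 0.0]
```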
+
+<a name="Numeric Efficiency"></a>
+<h2>Numeric efficiency issues: What compilation buys you</h2>
+
+Some might wonder why compiling Numeric expressions to C++ is beneficial, since
+Numeric array operations are already executed within C loops.
+The problem is that all but the simplest expressions are executed in a
+less than optimal fashion. Consider the following Numeric expression:
+
+ <blockquote><pre><code>
+ a = 1.2 * b + c * d
+ </code></pre></blockquote>
+
+When Numeric calculates the value for the 2d array, <code>a</code>, it does
+the following steps:
+
+ <blockquote><pre><code>
+ temp1 = 1.2 * b
+ temp2 = c * d
+ a = temp1 + temp2
+ </code></pre></blockquote>
+
+Two things to note. Since <code>b</code> is a (perhaps large) array, a large
+temporary array must be created to store the results of <code>1.2 * b</code>.
+The same is true for <code>temp2</code>. Allocation is slow. The second thing
+is that we have 3 loops executing, one to calculate <code>temp1</code>, one for
+<code>temp2</code> and one for adding them up. A C loop for the same problem
+might look like:
+
+ <blockquote><pre><code>
+ for(int i = 0; i < M; i++)
+     for(int j = 0; j < N; j++)
+         a[i][j] = 1.2 * b[i][j] + c[i][j] * d[i][j];
+ </code></pre></blockquote>
+
+Here, the 3 loops have been fused into a single loop and there is no longer
+a need for a temporary array. This provides a significant speed improvement
+over the above example (write me and tell me what you get).
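To make the structural difference concrete, here is a pure-Python sketch that evaluates <code>a = 1.2 * b + c * d</code> both ways. It assumes nested lists as stand-ins for 2-D Numeric arrays, and the helper names (<code>scale</code>, <code>multiply</code>, <code>add</code>, <code>unfused</code>, <code>fused</code>) are hypothetical. Python loops are far slower than C, so this illustrates only the shape of the work (two temporaries and three passes versus one pass and none), not the speed.

```python
# Nested lists stand in for 2-D Numeric arrays (an assumption for illustration).

def scale(k, x):
    """One full pass over x that allocates a new temporary array."""
    return [[k * v for v in row] for row in x]

def multiply(x, y):
    """Elementwise product: another pass and another temporary."""
    return [[p * q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]

def add(x, y):
    """Elementwise sum: a third pass."""
    return [[p + q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]

def unfused(b, c, d):
    """Numeric-style evaluation of a = 1.2 * b + c * d with two temporaries."""
    temp1 = scale(1.2, b)
    temp2 = multiply(c, d)
    return add(temp1, temp2)

def fused(b, c, d):
    """One fused loop: every element is computed once and no temporaries exist."""
    rows, cols = len(b), len(b[0])
    a = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            a[i][j] = 1.2 * b[i][j] + c[i][j] * d[i][j]
    return a

b = [[1.0, 2.0], [3.0, 4.0]]
c = [[5.0, 6.0], [7.0, 8.0]]
d = [[0.5, 0.25], [2.0, 1.0]]
print(fused(b, c, d) == unfused(b, c, d))
```

Both versions perform the same floating point operations in the same order for each element, so they agree exactly; only the number of passes and allocations differs.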
+<p>
+So, converting Numeric expressions into C/C++ loops that fuse the loops and
+eliminate temporary arrays can provide big gains. The goal, then, is to convert
+Numeric expressions to C/C++ loops, compile them in an extension module, and
+then call the compiled extension function. The good news is that there is an
+obvious correspondence between the Numeric expression above and the C loop. The
+bad news is that Numeric is generally much more powerful than this simple
+example illustrates, and handling all possible indexing possibilities results in
+loops that are less than straightforward to write (take a peek at the Numeric
+source for confirmation). Luckily, there are several available tools that
+simplify the process.
+
+<a name="blitz_tools"></a>
+<h2>The Tools</h2>
+
+<code>weave.blitz</code> relies heavily on several remarkable tools. On the
+Python side, the main facilitators are Jeremy Hylton's parser module and Jim
+Hugunin's Numeric module. On the compiled language side, Todd Veldhuizen's
+blitz++ array library, written in C++ (shhhh. don't tell David Beazley), does
+the heavy lifting. Don't assume that, because it's C++, it's much slower than C
+or Fortran. Blitz++ uses a jaw-dropping array of template techniques
+(metaprogramming, expression templates, etc.) to convert innocent-looking and
+readable C++ expressions into code that usually executes within a few
+percentage points of Fortran code for the same problem. This is good.
+Unfortunately, all the template razzmatazz is very expensive to compile, so the
+200-line extension modules often take 2 or more minutes to compile. This isn't so
+good. <code>weave.blitz</code> works to minimize this issue by remembering
+where compiled modules live and reusing them instead of re-compiling every time
+a program is re-run.
+
+<a name="blitz_parser"></a>
+<h3>Parser</h3>
+Tearing Numeric expressions apart, examining the pieces, and then rebuilding
+them as C++ (blitz) expressions requires a parser of some sort. I can imagine
+someone attacking this problem with regular expressions, but it'd likely be
+ugly and fragile. Amazingly, Python solves this problem for us. It actually
+exposes its parsing engine to the world through the <code>parser</code> module.
+The following fragment creates an Abstract Syntax Tree (AST) object for the
+expression and then converts it to a (rather unpleasant-looking) deeply nested list
+representation of the tree.
+
+ <blockquote><pre><code>
+ >>> import parser
+ >>> import pprint
+ >>> import scipy.weave.misc
+ >>> ast = parser.suite("a = b * c + d")
+ >>> ast_list = ast.tolist()
+ >>> sym_list = scipy.weave.misc.translate_symbols(ast_list)
+ >>> pprint.pprint(sym_list)
+ ['file_input',
+ ['stmt',
+ ['simple_stmt',
+ ['small_stmt',
+ ['expr_stmt',
+ ['testlist',
+ ['test',
+ ['and_test',
+ ['not_test',
+ ['comparison',
+ ['expr',
+ ['xor_expr',
+ ['and_expr',
+ ['shift_expr',
+ ['arith_expr',
+ ['term',
+ ['factor', ['power', ['atom', ['NAME', 'a']]]]]]]]]]]]]]],
+ ['EQUAL', '='],
+ ['testlist',
+ ['test',
+ ['and_test',
+ ['not_test',
+ ['comparison',
+ ['expr',
+ ['xor_expr',
+ ['and_expr',
+ ['shift_expr',
+ ['arith_expr',
+ ['term',
+ ['factor', ['power', ['atom', ['NAME', 'b']]]],
+ ['STAR', '*'],
+ ['factor', ['power', ['atom', ['NAME', 'c']]]]],
+ ['PLUS', '+'],
+ ['term',
+ ['factor', ['power', ['atom', ['NAME', 'd']]]]]]]]]]]]]]]]],
+ ['NEWLINE', '']]],
+ ['ENDMARKER', '']]
+ </code></pre></blockquote>
+
+Despite its looks, with some tools developed by Jeremy H., it's possible
+to search these trees for specific patterns (sub-trees), extract the
+sub-tree, manipulate it by converting Python-specific code fragments
+to blitz code fragments, and then re-insert it in the parse tree. The parser
+module documentation has some details on how to do this. Traversing the
+new blitzified tree, writing out the terminal symbols as you go, creates
+our new blitz++ expression string.
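The last step, writing out the terminal symbols, can be sketched in a few lines of Python. This is not the weave implementation; it assumes the nested-list layout printed above, where a terminal is a two-element list holding a token name and its string value (such as <code>['NAME', 'a']</code>) and every other node is a symbol name followed by its children. The functions <code>terminals</code> and <code>unparse</code> are hypothetical, and the small tree is hand-built and heavily abbreviated for illustration.

```python
def terminals(node):
    """Yield the string values of the leaves of a nested-list parse tree, left to right."""
    # A terminal node looks like ['NAME', 'a'] or ['STAR', '*'].
    if len(node) == 2 and isinstance(node[1], str):
        yield node[1]
        return
    # Any other node is [symbol_name, child, child, ...].
    for child in node[1:]:
        yield from terminals(child)

def unparse(tree):
    """Join the terminal symbols back into an expression string."""
    return "".join(terminals(tree))

# A hand-built, abbreviated tree for "a = b * c + d".
tree = ['expr_stmt',
        ['term', ['NAME', 'a']],
        ['EQUAL', '='],
        ['arith_expr',
         ['term', ['NAME', 'b'], ['STAR', '*'], ['NAME', 'c']],
         ['PLUS', '+'],
         ['term', ['NAME', 'd']]]]

print(unparse(tree))
```

Whitespace is not preserved, which is harmless for the arithmetic expressions <code>weave.blitz</code> handles. Empty terminals such as <code>['NEWLINE', '']</code> simply contribute nothing.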
+
+<a name="blitz_blitz"></a>
+<h3> Blitz and Numeric </h3>
+The other nice discovery in the project is that the data structure used
+for Numeric arrays and blitz arrays is nearly identical. Numeric stores
+"strides" as byte offsets and blitz stores them as element offsets, but
+other than that, they are the same. Further, most of the concepts and
+capabilities of the two libraries are remarkably similar. It is satisfying
+that two completely different implementations solved the problem with
+similar basic architectures. It is also fortuitous: the work involved in
+converting Numeric expressions to blitz expressions was greatly reduced.
+As an example, consider the code for slicing an array in Python with a
+stride:
+
+ <blockquote><pre><code>
+ >>> b = arange(10)
+ >>> c = array([0, 0])
+ >>> a = b[0:4:2] + c
+ >>> a
+ array([0, 2])
+ </code></pre></blockquote>
+
+
+In Blitz it is as follows:
+
+ <blockquote><pre><code>
+ Array&lt;int,1&gt; b(10);
+ Array&lt;int,1&gt; c(2);
+ // ...
+ Array&lt;int,1&gt; a = b(Range(0,3,2)) + c;
+ </code></pre></blockquote>
+
+
+Here the range object works exactly like Python slice objects with the exception
+that the top index (3) is inclusive whereas Python's (4) is exclusive. Other
+differences include the type declarations in C++ and parentheses instead of
+brackets for indexing arrays. Currently, <code>weave.blitz</code> handles the
+inclusive/exclusive issue by subtracting one from upper indices during the
+translation. An alternative that is likely more robust/maintainable in the
+long run is to write a PyRange class that behaves like Python's range.
+This is likely very easy.
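The subtract-one rule can be checked with a short Python sketch. It assumes a positive step and explicit, non-negative start and stop values. <code>to_blitz_range</code> and <code>blitz_indices</code> are hypothetical helpers that model an inclusive blitz-style range in Python; they are not weave functions.

```python
def to_blitz_range(start, stop, step=1):
    """Map a Python slice start:stop:step to an inclusive (first, last, stride) triple.

    Assumes step > 0 and explicit, non-negative start/stop, so the exclusive
    upper bound becomes an inclusive one by subtracting one.
    """
    if step <= 0:
        raise ValueError("this sketch only handles positive steps")
    return start, stop - 1, step

def blitz_indices(first, last, stride):
    """Indices visited by an inclusive range that stops at or before `last`."""
    return list(range(first, last + 1, stride))

first, last, stride = to_blitz_range(0, 4, 2)
print((first, last, stride))               # the triple used in Range(0,3,2) above
print(blitz_indices(first, last, stride))  # the same elements b[0:4:2] selects
```

Because the inclusive range is rebuilt as <code>range(first, last + 1, stride)</code>, the subtraction and the re-addition cancel, which is why the translation is exact for positive steps.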
+<p>
+The stock blitz also doesn't handle negative indices in ranges. The current
+implementation of <code>blitz()</code> has a partial solution to this
+problem. It converts an index that starts with a '-' sign into an offset from
+the length of the array, so that:
+
+ <blockquote><pre><code>
+ upper index limit
+ /-----\
+ b[:-1] -> b(Range(0,Nb[0]-1-1))
+ </code></pre></blockquote>
+
+This approach fails, however, when the top index is calculated from other
+values. In the following scenario, if <code>i-j</code> evaluates to a negative
+value, the compiled code will produce incorrect results and could even
+core dump. Right now, all calculated indices are assumed to be positive.
+
+ <blockquote><pre><code>
+ b[:i-j] -> b(Range(0,i-j-1))
+ </code></pre></blockquote>
+
+A solution is to calculate all indices up front using if/then to handle the
++/- cases. This is a little work and results in more code, so it hasn't been
+done. I'm holding out to see if blitz++ can be modified to handle negative
+indexing, but haven't looked into how much effort is involved yet. While it
+needs fixin', I don't think there is a ton of code where this is an issue.
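The fix described above, resolving every index before the range is built, could look roughly like the following Python model of the generated if/then logic. It is a sketch under the assumption that indices follow Python's rules (negative values count from the end, out-of-range values are clamped, stop is exclusive, step is 1). <code>normalize_index</code> and <code>slice_to_range</code> are hypothetical names, and weave does not currently generate this code.

```python
def normalize_index(index, length):
    """Resolve a possibly negative index the way Python slicing does."""
    if index < 0:
        index += length
    # Clamp like Python slicing, so out-of-range values never escape.
    return max(0, min(index, length))

def slice_to_range(start, stop, length):
    """Turn start:stop (step 1) into an inclusive (first, last) pair for a blitz-style Range."""
    first = normalize_index(start, length)
    last = normalize_index(stop, length) - 1   # exclusive -> inclusive
    return first, last

n = 10
i, j = 2, 5
print(slice_to_range(0, i - j, n))   # i - j is negative, resolved at run time
print(slice_to_range(0, -1, n))      # the b[:-1] case
```

Doing this at run time in the compiled function costs a couple of comparisons per index, which is negligible next to the array loop itself.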
+<p>
+The actual translation of the Python expressions to blitz expressions is
+currently a two-part process. First, all x:y:z slicing expressions are removed
+from the AST, converted to slice(x,y,z), and re-inserted into the tree. Any
+math needed on these expressions (subtracting from the
+maximum index, etc.) is also performed here. _beg and _end are used as special
+variables that are defined as blitz::fromBegin and blitz::toEnd.
+
+ <blockquote><pre><code>
+ a[i+j:i+j+1,:] = b[2:3,:]
+ </code></pre></blockquote>
+
+becomes a more verbose:
+
+ <blockquote><pre><code>
+ a[slice(i+j,i+j+1),slice(_beg,_end)] = b[slice(2,3),slice(_beg,_end)]
+ </code></pre></blockquote>
+
+The second part does a simple string search/replace to convert to a blitz
+expression with the following translations:
+
+ <blockquote><pre><code>
+ slice(_beg,_end) -> _all # not strictly needed, but cuts down on code.
+ slice -> blitz::Range
+ [ -> (
+ ] -> )
+ _stp -> 1
+ </code></pre></blockquote>
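The second pass amounts to a handful of ordered string replacements. The sketch below applies the table above to the slice-expanded expression from the previous example. <code>TRANSLATIONS</code> and <code>to_blitz</code> are hypothetical names, and a real implementation would need to avoid rewriting identifiers that merely contain the text <code>slice</code>.

```python
# Ordered translations from the table; the _all rule must run before the
# generic "slice" rule, or slice(_beg,_end) would become blitz::Range(_beg,_end).
TRANSLATIONS = [
    ("slice(_beg,_end)", "_all"),
    ("slice", "blitz::Range"),
    ("[", "("),
    ("]", ")"),
    ("_stp", "1"),
]

def to_blitz(expr):
    """Apply each search/replace pair in order and return the blitz expression."""
    for old, new in TRANSLATIONS:
        expr = expr.replace(old, new)
    return expr

python_expr = "a[slice(i+j,i+j+1),slice(_beg,_end)] = b[slice(2,3),slice(_beg,_end)]"
print(to_blitz(python_expr))
```

The order of the list matters, which is the main reason to keep the rules in a sequence rather than a dictionary.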
+
+<code>_all</code> is defined in the compiled function as
+<code>blitz::Range::all()</code>. These translations could of course happen
+directly in the syntax tree, but the string replacement is slightly easier.
+Note that namespaces are maintained in the C++ code to lessen the likelihood
+of name clashes. Currently no effort is made to detect name clashes. A good
+rule of thumb is don't use variable names that start with '_' or 'py_' in compiled
+expressions and you'll be fine.
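That rule of thumb is easy to check mechanically before compiling. A minimal sketch using the standard library <code>ast</code> module to collect the variable names in an expression and flag the risky ones; <code>names_in</code> and <code>risky_names</code> are hypothetical helpers, not part of weave.

```python
import ast

RESERVED_PREFIXES = ("_", "py_")

def names_in(expr):
    """Collect every variable name used in a Python expression or statement."""
    tree = ast.parse(expr)
    return {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}

def risky_names(expr):
    """Return names that could clash with identifiers in the generated C++."""
    return sorted(n for n in names_in(expr) if n.startswith(RESERVED_PREFIXES))

expr = "u[1:-1,1:-1] = (_tmp[0:-2, 1:-1] + py_v[2:, 1:-1]) * 0.25"
print(risky_names(expr))
```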
+
+<a name="blitz_type_conversions"></a>
+<h2>Type definitions and coercion</h2>
+
+So far we've glossed over the dynamic vs. static typing issue between Python
+and C++. In Python, the type of value that a variable holds can change
+through the course of program execution. C/C++, on the other hand, forces you
+to declare the type of value a variable will hold at compile time.
+<code>weave.blitz</code> handles this issue by examining the types of the
+variables in the expression being executed, and compiling a function for those
+explicit types. For example:
+
+ <blockquote><pre><code>
+ a = ones((5,5),Float32)
+ b = ones((5,5),Float32)
+ weave.blitz("a = a + b")
+ </code></pre></blockquote>
+
+When compiling this expression to C++, <code>weave.blitz</code> sees that the
+values for a and b in the local scope have type <code>Float32</code>, or 'float'
+on a 32-bit architecture. As a result, it compiles the function using
+the float type (no attempt has been made to deal with 64-bit issues).
+It also goes one step further. If all arrays have the same type, a templated
+version of the function is made and instantiated for float, double,
+complex&lt;float&gt;, and complex&lt;double&gt; arrays. <em> Note: This feature has been
+removed from the current version of the code. Each version will be compiled
+separately. </em>
+<p>
+What happens if you call a compiled function with array types that are
+different from the ones for which it was originally compiled? No biggie, you'll
+just have to wait for it to compile a new version for your new types. This
+doesn't overwrite the old functions; they remain accessible. See the
+catalog section in the inline() documentation to see how this is handled.
+Suffice it to say, the mechanism is transparent to the user and behaves
+like dynamic typing with the occasional wait for compiling newly typed
+functions.
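The behavior can be modeled with a dictionary keyed on the expression and the types of its operands. This is a simplified sketch, not the real catalog (which persists compiled modules on disk). <code>Catalog</code> and <code>fake_compile</code> are hypothetical; <code>fake_compile</code> stands in for the expensive C++ build, and type names are plain strings here.

```python
class Catalog:
    """Cache compiled functions by (expression, operand types)."""

    def __init__(self, compiler):
        self.compiler = compiler   # hypothetical expensive build step
        self.functions = {}
        self.compiles = 0

    def get(self, expr, types):
        key = (expr, tuple(types))
        if key not in self.functions:
            # The occasional wait: only unseen (expr, types) pairs are built.
            self.compiles += 1
            self.functions[key] = self.compiler(expr, types)
        return self.functions[key]

def fake_compile(expr, types):
    """Stand-in for generating and compiling C++; returns a labeled callable."""
    return lambda: "%s for %s" % (expr, "/".join(types))

catalog = Catalog(fake_compile)
catalog.get("a = a + b", ["float", "float"])
catalog.get("a = a + b", ["float", "float"])    # reused: no new compile
catalog.get("a = a + b", ["double", "double"])  # new types: separate build
print(catalog.compiles)
```

The float version stays in the cache after the double version is added, mirroring the point that new builds do not overwrite old ones.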
+<p>
+When working with combined scalar/array operations, the type of the array is
+<em>always</em> used. This is similar to the savespace flag that was recently
+added to Numeric. It prevents the following expression from unexpectedly being
+calculated at a higher (more expensive) precision, as can happen in Python:
+
+ <blockquote><pre><code>
+ >>> a = array((1,2,3),typecode = Float32)
+ >>> b = a * 2.1 # results in b being a Float64 array.
+ </code></pre></blockquote>
+
+In this example,
+
+ <blockquote><pre><code>
+ >>> a = ones((5,5),Float32)
+ >>> b = ones((5,5),Float32)
+ >>> weave.blitz("b = a * 2.1")
+ </code></pre></blockquote>
+
+the <code>2.1</code> is cast down to a <code>float</code> before carrying out
+the operation. If you really want to force the calculation to be a
+<code>double</code>, define <code>a</code> and <code>b</code> as
+<code>double</code> arrays.
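To see what casting <code>2.1</code> down to a <code>float</code> means for the constant itself, the standard library <code>struct</code> module can round-trip the value through a 4-byte IEEE single. This only illustrates the precision loss; it does not call weave, and <code>as_float32</code> is a hypothetical helper.

```python
import struct

def as_float32(x):
    """Round a Python float (a C double) to the nearest 32-bit float and back."""
    # "=f" selects the standard 4-byte single-precision format.
    return struct.unpack("=f", struct.pack("=f", x))[0]

scalar = 2.1
single = as_float32(scalar)
print(repr(single))                 # no longer exactly 2.1
print(abs(single - scalar) < 1e-6)  # but within single-precision error
```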
+<p>
+One other point of note. Currently, you must include both the right hand side
+and left hand side (assignment side) of your equation in the compiled
+expression. Also, the array being assigned to must be created prior to calling
+<code>weave.blitz</code>. I'm pretty sure this is easily changed so that a
+compiled_eval expression can be defined, but no effort has been made to
+allocate new arrays (and discern their type) on the fly.
+
+<a name="blitz_catalog"></a>
+<h2>Cataloging Compiled Functions</h2>
+
+See the <a href="#The Catalog">Cataloging functions</a> section in the
+<code>weave.inline()</code> documentation.
+
+<a name="blitz_array_sizes"></a>
+<h2>Checking Array Sizes</h2>
+
+Surprisingly, one of the big initial problems with compiled code was making
+sure all the arrays in an operation were of compatible shape. The following
+case is trivially easy:
+
+ <blockquote><pre><code>
+ a = b + c
+ </code></pre></blockquote>
+
+It only requires that arrays <code>a</code>, <code>b</code>, and <code>c</code>
+have the same shape. However, expressions like:
+
+ <blockquote><pre><code>
+ a[i+j:i+j+1,:] = b[2:3,:] + c
+ </code></pre></blockquote>
+
+are not so trivial. Since slicing is involved, the sizes of the slices, not of
+the input arrays, must be checked. Broadcasting complicates things further because
+arrays and slices with different dimensions and shapes may be compatible for
+math operations (broadcasting isn't yet supported by
+<code>weave.blitz</code>). Reductions have a similar effect because their
+results have different shapes than their input operands. The binary operators in
+Numeric compare the shapes of their two operands just before they operate on
+them. This is possible because Numeric treats each operation independently.
+The intermediate (temporary) arrays created during sub-operations in an
+expression are tested for the correct shape before they are combined by another
+operation. Because <code>weave.blitz</code> fuses all operations into a
+single loop, this isn't possible. The shape comparisons must be done and
+guaranteed compatible before evaluating the expression.
+<p>
+The solution chosen converts input arrays to "dummy arrays" that only represent
+the dimensions of the arrays, not the data. Binary operations on dummy arrays
+check that input array sizes are compatible and return a dummy array with the
+correct size. Evaluating an expression of dummy arrays traces the
+changing array sizes through all operations and fails if incompatible array
+sizes are ever found.
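A stripped-down version of the idea fits in a short class. This is a sketch under simplifying assumptions (elementwise operations on equal shapes only, no broadcasting or reductions), and <code>DummyArray</code> is a hypothetical name rather than the class in <code>weave.size_check</code>. Slice sizes are computed with Python's own <code>slice.indices</code>, so negative bounds are handled too.

```python
class DummyArray:
    """Carries only a shape; operations check and propagate shapes, never data."""

    def __init__(self, shape):
        self.shape = tuple(shape)

    def _binary(self, other):
        # Scalars combine with anything; arrays must match exactly (no broadcasting).
        if isinstance(other, (int, float)):
            return DummyArray(self.shape)
        if self.shape != other.shape:
            raise ValueError("incompatible shapes %s and %s" % (self.shape, other.shape))
        return DummyArray(self.shape)

    __add__ = __radd__ = __sub__ = __mul__ = __rmul__ = _binary

    def __getitem__(self, index):
        if not isinstance(index, tuple):
            index = (index,)
        shape = []
        for dim, item in zip(self.shape, index):
            if isinstance(item, slice):
                shape.append(len(range(*item.indices(dim))))
            # An integer index selects one element and drops that dimension.
        shape.extend(self.shape[len(index):])
        return DummyArray(shape)

u = DummyArray((5, 5))
interior = (u[0:-2, 1:-1] + u[2:, 1:-1] + u[1:-1, 0:-2] + u[1:-1, 2:]) * 0.25
print(interior.shape)   # matches u[1:-1, 1:-1]
try:
    u[0:-2, 1:-1] + u[1:, 1:-1]
except ValueError as err:
    print("size check failed:", err)
```

No data is touched, so the check costs the same for a 5x5 array as for a 500x500 one, which is why the overhead only matters for small arrays.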
+<p>
+The machinery for this is housed in <code>weave.size_check</code>. It
+basically involves writing a new class (dummy array) and overloading its math
+operators to calculate the new sizes correctly. All the code is in Python and
+there is a fair amount of logic (mainly to handle indexing and slicing), so the
+operation does impose some overhead. For large arrays (e.g. 50x50x50), the
+overhead is negligible compared to evaluating the actual expression. For small
+arrays (e.g. 16x16), the overhead imposed for checking the shapes with this
+method can make <code>weave.blitz</code> slower than evaluating
+the expression in Python.
+<p>
+What can be done to reduce the overhead? (1) The size checking code could be
+moved into C. This would likely remove most of the overhead penalty compared
+to Numeric (although there is also some calling overhead), but no effort has
+been made to do this. (2) You can also call <code>weave.blitz</code> with
+<code>check_size=0</code> and the size checking isn't done. However, if the
+sizes aren't compatible, it can cause a core dump. So, forgoing size checking
+isn't advisable until your code is well debugged.
+
+<a name="blitz_extension_module"></a>
+<h2>Creating the Extension Module</h2>
+
+<code>weave.blitz</code> uses the same machinery as
+<code>weave.inline</code> to build the extension module. The only difference
+is that the code included in the function is automatically generated from the
+Numeric array expression instead of supplied by the user.
+
+<a name="#Extension Modules"></a>
+<h1>Extension Modules</h1>
+<code>weave.inline</code> and <code>weave.blitz</code> are high level tools
+that generate extension modules automatically. Under the covers, they use several
+classes from <code>weave.ext_tools</code> to help generate the extension module.
+The two main classes are <code>ext_module</code> and <code>ext_function</code> (I'd
+like to add <code>ext_class</code> and <code>ext_method</code> also). These classes
+simplify the process of generating extension modules by handling most of the
+boilerplate code automatically.
+
+<em>
+Note: <code>inline</code> actually sub-classes <code>weave.ext_tools.ext_function</code>
+to generate slightly different code than the standard <code>ext_function</code>.
+The main difference is that the standard class converts function arguments to
+C types, while inline always has two arguments, the local and global dicts, and
+then grabs the variables that need to be converted to C from these.
+</em>
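The lookup step in that note can be pictured with a small Python model. It is not weave's generated C++; <code>fetch_arguments</code> is a hypothetical name, and the assumption that local variables shadow globals mirrors ordinary Python scoping.

```python
def fetch_arguments(arg_names, local_dict, global_dict):
    """Resolve each argument name from the local dict first, then the global dict."""
    values = {}
    for name in arg_names:
        if name in local_dict:
            values[name] = local_dict[name]
        elif name in global_dict:
            values[name] = global_dict[name]
        else:
            raise NameError("variable '%s' not found in locals or globals" % name)
    return values

scale = 10

def caller():
    a = 1
    scale = 3   # shadows the module-level value
    return fetch_arguments(["a", "scale"], locals(), globals())

print(caller())
```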
+
+<a name="A Simple Example"></a>
+<h2> A Simple Example </h2>
+The following simple example demonstrates how to build an extension module within
+a Python function:
+
+ <blockquote><pre><code>
+ # examples/increment_example.py
+ from weave import ext_tools
+
+ def build_increment_ext():
+     """ Build a simple extension with functions that increment numbers.
+         The extension will be built in the local directory.
+     """
+     mod = ext_tools.ext_module('increment_ext')
+
+     a = 1 # effectively a type declaration for 'a' in the
+           # following functions.
+
+     ext_code = "return_val = Py::new_reference_to(Py::Int(a+1));"
+     func = ext_tools.ext_function('increment',ext_code,['a'])
+     mod.add_function(func)
+
+     ext_code = "return_val = Py::new_reference_to(Py::Int(a+2));"
+     func = ext_tools.ext_function('increment_by_2',ext_code,['a'])
+     mod.add_function(func)
+
+     mod.compile()
+ </code></pre></blockquote>
+
+
+The function <code>build_increment_ext()</code> creates an extension module
+named <code>increment_ext</code> and compiles it to a shared library (.so or
+.pyd) that can be loaded into Python. <code>increment_ext</code> contains two
+functions, <code>increment</code> and <code>increment_by_2</code>.
+
+The first line of <code>build_increment_ext()</code>,
+
+ <blockquote><pre><code>
+ mod = ext_tools.ext_module('increment_ext')
+ </code></pre></blockquote>
+
+creates an <code>ext_module</code> instance that is ready to have
+<code>ext_function</code> instances added to it. <code>ext_function</code>
+instances are created with a calling convention much like that of
+<code>weave.inline()</code>. The most common call includes a C/C++ code
+snippet and a list of the arguments for the function. The following
+
+ <blockquote><pre><code>
+ ext_code = "return_val = Py::new_reference_to(Py::Int(a+1));"
+ func = ext_tools.ext_function('increment',ext_code,['a'])
+ </code></pre></blockquote>
+
+creates a C/C++ extension function that is equivalent to the following Python
+function:
+
+ <blockquote><pre><code>
+ def increment(a):
+     return a + 1
+ </code></pre></blockquote>
+
+A second function, <code>increment_by_2</code>, is added to the module in the same way, and then
+
+ <blockquote><pre><code>
+ mod.compile()
+ </code></pre></blockquote>
+
+is called to build the extension module. By default, the module is created
+in the current working directory.
+
+This example is available in the <code>examples/increment_example.py</code> file
+found in the <code>weave</code> directory. At the bottom of the file, in the
+module's "main" program, the script first tries to import <code>increment_ext</code>
+without building it. If this fails (the module doesn't exist in the PYTHONPATH),
+the module is built by calling <code>build_increment_ext()</code>. This approach
+only incurs the time-consuming process of building the module (a few seconds for
+this example) if it hasn't been built before.
+
+ <blockquote><pre><code>
+ if __name__ == "__main__":
+     try:
+         import increment_ext
+     except ImportError:
+         build_increment_ext()
+         import increment_ext
+     a = 1
+     print 'a, a+1:', a, increment_ext.increment(a)
+     print 'a, a+2:', a, increment_ext.increment_by_2(a)
+ </code></pre></blockquote>
+
+<em>
+Note: If we were willing to always pay the penalty of building the C++ code for
+a module, we could store the md5 checksum of the C++ code along with some
+information about the compiler, platform, etc. Then,
+<code>ext_module.compile()</code> could try importing the module before it actually
+compiles it, compare the md5 checksum and other metadata in the imported module
+with the metadata of the code it just produced, and only compile the code if
+the module didn't exist or the metadata didn't match. This would reduce the
+above code to:
+</em>
+ <blockquote><pre><code>
+ if __name__ == "__main__":
+     build_increment_ext()
+     import increment_ext
+     a = 1
+     print 'a, a+1:', a, increment_ext.increment(a)
+     print 'a, a+2:', a, increment_ext.increment_by_2(a)
+ </code></pre></blockquote>
+<em>
+Note: There would always be the overhead of generating the C++ code, but the
+code would only actually be compiled once. You pay a little in overhead and get
+cleaner "import" code. Needs some thought.
+</em>
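The checksum idea in these notes might be sketched with <code>hashlib</code> and <code>platform</code>. Everything here is an assumption for illustration: the choice of metadata fields, the hypothetical <code>module_signature</code> and <code>needs_compile</code> helpers, and the idea that a built module would expose its stored signature. <code>ext_module.compile()</code> does not currently do this.

```python
import hashlib
import platform

def module_signature(cpp_code, compiler_name):
    """Fingerprint the generated C++ together with build-environment metadata."""
    md5 = hashlib.md5()
    for part in (cpp_code, compiler_name, platform.system(),
                 platform.machine(), platform.python_version()):
        md5.update(part.encode("utf-8"))
        md5.update(b"\0")   # separator so adjacent fields cannot run together
    return md5.hexdigest()

def needs_compile(stored_signature, cpp_code, compiler_name):
    """Compile only if no module exists (None) or its signature is stale."""
    if stored_signature is None:
        return True
    return stored_signature != module_signature(cpp_code, compiler_name)

code = "return_val = Py::new_reference_to(Py::Int(a+1));"
sig = module_signature(code, "gcc")
print(needs_compile(None, code, "gcc"))        # module missing
print(needs_compile(sig, code, "gcc"))         # up to date
print(needs_compile(sig, code + " ", "gcc"))   # C++ changed
```

Generating the C++ text is cheap compared to compiling it, which is the trade-off the notes describe.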
+<p>
+
+If you run <code>increment_example.py</code> from the command line, you get
+the following:
+
+ <blockquote><pre><code>
+ [eric@n0]$ python increment_example.py
+ a, a+1: 1 2
+ a, a+2: 1 3
+ </code></pre></blockquote>
+
+If the module didn't exist before it was run, the module is created. If it did
+exist, it is just imported and used.
+
+<a name="Fibonacci Example"></a>
+<h2> Fibonacci Example </h2>
+<code>examples/fibonacci.py</code> provides a slightly more complex example of
+how to use <code>ext_tools</code>. Fibonacci numbers are a series of numbers
+where each number in the series is the sum of the previous two: 1, 1, 2, 3, 5,
+8, etc. Here, the first two numbers in the series are taken to be 1. One
+approach to calculating Fibonacci numbers uses recursive function calls. In
+Python, it might be written as:
+
+ <blockquote><pre><code>
+ def fib(a):
+     if a <= 2:
+         return 1
+     else:
+         return fib(a-2) + fib(a-1)
+ </code></pre></blockquote>
+
+In C, the same function would look something like this:
+
+ <blockquote><pre><code>
+ int fib(int a)
+ {
+     if(a <= 2)
+         return 1;
+     else
+         return fib(a-2) + fib(a-1);
+ }
+ </code></pre></blockquote>
+
+Recursive function calls are much faster in C than in Python, so it would be
+beneficial to use the C version for Fibonacci number calculations instead of the
+Python version. We need an extension function that calls this C function
+to do this. This is possible by including the above code snippet as
+"support code" and then calling it from the extension function. Support
+code snippets (usually structure definitions, helper functions and the like)
+are inserted into the extension module C/C++ file before the extension
+function code. Here is how to build the C version of the Fibonacci number
+generator:
+
+ <blockquote><pre><code>
+from weave import ext_tools
+
+def build_fibonacci():
+    """ Builds an extension module with fibonacci calculators.
+    """
+    mod = ext_tools.ext_module('fibonacci_ext')
+    a = 1 # this is effectively a type declaration
+
+    # recursive fibonacci in C
+    fib_code = """
+        int fib1(int a)
+        {
+            if(a <= 2)
+                return 1;
+            else
+                return fib1(a-2) + fib1(a-1);
+        }
+    """
+    ext_code = """
+        int val = fib1(a);
+        return_val = Py::new_reference_to(Py::Int(val));
+    """
+    fib = ext_tools.ext_function('fib',ext_code,['a'])
+    fib.customize.add_support_code(fib_code)
+    mod.add_function(fib)
+
+    mod.compile()
+
+ </code></pre></blockquote>
+
+XXX More about custom_info, and what xxx_info instances are good for.
+
+<p>
+<em>
+Note: recursion is not the fastest way to calculate Fibonacci numbers, but this
+approach serves nicely for this example.
+</em>
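For comparison, a loop that carries the last two values computes the same numbers in linear time, instead of the exponential number of calls made by the recursive version. A Python sketch using the same convention that the first two numbers are 1; <code>fib_iterative</code> and <code>fib_recursive</code> are hypothetical names used only here.

```python
def fib_iterative(a):
    """Return the a-th Fibonacci number with fib(1) == fib(2) == 1."""
    prev, curr = 1, 1
    for _ in range(a - 2):
        prev, curr = curr, prev + curr
    return curr

def fib_recursive(a):
    """The recursive definition from the text, kept for cross-checking."""
    if a <= 2:
        return 1
    return fib_recursive(a - 2) + fib_recursive(a - 1)

print([fib_iterative(n) for n in range(1, 9)])
```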
+<p>
+<a name="#Type Factories"></a>
+<h1>Customizing Type Conversions -- Type Factories</h1>
+not written
+
+<h1>Things I wish <code>weave</code> did</h1>
+
+It is possible to get name clashes if you use a variable name that is already defined
+in a header that is automatically included (such as <code>stdio.h</code>). For instance, if you
+try to pass in a variable named <code>stdout</code>, you'll get a cryptic error report
+because <code>stdio.h</code> also defines the name. <code>weave</code>
+should probably try to handle this in some way.
+
+Other things...