<h1>Compiler Documentation</h1>
<p>
By Eric Jones eric@enthought.com
<p>
<h2>Outline</h2>
<dl>
<dd> <A href="#Introduction">Introduction</a>
<dd> <A href="#Requirements">Requirements</a>
<dd> <A href="#Installation">Installation and Testing</a>
<dd> <A href="#Inline">Inline</a>
    <dl>
    <dd><A href="#More with printf">More with printf</a>
    <dd>
    <A href="#More examples">More examples</a>
        <dl>
        <dd><A href="#Binary search">Binary search</a>
        <dd><A href="#Dictionary sort">Dictionary sort</a>
        <dd><A href="#Numeric -- cast/copy/transpose">Numeric -- cast/copy/transpose</a>
        <dd><A href="#wxPython">wxPython</a></dd>
        </dl>
    <dd><A href="#Keyword Options">Keyword options</a>
    <dd><A href="#Returning Values">Returning values</a>
        <dl>
        <dd><A href="#The issue with locals()">
            The issue with <code>locals()</code></a></dd>
        </dl>
    <dd><A href="#inline_quick_look_at_code">A quick look at the code</a>
    <dd>
    <A href="#inline_technical_details">Technical Details</a>
        <dl>
        <dd><A href="#Converting Types">Converting Types</a>
            <dl>
            <dd><A href="#inline_numeric_argument_conversion">
                Numeric Argument Conversion</a>
            <dd><A href="#inline_python_argument_conversion">
                String, List, Tuple, and Dictionary Conversion</a>
            <dd><A href="#inline_callable_argument_conversion">File Conversion</a>
            <dd><A href="#inline_callable_argument_conversion">
                Callable, Instance, and Module Conversion</a>
            <dd><A href="#Customizing Conversions">Customizing Conversions</a>
            </dl>
        <dd><A href="#Compiling Code">Compiling Code</a>
        <dd><a href="#The Catalog">"Cataloging" functions</a>
            <dl>
            <dd><a href="#function storage">Function Storage</a>
            <dd><a href="#PYTHONCOMPILED">The PYTHONCOMPILED environment variable</a></dd>
            </dl>
        </dd>
        </dl>
    </dd>
    </dl>
<dd><A href="#Blitz">Blitz</a>
    <dl>
    <dd><a href="#blitz_requirements">Requirements</a>
    <dd><a href="#blitz_limitations">Limitations</a>
    <dd><a href="#Numeric Efficiency">Numeric Efficiency Issues</a>
    <dd><a href="#blitz_tools">The Tools</a>
        <dl>
        <dd><a href="#blitz_parser">Parser</a>
        <dd><a href="#blitz_blitz">Blitz and Numeric</a>
        </dl>
    <dd><a href="#blitz_type_conversions">Type definitions and coercion</a>
    <dd><a href="#blitz_catalog">Cataloging Compiled Functions</a>
    <dd><a href="#blitz_array_sizes">Checking Array Sizes</a>
    <dd><a href="#blitz_extension_module">Creating the Extension Module</a>
    </dl>
<dd> <a href="#Extension Modules"> Extension Modules</a>
    <dl>
    <dd><a href="#A Simple Example">A Simple Example</a>
    <dd><a href="#Fibonacci Example">Fibonacci Example</a>
    </dl>
<dd> <a href="#Type Factories"> Customizing Type Conversions -- Type Factories (not written)</a>
    <dl>
    <dd>Type Specifications
    <dd>Type Information
    <dd>The Conversion Process
    </dl>
</dl>
<a name="Introduction"></a>
<h1>Introduction</h1>

<p>
The <code>compiler</code> package allows the inclusion of C/C++ code within
Python code. This offers both another level of optimization to those who need
it, and an easy way to modify and extend any supported extension libraries such
as wxPython and hopefully VTK soon. Inlining C/C++ code within Python generally
results in speed-ups of 1.5x to 30x over algorithms written in pure Python
(however, it is also possible to slow things down...). Generally, algorithms
that require a large number of calls to the Python API don't benefit as much
from the conversion to C/C++ as algorithms that have inner loops completely
convertible to C.
<p>
There are three basic ways to use <code>compiler</code>. The
<code>compiler.inline()</code> function executes C code directly within Python,
and <code>compiler.blitz()</code> translates Python Numeric expressions to C++
for fast execution. This was the original functionality for which
<code>compiler</code> was built. For those interested in building extension
libraries, the <code>ext_tools</code> module provides classes for building
extension modules within Python. (A quick sketch of <code>inline</code> and
<code>blitz</code> in action closes out this introduction.)
<p>
Most of <code>compiler's</code> functionality should work on Windows and Unix,
although some of its functionality requires <code>gcc</code> or a similarly
modern C++ compiler that handles templates well. Up to now, most testing has
been done on Windows 2000 with Microsoft's C++ compiler (MSVC) and with gcc
(mingw32 2.95.2 and 2.95.3-6). All tests also seem to pass on Linux (RH 7.1
with gcc 2.96).
<p>
The <code>inline</code> and <code>blitz</code> functions provide new
functionality to Python (although I've recently learned about the <a
href="http://pyinline.sourceforge.net/">PyInline</a> project, which may offer
similar functionality to <code>inline</code>). On the other hand, tools for
building Python extension modules already exist (SWIG, SIP, pycpp, CXX, and
others). As of yet, I'm not sure where <code>compiler</code> fits in this
spectrum. It is closest in flavor to CXX in that it makes creating new C/C++
extension modules pretty easy. However, if you're wrapping a gaggle of legacy
functions or classes, SWIG and friends are definitely the better choice.
<code>compiler</code> is set up so that you can customize how Python types are
converted to C types. This is great for <code>inline()</code>, but, for
wrapping legacy code, it is generally better to specify things the other way
around -- that is, how C types map to Python types. This, <code>compiler</code>
does not do. I guess it would be possible to build such a tool on top of
<code>compiler</code>, but with good tools like SWIG around, I'm not sure the
effort produces any new capabilities. Things like function overloading are
probably easily implemented in <code>compiler</code>, and it might be easier to
mix Python/C code in function calls, but nothing beyond this comes to mind. So,
if you're developing new extension modules, or just want to optimize a few
functions in C, <code>compiler</code> might be the tool for you. If you're
wrapping legacy code, stick with SWIG.
<p>
The next several sections give the basics of how to use <code>compiler</code>.
We'll discuss what's happening under the covers in more detail later on.
Serious users will need to at least look at the type conversion section to
understand how Python variables map to C/C++ types and how to customize this
behavior. One other note: if you don't know C or C++, then these docs are
probably of very little help to you. Further, it'd be helpful if you know
something about writing Python extensions. <code>compiler</code> does quite a
bit for you, but for anything complex, you'll need to do some conversions,
reference counting, etc.
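<p>
Here is that promised sketch of <code>inline</code> and <code>blitz</code>.
Treat it as a teaser only: both calls compile on first use (so expect a pause),
and <code>blitz()</code> additionally requires Numeric. <code>ext_tools</code>
needs more setup and gets its own section later in the document.

    <blockquote><pre><code>
    >>> import compiler
    >>> a = 1
    >>> compiler.inline('printf("%d\\n",a);',['a'])   # run C code in place
    1
    >>> from Numeric import arange, zeros, Float64
    >>> x = arange(10.)
    >>> y = arange(10.)
    >>> z = zeros(10, Float64)
    >>> compiler.blitz("z = x + y * 2.")              # compile a Numeric expression
    </code></pre></blockquote>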
<em>
Note: </em><code>compiler</code><em> is actually part of the <a
href="http://www.scipy.org">SciPy</a> package. However, it works fine as a
stand-alone package. The examples here are given as if it is used as a
stand-alone package. If you are using it from within scipy, you can use
<code>from scipy import compiler</code> and the examples will work
identically.</em>

<a name="Requirements"></a>
<h1>Requirements</h1>
<ul>
  <li> Python
  <p>
  I use 2.1.1. Probably 2.0 or higher should work.
  <p>
  </li>

  <li> C++ compiler
  <p>
  compiler uses <code>distutils</code> to actually build extension modules,
  so it uses whatever compiler was originally used to build Python.
  compiler itself requires a C++ compiler. If you used a C++ compiler
  to build Python, you're probably fine.
  <p>
  On Unix, gcc is the preferred choice because I've done a little testing
  with it. All testing has been done with gcc, but I expect the majority of
  compilers should work for <code>inline</code> and <code>ext_tools</code>.
  The one issue I'm not sure about is that I've hard-coded things so that
  compilations are linked with the <code>stdc++</code> library. Is this
  standard across Unix compilers, or is it a gcc-ism?
  <p>
  For <code>blitz()</code>, you'll need a reasonably recent version of gcc.
  2.95.2 works on Windows and 2.96 looks fine on Linux. Other versions are
  likely to work. It's likely that KAI's C++ compiler and maybe some others
  will work, but I haven't tried. My advice is to use gcc for now unless
  you're willing to tinker with the code some.
  <p>
  On Windows, either MSVC or gcc (<a href="http://www.mingw.org">mingw32</a>)
  should work. Again, you'll need gcc for <code>blitz()</code> as the MSVC
  compiler doesn't handle templates well.
  <p>
  I have not tried Cygwin, so please report success if it works for you.
  <p>
  </li>

  <li> Numeric (optional)
  <p>
  The Python Numeric module, available <a
  href="http://www.pfdubois.com/numpy/">here</a>, is required for
  <code>blitz()</code> to work. Be sure to get NumPy, not NumArray,
  which is the "next generation" implementation.
  <p>
  </li>
  <li> scipy_distutils and scipy_test (packaged with compiler)
  <p>
  These two modules are packaged with <code>compiler</code> in both
  the Windows installer and the source distributions. If you are using
  CVS, however, you'll need to download these separately (also available
  through CVS at SciPy).
  <p>
  </li>
</ul>
<p>

<a name="Installation"></a>
<h1>Installation and Testing</h1>
<p>
There are currently two ways to get <code>compiler</code>. First,
<code>compiler</code> is part of SciPy and is installed automatically (as a
sub-package) whenever SciPy is installed (although the latest version isn't in
SciPy yet, so use this one for now). Second, since compiler is useful outside
of the scientific community, it has been set up so that it can be used as a
stand-alone module.

<p>
The stand-alone version can be downloaded from <a
href="http://www.scipy.org/site_content/compiler">here</a>. Unix users should
grab the tar ball (.tgz file) and install it using the following commands.

    <blockquote><pre><code>
    tar -xzvf compiler.tgz
    cd compiler
    python setup.py install
    </code></pre></blockquote>

This will also install two other packages, <code>scipy_distutils</code> and
<code>scipy_test</code>. The first is needed by the setup process itself, and
both are used in the unit-testing process. For Windows users, it's even easier:
they can download the click-install .exe file and run it for automatic
installation.
Numeric is required if you want to use <code>blitz()</code>, but isn't
necessary for <code>inline()</code> or <code>ext_tools</code>.
<p>
If you're using the CVS version, you'll need to install the scipy_distutils
and scipy_test modules (also available from CVS) on your own.
<p>
<em> Note: The dependency issue here is a little sticky. I hate to make people
download more than one file (and so I haven't), but distutils doesn't have a
way to do conditional installation -- at least not that I know about. This can
lead to undesired clobbering of modules. What to do, what to do...</em>
<p>
Once <code>compiler</code> is installed, fire up Python and run its unit tests.

    <blockquote><pre><code>
    >>> import compiler
    >>> compiler.test()
    runs long time... spews tons of output
    </code></pre></blockquote>

This takes a loooong time. On Windows, it is usually several minutes. On Unix
with remote file systems, I've had it take 15 or so minutes. In the end, it
should run about 150 tests and spew some speed results along the way. If you
get errors, please let me know.

If you don't have Numeric installed, you'll get some module import errors
during the test setup phase for modules that are Numeric-specific (blitz_spec,
blitz_tools, size_check, standard_array_spec, ast_tools), but all tests should
pass (about 60), and the run time should be quite a bit less.
<p>
If you only want to test a single module of the package, you can do this by
running test() for that specific module.

    <blockquote><pre><code>
    >>> import compiler.scalar_spec
    >>> compiler.scalar_spec.test()
    .......
    ----------------------------------------------------------------------
    Ran 7 tests in 23.284s
    </code></pre></blockquote>
<em>
Note: I've had some tests fail on Windows machines where I have msvc,
gcc-2.95.2 (in c:\gcc-2.95.2), and gcc-2.95.3-6 (in c:\gcc) all installed. My
environment has c:\gcc in the path and does not have c:\gcc-2.95.2 in the
path. The test process runs very smoothly until the end, where several tests
using gcc fail with cpp0 not found by g++. If I check os.system('gcc -v')
before running tests, I get gcc-2.95.3-6. If I check after running tests (and
after failure), I get gcc-2.95.2. ??huh??. The os.environ['PATH'] still has
c:\gcc first in it and is not corrupted (msvc/distutils messes with the
environment variables, so we have to undo its work in some places). If anyone
else sees this, let me know -- it may just be a quirk on my machine
(unlikely). Testing with the gcc-2.95.2 installation always works.
</em>

<a name="Inline"></a>
<h1>Inline</h1>
<p>
<code>inline()</code> compiles and executes C/C++ code on the fly. Variables
in the local and global Python scope are also available in the C/C++ code.
Values are passed to the C/C++ code by assignment, much like variables are
passed into a standard Python function. Values are returned from the C/C++
code through a special argument called return_val. Also, the contents of
mutable objects can be changed within the C/C++ code, and the changes remain
after the C code exits and returns to Python. (More on this later.)
<p>
Here's a trivial <code>printf</code> example using <code>inline()</code>:

    <blockquote><pre><code>
    >>> import compiler
    >>> a = 1
    >>> compiler.inline('printf("%d\\n",a);',['a'])
    1
    </code></pre></blockquote>
<p>
In its most basic form, <code>inline(c_code, var_list)</code> requires two
arguments. <code>c_code</code> is a string of valid C/C++ code.
<code>var_list</code> is a list of the names of variables that are passed from
Python into C/C++. Here we have a simple <code>printf</code> statement that
writes the Python variable <code>a</code> to the screen.
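<p>
Any number of variables can be passed in the same way; each name in the list
must exist in the calling scope. A trivial sketch:

    <blockquote><pre><code>
    >>> a = 1
    >>> b = 2
    >>> compiler.inline('printf("%d\\n",a + b);',['a','b'])
    3
    </code></pre></blockquote>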
<p>
The first time you run a given snippet, there will be a pause while the code
is written to a .cpp file, compiled into an extension module, loaded into
Python, cataloged for future use, and executed. On Windows (850 MHz PIII),
this takes about 1.5 seconds when using Microsoft's C++ compiler (MSVC) and
6-12 seconds using gcc (mingw32 2.95.2). All subsequent executions of the code
will happen very quickly because the code only needs to be compiled once. If
you kill and restart the interpreter and then execute the same code fragment
again, there will be a much shorter delay, in the fractions-of-seconds range.
This is because <code>compiler</code> stores a catalog of all previously
compiled functions in an on-disk cache. When it sees a string that has been
compiled, it loads the already compiled module and executes the appropriate
function.
<p>
<em>
Note: If you try the <code>printf</code> example in a GUI shell such as IDLE,
PythonWin, PyShell, etc., you're unlikely to see the output. This is because
the C code is writing to stdout, instead of to the GUI window. This doesn't
mean that inline doesn't work in these environments -- it only means that
standard out in C is not the same as the standard out for Python in these
cases. Non-input/output functions will work as expected.
</em>
<p>
Although effort has been made to reduce the overhead associated with calling
inline, it is still less efficient for simple code snippets than using
equivalent Python code. The simple <code>printf</code> example is actually
slower by 30% or so than using the Python <code>print</code> statement. And it
is not difficult to create code fragments that are 8-10 times slower using
inline than equivalent Python. However, for more complicated algorithms, the
speed-up can be worthwhile -- anywhere from 1.5-30 times faster. Algorithms
that have to manipulate Python objects (sorting a list) usually only see a
factor of 2 or so improvement. Algorithms that are highly computational or
manipulate Numeric arrays can see much larger improvements. The examples/vq.py
file shows a factor of 30 or more improvement on the vector quantization
algorithm that is used heavily in information theory and classification
problems.
<p>

<a name="More with printf"></a>
<h2>More with printf</h2>
<p>
MSVC users will actually see a bit of compiler output that distutils does not
suppress the first time the code executes:

    <blockquote><pre><code>
    >>> compiler.inline(r'printf("%d\n",a);',['a'])
    sc_e013937dbc8c647ac62438874e5795131.cpp
    Creating library C:\DOCUME~1\eric\LOCALS~1\Temp\python21_compiled\temp
    \Release\sc_e013937dbc8c647ac62438874e5795131.lib and object C:\DOCUME
    ~1\eric\LOCALS~1\Temp\python21_compiled\temp\Release\sc_e013937dbc8c64
    7ac62438874e5795131.exp
    1
    </code></pre></blockquote>
<p>
Nothing bad is happening; it's just a bit annoying. <em>Anyone know how to
turn this off?</em>
<p>
This example also demonstrates using 'raw strings'. The <code>r</code>
preceding the code string in the last example denotes that this is a 'raw
string'. In raw strings, the backslash character is not interpreted as an
escape character, and so it isn't necessary to use a double backslash to
indicate that the '\n' is meant to be interpreted in the C
<code>printf</code> statement instead of by Python. If your C code contains a
lot of strings and control characters, raw strings might make things easier.
Most of the time, however, standard strings work just as well.
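<p>
For example, these two calls hand exactly the same source to the C compiler:

    <blockquote><pre><code>
    >>> compiler.inline('printf("%d\\n",a);',['a'])   # escaped backslash
    1
    >>> compiler.inline(r'printf("%d\n",a);',['a'])   # raw string
    1
    </code></pre></blockquote>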
If your C code contains a lot +of strings and control characters, raw strings might make things easier. +Most of the time, however, standard strings work just as well. + +<p> +The <code>printf</code> statement in these examples is formatted to print +out integers. What happens if <code>a</code> is a string? <code>inline</code> +will happily, compile a new version of the code to accept strings as input, +and execute the code. The result? + + <blockquote><pre><code> + >>> a = 'string' + >>> compiler.inline(r'printf("%d\n",a);',['a']) + 32956972 + </code></pre></blockquote> +<p> +In this case, the result is non-sensical, but also non-fatal. In other +situations, it might produce a compile time error because <code>a</code> is +required to be an integer at some point in the code, or it could produce a +segmentation fault. Its possible to protect against passing +<code>inline</code> arguments of the wrong data type by using asserts in +Python. + + <blockquote><pre><code> + >>> a = 'string' + >>> def protected_printf(a): + ... assert(type(a) == type(1)) + ... compiler.inline(r'printf("%d\n",a);',['a']) + >>> protected_printf(1) + 1 + >>> protected_printf('string') + AssertError... + </code></pre></blockquote> + +<p> +For printing strings, the format statement needs to be changed. + + <blockquote><pre><code> + >>> a = 'string' + >>> compiler.inline(r'printf("%s\n",a);',['a']) + string + </code></pre></blockquote> + +<p> +As in this case, C/C++ code fragments often have to change to accept different +types. For the given printing task, however, C++ streams provide a way of a +single statement that works for integers and strings. By default, the stream +objects live in the std (standard) namespace and thus require the use of +<code>std::</code>. + + <blockquote><pre><code> + >>> compiler.inline('std::cout << a << std::endl;',['a']) + 1 + >>> a = 'string' + >>> compiler.inline('std::cout << a << std::endl;',['a']) + string + </code></pre></blockquote> + +<p> +Examples using <code>printf</code> and <code>cout</code> are included in +examples/print_example.py. + +<a name="More examples"></a> +<h2> More examples </h2> + +This section shows several more advanced uses of <code>inline</code>. It +includes a few algorithms from the <a +href="http://aspn.activestate.com/ASPN/Cookbook/Python">Python Cookbook</a> +that have been re-written in inline C to improve speed as well as a couple +examples using Numeric and wxPython. + +<a name="Binary search"></a> +<h3> Binary search</h3> +Lets look at the example of searching a sorted list of integers for a value. +For inspiration, we'll use Kalle Svensson's <a +href="http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/81188"> +binary_search()</a> algorithm from the Python Cookbook. His recipe follows: + + <blockquote><pre><code> + def binary_search(seq, t): + min = 0; max = len(seq) - 1 + while 1: + if max < min: + return -1 + m = (min + max) / 2 + if seq[m] < t: + min = m + 1 + elif seq[m] > t: + max = m - 1 + else: + return m + </blockquote></PRE></CODE> + +This Python version works for arbitrary Python data types. The C version below is +specialized to handle integer values. There is a little type checking done in +Python to assure that we're working with the correct data types before heading +into C. The variables <code>seq</code> and <code>t</code> don't need to be +declared beacuse <code>compiler</code> handles converting and declaring them in +the C code. All other temporary variables such as <code>min, max</code>, etc. 
must be declared -- it is C after all. Here's the new mixed Python/C function:

    <blockquote><pre><code>
    def c_int_binary_search(seq,t):
        # do a little type checking in Python
        assert(type(t) == type(1))
        assert(type(seq) == type([]))

        # now the C code
        code = """
               #line 29 "binary_search.py"
               int val, m, min = 0;
               int max = seq.length() - 1;
               PyObject *py_val;
               for(;;)
               {
                   if (max < min )
                   {
                       return_val = Py::new_reference_to(Py::Int(-1));
                       break;
                   }
                   m = (min + max) /2;
                   val = py_to_int(PyList_GetItem(seq.ptr(),m),"val");
                   if (val < t)
                       min = m + 1;
                   else if (val > t)
                       max = m - 1;
                   else
                   {
                       return_val = Py::new_reference_to(Py::Int(m));
                       break;
                   }
               }
               """
        return inline(code,['seq','t'])
    </code></pre></blockquote>
<p>
We have two variables <code>seq</code> and <code>t</code> passed in.
<code>t</code> is guaranteed (by the <code>assert</code>) to be an integer.
Python integers are converted to C int types in the transition from Python to
C. <code>seq</code> is a Python list. By default, it is translated to a CXX
list object. Full documentation for the CXX library can be found at its <a
href="http://cxx.sourceforge.net/">website</a>. The basics are that CXX
provides C++ class equivalents for Python objects that simplify, or at least
object orientify, working with Python objects in C/C++. For example,
<code>seq.length()</code> returns the length of the list. A little more about
CXX and its class methods, etc. is in the ** type conversions ** section.
<p>
Most of the algorithm above looks similar in C to the original Python code.
There are two main differences. The first is the setting of
<code>return_val</code> instead of directly returning from the C code with a
<code>return</code> statement. <code>return_val</code> is an automatically
defined variable of type <code>PyObject*</code> that is returned from the C
code back to Python. You'll have to handle reference counting issues when
setting this variable. In this example, CXX classes and functions handle the
dirty work. All CXX functions and classes live in the namespace
<code>Py::</code>. The following code converts the integer <code>m</code> to a
CXX <code>Int()</code> object and then to a <code>PyObject*</code> with an
incremented reference count using <code>Py::new_reference_to()</code>.

    <blockquote><pre><code>
    return_val = Py::new_reference_to(Py::Int(m));
    </code></pre></blockquote>
<p>
The second big difference shows up in the retrieval of integer values from the
Python list. The simple Python <code>seq[i]</code> call balloons into a C
Python API call to grab the value out of the list and then a separate call to
<code>py_to_int()</code> that converts the PyObject* to an integer.
<code>py_to_int()</code> includes both a NULL check and a
<code>PyInt_Check()</code> call as well as the conversion call. If either of
the checks fails, an exception is raised. The entire C++ code block is
executed within a <code>try/catch</code> block that handles exceptions much
like Python does. This removes the need for most error checking code.
<p>
It is worth noting that CXX lists do have indexing operators that result in
code that looks much like Python. However, the overhead in using them appears
to be relatively high, so the standard Python API was used on
<code>seq.ptr()</code>, which is the underlying <code>PyObject*</code> of the
List object.
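<p>
Calling the mixed function is no different from calling the pure Python
version. A usage sketch, assuming the definition above has been entered in the
current session and <code>inline</code> has been imported from
<code>compiler</code>:

    <blockquote><pre><code>
    >>> from compiler import inline
    >>> haystack = range(1000000)          # a sorted list of ints
    >>> c_int_binary_search(haystack, 42)  # first call compiles the C code
    42
    >>> c_int_binary_search(haystack, -5)  # not present, so we get -1 back
    -1
    </code></pre></blockquote>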
<p>
The <code>#line</code> directive that is the first line of the C code block
isn't necessary, but it's nice for debugging. If the compilation fails because
of a syntax error in the code, the error will be reported as an error in the
Python file "binary_search.py" with an offset from the given line number (29
here).
<p>
So what was all our effort worth in terms of efficiency? Well, not a lot in
this case. The examples/binary_search.py file runs both the Python and C
versions of the functions, as well as the standard <code>bisect</code>
module. If we run it on a 1 million element list and run the search 3000 times
(for 0-2999), here are the results we get:

    <blockquote><pre><code>
    C:\home\ej\wrk\scipy\compiler\examples> python binary_search.py
    Binary search for 3000 items in 1000000 length list of integers:
    speed in python: 0.159999966621
    speed of bisect: 0.121000051498
    speed up: 1.32
    speed in c: 0.110000014305
    speed up: 1.45
    speed in c(no asserts): 0.0900000333786
    speed up: 1.78
    </code></pre></blockquote>
<p>
So, we get roughly a 50-75% improvement depending on whether we use the Python
asserts in our C version. If we move down to searching a 10000 element list,
the advantage evaporates. Even smaller lists might result in the Python
version being faster. I'd like to say that moving to Numeric lists (and
getting rid of the GetItem() call) offers a substantial speed-up, but my
preliminary efforts didn't produce one. I think the log(N) algorithm is to
blame. Because the algorithm is nice, there just isn't much time spent
computing things, so moving to C isn't that big of a win. If there are ways to
reduce the conversion overhead of values, this may improve the C/Python
speed-up. If anyone has other explanations or faster code, please let me know.

<a name="Dictionary sort"></a>
<h3> Dictionary Sort</h3>
<p>
The demo in examples/dict_sort.py is another example from the Python Cookbook.
<a href="http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/52306">This
submission</a>, by Alex Martelli, demonstrates how to return the values from a
dictionary sorted by their keys:

    <blockquote><pre><code>
    def sortedDictValues3(adict):
        keys = adict.keys()
        keys.sort()
        return map(adict.get, keys)
    </code></pre></blockquote>
<p>
Alex provides 3 algorithms, and this is the 3rd and fastest of the set. The C
version of this same algorithm follows:

    <blockquote><pre><code>
    def c_sort(adict):
        assert(type(adict) == type({}))
        code = """
               #line 21 "dict_sort.py"
               Py::List keys = adict.keys();
               Py::List items(keys.length());
               keys.sort();
               PyObject* item = NULL;
               for(int i = 0; i < keys.length();i++)
               {
                   item = PyList_GET_ITEM(keys.ptr(),i);
                   item = PyDict_GetItem(adict.ptr(),item);
                   Py_XINCREF(item);
                   PyList_SetItem(items.ptr(),i,item);
               }
               return_val = Py::new_reference_to(items);
               """
        return inline_tools.inline(code,['adict'],verbose=1)
    </code></pre></blockquote>
<p>
Like the original Python function, the C++ version can handle any Python
dictionary regardless of the key/value pair types. It uses CXX objects for the
most part to declare Python types in C++, but uses Python API calls to
manipulate their contents. Again, this choice is made for speed. The C++
version, while more complicated, is about a factor of 2 faster than Python.

    <blockquote><pre><code>
    C:\home\ej\wrk\scipy\compiler\examples> python dict_sort.py
    Dict sort of 1000 items for 300 iterations:
    speed in python: 0.319999933243
    [0, 1, 2, 3, 4]
    speed in c: 0.151000022888
    speed up: 2.12
    [0, 1, 2, 3, 4]
    </code></pre></blockquote>
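<p>
A quick sanity check of the C version (a sketch only; it assumes
<code>c_sort</code> above has been defined, that compiler's
<code>inline_tools</code> module imports as shown, and that the first call
pays the compile cost):

    <blockquote><pre><code>
    >>> from compiler import inline_tools
    >>> d = {'b': 2, 'c': 3, 'a': 1}
    >>> c_sort(d)          # values ordered by sorted key
    [1, 2, 3]
    </code></pre></blockquote>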
<p>
<a name="Numeric -- cast/copy/transpose"></a>
<h3>Numeric -- cast/copy/transpose</h3>

CastCopyTranspose is a function called quite heavily by Linear Algebra
routines in the Numeric library. It's needed in part because of the row-major
memory layout of multi-dimensional Python (and C) arrays vs. the column-major
order of the underlying Fortran algorithms. For small matrices (say 100x100 or
less), a significant portion of the time in common routines such as LU
decomposition or singular value decomposition is spent in this setup routine.
This shouldn't happen. Here is the Python version of the function using
standard Numeric operations.

    <blockquote><pre><code>
    def _castCopyAndTranspose(type, a):
        if a.typecode() == type:
            cast_array = copy.copy(Numeric.transpose(a))
        else:
            cast_array = copy.copy(Numeric.transpose(a).astype(type))
        return cast_array
    </code></pre></blockquote>

And the following is an inline C version of the same function:

    <blockquote><pre><code>
    from compiler.blitz_tools import blitz_type_factories
    from compiler import scalar_spec
    from compiler import inline
    def _cast_copy_transpose(type,a_2d):
        assert(len(shape(a_2d)) == 2)
        new_array = zeros(shape(a_2d),type)
        numeric_type = scalar_spec.numeric_to_blitz_type_mapping[type]
        code = \
        """
        for(int i = 0;i < _Na_2d[0]; i++)
            for(int j = 0; j < _Na_2d[1]; j++)
                new_array(i,j) = (%s) a_2d(j,i);
        """ % numeric_type
        inline(code,['new_array','a_2d'],
               type_factories = blitz_type_factories,compiler='gcc')
        return new_array
    </code></pre></blockquote>

This example uses blitz++ arrays instead of the standard representation of
Numeric arrays so that indexing is simpler to write. This is accomplished by
passing in the blitz++ "type factories" to override the standard Python to C++
type conversions. Blitz++ arrays allow you to write clean, fast code, but they
also are sloooow to compile (20 seconds or more for this snippet). This is why
they aren't the default type used for Numeric arrays (and also because most
compilers can't compile blitz arrays...). <code>inline()</code> is also forced
to use 'gcc' as the compiler because the default compiler on Windows (MSVC)
will not compile blitz code. <em>I think 'gcc' will use the standard compiler
on Unix machines instead of explicitly forcing gcc (check this).</em>

Comparisons of the Python vs. inline C++ code show a factor of 3 speed-up.
Also shown are the results of an "inplace" transpose routine that can be used
if the output of the linear algebra routine can overwrite the original matrix
(this is often appropriate). This provides another factor of 2 improvement.

    <blockquote><pre><code>
    #C:\home\ej\wrk\scipy\compiler\examples> python cast_copy_transpose.py
    # Cast/Copy/Transposing (150,150)array 1 times
    # speed in python: 0.870999932289
    # speed in c: 0.25
    # speed up: 3.48
    # inplace transpose c: 0.129999995232
    # speed up: 6.70
    </code></pre></blockquote>
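<p>
A usage sketch for the inline version (it assumes the definitions above, a
<code>from Numeric import *</code> for <code>zeros</code> and
<code>shape</code>, and gcc on the path; remember that the first call triggers
the slow blitz compile):

    <blockquote><pre><code>
    >>> from Numeric import array, Float64
    >>> a = array([[1.,2.],[3.,4.]])
    >>> _cast_copy_transpose(Float64, a)
    array([[ 1.,  3.],
           [ 2.,  4.]])
    </code></pre></blockquote>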
<a name="wxPython"></a>
<h3>wxPython</h3>

<code>inline</code> knows how to handle wxPython objects. That's nice in and
of itself, but it also demonstrates that the type conversion mechanism is
reasonably flexible. Chances are, it won't take a ton of effort to support
special types you might have. The examples/wx_example.py borrows the scrolled
window example from the wxPython demo, except that it mixes inline C code in
the middle of the drawing function.

    <blockquote><pre><code>
    def DoDrawing(self, dc):

        red = wxNamedColour("RED");
        blue = wxNamedColour("BLUE");
        grey_brush = wxLIGHT_GREY_BRUSH;
        code = \
        """
        #line 108 "wx_example.py"
        dc->BeginDrawing();
        dc->SetPen(wxPen(*red,4,wxSOLID));
        dc->DrawRectangle(5,5,50,50);
        dc->SetBrush(*grey_brush);
        dc->SetPen(wxPen(*blue,4,wxSOLID));
        dc->DrawRectangle(15, 15, 50, 50);
        """
        inline(code,['dc','red','blue','grey_brush'])

        dc.SetFont(wxFont(14, wxSWISS, wxNORMAL, wxNORMAL))
        dc.SetTextForeground(wxColour(0xFF, 0x20, 0xFF))
        te = dc.GetTextExtent("Hello World")
        dc.DrawText("Hello World", 60, 65)

        dc.SetPen(wxPen(wxNamedColour('VIOLET'), 4))
        dc.DrawLine(5, 65+te[1], 60+te[0], 65+te[1])
        ...
    </code></pre></blockquote>

Here, some of the Python calls to wx objects were just converted to C++ calls.
There isn't any benefit; it just demonstrates the capabilities. You might want
to use this if you have a computationally intensive loop in your drawing code
that you want to speed up.

On Windows, you'll have to use the MSVC compiler if you use the standard
wxPython DLLs distributed by Robin Dunn. That's because MSVC and gcc, while
binary compatible in C, are not binary compatible for C++. In fact, it's
probably best, no matter what platform you're on, to specify that
<code>inline</code> use the same compiler that was used to build wxPython, to
be on the safe side. There isn't currently a way to learn this info from the
library -- you just have to know. Also, at least on the Windows platform,
you'll need to install the wxWindows libraries and link to them. I think there
is a way around this, but I haven't found it yet -- I get some linking errors
dealing with wxString. One final note: you'll probably have to tweak
compiler/wx_spec.py or compiler/wx_info.py for your machine's configuration to
point at the correct directories, etc. There. That should sufficiently scare
people into not even looking at this... :)

<a name="Keyword Options"></a>
<h2> Keyword Options </h2>
<p>
The basic definition of the <code>inline()</code> function has a slew of
optional variables. It also takes keyword arguments that are passed to
<code>distutils</code> as compiler options. The following is a formatted
cut/paste of the argument section of <code>inline's</code> doc-string. It
explains all of the variables. Some examples using various options will
follow.

    <blockquote><pre><code>
    def inline(code,arg_names,local_dict = None, global_dict = None,
               force = 0,
               compiler='',
               verbose = 0,
               support_code = None,
               customize=None,
               type_factories = None,
               auto_downcast=1,
               **kw):
    </code></pre></blockquote>

<code>inline</code> has quite a few options as listed below. In addition, the
keyword arguments for distutils extension modules are accepted to specify
extra information needed for compiling.
<h4>inline Arguments:</h4>
<blockquote>
<dl>
<dt>code </dt>

<dd>
string. A string of valid C++ code. It should not specify a return statement.
Instead, it should assign results that need to be returned to Python to the
return_val variable.
</dd>

<dt>arg_names </dt>

<dd>
list of strings. A list of Python variable names that should be transferred
from Python into the C/C++ code.
</dd>
<dt>local_dict </dt>

<dd>
optional. dictionary. If specified, it is a dictionary of values that should
be used as the local scope for the C/C++ code. If local_dict is not specified,
the local dictionary of the calling function is used.
</dd>

<dt>global_dict </dt>

<dd>
optional. dictionary. If specified, it is a dictionary of values that should
be used as the global scope for the C/C++ code. If global_dict is not
specified, the global dictionary of the calling function is used.
</dd>

<dt>force </dt>

<dd>
optional. 0 or 1. default 0. If 1, the C++ code is compiled every time inline
is called. This is really only useful for debugging, and probably only useful
if you're editing support_code a lot.
</dd>

<dt>compiler </dt>

<dd>
optional. string. The name of the compiler to use when compiling. On Windows,
it understands 'msvc' and 'gcc' as well as all the compiler names understood
by distutils. On Unix, it'll only understand the values understood by
distutils. (I should add 'gcc' to this, though.)
<p>
On Windows, the compiler defaults to the Microsoft C++ compiler. If this isn't
available, it looks for mingw32 (the gcc compiler).
<p>
On Unix, it'll probably use the same compiler that was used when compiling
Python. Cygwin's behavior should be similar.</p>
</dd>

<dt>verbose </dt>

<dd>
optional. 0, 1, or 2. default 0. Specifies how much information is printed
during the compile phase of inlining code. 0 is silent (except on Windows with
msvc, where it still prints some garbage). 1 informs you when compiling starts
and finishes, and how long it took. 2 prints out the command lines for the
compilation process and can be useful if you're having problems getting code
to work. It's handy for finding the name of the .cpp file if you need to
examine it. verbose has no effect if the compilation isn't necessary.
</dd>

<dt>support_code </dt>

<dd>
optional. string. A string of valid C++ code declaring extra code that might
be needed by your compiled function. This could be declarations of functions,
classes, or structures.
</dd>

<dt>customize </dt>

<dd>
optional. base_info.custom_info object. An alternative way to specify
support_code, headers, etc. needed by the function. See the compiler.base_info
module for more details. (I'm not sure this'll be used much.)
</dd>

<dt>type_factories </dt>

<dd>
optional. list of type specification factories. These guys are what convert
Python data types to C/C++ data types. If you'd like to use a different set of
type conversions than the default, specify them here. Look in the type
conversions section of the main documentation for examples.
</dd>

<dt>auto_downcast </dt>

<dd>
optional. 0 or 1. default 1. This only affects functions that have Numeric
arrays as input variables. Setting this to 1 will cause all floating point
values to be cast as float instead of double if all the Numeric arrays are of
type float. If even one of the arrays has type double or double complex, all
variables maintain their standard types.
</dd>
</dl>
</blockquote>

<h4> Distutils keywords:</h4>
<blockquote>
<code>inline()</code> also accepts a number of <code>distutils</code> keywords
for controlling how the code is compiled.
The following descriptions have been copied from Greg Ward's
<code>distutils.extension.Extension</code> class doc-strings for convenience:

<dl>
<dt>sources </dt>

<dd>
[string] list of source filenames, relative to the distribution root (where
the setup script lives), in Unix form (slash-separated) for portability.
Source files may be C, C++, SWIG (.i), platform-specific resource files, or
whatever else is recognized by the "build_ext" command as source for a Python
extension. Note: The module_path file is always appended to the front of this
list.
</dd>

<dt>include_dirs </dt>

<dd>
[string] list of directories to search for C/C++ header files (in Unix form
for portability)
</dd>

<dt>define_macros </dt>

<dd>
[(name : string, value : string|None)] list of macros to define; each macro is
defined using a 2-tuple, where 'value' is either the string to define it to or
None to define it without a particular value (equivalent of "#define FOO" in
source or -DFOO on Unix C compiler command line)
</dd>
<dt>undef_macros </dt>

<dd>
[string] list of macros to undefine explicitly
</dd>
<dt>library_dirs </dt>
<dd>
[string] list of directories to search for C/C++ libraries at link time
</dd>
<dt>libraries </dt>
<dd>
[string] list of library names (not filenames or paths) to link against
</dd>
<dt>runtime_library_dirs </dt>
<dd>
[string] list of directories to search for C/C++ libraries at run time (for
shared extensions, this is when the extension is loaded)
</dd>

<dt>extra_objects </dt>

<dd>
[string] list of extra files to link with (eg. object files not implied by
'sources', static library that must be explicitly specified, binary resource
files, etc.)
</dd>

<dt>extra_compile_args </dt>

<dd>
[string] any extra platform- and compiler-specific information to use when
compiling the source files in 'sources'. For platforms and compilers where
"command line" makes sense, this is typically a list of command-line
arguments, but for other platforms it could be anything.
</dd>
<dt>extra_link_args </dt>

<dd>
[string] any extra platform- and compiler-specific information to use when
linking object files together to create the extension (or to create a new
static Python interpreter). Similar interpretation as for
'extra_compile_args'.
</dd>
<dt>export_symbols </dt>

<dd>
[string] list of symbols to be exported from a shared extension. Not used on
all platforms, and not generally necessary for Python extensions, which
typically export exactly one symbol: "init" + extension_name.
</dd>
</dl>
</blockquote>
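<p>
For instance, linking a snippet against an external C library can be done by
passing these keywords straight through. The library name, paths, and the
<code>my_lib_sum</code> function below are purely hypothetical -- this is only
a sketch of the call shape:

    <blockquote><pre><code>
    >>> from compiler import inline
    >>> a = 1; b = 2
    >>> code = 'return_val = Py::new_reference_to(Py::Int(my_lib_sum(a,b)));'
    >>> inline(code, ['a','b'],
    ...        support_code = 'extern "C" int my_lib_sum(int, int);',
    ...        include_dirs = ['/usr/local/include'],
    ...        library_dirs = ['/usr/local/lib'],
    ...        libraries = ['my_lib'])
    3
    </code></pre></blockquote>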
""" + >>> inline(code,['a']) + sc_86e98826b65b047ffd2cd5f479c627f12.cpp + Creating + library C:\DOCUME~1\eric\LOCALS~1\Temp\python21_compiled\temp\Release\sc_86e98826b65b047ffd2cd5f479c627f12.lib + and object C:\DOCUME~ 1\eric\LOCALS~1\Temp\python21_compiled\temp\Release\sc_86e98826b65b047ff + d2cd5f479c627f12.exp + 6 + >>> inline(code,['a']) + 6 + </code></pre></blockquote> + +When <code>inline</code> is first run, you'll notice that pause and some +trash printed to the screen. The "trash" is acutually part of the compilers +output that distutils does not supress. On Unix or windows machines with only +gcc installed, the trash will not appear. On the second call, the code +fragment is not compiled since it already exists, and only the answer is +returned. Now kill the interpreter and restart, and run the same code with +a different string. + + <blockquote><pre><code> + >>> from compiler import inline + >>> a = 'a longer string' + >>> code = """ + ... int l = a.length(); + ... return_val = Py::new_reference_to(Py::Int(l)); + ... """ + >>> inline(code,['a']) + 15 + </code></pre></blockquote> +<p> +Notice this time, <code>inline()</code> did not recompile the code because it +found the compiled function in the persistent catalog of functions. There is +a short pause as it looks up and loads the function, but it is much shorter +than compiling would require. +<p> +You can specify the local and global dictionaries if you'd like (much like +<code>exec</code> or <code>eval()</code> in Python), but if they aren't +specified, the "expected" ones are used -- i.e. the ones from the function that +called <code>inline() </code>. This is accomplished through a little call +frame trickery. Here is an example where the local_dict is specified using +the same code example from above: + + <blockquote><pre><code> + >>> a = 'a longer string' + >>> b = 'an even longer string' + >>> my_dict = {'a':b} + >>> inline(code,['a']) + 15 + >>> inline(code,['a'],my_dict) + 21 + </code></pre></blockquote> + +<p> +Everytime, the <code>code</code> is changed, <code>inline</code> does a +recompile. However, changing any of the other options in inline does not +force a recompile. The <code>force</code> option was added so that one +could force a recompile when tinkering with other variables. In practice, +it is just as easy to change the <code>code</code> by a single character +(like adding a space some place) to force the recompile. <em>Note: It also +might be nice to add some methods for purging the cache and on disk +catalogs.</em> +<p> +I use <code>verbose</code> sometimes for debugging. When set to 2, it'll +output all the information (including the name of the .cpp file) that you'd +expect from running a make file. This is nice if you need to examine the +generated code to see where things are going haywire. Note that error +messages from failed compiles are printed to the screen even if <code>verbose +</code> is set to 0. +<p> +The following example demonstrates using gcc instead of the standard msvc +compiler on windows using same code fragment as above. Because the example has +already been compiled, the <code>force=1</code> flag is needed to make +<code>inline()</code> ignore the previously compiled version and recompile +using gcc. 
The verbose flag is added to show what is printed out:

    <blockquote><pre><code>
    >>>inline(code,['a'],compiler='gcc',verbose=2,force=1)
    running build_ext
    building 'sc_86e98826b65b047ffd2cd5f479c627f13' extension
    c:\gcc-2.95.2\bin\g++.exe -mno-cygwin -mdll -O2 -w -Wstrict-prototypes -IC:
    \home\ej\wrk\scipy\compiler -IC:\Python21\Include -c C:\DOCUME~1\eric\LOCAL
    S~1\Temp\python21_compiled\sc_86e98826b65b047ffd2cd5f479c627f13.cpp -o C:\D
    OCUME~1\eric\LOCALS~1\Temp\python21_compiled\temp\Release\sc_86e98826b65b04
    7ffd2cd5f479c627f13.o
    skipping C:\home\ej\wrk\scipy\compiler\CXX\cxxextensions.c (C:\DOCUME~1\eri
    c\LOCALS~1\Temp\python21_compiled\temp\Release\cxxextensions.o up-to-date)
    skipping C:\home\ej\wrk\scipy\compiler\CXX\cxxsupport.cxx (C:\DOCUME~1\eric
    \LOCALS~1\Temp\python21_compiled\temp\Release\cxxsupport.o up-to-date)
    skipping C:\home\ej\wrk\scipy\compiler\CXX\IndirectPythonInterface.cxx (C:\
    DOCUME~1\eric\LOCALS~1\Temp\python21_compiled\temp\Release\indirectpythonin
    terface.o up-to-date)
    skipping C:\home\ej\wrk\scipy\compiler\CXX\cxx_extensions.cxx (C:\DOCUME~1\
    eric\LOCALS~1\Temp\python21_compiled\temp\Release\cxx_extensions.o up-to-da
    te)
    writing C:\DOCUME~1\eric\LOCALS~1\Temp\python21_compiled\temp\Release\sc_86
    e98826b65b047ffd2cd5f479c627f13.def
    c:\gcc-2.95.2\bin\dllwrap.exe --driver-name g++ -mno-cygwin -mdll -static -
    -output-lib C:\DOCUME~1\eric\LOCALS~1\Temp\python21_compiled\temp\Release\l
    ibsc_86e98826b65b047ffd2cd5f479c627f13.a --def C:\DOCUME~1\eric\LOCALS~1\Te
    mp\python21_compiled\temp\Release\sc_86e98826b65b047ffd2cd5f479c627f13.def
    -s C:\DOCUME~1\eric\LOCALS~1\Temp\python21_compiled\temp\Release\sc_86e9882
    6b65b047ffd2cd5f479c627f13.o C:\DOCUME~1\eric\LOCALS~1\Temp\python21_compil
    ed\temp\Release\cxxextensions.o C:\DOCUME~1\eric\LOCALS~1\Temp\python21_com
    piled\temp\Release\cxxsupport.o C:\DOCUME~1\eric\LOCALS~1\Temp\python21_com
    piled\temp\Release\indirectpythoninterface.o C:\DOCUME~1\eric\LOCALS~1\Temp
    \python21_compiled\temp\Release\cxx_extensions.o -LC:\Python21\libs -lpytho
    n21 -o C:\DOCUME~1\eric\LOCALS~1\Temp\python21_compiled\sc_86e98826b65b047f
    fd2cd5f479c627f13.pyd
    15
    </code></pre></blockquote>

That's quite a bit of output. <code>verbose=1</code> just prints the compile
time.

    <blockquote><pre><code>
    >>>inline(code,['a'],compiler='gcc',verbose=1,force=1)
    Compiling code...
    finished compiling (sec): 6.00800001621
    15
    </code></pre></blockquote>

<p>
<em> Note: I've only used the <code>compiler</code> option for switching
between 'msvc' and 'gcc' on Windows. It may have use on Unix also, but I don't
know yet.
</em>

<p>
The <code>support_code</code> argument is likely to be used a lot. It allows
you to specify extra code fragments such as function, structure, or class
definitions that you want to use in the <code>code</code> string. Note that
changes to <code>support_code</code> do <em>not</em> force a recompile. The
catalog only relies on <code>code</code> (for performance reasons) to
determine whether recompiling is necessary. So, if you make a change to
support_code, you'll need to alter <code>code</code> in some way or use the
<code>force</code> argument to get the code to recompile. I usually just add
some innocuous whitespace to the end of one of the lines in <code>code</code>
somewhere.
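<p>
Both recompile triggers look like this in practice (a sketch reusing the
string-length snippet from the earlier examples):

    <blockquote><pre><code>
    >>> inline(code,['a'],force=1)   # explicit recompile
    15
    >>> code = code + ' '            # any change to the string also works
    >>> inline(code,['a'])           # recompiles because the code text differs
    15
    </code></pre></blockquote>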
Here's an example of defining a separate method for calculating the string
length:

    <blockquote><pre><code>
    >>> from compiler import inline
    >>> a = 'a longer string'
    >>> support_code = """
    ...                PyObject* length(Py::String a)
    ...                {
    ...                    int l = a.length();
    ...                    return Py::new_reference_to(Py::Int(l));
    ...                }
    ...                """
    >>> inline("return_val = length(a);",['a'],
    ...        support_code = support_code)
    15
    </code></pre></blockquote>
<p>
<code>customize</code> is a leftover from a previous way of specifying
compiler options. It is a <code>custom_info</code> object that can specify
quite a bit of information about how a file is compiled. These
<code>info</code> objects are the standard way of defining compile information
for type conversion classes. However, I don't think they are as handy here,
especially since we've exposed all the keyword arguments that distutils can
handle. Between these keywords and the <code>support_code</code> option, I
think <code>customize</code> may be obsolete. We'll see if anyone cares to use
it. If not, it'll get axed in the next version.
<p>
The <code>type_factories</code> variable is important to people who want to
customize the way arguments are converted from Python to C. We'll talk about
this in the next chapter **xx** of this document when we discuss type
conversions.
<p>
<code>auto_downcast</code> handles one of the big type conversion issues that
is common when using Numeric arrays in conjunction with Python scalar values.
If you have an array of single precision values and multiply that array by a
Python scalar, the result is upcast to a double precision array because the
scalar value is double precision. This is not usually the desired behavior
because it can double your memory usage. <code>auto_downcast</code> goes some
distance towards changing the casting precedence of arrays and scalars. If
you're only using single precision arrays, it will automatically downcast all
scalar values from double to single precision when they are passed into the
C++ code. This is the default behavior. If you want all values to keep their
default type, set <code>auto_downcast</code> to 0.
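<p>
The upcasting problem itself is easy to see in plain Numeric, without any
<code>inline</code> involved:

    <blockquote><pre><code>
    >>> from Numeric import array, Float32
    >>> a = array([1., 2., 3.], Float32)
    >>> a.typecode()
    'f'
    >>> (a * 2.5).typecode()   # the double precision scalar upcasts the result
    'd'
    </code></pre></blockquote>
<p>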
<a name="Returning Values"></a>
<h3> Returning Values</h3>

Python variables in the local and global scope transfer seamlessly from Python
into the C++ snippets. And, if <code>inline</code> were to completely live up
to its name, any modifications to variables in the C++ code would be reflected
in the Python variables when control was passed back to Python. For example,
the desired behavior would be something like:

    <blockquote><pre><code>
    # THIS DOES NOT WORK
    >>> a = 1
    >>> compiler.inline("a++;",['a'])
    >>> a
    2
    </code></pre></blockquote>

Instead you get:

    <blockquote><pre><code>
    >>> a = 1
    >>> compiler.inline("a++;",['a'])
    >>> a
    1
    </code></pre></blockquote>

Variables are passed into C++ as if you are calling a Python function.
Python's calling convention is sometimes called "pass by assignment". This
means it's as if a <code>c_a = a</code> assignment is made right before the
<code>inline</code> call is made, and the <code>c_a</code> variable is used
within the C++ code. Thus, any changes made to <code>c_a</code> are not
reflected in Python's <code>a</code> variable. Things do get a little more
confusing, however, when looking at variables with mutable types. Changes made
in C++ to the contents of mutable types <em>are</em> reflected in the Python
variables.

    <blockquote><pre><code>
    >>> a = [1,2]
    >>> compiler.inline("PyList_SetItem(a.ptr(),0,PyInt_FromLong(3));",['a'])
    >>> print a
    [3, 2]
    </code></pre></blockquote>

So modifications to the contents of mutable types in C++ are seen when control
is returned to Python. Modifications to immutable types such as tuples,
strings, and numbers do not alter the Python variables.

If you need to make changes to an immutable variable, you'll need to assign
the new value to the "magic" variable <code>return_val</code> in C++. This
value is returned by the <code>inline()</code> function:

    <blockquote><pre><code>
    >>> a = 1
    >>> a = compiler.inline("return_val = Py::new_reference_to(Py::Int(a+1));",['a'])
    >>> a
    2
    </code></pre></blockquote>

The <code>return_val</code> variable can also be used to return newly created
values. This is possible by returning a tuple. The following trivial example
illustrates how this can be done:

    <blockquote><pre><code>
    # python version
    def multi_return():
        return 1, '2nd'

    # C version.
    def c_multi_return():
        code = """
               Py::Tuple results(2);
               results[0] = Py::Int(1);
               results[1] = Py::String("2nd");
               return_val = Py::new_reference_to(results);
               """
        return inline_tools.inline(code,[])
    </code></pre></blockquote>
<p>
The example is available in <code>examples/tuple_return.py</code>. It also has
the dubious honor of demonstrating how much <code>inline()</code> can slow
things down. The C version here is about 10 times slower than the Python
version. Of course, something so trivial has no reason to be written in C
anyway.

<a name="The issue with locals()"></a>
<h4> The issue with <code>locals()</code></h4>
<p>
<code>inline</code> passes the <code>locals()</code> and
<code>globals()</code> dictionaries from the calling function into the C++
code. It extracts the variables that are used in the C++ code from these
dictionaries, converts them to C++ variables, and then calculates using them.
It seems like it would be trivial, then, after the calculations are finished,
to insert the new values back into the <code>locals()</code> and
<code>globals()</code> dictionaries so that the modified values are reflected
in Python. Unfortunately, as pointed out by the Python manual, the
<code>locals()</code> dictionary is not writable.
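<p>
The behavior is easy to demonstrate without any C at all; inside a function,
assignments into <code>locals()</code> simply don't stick:

    <blockquote><pre><code>
    >>> def demo():
    ...     a = 1
    ...     locals()['a'] = 2   # modifies a temporary dict, not the real local
    ...     return a
    >>> demo()
    1
    </code></pre></blockquote>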
<p>
<em>
I suspect <code>locals()</code> is not writable because there are some
optimizations done to speed lookups of the local namespace. I'm guessing local
lookups don't always look at a dictionary to find values. Can someone "in the
know" confirm or correct this? Another thing I'd like to know is whether there
is a way to write to the local namespace of another stack frame from C/C++. If
so, it would be possible to have some clean-up code in compiled functions that
wrote final values of variables in C++ back to the correct Python stack frame.
I think this goes a long way toward making <code>inline</code> truly live up
to its name. I don't think we'll get to the point of creating variables in
Python for variables created in C -- although I suppose with a C/C++ parser
you could do that also.
</em>
<p>

<a name="inline_quick_look_at_code"></a>
<h3>A quick look at the code</h3>

<code>compiler</code> generates a C++ file holding an extension function for
each <code>inline</code> code snippet. These file names are generated from the
md5 signature of the code snippet and saved to a location specified by the
PYTHONCOMPILED environment variable (discussed later). The .cpp files are
generally about 200-400 lines long and include quite a few functions to
support type conversions, etc. However, the actual compiled function is pretty
simple. Below is the familiar <code>printf</code> example:

    <blockquote><pre><code>
    >>> import compiler
    >>> a = 1
    >>> compiler.inline('printf("%d\\n",a);',['a'])
    1
    </code></pre></blockquote>

And here is the extension function generated by <code>inline</code>:

    <blockquote><pre><code>
    static PyObject* compiled_func(PyObject*self, PyObject* args)
    {
        // The Py_None needs an incref before returning
        PyObject *return_val = NULL;
        int exception_occured = 0;
        PyObject *py__locals = NULL;
        PyObject *py__globals = NULL;
        PyObject *py_a;
        py_a = NULL;

        if(!PyArg_ParseTuple(args,"OO:compiled_func",&py__locals,&py__globals))
            return NULL;
        try
        {
            PyObject* raw_locals = py_to_raw_dict(py__locals,"_locals");
            PyObject* raw_globals = py_to_raw_dict(py__globals,"_globals");
            int a = py_to_int (get_variable("a",raw_locals,raw_globals),"a");
            /* Here is the inline code */
            printf("%d\n",a);
            /* I would like to fill in changed locals and globals here... */
        }
        catch( Py::Exception& e)
        {
            return_val = Py::Null();
            exception_occured = 1;
        }
        if(!return_val && !exception_occured)
        {
            Py_INCREF(Py_None);
            return_val = Py_None;
        }
        /* clean up code */

        /* return */
        return return_val;
    }
    </code></pre></blockquote>

Every inline function takes exactly two arguments -- the local and global
dictionaries for the current scope. All variable values are looked up out of
these dictionaries. The lookups, along with all <code>inline</code> code
execution, are done within a C++ <code>try</code> block. If the variables
aren't found, or there is an error converting a Python variable to the
appropriate type in C++, an exception is raised. The C++ exception is
automatically converted to a Python exception by CXX and returned to Python.

The <code>py_to_int()</code> function illustrates how the conversions and
exception handling work. <code>py_to_int()</code> first checks that the given
PyObject* pointer is not NULL and is a Python integer. If all is well, it
calls the Python API to convert the value to an <code>int</code>. Otherwise,
it calls <code>handle_bad_type()</code>, which gathers information about what
went wrong and then raises a CXX TypeError which returns to Python as a
TypeError.
  <blockquote><pre><code>
  int py_to_int(PyObject* py_obj,char* name)
  {
      if (!py_obj || !PyInt_Check(py_obj))
          handle_bad_type(py_obj,"int", name);
      return (int) PyInt_AsLong(py_obj);
  }
  </code></pre></blockquote>

  <blockquote><pre><code>
  void handle_bad_type(PyObject* py_obj, char* good_type, char* var_name)
  {
      char msg[500];
      sprintf(msg,"received '%s' type instead of '%s' for variable '%s'",
              find_type(py_obj),good_type,var_name);
      throw Py::TypeError(msg);
  }

  char* find_type(PyObject* py_obj)
  {
      if(py_obj == NULL) return "C NULL value";
      if(PyCallable_Check(py_obj)) return "callable";
      if(PyString_Check(py_obj)) return "string";
      if(PyInt_Check(py_obj)) return "int";
      if(PyFloat_Check(py_obj)) return "float";
      if(PyDict_Check(py_obj)) return "dict";
      if(PyList_Check(py_obj)) return "list";
      if(PyTuple_Check(py_obj)) return "tuple";
      if(PyFile_Check(py_obj)) return "file";
      if(PyModule_Check(py_obj)) return "module";

      //should probably do more interrogation (and thinking) on these.
      if(PyCallable_Check(py_obj) && PyInstance_Check(py_obj)) return "callable";
      if(PyInstance_Check(py_obj)) return "instance";
      if(PyCallable_Check(py_obj)) return "callable";
      return "unknown type";
  }
  </code></pre></blockquote>

Since the <code>inline</code> code is also executed within the
<code>try/catch</code> block, you can use CXX exceptions within your code.  It
is usually a bad idea to directly <code>return</code> from your code, even if
an error occurs, because this skips the clean up section of the extension
function.  In this simple example there isn't any clean up code, but in more
complicated examples there may be some reference counting that needs to be
taken care of there on converted variables.  To avoid this, either use
exceptions, or set <code>return_val</code> to NULL and use
<code>if/then</code>s to skip code after errors.

<a name="inline_technical_details"></a>
<h2> Technical Details </h2>
<p>
There are four major steps to an <code>inline</code> call:
<ol>
  <li>Type conversion
  <li>Generating C/C++ code
  <li>Compiling the code to an extension module
  <li>Cataloging (and caching) the function for future use</li>
</ol>
<p>
Items 1 and 2 above are related, but most easily discussed separately.  Type
conversions are customizable by the user if needed.  Understanding them is
pretty important for anything beyond trivial uses of <code>inline</code>.
Generating the C/C++ code is handled by the <code>ext_function</code> and
<code>ext_module</code> classes.  For the most part, compiling the code is
handled by distutils.  Some customizations were needed, but they were
relatively minor and do not require changes to distutils itself (although a few
changes would be nice...).  Cataloging is pretty simple in concept, but
surprisingly required the most code to implement (and still likely needs some
work).  So, this section covers items 1 and 4 from the list.  Item 2 is covered
later in the chapter covering the <code>ext_tools</code> module, and distutils
is covered by a completely separate document xxx.

<h2>Passing Variables in/out of the C/C++ code</h2>
<em>
Note: Passing variables into the C code is pretty straightforward, but there
are subtleties to how variable modifications in C are returned to Python.  See
xxx for a more thorough discussion of this issue.
</em>

<A name="Converting Types"></a>
<h2>Type Conversions</h2>

<em>
Note: Maybe xxx_converter instead of xxx_specification is a more descriptive
name.
</em>

<p>
By default, <code>inline()</code> makes the following type conversions between
Python and C++ types.
<p>

<center>
<table border=1 style="WIDTH: 420px; HEIGHT: 395px">
<tr><td colspan="2" width="100%">
    <P align=center>Default Data Type Conversions</P> </td></tr>
<tr><td>
    <P align=center>Python</P></td><td>
    <P align=center>C++</P></td></tr>
<tr><td> int</td><td> int</td></tr>
<tr><td> float</td><td> double</td></tr>
<tr><td> complex</td><td> std::complex<double></td></tr>
<tr><td> string</td><td> Py::String</td></tr>
<tr><td> list</td><td> Py::List</td></tr>
<tr><td> dict</td><td> Py::Dict</td></tr>
<tr><td> tuple</td><td> Py::Tuple</td></tr>
<tr><td> file</td><td> FILE*</td></tr>
<tr><td> callable</td><td> PyObject*</td></tr>
<tr><td> instance</td><td> PyObject*</td></tr>
<tr><td> Numeric.array</td><td> PyArrayObject*</td></tr>
<tr><td> wxXXX</td><td> wxXXX*</td></tr>
</table>
</center>
<p>
The <code>Py::</code> namespace is defined by the
<a href="http://cxx.sourceforge.net/">CXX</a> library, which has C++ class
equivalents for many Python types.  <code>std::</code> is the namespace of the
standard library in C++.
<p>
<em>
Note:
<ul>
<li>I haven't figured out how to handle <code>long int</code> yet (I think
    they are currently converted to int -- check this).
<li>Hopefully VTK will be added to the list soon.</li>
</ul>
</em>
<p>

Python to C++ conversions fill in code in several locations in the generated
<code>inline</code> extension function.  Below is the basic template for the
function.  This is actually the exact code that is generated by calling
<code>compiler.inline("",[])</code>.

  <blockquote><pre><code>
  static PyObject* compiled_func(PyObject*self, PyObject* args)
  {
      PyObject *return_val = NULL;
      int exception_occured = 0;
      PyObject *py__locals = NULL;
      PyObject *py__globals = NULL;
      PyObject *py_a;
      py_a = NULL;

      if(!PyArg_ParseTuple(args,"OO:compiled_func",&py__locals,&py__globals))
          return NULL;
      try
      {
          PyObject* raw_locals = py_to_raw_dict(py__locals,"_locals");
          PyObject* raw_globals = py_to_raw_dict(py__globals,"_globals");
          /* argument conversion code */
          /* inline code */
          /* I would like to fill in changed locals and globals here... */

      }
      catch( Py::Exception& e)
      {
          return_val = Py::Null();
          exception_occured = 1;
      }
      /* cleanup code */
      if(!return_val && !exception_occured)
      {

          Py_INCREF(Py_None);
          return_val = Py_None;
      }

      return return_val;
  }
  </code></pre></blockquote>

The <code>/* inline code */</code> section is filled with the code passed to
the <code>inline()</code> function call.  The
<code>/* argument conversion code */</code> and <code>/* cleanup code */</code>
sections are filled with code that handles conversion from Python to C++
types and code that deallocates memory or manipulates reference counts before
the function returns.  The following sections demonstrate how these two areas
are filled in by the default conversion methods.

<em>
Note: I'm not sure I have reference counting correct on a few of these.  The
only thing I increase/decrease the ref count on is Numeric arrays.  If you
see an issue, please let me know.
</em>

<a name="inline_numeric_argument_conversion"></a>
<h3> Numeric Argument Conversion </h3>

Integer, floating point, and complex arguments are handled in a very similar
fashion.
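<p>
For example, each of the following calls generates a separate extension
function whose argument conversion code uses the corresponding C++ type from
the table above (a sketch -- any snippet that uses the variable will do):

  <blockquote><pre><code>
  >>> a = 1                                  # int -> int
  >>> compiler.inline('printf("%d\\n",a);',['a'])
  >>> a = 1.0                                # float -> double
  >>> compiler.inline('printf("%f\\n",a);',['a'])
  >>> a = 1+1j                               # complex -> std::complex<double>
  >>> compiler.inline('printf("%f\\n",a.real());',['a'])
  </code></pre></blockquote>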
Consider the following inline function that has a single integer
variable passed in:

  <blockquote><pre><code>
  >>> a = 1
  >>> inline("",['a'])
  </code></pre></blockquote>

The argument conversion code inserted for <code>a</code> is:

  <blockquote><pre><code>
  /* argument conversion code */
  int a = py_to_int (get_variable("a",raw_locals,raw_globals),"a");
  </code></pre></blockquote>

<code>get_variable()</code> reads the variable <code>a</code>
from the local and global namespaces.  <code>py_to_int()</code> has the
following form:

  <blockquote><pre><code>
  static int py_to_int(PyObject* py_obj,char* name)
  {
      if (!py_obj || !PyInt_Check(py_obj))
          handle_bad_type(py_obj,"int", name);
      return (int) PyInt_AsLong(py_obj);
  }
  </code></pre></blockquote>

Similarly, the float and complex conversion routines look like:

  <blockquote><pre><code>
  static double py_to_float(PyObject* py_obj,char* name)
  {
      if (!py_obj || !PyFloat_Check(py_obj))
          handle_bad_type(py_obj,"float", name);
      return PyFloat_AsDouble(py_obj);
  }

  static std::complex<double> py_to_complex(PyObject* py_obj,char* name)
  {
      if (!py_obj || !PyComplex_Check(py_obj))
          handle_bad_type(py_obj,"complex", name);
      return std::complex<double>(PyComplex_RealAsDouble(py_obj),
                                  PyComplex_ImagAsDouble(py_obj));
  }
  </code></pre></blockquote>

Numeric conversions do not require any clean up code.

<a name="inline_python_argument_conversion"></a>
<h3> String, List, Tuple, and Dictionary Conversion </h3>

Strings, lists, tuples, and dictionaries are all converted to CXX types by
default.

For the following code,

  <blockquote><pre><code>
  >>> a = [1]
  >>> inline("",['a'])
  </code></pre></blockquote>

the argument conversion code inserted for <code>a</code> is:

  <blockquote><pre><code>
  /* argument conversion code */
  Py::List a = py_to_list (get_variable("a",raw_locals,raw_globals),"a");
  </code></pre></blockquote>

<code>get_variable()</code> reads the variable <code>a</code>
from the local and global namespaces.  <code>py_to_list()</code> and its
friends have the following forms:

  <blockquote><pre><code>
  static Py::List py_to_list(PyObject* py_obj,char* name)
  {
      if (!py_obj || !PyList_Check(py_obj))
          handle_bad_type(py_obj,"list", name);
      return Py::List(py_obj);
  }

  static Py::String py_to_string(PyObject* py_obj,char* name)
  {
      if (!py_obj || !PyString_Check(py_obj))
          handle_bad_type(py_obj,"string", name);
      return Py::String(py_obj);
  }

  static Py::Dict py_to_dict(PyObject* py_obj,char* name)
  {
      if (!py_obj || !PyDict_Check(py_obj))
          handle_bad_type(py_obj,"dict", name);
      return Py::Dict(py_obj);
  }

  static Py::Tuple py_to_tuple(PyObject* py_obj,char* name)
  {
      if (!py_obj || !PyTuple_Check(py_obj))
          handle_bad_type(py_obj,"tuple", name);
      return Py::Tuple(py_obj);
  }
  </code></pre></blockquote>

CXX handles reference counting for strings, lists, tuples, and dictionaries,
so clean up code isn't necessary.

<a name="inline_file_argument_conversion"></a>
<h3> File Conversion </h3>

For the following code,

  <blockquote><pre><code>
  >>> a = open("bob",'w')
  >>> inline("",['a'])
  </code></pre></blockquote>

the argument conversion code is:

  <blockquote><pre><code>
  /* argument conversion code */
  PyObject* py_a = get_variable("a",raw_locals,raw_globals);
  FILE* a = py_to_file(py_a,"a");
  </code></pre></blockquote>

<code>get_variable()</code> reads the variable <code>a</code>
from the local and global namespaces.
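<p>
Before looking at the conversion routine itself, here is how the converted
FILE* might be used inside the inline code (a sketch -- keep in mind that
C-level writes and Python-level writes to the same file are buffered
separately):

  <blockquote><pre><code>
  >>> a = open("bob",'w')
  >>> compiler.inline('fprintf(a,"hello from C\\n");',['a'])
  >>> a.close()
  </code></pre></blockquote>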
The <code>py_to_file()</code> routine itself converts the PyObject* to a FILE*
and increments the reference count of the PyObject*:

  <blockquote><pre><code>
  FILE* py_to_file(PyObject* py_obj, char* name)
  {
      if (!py_obj || !PyFile_Check(py_obj))
          handle_bad_type(py_obj,"file", name);

      Py_INCREF(py_obj);
      return PyFile_AsFile(py_obj);
  }
  </code></pre></blockquote>

Because the PyObject* was incremented, the clean up code needs to decrement
the counter:

  <blockquote><pre><code>
  /* cleanup code */
  Py_XDECREF(py_a);
  </code></pre></blockquote>

It's important to understand that file conversion only works on actual files --
i.e. ones created using the <code>open()</code> command in Python.  It does
not support converting arbitrary objects that support the file interface into
C <code>FILE*</code> pointers.  This can affect many things.  For example, in
the initial <code>printf()</code> examples, one might be tempted to solve the
problem of C and Python writing to different stdout and stderr streams in IDEs
(PythonWin, PyCrust, etc.) by using <code>fprintf()</code> and passing in
<code>sys.stdout</code> or <code>sys.stderr</code>.  For example, instead of

  <blockquote><pre><code>
  >>> compiler.inline('printf("hello\\n");',[])
  </code></pre></blockquote>

you might try:

  <blockquote><pre><code>
  >>> buf = sys.stdout
  >>> compiler.inline('fprintf(buf,"hello\\n");',['buf'])
  </code></pre></blockquote>

This will work as expected from a standard Python interpreter, but in
PythonWin, the following occurs:

  <blockquote><pre><code>
  >>> buf = sys.stdout
  >>> compiler.inline('fprintf(buf,"hello\\n");',['buf'])
  Traceback (most recent call last):
    File "<interactive input>", line 1, in ?
    File "C:\Python21\compiler\inline_tools.py", line 315, in inline
      auto_downcast = auto_downcast,
    File "C:\Python21\compiler\inline_tools.py", line 386, in compile_function
      type_factories = type_factories)
    File "C:\Python21\compiler\ext_tools.py", line 197, in __init__
      auto_downcast, type_factories)
    File "C:\Python21\compiler\ext_tools.py", line 390, in assign_variable_types
      raise TypeError, format_error_msg(errors)
  TypeError: {'buf': "Unable to convert variable 'buf' to a C++ type."}
  </code></pre></blockquote>

The traceback tells us that <code>inline()</code> was unable to convert 'buf'
to a C++ type (if instance conversion were implemented, the error would occur
at runtime instead).  Why is this?  Let's look at what the <code>buf</code>
object really is:

  <blockquote><pre><code>
  >>> buf
  pywin.framework.interact.InteractiveView instance at 00EAD014
  </code></pre></blockquote>

PythonWin has reassigned <code>sys.stdout</code> to a special object that
implements the Python file interface.  This works great in Python, but since
the special object doesn't have a FILE* pointer underlying it, fprintf doesn't
know what to do with it (well, this will be the problem once instance
conversion is implemented...).

<a name="inline_callable_argument_conversion"></a>
<h3> Callable, Instance, and Module Conversion </h3>

<em>Note: Need to look into how ref counts should be handled.  Also,
instance and module conversion are not currently implemented.
</em>

  <blockquote><pre><code>
  >>> def a():
  ...     pass
  >>> inline("",['a'])
  </code></pre></blockquote>

Callable and instance variables are converted to PyObject*.  Nothing is done
to their reference counts.
  <blockquote><pre><code>
  /* argument conversion code */
  PyObject* a = py_to_callable(get_variable("a",raw_locals,raw_globals),"a");
  </code></pre></blockquote>

<code>get_variable()</code> reads the variable <code>a</code>
from the local and global namespaces.  The <code>py_to_callable()</code> and
<code>py_to_instance()</code> functions don't currently increment the ref
count:

  <blockquote><pre><code>
  PyObject* py_to_callable(PyObject* py_obj, char* name)
  {
      if (!py_obj || !PyCallable_Check(py_obj))
          handle_bad_type(py_obj,"callable", name);
      return py_obj;
  }

  PyObject* py_to_instance(PyObject* py_obj, char* name)
  {
      if (!py_obj || !PyInstance_Check(py_obj))
          handle_bad_type(py_obj,"instance", name);
      return py_obj;
  }
  </code></pre></blockquote>

There is no cleanup code for callables, modules, or instances.

<a name="Customizing Conversions"></a>
<h3> Customizing Conversions </h3>
<p>
Converting from Python to C++ types is handled by xxx_specification classes.
A type specification class actually serves in two related but different roles.
The first is determining whether a Python variable that needs to be converted
should be represented by the given class.  The second is as a code generator
that generates the C++ code needed to convert from Python to C++ types for a
specific variable.
<p>
When

  <blockquote><pre><code>
  >>> a = 1
  >>> compiler.inline('printf("%d",a);',['a'])
  </code></pre></blockquote>

is called for the first time, the code snippet has to be compiled.  In this
process, the variable 'a' is tested against a list of type specifications (the
default list is stored in compiler/ext_tools.py).  The <em>first</em>
specification in the list that matches is used to represent the variable.

<p>
Examples of <code>xxx_specification</code> are scattered throughout numerous
"xxx_spec.py" files in the <code>compiler</code> package.  Closely related to
the <code>xxx_specification</code> classes are <code>yyy_info</code> classes.
These classes contain compiler, header, and support code information necessary
for including a certain set of capabilities (such as blitz++ or CXX support)
in a compiled module.  <code>xxx_specification</code> classes have one or more
<code>yyy_info</code> classes associated with them.

If you'd like to define your own set of type specifications, the current best
route is to examine some of the existing spec and info files.  Looking over
sequence_spec.py and cxx_info.py is probably a good place to start.  After
defining specification classes, you'll need to pass them into
<code>inline</code> using the <code>type_factories</code> argument.

Often you will just want to change how a specific variable type is
represented.  Say you'd rather have Python strings converted to
<code>std::string</code>, or maybe <code>char*</code>, instead of using the
CXX string object, but would like all other type conversions to have the
default behavior.  This requires writing a new specification class that
handles strings and then prepending it to the list of default type
specifications.  Since it is closer to the front of the list, it effectively
overrides the default string specification.

The following code demonstrates how this is done:

...

<a name="The Catalog"></a>
<h2> The Catalog </h2>
<p>
<code>catalog.py</code> has a class called <code>catalog</code> that helps
keep track of previously compiled functions.
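<p>
The effect of this caching is easy to see from the interpreter.  The sketch
below times two identical calls -- the times shown are purely illustrative and
will vary widely by machine and compiler; the point is the orders-of-magnitude
difference between the first call, which compiles, and the second, which hits
the cache:

  <blockquote><pre><code>
  >>> import time, compiler
  >>> a = 1
  >>> t = time.time(); compiler.inline('printf("%d\\n",a);',['a']); print time.time() - t
  1
  22.3          # first ever call: generate, compile, and catalog the function
  >>> t = time.time(); compiler.inline('printf("%d\\n",a);',['a']); print time.time() - t
  1
  0.00089       # subsequent call: found in the in-memory cache
  </code></pre></blockquote>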
As the sketch above suggests, this saves <code>inline()</code> and related
functions from having to compile functions every time they are called.
Instead, the catalog first checks an in-memory cache to see if the function
has already been loaded into Python.  If it hasn't, it searches through
persistent catalogs on disk for an entry for the given function.  By saving
information about compiled functions to disk, it isn't necessary to re-compile
functions every time you stop and restart the interpreter.  Functions are
compiled once and stored for future use.

<p>
When <code>inline(cpp_code)</code> is called, the following things happen:
<ol>
  <li>
  A fast local cache of functions is checked for the last function called for
  <code>cpp_code</code>.  If an entry for <code>cpp_code</code> doesn't exist
  in the cache, or the cached function call fails (perhaps because the function
  doesn't have compatible types), then the next step is to check the catalog.
  <li>
  The catalog class also keeps an in-memory cache with a list of all the
  functions compiled for <code>cpp_code</code>.  If <code>cpp_code</code> has
  ever been called during this session, this cache will already be present;
  otherwise it is loaded from disk.
  <p>
  If the cache is present, each function in the cache is
  called until one is found that was compiled for the correct argument types.
  If none of the functions work, a new function is compiled with the given
  argument types.  This function is written to the on-disk catalog as well as
  into the in-memory cache.</p>
  <li>
  When a lookup for <code>cpp_code</code> fails, the catalog looks through
  the on-disk function catalogs for the entries.  The PYTHONCOMPILED variable
  determines where to search for these catalogs and in what order.  If
  PYTHONCOMPILED is not present, several platform dependent locations are
  searched.  All functions found for <code>cpp_code</code> in the path are
  loaded into the in-memory cache, with functions found earlier in the search
  path closer to the front of the call list.
  <p>
  If the function isn't found in the on-disk catalog,
  then the function is compiled, written to the first writable directory in
  the PYTHONCOMPILED path, and also loaded into the in-memory cache.</p>
  </li>
</ol>

<a name="function storage"></a>
<h3> Function Storage: How functions are stored in caches and on disk </h3>
<p>
Function caches are stored as dictionaries where the key is the entire C++
code string and the value is either a single function (as in the "level 1"
cache) or a list of functions (as in the main catalog cache).  On-disk
catalogs are stored in the same manner using standard Python shelves.
<p>
Early on, there was a question as to whether md5 checksums of the C++
code strings should be used instead of the actual code strings.  I think this
is the route Perl's Inline module took.  Some (admittedly quick) tests of the
md5 vs. the entire string showed that using the entire string was at least a
factor of 3 or 4 faster for Python.  I think this is because it is more
time consuming to compute the md5 value than it is to do look-ups of long
strings in the dictionary.  Look at the examples/md5_speed.py file for the
test run.

<a name="PYTHONCOMPILED"></a>
<h3> Catalog search paths and the PYTHONCOMPILED variable</h3>
<p>
The default location for catalog files on Unix is ~/.pythonXX_compiled, where
XX is the version of Python being used.  If this directory doesn't exist, it
is created the first time a catalog is used.
The directory must be writable.  If,
for any reason, it isn't, then the catalog attempts to create a directory based
on your user id in the /tmp directory.  The directory permissions are set so
that only you have access to the directory.  If this fails, I think you're out
of luck.  I don't think either of these should ever fail, though.  On Windows,
a directory called pythonXX_compiled is created in the user's temporary
directory.
<p>
The actual catalog file that lives in this directory is a Python shelve with
a platform specific name such as "nt21compiled_catalog" so that multiple OSes
can share the same file systems without trampling on each other.  Along with
the catalog file, the .cpp and .so or .pyd files created by inline will live
in this directory.  The catalog file simply contains keys, which are the C++
code strings, with values that are lists of functions.  The function lists
point at functions within these compiled modules.  Each function in the lists
executes the same C++ code string, but compiled for different input variables.
<p>
You can use the PYTHONCOMPILED environment variable to specify alternative
locations for compiled functions.  On Unix this is a colon (':') separated
list of directories.  On Windows, it is a semicolon (';') separated list of
directories.  These directories will be searched prior to the default
directory for a compiled function catalog.  Also, the first writable directory
in the list is where all new compiled function catalogs, .cpp and .so or .pyd
files are written.  Relative directory paths ('.' and '..') should work fine
in the PYTHONCOMPILED variable, as should environment variables.
<p>
There is a "special" path variable called MODULE that can be placed in the
PYTHONCOMPILED variable.  It specifies that the compiled catalog should
reside in the same directory as the module that called it.  This is useful
if an admin wants to build a lot of compiled functions during the build
of a package and then install them in site-packages along with the package.
Users who specify MODULE in their PYTHONCOMPILED variable will have access
to these compiled functions.  Note, however, that if they call the function
with a set of argument types that it hasn't previously been built for, the
new function will be stored in their default directory (or some other writable
directory in the PYTHONCOMPILED path) because the user will not have write
access to the site-packages directory.
<p>
An example of setting the PYTHONCOMPILED path in bash follows:

  <blockquote><pre><code>
  PYTHONCOMPILED=MODULE:/some/path;export PYTHONCOMPILED;
  </code></pre></blockquote>

If you are using Python 2.1 on Linux, and the module bob.py in site-packages
has a compiled function in it, then the catalog search order when calling that
function for the first time in a Python session would be:

  <blockquote><pre><code>
  /usr/lib/python21/site-packages/linuxpython_compiled
  /some/path/linuxpython_compiled
  ~/.python21_compiled/linuxpython_compiled
  </code></pre></blockquote>

The default location is always included in the search path.
<p>
<em>
Note: hmmm.  I see a possible problem here.  I should probably make a sub-
directory such as
/usr/lib/python21/site-packages/python21_compiled/linuxpython_compiled so that
library files compiled with python21 aren't linked with python22 files in some
strange scenarios.  Need to check this.
</em>

<p>
The in-module cache (in <code>compiler.inline_tools</code>) reduces the
overhead of calling inline functions by about a factor of 2.  It could be
reduced a little more, for tight loops where the same function is called over
and over again, if the cache were a single value instead of a dictionary, but
the benefit is very small (less than 5%) and the utility is quite a bit less.
So, we'll stick with a dictionary as the cache.
<p></p>

<a name="Blitz"></a>
<h1>Blitz</h1>
<em> Note: most of this section is lifted from old documentation.  It should
be pretty accurate, but there may be a few discrepancies.</em>
<p>
<code>compiler.blitz()</code> compiles Numeric Python expressions for fast
execution.  For most applications, compiled expressions should provide a
factor of 2-10 speed-up over Numeric arrays.  Using compiled
expressions is meant to be as unobtrusive as possible and works much like
Python's <code>exec</code> statement.  As an example, the following code
fragment takes a 5 point average of the 512x512 2d image, b, and stores it in
array, a:

  <blockquote><pre><code>
  from scipy import * # or from Numeric import *
  a = ones((512,512), Float64)
  b = ones((512,512), Float64)
  # ...do some stuff to fill in b...
  # now average
  a[1:-1,1:-1] =  (b[1:-1,1:-1] + b[2:,1:-1] + b[:-2,1:-1] \
                   + b[1:-1,2:] + b[1:-1,:-2]) / 5.
  </code></pre></blockquote>

To compile the expression, convert the expression to a string by putting
quotes around it and then use <code>compiler.blitz</code>:

  <blockquote><pre><code>
  import compiler
  expr = "a[1:-1,1:-1] =  (b[1:-1,1:-1] + b[2:,1:-1] + b[:-2,1:-1]" \
                          "+ b[1:-1,2:] + b[1:-1,:-2]) / 5."
  compiler.blitz(expr)
  </code></pre></blockquote>

The first time <code>compiler.blitz</code> is run for a given expression and
set of arguments, C++ code that accomplishes the exact same task as the Python
expression is generated and compiled to an extension module.  This can take up
to a couple of minutes depending on the complexity of the function.
Subsequent calls to the function are very fast.  Further, the generated module
is saved between program executions so that the compilation is only done once
for a given expression and associated set of array types.  If the given
expression is executed with a new set of array types, the code must be
compiled again.  This does not overwrite the previously compiled function --
both of them are saved and available for execution.
<p>
The following table compares the run times for standard Numeric code and
compiled code for the 5 point averaging.
<p>
<center>
<table border=1 >
<tr><td>Method</td> <td>Run Time (seconds)</td></tr>
<tr><td>Standard Numeric</td> <td>0.46349</td></tr>
<tr><td>blitz (1st time compiling)</td> <td> 78.95526</td></tr>
<tr><td>blitz (subsequent calls)</td> <td>0.05843 (factor of 8 speedup)</td></tr>
</table>
</center>
<p>
These numbers are for a 512x512 double precision image run on a 400 MHz
Celeron processor under RedHat Linux 6.2.
<p>
Because of the slow compile times, it's probably most effective to develop
algorithms as you usually do, using the capabilities of scipy or the Numeric
module.  Once the algorithm is perfected, put quotes around it and execute it
using <code>compiler.blitz</code>.  This provides the standard rapid
prototyping strengths of Python and results in algorithms that run close to
the speed of hand coded C or Fortran.
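<p>
To reproduce the comparison above on your own machine, a timing harness along
these lines works (a sketch -- times will vary, and the first
<code>blitz</code> call includes the compile time):

  <blockquote><pre><code>
  import time
  import compiler
  from Numeric import ones, Float64

  a = ones((512,512), Float64)
  b = ones((512,512), Float64)
  expr = "a[1:-1,1:-1] = (b[1:-1,1:-1] + b[2:,1:-1] + b[:-2,1:-1]" \
         "+ b[1:-1,2:] + b[1:-1,:-2]) / 5."

  t1 = time.time()
  a[1:-1,1:-1] = (b[1:-1,1:-1] + b[2:,1:-1] + b[:-2,1:-1]
                  + b[1:-1,2:] + b[1:-1,:-2]) / 5.
  print 'Numeric:', time.time() - t1

  compiler.blitz(expr)               # first call: compiles (slow)
  t1 = time.time()
  compiler.blitz(expr)               # subsequent calls: cached (fast)
  print 'blitz:  ', time.time() - t1
  </code></pre></blockquote>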
<a name="blitz_requirements"></a>
<h2>Requirements</h2>

Currently, <code>compiler.blitz</code> has only been tested under Linux
with gcc-2.95-3 and on Windows with Mingw32 (2.95.2).  Its compiler
requirements are pretty heavy duty (see the
<a href="http://www.oonumerics.org/blitz/">blitz++ home page</a>), so it won't
work with just any compiler.  In particular, MSVC++ isn't up to snuff.  A
number of other compilers such as KAI++ will also work, but my suspicion is
that gcc will get the most use.

<a name="blitz_limitations"></a>
<h2>Limitations</h2>
<ol>
<li>
Currently, <code>compiler.blitz</code> handles all standard mathematical
operators except for the ** power operator.  The built-in trigonometric, log,
floor/ceil, and fabs functions might work (but haven't been tested).  It also
handles all types of array indexing supported by the Numeric module.
<p>
<code>compiler.blitz</code> does not currently support operations that use
array broadcasting, nor have any of the special purpose functions in Numeric
such as take, compress, etc. been implemented.  Note that there are no obvious
reasons why most of this functionality cannot be added to scipy.compiler, so
it will likely trickle into future versions.  Using <code>slice()</code>
objects directly instead of <code>start:stop:step</code> is also not
supported.
</li>
<li>
Currently <code>compiler.blitz</code> only works on expressions that include
assignment, such as

  <blockquote><pre><code>
  >>> result = b + c + d
  </code></pre></blockquote>

This means that the result array must exist before calling
<code>compiler.blitz</code>.  Future versions will allow the following:

  <blockquote><pre><code>
  >>> result = compiler.blitz_eval("b + c + d")
  </code></pre></blockquote>
</li>
<li>
<code>compiler.blitz</code> works best when algorithms can be expressed in a
"vectorized" form.  Algorithms that have a large number of if/thens and other
conditions are better hand written in C or Fortran.  Further, the restrictions
imposed by requiring vectorized expressions sometimes preclude the use of more
efficient data structures or algorithms.  For maximum speed in these cases,
hand-coded C or Fortran code is the only way to go.
</li>
<li>
One other point deserves mention lest people be confused.
<code>compiler.blitz</code> is not a general purpose Python->C compiler.  It
only works for expressions that contain Numeric arrays and/or
Python scalar values.  This focused scope concentrates effort on the
computationally intensive regions of the program and sidesteps the difficult
issues associated with a general purpose Python->C compiler.
</li>
</ol>

<a name="Numeric Efficiency"></a>
<h2>Numeric efficiency issues: What compilation buys you</h2>

Some might wonder why compiling Numeric expressions to C++ is beneficial,
since operations on Numeric arrays are already executed within C loops.  The
problem is that anything other than the simplest expressions is executed in a
less than optimal fashion.  Consider the following Numeric expression:

  <blockquote><pre><code>
  a = 1.2 * b + c * d
  </code></pre></blockquote>

When Numeric calculates the value for the 2d array, <code>a</code>, it does
the following steps:

  <blockquote><pre><code>
  temp1 = 1.2 * b
  temp2 = c * d
  a = temp1 + temp2
  </code></pre></blockquote>

Two things to note.  Since <code>b</code> is a (perhaps large) array, a large
temporary array must be created to store the results of <code>1.2 * b</code>.
The same is true for <code>temp2</code>.  Allocation is slow.  The second
thing is that we have three loops executing, one to calculate
<code>temp1</code>, one for <code>temp2</code>, and one for adding them up.
A C loop for the same problem might look like:

  <blockquote><pre><code>
  for(int i = 0; i < M; i++)
      for(int j = 0; j < N; j++)
          a[i][j] = 1.2 * b[i][j] + c[i][j] * d[i][j];
  </code></pre></blockquote>

Here, the three loops have been fused into a single loop and there is no
longer a need for a temporary array.  This provides a significant speed
improvement over the above example (write me and tell me what you get).
<p>
So, converting Numeric expressions into C/C++ loops that fuse the loops and
eliminate temporary arrays can provide big gains.  The goal, then, is to
convert a Numeric expression to C/C++ loops, compile them in an extension
module, and then call the compiled extension function.  The good news is that
there is an obvious correspondence between the Numeric expression above and
the C loop.  The bad news is that Numeric is generally much more powerful than
this simple example illustrates, and handling all possible indexing
possibilities results in loops that are less than straightforward to write.
(Take a peek at the Numeric source for confirmation.)  Luckily, there are
several available tools that simplify the process.

<a name="blitz_tools"></a>
<h2>The Tools</h2>

<code>compiler.blitz</code> relies heavily on several remarkable tools.  On
the Python side, the main facilitators are Jeremy Hylton's parser module and
Jim Hugunin's Numeric module.  On the compiled language side, Todd
Veldhuizen's blitz++ array library, written in C++ (shhhh. don't tell David
Beazley), does the heavy lifting.  Don't assume that, because it's C++, it's
much slower than C or Fortran.  Blitz++ uses a jaw dropping array of template
techniques (metaprogramming, expression templates, etc.) to convert innocent
looking and readable C++ expressions into code that usually executes within a
few percentage points of Fortran code for the same problem.  This is good.
Unfortunately all the template razzmatazz is very expensive to compile, so the
200 line extension modules often take two or more minutes to compile.  This
isn't so good.  <code>compiler.blitz</code> works to minimize this issue by
remembering where compiled modules live and reusing them instead of
re-compiling every time a program is re-run.

<a name="blitz_parser"></a>
<h3>Parser</h3>
Tearing Numeric expressions apart, examining the pieces, and then rebuilding
them as C++ (blitz) expressions requires a parser of some sort.  I can imagine
someone attacking this problem with regular expressions, but it'd likely be
ugly and fragile.  Amazingly, Python solves this problem for us.  It actually
exposes its parsing engine to the world through the <code>parser</code>
module.  The following fragment creates an Abstract Syntax Tree (AST) object
for the expression and then converts it to a (rather unpleasant looking)
deeply nested list representation of the tree.
  <blockquote><pre><code>
  >>> import parser
  >>> import pprint
  >>> import scipy.compiler.misc
  >>> ast = parser.suite("a = b * c + d")
  >>> ast_list = ast.tolist()
  >>> sym_list = scipy.compiler.misc.translate_symbols(ast_list)
  >>> pprint.pprint(sym_list)
  ['file_input',
   ['stmt',
    ['simple_stmt',
     ['small_stmt',
      ['expr_stmt',
       ['testlist',
        ['test',
         ['and_test',
          ['not_test',
           ['comparison',
            ['expr',
             ['xor_expr',
              ['and_expr',
               ['shift_expr',
                ['arith_expr',
                 ['term',
                  ['factor', ['power', ['atom', ['NAME', 'a']]]]]]]]]]]]]]],
       ['EQUAL', '='],
       ['testlist',
        ['test',
         ['and_test',
          ['not_test',
           ['comparison',
            ['expr',
             ['xor_expr',
              ['and_expr',
               ['shift_expr',
                ['arith_expr',
                 ['term',
                  ['factor', ['power', ['atom', ['NAME', 'b']]]],
                  ['STAR', '*'],
                  ['factor', ['power', ['atom', ['NAME', 'c']]]]],
                 ['PLUS', '+'],
                 ['term',
                  ['factor', ['power', ['atom', ['NAME', 'd']]]]]]]]]]]]]]]]],
     ['NEWLINE', '']]],
   ['ENDMARKER', '']]
  </code></pre></blockquote>

Despite its looks, with some tools developed by Jeremy H. it's possible
to search these trees for specific patterns (sub-trees), extract a
sub-tree, manipulate it, converting Python specific code fragments
to blitz code fragments, and then re-insert it in the parse tree.  The parser
module documentation has some details on how to do this.  Traversing the
new blitzified tree, writing out the terminal symbols as you go, creates
our new blitz++ expression string.

<a name="blitz_blitz"></a>
<h3> Blitz and Numeric </h3>
The other nice discovery in the project is that the data structure used
for Numeric arrays and blitz arrays is nearly identical.  Numeric stores
"strides" as byte offsets and blitz stores them as element offsets, but
other than that, they are the same.  Further, most of the concepts and
capabilities of the two libraries are remarkably similar.  It is satisfying
that two completely different implementations solved the problem with
similar basic architectures.  It is also fortuitous.  The work involved in
converting Numeric expressions to blitz expressions was greatly diminished.
As an example, consider the code for slicing an array in Python with a
stride:

  <blockquote><pre><code>
  >>> b = arange(10)
  >>> c = zeros(2)
  >>> a = b[0:4:2] + c
  >>> a
  array([0, 2])
  </code></pre></blockquote>

In blitz it is as follows:

  <blockquote><pre><code>
  Array<int,1> b(10);
  Array<int,1> c(2);
  // ...
  Array<int,1> a = b(Range(0,3,2)) + c;
  </code></pre></blockquote>

Here the Range object works exactly like Python slice objects, with the
exception that the top index (3) is inclusive whereas Python's (4) is
exclusive.  Other differences include the type declarations in C++ and
parentheses instead of brackets for indexing arrays.  Currently,
<code>compiler.blitz</code> handles the inclusive/exclusive issue by
subtracting one from upper indices during the translation.  An alternative
that is likely more robust/maintainable in the long run is to write a PyRange
class that behaves like Python's range.  This is likely very easy.
<p>
The stock blitz also doesn't handle negative indices in ranges.  The current
implementation of the compiler has a partial solution to this problem.  It
calculates an index that starts with a '-' sign by subtracting it from the
maximum index in the array, so that:

  <blockquote><pre><code>
              upper index limit
                  /-----\
  b[:-1] -> b(Range(0,Nb[0]-1-1))
  </code></pre></blockquote>

This approach fails, however, when the top index is calculated from other
values.
In the following scenario, if <code>i+j</code> evaluates to a negative
value, the compiled code will produce incorrect results and could even core
dump.  Right now, all calculated indices are assumed to be positive.

  <blockquote><pre><code>
  b[:i+j] -> b(Range(0,i+j-1))
  </code></pre></blockquote>

A solution is to calculate all indices up front, using if/then to handle the
+/- cases.  This is a little work and results in more code, so it hasn't been
done.  I'm holding out to see if blitz++ can be modified to handle negative
indexing, but haven't looked into how much effort is involved yet.  While it
needs fixin', I don't think there is a ton of code where this is an issue.
<p>
The actual translation of the Python expressions to blitz expressions is
currently a two part process.  First, all x:y:z slicing expressions are
removed from the AST, converted to slice(x,y,z) and re-inserted into the tree.
Any math needed on these expressions (subtracting from the
maximum index, etc.) is also performed here.  _beg and _end are used as
special variables that are defined as blitz::fromBegin and blitz::toEnd.

  <blockquote><pre><code>
  a[i+j:i+j+1,:] = b[2:3,:]
  </code></pre></blockquote>

becomes the more verbose:

  <blockquote><pre><code>
  a[slice(i+j,i+j+1),slice(_beg,_end)] = b[slice(2,3),slice(_beg,_end)]
  </code></pre></blockquote>

The second part does a simple string search/replace to convert to a blitz
expression with the following translations:

  <blockquote><pre><code>
  slice(_beg,_end) -> _all  # not strictly needed, but cuts down on code.
  slice            -> blitz::Range
  [                -> (
  ]                -> )
  _stp             -> 1
  </code></pre></blockquote>

<code>_all</code> is defined in the compiled function as
<code>blitz::Range::all()</code>.  These translations could of course happen
directly in the syntax tree, but the string replacement is slightly easier.
Note that namespaces are maintained in the C++ code to lessen the likelihood
of name clashes.  Currently no effort is made to detect name clashes.  A good
rule of thumb is: don't use names that start with '_' or 'py_' in compiled
expressions and you'll be fine.

<a name="blitz_type_conversions"></a>
<h2>Type definitions and coercion</h2>

So far we've glossed over the dynamic vs. static typing issue between Python
and C++.  In Python, the type of value that a variable holds can change
through the course of program execution.  C/C++, on the other hand, forces you
to declare the type of value a variable will hold at compile time.
<code>compiler.blitz</code> handles this issue by examining the types of the
variables in the expression being executed, and compiling a function for those
explicit types.  For example:

  <blockquote><pre><code>
  a = ones((5,5),Float32)
  b = ones((5,5),Float32)
  compiler.blitz("a = a + b")
  </code></pre></blockquote>

When compiling this expression to C++, <code>compiler.blitz</code> sees that
the values for a and b in the local scope have type <code>Float32</code>, or
'float' on a 32 bit architecture.  As a result, it compiles the function using
the float type (no attempt has been made to deal with 64 bit issues).
It also goes one step further: if all arrays have the same type, a templated
version of the function is made and instantiated for float, double,
complex<float>, and complex<double> arrays.  <em> Note: This feature has been
removed from the current version of the code.
Each version will be compiled
separately. </em>
<p>
What happens if you call a compiled function with array types that are
different than the ones for which it was originally compiled?  No biggie,
you'll just have to wait while a new version is compiled for your new types.
This doesn't overwrite the old functions, as they are still accessible.  See
the catalog section in the inline() documentation to see how this is handled.
Suffice it to say, the mechanism is transparent to the user and behaves
like dynamic typing with the occasional wait for compiling newly typed
functions.
<p>
When working with combined scalar/array operations, the type of the array is
<em>always</em> used.  This is similar to the savespace flag that was recently
added to Numeric.  It prevents the following expression from unexpectedly
being calculated at a higher (more expensive) precision, as can occur in
Python:

  <blockquote><pre><code>
  >>> a = array((1,2,3),typecode = Float32)
  >>> b = a * 2.1 # results in b being a Float64 array.
  </code></pre></blockquote>

In this example,

  <blockquote><pre><code>
  >>> a = ones((5,5),Float32)
  >>> b = ones((5,5),Float32)
  >>> compiler.blitz("b = a * 2.1")
  </code></pre></blockquote>

the <code>2.1</code> is cast down to a <code>float</code> before carrying out
the operation.  If you really want to force the calculation to be a
<code>double</code>, define <code>a</code> and <code>b</code> as
<code>double</code> arrays.
<p>
One other point of note: currently, you must include both the right hand side
and left hand side (assignment side) of your equation in the compiled
expression.  Also, the array being assigned to must be created prior to
calling <code>compiler.blitz</code>.  I'm pretty sure this is easily changed
so that a compiled_eval expression can be defined, but no effort has been made
to allocate new arrays (and discern their type) on the fly.

<a name="blitz_catalog"></a>
<h2>Cataloging Compiled Functions</h2>

See the <a href="#The Catalog">Cataloging functions</a> section in the
<code>compiler.inline()</code> documentation.

<a name="blitz_array_sizes"></a>
<h2>Checking Array Sizes</h2>

Surprisingly, one of the big initial problems with compiled code was making
sure all the arrays in an operation were of compatible size.  The following
case is, of course, trivially easy:

  <blockquote><pre><code>
  a = b + c
  </code></pre></blockquote>

It only requires that arrays <code>a</code>, <code>b</code>, and
<code>c</code> have the same shape.  However, expressions like:

  <blockquote><pre><code>
  a[i+j:i+j+1,:] = b[2:3,:] + c
  </code></pre></blockquote>

are not so trivial.  Since slicing is involved, the sizes of the slices, not
of the input arrays, must be checked.  Broadcasting complicates things further
because arrays and slices with different dimensions and shapes may be
compatible for math operations (broadcasting isn't yet supported by
<code>compiler.blitz</code>).  Reductions have a similar effect, as their
results are different shapes than their input operands.  The binary operators
in Numeric compare the shapes of their two operands just before they operate
on them.  This is possible because Numeric treats each operation
independently.  The intermediate (temporary) arrays created during
sub-operations in an expression are tested for the correct shape before they
are combined by another operation.  Because <code>compiler.blitz</code> fuses
all operations into a single loop, this isn't possible.
The shape comparisons must be done, and
guaranteed compatible, before the expression is evaluated.
<p>
The solution chosen converts input arrays to "dummy arrays" that only
represent the dimensions of the arrays, not the data.  Binary operations on
dummy arrays check that input array sizes are compatible and return a dummy
array with the correct size.  Evaluating an expression of dummy arrays traces
the changing array sizes through all operations and fails if incompatible
array sizes are ever found.
<p>
The machinery for this is housed in <code>compiler.size_check</code>.  It
basically involves writing a new class (dummy array) and overloading its math
operators to calculate the new sizes correctly.  All the code is in Python and
there is a fair amount of logic (mainly to handle indexing and slicing), so
the operation does impose some overhead.  For large arrays (e.g. 50x50x50),
the overhead is negligible compared to evaluating the actual expression.  For
small arrays (e.g. 16x16), the overhead imposed for checking the shapes with
this method can cause <code>compiler.blitz</code> to be slower than evaluating
the expression in Python.
<p>
What can be done to reduce the overhead?  (1) The size checking code could be
moved into C.  This would likely remove most of the overhead penalty compared
to Numeric (although there is also some calling overhead), but no effort has
been made to do this.  (2) You can also call <code>compiler.blitz</code> with
<code>check_size=0</code> and the size checking isn't done.  However, if the
sizes aren't compatible, it can cause a core dump.  So, foregoing size
checking isn't advisable until your code is well debugged.

<a name="blitz_extension_module"></a>
<h2>Creating the Extension Module</h2>

<code>compiler.blitz</code> uses the same machinery as
<code>compiler.inline</code> to build the extension module.  The only
difference is that the code included in the function is automatically
generated from the Numeric array expression instead of supplied by the user.

<a name="Extension Modules"></a>
<h1>Extension Modules</h1>
<code>compiler.inline</code> and <code>compiler.blitz</code> are high level
tools that generate extension modules automatically.  Under the covers, they
use several classes from <code>compiler.ext_tools</code> to help generate the
extension module.  The two main classes are <code>ext_module</code> and
<code>ext_function</code> (I'd like to add <code>ext_class</code> and
<code>ext_method</code> as well).  These classes simplify the process of
generating extension modules by handling most of the "boilerplate" code
automatically.

<em>
Note: <code>inline</code> actually sub-classes
<code>compiler.ext_tools.ext_function</code> to generate slightly different
code than the standard <code>ext_function</code>.  The main difference is that
the standard class converts function arguments to C types, while inline always
takes two arguments, the local and global dicts, and grabs the variables that
need to be converted to C from them.
</em>

<a name="A Simple Example"></a>
<h2> A Simple Example </h2>
The following simple example demonstrates how to build an extension module
within a Python function:

  <blockquote><pre><code>
  # examples/increment_example.py
  from compiler import ext_tools

  def build_increment_ext():
      """ Build a simple extension with functions that increment numbers.
          The extension will be built in the local directory.
      """
      mod = ext_tools.ext_module('increment_ext')

      a = 1 # effectively a type declaration for 'a' in the
            # following functions.

      ext_code = "return_val = Py::new_reference_to(Py::Int(a+1));"
      func = ext_tools.ext_function('increment',ext_code,['a'])
      mod.add_function(func)

      ext_code = "return_val = Py::new_reference_to(Py::Int(a+2));"
      func = ext_tools.ext_function('increment_by_2',ext_code,['a'])
      mod.add_function(func)

      mod.compile()
  </code></pre></blockquote>

The function <code>build_increment_ext()</code> creates an extension module
named <code>increment_ext</code> and compiles it to a shared library (.so or
.pyd) that can be loaded into Python.  <code>increment_ext</code> contains two
functions, <code>increment</code> and <code>increment_by_2</code>.

The first line of <code>build_increment_ext()</code>,

  <blockquote><pre><code>
  mod = ext_tools.ext_module('increment_ext')
  </code></pre></blockquote>

creates an <code>ext_module</code> instance that is ready to have
<code>ext_function</code> instances added to it.  <code>ext_function</code>
instances are created with a calling convention similar to that of
<code>compiler.inline()</code>.  The most common call includes a C/C++ code
snippet and a list of the arguments for the function.  The following

  <blockquote><pre><code>
  ext_code = "return_val = Py::new_reference_to(Py::Int(a+1));"
  func = ext_tools.ext_function('increment',ext_code,['a'])
  </code></pre></blockquote>

creates a C/C++ extension function that is equivalent to the following Python
function:

  <blockquote><pre><code>
  def increment(a):
      return a + 1
  </code></pre></blockquote>

A second function is also added to the module, and then

  <blockquote><pre><code>
  mod.compile()
  </code></pre></blockquote>

is called to build the extension module.  By default, the module is created
in the current working directory.

This example is available in the <code>examples/increment_example.py</code>
file found in the <code>compiler</code> directory.  At the bottom of the file,
in the module's "main" program, an attempt to import <code>increment_ext</code>
without building it is made.  If this fails (the module doesn't exist on the
PYTHONPATH), the module is built by calling <code>build_increment_ext()</code>.
This approach only incurs the time consuming (a few seconds for this example)
process of building the module if it hasn't been built before.

  <blockquote><pre><code>
  if __name__ == "__main__":
      try:
          import increment_ext
      except ImportError:
          build_increment_ext()
          import increment_ext
      a = 1
      print 'a, a+1:', a, increment_ext.increment(a)
      print 'a, a+2:', a, increment_ext.increment_by_2(a)
  </code></pre></blockquote>

<em>
Note: If we were willing to always pay the penalty of building the C++ code
for a module, we could store the md5 checksum of the C++ code along with some
information about the compiler, platform, etc.  Then,
<code>ext_module.compile()</code> could try importing the module before it
actually compiles it, check the md5 checksum and other meta-data in the
imported module against the meta-data of the code it just produced, and only
compile the code if the module didn't exist or the meta-data didn't match.
This would reduce the
above code to:
</em>
  <blockquote><pre><code>
  if __name__ == "__main__":
      build_increment_ext()
      import increment_ext

      a = 1
      print 'a, a+1:', a, increment_ext.increment(a)
      print 'a, a+2:', a, increment_ext.increment_by_2(a)
  </code></pre></blockquote>
<em>
Note: There would always be the overhead of generating the C++ code, but it
would only actually compile the code once.  You pay a little in overhead and
get cleaner "import" code.  Needs some thought.
</em>
<p>

If you run <code>increment_example.py</code> from the command line, you get
the following:

  <blockquote><pre><code>
  [eric@n0]$ python increment_example.py
  a, a+1: 1 2
  a, a+2: 1 3
  </code></pre></blockquote>

If the module didn't exist before it was run, the module is created.  If it
did exist, it is just imported and used.

<a name="Fibonacci Example"></a>
<h2> Fibonacci Example </h2>
<code>examples/fibonacci.py</code> provides a slightly more complex example of
how to use <code>ext_tools</code>.  Fibonacci numbers are a series of numbers
where each number in the series is the sum of the previous two: 1, 1, 2, 3, 5,
8, etc.  Here, the first two numbers in the series are taken to be 1.  One
approach to calculating Fibonacci numbers uses recursive function calls.  In
Python, it might be written as:

  <blockquote><pre><code>
  def fib(a):
      if a <= 2:
          return 1
      else:
          return fib(a-2) + fib(a-1)
  </code></pre></blockquote>

In C, the same function would look something like this:

  <blockquote><pre><code>
  int fib(int a)
  {
      if(a <= 2)
          return 1;
      else
          return fib(a-2) + fib(a-1);
  }
  </code></pre></blockquote>

Recursion is much faster in C than in Python, so it would be beneficial
to use the C version for Fibonacci number calculations instead of the
Python version.  To do this, we need an extension function that calls the
C version.  This is possible by including the above code snippet as
"support code" and then calling it from the extension function.  Support
code snippets (usually structure definitions, helper functions, and the like)
are inserted into the extension module C/C++ file before the extension
function code.  Here is how to build the C version of the Fibonacci number
generator:

  <blockquote><pre><code>
  def build_fibonacci():
      """ Builds an extension module with fibonacci calculators.
      """
      mod = ext_tools.ext_module('fibonacci_ext')
      a = 1 # this is effectively a type declaration

      # recursive fibonacci in C
      fib_code = """
                 int fib1(int a)
                 {
                     if(a <= 2)
                         return 1;
                     else
                         return fib1(a-2) + fib1(a-1);
                 }
                 """
      ext_code = """
                 int val = fib1(a);
                 return_val = Py::new_reference_to(Py::Int(val));
                 """
      fib = ext_tools.ext_function('fib',ext_code,['a'])
      fib.customize.add_support_code(fib_code)
      mod.add_function(fib)

      mod.compile()
  </code></pre></blockquote>

XXX More about custom_info, and what xxx_info instances are good for.

<p>
<em>
Note: recursion is not the fastest way to calculate Fibonacci numbers, but
this approach serves nicely for this example.
</em>
<p>
<a name="Type Factories"></a>
<h1>Customizing Type Conversions -- Type Factories</h1>
not written

<h1>Things I wish compiler did</h1>
not written