author: rgommers <ralf.gommers@googlemail.com> 2010-11-11 21:29:56 +0800
committer: rgommers <ralf.gommers@googlemail.com> 2010-11-11 21:29:56 +0800
commit: 4a7de57448091ef02e50edf9d1e302c20a26ff0c
tree: 357de983a7ba4ff56e7800781f4619d70111a89f /doc/ufuncs.rst.txt
parent: a07ac0f4703cbac0e8747ac0e9e08f41cba9b896
DOC: rename ReST files under doc/ from *.txt to *.rst.txt, so they render on github.
Diffstat (limited to 'doc/ufuncs.rst.txt')
-rw-r--r-- doc/ufuncs.rst.txt 103
1 files changed, 103 insertions, 0 deletions
diff --git a/doc/ufuncs.rst.txt b/doc/ufuncs.rst.txt
new file mode 100644
index 000000000..fa107cc21
--- /dev/null
+++ b/doc/ufuncs.rst.txt
@@ -0,0 +1,103 @@
+BUFFERED General Ufunc explanation
+==================================
+
+.. note::
+
+ This was implemented already, but the notes are kept here for historical
+ and explanatory purposes.
+
+We need to optimize the section of ufunc code that handles mixed-type
+and misbehaved arrays. In particular, we need to fix it so that items
+are not copied into the buffer if they don't have to be.
+
+Right now, all data is copied into the buffers (even scalars are copied
+multiple times into the buffers even if they are not going to be cast).
+
+Some benchmarks show that this results in a significant slow-down
+(factor of 4) over similar numarray code.
+
+The approach is therefore to loop over the largest dimension (just like
+the NO_BUFFER portion of the code). All arrays will have either N or
+1 in this last dimension (or there would be a mismatch error). The
+buffer size is B.
+
+If N <= B (and only if needed), we copy the entire last-dimension into
+the buffer as fast as possible using the single-stride information.
+
+We also copy into output arrays only if needed (otherwise the
+output arrays are used directly in the ufunc code).
+
+Call the function using the appropriate strides information from all the input
+arrays. Only set the strides to the element-size for arrays that will be copied.
+
+If N > B, then we have to do the above operation in a loop (with an extra loop
+at the end with a different buffer size).
+
+Both of these cases are handled with the following code::
+
+ Compute N = quotient * B + remainder.
+ quotient = N / B # integer math
+ (store quotient + 1) as the number of innerloops
+ remainder = N % B # integer remainder
+
+On the inner dimension we will have (quotient + 1) loops, where
+the size of the inner function is B for all but the last, when the niter size
+is remainder.
+
+So, the code looks very similar to NOBUFFER_LOOP except the inner loop is
+replaced with::
+
+ for(k=0; k<quotient+1; k++) {
+ if (k==quotient) make itersize remainder size
+ copy only needed items to buffer.
+ swap input buffers if needed
+ cast input buffers if needed
+ call function()
+ cast outputs in buffers if needed
+ swap outputs in buffers if needed
+ copy only needed items back to output arrays.
+ update all data-pointers by strides*niter
+ }
+
+
+Reference counting for OBJECT arrays:
+
+If there are object arrays involved then loop->obj gets set to 1. Then there are two cases:
+
+1) The loop function is an object loop:
+
+ Inputs:
+ - castbuf starts as NULL and then gets filled with new references.
+ - function gets called and doesn't alter the reference count in castbuf
+ - on the next iteration (next value of k), the casting function will
+ DECREF what is present in castbuf already and place a new object.
+
+ - At the end of the inner loop (for loop over k), the final new-references
+ in castbuf must be DECREF'd. If it's a scalar then a single DECREF suffices.
+ Otherwise, "bufsize" DECREF's are needed (unless there was only one
+ loop, then "remainder" DECREF's are needed).
+
+ Outputs:
+ - castbuf contains a new reference as the result of the function call. This
+ gets converted to the type of interest. This new reference in castbuf
+ will be DECREF'd by later calls to the function. Thus, only after the
+ innermost loop do we need to DECREF the remaining references in castbuf.
+
+2) The loop function is of a different type:
+
+ Inputs:
+
+ - The PyObject input is copied over to the buffer, which receives a "borrowed"
+ reference. This reference is then used but not altered by the cast
+ call. Nothing needs to be done.
+
+ Outputs:
+
+ - The buffer[i] memory receives the PyObject input after the cast. This is
+ a new reference which will be "stolen" as it is copied over into memory.
+ The only problem is that what is presently in memory must be DECREF'd first.
+
+
+
+
+
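The chunking scheme described in the added file (N = quotient * B + remainder, with quotient + 1 inner passes) can be sketched in Python. This is only an illustration of the arithmetic, not the C implementation in NumPy: the names `chunk_sizes` and `buffered_apply` are hypothetical, a plain list slice stands in for the copy/swap/cast steps, and skipping an empty final pass when B divides N evenly is an assumption the notes do not spell out.

```python
def chunk_sizes(n, bufsize):
    """Yield the inner-loop sizes for n items pushed through a buffer of size bufsize."""
    quotient, remainder = divmod(n, bufsize)  # n == quotient * bufsize + remainder
    for k in range(quotient + 1):
        # Every pass handles bufsize items except the last, which handles the remainder.
        niter = remainder if k == quotient else bufsize
        if niter:  # assumption: skip an empty final pass when bufsize divides n
            yield niter


def buffered_apply(func, data, bufsize):
    """Apply func to data one buffer-sized chunk at a time."""
    out = []
    start = 0
    for niter in chunk_sizes(len(data), bufsize):
        buffer = data[start:start + niter]  # stand-in for copying items into the buffer
        out.extend(func(x) for x in buffer)  # stand-in for the inner loop function call
        start += niter  # advance the data position by niter items
    return out


sizes = list(chunk_sizes(10, 4))
doubled = buffered_apply(lambda x: x * 2, list(range(10)), 4)
```

With N = 10 and B = 4 the passes have sizes 4, 4 and 2, which matches the "B for all but the last" description; when N <= B there is a single pass of size N.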