Author:          Travis Oliphant
Discussions to:  scipy-dev@scipy.org
Created:         October 2005

The C-API of SciPy is (mostly) backward compatible with Numeric.

There are a few non-standard Numeric usages (that were not really part
of the API) that will need to be changed:

  * If you used any of the function pointers in the PyArray_Descr
    structure you will have to modify your usage of those.  First,
    the pointers are all under the member named f, so descr->cast is
    now descr->f->cast.  In addition, the casting functions have
    eliminated the strides argument (use PyArray_CastTo if you need
    strided casting).  All functions have one or two PyArrayObject *
    arguments at the end.  This allows flexible arrays and
    mis-behaved arrays to be handled.

  * The descr->zero and descr->one constants have been replaced with
    the function calls PyArray_Zero and PyArray_One (be sure to read
    the code and free the resulting memory if you use these calls).

  * If you passed array->dimensions and array->strides around to
    functions, you will need to fix some code.  These are now intp*
    pointers.  On 32-bit systems there won't be a problem.  However,
    on 64-bit systems, where sizeof(intp) != sizeof(int), you will
    need to make changes to avoid errors and segfaults.


The header files arrayobject.h and ufuncobject.h contain many defines
that you may find useful.  The files __ufunc_api.h and
__multiarray_api.h contain the available C-API function calls with
their function signatures.

All of these headers are installed to 

<YOUR_PYTHON_LOCATION>/site-packages/scipy/base/include


Getting arrays in C-code
=========================

All new arrays can be created using PyArray_NewFromDescr, which is a
very flexible function.  Simpler interfaces are also available:
PyArray_SimpleNew(nd, dims, typenum) is equivalent to PyArray_FromDims,
and PyArray_SimpleNewFromData(nd, dims, typenum, data) is equivalent
to PyArray_FromDimsAndData.

PyObject * PyArray_NewFromDescr(PyTypeObject *subtype, PyArray_Descr *descr,
                                int nd, intp *dims,
                                intp *strides, char *data,
                                int flags, PyObject *obj);


subtype  : The subtype that should be created (either pass in
             &PyArray_Type, &PyBigArray_Type, or obj->ob_type,
             where obj is an instance of a subtype (or subclass) of
             PyArray_Type or PyBigArray_Type).

descr    : The type descriptor for the array.  This is a Python object
             (this function steals a reference to it).  The easiest way
             to get one is using PyArray_DescrFromType(<typenum>).  If
             you want to use a flexible size array, then you need to use
             PyArray_DescrNewFromType(<flexible typenum>) and set its
             elsize member to the desired size.  The typenum in both of
             these cases is one of the PyArray_XXXX enumerated types.

nd       : The number of dimensions (<MAX_DIMS)

*dims    : A pointer to the size in each dimension.  Information will be
             copied from here.

*strides : The strides this array should have.  For new arrays created
             by this routine, this should be NULL.  If you pass in
             memory for this array to use, then you can pass in the
             strides information as well (otherwise it will be created for
             you and default to C-contiguous or Fortran-contiguous).
             Any strides will be copied into the array structure.
             Do not pass in bad strides information!

             PyArray_CheckStrides(...) can help, but you must call it
             yourself if you are unsure.  You cannot pass in strides
             information when data is NULL and this routine is creating
             its own memory.

*data    : NULL for creating brand-new memory.  If you want this array
             to wrap another memory area, then pass the pointer here.
             You are responsible for deleting the memory in that case,
             but do not do so until the new array object has been
             deleted.  The best way to handle that is to get the memory
             from another Python object, INCREF that Python object after
             passing its data pointer to this routine, and set the
             ->base member of the returned array to the Python object.
             *You are responsible for* setting PyArray_BASE(ret) to the
             base object.  Failure to do so will create a memory leak.

             If you pass in a data buffer, the flags argument will be
             the flags of the new array.  If you create a new array, a
             non-zero flags argument indicates that you want the array
             to be in FORTRAN order.

flags    : Either the flags showing how to interpret the data buffer
             passed in or, if a new array is created, nonzero to
             indicate a FORTRAN order array.  See below for an
             explanation of the flags.

obj      : If subtype is &PyArray_Type or &PyBigArray_Type, this
             argument is ignored.  Otherwise, the __array_finalize__
             method of the subtype is called (if present) and passed
             this object.  This is usually an array of the type to be
             created (so the __array_finalize__ method must handle an
             array argument), but it can be anything.

Note: The returned array object will be uninitialized unless the type is
PyArray_OBJECT, in which case the memory will be set to NULL.
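
As a rough illustration of the default layouts mentioned under *strides,
the following stand-alone sketch computes C-contiguous and
Fortran-contiguous strides from dims and an element size.  It does not
use the SciPy headers: demo_intp is a stand-in for intp (assumed here to
be a pointer-sized signed integer), and the demo_* functions are
hypothetical helpers, not part of the C-API.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for intp (assumption: a pointer-sized signed integer). */
typedef intptr_t demo_intp;

/* Fill strides[] (in bytes) for a C-contiguous layout:
   the last dimension varies fastest.  Hypothetical helper. */
static void demo_c_strides(int nd, const demo_intp *dims,
                           demo_intp itemsize, demo_intp *strides)
{
    assert(nd >= 0 && itemsize > 0);
    demo_intp step = itemsize;
    for (int i = nd - 1; i >= 0; i--) {
        strides[i] = step;
        step *= dims[i];
    }
}

/* Fill strides[] (in bytes) for a Fortran-contiguous layout:
   the first dimension varies fastest.  Hypothetical helper. */
static void demo_f_strides(int nd, const demo_intp *dims,
                           demo_intp itemsize, demo_intp *strides)
{
    assert(nd >= 0 && itemsize > 0);
    demo_intp step = itemsize;
    for (int i = 0; i < nd; i++) {
        strides[i] = step;
        step *= dims[i];
    }
}
```

For dims = {2, 3, 4} and 8-byte elements this gives strides {96, 32, 8}
in C order and {8, 16, 48} in Fortran order.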

PyArray_SimpleNew(nd, dims, typenum) is a drop-in replacement for
PyArray_FromDims, except that it takes intp* dims instead of int* dims
(which matters on 64-bit systems) and it does not initialize the
memory to zero.

PyArray_SimpleNew is just a macro for PyArray_New with default arguments.
Use PyArray_FILLWBYTE(arr, 0)  to fill with zeros.
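
The effect of filling an uninitialized buffer with a byte value can be
sketched without the SciPy headers.  The sketch below assumes that the
fill behaves like a memset over the whole (contiguous) data buffer;
demo_intp, demo_nbytes and demo_fill_bytes are hypothetical names used
only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for intp (assumption: a pointer-sized signed integer). */
typedef intptr_t demo_intp;

/* Total size in bytes of a contiguous buffer with the given shape. */
static demo_intp demo_nbytes(int nd, const demo_intp *dims,
                             demo_intp itemsize)
{
    demo_intp n = itemsize;
    for (int i = 0; i < nd; i++) {
        n *= dims[i];
    }
    return n;
}

/* Set every byte of the buffer to the same value (0 gives zeros). */
static void demo_fill_bytes(void *data, int nd, const demo_intp *dims,
                            demo_intp itemsize, int value)
{
    assert(data != NULL);
    memset(data, value, (size_t)demo_nbytes(nd, dims, itemsize));
}
```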

The PyArray_FromDims and family of functions are still available and
are loose wrappers around this function.  These functions still take
int * arguments.  This should be fine on 32-bit systems, but on 64-bit
systems you may run into trouble if you frequently passed 
PyArray_FromDims the dimensions member of the old PyArrayObject structure
because sizeof(intp) != sizeof(int).
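
One way to bridge the two integer widths is to convert dimension arrays
element by element rather than casting the pointer.  This is a minimal
stand-alone sketch, not SciPy code: demo_intp stands in for intp
(assumed pointer-sized), and both helper functions are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Stand-in for intp (assumption: a pointer-sized signed integer). */
typedef intptr_t demo_intp;

/* Widen legacy int dimensions for an interface that expects intp*.
   Casting the pointer instead would read the wrong bytes wherever
   sizeof(demo_intp) != sizeof(int). */
static void demo_widen_dims(int nd, const int *in, demo_intp *out)
{
    assert(nd >= 0);
    for (int i = 0; i < nd; i++) {
        out[i] = (demo_intp)in[i];
    }
}

/* Narrow intp dimensions for a legacy interface that expects int*.
   Returns 0 on success, -1 if a value does not fit in an int. */
static int demo_narrow_dims(int nd, const demo_intp *in, int *out)
{
    assert(nd >= 0);
    for (int i = 0; i < nd; i++) {
        if (in[i] > INT_MAX || in[i] < INT_MIN) {
            return -1;
        }
        out[i] = (int)in[i];
    }
    return 0;
}
```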


Getting an arrayobject from an arbitrary Python object
==============================================================

PyArray_FromAny(...)

This function replaces PyArray_ContiguousFromObject and friends (those
function calls still remain but they are loose wrappers around the
PyArray_FromAny call).

static PyObject *
PyArray_FromAny(PyObject *op, PyArray_Descr *dtype, int min_depth,
                int max_depth, int requires)


op        : The Python object to "convert" to an array object

dtype     : The desired data-type descriptor.  This can be NULL if the
              descriptor should be determined by the object.  Unless
              FORCECAST is present in requires, this call will generate
              an error if the data type cannot be safely obtained from
              the object.

min_depth : The minimum depth of array needed, or 0 if it doesn't matter

max_depth : The maximum depth of array allowed, or 0 if it doesn't matter

requires  : A flag indicating the "requirements" of the returned array. 


The requires flag is explained in the code comments as follows.

requires can be any of 

  CONTIGUOUS, 
  FORTRAN, 
  ALIGNED, 
  WRITEABLE, 
  ENSURECOPY, 
  ENSUREARRAY,
  UPDATEIFCOPY,
  FORCECAST,

   or'd (|) together

   Any of these flags present means that the returned array should 
   guarantee that aspect of the array.  Otherwise the returned array
   won't guarantee it -- it will depend on the object as to whether or 
   not it has such features. 

   Note that ENSURECOPY is enough to guarantee CONTIGUOUS, ALIGNED,
   and WRITEABLE and therefore it is redundant to include those as well. 

   BEHAVED_FLAGS    = ALIGNED | WRITEABLE
   BEHAVED_FLAGS_RO = ALIGNED
   CARRAY_FLAGS     = CONTIGUOUS | BEHAVED_FLAGS
   FARRAY_FLAGS     = FORTRAN | BEHAVED_FLAGS
   
   By default, if the object is an array (or any subclass) and requires is 0, 
   the array will just be INCREF'd and returned. 

   ENSUREARRAY makes sure a base-class ndarray is returned (if the object is a
   bigndarray it will also be returned).

   The UPDATEIFCOPY flag is set in the returned array *if a copy is
   made*.  The base member of the returned array then points to the
   misbehaved array (which is set to READONLY in that case).
   When the new array is deallocated, the original array held in base
   is updated with the contents of the new array.  This is useful
   if you don't want to deal with a possibly mis-behaved array, but want
   to update it easily using a local contiguous copy.

   FORCECAST will cause a cast to occur regardless of whether or not
   it is safe. 
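
The way the requirement flags combine can be sketched with plain bit
masks.  The bit values below are placeholders chosen for this sketch
only (the real constants are defined in arrayobject.h and may differ),
and demo_simplify_requires is a hypothetical helper that merely
restates the ENSURECOPY note above.

```c
#include <assert.h>

/* Placeholder bit positions, NOT the values from arrayobject.h. */
enum {
    DEMO_CONTIGUOUS   = 1 << 0,
    DEMO_FORTRAN      = 1 << 1,
    DEMO_ALIGNED      = 1 << 2,
    DEMO_WRITEABLE    = 1 << 3,
    DEMO_ENSURECOPY   = 1 << 4,
    DEMO_ENSUREARRAY  = 1 << 5,
    DEMO_UPDATEIFCOPY = 1 << 6,
    DEMO_FORCECAST    = 1 << 7,

    /* Combinations, following the definitions listed above. */
    DEMO_BEHAVED_FLAGS    = DEMO_ALIGNED | DEMO_WRITEABLE,
    DEMO_BEHAVED_FLAGS_RO = DEMO_ALIGNED,
    DEMO_CARRAY_FLAGS     = DEMO_CONTIGUOUS | DEMO_BEHAVED_FLAGS,
    DEMO_FARRAY_FLAGS     = DEMO_FORTRAN | DEMO_BEHAVED_FLAGS
};

/* Drop the bits that ENSURECOPY already guarantees (CONTIGUOUS,
   ALIGNED, WRITEABLE), since requesting them as well is redundant. */
static int demo_simplify_requires(int req)
{
    assert(req >= 0);
    if (req & DEMO_ENSURECOPY) {
        req &= ~(DEMO_CONTIGUOUS | DEMO_ALIGNED | DEMO_WRITEABLE);
    }
    return req;
}
```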


PyArray_ContiguousFromAny(op, typenum, min_depth, max_depth) is equivalent
to PyArray_ContiguousFromObject(...) (which is still available), except
it will return the subclass if op is already a subclass of the ndarray.  
The ContiguousFromObject version will always return an ndarray (or a bigndarray). 

Passing Data Type information to C-code
============================================

All Data-types are handled using the PyArray_Descr * structure.
This structure can be obtained from a Python object using 
PyArray_DescrConverter and PyArray_DescrConverter2.  The former 
returns the default PyArray_LONG descriptor when the input object
is None, while the latter returns NULL when the input object is None. 

See the arraymethods.c and multiarraymodule.c files for many examples of usage.

Getting at the structure of the array
=====================================

You should use the #defines provided to access array structure portions:

PyArray_DATA(obj)
PyArray_ITEMSIZE(obj)
PyArray_NDIM(obj)
PyArray_DIMS(obj)
PyArray_DIM(obj, n)
PyArray_STRIDES(obj)
PyArray_STRIDE(obj,n)
PyArray_DESCR(obj)
PyArray_BASE(obj)


See more in arrayobject.h.


NDArray Flags
==========================

The flags attribute of the PyArrayObject structure contains important
information about the memory used by the array (pointed to by the data
member).  This flags information must be kept accurate or strange
results and even segfaults may result.

There are 7 (binary) flags that describe the memory area used by the
data buffer.  These constants are defined in arrayobject.h and
determine the bit-position of the flag.  Python exposes a nice dictionary
interface for getting (and, if appropriate, setting) these flags.

Memory areas of all kinds can be pointed to by an ndarray, necessitating 
these flags.  If you get an arbitrary PyArrayObject in C-code, 
you need to be aware of the flags that are set.  
If you need to guarantee a certain kind of array 
(like CONTIGUOUS and BEHAVED), then pass these requirements into the 
PyArray_FromAny function.  


CONTIGUOUS  :  True if the array is (C-style) contiguous in memory.
FORTRAN     :  True if the array is (Fortran-style) contiguous in memory.

Notice that 1-d arrays are always both FORTRAN contiguous and C contiguous.
Both of these flags can be checked, and they are convenience flags only:
whether or not an array is CONTIGUOUS or FORTRAN can be determined from
the strides, dimensions, and itemsize variables.
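
A simplified version of such a check is sketched below in stand-alone C.
It compares each stride against the stride a contiguous layout would
have, and deliberately ignores special cases such as zero-length or
length-1 dimensions.  demo_intp (a stand-in for intp) and both function
names are hypothetical, not part of the C-API.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for intp (assumption: a pointer-sized signed integer). */
typedef intptr_t demo_intp;

/* Returns 1 if the strides match a C-contiguous layout
   (last dimension varies fastest), 0 otherwise. */
static int demo_is_c_contiguous(int nd, const demo_intp *dims,
                                const demo_intp *strides,
                                demo_intp itemsize)
{
    assert(nd >= 0 && itemsize > 0);
    demo_intp expected = itemsize;
    for (int i = nd - 1; i >= 0; i--) {
        if (strides[i] != expected) {
            return 0;
        }
        expected *= dims[i];
    }
    return 1;
}

/* Returns 1 if the strides match a Fortran-contiguous layout
   (first dimension varies fastest), 0 otherwise. */
static int demo_is_f_contiguous(int nd, const demo_intp *dims,
                                const demo_intp *strides,
                                demo_intp itemsize)
{
    assert(nd >= 0 && itemsize > 0);
    demo_intp expected = itemsize;
    for (int i = 0; i < nd; i++) {
        if (strides[i] != expected) {
            return 0;
        }
        expected *= dims[i];
    }
    return 1;
}
```

With a single dimension both loops perform the same comparison, which
is why a contiguous 1-d array passes both checks.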

OWNDATA     :  True if the array owns the memory (it will try to free it
                using PyDataMem_FREE() on deallocation,
                so it had better really own it).

These three flags facilitate using a data pointer that is a memory-mapped
array, or part of some larger record array.  But, they may have other uses...

ALIGNED     :  True if the data buffer is aligned for the type.  This
	        can be checked.
WRITEABLE   :  True only if the data buffer can be "written" to.
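
One plausible way to check alignment is sketched below.  It assumes
that "aligned for the type" means the data pointer and every stride are
multiples of the type's alignment, which is an assumption of this
sketch and not necessarily the exact test used by the library.
demo_intp and demo_is_aligned are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for intp (assumption: a pointer-sized signed integer). */
typedef intptr_t demo_intp;

/* Returns 1 if the buffer start and all strides are multiples of
   alignment (in bytes), 0 otherwise. */
static int demo_is_aligned(const void *data, int nd,
                           const demo_intp *strides, demo_intp alignment)
{
    assert(alignment > 0);
    if ((uintptr_t)data % (uintptr_t)alignment != 0) {
        return 0;
    }
    for (int i = 0; i < nd; i++) {
        if (strides[i] % alignment != 0) {
            return 0;
        }
    }
    return 1;
}
```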


UPDATEIFCOPY :  This is a special flag that is set if this array represents
                  a copy made because a user required certain FLAGS in
                  PyArray_FromAny and a copy had to be made of some
                  other array (and the user asked for this flag to be set in
                  such a situation).  The base attribute then points to the
                  "misbehaved" array (which is set READONLY).
                  When the array with this flag set is deallocated,
                  it will copy its contents back to the "misbehaved" array
                  (casting if necessary) and will reset the "misbehaved"
                  array to WRITEABLE.  If the "misbehaved" array
                  was not WRITEABLE to begin with, then PyArray_FromAny would
                  have returned an error because UPDATEIFCOPY would not
                  have been possible.


PyArray_UpdateFlags(obj, FLAGS) will update the obj->flags for FLAGS,
                  which can be any of CONTIGUOUS, FORTRAN, ALIGNED, or WRITEABLE.

Some useful combinations of these flags:

BEHAVED = ALIGNED | WRITEABLE
BEHAVED_RO = ALIGNED 
CARRAY_FLAGS = CONTIGUOUS | BEHAVED
FARRAY_FLAGS = FORTRAN | BEHAVED

The macro PyArray_CHECKFLAGS(obj, FLAGS) can test any combination of flags.
There are several default combinations defined as macros already
(see arrayobject.h).

In particular, there are ISBEHAVED, ISBEHAVED_RO, ISCARRAY and ISFARRAY macros
that also check to make sure the array is in native byte order (as determined
by the data-type descriptor).
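
The kind of test a macro like PyArray_CHECKFLAGS performs can be
sketched with plain integers, assuming it succeeds only when every
requested bit is set.  The EX_* bit values are placeholders for this
sketch (the real constants are in arrayobject.h), and ex_check_flags is
a hypothetical function, not the macro itself.

```c
#include <assert.h>

/* Placeholder bit positions, NOT the values from arrayobject.h. */
enum {
    EX_CONTIGUOUS = 1 << 0,
    EX_FORTRAN    = 1 << 1,
    EX_ALIGNED    = 1 << 2,
    EX_WRITEABLE  = 1 << 3,

    EX_BEHAVED      = EX_ALIGNED | EX_WRITEABLE,
    EX_CARRAY_FLAGS = EX_CONTIGUOUS | EX_BEHAVED,
    EX_FARRAY_FLAGS = EX_FORTRAN | EX_BEHAVED
};

/* Returns 1 only when every bit in mask is also set in flags. */
static int ex_check_flags(int flags, int mask)
{
    assert(flags >= 0 && mask >= 0);
    return (flags & mask) == mask;
}
```

Testing against a combined mask such as EX_CARRAY_FLAGS therefore fails
as soon as any one of its component bits is missing.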

There are more C-API enhancements, which you can discover in the code
or in the book (http://www.trelgol.com).