Diffstat (limited to 'doc/source')
23 files changed, 777 insertions, 920 deletions
diff --git a/doc/source/dev/development_environment.rst b/doc/source/dev/development_environment.rst new file mode 100644 index 000000000..6221353ce --- /dev/null +++ b/doc/source/dev/development_environment.rst @@ -0,0 +1,211 @@ +.. _development-environment: + +Setting up and using your development environment +================================================= + + +Recommended development setup +----------------------------- + +Since NumPy contains parts written in C and Cython that need to be +compiled before use, make sure you have the necessary compilers and Python +development headers installed - see :ref:`building-from-source`. + +Having compiled code also means that importing NumPy from the development +sources needs some additional steps, which are explained below. For the rest +of this chapter we assume that you have set up your git repo as described in +:ref:`using-git`. + +To build the development version of NumPy, run tests, and spawn +interactive shells with the Python import paths properly set up, +do one of:: + + $ python runtests.py -v + $ python runtests.py -v -s random + $ python runtests.py -v -t numpy/core/tests/test_iter.py:test_iter_c_order + $ python runtests.py --ipython + $ python runtests.py --python somescript.py + $ python runtests.py --bench + $ python runtests.py -g -m full + +This builds NumPy first, so the first time it may take a few minutes. If +you specify ``-n``, the tests are run against the version of NumPy (if +any) found on the current PYTHONPATH. + +Using ``runtests.py`` is the recommended approach to running tests. +There are also a number of alternatives to it, for example in-place +build or installing to a virtualenv. See the FAQ below for details. + + +Building in-place +----------------- + +For development, you can set up an in-place build so that changes made to +``.py`` files take effect without rebuilding. First, run:: + + $ python setup.py build_ext -i + +This allows you to import the in-place built NumPy *from the repo base +directory only*. If you want the in-place build to be visible outside that +base dir, you need to point your ``PYTHONPATH`` environment variable to this +directory. Some IDEs (Spyder for example) have utilities to manage +``PYTHONPATH``. On Linux and OSX, you can run the command:: + + $ export PYTHONPATH=$PWD + +and on Windows:: + + $ set PYTHONPATH=/path/to/numpy + +Now editing a Python source file in NumPy allows you to immediately +test and use your changes (in ``.py`` files) by simply restarting the +interpreter. + +Note that another way to do an in-place build visible outside the repo base dir +is with ``python setup.py develop``. This doesn't work for NumPy, because +NumPy builds don't use ``setuptools`` by default. ``python setupegg.py +develop`` will work though. + + +Other build options +------------------- + +It's possible to do a parallel build with ``numpy.distutils`` with the ``-j`` option; see :ref:`parallel-builds` for more details. + +In order to install the development version of NumPy in ``site-packages``, use ``python setup.py install --user``. + +A similar approach to in-place builds and use of ``PYTHONPATH`` but outside the source tree is to use:: + + $ python setup.py install --prefix /some/owned/folder + $ export PYTHONPATH=/some/owned/folder/lib/python3.4/site-packages
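After any of these setups it is worth double-checking which NumPy the
interpreter actually picks up. A minimal sanity check (illustrative only; the
reported path should point into your source tree or chosen prefix, not into
the system ``site-packages``)::

    >>> import numpy as np
    >>> print(np.__version__)   # development builds report a ``.dev`` version
    >>> print(np.__file__)      # shows where the import actually came from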
Besides ``numpy.distutils``, NumPy supports building with `Bento`_. This provides (among other things) faster builds and a build log that's much more readable than the ``distutils`` one. Note that support is still fairly +experimental, partly due to Bento relying on `Waf`_ which tends to have non-backwards-compatible API changes. Working versions of Bento and Waf are run on TravisCI, see ``tools/travis-test.sh``. + + Using virtualenvs ----------------- A frequently asked question is "How do I set up a development version of NumPy in parallel to a released version that I use to do my job/research?". One simple way to achieve this is to install the released version in site-packages, by using a binary installer or pip for example, and set up the development version in a virtualenv. First install `virtualenv`_ (optionally use `virtualenvwrapper`_), then create your virtualenv (named numpy-dev here) with:: $ virtualenv numpy-dev Now, whenever you want to switch to the virtual environment, you can use the command ``source numpy-dev/bin/activate``, and ``deactivate`` to exit from the virtual environment and return to your previous shell. Running tests ------------- Besides using ``runtests.py``, there are various ways to run the tests. Inside the interpreter, tests can be run like this:: >>> np.test() >>> np.test('full') # Also run tests marked as slow >>> np.test('full', verbose=2) # Additionally print test name/file Or similarly from the command line:: $ python -c "import numpy as np; np.test()" Tests can also be run with ``nosetests numpy``; however, the NumPy-specific ``nose`` plugin is then not found, which causes tests marked as ``KnownFailure`` to be reported as errors. Running individual test files can be useful; it's much faster than running the whole test suite or that of a whole module (example: ``np.random.test()``). This can be done with:: $ python path_to_testfile/test_file.py That also takes extra arguments, like ``--pdb`` which drops you into the Python debugger when a test fails or an exception is raised. Running tests with `tox`_ is also supported. For example, to build NumPy and run the test suite with Python 3.4, use:: $ tox -e py34 For more extensive info on running and writing tests, see https://github.com/numpy/numpy/blob/master/doc/TESTS.rst.txt . Rebuilding & cleaning the workspace ----------------------------------- Rebuilding NumPy after making changes to compiled code can be done with the same build command as you used previously - only the changed files will be re-built. Doing a full build, which sometimes is necessary, requires cleaning the workspace first. The standard way of doing this is (*note: deletes any uncommitted files!*):: $ git clean -xdf When you want to discard all changes and go back to the last commit in the repo, use one of:: $ git checkout . $ git reset --hard Debugging --------- Another frequently asked question is "How do I debug C code inside NumPy?". The easiest way to do this is to first write a Python script that invokes the C code whose execution you want to debug. For instance ``mytest.py``:: import numpy as np x = np.arange(5) np.empty_like(x) Now, you can run:: $ gdb --args python runtests.py -g --python mytest.py And then in the debugger:: (gdb) break array_empty_like (gdb) run The execution will now stop at the corresponding C function and you can step through it as usual. With the Python extensions for gdb installed (often the default on Linux), a number of useful Python-specific commands are available.
+For example, to see where in the Python code you are, use ``py-list``. For more details, see `DebuggingWithGdb`_. Instead of plain ``gdb`` you can of course use your favourite alternative debugger; run it on the python binary with arguments ``runtests.py -g --python mytest.py``. Building NumPy with a Python built with debug support (on Linux distributions typically packaged as ``python-dbg``) is highly recommended. .. _Bento: http://cournape.github.io/Bento/ .. _DebuggingWithGdb: https://wiki.python.org/moin/DebuggingWithGdb .. _tox: http://tox.testrun.org .. _virtualenv: http://www.virtualenv.org/ .. _virtualenvwrapper: http://www.doughellmann.com/projects/virtualenvwrapper/ .. _Waf: https://code.google.com/p/waf/ diff --git a/doc/source/dev/gitwash/branch_list.png b/doc/source/dev/gitwash/branch_list.png Binary files differ; deleted file mode 100644 index 1196eb754..000000000 --- a/doc/source/dev/gitwash/branch_list.png +++ /dev/null diff --git a/doc/source/dev/gitwash/branch_list_compare.png b/doc/source/dev/gitwash/branch_list_compare.png Binary files differ; deleted file mode 100644 index 336afa374..000000000 --- a/doc/source/dev/gitwash/branch_list_compare.png +++ /dev/null diff --git a/doc/source/dev/gitwash/development_workflow.rst b/doc/source/dev/gitwash/development_workflow.rst index c67a5e457..6458059cb 100644 --- a/doc/source/dev/gitwash/development_workflow.rst +++ b/doc/source/dev/gitwash/development_workflow.rst @@ -16,93 +16,63 @@ Basic workflow In short: -1. Update your ``master`` branch if it's not up to date. - Then start a new *feature branch* for each set of edits that you do. +1. Start a new *feature branch* for each set of edits that you do. See :ref:`below <making-a-new-feature-branch>`. - Avoid putting new commits in your ``master`` branch. - 2. Hack away! See :ref:`below <editing-workflow>` -3. Avoid merging other branches into your feature branch while you are - working. - - You can optionally rebase if really needed, - see :ref:`below <rebasing-on-master>`. - -4. When finished: +3. When finished: - *Contributors*: push your feature branch to your own Github repo, and - :ref:`ask for code review or make a pull request <asking-for-merging>`. - - - *Core developers* (if you want to push changes without - further review):: - - # First, either (i) rebase on upstream -- if you have only few commits git fetch upstream git rebase upstream/master - - # or, (ii) merge to upstream -- if you have many related commits git fetch upstream git merge --no-ff upstream/master - - # Recheck that what is there is sensible git log --oneline --graph git log -p upstream/master.. - - # Finally, push branch to upstream master git push upstream my-new-feature:master - - See :ref:`below <pushing-to-main>`. -.. note:: It's usually a good idea to use the ``-n`` flag to ``git push`` - to check first that you're about to push the changes you want to - the place you want. + :ref:`create a pull request <asking-for-merging>`. + - *Core developers*: If you want to push changes without + further review, see the notes :ref:`below <pushing-to-main>`. + This way of working helps to keep work well organized and the history as clear as possible. .. note:: - - Do not use ``git pull`` --- this avoids common mistakes if you are - new to Git. Instead, always do ``git fetch`` followed by ``git - rebase``, ``git merge --ff-only`` or ``git merge --no-ff``, - depending on what you intend. -
.. seealso:: - See discussions on `linux git workflow`_, - and `ipython git workflow <http://mail.scipy.org/pipermail/ipython-dev/2010-October/006746.html>`__. + There are many online tutorials to help you `learn git`_. For discussions + of specific git workflows, see `linux git workflow`_ + and `ipython git workflow`_. .. _making-a-new-feature-branch: Making a new feature branch =========================== -To update your master branch, use:: +First, update your master branch with changes that have been made in the main +NumPy repository. In this case, the ``--ff-only`` flag ensures that a new +commit is not created when you merge the upstream and master branches. It is +very important to avoid adding new commits to ``master``. + +:: + # go to the master branch + git checkout master + # download changes from github git fetch upstream + # update the master branch git merge upstream/master --ff-only + # Push new commits to your Github repo + git push -To create a new branch and check it out, use:: - - git checkout -b my-new-feature upstream/master - Generally, you will want to keep this branch also on your public github_ fork of NumPy_. To do this, you `git push`_ this new branch up to your github_ repo. Generally (if you followed the instructions in these pages, and by default), git will have a link to your github_ repo, called ``origin``. You push up to your own repo on github_ with:: .. note:: - git push origin my-new-feature + You could also use ``pull``, which combines ``fetch`` and ``merge``, as + follows:: + + git pull --ff-only upstream master -In git >= 1.7 you can ensure that the link is correctly set by using the -``--set-upstream`` option:: + However, never use ``git pull`` without explicitly indicating the source + branch (as above); the inherent ambiguity can cause problems and is a + common mistake if you are new to Git. - git push --set-upstream origin my-new-feature +Finally, create a new branch for your work and check it out:: -From now on git_ will know that ``my-new-feature`` is related to the -``my-new-feature`` branch in your own github_ repo. + git checkout -b my-new-feature master .. _editing-workflow: @@ -116,18 +86,21 @@ Overview :: # hack hack - git add my_new_file - git commit -am 'ENH: some message' - + git status # Optional + git diff # Optional + git add modified_file + git commit # push the branch to your own Github repo git push In more detail -------------- -#. Make some changes -#. See which files have changed with ``git status`` (see `git status`_). - You'll see a listing like this one:: +#. Make some changes. When you feel that you've made a complete, working set + of related changes, move on to the next steps. + +#. Optional: Check which files have changed with ``git status`` (see `git + status`_). You'll see a listing like this one:: # On branch my-new-feature # Changed but not updated: @@ -142,21 +115,60 @@ In more detail # INSTALL no changes added to commit (use "git add" and/or "git commit -a") -#. Check what the actual changes are with ``git diff`` (`git diff`_). -#. Add any new files to version control ``git add new_file_name`` (see - `git add`_). -#. To commit all modified files into the local copy of your repo,, do - ``git commit -am 'A commit message'``. Note the ``-am`` options to - ``commit``. The ``m`` flag just signals that you're going to type a - message on the command line. If you leave it out, an editor will open in - which you can compose your commit message.
For non-trivial commits this is - often the better choice. The ``a`` flag - you can just take on faith - or - see `why the -a flag?`_ - and the helpful use-case description in the - `tangled working copy problem`_. The section on - :ref:`commit messages <writing-the-commit-message>` below might also be useful. -#. To push the changes up to your forked repo on github_, do a ``git - push`` (see `git push`). - +#. Optional: Compare the changes with the previous version using ``git + diff`` (`git diff`_). This brings up a simple text browser interface that + highlights the difference between your files and the previous version. + +#. Add any relevant modified or new files using ``git add modified_file`` + (see `git add`_). This puts the files into a staging area, which is a queue + of files that will be added to your next commit. Only add files that have + related, complete changes. Leave files with unfinished changes for later + commits. + +#. To commit the staged files into the local copy of your repo, do ``git + commit``. At this point, a text editor will open up to allow you to write a + commit message. Read the :ref:`commit message + section<writing-the-commit-message>` to be sure that you are writing a + properly formatted and sufficiently detailed commit message. After saving + your message and closing the editor, your commit will be saved. For trivial + commits, a short commit message can be passed in through the command line + using the ``-m`` flag. For example, ``git commit -m "ENH: Some message"``. + + In some cases, you will see this form of the commit command: ``git commit + -a``. The extra ``-a`` flag automatically commits all modified files and + removes all deleted files. This can save you some typing of numerous ``git + add`` commands; however, it can add unwanted changes to a commit if you're + not careful. For more information, see `why the -a flag?`_ - and the + helpful use-case description in the `tangled working copy problem`_. + +#. Push the changes to your forked repo on github_:: + + git push origin my-new-feature + + For more information, see `git push`_. +.. note:: + Assuming you have followed the instructions in these pages, git will create + a default link to your github_ repo called ``origin``. In git >= 1.7 you + can ensure that the link to origin is permanently set by using the + ``--set-upstream`` option:: + + git push --set-upstream origin my-new-feature + + From now on git_ will know that ``my-new-feature`` is related to the + ``my-new-feature`` branch in your own github_ repo. Subsequent push calls + are then simplified to the following:: + + git push + + You have to use ``--set-upstream`` for each new branch that you create. + + It may be the case that while you were working on your edits, new commits have +been added to ``upstream`` that affect your work. In this case, follow the :ref:`rebasing-on-master` section of this document to apply those changes to your branch. .. _writing-the-commit-message: @@ -195,27 +207,71 @@ Standard acronyms to start the commit message with are:: REL: related to releasing numpy +.. _asking-for-merging: + +Asking for your changes to be merged with the main repo +======================================================= + +When you feel your work is finished, you can create a pull request (PR). Github +has a nice help page that outlines the process for `filing pull requests`_. + +If your changes involve modifications to the API or addition/modification of a +function, you should initiate a code review.
This involves sending an email to +the `NumPy mailing list`_ with a link to your PR along with a description of +and a motivation for your changes. + +.. _pushing-to-main: + +Pushing changes to the main repo +================================ + +*This is only relevant if you have commit rights to the main NumPy repo.* + +When you have a set of changes in a feature branch that are ready for +NumPy's ``master`` or ``maintenance`` branches, you can push +them to ``upstream`` as follows: + +1. First, merge or rebase on the target branch. + + a) If you have only a few unrelated commits, prefer rebasing:: + + git fetch upstream + git rebase upstream/master + + See :ref:`rebasing-on-master`. + + b) If all of the commits are related, create a merge commit:: + + git fetch upstream + git merge --no-ff upstream/master + +2. Check that what you are going to push looks sensible:: + + git log -p upstream/master.. + git log --oneline --graph + +3. Push to upstream:: + + git push upstream my-feature-branch:master + +.. note:: + + It's usually a good idea to use the ``-n`` flag to ``git push`` to check + first that you're about to push the changes you want to the place you + want. + + .. _rebasing-on-master: Rebasing on master ================== This updates your feature branch with changes from the upstream `NumPy -github`_ repo. If you do not absolutely need to do this, try to avoid -doing it, except perhaps when you are finished. -First, it can be useful to update your master branch:: - - # go to the master branch - git checkout master - # pull changes from github - git fetch upstream - # update the master branch - git rebase upstream/master - # push it to your Github repo - git push - -Then, the feature branch:: +github`_ repo. If you do not absolutely need to do this, try to avoid doing +it, except perhaps when you are finished. The first step will be to update +your master branch with new commits from upstream. This is done in the same +manner as described at the beginning of :ref:`making-a-new-feature-branch`. +Next, you need to update the feature branch:: # go to the feature branch git checkout my-new-feature @@ -225,15 +281,17 @@ Then, the feature branch:: git rebase master If you have made changes to files that have also changed upstream, -this may generate merge conflicts that you need to resolve. -Finally, remove the backup branch once the rebase succeeded:: +this may generate merge conflicts that you need to resolve. See +:ref:`below<recovering-from-mess-up>` for help in this case. + +Finally, remove the backup branch upon a successful rebase:: git branch -D tmp .. _recovering-from-mess-up: Recovering from mess-ups ------------------------- +======================== Sometimes, you mess up merges or rebases. Luckily, in Git it is relatively straightforward to recover from such mistakes. @@ -262,100 +320,8 @@ If you forgot to make a backup branch:: If you didn't actually mess up but there are merge conflicts, you need to resolve those. This can be one of the trickier things to get right. For a -good description of how to do this, see -http://git-scm.com/book/en/Git-Branching-Basic-Branching-and-Merging#Basic-Merge-Conflicts - - -.. _asking-for-merging: - -Asking for your changes to be merged with the main repo -======================================================= - -When you feel your work is finished, you can ask for code review, or -directly file a pull request. - -Asking for code review ---------------------- - -#. Go to your repo URL - e.g. ``http://github.com/your-user-name/numpy``. -#.
Click on the *Branch list* button: - - .. image:: branch_list.png - -#. Click on the *Compare* button for your feature branch - here ``my-new-feature``: - - .. image:: branch_list_compare.png - -#. If asked, select the *base* and *comparison* branch names you want to - compare. Usually these will be ``master`` and ``my-new-feature`` - (where that is your feature branch name). -#. At this point you should get a nice summary of the changes. Copy the - URL for this, and post it to the `NumPy mailing list`_, asking for - review. The URL will look something like: - ``http://github.com/your-user-name/numpy/compare/master...my-new-feature``. - There's an example at - http://github.com/matthew-brett/nipy/compare/master...find-install-data - See: http://github.com/blog/612-introducing-github-compare-view for - more detail. - -The generated comparison, is between your feature branch -``my-new-feature``, and the place in ``master`` from which you branched -``my-new-feature``. In other words, you can keep updating ``master`` -without interfering with the output from the comparison. More detail? -Note the three dots in the URL above (``master...my-new-feature``) and -see :ref:`dot2-dot3`. - -Filing a pull request ---------------------- - -When you are ready to ask for the merge of your code: - -#. Go to the URL of your forked repo, say - ``http://github.com/your-user-name/numpy.git``. -#. Click on the 'Pull request' button: - - .. image:: pull_button.png - - Enter a message; we suggest you select only ``NumPy`` as the - recipient. The message will go to the NumPy core developers. Please - feel free to add others from the list as you like. - - -.. _pushing-to-main: - -Pushing changes to the main repo -================================ - -When you have a set of "ready" changes in a feature branch ready for -Numpy's ``master`` or ``maintenance/1.5.x`` branches, you can push -them to ``upstream`` as follows: - -1. First, merge or rebase on the target branch. - - a) Only a few commits: prefer rebasing:: - - git fetch upstream - git rebase upstream/master - - See :ref:`above <rebasing-on-master>`. - - b) Many related commits: consider creating a merge commit:: - - git fetch upstream - git merge --no-ff upstream/master - -2. Check that what you are going to push looks sensible:: - - git log -p upstream/master.. - git log --oneline --graph - -3. Push to upstream:: - - git push upstream my-feature-branch:master - -.. note:: +good description of how to do this, see `this article on merging conflicts`_. - Avoid using ``git pull`` here. Additional things you might want to do ###################################### diff --git a/doc/source/dev/gitwash/git_links.inc b/doc/source/dev/gitwash/git_links.inc index 9ea01717d..e80ab2b63 100644 --- a/doc/source/dev/gitwash/git_links.inc +++ b/doc/source/dev/gitwash/git_links.inc @@ -75,11 +75,18 @@ .. _tangled working copy problem: http://tomayko.com/writings/the-thing-about-git .. _git management: http://kerneltrap.org/Linux/Git_Management .. _linux git workflow: http://www.mail-archive.com/dri-devel@lists.sourceforge.net/msg39091.html +.. _ipython git workflow: http://mail.scipy.org/pipermail/ipython-dev/2010-October/006746.html .. _git parable: http://tom.preston-werner.com/2009/05/19/the-git-parable.html .. _git foundation: http://matthew-brett.github.com/pydagogue/foundation.html .. _numpy/master: https://github.com/numpy/numpy .. _git cherry-pick: https://www.kernel.org/pub/software/scm/git/docs/git-cherry-pick.html .. 
_git blame: https://www.kernel.org/pub/software/scm/git/docs/git-blame.html +.. _this blog post: http://github.com/blog/612-introducing-github-compare-view +.. _this article on merging conflicts: http://git-scm.com/book/en/Git-Branching-Basic-Branching-and-Merging#Basic-Merge-Conflicts +.. _learn git: https://www.atlassian.com/git/tutorials/ +.. _filing pull requests: https://help.github.com/articles/using-pull-requests/#initiating-the-pull-request +.. _pull request review: https://help.github.com/articles/using-pull-requests/#reviewing-the-pull-request + .. other stuff .. _python: http://www.python.org diff --git a/doc/source/dev/gitwash/index.rst b/doc/source/dev/gitwash/index.rst index 9d733dd1c..ae7ce69de 100644 --- a/doc/source/dev/gitwash/index.rst +++ b/doc/source/dev/gitwash/index.rst @@ -1,7 +1,7 @@ .. _using-git: Working with *NumPy* source code -====================================== +================================ Contents: diff --git a/doc/source/dev/index.rst b/doc/source/dev/index.rst index 2229f3ccb..b0d0ec483 100644 --- a/doc/source/dev/index.rst +++ b/doc/source/dev/index.rst @@ -6,5 +6,6 @@ Contributing to Numpy :maxdepth: 3 gitwash/index + development_environment For core developers: see :ref:`development-workflow`. diff --git a/doc/source/reference/arrays.classes.rst b/doc/source/reference/arrays.classes.rst index e77dfc31e..caaf3a73b 100644 --- a/doc/source/reference/arrays.classes.rst +++ b/doc/source/reference/arrays.classes.rst @@ -337,7 +337,7 @@ Record arrays (:mod:`numpy.rec`) :ref:`arrays.dtypes`. Numpy provides the :class:`recarray` class which allows accessing the -fields of a record/structured array as attributes, and a corresponding +fields of a structured array as attributes, and a corresponding scalar data type object :class:`record`. .. currentmodule:: numpy diff --git a/doc/source/reference/arrays.dtypes.rst b/doc/source/reference/arrays.dtypes.rst index 797f1f6f8..a43c23218 100644 --- a/doc/source/reference/arrays.dtypes.rst +++ b/doc/source/reference/arrays.dtypes.rst @@ -14,12 +14,12 @@ following aspects of the data: 1. Type of the data (integer, float, Python object, etc.) 2. Size of the data (how many bytes is in *e.g.* the integer) 3. Byte order of the data (:term:`little-endian` or :term:`big-endian`) -4. If the data type is a :term:`record`, an aggregate of other +4. If the data type is :term:`structured`, an aggregate of other data types, (*e.g.*, describing an array item consisting of an integer and a float), - 1. what are the names of the ":term:`fields <field>`" of the record, - by which they can be :ref:`accessed <arrays.indexing.rec>`, + 1. what are the names of the ":term:`fields <field>`" of the structure, + by which they can be :ref:`accessed <arrays.indexing.fields>`, 2. what is the data-type of each :term:`field`, and 3. which part of the memory block each field takes. @@ -40,15 +40,14 @@ needed in Numpy. .. index:: pair: dtype; field - pair: dtype; record -Struct data types are formed by creating a data type whose +Structured data types are formed by creating a data type whose :term:`fields` contain other data types. Each field has a name by -which it can be :ref:`accessed <arrays.indexing.rec>`. The parent data +which it can be :ref:`accessed <arrays.indexing.fields>`. The parent data type should be of sufficient size to contain all its fields; the parent is nearly always based on the :class:`void` type which allows -an arbitrary item size. Struct data types may also contain nested struct -sub-array data types in their fields. 
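Concretely, a structured data type with a nested structured field can be
sketched like this (the field names are purely illustrative)::

    >>> import numpy as np
    >>> dt = np.dtype([('pos', [('x', np.float64), ('y', np.float64)]),
    ...                ('mass', np.float64)])
    >>> a = np.zeros(3, dtype=dt)
    >>> a['pos']['x'] = [1.0, 2.0, 3.0]   # nested fields chain like dict lookups
    >>> a[0]
    ((1.0, 0.0), 0.0)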
+an arbitrary item size. Structured data types may also contain nested +structured sub-array data types in their fields. .. index:: pair: dtype; sub-array @@ -60,7 +59,7 @@ fixed size. If an array is created using a data-type describing a sub-array, the dimensions of the sub-array are appended to the shape of the array when the array is created. Sub-arrays in a field of a -record behave differently, see :ref:`arrays.indexing.rec`. +structured type behave differently, see :ref:`arrays.indexing.fields`. Sub-arrays always have a C-contiguous memory layout. @@ -83,7 +82,7 @@ Sub-arrays always have a C-contiguous memory layout. .. admonition:: Example - A record data type containing a 16-character string (in field 'name') + A structured data type containing a 16-character string (in field 'name') and a sub-array of two 64-bit floating-point number (in field 'grades'): >>> dt = np.dtype([('name', np.str_, 16), ('grades', np.float64, (2,))]) @@ -246,8 +245,8 @@ Array-protocol type strings (see :ref:`arrays.interface`) String with comma-separated fields - Numarray introduced a short-hand notation for specifying the format - of a record as a comma-separated string of basic formats. + A short-hand notation for specifying the format of a structured data type is + a comma-separated string of basic formats. A basic format in this context is an optional shape specifier followed by an array-protocol type string. Parenthesis are required @@ -315,7 +314,7 @@ Type strings >>> dt = np.dtype((np.int32, (2,2))) # 2 x 2 integer sub-array >>> dt = np.dtype(('S10', 1)) # 10-character string - >>> dt = np.dtype(('i4, (2,3)f8, f4', (2,3))) # 2 x 3 record sub-array + >>> dt = np.dtype(('i4, (2,3)f8, f4', (2,3))) # 2 x 3 structured sub-array .. index:: triple: dtype; construction; from list @@ -432,7 +431,8 @@ Type strings Both arguments must be convertible to data-type objects in this case. The *base_dtype* is the data-type object that the new data-type builds on. This is how you could assign named fields to - any built-in data-type object. + any built-in data-type object, as done in + :ref:`record arrays <arrays.classes.rec>`. .. admonition:: Example @@ -486,7 +486,7 @@ Endianness of this data: dtype.byteorder -Information about sub-data-types in a :term:`record`: +Information about sub-data-types in a :term:`structured` data type: .. autosummary:: :toctree: generated/ diff --git a/doc/source/reference/arrays.indexing.rst b/doc/source/reference/arrays.indexing.rst index ef0180e0f..2eb07c4e0 100644 --- a/doc/source/reference/arrays.indexing.rst +++ b/doc/source/reference/arrays.indexing.rst @@ -11,7 +11,7 @@ Indexing :class:`ndarrays <ndarray>` can be indexed using the standard Python ``x[obj]`` syntax, where *x* is the array and *obj* the selection. -There are three kinds of indexing available: record access, basic +There are three kinds of indexing available: field access, basic slicing, advanced indexing. Which one occurs depends on *obj*. .. note:: @@ -489,25 +489,25 @@ indexing (in no particular order): view on the data. This *must* be done if the subclasses ``__getitem__`` does not return views. -.. _arrays.indexing.rec: +.. _arrays.indexing.fields: -Record Access +Field Access ------------- .. seealso:: :ref:`arrays.dtypes`, :ref:`arrays.scalars` -If the :class:`ndarray` object is a record array, *i.e.* its data type -is a :term:`record` data type, the :term:`fields <field>` of the array -can be accessed by indexing the array with strings, dictionary-like. 
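A quick illustration of this dictionary-like access, and of the view it
returns (array and field names here are made up for the example)::

    >>> import numpy as np
    >>> x = np.zeros(3, dtype=[('name', 'S10'), ('grade', np.float64)])
    >>> g = x['grade']        # a view with dtype x.dtype['grade']
    >>> g[:] = 4.0            # writes through to x
    >>> x['grade']
    array([ 4.,  4.,  4.])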
+If the :class:`ndarray` object is a structured array, the :term:`fields <field>` +of the array can be accessed by indexing the array with strings, +dictionary-like. Indexing ``x['field-name']`` returns a new :term:`view` to the array, which is of the same shape as *x* (except when the field is a sub-array) but of data type ``x.dtype['field-name']`` and contains -only the part of the data in the specified field. Also record array -scalars can be "indexed" this way. +only the part of the data in the specified field. Also +:ref:`record array <arrays.classes.rec>` scalars can be "indexed" this way. -Indexing into a record array can also be done with a list of field names, +Indexing into a structured array can also be done with a list of field names, *e.g.* ``x[['field-name1','field-name2']]``. Currently this returns a new array containing a copy of the values in the fields specified in the list. As of NumPy 1.7, returning a copy is being deprecated in favor of returning diff --git a/doc/source/reference/arrays.interface.rst b/doc/source/reference/arrays.interface.rst index 16abe5ce1..50595c2d8 100644 --- a/doc/source/reference/arrays.interface.rst +++ b/doc/source/reference/arrays.interface.rst @@ -103,19 +103,19 @@ This approach to the interface consists of the object having an not a requirement. The only requirement is that the number of bytes represented in the *typestr* key is the same as the total number of bytes represented here. The idea is to support - descriptions of C-like structs (records) that make up array + descriptions of C-like structs that make up array elements. The elements of each tuple in the list are 1. A string providing a name associated with this portion of - the record. This could also be a tuple of ``('full name', + the datatype. This could also be a tuple of ``('full name', 'basic_name')`` where basic name would be a valid Python variable name representing the full name of the field. 2. Either a basic-type description string as in *typestr* or - another list (for nested records) + another list (for nested structured types) 3. An optional shape tuple providing how many times this part - of the record should be repeated. No repeats are assumed + of the structure should be repeated. No repeats are assumed if this is not given. Very complicated structures can be described using this generic interface. Notice, however, that each element of the array is still of the same @@ -301,7 +301,8 @@ more information which may be important for various applications:: typestr == '|V16' descr == [('ival','>i4'),('','|V4'),('dval','>f8')] -It should be clear that any record type could be described using this interface. +It should be clear that any structured type could be described using this +interface. Differences with Array interface (Version 2) ============================================ diff --git a/doc/source/reference/arrays.ndarray.rst b/doc/source/reference/arrays.ndarray.rst index e9c0a6d87..c8d834d1c 100644 --- a/doc/source/reference/arrays.ndarray.rst +++ b/doc/source/reference/arrays.ndarray.rst @@ -82,7 +82,7 @@ Indexing arrays Arrays can be indexed using an extended Python slicing syntax, ``array[selection]``. Similar syntax is also used for accessing -fields in a :ref:`record array <arrays.dtypes>`. +fields in a :ref:`structured array <arrays.dtypes.field>`. .. seealso:: :ref:`Array Indexing <arrays.indexing>`.
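The copy semantics of multi-field indexing described above are easy to
demonstrate (illustrative field names; as the text notes, a view-returning
behaviour was planned for later releases)::

    >>> import numpy as np
    >>> x = np.zeros(3, dtype=[('a', np.int32), ('b', np.float64), ('c', np.float64)])
    >>> sub = x[['a', 'b']]   # currently a copy, per the text above
    >>> sub['a'] = 99
    >>> x['a']                # the original is unchanged
    array([0, 0, 0], dtype=int32)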
diff --git a/doc/source/reference/arrays.scalars.rst b/doc/source/reference/arrays.scalars.rst index f229efb07..652fa62e1 100644 --- a/doc/source/reference/arrays.scalars.rst +++ b/doc/source/reference/arrays.scalars.rst @@ -250,7 +250,7 @@ array scalar, - ``x[()]`` returns a 0-dimensional :class:`ndarray` - ``x['field-name']`` returns the array scalar in the field *field-name*. - (*x* can have fields, for example, when it corresponds to a record data type.) + (*x* can have fields, for example, when it corresponds to a structured data type.) Methods ======= @@ -282,10 +282,10 @@ Defining new types ================== There are two ways to effectively define a new array scalar type -(apart from composing record :ref:`dtypes <arrays.dtypes>` from the built-in -scalar types): One way is to simply subclass the :class:`ndarray` and -overwrite the methods of interest. This will work to a degree, but -internally certain behaviors are fixed by the data type of the array. -To fully customize the data type of an array you need to define a new -data-type, and register it with NumPy. Such new types can only be -defined in C, using the :ref:`Numpy C-API <c-api>`. +(apart from composing structured :ref:`dtypes <arrays.dtypes>` from +the built-in scalar types): One way is to simply subclass the +:class:`ndarray` and overwrite the methods of interest. This will work to +a degree, but internally certain behaviors are fixed by the data type of +the array. To fully customize the data type of an array you need to +define a new data-type, and register it with NumPy. Such new types can only +be defined in C, using the :ref:`Numpy C-API <c-api>`. diff --git a/doc/source/reference/c-api.array.rst b/doc/source/reference/c-api.array.rst index e2b8d33d6..5911eaa68 100644 --- a/doc/source/reference/c-api.array.rst +++ b/doc/source/reference/c-api.array.rst @@ -108,10 +108,13 @@ sub-types). .. cfunction:: int PyArray_FLAGS(PyArrayObject* arr) -.. cfunction:: int PyArray_ITEMSIZE(PyArrayObject* arr) +.. cfunction:: npy_intp PyArray_ITEMSIZE(PyArrayObject* arr) Return the itemsize for the elements of this array. + Note that, in the old API that was deprecated in version 1.7, this function + had the return type ``int``. + .. cfunction:: int PyArray_TYPE(PyArrayObject* arr) Return the (builtin) typenumber for the elements of this array. @@ -460,7 +463,7 @@ From other objects .. cvar:: NPY_ARRAY_IN_ARRAY - :cdata:`NPY_ARRAY_CONTIGUOUS` \| :cdata:`NPY_ARRAY_ALIGNED` + :cdata:`NPY_ARRAY_C_CONTIGUOUS` \| :cdata:`NPY_ARRAY_ALIGNED` .. cvar:: NPY_ARRAY_IN_FARRAY @@ -1250,9 +1253,9 @@ Special functions for NPY_OBJECT A function to INCREF all the objects at the location *ptr* according to the data-type *dtype*. If *ptr* is the start of a - record with an object at any offset, then this will (recursively) + structured type with an object at any offset, then this will (recursively) increment the reference count of all object-like items in the - record. + structured type. .. cfunction:: int PyArray_XDECREF(PyArrayObject* op) @@ -1263,7 +1266,7 @@ Special functions for NPY_OBJECT .. cfunction:: void PyArray_Item_XDECREF(char* ptr, PyArray_Descr* dtype) - A function to XDECREF all the object-like items at the loacation + A function to XDECREF all the object-like items at the location *ptr* as recorded in the data-type, *dtype*.
This works recursively so that if ``dtype`` itself has fields with data-types that contain object-like items, all the object-like fields will be @@ -1550,7 +1553,7 @@ Conversion itemsize of the new array type must be less than *self* ->descr->elsize or an error is raised. The same shape and strides as the original array are used. Therefore, this function has the - effect of returning a field from a record array. But, it can also + effect of returning a field from a structured array. But, it can also be used to select specific bytes or groups of bytes from any array type. @@ -1796,7 +1799,7 @@ Item selection and manipulation ->descr is a data-type with fields defined, then self->descr->names is used to determine the sort order. A comparison where the first field is equal will use the second - field and so on. To alter the sort order of a record array, create + field and so on. To alter the sort order of a structured array, create a new data-type with a different order of names and construct a view of the array with that new data-type. @@ -1815,7 +1818,7 @@ Item selection and manipulation to understand the order the *sort_keys* must be in (reversed from the order you would use when comparing two elements). - If these arrays are all collected in a record array, then + If these arrays are all collected in a structured array, then :cfunc:`PyArray_Sort` (...) can also be used to sort the array directly. @@ -1848,7 +1851,7 @@ Item selection and manipulation If *self*->descr is a data-type with fields defined, then self->descr->names is used to determine the sort order. A comparison where the first field is equal will use the second field and so on. To alter the - sort order of a record array, create a new data-type with a different + sort order of a structured array, create a new data-type with a different order of names and construct a view of the array with that new data-type. Returns zero on success and -1 on failure. diff --git a/doc/source/reference/c-api.iterator.rst b/doc/source/reference/c-api.iterator.rst index 084fdcbce..1d90ce302 100644 --- a/doc/source/reference/c-api.iterator.rst +++ b/doc/source/reference/c-api.iterator.rst @@ -18,8 +18,6 @@ preservation of memory layouts, and buffering of data with the wrong alignment or type, without requiring difficult coding. This page documents the API for the iterator. -The C-API naming convention chosen is based on the one in the numpy-refactor -branch, so will integrate naturally into the refactored code base. The iterator is named ``NpyIter`` and functions are named ``NpyIter_*``. @@ -28,51 +26,6 @@ which may be of interest for those using this C API. In many instances, testing out ideas by creating the iterator in Python is a good idea before writing the C iteration code. -Converting from Previous NumPy Iterators ----------------------------------------- - -The existing iterator API includes functions like PyArrayIter_Check, -PyArray_Iter* and PyArray_ITER_*. The multi-iterator array includes -PyArray_MultiIter*, PyArray_Broadcast, and PyArray_RemoveSmallest. The -new iterator design replaces all of this functionality with a single object -and associated API. One goal of the new API is that all uses of the -existing iterator should be replaceable with the new iterator without -significant effort. In 1.6, the major exception to this is the neighborhood -iterator, which does not have corresponding features in this iterator. 
- -Here is a conversion table for which functions to use with the new iterator: - -===================================== ============================================= -*Iterator Functions* -:cfunc:`PyArray_IterNew` :cfunc:`NpyIter_New` -:cfunc:`PyArray_IterAllButAxis` :cfunc:`NpyIter_New` + ``axes`` parameter **or** - Iterator flag :cdata:`NPY_ITER_EXTERNAL_LOOP` -:cfunc:`PyArray_BroadcastToShape` **NOT SUPPORTED** (Use the support for - multiple operands instead.) -:cfunc:`PyArrayIter_Check` Will need to add this in Python exposure -:cfunc:`PyArray_ITER_RESET` :cfunc:`NpyIter_Reset` -:cfunc:`PyArray_ITER_NEXT` Function pointer from :cfunc:`NpyIter_GetIterNext` -:cfunc:`PyArray_ITER_DATA` :cfunc:`NpyIter_GetDataPtrArray` -:cfunc:`PyArray_ITER_GOTO` :cfunc:`NpyIter_GotoMultiIndex` -:cfunc:`PyArray_ITER_GOTO1D` :cfunc:`NpyIter_GotoIndex` or - :cfunc:`NpyIter_GotoIterIndex` -:cfunc:`PyArray_ITER_NOTDONE` Return value of ``iternext`` function pointer -*Multi-iterator Functions* -:cfunc:`PyArray_MultiIterNew` :cfunc:`NpyIter_MultiNew` -:cfunc:`PyArray_MultiIter_RESET` :cfunc:`NpyIter_Reset` -:cfunc:`PyArray_MultiIter_NEXT` Function pointer from :cfunc:`NpyIter_GetIterNext` -:cfunc:`PyArray_MultiIter_DATA` :cfunc:`NpyIter_GetDataPtrArray` -:cfunc:`PyArray_MultiIter_NEXTi` **NOT SUPPORTED** (always lock-step iteration) -:cfunc:`PyArray_MultiIter_GOTO` :cfunc:`NpyIter_GotoMultiIndex` -:cfunc:`PyArray_MultiIter_GOTO1D` :cfunc:`NpyIter_GotoIndex` or - :cfunc:`NpyIter_GotoIterIndex` -:cfunc:`PyArray_MultiIter_NOTDONE` Return value of ``iternext`` function pointer -:cfunc:`PyArray_Broadcast` Handled by :cfunc:`NpyIter_MultiNew` -:cfunc:`PyArray_RemoveSmallest` Iterator flag :cdata:`NPY_ITER_EXTERNAL_LOOP` -*Other Functions* -:cfunc:`PyArray_ConvertToCommonType` Iterator flag :cdata:`NPY_ITER_COMMON_DTYPE` -===================================== ============================================= - Simple Iteration Example ------------------------ @@ -91,6 +44,7 @@ number of non-zero elements in an array. NpyIter* iter; NpyIter_IterNextFunc *iternext; char** dataptr; + npy_intp nonzero_count; npy_intp* strideptr,* innersizeptr; /* Handle zero-sized arrays specially */ @@ -138,7 +92,7 @@ number of non-zero elements in an array. /* The location of the inner loop size which the iterator may update */ innersizeptr = NpyIter_GetInnerLoopSizePtr(iter); - /* The iteration loop */ + nonzero_count = 0; do { /* Get the inner loop data/stride/count values */ char* data = *dataptr; @@ -1296,3 +1250,48 @@ functions provide that information. .. index:: pair: iterator; C-API + +Converting from Previous NumPy Iterators +---------------------------------------- + +The old iterator API includes functions like PyArrayIter_Check, +PyArray_Iter* and PyArray_ITER_*. The multi-iterator array includes +PyArray_MultiIter*, PyArray_Broadcast, and PyArray_RemoveSmallest. The +new iterator design replaces all of this functionality with a single object +and associated API. One goal of the new API is that all uses of the +existing iterator should be replaceable with the new iterator without +significant effort. In 1.6, the major exception to this is the neighborhood +iterator, which does not have corresponding features in this iterator. 
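As noted at the top of this page, it is often easiest to prototype an
iteration pattern from Python before writing the C version. A rough Python
counterpart of the non-zero counting example above, using ``np.nditer`` (the
function name is ours)::

    >>> import numpy as np
    >>> def count_nonzero_nditer(a):
    ...     # external_loop hands back whole inner-loop chunks, as in the C code
    ...     it = np.nditer(a, flags=['external_loop', 'buffered'])
    ...     return sum(int((chunk != 0).sum()) for chunk in it)
    >>> a = np.array([[1, 0, 2], [0, 3, 0]])
    >>> count_nonzero_nditer(a)
    3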
+ +Here is a conversion table for which functions to use with the new iterator: + +===================================== ============================================= +*Iterator Functions* +:cfunc:`PyArray_IterNew` :cfunc:`NpyIter_New` +:cfunc:`PyArray_IterAllButAxis` :cfunc:`NpyIter_New` + ``axes`` parameter **or** + Iterator flag :cdata:`NPY_ITER_EXTERNAL_LOOP` +:cfunc:`PyArray_BroadcastToShape` **NOT SUPPORTED** (Use the support for + multiple operands instead.) +:cfunc:`PyArrayIter_Check` Will need to add this in Python exposure +:cfunc:`PyArray_ITER_RESET` :cfunc:`NpyIter_Reset` +:cfunc:`PyArray_ITER_NEXT` Function pointer from :cfunc:`NpyIter_GetIterNext` +:cfunc:`PyArray_ITER_DATA` :cfunc:`NpyIter_GetDataPtrArray` +:cfunc:`PyArray_ITER_GOTO` :cfunc:`NpyIter_GotoMultiIndex` +:cfunc:`PyArray_ITER_GOTO1D` :cfunc:`NpyIter_GotoIndex` or + :cfunc:`NpyIter_GotoIterIndex` +:cfunc:`PyArray_ITER_NOTDONE` Return value of ``iternext`` function pointer +*Multi-iterator Functions* +:cfunc:`PyArray_MultiIterNew` :cfunc:`NpyIter_MultiNew` +:cfunc:`PyArray_MultiIter_RESET` :cfunc:`NpyIter_Reset` +:cfunc:`PyArray_MultiIter_NEXT` Function pointer from :cfunc:`NpyIter_GetIterNext` +:cfunc:`PyArray_MultiIter_DATA` :cfunc:`NpyIter_GetDataPtrArray` +:cfunc:`PyArray_MultiIter_NEXTi` **NOT SUPPORTED** (always lock-step iteration) +:cfunc:`PyArray_MultiIter_GOTO` :cfunc:`NpyIter_GotoMultiIndex` +:cfunc:`PyArray_MultiIter_GOTO1D` :cfunc:`NpyIter_GotoIndex` or + :cfunc:`NpyIter_GotoIterIndex` +:cfunc:`PyArray_MultiIter_NOTDONE` Return value of ``iternext`` function pointer +:cfunc:`PyArray_Broadcast` Handled by :cfunc:`NpyIter_MultiNew` +:cfunc:`PyArray_RemoveSmallest` Iterator flag :cdata:`NPY_ITER_EXTERNAL_LOOP` +*Other Functions* +:cfunc:`PyArray_ConvertToCommonType` Iterator flag :cdata:`NPY_ITER_COMMON_DTYPE` +===================================== ============================================= diff --git a/doc/source/reference/internals.code-explanations.rst b/doc/source/reference/internals.code-explanations.rst index 580661cb3..f01300e25 100644 --- a/doc/source/reference/internals.code-explanations.rst +++ b/doc/source/reference/internals.code-explanations.rst @@ -74,9 +74,9 @@ optimizations that by-pass this mechanism, but the point of the data- type abstraction is to allow new data-types to be added. One of the built-in data-types, the void data-type allows for -arbitrary records containing 1 or more fields as elements of the +arbitrary structured types containing 1 or more fields as elements of the array. A field is simply another data-type object along with an offset -into the current record. In order to support arbitrarily nested +into the current structured type. In order to support arbitrarily nested fields, several recursive implementations of data-type access are implemented for the void type. A common idiom is to cycle through the elements of the dictionary and perform a specific operation based on @@ -184,7 +184,7 @@ The array scalars also offer the same methods and attributes as arrays with the intent that the same code can be used to support arbitrary dimensions (including 0-dimensions). The array scalars are read-only (immutable) with the exception of the void scalar which can also be -written to so that record-array field setting works more naturally +written to so that structured array field setting works more naturally (a[0]['f1'] = ``value`` ). 
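The writable void scalar mentioned above can be seen directly from Python (a
tiny illustrative sketch)::

    >>> import numpy as np
    >>> a = np.zeros(2, dtype=[('f1', np.int32), ('f2', np.float64)])
    >>> rec = a[0]        # a void scalar; unlike other array scalars it is writable
    >>> rec['f1'] = 42    # the assignment writes through to the parent array
    >>> a['f1']
    array([42,  0], dtype=int32)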
diff --git a/doc/source/reference/routines.io.rst b/doc/source/reference/routines.io.rst index 26afbfb4f..b99754912 100644 --- a/doc/source/reference/routines.io.rst +++ b/doc/source/reference/routines.io.rst @@ -3,8 +3,8 @@ Input and output .. currentmodule:: numpy -NPZ files ---------- +NumPy binary files (NPY, NPZ) +----------------------------- .. autosummary:: :toctree: generated/ load savez savez_compressed +The format of these binary file types is documented in +http://docs.scipy.org/doc/numpy/neps/npy-format.html + Text files ---------- .. autosummary:: diff --git a/doc/source/reference/routines.ma.rst b/doc/source/reference/routines.ma.rst index 66bcb1f1c..2408899b3 100644 --- a/doc/source/reference/routines.ma.rst +++ b/doc/source/reference/routines.ma.rst @@ -184,6 +184,8 @@ Finding masked data ma.flatnotmasked_edges ma.notmasked_contiguous ma.notmasked_edges + ma.clump_masked + ma.clump_unmasked Modifying a mask diff --git a/doc/source/release.rst b/doc/source/release.rst index 657eb9a5d..c853fc0f0 100644 --- a/doc/source/release.rst +++ b/doc/source/release.rst @@ -3,6 +3,7 @@ Release Notes ************* .. include:: ../release/1.10.0-notes.rst +.. include:: ../release/1.9.1-notes.rst .. include:: ../release/1.9.0-notes.rst .. include:: ../release/1.8.2-notes.rst .. include:: ../release/1.8.1-notes.rst diff --git a/doc/source/user/basics.rec.rst b/doc/source/user/basics.rec.rst index ce6c3b851..1be5af081 100644 --- a/doc/source/user/basics.rec.rst +++ b/doc/source/user/basics.rec.rst @@ -1,7 +1,7 @@ .. _structured_arrays: -*************************************** -Structured arrays (aka "Record arrays") -*************************************** +***************** +Structured arrays +***************** .. automodule:: numpy.doc.structured_arrays diff --git a/doc/source/user/c-info.how-to-extend.rst b/doc/source/user/c-info.how-to-extend.rst index 4d54c0eef..b088b5006 100644 --- a/doc/source/user/c-info.how-to-extend.rst +++ b/doc/source/user/c-info.how-to-extend.rst @@ -10,7 +10,7 @@ How to extend NumPy | --- *Alan Turing* -.. _`sec:Writing-an-extension`: +.. _writing-an-extension: Writing an extension module =========================== diff --git a/doc/source/user/c-info.python-as-glue.rst b/doc/source/user/c-info.python-as-glue.rst index 8dfd39beb..cc360e966 100644 --- a/doc/source/user/c-info.python-as-glue.rst +++ b/doc/source/user/c-info.python-as-glue.rst @@ -35,15 +35,6 @@ libraries from Python and the purpose of this Chapter is not to make you an expert. The main goal is to make you aware of some of the possibilities so that you will know what to "Google" in order to learn more. -The http://www.scipy.org website also contains a great deal of useful -information about many of these tools. For example, there is a nice -description of using several of the tools explained in this chapter at -http://www.scipy.org/PerformancePython. This link provides several -ways to solve the same problem showing how to use and connect with -compiled code to get the best performance. In the process you can get -a taste for several of the approaches that will be discussed in this -chapter. - Calling other compiled libraries from Python ============================================ @@ -54,7 +45,7 @@ raw computations inside of for loops) to be up 10-100 times slower than equivalent code written in a static compiled language. In addition, it can cause memory usage to be larger than necessary as temporary arrays are created and destroyed during computation.
For -many types of computing needs the extra slow-down and memory +many types of computing needs, the extra slow-down and memory consumption can often not be spared (at least for time- or memory- critical portions of your code). Therefore one of the most common needs is to call out from Python code to a fast, machine-code routine (e.g. compiled using C/fortran) to do something that Python can do in a much slower fashion. Because Python is such an easy to use language, this makes Python a very good high-level language for scientific and engineering programming. There are two basic approaches to calling compiled code: writing an extension module that is then imported to Python using the import command, or calling a shared-library subroutine directly from Python -using the ctypes module (included in the standard distribution since -Python 2.5). The first method is the most common (but with the -inclusion of ctypes into Python 2.5 this status may change). +using the `ctypes <https://docs.python.org/3/library/ctypes.html>`_ +module. Writing an extension module is the most common method. .. warning:: Calling C-code from Python can result in Python crashes if you are not careful. None of the approaches in this chapter are immune. You have to know something about the way data is handled by both NumPy and by the third-party library being used. Hand-generated wrappers ======================= -Extension modules were discussed in Chapter `1 -<#sec-writing-an-extension>`__ . The most basic way to interface with -compiled code is to write an extension module and construct a module -method that calls the compiled code. For improved readability, your -method should take advantage of the PyArg_ParseTuple call to convert -between Python objects and C data-types. For standard C data-types -there is probably already a built-in converter. For others you may -need to write your own converter and use the "O&" format string which +Extension modules were discussed in :ref:`writing-an-extension`. +The most basic way to interface with compiled code is to write +an extension module and construct a module method that calls +the compiled code. For improved readability, your method should +take advantage of the ``PyArg_ParseTuple`` call to convert between +Python objects and C data-types. For standard C data-types there +is probably already a built-in converter. For others you may need +to write your own converter and use the ``"O&"`` format string which allows you to specify a function that will be used to perform the conversion from the Python object to whatever C-structures are needed. @@ -110,10 +100,10 @@ it can be adapted using the time-honored technique of "cutting-pasting-and-modifying" from other extension modules. Because the procedure of calling out to additional C-code is fairly regimented, code-generation procedures have been developed to make -this process easier. One of these code- generation techniques is +this process easier. One of these code-generation techniques is distributed with NumPy and allows easy integration with Fortran and (simple) C code. This package, f2py, will be covered briefly in the -next session. +next section. f2py ==== F2py allows you to automatically construct an extension module that interfaces to routines in Fortran 77/90/95 code. It has the ability to parse Fortran 77/90/95 code and automatically generate Python signatures for the subroutines it encounters, or you can guide how the -subroutine interfaces with Python by constructing an interface-definition-file (or modifying the f2py-produced one). +subroutine interfaces with Python by constructing an interface-definition-file +(or modifying the f2py-produced one). .. index:: single: f2py @@ -176,7 +167,7 @@ This command leaves a file named add.{ext} in the current directory module on your platform --- so, pyd, *etc.* ). This module may then be imported from Python. It will contain a method for each subroutine in add (zadd, cadd, dadd, sadd). The docstring of each method contains
This module may then be imported from Python. It will contain a method for each subroutine in add (zadd, cadd, dadd, sadd). The docstring of each method contains -information about how the module method may be called: +information about how the module method may be called:: >>> import add >>> print add.zadd.__doc__ @@ -199,9 +190,9 @@ attempt to convert all arguments to their required types (and shapes) and issue an error if unsuccessful. However, because it knows nothing about the semantics of the arguments (such that C is an output and n should really match the array sizes), it is possible to abuse this -function in ways that can cause Python to crash. For example: +function in ways that can cause Python to crash. For example:: - >>> add.zadd([1,2,3],[1,2],[3,4],1000) + >>> add.zadd([1,2,3], [1,2], [3,4], 1000) will cause a program crash on most systems. Under the covers, the lists are being converted to proper arrays but then the underlying add @@ -254,7 +245,7 @@ by compiling both ``add.f95`` and ``add.pyf``:: f2py -c add.pyf add.f95 -The new interface has docstring: +The new interface has docstring:: >>> import add >>> print add.zadd.__doc__ @@ -266,7 +257,7 @@ The new interface has docstring: Return objects: c : rank-1 array('D') with bounds (n) -Now, the function can be called in a much more robust way: +Now, the function can be called in a much more robust way:: >>> add.zadd([1,2,3],[4,5,6]) array([ 5.+0.j, 7.+0.j, 9.+0.j]) @@ -306,11 +297,11 @@ Then, I can compile the extension module using:: The resulting signature for the function add.zadd is exactly the same one that was created previously. If the original source code had -contained A(N) instead of A(\*) and so forth with B and C, then I -could obtain (nearly) the same interface simply by placing the -INTENT(OUT) :: C comment line in the source code. The only difference -is that N would be an optional input that would default to the length -of A. +contained ``A(N)`` instead of ``A(*)`` and so forth with ``B`` and ``C``, +then I could obtain (nearly) the same interface simply by placing the +``INTENT(OUT) :: C`` comment line in the source code. The only difference +is that ``N`` would be an optional input that would default to the length +of ``A``. A filtering example @@ -356,23 +347,20 @@ version of the input. Calling f2py from Python ------------------------ -The f2py program is written in Python and can be run from inside your -module. This provides a facility that is somewhat similar to the use -of weave.ext_tools described below. An example of the final interface -executed using Python code is: +The f2py program is written in Python and can be run from inside your code +to compile Fortran code at runtime, as follows: .. code-block:: python - import numpy.f2py as f2py - fid = open('add.f') - source = fid.read() - fid.close() - f2py.compile(source, modulename='add') + from numpy import f2py + with open("add.f") as sourcefile: + sourcecode = sourcefile.read() + f2py.compile(sourcecode, modulename='add') import add The source string can be any valid Fortran code. If you want to save the extension-module source code then a suitable file-name can be -provided by the source_fn keyword to the compile function. +provided by the ``source_fn`` keyword to the compile function. Automatic extension module generation @@ -381,9 +369,9 @@ Automatic extension module generation If you want to distribute your f2py extension module, then you only need to include the .pyf file and the Fortran code. 
The distutils extensions in NumPy allow you to define an extension module entirely
-in terms of this interface file. A valid setup.py file allowing
-distribution of the add.f module (as part of the package f2py_examples
-so that it would be loaded as f2py_examples.add) is:
+in terms of this interface file. A valid ``setup.py`` file allowing
+distribution of the ``add.f`` module (as part of the package
+``f2py_examples`` so that it would be loaded as ``f2py_examples.add``) is:

.. code-block:: python

@@ -403,10 +391,10 @@ Installation of the new package is easy using::

assuming you have the proper permissions to write to the main
site-packages directory for the version of Python you are using. For the
-resulting package to work, you need to create a file named __init__.py
-(in the same directory as add.pyf). Notice the extension module is
-defined entirely in terms of the "add.pyf" and "add.f" files. The
-conversion of the .pyf file to a .c file is handled by numpy.disutils.
+resulting package to work, you need to create a file named ``__init__.py``
+(in the same directory as ``add.pyf``). Notice the extension module is
+defined entirely in terms of the ``add.pyf`` and ``add.f`` files. The
+conversion of the .pyf file to a .c file is handled by ``numpy.distutils``.


Conclusion
@@ -438,471 +426,200 @@ written C-code.

   single: f2py

-weave
-=====
-
-Weave is a scipy package that can be used to automate the process of
-extending Python with C/C++ code. It can be used to speed up
-evaluation of an array expression that would otherwise create
-temporary variables, to directly "inline" C/C++ code into Python, or
-to create a fully-named extension module. You must either install
-scipy or get the weave package separately and install it using the
-standard python setup.py install. You must also have a C/C++-compiler
-installed and useable by Python distutils in order to use weave.
-
-.. index::
-   single: weave
-
-Somewhat dated, but still useful documentation for weave can be found
-at the link http://www.scipy/Weave. There are also many examples found
-in the examples directory which is installed under the weave directory
-in the place where weave is installed on your system.
-
-
-Speed up code involving arrays (also see scipy.numexpr)
--------------------------------------------------------
-
-This is the easiest way to use weave and requires minimal changes to
-your Python code. It involves placing quotes around the expression of
-interest and calling weave.blitz. Weave will parse the code and
-generate C++ code using Blitz C++ arrays. It will then compile the
-code and catalog the shared library so that the next time this exact
-string is asked for (and the array types are the same), the already-
-compiled shared library will be loaded and used. Because Blitz makes
-extensive use of C++ templating, it can take a long time to compile
-the first time. After that, however, the code should evaluate more
-quickly than the equivalent NumPy expression. This is especially true
-if your array sizes are large and the expression would require NumPy
-to create several temporaries. Only expressions involving basic
-arithmetic operations and basic array slicing can be converted to
-Blitz C++ code.
-
-For example, consider the expression::
-
-    d = 4*a + 5*a*b + 6*b*c
-
-where a, b, and c are all arrays of the same type and shape. When the
-data-type is double-precision and the size is 1000x1000, this
-expression takes about 0.5 seconds to compute on an 1.1Ghz AMD Athlon
-machine.
When this expression is executed instead using blitz: - -.. code-block:: python - - d = empty(a.shape, 'd'); weave.blitz(expr) - -execution time is only about 0.20 seconds (about 0.14 seconds spent in -weave and the rest in allocating space for d). Thus, we've sped up the -code by a factor of 2 using only a simnple command (weave.blitz). Your -mileage may vary, but factors of 2-8 speed-ups are possible with this -very simple technique. - -If you are interested in using weave in this way, then you should also -look at scipy.numexpr which is another similar way to speed up -expressions by eliminating the need for temporary variables. Using -numexpr does not require a C/C++ compiler. - - -Inline C-code -------------- - -Probably the most widely-used method of employing weave is to -"in-line" C/C++ code into Python in order to speed up a time-critical -section of Python code. In this method of using weave, you define a -string containing useful C-code and then pass it to the function -**weave.inline** ( ``code_string``, ``variables`` ), where -code_string is a string of valid C/C++ code and variables is a list of -variables that should be passed in from Python. The C/C++ code should -refer to the variables with the same names as they are defined with in -Python. If weave.line should return anything the the special value -return_val should be set to whatever object should be returned. The -following example shows how to use weave on basic Python objects: - -.. code-block:: python - - code = r""" - int i; - py::tuple results(2); - for (i=0; i<a.length(); i++) { - a[i] = i; - } - results[0] = 3.0; - results[1] = 4.0; - return_val = results; - """ - a = [None]*10 - res = weave.inline(code,['a']) - -The C++ code shown in the code string uses the name 'a' to refer to -the Python list that is passed in. Because the Python List is a -mutable type, the elements of the list itself are modified by the C++ -code. A set of C++ classes are used to access Python objects using -simple syntax. - -The main advantage of using C-code, however, is to speed up processing -on an array of data. Accessing a NumPy array in C++ code using weave, -depends on what kind of type converter is chosen in going from NumPy -arrays to C++ code. The default converter creates 5 variables for the -C-code for every NumPy array passed in to weave.inline. The following -table shows these variables which can all be used in the C++ code. The -table assumes that ``myvar`` is the name of the array in Python with -data-type {dtype} (i.e. float64, float32, int8, etc.) - -=========== ============== ========================================= -Variable Type Contents -=========== ============== ========================================= -myvar {dtype}* Pointer to the first element of the array -Nmyvar npy_intp* A pointer to the dimensions array -Smyvar npy_intp* A pointer to the strides array -Dmyvar int The number of dimensions -myvar_array PyArrayObject* The entire structure for the array -=========== ============== ========================================= - -The in-lined code can contain references to any of these variables as -well as to the standard macros MYVAR1(i), MYVAR2(i,j), MYVAR3(i,j,k), -and MYVAR4(i,j,k,l). These name-based macros (they are the Python name -capitalized followed by the number of dimensions needed) will de- -reference the memory for the array at the given location with no error -checking (be-sure to use the correct macro and ensure the array is -aligned and in correct byte-swap order in order to get useful -results). 
The following code shows how you might use these variables -and macros to code a loop in C that computes a simple 2-d weighted -averaging filter. - -.. code-block:: c++ - - int i,j; - for(i=1;i<Na[0]-1;i++) { - for(j=1;j<Na[1]-1;j++) { - B2(i,j) = A2(i,j) + (A2(i-1,j) + - A2(i+1,j)+A2(i,j-1) - + A2(i,j+1))*0.5 - + (A2(i-1,j-1) - + A2(i-1,j+1) - + A2(i+1,j-1) - + A2(i+1,j+1))*0.25 - } - } - -The above code doesn't have any error checking and so could fail with -a Python crash if, ``a`` had the wrong number of dimensions, or ``b`` -did not have the same shape as ``a``. However, it could be placed -inside a standard Python function with the necessary error checking to -produce a robust but fast subroutine. - -One final note about weave.inline: if you have additional code you -want to include in the final extension module such as supporting -function calls, include statements, etc. you can pass this code in as a -string using the keyword support_code: ``weave.inline(code, variables, -support_code=support)``. If you need the extension module to link -against an additional library then you can also pass in -distutils-style keyword arguments such as library_dirs, libraries, -and/or runtime_library_dirs which point to the appropriate libraries -and directories. - -Simplify creation of an extension module ----------------------------------------- - -The inline function creates one extension module for each function to- -be inlined. It also generates a lot of intermediate code that is -duplicated for each extension module. If you have several related -codes to execute in C, it would be better to make them all separate -functions in a single extension module with multiple functions. You -can also use the tools weave provides to produce this larger extension -module. In fact, the weave.inline function just uses these more -general tools to do its work. - -The approach is to: - -1. construct a extension module object using - ext_tools.ext_module(``module_name``); - -2. create function objects using ext_tools.ext_function(``func_name``, - ``code``, ``variables``); - -3. (optional) add support code to the function using the - .customize.add_support_code( ``support_code`` ) method of the - function object; - -4. add the functions to the extension module object using the - .add_function(``func``) method; - -5. when all the functions are added, compile the extension with its - .compile() method. - -Several examples are available in the examples directory where weave -is installed on your system. Look particularly at ramp2.py, -increment_example.py and fibonacii.py - - -Conclusion ----------- - -Weave is a useful tool for quickly routines in C/C++ and linking them -into Python. It's caching-mechanism allows for on-the-fly compilation -which makes it particularly attractive for in-house code. Because of -the requirement that the user have a C++-compiler, it can be difficult -(but not impossible) to distribute a package that uses weave to other -users who don't have a compiler installed. Of course, weave could be -used to construct an extension module which is then distributed in the -normal way *(* using a setup.py file). While you can use weave to -build larger extension modules with many methods, creating methods -with a variable- number of arguments is not possible. Thus, for a more -sophisticated module, you will still probably want a Python-layer that -calls the weave-produced extension. - -.. 
index::
-   single: weave
-
+Cython
+======

-Pyrex
-=====
+`Cython <http://cython.org>`_ is a compiler for a Python dialect that adds
+(optional) static typing for speed, and allows mixing C or C++ code
+into your modules. It produces C or C++ extensions that can be compiled
+and imported in Python code.

-Pyrex is a way to write C-extension modules using Python-like syntax.
-It is an interesting way to generate extension modules that is growing
-in popularity, particularly among people who have rusty or non-
-existent C-skills. It does require the user to write the "interface"
-code and so is more time-consuming than SWIG or f2py if you are trying
-to interface to a large library of code. However, if you are writing
-an extension module that will include quite a bit of your own
-algorithmic code, as well, then Pyrex is a good match. A big weakness
-perhaps is the inability to easily and quickly access the elements of
-a multidimensional array.
+If you are writing an extension module that will include quite a bit of your
+own algorithmic code as well, then Cython is a good match. Among its
+features is the ability to easily and quickly
+work with multidimensional arrays.

.. index::
-   single: pyrex
+   single: cython

-Notice that Pyrex is an extension-module generator only. Unlike weave
-or f2py, it includes no automatic facility for compiling and linking
+Notice that Cython is an extension-module generator only. Unlike f2py,
+it includes no automatic facility for compiling and linking
the extension module (which must be done in the usual fashion). It
-does provide a modified distutils class called build_ext which lets
-you build an extension module from a .pyx source. Thus, you could
-write in a setup.py file:
+does provide a modified distutils class called ``build_ext`` which lets
+you build an extension module from a ``.pyx`` source. Thus, you could
+write in a ``setup.py`` file:

.. code-block:: python

-    from Pyrex.Distutils import build_ext
+    from Cython.Distutils import build_ext
     from distutils.extension import Extension
     from distutils.core import setup
     import numpy

-    py_ext = Extension('mine', ['mine.pyx'],
-             include_dirs=[numpy.get_include()])
-
     setup(name='mine', description='Nothing',
-          ext_modules=[pyx_ext],
+          ext_modules=[Extension('filter', ['filter.pyx'],
+                                 include_dirs=[numpy.get_include()])],
           cmdclass = {'build_ext':build_ext})

Adding the NumPy include directory is, of course, only necessary if
-you are using NumPy arrays in the extension module (which is what I
-assume you are using Pyrex for). The distutils extensions in NumPy
+you are using NumPy arrays in the extension module (which is what we
+assume you are using Cython for). The distutils extensions in NumPy
also include support for automatically producing the extension-module
and linking it from a ``.pyx`` file. It works so that if the user does
-not have Pyrex installed, then it looks for a file with the same
+not have Cython installed, then it looks for a file with the same
file-name but a ``.c`` extension which it then uses instead of trying
to produce the ``.c`` file again.

-Pyrex does not natively understand NumPy arrays. However, it is not
-difficult to include information that lets Pyrex deal with them
-usefully. In fact, the numpy.random.mtrand module was written using
-Pyrex so an example of Pyrex usage is already included in the NumPy
-source distribution.
-That experience led to the creation of a standard
-c_numpy.pxd file that you can use to simplify interacting with NumPy
-array objects in a Pyrex-written extension. The file may not be
-complete (it wasn't at the time of this writing). If you have
-additions you'd like to contribute, please send them. The file is
-located in the .../site-packages/numpy/doc/pyrex directory where you
-have Python installed. There is also an example in that directory of
-using Pyrex to construct a simple extension module. It shows that
-Pyrex looks a lot like Python but also contains some new syntax that
-is necessary in order to get C-like speed.
-
-If you just use Pyrex to compile a standard Python module, then you
-will get a C-extension module that runs either as fast or, possibly,
-more slowly than the equivalent Python module. Speed increases are
-possible only when you use cdef to statically define C variables and
-use a special construct to create for loops:
+If you just use Cython to compile a standard Python module, then you
+will get a C extension module that typically runs a bit faster than the
+equivalent Python module. Further speed increases can be gained by using
+the ``cdef`` keyword to statically define C variables.
+
+Let's look at two examples we've seen before to see how they might be
+implemented using Cython. These examples were compiled into extension
+modules using Cython 0.21.1.
+
+
+Complex addition in Cython
+--------------------------
+
+Here is part of a Cython module named ``add.pyx`` which implements the
+complex addition functions we previously implemented using f2py:

.. code-block:: none

-    cdef int i
-    for i from start <= i < stop
+    cimport cython
+    cimport numpy as np
+    import numpy as np
+
+    # We need to initialize NumPy.
+    np.import_array()
+
+    #@cython.boundscheck(False)
+    def zadd(in1, in2):
+        cdef double complex[:] a = in1.ravel()
+        cdef double complex[:] b = in2.ravel()
+
+        out = np.empty(a.shape[0], np.complex128)
+        cdef double complex[:] c = out.ravel()
+
+        for i in range(c.shape[0]):
+            c[i].real = a[i].real + b[i].real
+            c[i].imag = a[i].imag + b[i].imag

-Let's look at two examples we've seen before to see how they might be
-implemented using Pyrex. These examples were compiled into extension
-modules using Pyrex-0.9.3.1.
-
-Pyrex-add
----------
-
-Here is part of a Pyrex-file I named add.pyx which implements the add
-functions we previously implemented using f2py:
-
-.. code-block:: none
-
-    cimport c_numpy
-    from c_numpy cimport import_array, ndarray, npy_intp, npy_cdouble, \
-         npy_cfloat, NPY_DOUBLE, NPY_CDOUBLE, NPY_FLOAT, \
-         NPY_CFLOAT
-
-    #We need to initialize NumPy
-    import_array()
-
-    def zadd(object ao, object bo):
-        cdef ndarray c, a, b
-        cdef npy_intp i
-        a = c_numpy.PyArray_ContiguousFromAny(ao,
-            NPY_CDOUBLE, 1, 1)
-        b = c_numpy.PyArray_ContiguousFromAny(bo,
-            NPY_CDOUBLE, 1, 1)
-        c = c_numpy.PyArray_SimpleNew(a.nd, a.dimensions,
-            a.descr.type_num)
-        for i from 0 <= i < a.dimensions[0]:
-            (<npy_cdouble *>c.data)[i].real = \
-                (<npy_cdouble *>a.data)[i].real + \
-                (<npy_cdouble *>b.data)[i].real
-            (<npy_cdouble *>c.data)[i].imag = \
-                (<npy_cdouble *>a.data)[i].imag + \
-                (<npy_cdouble *>b.data)[i].imag
-    return c
+        return out

+This module shows use of the ``cimport`` statement to load the definitions
+from the ``numpy.pxd`` header that ships with Cython.
+It looks like NumPy is
+imported twice; ``cimport`` only makes the NumPy C-API available, while the
+regular ``import`` causes a Python-style import at runtime and makes it
+possible to call into the familiar NumPy Python API.
+
+The example also demonstrates Cython's "typed memoryviews", which are like
+NumPy arrays at the C level, in the sense that they are shaped and strided
+arrays that know their own extent (unlike a C array addressed through a bare
+pointer). The syntax ``double complex[:]`` denotes a one-dimensional array
+(vector) of complex doubles, with arbitrary strides. A contiguous array of
+ints would be ``int[::1]``, while a matrix of floats would be ``float[:, :]``.
+
+Shown commented is the ``cython.boundscheck`` decorator, which turns
+bounds-checking for memory view accesses on or off on a per-function basis.
+We can use this to further speed up our code, at the expense of safety
+(or a manual check prior to entering the loop).
+
+Other than the view syntax, the function is immediately readable to a Python
+programmer. Static typing of the variable ``i`` is implicit. Instead of the
+view syntax, we could also have used Cython's special NumPy array syntax,
+but the view syntax is preferred.

-This module shows use of the ``cimport`` statement to load the
-definitions from the c_numpy.pxd file. As shown, both versions of the
-import statement are supported. It also shows use of the NumPy C-API
-to construct NumPy arrays from arbitrary input objects. The array c is
-created using PyArray_SimpleNew. Then the c-array is filled by
-addition. Casting to a particiular data-type is accomplished using
-<cast \*>. Pointers are de-referenced with bracket notation and
-members of structures are accessed using '.' notation even if the
-object is techinically a pointer to a structure. The use of the
-special for loop construct ensures that the underlying code will have
-a similar C-loop so the addition calculation will proceed quickly.
-Notice that we have not checked for NULL after calling to the C-API
---- a cardinal sin when writing C-code. For routines that return
-Python objects, Pyrex inserts the checks for NULL into the C-code for
-you and returns with failure if need be. There is also a way to get
-Pyrex to automatically check for exceptions when you call functions
-that don't return Python objects. See the documentation of Pyrex for
-details.
-
-
-Pyrex-filter
-------------
-
-The two-dimensional example we created using weave is a bit uglier to
-implement in Pyrex because two-dimensional indexing using Pyrex is not
-as simple. But, it is straightforward (and possibly faster because of
-pre-computed indices). Here is the Pyrex-file I named image.pyx.
+Image filter in Cython
+----------------------
+
+The two-dimensional example we created using Fortran is just as easy to write
+in Cython:

.. code-block:: none

-    cimport c_numpy
-    from c_numpy cimport import_array, ndarray, npy_intp,\
-         NPY_DOUBLE, NPY_CDOUBLE, \
-         NPY_FLOAT, NPY_CFLOAT, NPY_ALIGNED \
-
-    #We need to initialize NumPy
-    import_array()
-    def filter(object ao):
-        cdef ndarray a, b
-        cdef npy_intp i, j, M, N, oS
-        cdef npy_intp r,rm1,rp1,c,cm1,cp1
-        cdef double value
-        # Require an ALIGNED array
-        # (but not necessarily contiguous)
-        #  We will use strides to access the elements.
-        a = c_numpy.PyArray_FROMANY(ao, NPY_DOUBLE, \
-            2, 2, NPY_ALIGNED)
-        b = c_numpy.PyArray_SimpleNew(a.nd,a.dimensions, \
-            a.descr.type_num)
-        M = a.dimensions[0]
-        N = a.dimensions[1]
-        S0 = a.strides[0]
-        S1 = a.strides[1]
-        for i from 1 <= i < M-1:
-            r = i*S0
-            rm1 = r-S0
-            rp1 = r+S0
-            oS = i*N
-            for j from 1 <= j < N-1:
-                c = j*S1
-                cm1 = c-S1
-                cp1 = c+S1
-                (<double *>b.data)[oS+j] = \
-                    (<double *>(a.data+r+c))[0] + \
-                    ((<double *>(a.data+rm1+c))[0] + \
-                     (<double *>(a.data+rp1+c))[0] + \
-                     (<double *>(a.data+r+cm1))[0] + \
-                     (<double *>(a.data+r+cp1))[0])*0.5 + \
-                    ((<double *>(a.data+rm1+cm1))[0] + \
-                     (<double *>(a.data+rp1+cm1))[0] + \
-                     (<double *>(a.data+rp1+cp1))[0] + \
-                     (<double *>(a.data+rm1+cp1))[0])*0.25
-        return b
+    cimport numpy as np
+    import numpy as np
+
+    np.import_array()
+
+    def filter(img):
+        cdef double[:, :] a = np.asarray(img, dtype=np.double)
+        out = np.zeros(img.shape, dtype=np.double)
+        cdef double[:, ::1] b = out
+
+        cdef np.npy_intp i, j
+
+        for i in range(1, a.shape[0] - 1):
+            for j in range(1, a.shape[1] - 1):
+                b[i, j] = (a[i, j]
+                           + .5 * (  a[i-1, j] + a[i+1, j]
+                                   + a[i, j-1] + a[i, j+1])
+                           + .25 * (  a[i-1, j-1] + a[i-1, j+1]
+                                    + a[i+1, j-1] + a[i+1, j+1]))
+
+        return out

This 2-d averaging filter runs quickly because the loop is in C and
-the pointer computations are done only as needed. However, it is not
-particularly easy to understand what is happening. A 2-d image, ``in``
-, can be filtered using this code very quickly using:
+the pointer computations are done only as needed. If the code above is
+compiled as a module ``image``, then a 2-d image, ``img``, can be filtered
+very quickly using:

.. code-block:: python

    import image
-    out = image.filter(in)
+    out = image.filter(img)
+
+Regarding the code, two things are of note: firstly, it is impossible to
+return a memory view to Python. Instead, a NumPy array ``out`` is first
+created, and then a view ``b`` onto this array is used for the computation.
+Secondly, the view ``b`` is typed ``double[:, ::1]``. This means a 2-d array
+with contiguous rows, i.e., C matrix order. Specifying the order explicitly
+can speed up some algorithms since they can skip stride computations.


Conclusion
----------

-There are several disadvantages of using Pyrex:
-
-1. The syntax for Pyrex can get a bit bulky, and it can be confusing at
-   first to understand what kind of objects you are getting and how to
-   interface them with C-like constructs.
-
-2. Inappropriate Pyrex syntax or incorrect calls to C-code or type-
-   mismatches can result in failures such as
+Cython is the extension mechanism of choice for several scientific Python
+libraries, including Scipy, Pandas, SAGE, scikit-image and scikit-learn,
+as well as the XML processing library LXML.
+The language and compiler are well-maintained.

-   1. Pyrex failing to generate the extension module source code,
+There are several disadvantages of using Cython:

-   2. Compiler failure while generating the extension module binary due to
-      incorrect C syntax,
+1. When coding custom algorithms, and sometimes when wrapping existing C
+   libraries, some familiarity with C is required. In particular, when using
+   C memory management (``malloc`` and friends), it's easy to introduce
+   memory leaks. However, just compiling a Python module renamed to ``.pyx``
+   can already speed it up, and adding a few type declarations can give
+   dramatic speedups in some code.

-   3. Python failure when trying to use the module.
-
+2. It is easy to lose a clean separation between Python and C which makes
   re-using your C-code for other non-Python-related projects more
   difficult.

-4. Multi-dimensional arrays are "bulky" to index (appropriate macros
-   may be able to fix this).
-
-5. The C-code generated by Pyrex is hard to read and modify (and typically
+3. The C-code generated by Cython is hard to read and modify (and typically
   compiles with annoying but harmless warnings).

-Writing a good Pyrex extension module still takes a bit of effort
-because not only does it require (a little) familiarity with C, but
-also with Pyrex's brand of Python-mixed-with C. One big advantage of
-Pyrex-generated extension modules is that they are easy to distribute
-using distutils. In summary, Pyrex is a very capable tool for either
-gluing C-code or generating an extension module quickly and should not
-be over-looked. It is especially useful for people that can't or won't
-write C-code or Fortran code. But, if you are already able to write
-simple subroutines in C or Fortran, then I would use one of the other
-approaches such as f2py (for Fortran), ctypes (for C shared-
-libraries), or weave (for inline C-code).
+One big advantage of Cython-generated extension modules is that they are
+easy to distribute. In summary, Cython is a very capable tool for either
+gluing C code or generating an extension module quickly and should not be
+overlooked. It is especially useful for people who can't or won't write
+C or Fortran code.

.. index::
-   single: pyrex
-
-
+   single: cython

ctypes
======

-Ctypes is a Python extension module, included in the stdlib, that
+`Ctypes <https://docs.python.org/3/library/ctypes.html>`_
+is a Python extension module, included in the stdlib, that
allows you to call an arbitrary function in a shared library directly
from Python. This approach allows you to interface with C-code directly
from Python. This opens up an enormous number of libraries for use from
@@ -928,11 +645,11 @@ mention the conversion from ctypes objects to C-data-types that ctypes
itself performs), will make the interface slower than a hand-written
extension-module interface. However, this overhead should be negligible
if the C-routine being called is doing any significant amount of work.
-If you are a great Python programmer with weak C-skills, ctypes is an
+If you are a great Python programmer with weak C skills, ctypes is an
easy way to write a useful interface to a (shared) library of compiled
code.

-To use c-types you must
+To use ctypes you must

1. Have a shared library.

@@ -947,18 +664,16 @@ Having a shared library
-----------------------

There are several requirements for a shared library that can be used
-with c-types that are platform specific. This guide assumes you have
+with ctypes, and these requirements are platform specific. This guide
+assumes you have
some familiarity with making a shared library on your system (or
simply have a shared library available to you). Items to remember are:

- A shared library must be compiled in a special way (*e.g.* using
-  the -shared flag with gcc).
+  the ``-shared`` flag with gcc).

- On some platforms (*e.g.* Windows), a shared library requires a
  .def file that specifies the functions to be exported. For example a
-  mylib.def file might contain.
-
-  ::
+  mylib.def file might contain::

      LIBRARY mylib.dll
      EXPORTS
@@ -966,15 +681,15 @@ simply have a shared library available to you).
Items to remember are:
      cool_function2

  Alternatively, you may be able to use the storage-class specifier
-  __declspec(dllexport) in the C-definition of the function to avoid the
-  need for this .def file.
+  ``__declspec(dllexport)`` in the C-definition of the function to avoid
+  the need for this ``.def`` file.

There is no standard way in Python distutils to create a standard
shared library (an extension module is a "special" shared library
Python understands) in a cross-platform manner. Thus, a big
disadvantage of ctypes at the time of writing this book is that it is
difficult to distribute in a cross-platform manner a Python extension
-that uses c-types and includes your own code which should be compiled
+that uses ctypes and includes your own code which should be compiled
as a shared library on the user's system.

@@ -982,13 +697,13 @@ Loading the shared library
--------------------------

A simple but robust way to load the shared library is to get the
-absolute path name and load it using the cdll object of ctypes.:
+absolute path name and load it using the cdll object of ctypes:

.. code-block:: python

    lib = ctypes.cdll[<full_path_name>]

-However, on Windows accessing an attribute of the cdll method will
+However, on Windows accessing an attribute of the ``cdll`` method will
load the first DLL by that name found in the current directory or on
the PATH. Loading the absolute path name requires a little finesse for
cross-platform work since the extension of shared libraries varies.
@@ -997,22 +712,22 @@ simplify the process of finding the library to load but it is not
foolproof. Complicating matters, different platforms have different
default extensions used by shared libraries (e.g. .dll -- Windows, .so
-- Linux, .dylib -- Mac OS X). This must also be taken into account if
-you are using c-types to wrap code that needs to work on several
+you are using ctypes to wrap code that needs to work on several
platforms.

NumPy provides a convenience function called
-:func:`ctypeslib.load_library` (name, path). This function takes the name
+``ctypeslib.load_library(name, path)``. This function takes the name
of the shared library (including any prefix like 'lib' but excluding
the extension) and a path where the shared library can be located. It
-returns a ctypes library object or raises an OSError if the library
-cannot be found or raises an ImportError if the ctypes module is not
+returns a ctypes library object or raises an ``OSError`` if the library
+cannot be found or raises an ``ImportError`` if the ctypes module is not
available. (Windows users: the ctypes library object loaded using
-:func:`load_library` is always loaded assuming cdecl calling convention.
-See the ctypes documentation under ctypes.windll and/or ctypes.oledll
+``load_library`` is always loaded assuming cdecl calling convention.
+See the ctypes documentation under ``ctypes.windll`` and/or ``ctypes.oledll``
for ways to load libraries under other calling conventions).

The functions in the shared library are available as attributes of the
-ctypes library object (returned from :func:`ctypeslib.load_library`) or
+ctypes library object (returned from ``ctypeslib.load_library``) or
as items using ``lib['func_name']`` syntax. The latter method for
retrieving a function name is particularly useful if the function name
contains characters that are not allowable in Python variable names.
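Putting the loading steps together, here is a minimal sketch; the
library name ``mylib`` is a hypothetical stand-in for your own shared
library, assumed to sit next to the calling file:

.. code-block:: python

    import os
    import numpy as np

    # load_library tries the platform-specific shared-library
    # extensions (.so, .dylib, .dll), so none is hard-coded here.
    _path = os.path.dirname(os.path.abspath(__file__))
    try:
        lib = np.ctypeslib.load_library('mylib', _path)
    except OSError:
        raise RuntimeError("mylib is not compiled or cannot be found")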
@@ -1022,10 +737,10 @@ Converting arguments
--------------------

Python ints/longs, strings, and unicode objects are automatically
-converted as needed to equivalent c-types arguments The None object is
+converted as needed to equivalent ctypes arguments. The None object is
also converted automatically to a NULL pointer. All other Python
objects must be converted to ctypes-specific types. There are two ways
-around this restriction that allow c-types to integrate with other
+around this restriction that allow ctypes to integrate with other
objects.

1. Don't set the argtypes attribute of the function object and define an
@@ -1040,7 +755,7 @@ objects.

NumPy uses both methods with a preference for the second method
because it can be safer. The ctypes attribute of the ndarray returns
-an object that has an _as_parameter\_ attribute which returns an
+an object that has an ``_as_parameter_`` attribute which returns an
integer representing the address of the ndarray to which it is
associated. As a result, one can pass this ctypes attribute object
directly to a function expecting a pointer to the data in your
@@ -1059,26 +774,26 @@ the ndarray that were specified by the user in the call to :func:`ndpointer`.

Aspects of the ndarray that can be checked include the data-type, the
number-of-dimensions, the shape, and/or the state of the flags on any
array passed. The return value of the from_param method is the ctypes
-attribute of the array which (because it contains the _as_parameter\_
+attribute of the array which (because it contains the ``_as_parameter_``
attribute pointing to the array data area) can be used by ctypes
directly.

The ctypes attribute of an ndarray is also endowed with additional
attributes that may be convenient when passing additional information
about the array into a ctypes function. The attributes **data**,
-**shape**, and **strides** can provide c-types compatible types
+**shape**, and **strides** can provide ctypes compatible types
corresponding to the data-area, the shape, and the strides of the
array. The data attribute returns a ``c_void_p`` representing a
pointer to the data area. The shape and strides attributes each return
an array of ctypes integers (or None representing a NULL pointer, if a
0-d array). The base ctype of the array is a ctype integer of the same
size as a pointer on the platform. There are also methods
-data_as({ctype}), shape_as(<base ctype>), and strides_as(<base
-ctype>). These return the data as a ctype object of your choice and
+``data_as({ctype})``, ``shape_as(<base ctype>)``, and ``strides_as(<base
+ctype>)``. These return the data as a ctype object of your choice and
the shape/strides arrays using an underlying base type of your choice.

-For convenience, the **ctypeslib** module also contains **c_intp** as
+For convenience, the ``ctypeslib`` module also contains ``c_intp`` as
a ctypes integer data-type whose size is the same as the size of
-``c_void_p`` on the platform (it's value is None if ctypes is not
+``c_void_p`` on the platform (its value is None if ctypes is not
installed).

@@ -1086,13 +801,13 @@ Calling the function
--------------------

The function is accessed as an attribute of or an item from the loaded
-shared-library. Thus, if "./mylib.so" has a function named
-"cool_function1" , I could access this function either as:
+shared-library. Thus, if ``./mylib.so`` has a function named
+``cool_function1``, I could access this function either as:
.. code-block:: python

    lib = numpy.ctypeslib.load_library('mylib','.')
-    func1 = lib.cool_function1  # or equivalently
+    func1 = lib.cool_function1 # or equivalently
    func1 = lib['cool_function1']

In ctypes, the return-value of a function is set to be 'int' by
@@ -1126,11 +841,11 @@ signature
necessary requirements.

Using an ndpointer class in the argtypes method can make it
-significantly safer to call a C-function using ctypes and the data-
+significantly safer to call a C function using ctypes and the data-
area of an ndarray. You may still want to wrap the function in an
additional Python wrapper to make it user-friendly (hiding some
obvious arguments and making some arguments output arguments). In this
-process, the **requires** function in NumPy may be useful to return the right
+process, the ``requires`` function in NumPy may be useful to return the right
kind of array from a given input.

@@ -1139,9 +854,9 @@ Complete example

In this example, I will show how the addition function and the filter
function implemented previously using the other approaches can be
-implemented using ctypes. First, the C-code which implements the
-algorithms contains the functions zadd, dadd, sadd, cadd, and
-dfilter2d. The zadd function is:
+implemented using ctypes. First, the C code which implements the
+algorithms contains the functions ``zadd``, ``dadd``, ``sadd``, ``cadd``,
+and ``dfilter2d``. The ``zadd`` function is:

.. code-block:: c

@@ -1157,8 +872,8 @@ dfilter2d. The zadd function is:
    }
}

-with similar code for cadd, dadd, and sadd that handles complex float,
-double, and float data-types, respectively:
+with similar code for ``cadd``, ``dadd``, and ``sadd`` that handles complex
+float, double, and float data-types, respectively:

.. code-block:: c

@@ -1183,7 +898,7 @@ double, and float data-types, respectively:
    }
}

-The code.c file also contains the function dfilter2d:
+The ``code.c`` file also contains the function ``dfilter2d``:

.. code-block:: c

@@ -1216,14 +931,14 @@ A possible advantage this code has over the Fortran-equivalent
code is that it takes arbitrarily strided (i.e. non-contiguous) arrays
and may also run faster depending on the optimization capability of your
compiler. But it is obviously more complicated than the simple code
-in filter.f. This code must be compiled into a shared library. On my
+in ``filter.f``. This code must be compiled into a shared library. On my
Linux system this is accomplished using::

    gcc -o code.so -shared code.c

This creates a shared library named code.so in the current directory.
-On Windows don't forget to either add __declspec(dllexport) in front
-of void on the line preceeding each function definition, or write a
+On Windows don't forget to either add ``__declspec(dllexport)`` in front
+of void on the line preceding each function definition, or write a
code.def file that lists the names of the functions to be exported.

A suitable Python interface to this shared library should be
@@ -1254,7 +969,7 @@ following lines at the top:

                           'writeable'), N.ctypeslib.c_intp]

-This code loads the shared library named code.{ext} located in the
+This code loads the shared library named ``code.{ext}`` located in the
same path as this file. It then adds a return type of void to the
functions contained in the library. It also adds argument checking to
the functions in the library so that ndarrays can be passed as the
@@ -1326,13 +1041,13 @@ Conclusion

   single: ctypes

Using ctypes is a powerful way to connect Python with arbitrary
-C-code.
It's advantages for extending Python include
-
-- clean separation of C-code from Python code
+C-code. Its advantages for extending Python include
+
+- clean separation of C code from Python code

- no need to learn a new syntax except Python and C

-- allows re-use of C-code
+- allows re-use of C code

- functionality in shared libraries written for other purposes can be
  obtained with a simple Python wrapper and search for the library.
@@ -1342,7 +1057,7 @@ C-code. It's advantages for extending Python include

- full argument checking with the ndpointer class factory

-It's disadvantages include
+Its disadvantages include

- It is difficult to distribute an extension module made using ctypes
  because of a lack of support for building shared libraries in
@@ -1350,15 +1065,14 @@ It's disadvantages include

- You must have shared-libraries of your code (no static libraries).

-- Very little support for C++ code and it's different library-calling
-  conventions. You will probably need a C-wrapper around C++ code to use
+- Very little support for C++ code and its different library-calling
+  conventions. You will probably need a C wrapper around C++ code to use
  with ctypes (or just use Boost.Python instead).

Because of the difficulty in distributing an extension module made
-using ctypes, f2py is still the easiest way to extend Python for
-package creation. However, ctypes is a close second and will probably
-be growing in popularity now that it is part of the Python
-distribution. This should bring more features to ctypes that should
-eliminate the difficulty in extending Python and distributing the
-extension using ctypes.
+using ctypes, f2py and Cython are still the easiest ways to extend Python
+for package creation. However, ctypes is in some cases a useful alternative.

@@ -1367,10 +1081,10 @@ Additional tools you may find useful
====================================

These tools have been found useful by others using Python and so are
-included here. They are discussed separately because I see them as
-either older ways to do things more modernly handled by f2py, weave,
-Pyrex, or ctypes (SWIG, PyFort, PyInline) or because I don't know much
-about them (SIP, Boost, Instant). I have not added links to these
+included here. They are discussed separately because they are
+either older ways to do things now handled by f2py, Cython, or ctypes
+(SWIG, PyFort) or because I don't know much about them (SIP, Boost).
+I have not added links to these
methods because my experience is that you can find the most relevant
link faster using Google or some other search engine, and any links
provided here would be quickly dated. Do not assume that just because
@@ -1452,70 +1166,6 @@ have a set of C++ classes that need to be integrated cleanly into
Python, consider learning about and using Boost.Python.


-Instant
--------
-
-.. index::
-   single: Instant
-
-This is a relatively new package (called pyinstant at sourceforge)
-that builds on top of SWIG to make it easy to inline C and C++ code in
-Python very much like weave. However, Instant builds extension modules
-on the fly with specific module names and specific method names. In
-this repsect it is more more like f2py in its behavior. The extension
-modules are built on-the fly (as long as the SWIG is installed). They
-can then be imported. Here is an example of using Instant with NumPy
-arrays (adapted from the test2 included in the Instant distribution):
-
-.. code-block:: python
-
-    code="""
-    PyObject* add(PyObject* a_, PyObject* b_){
-      /*
-      various checks
-      */
-      PyArrayObject* a=(PyArrayObject*) a_;
-      PyArrayObject* b=(PyArrayObject*) b_;
-      int n = a->dimensions[0];
-      int dims[1];
-      dims[0] = n;
-      PyArrayObject* ret;
-      ret = (PyArrayObject*) PyArray_FromDims(1, dims, NPY_DOUBLE);
-      int i;
-      char *aj=a->data;
-      char *bj=b->data;
-      double *retj = (double *)ret->data;
-      for (i=0; i < n; i++) {
-        *retj++ = *((double *)aj) + *((double *)bj);
-        aj += a->strides[0];
-        bj += b->strides[0];
-      }
-      return (PyObject *)ret;
-    }
-    """
-    import Instant, numpy
-    ext = Instant.Instant()
-    ext.create_extension(code=s, headers=["numpy/arrayobject.h"],
-          include_dirs=[numpy.get_include()],
-          init_code='import_array();', module="test2b_ext")
-    import test2b_ext
-    a = numpy.arange(1000)
-    b = numpy.arange(1000)
-    d = test2b_ext.add(a,b)
-
-Except perhaps for the dependence on SWIG, Instant is a
-straightforward utility for writing extension modules.
-
-
-PyInline
---------
-
-This is a much older module that allows automatic building of
-extension modules so that C-code can be included with Python code.
-It's latest release (version 0.03) was in 2001, and it appears that it
-is not being updated.
-
-
PyFort
------

diff --git a/doc/source/user/install.rst b/doc/source/user/install.rst
index 29aeff6a3..dcf20498c 100644
--- a/doc/source/user/install.rst
+++ b/doc/source/user/install.rst
@@ -12,15 +12,15 @@ Windows
-------

Good solutions for Windows are `Enthought Canopy
-<https://www.enthought.com/products/canopy/>`_ (which provides binary
-installers for Windows, OS X and Linux) and `Python (x, y)
-<http://www.pythonxy.com>`_. Both of these packages include Python, NumPy and
-many additional packages.
+<https://www.enthought.com/products/canopy/>`_, `Anaconda
+<http://continuum.io/downloads.html>`_ (which both provide binary installers
+for Windows, OS X and Linux) and `Python (x, y) <http://www.pythonxy.com>`_.
+All of these packages include Python, NumPy and many additional packages.

A lightweight alternative is to download the Python
installer from `www.python.org <http://www.python.org>`_ and the NumPy
-installer for your Python version from the Sourceforge `download site <http://
-sourceforge.net/project/showfiles.php?group_id=1369&package_id=175103>`_
+installer for your Python version from the Sourceforge `download site
+<http://sourceforge.net/projects/numpy/files/NumPy>`_.

The NumPy installer includes binaries for different CPUs (without SSE
instructions, with SSE2 or with SSE3) and installs the correct one
@@ -28,25 +28,27 @@ automatically. If needed, this can be bypassed from the command
line with ::

    numpy-<1.y.z>-superpack-win32.exe /arch nosse

-or 'sse2' or 'sse3' instead of 'nosse'.
+or ``sse2`` or ``sse3`` instead of ``nosse``.


Linux
-----

-Most of the major distributions provide packages for NumPy, but these can lag
-behind the most recent NumPy release. Pre-built binary packages for Ubuntu are
-available on the `scipy ppa
-<https://edge.launchpad.net/~scipy/+archive/ppa>`_. Redhat binaries are
-available in the `Enthought Canopy
-<https://www.enthought.com/products/canopy/>`_.
+All major distributions provide packages for NumPy. These are usually
+reasonably up-to-date, but sometimes lag behind the most recent NumPy release.


Mac OS X
--------

-A universal binary installer for NumPy is available from the `download site
-<http://sourceforge.net/project/showfiles.php?group_id=1369&
-package_id=175103>`_.
-The `Enthought Canopy
-<https://www.enthought.com/products/canopy/>`_ provides NumPy binaries.
+Universal binary installers for NumPy are available from the `download site
+<http://sourceforge.net/projects/numpy/files/NumPy>`_, and wheel packages
+from PyPI. With a recent version of `pip <https://pip.pypa.io/en/latest/>`_
+this will give you a binary install (from the wheel packages) compatible
+with python.org Python, Homebrew and MacPorts::
+
+    pip install numpy
+
+
+.. _building-from-source:

Building from source
====================

@@ -82,7 +84,7 @@ Building NumPy requires the following software installed:

   Note that NumPy is developed mainly using GNU compilers. Compilers from
   other vendors such as Intel, Absoft, Sun, NAG, Compaq, Vast, Portland,
   Lahey, HP, IBM, Microsoft are only supported in the form of community
-   feedback, and may not work out of the box. GCC 3.x (and later) compilers
+   feedback, and may not work out of the box. GCC 4.x (and later) compilers
   are recommended.

3) Linear Algebra libraries
@@ -93,6 +95,42 @@

   can be used, including optimized LAPACK libraries such as ATLAS, MKL or
   the Accelerate/vecLib framework on OS X.

+Basic Installation
+------------------
+
+To install NumPy run::
+
+    python setup.py install
+
+To perform an in-place build that can be run from the source folder run::
+
+    python setup.py build_ext --inplace
+
+The NumPy build system uses ``distutils`` and ``numpy.distutils``.
+``setuptools`` is only used when building via ``pip`` or with ``python
+setupegg.py``. Using ``virtualenv`` should work as expected.
+
+*Note: for build instructions to do development work on NumPy itself, see
+:ref:`development-environment`*.
+
+.. _parallel-builds:
+
+Parallel builds
+~~~~~~~~~~~~~~~
+
+From NumPy 1.10.0 on it's also possible to do a parallel build with::
+
+    python setup.py build -j 4 install --prefix $HOME/.local
+
+This will compile numpy on 4 CPUs and install it into the specified prefix.
+To perform a parallel in-place build, run::
+
+    python setup.py build_ext --inplace -j 4
+
+The number of build jobs can also be specified via the environment variable
+``NPY_NUM_BUILD_JOBS``.
+

FORTRAN ABI mismatch
--------------------
@@ -147,35 +185,10 @@ Additional compiler flags can be supplied by setting the ``OPT``,

Building with ATLAS support
---------------------------

-Ubuntu 8.10 (Intrepid) and 9.04 (Jaunty)
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Ubuntu
+~~~~~~

-You can install the necessary packages for optimized ATLAS with this command::
+You can install the necessary package for optimized ATLAS with this command::

    sudo apt-get install libatlas-base-dev

-If you have a recent CPU with SIMD suppport (SSE, SSE2, etc...), you should
-also install the corresponding package for optimal performances.
For example,
-for SSE2::
-
-    sudo apt-get install atlas3-sse2