Docstring update: lib

author: Pauli Virtanen <pav@iki.fi> 2009-10-02 19:33:33 +0000
committer: Pauli Virtanen <pav@iki.fi> 2009-10-02 19:33:33 +0000
commit: 474d013a3b38c5909a7381cfa0cc2c8203807cfa (patch)
tree: af895af917b636c1a0ddcf94a7134052a6d6e55e /numpy/lib/format.py
parent: f1e3392d6d8813ed146ce1675f65a880634f727b (diff)
download: numpy-474d013a3b38c5909a7381cfa0cc2c8203807cfa.tar.gz
1 files changed, 114 insertions, 35 deletions
diff --git a/numpy/lib/format.py b/numpy/lib/format.py
index 28444613c..3c5fe3209 100644
--- a/numpy/lib/format.py
+++ b/numpy/lib/format.py
@@ -2,57 +2,136 @@
 Define a simple format for saving numpy arrays to disk with the full
 information about them.
 
-WARNING: Due to limitations in the interpretation of structured dtypes, dtypes
-with fields with empty names will have the names replaced by 'f0', 'f1', etc.
-Such arrays will not round-trip through the format entirely accurately. The
-data is intact; only the field names will differ. We are working on a fix for
-this.  This fix will not require a change in the file format. The arrays with
-such structures can still be saved and restored, and the correct dtype may be
-restored by using the `loadedarray.view(correct_dtype)` method.
+The ``.npy`` format is the standard binary file format in NumPy for
+persisting a *single* arbitrary NumPy array on disk. The format stores all
+of the shape and dtype information necessary to reconstruct the array
+correctly even on another machine with a different architecture.
+The format is designed to be as simple as possible while achieving
+its limited goals.
+
+The ``.npz`` format is the standard format for persisting *multiple* NumPy
+arrays on disk. A ``.npz`` file is a zip file containing multiple ``.npy``
+files, one for each array.
+
+Capabilities
+------------
+
+- Can represent all NumPy arrays including nested record arrays and
+  object arrays.
+
+- Represents the data in its native binary form.
+
+- Supports Fortran-contiguous arrays directly.
+
+- Stores all of the necessary information to reconstruct the array
+  including shape and dtype on a machine of a different
+  architecture.  Both little-endian and big-endian arrays are
+  supported, and a file with little-endian numbers will yield
+  a little-endian array on any machine reading the file. The
+  types are described in terms of their actual sizes. For example,
+  if a machine with a 64-bit C "long int" writes out an array with
+  "long ints", a reading machine with 32-bit C "long ints" will yield
+  an array with 64-bit integers.
+
+- Is straightforward to reverse engineer. Datasets often live longer than
+  the programs that created them. A competent developer should be
+  able to create a solution in his preferred programming language to
+  read most ``.npy`` files that he has been given without much
+  documentation.
+
+- Allows memory-mapping of the data. See `open_memmap`.
+
+- Can be read from a filelike stream object instead of an actual file.
+
+- Stores object arrays, i.e. arrays containing elements that are arbitrary
+  Python objects. Files with object arrays cannot be memory-mapped, but
+  can be read and written to disk.
+
+Limitations
+-----------
+
+- Arbitrary subclasses of numpy.ndarray are not completely preserved.
+  Subclasses will be accepted for writing, but only the array data will
+  be written out. A regular numpy.ndarray object will be created
+  upon reading the file.
+
+.. warning::
+
+  Due to limitations in the interpretation of structured dtypes, dtypes
+  with fields with empty names will have the names replaced by 'f0', 'f1',
+  etc. Such arrays will not round-trip through the format entirely
+  accurately. The data is intact; only the field names will differ. We are
+  working on a fix for this. This fix will not require a change in the
+  file format. The arrays with such structures can still be saved and
+  restored, and the correct dtype may be restored by using the
+  ``loadedarray.view(correct_dtype)`` method.
+
+File extensions
+---------------
+
+We recommend using the ``.npy`` and ``.npz`` extensions for files saved
+in this format. This is by no means a requirement; applications may wish
+to use these file formats but use an extension specific to the
+application. In the absence of an obvious alternative, however,
+we suggest using ``.npy`` and ``.npz``.
+
+Version numbering
+-----------------
+
+The version numbering of these formats is independent of NumPy version
+numbering. If the format is upgraded, the code in `numpy.io` will still
+be able to read and write Version 1.0 files.
 
 Format Version 1.0
 ------------------
 
-The first 6 bytes are a magic string: exactly "\\\\x93NUMPY".
+The first 6 bytes are a magic string: exactly ``\\x93NUMPY``.
 
 The next 1 byte is an unsigned byte: the major version number of the file
-format, e.g. \\\\x01.
+format, e.g. ``\\x01``.
 
 The next 1 byte is an unsigned byte: the minor version number of the file
-format, e.g. \\\\x00. Note: the version of the file format is not tied to the
-version of the numpy package.
+format, e.g. ``\\x00``. Note: the version of the file format is not tied
+to the version of the numpy package.
 
-The next 2 bytes form a little-endian unsigned short int: the length of the
-header data HEADER_LEN.
+The next 2 bytes form a little-endian unsigned short int: the length of
+the header data HEADER_LEN.
 
-The next HEADER_LEN bytes form the header data describing the array's format.
-It is an ASCII string which contains a Python literal expression of a
-dictionary. It is terminated by a newline ('\\\\n') and padded with spaces
-('\\\\x20') to make the total length of the magic string + 4 + HEADER_LEN be
-evenly divisible by 16 for alignment purposes.
+The next HEADER_LEN bytes form the header data describing the array's
+format. It is an ASCII string which contains a Python literal expression
+of a dictionary. It is terminated by a newline (``\\n``) and padded with
+spaces (``\\x20``) to make the total length of
+``magic string + 4 + HEADER_LEN`` be evenly divisible by 16 for alignment
+purposes.
 
 The dictionary contains three keys:
 
     "descr" : dtype.descr
-        An object that can be passed as an argument to the numpy.dtype()
-        constructor to create the array's dtype.
+      An object that can be passed as an argument to the `numpy.dtype`
+      constructor to create the array's dtype.
     "fortran_order" : bool
-        Whether the array data is Fortran-contiguous or not. Since
-        Fortran-contiguous arrays are a common form of non-C-contiguity, we
-        allow them to be written directly to disk for efficiency.
+      Whether the array data is Fortran-contiguous or not. Since
+      Fortran-contiguous arrays are a common form of non-C-contiguity,
+      we allow them to be written directly to disk for efficiency.
     "shape" : tuple of int
-        The shape of the array.
-
-For repeatability and readability, the dictionary keys are sorted in alphabetic
-order. This is for convenience only. A writer SHOULD implement this if
-possible. A reader MUST NOT depend on this.
-
-Following the header comes the array data. If the dtype contains Python objects
-(i.e. dtype.hasobject is True), then the data is a Python pickle of the array.
-Otherwise the data is the contiguous (either C- or Fortran-, depending on
-fortran_order) bytes of the array. Consumers can figure out the number of bytes
-by multiplying the number of elements given by the shape (noting that shape=()
-means there is 1 element) by dtype.itemsize.
+      The shape of the array.
+
+For repeatability and readability, the dictionary keys are sorted in
+alphabetic order. This is for convenience only. A writer SHOULD implement
+this if possible. A reader MUST NOT depend on this.
+
+Following the header comes the array data. If the dtype contains Python
+objects (i.e. ``dtype.hasobject is True``), then the data is a Python
+pickle of the array. Otherwise the data is the contiguous (either C-
+or Fortran-, depending on ``fortran_order``) bytes of the array.
+Consumers can figure out the number of bytes by multiplying the number
+of elements given by the shape (noting that ``shape=()`` means there is
+1 element) by ``dtype.itemsize``.
+
+Notes
+-----
+The ``.npy`` format, including reasons for creating it and a comparison of
+alternatives, is described fully in the "npy-format" NEP.
 
 """
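
The "Format Version 1.0" layout described in the docstring can be checked by hand. The following is a minimal sketch, not the implementation in numpy/lib/format.py: the helper name `read_npy_v1_header` is hypothetical, and it assumes `numpy.save` writes a version 1.0 file for a small, non-object array. It reads the magic string, the two version bytes, the little-endian HEADER_LEN, and the header dictionary, then computes the data size as the number of elements times `dtype.itemsize`.

```python
import ast
import io
import math
import struct

import numpy as np


def read_npy_v1_header(stream):
    """Hypothetical helper: parse a version 1.0 .npy header by hand.

    Returns the header dictionary and the offset at which the data starts.
    """
    if stream.read(6) != b"\x93NUMPY":
        raise ValueError("missing .npy magic string")
    major, minor = struct.unpack("<BB", stream.read(2))
    if (major, minor) != (1, 0):
        raise ValueError("this sketch only handles format version 1.0")
    # HEADER_LEN is a little-endian unsigned short.
    (header_len,) = struct.unpack("<H", stream.read(2))
    # The header is an ASCII Python literal, padded with spaces and
    # terminated by a newline.
    header = ast.literal_eval(stream.read(header_len).decode("ascii"))
    # 6 bytes of magic + 2 version bytes + 2 length bytes + HEADER_LEN
    return header, 10 + header_len


# Write a small array with an explicit little-endian dtype, then read it back.
original = np.arange(6, dtype="<f8").reshape(2, 3)
buf = io.BytesIO()
np.save(buf, original)
buf.seek(0)

header, data_offset = read_npy_v1_header(buf)
dtype = np.dtype(header["descr"])
shape = header["shape"]

# math.prod(()) is 1, which matches "shape=() means there is 1 element".
nbytes = math.prod(shape) * dtype.itemsize
order = "F" if header["fortran_order"] else "C"
restored = np.frombuffer(buf.read(nbytes), dtype=dtype).reshape(shape, order=order)
```

The sketch ignores object arrays, whose data section is a pickle rather than raw bytes, and it does not rely on the order of the dictionary keys.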
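
The docstring's statement that a ``.npz`` file is a zip file with one ``.npy`` member per array can be illustrated with the standard `zipfile` module. This is a small sketch; the array names `first` and `second` are chosen here for illustration only.

```python
import io
import zipfile

import numpy as np

# Save two arrays into an in-memory .npz archive.
buf = io.BytesIO()
np.savez(buf, first=np.arange(3), second=np.ones((2, 2)))
buf.seek(0)

# Open the archive as an ordinary zip file and inspect its members.
with zipfile.ZipFile(buf) as archive:
    member_names = sorted(archive.namelist())
    # Each member is itself a complete .npy file.
    with archive.open("first.npy") as member:
        first = np.load(io.BytesIO(member.read()))
```

Each keyword passed to `numpy.savez` becomes a member named after the keyword with ``.npy`` appended, so any zip tool can extract the individual arrays.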