DOC: stop referring to 'S' dtype as string

The S dtype is zero-terminated bytes, which happens to match what Python 2 called strings. As this is not the case in Python 3, we should stop misnaming it in our documentation. [ci skip]
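
The distinction the message draws can be checked with a minimal sketch on Python 3. It is an illustration, not part of the commit, and the variable names are made up for this example. It uses the typestrings ``'S'`` and ``'U'`` rather than the ``np.unicode_`` alias, on the assumption that the alias may be unavailable in newer NumPy releases.

```python
import numpy as np

# 'S5' holds up to 5 bytes per element; 'U5' holds up to 5 Unicode characters.
bytes_arr = np.array([b"abc", b"de"], dtype="S5")
text_arr = np.array(["abc", "de"], dtype="U5")

# On Python 3, elements of an 'S' array come back as bytes, not str.
bytes_item = bytes_arr[0]
text_item = text_arr[0]

# Trailing NUL bytes are treated as padding and are dropped on retrieval,
# which is why 'S' is unsuitable for arbitrary binary data.
padded_item = np.array([b"ab\x00\x00"], dtype="S4")[0]

print(type(bytes_item), type(text_item), padded_item)
```

Both scalar types subclass the corresponding Python type, so ``isinstance(bytes_item, bytes)`` and ``isinstance(text_item, str)`` hold. Each ``'U'`` character occupies four bytes, so ``text_arr`` uses more memory than ``bytes_arr`` for the same ASCII content.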
author: Julian Taylor <jtaylor.debian@googlemail.com> 2017-04-13 20:24:04 +0200
committer: Julian Taylor <jtaylor.debian@googlemail.com> 2017-04-22 12:15:55 +0200
commit: 0107956102d914a68f157d80614f207f78dffb96 (patch)
tree: b812f602843b45ad7091db8a6377c9130b54a9e3 /doc/source
parent: 0f3846aaabc7c1278e9aeab7aafa0b8d4a1b264e (diff)
download: numpy-0107956102d914a68f157d80614f207f78dffb96.tar.gz
4 files changed, 45 insertions, 24 deletions
diff --git a/doc/source/reference/arrays.dtypes.rst b/doc/source/reference/arrays.dtypes.rst
index 594b94608..031fb6529 100644
--- a/doc/source/reference/arrays.dtypes.rst
+++ b/doc/source/reference/arrays.dtypes.rst
@@ -85,9 +85,9 @@ Sub-arrays always have a C-contiguous memory layout.
    A structured data type containing a 16-character string (in field 'name')
    and a sub-array of two 64-bit floating-point number (in field 'grades'):
 
-   >>> dt = np.dtype([('name', np.str_, 16), ('grades', np.float64, (2,))])
+   >>> dt = np.dtype([('name', np.unicode_, 16), ('grades', np.float64, (2,))])
    >>> dt['name']
-   dtype('|S16')
+   dtype('|U16')
    >>> dt['grades']
    dtype(('float64',(2,)))
 
@@ -178,12 +178,18 @@ Built-in Python types
     :class:`bool`     :class:`bool\_`
     :class:`float`    :class:`float\_`
     :class:`complex`  :class:`cfloat`
-    :class:`str`      :class:`string`
+    :class:`bytes`    :class:`bytes\_`
+    :class:`str`      :class:`bytes\_` (Python2) or :class:`unicode\_` (Python3)
     :class:`unicode`  :class:`unicode\_`
     :class:`buffer`   :class:`void`
     (all others)      :class:`object_`
     ================  ===============
 
+    Note that ``str`` refers to either null terminated bytes or unicode strings
+    depending on the Python version. In code targetting both Python 2 and 3
+    ``np.unicode_`` should be used as a dtype for strings.
+    See :ref:`Note on string types<string-dtype-note>`.
+
     .. admonition:: Example
 
        >>> dt = np.dtype(float)   # Python-compatible floating-point number
@@ -225,7 +231,9 @@ Array-protocol type strings (see :ref:`arrays.interface`)
    supported kinds are
 
    ================   ========================
-   ``'b'``            boolean
+   ``'?'``            boolean
+   ``'b'``            (signed) byte
+   ``'B'``            unsigned byte
    ``'i'``            (signed) integer
    ``'u'``            unsigned integer
    ``'f'``            floating-point
@@ -233,8 +241,8 @@ Array-protocol type strings (see :ref:`arrays.interface`)
    ``'m'``            timedelta
    ``'M'``            datetime
    ``'O'``            (Python) objects
-   ``'S'``, ``'a'``   (byte-)string
-   ``'U'``            Unicode
+   ``'S'``, ``'a'``   zero-terminated bytes (not recommended)
+   ``'U'``            Unicode string
    ``'V'``            raw data (:class:`void`)
    ================   ========================
 
@@ -243,7 +251,19 @@ Array-protocol type strings (see :ref:`arrays.interface`)
       >>> dt = np.dtype('i4')   # 32-bit signed integer
       >>> dt = np.dtype('f8')   # 64-bit floating-point number
       >>> dt = np.dtype('c16')  # 128-bit complex floating-point number
-      >>> dt = np.dtype('a25')  # 25-character string
+      >>> dt = np.dtype('a25')  # 25-length zero-terminated bytes
+      >>> dt = np.dtype('U25')  # 25-character string
+
+   .. _string-dtype-note:
+
+   .. admonition:: Note on string types
+
+    For backward compatibility with Python 2 the ``S`` and ``a`` typestrings
+    remain zero-terminated bytes and ``np.string_`` continues to map to
+    ``np.bytes_``.
+    To use actual strings in Python 3 use ``U`` or ``np.unicode_``.
+    For signed bytes that do not need zero-termination ``b`` or ``i1`` can be
+    used.
 
 String with comma-separated fields
 
@@ -297,8 +317,7 @@ Type strings
 
     .. admonition:: Example
 
-       >>> dt = np.dtype((void, 10))  # 10-byte wide data block
-       >>> dt = np.dtype((str, 35))   # 35-character string
+       >>> dt = np.dtype((np.void, 10))  # 10-byte wide data block
        >>> dt = np.dtype(('U', 10))   # 10-character unicode string
 
 ``(fixed_dtype, shape)``
@@ -315,7 +334,7 @@ Type strings
     .. admonition:: Example
 
        >>> dt = np.dtype((np.int32, (2,2)))          # 2 x 2 integer sub-array
-       >>> dt = np.dtype(('S10', 1))                 # 10-character string
+       >>> dt = np.dtype(('U10', 1))                 # 10-character string
        >>> dt = np.dtype(('i4, (2,3)f8, f4', (2,3))) # 2 x 3 structured sub-array
 
 .. index::
@@ -421,7 +440,7 @@ Type strings
        byte position 0), ``col2`` (32-bit float at byte position 10),
        and ``col3`` (integers at byte position 14):
 
-       >>> dt = np.dtype({'col1': ('S10', 0), 'col2': (float32, 10),
+       >>> dt = np.dtype({'col1': ('U10', 0), 'col2': (float32, 10),
            'col3': (int, 14)})
 
 ``(base_dtype, new_dtype)``
diff --git a/doc/source/reference/arrays.scalars.rst b/doc/source/reference/arrays.scalars.rst
index f76087ce2..9c4f05f75 100644
--- a/doc/source/reference/arrays.scalars.rst
+++ b/doc/source/reference/arrays.scalars.rst
@@ -71,7 +71,7 @@ Array scalar type     Related Python type
 :class:`int_`         :class:`IntType` (Python 2 only)
 :class:`float_`       :class:`FloatType`
 :class:`complex_`     :class:`ComplexType`
-:class:`str_`         :class:`StringType`
+:class:`bytes_`       :class:`BytesType`
 :class:`unicode_`     :class:`UnicodeType`
 ====================  ================================
 
@@ -193,15 +193,17 @@ size: the data they describe can be of different length in different
 arrays. (In the character codes ``#`` is an integer denoting how many
 elements the data type consists of.)
 
-===================  =============================  ========
-:class:`str_`        compatible: Python str         ``'S#'``
-:class:`unicode_`    compatible: Python unicode     ``'U#'``
-:class:`void`                                       ``'V#'``
-===================  =============================  ========
+===================  ==============================  ========
+:class:`bytes_`      compatible: Python bytes        ``'S#'``
+:class:`unicode_`    compatible: Python unicode/str  ``'U#'``
+:class:`void`                                        ``'V#'``
+===================  ==============================  ========
 
 
 .. warning::
 
+   See :ref:`Note on string types<string-dtype-note>`.
+
    Numeric Compatibility: If you used old typecode characters in your
    Numeric code (which was never recommended), you will need to change
    some of them to the new characters. In particular, the needed
diff --git a/doc/source/reference/c-api.types-and-structures.rst b/doc/source/reference/c-api.types-and-structures.rst
index 2116a9912..255c348f9 100644
--- a/doc/source/reference/c-api.types-and-structures.rst
+++ b/doc/source/reference/c-api.types-and-structures.rst
@@ -218,7 +218,7 @@ PyArrayDescr_Type
     interface typestring notation). A 'b' represents Boolean, a 'i'
     represents signed integer, a 'u' represents unsigned integer, 'f'
     represents floating point, 'c' represents complex floating point, 'S'
-    represents 8-bit character string, 'U' represents 32-bit/character
+    represents 8-bit zero-terminated bytes, 'U' represents 32-bit/character
     unicode string, and 'V' represents arbitrary.
 
 .. c:member:: char PyArray_Descr.type
@@ -300,7 +300,7 @@ PyArrayDescr_Type
     .. c:function:: PyDataType_REFCHK(PyArray_Descr *dtype)
 
         Equivalent to :c:func:`PyDataType_FLAGCHK` (*dtype*,
- 	:c:data:`NPY_ITEM_REFCOUNT`).
+        :c:data:`NPY_ITEM_REFCOUNT`).
 
 .. c:member:: int PyArray_Descr.type_num
 
diff --git a/doc/source/user/basics.io.genfromtxt.rst b/doc/source/user/basics.io.genfromtxt.rst
index 5870b5af2..1048ab725 100644
--- a/doc/source/user/basics.io.genfromtxt.rst
+++ b/doc/source/user/basics.io.genfromtxt.rst
@@ -96,15 +96,15 @@ This behavior can be overwritten by setting the optional argument
 
    >>> data = "1, abc , 2\n 3, xxx, 4"
    >>> # Without autostrip
-   >>> np.genfromtxt(BytesIO(data), delimiter=",", dtype="|S5")
+   >>> np.genfromtxt(BytesIO(data), delimiter=",", dtype="|U5")
    array([['1', ' abc ', ' 2'],
           ['3', ' xxx', ' 4']],
-         dtype='|S5')
+         dtype='|U5')
    >>> # With autostrip
-   >>> np.genfromtxt(BytesIO(data), delimiter=",", dtype="|S5", autostrip=True)
+   >>> np.genfromtxt(BytesIO(data), delimiter=",", dtype="|U5", autostrip=True)
    array([['1', 'abc', '2'],
           ['3', 'xxx', '4']],
-         dtype='|S5')
+         dtype='|U5')
 
 
 The :keyword:`comments` argument
@@ -212,7 +212,7 @@ Acceptable values for this argument are:
   (see below).  Note that ``dtype=float`` is the default for
   :func:`~numpy.genfromtxt`.
 * a sequence of types, such as ``dtype=(int, float, float)``.
-* a comma-separated string, such as ``dtype="i4,f8,|S3"``.
+* a comma-separated string, such as ``dtype="i4,f8,|U3"``.
 * a dictionary with two keys ``'names'`` and ``'formats'``.
 * a sequence of tuples ``(name, type)``, such as
   ``dtype=[('A', int), ('B', float)]``.
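
As a rough companion to the ``genfromtxt`` hunk above, the sketch below reads the same sample text into a ``'U5'`` array on Python 3. It is an assumption-laden illustration rather than the documentation's own example: it feeds the text through ``StringIO`` instead of ``BytesIO``, because on Python 3 ``BytesIO`` does not accept a ``str``.

```python
from io import StringIO

import numpy as np

data = "1, abc , 2\n 3, xxx, 4"

# autostrip=True removes the whitespace surrounding each field,
# so every value fits comfortably in five Unicode characters.
stripped = np.genfromtxt(StringIO(data), delimiter=",", dtype="U5",
                         autostrip=True)

print(stripped.dtype.kind)   # 'U' for a Unicode string dtype
print(stripped.tolist())
```

The printed dtype string may carry a byte-order prefix such as ``<`` instead of ``|``, depending on the platform and the NumPy version.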