author Julian Taylor <jtaylor.debian@googlemail.com> 2017-04-13 20:24:04 +0200
committer Julian Taylor <jtaylor.debian@googlemail.com> 2017-04-22 12:15:55 +0200
commit 0107956102d914a68f157d80614f207f78dffb96
tree b812f602843b45ad7091db8a6377c9130b54a9e3 /doc/source
parent 0f3846aaabc7c1278e9aeab7aafa0b8d4a1b264e
DOC: stop referring to 'S' dtype as string
The S dtype is zero-terminated bytes, which happen to match what Python 2 called strings. As this is not the case in Python 3, we should stop misnaming it in our documentation. [ci skip]
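To make the distinction concrete, here is a minimal sketch, assuming Python 3 and a recent NumPy (the variable names are illustrative only). It shows that ``bytes`` maps to the 'S' kind and ``str`` to the 'U' kind. It also shows that 'S' elements drop trailing NUL bytes and come back as ``bytes`` subclasses, while 'U' elements come back as ``str`` subclasses stored at four bytes per character:

```python
import numpy as np

# On Python 3 the builtin types map to different kinds:
# bytes -> 'S' (zero-terminated bytes), str -> 'U' (Unicode text).
print(np.dtype(bytes).kind, np.dtype(str).kind)  # S U

# 'S' stores fixed-width raw bytes; trailing NUL bytes are dropped on access
# and each element comes back as a subclass of bytes.
raw = np.array([b"ab\x00\x00"], dtype="S4")
print(raw[0] == b"ab", isinstance(raw[0], bytes))  # True True

# 'U' stores fixed-width Unicode text at four bytes per character
# and each element comes back as a subclass of str.
text = np.array(["ab"], dtype="U4")
print(isinstance(text[0], str), text.itemsize)  # True 16
```

Because 'S' silently drops trailing NULs, it cannot round-trip arbitrary binary data that ends in zero bytes. This is one reason the patch marks it "not recommended" for text.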
Diffstat (limited to 'doc/source')
-rw-r--r-- doc/source/reference/arrays.dtypes.rst 41
-rw-r--r-- doc/source/reference/arrays.scalars.rst 14
-rw-r--r-- doc/source/reference/c-api.types-and-structures.rst 4
-rw-r--r-- doc/source/user/basics.io.genfromtxt.rst 10
4 files changed, 45 insertions, 24 deletions
diff --git a/doc/source/reference/arrays.dtypes.rst b/doc/source/reference/arrays.dtypes.rst
index 594b94608..031fb6529 100644
--- a/doc/source/reference/arrays.dtypes.rst
+++ b/doc/source/reference/arrays.dtypes.rst
@@ -85,9 +85,9 @@ Sub-arrays always have a C-contiguous memory layout.
A structured data type containing a 16-character string (in field 'name')
and a sub-array of two 64-bit floating-point numbers (in field 'grades'):
- >>> dt = np.dtype([('name', np.str_, 16), ('grades', np.float64, (2,))])
+ >>> dt = np.dtype([('name', np.unicode_, 16), ('grades', np.float64, (2,))])
>>> dt['name']
- dtype('|S16')
+ dtype('<U16')
>>> dt['grades']
dtype(('float64',(2,)))
@@ -178,12 +178,18 @@ Built-in Python types
:class:`bool` :class:`bool\_`
:class:`float` :class:`float\_`
:class:`complex` :class:`cfloat`
- :class:`str` :class:`string`
+ :class:`bytes` :class:`bytes\_`
+ :class:`str` :class:`bytes\_` (Python 2) or :class:`unicode\_` (Python 3)
:class:`unicode` :class:`unicode\_`
:class:`buffer` :class:`void`
(all others) :class:`object_`
================ ===============
+ Note that ``str`` refers to either null-terminated bytes or Unicode strings
+ depending on the Python version. In code targeting both Python 2 and 3,
+ ``np.unicode_`` should be used as a dtype for strings.
+ See :ref:`Note on string types<string-dtype-note>`.
+
.. admonition:: Example
>>> dt = np.dtype(float) # Python-compatible floating-point number
@@ -225,7 +231,9 @@ Array-protocol type strings (see :ref:`arrays.interface`)
supported kinds are
================ ========================
- ``'b'`` boolean
+ ``'?'`` boolean
+ ``'b'`` (signed) byte
+ ``'B'`` unsigned byte
``'i'`` (signed) integer
``'u'`` unsigned integer
``'f'`` floating-point
@@ -233,8 +241,8 @@ Array-protocol type strings (see :ref:`arrays.interface`)
``'m'`` timedelta
``'M'`` datetime
``'O'`` (Python) objects
- ``'S'``, ``'a'`` (byte-)string
- ``'U'`` Unicode
+ ``'S'``, ``'a'`` zero-terminated bytes (not recommended)
+ ``'U'`` Unicode string
``'V'`` raw data (:class:`void`)
================ ========================
@@ -243,7 +251,19 @@ Array-protocol type strings (see :ref:`arrays.interface`)
>>> dt = np.dtype('i4') # 32-bit signed integer
>>> dt = np.dtype('f8') # 64-bit floating-point number
>>> dt = np.dtype('c16') # 128-bit complex floating-point number
- >>> dt = np.dtype('a25') # 25-character string
+ >>> dt = np.dtype('a25') # 25-byte zero-terminated bytes
+ >>> dt = np.dtype('U25') # 25-character string
+
+ .. _string-dtype-note:
+
+ .. admonition:: Note on string types
+
+ For backward compatibility with Python 2, the ``S`` and ``a`` typestrings
+ remain zero-terminated bytes, and ``np.string_`` continues to map to
+ ``np.bytes_``.
+ To use actual strings in Python 3, use ``U`` or ``np.unicode_``.
+ For signed bytes that do not need zero-termination, ``b`` or ``i1`` can be
+ used.
String with comma-separated fields
@@ -297,8 +317,7 @@ Type strings
.. admonition:: Example
- >>> dt = np.dtype((void, 10)) # 10-byte wide data block
- >>> dt = np.dtype((str, 35)) # 35-character string
+ >>> dt = np.dtype((np.void, 10)) # 10-byte wide data block
>>> dt = np.dtype(('U', 10)) # 10-character unicode string
``(fixed_dtype, shape)``
@@ -315,7 +334,7 @@ Type strings
.. admonition:: Example
>>> dt = np.dtype((np.int32, (2,2))) # 2 x 2 integer sub-array
- >>> dt = np.dtype(('S10', 1)) # 10-character string
+ >>> dt = np.dtype(('U10', 1)) # 10-character string
>>> dt = np.dtype(('i4, (2,3)f8, f4', (2,3))) # 2 x 3 structured sub-array
.. index::
@@ -421,7 +440,7 @@ Type strings
byte position 0), ``col2`` (32-bit float at byte position 10),
and ``col3`` (integers at byte position 14):
- >>> dt = np.dtype({'col1': ('S10', 0), 'col2': (float32, 10),
+ >>> dt = np.dtype({'col1': ('U10', 0), 'col2': (np.float32, 10),
'col3': (int, 14)})
``(base_dtype, new_dtype)``
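Each 'U' character occupies four bytes, so ``'U10'`` is 40 bytes wide where ``'S10'`` was 10. The offsets chosen for ``'S10'`` in the example above (col2 at byte 10, col3 at byte 14) therefore fall inside col1 once it becomes ``'U10'``. The sketch below is an illustration, not part of the patch. It checks the item sizes and derives non-overlapping offsets from them. It uses the ``names``/``formats``/``offsets`` dictionary form of ``np.dtype`` rather than the field dictionary shown in the diff. The resulting offsets of 40 and 44 are computed here, not taken from the source.

```python
import numpy as np

# One byte per element for 'S', four bytes per character for 'U'.
print(np.dtype("S10").itemsize, np.dtype("U10").itemsize)  # 10 40

# Lay the fields out back to back so col2 and col3 start after the 40-byte col1.
col1_size = np.dtype("U10").itemsize
col2_size = np.dtype(np.float32).itemsize
dt = np.dtype({
    "names": ["col1", "col2", "col3"],
    "formats": ["U10", np.float32, int],
    "offsets": [0, col1_size, col1_size + col2_size],
})

# dt.fields maps each name to (dtype, byte offset).
print(dt.fields["col2"][1], dt.fields["col3"][1])  # 40 44
```

Computing offsets from ``itemsize`` keeps the layout correct if the width of col1 changes later.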
diff --git a/doc/source/reference/arrays.scalars.rst b/doc/source/reference/arrays.scalars.rst
index f76087ce2..9c4f05f75 100644
--- a/doc/source/reference/arrays.scalars.rst
+++ b/doc/source/reference/arrays.scalars.rst
@@ -71,7 +71,7 @@ Array scalar type Related Python type
:class:`int_` :class:`IntType` (Python 2 only)
:class:`float_` :class:`FloatType`
:class:`complex_` :class:`ComplexType`
-:class:`str_` :class:`StringType`
+:class:`bytes_` :class:`bytes`
:class:`unicode_` :class:`UnicodeType`
==================== ================================
@@ -193,15 +193,17 @@ size: the data they describe can be of different length in different
arrays. (In the character codes ``#`` is an integer denoting how many
elements the data type consists of.)
-=================== ============================= ========
-:class:`str_` compatible: Python str ``'S#'``
-:class:`unicode_` compatible: Python unicode ``'U#'``
-:class:`void` ``'V#'``
-=================== ============================= ========
+=================== ============================== ========
+:class:`bytes_` compatible: Python bytes ``'S#'``
+:class:`unicode_` compatible: Python unicode/str ``'U#'``
+:class:`void` ``'V#'``
+=================== ============================== ========
.. warning::
+ See :ref:`Note on string types<string-dtype-note>`.
+
Numeric Compatibility: If you used old typecode characters in your
Numeric code (which was never recommended), you will need to change
some of them to the new characters. In particular, the needed
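The flexible types in the table above can be inspected directly from Python, as sketched below (assuming Python 3 and a recent NumPy). On Python 3, the type listed here as ``unicode_`` is ``np.str_``. ``np.unicode_`` was only an alias and was removed in NumPy 2.0, so the sketch uses ``np.str_``. The loop prints each code's scalar type and its width in bytes.

```python
import numpy as np

# The '#' in 'S#', 'U#' and 'V#' is the element count: bytes for 'S' and 'V',
# characters (four bytes each) for 'U'.
for code in ("S8", "U8", "V8"):
    dt = np.dtype(code)
    print(code, dt.type.__name__, dt.itemsize)

# On Python 3 the string scalar types subclass the related builtin types.
print(issubclass(np.bytes_, bytes), issubclass(np.str_, str))  # True True
```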
diff --git a/doc/source/reference/c-api.types-and-structures.rst b/doc/source/reference/c-api.types-and-structures.rst
index 2116a9912..255c348f9 100644
--- a/doc/source/reference/c-api.types-and-structures.rst
+++ b/doc/source/reference/c-api.types-and-structures.rst
@@ -218,7 +218,7 @@ PyArrayDescr_Type
interface typestring notation). A 'b' represents Boolean, an 'i'
represents signed integer, a 'u' represents unsigned integer, 'f'
represents floating point, 'c' represents complex floating point, 'S'
- represents 8-bit character string, 'U' represents 32-bit/character
+ represents 8-bit zero-terminated bytes, 'U' represents a 32-bit-per-character
Unicode string, and 'V' represents arbitrary data.
.. c:member:: char PyArray_Descr.type
@@ -300,7 +300,7 @@ PyArrayDescr_Type
.. c:function:: PyDataType_REFCHK(PyArray_Descr *dtype)
Equivalent to :c:func:`PyDataType_FLAGCHK` (*dtype*,
- :c:data:`NPY_ITEM_REFCOUNT`).
+ :c:data:`NPY_ITEM_REFCOUNT`).
.. c:member:: int PyArray_Descr.type_num
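The kind character stored in ``PyArray_Descr.kind`` is also exposed in Python as ``dtype.kind``. The sketch below is an illustration assuming a recent NumPy. It shows that kind characters and the array-protocol type strings listed in ``arrays.dtypes.rst`` are separate alphabets: the type string ``'b'`` means a signed byte, whose kind is ``'i'``, while the boolean type string ``'?'`` has kind ``'b'``.

```python
import numpy as np

# Map each type string to the kind character NumPy reports for it.
codes = ("?", "b", "B", "i4", "f8", "c16", "S3", "U3", "V3")
kinds = {code: np.dtype(code).kind for code in codes}
print(kinds)
```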
diff --git a/doc/source/user/basics.io.genfromtxt.rst b/doc/source/user/basics.io.genfromtxt.rst
index 5870b5af2..1048ab725 100644
--- a/doc/source/user/basics.io.genfromtxt.rst
+++ b/doc/source/user/basics.io.genfromtxt.rst
@@ -96,15 +96,15 @@ This behavior can be overwritten by setting the optional argument
>>> data = "1, abc , 2\n 3, xxx, 4"
>>> # Without autostrip
- >>> np.genfromtxt(BytesIO(data), delimiter=",", dtype="|S5")
+ >>> np.genfromtxt(BytesIO(data), delimiter=",", dtype="|U5")
array([['1', ' abc ', ' 2'],
['3', ' xxx', ' 4']],
- dtype='|S5')
+ dtype='<U5')
>>> # With autostrip
- >>> np.genfromtxt(BytesIO(data), delimiter=",", dtype="|S5", autostrip=True)
+ >>> np.genfromtxt(BytesIO(data), delimiter=",", dtype="|U5", autostrip=True)
array([['1', 'abc', '2'],
['3', 'xxx', '4']],
- dtype='|S5')
+ dtype='<U5')
The :keyword:`comments` argument
@@ -212,7 +212,7 @@ Acceptable values for this argument are:
(see below). Note that ``dtype=float`` is the default for
:func:`~numpy.genfromtxt`.
* a sequence of types, such as ``dtype=(int, float, float)``.
-* a comma-separated string, such as ``dtype="i4,f8,|S3"``.
+* a comma-separated string, such as ``dtype="i4,f8,|U3"``.
* a dictionary with two keys ``'names'`` and ``'formats'``.
* a sequence of tuples ``(name, type)``, such as
``dtype=[('A', int), ('B', float)]``.
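For readers on Python 3, here is a self-contained version of the ``autostrip`` example above. It assumes the input is wrapped in ``io.StringIO`` (a text buffer) instead of ``BytesIO``, because ``BytesIO`` does not accept a ``str`` on Python 3. It passes the type string ``"U5"`` so the fields come back as ``str``.

```python
from io import StringIO

import numpy as np

data = "1, abc , 2\n 3, xxx, 4"

# Without autostrip, whitespace next to the delimiters stays in each field.
raw = np.genfromtxt(StringIO(data), delimiter=",", dtype="U5")

# With autostrip=True, each field is stripped of surrounding whitespace.
stripped = np.genfromtxt(StringIO(data), delimiter=",", dtype="U5", autostrip=True)

print(raw.tolist())
print(stripped.tolist())
```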