-rw-r--r-- | doc/source/reference/arrays.dtypes.rst              | 41
-rw-r--r-- | doc/source/reference/arrays.scalars.rst             | 14
-rw-r--r-- | doc/source/reference/c-api.types-and-structures.rst |  4
-rw-r--r-- | doc/source/user/basics.io.genfromtxt.rst            | 10
4 files changed, 45 insertions, 24 deletions
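The string-type behaviour that the dtype changes in this patch describe can be checked in a short NumPy session. This is a sketch that assumes Python 3 and a recent NumPy. It uses np.bytes_ and np.str_ rather than np.string_ and np.unicode_, because those two aliases were removed in NumPy 2.0. The variable names are illustrative only.

```python
import numpy as np

# 'S' (older alias 'a') stores fixed-width, zero-terminated bytes; 'U' stores
# fixed-width unicode text. np.bytes_ and np.str_ are the matching scalar types
# (np.string_ / np.unicode_ were Python 3 aliases for them, removed in NumPy 2.0).
raw = np.dtype("S25")
text = np.dtype("U25")
assert raw.type is np.bytes_ and raw.itemsize == 25      # one byte per element
assert text.type is np.str_ and text.itemsize == 25 * 4  # four bytes (UCS-4) per character

# Trailing NUL bytes count as padding: they are dropped when an 'S' element is
# read back, but they are still present in the underlying buffer.
padded = np.array([b"ab\x00\x00"], dtype="S4")
assert padded[0] == b"ab"
assert padded.tobytes() == b"ab\x00\x00"

# The typestring of a 'U' field carries a byte order ('<' or '>'), while 'S'
# has none ('|'); this is why a unicode field is shown as '<U16', not '|U16'.
rec = np.dtype([("name", np.str_, 16), ("grades", np.float64, (2,))])
print(rec["name"].str)      # prints <U16 on a little-endian machine
print(np.dtype("S16").str)  # prints |S16

# Single-character codes: '?' is boolean, 'b' (same as 'i1') a signed byte and
# 'B' an unsigned byte; the dtype *kind* reported for booleans is still 'b'.
assert np.dtype("?") == np.bool_ and np.dtype("?").kind == "b"
assert np.dtype("b") == np.dtype("i1") == np.int8
assert np.dtype("B") == np.uint8
```

The byte-order check is written against .str rather than the repr so that it holds on both little- and big-endian machines.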
diff --git a/doc/source/reference/arrays.dtypes.rst b/doc/source/reference/arrays.dtypes.rst
index 594b94608..031fb6529 100644
--- a/doc/source/reference/arrays.dtypes.rst
+++ b/doc/source/reference/arrays.dtypes.rst
@@ -85,9 +85,9 @@ Sub-arrays always have a C-contiguous memory layout.
    A structured data type containing a 16-character string (in field 'name')
    and a sub-array of two 64-bit floating-point number (in field 'grades'):
 
-   >>> dt = np.dtype([('name', np.str_, 16), ('grades', np.float64, (2,))])
+   >>> dt = np.dtype([('name', np.unicode_, 16), ('grades', np.float64, (2,))])
    >>> dt['name']
-   dtype('|S16')
+   dtype('<U16')
    >>> dt['grades']
    dtype(('float64',(2,)))
 
@@ -178,12 +178,18 @@ Built-in Python types
     :class:`bool`    :class:`bool\_`
     :class:`float`   :class:`float\_`
     :class:`complex` :class:`cfloat`
-    :class:`str`     :class:`string`
+    :class:`bytes`   :class:`bytes\_`
+    :class:`str`     :class:`bytes\_` (Python 2) or :class:`unicode\_` (Python 3)
     :class:`unicode` :class:`unicode\_`
     :class:`buffer`  :class:`void`
     (all others)     :class:`object_`
     ================ ===============
 
+    Note that ``str`` refers to null-terminated bytes on Python 2 and to
+    unicode strings on Python 3. In code targeting both Python 2 and 3,
+    ``np.unicode_`` should be used as the dtype for strings.
+    See :ref:`Note on string types<string-dtype-note>`.
+
     .. admonition:: Example
 
        >>> dt = np.dtype(float)   # Python-compatible floating-point number
@@ -225,7 +231,9 @@ Array-protocol type strings (see :ref:`arrays.interface`)
     supported kinds are
 
     ================ ========================
-    ``'b'``          boolean
+    ``'?'``          boolean
+    ``'b'``          (signed) byte
+    ``'B'``          unsigned byte
     ``'i'``          (signed) integer
     ``'u'``          unsigned integer
     ``'f'``          floating-point
@@ -233,8 +241,8 @@ Array-protocol type strings (see :ref:`arrays.interface`)
     ``'m'``          timedelta
     ``'M'``          datetime
     ``'O'``          (Python) objects
-    ``'S'``, ``'a'`` (byte-)string
-    ``'U'``          Unicode
+    ``'S'``, ``'a'`` zero-terminated bytes (not recommended)
+    ``'U'``          Unicode string
     ``'V'``          raw data (:class:`void`)
     ================ ========================
 
@@ -243,7 +251,19 @@ Array-protocol type strings (see :ref:`arrays.interface`)
        >>> dt = np.dtype('i4')   # 32-bit signed integer
        >>> dt = np.dtype('f8')   # 64-bit floating-point number
        >>> dt = np.dtype('c16')  # 128-bit complex floating-point number
-       >>> dt = np.dtype('a25')  # 25-character string
+       >>> dt = np.dtype('a25')  # zero-terminated bytes of length 25
+       >>> dt = np.dtype('U25')  # 25-character string
+
+    .. _string-dtype-note:
+
+    .. admonition:: Note on string types
+
+       For backward compatibility with Python 2, the ``S`` and ``a`` typestrings
+       remain zero-terminated bytes and ``np.string_`` continues to map to
+       ``np.bytes_``.
+       To use actual strings in Python 3, use ``U`` or ``np.unicode_``.
+       For signed bytes that do not need zero-termination, ``b`` or ``i1`` can be
+       used.
 
 
 String with comma-separated fields
@@ -297,8 +317,7 @@ Type strings
 
     .. admonition:: Example
 
-       >>> dt = np.dtype((void, 10))  # 10-byte wide data block
-       >>> dt = np.dtype((str, 35))   # 35-character string
+       >>> dt = np.dtype((np.void, 10))  # 10-byte wide data block
        >>> dt = np.dtype(('U', 10))   # 10-character unicode string
 
     ``(fixed_dtype, shape)``
@@ -315,7 +334,7 @@ Type strings
     .. admonition:: Example
 
        >>> dt = np.dtype((np.int32, (2,2)))          # 2 x 2 integer sub-array
-       >>> dt = np.dtype(('S10', 1))                 # 10-character string
+       >>> dt = np.dtype(('U10', 1))                 # 10-character string
        >>> dt = np.dtype(('i4, (2,3)f8, f4', (2,3))) # 2 x 3 structured sub-array
 
     .. index::
@@ -421,7 +440,7 @@ Type strings
        byte position 0), ``col2`` (32-bit float at byte position 10),
        and ``col3`` (integers at byte position 14):
 
-       >>> dt = np.dtype({'col1': ('S10', 0), 'col2': (float32, 10),
+       >>> dt = np.dtype({'col1': ('U10', 0), 'col2': (np.float32, 10),
            'col3': (int, 14)})
 
     ``(base_dtype, new_dtype)``
diff --git a/doc/source/reference/arrays.scalars.rst b/doc/source/reference/arrays.scalars.rst
index f76087ce2..9c4f05f75 100644
--- a/doc/source/reference/arrays.scalars.rst
+++ b/doc/source/reference/arrays.scalars.rst
@@ -71,7 +71,7 @@ Array scalar type      Related Python type
 :class:`int_`        :class:`IntType` (Python 2 only)
 :class:`float_`      :class:`FloatType`
 :class:`complex_`    :class:`ComplexType`
-:class:`str_`        :class:`StringType`
+:class:`bytes_`      :class:`BytesType`
 :class:`unicode_`    :class:`UnicodeType`
 ==================== ================================
 
@@ -193,15 +193,17 @@ size: the data they describe can be of different length in different arrays.
 (In the character codes ``#`` is an integer denoting how many
 elements the data type consists of.)
 
-=================== ============================= ========
-:class:`str_`       compatible: Python str        ``'S#'``
-:class:`unicode_`   compatible: Python unicode    ``'U#'``
-:class:`void`                                     ``'V#'``
-=================== ============================= ========
+=================== ============================== ========
+:class:`bytes_`     compatible: Python bytes       ``'S#'``
+:class:`unicode_`   compatible: Python unicode/str ``'U#'``
+:class:`void`                                      ``'V#'``
+=================== ============================== ========
 
 
 .. warning::
 
+   See :ref:`Note on string types<string-dtype-note>`.
+
    Numeric Compatibility: If you used old typecode characters in your
    Numeric code (which was never recommended), you will need to change
    some of them to the new characters. In particular, the needed
diff --git a/doc/source/reference/c-api.types-and-structures.rst b/doc/source/reference/c-api.types-and-structures.rst
index 2116a9912..255c348f9 100644
--- a/doc/source/reference/c-api.types-and-structures.rst
+++ b/doc/source/reference/c-api.types-and-structures.rst
@@ -218,7 +218,7 @@ PyArrayDescr_Type
     interface typestring notation). A 'b' represents Boolean, a 'i'
     represents signed integer, a 'u' represents unsigned integer, 'f'
     represents floating point, 'c' represents complex floating point, 'S'
-    represents 8-bit character string, 'U' represents 32-bit/character
+    represents 8-bit zero-terminated bytes, 'U' represents 32-bit/character
     unicode string, and 'V' represents arbitrary.
 
 .. c:member:: char PyArray_Descr.type
@@ -300,7 +300,7 @@ PyArrayDescr_Type
 .. c:function:: PyDataType_REFCHK(PyArray_Descr *dtype)
 
     Equivalent to :c:func:`PyDataType_FLAGCHK` (*dtype*,
-    :c:data:`NPY_ITEM_REFCOUNT`).
+    :c:data:`NPY_ITEM_REFCOUNT`).
 
 .. c:member:: int PyArray_Descr.type_num
 
diff --git a/doc/source/user/basics.io.genfromtxt.rst b/doc/source/user/basics.io.genfromtxt.rst
index 5870b5af2..1048ab725 100644
--- a/doc/source/user/basics.io.genfromtxt.rst
+++ b/doc/source/user/basics.io.genfromtxt.rst
@@ -96,15 +96,15 @@ This behavior can be overwritten by setting the optional argument
 
    >>> data = "1, abc , 2\n 3, xxx, 4"
    >>> # Without autostrip
-   >>> np.genfromtxt(BytesIO(data), delimiter=",", dtype="|S5")
+   >>> np.genfromtxt(BytesIO(data), delimiter=",", dtype="|U5")
    array([['1', ' abc ', ' 2'],
           ['3', ' xxx', ' 4']],
-         dtype='|S5')
+         dtype='<U5')
    >>> # With autostrip
-   >>> np.genfromtxt(BytesIO(data), delimiter=",", dtype="|S5", autostrip=True)
+   >>> np.genfromtxt(BytesIO(data), delimiter=",", dtype="|U5", autostrip=True)
    array([['1', 'abc', '2'],
           ['3', 'xxx', '4']],
-         dtype='|S5')
+         dtype='<U5')
 
 
 The :keyword:`comments` argument
@@ -212,7 +212,7 @@ Acceptable values for this argument are:
   (see below). Note that ``dtype=float`` is the default for
   :func:`~numpy.genfromtxt`.
 * a sequence of types, such as ``dtype=(int, float, float)``.
-* a comma-separated string, such as ``dtype="i4,f8,|S3"``.
+* a comma-separated string, such as ``dtype="i4,f8,|U3"``.
 * a dictionary with two keys ``'names'`` and ``'formats'``.
 * a sequence of tuples ``(name, type)``, such as
   ``dtype=[('A', int), ('B', float)]``.
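The autostrip behaviour from the genfromtxt hunk can be reproduced outside the documentation. This is a sketch, not the documentation's own example: it assumes Python 3 and a recent NumPy, and it swaps io.BytesIO for io.StringIO because BytesIO only accepts bytes, so BytesIO(data) raises TypeError when data is a str. The names kept and stripped are illustrative.

```python
from io import StringIO

import numpy as np

# Same input as the genfromtxt example in the patch, read from a text stream.
data = "1, abc , 2\n 3, xxx, 4"

# Without autostrip, spaces inside each delimited field are kept.
kept = np.genfromtxt(StringIO(data), delimiter=",", dtype="U5")

# With autostrip=True, leading and trailing whitespace is removed from every field.
stripped = np.genfromtxt(StringIO(data), delimiter=",", dtype="U5", autostrip=True)

print(kept.tolist())      # [['1', ' abc ', ' 2'], ['3', ' xxx', ' 4']]
print(stripped.tolist())  # [['1', 'abc', '2'], ['3', 'xxx', '4']]
print(kept.dtype == np.dtype("U5"))  # True; the repr is dtype('<U5') on little-endian machines
```

Comparing against np.dtype("U5") instead of the string '<U5' keeps the check independent of the machine's byte order.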