ENH: add byteswapping tutorial

author: Matthew Brett <matthew.brett@gmail.com> 2009-10-03 16:32:21 +0000
committer: Matthew Brett <matthew.brett@gmail.com> 2009-10-03 16:32:21 +0000
commit: 3e1dd88095d287f059a87e1746da47c968b3d707 (patch)
tree: 175ccd74914f23bcfe407eaaca5f46572315d0b2 /numpy/doc/byteswapping.py
parent: d9665935dddea327ab47bec40d12813f7c117c78 (diff)
download: numpy-3e1dd88095d287f059a87e1746da47c968b3d707.tar.gz
1 files changed, 137 insertions, 0 deletions
diff --git a/numpy/doc/byteswapping.py b/numpy/doc/byteswapping.py
new file mode 100644
index 000000000..4cbdbf47f
--- /dev/null
+++ b/numpy/doc/byteswapping.py
@@ -0,0 +1,137 @@
+'''
+
+=============================
+ Byteswapping and byte order
+=============================
+
+Introduction to byte ordering and ndarrays
+==========================================
+
+``ndarrays`` are objects that provide a python array interface to data
+in memory.
+
+It often happens that the memory that you want to view with an array is
+not of the same byte ordering as the computer on which you are running
+Python.
+
+For example, I might be working on a computer with a little-endian CPU -
+such as an Intel Pentium, but I have loaded some data from a file
+written by a computer that is big-endian.  Let's say I have loaded 4
+bytes from a file written by a Sun (big-endian) computer.  I know that
+these 4 bytes represent two 16-bit integers.  On a big-endian machine, a
+two-byte integer is stored with the Most Significant Byte (MSB) first,
+and then the Least Significant Byte (LSB). Thus the bytes are, in memory order:
+
+#. MSB integer 1
+#. LSB integer 1
+#. MSB integer 2
+#. LSB integer 2
+
+Let's say the two integers were in fact 1 and 770.  Because 770 = 256 *
+3 + 2, the 4 bytes in memory would contain respectively: 0, 1, 3, 2.
+The bytes I have loaded from the file would have these contents:
+
+>>> big_end_str = chr(0) + chr(1) + chr(3) + chr(2)
+>>> big_end_str
+'\x00\x01\x03\x02'
+
+We might want to use an ``ndarray`` to access these integers.  In that
+case, we can create an array around this memory, and tell numpy that
+there are two integers, and that they are 16 bit and big-endian:
+
+>>> import numpy as np
+>>> big_end_arr = np.ndarray(shape=(2,),dtype='>i2', buffer=big_end_str)
+>>> big_end_arr[0]
+1
+>>> big_end_arr[1]
+770
+
+Note the array ``dtype`` above of ``>i2``.  The ``>`` means 'big-endian'
+(``<`` is little-endian) and ``i2`` means 'signed 2-byte integer'.  For
+example, if our data represented a single unsigned 4-byte little-endian
+integer, the dtype string would be ``<u4``.
+
+In fact, why don't we try that?
+
+>>> little_end_u4 = np.ndarray(shape=(1,),dtype='<u4', buffer=big_end_str)
+>>> little_end_u4[0] == 1 * 256**1 + 3 * 256**2 + 2 * 256**3
+True
+
+Returning to our ``big_end_arr`` - in this case our underlying data is
+big-endian (data endianness) and we've set the dtype to match (the dtype
+is also big-endian).  However, sometimes you need to flip these around.
+
+Changing byte ordering
+======================
+
+As you can imagine from the introduction, there are two ways you can
+affect the relationship between the byte ordering of the array and the
+underlying memory it is looking at:
+
+* Change the byte-ordering information in the array dtype so that it
+  interprets the undelying data as being in a different byte order.
+  This is the role of ``arr.newbyteorder()``
+* Change the byte-ordering of the underlying data, leaving the dtype
+  interpretation as it was.  This is what ``arr.byteswap()`` does.
+
+The common situations in which you need to change byte ordering are:
+
+#. Your data and dtype endianess don't match, and you want to change
+   the dtype so that it matches the data.
+#. Your data and dtype endianess don't match, and you want to swap the
+   data so that they match the dtype
+#. Your data and dtype endianess match, but you want the data swapped
+   and the dtype to reflect this
+
+Data and dtype endianness don't match, change dtype to match data
+-----------------------------------------------------------------
+
+We make something where they don't match:
+
+>>> wrong_end_dtype_arr = np.ndarray(shape=(2,),dtype='<i2', buffer=big_end_str)
+>>> wrong_end_dtype_arr[0]
+256
+
+The obvious fix for this situation is to change the dtype so it gives
+the correct endianness:
+
+>>> fixed_end_dtype_arr = wrong_end_dtype_arr.newbyteorder()
+>>> fixed_end_dtype_arr[0]
+1
+
+Note the the array has not changed in memory:
+
+>>> fixed_end_dtype_arr.tostring() == big_end_str
+True
+
+Data and type endianness don't match, change data to match dtype
+----------------------------------------------------------------
+
+You might want to do this if you need the data in memory to be a certain
+ordering.  For example you might be writing the memory out to a file
+that needs a certain byte ordering.
+
+>>> fixed_end_mem_arr = wrong_end_dtype_arr.byteswap()
+>>> fixed_end_mem_arr[0]
+1
+
+Now the array *has* changed in memory:
+
+>>> fixed_end_mem_arr.tostring() == big_end_str
+False
+
+Data and dtype endianness match, swap data and dtype
+----------------------------------------------------
+
+You may have a correctly specified array dtype, but you need the array
+to have the opposite byte order in memory, and you want the dtype to
+match so the array values make sense.  In this case you just do both of
+the previous operations:
+
+>>> swapped_end_arr = big_end_arr.byteswap().newbyteorder()
+>>> swapped_end_arr[0]
+1
+>>> swapped_end_arr.tostring() == big_end_str
+False
+
+'''
author	Matthew Brett <matthew.brett@gmail.com>	2009-10-03 16:32:21 +0000
committer	Matthew Brett <matthew.brett@gmail.com>	2009-10-03 16:32:21 +0000
commit	3e1dd88095d287f059a87e1746da47c968b3d707 (patch)
tree	175ccd74914f23bcfe407eaaca5f46572315d0b2 /numpy/doc/byteswapping.py
parent	d9665935dddea327ab47bec40d12813f7c117c78 (diff)
download	numpy-3e1dd88095d287f059a87e1746da47c968b3d707.tar.gz