From ba4d1161fe4943cb720f35c0abfd0581628255d6 Mon Sep 17 00:00:00 2001
From: Mark Wiebe
Date: Tue, 16 Aug 2011 19:11:22 -0700
Subject: BUG: missingdata: Fix mask usage in PyArray_TakeFrom, add tests for it

---
 doc/neps/missing-data.rst | 55 +++++++++++++++++++++--------------------------
 1 file changed, 25 insertions(+), 30 deletions(-)

(limited to 'doc/neps')

diff --git a/doc/neps/missing-data.rst b/doc/neps/missing-data.rst
index e83bd2189..197a3107d 100644
--- a/doc/neps/missing-data.rst
+++ b/doc/neps/missing-data.rst
@@ -237,21 +237,15 @@ mask [Exposed, Exposed, Hidden, Exposed], and
 values [1.0, 2.0, , 7.0] for the masked and
 NA dtype versions respectively.
 
-It may be worth overloading the np.NA __call__ method to accept a dtype,
-returning a zero-dimensional array with a missing value of that dtype.
-Without doing this, NA printouts would look like::
+The np.NA singleton may accept a dtype= keyword parameter, indicating
+that it should be treated as an NA of a particular data type. This is also
+a mechanism for preserving the dtype in a NumPy scalar-like fashion.
+Here's what this could look like::
 
     >>> np.sum(np.array([1.0, 2.0, np.NA, 7.0], maskna=True))
-    array(NA, dtype='float64', maskna=True)
-    >>> np.sum(np.array([1.0, 2.0, np.NA, 7.0], dtype='NA[f8]'))
-    array(NA, dtype='NA[>> np.sum(np.array([1.0, 2.0, np.NA, 7.0], maskna=True))
-    NA('float64')
+    NA(dtype='>> np.sum(np.array([1.0, 2.0, np.NA, 7.0], dtype='NA[f8]'))
-    NA('NA[>> a = np.array([1,2])
-    >>> b = a.view()
-    >>> b.flags.hasmaskna = True
+    >>> b = a.view(maskna=True)
     >>> b
-    array([1,2], maskna=True)
+    array([1, 2], maskna=True)
     >>> b[0] = np.NA
     >>> b
-    array([NA,2], maskna=True)
+    array([NA, 2], maskna=True)
     >>> a
-    array([1,2])
+    array([1, 2])
     >>> # The underlying number 1 value in 'a[0]' was untouched
 
 Copying values between the mask-based implementation and the
@@ -322,8 +309,16 @@ these semantics without the extra manipulation.
 A manual loop through a masked array like::
 
-    for i in xrange(len(a)):
-        a[i] = np.log(a[i])
+    >>> a = np.arange(5., maskna=True)
+    >>> a[3] = np.NA
+    >>> a
+    array([ 0., 1., 2., NA, 4.], maskna=True)
+    >>> for i in xrange(len(a)):
+    ...     a[i] = np.log(a[i])
+    ...
+    __main__:2: RuntimeWarning: divide by zero encountered in log
+    >>> a
+    array([ -inf, 0. , 0.69314718, NA, 1.38629436], maskna=True)
 
 works even with masked values, because 'a[i]' returns a zero-dimensional
 array with a missing value instead of the singleton np.NA for the missing
--
cgit v1.2.1