Merge pull request #3880 from pv/linalg-stride-fix

Fix bugs in umath_linalg + add more tests + remove unused code
author: Charles Harris <charlesr.harris@gmail.com> 2013-10-08 08:49:48 -0700
committer: Charles Harris <charlesr.harris@gmail.com> 2013-10-08 08:49:48 -0700
commit: 6a7830b143945c40b2893d02ea09951cd8c60598 (patch)
tree: a2367af215a084a5010852574a1d476a79a87413
parent: ec3603f43aaf4056bdd78401c23c937c5760f673 (diff)
parent: 2fa1c9d150398cc52a2752604e24bde9e762a43c (diff)
download: numpy-6a7830b143945c40b2893d02ea09951cd8c60598.tar.gz
5 files changed, 371 insertions, 3339 deletions
diff --git a/numpy/linalg/_gufuncs_linalg.py b/numpy/linalg/_gufuncs_linalg.py
deleted file mode 100644
index ef19c2121..000000000
--- a/numpy/linalg/_gufuncs_linalg.py
+++ /dev/null
@@ -1,1454 +0,0 @@
-"""Linear Algebra functions implemented as gufuncs, so they broadcast.
-
-.. warning:: This module is only for testing, the functionality will be
-   integrated into numpy.linalg proper.
-
-=======================
- gufuncs_linalg module
-=======================
-
-gufuncs_linalg implements a series of linear algebra functions as gufuncs.
-Most of these functions are already present in numpy.linalg, but as they
-are implemented using gufunc kernels they can be broadcasting. Some parts
-that are python in numpy.linalg are implemented inside C functions, as well
-as the iteration used when used on vectors. This can result in faster
-execution as well.
-
-In addition, there are some ufuncs thrown in that implement fused operations
-over numpy vectors that can result in faster execution on large vector
-compared to non-fused versions (for example: multiply_add, multiply3).
-
-In fact, gufuncs_linalg is a very thin wrapper of python code that wraps
-the actual kernels (gufuncs). This wrapper was needed in order to provide
-a sane interface for some functions. Mostly working around limitations on
-what can be described in a gufunc signature. Things like having one dimension
-of a result depending on the minimum of two dimensions of the sources (like
-in svd) or passing an uniform keyword parameter to the whole operation
-(like UPLO on functions over symmetric/hermitian matrices).
-
-The gufunc kernels are in a c module named _umath_linalg, that is imported
-privately in gufuncs_linalg.
-
-==========
- Contents
-==========
-
-Here is an enumeration of the functions. These are the functions exported by
-the module and should appear in its __all__ attribute. All the functions
-contain a docstring explaining them in detail.
-
-General
-=======
-- inner1d
-- innerwt
-- matrix_multiply
-- quadratic_form
-
-Lineal Algebra
-==============
-- det
-- slogdet
-- cholesky
-- eig
-- eigvals
-- eigh
-- eigvalsh
-- solve
-- svd
-- chosolve
-- inv
-- poinv
-
-Fused Operations
-================
-- add3
-- multiply3
-- multiply3_add
-- multiply_add
-- multiply_add2
-- multiply4
-- multiply4_add
-
-================
- Error Handling
-================
-Unlike the numpy.linalg module, this module does not use exceptions to notify
-errors in the execution of the kernels. As these functions are thougth to be
-used in a vector way it didn't seem appropriate to raise exceptions on failure
-of an element. So instead, when an error computing an element occurs its
-associated result will be set to an invalid value (all NaNs).
-
-Exceptions can occur if the arguments fail to map properly to the underlying
-gufunc (due to signature mismatch, for example).
-
-================================
- Notes about the implementation
-================================
-Where possible, the wrapper functions map directly into a gufunc implementing
-the computation.
-
-That's not always the case, as due to limitations of the gufunc interface some
-functions cannot be mapped straight into a kernel.
-
-Two cases come to mind:
-- An uniform parameter is needed to configure the way the computation is
-performed (like UPLO in the functions working on symmetric/hermitian matrices)
-- svd, where it was impossible to map the function to a gufunc signature.
-
-In the case of uniform parameters like UPLO, there are two separate entry points
-in the C module that imply either 'U' or 'L'. The wrapper just selects the
-kernel to use by checking the appropriate keyword parameter. This way a
-function interface similar to numpy.linalg can be kept.
-
-In the case of SVD not only there were problems with the support of keyword
-arguments. There was the added problem of the signature system not being able
-to cope with the needs of this functions. Just for the singular values a
-a signature like (m,n)->(min(m,n)) was needed. This has been worked around by
-implementing different kernels for the cases where min(m,n) == m and where
-min(m,n) == n. The wrapper code automatically selects the appropriate one.
-
-
-"""
-
-from __future__ import division, absolute_import, print_function
-
-
-__all__ = ['inner1d', 'dotc1d', 'innerwt', 'matrix_multiply', 'det', 'slogdet',
-           'inv', 'cholesky', 'quadratic_form', 'add3', 'multiply3',
-           'multiply3_add', 'multiply_add', 'multiply_add2', 'multiply4',
-           'multiply4_add', 'eig', 'eigvals', 'eigh', 'eigvalsh', 'solve',
-           'svd', 'chosolve', 'poinv']
-
-import numpy as np
-
-from . import _umath_linalg as _impl
-
-
-def inner1d(a, b, **kwargs):
-    """
-    Compute the dot product of vectors over the inner dimension, with
-    broadcasting.
-
-    Parameters
-    ----------
-    a : (..., N) array
-        Input array
-    b : (..., N) array
-        Input array
-
-    Returns
-    -------
-    inner : (...) array
-        dot product over the inner dimension.
-
-    Notes
-    -----
-    Numpy broadcasting rules apply when matching dimensions.
-
-    Implemented for types single, double, csingle and cdouble. Numpy
-    conversion rules apply.
-
-    For single and double types this is equivalent to dotc1d.
-
-    Maps to Blas functions sdot, ddot, cdotu and zdotu.
-
-    See Also
-    --------
-    dotc1d : dot product conjugating first vector.
-    innerwt : weighted (i.e. triple) inner product.
-
-    Examples
-    --------
-    >>> a = np.arange(1,5).reshape(2,2)
-    >>> b = np.arange(1,8,2).reshape(2,2)
-    >>> res = inner1d(a,b)
-    >>> res.shape
-    (2,)
-    >>> print res
-    [  7.  43.]
-
-    """
-    return _impl.inner1d(a, b, **kwargs)
-
-
-def dotc1d(a, b, **kwargs):
-    """
-    Compute the dot product of vectors over the inner dimension, conjugating
-    the first vector, with broadcasting
-
-    Parameters
-    ----------
-    a : (..., N) array
-        Input array
-    b : (..., N) array
-        Input array
-
-    Returns
-    -------
-    dotc : (...) array
-        dot product conjugating the first vector over the inner
-        dimension.
-
-    Notes
-    -----
-    Numpy broadcasting rules apply when matching dimensions.
-
-    Implemented for types single, double, csingle and cdouble. Numpy
-    conversion rules apply.
-
-    For single and double types this is equivalent to inner1d.
-
-    Maps to Blas functions sdot, ddot, cdotc and zdotc.
-
-    See Also
-    --------
-    inner1d : dot product
-    innerwt : weighted (i.e. triple) inner product.
-
-    Examples
-    --------
-    >>> a = np.arange(1,5).reshape(2,2)
-    >>> b = np.arange(1,8,2).reshape(2,2)
-    >>> res = inner1d(a,b)
-    >>> res.shape
-    (2,)
-    >>> print res
-    [  7.  43.]
-
-    """
-    return _impl.dotc1d(a, b, **kwargs)
-
-
-def innerwt(a, b, c, **kwargs):
-    """
-    Compute the weighted (i.e. triple) inner product, with
-    broadcasting.
-
-    Parameters
-    ----------
-    a, b, c : (..., N) array
-        Input arrays
-
-    Returns
-    -------
-    inner : (...) array
-        The weighted (i.e. triple) inner product.
-
-    Notes
-    -----
-    Numpy broadcasting rules apply.
-
-    Implemented for types single, double, csingle and cdouble. Numpy
-    conversion rules apply.
-
-    See Also
-    --------
-    inner1d : inner product.
-    dotc1d : dot product conjugating first vector.
-
-    Examples
-    --------
-    >>> a = np.arange(1,5).reshape(2,2)
-    >>> b = np.arange(1,8,2).reshape(2,2)
-    >>> c = np.arange(0.25,1.20,0.25).reshape(2,2)
-    >>> res = innerwt(a,b,c)
-    >>> res.shape
-    (2,)
-    >>> res
-    array([  3.25,  39.25])
-
-    """
-    return _impl.innerwt(a, b, c, **kwargs)
-
-
-def matrix_multiply(a,b,**kwargs):
-    """
-    Compute matrix multiplication, with broadcasting
-
-    Parameters
-    ----------
-    a : (..., M, N) array
-        Input array.
-    b : (..., N, P) array
-        Input array.
-
-    Returns
-    -------
-    r : (..., M, P) array matrix multiplication of a and b over any number of
-        outer dimensions
-
-    Notes
-    -----
-    Numpy broadcasting rules apply.
-
-    Matrix multiplication is computed using BLAS _gemm functions.
-
-    Implemented for single, double, csingle and cdouble. Numpy conversion
-    rules apply.
-
-    Examples
-    --------
-    >>> a = np.arange(1,17).reshape(2,2,4)
-    >>> b = np.arange(1,25).reshape(2,4,3)
-    >>> res = matrix_multiply(a,b)
-    >>> res.shape
-    (2, 2, 3)
-    >>> res
-    array([[[   70.,    80.,    90.],
-            [  158.,   184.,   210.]],
-    <BLANKLINE>
-           [[  750.,   792.,   834.],
-            [ 1030.,  1088.,  1146.]]])
-
-    """
-    return _impl.matrix_multiply(a,b,**kwargs)
-
-
-def det(a, **kwargs):
-    """
-    Compute the determinant of arrays, with broadcasting.
-
-    Parameters
-    ----------
-    a : (NDIMS, M, M) array
-        Input array. Its inner dimensions must be those of a square 2-D array.
-
-    Returns
-    -------
-    det : (NDIMS) array
-        Determinants of `a`
-
-    See Also
-    --------
-    slogdet : Another representation for the determinant, more suitable
-        for large matrices where underflow/overflow may occur
-
-    Notes
-    -----
-    Numpy broadcasting rules apply.
-
-    The determinants are computed via LU factorization using the LAPACK
-    routine _getrf.
-
-    Implemented for single, double, csingle and cdouble. Numpy conversion
-    rules apply.
-
-    Examples
-    --------
-    The determinant of a 2-D array [[a, b], [c, d]] is ad - bc:
-
-    >>> a = np.array([[1, 2], [3, 4]])
-    >>> np.allclose(-2.0, det(a))
-    True
-
-    >>> a = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]] ])
-    >>> np.allclose(-2.0, det(a))
-    True
-
-    """
-    return _impl.det(a, **kwargs)
-
-
-def slogdet(a, **kwargs):
-    """
-    Compute the sign and (natural) logarithm of the determinant of an array,
-    with broadcasting.
-
-    If an array has a very small or very large determinant, then a call to
-    `det` may overflow or underflow. This routine is more robust against such
-    issues, because it computes the logarithm of the determinant rather than
-    the determinant itself
-
-    Parameters
-    ----------
-    a : (..., M, M) array
-        Input array. Its inner dimensions must be those of a square 2-D array.
-
-    Returns
-    -------
-    sign : (...) array
-        An array of numbers representing the sign of the determinants. For real
-        matrices, this is 1, 0, or -1. For complex matrices, this is a complex
-        number with absolute value 1 (i.e., it is on the unit circle), or else
-        0.
-    logdet : (...) array
-        The natural log of the absolute value of the determinant. This is always
-        a real type.
-
-    If the determinant is zero, then `sign` will be 0 and `logdet` will be -Inf.
-    In all cases, the determinant is equal to ``sign * np.exp(logdet)``.
-
-    See Also
-    --------
-    det
-
-    Notes
-    -----
-    Numpy broadcasting rules apply.
-
-    The determinants are computed via LU factorization using the LAPACK
-    routine _getrf.
-
-    Implemented for types single, double, csingle and cdouble. Numpy conversion
-    rules apply. For complex versions `logdet` will be of the associated real
-    type (single for csingle, double for cdouble).
-
-    Examples
-    --------
-    The determinant of a 2-D array [[a, b], [c, d]] is ad - bc:
-
-    >>> a = np.array([[1, 2], [3, 4]])
-    >>> (sign, logdet) = slogdet(a)
-    >>> sign.shape
-    ()
-    >>> logdet.shape
-    ()
-    >>> np.allclose(-2.0, sign * np.exp(logdet))
-    True
-
-    >>> a = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]] ])
-    >>> (sign, logdet) = slogdet(a)
-    >>> sign.shape
-    (2,)
-    >>> logdet.shape
-    (2,)
-    >>> np.allclose(-2.0, sign * np.exp(logdet))
-    True
-
-    """
-    return _impl.slogdet(a, **kwargs)
-
-
-def inv(a, **kwargs):
-    """
-    Compute the (multiplicative) inverse of matrices, with broadcasting.
-
-    Given a square matrix `a`, return the matrix `ainv` satisfying
-    ``matrix_multiply(a, ainv) = matrix_multiply(ainv, a) = Identity matrix``
-
-    Parameters
-    ----------
-    a : (..., M, M) array
-        Matrices to be inverted
-
-    Returns
-    -------
-    ainv : (..., M, M) array
-        (Multiplicative) inverse of the `a` matrices.
-
-    Notes
-    -----
-    Numpy broadcasting rules apply.
-
-    Implemented for types single, double, csingle and cdouble. Numpy conversion
-    rules apply.
-
-    Singular matrices and thus, not invertible, result in an array of NaNs.
-
-    See Also
-    --------
-    poinv : compute the multiplicative inverse of hermitian/symmetric matrices,
-            using cholesky decomposition.
-
-    Examples
-    --------
-    >>> a = np.array([[1, 2], [3, 4]])
-    >>> ainv = inv(a)
-    >>> np.allclose(matrix_multiply(a, ainv), np.eye(2))
-    True
-    >>> np.allclose(matrix_multiply(ainv, a), np.eye(2))
-    True
-
-    """
-    return _impl.inv(a, **kwargs)
-
-
-def cholesky(a, UPLO='L', **kwargs):
-    """
-    Compute the cholesky decomposition of `a`, with broadcasting
-
-    The Cholesky decomposition (or Cholesky triangle) is a decomposition of a
-    Hermitian, positive-definite matrix into the product of a lower triangular
-    matrix and its conjugate transpose.
-
-    A = LL*
-
-    where L* is the positive-definite matrix.
-
-    Parameters
-    ----------
-    a : (..., M, M) array
-        Matrices for which compute the cholesky decomposition
-
-    Returns
-    -------
-    l : (..., M, M) array
-        Matrices for each element where each entry is the lower triangular
-        matrix with strictly positive diagonal entries such that a = ll* for
-        all outer dimensions
-
-    See Also
-    --------
-    chosolve : solve a system using cholesky decomposition
-    poinv : compute the inverse of a matrix using cholesky decomposition
-
-    Notes
-    -----
-    Numpy broadcasting rules apply.
-
-    Implemented for types single, double, csingle and cdouble. Numpy conversion
-    rules apply.
-
-    Decomposition is performed using LAPACK routine _potrf.
-
-    For elements where the LAPACK routine fails, the result will be set to NaNs.
-
-    If an element of the source array is not a positive-definite matrix the
-    result for that element is undefined.
-
-    Examples
-    --------
-    >>> A = np.array([[1,-2j],[2j,5]])
-    >>> A
-    array([[ 1.+0.j,  0.-2.j],
-           [ 0.+2.j,  5.+0.j]])
-    >>> L = cholesky(A)
-    >>> L
-    array([[ 1.+0.j,  0.+0.j],
-           [ 0.+2.j,  1.+0.j]])
-
-    """
-    if 'L' == UPLO:
-        gufunc = _impl.cholesky_lo
-    else:
-        gufunc = _impl.cholesky_up
-
-    return gufunc(a, **kwargs)
-
-
-def eig(a, **kwargs):
-    """
-    Compute the eigenvalues and right eigenvectors of square arrays,
-    with broadcasting
-
-    Parameters
-    ----------
-    a : (..., M, M) array
-        Matrices for which the eigenvalues and right eigenvectors will
-        be computed
-
-    Returns
-    -------
-    w : (..., M) array
-        The eigenvalues, each repeated according to its multiplicity.
-        The eigenvalues are not necessarily ordered. The resulting
-        array will be always be of complex type. When `a` is real
-        the resulting eigenvalues will be real (0 imaginary part) or
-        occur in conjugate pairs
-
-    v : (..., M, M) array
-        The normalized (unit "length") eigenvectors, such that the
-        column ``v[:,i]`` is the eigenvector corresponding to the
-        eigenvalue ``w[i]``.
-
-    See Also
-    --------
-    eigvals : eigenvalues of general arrays.
-    eigh : eigenvalues and right eigenvectors of symmetric/hermitian
-        arrays.
-    eigvalsh : eigenvalues of symmetric/hermitian arrays.
-
-    Notes
-    -----
-    Numpy broadcasting rules apply.
-
-    Implemented for types single, double, csingle and cdouble. Numpy
-    conversion rules apply.
-
-    Eigenvalues and eigenvectors for single and double versions will
-    always be typed csingle and cdouble, even if all the results are
-    real (imaginary part will be 0).
-
-    This is implemented using the _geev LAPACK routines which compute
-    the eigenvalues and eigenvectors of general square arrays.
-
-    For elements where the LAPACK routine fails, the result will be set
-    to NaNs.
-
-    Examples
-    --------
-    First, a utility function to check if eigvals/eigvectors are correct.
-    This checks the definition of eigenvectors. For each eigenvector v
-    with associated eigenvalue w of a matrix M the following equality must
-    hold: Mv == wv
-
-    >>> def check_eigen(M, w, v):
-    ...     '''vectorial check of Mv==wv'''
-    ...     lhs = matrix_multiply(M, v)
-    ...     rhs = w*v
-    ...     return np.allclose(lhs, rhs)
-
-    (Almost) Trivial example with real e-values and e-vectors. Note
-    the complex types of the results
-
-    >>> M = np.diag((1,2,3)).astype(float)
-    >>> w, v = eig(M)
-    >>> check_eigen(M, w, v)
-    True
-
-    Real matrix possessing complex e-values and e-vectors; note that the
-    e-values are complex conjugates of each other.
-
-    >>> M = np.array([[1, -1], [1, 1]])
-    >>> w, v = eig(M)
-    >>> check_eigen(M, w, v)
-    True
-
-    Complex-valued matrix with real e-values (but complex-valued e-vectors);
-    note that a.conj().T = a, i.e., a is Hermitian.
-
-    >>> M = np.array([[1, 1j], [-1j, 1]])
-    >>> w, v = eig(M)
-    >>> check_eigen(M, w, v)
-    True
-
-    """
-    return _impl.eig(a, **kwargs)
-
-
-def eigvals(a, **kwargs):
-    """
-    Compute the eigenvalues of general matrices, with broadcasting.
-
-    Main difference between `eigvals` and `eig`: the eigenvectors aren't
-    returned.
-
-    Parameters
-    ----------
-    a : (..., M, M) array
-        Matrices whose eigenvalues will be computed
-
-    Returns
-    -------
-    w : (..., M) array
-        The eigenvalues, each repeated according to its multiplicity.
-        The eigenvalues are not necessarily ordered. The resulting
-        array will be always be of complex type. When `a` is real
-        the resulting eigenvalues will be real (0 imaginary part) or
-        occur in conjugate pairs
-
-    See Also
-    --------
-    eig : eigenvalues and right eigenvectors of general arrays.
-    eigh : eigenvalues and right eigenvectors of symmetric/hermitian
-        arrays.
-    eigvalsh : eigenvalues of symmetric/hermitian arrays.
-
-    Notes
-    -----
-    Numpy broadcasting rules apply.
-
-    Implemented for types single, double, csingle and cdouble. Numpy
-    conversion rules apply.
-
-    Eigenvalues for single and double versions will always be typed
-    csingle and cdouble, even if all the results are real (imaginary
-    part will be 0).
-
-    This is implemented using the _geev LAPACK routines which compute
-    the eigenvalues and eigenvectors of general square arrays.
-
-    For elements where the LAPACK routine fails, the result will be set
-    to NaNs.
-
-    Examples
-    --------
-
-    Eigenvalues for a diagonal matrix are its diagonal elements
-
-    >>> D = np.diag((-1,1))
-    >>> eigvals(D)
-    array([-1.+0.j,  1.+0.j])
-
-    Multiplying on the left by an orthogonal matrix, `Q`, and on the
-    right by `Q.T` (the transpose of `Q` preserves the eigenvalues of
-    the original matrix
-
-    >>> x = np.random.random()
-    >>> Q = np.array([[np.cos(x), -np.sin(x)], [np.sin(x), np.cos(x)]])
-    >>> A = matrix_multiply(Q, D)
-    >>> A = matrix_multiply(A, Q.T)
-    >>> eigvals(A)
-    array([-1.+0.j,  1.+0.j])
-
-    """
-    return _impl.eigvals(a, **kwargs)
-
-
-def quadratic_form(u,Q,v, **kwargs):
-    """
-    Compute the quadratic form uQv, with broadcasting
-
-    Parameters
-    ----------
-    u : (..., M) array
-        The u vectors of the quadratic form uQv
-
-    Q : (..., M, N) array
-        The Q matrices of the quadratic form uQv
-
-    v : (..., N) array
-        The v vectors of the quadratic form uQv
-
-    Returns
-    -------
-    qf : (...) array
-        The result of the quadratic forms
-
-    Notes
-    -----
-    Numpy broadcasting rules apply.
-
-    Implemented for types single, double, csingle and cdouble. Numpy
-    conversion rules apply.
-
-    This is similar to PDL inner2
-
-    Examples
-    --------
-
-    The result in absence of broadcasting is just as np.dot(np.dot(u,Q),v)
-    or np.dot(u, np.dot(Q,v))
-
-    >>> u = np.array([2., 3.])
-    >>> Q = np.array([[1.,1.], [0.,1.]])
-    >>> v = np.array([1.,2.])
-    >>> quadratic_form(u,Q,v)
-    12.0
-
-    >>> np.dot(np.dot(u,Q),v)
-    12.0
-
-    >>> np.dot(u, np.dot(Q,v))
-    12.0
-
-    """
-    return _impl.quadratic_form(u, Q, v, **kwargs)
-
-
-def add3(a, b, c, **kwargs):
-    """
-    Element-wise addition of 3 arrays: a + b + c.
-
-    Parameters
-    ----------
-    a, b, c : (...) array
-        arrays with the addends
-
-    Returns
-    -------
-    add3 : (...) array
-        resulting element-wise addition.
-
-    Notes
-    -----
-    Numpy broadcasting rules apply.
-
-    Implemented for types single, double, csingle and cdouble. Numpy
-    conversion rules apply.
-
-    See Also
-    --------
-    multiply3 : element-wise three-way multiplication.
-    multiply3_add : element-wise three-way multiplication and addition.
-    multiply_add : element-wise multiply-add.
-    multiply_add2 : element-wise multiplication with two additions.
-    multiply4 : element-wise four-way multiplication
-    multiply4_add : element-wise four-way multiplication and addition,
-
-    Examples
-    --------
-    >>> a = np.linspace(1.0, 30.0, 30)
-    >>> add3(a[0::3], a[1::3], a[2::3])
-    array([  6.,  15.,  24.,  33.,  42.,  51.,  60.,  69.,  78.,  87.])
-
-    """
-    return _impl.add3(a, b, c, **kwargs)
-
-
-def multiply3(a, b, c, **kwargs):
-    """
-    Element-wise multiplication of 3 arrays: a*b*c.
-
-    Parameters
-    ----------
-    a, b, c : (...) array
-        arrays with the factors
-
-    Returns
-    -------
-    m3 : (...) array
-        resulting element-wise product
-
-    Notes
-    -----
-    Numpy broadcasting rules apply.
-
-    Implemented for types single, double, csingle and cdouble. Numpy
-    conversion rules apply.
-
-    See Also
-    --------
-    add3 : element-wise three-was addition
-    multiply3_add : element-wise three-way multiplication and addition.
-    multiply_add : element-wise multiply-add.
-    multiply_add2 : element-wise multiplication with two additions.
-    multiply4 : element-wise four-way multiplication
-    multiply4_add : element-wise four-way multiplication and addition,
-
-    Examples
-    --------
-    >>> a = np.linspace(1.0, 10.0, 10)
-    >>> multiply3(a, 1.01, a)
-    array([   1.01,    4.04,    9.09,   16.16,   25.25,   36.36,   49.49,
-             64.64,   81.81,  101.  ])
-
-    """
-    return _impl.multiply3(a, b, c, **kwargs)
-
-
-def multiply3_add(a, b, c, d, **kwargs):
-    """
-    Element-wise multiplication of 3 arrays adding an element
-    of the a 4th array to the result: a*b*c + d
-
-    Parameters
-    ----------
-    a, b, c : (...) array
-        arrays with the factors
-
-    d : (...) array
-        array with the addend
-
-    Returns
-    -------
-    m3a : (...) array
-        resulting element-wise addition
-
-    Notes
-    -----
-    Numpy broadcasting rules apply.
-
-    Implemented for types single, double, csingle and cdouble. Numpy
-    conversion rules apply.
-
-    See Also
-    --------
-    add3 : element-wise three-was addition
-    multiply3 : element-wise three-way multiplication.
-    multiply3_add : element-wise three-way multiplication and addition.
-    multiply_add : element-wise multiply-add.
-    multiply_add2 : element-wise multiplication with two additions.
-    multiply4 : element-wise four-way multiplication
-    multiply4_add : element-wise four-way multiplication and addition,
-
-    Examples
-    --------
-    >>> a = np.linspace(1.0, 10.0, 10)
-    >>> multiply3_add(a, 1.01, a, 42e-4)
-    array([   1.0142,    4.0442,    9.0942,   16.1642,   25.2542,   36.3642,
-             49.4942,   64.6442,   81.8142,  101.0042])
-
-    """
-    return _impl.multiply3_add(a, b, c, d, **kwargs)
-
-
-def multiply_add(a, b, c, **kwargs):
-    """
-    Element-wise addition of 3 arrays
-
-    Parameters
-    ----------
-    a, b, c : (...) array
-        arrays with the addends
-
-    Returns
-    -------
-    add3 : (...) array
-        resulting element-wise addition
-
-    Notes
-    -----
-    Numpy broadcasting rules apply.
-
-    Implemented for types single, double, csingle and cdouble. Numpy
-    conversion rules apply.
-
-    See Also
-    --------
-    add3 : element-wise three-was addition
-    multiply3 : element-wise three-way multiplication.
-    multiply3_add : element-wise three-way multiplication and addition.
-    multiply_add : element-wise multiply-add.
-    multiply_add2 : element-wise multiplication with two additions.
-    multiply4 : element-wise four-way multiplication
-    multiply4_add : element-wise four-way multiplication and addition,
-
-    Examples
-    --------
-    >>> a = np.linspace(1.0, 10.0, 10)
-    >>> multiply_add(a, a, 42e-4)
-    array([   1.0042,    4.0042,    9.0042,   16.0042,   25.0042,   36.0042,
-             49.0042,   64.0042,   81.0042,  100.0042])
-
-    """
-    return _impl.multiply_add(a, b, c, **kwargs)
-
-
-def multiply_add2(a, b, c, d, **kwargs):
-    """
-    Element-wise addition of 3 arrays
-
-    Parameters
-    ----------
-    a, b, c : (...) array
-        arrays with the addends
-
-    Returns
-    -------
-    add3 : (...) array
-        resulting element-wise addition
-
-    Notes
-    -----
-    Numpy broadcasting rules apply.
-
-    Implemented for types single, double, csingle and cdouble. Numpy
-    conversion rules apply.
-
-    See Also
-    --------
-    add3 : element-wise three-was addition
-    multiply3 : element-wise three-way multiplication.
-    multiply3_add : element-wise three-way multiplication and addition.
-    multiply_add : element-wise multiply-add.
-    multiply_add2 : element-wise multiplication with two additions.
-    multiply4 : element-wise four-way multiplication
-    multiply4_add : element-wise four-way multiplication and addition,
-
-    Examples
-    --------
-    >>> a = np.linspace(1.0, 10.0, 10)
-    >>> multiply_add2(a, a, a, 42e-4)
-    array([   2.0042,    6.0042,   12.0042,   20.0042,   30.0042,   42.0042,
-             56.0042,   72.0042,   90.0042,  110.0042])
-
-    """
-    return _impl.multiply_add2(a, b, c, d, **kwargs)
-
-
-def multiply4(a, b, c, d, **kwargs):
-    """
-    Element-wise addition of 3 arrays
-
-    Parameters
-    ----------
-    a, b, c : (...) array
-        arrays with the addends
-
-    Returns
-    -------
-    add3 : (...) array
-        resulting element-wise addition
-
-    Notes
-    -----
-    Numpy broadcasting rules apply.
-
-    Implemented for types single, double, csingle and cdouble. Numpy
-    conversion rules apply.
-
-    See Also
-    --------
-    add3 : element-wise three-was addition
-    multiply3 : element-wise three-way multiplication.
-    multiply3_add : element-wise three-way multiplication and addition.
-    multiply_add : element-wise multiply-add.
-    multiply_add2 : element-wise multiplication with two additions.
-    multiply4 : element-wise four-way multiplication
-    multiply4_add : element-wise four-way multiplication and addition,
-
-    Examples
-    --------
-    >>> a = np.linspace(1.0, 10.0, 10)
-    >>> multiply4(a, a, a[::-1], 1.0001)
-    array([  10.001 ,   36.0036,   72.0072,  112.0112,  150.015 ,  180.018 ,
-            196.0196,  192.0192,  162.0162,  100.01  ])
-
-    """
-    return _impl.multiply4(a, b, c, d, **kwargs)
-
-
-def multiply4_add(a, b, c, d, e, **kwargs):
-    """
-    Element-wise addition of 3 arrays
-
-    Parameters
-    ----------
-    a, b, c : (...) array
-        arrays with the addends
-
-    Returns
-    -------
-    add3 : (...) array
-        resulting element-wise addition
-
-    Notes
-    -----
-    Numpy broadcasting rules apply.
-
-    Implemented for types single, double, csingle and cdouble. Numpy
-    conversion rules apply.
-
-    See Also
-    --------
-    add3 : element-wise three-was addition
-    multiply3 : element-wise three-way multiplication.
-    multiply3_add : element-wise three-way multiplication and addition.
-    multiply_add : element-wise multiply-add.
-    multiply_add2 : element-wise multiplication with two additions.
-    multiply4 : element-wise four-way multiplication
-    multiply4_add : element-wise four-way multiplication and addition,
-
-    Examples
-    --------
-    >>> a = np.linspace(1.0, 10.0, 10)
-    >>> multiply4_add(a, a, a[::-1], 1.01, 42e-4)
-    array([  10.1042,   36.3642,   72.7242,  113.1242,  151.5042,  181.8042,
-            197.9642,  193.9242,  163.6242,  101.0042])
-
-    """
-    return _impl.multiply4_add(a, b, c, d, e,**kwargs)
-
-
-def eigh(A, UPLO='L', **kw_args):
-    """
-    Computes the eigenvalues and eigenvectors for the square matrices
-    in the inner dimensions of A, being those matrices
-    symmetric/hermitian.
-
-    Parameters
-    ----------
-    A : (..., M, M) array
-         Hermitian/Symmetric matrices whose eigenvalues and
-         eigenvectors are to be computed.
-    UPLO : {'L', 'U'}, optional
-         Specifies whether the calculation is done with the lower
-         triangular part of the elements in `A` ('L', default) or
-         the upper triangular part ('U').
-
-    Returns
-    -------
-    w : (..., M) array
-        The eigenvalues, not necessarily ordered.
-    v : (..., M, M) array
-        The inner dimensions contain matrices with the normalized
-        eigenvectors as columns. The column-numbers are coherent with
-        the associated eigenvalue.
-
-    Notes
-    -----
-    Numpy broadcasting rules apply.
-
-    The eigenvalues/eigenvectors are computed using LAPACK routines _ssyevd,
-    _heevd
-
-    For elements where the LAPACK routine fails, the result will be set
-    to NaNs.
-
-    Implemented for single, double, csingle and cdouble. Numpy conversion
-    rules apply.
-
-    Unlike eig, the results for single and double will always be single
-    and doubles. It is not possible for symmetrical real matrices to result
-    in complex eigenvalues/eigenvectors
-
-    See Also
-    --------
-    eigvalsh : eigenvalues of symmetric/hermitian arrays.
-    eig : eigenvalues and right eigenvectors for general matrices.
-    eigvals : eigenvalues for general matrices.
-
-    Examples
-    --------
-    First, a utility function to check if eigvals/eigvectors are correct.
-    This checks the definition of eigenvectors. For each eigenvector v
-    with associated eigenvalue w of a matrix M the following equality must
-    hold: Mv == wv
-
-    >>> def check_eigen(M, w, v):
-    ...     '''vectorial check of Mv==wv'''
-    ...     lhs = matrix_multiply(M, v)
-    ...     rhs = w*v
-    ...     return np.allclose(lhs, rhs)
-
-    A simple example that computes eigenvectors and eigenvalues of
-    a hermitian matrix and checks that A*v = v*w for both pairs of
-    eignvalues(w) and eigenvectors(v)
-
-    >>> M = np.array([[1, -2j], [2j, 1]])
-    >>> w, v = eigh(M)
-    >>> check_eigen(M, w, v)
-    True
-
-    """
-    if 'L' == UPLO:
-        gufunc = _impl.eigh_lo
-    else:
-        gufunc = _impl.eigh_up
-
-    return gufunc(A, **kw_args)
-
-
-def eigvalsh(A, UPLO='L', **kw_args):
-    """
-    Computes the eigenvalues for the square matrices in the inner
-    dimensions of A, being those matrices symmetric/hermitian.
-
-    Parameters
-    ----------
-    A : (..., M, M) array
-         Hermitian/Symmetric matrices whose eigenvalues and
-         eigenvectors are to be computed.
-    UPLO : {'L', 'U'}, optional
-         Specifies whether the calculation is done with the lower
-         triangular part of the elements in `A` ('L', default) or
-         the upper triangular part ('U').
-
-    Returns
-    -------
-    w : (..., M) array
-        The eigenvalues, not necessarily ordered.
-
-    Notes
-    -----
-    Numpy broadcasting rules apply.
-
-    The eigenvalues are computed using LAPACK routines _ssyevd, _heevd
-
-    For elements where the LAPACK routine fails, the result will be set
-    to NaNs.
-
-    Implemented for single, double, csingle and cdouble. Numpy conversion
-    rules apply.
-
-    Unlike eigvals, the results for single and double will always be single
-    and doubles. It is not possible for symmetrical real matrices to result
-    in complex eigenvalues.
-
-    See Also
-    --------
-    eigh : eigenvalues of symmetric/hermitian arrays.
-    eig : eigenvalues and right eigenvectors for general matrices.
-    eigvals : eigenvalues for general matrices.
-
-    Examples
-    --------
-    eigvalsh results should be the same as the eigenvalues returned by eigh
-
-    >>> a = np.array([[1, -2j], [2j, 5]])
-    >>> w, v = eigh(a)
-    >>> np.allclose(w, eigvalsh(a))
-    True
-
-    eigvalsh on an identity matrix is all ones
-    >>> eigvalsh(np.eye(6))
-    array([ 1.,  1.,  1.,  1.,  1.,  1.])
-
-    """
-    if ('L' == UPLO):
-        gufunc = _impl.eigvalsh_lo
-    else:
-        gufunc = _impl.eigvalsh_up
-
-    return gufunc(A,**kw_args)
-
-
-def solve(A,B,**kw_args):
-    """
-    Solve the linear matrix equations on the inner dimensions.
-
-    Computes the "exact" solution, `x`. of the well-determined,
-    i.e., full rank, linear matrix equations `ax = b`.
-
-    Parameters
-    ----------
-    A : (..., M, M) array
-        Coefficient matrices.
-    B : (..., M, N) array
-        Ordinate or "dependent variable" values.
-
-    Returns
-    -------
-    X : (..., M, N) array
-        Solutions to the system A X = B for all the outer dimensions
-
-    Notes
-    -----
-    Numpy broadcasting rules apply.
-
-    The solutions are computed using LAPACK routine _gesv
-
-    For elements where the LAPACK routine fails, the result will be set
-    to NaNs.
-
-    Implemented for single, double, csingle and cdouble. Numpy conversion
-    rules apply.
-
-    See Also
-    --------
-    chosolve : solve a system using cholesky decomposition (for equations
-               having symmetric/hermitian coefficient matrices)
-
-    Examples
-    --------
-    Solve the system of equations ``3 * x0 + x1 = 9`` and ``x0 + 2 * x1 = 8``:
-
-    >>> a = np.array([[3,1], [1,2]])
-    >>> b = np.array([9,8])
-    >>> x = solve(a, b)
-    >>> x
-    array([ 2.,  3.])
-
-    Check that the solution is correct:
-
-    >>> np.allclose(np.dot(a, x), b)
-    True
-
-    """
-    if len(B.shape) == (len(A.shape) - 1):
-        gufunc = _impl.solve1
-    else:
-        gufunc = _impl.solve
-
-    return gufunc(A,B,**kw_args)
-
-
-def svd(a, full_matrices=1, compute_uv=1 ,**kw_args):
-    """
-    Singular Value Decomposition on the inner dimensions.
-
-    Factors the matrices in `a` as ``u * np.diag(s) * v``, where `u`
-    and `v` are unitary and `s` is a 1-d array of `a`'s singular
-    values.
-
-    Parameters
-    ----------
-    a : (..., M, N) array
-        The array of matrices to decompose.
-    full_matrices : bool, optional
-        If True (default), `u` and `v` have the shapes (`M`, `M`) and
-        (`N`, `N`), respectively. Otherwise, the shapes are (`M`, `K`)
-        and (`K`, `N`), respectively, where `K` = min(`M`, `N`).
-    compute_uv : bool, optional
-        Whether or not to compute `u` and `v` in addition to `s`. True
-        by default.
-
-    Returns
-    -------
-    u : { (..., M, M), (..., M, K) } array
-        Unitary matrices. The actual shape depends on the value of
-        ``full_matrices``. Only returned when ``compute_uv`` is True.
-    s : (..., K) array
-        The singular values for every matrix, sorted in descending order.
-    v : { (..., N, N), (..., K, N) } array
-        Unitary matrices. The actual shape depends on the value of
-        ``full_matrices``. Only returned when ``compute_uv`` is True.
-
-    Notes
-    -----
-    Numpy broadcasting rules apply.
-
-    Singular Value Decomposition is performed using LAPACK routine _gesdd
-
-    For elements where the LAPACK routine fails, the result will be set
-    to NaNs.
-
-    Implemented for types single, double, csingle and cdouble. Numpy conversion
-    rules apply.
-
-    Examples
-    --------
-    >>> a = np.random.randn(9, 6) + 1j*np.random.randn(9, 6)
-
-    Reconstruction based on full SVD:
-
-    >>> U, s, V = svd(a, full_matrices=True)
-    >>> U.shape, V.shape, s.shape
-    ((9, 9), (6, 6), (6,))
-    >>> S = np.zeros((9, 6), dtype=complex)
-    >>> S[:6, :6] = np.diag(s)
-    >>> np.allclose(a, np.dot(U, np.dot(S, V)))
-    True
-
-    Reconstruction based on reduced SVD:
-
-    >>> U, s, V = svd(a, full_matrices=False)
-    >>> U.shape, V.shape, s.shape
-    ((9, 6), (6, 6), (6,))
-    >>> S = np.diag(s)
-    >>> np.allclose(a, np.dot(U, np.dot(S, V)))
-    True
-
-    """
-    m = a.shape[-2]
-    n = a.shape[-1]
-    if 1 == compute_uv:
-        if 1 == full_matrices:
-            if m < n:
-                gufunc = _impl.svd_m_f
-            else:
-                gufunc = _impl.svd_n_f
-        else:
-            if m < n:
-                gufunc = _impl.svd_m_s
-            else:
-                gufunc = _impl.svd_n_s
-    else:
-        if m < n:
-            gufunc = _impl.svd_m
-        else:
-            gufunc = _impl.svd_n
-    return gufunc(a, **kw_args)
-
-
-def chosolve(A, B, UPLO='L', **kw_args):
-    """
-    Solve the linear matrix equations on the inner dimensions, using
-    cholesky decomposition.
-
-    Computes the "exact" solution, `x`. of the well-determined,
-    i.e., full rank, linear matrix equations `ax = b`, where a is
-    a symmetric/hermitian positive-definite matrix.
-
-    Parameters
-    ----------
-    A : (..., M, M) array
-        Coefficient symmetric/hermitian positive-definite matrices.
-    B : (..., M, N) array
-        Ordinate or "dependent variable" values.
-    UPLO : {'L', 'U'}, optional
-         Specifies whether the calculation is done with the lower
-         triangular part of the elements in `A` ('L', default) or
-         the upper triangular part ('U').
-
-    Returns
-    -------
-    X : (..., M, N) array
-        Solutions to the system A X = B for all elements in the outer
-        dimensions
-
-    Notes
-    -----
-    Numpy broadcasting rules apply.
-
-    The solutions are computed using LAPACK routines _potrf, _potrs
-
-    For elements where the LAPACK routine fails, the result will be set
-    to NaNs.
-
-    Implemented for single, double, csingle and cdouble. Numpy conversion
-    rules apply.
-
-    See Also
-    --------
-    solve : solve a system using cholesky decomposition (for equations
-            having symmetric/hermitian coefficient matrices)
-
-    Examples
-    --------
-    Solve the system of equations ``3 * x0 + x1 = 9`` and ``x0 + 2 * x1 = 8``:
-    (note the matrix is symmetric in this system)
-
-    >>> a = np.array([[3,1], [1,2]])
-    >>> b = np.array([9,8])
-    >>> x = solve(a, b)
-    >>> x
-    array([ 2.,  3.])
-
-    Check that the solution is correct:
-
-    >>> np.allclose(np.dot(a, x), b)
-    True
-
-    """
-    if len(B.shape) == (len(A.shape) - 1):
-        if 'L' == UPLO:
-            gufunc = _impl.chosolve1_lo
-        else:
-            gufunc = _impl.chosolve1_up
-    else:
-        if 'L' == UPLO:
-            gufunc = _impl.chosolve_lo
-        else:
-            gufunc = _impl.chosolve_up
-
-    return gufunc(A, B, **kw_args)
-
-
-def poinv(A, UPLO='L', **kw_args):
-    """
-    Compute the (multiplicative) inverse of symmetric/hermitian positive
-    definite matrices, with broadcasting.
-
-    Given a square symmetic/hermitian positive-definite matrix `a`, return
-    the matrix `ainv` satisfying ``matrix_multiply(a, ainv) =
-    matrix_multiply(ainv, a) = Identity matrix``.
-
-    Parameters
-    ----------
-    a : (..., M, M) array
-        Symmetric/hermitian postive definite matrices to be inverted.
-
-    Returns
-    -------
-    ainv : (..., M, M) array
-        (Multiplicative) inverse of the `a` matrices.
-
-    Notes
-    -----
-    Numpy broadcasting rules apply.
-
-    The inverse is computed using LAPACK routines _potrf, _potri
-
-    For elements where the LAPACK routine fails, the result will be set
-    to NaNs.
-
-    Implemented for types single, double, csingle and cdouble. Numpy conversion
-    rules apply.
-
-    See Also
-    --------
-    inv : compute the multiplicative inverse of general matrices.
-
-    Examples
-    --------
-    >>> a = np.array([[5, 3], [3, 5]])
-    >>> ainv = poinv(a)
-    >>> np.allclose(matrix_multiply(a, ainv), np.eye(2))
-    True
-    >>> np.allclose(matrix_multiply(ainv, a), np.eye(2))
-    True
-
-    """
-    if 'L' == UPLO:
-        gufunc = _impl.poinv_lo
-    else:
-        gufunc = _impl.poinv_up
-
-    return gufunc(A, **kw_args);
-
-
-if __name__ == "__main__":
-    import doctest
-    doctest.testmod()
diff --git a/numpy/linalg/linalg.py b/numpy/linalg/linalg.py
index a1bb842ee..aa3bdea34 100644
--- a/numpy/linalg/linalg.py
+++ b/numpy/linalg/linalg.py
@@ -965,7 +965,7 @@ def eigvalsh(a, UPLO='L'):
     t, result_t = _commonType(a)
     signature = 'D->d' if isComplexType(t) else 'd->d'
     w = gufunc(a, signature=signature, extobj=extobj)
-    return w.astype(result_t)
+    return w.astype(_realType(result_t))
 
 def _convertarray(a):
     t, result_t = _commonType(a)
diff --git a/numpy/linalg/tests/test_gufuncs_linalg.py b/numpy/linalg/tests/test_gufuncs_linalg.py
deleted file mode 100644
index 40f8c4058..000000000
--- a/numpy/linalg/tests/test_gufuncs_linalg.py
+++ /dev/null
@@ -1,500 +0,0 @@
-"""
-Test functions for gufuncs_linalg module
-Heavily inspired (ripped in part) test_linalg
-"""
-from __future__ import division, print_function
-
-################################################################################
-# The following functions are implemented in the module "gufuncs_linalg"
-#
-# category "linalg"
-# - inv (TestInv)
-# - poinv (TestPoinv)
-# - det (TestDet)
-# - slogdet (TestDet)
-# - eig (TestEig)
-# - eigh (TestEigh)
-# - eigvals (TestEigvals)
-# - eigvalsh (TestEigvalsh)
-# - cholesky
-# - solve (TestSolve)
-# - chosolve (TestChosolve)
-# - svd (TestSVD)
-
-# ** unimplemented **
-# - qr
-# - matrix_power
-# - matrix_rank
-# - pinv
-# - lstsq
-# - tensorinv
-# - tensorsolve
-# - norm
-# - cond
-#
-# category "inspired by pdl"
-# - quadratic_form
-# - matrix_multiply3
-# - add3 (TestAdd3)
-# - multiply3 (TestMultiply3)
-# - multiply3_add (TestMultiply3Add)
-# - multiply_add (TestMultiplyAdd)
-# - multiply_add2 (TestMultiplyAdd2)
-# - multiply4 (TestMultiply4)
-# - multiply4_add (TestMultiply4Add)
-#
-# category "others"
-# - convolve
-# - inner1d
-# - innerwt
-# - matrix_multiply
-
-from nose.plugins.skip import Skip, SkipTest
-import numpy as np
-
-from numpy.testing import (TestCase, assert_, assert_equal, assert_raises,
-                           assert_array_equal, assert_almost_equal,
-                           run_module_suite)
-
-from numpy import array, single, double, csingle, cdouble, dot, identity
-from numpy import multiply, inf
-import numpy.linalg._gufuncs_linalg as gula
-
-old_assert_almost_equal = assert_almost_equal
-
-def assert_almost_equal(a, b, **kw):
-    if a.dtype.type in (single, csingle):
-        decimal = 5
-    else:
-        decimal = 10
-    old_assert_almost_equal(a, b, decimal = decimal, **kw)
-
-
-def assert_valid_eigen_no_broadcast(M, w, v, **kw):
-    lhs = gula.matrix_multiply(M, v)
-    rhs = w*v
-    assert_almost_equal(lhs, rhs, **kw)
-
-
-def assert_valid_eigen_recurse(M, w, v, **kw):
-    """check that w and v are valid eigenvalues/eigenvectors for matrix M
-    broadcast"""
-    if len(M.shape) > 2:
-        for i in range(M.shape[0]):
-            assert_valid_eigen_recurse(M[i], w[i], v[i], **kw)
-    else:
-        if len(M.shape) == 2:
-            assert_valid_eigen_no_broadcast(M, w, v, **kw)
-        else:
-            raise AssertionError('Not enough dimensions')
-
-
-def assert_valid_eigen(M, w, v, **kw):
-    if np.any(np.isnan(M)):
-        raise AssertionError('nan found in matrix')
-    if np.any(np.isnan(w)):
-        raise AssertionError('nan found in eigenvalues')
-    if np.any(np.isnan(v)):
-        raise AssertionError('nan found in eigenvectors')
-
-    assert_valid_eigen_recurse(M, w, v, **kw)
-
-
-def assert_valid_eigenvals_no_broadcast(M, w, **kw):
-    ident = np.eye(M.shape[0], dtype=M.dtype)
-    for i in range(w.shape[0]):
-        assert_almost_equal(gula.det(M - w[i]*ident), 0.0, **kw)
-
-
-def assert_valid_eigenvals_recurse(M, w, **kw):
-    if len(M.shape) > 2:
-        for i in range(M.shape[0]):
-            assert_valid_eigenvals_recurse(M[i], w[i], **kw)
-    else:
-        if len(M.shape) == 2:
-            assert_valid_eigenvals_no_broadcast(M, w, **kw)
-        else:
-            raise AssertionError('Not enough dimensions')
-
-
-def assert_valid_eigenvals(M, w, **kw):
-    if np.any(np.isnan(M)):
-        raise AssertionError('nan found in matrix')
-    if np.any(np.isnan(w)):
-        raise AssertionError('nan found in eigenvalues')
-    assert_valid_eigenvals_recurse(M, w, **kw)
-
-
-class MatrixGenerator(object):
-    def real_matrices(self):
-        a = [[1, 2],
-             [3, 4]]
-
-        b = [[4, 3],
-             [2, 1]]
-
-        return a, b
-
-    def real_symmetric_matrices(self):
-        a = [[ 2, -1],
-             [-1, 2]]
-
-        b = [[4, 3],
-             [2, 1]]
-
-        return a, b
-
-    def complex_matrices(self):
-        a = [[1+2j, 2+3j],
-             [3+4j, 4+5j]]
-
-        b = [[4+3j, 3+2j],
-             [2+1j, 1+0j]]
-
-        return a, b
-
-    def complex_hermitian_matrices(self):
-        a = [[2, -1],
-             [-1, 2]]
-
-        b = [[4+3j, 3+2j],
-             [2-1j, 1+0j]]
-
-        return a, b
-
-    def real_matrices_vector(self):
-        a, b = self.real_matrices()
-        return [a], [b]
-
-    def real_symmetric_matrices_vector(self):
-        a, b = self.real_symmetric_matrices()
-        return [a], [b]
-
-    def complex_matrices_vector(self):
-        a, b = self.complex_matrices()
-        return [a], [b]
-
-    def complex_hermitian_matrices_vector(self):
-        a, b = self.complex_hermitian_matrices()
-        return [a], [b]
-
-
-class GeneralTestCase(MatrixGenerator):
-    def test_single(self):
-        a, b = self.real_matrices()
-        self.do(array(a, dtype=single),
-                array(b, dtype=single))
-
-    def test_double(self):
-        a, b = self.real_matrices()
-        self.do(array(a, dtype=double),
-                array(b, dtype=double))
-
-    def test_csingle(self):
-        a, b = self.complex_matrices()
-        self.do(array(a, dtype=csingle),
-                array(b, dtype=csingle))
-
-    def test_cdouble(self):
-        a, b = self.complex_matrices()
-        self.do(array(a, dtype=cdouble),
-                array(b, dtype=cdouble))
-
-    def test_vector_single(self):
-        a, b = self.real_matrices_vector()
-        self.do(array(a, dtype=single),
-                array(b, dtype=single))
-
-    def test_vector_double(self):
-        a, b = self.real_matrices_vector()
-        self.do(array(a, dtype=double),
-                array(b, dtype=double))
-
-    def test_vector_csingle(self):
-        a, b = self.complex_matrices_vector()
-        self.do(array(a, dtype=csingle),
-                array(b, dtype=csingle))
-
-    def test_vector_cdouble(self):
-        a, b = self.complex_matrices_vector()
-        self.do(array(a, dtype=cdouble),
-                array(b, dtype=cdouble))
-
-
-class HermitianTestCase(MatrixGenerator):
-    def test_single(self):
-        a, b = self.real_symmetric_matrices()
-        self.do(array(a, dtype=single),
-                array(b, dtype=single))
-
-    def test_double(self):
-        a, b = self.real_symmetric_matrices()
-        self.do(array(a, dtype=double),
-                array(b, dtype=double))
-
-    def test_csingle(self):
-        a, b = self.complex_hermitian_matrices()
-        self.do(array(a, dtype=csingle),
-                array(b, dtype=csingle))
-
-    def test_cdouble(self):
-        a, b = self.complex_hermitian_matrices()
-        self.do(array(a, dtype=cdouble),
-                array(b, dtype=cdouble))
-
-    def test_vector_single(self):
-        a, b = self.real_symmetric_matrices_vector()
-        self.do(array(a, dtype=single),
-                array(b, dtype=single))
-
-    def test_vector_double(self):
-        a, b = self.real_symmetric_matrices_vector()
-        self.do(array(a, dtype=double),
-                array(b, dtype=double))
-
-    def test_vector_csingle(self):
-        a, b = self.complex_hermitian_matrices_vector()
-        self.do(array(a, dtype=csingle),
-                array(b, dtype=csingle))
-
-    def test_vector_cdouble(self):
-        a, b = self.complex_hermitian_matrices_vector()
-        self.do(array(a, dtype=cdouble),
-                array(b, dtype=cdouble))
-
-
-class TestMatrixMultiply(GeneralTestCase):
-    def do(self, a, b):
-        res = gula.matrix_multiply(a, b)
-        if a.ndim == 2:
-            assert_almost_equal(res, np.dot(a, b))
-        else:
-            assert_almost_equal(res[0], np.dot(a[0], b[0]))
-
-    def test_column_matrix(self):
-        A = np.arange(2*2).reshape((2, 2))
-        B = np.arange(2*1).reshape((2, 1))
-        res = gula.matrix_multiply(A, B)
-        assert_almost_equal(res, np.dot(A, B))
-
-class TestInv(GeneralTestCase, TestCase):
-    def do(self, a, b):
-        a_inv = gula.inv(a)
-        ident = identity(a.shape[-1])
-        if 3 == len(a.shape):
-            ident = ident.reshape((1, ident.shape[0], ident.shape[1]))
-        assert_almost_equal(gula.matrix_multiply(a, a_inv), ident)
-
-
-class TestPoinv(HermitianTestCase, TestCase):
-    def do(self, a, b):
-        a_inv = gula.poinv(a)
-        ident = identity(a.shape[-1])
-        if 3 == len(a.shape):
-            ident = ident.reshape((1, ident.shape[0], ident.shape[1]))
-
-        assert_almost_equal(a_inv, gula.inv(a))
-        assert_almost_equal(gula.matrix_multiply(a, a_inv), ident)
-
-
-class TestDet(GeneralTestCase, TestCase):
-    def do(self, a, b):
-        d = gula.det(a)
-        s, ld = gula.slogdet(a)
-        assert_almost_equal(s * np.exp(ld), d)
-
-        if np.csingle == a.dtype.type or np.single == a.dtype.type:
-            cmp_type=np.csingle
-        else:
-            cmp_type=np.cdouble
-
-        ev = gula.eigvals(a.astype(cmp_type))
-        assert_almost_equal(d.astype(cmp_type),
-                            multiply.reduce(ev.astype(cmp_type),
-                                            axis=(ev.ndim-1)))
-        if s != 0:
-            assert_almost_equal(np.abs(s), 1)
-        else:
-            assert_equal(ld, -inf)
-
-    def test_zero(self):
-        assert_equal(gula.det(array([[0.0]], dtype=single)), 0.0)
-        assert_equal(gula.det(array([[0.0]], dtype=double)), 0.0)
-        assert_equal(gula.det(array([[0.0]], dtype=csingle)), 0.0)
-        assert_equal(gula.det(array([[0.0]], dtype=cdouble)), 0.0)
-
-        assert_equal(gula.slogdet(array([[0.0]], dtype=single)), (0.0, -inf))
-        assert_equal(gula.slogdet(array([[0.0]], dtype=double)), (0.0, -inf))
-        assert_equal(gula.slogdet(array([[0.0]], dtype=csingle)), (0.0, -inf))
-        assert_equal(gula.slogdet(array([[0.0]], dtype=cdouble)), (0.0, -inf))
-
-    def test_types(self):
-        for typ in [(single, single),
-                    (double, double),
-                    (csingle, single),
-                    (cdouble, double)]:
-            for x in [ [0], [[0]], [[[0]]] ]:
-                assert_equal(gula.det(array(x, dtype=typ[0])).dtype, typ[0])
-                assert_equal(gula.slogdet(array(x, dtype=typ[0]))[0].dtype, typ[0])
-                assert_equal(gula.slogdet(array(x, dtype=typ[0]))[1].dtype, typ[1])
-
-
-class TestEig(GeneralTestCase, TestCase):
-    def do(self, a, b):
-        evalues, evectors = gula.eig(a)
-        assert_valid_eigenvals(a, evalues)
-        assert_valid_eigen(a, evalues, evectors)
-        ev = gula.eigvals(a)
-        assert_valid_eigenvals(a, evalues)
-        assert_almost_equal(ev, evalues)
-
-
-class TestEigh(HermitianTestCase, TestCase):
-    def do(self, a, b):
-        evalues_lo, evectors_lo = gula.eigh(a, UPLO='L')
-        evalues_up, evectors_up = gula.eigh(a, UPLO='U')
-
-        assert_valid_eigenvals(a, evalues_lo)
-        assert_valid_eigenvals(a, evalues_up)
-        assert_valid_eigen(a, evalues_lo, evectors_lo)
-        assert_valid_eigen(a, evalues_up, evectors_up)
-        assert_almost_equal(evalues_lo, evalues_up)
-        assert_almost_equal(evectors_lo, evectors_up)
-
-        ev_lo = gula.eigvalsh(a, UPLO='L')
-        ev_up = gula.eigvalsh(a, UPLO='U')
-        assert_valid_eigenvals(a, ev_lo)
-        assert_valid_eigenvals(a, ev_up)
-        assert_almost_equal(ev_lo, evalues_lo)
-        assert_almost_equal(ev_up, evalues_up)
-
-
-class TestSolve(GeneralTestCase, TestCase):
-    def do(self, a, b):
-        x = gula.solve(a, b)
-        assert_almost_equal(b, gula.matrix_multiply(a, x))
-
-
-class TestChosolve(HermitianTestCase, TestCase):
-    def do(self, a, b):
-        """
-        inner1d not defined for complex types.
-        todo: implement alternative test
-        """
-        if csingle == a.dtype or cdouble == a.dtype:
-            raise SkipTest
-
-        x_lo = gula.chosolve(a, b, UPLO='L')
-        x_up = gula.chosolve(a, b, UPLO='U')
-        assert_almost_equal(x_lo, x_up)
-        # inner1d not defined for complex types
-        # todo: implement alternative test
-        assert_almost_equal(b, gula.matrix_multiply(a, x_lo))
-        assert_almost_equal(b, gula.matrix_multiply(a, x_up))
-
-
-class TestSVD(GeneralTestCase, TestCase):
-    def do(self, a, b):
-        """ still work in progress """
-        raise SkipTest
-        u, s, vt = gula.svd(a, 0)
-        assert_almost_equal(a, dot(multiply(u, s), vt))
-
-"""
-class TestCholesky(HermitianTestCase, TestCase):
-    def do(self, a, b):
-        pass
-"""
-
-################################################################################
-# ufuncs inspired by pdl
-# - add3
-# - multiply3
-# - multiply3_add
-# - multiply_add
-# - multiply_add2
-# - multiply4
-# - multiply4_add
-
-class UfuncTestCase(object):
-    parameter = range(0, 10)
-
-    def _check_for_type(self, typ):
-        a = np.array(self.__class__.parameter, dtype=typ)
-        self.do(a)
-
-    def _check_for_type_vector(self, typ):
-        parameter = self.__class__.parameter
-        a = np.array([parameter, parameter], dtype=typ)
-        self.do(a)
-
-    def test_single(self):
-        self._check_for_type(single)
-
-    def test_double(self):
-        self._check_for_type(double)
-
-    def test_csingle(self):
-        self._check_for_type(csingle)
-
-    def test_cdouble(self):
-        self._check_for_type(cdouble)
-
-    def test_single_vector(self):
-        self._check_for_type_vector(single)
-
-    def test_double_vector(self):
-        self._check_for_type_vector(double)
-
-    def test_csingle_vector(self):
-        self._check_for_type_vector(csingle)
-
-    def test_cdouble_vector(self):
-        self._check_for_type_vector(cdouble)
-
-
-class TestAdd3(UfuncTestCase, TestCase):
-    def do(self, a):
-        r = gula.add3(a, a, a)
-        assert_almost_equal(r, a+a+a)
-
-
-class TestMultiply3(UfuncTestCase, TestCase):
-    def do(self, a):
-        r = gula.multiply3(a, a, a)
-        assert_almost_equal(r, a*a*a)
-
-
-class TestMultiply3Add(UfuncTestCase, TestCase):
-    def do(self, a):
-        r = gula.multiply3_add(a, a, a, a)
-        assert_almost_equal(r, a*a*a+a)
-
-
-class TestMultiplyAdd(UfuncTestCase, TestCase):
-    def do(self, a):
-        r = gula.multiply_add(a, a, a)
-        assert_almost_equal(r, a*a+a)
-
-
-class TestMultiplyAdd2(UfuncTestCase, TestCase):
-    def do(self, a):
-        r = gula.multiply_add2(a, a, a, a)
-        assert_almost_equal(r, a*a+a+a)
-
-
-class TestMultiply4(UfuncTestCase, TestCase):
-    def do(self, a):
-        r = gula.multiply4(a, a, a, a)
-        assert_almost_equal(r, a*a*a*a)
-
-
-class TestMultiply4_add(UfuncTestCase, TestCase):
-    def do(self, a):
-        r = gula.multiply4_add(a, a, a, a, a)
-        assert_almost_equal(r, a*a*a*a+a)
-
-
-if __name__ == "__main__":
-    print('testing gufuncs_linalg; gufuncs version: %s' % gula._impl.__version__)
-    run_module_suite()
diff --git a/numpy/linalg/tests/test_linalg.py b/numpy/linalg/tests/test_linalg.py
index 2c003a072..aefb61b14 100644
--- a/numpy/linalg/tests/test_linalg.py
+++ b/numpy/linalg/tests/test_linalg.py
@@ -5,6 +5,8 @@ from __future__ import division, absolute_import, print_function
 
 import os
 import sys
+import itertools
+import traceback
 
 import numpy as np
 from numpy import array, single, double, csingle, cdouble, dot, identity
@@ -12,8 +14,9 @@ from numpy import multiply, atleast_2d, inf, asarray, matrix
 from numpy import linalg
 from numpy.linalg import matrix_power, norm, matrix_rank
 from numpy.testing import (
-    TestCase, assert_, assert_equal, assert_raises, assert_array_equal,
-    assert_almost_equal, run_module_suite, dec
+    assert_, assert_equal, assert_raises, assert_array_equal,
+    assert_almost_equal, assert_allclose, run_module_suite,
+    dec
 )
 
 
@@ -45,162 +48,299 @@ def get_complex_dtype(dtype):
     return {single: csingle, double: cdouble,
             csingle: csingle, cdouble: cdouble}[dtype]
 
+def get_rtol(dtype):
+    # Choose a safe rtol
+    if dtype in (np.single, csingle):
+        return 1e-5
+    else:
+        return 1e-11
+
+class LinalgCase(object):
+    def __init__(self, name, a, b, exception_cls=None):
+        assert isinstance(name, str)
+        self.name = name
+        self.a = a
+        self.b = b
+        self.exception_cls = exception_cls
+
+    def check(self, do):
+        if self.exception_cls is None:
+            do(self.a, self.b)
+        else:
+            assert_raises(self.exception_cls, do, self.a, self.b)
+
+    def __repr__(self):
+        return "<LinalgCase: %s>" % (self.name,)
+
+
+#
+# Base test cases
+#
+
+np.random.seed(1234)
+
+SQUARE_CASES = [
+    LinalgCase("single",
+               array([[1., 2.], [3., 4.]], dtype=single),
+               array([2., 1.], dtype=single)),
+    LinalgCase("double",
+               array([[1., 2.], [3., 4.]], dtype=double),
+               array([2., 1.], dtype=double)),
+    LinalgCase("double_2",
+               array([[1., 2.], [3., 4.]], dtype=double),
+               array([[2., 1., 4.], [3., 4., 6.]], dtype=double)),
+    LinalgCase("csingle",
+               array([[1.+2j, 2+3j], [3+4j, 4+5j]], dtype=csingle),
+               array([2.+1j, 1.+2j], dtype=csingle)),
+    LinalgCase("cdouble",
+               array([[1.+2j, 2+3j], [3+4j, 4+5j]], dtype=cdouble),
+               array([2.+1j, 1.+2j], dtype=cdouble)),
+    LinalgCase("cdouble_2",
+               array([[1.+2j, 2+3j], [3+4j, 4+5j]], dtype=cdouble),
+               array([[2.+1j, 1.+2j, 1+3j], [1-2j, 1-3j, 1-6j]], dtype=cdouble)),
+    LinalgCase("empty",
+               atleast_2d(array([], dtype = double)),
+               atleast_2d(array([], dtype = double)),
+               linalg.LinAlgError),
+    LinalgCase("8x8",
+               np.random.rand(8, 8),
+               np.random.rand(8)),
+    LinalgCase("nonarray",
+               [[1, 2], [3, 4]],
+               [2, 1]),
+    LinalgCase("matrix_b_only",
+               array([[1., 2.], [3., 4.]]),
+               matrix([2., 1.]).T),
+    LinalgCase("matrix_a_and_b",
+               matrix([[1., 2.], [3., 4.]]),
+               matrix([2., 1.]).T),
+]
+
+NONSQUARE_CASES = [
+    LinalgCase("single_nsq_1",
+               array([[1., 2., 3.], [3., 4., 6.]], dtype=single),
+               array([2., 1.], dtype=single)),
+    LinalgCase("single_nsq_2",
+               array([[1., 2.], [3., 4.], [5., 6.]], dtype=single),
+               array([2., 1., 3.], dtype=single)),
+    LinalgCase("double_nsq_1",
+               array([[1., 2., 3.], [3., 4., 6.]], dtype=double),
+               array([2., 1.], dtype=double)),
+    LinalgCase("double_nsq_2",
+               array([[1., 2.], [3., 4.], [5., 6.]], dtype=double),
+               array([2., 1., 3.], dtype=double)),
+    LinalgCase("csingle_nsq_1",
+               array([[1.+1j, 2.+2j, 3.-3j], [3.-5j, 4.+9j, 6.+2j]], dtype=csingle),
+               array([2.+1j, 1.+2j], dtype=csingle)),
+    LinalgCase("csingle_nsq_2",
+               array([[1.+1j, 2.+2j], [3.-3j, 4.-9j], [5.-4j, 6.+8j]], dtype=csingle),
+               array([2.+1j, 1.+2j, 3.-3j], dtype=csingle)),
+    LinalgCase("cdouble_nsq_1",
+               array([[1.+1j, 2.+2j, 3.-3j], [3.-5j, 4.+9j, 6.+2j]], dtype=cdouble),
+               array([2.+1j, 1.+2j], dtype=cdouble)),
+    LinalgCase("cdouble_nsq_2",
+               array([[1.+1j, 2.+2j], [3.-3j, 4.-9j], [5.-4j, 6.+8j]], dtype=cdouble),
+               array([2.+1j, 1.+2j, 3.-3j], dtype=cdouble)),
+    LinalgCase("cdouble_nsq_1_2",
+               array([[1.+1j, 2.+2j, 3.-3j], [3.-5j, 4.+9j, 6.+2j]], dtype=cdouble),
+               array([[2.+1j, 1.+2j], [1-1j, 2-2j]], dtype=cdouble)),
+    LinalgCase("cdouble_nsq_2_2",
+               array([[1.+1j, 2.+2j], [3.-3j, 4.-9j], [5.-4j, 6.+8j]], dtype=cdouble),
+               array([[2.+1j, 1.+2j], [1-1j, 2-2j], [1-1j, 2-2j]], dtype=cdouble)),
+    LinalgCase("8x11",
+               np.random.rand(8, 11),
+               np.random.rand(11)),
+]
+
+HERMITIAN_CASES = [
+    LinalgCase("hsingle",
+               array([[1., 2.], [2., 1.]], dtype=single),
+               None),
+    LinalgCase("hdouble",
+               array([[1., 2.], [2., 1.]], dtype=double),
+               None),
+    LinalgCase("hcsingle",
+               array([[1., 2+3j], [2-3j, 1]], dtype=csingle),
+               None),
+    LinalgCase("hcdouble",
+               array([[1., 2+3j], [2-3j, 1]], dtype=cdouble),
+               None),
+    LinalgCase("hempty",
+               atleast_2d(array([], dtype = double)),
+               None,
+               linalg.LinAlgError),
+    LinalgCase("hnonarray",
+               [[1, 2], [2, 1]],
+               None),
+    LinalgCase("matrix_b_only",
+               array([[1., 2.], [2., 1.]]),
+               None),
+    LinalgCase("hmatrix_a_and_b",
+               matrix([[1., 2.], [2., 1.]]),
+               None)
+]
+
+
+#
+# Gufunc test cases
+#
+
+GENERALIZED_SQUARE_CASES = []
+GENERALIZED_NONSQUARE_CASES = []
+GENERALIZED_HERMITIAN_CASES = []
+
+for tgt, src in ((GENERALIZED_SQUARE_CASES, SQUARE_CASES),
+                 (GENERALIZED_NONSQUARE_CASES, NONSQUARE_CASES),
+                 (GENERALIZED_HERMITIAN_CASES, HERMITIAN_CASES)):
+    for case in src:
+        if not isinstance(case.a, np.ndarray):
+            continue
+        
+        a = np.array([case.a, 2*case.a, 3*case.a])
+        if case.b is None:
+            b = None
+        else:
+            b = np.array([case.b, 7*case.b, 6*case.b])
+        new_case = LinalgCase(case.name + "_tile3", a, b,
+                              case.exception_cls)
+        tgt.append(new_case)
+
+        a = np.array([case.a]*2*3).reshape((3, 2) + case.a.shape)
+        if case.b is None:
+            b = None
+        else:
+            b = np.array([case.b]*2*3).reshape((3, 2) + case.b.shape)
+        new_case = LinalgCase(case.name + "_tile213", a, b,
+                              case.exception_cls)
+        tgt.append(new_case)
+
+#
+# Generate stride combination variations of the above
+#
+
+def _stride_comb_iter(x):
+    """
+    Generate cartesian product of strides for all axes
+    """
+
+    if not isinstance(x, np.ndarray):
+        yield x, "nop"
+        return
+
+    stride_set = [(1,)]*x.ndim
+    stride_set[-1] = (1, 3, -4)
+    if x.ndim > 1:
+        stride_set[-2] = (1, 3, -4)
+    if x.ndim > 2:
+        stride_set[-3] = (1, -4)
+
+    for repeats in itertools.product(*tuple(stride_set)):
+        new_shape = [abs(a*b) for a, b in zip(x.shape, repeats)]
+        slices = tuple([slice(None, None, repeat) for repeat in repeats])
+
+        # new array with different strides, but same data
+        xi = np.empty(new_shape, dtype=x.dtype)
+        xi.view(np.uint32).fill(0xdeadbeef)
+        xi = xi[slices]
+        xi[...] = x
+        xi = xi.view(x.__class__)
+        assert np.all(xi == x)
+        yield xi, "stride_" + "_".join(["%+d" % j for j in repeats])
+
+for src in (SQUARE_CASES,
+            NONSQUARE_CASES,
+            HERMITIAN_CASES,
+            GENERALIZED_SQUARE_CASES,
+            GENERALIZED_NONSQUARE_CASES,
+            GENERALIZED_HERMITIAN_CASES):
+
+    new_cases = []
+    for case in src:
+        for a, a_tag in _stride_comb_iter(case.a):
+            for b, b_tag in _stride_comb_iter(case.b):
+                new_case = LinalgCase(case.name + "_" + a_tag + "_" + b_tag, a, b,
+                                      exception_cls=case.exception_cls)
+                new_cases.append(new_case)
+    src.extend(new_cases)
+
+
+#
+# Test different routines against the above cases
+#
+
+def _check_cases(func, cases):
+    for case in cases:
+        try:
+            case.check(func)
+        except Exception:
+            msg = "In test case: %r\n\n" % case
+            msg += traceback.format_exc()
+            raise AssertionError(msg)
 
 class LinalgTestCase(object):
-    def test_single(self):
-        a = array([[1., 2.], [3., 4.]], dtype=single)
-        b = array([2., 1.], dtype=single)
-        self.do(a, b)
-
-    def test_double(self):
-        a = array([[1., 2.], [3., 4.]], dtype=double)
-        b = array([2., 1.], dtype=double)
-        self.do(a, b)
-
-    def test_double_2(self):
-        a = array([[1., 2.], [3., 4.]], dtype=double)
-        b = array([[2., 1., 4.], [3., 4., 6.]], dtype=double)
-        self.do(a, b)
-
-    def test_csingle(self):
-        a = array([[1.+2j, 2+3j], [3+4j, 4+5j]], dtype=csingle)
-        b = array([2.+1j, 1.+2j], dtype=csingle)
-        self.do(a, b)
-
-    def test_cdouble(self):
-        a = array([[1.+2j, 2+3j], [3+4j, 4+5j]], dtype=cdouble)
-        b = array([2.+1j, 1.+2j], dtype=cdouble)
-        self.do(a, b)
-
-    def test_cdouble_2(self):
-        a = array([[1.+2j, 2+3j], [3+4j, 4+5j]], dtype=cdouble)
-        b = array([[2.+1j, 1.+2j, 1+3j], [1-2j, 1-3j, 1-6j]], dtype=cdouble)
-        self.do(a, b)
+    def test_sq_cases(self):
+        _check_cases(self.do, SQUARE_CASES)
 
-    def test_empty(self):
-        a = atleast_2d(array([], dtype = double))
-        b = atleast_2d(array([], dtype = double))
-        try:
-            self.do(a, b)
-            raise AssertionError("%s should fail with empty matrices", self.__name__[5:])
-        except linalg.LinAlgError as e:
-            pass
 
-    def test_nonarray(self):
-        a = [[1, 2], [3, 4]]
-        b = [2, 1]
-        self.do(a, b)
+class LinalgNonsquareTestCase(object):
+    def test_sq_cases(self):
+        _check_cases(self.do, NONSQUARE_CASES)
 
-    def test_matrix_b_only(self):
-        """Check that matrix type is preserved."""
-        a = array([[1., 2.], [3., 4.]])
-        b = matrix([2., 1.]).T
-        self.do(a, b)
 
-    def test_matrix_a_and_b(self):
-        """Check that matrix type is preserved."""
-        a = matrix([[1., 2.], [3., 4.]])
-        b = matrix([2., 1.]).T
-        self.do(a, b)
+class LinalgGeneralizedTestCase(object):
+    @dec.slow
+    def test_generalized_sq_cases(self):
+        _check_cases(self.do, GENERALIZED_SQUARE_CASES)
 
 
-class LinalgNonsquareTestCase(object):
-    def test_single_nsq_1(self):
-        a = array([[1., 2., 3.], [3., 4., 6.]], dtype=single)
-        b = array([2., 1.], dtype=single)
-        self.do(a, b)
-
-    def test_single_nsq_2(self):
-        a = array([[1., 2.], [3., 4.], [5., 6.]], dtype=single)
-        b = array([2., 1., 3.], dtype=single)
-        self.do(a, b)
-
-    def test_double_nsq_1(self):
-        a = array([[1., 2., 3.], [3., 4., 6.]], dtype=double)
-        b = array([2., 1.], dtype=double)
-        self.do(a, b)
-
-    def test_double_nsq_2(self):
-        a = array([[1., 2.], [3., 4.], [5., 6.]], dtype=double)
-        b = array([2., 1., 3.], dtype=double)
-        self.do(a, b)
-
-    def test_csingle_nsq_1(self):
-        a = array([[1.+1j, 2.+2j, 3.-3j], [3.-5j, 4.+9j, 6.+2j]], dtype=csingle)
-        b = array([2.+1j, 1.+2j], dtype=csingle)
-        self.do(a, b)
-
-    def test_csingle_nsq_2(self):
-        a = array([[1.+1j, 2.+2j], [3.-3j, 4.-9j], [5.-4j, 6.+8j]], dtype=csingle)
-        b = array([2.+1j, 1.+2j, 3.-3j], dtype=csingle)
-        self.do(a, b)
-
-    def test_cdouble_nsq_1(self):
-        a = array([[1.+1j, 2.+2j, 3.-3j], [3.-5j, 4.+9j, 6.+2j]], dtype=cdouble)
-        b = array([2.+1j, 1.+2j], dtype=cdouble)
-        self.do(a, b)
-
-    def test_cdouble_nsq_2(self):
-        a = array([[1.+1j, 2.+2j], [3.-3j, 4.-9j], [5.-4j, 6.+8j]], dtype=cdouble)
-        b = array([2.+1j, 1.+2j, 3.-3j], dtype=cdouble)
-        self.do(a, b)
-
-    def test_cdouble_nsq_1_2(self):
-        a = array([[1.+1j, 2.+2j, 3.-3j], [3.-5j, 4.+9j, 6.+2j]], dtype=cdouble)
-        b = array([[2.+1j, 1.+2j], [1-1j, 2-2j]], dtype=cdouble)
-        self.do(a, b)
-
-    def test_cdouble_nsq_2_2(self):
-        a = array([[1.+1j, 2.+2j], [3.-3j, 4.-9j], [5.-4j, 6.+8j]], dtype=cdouble)
-        b = array([[2.+1j, 1.+2j], [1-1j, 2-2j], [1-1j, 2-2j]], dtype=cdouble)
-        self.do(a, b)
-
-
-def _generalized_testcase(new_cls_name, old_cls):
-    def get_method(old_name, new_name):
-        def method(self):
-            base = old_cls()
-            def do(a, b):
-                a = np.array([a, a, a])
-                b = np.array([b, b, b])
-                self.do(a, b)
-            base.do = do
-            getattr(base, old_name)()
-        method.__name__ = new_name
-        return method
-
-    dct = dict()
-    for old_name in dir(old_cls):
-        if old_name.startswith('test_'):
-            new_name = old_name + '_generalized'
-            dct[new_name] = get_method(old_name, new_name)
-
-    return type(new_cls_name, (object,), dct)
-
-LinalgGeneralizedTestCase = _generalized_testcase(
-    'LinalgGeneralizedTestCase', LinalgTestCase)
-LinalgGeneralizedNonsquareTestCase = _generalized_testcase(
-    'LinalgGeneralizedNonsquareTestCase', LinalgNonsquareTestCase)
+class LinalgGeneralizedNonsquareTestCase(object):
+    @dec.slow
+    def test_generalized_nonsq_cases(self):
+        _check_cases(self.do, GENERALIZED_NONSQUARE_CASES)
+
+
+class HermitianTestCase(object):
+    def test_herm_cases(self):
+        _check_cases(self.do, HERMITIAN_CASES)
+
+
+class HermitianGeneralizedTestCase(object):
+    @dec.slow
+    def test_generalized_herm_cases(self):
+        _check_cases(self.do, GENERALIZED_HERMITIAN_CASES)
 
 
 def dot_generalized(a, b):
     a = asarray(a)
-    if a.ndim == 3:
-        return np.array([dot(ax, bx) for ax, bx in zip(a, b)])
-    elif a.ndim > 3:
-        raise ValueError("Not implemented...")
-    return dot(a, b)
+    if a.ndim >= 3:
+        if a.ndim == b.ndim:
+            # matrix x matrix
+            new_shape = a.shape[:-1] + b.shape[-1:]
+        elif a.ndim == b.ndim + 1:
+            # matrix x vector
+            new_shape = a.shape[:-1]
+        else:
+            raise ValueError("Not implemented...")
+        r = np.empty(new_shape, dtype=np.common_type(a, b))
+        for c in itertools.product(*map(range, a.shape[:-2])):
+            r[c] = dot(a[c], b[c])
+        return r
+    else:
+        return dot(a, b)
+
 
 def identity_like_generalized(a):
     a = asarray(a)
-    if a.ndim == 3:
-        return np.array([identity(a.shape[-2]) for ax in a])
-    elif a.ndim > 3:
-        raise ValueError("Not implemented...")
-    return identity(a.shape[0])
+    if a.ndim >= 3:
+        r = np.empty(a.shape, dtype=a.dtype)
+        for c in itertools.product(*map(range, a.shape[:-2])):
+            r[c] = identity(a.shape[-2])
+        return r
+    else:
+        return identity(a.shape[0])
 
 
-class TestSolve(LinalgTestCase, LinalgGeneralizedTestCase, TestCase):
+class TestSolve(LinalgTestCase, LinalgGeneralizedTestCase):
     def do(self, a, b):
         x = linalg.solve(a, b)
         assert_almost_equal(b, dot_generalized(a, x))
@@ -265,7 +405,7 @@ class TestSolve(LinalgTestCase, LinalgGeneralizedTestCase, TestCase):
         assert_(isinstance(result, ArraySubclass))
 
 
-class TestInv(LinalgTestCase, LinalgGeneralizedTestCase, TestCase):
+class TestInv(LinalgTestCase, LinalgGeneralizedTestCase):
     def do(self, a, b):
         a_inv = linalg.inv(a)
         assert_almost_equal(dot_generalized(a, a_inv),
@@ -295,7 +435,7 @@ class TestInv(LinalgTestCase, LinalgGeneralizedTestCase, TestCase):
         assert_equal(a.shape, res.shape)
 
 
-class TestEigvals(LinalgTestCase, LinalgGeneralizedTestCase, TestCase):
+class TestEigvals(LinalgTestCase, LinalgGeneralizedTestCase):
     def do(self, a, b):
         ev = linalg.eigvals(a)
         evalues, evectors = linalg.eig(a)
@@ -311,13 +451,12 @@ class TestEigvals(LinalgTestCase, LinalgGeneralizedTestCase, TestCase):
             yield check, dtype
 
 
-class TestEig(LinalgTestCase, LinalgGeneralizedTestCase, TestCase):
+class TestEig(LinalgTestCase, LinalgGeneralizedTestCase):
     def do(self, a, b):
         evalues, evectors = linalg.eig(a)
-        if evectors.ndim == 3:
-            assert_almost_equal(dot_generalized(a, evectors), evectors * evalues[:, None,:])
-        else:
-            assert_almost_equal(dot(a, evectors), multiply(evectors, evalues))
+        assert_allclose(dot_generalized(a, evectors),
+                        np.asarray(evectors) * np.asarray(evalues)[...,None,:],
+                        rtol=get_rtol(evalues.dtype))
         assert_(imply(isinstance(a, matrix), isinstance(evectors, matrix)))
 
     def test_types(self):
@@ -333,16 +472,15 @@ class TestEig(LinalgTestCase, LinalgGeneralizedTestCase, TestCase):
             assert_equal(v.dtype, get_complex_dtype(dtype))
 
         for dtype in [single, double, csingle, cdouble]:
-            yield dtype
+            yield check, dtype
 
 
-class TestSVD(LinalgTestCase, LinalgGeneralizedTestCase, TestCase):
+class TestSVD(LinalgTestCase, LinalgGeneralizedTestCase):
     def do(self, a, b):
         u, s, vt = linalg.svd(a, 0)
-        if u.ndim == 3:
-            assert_almost_equal(a, dot_generalized(u * s[:, None,:], vt))
-        else:
-            assert_almost_equal(a, dot(multiply(u, s), vt))
+        assert_allclose(a, dot_generalized(np.asarray(u) * np.asarray(s)[...,None,:],
+                                           np.asarray(vt)),
+                        rtol=get_rtol(u.dtype))
         assert_(imply(isinstance(a, matrix), isinstance(u, matrix)))
         assert_(imply(isinstance(a, matrix), isinstance(vt, matrix)))
 
@@ -360,34 +498,34 @@ class TestSVD(LinalgTestCase, LinalgGeneralizedTestCase, TestCase):
             yield check, dtype
 
 
-class TestCondSVD(LinalgTestCase, LinalgGeneralizedTestCase, TestCase):
+class TestCondSVD(LinalgTestCase, LinalgGeneralizedTestCase):
     def do(self, a, b):
         c = asarray(a) # a might be a matrix
         s = linalg.svd(c, compute_uv=False)
         old_assert_almost_equal(s[0]/s[-1], linalg.cond(a), decimal=5)
 
 
-class TestCond2(LinalgTestCase, TestCase):
+class TestCond2(LinalgTestCase):
     def do(self, a, b):
         c = asarray(a) # a might be a matrix
         s = linalg.svd(c, compute_uv=False)
         old_assert_almost_equal(s[0]/s[-1], linalg.cond(a, 2), decimal=5)
 
 
-class TestCondInf(TestCase):
+class TestCondInf(object):
     def test(self):
         A = array([[1., 0, 0], [0, -2., 0], [0, 0, 3.]])
         assert_almost_equal(linalg.cond(A, inf), 3.)
 
 
-class TestPinv(LinalgTestCase, TestCase):
+class TestPinv(LinalgTestCase):
     def do(self, a, b):
         a_ginv = linalg.pinv(a)
         assert_almost_equal(dot(a, a_ginv), identity(asarray(a).shape[0]))
         assert_(imply(isinstance(a, matrix), isinstance(a_ginv, matrix)))
 
 
-class TestDet(LinalgTestCase, LinalgGeneralizedTestCase, TestCase):
+class TestDet(LinalgTestCase, LinalgGeneralizedTestCase):
     def do(self, a, b):
         d = linalg.det(a)
         (s, ld) = linalg.slogdet(a)
@@ -421,12 +559,15 @@ class TestDet(LinalgTestCase, LinalgGeneralizedTestCase, TestCase):
     def test_types(self):
         def check(dtype):
             x = np.array([[1, 0.5], [0.5, 1]], dtype=dtype)
-            assert_equal(np.linalg.det(x), get_real_dtype(dtype))
+            assert_equal(np.linalg.det(x).dtype, dtype)
+            ph, s = np.linalg.slogdet(x)
+            assert_equal(s.dtype, get_real_dtype(dtype))
+            assert_equal(ph.dtype, dtype)
         for dtype in [single, double, csingle, cdouble]:
             yield check, dtype
 
 
-class TestLstsq(LinalgTestCase, LinalgNonsquareTestCase, TestCase):
+class TestLstsq(LinalgTestCase, LinalgNonsquareTestCase):
     def do(self, a, b):
         arr = np.asarray(a)
         m, n = arr.shape
@@ -506,70 +647,38 @@ class TestMatrixPower(object):
                       lambda: matrix_power(self.noninv, -1))
 
 
-class TestBoolPower(TestCase):
+class TestBoolPower(object):
     def test_square(self):
         A = array([[True, False], [True, True]])
         assert_equal(matrix_power(A, 2), A)
 
 
-class HermitianTestCase(object):
-    def test_single(self):
-        a = array([[1., 2.], [2., 1.]], dtype=single)
-        self.do(a, None)
-
-    def test_double(self):
-        a = array([[1., 2.], [2., 1.]], dtype=double)
-        self.do(a, None)
-
-    def test_csingle(self):
-        a = array([[1., 2+3j], [2-3j, 1]], dtype=csingle)
-        self.do(a, None)
-
-    def test_cdouble(self):
-        a = array([[1., 2+3j], [2-3j, 1]], dtype=cdouble)
-        self.do(a, None)
-
-    def test_empty(self):
-        a = atleast_2d(array([], dtype = double))
-        assert_raises(linalg.LinAlgError, self.do, a, None)
-
-    def test_nonarray(self):
-        a = [[1, 2], [2, 1]]
-        self.do(a, None)
-
-    def test_matrix_b_only(self):
-        """Check that matrix type is preserved."""
-        a = array([[1., 2.], [2., 1.]])
-        self.do(a, None)
-
-    def test_matrix_a_and_b(self):
-        """Check that matrix type is preserved."""
-        a = matrix([[1., 2.], [2., 1.]])
-        self.do(a, None)
-
-
-HermitianGeneralizedTestCase = _generalized_testcase(
-    'HermitianGeneralizedTestCase', HermitianTestCase)
-
-class TestEigvalsh(HermitianTestCase, HermitianGeneralizedTestCase, TestCase):
+class TestEigvalsh(HermitianTestCase, HermitianGeneralizedTestCase):
     def do(self, a, b):
         # note that eigenvalue arrays must be sorted since
         # their order isn't guaranteed.
-        ev = linalg.eigvalsh(a)
+        ev = linalg.eigvalsh(a, 'L')
         evalues, evectors = linalg.eig(a)
         ev.sort(axis=-1)
         evalues.sort(axis=-1)
-        assert_almost_equal(ev, evalues)
+        assert_allclose(ev, evalues,
+                        rtol=get_rtol(ev.dtype))
+
+        ev2 = linalg.eigvalsh(a, 'U')
+        ev2.sort(axis=-1)
+        assert_allclose(ev2, evalues,
+                        rtol=get_rtol(ev.dtype))
 
     def test_types(self):
         def check(dtype):
             x = np.array([[1, 0.5], [0.5, 1]], dtype=dtype)
-            assert_equal(np.linalg.eigvalsh(x), get_real_dtype(dtype))
+            w = np.linalg.eigvalsh(x)
+            assert_equal(w.dtype, get_real_dtype(dtype))
         for dtype in [single, double, csingle, cdouble]:
             yield check, dtype
 
 
-class TestEigh(HermitianTestCase, HermitianGeneralizedTestCase, TestCase):
+class TestEigh(HermitianTestCase, HermitianGeneralizedTestCase):
     def do(self, a, b):
         # note that eigenvalue arrays must be sorted since
         # their order isn't guaranteed.
@@ -579,17 +688,29 @@ class TestEigh(HermitianTestCase, HermitianGeneralizedTestCase, TestCase):
         evalues.sort(axis=-1)
         assert_almost_equal(ev, evalues)
 
+        assert_allclose(dot_generalized(a, evc),
+                        np.asarray(ev)[...,None,:] * np.asarray(evc),
+                        rtol=get_rtol(ev.dtype))
+
+        ev2, evc2 = linalg.eigh(a, 'U')
+        ev2.sort(axis=-1)
+        assert_almost_equal(ev2, evalues)
+
+        assert_allclose(dot_generalized(a, evc2),
+                        np.asarray(ev2)[...,None,:] * np.asarray(evc2),
+                        rtol=get_rtol(ev.dtype))
+
     def test_types(self):
         def check(dtype):
             x = np.array([[1, 0.5], [0.5, 1]], dtype=dtype)
-            w, v = np.linalg.eig(x)
-            assert_equal(w, get_real_dtype(dtype))
-            assert_equal(v, dtype)
+            w, v = np.linalg.eigh(x)
+            assert_equal(w.dtype, get_real_dtype(dtype))
+            assert_equal(v.dtype, dtype)
         for dtype in [single, double, csingle, cdouble]:
             yield check, dtype
 
 
-class _TestNorm(TestCase):
+class _TestNorm(object):
 
     dt = None
     dec = None
@@ -640,9 +761,9 @@ class _TestNorm(TestCase):
         assert_almost_equal(norm(A, 2), 9.1231056256176615)
         assert_almost_equal(norm(A, -2), 0.87689437438234041)
 
-        self.assertRaises(ValueError, norm, A, 'nofro')
-        self.assertRaises(ValueError, norm, A, -3)
-        self.assertRaises(ValueError, norm, A, 0)
+        assert_raises(ValueError, norm, A, 'nofro')
+        assert_raises(ValueError, norm, A, -3)
+        assert_raises(ValueError, norm, A, 0)
 
     def test_axis(self):
         # Vector norms.
@@ -687,20 +808,20 @@ class _TestNorm(TestCase):
         # Using `axis=<integer>` or passing in a 1-D array implies vector
         # norms are being computed, so also using `ord='fro'` raises a
         # ValueError.
-        self.assertRaises(ValueError, norm, A, 'fro', 0)
-        self.assertRaises(ValueError, norm, [3, 4], 'fro', None)
+        assert_raises(ValueError, norm, A, 'fro', 0)
+        assert_raises(ValueError, norm, [3, 4], 'fro', None)
 
         # Similarly, norm should raise an exception when ord is any finite
         # number other than 1, 2, -1 or -2 when computing matrix norms.
         for order in [0, 3]:
-            self.assertRaises(ValueError, norm, A, order, None)
-            self.assertRaises(ValueError, norm, A, order, (0, 1))
-            self.assertRaises(ValueError, norm, B, order, (1, 2))
+            assert_raises(ValueError, norm, A, order, None)
+            assert_raises(ValueError, norm, A, order, (0, 1))
+            assert_raises(ValueError, norm, B, order, (1, 2))
 
         # Invalid axis
-        self.assertRaises(ValueError, norm, B, None, 3)
-        self.assertRaises(ValueError, norm, B, None, (2, 3))
-        self.assertRaises(ValueError, norm, B, None, (0, 1, 2))
+        assert_raises(ValueError, norm, B, None, 3)
+        assert_raises(ValueError, norm, B, None, (2, 3))
+        assert_raises(ValueError, norm, B, None, (0, 1, 2))
 
 
 class TestNormDouble(_TestNorm):
@@ -751,7 +872,7 @@ def test_reduced_rank():
         assert_equal(matrix_rank(X), 8)
 
 
-class TestQR(TestCase):
+class TestQR(object):
 
     def check_qr(self, a):
         # This test expects the argument `a` to be an ndarray or
@@ -793,7 +914,7 @@ class TestQR(TestCase):
 
     def test_qr_empty(self):
         a = np.zeros((0, 2))
-        self.assertRaises(linalg.LinAlgError, linalg.qr, a)
+        assert_raises(linalg.LinAlgError, linalg.qr, a)
 
     def test_mode_raw(self):
         # The factorization is not unique and varies between libraries,
diff --git a/numpy/linalg/umath_linalg.c.src b/numpy/linalg/umath_linalg.c.src
index 3d9157dd9..3c2b316e2 100644
--- a/numpy/linalg/umath_linalg.c.src
+++ b/numpy/linalg/umath_linalg.c.src
@@ -819,9 +819,17 @@ linearize_@TYPE@_matrix(void *dst_in,
             (fortran_int)(data->column_strides/sizeof(@typ@));
         fortran_int one = 1;
         for (i=0; i< data->rows; i++) {
-            FNAME(@copy@)(&columns,
-                          (void*)src, &column_strides,
-                          (void*)dst, &one);
+            if (column_strides >= 0) {
+                FNAME(@copy@)(&columns,
+                              (void*)src, &column_strides,
+                              (void*)dst, &one);
+            }
+            else {
+                FNAME(@copy@)(&columns,
+                              (void*)((@typ@*)src + (columns-1)*column_strides),
+                              &column_strides,
+                              (void*)dst, &one);
+            }
             src += data->row_strides/sizeof(@typ@);
             dst += data->columns;
         }
@@ -847,9 +855,17 @@ delinearize_@TYPE@_matrix(void *dst_in,
             (fortran_int)(data->column_strides/sizeof(@typ@));
         fortran_int one = 1;
         for (i=0; i < data->rows; i++) {
-            FNAME(@copy@)(&columns,
-                          (void*)src, &one,
-                          (void*)dst, &column_strides);
+            if (column_strides >= 0) {
+                FNAME(@copy@)(&columns,
+                              (void*)src, &one,
+                              (void*)dst, &column_strides);
+            }
+            else {
+                FNAME(@copy@)(&columns,
+                              (void*)src, &one,
+                              (void*)((@typ@*)dst + (columns-1)*column_strides),
+                              &column_strides);
+            }
             src += data->columns;
             dst += data->row_strides/sizeof(@typ@);
         }
@@ -877,45 +893,6 @@ nan_@TYPE@_matrix(void *dst_in, const LINEARIZE_DATA_t* data)
     }
 }
 
-static inline void
-make_symmetric_@TYPE@_matrix(char UPLO,
-                             fortran_int n,
-                             void *matrix)
-{
-    /*
-       note: optimized assuming that sequential write is better than
-       sequential read
-    */
-    fortran_int i;
-    fortran_int one = 1;
-
-    /* note: 'L' and 'U' are interpreted considering *fortran* order */
-    if ('L' == UPLO) {
-        @typ@ *dst = (@typ@ *) matrix;
-        @typ@ *src = (@typ@ *) matrix;
-        src += 1;
-        dst += n;
-        for (i = 1; i < n; ++i) {
-            FNAME(@copy@)(&i,
-                          (void *)src, &n,
-                          (void *)dst, &one);
-            src += 1;
-            dst += n;
-        }
-    } else { /* assume U */
-        @typ@ *dst = (@typ@ *) matrix;
-        @typ@ *src = (@typ@ *) matrix;
-        src += n;
-        dst += 1;
-        for (i = n - 1; i > 0; --i) {
-            FNAME(@copy@)(&i,
-                          (void *)src, &n,
-                          (void *)dst, &one);
-            src += n + 1;
-            dst += n + 1;
-        }
-    }
-}
 /**end repeat**/
 
                /* identity square matrix generation */
@@ -947,19 +924,6 @@ identity_@TYPE@_matrix(void *ptr, size_t n)
    #typ=float,double,COMPLEX_t,DOUBLECOMPLEX_t#
    #cblas_type=s,d,c,z#
  */
-static inline void
-tril_@TYPE@_matrix(void *ptr, size_t n)
-{
-    size_t i,j;
-    @typ@ *matrix = (@typ@*)ptr;
-    matrix++;
-    for (i = n-1; i > 0; --i) {
-        for (j = 0; j < i; ++j) {
-            *matrix = @cblas_type@_zero;
-        }
-        matrix += n + 1;
-    }
-}
 
 static inline void
 triu_@TYPE@_matrix(void *ptr, size_t n)
@@ -976,337 +940,6 @@ triu_@TYPE@_matrix(void *ptr, size_t n)
 }
 /**end repeat**/
 
-/*
- *****************************************************************************
- **                             UFUNC LOOPS                                 **
- *****************************************************************************
- */
-
-/**begin repeat
-   #typ=npy_float,npy_double,COMPLEX_t,DOUBLECOMPLEX_t,COMPLEX_t,DOUBLECOMPLEX_t#
-   #ftyp=fortran_real,fortran_doublereal,fortran_complex,fortran_doublecomplex,fortran_complex,fortran_doublecomplex#
-   #add=FLOAT_add,DOUBLE_add,CFLOAT_add,CDOUBLE_add,CFLOAT_add,CDOUBLE_add#
-   #mul=FLOAT_mul,DOUBLE_mul,CFLOAT_mul,CDOUBLE_mul,CFLOAT_mulc,CDOUBLE_mulc#
-   #dot=sdot,ddot,cdotu,zdotu,cdotc,zdotc#
-   #zero=s_zero, d_zero,c_zero,z_zero,c_zero,z_zero#
- */
-
-static inline void
-@dot@_inner1d(char **args, npy_intp *dimensions, npy_intp *steps)
-{
-    const size_t sot = sizeof(@typ@);
-    int dim = (int)dimensions[0];
-    INIT_OUTER_LOOP_3
-    /*
-     * use blas if the stride is a multiple of datatype size in the inputs
-     * it should be the common case
-     */
-    if ((0 == (steps[3] % sot)) &&
-        (0 == (steps[4] % sot))) {
-        /* use blas */
-
-        int is1 = (int)(steps[0]/sot), is2 = (int)(steps[1]/sot);
-        BEGIN_OUTER_LOOP_3
-            @ftyp@ * ip1 = (@ftyp@*)args[0], *ip2 = (@ftyp@*)args[1];
-            *(@ftyp@*)(args[2]) = BLAS(@dot@)(&dim, ip1, &is1, ip2, &is2);
-        END_OUTER_LOOP
-    } else {
-        /* use standard version */
-        npy_intp is1=steps[0], is2=steps[1];
-        BEGIN_OUTER_LOOP_3
-            int i;
-            char *ip1=args[0], *ip2=args[1], *op=args[2];
-            @typ@ sum = @zero@;
-            for (i = 0; i < dim; i++) {
-                @typ@ prod = @mul@(*(@typ@ *)ip1, *(@typ@*)ip2);
-                sum = @add@(sum, prod);
-                ip1 += is1;
-                ip2 += is2;
-            }
-            *(@typ@ *)op = sum;
-        END_OUTER_LOOP
-    }
-}
-
-/**end repeat**/
-
-/**begin repeat
-   #TYPE=FLOAT,DOUBLE,CFLOAT,CDOUBLE#
-   #dot=sdot,ddot,cdotu,zdotu#
-   #dotc=sdot,ddot,cdotc,zdotc#
- */
-static void
-@TYPE@_inner1d(char **args, npy_intp *dimensions, npy_intp *steps,
-               void *NPY_UNUSED(func))
-{
-    @dot@_inner1d(args, dimensions, steps);
-}
-
-static void
-@TYPE@_dotc1d(char **args, npy_intp *dimensions, npy_intp *steps,
-              void *NPY_UNUSED(func))
-{
-    @dotc@_inner1d(args, dimensions, steps);
-}
-/**end repeat**/
-
-/* -------------------------------------------------------------------------- */
-
-/**begin repeat
-   #TYPE=FLOAT,DOUBLE,CFLOAT,CDOUBLE#
-   #typ=npy_float,npy_double,COMPLEX_t,DOUBLECOMPLEX_t#
-   #add=FLOAT_add,DOUBLE_add,CFLOAT_add,CDOUBLE_add#
-   #mul=FLOAT_mul,DOUBLE_mul,CFLOAT_mul,CDOUBLE_mul#
-   #zero=s_zero, d_zero,c_zero,z_zero#
-*/
-
-/*
- *  This implements the function
- *        out[n] = sum_i { in1[n, i] * in2[n, i] * in3[n, i] }.
- */
-
-static void
-@TYPE@_innerwt(char **args,
-               npy_intp *dimensions,
-               npy_intp *steps,
-               void *NPY_UNUSED(func))
-{
-    INIT_OUTER_LOOP_4
-    npy_intp di = dimensions[0];
-    npy_intp i;
-    npy_intp is1=steps[0], is2=steps[1], is3=steps[2];
-    BEGIN_OUTER_LOOP_4
-        char *ip1=args[0], *ip2=args[1], *ip3=args[2], *op=args[3];
-        @typ@ sum = @zero@;
-        for (i = 0; i < di; i++) {
-            @typ@ prod = @mul@(@mul@(*(@typ@*)ip1, *(@typ@*)ip2),*(@typ@*)ip3);
-            sum = @add@(sum, prod);
-            ip1 += is1;
-            ip2 += is2;
-            ip3 += is3;
-        }
-        *(@typ@ *)op = sum;
-    END_OUTER_LOOP
-}
-
-/**end repeat**/
-
-
-/* -------------------------------------------------------------------------- */
-                        /* Matrix Multiply */
-
-typedef struct gemm_params_struct
-{
-    void *A;
-    void *B;
-    void *C;
-    fortran_int LDA;
-    fortran_int LDB;
-    fortran_int LDC;
-    fortran_int M;
-    fortran_int K;
-    fortran_int N;
-    char TRANSA;
-    char TRANSB;
-
-    void *allocated_data;
-} GEMM_PARAMS_t;
-
-static inline void
-dump_gemm_params(const char* name, const GEMM_PARAMS_t* params)
-{
-    TRACE_TXT("\n%s\n"
-              "%14s: %18p\n" "%14s: %18p\n" "%14s: %18p\n"
-              "%14s: %18d\n" "%14s: %18d\n" "%14s: %18d\n"
-              "%14s: %18d\n" "%14s: %18d\n" "%14s: %18d\n"
-              "%14s: %15c'%c'\n" "%14s: %15c'%c'\n",
-              name,
-              "A", params->A, "B", params->B, "C", params->C,
-              "LDA", params->LDA, "LDB", params->LDB, "LDC", params->LDC,
-              "M", params->M, "K", params->K, "N", params->N,
-              "TRANSA", ' ', params->TRANSA, "TRANSB", ' ', params->TRANSB);
-}
-
-
-typedef struct _matrix_desc {
-    int need_buff;
-    fortran_int lead;
-    size_t size;
-    void *buff;
-} matrix_desc;
-
-static inline void
-matrix_desc_init(matrix_desc *mtxd,
-                 npy_intp* steps,
-                 size_t sot,
-                 fortran_int rows,
-                 fortran_int columns)
-/* initialized a matrix desc based on the information in steps
-   it will fill the matrix desc with the values needed.
-   steps[0] contains column step
-   steps[1] contains row step
-*/
-{
-    /* if column step is not element step, a copy will be needed
-       to arrange the array in a way compatible with blas
-    */
-    mtxd->need_buff = steps[0] != sot;
-
-    if (!mtxd->need_buff) {
-        if (steps[1]) {
-            mtxd->lead = (fortran_int) steps[1] / sot;
-        } else {
-            /* step is 0, this means either it is only 1 column or
-               there is a step trick around*/
-            if (columns > 1) {
-                /* step tricks not supported in gemm... make a copy */
-                mtxd->need_buff = 1;
-            } else {
-                /* lead must be at least 1 */
-                mtxd->lead = rows;
-            }
-        }
-    }
-
-    /* if a copy is performed, the lead is the number of columns */
-    if (mtxd->need_buff) {
-        mtxd->lead = rows;
-    }
-
-    mtxd->size = rows*columns*sot*mtxd->need_buff;
-    mtxd->buff = NULL;
-}
-
-static inline npy_uint8 *
-matrix_desc_assign_buff(matrix_desc* mtxd, npy_uint8 *p)
-{
-    if (mtxd->need_buff) {
-        mtxd->buff = p;
-        return p + mtxd->size;
-    } else {
-        return p;
-    }
-}
-
-static inline int
-init_gemm_params(GEMM_PARAMS_t *params,
-                 fortran_int m,
-                 fortran_int k,
-                 fortran_int n,
-                 npy_intp* steps,
-                 size_t sot)
-{
-    npy_uint8 *mem_buff = NULL;
-    matrix_desc a, b, c;
-    matrix_desc_init(&a, steps + 0, sot, m, k);
-    matrix_desc_init(&b, steps + 2, sot, k, n);
-    matrix_desc_init(&c, steps + 4, sot, m, n);
-
-    if (a.size + b.size + c.size)
-    {
-        npy_uint8 *current;
-        /* need to allocate some buffer */
-        mem_buff = malloc(a.size + b.size + c.size);
-        if (!mem_buff)
-            goto error;
-
-        current = mem_buff;
-        current = matrix_desc_assign_buff(&a, current);
-        current = matrix_desc_assign_buff(&b, current);
-        current = matrix_desc_assign_buff(&c, current);
-    }
-
-    params->A = a.buff;
-    params->B = b.buff;
-    params->C = c.buff;
-    params->LDA = a.lead;
-    params->LDB = b.lead;
-    params->LDC = c.lead;
-    params->M = m;
-    params->N = n;
-    params->K = k;
-    params->TRANSA='N';
-    params->TRANSB='N';
-
-    params->allocated_data = mem_buff;
-
-    return 1;
- error:
-    free(mem_buff);
-    memset(params, 0, sizeof(*params));
-    return 0;
-}
-
-static inline void
-release_gemm_params(GEMM_PARAMS_t * params)
-{
-    free(params->allocated_data);
-    params->allocated_data = NULL;
-}
-
-
-/**begin repeat
-   #TYPE=FLOAT,DOUBLE,CFLOAT,CDOUBLE#
-   #typ=npy_float,npy_double,npy_cfloat,npy_cdouble#
-   #one=s_one,d_one,c_one.array,z_one.array#
-   #zero=s_zero,d_zero,c_zero.array,z_zero.array#
-   #GEMM=sgemm,dgemm,cgemm,zgemm#
-*/
-
-static void
-@TYPE@_matrix_multiply(char **args,
-                       npy_intp *dimensions,
-                       npy_intp *steps,
-                       void *NPY_UNUSED(func))
-{
-    /*
-     * everything is setup in a way that makes things work. BLAS gemm can be
-     * be called without rearranging nor using weird stuff, as matrices are
-     * in the expected way in memory.
-     * This is just a loop calling blas.
-     */
-    GEMM_PARAMS_t params;
-    fortran_int m, k, n;
-
-    INIT_OUTER_LOOP_3
-
-    m = (fortran_int) dimensions[0];
-    k = (fortran_int) dimensions[1];
-    n = (fortran_int) dimensions[2];
-
-    if (init_gemm_params(&params, m, k, n, steps, sizeof(@typ@)))
-    {
-        LINEARIZE_DATA_t a_in, b_in, c_out;
-
-        if (params.A)
-            init_linearize_data(&a_in, k, m, steps[1], steps[0]);
-        if (params.B)
-            init_linearize_data(&b_in, n, k, steps[3], steps[2]);
-        if (params.C)
-            init_linearize_data(&c_out, n, m, steps[5], steps[4]);
-
-        BEGIN_OUTER_LOOP_3
-            /* just call the appropriate multiply and update pointers */
-            @typ@ *A = linearize_@TYPE@_matrix(params.A, args[0], &a_in);
-            @typ@ *B = linearize_@TYPE@_matrix(params.B, args[1], &b_in);
-            @typ@ *C = params.C ? (@typ@*)params.C : (@typ@ *) args[2];
-
-            /* linearize source operands if needed */
-            FNAME(@GEMM@)(&params.TRANSA, &params.TRANSB,
-                          &params.M, &params.N, &params.K,
-                          (void*)&@one@, // alpha
-                          (void*)A, &params.LDA,
-                          (void*)B, &params.LDB,
-                          (void*)&@zero@, // beta
-                          (void*)C, &params.LDC);
-            delinearize_@TYPE@_matrix(args[2], params.C, &c_out);
-        END_OUTER_LOOP
-
-        release_gemm_params(&params);
-    }
-}
-/**end repeat**/
-
 
 /* -------------------------------------------------------------------------- */
                           /* Determinants */
@@ -2152,6 +1785,8 @@ static void
     fortran_int n;
     INIT_OUTER_LOOP_2
 
+    assert(uplo == 'L');
+
     n = (fortran_int)dimensions[0];
     if (init_@lapack_func@(&params, uplo, n))
     {
@@ -2177,21 +1812,12 @@ static void
 }
 
 static void
-@TYPE@_cholesky_up(char **args, npy_intp *dimensions, npy_intp *steps,
-                void *NPY_UNUSED(func))
-{
-    @TYPE@_cholesky('U', args, dimensions, steps);
-}
-
-static void
 @TYPE@_cholesky_lo(char **args, npy_intp *dimensions, npy_intp *steps,
                 void *NPY_UNUSED(func))
 {
     @TYPE@_cholesky('L', args, dimensions, steps);
 }
 
-
-
 /**end repeat**/
 
 /* -------------------------------------------------------------------------- */
@@ -2590,6 +2216,8 @@ static inline void
     int error_occurred = get_fp_invalid_and_clear();
     GEEV_PARAMS_t geev_params;
 
+    assert(JOBVL == 'N');
+
     STACK_TRACE;
     op_count += 'V'==JOBVL?1:0;
     op_count += 'V'==JOBVR?1:0;
@@ -3140,581 +2768,6 @@ static void
 
 /**end repeat**/
 
-/* -------------------------------------------------------------------------- */
-               /* some basic ufuncs */
-
-/**begin repeat
-   #TYPE=FLOAT,DOUBLE,CFLOAT,CDOUBLE#
-   #typ=float,double,COMPLEX_t,DOUBLECOMPLEX_t#
- */
-
-static void
-@TYPE@_add3(char **args,
-            npy_intp *dimensions,
-            npy_intp* steps,
-            void *NPY_UNUSED(func))
-{
-    INIT_OUTER_LOOP_4
-    BEGIN_OUTER_LOOP_4
-        @typ@ *op1p = (@typ@ *) args[0];
-        @typ@ *op2p = (@typ@ *) args[1];
-        @typ@ *op3p = (@typ@ *) args[2];
-        @typ@ *op4p = (@typ@ *) args[3];
-        @typ@ tmp;
-        tmp   = @TYPE@_add(*op1p, *op2p);
-        *op4p = @TYPE@_add(tmp, *op3p);
-    END_OUTER_LOOP
-}
-
-static void
-@TYPE@_multiply3(char **args,
-                 npy_intp *dimensions,
-                 npy_intp* steps,
-                 void *NPY_UNUSED(func))
-{
-    INIT_OUTER_LOOP_4
-    BEGIN_OUTER_LOOP_4
-        @typ@ *op1p = (@typ@ *) args[0];
-        @typ@ *op2p = (@typ@ *) args[1];
-        @typ@ *op3p = (@typ@ *) args[2];
-        @typ@ *op4p = (@typ@ *) args[3];
-        @typ@ tmp;
-        tmp   = @TYPE@_mul(*op1p, *op2p);
-        *op4p = @TYPE@_mul(tmp, *op3p);
-    END_OUTER_LOOP
-}
-
-static void
-@TYPE@_multiply3_add(char **args,
-                     npy_intp *dimensions,
-                     npy_intp* steps,
-                     void *NPY_UNUSED(func))
-{
-    INIT_OUTER_LOOP_5
-    BEGIN_OUTER_LOOP_5
-        @typ@ *op1p = (@typ@ *) args[0];
-        @typ@ *op2p = (@typ@ *) args[1];
-        @typ@ *op3p = (@typ@ *) args[2];
-        @typ@ *op4p = (@typ@ *) args[3];
-        @typ@ *op5p = (@typ@ *) args[4];
-        @typ@ tmp, tmp2;
-
-        tmp   = @TYPE@_mul(*op1p, *op2p);
-        tmp2  = @TYPE@_mul(tmp, *op3p);
-        *op5p = @TYPE@_add(tmp2, *op4p);
-    END_OUTER_LOOP
-}
-
-static void
-@TYPE@_multiply_add(char **args,
-                    npy_intp *dimensions,
-                    npy_intp* steps,
-                    void *NPY_UNUSED(func))
-{
-    INIT_OUTER_LOOP_4
-    BEGIN_OUTER_LOOP_4
-        @typ@ *op1p = (@typ@ *) args[0];
-        @typ@ *op2p = (@typ@ *) args[1];
-        @typ@ *op3p = (@typ@ *) args[2];
-        @typ@ *op4p = (@typ@ *) args[3];
-        @typ@ tmp;
-        tmp   = @TYPE@_mul(*op1p, *op2p);
-        *op4p = @TYPE@_add(tmp, *op3p);
-    END_OUTER_LOOP
-}
-
-static void
-@TYPE@_multiply_add2(char **args,
-                     npy_intp *dimensions,
-                     npy_intp* steps,
-                     void *NPY_UNUSED(func))
-{
-    INIT_OUTER_LOOP_5
-    BEGIN_OUTER_LOOP_5
-        @typ@ *op1p = (@typ@ *) args[0];
-        @typ@ *op2p = (@typ@ *) args[1];
-        @typ@ *op3p = (@typ@ *) args[2];
-        @typ@ *op4p = (@typ@ *) args[3];
-        @typ@ *op5p = (@typ@ *) args[4];
-        @typ@ tmp, tmp2;
-        tmp   = @TYPE@_mul(*op1p, *op2p);
-        tmp2  = @TYPE@_add(*op3p, *op4p);
-        *op5p = @TYPE@_add(tmp, tmp2);
-    END_OUTER_LOOP
-}
-
-static void
-@TYPE@_multiply4(char **args,
-                 npy_intp *dimensions,
-                 npy_intp* steps,
-                 void *NPY_UNUSED(func))
-{
-    INIT_OUTER_LOOP_5
-    BEGIN_OUTER_LOOP_5
-        @typ@ *op1p = (@typ@ *) args[0];
-        @typ@ *op2p = (@typ@ *) args[1];
-        @typ@ *op3p = (@typ@ *) args[2];
-        @typ@ *op4p = (@typ@ *) args[3];
-        @typ@ *op5p = (@typ@ *) args[4];
-        @typ@ tmp, tmp2;
-        tmp   = @TYPE@_mul(*op1p, *op2p);
-        tmp2  = @TYPE@_mul(*op3p, *op4p);
-        *op5p = @TYPE@_mul(tmp, tmp2);
-    END_OUTER_LOOP
-}
-
-static void
-@TYPE@_multiply4_add(char **args,
-                     npy_intp *dimensions,
-                     npy_intp* steps,
-                     void *NPY_UNUSED(func))
-{
-    INIT_OUTER_LOOP_6
-    BEGIN_OUTER_LOOP_6
-        @typ@ *op1p = (@typ@ *) args[0];
-        @typ@ *op2p = (@typ@ *) args[1];
-        @typ@ *op3p = (@typ@ *) args[2];
-        @typ@ *op4p = (@typ@ *) args[3];
-        @typ@ *op5p = (@typ@ *) args[4];
-        @typ@ *op6p = (@typ@ *) args[5];
-        @typ@ tmp, tmp2, tmp3;
-        tmp   = @TYPE@_mul(*op1p, *op2p);
-        tmp2  = @TYPE@_mul(*op3p, *op4p);
-        tmp3  = @TYPE@_mul(tmp, tmp2);
-        *op6p = @TYPE@_add(tmp3, *op5p);
-    END_OUTER_LOOP
-}
-
-/**end repeat**/
-
-/* -------------------------------------------------------------------------- */
-             /* quadratic form */
-
-/**begin repeat
-   #TYPE=FLOAT,DOUBLE,CFLOAT,CDOUBLE#
-   #typ=float,double,COMPLEX_t,DOUBLECOMPLEX_t#
-   #ftyp=fortran_real,fortran_doublereal,fortran_complex,fortran_doublecomplex#
-   #zero=s_zero,d_zero,c_zero,z_zero#
-   #blas_dot=FNAME(sdot),FNAME(ddot),FNAME(cdotu),FNAME(zdotu)#
- */
-
-static inline @typ@
-@TYPE@_dot_blas(size_t count,
-                void *X, intptr_t X_byte_step,
-                void *Y, intptr_t Y_byte_step)
-{
-    @ftyp@ result;
-    fortran_int N = (fortran_int) count;
-    fortran_int INCX = X_byte_step/sizeof(@ftyp@);
-    fortran_int INCY = Y_byte_step/sizeof(@ftyp@);
-
-    result = @blas_dot@(&N, X, &INCX, Y, &INCY);
-
-    return *(@typ@ *)&result;
-}
-
-static inline @typ@
-@TYPE@_dot_std(size_t count,
-               void *X, intptr_t X_byte_step,
-               void *Y, intptr_t Y_byte_step)
-{
-    size_t i;
-    @typ@ acc, *x_ptr, *y_ptr;
-    x_ptr = (@typ@ *)X;
-    y_ptr = (@typ@ *)Y;
-
-    acc = @TYPE@_mul(*x_ptr, *y_ptr);
-    x_ptr = offset_ptr(x_ptr, X_byte_step);
-    y_ptr = offset_ptr(y_ptr, Y_byte_step);
-
-    for (i = 1; i < count; i++) {
-        @typ@ tmp;
-
-        tmp = @TYPE@_mul(*x_ptr, *y_ptr);
-        acc = @TYPE@_add(acc, tmp);
-
-        x_ptr = offset_ptr(x_ptr, X_byte_step);
-        y_ptr = offset_ptr(y_ptr, Y_byte_step);
-    }
-
-    return acc;
-}
-
-/* uQv, where u and v are vectors and Q is a matrix */
-static void
-@TYPE@_quadratic_form(char **args,
-                      npy_intp *dimensions,
-                      npy_intp* steps,
-                      void *NPY_UNUSED(func))
-{
-    INIT_OUTER_LOOP_4
-    size_t m = (size_t)dimensions[0];
-    size_t n = (size_t)dimensions[1];
-    ptrdiff_t u_step = (ptrdiff_t)steps[0];
-    ptrdiff_t Q_row_step = (ptrdiff_t)steps[1];
-    ptrdiff_t Q_column_step = (ptrdiff_t)steps[2];
-    ptrdiff_t v_step = (ptrdiff_t)steps[3];
-
-    if ((0 == (Q_row_step % sizeof(@ftyp@))) &&
-        (0 == (u_step % sizeof(@ftyp@)))) {
-        /* use blas loops for dot */
-        BEGIN_OUTER_LOOP_4
-            size_t column;
-            npy_uint8 *u, *Q, *v;
-            @typ@ *r;
-            @typ@ accum = @zero@;
-
-            u = (npy_uint8 *)args[0]; /* u (m)   [in] */
-            Q = (npy_uint8 *)args[1]; /* Q (m,n) [in] */
-            v = (npy_uint8 *)args[2]; /* v (n)   [in] */
-            r = (@typ@ *)args[3];   /* result  [out] */
-            /* sum (compute u * Q[i] * v[i]) for all i.
-               Q[i] are the different columns on Q */
-
-            for (column = 0; column < n; ++column) {
-                @typ@ tmp, tmp2;
-                tmp = @TYPE@_dot_blas(m,
-                                      Q, Q_row_step,
-                                      u, u_step);
-                tmp2  = @TYPE@_mul(tmp, *(@typ@*)v);
-                accum = @TYPE@_add(accum, tmp2);
-                Q += Q_column_step;
-                v += v_step;
-            }
-
-            *r = accum;
-        END_OUTER_LOOP
-    } else {
-        BEGIN_OUTER_LOOP_4
-            size_t column;
-            npy_uint8 *u, *Q, *v;
-            @typ@ *r;
-            @typ@ accum = @zero@;
-            u = (npy_uint8 *)args[0]; /* u (m)   [in] */
-            Q = (npy_uint8 *)args[1]; /* Q (m,n) [in] */
-            v = (npy_uint8 *)args[2]; /* v (n)   [in] */
-            r = (@typ@ *)args[3];   /* result  [out] */
-            /* sum (compute u * Q[i] * v[i]) for all i.
-               Q[i] are the different columns on Q */
-
-            for (column = 0; column < n; ++column) {
-                @typ@ tmp, tmp2;
-                tmp = @TYPE@_dot_std(m, Q, Q_row_step, u, u_step);
-                tmp2 = @TYPE@_mul(tmp, *(@typ@*)v);
-                accum = @TYPE@_add(accum, tmp2);
-                Q += Q_column_step;
-                v += v_step;
-            }
-
-            *r = accum;
-        END_OUTER_LOOP
-    }
-}
-/**end repeat**/
-
-/* -------------------------------------------------------------------------- */
-             /* chosolve (using potrs) */
-
-typedef struct potrs_params_struct {
-    void *A;
-    void *B;
-    fortran_int N;
-    fortran_int NRHS;
-    fortran_int LDA;
-    fortran_int LDB;
-    char UPLO;
-} POTRS_PARAMS_t;
-
-/**begin repeat
-   #TYPE=FLOAT,DOUBLE,CFLOAT,CDOUBLE#
-   #ftyp=float,double,COMPLEX_t,DOUBLECOMPLEX_t#
-   #lapack_func_2=spotrf,dpotrf,cpotrf,zpotrf#
-   #lapack_func=spotrs,dpotrs,cpotrs,zpotrs#
- */
-
-
-static inline int
-init_@lapack_func@(POTRS_PARAMS_t *params,
-                   char UPLO,
-                   fortran_int N,
-                   fortran_int NRHS)
-{
-    npy_uint8 *mem_buff = NULL;
-    npy_uint8 *a, *b;
-    size_t a_size, b_size;
-
-    a_size = N*N*sizeof(@ftyp@);
-    b_size = N*NRHS*sizeof(@ftyp@);
-    mem_buff = malloc(a_size + b_size);
-    if (!mem_buff)
-      goto error;
-
-    a = mem_buff;
-    b = a + a_size;
-
-    params->A = (void*)a;
-    params->B = (void*)b;
-    params->N = N;
-    params->NRHS = NRHS;
-    params->LDA = N;
-    params->LDB = N;
-    params->UPLO = UPLO;
-
-    return 1;
-
- error:
-    free(mem_buff);
-    memset(params, 0, sizeof(*params));
-
-    return 0;
-}
-
-static inline void
-release_@lapack_func@(POTRS_PARAMS_t *params)
-{
-    free(params->A);
-    memset(params,0,sizeof(*params));
-    return;
-}
-
-static inline int
-call_@lapack_func@(POTRS_PARAMS_t *params)
-{
-    fortran_int rv;
-    LAPACK(@lapack_func_2@)(&params->UPLO,
-                            &params->N,
-                            params->A, &params->LDA,
-                            &rv);
-    if (0 != rv)
-        return rv;
-
-    LAPACK(@lapack_func@)(&params->UPLO,
-                          &params->N, &params->NRHS,
-                          params->A, &params->LDA,
-                          params->B, &params->LDB,
-                          &rv);
-    return rv;
-}
-
-
-/* uplo: either 'U' or 'L'
-   ndim: use 1 to get nrhs from dimensions (matrix), 0 to use 1 (vector)
- */
-static void
-@TYPE@_chosolve(char uplo, char ndim,
-                char **args,
-                npy_intp *dimensions,
-                npy_intp* steps
-                )
-{
-    POTRS_PARAMS_t params;
-    int error_occurred = get_fp_invalid_and_clear();
-    fortran_int n, nrhs;
-    INIT_OUTER_LOOP_3
-
-    n = (fortran_int)dimensions[0];
-    nrhs = ndim?(fortran_int)dimensions[1]:1;
-    if (init_@lapack_func@(&params, uplo, n, nrhs)) {
-        LINEARIZE_DATA_t a_in, b_in, r_out;
-
-        init_linearize_data(&a_in, n, n, steps[1], steps[0]);
-        if (ndim) {
-            init_linearize_data(&b_in,  nrhs, n, steps[3], steps[2]);
-            init_linearize_data(&r_out, nrhs, n, steps[5], steps[4]);
-        } else {
-            init_linearize_data(&b_in,  1, n, 0, steps[2]);
-            init_linearize_data(&r_out, 1, n, 0, steps[3]);
-        }
-
-        BEGIN_OUTER_LOOP_3
-            int not_ok;
-            linearize_@TYPE@_matrix(params.A, args[0], &a_in);
-            linearize_@TYPE@_matrix(params.B, args[1], &b_in);
-            not_ok = call_@lapack_func@(&params);
-            if (!not_ok) {
-                delinearize_@TYPE@_matrix(args[2], params.B, &r_out);
-            } else {
-                error_occurred = 1;
-                nan_@TYPE@_matrix(args[2], &r_out);
-            }
-        END_OUTER_LOOP
-
-        release_@lapack_func@(&params);
-    }
-
-    set_fp_invalid_or_clear(error_occurred);
-}
-
-static void
-@TYPE@_chosolve_up(char **args,
-                   npy_intp *dimensions,
-                   npy_intp* steps,
-                   void *NPY_UNUSED(func))
-{
-    @TYPE@_chosolve('U', 1, args, dimensions, steps);
-}
-
-static void
-@TYPE@_chosolve_lo(char **args,
-                   npy_intp *dimensions,
-                   npy_intp* steps,
-                   void *NPY_UNUSED(func))
-{
-    @TYPE@_chosolve('L', 1, args, dimensions, steps);
-}
-
-static void
-@TYPE@_chosolve1_up(char **args,
-                    npy_intp *dimensions,
-                    npy_intp* steps,
-                    void *NPY_UNUSED(func))
-{
-    @TYPE@_chosolve('U', 0, args, dimensions, steps);
-}
-
-static void
-@TYPE@_chosolve1_lo(char **args,
-                    npy_intp *dimensions,
-                    npy_intp* steps,
-                    void *NPY_UNUSED(func))
-{
-    @TYPE@_chosolve('L', 0, args, dimensions, steps);
-}
-
-/**end repeat**/
-
-/* -------------------------------------------------------------------------- */
-                 /* poinv  */
-
-typedef struct potri_params_struct {
-    void *A;
-    fortran_int N;
-    fortran_int LDA;
-    char UPLO;
-} POTRI_PARAMS_t;
-
-
-/**begin repeat
-   #TYPE=FLOAT,DOUBLE,CFLOAT,CDOUBLE#
-   #ftyp=float,double,COMPLEX_t,DOUBLECOMPLEX_t#
-   #potrf=spotrf,dpotrf,cpotrf,zpotrf#
-   #potri=spotri,dpotri,cpotri,zpotri#
- */
-static inline int
-init_@potri@(POTRI_PARAMS_t *params,
-             char UPLO,
-             fortran_int N)
-{
-    npy_uint8 *mem_buff = NULL;
-    npy_uint8 *a;
-    size_t a_size;
-
-    a_size = N*N*sizeof(@ftyp@);
-    mem_buff = malloc(a_size);
-    if (!mem_buff)
-      goto error;
-
-    a = mem_buff;
-
-    params->A = (void*)a;
-    params->N = N;
-    params->LDA = N;
-    params->UPLO = UPLO;
-
-    return 1;
-
- error:
-    free(mem_buff);
-    memset(params, 0, sizeof(*params));
-
-    return 0;
-}
-
-static inline void
-release_@potri@(POTRI_PARAMS_t *params)
-{
-    free(params->A);
-    memset(params,0,sizeof(*params));
-    return;
-}
-
-static inline int
-call_@potri@(POTRI_PARAMS_t *params)
-{
-    fortran_int rv;
-    LAPACK(@potrf@)(&params->UPLO,
-                    &params->N,
-                    params->A, &params->LDA,
-                    &rv);
-    if (0!= rv) {
-      return rv;
-    }
-
-    LAPACK(@potri@)(&params->UPLO,
-                    &params->N,
-                    params->A, &params->LDA,
-                    &rv);
-    return rv;
-}
-
-
-static inline void
-@TYPE@_poinv(char uplo,
-             char **args,
-             npy_intp *dimensions,
-             npy_intp *steps)
-{
-    POTRI_PARAMS_t params;
-    int error_occurred = get_fp_invalid_and_clear();
-    fortran_int n;
-    INIT_OUTER_LOOP_2
-
-    n = (fortran_int)dimensions[0];
-    if (init_@potri@(&params, uplo, n)) {
-        LINEARIZE_DATA_t a_in, r_out;
-
-        init_linearize_data(&a_in, n, n, steps[1], steps[0]);
-        init_linearize_data(&r_out, n, n, steps[3], steps[2]);
-
-        BEGIN_OUTER_LOOP_2
-            int not_ok;
-            linearize_@TYPE@_matrix(params.A, args[0], &a_in);
-            not_ok = call_@potri@(&params);
-            if (!not_ok) {
-                make_symmetric_@TYPE@_matrix(params.UPLO, params.N, params.A);
-                delinearize_@TYPE@_matrix(args[1], params.A, &r_out);
-            } else {
-                error_occurred = 1;
-                nan_@TYPE@_matrix(args[1], &r_out);
-            }
-        END_OUTER_LOOP
-
-        release_@potri@(&params);
-    }
-
-    set_fp_invalid_or_clear(error_occurred);
-}
-
-static void
-@TYPE@_poinv_up(char **args,
-                npy_intp *dimensions,
-                npy_intp *steps,
-                void *NPY_UNUSED(func))
-{
-    @TYPE@_poinv('U', args, dimensions, steps);
-}
-
-static void
-@TYPE@_poinv_lo(char **args,
-                npy_intp *dimensions,
-                npy_intp *steps,
-                void *NPY_UNUSED(func))
-{
-    @TYPE@_poinv('L', args, dimensions, steps);
-}
-
-/**end repeat**/
-
 #pragma GCC diagnostic pop
 
 /* -------------------------------------------------------------------------- */
@@ -3772,10 +2825,6 @@ static void *array_of_nulls[] = {
     }
 
 
-GUFUNC_FUNC_ARRAY_REAL_COMPLEX(inner1d);
-GUFUNC_FUNC_ARRAY_REAL_COMPLEX(dotc1d);
-GUFUNC_FUNC_ARRAY_REAL_COMPLEX(innerwt);
-GUFUNC_FUNC_ARRAY_REAL_COMPLEX(matrix_multiply);
 GUFUNC_FUNC_ARRAY_REAL_COMPLEX(slogdet);
 GUFUNC_FUNC_ARRAY_REAL_COMPLEX(det);
 GUFUNC_FUNC_ARRAY_REAL_COMPLEX(eighlo);
@@ -3785,27 +2834,12 @@ GUFUNC_FUNC_ARRAY_REAL_COMPLEX(eigvalshup);
 GUFUNC_FUNC_ARRAY_REAL_COMPLEX(solve);
 GUFUNC_FUNC_ARRAY_REAL_COMPLEX(solve1);
 GUFUNC_FUNC_ARRAY_REAL_COMPLEX(inv);
-GUFUNC_FUNC_ARRAY_REAL_COMPLEX(cholesky_up);
 GUFUNC_FUNC_ARRAY_REAL_COMPLEX(cholesky_lo);
 GUFUNC_FUNC_ARRAY_REAL_COMPLEX(svd_N);
 GUFUNC_FUNC_ARRAY_REAL_COMPLEX(svd_S);
 GUFUNC_FUNC_ARRAY_REAL_COMPLEX(svd_A);
 GUFUNC_FUNC_ARRAY_EIG(eig);
 GUFUNC_FUNC_ARRAY_EIG(eigvals);
-GUFUNC_FUNC_ARRAY_REAL_COMPLEX(quadratic_form);
-GUFUNC_FUNC_ARRAY_REAL_COMPLEX(add3);
-GUFUNC_FUNC_ARRAY_REAL_COMPLEX(multiply3);
-GUFUNC_FUNC_ARRAY_REAL_COMPLEX(multiply3_add);
-GUFUNC_FUNC_ARRAY_REAL_COMPLEX(multiply_add);
-GUFUNC_FUNC_ARRAY_REAL_COMPLEX(multiply_add2);
-GUFUNC_FUNC_ARRAY_REAL_COMPLEX(multiply4);
-GUFUNC_FUNC_ARRAY_REAL_COMPLEX(multiply4_add);
-GUFUNC_FUNC_ARRAY_REAL_COMPLEX(chosolve_up);
-GUFUNC_FUNC_ARRAY_REAL_COMPLEX(chosolve_lo);
-GUFUNC_FUNC_ARRAY_REAL_COMPLEX(chosolve1_up);
-GUFUNC_FUNC_ARRAY_REAL_COMPLEX(chosolve1_lo);
-GUFUNC_FUNC_ARRAY_REAL_COMPLEX(poinv_up);
-GUFUNC_FUNC_ARRAY_REAL_COMPLEX(poinv_lo);
 
 static char equal_2_types[] = {
     NPY_FLOAT, NPY_FLOAT,
@@ -3903,43 +2937,6 @@ typedef struct gufunc_descriptor_struct {
 
 GUFUNC_DESCRIPTOR_t gufunc_descriptors [] = {
     {
-        "inner1d",
-        "(i),(i)->()",
-        "inner on the last dimension and broadcast on the rest \n"\
-        "    \"(i),(i)->()\" \n",
-        4, 2, 1,
-        FUNC_ARRAY_NAME(inner1d),
-        equal_3_types
-    },
-    {
-        "dotc1d",
-        "(i),(i)->()",
-        "inner on the last dimension and broadcast on the rest \n"\
-        "    \"(i),(i)->()\" \n",
-        4, 2, 1,
-        FUNC_ARRAY_NAME(dotc1d),
-        equal_3_types
-    },
-    {
-        "innerwt",
-        "(i),(i),(i)->()",
-        "inner on the last dimension using 3 operands and broadcast on the"\
-        " rest \n"\
-        "    \"(i),(i),(i)->()\" \n",
-        2, 3, 1,
-        FUNC_ARRAY_NAME(innerwt),
-        equal_4_types
-    },
-    {
-        "matrix_multiply",
-        "(m,k),(k,n)->(m,n)",
-        "dot on the last two dimensions and broadcast on the rest \n"\
-        "    \"(m,k),(k,n)->(m,n)\" \n",
-        4, 2, 1,
-        FUNC_ARRAY_NAME(matrix_multiply),
-        equal_3_types
-    },
-    {
         "slogdet",
         "(m,m)->(),()",
         "slogdet on the last two dimensions and broadcast on the rest. \n"\
@@ -4041,16 +3038,6 @@ GUFUNC_DESCRIPTOR_t gufunc_descriptors [] = {
         equal_2_types
     },
     {
-        "cholesky_up",
-        "(m,m)->(m,m)",
-        "cholesky decomposition of hermitian positive-definite matrices. \n"\
-        "Broadcast to all outer dimensions. \n"\
-        "    \"(m,m)->(m,m)\" \n",
-        4, 1, 1,
-        FUNC_ARRAY_NAME(cholesky_up),
-        equal_2_types
-    },
-    {
         "cholesky_lo",
         "(m,m)->(m,m)",
         "cholesky decomposition of hermitian positive-definite matrices. \n"\
@@ -4129,128 +3116,6 @@ GUFUNC_DESCRIPTOR_t gufunc_descriptors [] = {
         FUNC_ARRAY_NAME(eigvals),
         eigvals_types
     },
-    {
-        "quadratic_form",
-        "(m),(m,n),(n)->()",
-        "computes the quadratic form uQv in the inner dimensions, broadcast"\
-        "to the rest \n"\
-        "Results in the result of uQv for each element of the broadcasted"\
-        "dimensions. \n"\
-        "    \"(m),(m,n),(n)->()",
-        4,3,1,
-        FUNC_ARRAY_NAME(quadratic_form),
-        equal_4_types
-    },
-    {
-        "add3",
-        "(),(),()->()",
-        "3-way element-wise addition. a,b,c -> a+b+c for all elements. \n",
-        4,3,1,
-        FUNC_ARRAY_NAME(add3),
-        equal_4_types
-    },
-    {
-        "multiply3",
-        "(),(),()->()",
-        "3-way element-wise product. a,b,c -> a*b*c for all elements. \n",
-        4,3,1,
-        FUNC_ARRAY_NAME(multiply3),
-        equal_4_types
-    },
-    {
-        "multiply3_add",
-        "(),(),(),()->()",
-        "3-way element-wise product plus addition. a,b,c,d -> a*b*c+d"\
-        " for all elements. \n",
-        4,4,1,
-        FUNC_ARRAY_NAME(multiply3_add),
-        equal_5_types
-    },
-    {
-        "multiply_add",
-        "(),(),()->()",
-        "element-wise multiply-add. a,b,c -> a*b+c for all elements. \n",
-        4,3,1,
-        FUNC_ARRAY_NAME(multiply_add),
-        equal_4_types
-    },
-    {
-        "multiply_add2",
-        "(),(),(),()->()",
-        "element-wise product with 2 additions. a,b,c,d -> a*b+c+d for"\
-        " all elements. \n",
-        4,4,1,
-        FUNC_ARRAY_NAME(multiply_add2),
-        equal_5_types
-    },
-    {
-        "multiply4",
-        "(),(),(),()->()",
-        "4-way element-wise product. a,b,c,d -> a*b*c*d for all elements. \n",
-        4,4,1,
-        FUNC_ARRAY_NAME(multiply4),
-        equal_5_types
-    },
-    {
-        "multiply4_add",
-        "(),(),(),(),()->()",
-        "4-way element-wise product and addition. a,b,c,d,e -> a*b*c*d+e. \n",
-        4,5,1,
-        FUNC_ARRAY_NAME(multiply4_add),
-        equal_6_types
-    },
-    {  /* solve using cholesky decomposition (lapack potrs) */
-        "chosolve_up",
-        "(m,m),(m,n)->(m,n)",
-        "solve for symmetric/hermitian matrices using cholesky"\
-        " factorization. \n",
-        4,2,1,
-        FUNC_ARRAY_NAME(chosolve_up),
-        equal_3_types
-    },
-    {  /* solve using choleske decomposition (lapack potrs) */
-        "chosolve_lo",
-        "(m,m),(m,n)->(m,n)",
-        "solve for symmetric/hermitian matrices using cholesky"\
-        " factorization. \n",
-        4,2,1,
-        FUNC_ARRAY_NAME(chosolve_lo),
-        equal_3_types
-    },
-    {   /* solve using cholesky decomposition (lapack potrs) */
-        "chosolve1_up",
-        "(m,m),(m)->(m)",
-        "solve a system using cholesky decomposition. \n",
-        4,2,1,
-        FUNC_ARRAY_NAME(chosolve1_up),
-        equal_3_types
-    },
-    {   /* solve using cholesky decomposition (lapack potrs) */
-        "chosolve1_lo",
-        "(m,m),(m)->(m)",
-        "solve a system using cholesky decomposition. \n",
-        4,2,1,
-        FUNC_ARRAY_NAME(chosolve1_lo),
-        equal_3_types
-    },
-    {   /* inverse using cholesky decomposition (lapack potri) */
-        "poinv_up",
-        "(m,m)->(m,m)",
-        "inverse using cholesky decomposition for symmetric/hermitian"\
-        " matrices. \n",
-        4,1,1,
-        FUNC_ARRAY_NAME(poinv_up),
-        equal_2_types
-    },
-    {   /* inverse using cholesky decomposition (lapack potri) */
-        "poinv_lo",
-        "(m,m)->(m,m)",
-        "inverse using cholesky decomposition for symmetric/hermitian"\
-        " matrices. \n",
-        4,1,1,
-        FUNC_ARRAY_NAME(poinv_lo),
-        equal_2_types
-    },
 };
 
 static void
author	Charles Harris <charlesr.harris@gmail.com>	2013-10-08 08:49:48 -0700
committer	Charles Harris <charlesr.harris@gmail.com>	2013-10-08 08:49:48 -0700
commit	6a7830b143945c40b2893d02ea09951cd8c60598 (patch)
tree	a2367af215a084a5010852574a1d476a79a87413
parent	ec3603f43aaf4056bdd78401c23c937c5760f673 (diff)
parent	2fa1c9d150398cc52a2752604e24bde9e762a43c (diff)
download	numpy-6a7830b143945c40b2893d02ea09951cd8c60598.tar.gz