/* Solving memory overlap integer programs and bounded Diophantine equations with positive coefficients. Asking whether two strided arrays `a` and `b` overlap is equivalent to asking whether there is a solution to the following problem:: sum(stride_a[i] * x_a[i] for i in range(ndim_a)) - sum(stride_b[i] * x_b[i] for i in range(ndim_b)) == base_b - base_a 0 <= x_a[i] < shape_a[i] 0 <= x_b[i] < shape_b[i] for some integer x_a, x_b. Itemsize needs to be considered as an additional dimension with stride 1 and size itemsize. Negative strides can be changed to positive (and vice versa) by changing variables x[i] -> shape[i] - 1 - x[i], and zero strides can be dropped, so that the problem can be recast into a bounded Diophantine equation with positive coefficients:: sum(a[i] * x[i] for i in range(n)) == b a[i] > 0 0 <= x[i] <= ub[i] This problem is NP-hard --- runtime of algorithms grows exponentially with increasing ndim. *Algorithm description* A straightforward algorithm that excludes infeasible solutions using GCD-based pruning is outlined in Ref. [1]. It is implemented below. A number of other algorithms exist in the literature; however, this one seems to have performance satisfactory for the present purpose. The idea is that an equation:: a_1 x_1 + a_2 x_2 + ... + a_n x_n = b 0 <= x_i <= ub_i, i = 1...n implies:: a_2' x_2' + a_3 x_3 + ... + a_n x_n = b 0 <= x_i <= ub_i, i = 2...n 0 <= x_1' <= c_1 ub_1 + c_2 ub_2 with a_2' = gcd(a_1, a_2) and x_2' = c_1 x_1 + c_2 x_2 with c_1 = (a_1/a_1'), and c_2 = (a_2/a_1'). This procedure can be repeated to obtain:: a_{n-1}' x_{n-1}' + a_n x_n = b 0 <= x_{n-1}' <= ub_{n-1}' 0 <= x_n <= ub_n Now, one can enumerate all candidate solutions for x_n. For each, one can use the previous-level equation to enumerate potential solutions for x_{n-1}, with transformed right-hand side b -> b - a_n x_n. And so forth, until after n-1 nested for loops we either arrive at a candidate solution for x_1 (in which case we have found one solution to the problem), or find that the equations do not allow any solutions either for x_1 or one of the intermediate x_i (in which case we have proved there is no solution for the upper-level candidates chosen). If no solution is found for any candidate x_n, we have proved the problem is infeasible --- which for the memory overlap problem means there is no overlap. *Performance* Some common ndarray cases are easy for the algorithm: - Two arrays whose memory ranges do not overlap. These will be excluded by the bounds on x_n, with max_work=1. We also add this check as a fast path, to avoid computing GCDs needlessly, as this can take some time. - Arrays produced by continuous slicing of a continuous parent array (no internal overlap), e.g., a=x[:,0,:], b=x[:,1,:]. The strides taken together, mapped positive, and duplicates then satisfy gcd(stride[0], .., stride[j]) = stride[j] for some ordering. In this case, for each x[i] at most one candidate exists, given that the algorithm runs with strides sorted from largest to smallest. The problem can be written as:: sum a_j x_j ?= b = sum a_j z_j a_j = n_{j+1} * n_{j+2} * ... * n_d, a_d = 1 0 <= x_j <= u_j <= 2*n_j - 2 0 <= z_j <= n_j - 1 b is the offset of the last element of the second array from the start of the first. z_j are uniquely determined because of the gcd property. For each x_j, the bounds at first sight allow x_j=z_j and x_j=z_j+n_j. However, u_j <= n_j - 1 + z_j, so that at most one candidate is left. - Two arrays with stride-incommensurate starting points. For example, a=x[:,::2], b=x[:,1::2]. The base address difference is incommensurate with all strides, so that there are no solution candidates to consider. For itemsize != 1, similar result is obtained for x_{n-1}. The above cases cover arrays produced by typical slicing of well-behaved parent arrays. More generally, more difficult cases can result:: x = np.arange(4*20).reshape(4, 20).astype(np.int8) a = x[:,::7] b = x[:,3::3] <=> 20*x1 + 7*x2 + 3*x3 = 78 (= 3 + 3*20 + 5*3) 0 <= x1 <= 6, 0 <= x2 <= 2, 0 <= x3 <= 5 Non-overlapping in this case relies on x.shape[1] <= lcm(7, 3) = 21. However, elimination of x1 does not restrict candidate values for x3, so the algorithm ends up considering all values x3=0...5 separately. The upper bound for work done is prod(shape_a)*prod(shape_b), which scales faster than work done by binary ufuncs, after broadcasting, prod(shape_a). The bound may be loose, but it is possible to construct hard instances where ufunc is faster (adapted from [2,3]):: from numpy.lib.stride_tricks import as_strided # Construct non-overlapping x1 and x2 x = np.zeros([192163377], dtype=np.int8) x1 = as_strided(x, strides=(36674, 61119, 85569), shape=(1049, 1049, 1049)) x2 = as_strided(x[64023025:], strides=(12223, 12224, 1), shape=(1049, 1049, 1)) To avoid such worst cases, the amount of work done needs to be capped. If the overlap problem is related to ufuncs, one suitable cap choice is to scale max_work with the number of elements of the array. (Ref. [3] describes a more efficient algorithm for solving problems similar to the above --- however, also it must scale exponentially.) *Integer overflows* The algorithm is written in fixed-width integers, and can terminate with failure if integer overflow is detected (the implementation catches all cases). Potential failure modes: - Array extent sum(stride*(shape-1)) is too large (for int64). - Minimal solutions to a_i x_i + a_j x_j == b are too large, in some of the intermediate equations. We do this part of the computation in 128-bit integers. In general, overflows are expected only if array size is close to NPY_INT64_MAX, requiring ~exabyte size arrays, which is usually not possible. References ---------- .. [1] P. Ramachandran, ''Use of Extended Euclidean Algorithm in Solving a System of Linear Diophantine Equations with Bounded Variables''. Algorithmic Number Theory, Lecture Notes in Computer Science **4076**, 182-192 (2006). doi:10.1007/11792086_14 .. [2] Cornuejols, Urbaniak, Weismantel, and Wolsey, ''Decomposition of integer programs and of generating sets.'', Lecture Notes in Computer Science 1284, 92-103 (1997). .. [3] K. Aardal, A.K. Lenstra, ''Hard equality constrained integer knapsacks'', Lecture Notes in Computer Science 2337, 350-366 (2002). */ /* Copyright (c) 2015 Pauli Virtanen All rights reserved. Licensed under 3-clause BSD license, see LICENSE.txt. */ #define NPY_NO_DEPRECATED_API NPY_API_VERSION #define PY_SSIZE_T_CLEAN #include #include "numpy/ndarraytypes.h" #include "mem_overlap.h" #include "npy_extint128.h" #include #include #include #define MAX(a, b) (((a) >= (b)) ? (a) : (b)) #define MIN(a, b) (((a) <= (b)) ? (a) : (b)) /** * Euclid's algorithm for GCD. * * Solves for gamma*a1 + epsilon*a2 == gcd(a1, a2) * providing |gamma| < |a2|/gcd, |epsilon| < |a1|/gcd. */ static void euclid(npy_int64 a1, npy_int64 a2, npy_int64 *a_gcd, npy_int64 *gamma, npy_int64 *epsilon) { npy_int64 gamma1, gamma2, epsilon1, epsilon2, r; assert(a1 > 0); assert(a2 > 0); gamma1 = 1; gamma2 = 0; epsilon1 = 0; epsilon2 = 1; /* The numbers remain bounded by |a1|, |a2| during the iteration, so no integer overflows */ while (1) { if (a2 > 0) { r = a1/a2; a1 -= r*a2; gamma1 -= r*gamma2; epsilon1 -= r*epsilon2; } else { *a_gcd = a1; *gamma = gamma1; *epsilon = epsilon1; break; } if (a1 > 0) { r = a2/a1; a2 -= r*a1; gamma2 -= r*gamma1; epsilon2 -= r*epsilon1; } else { *a_gcd = a2; *gamma = gamma2; *epsilon = epsilon2; break; } } } /** * Precompute GCD and bounds transformations */ static int diophantine_precompute(unsigned int n, diophantine_term_t *E, diophantine_term_t *Ep, npy_int64 *Gamma, npy_int64 *Epsilon) { npy_int64 a_gcd, gamma, epsilon, c1, c2; unsigned int j; char overflow = 0; assert(n >= 2); euclid(E[0].a, E[1].a, &a_gcd, &gamma, &epsilon); Ep[0].a = a_gcd; Gamma[0] = gamma; Epsilon[0] = epsilon; if (n > 2) { c1 = E[0].a / a_gcd; c2 = E[1].a / a_gcd; /* Ep[0].ub = E[0].ub * c1 + E[1].ub * c2; */ Ep[0].ub = safe_add(safe_mul(E[0].ub, c1, &overflow), safe_mul(E[1].ub, c2, &overflow), &overflow); if (overflow) { return 1; } } for (j = 2; j < n; ++j) { euclid(Ep[j-2].a, E[j].a, &a_gcd, &gamma, &epsilon); Ep[j-1].a = a_gcd; Gamma[j-1] = gamma; Epsilon[j-1] = epsilon; if (j < n - 1) { c1 = Ep[j-2].a / a_gcd; c2 = E[j].a / a_gcd; /* Ep[j-1].ub = c1 * Ep[j-2].ub + c2 * E[j].ub; */ Ep[j-1].ub = safe_add(safe_mul(c1, Ep[j-2].ub, &overflow), safe_mul(c2, E[j].ub, &overflow), &overflow); if (overflow) { return 1; } } } return 0; } /** * Depth-first bounded Euclid search */ static mem_overlap_t diophantine_dfs(unsigned int n, unsigned int v, diophantine_term_t *E, diophantine_term_t *Ep, npy_int64 *Gamma, npy_int64 *Epsilon, npy_int64 b, Py_ssize_t max_work, int require_ub_nontrivial, npy_int64 *x, Py_ssize_t *count) { npy_int64 a_gcd, gamma, epsilon, a1, u1, a2, u2, c, r, c1, c2, t, t_l, t_u, b2, x1, x2; npy_extint128_t x10, x20, t_l1, t_l2, t_u1, t_u2; mem_overlap_t res; char overflow = 0; if (max_work >= 0 && *count >= max_work) { return MEM_OVERLAP_TOO_HARD; } /* Fetch precomputed values for the reduced problem */ if (v == 1) { a1 = E[0].a; u1 = E[0].ub; } else { a1 = Ep[v-2].a; u1 = Ep[v-2].ub; } a2 = E[v].a; u2 = E[v].ub; a_gcd = Ep[v-1].a; gamma = Gamma[v-1]; epsilon = Epsilon[v-1]; /* Generate set of allowed solutions */ c = b / a_gcd; r = b % a_gcd; if (r != 0) { ++*count; return MEM_OVERLAP_NO; } c1 = a2 / a_gcd; c2 = a1 / a_gcd; /* The set to enumerate is: x1 = gamma*c + c1*t x2 = epsilon*c - c2*t t integer 0 <= x1 <= u1 0 <= x2 <= u2 and we have c, c1, c2 >= 0 */ x10 = mul_64_64(gamma, c); x20 = mul_64_64(epsilon, c); t_l1 = ceildiv_128_64(neg_128(x10), c1); t_l2 = ceildiv_128_64(sub_128(x20, to_128(u2), &overflow), c2); t_u1 = floordiv_128_64(sub_128(to_128(u1), x10, &overflow), c1); t_u2 = floordiv_128_64(x20, c2); if (overflow) { return MEM_OVERLAP_OVERFLOW; } if (gt_128(t_l2, t_l1)) { t_l1 = t_l2; } if (gt_128(t_u1, t_u2)) { t_u1 = t_u2; } if (gt_128(t_l1, t_u1)) { ++*count; return MEM_OVERLAP_NO; } t_l = to_64(t_l1, &overflow); t_u = to_64(t_u1, &overflow); x10 = add_128(x10, mul_64_64(c1, t_l), &overflow); x20 = sub_128(x20, mul_64_64(c2, t_l), &overflow); t_u = safe_sub(t_u, t_l, &overflow); t_l = 0; x1 = to_64(x10, &overflow); x2 = to_64(x20, &overflow); if (overflow) { return MEM_OVERLAP_OVERFLOW; } /* The bounds t_l, t_u ensure the x computed below do not overflow */ if (v == 1) { /* Base case */ if (t_u >= t_l) { x[0] = x1 + c1*t_l; x[1] = x2 - c2*t_l; if (require_ub_nontrivial) { unsigned int j; int is_ub_trivial; is_ub_trivial = 1; for (j = 0; j < n; ++j) { if (x[j] != E[j].ub/2) { is_ub_trivial = 0; break; } } if (is_ub_trivial) { /* Ignore 'trivial' solution */ ++*count; return MEM_OVERLAP_NO; } } return MEM_OVERLAP_YES; } ++*count; return MEM_OVERLAP_NO; } else { /* Recurse to all candidates */ for (t = t_l; t <= t_u; ++t) { x[v] = x2 - c2*t; /* b2 = b - a2*x[v]; */ b2 = safe_sub(b, safe_mul(a2, x[v], &overflow), &overflow); if (overflow) { return MEM_OVERLAP_OVERFLOW; } res = diophantine_dfs(n, v-1, E, Ep, Gamma, Epsilon, b2, max_work, require_ub_nontrivial, x, count); if (res != MEM_OVERLAP_NO) { return res; } } ++*count; return MEM_OVERLAP_NO; } } /** * Solve bounded Diophantine equation * * The problem considered is:: * * A[0] x[0] + A[1] x[1] + ... + A[n-1] x[n-1] == b * 0 <= x[i] <= U[i] * A[i] > 0 * * Solve via depth-first Euclid's algorithm, as explained in [1]. * * If require_ub_nontrivial!=0, look for solutions to the problem * where b = A[0]*(U[0]/2) + ... + A[n]*(U[n-1]/2) but ignoring * the trivial solution x[i] = U[i]/2. All U[i] must be divisible by 2. * The value given for `b` is ignored in this case. */ NPY_VISIBILITY_HIDDEN mem_overlap_t solve_diophantine(unsigned int n, diophantine_term_t *E, npy_int64 b, Py_ssize_t max_work, int require_ub_nontrivial, npy_int64 *x) { mem_overlap_t res; unsigned int j; for (j = 0; j < n; ++j) { if (E[j].a <= 0) { return MEM_OVERLAP_ERROR; } else if (E[j].ub < 0) { return MEM_OVERLAP_NO; } } if (require_ub_nontrivial) { npy_int64 ub_sum = 0; char overflow = 0; for (j = 0; j < n; ++j) { if (E[j].ub % 2 != 0) { return MEM_OVERLAP_ERROR; } ub_sum = safe_add(ub_sum, safe_mul(E[j].a, E[j].ub/2, &overflow), &overflow); } if (overflow) { return MEM_OVERLAP_ERROR; } b = ub_sum; } if (b < 0) { return MEM_OVERLAP_NO; } if (n == 0) { if (require_ub_nontrivial) { /* Only trivial solution for 0-variable problem */ return MEM_OVERLAP_NO; } if (b == 0) { return MEM_OVERLAP_YES; } return MEM_OVERLAP_NO; } else if (n == 1) { if (require_ub_nontrivial) { /* Only trivial solution for 1-variable problem */ return MEM_OVERLAP_NO; } if (b % E[0].a == 0) { x[0] = b / E[0].a; if (x[0] >= 0 && x[0] <= E[0].ub) { return MEM_OVERLAP_YES; } } return MEM_OVERLAP_NO; } else { Py_ssize_t count = 0; diophantine_term_t *Ep = NULL; npy_int64 *Epsilon = NULL, *Gamma = NULL; Ep = malloc(n * sizeof(diophantine_term_t)); Epsilon = malloc(n * sizeof(npy_int64)); Gamma = malloc(n * sizeof(npy_int64)); if (Ep == NULL || Epsilon == NULL || Gamma == NULL) { res = MEM_OVERLAP_ERROR; } else if (diophantine_precompute(n, E, Ep, Gamma, Epsilon)) { res = MEM_OVERLAP_OVERFLOW; } else { res = diophantine_dfs(n, n-1, E, Ep, Gamma, Epsilon, b, max_work, require_ub_nontrivial, x, &count); } free(Ep); free(Gamma); free(Epsilon); return res; } } static int diophantine_sort_A(const void *xp, const void *yp) { npy_int64 xa = ((diophantine_term_t*)xp)->a; npy_int64 ya = ((diophantine_term_t*)yp)->a; if (xa < ya) { return 1; } else if (ya < xa) { return -1; } else { return 0; } } /** * Simplify Diophantine decision problem. * * Combine identical coefficients, remove unnecessary variables, and trim * bounds. * * The feasible/infeasible decision result is retained. * * Returns: 0 (success), -1 (integer overflow). */ NPY_VISIBILITY_HIDDEN int diophantine_simplify(unsigned int *n, diophantine_term_t *E, npy_int64 b) { unsigned int i, j, m; char overflow = 0; /* Skip obviously infeasible cases */ for (j = 0; j < *n; ++j) { if (E[j].ub < 0) { return 0; } } if (b < 0) { return 0; } /* Sort vs. coefficients */ qsort(E, *n, sizeof(diophantine_term_t), diophantine_sort_A); /* Combine identical coefficients */ m = *n; i = 0; for (j = 1; j < m; ++j) { if (E[i].a == E[j].a) { E[i].ub = safe_add(E[i].ub, E[j].ub, &overflow); --*n; } else { ++i; if (i != j) { E[i] = E[j]; } } } /* Trim bounds and remove unnecessary variables */ m = *n; i = 0; for (j = 0; j < m; ++j) { E[j].ub = MIN(E[j].ub, b / E[j].a); if (E[j].ub == 0) { /* If the problem is feasible at all, x[i]=0 */ --*n; } else { if (i != j) { E[i] = E[j]; } ++i; } } if (overflow) { return -1; } else { return 0; } } /* Gets a half-open range [start, end) of offsets from the data pointer */ NPY_VISIBILITY_HIDDEN void offset_bounds_from_strides(const int itemsize, const int nd, const npy_intp *dims, const npy_intp *strides, npy_intp *lower_offset, npy_intp *upper_offset) { npy_intp max_axis_offset; npy_intp lower = 0; npy_intp upper = 0; int i; for (i = 0; i < nd; i++) { if (dims[i] == 0) { /* If the array size is zero, return an empty range */ *lower_offset = 0; *upper_offset = 0; return; } /* Expand either upwards or downwards depending on stride */ max_axis_offset = strides[i] * (dims[i] - 1); if (max_axis_offset > 0) { upper += max_axis_offset; } else { lower += max_axis_offset; } } /* Return a half-open range */ upper += itemsize; *lower_offset = lower; *upper_offset = upper; } /* Gets a half-open range [start, end) which contains the array data */ static void get_array_memory_extents(PyArrayObject *arr, npy_uintp *out_start, npy_uintp *out_end, npy_uintp *num_bytes) { npy_intp low, upper; int j; offset_bounds_from_strides(PyArray_ITEMSIZE(arr), PyArray_NDIM(arr), PyArray_DIMS(arr), PyArray_STRIDES(arr), &low, &upper); *out_start = (npy_uintp)PyArray_DATA(arr) + (npy_uintp)low; *out_end = (npy_uintp)PyArray_DATA(arr) + (npy_uintp)upper; *num_bytes = PyArray_ITEMSIZE(arr); for (j = 0; j < PyArray_NDIM(arr); ++j) { *num_bytes *= PyArray_DIM(arr, j); } } static int strides_to_terms(PyArrayObject *arr, diophantine_term_t *terms, unsigned int *nterms, int skip_empty) { int i; for (i = 0; i < PyArray_NDIM(arr); ++i) { if (skip_empty) { if (PyArray_DIM(arr, i) <= 1 || PyArray_STRIDE(arr, i) == 0) { continue; } } terms[*nterms].a = PyArray_STRIDE(arr, i); if (terms[*nterms].a < 0) { terms[*nterms].a = -terms[*nterms].a; } if (terms[*nterms].a < 0) { /* integer overflow */ return 1; } terms[*nterms].ub = PyArray_DIM(arr, i) - 1; ++*nterms; } return 0; } /** * Determine whether two arrays share some memory. * * Returns: 0 (no shared memory), 1 (shared memory), or < 0 (failed to solve). * * Note that failures to solve can occur due to integer overflows, or effort * required solving the problem exceeding max_work. The general problem is * NP-hard and worst case runtime is exponential in the number of dimensions. * max_work controls the amount of work done, either exact (max_work == -1), only * a simple memory extent check (max_work == 0), or set an upper bound * max_work > 0 for the number of solution candidates considered. */ NPY_VISIBILITY_HIDDEN mem_overlap_t solve_may_share_memory(PyArrayObject *a, PyArrayObject *b, Py_ssize_t max_work) { npy_int64 rhs; diophantine_term_t terms[2*NPY_MAXDIMS + 2]; npy_uintp start1 = 0, end1 = 0, size1 = 0; npy_uintp start2 = 0, end2 = 0, size2 = 0; npy_uintp uintp_rhs; npy_int64 x[2*NPY_MAXDIMS + 2]; unsigned int nterms; get_array_memory_extents(a, &start1, &end1, &size1); get_array_memory_extents(b, &start2, &end2, &size2); if (!(start1 < end2 && start2 < end1 && start1 < end1 && start2 < end2)) { /* Memory extents don't overlap */ return MEM_OVERLAP_NO; } if (max_work == 0) { /* Too much work required, give up */ return MEM_OVERLAP_TOO_HARD; } /* Convert problem to Diophantine equation form with positive coefficients. The bounds computed by offset_bounds_from_strides correspond to all-positive strides. start1 + sum(abs(stride1)*x1) == start2 + sum(abs(stride2)*x2) == end1 - 1 - sum(abs(stride1)*x1') == end2 - 1 - sum(abs(stride2)*x2') <=> sum(abs(stride1)*x1) + sum(abs(stride2)*x2') == end2 - 1 - start1 OR sum(abs(stride1)*x1') + sum(abs(stride2)*x2) == end1 - 1 - start2 We pick the problem with the smaller RHS (they are non-negative due to the extent check above.) */ uintp_rhs = MIN(end2 - 1 - start1, end1 - 1 - start2); if (uintp_rhs > NPY_MAX_INT64) { /* Integer overflow */ return MEM_OVERLAP_OVERFLOW; } rhs = (npy_int64)uintp_rhs; nterms = 0; if (strides_to_terms(a, terms, &nterms, 1)) { return MEM_OVERLAP_OVERFLOW; } if (strides_to_terms(b, terms, &nterms, 1)) { return MEM_OVERLAP_OVERFLOW; } if (PyArray_ITEMSIZE(a) > 1) { terms[nterms].a = 1; terms[nterms].ub = PyArray_ITEMSIZE(a) - 1; ++nterms; } if (PyArray_ITEMSIZE(b) > 1) { terms[nterms].a = 1; terms[nterms].ub = PyArray_ITEMSIZE(b) - 1; ++nterms; } /* Simplify, if possible */ if (diophantine_simplify(&nterms, terms, rhs)) { /* Integer overflow */ return MEM_OVERLAP_OVERFLOW; } /* Solve */ return solve_diophantine(nterms, terms, rhs, max_work, 0, x); } /** * Determine whether an array has internal overlap. * * Returns: 0 (no overlap), 1 (overlap), or < 0 (failed to solve). * * max_work and reasons for solver failures are as in solve_may_share_memory. */ NPY_VISIBILITY_HIDDEN mem_overlap_t solve_may_have_internal_overlap(PyArrayObject *a, Py_ssize_t max_work) { diophantine_term_t terms[NPY_MAXDIMS+1]; npy_int64 x[NPY_MAXDIMS+1]; unsigned int i, j, nterms; if (PyArray_ISCONTIGUOUS(a)) { /* Quick case */ return MEM_OVERLAP_NO; } /* The internal memory overlap problem is looking for two different solutions to sum(a*x) = b, 0 <= x[i] <= ub[i] for any b. Equivalently, sum(a*x0) - sum(a*x1) = 0 Mapping the coefficients on the left by x0'[i] = x0[i] if a[i] > 0 else ub[i]-x0[i] and opposite for x1, we have sum(abs(a)*(x0' + x1')) = sum(abs(a)*ub) Now, x0!=x1 if for some i we have x0'[i] + x1'[i] != ub[i]. We can now change variables to z[i] = x0'[i] + x1'[i] so the problem becomes sum(abs(a)*z) = sum(abs(a)*ub), 0 <= z[i] <= 2*ub[i], z != ub This can be solved with solve_diophantine. */ nterms = 0; if (strides_to_terms(a, terms, &nterms, 0)) { return MEM_OVERLAP_OVERFLOW; } if (PyArray_ITEMSIZE(a) > 1) { terms[nterms].a = 1; terms[nterms].ub = PyArray_ITEMSIZE(a) - 1; ++nterms; } /* Get rid of zero coefficients and empty terms */ i = 0; for (j = 0; j < nterms; ++j) { if (terms[j].ub == 0) { continue; } else if (terms[j].ub < 0) { return MEM_OVERLAP_NO; } else if (terms[j].a == 0) { return MEM_OVERLAP_YES; } if (i != j) { terms[i] = terms[j]; } ++i; } nterms = i; /* Double bounds to get the internal overlap problem */ for (j = 0; j < nterms; ++j) { terms[j].ub *= 2; } /* Sort vs. coefficients; cannot call diophantine_simplify because it may change the decision problem inequality part */ qsort(terms, nterms, sizeof(diophantine_term_t), diophantine_sort_A); /* Solve */ return solve_diophantine(nterms, terms, -1, max_work, 1, x); }