author Sebastian Berg <sebastian@sipsolutions.net> 2022-06-10 06:48:15 -0700
committer GitHub <noreply@github.com> 2022-06-10 07:48:15 -0600
commit 71f7f7c9df592094c3ca8cfb45cea2211de24a58
tree 40ab82b516a2a43dada998dfa56fb2a9a4858102 /benchmarks
parent 7d39b0f1135859598ed6bd086a11124a52d8a969
ENH: Implement string comparison ufuncs (or almost) (#21041)
* ENH: Implement string comparison ufuncs (or almost)

  This makes all comparison operators and ufuncs work on strings
  using the ufunc machinery.
  It requires a half-manual "ufunc" to keep supporting void comparisons
  and especially `np.compare_chararrays`
  (that one may have a bit more overhead now).

  In general the new code should be much faster, and has a lot of
  potential for easy optimizations. It is also much simpler since it
  can outsource some complexities to the ufunc/iterator machinery.

  This further fixes a couple of bugs with byte-swapped strings.

  The backward compatibility related change is that using the normal
  ufunc machinery means that string comparisons between string and
  unicode now give a `FutureWarning` (instead of just False).

* MAINT: Do not use C99 tagged struct init in C++

  C++ does not like it (at least not before C++20)... GCC and clang
  don't seem to mind, but MSVC seems to.

* BENCH: Add basic string comparison benchmarks

* DOC,STY: Fixup string-comparisons comments based on review

  Thanks to Marten's comments, a few clarifications and slight fixups.

* ENH: Use `memcmp` because it may be faster for the byte case

* TST: Improve string and unicode comparison tests.

* MAINT: Use switch statement based on review

  As suggested by Serge.

  Co-authored-by: Serge Guelton <serge.guelton@telecom-bretagne.eu>

* TST: Make unicode byte-swap test slightly more concrete

  The issue is that the `view` needs to use native byte-order, so
  just ensure native byte-order for the view, and then do another cast
  to get it right.

* BUG: Add `np.compare_chararrays` to test and fix typo

* TST: Add test for empty string comparisons

* TST: Fixup string test based on Marten's review

* MAINT: Move definitions back into string_ufuncs.h

* MAINT: Use enum class for comparison operator templating

  This removes the need for a dynamic (or static) assert in the
  switch statement.

* Template version of add_loop to avoid redundant code

* STY: Fixup style, two spaces, error is -1

* STY: Small `string_ufuncs.cpp` fixups based on Serge's review

* MAINT: Fix merge conflict (ensure_dtype_nbo was removed)

Co-authored-by: Serge Guelton <serge.guelton@telecom-bretagne.eu>
Diffstat (limited to 'benchmarks')
-rw-r--r-- benchmarks/benchmarks/bench_strings.py 45
1 files changed, 45 insertions, 0 deletions
diff --git a/benchmarks/benchmarks/bench_strings.py b/benchmarks/benchmarks/bench_strings.py
new file mode 100644
index 000000000..e500d7f3f
--- /dev/null
+++ b/benchmarks/benchmarks/bench_strings.py
@@ -0,0 +1,45 @@
+from __future__ import absolute_import, division, print_function
+
+from .common import Benchmark
+
+import numpy as np
+import operator
+
+
+_OPERATORS = {
+    '==': operator.eq,
+    '!=': operator.ne,
+    '<': operator.lt,
+    '<=': operator.le,
+    '>': operator.gt,
+    '>=': operator.ge,
+}
+
+
+class StringComparisons(Benchmark):
+    # Basic string comparison speed tests
+    params = [
+        [100, 10000, (1000, 20)],
+        ['U', 'S'],
+        [True, False],
+        ['==', '!=', '<', '<=', '>', '>=']]
+    param_names = ['shape', 'dtype', 'contig', 'operator']
+    int64 = np.dtype(np.int64)
+
+    def setup(self, shape, dtype, contig, operator):
+        self.arr = np.arange(np.prod(shape)).astype(dtype).reshape(shape)
+        self.arr_identical = self.arr.copy()
+        self.arr_different = self.arr[::-1].copy()
+
+        if not contig:
+            self.arr = self.arr[..., ::2]
+            self.arr_identical = self.arr_identical[..., ::2]
+            self.arr_different = self.arr_different[..., ::2]
+
+        self.operator = _OPERATORS[operator]
+
+    def time_compare_identical(self, shape, dtype, contig, operator):
+        self.operator(self.arr, self.arr_identical)
+
+    def time_compare_different(self, shape, dtype, contig, operator):
+        self.operator(self.arr, self.arr_different)
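
For illustration only (not part of the diff): a rough standalone sketch of what one parameter combination of the benchmark above measures, using the standard-library `timeit` instead of the benchmark harness. The helper `make_arrays`, the chosen parameters, and the repeat count `n_runs` are assumptions made for this example. The array construction mirrors `setup()`.

```python
import operator
import timeit

import numpy as np


def make_arrays(shape, dtype, contig):
    """Hypothetical helper: build the three arrays the way setup() does."""
    arr = np.arange(np.prod(shape)).astype(dtype).reshape(shape)
    arr_identical = arr.copy()
    arr_different = arr[::-1].copy()  # reversed along the first axis
    if not contig:
        # Take every second element along the last axis, which makes
        # the operands non-contiguous views.
        arr = arr[..., ::2]
        arr_identical = arr_identical[..., ::2]
        arr_different = arr_different[..., ::2]
    return arr, arr_identical, arr_different


# One combination from the parameter grid: 2-D shape, unicode, strided.
arr, same, diff = make_arrays((1000, 20), "U", contig=False)

n_runs = 100  # arbitrary repeat count for this sketch
t_same = timeit.timeit(lambda: operator.eq(arr, same), number=n_runs)
t_diff = timeit.timeit(lambda: operator.eq(arr, diff), number=n_runs)

print(f"identical: {t_same / n_runs:.2e} s per call")
print(f"different: {t_diff / n_runs:.2e} s per call")
```

The "identical" case forces every string to be compared to its end, while the "different" case usually lets a comparison stop at the first differing character, so timing both separates the two costs.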