diff options
Diffstat (limited to 'doc/source/reference/simd')
-rw-r--r-- | doc/source/reference/simd/build-options.rst | 19 | ||||
-rw-r--r-- | doc/source/reference/simd/gen_features.py | 4 | ||||
-rw-r--r-- | doc/source/reference/simd/generated_tables/compilers-diff.inc | 42 | ||||
-rw-r--r-- | doc/source/reference/simd/generated_tables/cpu_features.inc | 53 | ||||
-rw-r--r-- | doc/source/reference/simd/how-it-works.rst | 4 |
5 files changed, 68 insertions, 54 deletions
diff --git a/doc/source/reference/simd/build-options.rst b/doc/source/reference/simd/build-options.rst index 0994f15aa..d1e2e6b8e 100644 --- a/doc/source/reference/simd/build-options.rst +++ b/doc/source/reference/simd/build-options.rst @@ -333,7 +333,7 @@ and here is how it looks on x86_64/gcc: .. literalinclude:: log_example.txt :language: bash -As you see, there is a separate report for each of ``build_ext`` and ``build_clib`` +There is a separate report for each of ``build_ext`` and ``build_clib`` that includes several sections, and each section has several values, representing the following: **Platform**: @@ -371,6 +371,17 @@ that includes several sections, and each section has several values, representin - The lines that come after the above property and end with a ':' on a separate line, represent the paths of c/c++ sources that define the generated optimizations. -Runtime Trace -------------- -To be completed. +.. _runtime-simd-dispatch: + +Runtime dispatch +---------------- +Importing NumPy triggers a scan of the available CPU features from the set +of dispatchable features. This can be further restricted by setting the +environment variable ``NPY_DISABLE_CPU_FEATURES`` to a comma-, tab-, or +space-separated list of features to disable. This will raise an error if +parsing fails or if the feature was not enabled. For instance, on ``x86_64`` +this will disable ``AVX2`` and ``FMA3``:: + + NPY_DISABLE_CPU_FEATURES="AVX2,FMA3" + +If the feature is not available, a warning will be emitted. diff --git a/doc/source/reference/simd/gen_features.py b/doc/source/reference/simd/gen_features.py index 9a38ef5c9..b141e23d0 100644 --- a/doc/source/reference/simd/gen_features.py +++ b/doc/source/reference/simd/gen_features.py @@ -168,7 +168,7 @@ if __name__ == '__main__': gen_path = path.join( path.dirname(path.realpath(__file__)), "generated_tables" ) - with open(path.join(gen_path, 'cpu_features.inc'), 'wt') as fd: + with open(path.join(gen_path, 'cpu_features.inc'), 'w') as fd: fd.write(f'.. generated via {__file__}\n\n') for arch in ( ("x86", "PPC64", "PPC64LE", "ARMHF", "AARCH64", "S390X") @@ -177,7 +177,7 @@ if __name__ == '__main__': table = Features(arch, 'gcc').table() fd.write(wrapper_section(title, table)) - with open(path.join(gen_path, 'compilers-diff.inc'), 'wt') as fd: + with open(path.join(gen_path, 'compilers-diff.inc'), 'w') as fd: fd.write(f'.. generated via {__file__}\n\n') for arch, cc_names in ( ("x86", ("clang", "ICC", "MSVC")), diff --git a/doc/source/reference/simd/generated_tables/compilers-diff.inc b/doc/source/reference/simd/generated_tables/compilers-diff.inc index 4b9009a68..d5a87da3c 100644 --- a/doc/source/reference/simd/generated_tables/compilers-diff.inc +++ b/doc/source/reference/simd/generated_tables/compilers-diff.inc @@ -1,33 +1,35 @@ -.. generated via /home/seiko/work/repos/numpy/doc/source/reference/simd/./gen_features.py +.. generated via /numpy/numpy/./doc/source/reference/simd/gen_features.py On x86::Intel Compiler ~~~~~~~~~~~~~~~~~~~~~~ .. table:: :align: left - ================ ========================================================================================================================================== - Name Implies - ================ ========================================================================================================================================== - FMA3 SSE SSE2 SSE3 SSSE3 SSE41 POPCNT SSE42 AVX F16C :enabled:`AVX2` - AVX2 SSE SSE2 SSE3 SSSE3 SSE41 POPCNT SSE42 AVX F16C :enabled:`FMA3` - AVX512F SSE SSE2 SSE3 SSSE3 SSE41 POPCNT SSE42 AVX F16C FMA3 AVX2 :enabled:`AVX512CD` - :disabled:`XOP` :disabled:`SSE` :disabled:`SSE2` :disabled:`SSE3` :disabled:`SSSE3` :disabled:`SSE41` :disabled:`POPCNT` :disabled:`SSE42` :disabled:`AVX` - :disabled:`FMA4` :disabled:`SSE` :disabled:`SSE2` :disabled:`SSE3` :disabled:`SSSE3` :disabled:`SSE41` :disabled:`POPCNT` :disabled:`SSE42` :disabled:`AVX` - ================ ========================================================================================================================================== + ====================== ================================================================================================================================================================================================================================================================================================================================== ====================== + Name Implies Gathers + ====================== ================================================================================================================================================================================================================================================================================================================================== ====================== + FMA3 SSE SSE2 SSE3 SSSE3 SSE41 POPCNT SSE42 AVX F16C :enabled:`AVX2` + AVX2 SSE SSE2 SSE3 SSSE3 SSE41 POPCNT SSE42 AVX F16C :enabled:`FMA3` + AVX512F SSE SSE2 SSE3 SSSE3 SSE41 POPCNT SSE42 AVX F16C FMA3 AVX2 :enabled:`AVX512CD` + :disabled:`XOP` :disabled:`SSE` :disabled:`SSE2` :disabled:`SSE3` :disabled:`SSSE3` :disabled:`SSE41` :disabled:`POPCNT` :disabled:`SSE42` :disabled:`AVX` + :disabled:`FMA4` :disabled:`SSE` :disabled:`SSE2` :disabled:`SSE3` :disabled:`SSSE3` :disabled:`SSE41` :disabled:`POPCNT` :disabled:`SSE42` :disabled:`AVX` + :disabled:`AVX512_SPR` :disabled:`SSE` :disabled:`SSE2` :disabled:`SSE3` :disabled:`SSSE3` :disabled:`SSE41` :disabled:`POPCNT` :disabled:`SSE42` :disabled:`AVX` :disabled:`F16C` :disabled:`FMA3` :disabled:`AVX2` :disabled:`AVX512F` :disabled:`AVX512CD` :disabled:`AVX512_SKX` :disabled:`AVX512_CLX` :disabled:`AVX512_CNL` :disabled:`AVX512_ICL` :disabled:`AVX512FP16` + ====================== ================================================================================================================================================================================================================================================================================================================================== ====================== On x86::Microsoft Visual C/C++ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. table:: :align: left - ====================== ============================================================================================================================================================================================================================================================= ============================================================================= - Name Implies Gathers - ====================== ============================================================================================================================================================================================================================================================= ============================================================================= - FMA3 SSE SSE2 SSE3 SSSE3 SSE41 POPCNT SSE42 AVX F16C :enabled:`AVX2` - AVX2 SSE SSE2 SSE3 SSSE3 SSE41 POPCNT SSE42 AVX F16C :enabled:`FMA3` - AVX512F SSE SSE2 SSE3 SSSE3 SSE41 POPCNT SSE42 AVX F16C FMA3 AVX2 :enabled:`AVX512CD` :enabled:`AVX512_SKX` - AVX512CD SSE SSE2 SSE3 SSSE3 SSE41 POPCNT SSE42 AVX F16C FMA3 AVX2 AVX512F :enabled:`AVX512_SKX` - :disabled:`AVX512_KNL` :disabled:`SSE` :disabled:`SSE2` :disabled:`SSE3` :disabled:`SSSE3` :disabled:`SSE41` :disabled:`POPCNT` :disabled:`SSE42` :disabled:`AVX` :disabled:`F16C` :disabled:`FMA3` :disabled:`AVX2` :disabled:`AVX512F` :disabled:`AVX512CD` :disabled:`AVX512ER` :disabled:`AVX512PF` - :disabled:`AVX512_KNM` :disabled:`SSE` :disabled:`SSE2` :disabled:`SSE3` :disabled:`SSSE3` :disabled:`SSE41` :disabled:`POPCNT` :disabled:`SSE42` :disabled:`AVX` :disabled:`F16C` :disabled:`FMA3` :disabled:`AVX2` :disabled:`AVX512F` :disabled:`AVX512CD` :disabled:`AVX512_KNL` :disabled:`AVX5124FMAPS` :disabled:`AVX5124VNNIW` :disabled:`AVX512VPOPCNTDQ` - ====================== ============================================================================================================================================================================================================================================================= ============================================================================= + ====================== ================================================================================================================================================================================================================================================================================================================================== ============================================================================= + Name Implies Gathers + ====================== ================================================================================================================================================================================================================================================================================================================================== ============================================================================= + FMA3 SSE SSE2 SSE3 SSSE3 SSE41 POPCNT SSE42 AVX F16C :enabled:`AVX2` + AVX2 SSE SSE2 SSE3 SSSE3 SSE41 POPCNT SSE42 AVX F16C :enabled:`FMA3` + AVX512F SSE SSE2 SSE3 SSSE3 SSE41 POPCNT SSE42 AVX F16C FMA3 AVX2 :enabled:`AVX512CD` :enabled:`AVX512_SKX` + AVX512CD SSE SSE2 SSE3 SSSE3 SSE41 POPCNT SSE42 AVX F16C FMA3 AVX2 AVX512F :enabled:`AVX512_SKX` + :disabled:`AVX512_KNL` :disabled:`SSE` :disabled:`SSE2` :disabled:`SSE3` :disabled:`SSSE3` :disabled:`SSE41` :disabled:`POPCNT` :disabled:`SSE42` :disabled:`AVX` :disabled:`F16C` :disabled:`FMA3` :disabled:`AVX2` :disabled:`AVX512F` :disabled:`AVX512CD` :disabled:`AVX512ER` :disabled:`AVX512PF` + :disabled:`AVX512_KNM` :disabled:`SSE` :disabled:`SSE2` :disabled:`SSE3` :disabled:`SSSE3` :disabled:`SSE41` :disabled:`POPCNT` :disabled:`SSE42` :disabled:`AVX` :disabled:`F16C` :disabled:`FMA3` :disabled:`AVX2` :disabled:`AVX512F` :disabled:`AVX512CD` :disabled:`AVX512_KNL` :disabled:`AVX5124FMAPS` :disabled:`AVX5124VNNIW` :disabled:`AVX512VPOPCNTDQ` + :disabled:`AVX512_SPR` :disabled:`SSE` :disabled:`SSE2` :disabled:`SSE3` :disabled:`SSSE3` :disabled:`SSE41` :disabled:`POPCNT` :disabled:`SSE42` :disabled:`AVX` :disabled:`F16C` :disabled:`FMA3` :disabled:`AVX2` :disabled:`AVX512F` :disabled:`AVX512CD` :disabled:`AVX512_SKX` :disabled:`AVX512_CLX` :disabled:`AVX512_CNL` :disabled:`AVX512_ICL` :disabled:`AVX512FP16` + ====================== ================================================================================================================================================================================================================================================================================================================================== ============================================================================= diff --git a/doc/source/reference/simd/generated_tables/cpu_features.inc b/doc/source/reference/simd/generated_tables/cpu_features.inc index 7782172d2..603370e21 100644 --- a/doc/source/reference/simd/generated_tables/cpu_features.inc +++ b/doc/source/reference/simd/generated_tables/cpu_features.inc @@ -1,35 +1,36 @@ -.. generated via /home/seiko/work/repos/review/numpy/doc/source/reference/simd/gen_features.py +.. generated via /numpy/numpy/./doc/source/reference/simd/gen_features.py On x86 ~~~~~~ .. table:: :align: left - ============== =========================================================================================================================================================================== ===================================================== - Name Implies Gathers - ============== =========================================================================================================================================================================== ===================================================== - ``SSE`` ``SSE2`` - ``SSE2`` ``SSE`` - ``SSE3`` ``SSE`` ``SSE2`` - ``SSSE3`` ``SSE`` ``SSE2`` ``SSE3`` - ``SSE41`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` - ``POPCNT`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` - ``SSE42`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` - ``AVX`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` - ``XOP`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` - ``FMA4`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` - ``F16C`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` - ``FMA3`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` - ``AVX2`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` - ``AVX512F`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` ``FMA3`` ``AVX2`` - ``AVX512CD`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` ``FMA3`` ``AVX2`` ``AVX512F`` - ``AVX512_KNL`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` ``FMA3`` ``AVX2`` ``AVX512F`` ``AVX512CD`` ``AVX512ER`` ``AVX512PF`` - ``AVX512_KNM`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` ``FMA3`` ``AVX2`` ``AVX512F`` ``AVX512CD`` ``AVX512_KNL`` ``AVX5124FMAPS`` ``AVX5124VNNIW`` ``AVX512VPOPCNTDQ`` - ``AVX512_SKX`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` ``FMA3`` ``AVX2`` ``AVX512F`` ``AVX512CD`` ``AVX512VL`` ``AVX512BW`` ``AVX512DQ`` - ``AVX512_CLX`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` ``FMA3`` ``AVX2`` ``AVX512F`` ``AVX512CD`` ``AVX512_SKX`` ``AVX512VNNI`` - ``AVX512_CNL`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` ``FMA3`` ``AVX2`` ``AVX512F`` ``AVX512CD`` ``AVX512_SKX`` ``AVX512IFMA`` ``AVX512VBMI`` - ``AVX512_ICL`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` ``FMA3`` ``AVX2`` ``AVX512F`` ``AVX512CD`` ``AVX512_SKX`` ``AVX512_CLX`` ``AVX512_CNL`` ``AVX512VBMI2`` ``AVX512BITALG`` ``AVX512VPOPCNTDQ`` - ============== =========================================================================================================================================================================== ===================================================== + ============== ========================================================================================================================================================================================== ===================================================== + Name Implies Gathers + ============== ========================================================================================================================================================================================== ===================================================== + ``SSE`` ``SSE2`` + ``SSE2`` ``SSE`` + ``SSE3`` ``SSE`` ``SSE2`` + ``SSSE3`` ``SSE`` ``SSE2`` ``SSE3`` + ``SSE41`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` + ``POPCNT`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` + ``SSE42`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` + ``AVX`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` + ``XOP`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` + ``FMA4`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` + ``F16C`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` + ``FMA3`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` + ``AVX2`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` + ``AVX512F`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` ``FMA3`` ``AVX2`` + ``AVX512CD`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` ``FMA3`` ``AVX2`` ``AVX512F`` + ``AVX512_KNL`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` ``FMA3`` ``AVX2`` ``AVX512F`` ``AVX512CD`` ``AVX512ER`` ``AVX512PF`` + ``AVX512_KNM`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` ``FMA3`` ``AVX2`` ``AVX512F`` ``AVX512CD`` ``AVX512_KNL`` ``AVX5124FMAPS`` ``AVX5124VNNIW`` ``AVX512VPOPCNTDQ`` + ``AVX512_SKX`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` ``FMA3`` ``AVX2`` ``AVX512F`` ``AVX512CD`` ``AVX512VL`` ``AVX512BW`` ``AVX512DQ`` + ``AVX512_CLX`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` ``FMA3`` ``AVX2`` ``AVX512F`` ``AVX512CD`` ``AVX512_SKX`` ``AVX512VNNI`` + ``AVX512_CNL`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` ``FMA3`` ``AVX2`` ``AVX512F`` ``AVX512CD`` ``AVX512_SKX`` ``AVX512IFMA`` ``AVX512VBMI`` + ``AVX512_ICL`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` ``FMA3`` ``AVX2`` ``AVX512F`` ``AVX512CD`` ``AVX512_SKX`` ``AVX512_CLX`` ``AVX512_CNL`` ``AVX512VBMI2`` ``AVX512BITALG`` ``AVX512VPOPCNTDQ`` + ``AVX512_SPR`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` ``FMA3`` ``AVX2`` ``AVX512F`` ``AVX512CD`` ``AVX512_SKX`` ``AVX512_CLX`` ``AVX512_CNL`` ``AVX512_ICL`` ``AVX512FP16`` + ============== ========================================================================================================================================================================================== ===================================================== On IBM/POWER big-endian ~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/doc/source/reference/simd/how-it-works.rst b/doc/source/reference/simd/how-it-works.rst index 19b3dba45..dac80dd02 100644 --- a/doc/source/reference/simd/how-it-works.rst +++ b/doc/source/reference/simd/how-it-works.rst @@ -1,6 +1,6 @@ -********************************** +********************************* How does the CPU dispatcher work? -********************************** +********************************* NumPy dispatcher is based on multi-source compiling, which means taking a certain source and compiling it multiple times with different compiler |