author    | Sayed Adel <seiko@imavr.com> | 2021-11-23 03:54:15 +0200
committer | Sayed Adel <seiko@imavr.com> | 2021-12-08 22:18:07 +0200
commit    | 9fd4162476e4499c71567f79197cc2f9f9076219 (patch)
tree      | 4f80bf46802ccb7def039d54349633cb7ca26053
parent    | 563051aaebbb80da3d453cacf3e1f9782d3077fb (diff)
download  | numpy-9fd4162476e4499c71567f79197cc2f9f9076219.tar.gz
DOC, SIMD: add a new index for the optimization page to separate into multiple files
-rw-r--r-- | doc/source/reference/index.rst                   |   2
-rw-r--r-- | doc/source/reference/simd/index.rst              |  37
-rw-r--r-- | doc/source/reference/simd/simd-optimizations.rst | 528
3 files changed, 40 insertions, 527 deletions
diff --git a/doc/source/reference/index.rst b/doc/source/reference/index.rst
index a18211cca..24bb6665d 100644
--- a/doc/source/reference/index.rst
+++ b/doc/source/reference/index.rst
@@ -26,7 +26,7 @@ For learning how to use NumPy, see the :ref:`complete documentation <numpy_docs_
    distutils
    distutils_guide
    c-api/index
-   simd/simd-optimizations
+   simd/index
    swig

diff --git a/doc/source/reference/simd/index.rst b/doc/source/reference/simd/index.rst
new file mode 100644
index 000000000..4115338e9
--- /dev/null
+++ b/doc/source/reference/simd/index.rst
@@ -0,0 +1,37 @@
.. _numpysimd:
.. currentmodule:: numpysimd

***********************
CPU/SIMD Optimizations
***********************

NumPy comes with a flexible working mechanism that allows it to harness the SIMD
features that CPUs own, in order to provide faster and more stable performance
on all popular platforms. Currently, NumPy supports the X86, IBM/Power, ARM7 and
ARM8 architectures.

The optimization process in NumPy is carried out in three layers:

- Code is *written* using the universal intrinsics, with guards that
  enable their use only when the compiler recognizes them.
  Usually, they are used to generate multiple kernels for the same functionality,
  in which each generated kernel represents a set of instructions related to
  one or more specific CPU features. The first kernel represents the minimum
  (baseline) CPU features, and the other kernels represent the additional
  (dispatched) CPU features.

- At *compile* time, CPU build options are used to define the minimum and
  additional features to support, based on user choice and compiler support. The
  appropriate intrinsics are overlaid with the platform / architecture intrinsics,
  and multiple kernels are compiled.

- At *runtime import*, the CPU is probed for the set of supported CPU
  features. A mechanism is used to grab the pointer to the most appropriate
  kernel, and this will be the one called for the function.

.. note::

   The NumPy community had a deep discussion before implementing this work;
   please check `NEP-38`_ for more clarification.

.. _`NEP-38`: https://numpy.org/neps/nep-0038-SIMD-optimizations.html
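To make the first layer more concrete, here is a minimal sketch of a kernel
written once with the universal intrinsics. The ``npyv_*`` names follow NumPy's
universal intrinsics naming, but the function itself is hypothetical and assumes
it is compiled inside NumPy's source tree (where ``simd/simd.h`` and ``npy_intp``
are available); it only illustrates the guard-plus-fallback pattern described
above.

.. code:: c

    #include "simd/simd.h" // NPY_SIMD and the npyv_* universal intrinsics

    // hypothetical kernel: dst[i] = a[i] + b[i]
    static void hypothetical_add_f32(float *dst, const float *a, const float *b,
                                     npy_intp len)
    {
        npy_intp i = 0;
    #if NPY_SIMD // only taken when the compiler provides a SIMD extension
        for (; i + npyv_nlanes_f32 <= len; i += npyv_nlanes_f32) {
            npyv_f32 va = npyv_load_f32(a + i);
            npyv_f32 vb = npyv_load_f32(b + i);
            npyv_store_f32(dst + i, npyv_add_f32(va, vb));
        }
    #endif
        // scalar fallback, which also handles the remaining tail elements
        for (; i < len; ++i) {
            dst[i] = a[i] + b[i];
        }
    }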
diff --git a/doc/source/reference/simd/simd-optimizations.rst b/doc/source/reference/simd/simd-optimizations.rst
index 9de6d1734..0ceff1ff8 100644
--- a/doc/source/reference/simd/simd-optimizations.rst
+++ b/doc/source/reference/simd/simd-optimizations.rst
@@ -1,527 +1,3 @@
******************
SIMD Optimizations
******************

NumPy provides a set of macros that define `Universal Intrinsics`_ to
abstract out typical platform-specific intrinsics so SIMD code needs to be
written only once. There are three layers:

- Code is *written* using the universal intrinsic macros, with guards that
  will enable use of the macros only when the compiler recognizes them.
  In NumPy, these are used to construct multiple ufunc loops. Current policy is
  to create three loops: One loop is the default and uses no intrinsics. One
  uses the minimum intrinsics required on the architecture. And the third is
  written using the maximum set of intrinsics possible.
- At *compile* time, a distutils command is used to define the minimum and
  maximum features to support, based on user choice and compiler support. The
  appropriate macros are overlaid with the platform / architecture intrinsics,
  and the three loops are compiled.
- At *runtime import*, the CPU is probed for the set of supported intrinsic
  features. A mechanism is used to grab the pointer to the most appropriate
  function, and this will be the one called for the function.


Build options for compilation
=============================

- ``--cpu-baseline``: minimal set of required optimizations. The default
  value is ``min``, which provides the minimum CPU features that can
  safely run on a wide range of platforms within the processor family.

- ``--cpu-dispatch``: dispatched set of additional optimizations.
  The default value is ``max -xop -fma4``, which enables all CPU
  features except for AMD legacy features (in the case of X86).

The command arguments are available in ``build``, ``build_clib``, and
``build_ext``. If ``build_clib`` or ``build_ext`` are not specified by the user,
the arguments of ``build`` will be used instead, which also holds the default
values.

Optimization names can be CPU features, groups of features that gather several
features together, or :ref:`special options <special-options>` that perform a
series of procedures.

The following tables show the currently supported optimizations, sorted from
the lowest to the highest interest.

.. include:: simd-optimizations-tables.inc

----

.. _tables-diff:

While the above tables are based on the GCC compiler, the following tables show
the differences for the other compilers:

.. include:: simd-optimizations-tables-diff.inc

.. _special-options:

Special options
~~~~~~~~~~~~~~~

- ``NONE``: enables no features.

- ``NATIVE``: enables all CPU features supported by the current machine;
  this operation is based on the compiler flags (``-march=native``, ``-xHost``,
  ``/QxHost``).

- ``MIN``: enables the minimum CPU features that can safely run on a wide
  range of platforms:

  .. table::
      :align: left

      ====================================== =======================================
      For Arch                               Returns
      ====================================== =======================================
      ``x86``                                ``SSE`` ``SSE2``
      ``x86`` ``64-bit mode``                ``SSE`` ``SSE2`` ``SSE3``
      ``IBM/POWER`` ``big-endian mode``      ``NONE``
      ``IBM/POWER`` ``little-endian mode``   ``VSX`` ``VSX2``
      ``ARMHF``                              ``NONE``
      ``ARM64`` ``AARCH64``                  ``NEON`` ``NEON_FP16`` ``NEON_VFPV4``
                                             ``ASIMD``
      ====================================== =======================================

- ``MAX``: enables all CPU features supported by the compiler and platform.

- ``Operators -/+``: remove or add features; useful with the options ``MAX``,
  ``MIN`` and ``NATIVE``.

NOTES
~~~~~

- CPU features and other options are case-insensitive.

- The order of the requested optimizations doesn't matter.

- Either commas or spaces can be used as a separator, e.g.
  ``--cpu-dispatch``\ = "avx2 avx512f" or ``--cpu-dispatch``\ = "avx2, avx512f"
  both work, but the arguments must be enclosed in quotes.

- The operator ``+`` is only added for nominal reasons. For example:
  ``--cpu-baseline="min avx2"`` is equivalent to ``--cpu-baseline="min + avx2"``;
  ``--cpu-baseline="min,avx2"`` is equivalent to ``--cpu-baseline="min,+avx2"``.

- If a CPU feature is not supported by the user's platform or
  compiler, it will be skipped rather than raising a fatal error.

- Any CPU feature specified in ``--cpu-dispatch`` will be skipped if
  it's part of the CPU baseline features.

- The ``--cpu-baseline`` argument force-enables implied features,
  e.g. ``--cpu-baseline``\ ="sse42" is equivalent to
  ``--cpu-baseline``\ ="sse sse2 sse3 ssse3 sse41 popcnt sse42".

- The value of ``--cpu-baseline`` will be treated as "native" if a compiler
  native flag (``-march=native``, ``-xHost`` or ``/QxHost``) is enabled through
  the environment variable ``CFLAGS``.

- The validation process for the requested optimizations is not strict when it
  comes to ``--cpu-baseline``. For example, if the user requested ``AVX2`` but
  the compiler doesn't support it, we just skip it and return the maximum
  optimization that the compiler can handle, depending on the implied features
  of ``AVX2``, let us assume ``AVX``.

- The user should always check the final report through the build log
  to verify the enabled features.
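The implied-features note above can be illustrated with a small preprocessor
check. The ``NPY_HAVE_`` definitions it relies on come from the generated
configuration header described later on this page; the check itself is only an
illustration, not NumPy code.

.. code:: c

    /* Illustration only (not NumPy code): with --cpu-baseline="sse42" the
     * implied features are force-enabled as well, so their always-on
     * NPY_HAVE_ definitions (see the configuration header below) are
     * expected to appear together.
     */
    #if defined(NPY_HAVE_SSE42) && !defined(NPY_HAVE_POPCNT)
        #error "SSE42 implies POPCNT; the build never enables one without the other"
    #endif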
Special cases
~~~~~~~~~~~~~

**Interrelated CPU features**: Some exceptional conditions force us to link
certain features together when it comes to certain compilers or architectures,
making it impossible to build them separately.
These conditions can be divided into two parts, as follows:

- **Architectural compatibility**: The need to align certain CPU features that
  are assured to be supported by successive generations of the same
  architecture, for example:

  - On ppc64le, `VSX (ISA 2.06)` and `VSX2 (ISA 2.07)` both imply one another,
    since the first generation that supports little-endian mode is
    Power-8 `(ISA 2.07)`.
  - On AArch64, `NEON`, `FP16`, `VFPV4` and `ASIMD` imply each other, since
    they are part of the hardware baseline.

- **Compilation compatibility**: Not all **C/C++** compilers provide
  independent support for all CPU features. For example, **Intel**'s compiler
  doesn't provide separate flags for `AVX2` and `FMA3`, which makes sense since
  all Intel CPUs that come with `AVX2` also support `FMA3` and vice versa, but
  this approach is incompatible with other **x86** CPUs from **AMD** or **VIA**.
  Therefore, there are differences in the depiction of CPU features between the
  C/C++ compilers, as shown in the :ref:`tables above <tables-diff>`.


Behaviors and Errors
~~~~~~~~~~~~~~~~~~~~



Usage and Examples
~~~~~~~~~~~~~~~~~~

Report and Trace
~~~~~~~~~~~~~~~~

Understanding CPU Dispatching: How does the NumPy dispatcher work?
==================================================================

The NumPy dispatcher is based on multi-source compiling, which means taking
a certain source and compiling it multiple times, with different compiler
flags and with different **C** definitions that affect the code paths. This
enables certain instruction sets for each compiled object depending on the
required optimizations; the returned objects are then combined together.

.. figure:: ../figures/opt-infra.png

This mechanism should support all compilers and it doesn't require any
compiler-specific extension, but at the same time it adds a few steps to
normal compilation, explained as follows:

1- Configuration
~~~~~~~~~~~~~~~~

The user configures the required optimizations before starting to build the
source files, via the two command arguments explained above:

- ``--cpu-baseline``: minimal set of required optimizations.

- ``--cpu-dispatch``: dispatched set of additional optimizations.


2- Discovering the environment
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In this part, we check the compiler and platform architecture
and cache some of the intermediary results to speed up rebuilding.

3- Validating the requested optimizations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The requested optimizations are tested against the compiler, to see what the
compiler can support according to the requested optimizations.
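The kind of probe used during steps 2 and 3 can be sketched as a tiny test
source that is thrown at the compiler together with the candidate flags; if it
fails to compile, the corresponding feature is skipped. The file below is only
illustrative (NumPy ships its own set of such checks), but it shows the idea
for ``SSE42``.

.. code:: c

    /* Illustrative probe (not the exact file NumPy uses): compiled with the
     * candidate SSE4.2 flags; a compile failure means the feature is skipped.
     */
    #include <nmmintrin.h>

    int main(void)
    {
        /* _mm_crc32_u8 is an SSE4.2 intrinsic, so this line only compiles
         * when the compiler actually supports SSE4.2 */
        return (int)_mm_crc32_u8(0, 0);
    }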
4- Generating the main configuration header
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The generated header ``_cpu_dispatch.h`` contains all the definitions and
headers of the instruction sets for the required optimizations that were
validated during the previous step.

It also contains extra C definitions that are used for defining NumPy's
Python-level module attributes ``__cpu_baseline__`` and ``__cpu_dispatch__``.

**What is in this header?**

The example header below was dynamically generated by gcc on an X86 machine.
The compiler supports ``--cpu-baseline="sse sse2 sse3"`` and
``--cpu-dispatch="ssse3 sse41"``, and the result is below.

.. code:: c

    // The header should be located at numpy/numpy/core/src/common/_cpu_dispatch.h
    /**NOTE
     ** C definitions prefixed with "NPY_HAVE_" represent
     ** the required optimizations.
     **
     ** C definitions prefixed with 'NPY__CPU_TARGET_' are protected and
     ** shouldn't be used by any NumPy C sources.
     */
    /******* baseline features *******/
    /** SSE **/
    #define NPY_HAVE_SSE 1
    #include <xmmintrin.h>
    /** SSE2 **/
    #define NPY_HAVE_SSE2 1
    #include <emmintrin.h>
    /** SSE3 **/
    #define NPY_HAVE_SSE3 1
    #include <pmmintrin.h>

    /******* dispatch-able features *******/
    #ifdef NPY__CPU_TARGET_SSSE3
    /** SSSE3 **/
    #define NPY_HAVE_SSSE3 1
    #include <tmmintrin.h>
    #endif
    #ifdef NPY__CPU_TARGET_SSE41
    /** SSE41 **/
    #define NPY_HAVE_SSE41 1
    #include <smmintrin.h>
    #endif

**Baseline features** are the minimal set of required optimizations configured
via ``--cpu-baseline``. They have no preprocessor guards and they're
always on, which means they can be used in any source.

Does this mean NumPy's infrastructure passes the compiler's flags of
baseline features to all sources?

Definitely, yes. But the :ref:`dispatch-able sources <dispatchable-sources>` are
treated differently.

What if the user specifies certain **baseline features** during the
build but at runtime the machine doesn't support even these
features? Will the compiled code be called via one of these definitions, or
maybe the compiler itself auto-generated/vectorized a certain piece of code
based on the provided command line compiler flags?

During the loading of the NumPy module, there's a validation step
which detects this behavior. It will raise a Python runtime error to inform the
user. This is to prevent the CPU reaching an illegal instruction error causing
a segfault.

**Dispatch-able features** are our dispatched set of additional optimizations
that were configured via ``--cpu-dispatch``. They are not activated by
default and are always guarded by other C definitions prefixed with
``NPY__CPU_TARGET_``. The ``NPY__CPU_TARGET_`` definitions are only
enabled within **dispatch-able sources**.
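The practical difference between the two sets of definitions can be sketched as
follows. The function is hypothetical and only shows which guards a normal
source may rely on (the always-on ``NPY_HAVE_`` definitions) versus the
``NPY__CPU_TARGET_`` guards that only ever fire inside dispatch-able sources.

.. code:: c

    // Hypothetical function, for illustration only.
    void hypothetical_copy(float *dst, const float *src, int len)
    {
    #ifdef NPY_HAVE_SSE2
        // baseline guard: always defined when SSE2 is part of --cpu-baseline,
        // so this path is safe to use from any NumPy C source.
    #endif
    #ifdef NPY__CPU_TARGET_AVX2
        // dispatched guard: only defined while compiling a dispatch-able
        // source for the AVX2 target; it never fires in a normal source.
    #endif
        for (int i = 0; i < len; ++i) {
            dst[i] = src[i];
        }
    }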
.. _dispatchable-sources:

5- Dispatch-able sources and configuration statements
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Dispatch-able sources are special **C** files that can be compiled multiple
times with different compiler flags and also with different **C**
definitions. These affect the code paths, enabling certain
instruction sets for each compiled object according to "**the
configuration statements**", which must be declared inside a **C**
comment ``(/**/)`` and start with the special mark **@targets** at the
top of each dispatch-able source. At the same time, dispatch-able
sources are treated as normal **C** sources if the optimization was
disabled by the command argument ``--disable-optimization``.

**What are configuration statements?**

Configuration statements are a sort of keywords combined together to
determine the required optimization for the dispatch-able source.

Example:

.. code:: c

    /*@targets avx2 avx512f vsx2 vsx3 asimd asimdhp */
    // C code

The keywords mainly represent the additional optimizations configured
through ``--cpu-dispatch``, but they can also represent other options such as:

- Target groups: pre-configured configuration statements used for
  managing the required optimizations from outside the dispatch-able source.

- Policies: collections of options used for changing the default
  behaviors or forcing the compilers to perform certain things.

- "baseline": a unique keyword that represents the minimal optimizations
  configured through ``--cpu-baseline``.
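A slightly fuller configuration statement, combining the "baseline" keyword,
several dispatched features, and a policy, might look like the sketch below.
The feature names are placeholders and ``$keep_sort`` is the policy mentioned
further down this page; the exact set always depends on ``--cpu-dispatch``.

.. code:: c

    /*@targets
     ** $keep_sort baseline
     ** sse42 avx2 avx512f
     ** vsx2 vsx3
     ** asimd asimdhp
     **/
    // C code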
**NumPy's infrastructure handles dispatch-able sources in four steps**:

- **(A) Recognition**: Just like source templates and F2PY, dispatch-able
  sources require a special extension ``*.dispatch.c`` to mark C dispatch-able
  source files, and for C++ ``*.dispatch.cpp`` or ``*.dispatch.cxx``.
  **NOTE**: C++ is not supported yet.

- **(B) Parsing and validating**: In this step, the dispatch-able sources that
  were filtered by the previous step are parsed and their configuration
  statements validated, one by one, in order to determine the required
  optimizations.

- **(C) Wrapping**: This is the approach taken by NumPy's infrastructure, which
  has proved to be sufficiently flexible to compile a single source multiple
  times with different **C** definitions and flags that affect the code paths.
  The process is achieved by creating a temporary **C** source for each required
  additional optimization, which contains the declarations of the **C**
  definitions and includes the involved source via the **C** directive
  **#include**. For more clarification, take a look at the following code
  for AVX512F:

  .. code:: c

      /*
       * this definition is used by NumPy utilities as suffixes for the
       * exported symbols
       */
      #define NPY__CPU_TARGET_CURRENT AVX512F
      /*
       * The following definitions enable the dispatch-able features that are
       * defined within the main configuration header. These are the
       * definitions of the implied features.
       */
      #define NPY__CPU_TARGET_SSE
      #define NPY__CPU_TARGET_SSE2
      #define NPY__CPU_TARGET_SSE3
      #define NPY__CPU_TARGET_SSSE3
      #define NPY__CPU_TARGET_SSE41
      #define NPY__CPU_TARGET_POPCNT
      #define NPY__CPU_TARGET_SSE42
      #define NPY__CPU_TARGET_AVX
      #define NPY__CPU_TARGET_F16C
      #define NPY__CPU_TARGET_FMA3
      #define NPY__CPU_TARGET_AVX2
      #define NPY__CPU_TARGET_AVX512F
      // our dispatch-able source
      #include "/the/absolute/path/of/hello.dispatch.c"

- **(D) Dispatch-able configuration header**: The infrastructure generates a
  config header for each dispatch-able source. This header mainly contains two
  abstract **C** macros used for identifying the generated objects, so they can
  be used for runtime dispatching of certain symbols from the generated objects
  by any **C** source. It is also used for forward declarations.

  The generated header takes the name of the dispatch-able source, after
  excluding the extension and replacing it with '**.h**'. For example,
  assume we have a dispatch-able source called **hello.dispatch.c** that
  contains the following:

  .. code:: c

      // hello.dispatch.c
      /*@targets baseline sse42 avx512f */
      #include <stdio.h>
      #include "numpy/utils.h" // NPY_CAT, NPY_TOSTR

      #ifndef NPY__CPU_TARGET_CURRENT
          // wrapping the dispatch-able source only happens for the additional
          // optimizations, but if the keyword 'baseline' is provided within the
          // configuration statements, the infrastructure will add an extra
          // compilation of the dispatch-able source by passing it as-is to the
          // compiler without any changes.
          #define CURRENT_TARGET(X) X
          #define NPY__CPU_TARGET_CURRENT baseline // for printing only
      #else
          // since we reached this point, that means we're dealing with
          // the additional optimizations, so it could be SSE42 or AVX512F
          #define CURRENT_TARGET(X) NPY_CAT(NPY_CAT(X, _), NPY__CPU_TARGET_CURRENT)
      #endif
      // The macro 'CURRENT_TARGET' adds the current target as a suffix to the
      // exported symbols, to avoid linking duplications. NumPy already has a
      // similar macro called 'NPY_CPU_DISPATCH_CURFX', located at
      // numpy/numpy/core/src/common/npy_cpu_dispatch.h
      // NOTE: we tend not to add suffixes to the baseline exported symbols.
      void CURRENT_TARGET(simd_whoami)(const char *extra_info)
      {
          printf("I'm " NPY_TOSTR(NPY__CPU_TARGET_CURRENT) ", %s\n", extra_info);
      }
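  Before looking at the generated config header, it helps to see the dispatch it
  automates, written out by hand. The sketch below is not NumPy code; it assumes
  only the suffixed symbols produced above and the runtime-detection macro
  ``NPY_CPU_HAVE`` from ``numpy/npy_cpu_features.h`` that is described just
  below.

  .. code:: c

      // A hand-written sketch (not NumPy code) of the dispatch that the
      // generated config header automates: call the most interesting compiled
      // kernel that the running CPU actually supports.
      #include "numpy/npy_cpu_features.h" // NPY_CPU_HAVE

      void simd_whoami(const char *extra_info);         // baseline object
      void simd_whoami_AVX512F(const char *extra_info); // dispatched object

      static void call_whoami(const char *info)
      {
          if (NPY_CPU_HAVE(AVX512F)) {
              simd_whoami_AVX512F(info);
          }
          else {
              simd_whoami(info); // fall back to the baseline symbol
          }
      }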
  Now assume you attached **hello.dispatch.c** to the source tree, then
  the infrastructure should generate a temporary config header called
  **hello.dispatch.h** that can be reached by any source in the source
  tree, and it should contain the following code:

  .. code:: c

      #ifndef NPY__CPU_DISPATCH_EXPAND_
          // To expand the macro calls in this header
          #define NPY__CPU_DISPATCH_EXPAND_(X) X
      #endif
      // Undefining the following macros, due to the possibility of including config headers
      // multiple times within the same source and since each config header represents
      // different required optimizations according to the specified configuration
      // statements in the dispatch-able source that it was derived from.
      #undef NPY__CPU_DISPATCH_BASELINE_CALL
      #undef NPY__CPU_DISPATCH_CALL
      // nothing strange here, just a normal preprocessor callback
      // enabled only if 'baseline' is specified within the configuration statements
      #define NPY__CPU_DISPATCH_BASELINE_CALL(CB, ...) \
          NPY__CPU_DISPATCH_EXPAND_(CB(__VA_ARGS__))
      // 'NPY__CPU_DISPATCH_CALL' is an abstract macro used for dispatching
      // the required optimizations specified within the configuration statements.
      //
      // @param CHK, expected to be a macro that can be used to detect CPU features
      // at runtime; it takes a CPU feature name without string quotes and
      // returns the testing result as a boolean value.
      // NumPy already has a macro called "NPY_CPU_HAVE" which fits this requirement.
      //
      // @param CB, a callback macro that is expected to be called multiple times
      // depending on the required optimizations; the callback should receive the
      // following arguments:
      //  1- The pending calls of @param CHK, filled up with the required CPU features,
      //     that need to be tested first at runtime before executing the call that
      //     belongs to the compiled object.
      //  2- The required optimization name, same as in 'NPY__CPU_TARGET_CURRENT'
      //  3- Extra arguments passed to the macro itself
      //
      // By default the callback calls are sorted from the highest interest to the
      // lowest, unless the policy "$keep_sort" is in place within the configuration
      // statements; see "Dive into the CPU dispatcher" for more clarification.
      #define NPY__CPU_DISPATCH_CALL(CHK, CB, ...) \
          NPY__CPU_DISPATCH_EXPAND_(CB((CHK(AVX512F)), AVX512F, __VA_ARGS__)) \
          NPY__CPU_DISPATCH_EXPAND_(CB((CHK(SSE)&&CHK(SSE2)&&CHK(SSE3)&&CHK(SSSE3)&&CHK(SSE41)), SSE41, __VA_ARGS__))
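  To make the abstract macro less opaque, here is a rough, hand-expanded view of
  ``NPY__CPU_DISPATCH_CALL`` from the header above. It is illustrative only and
  assumes ``NPY_CPU_HAVE`` as the checker and a hypothetical callback
  ``CB(TESTS, TARGET, FN, ARGS)`` that expands to ``if (TESTS) { FN##_##TARGET ARGS; }``:

  .. code:: c

      // Illustration only; the real preprocessor output is a single line.
      //
      // NPY__CPU_DISPATCH_CALL(NPY_CPU_HAVE, CB, simd_whoami, ("all"))
      // expands to roughly:
      //
      //   if (NPY_CPU_HAVE(AVX512F)) { simd_whoami_AVX512F ("all"); }
      //   if (NPY_CPU_HAVE(SSE) && NPY_CPU_HAVE(SSE2) && NPY_CPU_HAVE(SSE3) &&
      //       NPY_CPU_HAVE(SSSE3) && NPY_CPU_HAVE(SSE41)) { simd_whoami_SSE41 ("all"); }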
  An example of using the config header in light of the above:

  .. code:: c

      // NOTE: The following macros are defined for demonstration purposes only.
      // NumPy already has a collection of macros located at
      // numpy/numpy/core/src/common/npy_cpu_dispatch.h, which covers all dispatching
      // and declaration scenarios.

      #include "numpy/npy_cpu_features.h" // NPY_CPU_HAVE
      #include "numpy/utils.h" // NPY_CAT, NPY_EXPAND

      // An example of setting a macro that calls all the exported symbols at once,
      // after checking if they're supported by the running machine.
      #define DISPATCH_CALL_ALL(FN, ARGS) \
          NPY__CPU_DISPATCH_CALL(NPY_CPU_HAVE, DISPATCH_CALL_ALL_CB, FN, ARGS) \
          NPY__CPU_DISPATCH_BASELINE_CALL(DISPATCH_CALL_BASELINE_ALL_CB, FN, ARGS)
      // The preprocessor callbacks.
      // The same suffixes as we defined in the dispatch-able source.
      #define DISPATCH_CALL_ALL_CB(CHECK, TARGET_NAME, FN, ARGS) \
          if (CHECK) { NPY_CAT(NPY_CAT(FN, _), TARGET_NAME) ARGS; }
      #define DISPATCH_CALL_BASELINE_ALL_CB(FN, ARGS) \
          FN NPY_EXPAND(ARGS);

      // An example of setting a macro that calls the exported symbols of the
      // highest interest optimization, after checking if they're supported by
      // the running machine.
      #define DISPATCH_CALL_HIGH(FN, ARGS) \
          if (0) {} \
          NPY__CPU_DISPATCH_CALL(NPY_CPU_HAVE, DISPATCH_CALL_HIGH_CB, FN, ARGS) \
          NPY__CPU_DISPATCH_BASELINE_CALL(DISPATCH_CALL_BASELINE_HIGH_CB, FN, ARGS)
      // The preprocessor callbacks.
      // The same suffixes as we defined in the dispatch-able source.
      #define DISPATCH_CALL_HIGH_CB(CHECK, TARGET_NAME, FN, ARGS) \
          else if (CHECK) { NPY_CAT(NPY_CAT(FN, _), TARGET_NAME) ARGS; }
      #define DISPATCH_CALL_BASELINE_HIGH_CB(FN, ARGS) \
          else { FN NPY_EXPAND(ARGS); }

      // NumPy has a macro called 'NPY_CPU_DISPATCH_DECLARE' that can be used
      // for forward declarations of any kind of prototypes, based on
      // 'NPY__CPU_DISPATCH_CALL' and 'NPY__CPU_DISPATCH_BASELINE_CALL'.
      // However, in this example we just handle it manually.
      void simd_whoami(const char *extra_info);
      void simd_whoami_AVX512F(const char *extra_info);
      void simd_whoami_SSE41(const char *extra_info);

      void trigger_me(void)
      {
          // bring in the auto-generated config header,
          // which contains the config macros 'NPY__CPU_DISPATCH_CALL' and
          // 'NPY__CPU_DISPATCH_BASELINE_CALL'.
          // It is highly recommended to include the config header before executing
          // the dispatching macros, in case there's another header in the scope.
          #include "hello.dispatch.h"
          DISPATCH_CALL_ALL(simd_whoami, ("all"))
          DISPATCH_CALL_HIGH(simd_whoami, ("the highest interest"))
          // An example of including multiple config headers in the same source
          // #include "hello2.dispatch.h"
          // DISPATCH_CALL_HIGH(another_function, ("the highest interest"))
      }


Dive into the CPU dispatcher
============================

The baseline
~~~~~~~~~~~~

Dispatcher
~~~~~~~~~~

Groups and Policies
~~~~~~~~~~~~~~~~~~~

Examples
~~~~~~~~

Report and Trace
~~~~~~~~~~~~~~~~


.. _`Universal Intrinsics`: https://numpy.org/neps/nep-0038-SIMD-optimizations.html

+:orphan:
+
+TODO add redirect