Diffstat (limited to 'doc/source/reference/simd')
-rw-r--r-- doc/source/reference/simd/simd-optimizations-tables-diff.inc 37
-rw-r--r-- doc/source/reference/simd/simd-optimizations-tables.inc 165
-rw-r--r-- doc/source/reference/simd/simd-optimizations.py 236
-rw-r--r-- doc/source/reference/simd/simd-optimizations.rst 42
4 files changed, 282 insertions, 198 deletions
diff --git a/doc/source/reference/simd/simd-optimizations-tables-diff.inc b/doc/source/reference/simd/simd-optimizations-tables-diff.inc
new file mode 100644
index 000000000..41fa96703
--- /dev/null
+++ b/doc/source/reference/simd/simd-optimizations-tables-diff.inc
@@ -0,0 +1,37 @@
+.. generated via source/reference/simd/simd-optimizations.py
+
+x86::Intel Compiler - CPU feature names
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+.. table::
+ :align: left
+
+ =========== ==================================================================================================================
+ Name Implies
+ =========== ==================================================================================================================
+ ``FMA3`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` **AVX2**
+ ``AVX2`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` **FMA3**
+ ``AVX512F`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` ``FMA3`` ``AVX2`` **AVX512CD**
+ =========== ==================================================================================================================
+
+.. note::
+ The following features aren't supported by x86::Intel Compiler:
+ **XOP FMA4**
+
+x86::Microsoft Visual C/C++ - CPU feature names
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+.. table::
+ :align: left
+
+ ============ =================================================================================================================================
+ Name Implies
+ ============ =================================================================================================================================
+ ``FMA3`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` **AVX2**
+ ``AVX2`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` **FMA3**
+ ``AVX512F`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` ``FMA3`` ``AVX2`` **AVX512CD** **AVX512_SKX**
+ ``AVX512CD`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` ``FMA3`` ``AVX2`` ``AVX512F`` **AVX512_SKX**
+ ============ =================================================================================================================================
+
+.. note::
+ The following features aren't supported by x86::Microsoft Visual C/C++:
+ **AVX512_KNL AVX512_KNM**
+
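
The bold entries and the "aren't supported" notes in the generated file above come from comparing each compiler's feature map against GCC's. The following is a minimal sketch of that comparison, assuming hand-written feature maps in place of the ones the script obtains from ``CCompilerOpt``; the dictionaries ``gcc`` and ``icc`` and the helper ``diff_features`` are hypothetical and cover only a few features.

```python
# Hypothetical, abbreviated feature maps: name -> set of implied features.
# The real script derives these from CCompilerOpt; these are typed by hand.
gcc = {
    "F16C": {"AVX"},
    "FMA3": {"AVX", "F16C"},
    "AVX2": {"AVX", "F16C"},
    "XOP": {"AVX"},
}
icc = {
    "F16C": {"AVX"},
    "FMA3": {"AVX", "F16C", "AVX2"},
    "AVX2": {"AVX", "F16C", "FMA3"},
}

def diff_features(features, features_vs):
    """Compare one compiler's feature map against a reference map."""
    common = set(features) & set(features_vs)
    # features the reference compiler has but this one lacks
    not_avl = sorted(set(features_vs) - set(features))
    # per-feature implications that only this compiler adds
    extra_implies = {
        f: sorted(features[f] - features_vs[f])
        for f in sorted(common)
        if features[f] - features_vs[f]
    }
    return extra_implies, not_avl

extra_implies, not_avl = diff_features(icc, gcc)
print(extra_implies)  # {'AVX2': ['FMA3'], 'FMA3': ['AVX2']}
print(not_avl)        # ['XOP']
```

In this sketch, the keys of ``extra_implies`` correspond to rows that would appear in a diff table, their values to the implications shown in bold, and ``not_avl`` to the contents of the note.
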
diff --git a/doc/source/reference/simd/simd-optimizations-tables.inc b/doc/source/reference/simd/simd-optimizations-tables.inc
index d5b82ee0c..f038a91e1 100644
--- a/doc/source/reference/simd/simd-optimizations-tables.inc
+++ b/doc/source/reference/simd/simd-optimizations-tables.inc
@@ -1,110 +1,103 @@
.. generated via source/reference/simd/simd-optimizations.py
-``X86`` - CPU feature names
-~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
+x86 - CPU feature names
+~~~~~~~~~~~~~~~~~~~~~~~
.. table::
:align: left
- ======== =================================================================================================================
- Name Implies
- ======== =================================================================================================================
- SSE ``SSE`` ``SSE2``
- SSE2 ``SSE`` ``SSE2``
- SSE3 ``SSE`` ``SSE2``
- SSSE3 ``SSE`` ``SSE2`` ``SSE3``
- SSE41 ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3``
- POPCNT ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41``
- SSE42 ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT``
- AVX ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42``
- XOP ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX``
- FMA4 ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX``
- F16C ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX``
- FMA3 ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C``
- AVX2 ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C``
- AVX512F ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` ``FMA3`` ``AVX2``
- AVX512CD ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` ``FMA3`` ``AVX2`` ``AVX512F``
- ======== =================================================================================================================
-
-``X86`` - Group names
-~~~~~~~~~~~~~~~~~~~~~
-
+ ============ =================================================================================================================
+ Name Implies
+ ============ =================================================================================================================
+ ``SSE`` ``SSE2``
+ ``SSE2`` ``SSE``
+ ``SSE3`` ``SSE`` ``SSE2``
+ ``SSSE3`` ``SSE`` ``SSE2`` ``SSE3``
+ ``SSE41`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3``
+ ``POPCNT`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41``
+ ``SSE42`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT``
+ ``AVX`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42``
+ ``XOP`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX``
+ ``FMA4`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX``
+ ``F16C`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX``
+ ``FMA3`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C``
+ ``AVX2`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C``
+ ``AVX512F`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` ``FMA3`` ``AVX2``
+ ``AVX512CD`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` ``FMA3`` ``AVX2`` ``AVX512F``
+ ============ =================================================================================================================
+
+x86 - Group names
+~~~~~~~~~~~~~~~~~
.. table::
:align: left
- ========== ===================================================== ===========================================================================================================================================================================
- Name Gather Implies
- ========== ===================================================== ===========================================================================================================================================================================
- AVX512_KNL ``AVX512ER`` ``AVX512PF`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` ``FMA3`` ``AVX2`` ``AVX512F`` ``AVX512CD``
- AVX512_KNM ``AVX5124FMAPS`` ``AVX5124VNNIW`` ``AVX512VPOPCNTDQ`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` ``FMA3`` ``AVX2`` ``AVX512F`` ``AVX512CD`` ``AVX512_KNL``
- AVX512_SKX ``AVX512VL`` ``AVX512BW`` ``AVX512DQ`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` ``FMA3`` ``AVX2`` ``AVX512F`` ``AVX512CD``
- AVX512_CLX ``AVX512VNNI`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` ``FMA3`` ``AVX2`` ``AVX512F`` ``AVX512CD`` ``AVX512_SKX``
- AVX512_CNL ``AVX512IFMA`` ``AVX512VBMI`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` ``FMA3`` ``AVX2`` ``AVX512F`` ``AVX512CD`` ``AVX512_SKX``
- AVX512_ICL ``AVX512VBMI2`` ``AVX512BITALG`` ``AVX512VPOPCNTDQ`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` ``FMA3`` ``AVX2`` ``AVX512F`` ``AVX512CD`` ``AVX512_SKX`` ``AVX512_CLX`` ``AVX512_CNL``
- ========== ===================================================== ===========================================================================================================================================================================
-
-``IBM/POWER`` ``big-endian`` - CPU feature names
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
+ ============== ===================================================== ===========================================================================================================================================================================
+ Name Gather Implies
+ ============== ===================================================== ===========================================================================================================================================================================
+ ``AVX512_KNL`` ``AVX512ER`` ``AVX512PF`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` ``FMA3`` ``AVX2`` ``AVX512F`` ``AVX512CD``
+ ``AVX512_KNM`` ``AVX5124FMAPS`` ``AVX5124VNNIW`` ``AVX512VPOPCNTDQ`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` ``FMA3`` ``AVX2`` ``AVX512F`` ``AVX512CD`` ``AVX512_KNL``
+ ``AVX512_SKX`` ``AVX512VL`` ``AVX512BW`` ``AVX512DQ`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` ``FMA3`` ``AVX2`` ``AVX512F`` ``AVX512CD``
+ ``AVX512_CLX`` ``AVX512VNNI`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` ``FMA3`` ``AVX2`` ``AVX512F`` ``AVX512CD`` ``AVX512_SKX``
+ ``AVX512_CNL`` ``AVX512IFMA`` ``AVX512VBMI`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` ``FMA3`` ``AVX2`` ``AVX512F`` ``AVX512CD`` ``AVX512_SKX``
+ ``AVX512_ICL`` ``AVX512VBMI2`` ``AVX512BITALG`` ``AVX512VPOPCNTDQ`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` ``FMA3`` ``AVX2`` ``AVX512F`` ``AVX512CD`` ``AVX512_SKX`` ``AVX512_CLX`` ``AVX512_CNL``
+ ============== ===================================================== ===========================================================================================================================================================================
+
+IBM/POWER big-endian - CPU feature names
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. table::
:align: left
- ==== ================
- Name Implies
- ==== ================
- VSX
- VSX2 ``VSX``
- VSX3 ``VSX`` ``VSX2``
- ==== ================
-
-``IBM/POWER`` ``little-endian mode`` - CPU feature names
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ ======== ================
+ Name Implies
+ ======== ================
+ ``VSX``
+ ``VSX2`` ``VSX``
+ ``VSX3`` ``VSX`` ``VSX2``
+ ======== ================
+IBM/POWER little-endian - CPU feature names
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. table::
:align: left
- ==== ================
- Name Implies
- ==== ================
- VSX ``VSX`` ``VSX2``
- VSX2 ``VSX`` ``VSX2``
- VSX3 ``VSX`` ``VSX2``
- ==== ================
+ ======== ================
+ Name Implies
+ ======== ================
+ ``VSX`` ``VSX2``
+ ``VSX2`` ``VSX``
+ ``VSX3`` ``VSX`` ``VSX2``
+ ======== ================
-``ARMHF`` - CPU feature names
+ARMv7/A32 - CPU feature names
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
.. table::
:align: left
- ========== ===========================================================
- Name Implies
- ========== ===========================================================
- NEON
- NEON_FP16 ``NEON``
- NEON_VFPV4 ``NEON`` ``NEON_FP16``
- ASIMD ``NEON`` ``NEON_FP16`` ``NEON_VFPV4``
- ASIMDHP ``NEON`` ``NEON_FP16`` ``NEON_VFPV4`` ``ASIMD``
- ASIMDDP ``NEON`` ``NEON_FP16`` ``NEON_VFPV4`` ``ASIMD``
- ASIMDFHM ``NEON`` ``NEON_FP16`` ``NEON_VFPV4`` ``ASIMD`` ``ASIMDHP``
- ========== ===========================================================
-
-``ARM64`` ``AARCH64`` - CPU feature names
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
+ ============== ===========================================================
+ Name Implies
+ ============== ===========================================================
+ ``NEON``
+ ``NEON_FP16`` ``NEON``
+ ``NEON_VFPV4`` ``NEON`` ``NEON_FP16``
+ ``ASIMD`` ``NEON`` ``NEON_FP16`` ``NEON_VFPV4``
+ ``ASIMDHP`` ``NEON`` ``NEON_FP16`` ``NEON_VFPV4`` ``ASIMD``
+ ``ASIMDDP`` ``NEON`` ``NEON_FP16`` ``NEON_VFPV4`` ``ASIMD``
+ ``ASIMDFHM`` ``NEON`` ``NEON_FP16`` ``NEON_VFPV4`` ``ASIMD`` ``ASIMDHP``
+ ============== ===========================================================
+
+ARMv8/A64 - CPU feature names
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. table::
:align: left
- ========== ===========================================================
- Name Implies
- ========== ===========================================================
- NEON ``NEON`` ``NEON_FP16`` ``NEON_VFPV4`` ``ASIMD``
- NEON_FP16 ``NEON`` ``NEON_FP16`` ``NEON_VFPV4`` ``ASIMD``
- NEON_VFPV4 ``NEON`` ``NEON_FP16`` ``NEON_VFPV4`` ``ASIMD``
- ASIMD ``NEON`` ``NEON_FP16`` ``NEON_VFPV4`` ``ASIMD``
- ASIMDHP ``NEON`` ``NEON_FP16`` ``NEON_VFPV4`` ``ASIMD``
- ASIMDDP ``NEON`` ``NEON_FP16`` ``NEON_VFPV4`` ``ASIMD``
- ASIMDFHM ``NEON`` ``NEON_FP16`` ``NEON_VFPV4`` ``ASIMD`` ``ASIMDHP``
- ========== ===========================================================
+ ============== ===========================================================
+ Name Implies
+ ============== ===========================================================
+ ``NEON`` ``NEON_FP16`` ``NEON_VFPV4`` ``ASIMD``
+ ``NEON_FP16`` ``NEON`` ``NEON_VFPV4`` ``ASIMD``
+ ``NEON_VFPV4`` ``NEON`` ``NEON_FP16`` ``ASIMD``
+ ``ASIMD`` ``NEON`` ``NEON_FP16`` ``NEON_VFPV4``
+ ``ASIMDHP`` ``NEON`` ``NEON_FP16`` ``NEON_VFPV4`` ``ASIMD``
+ ``ASIMDDP`` ``NEON`` ``NEON_FP16`` ``NEON_VFPV4`` ``ASIMD``
+ ``ASIMDFHM`` ``NEON`` ``NEON_FP16`` ``NEON_VFPV4`` ``ASIMD`` ``ASIMDHP``
+ ============== ===========================================================
- \ No newline at end of file
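
In the regenerated tables above, a feature no longer lists itself under "Implies", while mutually dependent features (such as ``SSE`` and ``SSE2``) list each other. One way to picture this is as a reachability walk that drops the starting feature. The sketch below assumes a small hand-written ``direct`` map and a hypothetical helper ``implies``; it is not the ``CCompilerOpt`` implementation.

```python
# Hypothetical direct-implication map for a few x86 features; the real
# data lives in CCompilerOpt and is more complete than this.
direct = {
    "SSE": {"SSE2"},
    "SSE2": {"SSE"},
    "SSE3": {"SSE2"},
    "SSSE3": {"SSE3"},
}

def implies(name):
    """Return every feature reachable from `name`, excluding `name` itself."""
    seen = set()
    stack = [name]
    while stack:
        for dep in direct.get(stack.pop(), ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    # a cycle (SSE <-> SSE2) can lead back to the start; drop it
    seen.discard(name)
    return sorted(seen)

print(implies("SSE"))    # ['SSE2']
print(implies("SSSE3"))  # ['SSE', 'SSE2', 'SSE3']
```
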
diff --git a/doc/source/reference/simd/simd-optimizations.py b/doc/source/reference/simd/simd-optimizations.py
index 628356163..5d6da50e3 100644
--- a/doc/source/reference/simd/simd-optimizations.py
+++ b/doc/source/reference/simd/simd-optimizations.py
@@ -3,10 +3,14 @@ Generate CPU features tables from CCompilerOpt
"""
from os import sys, path
gen_path = path.dirname(path.realpath(__file__))
+#sys.path.append(path.abspath(path.join(gen_path, *([".."]*4), "numpy", "distutils")))
+#from ccompiler_opt import CCompilerOpt
from numpy.distutils.ccompiler_opt import CCompilerOpt
class FakeCCompilerOpt(CCompilerOpt):
fake_info = ""
+ # disable caching no need for it
+ conf_nocache = True
def __init__(self, *args, **kwargs):
no_cc = None
CCompilerOpt.__init__(self, no_cc, **kwargs)
@@ -23,40 +27,49 @@ class FakeCCompilerOpt(CCompilerOpt):
return True
def gen_features_table(self, features, ignore_groups=True,
- field_names=["Name", "Implies"], **kwargs):
+ field_names=["Name", "Implies"],
+ fstyle=None, fstyle_implies=None, **kwargs):
rows = []
- for f in features:
+ if fstyle is None:
+ fstyle = lambda ft: f'``{ft}``'
+ if fstyle_implies is None:
+ fstyle_implies = lambda origin, ft: fstyle(ft)
+ for f in self.feature_sorted(features):
is_group = "group" in self.feature_supported.get(f, {})
if ignore_groups and is_group:
continue
implies = self.feature_sorted(self.feature_implies(f))
- implies = ' '.join(['``%s``' % i for i in implies])
- rows.append([f, implies])
- return self.gen_rst_table(field_names, rows, **kwargs)
+ implies = ' '.join([fstyle_implies(f, i) for i in implies])
+ rows.append([fstyle(f), implies])
+ if rows:
+ return self.gen_rst_table(field_names, rows, **kwargs)
def gen_gfeatures_table(self, features,
field_names=["Name", "Gather", "Implies"],
- **kwargs):
+ fstyle=None, fstyle_implies=None, **kwargs):
rows = []
- for f in features:
+ if fstyle is None:
+ fstyle = lambda ft: f'``{ft}``'
+ if fstyle_implies is None:
+ fstyle_implies = lambda origin, ft: fstyle(ft)
+ for f in self.feature_sorted(features):
gather = self.feature_supported.get(f, {}).get("group", None)
if not gather:
continue
implies = self.feature_sorted(self.feature_implies(f))
- implies = ' '.join(['``%s``' % i for i in implies])
- gather = ' '.join(['``%s``' % i for i in gather])
- rows.append([f, gather, implies])
- return self.gen_rst_table(field_names, rows, **kwargs)
+ implies = ' '.join([fstyle_implies(f, i) for i in implies])
+ gather = ' '.join([fstyle_implies(f, i) for i in gather])
+ rows.append([fstyle(f), gather, implies])
+ if rows:
+ return self.gen_rst_table(field_names, rows, **kwargs)
-
- def gen_rst_table(self, field_names, rows, margin_left=2):
+ def gen_rst_table(self, field_names, rows, tab_size=4):
assert(not rows or len(field_names) == len(rows[0]))
rows.append(field_names)
fld_len = len(field_names)
cls_len = [max(len(c[i]) for c in rows) for i in range(fld_len)]
del rows[-1]
- padding = 0
- cformat = ' '.join('{:<%d}' % (i+padding) for i in cls_len)
+ cformat = ' '.join('{:<%d}' % i for i in cls_len)
border = cformat.format(*['='*i for i in cls_len])
rows = [cformat.format(*row) for row in rows]
@@ -65,102 +78,113 @@ class FakeCCompilerOpt(CCompilerOpt):
# footer
rows += [border]
# add left margin
- rows = [(' ' * margin_left) + r for r in rows]
+ rows = [(' ' * tab_size) + r for r in rows]
return '\n'.join(rows)
-if __name__ == '__main__':
- margin_left = 4*1
- ############### x86 ###############
- FakeCCompilerOpt.fake_info = "x86_64 gcc"
- x64_gcc = FakeCCompilerOpt(cpu_baseline="max")
- x86_tables = """\
-``X86`` - CPU feature names
-~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-.. table::
- :align: left
-
-{x86_features}
-
-``X86`` - Group names
-~~~~~~~~~~~~~~~~~~~~~
-
-.. table::
- :align: left
-
-{x86_gfeatures}
-
-""".format(
- x86_features = x64_gcc.gen_features_table(
- x64_gcc.cpu_baseline_names(), margin_left=margin_left
- ),
- x86_gfeatures = x64_gcc.gen_gfeatures_table(
- x64_gcc.cpu_baseline_names(), margin_left=margin_left
+def features_table_sections(name, ftable=None, gtable=None, tab_size=4):
+ tab = ' '*tab_size
+ content = ''
+ if ftable:
+ title = f"{name} - CPU feature names"
+ content = (
+ f"{title}\n{'~'*len(title)}"
+ f"\n.. table::\n{tab}:align: left\n\n"
+ f"{ftable}\n\n"
)
- )
- ############### Power ###############
- FakeCCompilerOpt.fake_info = "ppc64 gcc"
- ppc64_gcc = FakeCCompilerOpt(cpu_baseline="max")
- FakeCCompilerOpt.fake_info = "ppc64le gcc"
- ppc64le_gcc = FakeCCompilerOpt(cpu_baseline="max")
- ppc64_tables = """\
-``IBM/POWER`` ``big-endian`` - CPU feature names
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-.. table::
- :align: left
-
-{ppc64_features}
-
-``IBM/POWER`` ``little-endian mode`` - CPU feature names
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-.. table::
- :align: left
-
-{ppc64le_features}
-
-""".format(
- ppc64_features = ppc64_gcc.gen_features_table(
- ppc64_gcc.cpu_baseline_names(), margin_left=margin_left
- ),
- ppc64le_features = ppc64le_gcc.gen_features_table(
- ppc64le_gcc.cpu_baseline_names(), margin_left=margin_left
+ if gtable:
+ title = f"{name} - Group names"
+ content += (
+ f"{title}\n{'~'*len(title)}"
+ f"\n.. table::\n{tab}:align: left\n\n"
+ f"{gtable}\n\n"
)
+ return content
+
+def features_table(arch, cc="gcc", pretty_name=None, **kwargs):
+ FakeCCompilerOpt.fake_info = arch + cc
+ ccopt = FakeCCompilerOpt(cpu_baseline="max")
+ features = ccopt.cpu_baseline_names()
+ ftable = ccopt.gen_features_table(features, **kwargs)
+ gtable = ccopt.gen_gfeatures_table(features, **kwargs)
+
+ if not pretty_name:
+ pretty_name = arch + '/' + cc
+ return features_table_sections(pretty_name, ftable, gtable, **kwargs)
+
+def features_table_diff(arch, cc, cc_vs="gcc", pretty_name=None, **kwargs):
+ FakeCCompilerOpt.fake_info = arch + cc
+ ccopt = FakeCCompilerOpt(cpu_baseline="max")
+ fnames = ccopt.cpu_baseline_names()
+ features = {f:ccopt.feature_implies(f) for f in fnames}
+
+ FakeCCompilerOpt.fake_info = arch + cc_vs
+ ccopt_vs = FakeCCompilerOpt(cpu_baseline="max")
+ fnames_vs = ccopt_vs.cpu_baseline_names()
+ features_vs = {f:ccopt_vs.feature_implies(f) for f in fnames_vs}
+
+ common = set(fnames).intersection(fnames_vs)
+ extra_avl = set(fnames).difference(fnames_vs)
+ not_avl = set(fnames_vs).difference(fnames)
+ diff_impl_f = {f:features[f].difference(features_vs[f]) for f in common}
+ diff_impl = {k for k, v in diff_impl_f.items() if v}
+
+ fbold = lambda ft: f'**{ft}**' if ft in extra_avl else f'``{ft}``'
+ fbold_implies = lambda origin, ft: (
+ f'**{ft}**' if ft in diff_impl_f.get(origin, {}) else f'``{ft}``'
)
- ############### Arm ###############
- FakeCCompilerOpt.fake_info = "armhf gcc"
- armhf_gcc = FakeCCompilerOpt(cpu_baseline="max")
- FakeCCompilerOpt.fake_info = "aarch64 gcc"
- aarch64_gcc = FakeCCompilerOpt(cpu_baseline="max")
- arm_tables = """\
-``ARMHF`` - CPU feature names
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-.. table::
- :align: left
-
-{armhf_features}
-
-``ARM64`` ``AARCH64`` - CPU feature names
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-.. table::
- :align: left
-
-{aarch64_features}
-
- """.format(
- armhf_features = armhf_gcc.gen_features_table(
- armhf_gcc.cpu_baseline_names(), margin_left=margin_left
- ),
- aarch64_features = aarch64_gcc.gen_features_table(
- aarch64_gcc.cpu_baseline_names(), margin_left=margin_left
- )
+ diff_all = diff_impl.union(extra_avl)
+ ftable = ccopt.gen_features_table(
+ diff_all, fstyle=fbold, fstyle_implies=fbold_implies, **kwargs
+ )
+ gtable = ccopt.gen_gfeatures_table(
+ diff_all, fstyle=fbold, fstyle_implies=fbold_implies, **kwargs
)
- # TODO: diff the difference among all supported compilers
+ if not pretty_name:
+ pretty_name = arch + '/' + cc
+ content = features_table_sections(pretty_name, ftable, gtable, **kwargs)
+
+ if not_avl:
+ not_avl = ccopt_vs.feature_sorted(not_avl)
+ not_avl = ' '.join(not_avl)
+ content += (
+ ".. note::\n"
+ f" The following features aren't supported by {pretty_name}:\n"
+ f" **{not_avl}**\n\n"
+ )
+ return content
+
+if __name__ == '__main__':
+ pretty_names = {
+ "PPC64": "IBM/POWER big-endian",
+ "PPC64LE": "IBM/POWER little-endian",
+ "ARMHF": "ARMv7/A32",
+ "AARCH64": "ARMv8/A64",
+ "ICC": "Intel Compiler",
+ # "ICCW": "Intel Compiler msvc-like",
+ "MSVC": "Microsoft Visual C/C++"
+ }
with open(path.join(gen_path, 'simd-optimizations-tables.inc'), 'wt') as fd:
fd.write(f'.. generated via {__file__}\n\n')
- fd.write(x86_tables)
- fd.write(ppc64_tables)
- fd.write(arm_tables)
+ for arch in (
+ ("x86", "PPC64", "PPC64LE", "ARMHF", "AARCH64")
+ ):
+ pretty_name = pretty_names.get(arch, arch)
+ table = features_table(arch=arch, pretty_name=pretty_name)
+ assert(table)
+ fd.write(table)
+
+ with open(path.join(gen_path, 'simd-optimizations-tables-diff.inc'), 'wt') as fd:
+ fd.write(f'.. generated via {__file__}\n\n')
+ for arch, cc_names in (
+ ("x86", ("clang", "ICC", "MSVC")),
+ ("PPC64", ("clang",)),
+ ("PPC64LE", ("clang",)),
+ ("ARMHF", ("clang",)),
+ ("AARCH64", ("clang",))
+ ):
+ arch_pname = pretty_names.get(arch, arch)
+ for cc in cc_names:
+ pretty_name = f"{arch_pname}::{pretty_names.get(cc, cc)}"
+ table = features_table_diff(arch=arch, cc=cc, pretty_name=pretty_name)
+ if table:
+ fd.write(table)
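
For readers who want to try the table formatting outside of NumPy, here is a standalone sketch along the lines of ``gen_rst_table`` in the script above. The function name ``rst_simple_table`` is hypothetical, and unlike the script it strips trailing padding from each line; treat it as an illustration under those assumptions rather than the script's exact behavior.

```python
def rst_simple_table(field_names, rows, tab_size=4):
    """Format rows as an indented reStructuredText simple table."""
    # column width = longest cell in that column, header included
    widths = [
        max(len(r[i]) for r in [field_names] + rows)
        for i in range(len(field_names))
    ]
    cformat = ' '.join('{:<%d}' % w for w in widths)
    border = cformat.format(*['=' * w for w in widths])
    lines = [border, cformat.format(*field_names), border]
    lines += [cformat.format(*row) for row in rows]
    lines.append(border)
    # indent every line; rstrip() removes padding after the last column
    return '\n'.join((' ' * tab_size + ln).rstrip() for ln in lines)

table = rst_simple_table(
    ["Name", "Implies"],
    [["``VSX``", ""], ["``VSX2``", "``VSX``"]],
)
print(table)
```

The indentation of ``tab_size`` spaces is what lets the output sit directly under a ``.. table::`` directive, as in the generated ``.inc`` files.
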
diff --git a/doc/source/reference/simd/simd-optimizations.rst b/doc/source/reference/simd/simd-optimizations.rst
index eb7eb2a83..59a4892b2 100644
--- a/doc/source/reference/simd/simd-optimizations.rst
+++ b/doc/source/reference/simd/simd-optimizations.rst
@@ -29,8 +29,8 @@ Build options for compilation
safely run on a wide range of platforms within the processor family.
- ``--cpu-dispatch``: dispatched set of additional optimizations.
- The default value for ``x86`` is ``max -xop -fma4`` which enables all CPU
- features, except for AMD legacy features.
+ The default value is ``max -xop -fma4``, which enables all CPU
+ features except for the AMD legacy features (in the case of x86).
The command arguments are available in ``build``, ``build_clib``, and
``build_ext``.
@@ -38,13 +38,24 @@ if ``build_clib`` or ``build_ext`` are not specified by the user, the arguments
``build`` will be used instead, which also holds the default values.
Optimization names can be CPU features or groups of features that gather
-several features or special options to perform a series of procedures.
+several features or :ref:`special options <special-options>` to perform a series of procedures.
The following tables show the current supported optimizations sorted from the lowest to the highest interest.
.. include:: simd-optimizations-tables.inc
+----
+
+.. _tables-diff:
+
+While the above tables are based on the GCC compiler, the following tables show the differences in the
+other compilers:
+
+.. include:: simd-optimizations-tables-diff.inc
+
+.. _special-options:
+
Special options
~~~~~~~~~~~~~~~
@@ -80,7 +91,7 @@ NOTES
- The order of the requested optimizations doesn't matter.
-- Either commas or spaces can be used as a separator, e.g. ``--cpu-dispatch``\ =
+- Either commas or spaces can be used as a separator, e.g. ``--cpu-dispatch``\ =
"avx2 avx512f" or ``--cpu-dispatch``\ = "avx2, avx512f" both work, but the
arguments must be enclosed in quotes.
@@ -114,6 +125,25 @@ NOTES
Special cases
~~~~~~~~~~~~~
+**Interrelated CPU features**: Some exceptional conditions force us to link some features together when it comes to certain compilers or architectures, making it impossible to build them separately.
+These conditions can be divided into two parts, as follows:
+
+- **Architectural compatibility**: The need to align certain CPU features that are assured
+ to be supported by successive generations of the same architecture, for example:
+
+ - On ppc64le, `VSX(ISA 2.06)` and `VSX2(ISA 2.07)` both imply one another since the
+ first generation that supports little-endian mode is Power-8`(ISA 2.07)`
+ - On AArch64, `NEON` `FP16` `VFPV4` `ASIMD` imply each other since they are part of the
+ hardware baseline.
+
+- **Compilation compatibility**: Not all **C/C++** compilers provide independent support for all CPU
+ features. For example, **Intel**'s compiler doesn't provide separate flags for `AVX2` and `FMA3`.
+ This makes sense, since all Intel CPUs that come with `AVX2` also support `FMA3` and vice versa,
+ but this approach is incompatible with other **x86** CPUs from **AMD** or **VIA**.
+ Therefore, there are differences in the depiction of CPU features between the C/C++ compilers,
+ as shown in the :ref:`tables above <tables-diff>`.
+
+
Behaviors and Errors
~~~~~~~~~~~~~~~~~~~~
@@ -224,7 +254,7 @@ Definitely, yes. But the :ref:`dispatch-able sources <dispatchable-sources>` are
treated differently.
What if the user specifies certain **baseline features** during the
-build but at runtime the machine doesn't support even these
+build but at runtime the machine doesn't support even these
features? Will the compiled code be called via one of these definitions, or
maybe the compiler itself auto-generated/vectorized certain piece of code
based on the provided command line compiler flags?
@@ -304,7 +334,7 @@ through ``--cpu-dispatch``, but it can also represent other options such as:
.. code:: c
- /*
+ /*
* this definition is used by NumPy utilities as suffixes for the
* exported symbols
*/