Diffstat (limited to 'doc/source/reference/simd')
-rw-r--r-- | doc/source/reference/simd/simd-optimizations-tables-diff.inc | 37
-rw-r--r-- | doc/source/reference/simd/simd-optimizations-tables.inc | 165
-rw-r--r-- | doc/source/reference/simd/simd-optimizations.py | 236
-rw-r--r-- | doc/source/reference/simd/simd-optimizations.rst | 42
4 files changed, 282 insertions, 198 deletions
diff --git a/doc/source/reference/simd/simd-optimizations-tables-diff.inc b/doc/source/reference/simd/simd-optimizations-tables-diff.inc new file mode 100644 index 000000000..41fa96703 --- /dev/null +++ b/doc/source/reference/simd/simd-optimizations-tables-diff.inc @@ -0,0 +1,37 @@ +.. generated via source/reference/simd/simd-optimizations.py + +x86::Intel Compiler - CPU feature names +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +.. table:: + :align: left + + =========== ================================================================================================================== + Name Implies + =========== ================================================================================================================== + ``FMA3`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` **AVX2** + ``AVX2`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` **FMA3** + ``AVX512F`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` ``FMA3`` ``AVX2`` **AVX512CD** + =========== ================================================================================================================== + +.. note:: + The following features aren't supported by x86::Intel Compiler: + **XOP FMA4** + +x86::Microsoft Visual C/C++ - CPU feature names +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +.. 
table:: + :align: left + + ============ ================================================================================================================================= + Name Implies + ============ ================================================================================================================================= + ``FMA3`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` **AVX2** + ``AVX2`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` **FMA3** + ``AVX512F`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` ``FMA3`` ``AVX2`` **AVX512CD** **AVX512_SKX** + ``AVX512CD`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` ``FMA3`` ``AVX2`` ``AVX512F`` **AVX512_SKX** + ============ ================================================================================================================================= + +.. note:: + The following features aren't supported by x86::Microsoft Visual C/C++: + **AVX512_KNL AVX512_KNM** + diff --git a/doc/source/reference/simd/simd-optimizations-tables.inc b/doc/source/reference/simd/simd-optimizations-tables.inc index d5b82ee0c..f038a91e1 100644 --- a/doc/source/reference/simd/simd-optimizations-tables.inc +++ b/doc/source/reference/simd/simd-optimizations-tables.inc @@ -1,110 +1,103 @@ .. generated via source/reference/simd/simd-optimizations.py -``X86`` - CPU feature names -~~~~~~~~~~~~~~~~~~~~~~~~~~~ - +x86 - CPU feature names +~~~~~~~~~~~~~~~~~~~~~~~ .. 
table:: :align: left - ======== ================================================================================================================= - Name Implies - ======== ================================================================================================================= - SSE ``SSE`` ``SSE2`` - SSE2 ``SSE`` ``SSE2`` - SSE3 ``SSE`` ``SSE2`` - SSSE3 ``SSE`` ``SSE2`` ``SSE3`` - SSE41 ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` - POPCNT ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` - SSE42 ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` - AVX ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` - XOP ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` - FMA4 ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` - F16C ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` - FMA3 ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` - AVX2 ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` - AVX512F ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` ``FMA3`` ``AVX2`` - AVX512CD ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` ``FMA3`` ``AVX2`` ``AVX512F`` - ======== ================================================================================================================= - -``X86`` - Group names -~~~~~~~~~~~~~~~~~~~~~ - + ============ ================================================================================================================= + Name Implies + ============ ================================================================================================================= + ``SSE`` ``SSE2`` + ``SSE2`` ``SSE`` + ``SSE3`` ``SSE`` ``SSE2`` + ``SSSE3`` ``SSE`` ``SSE2`` ``SSE3`` + ``SSE41`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` + ``POPCNT`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` + ``SSE42`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` 
``SSE41`` ``POPCNT`` + ``AVX`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` + ``XOP`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` + ``FMA4`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` + ``F16C`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` + ``FMA3`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` + ``AVX2`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` + ``AVX512F`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` ``FMA3`` ``AVX2`` + ``AVX512CD`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` ``FMA3`` ``AVX2`` ``AVX512F`` + ============ ================================================================================================================= + +x86 - Group names +~~~~~~~~~~~~~~~~~ .. table:: :align: left - ========== ===================================================== =========================================================================================================================================================================== - Name Gather Implies - ========== ===================================================== =========================================================================================================================================================================== - AVX512_KNL ``AVX512ER`` ``AVX512PF`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` ``FMA3`` ``AVX2`` ``AVX512F`` ``AVX512CD`` - AVX512_KNM ``AVX5124FMAPS`` ``AVX5124VNNIW`` ``AVX512VPOPCNTDQ`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` ``FMA3`` ``AVX2`` ``AVX512F`` ``AVX512CD`` ``AVX512_KNL`` - AVX512_SKX ``AVX512VL`` ``AVX512BW`` ``AVX512DQ`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` 
``FMA3`` ``AVX2`` ``AVX512F`` ``AVX512CD`` - AVX512_CLX ``AVX512VNNI`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` ``FMA3`` ``AVX2`` ``AVX512F`` ``AVX512CD`` ``AVX512_SKX`` - AVX512_CNL ``AVX512IFMA`` ``AVX512VBMI`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` ``FMA3`` ``AVX2`` ``AVX512F`` ``AVX512CD`` ``AVX512_SKX`` - AVX512_ICL ``AVX512VBMI2`` ``AVX512BITALG`` ``AVX512VPOPCNTDQ`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` ``FMA3`` ``AVX2`` ``AVX512F`` ``AVX512CD`` ``AVX512_SKX`` ``AVX512_CLX`` ``AVX512_CNL`` - ========== ===================================================== =========================================================================================================================================================================== - -``IBM/POWER`` ``big-endian`` - CPU feature names -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - + ============== ===================================================== =========================================================================================================================================================================== + Name Gather Implies + ============== ===================================================== =========================================================================================================================================================================== + ``AVX512_KNL`` ``AVX512ER`` ``AVX512PF`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` ``FMA3`` ``AVX2`` ``AVX512F`` ``AVX512CD`` + ``AVX512_KNM`` ``AVX5124FMAPS`` ``AVX5124VNNIW`` ``AVX512VPOPCNTDQ`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` ``FMA3`` ``AVX2`` ``AVX512F`` ``AVX512CD`` ``AVX512_KNL`` + ``AVX512_SKX`` ``AVX512VL`` ``AVX512BW`` ``AVX512DQ`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` 
``AVX`` ``F16C`` ``FMA3`` ``AVX2`` ``AVX512F`` ``AVX512CD`` + ``AVX512_CLX`` ``AVX512VNNI`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` ``FMA3`` ``AVX2`` ``AVX512F`` ``AVX512CD`` ``AVX512_SKX`` + ``AVX512_CNL`` ``AVX512IFMA`` ``AVX512VBMI`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` ``FMA3`` ``AVX2`` ``AVX512F`` ``AVX512CD`` ``AVX512_SKX`` + ``AVX512_ICL`` ``AVX512VBMI2`` ``AVX512BITALG`` ``AVX512VPOPCNTDQ`` ``SSE`` ``SSE2`` ``SSE3`` ``SSSE3`` ``SSE41`` ``POPCNT`` ``SSE42`` ``AVX`` ``F16C`` ``FMA3`` ``AVX2`` ``AVX512F`` ``AVX512CD`` ``AVX512_SKX`` ``AVX512_CLX`` ``AVX512_CNL`` + ============== ===================================================== =========================================================================================================================================================================== + +IBM/POWER big-endian - CPU feature names +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. table:: :align: left - ==== ================ - Name Implies - ==== ================ - VSX - VSX2 ``VSX`` - VSX3 ``VSX`` ``VSX2`` - ==== ================ - -``IBM/POWER`` ``little-endian mode`` - CPU feature names -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + ======== ================ + Name Implies + ======== ================ + ``VSX`` + ``VSX2`` ``VSX`` + ``VSX3`` ``VSX`` ``VSX2`` + ======== ================ +IBM/POWER little-endian - CPU feature names +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. table:: :align: left - ==== ================ - Name Implies - ==== ================ - VSX ``VSX`` ``VSX2`` - VSX2 ``VSX`` ``VSX2`` - VSX3 ``VSX`` ``VSX2`` - ==== ================ + ======== ================ + Name Implies + ======== ================ + ``VSX`` ``VSX2`` + ``VSX2`` ``VSX`` + ``VSX3`` ``VSX`` ``VSX2`` + ======== ================ -``ARMHF`` - CPU feature names +ARMv7/A32 - CPU feature names ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - .. 
table:: :align: left - ========== =========================================================== - Name Implies - ========== =========================================================== - NEON - NEON_FP16 ``NEON`` - NEON_VFPV4 ``NEON`` ``NEON_FP16`` - ASIMD ``NEON`` ``NEON_FP16`` ``NEON_VFPV4`` - ASIMDHP ``NEON`` ``NEON_FP16`` ``NEON_VFPV4`` ``ASIMD`` - ASIMDDP ``NEON`` ``NEON_FP16`` ``NEON_VFPV4`` ``ASIMD`` - ASIMDFHM ``NEON`` ``NEON_FP16`` ``NEON_VFPV4`` ``ASIMD`` ``ASIMDHP`` - ========== =========================================================== - -``ARM64`` ``AARCH64`` - CPU feature names -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - + ============== =========================================================== + Name Implies + ============== =========================================================== + ``NEON`` + ``NEON_FP16`` ``NEON`` + ``NEON_VFPV4`` ``NEON`` ``NEON_FP16`` + ``ASIMD`` ``NEON`` ``NEON_FP16`` ``NEON_VFPV4`` + ``ASIMDHP`` ``NEON`` ``NEON_FP16`` ``NEON_VFPV4`` ``ASIMD`` + ``ASIMDDP`` ``NEON`` ``NEON_FP16`` ``NEON_VFPV4`` ``ASIMD`` + ``ASIMDFHM`` ``NEON`` ``NEON_FP16`` ``NEON_VFPV4`` ``ASIMD`` ``ASIMDHP`` + ============== =========================================================== + +ARMv8/A64 - CPU feature names +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. 
table:: :align: left - ========== =========================================================== - Name Implies - ========== =========================================================== - NEON ``NEON`` ``NEON_FP16`` ``NEON_VFPV4`` ``ASIMD`` - NEON_FP16 ``NEON`` ``NEON_FP16`` ``NEON_VFPV4`` ``ASIMD`` - NEON_VFPV4 ``NEON`` ``NEON_FP16`` ``NEON_VFPV4`` ``ASIMD`` - ASIMD ``NEON`` ``NEON_FP16`` ``NEON_VFPV4`` ``ASIMD`` - ASIMDHP ``NEON`` ``NEON_FP16`` ``NEON_VFPV4`` ``ASIMD`` - ASIMDDP ``NEON`` ``NEON_FP16`` ``NEON_VFPV4`` ``ASIMD`` - ASIMDFHM ``NEON`` ``NEON_FP16`` ``NEON_VFPV4`` ``ASIMD`` ``ASIMDHP`` - ========== =========================================================== + ============== =========================================================== + Name Implies + ============== =========================================================== + ``NEON`` ``NEON_FP16`` ``NEON_VFPV4`` ``ASIMD`` + ``NEON_FP16`` ``NEON`` ``NEON_VFPV4`` ``ASIMD`` + ``NEON_VFPV4`` ``NEON`` ``NEON_FP16`` ``ASIMD`` + ``ASIMD`` ``NEON`` ``NEON_FP16`` ``NEON_VFPV4`` + ``ASIMDHP`` ``NEON`` ``NEON_FP16`` ``NEON_VFPV4`` ``ASIMD`` + ``ASIMDDP`` ``NEON`` ``NEON_FP16`` ``NEON_VFPV4`` ``ASIMD`` + ``ASIMDFHM`` ``NEON`` ``NEON_FP16`` ``NEON_VFPV4`` ``ASIMD`` ``ASIMDHP`` + ============== =========================================================== -
\ No newline at end of file diff --git a/doc/source/reference/simd/simd-optimizations.py b/doc/source/reference/simd/simd-optimizations.py index 628356163..5d6da50e3 100644 --- a/doc/source/reference/simd/simd-optimizations.py +++ b/doc/source/reference/simd/simd-optimizations.py @@ -3,10 +3,14 @@ Generate CPU features tables from CCompilerOpt """ from os import sys, path gen_path = path.dirname(path.realpath(__file__)) +#sys.path.append(path.abspath(path.join(gen_path, *([".."]*4), "numpy", "distutils"))) +#from ccompiler_opt import CCompilerOpt from numpy.distutils.ccompiler_opt import CCompilerOpt class FakeCCompilerOpt(CCompilerOpt): fake_info = "" + # disable caching no need for it + conf_nocache = True def __init__(self, *args, **kwargs): no_cc = None CCompilerOpt.__init__(self, no_cc, **kwargs) @@ -23,40 +27,49 @@ class FakeCCompilerOpt(CCompilerOpt): return True def gen_features_table(self, features, ignore_groups=True, - field_names=["Name", "Implies"], **kwargs): + field_names=["Name", "Implies"], + fstyle=None, fstyle_implies=None, **kwargs): rows = [] - for f in features: + if fstyle is None: + fstyle = lambda ft: f'``{ft}``' + if fstyle_implies is None: + fstyle_implies = lambda origin, ft: fstyle(ft) + for f in self.feature_sorted(features): is_group = "group" in self.feature_supported.get(f, {}) if ignore_groups and is_group: continue implies = self.feature_sorted(self.feature_implies(f)) - implies = ' '.join(['``%s``' % i for i in implies]) - rows.append([f, implies]) - return self.gen_rst_table(field_names, rows, **kwargs) + implies = ' '.join([fstyle_implies(f, i) for i in implies]) + rows.append([fstyle(f), implies]) + if rows: + return self.gen_rst_table(field_names, rows, **kwargs) def gen_gfeatures_table(self, features, field_names=["Name", "Gather", "Implies"], - **kwargs): + fstyle=None, fstyle_implies=None, **kwargs): rows = [] - for f in features: + if fstyle is None: + fstyle = lambda ft: f'``{ft}``' + if fstyle_implies is None: + 
fstyle_implies = lambda origin, ft: fstyle(ft) + for f in self.feature_sorted(features): gather = self.feature_supported.get(f, {}).get("group", None) if not gather: continue implies = self.feature_sorted(self.feature_implies(f)) - implies = ' '.join(['``%s``' % i for i in implies]) - gather = ' '.join(['``%s``' % i for i in gather]) - rows.append([f, gather, implies]) - return self.gen_rst_table(field_names, rows, **kwargs) + implies = ' '.join([fstyle_implies(f, i) for i in implies]) + gather = ' '.join([fstyle_implies(f, i) for i in gather]) + rows.append([fstyle(f), gather, implies]) + if rows: + return self.gen_rst_table(field_names, rows, **kwargs) - - def gen_rst_table(self, field_names, rows, margin_left=2): + def gen_rst_table(self, field_names, rows, tab_size=4): assert(not rows or len(field_names) == len(rows[0])) rows.append(field_names) fld_len = len(field_names) cls_len = [max(len(c[i]) for c in rows) for i in range(fld_len)] del rows[-1] - padding = 0 - cformat = ' '.join('{:<%d}' % (i+padding) for i in cls_len) + cformat = ' '.join('{:<%d}' % i for i in cls_len) border = cformat.format(*['='*i for i in cls_len]) rows = [cformat.format(*row) for row in rows] @@ -65,102 +78,113 @@ class FakeCCompilerOpt(CCompilerOpt): # footer rows += [border] # add left margin - rows = [(' ' * margin_left) + r for r in rows] + rows = [(' ' * tab_size) + r for r in rows] return '\n'.join(rows) -if __name__ == '__main__': - margin_left = 4*1 - ############### x86 ############### - FakeCCompilerOpt.fake_info = "x86_64 gcc" - x64_gcc = FakeCCompilerOpt(cpu_baseline="max") - x86_tables = """\ -``X86`` - CPU feature names -~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -.. table:: - :align: left - -{x86_features} - -``X86`` - Group names -~~~~~~~~~~~~~~~~~~~~~ - -.. 
table:: - :align: left - -{x86_gfeatures} - -""".format( - x86_features = x64_gcc.gen_features_table( - x64_gcc.cpu_baseline_names(), margin_left=margin_left - ), - x86_gfeatures = x64_gcc.gen_gfeatures_table( - x64_gcc.cpu_baseline_names(), margin_left=margin_left +def features_table_sections(name, ftable=None, gtable=None, tab_size=4): + tab = ' '*tab_size + content = '' + if ftable: + title = f"{name} - CPU feature names" + content = ( + f"{title}\n{'~'*len(title)}" + f"\n.. table::\n{tab}:align: left\n\n" + f"{ftable}\n\n" ) - ) - ############### Power ############### - FakeCCompilerOpt.fake_info = "ppc64 gcc" - ppc64_gcc = FakeCCompilerOpt(cpu_baseline="max") - FakeCCompilerOpt.fake_info = "ppc64le gcc" - ppc64le_gcc = FakeCCompilerOpt(cpu_baseline="max") - ppc64_tables = """\ -``IBM/POWER`` ``big-endian`` - CPU feature names -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -.. table:: - :align: left - -{ppc64_features} - -``IBM/POWER`` ``little-endian mode`` - CPU feature names -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -.. table:: - :align: left - -{ppc64le_features} - -""".format( - ppc64_features = ppc64_gcc.gen_features_table( - ppc64_gcc.cpu_baseline_names(), margin_left=margin_left - ), - ppc64le_features = ppc64le_gcc.gen_features_table( - ppc64le_gcc.cpu_baseline_names(), margin_left=margin_left + if gtable: + title = f"{name} - Group names" + content += ( + f"{title}\n{'~'*len(title)}" + f"\n.. 
table::\n{tab}:align: left\n\n" + f"{gtable}\n\n" ) + return content + +def features_table(arch, cc="gcc", pretty_name=None, **kwargs): + FakeCCompilerOpt.fake_info = arch + cc + ccopt = FakeCCompilerOpt(cpu_baseline="max") + features = ccopt.cpu_baseline_names() + ftable = ccopt.gen_features_table(features, **kwargs) + gtable = ccopt.gen_gfeatures_table(features, **kwargs) + + if not pretty_name: + pretty_name = arch + '/' + cc + return features_table_sections(pretty_name, ftable, gtable, **kwargs) + +def features_table_diff(arch, cc, cc_vs="gcc", pretty_name=None, **kwargs): + FakeCCompilerOpt.fake_info = arch + cc + ccopt = FakeCCompilerOpt(cpu_baseline="max") + fnames = ccopt.cpu_baseline_names() + features = {f:ccopt.feature_implies(f) for f in fnames} + + FakeCCompilerOpt.fake_info = arch + cc_vs + ccopt_vs = FakeCCompilerOpt(cpu_baseline="max") + fnames_vs = ccopt_vs.cpu_baseline_names() + features_vs = {f:ccopt_vs.feature_implies(f) for f in fnames_vs} + + common = set(fnames).intersection(fnames_vs) + extra_avl = set(fnames).difference(fnames_vs) + not_avl = set(fnames_vs).difference(fnames) + diff_impl_f = {f:features[f].difference(features_vs[f]) for f in common} + diff_impl = {k for k, v in diff_impl_f.items() if v} + + fbold = lambda ft: f'**{ft}**' if ft in extra_avl else f'``{ft}``' + fbold_implies = lambda origin, ft: ( + f'**{ft}**' if ft in diff_impl_f.get(origin, {}) else f'``{ft}``' ) - ############### Arm ############### - FakeCCompilerOpt.fake_info = "armhf gcc" - armhf_gcc = FakeCCompilerOpt(cpu_baseline="max") - FakeCCompilerOpt.fake_info = "aarch64 gcc" - aarch64_gcc = FakeCCompilerOpt(cpu_baseline="max") - arm_tables = """\ -``ARMHF`` - CPU feature names -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -.. table:: - :align: left - -{armhf_features} - -``ARM64`` ``AARCH64`` - CPU feature names -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -.. 
table:: - :align: left - -{aarch64_features} - - """.format( - armhf_features = armhf_gcc.gen_features_table( - armhf_gcc.cpu_baseline_names(), margin_left=margin_left - ), - aarch64_features = aarch64_gcc.gen_features_table( - aarch64_gcc.cpu_baseline_names(), margin_left=margin_left - ) + diff_all = diff_impl.union(extra_avl) + ftable = ccopt.gen_features_table( + diff_all, fstyle=fbold, fstyle_implies=fbold_implies, **kwargs + ) + gtable = ccopt.gen_gfeatures_table( + diff_all, fstyle=fbold, fstyle_implies=fbold_implies, **kwargs ) - # TODO: diff the difference among all supported compilers + if not pretty_name: + pretty_name = arch + '/' + cc + content = features_table_sections(pretty_name, ftable, gtable, **kwargs) + + if not_avl: + not_avl = ccopt_vs.feature_sorted(not_avl) + not_avl = ' '.join(not_avl) + content += ( + ".. note::\n" + f" The following features aren't supported by {pretty_name}:\n" + f" **{not_avl}**\n\n" + ) + return content + +if __name__ == '__main__': + pretty_names = { + "PPC64": "IBM/POWER big-endian", + "PPC64LE": "IBM/POWER little-endian", + "ARMHF": "ARMv7/A32", + "AARCH64": "ARMv8/A64", + "ICC": "Intel Compiler", + # "ICCW": "Intel Compiler msvc-like", + "MSVC": "Microsoft Visual C/C++" + } with open(path.join(gen_path, 'simd-optimizations-tables.inc'), 'wt') as fd: fd.write(f'.. generated via {__file__}\n\n') - fd.write(x86_tables) - fd.write(ppc64_tables) - fd.write(arm_tables) + for arch in ( + ("x86", "PPC64", "PPC64LE", "ARMHF", "AARCH64") + ): + pretty_name = pretty_names.get(arch, arch) + table = features_table(arch=arch, pretty_name=pretty_name) + assert(table) + fd.write(table) + + with open(path.join(gen_path, 'simd-optimizations-tables-diff.inc'), 'wt') as fd: + fd.write(f'.. 
generated via {__file__}\n\n') + for arch, cc_names in ( + ("x86", ("clang", "ICC", "MSVC")), + ("PPC64", ("clang",)), + ("PPC64LE", ("clang",)), + ("ARMHF", ("clang",)), + ("AARCH64", ("clang",)) + ): + arch_pname = pretty_names.get(arch, arch) + for cc in cc_names: + pretty_name = f"{arch_pname}::{pretty_names.get(cc, cc)}" + table = features_table_diff(arch=arch, cc=cc, pretty_name=pretty_name) + if table: + fd.write(table) diff --git a/doc/source/reference/simd/simd-optimizations.rst b/doc/source/reference/simd/simd-optimizations.rst index eb7eb2a83..59a4892b2 100644 --- a/doc/source/reference/simd/simd-optimizations.rst +++ b/doc/source/reference/simd/simd-optimizations.rst @@ -29,8 +29,8 @@ Build options for compilation safely run on a wide range of platforms within the processor family. - ``--cpu-dispatch``: dispatched set of additional optimizations. - The default value for ``x86`` is ``max -xop -fma4`` which enables all CPU - features, except for AMD legacy features. + The default value is ``max -xop -fma4`` which enables all CPU + features, except for AMD legacy features(in case of X86). The command arguments are available in ``build``, ``build_clib``, and ``build_ext``. @@ -38,13 +38,24 @@ if ``build_clib`` or ``build_ext`` are not specified by the user, the arguments ``build`` will be used instead, which also holds the default values. Optimization names can be CPU features or groups of features that gather -several features or special options to perform a series of procedures. +several features or :ref:`special options <special-options>` to perform a series of procedures. The following tables show the current supported optimizations sorted from the lowest to the highest interest. .. include:: simd-optimizations-tables.inc +---- + +.. _tables-diff: + +While the above tables are based on the GCC Compiler, the following tables showing the differences in the +other compilers: + +.. include:: simd-optimizations-tables-diff.inc + +.. 
_special-options: + Special options ~~~~~~~~~~~~~~~ @@ -80,7 +91,7 @@ NOTES - The order of the requsted optimizations doesn't matter. -- Either commas or spaces can be used as a separator, e.g. ``--cpu-dispatch``\ = +- Either commas or spaces can be used as a separator, e.g. ``--cpu-dispatch``\ = "avx2 avx512f" or ``--cpu-dispatch``\ = "avx2, avx512f" both work, but the arguments must be enclosed in quotes. @@ -114,6 +125,25 @@ NOTES Special cases ~~~~~~~~~~~~~ +**Interrelated CPU features**: Some exceptional conditions force us to link some features together when it come to certain compilers or architectures, resulting in the impossibility of building them separately. +These conditions can be divided into two parts, as follows: + +- **Architectural compatibility**: The need to align certain CPU features that are assured + to be supported by successive generations of the same architecture, for example: + + - On ppc64le `VSX(ISA 2.06)` and `VSX2(ISA 2.07)` both imply one another since the + first generation that supports little-endian mode is Power-8`(ISA 2.07)` + - On AArch64 `NEON` `FP16` `VFPV4` `ASIMD` implies each other since they are part of the + hardware baseline. + +- **Compilation compatibility**: Not all **C/C++** compilers provide independent support for all CPU + features. For example, **Intel**'s compiler doesn't provide separated flags for `AVX2` and `FMA3`, + it makes sense since all Intel CPUs that comes with `AVX2` also support `FMA3` and vice versa, + but this approach is incompatible with other **x86** CPUs from **AMD** or **VIA**. + Therefore, there are differences in the depiction of CPU features between the C/C++ compilers, + as shown in the :ref:`tables above <tables-diff>`. + + Behaviors and Errors ~~~~~~~~~~~~~~~~~~~~ @@ -224,7 +254,7 @@ Definitely, yes. But the :ref:`dispatch-able sources <dispatchable-sources>` are treated differently. 
What if the user specifies certain **baseline features** during the -build but at runtime the machine doesn't support even these +build but at runtime the machine doesn't support even these features? Will the compiled code be called via one of these definitions, or maybe the compiler itself auto-generated/vectorized certain piece of code based on the provided command line compiler flags? @@ -304,7 +334,7 @@ through ``--cpu-dispatch``, but it can also represent other options such as: .. code:: c - /* + /* * this definition is used by NumPy utilities as suffixes for the * exported symbols */ |
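
The comparison that ``features_table_diff`` performs in the patch above comes down to a few set operations on two ``{feature: implied features}`` maps. The following is a minimal sketch of that logic only, not the script itself: the helper name ``diff_features`` and the two hand-written maps are hypothetical, with implied sets shortened for readability. The real script obtains these maps from ``CCompilerOpt``.

```python
def diff_features(features, features_vs):
    """Compare two {feature: set_of_implied_features} maps.

    Hypothetical helper that mirrors the set logic of features_table_diff:
    `features` belongs to the compiler being documented, `features_vs`
    to the reference compiler (GCC in the patch).
    """
    names, names_vs = set(features), set(features_vs)
    common = names & names_vs
    extra_avl = names - names_vs   # only the documented compiler has these
    not_avl = names_vs - names     # only the reference compiler has these
    # For shared features, keep the implied features that are new
    # relative to the reference compiler.
    diff_impl = {}
    for f in common:
        added = features[f] - features_vs[f]
        if added:
            diff_impl[f] = added
    return extra_avl, not_avl, diff_impl


# Hand-written, shortened maps (assumption: illustrative data only).
gcc = {
    "AVX": {"SSE42"},
    "XOP": {"AVX"},
    "FMA3": {"AVX", "F16C"},
    "AVX2": {"AVX", "F16C"},
}
icc = {
    "AVX": {"SSE42"},
    "FMA3": {"AVX", "F16C", "AVX2"},
    "AVX2": {"AVX", "F16C", "FMA3"},
}

extra_avl, not_avl, diff_impl = diff_features(icc, gcc)
print("extra:", sorted(extra_avl))
print("not available:", sorted(not_avl))
for name in sorted(diff_impl):
    print(name, "additionally implies", sorted(diff_impl[name]))
```

In the generated tables, the entries of ``diff_impl`` are the ones rendered in bold, and ``not_avl`` feeds the note listing unsupported features.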