| Commit message (Collapse) | Author | Age | Files | Lines |
| |
Combine (AND m (SRWconst x)) or (SRWconst (AND m x)) when the mask m
and the shift value produce a constant which can be encoded into an
RLWINM instruction.
Combine (CLRLSLDI (SRWconst x)) if combining the underlying rotate
masks produces a constant which can be encoded into RLWINM.
Likewise for (SLDconst (SRWconst x)) and (CLRLSLDI (RLWINM x)).
Combine rotate word + AND operations which can be encoded as a single
RLWINM/RLWNM instruction.
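As a rough illustration of the source-level shapes these rules target, the sketch below shows a rotate followed by a contiguous mask, and a right shift followed by a mask. The function names are hypothetical, and whether a given compiler version lowers each one to a single RLWINM is an assumption, not something this example verifies.

```go
package main

import (
	"fmt"
	"math/bits"
)

// rotateAndMask rotates a 32-bit word left by 5 and keeps bits 4..11.
// A rotate followed by a contiguous mask is the shape that a single
// rotate-and-mask instruction can express.
func rotateAndMask(x uint32) uint32 {
	return bits.RotateLeft32(x, 5) & 0x00000FF0
}

// shiftAndMask extracts the second-lowest byte of x: a logical right
// shift followed by an AND with a contiguous mask.
func shiftAndMask(x uint32) uint32 {
	return (x >> 8) & 0xFF
}

func main() {
	fmt.Printf("%#x\n", rotateAndMask(0x12345678))
	fmt.Printf("%#x\n", shiftAndMask(0x12345678))
}
```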
The most notable performance improvements arise from the crypto
benchmarks below (GOPPC64=power8 on ppc64le/linux):
pkg:golang.org/x/crypto/blowfish goos:linux goarch:ppc64le
ExpandKeyWithSalt 52.2µs ± 0% 47.5µs ± 0% -8.88%
ExpandKey 44.4µs ± 0% 40.3µs ± 0% -9.15%
pkg:golang.org/x/crypto/ssh/internal/bcrypt_pbkdf goos:linux goarch:ppc64le
Key 57.6ms ± 0% 52.3ms ± 0% -9.13%
pkg:golang.org/x/crypto/bcrypt goos:linux goarch:ppc64le
Equal 90.9ms ± 0% 82.6ms ± 0% -9.13%
DefaultCost 91.0ms ± 0% 82.7ms ± 0% -9.12%
Change-Id: I59a0ca29face38f4ab46e37124c32906f216c4ce
Reviewed-on: https://go-review.googlesource.com/c/go/+/260798
Run-TryBot: Carlos Eduardo Seo <carlos.seo@linaro.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Carlos Eduardo Seo <carlos.seo@linaro.com>
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
| |
Backstop support for non-sse2 chips now that 387 is gone.
RELNOTE=yes
Change-Id: Ib10e69c4a3654c15a03568f93393437e1939e013
Reviewed-on: https://go-review.googlesource.com/c/go/+/260017
Trust: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
| |
My last 387 CL. So sad ... ... ... ... not!
Fixes #40255
Change-Id: I8d4ddb744b234b8adc735db2f7c3c7b6d8bbdfa4
Reviewed-on: https://go-review.googlesource.com/c/go/+/258957
Trust: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
| |
A recent change to improve shifts was generating some
invalid cases when the rule was based on an AND. The
extended mnemonics CLRLSLDI and CLRLSLWI only allow
certain values for the operands and in the mask case
those values were not being checked properly. This
adds a check to those rules to verify that the
'b' and 'n' values used when an AND was part of the rule
have correct values.
There was a bug in some diag messages in asm9. The
message expected 3 values but only provided 2. Those are
corrected here also.
The test/codegen/shift.go was updated to add a few more
cases to check for the case mentioned here.
Some of the comments that mention the order of operands
in these extended mnemonics were wrong and those have been
corrected.
Fixes #41683.
Change-Id: If5bb860acaa5051b9e0cd80784b2868b85898c31
Reviewed-on: https://go-review.googlesource.com/c/go/+/258138
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Paul Murphy <murp@ibm.com>
Reviewed-by: Carlos Eduardo Seo <carlos.seo@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
| |
This adds support for the extswsli instruction which combines
extsw followed by a shift.
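A minimal sketch of Go code with this shape: a 32-bit signed value is sign-extended to 64 bits and then shifted left by a constant. The function name is hypothetical, and whether this exact function compiles to extswsli depends on the target CPU level and compiler version.

```go
package main

import "fmt"

// scaleIndex sign-extends a 32-bit index to 64 bits and multiplies it
// by 8 via a left shift, e.g. to form a byte offset into 8-byte elements.
func scaleIndex(i int32) int64 {
	return int64(i) << 3
}

func main() {
	fmt.Println(scaleIndex(5), scaleIndex(-2))
}
```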
New benchmark demonstrates the improvement:
name old time/op new time/op delta
ExtShift 1.34µs ± 0% 1.30µs ± 0% -3.15% (p=0.057 n=4+3)
Change-Id: I21b410676fdf15d20e0cbbaa75d7c6dcd3bbb7b0
Reviewed-on: https://go-review.googlesource.com/c/go/+/257017
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <carlos.seo@gmail.com>
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
| |
This change adds rules to find pairs of instructions that can
be combined into a single shift. These instruction sequences
are common in array addressing within loops. Improvements can
be seen in many crypto packages and the hash packages.
These are based on the extended mnemonics found in the ISA
sections C.8.1 and C.8.2.
Some rules in PPC64.rules were moved because the ordering prevented
some matching.
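The sketch below shows the kind of loop where such pairs tend to appear: a byte-sized index is zero-extended and scaled by the element size to address a table. The table contents and function names are hypothetical, and the claim that this particular loop benefits from the new rules is an assumption.

```go
package main

import "fmt"

// makeTable builds a small lookup table for the example. The contents
// are arbitrary; a real CRC table would be derived from a polynomial.
func makeTable() *[256]uint32 {
	var tab [256]uint32
	for i := range tab {
		tab[i] = uint32(i) * 3
	}
	return &tab
}

// update folds data into crc one byte at a time. Each iteration clears
// all but the low 8 bits of the index and scales it by 4 to address
// a uint32 element, which is a clear-then-shift pair.
func update(tab *[256]uint32, crc uint32, data []byte) uint32 {
	for _, b := range data {
		crc = tab[byte(crc)^b] ^ (crc >> 8)
	}
	return crc
}

func main() {
	fmt.Println(update(makeTable(), 0, []byte{1, 2}))
}
```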
The following results were generated on power9.
hash/crc32:
CRC32/poly=Koopman/size=40/align=0 195ns ± 0% 163ns ± 0% -16.41%
CRC32/poly=Koopman/size=40/align=1 200ns ± 0% 163ns ± 0% -18.50%
CRC32/poly=Koopman/size=512/align=0 1.98µs ± 0% 1.67µs ± 0% -15.46%
CRC32/poly=Koopman/size=512/align=1 1.98µs ± 0% 1.69µs ± 0% -14.80%
CRC32/poly=Koopman/size=1kB/align=0 3.90µs ± 0% 3.31µs ± 0% -15.27%
CRC32/poly=Koopman/size=1kB/align=1 3.85µs ± 0% 3.31µs ± 0% -14.15%
CRC32/poly=Koopman/size=4kB/align=0 15.3µs ± 0% 13.1µs ± 0% -14.22%
CRC32/poly=Koopman/size=4kB/align=1 15.4µs ± 0% 13.1µs ± 0% -14.79%
CRC32/poly=Koopman/size=32kB/align=0 137µs ± 0% 105µs ± 0% -23.56%
CRC32/poly=Koopman/size=32kB/align=1 137µs ± 0% 105µs ± 0% -23.53%
crypto/rc4:
RC4_128 733ns ± 0% 650ns ± 0% -11.32% (p=1.000 n=1+1)
RC4_1K 5.80µs ± 0% 5.17µs ± 0% -10.89% (p=1.000 n=1+1)
RC4_8K 45.7µs ± 0% 40.8µs ± 0% -10.73% (p=1.000 n=1+1)
crypto/sha1:
Hash8Bytes 635ns ± 0% 613ns ± 0% -3.46% (p=1.000 n=1+1)
Hash320Bytes 2.30µs ± 0% 2.18µs ± 0% -5.38% (p=1.000 n=1+1)
Hash1K 5.88µs ± 0% 5.38µs ± 0% -8.62% (p=1.000 n=1+1)
Hash8K 42.0µs ± 0% 37.9µs ± 0% -9.75% (p=1.000 n=1+1)
There are other improvements found in golang.org/x/crypto which are all in the
range of 5-15%.
Change-Id: I193471fbcf674151ffe2edab212799d9b08dfb8c
Reviewed-on: https://go-review.googlesource.com/c/go/+/252097
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
| |
The order array was zero initialized by the compiler, but ends up being
overwritten by the runtime anyway.
So let the runtime take full responsibility for initialization, saving
one instruction per select.
Fixes #40399
Change-Id: Iec1eca27ad7180d4fcb3cc9ef97348206b7fe6b8
Reviewed-on: https://go-review.googlesource.com/c/go/+/251517
Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
| |
This merges an lis + subf into subfic, and for 32b constants
lwa + subf into oris + ori + subf.
The carry bit is no longer used in code generation, therefore
I think we can clobber it as needed. Note, lowered borrow/carry
arithmetic is self-contained and thus is not affected.
A few extra rules are added to ensure early transformations to
SUBFCconst don't trip up earlier rules, fold constant operations,
or otherwise simplify lowering. Likewise, tests are added to
ensure all rules are hit. Generic constant folding catches
trivial cases, however some lowering rules insert arithmetic
which can introduce new opportunities (e.g. BitLen or Slicemask).
I couldn't find a specific benchmark to demonstrate noteworthy
improvements, but this is generating subfic in many of the default
bent test binaries, so we are at least saving a little code space.
Change-Id: Iad7c6e5767eaa9dc24dc1c989bd1c8cfe1982012
Reviewed-on: https://go-review.googlesource.com/c/go/+/249461
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
| |
With this patch, the opt pass can expose more obvious constant-folding
opportunities.
Example:
func test(i int) int {return (i+8)-(i+4)}
The previous version:
MOVD "".i(FP), R0
ADD $8, R0, R1
ADD $4, R0, R0
SUB R0, R1, R0
MOVD R0, "".~r1+8(FP)
RET (R30)
The optimized version:
MOVD $4, R0
MOVD R0, "".~r1+8(FP)
RET (R30)
This patch removes some existing reassociation rules, such as "x+(z-C)",
because the current generic rewrite rules will canonicalize "x-const"
to "x+(-const)", making "x+(z-C)" equal to "x+(z+(-C))".
This patch also adds test cases.
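The example from this message can be checked at the source level. The sketch below uses int64 rather than int so that it behaves the same on every platform, and the function name is hypothetical. Because Go integer arithmetic wraps, the result is 4 even when the intermediate sums overflow, which is what makes the fold valid.

```go
package main

import "fmt"

// diff mirrors the example above: the variable term cancels, so the
// whole expression can be folded to the constant 4.
func diff(i int64) int64 {
	return (i + 8) - (i + 4)
}

func main() {
	fmt.Println(diff(0), diff(-100), diff(1<<62))
}
```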
Change-Id: I857108ba0b5fcc18a879eeab38e2551bc4277797
Reviewed-on: https://go-review.googlesource.com/c/go/+/237137
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
| |
Updates #21439
Change-Id: I0fbcde6e0c2fc368fe686b271670f9d8be4a7900
Reviewed-on: https://go-review.googlesource.com/c/go/+/247557
Run-TryBot: Agniva De Sarker <agniva.quicksilver@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Richard Musiol <neelance@gmail.com>
| |
This patch fuses pattern '(MVN (XOR x y))' into '(EON x y)'.
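In Go source, the fused pattern corresponds to the bitwise complement of an XOR. A minimal sketch, with a hypothetical function name:

```go
package main

import "fmt"

// xnor returns the bitwise complement of x XOR y, i.e. a 1 bit
// wherever x and y agree.
func xnor(x, y uint64) uint64 {
	return ^(x ^ y)
}

func main() {
	fmt.Printf("%#x\n", xnor(0xF0, 0x0F))
}
```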
Change-Id: I269c98ce198d51a4945ce8bd0e1024acbd1b7609
Reviewed-on: https://go-review.googlesource.com/c/go/+/239638
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
| |
Add a new lowering rule to match and replace such instances
with the MADDLD instruction available on power9 where
possible.
Likewise, this plumbs in a new ppc64 ssa opcode to house
the newly generated MADDLD instructions.
When testing ed25519, this reduced binary size by 936B.
Similarly, MADDLD combination occurs in a few other less
obvious cases such as division by constant.
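MADDLD is a multiply-add that keeps the low 64 bits of the result, so the obvious source-level shape is a 64-bit multiply whose result feeds an add. The sketch below is only an illustration with a hypothetical function name; it does not check which instruction the compiler actually emits.

```go
package main

import "fmt"

// mulAdd computes x*y + z in 64-bit arithmetic, keeping the low
// 64 bits of the result as Go's wrapping semantics require.
func mulAdd(x, y, z int64) int64 {
	return x*y + z
}

func main() {
	fmt.Println(mulAdd(3, 4, 5))
}
```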
Testing of golang.org/x/crypto/ed25519 shows non-trivial
speedup during key generation:
name old time/op new time/op delta
KeyGeneration 65.2µs ± 0% 63.1µs ± 0% -3.19%
Signing 64.3µs ± 0% 64.4µs ± 0% +0.16%
Verification 147µs ± 0% 147µs ± 0% +0.11%
Similarly, this test binary has shrunk by 66488B.
Change-Id: I077aeda7943119b41f07e4e62e44a648f16e4ad0
Reviewed-on: https://go-review.googlesource.com/c/go/+/248723
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
| |
Some of the existing optimizations aren't triggered because they
are handled by the generic rules so this CL removes them. Also
some constraints were copied without much thought from the amd64
rules and they don't make sense on s390x, so we remove those
constraints.
Finally, add a 'multiply by the sum of two powers of two'
optimization. This makes sense on s390x as shifts are low latency
and can also sometimes be optimized further (especially if we add
support for RISBG instructions).
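The arithmetic behind the 'sum of two powers of two' rule can be written out by hand. In the sketch below, 12 = 8 + 4 and 65538 = 65536 + 2, so each multiplication becomes two shifts and an add. The function names are hypothetical and the code shows the identity, not the compiler's actual rewrite.

```go
package main

import "fmt"

// mul12 computes x*12 as x*8 + x*4.
func mul12(x int64) int64 {
	return x<<3 + x<<2
}

// mul65538 computes x*65538 as x*65536 + x*2.
func mul65538(x int64) int64 {
	return x<<16 + x<<1
}

func main() {
	fmt.Println(mul12(7), mul65538(3))
}
```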
name old time/op new time/op delta
IntMulByConst/3-8 1.70ns ±11% 1.10ns ± 5% -35.26% (p=0.000 n=10+10)
IntMulByConst/5-8 1.64ns ± 7% 1.10ns ± 4% -32.94% (p=0.000 n=10+9)
IntMulByConst/12-8 1.65ns ± 6% 1.20ns ± 4% -27.16% (p=0.000 n=10+9)
IntMulByConst/120-8 1.66ns ± 4% 1.22ns ±13% -26.43% (p=0.000 n=10+10)
IntMulByConst/-120-8 1.65ns ± 7% 1.19ns ± 4% -28.06% (p=0.000 n=9+10)
IntMulByConst/65537-8 0.86ns ± 9% 1.12ns ±12% +30.41% (p=0.000 n=10+10)
IntMulByConst/65538-8 1.65ns ± 5% 1.23ns ± 5% -25.11% (p=0.000 n=10+10)
Change-Id: Ib196e6bff1e97febfd266134d0a2b2a62897989f
Reviewed-on: https://go-review.googlesource.com/c/go/+/248937
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
| |
For an unsigned integer, it's useful to convert an order test against
0 or 1 into an equality test against 0. This saves a comparison
instruction that is followed by a conditional branch on arm64, since
arm64 supports compare-with-zero-and-branch instructions. For example,
if x > 0 { ... } else { ... }
the original version:
CMP $0, R0
BLS 9
the optimized version:
CBZ R0, 8
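The equivalences that justify this rewrite are easy to state in Go. For an unsigned x, 'x > 0' and 'x >= 1' both mean 'x != 0', while 'x < 1' and 'x <= 0' both mean 'x == 0'. The helper names below are hypothetical.

```go
package main

import "fmt"

// orderTests evaluates the four order comparisons against 0 and 1.
func orderTests(x uint64) (gt0, ge1, lt1, le0 bool) {
	return x > 0, x >= 1, x < 1, x <= 0
}

// agrees reports whether the order tests match the equality tests
// they can be replaced with.
func agrees(x uint64) bool {
	gt0, ge1, lt1, le0 := orderTests(x)
	nonZero := x != 0
	return gt0 == nonZero && ge1 == nonZero && lt1 == !nonZero && le0 == !nonZero
}

func main() {
	for _, x := range []uint64{0, 1, 2, 1 << 63} {
		fmt.Println(x, agrees(x))
	}
}
```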
Updates #21439
Change-Id: Id1de6f865f6aa72c5d45b29f7894818857288425
Reviewed-on: https://go-review.googlesource.com/c/go/+/246857
Reviewed-by: Keith Randall <khr@golang.org>
| |
If the AND has other uses, we end up saving an argument to the AND
in another register, so we can use it for the TEST. No point in doing that.
Change-Id: I73444a6aeddd6f55e2328ce04d77c3e6cf4a83e0
Reviewed-on: https://go-review.googlesource.com/c/go/+/241280
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
| |
There are some architecture-independent rules in #21439, since an
unsigned integer >= 0 is always true and < 0 is always false. This CL
adds these optimizations to generic rules.
Updates #21439
Change-Id: Iec7e3040b761ecb1e60908f764815fdd9bc62495
Reviewed-on: https://go-review.googlesource.com/c/go/+/246617
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
| |
They were missed as part of the refactoring to use a separate
addressing modes pass.
Fixes #40426
Change-Id: Ie0418b2fac4ba1ffe720644ac918f6d728d5e420
Reviewed-on: https://go-review.googlesource.com/c/go/+/244859
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
| |
Some ARM rewriting rules convert 'comparing to zero' conditions of if
statements to a simplified version utilizing CMN and CMP instructions to
branch over condition flags, in order to save one Add or Sub calculation.
Such optimizations lead to wrong branching in case an overflow/underflow
occurs when executing CMN or CMP.
Fix the issue by introducing new block opcodes that don't honor the
overflow/underflow flag:
    Block-Op  Meaning                ARM condition codes
 1. LTnoov    less than              MI
 2. GEnoov    greater than or equal  PL
 3. LEnoov    less than or equal     MI || EQ
 4. GTnoov    greater than           NEQ & PL
The patch also adds a few test cases to cover scenarios that are specific
to ARM and fine-tunes the code generation tests for 'x-const'.
For more details please refer to the previous fix on 64-bit ARM:
https://go-review.googlesource.com/c/go/+/233097
Go1 perf. 'old' is the non-optimized version, that is, with all of the
affected rewriting rules removed.
name old time/op new time/op delta
BinaryTree17-8 7.73s ± 0% 7.81s ± 0% +0.97% (p=0.000 n=7+8)
Fannkuch11-8 7.06s ± 0% 7.00s ± 0% -0.83% (p=0.000 n=8+8)
FmtFprintfEmpty-8 181ns ± 1% 183ns ± 1% +1.31% (p=0.001 n=8+8)
FmtFprintfString-8 319ns ± 1% 325ns ± 2% +1.71% (p=0.009 n=7+8)
FmtFprintfInt-8 358ns ± 1% 359ns ± 1% ~ (p=0.293 n=7+7)
FmtFprintfIntInt-8 459ns ± 3% 456ns ± 1% ~ (p=0.869 n=8+8)
FmtFprintfPrefixedInt-8 535ns ± 4% 538ns ± 4% ~ (p=0.572 n=8+8)
FmtFprintfFloat-8 1.01µs ± 2% 1.01µs ± 2% ~ (p=0.625 n=8+8)
FmtManyArgs-8 1.93µs ± 2% 1.93µs ± 1% ~ (p=0.979 n=8+7)
GobDecode-8 16.1ms ± 1% 16.5ms ± 1% +2.32% (p=0.000 n=8+8)
GobEncode-8 15.9ms ± 0% 15.8ms ± 1% -1.00% (p=0.000 n=8+7)
Gzip-8 690ms ± 1% 670ms ± 0% -2.90% (p=0.000 n=8+8)
Gunzip-8 109ms ± 1% 109ms ± 1% ~ (p=0.694 n=7+8)
HTTPClientServer-8 149µs ± 3% 146µs ± 2% -1.70% (p=0.028 n=8+8)
JSONEncode-8 50.5ms ± 1% 49.2ms ± 0% -2.60% (p=0.001 n=7+7)
JSONDecode-8 135ms ± 2% 137ms ± 1% ~ (p=0.054 n=8+7)
Mandelbrot200-8 951ms ± 0% 952ms ± 0% ~ (p=0.852 n=6+8)
GoParse-8 9.47ms ± 1% 9.66ms ± 1% +2.01% (p=0.000 n=8+8)
RegexpMatchEasy0_32-8 288ns ± 2% 277ns ± 2% -3.61% (p=0.000 n=8+8)
RegexpMatchEasy0_1K-8 1.66µs ± 1% 1.69µs ± 2% +2.21% (p=0.001 n=7+7)
RegexpMatchEasy1_32-8 334ns ± 1% 305ns ± 2% -8.86% (p=0.000 n=8+8)
RegexpMatchEasy1_1K-8 2.14µs ± 2% 2.15µs ± 0% ~ (p=0.099 n=8+8)
RegexpMatchMedium_32-8 13.3ns ± 1% 13.3ns ± 0% ~ (p=1.000 n=7+7)
RegexpMatchMedium_1K-8 81.1µs ± 3% 80.7µs ± 1% ~ (p=0.955 n=7+8)
RegexpMatchHard_32-8 4.26µs ± 0% 4.26µs ± 0% ~ (p=0.933 n=7+8)
RegexpMatchHard_1K-8 124µs ± 0% 124µs ± 0% +0.31% (p=0.000 n=8+8)
Revcomp-8 14.7ms ± 2% 14.5ms ± 1% -1.66% (p=0.003 n=8+8)
Template-8 197ms ± 2% 200ms ± 3% +1.62% (p=0.021 n=8+8)
TimeParse-8 1.33µs ± 1% 1.30µs ± 1% -1.86% (p=0.002 n=8+8)
TimeFormat-8 3.04µs ± 1% 3.02µs ± 0% -0.60% (p=0.000 n=8+8)
name old speed new speed delta
GobDecode-8 47.6MB/s ± 1% 46.5MB/s ± 1% -2.28% (p=0.000 n=8+8)
GobEncode-8 48.1MB/s ± 0% 48.6MB/s ± 1% +1.02% (p=0.000 n=8+7)
Gzip-8 28.1MB/s ± 1% 29.0MB/s ± 0% +2.97% (p=0.000 n=8+8)
Gunzip-8 178MB/s ± 1% 179MB/s ± 2% ~ (p=0.694 n=7+8)
JSONEncode-8 38.4MB/s ± 1% 39.4MB/s ± 0% +2.67% (p=0.001 n=7+7)
JSONDecode-8 14.3MB/s ± 2% 14.2MB/s ± 1% -0.81% (p=0.043 n=8+7)
GoParse-8 6.12MB/s ± 1% 5.99MB/s ± 1% -2.00% (p=0.000 n=8+8)
RegexpMatchEasy0_32-8 111MB/s ± 2% 115MB/s ± 2% +3.77% (p=0.000 n=8+8)
RegexpMatchEasy0_1K-8 618MB/s ± 1% 604MB/s ± 2% -2.16% (p=0.001 n=7+7)
RegexpMatchEasy1_32-8 95.7MB/s ± 1% 105.1MB/s ± 2% +9.76% (p=0.000 n=8+8)
RegexpMatchEasy1_1K-8 479MB/s ± 2% 477MB/s ± 0% ~ (p=0.105 n=8+8)
RegexpMatchMedium_32-8 75.2MB/s ± 1% 75.2MB/s ± 0% ~ (p=0.247 n=7+7)
RegexpMatchMedium_1K-8 12.6MB/s ± 3% 12.7MB/s ± 1% ~ (p=0.538 n=7+8)
RegexpMatchHard_32-8 7.52MB/s ± 0% 7.52MB/s ± 0% ~ (p=0.968 n=7+8)
RegexpMatchHard_1K-8 8.26MB/s ± 0% 8.24MB/s ± 0% -0.30% (p=0.001 n=8+8)
Revcomp-8 173MB/s ± 2% 176MB/s ± 1% +1.68% (p=0.003 n=8+8)
Template-8 9.85MB/s ± 2% 9.69MB/s ± 3% -1.59% (p=0.021 n=8+8)
Fixes #39303
Updates #38740
Change-Id: I0a5f87bfda679f66414c0041ace2ca2e28363f36
Reviewed-on: https://go-review.googlesource.com/c/go/+/236637
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
| |
Some ARM64 rewriting rules convert 'comparing to zero' conditions of if
statements to a simplified version utilizing CMN and CMP instructions to
branch over condition flags, in order to save one Add or Sub calculation.
Such optimizations lead to wrong branching in case an overflow/underflow
occurs when executing CMN or CMP.
Fix the issue by introducing new block opcodes that don't honor the
overflow/underflow flag, in the following categories:
    Block-Op  Meaning                ARM condition codes
 1. LTnoov    less than              MI
 2. GEnoov    greater than or equal  PL
 3. LEnoov    less than or equal     MI || EQ
 4. GTnoov    greater than           NEQ & PL
The backend generates two consecutive branch instructions for 'LEnoov'
and 'GTnoov' to model their expected behavior. A slight change to 'gc'
and amd64/386 backends is made to unify the code generation.
Add a test, 'TestCondRewrite', as justification. It covers 32 incorrect rules
identified on arm64; more might be needed on other arches, like 32-bit arm.
Add two benchmarks that profile the aforementioned categories 1&2 and
categories 3&4 separately. We expect the first two categories to show a
performance improvement and the second two not to show a visible regression
compared with the non-optimized version.
This change also updates TestFormats to support using %#x.
Examples showing where the issue comes from:
1: 'if x + 3 < 0' might be converted to:
before:
CMN $3, R0
BGE <else branch> // wrong branch is taken if 'x+3' overflows
after:
CMN $3, R0
BPL <else branch>
2: 'if y - 3 > 0' might be converted to:
before:
CMP $3, R0
BLE <else branch> // wrong branch is taken if 'y-3' underflows
after:
CMP $3, R0
BMI <else branch>
BEQ <else branch>
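The first example can be reproduced at the Go level. Go defines 'x + 3' as wrapping, so for x close to the maximum int64 the sum is negative and 'x + 3 < 0' is true. A signed condition such as GE after CMN instead reflects the mathematically exact sum, which the sketch below models as a direct comparison of x against -3. Both function names are hypothetical, and the second one is only a model of the flag behaviour, not generated code.

```go
package main

import "fmt"

// wrappedTest is what the Go source asks for: add with wraparound,
// then test the sign of the wrapped result.
func wrappedTest(x int64) bool {
	return x+3 < 0
}

// exactTest models a signed compare that accounts for overflow:
// it answers whether the exact (non-wrapped) sum x+3 is negative.
func exactTest(x int64) bool {
	return x < -3
}

func main() {
	const nearMax = 1<<63 - 2 // max int64 minus 1
	fmt.Println(wrappedTest(nearMax), exactTest(nearMax))
	fmt.Println(wrappedTest(-10), exactTest(-10))
}
```

The two functions agree whenever the addition does not overflow and disagree when it does, which is the case the 'noov' block opcodes exist to handle.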
Benchmark data from different kinds of arm64 servers. 'old' is the non-optimized
version (not the parent commit); the optimized version generally outperforms it.
S1:
name old time/op new time/op delta
CondRewrite/SoloJump 13.6ns ± 0% 12.9ns ± 0% -5.15% (p=0.000 n=10+10)
CondRewrite/CombJump 13.8ns ± 1% 12.9ns ± 0% -6.32% (p=0.000 n=10+10)
S2:
name old time/op new time/op delta
CondRewrite/SoloJump 11.6ns ± 0% 10.9ns ± 0% -6.03% (p=0.000 n=10+10)
CondRewrite/CombJump 11.4ns ± 0% 10.8ns ± 1% -5.53% (p=0.000 n=10+10)
S3:
name old time/op new time/op delta
CondRewrite/SoloJump 7.36ns ± 0% 7.50ns ± 0% +1.79% (p=0.000 n=9+10)
CondRewrite/CombJump 7.35ns ± 0% 7.75ns ± 0% +5.51% (p=0.000 n=8+9)
S4:
name old time/op new time/op delta
CondRewrite/SoloJump-224 11.5ns ± 1% 10.9ns ± 0% -4.97% (p=0.000 n=10+10)
CondRewrite/CombJump-224 11.9ns ± 0% 11.5ns ± 0% -2.95% (p=0.000 n=10+10)
S5:
name old time/op new time/op delta
CondRewrite/SoloJump 10.0ns ± 0% 10.0ns ± 0% -0.45% (p=0.000 n=9+10)
CondRewrite/CombJump 9.93ns ± 0% 9.77ns ± 0% -1.53% (p=0.000 n=10+9)
Go1 perf. data:
name old time/op new time/op delta
BinaryTree17 6.29s ± 1% 6.30s ± 1% ~ (p=1.000 n=5+5)
Fannkuch11 5.40s ± 0% 5.40s ± 0% ~ (p=0.841 n=5+5)
FmtFprintfEmpty 97.9ns ± 0% 98.9ns ± 3% ~ (p=0.937 n=4+5)
FmtFprintfString 171ns ± 3% 171ns ± 2% ~ (p=0.754 n=5+5)
FmtFprintfInt 212ns ± 0% 217ns ± 6% +2.55% (p=0.008 n=5+5)
FmtFprintfIntInt 296ns ± 1% 297ns ± 2% ~ (p=0.516 n=5+5)
FmtFprintfPrefixedInt 371ns ± 2% 374ns ± 7% ~ (p=1.000 n=5+5)
FmtFprintfFloat 435ns ± 1% 439ns ± 2% ~ (p=0.056 n=5+5)
FmtManyArgs 1.37µs ± 1% 1.36µs ± 1% ~ (p=0.730 n=5+5)
GobDecode 14.6ms ± 4% 14.4ms ± 4% ~ (p=0.690 n=5+5)
GobEncode 11.8ms ±20% 11.6ms ±15% ~ (p=1.000 n=5+5)
Gzip 507ms ± 0% 491ms ± 0% -3.22% (p=0.008 n=5+5)
Gunzip 73.8ms ± 0% 73.9ms ± 0% ~ (p=0.690 n=5+5)
HTTPClientServer 116µs ± 0% 116µs ± 0% ~ (p=0.686 n=4+4)
JSONEncode 21.8ms ± 1% 21.6ms ± 2% ~ (p=0.151 n=5+5)
JSONDecode 104ms ± 1% 103ms ± 1% -1.08% (p=0.016 n=5+5)
Mandelbrot200 9.53ms ± 0% 9.53ms ± 0% ~ (p=0.421 n=5+5)
GoParse 7.55ms ± 1% 7.51ms ± 1% ~ (p=0.151 n=5+5)
RegexpMatchEasy0_32 158ns ± 0% 158ns ± 0% ~ (all equal)
RegexpMatchEasy0_1K 606ns ± 1% 608ns ± 3% ~ (p=0.937 n=5+5)
RegexpMatchEasy1_32 143ns ± 0% 144ns ± 1% ~ (p=0.095 n=5+4)
RegexpMatchEasy1_1K 927ns ± 2% 944ns ± 2% ~ (p=0.056 n=5+5)
RegexpMatchMedium_32 16.0ns ± 0% 16.0ns ± 0% ~ (all equal)
RegexpMatchMedium_1K 69.3µs ± 2% 69.7µs ± 0% ~ (p=0.690 n=5+5)
RegexpMatchHard_32 3.73µs ± 0% 3.73µs ± 1% ~ (p=0.984 n=5+5)
RegexpMatchHard_1K 111µs ± 1% 110µs ± 0% ~ (p=0.151 n=5+5)
Revcomp 1.91s ±47% 1.77s ±68% ~ (p=1.000 n=5+5)
Template 138ms ± 1% 138ms ± 1% ~ (p=1.000 n=5+5)
TimeParse 787ns ± 2% 785ns ± 1% ~ (p=0.540 n=5+5)
TimeFormat 729ns ± 1% 726ns ± 1% ~ (p=0.151 n=5+5)
Updates #38740
Change-Id: I06c604874acdc1e63e66452dadee5df053045222
Reviewed-on: https://go-review.googlesource.com/c/go/+/233097
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
| |
Taking over Zach's CL 212277. Just cleaned up and added a test.
For a positive, signed integer, an arithmetic right shift of count
(bit-width - 1) equals zero. e.g. int64(22) >> 63 -> 0. This CL makes
prove replace these right shifts with a zero-valued constant.
These shifts may arise in source code explicitly, but can also be
created by the generic rewrite of signed division by a power of 2.
// Signed divide by power of 2.
// n / c = n >> log(c) if n >= 0
// = (n+c-1) >> log(c) if n < 0
// We conditionally add c-1 by adding n>>63>>(64-log(c))
(first shift signed, second shift unsigned).
(Div64 <t> n (Const64 [c])) && isPowerOfTwo(c) ->
(Rsh64x64
(Add64 <t> n (Rsh64Ux64 <t>
(Rsh64x64 <t> n (Const64 <typ.UInt64> [63]))
(Const64 <typ.UInt64> [64-log2(c)])))
(Const64 <typ.UInt64> [log2(c)]))
If n is known to be positive, this rewrite includes an extra Add and 2
extra Rsh. This CL will allow prove to replace one of the extra Rsh with
a 0. That replacement then allows lateopt to remove all the unnecessary
fixups from the generic rewrite.
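The generic rewrite quoted above can be written out in Go to see where the extra operations come from. This is a sketch with hypothetical names, where k stands for log2(c) and is assumed to be between 1 and 62. For non-negative n the fixup term is always zero, which is the fact prove can now exploit.

```go
package main

import "fmt"

// fixup returns c-1 when n is negative and 0 otherwise, computed as
// n>>63 (arithmetic) followed by a logical shift right by 64-k.
func fixup(n int64, k uint) int64 {
	return int64(uint64(n>>63) >> (64 - k))
}

// divPow2 computes n / (1<<k) with truncation toward zero using only
// shifts and an add, following the rule in the commit message.
func divPow2(n int64, k uint) int64 {
	return (n + fixup(n, k)) >> k
}

func main() {
	fmt.Println(divPow2(-7, 2), -7/4)
	fmt.Println(divPow2(22, 3), 22/8)
	fmt.Println(fixup(22, 3)) // zero for non-negative n
}
```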
There is a rewrite rule to handle this case directly:
(Div64 n (Const64 [c])) && isNonNegative(n) && isPowerOfTwo(c) ->
(Rsh64Ux64 n (Const64 <typ.UInt64> [log2(c)]))
But this implementation of isNonNegative really only handles constants
and a few special operations like len/cap. The division could be
handled if the factsTable version of isNonNegative were available.
Unfortunately, the first opt pass happens before prove even has a
chance to deduce the numerator is non-negative, so the generic rewrite
has already fired and created the extra Ops discussed above.
Fixes #36159
By Printf count, this zeroes 137 right shifts when building std and cmd.
Change-Id: Iab486910ac9d7cfb86ace2835456002732b384a2
Reviewed-on: https://go-review.googlesource.com/c/go/+/232857
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
| |
match:
m = make([]T, x); copy(m, s)
for pointer free T and x==len(s) rewrite to:
m = mallocgc(x*elemsize(T), nil, false); memmove(&m, &s, x*elemsize(T))
otherwise rewrite to:
m = makeslicecopy([]T, x, s)
This avoids memclear and shading of pointers in the newly created slice
before the copy.
With this CL "s" is only allowed to be a variable and not a more
complex expression. This restriction could be lifted in future versions
of this optimization when it can be proven that "s" is not referencing "m".
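A minimal sketch of the source pattern being matched: a slice is made and immediately filled by copy from a plain variable. The function name is hypothetical. The element type here is pointer-free and the length equals len(s), which corresponds to the first rewrite described above.

```go
package main

import "fmt"

// clone allocates a new slice and copies s into it. The make and the
// copy are adjacent and s is a simple variable, matching the pattern.
func clone(s []int) []int {
	m := make([]int, len(s))
	copy(m, s)
	return m
}

func main() {
	orig := []int{1, 2, 3}
	c := clone(orig)
	c[0] = 99
	fmt.Println(orig, c)
}
```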
Triggers 450 times during make.bash.
Reduces go binary size by ~8 kbyte.
name old time/op new time/op delta
MakeSliceCopy/mallocmove/Byte 71.1ns ± 1% 65.8ns ± 0% -7.49% (p=0.000 n=10+9)
MakeSliceCopy/mallocmove/Int 71.2ns ± 1% 66.0ns ± 0% -7.27% (p=0.000 n=10+8)
MakeSliceCopy/mallocmove/Ptr 104ns ± 4% 99ns ± 1% -5.13% (p=0.000 n=10+10)
MakeSliceCopy/makecopy/Byte 70.3ns ± 0% 68.0ns ± 0% -3.22% (p=0.000 n=10+9)
MakeSliceCopy/makecopy/Int 70.3ns ± 0% 68.5ns ± 1% -2.59% (p=0.000 n=9+10)
MakeSliceCopy/makecopy/Ptr 102ns ± 0% 99ns ± 1% -2.97% (p=0.000 n=9+9)
MakeSliceCopy/nilappend/Byte 75.4ns ± 0% 74.9ns ± 2% -0.63% (p=0.015 n=9+9)
MakeSliceCopy/nilappend/Int 75.6ns ± 0% 76.4ns ± 3% ~ (p=0.245 n=9+10)
MakeSliceCopy/nilappend/Ptr 107ns ± 0% 108ns ± 1% +0.93% (p=0.005 n=9+10)
Fixes #26252
Change-Id: Iec553dd1fef6ded16197216a472351c8799a8e71
Reviewed-on: https://go-review.googlesource.com/c/go/+/146719
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
| |
name old time/op new time/op delta
Modify-16 404ns ± 1% 365ns ± 1% -9.73% (p=0.000 n=10+10)
ConstModify-16 407ns ± 0% 385ns ± 2% -5.56% (p=0.000 n=9+10)
Seems to generally help generated code.
Binary size change is in the noise.
Change-Id: I57891bfaf0f7dfc5d143bb9f7ebafc7079d2614f
Reviewed-on: https://go-review.googlesource.com/c/go/+/228098
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
| |
name old time/op new time/op delta
LoadAdd-16 545ns ± 0% 456ns ± 0% -16.31% (p=0.000 n=10+10)
Update #36468
Change-Id: I84f390d55490648fa1f58cdbc24fd74c4f1bc8c1
Reviewed-on: https://go-review.googlesource.com/c/go/+/227960
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
| |
We set up static symbols during walk that
we later make copies of to initialize local variables.
It is difficult to ascertain at that time exactly
when copying a symbol is profitable vs locally
initializing an autotmp.
During SSA, we are much better placed to optimize.
This change recognizes when we are copying from a
global readonly all-zero symbol and replaces it with
direct zeroing.
This often allows the all-zero symbol to be
deadcode eliminated at link time.
This is not ideal--it makes for large object files,
and longer link times--but it is the cleanest fix I could find.
This makes the final binary for the program in #38554
shrink from >500mb to ~2.2mb.
It also shrinks the standard binaries:
file before after Δ %
addr2line 4412496 4404304 -8192 -0.186%
buildid 2893816 2889720 -4096 -0.142%
cgo 4841048 4832856 -8192 -0.169%
compile 19926480 19922432 -4048 -0.020%
cover 5281816 5277720 -4096 -0.078%
link 6734648 6730552 -4096 -0.061%
nm 4366240 4358048 -8192 -0.188%
objdump 4755968 4747776 -8192 -0.172%
pprof 14653060 14612100 -40960 -0.280%
trace 11805940 11777268 -28672 -0.243%
vet 7185560 7181416 -4144 -0.058%
total 113588440 113465560 -122880 -0.108%
And not just by removing unnecessary symbols;
the program text shrinks a bit as well.
Fixes #38554
Change-Id: I8381ae6084ae145a5e0cd9410c451e52c0dc51c8
Reviewed-on: https://go-review.googlesource.com/c/go/+/229704
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
| |
Triggers a handful of times in std+cmd.
Change-Id: I9bb8ce9a5f8bae2547cb61157cd8f256e1b63e76
Reviewed-on: https://go-review.googlesource.com/c/go/+/229602
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
| |
This CL optimizes code that uses a carry from a function such as
bits.Add64 as the condition in an if statement. For example:
x, c := bits.Add64(a, b, 0)
if c != 0 {
panic("overflow")
}
Rather than converting the carry into a 0 or a 1 value and using
that as an input to a comparison instruction the carry flag is now
used as the input to a conditional branch directly. This typically
removes an ADD LOGICAL WITH CARRY instruction when user code is
doing overflow detection and is closer to the code that a user
would expect to generate.
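A self-contained version of the pattern, returning a boolean instead of panicking so that it is easy to exercise. The function name is hypothetical; bits.Add64 is the standard library function named in the message.

```go
package main

import (
	"fmt"
	"math/bits"
)

// addOverflows reports whether a+b does not fit in 64 bits by
// branching on the carry-out of bits.Add64.
func addOverflows(a, b uint64) bool {
	_, c := bits.Add64(a, b, 0)
	if c != 0 {
		return true
	}
	return false
}

func main() {
	fmt.Println(addOverflows(1, 2))
	fmt.Println(addOverflows(^uint64(0), 1))
}
```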
Change-Id: I950431270955ab72f1b5c6db873b6abe769be0da
Reviewed-on: https://go-review.googlesource.com/c/go/+/219757
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
| |
When generating code for unsigned equals (==) and not equals (!=)
comparisons we currently, on s390x, always use signed comparisons.
This mostly works well, however signed comparisons on s390x sign
extend their immediates and unsigned comparisons zero extend them.
For compare-and-branch instructions which can only have 8-bit
immediates this significantly changes the range of immediate values
we can represent: [-128, 127] for signed comparisons and [0, 255]
for unsigned comparisons.
When generating equals and not equals checks we don't need to worry
about whether the comparison is signed or unsigned. This CL
therefore adds rules to allow us to switch signedness for such
comparisons if it means that it brings a constant into range for an
8-bit immediate.
For example, a signed equals with an integer in the range [128, 255]
will now be implemented using an unsigned compare-and-branch
instruction rather than separate compare and branch instructions.
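The reasoning can be checked with a small sketch: equality gives the same answer whether both operands are viewed as signed or unsigned, while the 8-bit immediate ranges differ. All names below are hypothetical helpers for illustration, not compiler functions.

```go
package main

import "fmt"

// fitsInt8 and fitsUint8 model the immediate ranges quoted above.
func fitsInt8(c int64) bool  { return c >= -128 && c <= 127 }
func fitsUint8(c int64) bool { return c >= 0 && c <= 255 }

// equalSigned and equalUnsigned compare the same bit patterns under
// the two interpretations; for equality they always agree.
func equalSigned(x, c int64) bool   { return x == c }
func equalUnsigned(x, c int64) bool { return uint64(x) == uint64(c) }

func main() {
	const c = 200
	fmt.Println(fitsInt8(c), fitsUint8(c))
	fmt.Println(equalSigned(200, c), equalUnsigned(200, c))
	fmt.Println(equalSigned(-56, c), equalUnsigned(-56, c))
}
```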
As part of this change I've also added support for adding a name
to block control values using the same `x:(...)` syntax we use for
value rules.
Triggers 792 times when compiling cmd and std.
Change-Id: I77fa80a128f0a8ce51a2888d1e384bd5e9b61a77
Reviewed-on: https://go-review.googlesource.com/c/go/+/228642
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
| |
Change-Id: Ife4e065246729319c39e57a4fbd8e6f7b37724e1
GitHub-Last-Rev: e71803eaeb366c00f6c156de0b0b2c50927a0e82
GitHub-Pull-Request: golang/go#38527
Reviewed-on: https://go-review.googlesource.com/c/go/+/228901
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Tobias Klauser <tobias.klauser@gmail.com>
| |
This optimization works on any integer with exactly one bit set.
This is identical to being a power of two, except for the
most negative number. Use oneBit instead.
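The difference between the two predicates shows up only at the most negative value. The sketch below is an assumption about what such helpers look like, written for illustration; the names mirror the ones in the message but the bodies are not taken from the compiler source.

```go
package main

import "fmt"

// oneBit reports whether exactly one bit of x is set.
func oneBit(x int64) bool {
	return x&(x-1) == 0 && x != 0
}

// isPowerOfTwo reports whether x is a positive power of two.
func isPowerOfTwo(x int64) bool {
	return x > 0 && x&(x-1) == 0
}

func main() {
	const minInt64 = -1 << 63
	fmt.Println(oneBit(64), isPowerOfTwo(64))
	fmt.Println(oneBit(minInt64), isPowerOfTwo(minInt64))
}
```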
The rule now triggers in a few more places in std+cmd,
in packages encoding/asn1, crypto/elliptic, and
vendor/golang.org/x/crypto/cryptobyte.
This change obviates the need for CL 222479
by doing this optimization consistently in the compiler.
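The difference between the two predicates can be sketched in Go as
follows. These definitions are written for illustration and are an
assumption; they are not copied from the compiler's helpers.

```go
package main

import (
	"fmt"
	"math"
)

// isPowerOfTwo reports whether x is a positive power of two.
func isPowerOfTwo(x int64) bool { return x > 0 && x&(x-1) == 0 }

// oneBit reports whether exactly one bit of x is set.
func oneBit(x int64) bool { return x != 0 && x&(x-1) == 0 }

func main() {
	// The two predicates differ only for math.MinInt64 (sign bit only).
	for _, x := range []int64{0, 1, 6, 64, math.MinInt64} {
		fmt.Println(x, isPowerOfTwo(x), oneBit(x))
	}
}
```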
Change-Id: I983c6235290fdc634fda5e11b10f1f8ce041272f
Reviewed-on: https://go-review.googlesource.com/c/go/+/229124
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Benchmarking suggests that the combo instruction is notably slower,
at least in the places where we measure.
Updates #37955
Change-Id: I829f1975dd6edf38163128ba51d84604055512f4
Reviewed-on: https://go-review.googlesource.com/c/go/+/228157
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This changes the code generated for variable length shift
counts to use isel instead of instructions that set and
read the carry flag.
This shortens the code generated for such shifts by 1 instruction
and avoids the instructions that set and read the carry flag.
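As a rough illustration of a select-based lowering (isel is the PowerPC
integer select instruction), the Go sketch below computes the shift on a
masked count and then selects zero when the count is out of range. The
function shiftLeft is hypothetical and models only Go's shift semantics,
not the exact instruction sequence the compiler emits.

```go
package main

import "fmt"

// shiftLeft models a compare-and-select lowering of x << s for a
// variable count: shift by the masked count, then pick either that
// result or zero depending on whether the count is in range.
func shiftLeft(x, s uint64) uint64 {
	shifted := x << (s & 63) // shift by the count masked to 6 bits
	if s < 64 {              // the comparison feeding the select
		return shifted
	}
	return 0 // Go defines shifts by >= the operand width to yield 0
}

func main() {
	for _, s := range []uint64{0, 3, 63, 64, 200} {
		fmt.Println(s, shiftLeft(1, s), uint64(1)<<s)
	}
}
```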
This sequence can be found in strconv with these results
on power9:
Atof64Decimal 71.6ns ± 0% 68.3ns ± 0% -4.61%
Atof64Float 95.3ns ± 0% 90.9ns ± 0% -4.62%
Atof64FloatExp 153ns ± 0% 149ns ± 0% -2.61%
Atof64Big 234ns ± 0% 232ns ± 0% -0.85%
Atof64RandomBits 348ns ± 0% 369ns ± 0% +6.03%
Atof64RandomFloats 262ns ± 0% 262ns ± 0% ~
Atof32Decimal 72.0ns ± 0% 68.2ns ± 0% -5.28%
Atof32Float 92.1ns ± 0% 87.1ns ± 0% -5.43%
Atof32FloatExp 159ns ± 0% 158ns ± 0% -0.63%
Atof32Random 194ns ± 0% 191ns ± 0% -1.55%
Some tests in codegen/shift.go are enabled to verify the
expected instructions are generated.
Change-Id: I968715d10ada405a8c46132bf19b8ed9b85796d1
Reviewed-on: https://go-review.googlesource.com/c/go/+/227337
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Prior to this change, the shortcircuit pass could only
handle blocks containing only a single phi control value,
possibly wrapped in some OpNot and OpCopy values.
This change partially lifts this limitation.
It handles some cases in which the block contains other phi values.
This appears to happen most commonly in cases in which
the conditionals being checked involve the memory state,
in which case there is a phi memory value in the block.
The general idea here is to use the information we have about
the CFG to (1) move the other phi values into other blocks
and/or (2) rewrite uses of the other phi values in other blocks.
For example, consider this CFG:
  p   q
   \ /
    b
   / \
  t   u
And consider a phi value v in block b.
We'll write v = Phi(p: x, q: y) to say that v has value x corresponding
to inbound block p, and value y for block q.
We will rewrite this CFG to:
  p   q
  |  /
  | b
  |/ \
  t   u
What should we do with v?
Any uses of v in u can be replaced with y. Why?
If we are in block u, we came from b, and before that from q.
If prior to b we came from p, then we would have gone to t, not u.
Since we came from q, we know that v took the value y.
Uses of v in t are a bit more complicated.
It is going to end up being a phi value: Phi(p: ?, b: ?).
Suppose, after the rewrite, we came from block p.
Then, before the rewrite, we would have gone to b,
where v would have the value x.
So we have Phi(p: x, b: ?).
Suppose, after the rewrite, we came from block b.
Then we must have come from block q.
If we come from block q, v has value y.
So we have Phi(p: x, b: y).
Uses of v in t can thus be replaced with a new phi value,
with the same values as v, but with altered predecessors.
Similar reasoning can be employed to rewrite or replace
other uses of v elsewhere in the CFG, so that v itself can be eliminated,
and the CFG rewrite can proceed.
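The two replacements derived above can be written down as a small Go
sketch. The phi type and the rewrite function are hypothetical names
used only to restate the argument; they do not correspond to the data
structures in the shortcircuit pass.

```go
package main

import "fmt"

// phi maps the name of each predecessor block to the value that
// flows in along that edge.
type phi map[string]string

// rewrite takes v = Phi(p: x, q: y) from block b, for the CFG rewrite
// that redirects p's edge from b straight to t. It returns the value
// that replaces uses of v in u, and the new phi that replaces uses of
// v in t.
func rewrite(v phi) (inU string, inT phi) {
	// Reaching u implies we came through b from q, so v had q's value.
	inU = v["q"]
	// t is now reached from p (v had p's value) or from b, which is
	// only reached from q (v had q's value).
	inT = phi{"p": v["p"], "b": v["q"]}
	return inU, inT
}

func main() {
	inU, inT := rewrite(phi{"p": "x", "q": "y"})
	fmt.Println("uses in u:", inU)
	fmt.Println("uses in t: Phi(p:", inT["p"]+", b:", inT["b"]+")")
}
```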
This change sets up the infrastructure for such optimizations
and adds a few cheap ones. All optimizations in this change depend
only on the shape of the CFG; future changes may also depend on where
v's uses are. That analysis is more powerful but more expensive,
and should be done incrementally.
The use of closures here is perhaps a bit unusual,
but during development it proved critical to having readable code.
We must decide early on whether we can safely do the CFG modifications,
and then later fix up the phis if so.
Safely storing state and decisions across these two phases is hard to do readably.
Closures solve the problem neatly.
I manually instrumented the code paths in shortcircuitPhiPlan.
During make.bash there are nearly 6000 invocations.
The least-visited code path gets run 85 times,
so all the code in this CL is reasonably well-exercised.
Here is a concrete example of code improved by this change:
func f(e interface{}) int {
	if x, ok := e.(int); ok {
		return x
	}
	return 0
}
Omitting PCDATA, FUNCDATA, and the like, it used to compile to:
"".f STEXT nosplit size=50 args=0x18 locals=0x0
0x0000 00000 (x.go:4) LEAQ type.int(SB), AX
0x0007 00007 (x.go:4) MOVQ "".e+8(SP), CX
0x000c 00012 (x.go:4) CMPQ AX, CX
0x000f 00015 (x.go:4) JNE 43
0x0011 00017 (x.go:4) MOVQ "".e+16(SP), AX
0x0016 00022 (x.go:4) MOVQ (AX), AX
0x0019 00025 (x.go:4) JNE 33
0x001b 00027 (x.go:5) MOVQ AX, "".~r1+24(SP)
0x0020 00032 (x.go:5) RET
0x0021 00033 (x.go:7) MOVQ $0, "".~r1+24(SP)
0x002a 00042 (x.go:7) RET
0x002b 00043 (x.go:7) MOVL $0, AX
0x0030 00048 (x.go:4) JMP 25
Afterwards, it compiles to:
"".f STEXT nosplit size=41 args=0x18 locals=0x0
0x0000 00000 (x.go:4) LEAQ type.int(SB), AX
0x0007 00007 (x.go:4) MOVQ "".e+8(SP), CX
0x000c 00012 (x.go:4) CMPQ AX, CX
0x000f 00015 (x.go:4) JNE 31
0x0011 00017 (x.go:4) MOVQ "".e+16(SP), AX
0x0016 00022 (x.go:4) MOVQ (AX), AX
0x0019 00025 (x.go:5) MOVQ AX, "".~r1+24(SP)
0x001e 00030 (x.go:5) RET
0x001f 00031 (x.go:7) MOVQ $0, "".~r1+24(SP)
0x0028 00040 (x.go:7) RET
Note that there is now only a single JNE and a single RET $0 path.
Updates #37608
Has a minor positive effect on compilation speed and memory use.
Provides widespread improvements to generated code.
The rare, minor regressions I have investigated are due to
register allocation fluctuations.
file before after Δ %
addr2line 4376080 4371984 -4096 -0.094%
api 5945400 5933112 -12288 -0.207%
asm 5034312 5030216 -4096 -0.081%
buildid 2844952 2840856 -4096 -0.144%
cgo 4812872 4804680 -8192 -0.170%
compile 19622064 19610368 -11696 -0.060%
cover 5236648 5232552 -4096 -0.078%
dist 3658312 3654216 -4096 -0.112%
doc 4653512 4649416 -4096 -0.088%
fix 3370072 3365976 -4096 -0.122%
link 6671864 6667768 -4096 -0.061%
pprof 14781652 14761172 -20480 -0.139%
trace 11639684 11627396 -12288 -0.106%
vet 8252280 8231800 -20480 -0.248%
total 115052984 114934792 -118192 -0.103%
file before after Δ %
internal/cpu.s 3298 3296 -2 -0.061%
internal/bytealg.s 1730 1737 +7 +0.405%
cmd/vendor/golang.org/x/mod/semver.s 7332 7283 -49 -0.668%
image/color.s 8248 8156 -92 -1.115%
math.s 35966 35956 -10 -0.028%
math/cmplx.s 6596 6575 -21 -0.318%
runtime.s 480566 480053 -513 -0.107%
sync.s 16408 16385 -23 -0.140%
math/rand.s 10447 10406 -41 -0.392%
internal/reflectlite.s 28408 28366 -42 -0.148%
errors.s 2736 2701 -35 -1.279%
sort.s 17031 17036 +5 +0.029%
io.s 16993 16964 -29 -0.171%
container/heap.s 2006 1997 -9 -0.449%
text/tabwriter.s 9570 9552 -18 -0.188%
bytes.s 31823 31594 -229 -0.720%
strconv.s 52760 52717 -43 -0.082%
vendor/golang.org/x/text/transform.s 16713 16706 -7 -0.042%
strings.s 42590 42563 -27 -0.063%
bufio.s 22883 22785 -98 -0.428%
encoding/base32.s 9586 9531 -55 -0.574%
syscall.s 82237 82243 +6 +0.007%
image.s 37465 37452 -13 -0.035%
regexp/syntax.s 82827 82769 -58 -0.070%
image/draw.s 18698 18584 -114 -0.610%
image/jpeg.s 36560 36549 -11 -0.030%
time.s 82557 82526 -31 -0.038%
context.s 10863 10820 -43 -0.396%
regexp.s 64114 64049 -65 -0.101%
os.s 51751 51524 -227 -0.439%
reflect.s 168240 168049 -191 -0.114%
cmd/go/internal/lockedfile/internal/filelock.s 2317 2290 -27 -1.165%
path/filepath.s 17831 17766 -65 -0.365%
io/ioutil.s 6994 6990 -4 -0.057%
encoding/binary.s 30791 30726 -65 -0.211%
cmd/vendor/golang.org/x/sys/unix.s 78055 78033 -22 -0.028%
encoding/pem.s 9280 9247 -33 -0.356%
crypto/cipher.s 20376 20374 -2 -0.010%
os/exec.s 29229 29140 -89 -0.304%
internal/goroot.s 4588 4579 -9 -0.196%
cmd/internal/browser.s 2246 2240 -6 -0.267%
cmd/vendor/golang.org/x/crypto/ssh/terminal.s 27183 27149 -34 -0.125%
fmt.s 76625 76484 -141 -0.184%
encoding/hex.s 6154 6152 -2 -0.032%
compress/lzw.s 7063 7059 -4 -0.057%
database/sql/driver.s 18875 18862 -13 -0.069%
debug/plan9obj.s 8268 8266 -2 -0.024%
net/url.s 29724 29719 -5 -0.017%
encoding/csv.s 12872 12856 -16 -0.124%
debug/gosym.s 25303 25268 -35 -0.138%
compress/flate.s 50952 51019 +67 +0.131%
compress/zlib.s 7277 7266 -11 -0.151%
archive/zip.s 42155 42111 -44 -0.104%
debug/dwarf.s 107632 107541 -91 -0.085%
database/sql.s 98373 98028 -345 -0.351%
os/user.s 14722 14708 -14 -0.095%
encoding/json.s 105836 105711 -125 -0.118%
debug/macho.s 32598 32560 -38 -0.117%
encoding/gob.s 136478 135755 -723 -0.530%
debug/pe.s 31160 30869 -291 -0.934%
debug/elf.s 63495 63302 -193 -0.304%
vendor/golang.org/x/text/unicode/bidi.s 27220 27217 -3 -0.011%
vendor/golang.org/x/text/secure/bidirule.s 3363 3352 -11 -0.327%
go/token.s 12036 12035 -1 -0.008%
flag.s 22277 22256 -21 -0.094%
mime.s 39696 39509 -187 -0.471%
go/scanner.s 19033 19020 -13 -0.068%
archive/tar.s 70936 70581 -355 -0.500%
internal/xcoff.s 22823 22820 -3 -0.013%
text/scanner.s 11631 11629 -2 -0.017%
encoding/xml.s 110534 110408 -126 -0.114%
math/big.s 183636 183545 -91 -0.050%
image/gif.s 27376 27343 -33 -0.121%
crypto/dsa.s 6029 5969 -60 -0.995%
image/png.s 42947 42939 -8 -0.019%
crypto/rand.s 6866 6854 -12 -0.175%
vendor/golang.org/x/text/unicode/norm.s 66394 66354 -40 -0.060%
runtime/trace.s 2603 2521 -82 -3.150%
crypto/ed25519.s 6321 6300 -21 -0.332%
text/template/parse.s 93910 93844 -66 -0.070%
crypto/rsa.s 31460 31369 -91 -0.289%
encoding/asn1.s 57021 57023 +2 +0.004%
crypto/elliptic.s 51382 51363 -19 -0.037%
crypto/x509/pkix.s 10386 10342 -44 -0.424%
vendor/golang.org/x/net/idna.s 24482 24466 -16 -0.065%
vendor/golang.org/x/crypto/cryptobyte.s 33479 33280 -199 -0.594%
crypto/ecdsa.s 11936 11883 -53 -0.444%
go/constant.s 43670 42663 -1007 -2.306%
go/ast.s 80383 80191 -192 -0.239%
testing.s 68069 68057 -12 -0.018%
runtime/pprof.s 59613 59603 -10 -0.017%
testing/iotest.s 4895 4891 -4 -0.082%
internal/trace.s 78136 78089 -47 -0.060%
cmd/internal/goobj2.s 13158 13154 -4 -0.030%
cmd/internal/src.s 17661 17657 -4 -0.023%
go/parser.s 79046 78880 -166 -0.210%
cmd/internal/objabi.s 16367 16343 -24 -0.147%
text/template.s 94899 94486 -413 -0.435%
go/printer.s 77267 76992 -275 -0.356%
cmd/internal/goobj.s 25988 25947 -41 -0.158%
runtime/pprof/internal/profile.s 102066 101933 -133 -0.130%
go/format.s 5419 5371 -48 -0.886%
cmd/vendor/golang.org/x/arch/ppc64/ppc64asm.s 37181 37149 -32 -0.086%
go/doc.s 74533 74132 -401 -0.538%
html/template.s 88743 88389 -354 -0.399%
cmd/asm/internal/lex.s 24881 24872 -9 -0.036%
cmd/internal/buildid.s 18263 18256 -7 -0.038%
cmd/vendor/golang.org/x/arch/x86/x86asm.s 80036 79980 -56 -0.070%
go/build.s 68905 68737 -168 -0.244%
cmd/cover.s 46070 45950 -120 -0.260%
cmd/internal/obj.s 117001 116991 -10 -0.009%
cmd/doc.s 62700 62419 -281 -0.448%
cmd/internal/obj/arm.s 66745 66687 -58 -0.087%
cmd/compile/internal/syntax.s 145406 145062 -344 -0.237%
cmd/internal/obj/wasm.s 44049 44027 -22 -0.050%
net.s 291835 291020 -815 -0.279%
cmd/dist.s 209020 208807 -213 -0.102%
cmd/cgo.s 241564 241102 -462 -0.191%
vendor/golang.org/x/net/http/httpproxy.s 9407 9399 -8 -0.085%
log/syslog.s 7921 7909 -12 -0.151%
go/types.s 319325 317513 -1812 -0.567%
vendor/golang.org/x/net/http/httpguts.s 3834 3825 -9 -0.235%
mime/multipart.s 21414 21343 -71 -0.332%
cmd/internal/obj/ppc64.s 119949 119938 -11 -0.009%
cmd/compile/internal/logopt.s 10158 10118 -40 -0.394%
vendor/golang.org/x/net/nettest.s 28012 27991 -21 -0.075%
go/internal/srcimporter.s 6405 6380 -25 -0.390%
go/internal/gcimporter.s 34525 34493 -32 -0.093%
net/mail.s 23937 23720 -217 -0.907%
go/internal/gccgoimporter.s 56095 56038 -57 -0.102%
cmd/compile/internal/types.s 47247 47207 -40 -0.085%
cmd/api.s 39582 39558 -24 -0.061%
cmd/go/internal/base.s 12572 12551 -21 -0.167%
cmd/vendor/golang.org/x/xerrors.s 17846 17814 -32 -0.179%
cmd/vendor/golang.org/x/mod/sumdb/note.s 18142 18070 -72 -0.397%
cmd/go/internal/search.s 19994 19876 -118 -0.590%
cmd/go/internal/imports.s 16457 16428 -29 -0.176%
cmd/vendor/golang.org/x/mod/module.s 17838 17759 -79 -0.443%
cmd/go/internal/cache.s 30551 30514 -37 -0.121%
cmd/vendor/golang.org/x/mod/sumdb/tlog.s 36356 36321 -35 -0.096%
cmd/internal/test2json.s 9452 9408 -44 -0.466%
cmd/go/internal/mvs.s 25136 25092 -44 -0.175%
cmd/go/internal/txtar.s 3488 3461 -27 -0.774%
cmd/vendor/golang.org/x/mod/zip.s 18811 18800 -11 -0.058%
cmd/go/internal/version.s 11213 11171 -42 -0.375%
cmd/link/internal/benchmark.s 4941 4949 +8 +0.162%
cmd/internal/obj/s390x.s 126865 126849 -16 -0.013%
cmd/gofmt.s 30684 30596 -88 -0.287%
cmd/fix.s 87450 86906 -544 -0.622%
cmd/internal/obj/x86.s 88578 88556 -22 -0.025%
cmd/vendor/golang.org/x/mod/modfile.s 72450 72363 -87 -0.120%
cmd/oldlink/internal/loader.s 16743 16741 -2 -0.012%
cmd/pack.s 14863 14861 -2 -0.013%
cmd/go/internal/load.s 106742 106568 -174 -0.163%
cmd/oldlink/internal/objfile.s 21787 21780 -7 -0.032%
cmd/oldlink/internal/loadmacho.s 29309 29317 +8 +0.027%
cmd/oldlink/internal/loadelf.s 35013 35021 +8 +0.023%
cmd/asm/internal/asm.s 68550 68538 -12 -0.018%
cmd/link/internal/loader.s 94765 94564 -201 -0.212%
cmd/link/internal/loadelf.s 35663 35667 +4 +0.011%
cmd/link/internal/loadmacho.s 29501 29509 +8 +0.027%
cmd/vendor/golang.org/x/tools/go/analysis.s 4983 4976 -7 -0.140%
cmd/vendor/golang.org/x/tools/go/analysis/internal/analysisflags.s 16771 16709 -62 -0.370%
cmd/vendor/golang.org/x/tools/go/types/objectpath.s 18481 18456 -25 -0.135%
cmd/vendor/golang.org/x/tools/go/analysis/passes/internal/analysisutil.s 2100 2085 -15 -0.714%
cmd/vendor/github.com/google/pprof/profile.s 150141 149620 -521 -0.347%
cmd/vendor/github.com/google/pprof/internal/measurement.s 10420 10404 -16 -0.154%
cmd/vendor/golang.org/x/tools/go/analysis/passes/asmdecl.s 36814 36755 -59 -0.160%
cmd/vendor/golang.org/x/tools/go/analysis/passes/bools.s 6688 6673 -15 -0.224%
cmd/vendor/golang.org/x/tools/go/analysis/passes/cgocall.s 9856 9784 -72 -0.731%
cmd/vendor/golang.org/x/tools/go/analysis/passes/composite.s 3011 2979 -32 -1.063%
cmd/vendor/golang.org/x/tools/go/analysis/passes/copylock.s 9737 9682 -55 -0.565%
cmd/vendor/golang.org/x/tools/go/cfg.s 30738 30725 -13 -0.042%
cmd/vendor/github.com/ianlancetaylor/demangle.s 175195 174513 -682 -0.389%
cmd/vendor/golang.org/x/tools/go/analysis/passes/httpresponse.s 3625 3520 -105 -2.897%
cmd/vendor/golang.org/x/tools/go/analysis/passes/loopclosure.s 2987 2971 -16 -0.536%
cmd/vendor/golang.org/x/tools/go/analysis/passes/shift.s 4372 4340 -32 -0.732%
cmd/vendor/golang.org/x/tools/go/analysis/passes/stdmethods.s 8634 8611 -23 -0.266%
cmd/vendor/golang.org/x/tools/go/analysis/passes/tests.s 6189 6164 -25 -0.404%
cmd/vendor/golang.org/x/tools/go/analysis/passes/structtag.s 8089 8073 -16 -0.198%
cmd/vendor/golang.org/x/tools/go/analysis/passes/unsafeptr.s 2208 2177 -31 -1.404%
cmd/vendor/golang.org/x/tools/go/analysis/passes/unreachable.s 8050 8047 -3 -0.037%
cmd/vendor/golang.org/x/tools/go/analysis/passes/unusedresult.s 3665 3629 -36 -0.982%
cmd/vendor/golang.org/x/tools/go/ast/astutil.s 65773 65680 -93 -0.141%
cmd/vendor/golang.org/x/tools/go/analysis/unitchecker.s 13328 13286 -42 -0.315%
cmd/vendor/golang.org/x/tools/go/types/typeutil.s 12263 12162 -101 -0.824%
cmd/vendor/golang.org/x/tools/go/analysis/passes/errorsas.s 1459 1421 -38 -2.605%
cmd/vendor/golang.org/x/tools/go/analysis/passes/ctrlflow.s 5208 5191 -17 -0.326%
cmd/vendor/golang.org/x/tools/go/analysis/passes/unmarshal.s 1801 1782 -19 -1.055%
cmd/vendor/golang.org/x/tools/go/analysis/passes/lostcancel.s 9569 9528 -41 -0.428%
cmd/go/internal/work.s 304928 304756 -172 -0.056%
crypto/x509.s 147340 147139 -201 -0.136%
cmd/vendor/golang.org/x/tools/go/analysis/passes/printf.s 34287 34019 -268 -0.782%
crypto/tls.s 311603 310644 -959 -0.308%
cmd/oldlink/internal/ld.s 533115 532651 -464 -0.087%
cmd/oldlink/internal/wasm.s 16484 16458 -26 -0.158%
cmd/oldlink/internal/x86.s 18832 18830 -2 -0.011%
cmd/link/internal/ld.s 548200 547626 -574 -0.105%
cmd/link/internal/wasm.s 16760 16734 -26 -0.155%
cmd/link/internal/arm64.s 20850 20840 -10 -0.048%
cmd/link/internal/x86.s 17437 17435 -2 -0.011%
net/http.s 556647 555519 -1128 -0.203%
net/http/cookiejar.s 15849 15833 -16 -0.101%
expvar.s 9521 9508 -13 -0.137%
net/http/httptest.s 16471 16452 -19 -0.115%
cmd/vendor/github.com/google/pprof/internal/plugin.s 4266 4264 -2 -0.047%
net/http/cgi.s 23448 23428 -20 -0.085%
cmd/go/internal/web.s 16472 16428 -44 -0.267%
net/http/httputil.s 39672 39670 -2 -0.005%
net/rpc.s 33989 33965 -24 -0.071%
net/http/fcgi.s 19167 19162 -5 -0.026%
cmd/vendor/github.com/google/pprof/internal/symbolz.s 5861 5857 -4 -0.068%
cmd/vendor/github.com/google/pprof/internal/binutils.s 35842 35823 -19 -0.053%
cmd/vendor/github.com/google/pprof/internal/symbolizer.s 11449 11404 -45 -0.393%
cmd/go/internal/get.s 62726 62582 -144 -0.230%
cmd/vendor/github.com/google/pprof/internal/report.s 80032 80022 -10 -0.012%
cmd/go/internal/modfetch/codehost.s 89005 88871 -134 -0.151%
cmd/trace.s 116607 116496 -111 -0.095%
cmd/vendor/github.com/google/pprof/internal/driver.s 143234 143207 -27 -0.019%
cmd/vendor/github.com/google/pprof/driver.s 9000 8998 -2 -0.022%
cmd/go/internal/modfetch.s 126300 125726 -574 -0.454%
cmd/pprof.s 12317 12312 -5 -0.041%
cmd/go/internal/modconv.s 17878 17861 -17 -0.095%
cmd/go/internal/modload.s 150261 149763 -498 -0.331%
cmd/go/internal/clean.s 11122 11091 -31 -0.279%
cmd/go/internal/help.s 6523 6521 -2 -0.031%
cmd/go/internal/generate.s 11627 11614 -13 -0.112%
cmd/go/internal/envcmd.s 22034 21986 -48 -0.218%
cmd/go/internal/modget.s 38478 38398 -80 -0.208%
cmd/go/internal/modcmd.s 46430 46229 -201 -0.433%
cmd/go/internal/test.s 64399 64374 -25 -0.039%
cmd/compile/internal/ssa.s 3615264 3608276 -6988 -0.193%
cmd/compile/internal/gc.s 1538865 1537625 -1240 -0.081%
cmd/compile/internal/amd64.s 33593 33574 -19 -0.057%
cmd/compile/internal/x86.s 30871 30852 -19 -0.062%
total 19343565 19311284 -32281 -0.167%
Change-Id: Ib030eb79458827a5a5b6d0d2f98765f8325a4d7e
Reviewed-on: https://go-review.googlesource.com/c/go/+/222923
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On s390x, some floating point arithmetic instructions (FSUB, FADD) set flags.
This patch allows the related SSA ops to return a tuple whose second element is
the generated flags value. We can then use the flags and remove the
subsequent comparison instruction (e.g. LTDBR).
This CL also reduces the .text section of the math.test binary by 0.4KB.
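A sketch of the kind of Go source where a subtraction is immediately
followed by a comparison of its result against zero is shown below. The
function diffSign is hypothetical, and whether the new rules fire for
this exact function has not been verified here; it only illustrates the
source-level shape of "arithmetic, then test the result".

```go
package main

import "fmt"

// diffSign subtracts y from x and then tests the difference against
// zero, so the comparison consumes the value the subtraction produced.
func diffSign(x, y float64) int {
	d := x - y
	if d < 0 {
		return -1
	}
	if d > 0 {
		return 1
	}
	return 0 // also reached when d is NaN
}

func main() {
	fmt.Println(diffSign(1, 2), diffSign(2, 1), diffSign(3, 3))
}
```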
Benchmarks:
name old time/op new time/op delta
Acos-18 12.1ns ± 0% 12.1ns ± 0% ~ (all equal)
Acosh-18 18.5ns ± 0% 18.5ns ± 0% ~ (all equal)
Asin-18 13.1ns ± 0% 13.1ns ± 0% ~ (all equal)
Asinh-18 19.4ns ± 0% 19.5ns ± 1% ~ (p=0.444 n=5+5)
Atan-18 10.0ns ± 0% 10.0ns ± 0% ~ (all equal)
Atanh-18 19.1ns ± 1% 19.2ns ± 2% ~ (p=0.841 n=5+5)
Atan2-18 16.4ns ± 0% 16.4ns ± 0% ~ (all equal)
Cbrt-18 14.8ns ± 0% 14.8ns ± 0% ~ (all equal)
Ceil-18 0.78ns ± 0% 0.78ns ± 0% ~ (all equal)
Copysign-18 0.80ns ± 0% 0.80ns ± 0% ~ (all equal)
Cos-18 7.19ns ± 0% 7.19ns ± 0% ~ (p=0.556 n=4+5)
Cosh-18 12.4ns ± 0% 12.4ns ± 0% ~ (all equal)
Erf-18 10.8ns ± 0% 10.8ns ± 0% ~ (all equal)
Erfc-18 11.0ns ± 0% 11.0ns ± 0% ~ (all equal)
Erfinv-18 23.0ns ±16% 26.8ns ± 1% +16.90% (p=0.008 n=5+5)
Erfcinv-18 23.3ns ±15% 26.1ns ± 7% ~ (p=0.087 n=5+5)
Exp-18 8.67ns ± 0% 8.67ns ± 0% ~ (p=1.000 n=4+4)
ExpGo-18 50.8ns ± 3% 52.4ns ± 2% ~ (p=0.063 n=5+5)
Expm1-18 9.49ns ± 1% 9.47ns ± 0% ~ (p=1.000 n=5+5)
Exp2-18 52.7ns ± 1% 50.5ns ± 3% -4.10% (p=0.024 n=5+5)
Exp2Go-18 50.6ns ± 1% 48.4ns ± 3% -4.39% (p=0.008 n=5+5)
Abs-18 0.67ns ± 0% 0.67ns ± 0% ~ (p=0.444 n=5+5)
Dim-18 1.02ns ± 0% 1.03ns ± 0% +0.98% (p=0.008 n=5+5)
Floor-18 0.78ns ± 0% 0.78ns ± 0% ~ (all equal)
Max-18 3.09ns ± 1% 3.05ns ± 0% -1.42% (p=0.008 n=5+5)
Min-18 3.32ns ± 1% 3.30ns ± 0% -0.72% (p=0.016 n=5+4)
Mod-18 62.3ns ± 1% 65.8ns ± 3% +5.55% (p=0.008 n=5+5)
Frexp-18 5.05ns ± 2% 4.98ns ± 0% ~ (p=0.683 n=5+5)
Gamma-18 24.4ns ± 0% 24.1ns ± 0% -1.23% (p=0.008 n=5+5)
Hypot-18 10.3ns ± 0% 10.3ns ± 0% ~ (all equal)
HypotGo-18 10.2ns ± 0% 10.2ns ± 0% ~ (all equal)
Ilogb-18 3.56ns ± 1% 3.54ns ± 0% ~ (p=0.595 n=5+5)
J0-18 113ns ± 0% 108ns ± 1% -4.42% (p=0.016 n=4+5)
J1-18 115ns ± 0% 109ns ± 1% -4.87% (p=0.016 n=4+5)
Jn-18 240ns ± 0% 230ns ± 2% -4.41% (p=0.008 n=5+5)
Ldexp-18 6.19ns ± 0% 6.19ns ± 0% ~ (p=0.444 n=5+5)
Lgamma-18 32.2ns ± 0% 32.2ns ± 0% ~ (all equal)
Log-18 13.1ns ± 0% 13.1ns ± 0% ~ (all equal)
Logb-18 4.23ns ± 0% 4.22ns ± 0% ~ (p=0.444 n=5+5)
Log1p-18 12.7ns ± 0% 12.7ns ± 0% ~ (all equal)
Log10-18 18.1ns ± 0% 18.2ns ± 0% ~ (p=0.167 n=5+5)
Log2-18 14.0ns ± 0% 14.0ns ± 0% ~ (all equal)
Modf-18 10.4ns ± 0% 10.5ns ± 0% +0.96% (p=0.016 n=4+5)
Nextafter32-18 11.3ns ± 0% 11.3ns ± 0% ~ (all equal)
Nextafter64-18 4.01ns ± 1% 3.97ns ± 0% ~ (p=0.333 n=5+4)
PowInt-18 32.7ns ± 0% 32.7ns ± 0% ~ (all equal)
PowFrac-18 33.2ns ± 0% 33.1ns ± 0% ~ (p=0.095 n=4+5)
Pow10Pos-18 1.58ns ± 0% 1.58ns ± 0% ~ (all equal)
Pow10Neg-18 5.81ns ± 0% 5.81ns ± 0% ~ (all equal)
Round-18 0.78ns ± 0% 0.78ns ± 0% ~ (all equal)
RoundToEven-18 0.78ns ± 0% 0.78ns ± 0% ~ (all equal)
Remainder-18 40.6ns ± 0% 40.7ns ± 0% ~ (p=0.238 n=5+4)
Signbit-18 1.57ns ± 0% 1.57ns ± 0% ~ (all equal)
Sin-18 6.75ns ± 0% 6.74ns ± 0% ~ (p=0.333 n=5+4)
Sincos-18 29.5ns ± 0% 29.5ns ± 0% ~ (all equal)
Sinh-18 14.4ns ± 0% 14.4ns ± 0% ~ (all equal)
SqrtIndirect-18 3.97ns ± 0% 4.15ns ± 0% +4.59% (p=0.008 n=5+5)
SqrtLatency-18 8.01ns ± 0% 8.01ns ± 0% ~ (all equal)
SqrtIndirectLatency-18 11.6ns ± 0% 11.6ns ± 0% ~ (all equal)
SqrtGoLatency-18 44.7ns ± 0% 45.0ns ± 0% +0.67% (p=0.008 n=5+5)
SqrtPrime-18 1.26µs ± 0% 1.27µs ± 0% +0.63% (p=0.029 n=4+4)
Tan-18 11.1ns ± 0% 11.1ns ± 0% ~ (all equal)
Tanh-18 15.8ns ± 0% 15.8ns ± 0% ~ (all equal)
Trunc-18 0.78ns ± 0% 0.78ns ± 0% ~ (all equal)
Y0-18 113ns ± 2% 108ns ± 3% -5.11% (p=0.008 n=5+5)
Y1-18 112ns ± 3% 107ns ± 0% -4.29% (p=0.000 n=5+4)
Yn-18 229ns ± 0% 220ns ± 1% -3.76% (p=0.016 n=4+5)
Float64bits-18 1.09ns ± 0% 1.09ns ± 0% ~ (all equal)
Float64frombits-18 0.55ns ± 0% 0.55ns ± 0% ~ (all equal)
Float32bits-18 0.96ns ±16% 0.86ns ± 0% ~ (p=0.563 n=5+5)
Float32frombits-18 1.03ns ±28% 0.84ns ± 0% ~ (p=0.167 n=5+5)
FMA-18 1.60ns ± 0% 1.60ns ± 0% ~ (all equal)
[Geo mean] 10.0ns 9.9ns -0.41%
Change-Id: Ief7e63ea5a8ba404b0a4696e12b9b7e0b05a9a03
Reviewed-on: https://go-review.googlesource.com/c/go/+/209160
Reviewed-by: Michael Munday <mike.munday@ibm.com>
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Extend CL 220417 (which removed the integer Greater and Geq ops) to
floating point comparisons. Greater and Geq can always be
implemented using Less and Leq with the operands swapped.
Fixes #37316.
Change-Id: Ieaddb4877dd0ff9037a1dd11d0a9a9e45ced71e7
Reviewed-on: https://go-review.googlesource.com/c/go/+/222397
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change includes the following:
- Generate LXV/STXV sequences instead of LXVD2X/STXVD2X on power9.
  These instructions do not require an index register, which
  allows more loads and stores within a loop without initializing
  multiple index registers. The LoweredQuadXXX ops generate LXV/STXV.
- Create LoweredMoveXXXShort and LoweredZeroXXXShort for short
  moves that don't generate loops, and therefore don't clobber the
  address registers or flags.
- Use registers other than R3 and R4 so that they do not conflict
  with registers that have already been allocated, which avoids
  unnecessary register moves.
- Eliminate the use of R14 as a scratch register and use R31
  instead.
- Add PCALIGN when the LoweredMoveXXX or LoweredZeroXXX generates a
  loop with more than 3 iterations.
This performance opportunity was noticed in github.com/golang/snappy
benchmarks. Results on power9:
WordsDecode1e1 54.1ns ± 0% 53.8ns ± 0% -0.51% (p=0.029 n=4+4)
WordsDecode1e2 287ns ± 0% 282ns ± 1% -1.83% (p=0.029 n=4+4)
WordsDecode1e3 3.98µs ± 0% 3.64µs ± 0% -8.52% (p=0.029 n=4+4)
WordsDecode1e4 66.9µs ± 0% 67.0µs ± 0% +0.20% (p=0.029 n=4+4)
WordsDecode1e5 723µs ± 0% 723µs ± 0% -0.01% (p=0.200 n=4+4)
WordsDecode1e6 7.21ms ± 0% 7.21ms ± 0% -0.02% (p=1.000 n=4+4)
WordsEncode1e1 29.9ns ± 0% 29.4ns ± 0% -1.51% (p=0.029 n=4+4)
WordsEncode1e2 2.12µs ± 0% 1.75µs ± 0% -17.70% (p=0.029 n=4+4)
WordsEncode1e3 11.7µs ± 0% 11.2µs ± 0% -4.61% (p=0.029 n=4+4)
WordsEncode1e4 119µs ± 0% 120µs ± 0% +0.36% (p=0.029 n=4+4)
WordsEncode1e5 1.21ms ± 0% 1.22ms ± 0% +0.41% (p=0.029 n=4+4)
WordsEncode1e6 12.0ms ± 0% 12.0ms ± 0% +0.57% (p=0.029 n=4+4)
RandomEncode 286µs ± 0% 203µs ± 0% -28.82% (p=0.029 n=4+4)
ExtendMatch 47.4µs ± 0% 47.0µs ± 0% -0.85% (p=0.029 n=4+4)
Change-Id: Iecad3a39ae55280286e42760a5c9d5c1168f5858
Reviewed-on: https://go-review.googlesource.com/c/go/+/226539
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before using some CPU instructions, we must check for their presence.
We use global variables in the runtime package to record features.
Prior to this CL, we issued a regular memory load for these features.
The downside to this is that, because it is a regular memory load,
it cannot be hoisted out of loops or otherwise reordered with other loads.
This CL introduces a new intrinsic just for checking cpu features.
It still ends up resulting in a memory load, but that memory load can
now be floated to the entry block and rematerialized as needed.
One downside is that the regular load could be combined with the comparison
into a CMPBconstload+NE. This new intrinsic cannot; it generates MOVB+TESTB+NE.
(It is possible that MOVBQZX+TESTQ+NE would be better.)
This CL covers only amd64. It is easy to extend to other architectures.
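The effect of hoisting a feature check out of a loop can be imitated by
hand in Go. In the sketch below, hasFMA is a hypothetical package-level
flag standing in for the runtime's feature variable, and dot is a
hypothetical function; the point is only that the flag is read once
before the loop instead of on every iteration.

```go
package main

import (
	"fmt"
	"math"
)

// hasFMA is a hypothetical stand-in for a runtime CPU feature flag.
var hasFMA = true

// dot computes a dot product over the first len(xs) elements,
// reading the feature flag once before the loop instead of on
// every iteration. It assumes len(ys) >= len(xs).
func dot(xs, ys []float64) float64 {
	useFMA := hasFMA // single load, done by hand outside the loop
	var acc float64
	for i := range xs {
		if useFMA {
			acc = math.FMA(xs[i], ys[i], acc)
		} else {
			acc += xs[i] * ys[i]
		}
	}
	return acc
}

func main() {
	fmt.Println(dot([]float64{1, 2, 3}, []float64{4, 5, 6}))
}
```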
For the benchmark in #36196, on my machine, this offers a mild speedup.
name old time/op new time/op delta
FMA-8 1.39ns ± 6% 1.29ns ± 9% -7.19% (p=0.000 n=97+96)
NonFMA-8 2.03ns ±11% 2.04ns ±12% ~ (p=0.618 n=99+98)
Updates #15808
Updates #36196
Change-Id: I75e2fcfcf5a6df1bdb80657a7143bed69fca6deb
Reviewed-on: https://go-review.googlesource.com/c/go/+/212360
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Things like CMPQ 4(AX)(BX*8), CX
Fixes #37955
Change-Id: Icbed430f65c91a0e3f38a633d8321d79433ad8b3
Reviewed-on: https://go-review.googlesource.com/c/go/+/224219
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The compiler-inserted write barrier calls use a special ABI
for speed and to minimize the binary size impact.
runtime.gcWriteBarrier takes its args in DI and AX.
This change adds gcWriteBarrier wrapper functions,
varying only in the register used for the second argument.
(Allowing variation in the first argument doesn't offer improvements,
which is convenient, as it avoids quadratic API growth.)
This reduces the number of register copies.
The goal is reduced binary size via reduced register pressure and fewer register copies.
One downside to this change is that when the write barrier is on,
we may bounce through several different write barrier wrappers,
which is bad for the instruction cache.
Package runtime write barrier benchmarks for this change:
name old time/op new time/op delta
WriteBarrier-8 16.6ns ± 6% 15.6ns ± 6% -5.73% (p=0.000 n=97+99)
BulkWriteBarrier-8 4.37ns ± 7% 4.22ns ± 8% -3.45% (p=0.000 n=96+99)
However, I don't particularly trust these numbers.
I ran runtime.BenchmarkWriteBarrier multiple times as I rebased
this change, and noticed that the results have high variance
depending on the parent change, perhaps due to alignment.
This change was stress tested with GOGC=1 GODEBUG=gccheckmark=1 go test std.
This change reduces binary sizes:
file before after Δ %
addr2line 4308720 4296688 -12032 -0.279%
api 5965592 5945368 -20224 -0.339%
asm 5148088 5025464 -122624 -2.382%
buildid 2848760 2844904 -3856 -0.135%
cgo 4828968 4812840 -16128 -0.334%
compile 19754720 19529744 -224976 -1.139%
cover 5256840 5236600 -20240 -0.385%
dist 3670312 3658264 -12048 -0.328%
doc 4669608 4657576 -12032 -0.258%
fix 3377976 3365944 -12032 -0.356%
link 6614888 6586472 -28416 -0.430%
nm 4258368 4254528 -3840 -0.090%
objdump 4656336 4644304 -12032 -0.258%
pack 2295176 2295432 +256 +0.011%
pprof 14762356 14709364 -52992 -0.359%
test2json 2824456 2820600 -3856 -0.137%
trace 11684404 11643700 -40704 -0.348%
vet 8284760 8252248 -32512 -0.392%
total 115210328 114580040 -630288 -0.547%
This change improves compiler performance:
name old time/op new time/op delta
Template 208ms ± 3% 207ms ± 3% -0.40% (p=0.030 n=43+44)
Unicode 80.2ms ± 3% 81.3ms ± 3% +1.25% (p=0.000 n=41+44)
GoTypes 699ms ± 3% 694ms ± 2% -0.71% (p=0.016 n=42+37)
Compiler 3.26s ± 2% 3.23s ± 2% -0.86% (p=0.000 n=43+45)
SSA 6.97s ± 1% 6.93s ± 1% -0.63% (p=0.000 n=43+45)
Flate 134ms ± 3% 133ms ± 2% ~ (p=0.139 n=45+42)
GoParser 165ms ± 2% 164ms ± 1% -0.79% (p=0.000 n=45+40)
Reflect 434ms ± 4% 435ms ± 4% ~ (p=0.937 n=44+44)
Tar 181ms ± 2% 181ms ± 2% ~ (p=0.702 n=43+45)
XML 244ms ± 2% 244ms ± 2% ~ (p=0.237 n=45+44)
[Geo mean] 403ms 402ms -0.29%
name old user-time/op new user-time/op delta
Template 271ms ± 2% 268ms ± 1% -1.40% (p=0.000 n=42+42)
Unicode 117ms ± 3% 116ms ± 5% ~ (p=0.066 n=45+45)
GoTypes 948ms ± 2% 936ms ± 2% -1.30% (p=0.000 n=41+40)
Compiler 4.26s ± 1% 4.21s ± 2% -1.25% (p=0.000 n=37+45)
SSA 9.52s ± 2% 9.41s ± 1% -1.18% (p=0.000 n=44+45)
Flate 167ms ± 2% 165ms ± 2% -1.15% (p=0.000 n=44+41)
GoParser 201ms ± 2% 198ms ± 1% -1.40% (p=0.000 n=43+43)
Reflect 563ms ± 8% 560ms ± 7% ~ (p=0.206 n=45+44)
Tar 224ms ± 2% 222ms ± 2% -0.81% (p=0.000 n=45+45)
XML 308ms ± 2% 304ms ± 1% -1.17% (p=0.000 n=42+43)
[Geo mean] 525ms 519ms -1.08%
name old alloc/op new alloc/op delta
Template 36.3MB ± 0% 36.3MB ± 0% ~ (p=0.421 n=5+5)
Unicode 28.4MB ± 0% 28.3MB ± 0% ~ (p=0.056 n=5+5)
GoTypes 121MB ± 0% 121MB ± 0% -0.14% (p=0.008 n=5+5)
Compiler 567MB ± 0% 567MB ± 0% -0.06% (p=0.016 n=4+5)
SSA 1.26GB ± 0% 1.26GB ± 0% -0.07% (p=0.008 n=5+5)
Flate 22.9MB ± 0% 22.8MB ± 0% ~ (p=0.310 n=5+5)
GoParser 28.0MB ± 0% 27.9MB ± 0% -0.09% (p=0.008 n=5+5)
Reflect 78.4MB ± 0% 78.4MB ± 0% -0.03% (p=0.008 n=5+5)
Tar 34.2MB ± 0% 34.2MB ± 0% -0.05% (p=0.008 n=5+5)
XML 44.4MB ± 0% 44.4MB ± 0% -0.04% (p=0.016 n=5+5)
[Geo mean] 76.4MB 76.3MB -0.05%
name old allocs/op new allocs/op delta
Template 356k ± 0% 356k ± 0% -0.13% (p=0.008 n=5+5)
Unicode 326k ± 0% 326k ± 0% -0.07% (p=0.008 n=5+5)
GoTypes 1.24M ± 0% 1.24M ± 0% -0.24% (p=0.008 n=5+5)
Compiler 5.30M ± 0% 5.28M ± 0% -0.34% (p=0.008 n=5+5)
SSA 11.9M ± 0% 11.9M ± 0% -0.16% (p=0.008 n=5+5)
Flate 226k ± 0% 225k ± 0% -0.12% (p=0.008 n=5+5)
GoParser 287k ± 0% 286k ± 0% -0.29% (p=0.008 n=5+5)
Reflect 930k ± 0% 929k ± 0% -0.05% (p=0.008 n=5+5)
Tar 332k ± 0% 331k ± 0% -0.12% (p=0.008 n=5+5)
XML 411k ± 0% 411k ± 0% -0.12% (p=0.008 n=5+5)
[Geo mean] 771k 770k -0.16%
For some packages, this change significantly reduces the size of executable text.
Examples:
file before after Δ %
cmd/internal/obj/arm.s 68658 66855 -1803 -2.626%
cmd/internal/obj/mips.s 57486 56272 -1214 -2.112%
cmd/internal/obj/arm64.s 152107 147163 -4944 -3.250%
cmd/internal/obj/ppc64.s 125544 120456 -5088 -4.053%
cmd/vendor/golang.org/x/tools/go/cfg.s 31699 30742 -957 -3.019%
Full listing:
file before after Δ %
container/ring.s 1890 1870 -20 -1.058%
container/list.s 5366 5390 +24 +0.447%
internal/cpu.s 3298 3295 -3 -0.091%
internal/testlog.s 1507 1501 -6 -0.398%
image/color.s 8281 8248 -33 -0.399%
runtime.s 480970 480075 -895 -0.186%
sync.s 16497 16408 -89 -0.539%
internal/singleflight.s 2591 2577 -14 -0.540%
math/rand.s 10456 10438 -18 -0.172%
cmd/go/internal/par.s 2801 2790 -11 -0.393%
internal/reflectlite.s 28477 28417 -60 -0.211%
errors.s 2750 2736 -14 -0.509%
internal/oserror.s 446 434 -12 -2.691%
sort.s 17061 17046 -15 -0.088%
io.s 17063 16999 -64 -0.375%
vendor/golang.org/x/crypto/hkdf.s 1962 1936 -26 -1.325%
text/tabwriter.s 9617 9574 -43 -0.447%
hash/crc64.s 3414 3408 -6 -0.176%
hash/crc32.s 6657 6651 -6 -0.090%
bytes.s 31932 31863 -69 -0.216%
strconv.s 53158 52799 -359 -0.675%
strings.s 42829 42665 -164 -0.383%
encoding/ascii85.s 4833 4791 -42 -0.869%
vendor/golang.org/x/text/transform.s 16810 16724 -86 -0.512%
path.s 6848 6845 -3 -0.044%
encoding/base32.s 9658 9592 -66 -0.683%
bufio.s 23051 22908 -143 -0.620%
compress/bzip2.s 11773 11764 -9 -0.076%
image.s 37565 37502 -63 -0.168%
syscall.s 82359 82279 -80 -0.097%
regexp/syntax.s 83573 82930 -643 -0.769%
image/jpeg.s 36535 36490 -45 -0.123%
regexp.s 64396 64214 -182 -0.283%
time.s 82724 82622 -102 -0.123%
plugin.s 6539 6536 -3 -0.046%
context.s 10959 10865 -94 -0.858%
internal/poll.s 24286 24270 -16 -0.066%
reflect.s 168304 167927 -377 -0.224%
internal/fmtsort.s 7416 7376 -40 -0.539%
os.s 52465 51787 -678 -1.292%
cmd/go/internal/lockedfile/internal/filelock.s 2326 2317 -9 -0.387%
os/signal.s 4657 4648 -9 -0.193%
runtime/debug.s 6040 5998 -42 -0.695%
encoding/binary.s 30838 30801 -37 -0.120%
vendor/golang.org/x/net/route.s 23694 23491 -203 -0.857%
path/filepath.s 17895 17889 -6 -0.034%
cmd/vendor/golang.org/x/sys/unix.s 78125 78109 -16 -0.020%
io/ioutil.s 6999 6996 -3 -0.043%
encoding/base64.s 12094 12007 -87 -0.719%
crypto/cipher.s 20466 20372 -94 -0.459%
cmd/go/internal/robustio.s 2672 2669 -3 -0.112%
encoding/pem.s 9302 9286 -16 -0.172%
internal/obscuretestdata.s 1719 1695 -24 -1.396%
crypto/aes.s 11014 11002 -12 -0.109%
os/exec.s 29388 29231 -157 -0.534%
cmd/internal/browser.s 2266 2260 -6 -0.265%
internal/goroot.s 4601 4592 -9 -0.196%
vendor/golang.org/x/crypto/chacha20poly1305.s 8945 8942 -3 -0.034%
cmd/vendor/golang.org/x/crypto/ssh/terminal.s 27226 27195 -31 -0.114%
index/suffixarray.s 36431 36411 -20 -0.055%
fmt.s 77017 76709 -308 -0.400%
encoding/hex.s 6241 6154 -87 -1.394%
compress/lzw.s 7133 7069 -64 -0.897%
database/sql/driver.s 18888 18877 -11 -0.058%
net/url.s 29838 29739 -99 -0.332%
debug/plan9obj.s 8329 8279 -50 -0.600%
encoding/csv.s 12986 12902 -84 -0.647%
debug/gosym.s 25403 25330 -73 -0.287%
compress/flate.s 51192 50970 -222 -0.434%
vendor/golang.org/x/net/dns/dnsmessage.s 86769 86208 -561 -0.647%
compress/gzip.s 9791 9758 -33 -0.337%
compress/zlib.s 7310 7277 -33 -0.451%
archive/zip.s 42356 42166 -190 -0.449%
debug/dwarf.s 108259 107730 -529 -0.489%
encoding/json.s 106378 105910 -468 -0.440%
os/user.s 14751 14724 -27 -0.183%
database/sql.s 99011 98404 -607 -0.613%
log.s 9466 9423 -43 -0.454%
debug/pe.s 31272 31182 -90 -0.288%
debug/macho.s 32764 32608 -156 -0.476%
encoding/gob.s 136976 136517 -459 -0.335%
vendor/golang.org/x/text/unicode/bidi.s 27318 27276 -42 -0.154%
archive/tar.s 71416 70975 -441 -0.618%
vendor/golang.org/x/net/http2/hpack.s 23892 23848 -44 -0.184%
vendor/golang.org/x/text/secure/bidirule.s 3354 3351 -3 -0.089%
mime/quotedprintable.s 5960 5925 -35 -0.587%
net/http/internal.s 5874 5853 -21 -0.358%
math/big.s 184147 183692 -455 -0.247%
debug/elf.s 63775 63567 -208 -0.326%
mime.s 39802 39709 -93 -0.234%
encoding/xml.s 111038 110713 -325 -0.293%
crypto/dsa.s 6044 6029 -15 -0.248%
go/token.s 12139 12077 -62 -0.511%
crypto/rand.s 6889 6866 -23 -0.334%
go/scanner.s 19030 19008 -22 -0.116%
flag.s 22320 22236 -84 -0.376%
vendor/golang.org/x/text/unicode/norm.s 66652 66391 -261 -0.392%
crypto/rsa.s 31671 31650 -21 -0.066%
crypto/elliptic.s 51553 51403 -150 -0.291%
internal/xcoff.s 22950 22822 -128 -0.558%
go/constant.s 43750 43689 -61 -0.139%
encoding/asn1.s 57086 57035 -51 -0.089%
runtime/trace.s 2609 2603 -6 -0.230%
crypto/x509/pkix.s 10458 10471 +13 +0.124%
image/gif.s 27544 27385 -159 -0.577%
vendor/golang.org/x/net/idna.s 24558 24502 -56 -0.228%
image/png.s 42775 42685 -90 -0.210%
vendor/golang.org/x/crypto/cryptobyte.s 33616 33493 -123 -0.366%
go/ast.s 80684 80449 -235 -0.291%
net/internal/socktest.s 16571 16535 -36 -0.217%
crypto/ecdsa.s 11948 11936 -12 -0.100%
text/template/parse.s 95138 94002 -1136 -1.194%
runtime/pprof.s 59702 59639 -63 -0.106%
testing.s 68427 68088 -339 -0.495%
internal/testenv.s 5620 5596 -24 -0.427%
testing/internal/testdeps.s 3312 3294 -18 -0.543%
internal/trace.s 78473 78239 -234 -0.298%
testing/iotest.s 4968 4908 -60 -1.208%
os/signal/internal/pty.s 3011 2990 -21 -0.697%
testing/quick.s 12179 12125 -54 -0.443%
cmd/internal/bio.s 9286 9274 -12 -0.129%
cmd/internal/src.s 17684 17663 -21 -0.119%
cmd/internal/goobj2.s 12588 12558 -30 -0.238%
cmd/internal/objabi.s 16408 16390 -18 -0.110%
go/printer.s 77417 77308 -109 -0.141%
go/parser.s 80045 79113 -932 -1.164%
go/format.s 5434 5419 -15 -0.276%
cmd/internal/goobj.s 26146 25954 -192 -0.734%
runtime/pprof/internal/profile.s 102518 102178 -340 -0.332%
text/template.s 95343 94935 -408 -0.428%
cmd/internal/dwarf.s 31718 31572 -146 -0.460%
cmd/vendor/golang.org/x/arch/arm/armasm.s 45240 45151 -89 -0.197%
internal/lazytemplate.s 1470 1457 -13 -0.884%
cmd/vendor/golang.org/x/arch/ppc64/ppc64asm.s 37253 37220 -33 -0.089%
cmd/asm/internal/flags.s 2593 2590 -3 -0.116%
cmd/asm/internal/lex.s 25068 24921 -147 -0.586%
cmd/internal/buildid.s 18536 18263 -273 -1.473%
cmd/vendor/golang.org/x/arch/x86/x86asm.s 80209 80105 -104 -0.130%
go/doc.s 75140 74585 -555 -0.739%
cmd/internal/edit.s 3893 3899 +6 +0.154%
html/template.s 89377 88809 -568 -0.636%
cmd/vendor/golang.org/x/arch/arm64/arm64asm.s 117998 117824 -174 -0.147%
cmd/internal/obj.s 115015 114290 -725 -0.630%
go/build.s 69379 68862 -517 -0.745%
cmd/internal/objfile.s 48106 47982 -124 -0.258%
cmd/cover.s 46239 46113 -126 -0.272%
cmd/addr2line.s 2845 2833 -12 -0.422%
cmd/internal/obj/arm.s 68658 66855 -1803 -2.626%
cmd/internal/obj/mips.s 57486 56272 -1214 -2.112%
cmd/internal/obj/riscv.s 63834 63006 -828 -1.297%
cmd/compile/internal/syntax.s 146582 145456 -1126 -0.768%
cmd/internal/obj/wasm.s 44117 44066 -51 -0.116%
cmd/cgo.s 242645 241653 -992 -0.409%
cmd/internal/obj/arm64.s 152107 147163 -4944 -3.250%
net.s 295972 292010 -3962 -1.339%
go/types.s 321371 319432 -1939 -0.603%
vendor/golang.org/x/net/http/httpproxy.s 9450 9423 -27 -0.286%
net/textproto.s 19455 19406 -49 -0.252%
cmd/internal/obj/ppc64.s 125544 120456 -5088 -4.053%
go/internal/srcimporter.s 6475 6409 -66 -1.019%
log/syslog.s 8017 7929 -88 -1.098%
cmd/compile/internal/logopt.s 10183 10162 -21 -0.206%
net/mail.s 24085 23948 -137 -0.569%
mime/multipart.s 21527 21420 -107 -0.497%
cmd/internal/obj/s390x.s 127610 127757 +147 +0.115%
go/internal/gcimporter.s 34913 34548 -365 -1.045%
vendor/golang.org/x/net/nettest.s 28103 28016 -87 -0.310%
cmd/go/internal/cfg.s 9967 9916 -51 -0.512%
cmd/api.s 39703 39603 -100 -0.252%
go/internal/gccgoimporter.s 56470 56120 -350 -0.620%
go/importer.s 2077 2056 -21 -1.011%
cmd/compile/internal/types.s 48202 47282 -920 -1.909%
cmd/go/internal/str.s 4341 4320 -21 -0.484%
cmd/internal/obj/x86.s 89440 88625 -815 -0.911%
cmd/go/internal/base.s 12667 12580 -87 -0.687%
cmd/go/internal/cache.s 30754 30571 -183 -0.595%
cmd/doc.s 62976 62755 -221 -0.351%
cmd/go/internal/search.s 20114 19993 -121 -0.602%
cmd/vendor/golang.org/x/xerrors.s 17923 17855 -68 -0.379%
cmd/go/internal/lockedfile.s 16451 16415 -36 -0.219%
cmd/vendor/golang.org/x/mod/sumdb/note.s 18200 18150 -50 -0.275%
cmd/vendor/golang.org/x/mod/module.s 17869 17851 -18 -0.101%
cmd/asm/internal/arch.s 37533 37482 -51 -0.136%
cmd/fix.s 87728 87492 -236 -0.269%
cmd/vendor/golang.org/x/mod/sumdb/tlog.s 36394 36367 -27 -0.074%
cmd/vendor/golang.org/x/mod/sumdb/dirhash.s 4990 4963 -27 -0.541%
cmd/go/internal/imports.s 16499 16469 -30 -0.182%
cmd/vendor/golang.org/x/mod/zip.s 18816 18745 -71 -0.377%
cmd/go/internal/cmdflag.s 5126 5123 -3 -0.059%
cmd/internal/test2json.s 9540 9452 -88 -0.922%
cmd/go/internal/tool.s 3629 3623 -6 -0.165%
cmd/go/internal/version.s 11232 11220 -12 -0.107%
cmd/go/internal/mvs.s 25383 25179 -204 -0.804%
cmd/nm.s 5815 5803 -12 -0.206%
cmd/dist.s 210146 209140 -1006 -0.479%
cmd/asm/internal/asm.s 68655 68549 -106 -0.154%
cmd/vendor/golang.org/x/mod/modfile.s 72974 72510 -464 -0.636%
cmd/go/internal/load.s 107548 106861 -687 -0.639%
cmd/link/internal/sym.s 18708 18581 -127 -0.679%
cmd/asm.s 3367 3343 -24 -0.713%
cmd/gofmt.s 30795 30698 -97 -0.315%
cmd/link/internal/objfile.s 21828 21630 -198 -0.907%
cmd/pack.s 14878 14869 -9 -0.060%
cmd/vendor/github.com/google/pprof/internal/elfexec.s 6788 6782 -6 -0.088%
cmd/test2json.s 1647 1641 -6 -0.364%
cmd/link/internal/loader.s 48677 48483 -194 -0.399%
cmd/vendor/golang.org/x/tools/go/analysis/internal/analysisflags.s 16783 16773 -10 -0.060%
cmd/link/internal/loadelf.s 35464 35126 -338 -0.953%
cmd/link/internal/loadmacho.s 29438 29180 -258 -0.876%
cmd/link/internal/loadpe.s 16440 16371 -69 -0.420%
cmd/vendor/golang.org/x/tools/go/analysis/passes/internal/analysisutil.s 2106 2100 -6 -0.285%
cmd/link/internal/loadxcoff.s 11711 11615 -96 -0.820%
cmd/vendor/golang.org/x/tools/go/analysis/internal/facts.s 14954 14883 -71 -0.475%
cmd/vendor/golang.org/x/tools/go/ast/inspector.s 5394 5374 -20 -0.371%
cmd/vendor/golang.org/x/tools/go/analysis/passes/asmdecl.s 37029 36822 -207 -0.559%
cmd/vendor/golang.org/x/tools/go/analysis/passes/inspect.s 340 337 -3 -0.882%
cmd/vendor/golang.org/x/tools/go/analysis/passes/cgocall.s 9919 9858 -61 -0.615%
cmd/vendor/golang.org/x/tools/go/analysis/passes/bools.s 6705 6690 -15 -0.224%
cmd/vendor/golang.org/x/tools/go/analysis/passes/copylock.s 9783 9741 -42 -0.429%
cmd/vendor/golang.org/x/tools/go/cfg.s 31699 30742 -957 -3.019%
cmd/vendor/golang.org/x/tools/go/analysis/passes/ifaceassert.s 2768 2762 -6 -0.217%
cmd/vendor/golang.org/x/tools/go/analysis/passes/loopclosure.s 3031 2998 -33 -1.089%
cmd/vendor/golang.org/x/tools/go/analysis/passes/shift.s 4382 4376 -6 -0.137%
cmd/vendor/golang.org/x/tools/go/analysis/passes/stdmethods.s 8654 8642 -12 -0.139%
cmd/vendor/golang.org/x/tools/go/analysis/passes/stringintconv.s 3458 3446 -12 -0.347%
cmd/vendor/golang.org/x/tools/go/analysis/passes/structtag.s 8011 7995 -16 -0.200%
cmd/vendor/golang.org/x/tools/go/analysis/passes/tests.s 6205 6193 -12 -0.193%
cmd/vendor/golang.org/x/tools/go/ast/astutil.s 66183 65861 -322 -0.487%
cmd/vendor/github.com/google/pprof/profile.s 150844 150261 -583 -0.386%
cmd/vendor/golang.org/x/tools/go/analysis/passes/unreachable.s 8057 8054 -3 -0.037%
cmd/vendor/golang.org/x/tools/go/analysis/passes/unusedresult.s 3670 3667 -3 -0.082%
cmd/vendor/github.com/google/pprof/internal/measurement.s 10464 10440 -24 -0.229%
cmd/vendor/golang.org/x/tools/go/types/typeutil.s 12319 12274 -45 -0.365%
cmd/vendor/golang.org/x/tools/go/analysis/unitchecker.s 13503 13342 -161 -1.192%
cmd/vendor/golang.org/x/tools/go/analysis/passes/ctrlflow.s 5261 5218 -43 -0.817%
cmd/vendor/golang.org/x/tools/go/analysis/passes/errorsas.s 1462 1459 -3 -0.205%
cmd/vendor/golang.org/x/tools/go/analysis/passes/lostcancel.s 9594 9582 -12 -0.125%
cmd/vendor/golang.org/x/tools/go/analysis/passes/printf.s 34397 34338 -59 -0.172%
cmd/vendor/github.com/google/pprof/internal/graph.s 53225 52936 -289 -0.543%
cmd/vendor/github.com/ianlancetaylor/demangle.s 177450 175329 -2121 -1.195%
crypto/x509.s 147892 147388 -504 -0.341%
cmd/go/internal/work.s 306465 304950 -1515 -0.494%
cmd/go/internal/run.s 4664 4657 -7 -0.150%
crypto/tls.s 313130 311833 -1297 -0.414%
net/http/httptrace.s 3979 3905 -74 -1.860%
net/smtp.s 14413 14344 -69 -0.479%
cmd/link/internal/ld.s 545343 542279 -3064 -0.562%
cmd/link/internal/mips.s 6218 6215 -3 -0.048%
cmd/link/internal/mips64.s 6108 6103 -5 -0.082%
cmd/link/internal/amd64.s 18154 18112 -42 -0.231%
cmd/link/internal/arm64.s 22527 22494 -33 -0.146%
cmd/link/internal/arm.s 22574 22494 -80 -0.354%
cmd/link/internal/s390x.s 20779 20746 -33 -0.159%
cmd/link/internal/wasm.s 16531 16493 -38 -0.230%
cmd/link/internal/x86.s 18906 18849 -57 -0.301%
cmd/link/internal/ppc64.s 26856 26778 -78 -0.290%
net/http.s 559101 556513 -2588 -0.463%
net/http/cookiejar.s 15912 15885 -27 -0.170%
expvar.s 9531 9525 -6 -0.063%
net/http/httptest.s 16616 16475 -141 -0.849%
net/http/cgi.s 23624 23458 -166 -0.703%
cmd/go/internal/web.s 16546 16489 -57 -0.344%
cmd/vendor/golang.org/x/mod/sumdb.s 33197 33117 -80 -0.241%
net/http/fcgi.s 19266 19169 -97 -0.503%
net/http/httputil.s 39875 39728 -147 -0.369%
cmd/vendor/github.com/google/pprof/internal/symbolz.s 5888 5867 -21 -0.357%
net/rpc.s 34154 34003 -151 -0.442%
cmd/vendor/github.com/google/pprof/internal/transport.s 2746 2716 -30 -1.092%
cmd/vendor/github.com/google/pprof/internal/binutils.s 35999 35875 -124 -0.344%
net/rpc/jsonrpc.s 6637 6598 -39 -0.588%
cmd/vendor/github.com/google/pprof/internal/symbolizer.s 11533 11458 -75 -0.650%
cmd/go/internal/get.s 62921 62803 -118 -0.188%
cmd/vendor/github.com/google/pprof/internal/report.s 80364 80058 -306 -0.381%
cmd/go/internal/modfetch/codehost.s 89680 89066 -614 -0.685%
cmd/trace.s 117171 116701 -470 -0.401%
cmd/vendor/github.com/google/pprof/internal/driver.s 144268 143297 -971 -0.673%
cmd/go/internal/modfetch.s 126299 125860 -439 -0.348%
cmd/vendor/github.com/google/pprof/driver.s 9042 9000 -42 -0.464%
cmd/go/internal/modconv.s 17947 17889 -58 -0.323%
cmd/pprof.s 12399 12326 -73 -0.589%
cmd/go/internal/modload.s 151182 150389 -793 -0.525%
cmd/go/internal/generate.s 11738 11636 -102 -0.869%
cmd/go/internal/help.s 6571 6531 -40 -0.609%
cmd/go/internal/clean.s 11174 11142 -32 -0.286%
cmd/go/internal/vet.s 7897 7867 -30 -0.380%
cmd/go/internal/envcmd.s 22176 22095 -81 -0.365%
cmd/go/internal/list.s 15216 15067 -149 -0.979%
cmd/go/internal/modget.s 38698 38519 -179 -0.463%
cmd/go/internal/modcmd.s 46674 46441 -233 -0.499%
cmd/go/internal/test.s 64664 64456 -208 -0.322%
cmd/go.s 6730 6703 -27 -0.401%
cmd/compile/internal/ssa.s 3592565 3582500 -10065 -0.280%
cmd/compile/internal/gc.s 1549123 1537123 -12000 -0.775%
cmd/compile/internal/riscv64.s 14579 14483 -96 -0.658%
cmd/compile/internal/mips.s 20578 20419 -159 -0.773%
cmd/compile/internal/ppc64.s 25524 25359 -165 -0.646%
cmd/compile/internal/mips64.s 19795 19636 -159 -0.803%
cmd/compile/internal/wasm.s 13329 13290 -39 -0.293%
cmd/compile/internal/s390x.s 28097 27892 -205 -0.730%
cmd/compile/internal/arm.s 31489 31321 -168 -0.534%
cmd/compile/internal/arm64.s 29803 29590 -213 -0.715%
cmd/compile/internal/amd64.s 32961 33221 +260 +0.789%
cmd/compile/internal/x86.s 31029 30878 -151 -0.487%
total 18534966 18440341 -94625 -0.511%
Change-Id: I830d37364f14f0297800adc42c99f60a74c51aca
Reviewed-on: https://go-review.googlesource.com/c/go/+/226367
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Make sure we don't use the rewrite ptr + (c + x) -> c + (ptr + x), as
that may create an ephemeral out-of-bounds pointer.
I have not seen an actual bug caused by this yet, but we've seen
them in the 386 port so I'm fixing this issue for amd64 as well.
The load-combining rules needed to be reworked somewhat to still
work without the above broken rule.
Update #37881
Change-Id: I8046d170e89e2035195f261535e34ca7d8aca68a
Reviewed-on: https://go-review.googlesource.com/c/go/+/226437
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Retrying CL 222782, with a fix that will hopefully stop the random crashing.
The issue with the previous CL is that it does pointer arithmetic
in a way that may briefly generate an out-of-bounds pointer. If an
interrupt happens to occur in that state, the referenced object may
be collected incorrectly.
Suppose there was code that did s[x+c]. The previous CL had a rule
to the effect of ptr + (x + c) -> c + (ptr + x). But ptr+x is not
guaranteed to point to the same object as ptr. In contrast,
ptr+(x+c) is guaranteed to point to the same object as ptr, because
we would have already checked that x+c is in bounds.
For example, strconv.trim used to have this code:
MOVZX -0x1(BX)(DX*1), BP
CMPL $0x30, AL
After CL 222782, it had this code:
LEAL 0(BX)(DX*1), BP
CMPB $0x30, -0x1(BP)
An interrupt between those last two instructions could see BP pointing
outside the backing store of the slice involved.
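The arithmetic behind the s[x+c] case can be modelled with plain
integers. The following is only an illustrative sketch, not compiler
code: the object type, the addresses helper, and the address 0x1000
are hypothetical, and x = len(s), c = -1 are chosen to match the
-0x1 displacement in the example above.

```go
package main

import "fmt"

// object models a heap object as the half-open address range
// [base, base+size). Addresses are plain integers picked for
// illustration; no real pointers are involved.
type object struct {
	base, size int64
}

// contains reports whether addr lies inside the object.
func (o object) contains(addr int64) bool {
	return addr >= o.base && addr < o.base+o.size
}

// addresses returns the address computed as ptr + (x + c), the
// intermediate value ptr + x produced by the reassociated form,
// and the final value c + (ptr + x).
func addresses(o object, x, c int64) (safe, intermediate, final int64) {
	safe = o.base + (x + c)
	intermediate = o.base + x
	final = c + intermediate
	return safe, intermediate, final
}

func main() {
	s := object{base: 0x1000, size: 8} // backing store of an 8-byte slice
	safe, intermediate, final := addresses(s, 8, -1)

	fmt.Println(s.contains(safe))         // true: x+c was bounds checked
	fmt.Println(s.contains(intermediate)) // false: one past the end of s
	fmt.Println(s.contains(final))        // true: same address as safe
}
```

Both forms end at the same address, but only the reassociated form
passes through a value outside the object along the way.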
It's really hard to actually demonstrate a bug. First, you need to
have an interrupt occur at exactly the right time. Then, there must
be no other pointers to the object in question. Since the interrupted
frame will be scanned conservatively, there can't even be a dead
pointer in another register or on the stack. (In the example above,
a bug can't happen because BX still holds the original pointer.)
Then, the object in question needs to be collected (or at least
scanned?) before the interrupted code continues.
This CL needs to handle load combining somewhat differently than CL 222782
because of the new restriction on arithmetic. That's the only real
difference (other than removing the bad rules) from that old CL.
This bug is also present in the amd64 rewrite rules, and we haven't
seen any crashing as a result. I will fix up that code similarly to
this one in a separate CL.
Update #37881
Change-Id: I5f0d584d9bef4696bfe89a61ef0a27c8d507329f
Reviewed-on: https://go-review.googlesource.com/c/go/+/225798
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change to the rules removes some unnecessary signed shifts
that appear in the math/rand functions. Existing rules did not
cover some of the signed cases.
math/rand improves slightly because one of the two instructions
generated for Int31n, which is inlined quite a bit, is removed.
Intn1000 46.9ns ± 0% 45.5ns ± 0% -2.99% (p=1.000 n=1+1)
Int63n1000 33.5ns ± 0% 32.8ns ± 0% -2.09% (p=1.000 n=1+1)
Int31n1000 32.7ns ± 0% 32.6ns ± 0% -0.31% (p=1.000 n=1+1)
Float32 32.7ns ± 0% 30.3ns ± 0% -7.34% (p=1.000 n=1+1)
Float64 21.7ns ± 0% 20.9ns ± 0% -3.69% (p=1.000 n=1+1)
Perm3 205ns ± 0% 202ns ± 0% -1.46% (p=1.000 n=1+1)
Perm30 1.71µs ± 0% 1.68µs ± 0% -1.35% (p=1.000 n=1+1)
Perm30ViaShuffle 1.65µs ± 0% 1.65µs ± 0% -0.30% (p=1.000 n=1+1)
ShuffleOverhead 2.83µs ± 0% 2.83µs ± 0% -0.07% (p=1.000 n=1+1)
Read3 18.7ns ± 0% 16.1ns ± 0% -13.90% (p=1.000 n=1+1)
Read64 126ns ± 0% 124ns ± 0% -1.59% (p=1.000 n=1+1)
Read1000 1.75µs ± 0% 1.63µs ± 0% -7.08% (p=1.000 n=1+1)
Change-Id: I11502dfca7d65aafc76749a8d713e9e50c24a858
Reviewed-on: https://go-review.googlesource.com/c/go/+/225917
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The load and test instructions compare the given value
against zero and will produce a condition code indicating
one of the following scenarios:
0: Result is zero
1: Result is less than zero
2: Result is greater than zero
3: Result is not a number (NaN)
The instruction can be used to simplify floating point comparisons
against zero, which can enable further optimizations.
This CL also reduces the size of the .text section of the math.test
binary by around 0.7 KB (in hexadecimal, from 1358f0 to 135620).
Change-Id: I33cb714f0c6feebac7a1c46dfcc735e7daceff9c
Reviewed-on: https://go-review.googlesource.com/c/go/+/209159
Reviewed-by: Michael Munday <mike.munday@ibm.com>
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit CL 222782.
Reason for revert: Reverting to see if 386 errors go away
Update #37881
Change-Id: I74f287404c52414db1b6ff1649effa4ed9e5cc0c
Reviewed-on: https://go-review.googlesource.com/c/go/+/225218
Reviewed-by: Bryan C. Mills <bcmills@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit CL 224837.
Reason for revert: Reverting partial reverts of CL 222782.
Update #37881
Change-Id: Ie9bf84d6e17ed214abe538965e5ff03936886826
Reviewed-on: https://go-review.googlesource.com/c/go/+/225217
Reviewed-by: Bryan C. Mills <bcmills@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit CL 225057.
Reason for revert: Undoing partial reverts of CL 222782
Update #37881
Change-Id: Iee024cab2a580a37a0fc355e0e3c5ad3d8fdaf7d
Reviewed-on: https://go-review.googlesource.com/c/go/+/225197
Reviewed-by: Bryan C. Mills <bcmills@google.com>
|
|
|
|
|
|
|
|
|
|
| |
Update #37881
Change-Id: I1f9a3f57f6215a19c31765c257ee78715eab36b7
Reviewed-on: https://go-review.googlesource.com/c/go/+/225057
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Rolling back portions of CL 222782 to see whether that helps
with issue #37881.
Update #37881
Change-Id: I9cc3ff8c469fa5e4b22daec715d04148033f46f7
Reviewed-on: https://go-review.googlesource.com/c/go/+/224837
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit extends the -spectre flag to cmd/asm and adds
a new Spectre mitigation mode "ret", which enables the use
of retpolines.
Retpolines prevent speculation about the target of an indirect
jump or call and are described in more detail here:
https://support.google.com/faqs/answer/7625886
Change-Id: I4f2cb982fa94e44d91e49bd98974fd125619c93a
Reviewed-on: https://go-review.googlesource.com/c/go/+/222661
Reviewed-by: Keith Randall <khr@golang.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit adds a new cmd/compile flag -spectre,
which accepts a comma-separated list of possible
Spectre mitigations to apply, or the empty string (none),
or "all". The only known mitigation right now is "index",
which uses conditional moves to ensure that x86-64 CPUs
do not speculate past index bounds checks.
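The idea behind the "index" mitigation can be sketched in Go source.
This is an assumption-laden illustration, not the compiler's
implementation: the compiler applies the mitigation in generated
code with conditional moves, while the hypothetical clampIndex and
load helpers below imitate the effect with a mask. The sketch
assumes an out-of-range index is replaced by 0.

```go
package main

import (
	"fmt"
	"math/bits"
)

// clampIndex returns i when 0 <= i < n and 0 otherwise, without
// branching on the comparison. It assumes n >= 0, as for a slice
// length.
func clampIndex(i, n int) int {
	// Sub64 reports a borrow of 1 exactly when uint64(i) < uint64(n).
	// A negative i converts to a huge unsigned value, so it never
	// borrows.
	_, borrow := bits.Sub64(uint64(i), uint64(n), 0)
	mask := -borrow // all ones when in bounds, zero otherwise
	return int(uint64(i) & mask)
}

// load performs the ordinary bounds check and then indexes with the
// clamped value, so a mispredicted check cannot feed an
// out-of-range index to the load.
func load(s []byte, i int) byte {
	if i < 0 || i >= len(s) {
		panic("index out of range")
	}
	return s[clampIndex(i, len(s))]
}

func main() {
	fmt.Println(clampIndex(3, 10))  // 3
	fmt.Println(clampIndex(10, 10)) // 0
	fmt.Println(clampIndex(-1, 10)) // 0
	fmt.Println(load([]byte{7, 8, 9}, 2))
}
```

Because the clamped index depends on the comparison through data
rather than through a predicted branch, the load address stays
inside the slice even on a speculative path.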
Speculating past index bounds checks may be problematic
on systems running privileged servers that accept requests
from untrusted users who can execute their own programs
on the same machine. (Further constraints make an attack
even less likely in practice.)
The cases this protects against are analogous to the ones
Microsoft explains in the "Array out of bounds load/store feeding ..."
sections here:
https://docs.microsoft.com/en-us/cpp/security/developer-guidance-speculative-execution?view=vs-2019#array-out-of-bounds-load-feeding-an-indirect-branch
Change-Id: Ib7532d7e12466b17e04c4e2075c2a456dc98f610
Reviewed-on: https://go-review.googlesource.com/c/go/+/222660
Reviewed-by: Keith Randall <khr@golang.org>
|
|
|
|
|
|
|
|
|
|
| |
Update #36468
Change-Id: Idfdb845d097994689be450d6e8a57fa9adb57166
Reviewed-on: https://go-review.googlesource.com/c/go/+/222782
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
|