| Commit message | Author | Age | Files | Lines |
|
This CL optimizes code that uses a carry from a function such as
bits.Add64 as the condition in an if statement. For example:
    x, c := bits.Add64(a, b, 0)
    if c != 0 {
        panic("overflow")
    }
Rather than converting the carry into a 0 or a 1 value and using
that as an input to a comparison instruction, the carry flag is now
used directly as the input to a conditional branch. This typically
removes an ADD LOGICAL WITH CARRY instruction when user code is
doing overflow detection, and is closer to the code that a user
would expect to be generated.
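As an illustration of the user-level pattern this optimization targets, here is a minimal sketch of an overflow-checked addition. The helper name `checkedAdd` is hypothetical and not part of the CL; only `bits.Add64` comes from the standard library.

```go
package main

import (
	"fmt"
	"math/bits"
)

// checkedAdd is a hypothetical helper: it returns a+b (wrapped) and
// reports whether the addition overflowed, based on the carry-out of
// bits.Add64. The comparison carry != 0 is the kind of condition the
// commit message describes.
func checkedAdd(a, b uint64) (uint64, bool) {
	sum, carry := bits.Add64(a, b, 0)
	return sum, carry != 0
}

func main() {
	fmt.Println(checkedAdd(1, 2))          // 3 false
	fmt.Println(checkedAdd(^uint64(0), 1)) // 0 true
}
```

Whether the carry is consumed directly by a branch depends on the target architecture and compiler version; the source code itself does not change.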
Change-Id: I950431270955ab72f1b5c6db873b6abe769be0da
Reviewed-on: https://go-review.googlesource.com/c/go/+/219757
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
The Div functions in math/bits (Div, Div32, and Div64) compute both
quotients and remainders, but they panic if the quotient does not
fit in a 32- or 64-bit uint.
Since, on the other hand, the remainder will always fit the size of
the divisor, it is useful to have Div variants that only compute the
remainder, and don't panic on a quotient overflow.
This change adds to the math/bits package three new functions:
    Rem(hi, lo, y uint) uint
    Rem32(hi, lo, y uint32) uint32
    Rem64(hi, lo, y uint64) uint64
which can be used to compute (hi,lo)%y even when the quotient
overflows the uint size.
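The difference between the two families can be sketched as follows. The helpers `rem128by64` and `divPanics` are hypothetical names introduced for this example; `bits.Rem64` and `bits.Div64` are the real functions.

```go
package main

import (
	"fmt"
	"math/bits"
)

// rem128by64 (hypothetical name) returns (hi*2^64 + lo) % y.
// Like Div64 it panics for y == 0, but not on quotient overflow.
func rem128by64(hi, lo, y uint64) uint64 {
	return bits.Rem64(hi, lo, y)
}

// divPanics (hypothetical name) reports whether bits.Div64 panics
// for the given operands.
func divPanics(hi, lo, y uint64) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	bits.Div64(hi, lo, y)
	return false
}

func main() {
	// Here y <= hi, so the quotient does not fit in 64 bits.
	hi, lo, y := uint64(5), uint64(8), uint64(3)
	fmt.Println(divPanics(hi, lo, y))  // true
	fmt.Println(rem128by64(hi, lo, y)) // 1, since 2^64 % 3 == 1 and (5+8) % 3 == 1
}
```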
Fixes #28970
Change-Id: I119948429f737670c5e5ceb8756121e6a738dbdc
Reviewed-on: https://go-review.googlesource.com/c/go/+/197838
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
constant time
Fixes #31267
Change-Id: I91e4aa8cf9d797689cb9612d0fe3bf1bb3ad15a6
Reviewed-on: https://go-review.googlesource.com/c/go/+/178177
Reviewed-by: Keith Randall <khr@golang.org>
|
Change-Id: Id87db9bed5e8715d554c1bf95c063d7d0a03c3e9
Reviewed-on: https://go-review.googlesource.com/c/go/+/178117
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
Make the extended precision add-with-carry and sub-with-carry operations
take a constant amount of time to execute, regardless of input.
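One way to get a branch-free carry, shown here only as a sketch, is to derive it from the operands and the sum with bitwise operations. The function name `add64` is local to this example; the formula is a standard one and is assumed, not quoted, from the change.

```go
package main

import (
	"fmt"
	"math/bits"
)

// add64 computes x + y + carry without any data-dependent branch.
// The carry-out is the top bit of: both inputs set, or at least one
// input set while the corresponding sum bit is clear.
// The carry input is assumed to be 0 or 1.
func add64(x, y, carry uint64) (sum, carryOut uint64) {
	sum = x + y + carry
	carryOut = ((x & y) | ((x | y) &^ sum)) >> 63
	return
}

func main() {
	// Cross-check against the standard library on a few inputs.
	inputs := []uint64{0, 1, 1 << 63, ^uint64(0)}
	for _, x := range inputs {
		for _, y := range inputs {
			for c := uint64(0); c <= 1; c++ {
				s1, c1 := add64(x, y, c)
				s2, c2 := bits.Add64(x, y, c)
				if s1 != s2 || c1 != c2 {
					fmt.Println("mismatch", x, y, c)
				}
			}
		}
	}
	fmt.Println("done")
}
```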
name old time/op new time/op delta
Add-4 1.16ns ±11% 1.51ns ± 5% +30.52% (p=0.008 n=5+5)
Add32-4 1.08ns ± 0% 1.03ns ± 1% -4.86% (p=0.029 n=4+4)
Add64-4 1.09ns ± 1% 1.95ns ± 3% +79.23% (p=0.008 n=5+5)
Add64multiple-4 4.03ns ± 1% 4.55ns ±11% +13.07% (p=0.008 n=5+5)
Sub-4 1.08ns ± 1% 1.50ns ± 0% +38.17% (p=0.016 n=5+4)
Sub32-4 1.09ns ± 2% 1.53ns ±10% +40.26% (p=0.008 n=5+5)
Sub64-4 1.10ns ± 1% 1.47ns ± 1% +33.39% (p=0.008 n=5+5)
Sub64multiple-4 4.30ns ± 2% 4.08ns ± 4% -5.07% (p=0.032 n=5+5)
Fixes #31267
Change-Id: I1824b1b3ab8f09902ce8b5fef84ce2fdb8847ed9
Reviewed-on: https://go-review.googlesource.com/c/go/+/170758
Reviewed-by: Filippo Valsorda <filippo@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Filippo Valsorda <filippo@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
When building as part of the bootstrap process, avoid
use of "go:linkname" applied to variables, since this
feature is ill-defined/unsupported for gccgo.
Updates #30771.
Change-Id: Id44d01b5c98d292702e5075674117518cb59e2d0
Reviewed-on: https://go-review.googlesource.com/c/go/+/170737
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
This CL deals with the additional comments of CL 159017.
Change-Id: I4ad3c60c834646d58dc0c544c741b92bfe83fb8b
Reviewed-on: https://go-review.googlesource.com/c/go/+/168857
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
This CL intrinsifies Add64 with the arm64 instruction sequence ADDS, ADCS
and ADC, and optimizes the case of carry chains. The CL also changes the
test code so that the intrinsic implementation can be tested.
Benchmarks:
name old time/op new time/op delta
Add-224 2.500000ns +- 0% 2.090000ns +- 4% -16.40% (p=0.000 n=9+10)
Add32-224 2.500000ns +- 0% 2.500000ns +- 0% ~ (all equal)
Add64-224 2.500000ns +- 0% 1.577778ns +- 2% -36.89% (p=0.000 n=10+9)
Add64multiple-224 6.000000ns +- 0% 2.000000ns +- 0% -66.67% (p=0.000 n=10+10)
Change-Id: I6ee91c9a85c16cc72ade5fd94868c579f16c7615
Reviewed-on: https://go-review.googlesource.com/c/go/+/159017
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
Use ReverseBytes32 and ReverseBytes64 to speed up these functions.
The byte reversal functions are intrinsics on most platforms and
generally compile to a single instruction.
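The idea can be sketched like this: reverse the bits inside each byte with three mask-and-shift steps, then let the byte-reversal function reorder the bytes. The function name `reverse32` and the mask constants are written out for this example and are an assumption about the shape of the change, not a copy of it.

```go
package main

import (
	"fmt"
	"math/bits"
)

// reverse32 reverses the bit order of x. The three mask steps swap
// adjacent bits, bit pairs, and nibbles, which reverses the bits
// within each byte; ReverseBytes32 then reverses the byte order.
func reverse32(x uint32) uint32 {
	const (
		m0 = 0x55555555 // 0101...
		m1 = 0x33333333 // 0011...
		m2 = 0x0f0f0f0f // 00001111...
	)
	x = (x>>1)&m0 | (x&m0)<<1
	x = (x>>2)&m1 | (x&m1)<<2
	x = (x>>4)&m2 | (x&m2)<<4
	return bits.ReverseBytes32(x)
}

func main() {
	for _, x := range []uint32{0, 1, 0xF, 0x12345678, 0xFFFFFFFF} {
		fmt.Printf("%#08x -> %#08x (matches bits.Reverse32: %v)\n",
			x, reverse32(x), reverse32(x) == bits.Reverse32(x))
	}
}
```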
name old time/op new time/op delta
Reverse32 2.41ns ± 1% 1.94ns ± 3% -19.60% (p=0.000 n=20+19)
Reverse64 3.85ns ± 1% 2.56ns ± 1% -33.32% (p=0.000 n=17+19)
Change-Id: I160bf59a0c7bd5db94114803ec5a59fae448f096
Reviewed-on: https://go-review.googlesource.com/c/159358
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
|
TrailingZeros16 is the only one of the TrailingZeros functions with a
named return value in the signature. This creates a slightly
unpleasant effect in the godoc listing:
    func TrailingZeros(x uint) int
    func TrailingZeros16(x uint16) (n int)
    func TrailingZeros32(x uint32) int
    func TrailingZeros64(x uint64) int
    func TrailingZeros8(x uint8) int
Since the named return value is not even used, remove it.
Change-Id: I15c5aedb6157003911b6e0685c357ce56e466c0e
Reviewed-on: https://go-review.googlesource.com/c/153340
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
Div panics when y<=hi because either the quotient overflows
the size of the output or division by zero occurs when y==0.
This provides a uniform behavior for all implementations.
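Callers that cannot tolerate a panic can test the documented condition up front. The wrapper below is a hypothetical sketch (the name `safeDiv64` is not part of math/bits); it relies only on the rule stated above, that Div64 panics when y <= hi.

```go
package main

import (
	"fmt"
	"math/bits"
)

// safeDiv64 is a hypothetical wrapper around bits.Div64. It returns
// ok == false instead of panicking when y <= hi, which covers both
// quotient overflow and division by zero (y == 0 implies y <= hi).
func safeDiv64(hi, lo, y uint64) (quo, rem uint64, ok bool) {
	if y <= hi {
		return 0, 0, false
	}
	quo, rem = bits.Div64(hi, lo, y)
	return quo, rem, true
}

func main() {
	fmt.Println(safeDiv64(0, 7, 2)) // 3 1 true
	fmt.Println(safeDiv64(2, 0, 2)) // 0 0 false (quotient would need 65 bits)
	fmt.Println(safeDiv64(0, 7, 0)) // 0 0 false (division by zero)
}
```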
Fixes #28316
Change-Id: If23aeb10e0709ee1a60b7d614afc9103d674a980
Reviewed-on: https://go-review.googlesource.com/c/149517
Reviewed-by: Robert Griesemer <gri@golang.org>
|
Explicitly check for divide-by-zero/overflow and panic with the appropriate
runtime error. The additional checks have basically no effect on performance
since the branch is easily predicted.
name old time/op new time/op delta
Div-4 53.9ns ± 1% 53.0ns ± 1% -1.59% (p=0.016 n=4+5)
Div32-4 17.9ns ± 0% 18.4ns ± 0% +2.56% (p=0.008 n=5+5)
Div64-4 53.5ns ± 0% 53.3ns ± 0% ~ (p=0.095 n=5+5)
Updates #28316
Change-Id: I36297ee9946cbbc57fefb44d1730283b049ecf57
Reviewed-on: https://go-review.googlesource.com/c/144377
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
name old time/op new time/op delta
Add-8 1.11ns ± 0% 1.18ns ± 0% +6.31% (p=0.029 n=4+4)
Add32-8 1.02ns ± 0% 1.02ns ± 1% ~ (p=0.333 n=4+5)
Add64-8 1.11ns ± 1% 1.17ns ± 0% +5.79% (p=0.008 n=5+5)
Add64multiple-8 4.35ns ± 1% 0.86ns ± 0% -80.22% (p=0.000 n=5+4)
The individual ops are a bit slower (but still very fast).
Using the ops in carry chains is very fast.
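A carry chain in this sense is a sequence of Add64 calls where each carry-out feeds the next carry-in. The sketch below adds two 256-bit values; the type `uint256` and function `add256` are hypothetical names for this example and may differ from what the Add64multiple benchmark actually does.

```go
package main

import (
	"fmt"
	"math/bits"
)

// uint256 is a hypothetical 256-bit unsigned integer stored as four
// 64-bit limbs, least significant limb first.
type uint256 [4]uint64

// add256 adds a and b limb by limb, threading the carry through
// consecutive bits.Add64 calls, and returns the final carry-out.
func add256(a, b uint256) (sum uint256, carry uint64) {
	sum[0], carry = bits.Add64(a[0], b[0], 0)
	sum[1], carry = bits.Add64(a[1], b[1], carry)
	sum[2], carry = bits.Add64(a[2], b[2], carry)
	sum[3], carry = bits.Add64(a[3], b[3], carry)
	return sum, carry
}

func main() {
	max := ^uint64(0)
	a := uint256{max, max, 0, 0}
	b := uint256{1, 0, 0, 0}
	fmt.Println(add256(a, b)) // [0 0 1 0] 0
}
```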
Update #28273
Change-Id: Id975f76df2b930abf0e412911d327b6c5b1befe5
Reviewed-on: https://go-review.googlesource.com/c/144257
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
|
Previously, the benchmark was measuring Add64 instead of Sub64.
Change-Id: I0cf30935c8a4728bead9868834377aae0b34f008
Reviewed-on: https://go-review.googlesource.com/c/144380
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
Port math/big pure go versions of add-with-carry, subtract-with-borrow,
full-width multiply, and full-width divide.
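As a usage sketch of the full-width multiply and divide, the hypothetical helper `mulDiv` below computes a*b/c with a 128-bit intermediate product. The helper name and its overflow handling are assumptions made for this example.

```go
package main

import (
	"fmt"
	"math/bits"
)

// mulDiv is a hypothetical helper returning a*b/c, where the product
// is kept at full 128-bit width. It reports ok == false when c is
// zero or the quotient does not fit in 64 bits.
func mulDiv(a, b, c uint64) (quo uint64, ok bool) {
	hi, lo := bits.Mul64(a, b)
	if c <= hi {
		return 0, false
	}
	quo, _ = bits.Div64(hi, lo, c)
	return quo, true
}

func main() {
	fmt.Println(mulDiv(6, 7, 2))     // 21 true
	fmt.Println(mulDiv(1<<63, 4, 8)) // 4611686018427387904 true (2^65 / 8 = 2^62)
	fmt.Println(mulDiv(1<<63, 4, 2)) // 0 false (2^64 does not fit)
}
```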
Updates #24813
Change-Id: Ifae5d2f6ee4237137c9dcba931f69c91b80a4b1c
Reviewed-on: https://go-review.googlesource.com/123157
Reviewed-by: Robert Griesemer <gri@golang.org>
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
This makes math/bits not have any explicit imports even
when compiling tests and thereby avoids import cycles when
dependencies of testing want to import math/bits.
Change-Id: I95eccae2f5c4310e9b18124abfa85212dfbd9daa
Reviewed-on: https://go-review.googlesource.com/110479
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
Right rotation is achieved using negative k in RotateLeft*(x, k). Add
examples demonstrating that functionality.
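A minimal sketch of the idiom, using a hypothetical `rotateRight8` wrapper (math/bits itself has no RotateRight functions):

```go
package main

import (
	"fmt"
	"math/bits"
)

// rotateRight8 is a hypothetical convenience wrapper: rotating right
// by k is expressed as rotating left by -k.
func rotateRight8(x uint8, k int) uint8 {
	return bits.RotateLeft8(x, -k)
}

func main() {
	fmt.Printf("%08b\n", uint8(15))           // 00001111
	fmt.Printf("%08b\n", rotateRight8(15, 2)) // 11000011
}
```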
Change-Id: I15dab159accd2937cb18d3fa8ca32da8501567d3
Reviewed-on: https://go-review.googlesource.com/75371
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
|
Change-Id: Icbe6885ffd3aa4e77441ab03a2b9a04a9276d5eb
Reviewed-on: https://go-review.googlesource.com/68311
Reviewed-by: Martin Möhrmann <moehrmann@google.com>
|
Change-Id: Icdd0566d3b7dbc034256e16f8a6b6f1af07069b3
Reviewed-on: https://go-review.googlesource.com/54350
Reviewed-by: Robert Griesemer <gri@golang.org>
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
Change-Id: I30563d31f6acea594cc853cc6b672ec664f90d48
Reviewed-on: https://go-review.googlesource.com/53636
Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Emmanuel Odeke <emm.odeke@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
Change-Id: Iee1b3e116b4dcc4071d6512abc5241eabedaeb5c
Reviewed-on: https://go-review.googlesource.com/53850
Reviewed-by: Robert Griesemer <gri@golang.org>
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
Erroneously called OnesCount instead of OnesCount64
Change-Id: Ie877e43f213253e45d31f64931c4a15915849586
Reviewed-on: https://go-review.googlesource.com/53410
Reviewed-by: Chris Broadfoot <cbro@golang.org>
|
Change-Id: Ie673f9665825a40281c2584d478ba1260f725856
Reviewed-on: https://go-review.googlesource.com/53357
Run-TryBot: Chris Broadfoot <cbro@golang.org>
Reviewed-by: Chris Broadfoot <cbro@golang.org>
|
Change-Id: Ib491d144387a7675af370f7b925fe6e62440d153
Reviewed-on: https://go-review.googlesource.com/48966
Run-TryBot: Kevin Burke <kev@inburke.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Kevin Burke <kev@inburke.com>
|
This adds math/bits intrinsics for OnesCount, Len, TrailingZeros on
ppc64x.
benchmark old ns/op new ns/op delta
BenchmarkLeadingZeros-16 4.26 1.71 -59.86%
BenchmarkLeadingZeros16-16 3.04 1.83 -39.80%
BenchmarkLeadingZeros32-16 3.31 1.82 -45.02%
BenchmarkLeadingZeros64-16 3.69 1.71 -53.66%
BenchmarkTrailingZeros-16 2.55 1.62 -36.47%
BenchmarkTrailingZeros32-16 2.55 1.77 -30.59%
BenchmarkTrailingZeros64-16 2.78 1.62 -41.73%
BenchmarkOnesCount-16 3.19 0.93 -70.85%
BenchmarkOnesCount32-16 2.55 1.18 -53.73%
BenchmarkOnesCount64-16 3.22 0.93 -71.12%
Update #18616
I also made a change to bits_test.go because when debugging some failures
the output was not quite providing the right argument information.
Change-Id: Ia58d31d1777cf4582a4505f85b11a1202ca07d3e
Reviewed-on: https://go-review.googlesource.com/41630
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
For details see the discussion on the issue below.
RotateLeft functions can now be inlined because they don't panic
anymore for negative rotation counts.
name old time/op new time/op delta
RotateLeft-8 6.72ns ± 2% 1.86ns ± 0% -72.33% (p=0.016 n=5+4)
RotateLeft8-8 4.41ns ± 2% 1.67ns ± 1% -62.15% (p=0.008 n=5+5)
RotateLeft16-8 4.46ns ± 6% 1.65ns ± 0% -63.06% (p=0.008 n=5+5)
RotateLeft32-8 4.50ns ± 5% 1.67ns ± 1% -62.86% (p=0.008 n=5+5)
RotateLeft64-8 4.54ns ± 1% 1.85ns ± 1% -59.32% (p=0.008 n=5+5)
https://perf.golang.org/search?q=upload:20170411.4
(Measured on 2.3 GHz Intel Core i7 running macOS 10.12.3.)
For #18616.
Change-Id: I0828d80d54ec24f8d44954a57b3d6aeedb69c686
Reviewed-on: https://go-review.googlesource.com/40394
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
Removes an extra function call for TrailingZeros and thus may
increase chances for inlining.
Change-Id: Iefd8d4402dc89b64baf4e5c865eb3dadade623af
Reviewed-on: https://go-review.googlesource.com/37613
Run-TryBot: Robert Griesemer <gri@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
benchmark old ns/op new ns/op delta
BenchmarkLeadingZeros-8 8.43 3.10 -63.23%
BenchmarkLeadingZeros8-8 8.13 1.33 -83.64%
BenchmarkLeadingZeros16-8 7.34 2.07 -71.80%
BenchmarkLeadingZeros32-8 7.99 2.87 -64.08%
BenchmarkLeadingZeros64-8 8.13 2.96 -63.59%
Measured on 2.3 GHz Intel Core i7 running macOS 10.12.3.
Change-Id: Id343531b408d42ac45f10c76f60e85bdb977f91e
Reviewed-on: https://go-review.googlesource.com/37582
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
For sizes > 8, the existing code is faster.
benchmark old ns/op new ns/op delta
BenchmarkTrailingZeros8-8 1.95 1.29 -33.85%
Measured on 2.3 GHz Intel Core i7 running macOS 10.12.3.
Change-Id: I6f3a33ec633a2c544ec29693c141f2f99335c745
Reviewed-on: https://go-review.googlesource.com/37581
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
For uint64, the existing algorithm is faster.
benchmark old ns/op new ns/op delta
BenchmarkOnesCount8-8 1.95 0.97 -50.26%
BenchmarkOnesCount16-8 2.54 1.39 -45.28%
BenchmarkOnesCount32-8 2.61 1.96 -24.90%
Measured on 2.3 GHz Intel Core i7 running macOS 10.12.3.
Change-Id: I6cc42882fef3d24694720464039161e339a9ae99
Reviewed-on: https://go-review.googlesource.com/37580
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
Measured on 2.3 GHz Intel Core i7, running macOS 10.12.3:
benchmark old ns/op new ns/op delta
BenchmarkReverse8-8 1.70 0.99 -41.76%
BenchmarkReverse16-8 2.24 1.32 -41.07%
Fixes #19279.
Change-Id: I398cf8a3513b7fa63c130efc7846a7c5353999d4
Reviewed-on: https://go-review.googlesource.com/37459
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
Change-Id: I3e40018ab1903d3b9ada7ad7812ba71ea2a428e7
Reviewed-on: https://go-review.googlesource.com/37456
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
Using some additional suggestions per "Hacker's Delight".
Added documentation and extra tests.
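For readers unfamiliar with the technique, the following is a sketch of a parallel-sum population count in the style described in "Hacker's Delight". The function name `onesCount64` and the exact sequence of steps are assumptions for illustration and may differ from the code in this change.

```go
package main

import (
	"fmt"
	"math/bits"
)

// onesCount64 counts the set bits in x by summing adjacent fields of
// increasing width: 1-bit fields into 2-bit sums, 2-bit into 4-bit,
// and 4-bit into per-byte counts, then folding the bytes together.
func onesCount64(x uint64) int {
	const (
		m0 = 0x5555555555555555 // 0101...
		m1 = 0x3333333333333333 // 0011...
		m2 = 0x0f0f0f0f0f0f0f0f // 00001111...
	)
	x = (x>>1)&m0 + x&m0
	x = (x>>2)&m1 + x&m1
	x = (x>>4 + x) & m2 // each byte now holds a count in 0..8
	x += x >> 8
	x += x >> 16
	x += x >> 32
	// The total is at most 64, so it fits in the low 7 bits; the
	// higher bits hold partial sums that are discarded.
	return int(x) & (1<<7 - 1)
}

func main() {
	for _, x := range []uint64{0, 1, 0xFF, 0x8000000000000001, ^uint64(0)} {
		fmt.Println(onesCount64(x), bits.OnesCount64(x))
	}
}
```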
Measured on 1.7 GHz Intel Core i7, running macOS 10.12.3.
benchmark old ns/op new ns/op delta
BenchmarkOnesCount-4 7.34 5.38 -26.70%
BenchmarkOnesCount8-4 2.03 1.98 -2.46%
BenchmarkOnesCount16-4 2.56 2.50 -2.34%
BenchmarkOnesCount32-4 2.98 2.39 -19.80%
BenchmarkOnesCount64-4 4.22 2.96 -29.86%
Change-Id: I566b0ef766e55cf5776b1662b6016024ebe5d878
Reviewed-on: https://go-review.googlesource.com/37223
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
BenchmarkLeadingZeros-8 200000000 8.80 ns/op
BenchmarkLeadingZeros8-8 200000000 8.21 ns/op
BenchmarkLeadingZeros16-8 200000000 7.49 ns/op
BenchmarkLeadingZeros32-8 200000000 7.80 ns/op
BenchmarkLeadingZeros64-8 200000000 8.67 ns/op
BenchmarkTrailingZeros-8 1000000000 2.05 ns/op
BenchmarkTrailingZeros8-8 2000000000 1.94 ns/op
BenchmarkTrailingZeros16-8 2000000000 1.94 ns/op
BenchmarkTrailingZeros32-8 2000000000 1.92 ns/op
BenchmarkTrailingZeros64-8 2000000000 2.03 ns/op
Change-Id: I45497bf2d6369ba6cfc88ded05aa735908af8908
Reviewed-on: https://go-review.googlesource.com/37220
Run-TryBot: Robert Griesemer <gri@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
Measured on 2.3 GHz Intel Core i7, running macOS 10.12.3.
benchmark old ns/op new ns/op delta
BenchmarkRotateLeft-8 7.87 7.00 -11.05%
BenchmarkRotateLeft8-8 8.41 4.52 -46.25%
BenchmarkRotateLeft16-8 8.07 4.55 -43.62%
BenchmarkRotateLeft32-8 8.36 4.73 -43.42%
BenchmarkRotateLeft64-8 7.93 4.78 -39.72%
BenchmarkRotateRight-8 8.23 6.72 -18.35%
BenchmarkRotateRight8-8 8.76 4.39 -49.89%
BenchmarkRotateRight16-8 9.07 4.44 -51.05%
BenchmarkRotateRight32-8 8.85 4.46 -49.60%
BenchmarkRotateRight64-8 8.11 4.43 -45.38%
Change-Id: I79ea1e9e6fc65f95794a91f860a911efed3aa8a1
Reviewed-on: https://go-review.googlesource.com/37219
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
Also: Changed Reverse/ReverseBytes implementations to use
the same (smaller) masks as OnesCount.
benchmark old ns/op new ns/op delta
BenchmarkOnesCount-8 37.0 6.26 -83.08%
BenchmarkOnesCount8-8 7.24 1.99 -72.51%
BenchmarkOnesCount16-8 11.3 2.47 -78.14%
BenchmarkOnesCount32-8 18.4 3.02 -83.59%
BenchmarkOnesCount64-8 40.0 3.78 -90.55%
BenchmarkReverse-8 6.69 6.22 -7.03%
BenchmarkReverse8-8 1.64 1.64 +0.00%
BenchmarkReverse16-8 2.26 2.18 -3.54%
BenchmarkReverse32-8 2.88 2.87 -0.35%
BenchmarkReverse64-8 5.64 4.34 -23.05%
BenchmarkReverseBytes-8 2.48 2.17 -12.50%
BenchmarkReverseBytes16-8 0.63 0.95 +50.79%
BenchmarkReverseBytes32-8 1.13 1.24 +9.73%
BenchmarkReverseBytes64-8 2.50 2.16 -13.60%
name old time/op new time/op delta
OnesCount-8 37.0ns ± 0% 6.3ns ± 0% ~ (p=1.000 n=1+1)
OnesCount8-8 7.24ns ± 0% 1.99ns ± 0% ~ (p=1.000 n=1+1)
OnesCount16-8 11.3ns ± 0% 2.5ns ± 0% ~ (p=1.000 n=1+1)
OnesCount32-8 18.4ns ± 0% 3.0ns ± 0% ~ (p=1.000 n=1+1)
OnesCount64-8 40.0ns ± 0% 3.8ns ± 0% ~ (p=1.000 n=1+1)
Reverse-8 6.69ns ± 0% 6.22ns ± 0% ~ (p=1.000 n=1+1)
Reverse8-8 1.64ns ± 0% 1.64ns ± 0% ~ (all samples are equal)
Reverse16-8 2.26ns ± 0% 2.18ns ± 0% ~ (p=1.000 n=1+1)
Reverse32-8 2.88ns ± 0% 2.87ns ± 0% ~ (p=1.000 n=1+1)
Reverse64-8 5.64ns ± 0% 4.34ns ± 0% ~ (p=1.000 n=1+1)
ReverseBytes-8 2.48ns ± 0% 2.17ns ± 0% ~ (p=1.000 n=1+1)
ReverseBytes16-8 0.63ns ± 0% 0.95ns ± 0% ~ (p=1.000 n=1+1)
ReverseBytes32-8 1.13ns ± 0% 1.24ns ± 0% ~ (p=1.000 n=1+1)
ReverseBytes64-8 2.50ns ± 0% 2.16ns ± 0% ~ (p=1.000 n=1+1)
Change-Id: I591b0ffc83fc3a42828256b6e5030f32c64f9497
Reviewed-on: https://go-review.googlesource.com/37218
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
- moved from: x&m>>k | x&^m<<k to: x&m>>k | x<<k&m
This permits use of the same constant m twice (*) which may be
better for machines that can't use large immediate constants
directly with an AND instruction and have to load them explicitly.
*) CPUs don't usually have a &^ instruction, so x&^m becomes x&(^m)
- simplified returns
This improves the generated code because the compiler recognizes
x>>k | x<<k as ROT when k is half the bitsize of x.
The 8-bit versions of these instructions can be significantly faster
still if they are replaced with table lookups, as long as the table
is in cache. If the table is not in cache, table-lookup is probably
slower, hence the choice of an explicit register-only implementation
for now.
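The two points above can be sketched for the 32-bit byte reversal: a single mask constant is applied to both halves of the swap, and the final step is a plain shift pair that a compiler can turn into a rotate. The function name `reverseBytes32` and the concrete mask are assumptions for this example.

```go
package main

import (
	"fmt"
	"math/bits"
)

// reverseBytes32 reverses the byte order of x. The first step swaps
// adjacent bytes using the same mask m on both sides; the second
// step swaps the two 16-bit halves, which is a rotate by 16.
func reverseBytes32(x uint32) uint32 {
	const m = 0x00ff00ff
	x = (x>>8)&m | (x&m)<<8
	return x>>16 | x<<16
}

func main() {
	x := uint32(0x11223344)
	fmt.Printf("%#x %#x\n", reverseBytes32(x), bits.ReverseBytes32(x))
}
```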
benchmark old ns/op new ns/op delta
BenchmarkReverse-8 8.50 6.86 -19.29%
BenchmarkReverse8-8 2.17 1.74 -19.82%
BenchmarkReverse16-8 2.89 2.34 -19.03%
BenchmarkReverse32-8 3.55 2.95 -16.90%
BenchmarkReverse64-8 6.81 5.57 -18.21%
BenchmarkReverseBytes-8 3.49 2.48 -28.94%
BenchmarkReverseBytes16-8 0.93 0.62 -33.33%
BenchmarkReverseBytes32-8 1.55 1.13 -27.10%
BenchmarkReverseBytes64-8 2.47 2.47 +0.00%
name old time/op new time/op delta
Reverse-8 8.50ns ± 0% 6.86ns ± 0% ~ (p=1.000 n=1+1)
Reverse8-8 2.17ns ± 0% 1.74ns ± 0% ~ (p=1.000 n=1+1)
Reverse16-8 2.89ns ± 0% 2.34ns ± 0% ~ (p=1.000 n=1+1)
Reverse32-8 3.55ns ± 0% 2.95ns ± 0% ~ (p=1.000 n=1+1)
Reverse64-8 6.81ns ± 0% 5.57ns ± 0% ~ (p=1.000 n=1+1)
ReverseBytes-8 3.49ns ± 0% 2.48ns ± 0% ~ (p=1.000 n=1+1)
ReverseBytes16-8 0.93ns ± 0% 0.62ns ± 0% ~ (p=1.000 n=1+1)
ReverseBytes32-8 1.55ns ± 0% 1.13ns ± 0% ~ (p=1.000 n=1+1)
ReverseBytes64-8 2.47ns ± 0% 2.47ns ± 0% ~ (all samples are equal)
Change-Id: I0064de8c7e0e568ca7885d6f7064344bef91a06d
Reviewed-on: https://go-review.googlesource.com/37215
Run-TryBot: Robert Griesemer <gri@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
|
Sum up function results and store them in an exported (global)
variable. This prevents the compiler from optimizing away the
otherwise side-effect free function calls.
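A sketch of the pattern follows. The names `Output`, `sumOnesCount`, and `BenchmarkOnesCount` are chosen for this example and are assumptions; the benchmark is driven through `testing.Benchmark` so that the file runs as an ordinary program rather than under `go test`.

```go
package main

import (
	"fmt"
	"math/bits"
	"testing"
)

// Output is an exported global sink. Storing the accumulated result
// here gives the otherwise side-effect free calls an observable use.
var Output int

// sumOnesCount accumulates the results of n calls and stores the
// total in Output.
func sumOnesCount(n int) int {
	var s int
	for i := 0; i < n; i++ {
		s += bits.OnesCount(uint(i))
	}
	Output = s
	return s
}

// BenchmarkOnesCount has the usual benchmark signature.
func BenchmarkOnesCount(b *testing.B) {
	sumOnesCount(b.N)
}

func main() {
	testing.Init() // needed when calling testing.Benchmark outside "go test"
	fmt.Println(testing.Benchmark(BenchmarkOnesCount))
}
```

Without the sink, a compiler is free to drop calls whose results are never used, which would explain the implausibly low sub-nanosecond timings in some of the earlier measurements.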
We now have a more realistic set of benchmark numbers...
Measured on 2.3 GHz Intel Core i7, running macOS 10.12.3.
Note: These measurements are based on the same "old"
implementation as the prior measurements (commit 7d5c003).
benchmark old ns/op new ns/op delta
BenchmarkReverse-8 72.9 8.50 -88.34%
BenchmarkReverse8-8 13.2 2.17 -83.56%
BenchmarkReverse16-8 21.2 2.89 -86.37%
BenchmarkReverse32-8 36.3 3.55 -90.22%
BenchmarkReverse64-8 71.3 6.81 -90.45%
BenchmarkReverseBytes-8 11.2 3.49 -68.84%
BenchmarkReverseBytes16-8 6.24 0.93 -85.10%
BenchmarkReverseBytes32-8 7.40 1.55 -79.05%
BenchmarkReverseBytes64-8 10.5 2.47 -76.48%
name old time/op new time/op delta
Reverse-8 72.9ns ± 0% 8.5ns ± 0% ~ (p=1.000 n=1+1)
Reverse8-8 13.2ns ± 0% 2.2ns ± 0% ~ (p=1.000 n=1+1)
Reverse16-8 21.2ns ± 0% 2.9ns ± 0% ~ (p=1.000 n=1+1)
Reverse32-8 36.3ns ± 0% 3.5ns ± 0% ~ (p=1.000 n=1+1)
Reverse64-8 71.3ns ± 0% 6.8ns ± 0% ~ (p=1.000 n=1+1)
ReverseBytes-8 11.2ns ± 0% 3.5ns ± 0% ~ (p=1.000 n=1+1)
ReverseBytes16-8 6.24ns ± 0% 0.93ns ± 0% ~ (p=1.000 n=1+1)
ReverseBytes32-8 7.40ns ± 0% 1.55ns ± 0% ~ (p=1.000 n=1+1)
ReverseBytes64-8 10.5ns ± 0% 2.5ns ± 0% ~ (p=1.000 n=1+1)
Change-Id: I8aef1334b84f6cafd25edccad7e6868b37969efb
Reviewed-on: https://go-review.googlesource.com/37213
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
Measured on 2.3 GHz Intel Core i7, running macOS 10.12.3.
benchmark old ns/op new ns/op delta
BenchmarkReverseBytes-8 11.4 3.51 -69.21%
BenchmarkReverseBytes16-8 6.87 0.64 -90.68%
BenchmarkReverseBytes32-8 7.79 0.65 -91.66%
BenchmarkReverseBytes64-8 11.6 0.64 -94.48%
name old time/op new time/op delta
ReverseBytes-8 11.4ns ± 0% 3.5ns ± 0% ~ (p=1.000 n=1+1)
ReverseBytes16-8 6.87ns ± 0% 0.64ns ± 0% ~ (p=1.000 n=1+1)
ReverseBytes32-8 7.79ns ± 0% 0.65ns ± 0% ~ (p=1.000 n=1+1)
ReverseBytes64-8 11.6ns ± 0% 0.6ns ± 0% ~ (p=1.000 n=1+1)
Change-Id: I67b529652b3b613c61687e9e185e8d4ee40c51a2
Reviewed-on: https://go-review.googlesource.com/37211
Run-TryBot: Robert Griesemer <gri@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
Measured on 2.3 GHz Intel Core i7, running macOS 10.12.3.
name old time/op new time/op delta
Reverse-8 76.6ns ± 0% 8.1ns ± 0% ~ (p=1.000 n=1+1)
Reverse8-8 12.6ns ± 0% 0.6ns ± 0% ~ (p=1.000 n=1+1)
Reverse16-8 20.8ns ± 0% 0.6ns ± 0% ~ (p=1.000 n=1+1)
Reverse32-8 36.5ns ± 0% 0.6ns ± 0% ~ (p=1.000 n=1+1)
Reverse64-8 74.0ns ± 0% 6.4ns ± 0% ~ (p=1.000 n=1+1)
benchmark old ns/op new ns/op delta
BenchmarkReverse-8 76.6 8.07 -89.46%
BenchmarkReverse8-8 12.6 0.64 -94.92%
BenchmarkReverse16-8 20.8 0.64 -96.92%
BenchmarkReverse32-8 36.5 0.64 -98.25%
BenchmarkReverse64-8 74.0 6.38 -91.38%
Change-Id: I6b99b10cee2f2babfe79342b50ee36a45a34da30
Reviewed-on: https://go-review.googlesource.com/37149
Run-TryBot: Robert Griesemer <gri@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
Follow-up on https://go-review.googlesource.com/36315.
No functionality change.
For #18616.
Change-Id: Id4df34dd7d0381be06eea483a11bf92f4a01f604
Reviewed-on: https://go-review.googlesource.com/37140
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
|
Initial platform-independent implementation.
For #18616.
Change-Id: I4585c55b963101af9059c06c1b8a866cb384754c
Reviewed-on: https://go-review.googlesource.com/36315
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
|