<feed xmlns='http://www.w3.org/2005/Atom'>
<title>delta/go-git.git/src/hash, branch dev.inline</title>
<subtitle>github.com: golang/go
</subtitle>
<link rel='alternate' type='text/html' href='http://91.123.203.49/cgit/delta/go-git.git/'/>
<entry>
<title>hash/crc32: cleanup code and improve tests</title>
<updated>2016-08-31T15:17:57+00:00</updated>
<author>
<name>Radu Berinde</name>
<email>radu@cockroachlabs.com</email>
</author>
<published>2016-08-28T18:36:06+00:00</published>
<link rel='alternate' type='text/html' href='http://91.123.203.49/cgit/delta/go-git.git/commit/?id=bdde10137b3e67383f38329b02b329a906b78d5d'/>
<id>bdde10137b3e67383f38329b02b329a906b78d5d</id>
<content type='text'>
Major reorganization of the crc32 code:

 - The arch-specific files now implement a well-defined interface
   (documented in crc32.go). They no longer have the responsibility of
   initializing and falling back to a non-accelerated implementation;
   instead, that happens in the higher level code.

 - The non-accelerated algorithms are moved to a separate file with no
   dependencies on other code.

 - The "cutoff" optimization for slicing-by-8 is moved inside the
   algorithm itself (as opposed to every callsite).

Tests are significantly improved:
 - direct tests for the non-accelerated algorithms.
 - "cross-check" tests for arch-specific implementations (all archs).
 - tests for misaligned buffers for both IEEE and Castagnoli.

Fixes #16909.

Change-Id: I9b6dd83b7a57cd615eae901c0a6d61c6b8091c74
Reviewed-on: https://go-review.googlesource.com/27935
Reviewed-by: Keith Randall &lt;khr@golang.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Major reorganization of the crc32 code:

 - The arch-specific files now implement a well-defined interface
   (documented in crc32.go). They no longer have the responsibility of
   initializing and falling back to a non-accelerated implementation;
   instead, that happens in the higher level code.

 - The non-accelerated algorithms are moved to a separate file with no
   dependencies on other code.

 - The "cutoff" optimization for slicing-by-8 is moved inside the
   algorithm itself (as opposed to every callsite).

Tests are significantly improved:
 - direct tests for the non-accelerated algorithms.
 - "cross-check" tests for arch-specific implementations (all archs).
 - tests for misaligned buffers for both IEEE and Castagnoli.

Fixes #16909.

Change-Id: I9b6dd83b7a57cd615eae901c0a6d61c6b8091c74
Reviewed-on: https://go-review.googlesource.com/27935
Reviewed-by: Keith Randall &lt;khr@golang.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>hash/crc32: fix nil Castagnoli table problem</title>
<updated>2016-08-28T19:01:07+00:00</updated>
<author>
<name>Radu Berinde</name>
<email>radu@cockroachlabs.com</email>
</author>
<published>2016-08-28T15:47:14+00:00</published>
<link rel='alternate' type='text/html' href='http://91.123.203.49/cgit/delta/go-git.git/commit/?id=8c15a1725147692e1106f2e32fae657e1b7f27aa'/>
<id>8c15a1725147692e1106f2e32fae657e1b7f27aa</id>
<content type='text'>
When SSE is available, we don't need the Table. However, it is
returned as a handle by MakeTable. Fix this to always generate
the table.

Further cleanup is discussed in #16909.

Change-Id: Ic05400d68c6b5d25073ebd962000451746137afc
Reviewed-on: https://go-review.googlesource.com/27934
Reviewed-by: Brad Fitzpatrick &lt;bradfitz@golang.org&gt;
Run-TryBot: Brad Fitzpatrick &lt;bradfitz@golang.org&gt;
TryBot-Result: Gobot Gobot &lt;gobot@golang.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When SSE is available, we don't need the Table. However, it is
returned as a handle by MakeTable. Fix this to always generate
the table.

Further cleanup is discussed in #16909.

Change-Id: Ic05400d68c6b5d25073ebd962000451746137afc
Reviewed-on: https://go-review.googlesource.com/27934
Reviewed-by: Brad Fitzpatrick &lt;bradfitz@golang.org&gt;
Run-TryBot: Brad Fitzpatrick &lt;bradfitz@golang.org&gt;
TryBot-Result: Gobot Gobot &lt;gobot@golang.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>hash/crc32: improve the AMD64 implementation using SSE4.2</title>
<updated>2016-08-28T01:39:03+00:00</updated>
<author>
<name>Radu Berinde</name>
<email>radu@cockroachlabs.com</email>
</author>
<published>2016-08-27T17:17:30+00:00</published>
<link rel='alternate' type='text/html' href='http://91.123.203.49/cgit/delta/go-git.git/commit/?id=90c3cf4b52cc9373a96009da0d013019c1f5bcd8'/>
<id>90c3cf4b52cc9373a96009da0d013019c1f5bcd8</id>
<content type='text'>
The algorithm is explained in the comments. The improvement in
throughput is about 1.4x for buffers between 500b-4Kb and 2.5x-2.6x
for larger buffers.

Additionally, we no longer initialize the software tables if SSE4.2 is
available.

Adding a test for the SSE implementation (restricted to amd64 and
amd64p32).

Benchmarks on a Haswell i5-4670 @ 3.4 GHz:

name                           old time/op    new time/op     delta
CastagnoliCrc15B-4               21.9ns ± 1%     22.9ns ± 0%    +4.45%
CastagnoliCrc15BMisaligned-4     22.6ns ± 0%     23.4ns ± 0%    +3.43%
CastagnoliCrc40B-4               23.3ns ± 0%     23.9ns ± 0%    +2.58%
CastagnoliCrc40BMisaligned-4     25.4ns ± 0%     26.1ns ± 0%    +2.86%
CastagnoliCrc512-4               72.6ns ± 0%     52.8ns ± 0%   -27.33%
CastagnoliCrc512Misaligned-4     76.3ns ± 1%     56.3ns ± 0%   -26.18%
CastagnoliCrc1KB-4                128ns ± 1%       89ns ± 0%   -30.04%
CastagnoliCrc1KBMisaligned-4      130ns ± 0%       88ns ± 0%   -32.65%
CastagnoliCrc4KB-4                461ns ± 0%      187ns ± 0%   -59.40%
CastagnoliCrc4KBMisaligned-4      463ns ± 0%      191ns ± 0%   -58.77%
CastagnoliCrc32KB-4              3.58µs ± 0%     1.35µs ± 0%   -62.22%
CastagnoliCrc32KBMisaligned-4    3.58µs ± 0%     1.36µs ± 0%   -61.84%

name                           old speed      new speed       delta
CastagnoliCrc15B-4              684MB/s ± 1%    655MB/s ± 0%    -4.32%
CastagnoliCrc15BMisaligned-4    663MB/s ± 0%    641MB/s ± 0%    -3.32%
CastagnoliCrc40B-4             1.72GB/s ± 0%   1.67GB/s ± 0%    -2.69%
CastagnoliCrc40BMisaligned-4   1.58GB/s ± 0%   1.53GB/s ± 0%    -2.82%
CastagnoliCrc512-4             7.05GB/s ± 0%   9.70GB/s ± 0%   +37.59%
CastagnoliCrc512Misaligned-4   6.71GB/s ± 1%   9.09GB/s ± 0%   +35.43%
CastagnoliCrc1KB-4             7.98GB/s ± 1%  11.46GB/s ± 0%   +43.55%
CastagnoliCrc1KBMisaligned-4   7.86GB/s ± 0%  11.70GB/s ± 0%   +48.75%
CastagnoliCrc4KB-4             8.87GB/s ± 0%  21.80GB/s ± 0%  +145.69%
CastagnoliCrc4KBMisaligned-4   8.83GB/s ± 0%  21.39GB/s ± 0%  +142.25%
CastagnoliCrc32KB-4            9.15GB/s ± 0%  24.22GB/s ± 0%  +164.62%
CastagnoliCrc32KBMisaligned-4  9.16GB/s ± 0%  24.00GB/s ± 0%  +161.94%

Fixes #16107.

Change-Id: Ibe50ea76574674ce0571ef31c31015e0ed66b907
Reviewed-on: https://go-review.googlesource.com/27931
Run-TryBot: Brad Fitzpatrick &lt;bradfitz@golang.org&gt;
TryBot-Result: Gobot Gobot &lt;gobot@golang.org&gt;
Reviewed-by: Brad Fitzpatrick &lt;bradfitz@golang.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The algorithm is explained in the comments. The improvement in
throughput is about 1.4x for buffers between 500b-4Kb and 2.5x-2.6x
for larger buffers.

Additionally, we no longer initialize the software tables if SSE4.2 is
available.

Adding a test for the SSE implementation (restricted to amd64 and
amd64p32).

Benchmarks on a Haswell i5-4670 @ 3.4 GHz:

name                           old time/op    new time/op     delta
CastagnoliCrc15B-4               21.9ns ± 1%     22.9ns ± 0%    +4.45%
CastagnoliCrc15BMisaligned-4     22.6ns ± 0%     23.4ns ± 0%    +3.43%
CastagnoliCrc40B-4               23.3ns ± 0%     23.9ns ± 0%    +2.58%
CastagnoliCrc40BMisaligned-4     25.4ns ± 0%     26.1ns ± 0%    +2.86%
CastagnoliCrc512-4               72.6ns ± 0%     52.8ns ± 0%   -27.33%
CastagnoliCrc512Misaligned-4     76.3ns ± 1%     56.3ns ± 0%   -26.18%
CastagnoliCrc1KB-4                128ns ± 1%       89ns ± 0%   -30.04%
CastagnoliCrc1KBMisaligned-4      130ns ± 0%       88ns ± 0%   -32.65%
CastagnoliCrc4KB-4                461ns ± 0%      187ns ± 0%   -59.40%
CastagnoliCrc4KBMisaligned-4      463ns ± 0%      191ns ± 0%   -58.77%
CastagnoliCrc32KB-4              3.58µs ± 0%     1.35µs ± 0%   -62.22%
CastagnoliCrc32KBMisaligned-4    3.58µs ± 0%     1.36µs ± 0%   -61.84%

name                           old speed      new speed       delta
CastagnoliCrc15B-4              684MB/s ± 1%    655MB/s ± 0%    -4.32%
CastagnoliCrc15BMisaligned-4    663MB/s ± 0%    641MB/s ± 0%    -3.32%
CastagnoliCrc40B-4             1.72GB/s ± 0%   1.67GB/s ± 0%    -2.69%
CastagnoliCrc40BMisaligned-4   1.58GB/s ± 0%   1.53GB/s ± 0%    -2.82%
CastagnoliCrc512-4             7.05GB/s ± 0%   9.70GB/s ± 0%   +37.59%
CastagnoliCrc512Misaligned-4   6.71GB/s ± 1%   9.09GB/s ± 0%   +35.43%
CastagnoliCrc1KB-4             7.98GB/s ± 1%  11.46GB/s ± 0%   +43.55%
CastagnoliCrc1KBMisaligned-4   7.86GB/s ± 0%  11.70GB/s ± 0%   +48.75%
CastagnoliCrc4KB-4             8.87GB/s ± 0%  21.80GB/s ± 0%  +145.69%
CastagnoliCrc4KBMisaligned-4   8.83GB/s ± 0%  21.39GB/s ± 0%  +142.25%
CastagnoliCrc32KB-4            9.15GB/s ± 0%  24.22GB/s ± 0%  +164.62%
CastagnoliCrc32KBMisaligned-4  9.16GB/s ± 0%  24.00GB/s ± 0%  +161.94%

Fixes #16107.

Change-Id: Ibe50ea76574674ce0571ef31c31015e0ed66b907
Reviewed-on: https://go-review.googlesource.com/27931
Run-TryBot: Brad Fitzpatrick &lt;bradfitz@golang.org&gt;
TryBot-Result: Gobot Gobot &lt;gobot@golang.org&gt;
Reviewed-by: Brad Fitzpatrick &lt;bradfitz@golang.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Revert "hash/crc32: improve the AMD64 implementation using SSE4.2"</title>
<updated>2016-08-27T16:49:02+00:00</updated>
<author>
<name>Keith Randall</name>
<email>khr@golang.org</email>
</author>
<published>2016-08-27T16:48:22+00:00</published>
<link rel='alternate' type='text/html' href='http://91.123.203.49/cgit/delta/go-git.git/commit/?id=3427f16642a1c207db4a4c3cce912dfdce2ac9f5'/>
<id>3427f16642a1c207db4a4c3cce912dfdce2ac9f5</id>
<content type='text'>
This reverts commit 54d7de7dd62bab764125c48fd159bb938da078e1.

It was breaking non-amd64 builds.

Change-Id: I22650e922498eeeba3d4fa08bb4ea40a210c8f97
Reviewed-on: https://go-review.googlesource.com/27925
Reviewed-by: Keith Randall &lt;khr@golang.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This reverts commit 54d7de7dd62bab764125c48fd159bb938da078e1.

It was breaking non-amd64 builds.

Change-Id: I22650e922498eeeba3d4fa08bb4ea40a210c8f97
Reviewed-on: https://go-review.googlesource.com/27925
Reviewed-by: Keith Randall &lt;khr@golang.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>hash/crc32: improve the AMD64 implementation using SSE4.2</title>
<updated>2016-08-27T15:50:28+00:00</updated>
<author>
<name>Radu Berinde</name>
<email>radu@cockroachlabs.com</email>
</author>
<published>2016-08-23T21:33:21+00:00</published>
<link rel='alternate' type='text/html' href='http://91.123.203.49/cgit/delta/go-git.git/commit/?id=54d7de7dd62bab764125c48fd159bb938da078e1'/>
<id>54d7de7dd62bab764125c48fd159bb938da078e1</id>
<content type='text'>
The algorithm is explained in the comments. The improvement in
throughput is about 1.4x for buffers between 500b-4Kb and 2.5x-2.6x
for larger buffers.

Additionally, we no longer initialize the software tables if SSE4.2 is
available.

Benchmarks on a Haswell i5-4670 @ 3.4 GHz:

name                           old time/op    new time/op     delta
CastagnoliCrc15B-4               21.9ns ± 1%     22.9ns ± 0%    +4.45%
CastagnoliCrc15BMisaligned-4     22.6ns ± 0%     23.4ns ± 0%    +3.43%
CastagnoliCrc40B-4               23.3ns ± 0%     23.9ns ± 0%    +2.58%
CastagnoliCrc40BMisaligned-4     25.4ns ± 0%     26.1ns ± 0%    +2.86%
CastagnoliCrc512-4               72.6ns ± 0%     52.8ns ± 0%   -27.33%
CastagnoliCrc512Misaligned-4     76.3ns ± 1%     56.3ns ± 0%   -26.18%
CastagnoliCrc1KB-4                128ns ± 1%       89ns ± 0%   -30.04%
CastagnoliCrc1KBMisaligned-4      130ns ± 0%       88ns ± 0%   -32.65%
CastagnoliCrc4KB-4                461ns ± 0%      187ns ± 0%   -59.40%
CastagnoliCrc4KBMisaligned-4      463ns ± 0%      191ns ± 0%   -58.77%
CastagnoliCrc32KB-4              3.58µs ± 0%     1.35µs ± 0%   -62.22%
CastagnoliCrc32KBMisaligned-4    3.58µs ± 0%     1.36µs ± 0%   -61.84%

name                           old speed      new speed       delta
CastagnoliCrc15B-4              684MB/s ± 1%    655MB/s ± 0%    -4.32%
CastagnoliCrc15BMisaligned-4    663MB/s ± 0%    641MB/s ± 0%    -3.32%
CastagnoliCrc40B-4             1.72GB/s ± 0%   1.67GB/s ± 0%    -2.69%
CastagnoliCrc40BMisaligned-4   1.58GB/s ± 0%   1.53GB/s ± 0%    -2.82%
CastagnoliCrc512-4             7.05GB/s ± 0%   9.70GB/s ± 0%   +37.59%
CastagnoliCrc512Misaligned-4   6.71GB/s ± 1%   9.09GB/s ± 0%   +35.43%
CastagnoliCrc1KB-4             7.98GB/s ± 1%  11.46GB/s ± 0%   +43.55%
CastagnoliCrc1KBMisaligned-4   7.86GB/s ± 0%  11.70GB/s ± 0%   +48.75%
CastagnoliCrc4KB-4             8.87GB/s ± 0%  21.80GB/s ± 0%  +145.69%
CastagnoliCrc4KBMisaligned-4   8.83GB/s ± 0%  21.39GB/s ± 0%  +142.25%
CastagnoliCrc32KB-4            9.15GB/s ± 0%  24.22GB/s ± 0%  +164.62%
CastagnoliCrc32KBMisaligned-4  9.16GB/s ± 0%  24.00GB/s ± 0%  +161.94%

Fixes #16107.

Change-Id: I8fa827ec03f708ba27ee71c833f7544ad9dc5bc3
Reviewed-on: https://go-review.googlesource.com/24471
Reviewed-by: Keith Randall &lt;khr@golang.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The algorithm is explained in the comments. The improvement in
throughput is about 1.4x for buffers between 500b-4Kb and 2.5x-2.6x
for larger buffers.

Additionally, we no longer initialize the software tables if SSE4.2 is
available.

Benchmarks on a Haswell i5-4670 @ 3.4 GHz:

name                           old time/op    new time/op     delta
CastagnoliCrc15B-4               21.9ns ± 1%     22.9ns ± 0%    +4.45%
CastagnoliCrc15BMisaligned-4     22.6ns ± 0%     23.4ns ± 0%    +3.43%
CastagnoliCrc40B-4               23.3ns ± 0%     23.9ns ± 0%    +2.58%
CastagnoliCrc40BMisaligned-4     25.4ns ± 0%     26.1ns ± 0%    +2.86%
CastagnoliCrc512-4               72.6ns ± 0%     52.8ns ± 0%   -27.33%
CastagnoliCrc512Misaligned-4     76.3ns ± 1%     56.3ns ± 0%   -26.18%
CastagnoliCrc1KB-4                128ns ± 1%       89ns ± 0%   -30.04%
CastagnoliCrc1KBMisaligned-4      130ns ± 0%       88ns ± 0%   -32.65%
CastagnoliCrc4KB-4                461ns ± 0%      187ns ± 0%   -59.40%
CastagnoliCrc4KBMisaligned-4      463ns ± 0%      191ns ± 0%   -58.77%
CastagnoliCrc32KB-4              3.58µs ± 0%     1.35µs ± 0%   -62.22%
CastagnoliCrc32KBMisaligned-4    3.58µs ± 0%     1.36µs ± 0%   -61.84%

name                           old speed      new speed       delta
CastagnoliCrc15B-4              684MB/s ± 1%    655MB/s ± 0%    -4.32%
CastagnoliCrc15BMisaligned-4    663MB/s ± 0%    641MB/s ± 0%    -3.32%
CastagnoliCrc40B-4             1.72GB/s ± 0%   1.67GB/s ± 0%    -2.69%
CastagnoliCrc40BMisaligned-4   1.58GB/s ± 0%   1.53GB/s ± 0%    -2.82%
CastagnoliCrc512-4             7.05GB/s ± 0%   9.70GB/s ± 0%   +37.59%
CastagnoliCrc512Misaligned-4   6.71GB/s ± 1%   9.09GB/s ± 0%   +35.43%
CastagnoliCrc1KB-4             7.98GB/s ± 1%  11.46GB/s ± 0%   +43.55%
CastagnoliCrc1KBMisaligned-4   7.86GB/s ± 0%  11.70GB/s ± 0%   +48.75%
CastagnoliCrc4KB-4             8.87GB/s ± 0%  21.80GB/s ± 0%  +145.69%
CastagnoliCrc4KBMisaligned-4   8.83GB/s ± 0%  21.39GB/s ± 0%  +142.25%
CastagnoliCrc32KB-4            9.15GB/s ± 0%  24.22GB/s ± 0%  +164.62%
CastagnoliCrc32KBMisaligned-4  9.16GB/s ± 0%  24.00GB/s ± 0%  +161.94%

Fixes #16107.

Change-Id: I8fa827ec03f708ba27ee71c833f7544ad9dc5bc3
Reviewed-on: https://go-review.googlesource.com/24471
Reviewed-by: Keith Randall &lt;khr@golang.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>hash/crc32: fix optimized s390x implementation</title>
<updated>2016-08-21T02:04:43+00:00</updated>
<author>
<name>Michael Munday</name>
<email>munday@ca.ibm.com</email>
</author>
<published>2016-08-21T01:09:53+00:00</published>
<link rel='alternate' type='text/html' href='http://91.123.203.49/cgit/delta/go-git.git/commit/?id=4b17b152a3a3d238669c93b31de34e87c2855f6e'/>
<id>4b17b152a3a3d238669c93b31de34e87c2855f6e</id>
<content type='text'>
The code wasn't checking to see if the data was still &gt;= 64 bytes
long after aligning it.

Aligning the data is an optimization and we don't actually need
to do it. In fact for smaller sizes it slows things down due to
the overhead of calling the generic function. Therefore for now
I have simply removed the alignment stage. I have also added a
check into the assembly to deliberately trigger a segmentation
fault if the data is too short.

Fixes #16779.

Change-Id: Ic01636d775efc5ec97689f050991cee04ce8fe73
Reviewed-on: https://go-review.googlesource.com/27409
Reviewed-by: Brad Fitzpatrick &lt;bradfitz@golang.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The code wasn't checking to see if the data was still &gt;= 64 bytes
long after aligning it.

Aligning the data is an optimization and we don't actually need
to do it. In fact for smaller sizes it slows things down due to
the overhead of calling the generic function. Therefore for now
I have simply removed the alignment stage. I have also added a
check into the assembly to deliberately trigger a segmentation
fault if the data is too short.

Fixes #16779.

Change-Id: Ic01636d775efc5ec97689f050991cee04ce8fe73
Reviewed-on: https://go-review.googlesource.com/27409
Reviewed-by: Brad Fitzpatrick &lt;bradfitz@golang.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>hash/crc32: improve the processing of the last bytes in the SSE4.2 code for AMD64</title>
<updated>2016-08-17T21:20:50+00:00</updated>
<author>
<name>Radu Berinde</name>
<email>radu@cockroachlabs.com</email>
</author>
<published>2016-08-16T12:05:39+00:00</published>
<link rel='alternate' type='text/html' href='http://91.123.203.49/cgit/delta/go-git.git/commit/?id=0c819b654f0e2b24f418aeac5c5627516905eda9'/>
<id>0c819b654f0e2b24f418aeac5c5627516905eda9</id>
<content type='text'>
This commit improves the processing of the final few bytes in
castagnoliSSE42: instead of processing one byte at a time, we use all
versions of the CRC32 instruction to process 4 bytes, then 2, then 1.
The difference is only noticeable for small "odd" sized buffers.

We do the similar improvement for processing the first few bytes in
the case of unaligned buffer.

Fixing the test which was not actually verifying the results for
misaligned buffers (WriteString was creating an internal copy which
was aligned).

Adding benchmarks for length 15 (aligned and misaligned), results
below.

name                          old time/op    new time/op    delta
CastagnoliCrc15B-4              25.1ns ± 0%    22.1ns ± 1%  -12.14%
CastagnoliCrc15BMisaligned-4    25.2ns ± 0%    22.9ns ± 1%   -9.03%
CastagnoliCrc40B-4              23.1ns ± 0%    23.4ns ± 0%   +1.08%
CastagnoliCrc1KB-4               127ns ± 0%     128ns ± 0%   +1.18%
CastagnoliCrc4KB-4               462ns ± 0%     464ns ± 0%     ~
CastagnoliCrc32KB-4             3.58µs ± 0%    3.60µs ± 0%   +0.58%

name                          old speed      new speed      delta
CastagnoliCrc15B-4             597MB/s ± 0%   679MB/s ± 1%  +13.77%
CastagnoliCrc15BMisaligned-4   596MB/s ± 0%   655MB/s ± 1%   +9.94%
CastagnoliCrc40B-4            1.73GB/s ± 0%  1.71GB/s ± 0%   -1.14%
CastagnoliCrc1KB-4            8.01GB/s ± 0%  7.93GB/s ± 1%   -1.06%
CastagnoliCrc4KB-4            8.86GB/s ± 0%  8.83GB/s ± 0%     ~
CastagnoliCrc32KB-4           9.14GB/s ± 0%  9.09GB/s ± 0%   -0.58%

Change-Id: I499e37af2241d28e3e5d522bbab836c1a718430a
Reviewed-on: https://go-review.googlesource.com/24470
Reviewed-by: Keith Randall &lt;khr@golang.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This commit improves the processing of the final few bytes in
castagnoliSSE42: instead of processing one byte at a time, we use all
versions of the CRC32 instruction to process 4 bytes, then 2, then 1.
The difference is only noticeable for small "odd" sized buffers.

We do the similar improvement for processing the first few bytes in
the case of unaligned buffer.

Fixing the test which was not actually verifying the results for
misaligned buffers (WriteString was creating an internal copy which
was aligned).

Adding benchmarks for length 15 (aligned and misaligned), results
below.

name                          old time/op    new time/op    delta
CastagnoliCrc15B-4              25.1ns ± 0%    22.1ns ± 1%  -12.14%
CastagnoliCrc15BMisaligned-4    25.2ns ± 0%    22.9ns ± 1%   -9.03%
CastagnoliCrc40B-4              23.1ns ± 0%    23.4ns ± 0%   +1.08%
CastagnoliCrc1KB-4               127ns ± 0%     128ns ± 0%   +1.18%
CastagnoliCrc4KB-4               462ns ± 0%     464ns ± 0%     ~
CastagnoliCrc32KB-4             3.58µs ± 0%    3.60µs ± 0%   +0.58%

name                          old speed      new speed      delta
CastagnoliCrc15B-4             597MB/s ± 0%   679MB/s ± 1%  +13.77%
CastagnoliCrc15BMisaligned-4   596MB/s ± 0%   655MB/s ± 1%   +9.94%
CastagnoliCrc40B-4            1.73GB/s ± 0%  1.71GB/s ± 0%   -1.14%
CastagnoliCrc1KB-4            8.01GB/s ± 0%  7.93GB/s ± 1%   -1.06%
CastagnoliCrc4KB-4            8.86GB/s ± 0%  8.83GB/s ± 0%     ~
CastagnoliCrc32KB-4           9.14GB/s ± 0%  9.09GB/s ± 0%   -0.58%

Change-Id: I499e37af2241d28e3e5d522bbab836c1a718430a
Reviewed-on: https://go-review.googlesource.com/24470
Reviewed-by: Keith Randall &lt;khr@golang.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>hash/crc64: Use slicing by 8.</title>
<updated>2016-05-18T14:38:04+00:00</updated>
<author>
<name>Ilya Tocar</name>
<email>ilya.tocar@intel.com</email>
</author>
<published>2016-04-19T16:05:53+00:00</published>
<link rel='alternate' type='text/html' href='http://91.123.203.49/cgit/delta/go-git.git/commit/?id=9d73e146dade6553f2365de2ada0156dcb6026d9'/>
<id>9d73e146dade6553f2365de2ada0156dcb6026d9</id>
<content type='text'>
Similar to crc32 slicing by 8.
This also fixes a Crc64KB benchmark actually using 1024 bytes.

Crc64/ISO64KB-4       147µs ± 0%      37µs ± 0%   -75.05%  (p=0.000 n=18+18)
Crc64/ISO4KB-4       9.19µs ± 0%    2.33µs ± 0%   -74.70%  (p=0.000 n=19+20)
Crc64/ISO1KB-4       2.31µs ± 0%    0.60µs ± 0%   -73.81%  (p=0.000 n=19+15)
Crc64/ECMA64KB-4      147µs ± 0%      37µs ± 0%   -75.05%  (p=0.000 n=20+20)
Crc64/Random64KB-4    147µs ± 0%      41µs ± 0%   -72.17%  (p=0.000 n=20+18)
Crc64/Random16KB-4   36.7µs ± 0%    36.5µs ± 0%    -0.54%  (p=0.000 n=18+19)

name                old speed     new speed      delta
Crc64/ISO64KB-4     446MB/s ± 0%  1788MB/s ± 0%  +300.72%  (p=0.000 n=18+18)
Crc64/ISO4KB-4      446MB/s ± 0%  1761MB/s ± 0%  +295.20%  (p=0.000 n=18+20)
Crc64/ISO1KB-4      444MB/s ± 0%  1694MB/s ± 0%  +281.46%  (p=0.000 n=19+20)
Crc64/ECMA64KB-4    446MB/s ± 0%  1788MB/s ± 0%  +300.77%  (p=0.000 n=20+20)
Crc64/Random64KB-4  446MB/s ± 0%  1603MB/s ± 0%  +259.32%  (p=0.000 n=20+18)
Crc64/Random16KB-4  446MB/s ± 0%   448MB/s ± 0%    +0.54%  (p=0.000 n=18+20)

Change-Id: I1c7621d836c486d6bfc41dbe1ec2ff9ab11aedfc
Reviewed-on: https://go-review.googlesource.com/22222
Run-TryBot: Ilya Tocar &lt;ilya.tocar@intel.com&gt;
TryBot-Result: Gobot Gobot &lt;gobot@golang.org&gt;
Reviewed-by: Russ Cox &lt;rsc@golang.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Similar to crc32 slicing by 8.
This also fixes a Crc64KB benchmark actually using 1024 bytes.

Crc64/ISO64KB-4       147µs ± 0%      37µs ± 0%   -75.05%  (p=0.000 n=18+18)
Crc64/ISO4KB-4       9.19µs ± 0%    2.33µs ± 0%   -74.70%  (p=0.000 n=19+20)
Crc64/ISO1KB-4       2.31µs ± 0%    0.60µs ± 0%   -73.81%  (p=0.000 n=19+15)
Crc64/ECMA64KB-4      147µs ± 0%      37µs ± 0%   -75.05%  (p=0.000 n=20+20)
Crc64/Random64KB-4    147µs ± 0%      41µs ± 0%   -72.17%  (p=0.000 n=20+18)
Crc64/Random16KB-4   36.7µs ± 0%    36.5µs ± 0%    -0.54%  (p=0.000 n=18+19)

name                old speed     new speed      delta
Crc64/ISO64KB-4     446MB/s ± 0%  1788MB/s ± 0%  +300.72%  (p=0.000 n=18+18)
Crc64/ISO4KB-4      446MB/s ± 0%  1761MB/s ± 0%  +295.20%  (p=0.000 n=18+20)
Crc64/ISO1KB-4      444MB/s ± 0%  1694MB/s ± 0%  +281.46%  (p=0.000 n=19+20)
Crc64/ECMA64KB-4    446MB/s ± 0%  1788MB/s ± 0%  +300.77%  (p=0.000 n=20+20)
Crc64/Random64KB-4  446MB/s ± 0%  1603MB/s ± 0%  +259.32%  (p=0.000 n=20+18)
Crc64/Random16KB-4  446MB/s ± 0%   448MB/s ± 0%    +0.54%  (p=0.000 n=18+20)

Change-Id: I1c7621d836c486d6bfc41dbe1ec2ff9ab11aedfc
Reviewed-on: https://go-review.googlesource.com/22222
Run-TryBot: Ilya Tocar &lt;ilya.tocar@intel.com&gt;
TryBot-Result: Gobot Gobot &lt;gobot@golang.org&gt;
Reviewed-by: Russ Cox &lt;rsc@golang.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>hash/crc32: use vector instructions on s390x</title>
<updated>2016-04-22T18:07:15+00:00</updated>
<author>
<name>Chris Zou</name>
<email>chriszou@ca.ibm.com</email>
</author>
<published>2016-04-18T23:30:17+00:00</published>
<link rel='alternate' type='text/html' href='http://91.123.203.49/cgit/delta/go-git.git/commit/?id=5833d843de4d774f6b76e982b170f42b8e94a198'/>
<id>5833d843de4d774f6b76e982b170f42b8e94a198</id>
<content type='text'>
The input buffer is aligned to a doubleword boundary to
improve performance of the vector instructions. The pure
Go implementation is used to align the input data, and is
also used when the vector instructions are not available
or the data length is less than 64 bytes.

Change-Id: Ie259a5f2f1562bcc17961c99e5776c99091d6bed
Reviewed-on: https://go-review.googlesource.com/22201
Reviewed-by: Michael Munday &lt;munday@ca.ibm.com&gt;
Run-TryBot: Michael Munday &lt;munday@ca.ibm.com&gt;
TryBot-Result: Gobot Gobot &lt;gobot@golang.org&gt;
Reviewed-by: Bill O'Farrell &lt;billotosyr@gmail.com&gt;
Reviewed-by: Brad Fitzpatrick &lt;bradfitz@golang.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The input buffer is aligned to a doubleword boundary to
improve performance of the vector instructions. The pure
Go implementation is used to align the input data, and is
also used when the vector instructions are not available
or the data length is less than 64 bytes.

Change-Id: Ie259a5f2f1562bcc17961c99e5776c99091d6bed
Reviewed-on: https://go-review.googlesource.com/22201
Reviewed-by: Michael Munday &lt;munday@ca.ibm.com&gt;
Run-TryBot: Michael Munday &lt;munday@ca.ibm.com&gt;
TryBot-Result: Gobot Gobot &lt;gobot@golang.org&gt;
Reviewed-by: Bill O'Farrell &lt;billotosyr@gmail.com&gt;
Reviewed-by: Brad Fitzpatrick &lt;bradfitz@golang.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>hash/adler32: Unroll loop for extra performance.</title>
<updated>2016-04-15T10:17:17+00:00</updated>
<author>
<name>Ilya Tocar</name>
<email>ilya.tocar@intel.com</email>
</author>
<published>2016-04-12T15:14:45+00:00</published>
<link rel='alternate' type='text/html' href='http://91.123.203.49/cgit/delta/go-git.git/commit/?id=89a1f02834f1472cf307b222e14884ebd41086d3'/>
<id>89a1f02834f1472cf307b222e14884ebd41086d3</id>
<content type='text'>
name         old time/op    new time/op    delta
Adler32KB-4     592ns ± 0%     447ns ± 0%  -24.49%  (p=0.000 n=19+20)

name         old speed      new speed      delta
Adler32KB-4  1.73GB/s ± 0%  2.29GB/s ± 0%  +32.41%  (p=0.000 n=20+20)

Change-Id: I38990aa66ca4452a886200018a57c0bc3af30717
Reviewed-on: https://go-review.googlesource.com/21880
Reviewed-by: Keith Randall &lt;khr@golang.org&gt;
Run-TryBot: Ilya Tocar &lt;ilya.tocar@intel.com&gt;
TryBot-Result: Gobot Gobot &lt;gobot@golang.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
name         old time/op    new time/op    delta
Adler32KB-4     592ns ± 0%     447ns ± 0%  -24.49%  (p=0.000 n=19+20)

name         old speed      new speed      delta
Adler32KB-4  1.73GB/s ± 0%  2.29GB/s ± 0%  +32.41%  (p=0.000 n=20+20)

Change-Id: I38990aa66ca4452a886200018a57c0bc3af30717
Reviewed-on: https://go-review.googlesource.com/21880
Reviewed-by: Keith Randall &lt;khr@golang.org&gt;
Run-TryBot: Ilya Tocar &lt;ilya.tocar@intel.com&gt;
TryBot-Result: Gobot Gobot &lt;gobot@golang.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
