<feed xmlns='http://www.w3.org/2005/Atom'>
<title>delta/libgit2.git/tests/merge, branch ethomson/test_https</title>
<subtitle>github.com: libgit2/libgit2.git
</subtitle>
<link rel='alternate' type='text/html' href='http://91.123.203.49/cgit/delta/libgit2.git/'/>
<entry>
<title>tests: include function declarations</title>
<updated>2021-11-11T22:31:43+00:00</updated>
<author>
<name>Edward Thomson</name>
<email>ethomson@edwardthomson.com</email>
</author>
<published>2021-11-11T19:58:49+00:00</published>
<link rel='alternate' type='text/html' href='http://91.123.203.49/cgit/delta/libgit2.git/commit/?id=8d2b31109d980462e1853bef3edf08ddecd094b2'/>
<id>8d2b31109d980462e1853bef3edf08ddecd094b2</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>tests: declare functions statically where appropriate</title>
<updated>2021-11-11T22:31:43+00:00</updated>
<author>
<name>Edward Thomson</name>
<email>ethomson@edwardthomson.com</email>
</author>
<published>2021-11-11T18:28:08+00:00</published>
<link rel='alternate' type='text/html' href='http://91.123.203.49/cgit/delta/libgit2.git/commit/?id=ca14942e19788bd01334af64554c3095f3ff0d4a'/>
<id>ca14942e19788bd01334af64554c3095f3ff0d4a</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>path: separate git-specific path functions from util</title>
<updated>2021-11-09T15:17:17+00:00</updated>
<author>
<name>Edward Thomson</name>
<email>ethomson@github.com</email>
</author>
<published>2021-10-31T13:45:46+00:00</published>
<link rel='alternate' type='text/html' href='http://91.123.203.49/cgit/delta/libgit2.git/commit/?id=95117d4744cf5a66f2bcde7991a925e9852d9b1e'/>
<id>95117d4744cf5a66f2bcde7991a925e9852d9b1e</id>
<content type='text'>
Introduce `git_fs_path`, which operates on generic filesystem paths.
`git_path` will be kept for only git-specific path functionality (for
example, checking for `.git` in a path).
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Introduce `git_fs_path`, which operates on generic filesystem paths.
`git_path` will be kept for only git-specific path functionality (for
example, checking for `.git` in a path).
</pre>
</div>
</content>
</entry>
<entry>
<title>str: introduce `git_str` for internal, `git_buf` is external</title>
<updated>2021-10-17T13:49:01+00:00</updated>
<author>
<name>Edward Thomson</name>
<email>ethomson@edwardthomson.com</email>
</author>
<published>2021-09-07T21:53:49+00:00</published>
<link rel='alternate' type='text/html' href='http://91.123.203.49/cgit/delta/libgit2.git/commit/?id=f0e693b18afbe1de37d7da5b5a8967b6c87d8e53'/>
<id>f0e693b18afbe1de37d7da5b5a8967b6c87d8e53</id>
<content type='text'>
libgit2 has two distinct requirements that were previously solved by
`git_buf`.  We require:

1. A general purpose string class that provides a number of utility APIs
   for manipulating data (eg, concatenating, truncating, etc).
2. A structure that we can use to return strings to callers that they
   can take ownership of.

By using a single class (`git_buf`) for both of these purposes, we have
confused the API to the point that refactorings are difficult and
reasoning about correctness is also difficult.

Move the utility class `git_buf` to be called `git_str`: this represents
its general purpose, as an internal string buffer class.  The name also
is an homage to Junio Hamano ("gitstr").

The public API remains `git_buf`, and has a much smaller footprint.  It
is generally only used as an "out" param with strict requirements that
follow the documentation.  (Exceptions exist for some legacy APIs to
avoid breaking callers unnecessarily.)

Utility functions exist to convert a user-specified `git_buf` to a
`git_str` so that we can call internal functions, then converting it
back again.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
libgit2 has two distinct requirements that were previously solved by
`git_buf`.  We require:

1. A general purpose string class that provides a number of utility APIs
   for manipulating data (eg, concatenating, truncating, etc).
2. A structure that we can use to return strings to callers that they
   can take ownership of.

By using a single class (`git_buf`) for both of these purposes, we have
confused the API to the point that refactorings are difficult and
reasoning about correctness is also difficult.

Move the utility class `git_buf` to be called `git_str`: this represents
its general purpose, as an internal string buffer class.  The name also
is an homage to Junio Hamano ("gitstr").

The public API remains `git_buf`, and has a much smaller footprint.  It
is generally only used as an "out" param with strict requirements that
follow the documentation.  (Exceptions exist for some legacy APIs to
avoid breaking callers unnecessarily.)

Utility functions exist to convert a user-specified `git_buf` to a
`git_str` so that we can call internal functions, then converting it
back again.
</pre>
</div>
</content>
</entry>
<entry>
<title>tests: don't generate false positives on empty path segments</title>
<updated>2021-08-08T11:34:06+00:00</updated>
<author>
<name>Peter Pettersson</name>
<email>boretrk@hotmail.com</email>
</author>
<published>2021-08-08T11:34:06+00:00</published>
<link rel='alternate' type='text/html' href='http://91.123.203.49/cgit/delta/libgit2.git/commit/?id=d095502ed797e20a73a00b65cc9d70d91f7d7ab4'/>
<id>d095502ed797e20a73a00b65cc9d70d91f7d7ab4</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>tests: merge: fix printf formatter on 32 bit arches</title>
<updated>2020-05-12T09:41:44+00:00</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2020-05-12T09:41:44+00:00</published>
<link rel='alternate' type='text/html' href='http://91.123.203.49/cgit/delta/libgit2.git/commit/?id=0cf9b6668c8ac5021cb36c6458c3ed243e6251bd'/>
<id>0cf9b6668c8ac5021cb36c6458c3ed243e6251bd</id>
<content type='text'>
We currently use `PRIuMAX` to print an integer of type `size_t` in
merge::trees::rename::cache_recomputation. While this works just fine on
64 bit arches, it doesn't on 32 bit ones. As a result, our nightly
builds on x86 and arm32 fail.

Fix the issue by using `PRIuZ` instead.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We currently use `PRIuMAX` to print an integer of type `size_t` in
merge::trees::rename::cache_recomputation. While this works just fine on
64 bit arches, it doesn't on 32 bit ones. As a result, our nightly
builds on x86 and arm32 fail.

Fix the issue by using `PRIuZ` instead.
</pre>
</div>
</content>
</entry>
<entry>
<title>merge: cache negative cache results for similarity metrics</title>
<updated>2020-04-01T13:16:18+00:00</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2020-04-01T13:16:18+00:00</published>
<link rel='alternate' type='text/html' href='http://91.123.203.49/cgit/delta/libgit2.git/commit/?id=4dfcc50fe88e06a49a4ccccb048d47fe6be01292'/>
<id>4dfcc50fe88e06a49a4ccccb048d47fe6be01292</id>
<content type='text'>
When computing renames, we cache the hash signatures for each of the
potentially conflicting entries so that we do not need to repeatedly
read the file and can at least halfway efficiently determine whether two
files are similar enough to be deemed a rename. In order to make the
hash signatures meaningful, we require at least four lines of data to be
present, resulting in at least four different hashes that can be
compared. Files that are deemed too small are not cached at all and
will thus be repeatedly re-hashed, which is usually not a huge issue.

The issue with above heuristic is in case a file does _not_ have at
least four lines, where a line is anything separated by a consecutive
run of "\n" or "\0" characters. For example "a\nb" is two lines, but
"a\0\0b" is also just two lines. Taken to the extreme, a file that has
megabytes of consecutive space- or NUL-only may also be deemed as too
small and thus not get cached. As a result, we will repeatedly load its
blob, calculate its hash signature just to finally throw it away as we
notice it's not of any value. When you've got a comparitively big file
that you compare against a big set of potentially renamed files, then
the cost simply expodes.

The issue can be trivially fixed by introducing negative cache entries.
Whenever we determine that a given blob does not have a meaningful
representation via a hash signature, we store this negative cache marker
and will from then on not hash it again, but also ignore it as a
potential rename target. This should help the "normal" case already
where you have a lot of small files as rename candidates, but in the
above scenario it's savings are extraordinarily high.

To verify we do not hit the issue anymore with described solution, this
commit adds a test that uses the exact same setup described above with
one 50 megabyte blob of '\0' characters and 1000 other files that get
renamed. Without the negative cache:

$ time ./libgit2_clar -smerge::trees::renames::cache_recomputation &gt;/dev/null
real    11m48.377s
user    11m11.576s
sys     0m35.187s

And with the negative cache:

$ time ./libgit2_clar -smerge::trees::renames::cache_recomputation &gt;/dev/null
real    0m1.972s
user    0m1.851s
sys     0m0.118s

So this represents a ~350-fold performance improvement, but it obviously
depends on how many files you have and how big the blob is. The test
number were chosen in a way that one will immediately notice as soon as
the bug resurfaces.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When computing renames, we cache the hash signatures for each of the
potentially conflicting entries so that we do not need to repeatedly
read the file and can at least halfway efficiently determine whether two
files are similar enough to be deemed a rename. In order to make the
hash signatures meaningful, we require at least four lines of data to be
present, resulting in at least four different hashes that can be
compared. Files that are deemed too small are not cached at all and
will thus be repeatedly re-hashed, which is usually not a huge issue.

The issue with above heuristic is in case a file does _not_ have at
least four lines, where a line is anything separated by a consecutive
run of "\n" or "\0" characters. For example "a\nb" is two lines, but
"a\0\0b" is also just two lines. Taken to the extreme, a file that has
megabytes of consecutive space- or NUL-only may also be deemed as too
small and thus not get cached. As a result, we will repeatedly load its
blob, calculate its hash signature just to finally throw it away as we
notice it's not of any value. When you've got a comparitively big file
that you compare against a big set of potentially renamed files, then
the cost simply expodes.

The issue can be trivially fixed by introducing negative cache entries.
Whenever we determine that a given blob does not have a meaningful
representation via a hash signature, we store this negative cache marker
and will from then on not hash it again, but also ignore it as a
potential rename target. This should help the "normal" case already
where you have a lot of small files as rename candidates, but in the
above scenario it's savings are extraordinarily high.

To verify we do not hit the issue anymore with described solution, this
commit adds a test that uses the exact same setup described above with
one 50 megabyte blob of '\0' characters and 1000 other files that get
renamed. Without the negative cache:

$ time ./libgit2_clar -smerge::trees::renames::cache_recomputation &gt;/dev/null
real    11m48.377s
user    11m11.576s
sys     0m35.187s

And with the negative cache:

$ time ./libgit2_clar -smerge::trees::renames::cache_recomputation &gt;/dev/null
real    0m1.972s
user    0m1.851s
sys     0m0.118s

So this represents a ~350-fold performance improvement, but it obviously
depends on how many files you have and how big the blob is. The test
number were chosen in a way that one will immediately notice as soon as
the bug resurfaces.
</pre>
</div>
</content>
</entry>
<entry>
<title>merge: update enum type name for consistency</title>
<updated>2020-01-18T14:05:24+00:00</updated>
<author>
<name>Edward Thomson</name>
<email>ethomson@edwardthomson.com</email>
</author>
<published>2020-01-18T14:03:23+00:00</published>
<link rel='alternate' type='text/html' href='http://91.123.203.49/cgit/delta/libgit2.git/commit/?id=94beb3a3f4a9bad33476673468f52d6caf81acc0'/>
<id>94beb3a3f4a9bad33476673468f52d6caf81acc0</id>
<content type='text'>
libgit2 does not use `type_t` suffixes as it's redundant; thus, rename
`git_merge_diff_type_t` to `git_merge_diff_t` for consistency.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
libgit2 does not use `type_t` suffixes as it's redundant; thus, rename
`git_merge_diff_type_t` to `git_merge_diff_t` for consistency.
</pre>
</div>
</content>
</entry>
<entry>
<title>fileops: rename to "futils.h" to match function signatures</title>
<updated>2019-07-20T17:11:20+00:00</updated>
<author>
<name>Patrick Steinhardt</name>
<email>ps@pks.im</email>
</author>
<published>2019-06-29T07:17:32+00:00</published>
<link rel='alternate' type='text/html' href='http://91.123.203.49/cgit/delta/libgit2.git/commit/?id=e54343a4024e75dfaa940e652c2c9799d33634b2'/>
<id>e54343a4024e75dfaa940e652c2c9799d33634b2</id>
<content type='text'>
Our file utils functions all have a "futils" prefix, e.g.
`git_futils_touch`. One would thus naturally guess that their
definitions and implementation would live in files "futils.h" and
"futils.c", respectively, but in fact they live in "fileops.h".

Rename the files to match expectations.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Our file utils functions all have a "futils" prefix, e.g.
`git_futils_touch`. One would thus naturally guess that their
definitions and implementation would live in files "futils.h" and
"futils.c", respectively, but in fact they live in "fileops.h".

Rename the files to match expectations.
</pre>
</div>
</content>
</entry>
<entry>
<title>blob: add underscore to `from` functions</title>
<updated>2019-06-15T23:16:49+00:00</updated>
<author>
<name>Edward Thomson</name>
<email>ethomson@edwardthomson.com</email>
</author>
<published>2019-06-08T22:46:04+00:00</published>
<link rel='alternate' type='text/html' href='http://91.123.203.49/cgit/delta/libgit2.git/commit/?id=08f392080cfd7e08972820d8147dc06432e7c4bf'/>
<id>08f392080cfd7e08972820d8147dc06432e7c4bf</id>
<content type='text'>
The majority of functions are named `from_something` (with an
underscore) instead of `fromsomething`.  Update the blob functions for
consistency with the rest of the library.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The majority of functions are named `from_something` (with an
underscore) instead of `fromsomething`.  Update the blob functions for
consistency with the rest of the library.
</pre>
</div>
</content>
</entry>
</feed>
