<feed xmlns='http://www.w3.org/2005/Atom'>
<title>delta/git.git/list-objects.c, branch jk/string-list-static-init</title>
<subtitle>github.com: git/git.git
</subtitle>
<link rel='alternate' type='text/html' href='http://91.123.203.49/cgit/delta/git.git/'/>
<entry>
<title>struct name_entry: use struct object_id instead of unsigned char sha1[20]</title>
<updated>2016-04-25T21:23:42+00:00</updated>
<author>
<name>brian m. carlson</name>
<email>sandals@crustytoothpaste.net</email>
</author>
<published>2016-04-17T23:10:39+00:00</published>
<link rel='alternate' type='text/html' href='http://91.123.203.49/cgit/delta/git.git/commit/?id=7d924c9139e33e7599d7aed0446e634c427a5f15'/>
<id>7d924c9139e33e7599d7aed0446e634c427a5f15</id>
<content type='text'>
Signed-off-by: brian m. carlson &lt;sandals@crustytoothpaste.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: brian m. carlson &lt;sandals@crustytoothpaste.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>list-objects: pass full pathname to callbacks</title>
<updated>2016-02-12T20:51:17+00:00</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2016-02-11T22:28:36+00:00</published>
<link rel='alternate' type='text/html' href='http://91.123.203.49/cgit/delta/git.git/commit/?id=de1e67d0703894cb6ea782e36abb63976ab07e60'/>
<id>de1e67d0703894cb6ea782e36abb63976ab07e60</id>
<content type='text'>
When we find a blob at "a/b/c", we currently pass this to
our show_object_fn callbacks as two components: "a/b/" and
"c". Callbacks which want the full value then call
path_name(), which concatenates the two. But this is an
inefficient interface; the path is a strbuf, and we could
simply append "c" to it temporarily, then roll back the
length, without creating a new copy.

So we could improve this by teaching the callsites of
path_name() this trick (and there are only 3). But we can
also notice that no callback actually cares about the
broken-down representation, and simply pass each callback
the full path "a/b/c" as a string. The callback code becomes
even simpler, then, as we do not have to worry about freeing
an allocated buffer, nor rolling back our modification to
the strbuf.

This is theoretically less efficient, as some callbacks
would not bother to format the final path component. But in
practice this is not measurable. Since we use the same
strbuf over and over, our work to grow it is amortized, and
we really only pay to memcpy a few bytes.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When we find a blob at "a/b/c", we currently pass this to
our show_object_fn callbacks as two components: "a/b/" and
"c". Callbacks which want the full value then call
path_name(), which concatenates the two. But this is an
inefficient interface; the path is a strbuf, and we could
simply append "c" to it temporarily, then roll back the
length, without creating a new copy.

So we could improve this by teaching the callsites of
path_name() this trick (and there are only 3). But we can
also notice that no callback actually cares about the
broken-down representation, and simply pass each callback
the full path "a/b/c" as a string. The callback code becomes
even simpler, then, as we do not have to worry about freeing
an allocated buffer, nor rolling back our modification to
the strbuf.

This is theoretically less efficient, as some callbacks
would not bother to format the final path component. But in
practice this is not measurable. Since we use the same
strbuf over and over, our work to grow it is amortized, and
we really only pay to memcpy a few bytes.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>list-objects: drop name_path entirely</title>
<updated>2016-02-12T20:51:15+00:00</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2016-02-11T22:26:44+00:00</published>
<link rel='alternate' type='text/html' href='http://91.123.203.49/cgit/delta/git.git/commit/?id=bd64516aca4d4e22acb33c71429d293a14d355cf'/>
<id>bd64516aca4d4e22acb33c71429d293a14d355cf</id>
<content type='text'>
In the previous commit, we left name_path as a thin wrapper
around a strbuf. This patch drops it entirely. As a result,
every show_object_fn callback needs to be adjusted. However,
none of their code needs to be changed at all, because the
only use was to pass it to path_name(), which now handles
the bare strbuf.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In the previous commit, we left name_path as a thin wrapper
around a strbuf. This patch drops it entirely. As a result,
every show_object_fn callback needs to be adjusted. However,
none of their code needs to be changed at all, because the
only use was to pass it to path_name(), which now handles
the bare strbuf.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>list-objects: convert name_path to a strbuf</title>
<updated>2016-02-12T20:51:10+00:00</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2016-02-11T22:26:18+00:00</published>
<link rel='alternate' type='text/html' href='http://91.123.203.49/cgit/delta/git.git/commit/?id=13528ab37cadb4d4f7384d0449489760912904b8'/>
<id>13528ab37cadb4d4f7384d0449489760912904b8</id>
<content type='text'>
The "struct name_path" data is examined in only two places:
we generate it in process_tree(), and we convert it to a
single string in path_name(). Everyone else just passes it
through to those functions.

We can further note that process_tree() already keeps a
single strbuf with the leading tree path, for use with
tree_entry_interesting().

Instead of building a separate name_path linked list, let's
just use the one we already build in "base". This reduces
the amount of code (especially tricky code in path_name()
which did not check for integer overflows caused by deep
or large pathnames).

It is also more efficient in some instances.  Any time we
were using tree_entry_interesting, we were building up the
strbuf anyway, so this is an immediate and obvious win
there. In cases where we were not, we trade off storing
"pathname/" in a strbuf on the heap for each level of the
path, instead of two pointers and an int on the stack (with
one pointer into the tree object). On a 64-bit system, the
latter is 20 bytes; so if path components are less than that
on average, this has lower peak memory usage.  In practice
it probably doesn't matter either way; we are already
holding in memory all of the tree objects leading up to each
pathname, and for normal-depth pathnames, we are only
talking about hundreds of bytes.

This patch leaves "struct name_path" as a thin wrapper
around the strbuf, to avoid disrupting callbacks. We should
fix them, but leaving it out makes this diff easier to view.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The "struct name_path" data is examined in only two places:
we generate it in process_tree(), and we convert it to a
single string in path_name(). Everyone else just passes it
through to those functions.

We can further note that process_tree() already keeps a
single strbuf with the leading tree path, for use with
tree_entry_interesting().

Instead of building a separate name_path linked list, let's
just use the one we already build in "base". This reduces
the amount of code (especially tricky code in path_name()
which did not check for integer overflows caused by deep
or large pathnames).

It is also more efficient in some instances.  Any time we
were using tree_entry_interesting, we were building up the
strbuf anyway, so this is an immediate and obvious win
there. In cases where we were not, we trade off storing
"pathname/" in a strbuf on the heap for each level of the
path, instead of two pointers and an int on the stack (with
one pointer into the tree object). On a 64-bit system, the
latter is 20 bytes; so if path components are less than that
on average, this has lower peak memory usage.  In practice
it probably doesn't matter either way; we are already
holding in memory all of the tree objects leading up to each
pathname, and for normal-depth pathnames, we are only
talking about hundreds of bytes.

This patch leaves "struct name_path" as a thin wrapper
around the strbuf, to avoid disrupting callbacks. We should
fix them, but leaving it out makes this diff easier to view.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Convert struct object to object_id</title>
<updated>2015-11-20T13:02:05+00:00</updated>
<author>
<name>brian m. carlson</name>
<email>sandals@crustytoothpaste.net</email>
</author>
<published>2015-11-10T02:22:28+00:00</published>
<link rel='alternate' type='text/html' href='http://91.123.203.49/cgit/delta/git.git/commit/?id=f2fd0760f62e79609fef7bfd7ecebb002e8e4ced'/>
<id>f2fd0760f62e79609fef7bfd7ecebb002e8e4ced</id>
<content type='text'>
struct object is one of the major data structures dealing with object
IDs.  Convert it to use struct object_id instead of an unsigned char
array.  Convert get_object_hash to refer to the new member as well.

Signed-off-by: brian m. carlson &lt;sandals@crustytoothpaste.net&gt;
Signed-off-by: Jeff King &lt;peff@peff.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
struct object is one of the major data structures dealing with object
IDs.  Convert it to use struct object_id instead of an unsigned char
array.  Convert get_object_hash to refer to the new member as well.

Signed-off-by: brian m. carlson &lt;sandals@crustytoothpaste.net&gt;
Signed-off-by: Jeff King &lt;peff@peff.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'jk/squelch-missing-link-warning-for-unreachable'</title>
<updated>2015-06-11T16:29:59+00:00</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2015-06-11T16:29:58+00:00</published>
<link rel='alternate' type='text/html' href='http://91.123.203.49/cgit/delta/git.git/commit/?id=43262d8d6511212f49d519779505f8a557c8dc84'/>
<id>43262d8d6511212f49d519779505f8a557c8dc84</id>
<content type='text'>
Recent "git prune" traverses young unreachable objects to safekeep
old objects in the reachability chain from them, which sometimes
caused error messages that are unnecessarily alarming.

* jk/squelch-missing-link-warning-for-unreachable:
  suppress errors on missing UNINTERESTING links
  silence broken link warnings with revs-&gt;ignore_missing_links
  add quieter versions of parse_{tree,commit}
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Recent "git prune" traverses young unreachable objects to safekeep
old objects in the reachability chain from them, which sometimes
caused error messages that are unnecessarily alarming.

* jk/squelch-missing-link-warning-for-unreachable:
  suppress errors on missing UNINTERESTING links
  silence broken link warnings with revs-&gt;ignore_missing_links
  add quieter versions of parse_{tree,commit}
</pre>
</div>
</content>
</entry>
<entry>
<title>silence broken link warnings with revs-&gt;ignore_missing_links</title>
<updated>2015-06-01T16:29:50+00:00</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2015-06-01T09:56:37+00:00</published>
<link rel='alternate' type='text/html' href='http://91.123.203.49/cgit/delta/git.git/commit/?id=daf7d86783b1bd2065881a3f0957f69c79a52fd7'/>
<id>daf7d86783b1bd2065881a3f0957f69c79a52fd7</id>
<content type='text'>
We set revs-&gt;ignore_missing_links to instruct the
revision-walking machinery that we know the history graph
may be incomplete. For example, we use it when walking
unreachable but recent objects; we want to add what we can,
but it's OK if the history is incomplete.

However, we still print error messages for the missing
objects, which can be confusing. This is not an error, but
just a normal situation when transitioning from a repository
last pruned by an older git (which can leave broken segments
of history) to a more recent one (where we try to preserve
whole reachable segments).

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We set revs-&gt;ignore_missing_links to instruct the
revision-walking machinery that we know the history graph
may be incomplete. For example, we use it when walking
unreachable but recent objects; we want to add what we can,
but it's OK if the history is incomplete.

However, we still print error messages for the missing
objects, which can be confusing. This is not an error, but
just a normal situation when transitioning from a repository
last pruned by an older git (which can leave broken segments
of history) to a more recent one (where we try to preserve
whole reachable segments).

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rev-list: add an option to mark fewer edges as uninteresting</title>
<updated>2014-12-29T17:57:55+00:00</updated>
<author>
<name>brian m. carlson</name>
<email>sandals@crustytoothpaste.net</email>
</author>
<published>2014-12-24T23:05:39+00:00</published>
<link rel='alternate' type='text/html' href='http://91.123.203.49/cgit/delta/git.git/commit/?id=1684c1b219e02c91655ad929f752f4f864c72faf'/>
<id>1684c1b219e02c91655ad929f752f4f864c72faf</id>
<content type='text'>
In commit fbd4a70 (list-objects: mark more commits as edges in
mark_edges_uninteresting - 2013-08-16), we marked an increasing number
of edges uninteresting.  This change, and the subsequent change to make
this conditional on --objects-edge, are used by --thin to make much
smaller packs for shallow clones.

Unfortunately, they cause a significant performance regression when
pushing non-shallow clones with lots of refs (23.322 seconds vs.
4.785 seconds with 22400 refs).  Add an option to git rev-list,
--objects-edge-aggressive, that preserves this more aggressive behavior,
while leaving --objects-edge to provide more performant behavior.
Preserve the current behavior for the moment by using the aggressive
option.

Signed-off-by: brian m. carlson &lt;sandals@crustytoothpaste.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In commit fbd4a70 (list-objects: mark more commits as edges in
mark_edges_uninteresting - 2013-08-16), we marked an increasing number
of edges uninteresting.  This change, and the subsequent change to make
this conditional on --objects-edge, are used by --thin to make much
smaller packs for shallow clones.

Unfortunately, they cause a significant performance regression when
pushing non-shallow clones with lots of refs (23.322 seconds vs.
4.785 seconds with 22400 refs).  Add an option to git rev-list,
--objects-edge-aggressive, that preserves this more aggressive behavior,
while leaving --objects-edge to provide more performant behavior.
Preserve the current behavior for the moment by using the aggressive
option.

Signed-off-by: brian m. carlson &lt;sandals@crustytoothpaste.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>traverse_commit_list: support pending blobs/trees with paths</title>
<updated>2014-10-19T22:06:31+00:00</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2014-10-15T22:43:19+00:00</published>
<link rel='alternate' type='text/html' href='http://91.123.203.49/cgit/delta/git.git/commit/?id=207394908e9465d0169608725aeaa5bb355086e0'/>
<id>207394908e9465d0169608725aeaa5bb355086e0</id>
<content type='text'>
When we call traverse_commit_list, we may have trees and
blobs in the pending array. As we process these, we pass the
"name" field from the pending entry as the path of the
object within the tree (which then becomes the root path if
we recurse into subtrees).

When we set up the traversal in prepare_revision_walk,
though, the "name" field of any pending trees and blobs is
likely to be the ref at which we found the object. We would
not want to make this part of the path (e.g., doing so would
make "git rev-list --objects v2.6.11-tree" in linux.git show
paths like "v2.6.11-tree/Makefile", which is nonsensical).
Therefore prepare_revision_walk sets the name field of each
pending tree and blobs to the empty string.

However, this leaves no room for a caller who does know the
correct path of a pending object to propagate that
information to the revision walker. We can fix this by
making two related changes:

  1. Use the "path" field as the path instead of the "name"
     field in traverse_commit_list. If the path is not set,
     default to "" (which is what we always ended up with in
     the current code, because of prepare_revision_walk).

  2. In prepare_revision_walk, make a complete copy of the
     entry. This makes the path field available to the
     walker (if there is one), solving our problem.
     Leaving the name field intact is now OK, as we do not
     use it as a path due to point (1) above (and we can use
     it to make more meaningful error messages if we want).
     We also make the original "mode" field available to the
     walker, though it does not actually use it.

Note that we still re-add the pending objects and free the
old ones (so we may strdup the path and name only to free
the old ones). This could be made more efficient by simply
copying the object_array entries that we are keeping.
However, that would require more restructuring of the code,
and is not done here.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When we call traverse_commit_list, we may have trees and
blobs in the pending array. As we process these, we pass the
"name" field from the pending entry as the path of the
object within the tree (which then becomes the root path if
we recurse into subtrees).

When we set up the traversal in prepare_revision_walk,
though, the "name" field of any pending trees and blobs is
likely to be the ref at which we found the object. We would
not want to make this part of the path (e.g., doing so would
make "git rev-list --objects v2.6.11-tree" in linux.git show
paths like "v2.6.11-tree/Makefile", which is nonsensical).
Therefore prepare_revision_walk sets the name field of each
pending tree and blobs to the empty string.

However, this leaves no room for a caller who does know the
correct path of a pending object to propagate that
information to the revision walker. We can fix this by
making two related changes:

  1. Use the "path" field as the path instead of the "name"
     field in traverse_commit_list. If the path is not set,
     default to "" (which is what we always ended up with in
     the current code, because of prepare_revision_walk).

  2. In prepare_revision_walk, make a complete copy of the
     entry. This makes the path field available to the
     walker (if there is one), solving our problem.
     Leaving the name field intact is now OK, as we do not
     use it as a path due to point (1) above (and we can use
     it to make more meaningful error messages if we want).
     We also make the original "mode" field available to the
     walker, though it does not actually use it.

Note that we still re-add the pending objects and free the
old ones (so we may strdup the path and name only to free
the old ones). This could be made more efficient by simply
copying the object_array entries that we are keeping.
However, that would require more restructuring of the code,
and is not done here.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>object_array: add a "clear" function</title>
<updated>2014-10-16T17:10:37+00:00</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2014-10-15T22:34:34+00:00</published>
<link rel='alternate' type='text/html' href='http://91.123.203.49/cgit/delta/git.git/commit/?id=46be823124bb6a6ff0e06dc19c327b599ed97c72'/>
<id>46be823124bb6a6ff0e06dc19c327b599ed97c72</id>
<content type='text'>
There's currently no easy way to free the memory associated
with an object_array (and in most cases, we simply leak the
memory in a rev_info's pending array). Let's provide a
helper to make this easier to handle.

We can make use of it in list-objects.c, which does the same
thing by hand (but fails to free the "name" field of each
entry, potentially leaking memory).

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There's currently no easy way to free the memory associated
with an object_array (and in most cases, we simply leak the
memory in a rev_info's pending array). Let's provide a
helper to make this easier to handle.

We can make use of it in list-objects.c, which does the same
thing by hand (but fails to free the "name" field of each
entry, potentially leaking memory).

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
