delta/libgit2.git/src/cache.c, branch ethomson/github_actions

tree-wide: remove unused functions

2020-06-08T19:17:57+00:00

We have some functions which aren't used anywhere. Let's remove them to
get rid of unneeded baggage.

cache: fix invalid memory access in case updating cache entry fails

2020-02-07T12:08:23+00:00

When adding a new entry to our cache where an entry with the same OID
exists already, then we only update the existing entry in case it is
unparsed and the new entry is parsed. Currently, we do not check the
return value of `git_oidmap_set` though when updating the existing
entry. As a result, we will _not_ have updated the existing entry if
`git_oidmap_set` fails, but have decremented its refcount and
incremented the new entry's refcount. Later on, this may likely lead to
dereferencing invalid memory.

Fix the issue by checking the return value of `git_oidmap_set`. In case
it fails, we will simply keep the existing stored instead, even though
it's unparsed.

cache: evict items more efficiently

2019-07-17T15:59:54+00:00

When our object cache is full, we pick eight items (or the whole cache,
if there are fewer) and evict them. For small cache sizes, this is fine,
but when we're dealing with a large number of objects, we can repeatedly
exhaust the cache and spend a large amount of time in git_oidmap_iterate
trying to find items to evict.

Instead, let's assume that if the cache gets full, we have a large
number of objects that we're handling, and be more aggressive about
evicting items. Let's remove one item for every 2048 items, but not less
than 8. This causes us to scale our evictions in proportion to the size
of the cache and significantly reduces the time we spend in
git_oidmap_iterate.

Before this change, a full pack of all the non-blob objects in the Linux
repository took in excess of 30 minutes and spent 62.3% of total runtime
in odb_read_1 and its children, and 44.3% of the time in
git_oidmap_iterate. With this change, the same operation now takes 14
minutes and 44 seconds, and odb_read_1 accounts for only 35.9% of total
time, whereas git_oidmap_iterate consists of 6.2%.

Note that we do spend a little more time inflating objects and a decent
amount more time in memcmp. However, overall, the time taken is
significantly improved, and time in pack building is now dominated by
git_delta_create_from_index (33.7%), which is what we would expect.

cache: fix cache eviction using deallocated key

2019-05-24T13:28:33+00:00

When evicting cache entries, we first retrieve the object that is
to be evicted, delete the object and then finally delete the key
from the cache. In case where the cache eviction caused us to
free the cached object, though, its key will point to invalid
memory now when trying to remove it from the cache map. On my
system, this causes us to not properly remove the key from the
map, as its memory has been overwritten already and thus the key
lookup it will fail and we cannot delete it.

Fix this by only decrementing the refcount of the evictee after
we have removed it from our cache map. Add a test that caused a
segfault previous to that change.

Merge pull request #4901 from pks-t/pks/uniform-map-api

2019-02-22T10:56:08+00:00

High-level map APIs

cache: fix misnaming of `git_cache_free`

2019-02-21T12:35:56+00:00

Functions that free a structure's contents but not the structure
itself shall be named `dispose` in the libgit2 project, but the
function `git_cache_free` does not follow this naming pattern.

Fix this by renaming it to `git_cache_dispose` and adjusting all
callers to make use of the new name.

cache: use iteration interface for cache eviction

2019-02-15T12:16:49+00:00

To relieve us from memory pressure, we may regularly call `cache_evict_entries`
to remove some entries from it. Unfortunately, our cache does not support a
least-recently-used mode or something similar, which is why we evict entries
completeley at random right now. Thing is, this is only possible due to the map
interfaces exposing the entry indices, and we intend to completely remove those
to decouple map users from map implementations. As soon as that is done, we are
unable to do this random eviction anymore.

Convert this to make use of an iterator for now. Obviously, there is no random
eviction possible like that anymore, but we'll always start by evicting from the
beginning of the map. Due to hashing, one may hope that the selected buckets
will be evicted at least in some way unpredictably. But more likely than not,
this will not be the case. But let's see what happens and if any users complain
about degraded performance. If so, we might come up with a different scheme than
random removal, e.g. by using an LRU cache.

oidmap: introduce high-level setter for key/value pairs

2019-02-15T12:16:48+00:00

Currently, one would use either `git_oidmap_insert` to insert key/value pairs
into a map or `git_oidmap_put` to insert a key only. These function have
historically been macros, which is why their syntax is kind of weird: instead of
returning an error code directly, they instead have to be passed a pointer to
where the return value shall be stored. This does not match libgit2's common
idiom of directly returning error codes.Furthermore, `git_oidmap_put` is tightly
coupled with implementation details of the map as it exposes the index of
inserted entries.

Introduce a new function `git_oidmap_set`, which takes as parameters the map,
key and value and directly returns an error code. Convert all trivial callers of
`git_oidmap_insert` and `git_oidmap_put` to make use of it.

oidmap: introduce high-level getter for values

2019-02-15T12:16:48+00:00

The current way of looking up an entry from a map is tightly coupled with the
map implementation, as one first has to look up the index of the key and then
retrieve the associated value by using the index. As a caller, you usually do
not care about any indices at all, though, so this is more complicated than
really necessary. Furthermore, it invites for errors to happen if the correct
error checking sequence is not being followed.

Introduce a new high-level function `git_oidmap_get` that takes a map and a key
and returns a pointer to the associated value if such a key exists. Otherwise,
a `NULL` pointer is returned. Adjust all callers that can trivially be
converted.

maps: use uniform lifecycle management functions

2019-02-15T12:16:48+00:00

Currently, the lifecycle functions for maps (allocation, deallocation, resize)
are not named in a uniform way and do not have a uniform function signature.
Rename the functions to fix that, and stick to libgit2's naming scheme of saying
`git_foo_new`. This results in the following new interface for allocation:

- `int git_map_new(git_map **out)` to allocate a new map, returning an
  error code if we ran out of memory

- `void git_map_free(git_map *map)` to free a map

- `void git_map_clear(gitmap *map)` to remove all entries from a map

This commit also fixes all existing callers.