delta/libgit2.git/src/cache.c, branch ethomson/treebuilder_write

threads: rename thread files to thread.[ch]

2020-12-06T01:08:22+00:00

threads: rename git_atomic to git_atomic32

2020-12-06T01:08:22+00:00

Clarify the `git_atomic` type and functions now that we have a 64 bit
version as well (`git_atomic64`).

Improve the support of atomics

2020-10-08T12:31:30+00:00

This change:

* Starts using GCC's and clang's `__atomic_*` intrinsics instead of the
  `__sync_*` ones, since the former supersede the latter (each
  `__sync_*` call can be safely replaced by its equivalent `__atomic_*`
  version with the sequentially consistent memory model).
* Makes `git_atomic64`'s value `volatile`. Without this,
  ThreadSanitizer complains.
* Adds ways to load the values from atomics. As it turns out,
  unsynchronized reads are okay only on some architectures, but if we
  want to be correct (and make ThreadSanitizer happy), those loads
  should also be performed with the atomic builtins.
* Fixes two ThreadSanitizer warnings, as a proof-of-concept that this
  works:
  - Avoids accessing `git_refcount`'s `owner` directly, and instead
    makes all callers go through the `GIT_REFCOUNT_*()` macros, which
    also use the atomic utilities.
  - Makes `pool_system_page_size()` race-free.

Part of: #5592
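
For illustration, the switch to `__atomic_*` builtins with sequentially
consistent ordering, including the atomic load, might look roughly like
the sketch below. The `demo_atomic32` type and its helpers are
hypothetical names made up for this example and are not libgit2's
actual definitions; the sketch assumes GCC or clang.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 32-bit atomic wrapper (not libgit2's type). */
typedef struct {
	volatile int32_t val;
} demo_atomic32;

/* Store a value with sequentially consistent ordering. */
static inline void demo_atomic32_set(demo_atomic32 *a, int32_t v)
{
	__atomic_store_n(&a->val, v, __ATOMIC_SEQ_CST);
}

/* Increment and return the new value; replaces __sync_add_and_fetch(). */
static inline int32_t demo_atomic32_inc(demo_atomic32 *a)
{
	return __atomic_add_fetch(&a->val, 1, __ATOMIC_SEQ_CST);
}

/* Decrement and return the new value; replaces __sync_sub_and_fetch(). */
static inline int32_t demo_atomic32_dec(demo_atomic32 *a)
{
	return __atomic_sub_fetch(&a->val, 1, __ATOMIC_SEQ_CST);
}

/* Read through the builtin instead of a plain, unsynchronized load. */
static inline int32_t demo_atomic32_get(demo_atomic32 *a)
{
	return __atomic_load_n(&a->val, __ATOMIC_SEQ_CST);
}

/* Single-threaded walk through the helpers above. */
static void demo_atomic32_selfcheck(void)
{
	demo_atomic32 a;

	demo_atomic32_set(&a, 0);
	assert(demo_atomic32_inc(&a) == 1);
	assert(demo_atomic32_inc(&a) == 2);
	assert(demo_atomic32_dec(&a) == 1);
	assert(demo_atomic32_get(&a) == 1);
}
```

The point of `demo_atomic32_get()` is that even reads go through an
atomic builtin, which is what keeps tools like ThreadSanitizer from
flagging them as racy.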

tree-wide: remove unused functions

2020-06-08T19:17:57+00:00

We have some functions which aren't used anywhere. Let's remove them to
get rid of unneeded baggage.

cache: fix invalid memory access in case updating cache entry fails

2020-02-07T12:08:23+00:00

When adding a new entry to our cache where an entry with the same OID
exists already, then we only update the existing entry in case it is
unparsed and the new entry is parsed. Currently, we do not check the
return value of `git_oidmap_set` though when updating the existing
entry. As a result, we will _not_ have updated the existing entry if
`git_oidmap_set` fails, but have decremented its refcount and
incremented the new entry's refcount. Later on, this is likely to lead
to dereferencing invalid memory.

Fix the issue by checking the return value of `git_oidmap_set`. In case
it fails, we will simply keep the existing entry stored instead, even
though it's unparsed.
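
A much-simplified sketch of the fixed logic follows. It is not the
actual code in cache.c: the one-slot `demo_map`, `demo_entry` and
`demo_cache_store()` are hypothetical, and `fail_set` only exists to
simulate a failing map update. The refcounts are moved only once the
map update is known to have succeeded.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cached entry: a refcount plus a parsed/unparsed flag. */
typedef struct {
	int refcount;
	int parsed;
} demo_entry;

/* Hypothetical one-slot "map"; fail_set simulates an update failure. */
typedef struct {
	demo_entry *slot;
	int fail_set;
} demo_map;

/* Returns 0 on success, -1 on (simulated) failure. */
static int demo_map_set(demo_map *m, demo_entry *e)
{
	if (m->fail_set)
		return -1;
	m->slot = e;
	return 0;
}

/* Returns the entry the caller should use after the store attempt. */
static demo_entry *demo_cache_store(demo_map *m, demo_entry *entry)
{
	demo_entry *stored;

	assert(m != NULL && entry != NULL);
	stored = m->slot;

	if (stored == NULL) {
		/* Nothing cached yet: take a reference only if the insert worked. */
		if (demo_map_set(m, entry) == 0)
			entry->refcount++;
		return entry;
	}

	if (!stored->parsed && entry->parsed && demo_map_set(m, entry) == 0) {
		/* The map now points at the new entry, so move the reference. */
		stored->refcount--;
		entry->refcount++;
		return entry;
	}

	/* No upgrade needed, or the update failed: keep the existing entry. */
	return stored;
}
```

With `fail_set` enabled, the unparsed entry stays in the map and neither
refcount changes, so the map never ends up pointing at an entry whose
reference it has already dropped.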

cache: evict items more efficiently

2019-07-17T15:59:54+00:00

When our object cache is full, we pick eight items (or the whole cache,
if there are fewer) and evict them. For small cache sizes, this is fine,
but when we're dealing with a large number of objects, we can repeatedly
exhaust the cache and spend a large amount of time in git_oidmap_iterate
trying to find items to evict.

Instead, let's assume that if the cache gets full, we have a large
number of objects that we're handling, and be more aggressive about
evicting items. Let's remove one item for every 2048 items, but no fewer
than 8. This causes us to scale our evictions in proportion to the size
of the cache and significantly reduces the time we spend in
git_oidmap_iterate.
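
As a rough sketch of the sizing rule described above (the helper name
`demo_evict_count()` is made up for this example, and capping at the
cache size is an assumption carried over from the old "or the whole
cache, if there are fewer" behaviour):

```c
#include <assert.h>
#include <stddef.h>

/* Number of entries to evict from a full cache holding cache_size items. */
static size_t demo_evict_count(size_t cache_size)
{
	/* One item for every 2048 cached items ... */
	size_t count = cache_size / 2048;

	/* ... but no fewer than 8 ... */
	if (count < 8)
		count = 8;

	/* ... and never more than the cache actually holds. */
	if (count > cache_size)
		count = cache_size;

	return count;
}

/* A few worked values for the rule above. */
static void demo_evict_count_examples(void)
{
	assert(demo_evict_count(5) == 5);         /* fewer than 8 items cached */
	assert(demo_evict_count(100) == 8);       /* the floor of 8 applies */
	assert(demo_evict_count(16384) == 8);     /* 16384 / 2048 == 8 */
	assert(demo_evict_count(1048576) == 512); /* scales with cache size */
}
```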

Before this change, a full pack of all the non-blob objects in the Linux
repository took in excess of 30 minutes and spent 62.3% of total runtime
in odb_read_1 and its children, and 44.3% of the time in
git_oidmap_iterate. With this change, the same operation now takes 14
minutes and 44 seconds, and odb_read_1 accounts for only 35.9% of total
time, while git_oidmap_iterate accounts for 6.2%.

Note that we do spend a little more time inflating objects and a decent
amount more time in memcmp. However, overall, the time taken is
significantly improved, and time in pack building is now dominated by
git_delta_create_from_index (33.7%), which is what we would expect.

cache: fix cache eviction using deallocated key

2019-05-24T13:28:33+00:00

When evicting cache entries, we first retrieve the object that is
to be evicted, delete the object and then finally delete the key
from the cache. In the case where the cache eviction caused us to
free the cached object, though, its key will point to invalid
memory when we try to remove it from the cache map. On my
system, this causes us to not properly remove the key from the
map, as its memory has already been overwritten and thus the key
lookup fails and we cannot delete it.

Fix this by only decrementing the refcount of the evictee after
we have removed it from our cache map. Add a test that caused a
segfault prior to that change.
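
The ordering issue can be sketched with a toy map whose keys are
borrowed pointers into the cached objects. Everything here (`demo_obj`,
`demo_slot`, `demo_evict()`) is hypothetical and only mirrors the shape
of the problem, not the real oidmap code.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy cached object: the key (id) lives inside the object itself. */
typedef struct {
	int id;
	int refcount;
} demo_obj;

/* Toy map slot: the key is a borrowed pointer into the object. */
typedef struct {
	const int *key;
	demo_obj *value;
} demo_slot;

/* Delete the slot whose key matches; returns 0 if found, -1 otherwise. */
static int demo_map_delete(demo_slot *slots, size_t n, const int *key)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (slots[i].value != NULL && *slots[i].key == *key) {
			slots[i].key = NULL;
			slots[i].value = NULL;
			return 0;
		}
	}
	return -1;
}

/* Drop one reference; frees the object when the last one goes away. */
static void demo_obj_decref(demo_obj *obj)
{
	if (--obj->refcount == 0)
		free(obj);
}

/* Evict the object stored in slot idx, if any. */
static int demo_evict(demo_slot *slots, size_t n, size_t idx)
{
	demo_obj *evictee;
	int error;

	assert(idx < n);
	evictee = slots[idx].value;
	if (evictee == NULL)
		return -1;

	/* First remove the key while evictee->id is still valid memory ... */
	error = demo_map_delete(slots, n, &evictee->id);

	/* ... and only then drop the cache's reference, which may free it. */
	demo_obj_decref(evictee);
	return error;
}
```

Swapping the two steps in `demo_evict()` would hand `demo_map_delete()`
a pointer into freed memory, which is the bug this commit describes.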

Merge pull request #4901 from pks-t/pks/uniform-map-api

2019-02-22T10:56:08+00:00

High-level map APIs

cache: fix misnaming of `git_cache_free`

2019-02-21T12:35:56+00:00

Functions that free a structure's contents but not the structure
itself shall be named `dispose` in the libgit2 project, but the
function `git_cache_free` does not follow this naming pattern.

Fix this by renaming it to `git_cache_dispose` and adjusting all
callers to make use of the new name.

cache: use iteration interface for cache eviction

2019-02-15T12:16:49+00:00

To relieve us from memory pressure, we may regularly call `cache_evict_entries`
to remove some entries from the cache. Unfortunately, our cache does not support
a least-recently-used mode or something similar, which is why we evict entries
completely at random right now. Thing is, this is only possible due to the map
interfaces exposing the entry indices, and we intend to completely remove those
to decouple map users from map implementations. As soon as that is done, we are
unable to do this random eviction anymore.

Convert this to make use of an iterator for now. Obviously, there is no random
eviction possible like that anymore, but we'll always start by evicting from the
beginning of the map. Due to hashing, one may hope that the evicted entries
will be at least somewhat unpredictable. More likely than not, though, this
will not be the case. Let's see what happens and whether any users complain
about degraded performance. If so, we might come up with a different eviction
scheme, e.g. by using an LRU cache.