delta/libgit2.git/src/commit_list.c, branch ethomson/github_actions

Merge pull request #4445 from tiennou/shallow/dry-commit-parsing

2019-10-03T10:55:48+00:00

DRY commit parsing

commit_list: store in/out-degrees as uint16_t

2019-10-03T10:23:52+00:00

The commit list's in- and out-degrees are currently stored as `unsigned
short`. When assigning it the value of `git_array_size`, which returns
an `size_t`, this generates a warning on some Win32 platforms due to
loosing precision.

We could just cast the returned value of `git_array_size`, which would
work fine for 99.99% of all cases as commits typically have less than
2^16 parents. For crafted commits though we might end up with a wrong
value, and thus we should definitely check whether the array size
actually fits into the field.

To ease the check, let's convert the fields to store the degrees as
`uint16_t`. We shouldn't rely on such unspecific types anyway, as it may
lead to different behaviour across platforms. Furthermore, this commit
introduces a new `git__is_uint16` function to check whether it actually
fits -- if not, we return an error.

commit_list: unify commit information parsing

2019-10-03T06:59:11+00:00

commit_list: fix possible buffer overflow in `commit_quick_parse`

2019-08-13T16:56:06+00:00

The function `commit_quick_parse` provides a way to quickly parse
parts of a commit without storing or verifying most of its
metadata. The first thing it does is calculating the number of
parents by skipping "parent " lines until it finds the first
non-parent line. Afterwards, this parent count is passed to
`alloc_parents`, which will allocate an array to store all the
parent.

To calculate the amount of storage required for the parents
array, `alloc_parents` simply multiplicates the number of parents
with the respective elements's size. This already screams "buffer
overflow", and in fact this problem is getting worse by the
result being cast to an `uint32_t`.

In fact, triggering this is possible: git-hash-object(1) will
happily write a commit with multiple millions of parents for you.
I've stopped at 67,108,864 parents as git-hash-object(1)
unfortunately soaks up the complete object without streaming
anything to disk and thus will cause an OOM situation at a later
point. The point here is: this commit was about 4.1GB of size but
compressed down to 24MB and thus easy to distribute.

The above doesn't yet trigger the buffer overflow, thus. As the
array's elements are all pointers which are 8 bytes on 64 bit, we
need a total of 536,870,912 parents to trigger the overflow to
`0`. The effect is that we're now underallocating the array
and do an out-of-bound writes. As the buffer is kindly provided
by the adversary, this may easily result in code execution.

Extrapolating from the test file with 67m commits to the one with
536m commits results in a factor of 8. Thus the uncompressed
contents would be about 32GB in size and the compressed ones
192MB. While still easily distributable via the network, only
servers will have that amount of RAM and not cause an
out-of-memory condition previous to triggering the overflow. This
at least makes this attack not an easy vector for client-side use
of libgit2.

pool: use `size_t` for sizes

2019-06-24T14:00:41+00:00

git_error: use new names in internal APIs and usage

2019-01-22T22:30:35+00:00

Move to the `git_error` name in the internal API for error-related
functions.

object_type: use new enumeration names

2018-12-01T11:54:57+00:00

Use the new object_type enumeration names within the codebase.

commit_list: avoid use of strtol64 without length limit

2018-10-18T09:25:59+00:00

When quick-parsing a commit, we use `git__strtol64` to parse the
commit's time. The buffer that's passed to `commit_quick_parse` is the
raw data of an ODB object, though, whose data may not be properly
formatted and also does not have to be `NUL` terminated. This may lead
to out-of-bound reads.

Use `git__strntol64` to avoid this problem.

Make sure to always include "common.h" first

2017-07-03T08:51:48+00:00

Next to including several files, our "common.h" header also declares
various macros which are then used throughout the project. As such, we
have to make sure to always include this file first in all
implementation files. Otherwise, we might encounter problems or even
silent behavioural differences due to macros or defines not being
defined as they should be. So in fact, our header and implementation
files should make sure to always include "common.h" first.

This commit does so by establishing a common include pattern. Header
files inside of "src" will now always include "common.h" as its first
other file, separated by a newline from all the other includes to make
it stand out as special. There are two cases for the implementation
files. If they do have a matching header file, they will always include
this one first, leading to "common.h" being transitively included as
first file. If they do not have a matching header file, they instead
include "common.h" as first file themselves.

This fixes the outlined problems and will become our standard practice
for header and source files inside of the "src/" from now on.

giterr_set: consistent error messages

2016-12-29T12:26:03+00:00

Error messages should be sentence fragments, and therefore:

1. Should not begin with a capital letter,
2. Should not conclude with punctuation, and
3. Should not end a sentence and begin a new one