| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In some parts of our code, we make rather heavy use of casting
structures to their respective specialized implementation. One
example is the configuration code with the general
`git_config_backend` and the specialized `diskfile_header`
structures. At some occasions, it can get confusing though with
regards to the correct inheritance structure, which led to the
recent bug fixed in 2424e64c4 (config: harden our use of the
backend objects a bit, 2018-02-28).
Object-oriented programming in C is hard, but we can at least try
to have some checks when it comes to casting around stuff. Thus,
this commit introduces a `GIT_CONTAINER_OF` macro, which accepts
as parameters the pointer that is to be casted, the pointer it
should be cast to as well as the member inside of the target
structure that is the containing structure. This macro then tries
hard to detect mis-casts:
- It checks whether the source and target pointers are of the
same type. This requires support by the compiler, as it makes
use of the builtin `__builtin_types_compatible_p`.
- It checks whether the embedded member of the target structure
is the first member. In order to make this a compile-time
constant, the compiler-provided `__builtin_offsetof` is being
used for this.
- It ties these two checks together by the compiler-builtin
`__builtin_choose_expr`. Based on whether the previous two
checks evaluate to `true`, the compiler will either compile in
the correct cast, or it will output `(void)0`. The second case
results in a compiler error, resulting in a compile-time check
for wrong casts.
The only downside to this is that it relies heavily on
compiler-specific extensions. As both GCC and Clang support these
features, only define this macro like explained above in case
`__GNUC__` is set (Clang also defines `__GNUC__`). If the
compiler is not Clang or GCC, just go with a simple cast without
any additional checks.
|
| | |
|
| |\
| |
| | |
Object parse fixes
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Unfortunately, neither the `memmem` nor the `strnstr` functions are part
of any C standard but are merely extensions of C that are implemented by
e.g. glibc. Thus, there is no standardized way to search for a string in
a block of memory with a limited size, and using `strstr` is to be
considered unsafe in case where the buffer has not been sanitized. In
fact, there are some uses of `strstr` in exactly that unsafe way in our
codebase.
Provide a new function `git__memmem` that implements the `memmem`
semantics. That is in a given haystack of `n` bytes, search for the
occurrence of a byte sequence of `m` bytes and return a pointer to the
first occurrence. The implementation chosen is the "Not So Naive"
algorithm from [1]. It was chosen as the implementation is comparably
simple while still being reasonably efficient in most cases.
Preprocessing happens in constant time and space, searching has a time
complexity of O(n*m) with a slightly sub-linear average case.
[1]: http://www-igm.univ-mlv.fr/~lecroq/string/
|
| | |
| |
| |
| |
| |
| |
| | |
The function `git__strtol32` can easily be misused when untrusted data
is passed to it that may not have been sanitized with trailing `NUL`
bytes. As all usages of this function have now been removed, we can
remove this function altogether to avoid future misuse of it.
|
| |/
|
|
|
|
|
|
| |
The function `git__strtol64` does not take a maximum buffer length as
parameter. This has led to some unsafe usages of this function, and as
such we may consider it as being unsafe to use. As we have now
eradicated all usages of this function, let's remove it completely to
avoid future misuse.
|
| |
|
|
|
|
|
|
|
|
| |
Our "util.h" header is a grabbag of various different functions, where
many don't have a clear group they belong to. Our set of allocator
functions though can be clearly singled out as a single group of
functions that always belongs together. Furthermore, we will need to
implement additional functions relating to our allocators subsystem when
moving to pluggable allocators. Thus, we should just move these
functions into their own "alloc" module.
|
| |
|
|
|
|
|
|
|
|
|
| |
Right now, the standard allocator is being declared as part of the
"util.h" header as a set of inline functions. As with the crtdbg
allocator functions, these inline functions make it hard to convert to
function pointers for our allocators.
Create a new "stdalloc" module containing our standard allocations
functions to split these out. Convert the existing allocators to macros
which make use of the stdalloc functions.
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Currently, the `git__free` function is being defined in a single place,
only, disregarding whether we use our standard allocators or the crtdbg
allocators. This makes it a bit harder to convert our code base to use
pluggable allocators, and furthermore makes the border between our two
allocators a bit more blurry.
Implement a separate `git__crtdbg__free` function for the crtdbg
allocator in order to completely separate both allocator
implementations.
|
| |
|
|
|
|
|
|
|
|
| |
The diff driver truncates the hunk header text to 80 bytes, which can truncate
4-byte Unicode characters and introduce garbage characters in the diff
output. This change sanitizes the hunk header before it is displayed.
This mirrors the test in git: https://github.com/git/git/blob/master/t/t4025-hunk-header.sh
Closes https://github.com/libgit2/rugged/issues/716
|
| |
|
|
|
|
|
|
|
|
|
| |
While "util.h" declares the macro `git__tolower`, which simply resorts
to tolower(3P) on Unix-like systems, the <ctype.h> header is only being
included in "util.c". Thus, anybody who has included "util.h" without
having <ctype.h> included will fail to compile as soon as the macro is
in use.
Furthermore, we can clean up additional includes in "util.c" and simply
replace them with an include for "common.h".
|
| |
|
|
| |
use consistent names for the #include / #define header guard pattern.
|
| |
|
|
|
|
|
|
|
|
|
| |
Introduce `git_prefixncmp` that will search up to the first `n`
characters of a string to see if it is prefixed by another string.
This is useful for examining if a non-null terminated character
array is prefixed by a particular substring.
Consolidate the various implementations of `git__prefixcmp` around a
single core implementation and add some test cases to validate its
behavior.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Strict aliasing rules dictate that for most data types, you are not
allowed to cast them to another data type and then access the casted
pointers. While this works just fine for most compilers, technically we
end up in undefined behaviour when we hurt that rule.
Our current refcounting code makes heavy use of casting and thus
violates that rule. While we didn't have any problems with that code,
Travis started spitting out a lot of warnings due to a change in their
toolchain. In the refcounting case, the code is also easy to fix:
as all refcounting-statements are actually macros, we can just access
the `rc` field directly instead of casting.
There are two outliers in our code where that doesn't work. Both the
`git_diff` and `git_patch` structures have specializations for generated
and parsed diffs/patches, which directly inherit from them. Because of
that, the refcounting code is only part of the base structure and not of
the children themselves. We can help that by instead passing their base
into `GIT_REFCOUNT_INC`, though.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Next to including several files, our "common.h" header also declares
various macros which are then used throughout the project. As such, we
have to make sure to always include this file first in all
implementation files. Otherwise, we might encounter problems or even
silent behavioural differences due to macros or defines not being
defined as they should be. So in fact, our header and implementation
files should make sure to always include "common.h" first.
This commit does so by establishing a common include pattern. Header
files inside of "src" will now always include "common.h" as its first
other file, separated by a newline from all the other includes to make
it stand out as special. There are two cases for the implementation
files. If they do have a matching header file, they will always include
this one first, leading to "common.h" being transitively included as
first file. If they do not have a matching header file, they instead
include "common.h" as first file themselves.
This fixes the outlined problems and will become our standard practice
for header and source files inside of the "src/" from now on.
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Some of our header files are not included at all by any of their
implementing counter-parts. Including them inside of these files leads
to some compile errors mostly due to unknown types because of missing
includes. But there's also one case where a declared function does not
match the implementation's prototype.
Fix all these errors by fixing up the prototype and adding missing
includes. This is preparatory work for fixing up missing includes in the
implementation files.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
The current order of declarations and includes between "common.h" and
"w32_crtdbg_stacktrace.h" is rather complicated. Both header files make
use of things defined in the other one and are thus circularly dependent
on each other. This makes it currently impossible to compile the
"w32_crtdbg_stacktrace.c" file when including "common.h" inside of
"w32_crtdbg_stacktrace.h".
We can disentangle the mess by moving declaration of the inline crtdbg
functions into the "w32_crtdbg_stacktrace.h" file and adding additional
includes inside of it, such that all required functions are available to
it. This allows us to break the dependency cycle.
|
| | |
|
| |\
| |
| | |
git__getenv: utf-8 aware env reader
|
| | |
| |
| |
| |
| |
| | |
Introduce `git__getenv` which is a UTF-8 aware `getenv` everywhere.
Make `cl_getenv` use this to keep consistent memory handling around
return values (free everywhere, as opposed to only some platforms).
|
| |/ |
|
| |
|
|
|
|
|
|
| |
Some brain damaged tolower() implementations appear to want to
take the locale into account, and this may require taking some
insanely aggressive lock on the locale and slowing down what should
be the most trivial of trivial calls for people who just want to
downcase ASCII.
|
| | |
|
| |
|
|
|
|
|
|
|
| |
Make our overflow checking look more like gcc and clang's, so that
we can substitute it out with the compiler instrinsics on platforms
that support it. This means dropping the ability to pass `NULL` as
an out parameter.
As a result, the macros also get updated to reflect this as well.
|
| |
|
|
|
| |
Add some helper functions to check for overflow in a type-specific
manner.
|
| |
|
|
|
| |
Ensure that the given length to `p_read` is of ssize_t and ensure
that callers test the return as if it were an `ssize_t`.
|
| |
|
|
|
|
| |
Have the ALLOC_OVERFLOW testing macros also simply set_oom in the
case where a computation would overflow, so that callers don't
need to.
|
| |
|
|
|
|
|
|
| |
Introduce git__reallocarray that checks the product of the number
of elements and element size for overflow before allocation. Also
introduce git__mallocarray that behaves like calloc, but without the
`c`. (It does not zero memory, for those truly worried about every
cycle.)
|
| |
|
|
|
| |
Introduce some helper macros to test integer overflow from arithmetic
and set error message appropriately.
|
| |
|
|
| |
Avoid str length recalculation every iteration
|
| |
|
|
|
|
|
| |
The clock_gettime() function is normally not available under
AmigaOS, hence another solution is required. We are using now
GetUpTime() that is present in current versions of this
operating system.
|
| |
|
|
|
|
| |
A byte value of 0x85 is not whitespace, we were conflating that with
U+0085 (UTF8: 0xc2 0x85). This caused us to incorrectly treat valid
multibyte characters like U+88C5 (UTF8: 0xe8 0xa3 0x85) as whitespace.
|
| | |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Disallow:
1. paths with trailing dot
2. paths with trailing space
3. paths with trailing colon
4. paths that are 8.3 short names of .git folders ("GIT~1")
5. paths that are reserved path names (COM1, LPT1, etc).
6. paths with reserved DOS characters (colons, asterisks, etc)
These paths would (without \\?\ syntax) be elided to other paths - for
example, ".git." would be written as ".git". As a result, writing these
paths literally (using \\?\ syntax) makes them hard to operate with from
the shell, Windows Explorer or other tools. Disallow these.
|
| |
|
|
|
| |
The effect of this would be that various update callbacks would
not be made at the correct interval.
|
| | |
|
| | |
|
| |
|
|
|
|
|
|
| |
This makes the index iterator honor the GIT_ITERATOR_IGNORE_CASE
and GIT_ITERATOR_DONT_IGNORE_CASE flags without modifying the
index data itself. To take advantage of this, I had to export a
number of the internal index entry comparison functions. I also
wrote some new tests to exercise the capability.
|
| | |
|
| |
|
|
|
|
|
|
|
| |
We need this from util.h and posix.h, but the latter includes common.h
which includes util.h, which means p_strlen is not defined by the time
we get to git__strndup().
Split the definition on p_strlen() off into its own header so we can use
it in util.h.
|
| |
|
|
|
| |
The standard library provides a very nice strnlen function, which knows
to use SSE, let's not reimplement it ourselves.
|
| | |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds the basics of progress reporting during push. While progress
for all aspects of a push operation are not reported with this change,
it lays the foundation to add these later. Push progress reporting
can be improved in the future - and consumers of the API should
just get more accurate information at that point.
The main areas where this is lacking are:
1) packbuilding progress: does not report progress during deltafication,
as this involves coordinating progress from multiple threads.
2) network progress: reports progress as objects and bytes are going
to be written to the subtransport (instead of as client gets
confirmation that they have been received by the server) and leaves
out some of the bytes that are transfered as part of the push protocol.
Basically, this reports the pack bytes that are written to the
subtransport. It does not report the bytes sent on the wire that
are received by the server. This should be a good estimate of
progress (and an improvement over no progress).
|
| | |
|
| | |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
After doing further profiling, I found that a lot of time was
being spent attempting to insert hashes into the file hash
signature when using the rolling hash because the rolling hash
approach generates a hash per byte of the file instead of one
per run/line of data.
To optimize this, I decided to convert back to a run-based file
signature algorithm which would be more like core Git.
After changing this, a number of the existing tests started to
fail. In some cases, this appears to have been because the test
was coded to be too specific to the particular results of the file
similarity metric and in some cases there appear to have been bugs
in the core rename detection code where only by the coincidence
of the file similarity scoring were the expected results being
generated.
This renames all the variables in the core rename detection code
to be more consistent and hopefully easier to follow which made it
a bit easier to reason about the behavior of that code and fix the
problems that I was seeing. I think it's in better shape now.
There are a couple of tests now that attempt to stress test the
rename detection code and they are quite slow. Most of the time
is spent setting up the test data on disk and in the index. When
we roll out performance improvements for index insertion, it
should also speed up these tests I hope.
|
| | |
|
| | |
|
| |
|
|
|
|
|
| |
On Linux: fix a warning message related to the volatile qualifier (cast)
On Windows: use SecureZeroMemory()
On both, inline the call, so that no entry point can lead back to this "secure" memory zeroing.
|
| | |
|