* Adds execution time measurements
* Remove @profile decorator
* Changes the whole algorithm. The old one, while very readable, is a performance bottleneck, especially when comparing two big files. Let's try a more efficient one.
* Uses a copy of SuccessiveLinesLimits in the all_couples collection to avoid modifying the same object when removing successive common lines (in the remove_successive method).
* Remove old algorithm (dead code now)
* Creates the LineSpecifs type to make manipulating stripped lines clearer.
* Adds type hints to the stripped_lines function signature and updates the docstring of the same function.
* LineSetStartCouple is now a regular class (no longer a NamedTuple). This allows defining an __add__ dunder method, which makes operations clearer.
* Adds a __repr__ method to the SuccessiveLinesLimits class and updates its docstring.
* Modifies the way the LinesChunk hash is computed. If the line is not empty, or is empty but corresponds to a docstring, the usual hash is used. Otherwise the hash is randomized, to ensure that two empty lines corresponding to import lines are not considered equal.
* Empty lines that were comments before being stripped are considered equal.
* Reworks the help message to distinguish the options.
* Adds a full-line comment to the test and adapts the expected results.
* ignore-docstrings is True by default, so all docstrings (different or identical) are considered identical.
* In case of multiprocessing, reports the options.
* Simplifies the algorithm and clarifies the use of the options. For now, if something is ignored (docstrings, comments, signatures, imports), it is removed from the stripped lines collection. The LineType is no longer needed. The drawback is that two chunks of lines, one in each file, may be detected as similar (which is correct) while having a different number of lines because, for example, comments were inserted in one of them and comments are ignored.
* CplSuccessiveLinesLimits is no longer a NamedTuple because we added the effective_cm_lines_nb member, which has to be mutable. It holds the number of "true" common lines between both files (i.e. the number of common lines in both stripped lines collections).
* The check_sim function is renamed filter_noncode_lines and checks the similarities on the stripped lines collection (and no longer on the real lines collection). Adds the computation of the effective number of common lines (i.e. the number of "true" common stripped lines).
* Adapts legacy code so that the effective number of common lines is printed (the number of common lines in both stripped lines collections), along with the corresponding component of the first file.
* Updates the expected results so that they contain the effective number of common lines.
* Stripped lines are purged of everything that is ignored (by default, comments and docstrings). Adapts the expected results accordingly.
* By default, comments and docstrings are excluded from the comparison.
* Also prints the ending line number in the report.
* Adapts the expected results to take into account the ending line number
* Takes into account the remarks of Pierre-Sassoulas
* Takes into account the remarks of cdce8p
* The parameters of the SimilarChecker are also read from the configuration in the __init__ method.
Co-authored-by: Pierre Sassoulas <pierre.sassoulas@gmail.com>