Cpp TimSort Versions Save

A C++ implementation of timsort

v3.0.0

3 months ago

cpp-TimSort 3.0.0 is C++20-only, as well as a breaking release to better conform to the new constrained algorithm interface introduced by the work on standard ranges. The following branches are not going away, though they won't be maintained anymore unless support is explicitly asked through issues:

  • 1.x.y: C++03 branch.
  • 2.x.y (previously master): C++11 branch.

Breaking changes

Due to the use of standard library concepts to constrain the interface of gfx::timsort and gfx::timmerge, code that used to compile with the 2.x.y branch might not compile anymore with the 3.x.y. One notable such case is when trying to pass iterators that do not implement postfix operator++ and operator--: older branches used to support such iterators, but C++20 iterator concepts require these functions to exist and to return an iterator.

Other such breaking changes might occur, though the case described above should be the only that was explicitly supported. Other breaks can occur for types and use cases that were accidentally supported.

New features & improvements

Switching to C++20 allowed to take advantage of all the new library features that were introduced along ranges:

  • As previously mentioned, the functions timsort and timmerge are now constrained with concepts, notably std::random_access_iterator, std::ranges::random_access_range and std::sortable.
  • timmerge gained a new overload taking range and a middle iterator.
  • Sentinel support was added via std::sentinel_for: the last parameter does not have to be an iterator anymore, and can be any sentinel type compatible with the first iterator. Sorted ranges can also return an iterator/sentinel pair via their begin/end functions.
  • timsort and timmerge now return the one-past-the-end iterator of the sorted/merged range.
  • Temporary ranges are now supported, allowing to pass types such as std::span. If said temporary range does not model a borrowed range, then std::ranges::dangling is returned instead of the one-past-the-end iterator.
  • Proxy iterators are supported via consistent use of std::ranges::iter_move and std::ranges::iter_swap. This does not make a big difference for ibrary types in C++20, thought it means that timsort and timmerge can handle types such as std::vector<bool> or std::ranges::views::zip correctly in C++23.
  • The default comparison and projection callables were changed to std::ranges::less and std::identity respectively.
  • Overall, gfx::timsort and gfx::timmerge should now act as drop-in replacements for std::ranges::sort and std::ranges::inplace_merge respectively. The main difference being that they require O(n) extra memory, they do not have a fallback when such memory is unavailable.

Most of the changes to timsort and timmerge are related to their interface and usability, though one change might also make the algorithm faster in some specific scenarios: there is now first-class support for projections instead of baking it into the comparison function like version 2.x.y did, which means that the number of projections performed by the algorithm is now sometimes smaller than twice the number of comparisons. This might make the algorithm faster when projections are expensive.

Bar plot showing the numbers of projections performed by timsort versions 2.x.y and 3.x.y over an std::vector<double> of a million elements, with different data patterns

Tooling

  • Switching to a newer CMake version allowed to take advantage of new features, though it should not make a big difference to end users. A notable difference is the removal of the vendored DownloadProject dependency, replaced with the standard ExternalProject mechanism.

  • The test suite would sometimes report errors due to the passed RNG seed starting with a 0. This does not happen anymore.

  • The test suite now uses version 3 of the Catch2 testing framework.

  • CI: images, compilers and tools were upgraded to newer versions, sufficiently recent to support the new C++20 features.

  • The test suite isn't built with -Winline anymore: it was needlessly noisy, especially considering that the library does not use inline as an optimization hint.

v2.1.0

3 years ago

This release is mostly the work of @vedgy, thanks a lot to him! It brings the following changes:

  • New: a new function, gfx::timmerge is now available. It implements the merge algorithm used by timsort, and can be used as a drop-in replacement for std::inplace_merge. The most notable difference with the standard library function is that it can't fallback to an O(n log n) merge algorithm when no extra memory is available. Just like gfx::timsort, it supports projections as well as iterators that don't implement postfix ++ and -- iterators.
  • New: a new configuration macro, GFX_TIMSORT_ENABLE_AUDIT , can be defined to ask gfx::timsort and gfx::timmerge to perform more expensive checks than the ones performed when defining GFX_TIMSORT_ENABLE_ASSERT. These audits are disabled by default and only meant for debugging purposes since they can slow down the algorithms significantly.
  • Tests and benchmarks were completed to include the new timmerge function.
  • The project's GitHub wiki now contains a page with additional benchmark results.

v1.3.0

3 years ago

This release is mostly the work of @vedgy, thanks a lot to him! It brings the following new features:

  • A new function, gfx::timmerge is now available. It implements the merge algorithm used by timsort, and can be used as a drop-in replacement for std::inplace_merge. The most notable difference with the standard library function is that it can't fallback to an O(n log n) merge algorithm when no extra memory is available. Just like gfx::timsort, it supports projections as well as iterators that don't implement postfix ++ and -- iterators.
  • A new configuration macro, GFX_TIMSORT_ENABLE_AUDIT , can be defined to ask gfx::timsort and gfx::timmerge to perform more expensive checks than the ones performed when defining GFX_TIMSORT_ENABLE_ASSERT. These audits are disabled by default and only meant for debugging purposes since they can slow down the algorithms significantly.

v2.0.2

3 years ago

Minor release, mostly motivated by tooling changes:

  • Made functions TimSort::gallopLeft() and TimSort::gallopRight() static (#35, thanks @vedgy).
  • Changed the Catch2 branch downloaded by the test suite from master to v2.x - the project's master branch doesn't exist anymore (#34, thanks @vedgy).
  • Moved the continuous integration from Travis CI to GitHub Actions. This change means that the oldest tested compilers are a bit more recent than they used to be because of the differences between the old and new virtual environments - gfx::timsort should still run perfectly on older compilers nonetheless. The combinations of configurations tested in continuous integration are different but roughly equivalent to the old ones in terms of coverage.

v1.2.2

3 years ago

Minor release, mostly motivated by tooling changes:

  • Made functions TimSort::gallopLeft() and TimSort::gallopRight() static (#35, thanks @vedgy).
  • Moved the continuous integration from Travis CI to GitHub Actions. This change means that the oldest tested compilers are a bit more recent than they used to be because of the differences between the old and new virtual environments - gfx::timsort should still run perfectly on older compilers nonetheless. The combinations of configurations tested in continuous integration are different but roughly equivalent to the old ones in terms of coverage.
  • Added instructions to the README to install the library via Conan.

v2.0.1

3 years ago

Minor release in order to make the small changes made up to now available:

  • Changed loop condition in minRunLength() to match the one in the original CPython implementation (issue #33, thanks @weeyizhi).
  • Fix the project version number in CMakelists.txt.
  • Change the Conan badge in the README to point to https://conan.io/.

v1.2.1

3 years ago

Minor release in order to make the small changes made up to now available:

  • Changed loop condition in minRunLength() to match the one in the original CPython implementation (issue #33, thanks @weeyizhi).
  • Fix the project version number in CMakelists.txt.
  • Add link and instructions to get the 1.x version with Conan.

v2.0.0

4 years ago

cpp-TimSort release 2.0.0 is C++11-only: some amount of C++03 support will be maintained in the branch 1.x.y if needed, but most relevant development will now happen in the 2.x.y branch. This versions branches off 1.2.0, so every item in the changelog below describes changes that happened since this specific release.

C++11 doesn't bring many new features to the table for gfx::timsort, so most of the changes are linked to tooling:

  • New: gfx::timsort now accepts range arguments to sort a whole collection.
  • Move semantics are used unconditionally, which means that GFX_TIMSORT_USE_STD_MOVE doesn't do anything anymore, and that the code should be easier to read.
  • The tests now use Catch 2.x instead of 1.x, which allows for better tooling around them.
  • The tests in Debug mode are now run through Valgrind on OSX.
  • Tests have been added for generalized callables support.

v1.2.0

4 years ago

Apparently I was lying about the previous release being the last C++03 one before C++11 for here is another one!

This release includes the following changes:

  • When std::invoke is available (C++17 and later), it is used to invoke the comparison and projection functions. Therefore the following code is now valid:
    struct Person {
        std::string name;
        int age;
    };
    
    // Sort a vector of persons by age
    std::vector<Person> vec;
    // ... fill vec ...
    gfx::timsort(vec.begin(), vec.end(), std::less{}, &Person::age);
    
  • gfx::timsort now accepts iterators that don't have a conforming postfix operator++ or operator--, or that simply don't provide these operations. Only the prefix versions of these operators are used by the algorithm.
  • <windows.h> can now be included alongside <gfx/timsort.hpp> without triggering issues because of the min() and max() macros defined by the former.

v1.1.0

4 years ago

This is the last proper C++03 release before moving to C++11: from now on development will mostly target the 2.x version. That said the 1.x branch won't go totally unmaintained either: bugs will still be fixed and new features can still be backported if they are easy enough to implement in C++03.

Projections support

gfx::timsort now supports projections. Originally introduced in the Adobe Source Libraries (ASL), then later picked up by Eric Niebler's for range-v3 before making it all the way to the Ranges TS and C++20, projections are transformations that algorithms apply before examining the values of elements; in timsort a projection is applied to each element to be compared prior to a comparison.

Consider that you've got a collection of strings and want to sort it by string size instead of string contents. You could use a projection as follows:

#include <string>
#include <vector>
#include <gfx/timsort.hpp>

size_t len(const std::string& str) {
    return str.size();
}

// Sort a vector of strings by length
std::vector<std::string> vec;
// ... fill vec ...
gfx::timsort(vec.begin(), vec.end(), std::less<size_t>(), &len);

In C++20 projections (but also comparisons) can be pretty much anything satisfying the std::invocable concept, which means all function-like objects and even pointer to members. The support in gfx::timsort is a bit rougher and projections can only be instances of types callable with parentheses.

Miscellaneous changes

The following changes were also applied to the project:

  • Assertions with an assert(xx && yy); pattern were split into assert(xx); assert(yy); to provide more precise diagnostics when something fails.

  • When running the testsuite in debug mode through CMake, the flag -Og is now passed to GCC instead of -O0. It was always the intended behaviour but didn't work because of a case bug.

  • Some useless files and configuration were removed from the project in an attempt to clean it a bit.