Primecount Versions Save

🚀 Fast prime counting function implementations

v6.1

3 years ago

The main focus of this release has been on polishing the code and improving the documentation. I also tried many things to improve the scaling on servers with a large number of CPU cores, however I only achieved minor speed ups. The only meaningful improvement is that the same threads are now reused throughout the entire AC computation. This improves the scaling for small to medium sized computations up to 1020. GCC benefits most from this change whereas Clang performance is mostly unchanged.

  • Xavier Gourdon's algorithm has been distributed using MPI so that computations can now run on HPC clusters.
  • CMakeLists.txt: New WITH_JEMALLOC option (default OFF).
  • AC.cpp: Reuse the same threads throughout the computation.
  • AC.cpp: Improve upper bound of C2 formula.
  • AC.cpp: Avoid branch inside hot loop of A formula.
  • SegmentedPiTable.cpp: Reuse threads from AC.cpp.
  • LoadBalancerP2.cpp: New load balancer for P2 & B formulas.
  • phi.cpp: Reduce caching for tiny numbers.
  • generate_phi.hpp: Reduce caching for tiny numbers.
  • pod_vector.hpp: Like std::vector, but without default initialization (useful when allocating 100s of GiB).
  • PiTable.cpp: Multi-threaded initialization.
  • Status.cpp: Avoid thread synchronization when printing in order to improve scaling of AC and S2_easy.
  • Status.cpp: Improve S2_hard & D status accuracy.
  • StatusAC.cpp: More accurate status for AC formula.
  • cmdoptions.cpp: Add -B & -D options.

v6.0

4 years ago

This version fixes a scaling issue that has been present in primecount since the first version. The computation of the hard special leaves (S2_hard & D formulas) used a flawed optimization that deteriorated the runtime complexity of the algorithm. The Special-Leaves.md document contains more information. The new version should run much faster for large computations ≥ 10^23.

  • Sieve.cpp: Implemented O(sqrt(sieve_size)) counting, previously counting was O(sieve_size).
  • AC_libdivide.cpp: Up to 20% faster using GCC.
  • S2_easy_libdivide.cpp: Up to 20% faster using GCC.
  • primecount.h: New C API. primecount now has both a C API (primecount.h) and a C++ API (primecount.hpp).
  • LoadBalancer.cpp: Refactor using new ThreadSettings struct.
  • find_optimal_alpha_*.sh: New scripts that find optimal alpha tuning factors using the Linux perf tool.

v6.0-backup

4 years ago

See the ChangeLog for what's new.

v5.3

4 years ago

libprimesieve has been updated to the latest version which provides a minor speedup for all formulas that use primesieve::iterator e.g. P2, B, pi_lehmer, ...

  • Sieve.hpp: Use NOINLINE.
  • S2Status.hpp: Use NOINLINE.
  • D4.cpp: Simplify bounds.
  • S2_hard.cpp: Simplify bounds.
  • LoadBalancer.cpp: Simplify bounds.
  • LoadBalancer.cpp: Improve thread load balancing.
  • RiemannR.cpp: Add support for __float128 type.
  • popcnt.hpp: Faster ARM64 popcount implementation.
  • fast_div.hpp: Add ENABLE_DIV32 macro.
  • S2Status.cpp: Fix non-critical data race.

v5.3-backup

4 years ago

See the ChangeLog for what's new.

v5.2

4 years ago

This release adds a man page for the primecount binary and contains other documentation improvements e.g. the new BUILD.md contains a section that explains how to package primecount for Linux distros.

See the ChangeLog for more details.

v5.2-backup

4 years ago

I have now implemented Xavier Gourdon's algorithm in the primecount-backup version which runs up to 2x faster than the Deleglise-Rivat algorithm. See the ChangeLog for more information.

v5.1

4 years ago

The memory usage of Xavier Gourdon's algorithm has been reduced from O(x^(1/2)) to O(x^(1/3) * log^3 x). Xavier Gourdon's algorithm is now fully implemented and luckily it scales well up to a very large number of CPU cores. Xavier Gourdon's algorithm is now enabled by default for numbers > 10^7.

  • main.cpp: New options: --AC, --B, --D, --Phi0, --Sigma.
  • SegmentedPiTable.cpp: New PiTable implementation.
  • AC.cpp: Combine the A & C formulas to improve scaling.
  • D4.cpp: Tighter bounds, minor speedup.
  • Sigma.cpp: Reduce initialization overhead.
  • Sieve.cpp: Improved pre-sieving.
  • S2_hard.cpp: Improved pre-sieving.
  • print.cpp: Updated for Gourdon's algorithm.

C++ API Changes

In order to simplify the API the following functions have been removed from the primecount.hpp header:

int64_t pi_deleglise_rivat(int64_t x);
int64_t pi_legendre(int64_t x);
int64_t pi_lehmer(int64_t x);
int64_t pi_lmo(int64_t x);
int64_t pi_meissel(int64_t x);
int64_t pi_primesieve(int64_t x);

You should use pi(x) instead which is an alias for the fastest prime counting function implementation in primecount.

v5.0

4 years ago

I have finally implemented Xavier Gourdon's prime counting algorithm! Xavier Gourdon's algorithm is an improved version of the Deleglise-Rivat algorithm. According to my benchmarks Gourdon's algorithm runs up to 2x faster than the Deleglise-Rivat algorithm.

  • src/gourdon: Implementation of Xavier Gourdon's algorithm.
  • LoadBalancer.cpp: Updated for Gourdon's algorithm.
  • primecount.cpp: Updated for Gourdon's algorithm.
  • print.cpp: Updated for Gourdon's algorithm.
  • generate.cpp: Generate array with largest prime factors.
  • test.cpp: Test Gourdon's algorithm.
  • README.md: Update benchmarks.

v4.8

4 years ago

See the ChangeLog for what's new.