BitFaster.Caching Release Notes

High performance, thread-safe in-memory caching primitives for .NET

v2.1.2

1 year ago

What's Changed

  • Added an ItemUpdated event for all LRU classes, including the scoped and atomic cache decorators.
  • ConcurrentTLru/FastConcurrentTLru now use a clock based on Environment.TickCount64 for the .NET Core 3 and .NET 6 build targets, instead of Stopwatch.GetTimestamp. The smallest reliable time to live increases from about 1us to about 16ms (so precision is now coarser), but the overhead of the TLRU policy drops significantly, from about 170% to about 20%. This is a good tradeoff, since expiring items in under 16ms is not common. .NET Standard continues to use the previous high-resolution clock, since on .NET Framework only the 32-bit Environment.TickCount is available.
  • On .NET Core 3 and .NET 6, LruBuilder automatically falls back to the previous higher-resolution clock if the specified TTL is less than 32ms (see the sketch after this list).
  • Fixed the atomic cache count and enumeration methods so that partially created items are not visible externally. Count, enumeration and TryGet methods now all return consistent results if a factory delegate throws during item creation.
  • Fixed the atomic cache debug view; all caches now have a consistent debugger experience.
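
To illustrate how the TTL interacts with the clock selection, here is a minimal usage sketch. It assumes the v2 builder API with WithCapacity and WithExpireAfterWrite; exact method names may differ between versions.

```csharp
using System;
using BitFaster.Caching.Lru;

// A TTL below ~32ms should cause the builder to fall back to the
// high-resolution clock; a longer TTL uses the cheaper
// Environment.TickCount64-based clock on .NET Core 3 / .NET 6.
var cache = new ConcurrentLruBuilder<string, string>()
    .WithCapacity(128)
    .WithExpireAfterWrite(TimeSpan.FromMilliseconds(10)) // < 32ms: high-res clock
    .Build();

string value = cache.GetOrAdd("key", k => "value");
```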

Full changelog: https://github.com/bitfaster/BitFaster.Caching/compare/v2.1.1...v2.1.2

v2.1.1

1 year ago

What's Changed

  • Updated CmSketch to use block-based indexing, matching Caffeine. The 64-byte blocks are the same size as an x86 cache line. This scheme exploits the hardware by reducing L1 cache misses, since each increment or frequency call is guaranteed to use data from a single cache line (see the block-indexing sketch after this list).
  • Vectorized the hot methods in CmSketch using AVX2 intrinsics. Combined with block indexing, this is 2x faster than the original implementation in benchmarks and gives 20% better ConcurrentLfu throughput when tested end to end.
  • ConcurrentLfu now caches a running frequency value when comparing frequencies. In the best case this halves the number of sketch frequency calls, improving throughput.
  • Unrolled the loop in CmSketch.Reset, reducing reset execution time by about 40%. Reset is called periodically, so this reduces worst-case rather than average ConcurrentLfu maintenance time.
  • Implemented a ThrowHelper invoked from all exception call sites, reducing the size of the generated asm (see the ThrowHelper sketch after this list). Also eliminated an unnecessary throw from the ConcurrentLfu hot path, a minor latency reduction in benchmarks.
  • Increased the ConcurrentLru cycle count when evicting items, preventing runaway growth when stress tested on AMD CPUs.
  • ConcurrentLfu now disposes items that were created but not cached when races occur during GetOrAdd.
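
To make the block-based indexing concrete, here is a minimal count-min sketch with 4-bit counters in which all four counters for an item live inside one 64-byte block: each hash selects a long within the block and a 4-bit slot within that long, so every operation touches exactly one cache line. This is an illustrative simplification, not BitFaster.Caching's actual CmSketch code.

```csharp
using System;

// Illustrative block-indexed count-min sketch. The table is an array of
// longs; each block is 8 longs = 64 bytes, matching an x86 cache line.
public sealed class BlockSketch
{
    private readonly long[] table;
    private readonly int blockMask;

    public BlockSketch(int blocks) // blocks must be a power of two
    {
        table = new long[blocks * 8];
        blockMask = blocks - 1;
    }

    public void Increment(int hash)
    {
        int block = (hash & blockMask) * 8; // first long of the block
        for (int i = 0; i < 4; i++)
        {
            int h = Rehash(hash, i);
            int index = block + (h & 7);        // long within the block
            int offset = ((h >> 3) & 15) << 2;  // one of 16 4-bit slots
            long mask = 0xFL << offset;
            if ((table[index] & mask) != mask)  // saturate at 15
                table[index] += 1L << offset;
        }
    }

    public int EstimateFrequency(int hash)
    {
        int block = (hash & blockMask) * 8;
        int min = 15;
        for (int i = 0; i < 4; i++)
        {
            int h = Rehash(hash, i);
            int index = block + (h & 7);
            int offset = ((h >> 3) & 15) << 2;
            min = Math.Min(min, (int)((table[index] >> offset) & 0xF));
        }
        return min;
    }

    // Cheap per-row rehash so the four counters land in different slots.
    private static int Rehash(int hash, int i) =>
        (int)(((uint)hash * (uint)(0x9E3779B9 + 2 * i + 1)) >> 16);
}
```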
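
The ThrowHelper technique moves the throw statement out of the calling method, so the caller stays small enough for the JIT to inline. A minimal sketch of the general pattern, with hypothetical names (not the library's exact helper):

```csharp
using System;
using System.Diagnostics.CodeAnalysis;

internal static class ThrowHelper
{
    // Keeping the 'throw' here, rather than at the call site, shrinks the
    // generated code in the caller; [DoesNotReturn] tells the compiler
    // that control never returns from this method.
    [DoesNotReturn]
    public static void ThrowArgumentOutOfRange(string paramName) =>
        throw new ArgumentOutOfRangeException(paramName);
}

public static class Guard
{
    public static void Positive(int value, string paramName)
    {
        if (value <= 0)
            ThrowHelper.ThrowArgumentOutOfRange(paramName); // cold path
    }
}
```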

Full changelog: https://github.com/bitfaster/BitFaster.Caching/compare/v2.1.0...v2.1.1

v2.1.0

1 year ago

What's Changed

  • Added ConcurrentLfu, a .NET implementation of the W-TinyLfu admission policy. This closely follows the approach taken by Ben Manes's Caffeine library, including buffered reads/writes and hill climbing to optimize hit rate. A ConcurrentLfuBuilder provides integration with the existing atomic value factory and scoped value features (see the usage sketch after this list).
  • To support ConcurrentLfu, added the MpscBoundedBuffer and StripedMpscBuffer classes.
  • To support ConcurrentLfu, added the ThreadPoolScheduler, BackgroundThreadScheduler and ForegroundScheduler classes.
  • Added the Counter class for fast concurrent counting, based on LongAdder by Doug Lea (see the striped counter sketch after this list).
  • Updated ConcurrentLru to use Counter for all metrics and added padding to the internal queue counters. This improved throughput by about 2.5x, at the cost of about 10% worse latency.
  • Added DebuggerTypeProxy types to customize the debugger view of FastConcurrentLru, ConcurrentLru, FastConcurrentTLru, ConcurrentTLru and ConcurrentLfu.
  • API documentation is now included in the NuGet package, and all public APIs are documented.
  • Rewrote the throughput analysis tests, correcting bugs; they now support Read, Read + Write, Update and Evict scenarios.
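
A minimal usage sketch for ConcurrentLfu, assuming it shares the ICache surface of the LRU classes and takes a capacity in its constructor:

```csharp
using BitFaster.Caching.Lfu;

// Create a W-TinyLfu cache and fetch a value, creating it on a miss.
var lfu = new ConcurrentLfu<string, byte[]>(capacity: 128);
byte[] value = lfu.GetOrAdd("key", k => new byte[1024]);
```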
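
To illustrate the LongAdder idea behind Counter: writers increment one of several cache-line-spaced slots chosen from the thread id, and readers sum the slots, so concurrent writers rarely contend on the same cache line. A simplified sketch, not the library's implementation (real LongAdder also grows the number of cells under contention):

```csharp
using System;
using System.Threading;

public sealed class StripedCounter
{
    // One counter per stripe, spaced 8 longs (64 bytes) apart so each
    // stripe occupies its own cache line and writers don't false-share.
    private readonly long[] cells;
    private readonly int mask;

    public StripedCounter(int stripes = 16) // must be a power of two
    {
        cells = new long[stripes * 8];
        mask = stripes - 1;
    }

    public void Increment()
    {
        int stripe = Environment.CurrentManagedThreadId & mask;
        Interlocked.Increment(ref cells[stripe * 8]);
    }

    public long Count()
    {
        long sum = 0;
        for (int i = 0; i <= mask; i++)
            sum += Volatile.Read(ref cells[i * 8]);
        return sum;
    }
}
```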

Full changelog: https://github.com/bitfaster/BitFaster.Caching/compare/v2.0.0...v2.1.0

v2.0.0

1 year ago

What's Changed

  • Split ICache into ICache, IAsyncCache, IScopedCache and IScopedAsyncCache interfaces. Mixing sync and async code paths is problematic and generally discouraged. Splitting sync/async enables the most optimized code for each case. Scoped caches return Lifetime<T> instead of values, and internally have all the boilerplate code to safely resolve races.
  • Added ConcurrentLruBuilder, providing a fluent builder API to ease creation of different cache configurations. Each cache option comes with a small performance overhead; the builder enables the developer to choose the exact combination of options needed, without any penalty from unused features (see the sketch after this list).
  • Cache interfaces have optional metrics, events and policy objects depending on the options chosen when constructing the cache.
  • Implemented optional support for atomic GetOrAdd methods (configurable via ConcurrentLruBuilder), mitigating cache stampede.
  • ConcurrentLru now has configurable hot, warm and cold queue sizes via ICapacityPartition. The default partition scheme changed from equal thirds to 80% warm via FavorWarmPartition, improving hit rate across all tests.
  • Fixed ConcurrentLru warmup, allowing items to enter the warm queue until warm is full. Improves hit rate across all tests.
  • Added hit rate analysis tests using real-world traces from the Wikibench, ARC and glimpse workloads. This replicates the test suite used for Java's Caffeine.
  • Async get methods now return ValueTask, reducing memory allocations.
  • Added eviction count to cache metrics.
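
A minimal sketch of the builder and scoped lifetime API, assuming the v2 method names WithAtomicGetOrAdd, AsScopedCache and ScopedGetOrAdd; exact names and ordering may vary:

```csharp
using System;
using BitFaster.Caching;
using BitFaster.Caching.Lru;

// Opt in to exactly the features needed: atomic value creation
// (mitigates cache stampede) and scoped IDisposable handling.
var cache = new ConcurrentLruBuilder<int, Connection>()
    .WithCapacity(128)
    .WithAtomicGetOrAdd()
    .AsScopedCache()
    .Build();

// The Lifetime keeps the value alive for the duration of the using
// block, even if the item is evicted (and disposed) in the meantime.
using (var lifetime = cache.ScopedGetOrAdd(1, k => new Scoped<Connection>(new Connection())))
{
    Connection connection = lifetime.Value;
    // use connection ...
}

public sealed class Connection : IDisposable
{
    public void Dispose() { /* release the underlying resource */ }
}
```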

Full changelog: https://github.com/bitfaster/BitFaster.Caching/compare/v1.1.0...v2.0.0

v1.1.0

1 year ago

What's Changed

  • Added Trim(int itemCount) to ICache and all derived classes (see the sketch after this list)
  • Added TrimExpired() to TLRU classes
  • When an item is updated, TLRU classes reset the item timestamp, extending its TTL
  • Fixed the TLRU long ticks policy on macOS
  • Added Cleared and Trimmed to ItemRemovedReason
  • The item removal event that fires on Clear() now reports ItemRemovedReason.Cleared instead of Evicted
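
A quick usage sketch of the new trim methods (hypothetical values; the constructor shape, capacity plus TTL, is assumed from the 1.x API):

```csharp
using System;
using BitFaster.Caching.Lru;

// Capacity 9, TTL 5 minutes.
var tlru = new ConcurrentTLru<int, string>(9, TimeSpan.FromMinutes(5));

for (int i = 0; i < 9; i++)
    tlru.GetOrAdd(i, k => k.ToString());

tlru.Trim(3);        // remove up to 3 of the coldest items
tlru.TrimExpired();  // remove only items whose TTL has elapsed
```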

Full changelog: https://github.com/bitfaster/BitFaster.Caching/compare/v1.0.7...v1.1.0

v1.0.7

2 years ago

Added diagnostic features to dump cache contents:

  • The ClassicLru/ConcurrentLru family implements IEnumerable<KeyValuePair<K,V>>, enabling enumeration of the keys and values in the cache.
  • The ClassicLru/ConcurrentLru family exposes an ICollection<K> Keys property, enabling enumeration of the keys in the cache (example below).
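
For example, a hypothetical diagnostic dump of a populated cache:

```csharp
using System;
using BitFaster.Caching.Lru;

var cache = new ConcurrentLru<int, string>(3);
cache.GetOrAdd(1, k => "one");
cache.GetOrAdd(2, k => "two");

// Dump key/value pairs, then just the keys.
foreach (var kvp in cache)
    Console.WriteLine($"{kvp.Key} = {kvp.Value}");

foreach (var key in cache.Keys)
    Console.WriteLine(key);
```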

v1.0.6

2 years ago
  • Implemented the ItemRemoved event on ConcurrentLru and ConcurrentTLru (sketch below)
  • The NuGet package is now produced with a deterministic build. See https://reproducible-builds.org/ for more information
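
A sketch of subscribing to the event, assuming it is exposed directly on the cache class (as in the 1.x API) with event args carrying Key, Value and Reason; the exact signature is an assumption:

```csharp
using System;
using BitFaster.Caching.Lru;

var cache = new ConcurrentLru<int, string>(9);

// Hypothetical handler: log every removal with its reason (e.g. Evicted).
cache.ItemRemoved += (sender, e) =>
    Console.WriteLine($"Removed {e.Key}={e.Value} because {e.Reason}");
```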

v1.0.5

2 years ago
  • Wrapped the LRU dispose logic in a static class
  • Objects created by the LRU valueFactory, but not added to the cache due to a race, are now disposed
  • Fixed a race in disposed Scoped instances
  • The .NET 6 build target is now compiled using .NET SDK 6.0.100

v1.0.4

2 years ago
  • Added a Clear method to ICache and the LRUs (ClassicLru, ConcurrentLru, ConcurrentTLru)
  • Fixed ConcurrentLru so that capacity is exact when the constructor argument is not divisible by 3
  • Added a net6 build target (compiled with .NET SDK 6.0.0-preview.7)
  • Removed the netcoreapp2.0 and netcoreapp3.0 targets (now out of support)

v1.0.3

3 years ago
  • Fixed an ObjectDisposedException in SingletonCache.Release()
  • Old values are now disposed when replaced via the LRU methods TryUpdate() and AddOrUpdate()
  • The LRU now uses an atomic ConcurrentDictionary method to remove items (see the sketch below)
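
The standard atomic-remove idiom for ConcurrentDictionary is to cast to ICollection<KeyValuePair<K,V>> and call Remove with both key and value; the entry is removed only if the value still matches, which prevents racing threads from deleting a newer value. A minimal sketch of that idiom (not the library's exact code):

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;

public static class AtomicRemove
{
    // Removes the entry only if it currently maps key -> expectedValue.
    // Returns false if another thread already replaced or removed it.
    public static bool TryRemoveIfEqual<K, V>(
        ConcurrentDictionary<K, V> dictionary, K key, V expectedValue)
        where K : notnull
    {
        var collection = (ICollection<KeyValuePair<K, V>>)dictionary;
        return collection.Remove(new KeyValuePair<K, V>(key, expectedValue));
    }
}
```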