A better compressed bitset in Rust
RoaringTreemap::select
returns None
if n >= len()
by @shekhirin in https://github.com/RoaringBitmap/roaring-rs/pull/264
Full Changelog: https://github.com/RoaringBitmap/roaring-rs/compare/v0.10.2...v0.10.3
RoaringBitmap::remove_smallest/biggests
methodsThanks to @shimatar0, who added those two methods. They remove the n
smallest or biggest values from a RoaringBitmap
. If n
is bigger than the cardinality of the bitmap, it clears it.
let mut rb = RoaringBitmap::from_iter([1, 5, 7, 9, 10, 15, 45]);
rb.remove_biggest(2);
assert_eq!(rb, RoaringBitmap::from_iter([1, 5, 7, 9, 10]));
rb.remove_smallest(1);
assert_eq!(rb, RoaringBitmap::from_iter([5, 7, 9, 10]));
RoaringBitmaps/Treemap
s support run containers when deserializingThanks in part to @josephglanville for the original deserialization code. Bitmaps and treemaps can deserialize run containers which are useful when those come from other libraries from other languages.
Note that this library supports run container operations and only converts the run containers into array or bitmap containers depending on the cardinality and will, therefore, not serialize containers into run ones either.
RoaringBitmap
or RoaringTreemap
from an arrayThanks to @michaelmior, you can use the From::from
trait method to construct a bitmap from an array. Building them using the FromIterator::from_iter
trait method was already possible. It is just even from convenient now.
let bitmap = RoaringBitmap::from([10]);
let treemap = RoaringTreemap::from([1,2,3]);
While doing crate maintenance, I had to update different crates and bump the minimal supported Rust version to 1.65, which was released eight months ago.
This release is a patch in which @jonasspinner fixed the MultiOps
union for Result
iterators which was wrongly using the intersection operations.
In this new v0.10 release, we introduce significant improvements for multiple bitmap operations and many new methods for range operations. You can find the documentation on docs.rs. The only reason it is a breaking release is due to the simd
feature that was updated to match the new types of the std::simd
nightly module.
MultiOps
trait@irevoire based this trait on @saik0's previous work. It brings a new fresh API to help in executing different set operations on a set of RoaringBitmap/Treemap
. With up to 37x improvements on intersection operations and 7x on unions, it clearly makes this PR worth releasing. If you want more numbers, you can look at some benchmarks in the original PR.
The MultiOps
trait is implemented for anything that implements IntoIterator
where Item
is a RoaringBitmap/Treemap
, but there also are try_
versions of it when your Item
is a Result
. It can be very useful when the sets are lazily deserialized, for example.
use roaring::{MultiOps, RoaringBitmap};
let bitmaps = [
RoaringBitmap::from_iter(0..10),
RoaringBitmap::from_iter(10..20),
RoaringBitmap::from_iter(20..30),
];
// Stop doing this
let naive = bitmaps.clone().into_iter().reduce(|a, b| a | b).unwrap_or_default();
// And start doing this instead. It will be much faster!
let iter = bitmaps.union();
assert_eq!(naive, iter);
@RaduBerinde made a huge improvement to the whole set of intersection operations by implementing a better version of the retain method that is now branchless. Thank you very much! This PR does between 1.03x and 1.37x improvement, sometimes slowing down the performances but for the good of all the other benchmarks.
While @Dr-Emann was introducing the new contains_range
and range_cardinality
methods on the RoaringBitmap
type, @not-jan was working on the RoaringTreemap::insert_range
method.
use roaring::{RoaringTreemap, RoaringBitmap};
let mut rb = RoaringTreemap::new();
rb.insert_range(2..4);
assert!(rb.contains(2));
assert!(rb.contains(3));
assert!(!rb.contains(4));
let mut rb = RoaringBitmap::new();
rb.insert_range(2..45_000);
assert!(rb.contains_range(2..30));
assert!(!rb.contains_range(0..3));
assert!(!rb.contains_range(6_000..46_001));
assert_eq!(rb.range_cardinality(6_000..45_001), 39_000);
While I (@Kerollmops) was reviewing the work on the RoaringBitmap::insert_range
I also introduced more methods for full RoaringBitmap
and RoaringTreemap
. You can now create full bitmaps and treemaps.
use roaring::{RoaringTreemap, RoaringBitmap};
let mut rt = RoaringTreemap::full();
assert!(rt.is_full());
assert!(rt.contains(42));
let mut rb = RoaringBitmap::full();
assert!(rb.is_full());
assert!(!rt.is_empty());
@irevoire didn't only work on the MultiOps
trait but also introduced the support of serde! You can use enable it by specifying the serde
feature and then convert your RoaringBitmap
and RoaringTreemap
into any format supported by serde and back into a bitmap.
use roaring::{RoaringTreemap, RoaringBitmap};
let original = RoaringTreemap::from_range(6..42);
let json = serde_json::to_vec(&original).unwrap();
let output = serde_json::from_slice(&json).unwrap();
assert_eq!(original, output);
let original = RoaringBitmap::from_range(16..142);
let buffer = bincode::serialize(&bitmap).unwrap();
let output = bincode::deserialize(&buffer).unwrap();
assert_eq!(bitmap, output);
Hey folks,
Before I start the note on the new release, I'll just introduce what roaring-rs is. It is a Rust idiomatic port of the Roaring Bitmap data structure introduced by Daniel Lemire, G. Ssi-Yan-Kai and O. Kaser. The data structure simply defines better-compressed sets of bits, faster and memory-efficient data structures for performing operations on sets, i.e. intersections, unions... and so on.
So, about this new version of roaring-rs, it is a bit special because it's the version that introduces the most performance improvements in a while and on the most topics. Most of this amazing work was done by @saik0, a new contributor to the crate 🎉. @schmidek implemented DoubleEndedIterator
on the bitmap iterators 💪
union_with
. Use the corresponding operators |=
.deserialize_from
validates inputs. In some cases, it can be 4x slower. For workloads that are heavy in deserialization from trusted sources migrate to deserialize_unchecked_from
.from_sorted_iter
and append
are exponentially faster. They should be preferred over collect
and extend
whenever adding monotonically increasing integers to the set as it's about 2-2.5x faster.Min | Mean | Max | |
---|---|---|---|
Iteration | 6% | 57% | 125% |
And | 0% | 7% | 22% |
Or | 0% | 10% | 33% |
Sub | 0% | 9% | 39% |
Xor | 4% | 90% | 209% |
rank
Returns the number of integers that are lower or equal to a given value. rank(u64::MAX) == len()
.select
Returns the n
th integer in the set or None
if n >= len()
.union_len
, intersection_len
... and so on. Compute the cardinality of a set operation without materializing it. Usually about twice as fast as materializing the bitmap.DoubleEndedIterator
.ExactSizeIterator
(on 64 bit targets only).Min | Mean | Max | |
---|---|---|---|
And | 0% | 34% | 141% |
Or | 2% | 45% | 145% |
Sub | 0% | 51% | 168% |
Xor | 0% | 130% | 437% |
Perf numbers are for pairwise operations on collections of bitmaps from real datasets. Pairwise as in: B1 ∩ B2, B2 ∩ B3, ... BN-1 ∩ BN
Sub::sub
implementation of the internal Store
type (Thanks to @kolstae).insert_range
API to accept the idiomatic RangeBounds<u32>
(thanks to @oliverdding and @tomjw64).Pairs
Iterator type.from_sorted_iter
and append methods to return a Result
(Thanks to @MarinPostma).