-
Notifications
You must be signed in to change notification settings - Fork 85
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Genericize RoaringBitmap and implements Roaring32 and Roaring64 #260
base: main
Are you sure you want to change the base?
Conversation
d8bb37b
to
494327b
Compare
/// A generic compressed bitmap using the [Roaring bitmap compression scheme](https://roaringbitmap.org/). | ||
#[derive(PartialEq)] | ||
pub struct RoaringBitmap<V: Value> { | ||
containers: Vec<container::Container<V>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I worry that using a Vec for RoaringBitmap64 might not be the best idea. For RoaringBitmap32, in the worst case, we're moving around 64k containers, but for RoaringBitmap64, inserting a new container may involve quite a lot of copying, which is why the original used a BTreeMap instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess that's depends on the distribution of your dataset/workload, using a BTreeMap
isn't great either in term locality/pointer chasing.
I would say that for most of the case it should be fine, at least the paper didn't notice large difference in memory usage.
But maybe both approach should be offered?
This will allows to easily implement Roaring64, a 64-bit RoaringBitmap using the TwoLevelRoaringBitmap approach, which is up to 11x faster than the RoaringTreemap approach and reuse most of Roaring32 code. Note that RoaringBitmap::full have been removed because it won't scale for Roaring64. Comment out RoaringTreemap for now, will be replaced in the next commit.
Benchmark results on insert_range (only bench available for 64-bit): group roaring64 treemap ----- --------- ------- from_empty_1000 1.00 87.6±4.76ns 10.6 GElem/sec 1.65 144.2±1.28ns 6.5 GElem/sec from_empty_10000 1.00 166.8±9.32ns 55.8 GElem/sec 1.28 213.6±7.26ns 43.6 GElem/sec from_empty_8589934590 1.01 151.5±0.99ms 52.8 GElem/sec 1.00 149.7±1.00ms 53.5 GElem/sec pre_populated_1000 1.00 139.3±19.83ns 6.7 GElem/sec 1.30 180.8±20.70ns 5.2 GElem/sec pre_populated_10000 1.00 235.4±83.29ns 39.6 GElem/sec 1.26 295.9±106.25ns 31.5 GElem/sec pre_populated_8589934590 1.00 74.8±2.56ms 107.0 GElem/sec 1.01 75.3±1.82ms 106.3 GElem/sec
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much for this big PR 🤗
Would it be possible to make the IntoIter
type public, please? I was trying your new branch and was unfortunately unable to use it. I don't know if there is a flag or something to make sure cargo warns about it...
Done in the last commit |
Roaring32
is the new name ofRoaringBitmap
(which is now the core generic implementation,RoaringBitmap<V>
).Roaring64
is the 64-bit version and replacesRoaringTreemap
.