-
Notifications
You must be signed in to change notification settings - Fork 273
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Constant-time cardinality? #410
Comments
I would find this useful. I'm not sure how tricky it would be for all operators. Possibly it would be useful to consider a lazy-evaluation type of thing where when we modify the bitmap in a way in which the new cardinality is not simple to compute (ORing two bitmaps for example) we set the cardinality to an invalid value, then when someone invokes cardinality() and the value is invalid we'd compute it then. Of course we have to be able to determine an "invalid" value. Maybe using 0 would be OK; if the value is 0 then we'd check to see if the cardinality is really 0 which should be fast; if not then compute the real cardinality. One potential death-knell for this idea is that it makes cardinality() no longer a read-only operation... it could modify the bitmap metadata. It wouldn't modify the content of the bitmap but it could still be problematic. |
Sometimes we need cardinality to determine whether it is necessary to change container (when ORing two arrays, if the cardinality of the result reaches a threshold, the result should be a bitset). So we need more than an invalid value. Is there a fast method to approximatly estimate cardinality? We can use an estimated value when we need to lower the performance drag, then mark the cardinality as "not precise", and calculate a precise one when cardinality() is called. |
Is there any appetite for being able to provide cardinality in constant time, on both the 32-bit and 64-bit sides? As a customer of the library I feel like I've wanted this.
Right now it's calculated on demand. In
Roaring64Map
, the methodcardinality()
does a full scan of the map every time, and evenisFull()
will do a lot of work if applied to a map with a large dense prefix of full 32-bit bitmaps.isStrictSubset()
also does twocardinality()
calls that are not, from an algorithmic standpoint, strictly necessary, so that's a performance drag. On the 32-bit side, it's not constant-time either, requiring an iteration over the containers and an inner iteration over the run containers if there are any.Providing constant-time cardinality seems possible but not necessarily "easy" so I wanted to check with you first.
I see three approaches here and I wanted to ask your opinion:
isStrictSubset
so it does a more clever subset calculation that doesn't require any calls tocardinality()
The text was updated successfully, but these errors were encountered: