Try migration to dragonfly #269

alexey-yarmosh · 2023-01-03T15:50:07Z

It seems that all the Redis JSON operations we use (get, set, strappend) are now implemented in Dragonfly (https://github.com/dragonflydb/dragonfly/releases/tag/v0.12.0). Since that was a main blocker for using Dragonfly, we can try migrating to it and measure the performance difference.

Questions to answer:

Does our API work correctly after the migration?
Do utility tools work fine (monitoring, integration with newrelic, db management tools)?
What is the perf diff?

MartinKolarik · 2023-03-14T15:17:13Z

I don't think this is necessary at all, because Redis can scale just as well or better in cluster mode; we just don't use it yet. The client library supports clustering, so with small config changes we should be able to utilize all cores with Redis too, without switching to an entirely different DB.

The benchmarks at https://redis.com/redis-enterprise/technology/linear-scaling-redis-enterprise/ suggest Redis achieves almost 100% linear scaling, while Dragonfly showcases a "25x speedup", but that's on a 32-core / 64-thread server. So, at least at first sight, it scales worse than Redis.
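To make that comparison concrete, here is a rough calculation. It assumes the quoted "25x" is a speedup over a single-threaded baseline, which the benchmark page may or may not mean, and the helper name `scalingEfficiency` is made up for illustration.

```typescript
// Scaling efficiency = observed speedup / number of parallel units.
// 1.0 means perfectly linear scaling.
function scalingEfficiency(speedup: number, parallelUnits: number): number {
  if (parallelUnits <= 0) {
    throw new RangeError("parallelUnits must be positive");
  }
  return speedup / parallelUnits;
}

// Assumption: "25x" is relative to one thread on the same 32-core / 64-thread box.
const perCore = scalingEfficiency(25, 32);   // 0.78125
const perThread = scalingEfficiency(25, 64); // 0.390625

console.log(`per core: ${(perCore * 100).toFixed(1)}%, per thread: ${(perThread * 100).toFixed(1)}%`);
```

Under that assumption, the result is roughly 78% of linear per physical core, or 39% per hardware thread, compared to the "almost 100%" claimed on the Redis page.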

jimaek · 2023-03-14T15:19:17Z

Clustering Redis reliably is a pain, especially on the OSS version.
So I am 100% for migrating to Dragonfly.

MartinKolarik · 2024-12-28T02:22:44Z

I looked into this more closely. The two main reasons for a possible migration are the zero-setup scaling and the new SSD Data Tiering. On paper, switching to Dragonfly really seems like the easiest way to deal with scaling both the load and the storage requirements.

Unfortunately, the way Dragonfly presents itself is somewhat misleading. It may be "almost Redis", but it is definitely not a "drop-in Redis replacement". It looks like they copied the Redis documentation 1:1 but didn't test against it very well and, in some cases, didn't even implement some of the documented features. Just running our test suite revealed three bugs:

SHARDNUMSUB option of PUBSUB command is not supported, despite being documented (Additional PUBSUB options dragonflydb/dragonfly#847)
several errors in handling JSON.SET (value silently not stored, wrongly reported "syntax error") https://dl.dropboxusercontent.com/scl/fi/x0mnru4igixxw9nv3sqlw/2024-12-27_21-52-20.png?rlkey=j8bzl1xcq8xudmgm55sco0cms&dl=0 (Wrong handling of JSON paths in brackets dragonflydb/dragonfly#4381)
GEOSEARCH response has the wrong format (matches neither real Redis nor Dragonfly's own docs) (Wrong GEOSEARCH response format dragonflydb/dragonfly#4382)

Given the nature of these bugs, I have no confidence that we wouldn't hit more if we started to use it, as their tests clearly don't cover lots of stuff.

Additionally:

some commands that we use indirectly via our dependencies like SPUBLISH and SSUBSCRIBE are not yet supported (Add support for SPUBLISH dragonflydb/dragonfly#3001),
there are some intentional differences in the default behavior. For example, Dragonfly does not allow accessing undeclared keys in Lua scripts (dragonflydb/dragonfly#272), and even though that is configurable, changing it "makes multi-threaded Dragonfly less efficient than single-threaded Redis Core when calling EVALs at high throughput".
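The key-declaration difference can be illustrated with a small sketch. The script bodies, the key name `gp:counter`, and the helper `buildEvalCommand` are all hypothetical and not taken from our codebase. Only the `EVAL script numkeys key [key ...] arg [arg ...]` argument order is the standard command shape.

```typescript
// Hard-coded key: standalone Redis tolerates this, but the key is "undeclared",
// which is what Dragonfly rejects by default according to the issue linked above.
const undeclaredScript = `return redis.call('GET', 'gp:counter')`;

// Same logic, but the key is passed in and read from KEYS, so it is declared up front.
const declaredScript = `return redis.call('GET', KEYS[1])`;

// Builds the raw argument list for EVAL: script, number of keys, keys, then extra args.
function buildEvalCommand(script: string, keys: string[], args: string[] = []): string[] {
  return ["EVAL", script, String(keys.length), ...keys, ...args];
}

const safe = buildEvalCommand(declaredScript, ["gp:counter"]);
const unsafe = buildEvalCommand(undeclaredScript, []);

console.log(safe);   // numkeys is "1" and the key follows the script
console.log(unsafe); // numkeys is "0": the server cannot know which key the script touches
```

Declaring every key this way also matters for Redis Cluster, where the server needs the key list to route the script to the right node.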

The missing commands could be worked around on our side by forking and editing the affected packages, but the other bugs are more serious, and even if they got fixed soon, I wouldn't trust Dragonfly enough to use it as one of the main components of our system.

For scaling, proper cluster configuration of redis is going to be the best option. On a single server, it shouldn't cause any big issues, and it'll also resolve our problem with slow RDB loading on startup. The performance is likely going to be better than Dragonfly's.
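One thing to keep in mind with cluster mode is that keys are spread over 16384 hash slots, and multi-key commands or scripts only work when all their keys land in the same slot. Below is a minimal sketch of the slot calculation as described in the Redis Cluster specification (CRC16/XMODEM of the key, or of the `{hash tag}` if present, modulo 16384). The function names are made up and this is not code from our API or from the client library.

```typescript
// CRC16/XMODEM: polynomial 0x1021, initial value 0, no reflection.
function crc16(bytes: Uint8Array): number {
  let crc = 0;
  for (let i = 0; i < bytes.length; i++) {
    crc ^= bytes[i] << 8;
    for (let bit = 0; bit < 8; bit++) {
      crc = (crc & 0x8000) ? ((crc << 1) ^ 0x1021) & 0xffff : (crc << 1) & 0xffff;
    }
  }
  return crc;
}

// If the key contains "{...}" with at least one character between the first "{"
// and the next "}", only that part is hashed. This lets related keys share a slot.
function keyHashSlot(key: string): number {
  let hashed = key;
  const open = key.indexOf("{");
  if (open !== -1) {
    const close = key.indexOf("}", open + 1);
    if (close > open + 1) {
      hashed = key.slice(open + 1, close);
    }
  }
  return crc16(new TextEncoder().encode(hashed)) % 16384;
}

console.log(keyHashSlot("foo"));
console.log(keyHashSlot("{user1000}.following") === keyHashSlot("{user1000}.followers"));
```

If we go this way, any keys that our scripts touch together would need a shared hash tag, otherwise the cluster rejects the call with a cross-slot error.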

As for long-term storage of measurement results, we can implement it relatively easily by using a separate DB (MariaDB) or even object storage (S3) for finished measurements. Alternatively, we can consider this a bit later as part of #291, since "long term storage" is required there as well.

jimaek · 2024-12-28T09:52:56Z

Adding yet another DB to manage (MariaDB) is not ideal; the current system is much simpler. And a timeseries DB for non-timeseries long-term data storage is also not ideal. S3 sounds better as there is nothing to manage, but I worry about the costs related to requests and bandwidth.

I guess we could try Redis clustering within a single server, see how it goes, and then consider moving measurements. But I have little hope for OSS Redis; they keep adding the best stuff to the paid version.

alexey-yarmosh mentioned this issue Mar 9, 2023: Use multiple cores by the API #224 (Closed)

MartinKolarik mentioned this issue Dec 28, 2024: fix: correctly declare keys in redis scripts #595 (Merged)
