How can I increase the parallelism of compaction tasks on a single dataNode? #38830
Unanswered
xiaobingxia-at asked this question in Q&A and General discussion
For various reasons we need a large number of partitions, and therefore a large number of segments. So the cluster needs to be tuned so that it can finish compacting this large number of segments as quickly as possible.
My understanding is as follows. Suppose there are 10 dataNodes, configured with:
dataCoord.compaction.maxParallelTaskNum = 1000
dataNode.slot.slotCap = 100
dataCoord.slot.mixCompactionUsage = 1000
dataCoord.slot.l0DeleteCompactionUsage = 1000
With 10 dataNodes and slotCap = 100 on each node, globally we have 1000 compaction task slots in total. (My understanding is that one task slot provides one thread, which runs one compaction task.)
Because mixCompactionUsage = 1000, globally I can run at most 1000 mix compaction tasks at the same time. Because l0DeleteCompactionUsage = 1000, globally I can run at most 1000 L0 delete compaction tasks at the same time.
But because maxParallelTaskNum = 1000, globally the number of concurrently running compaction tasks of all types combined cannot exceed 1000.
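To make this arithmetic concrete, here is a small Python sketch of my reading of these four knobs. It is only a toy model of the understanding described above, not Milvus source code; in particular, the one-slot-one-thread-one-task assumption and the way the per-type caps combine with the global cap are exactly the assumptions I am asking about.

import math  # not strictly needed; kept minimal and self-contained

NUM_DATANODES = 10
SLOT_CAP = 100                     # dataNode.slot.slotCap, per node
MAX_PARALLEL_TASK_NUM = 1000       # dataCoord.compaction.maxParallelTaskNum, global
MIX_COMPACTION_USAGE = 1000        # dataCoord.slot.mixCompactionUsage, as I read it: global mix cap
L0_DELETE_COMPACTION_USAGE = 1000  # dataCoord.slot.l0DeleteCompactionUsage, as I read it: global L0 cap

# Global slot pool: 10 nodes x 100 slots each = 1000 slots.
TOTAL_SLOTS = NUM_DATANODES * SLOT_CAP

def can_admit(running_mix: int, running_l0: int, kind: str) -> bool:
    """Could one more task of the given kind ('mix' or 'l0') start now?
    Assumes each task occupies exactly one slot (my assumption above)."""
    running_total = running_mix + running_l0
    if running_total >= TOTAL_SLOTS:            # no free slot anywhere
        return False
    if running_total >= MAX_PARALLEL_TASK_NUM:  # global cap across all task types
        return False
    if kind == "mix":
        return running_mix < MIX_COMPACTION_USAGE
    return running_l0 < L0_DELETE_COMPACTION_USAGE

print(TOTAL_SLOTS)               # 1000
print(can_admit(999, 0, "mix"))  # True: one slot left, all caps respected
print(can_admit(500, 500, "l0")) # False: global cap of 1000 already reached

If this model is right, then in this configuration the binding constraint is simply the 1000 physical slots (equivalently maxParallelTaskNum), since both per-type caps are set equal to the global cap.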
Is this understanding accurate? And would the settings above effectively increase compaction throughput, so that the number of segments drops quickly?
Also, I want to avoid the coordinator polling all segments on every compaction round, because I have noticed that whenever the coordinator gets very busy, scaling query nodes runs into problems (it becomes extremely slow). Instead, I would like to trigger compaction once per hour, with each trigger merging all segments in one pass. If I understand correctly, dataCoord keeps a compaction task queue: each time a compaction plan is made, many compaction tasks are submitted into this queue, and then, based on maxParallelTaskNum, mixCompactionUsage, l0DeleteCompactionUsage, and each dataNode's dataNode.slot.slotCap, tasks are pulled from the queue and executed concurrently one by one. Is this understanding correct? Thanks!
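For reference, the hourly manual trigger I have in mind would look roughly like the sketch below. Collection.compact(), wait_for_compaction_completed(), and get_compaction_state() are existing pymilvus calls, but the host, port, and collection name are placeholders, and whether a single pass really merges all segments is exactly what I am asking.

# Minimal sketch of an hourly manual compaction trigger with pymilvus.
# Host/port and the collection name are placeholders; "one pass merges
# everything" is the assumption being asked about, not a guarantee.

import time
from pymilvus import connections, Collection

connections.connect(alias="default", host="localhost", port="19530")
coll = Collection("my_collection")  # placeholder collection name

while True:
    coll.compact()                        # ask dataCoord to plan compaction now
    coll.wait_for_compaction_completed()  # block until this plan's tasks finish
    print("compaction pass finished:", coll.get_compaction_state())
    time.sleep(3600)                      # wait an hour before the next trigger

This covers the triggering half; whether the coordinator's own periodic segment scan can be dialed down at the same time is the other half of my question.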
Replies: 1 comment

-

Compaction tasks should not be run with heavy parallelism; overdoing it is as bad as not doing enough.