Requirement:
I have to read data from Kafka and send it to Datadog. The catch is that each customer has one Kafka topic and one Datadog endpoint. With 5000 customers, that means 5000 Kafka topics and 5000 Datadog output plugins in total.
When the scale was low, I created one Telegraf pod per customer to read from its Kafka topic and send to Datadog. But as the scale grew to 5000, Ops became worried about the resource constraints and the monitoring burden of 5000 Telegraf pods. The topics receive 5000 * 1KB of data every 10 seconds (roughly 500 KB/s in aggregate), and the data volume may grow further.
Is there an optimized way to handle this? After researching a bit, I came across two approaches:
1. To have 5000 Kafka input plugins and 5000 Datadog output plugins in the same telegraf.conf file. By default, Telegraf sends data from every input plugin to every output plugin, but with tagpass (a unique tag per customer) we can restrict metrics from one topic so they are routed only to the corresponding Datadog output plugin (see the sketch after this list). But I doubt a single Telegraf node can handle this at the scale of 5000 customers: every metric is matched against every output's filter, so routing cost grows as O(N^2), and I am not sure how much CPU and memory that single Telegraf pod would need.
2. To run individual Telegraf services in the same pod, as discussed in Telegraf Configuration - Recommended approach for multiple .conf files? #6334 (comment). But that won't be feasible at the scale of 5000 customers.
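For reference, here is a minimal sketch of what the first approach would look like for one customer, assuming the standard `kafka_consumer` input and `datadog` output plugins. The broker address, topic name, tag value, and API-key variable are placeholders; each customer would need one such input/output pair generated into the config:

```toml
# One kafka_consumer input per customer; the injected "customer" tag
# is what lets tagpass route metrics to the right output.
[[inputs.kafka_consumer]]
  brokers = ["kafka:9092"]            # placeholder broker address
  topics = ["customer-0001-metrics"]  # hypothetical per-customer topic
  data_format = "influx"
  [inputs.kafka_consumer.tags]
    customer = "0001"

# One datadog output per customer, accepting only metrics that
# carry this customer's tag.
[[outputs.datadog]]
  apikey = "${DD_API_KEY_0001}"       # hypothetical per-customer key
  [outputs.datadog.tagpass]
    customer = ["0001"]
```

Repeating this pair 5000 times is what drives the O(N^2) concern: every metric flushed is checked against every output's tagpass filter.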
I understand that Telegraf might not be built for such a use case and I should probably write a dedicated microservice for it, but I would love to know if it's possible to achieve this with Telegraf.