[Feature]: Improve the latency of analysis when there are some errors in cluster #1236
Comments
Hi @jxs1211 I can reproduce the issue. The problem seems to come from the Service analyzer.

```
(matthisholleville) ➜ k8sgpt git:(main) ✗ time ./bin/k8sgpt analyze
2024/08/26 13:32:47 Analyzer Ingress took 57.249625ms
2024/08/26 13:32:47 Analyzer PersistentVolumeClaim took 57.525541ms
2024/08/26 13:32:47 Analyzer CronJob took 64.764125ms
2024/08/26 13:32:47 Analyzer Node took 162.466084ms
2024/08/26 13:32:47 Analyzer Deployment took 168.8345ms
2024/08/26 13:32:47 Analyzer ReplicaSet took 281.76075ms
2024/08/26 13:32:47 Analyzer StatefulSet took 501.487167ms
2024/08/26 13:32:48 Analyzer Pod took 1.263196583s
2024/08/26 13:33:23 Analyzer Service took 36.287712708s
```

I'm looking into optimizing it and adding a flag to display stats for each analyzer.
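The per-analyzer timings in the log above can be collected with a small wrapper around each analyzer run. Here is a minimal sketch; the `timeAnalyzer` helper and its function signature are hypothetical, not k8sgpt's actual interface:

```go
package main

import (
	"fmt"
	"time"
)

// timeAnalyzer runs fn and reports how long it took, mirroring the
// "Analyzer X took ..." log lines shown above.
func timeAnalyzer(name string, fn func() error) (time.Duration, error) {
	start := time.Now()
	err := fn()
	return time.Since(start), err
}

func main() {
	elapsed, err := timeAnalyzer("Service", func() error {
		time.Sleep(10 * time.Millisecond) // stand-in for real analysis work
		return nil
	})
	fmt.Printf("Analyzer Service took %s (err=%v)\n", elapsed, err)
}
```

Logging a per-analyzer duration like this is what makes a slow outlier (here, the Service analyzer) immediately visible.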
In my case, the … Also, do you think it would be useful to have more detailed statistics, such as the number of items analyzed and the P90 of the execution time per item?
I think we should have the ability to either be selective about logging or turn it off.
In this PR I've proposed a new option for displaying stats only. It seems to me that it would be useful to include information such as the number of items analyzed.
@matthisholleville Good catch. Is there any chance we could add concurrency within each analyzer to handle situations with many items?
The analyzers themselves run concurrently, but our previous conversation was about making the routines within the analyzers also run their tasks in parallel. The challenge here is going to be the API rate limit and back pressure.
Gotcha, so do we need more discussion on that?
Checklist
Is this feature request related to a problem?
None
Problem Description
Recently I encountered latency issues like the one below, which caused a long wait for the response when there were some errors (around 30+) in the cluster:
```
k8sgpt analyze -o json --kubecontext  1.10s user 0.10s system 3% cpu 30.487 total
jq .                                  0.04s user 0.01s system 0% cpu 30.495 total
```
Solution Description
Ideally, the latency would be limited to a few seconds under good network connectivity.
Benefits
It will improve the user experience and make the analysis process more efficient.
Potential Drawbacks
No response
Additional Information
No response