flake: TestAPI_CreateRuleGroupWithCaching #10294

Open
seizethedave opened this issue Dec 20, 2024 · 1 comment

Comments

@seizethedave
Contributor

        	Error Trace:	/__w/mimir/mimir/pkg/ruler/api_test.go:1479
        	Error:      	Not equal: 
        	            	expected: 5
        	            	actual  : 4
        	Test:       	TestAPI_CreateRuleGroupWithCaching

GH action log.

fail
--- FAIL: TestAPI_CreateRuleGroupWithCaching (0.51s)
    logger.go:38: 2024-12-20 16:51:40.803825486 +0000 UTC m=+4.958218709 instance localhost level info msg cleaning up mapped rules directory path /tmp/TestAPI_CreateRuleGroupWithCaching2658038317/001
    logger.go:38: 2024-12-20 16:51:40.804805104 +0000 UTC m=+4.959198357 instance localhost level info msg ring doesn't exist in KV store yet
    logger.go:38: 2024-12-20 16:51:40.804838695 +0000 UTC m=+4.959231917 instance localhost level info msg instance not found in the ring instance localhost ring ruler
    logger.go:38: 2024-12-20 16:51:40.805124479 +0000 UTC m=+4.959517702 instance localhost level info msg waiting until ruler is JOINING in the ring
    logger.go:38: 2024-12-20 16:51:40.805175624 +0000 UTC m=+4.959568848 instance localhost level info msg ruler is JOINING in the ring
    logger.go:38: 2024-12-20 16:51:40.805202304 +0000 UTC m=+4.959595527 instance localhost level info msg syncing rules reason initial
    logger.go:38: 2024-12-20 16:51:40.805433856 +0000 UTC m=+4.959827079 instance localhost level info msg waiting until ruler is ACTIVE in the ring
    logger.go:38: 2024-12-20 16:51:40.946452079 +0000 UTC m=+5.100845302 instance localhost level info msg ruler is ACTIVE in the ring
    logger.go:38: 2024-12-20 16:51:40.946539662 +0000 UTC m=+5.100932895 instance localhost level info msg ruler is only now starting to evaluate rules
    logger.go:38: 2024-12-20 16:51:40.946582071 +0000 UTC m=+5.100975304 instance localhost level info msg ruler up and running
    logger.go:38: 2024-12-20 16:51:41.105857753 +0000 UTC m=+5.260250986 instance localhost level info msg syncing rules reason api-change
    logger.go:38: 2024-12-20 16:51:41.106846727 +0000 UTC m=+5.261239960 instance localhost level info user user1 msg updating rule file file /tmp/TestAPI_CreateRuleGroupWithCaching2658038317/001/user1/namespace1
    logger.go:38: 2024-12-20 16:51:41.108599323 +0000 UTC m=+5.262992566 instance localhost level info component ruler insight true user user1 caller manager.go:179 time 2024-12-20 16:51:41.108568906 +0000 UTC m=+5.262962120 msg Starting rule manager...
    logger.go:38: 2024-12-20 16:51:41.304857576 +0000 UTC m=+5.459250799 instance localhost level info msg syncing rules reason api-change
    api_test.go:1479: 
        	Error Trace:	/__w/mimir/mimir/pkg/ruler/api_test.go:1479
        	Error:      	Not equal: 
        	            	expected: 5
        	            	actual  : 4
        	Test:       	TestAPI_CreateRuleGroupWithCaching
    logger.go:38: 2024-12-20 16:51:41.306691194 +0000 UTC m=+5.461084417 instance localhost level info msg ring lifecycler is shutting down ring ruler
    logger.go:38: 2024-12-20 16:51:41.3069577 +0000 UTC m=+5.461350924 instance localhost level info msg unregistering instance from ring ring ruler
    logger.go:38: 2024-12-20 16:51:41.307046937 +0000 UTC m=+5.461440170 instance localhost level info msg instance removed from the ring ring ruler
    logger.go:38: 2024-12-20 16:51:41.307401879 +0000 UTC m=+5.461795112 instance localhost level info user user1 msg updating rule file file /tmp/TestAPI_CreateRuleGroupWithCaching2658038317/001/user1/namespace1
    logger.go:38: 2024-12-20 16:51:41.308264421 +0000 UTC m=+5.462657644 instance localhost level info msg stopping user managers
    logger.go:38: 2024-12-20 16:51:41.308356903 +0000 UTC m=+5.462750126 instance localhost level info component ruler insight true user user1 caller manager.go:193 time 2024-12-20 16:51:41.308329001 +0000 UTC m=+5.462722224 msg Stopping rule manager...
    logger.go:38: 2024-12-20 16:51:41.308443274 +0000 UTC m=+5.462836487 instance localhost level info component ruler insight true user user1 caller manager.go:203 time 2024-12-20 16:51:41.308433656 +0000 UTC m=+5.462826879 msg Rule manager stopped
    logger.go:38: 2024-12-20 16:51:41.308492566 +0000 UTC m=+5.462885799 instance localhost level info msg all user managers stopped
    logger.go:38: 2024-12-20 16:51:41.308520779 +0000 UTC m=+5.462914002 instance localhost level info msg stopping user notifiers
    logger.go:38: 2024-12-20 16:51:41.308583004 +0000 UTC m=+5.462976228 instance localhost level info user user1 caller notifier.go:702 time 2024-12-20 16:51:41.308566794 +0000 UTC m=+5.462960007 msg Stopping notification manager...
    logger.go:38: 2024-12-20 16:51:41.308641343 +0000 UTC m=+5.463034566 instance localhost level info user user1 msg notifier discovery manager stopped
    logger.go:38: 2024-12-20 16:51:41.308668099 +0000 UTC m=+5.463061312 instance localhost level info user user1 caller notifier.go:409 time 2024-12-20 16:51:41.308645126 +0000 UTC m=+5.463038349 msg Draining any remaining notifications...
    logger.go:38: 2024-12-20 16:51:41.308728581 +0000 UTC m=+5.463121804 instance localhost level info user user1 caller notifier.go:415 time 2024-12-20 16:51:41.308719174 +0000 UTC m=+5.463112407 msg Remaining notifications drained
    logger.go:38: 2024-12-20 16:51:41.308785718 +0000 UTC m=+5.463178941 instance localhost level info user user1 caller notifier.go:345 time 2024-12-20 16:51:41.30877073 +0000 UTC m=+5.463163953 msg Notification manager stopped
    logger.go:38: 2024-12-20 16:51:41.308827966 +0000 UTC m=+5.463221199 instance localhost level info msg all user notifiers stopped
    logger.go:38: 2024-12-20 16:51:41.308858293 +0000 UTC m=+5.463251516 instance localhost level info msg cleaning up mapped rules directory path /tmp/TestAPI_CreateRuleGroupWithCaching2658038317/001
level=info user=user1 msg="updating rule file" file=/rules/user1/file%20%2Fone
level=info user=user1 msg="updating rule file" file=/rules/user1/file%20%2Fone
level=info user=user1 msg="updating rule file" file=/rules/user1/file%20%2Fone
level=info user=user1 msg="updating rule file" file=/rules/user1/file%20%2Ftwo
level=info user=user1 msg="updating rule file" file=/rules/user1/file%20%2Ftwo
level=info user=user1 msg="updating rule file" file=/rules/user1/+A_%2FReallyStrange%3C%3ENAME:SPACE%2F%3F
FAIL
FAIL	github.com/grafana/mimir/pkg/ruler	53.025s
@56quarters
Contributor

I can't reproduce this locally in 1000+ test runs 😭

The only thing I can see that might be an issue is that we don't override the default ring check interval of the ruler for tests (5s). However, that would result in an extra cache lookup instead of a missing one. I can swap this test to use our test logger instead of a no-op one to hopefully give us more info the next time this flakes.

56quarters added a commit that referenced this issue Dec 20, 2024
Extra information to debug #10294

Signed-off-by: Nick Pillitteri <[email protected]>
56quarters added a commit that referenced this issue Dec 20, 2024
Extra information to debug #10294

Signed-off-by: Nick Pillitteri <[email protected]>