Description
This has been happening only recently, starting about one month ago.
There is an intermittent issue on our salt-hubs (masters); two of our three environments are affected.
Salt jobs are sent from the hub to the minions, and the minions receive and run them. On the final send back to the hub, we can see that the job succeeded on the minion and that the hub receives the event, but the data needed is missing. The minion and hub then error on timeouts.
Steps to Reproduce the behavior
While it is an intermittent issue, here is the trace output of a simple test.ping from the hub to the minion.
The hub receives the event, but with the data missing:
2024-12-10 15:22:50,875 [salt.client ] [DEBUG ] get_iter_returns for jid 20241210212250747583 sent to {'salt-minion'} will timeout at 15:23:50.875442
2024-12-10 15:22:50,875 [salt.utils.event ] [TRACE ] Get event. tag: salt/job/20241210212250747583
2024-12-10 15:22:50,876 [salt.utils.event ] [TRACE ] _get_event() waited 0 seconds and received nothing
2024-12-10 15:22:50,886 [salt.utils.event ] [TRACE ] Get event. tag: salt/job/20241210212250747583
2024-12-10 15:22:50,888 [salt.utils.event ] [TRACE ] get_event() received = {'data': {'jid': '20241210212250747583', 'tgt_type': 'glob', 'tgt': 'salt-minion', 'user': 'sudo_user', 'fun': 'test.ping', 'arg': [], 'minions': ['salt-minion'], 'missing': [], '_stamp': '2024-12-10T21:22:50.748600'}, 'tag': 'salt/job/20241210212250747583/new'}
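One thing that may be worth checking in the trace above is the event tag. The sketch below pulls the event dict out of the `get_event() received` line and classifies it by tag. It assumes the usual Salt job tag shapes, `salt/job/<jid>/new` for the publish event and `salt/job/<jid>/ret/<minion_id>` for a minion return; the helper names `parse_event` and `classify` are made up for illustration and are not part of Salt.

```python
import ast
import re

# Captures the dict printed after "get_event() received = " on a trace line.
EVENT_RE = re.compile(r"get_event\(\) received = (\{.*\})\s*$")


def parse_event(line):
    """Return the event dict logged on a trace line, or None if there is none."""
    match = EVENT_RE.search(line)
    if match is None:
        return None
    # The log prints a plain Python literal, so literal_eval is enough here.
    return ast.literal_eval(match.group(1))


def classify(event):
    """Classify a job event as 'new', 'ret', or 'other' from its tag."""
    parts = event["tag"].split("/")
    # Assumed tag shapes: salt/job/<jid>/new and salt/job/<jid>/ret/<minion_id>
    if len(parts) >= 4 and parts[:2] == ["salt", "job"]:
        if parts[3] in ("new", "ret"):
            return parts[3]
    return "other"


# The hub trace line from this report, copied verbatim.
line = (
    "2024-12-10 15:22:50,888 [salt.utils.event ] [TRACE ] get_event() received = "
    "{'data': {'jid': '20241210212250747583', 'tgt_type': 'glob', "
    "'tgt': 'salt-minion', 'user': 'sudo_user', 'fun': 'test.ping', 'arg': [], "
    "'minions': ['salt-minion'], 'missing': [], "
    "'_stamp': '2024-12-10T21:22:50.748600'}, "
    "'tag': 'salt/job/20241210212250747583/new'}"
)

event = parse_event(line)
kind = classify(event)
has_return = "return" in event["data"]
```

For the line above, `kind` is `'new'` and `has_return` is `False`, so under the assumed tag shapes this particular line is the job publish event rather than the minion's return. Running the same check over the rest of the trace log would show whether a `ret` event for this jid ever reaches the hub.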
We also noticed this can cause minions to lose their connection to the hub at some point (testing with auth_safemode: True and ping_interval: 1 on the minions).
Setting --batch-size 1 on the hub at least helps target only the minions that are connected, but timeout issues still occur.
Expected behavior
That the minion's 'ReqChannel send' is the same as the hub's 'get_event() received'.
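A minimal sketch of that comparison, assuming both payloads have been copied out of the logs as Python dicts. The `sent` dict below is a hypothetical minion-side return payload written for illustration (the minion log is not included in this report); `received` is the data from the hub trace. The helper name `missing_fields` is also made up.

```python
def missing_fields(sent, received):
    """Return the keys the minion sent that are absent from the hub's event data."""
    return sorted(set(sent) - set(received))


# Hypothetical minion 'ReqChannel send' payload for test.ping (an assumption,
# not taken from the logs in this report).
sent = {
    "jid": "20241210212250747583",
    "fun": "test.ping",
    "id": "salt-minion",
    "return": True,
    "retcode": 0,
    "success": True,
}

# Event data from the hub's 'get_event() received' trace line.
received = {
    "jid": "20241210212250747583",
    "tgt_type": "glob",
    "tgt": "salt-minion",
    "user": "sudo_user",
    "fun": "test.ping",
    "arg": [],
    "minions": ["salt-minion"],
    "missing": [],
    "_stamp": "2024-12-10T21:22:50.748600",
}

absent = missing_fields(sent, received)
```

With these inputs `absent` is `['id', 'retcode', 'return', 'success']`, which is the kind of gap we mean by "the data needed is missing".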
Screenshots
NA
Versions Report
salt --versions-report
(Provided by running salt --versions-report. Please also mention any differences in master/minion versions.)
Salt Version:
  Salt: 3006.9

Python Version:
  Python: 3.10.14 (main, Jun 26 2024, 11:44:37) [GCC 11.2.0]

Dependency Versions:
  cffi: 1.16.0
  cherrypy: 18.6.1
  cryptography: 42.0.5
  dateutil: 2.8.1
  docker-py: Not Installed
  gitdb: 4.0.11
  gitpython: 3.1.40
  Jinja2: 3.1.4
  libgit2: Not Installed
  looseversion: 1.0.2
  M2Crypto: Not Installed
  Mako: Not Installed
  msgpack: 1.0.2
  msgpack-pure: Not Installed
  mysql-python: Not Installed
  packaging: 22.0
  pycparser: 2.21
  pycrypto: Not Installed
  pycryptodome: 3.19.1
  pygit2: Not Installed
  python-gnupg: 0.4.8
  PyYAML: 6.0.1
  PyZMQ: 23.2.0
  relenv: 0.17.0
  smmap: 5.0.1
  timelib: 0.2.4
  Tornado: 4.5.3
  ZMQ: 4.3.4

System Versions:
  dist: rhel 9.5 Plow
  locale: utf-8
  machine: x86_64
  release: 5.14.0-503.16.1.el9_5.x86_64
  system: Linux
  version: Red Hat Enterprise Linux 9.5 Plow
Additional context
We are really just trying to understand how we can debug this any further and uncover the issue.