Alerts & Reporting - Selenium timed out / DNS Resolution #28454
For some reason the webdriver out of the cluster (to the internet).
-
Hi all, after hours of trying to find the root cause, I think I've found where the problem is, but I don't know why it's happening.
Setup
Using Helm to deploy Superset in k8s (AWS), following the documentation
Using my own Postgres DB (the chart's Postgres is disabled in values.yaml)
Enabled in values.yaml:
My Superset service name and port are changed in my values.yaml:
nameOverride, fullnameOverride (so for this, let's assume it's still "superset")
The service is configured as NodePort and is accessed via an ALB with my domain; the port and nodePort are changed to something other than 8088.
Following the Alerts & Reporting documentation. BTW, the Chrome installation part does not work on the latest image (the one I use), since it needs wget and unzip to be installed, and the chromedriver and Google Chrome versions have to be kept in sync.
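To illustrate the chromedriver / Google Chrome sync point, here is a minimal sketch of a version check that could be run inside the worker image. It assumes both binaries print a four-part version string for `--version` (for example "Google Chrome 124.0.6367.78"); the function names and the binary names passed to `read_version` are hypothetical and may differ in your image.

```python
import re
import subprocess


def major_version(version_output: str) -> int:
    """Extract the major version from `--version` output, e.g.
    'Google Chrome 124.0.6367.78' or 'ChromeDriver 124.0.6367.78 (abc)'."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_output)
    if match is None:
        raise ValueError(f"no version number found in {version_output!r}")
    return int(match.group(1))


def versions_in_sync(chrome_output: str, driver_output: str) -> bool:
    """True when Chrome and chromedriver share the same major version."""
    return major_version(chrome_output) == major_version(driver_output)


def read_version(binary: str) -> str:
    """Run `<binary> --version` and return its stdout.
    The binary name (e.g. 'google-chrome', 'chromedriver') is an assumption."""
    result = subprocess.run(
        [binary, "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Inside the pod this could be called as `versions_in_sync(read_version("google-chrome"), read_version("chromedriver"))`. Only the major version is compared here, which is a simplification.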
Versions
Superset (latest) - 3.1.3
Celery worker - 3.1.3
Python (from image) - 3.10.14
Config overrides
Worker Log
Problem & Approach
It gets a timeout when trying to take the snapshot.
To understand what the problem could be, I overrode WEBDRIVER_AUTH_FUNC:
Looking in the worker logs:
If I enter the worker pod (kubectl exec -it <worker_pod> -- /bin/bash) and use nslookup on that domain, it resolves.
If I open Python there and execute the following:
It fails to resolve the IP of the cluster.
After that, I used the full domain name "superset.default.svc.cluster.local", and the Python code resolves the IP.
I then used this as the base URL for the driver in my config:
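A minimal sketch of this kind of resolution check, using the standard library's `socket.getaddrinfo` (the exact snippet run in the pod may have differed). The helper names `try_resolve` and `compare` are hypothetical.

```python
import socket


def try_resolve(hostname: str):
    """Return the sorted unique IPs for hostname, or None if resolution fails."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return None
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})


def compare(names):
    """Resolve each name and return a {name: ips_or_None} mapping."""
    return {name: try_resolve(name) for name in names}
```

Running `compare(["superset", "superset.default.svc.cluster.local"])` in the worker pod would show side by side whether the short name and the fully qualified name resolve from Python.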
WEBDRIVER_BASEURL='http://superset.default.svc.cluster.local:8088'
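For reference, a small sketch of how that in-cluster URL is put together. It assumes the default cluster domain `cluster.local`, and the helper `cluster_service_url` is hypothetical. The port in an in-cluster URL is the Service's `port`, not the `nodePort`, so 8088 below only mirrors the line above and would need to match whatever values.yaml sets.

```python
def cluster_service_url(
    service: str, namespace: str, port: int, scheme: str = "http"
) -> str:
    """Build <scheme>://<service>.<namespace>.svc.cluster.local:<port>.
    Assumes the cluster uses the default 'cluster.local' domain."""
    return f"{scheme}://{service}.{namespace}.svc.cluster.local:{port}"


# superset_config.py: URL the headless browser uses to reach Superset
# from inside the cluster (service name and namespace as assumed above).
WEBDRIVER_BASEURL = cluster_service_url("superset", "default", 8088)
```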
And when re-running the scheduled alert, I get the same error:
I also tried adding logic to the auth override function that first resolves the domain to the cluster IP (the resolution itself works), but after the report schedule triggered I got:
It's worth mentioning that wget or curl with the clusterIP/serviceName/fullServiceName all work and return 200 OK.
I also tried the default WebDriver with Firefox (as mentioned in the documentation), with the same result.
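The "resolve the domain first" logic mentioned above can be sketched roughly as follows. This is only an illustration of the idea, not the code from my auth override: `url_with_resolved_host` is a hypothetical helper, and the `resolve` parameter exists only so the lookup can be swapped out.

```python
import socket
from urllib.parse import urlsplit, urlunsplit


def url_with_resolved_host(url: str, resolve=socket.gethostbyname) -> str:
    """Return url with its hostname replaced by the IPv4 address it resolves to.
    Raises socket.gaierror if the default resolver cannot resolve the name."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError(f"URL has no hostname: {url!r}")
    ip = resolve(parts.hostname)
    # Keep the explicit port, if any, when rebuilding the network location.
    netloc = ip if parts.port is None else f"{ip}:{parts.port}"
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )
```

Inside an auth override, the resulting URL would be what gets passed to `driver.get(...)` instead of the hostname-based one.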
Expected
I expect an email to be sent with a snapshot of a dashboard.
Neither happens.