Alerts & Reporting - Selenium timed out / DNS Resolution #28454

nivdann · 2024-05-13T13:56:48Z

Hi all, after hours of trying to find the root cause, I think I have found where the problem is, but I don't know why it's happening.

Setup

Using Helm to deploy Superset on k8s (AWS), following the documentation
Using my own Postgres DB (the bundled one is disabled in values.yaml)
Enabled in values.yaml:

supersetNode
supersetWorker (celery)
supersetCeleryBeat
Redis

NAME                                               READY   STATUS      RESTARTS   AGE
superset-55b578c669-nkgph                          1/1     Running     0          41s
superset-celerybeat-74d6b48557-cqt2p               1/1     Running     0          41s
superset-init-db-xjkpn                             0/1     Completed   0          40s
superset-redis-master-0                            1/1     Running     0          41s
superset-worker-5dbc55b74-hp5g4                    1/1     Running     0          41s

My Superset service name and port are changed in my values.yaml via
nameOverride and fullnameOverride (for this post, let's assume the name is still "superset").

The service is configured as NodePort and is accessed via an ALB with my domain. The port and nodePort are changed to something other than 8088.

I followed the Alerts & Reporting documentation. BTW, the Chrome installation part does not work on the latest image (the one I use), since it needs wget and unzip installed first, and the chromedriver and Google Chrome versions have to be kept in sync.
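A minimal sketch of how the chromedriver / Google Chrome version sync mentioned above could be checked from inside the image. It assumes that matching major versions is what matters, and the function names are hypothetical helpers, not part of Superset or Selenium.

```python
import re


def major_version(version_output: str) -> int:
    """Extract the major version from a `--version` output line."""
    match = re.search(r"(\d+)\.\d+\.\d+", version_output)
    if match is None:
        raise ValueError(f"no version number found in {version_output!r}")
    return int(match.group(1))


def versions_in_sync(chrome_output: str, driver_output: str) -> bool:
    """True when the browser and the driver report the same major version."""
    return major_version(chrome_output) == major_version(driver_output)
```

The two arguments would be the text printed by `google-chrome --version` and `chromedriver --version` in the worker image.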

Versions

Superset(latest) - 3.1.3
Celery worker - 3.1.3
Python(from image) - 3.10.14

Config overrides

ROW_LIMIT = 5000
ENABLE_PROXY_FIX = True
ALERT_REPORTS_NOTIFICATION_DRY_RUN = False


FEATURE_FLAGS = {
     "DYNAMIC_PLUGINS": True,
     "ALLOW_ADHOC_SUBQUERY": True,
     "ENABLE_TEMPLATE_PROCESSING": True,
     "DASHBOARD_VIRTUALIZATION": True,
     "DRILL_BY": True,
     "GLOBAL_ASYNC_QUERIES": False,
     "ALERT_REPORTS": True
}

import ast
import os

SMTP_HOST = os.getenv("SMTP_HOST", "localhost")
SMTP_STARTTLS = ast.literal_eval(os.getenv("SMTP_STARTTLS", "True"))
SMTP_SSL = ast.literal_eval(os.getenv("SMTP_SSL", "False"))
SMTP_SSL_SERVER_AUTH = ast.literal_eval(os.getenv("SMTP_SSL_SERVER_AUTH", "False"))
# os.getenv returns a string when the variable is set, so cast to int
SMTP_PORT = int(os.getenv("SMTP_PORT", "25"))
SMTP_MAIL_FROM = os.getenv("SMTP_MAIL_FROM","[email protected]")
SMTP_USER = ""
SMTP_PASSWORD = ""


from celery.schedules import crontab

class CeleryConfig:
    broker_url = f"redis://{env('REDIS_HOST')}:{env('REDIS_PORT')}/0"
    imports = (
        "superset.sql_lab",
        "superset.tasks.scheduler",
    )
    result_backend = f"redis://{env('REDIS_HOST')}:{env('REDIS_PORT')}/0"
    worker_prefetch_multiplier = 10
    task_acks_late = True
    task_annotations = {
        "sql_lab.get_sql_results": {
            "rate_limit": "100/s",
        },
    }
    beat_schedule = {
        "reports.scheduler": {
            "task": "reports.scheduler",
            "schedule": crontab(minute="*", hour="*"),
        },
        "reports.prune_log": {
            "task": "reports.prune_log",
            "schedule": crontab(minute=0, hour=0),
        },
    }
CELERY_CONFIG = CeleryConfig

SCREENSHOT_LOCATE_WAIT = 100
SCREENSHOT_LOAD_WAIT = 600
EMAIL_PAGE_RENDER_WAIT = 60
WEBDRIVER_BASEURL = "http://superset:8088"
WEBDRIVER_BASEURL_USER_FRIENDLY = WEBDRIVER_BASEURL
WEBDRIVER_TYPE = "chrome"
WEBDRIVER_OPTION_ARGS = [
    "--force-device-scale-factor=2.0",
    "--high-dpi-support=2.0",
    "--headless",
    "--disable-gpu",
    "--disable-dev-shm-usage",
    "--no-sandbox",
    "--disable-setuid-sandbox",
    "--disable-extensions"
]

SUPERSET_WEBSERVER_TIMEOUT = 300

from superset.tasks.types import ExecutorType

THUMBNAIL_SELENIUM_USER = 'admin'
ALERT_REPORTS_EXECUTE_AS = [ExecutorType.SELENIUM]

SECRET_KEY = "<MY_SECRET_KEY>"

Worker Log

[2024-05-13 12:07:00,099: INFO/ForkPoolWorker-3] Scheduling alert test122 eta: 2024-05-13 12:07:00
Executing alert/report, task id: 965728ea-5a8a-410b-8d7f-14b4dc9240fa, scheduled_dttm: 2024-05-13T12:07:00
[2024-05-13 12:07:00,107: INFO/ForkPoolWorker-3] Executing alert/report, task id: 965728ea-5a8a-410b-8d7f-14b4dc9240fa, scheduled_dttm: 2024-05-13T12:07:00
session is validated: id 13, executionid: 965728ea-5a8a-410b-8d7f-14b4dc9240fa
[2024-05-13 12:07:00,108: INFO/ForkPoolWorker-3] session is validated: id 13, executionid: 965728ea-5a8a-410b-8d7f-14b4dc9240fa
Running report schedule 965728ea-5a8a-410b-8d7f-14b4dc9240fa as user admin
[2024-05-13 12:07:00,166: INFO/ForkPoolWorker-3] Running report schedule 965728ea-5a8a-410b-8d7f-14b4dc9240fa as user admin
Selenium timed out requesting url http://superset:8088/superset/dashboard/dc28c4a4-6b5a-46f4-8f5f-2f71c6db30f2/?force=false&standalone=3
Traceback (most recent call last):
  File "/app/superset/utils/webdriver.py", line 368, in get_screenshot
    element = WebDriverWait(driver, self._screenshot_locate_wait).until(
  File "/usr/local/lib/python3.10/site-packages/selenium/webdriver/support/wait.py", line 80, in until
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:

[2024-05-13 12:07:47,328: ERROR/ForkPoolWorker-1] Selenium timed out requesting url http://superset:8088/superset/dashboard/dc28c4a4-6b5a-46f4-8f5f-2f71c6db30f2/?force=false&standalone=3
Traceback (most recent call last):
  File "/app/superset/utils/webdriver.py", line 368, in get_screenshot
    element = WebDriverWait(driver, self._screenshot_locate_wait).until(
  File "/usr/local/lib/python3.10/site-packages/selenium/webdriver/support/wait.py", line 80, in until
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:

A downstream warning occurred while generating a report: 965728ea-5a8a-410b-8d7f-14b4dc9240fa. Report Schedule is still working, refusing to re-compute.
Traceback (most recent call last):
  File "/app/superset/tasks/scheduler.py", line 98, in execute
    ).run()
  File "/app/superset/commands/report/execute.py", line 729, in run
    raise ex
  File "/app/superset/commands/report/execute.py", line 727, in run
    ).run()
  File "/app/superset/commands/report/execute.py", line 689, in run
    ).next()
  File "/app/superset/commands/report/execute.py", line 612, in next
    raise exception_working
superset.commands.report.exceptions.ReportSchedulePreviousWorkingError: Report Schedule is still working, refusing to re-compute.
[2024-05-13 12:07:47,485: WARNING/ForkPoolWorker-3] A downstream warning occurred while generating a report: 965728ea-5a8a-410b-8d7f-14b4dc9240fa. Report Schedule is still working, refusing to re-compute.
Traceback (most recent call last):
  File "/app/superset/tasks/scheduler.py", line 98, in execute
    ).run()
  File "/app/superset/commands/report/execute.py", line 729, in run
    raise ex
  File "/app/superset/commands/report/execute.py", line 727, in run
    ).run()
  File "/app/superset/commands/report/execute.py", line 689, in run
    ).next()
  File "/app/superset/commands/report/execute.py", line 612, in next
    raise exception_working
superset.commands.report.exceptions.ReportSchedulePreviousWorkingError: Report Schedule is still working, refusing to re-compute.
header_data in notifications for alerts and reports {'notification_type': 'Alert', 'notification_source': <ReportSourceFormat.DASHBOARD: 'dashboard'>, 'notification_format': 'PNG', 'chart_id': None, 'dashboard_id': 2, 'owners': [Superset Admin]}, taskid, b129208d-9417-490e-9fd0-01e9312cdd59
[2024-05-13 12:07:47,507: INFO/ForkPoolWorker-1] header_data in notifications for alerts and reports {'notification_type': 'Alert', 'notification_source': <ReportSourceFormat.DASHBOARD: 'dashboard'>, 'notification_format': 'PNG', 'chart_id': None, 'dashboard_id': 2, 'owners': [Superset Admin]}, taskid, b129208d-9417-490e-9fd0-01e9312cdd59
Report sent to email, notification content is None
[2024-05-13 12:07:47,677: INFO/ForkPoolWorker-1] Report sent to email, notification content is None
A downstream exception occurred while generating a report: b129208d-9417-490e-9fd0-01e9312cdd59. Report Schedule execution failed when generating a screenshot.
Traceback (most recent call last):
  File "/app/superset/tasks/scheduler.py", line 98, in execute
    ).run()
  File "/app/superset/commands/report/execute.py", line 729, in run
    raise ex
  File "/app/superset/commands/report/execute.py", line 727, in run
    ).run()
  File "/app/superset/commands/report/execute.py", line 689, in run
    ).next()
  File "/app/superset/commands/report/execute.py", line 586, in next
    raise first_ex
  File "/app/superset/commands/report/execute.py", line 555, in next
    self.send()
  File "/app/superset/commands/report/execute.py", line 452, in send
    notification_content = self._get_notification_content()
  File "/app/superset/commands/report/execute.py", line 357, in _get_notification_content
    screenshot_data = self._get_screenshots()
  File "/app/superset/commands/report/execute.py", line 240, in _get_screenshots
    raise ReportScheduleScreenshotFailedError()
superset.commands.report.exceptions.ReportScheduleScreenshotFailedError: Report Schedule execution failed when generating a screenshot.
[2024-05-13 12:07:47,702: ERROR/ForkPoolWorker-1] A downstream exception occurred while generating a report: b129208d-9417-490e-9fd0-01e9312cdd59. Report Schedule execution failed when generating a screenshot.
Traceback (most recent call last):
  File "/app/superset/tasks/scheduler.py", line 98, in execute
    ).run()
  File "/app/superset/commands/report/execute.py", line 729, in run
    raise ex
  File "/app/superset/commands/report/execute.py", line 727, in run
    ).run()
  File "/app/superset/commands/report/execute.py", line 689, in run
    ).next()
  File "/app/superset/commands/report/execute.py", line 586, in next
    raise first_ex
  File "/app/superset/commands/report/execute.py", line 555, in next
    self.send()
  File "/app/superset/commands/report/execute.py", line 452, in send
    notification_content = self._get_notification_content()
  File "/app/superset/commands/report/execute.py", line 357, in _get_notification_content
    screenshot_data = self._get_screenshots()
  File "/app/superset/commands/report/execute.py", line 240, in _get_screenshots
    raise ReportScheduleScreenshotFailedError()
superset.commands.report.exceptions.ReportScheduleScreenshotFailedError: Report Schedule execution failed when generating a screenshot.

Problem & Approach

It's getting a timeout when trying to take the snapshot.

To understand what the problem might be, I overrode WEBDRIVER_AUTH_FUNC:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def auth_driver(driver, user):
    # Log in through the login form so the failing page can be inspected
    driver.get("http://superset:8088/login")
    print("Authenticating.....")
    try:
        username = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.ID, "username")))
        username.send_keys("admin")
        password = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.ID, "password")))
        password.send_keys("admin")
        login = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.NAME, "login")))
        login.click()
        print(f"Success login for user: {user}")
    except Exception as e:
        # Report the failure and the URL the browser actually ended up on
        print(f"Failed to find an element: {e!r}")
        print(driver.current_url)
    # Return the driver in every case so the caller never receives None
    return driver
	
WEBDRIVER_AUTH_FUNC = auth_driver
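A small sketch of a helper that could be called from the except branch of such an override to show what the browser actually loaded. `describe_page` is a hypothetical name; it only reads the `current_url`, `title` and `page_source` attributes that Selenium drivers expose.

```python
def describe_page(driver, max_chars=500):
    """Summarise what the browser actually loaded, for logging."""
    source = driver.page_source or ""
    return (
        f"url={driver.current_url}\n"
        f"title={driver.title}\n"
        f"source[:{max_chars}]={source[:max_chars]}"
    )
```

Printing `describe_page(driver)` on failure makes it easier to tell a missing login form apart from a browser error page such as the DNS message quoted below.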

Looking in the worker logs:

http://superset:8088/login/
The DNS resolution for this site failed. Either the site name has been mistyped or there is an issue with resolving the DNS name.

If I enter the worker pod (kubectl exec -it <worker_pod> -- /bin/bash) and run nslookup on that name, it resolves.
If I open Python there and execute the following:

import dns.resolver
domain = "superset"
resolver = dns.resolver.Resolver()
try:
    answers = resolver.resolve(domain, 'A')
    for rdata in answers:
        print(f"IP Address: {rdata.address}")
except Exception as e:
    print(f"An error occurred: {e}")

It fails to resolve the cluster IP.

After that, I used the fully qualified name "superset.default.svc.cluster.local", and the Python code resolved the IP.
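One possible explanation for the short name failing only in the Python test, offered as an assumption and not confirmed in this thread: as far as I know, dnspython's `Resolver.resolve()` does not apply the search domains from /etc/resolv.conf unless `search=True` is passed, whereas nslookup, curl and `socket.getaddrinfo` go through the normal lookup path. The sketch below resolves a name through the system resolver instead; the function name is hypothetical.

```python
import socket


def resolve_with_system_resolver(host, port=8088):
    """Resolve `host` through getaddrinfo, i.e. the system resolver."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP
    return sorted({info[4][0] for info in infos})
```

Calling it with "superset" and with "superset.default.svc.cluster.local" inside the worker pod would show whether both names resolve to the same address when search domains are honored.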

I then used this as the base URL for the driver in my config:
WEBDRIVER_BASEURL='http://superset.default.svc.cluster.local:8088'
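For reference, a sketch of how the fully qualified in-cluster URL is put together. The helper name is hypothetical, and the defaults (namespace "default", cluster domain "cluster.local", port 8088) are assumptions taken from the values used in this post.

```python
def k8s_service_url(service, namespace="default", port=8088,
                    cluster_domain="cluster.local", scheme="http"):
    """Build <scheme>://<service>.<namespace>.svc.<cluster_domain>:<port>."""
    return f"{scheme}://{service}.{namespace}.svc.{cluster_domain}:{port}"


# Same value as the hard-coded setting above
WEBDRIVER_BASEURL = k8s_service_url("superset")
```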

When re-running the scheduled alert, I got the same error:

http://superset.default.svc.cluster.local:8088/login/
The DNS resolution for this site failed. Either the site name has been mistyped or there is an issue with resolving the DNS name.

I also tried adding logic to the auth override function that first resolves the domain to the cluster IP (the resolution itself works), but after the report schedule triggered I still got:

http://<ResolvedClusterIP>:8088/login
The DNS resolution for this site failed. Either the site name has been mistyped or there is an issue with resolving the DNS name.

It's worth mentioning that wget or curl with the cluster IP, the service name, or the full service name all work and return 200 OK.
I also tried the default WebDriver with Firefox (as mentioned in the documentation), with the same result.
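Since the command-line tools succeed while the browser does not, one way to narrow this down is to compare a direct request with one that follows the proxy settings from the environment. This is only a diagnostic sketch with a hypothetical function name; it uses the standard library's `urllib.request.ProxyHandler({})` to force a direct connection.

```python
import urllib.request


def http_status(url, use_env_proxies=False, timeout=10):
    """Return the HTTP status for `url`, optionally ignoring proxy env vars."""
    # An empty ProxyHandler disables proxies; with no handler passed,
    # build_opener falls back to the proxies found in the environment.
    handlers = [] if use_env_proxies else [urllib.request.ProxyHandler({})]
    opener = urllib.request.build_opener(*handlers)
    with opener.open(url, timeout=timeout) as response:
        return response.status
```

Running it in the worker pod against the Superset service once with `use_env_proxies=False` and once with `use_env_proxies=True` would show whether a proxy setting changes the outcome.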

Expected

I expect an email to be sent with a snapshot of the dashboard.
Neither happens.

nivdann · 2024-05-16T12:31:31Z

For some reason the webdriver was going out of the cluster (to the internet).
Fixed by adding the following to WEBDRIVER_OPTION_ARGS (with the Chrome driver):
"--proxy-server=<k8s-superset-service>:<port>",
"--ignore-certificate-errors"

devysh1907 Dec 16, 2024

@nivdann @dosu
What should be given for --proxy-server=<k8s-superset-service>:<port>?
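One reading of the fix, not confirmed by the author: the placeholder stands for the in-cluster Service name and port of the Superset web server. The sketch below shows how the two flags could be appended in superset_config.py; the helper name and the values "superset" and 8088 are assumptions for illustration and need to be replaced with the real Service name and port.

```python
def webdriver_option_args(service_host, service_port, base_args=()):
    """Append the two flags from the fix to an existing argument list."""
    return [
        *base_args,
        f"--proxy-server={service_host}:{service_port}",
        "--ignore-certificate-errors",
    ]


# Hypothetical values; substitute the real Service name and port.
WEBDRIVER_OPTION_ARGS = webdriver_option_args(
    "superset", 8088, base_args=["--headless", "--no-sandbox"]
)
```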
