TestOtelIntegration::test_scheduler_change_after_the_first_task_finishes sometimes fails #61070

@jason810496

What

`TestOtelIntegration::test_scheduler_change_after_the_first_task_finishes` fails intermittently in CI.
For example: https://github.com/apache/airflow/actions/runs/21358582533/job/61472786065?pr=60989

Traceback Logs

summary:

FAILED airflow-core/tests/integration/otel/test_otel.py::TestOtelIntegration::test_scheduler_change_after_the_first_task_finishes - AssertionError: Span name 'task2' wasn't found in children span names. It's not a child of span 'otel_test_dag_with_pause_between_tasks'.
===================================================================== 1 failed, 7 passed, 10 skipped, 1 warning in 417.55s (0:06:57) =====================================================================
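
To make the assertion message above easier to reason about, here is a minimal sketch of what a parent/child span check of this kind could look like. It is not the helper used in `test_otel.py`: the `Span` record and the functions `child_names` and `assert_is_child` are hypothetical names introduced only for illustration, and the sketch assumes each exported span carries its own id and its parent's id.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Span:
    # Hypothetical minimal span record; real exported spans carry more fields.
    name: str
    span_id: str
    parent_span_id: Optional[str] = None


def child_names(spans: List[Span], parent_name: str) -> List[str]:
    """Return the names of all spans whose parent is the span named parent_name."""
    parent_ids = {s.span_id for s in spans if s.name == parent_name}
    return [s.name for s in spans if s.parent_span_id in parent_ids]


def assert_is_child(spans: List[Span], child: str, parent: str) -> None:
    """Fail with a message shaped like the one in the summary if child is missing."""
    names = child_names(spans, parent)
    assert child in names, (
        f"Span name '{child}' wasn't found in children span names. "
        f"It's not a child of span '{parent}'."
    )
```

Under these assumptions, the check fails in two situations: the `task2` span was never exported at all, or it was exported with a different parent id. The failure message alone does not say which of the two happened here.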
full traceback:
Traceback (most recent call last):
  File "/usr/python/bin/airflow", line 10, in <module>
    sys.exit(main())
  File "/opt/airflow/airflow-core/src/airflow/__main__.py", line 55, in main
    args.func(args)
  File "/opt/airflow/airflow-core/src/airflow/cli/cli_config.py", line 49, in command
    return func(*args, **kwargs)
  File "/opt/airflow/airflow-core/src/airflow/utils/cli.py", line 113, in wrapper
    return f(*args, **kwargs)
  File "/opt/airflow/airflow-core/src/airflow/utils/providers_configuration_loader.py", line 54, in wrapped_function
    return func(*args, **kwargs)
  File "/opt/airflow/airflow-core/src/airflow/cli/commands/api_server_command.py", line 120, in wrapper
    return func(args)
  File "/opt/airflow/airflow-core/src/airflow/cli/commands/api_server_command.py", line 164, in api_server
    run_command_with_daemon_option(
  File "/opt/airflow/airflow-core/src/airflow/cli/commands/daemon_utils.py", line 58, in run_command_with_daemon_option
    check_if_pidfile_process_is_running(pid_file=pid, process_name=process_name)
  File "/opt/airflow/airflow-core/src/airflow/utils/process_utils.py", line 370, in check_if_pidfile_process_is_running
    raise AirflowException(f"The {process_name} is already running under PID {pid}.")
airflow.sdk.exceptions.AirflowException: The api_server is already running under PID 124.
2026-01-26T13:32:15.730154Z [error    ] Exception while exporting metrics HTTPConnectionPool(host='breeze-otel-collector', port=4318): Max retries exceeded with url: /v1/metrics (Caused by NameResolutionError("HTTPConnection(host='breeze-otel-collector', port=4318): Failed to resolve 'breeze-otel-collector' ([Errno -3] Temporary failure in name resolution)")) [opentelemetry.sdk.metrics._internal.export] loc=__init__.py:545
Traceback (most recent call last):
  File "/usr/python/lib/python3.10/site-packages/urllib3/connection.py", line 204, in _new_conn
    sock = connection.create_connection(
  File "/usr/python/lib/python3.10/site-packages/urllib3/util/connection.py", line 60, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "/usr/python/lib/python3.10/socket.py", line 967, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -3] Temporary failure in name resolution

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/python/lib/python3.10/site-packages/urllib3/connectionpool.py", line 787, in urlopen
    response = self._make_request(
  File "/usr/python/lib/python3.10/site-packages/urllib3/connectionpool.py", line 493, in _make_request
    conn.request(
  File "/usr/python/lib/python3.10/site-packages/urllib3/connection.py", line 500, in request
    self.endheaders()
  File "/usr/python/lib/python3.10/http/client.py", line 1278, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/python/lib/python3.10/http/client.py", line 1038, in _send_output
    self.send(msg)
  File "/usr/python/lib/python3.10/http/client.py", line 976, in send
    self.connect()
  File "/usr/python/lib/python3.10/site-packages/urllib3/connection.py", line 331, in connect
    self.sock = self._new_conn()
  File "/usr/python/lib/python3.10/site-packages/urllib3/connection.py", line 211, in _new_conn
    raise NameResolutionError(self.host, self, e) from e
urllib3.exceptions.NameResolutionError: HTTPConnection(host='breeze-otel-collector', port=4318): Failed to resolve 'breeze-otel-collector' ([Errno -3] Temporary failure in name resolution)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/python/lib/python3.10/site-packages/requests/adapters.py", line 644, in send
    resp = conn.urlopen(
  File "/usr/python/lib/python3.10/site-packages/urllib3/connectionpool.py", line 841, in urlopen
    retries = retries.increment(
  File "/usr/python/lib/python3.10/site-packages/urllib3/util/retry.py", line 535, in increment
    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='breeze-otel-collector', port=4318): Max retries exceeded with url: /v1/metrics (Caused by NameResolutionError("HTTPConnection(host='breeze-otel-collector', port=4318): Failed to resolve 'breeze-otel-collector' ([Errno -3] Temporary failure in name resolution)"))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/python/lib/python3.10/site-packages/opentelemetry/sdk/metrics/_internal/export/__init__.py", line 541, in _receive_metrics
    self._exporter.export(
  File "/usr/python/lib/python3.10/site-packages/opentelemetry/exporter/otlp/proto/http/metric_exporter/__init__.py", line 203, in export
    resp = self._export(serialized_data.SerializeToString())
  File "/usr/python/lib/python3.10/site-packages/opentelemetry/exporter/otlp/proto/http/metric_exporter/__init__.py", line 173, in _export
    return self._session.post(
  File "/usr/python/lib/python3.10/site-packages/requests/sessions.py", line 637, in post
    return self.request("POST", url, data=data, json=json, **kwargs)
  File "/usr/python/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/python/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "/usr/python/lib/python3.10/site-packages/requests/adapters.py", line 677, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='breeze-otel-collector', port=4318): Max retries exceeded with url: /v1/metrics (Caused by NameResolutionError("HTTPConnection(host='breeze-otel-collector', port=4318): Failed to resolve 'breeze-otel-collector' ([Errno -3] Temporary failure in name resolution)"))
/usr/python/lib/python3.10/site-packages/celery/platforms.py:841 SecurityWarning: You're running the worker with superuser privileges: this is
absolutely not recommended!

Please specify a different user using the --uid option.

User information: uid=0 euid=0 gid=0 egid=0

2026-01-26T13:32:17.299274Z [info     ] Connected to redis://redis:6379/0 [celery.worker.consumer.connection] loc=connection.py:22
2026-01-26T13:32:17.305338Z [info     ] mingle: searching for neighbors [celery.worker.consumer.mingle] loc=mingle.py:40
2026-01-26T13:32:18.316065Z [info     ] mingle: all alone              [celery.worker.consumer.mingle] loc=mingle.py:49
2026-01-26T13:32:18.330157Z [info     ] celery@753b8cbd20cb ready.     [celery.apps.worker] loc=worker.py:176
2026-01-26T13:32:19.434136Z [error    ] Exception while exporting metrics HTTPConnectionPool(host='breeze-otel-collector', port=4318): Max retries exceeded with url: /v1/metrics (Caused by NameResolutionError("HTTPConnection(host='breeze-otel-collector', port=4318): Failed to resolve 'breeze-otel-collector' ([Errno -3] Temporary failure in name resolution)")) [opentelemetry.sdk.metrics._internal.export] loc=__init__.py:545
Traceback (most recent call last):
  File "/usr/python/lib/python3.10/site-packages/urllib3/connection.py", line 204, in _new_conn
    sock = connection.create_connection(
  File "/usr/python/lib/python3.10/site-packages/urllib3/util/connection.py", line 60, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "/usr/python/lib/python3.10/socket.py", line 967, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -3] Temporary failure in name resolution

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/python/lib/python3.10/site-packages/urllib3/connectionpool.py", line 787, in urlopen
    response = self._make_request(
  File "/usr/python/lib/python3.10/site-packages/urllib3/connectionpool.py", line 493, in _make_request
    conn.request(
  File "/usr/python/lib/python3.10/site-packages/urllib3/connection.py", line 500, in request
    self.endheaders()
  File "/usr/python/lib/python3.10/http/client.py", line 1278, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/python/lib/python3.10/http/client.py", line 1038, in _send_output
    self.send(msg)
  File "/usr/python/lib/python3.10/http/client.py", line 976, in send
    self.connect()
  File "/usr/python/lib/python3.10/site-packages/urllib3/connection.py", line 331, in connect
    self.sock = self._new_conn()
  File "/usr/python/lib/python3.10/site-packages/urllib3/connection.py", line 211, in _new_conn
    raise NameResolutionError(self.host, self, e) from e
urllib3.exceptions.NameResolutionError: HTTPConnection(host='breeze-otel-collector', port=4318): Failed to resolve 'breeze-otel-collector' ([Errno -3] Temporary failure in name resolution)
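
The logs above show `socket.getaddrinfo` failing for `breeze-otel-collector`. Whether that is related to the missing `task2` span is not established in this issue. As a diagnostic aid only, the following sketch waits until a hostname resolves before starting components that export telemetry. The helper name `wait_for_dns` and its parameters are hypothetical and not part of Airflow or Breeze.

```python
import socket
import time


def wait_for_dns(host, port, timeout=30.0, interval=1.0, resolver=socket.getaddrinfo):
    """Return True once host resolves, or False if timeout seconds pass first.

    resolver is injectable so the helper can be exercised without a network.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Same call that fails in the traceback: resolve host/port for TCP.
            resolver(host, port, 0, socket.SOCK_STREAM)
            return True
        except socket.gaierror:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A call such as `wait_for_dns("breeze-otel-collector", 4318)` before the test starts would at least separate "collector hostname not resolvable yet" from a genuine span-parenting problem.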

Metadata

Labels: area:Scheduler, kind:bug, telemetry
