Skip to content

Fix Data Lake tests concurrent run#80386

Merged
azat merged 12 commits intomasterfrom
fix-data-lakes
May 26, 2025
Merged

Fix Data Lake tests concurrent run#80386
azat merged 12 commits intomasterfrom
fix-data-lakes

Conversation

@alexey-milovidov
Copy link
Copy Markdown
Member

@alexey-milovidov alexey-milovidov commented May 17, 2025

Changelog category (leave one):

  • CI Fix or Improvement (changelog entry is not required)

Fixes: #79719

@alexey-milovidov alexey-milovidov added the 🍃 green ci 🌿 Fixing flaky tests in CI label May 17, 2025
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented May 17, 2025

Workflow [PR], commit [0744072]

@bharatnc
Copy link
Copy Markdown
Contributor

I realized that one more common cause of flake is multiple tests all use the same bind ports which leads to errors similar to the following:

E               Error response from daemon: driver failed programming external connectivity on endpoint roottestdatabaseglue-gw1-minio-1 (086121f6bcda800827d82dce9519f6b87f7d76ad5f5610258b96a82aacd909ea): Bind for 0.0.0.0:9002 failed: port is already allocated

image

@alesapin alesapin changed the title Fix Data Lake Fix Data Lake tests concurrent run May 23, 2025
@alesapin
Copy link
Copy Markdown
Member

@azat azat self-assigned this May 23, 2025
/usr/bin/mc policy set public minio/warehouse;
until (/usr/bin/mc config host add minio http://minio:9000 minio ClickHouse_Minio_P@ssw0rd) do echo '...waiting...' && sleep 1; done;
/usr/bin/mc rm -r --force minio/warehouse-hms;
/usr/bin/mc mb minio/warehouse-hms --ignore-existing;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should move bucket creation to the cluster.py as for other MinIOs?

def wait_minio_to_start(self, timeout=180, secure=False):
self.minio_ip = self.get_instance_ip(self.minio_host)
self.minio_redirect_ip = self.get_instance_ip(self.minio_redirect_host)
os.environ["SSL_CERT_FILE"] = p.join(
self.base_dir, self.minio_dir, "certs", "public.crt"
)
minio_client = Minio(
f"{self.minio_ip}:{self.minio_port}",
access_key=minio_access_key,
secret_key=minio_secret_key,
secure=secure,
http_client=urllib3.PoolManager(cert_reqs="CERT_NONE"),
) # disable SSL check as we test ClickHouse and not Python library
start = time.time()
while time.time() - start < timeout:
try:
minio_client.list_buckets()
logging.debug("Connected to Minio.")
buckets = [self.minio_bucket, self.minio_bucket_2]
for bucket in buckets:
if minio_client.bucket_exists(bucket):
delete_object_list = map(
lambda x: x.object_name,
minio_client.list_objects_v2(bucket, recursive=True),
)
errors = minio_client.remove_objects(bucket, delete_object_list)
for error in errors:
logging.error(
f"Error occurred when deleting object {error}"
)
minio_client.remove_bucket(bucket)
minio_client.make_bucket(bucket)
logging.debug("S3 bucket '%s' created", bucket)
self.minio_client = minio_client
return
except Exception as ex:
logging.debug("Can't connect to Minio: %s", str(ex))
time.sleep(1)
try:
with open(os.path.join(self.minio_dir, "docker.log"), "w+") as f:
subprocess.check_call( # STYLE_CHECK_ALLOW_SUBPROCESS_CHECK_CALL
self.base_minio_cmd + ["logs"], stdout=f
)
except Exception as e:
logging.debug("Unable to get logs from docker.")
raise Exception("Can't wait Minio to start")

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've add a todo for now.

@alesapin
Copy link
Copy Markdown
Member

alesapin commented May 23, 2025

E               Exception: Command [docker compose --project-name roottestdatabasehms-gw3 --env-file /ClickHouse/tests/integration/test_database_hms/_instances-0-gw3/.env --file /ClickHouse/tests/integration/helpers/../../../tests/integration/compose/docker_compose_iceberg_hms_catalog.yml --verbose up -d] return non-zero code 17: time="2025-05-23T16:01:23Z" level=trace msg="Docker Desktop integration not enabled"
E               time="2025-05-23T16:01:23Z" level=debug msg="using default config store \"/root/.docker/buildx\""
E               time="2025-05-23T16:01:23Z" level=debug msg="using default config store \"/root/.docker/buildx\""
E               time="2025-05-23T16:01:24Z" level=debug msg="serving grpc connection" spanID=1a08c18adf73eb92 traceID=688283e6744b3c5733acdc15fee8c146
E               time="2025-05-23T16:01:24Z" level=debug msg="stopping session" span="load buildkit capabilities" spanID=1a08c18adf73eb92 traceID=688283e6744b3c5733acdc15fee8c146
E               time="2025-05-23T16:01:24Z" level=debug msg="otel error" error="<nil>"
E               time="2025-05-23T16:01:24Z" level=debug msg="otel error" error="<nil>"
E               unable to prepare context: path "/ClickHouse/tests/hms_extensions" not found

What is this???

@azat
Copy link
Copy Markdown
Member

azat commented May 23, 2025

What is this???

Egh, do we build the hive image every time?

$ grep '^  hive:' -A2 tests/integration/compose/docker_compose_iceberg_hms_catalog.yml
  hive:
    build: ./../../hms_extensions/
    restart: unless-stopped

We should use pre-built image!

@azat
Copy link
Copy Markdown
Member

azat commented May 25, 2025

@alexey-milovidov
Copy link
Copy Markdown
Member Author

Please commit directly to this branch, let's merge it together!

@azat
Copy link
Copy Markdown
Member

azat commented May 25, 2025

For now I've fixed with "hot fix"

We should use pre-built image!

For this I need access to docker registry to create a new image

E unable to prepare context: path "/ClickHouse/tests/hms_extensions" not found

I have no idea how this was possible for now, only a theory, at some point this image had been built and used from cache (but it should be only local on machine, so again, no idea how it was possible), now this file has been modified and compose tries to rebuild the image

- minio
image: minio/mc
# Stick to version with "mc config"
image: minio/mc:RELEASE.2025-04-16T18-13-26Z
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've also pinned all other docker images that was using in this tests

And about this place, it fails for me, since new mcli does not config command (alias should be used instead, but we stick with older version for now)

@robot-ch-test-poll4 robot-ch-test-poll4 added the pr-synced-to-cloud The PR is synced to the cloud repo label May 26, 2025
robot-clickhouse-ci-1 added a commit that referenced this pull request May 28, 2025
Cherry pick #80386 to 25.4: Fix Data Lake tests concurrent run
robot-clickhouse-ci-1 added a commit that referenced this pull request May 28, 2025
Cherry pick #80386 to 25.5: Fix Data Lake tests concurrent run
@robot-ch-test-poll1 robot-ch-test-poll1 added the pr-backports-created-cloud deprecated label, NOOP label May 28, 2025
@robot-clickhouse-ci-1 robot-clickhouse-ci-1 added the pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore label May 28, 2025
azat added a commit that referenced this pull request May 28, 2025
Backport #80386 to 25.4: Fix Data Lake tests concurrent run
azat added a commit that referenced this pull request May 28, 2025
Backport #80386 to 25.5: Fix Data Lake tests concurrent run
@Algunenano Algunenano added v25.3-must-backport and removed pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore labels May 29, 2025
@robot-clickhouse-ci-1 robot-clickhouse-ci-1 added the pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore label May 31, 2025
azat added a commit that referenced this pull request Jun 1, 2025
Cherry pick #80386 to 25.3: Fix Data Lake tests concurrent run
@azat azat removed the pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore label Jun 1, 2025
@azat
Copy link
Copy Markdown
Member

azat commented Jun 1, 2025

removed the pr-backports-created label

Let's see if this will help for #81015 to processed

@robot-ch-test-poll3 robot-ch-test-poll3 added the pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore label Jun 1, 2025
azat added a commit that referenced this pull request Jun 1, 2025
Backport #80386 to 25.3: Fix Data Lake tests concurrent run
@robot-clickhouse robot-clickhouse added the pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR label Jul 2, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

🍃 green ci 🌿 Fixing flaky tests in CI pr-backports-created Backport PRs are successfully created, it won't be processed by CI script anymore pr-backports-created-cloud deprecated label, NOOP pr-ci pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Integration test test_database_glue/test.py::test_hide_sensitive_info is flaky