Skip to content

Forbid creating directories in non-user directories#86844

Merged
scanhex12 merged 21 commits intoClickHouse:masterfrom
scanhex12:check_right
Sep 15, 2025
Merged

Forbid creating directories in non-user directories#86844
scanhex12 merged 21 commits intoClickHouse:masterfrom
scanhex12:check_right

Conversation

@scanhex12
Copy link
Copy Markdown
Member

@scanhex12 scanhex12 commented Sep 8, 2025

Changelog category (leave one):

  • Critical Bug Fix (crash, data loss, RBAC) or LOGICAL_ERROR

@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Sep 8, 2025

Workflow [PR], commit [74ba30f]

@clickhouse-gh clickhouse-gh bot added the pr-not-for-changelog This PR should not be mentioned in the changelog label Sep 8, 2025
@nikitamikhaylov
Copy link
Copy Markdown
Member

Test?

@kssenii kssenii self-assigned this Sep 9, 2025
@Algunenano
Copy link
Copy Markdown
Member

Not for changelog (changelog entry is not required)

Please set the proper category. This is a critical bugfix

@divanik divanik self-assigned this Sep 9, 2025
@divanik
Copy link
Copy Markdown
Member

divanik commented Sep 9, 2025

I am not against of merging this PR (at least it is mitigate crtical issue). But I am really worried that LocalObjectStorage interface allows us to do such actions. If so, the bug can be a great deal more dangerous

Copy link
Copy Markdown
Member

@divanik divanik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also address the tests and create the test which checks that security bug is not longer reproduced

Comment on lines +517 to +522
if (object_storage->getType() == ObjectStorageType::Local)
{
auto user_files_path = local_context->getUserFilesPath();
if (!fileOrSymlinkPathStartsWith(configuration_ptr->getPathForRead().path, user_files_path))
throw Exception(ErrorCodes::PATH_ACCESS_DENIED, "File path {} is not inside {}", configuration_ptr->getPathForRead().path, user_files_path);
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move the logic to

to also cover DeltaLake, referencing existing table (both by table engine and table function)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reasons why you think this is a bad suggestion?

@michael-anastasakis
Copy link
Copy Markdown
Member

From a security perspective, we cannot have a critical security check as an optional setting. That validation needs to be mandatory similarly as @Algunenano commented. I understand that there may be some extra effort needed to fix the failing tests but we need to have a proper and clean solution rather than a partial one.

Comment on lines -32 to -33
f"/iceberg_data/default/{TABLE_NAME}/",
f"/iceberg_data/default/{TABLE_NAME}/",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it true that default_upload_directory is always used with prefix "/var/lib/clickhouse/user_files/" now? Let's patch arguments in the function itself. This will dramatically change number of backported changes

assert int(instance.query(f"SELECT count() FROM {TABLE_NAME}")) == 100
snapshot1_timestamp = datetime.now(timezone.utc)
snapshot1_id = get_last_snapshot(f"/iceberg_data/default/{TABLE_NAME}/")
snapshot1_id = get_last_snapshot(f"/var/lib/clickhouse/user_files/iceberg_data/default/{TABLE_NAME}/")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this function as well

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is used 2 times with prefix /var/lib/clickhouse/user_files, so it's useless to move those changes in get_last_snapshot.

Comment on lines +517 to +522
if (object_storage->getType() == ObjectStorageType::Local)
{
auto user_files_path = local_context->getUserFilesPath();
if (!fileOrSymlinkPathStartsWith(configuration_ptr->getPathForRead().path, user_files_path))
throw Exception(ErrorCodes::PATH_ACCESS_DENIED, "File path {} is not inside {}", configuration_ptr->getPathForRead().path, user_files_path);
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reasons why you think this is a bad suggestion?

@scanhex12 scanhex12 requested a review from divanik September 12, 2025 16:14
Comment on lines +632 to +633

auto log = getLogger("IcebergMetadata");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Odd change

)

df = spark.read.format("iceberg").load(f"/iceberg_data/default/{TABLE_NAME}").collect()
df = spark.read.format("iceberg").load(f"/var/lib/clickhouse/user_files/iceberg_data/default/{TABLE_NAME}").collect()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move this to a separate function (in backported code it also will be easy to replace such line with a function call)

Comment on lines +53 to 54
with open(f"/var/lib/clickhouse/user_files/iceberg_data/default/{TABLE_NAME}/metadata/version-hint.text", "wb") as f:
f.write(b"4")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also this to a separate function which takes table name and int version

@@ -28,12 +28,12 @@ def test_writes_create_partitioned_table(started_cluster_iceberg_with_spark, for
default_download_directory(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default_download_directory can be changed as a default_upload_directory

@Algunenano
Copy link
Copy Markdown
Member

Please, change the changelog category to "Critical bug fix"

@scanhex12 scanhex12 added pr-must-backport Pull request should be backported intentionally. Use this label with great care! v25.8-must-backport labels Sep 15, 2025
@scanhex12 scanhex12 added this pull request to the merge queue Sep 15, 2025
Merged via the queue into ClickHouse:master with commit 5ec7d6d Sep 15, 2025
120 of 122 checks passed
@scanhex12 scanhex12 deleted the check_right branch September 15, 2025 12:39
@robot-ch-test-poll1 robot-ch-test-poll1 added the pr-synced-to-cloud The PR is synced to the cloud repo label Sep 15, 2025
@pamarcos
Copy link
Copy Markdown
Member

Please, change the changelog category to "Critical bug fix"

+1, this seems relevant enough not only to be considered as a critical bug fix, but also IMHO a changelog entry is the bare minimum we should add as well

@robot-ch-test-poll1 robot-ch-test-poll1 added pr-backports-created-cloud deprecated label, NOOP pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR labels Sep 15, 2025
@otselnik
Copy link
Copy Markdown

@scanhex12 Will the work on backporting this pull request to the 25.8 version be continued?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-backports-created-cloud deprecated label, NOOP pr-critical-bugfix pr-must-backport Pull request should be backported intentionally. Use this label with great care! pr-must-backport-synced The `*-must-backport` labels are synced into the cloud Sync PR pr-not-for-changelog This PR should not be mentioned in the changelog pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants