Merged
229 commits
440ccc7
Add suport for s3 hive partition style writes
arthurpassos Feb 26, 2025
4224845
storageurl and storagefile
arthurpassos Feb 26, 2025
b0ec6be
simplify code
arthurpassos Feb 26, 2025
13a67df
reduce changes
arthurpassos Feb 26, 2025
00685e2
add tests for s3, enforce some rules
arthurpassos Mar 3, 2025
278c6da
some refactoring
arthurpassos Mar 5, 2025
97f67b5
extern not_implemented
arthurpassos Mar 5, 2025
1c2748a
copy sample bock
arthurpassos Mar 5, 2025
2eb820d
focus on engine s3 only, new argument to control partition style
arthurpassos Mar 6, 2025
8458314
small adjustments
arthurpassos Mar 6, 2025
2439ef5
re-trigger ci
arthurpassos Mar 6, 2025
795506f
tests
arthurpassos Mar 6, 2025
0f2e254
comment out test until we figure out what to do with that syntax
arthurpassos Mar 9, 2025
b9cae0a
fix tests
arthurpassos Mar 9, 2025
e3d770f
Merge branch 'master' into s3_hive_style_partitioned_writes
arthurpassos Mar 20, 2025
1405e1f
continue impl
arthurpassos Mar 28, 2025
5692d45
Merge branch 'master' into s3_hive_style_partitioned_writes
arthurpassos Mar 28, 2025
4fa8d90
some cicd changes
arthurpassos Mar 29, 2025
3b7500d
add no fast test to test
arthurpassos Mar 29, 2025
073473d
write_partition_columns_into_files argument
arthurpassos Apr 2, 2025
1cc52f3
remove stale setting
arthurpassos Apr 2, 2025
c130845
updt
arthurpassos Apr 2, 2025
2b2f111
fix ub
arthurpassos Apr 2, 2025
7f767d1
add test for write or not write column to file
arthurpassos Apr 2, 2025
aa48394
refactor extractNamedArgumentAndRemoveFromList
arthurpassos Apr 3, 2025
e071ae3
address a few pr comments
arthurpassos Apr 3, 2025
12def67
address some more comments
arthurpassos Apr 3, 2025
bf68c58
simmplify extractPartitionRequiredColumns
arthurpassos Apr 3, 2025
4885954
rename setting
arthurpassos Apr 3, 2025
7a6b5bd
revert some small changes on storagefile
arthurpassos Apr 3, 2025
97c9219
some intermediate docs
arthurpassos Apr 3, 2025
ece858b
add missing test file updt
arthurpassos Apr 3, 2025
e20422a
simplify sanity check of partition config
arthurpassos Apr 3, 2025
c4be416
Merge branch 'master' into s3_hive_style_partitioned_writes
arthurpassos Apr 4, 2025
9cecc7d
rmv conflict marker
arthurpassos Apr 4, 2025
ef04a67
style check fix
arthurpassos Apr 4, 2025
5f2cd8f
default partition strategy in factory
arthurpassos Apr 4, 2025
238d160
fix build
arthurpassos Apr 4, 2025
505b3b4
refactor executeImpl to pass around the insert_query object
arthurpassos Apr 4, 2025
2a95b87
remove default argument from virtual and overrides
arthurpassos Apr 4, 2025
f414a0c
use ASTPtr
arthurpassos Apr 4, 2025
9002da6
fix inconsistencies because of cherry pick and text replacament
arthurpassos Apr 5, 2025
6fff541
identation
arthurpassos Apr 5, 2025
02b825c
Merge branch 'master' into s3_hive_style_partitioned_writes
arthurpassos Apr 7, 2025
ad66700
add exception for when no columns would be in file, tests and minor i…
arthurpassos Apr 9, 2025
76cb9d8
trigger ci
arthurpassos Apr 9, 2025
805d8ed
Merge branch 'master' into s3_hive_style_partitioned_writes
arthurpassos Apr 10, 2025
e887326
positional argument
arthurpassos Apr 15, 2025
7890155
Merge branch 'master' into s3_hive_style_partitioned_writes
arthurpassos Apr 15, 2025
d9da2af
rmv no longer needed badarguments
arthurpassos Apr 15, 2025
39bcb8e
fix integ tests
arthurpassos Apr 17, 2025
780303a
small change
arthurpassos Apr 22, 2025
dbaa422
Merge branch 'master' into s3_hive_style_partitioned_writes
arthurpassos Apr 23, 2025
68db66a
fix build
arthurpassos Apr 23, 2025
fea015e
Implement reading
arthurpassos Apr 27, 2025
ddd3b14
add comment
arthurpassos Apr 29, 2025
f5c2c56
rename tests
arthurpassos Apr 29, 2025
35616ff
remove non-related files
arthurpassos Apr 29, 2025
cc1e061
keep columns duplicated
arthurpassos Apr 29, 2025
6dfee2c
remove unused files
arthurpassos Apr 29, 2025
35e02cf
remove unused include
arthurpassos Apr 29, 2025
f39c005
..
arthurpassos Apr 29, 2025
8b2ebfc
rmv files again again
arthurpassos Apr 29, 2025
e4650cf
Merge branch 'master' into s3_hive_style_partitioned_writes
arthurpassos Apr 29, 2025
e13bd8f
Merge branch 'master' into s3_hive_style_partitioned_writes
arthurpassos Apr 29, 2025
6821b78
trigger ci
arthurpassos Apr 29, 2025
a71e0ce
aa
arthurpassos Apr 29, 2025
eb907f6
hack
arthurpassos Apr 29, 2025
bea6cf5
revert some changes
arthurpassos Apr 30, 2025
01d0242
do not create hive partition columns as virtual, keep them as physica…
arthurpassos Apr 30, 2025
7498cee
style
arthurpassos Apr 30, 2025
03d17a5
fix dangling ref
arthurpassos Apr 30, 2025
b7a0534
fix build issue
arthurpassos Apr 30, 2025
0ff6e21
fix unrequested columns pushed to chunl
arthurpassos Apr 30, 2025
040d53f
fix ut build
arthurpassos Apr 30, 2025
d016a90
smalltst
arthurpassos May 4, 2025
2cd5cc2
Merge branch 'master' into s3_hive_style_partitioned_writes
arthurpassos May 4, 2025
ab38103
style
arthurpassos May 4, 2025
d3e30fd
oh boy, too much
arthurpassos May 9, 2025
8c0f031
Merge branch 'master' into s3_hive_style_partitioned_writes
arthurpassos May 9, 2025
bed3c84
style and some modifications
arthurpassos May 9, 2025
531d4ec
style again
arthurpassos May 9, 2025
4c9293a
fix filtering
arthurpassos May 10, 2025
58cc718
fix some more tests
arthurpassos May 10, 2025
e0a3d56
updt test
arthurpassos May 11, 2025
0c653a6
let's see
arthurpassos May 11, 2025
e70a39c
do not use lc type
arthurpassos May 12, 2025
7c00569
...
arthurpassos May 12, 2025
432a3bf
fix iceberg clone & pass some more arguments around
arthurpassos May 12, 2025
e6afbf4
partition columns should be in file by default
arthurpassos May 13, 2025
666b904
progress
arthurpassos May 16, 2025
417c3e9
progress
arthurpassos May 16, 2025
17dab0c
Merge branch 'master' into s3_hive_style_partitioned_writes
arthurpassos May 16, 2025
7956550
queue fix
arthurpassos May 17, 2025
288a059
change factory validation
arthurpassos May 19, 2025
d1d5cd9
dangerous simplify branch
arthurpassos May 28, 2025
9ec1718
Merge branch 'master' into s3_hive_style_partitioned_writes
arthurpassos May 28, 2025
ff34a9a
remove clone and shitty copy constructors
arthurpassos May 28, 2025
f2e5501
ah whathell
arthurpassos May 28, 2025
85aec70
Merge branch 'master' into s3_hive_style_partitioned_writes
arthurpassos May 28, 2025
2090d98
style
arthurpassos May 28, 2025
3c44ae8
nolint explicit constructor
arthurpassos May 28, 2025
385186a
...
arthurpassos May 28, 2025
2d97177
rmv duplicate header
arthurpassos May 29, 2025
b6445f0
small changes
arthurpassos Jun 2, 2025
447fa48
add a few comments
arthurpassos Jun 2, 2025
6e30c92
Revert "use ASTPtr"
arthurpassos Jun 2, 2025
c675fd4
Revert "remove default argument from virtual and overrides"
arthurpassos Jun 2, 2025
2bf0f58
Revert "refactor executeImpl to pass around the insert_query object"
arthurpassos Jun 2, 2025
f6e1406
trying to undo some stuff
arthurpassos Jun 2, 2025
c169ddb
setPartitionBy method
arthurpassos Jun 2, 2025
8d8dfd3
finalize revert
arthurpassos Jun 2, 2025
6a3ac44
actually finallize
arthurpassos Jun 2, 2025
7c8aaac
minimize number of changes
arthurpassos Jun 2, 2025
8070bc3
add some more comments
arthurpassos Jun 2, 2025
20532d7
minimize number of changes
arthurpassos Jun 2, 2025
a0b235b
some more docs
arthurpassos Jun 2, 2025
7609c9c
add some comments, reduce changes by removing changes in data lake stuff
arthurpassos Jun 2, 2025
d061fad
typo
arthurpassos Jun 2, 2025
c7d9914
rename from setpath to setrawpath
arthurpassos Jun 2, 2025
67afa60
rmv unnecessary include statements
arthurpassos Jun 2, 2025
9454c1f
remove dead code
arthurpassos Jun 2, 2025
71d1c01
small stuff
arthurpassos Jun 2, 2025
abede56
Merge branch 'master' into s3_hive_style_partitioned_writes
arthurpassos Jun 5, 2025
e61b3aa
Merge branch 'master' into s3_hive_style_partitioned_writes
arthurpassos Jun 14, 2025
fbd5167
resolve a few quick comments
arthurpassos Jun 14, 2025
dad5f25
Merge branch 'master' into s3_hive_style_partitioned_writes
arthurpassos Jun 14, 2025
54c388c
resolve some more easy comments
arthurpassos Jun 14, 2025
22667c3
remove wildcard_acceptance and start using magic_enum
arthurpassos Jun 16, 2025
fb96eb9
throw in case hive file contains only partition columns
arthurpassos Jun 16, 2025
ead0f50
rmv unrelated file
arthurpassos Jun 16, 2025
df43b8f
another comment
arthurpassos Jun 16, 2025
8edf2fe
rename stringfied partitionstrategy to wildcard and remove dummy fact…
arthurpassos Jun 16, 2025
9ff9be1
add some more comments
arthurpassos Jun 16, 2025
27e26d9
style
arthurpassos Jun 16, 2025
22dae5e
rawpath instead of reading path in deltalakemtadata
arthurpassos Jun 16, 2025
5012890
remove todo and document getpaths and setpaths
arthurpassos Jun 16, 2025
862bdc7
resolve one more todo
arthurpassos Jun 17, 2025
d757cdd
rmv one more todo
arthurpassos Jun 17, 2025
1b87681
resolve one more todo
arthurpassos Jun 17, 2025
b006338
one more todo
arthurpassos Jun 17, 2025
1c81d48
remove a few more todos
arthurpassos Jun 17, 2025
518c3ba
resolve one more todo
arthurpassos Jun 18, 2025
1eba18c
Merge branch 'master' into s3_hive_style_partitioned_writes
arthurpassos Jun 18, 2025
5402b0c
fix conflicts err
arthurpassos Jun 18, 2025
5cf75f7
one less todo
arthurpassos Jun 18, 2025
c3bbb1d
resolve a few more todos
arthurpassos Jun 18, 2025
bfd2f7b
a few more todos
arthurpassos Jun 18, 2025
5850fe3
resolve one more todo
arthurpassos Jun 19, 2025
79ea1d9
Merge branch 'master' into s3_hive_style_partitioned_writes
arthurpassos Jun 19, 2025
ac026c7
fix conflict issues
arthurpassos Jun 19, 2025
f1552f7
fix style
arthurpassos Jun 19, 2025
69412e1
Merge branch 'master' into s3_hive_style_partitioned_writes
arthurpassos Jun 20, 2025
c133260
Merge branch 'master' into s3_hive_style_partitioned_writes
arthurpassos Jun 20, 2025
2671ff6
resolve todo
arthurpassos Jun 20, 2025
86b0b88
fix conflict issues
arthurpassos Jun 20, 2025
e139c5d
implement partition column type checking
arthurpassos Jun 23, 2025
89a7485
Merge branch 'ClickHouse:master' into s3_hive_style_partitioned_writes
arthurpassos Jun 23, 2025
7518144
Merge branch 'master' into s3_hive_style_partitioned_writes
arthurpassos Jun 24, 2025
f40ff9a
add support for azure named collections
arthurpassos Jun 24, 2025
6abdf03
Merge branch 'master' into s3_hive_style_partitioned_writes
arthurpassos Jun 25, 2025
5d0aa96
cache compute path_for_read only once
arthurpassos Jun 25, 2025
5561660
resolve a few more todos
arthurpassos Jun 25, 2025
38f737a
resolve one more todo
arthurpassos Jun 25, 2025
3582d1f
forbid partition by without partition wildcard on wildcard strategy
arthurpassos Jun 25, 2025
5c96591
add tests for hard requirement of partition wildcard in partition by …
arthurpassos Jun 25, 2025
b2d0797
support azure as well
arthurpassos Jun 25, 2025
5d1244c
Merge branch 'master' into s3_hive_style_partitioned_writes
arthurpassos Jun 25, 2025
174ca13
Revert "cache compute path_for_read only once"
arthurpassos Jun 26, 2025
02ed1d2
improve hive path creation by trimming forward slashes and add azure …
arthurpassos Jun 26, 2025
03c8530
Merge branch 'master' into s3_hive_style_partitioned_writes
arthurpassos Jun 26, 2025
0ab0701
fix table already exists issue
arthurpassos Jun 27, 2025
44b219a
Merge branch 'master' into s3_hive_style_partitioned_writes
arthurpassos Jun 30, 2025
8fccf0d
support all datetime related types
arthurpassos Jun 30, 2025
ab42ddc
Merge branch 'master' into s3_hive_style_partitioned_writes
arthurpassos Jun 30, 2025
71ba375
Merge branch 'master' into s3_hive_style_partitioned_writes
arthurpassos Jul 10, 2025
d73b316
address minor style comments
arthurpassos Jul 10, 2025
9c65471
address some more comments
arthurpassos Jul 10, 2025
7e47764
make some members const
arthurpassos Jul 10, 2025
26cb916
address some more comments
arthurpassos Jul 10, 2025
08237ea
maybe fix linking err
arthurpassos Jul 10, 2025
cacb967
Merge branch 'master' into s3_hive_style_partitioned_writes
arthurpassos Jul 12, 2025
9e4afd2
Revert "maybe fix linking err"
arthurpassos Jul 12, 2025
cda6bb4
Revert "address some more comments"
arthurpassos Jul 12, 2025
860aadb
will it build?
arthurpassos Jul 12, 2025
650ad9d
fix assertion
arthurpassos Jul 14, 2025
1f64950
...
arthurpassos Jul 14, 2025
e0e7947
Merge branch 'master' into s3_hive_style_partitioned_writes
arthurpassos Jul 14, 2025
794c064
fix conflicts
arthurpassos Jul 14, 2025
7d7f808
bad arguments definition
arthurpassos Jul 14, 2025
ebc0a32
address comment
arthurpassos Jul 14, 2025
47c4ac9
Throw INCORRECT_DATA in case the expected hive partition columns were…
arthurpassos Jul 15, 2025
c9e52ba
reduce code duplication
arthurpassos Jul 15, 2025
bc628cd
remove INCORRECT_DATA definition from storageobjectstoragesource
arthurpassos Jul 15, 2025
f6df533
reduce code duplication even further
arthurpassos Jul 15, 2025
67a29af
style
arthurpassos Jul 15, 2025
6dab275
rmv coretypes
arthurpassos Jul 15, 2025
08438fc
Revert "rmv coretypes"
arthurpassos Jul 15, 2025
de7e201
Merge branch 'master' into s3_hive_style_partitioned_writes
arthurpassos Jul 16, 2025
9f978c8
write feature docs
arthurpassos Jul 16, 2025
31126c3
remove globbed word
arthurpassos Jul 16, 2025
9baf134
try to fix docs
arthurpassos Jul 16, 2025
62a4ca9
Merge branch 'master' into s3_hive_style_partitioned_writes
arthurpassos Jul 17, 2025
bad8fe8
add back check that was lost during merge glitch
arthurpassos Jul 21, 2025
8a061e3
address small comments
arthurpassos Jul 21, 2025
9c37a9d
add log
arthurpassos Jul 21, 2025
e22c288
make naming a bit more consistent
arthurpassos Jul 21, 2025
8206a7e
move plain object storage writing logic below the data lake one
arthurpassos Jul 21, 2025
5201bf3
do not allow partition_strategy to be set for data lakes
arthurpassos Jul 21, 2025
2707f0c
use getRawPath for all data lake related paths
arthurpassos Jul 21, 2025
0e98561
...
arthurpassos Jul 21, 2025
1e8a93e
use pathforread and pathforwrite for data lakes as well
arthurpassos Jul 22, 2025
eab8acf
remove stale comment
arthurpassos Jul 22, 2025
e1f6cd4
remove one helper factory get
arthurpassos Jul 22, 2025
9d2f80d
address some more comments
arthurpassos Jul 10, 2025
9a8afb3
address more minor comments
arthurpassos Jul 22, 2025
a050eff
compute path_for_read only once
arthurpassos Jul 22, 2025
7050b20
Merge branch 'master' into s3_hive_style_partitioned_writes
arthurpassos Jul 22, 2025
9d1b8ed
move supports_partial_prefix out of Path and rename a few functions
arthurpassos Jul 22, 2025
ec9b9e5
rmv setRawPath in favor of setPathForRead
arthurpassos Jul 22, 2025
9792b4a
use references when possible
arthurpassos Jul 22, 2025
210f319
Merge branch 'master' into s3_hive_style_partitioned_writes
arthurpassos Jul 25, 2025
5e6f5e2
possibly fix build issues after sync with private
arthurpassos Jul 25, 2025
5a5c294
Merge branch 'master' into s3_hive_style_partitioned_writes
arthurpassos Jul 25, 2025
ea39790
trigger ci
arthurpassos Jul 26, 2025
f341c40
fix merge conflict
arthurpassos Jul 26, 2025
e8642f8
Merge branch 'master' into s3_hive_style_partitioned_writes
arthurpassos Jul 28, 2025
f6a46a8
try to fix build
arthurpassos Jul 28, 2025
48aa454
fix build again
arthurpassos Jul 28, 2025
33 changes: 32 additions & 1 deletion docs/en/engines/table-engines/integrations/azureBlobStorage.md
@@ -14,7 +14,7 @@ This engine provides an integration with [Azure Blob Storage](https://azure.micr

```sql
CREATE TABLE azure_blob_storage_table (name String, value UInt32)
ENGINE = AzureBlobStorage(connection_string|storage_account_url, container_name, blobpath, [account_name, account_key, format, compression])
ENGINE = AzureBlobStorage(connection_string|storage_account_url, container_name, blobpath, [account_name, account_key, format, compression, partition_strategy, partition_columns_in_data_file])
[PARTITION BY expr]
[SETTINGS ...]
```
@@ -30,6 +30,8 @@ CREATE TABLE azure_blob_storage_table (name String, value UInt32)
- `account_key` - if storage_account_url is used, then account key can be specified here
- `format` — The [format](/interfaces/formats.md) of the file.
- `compression` — Supported values: `none`, `gzip/gz`, `brotli/br`, `xz/LZMA`, `zstd/zst`. By default, it will autodetect compression by file extension. (same as setting to `auto`).
- `partition_strategy` – Options: `WILDCARD` or `HIVE`. `WILDCARD` requires a `{_partition_id}` wildcard in the path, which is replaced with the partition key. `HIVE` does not allow wildcards, assumes the path is the table root, and generates Hive-style partitioned directories with Snowflake IDs as filenames and the file format as the extension. Defaults to `WILDCARD`.
- `partition_columns_in_data_file` - Only used with the `HIVE` partition strategy. Tells ClickHouse whether to expect partition columns to be written in the data file. Defaults to `false`.

**Example**

@@ -95,6 +97,35 @@ SETTINGS filesystem_cache_name = 'cache_for_azure', enable_filesystem_cache = 1;

2. reuse cache configuration (and therefore cache storage) from clickhouse `storage_configuration` section, [described here](/operations/storing-data.md/#using-local-cache)

### PARTITION BY {#partition-by}

`PARTITION BY` — Optional. In most cases you don't need a partition key, and if it is needed you generally don't need a partition key more granular than by month. Partitioning does not speed up queries (in contrast to the ORDER BY expression). You should never use too granular partitioning. Don't partition your data by client identifiers or names (instead, make client identifier or name the first column in the ORDER BY expression).

For partitioning by month, use the `toYYYYMM(date_column)` expression, where `date_column` is a column with a date of the type [Date](/sql-reference/data-types/date.md). The partition names here have the `"YYYYMM"` format.

#### Partition strategy {#partition-strategy}

`WILDCARD` (default): Replaces the `{_partition_id}` wildcard in the file path with the actual partition key. Reading is not supported.

`HIVE` implements Hive-style partitioning for reads and writes. Reading is implemented using a recursive glob pattern. Writing generates files using the following format: `<prefix>/<key1=val1/key2=val2...>/<snowflakeid>.<toLower(file_format)>`.

Note: When using `HIVE` partition strategy, the `use_hive_partitioning` setting has no effect.

Example of `HIVE` partition strategy:

```sql
arthur :) create table azure_table (year UInt16, country String, counter UInt8) ENGINE=AzureBlobStorage(account_name='devstoreaccount1', account_key='Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==', storage_account_url = 'http://localhost:30000/devstoreaccount1', container='cont', blob_path='hive_partitioned', format='Parquet', compression='auto', partition_strategy='hive') PARTITION BY (year, country);

arthur :) insert into azure_table values (2020, 'Russia', 1), (2021, 'Brazil', 2);

arthur :) select _path, * from azure_table;

   ┌─_path──────────────────────────────────────────────────────────────────────┬─year─┬─country─┬─counter─┐
1. │ cont/hive_partitioned/year=2020/country=Russia/7351305360873664512.parquet │ 2020 │ Russia  │       1 │
2. │ cont/hive_partitioned/year=2021/country=Brazil/7351305360894636032.parquet │ 2021 │ Brazil  │       2 │
   └────────────────────────────────────────────────────────────────────────────┴──────┴─────────┴─────────┘
```
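
The write-path layout described above (`<prefix>/<key1=val1/key2=val2...>/<snowflakeid>.<toLower(file_format)>`) can be illustrated with a small Python sketch. This is not ClickHouse's implementation: the function name `hive_partition_path` and its parameters are hypothetical, the file ID is passed in rather than generated, and any escaping of special characters in partition values is not modeled.

```python
def hive_partition_path(prefix, partition_values, file_id, file_format):
    """Build '<prefix>/<key1=val1/key2=val2...>/<file_id>.<lowercased format>'.

    Hypothetical helper for illustration only; it mirrors the documented
    layout, not the actual ClickHouse code path.
    """
    # One 'key=value' directory per partition column, in declaration order.
    partition_dir = "/".join(f"{key}={value}" for key, value in partition_values.items())
    # Trim slashes around the prefix so the result has no doubled separators.
    return f"{prefix.strip('/')}/{partition_dir}/{file_id}.{file_format.lower()}"


path = hive_partition_path(
    "cont/hive_partitioned",
    {"year": 2020, "country": "Russia"},
    7351305360873664512,
    "Parquet",
)
print(path)
# cont/hive_partitioned/year=2020/country=Russia/7351305360873664512.parquet
```

The printed path matches the first row of the example output above, given the same prefix, partition values, and file ID.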

## See also {#see-also}

[Azure Blob Storage Table Function](/sql-reference/table-functions/azureBlobStorage)
50 changes: 49 additions & 1 deletion docs/en/engines/table-engines/integrations/s3.md
@@ -34,7 +34,7 @@ SELECT * FROM s3_engine_table LIMIT 2;

```sql
CREATE TABLE s3_engine_table (name String, value UInt32)
ENGINE = S3(path [, NOSIGN | aws_access_key_id, aws_secret_access_key,] format, [compression])
ENGINE = S3(path [, NOSIGN | aws_access_key_id, aws_secret_access_key,] format, [compression], [partition_strategy], [partition_columns_in_data_file])
[PARTITION BY expr]
[SETTINGS ...]
```
@@ -46,6 +46,8 @@ CREATE TABLE s3_engine_table (name String, value UInt32)
- `format` — The [format](/sql-reference/formats#formats-overview) of the file.
- `aws_access_key_id`, `aws_secret_access_key` - Long-term credentials for the [AWS](https://aws.amazon.com/) account user. You can use these to authenticate your requests. Parameter is optional. If credentials are not specified, they are used from the configuration file. For more information see [Using S3 for Data Storage](../mergetree-family/mergetree.md#table_engine-mergetree-s3).
- `compression` — Compression type. Supported values: `none`, `gzip/gz`, `brotli/br`, `xz/LZMA`, `zstd/zst`. Parameter is optional. By default, it will auto-detect compression by file extension.
- `partition_strategy` – Options: `WILDCARD` or `HIVE`. `WILDCARD` requires a `{_partition_id}` wildcard in the path, which is replaced with the partition key. `HIVE` does not allow wildcards, assumes the path is the table root, and generates Hive-style partitioned directories with Snowflake IDs as filenames and the file format as the extension. Defaults to `WILDCARD`.
- `partition_columns_in_data_file` - Only used with the `HIVE` partition strategy. Tells ClickHouse whether to expect partition columns to be written in the data file. Defaults to `false`.

### Data cache {#data-cache}

@@ -84,6 +86,52 @@ There are two ways to define cache in configuration file.

For partitioning by month, use the `toYYYYMM(date_column)` expression, where `date_column` is a column with a date of the type [Date](/sql-reference/data-types/date.md). The partition names here have the `"YYYYMM"` format.

#### Partition strategy {#partition-strategy}

`WILDCARD` (default): Replaces the `{_partition_id}` wildcard in the file path with the actual partition key. Reading is not supported.

`HIVE` implements Hive-style partitioning for reads and writes. Reading is implemented using a recursive glob pattern; it is equivalent to `SELECT * FROM s3('table_root/**.parquet')`.
Writing generates files using the following format: `<prefix>/<key1=val1/key2=val2...>/<snowflakeid>.<toLower(file_format)>`.

Note: When using `HIVE` partition strategy, the `use_hive_partitioning` setting has no effect.

Example of `HIVE` partition strategy:

```sql
arthur :) CREATE TABLE t_03363_parquet (year UInt16, country String, counter UInt8)
ENGINE = S3(s3_conn, filename = 't_03363_parquet', format = Parquet, partition_strategy='hive')
PARTITION BY (year, country);

arthur :) INSERT INTO t_03363_parquet VALUES
(2022, 'USA', 1),
(2022, 'Canada', 2),
(2023, 'USA', 3),
(2023, 'Mexico', 4),
(2024, 'France', 5),
(2024, 'Germany', 6),
(2024, 'Germany', 7),
(1999, 'Brazil', 8),
(2100, 'Japan', 9),
(2024, 'CN', 10),
(2025, '', 11);

arthur :) select _path, * from t_03363_parquet;

    ┌─_path──────────────────────────────────────────────────────────────────────┬─year─┬─country─┬─counter─┐
 1. │ test/t_03363_parquet/year=2100/country=Japan/7329604473272971264.parquet   │ 2100 │ Japan   │       9 │
 2. │ test/t_03363_parquet/year=2024/country=France/7329604473323302912.parquet  │ 2024 │ France  │       5 │
 3. │ test/t_03363_parquet/year=2022/country=Canada/7329604473314914304.parquet  │ 2022 │ Canada  │       2 │
 4. │ test/t_03363_parquet/year=1999/country=Brazil/7329604473289748480.parquet  │ 1999 │ Brazil  │       8 │
 5. │ test/t_03363_parquet/year=2023/country=Mexico/7329604473293942784.parquet  │ 2023 │ Mexico  │       4 │
 6. │ test/t_03363_parquet/year=2023/country=USA/7329604473319108608.parquet     │ 2023 │ USA     │       3 │
 7. │ test/t_03363_parquet/year=2025/country=/7329604473327497216.parquet        │ 2025 │         │      11 │
 8. │ test/t_03363_parquet/year=2024/country=CN/7329604473310720000.parquet      │ 2024 │ CN      │      10 │
 9. │ test/t_03363_parquet/year=2022/country=USA/7329604473298137088.parquet     │ 2022 │ USA     │       1 │
10. │ test/t_03363_parquet/year=2024/country=Germany/7329604473306525696.parquet │ 2024 │ Germany │       6 │
11. │ test/t_03363_parquet/year=2024/country=Germany/7329604473306525696.parquet │ 2024 │ Germany │       7 │
    └────────────────────────────────────────────────────────────────────────────┴──────┴─────────┴─────────┘
```
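
On the read side, the partition values live in the directory names of each `_path`. The Python sketch below shows one way those `key=value` segments could be pulled out of a path like the ones in the output above. It is an illustration only: `parse_hive_partitions` is a hypothetical name, the values are returned as plain strings, and converting them to the declared column types (as ClickHouse does for the table's partition columns) is left out.

```python
def parse_hive_partitions(path):
    """Return {key: value} for every 'key=value' directory segment in path.

    Hypothetical helper for illustration; values stay as strings.
    """
    partitions = {}
    # The last segment is the data file name, so only directories are scanned.
    for segment in path.split("/")[:-1]:
        key, sep, value = segment.partition("=")
        if sep and key:
            partitions[key] = value
    return partitions


example = "test/t_03363_parquet/year=2025/country=/7329604473327497216.parquet"
print(parse_hive_partitions(example))
# {'year': '2025', 'country': ''}
```

An empty value after `=` is kept as an empty string, which corresponds to the `(2025, '', 11)` row in the example.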

### Querying partitioned data {#querying-partitioned-data}

This example uses the [docker compose recipe](https://github.com/ClickHouse/examples/tree/5fdc6ff72f4e5137e23ea075c88d3f44b0202490/docker-compose-recipes/recipes/ch-and-minio-S3), which integrates ClickHouse and MinIO. You should be able to reproduce the same queries using S3 by replacing the endpoint and authentication values.