Documentation Updates for New Write Related Features #7520
alamb merged 11 commits into apache:main
Weijun-H
left a comment
Hello @devinjdangelo, I noticed some typos in this PR.
| Option | Description | Default Value |
| ----------- | ----------- | ------------- |
| SINGLE_FILE | If true, indicates that this external table is backed by a single file. INSERT INTO queries will append to this file. | false |

Suggested change:

| SINGLE_FILE | If true, indicates that this external table is backed by a single file. INSERT INTO queries will be appended to this file. | false |
In this example, we write the entirety of `source_table` out to a folder of parquet files. The option `single_file_output` set to false, indicates that the destination path should be interpreted as a folder to which the query will output multiple files. One parquet file will be written in parallel to the folder for each partition in the query. The next option `compression` set to `snappy` indicates that unless otherwise specified all columns should use the snappy compression codec. The option `compression::col1` sets an override, so that the column `col1` in the parquet file will use `ZSTD` compression codec with compression level `5`. In general, parquet option which support column specific settings can be specified with the syntax `OPTION::COLUMN.NESTED.PATH`.

Suggested change:

In this example, we write the entirety of `source_table` out to a folder of parquet files. The option `single_file_output` set to false, indicates that the destination path should be interpreted as a folder to which the query will output multiple files. One parquet file will be written in parallel to the folder for each partition in the query. The next option `compression` set to `snappy` indicates that unless otherwise specified all columns should use the snappy compression codec. The option `compression::col1` sets an override, so that the column `col1` in the parquet file will use `ZSTD` compression codec with compression level `5`. In general, the parquet option which supports column-specific settings can be specified with the syntax `OPTION::COLUMN.NESTED.PATH`.
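For readers without the diff context, the statement this paragraph describes might look roughly like the sketch below. The option names are taken from the paragraph itself; the destination path is hypothetical, and the `'zstd(5)'` value syntax for the codec and level is an assumption rather than something shown in this thread.

```sql
-- Sketch only: 'path/to/output_dir' is a hypothetical destination,
-- and 'zstd(5)' is an assumed spelling for "ZSTD at compression level 5".
COPY source_table
  TO 'path/to/output_dir'
  (FORMAT parquet,
   SINGLE_FILE_OUTPUT false,           -- treat the destination as a folder of files
   COMPRESSION snappy,                 -- default codec for all columns
   'compression::col1' 'zstd(5)');     -- column-specific override for col1
```

The quoted `'compression::col1'` key follows the `OPTION::COLUMN.NESTED.PATH` pattern mentioned above, with `compression` as the option and `col1` as the column path.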
alamb
left a comment
Thank you @devinjdangelo -- this looks really great ❤️
Thank you @Weijun-H for the additional review
files in the `dir_name` directory:

```sql
> COPY source_table TO 'dir_name' (FORMAT parquet, PER_THREAD_OUTPUT true);
```
# Write Options
I think it might be a good idea to add a link to this page to the index https://github.com/apache/arrow-datafusion/blob/main/docs/source/user-guide/sql/index.rst so it shows up in the left-hand nav bar
Co-authored-by: Andrew Lamb <[email protected]>
Thanks again @devinjdangelo and @Weijun-H -- I'll merge this and we can continue iterating on the docs in follow-on PRs
Which issue does this PR close?
Closes #7499
Rationale for this change
We have added new options for writing files and changed some names around. We should update the documentation so the current state is clear.
What changes are included in this PR?
New documentation for write-related options.
Are these changes tested?
Yes, by existing tests.
Are there any user-facing changes?
New docs