Skip to content

Documentation Updates for New Write Related Features#7520

Merged
alamb merged 11 commits intoapache:mainfrom
devinjdangelo:write_docs_update
Sep 12, 2023
Merged

Documentation Updates for New Write Related Features#7520
alamb merged 11 commits intoapache:mainfrom
devinjdangelo:write_docs_update

Conversation

@devinjdangelo
Copy link
Contributor

Which issue does this PR close?

Closes #7499

Rationale for this change

We have added new options for writing files and changed some names around. We should update the documentation so the current state is clear.

What changes are included in this PR?

New documentation for write related options.

Are these changes tested?

Yes by existing tests.

Are there any user-facing changes?

New docs

Copy link
Member

@Weijun-H Weijun-H left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @devinjdangelo, I noticed some typos in this pr.


| Option | Description | Default Value |
| ----------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ---------------------------------------------------------------------------- |
| SINGLE_FILE | If true, indicates that this external table is backed by a single file. INSERT INTO queries will append to this file. | false |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
| SINGLE_FILE | If true, indicates that this external table is backed by a single file. INSERT INTO queries will append to this file. | false |
| SINGLE_FILE | If true, indicates that this external table is backed by a single file. INSERT INTO queries will be appended to this file. | false |

)
```

In this example, we write the entirety of `source_table` out to a folder of parquet files. The option `single_file_output` set to false, indicates that the destination path should be interpreted as a folder to which the query will output multiple files. One parquet file will be written in parallel to the folder for each partition in the query. The next option `compression` set to `snappy` indicates that unless otherwise specified all columns should use the snappy compression codec. The option `compression::col1` sets an override, so that the column `col1` in the parquet file will use `ZSTD` compression codec with compression level `5`. In general, parquet option which support column specific settings can be specified with the syntax `OPTION::COLUMN.NESTED.PATH`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
In this example, we write the entirety of `source_table` out to a folder of parquet files. The option `single_file_output` set to false, indicates that the destination path should be interpreted as a folder to which the query will output multiple files. One parquet file will be written in parallel to the folder for each partition in the query. The next option `compression` set to `snappy` indicates that unless otherwise specified all columns should use the snappy compression codec. The option `compression::col1` sets an override, so that the column `col1` in the parquet file will use `ZSTD` compression codec with compression level `5`. In general, parquet option which support column specific settings can be specified with the syntax `OPTION::COLUMN.NESTED.PATH`.
In this example, we write the entirety of `source_table` out to a folder of parquet files. The option `single_file_output` set to false, indicates that the destination path should be interpreted as a folder to which the query will output multiple files. One parquet file will be written in parallel to the folder for each partition in the query. The next option `compression` set to `snappy` indicates that unless otherwise specified all columns should use the snappy compression codec. The option `compression::col1` sets an override, so that the column `col1` in the parquet file will use `ZSTD` compression codec with compression level `5`. In general, the parquet option which supports column-specific settings can be specified with the syntax `OPTION::COLUMN.NESTED.PATH`.

Copy link
Member

@Weijun-H Weijun-H left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @devinjdangelo, I noticed some typos in this pr.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @devinjdangelo -- this looks really great ❤️

Thank you @Weijun-H for the additional review

files in the `dir_name` directory:

```sql
> COPY source_table TO 'dir_name' (FORMAT parquet, PER_THREAD_OUTPUT true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

under the License.
-->

# Write Options
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be a good idea to add a link to this page into the index https://github.com/apache/arrow-datafusion/blob/main/docs/source/user-guide/sql/index.rst so it show up in the left hand nav bar

@alamb alamb added the documentation Improvements or additions to documentation label Sep 12, 2023
@alamb
Copy link
Contributor

alamb commented Sep 12, 2023

Thanks again @devinjdangelo and @Weijun-H -- I'll merge this and we can continue iterating on the docs in follow on PRs

@alamb alamb merged commit 561e0d7 into apache:main Sep 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Document COPY parquet specific options

3 participants