Documentation Updates for New Write Related Features by devinjdangelo · Pull Request #7520 · apache/datafusion

devinjdangelo · 2023-09-10T18:43:48Z

Which issue does this PR close?

Rationale for this change

We have added new options for writing files and changed some names. We should update the documentation so that the current state is clear.

What changes are included in this PR?

New documentation for write-related options.

Are these changes tested?

Yes, by existing tests.

Are there any user-facing changes?

New docs

Weijun-H

Hello @devinjdangelo, I noticed some typos in this PR.

Weijun-H · 2023-09-11T04:25:12Z

docs/source/user-guide/sql/write_options.md

+
+| Option            | Description                                                                                                                                                                                                                                | Default Value                                                                |
+| ----------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ---------------------------------------------------------------------------- |
+| SINGLE_FILE       | If true, indicates that this external table is backed by a single file. INSERT INTO queries will append to this file.                                                                                                                      | false                                                                        |


Suggested change

-| SINGLE_FILE | If true, indicates that this external table is backed by a single file. INSERT INTO queries will append to this file. | false |

+| SINGLE_FILE | If true, indicates that this external table is backed by a single file. INSERT INTO queries will be appended to this file. | false |

Weijun-H · 2023-09-11T04:27:16Z

docs/source/user-guide/sql/write_options.md

+)
+```
+
+In this example, we write the entirety of `source_table` out to a folder of parquet files. The option `single_file_output` set to false, indicates that the destination path should be interpreted as a folder to which the query will output multiple files. One parquet file will be written in parallel to the folder for each partition in the query. The next option `compression` set to `snappy` indicates that unless otherwise specified all columns should use the snappy compression codec. The option `compression::col1` sets an override, so that the column `col1` in the parquet file will use `ZSTD` compression codec with compression level `5`. In general, parquet option which support column specific settings can be specified with the syntax `OPTION::COLUMN.NESTED.PATH`.


Suggested change

In this example, we write the entirety of `source_table` out to a folder of parquet files. The option `single_file_output` set to false, indicates that the destination path should be interpreted as a folder to which the query will output multiple files. One parquet file will be written in parallel to the folder for each partition in the query. The next option `compression` set to `snappy` indicates that unless otherwise specified all columns should use the snappy compression codec. The option `compression::col1` sets an override, so that the column `col1` in the parquet file will use `ZSTD` compression codec with compression level `5`. In general, parquet option which support column specific settings can be specified with the syntax `OPTION::COLUMN.NESTED.PATH`.

In this example, we write the entirety of `source_table` out to a folder of parquet files. The option `single_file_output` set to false, indicates that the destination path should be interpreted as a folder to which the query will output multiple files. One parquet file will be written in parallel to the folder for each partition in the query. The next option `compression` set to `snappy` indicates that unless otherwise specified all columns should use the snappy compression codec. The option `compression::col1` sets an override, so that the column `col1` in the parquet file will use `ZSTD` compression codec with compression level `5`. In general, the parquet option which supports column-specific settings can be specified with the syntax `OPTION::COLUMN.NESTED.PATH`.
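For readers of this thread, the statement that the paragraph describes can be sketched roughly as follows. This is only an illustration reconstructed from the prose above: the destination path is hypothetical, and quoting the column-specific key and writing the codec as `zstd(5)` are assumptions, not text taken from the diff.

```sql
-- Sketch only: 'path/to/output_dir' is a hypothetical destination, and the
-- quoting of the column-specific key and the zstd(5) value are assumptions.
COPY source_table
TO 'path/to/output_dir'
(
  FORMAT parquet,
  SINGLE_FILE_OUTPUT false,      -- treat the destination as a folder of files
  COMPRESSION snappy,            -- default codec for all columns
  'compression::col1' 'zstd(5)'  -- override for column col1 only
);
```

With `single_file_output` set to false, the number of output files would follow the number of partitions in the query, as the paragraph states.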

Co-authored-by: Alex Huang <[email protected]>

alamb

Thank you @devinjdangelo -- this looks really great ❤️

Thank you @Weijun-H for the additional review

docs/source/user-guide/sql/ddl.md

alamb · 2023-09-11T17:47:16Z

docs/source/user-guide/sql/dml.md

 files in the `dir_name` directory:

 ```sql
-> COPY source_table TO 'dir_name' (FORMAT parquet, PER_THREAD_OUTPUT true);


alamb · 2023-09-11T17:49:44Z

docs/source/user-guide/sql/write_options.md

+  under the License.
+-->
+
+# Write Options


I think it might be a good idea to add a link to this page to the index https://github.com/apache/arrow-datafusion/blob/main/docs/source/user-guide/sql/index.rst so that it shows up in the left-hand nav bar.

Co-authored-by: Andrew Lamb <[email protected]>

alamb · 2023-09-12T13:57:25Z

Thanks again @devinjdangelo and @Weijun-H -- I'll merge this and we can continue iterating on the docs in follow-on PRs.

devinjdangelo added 6 commits September 10, 2023 14:00

initial update

562be5c

add external table options

d836825

add parquet docs

5ca92a0

md to html in links

65d2de5

prettier

2a80883

edit SINGLE_FILE_OUPUT description

20215b1

Weijun-H reviewed Sep 11, 2023

Weijun-H mentioned this pull request Sep 11, 2023

typo: change delimeter to delimiter #7521

Merged

devinjdangelo and others added 3 commits September 11, 2023 11:28

Apply suggestions from code review

44f9ad2

Co-authored-by: Alex Huang <[email protected]>

prettier

1f17519

Merge remote-tracking branch 'apache/main' into write_docs_update

d2fe391

alamb approved these changes Sep 11, 2023

devinjdangelo and others added 2 commits September 11, 2023 19:12

Apply suggestions from code review

5ff004f

Co-authored-by: Andrew Lamb <[email protected]>

add link

b12e386

alamb added the documentation label Sep 12, 2023

alamb merged commit 561e0d7 into apache:main Sep 12, 2023
