
@dclandau (Contributor) commented Aug 2, 2022

When converting the Postgres boolean data type, the operator was defaulting to
the parquet string data type (`pa.string()`). Pyarrow would then raise an
error when attempting to convert boolean values to strings.
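The failure mode can be sketched in plain Python as a simplified two-stage lookup. `PG_TO_BQ_BEFORE`, `BQ_TO_PARQUET` and `parquet_type_for` are hypothetical stand-ins, not the operator's actual attributes, and parquet types are represented by the names of the pyarrow factory functions rather than real pyarrow objects. The sketch assumes the old map labelled booleans with a name (here `BOOLEAN`) that the parquet stage did not recognise. Postgres OID 16 is the built-in `boolean` type.

```python
# Hypothetical, simplified model of the mapping chain BEFORE the fix.
# Stage 1: Postgres type OID -> BigQuery type name.
PG_TO_BQ_BEFORE = {
    16: "BOOLEAN",  # Postgres boolean (OID 16); assumed pre-fix label
    20: "INTEGER",  # Postgres bigint
}

# Stage 2: BigQuery type name -> parquet (pyarrow) type, named as strings.
# In this sketch only "BOOL" is recognised as a boolean type.
BQ_TO_PARQUET = {
    "INTEGER": "pa.int64()",
    "BOOL": "pa.bool_()",
    "STRING": "pa.string()",
}


def parquet_type_for(pg_oid, pg_to_bq):
    """Resolve a Postgres OID to a parquet type name; unknowns become string."""
    bq_type = pg_to_bq.get(pg_oid, "STRING")
    return BQ_TO_PARQUET.get(bq_type, "pa.string()")


print(parquet_type_for(16, PG_TO_BQ_BEFORE))  # pa.string()
```

Because the unrecognised label silently falls back to a string column, the boolean values themselves are then handed to a string-typed pyarrow column, which is where the error surfaced.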

This change updates the `data_type` map in the `PostgresToGCSOperator` class so
that the Postgres boolean type converts to the BigQuery `BOOL` data type, which
then maps to the parquet `pa.bool_()` data type when `_convert_parquet_schema`
is called.
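The effect of the fix can be sketched with the same simplified two-stage lookup. `PG_TO_BQ_AFTER`, `BQ_TO_PARQUET` and `parquet_type_for` are hypothetical stand-ins rather than the operator's real attributes, and parquet types are shown as the names of pyarrow factory functions. The only change is that OID 16 now maps to `BOOL`, the name the parquet stage recognises.

```python
# Hypothetical, simplified model of the mapping chain AFTER the fix.
# Stage 1: Postgres type OID -> BigQuery type name.
PG_TO_BQ_AFTER = {
    16: "BOOL",     # Postgres boolean (OID 16) now maps to BigQuery BOOL
    20: "INTEGER",  # Postgres bigint
}

# Stage 2: BigQuery type name -> parquet (pyarrow) type, named as strings.
BQ_TO_PARQUET = {
    "INTEGER": "pa.int64()",
    "BOOL": "pa.bool_()",
    "STRING": "pa.string()",
}


def parquet_type_for(pg_oid, pg_to_bq):
    """Resolve a Postgres OID to a parquet type name; unknowns become string."""
    bq_type = pg_to_bq.get(pg_oid, "STRING")
    return BQ_TO_PARQUET.get(bq_type, "pa.string()")


print(parquet_type_for(16, PG_TO_BQ_AFTER))  # pa.bool_()
```

Fixing the label in the first stage, rather than adding a `BOOLEAN` alias in the second, keeps the parquet conversion shared with other SQL-to-GCS operators unchanged.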

closes: #25474

@dclandau requested a review from turbaszek as a code owner on August 2, 2022 at 14:47
@boring-cyborg (bot) added the labels area:providers and provider:google (Google (including GCP) related issues) on Aug 2, 2022
@potiuk merged commit faf3c4f into apache:main on Aug 2, 2022
@boring-cyborg (bot) commented Aug 2, 2022

Awesome work, congrats on your first merged pull request!



Linked issue: PostgresToGCSOperator parquet format mapping inconsistencies converts boolean data type to string

2 participants