Support converting large dates (i.e. +10999-12-31) from string to Date32 #7074
Merged
alamb merged 21 commits into apache:main on Feb 12, 2025
Conversation
sgrebnov
approved these changes
Feb 4, 2025
alamb
reviewed
Feb 6, 2025
Contributor
alamb
left a comment
Thank you @phillipleblanc -- this looks reasonable to me. I only think the PR needs a few more tests and then we can merge.
I am sure we could make parsing dates like this faster, but we can do that type of optimization as a follow-up.
I am also running the cast benchmarks just to be sure this doesn't accidentally introduce a regression, and I will post the results to this PR.
Contributor
Seems ok to me
Co-authored-by: Andrew Lamb <[email protected]>
…e in decimal conversion (apache#7070) * fix <= check for scale in decimal conversion * Update arrow-cast/src/cast/mod.rs name change Co-authored-by: Arttu <[email protected]> * remove incorrect comment --------- Co-authored-by: Arttu <[email protected]>
* Add another decimal cast edge test case Before 1019f5b this test would fail, as the cast produced 1. 0 is an edge case worth explicitly testing for. * typo/fmt Co-authored-by: Felipe Oliveira Carvalho <[email protected]> --------- Co-authored-by: Felipe Oliveira Carvalho <[email protected]>
…adata (apache#7052) * Support both 0x01 and 0x02 as type for list of booleans * Also support 0 for false inside boolean collections * Use hex notation in tests
…pache#6751) * Fix LocalFileSystem with range request that ends beyond end of file * fix windows * add comment * Seek error * fix seek check * remove windows flag * Get file length from file metadata
…ache#7027) * Introduce UnsafeFlag to manage disabling validation * fix docs
…che#7028) * Rename `ArrayReader` to `RecordBatchDecoder` * Remove alias for `self`
* Minor: Update release schedule * realism
* Refactor some decimal-related code and tests in preparation for adding Decimal32 and Decimal64 support * Fixed symbol * Apply PR feedback * Fixed format problem * Fixed logical merge conflicts * PR feedback
…coder` (apache#7029) * Move `create_primitive_array` into RecordBatchReader * Move `create_list-array` into RecordBatchReader * Move `create_dictionay_array` into RecordBatchReader
* Print Parquet BasicTypeInfo id when present * Improve print_schema documentation * tiny cleanup
…he#7019) * Initial change from Daniel. * Upgrade unit test to be more generic. * Add comments on why we have filter * Cleanup unit tests. * Update object_store/src/local.rs Co-authored-by: Adam Reeve <[email protected]> * Add changes suggested by Adam. * Cleanup match error. * Apply formatting changes suggested by cargo +stable fmt --all. * Apply cosmetic changes suggested by clippy. * Upgrade test_path_with_offset to create temporary directory + files for testing rather than pointing to existing dir. --------- Co-authored-by: Adam Reeve <[email protected]>
…s` (apache#7065) * fix: first none in `ListArray` panics in `cast_with_options` * simplify * fix * Update arrow-cast/src/cast/list.rs Co-authored-by: Jeffrey Vo <[email protected]> --------- Co-authored-by: Jeffrey Vo <[email protected]>
* Add benchmarks for Arrow IPC writer * Add benchmarks for Arrow IPC writer * reuse target buffer * rename, etc * Add compression type * update --------- Co-authored-by: Andy Grove <[email protected]>
…pache#7089) * Minor: Clarify documentaiton on NullBufferBuilder::allocated_size * add note about why allocations are 64 bytes
Contributor
Author
Thanks @alamb for the review. I've pushed up fixes for your comments.
alamb
approved these changes
Feb 10, 2025
Contributor
alamb
left a comment
Looks good to me -- thanks @phillipleblanc and @sgrebnov
assert_eq!(3298139, c.value(0)); // 10999-12-31
assert_eq!(-723122, c.value(1)); // -0010-02-28
assert_eq!(-715817, c.value(2)); // 0010-02-28
assert_eq!(c.value(3), c.value(4)); // Expect 0000-01-01 and -0000-01-01 to be parsed the same
Contributor
Thanks again @phillipleblanc
ryzhyk
pushed a commit
to feldera/feldera
that referenced
this pull request
Feb 25, 2025
Upgrade to the latest delta-rs main branch, which has a workaround for this: apache/arrow-rs#7074 This triggered apache/datafusion#14862 and while we're waiting for the [fix](apache/datafusion#14862) to land and make it into the next datafusion release, I had to add a [patch] section to Cargo.toml to use the fixed-up version of datafusion 45.0. Signed-off-by: Leonid Ryzhyk <[email protected]>
github-merge-queue bot
pushed a commit
to feldera/feldera
that referenced
this pull request
Feb 25, 2025
Upgrade to the latest delta-rs main branch, which has a workaround for this: apache/arrow-rs#7074 This triggered apache/datafusion#14862 and while we're waiting for the [fix](apache/datafusion#14862) to land and make it into the next datafusion release, I had to add a [patch] section to Cargo.toml to use the fixed-up version of datafusion 45.0. Signed-off-by: Leonid Ryzhyk <[email protected]>
ryzhyk
pushed a commit
to feldera/feldera
that referenced
this pull request
Feb 26, 2025
Upgrade to the latest delta-rs main branch, which has a workaround for this: apache/arrow-rs#7074 This triggered apache/datafusion#14862 and while we're waiting for the [fix](apache/datafusion#14862) to land and make it into the next datafusion release, I had to add a [patch] section to Cargo.toml to use the fixed-up version of datafusion 45.0. Signed-off-by: Leonid Ryzhyk <[email protected]>
Which issue does this PR close?
Rationale for this change
Support for casting large dates from string to Date32.
What changes are included in this PR?
Extend the `parse_date` method, which is used in the `impl Parser for Date32Type`, to handle dates which are prefixed with `+` or `-`. If the date is not prefixed with `+` or `-`, the existing logic is used unmodified.
This code isn't as optimized as the code for processing more common date formats, but given that these extended dates are relatively rare in practice, I don't think it matters all that much.
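For readers who want to see the idea in isolation, here is a minimal, std-only sketch of parsing a sign-prefixed date into days since 1970-01-01. It is not the code from this PR: `parse_signed_date`, `days_from_civil`, `is_leap`, and `days_in_month` are hypothetical names, the accepted format (sign, year digits, `-MM-DD`) is an assumption, and the day count uses the widely known civil-date-to-days arithmetic on the proleptic Gregorian calendar with a year 0.

```rust
/// Days from 1970-01-01 to a proleptic Gregorian date.
/// Uses astronomical year numbering, so year 0 exists.
fn days_from_civil(year: i64, month: i64, day: i64) -> i64 {
    // Treat January and February as months 11 and 12 of the previous year.
    let y = if month <= 2 { year - 1 } else { year };
    let era = y.div_euclid(400); // 400-year cycle, floored for negative years
    let yoe = y.rem_euclid(400); // year within the cycle, 0..=399
    let mp = if month > 2 { month - 3 } else { month + 9 }; // March = 0
    let doy = (153 * mp + 2) / 5 + day - 1; // day within the March-based year
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day within the cycle
    era * 146_097 + doe - 719_468
}

fn is_leap(year: i64) -> bool {
    year % 4 == 0 && (year % 100 != 0 || year % 400 == 0)
}

fn days_in_month(year: i64, month: i64) -> i64 {
    match month {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => 31,
        4 | 6 | 9 | 11 => 30,
        2 if is_leap(year) => 29,
        2 => 28,
        _ => 0, // invalid month
    }
}

/// Hypothetical helper: parses `+YYYYY-MM-DD` or `-YYYY-MM-DD` into days
/// since the Unix epoch. Returns `None` for unsigned or malformed input, or
/// when the result does not fit in an `i32`.
fn parse_signed_date(s: &str) -> Option<i32> {
    let (negative, rest) = match s.as_bytes().first()? {
        b'+' => (false, &s[1..]),
        b'-' => (true, &s[1..]),
        _ => return None, // unsigned dates would go through the regular path
    };

    let mut parts = rest.splitn(3, '-');
    let (year_str, month_str, day_str) = (parts.next()?, parts.next()?, parts.next()?);

    let all_digits = |p: &str| !p.is_empty() && p.bytes().all(|b| b.is_ascii_digit());
    if !(all_digits(year_str) && all_digits(month_str) && all_digits(day_str)) {
        return None;
    }
    // Assumption of this sketch: two-digit month and day, at most seven
    // year digits (enough to cover the whole i32 day range).
    if month_str.len() != 2 || day_str.len() != 2 || year_str.len() > 7 {
        return None;
    }

    let magnitude: i64 = year_str.parse().ok()?;
    let year = if negative { -magnitude } else { magnitude };
    let month: i64 = month_str.parse().ok()?;
    let day: i64 = day_str.parse().ok()?;
    if day < 1 || day > days_in_month(year, month) {
        return None;
    }

    i32::try_from(days_from_civil(year, month, day)).ok()
}
```

Under these assumptions, `parse_signed_date("+10999-12-31")` yields `Some(3298139)`, which matches the value asserted in the tests added by this PR, and `-0000-01-01` parses to the same day as `+0000-01-01`.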
Are there any user-facing changes?
Aside from the desired fix, no.