GH-47740: [C++][Parquet] Fix undefined behavior when reading invalid Parquet data#47741
GH-47740: [C++][Parquet] Fix undefined behavior when reading invalid Parquet data#47741pitrou merged 2 commits intoapache:mainfrom
Conversation
|
@github-actions crossbow submit -g cpp |
|
@AntoinePrv Would you like to take a look? |
| // There may be remaining null if they are not greedily filled by either decoder calls | ||
| check_and_handle_fully_null_remaining(); | ||
|
|
||
| ARROW_DCHECK(batch.is_done() || exhausted()); |
There was a problem hiding this comment.
This check could trigger if the RLE-bit-packed data is invalid (for example a run of invalid size). @AntoinePrv
|
Revision: d620685 Submitted crossbow builds: ursacomputing/crossbow @ actions-0059c16459 |
|
Valgrind failure is unrelated and will be fixed by #47743 |
|
After merging your PR, Conbench analyzed the 2 benchmarking runs that have been run so far on merge-commit 33d1f32. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 7 possible false positives for unstable benchmarks that are known to sometimes produce them. |
…Parquet data (#47741) ### Rationale for this change Fix issues found by OSS-Fuzz when invalid Parquet data is fed to the Parquet reader: * https://issues.oss-fuzz.com/issues/447262173 * https://issues.oss-fuzz.com/issues/447480433 * https://issues.oss-fuzz.com/issues/447490896 * https://issues.oss-fuzz.com/issues/447693724 * https://issues.oss-fuzz.com/issues/447693728 * https://issues.oss-fuzz.com/issues/449498800 ### Are these changes tested? Yes, using the updated fuzz regression files from apache/arrow-testing#115 ### Are there any user-facing changes? No. **This PR contains a "Critical Fix".** (If the changes fix either (a) a security vulnerability, (b) a bug that caused incorrect or invalid data to be produced, or (c) a bug that causes a crash (even when the API contract is upheld), please provide explanation. If not, you can remove this.) * GitHub Issue: #47740 Authored-by: Antoine Pitrou <[email protected]> Signed-off-by: Antoine Pitrou <[email protected]>
| typename EncodingTraits<DType>::Accumulator* out, | ||
| int* out_num_values) { | ||
| std::vector<ByteArray> values(num_values); | ||
| std::vector<ByteArray> values(num_values - null_count); |
There was a problem hiding this comment.
Well, this was not a problem in itself, it was just allocating too much memory :)
…valid Parquet data (apache#47741) ### Rationale for this change Fix issues found by OSS-Fuzz when invalid Parquet data is fed to the Parquet reader: * https://issues.oss-fuzz.com/issues/447262173 * https://issues.oss-fuzz.com/issues/447480433 * https://issues.oss-fuzz.com/issues/447490896 * https://issues.oss-fuzz.com/issues/447693724 * https://issues.oss-fuzz.com/issues/447693728 * https://issues.oss-fuzz.com/issues/449498800 ### Are these changes tested? Yes, using the updated fuzz regression files from apache/arrow-testing#115 ### Are there any user-facing changes? No. **This PR contains a "Critical Fix".** (If the changes fix either (a) a security vulnerability, (b) a bug that caused incorrect or invalid data to be produced, or (c) a bug that causes a crash (even when the API contract is upheld), please provide explanation. If not, you can remove this.) * GitHub Issue: apache#47740 Authored-by: Antoine Pitrou <[email protected]> Signed-off-by: Antoine Pitrou <[email protected]>
Rationale for this change
Fix issues found by OSS-Fuzz when invalid Parquet data is fed to the Parquet reader:
Are these changes tested?
Yes, using the updated fuzz regression files from apache/arrow-testing#115
Are there any user-facing changes?
No.
This PR contains a "Critical Fix". (If the changes fix either (a) a security vulnerability, (b) a bug that caused incorrect or invalid data to be produced, or (c) a bug that causes a crash (even when the API contract is upheld), please provide explanation. If not, you can remove this.)