ARROW-5380: [C++] Fix memory alignment UBSan errors.#4757
ARROW-5380: [C++] Fix memory alignment UBSan errors.#4757emkornfield wants to merge 2 commits intoapache:masterfrom
Conversation
Codecov Report
@@ Coverage Diff @@
## master #4757 +/- ##
==========================================
+ Coverage 86.44% 88.99% +2.55%
==========================================
Files 992 717 -275
Lines 138020 99467 -38553
Branches 1418 0 -1418
==========================================
- Hits 119309 88523 -30786
+ Misses 18349 10944 -7405
+ Partials 362 0 -362
Continue to review full report at Codecov.
|
|
@ursabot benchmark |
|
AMD64 Ubuntu 18.04 C++ Benchmark (#28883) builder has been succeeded. Revision: db49fbb ====================================================== ================== ================== ============================
benchmark baseline contender delta
====================================================== ================== ================== ============================
ValidateSmallAscii 12.74 GiB/s 12.61 GiB/s -141.47 MiB/s (-1.08%)
BuildInt32ColumnByChunk/512 11379.83ns 11202.64ns -177.19ns (-1.56%)
BuildInt32ColumnByChunk/5 198.81ns 200.36ns 1.55ns (0.78%)
BufferedOutputStreamSmallWritesToPipe/real_time 860.02 MiB/s 874.24 MiB/s 14.22 MiB/s (1.65%)
FileOutputStreamLargeWritesToPipe/real_time 2.24 GiB/s 2.3 GiB/s 59.07 MiB/s (2.58%)
BufferedOutputStreamLargeWritesToPipe/real_time 2.24 GiB/s 2.27 GiB/s 31.92 MiB/s (1.39%)
TakeString/32768/1/min_time:1.000 1.45 GiB/s 1.47 GiB/s 21.99 MiB/s (1.48%)
TakeInt64/8388608/1/min_time:1.000 343.49 MiB/s 357.31 MiB/s 13.82 MiB/s (4.02%)
TakeInt64/1048576/1/min_time:1.000 453.24 MiB/s 470.46 MiB/s 17.23 MiB/s (3.80%)
TakeInt64/32768/50/min_time:1.000 342.54 MiB/s 335.58 MiB/s -6.96 MiB/s (-2.03%)
TakeString/32768/0/min_time:1.000 1.62 GiB/s 1.65 GiB/s 31.15 MiB/s (1.87%)
TakeString/1048576/1/min_time:1.000 1.27 GiB/s 1.29 GiB/s 22.73 MiB/s (1.74%)
TakeInt64VsFilter/32768/10/min_time:1.000 1.38 GiB/s 1.39 GiB/s 15.37 MiB/s (1.09%)
TakeFixedSizeList1Int64/1048576/1/min_time:1.000 120.13 MiB/s 121.87 MiB/s 1.74 MiB/s (1.45%)
TakeInt64VsFilter/32768/50/min_time:1.000 779.37 MiB/s 787.26 MiB/s 7.9 MiB/s (1.01%)
FilterFixedSizeList1Int64/8388608/1/min_time:1.000 347.87 MiB/s 354.1 MiB/s 6.23 MiB/s (1.79%)
FilterInt64/8388608/1/min_time:1.000 618.86 MiB/s 631.16 MiB/s 12.3 MiB/s (1.99%)
FilterString/32768/0/min_time:1.000 4.03 GiB/s 4.1 GiB/s 78.39 MiB/s (1.90%)
FilterString/32768/1/min_time:1.000 3.92 GiB/s 3.99 GiB/s 70.94 MiB/s (1.77%)
FilterFixedSizeList1Int64/32768/0/min_time:1.000 399.29 MiB/s 403.82 MiB/s 4.54 MiB/s (1.14%)
FilterString/1048576/1/min_time:1.000 3.3 GiB/s 3.23 GiB/s -71.76 MiB/s (-2.12%)
FilterFixedSizeList1Int64/32768/50/min_time:1.000 171.51 MiB/s 172.7 MiB/s 1.19 MiB/s (0.70%)
FilterFixedSizeList1Int64/1048576/1/min_time:1.000 356.37 MiB/s 354.26 MiB/s -2.11 MiB/s (-0.59%)
FilterInt64/32768/1/min_time:1.000 817.28 MiB/s 807.74 MiB/s -9.54 MiB/s (-1.17%)
FilterInt64/32768/10/min_time:1.000 712.6 MiB/s 706.36 MiB/s -6.23 MiB/s (-0.87%)
FilterString/32768/50/min_time:1.000 1.95 GiB/s 1.96 GiB/s 15.05 MiB/s (0.76%)
FilterString/8388608/1/min_time:1.000 3.46 GiB/s 3.43 GiB/s -37.88 MiB/s (-1.07%)
FilterInt64/32768/50/min_time:1.000 365.68 MiB/s 370.37 MiB/s 4.7 MiB/s (1.28%)
FilterFixedSizeList1Int64/32768/10/min_time:1.000 318.63 MiB/s 314.88 MiB/s -3.75 MiB/s (-1.18%)
ReadJSONBlockWithSchemaMultiThread/real_time 222.26 MiB/s 219.02 MiB/s -3.24 MiB/s (-1.46%)
ChunkJSONLineDelimited 104.61ns 100.43ns -4.17ns (-3.99%)
BuildStringDictionary 45.66 MiB/s 47.48 MiB/s 1.82 MiB/s (3.99%)
UniqueInt64WithNulls/4194304/1024 1.9 GiB/s 1.88 GiB/s -16.19 MiB/s (-0.83%)
SumKernel/32768/1 14.49 GiB/s 14.31 GiB/s -174.51 MiB/s (-1.18%)
ThreadPoolSpawn/threads:2/task_cost:100000/real_time 21766.45246150031 22732.475168703626 966.0227072033158 (4.44%)
ThreadedTaskGroup/threads:2/task_cost:10000/real_time 243879.6678804378 242437.13088468352 -1442.5369957542862 (-0.59%)
ThreadPoolSpawn/threads:4/task_cost:100000/real_time 41313.19311685875 40135.79891525845 -1177.3942016002984 (-2.85%)
ThreadedTaskGroup/threads:4/task_cost:100000/real_time 47294.43689321427 47773.651572188304 479.21467897403636 (1.01%)
ThreadPoolSpawn/threads:8/task_cost:1000/real_time 234707.2491696133 236067.04388044862 1359.7947108353255 (0.58%)
ThreadPoolSpawn/threads:2/task_cost:10000/real_time 215989.39769180672 210750.91108234323 -5238.486609463493 (-2.43%)
ThreadedTaskGroup/threads:8/task_cost:1000/real_time 222985.67371614516 233740.65355218883 10754.979836043669 (4.82%)
ThreadedTaskGroup/threads:1/task_cost:1000/real_time 886778.1114667933 878946.971340187 -7831.140126606332 (-0.88%)
ThreadedTaskGroup/threads:4/task_cost:1000/real_time 230454.4592953691 232475.82248897856 2021.3631936094607 (0.88%)
ThreadPoolSpawn/threads:1/task_cost:1000/real_time 982827.9930659528 953462.7785867956 -29365.214479157235 (-2.99%)
CompareArrayScalarKernel/32768/0 12.15 GiB/s 12.05 GiB/s -97.48 MiB/s (-0.78%)
CompareArrayScalarKernel/32768/1 11.93 GiB/s 12.05 GiB/s 125.43 MiB/s (1.03%)
CompareArrayArrayKernel/32768/0 20.59 GiB/s 20.72 GiB/s 133.79 MiB/s (0.63%)
DetectIntWidthNoNulls 18.97 GiB/s 19.22 GiB/s 262.51 MiB/s (1.35%)
+ BuildInt64DictionaryArraySimilar 379.53 MiB/s 412.36 MiB/s 32.84 MiB/s (8.65%)
+ BuildInt64DictionaryArraySequential 646.98 MiB/s 698.71 MiB/s 51.73 MiB/s (8.00%)
BuildChunkedBinaryArray 382.88 MiB/s 378.39 MiB/s -4.49 MiB/s (-1.17%)
ArrayDataConstructDestruct 68708.70ns 69165.68ns 456.98ns (0.67%)
+ BuildBinaryArray 286.93 MiB/s 315.39 MiB/s 28.47 MiB/s (9.92%)
ParallelMemoryCopy/threads:1/real_time 6.76 GiB/s 6.99 GiB/s 236.68 MiB/s (3.42%)
ParallelMemoryCopy/threads:8/real_time 23.21 GiB/s 23.47 GiB/s 261.88 MiB/s (1.10%)
ParallelMemoryCopy/threads:4/real_time 21.53 GiB/s 21.2 GiB/s -332.31 MiB/s (-1.51%)
ParallelMemoryCopy/threads:16/real_time 22.48 GiB/s 22.77 GiB/s 300.61 MiB/s (1.31%)
ParallelMemoryCopy/threads:40/real_time 21.62 GiB/s 21.86 GiB/s 244.3 MiB/s (1.10%)
+ ParallelMemoryCopy/threads:2/real_time 10.32 GiB/s 10.98 GiB/s 679.54 MiB/s (6.43%)
ReadRecordBatch/16/real_time 197.95 GiB/s 196.43 GiB/s -1.52 GiB/s (-0.77%)
WriteRecordBatch/4096/real_time 691.66 MiB/s 700.99 MiB/s 9.33 MiB/s (1.35%)
+ ReadRecordBatch/4096/real_time 489.3 MiB/s 639.14 MiB/s 149.84 MiB/s (30.62%)
WriteRecordBatch/64/real_time 9.87 GiB/s 10.13 GiB/s 263.55 MiB/s (2.61%)
WriteRecordBatch/256/real_time 6.35 GiB/s 6.48 GiB/s 132.12 MiB/s (2.03%)
WriteRecordBatch/1024/real_time 2.58 GiB/s 2.61 GiB/s 32.21 MiB/s (1.22%)
WriteRecordBatch/16/real_time 11.53 GiB/s 11.88 GiB/s 354.96 MiB/s (3.01%)
ReadRecordBatch/256/real_time 10.15 GiB/s 10.28 GiB/s 134.7 MiB/s (1.30%)
ReadRecordBatch/64/real_time 51.14 GiB/s 51.81 GiB/s 681.34 MiB/s (1.30%)
ReadRecordBatch/1024/real_time 2.5 GiB/s 2.47 GiB/s -26.35 MiB/s (-1.03%)
WriteRecordBatch/1/real_time 11.99 GiB/s 12.38 GiB/s 401.45 MiB/s (3.27%)
WriteRecordBatch/4/real_time 12.03 GiB/s 12.39 GiB/s 373.24 MiB/s (3.03%)
WriteRecordBatch/8192/real_time 343.59 MiB/s 347.61 MiB/s 4.01 MiB/s (1.17%)
FirstTimeBitmapWriter/8192 96.73 MiB/s 97.89 MiB/s 1.16 MiB/s (1.20%)
+ GenerateBitsUnrolled/8192 139.15 MiB/s 149.99 MiB/s 10.84 MiB/s (7.79%)
VisitBits/8192 102.32 MiB/s 100.94 MiB/s -1.38 MiB/s (-1.35%)
+ BitmapReader/8192 98.96 MiB/s 109.27 MiB/s 10.31 MiB/s (10.42%)
- BitmapWriter/8192 83.85 MiB/s 77.89 MiB/s -5.96 MiB/s (-7.11%)
- GenerateBits/8192 88.66 MiB/s 80.93 MiB/s -7.73 MiB/s (-8.72%)
====================================================== ================== ================== ============================
124 result(s) not shown due to tiny delta |
|
@ursabot benchmark --help |
|
|
@ursabot benchmark --benchmark-filter=Bit |
|
AMD64 Ubuntu 18.04 C++ Benchmark (#28904) builder has been succeeded. Revision: db49fbb ========================== ============ ============ ====================
benchmark baseline contender delta
========================== ============ ============ ====================
- GenerateBits/8192 88.66 MiB/s 80.77 MiB/s -7.88 MiB/s (-8.89%)
+ BitmapReader/8192 98.91 MiB/s 109.34 MiB/s 10.42 MiB/s (10.54%)
+ GenerateBitsUnrolled/8192 136.98 MiB/s 150.46 MiB/s 13.48 MiB/s (9.84%)
VisitBits/8192 102.24 MiB/s 101.13 MiB/s -1.12 MiB/s (-1.09%)
FirstTimeBitmapWriter/8192 96.33 MiB/s 97.98 MiB/s 1.65 MiB/s (1.71%)
- BitmapWriter/8192 83.81 MiB/s 77.92 MiB/s -5.89 MiB/s (-7.03%)
========================== ============ ============ ====================
4 result(s) not shown due to tiny delta |
pitrou
left a comment
There was a problem hiding this comment.
LGTM. I hope debug mode doesn't become significantly slower because of this.
| return reinterpret_cast<T*>(&internal::non_null_filler); | ||
| } | ||
|
|
||
| template <typename T> |
There was a problem hiding this comment.
I'm no C++ guru, but can't you make a single method for this by templating the input type too, e.g.
template <typename T, typename I = uint8_t>
inline typename std::enable_if<std::is_integral<T>::value, T>::type SafeLoad(
const I* unaligned) {
typename std::remove_const<T>::type ret;
std::memcpy(&ret, unaligned, sizeof(T));
return ret;
}
There was a problem hiding this comment.
This is correct, I think I like how the methods are now though since it is more explicit when casting and a pass-through when not. One method could certainly delegate to the other.
| int64_t* impala_last_day_nanos = reinterpret_cast<int64_t*>(impala_timestamp); | ||
| *impala_last_day_nanos = last_day_units * NanosecondsPerUnit; | ||
| auto last_day_nanos = last_day_units * NanosecondsPerUnit; | ||
| // Strage might be unaligned, so use mempcy instead of reinterpret_cast |
There was a problem hiding this comment.
All even indexed Int96 in a vector will be unaligned (according to int64_t alignment).
There was a problem hiding this comment.
Good point I'll open up a follow-up PR, we must not hav good test data here or UBSan isn't foolproof, or somehow I didn't run this test properly
There was a problem hiding this comment.
Wait were you just commenting on my comment?
There was a problem hiding this comment.
Yes, I was commenting on your comment on the "strange" part :)
There was a problem hiding this comment.
Did you notice the typo? ("Strage")
|
Heh, didn't see Antoine merged, opened link from ML :) |
|
@fsaintjacques thank you for the comments I'll address them in a follow-up PR> |
- Add utility methods for unaligned loads use where errors are discovered. - Upgrade version of flatbuffers to avoid issues with unaligned load in that library - Discover bug in spec that makes zero-copy well defined behavior virtually impossible with flatbuffers (need to discuss on ML). For now I'm not turning on ASAN and will file a follow-up JIRA to track this. Still needed: - [ ] Performance testing - [X] Discuss flatbuffers issues (I sent e-mail to LM) Author: Micah Kornfield <[email protected]> Author: emkornfield <[email protected]> Closes #4757 from emkornfield/ubsan_mem and squashes the following commits: 5528584 <emkornfield> remove TODO db49fbb <Micah Kornfield> Ubsan excluding flatbuffers
- Add utility methods for unaligned loads use where errors are discovered. - Upgrade version of flatbuffers to avoid issues with unaligned load in that library - Discover bug in spec that makes zero-copy well defined behavior virtually impossible with flatbuffers (need to discuss on ML). For now I'm not turning on ASAN and will file a follow-up JIRA to track this. Still needed: - [ ] Performance testing - [X] Discuss flatbuffers issues (I sent e-mail to LM) Author: Micah Kornfield <[email protected]> Author: emkornfield <[email protected]> Closes #4757 from emkornfield/ubsan_mem and squashes the following commits: 5528584 <emkornfield> remove TODO db49fbb <Micah Kornfield> Ubsan excluding flatbuffers
are discovered.
load in that library
virtually impossible with flatbuffers (need to discuss on ML). For now I'm
not turning on ASAN and will file a follow-up JIRA to track this.
Still needed: