Commit 6a11a72
Fix BigQueryIO.Read to work the same in Direct and Dataflow runners
This is a partial revert of commits f5e3b8e and 18c82ad.
When running a batch Dataflow job on Cloud Dataflow service, the data
are produced by running a BigQuery export job and then reading all the
files in parallel. When run in the DirectPipelineRunner, BigQuery's JSON
API is used directly. These data come back in different formats.
To compensate, we use BigQueryTableRowIterator to normalize the behavior in
DirectPipelineRunner to the behavior seen when running on the service.
(We cannot change this decision without a major breaking change.)
This patch fixes some discrepancies in the way that BigQueryTableRowIterator is
implemented. Specifically,
*) In commit 18c82ad (response to issue apache#20) we updated the format of
timestamps to be printed as strings. However, we did not correctly match the
behavior of BigQuery export. Here is a sample set of times from the export job
vs the JSON API.
2016-01-06 06:38:00 UTC 1.45206228E9
2016-01-06 06:38:11 UTC 1.452062291E9
2016-01-06 06:38:11.1 UTC 1.4520622911E9
2016-01-06 06:38:11.12 UTC 1.45206229112E9
2016-01-06 06:38:11.123 UTC 1.452062291123E9 *
2016-01-06 06:38:11.1234 UTC 1.4520622911234E9
2016-01-06 06:38:11.12345 UTC 1.45206229112345E9
2016-01-06 06:38:11.123456 UTC 1.452062291123456E9
Before, only the * test would have passed.
*) In commit f5e3b8e we updated TableRow iterator to preserve the
usual TableRow field `f` corresponding to getF(), which returns a
list of fields in Schema order. This was my mistaken attempt to better support
users who have prior experience with BigQuery's API and expect to use
getF()/getV(). However, there were two issues:
1. this change did not affect the behavior in the DataflowPipelineRunner.
2. this was actually a breaking backwards-incompatible change, because common
downstream DoFns may iterate over the keys of the TableRow, and it added
the field "f".
So we should not propagate the change to DataflowPipelineRunner, but instead we
should revert the change to BigQueryTableRowIterator.
(Note this is also a slightly-backwards-incompatible change, but it's
reverting to old behavior and users are more likely to be depending on
DataflowPipelineRunner rather than DirectPipelineRunner.)
Fix both these issues and add tests.
This is still ugly for now. The long-term fix here is to support a parser that
lets users skip TableRow altogether and goes straight to POJOs of their
choosing (See apache#41). That would also eliminate our performance and typing issues
using TableRow as an inner type in pipelines (See e.g.
http://stackoverflow.com/questions/33622227/dataflow-mixing-integer-long-types).
----Release Notes----
[]
-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=1117462361 parent a918a31 commit 6a11a72
3 files changed
Lines changed: 130 additions & 98 deletions
File tree
- sdk/src
- main/java/com/google/cloud/dataflow/sdk/util
- test/java/com/google/cloud/dataflow/sdk
- runners/worker
- util
Lines changed: 75 additions & 47 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
16 | 16 | | |
17 | 17 | | |
18 | 18 | | |
| 19 | + | |
19 | 20 | | |
20 | 21 | | |
21 | 22 | | |
22 | 23 | | |
23 | 24 | | |
24 | 25 | | |
| 26 | + | |
25 | 27 | | |
26 | 28 | | |
27 | 29 | | |
| |||
49 | 51 | | |
50 | 52 | | |
51 | 53 | | |
| 54 | + | |
52 | 55 | | |
53 | 56 | | |
54 | 57 | | |
| |||
133 | 136 | | |
134 | 137 | | |
135 | 138 | | |
136 | | - | |
137 | | - | |
| 139 | + | |
| 140 | + | |
| 141 | + | |
| 142 | + | |
| 143 | + | |
| 144 | + | |
| 145 | + | |
| 146 | + | |
| 147 | + | |
| 148 | + | |
| 149 | + | |
| 150 | + | |
| 151 | + | |
| 152 | + | |
| 153 | + | |
| 154 | + | |
| 155 | + | |
| 156 | + | |
| 157 | + | |
| 158 | + | |
| 159 | + | |
| 160 | + | |
| 161 | + | |
| 162 | + | |
| 163 | + | |
| 164 | + | |
| 165 | + | |
| 166 | + | |
| 167 | + | |
| 168 | + | |
| 169 | + | |
| 170 | + | |
| 171 | + | |
| 172 | + | |
| 173 | + | |
| 174 | + | |
| 175 | + | |
| 176 | + | |
138 | 177 | | |
139 | 178 | | |
140 | 179 | | |
| |||
143 | 182 | | |
144 | 183 | | |
145 | 184 | | |
146 | | - | |
| 185 | + | |
| 186 | + | |
147 | 187 | | |
148 | 188 | | |
149 | 189 | | |
150 | | - | |
151 | | - | |
| 190 | + | |
| 191 | + | |
| 192 | + | |
| 193 | + | |
152 | 194 | | |
153 | | - | |
154 | | - | |
155 | | - | |
156 | | - | |
157 | | - | |
| 195 | + | |
158 | 196 | | |
159 | 197 | | |
160 | 198 | | |
| |||
185 | 223 | | |
186 | 224 | | |
187 | 225 | | |
188 | | - | |
189 | | - | |
190 | | - | |
191 | | - | |
192 | | - | |
| 226 | + | |
193 | 227 | | |
194 | 228 | | |
195 | 229 | | |
196 | 230 | | |
197 | 231 | | |
| 232 | + | |
| 233 | + | |
| 234 | + | |
| 235 | + | |
| 236 | + | |
| 237 | + | |
| 238 | + | |
| 239 | + | |
| 240 | + | |
| 241 | + | |
198 | 242 | | |
199 | 243 | | |
200 | 244 | | |
| |||
206 | 250 | | |
207 | 251 | | |
208 | 252 | | |
| 253 | + | |
209 | 254 | | |
210 | | - | |
211 | | - | |
| 255 | + | |
| 256 | + | |
212 | 257 | | |
| 258 | + | |
| 259 | + | |
| 260 | + | |
| 261 | + | |
213 | 262 | | |
214 | 263 | | |
215 | 264 | | |
216 | 265 | | |
217 | 266 | | |
218 | 267 | | |
219 | 268 | | |
220 | | - | |
221 | | - | |
222 | | - | |
223 | | - | |
224 | | - | |
225 | | - | |
226 | | - | |
227 | | - | |
228 | | - | |
229 | | - | |
230 | | - | |
231 | | - | |
| 269 | + | |
| 270 | + | |
| 271 | + | |
232 | 272 | | |
233 | 273 | | |
234 | | - | |
235 | | - | |
236 | 274 | | |
237 | 275 | | |
238 | 276 | | |
239 | 277 | | |
240 | 278 | | |
241 | | - | |
| 279 | + | |
242 | 280 | | |
243 | | - | |
| 281 | + | |
244 | 282 | | |
245 | 283 | | |
246 | | - | |
| 284 | + | |
247 | 285 | | |
248 | 286 | | |
249 | 287 | | |
250 | | - | |
251 | | - | |
| 288 | + | |
252 | 289 | | |
253 | 290 | | |
254 | | - | |
255 | | - | |
256 | | - | |
257 | | - | |
258 | | - | |
259 | | - | |
260 | | - | |
261 | | - | |
262 | | - | |
263 | | - | |
264 | | - | |
| 291 | + | |
| 292 | + | |
265 | 293 | | |
266 | 294 | | |
267 | 295 | | |
| |||
Lines changed: 14 additions & 23 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
18 | 18 | | |
19 | 19 | | |
20 | 20 | | |
| 21 | + | |
21 | 22 | | |
22 | | - | |
23 | 23 | | |
24 | 24 | | |
25 | 25 | | |
| |||
35 | 35 | | |
36 | 36 | | |
37 | 37 | | |
38 | | - | |
39 | 38 | | |
40 | 39 | | |
41 | 40 | | |
42 | 41 | | |
43 | 42 | | |
44 | | - | |
45 | 43 | | |
46 | 44 | | |
| 45 | + | |
47 | 46 | | |
| 47 | + | |
48 | 48 | | |
49 | 49 | | |
50 | 50 | | |
| |||
63 | 63 | | |
64 | 64 | | |
65 | 65 | | |
| 66 | + | |
| 67 | + | |
66 | 68 | | |
67 | 69 | | |
68 | 70 | | |
| |||
73 | 75 | | |
74 | 76 | | |
75 | 77 | | |
76 | | - | |
77 | | - | |
78 | | - | |
79 | | - | |
80 | | - | |
81 | | - | |
82 | | - | |
83 | | - | |
84 | 78 | | |
85 | 79 | | |
86 | 80 | | |
| |||
578 | 572 | | |
579 | 573 | | |
580 | 574 | | |
581 | | - | |
| 575 | + | |
582 | 576 | | |
583 | 577 | | |
584 | 578 | | |
585 | 579 | | |
586 | | - | |
| 580 | + | |
587 | 581 | | |
588 | 582 | | |
589 | 583 | | |
| |||
784 | 778 | | |
785 | 779 | | |
786 | 780 | | |
787 | | - | |
| 781 | + | |
788 | 782 | | |
789 | 783 | | |
790 | 784 | | |
| |||
800 | 794 | | |
801 | 795 | | |
802 | 796 | | |
803 | | - | |
| 797 | + | |
804 | 798 | | |
805 | 799 | | |
806 | 800 | | |
| |||
810 | 804 | | |
811 | 805 | | |
812 | 806 | | |
813 | | - | |
| 807 | + | |
814 | 808 | | |
815 | 809 | | |
816 | | - | |
| 810 | + | |
817 | 811 | | |
818 | 812 | | |
819 | 813 | | |
| |||
909 | 903 | | |
910 | 904 | | |
911 | 905 | | |
912 | | - | |
913 | | - | |
914 | | - | |
| 906 | + | |
| 907 | + | |
915 | 908 | | |
916 | | - | |
917 | | - | |
918 | | - | |
| 909 | + | |
919 | 910 | | |
920 | 911 | | |
0 commit comments