@sollhui sollhui commented Aug 21, 2025

What problem does this PR solve?

Fix a segment number mismatch caused by erroneously skipped segments during concurrent incremental open on an auto-partitioned table.

Problem

During concurrent incremental open on an auto-partitioned table, one sink may incorrectly assume that streams being opened by another sink are already open, and begin writing data while those streams are still being opened. As a result, some segments are silently skipped, which produces a segment number mismatch. For example (two instances, four BEs: a, b, c, d):

| Time | Event |
| ---- | ----- |
| t0 | sink1 and sink2 start incremental open for BEs a, b, c, d. |
| t1 | sink1 adds a, b, c to _load_stream_map and initiates open. |
| t2 | sink2 adds d to _load_stream_map and initiates open. |
| t3 | sink1 completes open for a and b; c is still in progress. |
| t4 | sink2 successfully opens d, assumes a, b, c are all ready, and starts writing. Because c is not yet fully open, its segments are skipped, causing the mismatch. |
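
The timeline can be replayed deterministically in a few lines. This is only a simplified model, not the Doris implementation: the map name mirrors `_load_stream_map` from the description, while `replay_timeline`, the state constants, and the rule that a stream which is not open drops the segment are assumptions made for illustration.

```python
# Hypothetical, single-threaded replay of the t0..t4 timeline.
# Everything here is illustrative; it is not code from the PR.
OPENING, OPEN = "opening", "open"
ALL_BES = ("a", "b", "c", "d")


def replay_timeline():
    load_stream_map = {}  # BE id -> stream state, shared by both sinks

    # t1: sink1 registers a, b, c and starts opening them.
    for be in ("a", "b", "c"):
        load_stream_map[be] = OPENING

    # t2: sink2 sees a, b, c already registered, so it only registers d.
    sink2_own = [be for be in ALL_BES if be not in load_stream_map]
    for be in sink2_own:
        load_stream_map[be] = OPENING

    # t3: sink1 finishes a and b; c is still in progress.
    load_stream_map["a"] = OPEN
    load_stream_map["b"] = OPEN

    # t4: sink2 finishes the streams it registered itself and waits for
    # nothing else, wrongly assuming a, b, c are all ready.
    for be in sink2_own:
        load_stream_map[be] = OPEN

    # sink2 writes one segment per BE; a stream that is not open skips it.
    written = [be for be in ALL_BES if load_stream_map[be] == OPEN]
    skipped = [be for be in ALL_BES if load_stream_map[be] != OPEN]
    return sink2_own, written, skipped
```

In this model sink2 only ever waits on `d`, so the segment destined for `c` is the one that goes missing.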

Expected behavior

A sink must wait until all streams it depends on are fully opened before starting any write.

Proposed fix

All sinks open the full set of streams (a, b, c, d) instead of a partial subset. A lock on each stream guarantees that:

  • Duplicate open attempts are prevented: only the first sink performs the actual open; subsequent sinks wait until the open is complete.
  • Expected behavior is preserved: every sink waits until all streams are fully opened before starting any write, eliminating skipped segments and the resulting segment number mismatch.
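
A minimal sketch of that idea, assuming a per-stream mutex and an "opened" flag. The `Stream` class, `run_sink`, and `demo` are hypothetical names chosen for this example and do not come from the Doris code base; the sleep merely stands in for a slow open.

```python
import threading
import time


class Stream:
    """Hypothetical stand-in for a load stream to one BE."""

    def __init__(self, be_id):
        self.be_id = be_id
        self._lock = threading.Lock()
        self._opened = False
        self.open_calls = 0  # how many times the real open actually ran

    def open(self):
        # The per-stream lock serializes callers: the first one performs
        # the open, later ones block here and then find it already done.
        with self._lock:
            if self._opened:
                return
            time.sleep(0.01)  # simulate a slow open
            self.open_calls += 1
            self._opened = True

    def is_open(self):
        with self._lock:
            return self._opened


def run_sink(name, streams, results):
    # Every sink opens the full set of streams, not only its own subset.
    for stream in streams:
        stream.open()
    # Writing would start here; by now every stream must be open.
    results[name] = all(stream.is_open() for stream in streams)


def demo():
    streams = [Stream(be) for be in ("a", "b", "c", "d")]
    results = {}
    sinks = [
        threading.Thread(target=run_sink, args=(name, streams, results))
        for name in ("sink1", "sink2")
    ]
    for sink in sinks:
        sink.start()
    for sink in sinks:
        sink.join()
    return results, [stream.open_calls for stream in streams]
```

Calling `demo()` should report that both sinks saw every stream open before writing and that each stream was opened exactly once, whichever sink reached it first.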

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@sollhui sollhui force-pushed the fix_segment_num_mismatch branch 5 times, most recently from b71c278 to 1ce3d0e August 26, 2025 15:43
@sollhui sollhui changed the title from "[test]" to "[fix](move-memtable) fix move memtable lose send segment" Aug 26, 2025

sollhui commented Aug 26, 2025

run buildall

@sollhui sollhui changed the title from "[fix](move-memtable) fix move memtable lose send segment" to "[fix](move-memtable) fix segment number mismatch for erroneously skipped segments" Aug 26, 2025
@sollhui sollhui force-pushed the fix_segment_num_mismatch branch from 1ce3d0e to 20257a4 August 26, 2025 15:59
@doris-robot

TPC-H: Total hot run time: 33603 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 1ce3d0e36853c7527f70ea40f2cdc66cc6d33a09, data reload: false

------ Round 1 ----------------------------------
q1	17634	5202	5035	5035
q2	1940	282	185	185
q3	10285	1296	739	739
q4	10217	987	537	537
q5	7501	2294	2384	2294
q6	172	157	129	129
q7	885	755	604	604
q8	9312	1291	1091	1091
q9	6932	5118	5159	5118
q10	6923	2410	1950	1950
q11	491	285	275	275
q12	337	346	211	211
q13	17758	3628	2993	2993
q14	245	231	224	224
q15	568	487	489	487
q16	410	414	374	374
q17	609	865	362	362
q18	7466	7187	6894	6894
q19	1426	945	575	575
q20	333	335	221	221
q21	4045	3180	2338	2338
q22	1075	1050	967	967
Total cold run time: 106564 ms
Total hot run time: 33603 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5174	5130	5108	5108
q2	247	316	225	225
q3	2182	2702	2311	2311
q4	1377	1753	1357	1357
q5	4201	4441	4545	4441
q6	224	185	147	147
q7	2089	1930	1807	1807
q8	2640	2763	2589	2589
q9	7432	7303	7326	7303
q10	3102	3264	2862	2862
q11	564	512	486	486
q12	681	776	649	649
q13	3690	3870	3302	3302
q14	276	325	396	325
q15	519	476	478	476
q16	453	491	471	471
q17	1174	1584	1391	1391
q18	7778	7899	7491	7491
q19	782	782	822	782
q20	1875	1958	1805	1805
q21	4665	4342	4330	4330
q22	1103	1047	1022	1022
Total cold run time: 52228 ms
Total hot run time: 50680 ms

@doris-robot

TPC-DS: Total hot run time: 185614 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 1ce3d0e36853c7527f70ea40f2cdc66cc6d33a09, data reload: false

query1	1007	370	401	370
query2	6517	1745	1768	1745
query3	6743	221	221	221
query4	26094	23718	23333	23333
query5	4308	617	482	482
query6	316	227	197	197
query7	4640	522	293	293
query8	276	233	220	220
query9	8614	2865	2881	2865
query10	446	335	290	290
query11	15807	15131	14850	14850
query12	164	119	114	114
query13	1649	535	432	432
query14	8591	5787	5746	5746
query15	206	191	164	164
query16	7293	660	483	483
query17	1219	735	595	595
query18	2016	425	330	330
query19	204	187	181	181
query20	125	122	118	118
query21	215	123	108	108
query22	4340	4083	4099	4083
query23	33766	32949	33092	32949
query24	8103	2358	2356	2356
query25	542	482	406	406
query26	1275	271	155	155
query27	2766	500	345	345
query28	4409	2237	2223	2223
query29	782	557	439	439
query30	287	239	187	187
query31	900	784	735	735
query32	91	73	71	71
query33	552	375	347	347
query34	781	841	500	500
query35	811	812	753	753
query36	990	1001	921	921
query37	125	104	90	90
query38	4063	3970	3997	3970
query39	1498	1434	1400	1400
query40	225	127	112	112
query41	61	58	54	54
query42	130	111	109	109
query43	489	499	480	480
query44	1331	856	853	853
query45	172	170	166	166
query46	852	1011	641	641
query47	1781	1828	1772	1772
query48	376	409	334	334
query49	708	493	381	381
query50	633	684	402	402
query51	4167	4110	4099	4099
query52	109	116	107	107
query53	229	255	193	193
query54	588	590	521	521
query55	91	89	92	89
query56	312	324	293	293
query57	1206	1200	1103	1103
query58	279	263	291	263
query59	2751	2768	2606	2606
query60	346	342	323	323
query61	127	121	126	121
query62	788	715	652	652
query63	220	186	187	186
query64	4353	1016	728	728
query65	4236	4232	4206	4206
query66	1168	408	330	330
query67	15599	15438	15308	15308
query68	8299	916	576	576
query69	501	329	280	280
query70	1234	1167	1130	1130
query71	465	336	316	316
query72	5810	4853	4986	4853
query73	772	714	353	353
query74	8889	9088	8840	8840
query75	3826	3056	2636	2636
query76	3700	1153	728	728
query77	792	443	313	313
query78	9683	9755	8888	8888
query79	2457	797	595	595
query80	606	531	471	471
query81	467	255	230	230
query82	431	138	108	108
query83	289	246	238	238
query84	295	108	87	87
query85	775	369	336	336
query86	343	304	291	291
query87	4303	4234	4254	4234
query88	3143	2242	2228	2228
query89	392	306	291	291
query90	1957	220	216	216
query91	142	150	155	150
query92	84	73	67	67
query93	1727	988	645	645
query94	689	402	314	314
query95	401	320	315	315
query96	492	576	272	272
query97	2688	2721	2609	2609
query98	252	230	212	212
query99	1406	1396	1293	1293
Total cold run time: 273168 ms
Total hot run time: 185614 ms

@doris-robot

ClickBench: Total hot run time: 32.14 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 1ce3d0e36853c7527f70ea40f2cdc66cc6d33a09, data reload: false

query1	0.05	0.04	0.04
query2	0.09	0.04	0.05
query3	0.24	0.08	0.07
query4	1.62	0.10	0.11
query5	0.42	0.44	0.41
query6	1.17	0.63	0.66
query7	0.02	0.02	0.02
query8	0.04	0.03	0.03
query9	0.61	0.54	0.52
query10	0.60	0.58	0.58
query11	0.16	0.12	0.11
query12	0.15	0.12	0.12
query13	0.62	0.60	0.60
query14	0.80	0.84	0.85
query15	0.88	0.85	0.84
query16	0.40	0.39	0.39
query17	1.03	1.04	1.03
query18	0.21	0.20	0.20
query19	1.89	1.91	1.78
query20	0.02	0.01	0.01
query21	15.38	0.94	0.59
query22	0.80	1.18	0.84
query23	14.78	1.35	0.64
query24	7.22	1.48	0.37
query25	0.52	0.22	0.11
query26	0.55	0.15	0.14
query27	0.06	0.05	0.04
query28	10.25	0.92	0.42
query29	12.53	3.84	3.19
query30	3.04	3.03	2.93
query31	2.82	0.57	0.38
query32	3.26	0.54	0.47
query33	3.07	3.14	3.08
query34	16.06	5.48	4.83
query35	4.92	5.00	4.90
query36	0.70	0.50	0.50
query37	0.10	0.07	0.07
query38	0.05	0.05	0.04
query39	0.03	0.03	0.02
query40	0.18	0.14	0.14
query41	0.08	0.03	0.03
query42	0.03	0.03	0.02
query43	0.04	0.03	0.03
Total cold run time: 107.49 s
Total hot run time: 32.14 s

@hello-stephen

BE UT Coverage Report

Increment line coverage 0.00% (0/1) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 51.78% (17148/33115)
Line Coverage 37.26% (156234/419351)
Region Coverage 31.93% (119036/372785)
Branch Coverage 33.24% (52339/157460)

@github-actions

PR approved by anyone and no changes requested.

@liaoxin01 liaoxin01 left a comment

LGTM

@github-actions

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Aug 27, 2025
@liaoxin01

run buildall

@doris-robot

TPC-H: Total hot run time: 35287 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 20257a449e75af48ef28d5a6653a00bedb7faeea, data reload: false

------ Round 1 ----------------------------------
q1	17653	5562	5345	5345
q2	2069	337	214	214
q3	10177	1420	740	740
q4	10245	1088	549	549
q5	7592	2706	2711	2706
q6	210	198	142	142
q7	1054	825	639	639
q8	9382	1606	1245	1245
q9	7013	5484	5375	5375
q10	7059	2449	1964	1964
q11	536	330	277	277
q12	371	400	247	247
q13	17778	3779	2985	2985
q14	257	244	231	231
q15	590	510	498	498
q16	442	443	378	378
q17	615	918	371	371
q18	7497	7059	7198	7059
q19	1293	1105	606	606
q20	348	369	234	234
q21	4093	3363	2474	2474
q22	1098	1052	1008	1008
Total cold run time: 107372 ms
Total hot run time: 35287 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5540	5491	5453	5453
q2	265	360	230	230
q3	2253	2785	2337	2337
q4	1388	1877	1374	1374
q5	4719	4559	4619	4559
q6	272	184	136	136
q7	2113	2057	1828	1828
q8	3125	2818	2885	2818
q9	7423	7294	7413	7294
q10	3280	3343	2918	2918
q11	671	518	510	510
q12	712	807	620	620
q13	3832	3912	3282	3282
q14	334	316	286	286
q15	561	490	487	487
q16	469	515	467	467
q17	1260	1825	1490	1490
q18	7941	7652	7798	7652
q19	867	868	1177	868
q20	2041	1991	1821	1821
q21	4987	4416	4347	4347
q22	1113	1052	995	995
Total cold run time: 55166 ms
Total hot run time: 51772 ms

@doris-robot

TPC-DS: Total hot run time: 187563 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 20257a449e75af48ef28d5a6653a00bedb7faeea, data reload: false

query1	1067	457	427	427
query2	6586	1771	1771	1771
query3	6756	235	228	228
query4	26949	23391	23510	23391
query5	4389	677	532	532
query6	366	246	240	240
query7	4665	509	311	311
query8	325	287	254	254
query9	8662	2864	2900	2864
query10	503	348	321	321
query11	16118	15277	14864	14864
query12	180	128	123	123
query13	1701	593	450	450
query14	8707	5899	5945	5899
query15	224	205	176	176
query16	7751	679	502	502
query17	1271	777	662	662
query18	2063	440	409	409
query19	207	197	212	197
query20	134	124	123	123
query21	222	132	112	112
query22	4265	4213	4017	4017
query23	33697	32894	32980	32894
query24	8178	2427	2410	2410
query25	574	516	443	443
query26	1233	285	172	172
query27	2705	515	366	366
query28	4341	2241	2216	2216
query29	796	596	524	524
query30	299	221	199	199
query31	896	803	740	740
query32	87	82	84	82
query33	562	416	364	364
query34	807	845	534	534
query35	830	846	754	754
query36	981	1013	885	885
query37	124	116	94	94
query38	4077	4057	3934	3934
query39	1496	1422	1441	1422
query40	226	135	128	128
query41	72	70	64	64
query42	132	117	119	117
query43	522	496	485	485
query44	1335	859	873	859
query45	185	185	176	176
query46	869	1014	651	651
query47	1797	1811	1749	1749
query48	395	422	324	324
query49	752	505	417	417
query50	643	681	410	410
query51	4115	4172	4080	4080
query52	118	112	105	105
query53	255	263	205	205
query54	622	599	573	573
query55	91	92	92	92
query56	345	337	334	334
query57	1195	1180	1130	1130
query58	296	287	289	287
query59	2635	2717	2677	2677
query60	367	363	361	361
query61	170	169	169	169
query62	797	746	661	661
query63	233	196	199	196
query64	4457	1146	865	865
query65	4297	4237	4238	4237
query66	1131	446	421	421
query67	15393	15304	15028	15028
query68	8563	923	585	585
query69	490	336	293	293
query70	1267	1118	1138	1118
query71	473	353	385	353
query72	5959	5036	5047	5036
query73	751	628	364	364
query74	8880	8986	9126	8986
query75	3954	3140	2628	2628
query76	3737	1197	776	776
query77	794	409	357	357
query78	9651	9562	8873	8873
query79	2457	845	613	613
query80	690	611	531	531
query81	481	269	223	223
query82	444	150	117	117
query83	295	276	251	251
query84	310	125	100	100
query85	899	537	451	451
query86	351	332	292	292
query87	4287	4340	4208	4208
query88	3140	2214	2246	2214
query89	403	335	307	307
query90	1945	239	236	236
query91	167	166	214	166
query92	95	79	75	75
query93	1511	1011	645	645
query94	698	424	335	335
query95	413	341	332	332
query96	514	581	284	284
query97	2642	2646	2576	2576
query98	253	222	219	219
query99	1470	1421	1312	1312
Total cold run time: 276223 ms
Total hot run time: 187563 ms

@doris-robot

ClickBench: Total hot run time: 33.24 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 20257a449e75af48ef28d5a6653a00bedb7faeea, data reload: false

query1	0.06	0.05	0.05
query2	0.09	0.06	0.06
query3	0.25	0.08	0.09
query4	1.60	0.12	0.12
query5	0.45	0.42	0.42
query6	1.23	0.65	0.67
query7	0.03	0.03	0.02
query8	0.06	0.04	0.04
query9	0.62	0.52	0.52
query10	0.57	0.58	0.57
query11	0.17	0.11	0.12
query12	0.16	0.13	0.12
query13	0.65	0.62	0.62
query14	0.80	0.85	0.85
query15	0.91	0.85	0.88
query16	0.40	0.41	0.39
query17	1.08	1.05	1.10
query18	0.22	0.20	0.20
query19	1.94	1.82	1.84
query20	0.02	0.02	0.02
query21	15.41	0.98	0.59
query22	0.78	1.18	0.66
query23	14.91	1.42	0.63
query24	6.62	1.38	1.22
query25	0.44	0.17	0.13
query26	0.57	0.16	0.14
query27	0.06	0.06	0.05
query28	9.98	0.93	0.45
query29	12.63	3.91	3.23
query30	3.09	3.08	2.99
query31	2.83	0.58	0.39
query32	3.24	0.56	0.48
query33	3.14	3.10	3.24
query34	15.95	5.54	4.82
query35	4.90	5.02	4.94
query36	0.73	0.52	0.49
query37	0.11	0.08	0.07
query38	0.06	0.04	0.04
query39	0.03	0.03	0.03
query40	0.18	0.14	0.15
query41	0.09	0.04	0.03
query42	0.04	0.03	0.03
query43	0.04	0.03	0.04
Total cold run time: 107.14 s
Total hot run time: 33.24 s

@hello-stephen

BE UT Coverage Report

Increment line coverage 0.00% (0/1) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 51.78% (17148/33115)
Line Coverage 37.26% (156238/419374)
Region Coverage 31.93% (119021/372790)
Branch Coverage 33.24% (52345/157466)

@hello-stephen

BE Regression && UT Coverage Report

Increment line coverage 0.00% (0/1) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 70.75% (23017/32533)
Line Coverage 57.02% (239047/419264)
Region Coverage 52.52% (198720/378336)
Branch Coverage 54.18% (85862/158464)

@hello-stephen

BE Regression && UT Coverage Report

Increment line coverage 0.00% (0/1) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 70.82% (23041/32533)
Line Coverage 57.11% (239462/419264)
Region Coverage 52.70% (199388/378336)
Branch Coverage 54.31% (86056/158464)

@hello-stephen

BE Regression && UT Coverage Report

Increment line coverage 0.00% (0/1) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 70.75% (23017/32533)
Line Coverage 57.02% (239071/419264)
Region Coverage 52.56% (198855/378336)
Branch Coverage 54.19% (85870/158464)


sollhui commented Aug 28, 2025

run buildall

@doris-robot

TPC-H: Total hot run time: 33990 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 20257a449e75af48ef28d5a6653a00bedb7faeea, data reload: false

------ Round 1 ----------------------------------
q1	17674	5177	5063	5063
q2	2020	315	211	211
q3	10242	1248	723	723
q4	10221	1022	515	515
q5	7550	2375	2327	2327
q6	193	174	140	140
q7	939	815	645	645
q8	9356	1494	1125	1125
q9	6871	5154	5066	5066
q10	6923	2408	2009	2009
q11	501	306	288	288
q12	355	358	238	238
q13	17778	3671	3047	3047
q14	245	241	225	225
q15	566	497	490	490
q16	443	436	391	391
q17	611	859	361	361
q18	7345	7175	6939	6939
q19	1577	955	556	556
q20	338	335	226	226
q21	3930	3179	2426	2426
q22	1076	1034	979	979
Total cold run time: 106754 ms
Total hot run time: 33990 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5258	5118	5094	5094
q2	262	330	230	230
q3	2209	2634	2331	2331
q4	1410	1750	1336	1336
q5	4187	4411	4616	4411
q6	213	168	131	131
q7	2076	2076	1835	1835
q8	2746	2737	2555	2555
q9	7264	7301	7329	7301
q10	3175	3266	2878	2878
q11	598	525	504	504
q12	695	760	635	635
q13	3542	4040	3168	3168
q14	294	296	273	273
q15	534	486	512	486
q16	463	506	494	494
q17	1190	1535	1369	1369
q18	7792	7829	7474	7474
q19	846	874	995	874
q20	1950	1947	1808	1808
q21	4825	4349	4276	4276
q22	1084	1037	977	977
Total cold run time: 52613 ms
Total hot run time: 50440 ms

@doris-robot

TPC-DS: Total hot run time: 186694 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 20257a449e75af48ef28d5a6653a00bedb7faeea, data reload: false

query1	1057	448	442	442
query2	6569	1817	1783	1783
query3	6750	234	223	223
query4	26535	23478	22979	22979
query5	4412	638	509	509
query6	350	260	223	223
query7	4656	515	305	305
query8	302	267	249	249
query9	8630	2903	2898	2898
query10	533	335	301	301
query11	16058	15263	15028	15028
query12	181	124	118	118
query13	1684	602	468	468
query14	9879	6010	6003	6003
query15	228	204	173	173
query16	7226	681	492	492
query17	1655	762	663	663
query18	2013	430	332	332
query19	208	197	173	173
query20	132	123	123	123
query21	218	137	114	114
query22	4208	4245	4107	4107
query23	33960	33055	32671	32671
query24	8135	2362	2396	2362
query25	608	523	474	474
query26	1253	278	170	170
query27	2732	523	371	371
query28	4401	2271	2234	2234
query29	829	633	483	483
query30	291	227	195	195
query31	889	813	730	730
query32	90	81	89	81
query33	612	391	341	341
query34	811	836	510	510
query35	844	835	782	782
query36	997	1009	914	914
query37	134	116	89	89
query38	4142	4060	3947	3947
query39	1481	1477	1412	1412
query40	235	134	127	127
query41	70	67	62	62
query42	128	117	118	117
query43	534	520	493	493
query44	1379	870	862	862
query45	181	182	184	182
query46	875	1017	654	654
query47	1782	1848	1720	1720
query48	390	415	326	326
query49	742	515	417	417
query50	643	684	398	398
query51	4082	4292	4077	4077
query52	120	118	108	108
query53	250	263	200	200
query54	627	614	551	551
query55	105	91	93	91
query56	347	333	339	333
query57	1207	1228	1118	1118
query58	305	315	280	280
query59	2621	2710	2547	2547
query60	358	354	343	343
query61	167	156	160	156
query62	803	741	657	657
query63	237	200	197	197
query64	4501	1116	818	818
query65	4296	4228	4230	4228
query66	1189	431	362	362
query67	15449	15189	15207	15189
query68	7887	922	583	583
query69	494	332	299	299
query70	1253	1141	1121	1121
query71	454	352	330	330
query72	5949	5006	4964	4964
query73	682	609	360	360
query74	8949	9171	8904	8904
query75	3575	3095	2617	2617
query76	3364	1244	786	786
query77	811	402	350	350
query78	9550	9513	8863	8863
query79	2606	838	591	591
query80	622	568	504	504
query81	510	271	247	247
query82	488	144	116	116
query83	265	267	245	245
query84	260	106	96	96
query85	994	471	434	434
query86	393	332	312	312
query87	4387	4249	4222	4222
query88	3699	2257	2237	2237
query89	398	325	296	296
query90	1877	226	231	226
query91	165	163	130	130
query92	91	78	74	74
query93	2012	982	655	655
query94	694	418	330	330
query95	427	325	326	325
query96	488	580	279	279
query97	2689	2713	2570	2570
query98	242	227	212	212
query99	1356	1423	1331	1331
Total cold run time: 276947 ms
Total hot run time: 186694 ms

@doris-robot

ClickBench: Total hot run time: 33.04 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 20257a449e75af48ef28d5a6653a00bedb7faeea, data reload: false

query1	0.06	0.06	0.05
query2	0.09	0.05	0.05
query3	0.25	0.08	0.08
query4	1.64	0.11	0.11
query5	0.46	0.44	0.43
query6	1.16	0.64	0.65
query7	0.03	0.03	0.03
query8	0.06	0.05	0.05
query9	0.61	0.53	0.54
query10	0.58	0.59	0.58
query11	0.16	0.12	0.11
query12	0.15	0.12	0.13
query13	0.62	0.63	0.62
query14	0.80	0.83	0.85
query15	0.87	0.86	0.88
query16	0.39	0.42	0.39
query17	1.05	1.09	1.07
query18	0.21	0.21	0.20
query19	1.96	1.83	1.85
query20	0.02	0.02	0.02
query21	15.39	0.98	0.58
query22	0.81	1.22	0.74
query23	14.78	1.37	0.64
query24	6.41	1.05	0.88
query25	0.53	0.17	0.14
query26	0.53	0.16	0.13
query27	0.06	0.06	0.06
query28	10.38	0.95	0.45
query29	12.57	3.87	3.24
query30	3.06	3.03	3.00
query31	2.84	0.59	0.39
query32	3.25	0.57	0.47
query33	3.10	3.03	3.07
query34	16.14	5.50	4.91
query35	4.94	4.93	5.01
query36	0.71	0.51	0.50
query37	0.11	0.07	0.07
query38	0.06	0.05	0.04
query39	0.04	0.02	0.02
query40	0.19	0.16	0.14
query41	0.08	0.02	0.03
query42	0.04	0.03	0.03
query43	0.04	0.04	0.03
Total cold run time: 107.23 s
Total hot run time: 33.04 s

@hello-stephen

BE Regression && UT Coverage Report

Increment line coverage 0.00% (0/1) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 70.66% (22987/32533)
Line Coverage 56.91% (238603/419265)
Region Coverage 52.38% (198207/378384)
Branch Coverage 54.05% (85663/158484)

@hello-stephen

BE UT Coverage Report

Increment line coverage 0.00% (0/1) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 51.79% (17150/33115)
Line Coverage 37.27% (156289/419375)
Region Coverage 31.95% (119122/372838)
Branch Coverage 33.25% (52368/157486)

@hello-stephen

BE Regression && UT Coverage Report

Increment line coverage 0.00% (0/1) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 70.77% (23022/32533)
Line Coverage 57.02% (239081/419265)
Region Coverage 52.50% (198659/378384)
Branch Coverage 54.16% (85842/158484)

@dataroaring dataroaring merged commit 13f4367 into apache:master Aug 29, 2025
30 of 32 checks passed
github-actions bot pushed a commit that referenced this pull request Aug 29, 2025
…ped segments (#55092)

github-actions bot pushed a commit that referenced this pull request Aug 29, 2025
…ped segments (#55092)

morrySnow pushed a commit that referenced this pull request Sep 4, 2025
…neously skipped segments #55092 (#55471)

Cherry-picked from #55092

Co-authored-by: hui lai <[email protected]>
dataroaring pushed a commit that referenced this pull request Sep 5, 2025
…neously skipped segments #55092 (#55470)

Cherry-picked from #55092

Co-authored-by: hui lai <[email protected]>
@morrySnow morrySnow mentioned this pull request Sep 22, 2025

Labels

approved Indicates a PR has been approved by one committer. dev/3.0.9-merged dev/3.1.1-merged reviewed

7 participants