@deardeng
Contributor

cherry pick from #54614

apache#54614)

…and replay failure

Fix
```
2025-08-04 01:00:20,626 ERROR (replayer|119) [EditLog.loadJournal():1439] replay Operation Type 10, log id: 62731
java.lang.NullPointerException: Cannot invoke "org.apache.doris.catalog.Database.createTableWithLock(org.apache.doris.catalog.Table, boolean, boolean)" because "db" is null
        at org.apache.doris.datasource.InternalCatalog.replayCreateTable(InternalCatalog.java:1359) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.catalog.Env.replayCreateTable(Env.java:4767) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.persist.EditLog.loadJournal(EditLog.java:351) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.catalog.Env.replayJournal(Env.java:3103) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.catalog.Env$4.runOneCycle(Env.java:2865) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.common.util.Daemon.run(Daemon.java:119) ~[doris-fe.jar:1.2-SNAPSHOT]
```

The cause, as seen in the observer's bdbje log, is the following sequence
of edit log entries:
1. Key 10: rename db from dbA to dbB.
2. Key 11: rename db from dbB back to dbA.
3. Key 12: the edit log for create view (table) records the db name dbB
from step 1.

During replay, the db is already named dbA again by the time key 12 is
applied, so the lookup by the name dbB returns null and replay fails with
the NullPointerException (NPE) shown above. The follower crashed.
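
The sequence above can be reproduced with a toy model. This is a minimal sketch, not Doris code: `ReplayRenameSketch`, `ToyCatalog`, and all of its methods are hypothetical names introduced for illustration. It contrasts a replay that resolves the database by the name stored in the log entry with one that resolves it by a stable id. The id-based variant is only an assumed alternative; the description does not state how the actual change resolves the database.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: every name here is hypothetical and none of it is Doris code.
class ReplayRenameSketch {

    /** A tiny catalog that indexes databases by current name and by stable id. */
    static final class ToyCatalog {
        private final Map<String, Long> idByName = new HashMap<>();
        private final Map<Long, List<String>> tablesByDbId = new HashMap<>();

        void createDb(long dbId, String name) {
            idByName.put(name, dbId);
            tablesByDbId.put(dbId, new ArrayList<>());
        }

        /** Renaming changes only the name index; the id stays the same. */
        void renameDb(String from, String to) {
            Long dbId = idByName.remove(from);
            if (dbId == null) {
                throw new IllegalStateException("unknown db: " + from);
            }
            idByName.put(to, dbId);
        }

        /** Replay keyed on the name saved in the log entry. */
        boolean replayCreateTableByName(String dbName, String table) {
            Long dbId = idByName.get(dbName);
            if (dbId == null) {
                return false; // the real replay hit the NPE at the equivalent point
            }
            tablesByDbId.get(dbId).add(table);
            return true;
        }

        /** Replay keyed on the id (assumed alternative); renames do not affect it. */
        boolean replayCreateTableById(long dbId, String table) {
            List<String> tables = tablesByDbId.get(dbId);
            if (tables == null) {
                return false;
            }
            tables.add(table);
            return true;
        }

        List<String> tablesOf(long dbId) {
            return tablesByDbId.get(dbId);
        }
    }

    public static void main(String[] args) {
        ToyCatalog catalog = new ToyCatalog();
        catalog.createDb(1L, "dbA");
        catalog.renameDb("dbA", "dbB"); // key 10
        catalog.renameDb("dbB", "dbA"); // key 11
        // key 12 carries the stale name dbB from step 1
        System.out.println("by name: " + catalog.replayCreateTableByName("dbB", "v1"));
        System.out.println("by id:   " + catalog.replayCreateTableById(1L, "v1"));
    }
}
```

In this sketch the name-based replay returns `false` because dbB no longer exists after key 11, while the id-based replay succeeds regardless of how many renames happened in between.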
@deardeng deardeng requested a review from morrySnow as a code owner September 14, 2025 13:16
@deardeng
Contributor Author

run buildall

@Thearas
Contributor

Thearas commented Sep 14, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@doris-robot

TPC-H: Total hot run time: 32523 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 35a9ef34bc5e3ed764d6d7744e3634271c51447a, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17568	5648	5388	5388
q2	2014	394	284	284
q3	12367	1247	758	758
q4	10498	875	442	442
q5	9395	2381	2148	2148
q6	187	167	133	133
q7	895	765	601	601
q8	9322	1452	1196	1196
q9	5250	4983	4927	4927
q10	6776	2269	1808	1808
q11	479	277	264	264
q12	333	364	209	209
q13	17766	3574	2982	2982
q14	222	221	209	209
q15	531	461	465	461
q16	415	431	372	372
q17	598	865	366	366
q18	6853	6443	6279	6279
q19	1460	940	552	552
q20	322	333	198	198
q21	2888	2119	1944	1944
q22	1074	1025	1002	1002
Total cold run time: 107213 ms
Total hot run time: 32523 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	5595	5492	5680	5492
q2	232	321	242	242
q3	2229	2648	2318	2318
q4	1393	1795	1356	1356
q5	4389	5034	4975	4975
q6	173	167	134	134
q7	2053	1974	1770	1770
q8	2658	2811	2701	2701
q9	7202	7257	7166	7166
q10	3014	3238	2732	2732
q11	568	520	513	513
q12	661	772	593	593
q13	3352	3812	3176	3176
q14	275	300	283	283
q15	511	487	476	476
q16	430	478	441	441
q17	1220	1741	1272	1272
q18	7573	7401	7358	7358
q19	808	1055	1120	1055
q20	2028	2089	1934	1934
q21	5322	4996	4576	4576
q22	1129	1093	1026	1026
Total cold run time: 52815 ms
Total hot run time: 51589 ms

@doris-robot

TPC-DS: Total hot run time: 192894 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 35a9ef34bc5e3ed764d6d7744e3634271c51447a, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	950	381	385	381
query2	6270	1865	1876	1865
query3	8687	198	194	194
query4	33624	23828	24007	23828
query5	4353	584	450	450
query6	312	197	189	189
query7	4203	491	359	359
query8	395	238	235	235
query9	9120	2600	2588	2588
query10	487	320	263	263
query11	18200	15468	15204	15204
query12	166	109	103	103
query13	1564	535	429	429
query14	9615	7128	7118	7118
query15	233	192	180	180
query16	8110	692	496	496
query17	1571	789	594	594
query18	2186	419	329	329
query19	227	193	194	193
query20	138	125	125	125
query21	208	146	107	107
query22	4552	4622	4413	4413
query23	35411	34609	33767	33767
query24	7564	2709	2758	2709
query25	552	494	425	425
query26	966	292	192	192
query27	2006	481	355	355
query28	5552	2211	2183	2183
query29	752	618	455	455
query30	248	192	161	161
query31	1004	922	835	835
query32	92	66	60	60
query33	508	362	314	314
query34	766	869	572	572
query35	812	794	750	750
query36	1039	1079	964	964
query37	117	93	69	69
query38	4036	4134	3951	3951
query39	1552	1514	1514	1514
query40	202	118	109	109
query41	51	52	47	47
query42	127	105	105	105
query43	511	524	471	471
query44	1332	805	816	805
query45	183	172	170	170
query46	888	1056	701	701
query47	1964	2021	1967	1967
query48	415	419	352	352
query49	783	509	428	428
query50	662	686	446	446
query51	7385	7294	7229	7229
query52	104	105	94	94
query53	233	250	185	185
query54	551	573	477	477
query55	81	81	86	81
query56	279	275	258	258
query57	1284	1282	1229	1229
query58	243	228	221	221
query59	3062	3159	3059	3059
query60	296	322	272	272
query61	128	115	125	115
query62	799	755	693	693
query63	227	189	190	189
query64	4028	1018	642	642
query65	3556	3324	3278	3278
query66	1223	424	313	313
query67	16681	15776	15685	15685
query68	7937	833	548	548
query69	494	304	267	267
query70	1194	1157	1107	1107
query71	426	296	256	256
query72	5213	3954	3856	3856
query73	663	741	358	358
query74	10466	9292	9155	9155
query75	3971	3147	2632	2632
query76	3550	1180	774	774
query77	784	359	281	281
query78	10424	10454	9575	9575
query79	3057	892	605	605
query80	730	534	439	439
query81	488	255	221	221
query82	345	121	90	90
query83	160	162	145	145
query84	286	110	89	89
query85	744	360	298	298
query86	356	329	295	295
query87	4328	4291	4245	4245
query88	3535	2419	2406	2406
query89	431	338	299	299
query90	2017	193	192	192
query91	133	137	111	111
query92	66	56	55	55
query93	2047	900	545	545
query94	647	418	306	306
query95	360	280	270	270
query96	501	622	286	286
query97	3217	3274	3149	3149
query98	213	217	203	203
query99	1509	1414	1296	1296
Total cold run time: 295718 ms
Total hot run time: 192894 ms

@doris-robot

ClickBench: Total hot run time: 28.48 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 35a9ef34bc5e3ed764d6d7744e3634271c51447a, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.03	0.03	0.04
query2	0.07	0.03	0.03
query3	0.23	0.07	0.06
query4	1.63	0.11	0.10
query5	0.53	0.50	0.49
query6	1.13	0.73	0.74
query7	0.02	0.02	0.02
query8	0.04	0.03	0.03
query9	0.59	0.51	0.50
query10	0.55	0.55	0.57
query11	0.14	0.10	0.10
query12	0.14	0.11	0.12
query13	0.62	0.61	0.59
query14	0.78	0.80	0.79
query15	0.83	0.83	0.81
query16	0.38	0.39	0.38
query17	0.99	1.01	1.07
query18	0.24	0.23	0.22
query19	1.84	1.83	1.76
query20	0.02	0.01	0.00
query21	15.39	0.96	0.61
query22	0.74	0.76	0.68
query23	15.11	1.37	0.53
query24	3.04	1.54	0.62
query25	0.17	0.17	0.06
query26	0.22	0.15	0.13
query27	0.06	0.05	0.06
query28	13.61	1.09	0.44
query29	12.53	3.93	3.24
query30	0.26	0.09	0.06
query31	2.83	0.58	0.39
query32	3.22	0.54	0.47
query33	3.03	3.02	3.04
query34	16.57	5.16	4.51
query35	4.59	4.56	4.57
query36	0.65	0.51	0.48
query37	0.09	0.06	0.06
query38	0.05	0.04	0.04
query39	0.04	0.02	0.02
query40	0.17	0.13	0.13
query41	0.08	0.03	0.02
query42	0.04	0.02	0.02
query43	0.03	0.03	0.03
Total cold run time: 103.32 s
Total hot run time: 28.48 s

@shuke987
Collaborator

run fe_ut

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 28.57% (4/14) 🎉
Increment coverage report
Complete coverage report

@morrySnow morrySnow changed the title from "branch-3.1: [Fix](create table) concurrent rename database causes table creation … #54614" to "branch-3.1: [Fix](create table) concurrent rename database causes table creation and replay failure #54614" Sep 17, 2025
@morrySnow morrySnow merged commit ed53f91 into apache:branch-3.1 Sep 17, 2025
24 checks passed
@morrySnow morrySnow mentioned this pull request Sep 22, 2025