@Mrhs121 (Contributor) commented Sep 20, 2025

Purpose of the pull request

A NullPointerException (NPE) is thrown when starting the master:

[WI-0][TI-0] - 2025-09-21 03:24:47.217 INFO  [Master-Server] o.a.d.s.m.c.ClusterManager:[93] - Initialized WorkerClusters: [ ]
[WI-0][TI-0] - 2025-09-21 03:24:47.218 ERROR [Curator-TreeCache-0] o.a.c.f.r.c.TreeCache:[828] -
java.lang.NullPointerException: null
        at org.apache.dolphinscheduler.plugin.registry.zookeeper.ZookeeperTreeCacheListenerAdapter.childEvent(ZookeeperTreeCacheListenerAdapter.java:42)
        at org.apache.curator.framework.recipes.cache.TreeCache.lambda$callListeners$1(TreeCache.java:811)
        at org.apache.curator.framework.listen.MappingListenerManager.lambda$forEach$0(MappingListenerManager.java:92)
        at org.apache.curator.framework.listen.MappingListenerManager.forEach(MappingListenerManager.java:89)
        at org.apache.curator.framework.listen.StandardListenerManager.forEach(StandardListenerManager.java:89)
        at org.apache.curator.framework.recipes.cache.TreeCache.callListeners(TreeCache.java:807)
        at org.apache.curator.framework.recipes.cache.TreeCache.access$1900(TreeCache.java:79)
        at org.apache.curator.framework.recipes.cache.TreeCache$2.run(TreeCache.java:909)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
[WI-0][TI-0] - 2025-09-21 03:24:47.219 INFO  [Curator-TreeCache-0] o.a.d.s.m.c.AbstractClusterSubscribeListener:[41] - Server MasterServerMetadata(super=BaseServerMetadata(processId=46421, serverStartupTime=1758396285183, address=10.1.2.10:5678, cpuUsage=0.435457953936797, memoryUsage=0.6507625579833984, serverStatus=NORMAL)) added
[WI-0][TI-0] - 2025-09-21 03:24:47.219 ERROR [Curator-TreeCache-0] o.a.c.f.r.c.TreeCache:[828] -
java.lang.NullPointerException: null
        at org.apache.dolphinscheduler.plugin.registry.zookeeper.ZookeeperTreeCacheListenerAdapter.childEvent(ZookeeperTreeCacheListenerAdapter.java:42)
        at org.apache.curator.framework.recipes.cache.TreeCache.lambda$callListeners$1(TreeCache.java:811)
        at org.apache.curator.framework.listen.MappingListenerManager.lambda$forEach$0(MappingListenerManager.java:92)
        at org.apache.curator.framework.listen.MappingListenerManager.forEach(MappingListenerManager.java:89)
        at org.apache.curator.framework.listen.StandardListenerManager.forEach(StandardListenerManager.java:89)
        at org.apache.curator.framework.recipes.cache.TreeCache.callListeners(TreeCache.java:807)
        at org.apache.curator.framework.recipes.cache.TreeCache.access$1900(TreeCache.java:79)
        at org.apache.curator.framework.recipes.cache.TreeCache$2.run(TreeCache.java:909)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
[WI-0][TI-0] - 2025-09-21 03:24:47.288 INFO  [Master-Server] o.a.d.s.m.c.ClusterManager:[59] - ClusterManager started...
[WI-0][TI-0] - 2025-09-21 03:24:47.290 INFO  [Master-Server] o.a.d.s.m.c.ClusterStateMonitors:[46] - ClusterStateMonitors started...
[WI-0][TI-0] - 2025-09-21 03:24:47.513 INFO  [Master-Server] o.a.d.s.m.e.WorkflowEventBusFireWorkers:[71] - WorkflowEventBusFireWorkers s

First, let me explain why the NPE is thrown. The Curator source below shows that when a TreeCacheEvent has one of four types (INITIALIZED, CONNECTION_SUSPENDED, CONNECTION_LOST, or CONNECTION_RECONNECTED), the data in the event is null by default. DolphinScheduler's ZookeeperTreeCacheListenerAdapter does not handle a null data value, so an NPE is thrown.

[Screenshots 2025-09-21 03:36:50 and 2025-09-21 03:36:38: Curator source code]
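
A minimal, self-contained sketch of the failure mode and of the null-check fix. It does not depend on Curator: `EventType`, `ChildDataStub`, `Event`, `unguarded`, and `guarded` are hypothetical stand-ins that only mirror the shape of Curator's `TreeCacheEvent` and `ChildData`, assuming (as described above) that `getData()` is null for INITIALIZED and the three CONNECTION_* types.

```java
public class NullDataSketch {

    // Hypothetical mirror of Curator's TreeCacheEvent.Type (not the real enum).
    enum EventType {
        NODE_ADDED, NODE_UPDATED, NODE_REMOVED,
        CONNECTION_SUSPENDED, CONNECTION_RECONNECTED, CONNECTION_LOST, INITIALIZED
    }

    // Hypothetical stand-in for Curator's ChildData (only the path is modeled).
    static final class ChildDataStub {
        private final String path;

        ChildDataStub(String path) {
            this.path = path;
        }

        String getPath() {
            return path;
        }
    }

    // Hypothetical stand-in for TreeCacheEvent; data is null for
    // INITIALIZED and the CONNECTION_* types.
    static final class Event {
        private final EventType type;
        private final ChildDataStub data;

        Event(EventType type, ChildDataStub data) {
            this.type = type;
            this.data = data;
        }

        EventType getType() {
            return type;
        }

        ChildDataStub getData() {
            return data;
        }
    }

    // Pre-fix shape: dereferences the data unconditionally.
    static String unguarded(Event event) {
        return event.getData().getPath(); // NullPointerException when data is null
    }

    // Fixed shape: return early when the event carries no data.
    static String guarded(Event event) {
        if (event.getData() == null) {
            return null; // nothing to forward for this event type
        }
        return event.getData().getPath();
    }

    public static void main(String[] args) {
        Event init = new Event(EventType.INITIALIZED, null);
        System.out.println("guarded: " + guarded(init));
        try {
            unguarded(init);
        } catch (NullPointerException e) {
            System.out.println("unguarded: NullPointerException");
        }
    }
}
```

Running `main` prints the guarded result (null) and then reports the NullPointerException from the unguarded path, analogous to the one the master log reports in ZookeeperTreeCacheListenerAdapter.childEvent.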

Curator catches the exception and only logs it, so the NPE doesn't break master startup; nevertheless, we should still eliminate the misleading stack trace to avoid confusing users.

[Screenshots 2025-09-21 03:35:06 and 2025-09-21 03:34:50]
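
Why startup is unaffected can be illustrated with a simplified catch-and-log dispatcher. This is a hypothetical sketch, not Curator's implementation: `dispatch` calls each listener inside a try/catch, logs any exception via `java.util.logging` (standing in for Curator's logger), and continues, so a throwing listener neither kills the dispatch thread nor prevents later listeners from running.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.logging.Level;
import java.util.logging.Logger;

public class CatchAndLogSketch {
    private static final Logger LOG = Logger.getLogger(CatchAndLogSketch.class.getName());

    // Calls every listener; an exception from one is logged, not propagated.
    // Returns how many listeners failed (for illustration only).
    static int dispatch(List<Consumer<String>> listeners, String event) {
        int failures = 0;
        for (Consumer<String> listener : listeners) {
            try {
                listener.accept(event);
            } catch (RuntimeException e) {
                failures++;
                LOG.log(Level.SEVERE, "listener failed for event " + event, e);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();
        List<Consumer<String>> listeners = List.of(
                ev -> { throw new NullPointerException(); }, // like the unguarded adapter
                seen::add);                                   // still runs afterwards
        int failures = dispatch(listeners, "INITIALIZED");
        System.out.println(failures + " failure(s), later listener saw " + seen);
    }
}
```

This containment is also why the NPE shows up only as a logged stack trace rather than as a startup failure.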

Brief change log

Verify this pull request

This pull request is code cleanup without any test coverage.

(or)

This pull request is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(or)

Pull Request Notice

If your pull request contains an incompatible change, you should also add it to docs/docs/en/guide/upgrade/incompatible.md

@Mrhs121 changed the title [Chore] prevent NPE when handling zk connection events [Chore] Prevent NPE when handling zk connection events Sep 20, 2025
@ruanwenjun previously approved these changes Sep 22, 2025

@ruanwenjun (Member) left a comment

LGTM

Comment on lines 42 to 53
// When the event type is INITIALIZED or CONNECTION_SUSPENDED or CONNECTION_LOST or CONNECTION_RECONNECTED, the
// data in the event is null by default
if (event.getData() == null) {
    return;
}

@ruanwenjun (Member) Sep 22, 2025

Yes, this is a known issue. Since it doesn't affect the program and only prints a log, no one has fixed it. Wouldn't it be better to filter here by event type (NODE_ADDED, NODE_UPDATED, NODE_REMOVED)?
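
The suggestion of forwarding only NODE_ADDED, NODE_UPDATED, and NODE_REMOVED could look roughly like the sketch below. `EventType` is a hypothetical mirror of Curator's `TreeCacheEvent.Type` constants, and `shouldForward` is an illustrative helper name, not the code that was merged.

```java
import java.util.EnumSet;
import java.util.Set;

public class EventTypeFilterSketch {

    // Hypothetical mirror of Curator's TreeCacheEvent.Type (not the real enum).
    enum EventType {
        NODE_ADDED, NODE_UPDATED, NODE_REMOVED,
        CONNECTION_SUSPENDED, CONNECTION_RECONNECTED, CONNECTION_LOST, INITIALIZED
    }

    // Only node-level events carry ChildData, so only these are forwarded.
    private static final Set<EventType> NODE_EVENTS =
            EnumSet.of(EventType.NODE_ADDED, EventType.NODE_UPDATED, EventType.NODE_REMOVED);

    // Returns true if the event should be passed on to the registry listener.
    static boolean shouldForward(EventType type) {
        return NODE_EVENTS.contains(type);
    }

    public static void main(String[] args) {
        for (EventType type : EventType.values()) {
            System.out.println(type + " -> " + (shouldForward(type) ? "forward" : "skip"));
        }
    }
}
```

Compared with the null check, an explicit allow-list states which events the registry cares about, instead of relying on the convention that non-node events carry no data.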

Contributor Author

Indeed, amended.

@ruanwenjun added the improvement (make more easy to user or prompt friendly) and priority:low labels Sep 22, 2025
@ruanwenjun added this to the 3.3.2 milestone Sep 22, 2025

@ruanwenjun (Member) left a comment

LGTM

@SbloodyS (Member) left a comment

+1

@SbloodyS requested a review from @ruanwenjun September 26, 2025 08:54
@SbloodyS merged commit b47b3c1 into apache:dev Sep 26, 2025
71 of 72 checks passed
davidzollo pushed a commit to davidzollo/dolphinscheduler that referenced this pull request Oct 27, 2025

Labels

backend, improvement (make more easy to user or prompt friendly), priority:low, ready-to-merge

3 participants