Closed
Description
Search before asking
- I have searched the issues and found no similar issues.
What happened
When a workflow in the running/failed/stopped/paused state is recovered or failed over in a multi-master cluster, its host is not set to the new master's address, so the operation may fail.
If the old master no longer exists, the server reports "Connection refused"; if the old master still exists, the server reports "Cannot find the WorkflowExecuteRunnable".
2025-03-22 18:51:19.676 ERROR [qtp742969054-35] o.a.d.a.e.w.StopWorkflowInstanceExecutorDelegate:[98] - WorkflowInstance: sleep-20250321085059987 stop failed
org.apache.dolphinscheduler.extract.base.exception.RemoteException: Call method to Host(ip=10.0.6.23, port=15678) failed
at org.apache.dolphinscheduler.extract.base.client.NettyRemotingClient.sendSync(NettyRemotingClient.java:147)
at org.apache.dolphinscheduler.extract.base.client.SyncClientMethodInvoker.invoke(SyncClientMethodInvoker.java:51)
at org.apache.dolphinscheduler.extract.base.client.ClientInvocationHandler.invoke(ClientInvocationHandler.java:56)
at com.sun.proxy.$Proxy830.stopWorkflowInstance(Unknown Source)
at org.apache.dolphinscheduler.api.executor.workflow.StopWorkflowInstanceExecutorDelegate.stopInMaster(StopWorkflowInstanceExecutorDelegate.java:87)
at org.apache.dolphinscheduler.api.executor.workflow.StopWorkflowInstanceExecutorDelegate.execute(StopWorkflowInstanceExecutorDelegate.java:52)
at org.apache.dolphinscheduler.api.executor.workflow.StopWorkflowInstanceExecutorDelegate$StopWorkflowInstanceOperation.execute(StopWorkflowInstanceExecutorDelegate.java:127)
...
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /10.0.6.23:15678
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:702)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at java.lang.Thread.run(Thread.java:750)
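The two failure modes described above can be sketched with a small in-memory model. This is only an illustration of the stale-host problem, not DolphinScheduler code: the class `StaleHostSketch`, all of its methods, and the second master address `10.0.6.24:15678` are hypothetical. It assumes the API server routes a stop request to whatever host is stored on the workflow instance.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of the stale-host problem; none of these names exist in DolphinScheduler. */
class StaleHostSketch {

    /** Masters that are currently alive, identified by "ip:port". */
    private final Set<String> aliveMasters = new HashSet<>();
    /** Workflow instance ids each alive master holds in memory. */
    private final Map<String, Set<Integer>> runningOn = new HashMap<>();
    /** Host stored on each workflow instance (assumed to be what the API server reads). */
    private final Map<Integer, String> storedHost = new HashMap<>();

    void startMaster(String host) {
        aliveMasters.add(host);
        runningOn.put(host, new HashSet<>()); // a (re)started master has no in-memory state
    }

    void stopMaster(String host) {
        aliveMasters.remove(host);
        runningOn.remove(host);
    }

    void submit(int instanceId, String host) {
        runningOn.get(host).add(instanceId);
        storedHost.put(instanceId, host);
    }

    /** Another master takes over the instance; updateHost=false models the reported bug. */
    void failover(int instanceId, String newMaster, boolean updateHost) {
        runningOn.get(newMaster).add(instanceId);
        if (updateHost) {
            storedHost.put(instanceId, newMaster);
        }
    }

    /** The API server sends the stop request to the stored host. */
    String stop(int instanceId) {
        String host = storedHost.get(instanceId);
        if (!aliveMasters.contains(host)) {
            return "Connection refused: " + host;
        }
        if (!runningOn.get(host).remove(instanceId)) {
            return "Cannot find the WorkflowExecuteRunnable";
        }
        return "stopped on " + host;
    }

    /** Runs the reproduction steps and returns the result of the stop request. */
    static String scenario(boolean restartOldMaster, boolean updateHost) {
        String oldMaster = "10.0.6.23:15678"; // address taken from the log above
        String newMaster = "10.0.6.24:15678"; // invented for this sketch
        StaleHostSketch cluster = new StaleHostSketch();
        cluster.startMaster(oldMaster);
        cluster.startMaster(newMaster);
        cluster.submit(1, oldMaster);
        cluster.stopMaster(oldMaster);
        if (restartOldMaster) {
            cluster.startMaster(oldMaster);
        }
        cluster.failover(1, newMaster, updateHost);
        return cluster.stop(1);
    }

    public static void main(String[] args) {
        System.out.println(scenario(false, false)); // old master gone, host stale
        System.out.println(scenario(true, false));  // old master restarted, host stale
        System.out.println(scenario(false, true));  // host updated during failover
    }
}
```

In this model the first two scenarios fail in the two ways reported here, and the third succeeds only because the failover step also rewrites the stored host.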
What you expected to happen
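After recovery or failover, the workflow's host should be set to the new master's address so that the operation succeeds.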
How to reproduce
In a multi-master cluster, run a workflow, stop (and optionally start again) the master that is running the workflow, then stop the workflow from the web UI.
Anything else
No response
Version
dev
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct

