[🐛 Bug]: Hub hangs with "Unable to find session" after a while #10404

@xtianus

What happened?

I've set up a Hub and two Nodes on three separate Ubuntu servers. The Hub runs on Ubuntu 16.04.7 LTS and the two Nodes on Ubuntu 20.04.4 LTS. Selenium is driven by a Java program using the standard Selenium client (RemoteWebDriver).

Everything works fine for a while (an hour or so), then the Hub suddenly stops responding. The symptoms are:

  • when connecting to http://myhub:4444/ui/index.html I get the normal "Selenium Grid" page but it is stuck on the "Loading..." message
  • the hub output shows many messages containing this:
"exception.message": "Unable to execute request for an existing session: Unable to find session with ID: 98d1a9efa8b787398b29c2d6c5f2ddab
Build info: version: '4.1.2', revision: '9a5a329c5a'
System info: host: 'controller', ip: 'xx.xx.xx.xx', os.name: 'Linux', os.arch: 'amd64', os.version: '4.4.0-210-generic', java.version: '1.8.0_292'
Driver info: driver.version: unknown"
  • the Java program gets the exception shown in the "Relevant log output" box.
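
To pin down when the Hub stops responding, a small probe could poll the Grid status endpoint and log the first failure. The sketch below is an illustration only, not part of the reporter's setup. It assumes the Hub serves `/status` on port 4444 and that the response JSON contains a `"ready": true` field when the Grid is usable. The class name `HubProbe` and its methods are hypothetical, and the `ready` check is a crude text match rather than a real JSON parse. Whether a hung Hub still answers this endpoint is not known from the report.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

/** Hypothetical helper: checks whether a Grid hub still answers its status endpoint. */
public class HubProbe {

    // Matches "ready": true anywhere in the body (a crude text check, not a JSON parser).
    private static final Pattern READY = Pattern.compile("\"ready\"\\s*:\\s*true");

    /** Returns true if the status body reports the grid as ready. */
    public static boolean reportsReady(String statusJson) {
        return statusJson != null && READY.matcher(statusJson).find();
    }

    /** Returns true only if the hub answers HTTP 200 within the timeout and reports ready. */
    public static boolean isHubReady(String statusUrl, int timeoutMillis) {
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) new URL(statusUrl).openConnection();
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis);
            if (conn.getResponseCode() != 200) {
                return false;
            }
            try (InputStream in = conn.getInputStream()) {
                return reportsReady(readAll(in));
            }
        } catch (IOException e) {
            // Bad URL, connection refused, timeout, or broken stream: treat as unresponsive.
            return false;
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }

    // Reads the whole stream into a UTF-8 string (Java 8 compatible).
    private static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The default URL is an assumption based on the hub address used in this report.
        String url = args.length > 0 ? args[0] : "http://myhub:4444/status";
        System.out.println(isHubReady(url, 5000) ? "hub ready" : "hub NOT ready or not responding");
    }
}
```

Run from cron or a loop with timestamps, the first "NOT ready" line gives a point in the Hub log to start reading from.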

If I restart the Hub, all goes back to normal: I don't have to restart either the nodes or the Java program.

Command to start the hub:

java -jar selenium-server-4.1.2.jar hub --config hub.toml

Content of hub.toml:

[server]
port = 4444
registration-secret = "xxxxxx"
[sessionqueue]
session-request-timeout = 300

Command to start the node:

java -jar selenium-server-4.1.2.jar node --config node.toml

Content of node.toml:

[server]
port = 5555
registration-secret = "xxxxxx"
[node]
hub = "xx.xx.xx.xx"
override-max-sessions = true
max-sessions = 8
session-timeout = 60
detect-drivers = false
[[node.driver-configuration]]
display-name = "Chrome Latest"
stereotype = "{\"browserName\": \"chrome\"}"
webdriver-executable = '/mypath/chromedriver'

Output of top (last line is the hub):

top - 15:11:10 up 22:00,  3 users,  load average: 0.20, 0.23, 0.19
Tasks: 208 total,   1 running, 207 sleeping,   0 stopped,   0 zombie
%Cpu(s): 12.5 us,  1.2 sy,  0.0 ni, 82.3 id,  0.2 wa,  0.0 hi,  1.7 si,  2.2 st
KiB Mem :  8175040 total,  1773696 free,  3951596 used,  2449748 buff/cache
KiB Swap:  8388604 total,  8348196 free,    40408 used.  3875604 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
  908 tomcat8   20   0 4055844 1.196g  17808 S  21.9 15.3 115:44.34 java
 1183 mysql     20   0 1670676 528960  15472 S   2.3  6.5  25:39.78 mysqld
15065 root      20   0 4685516 346108  19128 S   2.0  4.2  11:59.94 java

If I restart just the hub, all goes back to normal: nodes register and I see the queue growing in the Selenium Grid web console.

How can we reproduce the issue?

I know you need to reproduce the problem, but I can't share my whole application: it's not just a script, and it takes a lot of work to set up (Tomcat, MySQL, user data, etc.).
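
When the full application cannot be shared, a stand-alone loop that only opens and closes sessions may help narrow things down. The sketch below is an assumption-laden starting point, not a confirmed reproduction. It talks to the Hub over plain HTTP using the W3C WebDriver "New Session" and "Delete Session" commands, requests only `browserName: chrome`, and omits the proxy, user-agent, and `user-data-dir` options seen in the log. The class name `SessionChurn`, the one-second pause, and the default Hub URL are hypothetical choices, and the session id is extracted with a simple regex instead of a JSON parser.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-alone loop: opens and closes Grid sessions over plain HTTP. */
public class SessionChurn {

    private static final Pattern SESSION_ID =
            Pattern.compile("\"sessionId\"\\s*:\\s*\"([^\"]+)\"");

    /** Builds a W3C New Session payload asking only for the given browser name. */
    public static String newSessionPayload(String browserName) {
        return "{\"capabilities\":{\"alwaysMatch\":{\"browserName\":\"" + browserName + "\"}}}";
    }

    /** Pulls the session id out of a New Session response, or returns null if absent. */
    public static String extractSessionId(String responseJson) {
        Matcher m = SESSION_ID.matcher(responseJson == null ? "" : responseJson);
        return m.find() ? m.group(1) : null;
    }

    /** Sends one request and returns the response body (the error body on HTTP >= 400). */
    static String send(String method, String url, String body) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod(method);
        conn.setConnectTimeout(10_000);
        conn.setReadTimeout(330_000); // a little above session-request-timeout = 300 in hub.toml
        if (body != null) {
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "application/json; charset=utf-8");
            try (OutputStream out = conn.getOutputStream()) {
                out.write(body.getBytes(StandardCharsets.UTF_8));
            }
        }
        InputStream in = conn.getResponseCode() < 400 ? conn.getInputStream() : conn.getErrorStream();
        if (in == null) {
            return "";
        }
        try (InputStream stream = in) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = stream.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return new String(buf.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws Exception {
        // The default URL is an assumption based on the hub address used in this report.
        String hub = args.length > 0 ? args[0] : "http://myhub:4444";
        for (int i = 1; ; i++) {
            String response = send("POST", hub + "/session", newSessionPayload("chrome"));
            String id = extractSessionId(response);
            System.out.println("iteration " + i + ": session " + id);
            if (id == null) {
                break; // no session was handed out: stop here and inspect the hub log
            }
            send("DELETE", hub + "/session/" + id, null);
            Thread.sleep(1000);
        }
    }
}
```

If the Hub stays healthy under this loop for well over an hour, the trigger more likely lies in something the loop leaves out, such as the proxy settings, concurrent sessions, or sessions that expire through `session-timeout` rather than being deleted.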

Relevant log output

org.openqa.selenium.SessionNotCreatedException: Could not start a new session. Possible causes are invalid address of the remote server or browser start-up failure.
Build info: version: '4.1.2', revision: '9a5a329c5a'
System info: host: 'controller', ip: 'xx.xx.xx.xx', os.name: 'Linux', os.arch: 'amd64', os.version: '4.4.0-210-generic', java.version: '1.8.0_292'
Driver info: net.yadaframework.selenium.YadaSeleniumUtil$1
Command: [null, newSession {capabilities=[Capabilities {browserName: chrome, goog:chromeOptions: {args: [user-data-dir=/srv/xxx/..., user-agent=Mozilla/5.0 (Win...], extensions: []}, proxy: {httpProxy: yy.yy.yy.yy:9092, proxyType: manual, sslProxy: yy.yy.yy.yy:9092}}], desiredCapabilities=Capabilities {browserName: chrome, goog:chromeOptions: {args: [user-data-dir=/srv/xxx/..., user-agent=Mozilla/5.0 (Win...], extensions: []}, proxy: Proxy(manual, http=yy.yy.yy...}}]
Capabilities {}
        at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:576)
        at org.openqa.selenium.remote.RemoteWebDriver.startSession(RemoteWebDriver.java:245)
        at org.openqa.selenium.remote.RemoteWebDriver.<init>(RemoteWebDriver.java:161)
        at org.openqa.selenium.remote.RemoteWebDriver.<init>(RemoteWebDriver.java:143)
        at net...
        at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
        at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:74)
        at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.UncheckedIOException: java.io.IOException: Stream closed
        at org.openqa.selenium.remote.http.netty.NettyHttpHandler.makeCall(NettyHttpHandler.java:80)
        at org.openqa.selenium.remote.http.RetryRequest.lambda$apply$6(RetryRequest.java:80)
        at net.jodah.failsafe.Functions.lambda$get$0(Functions.java:48)
        at net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:66)
        at net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:66)
        at net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:66)
        at net.jodah.failsafe.Execution.executeSync(Execution.java:128)
        at net.jodah.failsafe.FailsafeExecutor.call(FailsafeExecutor.java:379)
        at net.jodah.failsafe.FailsafeExecutor.get(FailsafeExecutor.java:68)
        at org.openqa.selenium.remote.http.RetryRequest.lambda$apply$7(RetryRequest.java:80)
        at org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
        at org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)
        at org.openqa.selenium.remote.http.netty.NettyHttpHandler.execute(NettyHttpHandler.java:51)
        at org.openqa.selenium.remote.http.RetryRequest.lambda$apply$6(RetryRequest.java:80)
        at net.jodah.failsafe.Functions.lambda$get$0(Functions.java:48)
        at net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:66)
        at net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:66)
        at net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:66)
        at net.jodah.failsafe.Execution.executeSync(Execution.java:128)
        at net.jodah.failsafe.FailsafeExecutor.call(FailsafeExecutor.java:379)
        at net.jodah.failsafe.FailsafeExecutor.get(FailsafeExecutor.java:68)
        at org.openqa.selenium.remote.http.RetryRequest.lambda$apply$7(RetryRequest.java:80)
        at org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
        at org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:56)
        at org.openqa.selenium.remote.http.netty.NettyClient.execute(NettyClient.java:110)
        at org.openqa.selenium.remote.tracing.TracedHttpClient.execute(TracedHttpClient.java:55)
        at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:102)
        at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:84)
        at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:62)
        at org.openqa.selenium.remote.HttpCommandExecutor.execute(HttpCommandExecutor.java:156)
        at org.openqa.selenium.remote.TracedCommandExecutor.execute(TracedCommandExecutor.java:51)
        at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:558)
        ... 17 common frames omitted
Caused by: java.io.IOException: Stream closed
        at java.io.BufferedInputStream.getInIfOpen(BufferedInputStream.java:159)
        at java.io.BufferedInputStream.available(BufferedInputStream.java:410)
        at java.io.FilterInputStream.available(FilterInputStream.java:168)
        at java.io.PushbackInputStream.available(PushbackInputStream.java:277)
        at io.netty.handler.stream.ChunkedStream.isEndOfInput(ChunkedStream.java:82)
        at io.netty.handler.stream.ChunkedStream.readChunk(ChunkedStream.java:109)
        at io.netty.handler.stream.ChunkedStream.readChunk(ChunkedStream.java:36)
        at io.netty.handler.stream.ChunkedWriteHandler.doFlush(ChunkedWriteHandler.java:229)
        at io.netty.handler.stream.ChunkedWriteHandler.flush(ChunkedWriteHandler.java:131)
        at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:750)
        at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:765)
        at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:790)
        at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:758)
        at io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:1020)
        at io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:311)
        at org.asynchttpclient.netty.request.body.NettyInputStreamBody.write(NettyInputStreamBody.java:77)
        at org.asynchttpclient.netty.request.NettyRequestSender.writeRequest(NettyRequestSender.java:433)
        at org.asynchttpclient.netty.channel.NettyConnectListener.writeRequest(NettyConnectListener.java:80)
        at org.asynchttpclient.netty.channel.NettyConnectListener.onSuccess(NettyConnectListener.java:156)
        at org.asynchttpclient.netty.channel.NettyChannelConnector$1.onSuccess(NettyChannelConnector.java:92)
        at org.asynchttpclient.netty.SimpleChannelFutureListener.operationComplete(SimpleChannelFutureListener.java:26)
        at org.asynchttpclient.netty.SimpleChannelFutureListener.operationComplete(SimpleChannelFutureListener.java:20)
        at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:578)
        at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:571)
        at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:550)
        at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:491)
        at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:616)
        at io.netty.util.concurrent.DefaultPromise.setSuccess0(DefaultPromise.java:605)
        at io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:104)
        at io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:84)
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:300)
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:335)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:710)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        ... 1 common frames omitted

Operating System

Ubuntu 16.04.7 LTS

Selenium version

Java 1.8.0_292, Selenium 4.1.2.

What are the browser(s) and version(s) where you see this issue?

Chrome 98.0.4758.102

What are the browser driver(s) and version(s) where you see this issue?

ChromeDriver 98.0.4758.80

Are you using Selenium Grid?

4.1.2

Labels

  • B-grid: Everything grid and server related
  • I-defect: Something is not working as intended
  • J-awaiting answer: Question asked of user; a reply moves it to triage again
