-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Description
Around a month ago, we had seen an issue where Pinot brokers for one of our High QPS use-cases had what looked like a deadlock/livelock related issue. The brokers were serving traffic around 300-400 QPS and during the issue the number of the threads in the brokers kept on increasing (to up to 30k/40k). When we took a thread-dump, we saw around 30k threads with the following stack-trace.
We were able to remediate the issue by increasing the number of brokers in the tenant to lower the average QPS per broker-instance. We didn't have a chance to dive-deep into the issue but thought of sharing it here in case someone here has seen this issue as well.
The Pinot base version was 0.7.0 with some cherry-picks from later versions.
"jersey-server-managed-async-executor-154433" #173980 prio=5 os_prio=0 tid=0x00007f5de8468000 nid=0xa1b8 waiting on condition [0x00007f5a0a868000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00007f6182902770> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
at org.glassfish.hk2.utilities.general.Hk2ThreadLocal.get(Hk2ThreadLocal.java:108)
at org.jvnet.hk2.internal.PerLocatorUtilities.getAutoAnalyzerName(PerLocatorUtilities.java:166)
at org.jvnet.hk2.internal.ConstantActiveDescriptor.<init>(ConstantActiveDescriptor.java:84)
at org.jvnet.hk2.internal.ServiceLocatorImpl.internalGetInjecteeDescriptor(ServiceLocatorImpl.java:562)
at org.jvnet.hk2.internal.ServiceLocatorImpl.getInjecteeDescriptor(ServiceLocatorImpl.java:587)
at org.jvnet.hk2.internal.ThreeThirtyResolver.resolve(ThreeThirtyResolver.java:70)
at org.jvnet.hk2.internal.ClazzCreator.resolve(ClazzCreator.java:212)
at org.jvnet.hk2.internal.ClazzCreator.resolveAllDependencies(ClazzCreator.java:229)
at org.jvnet.hk2.internal.ClazzCreator.create(ClazzCreator.java:358)
at org.jvnet.hk2.internal.SystemDescriptor.create(SystemDescriptor.java:487)
at org.jvnet.hk2.internal.PerLookupContext.findOrCreate(PerLookupContext.java:70)
at org.jvnet.hk2.internal.Utilities.createService(Utilities.java:2022)
at org.jvnet.hk2.internal.ServiceLocatorImpl.internalGetService(ServiceLocatorImpl.java:774)
at org.jvnet.hk2.internal.ServiceLocatorImpl.internalGetService(ServiceLocatorImpl.java:737)
at org.jvnet.hk2.internal.ServiceLocatorImpl.getService(ServiceLocatorImpl.java:733)
at org.glassfish.jersey.inject.hk2.SupplierFactoryBridge.provide(SupplierFactoryBridge.java:74)
at org.jvnet.hk2.internal.FactoryCreator.create(FactoryCreator.java:153)
at org.jvnet.hk2.internal.SystemDescriptor.create(SystemDescriptor.java:487)
at org.glassfish.jersey.inject.hk2.RequestContext.findOrCreate(RequestContext.java:59)
at org.jvnet.hk2.internal.Utilities.createService(Utilities.java:2022)
at org.jvnet.hk2.internal.ServiceHandleImpl.getService(ServiceHandleImpl.java:114)
- locked <0x00007f7e5d003c88> (a java.lang.Object)
at org.jvnet.hk2.internal.ServiceHandleImpl.getService(ServiceHandleImpl.java:88)
at org.glassfish.jersey.inject.hk2.ContextInjectionResolverImpl.resolve(ContextInjectionResolverImpl.java:103)
at org.glassfish.jersey.inject.hk2.ContextInjectionResolverImpl.resolve(ContextInjectionResolverImpl.java:121)
at org.glassfish.jersey.server.internal.inject.DelegatedInjectionValueParamProvider.lambda$getValueProvider$0(DelegatedInjectionValueParamProvider.java:67)
at org.glassfish.jersey.server.internal.inject.DelegatedInjectionValueParamProvider$$Lambda$199/1475889071.apply(Unknown Source)
at org.glassfish.jersey.server.spi.internal.ParamValueFactoryWithSource.apply(ParamValueFactoryWithSource.java:50)
at org.glassfish.jersey.server.spi.internal.ParameterValueHelper.getParameterValues(ParameterValueHelper.java:64)
at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$AbstractMethodParamInvoker.getParamValues(JavaResourceMethodDispatcherProvider.java:109)
at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$VoidOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:159)
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:79)
at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:469)
at org.glassfish.jersey.server.model.ResourceMethodInvoker.lambda$apply$0(ResourceMethodInvoker.java:381)
at org.glassfish.jersey.server.model.ResourceMethodInvoker$$Lambda$233/665077335.call(Unknown Source)
at org.glassfish.jersey.server.ServerRuntime$AsyncResponder$2$1.run(ServerRuntime.java:819)
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248)
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244)
at org.glassfish.jersey.internal.Errors.process(Errors.java:292)
at org.glassfish.jersey.internal.Errors.process(Errors.java:274)
at org.glassfish.jersey.internal.Errors.process(Errors.java:244)
at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:265)
at org.glassfish.jersey.server.ServerRuntime$AsyncResponder$2.run(ServerRuntime.java:814)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Here's the paste for the possible object which could be part of a deadlock or livelock.
┌ /private/tmp 130 ↵
❯❯❯ cat ~/Desktop/a.thdump| grep "0x00007f6182902770" | wc -l
33893