chore(core): add resource pool allocation tracing to detect leaks origin by RaphDal · Pull Request #6234 · questdb/questdb

RaphDal · 2025-10-06T14:37:28Z

This PR adds optional tracing functionality to help developers diagnose resource pool leaks by tracking where table readers are allocated.

Example Output

When the CairoEngine is closed, resource pools are released. If one of those resource pool still has allocated resources, we log it and later throw an error:

2025-10-06T13:56:27.970490Z I i.q.c.p.ReaderPool shutting down, table is left behind [table=x~]: Time-limited test borrowed io.questdb.cairo.pool.ReaderPool$R@131fd514:
	at java.base/java.lang.Thread.getStackTrace(Thread.java:2451)
	at io.questdb/io.questdb.cairo.pool.TracingResourcePoolSupervisor.onResourceBorrowed(TracingResourcePoolSupervisor.java:22)
	at io.questdb/io.questdb.cairo.pool.AbstractMultiTenantPool.get0(AbstractMultiTenantPool.java:281)
	at io.questdb/io.questdb.cairo.pool.AbstractMultiTenantPool.get(AbstractMultiTenantPool.java:82)
	at io.questdb.test/io.questdb.test.cairo.pool.ReaderLeftBehindTest.lambda$testClosePoolWhenReaderIsOut$0(ReaderLeftBehindTest.java:44)
	at io.questdb.test/io.questdb.test.AbstractCairoTest.lambda$assertMemoryLeak$9(AbstractCairoTest.java:1245)
	at io.questdb.test/io.questdb.test.tools.TestUtils.assertMemoryLeak(TestUtils.java:750)
	at io.questdb.test/io.questdb.test.AbstractCairoTest.assertMemoryLeak(AbstractCairoTest.java:1240)
	at io.questdb.test/io.questdb.test.AbstractCairoTest.assertMemoryLeak(AbstractCairoTest.java:1223)
	at io.questdb.test/io.questdb.test.cairo.pool.ReaderLeftBehindTest.testClosePoolWhenReaderIsOut(ReaderLeftBehindTest.java:38)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
	at [email protected]/org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
	at [email protected]/org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at [email protected]/org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
	at [email protected]/org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at [email protected]/org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at [email protected]/org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at [email protected]/org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
	at [email protected]/org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
	at [email protected]/org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
	at java.base/java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:317)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java)
	at java.base/java.lang.Thread.run(Thread.java:1583)

Usage

Enable tracing with this configuration option:

cairo.resource.pool.tracing.enabled=true

By default, this option is only enabled in tests.

How it works

When enabled, the AbstractMultiTenantPool lazily creates a TracingResourcePoolSupervisor instance for each thread that retrieves a TableReader.

This TracingResourcePoolSupervisor instance captures and stores a full stack trace on every resource borrow until the resource is returned back to the pool.

This feature is expensive and intended for development/debugging only. Capturing a full stack trace has significant performance overhead and allocates objects, putting additional pressure on the Garbage collector.

coderabbitai · 2025-10-06T14:38:21Z

Walkthrough

Introduces a configurable flag to enable resource pool tracing, wires it through configuration classes, conditionally activates a tracing supervisor in the multi-tenant pool, and augments shutdown logging to include per-resource trace info when enabled. Adds a tracing supervisor implementation and updates/relocates tests and properties expectations accordingly.

Changes

Cohort / File(s)	Summary
Configuration flag: resource pool tracing `core/src/main/java/io/questdb/PropServerConfiguration.java`, `core/src/main/java/io/questdb/PropertyKey.java`, `core/src/main/java/io/questdb/cairo/CairoConfiguration.java`, `core/src/main/java/io/questdb/cairo/CairoConfigurationWrapper.java`, `core/src/main/java/io/questdb/cairo/DefaultCairoConfiguration.java`	Adds config key `cairo.resource.pool.tracing.enabled`; plumbs boolean through server and Cairo configs; wrapper delegates; default impl returns false; PropServerConfiguration reads property and exposes via inner Cairo config.
Pooling and tracing `core/src/main/java/io/questdb/cairo/pool/AbstractMultiTenantPool.java`, `core/src/main/java/io/questdb/cairo/pool/PoolTenant.java`, `core/src/main/java/io/questdb/cairo/pool/TracingResourcePoolSupervisor.java`	Conditionally initializes a ThreadLocal supervisor: tracing vs plain; enhances releaseAll logging to include optional trace details; adds default `getSupervisor()` to `PoolTenant`; introduces `TracingResourcePoolSupervisor` capturing borrow stack traces and printing resource info.
Tests and expectations `core/src/test/java/io/questdb/test/ServerMainTest.java`, `core/src/test/java/io/questdb/test/cairo/CairoTestConfiguration.java`, `core/src/test/java/io/questdb/test/cairo/pool/ReaderLeftBehindTest.java`, `core/src/test/java/io/questdb/test/cairo/pool/ReaderPoolTest.java`	Updates expected properties to include new flag; test config enables tracing; adds `ReaderLeftBehindTest` validating shutdown logging with a left-behind reader; removes overlapping test from `ReaderPoolTest`.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Client
  participant Pool as AbstractMultiTenantPool
  participant TL as ThreadLocal<Supervisor>
  participant Sup as ResourcePoolSupervisor (Tracing or None)
  participant Res as Resource

  rect rgb(245,248,252)
  note over Pool: Initialization
  Client->>Pool: construct(config)
  alt tracing enabled
    Pool->>TL: new ThreadLocal(TracingResourcePoolSupervisor::new)
  else tracing disabled
    Pool->>TL: new ThreadLocal()
  end
  end

  rect rgb(240,255,240)
  note over Client,Res: Borrow flow
  Client->>Pool: borrow()
  Pool->>TL: get()
  TL-->>Pool: Sup (maybe null)
  opt tracing active
    Pool->>Sup: onResourceBorrowed(Res)
  end
  Pool-->>Client: Res
  end

  rect rgb(255,248,240)
  note over Client,Res: Return flow
  Client->>Pool: return(Res)
  Pool->>TL: get()
  TL-->>Pool: Sup (maybe null)
  opt tracing active
    Pool->>Sup: onResourceReturned(Res)
  end
  end

sequenceDiagram
  autonumber
  participant Pool as AbstractMultiTenantPool
  participant TL as ThreadLocal<Supervisor>
  participant Sup as TracingResourcePoolSupervisor
  participant Log as LogRecorder
  participant Res as Resource (left behind)

  note over Pool: releaseAll(dealine=MAX_VALUE)
  Pool->>Log: xInfo().$("table is left behind...")
  Pool->>TL: get()
  TL-->>Pool: Sup or null
  alt Sup != null
    Pool->>Sup: printResourceInfo(Log, Res)
  else Sup == null
    note over Pool: No extra trace info
  end
  Pool->>Log: $()

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 6.90% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	The description clearly outlines the purpose, usage, and internal workings of the new resource pool tracing feature, directly reflecting the changes introduced in the pull request.
Title Check	✅ Passed	The title clearly describes the primary change by stating that resource pool allocation tracing is being added to aid in detecting the origin of leaks, directly reflecting the main purpose of the pull request and using concise, specific language without extraneous detail.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch rd_tracing_supervisor

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

core/src/main/java/io/questdb/cairo/pool/AbstractMultiTenantPool.java

core/src/main/java/io/questdb/cairo/pool/TracingResourcePoolSupervisor.java

core/src/main/java/io/questdb/cairo/pool/AbstractMultiTenantPool.java

core/src/main/java/io/questdb/cairo/pool/TracingResourcePoolSupervisor.java

core/src/main/java/io/questdb/PropServerConfiguration.java

bluestreak01 · 2025-10-07T17:05:30Z

@CodeRabbit review

coderabbitai · 2025-10-07T17:06:20Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (3)

core/src/main/java/io/questdb/cairo/pool/TracingResourcePoolSupervisor.java (3)
6-10: Consider clarifying the per-thread requirement.

The documentation states "each thread should have its own instance" but doesn't explain the consequences if this contract is violated. Consider adding a note that sharing an instance across threads will result in incorrect thread name attribution and potential race conditions in the resources map.

33-33: String generation via toString() remains unaddressed.

The previous review comment suggested using toSink() or similar to avoid string allocation when printing resource info. The current implementation still calls resource.toString(), which generates a string and adds GC pressure.

While this is a debugging feature where some overhead is acceptable, the suggestion to use toSink() would reduce allocations, especially if multiple resources are leaked and printed during shutdown.

Based on past review comments.

34-36: Enhanced for-loop and stack trace toString() remain unaddressed.

Two performance suggestions from previous review remain unaddressed:

Line 34: Using an indexed loop instead of enhanced for-loop would avoid iterator allocation.

Line 35: traceElement.toString() generates strings for each stack frame.

While these are minor optimizations for a debugging feature, they would reduce allocation overhead when printing multiple leaked resources with deep stack traces.

Apply this diff to use an indexed loop:
-            for (StackTraceElement traceElement : stackTrace) {
-                sink.put("\n\tat ").put(traceElement.toString());
+            for (int i = 0, n = stackTrace.length; i < n; i++) {
+                sink.put("\n\tat ").put(stackTrace[i].toString());
             }
Based on past review comments.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4dcd824 and 94e70af.

📒 Files selected for processing (12)

core/src/main/java/io/questdb/PropServerConfiguration.java (3 hunks)
core/src/main/java/io/questdb/PropertyKey.java (1 hunks)
core/src/main/java/io/questdb/cairo/CairoConfiguration.java (1 hunks)
core/src/main/java/io/questdb/cairo/CairoConfigurationWrapper.java (1 hunks)
core/src/main/java/io/questdb/cairo/DefaultCairoConfiguration.java (1 hunks)
core/src/main/java/io/questdb/cairo/pool/AbstractMultiTenantPool.java (3 hunks)
core/src/main/java/io/questdb/cairo/pool/PoolTenant.java (1 hunks)
core/src/main/java/io/questdb/cairo/pool/TracingResourcePoolSupervisor.java (1 hunks)
core/src/test/java/io/questdb/test/ServerMainTest.java (1 hunks)
core/src/test/java/io/questdb/test/cairo/CairoTestConfiguration.java (1 hunks)
core/src/test/java/io/questdb/test/cairo/pool/ReaderLeftBehindTest.java (1 hunks)
core/src/test/java/io/questdb/test/cairo/pool/ReaderPoolTest.java (0 hunks)

💤 Files with no reviewable changes (1)

core/src/test/java/io/questdb/test/cairo/pool/ReaderPoolTest.java

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)

GitHub Check: New pull request (Coverage Report Coverage Report)
GitHub Check: New pull request (SelfHosted Other tests on linux-x64-zfs)
GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests on linux-arm64)
GitHub Check: New pull request (SelfHosted Other tests on linux-arm64)
GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests on linux-x64-zfs)
GitHub Check: New pull request (SelfHosted Other tests Start ARM Agent)
GitHub Check: New pull request (SelfHosted Other tests Start X64Zfs Agent)
GitHub Check: New pull request (Rust Test and Lint on linux-jdk17)
GitHub Check: New pull request (Hosted Running tests on windows-other)
GitHub Check: New pull request (Hosted Running tests on windows-pgwire)
GitHub Check: New pull request (Hosted Running tests on windows-cairo)
GitHub Check: New pull request (Hosted Running tests on windows-fuzz2)
GitHub Check: New pull request (Hosted Running tests on windows-fuzz1)
GitHub Check: New pull request (Hosted Running tests on windows-griffin)
GitHub Check: New pull request (Hosted Running tests with cover on linux-other)
GitHub Check: New pull request (Hosted Running tests on mac-other)
GitHub Check: New pull request (Hosted Running tests on mac-pgwire)
GitHub Check: New pull request (Hosted Running tests with cover on linux-pgwire)
GitHub Check: New pull request (Hosted Running tests on mac-cairo-fuzz)
GitHub Check: New pull request (Hosted Running tests with cover on linux-cairo)
GitHub Check: New pull request (Hosted Running tests on mac-cairo)
GitHub Check: New pull request (Hosted Running tests with cover on linux-fuzz)
GitHub Check: New pull request (Hosted Running tests with cover on linux-griffin)
GitHub Check: New pull request (Hosted Running tests on mac-griffin)
GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests Start ARM Agent)
GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests Start X64Zfs Agent)
GitHub Check: New pull request (Check Changes Check changes)

🔇 Additional comments (7)

core/src/main/java/io/questdb/cairo/DefaultCairoConfiguration.java (1)

1348-1351: LGTM! Configuration default correctly disables expensive tracing in production.

The default false value appropriately disables resource pool tracing in production environments, aligning with the PR's stated intent that this debugging feature should only be enabled during development/testing due to its performance overhead.

core/src/main/java/io/questdb/cairo/CairoConfiguration.java (1)

746-747: LGTM! Interface method properly defines the configuration flag.

The method declaration is clean and follows the existing patterns in the interface. Placement after isReadOnlyInstance() is consistent with the implementations in DefaultCairoConfiguration and other configuration classes.

core/src/main/java/io/questdb/cairo/pool/PoolTenant.java (1)

49-58: LGTM! Well-designed optional accessor with proper nullability contract.

The default implementation returning null is appropriate for optional supervisor functionality, and the @Nullable annotation correctly documents the contract. The Javadoc clearly explains when the supervisor is present versus absent.

core/src/main/java/io/questdb/PropertyKey.java (1)

585-586: LGTM! Property key follows naming conventions.

The enum constant is properly defined and follows the established naming convention for Cairo configuration properties. The placement with other Cairo properties is logical and the property path string is consistent with the codebase patterns.

core/src/main/java/io/questdb/PropServerConfiguration.java (2)

265-266: Resource pool tracing flag wiring looks solid.

Final field slots the new property cleanly into the configuration state.

4105-4108: Cairo configuration accessor matches the new contract.

Delegating straight to the captured flag ensures consumers see the configured value.

core/src/main/java/io/questdb/cairo/pool/TracingResourcePoolSupervisor.java (1)

19-21: Thread name captured on first borrow only.

As noted in the previous review comment, the thread name is captured lazily on first borrow and reused for all subsequent resources. This is correct for the intended thread-local usage pattern, but if the supervisor is accidentally shared across threads, only the first thread's name will be recorded.

Since the class documentation already states "each thread should have its own instance," and the integration uses ThreadLocal in AbstractMultiTenantPool, the current implementation is appropriate for the intended use case.

Based on past review comments.

core/src/main/java/io/questdb/cairo/pool/AbstractMultiTenantPool.java

glasstiger · 2025-10-08T08:43:52Z

[PR Coverage check]

😍 pass : 30 / 39 (76.92%)

file detail

	path	covered line	new line	coverage
🔵	io/questdb/cairo/pool/TableReaderMetadataTenantImpl.java	0	2	00.00%
🔵	io/questdb/cairo/pool/PoolTenant.java	0	1	00.00%
🔵	io/questdb/cairo/pool/SequencerMetadataPool.java	0	2	00.00%
🔵	io/questdb/cairo/pool/WalWriterPool.java	0	2	00.00%
🔵	io/questdb/cairo/pool/SqlCompilerPool.java	0	2	00.00%
🔵	io/questdb/PropServerConfiguration.java	2	2	100.00%
🔵	io/questdb/cairo/pool/TracingResourcePoolSupervisor.java	14	14	100.00%
🔵	io/questdb/cairo/DefaultCairoConfiguration.java	1	1	100.00%
🔵	io/questdb/PropertyKey.java	2	2	100.00%
🔵	io/questdb/cairo/pool/AbstractMultiTenantPool.java	8	8	100.00%
🔵	io/questdb/cairo/pool/ReaderPool.java	2	2	100.00%
🔵	io/questdb/cairo/CairoConfigurationWrapper.java	1	1	100.00%

RaphDal added 2 commits October 6, 2025 16:15

feat(core): Add resource pool allocation tracing to detect leaks origin.

9fae5b7

test: Fix parameters test to add new option

4e024b3

RaphDal added 4 commits October 6, 2025 16:45

test: Remove unnecessary usage of regex

fd4d9cb

formatting

2817301

refactor: Cleanup unecessary interface function

3174e20

fix: remove unnecessary override

94e70af