Skip to content

chore(core): add resource pool allocation tracing to detect leaks origin#6234

Merged
bluestreak01 merged 9 commits intomasterfrom
rd_tracing_supervisor
Oct 9, 2025
Merged

chore(core): add resource pool allocation tracing to detect leaks origin#6234
bluestreak01 merged 9 commits intomasterfrom
rd_tracing_supervisor

Conversation

@RaphDal
Copy link
Copy Markdown
Contributor

@RaphDal RaphDal commented Oct 6, 2025

This PR adds optional tracing functionality to help developers diagnose resource pool leaks by tracking where table readers are allocated.

Example Output

When the CairoEngine is closed, resource pools are released. If one of those resource pool still has allocated resources, we log it and later throw an error:

2025-10-06T13:56:27.970490Z I i.q.c.p.ReaderPool shutting down, table is left behind [table=x~]: Time-limited test borrowed io.questdb.cairo.pool.ReaderPool$R@131fd514:
	at java.base/java.lang.Thread.getStackTrace(Thread.java:2451)
	at io.questdb/io.questdb.cairo.pool.TracingResourcePoolSupervisor.onResourceBorrowed(TracingResourcePoolSupervisor.java:22)
	at io.questdb/io.questdb.cairo.pool.AbstractMultiTenantPool.get0(AbstractMultiTenantPool.java:281)
	at io.questdb/io.questdb.cairo.pool.AbstractMultiTenantPool.get(AbstractMultiTenantPool.java:82)
	at io.questdb.test/io.questdb.test.cairo.pool.ReaderLeftBehindTest.lambda$testClosePoolWhenReaderIsOut$0(ReaderLeftBehindTest.java:44)
	at io.questdb.test/io.questdb.test.AbstractCairoTest.lambda$assertMemoryLeak$9(AbstractCairoTest.java:1245)
	at io.questdb.test/io.questdb.test.tools.TestUtils.assertMemoryLeak(TestUtils.java:750)
	at io.questdb.test/io.questdb.test.AbstractCairoTest.assertMemoryLeak(AbstractCairoTest.java:1240)
	at io.questdb.test/io.questdb.test.AbstractCairoTest.assertMemoryLeak(AbstractCairoTest.java:1223)
	at io.questdb.test/io.questdb.test.cairo.pool.ReaderLeftBehindTest.testClosePoolWhenReaderIsOut(ReaderLeftBehindTest.java:38)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
	at [email protected]/org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
	at [email protected]/org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at [email protected]/org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
	at [email protected]/org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at [email protected]/org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at [email protected]/org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at [email protected]/org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
	at [email protected]/org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
	at [email protected]/org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
	at java.base/java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:317)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java)
	at java.base/java.lang.Thread.run(Thread.java:1583)

Usage

Enable tracing with this configuration option:

cairo.resource.pool.tracing.enabled=true

By default, this option is only enabled in tests.

How it works

When enabled, the AbstractMultiTenantPool lazily creates a TracingResourcePoolSupervisor instance for each thread that retrieves a TableReader.

This TracingResourcePoolSupervisor instance captures and stores a full stack trace on every resource borrow until the resource is returned back to the pool.

This feature is expensive and intended for development/debugging only. Capturing a full stack trace has significant performance overhead and allocates objects, putting additional pressure on the Garbage collector.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Oct 6, 2025

Walkthrough

Introduces a configurable flag to enable resource pool tracing, wires it through configuration classes, conditionally activates a tracing supervisor in the multi-tenant pool, and augments shutdown logging to include per-resource trace info when enabled. Adds a tracing supervisor implementation and updates/relocates tests and properties expectations accordingly.

Changes

Cohort / File(s) Summary
Configuration flag: resource pool tracing
core/src/main/java/io/questdb/PropServerConfiguration.java, core/src/main/java/io/questdb/PropertyKey.java, core/src/main/java/io/questdb/cairo/CairoConfiguration.java, core/src/main/java/io/questdb/cairo/CairoConfigurationWrapper.java, core/src/main/java/io/questdb/cairo/DefaultCairoConfiguration.java
Adds config key cairo.resource.pool.tracing.enabled; plumbs boolean through server and Cairo configs; wrapper delegates; default impl returns false; PropServerConfiguration reads property and exposes via inner Cairo config.
Pooling and tracing
core/src/main/java/io/questdb/cairo/pool/AbstractMultiTenantPool.java, core/src/main/java/io/questdb/cairo/pool/PoolTenant.java, core/src/main/java/io/questdb/cairo/pool/TracingResourcePoolSupervisor.java
Conditionally initializes a ThreadLocal supervisor: tracing vs plain; enhances releaseAll logging to include optional trace details; adds default getSupervisor() to PoolTenant; introduces TracingResourcePoolSupervisor capturing borrow stack traces and printing resource info.
Tests and expectations
core/src/test/java/io/questdb/test/ServerMainTest.java, core/src/test/java/io/questdb/test/cairo/CairoTestConfiguration.java, core/src/test/java/io/questdb/test/cairo/pool/ReaderLeftBehindTest.java, core/src/test/java/io/questdb/test/cairo/pool/ReaderPoolTest.java
Updates expected properties to include new flag; test config enables tracing; adds ReaderLeftBehindTest validating shutdown logging with a left-behind reader; removes overlapping test from ReaderPoolTest.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Client
  participant Pool as AbstractMultiTenantPool
  participant TL as ThreadLocal<Supervisor>
  participant Sup as ResourcePoolSupervisor (Tracing or None)
  participant Res as Resource

  rect rgb(245,248,252)
  note over Pool: Initialization
  Client->>Pool: construct(config)
  alt tracing enabled
    Pool->>TL: new ThreadLocal(TracingResourcePoolSupervisor::new)
  else tracing disabled
    Pool->>TL: new ThreadLocal()
  end
  end

  rect rgb(240,255,240)
  note over Client,Res: Borrow flow
  Client->>Pool: borrow()
  Pool->>TL: get()
  TL-->>Pool: Sup (maybe null)
  opt tracing active
    Pool->>Sup: onResourceBorrowed(Res)
  end
  Pool-->>Client: Res
  end

  rect rgb(255,248,240)
  note over Client,Res: Return flow
  Client->>Pool: return(Res)
  Pool->>TL: get()
  TL-->>Pool: Sup (maybe null)
  opt tracing active
    Pool->>Sup: onResourceReturned(Res)
  end
  end
Loading
sequenceDiagram
  autonumber
  participant Pool as AbstractMultiTenantPool
  participant TL as ThreadLocal<Supervisor>
  participant Sup as TracingResourcePoolSupervisor
  participant Log as LogRecorder
  participant Res as Resource (left behind)

  note over Pool: releaseAll(dealine=MAX_VALUE)
  Pool->>Log: xInfo().$("table is left behind...")
  Pool->>TL: get()
  TL-->>Pool: Sup or null
  alt Sup != null
    Pool->>Sup: printResourceInfo(Log, Res)
  else Sup == null
    note over Pool: No extra trace info
  end
  Pool->>Log: $()
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 6.90% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed The description clearly outlines the purpose, usage, and internal workings of the new resource pool tracing feature, directly reflecting the changes introduced in the pull request.
Title Check ✅ Passed The title clearly describes the primary change by stating that resource pool allocation tracing is being added to aid in detecting the origin of leaks, directly reflecting the main purpose of the pull request and using concise, specific language without extraneous detail.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch rd_tracing_supervisor

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@bluestreak01
Copy link
Copy Markdown
Member

@CodeRabbit review

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Oct 7, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
core/src/main/java/io/questdb/cairo/pool/TracingResourcePoolSupervisor.java (3)

6-10: Consider clarifying the per-thread requirement.

The documentation states "each thread should have its own instance" but doesn't explain the consequences if this contract is violated. Consider adding a note that sharing an instance across threads will result in incorrect thread name attribution and potential race conditions in the resources map.


33-33: String generation via toString() remains unaddressed.

The previous review comment suggested using toSink() or similar to avoid string allocation when printing resource info. The current implementation still calls resource.toString(), which generates a string and adds GC pressure.

While this is a debugging feature where some overhead is acceptable, the suggestion to use toSink() would reduce allocations, especially if multiple resources are leaked and printed during shutdown.

Based on past review comments.


34-36: Enhanced for-loop and stack trace toString() remain unaddressed.

Two performance suggestions from previous review remain unaddressed:

  1. Line 34: Using an indexed loop instead of enhanced for-loop would avoid iterator allocation.
  2. Line 35: traceElement.toString() generates strings for each stack frame.

While these are minor optimizations for a debugging feature, they would reduce allocation overhead when printing multiple leaked resources with deep stack traces.

Apply this diff to use an indexed loop:

-            for (StackTraceElement traceElement : stackTrace) {
-                sink.put("\n\tat ").put(traceElement.toString());
+            for (int i = 0, n = stackTrace.length; i < n; i++) {
+                sink.put("\n\tat ").put(stackTrace[i].toString());
             }

Based on past review comments.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4dcd824 and 94e70af.

📒 Files selected for processing (12)
  • core/src/main/java/io/questdb/PropServerConfiguration.java (3 hunks)
  • core/src/main/java/io/questdb/PropertyKey.java (1 hunks)
  • core/src/main/java/io/questdb/cairo/CairoConfiguration.java (1 hunks)
  • core/src/main/java/io/questdb/cairo/CairoConfigurationWrapper.java (1 hunks)
  • core/src/main/java/io/questdb/cairo/DefaultCairoConfiguration.java (1 hunks)
  • core/src/main/java/io/questdb/cairo/pool/AbstractMultiTenantPool.java (3 hunks)
  • core/src/main/java/io/questdb/cairo/pool/PoolTenant.java (1 hunks)
  • core/src/main/java/io/questdb/cairo/pool/TracingResourcePoolSupervisor.java (1 hunks)
  • core/src/test/java/io/questdb/test/ServerMainTest.java (1 hunks)
  • core/src/test/java/io/questdb/test/cairo/CairoTestConfiguration.java (1 hunks)
  • core/src/test/java/io/questdb/test/cairo/pool/ReaderLeftBehindTest.java (1 hunks)
  • core/src/test/java/io/questdb/test/cairo/pool/ReaderPoolTest.java (0 hunks)
💤 Files with no reviewable changes (1)
  • core/src/test/java/io/questdb/test/cairo/pool/ReaderPoolTest.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)
  • GitHub Check: New pull request (Coverage Report Coverage Report)
  • GitHub Check: New pull request (SelfHosted Other tests on linux-x64-zfs)
  • GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests on linux-arm64)
  • GitHub Check: New pull request (SelfHosted Other tests on linux-arm64)
  • GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests on linux-x64-zfs)
  • GitHub Check: New pull request (SelfHosted Other tests Start ARM Agent)
  • GitHub Check: New pull request (SelfHosted Other tests Start X64Zfs Agent)
  • GitHub Check: New pull request (Rust Test and Lint on linux-jdk17)
  • GitHub Check: New pull request (Hosted Running tests on windows-other)
  • GitHub Check: New pull request (Hosted Running tests on windows-pgwire)
  • GitHub Check: New pull request (Hosted Running tests on windows-cairo)
  • GitHub Check: New pull request (Hosted Running tests on windows-fuzz2)
  • GitHub Check: New pull request (Hosted Running tests on windows-fuzz1)
  • GitHub Check: New pull request (Hosted Running tests on windows-griffin)
  • GitHub Check: New pull request (Hosted Running tests with cover on linux-other)
  • GitHub Check: New pull request (Hosted Running tests on mac-other)
  • GitHub Check: New pull request (Hosted Running tests on mac-pgwire)
  • GitHub Check: New pull request (Hosted Running tests with cover on linux-pgwire)
  • GitHub Check: New pull request (Hosted Running tests on mac-cairo-fuzz)
  • GitHub Check: New pull request (Hosted Running tests with cover on linux-cairo)
  • GitHub Check: New pull request (Hosted Running tests on mac-cairo)
  • GitHub Check: New pull request (Hosted Running tests with cover on linux-fuzz)
  • GitHub Check: New pull request (Hosted Running tests with cover on linux-griffin)
  • GitHub Check: New pull request (Hosted Running tests on mac-griffin)
  • GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests Start ARM Agent)
  • GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests Start X64Zfs Agent)
  • GitHub Check: New pull request (Check Changes Check changes)
🔇 Additional comments (7)
core/src/main/java/io/questdb/cairo/DefaultCairoConfiguration.java (1)

1348-1351: LGTM! Configuration default correctly disables expensive tracing in production.

The default false value appropriately disables resource pool tracing in production environments, aligning with the PR's stated intent that this debugging feature should only be enabled during development/testing due to its performance overhead.

core/src/main/java/io/questdb/cairo/CairoConfiguration.java (1)

746-747: LGTM! Interface method properly defines the configuration flag.

The method declaration is clean and follows the existing patterns in the interface. Placement after isReadOnlyInstance() is consistent with the implementations in DefaultCairoConfiguration and other configuration classes.

core/src/main/java/io/questdb/cairo/pool/PoolTenant.java (1)

49-58: LGTM! Well-designed optional accessor with proper nullability contract.

The default implementation returning null is appropriate for optional supervisor functionality, and the @Nullable annotation correctly documents the contract. The Javadoc clearly explains when the supervisor is present versus absent.

core/src/main/java/io/questdb/PropertyKey.java (1)

585-586: LGTM! Property key follows naming conventions.

The enum constant is properly defined and follows the established naming convention for Cairo configuration properties. The placement with other Cairo properties is logical and the property path string is consistent with the codebase patterns.

core/src/main/java/io/questdb/PropServerConfiguration.java (2)

265-266: Resource pool tracing flag wiring looks solid.

Final field slots the new property cleanly into the configuration state.


4105-4108: Cairo configuration accessor matches the new contract.

Delegating straight to the captured flag ensures consumers see the configured value.

core/src/main/java/io/questdb/cairo/pool/TracingResourcePoolSupervisor.java (1)

19-21: Thread name captured on first borrow only.

As noted in the previous review comment, the thread name is captured lazily on first borrow and reused for all subsequent resources. This is correct for the intended thread-local usage pattern, but if the supervisor is accidentally shared across threads, only the first thread's name will be recorded.

Since the class documentation already states "each thread should have its own instance," and the integration uses ThreadLocal in AbstractMultiTenantPool, the current implementation is appropriate for the intended use case.

Based on past review comments.

@glasstiger
Copy link
Copy Markdown
Contributor

[PR Coverage check]

😍 pass : 30 / 39 (76.92%)

file detail

path covered line new line coverage
🔵 io/questdb/cairo/pool/TableReaderMetadataTenantImpl.java 0 2 00.00%
🔵 io/questdb/cairo/pool/PoolTenant.java 0 1 00.00%
🔵 io/questdb/cairo/pool/SequencerMetadataPool.java 0 2 00.00%
🔵 io/questdb/cairo/pool/WalWriterPool.java 0 2 00.00%
🔵 io/questdb/cairo/pool/SqlCompilerPool.java 0 2 00.00%
🔵 io/questdb/PropServerConfiguration.java 2 2 100.00%
🔵 io/questdb/cairo/pool/TracingResourcePoolSupervisor.java 14 14 100.00%
🔵 io/questdb/cairo/DefaultCairoConfiguration.java 1 1 100.00%
🔵 io/questdb/PropertyKey.java 2 2 100.00%
🔵 io/questdb/cairo/pool/AbstractMultiTenantPool.java 8 8 100.00%
🔵 io/questdb/cairo/pool/ReaderPool.java 2 2 100.00%
🔵 io/questdb/cairo/CairoConfigurationWrapper.java 1 1 100.00%

@bluestreak01 bluestreak01 changed the title feat(core): Add resource pool allocation tracing to detect leaks origin chore(core): add resource pool allocation tracing to detect leaks origin Oct 9, 2025
@bluestreak01 bluestreak01 merged commit ced1613 into master Oct 9, 2025
36 checks passed
@bluestreak01 bluestreak01 deleted the rd_tracing_supervisor branch October 9, 2025 15:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants