Skip to content

Cache SqlConnection.query results per-instance.#14094

Merged
sfc-gh-jkinkead merged 3 commits intodevelopfrom
fix-sql-connection
Feb 25, 2026
Merged

Cache SqlConnection.query results per-instance.#14094
sfc-gh-jkinkead merged 3 commits intodevelopfrom
fix-sql-connection

Conversation

@sfc-gh-jkinkead
Copy link
Copy Markdown
Contributor

@sfc-gh-jkinkead sfc-gh-jkinkead commented Feb 24, 2026

Describe your changes

Include an instance ID parameter in the cache key for SqlConnection.query's internal cache. This prevents cache leakage between two different database connections running the same query.

This may result in some new cache misses. Note however that since connection_factory / st.connection cache resources globally, this will only result in extra cache misses in cases where users are constructing SqlConnections manually and not through st.connection.

GitHub Issue Link (if applicable)

Testing Plan

See PR.


Contribution License Agreement

By submitting this pull request you agree that all contributions to this project are made under the Apache 2.0 license.

Include an instance ID parameter in the cache key for SqlConnection.query's internal cache. This prevents cache leakage between two different database connections running the same query.

This may result in some new cache misses. Note however That since `connection_factory` / `st.connection` cache resources globally, this will only result in extra cache misses in cases where users are constructing SqlConnections manually and not through `st.connection`.
Copilot AI review requested due to automatic review settings February 24, 2026 00:08
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Feb 24, 2026

✅ PR preview is ready!

Name Link
📦 Wheel file https://core-previews.s3-us-west-2.amazonaws.com/pr-14094/streamlit-1.54.0-py3-none-any.whl
📦 @streamlit/component-v2-lib Download from artifacts
🕹️ Preview app pr-14094.streamlit.app (☁️ Deploy here if not accessible)

@snyk-io
Copy link
Copy Markdown
Contributor

snyk-io bot commented Feb 24, 2026

Snyk checks have passed. No issues have been found so far.

Status Scanner Critical High Medium Low Total (0)
Open Source Security 0 0 0 0 0 issues
Licenses 0 0 0 0 0 issues

💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes SQLConnection.query() cache leakage across multiple SQLConnection instances that share the same connection name but differ in connection parameters, by ensuring the internal cached helper is keyed per connection instance.

Changes:

  • Add an instance-specific parameter to the internal cached _query function and pass an instance identifier from SQLConnection.query().
  • Update the SQL connection unit test to verify caches are scoped per SQLConnection instance (not just by connection name).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
lib/streamlit/connections/sql_connection.py Adds a per-instance identifier into the cached _query call to prevent cross-instance cache collisions.
lib/tests/streamlit/connections/sql_connection_test.py Adjusts the cache-scoping test to cover multiple instances sharing the same connection name.

@sfc-gh-lmasuch sfc-gh-lmasuch added security-assessment-completed change:bugfix PR contains bug fix implementation impact:users PR changes affect end users ai-review If applied to PR or issue will run AI review workflow labels Feb 24, 2026
@github-actions github-actions bot removed the ai-review If applied to PR or issue will run AI review workflow label Feb 24, 2026
@github-actions
Copy link
Copy Markdown
Contributor

Consolidated Code Review

Summary

This PR fixes a cache-leakage bug in SQLConnection.query() (issue #14077) where two different SQLConnection instances sharing the same _connection_name but configured to different databases would incorrectly share cached query results. The fix adds id(self) as a dummy cache_id parameter to the inner _query function so that @st.cache_data includes the connection instance identity in the cache key.

Files changed (2):

  • lib/streamlit/connections/sql_connection.py — adds a cache_id parameter to _query and passes id(self) at the call site.
  • lib/tests/streamlit/connections/sql_connection_test.py — updates the cache-scoping test to verify per-instance (not just per-name) caching.

Code Quality

Both reviewers agree the implementation is clean, minimal, and follows existing patterns in the codebase. The # noqa: ARG001 suppression and the explanatory comment are appropriate since cache_id is intentionally unused in the function body.

Area of disagreement — id(self) reuse risk:

  • gpt-5.3-codex-high raised this as a High severity issue, calling it a "realistic cache-key collision path" and requesting a uuid4-based alternative before merge.
  • opus-4.6-thinking acknowledged the same theoretical concern but assessed it as non-blocking, noting that (a) connections via st.connection() are long-lived cached instances where ID reuse cannot happen, (b) for manually constructed connections, the scenario is extremely unlikely, and (c) even in the unlikely event, the current bug (wrong database data) is far worse than this edge case.

My assessment: I agree with opus-4.6-thinking that this is non-blocking. Here is the reasoning:

  1. st.connection() — the recommended and overwhelmingly common path — uses @cache_resource to cache connection instances globally. The same object persists across reruns, so id(self) is stable.
  2. For manually constructed connections, id reuse requires a prior connection to be garbage-collected while its cached data persists AND a new connection to allocate at the exact same memory address AND the user to execute the same SQL query. This is an extremely narrow window.
  3. Even if it happens, the result is a stale cache hit (returning data from a prior connection at the same address), which is still a strict improvement over the status quo (returning data from an entirely different connection).
  4. A uuid4 improvement could be a follow-up enhancement but should not block this targeted bug fix.

Consistency note (agreed by both reviewers): SnowflakeConnection.query() and SnowparkConnection.query() use the same __qualname__-based pattern and are susceptible to the same class-level caching issue. A follow-up PR to apply the same fix consistently would be valuable.

Test Coverage

The updated test test_scopes_caches_by_connection_instance is well-structured and covers three clear scenarios:

  1. Same instance, same query — cache hit (call count stays at 1).
  2. Different instance, same name, same query — cache miss (call count goes to 2).
  3. Different instance, different name, same query — cache miss (call count goes to 3).

Incremental assertions after each .query() call make the test easy to follow and debug.

Nice-to-have improvement (agreed by both): Add a final assertion confirming conn1 still gets a cache hit after conn2/conn3 cause misses (i.e., call conn1.query("SELECT 1;") again and assert call_count remains 3). This is not a blocking issue.

No E2E test is needed — this is an internal caching behavior with no frontend component.

Backwards Compatibility

No API break is introduced. As the PR description notes, manually constructed SQLConnection instances (not via st.connection()) may see additional cache misses since each new object gets a new id. This is an acceptable trade-off: cache misses (safe but slower) are far preferable to cache leakage (returning data from a wrong database). Users of st.connection() (the recommended path) are unaffected since the same cached instance is reused.

Security & Risk

This PR closes a security-relevant issue — cache leakage between different database connections could expose data from one database to queries intended for another. The fix correctly isolates cache entries per connection instance. No new security concerns are introduced.

Accessibility

No frontend/UI changes — not applicable.

Recommendations

  1. (Follow-up) Apply the same per-instance cache scoping fix to SnowflakeConnection.query() and SnowparkConnection.query(), which have the same vulnerability.
  2. (Nice-to-have) Add a final assertion in the test to confirm earlier connections still get cache hits after later connections cause misses.
  3. (Optional follow-up) Consider replacing id(self) with a uuid4-based token stored on the instance for complete robustness against object ID reuse, if desired.

Reviewer Agreement Summary

Topic gpt-5.3-codex-high opus-4.6-thinking Consolidated
Implementation quality Clean, minimal Clean, minimal Agreed
id(self) reuse risk High / blocking Non-blocking Non-blocking (see reasoning above)
Test coverage adequate Yes (with gaps noted) Yes (with nice-to-have noted) Adequate
Backwards compatibility Acceptable Acceptable Agreed
Snowflake/Snowpark follow-up Not mentioned Recommended Recommended
Verdict Changes Requested Approved Approved

Verdict

APPROVED — Clean, minimal fix for a real cache-leakage bug with appropriate test coverage. The id(self) reuse concern is theoretical and non-blocking for the standard usage path. The fix is a strict improvement over the status quo.


Consolidated review by opus-4.6-thinking.


📋 Review by `gpt-5.3-codex-high`

Summary

This PR updates SQLConnection.query caching to scope cached query results per connection instance by adding a dummy cache key argument, and it updates unit tests accordingly. The intent is correct and addresses the core leakage scenario from issue #14077, but the current implementation still leaves a collision path.

Code Quality

The implementation is small, readable, and follows existing patterns in SQLConnection.query, and the test updates are easy to follow.

Issue found:

  • High: id(self) is used as the per-instance cache discriminator (lib/streamlit/connections/sql_connection.py:315, lib/streamlit/connections/sql_connection.py:351). In CPython, object IDs can be reused after an object is garbage-collected. Because st.cache_data entries can outlive the original connection instance, a new SQLConnection may later receive the same ID and accidentally hit stale cached rows from a different database connection.

Test Coverage

The updated unit test (lib/tests/streamlit/connections/sql_connection_test.py:167) correctly verifies cache separation across multiple live connection instances and names.

Coverage gap:

  • There is no test for sequential connection recreation where the previous instance is gone but cached entries remain. That is the remaining edge case for the current id(self) approach.
  • No e2e coverage was added; for this backend caching fix, targeted unit coverage is generally appropriate, but the sequential-lifetime regression risk is not yet validated.

Backwards Compatibility

No API break is introduced. Behavior changes for users manually constructing SQLConnection instances: cache sharing is reduced (more misses), which is acceptable and aligned with the bugfix goal.

Security & Risk

Main risk is unresolved cache leakage due to recycled object IDs (lib/streamlit/connections/sql_connection.py:315, lib/streamlit/connections/sql_connection.py:351). Since this PR is security-adjacent (cross-connection data exposure), this remaining collision vector should be closed before merge.

Accessibility

No frontend/UI changes in this PR, so accessibility impact is not applicable.

Recommendations

  1. Replace id(self) with a non-recycled per-instance token (for example, generate self._query_cache_id = uuid.uuid4().hex once during initialization and use that in the cached function arguments).
  2. Add a unit test that validates cache scoping by that stable instance token across connection lifetimes (not only concurrent live objects).

Verdict

CHANGES REQUESTED: The fix direction is good, but using id(self) leaves a realistic cache-key collision path that can reintroduce cross-connection cache leakage over time.


This is an automated AI review by gpt-5.3-codex-high. Please verify the feedback and use your judgment.

📋 Review by `opus-4.6-thinking`

Summary

This PR fixes a cache-leakage bug in SQLConnection.query() where two different connection instances sharing the same _connection_name (but configured to different databases) would incorrectly share cached query results. The fix adds id(self) as a dummy parameter to the inner _query function so that @st.cache_data includes the connection instance identity in the cache key.

Changes are limited to two files:

  • lib/streamlit/connections/sql_connection.py — adds a cache_id parameter to _query and passes id(self) at the call site.
  • lib/tests/streamlit/connections/sql_connection_test.py — updates the existing cache-scoping test to verify per-instance (not just per-name) caching.

Code Quality

The implementation is clean and minimal. The approach of using a dummy parameter to influence the @st.cache_data cache key is a well-understood pattern already established in this codebase.

  • sql_connection.py line 314-315: The # noqa: ARG001 suppression is appropriate since cache_id is intentionally unused in the function body — it only serves as a cache key discriminator.
  • The comment "Dummy parameter to retain per-instance caching" is concise and explains the non-obvious intent.

One minor concern: id(self) returns a memory address that can be reused after garbage collection. For connections created via st.connection() (the recommended path), this is a non-issue since instances are cached globally and long-lived. For manually constructed connections that are garbage-collected and recreated, a new object could theoretically inherit the same id as a prior one, producing a stale cache hit. In practice this is extremely unlikely and arguably still preferable to the current bug (returning data from the wrong database). If robustness against id reuse were desired, a uuid4 stored on the instance would be an alternative — but this is not a blocking concern.

Consistency note: SnowflakeConnection.query() (snowflake_connection.py:142) and SnowparkConnection.query() (snowpark_connection.py:147) use the same __qualname__-based cache scoping pattern and are susceptible to the same class-level (rather than instance-level) caching issue. This PR does not address those. This is acceptable for a targeted bug fix, but a follow-up issue to apply the same fix consistently would be valuable.

Test Coverage

The updated test test_scopes_caches_by_connection_instance is well-structured and covers three scenarios:

  1. Same instance, same query — cache hit (conn1 called twice, call_count stays at 1).
  2. Different instance, same name, same query — cache miss (conn2 with same name but different host, call_count goes to 2).
  3. Different instance, different name, same query — cache miss (conn3 with different name, call_count goes to 3).

The incremental assertions after each .query() call make the test easy to follow and debug.

The test correctly verifies the core fix. One additional scenario that could strengthen coverage is verifying that the same instance still gets cache hits even after a different instance causes a miss (e.g., calling conn1.query("SELECT 1;") again after the conn2/conn3 calls to confirm call_count remains 3). However, this is a nice-to-have, not a requirement.

No E2E test is included. Given this is an internal caching behavior with no frontend component, unit test coverage alone is adequate.

Backwards Compatibility

As the PR description correctly notes, this change may introduce new cache misses for users who manually construct SQLConnection instances (not through st.connection()). On each script rerun, a manually constructed SQLConnection will be a new object with a new id, causing the cache to miss.

For users of st.connection() (the recommended and common path), behavior is unchanged: st.connection returns the same cached instance across reruns, so id(self) will be consistent and caching will continue to work as expected.

This is an acceptable trade-off: cache misses (safe but slower) are far preferable to cache leakage (returning data from a wrong database connection).

Security & Risk

This PR closes a security-relevant issue — cache leakage between different database connections could expose data from one database to queries intended for another. The fix correctly isolates cache entries per connection instance.

No new security concerns are introduced. The id(self) value is an integer memory address used only internally as a cache key; it is not exposed to users or transmitted externally.

Accessibility

No frontend changes — not applicable.

Recommendations

  1. Consider applying the same fix to SnowflakeConnection.query() and SnowparkConnection.query() in a follow-up PR, as they have the same __qualname__-based cache scoping pattern and are vulnerable to the same issue.
  2. (Nice-to-have) Add a brief assertion at the end of the test to confirm earlier connections still get cache hits after later connections cause misses (e.g., conn1.query("SELECT 1;") followed by assert patched_read_sql.call_count == 3). This would verify there is no cache invalidation side-effect.

Verdict

APPROVED: Clean, minimal fix for a real cache-leakage bug with appropriate test coverage. The approach is sound, backwards compatible for the standard usage path, and consistent with the existing codebase patterns.


This is an automated AI review by opus-4.6-thinking.

Use a UUID to make memory address edge-case not a problem.
@sfc-gh-jkinkead
Copy link
Copy Markdown
Contributor Author

GitHub correctly noticed that SnowflakeConnection overloads query, so I fixed that.

def _query(sql: str, params: Any = None) -> DataFrame:
def _query(
# Dummy parameter to retain per-instance caching.
instance_id: int, # noqa: ARG001
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Type mismatch: instance_id is typed as int, but self._connection_instance_id is a uuid.UUID object (created via uuid.uuid4() in base_connection.py line 79). This will cause type checker failures and is misleading.

# Fix: Change type annotation to match actual type
instance_id: uuid.UUID,  # noqa: ARG001

Alternatively, convert the UUID to int when passing:

return _query(id(self._connection_instance_id), sql, params)
Suggested change
instance_id: int, # noqa: ARG001
instance_id: uuid.UUID, # noqa: ARG001

Spotted by Graphite Agent

Fix in Graphite


Is this helpful? React 👍 or 👎 to let us know.

Copy link
Copy Markdown
Collaborator

@lukasmasuch lukasmasuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍 Related question: what about the SnowparkConnection? From the docs it sounds like its deprecated, does it make sense to plan in the removal of this connection type at some point?

@sfc-gh-jkinkead
Copy link
Copy Markdown
Contributor Author

SnowparkConnection has been deprecated for almost three years now, and I think we should delete it. :)

It's not used in SiS, and it doesn't do anything that SnowflakeConnection doesn't do better.

@sfc-gh-jkinkead sfc-gh-jkinkead merged commit 0f471ac into develop Feb 25, 2026
42 checks passed
@sfc-gh-jkinkead sfc-gh-jkinkead deleted the fix-sql-connection branch February 25, 2026 00:01
@lukasmasuch
Copy link
Copy Markdown
Collaborator

lukasmasuch commented Feb 25, 2026

SnowparkConnection has been deprecated for almost three years now, and I think we should delete it. :)

Good to know 👍 I will add something to make this deprecation a bit more official: #14125

lukasmasuch added a commit that referenced this pull request Feb 26, 2026
## Describe your changes

Deprecates `SnowparkConnection` in favor of `SnowflakeConnection`:

- Adds `show_deprecation_warning` call in `__init__` with
`show_once=True`
- Adds `.. warning::` directive to the class docstring pointing users to
`SnowflakeConnection`
- Adds unit test to verify the deprecation warning is shown

Context:
#14094 (comment)

## Testing Plan

- [x] Unit test (`SnowparkConnectionDeprecationTest`)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

change:bugfix PR contains bug fix implementation impact:users PR changes affect end users

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SqlConnection.query caches at the class level, not the instance level

4 participants