Cache SqlConnection.query results per-instance. by sfc-gh-jkinkead · Pull Request #14094 · streamlit/streamlit

sfc-gh-jkinkead · 2026-02-24T00:08:50Z

Describe your changes

Include an instance ID parameter in the cache key for SqlConnection.query's internal cache. This prevents cache leakage between two different database connections running the same query.

This may result in some new cache misses. Note however that since connection_factory / st.connection cache resources globally, this will only result in extra cache misses in cases where users are constructing SqlConnections manually and not through st.connection.

GitHub Issue Link (if applicable)

Closes SqlConnection.query caches at the class level, not the instance level #14077

Testing Plan

See PR.

Contribution License Agreement

By submitting this pull request you agree that all contributions to this project are made under the Apache 2.0 license.

Include an instance ID parameter in the cache key for SqlConnection.query's internal cache. This prevents cache leakage between two different database connections running the same query. This may result in some new cache misses. Note however That since `connection_factory` / `st.connection` cache resources globally, this will only result in extra cache misses in cases where users are constructing SqlConnections manually and not through `st.connection`.

github-actions · 2026-02-24T00:09:06Z

✅ PR preview is ready!

Name	Link
📦 Wheel file	https://core-previews.s3-us-west-2.amazonaws.com/pr-14094/streamlit-1.54.0-py3-none-any.whl
📦 `@streamlit/component-v2-lib`	Download from artifacts
🕹️ Preview app	pr-14094.streamlit.app (☁️ Deploy here if not accessible)

snyk-io · 2026-02-24T00:09:16Z

✅ Snyk checks have passed. No issues have been found so far.

Status	Scanner	Critical	High	Medium	Low	Total (0)
✅	Open Source Security	0	0	0	0	0 issues
✅	Licenses	0	0	0	0	0 issues

💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse.

Copilot

Pull request overview

Fixes SQLConnection.query() cache leakage across multiple SQLConnection instances that share the same connection name but differ in connection parameters, by ensuring the internal cached helper is keyed per connection instance.

Changes:

Add an instance-specific parameter to the internal cached _query function and pass an instance identifier from SQLConnection.query().
Update the SQL connection unit test to verify caches are scoped per SQLConnection instance (not just by connection name).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File	Description
lib/streamlit/connections/sql_connection.py	Adds a per-instance identifier into the cached `_query` call to prevent cross-instance cache collisions.
lib/tests/streamlit/connections/sql_connection_test.py	Adjusts the cache-scoping test to cover multiple instances sharing the same connection name.

lib/streamlit/connections/sql_connection.py

github-actions · 2026-02-24T19:43:20Z

Consolidated Code Review

Summary

This PR fixes a cache-leakage bug in SQLConnection.query() (issue #14077) where two different SQLConnection instances sharing the same _connection_name but configured to different databases would incorrectly share cached query results. The fix adds id(self) as a dummy cache_id parameter to the inner _query function so that @st.cache_data includes the connection instance identity in the cache key.

Files changed (2):

lib/streamlit/connections/sql_connection.py — adds a cache_id parameter to _query and passes id(self) at the call site.
lib/tests/streamlit/connections/sql_connection_test.py — updates the cache-scoping test to verify per-instance (not just per-name) caching.

Code Quality

Both reviewers agree the implementation is clean, minimal, and follows existing patterns in the codebase. The # noqa: ARG001 suppression and the explanatory comment are appropriate since cache_id is intentionally unused in the function body.

Area of disagreement — id(self) reuse risk:

gpt-5.3-codex-high raised this as a High severity issue, calling it a "realistic cache-key collision path" and requesting a uuid4-based alternative before merge.
opus-4.6-thinking acknowledged the same theoretical concern but assessed it as non-blocking, noting that (a) connections via st.connection() are long-lived cached instances where ID reuse cannot happen, (b) for manually constructed connections, the scenario is extremely unlikely, and (c) even in the unlikely event, the current bug (wrong database data) is far worse than this edge case.

My assessment: I agree with opus-4.6-thinking that this is non-blocking. Here is the reasoning:

st.connection() — the recommended and overwhelmingly common path — uses @cache_resource to cache connection instances globally. The same object persists across reruns, so id(self) is stable.
For manually constructed connections, id reuse requires a prior connection to be garbage-collected while its cached data persists AND a new connection to allocate at the exact same memory address AND the user to execute the same SQL query. This is an extremely narrow window.
Even if it happens, the result is a stale cache hit (returning data from a prior connection at the same address), which is still a strict improvement over the status quo (returning data from an entirely different connection).
A uuid4 improvement could be a follow-up enhancement but should not block this targeted bug fix.

Consistency note (agreed by both reviewers): SnowflakeConnection.query() and SnowparkConnection.query() use the same __qualname__-based pattern and are susceptible to the same class-level caching issue. A follow-up PR to apply the same fix consistently would be valuable.

Test Coverage

The updated test test_scopes_caches_by_connection_instance is well-structured and covers three clear scenarios:

Same instance, same query — cache hit (call count stays at 1).
Different instance, same name, same query — cache miss (call count goes to 2).
Different instance, different name, same query — cache miss (call count goes to 3).

Incremental assertions after each .query() call make the test easy to follow and debug.

Nice-to-have improvement (agreed by both): Add a final assertion confirming conn1 still gets a cache hit after conn2/conn3 cause misses (i.e., call conn1.query("SELECT 1;") again and assert call_count remains 3). This is not a blocking issue.

No E2E test is needed — this is an internal caching behavior with no frontend component.

Backwards Compatibility

No API break is introduced. As the PR description notes, manually constructed SQLConnection instances (not via st.connection()) may see additional cache misses since each new object gets a new id. This is an acceptable trade-off: cache misses (safe but slower) are far preferable to cache leakage (returning data from a wrong database). Users of st.connection() (the recommended path) are unaffected since the same cached instance is reused.

Security & Risk

This PR closes a security-relevant issue — cache leakage between different database connections could expose data from one database to queries intended for another. The fix correctly isolates cache entries per connection instance. No new security concerns are introduced.

Accessibility

No frontend/UI changes — not applicable.

Recommendations

(Follow-up) Apply the same per-instance cache scoping fix to SnowflakeConnection.query() and SnowparkConnection.query(), which have the same vulnerability.
(Nice-to-have) Add a final assertion in the test to confirm earlier connections still get cache hits after later connections cause misses.
(Optional follow-up) Consider replacing id(self) with a uuid4-based token stored on the instance for complete robustness against object ID reuse, if desired.

Reviewer Agreement Summary

Topic	gpt-5.3-codex-high	opus-4.6-thinking	Consolidated
Implementation quality	Clean, minimal	Clean, minimal	Agreed
`id(self)` reuse risk	High / blocking	Non-blocking	Non-blocking (see reasoning above)
Test coverage adequate	Yes (with gaps noted)	Yes (with nice-to-have noted)	Adequate
Backwards compatibility	Acceptable	Acceptable	Agreed
Snowflake/Snowpark follow-up	Not mentioned	Recommended	Recommended
Verdict	Changes Requested	Approved	Approved

Verdict

APPROVED — Clean, minimal fix for a real cache-leakage bug with appropriate test coverage. The id(self) reuse concern is theoretical and non-blocking for the standard usage path. The fix is a strict improvement over the status quo.

Consolidated review by opus-4.6-thinking.

📋 Review by `gpt-5.3-codex-high`

Summary

This PR updates SQLConnection.query caching to scope cached query results per connection instance by adding a dummy cache key argument, and it updates unit tests accordingly. The intent is correct and addresses the core leakage scenario from issue #14077, but the current implementation still leaves a collision path.

Code Quality

The implementation is small, readable, and follows existing patterns in SQLConnection.query, and the test updates are easy to follow.

Issue found:

High: id(self) is used as the per-instance cache discriminator (lib/streamlit/connections/sql_connection.py:315, lib/streamlit/connections/sql_connection.py:351). In CPython, object IDs can be reused after an object is garbage-collected. Because st.cache_data entries can outlive the original connection instance, a new SQLConnection may later receive the same ID and accidentally hit stale cached rows from a different database connection.

Test Coverage

The updated unit test (lib/tests/streamlit/connections/sql_connection_test.py:167) correctly verifies cache separation across multiple live connection instances and names.

Coverage gap:

There is no test for sequential connection recreation where the previous instance is gone but cached entries remain. That is the remaining edge case for the current id(self) approach.
No e2e coverage was added; for this backend caching fix, targeted unit coverage is generally appropriate, but the sequential-lifetime regression risk is not yet validated.

Backwards Compatibility

No API break is introduced. Behavior changes for users manually constructing SQLConnection instances: cache sharing is reduced (more misses), which is acceptable and aligned with the bugfix goal.

Security & Risk

Main risk is unresolved cache leakage due to recycled object IDs (lib/streamlit/connections/sql_connection.py:315, lib/streamlit/connections/sql_connection.py:351). Since this PR is security-adjacent (cross-connection data exposure), this remaining collision vector should be closed before merge.

Accessibility

No frontend/UI changes in this PR, so accessibility impact is not applicable.

Recommendations

Replace id(self) with a non-recycled per-instance token (for example, generate self._query_cache_id = uuid.uuid4().hex once during initialization and use that in the cached function arguments).
Add a unit test that validates cache scoping by that stable instance token across connection lifetimes (not only concurrent live objects).

Verdict

CHANGES REQUESTED: The fix direction is good, but using id(self) leaves a realistic cache-key collision path that can reintroduce cross-connection cache leakage over time.

This is an automated AI review by gpt-5.3-codex-high. Please verify the feedback and use your judgment.

📋 Review by `opus-4.6-thinking`

Summary

This PR fixes a cache-leakage bug in SQLConnection.query() where two different connection instances sharing the same _connection_name (but configured to different databases) would incorrectly share cached query results. The fix adds id(self) as a dummy parameter to the inner _query function so that @st.cache_data includes the connection instance identity in the cache key.

Changes are limited to two files:

lib/streamlit/connections/sql_connection.py — adds a cache_id parameter to _query and passes id(self) at the call site.
lib/tests/streamlit/connections/sql_connection_test.py — updates the existing cache-scoping test to verify per-instance (not just per-name) caching.

Code Quality

The implementation is clean and minimal. The approach of using a dummy parameter to influence the @st.cache_data cache key is a well-understood pattern already established in this codebase.

sql_connection.py line 314-315: The # noqa: ARG001 suppression is appropriate since cache_id is intentionally unused in the function body — it only serves as a cache key discriminator.
The comment "Dummy parameter to retain per-instance caching" is concise and explains the non-obvious intent.

One minor concern: id(self) returns a memory address that can be reused after garbage collection. For connections created via st.connection() (the recommended path), this is a non-issue since instances are cached globally and long-lived. For manually constructed connections that are garbage-collected and recreated, a new object could theoretically inherit the same id as a prior one, producing a stale cache hit. In practice this is extremely unlikely and arguably still preferable to the current bug (returning data from the wrong database). If robustness against id reuse were desired, a uuid4 stored on the instance would be an alternative — but this is not a blocking concern.

Consistency note: SnowflakeConnection.query() (snowflake_connection.py:142) and SnowparkConnection.query() (snowpark_connection.py:147) use the same __qualname__-based cache scoping pattern and are susceptible to the same class-level (rather than instance-level) caching issue. This PR does not address those. This is acceptable for a targeted bug fix, but a follow-up issue to apply the same fix consistently would be valuable.

Test Coverage

The updated test test_scopes_caches_by_connection_instance is well-structured and covers three scenarios:

Same instance, same query — cache hit (conn1 called twice, call_count stays at 1).
Different instance, same name, same query — cache miss (conn2 with same name but different host, call_count goes to 2).
Different instance, different name, same query — cache miss (conn3 with different name, call_count goes to 3).

The incremental assertions after each .query() call make the test easy to follow and debug.

The test correctly verifies the core fix. One additional scenario that could strengthen coverage is verifying that the same instance still gets cache hits even after a different instance causes a miss (e.g., calling conn1.query("SELECT 1;") again after the conn2/conn3 calls to confirm call_count remains 3). However, this is a nice-to-have, not a requirement.

No E2E test is included. Given this is an internal caching behavior with no frontend component, unit test coverage alone is adequate.

Backwards Compatibility

As the PR description correctly notes, this change may introduce new cache misses for users who manually construct SQLConnection instances (not through st.connection()). On each script rerun, a manually constructed SQLConnection will be a new object with a new id, causing the cache to miss.

For users of st.connection() (the recommended and common path), behavior is unchanged: st.connection returns the same cached instance across reruns, so id(self) will be consistent and caching will continue to work as expected.

This is an acceptable trade-off: cache misses (safe but slower) are far preferable to cache leakage (returning data from a wrong database connection).

Security & Risk

This PR closes a security-relevant issue — cache leakage between different database connections could expose data from one database to queries intended for another. The fix correctly isolates cache entries per connection instance.

No new security concerns are introduced. The id(self) value is an integer memory address used only internally as a cache key; it is not exposed to users or transmitted externally.

Accessibility

No frontend changes — not applicable.

Recommendations

Consider applying the same fix to SnowflakeConnection.query() and SnowparkConnection.query() in a follow-up PR, as they have the same __qualname__-based cache scoping pattern and are vulnerable to the same issue.
(Nice-to-have) Add a brief assertion at the end of the test to confirm earlier connections still get cache hits after later connections cause misses (e.g., conn1.query("SELECT 1;") followed by assert patched_read_sql.call_count == 3). This would verify there is no cache invalidation side-effect.

Verdict

APPROVED: Clean, minimal fix for a real cache-leakage bug with appropriate test coverage. The approach is sound, backwards compatible for the standard usage path, and consistent with the existing codebase patterns.

This is an automated AI review by opus-4.6-thinking.

Use a UUID to make memory address edge-case not a problem.

sfc-gh-jkinkead · 2026-02-24T22:56:23Z

GitHub correctly noticed that SnowflakeConnection overloads query, so I fixed that.

graphite-app · 2026-02-24T22:58:50Z

lib/streamlit/connections/snowflake_connection.py

-        def _query(sql: str, params: Any = None) -> DataFrame:
+        def _query(
+            # Dummy parameter to retain per-instance caching.
+            instance_id: int,  # noqa: ARG001


Type mismatch: instance_id is typed as int, but self._connection_instance_id is a uuid.UUID object (created via uuid.uuid4() in base_connection.py line 79). This will cause type checker failures and is misleading.

# Fix: Change type annotation to match actual type instance_id: uuid.UUID, # noqa: ARG001

Alternatively, convert the UUID to int when passing:

return _query(id(self._connection_instance_id), sql, params)

Suggested change

instance_id: int, # noqa: ARG001

instance_id: uuid.UUID, # noqa: ARG001

Spotted by Graphite Agent

Is this helpful? React 👍 or 👎 to let us know.

lukasmasuch

LGTM 👍 Related question: what about the SnowparkConnection? From the docs it sounds like its deprecated, does it make sense to plan in the removal of this connection type at some point?

sfc-gh-jkinkead · 2026-02-25T00:00:23Z

SnowparkConnection has been deprecated for almost three years now, and I think we should delete it. :)

It's not used in SiS, and it doesn't do anything that SnowflakeConnection doesn't do better.

lukasmasuch · 2026-02-25T10:01:54Z

SnowparkConnection has been deprecated for almost three years now, and I think we should delete it. :)

Good to know 👍 I will add something to make this deprecation a bit more official: #14125

## Describe your changes Deprecates `SnowparkConnection` in favor of `SnowflakeConnection`: - Adds `show_deprecation_warning` call in `__init__` with `show_once=True` - Adds `.. warning::` directive to the class docstring pointing users to `SnowflakeConnection` - Adds unit test to verify the deprecation warning is shown Context: #14094 (comment) ## Testing Plan - [x] Unit test (`SnowparkConnectionDeprecationTest`)

Copilot AI review requested due to automatic review settings February 24, 2026 00:08

Copilot started reviewing on behalf of sfc-gh-jkinkead February 24, 2026 00:09 View session

Copilot AI reviewed Feb 24, 2026

View reviewed changes

lib/streamlit/connections/sql_connection.py Show resolved Hide resolved

sfc-gh-lmasuch added security-assessment-completed change:bugfix PR contains bug fix implementation impact:users PR changes affect end users ai-review If applied to PR or issue will run AI review workflow labels Feb 24, 2026

github-actions bot removed the ai-review If applied to PR or issue will run AI review workflow label Feb 24, 2026

Update SnowflakeConnection's query method.

7571e3e

Use a UUID to make memory address edge-case not a problem.

graphite-app bot reviewed Feb 24, 2026

View reviewed changes

lukasmasuch approved these changes Feb 24, 2026

View reviewed changes

Fix typing.

2f918f1

sfc-gh-jkinkead merged commit 0f471ac into develop Feb 25, 2026
42 checks passed

sfc-gh-jkinkead deleted the fix-sql-connection branch February 25, 2026 00:01

lukasmasuch mentioned this pull request Feb 25, 2026

Add deprecation warning to SnowparkConnection #14125

Merged

1 task

	instance_id: int, # noqa: ARG001
	instance_id: uuid.UUID, # noqa: ARG001

Conversation

sfc-gh-jkinkead commented Feb 24, 2026 • edited by sfc-gh-lmasuch Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Describe your changes

GitHub Issue Link (if applicable)

Testing Plan

Uh oh!

github-actions bot commented Feb 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

✅ PR preview is ready!

Uh oh!

snyk-io bot commented Feb 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

✅ Snyk checks have passed. No issues have been found so far.

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

github-actions bot commented Feb 24, 2026

Consolidated Code Review

Summary

Code Quality

Test Coverage

Backwards Compatibility

Security & Risk

Accessibility

Recommendations

Reviewer Agreement Summary

Verdict

Summary

Code Quality

Test Coverage

Backwards Compatibility

Security & Risk

Accessibility

Recommendations

Verdict

Summary

Code Quality

Test Coverage

Backwards Compatibility

Security & Risk

Accessibility

Recommendations

Verdict

Uh oh!

sfc-gh-jkinkead commented Feb 24, 2026

Uh oh!

graphite-app bot Feb 24, 2026

Choose a reason for hiding this comment

Uh oh!

lukasmasuch left a comment • edited by sfc-gh-lmasuch Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sfc-gh-jkinkead commented Feb 25, 2026

Uh oh!

Uh oh!

lukasmasuch commented Feb 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

sfc-gh-jkinkead commented Feb 24, 2026 •

edited by sfc-gh-lmasuch

Loading

github-actions bot commented Feb 24, 2026 •

edited

Loading

snyk-io bot commented Feb 24, 2026 •

edited

Loading

lukasmasuch left a comment •

edited by sfc-gh-lmasuch

Loading

lukasmasuch commented Feb 25, 2026 •

edited

Loading