-
Notifications
You must be signed in to change notification settings - Fork 4.1k
inspect: GC threshold errors during scale test #159866
Copy link
Copy link
Closed
Labels
C-enhancementSolution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)GA-blockerO-25.4-scale-testingscale testing issues reported by team drp for 25.4scale testing issues reported by team drp for 25.4P-1Issues/test failures with a fix SLA of 1 monthIssues/test failures with a fix SLA of 1 monthT-sql-foundationsSQL Foundations Team (formerly SQL Schema + SQL Sessions)SQL Foundations Team (formerly SQL Schema + SQL Sessions)branch-release-26.1Used to mark GA and release blockers, technical advisories, and bugs for 26.1Used to mark GA and release blockers, technical advisories, and bugs for 26.1target-release-26.2.0v26.1.0-prereleasev26.2.0-prerelease
Description
During the 300 node scale test on v25.4, we ran INSPECT on the TPCC district table. We noticed hundreds of these in the logs (link):
inspect issue: {type=internal_error aost="2025-12-18 17:04:21.536001" db=254 schema=255 obj=257 pk=‹""›
details=map[error_message:‹inspect-index-consistency-check:
batch timestamp 1766077461.536001016,0 must be after replica GC threshold 1766078030.230733493,0
(r4510920: /Table/257/1/3{483727/10-600000})› error_type:‹internal_query_error› index_name:‹district_idx_195›
query:‹SELECT * FROM (›
‹ WITH ›
‹ pri_nonnull AS (›
‹ SELECT d_w_id, d_id, d_name, d_street_1, d_street_2, d_city, d_state, d_zip FROM [257 AS table_pri]@{FORCE_INDEX=[1]} WHERE (›
‹ (d_w_id > $3::INT8) OR›
‹ (d_w_id = $3::INT8 AND d_id >= $4::INT8)›
‹)›
‹AND (›
‹ (d_w_id < $1::INT8) OR›
‹ (d_w_id = $1::INT8 AND d_id <= $2::INT8)›
‹) AND (d_name IS NOT NULL AND d_street_1 IS NOT NULL AND d_street_2 IS NOT NULL AND d_city IS NOT NULL AND d_state IS NOT NULL AND d_zip IS NOT NULL)›
‹ ),›
‹ pri_null AS (›
‹ SELECT d_w_id, d_id, d_name, d_street_1, d_street_2, d_city, d_state, d_zip FROM [257 AS table_pri]@{FORCE_INDEX=[1]} WHERE (›
‹ (d_w_id > $3::INT8) OR›
‹ (d_w_id = $3::INT8 AND d_id >= $4::INT8)›
‹)›
‹AND (›
‹ (d_w_id < $1::INT8) OR›
‹ (d_w_id = $1::INT8 AND d_id <= $2::INT8)›
‹) AND (d_name IS NULL AND d_street_1 IS NULL AND d_street_2 IS NULL AND d_city IS NULL AND d_state IS NULL AND d_zip IS NULL) ORDER BY d_w_id, d_id›
‹ ),›
‹ sec_nonnull AS (›
‹ SELECT d_w_id, d_id, d_name, d_street_1, d_street_2, d_city, d_state, d_zip FROM [257 AS table_sec]@{FORCE_INDEX=[392]} WHERE (›
‹ (d_w_id > $3::INT8) OR›
‹ (d_w_id = $3::INT8 AND d_id >= $4::INT8)›
‹)›
‹AND (›
‹ (d_w_id < $1::INT8) OR›
‹ (d_w_id = $1::INT8 AND d_id <= $2::INT8)›
‹) AND (d_name IS NOT NULL AND d_street_1 IS NOT NULL AND d_street_2 IS NOT NULL AND d_city IS NOT NULL AND d_state IS NOT NULL AND d_zip IS NOT NULL)›
‹ ),›
‹ sec_null AS (›
‹ SELECT d_w_id, d_id, d_name, d_street_1, d_street_2, d_city, d_state, d_zip FROM [257 AS table_sec]@{FORCE_INDEX=[392]} WHERE (›
‹ (d_w_id > $3::INT8) OR›
‹ (d_w_id = $3::INT8 AND d_id >= $4::INT8)›
‹)›
‹AND (›
‹ (d_w_id < $1::INT8) OR›
‹ (d_w_id = $1::INT8 AND d_id <= $2::INT8)›
‹) AND (d_name IS NULL AND d_street_1 IS NULL AND d_street_2 IS NULL AND d_city IS NULL AND d_state IS NULL AND d_zip IS NULL) ORDER BY d_w_id, d_id›
‹ )›
‹ ›
‹ -- 1. left join from pri to sec, non-null cases›
‹ SELECT 'missing_secondary_index_entry' AS error_type, pri.d_w_id, pri.d_id, pri.d_name, pri.d_street_1, pri.d_street_2, pri.d_city, pri.d_state, pri.d_zip, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL›
‹ FROM pri_nonnull AS pri›
‹ LEFT LOOKUP JOIN [257 AS table_sec]@{FORCE_INDEX=[392]} AS sec›
‹ ON pri.d_w_id = sec.d_w_id AND pri.d_id = sec.d_id AND pri.d_name = sec.d_name AND pri.d_street_1 = sec.d_street_1 AND pri.d_street_2 = sec.d_street_2 AND pri.d_city = sec.d_city AND pri.d_state = sec.d_state AND pri.d_zip = sec.d_zip AND crdb_internal.void_func() IS NOT NULL›
‹ WHERE sec.d_w_id IS NULL›
‹ ›
‹ UNION ALL›
‹ ›
‹ -- 2. left join from pri to sec, null cases ›
‹ SELECT 'missing_secondary_index_entry' AS error_type, pri.d_w_id, pri.d_id, pri.d_name, pri.d_street_1, pri.d_street_2, pri.d_city, pri.d_state, pri.d_zip, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL›
‹ FROM pri_null AS pri›
‹ LEFT MERGE JOIN (›
‹ SELECT d_w_id, d_id, d_name, d_street_1, d_street_2, d_city, d_state, d_zip FROM [257 AS table_sec]@{FORCE_INDEX=[392]} WHERE (›
‹ (d_w_id > $3::INT8) OR›
‹ (d_w_id = $3::INT8 AND d_id >= $4::INT8)›
‹)›
‹AND (›
‹ (d_w_id < $1::INT8) OR›
‹ (d_w_id = $1::INT8 AND d_id <= $2::INT8)›
‹) AND (d_name IS NULL AND d_street_1 IS NULL AND d_street_2 IS NULL AND d_city IS NULL AND d_state IS NULL AND d_zip IS NULL) ORDER BY d_w_id, d_id›
‹ ) AS sec›
‹ ON pri.d_w_id = sec.d_w_id AND pri.d_id = sec.d_id AND crdb_internal.void_func() IS NOT NULL›
‹ WHERE sec.d_w_id IS NULL›
‹ ›
‹ UNION ALL›
‹ ›
‹ -- 3. left join from sec to pri, non-null cases›
‹ SELECT 'dangling_secondary_index_entry' AS error_type, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, sec.d_w_id, sec.d_id, sec.d_name, sec.d_street_1, sec.d_street_2, sec.d_city, sec.d_state, sec.d_zip ›
‹ FROM sec_nonnull AS sec›
‹ LEFT LOOKUP JOIN [257 AS table_pri]@{FORCE_INDEX=[1]} AS pri›
‹ ON sec.d_w_id = pri.d_w_id AND sec.d_id = pri.d_id AND sec.d_name = pri.d_name AND sec.d_street_1 = pri.d_street_1 AND sec.d_street_2 = pri.d_street_2 AND sec.d_city = pri.d_city AND sec.d_state = pri.d_state AND sec.d_zip = pri.d_zip AND crdb_internal.void_func() IS NOT NULL›
‹ WHERE pri.d_w_id IS NULL›
‹ ›
‹ UNION ALL›
‹ ›
‹ -- 4. left join from sec to pri, null cases›
‹ SELECT 'dangling_secondary_index_entry' AS error_type, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, sec.d_w_id, sec.d_id, sec.d_name, sec.d_street_1, sec.d_street_2, sec.d_city, sec.d_state, sec.d_zip›
‹ FROM sec_null AS sec ›
‹ LEFT MERGE JOIN (›
‹ SELECT d_w_id, d_id, d_name, d_street_1, d_street_2, d_city, d_state, d_zip FROM [257 AS table_pri]@{FORCE_INDEX=[1]} WHERE (›
‹ (d_w_id > $3::INT8) OR›
‹ (d_w_id = $3::INT8 AND d_id >= $4::INT8)›
‹)›
‹AND (›
‹ (d_w_id < $1::INT8) OR›
‹ (d_w_id = $1::INT8 AND d_id <= $2::INT8)›
‹) AND (d_name IS NULL AND d_street_1 IS NULL AND d_street_2 IS NULL AND d_city IS NULL AND d_state IS NULL AND d_zip IS NULL) ORDER BY d_w_id, d_id›
‹ ) AS pri›
‹ ON sec.d_w_id = pri.d_w_id AND sec.d_id = pri.d_id AND crdb_internal.void_func() IS NOT NULL›
‹ WHERE pri.d_w_id IS NULL) AS OF SYSTEM TIME 1766077461536001016.0000000000›]}
The jobs.inspect.spans_processed metric shows that the rate of spans processed started declining:
INSPECT only uses a historical AOST if the user opts in. A theory for how this error happened is that the code uses now but still creates an AOST clause. We don't create a protected timestamp in this case where we use now. If the query takes a very long time, which might be more likely since INSPECT uses admissionpb.BulkLowPri, then it could hit a GC threshold error.
We should confirm that or investigate the issue to find the root cause, and fix the bug.
Jira issue: CRDB-58050
Epic CRDB-55075
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
C-enhancementSolution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)GA-blockerO-25.4-scale-testingscale testing issues reported by team drp for 25.4scale testing issues reported by team drp for 25.4P-1Issues/test failures with a fix SLA of 1 monthIssues/test failures with a fix SLA of 1 monthT-sql-foundationsSQL Foundations Team (formerly SQL Schema + SQL Sessions)SQL Foundations Team (formerly SQL Schema + SQL Sessions)branch-release-26.1Used to mark GA and release blockers, technical advisories, and bugs for 26.1Used to mark GA and release blockers, technical advisories, and bugs for 26.1target-release-26.2.0v26.1.0-prereleasev26.2.0-prerelease