feat: check indices exists before running create index commands #4618
Conversation
Signed-off-by: Yashodhan Joshi <[email protected]>
Walkthrough
The changes introduce new functionality for managing database indices across multiple database implementations (MySQL, PostgreSQL, SQLite). This includes the addition of a
Changes
Actionable comments posted: 16
Outside diff range and nitpick comments (15)
src/infra/src/schema/history/sqlite.rs (1)
112-134: LGTM: Improved index naming and existence checking

The changes improve the code in two ways:
- More descriptive index names enhance readability and maintainability.
- The new logic for checking existing indices prevents redundant index creation, aligning with the PR objectives.

Minor suggestion: Consider extracting the index names into constants at the top of the file for easier maintenance and reuse.

```rust
const SCHEMA_HISTORY_ORG_IDX: &str = "schema_history_org_idx";
const SCHEMA_HISTORY_STREAM_IDX: &str = "schema_history_stream_idx";
const SCHEMA_HISTORY_STREAM_VERSION_IDX: &str = "schema_history_stream_version_idx";
```

src/infra/src/db/mod.rs (4)
245-257: LGTM: cache_indices_mysql function with suggestion.

The cache_indices_mysql function is well-implemented. It correctly queries the MySQL database for index information and handles potential errors.

Consider adding error logging when the query fails. This could help with debugging in production environments:

```rust
Err(e) => {
    log::error!("Failed to cache MySQL indices: {}", e);
    HashSet::new()
}
```
259-271: LGTM: cache_indices_pg function with suggestion.

The cache_indices_pg function is well-implemented. It correctly queries the PostgreSQL database for index information and handles potential errors.

Consider adding error logging when the query fails, similar to the suggestion for the MySQL function:

```rust
Err(e) => {
    log::error!("Failed to cache PostgreSQL indices: {}", e);
    HashSet::new()
}
```
273-285: LGTM: cache_indices_sqlite function with suggestion.

The cache_indices_sqlite function is well-implemented. It correctly queries the SQLite database for index information and handles potential errors.

Consider adding error logging when the query fails, similar to the suggestions for the MySQL and PostgreSQL functions:

```rust
Err(e) => {
    log::error!("Failed to cache SQLite indices: {}", e);
    HashSet::new()
}
```
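The caching step these three functions share can be sketched in isolation: rows of (index_name, table_name) returned by the information-schema queries are folded into a HashSet once at startup, so later existence checks are O(1) lookups. This is a hedged, self-contained illustration, not the project's actual code; the DBIndex shape follows the review discussion and the row data is invented.

```rust
use std::collections::HashSet;

// Simplified stand-in for the DBIndex struct discussed in this review.
#[derive(Hash, PartialEq, Eq, Debug)]
struct DBIndex {
    name: String,
    table: String,
}

// Fold (index_name, table_name) rows into a set; duplicates collapse.
fn cache_indices(rows: &[(&str, &str)]) -> HashSet<DBIndex> {
    rows.iter()
        .map(|(name, table)| DBIndex {
            name: (*name).into(),
            table: (*table).into(),
        })
        .collect()
}

fn main() {
    let rows = [
        ("meta_module_idx", "meta"),
        ("meta_module_idx", "meta"), // duplicate row from the query
        ("file_list_stream_idx", "file_list"),
    ];
    let cache = cache_indices(&rows);
    assert_eq!(cache.len(), 2);
    println!("cached {} distinct indices", cache.len());
}
```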
239-285: Overall implementation aligns well with PR objectives.

The introduced changes effectively implement the caching mechanism for database indices across MySQL, PostgreSQL, and SQLite. This aligns well with the PR objective of checking for index existence before running create index commands.
Some points to consider:
- The caching mechanism is implemented as described in the PR summary, operating during the initial startup of the application.
- The implementation is consistent across all three database types, which is good for maintainability.
- As mentioned in the PR summary, this approach may not capture indices created during initialization. Consider adding a comment in the code to highlight this limitation.
To address the potential issue of lock-ups in a multi-node scenario, consider implementing a distributed locking mechanism or exploring the use of IF NOT EXISTS in all relevant queries, as suggested in the PR summary.

src/infra/src/scheduler/postgres.rs (1)
85-95: Redundant index existence checks

The code checks for existing indices before creating them, and the SQL statements already include IF NOT EXISTS. This duplication might be unnecessary. Consider removing the code-level checks and relying solely on the SQL IF NOT EXISTS clause.

src/infra/src/scheduler/sqlite.rs (1)
95-95: Consider renaming variable idx to index_name for clarity

The variable idx represents an index name, not an index position. Renaming it to index_name or name will enhance readability and reduce potential confusion.

src/infra/src/db/postgres.rs (1)
Line range hint 614-629: Incorrect condition when creating index

In the add_start_dt_column function, the condition for creating the index meta_module_start_dt_idx is incorrect. Currently, it checks if the index exists and then proceeds to create it. The logic should be inverted to create the index only if it doesn't already exist.

Apply this diff to fix the condition:

```diff
-if indices.contains(&DBIndex {
+if !indices.contains(&DBIndex {
     name: "meta_module_start_dt_idx".into(),
     table: "meta".into(),
 }) {
     // Proceed to create the index
 }
```

This ensures that the index is only created if it does not already exist, aligning with the intended functionality.
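The inverted guard described here can be checked in miniature. The sketch below is illustrative only: the DBIndex struct and cache contents are stand-ins mirroring the review discussion, not the project's code.

```rust
use std::collections::HashSet;

// Stand-in for the DBIndex struct from the review.
#[derive(Hash, PartialEq, Eq)]
struct DBIndex {
    name: String,
    table: String,
}

// Correct guard: create only when the cache does NOT contain the index.
fn needs_creation(indices: &HashSet<DBIndex>, name: &str, table: &str) -> bool {
    !indices.contains(&DBIndex {
        name: name.into(),
        table: table.into(),
    })
}

fn main() {
    let mut indices = HashSet::new();
    indices.insert(DBIndex {
        name: "meta_module_start_dt_idx".into(),
        table: "meta".into(),
    });
    // Already present: must not be recreated.
    assert!(!needs_creation(&indices, "meta_module_start_dt_idx", "meta"));
    // Absent: must be created.
    assert!(needs_creation(&indices, "meta_module_idx", "meta"));
    println!("index is created only when absent from the cache");
}
```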
src/infra/src/db/sqlite.rs (2)
Line range hint 743-751: Match specific error types instead of error message strings

In add_start_dt_column, the code checks for a duplicate column error by matching the error message string:

```rust
if !e.to_string().contains("duplicate column name")
```

Relying on error message strings is brittle and may fail if messages change or differ across database versions or locales. It's better to match on the specific error types or codes provided by sqlx or the database driver.

Apply this diff to improve error handling:

```diff
 if let Err(e) = sqlx::query(r#"ALTER TABLE meta ADD COLUMN start_dt INTEGER NOT NULL DEFAULT 0;"#)
     .execute(client)
     .await
 {
-    // Check if the error is about the duplicate column
-    if !e.to_string().contains("duplicate column name") {
-        // If the error is not about the duplicate column, return it
-        return Err(e.into());
+    if let sqlx::Error::Database(db_err) = &e {
+        if db_err.message().contains("duplicate column name") {
+            // Column already exists, ignore the error
+        } else {
+            // Return other database errors
+            return Err(e.into());
+        }
+    } else {
+        // Return non-database errors
+        return Err(e.into());
     }
 }
```
Line range hint 689-775: Ensure indices cache is updated after modifying indices

After creating or dropping indices in create_table and add_start_dt_column, the INDICES cache may become stale. To prevent inconsistencies, consider refreshing the INDICES cache after modifying indices.

Apply this diff to update the indices cache:

```diff
 }
+// Refresh indices cache after modifying indices
+*INDICES.get_mut().unwrap() = cache_indices_sqlite(client).await;
 Ok(())
 }
```

src/infra/src/db/mysql.rs (1)
593-593: Potential issue with indices caching mechanism.

Caching the indices at the beginning of the function may cause the INDICES cache to become outdated if indices are created or dropped afterward within the same runtime. Since the cache is in-memory and not refreshed, subsequent index operations may operate on stale data. Consider refreshing the INDICES cache after index creation or deletion to ensure it reflects the current state of the database.

Also applies to: 664-664
src/infra/src/file_list/postgres.rs (3)
Line range hint 1254-1277: Incorrect logging condition in duplicate record deletion loop.

The condition if i / 1000 == 0 will always be true when i is less than 1000, causing excessive logging. Update the condition to correctly log progress every 1000 iterations.

Apply this diff to fix the logging condition:

```diff
-if i / 1000 == 0 {
+if i % 1000 == 0 {
     log::warn!("[POSTGRES] delete duplicate records: {}/{}", i, ret.len());
 }
```
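The difference between the two guards is easy to demonstrate without a database: integer division by 1000 is zero for every i below 1000, while the modulo guard fires once per thousand iterations. A small stand-alone check:

```rust
// Count how many iterations each guard would "log" over n records.
fn logs_with_division(n: usize) -> usize {
    (0..n).filter(|i| i / 1000 == 0).count()
}

fn logs_with_modulo(n: usize) -> usize {
    (0..n).filter(|i| i % 1000 == 0).count()
}

fn main() {
    // With 5000 records, the division guard logs 1000 times up front
    // (every i in 0..1000)...
    assert_eq!(logs_with_division(5000), 1000);
    // ...while the modulo guard logs 5 times: at 0, 1000, 2000, 3000, 4000.
    assert_eq!(logs_with_modulo(5000), 5);
    println!("modulo guard logs every 1000th iteration");
}
```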
Line range hint 1254-1286: Inefficient deletion of duplicate records.

Deleting duplicates one by one can be time-consuming for large datasets. Consider performing the deletion in a single SQL query using DELETE ... WHERE id IN (...) or using a CTE (Common Table Expression) to improve performance.
Line range hint 1249-1286: Lack of transactional integrity during index recreation.

The process of deleting duplicates and recreating the index is not enclosed in a transaction, which may lead to data inconsistency if an error occurs midway. Wrap the deletion and index recreation within a transaction to ensure atomicity.

Apply this diff to include transaction handling:

```diff
+let mut tx = pool.begin().await?;
 // delete duplicate records
 log::warn!("[POSTGRES] starting delete duplicate records");
 let ret = sqlx::query(
     r#"SELECT stream, date, file, min(id) as id FROM file_list GROUP BY stream, date, file HAVING COUNT(*) > 1;"#
 )
 .fetch_all(&mut *tx).await?;
 // ... existing deletion logic ...
 // create index again
 sqlx::query(unique_index_sql).execute(&mut *tx).await?;
+tx.commit().await?;
```

src/infra/src/file_list/mysql.rs (1)
1280-1283: Use specific error codes instead of parsing error messages

Relying on error message strings like "Duplicate key" is fragile and may not handle localization or changes in error messages. Consider checking for specific database error codes to handle duplicate key errors more robustly.
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (13)
- src/infra/src/db/mod.rs (3 hunks)
- src/infra/src/db/mysql.rs (5 hunks)
- src/infra/src/db/postgres.rs (6 hunks)
- src/infra/src/db/sqlite.rs (4 hunks)
- src/infra/src/file_list/mysql.rs (3 hunks)
- src/infra/src/file_list/postgres.rs (3 hunks)
- src/infra/src/file_list/sqlite.rs (2 hunks)
- src/infra/src/scheduler/mysql.rs (2 hunks)
- src/infra/src/scheduler/postgres.rs (2 hunks)
- src/infra/src/scheduler/sqlite.rs (2 hunks)
- src/infra/src/schema/history/mysql.rs (2 hunks)
- src/infra/src/schema/history/postgres.rs (2 hunks)
- src/infra/src/schema/history/sqlite.rs (2 hunks)
Additional context used
Path-based instructions (13)
src/infra/src/db/mod.rs (1)
Pattern **/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestions. Make sure the responses are not too verbose and keep the suggestions to the point, i.e. actionable.
The same **/*.rs pattern applies to the remaining 12 files: src/infra/src/db/mysql.rs, src/infra/src/db/postgres.rs, src/infra/src/db/sqlite.rs, src/infra/src/file_list/mysql.rs, src/infra/src/file_list/postgres.rs, src/infra/src/file_list/sqlite.rs, src/infra/src/scheduler/mysql.rs, src/infra/src/scheduler/postgres.rs, src/infra/src/scheduler/sqlite.rs, src/infra/src/schema/history/mysql.rs, src/infra/src/schema/history/postgres.rs, src/infra/src/schema/history/sqlite.rs.
Additional comments not posted (32)
src/infra/src/schema/history/sqlite.rs (3)
21-21: LGTM: New import aligns with PR objectives

The added import brings in the necessary components for the index caching mechanism, which is consistent with the PR's goal of implementing index existence checks.

136-136: LGTM: Improved error logging

The error logging message has been updated to be more specific, mentioning the "schema_history" table. This change enhances error traceability and makes debugging easier.

Line range hint 1-141: Overall assessment: Changes align with PR objectives and improve code quality

The modifications in this file successfully implement the index existence checking mechanism as described in the PR objectives. The changes improve code readability, prevent redundant index creation, and enhance error logging. The only suggestion is a minor refactoring to extract index names as constants for easier maintenance.
src/infra/src/db/mod.rs (2)
16-16: LGTM: New imports and static variable.

The new imports and static variable are correctly added to support the new functionality for managing database indices.
Also applies to: 22-22, 39-39
239-243: LGTM: DBIndex struct definition.

The DBIndex struct is well-defined with appropriate fields and derives the necessary traits for its intended use in a HashSet.

src/infra/src/schema/history/postgres.rs (5)
21-21: Updated imports to include index caching functionality

The import statement now includes cache_indices_pg, DBIndex, and INDICES, which are required for caching indices and checking their existence before creation.

110-110: Cache existing indices before creating new ones

Initializing indices with INDICES.get_or_init(|| cache_indices_pg(&pool)).await ensures that existing indices are cached and checked before attempting to create new ones. This enhances efficiency and prevents redundant operations.

113-121: Use specific index names with conditional creation

The SQL statements now define specific index names and use CREATE INDEX IF NOT EXISTS, which prevents errors from attempting to create indices that already exist.

125-131: Check for existing indices before creation

The loop now checks if each index exists in indices before attempting to create it:

```rust
if indices.contains(&DBIndex {
    name: idx.into(),
    table: "schema_history".into(),
}) {
    continue;
}
```

This prevents unnecessary creation attempts and potential errors.

133-133: Improve error logging specificity

The error message now includes context:

```rust
log::error!("[POSTGRES] create table schema_history index error: {}", e);
```

This makes it clearer where the error occurred, aiding in debugging.
src/infra/src/schema/history/mysql.rs (5)
21-21: Imported modules are appropriate

The added imports are necessary for the index caching functionality and are correctly specified.

109-109: Efficient initialization of the indices cache

Initializing indices using INDICES.get_or_init(|| cache_indices_mysql(&pool)).await ensures that the index cache is populated once and reused, which improves performance by avoiding redundant database queries.

112-120: Index creation statements are correctly defined

The index names and corresponding SQL statements for the schema_history table are accurately specified, aligning with the intended schema.

124-130: Index existence check logic is accurate

The check for existing indices using the indices.contains() method before creating new ones effectively prevents attempts to create duplicate indices, which could cause errors.

136-136: Consistent error logging

The error message provides clear context for any index creation errors related to the schema_history table, aiding in troubleshooting.

src/infra/src/scheduler/sqlite.rs (1)
80-80: Verify that get_or_init supports asynchronous initialization

The use of INDICES.get_or_init(|| cache_indices_sqlite(&*client)).await may not work as intended because get_or_init from OnceCell does not support asynchronous initializers out of the box. This could lead to race conditions or panics if accessed concurrently.

Please confirm that INDICES is an async-aware cell that properly supports asynchronous initialization. If not, consider using once_cell::sync::Lazy with an async block or refactor the initialization logic to ensure thread safety and correct behavior.

src/infra/src/scheduler/mysql.rs (4)
23-23: Importing necessary database components

The addition of cache_indices_mysql, DBIndex, and INDICES to the import statements is appropriate for managing index caching.
81-81: Proper initialization of index cache

The code correctly initializes and caches the existing indices using INDICES.get_or_init(|| cache_indices_mysql(&pool)).await, ensuring that index existence checks are efficient.
84-96: Index definitions are correctly specified

The index creation queries for the scheduled_jobs table are well-defined, which will help optimize query performance:
- scheduled_jobs_key_idx on (module_key)
- scheduled_jobs_org_key_idx on (org, module_key)
- scheduled_jobs_org_module_key_idx on (org, module, module_key)
98-103: Efficient check for existing indices before creation

The logic to check if an index already exists before attempting creation is implemented correctly, preventing redundant operations:

```rust
if indices.contains(&DBIndex {
    name: idx.into(),
    table: "scheduled_jobs".into(),
}) {
    continue;
}
```

src/infra/src/db/postgres.rs (1)
29-29: Importing necessary modules

The import of cache_indices_pg, DBIndex, and INDICES is appropriate for the new index caching functionality.

src/infra/src/db/mysql.rs (4)
29-29: Import statements are appropriate.

The necessary modules cache_indices_mysql, DBIndex, and INDICES are correctly imported.

593-593: Indices initialization is correctly implemented.

The indices are properly cached and initialized asynchronously.

664-664: Indices initialization is correctly implemented.

The indices are properly cached and initialized asynchronously.

686-692: Consider refactoring to reduce code duplication.

As previously mentioned, the index checking and creation logic for meta_module_start_dt_idx is repetitive. Refactoring this into a reusable function can improve maintainability.

src/infra/src/file_list/sqlite.rs (2)
28-32: Import necessary database modules

The added imports correctly bring in the modules for caching and index management.

1223-1230: Efficiently skip existing indices

The logic to cache and check for existing indices before creation is efficient and prevents unnecessary operations.
src/infra/src/file_list/postgres.rs (5)
28-28: Approved import of INDICES for index caching.

The addition of INDICES to the imports ensures that the index cache is accessible in this module.

1172-1172: Asynchronous initialization of INDICES is appropriate.

Using get_or_init with an asynchronous initializer allows for lazy initialization of the index cache without blocking.

1175-1225: Efficient index creation by checking for existing indices before creation.

The addition of index names and tables to the sqls vector, along with the existence check against indices, prevents redundant index creation and improves performance.

1230-1236: Correctly skipping existing indices during index creation loop.

The loop properly checks if each index exists and continues to the next iteration if it does, avoiding unnecessary SQL execution.

Line range hint 1172-1286: Concurrency considerations with INDICES cache initialization.

Ensure that INDICES is thread-safe and suitable for asynchronous contexts to prevent race conditions during initialization. Confirm that INDICES is using an appropriate async-aware synchronization primitive like tokio::sync::OnceCell.
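For contrast, the standard library's OnceLock shows the synchronous shape of get_or_init: its closure cannot .await, which is why an async-aware cell such as tokio::sync::OnceCell (whose get_or_init accepts a future-returning closure) is needed when the initializer must query the database. A minimal std-only sketch, with illustrative static data in place of a real query:

```rust
use std::sync::OnceLock;

// std's OnceLock: the initializer closure is synchronous, so a DB query
// that must be awaited cannot run here; tokio::sync::OnceCell would be
// the async-aware alternative in that case.
static CACHE: OnceLock<Vec<&'static str>> = OnceLock::new();

fn indices() -> &'static Vec<&'static str> {
    CACHE.get_or_init(|| vec!["meta_module_idx", "file_list_stream_idx"])
}

fn main() {
    assert_eq!(indices().len(), 2);
    // Repeated calls return the same cached value; the closure runs once.
    assert!(std::ptr::eq(indices(), indices()));
    println!("initialized once, reused thereafter");
}
```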
Signed-off-by: Yashodhan Joshi <[email protected]>
Actionable comments posted: 0
Outside diff range and nitpick comments (12)
src/infra/src/scheduler/mysql.rs (1)
85-109: LGTM: Improved index creation implementation

The changes to the create_table_index function look good. The use of the create_index function improves efficiency by handling index existence checks internally. The indices being created are consistent with the previous implementation, and the unique constraint is correctly applied.

Consider extracting the table name "scheduled_jobs" into a constant to improve readability and maintainability. This would make it easier to update the table name if needed in the future.

```rust
const SCHEDULED_JOBS_TABLE: &str = "scheduled_jobs";

// Then use it in the function calls:
create_index(
    &pool,
    "scheduled_jobs_key_idx",
    SCHEDULED_JOBS_TABLE,
    false,
    &["module_key"],
)
.await?;
// ... and so on for the other create_index calls
```

src/infra/src/db/postgres.rs (3)
606-617: Improved index management in add_start_dt_column function

The modifications to the add_start_dt_column function enhance consistency by using the new create_index and delete_index functions. This approach improves safety and maintainability.

Consider adding error handling for the create_index and delete_index calls to ensure the function fails gracefully if these operations encounter issues.

```diff
-create_index(
-    &pool,
-    "meta_module_start_dt_idx",
-    "meta",
-    true,
-    &["module", "key1", "key2", "start_dt"],
-)
-.await?;
-delete_index(&pool, "meta_module_key2_idx", "meta").await?;
+if let Err(e) = create_index(
+    &pool,
+    "meta_module_start_dt_idx",
+    "meta",
+    true,
+    &["module", "key1", "key2", "start_dt"],
+)
+.await
+{
+    log::error!("[POSTGRES] Error creating meta_module_start_dt_idx: {}", e);
+    return Err(e.into());
+}
+if let Err(e) = delete_index(&pool, "meta_module_key2_idx", "meta").await {
+    log::error!("[POSTGRES] Error deleting meta_module_key2_idx: {}", e);
+    return Err(e.into());
+}
```

648-674: Well-implemented create_index function

The create_index function is efficiently implemented, using a cache to avoid unnecessary database queries and constructing a safe SQL query. The logging is appropriate and informative.

Consider adding more detailed error logging to provide context about which index creation failed:

```diff
-sqlx::query(&sql).execute(client).await?;
+sqlx::query(&sql).execute(client).await.map_err(|e| {
+    log::error!("[POSTGRES] Failed to create index {}: {}", idx_name, e);
+    e
+})?;
```

676-689: Well-implemented delete_index function

The delete_index function is efficiently implemented, using a cache to avoid unnecessary database queries and constructing a safe SQL query. The logging is appropriate and informative.

Similar to the create_index function, consider adding more detailed error logging:

```diff
-sqlx::query(&sql).execute(client).await?;
+sqlx::query(&sql).execute(client).await.map_err(|e| {
+    log::error!("[POSTGRES] Failed to delete index {}: {}", idx_name, e);
+    e
+})?;
```

src/infra/src/db/mysql.rs (2)
54-67: Consider using a more descriptive variable name for the SQL query.

The variable name var_name for the SQL query string is not very descriptive. Consider using a more meaningful name like index_query or fetch_indices_sql to improve code readability.

```diff
-let var_name = r#"SELECT INDEX_NAME,TABLE_NAME FROM information_schema.statistics;"#;
-let sql = var_name;
+let fetch_indices_sql = r#"SELECT INDEX_NAME,TABLE_NAME FROM information_schema.statistics;"#;
+let sql = fetch_indices_sql;
```

Line range hint 659-690: LGTM: Improved index management in add_start_dt_column.

The modifications to use the new create_index and delete_index functions are well-implemented. However, there's an outdated comment that needs to be addressed. Please update or remove the following comment as it's no longer relevant to the current implementation:

```diff
-// Check for the specific MySQL error code for duplicate column
+// Check for existing column to avoid duplicate column error
```

src/infra/src/db/sqlite.rs (5)
724-741: LGTM: Improved index creation using new create_index function

The changes to the create_table function look good. Using the new create_index function for creating indices improves code consistency and maintainability. The indices created ("meta_module_idx", "meta_module_key1_idx", and "meta_module_start_dt_idx") seem appropriate for the table structure.

Consider extracting the index names and table name into constants at the top of the file for easier maintenance and to avoid magic strings.

760-768: LGTM: Improved index management in add_start_dt_column

The changes to the add_start_dt_column function look good. Using the new create_index and delete_index functions improves code consistency. Creating a new index with the start_dt column and deleting the old one is a good practice for maintaining optimal database performance.

Consider adding a comment explaining why the "meta_module_key2_idx" index is being deleted, as it might not be immediately obvious to future maintainers.

797-823: LGTM: Well-implemented create_index function

The create_index function is well-implemented. It efficiently uses the INDICES cache to avoid unnecessary index creation attempts, handles both unique and non-unique indices, and includes appropriate logging.

Consider adding error logging in case the index creation fails. This would help with debugging in production environments.

```diff
 sqlx::query(&sql).execute(client).await?;
 log::info!("[SQLITE] index {} created successfully", idx_name);
+log::error!("[SQLITE] failed to create index {}: {}", idx_name, e);
 Ok(())
```

825-838: LGTM: Well-implemented delete_index function

The delete_index function is well-implemented. It efficiently uses the INDICES cache to avoid unnecessary index deletion attempts and includes appropriate logging.

Consider adding error logging in case the index deletion fails. Also, there's a minor typo in the success log message (missing space after "index").

```diff
 let sql = format!("DROP INDEX IF EXISTS {};", idx_name,);
-sqlx::query(&sql).execute(client).await?;
-log::info!("[SQLITE] index {}deleted successfully", idx_name);
+match sqlx::query(&sql).execute(client).await {
+    Ok(_) => log::info!("[SQLITE] index {} deleted successfully", idx_name),
+    Err(e) => log::error!("[SQLITE] failed to delete index {}: {}", idx_name, e),
+}
 Ok(())
```

Line range hint 100-838: Overall: Good improvements to index management, with room for minor enhancements

The changes introduce well-implemented functionality for managing database indices, including caching existing indices for performance optimization. The code is generally well-structured and follows good practices.

Consider creating a separate module or struct for index management, which could encapsulate the INDICES cache and provide methods for creating, deleting, and checking indices. This would further improve code organization and make it easier to maintain and potentially extend index-related functionality in the future.

Example structure:

```rust
mod index_manager {
    use once_cell::sync::Lazy;
    use sqlx::{Pool, Sqlite};
    use std::collections::HashSet;

    static INDICES: Lazy<RwLock<HashSet<DBIndex>>> = Lazy::new(|| RwLock::new(HashSet::new()));

    pub struct IndexManager;

    impl IndexManager {
        pub async fn init(client: &Pool<Sqlite>) {
            let indices = cache_indices(client).await;
            *INDICES.write().await = indices;
        }

        pub async fn create_index(/* ... */) -> Result<()> {
            // Implementation
        }

        pub async fn delete_index(/* ... */) -> Result<()> {
            // Implementation
        }

        pub async fn index_exists(/* ... */) -> bool {
            // Implementation
        }

        // Other index-related methods...
    }

    async fn cache_indices(client: &Pool<Sqlite>) -> HashSet<DBIndex> {
        // Implementation
    }
}
```

This structure would allow for better encapsulation of index-related functionality and make the code more modular and easier to maintain.
src/infra/src/file_list/mysql.rs (1)
1247-1299: Special case handling for file_list table index

The implementation includes a special case for creating a unique index on the file_list table. It handles potential duplicate entries by deleting them before creating the index. This approach ensures data integrity but has some considerations:
- The error message check on line 1257 might be too broad. Consider using a more specific error code check.
- The duplicate record deletion process could potentially be optimized for large datasets.

Consider the following improvements:
- Use a more specific error code check:

```diff
-if !e.to_string().contains("Duplicate entry") {
+if let Some(db_err) = e.as_database_error() {
+    if db_err.code() != Some("1062".into()) { // MySQL error code for duplicate entry
         return Err(e);
+    }
+} else {
+    return Err(e);
 }
```

- For large datasets, consider batching the delete operations:

```rust
let chunk_size = 1000;
for chunk in ret.chunks(chunk_size) {
    let mut query_builder: QueryBuilder<MySql> = QueryBuilder::new(
        "DELETE FROM file_list WHERE (stream, date, file, id) IN "
    );
    query_builder.push_tuples(chunk, |mut b, r| {
        b.push_bind(r.get::<String, _>("stream"))
            .push_bind(r.get::<String, _>("date"))
            .push_bind(r.get::<String, _>("file"))
            .push_bind(r.get::<i64, _>("id"));
    });
    query_builder.push(" AND id != VALUES(id)");
    query_builder.build().execute(&pool).await?;
    log::warn!("[MYSQL] Deleted {} duplicate records", chunk.len());
}
```
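The batching idea itself is database-independent and can be sketched with slices alone. This is an illustrative reduction (ids stand in for the full rows): slice::chunks splits the duplicates into fixed-size groups so each DELETE touches at most chunk_size rows.

```rust
// Split a list of duplicate-row ids into fixed-size batches.
fn batch_ids(ids: &[i64], chunk_size: usize) -> Vec<Vec<i64>> {
    ids.chunks(chunk_size).map(|c| c.to_vec()).collect()
}

fn main() {
    let ids: Vec<i64> = (0..2500).collect();
    let batches = batch_ids(&ids, 1000);
    // 2500 rows -> two full batches of 1000 and one remainder of 500.
    assert_eq!(batches.len(), 3);
    assert_eq!(batches[0].len(), 1000);
    assert_eq!(batches[2].len(), 500);
    println!("2500 rows -> {} batches", batches.len());
}
```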
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (13)
- src/infra/src/db/mod.rs (3 hunks)
- src/infra/src/db/mysql.rs (6 hunks)
- src/infra/src/db/postgres.rs (6 hunks)
- src/infra/src/db/sqlite.rs (6 hunks)
- src/infra/src/file_list/mysql.rs (2 hunks)
- src/infra/src/file_list/postgres.rs (4 hunks)
- src/infra/src/file_list/sqlite.rs (4 hunks)
- src/infra/src/scheduler/mysql.rs (2 hunks)
- src/infra/src/scheduler/postgres.rs (2 hunks)
- src/infra/src/scheduler/sqlite.rs (2 hunks)
- src/infra/src/schema/history/mysql.rs (2 hunks)
- src/infra/src/schema/history/postgres.rs (2 hunks)
- src/infra/src/schema/history/sqlite.rs (2 hunks)
Files skipped from review as they are similar to previous changes (8)
- src/infra/src/db/mod.rs
- src/infra/src/file_list/postgres.rs
- src/infra/src/file_list/sqlite.rs
- src/infra/src/scheduler/postgres.rs
- src/infra/src/scheduler/sqlite.rs
- src/infra/src/schema/history/mysql.rs
- src/infra/src/schema/history/postgres.rs
- src/infra/src/schema/history/sqlite.rs
Additional context used
Path-based instructions (5)
src/infra/src/db/mysql.rs (1)
Pattern
**/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestions. Make sure the responses are not too verbose and keep the suggestions to the point, i.e. actionable.
The same pattern applies to src/infra/src/db/postgres.rs (1), src/infra/src/db/sqlite.rs (1), src/infra/src/file_list/mysql.rs (1), and src/infra/src/scheduler/mysql.rs (1).
Additional comments not posted (10)
src/infra/src/scheduler/mysql.rs (1)
23-26: LGTM: Import statement updated correctly
The import statement has been appropriately updated to include the `create_index` function, which is consistent with its usage in the `create_table_index` function.
src/infra/src/db/postgres.rs (2)
54-66: New function `cache_indices` looks good
The `cache_indices` function efficiently retrieves and caches existing database indices. It uses a SQL query to fetch index names and table names, then maps the results to a `HashSet<DBIndex>`. The error handling is appropriate, returning an empty `HashSet` if the query fails.
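As a rough stdlib-only sketch of that mapping step (the `DBIndex` shape and the row values here are assumptions; the rows simulate what a catalog query such as Postgres' `pg_indexes` view would return):

```rust
use std::collections::HashSet;

// Simplified stand-in for the crate's DBIndex type (assumed shape).
#[derive(Hash, PartialEq, Eq, Debug)]
struct DBIndex {
    name: String,
    table: String,
}

// Maps (indexname, tablename) rows into a HashSet for O(1) existence checks.
fn to_index_set(rows: Vec<(String, String)>) -> HashSet<DBIndex> {
    rows.into_iter()
        .map(|(name, table)| DBIndex { name, table })
        .collect()
}

fn main() {
    // Simulated catalog rows; in the real code these come from the DB client.
    let rows = vec![
        ("meta_module_idx".to_string(), "meta".to_string()),
        ("meta_module_key1_idx".to_string(), "meta".to_string()),
    ];
    let indices = to_index_set(rows);
    assert_eq!(indices.len(), 2);
    assert!(indices.contains(&DBIndex {
        name: "meta_module_idx".to_string(),
        table: "meta".to_string(),
    }));
    println!("cached {} indices", indices.len());
}
```

A `HashSet` keyed on (name, table) is what makes the later "does this index already exist?" check effectively free.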
570-584: Improved index creation in `create_table` function
The modifications to the `create_table` function enhance code maintainability by using the new `create_index` function for creating indices. This centralized approach allows for better error handling and logging. The changes are well-structured and consistent.
src/infra/src/db/mysql.rs (4)
635-649: LGTM: Improved index creation using the new `create_index` function
The modifications to use the new `create_index` function for creating indices are well-implemented. This approach is more consistent and allows for better error handling and index existence checking.
740-766: LGTM: Well-implemented `create_index` function
The new `create_index` function is well-designed and implemented. It efficiently checks for existing indices using the cache, constructs the appropriate SQL statement, and handles the index creation process with proper logging and error handling.
768-781: LGTM: Well-implemented `delete_index` function
The new `delete_index` function is well-designed and implemented. It efficiently checks for existing indices using the cache, constructs the appropriate SQL statement, and handles the index deletion process with proper logging and error handling.
Line range hint 1-781: Summary: Improved index management with caching mechanism
The changes introduced in this file significantly improve the efficiency of index management for the MySQL database. Key improvements include:
- A new caching mechanism for existing indices.
- New `create_index` and `delete_index` functions that utilize the cache to prevent unnecessary operations.
- Updates to existing functions like `create_table` and `add_start_dt_column` to use the new index management functions.
These changes should result in more efficient database operations, especially when dealing with index creation and deletion. The code is well-structured, consistent, and includes proper error handling and logging.
src/infra/src/db/sqlite.rs (1)
100-112: LGTM: Efficient caching of database indices
The `cache_indices` function is well-implemented. It efficiently retrieves existing indices from the database and returns them as a `HashSet<DBIndex>`. This approach is good for performance, as it allows for quick lookups when checking if an index exists.
src/infra/src/file_list/mysql.rs (2)
1196-1228: Improved index creation structure
The new implementation uses a vector of tuples to define indices, which improves code organization and readability. This approach makes it easier to add or modify indices in the future.
1230-1245: Separation of unique indices
Separating unique indices from regular indices is a good practice. It clearly distinguishes between the two types and allows for different handling if needed in the future.
a param Signed-off-by: Yashodhan Joshi <[email protected]>
8adc10b to
cb39e69
Compare
Actionable comments posted: 0
Outside diff range and nitpick comments (4)
src/infra/src/db/sqlite.rs (4)
793-820: LGTM: Efficient index creation with room for minor improvement
The `create_index` function is well-implemented, efficiently checking for existing indices before creation and supporting unique indices. The use of proper logging is commendable.
Consider adding error logging in case the index creation fails. You can modify the last few lines like this:

```diff
-    sqlx::query(&sql).execute(&*client).await?;
-    log::info!("[SQLITE] index {} created successfully", idx_name);
-    Ok(())
+    match sqlx::query(&sql).execute(&*client).await {
+        Ok(_) => {
+            log::info!("[SQLITE] index {} created successfully", idx_name);
+            Ok(())
+        }
+        Err(e) => {
+            log::error!("[SQLITE] Failed to create index {}: {}", idx_name, e);
+            Err(e.into())
+        }
+    }
```
822-837: LGTM: Efficient index deletion with room for minor improvement
The `delete_index` function is well-implemented, efficiently checking for existing indices before deletion. The use of proper logging is commendable.
For consistency with the `create_index` function, consider adding error logging in case the index deletion fails. You can modify the last few lines like this:

```diff
-    sqlx::query(&sql).execute(&*client).await?;
-    log::info!("[SQLITE] index {}deleted successfully", idx_name);
-    Ok(())
+    match sqlx::query(&sql).execute(&*client).await {
+        Ok(_) => {
+            log::info!("[SQLITE] index {} deleted successfully", idx_name);
+            Ok(())
+        }
+        Err(e) => {
+            log::error!("[SQLITE] Failed to delete index {}: {}", idx_name, e);
+            Err(e.into())
+        }
+    }
```
693-694: LGTM: Improved data safety with backup creation
The changes to the `add_start_dt_column` method in the `SqliteDb` implementation improve data safety by creating a backup before modifying the table structure. The use of `await` ensures proper asynchronous execution.
Consider adding error handling to ensure that if the backup creation fails, the column addition is not performed. You can modify the code like this:

```diff
-    create_meta_backup().await?;
-    add_start_dt_column().await?;
+    create_meta_backup().await.map_err(|e| {
+        log::error!("[SQLITE] Failed to create backup before adding start_dt column: {}", e);
+        e
+    })?;
+    add_start_dt_column().await
```

This ensures that if the backup creation fails, an error is logged, and the column addition is not attempted.
Dropping the "meta_module_key2_idx" index may impact query performance.
Several queries in `sqlite.rs`, `postgres.rs`, and `mysql.rs` utilize the `key2` field in their WHERE clauses, which may have previously benefited from the now-dropped index. This change could lead to slower query performance if these operations relied on the index for efficiency.
Analysis chain
Line range hint 737-762: Improved index management, but verify impact of dropped index
The changes to `add_start_dt_column` improve consistency by using the new `create_index` and `delete_index` functions. However, dropping the "meta_module_key2_idx" index might impact query performance if it was previously used.
Please verify that dropping the "meta_module_key2_idx" index doesn't negatively impact query performance. Run the following script to check for queries that might have used this index:
Scripts executed
The following scripts were executed for the analysis:
Script:
```shell
#!/bin/bash
# Description: Check for queries that might have used the dropped index
rg --type rust 'WHERE.*key2.*=' src/
```
Length of output: 2682
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (2)
- src/infra/src/db/sqlite.rs (7 hunks)
- src/infra/src/file_list/sqlite.rs (3 hunks)
Files skipped from review as they are similar to previous changes (1)
- src/infra/src/file_list/sqlite.rs
Additional context used
Path-based instructions (1)
src/infra/src/db/sqlite.rs (1)
Pattern
**/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestions. Make sure the responses are not too verbose and keep the suggestions to the point, i.e. actionable.
Additional comments not posted (3)
src/infra/src/db/sqlite.rs (3)
100-113: LGTM: Efficient implementation of index caching
The `cache_indices` function is well-implemented. It efficiently retrieves existing indices from the database and handles potential errors gracefully by returning an empty `HashSet` if an error occurs. The use of a `HashSet` for storing indices is a good choice for efficient lookups.
Line range hint 766-791: LGTM: Simplified function signature and consistent client usage
The changes to `create_meta_backup` improve consistency by using the `CLIENT_RW` lazy static variable instead of passing the client as a parameter. This simplifies the function signature and ensures consistent use of the write client across the module.
718-733: LGTM: Improved consistency in index creation
The changes to the `create_table` function improve consistency by using the new `create_index` function to create indices. The indices created ("meta_module_idx", "meta_module_key1_idx", and "meta_module_start_dt_idx") seem appropriate for the table structure and should improve query performance.
fixes: #4601
This runs a query to get and cache existing indices before running the queries that create indices. This way we only run the `create index ...` command if the index does not exist.
Note that as we create indices at the start/init of the application, this does not add the newly created indices to the cache, because once an index is created at init, there is no other case that will retry creating the same index. The issue only arises at pod start, i.e. the first application init, so this only caches pre-existing indices, i.e. ones that exist before the pod start.
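The caching logic described above can be sketched in a stripped-down, stdlib-only form (no sqlx, no async; the names and the returned-SQL shape are illustrative, not the PR's actual API):

```rust
use std::collections::HashSet;

/// Illustrative cache of pre-existing (table, index) pairs, populated once at init.
struct IndexCache {
    existing: HashSet<(String, String)>,
}

impl IndexCache {
    /// In the real code this runs a catalog query; here we seed it directly.
    fn init(pre_existing: &[(&str, &str)]) -> Self {
        Self {
            existing: pre_existing
                .iter()
                .map(|(t, i)| (t.to_string(), i.to_string()))
                .collect(),
        }
    }

    /// Returns Some(sql) when a CREATE INDEX should actually be issued,
    /// None when the cache says the index already exists.
    fn create_index_sql(&self, table: &str, index: &str, fields: &[&str]) -> Option<String> {
        if self.existing.contains(&(table.to_string(), index.to_string())) {
            return None; // skip: index pre-dates this pod start
        }
        Some(format!(
            "CREATE INDEX {} ON {} ({});",
            index,
            table,
            fields.join(", ")
        ))
    }
}

fn main() {
    let cache = IndexCache::init(&[("meta", "meta_module_idx")]);
    // Pre-existing index: no statement is generated.
    assert!(cache.create_index_sql("meta", "meta_module_idx", &["module"]).is_none());
    // New index: a statement is generated once, at init.
    let sql = cache.create_index_sql("meta", "meta_module_key1_idx", &["module", "key1"]);
    assert_eq!(
        sql.as_deref(),
        Some("CREATE INDEX meta_module_key1_idx ON meta (module, key1);")
    );
}
```

Because the cache is populated once and only consulted during init, there is no need to update it after a successful create, which matches the behaviour described above.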
As this cache is in-memory, it is per-pod / per-node, so when starting on a clean DB this can still cause lock-ups if all nodes try to create the indices on the blank tables simultaneously. That case is not handled here, as index creation on a blank table is not that time-consuming anyway, and handling it would require etcd or something similar to coordinate the cache.
PS: Maybe we can also solve this by adding `IF NOT EXISTS` in all places; currently only some queries use it.
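For reference, a minimal sketch of the idempotent form the PS refers to (the helper name is made up; Postgres and SQLite accept `CREATE INDEX IF NOT EXISTS`, while stock MySQL does not, which limits this approach to two of the three backends):

```rust
// Hedged sketch: building the idempotent CREATE INDEX variant,
// for backends that support the IF NOT EXISTS clause.
fn create_index_if_not_exists(index: &str, table: &str, fields: &[&str]) -> String {
    format!(
        "CREATE INDEX IF NOT EXISTS {} ON {} ({});",
        index,
        table,
        fields.join(", ")
    )
}

fn main() {
    let sql = create_index_if_not_exists("meta_module_idx", "meta", &["module"]);
    assert_eq!(
        sql,
        "CREATE INDEX IF NOT EXISTS meta_module_idx ON meta (module);"
    );
    println!("{sql}");
}
```

With this form the database performs the existence check itself, so no application-side cache is needed where it is available.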
New Features
Bug Fixes
Documentation