fix: stream status incorrect #4547

hengfeiyang · 2024-09-13T14:34:53Z

If your steam stats is incorrect, please upgrade the this version and reset the stream stats once.

./openobserve reset -c stream-stats

Summary by CodeRabbit

New Features
- Enhanced the load_local_node function to provide detailed initialization of Node instances based on configuration settings.
Bug Fixes
- Improved efficiency and accuracy in managing stream statistics across MySQL, PostgreSQL, and SQLite databases by streamlining update processes and ensuring data integrity.
Refactor
- Simplified function names and logic related to loading local node configurations and handling stream statistics.

coderabbitai · 2024-09-13T14:35:04Z

Walkthrough

The changes in this pull request involve renaming and modifying functions related to loading local node configurations and stream statistics management across various files. The load_local_mode_node function has been renamed to load_local_node, and its visibility has been changed to public. Additionally, the logic for handling stream statistics in MySQL, PostgreSQL, and SQLite has been simplified to improve efficiency and accuracy in updating statistics.

Changes

File Path	Change Summary
`src/common/infra/cluster/mod.rs`	Renamed `load_local_mode_node` to `load_local_node` and removed the original function.
`src/config/src/cluster.rs`	Changed `load_local_node` from private to public and updated its implementation for detailed initialization of `Node`.
`src/infra/src/file_list/mysql.rs`	Simplified logic for handling stream statistics and updated SQL statements for efficiency.
`src/infra/src/file_list/postgres.rs`	Streamlined handling of stream statistics and modified SQL update statements for cumulative updates.
`src/infra/src/file_list/sqlite.rs`	Simplified stream statistics handling and updated SQL statements to ensure accurate cumulative statistics.

Assessment against linked issues

Objective	Addressed	Explanation
Stream Stats should be updated after retention job completes (#3935)	❓	The changes improve statistics handling but do not explicitly confirm updates after retention jobs.

Possibly related PRs

fix: improve consistent hashing #4512: This PR modifies the same file as the main PR, indicating a connection in the codebase related to consistent hashing.
fix: consistent hashing #4517: This PR also involves changes to variable names related to node identification, aligning with the focus of the main PR.

Suggested labels

🧹 Updates

Tip

OpenAI O1 model for chat

We have deployed OpenAI's latest O1 model for chat.
OpenAI claims that this model has superior reasoning capabilities than their GPT-4o model.
Please share any feedback with us in the discussions post.

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share

Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai generate interesting stats about this repository and render them as a table.
- @coderabbitai show all the console.log statements in this repository.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 1

Outside diff range and nitpick comments (1)

src/config/src/cluster.rs (1)

30-43: Consider using a unique identifier for the id field and handling different node states.

Making the load_local_node function public is appropriate if it needs to be accessed from other modules.

Initializing the Node struct with explicit fields based on the configuration provides a more detailed and specific initialization, reflecting the current configuration settings.

However, consider the following suggestions:

The id field is hardcoded to 1. If multiple nodes are expected, consider using a unique identifier for each node.

The status field is set to NodeStatus::Online, which assumes the node is always online when loaded. Consider handling cases where the node might be offline or in a different state.

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between db7c7f6 and 3a1b7ea.

Files selected for processing (5)

src/common/infra/cluster/mod.rs (2 hunks)
src/config/src/cluster.rs (1 hunks)
src/infra/src/file_list/mysql.rs (3 hunks)
src/infra/src/file_list/postgres.rs (3 hunks)
src/infra/src/file_list/sqlite.rs (3 hunks)

Files skipped from review due to trivial changes (1)

src/common/infra/cluster/mod.rs

Additional context used

Path-based instructions (4)

src/config/src/cluster.rs (1)

Pattern **/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestions

Make sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.

src/infra/src/file_list/postgres.rs (1)

Pattern **/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestions

Make sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.

src/infra/src/file_list/sqlite.rs (1)

Pattern **/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestions

Make sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.

src/infra/src/file_list/mysql.rs (1)

Pattern **/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestions

Make sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.

Additional comments not posted (10)

src/infra/src/file_list/postgres.rs (4)

485-487: LGTM!

The code correctly identifies new stream keys that are not present in old_stats and adds them to new_streams for further processing.

488-488: LGTM!

The code correctly adds the current stream key and item to update_streams for updating the stream statistics later.

520-522: LGTM!

The changes to the SQL update statement are correct. Incrementally updating the statistics by adding the new values to the existing ones allows for cumulative statistics tracking.

569-574: LGTM!

The new SQL query is correct and enhances data integrity. It ensures that the max_ts field is always greater than or equal to min_ts, preventing inconsistencies in the stream statistics.

src/infra/src/file_list/sqlite.rs (3)

490-493: LGTM!

The changes simplify the process of updating stream statistics by directly checking the old_stats map and adding new streams to a separate vector. This approach improves efficiency and aligns with the AI-generated summary.

528-529: Excellent improvement!

The modifications to the SQL update statement ensure that the statistics for existing streams are incrementally updated by adding the new values to the existing ones. This approach maintains the cumulative nature of the statistics and prevents data loss, enhancing the accuracy of the stream statistics. Well done!

570-575: Good catch!

The introduction of the new SQL query to set max_ts to min_ts when max_ts is less than min_ts is a valuable addition. This ensures data integrity and maintains the consistency and accuracy of the stream statistics. It's a thoughtful improvement that aligns with the AI-generated summary.

src/infra/src/file_list/mysql.rs (3)

480-483: LGTM!

The simplified logic for identifying new streams looks good. It directly checks if a stream exists in old_stats and adds it to new_streams if it doesn't, which is more efficient than the previous implementation.

Line range hint 517-524: Incremental updates and max_ts consistency check are good optimizations.

The modifications to the SQL update statement for stream_stats look good:

Incrementally updating the file_num, records, original_size, and compressed_size fields enhances efficiency by avoiding redundant data retrieval and manipulation.

Updating the max_ts field to reflect the minimum timestamp when it is less than the current minimum ensures data consistency.

565-570: Consistency check for max_ts and min_ts is a good addition.

The new SQL query to update the max_ts field to the value of min_ts when max_ts is less than min_ts is a good addition. It ensures data consistency by maintaining the invariant that max_ts should always be greater than or equal to min_ts in the stream_stats table.

src/config/src/cluster.rs

hengfeiyang added 3 commits September 13, 2024 12:16

fix: steam stats incorrect

0704f92

fix: compactor on local mode

075d9ee

Merge branch 'main' into fix/stream-stats

3a1b7ea

github-actions bot added the ☢️ Bug Something isn't working label Sep 13, 2024

coderabbitai bot reviewed Sep 13, 2024

View reviewed changes

src/config/src/cluster.rs Show resolved Hide resolved

oasisk approved these changes Sep 13, 2024

View reviewed changes

hengfeiyang merged commit 3b1eebe into main Sep 13, 2024

hengfeiyang deleted the fix/stream-stats branch September 13, 2024 15:25

coderabbitai bot mentioned this pull request Sep 13, 2024

fix: stream stats incorrect #4548

Merged

hengfeiyang added a commit that referenced this pull request Sep 17, 2024

fix: stream status incorrect (#4547)

6107e05

coderabbitai bot mentioned this pull request Sep 18, 2024

fix: release wal parquet files #4584

Merged

coderabbitai bot mentioned this pull request Oct 14, 2024

perf: use private merge function for basic stream type #4783

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: stream status incorrect #4547

fix: stream status incorrect #4547

Uh oh!

hengfeiyang commented Sep 13, 2024 •

edited

Loading

Uh oh!

coderabbitai bot commented Sep 13, 2024 •

edited

Loading

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (`.coderabbit.yaml`)

Documentation and Community

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

fix: stream status incorrect #4547

fix: stream status incorrect #4547

Uh oh!

Conversation

hengfeiyang commented Sep 13, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Sep 13, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Assessment against linked issues

Possibly related PRs

Suggested labels

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

hengfeiyang commented Sep 13, 2024 •

edited

Loading

coderabbitai bot commented Sep 13, 2024 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)