Skip to content

Support for RDB analysis reports#1743

Merged
zuiderkwast merged 10 commits intovalkey-io:unstablefrom
artikell:feat/profile_feature
Jun 16, 2025
Merged

Support for RDB analysis reports#1743
zuiderkwast merged 10 commits intovalkey-io:unstablefrom
artikell:feat/profile_feature

Conversation

@artikell
Copy link
Contributor

@artikell artikell commented Feb 16, 2025

The PR add analysis capabilities to the valkey-check-rdb tool. Now the tool can statistically analyze various types of keys, expired keys, and key elements.

  • Added the --stats parameter to the valkey-check-rdb tool's rdb analysis functionality, which enables opening reports.
  • Supported the --format parameter with three options, table/csv/info, to provide more flexible output formats.
  • Add function & lua script number info

The content format is as follows:

... ...
[offset 8999] \o/ RDB looks OK! \o/
[info] 60 keys read
[info] 0 expires
[info] 0 already expired
db.0.type.name                  string  list    set     zset    hash    module  stream
db.0.keys.total                 10      0       0       0       0       0       0    
db.0.exipre_keys.total          0       0       0       0       0       0       0    
db.0.already_expired.total      0       0       0       0       0       0       0    
db.0.keys.size                  200     0       0       0       0       0       0    
db.0.keys.value_size            2000    0       0       0       0       0       0    
db.0.elements.total             10      0       0       0       0       0       0    
db.0.elements.size              2000    0       0       0       0       0       0    
db.0.elements.num.max           1       0       0       0       0       0       0    
db.0.elements.num.avg           1.00    0.00    0.00    0.00    0.00    0.00    0.00 
db.0.elements.num.p99           1.00    0.00    0.00    0.00    0.00    0.00    0.00 
db.0.elements.num.p90           1.00    0.00    0.00    0.00    0.00    0.00    0.00 
db.0.elements.num.p50           1.00    0.00    0.00    0.00    0.00    0.00    0.00 
db.0.elements.size.max          200     0       0       0       0       0       0    
db.0.elements.size.avg          200.00  0.00    0.00    0.00    0.00    0.00    0.00 
db.0.elements.size.p99          200.00  0.00    0.00    0.00    0.00    0.00    0.00 
db.0.elements.size.p90          200.00  0.00    0.00    0.00    0.00    0.00    0.00 
db.0.elements.size.p50          200.00  0.00    0.00    0.00    0.00    0.00    0.00

@codecov
Copy link

codecov bot commented Feb 16, 2025

Codecov Report

Attention: Patch coverage is 75.09025% with 69 lines in your changes missing coverage. Please review.

Project coverage is 71.44%. Comparing base (c5ae37f) to head (b033ab7).
Report is 212 commits behind head on unstable.

Files with missing lines Patch % Lines
src/valkey-check-rdb.c 75.09% 69 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #1743      +/-   ##
============================================
+ Coverage     71.17%   71.44%   +0.27%     
============================================
  Files           123      122       -1     
  Lines         65536    66766    +1230     
============================================
+ Hits          46645    47704    +1059     
- Misses        18891    19062     +171     
Files with missing lines Coverage Δ
src/valkey-check-rdb.c 72.15% <75.09%> (+7.56%) ⬆️

... and 70 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Co-authored-by: wei.kukey <[email protected]>
Signed-off-by: artikell <[email protected]>
@artikell artikell force-pushed the feat/profile_feature branch from 523708d to 04c476c Compare February 16, 2025 12:30
Copy link
Contributor

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. Not sure about the argument name --profiler. Profiler is more often used for things like CPU time.

How about --stats or --verbose?

An RDB file can also contain Functions IIRC. Shall we print the count too?

@enjoy-binbin I know you wanted to print the eval scripts in old RDB files. Maybe we can print the number of scripts as part of stats here too?

@artikell
Copy link
Contributor Author

--stats seems more suitable for this scenario, let me add the data of the function.

@artikell artikell requested a review from zuiderkwast March 24, 2025 15:17
@enjoy-binbin
Copy link
Member

@enjoy-binbin I know you wanted to print the eval scripts in old RDB files. Maybe we can print the number of scripts as part of stats here too?

You have a wrong memory, i dont want to print the lua scripts, i want to print the number of scripts. See #1448

@zuiderkwast
Copy link
Contributor

@enjoy-binbin I know you wanted to print the eval scripts in old RDB files. Maybe we can print the number of scripts as part of stats here too?

You have a wrong memory, i dont want to print the lua scripts, i want to print the number of scripts. See #1448

Sorry, I mean number of scripts. This PR prints the number of many things, such as the of functions, number of keys of each type, etc. To print the number of lua scripts if any would make sense here? Maybe only print it if it's non-zero, because it's only for quite old RDB files. WDYT?

@enjoy-binbin
Copy link
Member

This PR prints the number of many things, such as the of functions, number of keys of each type, etc. To print the number of lua scripts if any would make sense here? Maybe only print it if it's non-zero, because it's only for quite old RDB files. WDYT?

SGTM, we can print it (that is what i was looking for at that time). I can re-open the old PR if needed if you want to keep the history. Or we can do it in this PR.

@zuiderkwast
Copy link
Contributor

Or we can do it in this PR.

I guess in this PR is better in that case. This PR prints the info in another way. Or maybe reopen it after this PR is merged.

@artikell do you want to add the number of lua scripts from old RDB files as in #1448? They are just aux-fields with the name "lua". I suggest print them only if the number is non-zero since it's old stuff that will always be zero except in very old RDB files.

@zuiderkwast
Copy link
Contributor

@artikell What is the "form" formats? Can you mention it in the PR top comment?

What's the use case of the "form" format? If it is to consume by scripts, I guess CSV or JSON formats is more common. We already use those formats in valkey-cli and valkey-benchmark.

@artikell
Copy link
Contributor Author

artikell commented Mar 25, 2025

@artikell What is the "form" formats? Can you mention it in the PR top comment?

What's the use case of the "form" format? If it is to consume by scripts, I guess CSV or JSON formats is more common. We already use those formats in valkey-cli and valkey-benchmark.

  • The form is a format aligned by tabs.(better to call it "table"?) I think it is more conducive to people's reading. Just like the format in the commit, the result is that it is aligned. In my opinion, CSV can also be supported.

  • I have considered using JSON, but implementing the YAML format is a bit simpler (at least, all it takes is adding some spaces).

Regarding the number of Lua scripts, I will refer to Binbin's MR to make the addition in this MR.

@zuiderkwast
Copy link
Contributor

  • The form is a format aligned by tabs.(better to call it "table"?) I think it is more conducive to people's reading.

Yes maybe "table" is a better name. The example in the PR comment is the "info" format? It is also readable for users. Not a big difference to "form"? I'm thinking maybe we don't need both....

For CSV, JSON, YAML, I don't have a strong opinion but it seems nice to align the output format from our different tools. CSV seems to be supported by most of them so add it here too?

@artikell
Copy link
Contributor Author

artikell commented Mar 25, 2025

  • The form is a format aligned by tabs.(better to call it "table"?) I think it is more conducive to people's reading.

Yes maybe "table" is a better name. The example in the PR comment is the "info" format? It is also readable for users. Not a big difference to "form"? I'm thinking maybe we don't need both....

For CSV, JSON, YAML, I don't have a strong opinion but it seems nice to align the output format from our different tools. CSV seems to be supported by most of them so add it here too?

@zuiderkwast I've understood your thoughts. Now I've changed it to three formats:

  • Table: It is the default structure in the PR comment, which is conducive to human reading.
  • CSV: A formatted output that is convenient for other scripts to parse.
  • Info: Referring to the original format, it is used for the TCL script to make correctness judgments.

This is CSV output:
image

This is Info output:
image

Copy link
Contributor

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CSV output is prefixed with some non-CSV content like

[offset 8999] \o/ RDB looks OK! \o/
[info] 60 keys read
[info] 0 expires
[info] 0 already expired

.... so I guess it doesn't work to direct the output to file and load it as a CSV file.

Can we skip the prefix information for the CSV output mode? An option could be to print the non-CSV prefix info to stderr and the CSV content to stdout (?).

@artikell
Copy link
Contributor Author

The CSV output is prefixed with some non-CSV content like

[offset 8999] \o/ RDB looks OK! \o/
[info] 60 keys read
[info] 0 expires
[info] 0 already expired

.... so I guess it doesn't work to direct the output to file and load it as a CSV file.

Can we skip the prefix information for the CSV output mode? An option could be to print the non-CSV prefix info to stderr and the CSV content to stdout (?).

I have added an --output parameter, which enables the output of statistical information to a file by modifying the target of stdout. It seems to meet the requirements.

Copy link
Contributor

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah the --output argument will work with --csv. I hope users will find it, instead of using shell redirect or pipe valkey-check-rdb --csv > file.

artikell added 2 commits May 28, 2025 16:58
Signed-off-by: skyfirelee <[email protected]>
Signed-off-by: skyfirelee <[email protected]>
Copy link
Contributor

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks and sorry for the delay.

Please update the PR top comment to mention the parameters that were actually implemented.

@enjoy-binbin or @madolson do you want to take a look? If not, I will merge it.

Adjust the irrelevant comments.

Co-authored-by: Viktor Söderqvist <[email protected]>
Signed-off-by: skyfirelee <[email protected]>
@artikell artikell force-pushed the feat/profile_feature branch from 7f58984 to f6adf9b Compare June 10, 2025 14:15
@zuiderkwast zuiderkwast added to-be-merged Almost ready to merge release-notes This issue should get a line item in the release notes labels Jun 10, 2025
Signed-off-by: wei.kukey <[email protected]>
Signed-off-by: artikell <[email protected]>
@artikell artikell force-pushed the feat/profile_feature branch from bce221f to b033ab7 Compare June 15, 2025 17:14
@zuiderkwast zuiderkwast merged commit e71f03c into valkey-io:unstable Jun 16, 2025
52 checks passed
@zuiderkwast zuiderkwast removed the to-be-merged Almost ready to merge label Jun 16, 2025
@roshkhatri
Copy link
Member

roshkhatri commented Jun 17, 2025

This commit causes a build failure in alpine-linux

valkey-check-rdb.c: In function 'rdbShowGenericInfo':
valkey-check-rdb.c:498:20: error: assignment of read-only variable 'stdout'
  498 |             stdout = fdopen(1, "w");
      |                    ^
make[1]: *** [Makefile:546: valkey-check-rdb.o] Error 1
make[1]: *** Waiting for unfinished jobs....
make[1]: Leaving directory '/usr/src/valkey/src'
make: *** [Makefile:6: all] Error 2
make: Leaving directory '/usr/src/valkey'

https://github.com/valkey-io/valkey-container/actions/runs/15694960906/job/44219225483#step:5:739

enjoy-binbin added a commit that referenced this pull request Jun 18, 2025
…2225)

Use the standard `dup2` to redirect `stdout`

Closes #2224
Follow-up of #1743

Signed-off-by: skyfirelee <[email protected]>
Signed-off-by: Binbin <[email protected]>
Co-authored-by: Binbin <[email protected]>
ranshid pushed a commit to ranshid/valkey that referenced this pull request Jun 18, 2025
The PR add analysis capabilities to the valkey-check-rdb tool. Now the
tool can statistically analyze various types of keys, expired keys, and
key elements.

- Added the --stats parameter to the valkey-check-rdb tool's rdb
analysis functionality, which enables opening reports.
- Supported the --format parameter with three options, table/csv/info,
to provide more flexible output formats.
- Add function & lua script number info

The content format is as follows:
```
... ...
[offset 8999] \o/ RDB looks OK! \o/
[info] 60 keys read
[info] 0 expires
[info] 0 already expired
db.0.type.name                  string  list    set     zset    hash    module  stream
db.0.keys.total                 10      0       0       0       0       0       0    
db.0.exipre_keys.total          0       0       0       0       0       0       0    
db.0.already_expired.total      0       0       0       0       0       0       0    
db.0.keys.size                  200     0       0       0       0       0       0    
db.0.keys.value_size            2000    0       0       0       0       0       0    
db.0.elements.total             10      0       0       0       0       0       0    
db.0.elements.size              2000    0       0       0       0       0       0    
db.0.elements.num.max           1       0       0       0       0       0       0    
db.0.elements.num.avg           1.00    0.00    0.00    0.00    0.00    0.00    0.00 
db.0.elements.num.p99           1.00    0.00    0.00    0.00    0.00    0.00    0.00 
db.0.elements.num.p90           1.00    0.00    0.00    0.00    0.00    0.00    0.00 
db.0.elements.num.p50           1.00    0.00    0.00    0.00    0.00    0.00    0.00 
db.0.elements.size.max          200     0       0       0       0       0       0    
db.0.elements.size.avg          200.00  0.00    0.00    0.00    0.00    0.00    0.00 
db.0.elements.size.p99          200.00  0.00    0.00    0.00    0.00    0.00    0.00 
db.0.elements.size.p90          200.00  0.00    0.00    0.00    0.00    0.00    0.00 
db.0.elements.size.p50          200.00  0.00    0.00    0.00    0.00    0.00    0.00
```

---------

Signed-off-by: artikell <[email protected]>
Signed-off-by: skyfirelee <[email protected]>
Signed-off-by: wei.kukey <[email protected]>
Co-authored-by: wei.kukey <[email protected]>
Co-authored-by: Binbin <[email protected]>
Co-authored-by: Viktor Söderqvist <[email protected]>
ranshid pushed a commit to ranshid/valkey that referenced this pull request Jun 18, 2025
…alkey-io#2225)

Use the standard `dup2` to redirect `stdout`

Closes valkey-io#2224
Follow-up of valkey-io#1743

Signed-off-by: skyfirelee <[email protected]>
Signed-off-by: Binbin <[email protected]>
Co-authored-by: Binbin <[email protected]>
enjoy-binbin added a commit to enjoy-binbin/valkey that referenced this pull request Jul 27, 2025
There are two problems with initializing databases to 1:
1. db0 is used by default. After printing db0, we will additionally print
   db1 with no reason.
2. Based on 1, if the RDB is an empty RDB, check-rdb will crash. Because
   there is no key data, we don't have the chance to call tryExpandRdbStats,
   and then when printting db1, we will crash on rdbstate.stats.

Intrudoced in valkey-io#1743.

Signed-off-by: Binbin <[email protected]>
enjoy-binbin added a commit that referenced this pull request Aug 4, 2025
There are two problems with initializing databases to 1:
1. db0 is used by default. After printing db0, we will additionally print
   db1 with no reason.
2. Based on 1, if the RDB is an empty RDB, check-rdb will crash. Because
   there is no key data, we don't have the chance to call tryExpandRdbStats,
   and then when printting db1, we will crash on rdbstate.stats.

Introduced in #1743.

Signed-off-by: Binbin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-notes This issue should get a line item in the release notes

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

5 participants