Skip to content

MOD-10349: Add Scripts For Reading And Generating Numeric Trees#6327

Merged
kei-nan merged 14 commits intomasterfrom
master_jk_read_numeric_tree_from_file
Jul 6, 2025
Merged

MOD-10349: Add Scripts For Reading And Generating Numeric Trees#6327
kei-nan merged 14 commits intomasterfrom
master_jk_read_numeric_tree_from_file

Conversation

@kei-nan
Copy link
Collaborator

@kei-nan kei-nan commented Jun 16, 2025

Add a bunch of scripts:

  • read the dump of numeric trees into a json file
  • visualize the json file via an html file - interactive tree
  • generate numeric trees with different document ids spread
  • benchmark the numeric trees against one another.

A clear and concise description of what the PR is solving, including:

  1. Current: Hard to analyse a numeric tree dump
  2. Change: Add scripts to help analyse and benchmark numeric trees.
  3. Outcome: Easier analysis of numeric tree issues.

Mark if applicable

  • This PR introduces API changes
  • This PR introduces serialization changes

@kei-nan kei-nan self-assigned this Jun 16, 2025
@kei-nan kei-nan requested a review from raz-mon June 16, 2025 12:40
@kei-nan kei-nan requested a review from GuyAv46 June 16, 2025 12:40
@GuyAv46
Copy link
Collaborator

GuyAv46 commented Jun 18, 2025

Do we plan to merge this PR?

@kei-nan
Copy link
Collaborator Author

kei-nan commented Jun 18, 2025

Do we plan to merge this PR?

@GuyAv46 I would like to,
The script which parses the numeric tree is nice, also the visualization script I feel is very helpful to have.
We can talk script by script basis but it for sure can help having some of these scripts around.

Copy link
Collaborator

@GuyAv46 GuyAv46 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Python scripts look awesome 🦾
Let's avoid the micro-benchmarks changes

…parsed don't use real values when incrementing the document id.
@raz-mon raz-mon requested a review from Copilot June 19, 2025 07:39
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces end-to-end tooling for numeric tree analysis: generating Redis numeric indexes with varied ID distributions, parsing tree dumps into JSON, visualizing them interactively, and benchmarking intersection performance.

  • Added C++ micro-benchmark for intersection iterator under varied ID patterns
  • Introduced Python scripts to generate, parse, and visualize numeric trees
  • Updated Python dependencies and documentation for new tools

Reviewed Changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
tests/cpptests/micro-benchmarks/benchmark_doc_id_pattern_iteration.cpp New benchmark for document‐ID pattern intersection
sbin/numeric_tree/visualize_numeric_tree.py Script to render parsed trees with Plotly and fallback
sbin/numeric_tree/requirements.txt Added networkx, plotly (and optional pygraphviz) deps
sbin/numeric_tree/parse_numeric_tree.py Fast parser for Redis DEBUG DUMP_NUMIDXTREE output
sbin/numeric_tree/generate_numeric_trees.py Generator for multiple numeric indexes with spread options
sbin/numeric_tree/README.md Documentation for generation, parsing, benchmarking, and visualization
Comments suppressed due to low confidence (5)

tests/cpptests/micro-benchmarks/benchmark_doc_id_pattern_iteration.cpp:188

  • The scenario macro covers (10,1000), (10,5000), (25,1000), (25,5000), and (50,1000) but omits (50,5000) combinations. Add Args({50, 5000, CONSECUTIVE}) and its sparse/modulo variants to fully exercise benchmarks.
#define INTERSECTION_SCENARIOS() \

sbin/numeric_tree/generate_numeric_trees.py:10

  • Docstring and usage examples mention --indexes and --spread, but argparse only defines --docs-per-index, --sparse-size, and --cleanup. Align documentation with actual CLI options.
    python generate_numeric_trees.py --indexes 5 --docs-per-index 10000 --spread sparse

sbin/numeric_tree/generate_numeric_trees.py:119

  • [nitpick] This message prints len(insertion_sequence) including placeholder entries skipped during insertion. Consider reporting the actual inserted count (count) or clarifying that placeholders are filtered later.
        print(f"Populating index {config.name} ({config.insertion_order} order) with {len(insertion_sequence)} documents...")

sbin/numeric_tree/README.md:10

  • The README refers to benchmark_numeric_tree.py, but no such script exists in this directory. Update the docs to reference the actual C++ benchmark or provide the missing script.
### 2. `benchmark_numeric_tree.py`

sbin/numeric_tree/visualize_numeric_tree.py:321

  • The usage message refers to numeric_tree_visualize.py but the script is named visualize_numeric_tree.py. Update the help text to match the actual filename.
        print("  Interactive visualization: python numeric_tree_visualize.py <tree.json> [spacing_factor]")

}

// Helper function to create intersection iterator with two union children
IndexIterator* createIntersectionIterator() {
Copy link

Copilot AI Jun 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Memory allocated for child idlist iterators and their ID arrays via rm_malloc is not freed, potentially leaking memory across benchmark runs. Consider freeing these allocations after iterator use.

Copilot uses AI. Check for mistakes.
Copy link
Collaborator

@raz-mon raz-mon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice stuff, I think tools like this can help us a lot in general 💯
Let's have this a parent directory for such tools, where we will put more stuff, e.g., sbin/debug_tools/

parser.add_argument('--docs-per-index', '-d', type=int, default=10000,
help='Number of base documents per index (default: 10000)')
parser.add_argument('--sparse-size', '-s', type=int, default=100,
help='Sparse size for sparsed insertion (default: 100)')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this mean practically?

Copy link
Collaborator Author

@kei-nan kei-nan Jul 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the gap size between document ids

@kei-nan kei-nan changed the title MOD-8808: Add Scripts For Reading And Generating Numeric Trees Add Scripts For Reading And Generating Numeric Trees Jul 3, 2025
@CLAassistant
Copy link

CLAassistant commented Jul 3, 2025

CLA assistant check
All committers have signed the CLA.

@kei-nan kei-nan changed the title Add Scripts For Reading And Generating Numeric Trees MOD-10349: Add Scripts For Reading And Generating Numeric Trees Jul 3, 2025
@kei-nan kei-nan enabled auto-merge July 3, 2025 15:35
GuyAv46
GuyAv46 previously approved these changes Jul 3, 2025
@codecov
Copy link

codecov bot commented Jul 6, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.38%. Comparing base (612db18) to head (e0c5f86).
Report is 59 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6327      +/-   ##
==========================================
+ Coverage   88.88%   89.38%   +0.49%     
==========================================
  Files         240      252      +12     
  Lines       40454    41681    +1227     
  Branches     3165     3963     +798     
==========================================
+ Hits        35959    37255    +1296     
+ Misses       4460     4369      -91     
- Partials       35       57      +22     
Flag Coverage Δ
flow 82.00% <100.00%> (-0.46%) ⬇️
unit 47.25% <0.00%> (+1.26%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@kei-nan kei-nan added this pull request to the merge queue Jul 6, 2025
Merged via the queue into master with commit cce9400 Jul 6, 2025
15 checks passed
@kei-nan kei-nan deleted the master_jk_read_numeric_tree_from_file branch July 6, 2025 11:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants