MOD-10349: Add Scripts For Reading And Generating Numeric Trees#6327
MOD-10349: Add Scripts For Reading And Generating Numeric Trees#6327
Conversation
|
Do we plan to merge this PR? |
@GuyAv46 I would like to, |
GuyAv46
left a comment
There was a problem hiding this comment.
Python scripts look awesome 🦾
Let's avoid the micro-benchmarks changes
…parsed don't use real values when incrementing the document id.
There was a problem hiding this comment.
Pull Request Overview
This PR introduces end-to-end tooling for numeric tree analysis: generating Redis numeric indexes with varied ID distributions, parsing tree dumps into JSON, visualizing them interactively, and benchmarking intersection performance.
- Added C++ micro-benchmark for intersection iterator under varied ID patterns
- Introduced Python scripts to generate, parse, and visualize numeric trees
- Updated Python dependencies and documentation for new tools
Reviewed Changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| tests/cpptests/micro-benchmarks/benchmark_doc_id_pattern_iteration.cpp | New benchmark for document‐ID pattern intersection |
| sbin/numeric_tree/visualize_numeric_tree.py | Script to render parsed trees with Plotly and fallback |
| sbin/numeric_tree/requirements.txt | Added networkx, plotly (and optional pygraphviz) deps |
| sbin/numeric_tree/parse_numeric_tree.py | Fast parser for Redis DEBUG DUMP_NUMIDXTREE output |
| sbin/numeric_tree/generate_numeric_trees.py | Generator for multiple numeric indexes with spread options |
| sbin/numeric_tree/README.md | Documentation for generation, parsing, benchmarking, and visualization |
Comments suppressed due to low confidence (5)
tests/cpptests/micro-benchmarks/benchmark_doc_id_pattern_iteration.cpp:188
- The scenario macro covers (10,1000), (10,5000), (25,1000), (25,5000), and (50,1000) but omits (50,5000) combinations. Add Args({50, 5000, CONSECUTIVE}) and its sparse/modulo variants to fully exercise benchmarks.
#define INTERSECTION_SCENARIOS() \
sbin/numeric_tree/generate_numeric_trees.py:10
- Docstring and usage examples mention
--indexesand--spread, but argparse only defines--docs-per-index,--sparse-size, and--cleanup. Align documentation with actual CLI options.
python generate_numeric_trees.py --indexes 5 --docs-per-index 10000 --spread sparse
sbin/numeric_tree/generate_numeric_trees.py:119
- [nitpick] This message prints
len(insertion_sequence)including placeholder entries skipped during insertion. Consider reporting the actual inserted count (count) or clarifying that placeholders are filtered later.
print(f"Populating index {config.name} ({config.insertion_order} order) with {len(insertion_sequence)} documents...")
sbin/numeric_tree/README.md:10
- The README refers to
benchmark_numeric_tree.py, but no such script exists in this directory. Update the docs to reference the actual C++ benchmark or provide the missing script.
### 2. `benchmark_numeric_tree.py`
sbin/numeric_tree/visualize_numeric_tree.py:321
- The usage message refers to
numeric_tree_visualize.pybut the script is namedvisualize_numeric_tree.py. Update the help text to match the actual filename.
print(" Interactive visualization: python numeric_tree_visualize.py <tree.json> [spacing_factor]")
| } | ||
|
|
||
| // Helper function to create intersection iterator with two union children | ||
| IndexIterator* createIntersectionIterator() { |
There was a problem hiding this comment.
Memory allocated for child idlist iterators and their ID arrays via rm_malloc is not freed, potentially leaking memory across benchmark runs. Consider freeing these allocations after iterator use.
raz-mon
left a comment
There was a problem hiding this comment.
Nice stuff, I think tools like this can help us a lot in general 💯
Let's have this a parent directory for such tools, where we will put more stuff, e.g., sbin/debug_tools/
| parser.add_argument('--docs-per-index', '-d', type=int, default=10000, | ||
| help='Number of base documents per index (default: 10000)') | ||
| parser.add_argument('--sparse-size', '-s', type=int, default=100, | ||
| help='Sparse size for sparsed insertion (default: 100)') |
There was a problem hiding this comment.
What does this mean practically?
There was a problem hiding this comment.
the gap size between document ids
tests/cpptests/micro-benchmarks/benchmark_doc_id_pattern_iteration.cpp
Outdated
Show resolved
Hide resolved
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #6327 +/- ##
==========================================
+ Coverage 88.88% 89.38% +0.49%
==========================================
Files 240 252 +12
Lines 40454 41681 +1227
Branches 3165 3963 +798
==========================================
+ Hits 35959 37255 +1296
+ Misses 4460 4369 -91
- Partials 35 57 +22
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Add a bunch of scripts:
A clear and concise description of what the PR is solving, including:
Mark if applicable