MOD-10349: Add Scripts For Reading And Generating Numeric Trees by kei-nan · Pull Request #6327 · RediSearch/RediSearch

kei-nan · 2025-06-16T12:40:06Z

Add a bunch of scripts:

read the dump of numeric trees into a json file
visualize the json file via an html file - interactive tree
generate numeric trees with different document ids spread
benchmark the numeric trees against one another.

A clear and concise description of what the PR is solving, including:

Current: Hard to analyse a numeric tree dump
Change: Add scripts to help analyse and benchmark numeric trees.
Outcome: Easier analysis of numeric tree issues.

Mark if applicable

This PR introduces API changes
This PR introduces serialization changes

…nt id spreads

GuyAv46 · 2025-06-18T07:47:44Z

Do we plan to merge this PR?

kei-nan · 2025-06-18T10:34:36Z

Do we plan to merge this PR?

@GuyAv46 I would like to,
The script which parses the numeric tree is nice, also the visualization script I feel is very helpful to have.
We can talk script by script basis but it for sure can help having some of these scripts around.

GuyAv46

Python scripts look awesome 🦾
Let's avoid the micro-benchmarks changes

…parsed don't use real values when incrementing the document id.

Copilot

Pull Request Overview

This PR introduces end-to-end tooling for numeric tree analysis: generating Redis numeric indexes with varied ID distributions, parsing tree dumps into JSON, visualizing them interactively, and benchmarking intersection performance.

Added C++ micro-benchmark for intersection iterator under varied ID patterns
Introduced Python scripts to generate, parse, and visualize numeric trees
Updated Python dependencies and documentation for new tools

Reviewed Changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 1 comment.

Show a summary per file

File	Description
tests/cpptests/micro-benchmarks/benchmark_doc_id_pattern_iteration.cpp	New benchmark for document‐ID pattern intersection
sbin/numeric_tree/visualize_numeric_tree.py	Script to render parsed trees with Plotly and fallback
sbin/numeric_tree/requirements.txt	Added networkx, plotly (and optional pygraphviz) deps
sbin/numeric_tree/parse_numeric_tree.py	Fast parser for Redis DEBUG DUMP_NUMIDXTREE output
sbin/numeric_tree/generate_numeric_trees.py	Generator for multiple numeric indexes with spread options
sbin/numeric_tree/README.md	Documentation for generation, parsing, benchmarking, and visualization

Comments suppressed due to low confidence (5)

tests/cpptests/micro-benchmarks/benchmark_doc_id_pattern_iteration.cpp:188

The scenario macro covers (10,1000), (10,5000), (25,1000), (25,5000), and (50,1000) but omits (50,5000) combinations. Add Args({50, 5000, CONSECUTIVE}) and its sparse/modulo variants to fully exercise benchmarks.

#define INTERSECTION_SCENARIOS() \

sbin/numeric_tree/generate_numeric_trees.py:10

Docstring and usage examples mention --indexes and --spread, but argparse only defines --docs-per-index, --sparse-size, and --cleanup. Align documentation with actual CLI options.

    python generate_numeric_trees.py --indexes 5 --docs-per-index 10000 --spread sparse

sbin/numeric_tree/generate_numeric_trees.py:119

[nitpick] This message prints len(insertion_sequence) including placeholder entries skipped during insertion. Consider reporting the actual inserted count (count) or clarifying that placeholders are filtered later.

        print(f"Populating index {config.name} ({config.insertion_order} order) with {len(insertion_sequence)} documents...")

sbin/numeric_tree/README.md:10

The README refers to benchmark_numeric_tree.py, but no such script exists in this directory. Update the docs to reference the actual C++ benchmark or provide the missing script.

### 2. `benchmark_numeric_tree.py`

sbin/numeric_tree/visualize_numeric_tree.py:321

The usage message refers to numeric_tree_visualize.py but the script is named visualize_numeric_tree.py. Update the help text to match the actual filename.

        print("  Interactive visualization: python numeric_tree_visualize.py <tree.json> [spacing_factor]")

Copilot · 2025-06-19T07:41:50Z

tests/cpptests/micro-benchmarks/benchmark_doc_id_pattern_iteration.cpp

+    }
+
+    // Helper function to create intersection iterator with two union children
+    IndexIterator* createIntersectionIterator() {


Memory allocated for child idlist iterators and their ID arrays via rm_malloc is not freed, potentially leaking memory across benchmark runs. Consider freeing these allocations after iterator use.

raz-mon

Nice stuff, I think tools like this can help us a lot in general 💯
Let's have this a parent directory for such tools, where we will put more stuff, e.g., sbin/debug_tools/

raz-mon · 2025-06-19T07:50:47Z

sbin/numeric_tree/generate_numeric_trees.py

+    parser.add_argument('--docs-per-index', '-d', type=int, default=10000,
+                       help='Number of base documents per index (default: 10000)')
+    parser.add_argument('--sparse-size', '-s', type=int, default=100,
+                       help='Sparse size for sparsed insertion (default: 100)')


What does this mean practically?

the gap size between document ids

CLAassistant · 2025-07-03T11:34:33Z

All committers have signed the CLA.

src/profile.c

tests/cpptests/micro-benchmarks/benchmark_doc_id_pattern_iteration.cpp

codecov · 2025-07-06T10:29:05Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.38%. Comparing base (612db18) to head (e0c5f86).
Report is 59 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #6327      +/-   ##
==========================================
+ Coverage   88.88%   89.38%   +0.49%     
==========================================
  Files         240      252      +12     
  Lines       40454    41681    +1227     
  Branches     3165     3963     +798     
==========================================
+ Hits        35959    37255    +1296     
+ Misses       4460     4369      -91     
- Partials       35       57      +22

Flag	Coverage Δ
flow	`82.00% <100.00%> (-0.46%)`	⬇️
unit	`47.25% <0.00%> (+1.26%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

kei-nan added 4 commits June 12, 2025 15:16

initial commit

e10d372

update micro benchmarks

b660e8a

update benchmark

4a65a5e

add scripts to construct and test numeric trees with different docume…

0f92b0c

…nt id spreads

kei-nan self-assigned this Jun 16, 2025

kei-nan requested a review from raz-mon June 16, 2025 12:40

github-actions bot added the size:XL label Jun 16, 2025

kei-nan requested a review from GuyAv46 June 16, 2025 12:40

kei-nan added 2 commits June 17, 2025 10:26

few changes

7b9ebee

run gc, use vtune, more uniform data

e38afae

update script to support reading old dump files

6e232b5

GuyAv46 reviewed Jun 19, 2025

View reviewed changes

align script names, modify a bit of the logic to avoid using GC, in s…

670afff

…parsed don't use real values when incrementing the document id.

raz-mon requested a review from Copilot June 19, 2025 07:39

Copilot AI reviewed Jun 19, 2025

View reviewed changes

raz-mon reviewed Jun 19, 2025

View reviewed changes

kei-nan and others added 2 commits June 23, 2025 18:21

add profile compare script, add profile modifications

1df2927

Fix missing size info

bbb1244

kei-nan changed the title ~~MOD-8808: Add Scripts For Reading And Generating Numeric Trees~~ Add Scripts For Reading And Generating Numeric Trees Jul 3, 2025

kei-nan added 2 commits July 3, 2025 17:06

revert some profile fields

84d7572

remove history size

21743c9

GuyAv46 reviewed Jul 3, 2025

View reviewed changes

src/profile.c Outdated Show resolved Hide resolved

tests/cpptests/micro-benchmarks/benchmark_doc_id_pattern_iteration.cpp Outdated Show resolved Hide resolved

code review comments

ce72f83

kei-nan changed the title ~~Add Scripts For Reading And Generating Numeric Trees~~ MOD-10349: Add Scripts For Reading And Generating Numeric Trees Jul 3, 2025

kei-nan enabled auto-merge July 3, 2025 15:35

GuyAv46 previously approved these changes Jul 3, 2025

View reviewed changes

revert to old behaviour of counting

e0c5f86

kei-nan dismissed GuyAv46’s stale review via e0c5f86 July 6, 2025 09:59

GuyAv46 approved these changes Jul 6, 2025

View reviewed changes

kei-nan added this pull request to the merge queue Jul 6, 2025

Merged via the queue into master with commit cce9400 Jul 6, 2025
15 checks passed

kei-nan deleted the master_jk_read_numeric_tree_from_file branch July 6, 2025 11:41

meiravgri mentioned this pull request Nov 12, 2025

[8.2] [MOD-6056] Rename FT.PROFILE counter fields #7341

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

MOD-10349: Add Scripts For Reading And Generating Numeric Trees#6327

MOD-10349: Add Scripts For Reading And Generating Numeric Trees#6327
kei-nan merged 14 commits intomasterfrom
master_jk_read_numeric_tree_from_file

kei-nan commented Jun 16, 2025

Uh oh!

GuyAv46 commented Jun 18, 2025

Uh oh!

kei-nan commented Jun 18, 2025 •

edited

Loading

Uh oh!

GuyAv46 left a comment

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Jun 19, 2025

Uh oh!

raz-mon left a comment

Uh oh!

raz-mon Jun 19, 2025

Uh oh!

kei-nan Jul 3, 2025 •

edited

Loading

Uh oh!

CLAassistant commented Jul 3, 2025 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

codecov bot commented Jul 6, 2025 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

kei-nan commented Jun 16, 2025

Mark if applicable

Uh oh!

GuyAv46 commented Jun 18, 2025

Uh oh!

kei-nan commented Jun 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

GuyAv46 left a comment

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Copilot AI Jun 19, 2025

Choose a reason for hiding this comment

Uh oh!

raz-mon left a comment

Choose a reason for hiding this comment

Uh oh!

raz-mon Jun 19, 2025

Choose a reason for hiding this comment

Uh oh!

kei-nan Jul 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

CLAassistant commented Jul 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

codecov bot commented Jul 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

kei-nan commented Jun 18, 2025 •

edited

Loading

kei-nan Jul 3, 2025 •

edited

Loading

CLAassistant commented Jul 3, 2025 •

edited

Loading

codecov bot commented Jul 6, 2025 •

edited

Loading