Skip to content

[MOD-12070] Extend indexing metrics#7669

Merged
lerman25 merged 14 commits intomasterfrom
omerl-extend-indexing-metrics
Dec 11, 2025
Merged

[MOD-12070] Extend indexing metrics#7669
lerman25 merged 14 commits intomasterfrom
omerl-extend-indexing-metrics

Conversation

@lerman25
Copy link
Collaborator

@lerman25 lerman25 commented Dec 7, 2025

This PR adds 2 new metrics to INFO command.

total_num_docs_in_indexes - Amount of docs indexed by a RediSearch index.
Note - Each doc is counter per amount of indexes it is indexed by.

total_indexing_ops_<field_type>_fields - Amount of docs indexed because they have a <field_type> field.
Field types include: text, tag, numeric, geo, vector, geometric.


Note

Add global counters for documents indexed per field type and total docs across indexes, update them during indexing, and expose via INFO MODULES with tests.

  • Indexing metrics (core):
    • Update global stats during indexing via FieldsGlobalStats_UpdateFieldDocsIndexed(fs, 1) in src/document.c (full-text preprocessor and bulk index success path).
    • Add new counters to FieldsGlobalStats (text/tag/numeric/geo/geoshape/vector TotalDocsIndexed) and implement updater in src/info/global_stats.c/h.
  • Indexes aggregation:
    • Aggregate total_num_docs_in_indexes in IndexesInfo_TotalInfo() (src/info/indexes_info.c/h).
  • INFO MODULES output:
    • Expose per-field-type totals as search_total_indexing_ops_*_fields and overall search_total_num_docs_in_indexes in src/info/info_redis/info_redis.c.
  • Tests:
    • Add SA tests for total_num_docs_in_indexes and total_indexing_ops_<field_type>_fields, including multi-field docs and JSON multi-value paths (tests/pytests/test_info_modules.py).

Written by Cursor Bugbot for commit 0ad0353. This will update automatically on new commits. Configure here.

@github-actions github-actions bot added the size:M label Dec 7, 2025
@lerman25 lerman25 changed the title Omerl extend indexing metrics [MOD-12070] Extend indexing metrics Dec 7, 2025
@lerman25 lerman25 marked this pull request as ready for review December 7, 2025 14:27
Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Geo metric not incremented for FLD_VAR_T_GEO case

When the field's unionType is FLD_VAR_T_GEO, the geoPreprocessor function returns early with REDISMODULE_OK at line 667 without incrementing RSGlobalStats.fieldsStats.geoTotalDocsIndexed. The counter increment at line 732 is only reached when control flow falls through to the end of the function. This means successfully indexed geo documents with FLD_VAR_T_GEO type won't be counted in the metric.

src/document.c#L666-L667

RediSearch/src/document.c

Lines 666 to 667 in a9f365b

}
return REDISMODULE_OK;

Fix in Cursor Fix in Web


@codecov
Copy link

codecov bot commented Dec 7, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 84.83%. Comparing base (4f470c6) to head (0ad0353).
⚠️ Report is 11 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7669      +/-   ##
==========================================
- Coverage   84.85%   84.83%   -0.03%     
==========================================
  Files         349      349              
  Lines       54611    54665      +54     
  Branches    14632    14632              
==========================================
+ Hits        46342    46373      +31     
- Misses       8072     8092      +20     
- Partials      197      200       +3     
Flag Coverage Δ
flow 85.25% <100.00%> (-0.09%) ⬇️
unit 52.08% <62.16%> (-0.05%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Collaborator

@meiravgri meiravgri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice and clean as always
Some comments

size_t num_active_indexes_indexing; // Number of active write indexes
size_t total_active_write_threads; // Total number of active writes (proportional to the number
// of threads)
size_t total_documents_indexed; // Total number of documents in all indexes
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

align comment

// of threads)
size_t total_documents_indexed; // Total number of documents in all indexes
size_t total_active_queries; // Total number of active queries (reads)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's avoid

RedisModule_InfoAddFieldULongLong(ctx, "number_of_active_indexes_indexing", total_info->num_active_indexes_indexing);
RedisModule_InfoAddFieldULongLong(ctx, "total_active_write_threads", total_info->total_active_write_threads);
RedisModule_InfoAddFieldDouble(ctx, "total_indexing_time", (float)total_info->indexing_time / (float)CLOCKS_PER_MILLISEC);
RedisModule_InfoAddFieldULongLong(ctx, "total_documents_indexed", total_info->total_documents_indexed);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should name total_documents_indexed_by_*_field and total_documents_indexed in a way that emphasizes that they reflect different values to avoid confusion (the first is an always growing counter, while the second reflects a temporary state)
i suggest:
total_documents_indexed_*_field (removed by)
total_indexes_num_docs

WDYT

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point,
what about : total_num_docs_in_indexes
with total_indexes_num_docs it's not clear what to total refers to.
For field - how about : total_indexing_ops_text_fields

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but maybe indexing ops can be translated as deletions as well?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the op is indexing so I don't think so
We also have the redis docs for clearer explanation


# 1. Test TEXT field indexing
env.expect('FT.CREATE', 'idx_text', 'PREFIX', 1, 'text:', 'SCHEMA', 't', 'TEXT').ok()
waitForIndex(env, 'idx_text')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what are we waiting for?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to stop indexing in bg

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you sure we have docs indexed at this specific point

metrics = get_field_metrics()
env.assertEqual(metrics['text'], 1, message="After 1 text doc")

conn.execute_command('HSET', 'text:2', 't', 'foo bar')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what value does adding this second doc add?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None, removing

message="Multi-field doc increments text")
env.assertEqual(metrics['tag'], prev_metrics['tag'] + 1,
message="Multi-field doc increments tag")
env.assertEqual(metrics['numeric'], prev_metrics['numeric'] + 1,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not to test all field types? this is the most interesting test case


# 7. Test double counting with overlapping indexes
# Create another text index that will also match 'text:*' docs
env.expect('FT.CREATE', 'idx_text2', 'PREFIX', 1, 'text:', 'SCHEMA', 't', 'TEXT').ok()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider adding an index with more than one field of the same type then:

  1. a doc that matches only one of them
  2. a doc that contains both
  3. a doc that contains none (not necessary)

src/document.c Outdated
// //TODO: GEOMETRY
// }
}
RSGlobalStats.fieldsStats.geometryTotalDocsIndexed++;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Geometry metric increments without indexing multi-value fields

In geometryIndexer, when fdata->isMulti is true (multi-value geometry fields), the else branch at lines 568-572 contains only commented-out TODO code - no actual indexing occurs. However, geometryTotalDocsIndexed is incremented unconditionally at line 573 and the function returns success. This causes the metric to count documents as indexed when no geometry indexing actually happened for multi-value cases.

Fix in Cursor Fix in Web

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👆

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wonder why it doesn't at least returns an error

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lerman25 lets add a test for multi value json (simple and short. maybe only a doc having all field types)

src/document.c Outdated
ctx->spec->stats.numRecords += rv.numRecords;
}
}
bool isGeo = (fs->types & INDEXFLD_T_GEO) != 0;
Copy link
Collaborator

@meiravgri meiravgri Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this made me wonder rather we should have the metric update in one location where we iterate over the field types
Document_AddToIndexes for example and use the hash map to detect the field type like in FieldsGlobalStats_UpdateIndexError
because currently it feels kinda random and hard to maintain.
For example, text fields are updated in the preprocessor and numeric in the indexer. It's inconsistent

Copy link
Collaborator

@meiravgri meiravgri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few comments
consider to centrelize metric update in one location

case INDEXFLD_T_GEOMETRY:
RSGlobalStats.fieldsStats.geometryTotalDocsIndexed += toAdd;
break;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Switch statement fails for multi-type dynamic fields

The FieldsGlobalStats_UpdateFieldDocsIndexed function uses a switch statement with direct equality matching on fs->types. Since FieldType is a bit field where dynamic fields can have multiple types OR'd together (e.g., INDEXFLD_T_FULLTEXT | INDEXFLD_T_TAG), the switch will not match any case for multi-type fields, causing no statistics to be recorded. The existing FieldsGlobalStats_UpdateStats function correctly uses bitwise AND (fs->types & INDEXFLD_T_...) to handle this case.

Fix in Cursor Fix in Web

Copy link
Collaborator

@meiravgri meiravgri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 small comments

size_t numVectorFieldsSvsVamana;
size_t numVectorFieldsSvsVamanaCompressed;
// Total number of documents indexed by each field type
// Indexing documents happens only in the main thread or with the GIL locked.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why removed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved to the update function (more relevant there)

src/document.c Outdated
// //TODO: GEOMETRY
// }
}
RSGlobalStats.fieldsStats.geometryTotalDocsIndexed++;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lerman25 lets add a test for multi value json (simple and short. maybe only a doc having all field types)

case INDEXFLD_T_GEOMETRY:
RSGlobalStats.fieldsStats.geometryTotalDocsIndexed += toAdd;
break;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Switch on bitmask silently ignores multi-type fields

The FieldsGlobalStats_UpdateFieldDocsIndexed function uses a switch statement on fs->types, which is a bitmask where multiple field types can be combined (e.g., INDEXFLD_T_FULLTEXT | INDEXFLD_T_NUMERIC = 0x03). When a field has multiple types (a "dynamic" field), none of the switch cases match and no counter is incremented. The existing FieldsGlobalStats_UpdateStats function correctly uses bitwise AND checks (if (fs->types & INDEXFLD_T_FULLTEXT)) to handle multi-type fields. This causes the total_indexing_ops_<field_type>_fields metrics to silently undercount for dynamic fields.

Fix in Cursor Fix in Web

Copy link
Collaborator

@meiravgri meiravgri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perfect

@lerman25 lerman25 marked this pull request as draft December 11, 2025 15:21
@lerman25 lerman25 marked this pull request as ready for review December 11, 2025 15:21
@lerman25 lerman25 enabled auto-merge December 11, 2025 15:22
@lerman25 lerman25 marked this pull request as draft December 11, 2025 20:00
auto-merge was automatically disabled December 11, 2025 20:00

Pull request was converted to draft

@lerman25 lerman25 marked this pull request as ready for review December 11, 2025 20:00
@lerman25 lerman25 enabled auto-merge December 11, 2025 20:00
@lerman25 lerman25 added this pull request to the merge queue Dec 11, 2025
Merged via the queue into master with commit b346977 Dec 11, 2025
52 of 56 checks passed
@lerman25 lerman25 deleted the omerl-extend-indexing-metrics branch December 11, 2025 22:19
@redisearch-backport-pull-request
Copy link
Contributor

Backport failed for 2.8, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin 2.8
git worktree add -d .worktree/backport-7669-to-2.8 origin/2.8
cd .worktree/backport-7669-to-2.8
git switch --create backport-7669-to-2.8
git cherry-pick -x b34697777f7555418ac5c99afa5c68106976f07d

@redisearch-backport-pull-request
Copy link
Contributor

Backport failed for 2.10, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin 2.10
git worktree add -d .worktree/backport-7669-to-2.10 origin/2.10
cd .worktree/backport-7669-to-2.10
git switch --create backport-7669-to-2.10
git cherry-pick -x b34697777f7555418ac5c99afa5c68106976f07d

redisearch-backport-pull-request bot pushed a commit that referenced this pull request Dec 11, 2025
* Total num indexed metric

* total docs indexed by field type

* test total indexed metic

* test field metric

* format tests

* remove empty line

* remove unecessary wait

* meirav comments

* more comments + change metric name

* expose and test geometry

* Unify metric

* test multi json

(cherry picked from commit b346977)
@redisearch-backport-pull-request
Copy link
Contributor

Successfully created backport PR for 8.2:

redisearch-backport-pull-request bot pushed a commit that referenced this pull request Dec 11, 2025
* Total num indexed metric

* total docs indexed by field type

* test total indexed metic

* test field metric

* format tests

* remove empty line

* remove unecessary wait

* meirav comments

* more comments + change metric name

* expose and test geometry

* Unify metric

* test multi json

(cherry picked from commit b346977)
@redisearch-backport-pull-request
Copy link
Contributor

Successfully created backport PR for 8.4:

github-merge-queue bot pushed a commit that referenced this pull request Dec 12, 2025
[MOD-12070] Extend indexing metrics (#7669)

* Total num indexed metric

* total docs indexed by field type

* test total indexed metic

* test field metric

* format tests

* remove empty line

* remove unecessary wait

* meirav comments

* more comments + change metric name

* expose and test geometry

* Unify metric

* test multi json

(cherry picked from commit b346977)

Co-authored-by: lerman25 <[email protected]>
lerman25 added a commit that referenced this pull request Dec 14, 2025
* Total num indexed metric

* total docs indexed by field type

* test total indexed metic

* test field metric

* format tests

* remove empty line

* remove unecessary wait

* meirav comments

* more comments + change metric name

* expose and test geometry

* Unify metric

* test multi json

(cherry picked from commit b346977)
github-merge-queue bot pushed a commit that referenced this pull request Dec 15, 2025
* [MOD-12070] Extend indexing metrics (#7669)

* Total num indexed metric

* total docs indexed by field type

* test total indexed metic

* test field metric

* format tests

* remove empty line

* remove unecessary wait

* meirav comments

* more comments + change metric name

* expose and test geometry

* Unify metric

* test multi json

(cherry picked from commit b346977)

* remove no json

* remove total indexing time
redisearch-backport-pull-request bot pushed a commit that referenced this pull request Dec 15, 2025
* [MOD-12070] Extend indexing metrics (#7669)

* Total num indexed metric

* total docs indexed by field type

* test total indexed metic

* test field metric

* format tests

* remove empty line

* remove unecessary wait

* meirav comments

* more comments + change metric name

* expose and test geometry

* Unify metric

* test multi json

(cherry picked from commit b346977)

* remove no json

* remove total indexing time

(cherry picked from commit 8ea201f)
github-merge-queue bot pushed a commit that referenced this pull request Dec 15, 2025
[MOD-12070] Extend indexing metrics (#7669)

* Total num indexed metric

* total docs indexed by field type

* test total indexed metic

* test field metric

* format tests

* remove empty line

* remove unecessary wait

* meirav comments

* more comments + change metric name

* expose and test geometry

* Unify metric

* test multi json

(cherry picked from commit b346977)

Co-authored-by: lerman25 <[email protected]>
github-merge-queue bot pushed a commit that referenced this pull request Dec 15, 2025
[2.10] [MOD-12070] Extend indexing metrics (#7669) (#7766)

* [MOD-12070] Extend indexing metrics (#7669)

* Total num indexed metric

* total docs indexed by field type

* test total indexed metic

* test field metric

* format tests

* remove empty line

* remove unecessary wait

* meirav comments

* more comments + change metric name

* expose and test geometry

* Unify metric

* test multi json

(cherry picked from commit b346977)

* remove no json

* remove total indexing time

(cherry picked from commit 8ea201f)

Co-authored-by: lerman25 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants