[MOD-12070] Extend indexing metrics#7669
Conversation
There was a problem hiding this comment.
Bug: Geo metric not incremented for FLD_VAR_T_GEO case
When the field's unionType is FLD_VAR_T_GEO, the geoPreprocessor function returns early with REDISMODULE_OK at line 667 without incrementing RSGlobalStats.fieldsStats.geoTotalDocsIndexed. The counter increment at line 732 is only reached when control flow falls through to the end of the function. This means successfully indexed geo documents with FLD_VAR_T_GEO type won't be counted in the metric.
src/document.c#L666-L667
Lines 666 to 667 in a9f365b
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #7669 +/- ##
==========================================
- Coverage 84.85% 84.83% -0.03%
==========================================
Files 349 349
Lines 54611 54665 +54
Branches 14632 14632
==========================================
+ Hits 46342 46373 +31
- Misses 8072 8092 +20
- Partials 197 200 +3
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
meiravgri
left a comment
There was a problem hiding this comment.
Nice and clean as always
Some comments
src/info/indexes_info.h
Outdated
| size_t num_active_indexes_indexing; // Number of active write indexes | ||
| size_t total_active_write_threads; // Total number of active writes (proportional to the number | ||
| // of threads) | ||
| size_t total_documents_indexed; // Total number of documents in all indexes |
src/info/indexes_info.h
Outdated
| // of threads) | ||
| size_t total_documents_indexed; // Total number of documents in all indexes | ||
| size_t total_active_queries; // Total number of active queries (reads) | ||
|
|
src/info/info_redis/info_redis.c
Outdated
| RedisModule_InfoAddFieldULongLong(ctx, "number_of_active_indexes_indexing", total_info->num_active_indexes_indexing); | ||
| RedisModule_InfoAddFieldULongLong(ctx, "total_active_write_threads", total_info->total_active_write_threads); | ||
| RedisModule_InfoAddFieldDouble(ctx, "total_indexing_time", (float)total_info->indexing_time / (float)CLOCKS_PER_MILLISEC); | ||
| RedisModule_InfoAddFieldULongLong(ctx, "total_documents_indexed", total_info->total_documents_indexed); |
There was a problem hiding this comment.
I think we should name total_documents_indexed_by_*_field and total_documents_indexed in a way that emphasizes that they reflect different values to avoid confusion (the first is an always growing counter, while the second reflects a temporary state)
i suggest:
total_documents_indexed_*_field (removed by)
total_indexes_num_docs
WDYT
There was a problem hiding this comment.
I see your point,
what about : total_num_docs_in_indexes
with total_indexes_num_docs it's not clear what to total refers to.
For field - how about : total_indexing_ops_text_fields
There was a problem hiding this comment.
but maybe indexing ops can be translated as deletions as well?
There was a problem hiding this comment.
the op is indexing so I don't think so
We also have the redis docs for clearer explanation
tests/pytests/test_info_modules.py
Outdated
|
|
||
| # 1. Test TEXT field indexing | ||
| env.expect('FT.CREATE', 'idx_text', 'PREFIX', 1, 'text:', 'SCHEMA', 't', 'TEXT').ok() | ||
| waitForIndex(env, 'idx_text') |
There was a problem hiding this comment.
what are we waiting for?
There was a problem hiding this comment.
to stop indexing in bg
There was a problem hiding this comment.
are you sure we have docs indexed at this specific point
tests/pytests/test_info_modules.py
Outdated
| metrics = get_field_metrics() | ||
| env.assertEqual(metrics['text'], 1, message="After 1 text doc") | ||
|
|
||
| conn.execute_command('HSET', 'text:2', 't', 'foo bar') |
There was a problem hiding this comment.
what value does adding this second doc add?
| message="Multi-field doc increments text") | ||
| env.assertEqual(metrics['tag'], prev_metrics['tag'] + 1, | ||
| message="Multi-field doc increments tag") | ||
| env.assertEqual(metrics['numeric'], prev_metrics['numeric'] + 1, |
There was a problem hiding this comment.
Why not to test all field types? this is the most interesting test case
|
|
||
| # 7. Test double counting with overlapping indexes | ||
| # Create another text index that will also match 'text:*' docs | ||
| env.expect('FT.CREATE', 'idx_text2', 'PREFIX', 1, 'text:', 'SCHEMA', 't', 'TEXT').ok() |
There was a problem hiding this comment.
consider adding an index with more than one field of the same type then:
- a doc that matches only one of them
- a doc that contains both
- a doc that contains none (not necessary)
src/document.c
Outdated
| // //TODO: GEOMETRY | ||
| // } | ||
| } | ||
| RSGlobalStats.fieldsStats.geometryTotalDocsIndexed++; |
There was a problem hiding this comment.
Bug: Geometry metric increments without indexing multi-value fields
In geometryIndexer, when fdata->isMulti is true (multi-value geometry fields), the else branch at lines 568-572 contains only commented-out TODO code - no actual indexing occurs. However, geometryTotalDocsIndexed is incremented unconditionally at line 573 and the function returns success. This causes the metric to count documents as indexed when no geometry indexing actually happened for multi-value cases.
There was a problem hiding this comment.
wonder why it doesn't at least returns an error
There was a problem hiding this comment.
@lerman25 lets add a test for multi value json (simple and short. maybe only a doc having all field types)
src/document.c
Outdated
| ctx->spec->stats.numRecords += rv.numRecords; | ||
| } | ||
| } | ||
| bool isGeo = (fs->types & INDEXFLD_T_GEO) != 0; |
There was a problem hiding this comment.
this made me wonder rather we should have the metric update in one location where we iterate over the field types
Document_AddToIndexes for example and use the hash map to detect the field type like in FieldsGlobalStats_UpdateIndexError
because currently it feels kinda random and hard to maintain.
For example, text fields are updated in the preprocessor and numeric in the indexer. It's inconsistent
meiravgri
left a comment
There was a problem hiding this comment.
Few comments
consider to centrelize metric update in one location
| case INDEXFLD_T_GEOMETRY: | ||
| RSGlobalStats.fieldsStats.geometryTotalDocsIndexed += toAdd; | ||
| break; | ||
| } |
There was a problem hiding this comment.
Bug: Switch statement fails for multi-type dynamic fields
The FieldsGlobalStats_UpdateFieldDocsIndexed function uses a switch statement with direct equality matching on fs->types. Since FieldType is a bit field where dynamic fields can have multiple types OR'd together (e.g., INDEXFLD_T_FULLTEXT | INDEXFLD_T_TAG), the switch will not match any case for multi-type fields, causing no statistics to be recorded. The existing FieldsGlobalStats_UpdateStats function correctly uses bitwise AND (fs->types & INDEXFLD_T_...) to handle this case.
src/info/global_stats.h
Outdated
| size_t numVectorFieldsSvsVamana; | ||
| size_t numVectorFieldsSvsVamanaCompressed; | ||
| // Total number of documents indexed by each field type | ||
| // Indexing documents happens only in the main thread or with the GIL locked. |
There was a problem hiding this comment.
moved to the update function (more relevant there)
src/document.c
Outdated
| // //TODO: GEOMETRY | ||
| // } | ||
| } | ||
| RSGlobalStats.fieldsStats.geometryTotalDocsIndexed++; |
There was a problem hiding this comment.
@lerman25 lets add a test for multi value json (simple and short. maybe only a doc having all field types)
| case INDEXFLD_T_GEOMETRY: | ||
| RSGlobalStats.fieldsStats.geometryTotalDocsIndexed += toAdd; | ||
| break; | ||
| } |
There was a problem hiding this comment.
Bug: Switch on bitmask silently ignores multi-type fields
The FieldsGlobalStats_UpdateFieldDocsIndexed function uses a switch statement on fs->types, which is a bitmask where multiple field types can be combined (e.g., INDEXFLD_T_FULLTEXT | INDEXFLD_T_NUMERIC = 0x03). When a field has multiple types (a "dynamic" field), none of the switch cases match and no counter is incremented. The existing FieldsGlobalStats_UpdateStats function correctly uses bitwise AND checks (if (fs->types & INDEXFLD_T_FULLTEXT)) to handle multi-type fields. This causes the total_indexing_ops_<field_type>_fields metrics to silently undercount for dynamic fields.
Pull request was converted to draft
|
Backport failed for Please cherry-pick the changes locally and resolve any conflicts. git fetch origin 2.8
git worktree add -d .worktree/backport-7669-to-2.8 origin/2.8
cd .worktree/backport-7669-to-2.8
git switch --create backport-7669-to-2.8
git cherry-pick -x b34697777f7555418ac5c99afa5c68106976f07d |
|
Backport failed for Please cherry-pick the changes locally and resolve any conflicts. git fetch origin 2.10
git worktree add -d .worktree/backport-7669-to-2.10 origin/2.10
cd .worktree/backport-7669-to-2.10
git switch --create backport-7669-to-2.10
git cherry-pick -x b34697777f7555418ac5c99afa5c68106976f07d |
* Total num indexed metric * total docs indexed by field type * test total indexed metic * test field metric * format tests * remove empty line * remove unecessary wait * meirav comments * more comments + change metric name * expose and test geometry * Unify metric * test multi json (cherry picked from commit b346977)
|
Successfully created backport PR for |
* Total num indexed metric * total docs indexed by field type * test total indexed metic * test field metric * format tests * remove empty line * remove unecessary wait * meirav comments * more comments + change metric name * expose and test geometry * Unify metric * test multi json (cherry picked from commit b346977)
|
Successfully created backport PR for |
[MOD-12070] Extend indexing metrics (#7669) * Total num indexed metric * total docs indexed by field type * test total indexed metic * test field metric * format tests * remove empty line * remove unecessary wait * meirav comments * more comments + change metric name * expose and test geometry * Unify metric * test multi json (cherry picked from commit b346977) Co-authored-by: lerman25 <[email protected]>
* Total num indexed metric * total docs indexed by field type * test total indexed metic * test field metric * format tests * remove empty line * remove unecessary wait * meirav comments * more comments + change metric name * expose and test geometry * Unify metric * test multi json (cherry picked from commit b346977)
* [MOD-12070] Extend indexing metrics (#7669) * Total num indexed metric * total docs indexed by field type * test total indexed metic * test field metric * format tests * remove empty line * remove unecessary wait * meirav comments * more comments + change metric name * expose and test geometry * Unify metric * test multi json (cherry picked from commit b346977) * remove no json * remove total indexing time
* [MOD-12070] Extend indexing metrics (#7669) * Total num indexed metric * total docs indexed by field type * test total indexed metic * test field metric * format tests * remove empty line * remove unecessary wait * meirav comments * more comments + change metric name * expose and test geometry * Unify metric * test multi json (cherry picked from commit b346977) * remove no json * remove total indexing time (cherry picked from commit 8ea201f)
[MOD-12070] Extend indexing metrics (#7669) * Total num indexed metric * total docs indexed by field type * test total indexed metic * test field metric * format tests * remove empty line * remove unecessary wait * meirav comments * more comments + change metric name * expose and test geometry * Unify metric * test multi json (cherry picked from commit b346977) Co-authored-by: lerman25 <[email protected]>
[2.10] [MOD-12070] Extend indexing metrics (#7669) (#7766) * [MOD-12070] Extend indexing metrics (#7669) * Total num indexed metric * total docs indexed by field type * test total indexed metic * test field metric * format tests * remove empty line * remove unecessary wait * meirav comments * more comments + change metric name * expose and test geometry * Unify metric * test multi json (cherry picked from commit b346977) * remove no json * remove total indexing time (cherry picked from commit 8ea201f) Co-authored-by: lerman25 <[email protected]>
This PR adds 2 new metrics to
INFOcommand.total_num_docs_in_indexes- Amount of docs indexed by a RediSearch index.Note - Each doc is counter per amount of indexes it is indexed by.
total_indexing_ops_<field_type>_fields- Amount of docs indexed because they have a<field_type>field.Field types include:
text,tag,numeric,geo,vector,geometric.Note
Add global counters for documents indexed per field type and total docs across indexes, update them during indexing, and expose via INFO MODULES with tests.
FieldsGlobalStats_UpdateFieldDocsIndexed(fs, 1)insrc/document.c(full-text preprocessor and bulk index success path).FieldsGlobalStats(text/tag/numeric/geo/geoshape/vector TotalDocsIndexed) and implement updater insrc/info/global_stats.c/h.total_num_docs_in_indexesinIndexesInfo_TotalInfo()(src/info/indexes_info.c/h).search_total_indexing_ops_*_fieldsand overallsearch_total_num_docs_in_indexesinsrc/info/info_redis/info_redis.c.total_num_docs_in_indexesandtotal_indexing_ops_<field_type>_fields, including multi-field docs and JSON multi-value paths (tests/pytests/test_info_modules.py).Written by Cursor Bugbot for commit 0ad0353. This will update automatically on new commits. Configure here.