Skip to content

refactor NotIterator - [MOD-9677]#6238

Merged
JoanFM merged 48 commits intomasterfrom
joan-refactor-no-iterator
Jun 19, 2025
Merged

refactor NotIterator - [MOD-9677]#6238
JoanFM merged 48 commits intomasterfrom
joan-refactor-no-iterator

Conversation

@JoanFM
Copy link
Collaborator

@JoanFM JoanFM commented Jun 2, 2025

Describe the changes in the pull request

This will be the continuation of #6307

The PR does:

  • Provide the functionality of the NotIterator
  • Adds testing
  • Adds micro-benchmarks

Mark if applicable

  • This PR introduces API changes
  • This PR introduces serialization changes

Note

I strongly believe that the OldIterator has a bug.

The Old implementation assumes that there are no DocID in the childIterator that does not exist in the wildcard iterator(the child iterator (the negation functions) seems to be assumed to be a subgroup of the WildCard iterator).

This test fails for OldIterator:

#include "gtest/gtest.h"
#include "redisearch.h"
#include "index.h"
#include "query.h"
#include "micro-benchmarks/deprecated_iterator_util.h"
#include <vector>
#include <algorithm>
#include <iostream>

class NotIteratorOldTest : public ::testing::Test {
protected:
  IndexIterator *iterator_base;
  std::vector<t_docId> childDocIds = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
  std::vector<t_docId> wcDocIds = {3, 4, 9};
  std::vector<t_docId> resultSet;
  t_docId maxDocId = 10;

  void SetUp() override {
    // Create the child iterator with the specified document IDs
    IndexIterator *child = (IndexIterator *)new MockOldIterator(childDocIds);

    // Create the wildcard iterator with the specified document IDs
    IndexIterator *wcii = (IndexIterator *)new MockOldIterator(wcDocIds);

    // Set up the timeout
    struct timespec timeout = {LONG_MAX, 999999999}; // "infinite" timeout

    // Create the NOT iterator with the child and wildcard iterators
    iterator_base = _New_NotIterator_With_WildCardIterator(child, wcii, maxDocId, 1.0, timeout);

    // Compute the expected result set
    // For the optimized NOT iterator, the result set is all documents in the wildcard iterator
    // that are not in the child iterator
    resultSet.clear();
    for (auto wcId : wcDocIds) {
      if (std::find(childDocIds.begin(), childDocIds.end(), wcId) == childDocIds.end()) {
        resultSet.push_back(wcId);
      }
    }

    // Print test setup information
    std::cout << "Child Doc IDs: ";
    for (auto id : childDocIds) {
      std::cout << id << " ";
    }
    std::cout << std::endl;

    std::cout << "Wildcard Doc IDs: ";
    for (auto id : wcDocIds) {
      std::cout << id << " ";
    }
    std::cout << std::endl;

    std::cout << "Expected Result Set: ";
    if (resultSet.empty()) {
      std::cout << "(empty)";
    } else {
      for (auto id : resultSet) {
        std::cout << id << " ";
      }
    }
    std::cout << std::endl;
  }

  void TearDown() override {
    iterator_base->Free(iterator_base);
  }
};

TEST_F(NotIteratorOldTest, ReadOptimized) {
  RSIndexResult *hit = iterator_base->current;
  int rc;

  std::cout << "=== READING RESULTS ===\n";
  size_t count = 0;

  // Read all results from the iterator
  while ((rc = iterator_base->Read(iterator_base->ctx, &hit)) == INDEXREAD_OK) {
    std::cout << "Read result: docId=" << hit->docId << std::endl;
    // Verify the result is in our expected result set
    //ASSERT_EQ(hit->docId, resultSet[count]);
    count++;
  }
  ASSERT_EQ(count, 0);

  // Verify we reached EOF
  ASSERT_EQ(rc, INDEXREAD_EOF);

  // Verify we read the expected number of results
  ASSERT_EQ(count, resultSet.size()) << "Expected to read " << resultSet.size() << " documents";
  std::cout << "Read " << count << " results (expected " << resultSet.size() << ")" << std::endl;

  // Try reading again after EOF
  rc = iterator_base->Read(iterator_base->ctx, &hit);
  ASSERT_EQ(rc, INDEXREAD_EOF);
  std::cout << "=== FINISHED READING ===\n";
}

This seems to be OK for most of the times because the QueryEvalNode filters the Not Iterator conditions before creating the ChildIterator, but there may be some edge cases where this is not true.

This makes the Benchmarks look bad for this NEW implementation.

However, I added a set of tests where the conditions are met (_Subset) in the micro benchmarks, that show that the performance is comparable or slightly better for the new implementation

@JoanFM JoanFM changed the title refactor NotIterator - [MOD- 884 refactor NotIterator - [MOD- 8843] Jun 2, 2025
@JoanFM JoanFM marked this pull request as ready for review June 2, 2025 15:35
@JoanFM JoanFM requested review from BenGoldberger and GuyAv46 June 2, 2025 15:35
@JoanFM JoanFM force-pushed the joan-refactor-no-iterator branch from cf924bd to a3ba5ce Compare June 2, 2025 15:53
@codecov
Copy link

codecov bot commented Jun 2, 2025

Codecov Report

Attention: Patch coverage is 95.16129% with 3 lines in your changes missing coverage. Please review.

Project coverage is 88.86%. Comparing base (9b2cdea) to head (5b56b8c).
Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
src/iterators/not_iterator.c 95.16% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6238      +/-   ##
==========================================
- Coverage   88.87%   88.86%   -0.01%     
==========================================
  Files         247      247              
  Lines       41213    41273      +60     
  Branches     3483     3483              
==========================================
+ Hits        36629    36679      +50     
- Misses       4541     4551      +10     
  Partials       43       43              
Flag Coverage Δ
flow 81.40% <0.00%> (-0.28%) ⬇️
unit 46.76% <95.16%> (+0.08%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@JoanFM JoanFM changed the title refactor NotIterator - [MOD- 8843] refactor NotIterator - [MOD-8843 - MOD-8845] Jun 2, 2025
@JoanFM
Copy link
Collaborator Author

JoanFM commented Jun 3, 2025

Putting it as draft until #6240 and #6241 are merged so that this PR is smaller

@JoanFM JoanFM marked this pull request as draft June 3, 2025 07:01
@JoanFM JoanFM changed the title refactor NotIterator - [MOD-8843 - MOD-8845] refactor NotIterator - [MOD-8843] Jun 5, 2025
@JoanFM JoanFM removed the size:XL label Jun 5, 2025
@JoanFM JoanFM marked this pull request as ready for review June 5, 2025 09:56
@JoanFM JoanFM changed the title refactor NotIterator - [MOD-8843] refactor NotIterator - [MOD-9677] Jun 12, 2025
@JoanFM JoanFM marked this pull request as draft June 12, 2025 13:13
@JoanFM JoanFM marked this pull request as ready for review June 13, 2025 09:58
@JoanFM JoanFM marked this pull request as draft June 13, 2025 09:58
@JoanFM JoanFM requested a review from GuyAv46 June 17, 2025 14:19
@JoanFM JoanFM marked this pull request as ready for review June 17, 2025 14:19
@JoanFM JoanFM removed the size:XL label Jun 18, 2025
@JoanFM JoanFM requested a review from GuyAv46 June 19, 2025 06:59
@JoanFM JoanFM requested a review from GuyAv46 June 19, 2025 08:50
@JoanFM JoanFM added this pull request to the merge queue Jun 19, 2025
Merged via the queue into master with commit 57d938d Jun 19, 2025
15 checks passed
@JoanFM JoanFM deleted the joan-refactor-no-iterator branch June 19, 2025 13:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants