Skip to content

Conversation

@yaga-simha
Copy link
Contributor

PR Type

Fix

Description

Query recommendations were adjusted to be based on triggers with retries upon failure. Instead of relying on retry, we are simply skipping this run for the next generation of query_recommendations.

Triggers and retries were used in hopes that the non-ingester node will eventually give up and an ingester node may pick up this trigger. This was a bad idea from the get go. We use ingester_service::ingest now which finds the ingester node to upload to it regardless of where the ingestions request comes from.

@github-actions github-actions bot added the ☢️ Bug Something isn't working label Oct 31, 2025
@github-actions
Copy link
Contributor

Failed to generate code suggestions for PR

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

Refactored query recommendations to use ingestion_service::ingest instead of direct Ingester gRPC calls, ensuring ingestion requests are properly routed to ingester nodes regardless of where they originate. Removed the retry mechanism from the trigger handler - the job now always schedules the next run regardless of success or failure, preventing issues where non-ingester nodes would retry indefinitely.

Key Changes:

  • Replaced Ingester::ingest() with ingestion_service::ingest() in query_optimization_recommendation.rs:89-91 to properly route to ingester nodes
  • Removed retry logic from handlers.rs that decremented retries and set status to Processing
  • Simplified error handling to always queue next run, preventing the trigger from getting stuck
  • Added consistent [QUERY_RECOMMENDATIONS] prefix to all log messages for better traceability

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The changes are well-structured and address a real architectural issue. The switch from direct Ingester calls to ingestion_service::ingest ensures proper routing to ingester nodes. Removing the retry mechanism simplifies the logic and prevents the system from getting stuck. The changes maintain existing error logging while ensuring the scheduler continues to function. All modifications are focused and don't introduce new complexity or potential bugs.
  • No files require special attention

Important Files Changed

File Analysis

Filename Score Overview
src/main.rs 5/5 Added logging statement to track initial query recommendations trigger setup
src/service/alerts/scheduler/handlers.rs 5/5 Removed retry mechanism and simplified error handling to always queue next run regardless of success/failure
src/service/alerts/scheduler/query_optimization_recommendation.rs 5/5 Replaced direct gRPC Ingester call with ingestion_service::ingest to route to ingester node

Sequence Diagram

sequenceDiagram
    participant Scheduler
    participant Handler as Query Recommendations Handler
    participant Service as QueryRecommendationService
    participant IngestService as ingestion_service
    participant Ingester as Ingester Node
    participant DB as Trigger Database

    Scheduler->>Handler: Trigger query_recommendations job
    Note over Handler: No retry decrement<br/>(removed in this PR)
    Handler->>Service: run()
    Service->>Service: Analyze queries & generate recommendations
    Service->>IngestService: ingest(recommendations)
    Note over IngestService: NEW: Routes to ingester node<br/>instead of local call
    IngestService->>Ingester: Forward request to ingester node
    Ingester-->>IngestService: IngestionResponse
    IngestService-->>Service: Result
    Service-->>Handler: Result (success or error)
    Handler->>DB: update_trigger(status=Waiting, retries=3, next_run_at)
    Note over Handler: Always schedules next run<br/>regardless of result
    Handler->>Handler: Log error if failed (but don't propagate)
    Handler-->>Scheduler: Ok(())
Loading

3 files reviewed, no comments

Edit Code Review Agent Settings | Greptile

@testdino-playwright-reporter
Copy link

⚠️ Test Run Unstable


Author: yaga-simha | Branch: fix/query_recommendations | Commit: 4326350

Testdino Test Results

Status Total Passed Failed Skipped Flaky Pass Rate Duration
All tests passed 366 340 0 19 7 93% 5m 42s

View Detailed Results

@yaga-simha yaga-simha merged commit 321bc4c into main Oct 31, 2025
33 of 34 checks passed
@yaga-simha yaga-simha deleted the fix/query_recommendations branch October 31, 2025 08:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

☢️ Bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants