Enable Consistent Data Push on Ingestion Jobs for REFRESH use case (standalone only) #9268

@yuanbenson

Description

The consistent data push protocol is available through controller REST APIs such as startReplaceSegments, endReplaceSegments, and revertReplaceSegments. However, ingestion jobs are not yet wired to use this feature.

Introduce a new boolean, consistentDataPush, in TableConfig -> ingestionConfig -> batchIngestionConfig. When it is enabled, batch ingestion jobs in REFRESH mode run in consistent data push mode.
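For illustration, the proposed flag might appear in a table config roughly as follows. This is only a sketch: `myTable` is a placeholder name, `segmentIngestionType` is assumed to be the existing batch ingestion setting that selects REFRESH mode, and the rest of the table config is omitted.

```json
{
  "tableName": "myTable",
  "tableType": "OFFLINE",
  "ingestionConfig": {
    "batchIngestionConfig": {
      "segmentIngestionType": "REFRESH",
      "consistentDataPush": true
    }
  }
}
```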

Goal of consistent push: support atomic switching (at the broker level) between data snapshots, eliminating the window in which a query is computed from an inconsistent mix of existing and new data. It should also provide an easy way to roll back to the previous data after a bad data push.
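The flow an ingestion job would follow can be sketched in Java as below. This is a minimal illustration, not the actual Pinot implementation: `ReplaceSegmentsApi`, `InMemoryController`, `SegmentUploader`, and `consistentPush` are hypothetical names, and the in-memory controller only mimics the start/end/revert calls named above rather than calling the real REST endpoints.

```java
import java.util.*;

public class ConsistentPushSketch {

    /** Hypothetical stand-in for the controller's start/end/revert replace-segments calls. */
    interface ReplaceSegmentsApi {
        String startReplaceSegments(List<String> segmentsFrom, List<String> segmentsTo);
        void endReplaceSegments(String lineageEntryId);
        void revertReplaceSegments(String lineageEntryId);
    }

    /** Hypothetical upload step (tar, URI, or metadata push in a real job). */
    interface SegmentUploader {
        void upload(String segmentName) throws Exception;
    }

    /** In-memory fake that tracks which segments queries would currently see. */
    static class InMemoryController implements ReplaceSegmentsApi {
        private final Set<String> visible = new TreeSet<>();
        private final Map<String, List<List<String>>> lineage = new HashMap<>();
        private int nextId = 0;

        InMemoryController(Collection<String> initialSegments) {
            visible.addAll(initialSegments);
        }

        @Override
        public String startReplaceSegments(List<String> from, List<String> to) {
            String id = "entry-" + nextId++;
            // Record the intended swap; the visible set is untouched, so queries still see old data.
            lineage.put(id, List.of(List.copyOf(from), List.copyOf(to)));
            return id;
        }

        @Override
        public void endReplaceSegments(String id) {
            // Switch from the old snapshot to the new one in a single step.
            List<List<String>> entry = lineage.get(id);
            visible.removeAll(entry.get(0));
            visible.addAll(entry.get(1));
        }

        @Override
        public void revertReplaceSegments(String id) {
            // Restore the old snapshot, whether or not the swap had completed.
            List<List<String>> entry = lineage.get(id);
            visible.removeAll(entry.get(1));
            visible.addAll(entry.get(0));
        }

        Set<String> visibleSegments() {
            return new TreeSet<>(visible);
        }
    }

    /** Runs start -> upload -> end, and reverts if any upload fails. Returns true on success. */
    static boolean consistentPush(ReplaceSegmentsApi api, SegmentUploader uploader,
                                  List<String> oldSegments, List<String> newSegments) {
        String id = api.startReplaceSegments(oldSegments, newSegments);
        try {
            for (String segment : newSegments) {
                uploader.upload(segment);
            }
            api.endReplaceSegments(id);
            return true;
        } catch (Exception e) {
            api.revertReplaceSegments(id);
            return false;
        }
    }

    public static void main(String[] args) {
        InMemoryController controller = new InMemoryController(List.of("seg_v1_0", "seg_v1_1"));
        boolean ok = consistentPush(controller, name -> { /* pretend the upload succeeded */ },
                List.of("seg_v1_0", "seg_v1_1"), List.of("seg_v2_0"));
        System.out.println(ok + " " + controller.visibleSegments());
    }
}
```

The point of the sketch is that the visible set changes only inside the end call, so a failed upload leaves queries on the previous snapshot, and a completed push can still be undone with the revert call.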

See #7813 for more details.

Task breakdown for this issue:

  1. Improve test coverage for pinot-batch-ingestion-standalone jobs to cover SegmentMetadataPushJobRunner,
    SegmentTarPushJobRunner, and SegmentUriPushJobRunner.
  2. Refactor the logic common to all push job runners into a new abstract class, BaseSegmentPushJobRunner.
  3. Make the main change: enable consistent data push on ingestion jobs.
