Skip to content

[Proposal] Pinot Multistage Engine Lite Mode #14640

@ankitsultana

Description

@ankitsultana

Note: We will be sharing a design doc soon. We are working on testing this out in one of our clusters via a prototype to get a sense of the scalability characteristics of this approach.

Overview

Earlier today, we released an Engineering Blog on our use of Neutrino at Uber, and how that has helped us serve complex queries that can't be served by the V1 Engine at 100 and even 1000+ QPS.

We want to bring the same approach to Pinot's Multistage Engine, via a new mode which I am calling the "Express Mode" right now.

The idea is that instead of relying on shuffles, you try to run the maximal sub-plan that you can independently in the servers, and run the remaining plan in the broker.

Example: for a query plan such as follows, which can be common for window function queries that leverage an aggregation after the window function, with the express mode, Pinot will run as much of the plan as it can in the servers without any shuffles with the remaining plan being run in the broker.

So in the simple case, we would run the Leaf stage in the servers, and the rest of the plan in the broker. If we are able to support auto-colocation and the data is partitioned by the partition-key of the window function, then we may be able to run Agg > Filter > Window > Sort Exch. > Leaf independently in the servers and run just the final aggregation in the broker.

sample-window-fn-plan

Benefits

The current Multistage Engine enables Pinot to process a large amount of data in really complex queries. The goal of the Express Mode is to support relatively simpler queries, that process a relatively smaller amount of data, at lower latencies and higher QPS.

Challenges

There are several challenges in supporting something like this, and we outline some of them below (will be discussed in detail in the design doc):

  • We want to rely as little as possible on Query Hints. Perhaps a single SET statement option should be all that's required to enable express mode, with a broker/server level config to set the default mode.
  • Since our goal is to avoid processing excessively large data, we need to find a way to limit the amount of data processed and avoid expensive queries. There are several different approaches for this: limiting the data returned by the Leaf stage, limiting the data returned by the servers, etc. A design doc is a better medium to discuss all of these in detail, but our guiding principle would be to avoid new configs and making the semantics as intuitive as possible for users to understand. As our blog calls out, this is one of the major limitations of our Neutrino based approach.
  • From a product perspective and even from a technical perspective, this mode should sit cohesively with the rest of the features in Pinot. The last thing we would like is to complicate the design and the offerings further.

Metadata

Metadata

Assignees

No one assigned

    Labels

    multi-stageRelated to the multi-stage query engine

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions