
Limit together with pushdown_filters #13745

@bchalk101

Description

Describe the bug

I am trying to load a Parquet dataset using both a limit and a filter. When these are combined with the pushdown_filters config, no data is found.
If I remove either the limit or the pushdown_filters setting, then it works.

To Reproduce

The following code reproduces the issue.
The dataset here is a path in S3 containing 120 Parquet files, each with about 7,000-8,000 rows.

The rows where the filter matches must be deep inside the dataset, not in the first Parquet file.

    let object_store = Arc::new(aws_s3);
    let mut config = SessionConfig::new();
    config.options_mut().execution.parquet.pushdown_filters = true;

    let state = SessionStateBuilder::new().with_config(config).build();
    let ctx = SessionContext::from(state);
    ctx.register_object_store(object_store_url.as_ref(), object_store.clone());

    let mut parquet_options = ParquetReadOptions::new();
    parquet_options = parquet_options.parquet_pruning(true);
    let mut df = ctx
        .read_parquet(path, parquet_options.clone())
        .await
        .unwrap();

    df = df.filter(col("a").eq(lit("23asdas23"))).unwrap();
    df = df.limit(0, Some(1)).unwrap();
    let batch = df.collect().await.unwrap();

Expected behavior

Expected both the limit and the filter to be applied, i.e. the query returns the single matching row.
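For reference, a minimal SQL-form sketch of the same plan. This is an assumption-laden rewrite of the repro, not the original code: it uses a hypothetical table name `t` and a hypothetical local directory `./data` in place of the S3 path, so it can be run against a local copy of the dataset.

```rust
use datafusion::execution::session_state::SessionStateBuilder;
use datafusion::prelude::*;

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    // Same config as the repro: enable Parquet filter pushdown.
    let mut config = SessionConfig::new();
    config.options_mut().execution.parquet.pushdown_filters = true;

    let state = SessionStateBuilder::new().with_config(config).build();
    let ctx = SessionContext::from(state);

    // Hypothetical local copy of the dataset; the original repro reads from S3.
    ctx.register_parquet("t", "./data", ParquetReadOptions::default())
        .await?;

    // SQL equivalent of df.filter(col("a").eq(lit("23asdas23"))).limit(0, Some(1)).
    let batches = ctx
        .sql("SELECT * FROM t WHERE a = '23asdas23' LIMIT 1")
        .await?
        .collect()
        .await?;

    // With pushdown_filters = true this prints 0; with the flag removed it prints 1.
    let rows: usize = batches.iter().map(|b| b.num_rows()).sum();
    println!("rows: {rows}");
    Ok(())
}
```

If the bug reproduces in this SQL form as well, that would rule out the DataFrame API as the source and point at the physical plan (limit interacting with the pushed-down filter in the Parquet scan).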

Additional context

No response

Labels

bug (Something isn't working)