Closed
Labels: bug
Description
Describe the bug
I am trying to load a Parquet dataset using both a limit and a filter. When these are combined with the `pushdown_filters` config option, no data is found.
If I remove either the limit or `pushdown_filters`, it works.
To Reproduce
The following code reproduces the issue.
The dataset here is a path in S3 containing 120 Parquet files, each with about 7,000–8,000 rows.
The rows that match the filter are deep inside the dataset, not in the first Parquet file.
```rust
let object_store = Arc::new(aws_s3);

let mut config = SessionConfig::new();
config.options_mut().execution.parquet.pushdown_filters = true;

let state = SessionStateBuilder::new().with_config(config).build();
let ctx = SessionContext::from(state);
ctx.register_object_store(object_store_url.as_ref(), object_store.clone());

let parquet_options = ParquetReadOptions::new().parquet_pruning(true);

let mut df = ctx
    .read_parquet(path, parquet_options.clone())
    .await
    .unwrap();
df = df.filter(col("a").eq(lit("23asdas23"))).unwrap();
df = df.limit(0, Some(1)).unwrap();
let batch = df.collect().await.unwrap();
```
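The same shape of query can be sketched without the S3 setup, reading from a local directory instead. This is an assumption-laden sketch, not the exact failing setup: the `./data/` path and column name `a` are placeholders, and a local run may or may not trigger the bug since the original report depends on the matching rows sitting deep in a multi-file dataset.

```rust
use datafusion::prelude::*;

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    // Enable the filter-pushdown option implicated in the report.
    let mut config = SessionConfig::new();
    config.options_mut().execution.parquet.pushdown_filters = true;
    let ctx = SessionContext::new_with_config(config);

    // Hypothetical local directory of Parquet files standing in for the S3 path.
    let df = ctx
        .read_parquet("./data/", ParquetReadOptions::new().parquet_pruning(true))
        .await?
        // Same filter + limit combination as the reproduction above.
        .filter(col("a").eq(lit("23asdas23")))?
        .limit(0, Some(1))?;

    let batches = df.collect().await?;
    println!("collected {} record batch(es)", batches.len());
    Ok(())
}
```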
Expected behavior
Expected both the limit and the filter to be applied, returning the matching row.
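Following the observation above that the query works once `pushdown_filters` is removed, a temporary workaround is to leave that option off for this query. A sketch of the two ways to toggle it (the SQL config key follows DataFusion's `datafusion.execution.parquet.*` naming; not a fix for the underlying bug):

```rust
// In code, before building the SessionContext:
config.options_mut().execution.parquet.pushdown_filters = false;

// Or at runtime via SQL:
// SET datafusion.execution.parquet.pushdown_filters = false;
```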