Skip to content

Conversation

@haohuaijin
Copy link
Collaborator

@haohuaijin haohuaijin commented Dec 9, 2025

This is a datafusion bug will cause the query error:

SELECT item, max(abc) as i2 from t group by item order by i2 LIMIT 1

abc's data type is string

@haohuaijin haohuaijin requested review from Copilot and hengfeiyang and removed request for Copilot December 9, 2025 04:01
@github-actions github-actions bot added the ☢️ Bug Something isn't working label Dec 9, 2025
@github-actions
Copy link
Contributor

github-actions bot commented Dec 9, 2025

Failed to generate code suggestions for PR

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Dec 9, 2025

Greptile Overview

Greptile Summary

Disabled DataFusion's top-k aggregation optimizer by setting enable_topk_aggregation to false as a temporary workaround for upstream issue apache/datafusion#19219.

  • Added configuration flag in create_session_config function to disable the optimizer
  • Included reference comment linking to the upstream DataFusion issue for context
  • No other behavioral changes to the query execution path

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The change is a simple, well-documented configuration flag to disable a problematic optimizer. It's a conservative approach to work around an upstream DataFusion issue, with clear documentation referencing the issue. The change is isolated to a single configuration function and doesn't introduce new logic or modify existing behavior beyond disabling one optimization.
  • No files require special attention

Important Files Changed

File Analysis

Filename Score Overview
src/service/search/datafusion/exec.rs 5/5 Added configuration to disable DataFusion top-k aggregation optimizer as temporary workaround for upstream issue

Sequence Diagram

sequenceDiagram
    participant Caller
    participant create_session_config
    participant SessionConfig
    participant DataFusion

    Caller->>create_session_config: create_session_config(sorted_by_time, target_partitions)
    create_session_config->>SessionConfig: Initialize with defaults
    SessionConfig-->>create_session_config: config object
    
    create_session_config->>SessionConfig: Set batch_size, target_partitions, dialect
    create_session_config->>SessionConfig: Configure bloom_filter settings
    
    alt sorted_by_time == true
        create_session_config->>SessionConfig: Enable split_file_groups_by_statistics
    end
    
    Note over create_session_config,SessionConfig: New change in this PR
    create_session_config->>SessionConfig: set_bool("enable_topk_aggregation", false)
    Note over create_session_config,SessionConfig: Workaround for DataFusion issue #19219
    
    create_session_config->>SessionConfig: Enable skip_physical_aggregate_schema_check
    
    create_session_config-->>Caller: Return configured SessionConfig
    Caller->>DataFusion: Use config for query execution
    DataFusion-->>Caller: Execute queries with topk optimization disabled
Loading

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 file reviewed, no comments

Edit Code Review Agent Settings | Greptile

@haohuaijin haohuaijin added this to the v0.30.0 milestone Dec 9, 2025
@haohuaijin haohuaijin added the Needs-Testing Needs-Testing label Dec 9, 2025
@haohuaijin haohuaijin merged commit 10f8de1 into branch-v0.30.0 Dec 9, 2025
44 checks passed
@haohuaijin haohuaijin deleted the disable-datafusion-topk-30 branch December 9, 2025 05:31
@Shrinath-O2 Shrinath-O2 added Testing-Completed Testing-Completed and removed Needs-Testing Needs-Testing labels Dec 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

☢️ Bug Something isn't working Testing-Completed Testing-Completed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants