

@haohuaijin (Collaborator) commented Nov 24, 2025

  • use DataFusion's DefaultSchemaAdapterFactory for schema evolution
  • use ListingAdapter for the index fallback logic
  • tantivy index -> ListingAdapter
  • repartition file distribution for the sort-by-desc optimizer -> split_groups_by_statistics_with_target_partitions -> but the sort must be added back when the number of file groups exceeds the target partition count
  • file metadata error log line -> CachedParquetFileReaderFactory
  • disable file group repartitioning -> not needed
  • reduce copies of BitVec
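The sort-by-desc item depends on grouping files so that the value ranges inside each group do not overlap; when more groups are needed than there are target partitions, groups have to share partitions and an explicit sort is required again. The std-only sketch below illustrates that idea with a greedy first-fit grouping. It is not DataFusion's `split_groups_by_statistics_with_target_partitions`; `FileStats` and `split_non_overlapping` are hypothetical names, and the sketch ignores nulls and the descending case.

```rust
/// Hypothetical per-file statistics for the sort column (e.g. a timestamp).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct FileStats {
    pub min: i64,
    pub max: i64,
}

/// Greedily packs files into groups whose [min, max] ranges do not overlap,
/// so each group, read file by file, is ordered on the sort column.
/// Returns the groups (as indices into `files`) and whether an explicit sort
/// is still needed because there are more groups than `target_partitions`.
pub fn split_non_overlapping(
    files: &[FileStats],
    target_partitions: usize,
) -> (Vec<Vec<usize>>, bool) {
    // Visit files in ascending order of their minimum value.
    let mut order: Vec<usize> = (0..files.len()).collect();
    order.sort_by_key(|&i| files[i].min);

    let mut groups: Vec<Vec<usize>> = Vec::new();
    // Max value of the last file appended to each group.
    let mut tails: Vec<i64> = Vec::new();

    for i in order {
        // First group whose last file ends strictly before this file starts.
        match tails.iter().position(|&t| t < files[i].min) {
            Some(g) => {
                groups[g].push(i);
                tails[g] = files[i].max;
            }
            None => {
                groups.push(vec![i]);
                tails.push(files[i].max);
            }
        }
    }

    // More groups than partitions means groups get merged: sort needed again.
    let needs_sort = groups.len() > target_partitions;
    (groups, needs_sort)
}
```

With files covering [0, 10], [11, 20] and [5, 15], the first two share a group and the third gets its own; with two target partitions no extra sort is needed, with one it is.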

@github-actions (Contributor)

Failed to generate code suggestions for PR

@haohuaijin haohuaijin added this to the Backlog milestone Nov 24, 2025
haohuaijin and others added 6 commits November 24, 2025 17:31
Auto-generated translation updates from English source file.

🤖 Generated with automated translation workflow
…openobserve into datafusion-schema-evolution
Copilot AI (Contributor) left a comment


Pull request overview

This pull request refactors the search infrastructure to use DataFusion's native ListingTable instead of a custom implementation. The PR removes the manual schema version grouping logic and delegates schema evolution to DataFusion's DefaultSchemaAdapterFactory. Key architectural changes include:

  • Replacing custom NewListingTable with a new ListingTableAdapter wrapper
  • Using DataFusion's native schema evolution capabilities
  • Simplifying file handling by removing schema version-based file grouping
  • Optimizing memory usage by using Arc<BitVec> for segment IDs instead of cloning
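Delegating schema evolution to DataFusion's DefaultSchemaAdapterFactory means each file is adapted to the table schema roughly as follows: every table column is matched by name to a column in the file's schema, columns the file lacks are filled with nulls, and types are cast where needed. The std-only sketch below shows only the name-mapping and null-filling steps on row data. `Value`, `map_schema` and `adapt_row` are hypothetical illustrations, not DataFusion APIs (DataFusion operates on Arrow RecordBatches), and type casting is omitted.

```rust
use std::collections::HashMap;

/// Simplified cell value, used only for this illustration.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Null,
    Int(i64),
    Str(String),
}

/// For each column of the table (latest) schema, find its position in an
/// older file schema, or None when the file predates that column.
pub fn map_schema(table: &[&str], file: &[&str]) -> Vec<Option<usize>> {
    let pos: HashMap<&str, usize> = file
        .iter()
        .enumerate()
        .map(|(i, name)| (*name, i))
        .collect();
    table.iter().map(|name| pos.get(name).copied()).collect()
}

/// Reshape one row read from the file into the table schema: reorder the
/// columns that exist, fill missing ones with Null, and drop file-only columns.
pub fn adapt_row(mapping: &[Option<usize>], row: &[Value]) -> Vec<Value> {
    mapping
        .iter()
        .map(|m| match m {
            Some(i) => row[*i].clone(),
            None => Value::Null,
        })
        .collect()
}
```

Computing the mapping once per file and reusing it for every row (or batch) keeps the per-row work to a lookup, which is also why a schema adapter is created per file rather than per record.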

Reviewed changes

Copilot reviewed 28 out of 28 changed files in this pull request and generated 6 comments.

File | Description
src/service/search/grpc/wal.rs | Removed schema version grouping logic; simplified to single table creation
src/service/search/grpc/storage.rs | Removed schema version handling; streamlined file processing
src/service/search/grpc/utils.rs | Deleted file containing UTF8 view schema conversion utility
src/service/search/grpc/mod.rs | Removed utils module reference
src/service/search/datafusion/table_provider/parquet_reader.rs | Deleted custom parquet reader implementation
src/service/search/datafusion/table_provider/mod.rs | Removed custom NewListingTable implementation and related helper functions
src/service/search/datafusion/table_provider/listing_adapter.rs | Added new adapter wrapping DataFusion's ListingTable with index condition handling
src/service/search/datafusion/table_provider/helpers.rs | Cleaned up by removing unused helper functions
src/service/search/datafusion/table_provider/memtable.rs | Updated imports to use helpers from their new location
src/service/search/datafusion/storage/file_list.rs | Changed to use Arc<BitVec> to avoid cloning segment IDs
src/service/search/datafusion/mod.rs | Removed file_type module reference
src/service/search/datafusion/file_type.rs | Deleted custom file type enum
src/service/search/datafusion/exec.rs | Simplified TableBuilder by removing schema diff rules and optimization flags
src/service/promql/search/grpc/storage.rs | Updated to use new simplified register_table signature
src/service/file_list_dump.rs | Changed signature to pass files by value
src/service/compact/merge.rs | Removed schema diff generation logic; simplified to single table creation
src/job/files/parquet.rs | Updated to use simplified TableBuilder API
src/config/src/meta/stream.rs | Changed segment_ids field to use Arc<BitVec>
web/src/locales/languages/*.json | Added translations for new UI features (insights, alerts, pipeline types)
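Switching segment_ids to Arc<BitVec> means that cloning file metadata copies a pointer and increments a reference count instead of duplicating the whole bitmap. A minimal sketch of the pattern, assuming a hypothetical `FileEntry` struct (not the real type in stream.rs) and using a plain `Vec<u64>` as a std-only stand-in for the bitvec crate's `BitVec`:

```rust
use std::sync::Arc;

/// Std-only stand-in for `bitvec::vec::BitVec`: a word-packed bitmap.
pub type Bitmap = Vec<u64>;

/// Hypothetical file metadata that carries the selected segment ids.
#[derive(Clone, Debug)]
pub struct FileEntry {
    pub path: String,
    /// Shared bitmap: cloning a FileEntry only bumps the Arc refcount.
    pub segment_ids: Option<Arc<Bitmap>>,
}

/// Returns true if segment `id` should be scanned.
/// A missing bitmap means "no index filter, scan every segment".
pub fn segment_selected(entry: &FileEntry, id: usize) -> bool {
    match &entry.segment_ids {
        None => true,
        Some(bits) => bits
            .get(id / 64)
            .map_or(false, |&word| (word >> (id % 64)) & 1 == 1),
    }
}
```

Because the bitmap is read-only after index filtering, sharing it through `Arc` is safe across threads; code that needs to modify it could use `Arc::make_mut`, which copies only when the bitmap is actually shared.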


@hengfeiyang hengfeiyang merged commit 2d1cec8 into main Nov 27, 2025
13 of 14 checks passed
@hengfeiyang hengfeiyang deleted the datafusion-schema-evolution branch November 27, 2025 08:46
@haohuaijin haohuaijin modified the milestones: Backlog, v0.30.0 Dec 1, 2025
@Shrinath-O2 Shrinath-O2 removed the Needs-Testing label Dec 12, 2025
@Shrinath-O2 Shrinath-O2 added the Testing-Completed label Dec 12, 2025

Labels: Testing-Completed


4 participants