feat: secondary index #3870
Walkthrough
This PR bumps dependency versions, extends configuration handling, adds fields to several data structures, and changes indexing and schema processing. Some functions and configurations were refactored to handle the new inverted indexes and segment IDs more efficiently. Together, these changes aim to improve performance, flexibility, and maintainability.
Changes
Sequence Diagram(s)
New Flow with Segment IDs and Inverted Indexes
sequenceDiagram
participant User
participant HTTPHandler
participant FileService
participant SchemaService
participant IngestService
User->>HTTPHandler: Make Request
HTTPHandler->>FileService: getFileMeta(min_field, max_field)
FileService->>SchemaService: getSchemaSettings(schema)
SchemaService-->>FileService: Return Settings
FileService-->>HTTPHandler: Return File Meta
HTTPHandler-->>User: Send Response
User->>IngestService: Send Data
IngestService->>SchemaService: getStreamSettings(stream_type)
SchemaService-->>IngestService: Return Stream Settings
IngestService->>IngestService: pop_time_range(batch, min_field, max_field)
IngestService-->>User: Acknowledge Data Ingestion
New ENV
ZO_ENABLE_INVERTED_INDEX=true: enables the inverted index by default.
ZO_COMPACT_STRATEGY=file_time: file merge strategy; supports file_size and file_time, default is file_time.
ZO_FEATURE_QUERY_WITHOUT_INDEX=false: for debugging, we can set it to true.
ZO_INVERTED_INDEX_OLD_FORMAT=false: enabling it uses the inverted index v1 stream name format.
API changes:
PUT /api/{org}/streams/{stream}/settings
{
  "partition_keys": [],
  "full_text_search_keys": [ "content" ],
  "index_fields": [ "f3" ],
  "bloom_filter_fields": [],
  "defined_schema_fields": [],
  "data_retention": 3000
}
We added index_fields as the secondary index fields.
TODO
secondary index on stream setting page.
It is the same as inverted index, but you can't choose both inverted index and secondary index; the other options can be selected together.
How it works
With secondary index configured and ZO_ENABLE_INVERTED_INDEX=true set, we build an extra index file for logs streams. Then, when we search for where idx_field='abc', we use the index to accelerate the query.
Limitation
data_type: Utf8
Summary by CodeRabbit
Dependency Updates
datafusion, arrow, arrow-json, arrow-schema, and parquet.
New Features
Binary data type conversion from hex strings to binary data.
segment_ids in various components.
Improved Error Handling
Performance Improvements
Bug Fixes
Tests
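Going back to the "How it works" part of the PR description: the idea of answering where idx_field='abc' from an extra index instead of a full scan can be illustrated with a toy, in-memory sketch. This is not the code from this PR and does not reflect the on-disk index file format; the function names build_index and search_eq and the sample rows are hypothetical, chosen only for illustration. The string-only check is there to mirror the stated Utf8 limitation.

```python
from collections import defaultdict


def build_index(rows, field):
    """Map each distinct string value of `field` to the row positions holding it.

    Hypothetical helper: a toy stand-in for the extra index file.
    """
    index = defaultdict(list)
    for pos, row in enumerate(rows):
        value = row.get(field)
        # Only string values are indexed, mirroring the Utf8-only limitation.
        if isinstance(value, str):
            index[value].append(pos)
    return dict(index)


def search_eq(rows, index, value):
    """Answer `field = value` with an index lookup instead of scanning every row."""
    return [rows[pos] for pos in index.get(value, [])]


# Sample log rows; "f3" matches the index_fields example in the settings payload.
rows = [
    {"f3": "abc", "content": "login ok"},
    {"f3": "xyz", "content": "login failed"},
    {"f3": "abc", "content": "logout"},
    {"f3": 42, "content": "non-string value, not indexed"},
]

f3_index = build_index(rows, "f3")
hits = search_eq(rows, f3_index, "abc")
```

Here the lookup touches only the two rows recorded for "abc"; a query on a value that was never indexed returns an empty result without reading any row.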