Skip to content

Conversation

@hengfeiyang
Copy link
Contributor

@hengfeiyang hengfeiyang commented Sep 4, 2025

PR Type

Bug fix, Enhancement

close #8264


Description

  • Fix: correct serialize field name to enable_distinct_fields

  • Add explicit Default for StreamSettings

  • Change default: enable_distinct_fields now true

  • Change default: disable utf8_view_enabled


Diagram Walkthrough

flowchart LR
  A["config.rs Common"] -- "default=false" --> B["utf8_view_enabled"]
  C["StreamSettings struct"] -- "serde defaults + impl Default" --> D["stable default values"]
  C -- "serialize field name" --> E["enable_distinct_fields"]
Loading

File Walkthrough

Relevant files
Enhancement
config.rs
Disable UTF-8 view by default                                                       

src/config/src/config.rs

  • Set ZO_UTF8_VIEW_ENABLED default to false
  • Keep env var and field utf8_view_enabled unchanged otherwise
+1/-1     
Bug fix
stream.rs
StreamSettings defaults and serialization fixes                   

src/config/src/meta/stream.rs

  • Replace skip-serializing hints with serde(default) on fields
  • Add Default impl; set enable_distinct_fields true
  • Fix serializer key to enable_distinct_fields
  • Remove derive Default in favor of manual impl
+33/-9   

@github-actions github-actions bot added the ☢️ Bug Something isn't working label Sep 4, 2025
@github-actions
Copy link
Contributor

github-actions bot commented Sep 4, 2025

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Backward Compatibility

Changing defaults (e.g., enabling enable_distinct_fields by default and disabling utf8_view_enabled) can alter behavior for existing deployments. Validate migration impact, config overrides, and serialization/deserialization with older stored configs.

impl Default for StreamSettings {
    fn default() -> Self {
        Self {
            partition_time_level: None,
            partition_keys: Vec::new(),
            full_text_search_keys: Vec::new(),
            index_fields: Vec::new(),
            bloom_filter_fields: Vec::new(),
            data_retention: 0,
            flatten_level: None,
            defined_schema_fields: Vec::new(),
            max_query_range: 0,
            store_original_data: false,
            approx_partition: false,
            distinct_value_fields: Vec::new(),
            index_updated_at: 0,
            extended_retention_days: Vec::new(),
            index_original_data: false,
            index_all_values: false,
            enable_distinct_fields: true,
        }
    }
}
Serialization Consistency

The struct now uses custom Serialize plus many serde defaults. Confirm that fields previously omitted with skip_serializing_if are still intentionally emitted or omitted, and that consumers tolerate always-present empty arrays and None fields now that defaults are used.

#[derive(Clone, Debug, Deserialize, ToSchema, PartialEq)]
pub struct StreamSettings {
    #[serde(default)]
    pub partition_time_level: Option<PartitionTimeLevel>,
    #[serde(default)]
    pub partition_keys: Vec<StreamPartition>,
    #[serde(default)]
    pub full_text_search_keys: Vec<String>,
    #[serde(default)]
    pub index_fields: Vec<String>,
    #[serde(default)]
    pub bloom_filter_fields: Vec<String>,
    #[serde(default)]
    pub data_retention: i64,
    #[serde(default)]
    pub flatten_level: Option<i64>,
    #[serde(default)]
    pub defined_schema_fields: Vec<String>,
    #[serde(default)]
    pub max_query_range: i64, // hours
    #[serde(default)]
    pub store_original_data: bool,
    #[serde(default)]
    pub approx_partition: bool,
    #[serde(default)]
    pub distinct_value_fields: Vec<DistinctField>,
    #[serde(default)]
    pub index_updated_at: i64,
    #[serde(default)]
    pub extended_retention_days: Vec<TimeRange>,
    #[serde(default)]
    pub index_original_data: bool,
    #[serde(default)]
    pub index_all_values: bool,
    #[serde(default)]
    pub enable_distinct_fields: bool,
}
Config Default Change

ZO_UTF8_VIEW_ENABLED default flipped to false. Ensure this aligns with product expectations and that documentation and environment defaults are updated accordingly.

#[env_config(name = "ZO_UTF8_VIEW_ENABLED", default = false)]
pub utf8_view_enabled: bool,

@github-actions
Copy link
Contributor

github-actions bot commented Sep 4, 2025

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Align serde default with struct default

Add serde default for enable_distinct_fields to align deserialization with the new
Default implementation. Without it, missing fields in incoming JSON will deserialize
to false, diverging from the struct's Default of true.

src/config/src/meta/stream.rs [783-819]

     #[derive(Clone, Debug, Deserialize, ToSchema, PartialEq)]
     pub struct StreamSettings {
         #[serde(default)]
         pub partition_time_level: Option<PartitionTimeLevel>,
         #[serde(default)]
         pub partition_keys: Vec<StreamPartition>,
         #[serde(default)]
         pub full_text_search_keys: Vec<String>,
         #[serde(default)]
         pub index_fields: Vec<String>,
         #[serde(default)]
         pub bloom_filter_fields: Vec<String>,
         #[serde(default)]
         pub data_retention: i64,
         #[serde(default)]
         pub flatten_level: Option<i64>,
         #[serde(default)]
         pub defined_schema_fields: Vec<String>,
         #[serde(default)]
         pub max_query_range: i64, // hours
         #[serde(default)]
         pub store_original_data: bool,
         #[serde(default)]
         pub approx_partition: bool,
         #[serde(default)]
         pub distinct_value_fields: Vec<DistinctField>,
         #[serde(default)]
         pub index_updated_at: i64,
         #[serde(default)]
         pub extended_retention_days: Vec<i64>,
         #[serde(default)]
         pub index_original_data: bool,
         #[serde(default)]
         pub index_all_values: bool,
-        #[serde(default)]
+        #[serde(default = "StreamSettings::default_enable_distinct_fields")]
         pub enable_distinct_fields: bool,
     }
+    
+    impl StreamSettings {
+        fn default_enable_distinct_fields() -> bool { true }
+    }
Suggestion importance[1-10]: 8

__

Why: Ensuring enable_distinct_fields deserializes to true when omitted aligns with the new Default implementation and avoids subtle behavior changes; the code maps correctly to the struct definition in the new hunk.

Medium
Preserve backward compatibility keys

Ensure backward compatibility for existing clients expecting
disable_distinct_fields. Serialize both keys, keeping disable_distinct_fields as the
negation of the new flag until clients migrate. This avoids breaking deserialization
for older versions.

src/config/src/meta/stream.rs [864-872]

     impl Serialize for StreamSettings {
         fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
         where
             S: Serializer,
         {
-            let mut state = serializer.serialize_struct("StreamSettings", 23)?;
+            let mut state = serializer.serialize_struct("StreamSettings", 24)?;
             ...
--           state.serialize_field("disable_distinct_fields", &self.enable_distinct_fields)?;
-    +       state.serialize_field("enable_distinct_fields", &self.enable_distinct_fields)?;
+            state.serialize_field("enable_distinct_fields", &self.enable_distinct_fields)?;
+            // Backward compatibility: keep legacy key as negation
+            state.serialize_field("disable_distinct_fields", &(!self.enable_distinct_fields))?;
     
             if !self.defined_schema_fields.is_empty() {
                 let mut fields = self.defined_schema_fields.clone();
                 fields.sort_unstable();
Suggestion importance[1-10]: 7

__

Why: Re-adding a legacy disable_distinct_fields key alongside enable_distinct_fields can prevent breaking older clients; the proposal is technically sound and directly matches the changed serialization block, but it's a transitional concern rather than a critical bug fix.

Medium
General
Confirm risky default change

Verify this default flip is intentional; changing a public config default can
silently disable existing behavior. If needed, gate it behind a migration flag or
add a deprecation note to avoid unexpected runtime changes.

src/config/src/config.rs [1110-1111]

-    #[env_config(name = "ZO_UTF8_VIEW_ENABLED", default = false)]
+    #[env_config(name = "ZO_UTF8_VIEW_ENABLED", default = false /* TODO: confirm intentional default change; consider migration notice */)]
     pub utf8_view_enabled: bool,
Suggestion importance[1-10]: 5

__

Why: Asking to verify a default flip is reasonable risk mitigation for a public config change and matches the exact lines, but it’s advisory and not a concrete code fix, so moderate impact.

Low

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Summary

This PR makes two configuration changes related to default values for stream functionality:

  1. Stream Settings Default Behavior: The main change modifies the StreamSettings struct in src/config/src/meta/stream.rs to set enable_distinct_fields to default to true instead of false. This involves removing the Default derive and implementing a custom Default trait. The change also simplifies serde attributes by removing skip_serializing_if conditions and fixes a serialization bug where 'disable_distinct_fields' was incorrectly used instead of 'enable_distinct_fields'.

  2. UTF8 View Configuration: A secondary change in src/config/src/config.rs sets the default value for utf8_view_enabled from true to false, making this feature opt-in rather than enabled by default.

The changes integrate with OpenObserve's stream management system where StreamSettings controls how streams handle distinct field tracking - a feature used for optimizing queries and managing field cardinality. The distinct fields functionality helps with query performance by tracking unique field values. The UTF8 view feature relates to how string data is processed and stored internally.

Both changes represent a shift toward more conservative defaults - enabling distinct field tracking (which is likely beneficial for most use cases) while disabling UTF8 view functionality (which may have performance or compatibility implications that require careful consideration).

Confidence score: 4/5

  • This PR is safe to merge with minimal risk as it only changes default configuration values
  • Score reflects straightforward configuration changes with clear intent, though the UTF8 view change seems unrelated to the PR title
  • Pay close attention to the serialization fix in stream.rs to ensure the field name correction doesn't break existing API consumers

2 files reviewed, no comments

Edit Code Review Bot Settings | Greptile

@haohuaijin haohuaijin merged commit 30babf9 into main Sep 4, 2025
27 checks passed
@haohuaijin haohuaijin deleted the fix/stream-distinct-values branch September 4, 2025 09:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

☢️ Bug Something isn't working Review effort 2/5

Projects

None yet

Development

Successfully merging this pull request may close these issues.

enable_distinct_fields setting of stream is wrong

4 participants