Skip to content

Conversation

@tibrewalpratik17
Copy link
Contributor

@tibrewalpratik17 tibrewalpratik17 commented May 15, 2024

label:
bugfix

Recently, we found that post-compaction, string fields with length greater than default 512 characters got trimmed. On debugging, I found that we were using schema generated from metadata.properties file and not passing schema from ZK to SegmentGenerator class. This hotfix is related to passing ZK schema to this.

One of the other root-causes is metadata.properties file missing schema max length information for columns. Also while creating fieldSpec from metadata properties we don't pass maxlength anywhere --

fieldSpec = new DimensionFieldSpec(column, dataType, isSingleValue, defaultNullValueString);

This leads to falling back to default 512 for string fields.

Sample metadata.properties values for a column:

column.Attr.cardinality = -2147483648
column.Attr.totalDocs = 1322868
column.Attr.dataType = STRING
column.Attr.bitsPerElement = 31
column.Attr.lengthOfEachEntry = 0
column.Attr.columnType = DIMENSION
column.Attr.isSorted = false
column.Attr.hasDictionary = false
column.Attr.isSingleValues = true
column.Attr.maxNumberOfMultiValues = 0
column.Attr.totalNumberOfEntries = 1322868
column.Attr.isAutoGenerated = false
column.Attr.minValue = null
column.Attr.minMaxValueInvalid = true
column.Attr.defaultNullValue = null

No schemaMaxLength field present here^^.

Longer term fix I am planning as a follow-up is to persist maxLength info in metadata properties file and parsing that in the ColumnMetadataImpl file while creating FieldSpec.

cc @snleee @Jackie-Jiang

@codecov-commenter
Copy link

codecov-commenter commented May 15, 2024

Codecov Report

❌ Patch coverage is 0% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 62.15%. Comparing base (59551e4) to head (75a3fb1).
⚠️ Report is 2926 commits behind head on master.

Files with missing lines Patch % Lines
...upsertcompaction/UpsertCompactionTaskExecutor.java 0.00% 3 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #13157      +/-   ##
============================================
+ Coverage     61.75%   62.15%   +0.40%     
+ Complexity      207      198       -9     
============================================
  Files          2436     2515      +79     
  Lines        133233   137861    +4628     
  Branches      20636    21335     +699     
============================================
+ Hits          82274    85692    +3418     
- Misses        44911    45780     +869     
- Partials       6048     6389     +341     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration <0.01% <0.00%> (-0.01%) ⬇️
integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration2 0.00% <0.00%> (ø)
java-11 62.11% <0.00%> (+0.40%) ⬆️
java-21 62.03% <0.00%> (+0.41%) ⬆️
skip-bytebuffers-false 62.14% <0.00%> (+0.39%) ⬆️
skip-bytebuffers-true 62.01% <0.00%> (+34.28%) ⬆️
temurin 62.15% <0.00%> (+0.40%) ⬆️
unittests 62.15% <0.00%> (+0.40%) ⬆️
unittests1 46.72% <ø> (-0.17%) ⬇️
unittests2 27.79% <0.00%> (+0.06%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@Jackie-Jiang Jackie-Jiang merged commit 8206ce8 into apache:master May 15, 2024
@tibrewalpratik17 tibrewalpratik17 deleted the fix_upsert_compaction_bug branch May 16, 2024 15:06
gortiz pushed a commit to gortiz/pinot that referenced this pull request Jun 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants