[feature] Add pinot-clp-log plugin to encode user-specified fields with CLP during ingestion #9942
This is a great feature! Can you please give public access to the design doc?
Codecov Report
@@ Coverage Diff @@
## master #9942 +/- ##
=============================================
- Coverage 68.75% 13.58% -55.17%
+ Complexity 5685 176 -5509
=============================================
Files 1996 1941 -55
Lines 107802 105336 -2466
Branches 16388 16094 -294
=============================================
- Hits 74115 14315 -59800
- Misses 28440 89894 +61454
+ Partials 5247 1127 -4120
Apologies for the delay; the previous doc couldn't be shared due to some security settings. I've updated the link to a publicly accessible doc.
Please add comments. What does `extractAll` mean?
...-clp-log/src/main/java/org/apache/pinot/plugin/inputformat/clplog/CLPLogRecordExtractor.java
...-clp-log/src/main/java/org/apache/pinot/plugin/inputformat/clplog/CLPLogRecordExtractor.java
Why is this an extractor test when it actually tests the decoder? Can we test the extractor directly?
We also need to test different scenarios based on the values in `props`.
…RecordExtractorTest.
6beda35 to 5fcad3e
Thanks for adding this powerful feature. Can you help add a documentation page to the Pinot docs describing when and how to use the feature?
Yep, will do.
At a high level, the plugin takes two inputs: a JSON record and a list of fields (unstructured log messages) to encode with CLP. The plugin extracts and encodes the user-specified fields into CLP's three-column format and stores the output in a Pinot `GenericRow` object.

This is part of the change requested in #9819 and described in this design doc.
Release notes
New plugin added: `pinot-clp-log`, to encode user-specified fields with CLP during ingestion.

Users can use the plugin by specifying these configuration options in their `tableIndexConfig.streamConfigs`:

where `<field-names>` is a comma-separated list of fields you wish to encode with CLP.

Testing performed
`message` field was replaced with CLP's three fields: `message_logtype`, `message_dictionaryVars`, and `message_encodedVars`.
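
To illustrate the field replacement described above, here is a minimal, self-contained sketch of how one field could be split into three columns. It is not the plugin's code and does not use the CLP library: the class `ClpFieldSketch`, the method `encodeField`, the placeholder strings, and the tokenization rule (plain integers become encoded variables, other tokens containing digits become dictionary variables) are all assumptions made for illustration. Only the three output field-name suffixes come from the PR description.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ClpFieldSketch {
    // Hypothetical placeholders; real CLP uses its own markers in the logtype.
    static final String DICT_PLACEHOLDER = "<dict>";
    static final String ENCODED_PLACEHOLDER = "<int>";

    /**
     * Toy approximation of a three-column encoding: splits the message on
     * spaces and sorts each token into static text, a dictionary variable,
     * or an encoded (numeric) variable.
     */
    static Map<String, Object> encodeField(String fieldName, String message) {
        StringBuilder logtype = new StringBuilder();
        List<String> dictionaryVars = new ArrayList<>();
        List<Long> encodedVars = new ArrayList<>();

        String[] tokens = message.split(" ", -1);
        for (int i = 0; i < tokens.length; i++) {
            if (i > 0) {
                logtype.append(' ');
            }
            String token = tokens[i];
            if (token.matches("-?\\d{1,18}")) {
                // Plain integer that fits in a long: keep it as a number.
                encodedVars.add(Long.parseLong(token));
                logtype.append(ENCODED_PLACEHOLDER);
            } else if (token.matches(".*\\d.*")) {
                // Any other token containing a digit: keep it as a string variable.
                dictionaryVars.add(token);
                logtype.append(DICT_PLACEHOLDER);
            } else {
                // Static text stays in the logtype as-is.
                logtype.append(token);
            }
        }

        // The original field is replaced by three derived fields.
        Map<String, Object> row = new LinkedHashMap<>();
        row.put(fieldName + "_logtype", logtype.toString());
        row.put(fieldName + "_dictionaryVars", dictionaryVars);
        row.put(fieldName + "_encodedVars", encodedVars);
        return row;
    }

    public static void main(String[] args) {
        System.out.println(encodeField("message", "Task task_12 started after 350 ms"));
    }
}
```

In this sketch the static text is shared across all messages of the same shape, while the values that vary per message are stored separately, which is the general idea behind the three-field layout.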