Skip to content

Conversation

@kirkrodrigues
Copy link
Contributor

@kirkrodrigues kirkrodrigues commented Dec 8, 2022

At a high-level, the plugin takes two inputs: a JSON record and a list of fields (unstructured log messages) to encode with CLP. The plugin will extract and encode the user-specified fields into CLP's three-column format and store the output in a Pinot GenericRow object.

This is part of the change requested in #9819 and described in this design doc.

Release notes

  • New plugin added: pinot-clp-log to encode user-specified fields with CLP during ingestion.

  • Users can use the plugin by specifying these configuration options in their tableIndexConfig.streamConfigs:

    "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.clplog.CLPLogMessageDecoder",
    "stream.kafka.decoder.prop.fieldsForClpEncoding": "<field-names>"

    where <field-names> is a comma-separated list of fields you wish to encode with CLP.

Testing performed

  • Validated fields are encoded correctly using the added unit test.
  • Created a table with the following settings:
"tableIndexConfig": {
    ...,
    "streamConfigs": {
        ...,
        "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.clplog.CLPLogMessageDecoder",
        "stream.kafka.decoder.prop.fieldsForClpEncoding": "message",
    }
}
  • Ingested logs through Kafka and ensured log events containing the message field were transformed such that the message field was replaced with CLP's three fields: message_logtype, message_dictionaryVars, and message_encodedVars.

@Jackie-Jiang Jackie-Jiang added feature release-notes Referenced by PRs that need attention when compiling the next release notes labels Dec 8, 2022
@Jackie-Jiang
Copy link
Contributor

This is a great feature! Can you please give public access to the design doc?

@codecov-commenter
Copy link

codecov-commenter commented Dec 8, 2022

Codecov Report

Merging #9942 (5fcad3e) into master (c4ae332) will decrease coverage by 55.16%.
The diff coverage is n/a.

@@              Coverage Diff              @@
##             master    #9942       +/-   ##
=============================================
- Coverage     68.75%   13.58%   -55.17%     
+ Complexity     5685      176     -5509     
=============================================
  Files          1996     1941       -55     
  Lines        107802   105336     -2466     
  Branches      16388    16094      -294     
=============================================
- Hits          74115    14315    -59800     
- Misses        28440    89894    +61454     
+ Partials       5247     1127     -4120     
Flag Coverage Δ
integration1 ?
unittests1 ?
unittests2 13.58% <ø> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...src/main/java/org/apache/pinot/sql/FilterKind.java 0.00% <0.00%> (-100.00%) ⬇️
...ain/java/org/apache/pinot/spi/utils/LoopUtils.java 0.00% <0.00%> (-100.00%) ⬇️
...ain/java/org/apache/pinot/core/data/table/Key.java 0.00% <0.00%> (-100.00%) ⬇️
...in/java/org/apache/pinot/spi/utils/BytesUtils.java 0.00% <0.00%> (-100.00%) ⬇️
...n/java/org/apache/pinot/core/data/table/Table.java 0.00% <0.00%> (-100.00%) ⬇️
.../java/org/apache/pinot/core/data/table/Record.java 0.00% <0.00%> (-100.00%) ⬇️
.../java/org/apache/pinot/core/util/GroupByUtils.java 0.00% <0.00%> (-100.00%) ⬇️
...java/org/apache/pinot/spi/trace/BaseRecording.java 0.00% <0.00%> (-100.00%) ⬇️
...java/org/apache/pinot/spi/trace/NoOpRecording.java 0.00% <0.00%> (-100.00%) ⬇️
...ava/org/apache/pinot/spi/config/table/FSTType.java 0.00% <0.00%> (-100.00%) ⬇️
... and 1534 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@kirkrodrigues
Copy link
Contributor Author

This is a great feature! Can you please give public access to the design doc?

Apologies for the delay, the previous doc couldn't be shared due to some security settings. I've updated the link to a publicly accessible doc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add comments. What does extractAll mean?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this an extractor test when it tests and decoder actually? Can we test the extractor directly?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also need to test different scenarios based on props's value.

@chenboat chenboat merged commit fd0267d into apache:master Jan 6, 2023
@Jackie-Jiang
Copy link
Contributor

Thanks for adding this powerful feature. Can you help add a documentation page in the pinot doc describing when and how to use the feature?

@kirkrodrigues
Copy link
Contributor Author

Yep, will do.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature release-notes Referenced by PRs that need attention when compiling the next release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants