Use default context output rather than outputWithTimestamp for ElasticsearchIO #16744

Merged
echauchot merged 1 commit into apache:master from egalpin:elasticsearch-default-output on Mar 1, 2022

@egalpin egalpin commented Feb 4, 2022

Using ProcessContext#outputWithTimestamp is more fragile than output, and the ES module does not need to call outputWithTimestamp explicitly; in fact, the code currently tries to emulate output by storing and recalling element timestamps. This change was prompted by a timestamp validation failure when starting a pipeline from a snapshot, a failure that using output would have avoided.
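To make the fragility concrete, here is a minimal sketch in plain Java. It is a toy model, not Beam's API: ToyContext and replayFails are hypothetical names, and the check only mirrors the documented rule that an explicit output timestamp must not be earlier than the current input's timestamp minus the allowed skew. It shows one way a stored timestamp can be rejected on replay, while a plain output cannot fail this check.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these classes are hypothetical and are NOT Beam's API.
final class TimestampSkewSketch {

  /** Minimal stand-in for a process context bound to one input element. */
  static final class ToyContext {
    private final long inputTimestampMillis;
    private final long allowedSkewMillis;
    final List<String> emitted = new ArrayList<>();

    ToyContext(long inputTimestampMillis, long allowedSkewMillis) {
      this.inputTimestampMillis = inputTimestampMillis;
      this.allowedSkewMillis = allowedSkewMillis;
    }

    /** Models output(): the value inherits the current input's timestamp. */
    void output(String value) {
      emitted.add(value + "@" + inputTimestampMillis);
    }

    /** Models outputWithTimestamp(): rejects timestamps too far behind the input. */
    void outputWithTimestamp(String value, long timestampMillis) {
      if (timestampMillis < inputTimestampMillis - allowedSkewMillis) {
        throw new IllegalArgumentException(
            "Cannot output with timestamp " + timestampMillis
                + ": earlier than current input " + inputTimestampMillis
                + " minus allowed skew " + allowedSkewMillis);
      }
      emitted.add(value + "@" + timestampMillis);
    }
  }

  /** Returns true if re-emitting a stored timestamp is rejected (skew assumed zero). */
  static boolean replayFails(long storedTimestampMillis, long currentInputTimestampMillis) {
    ToyContext ctx = new ToyContext(currentInputTimestampMillis, 0L);
    try {
      ctx.outputWithTimestamp("doc", storedTimestampMillis);
      return false;
    } catch (IllegalArgumentException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    // A buffered element stored at t=1000 is replayed while processing t=5000.
    System.out.println(replayFails(1_000L, 5_000L)); // true
    // Replaying at the same timestamp as the current input passes the check.
    System.out.println(replayFails(5_000L, 5_000L)); // false
  }
}
```

In this model, output never throws because it has no timestamp argument to validate, which is the property the change relies on.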

egalpin commented Feb 4, 2022

R: @echauchot

@echauchot echauchot left a comment

LGTM, thanks !

echauchot commented

@egalpin I thought this PR was only cosmetic, but I see that it is linked to BEAM-14064, if I understood correctly. Next time, if a change is linked to an open issue, remember to put the Jira ticket in the commit message. I won't reword this one, as it is already merged to master.

egalpin commented Mar 9, 2022

Will do @echauchot 👍 This change was made and merged prior to BEAM-14064, but I think it would have been useful for me to document the reason for my fix in Jira anyway. Thanks for the reminder!

lukecwik commented Mar 10, 2022

I think this is still incorrect as the buffered Document will be assigned to an arbitrary window and timestamp.
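The concern can be illustrated with a small toy model in plain Java. This is only a sketch under assumptions: Stamped and bufferAndFlush are hypothetical names, the batching logic is not ElasticsearchIO's actual implementation, and windows are not modeled at all. It shows that when buffered elements are emitted with a plain output during a later element's processing, they take on that later element's timestamp rather than their own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: hypothetical names, not ElasticsearchIO's real implementation.
final class BufferedFlushSketch {

  /** A value paired with an event timestamp in milliseconds. */
  static final class Stamped {
    final String value;
    final long timestampMillis;

    Stamped(String value, long timestampMillis) {
      this.value = value;
      this.timestampMillis = timestampMillis;
    }
  }

  /**
   * Buffers inputs and flushes whenever batchSize elements have accumulated.
   * Flushed elements are stamped with the timestamp of the element that
   * triggered the flush, which models emitting them via a plain output().
   * Elements left in the buffer at the end are ignored in this sketch.
   */
  static List<Stamped> bufferAndFlush(List<Stamped> inputs, int batchSize) {
    List<Stamped> buffer = new ArrayList<>();
    List<Stamped> outputs = new ArrayList<>();
    for (Stamped current : inputs) {
      buffer.add(current);
      if (buffer.size() >= batchSize) {
        for (Stamped buffered : buffer) {
          // The buffered element's own timestamp is lost here.
          outputs.add(new Stamped(buffered.value, current.timestampMillis));
        }
        buffer.clear();
      }
    }
    return outputs;
  }

  public static void main(String[] args) {
    List<Stamped> inputs = Arrays.asList(
        new Stamped("a", 1_000L), new Stamped("b", 2_000L), new Stamped("c", 9_000L));
    for (Stamped out : bufferAndFlush(inputs, 3)) {
      System.out.println(out.value + " -> " + out.timestampMillis);
    }
  }
}
```

Here all three documents come out stamped 9000, although two of them arrived much earlier. Downstream windowing would then group them by the flush-time timestamp, which is the correctness gap raised in this comment.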

egalpin commented Mar 10, 2022

I agree that it's sub-optimal. I do feel that having this change is far better than not having it; otherwise ElasticsearchIO#write will be unusable. Ideally, I'll be able to fix the data correctness issue before the 2.38.0 release, after getting advice on best practices from the dev@ mailing list.

lukecwik commented

I thought this was considered the final fix, so thanks for confirming that this is only a partial improvement and that you're looking at fixing it completely.
