-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Avoid modify merge state in per field mergers #13208
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
jpountz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Nhat, I had completely forgotten about the fact that we'd mutate the MergeState in-place at merge time, this is scary! I like your change better than the current approach, regardless of the concurrency issue.
| private final FieldInfos orgMergeFieldInfos; | ||
| private final FieldInfos[] orgFieldInfos; | ||
| private final FieldsProducer[] orgFieldsProducers; | ||
| private final MergeState copy; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we dynamically allocate this copy in apply() so that each caller gets an independent instance, and we're less subject to a similar kind of bug in the future due to multiple threads trying to modify this copy at the same time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++ I pushed ee3b035
| import org.apache.lucene.codecs.PointsReader; | ||
| import org.apache.lucene.codecs.StoredFieldsReader; | ||
| import org.apache.lucene.codecs.TermVectorsReader; | ||
| import org.apache.lucene.codecs.*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We prefer to avoid star imports.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jpountz Sorry, I haven't changed the imports setting for my new Intellij setup.
| * @param fieldInfos the new field infos | ||
| * @param fieldsProducers the new field producers | ||
| */ | ||
| public MergeState( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not too happy with this constructor being public since it's really an implementation detail of PerFieldMergeState . At the same time, I like having the per-field-specific logic in the .perfield package rather than here.
I'm not sure how to fix this cleanly... my best idea at this point would be to store all parameters of the main MergeState constructor on the MergeState object, so that PerFieldMergeState can then call this constructor again, wrapping codec readers with a FilterCodecReader that only exposes a subset of fields?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That also works for me. I pushed 0f6a566.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had something different in mind but I realize it wouldn't work without re-computing merge instances over and over again, which we probably don't want. Your approach looks better.
|
I just tested this PR on the nightly benchy and indeed it fixes the merge exception that was blocking the nightly high-cardinality facets test! Yay, thanks @dnhatn! |
|
@jpountz @mikemccand Thank you! |
The PerFieldDocValuesFormat and PerFieldPostingsFormat mutate and reset the fieldInfos of the mergeState during merges. Consequently, other running merge sub-tasks may fail to see some fieldInfos. This was problematic since we introduced concurrency for merge sub-tasks. Relates #13190
The PerFieldDocValuesFormat and PerFieldPostingsFormat mutate and reset the fieldInfos of the mergeState during merges. Consequently, other running merge sub-tasks may fail to see some fieldInfos. This was problematic since we introduced concurrency for merge sub-tasks. Relates #13190
|
Thank you @dnhatn! I've kicked off a one-off nightly benchy run ... |
The PerFieldDocValuesFormat and PerFieldPostingsFormat mutate and reset the fieldInfos of the mergeState during merges. Consequently, other running merge sub-tasks may fail to see some fieldInfos. This was problematic since we introduced concurrency for merge sub-tasks.
Relates #13190