Skip to content

Conversation

@dnhatn
Copy link
Member

@dnhatn dnhatn commented Mar 25, 2024

The PerFieldDocValuesFormat and PerFieldPostingsFormat mutate and reset the fieldInfos of the mergeState during merges. Consequently, other running merge sub-tasks may fail to see some fieldInfos. This was problematic since we introduced concurrency for merge sub-tasks.

Relates #13190

@dnhatn dnhatn requested review from benwtrent and jpountz March 25, 2024 02:56
@dnhatn dnhatn marked this pull request as ready for review March 25, 2024 02:57
Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Nhat, I had completely forgotten about the fact that we'd mutate the MergeState in-place at merge time, this is scary! I like your change better than the current approach, regardless of the concurrency issue.

private final FieldInfos orgMergeFieldInfos;
private final FieldInfos[] orgFieldInfos;
private final FieldsProducer[] orgFieldsProducers;
private final MergeState copy;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we dynamically allocate this copy in apply() so that each caller gets an independent instance, and we're less subject to a similar kind of bug in the future due to multiple threads trying to modify this copy at the same time?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ I pushed ee3b035

import org.apache.lucene.codecs.PointsReader;
import org.apache.lucene.codecs.StoredFieldsReader;
import org.apache.lucene.codecs.TermVectorsReader;
import org.apache.lucene.codecs.*;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We prefer to avoid star imports.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jpountz Sorry, I haven't changed the imports setting for my new Intellij setup.

@dnhatn dnhatn requested review from jpountz and mikemccand March 25, 2024 14:52
* @param fieldInfos the new field infos
* @param fieldsProducers the new field producers
*/
public MergeState(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not too happy with this constructor being public since it's really an implementation detail of PerFieldMergeState . At the same time, I like having the per-field-specific logic in the .perfield package rather than here.

I'm not sure how to fix this cleanly... my best idea at this point would be to store all parameters of the main MergeState constructor on the MergeState object, so that PerFieldMergeState can then call this constructor again, wrapping codec readers with a FilterCodecReader that only exposes a subset of fields?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That also works for me. I pushed 0f6a566.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had something different in mind but I realize it wouldn't work without re-computing merge instances over and over again, which we probably don't want. Your approach looks better.

@dnhatn dnhatn requested a review from jpountz March 25, 2024 15:12
@mikemccand
Copy link
Member

I just tested this PR on the nightly benchy and indeed it fixes the merge exception that was blocking the nightly high-cardinality facets test! Yay, thanks @dnhatn!

@dnhatn
Copy link
Member Author

dnhatn commented Mar 25, 2024

@jpountz @mikemccand Thank you!

@dnhatn dnhatn merged commit f4db67f into main Mar 25, 2024
@dnhatn dnhatn deleted the clone-merge-state branch March 25, 2024 15:56
dnhatn added a commit that referenced this pull request Mar 25, 2024
The PerFieldDocValuesFormat and PerFieldPostingsFormat mutate and reset 
the fieldInfos of the mergeState during merges. Consequently, other 
running merge sub-tasks may fail to see some fieldInfos. This was
problematic since we introduced concurrency for merge sub-tasks.

Relates #13190
dnhatn added a commit that referenced this pull request Mar 25, 2024
The PerFieldDocValuesFormat and PerFieldPostingsFormat mutate and reset 
the fieldInfos of the mergeState during merges. Consequently, other 
running merge sub-tasks may fail to see some fieldInfos. This was
problematic since we introduced concurrency for merge sub-tasks.

Relates #13190
@benwtrent
Copy link
Member

Huzzah! Thanks @dnhatn @jpountz !!!

@mikemccand
Copy link
Member

Thank you @dnhatn! I've kicked off a one-off nightly benchy run ...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants