Use optimized text in match_only_text fields#129371
Use optimized text in match_only_text fields#129371jordan-powers merged 8 commits intoelastic:mainfrom
Conversation
|
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
| /** | ||
| * Reader that decodes UTF-8 formatted bytes into chars. | ||
| */ | ||
| public class UTF8DecodingReader extends Reader { |
There was a problem hiding this comment.
This is basically equivalent to:
final var reader = new InputStreamReader(new ByteArrayInputStream(utfBytes.bytes(), utfBytes.offset(), utfBytes.length()), StandardCharsets.UTF_8);But according to the microbenchmarks, using the InputStreamReader/ByteArrayInputStream is really slow, and was actually slower than the original string-based implementation.
There was a problem hiding this comment.
Here are the results from the microbenchmark:
Benchmark (nDocs) Mode Cnt Score Error Units
OptimizedTextBenchmark.indexDocuments (baseline) 1048576 avgt 5 581.242 ± 3.050 ms/op
OptimizedTextBenchmark.indexDocuments (UTF8DecodingReader) 1048576 avgt 5 544.477 ± 4.961 ms/op
OptimizedTextBenchmark.indexDocuments (InputStreamReader) 1048576 avgt 5 852.380 ± 6.238 ms/op
There was a problem hiding this comment.
Good catch. Yes, this implementation makes a lot of sense then. Did you also run this PR against elastic/logs (enterprise) benchmark?
There was a problem hiding this comment.
I haven't yet, but I plan to today
There was a problem hiding this comment.
Maybe make this a final class?
martijnvg
left a comment
There was a problem hiding this comment.
Looks good. I left a few questions.
| /** | ||
| * Reader that decodes UTF-8 formatted bytes into chars. | ||
| */ | ||
| public class UTF8DecodingReader extends Reader { |
There was a problem hiding this comment.
Good catch. Yes, this implementation makes a lot of sense then. Did you also run this PR against elastic/logs (enterprise) benchmark?
...per-extras/src/main/java/org/elasticsearch/index/mapper/extras/MatchOnlyTextFieldMapper.java
Show resolved
Hide resolved
...per-extras/src/main/java/org/elasticsearch/index/mapper/extras/SourceConfirmedTextQuery.java
Show resolved
Hide resolved
...per-extras/src/main/java/org/elasticsearch/index/mapper/extras/MatchOnlyTextFieldMapper.java
Show resolved
Hide resolved
| /** | ||
| * Reader that decodes UTF-8 formatted bytes into chars. | ||
| */ | ||
| public class UTF8DecodingReader extends Reader { |
There was a problem hiding this comment.
Maybe make this a final class?
|
Looks like serverless check failed, because of: Which killed the search node and many tests failed because of this. I don't think it is related to this change. |
|
Ok, I benchmarked this with the In light of that, I'm not sure if we want to continue forward with this change. |
Given that even median indexing throughput varies, I think that this is within noise. I don't think this change makes things slower. I suspect we don't see a gain here, given that that are many other moving parts here. The micro benchmark suggested a minor improvement. I think not having to create java strings during indexing for match only fields is a beneficial. Additionally block loaders for match only text fields can work with bytes refs instead of strings, which I think is beneficial as well.
So I think we can move forward with the change and merge it in. |
|
Ok, I'll merge it and keep an eye on the nightlies. We can always revert it if we see something we don't like. |
💚 Backport successful
|
Follow-up to #126492 to use the json parsing optimizations for
match_only_textfields.Relates to #129072.