Description
After writing an FSTStore-backed FST to a DataOutput and specifying a different DataOutput for the metadata, reading them back (using the public FST constructor) throws the following exception:
java.lang.ArrayIndexOutOfBoundsException: Index 17 out of bounds for length 17
at __randomizedtesting.SeedInfo.seed([CBCB30F6D2F8FEA1:821F24747AC56DDD]:0)
at org.apache.lucene.store.ByteArrayDataInput.readVLong(ByteArrayDataInput.java:133)
at org.apache.lucene.util.fst.FST.<init>(FST.java:494)
at org.apache.lucene.util.fst.FST.<init>(FST.java:443)
The reason is that, when writing the metadata, an FST backed by an FSTStore does not write numBytes: https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/fst/FST.java#L555-L562
Instead, numBytes is written by the FSTStore to the main DataOutput: https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/fst/OnHeapFSTStore.java
Thus, if metaOut and dataOut are the same DataOutput, numBytes ends up right after the metadata and everything is read back correctly. However, if they are different DataOutputs, metaOut lacks numBytes, which causes the index out of bounds exception.
To illustrate:
When writing to the same DataOutput:
[ HEADER ] [ EMPTY_OUTPUT_FLAG ] [ EMPTY_OUTPUT ] [ INPUT_TYPE ] [ START_NODE ] [ NUM_BYTES ] [ MAIN ]
When writing to different DataOutputs:
metaOut: [ HEADER ] [ EMPTY_OUTPUT_FLAG ] [ EMPTY_OUTPUT ] [ INPUT_TYPE ] [ START_NODE ]
dataOut: [ NUM_BYTES ] [ MAIN ]
Expected layout with different DataOutputs:
metaOut: [ HEADER ] [ EMPTY_OUTPUT_FLAG ] [ EMPTY_OUTPUT ] [ INPUT_TYPE ] [ START_NODE ] [ NUM_BYTES ]
dataOut: [ MAIN ]
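The layouts above can be mimicked with a small standalone sketch. It does not use Lucene at all: FstLayoutSketch, write, readNumBytesFromMeta and the HEADER value are hypothetical names made up for illustration, fields are fixed-width rather than VLong-encoded, and EMPTY_OUTPUT is omitted (the flag is 0). It only shows why a reader that expects NUM_BYTES at the end of the metadata runs off the end of the buffer when NUM_BYTES was written to the data stream instead.

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;

class FstLayoutSketch {
    // Hypothetical stand-in for the real header; not Lucene's codec header.
    static final int HEADER = 0x46535400;

    /** Writes the two streams and returns {meta, data}. */
    static byte[][] write(byte[] mainBytes, long startNode, boolean numBytesInMeta) {
        ByteBuffer meta = ByteBuffer.allocate(64);
        ByteBuffer data = ByteBuffer.allocate(8 + mainBytes.length);
        meta.putInt(HEADER);      // HEADER
        meta.put((byte) 0);       // EMPTY_OUTPUT_FLAG (0: no EMPTY_OUTPUT follows)
        meta.put((byte) 0);       // INPUT_TYPE
        meta.putLong(startNode);  // START_NODE
        if (numBytesInMeta) {
            meta.putLong(mainBytes.length);  // expected layout: NUM_BYTES in meta
        } else {
            data.putLong(mainBytes.length);  // reported layout: NUM_BYTES in data
        }
        data.put(mainBytes);      // MAIN
        return new byte[][] {toArray(meta), toArray(data)};
    }

    /** Copies the bytes written so far into a right-sized array. */
    static byte[] toArray(ByteBuffer buf) {
        buf.flip();
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    /** Reads the metadata the way a reader expecting NUM_BYTES at the end would. */
    static long readNumBytesFromMeta(byte[] meta) {
        ByteBuffer in = ByteBuffer.wrap(meta);
        in.getInt();   // HEADER
        in.get();      // EMPTY_OUTPUT_FLAG
        in.get();      // INPUT_TYPE
        in.getLong();  // START_NODE
        return in.getLong();  // NUM_BYTES; underflows if meta ends after START_NODE
    }

    public static void main(String[] args) {
        byte[] mainBytes = {10, 20, 30};

        byte[][] expected = write(mainBytes, 42L, true);
        System.out.println("expected layout: numBytes = " + readNumBytesFromMeta(expected[0]));

        byte[][] reported = write(mainBytes, 42L, false);
        try {
            readNumBytesFromMeta(reported[0]);
        } catch (BufferUnderflowException e) {
            System.out.println("reported layout: meta ends before NUM_BYTES");
        }
    }
}
```

In this sketch the failure surfaces as a BufferUnderflowException; in the actual report it is the ArrayIndexOutOfBoundsException from ByteArrayDataInput.readVLong shown in the stack trace.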
I can put up a fix for this.
Version and environment details
No response