
ArrayIndexOutOfBoundsException when writing the FSTStore-backed FST with different DataOutput for meta #12697

@dungba88

Description

After writing an FSTStore-backed FST to a DataOutput while specifying a different DataOutput for the metadata, reading it back (using the public FST constructor) fails with the following exception:

java.lang.ArrayIndexOutOfBoundsException: Index 17 out of bounds for length 17

	at __randomizedtesting.SeedInfo.seed([CBCB30F6D2F8FEA1:821F24747AC56DDD]:0)
	at org.apache.lucene.store.ByteArrayDataInput.readVLong(ByteArrayDataInput.java:133)
	at org.apache.lucene.util.fst.FST.<init>(FST.java:494)
	at org.apache.lucene.util.fst.FST.<init>(FST.java:443)

The cause is that when the FST is backed by an FSTStore, numBytes is not written to the metadata: https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/fst/FST.java#L555-L562

Instead, numBytes is written by the FSTStore to the main DataOutput: https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/fst/OnHeapFSTStore.java

Thus, if metaOut and dataOut are the same DataOutput, numBytes still ends up directly after the metadata and is read back correctly. However, if they are different DataOutputs, metaOut lacks numBytes, and reading it past the end of the metadata causes the ArrayIndexOutOfBoundsException.
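To make the failure mode concrete, here is a minimal Java model of the mismatch. It is not Lucene code: `MetaNumBytesBugSketch`, `saveBuggy`, `loadNumBytes` and `ArrayReader` are hypothetical names, the field set is reduced to two stand-in values plus NUM_BYTES, and there is no codec header. It only borrows the VLong idea (7 data bits per byte, high bit set when more bytes follow) and an array-backed reader that, like `ByteArrayDataInput.readVLong` in the trace above, throws `ArrayIndexOutOfBoundsException` when it reads past the end of its buffer.

```java
import java.io.ByteArrayOutputStream;

/**
 * Hypothetical, simplified model of the save/load mismatch (NOT Lucene code).
 * Only models WHERE numBytes ends up; real FST metadata has more fields and a header.
 */
class MetaNumBytesBugSketch {

    /** VLong encoding: 7 data bits per byte, high bit set if more bytes follow. */
    static void writeVLong(ByteArrayOutputStream out, long v) {
        while ((v & ~0x7FL) != 0L) {
            out.write((int) ((v & 0x7FL) | 0x80L));
            v >>>= 7;
        }
        out.write((int) v);
    }

    /** Minimal array-backed reader; reading past the end throws ArrayIndexOutOfBoundsException. */
    static final class ArrayReader {
        private final byte[] bytes;
        private int pos;

        ArrayReader(byte[] bytes) {
            this.bytes = bytes;
        }

        long readVLong() {
            long result = 0;
            int shift = 0;
            while (true) {
                byte b = bytes[pos++]; // throws if pos == bytes.length
                result |= (long) (b & 0x7F) << shift;
                if ((b & 0x80) == 0) {
                    return result;
                }
                shift += 7;
            }
        }
    }

    /** Buggy save: NUM_BYTES is written by the "store" to dataOut, ahead of MAIN. */
    static void saveBuggy(ByteArrayOutputStream metaOut, ByteArrayOutputStream dataOut, byte[] main) {
        writeVLong(metaOut, 1);           // stand-in for INPUT_TYPE
        writeVLong(metaOut, 300);         // stand-in for START_NODE
        writeVLong(dataOut, main.length); // NUM_BYTES lands in dataOut
        dataOut.write(main, 0, main.length);
    }

    /** Load expects NUM_BYTES at the end of the metadata. */
    static long loadNumBytes(byte[] meta) {
        ArrayReader in = new ArrayReader(meta);
        in.readVLong();        // INPUT_TYPE
        in.readVLong();        // START_NODE
        return in.readVLong(); // NUM_BYTES
    }

    public static void main(String[] args) {
        byte[] main = {10, 20, 30};

        // Same output for meta and data: NUM_BYTES directly follows START_NODE.
        ByteArrayOutputStream shared = new ByteArrayOutputStream();
        saveBuggy(shared, shared, main);
        System.out.println("shared numBytes = " + loadNumBytes(shared.toByteArray()));

        // Separate outputs: the metadata ends right after START_NODE.
        ByteArrayOutputStream metaOut = new ByteArrayOutputStream();
        ByteArrayOutputStream dataOut = new ByteArrayOutputStream();
        saveBuggy(metaOut, dataOut, main);
        try {
            loadNumBytes(metaOut.toByteArray());
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("separate outputs: " + e.getMessage());
        }
    }
}
```

With a shared output, the third VLong read finds NUM_BYTES right after START_NODE. With separate outputs, the metadata buffer ends after START_NODE, so the third read indexes one past the end, which has the same shape as `Index 17 out of bounds for length 17` in the report.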

To illustrate:

When writing to the same DataOutput:

[ HEADER ] [ EMPTY_OUTPUT_FLAG ] [ EMPTY_OUTPUT ] [ INPUT_TYPE ] [ START_NODE ] [ NUM_BYTES ] [ MAIN ]

When writing to different DataOutputs:

metaOut: [ HEADER ] [ EMPTY_OUTPUT_FLAG ] [ EMPTY_OUTPUT ] [ INPUT_TYPE ] [ START_NODE ]
dataOut: [ NUM_BYTES ] [ MAIN ]

Expected layout:

metaOut: [ HEADER ] [ EMPTY_OUTPUT_FLAG ] [ EMPTY_OUTPUT ] [ INPUT_TYPE ] [ START_NODE ] [ NUM_BYTES ]
dataOut: [ MAIN ]
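A sketch of the expected layout, again a hypothetical model rather than Lucene's actual fix: it uses the JDK's `java.io.DataOutput`/`DataInput` as stand-ins for Lucene's classes, fixed-width `writeInt`/`writeLong` instead of VLongs, and `ExpectedLayoutSketch`, `save` and `load` are illustrative names. NUM_BYTES is written to `metaOut`, and only MAIN goes to `dataOut`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

/** Hypothetical model of the expected layout (java.io types stand in for Lucene's). */
class ExpectedLayoutSketch {

    /** Writes all metadata, including NUM_BYTES, to metaOut; only MAIN goes to dataOut. */
    static void save(DataOutput metaOut, DataOutput dataOut, int inputType, long startNode, byte[] main)
            throws IOException {
        metaOut.writeInt(inputType);
        metaOut.writeLong(startNode);
        metaOut.writeLong(main.length); // NUM_BYTES belongs to the metadata
        dataOut.write(main);            // MAIN only
    }

    /** Reads NUM_BYTES from metaIn, then exactly that many MAIN bytes from dataIn. */
    static byte[] load(DataInput metaIn, DataInput dataIn) throws IOException {
        metaIn.readInt();  // INPUT_TYPE
        metaIn.readLong(); // START_NODE
        long numBytes = metaIn.readLong();
        byte[] main = new byte[Math.toIntExact(numBytes)];
        dataIn.readFully(main);
        return main;
    }

    public static void main(String[] args) throws IOException {
        byte[] main = {10, 20, 30};

        // Separate outputs for metadata and data.
        ByteArrayOutputStream metaBytes = new ByteArrayOutputStream();
        ByteArrayOutputStream dataBytes = new ByteArrayOutputStream();
        save(new DataOutputStream(metaBytes), new DataOutputStream(dataBytes), 1, 300L, main);
        byte[] fromSeparate = load(
                new DataInputStream(new ByteArrayInputStream(metaBytes.toByteArray())),
                new DataInputStream(new ByteArrayInputStream(dataBytes.toByteArray())));

        // One shared output: bytes come out as [INPUT_TYPE][START_NODE][NUM_BYTES][MAIN].
        ByteArrayOutputStream sharedBytes = new ByteArrayOutputStream();
        DataOutputStream shared = new DataOutputStream(sharedBytes);
        save(shared, shared, 1, 300L, main);
        DataInputStream sharedIn = new DataInputStream(new ByteArrayInputStream(sharedBytes.toByteArray()));
        byte[] fromShared = load(sharedIn, sharedIn);

        System.out.println(Arrays.equals(main, fromSeparate) && Arrays.equals(main, fromShared));
    }
}
```

Running `main` should print `true`. When both arguments are the same stream, the sketch writes [ INPUT_TYPE ] [ START_NODE ] [ NUM_BYTES ] [ MAIN ] in that order, matching the single-DataOutput layout, so in this model moving NUM_BYTES into the metadata does not change the byte order for callers that already pass one output.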

I can put up a fix for this.

Version and environment details

No response
