Fix Shared Strings Table parsing when using SST cache #105

Merged
monitorjbl merged 2 commits into monitorjbl:master from datadotworld:buffered-strings
Aug 17, 2017

@shawnsmith
Contributor

This PR fixes three issues I encountered using the SST cache feature due to errors parsing xl/sharedStrings.xml:

  1. Truncated text, where e.g. "B1 is Blank --->" was being parsed as "B1 is Blank ---". The XML parser was emitting the > at the end of the string (escaped as &gt; in the file) as a second XMLEvent, which the code ignored. Fixed by using XMLEventReader.getElementText() instead of XMLEventReader.nextEvent().asCharacters(). Example input:

    <sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="1" uniqueCount="1">
      <si>
        <t>B1 is Blank ---&gt;</t>
      </si>
    </sst>
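  2. The issue with blank strings described in #100 ("com.sun.xml.internal.stream.events.EndElementEvent cannot be cast to javax.xml.stream.events.Characters"). An empty <t/> element has no Characters event, so the event after the start tag is the end tag. This is also fixed by using XMLEventReader.getElementText(), which returns an empty string for an empty element. Example input: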

    <sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="1" uniqueCount="1">
      <si>
        <t/>
      </si>
    </sst>
  3. Strings with rich text formatting were getting dropped. Fixed by adding code to parse the <r><t>...</t></r> pattern (Rich Text Run). Note that the new code still drops the formatting. Example input:

    <sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="2" uniqueCount="2">
      <si>
        <r>
          <rPr>
            <b/>
            <sz val="10"/>
            <rFont val="Arial"/>
            <family val="2"/>
          </rPr>
          <t>shared styled string</t>
        </r>
      </si>
    </sst>
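
As a rough illustration of the approach described above (not the code in this PR), the three cases can be handled with one loop over StAX events. The class name SharedStringsSketch and the method parse are hypothetical, and the sketch assumes every <t> found inside an <si> contributes to that string's text, so rich text runs are concatenated and their formatting is dropped.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical sketch; the names here are not taken from the library.
public class SharedStringsSketch {

  /** Returns one plain-text string per <si> element, ignoring formatting. */
  static List<String> parse(String xml) throws XMLStreamException {
    XMLEventReader reader =
        XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
    List<String> strings = new ArrayList<>();
    StringBuilder current = null; // non-null only while inside an <si>
    while (reader.hasNext()) {
      XMLEvent event = reader.nextEvent();
      if (event.isStartElement()) {
        String name = event.asStartElement().getName().getLocalPart();
        if (name.equals("si")) {
          current = new StringBuilder();
        } else if (name.equals("t") && current != null) {
          // getElementText() joins every text event up to </t> (case 1)
          // and returns "" for an empty <t/> (case 2). A <t> nested in
          // <r> is reached by the same branch (case 3).
          current.append(reader.getElementText());
        }
      } else if (event.isEndElement()
          && current != null
          && event.asEndElement().getName().getLocalPart().equals("si")) {
        strings.add(current.toString());
        current = null;
      }
    }
    reader.close();
    return strings;
  }

  public static void main(String[] args) throws XMLStreamException {
    String ns = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";
    String xml = "<sst xmlns=\"" + ns + "\">"
        + "<si><t>B1 is Blank ---&gt;</t></si>"
        + "<si><t/></si>"
        + "<si><r><rPr><b/></rPr><t>shared styled string</t></r></si>"
        + "</sst>";
    for (String s : parse(xml)) {
      System.out.println("[" + s + "]");
    }
  }
}
```

With the three example inputs combined into one table, parse should return the full "B1 is Blank --->", an empty string, and "shared styled string", in that order.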

@monitorjbl
Owner

Looks good to me!
