Whitespace in text tokenized as IGNORABLE_WHITESPACE in XmlReader #241

@westnordost

Given this XML:

<user>dude &amp; &lt;dudette&gt;</user>

I expect to get the following events when I iterate through it via the XmlReader:

  1. START_DOCUMENT
  2. START_ELEMENT localName="user"
  3. TEXT text="dude "
  4. ENTITY_REF text="&"
  5. TEXT text=" "
  6. ENTITY_REF text="<"
  7. TEXT text="dudette"
  8. ENTITY_REF text=">"
  9. END_ELEMENT localName="user"
  10. END_DOCUMENT

However, event 5 is reported as IGNORABLE_WHITESPACE instead of TEXT.

I think this is a bug: this whitespace is part of the element's text content, so it is not ignorable. Whitespace between XML elements, such as the space in <user>abc</user> <id>1234</id>, would be ignorable.
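
The distinction described above can be sketched as a small post-processing pass. This is only an illustration, written in Python for brevity and assuming a hypothetical event model of `(kind, value)` tuples; it is not the library's API or its actual implementation. The rule it assumes is a heuristic: a whitespace-only run counts as TEXT when its parent element also contains other character data, and as IGNORABLE_WHITESPACE otherwise.

```python
XML_WHITESPACE = " \t\r\n"


def reclassify_whitespace(events):
    """Return a copy of `events` with whitespace-only runs reclassified.

    `events` is a list of (kind, value) tuples (a hypothetical model).
    Assumes the events describe a well-formed document.
    """
    result = list(events)
    # One frame per open element: indices of whitespace-only events seen
    # directly inside it, and whether it contains other character data.
    stack = [{"ws": [], "has_text": False}]
    for i, (kind, value) in enumerate(events):
        if kind == "START_ELEMENT":
            stack.append({"ws": [], "has_text": False})
        elif kind == "END_ELEMENT":
            frame = stack.pop()
            new_kind = "TEXT" if frame["has_text"] else "IGNORABLE_WHITESPACE"
            for j in frame["ws"]:
                result[j] = (new_kind, events[j][1])
        elif (kind in ("TEXT", "IGNORABLE_WHITESPACE")
              and value.strip(XML_WHITESPACE) == ""):
            stack[-1]["ws"].append(i)
        elif kind in ("TEXT", "CDSECT", "ENTITY_REF"):
            stack[-1]["has_text"] = True
    return result


# The events currently reported for <user>dude &amp; &lt;dudette&gt;</user>
issue_events = [
    ("START_DOCUMENT", ""),
    ("START_ELEMENT", "user"),
    ("TEXT", "dude "),
    ("ENTITY_REF", "&"),
    ("IGNORABLE_WHITESPACE", " "),  # event 5, the one in question
    ("ENTITY_REF", "<"),
    ("TEXT", "dudette"),
    ("ENTITY_REF", ">"),
    ("END_ELEMENT", "user"),
    ("END_DOCUMENT", ""),
]

fixed_events = reclassify_whitespace(issue_events)
```

With this heuristic, event 5 becomes TEXT because `user` also contains "dude " and the entity references, while the space between `</user>` and `<id>` in the second example stays ignorable. An element that contains nothing but whitespace would also be treated as ignorable here, which may or may not be what a caller wants when no DTD is available.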

(By the way, the existence of CDSECT and ENTITY_REF was a pitfall (aka footgun) for me. I had assumed that the XmlReader would already deliver all text content as a single event, i.e. I expected just TEXT text="dude & <dudette>" followed by END_ELEMENT.)
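
The coalescing behavior expected here can be sketched as follows. Again this is a minimal illustration in Python over a hypothetical `(kind, value)` event model, not the library's API; it assumes that an ENTITY_REF event already carries the resolved text (as in the event list above) and that every text-like event should be merged.

```python
TEXT_LIKE = {"TEXT", "CDSECT", "ENTITY_REF", "IGNORABLE_WHITESPACE"}


def coalesce_text(events):
    """Merge each run of adjacent text-like events into one TEXT event."""
    merged = []
    buffer = []  # pieces of the text run currently being collected
    for kind, value in events:
        if kind in TEXT_LIKE:
            buffer.append(value)
            continue
        if buffer:
            # A non-text event ends the run: emit it as a single TEXT.
            merged.append(("TEXT", "".join(buffer)))
            buffer = []
        merged.append((kind, value))
    if buffer:
        merged.append(("TEXT", "".join(buffer)))
    return merged


# The events reported for <user>dude &amp; &lt;dudette&gt;</user>
raw_events = [
    ("START_DOCUMENT", ""),
    ("START_ELEMENT", "user"),
    ("TEXT", "dude "),
    ("ENTITY_REF", "&"),
    ("IGNORABLE_WHITESPACE", " "),
    ("ENTITY_REF", "<"),
    ("TEXT", "dudette"),
    ("ENTITY_REF", ">"),
    ("END_ELEMENT", "user"),
    ("END_DOCUMENT", ""),
]

coalesced = coalesce_text(raw_events)
```

Because the sketch also folds IGNORABLE_WHITESPACE into the run, whitespace between elements would come out as TEXT too; a caller who wants to drop it would have to filter those events before coalescing.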
