Reader::read_event_into panics because of incorrect DTD handling #923

@tayu0110

Hello, I would like to report an issue caused by incorrect DTD handling.

This is a well-formed and valid XML document.

<?xml version="1.0"?>
<!-- sample.xml -->
<!DOCTYPE root [
        <!ENTITY ent ">">
        <!ELEMENT root (#PCDATA)>
]>
<root>&ent;</root>

To parse the above XML document, write the following simple code,

use quick_xml::{Reader, events::Event};

fn main() {
    let path = std::env::args().nth(1).unwrap();
    let mut reader = Reader::from_file(path).unwrap();
    let mut buf = vec![];
    while reader.read_event_into(&mut buf).unwrap() != Event::Eof {}
}

and run it

cargo run -- sample.xml

As a result, Reader::read_event_into returns an error, and the unwrap() call panics with the following message.

thread 'main' (427815) panicked at src/main.rs:7:44:
called `Result::unwrap()` on an `Err` value: Syntax(InvalidBangMarkup)
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Furthermore, the following XML document is also well-formed.

<?xml version="1.0"?>
<!-- sample.xml -->
<!DOCTYPE root [
        <!ENTITY ent "<">
]>
<root />

Running the same program on this document gives the following result.

thread 'main' (435643) panicked at src/main.rs:7:44:
called `Result::unwrap()` on an `Err` value: Syntax(UnclosedDoctype)
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

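According to the XML specification, < and > are entirely legal inside an entity value literal (the quoted string in an <!ENTITY ...> declaration). Merely matching each < with a > is therefore insufficient to skip over the DTD: the first document contains an extra > inside a literal, and the second contains an extra <, so the bracket count is off in both cases.
In this sense, the current implementation's DTD handling is incorrect.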
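
To illustrate the point, here is a minimal sketch of a DTD skipper that treats quoted literals, comments, and processing instructions as opaque instead of counting angle brackets. The names find_doctype_end and find are hypothetical and not part of quick-xml. The sketch assumes the input starts right after `<!DOCTYPE`, and it is not a full DTD parser.

```rust
/// Hypothetical helper (not quick-xml API): given the bytes that follow
/// `<!DOCTYPE`, return the offset of the `>` that closes the declaration,
/// or `None` if the declaration is not closed in `input`.
fn find_doctype_end(input: &[u8]) -> Option<usize> {
    let mut i = 0;
    let mut in_subset = false; // true while between `[` and `]`
    while i < input.len() {
        let rest = &input[i..];
        match input[i] {
            // Quoted literal: jump past the matching quote, so any `<`,
            // `>`, `[` or `]` inside it is ignored.
            q @ (b'"' | b'\'') => {
                let close = rest[1..].iter().position(|&b| b == q)?;
                i += close + 2;
            }
            // Comment inside the internal subset: skip to after `-->`.
            b'<' if in_subset && rest.starts_with(b"<!--") => {
                i += 4 + find(&rest[4..], b"-->")? + 3;
            }
            // Processing instruction inside the subset: skip to after `?>`.
            b'<' if in_subset && rest.starts_with(b"<?") => {
                i += 2 + find(&rest[2..], b"?>")? + 2;
            }
            b'[' if !in_subset => {
                in_subset = true;
                i += 1;
            }
            b']' if in_subset => {
                in_subset = false;
                i += 1;
            }
            // Only a `>` outside the internal subset ends the DOCTYPE.
            b'>' if !in_subset => return Some(i),
            _ => i += 1,
        }
    }
    None
}

/// Offset of the first occurrence of `needle` in `haystack`, if any.
fn find(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    haystack.windows(needle.len()).position(|w| w == needle)
}
```

With this approach, both documents above should yield the offset of the > that follows ], because the > and < inside the entity value literals are never examined as markup.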
