Uninitialized string offset in src/HTML5/Parser/Scanner.php:108 #215

@leeN

Hello!

I'm playing around with some PHP sanitization libraries and found the following issue in your HTML parser:

The Scanner::peek() method attempts to read beyond the end of the string in some cases. Judging from the stack trace, the bounds check there looks wrong: I think it should use < instead of <=, since EOF == strlen($data) and the last valid offset is strlen($data) - 1. Changing the comparison operator to < also makes the warning go away.
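To illustrate the off-by-one I suspect, here is a minimal sketch. TinyScanner, seekToLast(), peekInclusive(), and peekExclusive() are hypothetical names made up for this example; this is not the library's actual code, only a stand-in that assumes EOF is stored as strlen($data).

```php
<?php
// Hypothetical stand-in for a scanner; NOT the actual Masterminds\HTML5 code.
final class TinyScanner
{
    private string $data;
    private int $char = 0; // current read position
    private int $eof;      // assumed to equal strlen($data)

    public function __construct(string $data)
    {
        $this->data = $data;
        $this->eof = strlen($data);
    }

    // Move to the last valid offset (or stay at 0 for an empty string).
    public function seekToLast(): void
    {
        $this->char = max(0, $this->eof - 1);
    }

    // Suspected faulty variant: `<=` lets char + 1 == strlen($data) through,
    // which is one past the last valid offset.
    public function peekInclusive()
    {
        if (($this->char + 1) <= $this->eof) {
            return $this->data[$this->char + 1];
        }
        return false;
    }

    // Proposed variant: `<` only allows offsets 0 .. strlen($data) - 1.
    public function peekExclusive()
    {
        if (($this->char + 1) < $this->eof) {
            return $this->data[$this->char + 1];
        }
        return false;
    }
}

$s = new TinyScanner("<!--");
$s->seekToLast();                 // position 3, the last "-"
$next = $s->peekExclusive();      // nothing left to peek at
var_dump($next);
```

With the scanner on the last character, peekExclusive() reports that there is no next character, whereas peekInclusive() at the same position would index offset 4 of a 4-byte string, which is the kind of read that produces an "Uninitialized string offset" message.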

How to reproduce:

Install the current masterminds/html5 version via Composer: composer require masterminds/html5

Run the following PHP script:

<?php
require "vendor/autoload.php";

use Masterminds\HTML5;
$html5 = new HTML5();
$html = "<form ></span><!--*/'><!--";
$dom = $html5->loadHTML($html);

print $html5->saveHTML($dom);

The warning seems to occur when the input contains malformed comments (i.e., trailing and unclosed XML comments). While this HTML fragment is obviously invalid, your parser is used by several sanitization libraries (e.g., the TYPO3 one), which have to handle broken HTML.
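Since the warning is easy to miss in normal output, one way to make it checkable is to capture it with a temporary error handler. This is only a sketch: collectWarnings() is a hypothetical helper, and the closure reads one past the end of a plain string directly instead of calling the parser, so it runs without the library installed.

```php
<?php
// Hypothetical helper: runs $fn and returns the messages of any
// warnings or notices raised while it ran.
function collectWarnings(callable $fn): array
{
    $messages = [];
    set_error_handler(
        function (int $severity, string $message) use (&$messages): bool {
            $messages[] = $message;
            return true; // handled; skip PHP's default error output
        },
        E_WARNING | E_NOTICE // a notice on PHP 7, a warning on PHP 8
    );
    try {
        $fn();
    } finally {
        restore_error_handler();
    }
    return $messages;
}

$messages = collectWarnings(function (): void {
    $data = "<!--";
    // Offset strlen($data) is one past the last valid offset.
    $ignored = $data[strlen($data)];
});

print_r($messages);
```

In the reproduction script, the same helper could wrap the loadHTML() call to check whether a given input still triggers the message after a fix.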

I do not think this causes any kind of parsing issues, but this still seems to be a bug on your end.

Cheers!
