fix(span): track end tag positions correctly in naive state switching mode #124
Conversation
|
@Akida31 can you take a look? |
untitaker
left a comment
going to wait a bit for a second review, but this makes sense to me. the PR just calls start_open_tag the same way the data state already does
|
not entirely related to this PR, but it seems that in data state we also call |
It will only call html5gum/src/emitters/emitter.rs Lines 322 to 333 in e97132d |
|
I'm talking about the explicit init-string call in machine.rs in the data state. it seems to me this should be copied to all other text/data states, but I don't know its purpose. haven't investigated it deeply yet, maybe @Akida31 remembers since he wrote the code |
|
ty @shulaoda! |
|
I think I just overlooked these cases (and don't use the naive state switching mode), so this PR looks good to me, thanks! |
|
That's alright, I have now found more cases using fuzzing:
the fuzzer has now been running for a day and has not found anything new. anyone is encouraged to run the fuzzer themselves (also with the other envvars enabled) to find other cases. let me know if you need help doing so. |
Description
When using html5gum's naively_switch_states(true) feature to properly parse HTML content within <script>, <style>, and similar raw text tags, the tokenizer returns incorrect span (position) information for the text content inside these tags.

Specifically, the span.start value is incorrectly set to 0 instead of the actual position where the text content begins in the HTML string.

The Span Bug Example
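To make the expected behavior concrete, here is a small self-contained sketch. It is not the PR's original example and it does not use html5gum at all: `TextToken` and `script_text` are hypothetical names invented for illustration. It only shows what a correct span for the text inside a raw text tag should look like, namely a range that starts right after the opening tag rather than at 0.

```rust
use std::ops::Range;

/// Toy token (hypothetical, not an html5gum type): a run of text plus the
/// byte range it occupies in the input string.
#[derive(Debug, PartialEq)]
struct TextToken {
    text: String,
    span: Range<usize>,
}

/// Hypothetical helper (not part of html5gum): returns the raw text inside
/// the first `<script>...</script>` pair together with its byte span.
fn script_text(html: &str) -> Option<TextToken> {
    const OPEN: &str = "<script>";
    const CLOSE: &str = "</script>";

    // The text content begins right after the opening tag, not at offset 0.
    let start = html.find(OPEN)? + OPEN.len();
    // Look for the closing tag only in the part after the opening tag.
    let end = start + html[start..].find(CLOSE)?;

    Some(TextToken {
        text: html[start..end].to_string(),
        span: start..end,
    })
}
```

For the input `<p>hi</p><script>let x = 1;</script>`, this sketch reports the span `17..27` for the text `let x = 1;`, so slicing the input with the span gives back the text. A `span.start` of 0, as described above, would instead point at the `<p>` at the beginning of the document.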