• [Resolved] Plugin Contributor Rumperuu

    (@rumperuu)


    Hi,

    There is a bug in the RegEx on line 1056 of the file task.php:

    
    $l_str_FootnoteText = preg_replace( '#(?<![-\w\.!~\*\'\(\);]=[\'"])(?<![-\w\.!~\*\'\(\);]=)((ht|f)tps?://[^\\s<]+)#', '<span class="footnote_url_wrap">$1</span>', $l_str_FootnoteText );
    

    Expected Result

    The RegEx should ‘line wrap…URLs (hyperlinked or not) based on pattern, not link element, to prevent them from hanging out of the tooltip in non-Unicode-compliant user agents.’

    Actual Result

    The footnote rendering completely breaks when using a URL with two https:// strings, e.g., a Web page saved via the Wayback Machine (like https://web.archive.org/web/20210122200554/https://wordpress.org/plugins/footnotes/).

    Steps to Replicate

    See this RegExr example.
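
    The failure can also be reproduced outside WordPress with a few lines of PHP. This is a minimal sketch using the pattern quoted above; the shortened link markup and the variable names are made up for illustration. The inner https:// of a Wayback Machine URL is preceded by a slash rather than by href=", so neither lookbehind rejects it, and the match runs on through the closing quote of the attribute.

```php
<?php
// Pattern copied from the report above (task.php, line 1056).
$pattern     = '#(?<![-\w\.!~\*\'\(\);]=[\'"])(?<![-\w\.!~\*\'\(\);]=)((ht|f)tps?://[^\\s<]+)#';
$replacement = '<span class="footnote_url_wrap">$1</span>';

// Illustrative, shortened markup; only the links matter here.
$plain_link   = '<a href="https://wordpress.org/plugins/footnotes/" target="_blank">plugin page</a>';
$wayback_link = '<a href="https://web.archive.org/web/20210122200554/https://wordpress.org/plugins/footnotes/" target="_blank">archived page</a>';

$plain_result   = preg_replace( $pattern, $replacement, $plain_link );
$wayback_result = preg_replace( $pattern, $replacement, $wayback_link );

// The ordinary link is left alone: its URL directly follows href=".
var_dump( $plain_result === $plain_link );

// The inner https:// follows a slash, so a <span> is injected into the
// middle of the href value and the anchor markup is no longer valid.
echo $wayback_result, PHP_EOL;
```

    The second output contains the opening span tag inside the href attribute, which is what breaks the footnote markup.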

    Also, here is a screenshot of the rendering issue. Note that, due to an apparently unrelated numbering issue, the order of the second and third footnotes is switched.

    This is the code that produced that (using the (( and )) footnote tags):

    
    This is a normal, link-free footnote.((Dicta quas esse enim aut tempore corporis nobis. Quaerat dolorem repellat quo qui quam ducimus mollitia nulla. Perferendis blanditiis voluptatem laudantium et ut ea illo veritatis.))
    
    This is a footnote with a link in it.((Dicta quas esse enim aut tempore corporis nobis. Quaerat dolorem repellat quo qui quam ducimus mollitia nulla. <a href="https://wordpress.org/plugins/footnotes/" target="_blank" rel="noopener noreferrer">Perferendis blanditiis voluptatem</a> laudantium et ut ea illo veritatis.))
    
    This is the same footnote with a Wayback Machine link in it.((Dicta quas esse enim aut tempore corporis nobis. Quaerat dolorem repellat quo qui quam ducimus mollitia nulla. <a href="https://web.archive.org/web/20210122200554/https://wordpress.org/plugins/footnotes//" target="_blank" rel="noopener noreferrer">Perferendis blanditiis voluptatem</a> laudantium et ut ea illo veritatis.))
    

    Temporary Workaround

    Disabling the ‘Allow URLs to line-wrap anywhere’ setting stops this from happening.

    • This topic was modified 5 years ago by Rumperuu.
Viewing 3 replies - 1 through 3 (of 3 total)
  • Plugin Contributor pewgeuges

    (@pewgeuges)

    Hi @rumperuu

    Thank you very much for reporting this bug, and for all the testing you performed. It is now fixed in v2.5.3, which has been fast-tracked for release and is available now. A third negative lookbehind has been added for a leading slash, which hints that the URL pattern is part of a folder name in a Wayback Machine URL.
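
    A minimal sketch of what such a third lookbehind might look like, assuming it is simply (?<!/) placed in front of the URL group. The exact pattern shipped in v2.5.3 is not quoted in this thread, so the pattern and the variable names below are illustrative only.

```php
<?php
// Assumption: the two original lookbehinds plus a third one, (?<!/), that
// rejects a scheme directly preceded by a slash. Not the verbatim v2.5.3 code.
$pattern_with_slash_guard = '#(?<![-\w\.!~\*\'\(\);]=[\'"])(?<![-\w\.!~\*\'\(\);]=)(?<!/)((ht|f)tps?://[^\\s<]+)#';
$replacement              = '<span class="footnote_url_wrap">$1</span>';

// Illustrative inputs: a Wayback Machine link and a bare URL in running text.
$wayback_link = '<a href="https://web.archive.org/web/20210122200554/https://wordpress.org/plugins/footnotes/" target="_blank">archived page</a>';
$bare_url     = 'See https://wordpress.org/plugins/footnotes/ for details.';

$wayback_result = preg_replace( $pattern_with_slash_guard, $replacement, $wayback_link );
$bare_result    = preg_replace( $pattern_with_slash_guard, $replacement, $bare_url );

// The nested URL inside href is no longer matched.
var_dump( $wayback_result === $wayback_link );

// A URL in running text is still wrapped.
echo $bare_result, PHP_EOL;
```

    This remains a heuristic: it only covers the case where the nested scheme directly follows a slash.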

    Despite my label, I’m not a plugin author, only a maintenance programmer who came up with a bunch of bug fixes 3 months ago. We’d be happy to register you for Footnotes development.

    I seem unable to reproduce the other bug, the switched footnote numbers, despite using the exact blog post text that you posted. I’m very concerned about this too; please feel free to share the page where it is happening. If you cannot, you may also save the page and send the .zip to my username at Gmail.

    I apologize for not anticipating the use case of Wayback Machine URLs, despite having visited the page in your example and pinned the website’s home page in my browser. Also, this regex has already caused many bugs; all bug reports (and fixes) are listed in the comment block:

    1. https://wordpress.org/support/topic/2-1-4-breaks-on-my-site-images-dont-show/
    2. https://wordpress.org/support/topic/broken-layout-starting-version-2-1-4/
    3. https://wordpress.org/support/topic/two-links-now-breaks-footnotes-with-blogtext/

    After updating to 2.5.3 you may re-enable URL wrap.

    If you know a better way of doing this rather than a buggy regex, please let us know.

    Thank you.

    Plugin Contributor Rumperuu

    (@rumperuu)

    Thanks for the quick response @pewgeuges, the URL wrapping is working fine now.

    Sure, feel free to register me as a developer.

    I’ve opened a second issue for the footnote numbering bug.

    RE: alternative URL recognition solutions, RegEx alone isn’t ideal; parsing the HTML would be better. Parse the footnote text, then you should be able to identify whether a URL is part of a child <a> node or not.

    For example, here’s some rough pseudocode for Simple HTML DOM Parser:

    
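    $footnote_dom = str_get_html( $footnote_text );
    foreach ( $footnote_dom->find( 'p, span, strong, em, i, b' ) as $element ) {
        // Still pseudocode: $url_regex, match_not_in_child_element() and wrap_in_span() are placeholders.
        if ( preg_match( $url_regex, $element->innertext ) && match_not_in_child_element() ) {
            wrap_in_span();
        }
    }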
    
    • This reply was modified 5 years ago by Rumperuu.
    Plugin Contributor pewgeuges

    (@pewgeuges)

    Thank you @rumperuu for your response, and sorry for not seeing it sooner.

    May we use your WordPress username to register you?

    In the Footnotes codebase, I need to fix or add innumerable comment blocks to properly credit all new features (not just some of them, or only the more recent ones). As I’m new to the project, and to PHP overall, I missed out on PHPDoc best practices; the available standard tags are also far from sufficient, so until a fortnight ago I was adding new ones in the VSCode TextMate syntax configuration. After that, I need to add many missing settings, some of which are already promised, and the to-do list has only grown longer because I needed to catch up on other projects over the past weeks. For the time being, I’m afraid of commit conflicts. Managing multiple contributors is easier on GitHub, but the Footnotes repo there is on standby and out of sync, even though that’s where I got in touch with the project.

    Glad that parsing HTML is feasible in a plugin, but I fear that recognizing only a limited set of elements would cause new bugs, and URLs in inner HTML may or may not be in an <a> element. The initial fix was purely CSS: let all link elements break anywhere. But link elements may not contain URLs, while URLs may not be hyperlinked.
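
    One way around the limited element list might be to visit every text node instead of a fixed set of elements. The following is only a sketch under stated assumptions: it uses PHP’s bundled DOMDocument rather than Simple HTML DOM Parser, the helper name wrap_urls_in_text_nodes() and the wrap-root container id are hypothetical, and it has not been tried inside the plugin. Because attribute values are never text nodes, an href is left alone no matter how many https:// strings it contains, while a URL in visible text is wrapped whether or not it sits inside a link.

```php
<?php
/**
 * Hypothetical helper (not part of the Footnotes plugin): wrap URLs that
 * occur in text nodes, leaving attribute values such as href untouched.
 */
function wrap_urls_in_text_nodes( string $html ): string {
    $doc = new DOMDocument();

    // Parse the fragment inside a known container. The meta tag declares
    // UTF-8; libxml warnings about imperfect markup are collected silently.
    $previous = libxml_use_internal_errors( true );
    $doc->loadHTML(
        '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head>'
        . '<body><div id="wrap-root">' . $html . '</div></body></html>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors( $previous );

    $xpath = new DOMXPath( $doc );
    $root  = $xpath->query( '//div[@id="wrap-root"]' )->item( 0 );

    // Copy the node list first, because the loop below modifies the tree.
    $text_nodes = iterator_to_array( $xpath->query( './/text()', $root ) );

    foreach ( $text_nodes as $text_node ) {
        // With PREG_SPLIT_DELIM_CAPTURE, odd indices hold the captured URLs.
        $parts = preg_split( '#((?:ht|f)tps?://\S+)#', $text_node->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE );
        if ( count( $parts ) < 2 ) {
            continue; // No URL in this text node.
        }

        $fragment = $doc->createDocumentFragment();
        foreach ( $parts as $index => $part ) {
            if ( '' === $part ) {
                continue;
            }
            if ( 1 === $index % 2 ) {
                $span = $doc->createElement( 'span' );
                $span->setAttribute( 'class', 'footnote_url_wrap' );
                $span->appendChild( $doc->createTextNode( $part ) );
                $fragment->appendChild( $span );
            } else {
                $fragment->appendChild( $doc->createTextNode( $part ) );
            }
        }
        $text_node->parentNode->replaceChild( $fragment, $text_node );
    }

    // Serialize only the children of the container.
    $output = '';
    foreach ( $root->childNodes as $child ) {
        $output .= $doc->saveHTML( $child );
    }
    return $output;
}

// Illustrative footnote text with a Wayback Machine link and a bare URL.
$footnote = 'Archived <a href="https://web.archive.org/web/20210122200554/https://wordpress.org/plugins/footnotes/">here</a>, original at https://wordpress.org/plugins/footnotes/ today.';
echo wrap_urls_in_text_nodes( $footnote ), PHP_EOL;
```

    The trade-off is that the footnote is reparsed and reserialized, so the parser may normalize the markup, and text inside elements such as script or style would need to be excluded separately.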

    I don’t know why Google does not implement the Unicode Standard in Chrome. Mozilla does; not Google. If Chrome did line-wrap URLs at slashes like Firefox does, implementing UAX #14, the whole issue might be off the table (depending on other browsers).

    Thanks also for the new topic about numbering. I’ve tested with your code: there is indeed a big problem! I’m going to post over there right now.

    • This reply was modified 5 years ago by pewgeuges.

The topic ‘Line wrap RegEx bug’ is closed to new replies.