
ClassCastException HtmlUnknownElement -> HtmlMeta when encountering a "meta" tag that is unknown #905

@staticgears

Found in the wild on this website (in Dutch): https://lokaleregelgeving.overheid.nl/ZoekResultaat

    <dependency>
      <groupId>org.htmlunit</groupId>
      <artifactId>htmlunit</artifactId>
      <version>4.7.0</version>
    </dependency>

When parsing an HTML chunk that contains an unknown tag that still matches "meta" (such as the namespaced overheidrg:meta tag in the unit test below), HtmlPage.getMetaTags runs final List<HtmlMeta> tags = getDocumentElement().getStaticElementsByTagName("meta"). The returned list contains both HtmlUnknownElement and HtmlMeta instances, but it is blindly cast to List<HtmlMeta>, so the error is only uncovered when iterating over the list a few lines later.
Arguably <E extends HtmlElement> List<E> getStaticElementsByTagName(String) is the unsafe link in the chain, but modifying that was too big of a change for me.
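
To illustrate why the failure only shows up during iteration, here is a minimal, self-contained sketch. It does not use HtmlUnit: Element, Meta, Unknown and byTagName are hypothetical stand-ins for HtmlElement, HtmlMeta, HtmlUnknownElement and getStaticElementsByTagName, and the body of byTagName is an assumption about the general shape of such a method, not the library's actual code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for HtmlElement, HtmlMeta and HtmlUnknownElement.
class Element {}
class Meta extends Element {}
class Unknown extends Element {}

class UncheckedCastDemo {

    // The caller chooses E, but nothing verifies that the elements
    // in the returned list really are instances of E.
    @SuppressWarnings("unchecked")
    static <E extends Element> List<E> byTagName(final List<Element> all) {
        final List<Element> copy = new ArrayList<>(all);
        return (List<E>) copy; // unchecked: erased at runtime, never fails here
    }

    // Returns true if iterating the result as List<Meta> throws.
    static boolean iterationFails(final List<Element> all) {
        final List<Meta> tags = byTagName(all); // succeeds even for a mixed list
        try {
            for (final Meta meta : tags) { // the compiler-inserted cast runs here
                meta.hashCode();
            }
            return false;
        } catch (final ClassCastException e) {
            return true;
        }
    }

    public static void main(final String[] args) {
        final List<Element> mixed = List.of(new Meta(), new Unknown());
        System.out.println(iterationFails(mixed)); // prints "true"
    }
}
```

The assignment to List<Meta> passes silently because generic type arguments are erased at runtime; the cast is only checked when an element is read out as Meta.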

The relevant portion of the stack trace:


	at org.htmlunit.html.HtmlPage.getMetaTags(HtmlPage.java:1936)
	at org.htmlunit.html.HtmlPage.getRefreshStringOrNull(HtmlPage.java:1448)
	at org.htmlunit.html.HtmlPage.executeRefreshIfNeeded(HtmlPage.java:1356)
	at org.htmlunit.html.HtmlPage.initialize(HtmlPage.java:332)
	at build.struck.testBug.testHTMLUnit(testBug.java:790)

The fix is simple: filter out the elements that would fail the cast in org.htmlunit.html.HtmlPage.getMetaTags.

Proposed solution

I coded it using a stream for a quick and dirty approach. But it would be better to ask for a List and then filter before casting and thus avoid the

        protected List<HtmlMeta> getMetaTags(final String httpEquiv) {
            if (getDocumentElement() == null) {
                return Collections.emptyList(); // weird case, for instance if document.documentElement has been removed
            }
            final List<HtmlMeta> tags = getDocumentElement().getStaticElementsByTagName("meta")
                    .stream().filter((i) -> i instanceof HtmlMeta).map(HtmlMeta.class::cast).collect(Collectors.toList());
            final List<HtmlMeta> foundTags = new ArrayList<>();
            for (final HtmlMeta htmlMeta : tags) {
                if (httpEquiv.equalsIgnoreCase(htmlMeta.getHttpEquivAttribute())) {
                    foundTags.add(htmlMeta);
                }
            }
            return foundTags;
        }
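
The non-stream alternative mentioned above (ask for a list of the base type, then filter before casting) might look roughly like the following. This is only a sketch with hypothetical stand-in classes (PageElement, MetaElement, UnknownElement, MetaTagFilter) so that it runs without HtmlUnit; in the real method the candidates would come from getStaticElementsByTagName("meta") typed as a list of HtmlElement.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for HtmlElement, HtmlMeta and HtmlUnknownElement.
class PageElement {}

class MetaElement extends PageElement {
    private final String httpEquiv;

    MetaElement(final String httpEquiv) {
        this.httpEquiv = httpEquiv;
    }

    String getHttpEquivAttribute() {
        return httpEquiv;
    }
}

class UnknownElement extends PageElement {}

class MetaTagFilter {

    // Single pass: check the runtime type first, cast only afterwards,
    // then apply the same case-insensitive http-equiv comparison.
    static List<MetaElement> metaTags(final List<PageElement> candidates, final String httpEquiv) {
        final List<MetaElement> foundTags = new ArrayList<>();
        for (final PageElement element : candidates) {
            if (element instanceof MetaElement) {
                final MetaElement meta = (MetaElement) element;
                if (httpEquiv.equalsIgnoreCase(meta.getHttpEquivAttribute())) {
                    foundTags.add(meta);
                }
            }
        }
        return foundTags;
    }
}
```

Compared with the stream version, this avoids building an intermediate list and keeps the type check and the attribute check in one loop.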

Unit test

This is TestNG, not JUnit, but it gets the point across.

    import java.io.IOException;
    import java.net.URL;

    import org.htmlunit.DefaultPageCreator;
    import org.htmlunit.PageCreator;
    import org.htmlunit.StringWebResponse;
    import org.htmlunit.WebClient;
    import org.htmlunit.WebWindow;
    import org.htmlunit.html.HtmlPage;
    import org.testng.annotations.Test;

    public class testBug {
        @Test(groups = "manual")
        public void testHTMLUnit() throws IOException {
            // Namespaced tag that is matched by the "meta" lookup but parsed as HtmlUnknownElement
            String poison = "<overheidrg:meta xmlns:overheidrg=\"http://standaarden.overheid.nl/cvdr/terms/\">";
            String badHtml = String.format("<html><body>%s</body></html>", poison);
            URL originalUrl = new URL("https://lokaleregelgeving.overheid.nl/ZoekResultaat");
            try (WebClient client = new WebClient()) {
                StringWebResponse webResponse = new StringWebResponse(badHtml, originalUrl);

                // Create a WebWindow to hold the HtmlPage
                WebWindow webWindow = client.getCurrentWindow();
                PageCreator pc = new DefaultPageCreator();
                HtmlPage page = (HtmlPage) pc.createPage(webResponse, webWindow);
                // Throws java.lang.ClassCastException: class org.htmlunit.html.HtmlUnknownElement
                // cannot be cast to class org.htmlunit.html.HtmlMeta (org.htmlunit.html.HtmlUnknownElement
                // and org.htmlunit.html.HtmlMeta are in unnamed module of loader 'app')
                page.initialize();
            }
        }
    }

Workaround

  • Subclass HtmlPage as HackHtmlPage and override protected List<HtmlMeta> getMetaTags(final String httpEquiv) as indicated above
  • Subclass DefaultPageCreator as HackDefaultPageCreator and override protected HtmlPage createHtmlPage(final WebResponse webResponse, final WebWindow webWindow) throws IOException to return HackHtmlPage
  • webClient.setPageCreator(new HackDefaultPageCreator())
