Closed
Description
I am using HtmlUnit to web scrape documentation.
The following documentation includes a code block that contains tags, and the content of those tags is HTML.
The org.htmlunit.html.DomNode.asXml() function returns a block of text with extra whitespace (line breaks and indentation around every tag) that is not present in the original.
<span data-stt-ignore="">
<div class="highlight">
<pre tabindex="0" style=";-moz-tab-size:4;-o-tab-size:4;tab-size:4;">
<code class="language-xml" data-lang="xml">
<span style="display:flex;">
<span>
<span style="color:#008000;font-weight:bold">
<property
</span>
<span style="color:#7d9029">
key=
</span>
<span style="color:#ba2121">
"propertyKey"
</span>
<span style="color:#7d9029">
type=
</span>
<span style="color:#ba2121">
"propertyType"
</span>
<span style="color:#008000;font-weight:bold">
>
</span>
</span>
</span>
<span style="display:flex;">
<span>
<span style="color:#008000;font-weight:bold">
<caption>
</span>
My Property
<span style="color:#008000;font-weight:bold">
</caption>
</span>
</span>
</span>
<span style="display:flex;">
<span>
<span style="color:#008000;font-weight:bold">
<description>
</span>
This is my property
<span style="color:#008000;font-weight:bold">
</description>
</span>
</span>
</span>
<span style="display:flex;">
<span>
<span style="color:#008000;font-weight:bold">
</property>
</span>
</span>
</span>
</code>
</pre>
</div>
</span>
Are there options to customize this behavior? At the moment, I am trying to use regex in a while loop with String.replaceAll, but this is not ideal.
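
For reference, the workaround can at least be reduced to a single pass instead of a loop. Below is a minimal sketch using only `java.util.regex`, not an HtmlUnit option. The class name `AsXmlWhitespace` and the method `collapse` are hypothetical names chosen for illustration. It assumes that every whitespace run containing a line break and touching a tag boundary was added by the serializer, so it would also strip genuine line breaks that sit directly next to a tag in the original markup.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not part of HtmlUnit): strips the line breaks and
// indentation that a pretty-printer places around tags.
class AsXmlWhitespace {
    // A '>' followed by whitespace that contains at least one line break.
    private static final Pattern AFTER_TAG = Pattern.compile(">\\s*\\R\\s*");
    // Whitespace that contains at least one line break, followed by '<'.
    private static final Pattern BEFORE_TAG = Pattern.compile("\\s*\\R\\s*<");

    // Assumption: such whitespace was inserted by the serializer and is not
    // significant. Spaces without a line break are left untouched.
    public static String collapse(String xml) {
        String withoutTrailing = AFTER_TAG.matcher(xml).replaceAll(">");
        return BEFORE_TAG.matcher(withoutTrailing).replaceAll("<").trim();
    }

    public static void main(String[] args) {
        String pretty = "<code>\n  <span>\n    key=\n  </span>\n  <span>\n    \"v\"\n  </span>\n</code>\n";
        System.out.println(collapse(pretty));
    }
}
```

The patterns are compiled once and each is applied in one `replaceAll` call, so no loop is needed. If only the text of the code block is required rather than its markup, reading it with `DomNode.getTextContent()` may sidestep the serializer entirely, though I have not verified that it preserves the original spacing in this case.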

