Description
Simple reproducer below; the attached HTML text file is the document in question. Using jsoup version 1.18.3.
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.junit.jupiter.api.Test;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SelectXpathTest {

    @Test
    public void test() throws IOException {
        String html = new String(Files.readAllBytes(Path.of("src/test/resources/debug.html.txt")));
        Document document = Jsoup.parse(html);
        int foundElements = document.selectXpath(xpath).size();
        System.out.println("Found %s elements".formatted(foundElements));
        assert foundElements > 0;
    }

    private static final String xpath = "/html/body/div[3]/div[2]/div[2]/div[3]/div[1]/div/div[1]/form/div[1]/div[5]/fieldset[1]/div[2]/div/div[1]/label[1]/input[2]";
}

I have evidence that the XPath should be resolvable: I can open the attached HTML in Firefox and then, using the following JS:
function getElementByXpath(path) {
    return document.evaluate(path, document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue;
}

run the following command in the console:
getElementByXpath("/html/body/div[3]/div[2]/div[2]/div[3]/div[1]/div/div[1]/form/div[1]/div[5]/fieldset[1]/div[2]/div/div[1]/label[1]/input[2]")
to get:
<input id="assignment_text_entry" type="checkbox" value="1" name="online_submission_types[online_text_entry]" aria-label="Online Submission Type - Text Entry" style="">
Initial debugging shows that the deepest element jsoup is able to retrieve along the XPath is:
/html/body/div[3]/div[2]/div[2]/div[3]/div[1]/div/div[1]/form/div[1]
When I retrieve the children for /html/body/div[3]/div[2]/div[2]/div[3]/div[1]/div/div[1]/form/div[1], I get 7 children, then:
/html/body/div[3]/div[2]/div[2]/div[3]/div[1]/div/div[1]/form/div[1]/div[1] -> works
/html/body/div[3]/div[2]/div[2]/div[3]/div[1]/div/div[1]/form/div[1]/div[2] -> works
/html/body/div[3]/div[2]/div[2]/div[3]/div[1]/div/div[1]/form/div[1]/div[3] -> works
/html/body/div[3]/div[2]/div[2]/div[3]/div[1]/div/div[1]/form/div[1]/div[4] -> works
/html/body/div[3]/div[2]/div[2]/div[3]/div[1]/div/div[1]/form/div[1]/div[5] -> Nope...
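
For anyone who wants to repeat the prefix-by-prefix check above, here is a rough sketch of how it could be automated. The class `XpathPrefixProbe` and its methods are hypothetical helpers written for this report, not part of jsoup. It assumes a simple absolute path whose steps contain no "/" inside predicates, and it takes the match counter as a function so the helper itself has no jsoup dependency.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

// Hypothetical debugging helper (not part of jsoup): walks an absolute XPath
// step by step and reports the longest prefix that still matches something.
public class XpathPrefixProbe {

    // Returns every cumulative prefix of a simple absolute XPath,
    // e.g. "/html/body" -> ["/html", "/html/body"].
    // Assumption: no "/" appears inside predicates.
    static List<String> prefixes(String xpath) {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String step : xpath.split("/")) {
            if (step.isEmpty()) {
                continue; // skip the empty token produced by the leading slash
            }
            current.append('/').append(step);
            out.add(current.toString());
        }
        return out;
    }

    // Returns the longest prefix for which the counter reports at least one
    // match, or an empty string if even the first step matches nothing.
    static String longestResolvable(String xpath, ToIntFunction<String> counter) {
        String last = "";
        for (String prefix : prefixes(xpath)) {
            if (counter.applyAsInt(prefix) == 0) {
                break; // first prefix that fails to resolve
            }
            last = prefix;
        }
        return last;
    }

    public static void main(String[] args) {
        // Stand-in counter for demonstration only: pretends that any prefix
        // ending in "div[5]" matches nothing. Replace with a real lookup.
        ToIntFunction<String> standIn = p -> p.endsWith("div[5]") ? 0 : 1;
        System.out.println(longestResolvable("/html/body/div[1]/div[5]/input[2]", standIn));
    }
}
```

With jsoup on the classpath, the counter would be `p -> document.selectXpath(p).size()`, using the same `document` as in the reproducer.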