Navigating to javascript form using HtmlUnit #401

@MatroidX

Note: This information is duplicated on https://stackoverflow.com/questions/69849910/navigating-to-javascript-form-using-htmlunit. Feel free to request any additional information there.

Overview

I have been using HtmlUnit successfully to navigate BoardGameGeek and execute tasks (e.g. send GeekMail). Recently they changed their login from a normal webpage to a JavaScript-generated form, and now I cannot access the login form using HtmlUnit no matter what I try. So far I have tried:

  • Adding a WebWindowListener.
  • Waiting for JavaScript to complete with webClient.waitForBackgroundJavaScript(JAVASCRIPT_PAUSE) or webClient.waitForBackgroundJavaScriptStartingBefore(JAVASCRIPT_PAUSE).
  • Listing all windows (0 found) and forms (1 found).
  • Printing XML for the current HtmlPage and all HtmlForms.
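
For reference, the waiting attempts above amount to a poll-until-present loop. The following is a minimal, library-independent sketch of that pattern in plain Java. DomPoller and pollUntilPresent are hypothetical names and are not part of HtmlUnit. The assumption is that the lookup supplier would wrap whatever DOM query fits the page (for example, a search for a password input after each wait for background JavaScript); that query is not shown because it depends on the page.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical helper (not an HtmlUnit class): retries a lookup until it
// yields a value or a timeout elapses.
final class DomPoller {
    private DomPoller() {}

    static <T> Optional<T> pollUntilPresent(Supplier<Optional<T>> lookup,
                                            Duration timeout,
                                            Duration interval) {
        final long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            // Assumption: in real use, this call would query the current DOM.
            Optional<T> found = lookup.get();
            if (found.isPresent()) {
                return found;
            }
            if (deadline - System.nanoTime() <= 0) {
                return Optional.empty();  // Gave up: nothing appeared in time.
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();  // Preserve interrupt status.
                return Optional.empty();
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in for a DOM lookup: the "element" only shows up on the third attempt.
        final int[] attempts = {0};
        Optional<String> result = DomPoller.<String>pollUntilPresent(
                () -> ++attempts[0] >= 3
                        ? Optional.of("password-input")
                        : Optional.<String>empty(),
                Duration.ofSeconds(2),
                Duration.ofMillis(10));
        System.out.println(result.orElse("not found") + " after " + attempts[0] + " attempts");
    }
}
```

Polling for the specific element, rather than pausing once for a fixed time, separates "the script has not finished yet" from "the element is never added to this page's DOM".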

I would like a general understanding that goes beyond this specific webpage and its JavaScript, but I am giving this concrete example because the other Stack Overflow questions I looked at did not help in this situation (so perhaps there is something unique here).

Steps to Reproduce

  1. Use your browser to navigate to https://boardgamegeek.com/geekmail/compose?touser=FakeUserName (any name will suffice).
  2. As long as you are not logged in to BoardGameGeek, you will then see a pop-up window titled "Sign in" with text inputs "Username" and "Password".
  3. If you View Page Source, you will see that the form is generated by JavaScript (https://cf.geekdo-static.com/frontend/main-es2015.aeff0e4f13bcecc7eb55.js or https://cf.geekdo-static.com/frontend/main-es5.aeff0e4f13bcecc7eb55.js). As far as I can tell, the created form has no name / id that I can use to access it. Even if it did, I do not seem to be able to view it within HtmlUnit (i.e. "Sign up" never appears when I print XML, whether for HtmlPage or HtmlForm).
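
The "never appears" check in step 3 can be made less dependent on a single search term by probing the dumped XML for several candidate markers at once. The following is a rough sketch in plain Java. MarkupProbe and findMarkers are hypothetical names, the sample markup in main is invented for illustration, and the marker list is only a guess at what the sign-in dialog might contain.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: reports which candidate markers occur in a markup dump.
final class MarkupProbe {
    private MarkupProbe() {}

    // Case-insensitive substring search; results keep the order of the markers.
    static Map<String, Boolean> findMarkers(String markup, List<String> markers) {
        final String haystack = markup.toLowerCase(Locale.ROOT);
        final Map<String, Boolean> result = new LinkedHashMap<>();
        for (String marker : markers) {
            result.put(marker, haystack.contains(marker.toLowerCase(Locale.ROOT)));
        }
        return result;
    }

    public static void main(String[] args) {
        // Invented stand-in for the output of currentPage.asXml().
        String xml = "<form action=\"/search\"><input type=\"search\" name=\"q\"/></form>";
        List<String> markers = Arrays.asList(
                "Sign in", "Sign up", "Username", "type=\"password\"", "type=\"search\"");
        findMarkers(xml, markers).forEach(
                (marker, present) -> System.out.println(marker + " -> " + present));
    }
}
```

If none of the login-related markers show up while the search form's markers do, the dialog has probably not been rendered into the DOM that HtmlUnit sees, as opposed to being present under a different label.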

Existing code

Here is my current version of the code with multiple attempts made to diagnose the problem / extract some useful information:

import com.gargoylesoftware.htmlunit.WebClient;
...
import java.util.LinkedList;

public class GeekMailSender {
    ...

    // Static variable to track all website windows.
    private static final LinkedList<WebWindow> websiteWindows = new LinkedList<>();

    // Inner class to listen for new (i.e. pop-up) windows.
    static class GeekMailWindowListener implements WebWindowListener {
        @Override
        public void webWindowClosed(WebWindowEvent event) {}

        @Override
        public void webWindowContentChanged(WebWindowEvent event) {}

        @Override
        public void webWindowOpened(WebWindowEvent event) {
            GeekMailSender.websiteWindows.add(event.getWebWindow());
        }
    }

    // Method to actually send GeekMail by navigating the BGG website.
    public static void sendGeekMail(...) {
        ...
        try (final WebClient webClient = new WebClient()) {
            // Track creation of new (i.e. pop-up) windows.
            websiteWindows.clear();
            webClient.addWebWindowListener(new GeekMailWindowListener());

            // Try to access the GeekMail page.
            HtmlPage currentPage = webClient.getPage("https://boardgamegeek.com/geekmail/compose?touser=FakeUserName");
            String pageTitle = currentPage.getTitleText();
            System.out.println(pageTitle);  // BoardGameGeek

            // We may need to log in first.
            if (!pageTitle.contains("GeekMail")) {
                // Need to wait for JavaScript to complete, otherwise no forms are available.
                webClient.waitForBackgroundJavaScriptStartingBefore(JAVASCRIPT_PAUSE);
                // No difference when using webClient.waitForBackgroundJavaScript(JAVASCRIPT_PAUSE) instead.

                // Unfortunately the only form found is the top-right Search form on BoardGameGeek.
                if (currentPage.getForms().isEmpty()) {
                    // This does NOT happen.
                    System.out.println("WARNING! No form found, even after waiting for javascript!");
                    return;
                }

                // We don't find any windows at all... this confuses me.
                if (websiteWindows.isEmpty()) {
                    // This does happen :(
                    System.out.println("WARNING! No windows found even after waiting for javascript!");
                }

                // Additional printing does not reveal where the form is.
                // For instance, searching the XML for "Sign up" yields no results.
                System.out.println(currentPage.asXml());

                // And printing the one form we can access reveals it is just the Search form.
                System.out.println(currentPage.getForms().size());  // 1
                final HtmlForm loginForm = currentPage.getForms().get(0);
                System.out.println(loginForm.asXml());
                ...
            }
            ...
        }
        ...
    }
}

References

In trying to solve this, I have checked the following references (among many others):

Nevertheless, I seem unable to locate the desired form. Any help would be much appreciated!
