Checkbox Input Elements Not Detected as Clickable by Agent #1153

Closed
Noam5 opened this issue Mar 26, 2025 · 9 comments
Labels
bug Something isn't working

Noam5 commented Mar 26, 2025

Bug Description
The browser-use agent is unable to identify and interact with certain checkbox input elements. Specifically, Angular-generated checkbox inputs are not highlighted (framed) as clickable elements the way other interactive elements on the page are.

Environment

LLM Models:
  • gemini-2.0-flash-exp
  • gemini-2.5-pro-exp-03-25

browser-use version: 0.1.40

Element Not Being Detected
<input _ngcontent-c11="" autocomplete="off" formcontrolname="agreedToMarketing" id="chkAgreeToMarketing" name="agreedToMarketing" required="" type="checkbox" class="ng-untouched ng-pristine ng-invalid" style="">

Expected Behavior
The agent should identify this checkbox as a clickable element and be able to interact with it (check/uncheck) like it does with other interactive elements on the page.

Actual Behavior
The agent does not recognize or frame this checkbox as an interactive element, making it impossible for the AI to interact with forms requiring checkbox confirmation.

Additional Context
This issue occurs consistently with Angular-generated checkbox inputs that have classes like "ng-untouched", "ng-pristine", and "ng-invalid". The agent seems to have difficulty recognizing these specific form elements as interactive.

Reproduction Steps

  1. Set up an agent with either gemini-2.0-flash-exp or gemini-2.5-pro-exp-03-25.
  2. Direct the agent to a page containing Angular-generated checkboxes (particularly those with ng-* classes).
  3. Observe that the agent cannot identify or interact with these checkboxes.

Code Sample

from langchain_google_genai import ChatGoogleGenerativeAI
from browser_use import Agent
import asyncio
import os
from dotenv import load_dotenv
load_dotenv()

async def main():
    # Initialize the LLM
    llm = ChatGoogleGenerativeAI(
        model='gemini-2.5-pro-exp-03-25',  # Also tested with gemini-2.0-flash-exp
        api_key=os.getenv("GEMINI_API_KEY")
    )
    
    # Create task focused on the checkbox issue
    task = """
    Navigate to https://www.go-ins.co.il/cars/policyproposal/insurance-type
    Find and click on the marketing agreement checkbox with ID "chkAgreeToMarketing"
    
    The specific element looks like this:
    <input _ngcontent-c11="" autocomplete="off" formcontrolname="agreedToMarketing" 
    id="chkAgreeToMarketing" name="agreedToMarketing" required="" type="checkbox" 
    class="ng-untouched ng-pristine ng-invalid" style="">
    """
    
    # Initialize the agent
    agent = Agent(
        task=task,
        llm=llm,
        use_vision=True,
        max_actions_per_step=20,
        retry_delay=5,
    )
    
    # Run the agent
    result = await agent.run()
    print("Agent execution complete")
    print(result)

if __name__ == '__main__':
    asyncio.run(main())

Version

0.1.40

LLM Model

Other (specify in description)

Operating System

Windows 10

Relevant Log Output

Noam5 added the "bug" label Mar 26, 2025
pirate (Member) commented Mar 26, 2025

Did you install the tagged release 0.1.40 or just the code on main?

I ask because we pushed a bunch of fixes for element detection about 10 hours ago, right around when you opened this. If you think you might have used the older code, can you re-test with the latest code on main? Thanks!
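One way to answer the "tagged release or main?" question is a minimal sketch like the one below (not part of browser-use). It relies on the fact that pip records a PEP 610 `direct_url.json` file for packages installed from a URL or VCS checkout (e.g. `git+https://...`) and omits it for ordinary installs by name from PyPI. The helper names `describe_install` and `installed_browser_use` are hypothetical, and the check assumes a reasonably recent pip; other installers or very old pip versions may not write the file.

```python
import json
from importlib import metadata
from typing import Optional


def describe_install(version: str, direct_url_text: Optional[str]) -> str:
    """Summarize how a distribution was installed.

    direct_url_text is the contents of the PEP 610 direct_url.json file,
    or None if the installer did not record one.
    """
    if direct_url_text is None:
        # No direct_url.json: most likely installed by name from a package index.
        return f"{version} (no direct_url.json: likely a tagged release from a package index)"
    info = json.loads(direct_url_text)
    vcs = info.get("vcs_info")
    if vcs:
        # requested_revision is optional in PEP 610; commit_id is required.
        ref = vcs.get("requested_revision", "default branch")
        commit = vcs.get("commit_id", "unknown")[:7]
        return f"{version} ({vcs.get('vcs', 'vcs')} checkout of {ref}, commit {commit})"
    # Local directories, archives, or editable installs land here.
    return f"{version} (direct install from {info.get('url', 'unknown URL')})"


def installed_browser_use(dist_name: str = "browser-use") -> str:
    """Report the installed version of dist_name and where it came from."""
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        return f"{dist_name} is not installed"
    # read_text returns None when the metadata file is absent.
    return describe_install(dist.version, dist.read_text("direct_url.json"))


if __name__ == "__main__":
    print(installed_browser_use())
```

For example, after `pip install "git+https://github.com/browser-use/browser-use.git@main"` the script should report a git checkout of `main` with its short commit hash, whereas after `pip install browser-use==0.1.40` it should report a likely tagged release.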

djwealthblock commented

I'm experiencing the same issue, so I followed the suggestion of testing the code directly on main. However, it appears to be buggy: it's now clicking on random links embedded within iframes on the page. For instance, the site we are testing includes the Stripe SDK as well as YouTube videos. Browser-use is somehow clicking links within the iframe that the Stripe SDK loads (which is not even visible) and opening tabs to https://m.stripe.network/ at random. It also opened a tab directly to one of the embedded YouTube videos. This errant behavior does not exist in release 0.1.40.

pirate (Member) commented Mar 26, 2025

Thank you for reporting. I recently added beta support for cross-origin iframes, and it appears to be causing these issues. I will revert my change for now until the feature is improved to ignore invisible iframes.

djwealthblock commented

Ah, I see. OK, happy to test again once the changes have been reverted.

Appreciate the incredibly quick response. Thank you!

pirate (Member) commented Mar 26, 2025

@djwealthblock It's reverted: #1161.

djwealthblock commented

Great, thanks. I can confirm that the iframe issue no longer exists. In addition, the main branch does have significantly more coverage of clickable elements, which also resolves my initial issue.

pirate closed this as completed Mar 26, 2025
pirate reopened this Mar 26, 2025
pirate (Member) commented Mar 26, 2025

Oh whoops, closed too fast. @Noam5, is your issue still present on the latest main branch?

Noam5 (Author) commented Mar 30, 2025

After investigating, I was able to solve this problem by modifying the buildDomTree.js code to ensure all input elements are properly handled.

The issue was occurring because Angular-generated form controls weren't passing all the visibility and element position checks in the DOM extraction pipeline. The fix was to add special handling for input elements right after the nodeData object is initialized in the buildDomTree function:

// After nodeData is defined in buildDomTree function:
if (node.nodeType === Node.ELEMENT_NODE && node.tagName.toLowerCase() === 'input') {
  // Special handling for input elements - ensure they're always interactive
  nodeData.isVisible = true;
  nodeData.isTopElement = true;
  nodeData.isInteractive = true;
  nodeData.isInViewport = true;
  nodeData.highlightIndex = highlightIndex++;

  // Handle highlighting if enabled
  if (doHighlightElements) {
    if (focusHighlightIndex >= 0) {
      if (focusHighlightIndex === nodeData.highlightIndex) {
        highlightElement(node, nodeData.highlightIndex, parentIframe);
      }
    } else {
      highlightElement(node, nodeData.highlightIndex, parentIframe);
    }
  }
}

This approach ensures that all input elements (including checkboxes) are always treated as visible, interactive elements regardless of their CSS properties, position in the DOM, or framework-specific attributes.
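To make the effect of this override concrete, here is a small Python model of it (not browser-use code). Each `Node` stands for an element that would otherwise be an interactive candidate, and `passes_checks` is a hypothetical stand-in for the visibility, top-element, and viewport checks in buildDomTree.js. Forcing the flags for every `<input>` gives the Angular checkbox an index, but because the override never looks at the `type` attribute, a `type="hidden"` input gets one too.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical, simplified stand-in for an interactive-candidate DOM node."""
    tag: str
    input_type: Optional[str] = None  # descriptive only; the override ignores it
    # Stand-in for the combined visibility / top-element / viewport checks.
    passes_checks: bool = True


def assign_highlight_indices(nodes: List[Node], force_inputs: bool) -> List[Optional[int]]:
    """Return the highlight index each node would get (None = not highlighted).

    Indices are handed out sequentially, like the highlightIndex++ counter in
    the snippet above. With force_inputs=True every <input> is indexed even if
    it fails the checks, which is what the override does.
    """
    indices: List[Optional[int]] = []
    next_index = 0
    for node in nodes:
        forced = force_inputs and node.tag.lower() == "input"
        if forced or node.passes_checks:
            indices.append(next_index)
            next_index += 1
        else:
            indices.append(None)
    return indices


page = [
    Node("button"),
    Node("input", "checkbox", passes_checks=False),  # the Angular checkbox from this issue
    Node("input", "hidden", passes_checks=False),    # a hidden form field
]

print(assign_highlight_indices(page, force_inputs=False))  # [0, None, None]
print(assign_highlight_indices(page, force_inputs=True))   # [0, 1, 2]
```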

Noam5 (Author) commented Mar 30, 2025

> oh whoops closed too fast, @Noam5 is your issue still present on the latest main branch?

Actually, when using the latest main branch, the problem seems to be solved.

Noam5 closed this as completed Mar 30, 2025
3 participants