Open cross-origin iframes in a new tab to avoid needing --disable-web-security #1114

@pirate pirate commented Mar 23, 2025

Related issues (this first-pass PR does not fully fix all of these, but it should help):


This PR attempts to break CORS from the agent's perspective and allow buildDOMTree.js to traverse cross-origin iframes just as it currently traverses same-origin iframes. Unfortunately, the approach used for same-origin frames is not possible here: browser security forces us to go around the wall via playwright instead, which introduces several new hurdles.

We have to create a new id translation mapping that mutates all the subframe-generated DOM trees to reconnect the references to cross-origin parents & children. We also have to deterministically merge/offset all the IDs and update child id references to prevent conflicts, without introducing any dependency between frames during tree generation, so that generation stays parallel.
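A minimal sketch of the offset-and-reconnect step described above. The data shape is an assumption for illustration only (it is not the real buildDOMTree.js output): each frame's tree is a dict of local integer ids to nodes, the root of every tree has local id 0, and `parent_links` is a hypothetical list saying which iframe node in which parent tree hosts each subframe.

```python
from typing import Dict, List, Optional, Tuple

Node = dict                  # assumed shape: {"tag": str, "children": List[int]}
FrameTree = Dict[int, Node]  # local id -> node; the root is assumed to be id 0


def merge_frame_trees(
    trees: List[FrameTree],
    parent_links: List[Optional[Tuple[int, int]]],
) -> Dict[int, Node]:
    """Merge independently built per-frame trees into one id space.

    parent_links[i] is (parent_tree_index, local_iframe_node_id) for
    subframe i, or None for the top-level frame.
    """
    # Offsets depend only on tree sizes and list order, so the trees can be
    # built in parallel and still merge deterministically.
    offsets: List[int] = []
    offset = 0
    for tree in trees:
        offsets.append(offset)
        offset += max(tree, default=-1) + 1

    # Shift every id and every child reference by the frame's offset.
    merged: Dict[int, Node] = {}
    for tree, off in zip(trees, offsets):
        for local_id, node in tree.items():
            merged[local_id + off] = {
                **node,
                "children": [child + off for child in node["children"]],
            }

    # Reconnect each subframe root to the iframe element that hosts it.
    for i, link in enumerate(parent_links):
        if link is None or not trees[i]:
            continue
        parent_index, iframe_local_id = link
        merged[iframe_local_id + offsets[parent_index]]["children"].append(offsets[i])
    return merged
```

Because the offsets are computed only after every tree exists, no counter has to be passed between frames while the trees are being generated.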


The main challenges

  • playwright doesn't deterministically provide all the frames in the first place‼️ With identical settings, on some pageloads page.frames contains the cross-origin frames; exit and retry, and the next time it doesn't; retry a few more times and it does again, etc. There is a race or timing issue.
  • iframes aren't guaranteed to have any id, name, or unique src url, and the standard API to pierce and find a nested iframe only accepts name= or url= as lookup params and nothing else.
  • iframes and their content can be added/removed/updated multiple times per second by JS, and their context handles and all child handles (and potentially nested frames) break any time they navigate
  • non-cross-origin iframes can be nested inside cross-origin iframes and vice versa, up to 3x deep

In an ideal world we'd want

  • every <iframe /> to have a stable and globally unique name="..." or url="..." attr
  • which means page.frame('name') works to pierce and get that specific iframe even if nested
  • it also lets us merge separate DOMTree maps without conflicts because every frame ends up with a unique xpath /html/body/iframe[somename]/html/body/iframe[othername]
  • all frames should inject buildDOMTree.js in parallel for speed, and output should be merged at the end. building it frame by frame and passing the growing indexOffset counter between them serially is too slow
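As a rough sketch of the "build in parallel, merge at the end" wish above: the snippet below assumes each frame is identified by a unique xpath prefix and that a hypothetical `build_tree` coroutine returns the xpaths found inside that frame. Neither name comes from the codebase; `fake_build_tree` only stands in for injecting buildDOMTree.js.

```python
import asyncio
from typing import Awaitable, Callable, List


async def build_all_frames(
    frame_prefixes: List[str],
    build_tree: Callable[[str], Awaitable[List[str]]],
) -> List[str]:
    """Build every frame's tree concurrently, then merge in input order."""
    # asyncio.gather returns results in the order the awaitables were passed,
    # so the merge below is deterministic even though the work overlaps.
    results = await asyncio.gather(*(build_tree(p) for p in frame_prefixes))
    merged: List[str] = []
    for prefix, xpaths in zip(frame_prefixes, results):
        # Prefixing with the frame's own unique xpath avoids conflicts
        # between frames without sharing any counter during the build.
        merged.extend(prefix + xpath for xpath in xpaths)
    return merged


async def fake_build_tree(prefix: str) -> List[str]:
    """Hypothetical stand-in for running buildDOMTree.js inside one frame."""
    await asyncio.sleep(0.01)
    return ["/html/body/button"]


if __name__ == "__main__":
    prefixes = ["", "/html/body/iframe[checkout]"]
    print(asyncio.run(build_all_frames(prefixes, fake_build_tree)))
```

This only works if every frame really does have a stable, unique prefix, which is exactly the property the list above says real pages do not guarantee.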

Approaches tried

  • pre-tagging every iframe encountered with a data-browser-use-iframe-id={idx} attr: doesn't work because JS frameworks like react/vue/etc. redraw the element immediately; it also doesn't solve the issue of how to pierce parent frames to find the element later
  • using playwright to trick the browser into thinking it's a same-origin domain by rewriting the iframe request and response URLs: too complex, and browser security prevents it without crazy root CA hacks
  • passing an indexOffset counter into buildDOMTree.js so the IDs it generates don't conflict with the parent frame: forces them to run serially, or forces us to switch to a longer ID format w/ random ids (which might break vision, the system prompt, or lower LLM success)
  • giving up on nested and non-unique iframes and only supporting top-level cross-origin and same-origin iframes: I got this half working, but it's slow and makes some pages more buggy
  • ⭐️ SIMPLEST SOLUTION: if any visible (non-ads) cross-origin iframe is present on the page, simply open the url for it in a new tab and treat it like a separate page, with a special title telling the LLM it's a popup that should be treated as part of the previous page
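A minimal sketch of how the cross-origin iframes for the simplest solution might be picked out, assuming the iframe `src` values have already been collected from the page. The function names and the `ad_hosts` blocklist are hypothetical, and the visibility check mentioned above is not covered here.

```python
from typing import Iterable, List, Optional, Tuple
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> Tuple[str, Optional[str], Optional[int]]:
    """Return (scheme, host, port), filling in the scheme's default port."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def cross_origin_iframe_urls(
    page_url: str,
    iframe_srcs: Iterable[str],
    ad_hosts: Iterable[str] = (),
) -> List[str]:
    """Return iframe URLs whose origin differs from the page's origin."""
    page_origin = origin(page_url)
    blocked = set(ad_hosts)  # hypothetical ad-host blocklist
    urls: List[str] = []
    for src in iframe_srcs:
        # Skip about:blank, data:, and relative srcs (relative is same-origin).
        if not src.startswith(("http://", "https://")):
            continue
        if origin(src) == page_origin:
            continue
        if urlsplit(src).hostname in blocked:
            continue
        if src not in urls:  # de-duplicate while keeping page order
            urls.append(src)
    return urls
```

Each returned URL could then be opened in its own tab, for example with Playwright's `context.new_page()` followed by `page.goto(url)`, and given the special popup title described above.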

@pirate pirate marked this pull request as ready for review March 24, 2025 23:57
@pirate pirate changed the title from "[WIP] Pierce cross-origin iframes without needing --disable-web-security" to "Open cross-origin iframes in a new tab to avoid needing --disable-web-security" Mar 25, 2025
@pirate pirate merged commit d6ab307 into browser-use:main Mar 25, 2025
1 check passed

pirate commented Mar 26, 2025

  • go back to doing this properly in the future
    • instead of using playwright page.frames to get nested cross-origin frames, use the output of the first buildDOMTree, iterate through the frames it found (while you have the handle/xpath), generate the domtree for the elements within each, then remap all the idxs and parent/child relationships and append the inner xpaths to the outer xpaths
    • update/make sure the code in browser/context.py:get_locate_element works to handle the clicking with severed xpaths across cross-origin boundaries
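A rough sketch of the xpath handling the follow-up above implies, not the actual `get_locate_element` code. It assumes composite xpaths are built by plain concatenation and that no predicate contains a `/`. Both helper names are hypothetical.

```python
import re
from typing import List

# Matches an xpath step that selects an iframe, e.g. "iframe" or "iframe[2]".
IFRAME_STEP = re.compile(r"iframe(\[[^\]]*\])?")


def join_xpaths(outer_iframe_xpath: str, inner_xpath: str) -> str:
    """Append an xpath found inside a frame to the xpath of its iframe."""
    return outer_iframe_xpath.rstrip("/") + inner_xpath


def split_at_iframes(xpath: str) -> List[str]:
    """Sever a composite xpath into one piece per frame boundary."""
    pieces: List[str] = []
    current: List[str] = []
    for step in xpath.strip("/").split("/"):
        current.append(step)
        if IFRAME_STEP.fullmatch(step):
            # This piece ends at an iframe; the next piece starts inside it.
            pieces.append("/" + "/".join(current))
            current = []
    if current:
        pieces.append("/" + "/".join(current))
    return pieces
```

Every piece except the last locates an iframe and the last locates the target element, so the pieces could be resolved one frame at a time, for example by chaining Playwright's `frame_locator("xpath=...")` calls and finishing with `locator("xpath=...")`.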

@pirate pirate deleted the nick/tri-4-make-cross-site-iframes-work-without-disabling-chrome branch March 28, 2025 23:48