-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature to save webpage as PDF to given path #1095
Conversation
Awesome! What happens if the file location already exists? Can the agent cause any damage to the system, e.g., by replacing the home folder, etc.? |
Thats a good question. At the moment, there is no specific safeguard against writing to the filesystem except if the python process does not have write permissions for a folder. How do you suggest we can introduce some guardrails? |
webdriver-bidi (the upcoming CDP replacement) returns screenshots and PDFs as base64 over the channel to the caller to avoid the issue of direct filesystem writes from the browser. we can do the equivalent manually by running some https://developer.mozilla.org/en-US/docs/Web/API/Screen_Capture_API JS in the context of the page and returning base64 |
browser_use/controller/views.py
Outdated
class SavePDFAction(BaseModel): | ||
file_path: str | ||
media_type: Literal['screen', 'print'] = 'screen' | ||
format: Literal['A4', 'A3', 'A5', 'Legal', 'Letter'] = 'A4' | ||
print_background: bool = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
imho this is nicely typed, but is maybe too much choice for LLMs. it will work for 99% of ppl fixed at A4
& screen
. if someone is really printing out webpages on paper then we could add env vars if needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. I set the media type as 'screen' and format as 'A4'. Kept the print_background parameter since that does change the actual pdf that is stored, and might be useful for customization. @pirate , let me know if the PR looks good now.
@pirate thanks for the feedback. So is your proposal to add a JS file in browser/ and then have the save_pdf controller action call a JS function? The function would then return a base64 image/pdf and we can write to filesystem separately? How exactly would that prevent malicious intent to write to file system though? |
No concrete proposal to change this PR, just making a note for future that we may want to switch methods as BIDI gets closer to being standardized. I think the security risk is acceptable for now as long as the Python code puts output in a dedicated output folder and doesn't allow writes elsewhere. |
@pirate , makes sense. Please let me know if you are able to add a note somewhere for a future enhancement (or I can write it) |
browser_use/controller/views.py
Outdated
class SavePDFAction(BaseModel): | ||
file_path: str | ||
print_background: bool = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I think we can remove params entirely to not give the LLM decision fatigue and also avoid potential security issues of allowing it to come up with arbitrary (possibly conflicting/invalid/undescriptive) filenames.
There are improvements to download saving coming soon in this branch: https://github.com/browser-use/browser-use/tree/gregor/lib-48-better-file-uploads
so this will be revisited and improved soon anyway
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My only concern is, If the user wants to save to a specific folder, by saying "go to apple.com and save the homepage as a PDF to /Users/abc/xyz", we would lose this capability. What are your thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes that is intentional. the LLM should not be able to write to arbitrary paths on the filesystem for security reasons, we should definitely restrict it to the current working directory or a dedicated temp directory + sanitize & truncate filenames to prevent OSError
s. Otherwise you could convince an LLM to overwrite/write to ~/.bashrc
, /etc/password
, ~/Downloads/invoice_to_be_paid.pdf
etc. by hiding some malicious jailbreak prompt in the page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
No need for SavePDFAction Co-authored-by: Nick Sweeting <[email protected]>
Improvements to use simple file naming instead of LLM generated Co-authored-by: Nick Sweeting <[email protected]>
@ml5ah thanks! |
@MagMueller in this PR the path is already saved within the ActionResult with memory enabled, did you mean something more than that? Also this does tie into @gregpr07's work on file uploads, perhaps we wait until he merges that and then refactor to use consistent naming, santiization, and temp dir approach for all download/upload handling:
|
This is a simple feature to save a webpage as PDF to a given path.
Brief summary to changes:
1. service.py: A new SavePDFAction has been introduced.
2. views.py: Definition for SavePDFAction params.
Happy to incorporate any feedback.