Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature to save webpage as PDF to given path #1095

Merged
merged 6 commits into from
Mar 23, 2025

Conversation

ml5ah
Copy link
Contributor

@ml5ah ml5ah commented Mar 21, 2025

This is a simple feature to save a webpage as PDF to a given path.
Brief summary to changes:
1. service.py: A new SavePDFAction has been introduced.
2. views.py: Definition for SavePDFAction params.

Happy to incorporate any feedback.

@CLAassistant
Copy link

CLAassistant commented Mar 21, 2025

CLA assistant check
All committers have signed the CLA.

@MagMueller
Copy link
Collaborator

Awesome! What happens if the file location already exists? Can the agent cause any damage to the system, e.g., by replacing the home folder, etc.?

@ml5ah
Copy link
Contributor Author

ml5ah commented Mar 22, 2025

Thats a good question. At the moment, there is no specific safeguard against writing to the filesystem except if the python process does not have write permissions for a folder.
The expected behavior is that, as part of the SavePDFAction, a file path is returned by the model (with an appropriate file name based on page URL, content, etc.). Since it only writes files to systems, I'm thinking its unlikely that it can replace a crucial folder.

How do you suggest we can introduce some guardrails?

@pirate
Copy link
Member

pirate commented Mar 22, 2025

webdriver-bidi (the upcoming CDP replacement) returns screenshots and PDFs as base64 over the channel to the caller to avoid the issue of direct filesystem writes from the browser. we can do the equivalent manually by running some https://developer.mozilla.org/en-US/docs/Web/API/Screen_Capture_API JS in the context of the page and returning base64

Comment on lines 51 to 55
class SavePDFAction(BaseModel):
file_path: str
media_type: Literal['screen', 'print'] = 'screen'
format: Literal['A4', 'A3', 'A5', 'Legal', 'Letter'] = 'A4'
print_background: bool = False
Copy link
Member

@pirate pirate Mar 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imho this is nicely typed, but is maybe too much choice for LLMs. it will work for 99% of ppl fixed at A4 & screen. if someone is really printing out webpages on paper then we could add env vars if needed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I set the media type as 'screen' and format as 'A4'. Kept the print_background parameter since that does change the actual pdf that is stored, and might be useful for customization. @pirate , let me know if the PR looks good now.

@ml5ah
Copy link
Contributor Author

ml5ah commented Mar 22, 2025

@pirate thanks for the feedback. So is your proposal to add a JS file in browser/ and then have the save_pdf controller action call a JS function? The function would then return a base64 image/pdf and we can write to filesystem separately?

How exactly would that prevent malicious intent to write to file system though?

@pirate
Copy link
Member

pirate commented Mar 22, 2025

No concrete proposal to change this PR, just making a note for future that we may want to switch methods as BIDI gets closer to being standardized.

I think the security risk is acceptable for now as long as the Python code puts output in a dedicated output folder and doesn't allow writes elsewhere.

@ml5ah
Copy link
Contributor Author

ml5ah commented Mar 22, 2025

@pirate , makes sense. Please let me know if you are able to add a note somewhere for a future enhancement (or I can write it)

Comment on lines 51 to 53
class SavePDFAction(BaseModel):
file_path: str
print_background: bool = False
Copy link
Member

@pirate pirate Mar 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I think we can remove params entirely to not give the LLM decision fatigue and also avoid potential security issues of allowing it to come up with arbitrary (possibly conflicting/invalid/undescriptive) filenames.

There are improvements to download saving coming soon in this branch: https://github.com/browser-use/browser-use/tree/gregor/lib-48-better-file-uploads
so this will be revisited and improved soon anyway

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only concern is, If the user wants to save to a specific folder, by saying "go to apple.com and save the homepage as a PDF to /Users/abc/xyz", we would lose this capability. What are your thoughts?

Copy link
Member

@pirate pirate Mar 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes that is intentional. the LLM should not be able to write to arbitrary paths on the filesystem for security reasons, we should definitely restrict it to the current working directory or a dedicated temp directory + sanitize & truncate filenames to prevent OSErrors. Otherwise you could convince an LLM to overwrite/write to ~/.bashrc, /etc/password, ~/Downloads/invoice_to_be_paid.pdf etc. by hiding some malicious jailbreak prompt in the page.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

ml5ah and others added 3 commits March 22, 2025 14:01
No need for SavePDFAction

Co-authored-by: Nick Sweeting <[email protected]>
Improvements to use simple file naming instead of LLM generated

Co-authored-by: Nick Sweeting <[email protected]>
@MagMueller MagMueller merged commit 134f0f2 into browser-use:main Mar 23, 2025
1 check passed
@MagMueller
Copy link
Collaborator

@ml5ah thanks!
Can you think of a system on how to still get the saved path as a developer when the agent is done to further process it? For example, via ActionResult.

@pirate
Copy link
Member

pirate commented Mar 23, 2025

@MagMueller in this PR the path is already saved within the ActionResult with memory enabled, did you mean something more than that?

Also this does tie into @gregpr07's work on file uploads, perhaps we wait until he merges that and then refactor to use consistent naming, santiization, and temp dir approach for all download/upload handling:

  • downloaded PDFs: the code from this PR
  • downloaded HTML: add html download action in the registry #886
  • downloaded screenshots: does this exist already? if not we should probably add an action to allow the agent do download a screenshot as an output meant for the user (not just for the vision model)
  • downloaded files triggered by page

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants