What happened?
I am running a Selenium instance with a Firefox driver to scrape and download many files from a website. This runs as a Python Flask web service inside a Docker container.
I discovered that my container would scrape only a few pages before it began to hit its memory limits and needed a restart. I used Python's built-in profiling tools to investigate where the memory allocation was growing and found that the file handler for the driver's log kept growing with each execution. This was especially surprising given that I was pointing the logging service to /dev/null.
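I no longer have the exact profiling setup, but a rough sketch of how the growth can be observed (a hypothetical illustration; scrape() stands in for one scraping run that creates a driver and calls driver.quit() without closing the handler first) looks like this:

# Hypothetical sketch for observing the growth; scrape() is a placeholder for
# one scraping run that creates a driver and quits it without closing the
# log file handler first.
import tracemalloc

tracemalloc.start()
baseline = tracemalloc.take_snapshot()
for _ in range(5):
    scrape()
    snapshot = tracemalloc.take_snapshot()
    for stat in snapshot.compare_to(baseline, "lineno")[:5]:
        print(stat)  # allocations attributed to the log handler keep climbing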
I was able to resolve this in my code by manually closing the file handler prior to calling driver.quit(). I think it might be best if driver.quit() handled closing this handler internally.
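For reference, the core of my workaround is just closing the private log file handler before quitting (a minimal sketch; _log_file is an internal attribute, so this may not hold across Selenium versions):

# Assumes `driver` is an existing selenium.webdriver.Firefox instance whose
# binary was configured via Options; _log_file is a private attribute.
driver.binary._log_file.close()  # close the leaking log file handler
driver.quit()                    # then shut down the driver as usual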
How can we reproduce the issue?
# This is a wrapper I created to ensure that the log file handler closes when
# I am finished with the driver. If you remove the line that closes the handler
# and run this driver instance against a site multiple times, you'll observe
# that the handler eats up more and more space. If you don't have a lot of
# memory, you may also observe that each execution gets slower.
import os

from selenium.webdriver import Firefox
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.firefox.service import Service


class DriverManager:
    """Wraps a Selenium Firefox driver instance.

    Attributes:
        download_path (str): the folder that driver downloads will be placed inside
        firefox_exe_path (str): the path to the Firefox executable
        gecko_driver_exe_path (str): the path to the GeckoDriver executable
    """

    def __init__(
        self,
        download_path: str,
        firefox_exe_path: str,
        gecko_driver_exe_path: str,
    ):
        self.download_path = download_path
        # Set up the Firefox webdriver with its log pointed at /dev/null
        service = Service(executable_path=gecko_driver_exe_path, log_path=os.devnull)
        options = Options()
        options.headless = True
        options.binary = firefox_exe_path
        options.set_preference("browser.download.folderList", 2)
        options.set_preference("browser.download.manager.showWhenStarting", False)
        options.set_preference("browser.download.dir", download_path)
        options.set_preference("download.prompt_for_download", False)
        options.set_preference(
            "browser.helperApps.neverAsk.saveToDisk", "application/pdf"
        )
        options.set_preference("pdfjs.disabled", True)
        options.set_capability("marionette", True)
        self.driver = Firefox(options=options, service=service)

    def __enter__(self) -> Firefox:
        return self.driver

    def __exit__(self, exception_type, exception_val, trace):
        # Close the binary's log file handler manually to fix the memory leak,
        # then quit the driver as usual.
        self.driver.binary._log_file.close()
        self.driver.quit()
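For completeness, this is roughly how the wrapper is used in my service (the paths and URL below are placeholders, not my real configuration):

# Hypothetical usage of the wrapper above; paths and URL are placeholders.
with DriverManager(
    download_path="/tmp/downloads",
    firefox_exe_path="/usr/bin/firefox",
    gecko_driver_exe_path="/usr/local/bin/geckodriver",
) as driver:
    driver.get("https://example.com/files")  # scrape/download as needed
# __exit__ closes the log file handler and quits the driver.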
Relevant log output
I wish I'd kept the profiler output, but I don't have it anymore.
Operating System
Debian Buster
Selenium version
4.1.3 (Python bindings)
What are the browser(s) and version(s) where you see this issue?
Firefox 102.0.1
What are the browser driver(s) and version(s) where you see this issue?
GeckoDriver v0.31.0
Are you using Selenium Grid?
No response