Running Breeze Consumes 100% Host Memory #29731
-
I just cloned the repository today (latest main).

For some context, I'm not running in a constrained environment. On the contrary, I have 48 cores, 96GB of RAM, and well over a TB of disk space available on my host machine. As soon as any of the three main processes mentioned above starts, memory usage starts climbing rapidly, and then the kernel OOM killer steps in and kills them. No services ever start, and normally all three get killed at around ~60GB of allocated memory by the kernel OOM reaper. Occasionally, two of the processes will get killed and the third (e.g. the webserver) might keep running. In those cases, I've seen the webserver continue running and consuming ~70-80GB of RAM, but it's not really usable since the other components can't start.

I did try to search through issues (open/closed) and discussions but didn't find anything identical to my situation. I found a few memory-leak discussions, but those were over longer time periods and related to K8s or other deployment scenarios. I'm explicitly just trying to get the development environment running.

$ docker --version
Docker version 23.0.1, build a5ee5b1dfc
$ docker-compose --version
Docker Compose version 2.16.0
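For anyone triaging something similar, a couple of standard checks can confirm that it really is the kernel OOM killer doing the killing and show which container balloons. These are generic Linux/docker commands, nothing Breeze-specific:

$ sudo dmesg -T | grep -iE "out of memory|oom-kill"   # kernel log entries left by the OOM killer
$ docker stats                                        # watch per-container memory climb in real time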
Replies: 6 comments 22 replies
-
This is rather strange and not easily reproducible - it likely indicates that your docker installation is broken somehow. On my machine I ran it for 20 minutes without even the least-significant digit changing once memory usage hit 1.523 GB.

What I would suggest is to go step-by-step through the points in https://github.com/apache/airflow/blob/main/BREEZE.rst#troubleshooting and see whether they fix your problem. It might also be worthwhile to check whether running some other docker-based software (with mounted volumes) has the same effect.

A few other things that could cause trouble on your machine are low-level kernel or filesystem problems. It could be, for example, that the filesystem you check Airflow out on is somewhat unfriendly to docker volume mounts, or that your docker engine uses non-standard settings for its backing storage (the docker engine can use various storage backends to store its layers and containers). It might be, for example, that the storage drivers used by docker have a memory leak and cause memory growth while synchronising volumes. Or it might be a kernel issue on your machine - so rather than looking for similar "Airflow" issues, look for similar "docker" issues for your setup.

Also, if you run docker in a VM, on Windows / WSL2, or similar, you might hit various problems specific to those environments (and filesystem issues there are usually one of the culprits of strange behaviour). Making sure that the filesystem where you check out the sources is "native" (i.e. not additionally mounted between the host OS and the VM that runs the docker engine) usually helps a lot.
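A few quick ways to gather the information suggested above (standard docker/OS commands; adapt to your distro):

$ docker info --format 'storage driver: {{.Driver}}, cgroup: {{.CgroupDriver}} v{{.CgroupVersion}}'
$ df -hT .        # filesystem type of the Airflow checkout (run from the source tree)
$ uname -r        # kernel version, useful when searching for docker/kernel issues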
-
I have the same problem when trying to run in docker on Fedora. After some experimentation, I found that using any version starting from 2.3.4 results in what you described, while on Ubuntu 20.04 everything works fine.
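To narrow down what differs between the two distributions, comparing the engine and containerd versions, and the file-descriptor limit a container actually sees, on both machines might help. This is a generic diagnostic, not something from this thread:

$ docker version --format 'engine: {{.Server.Version}}'
$ containerd --version                        # if containerd is on your PATH
$ docker run --rm alpine sh -c 'ulimit -n'    # nofile limit inside a container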
-
I don't see a way to mark one of the comments as the "answer" here, so I'll re-post this for clarity.

The underlying problem is that containerd made a change which sets the nofile limit to infinity on the service. This trickles down to all the child containers. Some services, such as MySQL and apparently something within Airflow, attempt to pre-allocate resources based on the number of potentially open files. Therefore, when the limit is set to infinity, they allocate huge amounts of memory on startup in an attempt to pre-optimize.

All of this together means that if you have a relatively new version of containerd, then certain container images will cause memory exhaustion as described above.

In moby/moby#43361 (comment), a user claimed that using Docker Desktop for Linux does not have this problem. This is likely due to it using an older version of containerd.

Alternatively, @Blizzke mentioned that he could get around this by lowering the limit manually before the services start.

My personal preferred solution: a suitable workaround is just to manually override the limit set in the systemd unit. Systemd allows you to provide an override file which, as the name implies, overrides select properties in a unit without having to modify the source unit (see Example 2 in the systemd unit man page). This has the benefit of not requiring modifications to any container images and not uninstalling or reinstalling any software, and it should not be affected by future package updates. To create an override file, use systemctl edit on the affected service unit and add:

[Service]
LimitNOFILE=1048576

This will create an override file (override.conf in a drop-in directory for that unit) containing only the properties above. To remove this modification in the future (when the problem is hopefully fixed upstream), you can simply remove the override file.
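Roughly, the workflow looks like this. The unit name is an assumption on my part (the LimitNOFILE=infinity setting discussed here comes from containerd's unit, but double-check which unit applies on your system); the rest is standard systemd/docker usage:

$ sudo systemctl edit containerd.service                      # opens an editor; paste the [Service] snippet above
$ sudo systemctl daemon-reload
$ sudo systemctl restart containerd.service docker.service
$ docker run --rm alpine sh -c 'ulimit -n'                    # should now print 1048576 instead of a huge/unlimited value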
-
Looks like we know the dependency that causes it now - it's python-daemon - so maybe we will be able to do something about it (looking at it now). See #29841 (comment) and the issue opened at https://pagure.io/python-daemon/issue/72 (upvoting it might increase our chances of getting it solved upstream in python-daemon).
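For anyone who wants to see the connection on their own machine: the symptom matches a process seeing an effectively unlimited nofile limit and then pre-allocating for that many descriptors, as described in the answer above. A quick, generic check of the limit Python sees in a container (the image here is illustrative, not from this thread):

$ docker run --rm python:3.8 python -c "import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE))"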
-
@calebstewart @Blizzke @j2cry @rodrigorochag I'd really love it if you could apply the two commits / patch Airflow from #29848 to see whether that fixes the problem regardless of the workaround.
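In case it helps anyone testing this, one common way to try out a PR locally is to fetch its head ref from GitHub and run Breeze on top of it (a generic git workflow, not instructions from this thread; the local branch name is arbitrary):

$ git fetch https://github.com/apache/airflow.git pull/29848/head:test-pr-29848
$ git checkout test-pr-29848
$ breeze start-airflow        # or however you normally reproduce the problem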
-
python-daemon==3.0.0 has been released with the fix - you can upgrade it and it should fix the problem. I am adding >= 3.0.0 for the 2.5.2 release as well.
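If you want the fix in an existing environment without waiting for the pinned release, upgrading the package directly should work (standard pip usage; run it in whichever environment or container actually runs Airflow):

$ pip install --upgrade "python-daemon>=3.0.0"
$ pip show python-daemon | grep -i version     # confirm the installed version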