-
Notifications
You must be signed in to change notification settings - Fork 166
Bump Windows max open files from 512 to 2048 #620
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
src/llama.cpp
Outdated
| } | ||
|
|
||
| #ifdef _WIN32 | ||
| int _setmaxstdio_ret = _setmaxstdio(2048); // 8,192 may be supported - https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/setmaxstdio?view=msvc-160 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't you want to make this dependent on the value of GGML_MAX_CONTEXTS instead of it being simply set to 2048?
I don't know much about Windows, but if I understand correctly the description of the _setmaxstdio function, it changes the max. number of files that can be open at the same time at the stream I/O level. The default for this is 512. The Microsoft engineers must have had a reason to keep it at 512 instead of just setting it to the 8192 limit of the low I/O level. If they did have a reason, then my thinking is that it would be wise to not increase the stream I/O limit unless necessary. It only becomes necessary if we want to use more than 512 shards, which is only possible if we have changed the value of GGML_MAX_CONTEXTS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, and this is what I was saying here: #611 (comment)
The default for this is 512. The Microsoft engineers must have had a reason to keep it at 512 instead of just setting it to the 8192 limit of the low I/O level.
Since this came up, I've looked into it, best reason I found was this (from a time when the true maximum was 2048):
I believe the limit has to do with the ability to inherit the open files from a CreateProcess call. The CreateProcess has only 2048 slots for passing handles (both on 32-bit and 64-bit). You can debug a program and step into the system, exec, or spawn CRT functions to see the limit of the 2048 slots.
If you use the Win32 file API (CreateFile, WriteFile, ReadFile, CloseHandle, etc.), then you don't have a limit on open files (well, you do but I believe it is based on your resources like memory).
alongside this corroborating piece from https://bugs.mysql.com/bug.php?id=24509 (they also mention Win32 on that page):
It's a hard windows limit due to the fact of using posix-like
functions in some places. I will open 2nd bug report about a
handle leak when that 2048 limit is hit.
If 2048/8192+ is wanted Win32 API might be needed (not sure how big a change that would be).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are sure that limitations in CreateProcess implementation is the only reason, then it wouldn't be an issue as llama.cpp is not actually spawning new processes. A file handle leak each time one starts a llama.cpp process is not too bad either: one simply needs to reboot their Windows box from time to time just like in the old days. Just joking. If there is indeed a file handle leak, then it is even more important to make the increase conditional upon GGML_MAX_CONTEXTS > 512.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change made. Please let me know if this is now acceptable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there is indeed a file handle leak, then it is even more important to make the increase conditional upon
GGML_MAX_CONTEXTS > 512.
I wouldn't take the "leak" part seriously as it is from "10 Dec 2006", just included that because it mentioned the handles. Win32 should only be needed if models large enough (much more than deepseek) and people have 2048 limits (instead of 8192).
Allows up to 2048 shards to be loaded on Windows builds, from the current default of 512. This change is specific to Windows, it instructs the Windows OS that the binary requires 2048 of max opened files. This is the equivalent to Linux's
ulimit -n.