osrm-routed connection accept loop can exit and not recover #6040
Description
As discussed in #6033
osrm-routed does not immediately clean up a keep-alive connection when the client closes it. Instead, it waits for five seconds of inactivity before removing it.
If you have a client that opens and closes a lot of keep-alive connections, it's possible for osrm-routed to run out of file descriptors whilst it waits for the clean-up to trigger.
The key point is that if this happens, the connection accept loop exits. Even after the old connections are cleaned up, no new connections are accepted. Any new requests will block until the server is restarted. See: https://github.com/Project-OSRM/osrm-backend/blob/master/include/server/server.hpp#L96
It's possible that there are other errors which will also generate this behaviour.
Ideas for improving the error handling:
- Close connections as soon as they receive an error (e.g. when the client closes them).
- Attempt to restart the connection accept loop on error.
- Don't try to handle these errors, and ensure osrm-routed exits cleanly.
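
To make the second and third ideas concrete, here is a minimal sketch of an accept loop that keeps running through transient errors and stops only on errors it cannot recover from. It is not the osrm-routed implementation: it uses plain POSIX errno values and an injected callable rather than the server's real acceptor, and every name in it (`AcceptAction`, `classify_accept_error`, `run_accept_loop`, `LoopResult`) is hypothetical. Which errors count as transient is also an assumption.

```cpp
#include <cassert>
#include <cerrno>
#include <chrono>
#include <functional>
#include <thread>

// All names below are hypothetical; none of them come from osrm-backend.
enum class AcceptAction { Continue, RetryAfterDelay, Stop };

// Decide how the loop should react to the errno value of one accept attempt.
// The grouping of errors is an assumption made for this sketch.
AcceptAction classify_accept_error(int err)
{
    switch (err)
    {
    case 0:            // a connection was accepted
    case EINTR:        // interrupted by a signal
    case ECONNABORTED: // client went away before the accept completed
        return AcceptAction::Continue;
    case EMFILE:  // per-process file descriptor limit reached
    case ENFILE:  // system-wide file descriptor limit reached
    case ENOBUFS: // out of buffer space
    case ENOMEM:  // out of memory
        return AcceptAction::RetryAfterDelay;
    default:
        return AcceptAction::Stop;
    }
}

struct LoopResult
{
    int accepted = 0;                    // number of successful accepts
    bool stopped_on_fatal_error = false; // true if the loop gave up early
};

// Run up to max_attempts accept attempts. try_accept returns 0 on success
// or an errno value on failure. Resource exhaustion triggers a pause and a
// retry instead of ending the loop.
LoopResult run_accept_loop(const std::function<int()> &try_accept,
                           int max_attempts,
                           std::chrono::milliseconds backoff)
{
    assert(max_attempts >= 0);
    LoopResult result;
    for (int attempt = 0; attempt < max_attempts; ++attempt)
    {
        const int err = try_accept();
        if (err == 0)
        {
            ++result.accepted;
            continue;
        }
        switch (classify_accept_error(err))
        {
        case AcceptAction::Continue:
            break;
        case AcceptAction::RetryAfterDelay:
            // Give idle connections time to be cleaned up, then try again.
            std::this_thread::sleep_for(backoff);
            break;
        case AcceptAction::Stop:
            result.stopped_on_fatal_error = true;
            return result;
        }
    }
    return result;
}
```

With this shape, running out of file descriptors delays new connections but does not permanently stop the server from accepting them. When the loop does stop, the caller can see that from `stopped_on_fatal_error` and exit the process cleanly rather than leaving requests blocked.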