osrm-routed connection accept loop can exit and not recover #6040

@mjjbell

As discussed in #6033

osrm-routed does not immediately clean up a keep-alive connection when the client closes it. Instead, it waits for five seconds of inactivity before removing the connection.

If you have a client that opens and closes a lot of keep-alive connections, it's possible for osrm-routed to run out of file descriptors whilst it waits for the clean-up to trigger.

The key point is that if this happens, the connection accept loop exits. Even after the old connections are cleaned up and file descriptors are available again, no new connections will be accepted. Any new requests will block until the server is restarted. See: https://github.com/Project-OSRM/osrm-backend/blob/master/include/server/server.hpp#L96
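To illustrate the failure mode, here is a toy model of a callback-driven accept loop. This is not OSRM's code: `ToyAcceptor`, `handle_accept`, and `rearm_on_error` are hypothetical names, and accept results are simulated as plain errno values (0 meaning success). The sketch assumes the loop only schedules the next accept from inside the handler, so a single error with no re-arm ends the loop for good.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <vector>

// Toy model of a callback-driven accept loop (hypothetical, not OSRM's code).
struct ToyAcceptor
{
    bool rearm_on_error;      // false models the behaviour described in this issue
    std::size_t accepted = 0; // number of connections successfully accepted
    bool armed = true;        // whether another accept is currently pending

    explicit ToyAcceptor(bool rearm) : rearm_on_error(rearm) {}

    // Called once per completed accept attempt; `error` is an errno value.
    void handle_accept(int error)
    {
        if (error == 0)
        {
            ++accepted;
            armed = true; // success: schedule the next accept
        }
        else
        {
            // If we do not re-arm here, no further accept is ever scheduled.
            armed = rearm_on_error;
        }
    }

    // Feed a sequence of simulated accept results through the loop.
    std::size_t run(const std::vector<int> &results)
    {
        for (int error : results)
        {
            if (!armed)
                break; // the loop has exited; later clients are never accepted
            handle_accept(error);
        }
        return accepted;
    }
};
```

With the simulated sequence `{0, 0, EMFILE, 0, 0}`, `run` returns 2 when `rearm_on_error` is false (the two clients after the `EMFILE` are never served) and 4 when it is true.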

It's possible that other accept errors will also trigger this behaviour.

Ideas for improving the error handling:

  • Close connections as soon as they receive an error (e.g. when the client closes them).
  • Attempt to restart the connection accept loop on error.
  • Don't try to handle these errors, and ensure osrm-routed exits cleanly.
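As a rough sketch of how the second and third ideas could be combined, the accept handler could classify each error and either keep accepting, retry after a short delay, or shut down. Everything here is an assumption for illustration: `AcceptAction`, `classify_accept_error`, and `retry_delay` are hypothetical names, the choice of which errno values count as recoverable is my own guess, and the 10 ms to 1 s backoff is arbitrary. On POSIX systems I'd expect the value of the error code passed to the accept handler to match these errno constants, but that would need checking.

```cpp
#include <cassert>
#include <cerrno>
#include <chrono>

// What the accept loop should do after an accept attempt completes.
enum class AcceptAction
{
    Continue,        // re-arm the accept immediately
    RetryAfterDelay, // resource exhaustion: wait, then re-arm
    Shutdown         // unexpected error: exit cleanly instead of hanging
};

// Hypothetical policy mapping errno values to an action (assumed, not OSRM's).
inline AcceptAction classify_accept_error(int error)
{
    switch (error)
    {
    case 0:            // success
    case ECONNABORTED: // client went away before we accepted it
    case EINTR:        // interrupted by a signal
        return AcceptAction::Continue;
    case EMFILE:  // per-process file descriptor limit reached
    case ENFILE:  // system-wide file descriptor limit reached
    case ENOBUFS: // out of socket buffer space
    case ENOMEM:  // out of memory
        return AcceptAction::RetryAfterDelay;
    default:
        return AcceptAction::Shutdown;
    }
}

// Exponential backoff: 10 ms doubled per consecutive failure, capped at 1 s.
inline std::chrono::milliseconds retry_delay(unsigned consecutive_failures)
{
    const unsigned shift = consecutive_failures < 7 ? consecutive_failures : 7;
    const long long ms = 10LL << shift;
    return std::chrono::milliseconds(ms < 1000 ? ms : 1000);
}
```

The delay matters for the file descriptor case: retrying immediately would just spin until the idle keep-alive connections time out and free their descriptors.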
