Describe the bug
This bug happens when the client machine's clock is significantly ahead of the server machine's clock.
Originally reported by @KSchmidtACR here: #1815
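When the server sends a request, it sets "MessageHeaderKey.WAIT_UNTIL" to its own current time.time() + wait_time.
When the client receives the request, it compares that deadline against its own clock. Because the client clock is ahead of the server clock, the client concludes that the server has already timed out and does not send a response.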
https://github.com/NVIDIA/NVFlare/blob/dev/nvflare/fuel/f3/cellnet/cell.py#L1828-L1835
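To make the failure mode concrete, here is a minimal sketch of the check described above. It is not the NVFlare code. The helper names `make_request_headers` and `reply_is_too_late` are hypothetical, and the plain header key stands in for `MessageHeaderKey.WAIT_UNTIL`. The sketch only assumes what the report states: the deadline is stamped with the server's clock and compared against the client's clock.

```python
import time

# Hypothetical stand-in for MessageHeaderKey.WAIT_UNTIL (illustration only).
WAIT_UNTIL = "cn__wait_until"


def make_request_headers(server_now: float, wait_time: float) -> dict:
    """Server side (hypothetical): stamp an absolute deadline using the server's clock."""
    return {WAIT_UNTIL: server_now + wait_time}


def reply_is_too_late(headers: dict, client_now: float) -> bool:
    """Client side (hypothetical): compare the server's deadline with the client's clock."""
    return client_now > headers[WAIT_UNTIL]


server_now = time.time()
wait_time = 15.0
headers = make_request_headers(server_now, wait_time)

# Clocks in sync: the request is handled 2 seconds after it was sent.
in_sync = reply_is_too_late(headers, client_now=server_now + 2.0)

# Client clock one hour ahead: the same 2-second delay looks like a missed deadline.
skew = 3600.0
skewed = reply_is_too_late(headers, client_now=server_now + skew + 2.0)

print(in_sync, skewed)  # False True
```

Any skew larger than wait_time is enough to make every reply look late, whatever the actual processing delay.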
To Reproduce
Steps to reproduce the behavior:
- Start a server on machine 1
- Start a client on machine 2, whose clock is ahead of machine 1's (say, by an hour)
- Submit a job and run it; error logs like the following appear:
server:
2023-06-20 19:48:15,089 - Cell - DEBUG - server: set up waiter 73c1427d-687c-447f-befa-3f275537da62 to wait for 15 secs
client:
2023-06-20 20:12:18,877 - Cell - DEBUG - site-1: received message: {'cn__topic': 'admin', 'cn__channel': 'admin', 'cn__destination': 'site-1', 'cn__req_id': '73c1427d-687c-447f-befa-3f275537da62', 'cn__reply_expected': True, 'cn__wait_until': 1687290510.0878868, 'cn__optional': False, 'cn__origin': 'server', 'cn__from': 'server', 'cn__msg_type': 'req', 'cn__route': [['server', 1687290495.0881503]], 'cn__to': 'site-1', 'cn__payload_encoding': 'fobs', 'cn__send_time': 1687290495.0882032}
2023-06-20 20:12:18,878 - Cell - DEBUG - site-1: processing incoming request
2023-06-20 20:12:18,878 - Cell - DEBUG - site-1: calling registered request CB
2023-06-20 20:12:18,878 - Cell - DEBUG - site-1: calling CB _dispatch_request
2023-06-20 20:12:18,878 - GPUResourceManager - DEBUG - [identity=41c6e867-7691-492c-bf00-ab4ef0f99968, run=?]: reserving resources: {} for requirements {}.
2023-06-20 20:12:18,879 - GPUResourceManager - DEBUG - [identity=41c6e867-7691-492c-bf00-ab4ef0f99968, run=?]: current resources: {}, reserved_resources {'492d5744-acf3-4287-a076-62bd0984cdf0': ({}, 5), '489bb50e-54c8-4cd2-838e-0104de25d312': ({}, 30)}.
2023-06-20 20:12:18,879 - Cell - DEBUG - site-1: don't send response - reply is too late
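The header values in the client log can be checked directly. This sketch assumes that both machines log in UTC. The server's send_time does decode to 19:48:15 UTC, which matches the server log line, but the client's timezone is not stated. Under that assumption, the intended wait window is about 15 seconds, while the client's clock places receipt about 24 minutes after the send.

```python
from datetime import datetime, timezone

# Header values copied from the client log above (both stamped with the server's clock).
send_time = 1687290495.0882032
wait_until = 1687290510.0878868

# Assumption: the client log timestamp (2023-06-20 20:12:18,877) is also UTC.
client_received = datetime(
    2023, 6, 20, 20, 12, 18, 877000, tzinfo=timezone.utc
).timestamp()

server_sent = datetime.fromtimestamp(send_time, tz=timezone.utc)
wait_window = wait_until - send_time          # deadline the server intended
apparent_delay = client_received - send_time  # mostly clock skew, not transit time

print(server_sent.strftime("%H:%M:%S"))  # 19:48:15
print(round(wait_window))                 # 15
print(round(apparent_delay))              # 1444 (about 24 minutes)
print(client_received > wait_until)       # True, hence "reply is too late"
```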
Expected behavior
The job should run normally even when the client and server clocks differ.