Skip to content

Don't assume that c-ares won't retry failed writes in grpc_core::GrpcPolledFdWindows::SendVUDP#25726

Merged
apolcyn merged 1 commit intogrpc:masterfrom
apolcyn:fix_sporadic_windows_crash
Mar 16, 2021
Merged

Don't assume that c-ares won't retry failed writes in grpc_core::GrpcPolledFdWindows::SendVUDP#25726
apolcyn merged 1 commit intogrpc:masterfrom
apolcyn:fix_sporadic_windows_crash

Conversation

@apolcyn
Copy link
Copy Markdown
Contributor

@apolcyn apolcyn commented Mar 16, 2021

Per conversation in #22555

cc @okravets

Looking at the trace provided in #22555 (comment), they interesting part is what happens to c-ares socket: 868. Filtering for that socket in the logs:

D0315 12:36:45.918000000  5828 grpc_ares_ev_driver_windows.cc:811] (c-ares resolver) fd:|c-ares socket: 868| created with params af:2 type:2 protocol:0
D0315 12:36:45.919000000  5828 grpc_ares_ev_driver_windows.cc:513] (c-ares resolver) fd:c-ares socket: 868 ConnectUDP
D0315 12:36:45.922000000  5828 grpc_ares_ev_driver_windows.cc:524] (c-ares resolver) fd:c-ares socket: 868 WSAConnect error code:|0| msg:|The operation completed successfully.
D0315 12:36:45.923000000  5828 grpc_ares_ev_driver_windows.cc:353] (c-ares resolver) fd:|c-ares socket: 868| SendV called connect_done_:1 wsa_connect_error_:0
D0315 12:36:45.924000000  5828 grpc_ares_ev_driver_windows.cc:377] (c-ares resolver) fd:|c-ares socket: 868| SendVUDP called
D0315 12:36:45.925000000  5828 grpc_ares_ev_driver_windows.cc:345] (c-ares resolver) fd:|c-ares socket: 868| SendWriteBuf WSASend buf.len:46 *bytes_sent_ptr:46 overlapped:0000000000000000 return:0 *wsa_error_code:0
D0315 12:36:45.929000000  5828 grpc_ares_ev_driver_windows.cc:353] (c-ares resolver) fd:|c-ares socket: 868| SendV called connect_done_:1 wsa_connect_error_:0
D0315 12:36:45.930000000  5828 grpc_ares_ev_driver_windows.cc:377] (c-ares resolver) fd:|c-ares socket: 868| SendVUDP called
D0315 12:36:45.931000000  5828 grpc_ares_ev_driver_windows.cc:345] (c-ares resolver) fd:|c-ares socket: 868| SendWriteBuf WSASend buf.len:46 *bytes_sent_ptr:46 overlapped:0000000000000000 return:0 *wsa_error_code:0
D0315 12:36:45.932000000  5828 grpc_ares_ev_driver.cc:392] (c-ares resolver) request:00000231EA8E7280 new fd: c-ares socket: 868
D0315 12:36:45.934000000  5828 grpc_ares_ev_driver.cc:407] (c-ares resolver) request:00000231EA8E7280 notify read on: c-ares socket: 868
D0315 12:36:45.942000000  5828 grpc_ares_ev_driver_windows.cc:168] (c-ares resolver) fd:|c-ares socket: 868| InnerContinueRegisterForOnReadableLocked wsa_connect_error_:0
D0315 12:36:45.944000000  5828 grpc_ares_ev_driver_windows.cc:189] (c-ares resolver) fd:|c-ares socket: 868| RegisterForOnReadableLocked WSARecvFrom error code:|997| msg:|Overlapped I/O operation is in progress.
D0315 12:36:46.879000000  4804 grpc_ares_ev_driver.cc:284] (c-ares resolver) request:00000231EA8E7280 ev_driver=00000231E892C070 on_ares_backup_poll_alarm_locked; ares_process_fd. fd=c-ares socket: 868
D0315 12:36:46.881000000  4804 grpc_ares_ev_driver_windows.cc:291] (c-ares resolver) fd:|c-ares socket: 868| RecvFrom called read_buf_has_data:0 Current read buf length:|4192|
D0315 12:36:47.880000000  1292 grpc_ares_ev_driver.cc:284] (c-ares resolver) request:00000231EA8E7280 ev_driver=00000231E892C070 on_ares_backup_poll_alarm_locked; ares_process_fd. fd=c-ares socket: 868
D0315 12:36:47.882000000  1292 grpc_ares_ev_driver_windows.cc:291] (c-ares resolver) fd:|c-ares socket: 868| RecvFrom called read_buf_has_data:0 Current read buf length:|4192|
D0315 12:36:48.878000000  4804 grpc_ares_ev_driver.cc:284] (c-ares resolver) request:00000231EA8E7280 ev_driver=00000231E892C070 on_ares_backup_poll_alarm_locked; ares_process_fd. fd=c-ares socket: 868
D0315 12:36:48.879000000  4804 grpc_ares_ev_driver_windows.cc:291] (c-ares resolver) fd:|c-ares socket: 868| RecvFrom called read_buf_has_data:0 Current read buf length:|4192|
D0315 12:36:49.878000000  1292 grpc_ares_ev_driver.cc:284] (c-ares resolver) request:00000231EA8E7280 ev_driver=00000231E892C070 on_ares_backup_poll_alarm_locked; ares_process_fd. fd=c-ares socket: 868
D0315 12:36:49.880000000  1292 grpc_ares_ev_driver_windows.cc:291] (c-ares resolver) fd:|c-ares socket: 868| RecvFrom called read_buf_has_data:0 Current read buf length:|4192|
D0315 12:36:50.867000000  4804 grpc_ares_ev_driver_windows.cc:651] (c-ares resolver) fd:|c-ares socket: 868| OnIocpReadable finishing. read buf length now:|96|
D0315 12:36:50.869000000  4804 grpc_ares_ev_driver.cc:313] (c-ares resolver) request:00000231EA8E7280 readable on c-ares socket: 868
D0315 12:36:50.870000000  4804 grpc_ares_ev_driver_windows.cc:291] (c-ares resolver) fd:|c-ares socket: 868| RecvFrom called read_buf_has_data:1 Current read buf length:|96|
D0315 12:36:50.872000000  4804 grpc_ares_ev_driver_windows.cc:353] (c-ares resolver) fd:|c-ares socket: 868| SendV called connect_done_:1 wsa_connect_error_:0
D0315 12:36:50.873000000  4804 grpc_ares_ev_driver_windows.cc:377] (c-ares resolver) fd:|c-ares socket: 868| SendVUDP called
D0315 12:36:50.875000000  4804 grpc_ares_ev_driver_windows.cc:345] (c-ares resolver) fd:|c-ares socket: 868| SendWriteBuf WSASend buf.len:46 *bytes_sent_ptr:0 overlapped:0000000000000000 return:-1 *wsa_error_code:10022
D0315 12:36:50.876000000  4804 grpc_ares_ev_driver_windows.cc:388] (c-ares resolver) fd:|c-ares socket: 868| SendVUDP SendWriteBuf error code:10022 msg:|An invalid argument was supplied.
D0315 12:36:50.897000000  4804 grpc_ares_ev_driver_windows.cc:291] (c-ares resolver) fd:|c-ares socket: 868| RecvFrom called read_buf_has_data:0 Current read buf length:|0|
D0315 12:36:50.900000000  4804 grpc_ares_ev_driver.cc:407] (c-ares resolver) request:00000231EA8E7280 notify read on: c-ares socket: 868
D0315 12:36:50.920000000  4804 grpc_ares_ev_driver.cc:284] (c-ares resolver) request:00000231EA8E7280 ev_driver=00000231E892C070 on_ares_backup_poll_alarm_locked; ares_process_fd. fd=c-ares socket: 868
D0315 12:36:50.922000000  4804 grpc_ares_ev_driver_windows.cc:291] (c-ares resolver) fd:|c-ares socket: 868| RecvFrom called read_buf_has_data:0 Current read buf length:|4192|
D0315 12:36:50.927000000  4804 grpc_ares_ev_driver_windows.cc:168] (c-ares resolver) fd:|c-ares socket: 868| InnerContinueRegisterForOnReadableLocked wsa_connect_error_:0
D0315 12:36:51.931000000  1292 grpc_ares_ev_driver.cc:284] (c-ares resolver) request:00000231EA8E7280 ev_driver=00000231E892C070 on_ares_backup_poll_alarm_locked; ares_process_fd. fd=c-ares socket: 868
D0315 12:36:51.933000000  1292 grpc_ares_ev_driver_windows.cc:291] (c-ares resolver) fd:|c-ares socket: 868| RecvFrom called read_buf_has_data:0 Current read buf length:|4192|
D0315 12:36:52.941000000 22148 grpc_ares_ev_driver.cc:284] (c-ares resolver) request:00000231EA8E7280 ev_driver=00000231E892C070 on_ares_backup_poll_alarm_locked; ares_process_fd. fd=c-ares socket: 868
D0315 12:36:52.942000000 22148 grpc_ares_ev_driver_windows.cc:291] (c-ares resolver) fd:|c-ares socket: 868| RecvFrom called read_buf_has_data:0 Current read buf length:|4192|
D0315 12:36:53.941000000  4804 grpc_ares_ev_driver.cc:284] (c-ares resolver) request:00000231EA8E7280 ev_driver=00000231E892C070 on_ares_backup_poll_alarm_locked; ares_process_fd. fd=c-ares socket: 868
D0315 12:36:53.943000000  4804 grpc_ares_ev_driver_windows.cc:291] (c-ares resolver) fd:|c-ares socket: 868| RecvFrom called read_buf_has_data:0 Current read buf length:|4192|
D0315 12:36:54.945000000  1292 grpc_ares_ev_driver.cc:284] (c-ares resolver) request:00000231EA8E7280 ev_driver=00000231E892C070 on_ares_backup_poll_alarm_locked; ares_process_fd. fd=c-ares socket: 868
D0315 12:36:54.947000000  1292 grpc_ares_ev_driver_windows.cc:291] (c-ares resolver) fd:|c-ares socket: 868| RecvFrom called read_buf_has_data:0 Current read buf length:|4192|
D0315 12:36:55.924000000 22148 grpc_ares_ev_driver_windows.cc:651] (c-ares resolver) fd:|c-ares socket: 868| OnIocpReadable finishing. read buf length now:|96|
D0315 12:36:55.925000000 22148 grpc_ares_ev_driver.cc:313] (c-ares resolver) request:00000231EA8E7280 readable on c-ares socket: 868
D0315 12:36:55.926000000 22148 grpc_ares_ev_driver_windows.cc:291] (c-ares resolver) fd:|c-ares socket: 868| RecvFrom called read_buf_has_data:1 Current read buf length:|96|
D0315 12:36:55.928000000 22148 grpc_ares_ev_driver_windows.cc:353] (c-ares resolver) fd:|c-ares socket: 868| SendV called connect_done_:1 wsa_connect_error_:0
D0315 12:36:55.929000000 22148 grpc_ares_ev_driver_windows.cc:377] (c-ares resolver) fd:|c-ares socket: 868| SendVUDP called
E0315 12:36:55.930000000 22148 grpc_ares_ev_driver_windows.cc:378] assertion failed: GRPC_SLICE_LENGTH(write_buf_) == 0

The main thing to note is that the c-ares library attempted a write on this UDP socket and got an "Invalid argument" error on that attempt (which can apparently happen on network issues). Then, later, the c-ares library attempted to write on the same socket, again.

The c-ares windows UDP socket handler currently assumes that if a UDP socket returns an error, that the c-ares library won't try to re-send data on that socket. However, looking closely at this codepath, c-ares won't necessarily do that. Thus, we are vulnerable to hitting the assertion error in here.

So the patch listed in #22555 (comment) looks like the right fix.

@apolcyn apolcyn added lang/core release notes: yes Indicates if PR needs to be in release notes labels Mar 16, 2021
@apolcyn apolcyn changed the title Attempt to fix https://github.com/grpc/grpc/issues/22555 Don't assume that no previous writes failed in grpc_core::GrpcPolledFdWindows::SendVUDP Mar 16, 2021
@apolcyn apolcyn changed the title Don't assume that no previous writes failed in grpc_core::GrpcPolledFdWindows::SendVUDP Don't assume that we won't retry failed writes in grpc_core::GrpcPolledFdWindows::SendVUDP Mar 16, 2021
@apolcyn apolcyn changed the title Don't assume that we won't retry failed writes in grpc_core::GrpcPolledFdWindows::SendVUDP Don't assume that c-ares won't retry failed writes in grpc_core::GrpcPolledFdWindows::SendVUDP Mar 16, 2021
@apolcyn apolcyn marked this pull request as ready for review March 16, 2021 19:21
@apolcyn apolcyn requested a review from markdroth as a code owner March 16, 2021 19:21
@apolcyn
Copy link
Copy Markdown
Contributor Author

apolcyn commented Mar 16, 2021

basic tests objc ios: #21858
interop cloud to cloud and cloud to prod: #25737
ios binary size diff: #25431

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

lang/core release notes: yes Indicates if PR needs to be in release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants