Commit 3ab3b13
fix(rpc): Read RPC responses as binary data in Python (semgrep/semgrep-proprietary#5117)
Change the Python RPC implementation to read from the sub-process's
output stream in bytes rather than Unicode characters. All of the IO is
now measured in bytes, with explicit encoding/decoding steps to convert
to text.
The RPC format consists of a length in bytes followed by that many bytes
of UTF-8-encoded text. However, the current Python implementation reads
data from the process *as text* (`text=True` when starting the process),
so `io.read(n)` counts in Unicode characters rather than bytes. When the
RPC output includes non-ASCII characters, the number of bytes written in
the message header is larger than the number of Unicode characters in
the stream.
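
The mismatch can be illustrated with a small, self-contained sketch. The payload string and variable names below are hypothetical, and in-memory streams stand in for the sub-process pipe; this is not the code from the commit.

```python
import io

# Hypothetical RPC payload containing non-ASCII characters.
payload = '{"msg": "café ✓"}'
encoded = payload.encode("utf-8")

# The header carries the length in bytes, which exceeds the character count
# as soon as any character needs more than one byte in UTF-8.
n_chars = len(payload)
n_bytes = len(encoded)

# Text mode: read(n) counts characters, so asking for n_bytes characters
# requests more than the message contains. StringIO returns early at EOF;
# a live pipe that stays open would block instead.
text_stream = io.StringIO(payload)
text_read = text_stream.read(n_bytes)

# Binary mode: read(n) counts bytes, matching the header exactly.
byte_stream = io.BytesIO(encoded)
bytes_read = byte_stream.read(n_bytes)
decoded = bytes_read.decode("utf-8")
```

Here `n_bytes` is larger than `n_chars`, so only the binary read consumes exactly one message.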
This has not been a problem so far because we only run a single RPC call
per process. After the RPC call we close the stream and send an EOF, so
`io.read(n)` returns the whole string at EOF even if it contains fewer
than `n` characters.
However, this caused a problem when I implemented running multiple RPC
calls through a single long-lived process because `io.read(n)` would
block indefinitely if the stream did not contain at least `n`
characters. This change fixes that problem.
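
A minimal sketch of byte-based framing over a long-lived stream follows. It assumes the header is the payload length as ASCII decimal digits terminated by a newline; the commit message does not spell out the header layout, so that detail and the helper names `write_message` and `read_message` are hypothetical.

```python
import io
from typing import BinaryIO, Optional


def write_message(stream: BinaryIO, text: str) -> None:
    """Write one message: byte length, newline, then UTF-8 payload."""
    data = text.encode("utf-8")
    # Assumed header format: decimal byte count followed by "\n".
    stream.write(f"{len(data)}\n".encode("ascii"))
    stream.write(data)


def read_message(stream: BinaryIO) -> Optional[str]:
    """Read one message, or return None if the stream is at EOF."""
    header = stream.readline()
    if not header:
        return None
    size = int(header.decode("ascii").strip())
    # On a binary stream, read(size) counts bytes, matching the header.
    data = stream.read(size)
    if len(data) != size:
        raise EOFError(f"expected {size} bytes, got {len(data)}")
    # Decode explicitly only after the full payload has been read.
    return data.decode("utf-8")


# Two messages sent back to back through the same stream, as a long-lived
# process would do. BytesIO stands in for the sub-process pipe.
buf = io.BytesIO()
write_message(buf, "héllo")
write_message(buf, "ok")
buf.seek(0)
first = read_message(buf)
second = read_message(buf)
```

Because each read consumes exactly the number of bytes announced in the header, the second message starts at the right offset even when the first contains multi-byte characters.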
Test plan: ran existing tests + reproduced the problem and fix on top of
#5066.
synced from Pro d507ac7668dcccb43c12dc732a615866d53dc12b
1 parent e60b2d5 commit 3ab3b13
1 file changed, 9 additions and 10 deletions.