Stop reraising global exceptions in greendns (alternative to #811)#682
Stop reraising global exceptions in greendns (alternative to #811)#682jjohnson42 wants to merge 2 commits intoeventlet:masterfrom
Conversation
Having a single Exception and re-raising it was the culprit for a user's memory leak. Evidently each raise could compound including more stack frames, causing memory to be forever tangled in an ever growing Exception each time it was reused. Correct while preserving DRY by using the global to hold the arguments to socket.gaierror, rather than a socket.gaierror instance.
|
This is really interesting – can you provide some details as to how the stack frames are being accumulated? (I'm interested in what object reference chain leads to this) |
|
I didn't quite get the tight understanding, and I failed to produce the same However, guppy shortest path to the huge bunch of stackframes was always attached to that single instance of the socket.gaierror (well, tied between NODATA and NONAME, which is a reference to the same single exception). I am not quite familiar enough to have dug in and more precisely understand the linkages in more detail. The DNS server triggered ENODATA on an IPv6 lookup, and that gets swallowed in getaddrinfo since they were IPv4 based anyway, but with the side-effect of bigger and bigger global EAI_NODATA_ERROR object. I did confirm low memory usage after change, and then reverted the change and the memory consumption shot back up as expected. Sorry I don't quite understand things enough to be more detailed... |
|
@jjohnson42 memory consumption is good enough unit test for me. Check this out https://code.activestate.com/recipes/577504/ (linked in stdlib doc https://docs.python.org/3/library/sys.html#sys.getsizeof ) |
eventlet/support/greendns.py
Outdated
| EAI_EAGAIN_ERRORARGS = (socket.EAI_AGAIN, 'Lookup timed out') | ||
| EAI_NONAME_ERRORARGS = (socket.EAI_NONAME, 'Name or service not known') | ||
| EAI_NONAME_ERROR = socket.gaierror(*EAI_NONAME_ERRORARGS) |
There was a problem hiding this comment.
- (the only major comment here) hide exception args from public API (
_FOO_ARGS) FOO_ERROR_ARGSorFOO_ARGSEAI_EAGAIN_ERRORwas deleted, probably good idea to leave it like NONAME, NODATA errors.
| EAI_EAGAIN_ERRORARGS = (socket.EAI_AGAIN, 'Lookup timed out') | |
| EAI_NONAME_ERRORARGS = (socket.EAI_NONAME, 'Name or service not known') | |
| EAI_NONAME_ERROR = socket.gaierror(*EAI_NONAME_ERRORARGS) | |
| _EAI_EAGAIN_ARGS = (socket.EAI_AGAIN, 'Lookup timed out') | |
| EAI_EAGAIN_ERROR = socket.gaierror(*EAI_EAGAIN_ARGS) | |
| _EAI_NONAME_ARGS = (socket.EAI_NONAME, 'Name or service not known') | |
| EAI_NONAME_ERROR = socket.gaierror(*EAI_NONAME_ARGS) |
There was a problem hiding this comment.
The desired change is in the lattest commit
|
Ok, made suggested changes. The problem I have is less about how to measure it, it's not being able to produce the same behavior. When I try to write a test case, the size of the exception does not seem to grow on re-raise. It clearly grew in their environment, and it clearly came from the exception and Frame objects. |
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## master #682 +/- ##
======================================
Coverage 44% 44%
======================================
Files 87 87
Lines 11917 11917
Branches 1780 1780
======================================
+ Hits 5290 5294 +4
+ Misses 6224 6222 -2
+ Partials 403 401 -2
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
|
We want to merge either this or #811, but not both. |
|
Thank you; this has been addressed by #811. |
I was working on a user who was having a problem with memory consumption over time.
I found that that most of their memory consumption was the EAI_NONAME_ERROR object in greendns. It had accumulated massive memory through accumulating many stack frames each time it was raised.
In the user's case, they had getaddrinfo calls to an IPv4 name with AF_UNSPEC, so it tried INET6 and INET and squashed the error on INET6 (to be expected). However the raise would grow the reused exception object each time.