Skip to content

Stop reraising global exceptions in greendns (alternative to #811)#682

Closed
jjohnson42 wants to merge 2 commits intoeventlet:masterfrom
jjohnson42:master
Closed

Stop reraising global exceptions in greendns (alternative to #811)#682
jjohnson42 wants to merge 2 commits intoeventlet:masterfrom
jjohnson42:master

Conversation

@jjohnson42
Copy link
Copy Markdown

I was working on a user who was having a problem with memory consumption over time.

I found that that most of their memory consumption was the EAI_NONAME_ERROR object in greendns. It had accumulated massive memory through accumulating many stack frames each time it was raised.

In the user's case, they had getaddrinfo calls to an IPv4 name with AF_UNSPEC, so it tried INET6 and INET and squashed the error on INET6 (to be expected). However the raise would grow the reused exception object each time.

Having a single Exception and re-raising it was the culprit for
a user's memory leak.

Evidently each raise could compound including more stack frames, causing
memory to be forever tangled in an ever growing Exception each time it
was reused.

Correct while preserving DRY by using the global to hold the arguments
to socket.gaierror, rather than a socket.gaierror instance.
@jstasiak
Copy link
Copy Markdown
Contributor

This is really interesting – can you provide some details as to how the stack frames are being accumulated? (I'm interested in what object reference chain leads to this)

@jjohnson42
Copy link
Copy Markdown
Author

I didn't quite get the tight understanding, and I failed to produce the same
behavior in my own copy of their environment and also failed to produce a unit test to demonstrate the issue.

However, guppy shortest path to the huge bunch of stackframes was always attached to that single instance of the socket.gaierror (well, tied between NODATA and NONAME, which is a reference to the same single exception).

I am not quite familiar enough to have dug in and more precisely understand the linkages in more detail. The DNS server triggered ENODATA on an IPv6 lookup, and that gets swallowed in getaddrinfo since they were IPv4 based anyway, but with the side-effect of bigger and bigger global EAI_NODATA_ERROR object. I did confirm low memory usage after change, and then reverted the change and the memory consumption shot back up as expected.

Sorry I don't quite understand things enough to be more detailed...

@temoto
Copy link
Copy Markdown
Member

temoto commented Jan 21, 2021

Comment on lines +80 to +82
EAI_EAGAIN_ERRORARGS = (socket.EAI_AGAIN, 'Lookup timed out')
EAI_NONAME_ERRORARGS = (socket.EAI_NONAME, 'Name or service not known')
EAI_NONAME_ERROR = socket.gaierror(*EAI_NONAME_ERRORARGS)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • (the only major comment here) hide exception args from public API (_FOO_ARGS)
  • FOO_ERROR_ARGS or FOO_ARGS
  • EAI_EAGAIN_ERROR was deleted, probably good idea to leave it like NONAME, NODATA errors.
Suggested change
EAI_EAGAIN_ERRORARGS = (socket.EAI_AGAIN, 'Lookup timed out')
EAI_NONAME_ERRORARGS = (socket.EAI_NONAME, 'Name or service not known')
EAI_NONAME_ERROR = socket.gaierror(*EAI_NONAME_ERRORARGS)
_EAI_EAGAIN_ARGS = (socket.EAI_AGAIN, 'Lookup timed out')
EAI_EAGAIN_ERROR = socket.gaierror(*EAI_EAGAIN_ARGS)
_EAI_NONAME_ARGS = (socket.EAI_NONAME, 'Name or service not known')
EAI_NONAME_ERROR = socket.gaierror(*EAI_NONAME_ARGS)

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The desired change is in the lattest commit

@jjohnson42
Copy link
Copy Markdown
Author

Ok, made suggested changes.

The problem I have is less about how to measure it, it's not being able to produce the same behavior. When I try to write a test case, the size of the exception does not seem to grow on re-raise. It clearly grew in their environment, and it clearly came from the exception and Frame objects.
So as far as me trying to synthetically test, it seems like a fine method, but in their environment, somehow it grew and I don't understand why it misbehaved there and I can't make it misbehave even knowing what was getting bigger.

@codecov
Copy link
Copy Markdown

codecov bot commented Jan 21, 2021

Codecov Report

Attention: 9 lines in your changes are missing coverage. Please review.

Comparison is base (4705f07) 44% compared to head (9e175a3) 44%.
Report is 67 commits behind head on master.

Files Patch % Lines
eventlet/support/greendns.py 52% 9 Missing ⚠️
Additional details and impacted files
@@          Coverage Diff           @@
##           master    #682   +/-   ##
======================================
  Coverage      44%     44%           
======================================
  Files          87      87           
  Lines       11917   11917           
  Branches     1780    1780           
======================================
+ Hits         5290    5294    +4     
+ Misses       6224    6222    -2     
+ Partials      403     401    -2     
Flag Coverage Δ
ipv6 16% <26%> (+<1%) ⬆️
py27epolls 56% <52%> (-1%) ⬇️
py27poll 56% <52%> (+<1%) ⬆️
py27selects 55% <52%> (+<1%) ⬆️
py35epolls 49% <52%> (+<1%) ⬆️
py35poll 49% <52%> (-1%) ⬇️
py35selects 49% <52%> (+<1%) ⬆️
py36epolls 49% <52%> (+<1%) ⬆️
py36poll 49% <52%> (+<1%) ⬆️
py36selects 49% <52%> (+<1%) ⬆️
py37epolls 49% <52%> (+<1%) ⬆️
py37poll 49% <52%> (+<1%) ⬆️
py37selects 49% <52%> (+<1%) ⬆️
py38epolls 40% <71%> (+<1%) ⬆️
py38openssl 39% <71%> (+<1%) ⬆️
py38poll 40% <71%> (-1%) ⬇️
py38selects 40% <71%> (+<1%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@jiajunsu jiajunsu mentioned this pull request Sep 21, 2023
@itamarst
Copy link
Copy Markdown
Contributor

We want to merge either this or #811, but not both.

@itamarst itamarst changed the title Stop reraising global exceptions in greendns Stop reraising global exceptions in greendns (alternative to #811) Dec 19, 2023
@4383
Copy link
Copy Markdown
Member

4383 commented Dec 20, 2023

We want to merge either this or #811, but not both.

I'd suggest to go with #811 as unit tests are also provided within.

@4383
Copy link
Copy Markdown
Member

4383 commented Dec 20, 2023

Also #811 comes with a clear description of the bug and with a reproducer (#810)

itamarst pushed a commit that referenced this pull request Dec 20, 2023
* Fix memory leak in module greendns

Change `EAI_*_ERROR` to `lambda` expression, and initialize them
 when being called, to avoid memory leak with `__traceback__.tb_next`

fixes #810
@itamarst
Copy link
Copy Markdown
Contributor

Thank you; this has been addressed by #811.

@itamarst itamarst closed this Dec 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants