Throttle simultaneous DNS requests #1924
Conversation
Added a new option at the TCPConnector level called `throttle_dns` that allows the connector to throttle simultaneous DNS requests for a specific domain. This functionality is only available when the DNS cache is enabled, and it is disabled by default; without it, many requests can end up making the same DNS resolution.
Codecov Report
```
@@            Coverage Diff             @@
##           master    #1924      +/-   ##
==========================================
+ Coverage   97.05%   97.06%   +0.01%
==========================================
  Files          37       38       +1
  Lines        7610     7637      +27
  Branches     1328     1336       +8
==========================================
+ Hits         7386     7413      +27
  Misses        101      101
  Partials     123      123
```
|
|
Hmm. The PR adds an extra level of complexity to the aiohttp client API, but we could probably live with it. |
|
@asvetlov fair enough about the locking primitive. Not sure what you mean with
Events are supposed to be temporary, to only lock those requests that had a miss in the cache and were waiting on the same host resolution. Once the creator of the event has either filled the cache or got an exception, the event is removed. Also not sure about:
Regarding the complexity, I would like to justify why this change can be interesting. The following code shows a situation where 10 requests to the same domain are made when there is a cache miss:

```python
import asyncio

import aiohttp
from aiohttp.connector import TCPConnector
from aiohttp.resolver import DefaultResolver


class MyResolver(DefaultResolver):
    def __init__(self, *args, **kwargs):
        self.cnt = 0
        super(MyResolver, self).__init__(*args, **kwargs)

    async def resolve(self, *args, **kwargs):
        self.cnt += 1
        return (await super(MyResolver, self).resolve(*args, **kwargs))


async def fetch(session, url):
    await session.get(url)


async def main():
    url = 'http://www.vilaweb.cat/'
    resolver = MyResolver()
    connector = TCPConnector(resolver=resolver, use_dns_cache=True)
    session = aiohttp.ClientSession(connector=connector)
    tasks = [fetch(session, url) for i in range(10)]
    await asyncio.gather(*tasks)
    print('DNS Resolve calls %d' % resolver.cnt)
    await fetch(session, url)
    print('DNS Resolve calls %d' % resolver.cnt)
    await fetch(session, url)
    print('DNS Resolve calls %d' % resolver.cnt)
    await fetch(session, url)
    print('DNS Resolve calls %d' % resolver.cnt)
    await fetch(session, url)
    print('DNS Resolve calls %d' % resolver.cnt)


if __name__ == '__main__':
    asyncio.get_event_loop().run_until_complete(main())
```

This is by design: the first consecutive calls will be paused at the same code path, and they won't be resumed until the next full scheduler iteration. This PR gets rid of that issue, having as a result only one DNS query. |
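The throttling pattern described above can be sketched with plain asyncio (hypothetical names, not the PR's actual implementation): the first caller that misses the cache performs the resolution, while later callers for the same host wait on a shared event.

```python
import asyncio


class ThrottledResolver:
    """Sketch: at most one in-flight resolution per host; later callers wait."""

    def __init__(self):
        self._cache = {}        # host -> resolved addresses
        self._events = {}       # host -> asyncio.Event for the in-flight lookup
        self.resolve_calls = 0  # number of real resolutions performed

    async def _real_resolve(self, host):
        self.resolve_calls += 1
        await asyncio.sleep(0.01)  # stand-in for the network round trip
        return ['10.0.0.1']

    async def resolve(self, host):
        if host in self._cache:
            return self._cache[host]
        if host in self._events:
            # Another caller is already resolving this host: wait for it.
            await self._events[host].wait()
            return self._cache[host]
        event = self._events[host] = asyncio.Event()
        self._cache[host] = await self._real_resolve(host)
        # The event is dropped as soon as the cache is filled, so later
        # callers hit the cache directly instead of stacking on the event.
        del self._events[host]
        event.set()
        return self._cache[host]


async def demo():
    resolver = ThrottledResolver()
    await asyncio.gather(*(resolver.resolve('example.com') for _ in range(10)))
    return resolver.resolve_calls


print(asyncio.run(demo()))  # 1: a single resolution served all 10 callers
```

This sketch only covers the success path; the rest of the conversation is about what happens when the resolution fails.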
|
Ok, your motivation is very clear. What I'm talking about is missing getting errors from the events. Suppose we have failed events that are not processed yet. Pretty simple scenario, isn't it? To get rid of it you should handle these exceptions. |
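The failure mode under discussion can be illustrated with a small stdlib sketch (a hypothetical `EventWithException` class, not the PR's code): if the event can carry the resolver's exception, every parked waiter receives it instead of finding an empty cache.

```python
import asyncio


class EventWithException:
    """Sketch of an event that re-raises a stored exception in its waiters."""

    def __init__(self):
        self._event = asyncio.Event()
        self._exc = None

    def set(self, exc=None):
        self._exc = exc
        self._event.set()

    async def wait(self):
        await self._event.wait()
        if self._exc is not None:
            # Propagate the resolver's failure to every waiter.
            raise self._exc


async def demo():
    event = EventWithException()

    async def resolver():
        await asyncio.sleep(0.01)
        event.set(OSError('DNS lookup failed'))  # the resolution failed

    async def waiter():
        try:
            await event.wait()
        except OSError as exc:
            return str(exc)

    results = await asyncio.gather(resolver(), waiter(), waiter())
    return results[1:]


print(asyncio.run(demo()))  # ['DNS lookup failed', 'DNS lookup failed']
```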
|
Checking this scenario I could see the same with a simple execution. True that once the program finishes these messages might be printed if the exception is not gathered by the caller, but IMHO this is, or should be, the default behavior. |
|
No. Default should be finishing without warning. |
|
Handling the futures that might already have finished with an exception but are still waiting to be processed by their callers would require some deep changes in this MR; bear in mind that the current pattern pops the keys as soon as the first caller ends [1]. The main reason: once the cache is filled, it does not make sense to keep stacking future callers on the Event.

Aside from that, I'm still not convinced about that need. Stopping the TCPConnector explicitly and doing a del of the instance won't always delete the instance, only if there are no more references to it. Correct me if I'm wrong, but even without this MR the current code can suffer the same issue. Let me explain. Imagine that you have a bunch of callers paused waiting for the DNS response, and these have finished with an error; the error is an exception saved in the futures that link the paused callers. If the TCPConnector is stopped and deleted - in fact it won't be deleted - the callers will still have the chance to gather that exception if the loop keeps running, or if the loop is stopped the exception will be printed to the log. If this is true, fixing the issue you raise shouldn't make sense, because it's the same behaviour that the current code already has. |
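For reference, the log messages under discussion come from asyncio itself: a future whose exception is never retrieved logs "Future exception was never retrieved" when it is garbage collected. A minimal stdlib demonstration, independent of aiohttp's internals:

```python
import asyncio


async def demo():
    fut = asyncio.get_running_loop().create_future()
    fut.set_exception(RuntimeError('resolution failed'))
    # If nothing ever awaits fut or calls fut.exception(), asyncio logs
    # "Future exception was never retrieved" once fut is garbage collected.
    # Retrieving the exception explicitly marks it as consumed:
    exc = fut.exception()
    return type(exc).__name__, str(exc)


print(asyncio.run(demo()))  # ('RuntimeError', 'resolution failed')
```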
Will help the user to throttle DNS requests to the same domain. By design this will always happen when many URLs behind the same domain are scheduled at the same time, perhaps via *gather*. The Event class is no longer a derived class of asyncio.locks.Event; it makes a full copy of that class to avoid future issues.
|
@asvetlov I've modified the Event class, getting rid of the inheritance, basically doing a full copy of the official class and trying to get 100% coverage. Regarding the controversial topic of the futures, I would like to have another reviewer and gather other opinions. What do you think? |
|
Sorry, I'm almost not available until roughly end of month. |
|
any ideas @fafhrd91 why `make doc-spelling` is failing in Travis? It works on my computer.
|
|
Different enchant dictionaries I guess. Just add the failed words to the wordlist
(txt file in the docs folder)
--
Thanks,
Andrew Svetlov
|
* Throttle simultaneous DNS requests

  Added a new option at the TCPConnector level called `throttle_dns` that allows the connector to throttle simultaneous DNS requests for a specific domain. This functionality is only available when the DNS cache is enabled and it is disabled by default; without it, many requests can end up making the same DNS resolution.
* Explicit loop for sleep calls
* Fixed typos in documentation
* Added DNS request throttling in CHANGES.rst
* Dogpile invalid word by doc-spelling
* Get full coverage for locks
* Throttle DNS always enabled

  Will help the user to throttle DNS requests to the same domain. By design this will always happen when many URLs behind the same domain are scheduled at the same time, perhaps via *gather*. The Event class is no longer a derived class of asyncio.locks.Event; it makes a full copy of that class to avoid future issues.
* Remove implicit loop tests for locks.Event
* Use create_future helper
* Updated spelling wordlist
* Use a simple Event wrapper instead of a full Event class implementation
* Pass the loop explicitly
* Cancel pending throttled DNS requests if the connector is closed
* Removed unused attribute
* Pass loop explicitly
* Throttle DNS feature adapted to CHANGES protocol
* Update 2111.feature
* Changed class name
What do these changes do?
Added a new option at the TCPConnector level called `throttle_dns` that allows the connector to throttle simultaneous DNS requests for a specific domain. This functionality is only available when the DNS cache is enabled, and it is disabled by default.
This feature gets rid of the dogpile side effect when there is a miss in the cache, having just one request do the DNS resolution - at hostname granularity; requests that keep arriving while the ongoing DNS resolution is not finished will wait until it finishes.
When the DNS cache is not enabled this feature is not available: all requests will end up making a DNS query.
Are there changes in behavior for the user?
No
Related issue number
No
Checklist
- CONTRIBUTORS.txt
- CHANGES.rst: use the `#issue_number` format at the end of the changelog message. Use the Pull Request number if there are no issues for the PR or the PR covers the issue only partially.