Refactor: performance optimization in tracker announce #751
josecelano wants to merge 1 commit into torrust:develop from
Conversation
ACK 59a68d3
Codecov Report — Attention: Patch coverage is

```
@@ Coverage Diff @@
##           develop     #751      +/-   ##
===========================================
- Coverage    74.06%   74.02%   -0.05%
===========================================
  Files          145      145
  Lines         8958     8923      -35
===========================================
- Hits          6635     6605      -30
+ Misses        2323     2318       -5
```
Force-pushed from cd9f82c to 4a17fa1
The tracker announce method uses two locks:

1. To write the new peer into the torrent entry peer list (write lock).
2. To read the peer list from the torrent entry (read lock).

I was wondering whether the repo would be faster without the second lock, since we can get the peer list while holding the first one. These are the benchmarking results:

### Before the refactor

```output
tokio::sync::RwLock<std::collections::BTreeMap<InfoHash, Entry>>
add_one_torrent: Avg/AdjAvg: (57ns, 59ns)
update_one_torrent_in_parallel: Avg/AdjAvg: (15.769541ms, 16.69948ms)
add_multiple_torrents_in_parallel: Avg/AdjAvg: (20.6541ms, 20.010245ms)
update_multiple_torrents_in_parallel: Avg/AdjAvg: (18.326633ms, 18.326633ms)

std::sync::RwLock<std::collections::BTreeMap<InfoHash, Entry>>
add_one_torrent: Avg/AdjAvg: (38ns, 39ns)
update_one_torrent_in_parallel: Avg/AdjAvg: (7.154301ms, 7.296907ms)
add_multiple_torrents_in_parallel: Avg/AdjAvg: (8.370377ms, 8.370377ms)
update_multiple_torrents_in_parallel: Avg/AdjAvg: (6.902327ms, 6.902327ms)

std::sync::RwLock<std::collections::BTreeMap<InfoHash, Arc<std::sync::Mutex<Entry>>>>
add_one_torrent: Avg/AdjAvg: (44ns, 40ns)
update_one_torrent_in_parallel: Avg/AdjAvg: (6.023789ms, 6.023789ms)
add_multiple_torrents_in_parallel: Avg/AdjAvg: (12.306719ms, 12.306719ms)
update_multiple_torrents_in_parallel: Avg/AdjAvg: (8.429203ms, 8.429203ms)

tokio::sync::RwLock<std::collections::BTreeMap<InfoHash, Arc<std::sync::Mutex<Entry>>>>
add_one_torrent: Avg/AdjAvg: (71ns, 69ns)
update_one_torrent_in_parallel: Avg/AdjAvg: (7.022376ms, 7.022376ms)
add_multiple_torrents_in_parallel: Avg/AdjAvg: (28.125712ms, 27.811162ms)
update_multiple_torrents_in_parallel: Avg/AdjAvg: (8.908545ms, 9.036288ms)

tokio::sync::RwLock<std::collections::BTreeMap<InfoHash, Arc<tokio::sync::Mutex<Entry>>>>
add_one_torrent: Avg/AdjAvg: (85ns, 81ns)
update_one_torrent_in_parallel: Avg/AdjAvg: (15.285006ms, 15.555171ms)
add_multiple_torrents_in_parallel: Avg/AdjAvg: (29.794849ms, 30.254531ms)
update_multiple_torrents_in_parallel: Avg/AdjAvg: (9.302051ms, 9.302051ms)
```

```output
Requests out: 384701.13/second
Responses in: 345642.53/second
- Connect responses: 171189.76
- Announce responses: 171187.17
- Scrape responses: 3265.60
- Error responses: 0.00
Peers per announce response: 0.00
Announce responses per info hash:
- p10: 1
- p25: 1
- p50: 1
- p75: 1
- p90: 2
- p95: 3
- p99: 104
- p99.9: 281
- p100: 347
```

### With only a write lock

```output
tokio::sync::RwLock<std::collections::BTreeMap<InfoHash, Entry>>
add_one_torrent: Avg/AdjAvg: (64ns, 64ns)
update_one_torrent_in_parallel: Avg/AdjAvg: (15.652934ms, 15.54617ms)
add_multiple_torrents_in_parallel: Avg/AdjAvg: (23.834278ms, 22.935421ms)
update_multiple_torrents_in_parallel: Avg/AdjAvg: (17.973855ms, 18.28944ms)

std::sync::RwLock<std::collections::BTreeMap<InfoHash, Entry>>
add_one_torrent: Avg/AdjAvg: (43ns, 39ns)
update_one_torrent_in_parallel: Avg/AdjAvg: (6.362147ms, 6.362147ms)
add_multiple_torrents_in_parallel: Avg/AdjAvg: (8.869102ms, 8.869102ms)
update_multiple_torrents_in_parallel: Avg/AdjAvg: (6.589264ms, 6.510441ms)

std::sync::RwLock<std::collections::BTreeMap<InfoHash, Arc<std::sync::Mutex<Entry>>>>
add_one_torrent: Avg/AdjAvg: (48ns, 49ns)
update_one_torrent_in_parallel: Avg/AdjAvg: (6.194211ms, 6.194211ms)
add_multiple_torrents_in_parallel: Avg/AdjAvg: (12.50352ms, 12.50352ms)
update_multiple_torrents_in_parallel: Avg/AdjAvg: (8.411279ms, 8.411279ms)

tokio::sync::RwLock<std::collections::BTreeMap<InfoHash, Arc<std::sync::Mutex<Entry>>>>
add_one_torrent: Avg/AdjAvg: (79ns, 79ns)
update_one_torrent_in_parallel: Avg/AdjAvg: (6.966462ms, 6.966462ms)
add_multiple_torrents_in_parallel: Avg/AdjAvg: (27.584361ms, 27.411202ms)
update_multiple_torrents_in_parallel: Avg/AdjAvg: (6.965519ms, 7.474346ms)

tokio::sync::RwLock<std::collections::BTreeMap<InfoHash, Arc<tokio::sync::Mutex<Entry>>>>
add_one_torrent: Avg/AdjAvg: (97ns, 97ns)
update_one_torrent_in_parallel: Avg/AdjAvg: (16.046499ms, 16.430313ms)
add_multiple_torrents_in_parallel: Avg/AdjAvg: (28.42327ms, 27.876115ms)
update_multiple_torrents_in_parallel: Avg/AdjAvg: (6.576256ms, 6.326548ms)
```

Only in a couple of cases is it a little faster, but I think the code is simpler. However, the UDP load test looks better: from 171187.17 announce responses per second to 185922.40.

```output
Requests out: 416413.93/second
Responses in: 374655.61/second
- Connect responses: 185740.08
- Announce responses: 185922.40
- Scrape responses: 2993.13
- Error responses: 0.00
Peers per announce response: 0.00
Announce responses per info hash:
- p10: 1
- p25: 1
- p50: 1
- p75: 1
- p90: 2
- p95: 3
- p99: 105
- p99.9: 301
- p100: 365
```

In general, it's not a great performance increase. I would merge it only because the code is cleaner.

### With independent write and read locks

This is like the initial version but with cleaner code. It's the final version in this PR:

- Write lock to update the peer list.
- Read lock to get the stats and the peer list.

```output
tokio::sync::RwLock<std::collections::BTreeMap<InfoHash, Entry>>
add_one_torrent: Avg/AdjAvg: (75ns, 73ns)
update_one_torrent_in_parallel: Avg/AdjAvg: (13.987718ms, 13.987718ms)
add_multiple_torrents_in_parallel: Avg/AdjAvg: (18.375705ms, 17.887158ms)
update_multiple_torrents_in_parallel: Avg/AdjAvg: (18.242291ms, 18.242291ms)

std::sync::RwLock<std::collections::BTreeMap<InfoHash, Entry>>
add_one_torrent: Avg/AdjAvg: (55ns, 54ns)
update_one_torrent_in_parallel: Avg/AdjAvg: (9.550877ms, 9.80194ms)
add_multiple_torrents_in_parallel: Avg/AdjAvg: (7.861103ms, 7.861103ms)
update_multiple_torrents_in_parallel: Avg/AdjAvg: (7.80655ms, 7.80655ms)

std::sync::RwLock<std::collections::BTreeMap<InfoHash, Arc<std::sync::Mutex<Entry>>>>
add_one_torrent: Avg/AdjAvg: (61ns, 59ns)
update_one_torrent_in_parallel: Avg/AdjAvg: (9.649314ms, 9.521965ms)
add_multiple_torrents_in_parallel: Avg/AdjAvg: (12.225382ms, 12.582905ms)
update_multiple_torrents_in_parallel: Avg/AdjAvg: (10.700225ms, 10.700225ms)

tokio::sync::RwLock<std::collections::BTreeMap<InfoHash, Arc<std::sync::Mutex<Entry>>>>
add_one_torrent: Avg/AdjAvg: (83ns, 82ns)
update_one_torrent_in_parallel: Avg/AdjAvg: (12.059674ms, 12.059674ms)
add_multiple_torrents_in_parallel: Avg/AdjAvg: (23.095544ms, 24.209862ms)
update_multiple_torrents_in_parallel: Avg/AdjAvg: (11.731907ms, 11.731907ms)

tokio::sync::RwLock<std::collections::BTreeMap<InfoHash, Arc<tokio::sync::Mutex<Entry>>>>
add_one_torrent: Avg/AdjAvg: (105ns, 104ns)
update_one_torrent_in_parallel: Avg/AdjAvg: (12.257331ms, 12.257331ms)
add_multiple_torrents_in_parallel: Avg/AdjAvg: (23.032795ms, 23.541457ms)
update_multiple_torrents_in_parallel: Avg/AdjAvg: (11.880602ms, 11.880602ms)
```

```output
Requests out: 416171.50/second
Responses in: 374553.95/second
- Connect responses: 186039.97
- Announce responses: 185587.24
- Scrape responses: 2926.74
- Error responses: 0.00
Peers per announce response: 0.00
Announce responses per info hash:
- p10: 1
- p25: 1
- p50: 1
- p75: 1
- p90: 2
- p95: 3
- p99: 105
- p99.9: 301
- p100: 393
```
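The two variants discussed above can be sketched as follows. This is a minimal illustration with hypothetical simplified types (`Repo`, `Entry`, `Peer` here are stand-ins; the real torrust-tracker types differ): the two-lock variant acquires a write lock to insert the peer and then a second read lock to fetch the peer list, while the single-lock variant returns the peer list from the same write-lock critical section.

```rust
use std::collections::BTreeMap;
use std::net::SocketAddr;
use std::sync::RwLock;

// Hypothetical simplified types for illustration only.
type InfoHash = [u8; 20];

#[derive(Clone, Debug, PartialEq)]
struct Peer {
    addr: SocketAddr,
}

#[derive(Default)]
struct Entry {
    peers: Vec<Peer>,
}

#[derive(Default)]
struct Repo {
    torrents: RwLock<BTreeMap<InfoHash, Entry>>,
}

impl Repo {
    // Two-lock variant: write lock to insert the peer, then a second
    // (read) lock acquisition to fetch the peer list.
    fn announce_two_locks(&self, info_hash: InfoHash, peer: Peer) -> Vec<Peer> {
        {
            let mut map = self.torrents.write().unwrap();
            map.entry(info_hash).or_default().peers.push(peer);
        } // write lock released here
        let map = self.torrents.read().unwrap();
        map.get(&info_hash)
            .map(|e| e.peers.clone())
            .unwrap_or_default()
    }

    // Single-lock variant: the peer list is cloned out of the same write
    // lock that inserted the peer, avoiding the second acquisition.
    fn announce_one_lock(&self, info_hash: InfoHash, peer: Peer) -> Vec<Peer> {
        let mut map = self.torrents.write().unwrap();
        let entry = map.entry(info_hash).or_default();
        entry.peers.push(peer);
        entry.peers.clone()
    }
}
```

The trade-off is that the single-lock variant holds the exclusive lock slightly longer (for the clone), while the two-lock variant pays for a second lock acquisition and allows the map to change between the two steps.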
Force-pushed from 4a17fa1 to 59a68d3
I would prefer to merge #690 before this. I've just got to finish a couple more tests and then it is ready to review. (I will probably need to rebase to squash a few commits too.)
Hey, yes, I didn't want to merge it before you finish yours. Anyway, it's not an important change. Take a look when you have time and let me know if you think the code is simpler. Regarding performance, it looks like it's not a big change.
I'm closing this. I think it's a good refactor, but I would like to decouple peer lists from statistics. Statistics could even be enabled/disabled in the config. Since the performance is the same, I would prefer to split the method. In short, I think it would be better because:
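The decoupling idea mentioned here could look something like the following sketch. All names (`Entry`, `SwarmStats`, `insert_peer`, `stats_enabled`) are hypothetical, not the real torrust-tracker API: the peer list and the swarm statistics are exposed through separate accessors, so statistics can be computed on demand and disabled by configuration without touching the announce path.

```rust
use std::collections::BTreeMap;

// Hypothetical stats summary (seeders/leechers), separate from the peer list.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct SwarmStats {
    complete: u32,   // seeders
    incomplete: u32, // leechers
}

#[derive(Default)]
struct Entry {
    // Simplified peer list: peer id -> is_seeder.
    peers: BTreeMap<u32, bool>,
    // Could come from configuration; stats are skipped when disabled.
    stats_enabled: bool,
}

impl Entry {
    fn insert_peer(&mut self, peer_id: u32, is_seeder: bool) {
        self.peers.insert(peer_id, is_seeder);
    }

    // Peer list retrieval does not depend on statistics at all.
    fn peer_ids(&self) -> Vec<u32> {
        self.peers.keys().copied().collect()
    }

    // Statistics are computed on demand, and only when enabled.
    fn stats(&self) -> Option<SwarmStats> {
        if !self.stats_enabled {
            return None;
        }
        let complete = self.peers.values().filter(|s| **s).count() as u32;
        Some(SwarmStats {
            complete,
            incomplete: self.peers.len() as u32 - complete,
        })
    }
}
```

With this split, the announce path only needs `insert_peer` and `peer_ids`, and a scrape or stats endpoint can call `stats` independently, which is what makes splitting the method attractive.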