Implementation of per-client UDP readloops by rg0now · Pull Request #288 · pion/turn

rg0now · 2023-01-19T16:53:30Z

This is a first attempt to address #287.

The aim is to let the server run a dedicated UDP readloop per client instead of one global readloop, so that each client connection can be drained on a separate CPU thread. See more on this here.

Problem: Currently pion/turn server-side performance is limited to about 40-50 kpps per TURN/UDP listener. This is because we allocate a single global net.PacketConn per UDP listener, which is then drained by a single goroutine. As a result, all client allocations made via that listener share the same CPU thread and there is no way to load-balance client allocations across CPUs. This only affects TURN/UDP: for TCP, TLS and DTLS the TURN sockets are connected back to the client, so a separate goroutine is created for each allocation.

Proposed solution: Create a separate net.PacketConn for each UDP client allocation by (1) sharing the same listener server socket using SO_REUSEADDR, (2) connecting each per-allocation connection back to the client, and (3) firing up a separate readloop goroutine for each client socket.

Security: Care must be taken in implementing this plan: if we blindly create a new socket per received UDP packet then a simple UDP portscan will DoS the TURN listener. In the proposed solution the per-client socket is created only after a client has made a successful allocation, which rules out the blind port-scan problem since the attacker must at least have a set of valid TURN credentials. It is still not completely safe against DoS attacks though, in that an attacker in possession of valid credentials can still open thousands of server sockets, but this is at least the same level of security as provided by the TURN/TCP implementation.

Implementation: Some fairly intrusive changes are required to support this. The patch set in particular:

- moves readLoop() into internal/server/readLoop.go,
- splits the server.Request struct into server.State (stuff needed by the readLoop) and server.Request (the rest), which embeds server.State,
- lets the server invoke the Connect callback once it has successfully created a TURN allocation for a client, to obtain a per-client socket and spawn the readloop goroutine for the new socket,
- adds a WrapConn implementation that wraps the net.Conn returned by DialUDP as a net.PacketConn,
- adds an example (examples/turn-server/udp-connect/main.go) to demonstrate the use of the new feature.

Performance: Attached is a super-simple script to benchmark a naive TURN server with (udp-connect) and without (simple) the patch. It requires the turncat TURN client (you can obtain it from here) in the source dir, plus iperfv2 in the PATH. The script opens 4 client connections and runs a UDP iperf benchmark in each, then every 5 seconds it reports the cumulative packet rate through the tested server. On a 24-core Intel server, the built-in simple server (./test.sh simple 30000000) produced a packet rate of 34816 pps and used 141.1% CPU (recall that there are 4 clients connected to the same listener, so theoretically the server could scale out to 4 CPUs), while the server with the patches (./test.sh udp-connect 30000000) produced 140194 pps with 472% CPU usage.
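
For a quick read of these numbers, the snippet below works out the speed-up and the packet rate per fully used core from the figures quoted above. The helper perCore is made up for this example, and it assumes the CPU percentages follow the usual convention of 100% meaning one busy core.

```go
package main

import "fmt"

// perCore converts a packet rate and a CPU usage percentage (100 = one full core)
// into packets per second per fully used core.
func perCore(pps, cpuPercent float64) float64 {
	return pps / (cpuPercent / 100)
}

func main() {
	// Figures copied from the benchmark description.
	const (
		simplePPS, simpleCPU   = 34816.0, 141.1
		connectPPS, connectCPU = 140194.0, 472.0
	)

	fmt.Printf("speed-up:             %.2fx\n", connectPPS/simplePPS)
	fmt.Printf("simple, per core:      %.0f pps\n", perCore(simplePPS, simpleCPU))
	fmt.Printf("udp-connect, per core: %.0f pps\n", perCore(connectPPS, connectCPU))
}
```

The roughly fourfold speed-up matches the 4 clients in the test, and the per-core rate does not drop with the patch.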

Feedback appreciated.

test.sh.gz

Aim is to allow the server to maintain a specific per-client UDP readloop thread instead of the global readloop thread we maintain today, to improve performance.
- add a new Connect() callback to PacketConnConfig to teach the server how to obtain a connected socket to a client
- move readLoop to internal/server/readLoop.go
- split server.Request struct into server.State (stuff needed by the readLoop) and server.Request (the rest) that embeds server.State
- let the server invoke the Connect callback once it has successfully created a TURN allocation for a client, to obtain a per-client socket and spawn the readloop for the new socket
- add a WrapConn implementation for net.PacketConn that wraps the net.Conn returned by DialUDP
- add examples/turn-server/udp-connect/main.go to demonstrate the use of the new feature

codecov · 2023-01-19T16:56:25Z

Codecov Report

Base: 68.65% // Head: 67.52% // Decreases project coverage by -1.13% ⚠️

Coverage data is based on head (dbdb8ff) compared to base (68e752b).
Patch coverage: 38.46% of modified lines in pull request are covered.

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #288      +/-   ##
==========================================
- Coverage   68.65%   67.52%   -1.13%     
==========================================
  Files          38       39       +1     
  Lines        2469     2522      +53     
==========================================
+ Hits         1695     1703       +8     
- Misses        641      685      +44     
- Partials      133      134       +1

Flag	Coverage Δ
go	`67.52% <38.46%> (-1.13%)`	⬇️
wasm	`?`

Impacted Files	Coverage Δ
internal/server/server.go	`68.42% <ø> (ø)`
server_config.go	`33.33% <ø> (ø)`
stun_conn.go	`21.12% <0.00%> (-6.66%)`	⬇️
internal/server/turn.go	`56.16% <18.18%> (-3.10%)`	⬇️
server.go	`51.61% <50.00%> (-7.45%)`	⬇️
internal/server/readloop.go	`88.23% <88.23%> (ø)`

jech · 2023-01-27T13:49:32Z

What's the CPU usage like when running on a single core?

rg0now · 2023-01-27T15:57:44Z

Depends on the traffic rate and the CPU, but in general UDP listeners can handle about 30-60 thousand packets/sec (roughly 100-200 Mbps depending on the packet size) per CPU core. For comparison, a pure-C TURN server can handle about 100-300 thousand pps and our optimized DPDK-based UDP proxy (no TURN) reaches 10-20 million(!) pps per CPU core.
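
As a rough cross-check of the pps and Mbps figures, the conversion can be written out as below. The helper mbps and the packet sizes are assumptions for illustration (the comment does not state a packet size, and header overhead is ignored); 417 bytes is simply the size at which 30-60 kpps lines up with 100-200 Mbps.

```go
package main

import "fmt"

// mbps converts a packet rate and a packet size in bytes to megabits per second.
func mbps(pps, packetBytes float64) float64 {
	return pps * packetBytes * 8 / 1e6
}

func main() {
	// Packet sizes are illustrative assumptions, not measured values.
	for _, size := range []float64{200, 417, 1200} {
		fmt.Printf("%4.0f-byte packets: 30 kpps = %.0f Mbps, 60 kpps = %.0f Mbps\n",
			size, mbps(30000, size), mbps(60000, size))
	}
}
```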

jech · 2023-01-27T16:02:17Z

I meant: how much does this patch reduce single-core performance?

rg0now · 2023-01-27T16:28:38Z

It doesn't. Theoretically, it shouldn't, and in practice my super-rudimentary iperf benchmarks confirm this (62500 pps with 20-30 usec mean delay on localhost both with and without the patch).

jech · 2023-01-27T16:39:52Z

Ah, cool. I'll do my best to find time to review your patch, then.

rg0now · 2023-01-30T11:08:49Z

You're going to hate me for this, but I think I will double your review load: it seems that we found a much less intrusive solution, see #295. The performance is a bit lower, but there are no changes in the server code and the required changes seem more secure.

stv0g · 2023-02-22T07:38:50Z

@rg0now Are you okay with it if we close this PR?

rg0now · 2023-02-22T10:04:20Z

> @rg0now Are you okay with it if we close this PR?

Already wanted to suggest this... Please go ahead

stv0g changed the title ~~Draft implementation of per-client UDP readloops~~ Implementation of per-client UDP readloops Jan 20, 2023

stv0g marked this pull request as draft January 20, 2023 11:46

stv0g mentioned this pull request Jan 21, 2023

Simplify NewServer() #286

Closed

rg0now mentioned this pull request Jan 30, 2023

Implementation of per-client UDP readloops, take 2 #295

Merged

stv0g closed this Feb 22, 2023

rg0now wants to merge 1 commit into pion:master from l7mp:feature-udp-per-client-readloop
