Conversation

@rzikm
Member

@rzikm rzikm commented Nov 3, 2020

This PR adds an initial version of an almost C#-only QUIC implementation as an alternative to the MsQuic-based implementation. The work originated as my master's thesis.

Supported QUIC Protocol Features

This implementation is very much a work in progress. It implements the QUIC protocol at the draft-27 version. The goal was to obtain a minimal working
implementation able to reliably transmit data between client and server, which can then be used to evaluate the viability of a purely C# implementation for production releases. Most transport-unrelated features are
left unimplemented.

Currently implemented:

  • Basic connection establishment
  • Encryption
  • TLS integration backed by modified OpenSSL
  • Sending data, flow control, loss recovery, congestion control
  • Stream/Connection termination
  • Coalescing packets into a single UDP datagram

Unsupported QUIC Protocol Features

Highlights of the unimplemented protocol features:

  • Connection migration
  • Stateless reset
  • 0-RTT data
  • Server Preferred Address
  • Version Negotiation
  • Address Validation
  • Encryption key updates

OpenSSL integration

To function correctly, the implementation requires a custom branch of OpenSSL from Akamai:
https://github.com/akamai/openssl/tree/OpenSSL_1_1_1g-quic. The implementation tries to detect whether
a QUIC-supporting OpenSSL is present on the path. If not, a mock implementation is used instead.

The mock implementation can be used to try the implementation locally, but interoperation with other
QUIC implementations is, of course, possible only with the OpenSSL-based TLS. If you want to make sure the
implementation runs with OpenSSL, define the DOTNETQUIC_OPENSSL environment variable. This will
cause the implementation to throw if the proper OpenSSL version cannot be loaded.
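For example, in a bash shell (only the fact that the variable is defined matters; the library path below is illustrative):

```shell
# Require the QUIC-capable OpenSSL; throw instead of falling back to mock TLS.
export DOTNETQUIC_OPENSSL=1
# The custom OpenSSL build must be discoverable, e.g. (path illustrative):
export LD_LIBRARY_PATH="$HOME/akamai-openssl/lib:$LD_LIBRARY_PATH"
```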

Tracing and qvis

The implementation can produce traces that can be consumed by the https://qvis.edm.uhasselt.be
visualizer. To collect traces, set the DOTNETQUIC_TRACE environment variable to a nonempty value. The traces are
saved in the working directory in a file named [timestamp]-[server|client].qvis.
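For example (bash; any nonempty value enables tracing):

```shell
# Enable trace collection for the qvis visualizer.
export DOTNETQUIC_TRACE=1
# Traces are written to the working directory as [timestamp]-[server|client].qvis.
```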

Disabling encryption

Regardless of the TLS backend used, you can circumvent encryption by setting the DOTNETQUIC_NOENCRYPT environment variable to a nonempty value.


Ported from dotnet/runtime#43948

@rzikm rzikm changed the title from "Master managed quic" to "Prototype Managed QUIC support implementation" Nov 3, 2020
@geoffkizer
Contributor

Thanks for posting @rzikm.... I will review and comment on any issues.

It looks like a couple of commits from runtime/master got added into your PR, likely because you rebased there and then pushed here, and those commits don't exist in runtimelab yet. I will merge the latest from runtime/master in here and hopefully this will fix that.

@rzikm
Member Author

rzikm commented Nov 4, 2020

Yes, I struggled a bit with creating a PR into this repo, since runtimelab and runtime are unrelated in GitHub's eyes.

Anyway, I am wondering whether it would be better to squash all those 250 or so commits. After the multiple rebases and conflict resolutions, it is unlikely that an arbitrary commit in the middle is compilable, let alone functional. Thoughts?

@geoffkizer
Contributor

geoffkizer commented Nov 4, 2020

Anyway, I am wondering whether it would be better to squash all those 250 or so commits. After the multiple rebases and conflict resolutions, it is unlikely that an arbitrary commit in the middle is compilable, let alone functional. Thoughts?

It's probably worth squashing at least to some degree. There's always a tradeoff here of maintaining the history for documentation purposes (e.g., see exactly what we changed to enable some feature) vs the cost of doing so and complexity of having so many commits.

If you want to squash it all to one commit, I'd be fine with that. If you want to squash it down to some set of steps (i.e., commits) that retains some useful history/context that we might care about preserving for future reference, I'm fine with that too. Up to you.

EDIT: That said, please don't squash the changes that got picked up from runtime/master :) The merge should take care of those, hopefully...

@geoffkizer
Contributor

BTW, quick note: You shouldn't need #nullable enable in most of these files (at least source files; test code is different) because it's enabled at the project level.
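For reference, nullability is typically switched on once per project rather than per file; a representative MSBuild fragment (the exact location of the property group varies per project):

```xml
<PropertyGroup>
  <!-- Enables nullable reference types for every source file in the project. -->
  <Nullable>enable</Nullable>
</PropertyGroup>
```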

@geoffkizer
Contributor

CI is failing with this error: .packages\microsoft.dotnet.build.tasks.targetframework.sdk\6.0.0-beta.20529.1\build\Microsoft.DotNet.Build.Tasks.TargetFramework.Sdk.targets(66,5): error MSB3202: (NETCORE_ENGINEERING_TELEMETRY=Build) The project file "....\System.Runtime.InteropServices.JavaScript/src/System.Runtime.InteropServices.JavaScript.csproj" was not found.

Do you know if that's related to this PR or not? It's possible that it's a pre-existing issue since this branch just got set up. I'll investigate once I get the merging stuff figured out, but thought I'd mention it now.

@geoffkizer
Contributor

BTW, the merge will also bring over this change: dotnet/runtime#44223

This should simplify handling of the "browser" build, but may cause merge conflicts with your PR, not sure...

@geoffkizer
Contributor

It looks like there are a bunch of changes to clean up "using" statements in the MsQuic and Mock providers. We should remove these changes as it's likely to just lead to merge conflicts.

@geoffkizer
Contributor

Similarly, there are some changes to System.Net.Http, and some changes to the shared QUIC code (like adding a QuicConnection constructor). These may be good and/or necessary changes, but we probably shouldn't be making them here.

We should either make these changes in runtime/master or drop them.

In general, we should minimize changes in this tree outside of the managed implementation proper. If they are good changes, they should go into runtime/master. And making those sort of changes here can lead to merge nightmares later.

@rzikm rzikm force-pushed the master-managed-quic branch from 9e01d37 to 4fdfb5c Compare November 7, 2020 17:48
@geoffkizer
Contributor

Looks like there are a few issues with Quic functional tests. Do you know what's happening here?

Also, looks like there are a bunch of issues with Http functional tests -- I'm guessing there is some basic issue here, haven't investigated though...

@rzikm
Member Author

rzikm commented Nov 7, 2020

Also, looks like there are a bunch of issues with Http functional tests -- I'm guessing there is some basic issue here, haven't investigated though...

That is baffling; I have reverted all changes in the System.Net.Http subtree, so it should behave exactly as upstream.

Looks like there are a few issues with Quic functional tests. Do you know what's happening here?

I will try to investigate sometime soon.

@geoffkizer
Contributor

The reason the Http functional tests are failing seems to be that you've changed QuicImplementationProviders.Default to be the managed provider, and it returns true for IsSupported. There are some HTTP version selection tests that run conditionally based on this, and try to actually verify the expected version selection behavior by issuing HTTP requests against the loopback server.

It's not entirely clear to me whether these tests are actually correct or not -- some of them look a little wonky. But regardless, they weren't running in CI before because QuicImplementationProviders.Default.IsSupported was returning false, and now it's returning true, so they are running.

@geoffkizer
Contributor

It's probably fine to change the default QUIC provider to the managed provider in this branch -- after all that's the point of this branch.

But what seems undesirable is that, when OpenSsl is not available, the managed provider will fall back to its mock TLS implementation, which means that code (like these tests) will think they can actually use QUIC when what they are actually getting is mock TLS QUIC.

I think we should separate out the mock TLS handling into a separate QUIC provider. That is, we should have:
(1) A managed QUIC provider that does real QUIC. This can be the default. This depends on OpenSsl, so it will return false from IsSupported if it can't find OpenSsl.
(2) A mock TLS managed QUIC provider -- something like ManagedMockTls. This doesn't depend on OpenSsl, so should always return true for IsSupported.

And we should run tests against both of these. The real managed provider won't run in CI, at least for now, because there's no OpenSsl. But at least the Mock TLS provider will run, which is valuable.
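A rough sketch of what that split could look like (hedged: QuicImplementationProvider is the existing abstract base, but the derived type names and the OpenSslQuic.IsAvailable probe are illustrative placeholders, and the remaining abstract members are omitted):

```csharp
// Sketch only -- not the actual runtime code.
public class ManagedQuicImplementationProvider : QuicImplementationProvider
{
    // Real QUIC: supported only when the QUIC-capable OpenSSL can be loaded.
    public override bool IsSupported => OpenSslQuic.IsAvailable; // hypothetical probe

    // ... connection/listener factory members omitted ...
}

public class ManagedMockTlsQuicImplementationProvider : QuicImplementationProvider
{
    // Mock TLS has no native dependency, so this is always supported.
    public override bool IsSupported => true;

    // ... connection/listener factory members omitted ...
}
```

With this split, code that checks Default.IsSupported would no longer silently get mock TLS when OpenSSL is missing.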

@geoffkizer
Contributor

BTW, I verified that changing the default provider back to MsQuic makes the Http tests pass, because Default.IsSupported returns false now.

@geoffkizer
Contributor

I've manually ported the QUIC API changes back to dotnet/runtime. See here: dotnet/runtime#44380

Once this gets in, I will merge it to the feature branch here and we should be able to rebase and get rid of the API changes in this PR.

@geoffkizer
Contributor

FWIW, I can't easily reproduce any of the QUIC test failures locally. They seem to happen pretty consistently in CI. This usually means that the failures only happen under high CPU load, as occurs in CI.

@geoffkizer
Contributor

geoffkizer commented Nov 8, 2020

Though I do notice the functional tests take much longer than they used to -- ~60s on my machine vs ~3s with just the mock provider.

Any idea why the tests take so long?

@geoffkizer
Contributor

We should enable the new stream conformance tests for the managed provider. They are here: https://github.com/dotnet/runtimelab/blob/feature/ManagedQuic/src/libraries/System.Net.Quic/tests/FunctionalTests/QuicStreamConnectedStreamConformanceTests.cs

I tried doing this locally, but unfortunately I got a bunch of errors and some test hangs. I'm guessing the flush behavior on managed QuicStream is causing problems.

We may want to just add auto-flushing on every stream write, as a way to get these tests passing.
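A minimal sketch of that auto-flush idea (assuming the managed QuicStream separates buffering from flushing; WriteToSendBuffer is a hypothetical placeholder for the internal buffering step):

```csharp
// Sketch only: flush after every write so written data becomes immediately
// observable to the peer, matching what the conformance tests expect.
public override async ValueTask WriteAsync(ReadOnlyMemory<byte> buffer,
    CancellationToken cancellationToken = default)
{
    await WriteToSendBuffer(buffer, cancellationToken).ConfigureAwait(false); // hypothetical
    await FlushAsync(cancellationToken).ConfigureAwait(false);
}
```

This trades some throughput for correctness; batching could be reintroduced later once the tests pass.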

@geoffkizer
Contributor

Also re conformance tests: Several tests are currently disabled because of the non-standard behavior of QuicStream on dispose. That's fine for now, but the plan is to fix the behavior of QuicStream on dispose to align with standard stream behavior. We should make this change for managed QuicStream and then re-enable these tests.

I recently merged a PR to do this for the mock provider, see here: dotnet/runtime#44282

@geoffkizer
Contributor

We should also enable some of the HTTP3 tests in System.Net.Http to run against the managed provider.

The tests are here: https://github.com/dotnet/runtime/blob/master/src/libraries/System.Net.Http/tests/FunctionalTests/SocketsHttpHandlerTest.cs#L3008

However, we should address issues with the stream conformance tests before attempting this.

@stephentoub
Member

I'm guessing the flush behavior on managed QuicStream is causing problems.

There's a property you can override on the test base class to say that writes require flushing.
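Such an override might look like this (the property name FlushRequiredToWriteData is assumed from the stream conformance test base classes in dotnet/runtime; verify against the actual base class):

```csharp
public sealed class ManagedProviderQuicStreamConformanceTests
    : QuicStreamConnectedStreamConformanceTests
{
    // Tell the harness that written data may not be visible to the
    // reader until the writer flushes.
    protected override bool FlushRequiredToWriteData => true;
}
```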

@geoffkizer
Contributor

There's a property you can override on the test base class to say that writes require flushing.

Good point, thanks. Unfortunately, this doesn't seem to help. There's some other issue going on here.

@rzikm
Member Author

rzikm commented Dec 15, 2020

/azp run runtime-libraries-coreclr outerloop

@azure-pipelines

Commenter does not have sufficient privileges for PR 293 in repo dotnet/runtimelab

@rzikm
Member Author

rzikm commented Dec 18, 2020

@geoffkizer, in the last test run all QUIC tests passed; the only failed test is in Sockets and HTTP2:

https://helix.dot.net/api/2019-06-17/jobs/9e1d5026-ee81-4786-99cb-bfcfd335697e/workitems/System.Net.Http.Functional.Tests/console

This feels like a random, unrelated failure. Could you rerun the pipeline?

@rzikm rzikm force-pushed the master-managed-quic branch from 00307d6 to eb8faa9 Compare December 19, 2020 09:54
@rzikm rzikm force-pushed the master-managed-quic branch from e7a69ba to 72c8205 Compare December 22, 2020 22:53
@geoffkizer
Contributor

Sorry, lost track of this. It looks like there are only minor failures in CI. Do you know what the issue here is?

@rzikm
Member Author

rzikm commented Feb 16, 2021

I did not have much time to spend on it because I was too busy writing my master's thesis text. But there seems to be some race condition leading to a deadlock/livelock of either client or server. Sadly, it does not seem to be isolated to a single test that could be disabled.

What I find quite strange is that the probability of failure seems to depend on the order in which the tests are run. I will try to investigate again this week when I have time.

@DemiMarie

I did not have much time to spend on it because I was too busy writing my master's thesis text. But there seems to be some race condition leading to a deadlock/livelock of either client or server. Sadly, it does not seem to be isolated to a single test that could be disabled.

What I find quite strange is that the probability of failure seems to depend on the order in which the tests are run. I will try to investigate again this week when I have time.

The Linux netem framework (part of tc) is fantastic for this. I used it to find quinn-rs/quinn#692, which was a bug in another QUIC implementation with similar symptoms.
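For reference, a minimal netem setup of the kind suggested (run as root; the interface and impairment values are illustrative):

```shell
# Add 50 ms delay and 1% packet loss to loopback traffic.
tc qdisc add dev lo root netem delay 50ms loss 1%
# ... run the flaky tests here ...
# Remove the impairment when done.
tc qdisc del dev lo root
```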

@rzikm
Member Author

rzikm commented Feb 16, 2021

The Linux netem framework (part of tc) is fantastic for this. I used it to find quinn-rs/quinn#692, which was a bug in another QUIC implementation with similar symptoms.

I am working primarily on Windows now, but simulating arbitrary network issues might help track this down, or at least reproduce it on my machine (as far as I know, the tests fail only in CI). Thanks for the idea.

@DemiMarie

The Linux netem framework (part of tc) is fantastic for this. I used it to find quinn-rs/quinn#692, which was a bug in another QUIC implementation with similar symptoms.

I am working primarily on Windows now, but simulating arbitrary network issues might help track this down, or at least reproduce it on my machine (as far as I know, the tests fail only in CI). Thanks for the idea.

You’re welcome!

@rzikm rzikm closed this Sep 7, 2022