Skip to content

QUIC ACK manager design#18564

Closed
hlandau wants to merge 5 commits intoopenssl:masterfrom
hlandau:quic-loss-detector
Closed

QUIC ACK manager design#18564
hlandau wants to merge 5 commits intoopenssl:masterfrom
hlandau:quic-loss-detector

Conversation

@hlandau
Copy link
Member

@hlandau hlandau commented Jun 14, 2022

This is my prototype design for the QUIC ACK manager API.

I've developed it based on on a number of sources: the RFCs, MSQUIC, and go-quic. I've tried to develop as great an understanding as possible of the inputs, outputs, and boundaries of what I here term the ACK manager, and what state it consumes. This is probably the(?) hairiest part of QUIC, though, so it's possible it will change, and it touches a lot of different pieces of state and different moving parts.

@paulidale Let me know if there seems to be anything here that doesn't mash up with your own work. We may want to hold a workshop for this.

@hlandau hlandau added branch: master Applies to master branch approval: review pending This pull request needs review by a committer approval: otc review pending triaged: documentation The issue/pr deals with documentation (errors) labels Jun 14, 2022
@hlandau hlandau requested review from mattcaswell and paulidale June 14, 2022 17:09
@hlandau hlandau self-assigned this Jun 14, 2022

The ACK manager consumes:

- an arbitrary function which returns the current time;
Copy link
Contributor

@paulidale paulidale Jun 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why wouldn't the ACK manager should be raising timeout events directly?

This would make QUIC_ACKM_on_timeout() and QUIC_ACKM_get_loss_detection_timeout() internal only.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible, at the cost of coupling the ACK manager to the event queue. I don't mind either way.

latest_rtt,
smoothed_rtt,
max_ack_delay;
} QUIC_RTT_STATS;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't these be part of the flow controller/statistics collector?

More will be required for congestion control from memory (after MVP).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably yes. We don't have an API for it yet, so this is just a placeholder, I guess.

* is cb_arg.
*/
void (*on_discarded)(void *arg);
void *cb_arg;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will want to put a type to this (once it's type is known). I'm currently using OSSL_QUIC_PACKET but it will probably change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this is unnecessary and just couples things unnecessary.

Copy link
Contributor

@paulidale paulidale Jun 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've ongoing pain where we overused void pointers in the provider APIs :(

Calling it struct blah will give type checking.

@paulidale paulidale mentioned this pull request Jun 15, 2022
2 tasks
@levitte levitte mentioned this pull request Jun 20, 2022

- Handling received ACK frames
- Generating notifications that a packet we sent was delivered successfully
- Generating notifications that a packet we sent was lost
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be talking about handling frames we have received - but the introductory sentence says "on the TX side"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No? If we send a packet the ACKM generates a callback when we a) know the packet was successfully delivered or b) has been deemed lost. This is (for the purposes of this design) considered part of the TX side, as it is considered part of the process of reliably transmitting packets.

The RX side handles ACK'ing packets we receive from the peer.

- Generating notifications that a packet we sent was delivered successfully
- Generating notifications that a packet we sent was lost

On the RX side, it is responsible for:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TX side?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't follow. This is the RX side because we have to generate ACK frames for packets we receive.

void QUIC_ACKM_delete(QUIC_ACKM *ackm);
```

The function pointer `now` is invoked by the ACK manager to obtain the current
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this need to be a callback - rather than a direct call of ossl_time_now ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is desired because it makes the functionality far easier to test. The time function can be replaced with a mock which returns a test-controlled timing value.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At some point we will probably want some debugging/testing callback from the ossl_time_now() itself.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I favour instrumenting or replacing ossl_time_now() for testing.

const QUIC_ACKM_ACK_RANGE *ack_ranges;
size_t num_ack_ranges;

OSSL_TIME delay_time;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this for?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This directly corresponds with the ACK Delay field in the ACK frame as defined in RFC 9000. The loss detection psuedocode in RFC 9002 describes the use of this field by the loss detector.

size_t num_ack_ranges;

OSSL_TIME delay_time;
uint64_t ect0, ect1, ecnce;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


The structure pointed to by the returned pointer, and the referenced ACK range
structures, are guaranteed to remain valid until the next call to any
`QUIC_ACKM` function. After such a call is made, all fields become undefined.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implies calls to the ACKM are intended to be single-threaded. Is that the case? I've not seen any discussion about thread management.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Synchronisation is assumed to be the responsibility of the caller.

Realistically, the ACKM is a bunch of state and I don't see an easy way to make it thread safe other than just with a single lock. I believe other implementations are similar in this regard. If there is a way, it would definitely add enough complexity that we should definitely make this work single-threaded first and have it passing tests before we attempt to add that complexity.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO at worst we could do synchronization via the event queue - i.e., the ack manager would be called only when processing events and things like received ack frames would be created as events from the rx depacketizer.

```

Packet numbers are 62-bit values represented herein by `QUIC_PN`.
`QUIC_PN_INFINITE` evaluates to an invalid QUIC packet number value.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

QUIC_PN_INVALID maybe?

```c
typedef struct quic_ackm_st QUIC_ACKM;

QUIC_ACKM *QUIC_ACKM_new(OSSL_TIME (*now)(void *arg),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the function should be called ossl_quic_ackm_new and similarly below. See the coding style policy. Use lowercase prefix like ossl_ for internal symbols unless they are static (i.e., local to the source file).

Comment on lines +300 to +301
The value returned by this function may change after any call to any of the
event functions above is made.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose the timeout will be a timed event in the event queue, does it mean it has to be constantly updated (i.e. moved around in the queue)? I suppose so. That has some implications for the event queue design @paulidale. Now the question is which component should do the updates of the event - to me it would be more natural to just directly put (and update) the event in the queue from the ack manager, although that would make it less isolated from the rest of the code.

One middle-ground option would be to add a callback that would be set when creating the ACK manager that would be called whenever the timeout event needs to be updated.

Copy link
Contributor

@paulidale paulidale Jun 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The event queue has a postpone event call which could be used. I think it would be the ACK Manager's task to update the event queue either using this or remove/add.

However, I see no reason why all timeout events couldn't be added to the event queue and this function ceasing to exist. In fact, I'd favour this approach over an ad-hoc updating -- it's simplier. In other words, the ACK Manager adds and removes events for all packets from the event queue directly.

padded Initial packets being requested. This is equivalent to
`SendOneAckElicitingPaddedInitialPacket()` in RFC 9002.

- `pto` designates the number of ACK-eliciting outstanding probe events
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe expand the pto name to probe_timeouts?

@hlandau hlandau force-pushed the quic-loss-detector branch from 838679e to b6092ab Compare June 24, 2022 07:17
@hlandau hlandau force-pushed the quic-loss-detector branch from b6092ab to 8647380 Compare June 24, 2022 07:27
@hlandau
Copy link
Member Author

hlandau commented Jun 27, 2022

Some updates, mainly requirements discovered during implementation.

@openssl-machine
Copy link
Collaborator

This PR is in a state where it requires action by @openssl/otc but the last update was 30 days ago

@hlandau
Copy link
Member Author

hlandau commented Aug 24, 2022

Superceded by #18676

@hlandau hlandau closed this Aug 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approval: review pending This pull request needs review by a committer branch: master Applies to master branch triaged: documentation The issue/pr deals with documentation (errors)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants