TX packetiser (QUIC)#18570

Closed
paulidale wants to merge 3 commits intoopenssl:masterfrom
paulidale:tx-packetiser

Conversation

@paulidale
Contributor

@paulidale paulidale commented Jun 15, 2022

Intended as a companion for #18564, although less advanced.

  • documentation is added or updated
  • tests are added or updated

@paulidale paulidale added branch: master Applies to master branch triaged: documentation The issue/pr deals with documentation (errors) labels Jun 15, 2022
@paulidale paulidale self-assigned this Jun 15, 2022
Member

I'd like to see the input and output interfaces consumed by the TX packetiser more clearly delineated. e.g. How is the packetiser instantiated? How and what dependencies are injected (e.g. the network BIO). Is this module the exclusive permitted user of BIO_sendmmsg? (Presumably it is.)

Will the TX packetiser need to handle timer events and have the event queue provided to it (e.g. frame/packet coalescing delays)?

Member

Is this module the exclusive permitted user of BIO_sendmmsg? (Presumably it is.)

At least in the original design the TX QUIC record layer would call BIO_sendmmsg(). However, that does not really work well with one call to the record layer per QUIC packet, and it also does not work for packet coalescing. So... this is still unclear.

On the other hand I suppose some clever filtering BIO inserted in front of the real BIO_dgram might work.

@t8m
Member

t8m commented Jun 15, 2022

Intended as a companion for #11991, although less advanced.

I suppose you meant to link a different PR.

@levitte levitte mentioned this pull request Jun 20, 2022
@paulidale paulidale force-pushed the tx-packetiser branch 2 times, most recently from 3b5241a to 694fe62 Compare June 27, 2022 05:27
Member

I expected the packetiser to determine the length of application data frames - not be told by something external what that length is. Presumably the packetiser knows the maximum size that the packet can be, and we also (presumably) want to send maximum-sized packets wherever possible. The packetiser will have to take the data input from various sources and form the complete packet from it. For example, let's say that the packetiser decides to send some ack frames and a crypto frame, and it's got 1k of "space" left in the packet which it can fill with application data. It needs to take that much data from the stream buffers. Surely, only the packetiser knows the amount of space available and can form the application data frames accordingly?

Contributor Author

The design specifies that complete frames are passed into the packetiser. Either we need to change this or the stream send buffers need to know the length (which means the packetiser should have a call to make this available).

I'm imagining that several packets will be on the go at any one time.
Packetising is equivalent to the bin packing problem, so this should ease things. Also, there are frames that are best sent separately (e.g. ack frames don't need an ack if sent without any frames that do need an ack; likewise, sent alone they don't count towards bytes in flight).

Member

I was envisaging that the packetiser would have access to a set of stream send buffers. Each stream buffer would know how much data it has available to send - and the packetiser can query that. The packetiser would query each stream buffer in turn and fill the "current" packet with data from each buffer. We will have to have some algorithm to determine how much to take from each one - I can think of a few different strategies, e.g. 2 possibilities might be:

  1. Take as much data as possible from the first stream buffer, until either we've run out of data or the packet is full. If the packet is not yet full move onto the next stream buffer and repeat. Next time we create a packet continue where we left off, and iterate through each stream buffer in a round-robin fashion.

or

  2. Work out how many stream buffers we have. Divide the amount of space available by the number of stream buffers and fill the packet with a small amount of data from every buffer. If any one stream buffer has less data than its allowance, then increase the allowance of all the other stream buffers accordingly.

Possibly different stream buffers have different priorities. You could envisage a "very high priority" that means "just send the data from this stream buffer as soon as possible and don't mix it with other stream data", which could be used for the ack frames scenario you describe. Or something like that anyway.

Member

IMO the best strategy (to avoid stalling multiple streams when packets are lost) is to fill a packet with as much data as is available from a single stream. Then in another packet continue with the next stream, to provide fairness to streams, and so on. Of course, if a stream does not have enough data to fill a full packet, add data from another stream. So this is something in-between 1 and 2.
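This in-between strategy could be sketched roughly as follows. Everything here is illustrative, not actual OpenSSL internals: the packet drains as much as possible from the stream at the cursor, tops up from the following streams if space remains, and the cursor advances by one per packet for fairness.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_STREAMS 3

typedef struct {
    size_t pending;             /* bytes waiting in this stream's send buffer */
} STREAM_BUF;

/*
 * Fill one packet of `capacity` bytes.  Drain as much as possible from the
 * stream at *cursor, then top up from the following streams if space
 * remains.  The cursor advances by one per packet so each new packet
 * starts with a different stream.  Returns the number of bytes packed.
 */
static size_t fill_packet(STREAM_BUF bufs[], size_t n, size_t *cursor,
                          size_t capacity)
{
    size_t used = 0, i;

    for (i = 0; i < n && used < capacity; i++) {
        STREAM_BUF *b = &bufs[(*cursor + i) % n];
        size_t take = b->pending;

        if (take > capacity - used)
            take = capacity - used;
        b->pending -= take;
        used += take;
    }
    *cursor = (*cursor + 1) % n;
    return used;
}
```

With streams holding 1500, 800 and 200 pending bytes and 1200-byte packets, the first packet is filled entirely from stream 0, while the second starts from stream 1 and mixes in the remainders.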

Member

At least in the MVP I would not try to add priority to streams as prioritization can be done by the application itself by blocking adding more stream data to lower priority streams until it has sent all the high priority stream data.

Member

I thought the TXP creates the frames (i.e., the framer is part of the TXP). The calling modules just ask it to wrap their content into appropriate frames. I.e., ACK manager asks it to create ACK frames, the TLS handshake RL asks it to create CRYPTO frames, and so on. The TXP then buffers/sends the frames as appropriate.

Member

Retries should be done at higher priority.

For lost frames? Well they just need to be sent before sending any other data for the same stream IMO it is not that complicated.

Contributor Author

@paulidale paulidale Jun 28, 2022

The design speaks otherwise:

TX Packetizer

This module creates frames from the application data obtained from
the application. It also receives CRYPTO frames from the TLS Handshake
Record Layer and ACK frames from the ACK Handling And Loss Detector
subsystem.

Which means the TXP should be creating the application data frames.

I also remember something about frames being passed to the TXP but I could easily be misremembering.

Member

Yeah, the design is vague in this regard. Perhaps it should be updated? I.e. "It also receives data for CRYPTO frames from the TLS Handshake Record Layer and data for ACK frames from the ACK Handling And Loss Detector subsystem."? I do not know which makes better sense.

Member

@mattcaswell said:

I was envisaging that the packetiser would have access to a set of stream send buffers. Each stream buffer would know how much data it has available to send - and the packetiser can query that. The packetiser would query each stream buffer in turn and fill the "current" packet with data from each buffer.

So I wonder, who would call the appropriate function to get the packetiser to start working?

Me, I was always thinking that it would be other parts that would essentially call the packetiser function to say "hey, I have something for you"... and while doing so, they could as well include "and here's my data buffer". As far as I understand, that's more or less exactly how this function should be called? I'm not sure I understand why the packetiser should do the extra churn of running around all the SSL objects to ask "hey, do you have something for me?" in this case.

So, in the scenario where the packetiser just gets the signal to start doing something and go asking all the SSL objects it can reach, how does that process get started?

Member

I envisaged a single API call that asks "how much data can I send at the moment?" That API would then have to query the flow controller and congestion controller to find the answer
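A sketch of what such a query might reduce to; `quic_send_allowance` and its parameters are hypothetical, since the real call would consult the flow controller and congestion controller objects directly:

```c
#include <stddef.h>

/*
 * Hypothetical helper: the overall send allowance is the tightest of the
 * individual limits.  In the real code these values would come from the
 * flow controller and the congestion controller respectively.
 */
static size_t quic_send_allowance(size_t flow_credit, size_t cwnd_available)
{
    return flow_credit < cwnd_available ? flow_credit : cwnd_available;
}
```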

Member

I envisage two (configurable) modes of operation:

  1. Send as soon as we have application/crypto data available to send.
  2. If we have a full packet worth of data, then send immediately - otherwise send after a configurable delay period.

In the first mode I imagine somewhere in the implementation of SSL_write there will be a call to send the packets.
In the second mode I envisage this being invoked by a callback as a result of a time based event firing.

We might need 2 API calls available, e.g. ossl_quic_send_packets_now and ossl_quic_send_packets_if_full....or something like that

Contributor Author

I was thinking of a single mode:

  • if there is no data in flight or immediately after the handshake send now
  • if there is data in flight, accumulate data but also start a short timeout and send if it expires

Contributor Author

Sending data now ought to take a stream SSL object -- presumably one is interested in flushing a stream rather than a connection. Accumulating data would take a connection SSL object (which is implicit in the TXP instance).

This would lead to:

    int ossl_quic_send_packets(SSL_QUIC_TX_PACKETISER *tx, SSL *stream);

which flushes and sends the stream now if it is not NULL and accumulates & sends if desirable otherwise.

@paulidale paulidale force-pushed the tx-packetiser branch 2 times, most recently from ee6b0ad to 72deaff Compare June 27, 2022 11:13
Member

@hlandau hlandau left a comment

I agree with @mattcaswell that it should be the TXP composing packets, pulling frames from available sources as appropriate.

This is already how things work for the ACKM: you call ossl_ackm_is_ack_desired and ossl_ackm_get_ack_frame to get a structure logically expressing the desired ACK ranges. So the TXP has to poll the ACKM for this. However, the number of ACK ranges returned by the ACKM may be more than the TXP can or wants to fit in a packet; it is for the TXP to decide what size it wants to make ACK frames. It would make sense to me to use a similar design for STREAM frames.

I see the TXP here as a kitchen chef which pulls from available sources of frames in order to generate full packets, since the TXP is in the best position to know how much space is available, etc.

I imagine the core mechanic would just be a loop filling a packet buffer which calls different possible frame sources until no more will fit. There might be some exceptions to this, like CRYPTO frames, where it makes more sense for other code to push to the TXP, but this doesn't seem like it would affect the design much (and if one really wanted to avoid further complicating the TXP, could be a small queue maintained outside the TXP which it polls).

While we could have dynamic registration of frame sources using callbacks, I think this is overkill since the possible frame sources are statically known.
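As a rough illustration of that core loop (everything here is hypothetical; the statically known sources in the real code would be the ACKM, the stream send buffers, the handshake layer, and so on):

```c
#include <stddef.h>

/*
 * A frame source appends at most `space` bytes of frames to `buf` and
 * returns how many bytes it wrote (0 if nothing is queued or nothing
 * fits).  Illustrative only - not an actual OpenSSL interface.
 */
typedef size_t (*frame_source_fn)(unsigned char *buf, size_t space);

/* Fill one packet by polling each statically known source in turn. */
static size_t txp_fill(unsigned char *pkt, size_t capacity,
                       const frame_source_fn sources[], size_t nsources)
{
    size_t used = 0, i;

    for (i = 0; i < nsources && used < capacity; i++)
        used += sources[i](pkt + used, capacity - used);
    return used;
}
```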

@paulidale
Contributor Author

I agree with @mattcaswell that it should be the TXP composing packets, pulling frames from available sources as appropriate.

This has never been in doubt. The question is who/what builds the frames. The design currently has the TXP building application frames but receiving completed frames from everything else. The question being posed is: should the TXP also compose the other types of frame too?

@hlandau
Member

hlandau commented Jul 1, 2022

My very rough take would be:

  • STREAM and ACK frames should be built by the TXP because these must be generated dynamically according to available packet space.
  • CRYPTO frames seem to be functionally very similar to STREAM frames, providing a logical bytestream to the handshake layer, so the TXP can choose how much to send in a given packet. So they should probably be handled very similarly to STREAM frames.
  • Whether to insert a PING frame, and whether to insert PADDING frames is obviously the TXP's decision, which it will probably make quite autonomously.
  • All other frame types are basically fixed and inflexible, so the TXP should probably give priority to sending these (put them in a packet first if queued). Because the TXP doesn't get a choice in how long these frames are, it would be reasonable to have either the TXP or other parts of the code generate them.
  • "Generating" these other frames basically seems to amount to whether the frame is offered to the TXP as a high-level structure logically representing the data in the frame, which the TXP then serializes into the wire format, or whether other code pre-serializes into the wire format and just offers a buffer to the TXP, which it transmits at high priority blind in the next packet.
  • Both of these seem valid options to me. The latter would reduce complexity in the TXP, so it seems preferable on that point. The former might be necessary if the TXP needs to snoop the transmission of some of these other frame types due to having requirements which impinge on the TXP (but then it might just be better to have whatever function the TXP calls to retrieve the frame have some sideband interface providing the TXP whatever information it needs).

(This reminds me that I've been meaning to create an issue to create functions for wire format encoding/decoding of specific frame types, since we'll need them.)

@paulidale
Contributor Author

Why would the TXP need to worry about PING frames? It seems pretty disconnected from packet assembly.

I'm also not convinced that variable length means the TXP is responsible. ACKs e.g. could be sent as a separate packet if they don't fit the current one -- which also has the advantage of being non-ack-eliciting.

@paulidale paulidale marked this pull request as ready for review July 11, 2022 03:03
@paulidale
Contributor Author

It's a bit rough around the edges and there are some outstanding questions but I think this is good enough for the moment.

@levitte
Member

levitte commented Jul 11, 2022

From discussions both here and in #18610, it seems that we have different views on whether the TX and RX Frame handlers should work with a pull model or a push model. For a pull model, it also remains a bit unclear when and how these Frame handlers are supposed to get started, or by whom they should be called to do their work.

@paulidale
Contributor Author

State machine drives most things?


The MTU limits the size of an individual packet, the other two limit the
total amount of data that can be sent. The packetiser needs to query the
current limits using the `ossl_quic_flow_maximum_size()`,
Member

I am unsure whether this isn't too late - IMO the flow control limits should already be enforced when writing to the stream send buffers. I.e., in the SSL_ API front-end call.

@mattcaswell
Member

Getting an annoying error from GitHub when trying to comment on one of the threads above... so I'll post my comment here instead:

At least in the original design the TX QUIC record layer would call BIO_sendmmsg(). However, that does not really work well with one call to the record layer per QUIC packet, and it also does not work for packet coalescing. So... this is still unclear.

The record layer design does not have one call to the record layer per QUIC packet. The relevant function is:

    int (*write_records)(OSSL_RECORD_LAYER *rl, OSSL_RECORD_TEMPLATE **templates,
                         size_t numtempl, size_t allowance, size_t *sent);

A "record" here means a QUIC packet. So write_records takes multiple record templates and will then call BIO_sendmmsg() when it has constructed the dgrams. The packetiser is expected to make one call to the record layer when it has a set of records to send. There are also these functions available:

    /*
     * Find out the maximum amount of plaintext data that the record layer is
     * prepared to write in a single record. When calling write_records it is
     * the caller's responsibility to ensure that no record template exceeds
     * this maximum when calling write_records.
     */
    size_t (*get_max_record_len)(OSSL_RECORD_LAYER *rl);

    /*
     * Find out the maximum number of records that the record layer is prepared
     * to process in a single call to write_records. It is the caller's
     * responsibility to ensure that no call to write_records exceeds this
     * number of records.
     */
    size_t (*get_max_records)(OSSL_RECORD_LAYER *rl);

So with these functions the packetiser can figure out the total amount of data it can send in one go, and the maximum amount of data that can go into a single record. From there it is expected to form the record templates accordingly and make one call to write_records.
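The arithmetic that last step implies could look something like this; `plan_records` is a hypothetical helper, with the two limits standing in for the results of get_max_record_len and get_max_records:

```c
#include <stddef.h>

/*
 * Hypothetical planning helper: given `pending` bytes to send, decide how
 * many record templates to pass to a single write_records call and how
 * large the final (possibly partial) record is.  Returns the template
 * count; *last_len receives the length of the final template.
 */
static size_t plan_records(size_t pending, size_t max_record_len,
                           size_t max_records, size_t *last_len)
{
    size_t full = pending / max_record_len;
    size_t rem = pending % max_record_len;
    size_t n = full + (rem != 0);

    if (n > max_records) {      /* leftover data waits for the next call */
        n = max_records;
        rem = 0;
    }
    *last_len = n == 0 ? 0 : (rem != 0 ? rem : max_record_len);
    return n;
}
```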

@paulidale
Contributor Author

ping for second review

```c
int ossl_quic_free_app_data(OSSL_QUIC_FRAME *frame,
                            void *data, size_t data_len);
```

Member

I suspect there will need to be some running total held somewhere of the total amount of data not sent yet across all buffers. The packetiser will need to create a new packet either when some timeout has expired, or alternatively when sufficient data is pending.

Contributor Author

The running total would be the stats manager's or ACK manager's concern. I really don't think it belongs here.

It's the ACK manager's job to tell the TXP that retransmission is required and what needs to be retransmitted. The TXP then rebuilds packets based on this information. This complicates the coordination between the TXP and the send buffers because data block boundaries can be expected to change. I've not yet got a way to say "forget about these blocks; they need to be retransmitted and the boundaries could change".

@hlandau
Member

hlandau commented Aug 11, 2022

Just a heads up, #18949 now has the TX record layer API and implementation.

@paulidale
Contributor Author

Updated.


```c
void OSSL_QUIC_STREAM_set_priority(SSL *stream, uint32_t priority);
uint32_t OSSL_QUIC_STREAM_get_priority(SSL *stream);
```
Member

This still doesn't look quite right. Either these are public functions in which case I would expect them to be:

    void SSL_set_priority(SSL *stream, uint32_t priority);
    uint32_t SSL_get_priority(SSL *stream);

In the above case we need to have some decision about how these functions behave in the case that a TLS SSL object is passed to them (I'd expect they just "work" - the priority is a hint only that is ignored in TLS).

The document needs to be clear about which functions that are presented are public and which are internal.

If, on the other hand, these are non-public functions I would expect them to look like this:

    void QUIC_STREAM_set_priority(QUIC_STREAM *stream, uint32_t priority);
    uint32_t QUIC_STREAM_get_priority(QUIC_STREAM *stream);

Aside: Are we using OSSL_QUIC_STREAM or QUIC_STREAM for the stream object. I've seen both used.

Contributor Author

Aside: Are we using OSSL_QUIC_STREAM or QUIC_STREAM for the stream object. I've seen both used.

In what context? For structures and types, QUIC_ is okay. For functions, they should begin ossl_ IMO.

Member

So, the object would be QUIC_STREAM, but any function that uses it would be OSSL_QUIC_STREAM_whatever(QUIC_STREAM *stream, ...) (or should that be ossl_quic_stream_whatever(QUIC_STREAM *stream, ...) for internal things)?

Contributor Author

I think so.


```c
int ossl_quic_retransmitting_app_data(QUIC_STREAM *stream, size_t offset,
                                      void *data, size_t data_len);
```
Member

I'm not clear how this is interpreted. IIUC this is telling a stream send buffer that data we previously requested to be sent now needs to be retransmitted - so remove any record of the fact that we sent it before and make the data eligible to be sent again. Is that correct? What does offset mean here?

Contributor Author

@paulidale paulidale Aug 19, 2022

offset is the data offset in the stream. I'll change stream offset to something more verbose in the description.

I'm not convinced that offset is required but I included it for the moment.

Your interpretation is correct. I've reworded it a little to hopefully clarify.

Member

I'm unconvinced about the need for offset. Or at least I don't understand why we need both data and offset. They both seem to be being used to identify the data that needs to be retransmitted.

Contributor Author

Me neither. It depends on how the send buffers maintain the information about what has been transmitted and what hasn't. Once that's known, we'll know which is required.

Member

I think you should pick one (data or offset) and run with just that in this design - not have both. We can always change it later. Having both is definitely wrong.

by calling:

```c
void ossl_qtx_flush_net(OSSL_QTX *qtx);
```
Member

What is an OSSL_QTX? Isn't this supposed to be OSSL_QUIC_TX_PACKETISER?

Contributor Author

@paulidale paulidale Aug 19, 2022

It's defined in #18949, I assumed it was the TX record layer.

Member

@mattcaswell mattcaswell left a comment

This looks good. I think it's nearly there.


@mattcaswell mattcaswell added the approval: review pending This pull request needs review by a committer label Sep 12, 2022
@paulidale
Contributor Author

@t8m still okay?

Member

@t8m t8m left a comment

Good for now. Might need some fine-tuning later.

@t8m t8m added approval: done This pull request has the required number of approvals triaged: design The issue/pr deals with a design document and removed approval: review pending This pull request needs review by a committer labels Sep 12, 2022
@paulidale
Contributor Author

Merged, finally.

@paulidale paulidale closed this Sep 13, 2022
@paulidale paulidale deleted the tx-packetiser branch September 13, 2022 11:22
openssl-machine pushed a commit that referenced this pull request Sep 13, 2022
Reviewed-by: Tomas Mraz <[email protected]>
Reviewed-by: Matt Caswell <[email protected]>
(Merged from #18570)
sftcd pushed a commit to sftcd/openssl that referenced this pull request Sep 24, 2022
Reviewed-by: Tomas Mraz <[email protected]>
Reviewed-by: Matt Caswell <[email protected]>
(Merged from openssl#18570)
beldmit pushed a commit to beldmit/openssl that referenced this pull request Dec 26, 2022
Reviewed-by: Tomas Mraz <[email protected]>
Reviewed-by: Matt Caswell <[email protected]>
(Merged from openssl#18570)