drivers/net: Add netdev_driver_t::confirm_send#14660
Conversation
| * @return Number of bytes transmitted. (The size of the transmitted | ||
| * frame including all overhead, such as frame check sequence, | ||
| * bit stuffing, escaping, headers, trailers, preambles, start of | ||
| * frame delimiters, etc. May be an estimate for performance | ||
| * reasons.) |
There was a problem hiding this comment.
@miri64: We had a discussion on what this size should be in an issue. This interpretation is different to what most drivers implement (in the send(), not the confirm_send() function), but as porting would result in work anyway, I figured we should rather go for the best solution than the lazy solution.
IMO there is little reason to just return the result of iolist_size(iolist) on what was passed in send(), as the upper layer could just call iolist_size() directly to get this. So if we really want additional data for statistics, it should be IMO a value that has the most value when the value of iolist_size(iolist) is known anyway. And to me this is the number of bytes physically transmitted; as this would allow to calculate the layer 2 throughput, while iolist_size(iolist) can provide the layer 2 goodput.
There was a problem hiding this comment.
The last sentence is wrong. The netif can easily calculate the sizes needed for goodput calculation, but it would need to make sure to call iolist_size(iolist) prior to adding the L2 header (and trailer, if applicable). But the reasoning should still be valid.
| * @param[in] iolist IO vector list to send. Elements of this list may | ||
| * have iolist_t::iol_size == 0 and (in this case only) | ||
| * iolist_t::iol_data == 0. |
There was a problem hiding this comment.
Note this change. Previously the API doc read as if a driver would need to expect elements in the iolist that have iolist_t::iol_data == NULL but iolist_t::iol_size > 0. I think this was just a mistake in formulation and not intentionally.
|
nice one! IMO we could also specify that the frame from the send function is always a PSDU as well (Full L2 frame). What do you think? |
I mostly agree, but I would like to leave the FCS calculation to the driver, as hardware support for that is very common. |
And stuff like escaping (e.g. slipdev) or bitstuffing is also be better left to the driver, as doing this on the fly can be implemented quite efficiently, while doing so in advance involves a lot of |
|
I guess this should be rebased on top of #14657 |
a272b86 to
64b85cd
Compare
Yes. Done :-) |
drivers/include/net/netdev.h
Outdated
| * via the `info` parameter | ||
| */ | ||
| NETDEV_EVENT_TX_COMPLETE_DATA_PENDING, | ||
| NETDEV_EVENT_TX_NOACK, /**< ACK requested but not received */ |
There was a problem hiding this comment.
Shoudln't NOACK and TX_MEDIUM_BUSY be included as well? They are resolved at the same time
There was a problem hiding this comment.
Done. I changed the return values to have -EGAIN in confirm_send() when called to early. This matches the semantics of the error code better. And this allows to return -EBUSY when the medium was busy; which is consistent with the interpretation of the same error code issued by send().
|
It would be nice to get some feedback on this. I have now implemented a PTP client on top of the
With that, PTP synchronization does already work. Without TX timestamps, the network delay cannot yet be estimated and thus not yet be compensated. I would now like to also implement the missing blocks for TX timestamp to also estimate the network delay. However, I would like to have a rough agreement on whether this approach or #14625 will make the race. |
|
I totally agree with the proposed change. We need more feedback though since it's an API change. |
|
If it helps, I am all for it :-). |
| * NETDEV_EVENT_TX_COMPLETE event. This function must not return | ||
| * `-EAGAIN` after that event was received. | ||
| */ | ||
| int (*confirm_send)(netdev_t *dev, void *info); |
There was a problem hiding this comment.
Who is going to call that function and what are they going to do with the number of bytes sent?
The 802.15.4 sub-mac will call a callback on TX done, can we do this here too?
There was a problem hiding this comment.
Who is going to call that function
E.g. in the current GNRC network stack, I would implement this in the gnrc_netif. That way the gnrc_netif could turn the asynchronous non-blocking API into a "blocking" API. (I'm not sure if blocking is the right word, but the gnrc_netif will only act upon a message when the transmission is fully complete. That way if e.g. the 6lowpan implementation would send 3 frames as fragments of a single IPv6 packet, the gnrc_netif would send them one by one. That way the driver does no longer need to block on send() for link layer fragmentation to work.)
and what are they going to do with the number of bytes sent?
The number of bytes sent is only useful to gather L2 stats IMO. So unless the l2 stats module is used, the caller will only check for the return value to be non-negative.
The 802.15.4 sub-mac will call a callback on TX done, can we do this here too?
In the long term I would like to adopt those ideas of the 802.15.4 radio HAL that can be generically applied to all network devices for the netdev API. As said, my personal goal is that in the end a compatible subset of the radio HAL becomes the "generic network HAL". That way code duplication between different network technologies could be avoided, while still being able to exploit technology specific features through the technology specific API additions on top of the "generic network HAL".
But I think it would be better to change the netdev API in small steps.
There was a problem hiding this comment.
So would this do
dev->send();
while (dev->confirm_send() == -EAGAIN) {}?
There was a problem hiding this comment.
Yes. But more likely you want to block until NETDEV_EVENT_TX_COMPLETE is issued.
There was a problem hiding this comment.
When would we get NETDEV_EVENT_TX_COMPLETE but have confirm_send() return -EAGAIN?
Shouldn’t one be enough?
I see also have .confirm_tx in the radio HAL.
I’d like to have @jia200x‘s 2 ct on this so we don’t end up with two slightly incompatible mechanisms for the same thing.
There was a problem hiding this comment.
The only mechanism that tells the upper layer that a packet was received is when confirm_transmit() returns 0.
The thing is, some radios have an "TX DONE" IRQ (and impressively, some like CC2420 don't). If the cap IEEE802154_CAP_IRQ_TX_DONE is available, the upper layer knows that it can e.g lock a mutex and unlock it on IEEE802154_CONFIRM_TX_DONE. But e.g for CC2420 the lower layer needs to poll until confirm_transmit return 0.
So in the Radio HAL officially there's only one mechanism (the confirm function).
Maybe we should update the documentation of NETDEV_EVENT_TX_COMPLETE to reflect that's not a mandatory event.
There was a problem hiding this comment.
Couldn't we instead use a timer in the CC2420 driver and keep the API easier instead? At least the driver should be in an excellent position to optimize the timeout duration.
There was a problem hiding this comment.
Nothing blocks the driver to implement such a feature. however, the timeout should make sure that confirm_send returns 0. Otherwise the API is not respected
benpicco
left a comment
There was a problem hiding this comment.
I'm all for taking the burden of blocking .send away from the driver.
I take that this is not in conflict with the radio HAL work, so have my ACK.
+1 Note that although not ideal, now it's possible to make all radios that return -EBUSY on send work with GNRC without needing |
|
Yea ideally the thread that produces those fragments should block instead of filling up a queue. |
Is this an ACK 😉 ? |
|
Let me squash and fix the whitespace errors. By now, the history of fixups is no useful information anymore, as I assume the memory of the reviewers of this PR has faded to much, so that they need to re-review the whole PR anyway. |
0e97732 to
da15774
Compare
|
what's the status of this one? is it close to go? :) |
I'd hit the ACK button right away once you provide the second ACK ;-) |
|
I think it would be nice to add the deprecation release (21.10?). Otherwise I'm fine. Otherwise, I ACK upon reply :) |
Changed the API of `netdev_driver_t`: - The `send()` function should no longer return the number of bytes and should not block - The upper layer now must call the new `confirm_send()` function after calling `send()`; either busy waiting until something different to `-EBUSY` is returned, or after `NETDEV_EVENT_TX_COMPLETE` was signaled During transition to the new API, the upper layer must remain backward compatible and must assume the legacy API if `netdev_driver_t::confirm_send()` is `NULL`.
da15774 to
53fcb97
Compare
My idea was to update |
|
I directly amended a typo fix ("An this case" --> "In this case") |
|
Thanks everyone :-) |
|
Just a short reminder that the deprecated enums are still in use within RIOT. |
Contribution description
Changed the API of
netdev_driver_t:send()function should no longer return the number of bytes and should not blockconfirm_send()function after callingsend(); either busy waiting until something different to-EBUSYis returned, or afterNETDEV_EVENT_TX_COMPLETEwas signaledDuring transition to the new API, the upper layer must remain backward compatible and must assume the legacy API if
netdev_driver_t::confirm_send()isNULL.Testing procedure
Just reading the doc and Murdock should be sufficient. Without any driver and no upper layer code being touched, this PR will just increase RAM consumption by one function pointer per network driver.
Issues/PRs references
Alternative to: #14625