Skip to content

esp_now: avoid parsing IPv6 header in netdev driver#10581

Merged
miri64 merged 4 commits intoRIOT-OS:masterfrom
TimoRoth:master
Dec 29, 2018
Merged

esp_now: avoid parsing IPv6 header in netdev driver#10581
miri64 merged 4 commits intoRIOT-OS:masterfrom
TimoRoth:master

Conversation

@TimoRoth
Copy link
Copy Markdown
Contributor

@TimoRoth TimoRoth commented Dec 10, 2018

Contribution description

Due to the very low MTU of ESP-NOW, it cannot use normal IPv6, which needs a way larger minimum packet size(See #10531).
These patches make ESP-NOW capable of being used with 6Lo instead. In turn they get rid of IPv6 header parsing inside of the netdev driver, which seemed very out of place there.
It provides a custom netif driver instead, that gets the regular l2 destination address from the netif header.

Testing procedure

Two esp32 boards with the gnrc_networking should still be able to communicate, although via 6Lo now instead of plain IPv6.
This also means that this patch does break backwards compatibility. A board with RIOT without these patches applied will not be able to communicate with one with them.

Issues/PRs references

Fixes #10531
Depends on #10536 (Merged)

@miri64 miri64 requested review from gschorcht and miri64 December 10, 2018 12:55
@miri64 miri64 added Area: network Area: Networking Type: cleanup The issue proposes a clean-up / The PR cleans-up parts of the codebase / documentation labels Dec 10, 2018
@miri64 miri64 mentioned this pull request Dec 10, 2018
@gschorcht
Copy link
Copy Markdown
Contributor

@TimoRoth Thank you a for doing the rework of the ESP-NOW-netdev driver. Even though I was trying to understand more about the interaction of GNRC network layer and link layer implementations, as you can see from my questions, I'm really glad that you have already provided your changes.

Some first remarks:

  • maybe it would be better to create a branch for these changes instead of doing that in your master branch
  • please add your copyright notice and yourself as the author in new files
  • please add yourself as the author of changed files.
  • you should also try to compile the modules you changed with ENABLE_DEBUG.
--- a/cpu/esp32/esp-now/esp_now_netdev.c
+++ b/cpu/esp32/esp-now/esp_now_netdev.c
@@ -16,7 +16,7 @@
  * @author      Gunar Schorcht <[email protected]>
  */
 
-#define ENABLE_DEBUG (0)
+#define ENABLE_DEBUG (1)
 #include "debug.h"
 #include "log.h"
 #include "tools.h"
@@ -561,7 +561,7 @@ static int _recv(netdev_t *netdev, void *buf, size_t len, void *info)
 
 #if ENABLE_DEBUG
         printf ("%s: received %d byte from %02x:%02x:%02x:%02x:%02x:%02x\n",
-                __func__, dev->rx_len,
+                __func__, dev->rx_pkt.len,
                 dev->rx_pkt.mac[0], dev->rx_pkt.mac[1], dev->rx_pkt.mac[2],
                 dev->rx_pkt.mac[3], dev->rx_pkt.mac[4], dev->rx_pkt.mac[5]);

I have tried your changes. Unfortunatly, they are not working yet. The system hangs after initialization in netdev_esp_now_setup. I will try to figure out why.

@gschorcht
Copy link
Copy Markdown
Contributor

@TimoRoth Don't worry, it is not easy to fulfill static check requirements the first time 😉

@gschorcht
Copy link
Copy Markdown
Contributor

@TimoRoth Regarding esp_hexdump. I didnt realize that there is an hexdump function when I was looking for. Thank you for figuring out. Could you replace esp_hexdumpeverywhere withod_hexdump`, even if they are commented out?

@TimoRoth
Copy link
Copy Markdown
Contributor Author

TimoRoth commented Dec 10, 2018

Just now noticed you already started to review, sorry for the force pushes!

The od_hexdump comes from the parts that are based on the cc110x. I can change the old parts of the code to use it as well.

@gschorcht
Copy link
Copy Markdown
Contributor

Just now noticed you already started to review, sorry for the force pushes!

No problem.

@TimoRoth
Copy link
Copy Markdown
Contributor Author

I traced the hang on startup back to esp_wifi_scan_get_ap_num() never returning. Not really sure why that is happening though.

@TimoRoth
Copy link
Copy Markdown
Contributor Author

This generally works now. I can't ping my peers yet, but they do get a public IP auto-assigned from my Border-Router, so something has to be working.
I'll investigate why ping doesn't work tomorrow.

@TimoRoth
Copy link
Copy Markdown
Contributor Author

TimoRoth commented Dec 11, 2018

Works fine for me now. I can ping6 from the RIOT shell with large package sizes and observe how they get correctly fragmented and reassembled on the other end.
I still can't ping with packages larger than 189 bytes through the router, but judging from the logs that's entirely unrelated to this. Probably a too small receive buffer somewhere on the Ethernet device.

One thing that doesn't seem to work is IPHC, but I'm not sure if it's supposed to. If I set the GNRC_NETIF_FLAGS_6LO_HC flag for the ESP_NOW device, all I get is an EXCEPTION on send.

@gschorcht
Copy link
Copy Markdown
Contributor

@TimoRoth
Great to hear that you are making progress.

I still can't ping with packages larger than 189 bytes through the router, but judging from the logs that's entirely unrelated to this. Probably a too small receive buffer somewhere on the Ethernet device.

Are you using esp-eth aka the EMAC as ethernet device or anything else?

@TimoRoth
Copy link
Copy Markdown
Contributor Author

Are you using esp-eth aka the EMAC as ethernet device or anything else?

Yes, I already debugged it a bit, and at least at the lower levels, I can see the larger packages coming in just fine. Not sure where they get lost.

@gschorcht
Copy link
Copy Markdown
Contributor

gschorcht commented Dec 11, 2018

Let me know if I can do anything to support you. I'm working on other RIOT parts at the moment, but I'm still responsible for the RIOT port to ESP32.

@TimoRoth
Copy link
Copy Markdown
Contributor Author

I'd consider the remaining issues outside of the scope of this PR. I can send arbitrarily large packages with ping6 from the RIOT shell and they get fragmented and answered just fine.
So for all I'm aware, this is complete. Squashed and rebased, with some minor changes.

@gschorcht
Copy link
Copy Markdown
Contributor

I still can't ping with packages larger than 189 bytes through the router, but judging from the logs that's entirely unrelated to this. Probably a too small receive buffer somewhere on the Ethernet device.

I can observe the same problem for link local addresses when pinging from an ESP32 board. Pinging global unicast addresses behaves even more strange, see issue #10594.

@TimoRoth
Copy link
Copy Markdown
Contributor Author

One issue I just noticed is that very frequently esp_now_recv_cb drops incoming packages because the currently stored package hasn't been read yet. This is specially an issue with fragmented packages which tend to come in in very rapid succession.
This also has very weird consequences where my two boards go completely crazy during router adv/sol, getting into an infinite loop of sending mass amounts of sol/adv through the air.
Might be worth to introduce a small ring buffer there, with enough space to hold at least 1500 bytes (Ethernet MTU) worth of packages, so probably 2048 bytes due do the power of two requirement of tsrb.

@gschorcht
Copy link
Copy Markdown
Contributor

Might be worth to introduce a small ring buffer there, with enough space to hold at least 1500 bytes (Ethernet MTU) worth of packages, so probably 2048 bytes due do the power of two requirement of tsrb.

I would say yes, or at least the maximum packet length of 1280 byte which is supported by the netif. Memory shouldn't be an issue on the ESP32. But, when we start to port your changes to ESP8266, the memory will become a problem.

@TimoRoth
Copy link
Copy Markdown
Contributor Author

With the ringbuffer in place I can now properly communicate through the router with packages up to the esp-now interface MTU of 1280 bytes(Assuming #10604 is applied as well). Decided to make it a bit larger by default, as the data is also slightly larger than the raw package data due to it also holding a length byte and the mac address for each package.

Not 100% sure if I should squash this into the main commit, or leave it as independent change.

@gschorcht
Copy link
Copy Markdown
Contributor

@TimoRoth Is your ESP Router project up-to-date so that I could use it to test PRs #10581 and #10594 a bit more?

@gschorcht
Copy link
Copy Markdown
Contributor

Not 100% sure if I should squash this into the main commit, or leave it as independent change.

I would say keep it as a separate commit until the review has been finished and you are asked for squashing.

@gschorcht
Copy link
Copy Markdown
Contributor

gschorcht commented Dec 28, 2018

  • chunk 1 .. 3 make no sense to me
  • chunk 4 is strange since we have a 6Lo packet insde @00000058.
  • chunk 5 is even more strange since it is a double of MTU

@miri64
Copy link
Copy Markdown
Member

miri64 commented Dec 28, 2018

Chunk 1 looks like some pkt snip descriptor or netif header. I have a look tomorrow for more details.

@miri64
Copy link
Copy Markdown
Member

miri64 commented Dec 28, 2018

Double frees might also lead to such buffer dumps. I'll have a look for that as well.

@gschorcht
Copy link
Copy Markdown
Contributor

gschorcht commented Dec 29, 2018

chunk 4 starts with a last 6lo fragment with tag 0x0036 and offset 0x96 (1200)

00000000  E5  00  00  36  96  53  53  53  53  53  53  53  53  53  53  53

which ends at

00000050  53  53  53  53  53  53  53  53  C5  00  00  37  41  60  00  00

The chunk should have only have 85 bytes, but it has 336 bytes.

@00000058 starts the first 6Lo fragment of next packet with a size of 0x500 (1280 bytes) and tag 0x0037.

@miri64
Copy link
Copy Markdown
Member

miri64 commented Dec 29, 2018

If there is a double release there can also be "garbage" within the chunks so if something doesn't make sense is a good hint this is happening.

@gschorcht
Copy link
Copy Markdown
Contributor

I tried to take a look for additional gnrc_pktbuf_release. The only locations where I'm not really sure are

gnrc_pktbuf_release(pkt);
and
gnrc_pktbuf_release(pkt);

where the packet buffer is released in error cases. I have seen in
static int _send(gnrc_netif_t *netif, gnrc_pktsnip_t *pkt)
and
static int _send(gnrc_netif_t *netif, gnrc_pktsnip_t *pkt)
that packet buffer is not released in case of send errors.

@gschorcht
Copy link
Copy Markdown
Contributor

chunk 4 starts with a last 6lo fragment with tag 0x0037 and offset 0x96 (1200)

I meant 0x0036 of course. It was late last night 😎 I have updated the comment.

@miri64
Copy link
Copy Markdown
Member

miri64 commented Dec 29, 2018

Ok. Was able to reproduce this in native using socket_zep.

@gschorcht
Copy link
Copy Markdown
Contributor

Ups.

@miri64
Copy link
Copy Markdown
Member

miri64 commented Dec 29, 2018

I have seen in
RIOT/sys/net/gnrc/netif/ethernet/gnrc_netif_ethernet.c

Line 75 in 591092f

static int _send(gnrc_netif_t *netif, gnrc_pktsnip_t *pkt)
and
RIOT/sys/net/gnrc/netif/ieee802154/gnrc_netif_ieee802154.c

Line 182 in 591092f

static int _send(gnrc_netif_t *netif, gnrc_pktsnip_t *pkt)
that packet buffer is not released in case of send errors.

I rather think since the packet isn't released in the error case when the send method is left

res = netif->ops->send(netif, msg.content.ptr);
if (res < 0) {
DEBUG("gnrc_netif: error sending packet %p (code: %u)\n",
msg.content.ptr, res);
}

that those implementations are errorneous.

@miri64
Copy link
Copy Markdown
Member

miri64 commented Dec 29, 2018

(but since both are only in a very rare edge case that I believe we don't hit I don't think that is the problem)

TimoRoth and others added 2 commits December 29, 2018 13:15
Avoids parsing IPv6 packets to determine destination address.
Allows using 6Lo over ESP-NOW, which is required due to the low MTU of
ESP-NOW.

Co-authored-by: Gunar Schorcht <[email protected]>
@TimoRoth
Copy link
Copy Markdown
Contributor Author

So it's not something this change causes, but some rare edge case somewhere in RIOT?
Squashed once more.

@miri64
Copy link
Copy Markdown
Member

miri64 commented Dec 29, 2018

Ok. Was able to reproduce this in native using socket_zep.

Because of this, I don't believe this is an issue with this driver rework. Once @TimoRoth fixed the compiler issue pointed out in #10581 (review) (may be squashed immediately) we are good to go, I believe.

@miri64
Copy link
Copy Markdown
Member

miri64 commented Dec 29, 2018

So it's not something this change causes, but some rare edge case somewhere in RIOT?

Yes. I will allocate some time at the RIOT assembly @ 35C3 later to search for it. debugging race conditions time 💃

@gschorcht
Copy link
Copy Markdown
Contributor

ACK from my side. Under normal conditions with low rates and small packets it should work without any problems.

@TimoRoth
Copy link
Copy Markdown
Contributor Author

"falls through" change is already squashed into the relevant commit.

@miri64 miri64 added the Reviewed: 3-testing The PR was tested according to the maintainer guidelines label Dec 29, 2018
Copy link
Copy Markdown
Member

@miri64 miri64 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-ACK. Is compilable on Arch again.

@miri64 miri64 added the CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR label Dec 29, 2018
@miri64
Copy link
Copy Markdown
Member

miri64 commented Dec 29, 2018

And go.

@miri64 miri64 merged commit 5dda85f into RIOT-OS:master Dec 29, 2018
@gschorcht
Copy link
Copy Markdown
Contributor

@TimoRoth @miri64 Thanks a lot for your support to get ESP-NOW working. I will merge the ESP8266 implementation with cpu/esp_common/esp-now and provide it as part of PR #9917.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Area: network Area: Networking CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Reviewed: 1-fundamentals The fundamentals of the PR were reviewed according to the maintainer guidelines Reviewed: 2-code-design The code design of the PR was reviewed according to the maintainer guidelines Reviewed: 3-testing The PR was tested according to the maintainer guidelines Reviewed: 4-code-style The adherence to coding conventions by the PR were reviewed according to the maintainer guidelines Reviewed: 5-documentation The documentation details of the PR were reviewed according to the maintainer guidelines Type: cleanup The issue proposes a clean-up / The PR cleans-up parts of the codebase / documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants