Use TAILQ_FOREACH_SAFE whenever we do TAILQ_REMOVE during traversal by pbui · Pull Request #11 · NetworkConfiguration/dhcpcd

pbui · 2019-10-29T13:06:00Z

We must use TAILQ_FOREACH_SAFE instead of TAILQ_FOREACH if we call TAILQ_REMOVE on the current item and free it.

Failure to do so can lead to bad behavior such as infinite loops.

A few users of Void Linux are currently experiencing high CPU usage due to this bug in 8.1.1: void-linux/void-packages#15858

https://www.reddit.com/r/voidlinux/comments/donf2w/dhcpcd_consuming_100cpu/

It is difficult to reliably trigger this bug, but when it does occur, a backtrace showed that the process was stuck in ipv4ll_drop which calls ipv4_deladdr, which in turn had the unsafe traversal where it calls TAILQ_REMOVE inside a TAILQ_FOREACH rather than TAILQ_FOREACH_SAFE.

Further auditing of the code should be performed to ensure the proper loop traversal macro is used to avoid such bugs (I fixed three instances where I believe TAILQ_FOREACH_SAFE should be used instead of TAILQ_FOREACH, but there may be others).

We must use TAILQ_FOREACH_SAFE instead of TAILQ_FOREACH if we call TAILQ_REMOVE on the current item and free it. Failure to do so can lead to bad behavior such as infinite loops.

rsmarples

These changes are not necessary.

rsmarples · 2019-10-29T14:00:36Z

src/eloop.c


 	/* Remove existing timeout if present. */
-	TAILQ_FOREACH(t, &eloop->timeouts, next) {
+	TAILQ_FOREACH_SAFE(t, &eloop->timeouts, next, tt) {


This is not needed because we break from the loop immediately after removing the matching item.

rsmarples · 2019-10-29T14:03:01Z

src/ipv4.c


 	state = IPV4_STATE(addr->iface);
-	TAILQ_FOREACH(ap, &state->addrs, next) {
+	TAILQ_FOREACH_SAFE(ap, &state->addrs, next, ian) {


This is not needed because we break from the loop after removing the matching address.

rsmarples · 2019-10-29T14:03:56Z

src/ipv6.c


 	state = IPV6_STATE(ia->iface);
-	TAILQ_FOREACH(ap, &state->addrs, next) {
+	TAILQ_FOREACH_SAFE(ap, &state->addrs, next, ian) {


This is not needed because we break from the loop after removing the matching address.

rsmarples · 2019-10-29T14:07:57Z

I suspect the issue you are seeing is already fixed by commits 09dc4d0 and f7e2c24.
If you agree, please close this PR.

pbui · 2019-10-29T16:30:25Z

I suspect the issue you are seeing is already fixed by commits 09dc4d0 and f7e2c24.
If you agree, please close this PR.

Those commits do not resolve the high CPU usage problem. As you can see in https://github.com/void-linux/void-packages/pull/15858/files, I included those commits in what I thought should fix the bug, but it had no effect as I was still able to experience the high CPU after switching networks.

pbui · 2019-10-29T16:51:27Z

I actually managed to trigger the bug with these changes... so this is clearly not the fix. However, I was able to produce a backtrace:

#0  ipv4_iffindaddr (ifp=ifp@entry=0x560f925376b0, addr=addr@entry=0x560f9251a948, mask=mask@entry=0x0)
    at ipv4.c:130
#1  0x0000560f9110d21b in arp_announceaddr (ctx=<optimized out>, ia=ia@entry=0x560f9251a948)
    at arp.c:542
#2  0x0000560f9110b790 in ipv4_applyaddr (arg=arg@entry=0x560f92537190) at ipv4.c:747
#3  0x0000560f911060a2 in dhcp_drop (ifp=0x560f92537190, reason=0x560f911206d7 "EXPIRE") at dhcp.c:2713
#4  0x0000560f910f0030 in dhcpcd_drop (ifp=0x560f92537190, stop=<optimized out>) at dhcpcd.c:412
#5  0x0000560f91102f89 in link_netlink (arg=0x7fffe5308700, nlm=0x7fffe5304620, ctx=0x7fffe5308700)
    at if-linux.c:895
#6  link_netlink (ctx=ctx@entry=0x7fffe5308700, arg=arg@entry=0x0, nlm=nlm@entry=0x7fffe5304620)
    at if-linux.c:798
#7  0x0000560f9110139a in get_netlink (ctx=0x7fffe5308700, iov=iov@entry=0x7fffe5304610, 
    arg=arg@entry=0x0, fd=8, flags=flags@entry=64, 
    callback=callback@entry=0x560f91102b00 <link_netlink>) at if-linux.c:476
#8  0x0000560f91101c99 in if_handlelink (ctx=<optimized out>) at if-linux.c:910
#9  0x0000560f910f2139 in dhcpcd_handlelink (arg=0x7fffe5308700) at dhcpcd.c:1089
#10 0x0000560f910f3bbb in eloop_start (eloop=0x560f92535c70, signals=0x7fffe5308810) at eloop.c:997
#11 0x0000560f910ee4c9 in main (argc=<optimized out>, argv=<optimized out>) at dhcpcd.c:2110

It looks like it is stuck in this loop in ipv4_iffindaddr:

                TAILQ_FOREACH(ap, &state->addrs, next) {                                                
                        if ((addr == NULL || ap->addr.s_addr == addr->s_addr) &&
                            (mask == NULL || ap->mask.s_addr == mask->s_addr))
                                return ap; 
                }

I can close this PR since it does not address the issue, but how should I proceed on the bug itself?

pbui · 2019-10-29T17:56:07Z

So what appears to happen is that when I transition from wireless to wireless + ethernet (or wireless + ethernet to just wireless), I get stuck in ifv4_iffindaddr. As you can see from the annotated backtrace below (paired with the output of gdb above), ipv4_iffindaddr gets stuck in the loop because:

state->addrs appears to be circular (tqh_first and tqh_last point to the same object).
state->addrs contains an address (ap->addr.s_addr) related to eth0 even though the addr is related to wlan0.

Because of this, a match is never found and it loops forever...

But there are other strange properties. For instance, link_netlink, dhcpcd_drop, and dhcp_drop have an ifp argument of eth0 but when we look up the lease->addr.s_addr of eth0, we get an IP address (10.36.27.71) associated with wlan0 (originally eth0 was 129.74.161.60).

Moreover, in arp_announceaddr even though we are reacting to changes in eth0, we call ipv4_iffindaddr on wlan0.

I'm not sure what to make of this or how to fix it... but I hope this helps in tracking down this bug.

Here is some more context (I printed out select values for variables) on that backtrace...

ipv4_iffindaddr:

    ifp->name		    wlan0

    ap->addr.s_addr	    1192961034	    129.74.161.60   # OLD eth0	

    addr.s_addr		    1017203329	    10.36.27.71	    #  CURRENT wlan0

    state->addrs	    {tqh_first = 0x560f9251af00, tqh_last = 0x560f9251af00}

# Why does IPV4_STATE return addresses associated with eth0 instead of wlan0?

arp_announceaddr:

    ifp->name		    wlan0
    ia->s_addr		    1017203329	    10.36.27.71

# Should we announce on wlan0 even though eth0 is the device in question?

ipv4_applyaddr:

    ifp->name		    eth0
    (&lease->addr).s_addr   1017203329	    10.36.27.71

# Why does eth0 have lease address of 10.36.27.71?

dhcp_drop:

    ifp->name		    eth0

dhcpd_drop:

    ifp->name		    eth0

link_netlink:

    ifn			    eth0

pbui · 2019-10-29T18:08:01Z

Also, here are the logs from the process:

wlan0: renewing lease of 10.36.27.71
wlan0: sending REQUEST (xid 0xf9756657), next in 3.8 seconds
wlan0: acknowledged 10.36.27.71 from 66.205.160.60
wlan0: leased 10.36.27.71 for 7200 seconds
wlan0: renew in 3600 seconds, rebind in 6300 seconds
wlan0: writing lease `/var/db/dhcpcd/wlan0-eduroam.lease'
wlan0: adding IP address 10.36.27.71/21 broadcast 10.36.31.255
wlan0: ARP announcing 10.36.27.71 (1 of 2), next in 2.0 seconds
wlan0: executing `/usr/libexec/dhcpcd-run-hooks' RENEW
wlan0: ARP announcing 10.36.27.71 (2 of 2)
eth0: carrier lost
eth0: executing `/usr/libexec/dhcpcd-run-hooks' NOCARRIER
eth0: deleting IP address 129.74.161.60/23
eth0: deleting route to 129.74.160.0/23
eth0: deleting default route via 129.74.160.1

After this, the process hangs with the infinite loop.

rsmarples · 2019-11-05T22:06:00Z

This should be fixed in 73ac184.
Please verify

pbui · 2019-11-07T14:17:56Z

Looks like that fixed it... Thanks!

Use TAILQ_FOREACH_SAFE whenever we do TAILQ_REMOVE during traversal

a7dc708

We must use TAILQ_FOREACH_SAFE instead of TAILQ_FOREACH if we call TAILQ_REMOVE on the current item and free it. Failure to do so can lead to bad behavior such as infinite loops.

rsmarples reviewed Oct 29, 2019

View reviewed changes

yorickvP mentioned this pull request Nov 1, 2019

dhcpcd: 8.0.6 -> 8.1.1 NixOS/nixpkgs#71365

Merged

10 tasks

pbui closed this Nov 7, 2019

Sime-Zupanovic mentioned this pull request Jun 12, 2025

dhcpcd crash due to double free attempt #509

Open

Sime-Zupanovic mentioned this pull request Jul 4, 2025

dhcpcd program crash in case flooded with ICMPv6 RAs (Router Advertisements) #515

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use TAILQ_FOREACH_SAFE whenever we do TAILQ_REMOVE during traversal#11

Use TAILQ_FOREACH_SAFE whenever we do TAILQ_REMOVE during traversal#11
pbui wants to merge 1 commit intoNetworkConfiguration:masterfrom
pbui:tail-foreach-safe

pbui commented Oct 29, 2019

Uh oh!

rsmarples left a comment

Uh oh!

rsmarples Oct 29, 2019

Uh oh!

rsmarples Oct 29, 2019

Uh oh!

rsmarples Oct 29, 2019

Uh oh!

rsmarples commented Oct 29, 2019

Uh oh!

pbui commented Oct 29, 2019

Uh oh!

pbui commented Oct 29, 2019

Uh oh!

pbui commented Oct 29, 2019

Uh oh!

pbui commented Oct 29, 2019

Uh oh!

rsmarples commented Nov 5, 2019

Uh oh!

pbui commented Nov 7, 2019

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

pbui commented Oct 29, 2019

Uh oh!

rsmarples left a comment

Choose a reason for hiding this comment

Uh oh!

rsmarples Oct 29, 2019

Choose a reason for hiding this comment

Uh oh!

rsmarples Oct 29, 2019

Choose a reason for hiding this comment

Uh oh!

rsmarples Oct 29, 2019

Choose a reason for hiding this comment

Uh oh!

rsmarples commented Oct 29, 2019

Uh oh!

pbui commented Oct 29, 2019

Uh oh!

pbui commented Oct 29, 2019

Uh oh!

pbui commented Oct 29, 2019

Uh oh!

pbui commented Oct 29, 2019

Uh oh!

rsmarples commented Nov 5, 2019

Uh oh!

pbui commented Nov 7, 2019

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants