Skip to content

dhcpv6_client: refactor to use event_timeout for non-sock timeouts#16668

Merged
miri64 merged 7 commits intoRIOT-OS:masterfrom
miri64:dhcpv6_client/enh/refactor-timeouts
Jul 23, 2021
Merged

dhcpv6_client: refactor to use event_timeout for non-sock timeouts#16668
miri64 merged 7 commits intoRIOT-OS:masterfrom
miri64:dhcpv6_client/enh/refactor-timeouts

Conversation

@miri64
Copy link
Copy Markdown
Member

@miri64 miri64 commented Jul 21, 2021

Contribution description

Not sure, why I did not do that in the first place (maybe event_timeout did not exist back then, I seemed to basically have mirrored its behavior). As an added bonus, this makes the migration to ztimer really easy, which I also did in this PR.

Testing procedure

make -C tests/gnrc_dhcpv6_client flash -j test still passes, with and without USEMODULE=ztimer

Issues/PRs references

Requires #16667 for the test to pass with ztimer. (merged)

#16661 conflicts with this PR, but IMHO #16661 should be merged first, to have the issue fixed in master.

@miri64 miri64 requested review from benpicco, kaspar030 and kfessel July 21, 2021 11:40
@github-actions github-actions bot added Area: network Area: Networking Area: sys Area: System labels Jul 21, 2021
@miri64 miri64 added the Type: enhancement The issue suggests enhanceable parts / The PR enhances parts of the codebase / documentation label Jul 21, 2021
@miri64
Copy link
Copy Markdown
Member Author

miri64 commented Jul 21, 2021

Some of the changes (such as the general microsecond -> millisecond conversion) are mainly to prepare for a port to sock_async + event_timeout for retransmissions (so dhcpv6_client and dhcpv6_relay can run in the same thread without the client blocking on sock_udp_recv), but I introduced it here already to not have confusing conversions for ztimer.

Copy link
Copy Markdown
Contributor

@kfessel kfessel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some time unit typos

@miri64 miri64 force-pushed the dhcpv6_client/enh/refactor-timeouts branch from 132d206 to 6493241 Compare July 21, 2021 14:29
@miri64
Copy link
Copy Markdown
Member Author

miri64 commented Jul 21, 2021

Rebased and squashed to current master (to resolve conflict caused by merging #16658 .

Copy link
Copy Markdown
Contributor

@kfessel kfessel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A thought about rebind time an its offset by t2.

rebind_time = (xtimer_now_usec64() / US_PER_SEC) + t2;
xtimer_remove(&rebind_timer);
rebind_timer.callback = _post_rebind;
rebind_time = _now_sec() + t2;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this and l.: 842 seem to be the only lines that access rebind_time, they cancel each other +t2 and -t2

at a first glance the -t2 in l.: 842 may be wrong (but might have been working around: failing renew since mrd seems to be used to evaluate if a renew is atempted '-t2' make this only be done if the renew is run in the same second as the "_schedule_t2").

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR aims not to change the logic of the DHCPv6 implementation, but only refactors, so even if it is wrong, that change would belong into a different PR. Hovewer, L. 842 calculates the MRD for a RENEW transmission (please refer to the RFC to figure out what this means exactly, the implementation was a few years ago, I would need to reread it myself, to explain it properly myself; all the more reason why I don't want to change logic in this PR ... ;-)). This line only reschedules the next T2 event which was supplied in a REPLY message by the IA_PD option (or IA_NA option, once #16228 is in), so I don't see how the two operations would be related. I guess in L. 842 we want some kind of offset to T2, as for RENEW the MRD shall be "Remaining time until earliest T2".

Concerns about things happening at the same time don't make sense, since the client runs in a single thread ;-).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something is wrong with the mrd calculation in the case of "renew" and "rebind" (request does not calculate an mrd).

(rebind will never happen, renew may run longer than t2 allows)

The error does not influence how the system works if the first renew is successful -> dificult to see in the wild

I only looked at these lines because i checked on the xtimer ztimer conversion.

The error(s) was in there before.

everything done in this PR is working as expected

Copy link
Copy Markdown
Contributor

@kfessel kfessel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran this on native following the "Testing procedure". ✔️
Using short T1 and T2, renew was successful for xtimer as well as for ztimer. ✔️

I like the changes of this patch. ✔️

Maybe i would like the use of just msec (one ztimer to rule them all, this could also avoid some jitter but dhcpv6 seems to be tolerant to that) ❓

Comment on lines +953 to +960
#if IS_USED(MODULE_EVENT_TIMEOUT_ZTIMER)
event_timeout_ztimer_init(timeout, ZTIMER_SEC, event_queue, event);
event_timeout_set(timeout, delay_sec);
#else
event_timeout_init(timeout, event_queue, event);
/* use xtimer_set64 instead of event_timeout_set to prevent overflows */
xtimer_set64(&timeout->timer, ((uint64_t)delay_sec) * US_PER_SEC);
#endif
Copy link
Copy Markdown
Member Author

@miri64 miri64 Jul 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe i would like the use of just msec (one ztimer to rule them all, this could also avoid some jitter but dhcpv6 seems to be tolerant to that) ❓

Main reason for also using ZTIMER_SEC is thes lines. I really do not want to cap of a potential overflow here. Since this function is used to set the T1 and T2 timeouts (both come over the wire) it is not guaranteed that they will be e.g. UINT32_MAX - 1.

Copy link
Copy Markdown
Contributor

@kfessel kfessel Jul 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good, But: This may still be problematic since gnrc_dhcp dhcpv6_client_conf_prefix and nib _nib_pl_add have a max addr lifetime of uint32_max-1 msecs or infite
which will make the addr last shorter than its T1 and T2 timeout

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, but nothing prevents us to port that to ZTIMER_SEC in the future as well ;-). Currently the same problem exists in master, since the xtimer of DHCPv6 uses 64-bit scale.

@miri64
Copy link
Copy Markdown
Member Author

miri64 commented Jul 23, 2021

Now this PR only needs an ACK by a maintainer.

Copy link
Copy Markdown
Member

@maribu maribu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK. Change looks sane to me and I trust @kfessel testing. Please confirm that the potential issue I pointed out inline is indeed not present before hitting merge.

#endif
}

static inline uint32_t _now_sec(void)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a potential footgun with using both timestamps from ZTIMER_SEC and ZTIMER_MSEC, as both could be provided by different hardware. One could (or at least I can see myself doing so) assume that the timestamps can be converted to each other.

I don't see this incorrect assumption anywhere here in the code, but I'm not really familiar with it.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the timestamp should be used independent from each other and are only used for the respective timeouts using the different scales.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are multiple potential foot guns, but none of them is worse than it was before with just xtimer, especialiy within reasnonable limit of leasetime and T1 and T2 I would advice against giving longer than 1 month leases

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The drift pontential of msec vs sec is low enough for T1 and T2 application,

The problem left is one might uses SEC as some kind of RTC (which should not be done), combine that with DHCP and some NTP after getting the dhcp lease and you get a time scramble, this is a problem of the ztimer and not of this DHCP.

@miri64 miri64 added the CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR label Jul 23, 2021
@kfessel
Copy link
Copy Markdown
Contributor

kfessel commented Jul 23, 2021

i also ran it on the board ethernet port of a nucleo-f767zi (the biggest dificulty was to find the place where "ethos" was configured)

xtimer and ztimer

-> embedded hardware ✔️

it gets its lease and renews them (short duration)

@miri64 miri64 merged commit bab7def into RIOT-OS:master Jul 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Area: network Area: Networking Area: sys Area: System CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Type: enhancement The issue suggests enhanceable parts / The PR enhances parts of the codebase / documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants