-
Notifications
You must be signed in to change notification settings - Fork 1.4k
ospf6d: Fix summary LSA removal #18345
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ospf6d: Fix summary LSA removal #18345
Conversation
|
Hey can we submit the PR against master not 10.2? All code must go into master first. |
|
Sure. I'm currently testing on 10.2.1 to have a stable base, but of course I'll rebase to master when the PR is complete and before removing the draft status |
0b13876 to
721409e
Compare
|
Latest update fixes the memory corruption triggered by ospf6_gr_topo1. Still a few TODOs left. |
f0bb512 to
5569bc9
Compare
|
Yay, all ospf6 topotests work now |
|
Thank you @gromit1811 |
5569bc9 to
4cf7b8f
Compare
4cf7b8f to
69ffa58
Compare
No point in doing this for route types we're not interested in at all. Signed-off-by: Martin Buck <[email protected]>
When originating summaries, we perform a lookup to find possible previous summaries. Don't just compare the prefix in that case, but also route and path properties: We might have several summary entries for the same destination (e.g. with a intra- and inter-area path) at the same time (at least temporarily) and we need to deal with the right one. Signed-off-by: Martin Buck <[email protected]>
Previously, we checked each individual route to see whether a summary LSA needs to be originated or purged. The problem is that this decision can't be made reliably by looking at a single route, because there might be several routes for a destination (e.g. one with a intra- and one with an inter-area path) and one might cause origination of a summary LSA while the other causes it to be purged. This causes unstable and/or missing LSAs as described in FRRouting#16197 Instead, ospf6_abr_originate_summary_to_area() now only originates new LSAs or approves existing ones. Removal is handled in ospf6_abr_task() after the whole routing table has been iterated. LSAs which have not been approved by any call to ospf6_abr_originate_summary_to_area() are then removed. This matches the behaviour of OSPFv2. Signed-off-by: Martin Buck <[email protected]>
We slightly abuse summary routes to store information about LSAs that have been originated for them. Most other route parameters are not used for anything. So don't update them to avoid giving the impression that we've got real routes here and to reduce confusion. Signed-off-by: Martin Buck <[email protected]>
Calling ospf6_abr_task()'s callees directly no longer works as expected because they don't know about the "LSA unapprove/approve/remove unapproved" game implemented by ospf6_abr_task(). So just schedule ospf6_abr_task() instead of calling its callees directly. Signed-off-by: Martin Buck <[email protected]>
When reconfiguring an area between stub and normal, explicitly schedule the ABR task to originate/expire summaries. So far, this happend implicitly as a side effect of (re)originating the router LSA, but let's make this more obvious by explictly scheduling the ABR task. This also matches what we do when reconfiguring between NSSA and normal and also what ospfd does. Signed-off-by: Martin Buck <[email protected]>
Move general ABR infrastructure code not specific to NSSA support
(ospf6_abr_task() and friends) out of ospf6_nssa.c. It proably got copied in
there (from ospfd) when adding NSSA support since it was a prerequisite for
that. Now that general intera-area summary orgination uses the ABR task as
well, it no longer makes sense to hide it in ospf6_nssa.c, so move it to
ospf6_abr.c.
While we're at it, also clean up some other unused cruft from
ospf6_nssa.{c,h} and move things which should not be private (like LSA
flags) to their proper place.
Signed-off-by: Martin Buck <[email protected]>
69ffa58 to
5c3f487
Compare
|
Rebased again to current master. Now CI looks much better and my local topotests were also successful (except for some BGP/SRv6/NHRP tests I really don't feel responsible for 😉) |
|
Fixed style issues found by frrbot. I did this in a separate commit so that the "move code around" commit doesn't move and modify code at the same time which would be more difficult to review. I intentionally left the ospf6_abr_task() entry/exit log messages in which "[CI] Verify Source" complained about - ABR task may run asynchronously and it's important to understand when that happens when reading logs. And no, ftrace is not a suitable replacement as it's important to have those log messages together the other log messages and in sequence. I removed some other entry/exit messages from functions only called by ospf6_abr_task(), though - they're only called from there and ABR task already logs each individual step. |
|
As expected, "Verify Source" still fails, but now with only one warning left triggered by the @riw777 Anything else I can do simplify review? I know this is a big PR (but not as big as the labels suggest 😉), but I tried to split it up into reasonable commits which should be easier to review individually than the whole diff. And the largest commit only moves code, so that should be relatively easy as well... |
riw777
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good
|
Not certain if we should merge past this or not: |
Me neither. I fixed the other issues checkpatch was complaining about, but this debug message here was pretty important for me when analyzing logs, so I'd strongly prefer to leave it in. Of course, I can try to reformulate it to not trigger the warning. But IIRC, @donaldsharp also commented somewhere (Slack?) and questioned the usefulness of this warning as the ftrace recommendation doesn't really apply in this context. |
04d69da to
3e23f74
Compare
Fix style issues found by frrbot in unmodified but moved around code. Also remove some excessive log messages found by "[CI] Verify Source". Signed-off-by: Martin Buck <[email protected]>
3e23f74 to
3643e6d
Compare
|
Rephrased debug message to make checkpatch happy. Hopefully the final fix that allows to merge this 🤞 |
|
ci:rerun |
1 similar comment
|
ci:rerun |
|
ci has been running for two weeks? hmmm... ci:rerun |
@riw777 Is it just this PR with repeated CI failures totally unrelated to the actual changes in the PR, or is CI generally unstable ATM? |
|
The remaining 2 "Basic Tests" failures look unrelated, BGP/ISIS SRv6 problems. Shall I rebase to get a possible fix from current master or just ignore them? |
I'm going to try to rerun just the failed tests to see what we can do |
|
@riw777 Thanks for merging, just after I celebrated the 1st birthday of the original bug report 😄 This set at least 3 records: The most complicated bug I analyzed, the most complicated bug fix I did and the longest time to get the fix through CI/CD. Now we just need to get #18549 in as well and then I can finally get this off my radar... |
Finally, a first attempt to fix #16197
The main fixes are from the following 2 commits (other commits are mostly cleanup/optimization):
commit 9eb0492
commit 6f8df0c
Note that this isn't finished/complete yet and is intended mainly for feedback or to stop me in case I'm moving in a completely wrong direction 😉
Open issues: