-
Notifications
You must be signed in to change notification settings - Fork 565
pci: be more robust when parsing PCI capabilities #7018
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
likebreath
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@blitz Thanks for reporting and fixing the bug. Since we are addressing multiple (but related) issues, would you please split the the first commit into three:
- Check whether the device claims to have a capability list;
- Mask the lower 2 bits of a capability pointer to align with PCI spec;
- Terminate the caps loop if the current capability points to itself.
Also, want to confirm, issue 3 would only happen for devices that are malicious or broken, right?
OSDev has cranked up its bot protection. The following link works for me locally after clicking the "I'm a human" button. I guess the CI fails this check... Without this exception the CI fails the link check stage: * [403] [https://wiki.osdev.org/IOAPIC](https://wiki.osdev.org/IOAPIC) | Network error: Forbidden Signed-off-by: Julian Stecklina <[email protected]>
Done! |
Currently, the code tries to follow the PCI capabilities list in offset 0x34 in the config space regardless of whether the status registers says this is valid. Fix by adding the appropriate check. Signed-off-by: Julian Stecklina <[email protected]>
The PCI standard mandates that the lower bits of the capability pointer are masked out before using the pointer. See PCI Local Bus Specification 3.0 Chapter 6.7 "Capabilities List". Signed-off-by: Julian Stecklina <[email protected]>
If a device returns 0xff as a capability pointer bad things happen. The code before the previous commits would crash in debug builds due to integer overflow. With the two lowest bits masked out, it sends the code into an endless loop. Be more robust by at least handling the case where the capability appears to point to itself. Signed-off-by: Julian Stecklina <[email protected]>
There are a lot of internal functions that are not and probably should not be called from other places. Signed-off-by: Julian Stecklina <[email protected]>
| // doesn't handle all cases where a device might send us in a loop here, but it | ||
| // handles case of a device returning 0xFF instead of implementing a real | ||
| // capabilities list. | ||
| if cap_next == 0 || cap_next == cap_iter { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This commit makes sense to me that it handles a corner case where we should break the loop if a cap pointer points to itself. But I don't see how returning 0xff as the next cap pointer would be a problem, particularly with the previous commit where the last two bits are masked out. Basically, the loop would stop at the top level with condition while cap_iter != 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The top level condition check would catch the cap_next == 0 case, but not cap_iter == cap_next for values other than 0 (i.e. 0xfc for the masked 0xff example).
There is a subtle implementation detail in that this break check leaves cap_iter at the last offset instead of updating it. And it is still used after the loop to set up the nv_gpudirect_clique_cap. That logic would break. So the top level loop check only ever triggered if the initial offset is already 0, even before this change.
But I don't see how returning 0xff as the next cap pointer would be a problem
The problem is not with 0xff in general. This issue was triggered with a bus device emulation that returns -1 (0xff) on all reads as a "there is nothing here" marker. Any case that just returns the same fixed non-zero value for this situation is caught here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha, I misread the code for cap_next and cap_iter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @snue for explaining. I just got back from vacation and saw that you already took care of this. 🙏
|
|
||
| # OSDev has added bot protection and accesses my result in 403 Forbidden. | ||
| '^https://wiki.osdev.org', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to split for this time. Generally we want to have distinct changes from separate PRs. Please keep this in mind in the future. :)
de76445
When devices don't implement a capabilities list or just return 0xFF for the capabilities pointer sad things happen. It typically boils down to receiving 0xFF from the capabilities register. In debug builds, this will then crash the VMM, because
cap_iter + 1overflows:Fix by:
A very broken (malicious) device might still send this code into an endless loop, but I assume this is not part of the threat model and we don't want to overcomplicate this code.
It would be nice to implement a small iterator for the capability list, so we don't have to duplicate the logic (and quirks handling). If this is interesting, please say so. :)
While I was here, I also changed the capability pointer handling to follow the PCI spec and mask out the lowest two bits. See PCI Local Bus Specification 3.0 Chapter 6.7 "Capabilities List".