@jpbrucker
Contributor

The PL031 RTC provides two features: a real-time counter and an alarm interrupt. To use the alarm, the driver normally writes a time value into the match register RTCMR, and when the counter reaches that value the device triggers the interrupt.
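For readers unfamiliar with the device, the alarm path described above can be sketched roughly as follows. This is a simplified, hypothetical model and not the cloud-hypervisor implementation: the `AlarmModel` type and its methods are made up for illustration, and only the register offsets are taken from the PL031 register map.

```rust
// PL031 register offsets (from the Arm PrimeCell RTC register map).
pub const RTCMR: u64 = 0x04; // match register
pub const RTCIMSC: u64 = 0x10; // interrupt mask set/clear
pub const RTCRIS: u64 = 0x14; // raw interrupt status
pub const RTCMIS: u64 = 0x18; // masked interrupt status
pub const RTCICR: u64 = 0x1c; // interrupt clear

/// Hypothetical model of the alarm path only (illustration, not the real device).
#[derive(Default)]
pub struct AlarmModel {
    match_value: u32,
    imsc: u32,
    ris: u32,
}

impl AlarmModel {
    /// Guest write to one of the alarm-related registers.
    pub fn write(&mut self, offset: u64, value: u32) {
        match offset {
            RTCMR => self.match_value = value,
            RTCIMSC => self.imsc = value & 1,
            // Writing 1 to RTCICR clears the latched raw status.
            RTCICR => self.ris &= !(value & 1),
            _ => {}
        }
    }

    /// Guest read of one of the alarm-related registers.
    pub fn read(&self, offset: u64) -> u32 {
        match offset {
            RTCMR => self.match_value,
            RTCIMSC => self.imsc,
            RTCRIS => self.ris,
            RTCMIS => self.ris & self.imsc,
            _ => 0,
        }
    }

    /// Called with the current counter value; latches the raw status on a match.
    pub fn tick(&mut self, counter: u32) {
        if counter == self.match_value {
            self.ris = 1;
        }
    }

    /// The interrupt line is asserted while the masked status is non-zero.
    pub fn irq_asserted(&self) -> bool {
        self.ris & self.imsc != 0
    }
}
```

In this model the line stays asserted from the match until the guest writes 1 to RTCICR, which is the level-triggered behaviour the Linux driver expects.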

At the moment the implementation ignores programming of the alarm, as the feature seems rarely used in VMs. However, the interrupt is still triggered arbitrarily when the guest writes to registers, and the line is never cleared. This confuses the Linux driver, which loops in the interrupt handler until Linux realizes that no one is dealing with the interrupt (200000 unanswered calls) and disables the handler.

One way to fix this would be to implement the alarm function properly. That isn't too difficult, but it requires adding some async timer logic that probably won't ever get used. In addition, the device's interrupt is level-triggered and we don't support level interrupts at the moment, though we could probably get away with changing this interrupt to edge.
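To illustrate why the trigger type matters here, the following is a deliberately simplified model, not how the GIC or the Linux IRQ core actually work. The `Trigger` enum, the `deliveries` function and the `budget` cap are all hypothetical. It counts how often a handler runs when the device raises its line once and nobody lowers it.

```rust
/// Hypothetical trigger modes for an interrupt line.
#[derive(Clone, Copy, PartialEq, Debug)]
pub enum Trigger {
    Level,
    Edge,
}

/// Counts handler invocations for a line the device raises once.
/// `budget` caps the loop, standing in for the point where the guest gives up.
pub fn deliveries(trigger: Trigger, handler_clears_line: bool, budget: u32) -> u32 {
    let mut line_high = true; // the device raises the line once
    let mut pending = true; // the controller sees the assertion
    let mut count = 0;
    while pending && count < budget {
        count += 1; // the handler runs
        if handler_clears_line {
            line_high = false;
        }
        pending = match trigger {
            // Level: still pending for as long as the line remains high.
            Trigger::Level => line_high,
            // Edge: only the transition counts, so there is nothing to redeliver.
            Trigger::Edge => false,
        };
    }
    count
}
```

With a level-triggered line that is never cleared, the handler runs until the budget is exhausted, which matches the hang described above. An edge-triggered line, or a handler that can actually clear the line, yields a single delivery.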

The simplest fix, though, is to just disable the interrupt logic entirely, so that the alarm function still doesn't work but the guest doesn't see spurious interrupts.

@jpbrucker jpbrucker requested a review from a team as a code owner July 16, 2025 11:04
Member

@likebreath likebreath left a comment

@jpbrucker Thank you for reporting and fixing this bug. It appears to me that the IMSC register is only used for interrupts too. Why don't we remove the IMSC-related code as well? Any considerations?

Can you please also share some kernel logs from the guest showing the spurious interrupts? It might be helpful for the community to search when looking for similar issues.

@jpbrucker
Contributor Author

Sure, I can remove the local storage of IMSC as well. It's the interrupt mask, where the guest writes 1 to enable the interrupt. So removing the variable and tying it to 0 does seem more accurate, since we don't support the interrupt.
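As a rough sketch of what "tying it to 0" could look like from the guest's point of view: the `NoAlarmRtc` type below is hypothetical, and keeping the match register readable is an assumption made for this illustration. The actual change in this PR may differ in its details.

```rust
// PL031 register offsets (from the Arm PrimeCell RTC register map).
pub const RTCMR: u64 = 0x04; // match register
pub const RTCIMSC: u64 = 0x10; // interrupt mask set/clear
pub const RTCRIS: u64 = 0x14; // raw interrupt status
pub const RTCMIS: u64 = 0x18; // masked interrupt status
pub const RTCICR: u64 = 0x1c; // interrupt clear

/// Hypothetical register model with the interrupt logic removed.
#[derive(Default)]
pub struct NoAlarmRtc {
    // Assumption: the match value is still stored so reads return what was written.
    match_value: u32,
}

impl NoAlarmRtc {
    pub fn write(&mut self, offset: u64, value: u32) {
        match offset {
            // Stored, but never compared against the counter.
            RTCMR => self.match_value = value,
            // Mask and clear writes are accepted and discarded.
            RTCIMSC | RTCICR => {}
            _ => {}
        }
    }

    pub fn read(&self, offset: u64) -> u32 {
        match offset {
            RTCMR => self.match_value,
            // Mask and both status registers are tied to 0.
            RTCIMSC | RTCRIS | RTCMIS => 0,
            _ => 0,
        }
    }

    /// No register write can assert the line any more.
    pub fn irq_asserted(&self) -> bool {
        false
    }
}
```

A guest that enables the interrupt and then reads IMSC back sees 0, which tells it the interrupt is not enabled rather than pretending the alarm works.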

The warning we get when Linux disables the interrupt after initially hanging in the interrupt handler is:

[  193.216582] irq 22: nobody cared (try booting with the "irqpoll" option)
[  193.224530] CPU: 0 UID: 0 PID: 62 Comm: kworker/0:2 Not tainted 6.16.0-rc1 #1 PREEMPT
[  193.224733] Hardware name: linux,dummy-virt (DT)
[  193.224980] Workqueue: events rtc_timer_do_work
[  193.225991] Call trace:
[  193.226167]  show_stack+0x18/0x24 (C)
[  193.226333]  dump_stack_lvl+0x78/0x90
[  193.226404]  dump_stack+0x18/0x24
[  193.226434]  __report_bad_irq+0x38/0xe8
[  193.226468]  note_interrupt+0x31c/0x364
[  193.226495]  handle_irq_event+0x9c/0xac
[  193.226520]  handle_fasteoi_irq+0xa4/0x1b4
[  193.226547]  handle_irq_desc+0x40/0x58
[  193.226571]  generic_handle_domain_irq+0x1c/0x28
[  193.226596]  gic_handle_irq+0x4c/0x120
[  193.226622]  call_on_irq_stack+0x24/0x30
[  193.226653]  do_interrupt_handler+0x80/0x84
[  193.226679]  el1_interrupt+0x34/0x68
[  193.226709]  el1h_64_irq_handler+0x18/0x24
[  193.226738]  el1h_64_irq+0x6c/0x70
[  193.226865]  pl031_alarm_irq_enable+0x18/0x70 (P)
[  193.226899]  rtc_timer_do_work+0x1e4/0x3d4
[  193.226923]  process_one_work+0x150/0x398
[  193.226950]  worker_thread+0x2d0/0x3e4
[  193.226975]  kthread+0x144/0x21c
[  193.227002]  ret_from_fork+0x10/0x20
[  193.227130] handlers:
[  193.344427] [<00000000c7607607>] pl031_interrupt
[  193.350308] Disabling IRQ #22

Add a default() implementation to satisfy clippy's new_without_default
check, since Rtc::new() doesn't take a parameter after this change.

Signed-off-by: Jean-Philippe Brucker <[email protected]>
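
The `new_without_default` note in the commit message refers to a common Clippy pattern: when a public type has a `new()` that takes no arguments, Clippy suggests also implementing `Default`. A minimal sketch, using a stand-in `Rtc` type whose field and accessor are assumptions rather than the real device's layout:

```rust
/// Stand-in for the device type; the field is hypothetical.
pub struct Rtc {
    match_value: u32,
}

impl Rtc {
    /// Constructor with no parameters, which is what triggers the lint.
    pub fn new() -> Self {
        Self { match_value: 0 }
    }

    /// Accessor used only to make the sketch observable.
    pub fn match_value(&self) -> u32 {
        self.match_value
    }
}

// Delegating to `new()` keeps a single source of truth for the initial state.
impl Default for Rtc {
    fn default() -> Self {
        Self::new()
    }
}
```

With this in place, `Rtc::default()` and `Rtc::new()` produce the same initial state.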
@likebreath likebreath enabled auto-merge July 17, 2025 16:30
@likebreath likebreath moved this to 👀 In review in Cloud Hypervisor Roadmap Jul 17, 2025
@likebreath likebreath added the bug-fix Bug fix to include in release notes label Jul 17, 2025
@likebreath likebreath added this pull request to the merge queue Jul 17, 2025
Merged via the queue into cloud-hypervisor:main with commit 4528e2f Jul 17, 2025
39 checks passed
@github-project-automation github-project-automation bot moved this from 👀 In review to ✅ Done in Cloud Hypervisor Roadmap Jul 17, 2025
@jpbrucker jpbrucker deleted the fix-rtc branch July 18, 2025 08:55