
feat: Use kyber IO scheduler for SSDs; bfq for rotational block devices #149

Merged
mmstick merged 1 commit into master_jammy from iosched_jammy on Jul 12, 2022
Conversation

@mmstick
Member

@mmstick mmstick commented Jul 12, 2022

I've done some reading around, and it seems that we're using an I/O scheduler that isn't ideal for NVMe or SATA SSDs. We're defaulting to none, which can cause noticeable stutter and delay on the desktop when many processes are performing I/O at the same time. Most Linux distributions do the same thing. It could be the cause of some of the reported issues with the system occasionally stuttering (there are reports that DRAM-less SSDs are more affected).

We may want to think about defaulting to Kyber for NVMe and SATA SSDs, and BFQ for rotational drives. Kyber is a simple (< 1000 lines of code) and efficient multiqueue scheduler that Red Hat's documentation recommends for NVMe/SATA SSDs. BFQ is a complex multiqueue scheduler that has a higher CPU cost but brings low-latency benefits to slow rotational storage.

Tipped by some Phoronix comments talking about BFQ and None causing issues and Kyber being a good default for SSDs, I did some searching and found this LWN article, this Red Hat bugzilla issue, this Red Hat documentation, and this research paper.

In the Red Hat bugzilla issue, there was a request to enable mq-deadline for all SSDs because None and BFQ caused noticeable latency/stutter on the desktop with SSDs. However, MQ Deadline is geared more towards server workloads, favoring higher throughput at the cost of higher latency, so I think Kyber will be better for the desktop use case. The K2 research paper also shows Kyber with the lowest latency of the four: Kyber, BFQ, MQ Deadline, and None.

This will require testing to see how much of an impact it makes when a lot of I/O is happening in the background. For example: playing an intensive 3D video game that constantly streams textures and maps, while multiple processes copy files to different drives and across the network. It'd be good if we could find some people with NVMe SSDs suffering from freezes who could try the change.

Note that it is working if you get an output like this:

$ cat /sys/block/nvme0n1/queue/scheduler
mq-deadline [kyber] bfq none
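
In that output, the scheduler shown in square brackets is the active one. As a rough sketch for checking several devices at once, the helpers below extract the bracketed name from each queue/scheduler file. The function names and the optional root argument are made up for illustration and are not part of this PR.

```shell
# Print the active scheduler from the contents of a queue/scheduler file
# read on stdin. The kernel marks the active one with square brackets.
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Print "<device>: <active scheduler>" for each block device.
# The root argument is hypothetical; it defaults to /sys/block and exists
# only so the function can be pointed at a test directory.
list_active_schedulers() {
    root="${1:-/sys/block}"
    for f in "$root"/*/queue/scheduler; do
        [ -r "$f" ] || continue
        dev=${f#"$root"/}   # strip the root prefix
        dev=${dev%%/*}      # keep only the device name
        printf '%s: %s\n' "$dev" "$(active_scheduler < "$f")"
    done
}
```

Calling `list_active_schedulers` with no arguments reads the real /sys/block tree.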

The change may impact max I/O throughput since that's the tradeoff made for improving desktop responsiveness.

@mmstick mmstick requested review from a team July 12, 2022 00:03
@JacekJagosz

I suggest assigning bfq to mmcblk devices as well. In my testing it improves the user experience on slow SD cards and on devices with eMMC as main storage (often Chromebooks).

@mmstick
Member Author

mmstick commented Jul 12, 2022

Makes sense.

@Absolucy Absolucy left a comment

Seems alright to me

@linuxgnuru

linuxgnuru commented Jul 12, 2022

Would SATA SSDs as well as USB flash drives (which show up as /dev/sdX), i.e. SSDs on the SCSI bus, be affected?

@mmstick
Member Author

mmstick commented Jul 12, 2022

ACTION=="add|change", SUBSYSTEM=="block", ATTR{queue/rotational}=="0", KERNEL=="sd?", ATTR{queue/scheduler}="kyber"

This udev rule will apply kyber as the scheduler for any /dev/sdX device that is not rotational. So that'd include SATA SSDs and USB flash drives.
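
To make the matching concrete: udev's KERNEL match uses shell-style globs, so `sd?` matches names with exactly one character after `sd` (sda, sdb, ...) and not partitions such as sda1. The sketch below mirrors only the KERNEL and queue/rotational conditions of the rule with a shell `case`. It does not model ACTION or SUBSYSTEM, and the function name is invented for illustration.

```shell
# kyber_rule_matches KERNEL_NAME ROTATIONAL
# Returns 0 when the device name and rotational flag would satisfy the
# KERNEL=="sd?" and ATTR{queue/rotational}=="0" parts of the rule.
kyber_rule_matches() {
    name=$1
    rotational=$2
    case "$name" in
        sd?) ;;           # exactly one character after "sd"
        *) return 1 ;;    # partitions (sda1), nvme0n1, mmcblk0, ...
    esac
    [ "$rotational" = "0" ]
}
```

For example, `kyber_rule_matches sdb 0 && echo kyber` prints kyber, while `sda1` or a rotational flag of 1 does not match.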

@mmstick
Member Author

mmstick commented Jul 12, 2022

I think bfq would be better for USB storage. Looking for an attribute to filter USB drives.

@gangwerz

Seems to be working with the internal NVMe on the galp5-1650Ti

  1. Added the iosched repo
  2. Upgraded without error using sudo apt update && sudo apt upgrade
  3. Reboot and cat /sys/block/nvme0n1/queue/scheduler shows the [kyber] scheduler used on the NVMe

I tested with a USB flash drive, and the BFQ scheduler is being selected. I assume that means my flash drive is being marked as rotational.

@mmstick
Member Author

mmstick commented Jul 12, 2022

Looks like USB flash drives are automatically assigned 1 to their queue/rotational parameter, so they're getting BFQ.
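
A quick way to confirm this for a given drive is to read its queue/rotational flag directly. The helper below is only a sketch: its name and the root argument (normally /sys/block) are made up for illustration.

```shell
# describe_rotational ROOT DEV
# Reads ROOT/DEV/queue/rotational and reports how the kernel classifies
# the device. ROOT is a parameter only so a test directory can be used.
describe_rotational() {
    root=$1
    dev=$2
    flag_file="$root/$dev/queue/rotational"
    if [ ! -r "$flag_file" ]; then
        printf '%s: unknown\n' "$dev"
        return 1
    fi
    case "$(cat "$flag_file")" in
        0) printf '%s: non-rotational\n' "$dev" ;;
        1) printf '%s: rotational\n' "$dev" ;;
        *) printf '%s: unknown\n' "$dev"; return 1 ;;
    esac
}
```

Running `describe_rotational /sys/block sdb` against a flash drive that reports 1 would print `sdb: rotational`, which is why the rotational rule picks BFQ for it.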

@mmstick
Member Author

mmstick commented Jul 12, 2022

Looks like a hardware-related flaw in a lot of flash drives.

https://bugzilla.kernel.org/show_bug.cgi?id=90761

Currently, there is no way to distinguish if an USB that don't have VPD is a flash disk, because VPD BDC is used. USB sticks that have VPD BDC should be set as not rotational as expected. So it's a known issue, but so far there is no way to fix it.

@gangwerz

Looks like that flaw is present on all 5 of my flash drives.

I can't test whether the rule will set a properly constructed flash drive to kyber. But it's consistently setting my flash drives that have queue/rotational = 1 to BFQ.

@mmstick
Member Author

mmstick commented Jul 12, 2022

Sounds fine that flash drives and portable HDDs will get BFQ by default. I'm guessing that the only devices that'd have this chip are portable SSDs.

@gangwerz gangwerz left a comment

Changes look good!

5 participants