DRBD for bandwidth-intensive replication

Hello

Two modern servers are connected with a single ConnectX-7 200 Gb/s link for DRBD replication.
On both servers the DRBD disks are backed by 8 x PCIe 5.0 CM7-V NVMe SSDs in an md RAID 0, so they are definitely not the bottleneck.

The highest replication bandwidth I achieved is ~35 Gb/s, with the following resource config:

[root@memverge4 drbd.d]# cat test.res
resource test {

disk {
        c-plan-ahead 0;
        resync-rate 4G;
        c-max-rate 4G;
        c-min-rate 2G;
        al-extents 65536;
#        c-fill-target 512M;
     }

  volume 1 {
    device      /dev/drbd1;
    disk        /dev/vg_r0/lvol0;
    meta-disk   internal;
  }

  on memverge3 {
    node-id   27;
  }
  on memverge4 {
    node-id   28;
  }

net
    {
        transport tcp;
        protocol  C;
        sndbuf-size 64M;
        rcvbuf-size 64M;
        max-buffers 128K;
        max-epoch-size 16K;
        timeout 90;
        ping-timeout 10;
        ping-int 15;
        connect-int 15;
#       verify-alg crc32c;
    }
connection
    {
        path
        {
            host memverge3 address 1.1.1.3:7900;
            host memverge4 address 1.1.1.4:7900;
        }
    }

}
[root@memverge4 drbd.d]#

With “transport rdma” it is even slightly worse, ~30 Gb/s.

When I tried to set c-max-rate = resync-rate = 5G, I got the following error:

[root@memverge4 drbd.d]# drbdadm adjust all
drbd.d/test.res:6: Parse error: while parsing value ('5G')
for c-max-rate. Value is too big.

Anton

When I added a second volume (located on the same physical disks) to the test resource,

  volume 2 {
    device      /dev/drbd2;
    disk        /dev/vg_r0/lvol1;
    meta-disk   internal;

I can’t exceed 39 Gb/s for the test resource, whether using TCP or RDMA.

However, when I created two resources (test and test1) with only one volume each, I got 69 Gb/s when syncing the two resources simultaneously.

So how can I increase the initial replication bandwidth well beyond ~4 GB/s for a single resource, using either TCP or RDMA?

What are the sizes of the volumes you used in each of these tests? A large enough volume will be limited by the activity log, causing a performance hit. This is also why you may have seen better performance syncing multiple volumes simultaneously.

Initially I tested with a 1 TB volume. But today I used two logical volumes, 256 GB each.

Something is limiting single-resource replication to ~4 GByte/s… and this with modern hardware: PCIe 5.0 NVMe SSDs, 200/400 Gb/s per network port, etc. This is a serious limit, and it applies to both the TCP and RDMA transports. Splitting a single large volume into N smaller volumes and configuring N separate resource files, one per smaller volume, just to achieve (4 GByte/s x N) bandwidth is not the best way…
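A note on units, since the thread mixes Gbit/s (network) and GByte/s (disk) figures: the ~35 Gb/s seen on the wire and the ~4 GB/s ceiling describe essentially the same rate.

```shell
# 35 Gbit/s expressed in GByte/s (factor of 8 between bits and bytes):
awk 'BEGIN { printf "%.3f GByte/s\n", 35 / 8 }'   # 4.375 GByte/s
```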

How are you performing your testing? If you are using FIO, could you share the options you are using with it?

The local write speed is also good to know here. You can perform peer isolation by running drbdadm disconnect on the Primary node and repeating the test (after clearing any buffers/caches as relevant to the tests being run). That way we get a baseline write speed for the workload with DRBD not replicating over the network, and can compare it to the speed when the peers are connected.

To confirm if the activity log is involved in the performance that you are seeing, you are able to turn it off in a test environment:

disk {
al-updates no;
}

The c-max-rate option is only involved in the resynchronization rate, not regular write replication, so changing that value is not applicable in this case.

How are you performing your testing? If you are using FIO, could you share the options you are using with it?

On the primary (memverge3) I run “drbdadm invalidate-remote test” and watch the network traffic between the servers and/or “iostat” on the secondary (memverge4).

Ok, I applied

disk {
al-updates no;
}

but I still can’t exceed 35 Gb/s using a single 256 GB volume, for both the TCP and RDMA transports.

The local write speed is also good to know here. You can perform peer isolation by running drbdadm disconnect on the Primary node and repeating the test (after clearing any buffers/caches as relevant to the tests being run). That way we get a baseline write speed for the workload with DRBD not replicating over the network, and can compare it to the speed when the peers are connected.

[root@memverge3 drbd.d]# drbdadm status test
test role:Primary
  volume:1 disk:UpToDate open:no
  memverge4 role:Secondary
    volume:1 peer-disk:UpToDate

[root@memverge3 drbd.d]#
[root@memverge3 drbd.d]# drbdadm disconnect test
[root@memverge3 drbd.d]# drbdadm status test
test role:Primary
  volume:1 disk:UpToDate open:no
  memverge4 connection:StandAlone

[root@memverge3 drbd.d]#

Next, on the secondary (memverge4), I run fio twice: first on the lvol device, then on the drbd device.

  volume 1 {
    device      /dev/drbd1;
    disk        /dev/vg_r0/lvol0;
    meta-disk   internal;

[root@memverge4 anton]# fio --name=test --rw=write --bs=128k --filename=/dev/vg_r0/lvol0 --direct=1 --numjobs=1 --iodepth=8 --exitall --group_reporting --ioengine=libaio --runtime=30 --time_based=1
test: (g=0): rw=write, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=libaio, iodepth=8
fio-3.41-55-g3a4c1
Starting 1 process
Jobs: 1 (f=1): [W(1)][100.0%][w=25.5GiB/s][w=209k IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=16617: Wed Dec 17 13:50:00 2025
  write: IOPS=208k, BW=25.4GiB/s (27.3GB/s)(762GiB/30001msec)
    slat (nsec): min=2383, max=482284, avg=3224.62, stdev=969.12
    clat (usec): min=10, max=1484, avg=35.08, stdev=27.98
     lat (usec): min=17, max=1487, avg=38.30, stdev=27.99



[root@memverge4 anton]# fio --name=test --rw=write --bs=128k --filename=/dev/drbd1 --direct=1 --numjobs=1 --iodepth=8 --exitall --group_reporting --ioengine=libaio --runtime=30 --time_based=1
test: (g=0): rw=write, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=libaio, iodepth=8
fio-3.41-55-g3a4c1
Starting 1 process
Jobs: 1 (f=1): [W(1)][100.0%][w=19.5GiB/s][w=160k IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=16678: Wed Dec 17 13:50:43 2025
  write: IOPS=163k, BW=19.9GiB/s (21.3GB/s)(596GiB/30001msec)
    slat (nsec): min=1513, max=15399k, avg=5698.29, stdev=7793.88
    clat (usec): min=10, max=15520, avg=43.33, stdev=27.67
     lat (usec): min=19, max=15535, avg=49.03, stdev=28.84

In both cases well above 4 GB/s.

Anton

Regarding the ‘invalidate-remote’ command, I wouldn’t suggest relying on data from that testing method; invalidating the peer triggers a resync operation, which is treated differently than synchronous replication.

For the isolation testing, you run the same test on the same device twice: once with DRBD connected to the peer, and once without. So you would use fio to write to your device on the node that is Primary (with the secondary connected and UpToDate), then disconnect DRBD and run fio again on the very same device you wrote to before. That will give you statistics you can compare directly from the fio outputs.

This knowledgebase article provides detailed information on how to do this:

Regarding the ‘invalidate-remote’ command, I wouldn’t suggest relying on data from that testing method; invalidating the peer triggers a resync operation, which is treated differently than synchronous replication.

Thank you for clarifying. I think I need this too: how can I exceed 4 GB/s for the resync operation as well, if modern hardware (disks, network) and the configuration allow it?

For the isolation testing, you run the same test on the same device twice: once with DRBD connected to the peer, and once without. So you would use fio to write to your device on the node that is Primary (with the secondary connected and UpToDate), then disconnect DRBD and run fio again on the very same device you wrote to before. That will give you statistics you can compare directly from the fio outputs.

Ok, on primary (memverge3),

  volume 1 {
    device      /dev/drbd1;
    disk        /dev/vg_r0/lvol0;
    meta-disk   internal;

Here are the results:

[root@memverge3 anton]# drbdadm status test
test role:Primary
  volume:1 disk:UpToDate open:no
  memverge4 role:Secondary
    volume:1 peer-disk:UpToDate

[root@memverge3 anton]# fio --name=test --rw=write --bs=128k --filename=/dev/drbd1 --direct=1 --numjobs=1 --iodepth=8 --exitall --group_reporting --ioengine=libaio --runtime=30 --time_based=1
test: (g=0): rw=write, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=libaio, iodepth=8
fio-3.41-55-g3a4c1
Starting 1 process
Jobs: 1 (f=1): [W(1)][100.0%][w=3765MiB/s][w=30.1k IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=13997: Thu Dec 18 14:07:01 2025
  write: IOPS=30.6k, BW=3826MiB/s (4012MB/s)(112GiB/30001msec)
    slat (usec): min=5, max=115, avg= 8.51, stdev= 1.14
    clat (usec): min=79, max=922, avg=252.70, stdev=26.74
     lat (usec): min=87, max=931, avg=261.21, stdev=26.72



[root@memverge3 anton]# drbdadm disconnect test
[root@memverge3 anton]# drbdadm status test
test role:Primary
  volume:1 disk:UpToDate open:no
  memverge4 connection:StandAlone

[root@memverge3 anton]# fio --name=test --rw=write --bs=128k --filename=/dev/drbd1 --direct=1 --numjobs=1 --iodepth=8 --exitall --group_reporting --ioengine=libaio --runtime=30 --time_based=1
test: (g=0): rw=write, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=libaio, iodepth=8
fio-3.41-55-g3a4c1
Starting 1 process
Jobs: 1 (f=1): [W(1)][100.0%][w=21.1GiB/s][w=173k IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=14062: Thu Dec 18 14:07:45 2025
  write: IOPS=166k, BW=20.3GiB/s (21.8GB/s)(610GiB/30001msec)
    slat (nsec): min=1502, max=936645, avg=5495.71, stdev=3017.01
    clat (usec): min=12, max=2520, avg=42.40, stdev=35.34
     lat (usec): min=19, max=2523, avg=47.90, stdev=35.50

Using iperf3, I checked the TCP bandwidth of the single 200 Gb/s link between the two directly connected servers. Even with one TCP stream I got ~100 Gb/s; with two TCP streams, ~150 Gb/s; with three TCP streams, ~200 Gb/s:

[root@memverge4 ~]# iperf3 -c 1.1.1.3 -P1
Connecting to host 1.1.1.3, port 5201
[  5] local 1.1.1.4 port 40464 connected to 1.1.1.3 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  11.1 GBytes  94.9 Gbits/sec    0   4.07 MBytes
[  5]   1.00-2.00   sec  10.5 GBytes  89.9 Gbits/sec    0   4.07 MBytes
[  5]   2.00-3.00   sec  11.2 GBytes  95.9 Gbits/sec    0   4.07 MBytes
[  5]   3.00-4.00   sec  11.3 GBytes  97.1 Gbits/sec    0   4.07 MBytes
[  5]   4.00-5.00   sec  11.1 GBytes  95.0 Gbits/sec    0   4.07 MBytes
[  5]   5.00-6.00   sec  11.4 GBytes  98.3 Gbits/sec    0   4.07 MBytes
[  5]   6.00-7.00   sec  11.1 GBytes  95.6 Gbits/sec    0   4.07 MBytes
[  5]   7.00-8.00   sec  11.2 GBytes  96.4 Gbits/sec    0   4.07 MBytes
[  5]   8.00-9.00   sec  11.2 GBytes  96.6 Gbits/sec    0   4.07 MBytes
[  5]   9.00-10.00  sec  11.2 GBytes  96.3 Gbits/sec    0   4.07 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   111 GBytes  95.6 Gbits/sec    0             sender
[  5]   0.00-10.00  sec   111 GBytes  95.6 Gbits/sec                  receiver

iperf Done.
[root@memverge4 ~]# iperf3 -c 1.1.1.3 -P2
Connecting to host 1.1.1.3, port 5201
[  5] local 1.1.1.4 port 42410 connected to 1.1.1.3 port 5201
[  7] local 1.1.1.4 port 42420 connected to 1.1.1.3 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  8.30 GBytes  71.3 Gbits/sec    0   4.05 MBytes
[  7]   0.00-1.00   sec  9.68 GBytes  83.1 Gbits/sec    0   4.09 MBytes
[SUM]   0.00-1.00   sec  18.0 GBytes   154 Gbits/sec    0
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   1.00-2.00   sec  7.68 GBytes  65.9 Gbits/sec    0   4.05 MBytes
[  7]   1.00-2.00   sec  9.88 GBytes  84.9 Gbits/sec    0   4.09 MBytes
[SUM]   1.00-2.00   sec  17.6 GBytes   151 Gbits/sec    0
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   2.00-3.00   sec  7.65 GBytes  65.7 Gbits/sec    0   4.05 MBytes
[  7]   2.00-3.00   sec  9.79 GBytes  84.1 Gbits/sec    0   4.09 MBytes
[SUM]   2.00-3.00   sec  17.4 GBytes   150 Gbits/sec    0
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   3.00-4.00   sec  7.93 GBytes  68.2 Gbits/sec    0   4.05 MBytes
[  7]   3.00-4.00   sec  10.1 GBytes  86.5 Gbits/sec    0   4.09 MBytes
[SUM]   3.00-4.00   sec  18.0 GBytes   155 Gbits/sec    0
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   4.00-5.00   sec  7.57 GBytes  65.0 Gbits/sec    0   4.05 MBytes
[  7]   4.00-5.00   sec  9.70 GBytes  83.3 Gbits/sec    0   4.09 MBytes
[SUM]   4.00-5.00   sec  17.3 GBytes   148 Gbits/sec    0
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   5.00-6.00   sec  8.83 GBytes  75.8 Gbits/sec    0   4.05 MBytes
[  7]   5.00-6.00   sec  8.77 GBytes  75.3 Gbits/sec    0   4.09 MBytes
[SUM]   5.00-6.00   sec  17.6 GBytes   151 Gbits/sec    0
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   6.00-7.00   sec  8.94 GBytes  76.8 Gbits/sec    0   4.05 MBytes
[  7]   6.00-7.00   sec  8.92 GBytes  76.6 Gbits/sec    0   4.09 MBytes
[SUM]   6.00-7.00   sec  17.9 GBytes   153 Gbits/sec    0
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   7.00-8.00   sec  8.81 GBytes  75.7 Gbits/sec    0   4.05 MBytes
[  7]   7.00-8.00   sec  8.57 GBytes  73.6 Gbits/sec    0   4.09 MBytes
[SUM]   7.00-8.00   sec  17.4 GBytes   149 Gbits/sec    0
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   8.00-9.00   sec  7.85 GBytes  67.4 Gbits/sec    0   4.05 MBytes
[  7]   8.00-9.00   sec  9.77 GBytes  84.0 Gbits/sec    0   4.09 MBytes
[SUM]   8.00-9.00   sec  17.6 GBytes   151 Gbits/sec    0
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   9.00-10.00  sec  8.20 GBytes  70.5 Gbits/sec    0   4.05 MBytes
[  7]   9.00-10.00  sec  9.87 GBytes  84.7 Gbits/sec    0   4.09 MBytes
[SUM]   9.00-10.00  sec  18.1 GBytes   155 Gbits/sec    0
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  81.8 GBytes  70.2 Gbits/sec    0             sender
[  5]   0.00-10.00  sec  81.8 GBytes  70.2 Gbits/sec                  receiver
[  7]   0.00-10.00  sec  95.0 GBytes  81.6 Gbits/sec    0             sender
[  7]   0.00-10.00  sec  95.0 GBytes  81.6 Gbits/sec                  receiver
[SUM]   0.00-10.00  sec   177 GBytes   152 Gbits/sec    0             sender
[SUM]   0.00-10.00  sec   177 GBytes   152 Gbits/sec                  receiver

iperf Done.
[root@memverge4 ~]# iperf3 -c 1.1.1.3 -P3
Connecting to host 1.1.1.3, port 5201
[  5] local 1.1.1.4 port 46890 connected to 1.1.1.3 port 5201
[  7] local 1.1.1.4 port 46898 connected to 1.1.1.3 port 5201
[  9] local 1.1.1.4 port 46912 connected to 1.1.1.3 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  7.72 GBytes  66.2 Gbits/sec    0   4.13 MBytes
[  7]   0.00-1.00   sec  6.85 GBytes  58.8 Gbits/sec    0   4.04 MBytes
[  9]   0.00-1.00   sec  7.44 GBytes  63.9 Gbits/sec    0   4.12 MBytes
[SUM]   0.00-1.00   sec  22.0 GBytes   189 Gbits/sec    0
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   1.00-2.00   sec  7.72 GBytes  66.3 Gbits/sec    0   4.13 MBytes
[  7]   1.00-2.00   sec  6.65 GBytes  57.2 Gbits/sec    0   4.04 MBytes
[  9]   1.00-2.00   sec  8.17 GBytes  70.2 Gbits/sec    0   4.12 MBytes
[SUM]   1.00-2.00   sec  22.5 GBytes   194 Gbits/sec    0
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   2.00-3.00   sec  7.85 GBytes  67.4 Gbits/sec    0   4.13 MBytes
[  7]   2.00-3.00   sec  7.25 GBytes  62.3 Gbits/sec    0   4.04 MBytes
[  9]   2.00-3.00   sec  7.79 GBytes  66.9 Gbits/sec    0   4.12 MBytes
[SUM]   2.00-3.00   sec  22.9 GBytes   197 Gbits/sec    0
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   3.00-4.00   sec  7.62 GBytes  65.5 Gbits/sec    0   4.13 MBytes
[  7]   3.00-4.00   sec  7.78 GBytes  66.8 Gbits/sec    0   4.04 MBytes
[  9]   3.00-4.00   sec  7.32 GBytes  62.9 Gbits/sec    0   4.12 MBytes
[SUM]   3.00-4.00   sec  22.7 GBytes   195 Gbits/sec    0
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   4.00-5.00   sec  7.85 GBytes  67.5 Gbits/sec    0   4.13 MBytes
[  7]   4.00-5.00   sec  7.72 GBytes  66.3 Gbits/sec    0   4.04 MBytes
[  9]   4.00-5.00   sec  6.99 GBytes  60.0 Gbits/sec    0   4.12 MBytes
[SUM]   4.00-5.00   sec  22.6 GBytes   194 Gbits/sec    0
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   5.00-6.00   sec  7.39 GBytes  63.5 Gbits/sec    0   4.13 MBytes
[  7]   5.00-6.00   sec  7.71 GBytes  66.3 Gbits/sec    0   4.04 MBytes
[  9]   5.00-6.00   sec  7.05 GBytes  60.6 Gbits/sec    0   4.12 MBytes
[SUM]   5.00-6.00   sec  22.2 GBytes   190 Gbits/sec    0
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   6.00-7.00   sec  7.75 GBytes  66.6 Gbits/sec    0   4.13 MBytes
[  7]   6.00-7.00   sec  7.94 GBytes  68.2 Gbits/sec    0   4.04 MBytes
[  9]   6.00-7.00   sec  7.14 GBytes  61.3 Gbits/sec    0   4.12 MBytes
[SUM]   6.00-7.00   sec  22.8 GBytes   196 Gbits/sec    0
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   7.00-8.00   sec  7.71 GBytes  66.2 Gbits/sec    0   4.13 MBytes
[  7]   7.00-8.00   sec  7.92 GBytes  68.1 Gbits/sec    0   4.04 MBytes
[  9]   7.00-8.00   sec  7.35 GBytes  63.1 Gbits/sec    0   4.12 MBytes
[SUM]   7.00-8.00   sec  23.0 GBytes   197 Gbits/sec    0
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   8.00-9.00   sec  7.70 GBytes  66.2 Gbits/sec    0   4.13 MBytes
[  7]   8.00-9.00   sec  7.68 GBytes  66.0 Gbits/sec    0   4.04 MBytes
[  9]   8.00-9.00   sec  7.66 GBytes  65.8 Gbits/sec    0   4.12 MBytes
[SUM]   8.00-9.00   sec  23.0 GBytes   198 Gbits/sec    0
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   9.00-10.00  sec  7.71 GBytes  66.2 Gbits/sec    0   4.13 MBytes
[  7]   9.00-10.00  sec  7.66 GBytes  65.8 Gbits/sec    0   4.04 MBytes
[  9]   9.00-10.00  sec  7.68 GBytes  66.0 Gbits/sec    0   4.12 MBytes
[SUM]   9.00-10.00  sec  23.0 GBytes   198 Gbits/sec    0
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  77.0 GBytes  66.2 Gbits/sec    0             sender
[  5]   0.00-10.00  sec  77.0 GBytes  66.2 Gbits/sec                  receiver
[  7]   0.00-10.00  sec  75.2 GBytes  64.6 Gbits/sec    0             sender
[  7]   0.00-10.00  sec  75.2 GBytes  64.6 Gbits/sec                  receiver
[  9]   0.00-10.00  sec  74.6 GBytes  64.1 Gbits/sec    0             sender
[  9]   0.00-10.00  sec  74.6 GBytes  64.1 Gbits/sec                  receiver
[SUM]   0.00-10.00  sec   227 GBytes   195 Gbits/sec    0             sender
[SUM]   0.00-10.00  sec   227 GBytes   195 Gbits/sec                  receiver

iperf Done.
[root@memverge4 ~]#

I get the same iperf3 results if I swap the sender and receiver.

Anton

I also tried DRBD 9.3 (--bitmap-block-size=64k), but it is still the same: I can’t exceed ~4 GB/s for the initial sync (drbdadm invalidate-remote) or for regular replication (fio). I also tried different transport modes; the result is the same for TCP and RDMA. I don’t know what else to try… Could it be that ~4 GB/s is hardcoded somewhere in the DRBD code?

I even tried “Protocol A” (asynchronous replication); it gave only a +500 MB/s improvement, i.e. 4.5 GB/s total using TCP, and lower with RDMA.

[root@memverge3 ~]# cat /etc/drbd.d/test.res | grep -i rate
        resync-rate 4096M;
        c-max-rate 4096M;
        c-min-rate 2G;
[root@memverge3 ~]# drbdadm adjust test
[root@memverge3 ~]#
[root@memverge3 ~]# vi /etc/drbd.d/test.res
[root@memverge3 ~]# cat /etc/drbd.d/test.res | grep -i rate
        resync-rate 4097M;
        c-max-rate 4097M;
        c-min-rate 2G;
[root@memverge3 ~]# drbdadm adjust test
drbd.d/test.res:6: Parse error: while parsing value ('4097M')
for c-max-rate. Value is too big.

I would suggest repeating the isolation testing from the other peer for thoroughness: demote the resource on memverge3, disconnect, promote the resource on memverge4, and then perform the fio test again. But that is for the sake of completeness; at this point I suspect you may be running into CPU limitations.

DRBD uses one CPU core per resource by default, so setting a mask allows an individual resource to use multiple cores. You can learn more about how to set this here:

CPU pinning based on NUMA topology may also help improve performance in this situation.
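Not a replacement for the linked guide, but a minimal sketch of the setting in a resource file; the mask value below is an example only, to be matched to your core count and NUMA layout:

```
resource test {
  options {
    # hexadecimal CPU mask: "3f" = binary 111111 = cores 0-5
    # (example value; prefer cores on the NUMA node closest to the NIC)
    cpu-mask "3f";
  }
}
```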

Hello

The two DRBD bare-metal servers use the latest AMD EPYC 9455P 48-core processor, without hyper-threading, with power saving disabled, and are otherwise completely idle. I suppose these are the fastest CPU cores available today. Other single-threaded I/O workloads on these two servers, such as NFS reads/writes, significantly exceed ~4 GByte/s.

Could it be that 4 GByte/s is an internal DRBD limitation, when hardware is not the limiting factor?

Anton

For synchronous replication throughput, there is no such hardcoded limitation.

A number of options in the DRBD configuration can impact resynchronization behavior, which can also be limited by the activity log in the case of very large volumes with a low-granularity bitmap block size. But resynchronization is a distinct process that occurs when peers reconnect after being disconnected. Replication happens while the peers are connected, which should be the majority of typical operation, and is not impacted by those same settings.

The DRBD user guide has a section on what impacts DRBD throughput, it may be helpful to review here:

The processor you mention has 48 cores.
A single DRBD resource will, without a cpu-mask parameter configured, use only a single one of those cores. So you could well be fully saturating one core in your tests while all the others sit idle; DRBD will not attempt to use any others, unlike processes that consume as many cores as they can saturate. This can appear as an artificial limit, since your CPU is working at only 1/48th of its potential.

Configuring the cpu-mask setting is critical to letting DRBD use this capacity when you run a single DRBD resource. My prior message links to the user guide section on how to configure it.

You can confirm your cpu-mask value was applied correctly by using htop to monitor the system during your fio tests.
In htop’s setup menu, uncheck ‘Hide kernel threads’ under Display options, and enable the processor column by going to the Screens menu and adding PROCESSOR to the Active columns.

After configuring htop, I suggest you repeat your previous fio commands with the peers connected, and confirm that the CPU-intensive DRBD kernel threads (drbd_w_, drbd_s_, drbd_r_, drbd_a_) are utilizing the cores you intended via your cpu-mask setting.
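As a lighter-weight alternative to htop, a one-liner sketch (assuming a procps-style ps) that lists each DRBD kernel thread together with the core it last ran on:

```shell
# TID, last-run processor (PSR), and thread name for the DRBD
# worker/sender/receiver/ack threads; the PSR values should fall
# within the cores selected by your cpu-mask.
ps -eLo tid,psr,comm | grep -E 'drbd_[wsra]_'
```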

For synchronous replication throughput, there is no such hardcoded limitation.

Ok, but the parser says “Value is too big” for 4097M

[root@memverge3 ~]# cat /etc/drbd.d/test.res | grep -i rate
        resync-rate 4096M;
        c-max-rate 4096M;
        c-min-rate 2G;
[root@memverge3 ~]# drbdadm adjust test
[root@memverge3 ~]#
[root@memverge3 ~]# vi /etc/drbd.d/test.res
[root@memverge3 ~]# cat /etc/drbd.d/test.res | grep -i rate
        resync-rate 4097M;
        c-max-rate 4097M;
        c-min-rate 2G;
[root@memverge3 ~]# drbdadm adjust test
drbd.d/test.res:6: Parse error: while parsing value ('4097M')
for c-max-rate. Value is too big.
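For what it’s worth, the accepted maximum lines up with a 4 GiB/s cap on these resync options (an inference from the parser behavior above, not a documented constant), with the value expressed in KiB/s:

```shell
# 4096M expressed in KiB/s; 4097M would exceed this, matching the parse error.
echo "$(( 4096 * 1024 ))K"   # 4194304K, i.e. 4 GiB/s
```

As noted earlier in the thread, these options govern resync only; connected-mode replication is not throttled by them.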

A number of options in the DRBD configuration can impact resynchronization behavior, which can also be limited by the activity log in the case of very large volumes with a low-granularity bitmap block size

The testing volume size is 256 GB:

  volume 1 {
    device      /dev/drbd1;
    disk        /dev/vg_r0/lvol0;
    meta-disk   internal;

And with --bitmap-block-size=64k I got almost the same, ~4 GByte/s.

I tried to set cpu-mask as in the documentation.

[root@memverge3 ~]# drbdsetup show test
resource "test" {
    options {
        cpu-mask                "f";
    }

[root@memverge4 ~]# drbdsetup show test
resource "test" {
    options {
        cpu-mask                "f";
    }

But it looks like it still uses a single CPU core on both hosts.

You are still referring to the resync options; those are different from replication. Please refer to the documentation here:

Your cpu-mask is likely not appropriate for the number of cores on your system; it needs to represent a binary value that corresponds to the cores of your processor.
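For instance, a mask covering the first N cores can be computed with shell arithmetic (a sketch; masks wider than 64 bits would need another approach):

```shell
# Mask with the low 6 bits set -> cores 0-5:
printf '%x\n' $(( (1 << 6) - 1 ))    # 3f
# Mask with the low 48 bits set -> all cores 0-47 of a 48-core CPU:
printf '%x\n' $(( (1 << 48) - 1 ))   # ffffffffffff
```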

Your cpu-mask is likely not appropriate for the number of cores on your system; it needs to represent a binary value that corresponds to the cores of your processor.

Ok, let’s focus on replication now and come back to (re)synchronization later.

On Primary

[root@memverge3 ~]# drbdadm status test
test role:Primary
  volume:1 disk:UpToDate open:no
  memverge4 role:Secondary
    volume:1 peer-disk:UpToDate

[root@memverge3 ~]#
[root@memverge3 ~]# drbdsetup show test
resource "test" {
    options {
        cpu-mask                "110000";
    }
    _this_host {
        node-id                 27;
        volume 1 {
            device                      minor 1;
            disk                        "/dev/vg_r0/lvol0";
            meta-disk                   internal;
            disk {
                al-extents              6433;
                al-updates              no;
            }
        }
    }
    connection {
        _peer_node_id 28;
        path {
            _this_host ipv4 1.1.1.3:7900;
            _remote_host ipv4 1.1.1.4:7900;
        }
        _cstate Connected;
        net {
            transport           "tcp";
            timeout             90; # 1/10 seconds
            max-epoch-size      16384;
            connect-int         15; # seconds
            ping-int            15; # seconds
            sndbuf-size         67108864; # bytes
            rcvbuf-size         67108864; # bytes
            ping-timeout        10; # 1/10 seconds
            max-buffers         131072;
            _name               "memverge4";
        }
        volume 1 {
            disk {
                resync-rate             8388608k; # bytes/second
            }
        }
    }
}

[root@memverge3 ~]#

On Secondary

[root@memverge4 ~]# drbdsetup show test
resource "test" {
    options {
        cpu-mask                "110000";
    }
    _this_host {
        node-id                 28;
        volume 1 {
            device                      minor 1;
            disk                        "/dev/vg_r0/lvol0";
            meta-disk                   internal;
            disk {
                al-extents              6433;
                al-updates              no;
            }
        }
    }
    connection {
        _peer_node_id 27;
        path {
            _this_host ipv4 1.1.1.4:7900;
            _remote_host ipv4 1.1.1.3:7900;
        }
        _cstate Connected;
        net {
            transport           "tcp";
            timeout             90; # 1/10 seconds
            max-epoch-size      16384;
            connect-int         15; # seconds
            ping-int            15; # seconds
            sndbuf-size         67108864; # bytes
            rcvbuf-size         67108864; # bytes
            ping-timeout        10; # 1/10 seconds
            max-buffers         131072;
            _name               "memverge3";
        }
        volume 1 {
            disk {
                resync-rate             8388608k; # bytes/second
            }
        }
    }
}

[root@memverge4 ~]#

On Primary, fio

[root@memverge3 ~]# fio --name=test --rw=write --bs=1024k --filename=/dev/drbd1 --direct=1 --numjobs=1 --iodepth=16 --exitall --group_reporting --ioengine=libaio --runtime=30 --time_based=1
test: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=16
fio-3.41-72-g6ff32
Starting 1 process
Jobs: 1 (f=1): [W(1)][100.0%][w=4436MiB/s][w=4436 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=12042: Fri Jan 16 12:57:25 2026
  write: IOPS=4435, BW=4435MiB/s (4651MB/s)(130GiB/30004msec)
    slat (usec): min=6, max=153, avg=14.97, stdev= 8.84
    clat (usec): min=1146, max=6406, avg=3592.31, stdev=42.69
     lat (usec): min=1164, max=6445, avg=3607.28, stdev=41.76
    clat percentiles (usec):
     |  1.00th=[ 3523],  5.00th=[ 3556], 10.00th=[ 3556], 20.00th=[ 3589],
     | 30.00th=[ 3589], 40.00th=[ 3589], 50.00th=[ 3589], 60.00th=[ 3589],
     | 70.00th=[ 3589], 80.00th=[ 3621], 90.00th=[ 3621], 95.00th=[ 3621],
     | 99.00th=[ 3687], 99.50th=[ 3720], 99.90th=[ 3785], 99.95th=[ 4113],
     | 99.99th=[ 4752]
   bw (  MiB/s): min= 4370, max= 4480, per=100.00%, avg=4435.67, stdev=15.03, samples=60
   iops        : min= 4370, max= 4480, avg=4435.67, stdev=15.03, samples=60
  lat (msec)   : 2=0.01%, 4=99.94%, 10=0.05%
  cpu          : usr=1.39%, sys=5.88%, ctx=133059, majf=0, minf=16
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,133070,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
  WRITE: bw=4435MiB/s (4651MB/s), 4435MiB/s-4435MiB/s (4651MB/s-4651MB/s), io=130GiB (140GB), run=30004-30004msec

Disk stats (read/write):
    drbd1: ios=0/265155, sectors=0/271518720, merge=0/0, ticks=0/936425, in_queue=936425, util=99.69%, aggrios=0/266140, aggsectors=0/272527360, aggrmerge=0/0, aggrticks=0/16993, aggrin_queue=16993, aggrutil=29.87%
    dm-3: ios=0/266140, sectors=0/272527360, merge=0/0, ticks=0/16993, in_queue=16993, util=29.87%, aggrios=0/266140, aggsectors=0/272527360, aggrmerge=0/0, aggrticks=0/16896, aggrin_queue=16896, aggrutil=29.68%
    md127: ios=0/266140, sectors=0/272527360, merge=0/0, ticks=0/16896, in_queue=16896, util=29.68%, aggrios=0/66535, aggsectors=0/68131840, aggrmerge=0/0, aggrticks=0/4132, aggrin_queue=4132, aggrutil=13.76%
  nvme0n1: ios=0/66535, sectors=0/68131840, merge=0/0, ticks=0/4094, in_queue=4094, util=13.58%
  nvme3n1: ios=0/66535, sectors=0/68131840, merge=0/0, ticks=0/4150, in_queue=4150, util=13.76%
  nvme1n1: ios=0/66535, sectors=0/68131840, merge=0/0, ticks=0/4150, in_queue=4150, util=13.76%
  nvme4n1: ios=0/66535, sectors=0/68131840, merge=0/0, ticks=0/4135, in_queue=4135, util=13.72%
[root@memverge3 ~]#