ZFS not using full disk I/O

I have set up ZFS on a fresh Ubuntu 20.04 install, but the speed seems awfully slow.

Each disk can deliver around 140 MB/s when they are all read in parallel:

$ parallel -j0 dd if={} of=/dev/null bs=1M count=1000 ::: /dev/disk/by-id/ata-WDC_WD80E*2
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 7.23989 s, 145 MB/s
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 7.34566 s, 143 MB/s
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 7.39782 s, 142 MB/s
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 7.43704 s, 141 MB/s
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 7.90308 s, 133 MB/s

$ iostat -dkx 1 
Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz  aqu-sz  %util
loop0            0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
loop1            0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
loop2            0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
loop3            0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
sda            214.00 136112.00    11.00   4.89    5.97   636.04    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.84  89.60
sdb            216.00 145920.00     0.00   0.00    5.71   675.56    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.77  92.40
sdc            216.00 147456.00     0.00   0.00    5.55   682.67    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.73  92.80
sdd            199.00 135168.00     0.00   0.00    5.77   679.24    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.73  88.40
sde            198.00 133120.00     0.00   0.00    5.82   672.32    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.75  88.00

So the raw parallel read performance is around 700 MB/s, and it is limited by CPU, not disk I/O. Read one at a time, each disk can deliver 170 MB/s.
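
For reference, the per-run figures reported by dd above roughly add up to that aggregate (just a back-of-the-envelope check):

$ echo '145 143 142 141 133' | awk '{for (i = 1; i <= NF; i++) s += $i; print s " MB/s"}'
704 MB/s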

If I build a RAID5 with the disks:

$ mdadm --create --verbose /dev/md0 --level=5 --raid-devices=5 /dev/disk/by-id/ata-WDC_WD80E*2
# Stop the initial parity computation (resync):
$ echo frozen > /sys/block/md0/md/sync_action
$ echo 0 > /proc/sys/dev/raid/speed_limit_max
$ dd if=/dev/md0 of=/dev/null bs=1M count=10k
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 40.3711 s, 266 MB/s

$ iostat -dkx 1
Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz  aqu-sz  %util
loop0            0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
loop1            0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
loop2            0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
loop3            0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
md0            264.00 270336.00     0.00   0.00    0.00  1024.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
sda              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
sdb            134.00  68608.00 16764.00  99.21    5.48   512.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.54  73.20
sdc            134.00  68608.00 16764.00  99.21    5.56   512.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.54  74.00
sdd            134.00  68608.00 16764.00  99.21    5.65   512.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.54  74.00
sde            134.00  68608.00 16764.00  99.21    5.84   512.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.54  74.80
sdf              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00

This is much worse than the 700 MB/s from the raw parallel run. top says the md0_raid5 process takes 80% of a core and dd takes 60%. I do not see where the bottleneck is: the disks are not 100% busy, and neither are the CPUs.
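
For completeness, these are the md knobs I have not touched, so I am assuming they are still at their defaults; I list them only as things I would inspect next, not as something I have tuned:

$ blockdev --getra /dev/md0                 # current read-ahead, in 512-byte sectors
$ cat /sys/block/md0/md/stripe_cache_size   # RAID5/6 stripe cache, default 256 pages
$ cat /sys/block/md0/queue/read_ahead_kb    # block-layer read-ahead in KiB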

On the same 5 disks I create a ZFS pool:

zpool create -f -o ashift=12 -O acltype=posixacl -O canmount=off -O dnodesize=auto -O normalization=formD -O relatime=on -O xattr=sa rpool raidz /dev/disk/by-id/ata-WDC_WD80E*2
zfs create -o mountpoint=/data rpool/DATA
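
Nothing beyond what the zpool create line above sets has been changed, so the throughput-relevant dataset properties should be at the OpenZFS defaults; they can be confirmed with:

$ zpool status rpool                                      # confirm the raidz1 layout
$ zfs get recordsize,compression,primarycache rpool/DATA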

Then I write some data:

$ seq 100000000000 | dd bs=1M count=10k iflag=fullblock > out
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 91.2438 s, 118 MB/s

$ iostat -dkx 1
Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz  aqu-sz  %util
loop0            0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
loop1            0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
loop2            0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
loop3            0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
sda              0.00      0.00     0.00   0.00    0.00     0.00   79.00  27460.00     8.00   9.20   15.99   347.59    0.00      0.00     0.00   0.00    0.00     0.00    1.12  36.00
sdb              0.00      0.00     0.00   0.00    0.00     0.00  113.00  25740.00     0.00   0.00    9.73   227.79    0.00      0.00     0.00   0.00    0.00     0.00    0.88  38.00
sdc              2.00      0.00     0.00   0.00    2.50     0.00  319.00  25228.00     2.00   0.62    1.68    79.08    0.00      0.00     0.00   0.00    0.00     0.00    0.28  31.20
sdd              0.00      0.00     0.00   0.00    0.00     0.00  111.00  25872.00     0.00   0.00   10.87   233.08    0.00      0.00     0.00   0.00    0.00     0.00    0.96  40.00
sde              0.00      0.00     0.00   0.00    0.00     0.00  109.00  26356.00     0.00   0.00   10.74   241.80    0.00      0.00     0.00   0.00    0.00     0.00    0.93  40.80
sdf              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00

During the write, z_wr_iss uses around 30% CPU.
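
A complementary view I could capture during the write (not included above) is ZFS's own per-vdev statistics:

$ zpool iostat -v rpool 1    # per-vdev bandwidth and IOPS, refreshed every second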

When I read back the data:

$ pv out >/dev/null
10.0GiB 0:00:44 [ 228MiB/s]

$ iostat -dkx 1
Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz  aqu-sz  %util
loop0            0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
loop1            0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
loop2            0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
loop3            0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
sda           1594.00  51120.00     5.00   0.31    0.25    32.07    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00 100.40
sdb           1136.00  36312.00     0.00   0.00    0.20    31.96    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00  96.40
sdc           1662.00  53184.00     0.00   0.00    0.20    32.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00 100.00
sdd           1504.00  48088.00     0.00   0.00    0.19    31.97    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00  99.60
sde           1135.00  36280.00     0.00   0.00    0.21    31.96    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00  99.20
sdf              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00

iostat -dkx 1 says the disks are 100% utilized. top says pv uses around 60% CPU and zthr_procedure around 30%. With 2 cores, this leaves a full core idle.

This surprises me: if the raw parallel read performance is 700 MB/s, I would expect ZFS to be able to utilize that, too, and to be CPU-constrained (probably delivering on the order of 300 MB/s).
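
As a back-of-the-envelope sanity check (assuming raidz1 can skip the parity on reads, so roughly 4/5 of the raw bandwidth is payload):

$ echo '5 * 140 * 4 / 5' | bc    # theoretical payload ceiling in MB/s
560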

My only guess is that ZFS uses a tiny stripe or block size and forces the drives to flush their caches often, but that would hardly make sense for reads.

Why does iostat -dkx 1 say the disks are 100% utilized at only ~40 MB/s per disk when using ZFS? Can I recreate the pool in a way that lets ZFS better utilize the disks' I/O?
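
For concreteness, the kind of change I have in mind (untested, and the 1M value is only a guess on my part) would be a larger record size, followed by rewriting the test file:

$ zfs set recordsize=1M rpool/DATA     # only affects data written after the change
$ cp out out.rewrite && mv out.rewrite out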

Comments:

• Have you tried what kind of performance numbers you get with, for example, iozone when you let it go through different read sizes / simultaneous processes, etc.?
• @JannePikkarainen I am not very interested in small-file performance or parallel performance, but I will be happy to run iozone. What options do you want me to use?
