
I've got a Solaris 11 ZFS-based NAS device with 12x1TB 7.2k rpm SATA drives in a mirrored configuration.

It provides 2 services from the same pool - an NFS server for a small VM farm and a CIFS server for a small team to host shared files. The cifs ZFS has dedup on, while the NFS ZFS filesystem has dedup off. Compression is off everywhere. I'm snapshotting each filesystem every day and keeping the last 14 snapshots.
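For reference, those settings can be confirmed directly on the NAS. A quick sketch, assuming the pool is called `tank` with filesystems `tank/nfs` and `tank/cifs` (the names are placeholders, not taken from the actual system):

```shell
# Dedup and compression settings per dataset (pool/dataset names are hypothetical)
zfs get -r dedup,compression tank

# Pool-wide capacity and dedup ratio -- dedup is accounted at the pool level
zpool list tank

# The daily snapshots and how much space each one holds
zfs list -t snapshot -o name,used,creation -r tank
```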

I've run into a performance issue in cases where I'm either moving, copying or deleting a large amount of data while directly SSH'd into the NAS. Basically, the process seems to block all other IO operations, even to the point of VMs stalling because they receive disk timeouts.

I've a couple of theories as to why this should be the case, but would appreciate some insight into what I might do next.


1) the hardware isn't good enough. I'm not so convinced of this - the system is an HP X1600 (single Xeon CPU) with 30GB RAM. Although the drives are only 7.2k SATA, they should push a max of 80 IOPS each, which should give me more than enough. Happy to be proven wrong though.

2) I've configured it wrong - more than likely. Is it worth turning dedup off everywhere? I'm working under the assumption that RAM = good for dedup, hence giving it a reasonable splodge of RAM.

3) Solaris being stupid about scheduling IO. Is it possible that a local rm command completely blocks IO to the nfsd? If so, how do I change this?

    • How's the CPU usage when you move the files? Does it spike? Monitor it with mpstat/vmstat/prstat. ZFS shouldn't care who's using its services (NFS, CIFS, or local processes), and you've plenty of RAM for the DDT.
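Following the commenter's suggestion, a minimal monitoring sketch (run each in its own terminal while reproducing the big rm/mv; `tank` is a placeholder pool name):

```shell
# Per-vdev throughput and queue activity, refreshed every 5 seconds
zpool iostat -v tank 5

# Per-CPU stats: watch whether sys time spikes during the delete
mpstat 5

# Top processes sorted by CPU; shows whether rm or nfsd is CPU- or IO-bound
prstat -s cpu 5
```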

Option #2 is most likely the reason. Dedup performs best when the dedup table (DDT) fits entirely in memory. If it doesn't, then it spills over onto disk, and DDT lookups that have to go to disk are very slow and that produces the blocking behavior.

I would think that 30G of RAM is plenty, but the size of the DDT is directly dependent on the amount of data being deduped and how well dedup works on your data. The dedup property is set at the dataset level, but lookups are done across the entire pool, so there is just one pool-wide DDT.

See this zfs-discuss thread on calculating the DDT size. Essentially it's one DDT entry per unique block on the pool, so if you have a large amount of data but a low dedup ratio, that means more unique blocks and thus a larger DDT. The system tries to keep the DDT in RAM, but some of it may be evicted if the memory is needed for applications. Having L2ARC cache can help prevent the DDT from going to the main pool disks, as it will be evicted from main memory (ARC) into L2ARC.
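As a rough worked example of that sizing rule (the ~320 bytes per DDT entry is a commonly quoted rule of thumb, and the 5 TiB of unique data and 128 KiB average block size are assumptions for illustration, not measurements from this pool):

```shell
# Back-of-envelope DDT sizing: one entry per unique block, ~320 bytes each
unique_bytes=$(( 5 * 1024 * 1024 * 1024 * 1024 ))   # assume 5 TiB unique data
block_size=$(( 128 * 1024 ))                        # assume 128 KiB avg blocks
entry_bytes=320                                     # rule-of-thumb entry size

unique_blocks=$(( unique_bytes / block_size ))
ddt_gib=$(( unique_blocks * entry_bytes / 1024 / 1024 / 1024 ))

echo "unique blocks: $unique_blocks"   # ~42 million
echo "DDT size:      ${ddt_gib} GiB"   # ~12 GiB -- fits in 30 GB RAM, but
                                       # competes with the rest of the ARC
```

The actual per-pool histogram of unique blocks can be read with `zdb -DD <pool>`.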

    • I'll turn dedup off for the filesystem in question. I know existing data will only lose its dedup as it gets rewritten, but hopefully it should make a difference. My dedup ratio is only 1.14, so it shouldn't cost much in space.
    • Unfortunately, once you have activated dedup on any fs in a pool, there will be a DDT and it will always be active. The only way to "start over" is to destroy the pool and recreate it (and never turn dedup on for any fs). We had to do that at work on a server that didn't have enough RAM and could not practically be upgraded with enough.
    • Would it not be the case that if I deactivated dedup, moved all data off the pool and then copied it back, the DDT size would effectively be zero, as all data on the pool would have been written post-turning-off-dedup?
    • It's possible, but the only way to be sure is to destroy the pool. If you can move all the data off and then back, why not destroy and recreate the pool in the process?
    • Apologies, I misspoke. It's not possible to move the data off the entire pool. I meant to say that if I move all the data off that particular ZFS fs that has had dedup enabled, might I see an improvement?
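For the move-off-and-back approach discussed above, a hedged sketch of rewriting one filesystem's data so that none of its blocks remain deduped (dataset names `tank/cifs` and `tank/cifs-new` are placeholders; you need enough free space for two copies, and should verify the copy before destroying anything):

```shell
# 1. Stop new writes from being deduped on this filesystem
zfs set dedup=off tank/cifs

# 2. Copy the data to a fresh dataset; the rewritten blocks inherit the
#    parent's dedup setting (off, since dedup was set per-filesystem here)
zfs snapshot tank/cifs@migrate
zfs send tank/cifs@migrate | zfs receive tank/cifs-new

# 3. Once verified, destroy the old dataset. This releases its DDT entries,
#    though the destroy itself will churn the DDT while it runs.
zfs destroy -r tank/cifs
zfs rename tank/cifs-new tank/cifs
```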

One thing to keep in mind with ZFS and snapshots is that nothing is free. When you delete large amounts of data that snapshots still reference, especially with a large number of snapshots, the snapshot metadata has to be updated as you conduct your deletes to reflect the changes to the filesystem. I am assuming you have 6 vdevs, basically 6 mirrors in the pool, which means these changes have to be performed against all disks, since data is spread quite evenly across the vdevs. With dedup on, the situation gets much more complicated. If the ratio is good or great, you have a large number of references, all of which are of course metadata, and they all need to be updated along with the snapshots and snapshot-related metadata; if the ratio is poor, do not use dedup at all. And if your filesystems use small block sizes, the situation gets even more complex, because the number of references is much greater for a 4K-blocksize dataset than for a 128K one.

Basically, there are a few things you can do, other than: 1) add high-performance SSDs as caching devices and tune your filesystems to use the caching devices for nothing but metadata, 2) reduce large delete/destroy operations, 3) reconsider your use of deduplication. But you cannot simply disable deduplication on a pool or a filesystem and expect the DDT to go away. If it was enabled on the whole pool, you have to re-create the pool; if it was set on an individual filesystem, destroying and re-creating that filesystem will address the issue.
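Option 1 above translates to something like the following (the device name `c0t5d0` is a placeholder; `secondarycache=metadata` restricts the L2ARC to metadata such as the DDT):

```shell
# Add an SSD as an L2ARC cache device (device name is hypothetical)
zpool add tank cache c0t5d0

# Cache only metadata (including DDT entries) on the L2ARC, not file data
zfs set secondarycache=metadata tank
```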

At Nexenta, we are very careful with deduplication when we recommend it to customers. There are a lot of cases where it is a brilliant solution, and customers could not live without it. In those cases we often have customers using 96GB of RAM or more, to keep more of the metadata, and the DDT, in RAM. As soon as DDT metadata is pushed to spinning media, everything literally comes to a screeching halt. Hope this helps.

    • Is an SSD array that can do 400k IOPS good enough for the DDT metadata? I'm wondering if I can realistically do dedup on 150TB with only 100GB of RAM. Throughput above 300MB/s isn't that important.
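Applying the same rule-of-thumb arithmetic to that comment's numbers (150 TiB of unique data, 128 KiB average blocks, ~320 bytes per DDT entry -- all assumptions, and real average block sizes are often smaller, which makes it worse):

```shell
# DDT estimate for ~150 TiB of unique data at 128 KiB blocks
unique_blocks=$(( 150 * 1024 * 1024 * 1024 * 1024 / (128 * 1024) ))
ddt_gib=$(( unique_blocks * 320 / 1024 / 1024 / 1024 ))
echo "DDT size: ${ddt_gib} GiB"   # ~375 GiB -- far more than 100 GB of RAM,
                                  # so a large, fast L2ARC would be essential
```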
