I am configuring a server that runs 3 ZFS pools, 2 of which are rather purpose-specific, and I feel that the default recommendations are simply not optimized for them. Networking is provided by dual 10 Gbit adapters.

Pool 1 is big file storage. It contains raw video data that is rarely written or read, plus occasional backups. There is absolutely no point in caching anything from that pool: it is high-bandwidth data that is read in one sweep from beginning to end, so caching any of it would be a complete waste of memory. Latency is not much of an issue, and bandwidth is ample because the data is highly compressible. The pool is made of 8 HDDs in RAIDZ2, with a usable capacity of 24 TB.

Pool 2 is compressed video frame storage. Portions of this content are frequently read when compositing video projects. The frequently used portion is usually larger than the total amount of RAM in the server. There is a low latency requirement, but not ultra low; bandwidth is more important. The pool is made of 3 HDDs in RAIDZ1, with a usable capacity of 8 TB, plus a 1 TB NVMe SSD for L2ARC.

Pool 3 is general storage for several computer systems that boot and run software from it rather than from local storage. Since it serves several machines as their primary system storage, the latency and bandwidth requirements here are the highest. This pool is mostly read from; writes are limited to what the client systems do. The pool is made of 3 SATA SSDs in RAIDZ1, with 1 TB of usable capacity.

My intent at optimization has to do with minimizing the ARC size for the first two pools in order to maximize the ARC size for the third one.

Pool 1 has no benefit from caching whatsoever, so what's the minimum safe amount of ARC I can set for it?

Pool 2 can benefit from ARC but it is not really worth it, as L2ARC is fast enough for the purpose and the drive has 1 TB of capacity. Ideally, I would be happy if I could get away without using any ARC for this volume, and using the full terabyte of L2ARC, but it seems that at least some ARC is needed for L2ARC header data.

So, considering an L2ARC capacity of 1 TB and a pool record size of 64 KiB, 1 TB / 64 KiB * 70 B gives me ~0.995 GiB. Does this mean I can safely cap the ARC for that pool at 1 GB? Or maybe it needs more?
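To make the units in that estimate explicit, here is the same arithmetic as a small shell sketch. The 70-byte header size is the figure assumed above (the real per-record header size depends on the ZFS version), and the sketch assumes every cached record is a full 64 KiB; smaller records would need proportionally more headers.

```shell
# Rough ARC memory needed to index an L2ARC device.
# Args: L2ARC size (bytes), recordsize (bytes), header size (bytes, assumed).
l2arc_header_bytes() {
    echo $(( $1 / $2 * $3 ))
}

l2arc_header_bytes 1000000000000 65536 70                     # 1 TB (decimal): 1068115230
l2arc_header_bytes $(( 1024 * 1024 * 1024 * 1024 )) 65536 70  # 1 TiB (binary): 1174405120
```

That is about 0.995 GiB for a decimal terabyte and about 1.09 GiB for a binary one, so under these assumptions a 1 GB cap would be consumed almost entirely by headers.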

It seems that the ARC contains both the read cache and the information needed to handle the L2ARC, so what I need is some option that prioritizes managing a larger L2ARC over caching actual data in RAM. And, if necessary, one that mandates that anything evicted from ARC is moved to L2ARC, in case the eviction policies do not abide by the usual cache hierarchy.

The general recommendations I've read suggest about 1 GB of RAM per 1 TB of storage; I am planning 32 GB of RAM for 33 TB of storage, so I am almost dead on. They also suggest a ratio of 4 or 5 to 1 for L2ARC vs ARC, which I fall short of by quite a lot. The goal is to cut pool 1 ARC as low as possible, and cut pool 2 ARC to only as much as it needs to utilize the whole 1 TB of L2ARC, in order to maximize the RAM available for pool 3's ARC.

First, I really suggest you reconsider your layout for pools 2 and 3: a 3-disk RAIDZ1 is not going to give you low latency, nor high bandwidth. Rather than an expensive 1 TB NVMe disk for L2ARC (which, by the way, is unbalanced due to the small 32 GB ARC), I would use more 7200 RPM disks in a RAID10 fashion, or even cheaper but reliable SSDs (e.g., Samsung 850 Pro/Evo or Crucial MX500).

At the very least, you can put all disks in a single RAID10 pool (with SSD L2ARC) and segment that pool by means of multiple datasets.

That said, you can specify how ARC/L2ARC should be used on a dataset-by-dataset basis by using the primarycache and secondarycache options:

  • zfs set primarycache=none <dataset1>; zfs set secondarycache=none <dataset1> will disable any ARC/L2ARC caching for the dataset. You can also issue zfs set logbias=throughput <dataset1> to favor throughput over latency during write operations;
  • zfs set primarycache=metadata <dataset2> will enable metadata-only caching for the second dataset. Please note that L2ARC is fed by the ARC; this means that if ARC is caching metadata only, the same will be true for L2ARC;
  • leave the ARC/L2ARC options at their defaults for the third dataset.
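Put together, the per-dataset settings might look like the sketch below. The dataset names tank/raw and tank/frames are hypothetical placeholders, and the helper only prints the zfs set commands so they can be reviewed before being run as root.

```shell
# Print (without running) the commands that set both cache properties on a dataset.
# Args: dataset name, primarycache value, secondarycache value (all | none | metadata).
cache_policy_cmds() {
    printf 'zfs set primarycache=%s %s\n' "$2" "$1"
    printf 'zfs set secondarycache=%s %s\n' "$3" "$1"
}

# Hypothetical dataset names; substitute your own.
cache_policy_cmds tank/raw none none          # streaming data: no ARC, no L2ARC
cache_policy_cmds tank/frames metadata all    # metadata-only ARC
```

Piping the output to sh applies the settings, and zfs get primarycache,secondarycache <dataset> shows the result. The third dataset needs no commands, since it keeps the defaults.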

Finally, you can set your ZFS instance to use more than the default 50% of your RAM for ARC (look for zfs_arc_max in the module man page).
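As a sketch of that last step, assuming ZFS on Linux (where zfs_arc_max is a module parameter given in bytes) and a hypothetical target of 24 GiB on the 32 GB machine, the helper below prints the line to place in /etc/modprobe.d/zfs.conf.

```shell
# Print the modprobe option line that caps ARC at the given number of GiB.
# zfs_arc_max is expressed in bytes.
arc_max_option() {
    echo "options zfs zfs_arc_max=$(( $1 * 1024 * 1024 * 1024 ))"
}

arc_max_option 24   # 24 GiB is a hypothetical choice: options zfs zfs_arc_max=25769803776
```

The same byte value can be written to /sys/module/zfs/parameters/zfs_arc_max to change the limit at runtime. Note that this limit applies to the ARC as a whole, not to an individual pool.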

    • The idea behind pool 2 is that the low latency and high bandwidth will come from the L2ARC, applied to the files that are in recent use. The HDDs are there for bulk storage of cold frames and some basic redundancy. RAM caching is overkill for that pool, which is why I want to make it a pseudo-tier between the NVMe SSD for hot files and the HDDs for cold files. I am not that keen on RAID 10, as it sacrifices quite a lot of capacity and redundancy.
    • In ZFS, L2ARC is not a magical "low latency/high bandwidth" bullet. As it is only fed from the ARC (and never from the disks), you can quite often end up with data that is evicted from ARC before it is pushed to L2ARC. Moreover, L2ARC is not persistent across reboots (albeit work is in progress on that), so a reboot will drop any data that was on L2ARC (leaving much slower access times to user data).
    • Reboots are not a problem; with my setup a reboot would only take place in the event of a hardware failure that cannot be addressed via hotswap. I don't know any implementation details, but it seems unreasonable that data might be evicted from ARC without being pushed to L2ARC. Why would something like that happen? Shouldn't it be moved from L1 to L2? That's how cache hierarchies usually work. Is that not the case with ZFS?
    • I mean, from what I know about caches, access to a sector or line or whatever caches it at L1, and it is usually FIFO, so as soon as cache capacity runs out, the first cached entity is pushed to the next level of cache, or purged if there isn't one. So everything that is read should end up in the L2 cache until it fills up and is displaced.
    • @dtech because L2ARC is not a cache in the stricter sense. It is fed by ARC, but it does not directly receive evicted ARC entries. This is by design, to keep the slower L2ARC off the critical path. You can read more here. As for reboots: how are you planning to patch kernel- or libc-level bugs/CVEs? You have to plan for reboots.
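
To put a rough number on how slowly L2ARC is fed, the sketch below assumes the commonly cited OpenZFS default of l2arc_write_max = 8 MiB written per one-second feed interval (an assumption; check the value on your system) and ignores the temporary warm-up boost.

```shell
# Seconds needed to fill an L2ARC device at a fixed feed rate.
# Args: device size (bytes), bytes written per second (assumed feed rate).
l2arc_fill_seconds() {
    echo $(( $1 / $2 ))
}

secs=$(l2arc_fill_seconds 1000000000000 $(( 8 * 1024 * 1024 )))
echo "$secs s, about $(( secs / 3600 )) h"   # 119209 s, about 33 h
```

On Linux the current values can be read from /sys/module/zfs/parameters/l2arc_write_max and /sys/module/zfs/parameters/l2arc_feed_secs.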
