I'm taking care of a few servers that make relatively heavy use of an NFS filer.

At crunch time a single server sends about 2k reads + 2k writes per second to the filer. According to nfsiostat, at these times the read and write RTTs both increase from the usual 5-10ms to 50-100ms. The average execution time according to nfsiostat, though, is drastically different for reads and writes: reads ca. 200ms, writes between 100-200sec.

Now, from the little I understand, average exec time in nfsiostat is essentially RTT + kernel time (RPC queuing etc.). Speaking of RPC - I see the exact same behaviour on RHEL 5.8 (RPC backlog ca. 5k at crunch time) and RHEL 6.3 (RPC backlog 0 at all times).
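
For reference, a minimal sketch of how these numbers can be pulled (the interval and mount point below are just examples):

# Print per-mount read/write statistics every 5 seconds; the output
# includes avg RTT (ms) and avg exe (ms) columns for reads and writes,
# plus an RPC backlog figure on the per-mount summary line.
nfsiostat 5 /mnt/data

# Client-side RPC counters (calls, retransmissions, auth refreshes)
nfsstat -rc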

So, the question: what could be the reason for the NFS read and write execution times being so vastly different?

(The NFS calls are made by a proprietary app into which I don't have much visibility. The same applies to the filer (NetApp - outsourced). Kernel tracing is probably not going to be an option either.)

Thank you very much in advance.

EDIT: To clarify, and because I'm getting some answers dealing with NFS performance in general - the only part that really interests me is this: according to the nfsiostat doco, RTT is "... the duration from the time that client's kernel sends the RPC request until the time it receives the reply." and avg exe is "... the duration from the time that NFS client does the RPC request to its kernel until the RPC request is completed, this includes the RTT time above." (http://man7.org/linux/man-pages/man8/nfsiostat.8.html)

My (maybe naive) understanding is that after the RTT the server is done and out of the equation. So what does the NFS client do that makes avg exe so different for reads and writes when the RTT is the same?
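
One way to see where the extra time accrues on the client is /proc/self/mountstats, which is the file nfsiostat itself reads. Below is a minimal sketch; it assumes the eight-counter per-op layout used by RHEL 5/6 era kernels (ops, transmissions, major timeouts, bytes sent, bytes received, queue time, RTT, execute time, the last three in cumulative milliseconds):

# Print average queue/RTT/execute time for READ and WRITE ops,
# one line per NFS mount found in mountstats.
awk '$1 == "READ:" || $1 == "WRITE:" {
    ops = $2; queue = $7; rtt = $8; exe = $9
    if (ops > 0)
        printf "%-7s avg queue %.1f ms  avg RTT %.1f ms  avg exe %.1f ms\n",
            $1, queue / ops, rtt / ops, exe / ops
}' /proc/self/mountstats

If avg exe for writes dwarfs avg RTT while reads track each other, the difference should show up as queue time, i.e. the write RPCs are sitting on the client before being transmitted.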

    • I would check network latency first. Second, you can take a packet trace and check the time difference between the TCP ACK (the time the server received the TCP segment) and the RPC reply. That difference indicates how long the server spent processing the request; after that you know where the problem is, server or network. (A capture sketch follows these comments.)
    • @tigran - I am aware of network latency and server processing time - see me mentioning RTT. What I wonder is: considering BOTH reads and writes report 50-100ms RTT (which includes server processing time), why do reads take 200ms to execute fully, but writes 200sec? Where does this x1000 factor come from?
    • Server-side problems are usually related to the IO subsystem - for example, less than 5% free space, a disk with IO errors, or a broken RAID controller battery.
    • Are any of the servers mentioned VMs, and if so, do you know the hypervisor backend and/or other related configurations of the VM?
    • @matt - the clients are physical; the server is physical at the core, but presents itself to the client as virtual (vFiler - essentially it's a really beefy corporate NFS filer).
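
Regarding the packet-trace suggestion above, a minimal capture sketch (the interface name and capture file are examples; the comparison itself would be done in Wireshark):

# Capture full NFS traffic (NFSv3 over TCP uses port 2049) during a
# crunch window; -s 0 keeps entire packets so RPC payloads stay intact.
tcpdump -i eth0 -s 0 -w nfs-crunch.pcap port 2049

# In Wireshark, compare the timestamp of the TCP ACK for a WRITE call
# with the timestamp of the matching RPC reply: that gap is server-side
# processing time; the rest is network or client-side queuing.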

Based on the scenario described, it is possible that there are multiple contributing factors. The sync and async options can have a significant impact on performance, as can the nolock option, when mounting the NFS export.
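
For illustration, a hypothetical /etc/fstab entry showing where those options live (server name, export path and mount point are placeholders):

# async (the client-side default) lets the client buffer writes and
# flush them in batches; sync forces every write through to the server
# before the syscall returns. nolock disables NLM locking.
filer01:/vol/data  /mnt/data  nfs  rw,hard,intr,sync,nolock  0  0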

Performance could also potentially be impacted by file-ownership and permission issues related to ACLs on the files, and/or by use of the root_squash and no_root_squash options on the export. If this were the case, though, there would likely be evidence of it in the logs somewhere.
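
A quick way to check some of this from the client side (the log path is the RHEL default):

# Show the effective mount options for every NFS mount on this client
nfsstat -m

# Scan the system log for NFS/RPC complaints around crunch time
grep -iE 'nfs|rpc' /var/log/messages | tail -50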

There could also be issues moving data in and out of memory. Depending on the sequence of operations and the amount of physical and virtual memory, a lot of thrashing could be occurring as well.
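
Memory pressure is easy to spot with the standard tools, e.g.:

# Watch swap-in/swap-out (the si/so columns) every 5 seconds;
# sustained non-zero values during crunch time would indicate thrashing.
vmstat 5

# Snapshot of memory and swap usage in megabytes
free -m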

If the traffic is ongoing and file operations are not cleaned up properly, hang, or take excessive time to complete (write ops), the number of files currently open could start to push the limit supported by the kernel. This can be checked by running:

cat /proc/sys/fs/file-max

If that value is low on the NFS server or client, it might be worth increasing it to see if performance improves. Depending on the network environment, implementing certain QoS policies could potentially have an impact on performance as well (though this is probably not very likely).
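
To see how close the system actually is to that limit, and to raise it on the fly (the value below is only an example):

# allocated handles, unused-but-allocated handles, system-wide maximum
cat /proc/sys/fs/file-nr

# Raise the limit for the running kernel; persist it in /etc/sysctl.conf
sysctl -w fs.file-max=500000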
