How is disk speed measured? Is it Mbit or Mbyte per second read? What is average today and what is fast and what is very fast in the industry?

Let's say somebody says it takes a long time to make a copy of a file of 1500 GB (say a database file). How long would that take on a professional system, and how can that be calculated taking the speed of the hard disk into account?

Disk speeds are usually measured in:

  • Rotational speed in revolutions per minute (lowest at 4200rpm, then 5400, 7200, 10k and 15k - this isn't applicable to SSDs or flash memory).
  • Interface speed is the fastest a disk's electronics can try to send data to the disk controller (these range from ATA's 100MBps through SATA's 150/300/600 MBps, Fibre-Channel's 2/4/8/16 Gbps and even to PCIe speeds for flash-based storage such as FusionIO).
  • Seek time is simply the time it takes to start reading or writing a particular sector of the disk - this can range from 3-15ms for spinning disks down to a small fraction of that for SSD/flash disks.
  • Then we get to the actual speed you can expect. There are four speeds you should care about: sequential read (reading a very large block of data), sequential write (the same but writing), random read (getting data from all over the disk) and random write. These vary enormously, but for spinning disks you can expect anything from 25MBps to 150MBps for sequential read and write, and anything from 3MBps to 50MBps for random read and write. SSDs are typically in the 200MBps range for sequential and usually a little less for random operations. FusionIO cards can easily hit 1GBps for all four, but are typically small and expensive.
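Those rotational speeds translate directly into latency - on average the head waits half a revolution for the target sector to come around. A quick back-of-the-envelope sketch, using the RPM figures listed above:

```python
def avg_rotational_latency_ms(rpm):
    # One full revolution takes 60/rpm seconds; on average the target
    # sector is half a revolution away.
    return (60.0 / rpm) / 2 * 1000

for rpm in (4200, 5400, 7200, 10000, 15000):
    print(f"{rpm:>6} rpm -> {avg_rotational_latency_ms(rpm):.1f} ms")
# 15k rpm works out to 2.0 ms and 7200 rpm to about 4.2 ms - part of
# why seek times bottom out around the 3-4ms mark for fast spinning disks.
```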

As you can see there is no real average. If you'd like recommendations on what to buy, please feel free to come back to us with as much information as you can - this should include budget, application type, data set size, user base, hardware/OS, plus anything else you think would be useful.

As for your 1.5TB copy: if you were doing this to a USB 2-attached 7200rpm SATA disk you should get at least 30-40MBps, so the full 1.5TB could take around 12 hours. If this were a typical professional DAS/SAN system I'd expect in the region of 100MBps, meaning it'd take a bit over 4 hours.
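The arithmetic behind those estimates is straightforward - a sketch assuming a sustained transfer rate and no other bottlenecks:

```python
def copy_time_hours(size_gb, mb_per_sec):
    # size in GB -> MB, divided by sustained MB/s, converted to hours
    return (size_gb * 1000) / mb_per_sec / 3600

print(f"{copy_time_hours(1500, 35):.1f} h")   # ~35 MBps over USB 2: 11.9 h
print(f"{copy_time_hours(1500, 100):.1f} h")  # ~100 MBps DAS/SAN: 4.2 h
```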

Hope this helps, oh and just to clarify, MB=megabytes, Mb is megabits.
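Since the question asks whether speeds are Mbit or Mbyte per second, the conversion between the two is just a factor of 8:

```python
def mbps_to_MBps(megabits_per_sec):
    # 8 bits per byte, so divide megabits/s by 8 to get megabytes/s
    return megabits_per_sec / 8

# USB 2's raw signalling rate of 480 Mbit/s works out to 60 MB/s,
# before protocol overhead eats into it.
print(mbps_to_MBps(480))
```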


There are many, many variables involved in these kinds of calculations. Real world disk systems have a lot of inter-dependencies. Just within a single computer:

  • Actual rated speed of the drive itself (generally the RPMs: 5400, 7200, 10K, 15K)
  • The file-system in use
  • Whether or not a RAID system is in use
    • If it is, the performance of the RAID card
    • The type of RAID
  • The operating system in use
  • Read and Write operations have completely different performance characteristics
  • The read/write ratio for operations
  • For sequential operations, the fragmentation factor of storage

As you can see, the speed of a disk itself is but one of many factors. It's a largish factor, but still one of many. If that 1.5TB copy is all on the same disk, then the disk will (95% likely) be doing 100% random read/write, which generally turns in the worst performance metrics. If the copy is from one disk to another, the data is 100% sequential, and the target disk is completely empty, this should turn in the fastest performance possible with this disk subsystem. Real-world performance will be somewhere between these two extremes.
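To put rough numbers on those two extremes, here's a sketch using illustrative figures (the 100MBps sequential and 10MBps same-disk random rates are assumptions for a typical spinning disk, not measurements):

```python
size_mb = 1_500_000  # 1.5 TB expressed in MB

# Best case: disk-to-disk copy, 100% sequential, empty target disk
best_h = size_mb / 100 / 3600
# Worst case: the same disk doing 100% random read/write
worst_h = size_mb / 10 / 3600

print(f"best ~{best_h:.1f} h, worst ~{worst_h:.0f} h")
```

A factor of ten between the two bounds, before any of the other variables above even come into play.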

If you're copying between two separate servers there are even more factors involved.

I have a storage array at work that can saturate 3Gb (gigaBIT) SAS channels when doing largely sequential operations. If I had 6Gb SAS it could probably get very close to saturating those too. For random I/O this particular system performs very differently based on what the OS is (OpenSolaris, for instance, had the worst random I/O, and Linux XFS the best by a factor of 3).

There are just too many variables to answer this question definitively.


How long 1.5TB of data takes to copy depends very much on the type of data. If you have 1,500 1GB files, it will probably only take a few hours, but if you have a billion and a half 1KB files it could take days.

This is because of two contending specs on discs: the throughput and the average access time. A traditional disc with a 100MB/sec throughput and 10ms access time is fairly common. If you can stream data sequentially, you can get 100MB/sec. However, if you need to jump to another place it takes 10ms. Had you been streaming, you could have written 1MB of data in the time it takes to jump to another location.

Creating a file can take several seeks, so making a 1KB file can "cost" as much as streaming several MB of data.
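That "a seek costs as much as streaming several MB" observation falls straight out of the two specs above (100MB/sec throughput, 10ms access time):

```python
throughput_mb_per_s = 100  # sequential throughput
access_time_s = 0.010      # 10 ms average access time

# Data you could have streamed in the time one seek takes:
lost_mb = throughput_mb_per_s * access_time_s
print(lost_mb)  # 1.0 MB per seek

# A 1KB file whose creation needs 3 seeks "costs" as much wall-clock
# time as streaming roughly 3 MB of sequential data.
print(3 * lost_mb)
```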

So, in some cases it's better to do a raw copy of the block device than to copy at the file-system level via something like rsync. If you have a lot of files in a file-system that is, say, 50% or more full, you're often better off just copying the full block device via "dd" as far as the time it takes. Of course, you can't do this while the file-system is mounted, so this approach has drawbacks as well.

SSDs can help mitigate this, because their access times are around 100 times faster, but MLC SSD drives have complicated access issues depending on the availability of a pool of pre-erased blocks. SLC SSDs suffer less from this.

RAID controllers with built-in cache can help with the seeks, as can something like the flashcache kernel module that lets you cache a block device via an SSD.

RAID systems can allow for multiple parallel seeks, effectively reducing the average access time, and also parallelization to increase the throughput. But your overall performance will often depend on how many files are involved.
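A naive model of that parallelism (an illustration, not how any particular RAID controller actually behaves): N independent spindles servicing seeks in parallel divide the effective average access time, and striping multiplies sequential throughput.

```python
def raid_estimates(n_disks, access_ms=10.0, seq_mb_s=100.0):
    # Idealised: seeks spread evenly over n disks, reads striped across all
    return access_ms / n_disks, seq_mb_s * n_disks

eff_seek_ms, eff_throughput = raid_estimates(4)
print(f"~{eff_seek_ms:.1f} ms effective access, ~{eff_throughput:.0f} MB/s")
```

In practice the RAID level, controller cache, and read/write mix dominate these idealised numbers - which is exactly why how many files are involved matters so much.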

