• 10

A PHP Error was encountered

Severity: Notice

Message: Undefined index: userid

Filename: views/question.php

Line Number: 191


File: /home/prodcxja/public_html/questions/application/views/question.php
Line: 191
Function: _error_handler

File: /home/prodcxja/public_html/questions/application/controllers/Questions.php
Line: 433
Function: view

File: /home/prodcxja/public_html/questions/index.php
Line: 315
Function: require_once

I'm curious if anyone out there has done any experimentation or comparisons between file systems and database performance. On Linux, I'm wondering what is the optimal file system for a postgres database. Also, what settings (inode, etc) are ideal for it? Is this something that may drastically differ based on the data in the database?

If you are looking for a question relating to general filesystem / database performance, this post has some good information.

However, I would like to get as much advice on insert performance opposed to read performance as possible. Thanks for all of the great answers!

    • +1 for Oskar. We just went from a server config where RAM was ~33% of the total size of the DB to a new machine where total RAM was greater than the size of the DB. Now we can cache the entire DB in memory. Our slowest SQL query is now 2 orders of magnitude faster.

Buy a copy of "postgresql high performance" by Greg Smith. It is a great Book and two or more chapters are about Disk Hardware and filesystems. You will learn a lot.

In short: there is no short answer.

But i will try to summerize:

  • don't use ext2 until you know what you are doing.
  • with ext3 beware of checkpoint spikes because of fsync calls, see page 113 and 82 and 79
  • use ext4 or xfs
  • there are other options

But as you are really asking yourself what FS to use, you should read the book!

  • 15
Reply Report

First of all, you want a reliable filesystem first, and a fast one second. Which rules out some options...

Performance testing shows that often XFS gives the best performance. There are some stability issues with it once you reach disk-very-close-to-full scenarios, but as long as you monitor for that not happening, it'll give you slightly better performance.

In theory you don't need a journaling filesystem for the pg_xlog directory, but the difference in speed is usually so small it's just not worth it. For the data directory, you really should always have a metadata journaling filesystem.

  • 6
Reply Report
      • 2
    • You might want to /not/ use XFS to store a database, namely because it will (when needed) zero out blocks that it can't recover.

Database management systems implement their own journalling through the database logs, so installing such a DBMS on a journalled file system degrades performance through two mechanisms:

  1. Redundant journalling increases the amount of disk activity

  2. Physical disk layout can be fragmented (although some journalling file systems do have mechanisms to clean this up).

  3. Lots of disk activity can fill up the journal, causing spurious 'disk full' conditions.

I have seen an instance some years ago where this was done on LFS file system on a Baan installation on a HP/UX box. The system had persistent performance and data corruption issues that went undiagnosed until someone worked out that the file systems were formatted with LFS.

Volumes holding database files will normally have a small number of large files. DBMS servers will normally have a setting that configures how many blocks are read in a single I/O. Smaller numbers would be appropriate for high volume transaction processing systems as they would minimise caching of redundant data. Larger numbers would be appropriate for systems such as data warehouses that did a lot of sequetial reads. If possible, tune your file system allocation block size to be the same size as the multi-block read that the DBMS is set to.

Some database management systems can work off raw disk partitions. This gives varying degrees of performance gain, typically less so on a modern system with lots of memory. On older systems with less space to cache file system metadata the savings on disk I/O were quite significant. Raw partitions make the system harder to manage, but provide the best performance available.

RAID-5 volumes incur more write overhead than RAID-10 volumes, so a busy database with lots of write traffic will perform better (often much better) on a RAID-10. Logs should be put physically separate disk volumes to the data. If your database is large and mostly read only (e.g. a data warehouse) there may be a case to put it on RAID-5 volumes if this does not unduly slow down the load process.

Write-back caching on a controller can give you a performance win at the expense of creating some (reasonably unlikely but possible) failure modes where data could be corrupted. The biggest performance win for this is on highly random access loads. If you want to do this, consider putting the logs on a separate controller and disabling write-back caching on the log volumes. The logs will then have better data integrity and a single failure cannot take out both the log and data volumes. This allows you to restore from a backup and roll forward from the logs.

  • 4
Reply Report
      • 1
    • Journalling data degrades performance; journalling metadata should have at worst minimal impact, and most likely, almost none. Not journalling metadata is unadvisable.
      • 2
    • I think you misunderstood the article. Any filesystem at all has filesystem metadata and any disk traffic will involve reading or writing this. Modern computers usually have enough RAM to easily cache this file system metadata but older machines didn't. This meant that disk accesses incurred significant additional I/O overhead (the oft-quoted figure for Oracle was a 30% performance hit over raw partitions) for reading or updating the file system's metadata. On a modern system with more RAM the file system metadata is more likely to be cached, so the overhead is lower.
    • This contains some good general advice, but I downvoted because it also contains information that is irrelevant or incorrect for postgresql and modern journaled filesystems.

I did such a detailed report but it is only in French. If you read french or are happy with automatic translation tools... You can reuse the methodology and run it for yourself.

Executive summary: I used pgbench. The Linux I/O scheduler has very little importance for performances and the filesystem only a little. So, if you are in a hurry, just choose the default. I choosed JFS.

  • 3
Reply Report

Filesystem is only part of the problem. You can get significant performance boost by changing your IO scheduler. Fortunately this is fairly easy to test as you can change the IO scheduler on the fly. I'd suggest trying each one for a couple of days under typical load and see which gives the best performance.

  • 2
Reply Report

I did some testing a few months ago:

I had a small test program that created 50 threads, where every thread inserted 1000 (or if it was 10000) rows into the same table.

  • With the database on EXT3 and a 4 disk RAID5 it took 50 seconds.
  • With the table on ramdisk (using tablespace) it still took 50 seconds. The reason it wasn't faster is that everything is logged in the pg_xlog directory that where still on the same RAID 5.
  • I moved the pg_xlog to a 4 disk RAID0 (stripe) and the same program run in 40 seconds.
  • For testing purposes I moved the pg_xlog to the ramdisk and had everything else on the EXT3 4 disk RAID. The program was finished after less than 5 seconds.

But having the pg___xlog on a software ramdisk is not an option: If you lose the contents of the pg_xlog directory postgres will not start. (But there exists hardware ramdisks with battery backup that might be of interest.)

IMHO: Use the filesytem you are most comfortable with for the database files. Move the pg_xlog (with a symlink, see documentation) to the fastest possible device you have.

  • 2
Reply Report

Trending Tags