At Travis CI (http://travis-ci.org) we use VirtualBox VMs (through Vagrant) for running tests for the Ruby community.
On our worker servers we run up to N worker processes in parallel. Each worker process runs one test suite in one VM at a time, so up to N test suites are running concurrently in N VMs.
Now, as soon as many workers are actually performing builds in parallel, each build runs significantly slower than the very same build does in a single worker with nothing else running in parallel.
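To make the setup concrete, here is a minimal Python sketch of the worker model described above. It is not how the Travis workers are actually implemented: `run_build`, `run_workers` and the `run_suite` callable are hypothetical names, and a thread pool merely stands in for the N worker processes.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_build(job, run_suite):
    """Run one job and return (job, elapsed_seconds)."""
    start = time.perf_counter()
    run_suite(job)  # hypothetical: would run the test suite inside one VM
    return job, time.perf_counter() - start


def run_workers(jobs, n_workers, run_suite):
    """Run jobs on n_workers workers; each worker handles one job at a time."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map keeps the results in the same order as the input jobs
        return list(pool.map(lambda job: run_build(job, run_suite), jobs))
```

Calling `run_workers(jobs, 1, run_suite)` and `run_workers(jobs, 10, run_suite)` with the same jobs and comparing the per-job elapsed times is the kind of comparison the rest of this question is about.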
Here is an example:
This "build matrix" consists of 20 individual builds:
At the time this was run there were 10 workers running, so this build matrix started out with 10 individual builds being executed in 10 workers (and VMs) in parallel. This build is one of them, and it took about 2 hours to complete:
[See the last link in the list on the page above; I can only post 2 URLs.]
The very same build takes only about 20 minutes when no other builds are being executed in parallel. Here's an example of that:
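To put rough numbers on the two timings above, here is a small Python calculation. The function names are made up for illustration, and the throughput figure assumes that all 10 concurrent builds take about as long as the one we measured, which we have not verified.

```python
def slowdown(parallel_minutes, solo_minutes):
    """How many times longer one build takes when run alongside others."""
    return parallel_minutes / solo_minutes


def throughput_gain(n_workers, parallel_minutes, solo_minutes):
    """Builds finished per unit of time with n_workers concurrent builds,
    relative to a single worker running the same builds back to back.
    Assumes every concurrent build takes parallel_minutes."""
    return n_workers * solo_minutes / parallel_minutes


print(slowdown(120, 20))                       # 6.0
print(round(throughput_gain(10, 120, 20), 2))  # 1.67
```

So under that assumption each build is about 6 times slower, and running 10 builds at once finishes the same work less than twice as fast as running them one after another.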
This performance degradation is obviously something we need to sort out, but we're not sure where to look.
The test suite basically executes Ruby processes which might shell out and spawn several other Ruby processes, each executing unit tests on the codebase. Some of them hit databases such as MySQL, SQLite3 and PostgreSQL, but we also notice the same sort of degradation with tests that do not hit any database at all.
The worker server that hosts these processes and VMs looks like this:
- Linux 2.6.32-31-server #61-Ubuntu SMP Fri Apr 8 19:44:42 UTC 2011 x86_64 GNU/Linux
- 12x (Hexacore) Intel(R) Core(TM) i7 CPU 950 @ 3.07GHz
- 12 GB Memory
Each VM looks like this:
- Linux lucid32 2.6.32-28-generic #55-Ubuntu SMP Mon Jan 10 21:21:01 UTC 2011 i686 GNU/Linux
- 1x Intel(R) Core(TM) i7 CPU 950 @ 3.07GHz
- 1 GB Memory
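As a rough sanity check on these numbers, the following Python sketch adds up what 10 VMs commit on the host. It reads the first three items as the host and the last three as a single VM, and treats "12x" as 12 logical CPUs; both readings are assumptions, as are the constant and function names.

```python
HOST_MEMORY_GB = 12     # from the host specs above
HOST_LOGICAL_CPUS = 12  # assumption: "12x" means 12 logical CPUs
VM_MEMORY_GB = 1        # per VM
VM_CPUS = 1             # per VM


def host_budget(n_vms):
    """Memory and CPU committed to n_vms guests, relative to the host."""
    committed_gb = n_vms * VM_MEMORY_GB
    return {
        "memory_committed_gb": committed_gb,
        "memory_left_gb": HOST_MEMORY_GB - committed_gb,
        "vcpus_per_host_cpu": n_vms * VM_CPUS / HOST_LOGICAL_CPUS,
    }


print(host_budget(10))
```

With 10 VMs this commits 10 GB of the 12 GB and fewer virtual CPUs than host CPUs, leaving about 2 GB for the host itself. That does not prove anything about where the time goes, but it is part of why we are unsure where to look.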
Any hints on how to sort this out, or maybe just on how to better identify the root problem, would be highly appreciated.
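One thing we could measure on the host while builds are running is how CPU time splits between user, system, idle and I/O wait. Below is a minimal Python sketch that does this from two snapshots of `/proc/stat`. The function names are hypothetical, and we have not confirmed that this is the right place to look; it only illustrates the kind of data we could collect.

```python
import time

# Order of the first eight counters on the aggregate "cpu" line of /proc/stat
FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]


def parse_cpu_line(stat_text):
    """Return the aggregate 'cpu' counters from /proc/stat text as a dict."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            values = [int(v) for v in parts[1:1 + len(FIELDS)]]
            return dict(zip(FIELDS, values))
    raise ValueError("no aggregate cpu line found")


def cpu_shares(before, after):
    """Percentage of CPU time spent in each state between two snapshots."""
    deltas = {name: after[name] - before[name] for name in before}
    total = sum(deltas.values())
    if total == 0:
        return {name: 0.0 for name in deltas}
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


def sample(interval_seconds=5, path="/proc/stat"):
    """Take two snapshots interval_seconds apart and return the shares."""
    with open(path) as f:
        before = parse_cpu_line(f.read())
    time.sleep(interval_seconds)
    with open(path) as f:
        after = parse_cpu_line(f.read())
    return cpu_shares(before, after)
```

Running `sample()` once with a single build and once with 10 concurrent builds, and comparing the `iowait` and `idle` shares, might at least tell us whether the host is waiting on disk or is busy on the CPU.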