I'm a new systems and network admin. My experience has been with the hardware and software of systems and servers; the network part is pretty new to me. I'm familiar with plugging numbers into network configurations, but if you ask me about subnetting or packet droppings (;) you'll see this really lost look flash across my face. I'm learning.
Here's my problem:
The previous Network Admin reports that, starting about two months before I took over the reins here, they had issues downloading large files. Well, not really just large files; it's just more frustrating with the large ones. Now that I'm the one doing the downloads (everything from a random driver to the latest distros and SPs from our Technet subscription and Licensing agreement, to multi-gigabyte engineering software packages for our various departments) I have to "baby-sit" the downloads, which keeps me chained to my desk for hours at a time.
The downloads start just fine and get to some random point, anywhere from a few KB to a couple of GB, before stalling. If I don't pause and restart the download at that point, it fails. Sometimes the pause/restart works right away: the download picks up speed and progresses a bit before the cycle repeats. Sometimes I have to go through several pause/restart cycles before the download actually starts moving again.
The network and ISP details:
- Fiber internet connection served by our ISP (our local city is our ISP). Download speeds generally even out around 1.1Mbps, with spikes as high as 1.6Mbps. Sometimes in the midst of pause/restart cycles we'll see speeds as low as a few hundred Kbps, but a few cycles later it'll speed up again. Speeds from different hosts are pretty consistent.
- There is no proxy in our internal network and no firewall that I am aware of blocking the connection. We use a Cisco 1811W as our gateway, but it has not had any trouble before.
The issue was first noted around September, and there were no changes on our side around that time we can attribute this to.
What should I test, check, etc., to determine whether the problem is on our side or the ISP's?
I'm watching a Wireshark feed filtered for the TCP stream of a large download I've had trouble with for a few days now. Most of the traffic frames are labeled...
Continuation or non-HTTP traffic
...which I assume is just the subsequent packets comprising the download. However, relatively frequently (every 3-20 seconds), and corresponding pretty much exactly to any dips in the download speed reported by Firefox, there are large sections of frames labeled...
[TCP Retransmission] Continuation or non-HTTP traffic
There are also a few random frames, usually spread out surrounding the Retransmission packets a few dozen frames on either side, labeled...
[TCP Previous segment lost] Continuation or non-HTTP traffic
...and whadayaknow, the download just failed about halfway through the 3.2GB file. The final frame is a TCP Previous Segment Lost frame. This came immediately after I had to pause the download and attempted to restart it: cue immediate failure.
The final frames in the download were http [ACK] followed by http [FIN, ACK], which I believe indicates a "graceful" TCP connection closure.
I did not see anything else indicating interruption by an intermediary.
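To make the Wireshark labels above a bit more concrete, here is a minimal Python sketch of how a capture tool might flag them from sequence numbers alone. It is a simplification under stated assumptions: the function name `classify_segments` and the `(seq, length)` input format are made up for illustration, it only looks at one direction of one stream, and Wireshark's real analysis also uses timing and ACK information that this ignores.

```python
def classify_segments(segments):
    """Label each (seq, length) data segment from the sender.

    Simplified rules (an assumption, not Wireshark's exact logic):
      - seq beyond the next expected byte  -> a gap, "previous segment lost"
      - segment entirely below the highest byte already seen -> "retransmission"
      - anything else -> "ok"
    """
    labels = []
    next_expected = None  # sequence number we expect to see next
    for seq, length in segments:
        end = seq + length
        if next_expected is None:
            labels.append("ok")
            next_expected = end
        elif seq > next_expected:
            # Bytes between next_expected and seq never showed up in the capture.
            labels.append("previous segment lost")
            next_expected = end
        elif end <= next_expected:
            # Data we have already seen (or skipped past) is being sent again.
            labels.append("retransmission")
        else:
            labels.append("ok")
            next_expected = end
    return labels


def retransmission_rate(labels):
    """Fraction of segments labeled as retransmissions."""
    if not labels:
        return 0.0
    return labels.count("retransmission") / len(labels)


# Hypothetical stream: the segment starting at 2001 is lost, then resent.
example = [(1, 1000), (1001, 1000), (3001, 1000), (2001, 1000), (4001, 1000)]
example_labels = classify_segments(example)
```

Running the same kind of count on captures taken at different points (for example on the LAN side and the WAN side of the gateway) would show on which side of that point the segments go missing.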
The issue is observed in all browsers and apps that download and the pause/restart functionality works 99% of the time in all apps that allow pause/restart. Specific apps and browsers I can replicate this easily in: Firefox (current versions), IE (9), iTunes (downloading apps and updates for iOS devices). I'm not sure if these all use the same functionality for the pause/resume function in downloading.
iTunes downloads from servers that all allow restarting (except the iOS update files), so it does not matter how long I pause the download for. Most sites I'm downloading large files from (MS, PTC, Solidworks, AutoDesk) don't support resuming stopped/canceled downloads (MS does, but only from their Java-based download manager), so I can only pause for around 15 seconds max before the download fails immediately upon attempted resume.
Using mturoute (Thanks Tom H), I found the consistent route max MTU is 1500 bytes before fragmentation, and the path carried fragmented ICMP payloads of 10000 bytes from end to end without much issue, including the hops through my ISP's devices. So the issue does not appear to be fragmentation or incompatible MTU settings.
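As far as I understand it, resuming after the connection has actually closed is normally done with an HTTP `Range` request, and the server answers `206 Partial Content` if it supports that. A minimal Python sketch of that mechanism, assuming a plain HTTP download; the helper names `build_resume_request` and `resume_mode` and the example URL are hypothetical, and I don't know whether the browsers and iTunes do exactly this internally:

```python
import urllib.request


def build_resume_request(url, bytes_already_saved):
    """Build a request that asks the server for the rest of the file.

    bytes_already_saved is the size of the partial file on disk.
    """
    request = urllib.request.Request(url)
    if bytes_already_saved > 0:
        # "bytes=N-" means: send everything from byte offset N to the end.
        request.add_header("Range", f"bytes={bytes_already_saved}-")
    return request


def resume_mode(status_code):
    """Decide what to do with the response body based on the HTTP status."""
    if status_code == 206:
        return "append"   # server honoured the range: append to the partial file
    if status_code == 200:
        return "restart"  # server ignored the range: it is sending the whole file
    return "error"


# Hypothetical usage (the URL is a placeholder, not one of the real download hosts):
resume_request = build_resume_request("http://example.com/big-file.iso", 1048576)
# with urllib.request.urlopen(resume_request) as response:
#     mode = resume_mode(response.status)
```

A server that answers `200` to such a request would match the "doesn't support resuming" behaviour described above.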
ICMP is also not blocked by my ISP, and neither is BitTorrent, though I'm not using BT to download these files.
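For anyone checking the MTU numbers, the arithmetic behind them can be sketched as follows. This assumes IPv4 with a 20-byte header (no options) and an 8-byte ICMP echo header; the function names are only illustrative.

```python
import math

IP_HEADER_BYTES = 20    # IPv4 header without options (assumption)
ICMP_HEADER_BYTES = 8   # ICMP echo request header


def max_unfragmented_ping_payload(mtu):
    """Largest ICMP echo payload that fits in one packet at the given MTU."""
    return mtu - IP_HEADER_BYTES - ICMP_HEADER_BYTES


def fragment_count(icmp_payload_bytes, mtu):
    """Number of IPv4 fragments needed to carry an ICMP echo of this payload size."""
    ip_payload = icmp_payload_bytes + ICMP_HEADER_BYTES
    # Every fragment except the last must carry a multiple of 8 bytes of data.
    per_fragment = (mtu - IP_HEADER_BYTES) // 8 * 8
    return math.ceil(ip_payload / per_fragment)


largest_single_packet_payload = max_unfragmented_ping_payload(1500)
fragments_for_10000_bytes = fragment_count(10000, 1500)
```

With a 1500-byte MTU, a ping payload larger than `largest_single_packet_payload` has to be fragmented, and the 10000-byte test crosses the path as `fragments_for_10000_bytes` separate fragments that all have to arrive.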
So what I need to look into, judging from the Wireshark logs, is how to pin down the cause of the Retransmissions and Previous segment lost frames. How would I isolate the probable source of these?