Network MTU Check
Recently had a cable modem 'updated' ... afterwards people were not getting emails from Gmail. Looking in mail.log I noticed some timeouts like this (also from some other hosts):
postfix/smtpd: timeout after DATA (0 bytes) from mail-ee0-f51.google.com[126.96.36.199]
...which was odd as most mail was working fine. I didn't get very far googling until I saw a mention of MTU, so I looked into it, and it turned out to be the problem.
So here is how to check the MTU out.
Check your router is set to 1500, otherwise it will be limited by that before you get any further.
Basically you want to make sure you can send a packet without it needing to be fragmented (the larger the better for efficiency, as long as it doesn't fragment).
These tests were run from the server, sitting behind the cable modem/router box. A ping with a total packet size of 1401 fails (note the MTU is 28 bytes larger than the size you give ping, due to the 20-byte IP header and 8-byte ICMP header, hence 1373):
$ ping -M do -c 2 -s 1373 www.yahoo.com
PING eu-fp3.wa1.b.yahoo.com (188.8.131.52) 1373(1401) bytes of data.
From 192.168.1.50 icmp_seq=1 Frag needed and DF set (mtu = 1400)
From 192.168.1.50 icmp_seq=1 Frag needed and DF set (mtu = 1400)
--- eu-fp3.wa1.b.yahoo.com ping statistics ---
0 packets transmitted, 0 received, +2 errors
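The 28-byte difference between the ping size and the MTU can be worked out with a couple of small shell helpers. This is only a sketch: the function names are made up for illustration, and it assumes plain IPv4 with no IP options (20-byte IP header plus 8-byte ICMP echo header).

```shell
# IPv4 header (20 bytes, no options) + ICMP echo header (8 bytes)
ICMP_OVERHEAD=28

# Largest "ping -s" value that fits in a single packet of the given MTU
payload_for_mtu() {
    echo $(( $1 - ICMP_OVERHEAD ))
}

# Total packet size (the MTU needed) for a given "ping -s" value
mtu_for_payload() {
    echo $(( $1 + ICMP_OVERHEAD ))
}

payload_for_mtu 1400   # prints 1372
mtu_for_payload 1373   # prints 1401
```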
This is with MTU set to 1400, which turns out to be the max due to a limit somewhere within the ISP's system.
$ ping -M do -c 2 -s 1372 www.yahoo.com
PING eu-fp3.wa1.b.yahoo.com (184.108.40.206) 1372(1400) bytes of data.
1380 bytes from ir1.fp.vip.ird.yahoo.com (220.127.116.11): icmp_req=1 ttl=52 time=43.6 ms
1380 bytes from ir1.fp.vip.ird.yahoo.com (18.104.22.168): icmp_req=2 ttl=52 time=68.7 ms
--- eu-fp3.wa1.b.yahoo.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 43.637/56.185/68.734/12.550 ms
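Rather than trying sizes by hand, the same test can be wrapped in a binary search. This is a rough sketch, not a polished tool: the function names and the TARGET variable are made up here, it assumes the iputils ping used above, and it assumes the low end of the range gets through and that anything above the limit fails (either with "Frag needed" or with no reply at all).

```shell
# Probe: succeeds if a payload of $1 bytes reaches the target unfragmented.
# TARGET is a hypothetical variable for this sketch; defaults to the host used above.
probe_ping() {
    ping -M do -c 1 -W 2 -s "$1" "${TARGET:-www.yahoo.com}" >/dev/null 2>&1
}

# find_max_payload LOW HIGH [PROBE]
# Binary search for the largest payload size in LOW..HIGH for which PROBE succeeds.
# Assumes LOW itself works; PROBE defaults to probe_ping.
find_max_payload() {
    fmp_lo=$1
    fmp_hi=$2
    fmp_probe=${3:-probe_ping}
    while [ "$fmp_lo" -lt "$fmp_hi" ]; do
        fmp_mid=$(( (fmp_lo + fmp_hi + 1) / 2 ))
        if "$fmp_probe" "$fmp_mid"; then
            fmp_lo=$fmp_mid              # this size got through, try larger
        else
            fmp_hi=$(( fmp_mid - 1 ))    # too big, try smaller
        fi
    done
    echo "$fmp_lo"
}

# Usage (needs network access):
#   max=$(find_max_payload 1200 1472)
#   echo "largest payload $max, so path MTU is about $(( max + 28 ))"
```

The probe is passed in by name so the search can be tried out with a stand-in function before pointing it at a real host.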
To set the interface MTU to suit on the fly (lasts until reboot):
sudo ifconfig eth0 mtu 1400
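On systems without ifconfig, the iproute2 equivalent should do the same job, and the current value can be read back from sysfs to confirm the change took. A small sketch, assuming Linux and the eth0 interface name from above; current_mtu and the SYSFS_NET override are names invented for this example.

```shell
# Print the current MTU of an interface by reading sysfs (Linux only).
# SYSFS_NET is a hypothetical override so the function can be pointed at
# another directory; it defaults to the real location.
current_mtu() {
    cat "${SYSFS_NET:-/sys/class/net}/$1/mtu"
}

# iproute2 equivalent of the ifconfig command (needs root):
#   sudo ip link set dev eth0 mtu 1400
# Check it afterwards:
#   current_mtu eth0
```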
To change it permanently, add it at the bottom of the eth0 properties in /etc/network/interfaces (in this case the server isn't using network-manager):
# The primary network interface
auto eth0
iface eth0 inet static
    address 192.168.1.50
    netmask 255.255.255.0
    network 192.168.1.0
    broadcast 192.168.1.255
    gateway 192.168.1.1
    mtu 1400
It also fixed a long running issue I was having with this mediawiki which usually hung trying to submit a page and would take several retries before it worked. But once a page was submitted further submissions seemed to usually be ok. Hurray!
The router is a Virgin cable rebranded Netgear box. Something appears to have changed (April/2012) as max MTU is now up to 1460.
While looking into this a second time (thanks to the info here, http://www.netheaven.com/pmtu.html , specifically "Example Path MTU Discovery Failure Scenario") I think the problem boils down to path MTU (PMTU) discovery not working properly on Virgin Cable (this is the 30Mbit service, maybe it isn't broken on the other ones).
e.g. the distant mail server connects using small packets, then tries to send the bulky message in 1500 byte packets, which don't fit through Virgin. It never gets the ICMP "Frag needed and DF set" response, so it doesn't try to reduce the packet size, and no data gets through after the initial connection. I then see the "timeout after DATA (0 bytes)" error as it can't send anything.
- Router set to 1500 MTU (or 0 in its settings, which makes it default to 1500): pings to an IP get through at a max of 1460, anything higher gets no response whatsoever (no "Frag needed" reply, PMTU broken).
- Router set to anything less, e.g. 1499: the max MTU I can ping an IP with is 1400. I do not understand this.
- Mail server behind the router needs its MTU set to 1460 if the router is set to 1500, or 1400 if the router is set to anything less than 1500 (like 1499), otherwise the DATA (0 bytes) errors recur.
Using tracepath is also very helpful for seeing more visually where the MTU restriction comes about. I suggest running it against the router's IP from your machine, and from the machine behind the router back to your IP.
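If you only want the end result from tracepath, the final path MTU can be pulled out of its summary line. This sketch assumes the iputils tracepath output ends with a line of the form "Resume: pmtu N hops ..." (other versions may print something different), and final_pmtu is a name made up for this example.

```shell
# Read tracepath output on stdin and print the path MTU reported on the
# final "Resume:" summary line (format assumed, see note above).
final_pmtu() {
    awk '/Resume:/ {
        for (i = 1; i < NF; i++)
            if ($i == "pmtu") print $(i + 1)
    }'
}

# Usage (needs network access):
#   tracepath -n richud.com | final_pmtu
```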
The second ping, with 1433(1461) bytes, should get a "Frag needed and DF set" reply but doesn't, hence the problem.
$ ping -M do -c 2 -n -s 1432 richud.com
PING richud.com (22.214.171.124) 1432(1460) bytes of data.
1440 bytes from 126.96.36.199: icmp_req=1 ttl=50 time=102 ms
1440 bytes from 188.8.131.52: icmp_req=2 ttl=50 time=124 ms
--- richud.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 102.433/113.705/124.978/11.277 ms

$ ping -M do -c 2 -n -s 1433 richud.com
PING richud.com (184.108.40.206) 1433(1461) bytes of data.
--- richud.com ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 999ms
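The three outcomes seen in the ping tests (a reply, a "Frag needed" error, or nothing at all) can be told apart by looking for the same strings that appear in the outputs above. A minimal sketch; classify_ping and its labels are invented for this example, and "silent" only hints at a PMTU black hole, since an unreachable host looks the same.

```shell
# Classify the output of "ping -M do ..." read from stdin.
classify_ping() {
    cp_out=$(cat)
    case $cp_out in
        *"Frag needed"*) echo "frag-needed" ;;  # something reported its MTU, PMTU discovery can work
        *"bytes from"*)  echo "ok" ;;           # packet fitted all the way through
        *)               echo "silent" ;;       # dropped with no error back: possible PMTU black hole
    esac
}

# Usage (needs network access):
#   ping -M do -c 2 -n -s 1433 richud.com 2>&1 | classify_ping
```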
So in a nutshell, the first problem is that the MTU isn't 1500 all the way through their system; the second is that if path MTU discovery worked you wouldn't need to force the MTU down manually. Which makes sense now!