MTU and TCP MSS when using PPPoE

I switched to Bell from Rogers about half a year ago. One of my goals was to remove their router and use my own EdgeRouter Pro. Once I got the PPPoE connection up I was able to ping the rest of the world but couldn't load most websites. Eventually I found I had to adjust the MTU and add MSS clamping to get everything to work. At the time I just blindly used MTU and MSS clamp values I found online. They turned out to be correct, but last night I decided to experiment and research to find the correct values I should be using.

Finding the MTU

First you should understand that almost all networking gear has its maximum transmission unit (MTU) set to 1500 bytes for each interface. The Ethernet header overhead (18 bytes; see note 1) is not included in this. This means that the payload inside the Ethernet frame can be at most 1500 bytes long.

What goes inside the payload of the frames depends on what you are doing. If you are pinging an IP address, it would be an ICMP packet inside an IP packet. To figure out the largest ICMP payload you can use, subtract the size of the IP header (20 bytes; see note 2) and the ICMP header (8 bytes) from the MTU: 1500 - 20 - 8 = 1472.
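The subtraction above can be written as a small helper. This is only a sketch of the arithmetic: the function name and constants are made up for illustration, and it assumes an IPv4 header without options.

```python
IP_HEADER = 20    # bytes, IPv4 header with no options (assumption)
ICMP_HEADER = 8   # bytes, ICMP echo header


def max_ping_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented packet."""
    payload = mtu - IP_HEADER - ICMP_HEADER
    if payload < 0:
        raise ValueError("MTU is smaller than the IP and ICMP headers combined")
    return payload


print(max_ping_payload(1500))  # 1472
```

The result is the value you would pass to ping's -s option.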

Throw in some PPPoE

Now if you tried to ping with the Don't fragment (DF) flag set, a packet size of 1472 should work and a packet size of 1473 should not work. Like this (on Linux):

$ ping -M do -s 1473 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 1473(1501) bytes of data.
ping: local error: Message too long, mtu=1500
ping: local error: Message too long, mtu=1500
ping: local error: Message too long, mtu=1500
ping: local error: Message too long, mtu=1500

$ ping -M do -s 1472 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 1472(1500) bytes of data.
1480 bytes from 8.8.8.8: icmp_seq=1 ttl=51 time=1.27 ms
1480 bytes from 8.8.8.8: icmp_seq=2 ttl=51 time=24.3 ms
1480 bytes from 8.8.8.8: icmp_seq=3 ttl=51 time=1.31 ms
1480 bytes from 8.8.8.8: icmp_seq=4 ttl=51 time=1.77 ms

That is unless you're connecting over PPPoE. If you are using PPPoE you will find that your ping fails with a packet size of 1472. This is because PPPoE adds 8 bytes of its own overhead (a 6-byte PPPoE header plus a 2-byte PPP protocol ID), which leaves an MTU of 1500 - 8 = 1492 for the PPPoE connection. If you subtract that overhead from our previous value you get the actual largest ICMP packet size: 1472 - 8 = 1464. Now you can try pinging with the new packet size, like this (on Mac):

$ ping -D -s 1465 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 1465 data bytes
ping: sendto: Message too long
ping: sendto: Message too long
Request timeout for icmp_seq 0
ping: sendto: Message too long
Request timeout for icmp_seq 1
ping: sendto: Message too long
Request timeout for icmp_seq 2
ping: sendto: Message too long
Request timeout for icmp_seq 3

$ ping -D -s 1464 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 1464 data bytes
1472 bytes from 8.8.8.8: icmp_seq=0 ttl=59 time=6.844 ms
1472 bytes from 8.8.8.8: icmp_seq=1 ttl=59 time=7.066 ms
1472 bytes from 8.8.8.8: icmp_seq=2 ttl=59 time=7.066 ms
1472 bytes from 8.8.8.8: icmp_seq=3 ttl=59 time=7.229 ms
1472 bytes from 8.8.8.8: icmp_seq=4 ttl=59 time=7.081 ms
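If you would rather not guess sizes by hand, the search can be automated. The sketch below is one possible approach, not something from my router setup: it binary-searches for the largest payload that still gets through, assuming that any size smaller than a passing size also passes. The function names are hypothetical. linux_ping_probe assumes the Linux iputils ping shown earlier (-M do sets DF), and fake_pppoe_probe is a stand-in that simulates a 1492-byte MTU so the example runs without touching the network.

```python
import subprocess
from typing import Callable


def largest_passing_payload(probe: Callable[[int], bool],
                            low: int = 0, high: int = 1472) -> int:
    """Return the largest size in [low, high] for which probe(size) is True.

    Assumes probe is monotonic (smaller sizes pass whenever a larger one does).
    Returns -1 if even the smallest size fails.
    """
    if not probe(low):
        return -1
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            low = mid        # mid fits, so the answer is mid or larger
        else:
            high = mid - 1   # mid is too big, so the answer is smaller
    return low


def linux_ping_probe(size: int, host: str = "8.8.8.8") -> bool:
    """One DF ping of the given payload size (assumes Linux iputils ping)."""
    result = subprocess.run(
        ["ping", "-M", "do", "-c", "1", "-W", "2", "-s", str(size), host],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def fake_pppoe_probe(size: int) -> bool:
    """Simulated link: payload + 20 (IP) + 8 (ICMP) must fit in a 1492 MTU."""
    return size + 20 + 8 <= 1492


print(largest_passing_payload(fake_pppoe_probe))  # 1464
```

Swapping fake_pppoe_probe for linux_ping_probe runs the search against a real host. A single lost ping would skew the result, so a real probe should probably retry a couple of times.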

What is MSS clamping?

Normally your computer will be able to determine a safe MTU using Path MTU Discovery (PMTUD), but this relies on your ISP actually sending back ICMP Too Big packets (on IPv4 these are "Fragmentation Needed" messages). Unfortunately Bell has decided (in their infinite wisdom) that this is not a good thing (probably under the guise of "security"), so they leave you high and dry: your TCP connections may end up as "black hole connections". This happens when the TCP handshake works but any full-sized data packets you send afterwards are silently dropped on their side.

The solution for this is called MSS clamping. You use your firewall to rewrite the Maximum Segment Size (MSS) option in the TCP handshake of every connection, so neither side sends segments that are too large for the link. To figure out the MSS you want, take the standard 1500 MTU and subtract the PPPoE header, the IP header, and the TCP header (20 bytes; see note 3): 1500 - 8 - 20 - 20 = 1452.
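Here is the same calculation as a sketch, along with what "clamping" means for a single connection. The names are hypothetical and the header sizes assume IPv4 and TCP without options. It illustrates the idea only, not how the EdgeRouter firewall implements it.

```python
PPPOE_OVERHEAD = 8  # bytes PPPoE takes out of the 1500-byte Ethernet payload
IP_HEADER = 20      # bytes, IPv4 header with no options (assumption)
TCP_HEADER = 20     # bytes, TCP header with no options (assumption)


def tcp_mss(link_mtu: int = 1500, tunnel_overhead: int = 0) -> int:
    """MSS that fits one TCP segment into one packet on the link."""
    return link_mtu - tunnel_overhead - IP_HEADER - TCP_HEADER


def clamp_mss(advertised_mss: int, limit: int) -> int:
    """Lower an advertised MSS to the limit; never raise it."""
    return min(advertised_mss, limit)


print(tcp_mss())                                # 1460 on plain Ethernet
print(tcp_mss(tunnel_overhead=PPPOE_OVERHEAD))  # 1452 over PPPoE
print(clamp_mss(1460, 1452))                    # 1452
```

A host on a normal 1500-byte LAN advertises an MSS of 1460, which is why the router has to lower it to 1452 before the handshake crosses the PPPoE link.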

EdgeRouter

If you have an EdgeRouter, you'll want the following configuration options to set the MTU for your PPPoE connection and MSS clamping, where eth0 is the interface you are using and vif 35 is for VLAN 35.

set firewall options mss-clamp interface-type pppoe
set firewall options mss-clamp mss 1452
set interfaces ethernet eth0 vif 35 pppoe 0 mtu 1492

Conclusion

Blindly following values I found posted online worked, but I wasn't satisfied. After some experimenting and reading Wikipedia, I am now confident in 1492 as the MTU and 1452 as the TCP MSS, and I understand why they work.

Notes:
  1. Ethernet frame overhead starts at 18 bytes (a 14-byte header plus a 4-byte frame check sequence), grows to 22 bytes with VLAN tagging, and 26 bytes with Q-in-Q VLAN tagging.
  2. IP packet headers start at 20 bytes long and can be up to 60 bytes if there are options specified; however, options are rarely used.
  3. Like IP, TCP packet headers start at 20 bytes long and can be up to 60 bytes if there are options.

9 Comments

  1. It worked! Thanks for sharing

    I used a tun interface-type in my case 🙂

  2. Hi,

    I’ve just done the same thing as you (bypassed the Bell-provided Home Hub router and now using an EdgeRouter instead). The setup works fine, but I noticed an issue with my IPTV service. My ONT is connected via a small switch to both the Bell router and the EdgeRouter; I left the Bell router for the IPTV service, since my IPTV receiver is connected to it via coax cable. I’ve noticed constant traffic flowing between the ONT and the Bell modem, even when I’m not watching TV or recording anything. The traffic is 24/7. Any ideas what this could be?

    Thanks
    cinergi

  3. This was just what I needed. I had figured out there was an MTU issue, but this is a great explanation of the fix.

    To help others find the post, let me just mention this was using an EdgeRouter Lite on CenturyLink gigabit GPON fiber, in Seattle.

  4. Thanks for sharing.

    I got an EdgeRouter Lite.
    I was using a TCP MSS of 1412 on all interfaces.
    Now I’m using a TCP MSS of 1452 on only the PPPoE interface.

    Now I know how it works.

  5. Hi, thanks for sharing your understanding of how the encapsulation process works in TCP/IP networks.

    I think it would be worth mentioning that not only can IP and TCP datagram sizes vary, but ICMP datagram sizes can vary too. Actually, ICMP is encapsulated in the IP datagram body. Hence, ICMP does not increase the size of the IP header (which is already counted in the “average” size of 20 B). When calculating the MTU (section “Finding the MTU”), the number 8 is not actually from ICMP, but from PPPoE (whose size is also variable due to different messages such as PADI, PADO etc., but 8 works as a “rule of thumb”).

    Anyways, thanks again for sharing!

    More info available online:
    http://wiki.treck.com/Introduction_to_TCP/IP#ICMP_Message_Delivery
    https://tools.ietf.org/html/rfc792

  6. The MSS clamping did not work for me until I applied it to all interfaces.

    set firewall options mss-clamp interface-type all

    I was checking it with wireshark using this filter:
    tcp.options.mss_val > 1452

    I was seeing values of 1460 that traversed the pppoe interface.

  7. Wow, nice catch. This just happened to me, too.
    My client’s ISP Deutsche Telekom all of a sudden stopped throwing back ICMP Too Big packets, and as a consequence the whole office wasn’t able to do what they’re paid for.

    Anyway, I had it figured out by manually setting a Win7 workstation’s MTU to 1492 (it’s PPPoE) but couldn’t be bothered configuring each and every client on the LAN.

    Seems like I don’t have to – thanks to you, good Sir.

  8. Your figures worked great for me. Thanks for writing this up!

    For the benefit of those coming from search engines, this allowed me to use an EE BrightBox in bridge mode as a VDSL modem with an EdgeRouter X handling PPPoE.

  9. I’m using OpenWRT on a flash router. I have a PPPoE ISP. I frequently lose connectivity because the ISP’s server fails to return an LCP Reply. I’ve toyed with many options trying to resolve the issue. The ISP does not understand the problem. After reviewing MTU/MRU/MSS on many websites, I suspect this is the issue with my connection. This is what I see, starting at 1472 and working down:

    ….All values from 1472 to 1446 return this:

    $ ping -M do -s 1446 -c 2 -W 4 8.8.8.8
    PING 8.8.8.8 (8.8.8.8) 1446(1474) bytes of data.
    From (Local IP) icmp_seq=1 Frag needed and DF set (mtu = 1472)
    ping: sendmsg: Message too long
    From (Local IP) icmp_seq=2 Frag needed and DF set (mtu = 1472)
    ping: sendmsg: Message too long

    …Then:

    $ ping -M do -s 1444 -c 2 -W 4 8.8.8.8
    PING 8.8.8.8 (8.8.8.8) 1444(1472) bytes of data.
    1452 bytes from 8.8.8.8: icmp_seq=1 ttl=114 time=27.2 ms
    1452 bytes from 8.8.8.8: icmp_seq=2 ttl=114 time=34.3 ms

    Given this, am I to conclude that the MTU/MRU is 1472 and the MSS is 1444? Or is the MSS to be 1452? Or have I missed something else entirely?
