octoDNS and Route53

Just a quick and simple post. If you want to use octoDNS with Amazon's Route53, you can use the following permisson policy to restrict the user to only what octoDNS needs to do its job.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "route53:ChangeResourceRecordSets",
                "route53:CreateHostedZone",
                "route53:ListHealthChecks",
                "route53:ListHostedZones",
                "route53:ListHostedZonesByName",
                "route53:ListResourceRecordSets"
            ],
            "Resource": "*"
        }
    ]
}

ICMP Redirect Host(New addr: X.X.X.X)

I'm setting up a VPN on my linux server and I wanted to have firewall rules to control communication between clients so in my OpenVPN configuration I removed client-to-client so OpenVPN pass all of the traffic to the kernel.

Once I did this and did a ping from one client to another, I would get these ICMP Redirect messages but the ping would work.

$ ping 192.168.12.127
PING 192.168.12.127 (192.168.12.127): 56 data bytes
64 bytes from 192.168.12.127: icmp_seq=0 ttl=63 time=59.198 ms
92 bytes from 192.168.12.1: Redirect Host(New addr: 192.168.12.127)
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
 4  5  00 0054 b2ac   0 0000  3f  01 f1d7 192.168.12.128 192.168.12.127

64 bytes from 192.168.12.127: icmp_seq=1 ttl=63 time=54.229 ms
92 bytes from 192.168.12.1: Redirect Host(New addr: 192.168.12.127)
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
 4  5  00 0054 b854   0 0000  3f  01 ec2f 192.168.12.128  192.168.12.127

64 bytes from 192.168.12.127: icmp_seq=2 ttl=63 time=55.497 ms
64 bytes from 192.168.12.127: icmp_seq=3 ttl=63 time=54.718 ms

After a bit of Googling I found this forum post which mentions traffic that comes in from one interface and goes out the same interface on linux causes the redirects. I then found there's a sysctl that controls this behavour called net.ipv4.conf.all.send_redirects. So a quick sysctl net.ipv4.conf.all.send_redirects=0 and the redirects went away and all was right in the world.

MTU and TCP MSS when using PPPoE

I switched to Bell from Rogers about half a year ago. A goal I had was to remove their router and use my own EdgeRouter Pro. Once I got the PPPoE connection up I was able to ping the rest of the world but couldn't load most websites. Eventually I found I had to adjust the MTU and add MSS clamping to get everything to work. At the time just blindly used MTU and MSS clamp values I found online. They turned out to be correct but last night I decided to experiment and research to find the correct values I should be using.

Finding the MTU

First you should understand that almost all networking gear has their Maximum transmission unit set to 1500 bytes for each interface. The Ethernet header overhead (18 bytes1) is not included in this. This means that the payload inside the Ethernet frame can be at most 1500 bytes long.

What goes inside the payload of the frames depends on what you are doing. If you are pinging an IP, it would be a ICMP packet inside an IP packet so to figure out the largest ICMP packet size you can use, you subtract the size of the IP header (20 bytes2) and the ICMP header (8 bytes) from the MTU: 1500 - 20 - 8 = 1472.

Throw in some PPPoE

Now if you tried to ping with the Don't fragment (DF) flag set, a packet size of 1472 should work and a packet size of 1473 should not work. Like this (on Linux):

$ ping -M do -s 1473 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 1473(1501) bytes of data.
ping: local error: Message too long, mtu=1500
ping: local error: Message too long, mtu=1500
ping: local error: Message too long, mtu=1500
ping: local error: Message too long, mtu=1500

$ ping -M do -s 1472 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 1472(1500) bytes of data.
1480 bytes from 8.8.8.8: icmp_seq=1 ttl=51 time=1.27 ms
1480 bytes from 8.8.8.8: icmp_seq=2 ttl=51 time=24.3 ms
1480 bytes from 8.8.8.8: icmp_seq=3 ttl=51 time=1.31 ms
1480 bytes from 8.8.8.8: icmp_seq=4 ttl=51 time=1.77 ms

That is unless you're connecting over PPPoE. If you are using PPPoE you will find that your ping will fail with a packet size of 1472. This is because PPPoE has its own packet header of 8 bytes. If you subtract the PPPoE header from our previous value you will get the actual largest ICMP packet size: 1472 - 8 = 1464. Now you can try pinging with the new packet size, like this (on Mac):

$ ping -D -s 1465 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 1465 data bytes
ping: sendto: Message too long
ping: sendto: Message too long
Request timeout for icmp_seq 0
ping: sendto: Message too long
Request timeout for icmp_seq 1
ping: sendto: Message too long
Request timeout for icmp_seq 2
ping: sendto: Message too long
Request timeout for icmp_seq 3

$ ping -D -s 1464 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 1464 data bytes
1472 bytes from 8.8.8.8: icmp_seq=0 ttl=59 time=6.844 ms
1472 bytes from 8.8.8.8: icmp_seq=1 ttl=59 time=7.066 ms
1472 bytes from 8.8.8.8: icmp_seq=2 ttl=59 time=7.066 ms
1472 bytes from 8.8.8.8: icmp_seq=3 ttl=59 time=7.229 ms
1472 bytes from 8.8.8.8: icmp_seq=4 ttl=59 time=7.081 ms

What is MSS clamping?

Normally your computer will be able to determine a safe MTU using Path MTU Discovery (PMTUD) but this relies on your ISP actually sending back ICMP Too Big packets. Unfortunately Bell has decided (in their infinite wisdom) that this is not a good thing (probably under the guise of "security") so they leave you high and dry because your TCP connections may end up as "black hole connections"; this happens when the TCP handshake works but trying to send any data just gets dropped silently on their side.

The solution for this is called MSS clamping. You use your firewall to override the Maximum Segment Size (MSS) option on all TCP connections so they do not have issues with packets being too large. To figure out the MSS you want, you take the standard 1500 MTU and subtract the PPPoE header, the IP header, and the TCP header (20 bytes3): 1500 - 8 - 20 - 20 = 1452.

EdgeRouter

If you have an EdgeRouter, you'll want the following configuration options to set the MTU for your PPPoE connection and MSS clamping, where eth0 is the interface you are using and vif 35 is for VLAN 35.

set firewall options mss-clamp interface-type pppoe
set firewall options mss-clamp mss 1452
set interfaces ethernet eth0 vif 35 pppoe 0 mtu 1492

Conclusion

Blindly following values I found posted online worked but I wasn't satisfied. After some experimenting and reading Wikipedia, I now am confident in 1492 as the MTU and 1452 for the TCP MSS, and I understand why they work.

Notes:
  1. Ethernet frame headers start at 18 bytes long, grow to 22 bytes with VLAN tagging, and 26 bytes with Q-in-Q VLAN tagging.
  2. IP packet header start at 20 bytes long and can be up to 60 bytes if there are options specified; however, it is rarely used.
  3. Like IP, TCP packet headers start at 20 bytes long and can be up to 60 bytes if there are options.

Clearing EX4200 PEM chassis alarms

In preparing for Black Friday we installed the UPS on the floor where our Operations but the final step of that would mean unplugging the switches and moving them to the PDU powered by the UPS because they don't have redundant power supplies. We do have extras from a floor that is not complete yet so we borrowed a normal one and a PoE one and went about plugging the extra in and swapping the switch to UPS power. This however lead to minor alarms on the switch about the power supply being removed.

--- JUNOS 12.3R6.6 built 2014-03-13 06:58:47 UTC
4 alarms currently active
Alarm time               Class  Description
2014-11-27 22:39:02 UTC  Minor  FPC 0 PEM 1 Removed
2014-11-27 22:45:21 UTC  Minor  FPC 1 PEM 1 Removed
2014-11-27 22:51:55 UTC  Minor  FPC 2 PEM 1 Removed
2014-11-27 22:57:30 UTC  Minor  FPC 3 PEM 1 Removed

A minor annoyance and unfortunately Googling around for a solution did not reveal one. I can't recall what put me down this path but I happen to try restart chassis-control gracefully. This removed the minor alarms about the power supply without any downtime so I was happy to have found this solution.