Samuel Kadolph

Amazon SES and BYODKIM

I started testing Amazon SES and didn't want to use their ugly EasyDKIM domains, so I went with BYODKIM (Bring Your Own DKIM). Having never generated my own DKIM key before, I did a bit of searching, and it turns out to be really simple.

Generate a 2048 (or 1024) bit RSA key: openssl genrsa -out dkim.priv 2048
Extract the public key: openssl rsa -in dkim.priv -pubout -out dkim.pub
Strip the header, footer, and newlines from the private key so you can paste it into the SES console: sed '1d;$d' dkim.priv | tr -d '\n'
Create a TXT DNS record at <selector>._domainkey.<your domain> (using the selector you entered in SES) with this value for the public key: echo "v=DKIM1\; k=rsa\; p=$(sed '1d;$d' dkim.pub | tr -d '\n')"
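The steps above can be wrapped into a couple of shell functions. This is a minimal sketch, assuming the dkim.priv and dkim.pub file names used above; the function names pem_body, dkim_txt, and generate_dkim are hypothetical helpers, not part of any AWS tooling. Unlike the one-liner above, this sketch prints the semicolons unescaped; add the backslashes back if your DNS tooling needs them escaped.

```bash
#!/usr/bin/env bash
# Sketch: turn PEM key files into the single-line strings SES BYODKIM needs.

# Print the base64 body of a PEM file: drop the BEGIN/END lines and join the rest.
pem_body() {
  sed '1d;$d' "$1" | tr -d '\n'
}

# Print the TXT record value for a public key PEM file.
# Semicolons are NOT escaped here (assumption: your DNS console takes the value verbatim).
dkim_txt() {
  printf 'v=DKIM1; k=rsa; p=%s\n' "$(pem_body "$1")"
}

# Hypothetical convenience wrapper: generate a key pair in the current
# directory and print both values. Requires openssl.
generate_dkim() {
  local bits="${1:-2048}"
  openssl genrsa -out dkim.priv "$bits" 2>/dev/null
  openssl rsa -in dkim.priv -pubout -out dkim.pub 2>/dev/null
  echo "Private key for the SES console:"
  pem_body dkim.priv
  echo
  echo "TXT value for <selector>._domainkey.<your domain>:"
  dkim_txt dkim.pub
}
```

Running generate_dkim 2048 in an empty directory writes both key files and prints the two strings to paste into SES and your DNS provider.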

Now wait a bit and you should see your domain validated in the SES console.

octoDNS and Route53

Just a quick and simple post. If you want to use octoDNS with Amazon's Route53, you can use the following permission policy to restrict the user to only what octoDNS needs to do its job.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "route53:ChangeResourceRecordSets",
                "route53:CreateHostedZone",
                "route53:ListHealthChecks",
                "route53:ListHostedZones",
                "route53:ListHostedZonesByName",
                "route53:ListResourceRecordSets"
            ],
            "Resource": "*"
        }
    ]
}

ICMP Redirect Host(New addr: X.X.X.X)

I'm setting up a VPN on my Linux server and wanted firewall rules to control communication between clients, so I removed client-to-client from my OpenVPN configuration. Without it, OpenVPN passes all client-to-client traffic to the kernel instead of handling it internally.

Once I did this and pinged one client from another, I got these ICMP Redirect messages, although the ping itself still worked:

$ ping 192.168.12.127
PING 192.168.12.127 (192.168.12.127): 56 data bytes
64 bytes from 192.168.12.127: icmp_seq=0 ttl=63 time=59.198 ms
92 bytes from 192.168.12.1: Redirect Host(New addr: 192.168.12.127)
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
 4  5  00 0054 b2ac   0 0000  3f  01 f1d7 192.168.12.128 192.168.12.127

64 bytes from 192.168.12.127: icmp_seq=1 ttl=63 time=54.229 ms
92 bytes from 192.168.12.1: Redirect Host(New addr: 192.168.12.127)
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
 4  5  00 0054 b854   0 0000  3f  01 ec2f 192.168.12.128  192.168.12.127

64 bytes from 192.168.12.127: icmp_seq=2 ttl=63 time=55.497 ms
64 bytes from 192.168.12.127: icmp_seq=3 ttl=63 time=54.718 ms

After a bit of Googling I found this forum post, which explains that on Linux, traffic that comes in on one interface and goes back out the same interface causes the kernel to send redirects: it is telling the sender that a more direct route exists. There is a sysctl that controls this behaviour called net.ipv4.conf.all.send_redirects. A quick sysctl net.ipv4.conf.all.send_redirects=0 later, the redirects went away and all was right in the world.
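A setting applied with sysctl on the command line is lost on reboot. Here is a minimal sketch for persisting it, assuming a distribution that reads /etc/sysctl.d/*.conf at boot (systemd-based ones do); the file name 90-no-redirects.conf and the function name are hypothetical. The kernel's ip-sysctl documentation says redirects stay enabled on an interface if either the all setting or that interface's own send_redirects is true, so the sketch also clears default (the value new interfaces such as tun0 inherit) in case all alone is not enough on your system.

```bash
#!/usr/bin/env bash
# Sketch: persist "don't send ICMP redirects" across reboots.
# Usage (as root):
#   write_no_redirects_conf /etc/sysctl.d/90-no-redirects.conf && sysctl --system

write_no_redirects_conf() {
  local path="$1"
  cat > "$path" <<'EOF'
# Stop the kernel sending ICMP redirects when VPN client-to-client traffic
# enters and leaves through the same interface.
net.ipv4.conf.all.send_redirects = 0
# Interfaces created later (e.g. tun0 when OpenVPN starts) inherit this value.
net.ipv4.conf.default.send_redirects = 0
EOF
}
```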

Change Proxmox VM ID with ZFS root storage

I recently started using Proxmox to run a few VMs, and one issue I quickly ran into was wanting to change the ID of a VM I had already created. After a bit of Googling I found a solution; however, it did not work for me because it assumes you are using LVM for your root storage. I love ZFS, so that's what I run on most of my systems. Like Docker does with containers, Proxmox stores VM disks as separate ZFS datasets. With a slight tweak to that solution's code, you can easily change the ID of your VMs when you're using ZFS.

I've written a little script you can put on your Proxmox server. Once you've made sure your VM is not running, a quick call will change the VM ID: ./change-vm-id 103 199. The basic process is: rename the VM's ZFS datasets, change the disk IDs in the VM's conf file, and then rename the conf file. Once you do this, the VM should show up in the Proxmox GUI with the new ID right away.

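#!/bin/bash
# Change the ID of a stopped Proxmox VM whose disks are stored on ZFS.
# usage: ./change-vm-id old-id new-id

POOL=rpool
CONF_DIR=/etc/pve/qemu-server

if [[ $# -ne 2 ]]; then
  echo "usage: $0 old-id new-id" >&2
  exit 1
fi

old_id=$1
new_id=$2

# Refuse to rename a running VM or overwrite an existing config.
if qm status "$old_id" 2>/dev/null | grep -q running; then
  echo "VM $old_id is running, stop it first" >&2
  exit 1
fi
if [[ -e $CONF_DIR/$new_id.conf ]]; then
  echo "VM $new_id already exists" >&2
  exit 1
fi

# -H omits the header line so only dataset names are listed.
if ! disks=$(zfs list -H -r -o name "$POOL/data" | grep "vm-${old_id}-disk"); then
  echo "did not find any disks, check old vm id and running zfs" >&2
  exit 1
fi

# Rename each disk dataset, e.g. rpool/data/vm-103-disk-0 -> rpool/data/vm-199-disk-0.
for disk in $disks; do
  zfs rename "$disk" "${disk//vm-${old_id}-disk/vm-${new_id}-disk}"
done

# Point the config at the renamed disks, then rename the config itself.
sed -i "s/vm-${old_id}-disk/vm-${new_id}-disk/g" "$CONF_DIR/$old_id.conf"
mv "$CONF_DIR/$old_id.conf" "$CONF_DIR/$new_id.conf"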

Linux ZFS Root & Datasets & systemd-networkd

On my servers I prefer to have the root filesystem on two SSDs in a ZFS mirror. That way you get bit rot detection, snapshots before significant changes, separate datasets, and redundancy. I follow the OpenZFS guide with a few tweaks to set this up. I then create a dataset mounted at /c, where I keep all of my configuration files, and symlink them back to their original locations; however, this leads to an issue during boot for some services.

The problem is that only the root filesystem is mounted early in boot; nested datasets are mounted later as part of the local-fs.target chain, so any service that starts before local-fs.target sees an empty directory. In my case, I wanted my systemd-networkd configuration files stored in /c, but when systemd-networkd starts, /c isn't mounted yet, so the symlinks are dangling and the interfaces don't get configured.

The solution is fairly simple: we want systemd-networkd to start after local-fs.target. To accomplish this, run sudo systemctl edit systemd-networkd, which opens an override.conf file for editing. Add the following, save, and exit.

[Unit]
After=local-fs.target

Now on your next boot, everything should work properly. This should work for most units, but it won't work for generators (such as netplan.io) because they run very early in the systemd boot process, before any units are started.
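If you provision machines with scripts, the same drop-in can be written without the interactive editor. A minimal sketch: systemctl edit stores overrides in /etc/systemd/system/<unit>.d/override.conf, so the hypothetical helper below writes that file under a root directory you pass in (use / on a real system, then run systemctl daemon-reload).

```bash
#!/usr/bin/env bash
# Sketch: write the same override that `systemctl edit systemd-networkd` creates.
# Usage (as root): write_networkd_override / && systemctl daemon-reload

write_networkd_override() {
  local root="${1:-/}"
  local dir="${root%/}/etc/systemd/system/systemd-networkd.service.d"
  mkdir -p "$dir"
  # Order networkd after local-fs.target so nested ZFS datasets such as /c
  # are mounted before it reads its (symlinked) configuration files.
  printf '[Unit]\nAfter=local-fs.target\n' > "$dir/override.conf"
}
```

Taking a root directory instead of hard-coding / makes it easy to try the function against a scratch directory first.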

Plex GPU Transcoding in Docker on Debian 10

With Docker 19.03 adding native support for GPU passthrough and Plex's GPU transcoding now reliable and stable, it's very easy to get both working together for some super duper GPU transcoding.

I recently installed an NVIDIA Quadro RTX 4000 in my 2U server, and after installing the required packages and adding one flag to docker, Plex was able to use the GPU.

nvidia-driver

First, we'll install the latest NVIDIA drivers for Debian buster from buster-backports. If you're on stretch or earlier, you will have to install the NVIDIA drivers manually. ffmpeg (which Plex uses for transcoding) requires at least driver version 418.30, so check what your distro provides if you aren't on Debian.

echo "deb http://deb.debian.org/debian buster-backports main contrib non-free" | sudo tee /etc/apt/sources.list.d/buster-backports.list

sudo apt update

sudo apt install linux-headers-$(uname -r|sed 's/[^-]*-[^-]*-//')
sudo apt install -yt buster-backports nvidia-driver libcuda1 libnvidia-encode1 libnvcuvid1

Once that's done, reboot so the nouveau driver doesn't get in the way. When you're back up, you should be able to run nvidia-smi and see your GPU.
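Since ffmpeg needs driver 418.30 or newer, a quick check after the reboot can save some head-scratching later. A minimal sketch: nvidia-smi --query-gpu=driver_version --format=csv,noheader is a standard nvidia-smi query, while the function names and the use of GNU sort -V for the version comparison are choices made for this sketch.

```bash
#!/usr/bin/env bash
# Sketch: check that the installed NVIDIA driver meets ffmpeg's 418.30 minimum.

MIN_DRIVER=418.30

# Succeeds if version $1 >= version $2, using sort's version ordering.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

check_driver() {
  local v
  v=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader) || return 1
  v=${v%%$'\n'*}   # keep only the first GPU's line
  if version_ge "$v" "$MIN_DRIVER"; then
    echo "driver $v OK"
  else
    echo "driver $v is older than $MIN_DRIVER" >&2
    return 1
  fi
}
```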

nvidia-docker

Now on to NVIDIA and Docker. Install Docker if you haven't already, then add the nvidia-docker GPG key and apt repository.

curl -sL https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -sL https://nvidia.github.io/nvidia-docker/debian10/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

sudo apt-get update

sudo apt-get install -y nvidia-container-toolkit

sudo systemctl restart docker

plex

With docker restarted, you should be able to run docker run --gpus all nvidia/cuda:10.0-base nvidia-smi and see the same output as you did on the host. The next step is to add --gpus all (see usage here) to your Plex container. Note that at the time of writing, docker-compose did not support GPUs, so you will have to do this with docker run for your container. For example, my Plex is launched with a command similar to this:

docker run --name plex --restart unless-stopped --gpus all --network=host --env VERSION=latest --volume /plex:/config --volume /media:/media linuxserver/plex

Once Plex is up, go to Settings > Transcoder and enable Use hardware acceleration when available and Use hardware-accelerated video encoding. Now start watching something and make sure it's transcoding. You should then be able to run nvidia-smi and see the Plex Transcoder processes and their GPU usage (mostly memory). You should see something like this:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.82       Driver Version: 440.82       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro RTX 4000     On   | 00000000:03:00.0 Off |                  N/A |
| 45%   74C    P0    86W / 125W |   2159MiB /  7982MiB |     11%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     32212      C   /usr/lib/plexmediaserver/Plex Transcoder    1501MiB |
|    0     32870      C   /usr/lib/plexmediaserver/Plex Transcoder     342MiB |
|    0     39082      C   /usr/lib/plexmediaserver/Plex Transcoder     302MiB |
+-----------------------------------------------------------------------------+

As you can see, 4K streams are quite memory intensive: a 4K frame has 4x the pixels of a 1080p frame, which in turn has 2.25x the pixels of a 720p frame. GPU transcoding is primarily limited by memory. Second, it is limited by the number of simultaneous streams: at the time of writing, NVIDIA limits consumer cards to 2 (there is a patch to remove that limit, though). Third, you're limited by the NVENC and NVDEC frames-per-second throughput. This website has lots of information comparing various GPUs and their transcoding performance.

If you don't see Plex in there, look in Plex Media Server.log for any debug messages that might show why it's unable to use your hardware for transcoding. In my case I had 2 error messages, Cannot load libnvidia-encode.so.1 and Cannot load libnvcuvid.so.1, which is why you installed those libraries earlier.

Enjoy your sweet GPU transcoding!

Improving NZBGet Performance

After finally getting fibre to the home, my ISO download speeds went up dramatically compared to my old DSL connection, and the slowest part of the NZB download process became unpacking. I had NZBGet downloading and unpacking onto my root filesystem (two SSDs in a ZFS mirror), so I figured moving them to separate drives might improve performance. The results were quite amazing.

I bought two 1TB Samsung 860 PRO drives, put them in my server, formatted them with ext4 (with journaling off), and moved NZBGet's downloading and completed directories onto them.

ISO  Size     Unpack time on root (ZFS mirror)  Unpack time on dedicated drives
A    35.2GB   4:16 (256s)                       1:20 (80s)
B    64.2GB   14:08 (848s)                      1:46 (106s)

At first I couldn't believe it, but after testing a few more times I confirmed the results. So if you find your unpacking process is slow, add some dedicated drives for the downloading and unpacking directories.
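To put a number on the difference, here is a small sketch that converts the m:ss timings from the table to seconds and prints the speedup. The function names are hypothetical; only shell arithmetic and awk are used.

```bash
#!/usr/bin/env bash
# Convert an m:ss duration to seconds (10# avoids octal parsing of e.g. "08").
to_seconds() {
  local m=${1%%:*} s=${1##*:}
  echo $(( 10#$m * 60 + 10#$s ))
}

# Print how many times faster the second duration is than the first.
speedup() {
  awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" 'BEGIN { printf "%.1fx\n", a / b }'
}
```

For the two ISOs, speedup 4:16 1:20 prints 3.2x and speedup 14:08 1:46 prints 8.0x, so the larger ISO benefited far more from the dedicated drives.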

Next Hardware Project: Console Switcher

It's been a while since my last hardware project, so with the new games room in Shopify's Ottawa office, I decided to build a retro-gaming themed box to switch between the consoles hooked up to the TV. When equipping the room, I specifically picked the LG 60LB6500 because it has RS232 control through a 3.5mm TRS jack. I later learned that all LG TVs support RS232 through a DB9 port, a 3.5mm TRS jack, or a USB serial adapter; however, the USB serial adapter cannot turn the TV on because the USB port does not provide power while the TV is off.

The first piece I wanted to find was the enclosure, since it is the most important part of this project: I had a very specific design in mind. After searching eBay on and off for a week, I finally found one that was exactly what I wanted: short but wide, with a sloped front for the buttons so you can easily press them.

Enclosure

Next was finding buttons that I liked. The first ones I liked were from Adafruit, but they were a bit on the small side at 16mm in diameter and only available in 4 colours: blue, green, red, and white. I really wanted a unique colour for each of the 5 buttons: power, Xbox, PlayStation, Wii, and Composite. I continued my search and eventually found the anti-vandal line of switches from E-Switch. They looked very much like the Adafruit switches, so much so that I believe the Adafruit ones are actually Chinese knock-offs, given their cheap price and the identical ones I've seen on AliExpress. The PV7, PV8, and PV9 lines looked the most promising, as they were large and came in 6 colours. Finding a distributor that actually stocked all 6 colours (blue, green, red, yellow, white, and orange) was harder than I expected. Eventually I found Straight Road Electronics, which carries the white, red, orange, green, blue, and yellow switches I wanted. The switches are definitely on the pricey side at $22 each.

PV7 Switch

With the buttons picked, it was time to move on to the design. Based on the measurements, I started on the sloped part that holds the 5 buttons and the volume knob. I'm using QCAD (well worth the £33 for the pro version). After a bajillion (actually around 40-50) iterations, this is the design I've settled on:

Sloped Design

My plan is to find a company in Ottawa that can laser cut and etch the aluminum case; otherwise I will use the Epilog Mini 24 at the Ottawa Imagine Space and drill the holes myself. The back of the case also needs to be laser cut and etched for the power and 3.5mm TRS jacks. I also decided to laser etch the front, which meant I had to come up with a name. Tom suggested CNSL and I added SWTCR to make CNSL SWTCR. The name has grown on me. So has the yellow colour in the drawing; the final look definitely needs to include it. Laser etching normally leaves the aluminum with its natural look (in this case it regains that look where the black finish is removed), but you can fill the groove with paint and then wipe away the excess. Watch this video if you are curious.

Rear
Front

While designing the layouts, I was also trying to find the perfect volume knob. I've ordered a few because there are a lot of different designs, but my current favourite is a simple one with just a notch as the indicator and a reasonably large 35mm diameter. It looks like the top is a different colour, but the whole thing is black, which matches the enclosure.

Volume Knob

And for the brains, my microcontroller of choice is the Teensy. It's faster than an Arduino and has more inputs and outputs. A Teensy combined with a Darlington transistor array and a TTL-to-RS232 transceiver is all the electronics needed to connect the switches, potentiometer, switch LEDs, and the TV. It took me longer to find the term "transistor array" than I'm willing to admit, but I did find it, and it will be very useful for driving up to 7 LEDs from a compact DIP package that fits easily on the prototype board I picked.

I've previously used the Teensy for 2 hardware projects (a USB emergency stop button and traffic lights for stairs) and highly recommend it.

With all the parts ordered, all I can do now is wait for them to arrive from around the world and tweak the design while I look for someone to laser cut and etch it onto the unusual shape of the case. I plan on taking lots of photographs during the build and documenting the process so anyone can reproduce it on their own.

MTU and TCP MSS when using PPPoE

I switched to Bell from Rogers about half a year ago. One goal I had was to remove their router and use my own EdgeRouter Pro. Once I got the PPPoE connection up, I was able to ping the rest of the world but couldn't load most websites. Eventually I found I had to adjust the MTU and add MSS clamping to get everything working. At the time I just blindly used MTU and MSS clamp values I found online. They turned out to be correct, but last night I decided to experiment and research the correct values I should be using.

Finding the MTU

First you should understand that almost all networking gear has its maximum transmission unit (MTU) set to 1500 bytes on each interface. The Ethernet header overhead (18 bytes¹) is not included in this, which means the payload inside an Ethernet frame can be at most 1500 bytes long.

What goes inside the payload of the frame depends on what you are doing. If you are pinging an IP, it is an ICMP packet inside an IP packet, so to figure out the largest ICMP payload you can send, subtract the size of the IP header (20 bytes²) and the ICMP header (8 bytes) from the MTU: 1500 - 20 - 8 = 1472.

Throw in some PPPoE

Now if you ping with the Don't Fragment (DF) flag set, a payload size of 1472 should work and 1473 should not, like this (on Linux):

$ ping -M do -s 1473 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 1473(1501) bytes of data.
ping: local error: Message too long, mtu=1500
ping: local error: Message too long, mtu=1500
ping: local error: Message too long, mtu=1500
ping: local error: Message too long, mtu=1500

$ ping -M do -s 1472 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 1472(1500) bytes of data.
1480 bytes from 8.8.8.8: icmp_seq=1 ttl=51 time=1.27 ms
1480 bytes from 8.8.8.8: icmp_seq=2 ttl=51 time=24.3 ms
1480 bytes from 8.8.8.8: icmp_seq=3 ttl=51 time=1.31 ms
1480 bytes from 8.8.8.8: icmp_seq=4 ttl=51 time=1.77 ms

That is, unless you're connecting over PPPoE. If you are using PPPoE, you will find that your ping fails with a payload size of 1472. This is because PPPoE adds its own 8-byte header. Subtract the PPPoE header from our previous value and you get the actual largest ICMP payload: 1472 - 8 = 1464. Now try pinging with the new size, like this (on Mac):

$ ping -D -s 1465 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 1465 data bytes
ping: sendto: Message too long
ping: sendto: Message too long
Request timeout for icmp_seq 0
ping: sendto: Message too long
Request timeout for icmp_seq 1
ping: sendto: Message too long
Request timeout for icmp_seq 2
ping: sendto: Message too long
Request timeout for icmp_seq 3

$ ping -D -s 1464 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 1464 data bytes
1472 bytes from 8.8.8.8: icmp_seq=0 ttl=59 time=6.844 ms
1472 bytes from 8.8.8.8: icmp_seq=1 ttl=59 time=7.066 ms
1472 bytes from 8.8.8.8: icmp_seq=2 ttl=59 time=7.066 ms
1472 bytes from 8.8.8.8: icmp_seq=3 ttl=59 time=7.229 ms
1472 bytes from 8.8.8.8: icmp_seq=4 ttl=59 time=7.081 ms
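Rather than trying sizes by hand, the search for the largest working payload can be automated with a binary search. This is a minimal sketch assuming Linux ping (-M do sets DF, -c 1 sends one packet, -W 1 waits one second for a reply); on macOS use -D instead of -M do, as above. The function names are hypothetical, and the probe command is passed in as a parameter so the search logic can be tried without a network.

```bash
#!/usr/bin/env bash
# Sketch: binary-search the largest ICMP payload that gets through with DF set.

# Default probe: one Linux ping to host $1 with Don't Fragment and payload size $2.
ping_df() {
  ping -M do -c 1 -W 1 -s "$2" "$1" >/dev/null 2>&1
}

# find_max_payload HOST LOW HIGH [PROBE]
# Assumes LOW is known to work; prints the largest size in [LOW, HIGH] that passes.
find_max_payload() {
  local host=$1 lo=$2 hi=$3 probe=${4:-ping_df} mid
  while (( lo < hi )); do
    mid=$(( (lo + hi + 1) / 2 ))   # round up so the loop always makes progress
    if "$probe" "$host" "$mid"; then
      lo=$mid
    else
      hi=$(( mid - 1 ))
    fi
  done
  echo "$lo"
}
```

find_max_payload 8.8.8.8 1000 1500 should print 1472 on a plain Ethernet path and 1464 over PPPoE; adding back the 28 bytes of IP and ICMP headers gives the path MTU (1464 + 28 = 1492).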

What is MSS clamping?

Normally your computer can determine a safe MTU using Path MTU Discovery (PMTUD), but this relies on your ISP actually sending back ICMP "Fragmentation Needed" messages (called "Packet Too Big" in ICMPv6). Unfortunately Bell has decided (in their infinite wisdom) that this is not a good thing (probably under the guise of "security"), so they leave you high and dry: your TCP connections may end up as "black hole connections", where the TCP handshake works but any full-size data packets are silently dropped on their side.

The solution is called MSS clamping: your firewall rewrites the Maximum Segment Size (MSS) option in the SYN packets of every TCP connection so that neither side sends segments too large to fit through the PPPoE link. To figure out the MSS you want, take the standard 1500 MTU and subtract the PPPoE header, the IP header, and the TCP header (20 bytes³): 1500 - 8 - 20 - 20 = 1452.
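The arithmetic from this post in one place, as a sketch using the header sizes given above (1500-byte Ethernet payload, 8-byte PPPoE header, and 20-byte IPv4, 8-byte ICMP, and 20-byte TCP headers, all without options). The variable and function names are illustrative only.

```bash
#!/usr/bin/env bash
# Header sizes in bytes (IPv4 and TCP without options).
ETH_MTU=1500
PPPOE=8
IP=20
ICMP=8
TCP=20

pppoe_mtu()        { echo $(( ETH_MTU - PPPOE )); }              # MTU for the PPPoE interface
max_icmp_payload() { echo $(( ${1:-$ETH_MTU} - IP - ICMP )); }   # largest ping -s for an MTU
tcp_mss()          { echo $(( ${1:-$ETH_MTU} - IP - TCP )); }    # MSS to clamp to for an MTU
```

pppoe_mtu gives 1492, max_icmp_payload 1492 gives the 1464 found with ping, and tcp_mss 1492 gives the 1452 used for clamping.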

EdgeRouter

If you have an EdgeRouter, you'll want the following configuration options to set the MTU for your PPPoE connection and enable MSS clamping, where eth0 is the interface you are using and vif 35 is VLAN 35.

set firewall options mss-clamp interface-type pppoe
set firewall options mss-clamp mss 1452
set interfaces ethernet eth0 vif 35 pppoe 0 mtu 1492

Conclusion

Blindly following values I found posted online worked, but I wasn't satisfied. After some experimenting and reading Wikipedia, I am now confident in 1492 as the MTU and 1452 as the TCP MSS, and I understand why they work.

Notes:

¹ Ethernet frame headers start at 18 bytes, grow to 22 bytes with VLAN tagging, and to 26 bytes with Q-in-Q VLAN tagging.
² IP packet headers start at 20 bytes and can be up to 60 bytes if options are specified; however, options are rarely used.
³ Like IP, TCP packet headers start at 20 bytes and can be up to 60 bytes if there are options.

Clearing EX4200 PEM chassis alarms

While preparing for Black Friday, we installed a UPS on our Operations floor. The final step meant unplugging the switches and moving them to the PDU powered by the UPS, because they don't have redundant power supplies. We had spare power supplies from a floor that isn't complete yet, so we borrowed a normal one and a PoE one, plugged the spare in, and swapped the switch over to UPS power. This, however, led to minor alarms on the switch about a power supply being removed.

--- JUNOS 12.3R6.6 built 2014-03-13 06:58:47 UTC
4 alarms currently active
Alarm time               Class  Description
2014-11-27 22:39:02 UTC  Minor  FPC 0 PEM 1 Removed
2014-11-27 22:45:21 UTC  Minor  FPC 1 PEM 1 Removed
2014-11-27 22:51:55 UTC  Minor  FPC 2 PEM 1 Removed
2014-11-27 22:57:30 UTC  Minor  FPC 3 PEM 1 Removed

A minor annoyance, and unfortunately Googling around did not turn up a solution. I can't recall what put me down this path, but I happened to try restart chassis-control gracefully. This cleared the minor alarms about the power supply without any downtime, so I was happy to have found this solution.