I’ve worked on several different teams over the past 8 years I’ve worked at Amazon. Each one of them had on-call in which the engineers were on-call to keep the system running 24/7 for a week. If something broke at 2am, they’d get paged to fix it.
Now, Amazon’s a big company. On-call varied quite a bit. Some teams had more ops load, others had barely any. I had my fair share of weeks with lots of tickets, but usually I sought out teams where it was more manageable. However, those engineers frequently struggled to get anywhere, playing a bit of on-call hot potato with the next on-call. Sadly, Amazon largely did not leverage SREs or dedicated support groups except for the most critical systems. I do wish they would have leveraged them.
Here’s my strategy that I employed when I joined a new team.
I hope you know your AWS icons. There’s over 200 services and I have to guess frequently when playing the AWS Logo Quiz. While this diagram could easily add some descriptive labels to help, the icons assume developers can remember what the icon means. Some color-blind people may even struggle to see the difference in coloring that AWS uses for different types of services. These icons become visually cluttered and distract the viewer from what matters–your system.
I’ve previously explored the world of home energy monitoring systems and in the past arrived at using the Brultech GreenEye Monitor for a project in a friend’s house. It had the advantage of being local out-of-the-box and had a wide range of compact CTs that made fitting the electronics in the breaker box a lot easier, but it had one flaw that made it not suitable for my condo. It had to be mounted outside the breaker box with wires running into the box. I had no space in my condo, so I instead explored other options.
I came across the Emporia Vue2 and identified that it was running a standard ESP32 device and was easy to reflash with custom ESPHome firmware. ESPHome is an open-source framework for creating firmware to collect data from a variety of different sensors and publish it to MQTT/Home Assistant. This sounded perfect, so I ordered a Vue2 and here’s how I made it work.
I upgraded to Apache Zeppelin v0.10.x from v0.9.x and randomly my Python Matplotlib scripts stopped rendering images. Anything that called the plot method would just return the string response of the function. Like below:
import matplotlib.pyplot as plt
plt.plot([1, 2, 3])
[<matplotlib.lines.Line2D at 0x7ff547624210>]
In an earlier post, I made an error that incorrectly aggregated the energy data which resulted in hugely inflated aggregated energy usage. All the un-aggregated data was accurate, but the sums were wrong. Luckily I had all the raw data stored in InfluxDB and could rebuild it.
In this post, I walk through how to re-write the Home Assistant Long-term statistics database to fix this mistake.
My external cluster runs on 3 different dedicated servers (most from SoYouStart.com.) I have 3 machines since the Kubernetes control plane needs 3 or more to be able to have a quorum and be able to handle any one machine going down. If one machine goes down, then the other two maintain a majority and can agree on the state of the cluster.
I randomly encountered issues where the Kubernetes control plane of Rancher UI would crash and restart. While this cluster didn’t really matter, it still annoyed me and wanted to figure it out.
I narrowed it down to one single host and documented the steps I took to resolve this issue which seems to be have been caused by one machine using HDDs and all other hosts using SSDs.
Air—it’s invisible, I can’t see it, but I feel effects of it in so many ways, temperature, humidity, gas composition, but I lacked sensors to measure it. In this post, I walk through some different Air Quality sensors that I found and how I wired them up into a dashboard.
In the previous post in this series, I selected an energy monitoring system that is purely local based (no cloud), integrates into the breaker box, and showed how to connect it to the network and configure the size of each circuit. In this post, I’ll show how to connect the BrulTech GreenEye Energy Monitor to HomeAssistant and create some useful monitoring dashboards.
There were a few services that I ran that I wanted to be able to access from both inside my home network and outside my home network. If I was inside my home network, I wanted to route directly to the service, but if I was outside I needed to be able to route traffic through a proxy that would then route into my home lab. Additionally, I wanted to support SSL on all my services for security using cert-manager
Since my IPv4 addresses differ inside my network vs outside, I need to use split-horizon DNS to respond with the correct DNS query. Split-horizon DNS refers to the DNS on one horizon (inside the network) showing different results than outside the network.