We’ve seen an example of how to set up PostgreSQL monitoring in Kubernetes. We’ve looked at two sets of statistics to keep track of in your PostgreSQL cluster: your vitals (CPU/memory/disk/network) and your DBA fundamentals.
While starting at these charts should help you to anticipate, diagnose, and respond to issues with your Postgres cluster, the odds are that you are not staring at your monitor 24 hours a day. This is where alerts come in: a properly set up alerting system will let you know if you are on the verge of a major issue so you can head it off at the pass (and alerts should also let you know that there is a major issue).
Dealing with operational production issues was a departure from my application developer roots, but I looked at it as an opportunity to learn a new set of troubleshooting skills. It also offered an opportunity to improve communication skills: I would often convey to the team and customers what transpired during a downtime or performance degradation situation (VSSE: be transparent!). Some of what I observed I used to help us to improve to application, while other parts helped me to better understand how PostgreSQL works.
But I digress: let’s drill into alerts on your Postgres database.
Note that just because an alert or alarm is going off, it does not mean you need to immediately react: for example, a transient network degradation issue may cause replica to lag further behind a primary for a bit too long but will clear up when the degradation passes. That said, you typically want to investigate the alert to understand what is causing it.
Additionally, it’s important to understand what actions you want to take to solve the problem. For example, a common mistake during an “out-of-disk” error is to delete the PostgreSQL WAL logs with rm command; doing so can lead to a very bad day (and is also an advertisement for ensuring you have backups).
As mentioned in the post on setting up PostgreSQL monitoring in Kubernetes, the Postgres Operator uses pgMonitor for metric collection and visualization via open source projects like Prometheus and Grafana. pgMonitor uses open source Alertmanager for configuring and sending alerts, and is what the PostgreSQL Operator uses.
Using the above, let’s dive into some of the items that you should be alerting on, and I will describe how my experience as an app developer translated into troubleshooting strategies.
I remember deciding to pursue my first IT certification, the CompTIA A+. I had signed…
Key takeaways The transformer architecture has proved to be revolutionary in outperforming the classical RNN…
Once we learn how to deploy an Ubuntu server, how to manage users, and how…
Key-takeaways: Clean code isn’t just a nice thing to have or a luxury in software projects; it's a necessity. If we…
While developing a web application, or setting dynamic pages and meta tags we need to deal with…
Software architecture is one of the most discussed topics in the software industry today, and…