
  • I will be keeping an eye on this thread to see what other people do, but in the past I have used a couple of different health-checking strategies.

    • For web-accessible services I am running, I usually run something like Uptime Kuma or Gatus on a different box to make sure those web endpoints are available and performant. Lately I have really been digging how Gatus can check not just the response header, but also latency and certificate validity.
    • For the host machine, you can set up custom alerts within Netdata for stuff like CPU utilization and memory usage, with custom thresholds. The only other solution I have used for this is setting up alerts through my VPS provider (if it is a VPS, that is).
      • On really low-spec machines I have had trouble with Netdata, though, so I don’t have a good solution in those cases. Instead, I have resorted to just using dashdot as a PWA so that I can check it easily on my phone when I am on the go. Interested to see if there are less demanding options.
    • For some custom services that run on set schedules, I have used healthchecks.io (which you can self-host) to send alerts in case they don’t run for some reason.
    • As for the containers being restarted, I actually don’t have experience with that, so I am interested to see what others have done.
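
To make the first bullet a bit more concrete, here is a rough Python sketch of the kind of endpoint check described there: status, latency, and certificate validity. It is not how Gatus or Uptime Kuma work internally. The function names, the thresholds (500 ms, 14 days), and the `https://example.com` URL are all placeholders made up for illustration.

```python
import socket
import ssl
import time
import urllib.error
import urllib.request


def evaluate(status, latency_ms, cert_days, max_latency_ms=500, min_cert_days=14):
    """Return a list of problems; an empty list means the endpoint looks healthy.

    The default thresholds are arbitrary placeholders, not recommendations.
    """
    problems = []
    if status != 200:
        problems.append(f"unexpected status {status}")
    if latency_ms > max_latency_ms:
        problems.append(f"slow response: {latency_ms:.0f} ms")
    if cert_days < min_cert_days:
        problems.append(f"certificate expires in {cert_days} days")
    return problems


def days_until_cert_expiry(host, port=443, timeout=5.0):
    """Open a TLS connection and return whole days until the certificate expires."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            not_after = tls.getpeercert()["notAfter"]
    expires = ssl.cert_time_to_seconds(not_after)
    return int((expires - time.time()) // 86400)


def probe(url, host, timeout=5.0):
    """Fetch the URL once, time it, and evaluate the result."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx; the status code is still what we want.
        status = err.code
    latency_ms = (time.monotonic() - start) * 1000
    return evaluate(status, latency_ms, days_until_cert_expiry(host))


if __name__ == "__main__":
    # Placeholder target; swap in your own service.
    for problem in probe("https://example.com", "example.com"):
        print("ALERT:", problem)
```

Run from cron on a different box than the one being monitored, something like this only covers the bare minimum. The dedicated tools add retries, history, and notification channels on top.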