• 10 Posts
  • 94 Comments
Joined 11 months ago
Cake day: July 29th, 2023


  • I was more “homelab” than “selfhosted” at first, so I was just stuffing around playing with things. That started to seem sort of pointless, so I wanted to run real workloads, and then I discovered that was super useful and I loved extracting myself from commercial cloud services (Dropbox etc). The point of this story is that I built most of the infrastructure before I was running services that I (or family) depended on. That dependence is where it can become a source of stress rather than fun, which is what I’m guessing you’re finding yourself in.

    There’s no real way around this (the pressure you’re feeling). If you are running real services, it is going to take some sysadmin work to get to the point where you feel relaxed that you can quickly deal with any problems. There’s lots of good advice elsewhere in this thread about bits and pieces to do this - the exact methods are going to vary according to your needs. Here’s mine (which is not perfect!).

    • I’m running on a single mini PC and a Synology NAS set up for RAID 5
    • I’ve got a nearly identical spare mini PC, and swap over to it for a couple of weeks (originally every month, but stretched out when I’m busy). That tests my ability to recover from that hardware failure.
    • All my local workloads are in LXC containers or VMs on Proxmox with automated snapshots that are my (bulky) backups, but allow for restoration in minutes if needed.
    • The NAS is backed up locally to an external USB drive that’s not usually plugged in, and to a similar, lower-specced setup 300km away.
    • All the workloads are dockerised, and I have a standard directory structure and compose approach, so if I need to upgrade something or do some other maintenance on something I don’t often touch, I know where everything is without looking back to the playbook
    • I don’t use a script or Terraform to set those up. I’ve got a Proxmox template with Docker and Tailscale etc installed that I use, so the only bit of unique infrastructure is the docker compose file, which is source controlled on Forgejo
    • Everything’s on UPSs
    • I have a bunch of Ansible playbooks for routine maintenance such as apt updates, also in source control
    • All the VPS workloads are dockerised with the same directory structure, and behind NGINX PM. I’ve gotten super comfortable with one VPS provider, so that’s a weakness. I should try moving them one day. They are mostly static websites, plus one important web app that I have a tested backup strategy for, but not an automated one, so that needs to be addressed.
    • I use a local and an external Uptime Kuma for monitoring, enhanced by running a tiny server on every instance that just exposes a disk-free and memory-free API that Uptime Kuma can consume.
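
    For anyone wondering what a “tiny server” like that might look like, here is a minimal sketch in Go. It is not my actual program: the `/stats` path, port 8080, the JSON field names and the choice of `MemAvailable` (rather than `MemFree`) are all assumptions for illustration, and it only works on Linux because it reads `/proc/meminfo`.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"os"
	"strconv"
	"strings"
	"syscall"
)

// stats is the JSON payload. The field names are illustrative only.
type stats struct {
	DiskFreeBytes uint64 `json:"disk_free_bytes"`
	MemFreeBytes  uint64 `json:"mem_free_bytes"`
}

// parseMemAvailable finds the MemAvailable line (reported in kB) in
// /proc/meminfo-style text and returns the value in bytes.
func parseMemAvailable(meminfo string) (uint64, error) {
	sc := bufio.NewScanner(strings.NewReader(meminfo))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) >= 2 && fields[0] == "MemAvailable:" {
			kb, err := strconv.ParseUint(fields[1], 10, 64)
			if err != nil {
				return 0, err
			}
			return kb * 1024, nil
		}
	}
	return 0, fmt.Errorf("MemAvailable not found")
}

// diskFree returns the bytes available to unprivileged users on the
// filesystem that contains path.
func diskFree(path string) (uint64, error) {
	var fs syscall.Statfs_t
	if err := syscall.Statfs(path, &fs); err != nil {
		return 0, err
	}
	return fs.Bavail * uint64(fs.Bsize), nil
}

// handler gathers both readings and writes them as JSON.
func handler(w http.ResponseWriter, r *http.Request) {
	disk, err := diskFree("/")
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	raw, err := os.ReadFile("/proc/meminfo")
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	mem, err := parseMemAvailable(string(raw))
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(stats{DiskFreeBytes: disk, MemFreeBytes: mem})
}

func main() {
	http.HandleFunc("/stats", handler) // path is an assumption
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

    Something this small is cheap to run on every instance, and the monitor only has to fetch one URL per host.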

    I still have lots of single points of failure - Tailscale, my internet provider, my domain provider etc, but I think I’ve addressed the most common one, which would be hardware failures at home. My monitoring is also probably sub-par, as I’m not really looking at logs unless I’m investigating a problem. Maybe there’s a Netdata or something in my future.

    You’ve mentioned that syncing to a remote server for backups is a step you don’t want to take. If you mean that managing your own remote server is the step you don’t want to take, then your options are a paid backup service like Backblaze, or physically shuffling external USB drives (or extra NASs) back and forth to somewhere - depending on what downtime you can tolerate.








  • I run two local physical servers, one production and one dev (and a third prod2 kept in case of a prod1 failure), and two remote production/backup servers all running Proxmox, and two VPSs. Most apps are dockerised inside LXC containers (on Proxmox) or just docker on Ubuntu (VPSs). Each of the three locations runs a Synology NAS in addition to the server.

    Backups run automatically, and I manually run apt updates on everything each weekend with a single ansible playbook. Every host runs a little golang program that exposes the memory and disk use percent as a JSON endpoint, and I use two instances of Uptime Kuma (one local, and one on fly.io) to monitor all of those with keywords.
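
    To show how the keyword part can work, here is a rough sketch (again, not my real code). A keyword monitor only checks whether the response body contains a given string, so the endpoint can publish a status word next to the percentages. The 90% limit, the field names and the `HEALTHY`/`DEGRADED` words are assumptions I made up for this example, and the readings are hard-coded so it runs anywhere.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// report is what the endpoint returns. Field names are illustrative.
type report struct {
	MemUsedPercent  float64 `json:"mem_used_percent"`
	DiskUsedPercent float64 `json:"disk_used_percent"`
	Status          string  `json:"status"`
}

// usedPercent converts total/free byte counts into a use percentage.
func usedPercent(total, free uint64) float64 {
	if total == 0 || free > total {
		return 0
	}
	return 100 * float64(total-free) / float64(total)
}

// buildReport sets Status to HEALTHY only when both readings are at or
// below the limit. Neither word is a substring of the other, so a
// keyword match on "HEALTHY" cannot succeed by accident.
func buildReport(memPct, diskPct, limit float64) report {
	status := "HEALTHY"
	if memPct > limit || diskPct > limit {
		status = "DEGRADED"
	}
	return report{MemUsedPercent: memPct, DiskUsedPercent: diskPct, Status: status}
}

// newHandler wraps a reading function so the handler can be exercised
// with fixed numbers instead of real system calls.
func newHandler(read func() (mem, disk float64), limit float64) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		mem, disk := read()
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(buildReport(mem, disk, limit))
	}
}

func main() {
	// Hard-coded readings stand in for real measurements.
	h := newHandler(func() (float64, float64) { return 42.5, 91.0 }, 90)
	rec := httptest.NewRecorder()
	h(rec, httptest.NewRequest(http.MethodGet, "/stats", nil))
	fmt.Print(rec.Body.String())
}
```

    With this shape the monitor is set to look for `HEALTHY`, and it goes red both when the host is unreachable and when a threshold is crossed.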

    So -

    • Weekly: 10 minutes to run the update playbook, and I usually ssh into the VPSs, have a look at the Fail2Ban stats and reboot them if needed. I also look at each of the Proxmox GUIs to check the backups have been working as expected.
    • Monthly: stop the local prod machine and switch to the prod2 machine (from backups) for a few days. Probably 30 minutes each way, most of it waiting for backups.
    • From time to time (if I hear of a security update), but generally every three months: Look through my container versions and see if I want to update them. They’re on docker compose so the steps are just backup the LXC, docker down, pull, up - probs 5 minutes per container.
    • Yearly: consider if I need to do operating system upgrades - eg to Proxmox 8, or a new Debian or Ubuntu LTS
    • Yearly: visit the remotes and have a proper check/clean up/updates
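
    The quarterly container update is simple enough to write down as code. This is only a sketch of the down, pull, up sequence, not something I actually run: the `-dir` and `-apply` flags are invented for the example, it defaults to a dry run that just prints the commands, and it assumes the LXC backup has already been taken from Proxmox beforehand.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"os/exec"
)

// updateSteps returns the compose commands for one service, in the
// order they should run. The LXC backup is deliberately not included.
func updateSteps() [][]string {
	return [][]string{
		{"docker", "compose", "down"},
		{"docker", "compose", "pull"},
		{"docker", "compose", "up", "-d"},
	}
}

// run prints each step and, unless dryRun is set, executes it inside
// dir. It stops at the first failing command.
func run(dir string, steps [][]string, dryRun bool) error {
	for _, s := range steps {
		fmt.Printf("[%s] %v\n", dir, s)
		if dryRun {
			continue
		}
		cmd := exec.Command(s[0], s[1:]...)
		cmd.Dir = dir
		cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
		if err := cmd.Run(); err != nil {
			return fmt.Errorf("%v failed: %w", s, err)
		}
	}
	return nil
}

func main() {
	// Both flags are hypothetical names chosen for this example.
	dir := flag.String("dir", ".", "directory containing the compose file")
	apply := flag.Bool("apply", false, "actually run the commands")
	flag.Parse()

	if err := run(*dir, updateSteps(), !*apply); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

    Because every service uses the same directory layout, the same three commands apply everywhere and only the directory changes.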

  • My ‘good reason’ is just that it’s super convenient - for backups and painlessly moving apps around between nodes with all their data.

    I would run plain LXCs if people nicely packaged up their web apps as LXC templates and made them available on LXCHub for me to run with lxc compose up, but they generally don’t.

    I guess another alternate future would be if Proxmox added docker container supervision to their web interface, but you’re still not going to have the self-contained neat snapshot system that includes the data.

    In theory you should be able to convert an OCI container layer by layer into an LXC, so I bet there’s projects out there that attempt this.




  • I routinely run my homelab services each as a single Docker container inside an LXC - they are quicker than VMs, and it makes backups and moving them around trivial. However, while you’re learning, a VM (with something conventional like Debian or Ubuntu) is probably advised - it’s a more common experience so you’ll get more helpful advice when you ask a question like this.















  • Your workload (a NAS and a handful of services) is going to be a very familiar one to members of the community, so you should get some great answers.

    My (I guess slightly wacky) solution for this sort of workload has ended up being a single Docker container inside an LXC container for each service on Proxmox. Docker for ease of management with compose and separate LXCs for each service for ease of snapshots/backups.

    Obviously there’s some overhead, but it doesn’t seem to be significant.

    On the subject of clustering, I actually purchased three machines to do this, but have ended up abandoning that idea - I can move a service (or restore it from a snapshot to a different machine) in a couple of minutes which provides all the redundancy I need for a home service. Now I keep the three machines as a production server, a backup (that I swap over to for a week or so every month or two) and a development machine. The NAS is separate to these.

    I love Proxmox, but most times it gets mentioned here people pop up to boost Incus/LXD, so that’s something I’d like to investigate. My skills (and Ansible playbooks) are currently built around Proxmox though, so I’ve got a bit of inertia.