I am developing a platform; the details don’t matter, but it’s a system that hosts personal data. As a result, I want to avoid hosting users’ data myself in any way, and I am trying to make the platform as easy to self-host as possible.

I have some experience self-hosting applications and some intuition about what to do and what not to do, but I wanted to see if I could draw on the collective wisdom.

Got any good resources to share? Any tips? Or, maybe some bad experiences or things to avoid?

  • Helix 🧬@feddit.de
    8 months ago

    Some ideas for what you can do. These are all SHOULD and not MUST requirements, so pick and choose what you can reasonably do in a realistic timeframe without overburdening yourself. Some of these steps can be outsourced to your community.

    You can try to make a twelve-factor app, but some of its advice is probably not suited to your application. You will end up with some 7.5-factor app, which is fine.

    Follow SemVer and provide detailed instructions for upgrading major versions.
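    A minimal sketch of what SemVer buys a self-hoster: an upgrade check can tell which major versions an upgrade enters, so it can point at the matching upgrade instructions. It assumes plain `MAJOR.MINOR.PATCH` strings (an optional leading `v`, no pre-release or build suffixes); the function names are hypothetical.

```python
def parse_semver(version: str) -> tuple[int, int, int]:
    """Parse a plain MAJOR.MINOR.PATCH string (no pre-release/build suffixes)."""
    parts = version.strip().lstrip("v").split(".")
    if len(parts) != 3:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    major, minor, patch = (int(p) for p in parts)
    return major, minor, patch


def crossed_majors(current: str, target: str) -> list[int]:
    """Return the major versions an upgrade enters, in order.

    Each of them should come with its own upgrade instructions.
    """
    cur, tgt = parse_semver(current), parse_semver(target)
    if tgt < cur:
        raise ValueError("target is older than current; this is a downgrade")
    return list(range(cur[0] + 1, tgt[0] + 1))
```

    For example, going from 1.4.2 to 3.0.1 enters majors 2 and 3, so both upgrade guides apply; a minor or patch upgrade returns an empty list.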

    Use a build system that is easy to install, and a language where you don’t have to upgrade dependencies every second for security issues (looking at you, npm/Node.js).

    Don’t include a web server that handles HTTPS; let people run their own reverse proxy.

    Test your setup with, and provide example configs for, several web servers: nginx, Apache2, Caddy, Traefik.

    Test your setup with, and provide default configs for, several deployment methods: bare metal (with a dependency manager), Docker, Podman, Kubernetes, Kata Containers.

    If you need a database, make it possible to migrate from a self-contained, single-instance SQLite setup to a multi-container PostgreSQL/MySQL setup.
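    One way to offer the SQLite-to-PostgreSQL/MySQL path is a small copy tool that works against any two DB-API 2.0 (PEP 249) connections. This sketch assumes the target schema already exists (created by your migrations) and that table and column names come from your own code, since they are interpolated into SQL. The placeholder differs per driver (`?` for `sqlite3`, `%s` for common PostgreSQL and MySQL drivers), so it is a parameter; `copy_table` is a hypothetical helper.

```python
import sqlite3
from typing import Any, Sequence


def copy_table(src: Any, dst: Any, table: str, columns: Sequence[str],
               placeholder: str = "?", batch_size: int = 500) -> int:
    """Copy all rows of `table` from `src` to `dst` (both DB-API 2.0 connections).

    `table` and `columns` are interpolated into SQL, so they must come from
    your own schema definition, never from user input.
    """
    col_list = ", ".join(columns)
    marks = ", ".join([placeholder] * len(columns))
    read, write = src.cursor(), dst.cursor()
    read.execute(f"SELECT {col_list} FROM {table}")
    copied = 0
    while True:
        rows = read.fetchmany(batch_size)  # stream in batches to bound memory use
        if not rows:
            break
        write.executemany(f"INSERT INTO {table} ({col_list}) VALUES ({marks})", rows)
        copied += len(rows)
    dst.commit()
    return copied


if __name__ == "__main__":
    # Demo: two in-memory SQLite databases stand in for source and target.
    source = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT)")
    source.executemany("INSERT INTO contacts (name) VALUES (?)", [("Ada",), ("Linus",)])
    target = sqlite3.connect(":memory:")
    target.execute("CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT)")
    print(copy_table(source, target, "contacts", ["id", "name"]))
```

    When the target is PostgreSQL and IDs are copied explicitly, the sequences behind those ID columns also need to be reset afterwards.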

    Write database migrations in both directions so people can downgrade on failures.
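    A minimal sketch of two-way migrations, assuming SQLite and its `PRAGMA user_version` header field as the schema version; the `contacts` schema and the migration list are hypothetical. Open the connection with `isolation_level=None` so the script controls transactions itself.

```python
import sqlite3

# Hypothetical schema history: (version, up SQL, down SQL), versions contiguous from 1.
MIGRATIONS = [
    (1,
     "CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
     "DROP TABLE contacts"),
    (2,
     "CREATE INDEX idx_contacts_name ON contacts (name)",
     "DROP INDEX idx_contacts_name"),
]


def current_version(conn: sqlite3.Connection) -> int:
    # SQLite keeps a free-to-use integer in the database header.
    return conn.execute("PRAGMA user_version").fetchone()[0]


def _apply(conn: sqlite3.Connection, sql: str, new_version: int) -> None:
    # One transaction per step; SQLite supports transactional DDL,
    # so a failing step leaves both the schema and the version untouched.
    conn.execute("BEGIN")
    try:
        conn.execute(sql)
        conn.execute(f"PRAGMA user_version = {int(new_version)}")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    conn.execute("COMMIT")


def migrate(conn: sqlite3.Connection, target: int) -> None:
    """Run `up` steps to move forward, or `down` steps to move back, to `target`."""
    version = current_version(conn)
    if target > version:
        for v, up, _ in MIGRATIONS:
            if version < v <= target:
                _apply(conn, up, v)
    elif target < version:
        for v, _, down in reversed(MIGRATIONS):
            if target < v <= version:
                _apply(conn, down, v - 1)
```

    Down steps run newest first, so the index is dropped before the table it belongs to.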

    Make it possible to configure your system via ENV variables, ENV files and config files. Provide instructions on best practices and sane defaults. Explain these defaults and make clear that configuration is optional.
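    A minimal sketch of layered configuration where every setting has a default and later sources override earlier ones: defaults, then a config file, then an env file, then the process environment. The `APP_*` keys are hypothetical, and JSON is used for the config file only because it is in the standard library.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional

# Hypothetical settings. Every key has a default, so configuration is optional.
DEFAULTS = {
    "APP_LISTEN": "127.0.0.1:8080",  # local only; a reverse proxy sits in front
    "APP_DATABASE_URL": "sqlite:///data/app.db",
    "APP_LOG_LEVEL": "info",
}


def read_env_file(path: Path) -> dict[str, str]:
    """Parse KEY=VALUE lines; blank lines and # comments are skipped (no quoting support)."""
    values = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, value = line.split("=", 1)
            values[key.strip()] = value.strip()
    return values


def _known(values: Mapping[str, object]) -> dict[str, str]:
    # Ignore unknown keys, e.g. unrelated variables such as PATH or HOME.
    return {k: str(v) for k, v in values.items() if k in DEFAULTS}


def load_config(config_file: Optional[Path] = None,
                env_file: Optional[Path] = None,
                environ: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Merge sources; later ones win: defaults < config file < env file < environment."""
    config = dict(DEFAULTS)
    if config_file is not None and config_file.exists():
        config.update(_known(json.loads(config_file.read_text(encoding="utf-8"))))
    if env_file is not None and env_file.exists():
        config.update(_known(read_env_file(env_file)))
    config.update(_known(os.environ if environ is None else environ))
    return config
```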

    Make it possible to disable built-in authentication so that Authelia or LDAP can be added through the web server. Make clear that this is only to be used with external authentication.
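    A minimal sketch of trusting a reverse proxy's identity header only when external authentication is explicitly enabled and the request comes from a trusted proxy address; otherwise any client could forge the header. The settings are hypothetical, and the default header name `Remote-User` is an assumption here (it is what Authelia forwards, but it should stay configurable).

```python
import ipaddress
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class AuthSettings:
    # Hypothetical settings; external auth must be opted into explicitly.
    external_auth: bool = False
    user_header: str = "Remote-User"
    trusted_proxies: tuple[str, ...] = ("127.0.0.1/32",)


def external_user(settings: AuthSettings, client_ip: str,
                  headers: Mapping[str, str]) -> Optional[str]:
    """Return the proxy-supplied username, or None when external auth is disabled.

    Raises PermissionError if external auth is enabled but the request did not
    come from a trusted proxy or carries no username.
    """
    if not settings.external_auth:
        return None  # fall back to built-in authentication
    ip = ipaddress.ip_address(client_ip)
    if not any(ip in ipaddress.ip_network(net) for net in settings.trusted_proxies):
        raise PermissionError(f"identity header from untrusted address {client_ip}")
    # HTTP header names are case-insensitive, so compare lowercased names.
    wanted = settings.user_header.lower()
    for name, value in headers.items():
        if name.lower() == wanted and value.strip():
            return value.strip()
    raise PermissionError("external auth enabled but no user header present")
```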

    Make it possible to run multiple parallel instances of your software without affecting the database consistency, e.g. for high availability or horizontal scaling.
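    One common way to keep several instances from silently overwriting each other's writes is optimistic concurrency control: each row carries a version number, and an update only succeeds if the version is still the one that was read. A minimal sketch using SQLite; the `notes` table is hypothetical, and the same plain SQL works on PostgreSQL or MySQL with the driver's placeholder style.

```python
import sqlite3


class ConflictError(Exception):
    """Another instance changed the row since it was read."""


def update_note(conn: sqlite3.Connection, note_id: int, new_body: str,
                expected_version: int) -> int:
    """Compare-and-set update: succeeds only if the row still has `expected_version`."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE notes SET body = ?, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (new_body, note_id, expected_version),
        )
    if cur.rowcount != 1:
        raise ConflictError(f"note {note_id} was modified concurrently")
    return expected_version + 1
```

    On a conflict, the caller re-reads the row and retries or reports the conflict to the user, instead of holding database locks across requests.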

    Provide a versioned, documented API (it does not need to be public) and use it yourself for your frontend. Provide a telemetry endpoint that is both human-readable and machine-readable, so that Prometheus or a similar system can scrape it.
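    A minimal sketch of such an endpoint using only the standard library: metrics are rendered in the Prometheus text exposition format, which is plain text a person can read in a browser and Prometheus can scrape. The metric names and the in-process `METRICS` dict are hypothetical; a real app would update the values as it runs.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical in-process metrics: name -> (type, help text, value).
METRICS = {
    "app_http_requests_total": ("counter", "HTTP requests handled.", 0),
    "app_active_sessions": ("gauge", "Currently active user sessions.", 0),
}


def render_metrics(metrics: dict) -> str:
    """Render metrics in the Prometheus text exposition format.

    Help texts must not contain newlines or backslashes (no escaping is done).
    """
    lines = []
    for name, (kind, help_text, value) in sorted(metrics.items()):
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} {kind}")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(METRICS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

    To try it, serve the handler with `http.server.HTTPServer(("127.0.0.1", 9464), MetricsHandler).serve_forever()` (the port is arbitrary) and open `/metrics`.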

  • souperk@reddthat.com (OP)
      8 months ago

      twelve factor app

      Great resource!

      Write database migrations in both directions so people can downgrade on failures.

      Good point. Personally, I take backups before upgrades and restore them if anything goes wrong. But I understand that sometimes downgrading is just easier.

      I have trouble coming up with a migration procedure that makes sense to me. I have the following in mind:

      1. Provide init scripts that produce a schema matching the initial state of the current major version.
      2. Provide major-to-major migration scripts.
      3. For every major version, provide minor-to-minor migration scripts.
      4. Schema changes require at least a minor release.
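      The four steps above can be sketched as a planner that lists which scripts to run. This assumes a major-to-major script takes the schema from the last minor of major N to N+1.0, that majors are contiguous, and a hypothetical release history and script naming (`init-M`, `M.a->M.b`, `M->N`).

```python
from typing import Optional

Version = tuple[int, int]  # (major, minor)

# Hypothetical release history: major -> minor versions that were released.
RELEASES = {1: [0, 1, 2], 2: [0, 1], 3: [0]}


def _minor_steps(major: int, start: int, end: int) -> list[str]:
    """Chain of minor-to-minor scripts inside one major, e.g. 1.0->1.1, 1.1->1.2."""
    minors = [m for m in RELEASES[major] if start <= m <= end]
    return [f"{major}.{a}->{major}.{b}" for a, b in zip(minors, minors[1:])]


def plan(installed: Optional[Version], target: Version) -> list[str]:
    """Return the ordered script names to reach `target`; None means a fresh install."""
    t_major, t_minor = target
    if installed is None:
        # Step 1: the init script yields the x.0 schema of the target major.
        return [f"init-{t_major}"] + _minor_steps(t_major, 0, t_minor)
    if installed > target:
        raise ValueError("downgrades are out of scope for this planner")
    major, minor = installed
    steps: list[str] = []
    while major < t_major:
        # Finish the current major, then a major-to-major script to (major+1).0.
        steps += _minor_steps(major, minor, max(RELEASES[major]))
        steps.append(f"{major}->{major + 1}")
        major, minor = major + 1, 0
    steps += _minor_steps(major, minor, t_minor)
    return steps
```

      With this history, `plan((1, 1), (3, 0))` walks 1.1 to 1.2, jumps to 2.0, walks to 2.1, then jumps to 3.0.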

      Make it possible to configure your system via ENV variables, ENV files and config files.

      I am a bit worried about this one; environment variables can be a security concern. Specifically, I am not sure whether I should allow secrets (like database connection strings) to be provided through environment variables. I am inclined to let people do what they want, but issue a warning.
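      A common middle ground is the `*_FILE` convention used by some container images (the official PostgreSQL image accepts `POSTGRES_PASSWORD_FILE`, for example): if `NAME_FILE` is set, the secret is read from that file, which an orchestrator can mount as a secret, and a secret set directly in the environment still works but triggers a warning. A minimal sketch; the variable names are hypothetical.

```python
import os
import warnings
from typing import Mapping, Optional


def read_secret(name: str, environ: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Read a secret from the file named by `<name>_FILE`, else from `<name>` with a warning.

    The `_FILE` variant keeps the secret itself out of the process environment.
    """
    env = os.environ if environ is None else environ
    file_path = env.get(f"{name}_FILE")
    if file_path:
        with open(file_path, encoding="utf-8") as fh:
            return fh.read().rstrip("\n")
    value = env.get(name)
    if value is not None:
        warnings.warn(
            f"{name} is set directly in the environment; consider {name}_FILE instead",
            stacklevel=2,
        )
    return value
```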

      Make it possible to disable authentication to add Authelia or LDAP through the webserver. Make clear that this is only to be used for external authentication.

      I am considering adding support for OAuth through Keycloak. My assumption is that if you are going to host your own LDAP, you can probably configure Keycloak too. Do you think that makes sense?

      Make it possible to run multiple parallel instances of your software without affecting the database consistency, e.g. for high availability or horizontal scaling.

      Ideally, an instance shouldn’t get big enough to need it. I know, famous last words, but in my case I think it’s a bad problem to have. I am going out of scope, but I am wondering where the line is between discouraging large-scale deployments and designing something predestined for obscurity.

      Telemetry

      Not even on my radar; thanks for bringing it to my attention 🙏

      • Helix 🧬@feddit.de
        8 months ago

        Why require Keycloak specifically? Maybe I want to use another authentication gateway.
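        One provider-agnostic option is OpenID Connect Discovery: a compliant provider (Keycloak among others) publishes its metadata at `<issuer>/.well-known/openid-configuration`, so the app only needs an issuer URL. A minimal sketch; the chosen set of required fields reflects what a typical authorization-code login needs and is an assumption, and `fetch_metadata` performs a network request.

```python
import json
import urllib.request
from typing import Any

# Endpoints a typical authorization-code login needs (an assumption for this sketch).
REQUIRED = ("issuer", "authorization_endpoint", "token_endpoint", "jwks_uri")


def discovery_url(issuer: str) -> str:
    """OIDC Discovery: metadata lives at <issuer>/.well-known/openid-configuration."""
    return issuer.rstrip("/") + "/.well-known/openid-configuration"


def parse_metadata(issuer: str, doc: dict[str, Any]) -> dict[str, str]:
    """Pick out the needed endpoints and check the issuer matches exactly."""
    missing = [k for k in REQUIRED if k not in doc]
    if missing:
        raise ValueError(f"provider metadata lacks {missing}")
    if doc["issuer"] != issuer:
        raise ValueError("issuer in metadata does not match the configured issuer")
    return {k: doc[k] for k in REQUIRED}


def fetch_metadata(issuer: str, timeout: float = 10.0) -> dict[str, str]:
    with urllib.request.urlopen(discovery_url(issuer), timeout=timeout) as resp:
        return parse_metadata(issuer, json.load(resp))
```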

  • mox@lemmy.sdf.org
    8 months ago

    One thing that goes a long way toward making self-hosting easy is to minimise dependencies.

    In order of preference (best to worst):

    • Your language’s standard library.
    • Those that are installed by default on most Linux distros.
    • Those that are available in the main package repos of most distros.
    • Those that come from a community package archive. (AUR, PyPI, etc.)

    Mind the version numbers, too; try not to depend on library features that aren’t widely packaged/deployed yet.

    Bonus points for supporting multiple OS, like the various BSD flavours.

    Being conservative with dependencies makes it more likely that someone will be willing to install, package, or administer your software. It also helps limit the attack surface, potentially avoiding exploits in the future.

    • souperk@reddthat.com (OP)
      8 months ago

      Great point. I always consider dependencies from a security perspective, but for management/setup I am sometimes like “the devops are going to figure it out”…

      To clarify, would one example be supporting SQLite, so people won’t have to deploy Postgres unless they need to?

      My plan is to offer a docker-compose configuration people can tinker with. I had the mindset that whatever happens in the container stays in the container, but your comment made me realize I should be mindful of other installation methods. Thanks 🙏

      • mox@lemmy.sdf.org
        8 months ago

        Supporting SQLite as an option for people with modest needs is not a bad idea. As long as you keep your SQL simple and avoid vendor-specific extensions, adding support for it at any point shouldn’t be difficult.
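        A minimal sketch of keeping the database choice a configuration detail: one URL picks the driver, and the caller also gets the driver's placeholder style, since `sqlite3` uses `?` while psycopg uses `%s`. The SQLAlchemy-style `sqlite:///path` URL form and psycopg 3 as the PostgreSQL driver are assumptions; psycopg is imported only when a PostgreSQL URL is used, so SQLite users never need it.

```python
import sqlite3
from typing import Any
from urllib.parse import urlparse


def connect(url: str) -> tuple[Any, str]:
    """Open a DB-API connection from a URL and return it with its placeholder style.

    sqlite:///relative.db, sqlite:////absolute/path.db, sqlite:// (in-memory),
    or postgresql://user@host/dbname.
    """
    parsed = urlparse(url)
    if parsed.scheme == "sqlite":
        # Drop the one slash that separates the (empty) host from the path.
        path = parsed.path[1:] or ":memory:"
        return sqlite3.connect(path), "?"
    if parsed.scheme in ("postgres", "postgresql"):
        import psycopg  # optional dependency, only needed for PostgreSQL
        return psycopg.connect(url), "%s"
    raise ValueError(f"unsupported database URL scheme: {parsed.scheme!r}")
```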

        Providing a Docker config is fine, but I would never lean on it as a substitute for conservative dependency choices and good build scripts. Many people don’t use it and never will. If you instead design your software to be easily built/installed/packaged natively for any distro, then it will reach more users, and as a side effect, will also be easy to package for just about any container system (Docker, Kubernetes, LXC, etc.)

  • h3ndrik@feddit.de
    8 months ago

    I think you’ll have to learn a bit about security. There is no single article for that; entire books have been written about it… And it really depends on the type of service, the frameworks used, and the intended deployment.

    I’d have a look at similar software. There are tons of open source projects that handle sensitive information, from files (like Nextcloud) to contact sync to ticketing and payment information.

    Edit: Since some people recommend Docker, I’d add that it should be an afterthought. It’s about deployment, not development, and it is not a means of stopping user data from being leaked or of stopping login brute-forcing.

    • souperk@reddthat.com (OP)
      8 months ago

      A good place to start is the OWASP Cheat Sheet Series. It provides up-to-date, high-value information about software security; I wish a resource like this had existed when I started learning about security.

      Even though I have a decent background in software security, it’s hard to decide on an encryption scheme that’s both safe and easy to use. My goal is to increase the number of components an attacker has to compromise in order to get access to the data.
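      One simple building block for that goal is 2-of-2 XOR secret splitting: the data-encryption key is stored as two random-looking shares in different places (for example, one in the database and one in a key file on the application host; where they live is an assumption), and an attacker needs both. A minimal sketch of the splitting only; the actual encryption should use an authenticated cipher from a vetted library.

```python
import secrets


def split_key(key: bytes) -> tuple[bytes, bytes]:
    """Split `key` into two shares; either share alone is uniformly random."""
    share_a = secrets.token_bytes(len(key))
    share_b = bytes(x ^ y for x, y in zip(key, share_a))
    return share_a, share_b


def join_key(share_a: bytes, share_b: bytes) -> bytes:
    """Recombine the shares; both are required to recover the key."""
    if len(share_a) != len(share_b):
        raise ValueError("shares must have the same length")
    return bytes(x ^ y for x, y in zip(share_a, share_b))
```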

  • mox@lemmy.sdf.org
    8 months ago

    Another thing to keep in mind is resource usage. Software with low RAM and CPU requirements will work well on a great variety of self-hosted server platforms. If your code runs well on an old Raspberry Pi (the original or maybe a Pi 2), it will probably do well in most other environments. This VPS list should give you a picture of low-end platforms that are in use out there.