Nextcloud Deployment

Problem

The current deployment of our Nextcloud instance is a single instance that can only scale so far. With a growing user base, the current setup could become overwhelmed. We are also facing issues with our current CORS setup, and images sometimes take very long to load. Since Nextcloud also hosts all background images for maps, profile pictures, etc., we must decide on a solution for scaling out our Nextcloud infrastructure.

Constraints

  1. Data Storage: Barring more complex and costly real-time data duplication setups, the data that Nextcloud accesses is located in only one place and has to be available to all instances of Nextcloud.
  2. Support for at least 3,000 users, easily scalable to 10,000 users
  3. Be in line with the official recommendations for Nextcloud deployments

Assumptions

  1. We assume we have to support more than 3,000 users, up to 10,000. We also assume they are less active than typical Nextcloud users in a business, so we only implement the "Mid-sized Enterprises" sizing for 1,000 concurrent users.
  2. We assume the official recommendations for Nextcloud deployments are correct.
  3. We can easily migrate from everything in a single VM to separate VMs/cluster/...
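
The sizing assumption can be made explicit with a little arithmetic. The sketch below uses only the figures from this section (3,000 and 10,000 registered users, sizing for 1,000 concurrent users) and computes which share of the user base may be active at the same time before that sizing is exceeded. The function name is illustrative and not part of any Nextcloud tooling.

```python
def max_concurrency_ratio(registered_users: int, concurrent_capacity: int) -> float:
    """Return the largest share of registered users that can be active at once."""
    if registered_users <= 0:
        raise ValueError("registered_users must be positive")
    return concurrent_capacity / registered_users


# Sizing for 1,000 concurrent users, taken from assumption 1 above.
CONCURRENT_CAPACITY = 1000

for users in (3000, 10000):
    ratio = max_concurrency_ratio(users, CONCURRENT_CAPACITY)
    print(f"{users} registered users: up to {ratio:.0%} may be active concurrently")
```

In other words, the assumption holds at 10,000 users only as long as no more than about 10% of them are active at the same time; at 3,000 users the margin is about 33%.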

Solutions

Image

AIO

The Nextcloud AIO image contains everything needed to run an instance and is the officially endorsed way to install Nextcloud in a Docker environment. Because everything is bundled together, the AIO image is not well suited to deployments with multiple load-balanced instances.

Custom Image

The custom image is a community-supported Docker image of Nextcloud that offers more versatility. Instead of packaging everything into a single Docker image, it lets the administrator pick and choose which components to combine.

Databases

Option 1: Postgres

Postgres is a SQL database with official support from the Nextcloud team. It is included in the AIO Docker image of Nextcloud.

Option 2: MySQL / MariaDB

MySQL, or its community fork MariaDB, is a classic relational SQL database with official support from the Nextcloud team. It is also the database recommended by the custom setup guide for Nextcloud.

Deployment

Option 1: Assign more resources to the current instance

Assigning more resources to the currently running instance would relieve the performance issues to some degree, but this solution does not scale well. If we hit the limits of the instance, for example because too many users access Nextcloud concurrently, changing the deployment at that point could cause outages.

Option 2: Run multiple instances on separate VMs

This option would provide the greatest capacity for our Nextcloud environment. It is also the most expensive option, as running multiple instances of Nextcloud on different VMs requires more compute and memory resources.

Option 3: Prepare a multi-instance setup, but run a single active instance

This option may not make much sense for our current workload, but it gives us the greatest flexibility and would easily allow us to scale out in the future. It would also require setting up the scripts and processes needed to add new instances quickly.

Reverse Proxy / Load Balancer

The performance comparisons in this section are based on this benchmark.

Option 1: Caddy

Caddy is a reverse proxy that markets itself on its ease of use and simple configuration, and it is often recommended to beginners for this reason. It does in theory offer more advanced features, but getting them to work has proven difficult. It also shows the worst performance of the options considered.

Option 2: Traefik

Traefik offers the same set of features as Caddy and more, but comes with an in-depth configuration system that new users will need more time to learn. Its label-based configuration makes it possible to store the Traefik configuration for each service directly in the docker-compose file. It also performs better than Caddy.

Option 3: NGINX

NGINX is a web server that also offers reverse proxy capabilities. It has established itself as an industry-standard tool and is widely used in enterprise environments, where Caddy and Traefik are less common. Because of its widespread use, it also sees more active development and offers a larger feature set and better performance than either of them.

Option 4: HAProxy

HAProxy is designed as a dedicated load balancer. It offers a reduced set of features compared to NGINX (it cannot, for example, serve static content), but it makes up for this by outperforming NGINX in the load-balancing / reverse-proxy use case.

Decision

Image

Since the AIO image is geared towards smaller instances that are not load balanced, the decision is to stay with the custom Docker image, as it offers more flexibility in the deployment.

Database

The community appears split on this question, with roughly half preferring Postgres and the other half preferring MariaDB. Most of the reasons given come down to familiarity with one or the other. There are some reports of third-party apps being incompatible with one of the two databases, which can happen because Nextcloud lets every app declare for itself which databases it supports. Since we do not use such apps, this should not be an issue for us. One thing still stands out: Postgres is included in the AIO image, but MariaDB is recommended for the custom setup. As this discussion indicates, however, this also seems to come down to familiarity on Nextcloud's side. In addition, this maintainer of the Nextcloud AIO image states that Postgres actually performs better.

After considering these facts, the best choice is to stay with Postgres. The main reasons are:

  • There is no definite favorite among the community
  • In terms of performance, little evidence exists either way, and the evidence that did turn up favored Postgres
  • A migration would require significant effort that is better spent elsewhere and would yield no benefit

The only reason to rethink this decision would be in one of the following scenarios:

  • MariaDB receives updates that raise its performance significantly above that of Postgres
  • We require the use of an app that does not support Postgres
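
The second scenario can be checked before an app is installed. Nextcloud apps declare the databases they support in their `appinfo/info.xml`, as `<database>` elements under `<dependencies>`. The following is a minimal sketch of such a check, not an official tool: the function name and the sample XML are hypothetical, and it assumes that `pgsql` is the identifier used for Postgres and that an app listing no `<database>` element supports all backends.

```python
import xml.etree.ElementTree as ET


def supports_database(info_xml: str, database: str = "pgsql") -> bool:
    """Check whether an app's info.xml declares support for the given database.

    Assumption: an app that lists no <database> element supports every backend.
    """
    root = ET.fromstring(info_xml)
    declared = [
        element.text.strip()
        for element in root.findall("./dependencies/database")
        if element.text
    ]
    return not declared or database in declared


# Hypothetical app metadata, for illustration only.
EXAMPLE_INFO_XML = """
<info>
  <id>example_app</id>
  <dependencies>
    <database>mysql</database>
    <database>sqlite</database>
  </dependencies>
</info>
"""

print(supports_database(EXAMPLE_INFO_XML))           # pgsql is not listed
print(supports_database(EXAMPLE_INFO_XML, "mysql"))  # mysql is listed
```

Running this check against the `info.xml` of any app we plan to adopt would flag the second scenario early, before the app becomes a dependency.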

Deployment

The decision is to implement option 3. This means setting up a proxy and following Nextcloud's recommendations for running multiple instances, while keeping only one instance active for now.

Although option 3 carries more overhead than our current needs require, it does not impact the performance of the system, since a reverse proxy is already in use; that reverse proxy can be replaced by HAProxy. Because this option also allows quick and efficient scaling later, it makes sense from a future-proofing standpoint. It is also worth mentioning that using the AIO image is not feasible: every instance comes packaged with its own database, which makes it impossible to scale out to multiple instances.
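
To illustrate why running a single instance behind a load balancer costs little now and makes scaling out simple later, here is a conceptual round-robin sketch. It is not how HAProxy is implemented or configured; the class name and the backend addresses `nextcloud-1:80` and `nextcloud-2:80` are hypothetical.

```python
from itertools import count


class BackendPool:
    """Round-robin selection over a list of backend addresses."""

    def __init__(self, backends):
        self._backends = list(backends)
        self._counter = count()

    def add(self, backend: str) -> None:
        """Register an additional backend; it receives traffic on later requests."""
        self._backends.append(backend)

    def next_backend(self) -> str:
        """Return the backend that should handle the next request."""
        if not self._backends:
            raise RuntimeError("no backends configured")
        return self._backends[next(self._counter) % len(self._backends)]


# Today: one active instance, so every request goes to the same backend.
pool = BackendPool(["nextcloud-1:80"])
print([pool.next_backend() for _ in range(3)])

# Later: scaling out only means registering another backend.
pool.add("nextcloud-2:80")
print([pool.next_backend() for _ in range(4)])
```

With one backend the proxy simply forwards everything to it. Once a second backend is added, requests alternate between the two without any change on the client side.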

Reverse Proxy / Load Balancer

The best option for load balancing is option 4: HAProxy, since performance is our main concern when it comes to load balancing. Since HAProxy offers no built-in way to automatically renew its SSL certificates, another tool, such as the industry-standard certbot, must be deployed alongside it to manage this task.
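
One practical detail of combining certbot with HAProxy is that certbot writes the certificate chain and the private key as separate files (`fullchain.pem` and `privkey.pem`), whereas HAProxy is commonly configured with a single PEM file that contains both. The following is a minimal sketch of a certbot deploy hook that builds such a combined file. The target directory `/etc/haproxy/certs` and the function name are assumptions for illustration; `RENEWED_LINEAGE` is the environment variable certbot passes to deploy hooks.

```python
import os
from pathlib import Path


def build_haproxy_pem(lineage_dir: Path, output_file: Path) -> None:
    """Concatenate certificate chain and private key into one PEM file."""
    fullchain = (lineage_dir / "fullchain.pem").read_bytes()
    privkey = (lineage_dir / "privkey.pem").read_bytes()
    output_file.parent.mkdir(parents=True, exist_ok=True)
    # Request mode 0600 when the file is newly created, since it holds the private key.
    fd = os.open(output_file, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as handle:
        handle.write(fullchain + privkey)


if __name__ == "__main__":
    # certbot sets RENEWED_LINEAGE to the live directory of the renewed certificate.
    lineage = os.environ.get("RENEWED_LINEAGE")
    if lineage:
        lineage_path = Path(lineage)
        # Hypothetical target directory; adjust to the actual HAProxy setup.
        target = Path("/etc/haproxy/certs") / f"{lineage_path.name}.pem"
        build_haproxy_pem(lineage_path, target)
```

HAProxy still has to be reloaded after the file is rewritten so that it picks up the new certificate; that step is left out here because it depends on how HAProxy is run.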