Service Replicas and Autoscaler
High Availability and Fault Tolerance
Service Replicas
Service Replicas lets you run multiple instances (replicas) of your services, improving the availability and fault tolerance of your backend infrastructure. By distributing user requests among replicas, your backend can handle more traffic and provide a better experience to your users.
Benefits
- Improved fault tolerance: Multiple replicas ensure that if one instance crashes due to an unexpected issue, the other replicas can continue to serve user requests.
- Improved availability: Distributing user requests among multiple replicas allows your apps to handle more traffic and maintain a high level of performance.
- Load balancing: Workloads are distributed evenly among replicas, which prevents bottlenecks and keeps performance smooth during peak times.
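The fault-tolerance and load-balancing benefits above can be illustrated with a minimal sketch. It assumes a simple round-robin strategy that skips unhealthy replicas; the actual routing strategy used by the platform is not described here, and the names `dispatch`, `replicas`, and `healthy` are hypothetical.

```python
from itertools import cycle


def dispatch(requests, replicas, healthy):
    """Assign each request to the next healthy replica in round-robin order.

    Hypothetical helper for illustration only; it does not reflect the
    platform's real load balancer.
    """
    if not any(healthy[r] for r in replicas):
        raise RuntimeError("no healthy replicas available")
    ring = cycle(replicas)
    assignments = {}
    for request in requests:
        replica = next(ring)
        # Skip replicas that are down so the remaining ones keep serving.
        while not healthy[replica]:
            replica = next(ring)
        assignments[request] = replica
    return assignments


replicas = ["replica-1", "replica-2", "replica-3"]
# Assume replica-2 has crashed.
healthy = {"replica-1": True, "replica-2": False, "replica-3": True}
result = dispatch(["req-a", "req-b", "req-c", "req-d"], replicas, healthy)
print(result)
```

In this sketch the crashed replica never receives a request, and the remaining two share the traffic evenly.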
Configuration
To set up replicas for your project, you can use either the Dashboard or the Config.
Autoscaler
The autoscaler dynamically manages the number of replicas for your services based on application load. This document explains how the autoscaler works and describes its key features.
Overview
When enabled, the autoscaler continuously monitors the CPU usage of your service. Its primary goal is to maintain an average CPU utilization of approximately 50% across all replicas.
How It Works
Scaling Up
If the average CPU utilization exceeds 50% for a sustained period and the number of replicas is below the configured maximum, the autoscaler will increase the number of replicas.
Scaling Down
Conversely, if the average CPU utilization falls below 50% for a sustained period and the number of replicas is above the configured minimum, the autoscaler will decrease the number of replicas.
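The scale-up and scale-down rules can be sketched as a single decision function. This is an illustration under stated assumptions, not the autoscaler's actual implementation: "sustained period" is modeled as every sample in a recent window being on the same side of the 50% target, and each evaluation changes the replica count by one. The window and the step size are assumptions, and `next_replica_count` is a hypothetical name.

```python
TARGET_CPU = 50.0  # percent; the target utilization described in the text


def next_replica_count(current, avg_cpu_samples, min_replicas, max_replicas):
    """Return the replica count after one evaluation.

    avg_cpu_samples holds recent average-CPU readings (percent) across all
    replicas. A one-replica step per evaluation is an assumption.
    """
    if not avg_cpu_samples:
        return current
    sustained_high = all(s > TARGET_CPU for s in avg_cpu_samples)
    sustained_low = all(s < TARGET_CPU for s in avg_cpu_samples)
    if sustained_high and current < max_replicas:
        return current + 1  # scale up, bounded by the configured maximum
    if sustained_low and current > min_replicas:
        return current - 1  # scale down, bounded by the configured minimum
    return current  # mixed readings or already at a limit: no change


print(next_replica_count(2, [70, 65, 80], min_replicas=1, max_replicas=10))
print(next_replica_count(3, [20, 30, 10], min_replicas=1, max_replicas=10))
```

A brief spike or dip that does not last the whole window leaves the replica count unchanged, which is the point of requiring a sustained period.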
Key Features
- Automatic Management: Eliminates the need for manual scaling interventions.
- Load-Based Scaling: Scales based on actual CPU utilization, ensuring efficient resource use.
- Configurable Thresholds: Allows setting of minimum and maximum replica counts.
- Balanced Performance: Aims to maintain optimal performance by targeting 50% CPU utilization.
Example
The graph below illustrates this behavior with a minimum of 1 replica and a maximum of 10:
For more details and load-testing demos, refer to our blog post.
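To make the 1-to-10 range concrete, the following rough calculation estimates how many replicas a 50% target implies for a given load. It assumes load is expressed as total CPU demand in units of one replica's full capacity and that it spreads evenly across replicas; both are simplifying assumptions, and the function names are hypothetical.

```python
import math


def replicas_for_load(total_cpu_demand, min_replicas=1, max_replicas=10, target=0.5):
    """Estimate the replica count that keeps average utilization at or below target."""
    needed = math.ceil(total_cpu_demand / target)
    # Clamp to the configured minimum and maximum.
    return max(min_replicas, min(max_replicas, needed))


def average_utilization(total_cpu_demand, replicas):
    """Average per-replica utilization, assuming perfectly even distribution."""
    return total_cpu_demand / replicas


for demand in [0.2, 1.0, 2.3, 7.0]:
    n = replicas_for_load(demand)
    pct = round(average_utilization(demand, n) * 100)
    print(f"demand={demand} replicas={n} avg_cpu={pct}%")
```

Under these assumptions, a demand of 7.0 would call for 14 replicas but is capped at the maximum of 10, so average utilization settles above the 50% target.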
Configuration
To configure the autoscaler, you can use either the nhost.toml configuration file or the dashboard: