Dynamic scaling based on demand

Description

Demand for resources depends on user load at any given time. However, most applications run without taking this into consideration, resulting in underused and inefficient resources. By scaling infrastructure dynamically based on actual demand, whether through predictable time patterns or reactive metrics, you can eliminate excess capacity and improve both energy and hardware efficiency.

Solution

Implement dynamic scaling using one or both of the following complementary approaches:

Approach 1: Time-Based Scaling for Predictable Patterns

Review workloads for predictable patterns and identify periods of low or no activity where resources can be scaled back:

Inspect CPU load over time to identify underutilized resources
Scale back resources during periods of low utilization (e.g., nights, weekends for business applications)
Consolidate underutilized Virtual Machines
Shut down resources on a schedule when not needed
Remove permanently idle resources

This approach works best when traffic patterns are predictable and consistent. Scaling based on time is simpler to implement and requires less sophisticated monitoring infrastructure.

Approach 2: Metrics-Driven Scaling for Reactive Demand

Scale infrastructure based on actual demand metrics that directly correlate with your application's load:

Demand Metrics Examples:

HTTP request rates
Message queue length
Database connection pools
Custom business metrics (e.g., active user sessions)
Cloud alerting events

Implementation:

Implement custom scalers tailored to your application's demand drivers
Use platform-specific tools (e.g., AWS Auto Scaling, Azure Auto Scale, GCP Autoscaler)
For Kubernetes: Use KEDA (Kubernetes Event-Driven Autoscaling) to build event-driven applications
Enable scaling down to zero when there is no demand (supported by KEDA and serverless platforms)
Implement cluster auto-scaling and bursting capabilities to scale infrastructure based on pod demand

Why metrics-driven scaling? By default, platforms like Kubernetes scale based on CPU and RAM utilization. However, it's often difficult to correlate your application's actual demand drivers with CPU and RAM utilization. Scaling on relevant metrics provides more precise resource allocation.

Combined Strategy

For optimal efficiency, use both approaches:

Time-based scaling for known low-traffic periods
Metrics-driven scaling for handling variable demand during active hours
This combination ensures resources are right-sized at all times

SCI Impact

SCI = (E * I) + M per R Software Carbon Intensity Spec

Dynamic scaling based on demand impacts SCI as follows:

E: Reduces electricity consumed by running only the resources needed for current demand
M: Reduces embodied carbon emissions by decreasing the total amount of hardware required to support the workload

Assumptions

For time-based scaling: Traffic patterns are predictable and consistent enough to define a schedule
For metrics-driven scaling: The metrics that drive application demand and scaling needs are known and measurable
The application architecture supports scaling (stateless or with proper state management)

Considerations

Consider starting with the easiest optimizations first (time-based scaling is simpler to implement)
If workloads have predictable usage patterns, time-based scaling may be sufficient and simpler than metrics-driven approaches
Metrics-driven scaling requires more sophisticated monitoring and configuration but provides better responsiveness to demand changes
Account for scaling delays (cold starts) when sizing minimum capacity
Test scaling behavior under various load conditions to ensure performance requirements are met
For Kubernetes: Combine horizontal pod autoscaling (HPA/KEDA) with cluster autoscaling for complete infrastructure optimization

Description​

Solution​

Approach 1: Time-Based Scaling for Predictable Patterns​

Approach 2: Metrics-Driven Scaling for Reactive Demand​

Combined Strategy​

SCI Impact​

Assumptions​

Considerations​

References​