Run AI models at the edge

Applicable Role: Provider and Consumer

Description

AI and ML workloads often rely on centralized cloud infrastructure for training and inference. This requires data to be transferred from source systems to the cloud, increasing network usage, latency, and energy consumption.

Running AI models closer to where data is generated or consumed, such as on edge devices or local infrastructure, reduces data movement and enables more efficient processing. This is especially relevant for real-time, high-frequency, or latency-sensitive inference workloads.

Providers also enable edge inference by offering on-device ML SDKs and embedded models, which makes this pattern applicable to both roles.

Solution

Deploy models on edge devices or local infrastructure to reduce data transfer to centralized systems
Perform data preprocessing tasks such as filtering, cleansing, and feature generation locally
Use edge inference for real-time or latency-sensitive applications
Limit transmission of raw data by sending only necessary or aggregated results to the cloud
Evaluate hybrid architectures that combine edge and cloud processing based on workload requirements
For applications using external AI services, consider on-device or local inference options to reduce repeated remote calls
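The local preprocessing and data-minimization steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not a reference implementation: the `Reading` type, the valid range used for cleansing, and the threshold check standing in for an on-device model are all hypothetical. Raw readings are filtered and summarized on the device, and only the per-sensor features and the local inference result are serialized for upload.

```python
import json
from dataclasses import dataclass
from statistics import mean


@dataclass
class Reading:
    # Hypothetical raw sensor reading captured on the edge device.
    sensor_id: str
    value: float


def cleanse(readings, low, high):
    # Local filtering/cleansing: drop readings outside an assumed plausible range.
    return [r for r in readings if low <= r.value <= high]


def build_features(readings):
    # Local feature generation: per-sensor count, mean and max.
    grouped = {}
    for r in readings:
        grouped.setdefault(r.sensor_id, []).append(r.value)
    return {
        sid: {"count": len(vals), "mean": mean(vals), "max": max(vals)}
        for sid, vals in grouped.items()
    }


def infer_locally(features, threshold):
    # Stand-in for an on-device model (hypothetical): flag sensors whose max exceeds a threshold.
    return {sid: f["max"] > threshold for sid, f in features.items()}


def edge_payload(readings, low=-50.0, high=150.0, threshold=80.0):
    # Only aggregated features and inference results leave the device; raw readings stay local.
    features = build_features(cleanse(readings, low, high))
    alerts = infer_locally(features, threshold)
    return json.dumps({"features": features, "alerts": alerts})
```

Whether a payload like this is actually smaller than the raw stream depends on the sampling rate and on how many readings each summary replaces; with only a handful of readings, as in a quick test, the JSON summary can even be larger than the raw data.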

SCI Impact

SCI = ((E × I) + M) per R

E (Energy): Reduced data transfer and localized processing lower energy consumption associated with network and centralized compute.

I (Carbon Intensity): Edge devices draw power from local grids, which may have a higher or lower carbon intensity than the regions hosting centralized data centers. Measure this difference and factor it into the SCI calculation.

M (Embodied Carbon): Edge deployments may add devices, and with them embodied emissions, but they can also reduce reliance on large centralized infrastructure.
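A worked comparison can make the SCI trade-off concrete. The sketch below assumes R is one inference, that E covers both compute and network energy over the same measurement window, and that M is the share of hardware embodied emissions allocated to that window; the `Deployment` type and all input values in the usage note are made up for illustration, not measurements.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    name: str
    energy_kwh: float           # E: energy over the window (compute + network), assumed
    intensity_g_per_kwh: float  # I: carbon intensity of the grid where the energy is used
    embodied_g: float           # M: embodied emissions allocated to the window
    functional_units: int       # R: here, number of inferences served in the window


def sci(d: Deployment) -> float:
    """SCI = ((E x I) + M) per R, returned in gCO2e per functional unit."""
    if d.functional_units <= 0:
        raise ValueError("functional_units must be positive")
    operational = d.energy_kwh * d.intensity_g_per_kwh
    return (operational + d.embodied_g) / d.functional_units
```

For example, with `Deployment("cloud", 12.0, 300.0, 400.0, 100_000)` and `Deployment("edge", 5.0, 500.0, 2500.0, 100_000)`, the cloud option scores 0.04 gCO2e per inference and the edge option 0.05. The edge deployment uses less energy, but in this hypothetical case a more carbon-intensive local grid and the added embodied emissions outweigh that saving.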

Cost Impact

Cloud compute costs: Reduced by moving inference to edge devices
Network costs: Lower data transfer to centralized systems
Edge device costs: Increased due to deploying hardware at the edge
Model management costs: Higher due to complexity of distributed model updates
Trade-off: Cloud cost savings may be partially or fully offset by edge device and management overhead
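The cost trade-off can be estimated with a simple monthly model. The function names, the straight-line amortization of device hardware, and every price in the usage note are hypothetical assumptions; real figures depend on the provider, the hardware, and the update process.

```python
def monthly_cloud_cost(requests, cost_per_request, gb_transferred, cost_per_gb):
    # Cloud-only: pay per inference request plus data transfer to the cloud.
    return requests * cost_per_request + gb_transferred * cost_per_gb


def monthly_edge_cost(devices, device_price, lifetime_months,
                      mgmt_per_device, residual_cloud):
    # Edge: hardware amortized evenly over its assumed lifetime,
    # plus per-device management (model updates, monitoring),
    # plus whatever cloud spend remains (e.g. aggregated uploads).
    hardware = devices * device_price / lifetime_months
    return hardware + devices * mgmt_per_device + residual_cloud


def monthly_savings(cloud_cost, edge_cost):
    # Positive: edge is cheaper; negative: overhead outweighs the cloud savings.
    return cloud_cost - edge_cost
```

With 2,000,000 requests at 0.0005 per request plus 500 GB at 0.09 per GB, the cloud-only estimate is about 1,045 per month. Fifty devices at 300 each over 36 months, 5 per device per month for management, and 100 of residual cloud spend come to about 767, so the edge option is cheaper under these invented figures. A higher management cost per device or a shorter device lifetime can quickly reverse that result.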

Assumptions

Edge or local devices have sufficient memory, compute capacity, and power to run the target model without requiring additional optimization
Workloads can be partitioned effectively between edge and cloud
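The first assumption can be sanity-checked before deployment with a rough memory estimate. This sketch assumes memory use is dominated by model weights, adds a flat 20% allowance for activations and runtime buffers, and keeps 20% of device memory free for other processes; these percentages and the function names are illustrative, not measured. Compute capacity and power draw still need to be checked separately.

```python
# Bytes per weight for common numeric formats.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def estimated_model_bytes(num_params: int, dtype: str, overhead_pct: int = 20) -> int:
    # Weights plus an assumed flat allowance for activations and runtime buffers.
    weights = num_params * BYTES_PER_PARAM[dtype]
    return weights * (100 + overhead_pct) // 100


def fits_on_device(num_params: int, dtype: str, device_memory_bytes: int,
                   usable_pct: int = 80, overhead_pct: int = 20) -> bool:
    # Only part of device memory is assumed usable; the rest is left for the OS.
    budget = device_memory_bytes * usable_pct // 100
    return estimated_model_bytes(num_params, dtype, overhead_pct) <= budget
```

For a hypothetical 1-billion-parameter model on a device with 4 GiB of memory, the fp32 weights (about 4.8 GB with overhead) do not fit, while an int8 version (about 1.2 GB) does. In that situation the assumption of running without additional optimization does not hold, and the model would need a smaller numeric format or a larger device.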

Considerations

Embodied emissions of edge devices must be accounted for
Edge environments may have limited compute and storage capacity
Model updates and lifecycle management can be more complex in distributed systems
Not all workloads are suitable for edge deployment
Carbon intensity of edge locations versus cloud regions should be compared
Trade-offs between latency, cost, and carbon should be evaluated
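One way to evaluate the latency, cost, and carbon trade-offs is to treat latency and budget as hard constraints and then pick the lowest-carbon placement that satisfies them. This is only one possible decision rule, sketched under assumptions: the `Option` fields, the per-inference carbon figures (which could come from an SCI calculation for each placement, using each location's own carbon intensity), and the example values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Option:
    name: str
    p95_latency_ms: float       # measured or estimated tail latency
    monthly_cost: float         # e.g. from a cost model like the one in Cost Impact
    gco2e_per_inference: float  # e.g. SCI per inference for this placement


def choose_placement(options: List[Option], latency_slo_ms: float,
                     budget: float) -> Optional[Option]:
    # Keep only placements that meet the latency target and the budget.
    feasible = [o for o in options
                if o.p95_latency_ms <= latency_slo_ms and o.monthly_cost <= budget]
    if not feasible:
        return None
    # Among feasible placements, minimize carbon; break ties on cost.
    return min(feasible, key=lambda o: (o.gco2e_per_inference, o.monthly_cost))
```

With a 50 ms latency target and a budget of 1,000, an option list of edge (15 ms, 770, 0.05 g), cloud (120 ms, 1,045, 0.04 g), and hybrid (40 ms, 900, 0.035 g) yields the hybrid placement; tightening the target to 20 ms leaves only the edge option.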
