Use right-sized and energy-efficient AI models

Applicable Role: Provider and Consumer

Description

AI and ML models vary significantly in size, architecture, complexity, and resource requirements. Larger models typically require more compute, memory, and storage, leading to higher energy consumption during both training and inference.

Using models that are appropriately sized and architecturally efficient for the task avoids unnecessary resource usage. This includes selecting smaller or task-specific models, choosing energy-efficient architectures at equivalent capability levels, and applying optimization techniques to reduce model footprint without sacrificing required performance.
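As a rough illustration of how model size and numeric precision drive memory and storage needs, the sketch below estimates the weight footprint of a model from its parameter count. The function name and the 7-billion-parameter figure are assumptions chosen for illustration. The estimate covers weights only, not activations, caches, or runtime overhead.

```python
# Bytes needed to store one parameter at common numeric precisions.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_footprint_gb(num_params: int, precision: str) -> float:
    """Approximate weight storage in GB (10^9 bytes) for a given precision.

    Hypothetical helper: counts weights only and ignores activations,
    caches, and framework overhead.
    """
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    # Illustrative model size, not a reference to any specific model.
    params = 7_000_000_000
    for precision in BYTES_PER_PARAM:
        print(f"{precision}: {weight_footprint_gb(params, precision)} GB")
```

Under these assumptions, storing the same weights in 8-bit integers rather than 32-bit floats cuts the weight footprint to a quarter. Whether the quantized model still meets quality requirements has to be validated separately.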

Solution

  • Select smaller or task-specific models where they provide sufficient performance
  • Choose base models that provide the required capability with lower compute requirements
  • Prefer optimized or distilled versions of larger models for fine-tuning and inference
  • Apply model compression techniques such as quantization, pruning, and knowledge distillation
  • Remove redundant or inactive parameters where possible
  • Evaluate model options based on both performance and energy efficiency before selection
  • Continuously evaluate newer model variants that offer improved efficiency
  • Avoid defaulting to the largest available model when simpler alternatives can achieve similar outcomes
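The selection steps above can be sketched as a simple rule: among the candidates that meet the required quality threshold, pick the one with the lowest measured energy use. This is a minimal sketch, not a prescribed method. The candidate names and all numbers are invented placeholders, and in practice accuracy and energy figures would come from your own evaluation runs.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    name: str
    params_millions: float
    accuracy: float                   # measured on your own evaluation set
    energy_wh_per_1k_requests: float  # measured on your own hardware


def pick_right_sized(
    candidates: Sequence[Candidate], min_accuracy: float
) -> Optional[Candidate]:
    """Return the lowest-energy candidate that meets the accuracy threshold."""
    eligible = [c for c in candidates if c.accuracy >= min_accuracy]
    if not eligible:
        return None  # no candidate is good enough; revisit threshold or options
    return min(eligible, key=lambda c: c.energy_wh_per_1k_requests)


# Hypothetical candidates with made-up numbers, for illustration only.
CANDIDATES = [
    Candidate("large-general", 70_000, 0.95, 900.0),
    Candidate("medium-distilled", 7_000, 0.93, 120.0),
    Candidate("small-task-specific", 300, 0.88, 15.0),
]

if __name__ == "__main__":
    choice = pick_right_sized(CANDIDATES, min_accuracy=0.90)
    print(choice.name if choice else "no candidate meets the threshold")
```

Filtering on the quality threshold first, then minimizing energy, keeps the largest model from being the default: it is chosen only when nothing smaller is good enough.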

SCI Impact

SCI = ((E × I) + M) per R

E (Energy): Smaller or optimized models reduce compute requirements, memory usage, and data movement during training and inference, which lowers the energy consumed.

M (Embodied Carbon): Reduced infrastructure and storage needs lower embodied emissions over time.

R (Functional Unit): When the functional unit is per inference or per token, right-sizing a model reduces the energy cost per functional unit, directly lowering the SCI score. However, if optimization reduces output quality and more functional units are needed to achieve the same outcome, the net effect on SCI should be evaluated.
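To make the functional-unit caveat concrete, the sketch below computes SCI as ((E × I) + M) divided by R, first per inference and then per successful outcome. All input values are invented for illustration, and the assumption that the smaller model needs 1.5 inferences per successful outcome is hypothetical. The units assumed are kWh for E, gCO2eq/kWh for I, and gCO2eq for M.

```python
def sci(energy_kwh: float, intensity_g_per_kwh: float,
        embodied_g: float, functional_units: float) -> float:
    """SCI = ((E * I) + M) / R, in gCO2eq per functional unit."""
    if functional_units <= 0:
        raise ValueError("functional_units must be positive")
    return (energy_kwh * intensity_g_per_kwh + embodied_g) / functional_units


# Illustrative numbers only: the same 10,000 inferences on two models.
INTENSITY = 400.0      # gCO2eq per kWh (assumed grid carbon intensity)
INFERENCES = 10_000

large_per_inference = sci(10.0, INTENSITY, 1000.0, INFERENCES)
small_per_inference = sci(4.0, INTENSITY, 600.0, INFERENCES)

# Hypothetical quality penalty: the small model needs 1.5 inferences on
# average to produce one acceptable result, so 10,000 inferences yield
# fewer successful outcomes.
small_per_outcome = sci(4.0, INTENSITY, 600.0, INFERENCES / 1.5)

if __name__ == "__main__":
    print(f"large, per inference: {large_per_inference:.3f} g")
    print(f"small, per inference: {small_per_inference:.3f} g")
    print(f"small, per outcome:   {small_per_outcome:.3f} g")
```

With these made-up inputs the smaller model still scores lower per successful outcome, but the gap narrows once retries are counted. A larger quality penalty could reverse the result, which is why the net effect should be measured rather than assumed.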

Cost Impact

  • Compute costs: Reduced due to smaller model sizes and faster inference
  • Infrastructure costs: Lower due to reduced memory and storage requirements
  • Benchmarking overhead: May add cost for performance testing across model variants
  • Trade-off: Optimization for efficiency may require initial investment in model compression tooling

Assumptions

  • Smaller or optimized models can meet the functional requirements of the application
  • Model performance can be validated against acceptable thresholds
  • Efficiency improvements do not significantly degrade output quality

Considerations

  • There is a trade-off between model size, accuracy, and efficiency
  • Some complex tasks may require larger models
  • Over-optimization, such as aggressive pruning or very low-precision quantization, can degrade output quality
  • Fine-tuning larger models may be necessary for complex domain-specific tasks
  • Periodic re-evaluation is needed as workloads and models evolve
  • Benchmarking should include both performance and resource usage
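A benchmark that reports quality and resource usage side by side might look like the sketch below. It uses only the Python standard library, so peak memory is limited to Python-level allocations tracked by tracemalloc; it does not capture GPU memory, native library allocations, or energy, which would need dedicated measurement tooling. The `benchmark` function and the toy model are hypothetical stand-ins.

```python
import time
import tracemalloc
from statistics import mean
from typing import Callable, Dict, Sequence


def benchmark(predict: Callable, inputs: Sequence, labels: Sequence) -> Dict[str, float]:
    """Report accuracy, time per item, and peak Python memory for one model.

    Tracing adds overhead, so timings are only meaningful for comparing
    models measured the same way.
    """
    if not inputs or len(inputs) != len(labels):
        raise ValueError("inputs and labels must be non-empty and equal length")

    tracemalloc.start()
    start = time.perf_counter()
    predictions = [predict(x) for x in inputs]
    elapsed = time.perf_counter() - start
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    accuracy = mean(1.0 if p == y else 0.0 for p, y in zip(predictions, labels))
    return {
        "accuracy": accuracy,
        "seconds_per_item": elapsed / len(inputs),
        "peak_python_bytes": float(peak_bytes),
    }


def toy_model(x: int) -> bool:
    """Toy stand-in for a real model: classifies integers as even or odd."""
    return x % 2 == 0


if __name__ == "__main__":
    data = list(range(1000))
    truth = [x % 2 == 0 for x in data]
    print(benchmark(toy_model, data, truth))
```

Running the same harness over each candidate model gives comparable rows of quality and resource figures, which supports choosing on both dimensions rather than on accuracy alone.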