Skip to main content

Leverage pre-trained models and transfer learning

Applicable Role: Provider

Description

Training AI and ML models from scratch requires significant compute, data, and time, leading to high energy consumption and carbon emissions. In many cases, models can be initialized from pre-trained versions and adapted to specific tasks through fine-tuning.

Leveraging pre-trained models avoids redundant training effort and reduces the overall resource footprint of model development.

Solution

  • Select pre-trained models that are relevant to the target task
  • Fine-tune models instead of training from scratch where possible
  • Reuse existing model weights and representations to reduce training effort
  • Evaluate whether full training is necessary before starting new model development
  • Use domain-adapted or task-specific pre-trained models when available

SCI Impact

SCI = (E × I) + M per R

E (Energy): Avoiding full training significantly reduces compute and energy consumption.

M (Embodied Carbon): Reduced infrastructure usage lowers embodied emissions associated with training.

R (Functional Unit): For providers using per FLOP or per training token as the functional unit, transfer learning dramatically reduces the total FLOPs and tokens required, lowering total carbon (C) while R scales proportionally, resulting in a more favorable SCI score.

Cost Impact

  • Training costs: Dramatically reduced by avoiding full model training
  • Compute time: Significantly lower for fine-tuning vs. training from scratch
  • Pre-trained model licensing: Potential licensing costs for commercial model access
  • Data costs: May be lower if transfer learning requires less training data
  • Trade-off: Pre-trained model licensing may offset training cost savings

Assumptions

  • Suitable pre-trained models are available for the target use case
  • Fine-tuning can achieve the required performance

Considerations

  • Pre-trained models may introduce biases or limitations from their original training data
  • Fine-tuning large foundation models can still require substantial compute resources comparable to training from scratch; evaluate the true cost-benefit of fine-tuning vs. full training for your use case
  • Licensing and usage restrictions of pre-trained models must be evaluated
  • Model suitability should be validated for the specific domain

References