
Illustrated Large-Model Training: Megatron Source Code Walkthrough 2, Model Parallelism

 11 months ago
source link: https://www.6aiq.com/article/1685770304796

AIWeekly

Real-time weekly digest: https://github.com/cbamls/AI_Tutorial
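This source-code walkthrough series reads through the pretrain portion of the Megatron code. The first installment covered how to initialize the distributed environment: grouping processes by DP/TP/PP (data, tensor, and pipeline parallelism) and assigning a GPU to each process. This chapter reads the model-parallel part: how to partition the model and move it into the distributed environment's already-defined ...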
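As a reminder of what "grouping processes by DP/TP/PP" means, here is a minimal pure-Python sketch. It is not Megatron's code: `build_groups` and `local_gpu` are hypothetical helper names, and the layout (tensor-parallel ranks adjacent, pipeline stages as contiguous blocks of ranks) and the `rank % gpus_per_node` device rule are assumptions chosen to illustrate one common convention.

```python
def build_groups(world_size, tp, pp):
    """Return (tp_groups, pp_groups, dp_groups) as lists of global ranks.

    Hypothetical helper: assumes TP ranks are adjacent and each pipeline
    stage occupies one contiguous block of world_size // pp ranks.
    """
    assert world_size % (tp * pp) == 0, "world_size must be divisible by tp * pp"
    ranks_per_stage = world_size // pp

    # Tensor-parallel groups: consecutive ranks share one split layer.
    tp_groups = [list(range(i * tp, (i + 1) * tp))
                 for i in range(world_size // tp)]

    # Pipeline-parallel groups: one rank from each stage, same offset in the stage.
    pp_groups = [list(range(i, world_size, ranks_per_stage))
                 for i in range(ranks_per_stage)]

    # Data-parallel groups: ranks in the same stage holding the same TP shard.
    dp_groups = []
    for stage in range(pp):
        start = stage * ranks_per_stage
        end = start + ranks_per_stage
        for shard in range(tp):
            dp_groups.append(list(range(start + shard, end, tp)))

    return tp_groups, pp_groups, dp_groups


def local_gpu(rank, gpus_per_node):
    """Assumed device rule: each process uses the GPU matching its local index."""
    return rank % gpus_per_node


if __name__ == "__main__":
    tp_groups, pp_groups, dp_groups = build_groups(world_size=16, tp=2, pp=4)
    print("TP groups:", tp_groups)
    print("PP groups:", pp_groups)
    print("DP groups:", dp_groups)
    print("rank 11 uses local GPU", local_gpu(11, gpus_per_node=8))
```

With 16 processes, TP=2 and PP=4, the data-parallel size works out to 2, so every rank belongs to exactly one group of each kind.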


