Papers on Large Model Training Techniques

A Reading List for MLSys

An Overview of Distributed Methods (Papers With Code)

ZeRO: Memory Optimizations Toward Training Trillion Parameter Models

https://arxiv.org/abs/1910.02054
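ZeRO removes the memory redundancy of data parallelism by partitioning optimizer states (stage 1), gradients (stage 2), and parameters (stage 3) across ranks. Below is a minimal sketch of how this is commonly enabled through DeepSpeed, which implements ZeRO; the toy model and random-data loop are placeholders, and the script is assumed to be launched with the `deepspeed` launcher so distributed state is already set up.

```python
# Minimal sketch: enabling ZeRO stage 2 (optimizer-state + gradient partitioning) via DeepSpeed.
import torch
import deepspeed

ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "zero_optimization": {
        "stage": 2,            # 1: shard optimizer states, 2: + gradients, 3: + parameters
        "overlap_comm": True,  # overlap gradient reduce-scatter with the backward pass
    },
}

# Placeholder model; any torch.nn.Module works here.
model = torch.nn.Sequential(torch.nn.Linear(1024, 4096), torch.nn.GELU(),
                            torch.nn.Linear(4096, 1024))

engine, _, _, _ = deepspeed.initialize(model=model,
                                       model_parameters=model.parameters(),
                                       config=ds_config)

for _ in range(10):  # toy loop on random data
    x = torch.randn(4, 1024, device=engine.device)
    loss = engine(x).pow(2).mean()
    engine.backward(loss)  # gradients are reduce-scattered across data-parallel ranks
    engine.step()          # each rank updates only its shard of the optimizer state
```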

Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM

https://arxiv.org/abs/2104.04473
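This paper composes tensor, pipeline, and data parallelism, so a cluster of n GPUs is factored as n = t × p × d, and it analyzes the pipeline "bubble" overhead, which for m microbatches per batch is roughly (p − 1)/m. A back-of-the-envelope sketch of those two relations with illustrative numbers (not taken from the paper):

```python
# Back-of-the-envelope sketch: factoring a cluster into (tensor, pipeline, data)
# parallel dimensions and estimating the pipeline bubble fraction (p - 1) / m.
tensor_parallel = 8       # t: ranks that split individual layers
pipeline_parallel = 12    # p: pipeline stages
data_parallel = 4         # d: model replicas
num_gpus = tensor_parallel * pipeline_parallel * data_parallel
print(f"total GPUs n = t * p * d = {num_gpus}")

microbatches = 192        # m: microbatches per global batch per replica
bubble_fraction = (pipeline_parallel - 1) / microbatches
print(f"pipeline bubble fraction (p - 1) / m = {bubble_fraction:.3f}")
```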

Reducing Activation Recomputation in Large Transformer Models

https://arxiv.org/abs/2205.05198
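The baseline this paper improves on is full activation recomputation (checkpointing): intermediate activations are discarded during the forward pass and recomputed during backward, trading extra compute for memory. A minimal PyTorch sketch of that baseline mechanism; the paper itself goes further with selective recomputation and sequence parallelism, which are not shown here.

```python
# Minimal sketch of activation recomputation ("checkpointing") in plain PyTorch:
# activations inside the checkpointed block are not stored in the forward pass
# and are recomputed when backward reaches the block.
import torch
from torch.utils.checkpoint import checkpoint

block = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 1024),
)

x = torch.randn(8, 1024, requires_grad=True)
y = checkpoint(block, x, use_reentrant=False)  # forward runs, intermediates are discarded
y.sum().backward()                             # block's forward is re-executed here
```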

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism

https://arxiv.org/abs/1909.08053
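Megatron-LM's core idea is intra-layer (tensor) model parallelism: each large weight matrix is split across GPUs, and a single all-reduce combines the partial results. A simplified sketch of a column-parallel and a row-parallel linear layer, assuming an initialized torch.distributed process group; the class names are illustrative, not Megatron-LM's actual implementation, and the real code wraps the collectives in autograd functions so gradients flow through them (this sketch shows the forward-pass math only).

```python
# Illustrative sketch of Megatron-style tensor parallelism (forward pass only).
import torch
import torch.distributed as dist

class ColumnParallelLinear(torch.nn.Module):
    """Weight split along the output dimension; each rank computes a slice of the columns."""
    def __init__(self, in_features, out_features):
        super().__init__()
        world_size = dist.get_world_size()  # assumes an initialized process group
        assert out_features % world_size == 0
        self.weight = torch.nn.Parameter(
            torch.randn(out_features // world_size, in_features) * 0.02
        )

    def forward(self, x):
        # Input is replicated on every rank; output is this rank's column slice.
        return torch.nn.functional.linear(x, self.weight)

class RowParallelLinear(torch.nn.Module):
    """Weight split along the input dimension; partial outputs are summed with an all-reduce."""
    def __init__(self, in_features, out_features):
        super().__init__()
        world_size = dist.get_world_size()
        assert in_features % world_size == 0
        self.weight = torch.nn.Parameter(
            torch.randn(out_features, in_features // world_size) * 0.02
        )

    def forward(self, x_shard):
        partial = torch.nn.functional.linear(x_shard, self.weight)
        dist.all_reduce(partial, op=dist.ReduceOp.SUM)  # combine partial sums across ranks
        return partial
```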

Fully Sharded Data Parallel: faster AI training with fewer GPUs (Engineering at Meta blog post)

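A minimal sketch of wrapping a model in PyTorch's FullyShardedDataParallel, which shards parameters, gradients, and optimizer state across data-parallel ranks (the same family of ideas as ZeRO-3). It assumes one process per GPU on a single node launched with torchrun and an available NCCL backend; the toy model and loss are placeholders.

```python
# Minimal FSDP sketch (run under torchrun, one process per GPU).
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())  # single-node assumption

model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 1024),
).cuda()

model = FSDP(model)  # parameters are flattened and sharded across ranks
optim = torch.optim.AdamW(model.parameters(), lr=1e-4)

x = torch.randn(8, 1024, device="cuda")
loss = model(x).pow(2).mean()
loss.backward()      # gradients are reduce-scattered to their owning shards
optim.step()         # each rank updates only the parameters it owns
```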
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding

https://arxiv.org/pdf/2006.16668.pdf
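GShard scales Transformers by replacing feed-forward blocks with sparsely activated mixture-of-experts layers under top-2 gating, sharded across many devices. The following is a single-device sketch of just the top-2 gating idea; the paper's expert sharding, capacity limits, and auxiliary load-balancing loss are omitted, and the module below is illustrative rather than GShard's implementation.

```python
# Single-device sketch of a top-2 gated mixture-of-experts layer.
import torch

class TinyTop2MoE(torch.nn.Module):
    def __init__(self, d_model=256, d_ff=1024, num_experts=4):
        super().__init__()
        self.gate = torch.nn.Linear(d_model, num_experts)
        self.experts = torch.nn.ModuleList(
            torch.nn.Sequential(
                torch.nn.Linear(d_model, d_ff),
                torch.nn.ReLU(),
                torch.nn.Linear(d_ff, d_model),
            )
            for _ in range(num_experts)
        )

    def forward(self, x):                        # x: [tokens, d_model]
        scores = torch.softmax(self.gate(x), dim=-1)
        top_w, top_idx = scores.topk(2, dim=-1)  # each token is routed to 2 experts
        top_w = top_w / top_w.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for k in range(2):
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, k] == e        # tokens whose k-th choice is expert e
                if mask.any():
                    out[mask] += top_w[mask, k:k + 1] * expert(x[mask])
        return out

moe = TinyTop2MoE()
y = moe(torch.randn(32, 256))                    # -> [32, 256]
```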

GSPMD: General and Scalable Parallelization for ML Computation Graphs

https://arxiv.org/pdf/2105.04663.pdf

Automatic Cross-Replica Sharding of Weight Update in Data-Parallel Training

https://arxiv.org/abs/2004.13336v1
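This paper shards the weight update (and the associated optimizer state) across data-parallel replicas instead of having every replica apply the full update. Below is a rough sketch of that pattern using torch.distributed collectives rather than the paper's XLA implementation, assuming an initialized process group, a parameter size divisible by the world size, and plain SGD standing in for a real optimizer.

```python
# Sketch of a cross-replica sharded weight update: reduce-scatter gradients,
# update only the local shard, then all-gather the updated parameter.
import torch
import torch.distributed as dist

def sharded_sgd_step(param: torch.nn.Parameter, lr: float = 1e-3) -> None:
    world_size, rank = dist.get_world_size(), dist.get_rank()
    grad = param.grad.detach().flatten()
    assert grad.numel() % world_size == 0  # simplifying assumption for equal shards

    # Reduce-scatter: this rank receives the summed gradient for its shard only.
    my_grad = torch.empty(grad.numel() // world_size, dtype=grad.dtype, device=grad.device)
    dist.reduce_scatter(my_grad, list(grad.chunk(world_size)))

    # Apply the update to the local shard only (gradient averaged over replicas).
    my_param = param.data.flatten().chunk(world_size)[rank].clone()
    my_param -= lr * my_grad / world_size

    # All-gather so every rank ends up with the complete updated parameter.
    shards = [torch.empty_like(my_param) for _ in range(world_size)]
    dist.all_gather(shards, my_param)
    param.data.copy_(torch.cat(shards).view_as(param))
```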

Original post: https://blog.csdn.net/bbbeoy/article/details/131323758
