Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism – PaperGrep https://papergrep.dev/paper/megatron-lm-training-multi-billion-parameter-4621c1