Automatic Cross-Replica Sharding of Weight Update in Data-Parallel Training – PaperGrep https://papergrep.dev/paper/automatic-cross-replica-sharding-of-weight-update-2747b7