Stochastic Weight Averaging in Parallel: Large-Batch Training that Generalizes Well – PaperGrep https://papergrep.dev/paper/stochastic-weight-averaging-in-parallel-large-85daaf