Signal
Gradient Descent with Projection Finds Over-Parameterized Neural Networks for Learning Low-Degree Polynomials with Nearly Minimax Optimal Rate
arXiv:2412.07971v2 (replace-cross). Abstract: In distributed training of machine learning models, gradient descent with local iterative steps, commonly known as Local (Stochastic) Gradient Descent (Local-(S)GD) or Federated Averaging (FedAvg), is a popular method for mitigating the communication burden.
rss
gradient_descent
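The local-update-then-average pattern named in the abstract (Local-(S)GD / FedAvg) can be sketched as follows. This is an illustrative toy on a least-squares objective, not the paper's setup: the data, worker count, learning rate, and step counts are all assumptions.

```python
import numpy as np

# Toy Local-(S)GD / FedAvg: each worker runs several local gradient steps
# on its own data shard, then the server averages the worker models.
# All sizes and hyperparameters below are illustrative assumptions.

rng = np.random.default_rng(0)
d, n_rows, n_workers = 5, 40, 4
A = rng.normal(size=(n_rows, d))            # design matrix
x_true = rng.normal(size=d)
b = A @ x_true                              # noiseless targets
shards = np.array_split(np.arange(n_rows), n_workers)  # per-worker data

def local_sgd(rounds=20, local_steps=5, lr=0.1):
    x = np.zeros(d)                         # global model
    for _ in range(rounds):                 # one communication round
        local_models = []
        for idx in shards:
            xi = x.copy()                   # worker starts from global model
            Ai, bi = A[idx], b[idx]
            for _ in range(local_steps):    # local gradient steps (no comms)
                grad = Ai.T @ (Ai @ xi - bi) / len(idx)
                xi -= lr * grad
            local_models.append(xi)
        x = np.mean(local_models, axis=0)   # FedAvg: average worker models
    return x

x_hat = local_sgd()
final_loss = np.mean((A @ x_hat - b) ** 2)
print(final_loss)
```

Averaging only every `local_steps` iterations is what reduces the communication burden relative to synchronizing after every gradient step.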
Evidence preview
- Gradient Descent with Projection Finds Over-Parameterized Neural Networks for Learning Low-Degree Polynomials with Nearly Minimax Optimal Rate (arXiv stat.ML RSS)
- Effectiveness of Distributed Gradient Descent with Local Steps for Overparameterized Models (arXiv stat.ML RSS)