Signal
Gradient Descent with Projection Finds Over-Parameterized Neural Networks for Learning Low-Degree Polynomials with Nearly Minimax Optimal Rate
arXiv:2412.07971v2 (replace-cross). Abstract: In distributed training of machine learning models, gradient descent with local iterative steps, commonly known as Local (Stochastic) Gradient Descent (Local-(S)GD) or Federated Averaging (FedAvg), is a popular method for mitigating the communication burden.
rss
gradient_descent
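The local-update-then-average pattern named in the abstract (Local-(S)GD / FedAvg) can be sketched as follows. This is an illustrative toy on a least-squares objective, not the paper's setup: the data, worker count, learning rate, and step counts are all assumptions.

```python
import numpy as np

# Toy Local-(S)GD / FedAvg: each worker runs several local gradient steps
# on its own data shard, then the server averages the worker models.
# All sizes and hyperparameters below are illustrative assumptions.

rng = np.random.default_rng(0)
d, n_rows, n_workers = 5, 40, 4
A = rng.normal(size=(n_rows, d))            # design matrix
x_true = rng.normal(size=d)
b = A @ x_true                              # noiseless targets
shards = np.array_split(np.arange(n_rows), n_workers)  # per-worker data

def local_sgd(rounds=20, local_steps=5, lr=0.1):
    x = np.zeros(d)                         # global model
    for _ in range(rounds):                 # one communication round
        local_models = []
        for idx in shards:
            xi = x.copy()                   # worker starts from global model
            Ai, bi = A[idx], b[idx]
            for _ in range(local_steps):    # local gradient steps (no comms)
                grad = Ai.T @ (Ai @ xi - bi) / len(idx)
                xi -= lr * grad
            local_models.append(xi)
        x = np.mean(local_models, axis=0)   # FedAvg: average worker models
    return x

x_hat = local_sgd()
final_loss = np.mean((A @ x_hat - b) ** 2)
print(final_loss)
```

Averaging only every `local_steps` iterations is what reduces the communication burden relative to synchronizing after every gradient step.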
Evidence preview
- Gradient Descent with Projection Finds Over-Parameterized Neural Networks for Learning Low-Degree Polynomials with Nearly Minimax Optimal Rate (arXiv stat.ML RSS)
- Effectiveness of Distributed Gradient Descent with Local Steps for Overparameterized Models (arXiv stat.ML RSS)