Signal

Gradient Descent with Projection Finds Over-Parameterized Neural Networks for Learning Low-Degree Polynomials with Nearly Minimax Optimal Rate

arXiv:2412.07971v2 Announce Type: replace-cross Abstract: In distributed training of machine learning models, gradient descent with local iterative steps, commonly known as Local (Stochastic) Gradient Descent (Local-(S)GD) or Federated Averaging (FedAvg), is a popular method for mitigating the communication burden.
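The Local-(S)GD/FedAvg scheme mentioned in the abstract can be sketched as follows: each client runs several local gradient steps from the shared model, and the server then averages the resulting parameters, so communication happens once per round instead of once per step. This is a minimal toy sketch, assuming 1-D quadratic client losses f_i(w) = 0.5 * (w - c_i)^2; all names (`local_sgd_round`, `num_local_steps`, the client centers) are illustrative, not from the paper.

```python
# Toy sketch of Local SGD / FedAvg on 1-D quadratic client losses.
# Assumed setup: client i has loss f_i(w) = 0.5 * (w - c_i)^2,
# so its gradient is simply (w - c_i).

def local_sgd_round(w_global, centers, lr=0.1, num_local_steps=5):
    """One communication round: every client takes several local
    gradient steps from the shared model, then the server averages."""
    local_models = []
    for c in centers:
        w = w_global
        for _ in range(num_local_steps):
            grad = w - c          # gradient of 0.5 * (w - c)^2
            w -= lr * grad        # local gradient step
        local_models.append(w)
    # FedAvg step: average the clients' parameters.
    return sum(local_models) / len(local_models)

w = 0.0
centers = [1.0, 3.0]  # client optima; the global optimum is their mean, 2.0
for _ in range(50):   # 50 communication rounds
    w = local_sgd_round(w, centers)
print(round(w, 4))    # → 2.0
```

With `num_local_steps=1` this reduces to ordinary synchronized gradient descent on the average loss; larger values trade per-round accuracy for fewer communication rounds, which is the trade-off the abstract refers to.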

rss
gradient_descent
Evidence preview
  • Gradient Descent with Projection Finds Over-Parameterized Neural Networks for Learning Low-Degree Polynomials with Nearly Minimax Optimal Rate
    arXiv stat.ML RSS
  • Effectiveness of Distributed Gradient Descent with Local Steps for Overparameterized Models
    arXiv stat.ML RSS