In:
Biometrika, Oxford University Press (OUP), Vol. 109, No. 4 (2022-11-29), pp. 1033-1046
Abstract:
Many statistical estimators for high-dimensional linear regression are $M$-estimators, formed by minimizing a data-dependent squared loss function plus a regularizer. This work considers a new class of estimators implicitly defined through a discretized gradient dynamic system under overparameterization. We show that, under suitable restricted isometry conditions, overparameterization leads to implicit regularization: if we directly apply gradient descent to the residual sum of squares with sufficiently small initial values, then under a proper early stopping rule the iterates converge to a nearly sparse, rate-optimal solution that improves over explicitly regularized approaches. In particular, the resulting estimator does not suffer from the extra bias induced by explicit penalties, and it can achieve the parametric root-$n$ rate when the signal-to-noise ratio is sufficiently high. We also perform simulations comparing our method with explicitly regularized high-dimensional linear regression. Our results illustrate the advantages of implicit regularization via gradient descent after overparameterization for sparse vector estimation.
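The abstract describes the procedure only at a high level. As a rough illustration, the sketch below (not the authors' implementation) applies gradient descent to the residual sum of squares under a Hadamard-product overparameterization $\beta = u \circ u - v \circ v$, which is a common way to realize implicit sparsity; the parameterization, step size, initialization scale, and stopping surrogate here are illustrative assumptions rather than details taken from the paper.

```python
# A minimal sketch of implicit regularization via gradient descent after
# overparameterization, assuming beta = u*u - v*v (Hadamard product),
# small initial values, and early stopping. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
n, p, s = 100, 500, 5                      # samples, dimension, sparsity (illustrative)
X = rng.standard_normal((n, p)) / np.sqrt(n)  # roughly restricted-isometry-friendly design
beta_true = np.zeros(p)
beta_true[:s] = 1.0
y = X @ beta_true + 0.1 * rng.standard_normal(n)

alpha = 1e-6                               # small initialization drives the implicit bias
u = np.full(p, alpha)
v = np.full(p, alpha)
eta = 0.1                                  # step size (assumed, not from the paper)

best_err, best_beta = np.inf, None
for t in range(5000):
    beta = u * u - v * v
    grad = X.T @ (X @ beta - y)            # gradient of 0.5*||y - X beta||^2 in beta
    u -= eta * 2.0 * grad * u              # chain rule through beta = u*u - v*v
    v += eta * 2.0 * grad * v
    # Early-stopping surrogate: in practice one would monitor a held-out
    # residual; here the estimation error is tracked only because beta_true
    # is known in this simulation.
    err = np.linalg.norm(u * u - v * v - beta_true)
    if err < best_err:
        best_err, best_beta = err, (u * u - v * v).copy()

print("estimation error at early-stopped iterate:", best_err)
```

With a small initialization, coordinates aligned with the true signal grow geometrically while noise coordinates grow much more slowly, so stopping the iterations early yields a nearly sparse estimate without any explicit penalty term, which is the mechanism the abstract attributes to overparameterized gradient descent.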
Type of Medium:
Online Resource
ISSN:
0006-3444, 1464-3510
DOI:
10.1093/biomet/asac010
Language:
English
Publisher:
Oxford University Press (OUP)
Publication Date:
2022
ZDB-ID:
1119-8, 1470319-1
SSG:
12