Accelerated First-Order Optimization Algorithms in Machine Learning

References

Agarwal Naman, Allen-Zhu Zeyuan, Bullins Brian, Hazan Elad, and Ma Tengyu. (2017). Finding approximate local minima for nonconvex optimization in linear time[C]. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pages 1195-1199, Montreal.

Allen-Zhu Zeyuan and Hazan Elad. (2016). Variance reduction for faster non-convex optimization[C]. In Proceedings of the 33rd International Conference on Machine Learning, pages 699-707, New York.

Allen-Zhu Zeyuan and Li Yuanzhi. (2018). Neon2: Finding local minima via first-order oracles[C]. In Advances in Neural Information Processing Systems 31, pages 3716-3726, Montreal.

Allen-Zhu Zeyuan and Orecchia Lorenzo. (2017). Linear coupling: An ultimate unification of gradient and mirror descent[C]. In Proceedings of the 8th Innovations in Theoretical Computer Science, Berkeley.

Allen-Zhu Zeyuan. (2017). Katyusha: The first truly accelerated stochastic gradient method[C]. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pages 1200-1206, Montreal.

Allen-Zhu Zeyuan. (2018). Natasha2: Faster non-convex optimization than SGD[C]. In Advances in Neural Information Processing Systems 31, pages 2675-2686, Montreal.

Beck Amir and Teboulle Marc. (2009). A fast iterative shrinkage-thresholding algorithm for linear inverse problems[J]. SIAM J. Imag. Sci., 2(1): 183-202.

Beck Amir and Teboulle Marc. (2014). A fast dual proximal gradient algorithm for convex minimization and applications[J]. Oper. Res. Lett., 42(1): 1-6.

Beck Amir. (2017). First-Order Methods in Optimization[M]. volume 25. SIAM, Philadelphia.

Berkson Joseph. (1944). Application of the logistic function to bio-assay[J]. J. Am. Stat. Assoc., 39(227): 357-365.

Bhojanapalli Srinadh, Neyshabur Behnam, and Srebro Nati. (2016). Global optimality of local search for low rank matrix recovery[C]. In Advances in Neural Information Processing Systems 29, pages 3873-3881, Barcelona.

Bottou Léon, Curtis Frank E, and Nocedal Jorge. (2018). Optimization methods for large-scale machine learning[J]. SIAM Rev., 60(2): 223-311.

Boyd Stephen, Parikh Neal, Chu Eric, Peleato Borja, and Eckstein Jonathan. (2011). Distributed optimization and statistical learning via the alternating direction method of multipliers[J]. Found. Trends Mach. Learn., 3(1): 1-122.

Bubeck Sébastien, Jiang Qijia, Lee Yin Tat, Li Yuanzhi, and Sidford Aaron. (2019). Near-optimal method for highly smooth convex optimization[C]. In Proceedings of the 32nd Conference on Learning Theory, pages 492-507, Phoenix.

Bubeck Sébastien, Lee Yin Tat, and Singh Mohit. (2015). A geometric alternative to Nesterov's accelerated gradient descent[R]. Preprint. arXiv: 1506.08187.

Bubeck Sébastien. (2015). Convex optimization: Algorithms and complexity[J]. Found. Trends Mach. Learn., 8(3-4): 231-357.

Carmon Yair, Duchi John C, Hinder Oliver, and Sidford Aaron. (2017). Convex until proven guilty: dimension-free acceleration of gradient descent on non-convex functions[C]. In Proceedings of the 34th International Conference on Machine Learning, pages 654-663, Sydney.

Carmon Yair, Duchi John C, Hinder Oliver, and Sidford Aaron. (2018). Accelerated methods for nonconvex optimization[J]. SIAM J. Optim., 28(2): 1751-1772.

Chambolle Antonin and Pock Thomas. (2011). A first-order primal-dual algorithm for convex problems with applications to imaging[J]. J. Math. Imag. Vis., 40(1): 120-145.

Cortes Corinna and Vapnik Vladimir. (1995). Support-vector networks[J]. Mach. Learn., 20(3): 273-297.

Defazio Aaron, Bach Francis, and Lacoste-Julien Simon. (2014). SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives[C]. In Advances in Neural Information Processing Systems 27, pages 1646-1654, Montreal.

Domingos Pedro M. (2012). A few useful things to know about machine learning[J]. Commun. ACM, 55(10): 78-87.

Drusvyatskiy Dmitriy, Fazel Maryam, and Roy Scott. (2018). An optimal first order method based on optimal quadratic averaging[J]. SIAM J. Optim., 28(1): 251-271.

Fang Cong, Huang Yameng, and Lin Zhouchen. (2018a). Accelerating asynchronous algorithms for convex optimization by momentum compensation[R]. Preprint. arXiv: 1802.09747.

Fang Cong, Li Chris Junchi, Lin Zhouchen, and Zhang Tong. (2018b). SPIDER: Near-optimal non-convex optimization via stochastic path-integrated differential estimator[C]. In Advances in Neural Information Processing Systems 31, pages 689-699, Montreal.

Fazlyab Mahyar, Ribeiro Alejandro, Morari Manfred, and Preciado Victor M. (2018). Analysis of optimization algorithms via integral quadratic constraints: Nonstrongly convex problems[J]. SIAM J. Optim., 28(3): 2654-2689.

Fercoq Olivier and Richtárik Peter. (2015). Accelerated, parallel, and proximal coordinate descent[J]. SIAM J. Optim., 25(4): 1997-2023.

Gambella Claudio, Ghaddar Bissan, and Naoum-Sawaya Joe. (2021). Optimization models for machine learning: a survey[J]. Eur. J. Oper. Res., 290(3): 807-828.

Ge Rong, Huang Furong, Jin Chi, and Yuan Yang. (2015). Escaping from saddle points: Online stochastic gradient for tensor decomposition[C]. In Proceedings of the 28th Conference on Learning Theory, pages 797-842, Paris.

Ge Rong, Lee Jason D, and Ma Tengyu. (2016). Matrix completion has no spurious local minimum[C]. In Advances in Neural Information Processing Systems 29, pages 2973-2981, Barcelona.

Ghadimi Saeed and Lan Guanghui. (2016). Accelerated gradient methods for nonconvex nonlinear and stochastic programming[J]. Math. Program., 156(1-2): 59-99.

Hannah Robert, Feng Fei, and Yin Wotao. (2019). A2BCD: An asynchronous accelerated block coordinate descent algorithm with optimal complexity[C]. In Proceedings of the 7th International Conference on Learning Representations, New Orleans.

Haykin Simon. (1999). Neural Networks: A Comprehensive Foundation[M]. 2nd ed. Pearson Prentice Hall, Upper Saddle River.

Hazan Elad. (2016). Introduction to online convex optimization[J]. Found. Trends Optim., 2(3-4): 157-325.

Hazan Elad. (2019). Optimization for machine learning[R]. Technical report, Princeton University.

He Bingsheng and Yuan Xiaoming. (2010). On the acceleration of augmented Lagrangian method for linearly constrained optimization[R]. Preprint, Optimization Online.

Jain Prateek and Kar Purushottam. (2017). Non-convex optimization for machine learning[J]. Found. Trends Mach. Learn., 10(3-4): 142-336.

Jin Chi, Netrapalli Praneeth, and Jordan Michael I. (2018). Accelerated gradient descent escapes saddle points faster than gradient descent[C]. In Proceedings of the 31st Conference on Learning Theory, pages 1042-1085, Stockholm.

Johnson Rie and Zhang Tong. (2013). Accelerating stochastic gradient descent using predictive variance reduction[C]. In Advances in Neural Information Processing Systems 26, pages 315-323, Lake Tahoe.

Kim Donghwan and Fessler Jeffrey A. (2016). Optimized first-order methods for smooth convex minimization[J]. Math. Program., 159(1-2): 81-107.

Lan Guanghui and Zhou Yi. (2018). An optimal randomized incremental gradient method[J]. Math. Program., 171(1-2): 167-215.

Lessard Laurent, Recht Benjamin, and Packard Andrew. (2016). Analysis and design of optimization algorithms via integral quadratic constraints[J]. SIAM J. Optim., 26(1): 57-95.

Li Huan and Lin Zhouchen. (2015). Accelerated proximal gradient methods for nonconvex programming[C]. In Advances in Neural Information Processing Systems 28, pages 379-387, Montreal.

Li Huan and Lin Zhouchen. (2019). Accelerated alternating direction method of multipliers: an optimal O(1/K) nonergodic analysis[J]. J. Sci. Comput., 79(2): 671-699.

Li Huan and Lin Zhouchen. (2020). On the complexity analysis of the primal solutions for the accelerated randomized dual coordinate ascent[J]. J. Mach. Learn. Res., 21(33): 1-45.

Li Huan, Fang Cong, and Lin Zhouchen. (2017). Convergence rates analysis of the quadratic penalty method and its applications to decentralized distributed optimization[R]. Preprint. arXiv: 1711.10802.

Li Huan, Fang Cong, Yin Wotao, and Lin Zhouchen. (2020). Decentralized accelerated gradient methods with increasing penalty parameters[J]. IEEE Trans. Signal Process., 68: 4855-4870.

Lian Xiangru, Zhang Ce, Zhang Huan, Hsieh Cho-Jui, Zhang Wei, and Liu Ji. (2017). Can decentralized algorithms outperform centralized algorithms? A case study for decentralized parallel stochastic gradient descent[C]. In Advances in Neural Information Processing Systems 30, pages 5330-5340, Long Beach.

Lin Hongzhou, Mairal Julien, and Harchaoui Zaid. (2015). A universal catalyst for first-order optimization[C]. In Advances in Neural Information Processing Systems 28, pages 3384-3392, Montreal.

Lin Qihang, Lu Zhaosong, and Xiao Lin. (2014). An accelerated proximal coordinate gradient method[C]. In Advances in Neural Information Processing Systems 27, pages 3059-3067, Montreal.

Liu Guangcan, Lin Zhouchen, and Yu Yong. (2010). Robust subspace segmentation by low-rank representation[C]. In Proceedings of the 27th International Conference on Machine Learning, pages 663-670, Haifa.

Lu Jie and Johansson Mikael. (2016). Convergence analysis of approximate primal solutions in dual first-order methods[J]. SIAM J. Optim., 26(4): 2430-2467.

Mairal Julien. (2013). Optimization with first-order surrogate functions[C]. In Proceedings of the 30th International Conference on Machine Learning, pages 783-791, Atlanta.

Nesterov Yurii. (1988). On an approach to the construction of optimal methods of minimization of smooth convex functions[J]. Ekonomika i Matematicheskie Metody, 24(3): 509-517.

Nesterov Yurii. (2005). Smooth minimization of non-smooth functions[J]. Math. Program., 103(1): 127-152.

Nesterov Yurii. (2007). Gradient methods for minimizing composite objective function[R]. Technical Report Discussion Paper #2007/76, CORE.

Nesterov Yurii. (2012). Efficiency of coordinate descent methods on huge-scale optimization problems[J]. SIAM J. Optim., 22(2): 341-362.

Nesterov Yurii. (2013). Gradient methods for minimizing composite functions[J]. Math. Program., 140(1): 125-161.

Nesterov Yurii. (2018). Lectures on Convex Optimization[M]. 2nd ed. Springer, Cham.

Ouyang Yuyuan, Chen Yunmei, Lan Guanghui, and Pasiliao Jr Eduardo. (2015). An accelerated linearized alternating direction method of multipliers[J]. SIAM J. Imag. Sci., 8(1): 644-681.

Parikh Neal and Boyd Stephen. (2014). Proximal algorithms[J]. Found. Trends Optim., 1(3): 127-239.

Polyak Boris T. (1964). Some methods of speeding up the convergence of iteration methods[J]. USSR Comput. Math. and Math. Phys., 4(5): 1-17.

Reddi Sashank J, Hefny Ahmed, Sra Suvrit, Poczos Barnabas, and Smola Alex. (2016). Stochastic variance reduction for nonconvex optimization[C]. In Proceedings of the 33rd International Conference on Machine Learning, pages 314-323, New York.

Scaman Kevin, Bach Francis, Bubeck Sébastien, Lee Yin Tat, and Massoulié Laurent. (2017). Optimal algorithms for smooth and strongly convex distributed optimization in networks[C]. In Proceedings of the 34th International Conference on Machine Learning, pages 3027-3036, Sydney.

Schmidt Mark, Le Roux Nicolas, and Bach Francis. (2017). Minimizing finite sums with the stochastic average gradient[J]. Math. Program., 162(1-2): 83-112.

Shalev-Shwartz Shai and Zhang Tong. (2014). Accelerated proximal stochastic dual coordinate ascent for regularized loss minimization[C]. In Proceedings of the 31st International Conference on Machine Learning, pages 64-72, Beijing.

Sra Suvrit, Nowozin Sebastian, and Wright Stephen J, editors. (2012). Optimization for Machine Learning[M]. MIT Press, Cambridge.

Su Weijie, Boyd Stephen, and Candès Emmanuel. (2014). A differential equation for modeling Nesterov's accelerated gradient method: Theory and insights[C]. In Advances in Neural Information Processing Systems 27, pages 2510-2518, Montreal.

Tibshirani Robert. (1996). Regression shrinkage and selection via the lasso[J]. J. Roy. Stat. Soc.: Ser. B (Stat. Methodol.), 58(1): 267-288.

Tripuraneni Nilesh, Stern Mitchell, Jin Chi, Regier Jeffrey, and Jordan Michael I. (2018). Stochastic cubic regularization for fast nonconvex optimization[C]. In Advances in Neural Information Processing Systems 31, pages 2899-2908, Montreal.

Tseng Paul. (2008). On accelerated proximal gradient methods for convex-concave optimization[R]. Technical report, University of Washington, Seattle.

Wibisono Andre, Wilson Ashia C, and Jordan Michael I. (2016). A variational perspective on accelerated methods in optimization[J]. Proc. Natl. Acad. Sci., 113(47): 7351-7358.

Xu Yangyang. (2017). Accelerated first-order primal-dual proximal methods for linearly constrained composite convex programming[J]. SIAM J. Optim., 27(3): 1459-1484.

Xu Yi, Rong Jing, and Yang Tianbao. (2018). First-order stochastic algorithms for escaping from saddle points in almost linear time[C]. In Advances in Neural Information Processing Systems 31, pages 5530-5540, Montreal.

Zhang Yuchen and Xiao Lin. (2017). Stochastic primal-dual coordinate method for regularized empirical risk minimization[J]. J. Mach. Learn. Res., 18(1): 2939-2980.

Zheng Shun, Wang Jialei, Xia Fen, Xu Wei, and Zhang Tong. (2017). A general distributed dual coordinate optimization framework for regularized loss minimization[J]. J. Mach. Learn. Res., 18(115): 1-52.