My Publications

Dadras, A., Stich, S.U. and Yurtsever, A. (2025), "Personalized Federated Learning via Low-Rank Matrix Optimization", Transactions on Machine Learning Research.
Abstract: Personalized Federated Learning (pFL) has gained significant attention for building a suite of models tailored to different clients. In pFL, the challenge lies in balancing the reliance on local datasets, which may lack representativeness, against the diversity of other clients' models, whose quality and relevance are uncertain. Focusing on the clustered FL scenario, where devices are grouped based on similarities in their data distributions without prior knowledge of cluster memberships, we develop a mathematical model for pFL using low-rank matrix optimization. Building on this formulation, we propose a pFL approach leveraging the Burer-Monteiro factorization technique. We examine the convergence guarantees of the proposed method and present numerical experiments on training deep neural networks, demonstrating the empirical performance of the proposed method in scenarios where personalization is crucial.
BibTeX:
@article{Dadras2025-pFLMF,
  author = {Dadras, A. and Stich, S. U. and Yurtsever, A.},
  title = {Personalized Federated Learning via Low-Rank Matrix Optimization},
  journal = {Transactions on Machine Learning Research},
  year = {2025},
  url = {https://openreview.net/pdf?id=DFJu1QB2Nr}
}
Hou, Y., Sra, S. and Yurtsever, A. (2025), "Implicit Bias in Matrix Factorization and its Explicit Realization in a New Architecture". arXiv preprint arXiv:2501.16322.
Abstract: Gradient descent for matrix factorization is known to exhibit an implicit bias toward approximately low-rank solutions. While existing theories often assume the boundedness of iterates, empirically the bias persists even with unbounded sequences. We thus hypothesize that implicit bias is driven by divergent dynamics markedly different from the convergent dynamics for data fitting. Using this perspective, we introduce a new factorization model: X ≈ UDV^T, where U and V are constrained within norm balls, while D is a diagonal factor allowing the model to span the entire search space. Our experiments reveal that this model exhibits a strong implicit bias regardless of initialization and step size, yielding truly (rather than approximately) low-rank solutions. Furthermore, drawing parallels between matrix factorization and neural networks, we propose a novel neural network model featuring constrained layers and diagonal components. This model achieves strong performance across various regression and classification tasks while finding low-rank solutions, resulting in efficient and lightweight networks.
BibTeX:
@techreport{Hou2025-UDV,
  author = {Hou, Y. and Sra, S. and Yurtsever, A.},
  title = {Implicit Bias in Matrix Factorization and its Explicit Realization in a New Architecture},
  institution = {arXiv preprint arXiv:2501.16322},
  year = {2025},
  url = {https://arxiv.org/pdf/2501.16322}
}
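A minimal numerical sketch of the constrained X ≈ UDV^T model described in the abstract above, assuming a plain least-squares objective and projected gradient descent to keep U and V inside Frobenius-norm balls; the training setup in the paper may differ (illustrative Python, not the authors' code):

import numpy as np

def project_fro_ball(M, radius):
    # Project a matrix onto the Frobenius-norm ball of the given radius.
    norm = np.linalg.norm(M)
    return M if norm <= radius else M * (radius / norm)

def udv_factorize(X, rank=10, radius=1.0, lr=0.05, steps=5000, seed=0):
    # Fit X ~ U @ diag(d) @ V.T with U, V norm-constrained and d a free diagonal factor.
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.standard_normal((m, rank)) * 0.1
    V = rng.standard_normal((n, rank)) * 0.1
    d = np.ones(rank)
    for _ in range(steps):
        R = (U * d) @ V.T - X                      # residual of the current fit
        gU, gV = R @ (V * d), R.T @ (U * d)        # gradients of 0.5*||R||_F^2
        gd = np.einsum('ik,ij,jk->k', U, R, V)
        U = project_fro_ball(U - lr * gU, radius)  # gradient step + projection
        V = project_fro_ball(V - lr * gV, radius)
        d = d - lr * gd                            # the diagonal factor is unconstrained
    return U, d, V

# Toy usage: a rank-3 target; inspect d to see how many components stay active.
rng = np.random.default_rng(1)
A = rng.standard_normal((30, 3)) @ rng.standard_normal((3, 20))
A /= np.linalg.norm(A)
U, d, V = udv_factorize(A, rank=10)
print("fit error:", np.linalg.norm((U * d) @ V.T - A))
print("sorted |d|:", np.round(np.sort(np.abs(d))[::-1], 3))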
Maskan, H., Halvachi, P., Sra, S. and Yurtsever, A. (2025), "Block Coordinate DC Programming". arXiv preprint arXiv:2411.11664.
Abstract: We introduce an extension of the Difference of Convex Algorithm (DCA) in the form of a block coordinate approach for problems with separable structure. For n coordinate-blocks and k iterations, our main result proves a non-asymptotic convergence rate of O(n/k) for the proposed method. Furthermore, leveraging the connection between DCA and Expectation Maximization (EM), we propose a block coordinate EM algorithm.
BibTeX:
@techreport{Maskan2025-BlockCoordinateDC,
  author = {Maskan, H. and Halvachi, P. and Sra, S. and Yurtsever, A.},
  title = {Block Coordinate DC Programming},
  institution = {arXiv preprint arXiv:2411.11664},
  year = {2025},
  url = {https://arxiv.org/pdf/2411.11664}
}
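As a toy illustration of the block-coordinate DCA update (a hypothetical instance, not the paper's experiments): take f(x) = g(x) - h(x) with a block-separable convex part g(x) = (a/2)||x||^2 and a smooth convex part h(x) = 0.5 x^T B x; linearizing h at the current iterate and minimizing over one block at a time gives a closed-form update per block:

import numpy as np

rng = np.random.default_rng(0)
n, n_blocks = 12, 4
blocks = np.array_split(np.arange(n), n_blocks)

# DC objective f(x) = g(x) - h(x) with g(x) = (a/2)||x||^2 and h(x) = 0.5 x^T B x.
M = rng.standard_normal((n, n))
B = M @ M.T / n                      # PSD matrix defining the subtracted convex part h
a = 2.0 * np.linalg.norm(B, 2)       # curvature of g, chosen so f is bounded below
f = lambda v: 0.5 * a * v @ v - 0.5 * v @ B @ v

x = rng.standard_normal(n)
for k in range(60):
    for idx in blocks:
        # Block DCA step: minimize (a/2)||x_idx||^2 - <(grad h(x))_idx, x_idx>
        # with the other blocks held fixed; the minimizer is available in closed form.
        x[idx] = (B @ x)[idx] / a
    if k % 15 == 0:
        print(f"sweep {k:2d}  f(x) = {f(x):.8f}")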
Maskan, H., Hou, Y., Sra, S. and Yurtsever, A. (2025), "Revisiting Frank-Wolfe for Structured Nonconvex Optimization". arXiv preprint arXiv:2503.08921.
Abstract: We introduce a new projection-free (Frank-Wolfe) method for optimizing structured nonconvex functions that are expressed as a difference of two convex functions. This problem class subsumes smooth nonconvex minimization, positioning our method as a promising alternative to the classical Frank-Wolfe algorithm. DC decompositions are not unique; by carefully selecting a decomposition, we can better exploit the problem structure, improve computational efficiency, and adapt to the underlying problem geometry to find better local solutions. We prove that the proposed method achieves an ε-approximate first-order stationary point in O(1/ε^2) iterations, matching the complexity of the standard Frank-Wolfe algorithm for smooth nonconvex minimization in general. Specific decompositions can, for instance, yield a gradient-efficient variant that requires only O(1/ε) calls to the gradient oracle. Finally, we present numerical experiments demonstrating the effectiveness of the proposed method compared to the standard Frank-Wolfe algorithm.
BibTeX:
@techreport{Maskan2025-RevisitingNonconvexFW,
  author = {Maskan, H. and Hou, Y. and Sra, S. and Yurtsever, A.},
  title = {Revisiting Frank-Wolfe for Structured Nonconvex Optimization},
  institution = {arXiv preprint arXiv:2503.08921},
  year = {2025},
  url = {https://arxiv.org/pdf/2503.08921}
}
Maskan, H., Zygalakis, K., Eftekhari, A. and Yurtsever, A. (2025), "A Unified Model for High-Resolution ODEs: New Insights on Accelerated Methods". arXiv preprint arXiv:2503.15136.
Abstract: Recent work on high-resolution ordinary differential equations (HR-ODEs) captures fine nuances among different momentum-based optimization methods, leading to accurate theoretical insights. However, these HR-ODEs often appear disconnected, each targeting a specific algorithm and derived with different assumptions and techniques. We present a unifying framework by showing that these diverse HR-ODEs emerge as special cases of a general HR-ODE derived using the Forced Euler-Lagrange equation. Discretizing this model recovers a wide range of optimization algorithms through different parameter choices. Using integral quadratic constraints, we also introduce a general Lyapunov function to analyze the convergence of the proposed HR-ODE and its discretizations, achieving significant improvements across various cases, including new guarantees for the triple momentum method's HR-ODE and the quasi-hyperbolic momentum method, as well as faster gradient norm minimization rates for Nesterov's accelerated gradient algorithm, among other advances.
BibTeX:
@techreport{Maskan2025-UnifiedModel,
  author = {Maskan, H. and Zygalakis, K. and Eftekhari, A. and Yurtsever, A.},
  title = {A Unified Model for High-Resolution ODEs: New Insights on Accelerated Methods},
  institution = {arXiv preprint arXiv:2503.15136},
  year = {2025},
  url = {https://arxiv.org/pdf/2503.15136}
}
Palenzuela, K., Dadras, A., Yurtsever, A. and Löfstedt, T. (2025), "Provable Reduction in Communication Rounds for Non-Smooth Convex Federated Learning", In 35th IEEE International Workshop on Machine Learning for Signal Processing (MLSP).
Abstract: Multiple local steps are key to communication-efficient federated learning. However, theoretical guarantees for such algorithms, without data heterogeneity-bounding assumptions, have been lacking in general non-smooth convex problems. Leveraging projection-efficient optimization methods, we propose FedMLS, a federated learning algorithm with provable improvements from multiple local steps. FedMLS attains an ε-suboptimal solution in O(1/ε) communication rounds, requiring a total of O(1/ε^2) stochastic subgradient oracle calls.
BibTeX:
@conference{Palenzuela2025-ProvableReduction,
  author = {Palenzuela, K. and Dadras, A. and Yurtsever, A. and Löfstedt, T.},
  title = {Provable Reduction in Communication Rounds for Non-Smooth Convex Federated Learning},
  booktitle = {35th IEEE International Workshop on Machine Learning for Signal Processing (MLSP)},
  year = {2025},
  url = {https://arxiv.org/pdf/2503.21627}
}
Prakhya, K., Birdal, T. and Yurtsever, A. (2025), "Convex Formulations for Training Two-Layer ReLU Neural Networks", In 13th Int. Conf. Learning Representations (ICLR).
Abstract: Solving non-convex, NP-hard optimization problems is crucial for training machine learning models, including neural networks. However, non-convexity often leads to black-box machine learning models with unclear inner workings. While convex formulations have been used for verifying neural network robustness, their application to training neural networks remains less explored. In response to this challenge, we reformulate the problem of training infinite-width two-layer ReLU networks as a convex completely positive program in a finite-dimensional (lifted) space. Despite the convexity, solving this problem remains NP-hard due to the complete positivity constraint. To overcome this challenge, we introduce a semidefinite relaxation that can be solved in polynomial time. We then experimentally evaluate the tightness of this relaxation, demonstrating its competitive performance in test accuracy across a range of classification tasks.
BibTeX:
@conference{Prakhya2025-SDPNN,
  author = {Prakhya, K. and Birdal, T. and Yurtsever, A.},
  title = {Convex Formulations for Training Two-Layer ReLU Neural Networks},
  booktitle = {13th Int. Conf. Learning Representations (ICLR)},
  year = {2025},
  url = {https://openreview.net/pdf?id=e0X9l4kecx}
}
Banerjee, S., Dadras, A., Yurtsever, A. and Bhuyan, M. (2024), "Personalized Multi-tier Federated Learning", In 31st International Conference on Neural Information Processing (ICONIP).
Abstract: The key challenge of personalized federated learning (PerFL) is to capture the statistical heterogeneity properties of data with inexpensive communications and gain customized performance for participating devices. To address these, we introduce personalized federated learning in a multi-tier architecture (PerMFL) to obtain optimized and personalized local models when there are known team structures across devices. We provide theoretical guarantees of PerMFL, which offers linear convergence rates for smooth strongly convex problems and sub-linear convergence rates for smooth non-convex problems. We conduct numerical experiments demonstrating the robust empirical performance of PerMFL, outperforming the state-of-the-art in multiple personalized federated learning tasks.
BibTeX:
@conference{Banerjee2024-PersonalizedMultiTier,
  author = {Banerjee, S. and Dadras, A. and Yurtsever, A. and Bhuyan, M.},
  title = {Personalized Multi-tier Federated Learning},
  booktitle = {31st International Conference on Neural Information Processing (ICONIP)},
  year = {2024},
  url = {https://arxiv.org/pdf/2407.14251}
}
Dadras, A., Banerjee, S., Prakhya, K. and Yurtsever, A. (2024), "Federated Frank-Wolfe algorithm", In Joint European Conference on Machine Learning and Knowledge Discovery in Databases (ECML PKDD).
Abstract: Federated learning (FL) has gained a lot of attention in recent years for building privacy-preserving collaborative learning systems. However, FL algorithms for constrained machine learning problems are still limited, particularly when the projection step is costly. To this end, we propose a Federated Frank-Wolfe Algorithm (FedFW). FedFW features data privacy, low per-iteration cost, and communication of sparse signals. In the deterministic setting, FedFW achieves an ε-suboptimal solution within O(ε^-2) iterations for smooth and convex objectives, and O(ε^-3) iterations for smooth but non-convex objectives. Furthermore, we present a stochastic variant of FedFW and show that it finds a solution within O(ε^-3) iterations in the convex setting. We demonstrate the empirical performance of FedFW on several machine learning tasks.
BibTeX:
@conference{Dadras2024-FederatedFrankWolfe,
  author = {Dadras, A. and Banerjee, S. and Prakhya, K. and Yurtsever, A.},
  title = {Federated Frank-Wolfe algorithm},
  booktitle = {Joint European Conference on Machine Learning and Knowledge Discovery in Databases (ECML PKDD)},
  year = {2024},
  url = {https://arxiv.org/pdf/2408.10090}
}
Maskan, H., Zygalakis, K. and Yurtsever, A. (2023), "A variational perspective on high-resolution ODEs", In Advances in Neural Information Processing Systems 36 (NeurIPS).
Abstract: We consider unconstrained minimization of smooth convex functions. We propose a novel variational perspective using forced Euler-Lagrange equation that allows for studying high-resolution ODEs. Through this, we obtain a faster convergence rate for gradient norm minimization using Nesterov's accelerated gradient method. Additionally, we show that Nesterov's method can be interpreted as a rate-matching discretization of an appropriately chosen high-resolution ODE. Finally, using the results from the new variational perspective, we propose a stochastic method for noisy gradients. Several numerical experiments compare our stochastic algorithm with state-of-the-art methods and illustrate its performance.
BibTeX:
@conference{Maskan2023-VariationalPerspective,
  author = {Maskan, H. and Zygalakis, K. and Yurtsever, A.},
  title = {A variational perspective on high-resolution ODEs},
  booktitle = {Advances in Neural Information Processing Systems 36 (NeurIPS)},
  year = {2023},
  url = {https://proceedings.neurips.cc/paper_files/paper/2023/file/0569458210c88d8db2985799da830d27-Paper-Conference.pdf}
}
Dresdner, G., Vladarean, M.L., Rätsch, G., Locatello, F., Cevher, V. and Yurtsever, A. (2022), "Faster One-Sample Stochastic Conditional Gradient Method for Composite Convex Minimization", In Proc. 25th Int. Conf. Artificial Intelligence and Statistics (AISTATS).
Abstract: We propose a stochastic conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms. Existing CGM variants for this template either suffer from slow convergence rates, or require carefully increasing the batch size over the course of the algorithm's execution, which leads to computing full gradients. In contrast, the proposed method, equipped with a stochastic average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with more sophisticated variance reduction techniques. In applications we put special emphasis on problems with a large number of separable constraints. Such problems are prevalent among semidefinite programming (SDP) formulations arising in machine learning and theoretical computer science. We provide numerical experiments on matrix completion, unsupervised clustering, and sparsest-cut SDPs.
BibTeX:
@conference{Dresdner2022-FasterOneSample,
  author = {Dresdner, G. and Vladarean, M. L. and Rätsch, G. and Locatello, F. and Cevher, V. and Yurtsever, A.},
  title = {Faster One-Sample Stochastic Conditional Gradient Method for Composite Convex Minimization},
  booktitle = {Proc. 25th Int. Conf. Artificial Intelligence and Statistics (AISTATS)},
  year = {2022},
  url = {https://proceedings.mlr.press/v151/dresdner22a/dresdner22a.pdf}
}
Yurtsever, A., Birdal, T. and Golyanik, V. (2022), "Q-FW: A hybrid classical-quantum Frank-Wolfe for quadratic binary optimization", In European Conference on Computer Vision (ECCV).
Abstract: We present a hybrid classical-quantum framework based on the Frank-Wolfe algorithm, Q-FW, for solving quadratic, linearly-constrained, binary optimization problems on quantum annealers (QA). The computational premise of quantum computers has cultivated the re-design of various existing vision problems into quantum-friendly forms. Experimental QA realizations can solve a particular non-convex problem known as the quadratic unconstrained binary optimization (QUBO). Yet a naive-QUBO cannot take into account the restrictions on the parameters. To introduce additional structure in the parameter space, researchers have crafted ad-hoc solutions incorporating (linear) constraints in the form of regularizers. However, this comes at the expense of a hyper-parameter, balancing the impact of regularization. To date, a true constrained solver of quadratic binary optimization (QBO) problems has been lacking. Q-FW first reformulates constrained-QBO as a copositive program (CP), then employs Frank-Wolfe iterations to solve CP while satisfying linear (in)equality constraints. This procedure unrolls the original constrained-QBO into a set of unconstrained QUBOs, all of which are solved in sequence on a QA. We use D-Wave Advantage QA to conduct synthetic and real experiments on two important computer vision problems, graph matching and permutation synchronization, which demonstrate that our approach is effective in alleviating the need for an explicit regularization coefficient.
BibTeX:
@conference{Yurtsever2022-QuantumFrankWolfe,
  author = {Yurtsever, A. and Birdal, T. and Golyanik, V.},
  title = {Q-FW: A hybrid classical-quantum Frank-Wolfe for quadratic binary optimization},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year = {2022},
  url = {https://arxiv.org/pdf/2203.12633}
}
Yurtsever, A. and Sra, S. (2022), "CCCP is Frank-Wolfe in disguise", In Advances in Neural Information Processing Systems 35 (NeurIPS).
Abstract: This paper uncovers a simple but rather surprising connection: it shows that the well-known convex-concave procedure (CCCP) and its generalization to constrained problems are both special cases of the Frank-Wolfe (FW) method. This connection not only provides insight of deep (in our opinion) pedagogical value, but also transfers the recently discovered convergence theory of nonconvex Frank-Wolfe methods immediately to CCCP, closing a long-standing gap in its non-asymptotic convergence theory. We hope the viewpoint uncovered by this paper spurs the transfer of other advances made for FW to both CCCP and its generalizations.
BibTeX:
@conference{Yurtsever2022-CCCPisFW,
  author = {Yurtsever, A. and Sra, S.},
  title = {CCCP is Frank-Wolfe in disguise},
  booktitle = {Advances in Neural Information Processing Systems 35 (NeurIPS)},
  year = {2022},
  url = {https://proceedings.neurips.cc/paper_files/paper/2022/file/e589022774244df75b97eae18bb3628d-Paper-Conference.pdf}
}
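To make the connection above concrete, here is the CCCP/DCA iteration on a small constrained DC program (a hypothetical toy problem; the paper's contribution is the observation that such updates are Frank-Wolfe steps on a suitable reformulation, which transfers FW convergence theory to CCCP):

import numpy as np

rng = np.random.default_rng(1)
n, lo, hi = 8, -1.0, 1.0                 # box-constrained domain
M = rng.standard_normal((n, n))
B = M @ M.T / n                          # PSD matrix for the subtracted convex part
c = rng.standard_normal(n)
a = 1.5 * np.linalg.norm(B, 2)

# DC objective on the box: f(x) = u(x) - v(x),
#   u(x) = (a/2)||x||^2 (convex),  v(x) = 0.5 x^T B x + c^T x (convex).
f = lambda x: 0.5 * a * x @ x - (0.5 * x @ B @ x + c @ x)

x = np.zeros(n)
for k in range(30):
    grad_v = B @ x + c
    # CCCP step: linearize the concave part -v at x and minimize the convex
    # surrogate u(.) - <grad_v, .> over the box (closed form for this choice of u).
    x = np.clip(grad_v / a, lo, hi)
    if k % 10 == 0:
        print(f"iter {k:2d}  f(x) = {f(x): .6f}")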
Ding, L., Yurtsever, A., Cevher, V., Tropp, J. and Udell, M. (2021), "An Optimal-Storage Approach to Semidefinite Programming using Approximate Complementarity", SIAM Journal on Optimization, Vol. 31 (4), pp. 2695-2725. arXiv:1902.03373.
Abstract: This paper develops a new storage-optimal algorithm that provably solves generic semidefinite programs (SDPs) in standard form. This method is particularly effective for weakly constrained SDPs. The key idea is to formulate an approximate complementarity principle: Given an approximate solution to the dual SDP, the primal SDP has an approximate solution whose range is contained in the eigenspace with small eigenvalues of the dual slack matrix. For weakly constrained SDPs, this eigenspace has very low dimension, so this observation significantly reduces the search space for the primal solution.
This result suggests an algorithmic strategy that can be implemented with minimal storage:
(1) Solve the dual SDP approximately;
(2) compress the primal SDP to the eigenspace with small eigenvalues of the dual slack matrix;
(3) solve the compressed primal SDP.
The paper also provides numerical experiments showing that this approach is successful for a range of interesting large-scale SDPs.
BibTeX:
@article{Ding2021-ComplementarySlacknessSDP,
  author = {Ding, L. and Yurtsever, A. and Cevher, V. and Tropp, J. and Udell, M.},
  title = {An Optimal-Storage Approach to Semidefinite Programming using Approximate Complementarity},
  journal = {SIAM Journal on Optimization},
  note = {arXiv:1902.03373},
  year = {2021},
  volume = {31},
  number = {4},
  pages = {2695-2725},
  url = {https://arxiv.org/pdf/1902.03373.pdf}
}
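The three-step strategy listed above can be sketched as follows; the snippet assumes an approximate dual slack matrix S is already available from step (1) and shows only the compression of step (2), leaving the small SDP of step (3) to any off-the-shelf solver (illustrative, not the paper's implementation):

import numpy as np

def compress_primal(S, C, A_list, k):
    # Restrict the primal SDP  min <C,X>  s.t.  <A_i,X> = b_i,  X psd
    # to the span of the k eigenvectors of the dual slack S with the smallest
    # eigenvalues, i.e. substitute X = V Y V^T with a small psd variable Y.
    eigvals, eigvecs = np.linalg.eigh(S)    # eigenvalues in ascending order
    V = eigvecs[:, :k]                      # eigenspace with small eigenvalues
    C_small = V.T @ C @ V
    A_small = [V.T @ A @ V for A in A_list]
    return V, C_small, A_small              # solve the k x k SDP in Y, then X ~ V Y V^T

# Toy usage with random symmetric data: a 200 x 200 problem compressed to 5 x 5.
rng = np.random.default_rng(0)
sym = lambda M: (M + M.T) / 2
n, k = 200, 5
S = sym(rng.standard_normal((n, n))) + n * np.eye(n)   # stand-in for a dual slack matrix
C = sym(rng.standard_normal((n, n)))
A_list = [sym(rng.standard_normal((n, n))) for _ in range(3)]
V, C_small, A_small = compress_primal(S, C, A_list, k)
print(C_small.shape, len(A_small), A_small[0].shape)    # (5, 5) 3 (5, 5)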
Yurtsever, A., Gu, A. and Sra, S. (2021), "Three Operator Splitting with Subgradients, Stochastic Gradients, and Adaptive Learning Rates", In Advances in Neural Information Processing Systems 34 (NeurIPS).
Abstract: Three Operator Splitting (TOS) (Davis & Yin, 2017) can minimize the sum of multiple convex functions effectively when an efficient gradient oracle or proximal operator is available for each term. This requirement often fails in machine learning applications: (i) instead of full gradients only stochastic gradients may be available; and (ii) instead of proximal operators, using subgradients to handle complex penalty functions may be more efficient and realistic. Motivated by these concerns, we analyze three potentially valuable extensions of TOS. The first two permit using subgradients and stochastic gradients, and are shown to ensure an O(1/t) convergence rate. The third extension, AdapTOS, endows TOS with adaptive step-sizes. For the important setting of optimizing a convex loss over the intersection of convex sets, AdapTOS attains universal convergence rates, i.e., the rate adapts to the unknown smoothness degree of the objective. We compare our proposed methods with competing methods on various applications.
BibTeX:
@conference{Yurtsever2021-AdaptiveTOS,
  author = {Yurtsever, A. and Gu, A. and Sra, S.},
  title = {Three Operator Splitting with Subgradients, Stochastic Gradients, and Adaptive Learning Rates},
  booktitle = {Advances in Neural Information Processing Systems 34 (NeurIPS)},
  year = {2021},
  url = {https://proceedings.neurips.cc/paper_files/paper/2021/file/a4267159aa970aa5a6542bcbb7ef575e-Paper.pdf}
}
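For reference, the basic three-operator splitting iteration of Davis and Yin that the entry above extends can be sketched on a toy problem (smooth least-squares term, an l1 term handled by its proximal operator, and a box constraint handled by projection; the step size is chosen by hand, not by the adaptive rules studied in the paper):

import numpy as np

rng = np.random.default_rng(0)
m, n = 40, 20
A, b = rng.standard_normal((m, n)), rng.standard_normal(m)

grad_f = lambda x: A.T @ (A @ x - b)                  # smooth term f(x) = 0.5||Ax - b||^2
prox_g = lambda x, t: np.sign(x) * np.maximum(np.abs(x) - t, 0.0)   # g(x) = ||x||_1
prox_h = lambda x, t: np.clip(x, -0.5, 0.5)           # h = indicator of the box [-0.5, 0.5]^n

gamma = 1.0 / np.linalg.norm(A, 2) ** 2               # step size below 2/L, with L = ||A||_2^2
z = np.zeros(n)
for _ in range(500):
    x_h = prox_h(z, gamma)                            # Davis-Yin three-operator splitting
    x_g = prox_g(2 * x_h - z - gamma * grad_f(x_h), gamma)
    z = z + (x_g - x_h)                               # relaxation parameter set to 1

print("objective:", 0.5 * np.linalg.norm(A @ x_h - b) ** 2 + np.abs(x_h).sum())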
Yurtsever, A., Mangalick, V. and Sra, S. (2021), "Three Operator Splitting with a Nonconvex Loss Function", In Proc. 38th Int. Conf. Machine Learning (ICML).
Abstract: We consider the problem of minimizing the sum of three functions, one of which is nonconvex but differentiable, and the other two are convex but possibly nondifferentiable. We investigate the Three Operator Splitting method (TOS) of Davis & Yin (2017) with an aim to extend its theoretical guarantees for this nonconvex problem template. In particular, we prove convergence of TOS with nonasymptotic bounds on its nonstationarity and infeasibility errors. In contrast with the existing work on nonconvex TOS, our guarantees do not require additional smoothness assumptions on the terms comprising the objective; hence they cover instances of particular interest where the nondifferentiable terms are indicator functions. We also extend our results to a stochastic setting where we have access only to an unbiased estimator of the gradient. Finally, we illustrate the effectiveness of the proposed method through numerical experiments on quadratic assignment problems.
BibTeX:
@conference{Yurtsever2021-NonconvexTOS,
  author = {Yurtsever, A. and Mangalick, V. and Sra, S.},
  title = {Three Operator Splitting with a Nonconvex Loss Function},
  booktitle = {Proc. 38th Int. Conf. Machine Learning (ICML)},
  year = {2021},
  url = {http://proceedings.mlr.press/v139/yurtsever21a/yurtsever21a.pdf}
}
Yurtsever, A., Tropp, J., Fercoq, O., Udell, M. and Cevher, V. (2021), "Scalable Semidefinite Programming", SIAM Journal on Mathematics of Data Science, Vol. 3 (1), pp. 171-200. arXiv:1912.02949.
Abstract: Semidefinite programming (SDP) is a powerful framework from convex optimization that has striking potential for data science applications. This paper develops a provably correct algorithm for solving large SDP problems by economizing on both the storage and the arithmetic costs. Numerical evidence shows that the method is effective for a range of applications, including relaxations of MaxCut, abstract phase retrieval, and quadratic assignment. Running on a laptop, the algorithm can handle SDP instances where the matrix variable has over 10^13 entries.
BibTeX:
@article{Yurtsever2021-ScalableSDP,
  author = {Yurtsever, A. and Tropp, J. and Fercoq, O. and Udell, M. and Cevher, V.},
  title = {Scalable Semidefinite Programming},
  journal = {SIAM Journal on Mathematics of Data Science},
  note = {arXiv:1912.02949},
  year = {2021},
  volume = {3},
  number = {1},
  pages = {171-200},
  url = {https://arxiv.org/pdf/1912.02949.pdf}
}
Cevher, V., Vu, B. and Yurtsever, A. (2019), "Inertial Three-Operator Splitting Method and Applications". arXiv:1904.12980.
Abstract: We introduce an inertial variant of the forward-Douglas-Rachford splitting and analyze its convergence. We specify an instance of the proposed method to the three-composite convex minimization template. We provide practical guidance on the selection of the inertial parameter based on the adaptive starting idea. Finally, we illustrate the practical performance of our method in various machine learning applications.
BibTeX:
@techreport{Cevher2019-InertialTOS,
  author = {Cevher, V. and Vu, B.C. and Yurtsever, A.},
  title = {Inertial Three-Operator Splitting Method and Applications},
  institution = {arXiv:1904.12980},
  year = {2019},
  url = {https://arxiv.org/pdf/1904.12980.pdf}
}
Locatello, F., Yurtsever, A., Fercoq, O. and Cevher, V. (2019), "Stochastic Conditional Gradient Method for Composite Convex Minimization", In Advances in Neural Information Processing Systems 32 (NeurIPS).
Abstract: A broad class of convex optimization problems can be formulated as a semidefinite program (SDP): minimization of a convex function over the positive-semidefinite cone subject to some affine constraints. The majority of classical SDP solvers are designed for the deterministic setting where problem data is readily available. In this setting, generalized conditional gradient methods (aka Frank-Wolfe-type methods) provide scalable solutions by leveraging the so-called linear minimization oracle instead of the projection onto the semidefinite cone. Most problems in machine learning and modern engineering applications, however, contain some degree of stochasticity. In this work, we propose the first conditional-gradient-type method for solving stochastic optimization problems under affine constraints. Our method guarantees an O(k^−1/3) convergence rate in expectation on the objective residual and O(k^−5/12) on the feasibility gap.
BibTeX:
@conference{Locatello2019-StochasticHomotopyCGM,
  author = {Locatello, F. and Yurtsever, A. and Fercoq, O. and Cevher, V.},
  title = {Stochastic Conditional Gradient Method for Composite Convex Minimization},
  booktitle = {Advances in Neural Information Processing Systems 32 (NeurIPS)},
  year = {2019},
  url = {https://papers.nips.cc/paper/9572-stochastic-frank-wolfe-for-composite-convex-minimization.pdf}
}
Tropp, J., Yurtsever, A., Udell, M. and Cevher, V. (2019), "Streaming Low-Rank Matrix Approximation with an Application to Scientific Simulation", SIAM Journal on Scientific Computing, Vol. 41 (4), pp. A2430-A2463.
Abstract: This paper argues that randomized linear sketching is a natural tool for on-the-fly compression of data matrices that arise from large-scale scientific simulations and data collection. The technical contribution consists in a new algorithm for constructing an accurate low-rank approximation of a matrix from streaming data. This method is accompanied by an a priori analysis that allows the user to set algorithm parameters with confidence and an a posteriori error estimator that allows the user to validate the quality of the reconstructed matrix. In comparison to previous techniques, the new method achieves smaller relative approximation errors and is less sensitive to parameter choices. As concrete applications, the paper outlines how the algorithm can be used to compress a Navier–Stokes simulation and a sea surface temperature dataset.
BibTeX:
@article{Tropp2019-SketchingScientificSimulation,
  author = {Tropp, J.A. and Yurtsever, A. and Udell, M. and Cevher, V.},
  title = {Streaming Low-Rank Matrix Approximation with an Application to Scientific Simulation},
  journal = {SIAM Journal on Scientific Computing},
  year = {2019},
  volume = {41},
  number = {4},
  pages = {A2430--A2463},
  url = {https://arxiv.org/pdf/1902.08651.pdf}
}
Yurtsever, A., Fercoq, O. and Cevher, V. (2019), "A Conditional Gradient-Based Augmented Lagrangian Framework", In Proc. 36th Int. Conf. Machine Learning (ICML).
Abstract: This paper considers a generic convex minimization template with affine constraints over a compact domain, which covers key semidefinite programming applications. The existing conditional gradient methods either do not apply to our template or are too slow in practice. To this end, we propose a new conditional gradient method, based on a unified treatment of smoothing and augmented Lagrangian frameworks. The proposed method maintains favorable properties of the classical conditional gradient method, such as cheap linear minimization oracle calls and sparse representation of the decision variable. We prove an O(k^-1/2) convergence rate for our method in the objective residual and the feasibility gap. This rate is essentially the same as the state-of-the-art CG-type methods for our problem template, but the proposed method is arguably superior in practice compared to existing methods in various applications.
BibTeX:
@conference{Yurtsever2019-Cgal,
  author = {Yurtsever, A. and Fercoq, O. and Cevher, V.},
  title = {A Conditional Gradient-Based Augmented Lagrangian Framework},
  booktitle = {Proc. 36th Int. Conf. Machine Learning (ICML)},
  year = {2019},
  url = {http://proceedings.mlr.press/v97/yurtsever19a/yurtsever19a.pdf}
}
Yurtsever, A., Sra, S. and Cevher, V. (2019), "Conditional Gradient Method via Stochastic Path-Integrated Differential Estimators", In Proc. 36th Int. Conf. Machine Learning (ICML).
Abstract: We propose a class of variance-reduced stochastic conditional gradient methods. By adopting the recent stochastic path-integrated differential estimator technique (SPIDER) of Fang et al. (2018) for the classical Frank-Wolfe (FW) method, we introduce SPIDER-FW for finite-sum minimization as well as the more general expectation minimization problems. SPIDER-FW enjoys superior complexity guarantees in the non-convex setting, while matching the best known FW variants in the convex case. We also extend our framework à la conditional gradient sliding (CGS) of Lan & Zhou (2016), and propose SPIDER-CGS.
BibTeX:
@conference{Yurtsever2019-SpiderFW,
  author = {Yurtsever, A. and Sra, S. and Cevher, V.},
  title = {Conditional Gradient Method via Stochastic Path-Integrated Differential Estimators},
  booktitle = {Proc. 36th Int. Conf. Machine Learning (ICML)},
  year = {2019},
  url = {http://proceedings.mlr.press/v97/yurtsever19b/yurtsever19b.pdf}
}
Cevher, V., Vu, B. and Yurtsever, A. (2018), "Stochastic Forward Douglas-Rachford Splitting Method for Monotone Inclusions", In Large-Scale and Distributed Optimization. Springer International Publishing.
Abstract: We propose a stochastic Forward-Douglas-Rachford Splitting framework for finding a zero point of the sum of three maximally monotone operators in a real separable Hilbert space, where one of the operators is cocoercive. We characterize the rate of convergence in expectation in the case of strongly monotone operators. We provide guidance on step-size sequences that achieve this rate, even if the strong convexity parameter is unknown.
BibTeX:
@incollection{Cevher2018-StochasticFDR,
  author = {Cevher, V. and Vu, B.C. and Yurtsever, A.},
  editor = {Giselsson, P. and Rantzer, A.},
  title = {Stochastic Forward Douglas-Rachford Splitting Method for Monotone Inclusions},
  booktitle = {Large-Scale and Distributed Optimization},
  publisher = {Springer International Publishing},
  year = {2018}
}
Hsieh, Y.-P., Kao, Y.-C., Mahabadi, R.K., Yurtsever, A., Kyrillidis, A. and Cevher, V. (2018), "A Non-Euclidean Gradient Descent Framework for Non-Convex Matrix Factorization", IEEE Transactions on Signal Processing, Vol. 66 (22), pp. 5917-5926.
Abstract: We study convex optimization problems that feature low-rank matrix solutions. In such scenarios, non-convex methods offer significant advantages over convex methods due to their lower space complexity, as well as practical faster convergence. Under mild assumptions, these methods feature global convergence guarantees.

In this paper, we extend the results on this matter by following a different path. We derive a non-Euclidean optimization framework in the non-convex setting that takes nonlinear gradient steps on the factors. Our framework makes it possible to further exploit the underlying problem structures, such as sparsity or low-rankness on the factorized domain, or better dimensional dependence of the smoothness parameters of the objectives. We prove that the non-Euclidean methods enjoy the same rigorous guarantees as their Euclidean counterparts, under appropriate assumptions. Numerical evidence with Fourier Ptychography and FastText applications, using real data, shows that our approach can enhance solution quality, as well as convergence speed over the standard non-convex approaches.
BibTeX:
@article{Hsieh2018-NonEuclideanFactorization,
  author = {Y.-P. Hsieh and Y.-C. Kao and R. K. Mahabadi and Yurtsever, A. and Kyrillidis, A. and Cevher, V.},
  title = {A Non-Euclidean Gradient Descent Framework for Non-Convex Matrix Factorization},
  journal = {IEEE Transactions on Signal Processing},
  year = {2018},
  volume = {66},
  number = {22},
  pages = {5917--5926},
  url = {https://infoscience.epfl.ch/record/231191/files/matrix_fact.pdf}
}
Levy, K., Yurtsever, A. and Cevher, V. (2018), "Online Adaptive Methods, Universality and Acceleration", In Advances in Neural Information Processing Systems 31 (NeurIPS).
Abstract: We present a novel method for convex unconstrained optimization that, without any modifications, ensures: (i) accelerated convergence rate for smooth objectives, (ii) standard convergence rate in the general (non-smooth) setting, and (iii) standard convergence rate in the stochastic optimization setting. To the best of our knowledge, this is the first method that simultaneously applies to all of the above settings.

At the heart of our method is an adaptive learning rate rule that employs importance weights, in the spirit of adaptive online learning algorithms, combined with an update that linearly couples two sequences. An empirical examination of our method demonstrates its applicability to the above-mentioned scenarios and corroborates our theoretical findings.
BibTeX:
@conference{Levy2018-Accelegrad,
  author = {Levy, K.Y. and Yurtsever, A. and Cevher, V.},
  title = {Online Adaptive Methods, Universality and Acceleration},
  booktitle = {Advances in Neural Information Processing Systems 31 (NeurIPS)},
  year = {2018},
  url = {https://papers.nips.cc/paper/7885-online-adaptive-methods-universality-and-acceleration.pdf}
}
Vu, B., Alacaoglu, A., Sahin, M., Yurtsever, A. and Cevher, V. (2018), "A First-Order Augmented Lagrangian Framework for Nonconvex Optimization with Nonlinear Constraints", In Modern Trends in Nonconvex Optimization for Machine Learning (ICML Workshop).
Abstract: We consider a canonical nonlinear-constrained nonconvex problem with broad applications in machine learning, theoretical computer science, and signal processing. We propose a simple primal-dual splitting scheme that provably converges to a stationary point of the non-convex problem. We achieve this desideratum via an adaptive and inexact augmented Lagrangian method. The new algorithm features a slow O(1/ε^3) convergence rate, which is counteracted by its cheap per-iteration complexity. We provide numerical evidence on large-scale machine learning problems, typically modeled via semidefinite relaxations.
BibTeX:
@conference{Bang2018-NonconvexAugmentedLagrangian,
  author = {Vu, B.C. and Alacaoglu, A. and Sahin, M.F. and Yurtsever, A. and Cevher, V.},
  title = {A First-Order Augmented Lagrangian Framework for Nonconvex Optimization with Nonlinear Constraints},
  booktitle = {Modern Trends in Nonconvex Optimization for Machine Learning (ICML Workshop)},
  year = {2018},
  url = {https://drive.google.com/file/d/1r0pM67WKxqeh0ojJNaAJeq_9VKOEv4C7/preview}
}
Yurtsever, A., Fercoq, O., Locatello, F. and Cevher, V. (2018), "A Conditional Gradient Framework for Composite Convex Minimization with Applications to Semidefinite Programming", In Proc. 35th Int. Conf. Machine Learning (ICML).
Abstract: We propose a conditional gradient framework for a composite convex minimization template with broad applications. Our approach combines smoothing and homotopy techniques under the CGM framework, and provably achieves the optimal O(1/k) convergence rate. We demonstrate that the same rate holds if the linear subproblems are solved approximately with additive or multiplicative error. In contrast with the relevant work, we are able to characterize the convergence when the non-smooth term is an indicator function. Specific applications of our framework include non-smooth minimization, semidefinite programming, and minimization with linear inclusion constraints over a compact domain. Numerical evidence demonstrates the benefits of our framework.
BibTeX:
@conference{Yurtsever2018-HomotopyCGM,
  author = {Yurtsever, A. and Fercoq, O. and Locatello, F. and Cevher, V.},
  title = {A Conditional Gradient Framework for Composite Convex Minimization with Applications to Semidefinite Programming},
  booktitle = {Proc. 35th Int. Conf. Machine Learning (ICML)},
  year = {2018},
  url = {http://proceedings.mlr.press/v80/yurtsever18a/yurtsever18a.pdf}
}
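A rough sketch of the smoothing-plus-homotopy idea described above, specialized to a quadratic objective over the unit simplex with an affine constraint Ax = b handled by a growing quadratic penalty; the step-size and smoothing schedules below (gamma_k = 2/(k+2), beta_k decaying like 1/sqrt(k)) are assumptions for illustration and may not match the paper exactly:

import numpy as np

rng = np.random.default_rng(0)
n, d = 30, 5
A = rng.standard_normal((d, n))
b = A @ rng.dirichlet(np.ones(n))           # makes the constraint feasible on the simplex
c = rng.standard_normal(n)
grad_f = lambda x: x - c                    # smooth objective f(x) = 0.5||x - c||^2

def lmo_simplex(g):
    # Linear minimization oracle over the unit simplex: returns a vertex e_i.
    v = np.zeros_like(g)
    v[np.argmin(g)] = 1.0
    return v

x = np.full(n, 1.0 / n)
beta0 = 1.0
for k in range(20000):
    beta = beta0 / np.sqrt(k + 1)           # homotopy: smoothing parameter shrinks,
    g = grad_f(x) + (1.0 / beta) * (A.T @ (A @ x - b))   # so the penalty weight grows
    gamma = 2.0 / (k + 2)
    x = x + gamma * (lmo_simplex(g) - x)    # conditional gradient (Frank-Wolfe) step

print("objective:", 0.5 * np.linalg.norm(x - c) ** 2)
print("feasibility ||Ax - b||:", np.linalg.norm(A @ x - b))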
Tropp, J., Yurtsever, A., Udell, M. and Cevher, V. (2017), "Fixed-rank approximation of a positive-semidefinite matrix from streaming data" , In Advances in Neural Information Processing Systems 31 (NeurIPS).
Abstract: Several important applications, such as streaming PCA and semidefinite programming, involve a large-scale positive-semidefinite (psd) matrix that is presented as a sequence of linear updates. Because of storage limitations, it may only be possible to retain a sketch of the psd matrix. This paper develops a new algorithm for fixed-rank psd approximation from a sketch. The approach combines the Nyström approximation with a novel mechanism for rank truncation. Theoretical analysis establishes that the proposed method can achieve any prescribed relative error in the Schatten 1-norm and that it exploits the spectral decay of the input matrix. Computer experiments show that the proposed method dominates alternative techniques for fixed-rank psd matrix approximation across a wide range of examples.
BibTeX:
@conference{Tropp2017-NystromSketch,
  author = {Tropp, J.A. and Yurtsever, A. and Udell, M. and Cevher, V.},
  title = {Fixed-rank approximation of a positive-semidefinite matrix from streaming data},
  booktitle = {Advances in Neural Information Processing Systems 30 (NeurIPS)},
  year = {2017},
  url = {http://papers.nips.cc/paper/6722-fixed-rank-approximation-of-a-positive-semidefinite-matrix-from-streaming-data.pdf}
}
Tropp, J., Yurtsever, A., Udell, M. and Cevher, V. (2017), "Practical sketching algorithms for low-rank matrix approximation" , SIAM Journal on Matrix Analysis and Applications. Vol. 38 (4) , pp. 1454-1485.
Abstract: This paper describes a suite of algorithms for constructing low-rank approximations of an input matrix from a random linear image of the matrix, called a sketch. These methods can preserve structural properties of the input matrix, such as positive-semidefiniteness, and they can produce approximations with a user-specified rank. The algorithms are simple, accurate, numerically stable, and provably correct. Moreover, each method is accompanied by an informative error bound that allows users to select parameters a priori to achieve a given approximation quality. These claims are supported by numerical experiments with real and synthetic data.
BibTeX:
@article{Tropp2017-PracticalSketch,
  author = {Tropp, J.A. and Yurtsever, A. and Udell, M. and Cevher, V.},
  title = {Practical sketching algorithms for low-rank matrix approximation},
  journal = {SIAM Journal on Matrix Analysis and Applications},
  year = {2017},
  volume = {38},
  number = {4},
  pages = {1454--1485},
  url = {https://arxiv.org/pdf/1609.00048.pdf}
}
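A compact illustration of the sketch-and-reconstruct idea analyzed in the entry above: record two random sketches Y = A*Omega and W = Psi*A (both can be accumulated from streaming updates to A) and recover a low-rank approximation from the sketches alone; the parameter choices, fixed-rank truncation, and psd-preserving variants follow the papers (simplified Python, not the reference implementation):

import numpy as np

def sketch_and_reconstruct(A, k=20, l=41, seed=0):
    # Low-rank approximation of A built only from two random sketches of A.
    rng = np.random.default_rng(seed)
    m, n = A.shape
    Omega = rng.standard_normal((n, k))       # range-sketch test matrix
    Psi = rng.standard_normal((l, m))         # co-range-sketch test matrix
    Y = A @ Omega                             # sketch of the range of A
    W = Psi @ A                               # sketch of the co-range of A
    # Reconstruction uses only Y, W and the test matrix Psi (A itself is not needed).
    Q, _ = np.linalg.qr(Y)                    # orthonormal basis for the captured range
    X, *_ = np.linalg.lstsq(Psi @ Q, W, rcond=None)
    return Q, X                               # A is approximated by Q @ X

# Toy usage: a matrix of exact rank 10 is recovered almost perfectly.
rng = np.random.default_rng(1)
A = rng.standard_normal((500, 10)) @ rng.standard_normal((10, 300))
Q, X = sketch_and_reconstruct(A, k=20, l=41)
print("relative error:", np.linalg.norm(A - Q @ X) / np.linalg.norm(A))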
Tropp, J., Yurtsever, A., Udell, M. and Cevher, V. (2017), "Randomized single-view algorithms for low-rank matrix approximation" . ACM Report 2017-01, Caltech.
Abstract: This paper develops a suite of algorithms for constructing low-rank approximations of an input matrix from a random linear image of the matrix, called a sketch. These methods can preserve structural properties of the input matrix, such as positive-semidefiniteness, and they can produce approximations with a user-specified rank. The algorithms are simple, accurate, numerically stable, and provably correct. Moreover, each method is accompanied by an informative error bound that allows users to select parameters a priori to achieve a given approximation quality. These claims are supported by computer experiments.
BibTeX:
@techreport{Tropp2017-RandomizedSingleView,
  author = {Tropp, J.A. and Yurtsever, A. and Udell, M. and Cevher, V.},
  title = {Randomized single-view algorithms for low-rank matrix approximation},
  institution = {ACM Report 2017-01, Caltech},
  year = {2017},
  url = {https://authors.library.caltech.edu/74347/1/ACM_TR_2017_01.pdf}
}
Yurtsever, A., Tran-Dinh, Q. and Cevher, V. (2017), "Scalable Convex Methods for Low-Rank Matrix Problems", In Signal Processing with Adaptive Sparse Structured Representations (SPARS).
Abstract: We study a phase retrieval problem in the Poisson noise model. Motivated by the PhaseLift approach, we approximate the maximum-likelihood estimator by solving a convex program with a nuclear norm constraint. While the Frank-Wolfe algorithm, together with the Lanczos method, can efficiently deal with nuclear norm constraints, our objective function does not have a Lipschitz continuous gradient, and hence existing convergence guarantees for the Frank-Wolfe algorithm do not apply. In this paper, we show that the Frank-Wolfe algorithm works for the Poisson phase retrieval problem, and has a global convergence rate of O(1/t), where t is the iteration counter. We provide rigorous theoretical guarantees and illustrative numerical results.
BibTeX:
@conference{Yurtsever2017-ScalableLowRank,
  author = {Yurtsever, A. and Tran-Dinh, Q. and Cevher, V.},
  title = {Scalable Convex Methods for Low-Rank Matrix Problems},
  booktitle = {Signal Processing with Adaptive Sparse Structured Representations (SPARS)},
  year = {2017},
  url = {https://spars2017.lx.it.pt/index_files/papers/SPARS2017_Paper_11.pdf}
}
Yurtsever, A., Udell, M., Tropp, J. and Cevher, V. (2017), "Sketchy Decisions: Convex Low-Rank Matrix Optimization with Optimal Storage", In Proc. 20th Int. Conf. Artificial Intelligence and Statistics (AISTATS).
Abstract: This paper concerns a fundamental class of convex matrix optimization problems. It presents the first algorithm that uses optimal storage and provably computes a low-rank approximation of a solution. In particular, when all solutions have low rank, the algorithm converges to a solution. This algorithm, SketchyCGM, modifies a standard convex optimization scheme, the conditional gradient method, to store only a small randomized sketch of the matrix variable. After the optimization terminates, the algorithm extracts a low-rank approximation of the solution from the sketch. In contrast to nonconvex heuristics, the guarantees for SketchyCGM do not rely on statistical models for the problem data. Numerical work demonstrates the benefits of SketchyCGM over heuristics.
BibTeX:
@conference{Yurtsever2016-SketchyDecision,
  author = {Yurtsever, A. and Udell, M. and Tropp, J.A. and Cevher, V.},
  title = {Sketchy Decisions: Convex Low-Rank Matrix Optimization with Optimal Storage},
  booktitle = {Proc. 20th Int. Conf. Artificial Intelligence and Statistics (AISTATS)},
  year = {2017},
  url = {https://arxiv.org/pdf/1702.06838.pdf}
}
Odor, G., Li, Y.-H., Yurtsever, A., Hsieh, Y.-P., Tran-Dinh, Q., El Halabi, M. and Cevher, V. (2016), "Frank-Wolfe Works for Non-Lipschitz Continuous Gradient Objectives: Scalable Poisson Phase Retrieval", In 41st IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
Abstract: We study a phase retrieval problem in the Poisson noise model. Motivated by the PhaseLift approach, we approximate the maximum-likelihood estimator by solving a convex program with a nuclear norm constraint. While the Frank-Wolfe algorithm, together with the Lanczos method, can efficiently deal with nuclear norm constraints, our objective function does not have a Lipschitz continuous gradient, and hence existing convergence guarantees for the Frank-Wolfe algorithm do not apply. In this paper, we show that the Frank-Wolfe algorithm works for the Poisson phase retrieval problem, and has a global convergence rate of O(1/t), where t is the iteration counter. We provide rigorous theoretical guarantees and illustrative numerical results.
BibTeX:
@conference{Odor2016-PoissonPhaseRetrieval,
  author = {Odor, G. and Li, Y.-H. and Yurtsever, A. and Hsieh, Y.-P. and Tran-Dinh, Q. and El Halabi, M. and Cevher, V.},
  title = {Frank-Wolfe Works for Non-Lipschitz Continuous Gradient Objectives: Scalable Poisson Phase Retrieval},
  booktitle = {41st IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year = {2016},
  url = {https://arxiv.org/pdf/1602.00724.pdf}
}
Yurtsever, A., Vu, B. and Cevher, V. (2016), "Stochastic Three-Composite Convex Minimization", In Advances in Neural Information Processing Systems 29 (NeurIPS).
Abstract: We propose a stochastic optimization method for the minimization of the sum of three convex functions, one of which has Lipschitz continuous gradient as well as restricted strong convexity. Our approach is most suitable in the setting where it is computationally advantageous to process the smooth term in the decomposition with its stochastic gradient estimate and the other two functions separately with their proximal operators, such as doubly regularized empirical risk minimization problems. We prove the convergence characterization of the proposed algorithm in expectation under the standard assumptions for the stochastic gradient estimate of the smooth term. Our method operates in the primal space and can be considered as a stochastic extension of the three-operator splitting method. Numerical evidence supports the effectiveness of our method in real-world problems.
BibTeX:
@conference{Yurtsever2016-StochasticThreeOperator,
  author = {Yurtsever, A. and Vu, B.C. and Cevher, V.},
  title = {Stochastic Three-Composite Convex Minimization},
  booktitle = {Advances in Neural Information Processing Systems 29 (NeurIPS)},
  year = {2016},
  url = {http://papers.nips.cc/paper/6127-stochastic-three-composite-convex-minimization.pdf}
}
Yurtsever, A., Hsieh, Y.-P. and Cevher, V. (2015), "Scalable Convex Methods for Phase Retrieval", In 6th IEEE Int. Workshop Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP).
Abstract: This paper describes scalable convex optimization methods for phase retrieval. The main characteristics of these methods are the cheap per-iteration complexity and the low-memory footprint. With a variant of the original PhaseLift formulation, we first illustrate how to leverage the scalable Frank-Wolfe (FW) method (also known as the conditional gradient algorithm), which requires a tuning parameter. We demonstrate that we can estimate the tuning parameter of the FW algorithm directly from the measurements, with rigorous theoretical guarantees. We then illustrate numerically that recent advances in universal primal-dual convex optimization methods offer significant scalability improvements over the FW method, by recovering full HD resolution color images from their quadratic measurements.
BibTeX:
@conference{Yurtsever2015-ScalablePhaseRetrieval,
  author = {Yurtsever, A. and Hsieh, Y.-P. and Cevher, V.},
  title = {Scalable Convex Methods for Phase Retrieval},
  booktitle = {6th IEEE Int. Workshop Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP)},
  year = {2015},
  url = {https://infoscience.epfl.ch/record/212914/files/ScalableConvexMethodsForPhaseRetrieval.pdf}
}
Yurtsever, A., Tran-Dinh, Q. and Cevher, V. (2015), "A Universal Primal-Dual Convex Optimization Framework", In Advances in Neural Information Processing Systems 28 (NeurIPS).
Abstract: We propose a new primal-dual algorithmic framework for a prototypical constrained convex optimization template. The algorithmic instances of our framework are universal since they can automatically adapt to the unknown Hölder continuity degree and constant within the dual formulation. They are also guaranteed to have optimal convergence rates in the objective residual and the feasibility gap for each Hölder smoothness degree. In contrast to existing primal-dual algorithms, our framework avoids the proximity operator of the objective function. We instead leverage computationally cheaper, Fenchel-type operators, which are the main workhorses of the generalized conditional gradient (GCG)-type methods. In contrast to the GCG-type methods, our framework does not require the objective function to be differentiable, and can also process additional general linear inclusion constraints, while guaranteeing the convergence rate on the primal problem.
BibTeX:
@conference{Yurtsever2015-UniversalPrimalDual,
  author = {Yurtsever, A. and Tran-Dinh, Q. and Cevher, V.},
  title = {A Universal Primal-Dual Convex Optimization Framework},
  booktitle = {Advances in Neural Information Processing Systems 28 (NeurIPS)},
  year = {2015},
  url = {http://papers.nips.cc/paper/5826-a-universal-primal-dual-convex-optimization-framework.pdf}
}
Please find the list ordered by citation count on my Google Scholar page.