Title: Convex Relaxations for Subset Selection
Abstract: We use convex relaxation techniques to produce lower bounds on the optimal value of subset selection problems and generate good approximate solutions. We then explicitly bound the quality of these relaxations by studying the approximation ratio of sparse eigenvalue relaxations. Our results are used to improve the performance of branch-and-bound algorithms to produce exact solutions to subset selection problems.
Title: A randomized Mirror-Prox method for solving structured matrix saddle-point problems
Abstract
In this paper, we derive a randomized version of the Mirror-Prox method for
solving some
structured matrix saddle-point problems, such as the maximal eigenvalue
minimization problem. Deterministic first-order schemes, such as
Nesterov's Smoothing
Techniques or standard
Mirror-Prox methods, require the exact computation of a matrix exponential
at every iteration,
limiting the size of the problems they can solve. Our method allows us to
use stochastic approximations of matrix exponentials. We prove that our randomized scheme
significantly reduces the complexity of its deterministic counterpart for large-scale matrix
saddle-point problems.
Numerical experiments illustrate and confirm our theoretical results.
Constantine Caramanis,
University of Texas at Austin, USA
Title: Robust Matrix Completion and Collaborative Filtering
Abstract: In this talk, we consider the problem of matrix completion when some number of the columns are arbitrarily corrupted. It is well-known that standard algorithms for matrix completion can return arbitrarily poor results if even a single column is corrupted. What can be done if a large number, or even a constant fraction, of columns are corrupted?
In this paper, we discuss this very problem, and present a robust and efficient algorithm for its solution. One direct application comes from robust collaborative filtering. Here, some number of users are so-called manipulators, and try to skew the predictions of the algorithm. Significantly, our results hold without any assumptions on the observed entries of the manipulated columns.
Joint work with: Yudong Chen, Huan Xu, Sujay Sanghavi
Title: Towards Optimal Newton-Type Methods for Nonconvex Smooth Optimization
Abstract:
We show that the steepest-descent and Newton's methods for
unconstrained nonconvex optimization under standard assumptions may
both require a number of iterations and function evaluations
arbitrarily close to the steepest-descent's global worst-case
complexity bound. This shows that the latter upper bound is
essentially tight for steepest descent and that Newton's method may
be as slow as the steepest-descent method in the worst case. More
generally, we consider a class of second-order methods for
unconstrained minimization that includes Newton's, inexact
linesearch, cubic regularization, some Goldfeld-Quandt-Trotter and
trust-region variants. For each algorithm in this class, we exhibit
an example of a sufficiently smooth and bounded-below objective such
that the method takes at least on the order of \epsilon^{-3/2} function evaluations to drive the gradient below \epsilon. Thus, as this evaluation count coincides in order with the upper complexity bound for cubic regularization, the latter has (order) optimal worst-case complexity amongst the methods in this class.
Joint work with Nick Gould and Philippe Toint.
Title: Nonsmooth Optimization via Gradient Sampling
Abstract:
We discuss an algorithm for nonsmooth, nonconvex optimization. We focus on the idea of gradient sampling to practically approximate epsilon-subdifferentials of the problem functions, which allows for problems to be solved via a sequence of quadratic subproblems. The ideas apply in both unconstrained and constrained contexts. We emphasize that adaptive sampling, warm-starts for the quadratic subproblem solver, and quasi-Newton Hessian approximations can be used to develop efficient methods for unconstrained problems, and illustrate how these techniques can be applied when constraints are present.
Joint work with: Michael Overton and Xiaocun Que
Title: Convex optimization with a first-order inexact oracle
Abstract: We introduce a new definition of inexact oracle for convex
optimization, which can be viewed as a common extension of the
classical notions of (1) epsilon-subgradient and (2)
Lipschitz-continuous gradient. We study the behaviour of classical
first-order methods for smooth convex optimization when such an
inexact oracle is used instead of the exact gradient. In particular,
we show that the convergence of the classical gradient method is
mostly unchanged (only the reachable accuracy is reduced), while the
behaviour of the fast gradient method seriously deteriorates (it no
longer converges, due to error accumulation). We provide several
examples of situations where such an inexact oracle is naturally
available, including cases involving non-smooth functions or functions
whose derivatives are not directly accessible.
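For concreteness, one standard way to formalize such an oracle (a sketch consistent with the abstract; the talk's exact definition may differ) is: at any query point x, a first-order (\delta,L)-oracle returns a pair (f_\delta(x), g_\delta(x)) such that, for all y,

```latex
0 \;\le\; f(y) - \big( f_{\delta}(x) + \langle g_{\delta}(x),\, y - x \rangle \big)
\;\le\; \frac{L}{2}\,\|y - x\|^2 + \delta .
```

Setting \delta = 0 recovers the usual Lipschitz-continuous gradient condition, while the left-hand inequality alone says that g_\delta(x) is a \delta-subgradient of f at x.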
Joint work with O. Devolder and Y. Nesterov
Title:
Fast First-Order and Alternating Direction Methods for Stable Principal Component Pursuit
Abstract:
Given a matrix M that is the sum of a low rank matrix X and a sparse matrix Y, it is known that under certain conditions, X and Y can be recovered with very high probability by solving a so-called robust principal component
pursuit problem. When M has noise added to it, the recovery problem is
called stable principal component pursuit. In this problem one minimizes the sum of the nuclear norm of X plus a scalar times the sum of the absolute
values of the elements of Y, subject to the Frobenius norm of X+Y-M being less than some known noise level. We show how to apply Nesterov's optimal
gradient and composite methods and a fast alternating linearization method
to obtain epsilon-optimal solutions in O(1/epsilon) iterations, while keeping the cost of each iteration low. We also propose a very efficient alternating direction augmented Lagrangian method, and prove its convergence.
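In symbols, the stable principal component pursuit problem described above can be sketched as the standard formulation

```latex
\min_{X,\,Y} \;\; \|X\|_* + \lambda \|Y\|_1
\quad \text{subject to} \quad \|X + Y - M\|_F \le \sigma ,
```

where \|X\|_* is the nuclear norm of X (the sum of its singular values), \|Y\|_1 is the sum of the absolute values of the entries of Y, \lambda is the scalar weight, and \sigma stands for the known noise level.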
Joint work with: Serhat Aybat and Garud Iyengar
Title: Second-order conditions in nonlinear optimization
Abstract: Second-order conditions known to most optimizers
(and found in most optimization books) are valid under fairly
strong assumptions and guarantee fairly weak conclusions.
In this talk, we will survey some state-of-the-art second order
conditions that can be obtained using penalty functions,
perturbations, and ultrafilter convergence.
Title: General linear complementarity problems:
algorithms and models
Abstract:
Linear complementarity problems (LCPs) are in general NP-hard. The largest matrix class for which interior point algorithms (IPAs) are polynomial is the class of P*(κ)-matrices, for a given nonnegative real parameter κ. The union of the P*(κ)-matrix classes over all possible parameters κ forms the class of P*-matrices, which was introduced by Kojima et al. in 1991.
Cottle et al. (1989) defined the class of sufficient matrices (SU). It has been proved that several variants of the criss-cross algorithm are finite for LCPs with sufficient matrices.
A natural question then arises: what is the relation between the sufficient and the P*-matrices? Väliaho (1996) proved that the P*-matrices are exactly the sufficient matrices.
Using the EP-theorem concept of Cameron and Edmonds (1990) and the LCP duality theorem of Fukuda and Terlaky (1992), Fukuda et al. (1998) were able to generalize the criss-cross algorithm to LCPs with arbitrary matrices. The generalized criss-cross algorithm for LCPs with rational data stops in a finite number of iterations with one of the following stopping criteria: (i) the primal LCP has been solved and the encoding size of the solution has a polynomial bound, (ii) the dual LCP has been solved and the encoding size of the solution has a polynomial bound, (iii) the matrix of the problem is not a sufficient matrix and there is a certificate whose encoding size has a polynomial bound.
Since 1998, it has been an interesting open question whether the result of Fukuda et al. can be obtained using some generalization of IPAs. We modified some IPAs so that their stopping criteria are the same as those of the generalized criss-cross algorithm. The running time of the modified interior point algorithms is still polynomial, but they do not in all cases provide a solution for solvable LCPs [third stopping criterion: the matrix is not sufficient, but the LCP might still have a solution].
Some of our interior point algorithms that solve LCPs with arbitrary matrices in the sense of the EP-theorem have been published recently.
The goal of this talk is to introduce algorithms that may solve general LCPs and to show their computational performance on the well-known exchange market model of Arrow and Debreu.
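For reference, the (primal) LCP discussed above takes the standard form: given M \in \mathbb{R}^{n \times n} and q \in \mathbb{R}^n, find x, s \in \mathbb{R}^n such that

```latex
s = Mx + q, \qquad x \ge 0, \qquad s \ge 0, \qquad x^T s = 0 .
```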
Joint work with: Zsolt Csizmadia, Adrienn Nagy and Marianna Nagy
Title: Condition numbers for optimization and compressed sensing thresholds.
Abstract: A precise error analysis of sparse recovery problems with noise can be
obtained from the study of variants of Renegar's condition number for
convex feasibility problems. The probability distribution of these
condition measures for random problem instances is well studied,
leading in turn to new results in efficient sparse recovery.
Title: Random gradient-free minimization of convex functions
Abstract:
In this talk, we present the complexity bounds
for methods of Convex Optimization based only on
computation of the function value. The search directions
of our schemes are random Gaussian
vectors. It appears that such methods usually need at most
n times more iterations than the standard gradient
methods, where n is the dimension of the space of
variables. This conclusion is true both for nonsmooth and
smooth problems. For the latter class, we also develop an
accelerated scheme with the expected rate of convergence
O({n^2 \over k^2}), where k is the iteration counter.
For Stochastic Optimization, we propose a zero-order
scheme and justify its expected rate of convergence O({n
\over k^{1/2}}). We also give some bounds for the rate of
convergence of the random gradient-free methods to
stationary points of nonconvex functions, both for smooth
and nonsmooth cases. Our theoretical results are supported
by preliminary computational experiments.
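A minimal sketch of a random gradient-free step of the kind described, assuming a Gaussian search direction and a forward finite-difference estimate of the directional derivative; the test function, step sizes, and iteration count are illustrative choices, not the talk's:

```python
import numpy as np

def gradient_free_step(f, x, mu=1e-6, h=0.01, rng=None):
    """One step of a random gradient-free method: the search direction is a
    random Gaussian vector u, and the derivative of f along u is approximated
    by a forward finite difference with smoothing parameter mu."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.standard_normal(x.shape)            # random Gaussian direction
    directional = (f(x + mu * u) - f(x)) / mu   # ~ <grad f(x), u>
    return x - h * directional * u              # move against the estimate

# Illustrative run: minimize the smooth convex quadratic f(x) = ||x||^2 / 2.
rng = np.random.default_rng(0)
f = lambda x: 0.5 * float(x @ x)
x = np.ones(5)
for _ in range(2000):
    x = gradient_free_step(f, x, rng=rng)
```

In line with the complexity result quoted in the abstract, such a scheme typically needs on the order of n times more iterations than a gradient method using exact first-order information.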
Title: Bad semidefinite programs: they all look the same
Abstract:
Duality theory is a central concept in Semidefinite Programming (SDP), just as it is in linear programming, since in optimization algorithms a dual solution serves as a certificate of optimality. However, in SDP, unlike in LP, "pathological" phenomena occur: nonattainment of the optimal value, and positive duality gaps between the primal and dual problems. This research was motivated by the curious similarity of pathological SDP instances appearing in the literature. We find an exact characterization of semidefinite systems that are badly behaved from the viewpoint of duality, i.e., we show that -- surprisingly -- "all bad SDPs look the same". We also prove an "excluded minor" type result: all badly behaved semidefinite systems can be reduced (in a well defined sense) to a minimal such system with just one variable and two by two matrices. Our characterizations imply that recognizing badly behaved semidefinite systems is in NP and co-NP in the real number model of computing. The main results are derived from a fairly general characterization of badly behaved conic linear systems, and they hinge on a previous result on the closedness of the linear image of a closed convex cone. We show characterizations of badly behaved second order, and other conic systems as well. The URL of the short version of the paper is: http://www.optimization-online.org/DB_HTML/2010/11/2809.html
Title: A modified perceptron algorithm
Abstract:
The perceptron algorithm, introduced in the late fifties in the machine learning community, is a simple greedy algorithm for solving the polyhedral feasibility problem Ay > 0. The algorithm's main advantages are its simplicity and noise tolerance. The algorithm's main disadvantage is its slow convergence rate. We propose a modified version of the perceptron algorithm that retains the algorithm's original simplicity but has a substantially improved convergence rate.
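The classical perceptron update is simple enough to sketch; the toy implementation and test instance below are illustrative only (the modified variant of the talk is not shown):

```python
import numpy as np

def perceptron(A, max_iters=10000):
    """Classical perceptron for the polyhedral feasibility problem A y > 0:
    while some row a_i violates a_i . y <= 0, update y <- y + a_i.
    With normalized rows, it terminates within 1/rho^2 iterations, where rho
    is the margin of the system (when a strictly feasible y exists)."""
    A = A / np.linalg.norm(A, axis=1, keepdims=True)  # normalize the rows
    y = A[0].copy()
    for _ in range(max_iters):
        violated = np.flatnonzero(A @ y <= 0)
        if violated.size == 0:
            return y                # strictly feasible: A y > 0
        y = y + A[violated[0]]      # greedy correction on a violated row
    return None                     # iteration budget exhausted

# An illustrative feasible instance: rows clustered around direction (1, 1).
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 2)) + 4.0
y = perceptron(A)
```

The 1/rho^2 bound is what makes the convergence slow when the margin rho is small, which is the behavior the modified algorithm improves.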
Joint work with: Negar Soheili
Title:
Lock-Free Approaches to Parallelizing Stochastic Gradient Descent
Abstract:
Stochastic Gradient Descent (SGD) is a popular optimization algorithm
for solving a myriad of data-driven problems including support vector
machines, text labeling, and matrix factorization. SGD is well suited
to processing large amounts of data due to its robustness against
noise, rapid convergence rates, and predictable memory
footprint. Nevertheless, scaling SGD to very large problems is
challenging as SGD seems to be impeded by many of the classical
barriers to scalability: (1) SGD appears to be inherently sequential,
(2) SGD assumes uniform sampling from the underlying data set
resulting in poor locality, and (3) current approaches to parallelize
SGD require performance-destroying, fine-grained communication. This
talk aims to refute the conventional wisdom that SGD inherently
suffers from these scalability impediments. Specifically, using novel
theoretical analysis, algorithms, and implementation, I will show
that SGD can be implemented in parallel with minimal communication and
without any locking or synchronization. I will also discuss how to
induce additional locality through biased random orderings of the
incremental gradients. I will proceed to provide several example
implementations that achieve linear speedups on real data-processing
problems.
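As background for the incremental updates being parallelized, here is a minimal sequential SGD sketch on a least-squares model; the model, step size, and sampling scheme are illustrative assumptions, not the talk's implementation:

```python
import numpy as np

def sgd_least_squares(X, b, lr=0.01, epochs=50, rng=None):
    """Plain sequential SGD on f(w) = (1/2m) sum_i (x_i . w - b_i)^2.
    Each step uses the gradient of a single randomly sampled term; these
    per-term updates are what lock-free schemes run in parallel."""
    rng = np.random.default_rng() if rng is None else rng
    m, n = X.shape
    w = np.zeros(n)
    for _ in range(epochs):
        for i in rng.permutation(m):        # one pass in random order
            g = (X[i] @ w - b[i]) * X[i]    # gradient of the i-th term
            w -= lr * g
    return w

# Illustrative run: recover the weights of a noiseless linear model.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
w_true = np.arange(1.0, 6.0)
b = X @ w_true
w = sgd_least_squares(X, b, rng=rng)
```

Because each inner-loop update touches only one data point, running these updates concurrently without locks mainly risks conflicting writes to w, which is exactly the issue the talk's analysis addresses.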
Joint work with: Feng Niu, Christopher Re, and Stephen Wright
Title: Using Newton's Method to Reoptimize for Hyperbolic Programs
Abstract:
For conic linear programs defined over hyperbolicity cones (e.g.,
semidefinite programming), when the feasible region has smooth
boundary with strict curvature at the optimal solution, there is a
natural strictly-convex function whose (unconstrained) minimizer is
precisely the optimal solution of the conic program. We show that
when Newton's method is applied to minimizing the function, the radius
of quadratic convergence is nearly as large as one could reasonably
hope it to be. This has potential applications for reoptimization,
when there is a (slight) change to the data for the conic program,
resulting in a new optimal solution that needs to be computed.
Title: Iteration Complexity of Randomized Block-Coordinate Descent Methods
for Minimizing a Composite Function.
Abstract: We develop a randomized block-coordinate descent method for
minimizing the sum of a smooth and a simple nonsmooth block-separable
convex function and prove that it obtains an $\epsilon$-accurate
solution with probability at least $1-\rho$ in at most $O((2n/\epsilon)\log(1/\rho))$ iterations, where n is the dimension of the problem. For strongly convex functions the method converges linearly. This extends recent results of Nesterov [Efficiency of coordinate descent methods on huge-scale optimization problems, CORE Discussion Paper \#2010/2], which cover the smooth case, to composite minimization, while at the same time improving the complexity by a factor of 4 and removing $\epsilon$ from under the logarithm. More importantly, in contrast with the aforementioned work, in which the authors achieve the results by applying their method to a regularized version of the objective function with an unknown scaling factor, we show that this is not necessary, thus achieving true iteration complexity bounds. In the smooth case we also allow for arbitrary probability vectors and non-Euclidean norms.
We demonstrate numerically that the algorithm is able to solve
huge-scale \ell_1-regularized support vector machine and least
squares problems with a billion variables. Finally, we present
preliminary computational results with a GPU-accelerated parallel
version of the method, achieving speedups of up to two orders of
magnitude when compared to a single-core implementation in C.
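A toy sketch of one natural instance, assuming the composite objective is ℓ1-regularized least squares (a smooth part plus a simple separable nonsmooth part); the coordinate-wise stepsizes 1/L_i and uniform sampling mirror the setup described, but the code is illustrative only, not the paper's implementation:

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * |.| (soft thresholding)."""
    return np.sign(v) * np.maximum(abs(v) - t, 0.0)

def rcdm_lasso(A, b, lam=0.1, iters=20000, rng=None):
    """Randomized coordinate descent for
        min_x (1/2) ||A x - b||^2 + lam * ||x||_1 .
    Each step picks one coordinate i uniformly at random and applies an exact
    proximal update with stepsize 1/L_i, where L_i = ||A[:, i]||^2 is the
    coordinate-wise Lipschitz constant of the smooth part."""
    rng = np.random.default_rng() if rng is None else rng
    m, n = A.shape
    L = (A * A).sum(axis=0)            # per-coordinate Lipschitz constants
    x = np.zeros(n)
    r = -b                             # residual A x - b, kept up to date
    for _ in range(iters):
        i = rng.integers(n)
        g = A[:, i] @ r                # partial derivative of the smooth part
        x_new = soft_threshold(x[i] - g / L[i], lam / L[i])
        r += (x_new - x[i]) * A[:, i]  # O(m) residual update
        x[i] = x_new
    return x

# Illustrative run: recover a sparse vector from noiseless measurements.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 20))
x_true = np.zeros(20)
x_true[:3] = [3.0, -2.0, 1.5]
b = A @ x_true
x = rcdm_lasso(A, b, rng=rng)
```

Each iteration costs O(m) here rather than the O(mn) of a full gradient step, which is the source of the per-iteration savings that make such methods attractive at huge scale.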
Joint work with: Martin Takac.
Title: Invisibility in Billiards and Open Problems in Newtonian
Aerodynamics
Abstract: We consider bodies moving in a rarefied flow of non-interacting
particles (as in the billiard theory or in geometrical optics) and
show that it is possible to construct a fractal body which is
invisible in 3 different directions. We also present a list of open
problems in Newtonian Aerodynamics.
Joint work with: Alexander Plakhov
Title: On the Curvature of the Central Path and Polytopes:
Klee-Minty Constructions, the Hirsch Conjecture and its Relatives
Abstract:
Edge paths and the central path are key concepts in simplex and interior point methods, respectively.
Their worst case and average case behavior relate to worst case and average case complexity of
simplex and interior point algorithms. In this talk, we review the various constructions
that force the central path to follow the simplex-edge path of the Klee-Minty cube.
Although the Hirsch conjecture is disproved in its original form, tightening the bounds for its
weaker variants is an active area of research. Further, there is a strong connection between partial results
about the Hirsch conjecture and its analogue that the order of the largest total curvature of
the central path associated to a polytope is the number of inequalities defining the polytope.
In turn, the average central path curvature result of Dedieu, Malajovich and Shub leads to
an analogous conjecture about the average diameter of an arrangement.
Intriguing constructions illustrate the results and substantiate the conjectures.
Joint work with: Antoine Deza, Eissa Nematollahi, Yuriy Zinchenko and Murat Mut.
Title: A robust robust (sic) optimization result
Abstract:
We show that, "on average," very little is lost if a misspecified
objective function is optimized over an arbitrary compact set,
under three probability distributions. An application is given to
the computational complexity of binary optimization problems.
Joint work with: Martina Gancarova
Title: The Cubic Regularization Algorithm and Complexity Issues for Nonconvex Optimization
Abstract: The talk will survey recent developments in the analysis of worst-case complexity bounds for algorithms intended for solving nonconvex continuous optimization problems. The convergence to first- and second-order critical points in the unconstrained case will be considered first, and methods such as steepest descent, Newton and several of its variants will be revisited. The talk will also present some new approaches for the constrained case. Some relatively surprising results will be given and the special nature of the cubic regularization method (ARC) will be pointed out.
Joint work with: Coralia Cartis and Nick Gould
Title: Logarithmic barriers for sparse matrix cones.
Abstract:
We describe algorithms for evaluating the values, gradients, and
Hessians of logarithmic barrier functions for two types of convex
matrix cones: cones of positive semidefinite matrices with a given
sparsity pattern, and cones of chordal sparse matrices with a positive
semidefinite completion.
The algorithms are based on techniques used in multifrontal and
supernodal Cholesky factorization methods.
Coralia Cartis, University of Edinburgh, UK
Frank Curtis,
Lehigh University, USA
Francois Glineur,
UCL, Belgium
Donald Goldfarb, Columbia University, USA
Osman Guler,
UMBC, USA
Tibor Illes,
University of Strathclyde, UK
Martin Lotz, University of Edinburgh, UK
Yurii Nesterov, UCL, Belgium
Gabor Pataki,
UNC at Chapel Hill, USA
Javier Pena,
Carnegie Mellon University, USA
Ben Recht, University of Wisconsin, Madison, USA
James Renegar, Cornell University, USA
Peter Richtarik,
University of Edinburgh, UK
Vera Roshchina,
University of Evora, Portugal
Tamas Terlaky,
Lehigh University, USA
Michael Todd,
Cornell University, USA
Philippe Toint
FUNDP, Belgium
Lieven Vandenberghe UCLA, USA