Title: Convex Relaxations for Subset Selection
Abstract: We use convex relaxation techniques to produce lower bounds on the optimal value of subset selection problems and generate good approximate solutions. We then explicitly bound the quality of these relaxations by studying the approximation ratio of sparse eigenvalue relaxations. Our results are used to improve the performance of branch-and-bound algorithms to produce exact solutions to subset selection problems.
Title: A randomized Mirror-Prox method for solving structured matrix saddle-point problems
Abstract
In this paper, we derive a randomized version of the Mirror-Prox method for
solving some
structured matrix saddle-point problems, such as the maximal eigenvalue
minimization problem. Deterministic first-order schemes, such as
Nesterov's Smoothing
Techniques or standard
Mirror-Prox methods, require the exact computation of a matrix exponential
at every iteration,
limiting the size of the problems they can solve. Our method allows us to
use stochastic approximations of matrix exponentials. We prove that our randomized scheme
significantly reduces the complexity of its deterministic counterpart for large-scale matrix
saddle-point problems.
Numerical experiments illustrate and confirm our theoretical results.
Constantine Caramanis,
University of Texas at Austin, USA
Title: Robust Matrix Completion and Collaborative Filtering
Abstract: In this talk, we consider the problem of matrix completion when some number of the columns are arbitrarily corrupted. It is well-known that standard algorithms for matrix completion can return arbitrarily poor results if even a single column is corrupted. What can be done if a large number, or even a constant fraction, of columns are corrupted?
In this paper, we discuss this very problem, and present a robust and efficient algorithm for its solution. One direct application comes from robust collaborative filtering. Here, some number of users are so-called manipulators, and try to skew the predictions of the algorithm. Significantly, our results hold without any assumptions on the observed entries of the manipulated columns.
Joint work with: Yudong Chen, Huan Xu, Sujay Sanghavi
Title: Towards Optimal Newton-Type Methods for Nonconvex Smooth Optimization
Abstract:
We show that the steepest-descent and Newton's methods for
unconstrained nonconvex optimization under standard assumptions may
both require a number of iterations and function evaluations
arbitrarily close to the steepest-descent's global worst-case
complexity bound. This shows that the latter upper bound is
essentially tight for steepest descent and that Newton's method may
be as slow as the steepest-descent method in the worst case. More
generally, we consider a class of second-order methods for
unconstrained minimization that includes Newton's, inexact
linesearch, cubic regularization, some Goldfeld-Quandt-Trotter and
trust-region variants. For each algorithm in this class, we exhibit
an example of a sufficiently smooth and bounded-below objective such
that the method takes at least on the order of \epsilon^{-3/2} function evaluations to drive the gradient below \epsilon. Thus, as this evaluation count coincides in order with the upper complexity bound for cubic regularization, the latter has (order) optimal worst-case complexity amongst the methods in this class.
Joint work with Nick Gould and Philippe Toint.
Title: Nonsmooth Optimization via Gradient Sampling
Abstract:
We discuss an algorithm for nonsmooth, nonconvex optimization. We focus on the idea of gradient sampling to practically approximate epsilon-subdifferentials of the problem functions, which allows for problems to be solved via a sequence of quadratic subproblems. The ideas apply in both unconstrained and constrained contexts. We emphasize that adaptive sampling, warm-starts for the quadratic subproblem solver, and quasi-Newton Hessian approximations can be used to develop efficient methods for unconstrained problems, and illustrate how these techniques can be applied when constraints are present.
Joint work with: Michael Overton and Xiaocun Que
Title: Convex optimization with a first-order inexact oracle
Abstract: We introduce a new definition of inexact oracle for convex
optimization, which can be viewed as a common extension of the
classical notions of (1) epsilon-subgradient and (2)
Lipschitz-continuous gradient. We study the behaviour of classical
first-order methods for smooth convex optimization when such an
inexact oracle is used instead of the exact gradient. In particular,
we show that the convergence of the classical gradient method is
mostly unchanged (only the reachable accuracy is reduced), while the
behaviour of the fast gradient method seriously deteriorates (it no
longer converges, due to error accumulation). We provide several
examples of situations where such an inexact oracle is naturally
available, including cases involving non-smooth functions or functions
whose derivatives are not directly accessible.
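For concreteness, one standard way to formalize such an oracle (a sketch consistent with the abstract; the talk's exact definition may differ) is: at any query point x, a first-order (\delta,L)-oracle returns a pair (f_\delta(x), g_\delta(x)) such that, for all y,

```latex
0 \;\le\; f(y) - \big( f_{\delta}(x) + \langle g_{\delta}(x),\, y - x \rangle \big)
\;\le\; \frac{L}{2}\,\|y - x\|^2 + \delta .
```

Setting \delta = 0 recovers the usual Lipschitz-continuous gradient condition, while the left-hand inequality alone says that g_\delta(x) is a \delta-subgradient of f at x.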
Joint work with O. Devolder and Y. Nesterov
Title:
Fast First-Order and Alternating Direction Methods for Stable Principal Component Pursuit
Abstract:
Given a matrix M that is the sum of a low rank matrix X and a sparse matrix Y, it is known that under certain conditions, X and Y can be recovered with very high probability by solving a so-called robust principal component
pursuit problem. When M has noise added to it, the recovery problem is
called stable principal component pursuit. In this problem one minimizes the sum of the nuclear norm of X plus a scalar times the sum of the absolute
values of the elements of Y, subject to the Frobenius norm of X+Y-M being less than some known noise level. We show how to apply Nesterov's optimal
gradient and composite methods and a fast alternating linearization method
to obtain epsilon-optimal solutions in O(1/epsilon) iterations, while keeping the cost of each iteration low. We also propose a very efficient alternating direction augmented Lagrangian method, and prove its convergence.
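In symbols, the stable principal component pursuit problem described above can be sketched as the standard formulation

```latex
\min_{X,\,Y} \;\; \|X\|_* + \lambda \|Y\|_1
\quad \text{subject to} \quad \|X + Y - M\|_F \le \sigma ,
```

where \|X\|_* is the nuclear norm of X (the sum of its singular values), \|Y\|_1 is the sum of the absolute values of the entries of Y, \lambda is the scalar weight, and \sigma stands for the known noise level.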
Joint work with: Serhat Aybat and Garud Iyengar
Title: Second-order conditions in nonlinear optimization
Abstract: Second-order conditions known to most optimizers
(and found in most optimization books) are valid under fairly
strong assumptions and guarantee fairly weak conclusions.
In this talk, we will survey some state-of-the-art second order
conditions that can be obtained using penalty functions,
perturbations, and ultrafilter convergence.
Title: General linear complementarity problems:
algorithms and models
Abstract:
Linear complementarity problems (LCPs) are in general NP-hard. The largest matrix class for which interior point algorithms (IPAs) are polynomial is the class of P*(κ)-matrices, for a given nonnegative real parameter κ. The union of the P*(κ)-matrix classes over all possible parameters κ forms the class of P*-matrices, which was introduced by Kojima et al. in 1991.
Cottle et al. (1989) defined the class of sufficient matrices (SU). It has been proved that several variants of the criss-cross algorithm are finite for LCPs with sufficient matrices.
A natural question then arises: what is the relation between the sufficient and the P*-matrices? Väliaho (1996) proved that the P*-matrices are exactly the sufficient matrices.
Using the EP-theorem concept of Cameron and Edmonds (1990) and the LCP duality theorem of Fukuda and Terlaky (1992), Fukuda et al. (1998) were able to generalize the criss-cross algorithm to LCPs with arbitrary matrices. The generalized criss-cross algorithm for LCPs with rational data stops in a finite number of iterations with one of the following stopping criteria: (i) the primal LCP has been solved and the encoding size of the solution has a polynomial bound, (ii) the dual LCP has been solved and the encoding size of the solution has a polynomial bound, (iii) the matrix of the problem is not a sufficient matrix and there is a certificate whose encoding size has a polynomial bound.
Since 1998, it has been an interesting open question whether the result of Fukuda et al. can be obtained using some generalization of IPAs. We modified some IPAs so that their stopping criteria are the same as those of the generalized criss-cross algorithm. The running time of the modified interior point algorithms is still polynomial, but they do not in all cases provide a solution for solvable LCPs [third stopping criterion: the matrix is not sufficient, but the LCP might still have a solution].
Some of our interior point algorithms that solve LCPs with arbitrary matrices in the sense of the EP-theorem have been published recently.
The goal of this talk is to introduce algorithms that may solve general LCPs and to show their computational performance on the well-known exchange market model of Arrow and Debreu.
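For reference, the (primal) LCP discussed above takes the standard form: given M \in \mathbb{R}^{n \times n} and q \in \mathbb{R}^n, find x, s \in \mathbb{R}^n such that

```latex
s = Mx + q, \qquad x \ge 0, \qquad s \ge 0, \qquad x^T s = 0 .
```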
Joint work with: Zsolt Csizmadia, Adrienn Nagy and Marianna Nagy
Title: Condition numbers for optimization and compressed sensing thresholds.
Abstract: A precise error analysis of sparse recovery problems with noise can be
obtained from the study of variants of Renegar's condition number for
convex feasibility problems. The probability distribution of these
condition measures for random problem instances is well studied,
leading in turn to new results in efficient sparse recovery.
Title: Random gradient-free minimization of convex functions
Abstract:
In this talk, we present the complexity bounds
for methods of Convex Optimization based only on
computation of the function value. The search directions
of our schemes are random Gaussian
vectors. It appears that such methods usually need at most
n times more iterations than the standard gradient
methods, where n is the dimension of the space of
variables. This conclusion is true both for nonsmooth and
smooth problems. For the latter class, we also develop an
accelerated scheme with the expected rate of convergence
O({n^2 \over k^2}), where k is the iteration counter.
For Stochastic Optimization, we propose a zero-order
scheme and justify its expected rate of convergence O({n
\over k^{1/2}}). We also give some bounds for the rate of
convergence of the random gradient-free methods to
stationary points of nonconvex functions, both for smooth
and nonsmooth cases. Our theoretical results are supported
by preliminary computational experiments.
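A minimal sketch of a random gradient-free step of the kind described, assuming a Gaussian search direction and a forward finite-difference estimate of the directional derivative; the test function, step sizes, and iteration count are illustrative choices, not the talk's:

```python
import numpy as np

def gradient_free_step(f, x, mu=1e-6, h=0.01, rng=None):
    """One step of a random gradient-free method: the search direction is a
    random Gaussian vector u, and the derivative of f along u is approximated
    by a forward finite difference with smoothing parameter mu."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.standard_normal(x.shape)            # random Gaussian direction
    directional = (f(x + mu * u) - f(x)) / mu   # ~ <grad f(x), u>
    return x - h * directional * u              # move against the estimate

# Illustrative run: minimize the smooth convex quadratic f(x) = ||x||^2 / 2.
rng = np.random.default_rng(0)
f = lambda x: 0.5 * float(x @ x)
x = np.ones(5)
for _ in range(2000):
    x = gradient_free_step(f, x, rng=rng)
```

In line with the complexity result quoted in the abstract, such a scheme typically needs on the order of n times more iterations than a gradient method using exact first-order information.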
Title: Bad semidefinite programs: they all look the same
Abstract:
Duality theory is a central concept in Semidefinite Programming (SDP), just as it is in linear programming, since in optimization algorithms a dual solution serves as a certificate of optimality. However, in SDP, unlike in LP, "pathological" phenomena occur: nonattainment of the optimal value, and positive duality gaps between the primal and dual problems. This research was motivated by the curious similarity of pathological SDP instances appearing in the literature. We find an exact characterization of semidefinite systems that are badly behaved from the viewpoint of duality, i.e., we show that -- surprisingly -- "all bad SDPs look the same". We also prove an "excluded minor" type result: all badly behaved semidefinite systems can be reduced (in a well defined sense) to a minimal such system with just one variable and two by two matrices. Our characterizations imply that recognizing badly behaved semidefinite systems is in NP and co-NP in the real number model of computing. The main results are derived from a fairly general characterization of badly behaved conic linear systems, and they hinge on a previous result on the closedness of the linear image of a closed convex cone. We show characterizations of badly behaved second order, and other conic systems as well. The URL of the short version of the paper is: http://www.optimization-online.org/DB_HTML/2010/11/2809.html
Title: A modified perceptron algorithm
Abstract:
The perceptron algorithm, introduced in the late fifties in the machine learning community, is a simple greedy algorithm for solving the polyhedral feasibility problem Ay > 0. The algorithm's main advantages are its simplicity and noise tolerance. The algorithm's main disadvantage is its slow convergence rate. We propose a modified version of the perceptron algorithm that retains the algorithm's original simplicity but has a substantially improved convergence rate.
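The classical perceptron update is simple enough to sketch; the toy implementation and test instance below are illustrative only (the modified variant of the talk is not shown):

```python
import numpy as np

def perceptron(A, max_iters=10000):
    """Classical perceptron for the polyhedral feasibility problem A y > 0:
    while some row a_i violates a_i . y <= 0, update y <- y + a_i.
    With normalized rows, it terminates within 1/rho^2 iterations, where rho
    is the margin of the system (when a strictly feasible y exists)."""
    A = A / np.linalg.norm(A, axis=1, keepdims=True)  # normalize the rows
    y = A[0].copy()
    for _ in range(max_iters):
        violated = np.flatnonzero(A @ y <= 0)
        if violated.size == 0:
            return y                # strictly feasible: A y > 0
        y = y + A[violated[0]]      # greedy correction on a violated row
    return None                     # iteration budget exhausted

# An illustrative feasible instance: rows clustered around direction (1, 1).
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 2)) + 4.0
y = perceptron(A)
```

The 1/rho^2 bound is what makes the convergence slow when the margin rho is small, which is the behavior the modified algorithm improves.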
Joint work with: Negar Soheili
Title:
Lock-Free Approaches to Parallelizing Stochastic Gradient Descent
Abstract:
Stochastic Gradient Descent (SGD) is a popular optimization algorithm
for solving a myriad of data-driven problems including support vector
machines, text labeling, and matrix factorization. SGD is well suited
to processing large amounts of data due to its robustness against
noise, rapid convergence rates, and predictable memory
footprint. Nevertheless, scaling SGD to very large problems is
challenging as SGD seems to be impeded by many of the classical
barriers to scalability: (1) SGD appears to be inherently sequential,
(2) SGD assumes uniform sampling from the underlying data set
resulting in poor locality, and (3) current approaches to parallelize
SGD require performance-destroying, fine-grained communication. This
talk aims to refute the conventional wisdom that SGD inherently
suffers from these scalability impediments. Specifically, using novel
theoretical analysis, algorithms, and implementation, I will show
that SGD can be implemented in parallel with minimal communication and
without any locking or synchronization. I will also discuss how to
induce additional locality through biased random orderings of the
incremental gradients. I will proceed to provide several example
implementations that achieve linear speedups on real data-processing
problems.
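As background for the incremental updates being parallelized, here is a minimal sequential SGD sketch on a least-squares model; the model, step size, and sampling scheme are illustrative assumptions, not the talk's implementation:

```python
import numpy as np

def sgd_least_squares(X, b, lr=0.01, epochs=50, rng=None):
    """Plain sequential SGD on f(w) = (1/2m) sum_i (x_i . w - b_i)^2.
    Each step uses the gradient of a single randomly sampled term; these
    per-term updates are what lock-free schemes run in parallel."""
    rng = np.random.default_rng() if rng is None else rng
    m, n = X.shape
    w = np.zeros(n)
    for _ in range(epochs):
        for i in rng.permutation(m):        # one pass in random order
            g = (X[i] @ w - b[i]) * X[i]    # gradient of the i-th term
            w -= lr * g
    return w

# Illustrative run: recover the weights of a noiseless linear model.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
w_true = np.arange(1.0, 6.0)
b = X @ w_true
w = sgd_least_squares(X, b, rng=rng)
```

Because each inner-loop update touches only one data point, running these updates concurrently without locks mainly risks conflicting writes to w, which is exactly the issue the talk's analysis addresses.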
Joint work with: Feng Niu, Christopher Re, and Stephen Wright
Title: Using Newton's Method to Reoptimize for Hyperbolic Programs
Abstract:
For conic linear programs defined over hyperbolicity cones (e.g.,
semidefinite programming), when the feasible region has smooth
boundary with strict curvature at the optimal solution, there is a
natural strictly-convex function whose (unconstrained) minimizer is
precisely the optimal solution of the conic program. We show that
when Newton's method is applied to minimizing the function, the radius
of quadratic convergence is nearly as large as one could reasonably
hope it to be. This has potential applications for reoptimization,
when there is a (slight) change to the data for the conic program,
resulting in a new optimal solution that needs to be computed.
Title: Iteration Complexity of Randomized Block-Coordinate Descent Methods
for Minimizing a Composite Function.
Abstract: We develop a randomized block-coordinate descent method for
minimizing the sum of a smooth and a simple nonsmooth block-separable
convex function and prove that it obtains an $\epsilon$-accurate
solution with probability at least $1-\rho$ in at most $O((2n/\epsilon)\log(1/\rho))$ iterations, where n is the dimension of the problem. For strongly convex functions the method converges linearly. This extends recent results of Nesterov [Efficiency of coordinate descent methods on huge-scale optimization problems, CORE Discussion Paper \#2010/2], which cover the smooth case, to composite minimization, while at the same time improving the complexity by a factor of 4 and removing $\epsilon$ from under the logarithm. More importantly, in contrast with the aforementioned work, in which the authors achieve the results by applying their method to a regularized version of the objective function with an unknown scaling factor, we show that this is not necessary, thus achieving true iteration complexity bounds. In the smooth case we also allow for arbitrary probability vectors and non-Euclidean norms.
We demonstrate numerically that the algorithm is able to solve
huge-scale \ell_1-regularized support vector machine and least
squares problems with a billion variables. Finally, we present
preliminary computational results with a GPU-accelerated parallel
version of the method, achieving speedups of up to two orders of
magnitude when compared to a single-core implementation in C.
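A toy sketch of one natural instance, assuming the composite objective is ℓ1-regularized least squares (a smooth part plus a simple separable nonsmooth part); the coordinate-wise stepsizes 1/L_i and uniform sampling mirror the setup described, but the code is illustrative only, not the paper's implementation:

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * |.| (soft thresholding)."""
    return np.sign(v) * np.maximum(abs(v) - t, 0.0)

def rcdm_lasso(A, b, lam=0.1, iters=20000, rng=None):
    """Randomized coordinate descent for
        min_x (1/2) ||A x - b||^2 + lam * ||x||_1 .
    Each step picks one coordinate i uniformly at random and applies an exact
    proximal update with stepsize 1/L_i, where L_i = ||A[:, i]||^2 is the
    coordinate-wise Lipschitz constant of the smooth part."""
    rng = np.random.default_rng() if rng is None else rng
    m, n = A.shape
    L = (A * A).sum(axis=0)            # per-coordinate Lipschitz constants
    x = np.zeros(n)
    r = -b                             # residual A x - b, kept up to date
    for _ in range(iters):
        i = rng.integers(n)
        g = A[:, i] @ r                # partial derivative of the smooth part
        x_new = soft_threshold(x[i] - g / L[i], lam / L[i])
        r += (x_new - x[i]) * A[:, i]  # O(m) residual update
        x[i] = x_new
    return x

# Illustrative run: recover a sparse vector from noiseless measurements.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 20))
x_true = np.zeros(20)
x_true[:3] = [3.0, -2.0, 1.5]
b = A @ x_true
x = rcdm_lasso(A, b, rng=rng)
```

Each iteration costs O(m) here rather than the O(mn) of a full gradient step, which is the source of the per-iteration savings that make such methods attractive at huge scale.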
Joint work with: Martin Takac.
Title: Invisibility in Billiards and Open Problems in Newtonian
Aerodynamics
Abstract: We consider bodies moving in a rarefied flow of non-interacting
particles (as in the billiard theory or in geometrical optics) and
show that it is possible to construct a fractal body which is
invisible in 3 different directions. We also present a list of open
problems in Newtonian Aerodynamics.
Joint work with: Alexander Plakhov
Title: On the Curvature of the Central Path and Polytopes:
Klee-Minty Constructions, the Hirsch Conjecture and its Relatives
Abstract:
Edge paths and the central path are key concepts in simplex and interior point methods, respectively.
Their worst case and average case behavior relate to worst case and average case complexity of
simplex and interior point algorithms. In this talk, we review the various constructions
that force the central path to follow the simplex-edge path of the Klee-Minty cube.
Although the Hirsch conjecture is disproved in its original form, tightening the bounds for its
weaker variants is an active area of research. Further, there is a strong connection between partial results
about the Hirsch conjecture and its analogue that the order of the largest total curvature of
the central path associated to a polytope is the number of inequalities defining the polytope.
In turn, the average central path curvature result of Dedieu, Malajovich and Shub leads to
an analogous conjecture about the average diameter of an arrangement.
Intriguing constructions illustrate the results and substantiate the conjectures.
Joint work with: Antoine Deza, Eissa Nematollahi, Yuriy Zinchenko and Murat Mut.
Title: A robust robust (sic) optimization result
Abstract:
We show that, "on average," very little is lost if a misspecified
objective function is optimized over an arbitrary compact set,
under three probability distributions. An application is given to
the computational complexity of binary optimization problems.
Joint work with: Martina Gancarova
Title: The Cubic Regularization Algorithm and Complexity Issues for Nonconvex Optimization
Abstract: The talk will survey recent developments in the analysis of worst-case complexity bounds for algorithms intended for solving nonconvex continuous optimization problems. The convergence to first- and second-order critical points in the unconstrained case will be considered first, and methods such as steepest descent, Newton and several of its variants will be revisited. The talk will also present some new approaches for the constrained case. Some relatively surprising results will be given and the special nature of the cubic regularization method (ARC) will be pointed out.
Joint work with: Coralia Cartis and Nick Gould
Title: Logarithmic barriers for sparse matrix cones.
Abstract:
We describe algorithms for evaluating the values, gradients, and
Hessians of logarithmic barrier functions for two types of convex
matrix cones: cones of positive semidefinite matrices with a given
sparsity pattern, and cones of chordal sparse matrices with a positive
semidefinite completion.
The algorithms are based on techniques used in multifrontal and
supernodal Cholesky factorization methods.
Coralia Cartis, University of Edinburgh, UK
Frank Curtis,
Lehigh University, USA
Francois Glineur,
UCL, Belgium
Donald Goldfarb, Columbia University, USA
Osman Guler,
UMBC, USA
Tibor Illes,
University of Strathclyde, UK
Martin Lotz, University of Edinburgh, UK
Yurii Nesterov, UCL, Belgium
Gabor Pataki,
UNC at Chapel Hill, USA
Javier Pena,
Carnegie Mellon University, USA
Ben Recht, University of Wisconsin, Madison, USA
James Renegar, Cornell University, USA
Peter Richtarik,
University of Edinburgh, UK
Vera Roshchina,
University of Evora, Portugal
Tamas Terlaky,
Lehigh University, USA
Michael Todd,
Cornell University, USA
Philippe Toint
FUNDP, Belgium
Lieven Vandenberghe UCLA, USA