On self-concordant convergence


In the post Feasibility is optimal I suggested introducing a new barrier function to guarantee faster convergence towards the analytical center of a polytope. This turns out to be unnecessary or even detrimental, and in this post I will get into why. This post also presents a potentially new convergence result for a class of central-path methods for linear programming which I will call medium-step methods.

After digging around for literature on self-concordant functions I came across the book Convex Optimization by Stephen Boyd and Lieven Vandenberghe. Chapter 9 of the book deals precisely with the kind of problem I want to solve, especially section 9.6. I will also use the paper A polynomial-time algorithm, based on Newton's method, for linear programming by James Renegar.

Using the notation from Renegar, the iteration bound on page 505 of Convex Optimization, with optimal line search and 64-bit floating point accuracy, becomes

$$\frac{20 - 2}{\frac{1}{4}\left(1 - \frac{1}{2}\right)^{2}}\left(f(x^{(0)}) - f(\xi)\right) + \log_2 \log_2\!\left(1/2^{-53}\right) \le 288\left(f(x^{(0)}) - f(\xi)\right) + 6.$$

All that remains is to work out a bound for $\Delta = f(x^{(0)}) - f(\xi)$. Remember that $f$ here is the barrier function, which we are minimizing. Large values of $\Delta$ mean slow convergence. In particular it is purely detrimental to add large quadratic terms as I suggested in the previous post. It may also be tempting to scale the barrier function down by some constant $r < 1$, but a self-concordant function is only guaranteed to stay self-concordant when scaled up, not down. Scaling the term of the moving constraint up by an integer $l$ amounts to repeating that constraint $l$ times; it will turn out that $l > 1$ is useless in what follows, but I will try to preserve the analysis for other values of $l$ in case it is useful to someone.

Each step $j$ in the algorithm has an associated centroid $\xi^{(j)}$. Define the function $X^{(j)}(x)$ as follows:

$$X_i^{(j)}(x) = \frac{a_i^T x - b_i}{a_i^T \xi^{(j)} - b_i}$$

where $1 \le i \le m'$ correspond to the constraints that are not changing and $m' + 1 \le i \le m' + l$ to the one that does change (and copies of it, if any). In other words $X(x)$ normalizes the coordinates $Ax - b \in \mathbb{R}^m$ by dividing them elementwise by $A\xi^{(j)} - b$. This means that $X^{(j)}(\xi^{(j)}) = e$, where $e$ is the all-ones vector. Renegar also shows that $e^T X(x) = m = m' + l$ for all feasible $x$.

If we move the constraint some fraction $\delta$ at a time, the bound above can be evaluated numerically. The Octave script below computes the per-update bound $288\Delta + 6$ with $\Delta = -\log(1 - b) - b$ and $b = \delta/(1 - \delta)$, multiplied by the $\lceil 1/(3\delta) \rceil$ updates needed to cover the same total move as a single $\delta = 1/3$ update:

    d = 0.01 : 0.001 : 0.4;           % candidate step fractions delta
    b = d ./ (1 - d);
    it = (ceil(288 * (-log(1 - b) - b)) + 6) .* ceil(1 ./ d / 3);
    plot(d, it);
    [i, j] = min(it);                 % best total Newton-step count and its delta
    it(j), d(j)
    ans = 26
    ans = 0.1670

So as few as $2 \cdot 13 = 26$ Newton steps will perform the same job that would otherwise require 62 steps with $\delta = 1/3$. So conjecturally we are nearly an order of magnitude faster than the 206 figure in the previous section.

Medium-step

The term medium-step comes about for two reasons: the existence of the term long-step in the literature, and the much shorter steps taken in Renegar's paper compared to what is done here. Renegar uses $\delta = 1/(13\sqrt{m})$ whereas I use $\delta = 1/6$ above. We could therefore call Renegar's $\delta$ short-step. It requires $O(L\sqrt{m})$ iterations. On the other extreme, the type of predictor-corrector methods often used in practice are known as long-step methods. They use a much larger $\delta$, sometimes stepping as far as 99.9% of the distance to the boundary of the system. Todd and Ye demonstrate that these methods have worst-case performance of $\Omega(L\,m^{1/3})$ and, if I understand correctly, they remain $O(Lm)$. Because the $\delta$ chosen above is in between these two extremes, I have chosen to call the method medium-step.

Note again that this result is only useful if we seek to remain in the center of a linear program. It is not useful if one seeks to optimize some objective function $c^T x$.
Future work

Further theoretical improvements could be made by analyzing what I will call medium-step predictor-corrector methods. By computing the tangent to the central path using a short step, a line search can be performed to find a better initial guess in the system updated with $\delta = 1/6$. This is likely to bring down the number of Newton steps; by how much I am not sure. Note that computing this tangent itself costs a small number of Newton steps. On the other hand, the system is likely to be better conditioned than it is just after a medium-step update, meaning these Newton steps are cheaper to compute.

Closing remarks

I started writing this post around 2023-03-05, so in total I have been obsessing about this specific problem for roughly seven weeks. Initially I thought I had a much stronger result than I actually do, so quite a bit of time was spent making sure I did not overstate it. In the process I have learned even more about barrier methods and self-concordant functions.