Interior Point Methods

John Mitchell

November 2010

Our standard linear programming problem is

(P):  minimize c^{T}x  subject to  Ax = b,  x ≥ 0.

Here, c and x are n vectors, b is an m vector, and A is an m × n matrix. The simplex algorithm moves from basic feasible solution to basic feasible solution. The basic feasible solutions are extreme points of the feasible region for (P). Furthermore, the simplex algorithm moves from one extreme point along an edge of the feasible region to another extreme point. If the feasible region is very big with many extreme points, the simplex algorithm may take a long time before it finds the optimal extreme point. Thus, we may try to use an algorithm which cuts across the middle of the feasible region. Such a method is called an interior point method. There are many different interior point algorithms; we will just consider one: the primal affine scaling algorithm.

Notation: We will let e = [1,…, 1]^{T }.

The boundaries of the feasible region are given by the nonnegativity inequalities x ≥ 0. Therefore, a point is in the interior of the feasible region if it satisfies

Ax = b,  x > 0.

Assume we know a strictly positive point x^{k} which is feasible for the problem (P). The best
direction to move may appear to be the direction -c, because this will result in the largest
decrease in the objective function value for the smallest change in x. There are two problems with
this direction:

- It may result in a point which no longer satisfies Ax = b. We will return to this issue later when we discuss projections.
- We may only be able to take a very small step before we violate the nonnegativity constraint.

Consider the problem

(LP1):  minimize  -x_{1} - x_{2} + x_{3} + x_{4}
        subject to  x_{1} + x_{3} = 1
                    x_{2} + x_{4} = 2
                    x ≥ 0,

so here c = [-1,-1, 1, 1]^{T }. This problem has optimal solution x^{*} = (1, 2, 0, 0) with value -3. The
point x^{k} = (0.8, 0.1, 0.2, 1.9) is feasible in this problem, with objective function value 1.2. We want
to look at points of the form

x = x^{k} - βc

for some steplength β, because the objective function value of such a point is

c^{T}(x^{k} - βc) = c^{T}x^{k} - βc^{T}c,

because c^{T}c is the dot product between the vector c and itself, that is, it is just the square of the length of the vector c. Now,

x^{k} - βc = (0.8 + β, 0.1 + β, 0.2 - β, 1.9 - β),

so the third component becomes negative as soon as β > 0.2: only a small step can be taken before the nonnegativity constraint is violated.

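A few lines of Python make the difficulty concrete. This sketch uses only the cost vector and the iterate quoted above for (LP1), and computes how far one can move along -c before some component of x reaches zero.

```python
import numpy as np

# Data from (LP1): cost vector and the strictly positive feasible point.
c = np.array([-1.0, -1.0, 1.0, 1.0])
xk = np.array([0.8, 0.1, 0.2, 1.9])

d = -c  # steepest-descent direction, ignoring Ax = b for the moment

# The largest beta keeping xk + beta*d >= 0 is limited by the
# components where d is negative.
neg = d < 0
beta_max = np.min(xk[neg] / -d[neg])
print(beta_max)  # 0.2: only a short step before x3 hits zero
```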
To get around this difficulty, we rescale the problem so that we can take a step of length at least one in any direction. Thus, we introduce a diagonal matrix

D^{k} := diag(x^{k}_{1},…, x^{k}_{n}),  that is, D^{k}_{ii} = x^{k}_{i} and D^{k}_{ij} = 0 for i ≠ j.

We then scale the constraint matrix and the objective function, getting A^{k} := AD^{k} and c^{k} := D^{k}c.
For the problem (LP1), with x^{k} as given above, we get

A^{k} = AD^{k} = [ 0.8   0     0.2   0
                   0     0.1   0     1.9 ]

and

c^{k} = D^{k}c = [-0.8,-0.1, 0.2, 1.9]^{T}.
We then get a rescaled problem

(P^{k}):  minimize  c^{k}^{T}z  subject to  A^{k}z = b,  z ≥ 0.

The rescaled version of (LP1) is

(LP1^{k}):  minimize  -0.8z_{1} - 0.1z_{2} + 0.2z_{3} + 1.9z_{4}
            subject to  0.8z_{1} + 0.2z_{3} = 1
                        0.1z_{2} + 1.9z_{4} = 2
                        z ≥ 0.

Notice that z = (1, 1, 1, 1) is feasible in (LP1^{k}), with objective function value 1.2. Because all of
the components of z are at least one, we can take a reasonable step in any direction.

We list several properties of (P^{k}). It may help you to understand these properties if you see how
they apply to the problem (LP1).

- Since D^{k} is a diagonal matrix, it is equal to its transpose: D^{k}^{T} = D^{k}.
- If x̄ is feasible in (P) then z̄ := D^{k}^{-1}x̄ is feasible in (P^{k}). Further, c^{T}x̄ = c^{k}^{T}z̄.

  Check: A^{k}z̄ = AD^{k}D^{k}^{-1}x̄ = Ax̄ = b.

  Also: c^{k}^{T}z̄ = (D^{k}c)^{T}D^{k}^{-1}x̄ = c^{T}D^{k}D^{k}^{-1}x̄ = c^{T}x̄.
- If z̄ is feasible in (P^{k}) then x̄ := D^{k}z̄ is feasible in (P). Further, c^{T}x̄ = c^{k}^{T}z̄.

  Check: Ax̄ = AD^{k}z̄ = A^{k}z̄ = b.

  Also: c^{T}x̄ = c^{T}D^{k}z̄ = (D^{k}c)^{T}z̄ = c^{k}^{T}z̄.
- The point z = e = (1,…, 1) is feasible in (P^{k}). It corresponds to the point x^{k} in (P). Note that D^{k}e = x^{k} and D^{k}^{-1}x^{k} = e.
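These correspondences are easy to verify numerically. The sketch below assumes the constraints of (LP1) are x_{1} + x_{3} = 1 and x_{2} + x_{4} = 2 (an assumption consistent with the feasible points given in the text) and checks that z = e in the scaled problem matches x^{k}.

```python
import numpy as np

# Data of (LP1); the constraint matrix A is an assumption consistent with
# the feasible points x* = (1, 2, 0, 0) and x^k = (0.8, 0.1, 0.2, 1.9).
A = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0]])
b = np.array([1.0, 2.0])
c = np.array([-1.0, -1.0, 1.0, 1.0])
xk = np.array([0.8, 0.1, 0.2, 1.9])

Dk = np.diag(xk)           # scaling matrix D^k
Ak = A @ Dk                # scaled constraint matrix A^k = A D^k
ck = Dk @ c                # scaled cost vector c^k = D^k c
e = np.ones(4)

# z = e corresponds to x^k, is feasible, and has the same objective value.
print(Dk @ e)              # recovers x^k
print(Ak @ e)              # recovers b
print(ck @ e, c @ xk)      # equal objective values
```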

Conceptually, an iteration of the algorithm has the form:

- Given a point x^{k} > 0 which is feasible for (P), rescale to get a new problem (P^{k}). The point z = e is feasible in the rescaled problem (P^{k}).
- Update z to a new point: z^{new} = e + βd, where d is a direction and β is a step length. We pick β to ensure that z^{new} > 0.
- Scale back to get a feasible point in (P): x^{k+1} = D^{k}z^{new}.

We would like to take d = -c^{k}, in order to decrease the objective function value as quickly as possible.
Unfortunately, this may lead to a point which violates the constraint A^{k}z = b. For example, in the problem (LP1^{k}), taking β = 0.5 gives the point

z = e - 0.5c^{k} = (1.4, 1.05, 0.9, 0.05),

which does not satisfy A^{k}z = b: the first constraint gives 0.8(1.4) + 0.2(0.9) = 1.3 ≠ 1. To maintain A^{k}z^{new} = b, the direction d must satisfy A^{k}d = 0, because e is feasible in (P^{k}). Therefore, we project the direction -c^{k} onto the nullspace of A^{k}.
Algebraically, this means that we take

d = -P_{A^{k}}c^{k}   (1)

where

P_{A^{k}} = I - A^{k}^{T}(A^{k}A^{k}^{T})^{-1}A^{k}   (2)

and I denotes the identity matrix. (Aside: We assume that the rows of A are linearly independent. Under this assumption, the projection matrix is well-defined. Note that we need this assumption to hold in order to be able to obtain a basic feasible solution.) For the problem (LP1), we have

d = -P_{A^{k}}c^{k} ≈ (0.094, 0.199,-0.376,-0.010).

We have z^{new} = e + βd, and we need to have z^{new} > 0. Thus, we need to select β so that
1 + βd_{i} > 0 for each component i. So, we need to pick

β < min{-1∕d_{i} : d_{i} < 0},

so we take

β = 0.9 min{-1∕d_{i} : d_{i} < 0}.   (3)

This will result in moving 0.9 of the way to the boundary of the feasible region of the rescaled
problem (P^{k}). Notice that one component of z^{new} will be equal to 0.1.

(This discussion on choosing the steplength should remind you of the minimum ratio rule in the simplex algorithm.)

Recall that when we looked at the revised simplex method, we found out that it was not necessary
to calculate the whole of the simplex tableau. In the same spirit of efficiency, it is not necessary to
calculate the complete projection matrix P_{Ak}. In practice, we would perform one iteration as
follows:

- Given current iterate x^{k} > 0 feasible in (P).
- Calculate D^{k}, A^{k} = AD^{k}, c^{k} = D^{k}c.
- Calculate w^{k} = A^{k}c^{k}.
- Calculate v^{k} = (A^{k}A^{k}^{T})^{-1}w^{k}. (Aside: In practice, we would not calculate the inverse explicitly, but we would solve the system of equations (A^{k}A^{k}^{T})v^{k} = w^{k} in order to find v^{k}.)
- Calculate g^{k} = A^{k}^{T}v^{k}.
- Calculate the direction d^{k} in the problem (P^{k}): d^{k} = -c^{k} + g^{k} (= -P_{A^{k}}c^{k}).
- Calculate the step length β using equation (3).
- Update z^{new} = e + βd^{k}.
- Update x^{k+1} = D^{k}z^{new}.
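The steps above can be sketched in a few lines of Python. The constraint matrix A is an assumption, chosen to be consistent with the feasible points quoted for (LP1); everything else follows the recipe directly.

```python
import numpy as np

# One iteration of primal affine scaling on (LP1).  The constraint
# matrix A is an assumption consistent with the feasible points
# x* = (1, 2, 0, 0) and x^k = (0.8, 0.1, 0.2, 1.9) quoted in the text.
A = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0]])
b = np.array([1.0, 2.0])
c = np.array([-1.0, -1.0, 1.0, 1.0])
xk = np.array([0.8, 0.1, 0.2, 1.9])

Dk = np.diag(xk)                     # scaling matrix D^k
Ak = A @ Dk                          # A^k = A D^k
ck = Dk @ c                          # c^k = D^k c

wk = Ak @ ck                         # w^k = A^k c^k
vk = np.linalg.solve(Ak @ Ak.T, wk)  # solve (A^k A^kT) v^k = w^k
gk = Ak.T @ vk                       # g^k = A^kT v^k
dk = -ck + gk                        # d^k = -P_{A^k} c^k

beta = 0.9 * np.min(-1.0 / dk[dk < 0])  # step length, equation (3)
z_new = np.ones(4) + beta * dk
x_next = Dk @ z_new                  # map back to a feasible point of (P)

print(x_next)                        # new strictly positive feasible point
print(c @ xk, c @ x_next)            # objective value decreases
```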

The algorithm does decrease the objective function at each iteration. This is because a projection matrix M is idempotent, that is, it is symmetric and MM = M, so c^{T}Mc = c^{T}M^{T}Mc = ‖Mc‖^{2} ≥ 0 for any vector c. We then get that the objective function value of z^{new} is

c^{k}^{T}z^{new} = c^{k}^{T}(e + βd^{k}) = c^{k}^{T}e - βc^{k}^{T}P_{A^{k}}c^{k} = c^{k}^{T}e - β‖d^{k}‖^{2} ≤ c^{k}^{T}e,

with strict decrease whenever d^{k} ≠ 0.
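A quick numerical check of this argument, again assuming the (LP1) constraint matrix used in the earlier example: the projection matrix is symmetric and idempotent, and the objective drop equals β‖d^{k}‖^{2}.

```python
import numpy as np

# Check that P_{A^k} is symmetric and idempotent, and that the objective
# decrease equals beta * ||d^k||^2.  The constraint matrix A is an
# assumption consistent with the feasible points of (LP1).
A = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0]])
c = np.array([-1.0, -1.0, 1.0, 1.0])
xk = np.array([0.8, 0.1, 0.2, 1.9])

Dk = np.diag(xk)
Ak = A @ Dk
ck = Dk @ c

P = np.eye(4) - Ak.T @ np.linalg.solve(Ak @ Ak.T, Ak)  # P_{A^k}
dk = -P @ ck

print(np.allclose(P @ P, P))   # idempotent
print(np.allclose(P, P.T))     # symmetric

beta = 0.9 * np.min(-1.0 / dk[dk < 0])
z_new = np.ones(4) + beta * dk
# objective value drops by exactly beta * ||d^k||^2
print(np.isclose(ck @ z_new, ck @ np.ones(4) - beta * (dk @ dk)))
```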
It can be shown that

- d^{k} = 0 if and only if x^{k} is a vertex.
- c - A^{T}v^{k} ≥ 0 at the optimal vertex. The optimal vertex is the only one where we have c - A^{T}v^{k} ≥ 0.

A possible dual iterate is

y^{k} := v^{k}.   (4)

Notice that the dual slacks are then c - A^{T}v^{k}, and that d^{k} = -D^{k}(c - A^{T}v^{k}). Thus, the elements of d^{k} give the product between the primal variable x_{i}^{k} and the corresponding dual slack, d_{i}^{k} = -x_{i}^{k}(c - A^{T}v^{k})_{i}, so they
give a measure of the complementary slackness. At optimality, we will have dual feasibility with
this choice. It can be shown that this choice is actually dual optimal if x^{k} is optimal for the primal
problem.
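The identity d^{k} = -D^{k}(c - A^{T}v^{k}) can be checked numerically; as before, the (LP1) constraint matrix is an assumption consistent with the feasible points in the text.

```python
import numpy as np

# Verify d^k = -D^k (c - A^T v^k), linking the affine scaling direction
# to the dual slacks.  The constraint matrix A is an assumption
# consistent with the feasible points of (LP1).
A = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0]])
c = np.array([-1.0, -1.0, 1.0, 1.0])
xk = np.array([0.8, 0.1, 0.2, 1.9])

Dk = np.diag(xk)
Ak = A @ Dk
ck = Dk @ c

vk = np.linalg.solve(Ak @ Ak.T, Ak @ ck)  # dual iterate v^k
dk = -ck + Ak.T @ vk                      # direction in the scaled problem
slacks = c - A.T @ vk                     # dual slacks for v^k

print(np.allclose(dk, -Dk @ slacks))      # d_i^k = -x_i^k * slack_i
```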

We stop the simplex algorithm when all the costs in the tableau are nonnegative. Unfortunately, we don’t have such a simple stopping rule for the primal affine scaling algorithm. However, there are some possibilities:

- Is c - A^{T}v^{k} ≥ -ϵe for some ϵ such as 10^{-6}? (This corresponds to almost achieving dual feasibility.)
- Is the change in the objective function from one iteration to the next smaller than some tolerance? That is, do we have c^{T}x^{k} - c^{T}x^{k+1} ≤ ϵ, or alternatively, (c^{T}x^{k} - c^{T}x^{k+1})∕|c^{T}x^{k}| ≤ ϵ? This criterion indicates that we are not making much progress, so we must be very close to the optimal solution.
- Is d^{k} a small vector? This indicates that we are close to achieving complementary slackness, since d_{i}^{k} is the product of a primal variable and the corresponding dual slack.

The algorithm still sometimes makes very slow progress when it gets close to a corner of the feasible region. One possible remedy is to try explicitly to center the iterates. We do this by moving in the direction

d = P_{A^{k}}e,  the projection of e onto the nullspace of A^{k},

in the scaled problem (P^{k}). Notice that this direction is trying to increase all components
simultaneously. An algorithm will usually do some steps using the regular d^{k} we found earlier and
then mix in some centering steps.
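As a sketch (with the assumed (LP1) constraint matrix), the centering direction can be computed as the projection of e onto the nullspace of A^{k}:

```python
import numpy as np

# Centering direction in the scaled problem: project e onto the
# nullspace of A^k.  The constraint matrix A is an assumption
# consistent with the feasible points of (LP1).
A = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0]])
xk = np.array([0.8, 0.1, 0.2, 1.9])

Ak = A @ np.diag(xk)
e = np.ones(4)
v = np.linalg.solve(Ak @ Ak.T, Ak @ e)
d_center = e - Ak.T @ v    # P_{A^k} e

print(d_center)            # a direction satisfying A^k d = 0
```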

In the example we were looking at earlier, the centering direction is

P_{A^{k}}e ≈ (-0.176, 0.945, 0.706,-0.050).
Current commercial implementations combine affine steps with centering and duality in predictor-corrector methods. In addition to excellent computational performance, variants of these methods converge in a number of iterations that is polynomial in the size of the problem.

Serious research on interior point methods started with the publication of Karmarkar’s paper [3]. Karmarkar made some very bold claims for the performance of his algorithm, which were somewhat borne out by subsequent results. The primal affine scaling method is a simplification of Karmarkar’s original algorithm that was proposed by several researchers in 1986, including Vanderbei, Meketon and Freedman [8]. This method was in fact discovered by the Russian Dikin [2] in 1967, although this discovery remained unknown in the west until much later. A good (mathematical) description of the algorithm can be found in [7].

Textbooks on interior point methods include [1, 4, 5, 6, 9].

Current computational results indicate that interior point methods outperform the simplex algorithm on large problems.

It seems that an interior point method will solve almost any problem in at most 40 iterations. The number of iterations required grows very slowly as the size of the problem increases. By contrast, the simplex algorithm seems to need approximately 1.5m iterations to solve a problem with m constraints. The work required at each iteration of an interior point method is far larger than the amount of work needed to compute a simplex pivot, so there is a trade-off: an interior point algorithm needs far fewer iterations, but it takes considerably more time per iteration, when compared to the simplex algorithm.

[1] S. P. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, Cambridge, UK, 2004.

[2] I. I. Dikin. Iterative solution of problems of linear and quadratic programming. Doklady Akademiia Nauk SSSR, 174:747–748, 1967. English Translation: Soviet Mathematics Doklady, 1967, Volume 8, pp. 674–675.

[3] N. K. Karmarkar. A new polynomial-time algorithm for linear programming. Combinatorica, 4:373–395, 1984.

[4] D. G. Luenberger and Y. Ye. Linear and Nonlinear Programming. Springer, 3rd edition, 2008.

[5] C. Roos, T. Terlaky, and J.-Ph. Vial. Interior Point Methods for Linear Optimization. Springer, 2nd edition, 2005.

[6] R. J. Vanderbei. Linear Programming: Foundations and Extensions. Springer, 3rd edition, 2008.

[7] R. J. Vanderbei and J. C. Lagarias. I. I. Dikin’s convergence result for the affine–scaling algorithm. In J. C. Lagarias and M. J. Todd, editors, Mathematical Developments Arising from Linear Programming: Proceedings of a Joint Summer Research Conference held at Bowdoin College, Brunswick, Maine, USA, June/July 1988, volume 114 of Contemporary Mathematics, pages 109–119. American Mathematical Society, Providence, Rhode Island, USA, 1990.

[8] R. J. Vanderbei, M. S. Meketon, and B. A. Freedman. A modification of Karmarkar’s linear programming algorithm. Algorithmica, 1:395–407, 1986.

[9] S. J. Wright. Primal-dual interior point methods. SIAM, Philadelphia, 1996.