Interior Point Methods

John Mitchell
November 2010

### 1 Introduction

Our standard linear programming problem is

$$
\begin{array}{lrl}
\min & c^T x \\
\text{subject to} & Ax &= b \qquad\qquad (P) \\
& x &\ge 0.
\end{array}
$$
Here, c and x are n vectors, b is an m vector, and A is an m × n matrix. The simplex algorithm moves from basic feasible solution to basic feasible solution. The basic feasible solutions are extreme points of the feasible region for (P). Furthermore, the simplex algorithm moves from one extreme point along an edge of the feasible region to another extreme point. If the feasible region is very big with many extreme points, the simplex algorithm may take a long time before it finds the optimal extreme point. Thus, we may try to use an algorithm which cuts across the middle of the feasible region. Such a method is called an interior point method. There are many different interior point algorithms; we will just consider one: the primal affine scaling algorithm.

Notation: We will let $e = (1, 1, \ldots, 1)^T$ denote the vector of all ones.

### 2 The primal affine scaling algorithm

The boundaries of the feasible region are given by the nonnegativity constraints $x \ge 0$. Therefore, a point is in the interior of the feasible region if it satisfies

$$Ax = b, \qquad x > 0.$$

Assume we know a strictly positive point $x^k$ which is feasible for the problem (P). The best direction to move may appear to be the direction $-c$, because this will result in the largest decrease in the objective function value for the smallest change in $x$. There are two problems with this direction:

• It may result in a point which no longer satisfies Ax = b. We will return to this issue later when we discuss projections.
• We may only be able to take a very small step before we violate the nonnegativity constraint.

#### 2.1 Rescaling so that we can take long steps

Consider a problem (LP1) of the form (P) with four variables and objective vector $c = (-1, -1, 1, 1)^T$. This problem has optimal solution $x^* = (1, 2, 0, 0)^T$ with value $-3$. The point $x^k = (0.8, 0.1, 0.2, 1.9)^T$ is feasible in this problem, with objective function value $1.2$. We want to look at points of the form

$$x = x^k - \beta c$$

for some steplength $\beta$, because the objective function value of such a point is

$$c^T x = c^T x^k - \beta\, c^T c,$$

and $c^T c$ is the dot product of the vector $c$ with itself, that is, just the square of the length of the vector $c$. Now,

$$x = x^k - \beta c = (0.8 + \beta,\; 0.1 + \beta,\; 0.2 - \beta,\; 1.9 - \beta)^T.$$

Thus, if we want $x \ge 0$, the largest possible value of $\beta$ is $\beta = 0.2$, and then we get $x = (1, 0.3, 0, 1.7)^T$ with value $0.4$, so we have not moved very far towards the optimal point.
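This calculation is easy to check numerically. A short sketch in plain Python, using only the $c$ and $x^k$ given above, finds the largest step along $-c$ that keeps $x$ nonnegative:

```python
# Move from x^k along -c and stop when a component would become negative.
c = [-1.0, -1.0, 1.0, 1.0]
xk = [0.8, 0.1, 0.2, 1.9]

# x_i = xk_i - beta * c_i decreases only where c_i > 0, so the largest
# feasible steplength is the smallest ratio xk_i / c_i over those i.
beta = min(xi / ci for xi, ci in zip(xk, c) if ci > 0)

x = [xi - beta * ci for xi, ci in zip(xk, c)]
value = sum(ci * xi for ci, xi in zip(c, x))
print(beta, x, value)
```

This recovers the steplength $\beta = 0.2$, the point $(1, 0.3, 0, 1.7)$, and the objective value $0.4$ reported above.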

To get around this difficulty, we rescale the problem so that we can take a step of length at least one in any direction. Thus, we introduce a diagonal matrix whose diagonal entries are the components of $x^k$:

$$D_k := \mathrm{diag}(x^k_1, \ldots, x^k_n).$$

We then scale the constraint matrix and the objective function, getting $A_k := AD_k$ and $c_k := D_k c$. For the problem (LP1), with $x^k$ as given above, we get

$$D_k = \mathrm{diag}(0.8,\, 0.1,\, 0.2,\, 1.9)$$

and

$$c_k = D_k c = (-0.8,\, -0.1,\, 0.2,\, 1.9)^T.$$

We then get a rescaled problem

$$
\begin{array}{lrl}
\min & c_k^T z \\
\text{subject to} & A_k z &= b \qquad\qquad (P_k) \\
& z &\ge 0.
\end{array}
$$

The rescaled version of (LP1) is denoted (LP1k). Notice that $z = (1, 1, 1, 1)^T$ is feasible in (LP1k), with objective function value $1.2$. Because all of the components of $z$ are at least one, we can take a reasonable step in any direction.

#### 2.2 The relationship between (P) and (Pk)

We list several properties of (Pk). It may help you to understand these properties if you see how they apply to the problem (LP1).

• Since $D_k$ is a diagonal matrix, it is equal to its transpose: $D_k^T = D_k$.
• If $\bar{x}$ is feasible in (P) then $\bar{z} := D_k^{-1}\bar{x}$ is feasible in (Pk). Further, $c^T\bar{x} = c_k^T\bar{z}$.
Check: $A_k\bar{z} = AD_kD_k^{-1}\bar{x} = A\bar{x} = b$.
Also: $c_k^T\bar{z} = (D_kc)^T D_k^{-1}\bar{x} = c^T D_kD_k^{-1}\bar{x} = c^T\bar{x}$.
• If $\bar{z}$ is feasible in (Pk) then $\bar{x} := D_k\bar{z}$ is feasible in (P). Further, $c^T\bar{x} = c_k^T\bar{z}$.
Check: $A\bar{x} = AD_k\bar{z} = A_k\bar{z} = b$.
Also: $c^T\bar{x} = c^TD_k\bar{z} = (D_kc)^T\bar{z} = c_k^T\bar{z}$.
• The point $z = e = (1, \ldots, 1)^T$ is feasible in (Pk). It corresponds to the point $x^k$ in (P). Note that $D_ke = x^k$ and $D_k^{-1}x^k = e$.
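These correspondences are easy to verify numerically. A small sketch in plain Python, with $c$ and $x^k$ from the example; since $D_k$ is diagonal, multiplying by $D_k$ or $D_k^{-1}$ is done componentwise:

```python
c = [-1.0, -1.0, 1.0, 1.0]
xk = [0.8, 0.1, 0.2, 1.9]

# D_k has the entries of x^k on its diagonal, so D_k w is a componentwise
# product and D_k^{-1} w a componentwise quotient.
ck = [xi * ci for xi, ci in zip(xk, c)]        # c_k = D_k c

# The point z = e corresponds to x^k and has the same objective value.
z = [xi / xi for xi in xk]                     # D_k^{-1} x^k = e
val_scaled = sum(ci * zi for ci, zi in zip(ck, z))    # c_k^T z
val_original = sum(ci * xi for ci, xi in zip(c, xk))  # c^T x^k
print(val_scaled, val_original)
```

Both objective values come out to $1.2$, as claimed.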

#### 2.3 Maintaining Ax = b

Conceptually, an iteration of the algorithm has the form:

1. Given a point $x^k > 0$ which is feasible for (P), rescale to get a new problem (Pk). The point $z = e$ is feasible in the rescaled problem (Pk).
2. Update $z$ to a new point $z^{new} = e + \beta d$, where $d$ is a direction and $\beta$ is a step length. We pick $\beta$ to ensure that $z^{new} > 0$.
3. Scale back to get a feasible point in (P): $x^{k+1} = D_k z^{new}$.

We would like to take $d = -c_k$, in order to decrease the objective function value as quickly as possible. Unfortunately, this may lead to a point which violates the constraint $A_k z = b$. For example, in the problem (LP1k), taking $\beta = 0.5$ gives the point

$$z = e - 0.5\,c_k = (1.4,\; 1.05,\; 0.9,\; 0.05)^T,$$

which does not satisfy the constraints. We need to move in a direction $d$ which satisfies $A_k d = 0$. (A direction $d$ which satisfies $A_k d = 0$ is said to be in the nullspace of the matrix $A_k$.) We then get

$$A_k(e + \beta d) = A_k e + \beta A_k d = A_k e = b,$$

because $e$ is feasible in (Pk). Therefore, we project the direction $c_k$ onto the nullspace of $A_k$. Algebraically, this means that we take

$$d = -P_{A_k} c_k, \qquad (1)$$

where

$$P_{A_k} = I - A_k^T (A_k A_k^T)^{-1} A_k \qquad (2)$$
and I denotes the identity matrix. (Aside: We assume that the rows of A are linearly independent. Under this assumption, the projection matrix is well-defined. Note that we need this assumption to hold in order to be able to obtain a basic feasible solution.) For the problem (LP1), we have

so
and
So, in the problem (LP1k), we use the direction
This gives a new point of the form $z^{new} = e + \beta d$.
Taking β = 2 gives
and then we get a new iterate for (P):
with value -0.82.
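The projection step of equations (1) and (2) can be sketched in a few lines of NumPy. The constraint matrix `A` below is assumed, made-up data (it is not the (LP1) instance, whose constraints are not reproduced here), so the numbers it produces are for illustration only; the point is that the resulting step keeps $Ax = b$ satisfied:

```python
import numpy as np

c = np.array([-1.0, -1.0, 1.0, 1.0])
xk = np.array([0.8, 0.1, 0.2, 1.9])
A = np.array([[1.0, 0.0, 1.0, 0.0],      # assumed constraint matrix,
              [0.0, 1.0, 0.0, 1.0]])     # not the (LP1) data
b = A @ xk                               # chosen so that x^k is feasible

Dk = np.diag(xk)
Ak = A @ Dk
ck = Dk @ c

# Equation (2): projection onto the nullspace of A_k.
P = np.eye(4) - Ak.T @ np.linalg.solve(Ak @ Ak.T, Ak)
d = -P @ ck                              # equation (1)

z_new = np.ones(4) + 2.0 * d             # a step with beta = 2
x_new = Dk @ z_new                       # scale back to (P)
print(x_new, np.allclose(A @ x_new, b))
```

For any full-row-rank `A`, the direction satisfies $A_k d = 0$, so the new point remains feasible and the objective value decreases.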

#### 2.4 Choosing the steplength

We have $z^{new} = e + \beta d$, and we need to have $z^{new} > 0$. Thus, we need to select $\beta$ so that $1 + \beta d_i > 0$ for each component $i$. So, we need to pick

$$\beta < \frac{1}{-d_i} \quad \text{for each } i \text{ with } d_i < 0,$$

so we take

$$\beta = \frac{0.9}{\max_i \{-d_i : d_i < 0\}}. \qquad (3)$$

This will result in moving $0.9$ of the way to the boundary of the feasible region of the rescaled problem (Pk). Notice that one component of $z^{new}$ will be equal to $0.1$.

(This discussion on choosing the steplength should remind you of the minimum ratio rule in the simplex algorithm.)
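The rule (3) is a one-liner in code. A sketch in plain Python, using a made-up direction vector for illustration:

```python
def steplength(d, fraction=0.9):
    """Equation (3): go `fraction` of the way to the nearest bound.

    Assumes some d_i < 0; if d >= 0, every beta keeps z > 0 and the
    problem is unbounded along this direction."""
    return fraction / max(-di for di in d if di < 0)

d = [0.5, -0.3, -0.45, 0.2]          # made-up direction
beta = steplength(d)
z_new = [1.0 + beta * di for di in d]
print(beta, z_new)
```

Here the binding component is $d_3 = -0.45$, so $\beta = 0.9/0.45 = 2$ and the corresponding component of $z^{new}$ lands at $0.1$, as the text describes.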

#### 2.5 One iteration of the algorithm

Recall that when we looked at the revised simplex method, we found that it was not necessary to calculate the whole of the simplex tableau. In the same spirit of efficiency, it is not necessary to calculate the complete projection matrix $P_{A_k}$. In practice, we would perform one iteration as follows:

1. Start with the current iterate $x^k > 0$, feasible in (P).
2. Calculate $D_k$, $A_k = AD_k$, and $c_k = D_k c$.
3. Calculate $w^k = A_k c_k$.
4. Calculate $v^k = (A_k A_k^T)^{-1} w^k$. (Aside: In practice, we would not calculate the inverse explicitly; instead we would solve the system of equations $(A_k A_k^T) v^k = w^k$ to find $v^k$.)
5. Calculate $g^k = A_k^T v^k$.
6. Calculate the direction $d^k$ in the problem (Pk): $d^k = -c_k + g^k \;(= -P_{A_k} c_k)$.
7. Calculate the step length $\beta$ using equation (3).
8. Update $z^{new} = e + \beta d^k$.
9. Update $x^{k+1} = D_k z^{new}$.
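The nine steps above translate directly into NumPy. This is a sketch, not production code; the test problem at the bottom uses an assumed constraint matrix (not the (LP1) data):

```python
import numpy as np

def affine_scaling_step(A, c, xk, fraction=0.9):
    """One iteration of primal affine scaling (steps 1-9).

    Assumes A has full row rank and that d^k has a negative component."""
    Dk = np.diag(xk)                       # step 2
    Ak = A @ Dk
    ck = Dk @ c
    wk = Ak @ ck                           # step 3
    vk = np.linalg.solve(Ak @ Ak.T, wk)    # step 4: solve, don't invert
    gk = Ak.T @ vk                         # step 5
    dk = -ck + gk                          # step 6: equals -P_{A_k} c_k
    beta = fraction / np.max(-dk[dk < 0])  # step 7, equation (3)
    z_new = np.ones_like(xk) + beta * dk   # step 8
    return Dk @ z_new                      # step 9

# Assumed test data (x^k and c as in the text, A made up):
A = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0]])
c = np.array([-1.0, -1.0, 1.0, 1.0])
xk = np.array([0.8, 0.1, 0.2, 1.9])

x_next = affine_scaling_step(A, c, xk)
print(x_next, c @ x_next)
```

Whatever the data, the new iterate stays strictly positive, satisfies the same equality constraints, and has a lower objective value.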

#### 2.6 Monotonicity of the algorithm

The algorithm does decrease the objective function at each iteration. This is because the projection matrix $P_{A_k}$ is symmetric and idempotent, that is, $P_{A_k}^T = P_{A_k}$ and $P_{A_k}P_{A_k} = P_{A_k}$. We then get that the objective function value of $z^{new}$ is

$$c_k^T z^{new} = c_k^T(e + \beta d^k) = c_k^T e - \beta\, c_k^T P_{A_k} c_k = c_k^T e - \beta\, \|P_{A_k} c_k\|^2,$$

and this must be smaller than $c_k^T e$ whenever $d^k \ne 0$, since it subtracts off a positive multiple of the square of the length of the vector $P_{A_k} c_k$.
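The symmetry and idempotence of the projection matrix, and hence the sign of the subtracted term, can be checked numerically on random data (a Gaussian matrix, which has full row rank with probability one):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 7))      # random 3x7 matrix

P = np.eye(7) - A.T @ np.linalg.solve(A @ A.T, A)

sym = np.allclose(P, P.T)            # P is symmetric
idem = np.allclose(P @ P, P)         # P is idempotent

# Therefore c^T P c = (Pc)^T (Pc) = ||Pc||^2 >= 0 for any c:
cvec = rng.standard_normal(7)
decrease_ok = np.isclose(cvec @ P @ cvec, (P @ cvec) @ (P @ cvec))
print(sym, idem, decrease_ok)
```

All three checks come back true, confirming that $c^T P c$ is a squared length and so the objective can only decrease along the projected direction.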

#### 2.7 Getting a dual solution

It can be shown that

• $d^k = 0$ if and only if $x^k$ is a vertex.
• $c - A^T v^k \ge 0$ at the optimal vertex. The optimal vertex is the only one where we have $c - A^T v^k \ge 0$.

A possible dual iterate is

$$v^k = (A_k A_k^T)^{-1} A_k c_k. \qquad (4)$$

Notice that the dual slacks are then $c - A^T v^k$, and that $d^k = -D_k(c - A^T v^k)$. Thus, element $i$ of $d^k$ is the negative of the product of the primal variable $x^k_i$ and the corresponding dual slack, so the elements of $d^k$ give a measure of the complementary slackness. At optimality, we will have dual feasibility with this choice. It can be shown that this choice is actually dual optimal if $x^k$ is optimal for the primal problem.
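The identity $d^k = -D_k(c - A^T v^k)$ can be confirmed numerically (again using an assumed constraint matrix, with $c$ and $x^k$ as in the text):

```python
import numpy as np

A = np.array([[1.0, 0.0, 1.0, 0.0],       # assumed data
              [0.0, 1.0, 0.0, 1.0]])
c = np.array([-1.0, -1.0, 1.0, 1.0])
xk = np.array([0.8, 0.1, 0.2, 1.9])

Dk = np.diag(xk)
Ak = A @ Dk
ck = Dk @ c

vk = np.linalg.solve(Ak @ Ak.T, Ak @ ck)  # dual estimate, equation (4)
slacks = c - A.T @ vk                     # dual slacks
dk = -ck + Ak.T @ vk                      # primal direction, as in step 6

# Each entry of d^k is -(primal variable) * (dual slack):
print(np.allclose(dk, -xk * slacks))
```

The check prints true: small entries of $d^k$ mean small products $x^k_i (c - A^T v^k)_i$, i.e., near-complementary slackness.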

#### 2.8 Stopping the algorithm

We stop the simplex algorithm when all the costs in the tableau are nonnegative. Unfortunately, we don’t have such a simple stopping rule for the primal affine scaling algorithm. However, there are some possibilities:

• Is $c - A^T v^k \ge -\epsilon e$ for some small $\epsilon$ such as $10^{-6}$? (This corresponds to almost achieving dual feasibility.)
• Is the change in the objective function from one iteration to the next smaller than some tolerance? That is, do we have $c^T x^k - c^T x^{k+1} \le \epsilon$, or alternatively, $(c^T x^k - c^T x^{k+1})/|c^T x^k| \le \epsilon$? This criterion indicates that we are not making much progress, so we should be very close to the optimum.
• Is $d^k$ a small vector? This indicates that we are close to achieving complementary slackness. (See the previous subsection.)
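Putting the pieces together, the following sketch runs the full algorithm with the relative-objective-change stopping test (the LP data is assumed, chosen so that the linear systems stay well defined):

```python
import numpy as np

def affine_scaling(A, c, x0, tol=1e-6, max_iter=100, fraction=0.9):
    """Primal affine scaling with a relative-change stopping test."""
    x = x0.copy()
    for _ in range(max_iter):
        Dk = np.diag(x)
        Ak = A @ Dk
        ck = Dk @ c
        vk = np.linalg.solve(Ak @ Ak.T, Ak @ ck)
        dk = -ck + Ak.T @ vk
        if np.all(dk >= -1e-12):          # d^k ~ 0: no descent direction
            break
        beta = fraction / np.max(-dk[dk < 0])
        x_new = Dk @ (np.ones_like(x) + beta * dk)
        done = c @ x - c @ x_new <= tol * max(1.0, abs(c @ x))
        x = x_new
        if done:                          # little progress: stop
            break
    return x

A = np.array([[1.0, 0.0, 1.0, 0.0],       # assumed data
              [0.0, 1.0, 0.0, 1.0]])
c = np.array([-1.0, -1.0, 1.0, 1.0])
x0 = np.array([0.8, 0.1, 0.2, 1.9])

x_star = affine_scaling(A, c, x0)
print(x_star, c @ x_star)
```

For this assumed instance the iterates approach the vertex $(1, 2, 0, 0)$; a practical code would also monitor the dual feasibility test on $c - A^T v^k$.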

### 3 Doing more centering

The algorithm still sometimes makes very slow progress when it gets close to a corner of the feasible region. One possible remedy is to try explicitly to center the iterates. We do this by moving in the direction

$$d_c = P_{A_k} e$$

in the scaled problem (Pk), the projection of $e$ onto the nullspace of $A_k$. Notice that this direction is trying to increase all components simultaneously. An algorithm will usually do some steps using the regular direction $d^k$ we found earlier and then mix in some centering steps.

In the example we were looking at earlier, the centering direction is

Notice that this has the effect of increasing x2 and decreasing x1, which does seem to move us towards the middle of the feasible region.
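A centering direction of this kind, the projection of $e$ onto the nullspace of $A_k$, can be computed the same way as before. The constraint matrix here is assumed data; with $x^k$ as in the text it happens to reproduce the behavior described above:

```python
import numpy as np

A = np.array([[1.0, 0.0, 1.0, 0.0],       # assumed data
              [0.0, 1.0, 0.0, 1.0]])
xk = np.array([0.8, 0.1, 0.2, 1.9])

Ak = A @ np.diag(xk)
e = np.ones(4)

# Project e onto the nullspace of A_k so that A_k z = b is maintained.
dc = e - Ak.T @ np.linalg.solve(Ak @ Ak.T, Ak @ e)
print(dc)
```

For this data the direction has $d_{c,1} < 0 < d_{c,2}$: it decreases the first component and increases the second, pushing the iterate towards the middle of the feasible region.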

Current commercial implementations combine affine steps with centering steps and dual information in predictor-corrector methods. In addition to excellent computational performance, variants of these methods converge in a number of iterations that is polynomial in the size of the problem.

### 4 Historical notes

Serious research on interior point methods started with the publication of Karmarkar’s paper [3]. Karmarkar made some very bold claims for the performance of his algorithm, which were somewhat borne out by subsequent results. The primal affine scaling method is a simplification of Karmarkar’s original algorithm that was proposed by several researchers in 1986, including Vanderbei, Meketon and Freedman [8]. This method was in fact discovered by the Russian Dikin [2] in 1967, although this discovery remained unknown in the west until much later. A good (mathematical) description of the algorithm can be found in [7].

Textbooks on interior point methods include [1, 4, 5, 6, 9].

Current computational results indicate that interior point methods outperform the simplex algorithm on large problems.

It seems that an interior point method will solve almost any problem in at most 40 iterations. The number of iterations required grows very slowly as the size of the problem increases. By contrast, the simplex algorithm seems to need approximately 1.5m iterations to solve a problem with m constraints. The work required at each iteration of an interior point method is far larger than the amount of work needed to compute a simplex pivot, so there is a trade-off: an interior point algorithm needs far fewer iterations, but it takes considerably more time per iteration, when compared to the simplex algorithm.

### References

[1]   S. P. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, Cambridge, UK, 2004.

[2]   I. I. Dikin. Iterative solution of problems of linear and quadratic programming. Doklady Akademiia Nauk SSSR, 174:747–748, 1967. English Translation: Soviet Mathematics Doklady, 1967, Volume 8, pp. 674–675.

[3]   N. K. Karmarkar. A new polynomial-time algorithm for linear programming. Combinatorica, 4:373–395, 1984.

[4]   D. G. Luenberger and Y. Ye. Linear and Nonlinear Programming. Springer, 3rd edition, 2008.

[5]   C. Roos, T. Terlaky, and J.-Ph. Vial. Interior Point Methods for Linear Optimization. Springer, 2nd edition, 2005.

[6]   R. J. Vanderbei. Linear Programming: Foundations and Extensions. Springer, 3rd edition, 2008.

[7]   R. J. Vanderbei and J. C. Lagarias. I. I. Dikin’s convergence result for the affine–scaling algorithm. In J. C. Lagarias and M. J. Todd, editors, Mathematical Developments Arising from Linear Programming: Proceedings of a Joint Summer Research Conference held at Bowdoin College, Brunswick, Maine, USA, June/July 1988, volume 114 of Contemporary Mathematics, pages 109–119. American Mathematical Society, Providence, Rhode Island, USA, 1990.

[8]   R. J. Vanderbei, M. S. Meketon, and B. A. Freedman. A modification of Karmarkar’s linear programming algorithm. Algorithmica, 1:395–407, 1986.

[9]   S. J. Wright. Primal-dual interior point methods. SIAM, Philadelphia, 1996.