Interior Point Methods

John Mitchell
November 2010

1 Introduction

Our standard linear programming problem is

$$
\begin{array}{lrcl}
\min & c^T x & & \\
\text{subject to} & Ax & = & b \qquad (P) \\
 & x & \geq & 0
\end{array}
$$

Here, c and x are n vectors, b is an m vector, and A is an m × n matrix. The simplex algorithm moves from basic feasible solution to basic feasible solution. The basic feasible solutions are extreme points of the feasible region for (P). Furthermore, the simplex algorithm moves from one extreme point along an edge of the feasible region to another extreme point. If the feasible region is very big with many extreme points, the simplex algorithm may take a long time before it finds the optimal extreme point. Thus, we may try to use an algorithm which cuts across the middle of the feasible region. Such a method is called an interior point method. There are many different interior point algorithms; we will just consider one: the primal affine scaling algorithm.

Notation: We will let e = [1, ..., 1]^T denote the vector of all ones.

2 The primal affine scaling algorithm

The boundaries of the feasible region are given by the nonnegativity constraints x ≥ 0. Therefore, a point is in the interior of the feasible region if it satisfies

Ax =  b and x > 0.

Assume we know a strictly positive point xk which is feasible for the problem (P). The best direction to move may appear to be the direction -c, because this will result in the largest decrease in the objective function value for the smallest change in x. There are two problems with this direction: the current point may be so close to the boundary x ≥ 0 that we can only take a very small step before some component of x becomes negative, and moving in the direction -c will in general violate the equality constraints Ax = b. Section 2.1 deals with the first problem by rescaling, and Section 2.3 deals with the second by projection.

2.1 Rescaling so that we can take long steps

Consider the problem

$$
\begin{array}{lll}
\min & -x_1 - x_2 + x_3 + x_4 & \\
\text{subject to} & x_1 + x_3 = 1 & \qquad (LP1) \\
 & x_2 + x_4 = 2 & \\
 & x_i \geq 0, \quad i = 1, \ldots, 4 &
\end{array}
$$

so here c = [-1,-1, 1, 1]T . This problem has optimal solution x* = (1, 2, 0, 0) with value -3. The point xk = (0.8, 0.1, 0.2, 1.9) is feasible in this problem, with objective function value 1.2. We want to look at points of the form

x = xk - βc

for some steplength β, because the objective function value of such a point is

$$
c^T x = c^T x^k - \beta\, c^T c < c^T x^k,
$$

because cT c is the dot product between the vector c and itself, that is, it is just the square of the length of the vector c. Now,

$$
\begin{array}{rcl}
x & = & x^k - \beta c \\
 & = & (0.8, 0.1, 0.2, 1.9) - \beta\,(-1, -1, 1, 1) \\
 & = & (0.8 + \beta,\; 0.1 + \beta,\; 0.2 - \beta,\; 1.9 - \beta).
\end{array}
$$
Thus, if we want x ≥ 0, the largest possible value of β is 0.2, and then we get x = (1, 0.3, 0, 1.7) with value 0.4, so we have not moved very far towards the optimal point.
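As a quick check of this calculation, here is a minimal numpy sketch (the variable names are ours, not from the notes) that computes the largest step we can take from xk in the direction -c before some component of x becomes negative:

```python
import numpy as np

c = np.array([-1.0, -1.0, 1.0, 1.0])
xk = np.array([0.8, 0.1, 0.2, 1.9])

d = -c                                   # proposed (unprojected) direction
neg = d < 0                              # components that shrink as we move
beta_max = np.min(xk[neg] / -d[neg])     # largest beta keeping x >= 0

x_new = xk + beta_max * d
print(beta_max)                 # 0.2
print(x_new)                    # [1.  0.3 0.  1.7]
print(round(c @ x_new, 2))      # 0.4
```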

To get around this difficulty, we rescale the problem so that we can take a step of length at least one in any direction. Thus, we introduce a diagonal matrix

$$
D^k = \begin{bmatrix}
x^k_1 & 0 & \cdots & 0 \\
0 & x^k_2 & & \vdots \\
\vdots & & \ddots & 0 \\
0 & \cdots & 0 & x^k_n
\end{bmatrix}.
$$

We then scale the constraint matrix and the objective function, getting Ak := ADk and ck := Dkc. For the problem (LP1), with xk as given above, we get

$$
D^k = \begin{bmatrix}
0.8 & 0 & 0 & 0 \\
0 & 0.1 & 0 & 0 \\
0 & 0 & 0.2 & 0 \\
0 & 0 & 0 & 1.9
\end{bmatrix}
$$

and

$$
A^k = A D^k = \begin{bmatrix}
0.8 & 0 & 0.2 & 0 \\
0 & 0.1 & 0 & 1.9
\end{bmatrix},
\qquad
c^k = D^k c = \begin{bmatrix}
-0.8 \\ -0.1 \\ 0.2 \\ 1.9
\end{bmatrix}.
$$

We then get a rescaled problem

$$
\begin{array}{lrcl}
\min & c^{kT} z & & \\
\text{subject to} & A^k z & = & b \qquad (P^k) \\
 & z & \geq & 0
\end{array}
$$

The rescaled version of (LP1) is

$$
\begin{array}{lll}
\min & -0.8 z_1 - 0.1 z_2 + 0.2 z_3 + 1.9 z_4 & \\
\text{subject to} & 0.8 z_1 + 0.2 z_3 = 1 & \qquad (LP1^k) \\
 & 0.1 z_2 + 1.9 z_4 = 2 & \\
 & z_i \geq 0, \quad i = 1, \ldots, 4 &
\end{array}
$$

Notice that z = (1, 1, 1, 1) is feasible in (LP1k), with objective function value 1.2. Because all of the components of z are at least one, we can take a reasonable step in any direction.
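The rescaling itself is easy to carry out numerically. Here is a minimal numpy sketch (our own variable names) that builds Dk, Ak, and ck for (LP1) at the point xk above and checks that z = e is feasible for the rescaled problem:

```python
import numpy as np

A = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0]])
b = np.array([1.0, 2.0])
c = np.array([-1.0, -1.0, 1.0, 1.0])
xk = np.array([0.8, 0.1, 0.2, 1.9])

Dk = np.diag(xk)          # diagonal scaling matrix D^k
Ak = A @ Dk               # A^k = A D^k
ck = Dk @ c               # c^k = D^k c

e = np.ones(len(xk))
print(Ak @ e)                   # [1. 2.] = b, so z = e is feasible in (P^k)
print(round(ck @ e, 2))         # 1.2, the objective value of x^k in (P)
```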

2.2 The relationship between (P) and (Pk)

We list several properties of (Pk). It may help you to understand these properties if you see how they apply to the problem (LP1).

  1. A point x is feasible for (P) if and only if the corresponding point z = (Dk)-1x is feasible for (Pk), since Akz = ADk(Dk)-1x = Ax, and z ≥ 0 exactly when x ≥ 0.
  2. The objective function values agree: ckTz = cTDkz = cTx. In particular, the two problems have the same optimal value, and an optimal solution of one gives an optimal solution of the other.
  3. The current iterate xk corresponds to z = e, which is therefore feasible for (Pk) with the same objective function value as xk.

2.3 Maintaining Ax = b

Conceptually, an iteration of the algorithm has the form:

  1. Given a point xk > 0 which is feasible for (P), rescale to get new problem (Pk). The point z = e is feasible in the rescaled problem (Pk).
  2. Update z to a new point: znew = e + βd, where d is a direction and β is a step length. We pick β to ensure that znew > 0.
  3. Scale back to get a feasible point in (P): xk+1 = Dkznew.

We would like to take d = -ck, in order to decrease the objective function value as quickly as possible. Unfortunately, this may lead to a point which violates the constraint Akz = b. For example, in the problem (LP1k), taking β = 0.5 gives the point

$$
\begin{array}{rcl}
z & = & e - 0.5\, c^k \\
 & = & (1, 1, 1, 1) - 0.5\,(-0.8, -0.1, 0.2, 1.9) \\
 & = & (1.4, 1.05, 0.9, 0.05),
\end{array}
$$
which does not satisfy the constraints. We need to move in a direction d which satisfies Akd = 0. (A direction d which satisfies Akd = 0 is said to be in the nullspace of the matrix Ak.) We then get
$$
A^k(e + \beta d) = A^k e + \beta A^k d = A^k e = b,
$$

because e is feasible in (Pk). Therefore, we project the direction ck onto the nullspace of Ak. Algebraically, this means that we take

$$
d = -P_{A^k} c^k, \qquad (1)
$$

where

$$
P_{A^k} = I - A^{kT}\left(A^k A^{kT}\right)^{-1} A^k, \qquad (2)
$$
and I denotes the identity matrix. (Aside: We assume that the rows of A are linearly independent. Under this assumption, the projection matrix is well-defined. Note that we need this assumption to hold in order to be able to obtain a basic feasible solution.) For the problem (LP1), we have

$$
A^k A^{kT} = \begin{bmatrix} 0.8 & 0 & 0.2 & 0 \\ 0 & 0.1 & 0 & 1.9 \end{bmatrix}
\begin{bmatrix} 0.8 & 0 \\ 0 & 0.1 \\ 0.2 & 0 \\ 0 & 1.9 \end{bmatrix}
= \begin{bmatrix} 0.68 & 0 \\ 0 & 3.62 \end{bmatrix},
$$

so

$$
\left(A^k A^{kT}\right)^{-1} = \begin{bmatrix} 1.4706 & 0 \\ 0 & 0.2762 \end{bmatrix}
$$

and

$$
\begin{aligned}
P_{A^k} & = I - \begin{bmatrix} 0.8 & 0 \\ 0 & 0.1 \\ 0.2 & 0 \\ 0 & 1.9 \end{bmatrix}
\begin{bmatrix} 1.4706 & 0 \\ 0 & 0.2762 \end{bmatrix}
\begin{bmatrix} 0.8 & 0 & 0.2 & 0 \\ 0 & 0.1 & 0 & 1.9 \end{bmatrix} \\
& = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
- \begin{bmatrix} 0.941 & 0 & 0.235 & 0 \\ 0 & 0.003 & 0 & 0.052 \\ 0.235 & 0 & 0.059 & 0 \\ 0 & 0.052 & 0 & 0.997 \end{bmatrix} \\
& = \begin{bmatrix} 0.059 & 0 & -0.235 & 0 \\ 0 & 0.997 & 0 & -0.052 \\ -0.235 & 0 & 0.941 & 0 \\ 0 & -0.052 & 0 & 0.003 \end{bmatrix}.
\end{aligned}
$$
So, in the problem (LP1k), we use the direction
$$
\begin{aligned}
d & = -P_{A^k} c^k \\
 & = -\begin{bmatrix} 0.059 & 0 & -0.235 & 0 \\ 0 & 0.997 & 0 & -0.052 \\ -0.235 & 0 & 0.941 & 0 \\ 0 & -0.052 & 0 & 0.003 \end{bmatrix}
\begin{bmatrix} -0.8 \\ -0.1 \\ 0.2 \\ 1.9 \end{bmatrix}
= \begin{bmatrix} 0.0941 \\ 0.1994 \\ -0.3765 \\ -0.0105 \end{bmatrix},
\end{aligned}
$$

where the entries of d are computed using the exact projection matrix rather than the rounded entries displayed above.
This gives a new point of the form
$$
\begin{array}{rcl}
z^{new} & = & e + \beta d \\
 & = & (1, 1, 1, 1) + \beta\,(0.0941,\; 0.1994,\; -0.3765,\; -0.0105).
\end{array}
$$

Taking β = 2 gives

$$
z^{new} = (1.1882,\; 1.3989,\; 0.2471,\; 0.9790),
$$

and then we get a new iterate for (P):

$$
x^{k+1} = D^k z^{new} = (0.95,\; 0.14,\; 0.05,\; 1.86),
$$

with value 0.82, down from the value 1.2 of xk.
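The arithmetic above is easy to mis-copy, so here is a short numpy sketch (our own variable names, not from the notes) that recomputes the projection, the direction d, and the new iterate for (LP1) with β = 2:

```python
import numpy as np

A = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0]])
c = np.array([-1.0, -1.0, 1.0, 1.0])
xk = np.array([0.8, 0.1, 0.2, 1.9])

Dk = np.diag(xk)
Ak = A @ Dk
ck = Dk @ c

# Projection onto the nullspace of A^k: P = I - A^kT (A^k A^kT)^{-1} A^k
P = np.eye(4) - Ak.T @ np.linalg.solve(Ak @ Ak.T, Ak)

d = -P @ ck
z_new = np.ones(4) + 2.0 * d          # step length beta = 2
x_next = Dk @ z_new

print(np.round(d, 4))         # [ 0.0941  0.1994 -0.3765 -0.0105]
print(np.round(z_new, 4))     # [1.1882 1.3989 0.2471 0.979 ]
print(np.round(x_next, 2))    # [0.95 0.14 0.05 1.86]
print(round(c @ x_next, 2))   # 0.82
```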

2.4 Choosing the steplength

We have znew = e + βd, and we need to have znew > 0. Thus, we need to select β so that 1 + βdi > 0 for each component i. So, we need to pick

$$
\beta \leq 1/(-d_i) \quad \text{if } d_i < 0,
$$

so we take

$$
\beta = 0.9 / \max\{-d_i : d_i < 0\}. \qquad (3)
$$

This will result in moving 0.9 of the way to the boundary of the feasible region of the rescaled problem (Pk). Notice that one component of znew will be equal to 0.1.

(This discussion on choosing the steplength should remind you of the minimum ratio rule in the simplex algorithm.)
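A direct implementation of rule (3) might look like the following sketch (a minimal illustration assuming, as the rule does, that d has at least one negative component; the function name is ours):

```python
import numpy as np

def step_length(d, fraction=0.9):
    """Step length rule (3): move `fraction` of the way to the boundary
    z >= 0, starting from z = e and moving along d."""
    neg = d < 0
    if not np.any(neg):
        # No component decreases: we can move forever, which (for d != 0)
        # indicates an unbounded problem.
        return np.inf
    return fraction / np.max(-d[neg])

d = np.array([0.0941, 0.1994, -0.3765, -0.0105])
beta = step_length(d)
print(round(beta, 2))              # 2.39
print(np.round(1 + beta * d, 3))   # the most negative component ends up at 0.1
```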

2.5 One iteration of the algorithm

Recall that when we looked at the revised simplex method, we found out that it was not necessary to calculate the whole of the simplex tableau. In the same spirit of efficiency, it is not necessary to calculate the complete projection matrix PAk. In practice, we would perform one iteration as follows:

  1. Given current iterate xk > 0 feasible in (P).
  2. Calculate Dk, Ak = ADk, ck = Dkc.
  3. Calculate wk = Akck.
  4. Calculate vk = (AkAkT )-1wk. (Aside: In practice, we would not calculate the inverse explicitly, but we would solve the system of equations (AkAkT )vk = wk in order to find vk.)
  5. Calculate gk = AkT vk.
  6. Calculate the direction dk in the problem (Pk): dk = -ck + gk(= -P Akck).
  7. Calculate step length β using equation (3).
  8. Update znew = e + βdk.
  9. Update xk+1 = Dkznew.
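Putting the nine steps together, one iteration of the primal affine scaling algorithm could be sketched as follows. This is our own illustrative implementation (with a guard added for the unbounded case), not code from the notes:

```python
import numpy as np

def affine_scaling_step(A, c, xk, fraction=0.9):
    """One iteration of the primal affine scaling algorithm.

    A: m x n constraint matrix (full row rank), c: cost vector,
    xk: current strictly positive feasible point.  Returns x^{k+1}.
    """
    Dk = np.diag(xk)                        # step 2: rescale
    Ak = A @ Dk
    ck = Dk @ c

    wk = Ak @ ck                            # step 3
    vk = np.linalg.solve(Ak @ Ak.T, wk)     # step 4: solve (A^k A^kT) v = w
    gk = Ak.T @ vk                          # step 5
    dk = -ck + gk                           # step 6: projected direction

    neg = dk < 0
    if not np.any(neg):
        raise ValueError("direction has no negative component (problem unbounded?)")
    beta = fraction / np.max(-dk[neg])      # step 7: rule (3)

    z_new = np.ones_like(xk) + beta * dk    # step 8
    return Dk @ z_new                       # step 9: scale back

# A few iterations on (LP1), starting from x^k = (0.8, 0.1, 0.2, 1.9):
A = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0]])
c = np.array([-1.0, -1.0, 1.0, 1.0])
x = np.array([0.8, 0.1, 0.2, 1.9])
for _ in range(5):
    x = affine_scaling_step(A, c, x)
    print(np.round(x, 4), round(c @ x, 4))  # objective decreases towards -3
```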

2.6 Monotonicity of the algorithm

The algorithm does decrease the objective function at each iteration. This is because an orthogonal projection matrix M such as PAk is symmetric and idempotent, that is, MT = M and MM = M. We then get that the objective function value of znew is

$$
\begin{aligned}
c^{kT} z^{new} & = c^{kT}\left(e - \beta P_{A^k} c^k\right) \\
 & = c^{kT} e - \beta\, c^{kT} P_{A^k} c^k \\
 & = c^{kT} e - \beta\, c^{kT} P_{A^k} P_{A^k} c^k \\
 & = c^{kT} e - \beta\, c^{kT} P_{A^k}^T P_{A^k} c^k \\
 & = c^{kT} e - \beta \left(P_{A^k} c^k\right)^T\left(P_{A^k} c^k\right),
\end{aligned}
$$
and this must be smaller than ckTe whenever β > 0 and PAkck ≠ 0, since it subtracts off β times the square of the length of the vector PAkck.
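For the (LP1) step computed above, this identity can be checked numerically. The sketch below (our own variable names) verifies that ckTznew equals ckTe minus β times the squared length of PAkck:

```python
import numpy as np

A = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0]])
c = np.array([-1.0, -1.0, 1.0, 1.0])
xk = np.array([0.8, 0.1, 0.2, 1.9])

Dk = np.diag(xk)
Ak, ck = A @ Dk, Dk @ c
P = np.eye(4) - Ak.T @ np.linalg.solve(Ak @ Ak.T, Ak)

e = np.ones(4)
beta = 2.0
z_new = e + beta * (-P @ ck)

lhs = ck @ z_new
rhs = ck @ e - beta * np.dot(P @ ck, P @ ck)
print(round(lhs, 4), round(rhs, 4))   # both about 0.819, smaller than ck@e = 1.2
```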

2.7 Getting a dual solution

It can be shown that a possible dual iterate is

$$
v^k = \left(A^k A^{kT}\right)^{-1} A^k c^k. \qquad (4)
$$

Notice that the dual slacks are then c - ATvk, and that dk = -Dk(c - ATvk). Thus, element i of dk is the negative of the product of the primal variable xki and the corresponding dual slack, so the elements of dk give a measure of complementary slackness. At optimality, we will have dual feasibility with this choice. It can be shown that this choice is actually dual optimal if xk is optimal for the primal problem.
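The following sketch (our own variable names) computes this dual iterate and the dual slacks for (LP1) at xk, and confirms the relation dk = -Dk(c - ATvk):

```python
import numpy as np

A = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0]])
c = np.array([-1.0, -1.0, 1.0, 1.0])
xk = np.array([0.8, 0.1, 0.2, 1.9])

Dk = np.diag(xk)
Ak, ck = A @ Dk, Dk @ c

vk = np.linalg.solve(Ak @ Ak.T, Ak @ ck)   # dual iterate, equation (4)
slack = c - A.T @ vk                       # dual slacks

dk = -ck + Ak.T @ vk                       # direction from the iteration above
print(np.round(vk, 4))                     # [-0.8824  0.9945]
print(np.round(slack, 4))                  # some slacks are negative, so vk is not yet dual feasible
print(np.allclose(dk, -Dk @ slack))        # True: dk = -Dk (c - A^T vk)
```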

2.8 Stopping the algorithm

We stop the simplex algorithm when all the costs in the tableau are nonnegative. Unfortunately, we don’t have such a simple stopping rule for the primal affine scaling algorithm. However, there are some possibilities:

  1. Stop when the objective function value is changing very little from one iteration to the next, relative to its size.
  2. Compute the dual iterate vk of equation (4). If the dual slacks c - ATvk are nonnegative, then vk is dual feasible, and we can stop when the duality gap cTxk - bTvk is sufficiently small.
  3. Stop when the components of dk are all close to zero, since (as noted in Section 2.7) they measure complementary slackness.
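For instance, a duality-gap test based on the second possibility could be sketched as follows (a rough illustration under the assumptions above, with our own function name; it is not a rule from the notes):

```python
import numpy as np

def should_stop(A, b, c, xk, tol=1e-6):
    """Stop if the dual iterate from equation (4) is (nearly) feasible and
    the duality gap c^T x^k - b^T v^k is small."""
    Dk = np.diag(xk)
    Ak, ck = A @ Dk, Dk @ c
    vk = np.linalg.solve(Ak @ Ak.T, Ak @ ck)
    slack = c - A.T @ vk
    gap = c @ xk - b @ vk
    return bool(np.all(slack >= -tol) and gap <= tol * (1 + abs(c @ xk)))
```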

3 Doing more centering

The algorithm still sometimes makes very slow progress when it gets close to a corner of the feasible region. One possible remedy is to try explicitly to center the iterates. We do this by moving in the direction

$$
d^{center} := P_{A^k}\, e
$$

in the scaled problem (Pk). Notice that this direction is trying to increase all components simultaneously. An algorithm will usually do some steps using the regular dk we found earlier and then mix in some centering steps.

In the example we were looking at earlier, the centering direction is

$$
d^{center} = P_{A^k}\, e
= \begin{bmatrix} 0.059 & 0 & -0.235 & 0 \\ 0 & 0.997 & 0 & -0.052 \\ -0.235 & 0 & 0.941 & 0 \\ 0 & -0.052 & 0 & 0.003 \end{bmatrix}
\begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}
= \begin{bmatrix} -0.176 \\ 0.945 \\ 0.706 \\ -0.049 \end{bmatrix}.
$$
Notice that this has the effect of increasing x2 and decreasing x1, which does seem to move us towards the middle of the feasible region.
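Computationally, the centering direction uses the same projection as before, just applied to e instead of ck. A minimal sketch (our own variable names):

```python
import numpy as np

A = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0]])
xk = np.array([0.8, 0.1, 0.2, 1.9])

Ak = A @ np.diag(xk)
P = np.eye(4) - Ak.T @ np.linalg.solve(Ak @ Ak.T, Ak)

d_center = P @ np.ones(4)
print(np.round(d_center, 3))   # approximately (-0.176, 0.945, 0.706, -0.050)
```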

Current commercial implementations combine affine scaling steps with centering steps and dual information in primal-dual predictor-corrector methods. In addition to excellent computational performance, variants of these methods converge in a number of iterations that is polynomial in the size of the problem.

4 Historical notes

Serious research on interior point methods started with the publication of Karmarkar’s paper [3]. Karmarkar made some very bold claims for the performance of his algorithm, which were somewhat borne out by subsequent results. The primal affine scaling method is a simplification of Karmarkar’s original algorithm that was proposed by several researchers in 1986, including Vanderbei, Meketon and Freedman [8]. This method was in fact discovered by the Russian Dikin [2] in 1967, although this discovery remained unknown in the west until much later. A good (mathematical) description of the algorithm can be found in [7].

Textbooks on interior point methods include [1, 4, 5, 6, 9].

Current computational results indicate that interior point methods outperform the simplex algorithm on large problems.

It seems that an interior point method will solve almost any problem in at most 40 iterations. The number of iterations required grows very slowly as the size of the problem increases. By contrast, the simplex algorithm seems to need approximately 1.5m iterations to solve a problem with m constraints. The work required at each iteration of an interior point method is far larger than the amount of work needed to compute a simplex pivot, so there is a trade-off: an interior point algorithm needs far fewer iterations, but it takes considerably more time per iteration, when compared to the simplex algorithm.

References

[1]   S. P. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, Cambridge, UK, 2004.

[2]   I. I. Dikin. Iterative solution of problems of linear and quadratic programming. Doklady Akademiia Nauk SSSR, 174:747–748, 1967. English Translation: Soviet Mathematics Doklady, 1967, Volume 8, pp. 674–675.

[3]   N. K. Karmarkar. A new polynomial-time algorithm for linear programming. Combinatorica, 4:373–395, 1984.

[4]   D. G. Luenberger and Y. Ye. Linear and Nonlinear Programming. Springer, 3rd edition, 2008.

[5]   C. Roos, T. Terlaky, and J.-Ph. Vial. Interior Point Methods for Linear Optimization. Springer, 2nd edition, 2005.

[6]   R. J. Vanderbei. Linear Programming: Foundations and Extensions. Springer, 3rd edition, 2008.

[7]   R. J. Vanderbei and J. C. Lagarias. I. I. Dikin’s convergence result for the affine–scaling algorithm. In J. C. Lagarias and M. J. Todd, editors, Mathematical Developments Arising from Linear Programming: Proceedings of a Joint Summer Research Conference held at Bowdoin College, Brunswick, Maine, USA, June/July 1988, volume 114 of Contemporary Mathematics, pages 109–119. American Mathematical Society, Providence, Rhode Island, USA, 1990.

[8]   R. J. Vanderbei, M. S. Meketon, and B. A. Freedman. A modification of Karmarkar’s linear programming algorithm. Algorithmica, 1:395–407, 1986.

[9]   S. J. Wright. Primal-Dual Interior-Point Methods. SIAM, Philadelphia, 1996.