Seminar assignments - Assignments 1-7
Course: ISyE 6663 Optimization III (Nonlinear Optimization)
Institution: Georgia Institute of Technology

ISyE 6663

Optimization III

Spring 2011 Assignment 1 (Revision) Issued: January 18, 2011 Due: January 27, 2011

Problem 1 Consider S_1, S_2 ⊂ R ∪ {+∞}. If you need additional assumptions for a result to hold, then state those assumptions, and motivate why the assumptions are needed. Show that:
(1) inf{S_1 + S_2} = inf S_1 + inf S_2
(2) inf{S_1 ∪ S_2} = min{inf S_1, inf S_2}
(3) inf{S_1 ∩ S_2} ≥ max{inf S_1, inf S_2}. Give an example where strict inequality holds.

Problem 2 If you need additional assumptions for a result to hold, then state those assumptions, and motivate why the assumptions are needed.
(1) Consider f, g : X → R. Show that
inf_{x∈X} f(x) + inf_{x∈X} g(x) ≤ inf_{x∈X} {f(x) + g(x)}
Give an example where strict inequality holds.
(2) Consider f : X → R and g : Y → R. Show that
inf_{x∈X} f(x) + inf_{y∈Y} g(y) = inf_{x∈X, y∈Y} {f(x) + g(y)}
(3) Consider f : X × Y → R. Show that
inf_{x∈X} { inf_{y∈Y} f(x, y) } = inf_{y∈Y} { inf_{x∈X} f(x, y) } = inf_{x∈X, y∈Y} f(x, y)
(4) Consider f : X × Y → R. Show that
sup_{x∈X} { inf_{y∈Y} f(x, y) } ≤ inf_{y∈Y} { sup_{x∈X} f(x, y) }
Give an example where strict inequality holds.


Problem 3 Order notation:
(1) Let f(n) and g(n) be asymptotically nonnegative functions. Use the definition of the Ω-notation in the text (beware: other texts often use Θ for the same concept). Show that max{f(n), g(n)} = Ω(f(n) + g(n)).
(2) Show that for any constants a and b, with b > 0, (n + a)^b = Ω(n^b).
(3) Is 2^{n+1} = O(2^n)? Is 2^{2n} = O(2^n)? Why or why not?

Problem 4 Assume that L ⊂ R^n is a linear subspace, and that B_1 and B_2 are two bases for L.
(1) Show that |B_1| = |B_2|, where |B| denotes the number of elements (cardinality) of the set B.
(2) Show that the expression of any element of L as a linear combination of the elements of B_1 is unique.
(3) Represent a linear combination by its vector of coefficients. Show that the function that maps the expression of an element of L as a linear combination of the elements of B_1 to the expression of the same element as a linear combination of the elements of B_2 is an invertible linear function. Show how to construct the matrix representation of the above-mentioned function and its inverse.

Problem 5 Equivalence of norms in R^n:
(1) Let ‖·‖ and |·| be any two norms on R^n. Show that ‖·‖ and |·| are equivalent, in the sense that there exists a constant c > 0 such that ‖x‖ ≤ c|x| for all x ∈ R^n. (Compare this result with result (A.35) of the text.)
(2) Let ‖·‖ be any norm on R^n. Consider the topology on R^n generated by the balls B(x, r) := {y ∈ R^n : ‖y − x‖ < r} for x ∈ R^n and r > 0. Show that any norm |·| is a continuous function from R^n to R.

Problem 6 Consider a symmetric n × n matrix A with eigenvalues λ_1 ≥ · · · ≥ λ_n. Let R : R^n \ {0} → R be defined by R(x) := x^T Ax / x^T x.
(1) Show that max_{x≠0} R(x) = λ_1 and min_{x≠0} R(x) = λ_n.
(2) Show that the maximum is attained by any eigenvector of A corresponding to λ_1, and the minimum is attained by any eigenvector of A corresponding to λ_n.
(3) Show that λ_n ‖x‖² ≤ x^T Ax ≤ λ_1 ‖x‖² for all x ∈ R^n. (Compare this result with the result on p. 599 of the text.)
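Before attempting the proof, a quick numerical sanity check of parts (1)-(3) can be helpful. The sketch below is illustrative only; it assumes NumPy and a randomly generated symmetric test matrix.

```python
import numpy as np

# Hypothetical sanity check for Problem 6: the Rayleigh quotient of a
# symmetric matrix lies between, and attains, the extreme eigenvalues.
rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
A = (M + M.T) / 2                     # symmetric test matrix

def rayleigh(x):
    return x @ A @ x / (x @ x)

eigvals, eigvecs = np.linalg.eigh(A)  # eigh returns eigenvalues in ascending order
lam_min, lam_max = eigvals[0], eigvals[-1]

# The quotient at the corresponding eigenvectors attains the extremes ...
print(np.isclose(rayleigh(eigvecs[:, -1]), lam_max))
print(np.isclose(rayleigh(eigvecs[:, 0]), lam_min))

# ... and random directions stay inside [lam_min, lam_max].
samples = [rayleigh(rng.standard_normal(n)) for _ in range(1000)]
print(lam_min <= min(samples) and max(samples) <= lam_max)
```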


Problem 7 Let A be an m × n matrix with Euclidean norm
‖A‖_2 := max_{x∈R^n\{0}} ‖Ax‖_2 / ‖x‖_2
(see (A.38) in the text).
(1) Show that all the eigenvalues of A^T A are nonnegative.
(2) Show that ‖A‖_2² = λ_1(A^T A), where λ_1(A^T A) denotes the largest eigenvalue of A^T A. (Compare this result with result (A.39b) of the text.)
(3) Suppose that A is a symmetric n × n matrix with eigenvalues λ_1(A), . . . , λ_n(A). Show that
‖A‖_2 = max_{i∈{1,...,n}} |λ_i(A)|
(4) Suppose that A is a symmetric and nonsingular n × n matrix with eigenvalues λ_1(A), . . . , λ_n(A). Show that
‖A^{-1}‖_2 = 1 / min_{i∈{1,...,n}} |λ_i(A)|
(5) Let the condition number κ(A) of A be defined by κ(A) := ‖A‖_2 ‖A^{-1}‖_2 (see (A.42) in the text). Suppose again that A is a symmetric and nonsingular n × n matrix with eigenvalues λ_1(A), . . . , λ_n(A). Show that
κ(A) = max_{i∈{1,...,n}} |λ_i(A)| / min_{i∈{1,...,n}} |λ_i(A)|
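A similar numerical check is possible for parts (2) and (3). The snippet below is a sketch that relies on NumPy's norm and eigvalsh routines and on randomly generated test matrices.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 6))

# Part (2): ||A||_2^2 equals the largest eigenvalue of A^T A.
spectral_norm = np.linalg.norm(A, 2)       # induced Euclidean (spectral) norm
lam1 = np.linalg.eigvalsh(A.T @ A)[-1]     # eigvalsh sorts eigenvalues ascending
print(np.isclose(spectral_norm**2, lam1))

# Part (3): for a symmetric matrix, ||A||_2 = max_i |lambda_i(A)|.
M = rng.standard_normal((5, 5))
S = (M + M.T) / 2
print(np.isclose(np.linalg.norm(S, 2), np.abs(np.linalg.eigvalsh(S)).max()))
```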

ISyE 6663

Nonlinear Optimization Spring 2011 Assignment 2

Issued: February 1, 2011 Due: February 10, 2011

Problem 1 Consider a quadratic function f : R^n → R given by f(x) := (1/2)x^T Gx + d^T x, where G ∈ R^{n×n} and d ∈ R^n.
(1) Show that we can assume, without loss of generality, that G is symmetric.
(2) Verify that ∇f(x) = Gx + d.
(3) Show that f is Lipschitz continuously differentiable, that is, there is a constant L such that ‖∇f(x) − ∇f(y)‖_2 ≤ L‖x − y‖_2 for all x, y ∈ R^n.
(4) What is the smallest constant L such that the Lipschitz inequality above holds?
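As a sanity check of parts (2) and (4) (not a proof), one can compare the closed-form gradient with finite differences and compute the spectral norm of G numerically. The sketch below assumes NumPy and a random symmetric G.

```python
import numpy as np

# Hypothetical check: for f(x) = 0.5 x^T G x + d^T x with symmetric G,
# the gradient is G x + d, and the smallest Lipschitz constant of the
# gradient is the spectral norm ||G||_2 = max_i |lambda_i(G)|.
rng = np.random.default_rng(2)
n = 4
M = rng.standard_normal((n, n))
G = (M + M.T) / 2
d = rng.standard_normal(n)

f = lambda x: 0.5 * x @ G @ x + d @ x
grad = lambda x: G @ x + d

# Finite-difference comparison at a random point.
x = rng.standard_normal(n)
eps = 1e-6
fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps) for e in np.eye(n)])
print(np.allclose(fd, grad(x), atol=1e-5))

# grad(x) - grad(y) = G (x - y), so the best L is ||G||_2 (symmetric case).
print(np.isclose(np.linalg.norm(G, 2), np.abs(np.linalg.eigvalsh(G)).max()))
```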

Problem 2 Consider the function f : R² → R defined by f(x) = ‖x‖_2². Consider the sequence {x_k} given by
x_k = (1 + 1/2^k) (cos(k), sin(k))^T
for k = 0, 1, 2, . . ..
1. Show that the sequence {x_k} satisfies f(x_{k+1}) < f(x_k) for all k.
2. Show that every point on the unit circle {x ∈ R² : ‖x‖_2 = 1} is a limit point of the sequence {x_k}. You may use the following result: the sequence {ξ_k} defined by ξ_k = k (mod 2π) is dense in [0, 2π].
The point of the exercise is to show an example of a sequence {x_k} of iterates such that the objective value converges (maybe to the minimum objective value), but the sequence of iterates does not converge.
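The behavior described in the problem is easy to visualize numerically. The snippet below is illustrative only (it is not the requested proof): it generates the first iterates, checks that f(x_k) decreases, and shows that the objective values approach 1 while the points keep circling the unit circle.

```python
import numpy as np

# Generate the first iterates of Problem 2 and inspect f(x_k) = ||x_k||_2^2.
ks = np.arange(0, 25)
xs = (1 + 0.5**ks)[:, None] * np.column_stack([np.cos(ks), np.sin(ks)])
fs = np.sum(xs**2, axis=1)

print(np.all(np.diff(fs) < 0))   # f(x_{k+1}) < f(x_k) for these iterates
print(fs[-1])                    # close to 1, not to the minimum value 0
```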

Problem 3 Prove that all isolated local minimizers are strict. That is, for any isolated local minimizer x^*, show that there is a neighborhood N of x^* such that for all x ∈ N \ {x^*}, it holds that f(x) > f(x^*).


Problem 4 Consider a convex function f. Show that argmin_{x∈dom(f)} f(x) is a convex set.

Problem 5 Suppose that f : R^m → R is continuously differentiable. Consider the function f̃ : R^n → R given by f̃(x) = f(Sx + s) for some S ∈ R^{m×n} and s ∈ R^m. Show that
∇f̃(x) = S^T ∇f(Sx + s)
and
∇²f̃(x) = S^T ∇²f(Sx + s) S
Hint: First use the chain rule to find an expression for each partial derivative ∂f̃(x)/∂x_i in terms of the partial derivatives of f, and then verify that these expressions are consistent with the matrix expressions above.

Problem 6 Consider the symmetric-rank-one (SR1) formula for updating a Hessian approximation:
B_{k+1} = B_k + (y_k − B_k s_k)(y_k − B_k s_k)^T / ((y_k − B_k s_k)^T s_k)
Consider the Broyden-Fletcher-Goldfarb-Shanno (BFGS) formula for updating a Hessian approximation:
B_{k+1} = B_k − (B_k s_k s_k^T B_k) / (s_k^T B_k s_k) + (y_k y_k^T) / (y_k^T s_k)
Show that both the SR1 update and the BFGS update are scale-invariant. That is, suppose that f : R^n → R is continuously differentiable, and, for each update method, consider the following two sequences {x_k} and {x̃_k}. Choose an initial point x_0 ∈ R^n and an initial Hessian approximation B_0 ∈ R^{n×n} (typically symmetric positive definite). Then the sequence {x_k} is generated as follows:
x_{k+1} = x_k − B_k^{-1} ∇f(x_k)
(For now, assume that all B_k are nonsingular.) Next, consider an affine rescaling of the decision variables x given by x̃ = S^{-1}(x − s), where S ∈ R^{n×n} is nonsingular and s ∈ R^n. That is, the new objective function is given by f̃(x̃) = f(Sx̃ + s). Choose the scaled initial point x̃_0 = S^{-1}(x_0 − s) and the corresponding initial Hessian approximation B̃_0 = S^T B_0 S (B̃_0 is symmetric positive definite iff B_0 is symmetric positive definite). Then the sequence {x̃_k} is generated as follows:
x̃_{k+1} = x̃_k − B̃_k^{-1} ∇f̃(x̃_k)
Show that the two sequences maintain the initial scaling relationship, that is, x̃_k = S^{-1}(x_k − s) for all k.
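For reference when experimenting with Problem 6, here is a minimal sketch of the two updates as standalone NumPy functions. The vectors s_k and y_k are assumed given (in a quasi-Newton method they are s_k = x_{k+1} − x_k and y_k = ∇f(x_{k+1}) − ∇f(x_k)); the printed checks verify the secant equation B_{k+1} s_k = y_k, not the scale invariance the problem asks about.

```python
import numpy as np

def sr1_update(B, s, y):
    """Symmetric-rank-one update; assumes (y - B s)^T s != 0."""
    r = y - B @ s
    return B + np.outer(r, r) / (r @ s)

def bfgs_update(B, s, y):
    """BFGS update; in practice assumes the curvature condition y^T s > 0."""
    Bs = B @ s
    return (B - np.outer(Bs, Bs) / (s @ Bs)   # removes old curvature along s
              + np.outer(y, y) / (y @ s))     # adds the secant information

# Both updates satisfy the secant equation B_{k+1} s = y.
rng = np.random.default_rng(3)
B = np.eye(3)
s = rng.standard_normal(3)
y = rng.standard_normal(3)
print(np.allclose(sr1_update(B, s, y) @ s, y))
print(np.allclose(bfgs_update(B, s, y) @ s, y))
```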

Problem 7 Consider a quadratic function f : R² → R given by f(x, y) := ax² + by² + cxy + dx + ey.
(1) Suppose that a = b = 1. Identify and classify the stationary points of f (it may depend on the other parameters c, d, e).


(2) Identify and classify the stationary points of f as a function of all the parameters a, b, c, d, e.

Problem 8 A local minimizer in every direction is not necessarily a local minimum: Consider a differentiable function f : R^n → R. Consider a point x^* that is a local minimizer of f along every line through x^*; that is, the function g(α) := f(x^* + αd) is minimized at α = 0 for all d ∈ R^n.
(1) Show that ∇f(x^*) = 0.
(2) Consider the function f : R² → R given by f(x, y) := (x − ay²)(x − by²) for some a, b > 0 with a ≠ b. Show that x^* = 0 is a local minimizer of f along every line through x^*, but that x^* is not a local minimum of f.

Problem 9 Show that the sequence {x_k} given by x_k = 1/k is not Q-linearly convergent, though it does converge to 0. (It converges sublinearly to 0.)

ISyE 6663

Optimization III

Spring 2011 Assignment 3 Issued: March 1, 2011 Due: March 10, 2011

Problem 1 Consider a constant stepsize algorithm, so that x_{k+1} = x_k + s d_k for some constant stepsize s > 0. There are a variety of conditions under which the sequence of iterates {x_k} of such an algorithm converges to a stationary point of f.
(1) Consider the function f : R^n → R given by f(x) := ‖x‖_2^{2+a}, with a ≥ 0. Consider the application of the steepest descent algorithm with constant stepsize to f, that is, x_{k+1} = x_k − s∇f(x_k) for some constant stepsize s > 0. Determine for which values of s and x_0 the sequence of iterates {x_k} converges to x^* = 0.
(2) Consider the function f : R^n → R given by f(x) := ‖x‖_2^{3/2}.
(a) Show that f is not Lipschitz continuously differentiable, that is, there is no constant L such that ‖∇f(x) − ∇f(y)‖_2 ≤ L‖x − y‖_2 for all x, y ∈ R^n. (In fact, f is not even locally Lipschitz continuously differentiable at the optimal solution x^* = 0, that is, there is no neighborhood of the optimal solution x^* = 0 and constant L such that the Lipschitz inequality above holds for all x, y in the neighborhood.)
(b) Consider the application of the steepest descent algorithm with constant stepsize to f, that is, x_{k+1} = x_k − s∇f(x_k) for some constant stepsize s > 0. Show that, for any value of s > 0, the sequence of iterates {x_k} either converges to x^* = 0 in a finite number of iterations (and only in a very special case), or else the iterates do not converge to x^*.
(3) Consider a quadratic function f : R^n → R given by f(x) := (1/2)x^T Gx + d^T x, where G ∈ R^{n×n} is symmetric positive definite and d ∈ R^n. In the previous homework you were asked to show that f is Lipschitz continuously differentiable, that is, there is a constant L such that ‖∇f(x) − ∇f(y)‖_2 ≤ L‖x − y‖_2 for all x, y ∈ R^n, with the smallest such constant L given by the largest eigenvalue of G.
(a) Consider a steepest descent algorithm with a constant stepsize s applied to f. Show that {x_k} converges to x^* = −G^{-1}d for any starting point x_0 if and only if 0 < s < 2/L.
(b) Consider a gradient search algorithm with a constant stepsize s and constant symmetric positive definite deflection matrix B applied to f, that is, x_{k+1} = x_k − sB∇f(x_k). Let L be the largest eigenvalue of B^{1/2}GB^{1/2}. Show that {x_k} converges to x^* = −G^{-1}d for any starting point x_0 if and only if 0 < s < 2/L.


(4) Consider a quadratic function f : R^n → R given by f(x) := (1/2)x^T Gx, where G ∈ R^{n×n} is nonsingular, symmetric, and indefinite. Consider a steepest descent algorithm with a constant stepsize s applied to f. Show that if the starting point x_0 does not belong to the subspace spanned by the eigenvectors corresponding to the nonnegative eigenvalues of G, the sequence {x_k} diverges.
(5) Suggest some conditions that you think would be required for a constant stepsize algorithm to converge to a stationary point of f.

Problem 2 Program the steepest descent and Newton algorithms, both using the Armijo (backtracking) line search method with initial step size ᾱ = 1, decrease factor β = 1/2, and constant for the sufficient decrease condition c_1 = 0.1. Use the two algorithms to minimize the Rosenbrock function
f(x_1, x_2) := 100(x_2 − x_1²)² + (1 − x_1)²
First try the initial point x_0 = (1.2, 1.2), and then the initial point x_0 = (−1.2, 1.0). Print the step length α_k used by each algorithm at each iteration k, and the distance ‖x_k − x^*‖_2 between the iterate x_k at each iteration k and the optimal solution x^*. (A starter sketch is given after Problem 4 below.)

Problem 3 Prove that ‖Bx‖ ≥ ‖x‖ / ‖B^{-1}‖ for any x ∈ R^n, B ∈ R^{n×n} nonsingular, and any norm ‖·‖.

Problem 4 Kantorovich inequality: Show that for any symmetric positive definite matrix Q ∈ R^{n×n} and any x ∈ R^n, it holds that
(x^T x)² / ((x^T Qx)(x^T Q^{-1}x)) ≥ 4 λ_min λ_max / (λ_min + λ_max)²
where λ_min and λ_max denote the smallest and largest eigenvalues of Q, respectively.
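For Problem 2 above, one possible starting point is sketched below. It is not a reference implementation: the stopping tolerance, the iteration cap, and the fallback to the steepest descent direction when the Newton direction fails to be a descent direction are choices of this sketch, and the per-iteration printing of α_k and ‖x_k − x^*‖_2 requested in the problem is omitted.

```python
import numpy as np

def rosenbrock(x):
    return 100 * (x[1] - x[0]**2)**2 + (1 - x[0])**2

def rosen_grad(x):
    return np.array([-400 * x[0] * (x[1] - x[0]**2) - 2 * (1 - x[0]),
                     200 * (x[1] - x[0]**2)])

def rosen_hess(x):
    return np.array([[1200 * x[0]**2 - 400 * x[1] + 2, -400 * x[0]],
                     [-400 * x[0], 200.0]])

def armijo(f, x, fx, g, d, alpha0=1.0, beta=0.5, c1=0.1):
    """Backtracking line search: shrink alpha until sufficient decrease holds."""
    alpha = alpha0
    while f(x + alpha * d) > fx + c1 * alpha * g @ d:
        alpha *= beta
    return alpha

def minimize(x0, newton=False, tol=1e-6, max_iter=50000):
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = rosen_grad(x)
        if np.linalg.norm(g) <= tol:
            break
        d = -np.linalg.solve(rosen_hess(x), g) if newton else -g
        if g @ d >= 0:          # fall back if Newton direction is not a descent direction
            d = -g
        alpha = armijo(rosenbrock, x, rosenbrock(x), g, d)
        x = x + alpha * d
    return x, k

for x0 in [(1.2, 1.2), (-1.2, 1.0)]:
    for use_newton in (False, True):
        x, iters = minimize(x0, newton=use_newton)
        print(x0, "newton" if use_newton else "steepest", iters, x)
```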

ISyE 6663

Optimization III

Spring 2011 Assignment 4 Issued: March 29, 2011 Due: April 7, 2011

Problem 1 Program a trust region algorithm that uses the dogleg method. Use your program to minimize the Rosenbrock function
f(x_1, x_2) := 100(x_2 − x_1²)² + (1 − x_1)²
For the second-order term, choose B_k = ∇²f(x_k). Choose η_1 = 0.1, η_2 = 0.2, η_3 = 0.9, β = 1/2, γ = 1, c_1 = 1/2. Use ‖·‖ = ‖·‖_2, and another version with ‖·‖ = ‖·‖_Q, with Q = ∇²f(x^*). First try the initial point x_0 = (1.2, 1.2), and then the initial point x_0 = (−1.2, 1.0). Plot a graph of the distance ‖x_k − x^*‖_2 between the iterate x_k and the optimal solution x^* versus the iteration index k for each norm and each initial point. Interpret the results. (A sketch of the dogleg step is given after Problem 3 below.)

Problem 2 Suppose that the objective function f : R^n → R is bounded below and continuously differentiable. A trust region algorithm is used, with matrices B_k for the second-order term of the model functions satisfying ‖B_k‖ ≤ β for all k. The approximate solutions of the subproblem at each iteration satisfy the two conditions from class. Show that if the iterates remain in a bounded set, then there is a limit point x_∞ of the sequence {x_k} such that ∇f(x_∞) = 0.

Problem 3 The Cauchy-Schwarz inequality states that for any u, v ∈ R^n, it holds that
(u^T v)² ≤ (u^T u)(v^T v)
with equality only when u and v are parallel. Use the Cauchy-Schwarz inequality to show that, for any positive definite Q ∈ R^{n×n} and any w ∈ R^n, it holds that
‖w‖_2^4 ≤ (w^T Qw)(w^T Q^{-1}w)
with equality only when w and Qw (and thus also Q^{-1}w) are parallel.
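For Problem 1, the dogleg step itself can be sketched as follows. The surrounding trust-region driver (computing the ratio ρ_k and updating ∆_k with the parameters η_1, η_2, η_3) is not shown, and the function assumes B is positive definite.

```python
import numpy as np

def dogleg_step(g, B, delta):
    """Dogleg step for m(p) = f + g^T p + 0.5 p^T B p subject to ||p||_2 <= delta.

    A sketch only: assumes B is symmetric positive definite.
    """
    p_newton = -np.linalg.solve(B, g)
    if np.linalg.norm(p_newton) <= delta:
        return p_newton                           # full Newton step fits in the region
    p_cauchy = -(g @ g) / (g @ B @ g) * g         # unconstrained minimizer along -g
    if np.linalg.norm(p_cauchy) >= delta:
        return -delta / np.linalg.norm(g) * g     # truncated steepest-descent step
    # Otherwise follow p_cauchy + tau * (p_newton - p_cauchy), tau in [0, 1],
    # and stop where the path crosses the boundary ||p||_2 = delta.
    d = p_newton - p_cauchy
    a = d @ d
    b = 2 * p_cauchy @ d
    c = p_cauchy @ p_cauchy - delta**2
    tau = (-b + np.sqrt(b**2 - 4 * a * c)) / (2 * a)
    return p_cauchy + tau * d
```

A full solution to Problem 1 would wrap this step in the standard accept/reject and radius-update loop, and would repeat the experiment with the ‖·‖_Q trust region as asked.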

Problem 4 Show that if B ∈ R^{n×n} is symmetric, then B + λI is positive definite for all λ sufficiently large.

Problem 5 Equivalent trust-region methods: Consider a trust region method with subproblems given by
min_{p∈R^n} m_k(p) := f(x_k) + ∇f(x_k)^T p + (1/2)p^T H_k p
subject to ‖p‖_2 ≤ ∆_k
Consider another trust region method with subproblems given by
min_{p∈R^n} m_k(p) := f(x_k) + ∇f(x_k)^T p + (1/2)p^T (H_k + λ_k I)p
(Note that this subproblem is an unconstrained optimization problem.)


1. Show that the two trust region methods are equivalent, in the following sense. For every ∆_k > 0, there is a λ_k ≥ 0 such that H_k + λ_k I is positive semidefinite, ∇f(x_k) is in the range of H_k + λ_k I (which automatically holds if H_k + λ_k I is positive definite and thus nonsingular), and every optimal solution of the first subproblem is an optimal solution of the second subproblem (in particular, if H_k + λ_k I is positive definite then the two subproblems have the same optimal solution). Conversely, for every λ_k ≥ 0 and every optimal solution of the second subproblem (which implies that H_k + λ_k I is positive semidefinite and ∇f(x_k) is in the range of H_k + λ_k I), there is a ∆_k > 0 such that the optimal solution of the second subproblem is also optimal for the first subproblem (again, if H_k + λ_k I is positive definite then the two subproblems have the same optimal solution).
2. List some potential advantages/disadvantages of each of the two trust region methods.

Problem 6 Consider the following two-dimensional subspace minimization problem (which gives a solution to the subproblem at least as good as that of the dogleg method):
min_{p∈R^n} m(p) := a + b^T p + (1/2)p^T Bp
subject to ‖p‖_2 ≤ ∆
p ∈ span[b, B^{-1}b]
Suppose that B is positive definite. Describe (in a precise way) a procedure to solve the two-dimensional subspace minimization problem above.

ISyE 6663

Optimization III

Spring 2011 Assignment 5 Issued: April 7, 2011 Due: April 14, 2011

Problem 1 Program the conjugate gradient algorithm. Use your program to minimize the function
f(x) := (1/2)x^T Ax − b^T x
where A ∈ R^{n×n} is the Hilbert matrix with entries A_{i,j} = 1/(i + j − 1) and b = (1, 1, . . . , 1). Use the initial point x_0 = 0. Run the algorithm for dimensions n = 5, 10, 15, 20. Stop when ‖∇f(x_k)‖_∞ ≤ 10^{-6}. Plot a graph of ‖∇f(x_k)‖_∞ versus the iteration index k, and a graph of the distance ‖x_k − x^*‖_2 between the iterate x_k and the optimal solution x^* versus the iteration index k, for each dimension. Interpret the results. (A minimal CG sketch is given after Problem 3 below.)

Problem 2 Show that if d_0, d_1, . . . , d_{k−1} ∈ R^n are linearly independent, and f : R^n → R is strongly convex quadratic, then h : R^k → R given by h(y) := f(x_0 + y_0 d_0 + · · · + y_{k−1} d_{k−1}) is also strongly convex quadratic.

Problem 3 Conjugate gradient methods use directions d_0, d_1, . . . , d_{n−1} ∈ R^n (for most iterations) generated as follows:
d_0 = −∇f(x_0)
d_{k+1} = −∇f(x_{k+1}) + β_{k+1} d_k
The Fletcher-Reeves method chooses
β_{k+1}^{FR} := ∇f(x_{k+1})^T ∇f(x_{k+1}) / (∇f(x_k)^T ∇f(x_k))
The Polak-Ribière method chooses
β_{k+1}^{PR} := ∇f(x_{k+1})^T (∇f(x_{k+1}) − ∇f(x_k)) / (∇f(x_k)^T ∇f(x_k))
The Hestenes-Stiefel method chooses
β_{k+1}^{HS} := ∇f(x_{k+1})^T (∇f(x_{k+1}) − ∇f(x_k)) / ((∇f(x_{k+1}) − ∇f(x_k))^T d_k)
Show that when f is a quadratic function, and exact line search is done, then the three methods are the same.
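For Problem 1 of this assignment, a minimal linear conjugate gradient sketch is given below. The Hilbert matrix is built directly from its definition, the stopping test uses ‖∇f(x_k)‖_∞ = ‖Ax_k − b‖_∞ as in the problem, and the iteration cap of 10n is an arbitrary choice of this sketch (in exact arithmetic CG would need at most n steps, but the Hilbert matrix is extremely ill-conditioned).

```python
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-6, max_iter=None):
    """Linear CG for f(x) = 0.5 x^T A x - b^T x with A symmetric positive definite."""
    x = x0.copy()
    r = A @ x - b                       # r_k = grad f(x_k)
    d = -r
    history = [np.linalg.norm(r, np.inf)]
    for _ in range(max_iter or len(b)):
        if history[-1] <= tol:
            break
        Ad = A @ d
        alpha = (r @ r) / (d @ Ad)      # exact line search along d
        x = x + alpha * d
        r_new = r + alpha * Ad
        beta = (r_new @ r_new) / (r @ r)
        d = -r_new + beta * d
        r = r_new
        history.append(np.linalg.norm(r, np.inf))
    return x, history

for n in (5, 10, 15, 20):
    A = 1.0 / (np.arange(1, n + 1)[:, None] + np.arange(n))   # Hilbert: A[i, j] = 1/(i + j - 1)
    b = np.ones(n)
    x, hist = conjugate_gradient(A, b, np.zeros(n), max_iter=10 * n)
    print(n, len(hist) - 1, hist[-1])
```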


Problem 4 Consider a symmetric positive definite matrix Q ∈ R^{n×n}, and the associated norm ‖x‖_Q := √(x^T Qx). Consider Q-conjugate directions d_0, d_1, . . . , d_{n−1} ∈ R^n generated from linearly independent vectors p_0, p_1, . . . , p_{n−1} ∈ R^n. Show that, for each k = 1, . . . , n − 1, d_k = p_k − p̂_k, where p̂_k is the projection of p_k onto the subspace spanned by p_0, . . . , p_{k−1} (or the subspace spanned by d_0, . . . , d_{k−1}) with respect to the ‖·‖_Q norm, that is,
p̂_k = argmin { ‖p_k − p‖_Q : p ∈ span[p_0, . . . , p_{k−1}] }
That is, d_k is the part of p_k that remains after we subtract the projection of p_k onto the subspace spanned by p_0, . . . , p_{k−1}.
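The construction behind Problem 4 is Gram-Schmidt orthogonalization in the Q-inner product ⟨u, v⟩_Q = u^T Qv. The sketch below illustrates that construction numerically (it is not the requested proof); the helper name and the random test data are hypothetical.

```python
import numpy as np

def q_conjugate_directions(P, Q):
    """Gram-Schmidt in the Q-inner product <u, v>_Q = u^T Q v.

    Columns of P are linearly independent vectors p_0, ..., p_{n-1};
    the returned columns d_0, ..., d_{n-1} are pairwise Q-conjugate.
    """
    n = P.shape[1]
    D = np.zeros_like(P, dtype=float)
    for k in range(n):
        d = P[:, k].copy()
        for j in range(k):
            # subtract the Q-projection of p_k onto the earlier direction d_j
            d -= (P[:, k] @ Q @ D[:, j]) / (D[:, j] @ Q @ D[:, j]) * D[:, j]
        D[:, k] = d
    return D

rng = np.random.default_rng(4)
n = 5
M = rng.standard_normal((n, n))
Q = M @ M.T + n * np.eye(n)            # symmetric positive definite test matrix
P = rng.standard_normal((n, n))        # almost surely linearly independent columns
D = q_conjugate_directions(P, Q)
G = D.T @ Q @ D
print(np.allclose(G - np.diag(np.diag(G)), 0))   # off-diagonals vanish: Q-conjugacy
```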

ISyE 6663

Optimization III

Spring 2003 Assignment 6 Issued: April 1, 2003 Due: April 10, 2003

Problem 1 Nocedal and Wright, Problem 8.3
Problem 2 Nocedal and Wright, Problem 8.4
Problem 3 Nocedal and Wright, Problem 8.8
Problem 4 Nocedal and Wright, Problem 8.9
Problem 5 Nocedal and Wright, Problem 8.10
Problem 6 Program the BFGS method with Armijo line search. Set B_0 = I. Set the initial step length equal to 1, and set the step length reduction factor to 0.8. Use your program to minimize the Rosenbrock function, with initial points (1.2, 1.2) and (−1.2, 1). Also use your programs for steepest descent and Newton's method to minimize the same function. Compare the number of steps taken by each algorithm to attain an objective value of 10^{-6}. Make a two-dimensional plot of the sequence of iterates of each method.
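For Problem 6, a compact, self-contained sketch of BFGS with backtracking is given below. The sufficient-decrease constant c_1 = 10⁻⁴ and the curvature safeguard that skips the update when y^T s ≤ 0 are assumptions of this sketch, not part of the problem statement.

```python
import numpy as np

def f(x):
    return 100 * (x[1] - x[0]**2)**2 + (1 - x[0])**2

def grad(x):
    return np.array([-400 * x[0] * (x[1] - x[0]**2) - 2 * (1 - x[0]),
                     200 * (x[1] - x[0]**2)])

def bfgs(x0, max_iter=5000):
    x = np.asarray(x0, dtype=float)
    B = np.eye(2)                     # B_0 = I, as required by the problem
    path = [x.copy()]
    for _ in range(max_iter):
        g = grad(x)
        d = -np.linalg.solve(B, g)
        alpha = 1.0                   # initial step length 1, reduction factor 0.8
        while f(x + alpha * d) > f(x) + 1e-4 * alpha * g @ d:   # Armijo, c1 assumed
            alpha *= 0.8
        s = alpha * d
        y = grad(x + s) - g
        if y @ s > 1e-12:             # skip the update if curvature condition fails
            B = B - np.outer(B @ s, B @ s) / (s @ B @ s) + np.outer(y, y) / (y @ s)
        x = x + s
        path.append(x.copy())
        if f(x) <= 1e-6:              # stopping rule taken from the problem statement
            break
    return x, path

for x0 in [(1.2, 1.2), (-1.2, 1.0)]:
    x, path = bfgs(x0)
    print(x0, len(path) - 1, f(x))
```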

ISyE 6663

Optimization III

Spring 2003 Assignment 7 Issued: April 15, 2003 Due: April 24, 2003

Problem 1 Nocedal and Wright, Problem 12.3
Problem 2 Nocedal and Wright, ...

