
TAM 541 Notes on Linear Algebra – P. G. Mehta (UIUC)


These are supplementary lecture notes on linear algebra, meant as an addendum to my online lectures. These notes may contain typos etc.

• Basic problem of linear algebra: In elementary school, you first learned about multiplication,

  y = a × x,   where x, y, a ∈ R,

and about division, where the aim is to find an x ∈ R such that, for given a, y ∈ R,

  a × x = y.

Of course, we omit writing the × and simply write the equation as ax = y. The basic problem of linear algebra is the following:

  Ax = y

where x ∈ X, y ∈ Y (often X = Y), and A is a linear function (or a mapping) from X → Y. As it turns out, with appropriate definitions of X and Y, many problems in engineering and mathematics can be expressed in this abstract form. Of course, in an undergraduate class, you have learned about the case where X = R^n and Y = R^m are Euclidean vector spaces and A : X → Y is a matrix. The linear algebra part of this class is divided into two sub-parts:

1. In sub-part I, you will learn about vector spaces. These are more abstract forms of the spaces R^n and R^m that you have seen before in the matrix case.

2. In sub-part II, you will learn about linear operators. These are more abstract forms of the matrix. In fact, in a finite-dimensional setting (what does that mean?) a linear operator has a matrix representation (and what does that mean?). You will find out.

• Vector space:

Definition 1. A vector space X over the reals is a set equipped with the operations of

1. vector addition (for x, y ∈ X, x + y ∈ X), and
2. scalar multiplication (for x ∈ X and a ∈ R, ax ∈ X)

that satisfy certain axioms. For x, y, z ∈ X and a, b ∈ R:

  x + y = y + x
  x + 0 = x
  x + (−x) = 0
  x + (y + z) = (x + y) + z

  (a + b)x = ax + bx
  a(x + y) = ax + ay
  1x = x
  0x = 0
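For concreteness, here is a quick numerical sanity check of these axioms in X = R^n (an illustrative sketch using NumPy; the random vectors, scalars, and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
x, y, z = rng.standard_normal((3, n))   # three arbitrary vectors in R^n
a, b = 2.0, -3.0                        # arbitrary scalars

assert np.allclose(x + y, y + x)                  # commutativity
assert np.allclose(x + (y + z), (x + y) + z)      # associativity
assert np.allclose((a + b) * x, a * x + b * x)    # distributivity over scalar addition
assert np.allclose(a * (x + y), a * x + a * y)    # distributivity over vector addition
assert np.allclose(1.0 * x, x)                    # unit scalar
assert np.allclose(0.0 * x, np.zeros(n))          # zero scalar gives the zero vector
```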


Understand that the notation 0 is used both for an element of a vector space and for a real number. The formal definition appears as Definition 9.6.1 in Greenberg. We are most interested in applications, so it is good to understand a few examples:

1. The Euclidean vector space R^n, visualized as Cartesian coordinates. The origin of the coordinate system is the zero vector.
2. The set of n × m matrices. What is the zero vector?
3. The set of continuous functions on the interval [0, 1]. This space is denoted C([0, 1]). What is the zero vector?

• Subspace: A subspace Y is a subset of X (denoted Y ⊂ X) such that Y is itself a vector space. A more useful equivalent definition: Y ⊂ X is a subspace if

1. Y is closed under addition; and
2. Y is closed under scalar multiplication.

You will have several homework problems on this. Is it true that 0 ∈ Y?

We continue with more definitions.

• Linear combination and span: Let v1, v2, ..., vn ∈ X. The set of all linear combinations of these vectors,

  a1 v1 + a2 v2 + ... + an vn = Σ_{i=1}^n a_i v_i,   a_i ∈ R,

is denoted span{v1, v2, ..., vn}. Another way to write this is as follows:

  span{v1, v2, ..., vn} := { v ∈ X | v = Σ_{i=1}^n a_i v_i,  a_i ∈ R }

We will do several examples of this in class, and you will also have several homework problems on this. A cool fact:

Proposition 1. span{v1, v2, ..., vn} is a subspace of X.

Prove it by verifying the two properties: that the span is closed under vector addition and closed under scalar multiplication. We continue by introducing more definitions.

• Linear independence (LI): A set of vectors {v1, v2, ..., vn} is linearly dependent (LD) if there are real numbers a1, a2, ..., an, not all zero, such that

  a1 v1 + a2 v2 + ... + an vn = 0.

If no such real numbers can be found, then the vectors are linearly independent (LI).

• Basis and dimension: A set of vectors {v1, v2, ..., vn} is a basis of X if

1. span{v1, v2, ..., vn} = X, and


2. {v1, v2, ..., vn} are LI.

In that case X is said to have dimension n. An example of a basis of R^n is the canonical basis,

  R^n = span{e1, e2, ..., en},

where e1 = (1, 0, ..., 0)^⊤ and, similarly, ei is the vector with 1 in the i-th entry and zero everywhere else.

If no finite set of LI vectors spans X, then X is said to be infinite-dimensional. Suppose {v1, v2, ..., vn} is a basis of X. Then:

1. X = span{v1, v2, ..., vn} means any x ∈ X can be expressed as x = a1 v1 + a2 v2 + ... + an vn for some a1, a2, ..., an ∈ R; and
2. the LI property means that the coefficients are unique (why?).
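For X = R^n, both properties can be checked, and the coefficients computed, with a few lines of NumPy (a sketch; the basis and the vector x below are arbitrary examples):

```python
import numpy as np

v1 = np.array([1.0, 0.0, 0.0])
v2 = np.array([1.0, 1.0, 0.0])
v3 = np.array([1.0, 1.0, 1.0])
V = np.column_stack([v1, v2, v3])     # basis vectors placed as columns

# Full rank means the columns are LI and span R^3, i.e., they form a basis.
assert np.linalg.matrix_rank(V) == 3

x = np.array([2.0, -1.0, 3.0])
a = np.linalg.solve(V, x)             # the unique coefficients a1, a2, a3

# x is recovered exactly from its coefficients, confirming uniqueness.
assert np.allclose(a[0] * v1 + a[1] * v2 + a[2] * v3, x)
```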

The reals (a1, a2, ..., an) are referred to as the coordinates or the components of the vector x with respect to the basis {v1, v2, ..., vn}. The vector x is expressed as

  x = (a1, a2, ..., an)^⊤   with respect to {v1, v2, ..., vn}.

• Coordinates of a vector: The notation x = (x1, x2, ..., xn)^⊤ typically means

  x = (x1, x2, ..., xn)^⊤   with respect to the canonical basis {e1, e2, ..., en},

which really means

  x = x1 e1 + x2 e2 + ... + xn en = Σ_{i=1}^n x_i e_i.

The tensor notation for this is x = x_i e_i, where the summation sign is suppressed. (This is a very useful convention known as the Einstein summation convention, or simply the Einstein notation.) In mechanics textbooks, it is also common to underline the symbols, writing x = x_i e_i with underlines, where the underline denotes the fact that e_i and x are vectors.
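As an aside, NumPy's einsum function mirrors the Einstein convention directly: the repeated index is summed. A small illustrative sketch (the coordinates are arbitrary):

```python
import numpy as np

E = np.eye(3)                         # columns are the canonical basis e_1, e_2, e_3
x_coords = np.array([2.0, -1.0, 4.0])

# x = x_i e_i: "i,ji->j" sums over the repeated index i, as in Einstein notation.
x = np.einsum("i,ji->j", x_coords, E)
assert np.allclose(x, x_coords)       # in the canonical basis, x equals its coordinates
```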


The important point is that the coordinates of a vector x depend upon the basis. That is, if {ē1, ē2, ..., ēn} is another basis of X, then

  x = (x̄1, x̄2, ..., x̄n)^⊤   with respect to {ē1, ē2, ..., ēn},

which really means

  x = Σ_{i=1}^n x̄_i ē_i.

Because they represent the same vector x, the coordinates must be related.

• Change of basis: Let {e1, e2, ..., en} and {ē1, ē2, ..., ēn} be two bases of X. Let us express

  ē_j = Σ_{i=1}^n P_ij e_i.

Why can this be done? Let us now arrange {P_ij : 1 ≤ i ≤ n, 1 ≤ j ≤ n} as a matrix:

  [P] = [ P11  P12  ...  P1n ]
        [ P21  P22  ...  P2n ]
        [ ...  ...  ...  ... ]
        [ Pn1  Pn2  ...  Pnn ]

This matrix is referred to as the coordinate transformation matrix. Note the (most common) special case: when {e1, e2, ..., en} is the canonical basis and {ē1, ē2, ..., ēn} are expressed with respect to the canonical basis, then

  [P] = [ ē1 | ē2 | ... | ēn ],

i.e., the columns of [P] are the vectors ē_j. Now,

  x = Σ_j x̄_j ē_j = Σ_j x̄_j ( Σ_i P_ij e_i ) = Σ_i ( Σ_j P_ij x̄_j ) e_i.

Comparing with x = Σ_i x_i e_i, we conclude

  x_i = Σ_{j=1}^n P_ij x̄_j,

or

  [x] = [P][x̄]

where

  [x] := (x1, x2, ..., xn)^⊤  w.r.t. {e1, e2, ..., en},   [x̄] := (x̄1, x̄2, ..., x̄n)^⊤  w.r.t. {ē1, ē2, ..., ēn}.

Since the coordinate transformation matrix is invertible (why is it always invertible?),

  [x̄] = [P]^{−1} [x].
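A small NumPy sketch of these change-of-basis formulas (the new basis below is an arbitrary example, expressed in the canonical basis as the columns of P):

```python
import numpy as np

# New basis vectors, as columns of P (the special case noted above).
P = np.array([[1.0, 1.0],
              [0.0, 1.0]])            # e_bar_1 = (1,0), e_bar_2 = (1,1)

x_bar = np.array([3.0, -2.0])         # coordinates w.r.t. the new basis
x = P @ x_bar                         # [x] = [P][x_bar]

# [x_bar] = [P]^{-1}[x]; solve() avoids forming the inverse explicitly.
assert np.allclose(np.linalg.solve(P, x), x_bar)
```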


These formulas are generally difficult to remember (at least for me!). We will do a few examples in class to practice the change of basis.

• Lengths and angles: There are two definitions, the norm and the inner product, which are of fundamental importance.

Definition 2. Suppose X is a vector space. Then a norm, denoted ‖·‖ : X → R+, satisfies

1. ‖ax‖ = |a| ‖x‖
2. ‖x‖ > 0 for all x ≠ 0, and ‖0‖ = 0
3. ‖x + y‖ ≤ ‖x‖ + ‖y‖

Example 1. For X = R^n, the most common examples of norms are as follows:

1. ‖x‖_2 := (x1^2 + x2^2 + ... + xn^2)^{1/2}
2. ‖x‖_1 := |x1| + |x2| + ... + |xn|
3. ‖x‖_∞ := max_{i ∈ {1,2,...,n}} |xi|

where (x1, x2, ..., xn) are the coordinates of the vector x.

Example 2. For X = C([0, 1]), the most common examples of norms are as follows:

1. ‖x‖_2 := ( ∫_0^1 |x(t)|^2 dt )^{1/2}
2. ‖x‖_1 := ∫_0^1 |x(t)| dt
3. ‖x‖_∞ := max_{t ∈ [0,1]} |x(t)|

A vector space X equipped with a norm is referred to as a normed vector space. Now, the most common (and calculation-friendly) norm is the 2-norm. This norm is related to the second definition.

Definition 3. Suppose X is a vector space. Then an inner product, denoted ⟨·,·⟩ : X × X → R, satisfies

1. (symmetry) ⟨x, y⟩ = ⟨y, x⟩ for all x, y ∈ X
2. (bi-linearity) ⟨ax + bz, y⟩ = a⟨x, y⟩ + b⟨z, y⟩ for all x, y, z ∈ X and a, b ∈ R
3. (positive definiteness) ⟨x, x⟩ > 0 for all x ≠ 0, and ⟨x, x⟩ = 0 when x = 0.

Example 3. The most common standard examples of inner products are as follows:

1. For X = R^n,

  ⟨x, y⟩ = x · y = x^⊤ y = Σ_{i=1}^n x_i y_i

2. For X = C([0, 1]),

  ⟨x, y⟩ = ∫_0^1 x(t) y(t) dt
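Before continuing, here is a short NumPy sketch of the norms of Example 1 and the inner product of Example 3 for R^n (the vectors are arbitrary; np.linalg.norm implements all three norms):

```python
import numpy as np

x = np.array([3.0, -4.0, 1.0])
y = np.array([1.0,  2.0, 2.0])

norm2 = np.linalg.norm(x, 2)          # (x1^2 + ... + xn^2)^{1/2}
norm1 = np.linalg.norm(x, 1)          # |x1| + ... + |xn|
norminf = np.linalg.norm(x, np.inf)   # max_i |xi|

ip = x @ y                            # standard inner product <x, y> = x^T y
assert np.isclose(norm2, np.sqrt(x @ x))                       # ||x||_2 = <x, x>^{1/2}
assert np.linalg.norm(x + y) <= norm2 + np.linalg.norm(y)      # triangle inequality
```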

In both these cases,

  ‖x‖_2 = ⟨x, x⟩^{1/2}.

A vector space equipped with an inner product ⟨·,·⟩ is called an inner product space. It is denoted (X, ⟨·,·⟩). Other notation is ℓ^2 for R^n and L^2([0,1]) for the function space. Even though we defined the inner product for functions in C([0,1]), it turns out that the space L^2([0,1]) is much larger than C([0,1]). However, that will require us to first learn how to integrate functions that are not continuous! (This is a good motivation to take some classes in the Math department.) The vector space R^n equipped with the standard inner product is called the Euclidean space. This should be familiar: the distance between two points in R^n is the Euclidean distance. Next is the key definition which gives the whole theory its legs:

Definition 4. Two vectors x, y ∈ X are orthogonal if

  ⟨x, y⟩ = 0.

We write x ⊥ y. Should I think of orthogonality as two vectors being perpendicular to each other? Yes, you may. In fact, one can define the cosine of the angle between two vectors based on the following important inequality.

• Cauchy-Schwarz (C-S) inequality:

  |⟨x, y⟩| ≤ ‖x‖_2 ‖y‖_2

The proof of the C-S inequality: Let a := ‖x‖_2 and b := ‖y‖_2. Then

  ⟨ay ± bx, ay ± bx⟩ ≥ 0
  ∴ a^2 ‖y‖_2^2 ± 2ab ⟨x, y⟩ + b^2 ‖x‖_2^2 ≥ 0
  ∴ ±2 ‖x‖_2 ‖y‖_2 ⟨x, y⟩ ≤ 2 ‖x‖_2^2 ‖y‖_2^2
  ∴ |⟨x, y⟩| ≤ ‖x‖_2 ‖y‖_2

(the last step after dividing by 2‖x‖_2 ‖y‖_2 for x, y ≠ 0; the inequality is trivial otherwise). Thanks to the C-S inequality, we may define the angle between two nonzero vectors as

  cos(θ) = ⟨x, y⟩ / ( ‖x‖_2 ‖y‖_2 ).
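Numerically, the C-S inequality and the angle formula look as follows (a sketch with arbitrary random vectors):

```python
import numpy as np

rng = np.random.default_rng(1)
x, y = rng.standard_normal((2, 5))

lhs = abs(x @ y)
rhs = np.linalg.norm(x) * np.linalg.norm(y)
assert lhs <= rhs + 1e-12                      # Cauchy-Schwarz

# |cos(theta)| <= 1 by C-S, so arccos is well defined (clip guards round-off).
cos_theta = (x @ y) / rhs
theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
print("angle between x and y:", theta)
```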

Two vectors being orthogonal means the cosine of the angle between them is zero. That is, the vectors are perpendicular.

• Pythagoras theorem: Suppose x ⊥ y. Then

  ‖x + y‖_2^2 = ‖x‖_2^2 + ‖y‖_2^2


The proof again is very simple:

  ‖x + y‖_2^2 = ⟨x + y, x + y⟩ = ‖x‖_2^2 + 2⟨x, y⟩ + ‖y‖_2^2 = ‖x‖_2^2 + ‖y‖_2^2   if x ⊥ y.

Definition 5. Suppose {e1, e2, ..., en} is a basis of X. The basis is an orthogonal basis if e_i ⊥ e_j whenever i ≠ j. The basis is an orthonormal (ON) basis if also ‖e_i‖_2 = 1. The following tensor notation is often useful to define and manipulate an ON basis:

  ⟨e_i, e_j⟩ = δ_ij   for all 1 ≤ i, j ≤ n,

where

  δ_ij = 1 if i = j, and 0 otherwise.
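In matrix terms, if the vectors e_1, ..., e_n are stacked as the columns of a matrix E, then ⟨e_i, e_j⟩ = δ_ij is exactly the statement E^⊤ E = I. A quick NumPy check on an example ON basis of R^2 (a rotated canonical basis):

```python
import numpy as np

theta = 0.3
E = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # columns: a rotated ON basis of R^2

# (E^T E)_{ij} = <e_i, e_j>, so orthonormality is equivalent to E^T E = I.
assert np.allclose(E.T @ E, np.eye(2))
```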

Verify also that any orthogonal set of vectors {e1, e2, ..., en} is LI. (This is also a homework problem.) Indeed, if Σ_i a_i e_i = 0, then taking an inner product of both sides with e_j shows that a_j = 0 for all j = 1, 2, ..., n.

Proposition 2 (Formulas for coordinates, and Parseval's identity). Suppose {e1, e2, ..., en} is an ON basis of (X, ⟨·,·⟩). Then for x ∈ X,

  x = Σ_i ⟨x, e_i⟩ e_i

and

  ‖x‖_2^2 = Σ_i |⟨x, e_i⟩|^2.

The proof again is elementary. Since {e1, e2, ..., en} is a basis of X and x ∈ X,

  x = Σ_i x_i e_i.

Taking an inner product of both sides with e_j, we arrive at the formula for the coordinates:

  x_j = ⟨x, e_j⟩ / ‖e_j‖_2^2 = ⟨x, e_j⟩   (since ‖e_j‖_2 = 1).

Next,

  ‖x‖_2^2 = ⟨ Σ_i x_i e_i , Σ_j x_j e_j ⟩ = Σ_i Σ_j x_i x_j ⟨e_i, e_j⟩ = Σ_i x_i^2 ‖e_i‖_2^2 = Σ_i |⟨x, e_i⟩|^2.
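A NumPy sketch of the coordinate formula and Parseval's identity (using the same kind of rotated ON basis as above; any ON basis works):

```python
import numpy as np

theta = 0.7
E = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # columns e_1, e_2: an ON basis of R^2

x = np.array([2.0, -5.0])
coords = E.T @ x                      # x_i = <x, e_i> for an ON basis

assert np.allclose(E @ coords, x)               # x = sum_i <x, e_i> e_i
assert np.isclose(np.sum(coords**2), x @ x)     # Parseval: ||x||^2 = sum_i <x, e_i>^2
```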


• Gram-Schmidt orthogonalization procedure: an algorithm to construct a set of n ON vectors {e1, e2, ..., en} starting from a set of n LI vectors {v1, v2, ..., vn}. The algorithm is as follows:

  e1 = v1 / ‖·‖_2
  e2 = ( v2 − ⟨v2, e1⟩ e1 ) / ‖·‖_2
  ...
  e_i = ( v_i − (⟨v_i, e1⟩ e1 + ⟨v_i, e2⟩ e2 + ... + ⟨v_i, e_{i−1}⟩ e_{i−1}) ) / ‖·‖_2
  ...
  e_n = ( v_n − Σ_{i=1}^{n−1} ⟨v_n, e_i⟩ e_i ) / ‖·‖_2

where the denominator in all cases is the normalizing factor (the 2-norm of the numerator). The claim is:

1. {e1, e2, ..., en} is ON; and
2. span{e1, e2, ..., en} = span{v1, v2, ..., vn}.

So, {e1, e2, ..., en} is an ON basis of span{v1, v2, ..., vn}. The proof of the claim is by induction. We have

  ⟨e2, e1⟩ = (1/‖·‖_2) ( ⟨v2, e1⟩ − ⟨v2, e1⟩ ‖e1‖_2^2 ) = 0.

The proof by induction works as follows: assume ⟨e_i, e_j⟩ = 0 for all j < i ≤ k; the same computation, applied to e_{k+1}, then gives ⟨e_{k+1}, e_j⟩ = 0 for all j ≤ k.
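A direct NumPy implementation of the procedure (a sketch: it assumes the columns of V are LI; this "classical" variant can lose orthogonality for nearly dependent inputs, where np.linalg.qr is the stable alternative in practice):

```python
import numpy as np

def gram_schmidt(V):
    """Columns of V are LI vectors v_1,...,v_n; returns ON columns e_1,...,e_n."""
    n = V.shape[1]
    E = np.zeros_like(V, dtype=float)
    for i in range(n):
        u = V[:, i].copy()
        for j in range(i):                          # subtract projections onto e_1..e_{i-1}
            u -= (V[:, i] @ E[:, j]) * E[:, j]
        E[:, i] = u / np.linalg.norm(u)             # normalize
    return E

V = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])                     # three LI vectors in R^3
E = gram_schmidt(V)
assert np.allclose(E.T @ E, np.eye(3))              # claim 1: the e_i are ON
assert np.linalg.matrix_rank(np.hstack([E, V])) == 3  # claim 2: same span (here all of R^3)
```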

• Solving Ax = y: There are three cases to consider:

1. dim(X) = dim(Y): the matrix A is square.
2. dim(X) < dim(Y): the matrix A is tall; there are more equations than unknowns.
3. dim(X) > dim(Y): the matrix A is fat; there are more unknowns than equations.

We consider each of these cases next.

• Case 1: dim(X) = dim(Y). The simplest example is from your elementary school: ax = y.

Proposition 5. Suppose A ∈ R^{n×n}. T.F.A.E.:

1. A is 1-1.
2. A is onto.
3. A has n LI columns.
4. A has n LI rows.
5. det(A) ≠ 0.

In this case, for each y ∈ R^n there exists a unique x ∈ R^n such that Ax = y. We say that the matrix A is invertible and express the solution as x = A^{−1} y. (Why does this notation make sense? Is A^{−1} linear?)

• Case 2: dim(X) < dim(Y). The simplest example is

  (a1, ..., an)^⊤ x = (y1, ..., yn)^⊤,

a single unknown x ∈ R and n equations. From the rank-nullity theorem,

  dim(R(A)) = dim(X) − dim(N(A)) < dim(Y).

So R(A) ⊊ Y, and A cannot be onto. This means we cannot solve the problem exactly for all y ∈ Y. We consider the problem Ax = y. The following theorem describes the least-squares solution of the problem:


Theorem 2. Let A ∈ R^{ℓ×m} where ℓ > m (ℓ is for "long", which is a useful way to remember that the number of equations is larger than the number of unknowns). Define W := A^⊤ A, which is an m × m symmetric matrix. Then:

1. N(A) = N(W).
2. Suppose N(A) = {0}. Then W is invertible and, for a given y ∈ R(A),

  x* := W^{−1} A^⊤ y

solves Ax = y.

3. Now suppose y ∈ R^ℓ is arbitrary (not necessarily in R(A)). Then the "solution" thus defined has the following least-squares property: for all vectors x ∈ R^m,

  |y − Ax|^2 ≥ |y − Ax*|^2

with equality iff x = x*.

Proof. Proof of part 1, N(A) = N(W): The proof is in two parts.

(i) We first show that N(W) ⊂ N(A). Suppose η ∈ N(W). So,

  W η = 0  ⟹  η^⊤ W η = |Aη|^2 = 0  ⟹  η ∈ N(A).

(ii) Actually, the converse also follows from the above, because for symmetric matrices W,

  η^⊤ W η = 0  ⟹  W η = 0.

However, we have not proved this fact, so we take a more direct route. We wish to show N(A) ⊂ N(W). By the closed-range theorem, since N(A) = R(A^⊤)^⊥ and N(W) = R(W)^⊥, this is equivalent to showing R(A^⊤)^⊥ ⊂ R(W)^⊥, which in turn is equivalent to R(W) ⊂ R(A^⊤). Suppose y = W η. Then y = A^⊤ A η, so y = A^⊤ x with x = Aη. That is, y ∈ R(A^⊤). Therefore R(W) ⊂ R(A^⊤), and the result follows.

Proof of part 2: Since y ∈ R(A), there exists an x ∈ R^m such that

  Ax = y.

Multiply both sides by A^⊤ to obtain

  W x = A^⊤ y.

Since W is invertible, the solution is given by

  x = W^{−1} A^⊤ y =: x*.

Proof of part 3: Now suppose y ∈ R^ℓ is arbitrary. Define the following errors:

  e* := y − A x*,   e := y − A x.


Clearly, e* = 0 iff y ∈ R(A); for an arbitrary y ∈ R^ℓ it need not be zero. Subtracting the two errors,

  e − e* = A(x* − x)
  ∴ ⟨e*, e − e*⟩ = ⟨e*, A(x* − x)⟩ = ⟨A^⊤ e*, x* − x⟩ = 0,

where the last equality follows because A^⊤ e* = A^⊤ (y − A x*) = A^⊤ y − W x* = 0. Therefore,

  |e|^2 = |e* + (e − e*)|^2
        = ⟨e* + (e − e*), e* + (e − e*)⟩
        = |e*|^2 + 2⟨e*, e − e*⟩ + |e − e*|^2
        = |e*|^2 + |e − e*|^2
        = |e*|^2 + |A(x − x*)|^2.

Therefore |y − Ax|^2 ≥ |y − Ax*|^2, with equality iff x = x* (since N(A) = {0}).
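Theorem 2's recipe in NumPy (a sketch with an arbitrary random tall matrix; np.linalg.lstsq computes the same minimizer, typically more stably than forming W explicitly):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 3))       # tall: 6 equations, 3 unknowns; N(A) = {0} generically
y = rng.standard_normal(6)            # arbitrary y, typically not in R(A)

W = A.T @ A                            # the m x m symmetric matrix of the theorem
x_star = np.linalg.solve(W, A.T @ y)   # x* = W^{-1} A^T y  (the normal equations)

# Agreement with NumPy's built-in least-squares solver:
assert np.allclose(x_star, np.linalg.lstsq(A, y, rcond=None)[0])

# Least-squares property: any other x gives at least as large an error.
x_other = x_star + np.array([0.1, -0.2, 0.05])
assert np.linalg.norm(y - A @ x_other) >= np.linalg.norm(y - A @ x_star)
```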

• Case 3: dim(X) > dim(Y). The simplest example is

  (a1 ... an) (x1, ..., xn)^⊤ = y,

a single equation in n unknowns. From the rank-nullity theorem,

  dim(N(A)) = dim(X) − dim(R(A)) ≥ dim(X) − dim(Y) > 0.

So N(A) is non-trivial, and thus A cannot be 1-1. This means the solution is never unique. We consider the problem Ax = y. The following theorem describes its minimum-norm solution:

Theorem 3. Let A ∈ R^{n×ℓ} where ℓ > n (ℓ is for "long", which is a useful way to remember that the number of unknowns is larger than the number of equations). Define W := A A^⊤, which is an n × n symmetric matrix. Then:

1. R(A) = R(W).
2. Suppose R(A) = R^n. Then W is invertible and, for any arbitrary y ∈ R^n,

  x = A^⊤ W^{−1} y =: x*

solves Ax = y.

3. The solution thus defined is minimum-norm in the following sense: if x is another solution such that Ax = y, then

  |x|^2 ≥ |x*|^2

with equality iff x = x*.


Proof. Proof of part 1, R(A) = R(W): The proof is in two parts.

(i) We first show that R(W) ⊂ R(A). If y ∈ R(W), then there exists η such that

  W η = y
  ∴ A A^⊤ η = y
  ∴ Ax = y   with x = A^⊤ η.

So y ∈ R(A).

(ii) We next show R(A) ⊂ R(W). Using the closed-range theorem,

  R(A) = N(A^⊤)^⊥,   R(W) = N(W)^⊥   (∵ W is symmetric).

So we need to show N(A^⊤)^⊥ ⊂ N(W)^⊥, which is true iff N(W) ⊂ N(A^⊤). Suppose η ∈ N(W). Then

  0 = η^⊤ W η = η^⊤ A A^⊤ η = |A^⊤ η|^2,

which implies η ∈ N(A^⊤).

Proof of part 2, the particular solution: Suppose R(A) = R^n. From part 1, R(W) = R^n. So W : R^n → R^n is onto. Therefore W is invertible and, given y ∈ R^n, there exists a unique η ∈ R^n such that

  W η = y
  ∴ A A^⊤ η = y
  ∴ Ax = y   with x = A^⊤ η.

That is, x = A^⊤ W^{−1} y solves Ax = y. We have thus found a particular solution of the problem Ax = y. Since N(A) is non-trivial, this solution is not unique. In fact, the general solution is x = A^⊤ W^{−1} y + z where z ∈ N(A). (Why?)

Proof of part 3, the minimum-norm property of the particular solution: Let us denote the particular solution x = A^⊤ η =: x*. Let x be another solution, so that

  A x* = y   and   A x = y,

and hence A(x* − x) = 0.

  ∴ ⟨η, A(x* − x)⟩ = 0
  ∴ ⟨A^⊤ η, x* − x⟩ = ⟨x*, x* − x⟩ = 0.

We thus have

  |x|^2 = |x* + (x − x*)|^2
        = ⟨x* + (x − x*), x* + (x − x*)⟩
        = |x*|^2 + 2⟨x*, x − x*⟩ + |x − x*|^2
        = |x*|^2 + |x − x*|^2.

Therefore |x|^2 ≥ |x*|^2, with equality iff x = x*.
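Likewise, Theorem 3's recipe in NumPy (a sketch with an arbitrary random fat matrix; np.linalg.pinv(A) @ y yields the same x*):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 6))            # fat: 3 equations, 6 unknowns; R(A) = R^3 generically
y = rng.standard_normal(3)

W = A @ A.T                                 # the n x n symmetric matrix of the theorem
x_star = A.T @ np.linalg.solve(W, y)        # x* = A^T W^{-1} y
assert np.allclose(A @ x_star, y)           # x* solves Ax = y
assert np.allclose(x_star, np.linalg.pinv(A) @ y)

# A vector z in N(A), taken from the SVD, exhibits a longer solution x* + z:
_, _, Vt = np.linalg.svd(A)
z = Vt[-1]                                  # last right singular vector lies in N(A)
assert np.allclose(A @ z, 0.0, atol=1e-10)
assert np.linalg.norm(x_star + z) >= np.linalg.norm(x_star)
```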


That is, choosing a different basis may lead to a new matrix for the same underlying operator A. In most settings, the matrix A is with respect to the canonical bases. Notationally, we often write x instead of [x] and A instead of [A]. At this point, the reader will also benefit from quickly reviewing the change of basis in sub-part I. In class, we will draw some diagrams which help one remember the formulas for the matrix representation under a change of basis: if [x] = [P][x̄] and [y] = [Q][ȳ], where (recall) [P] and [Q] are the coordinate transformation matrices between two bases for X and Y respectively, then

  [ȳ] = [Q]^{−1} [A] [P] [x̄].

This is quite a mouthful, but we will draw some diagrams so that it is easy to remember. We will do some neat examples as well. Did you know that the Laplace transform is also an example of this?

• Range and null space: We return to the problem Ax = y. Now there are three questions of interest:

1. Existence. For what y ∈ Y does a solution exist?
2. Uniqueness. Suppose a solution exists. When is it unique?
3. Computation. How do I find the solution?

We will make progress on each of these questions.

Definition 7. Suppose A : X → Y. Then the range space is

  Range(A) = R(A) := {y ∈ Y | ∃ x ∈ X s.t. Ax = y}

and the null space is Null(A) = N(A)...
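These three questions can be probed numerically for matrices: rank(A) = dim(R(A)) governs existence, and rank-nullity gives dim(N(A)), which governs uniqueness. A small sketch with an arbitrary rank-deficient example:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])               # rank 1: R(A) is a line in R^2

rank = np.linalg.matrix_rank(A)               # dim R(A)
nullity = A.shape[1] - rank                   # dim N(A), by rank-nullity
print(f"dim R(A) = {rank}, dim N(A) = {nullity}")

# Existence: y is in R(A) iff appending y as a column does not raise the rank.
y = np.array([1.0, 2.0])
exists = np.linalg.matrix_rank(np.column_stack([A, y])) == rank
print("solution exists for this y:", exists)  # True here, e.g. x = (1, 0, 0)

# Uniqueness: nullity > 0, so any solution is non-unique (add anything in N(A)).
```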


Similar Free PDFs