Principles of Digital Communication Solutions


SOLUTION MANUAL for PRINCIPLES OF DIGITAL COMMUNICATION by ROBERT G. GALLAGER Cambridge University Press, 2008

Emre Koksal, Tengo Saengudomlert, Shan-Yuan Ho, Manish Bhardwaj, Ashish Khisti, Etty Lee, and Emmanuel Abbe were the Teaching Assistants in Subject 6.450, Principles of Digital Communication, at M.I.T. in the years over which this book was written. They all made important contributions to the evolution of many of the solutions in this manual. This final edited version, however, has been prepared by the author, who takes responsibility for any residual errors.


Principles of Digital Communication: Solutions to Exercises, Chapter 2

Exercise 2.1: What follows is one way of thinking about the problem. It is definitely not the only way; the important point in this question is to realize that we abstract complex physical phenomena using simplified models, and that the choice of model is governed by our objective. Speech encoder/decoder pairs try to preserve not only the recognizability of words but also a host of speaker-dependent information such as pitch and intonation. If we do not care about any speaker-specific characteristics of speech, the problem is essentially that of coding the English text that the speaker is producing. Hence, to estimate the rate (in bits per second) that a good source encoder would need, we must estimate two quantities. The first is the rate, in English letters per second, that normal speakers achieve. The second is the average number of bits needed per English letter. Rough estimates of the former, which can be made by reading this solution with a stopwatch, are 15-20 English letters per second. A simple code for 26 letters and a space would need log2 27 ≈ 4.75 bits per English letter. By employing more sophisticated models of the dependencies in the English language, researchers estimate that one could probably do with as few as 1.34 bits per letter. Hence we could envision source coders that achieve roughly 20 bits per second (assuming 15 letters per second and 1.34 bits per letter), which is considerably lower than what the best speech encoders today achieve!

Exercise 2.2: (a) V + W is a random variable, so its expectation, by definition, is

\[
\begin{aligned}
E[V+W] &= \sum_{v\in V}\sum_{w\in W} (v+w)\,p_{VW}(v,w) \\
&= \sum_{v\in V}\sum_{w\in W} v\,p_{VW}(v,w) + \sum_{v\in V}\sum_{w\in W} w\,p_{VW}(v,w) \\
&= \sum_{v\in V} v \sum_{w\in W} p_{VW}(v,w) + \sum_{w\in W} w \sum_{v\in V} p_{VW}(v,w) \\
&= \sum_{v\in V} v\,p_V(v) + \sum_{w\in W} w\,p_W(w) \\
&= E[V] + E[W].
\end{aligned}
\]
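As a quick numerical illustration of part (a), the following Python sketch (ours, not part of the manual) tabulates an arbitrary small joint pmf and checks that the two sides of the identity agree; note that no independence is required:

```python
# Numerical check of E[V + W] = E[V] + E[W] for an arbitrary joint pmf.
# The joint pmf below is just an illustrative example (V and W are dependent).
joint_pmf = {   # (v, w) -> p_VW(v, w)
    (0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 2): 0.4,
}

E_sum = sum((v + w) * p for (v, w), p in joint_pmf.items())

# Marginals p_V and p_W obtained by summing the joint pmf over the other variable.
pV, pW = {}, {}
for (v, w), p in joint_pmf.items():
    pV[v] = pV.get(v, 0.0) + p
    pW[w] = pW.get(w, 0.0) + p

E_V = sum(v * p for v, p in pV.items())
E_W = sum(w * p for w, p in pW.items())

print(E_sum, E_V + E_W)                    # both are 1.7
assert abs(E_sum - (E_V + E_W)) < 1e-12
```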

(b) Once again, working from first principles,

\[
\begin{aligned}
E[V \cdot W] &= \sum_{v\in V}\sum_{w\in W} (v \cdot w)\,p_{VW}(v,w) \\
&= \sum_{v\in V}\sum_{w\in W} (v \cdot w)\,p_V(v)\,p_W(w) \qquad \text{(using independence)} \\
&= \Big(\sum_{v\in V} v\,p_V(v)\Big)\Big(\sum_{w\in W} w\,p_W(w)\Big) \\
&= E[V] \cdot E[W].
\end{aligned}
\]

(c) To discover a case where E[V · W] ≠ E[V] · E[W], first try the simplest kind of example where V and W are binary with the joint pmf pVW(0, 1) = pVW(1, 0) = 1/2;

pV W (0, 0) = pV W (1, 1) = 0.

Clearly, V and W are not independent. Also, E[V · W ] = 0 whereas E[V ] = E[W ] = 1/2 and hence E[V ] · E[W ] = 1/4.

The second case requires some experimentation. One approach is to choose a joint distribution such that E[V · W ] = 0 and E[V ] = 0. A simple solution is then given by the pmf, pV W (−1, 0) = pV W (0, 1) = pV W (1, 0) = 1/3.

Again, V and W are not independent. Clearly, E[V · W] = 0. Also, E[V] = 0 (what is E[W]?). Hence, E[V · W] = E[V] · E[W].

(d) Since V and W are independent, E[V · W] = E[V] · E[W] by part (b), so

\[
\begin{aligned}
\sigma_{V+W}^2 &= E[(V+W)^2] - (E[V+W])^2 \\
&= E[V^2] + E[W^2] + E[2V \cdot W] - (E[V] + E[W])^2 \\
&= E[V^2] + E[W^2] + 2E[V]\cdot E[W] - E[V]^2 - E[W]^2 - 2E[V]\cdot E[W] \\
&= E[V^2] - E[V]^2 + E[W^2] - E[W]^2 \\
&= \sigma_V^2 + \sigma_W^2.
\end{aligned}
\]
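Both counterexamples in part (c), and the variance identity in part (d), are easy to verify numerically. The sketch below is ours (the independent pmfs used to check part (d) are an arbitrary choice):

```python
def moments(joint_pmf):
    """Return (E[VW], E[V], E[W]) for a joint pmf given as {(v, w): prob}."""
    E_VW = sum(v * w * p for (v, w), p in joint_pmf.items())
    E_V = sum(v * p for (v, w), p in joint_pmf.items())
    E_W = sum(w * p for (v, w), p in joint_pmf.items())
    return E_VW, E_V, E_W

# First example: V, W dependent and E[VW] = 0 while E[V]E[W] = 1/4.
pmf1 = {(0, 1): 0.5, (1, 0): 0.5}
print(moments(pmf1))                       # (0.0, 0.5, 0.5)

# Second example: V, W dependent, yet E[VW] = E[V]E[W] = 0 (E[W] = 1/3, E[V] = 0).
pmf2 = {(-1, 0): 1/3, (0, 1): 1/3, (1, 0): 1/3}
E_VW, E_V, E_W = moments(pmf2)
assert E_VW == 0 and E_V * E_W == 0

# Part (d): for independent V and W, Var(V + W) = Var(V) + Var(W).
pV = {0: 0.3, 1: 0.7}
pW = {0: 0.6, 2: 0.4}
prod_pmf = {(v, w): pv * pw for v, pv in pV.items() for w, pw in pW.items()}

def variance(pmf, f):
    mean = sum(f(x) * p for x, p in pmf.items())
    return sum((f(x) - mean) ** 2 * p for x, p in pmf.items())

var_sum = variance(prod_pmf, lambda vw: vw[0] + vw[1])
assert abs(var_sum - (variance(pV, lambda v: v) + variance(pW, lambda w: w))) < 1e-12
print("Var(V+W) =", var_sum)               # 1.17 = 0.21 + 0.96
```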

Exercise 2.3: (a) This is a frequently useful form for the expectation of a non-negative integer rv. Expressing Pr(N ≥ n) as p_n + p_{n+1} + · · ·,

\[
\sum_{n=1}^{\infty} \Pr(N \ge n)
= (p_1 + p_2 + p_3 + p_4 + \cdots) + (p_2 + p_3 + p_4 + \cdots) + (p_3 + p_4 + \cdots) + \cdots
= p_1 + 2p_2 + 3p_3 + \cdots
= \sum_{n=1}^{\infty} n\,p_n = E[N].
\]
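The tail-sum formula is easy to spot-check numerically. The sketch below is ours; it uses a geometric pmf truncated at a large n_max (an arbitrary example) so that both sides are finite sums:

```python
# Check E[N] = sum_{n>=1} Pr(N >= n) for a geometric pmf p_n = (1-q) q^(n-1),
# truncated at a large n_max so that both sides are finite sums.
q, n_max = 0.6, 200
pmf = {n: (1 - q) * q ** (n - 1) for n in range(1, n_max + 1)}

E_direct = sum(n * p for n, p in pmf.items())
E_tail = sum(sum(p for m, p in pmf.items() if m >= n) for n in range(1, n_max + 1))

print(E_direct, E_tail)   # both are (very close to) 1/(1 - q) = 2.5
assert abs(E_direct - E_tail) < 1e-9
```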


(b) Viewing the integral as a limiting sum,

\[
\int_0^{\infty} \Pr(X \ge a)\,da
= \lim_{\epsilon \to 0}\, \sum_{n>0} \Pr(X \ge n\epsilon)\,\epsilon
= \lim_{\epsilon \to 0}\, E\big[\,\epsilon \lfloor X/\epsilon \rfloor\,\big]
= E[X].
\]

(c) [Figure: Pr(X ≥ x) plotted versus x, with a shaded rectangle of width a and height Pr(X ≥ a); its area a Pr(X ≥ a) lies under the curve.] Since the rectangle's area is at most the total area under the curve,

\[
a \Pr(X \ge a) \;\le\; \int_0^{\infty} \Pr(X \ge x)\,dx \;=\; E[X].
\]

(d) As the hint says, let X = (Y − E[Y])². Then for any a ≥ 0,

\[
\Pr\big\{(Y - E[Y])^2 \ge a\big\} \;\le\; \frac{E[(Y-E[Y])^2]}{a} \;=\; \frac{\sigma_Y^2}{a}.
\]

Letting b = √a completes the derivation.
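Parts (c) and (d) are the Markov and Chebyshev bounds; they can be spot-checked on any small pmf. The following sketch (ours; the pmf and thresholds are arbitrary examples) does so:

```python
# Spot-check the Markov bound Pr(X >= a) <= E[X]/a and the Chebyshev bound
# Pr(|Y - E[Y]| >= b) <= sigma_Y^2 / b^2 on a small example pmf.
pmf = {0: 0.5, 1: 0.25, 2: 0.125, 4: 0.125}   # a non-negative rv X (= Y here)

E = sum(x * p for x, p in pmf.items())
var = sum((x - E) ** 2 * p for x, p in pmf.items())

for a in (1.0, 2.0, 3.0):
    tail = sum(p for x, p in pmf.items() if x >= a)
    assert tail <= E / a + 1e-12, (a, tail, E / a)

for b in (0.5, 1.0, 2.0):
    dev = sum(p for x, p in pmf.items() if abs(x - E) >= b)
    assert dev <= var / b ** 2 + 1e-12, (b, dev, var / b ** 2)

print("Markov and Chebyshev bounds hold; E[X] =", E, "Var =", var)
```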

Exercise 2.4: (a) Since X1 and X2 are iid, symmetry implies that Pr(X1 > X 2 ) = Pr(X2 > X1 ). These two events are mutually exclusive and the event X1 = X2 has 0 probability. Thus Pr(X1 > X 2 ) and Pr(X1 < X2 ) sum to 1, so must each be 1/2. Thus Pr(X1 ≥ X2 ) = Pr(X2 ≥ X1 ) = 1/2. (b) Invoking the symmetry among X1 , X2 and X3 , we see that each has the same probability of being the smallest of the three (the probability of a tie is 0). These three events are mutually exclusive and their probabilities must add up to 1. Therefore each event occurs with probability 1/3.

(c) The event {N > n} is the same as the event {X1 is the minimum among the n iid random variables X1, X2, · · · , Xn}. By extending the argument in part (b), we see that Pr(X1 is the smallest of X1, . . . , Xn) = 1/n. Finally, Pr{N ≥ n} = Pr{N > n − 1} = 1/(n − 1) for n ≥ 2.

(d) Since N is a non-negative integer random variable (taking on values from 2 to ∞), we can use Exercise 2.3(a) as follows:

\[
\begin{aligned}
E[N] &= \sum_{n=1}^{\infty} \Pr\{N \ge n\}
= \Pr\{N \ge 1\} + \sum_{n=2}^{\infty} \Pr\{N \ge n\} \\
&= 1 + \sum_{n=2}^{\infty} \frac{1}{n-1}
= 1 + \sum_{n=1}^{\infty} \frac{1}{n}.
\end{aligned}
\]

Since the series \(\sum_{n=1}^{\infty} 1/n\) diverges, we conclude that E[N] = ∞.
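For intuition, Pr(N ≥ n) = 1/(n − 1) from part (c) can be checked by simulation. The Monte Carlo sketch below is ours; the uniform distribution is an arbitrary choice, since only the relative order of the Xi matters for a continuous distribution:

```python
import random

def sample_N():
    """Smallest n >= 2 such that X_n < X_1, for iid uniform X_1, X_2, ..."""
    x1 = random.random()
    n = 2
    while random.random() >= x1:   # each test draws a fresh X_n
        n += 1
    return n

random.seed(0)
trials = 100_000
samples = [sample_N() for _ in range(trials)]

for n in (2, 3, 5, 10):
    est = sum(1 for s in samples if s >= n) / trials
    print(f"Pr(N >= {n}) ~ {est:.4f}   (theory: {1 / (n - 1):.4f})")

# The sample mean keeps growing as 'trials' increases (it is dominated by the
# rare, very large values of N), consistent with E[N] being infinite.
print("sample mean of N:", sum(samples) / trials)
```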

(e) Since the alphabet has a finite number of letters,¹ Pr(X1 = X2) is no longer 0 and depends on the particular probability distribution. Thus, although Pr(X1 ≥ X2) = Pr(X2 ≥ X1) by symmetry, neither can be found without knowing the distribution. Out of the alphabet letters with nonzero probability, let amin be a letter of minimum numeric value. If X1 = amin, then no subsequent rv X2, X3, . . . can have a smaller value, so N = ∞ in this case. Since the event X1 = amin occurs with positive probability, E[N] = ∞.

Exercise 2.5: First observe that for any n ≥ 1, the number of binary n-tuples with an even number of 1's is equal to the number of binary n-tuples with an odd number of 1's. To see this, note that for each sequence (x1, x2, · · · , xn−1, 0), there corresponds a unique sequence (x1, x2, · · · , xn−1, 1). One of these two sequences has an odd number of 1's while the other has an even number of 1's. Since all 2^n n-sequences are equally likely, the probability of an even number of ones then equals the probability of an odd number, i.e.,

\[
\Pr(X_1 \oplus X_2 \oplus \cdots \oplus X_n = 0) = \Pr(X_1 \oplus X_2 \oplus \cdots \oplus X_n = 1) = \frac{1}{2}.
\]

(a) Since Z = X1 ⊕ · · · ⊕ Xn, this shows that the binary random variable Z takes on the values 0 and 1 with equal probability. Now, conditional on X1 = 0, Z will be 0 if X2 ⊕ · · · ⊕ Xn = 0, which, from above (using X2, . . . , Xn in place of X1, . . . , Xn−1), has probability 1/2. Similarly, conditional on X1 = 1, Z again takes on the values 0 and 1 with probability 1/2 each. Thus Z is independent of X1.

(b) Given that X1 = x1, X2 = x2, · · · , Xn−1 = xn−1, Z is equal to 0 if Xn = x1 ⊕ x2 ⊕ · · · ⊕ xn−1. Since Xn is independent of X1, . . . , Xn−1, and is equiprobably 0 or 1, it follows that Z = 0 under this conditioning with probability 1/2 and Z = 1 with probability 1/2. Since this is true for all choices of x1, . . . , xn−1, it follows that Z is independent of X1, . . . , Xn−1.

(c) Z, X1, X2, · · · , Xn are NOT statistically independent, and in fact, Z is uniquely determined by X1, . . . , Xn.

(d) Z is NOT statistically independent of X1 if Pr(Xi = 1) = p ≠ 1/2. We demonstrate this for n = 2. The conditional PMF pZ|X1(z|x1) for x1 = 0 and 1 is pZ|X1(0|0) = Pr(X2 = 0) = 1 − p and pZ|X1(0|1) = Pr(X2 = 1) = p. Since the conditional PMF for Z depends on X1, Z and X1 are statistically dependent.

The purpose of this question is to show that in a group of n+1 random variables, pairwise statistical independence (part (a)) does not imply statistical independence of all the random variables (part (c)), even with the statistical independence of groups of n random variables in part (b).

¹ The same results can be obtained with some extra work for a countably infinite discrete alphabet.
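Parts (a)-(c) can be verified exhaustively for a small n. The sketch below (ours) enumerates all equally likely binary n-tuples for n = 3 and checks that Z is independent of each Xi individually and of (X1, . . . , Xn−1) jointly, while being completely determined by X1, . . . , Xn:

```python
from itertools import product

n = 3
tuples = list(product([0, 1], repeat=n))      # all 2^n equally likely outcomes

def parity(bits):
    z = 0
    for b in bits:
        z ^= b
    return z

# (a) Z is independent of each X_i: Pr(Z = 0 | X_i = x) = 1/2 for every i, x.
for i in range(n):
    for x in (0, 1):
        cond = [t for t in tuples if t[i] == x]
        p_z0 = sum(1 for t in cond if parity(t) == 0) / len(cond)
        assert p_z0 == 0.5

# (b) Z is independent of (X_1, ..., X_{n-1}) jointly.
for prefix in product([0, 1], repeat=n - 1):
    cond = [t for t in tuples if t[:n - 1] == prefix]
    p_z0 = sum(1 for t in cond if parity(t) == 0) / len(cond)
    assert p_z0 == 0.5

# (c) ... but Z is a deterministic function of (X_1, ..., X_n):
# conditioned on the full tuple, Z has no randomness left.
print("pairwise and (n-1)-wise independence hold; Z = X1 xor ... xor Xn exactly")
```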


Exercise 2.6: (a) Assume the contrary; i.e., there is a suffix-free code that is not uniquely decodable. Then that code must contain two distinct sequences of source letters, say, x1, x2, . . . , xn and x′1, x′2, . . . , x′m, such that

\[
C(x_1)C(x_2)\cdots C(x_n) = C(x'_1)C(x'_2)\cdots C(x'_m).
\]

Then one of the following must hold:

• C(xn) = C(x′m)
• C(xn) is a suffix of C(x′m)
• C(x′m) is a suffix of C(xn).

In the last two cases we arrive at a contradiction since the code is hypothesized to be suffix-free. In the first case, xn must equal x′m because of the suffix freedom. Simply delete that final letter from each sequence and repeat the argument. Since the sequences are distinct, the final letter must differ after some number of repetitions of the above argument, and at that point one of the latter two cases holds and a contradiction is reached. Hence, suffix-free codes are uniquely decodable. (b) Any prefix-free code becomes a suffix-free code if the ordering of symbols in each codeword is reversed. About the simplest such example is {0,01,11} which can be seen to be a suffix-free code (with codeword lengths {1, 2, 2}) but not a prefix-free code.

A codeword in the above code cannot be decoded as soon as its last bit arrives at the decoder. To illustrate a rather extreme case, consider the following output produced by the encoder, 0111111111 . . .

Assuming that source letters {a,b,c} map to {0,01,11}, we cannot distinguish between the two possible source sequences, acccccccc . . . and bcccccccc . . . , till the end of the string is reached. Hence, in this case the decoder might have to wait for an arbitrarily long time before decoding. (c) There cannot be any code with codeword lengths (1, 2, 2) that is both prefix free and suffix free. Without loss of generality, set C1 = 0. Then a prefix-free code cannot use either the codewords 00 and 01 for C2 or C3 , and thus must use 10 and 11, which is not suffix free. Exercise 2.7: Consider the set of codeword lengths (1,2,2) and arrange them as (2,1,2). Then, u1 =0 is represented as 0.00. Next, u2 = 1/4 = 0.01 must be represented using 1 bit after the binary point, which is not possible. Hence, the algorithm fails.
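Returning to the suffix-free code {0, 01, 11} of Exercise 2.6(b): since reversing each codeword yields a prefix-free code, one way to decode (ours, not necessarily what the manual intends) is to scan the received bits from the end, which makes the "wait until the end of the string" behaviour concrete:

```python
# Suffix-free code from Exercise 2.6(b): a -> 0, b -> 01, c -> 11.
code = {"a": "0", "b": "01", "c": "11"}

# Reversing every codeword gives a prefix-free code, so we can decode the
# reversed bit string greedily and then reverse the resulting letter sequence.
rev_code = {cw[::-1]: letter for letter, cw in code.items()}

def decode_suffix_free(bits):
    letters, buf = [], ""
    for bit in reversed(bits):
        buf += bit
        if buf in rev_code:      # greedy match works because rev_code is prefix-free
            letters.append(rev_code[buf])
            buf = ""
    if buf:
        raise ValueError("bit string is not a concatenation of codewords")
    return "".join(reversed(letters))

print(decode_suffix_free("0" + "11" * 4))   # 'a' followed by four 'c's -> "acccc"
print(decode_suffix_free("01" + "11" * 4))  # 'b' followed by four 'c's -> "bcccc"
```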


Exercise 2.8: The Kraft inequality for D-ary alphabets can be stated thus: Every prefix-free code for an alphabet X with codeword lengths {ℓ(x), x ∈ X} satisfies

\[
\sum_{x\in X} D^{-\ell(x)} \le 1.
\]

Conversely, if the inequality is satisfied, then a prefix-free code with lengths {ℓ(x)} exists.
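Before turning to the proof, here is a short computational illustration (ours, not part of the manual): it evaluates the D-ary Kraft sum for a list of lengths and, when the sum is at most 1, builds a prefix-free code by packing intervals of width D^(−l) from left to right, in the spirit of the construction referred to below. The function names and the ternary example are ours.

```python
from fractions import Fraction

def kraft_sum(lengths, D):
    """Exact value of sum_x D**(-l(x)) for the given codeword lengths."""
    return sum(Fraction(1, D ** l) for l in lengths)

def construct_prefix_free_code(lengths, D):
    """Build a D-ary prefix-free code with the given lengths (if possible).

    Lengths are processed in increasing order; codeword k is the first l_k
    base-D digits of the left end of the next unused interval of width D**(-l_k).
    """
    if kraft_sum(lengths, D) > 1:
        raise ValueError("Kraft inequality violated: no prefix-free code exists")
    L = max(lengths)
    start = 0                               # left interval end, in units of D**(-L)
    code = []
    for l in sorted(lengths):
        v = start // D ** (L - l)           # value of the first l base-D digits
        digits = []
        for _ in range(l):                  # expand v into exactly l base-D digits
            digits.append(v % D)
            v //= D
        code.append(tuple(reversed(digits)))
        start += D ** (L - l)               # this interval of width D**(-l) is now used
    return code

if __name__ == "__main__":
    lengths, D = [1, 2, 2, 2, 3, 3], 3      # a ternary example
    print("Kraft sum:", kraft_sum(lengths, D))    # 20/27 <= 1
    codewords = ["".join(map(str, c)) for c in construct_prefix_free_code(lengths, D)]
    print(codewords)                        # ['0', '10', '11', '12', '200', '201']
    assert not any(a != b and b.startswith(a) for a in codewords for b in codewords)
```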

The key is to identify the critical steps that make the proof work in the binary case and to recognize that they can always be extended to the D-ary case:

• If we associate the number 0.y1y2 . . . yn = \(\sum_{i=1}^{n} y_i D^{-i}\) with the codeword y1y2 . . . yn, then all codewords that begin with y1, y2, . . . , yn lie in the interval [0.y1y2 . . . yn, 0.y1y2 . . . yn + D^{−n}).
• Since the code is prefix-free, no two intervals can overlap.
• Since the intervals are disjoint and all contained in [0, 1), the sum of all the interval lengths is less than or equal to one.

Hence, every prefix-free code for a D-ary alphabet must satisfy the generalized Kraft inequality stated above. We can similarly prove the converse construction, by essentially mimicking the code construction algorithm outlined at the end of Section 2.3.3.

Exercise 2.9: (a) Assume, as usual, that pj > 0 for each j. From Eqs. (2.8) and (2.9),

\[
H[X] - L = \sum_{j=1}^{M} p_j \log \frac{2^{-l_j}}{p_j}
\;\le\; \sum_{j=1}^{M} p_j \left(\frac{2^{-l_j}}{p_j} - 1\right) \log e
\;=\; \Big(\sum_{j=1}^{M} 2^{-l_j} - 1\Big) \log e \;\le\; 0,
\]

where the final inequality is the Kraft inequality.

As is evident from Figure 2.7, the inequality is strict unless 2^{−l_j} = p_j for each j. Thus if H[X] = L, it follows that 2^{−l_j} = p_j for each j.

(b) First consider Figure 2.4, repeated below, assuming that Pr(a) = 1/2 and Pr(b) = Pr(c) = 1/4. The first-order node 0 corresponds to the letter a and has probability 1/2. The first-order node 1 corresponds to the occurrence of either letter b or c, and thus has probability 1/2.

[Figure 2.4, repeated: the binary code tree with first-order nodes 0 (letter a) and 1 (letter b or c), whose extensions lead to the leaves aa, ab, ac, ba, bb, bc, ca, cb, cc. The code is: aa → 00, ab → 011, ac → 010, ba → 110, bb → 1111, bc → 1110, ca → 100, cb → 1011, cc → 1010.]

Similarly, the second-order node 00 corresponds to aa, which has probability 1/4, and the second-order node 01 corresponds to either ab or ac, which have cumulative probability 1/4. In the same way, 10 and 11 correspond to b and c, with probabilities 1/4 each. One can proceed with higher-order nodes in the same way, but what is the principle behind this?


In general, when an infinite binary tree is used to represent an unending sequence of letters from an iid source where each letter j has probability p_j = 2^{−ℓ_j} and codeword length ℓ_j, we see that each node corresponding to an initial sequence of letters x1, . . . , xn has a probability \(\prod_i 2^{-\ell_{x_i}}\) equal to the product of the individual letter probabilities, and an order equal to \(\sum_i \ell_{x_i}\). Thus each node labeled by a subsequence of letters has a probability 2^{−ℓ}, where ℓ is the order of that node. The other nodes (those unlabeled in the example above) have a probability equal to the sum of the immediately following labeled nodes. This probability is again 2^{−ℓ} for an ℓth-order node, which can be established by induction if one wishes to be formal.

Exercise 2.10: (a) Since the code satisfies the Kraft inequality with equality, the corresponding tree is full. Now let ℓmax be the length of the longest codeword. Instead of asking how large ℓmax can be for a given M, we ask how small M can be for a given ℓmax. It is easy to see, by drawing a few trees, that for ℓmax = 2, M ≥ 3; similarly, for ℓmax = 3, M ≥ 4. This leads us to hypothesize that for arbitrary ℓmax, we have M ≥ ℓmax + 1, achieved by the tree in the figure below:

[Figure: a full binary code tree with M = ℓmax + 1 leaves, labeled a, b, c, d, e, having codeword lengths 1, 2, 3, 4, 4 (so ℓmax = 4 and M = 5).]

To verify this, note that the sibling of each intermediate node (including the root) on the path to a given codeword of length ℓmax must either be a single codeword or a subtree with multiple codewords. Thus M ≥ ℓmax + 1 and consequently ℓmax ≤ M − 1. Since there are a finite number of distinct binary sequences of length M − 1 or less, there are finitely many full code trees with M codewords (in the next part, we derive how many such codes there are).

(b) It is obvious that there is only one full tree with 2 leaf nodes. Thus S(2) = 1. Next we use induction to calculate the number of full binary code trees with M > 2. For given M, first consider the number of trees in which j codewords start with 0 and M − j start with 1 (1 ≤ j ≤ M − 1). Since the set of codewords starting with 0 must form a full code tree with the initial 0 omitted, there are S(j) such trees. As seen above, S(2) = 1, and trivially S(1) = 1. Similarly there are S(M − j) different trees starting with 1. Thus for given j, there are S(j)S(M − j) code trees of M codewords for which j codewords start with 0. It follows that

\[
S(M) = \sum_{j=1}^{M-1} S(j)\,S(M-j).
\]
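Evaluating this recursion directly (a short sketch of ours) reproduces the small values quoted next; the numbers S(1), S(2), S(3), . . . are 1, 1, 2, 5, 14, . . . , i.e., the Catalan numbers:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def S(M):
    """Number of full binary code trees with M codewords (leaves)."""
    if M <= 2:
        return 1                           # S(1) = S(2) = 1
    return sum(S(j) * S(M - j) for j in range(1, M))

print([S(M) for M in range(1, 8)])         # [1, 1, 2, 5, 14, 42, 132]
```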

One can check the formula for small values of M, getting S(3) = 2, S(4) = 5, etc.

Exercise 2.11: (a) For n = 2,

\[
\Big(\sum_{j=1}^{M} 2^{-l_j}\Big)^{2}
= \Big(\sum_{j_1=1}^{M} 2^{-l_{j_1}}\Big)\Big(\sum_{j_2=1}^{M} 2^{-l_{j_2}}\Big)
= \sum_{j_1=1}^{M} \sum_{j_2=1}^{M} 2^{-(l_{j_1}+l_{j_2})}.
\]

The same approach works for arbitrary n.

(b) Each source n-tuple x^n = (a_{j_1}, a_{j_2}, . . . , a_{j_n}) is encoded into a concatenation C(a_{j_1})C(a_{j_2}) . . . C(a_{j_n}) of binary digits of aggregate length l(x^n) = l_{j_1} + l_{j_2} + · · · + l_{j_n}. Since there is one n-tuple x^n for each choice of a_{j_1}, a_{j_2}, . . . , a_{j_n}, the result of part (a) can be rewritten as

\[
\Big(\sum_{j=1}^{M} 2^{-l_j}\Big)^{n} = \sum_{x^n} 2^{-l(x^n)}. \tag{1}
\]

(c) Rewriting (1) in terms of the number A_i of concatenations of n codewords of aggregate length i,

\[
\Big(\sum_{j=1}^{M} 2^{-l_j}\Big)^{n} = \sum_{i=1}^{n\,l_{\max}} A_i\, 2^{-i}.
\]

This uses the fact that since each codeword has length at most lmax, each concatenation has length at most n·lmax.

(d) From unique decodability, each of these concatenations must be different, so there are at most 2^i concatenations of aggregate length i, i.e., A_i ≤ 2^i. Thus, since the above sum contains at most n·lmax terms,

\[
\Big(\sum_{j=1}^{M} 2^{-l_j}\Big)^{n} \le n\, l_{\max}. \tag{2}
\]
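The identity in part (c) and the bound A_i ≤ 2^i from part (d) can be checked by brute force for a small code. The sketch below is ours; the prefix-free (hence uniquely decodable) code {0, 10, 11} and n = 3 are arbitrary choices:

```python
from itertools import product
from fractions import Fraction

lengths = [1, 2, 2]          # codeword lengths of the example code {0, 10, 11}
n = 3                        # number of concatenated codewords
l_max = max(lengths)

# A_i = number of n-fold concatenations whose aggregate length is i.
A = {}
for combo in product(lengths, repeat=n):
    A[sum(combo)] = A.get(sum(combo), 0) + 1

lhs = sum(Fraction(1, 2 ** l) for l in lengths) ** n
rhs = sum(Fraction(A_i, 2 ** i) for i, A_i in A.items())
print(lhs, rhs)                              # both equal 1 for this code
assert lhs == rhs
assert all(A_i <= 2 ** i for i, A_i in A.items())
assert max(A) <= n * l_max                   # no concatenation is longer than n*l_max
```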

(e) Note that

\[
\big[\,n\, l_{\max}\,\big]^{1/n} = \exp\!\Big(\frac{\ln(n\, l_{\max})}{n}\Big) \longrightarrow \exp(0) = 1
\]

as n → ∞. Since (2) must be satisfied for all n, the Kraft inequality must be satisfied.

Exercise 2.12: (a) and (b) One Huffman code is a → 0, b → 10, c → 110, d → 111, where c and d are the less probable letters, and another is a → 00, b → 01, c → 10, d → 11.
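As an aside (a sketch of ours, not taken from the manual, and the pmf below is our own example rather than the one in the exercise), the Huffman procedure can be written in a few lines; for a pmf with p1 = p3 + p4, the length sets (1, 2, 3, 3) and (2, 2, 2, 2) give the same average, so either code can arise depending on how ties are broken:

```python
import heapq
from itertools import count

def huffman_lengths(probs):
    """Return the codeword lengths produced by one run of the Huffman algorithm.

    Ties are broken by insertion order, so other optimal length assignments
    may exist for the same pmf.
    """
    tie = count()                      # tie-breaker so heapq never compares lists
    heap = [(p, next(tie), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for i in s1 + s2:              # every symbol in the merged set gains one bit
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, next(tie), s1 + s2))
    return lengths

probs = [0.4, 0.2, 0.2, 0.2]           # our example pmf (note p1 = p3 + p4)
L = huffman_lengths(probs)
print(L, sum(p * l for p, l in zip(probs, L)))
# Both (2, 2, 2, 2) and (1, 2, 3, 3) have average length 2.0 for this pmf,
# so either can come out of the Huffman procedure depending on tie-breaking.
```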

(c) The code a → 00, b → 11, c → 10, d → 01 is also optimal but cannot arise from the Huffman procedure since the two least likely codewords must be siblings. Exercise 2.13:

(a) In the Huffman algorithm, we start by combining p3 and p4. Since we have p1 = p3 + p4 ≥ p2, we can combine p1 and p2 in the next step, leading to all codewords of length 2. We can also combine the supersymbol obtained by combining symbols 3 and 4 with symbol 2, yielding codewords of lengths 1, 2, 3 and 3 respectively.

(b) Note that p3 ≤ p2 and p4 ≤ p2, so p3 + p4 ≤ 2p2. Thus

\[
p_1 = p_3 + p_4 \le 2p_2, \qquad\text{which implies}\qquad p_1 + p_3 + p_4 \le 4p_2.
\]

Since p2 = 1 − p1 − p3 − p4, the latter inequality implies that 1 − p2 ≤ 4p2, or p2 ≥ 0.2. Since p1 = p3 + p4, we also have 2p1 + p2 = 1, so p1 = (1 − p2)/2 ≤ 0.4. These bounds can be met by choosing p1 = 0.4 and p2 = p3 = p4 = 0.2. Thus pmax = 0.4.

(c) Reasoning similarly to part (b), p2 ≤ p1 and p2 = 1 − p1 − p3 − p4 = 1 − 2p1. Thus 1 − 2p1 ≤ p1, so p1 ≥ 1/3, i.e., pmin = 1/3. This bound is achievable by choosing p1 = p2 = 1/3 and p3 = p4 = 1/6.

(d) The argument in part (b) remains the same if we assume p1 ≤ p3 + p4 rather than p1 = p3 + p4, i.e., p1 ≤ p3 + p4 implies that p1 ≤ pmax. Thus assuming p1 > pmax implies that p1 > p3 + p4. Thus the supersymbol obtained by combining symbols 3 and 4 will be combined with symbol 2 (or perhaps with symbol 1 if p2 = p1). Thus the codeword for symbol 1 (or perhaps the codeword for symbol 2) will have length 1.

(e) The lengths of any optimal prefix-free code must be either (1, 2, 3, 3) or (2, 2, 2, 2). If p1 > pmax, then, from (d), p1 > p3 + p4, so the lengths (1, 2, 3, 3) yield a lower average length than (2, 2, 2, 2).

(f) The argument in part (c) remains almost the same if we start with the assumption that p1 ≥ p3 + p4. In this case p2 = 1 − p1 − p3 − p4 ≥ 1 − 2p1. Combined with p1 ≥ p2, we again have p1 ≥ pmin. Thus if p1 < pmin, we must have p3 + p4 > p1 ≥ p2. We then must combine p1 and p2 in the second step of the Huffman algorithm, so each codeword will have length 2.

(g) It turns out that pmax is still 2/5. To see this, first note that if p1 = 2/5, p2 = p3 = 1/5 and all other symbols have an aggregate probability of 1/5, then the Huffman code construction combines the least likely symbols until they are tied together into a supersymbol o...

