
Lab 1b: Introduction to information theory and data compression. Part II

Eloi Guerrero Menéndez
Escola d'Enginyeria, Universitat Autònoma de Barcelona
NIU: 1392355
[email protected]

Lluís Acosta Basagañas
Escola d'Enginyeria, Universitat Autònoma de Barcelona
NIU: 1391183
[email protected]

Abstract—Simulation and calculation of Huffman and Shannon-Fano codes using MATLAB.

I. INTRODUCTION

In this practical session we will continue working on data compression using two different coding methods: Huffman and Shannon-Fano.

II. OBJECTIVES

The main objectives for this session are the following:

• Continue working on data compression.
• Understand and apply Huffman and Shannon-Fano coding.

Fig. 1: Huffman expansion tree for (2,2,2,3,3)

III. PRACTICAL REALIZATION

A. Huffman coding

This laboratory session introduces how Huffman coding works. It is an optimal coding method that achieves the shortest possible average length. The first exercise is the following:

1) It is easy to verify that (2, 2, 2, 3, 3) · (0.4, 0.2, 0.2, 0.1, 0.1)^T = 2.2 and (1, 2, 3, 4, 4) · (0.4, 0.2, 0.2, 0.1, 0.1)^T = 2.2. Thus, both the compact code (2, 2, 2, 3, 3) and (1, 2, 3, 4, 4) are Huffman codes for the probability distribution (0.4, 0.2, 0.2, 0.1, 0.1), because they have the same compression rate as the previous codes. Use the Huffman coding method so that the result is the compact code (2, 2, 2, 3, 3). Then repeat the same with (1, 2, 3, 4, 4). The probability distribution (0.4, 0.2, 0.2, 0.1, 0.1) is interesting because three different Huffman codes can be designed for it; we can see that Huffman codes might not be unique, and are unique only for a specific set of probability distributions.: See Figures 1 and 2 for the result of applying the Huffman algorithm to these codes; a short numerical check follows below.

Fig. 2: Huffman expansion tree for (1,2,3,4,4)
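As a quick check of the averages above (our own illustrative snippet, not part of the lab code), both dot products and the corresponding Kraft sums can be reproduced in MATLAB:

p  = [0.4 0.2 0.2 0.1 0.1];   % probability distribution
l1 = [2 2 2 3 3];             % first set of codeword lengths
l2 = [1 2 3 4 4];             % second set of codeword lengths
L1 = p * l1'                  % average length = 2.2 bits/symbol
L2 = p * l2'                  % average length = 2.2 bits/symbol
% Both length sets satisfy the Kraft inequality with equality,
% so both correspond to complete prefix codes
K1 = sum(2.^(-l1))            % = 1
K2 = sum(2.^(-l2))            % = 1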

2) Calculate the Huffman code length for the files that were used in session 1b and comment the results.: To calculate Huffman code lengths quickly in MATLAB, we implement the following function:

Listing 1: MATLAB: Compression rate calculation

function R = compfactor(missatge)
fid = fopen(missatge, 'r');
m1 = fread(fid, 90000, 'uint8');
fclose(fid);
freq = hist(m1, 0:255);
bar(freq)                     % Show the histogram
freq = freq(find(freq > 0));  % Keep only the nonzero probabilities
length(freq)
p = freq / sum(freq)          % p is the first-order empirical distribution
L = huffmancodelengths(p);
R = sum(sort(p) .* L)
end

Using the function huffmancodelengths that we were provided with, we can quickly extract the Huffman code length of each file: the different formats of the Silmarillion and of dibuix. Results are shown in Table I.
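As a usage sketch (assuming the session 1b files are in the current MATLAB folder), the function can be applied to all the files from Table I in one loop:

files = {'silmarillion.txt', 'silmarillion.rar', 'silmarillion.doc', ...
         'silmarillion.pdf', 'dibuix.bmp', 'dibuix.jpg'};
for k = 1:length(files)
    R = compfactor(files{k});
    fprintf('%s: Huffman average length = %.4f bits\n', files{k}, R);
end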


TABLE I: Entropy and Huffman average length

File              Entropy       L Huffman
silmarillion.txt  4.3330 bits   4.3756 bits
silmarillion.rar  7.9973 bits   8 bits
silmarillion.doc  4.4065 bits   4.4490 bits
silmarillion.pdf  7.9941 bits   8 bits
dibuix.bmp        0.0084 bits   1.0012 bits
dibuix.jpg        5.8456 bits   5.8653 bits

TABLE II: Entropy, Huffman and Shannon-Fano lengths

File              Entropy       L Huffman     L Shannon-Fano
silmarillion.txt  4.3330 bits   4.3756 bits   4.8329 bits
silmarillion.rar  7.9973 bits   8 bits        8.4712 bits
silmarillion.doc  4.4065 bits   4.4490 bits   4.8868 bits
silmarillion.pdf  7.9941 bits   8 bits        8.5088 bits
dibuix.bmp        0.0084 bits   1.0012 bits   1.0073 bits
dibuix.jpg        5.8456 bits   5.8653 bits   6.4850 bits

As we can see in Table I, every Huffman average code length stays between the entropy H and H + 1. This confirms that Huffman coding is compact and optimal, and at the same time it illustrates Shannon's first theorem, which states that no uniquely decodable code can have an average length lower than the entropy.
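This bound is easy to check numerically. The following minimal sketch (our own check, reusing the empirical distribution p and the provided huffmancodelengths function from Listing 1) verifies it for any file:

H = -sum(p .* log2(p));       % first-order entropy of p
L = huffmancodelengths(p);    % Huffman codeword lengths
R = sum(sort(p) .* L);        % Huffman average length
assert(H <= R && R < H + 1)   % Shannon's bound for a compact code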

B. Shannon-Fano coding

Huffman codes are optimal and give the best performance, but they have a strong drawback: they require long calculation times and significant computational resources. Shannon-Fano codes are easier to compute, but they are not optimal and therefore may not always be compact.

1) Write a MATLAB function that returns the Shannon-Fano code length of a probability array. Apply this function to the files.: We now improve our MATLAB code as shown below.

Listing 2: MATLAB compression rate calculation v.2

function [R, K] = compfactor(missatge)
% Improvement of the previous code,
% where K is the Shannon-Fano average length
% and R is the Huffman average length.
fid = fopen(missatge, 'r');
m1 = fread(fid, 90000, 'uint8');
fclose(fid);
freq = hist(m1, 0:255);
bar(freq);
freq = freq(find(freq > 0));  % Keep only the nonzero probabilities
length(freq);
p = freq / sum(freq);         % p is the first-order empirical distribution
L = huffmancodelengths(p);
R = sum(sort(p) .* L);
Li = ceil(-log2(sort(p)));    % Shannon-Fano codeword lengths
K = sum(sort(p) .* Li);
end

See the results extracted from this function in Table II.

As we can easily spot, the Shannon-Fano lengths are also close to the entropy value and also stay within the (entropy, entropy + 1) interval. However, they are not optimal, because the Huffman lengths are closer to the entropy.

2) Compare entropy, Huffman code length and Shannon-Fano code length. Give some conclusions.: From Table II we can conclude that Shannon-Fano codes are not optimal, because Huffman codes give better results. However, as we said before, Huffman coding requires high computational resources, which is a strong constraint when very fast coding techniques are needed. In this case, we have shown that Shannon-Fano codes are much easier to obtain and still give an average length very close to the entropy value.

3) Write a program that finds Shannon-Fano words from a data file.: Reusing the code we prepared for the previous exercise, we just add the following segment.

Listing 3: MATLAB Shannon-Fano codewords (added to the previous code)

F(1) = 0;                     % The first value will be 0
% Walk through the probability vector, filling in the
% vector of cumulative probabilities
for i = 1:length(p)
    F(i+1) = F(i) + p(i);
end
% Walk through the word vector and convert each value to binary,
% as the lab handout indicates for Shannon-Fano
for j = 1:length(F)-1
    dec2bin(floor(2^Li(j) * F(j)), Li(j))
end
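To make the comparison concrete, here is a worked example of our own (not part of the lab handout) that applies the construction of Listing 3 to the distribution (0.4, 0.2, 0.2, 0.1, 0.1) from exercise 1 of Section III-A:

p  = [0.4 0.2 0.2 0.1 0.1];   % already sorted in decreasing order
Li = ceil(-log2(p));          % Shannon-Fano lengths: (2, 3, 3, 4, 4)
K  = sum(p .* Li)             % = 2.8 bits/symbol, vs 2.2 for Huffman
F(1) = 0;                     % cumulative probabilities
for i = 1:length(p)
    F(i+1) = F(i) + p(i);     % F = (0, 0.4, 0.6, 0.8, 0.9, 1.0)
end
for j = 1:length(F)-1
    dec2bin(floor(2^Li(j) * F(j)), Li(j))
end
% This prints the prefix-free codewords 00, 011, 100, 1100, 1110,
% which indeed have lengths (2, 3, 3, 4, 4)

The gap between K = 2.8 and the Huffman average of 2.2 bits/symbol shows on a small example the same ranking seen in Table II.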

A test proving that we obtain the correct results is shown in Fig. 3.

Fig. 3: Results of MATLAB code 3 - Shannon-Fano coding

IV. CONCLUSION

Through this laboratory session we have understood how Huffman and Shannon-Fano coding work and their power in data compression algorithms. As we saw in the previous session, this is a fundamental topic, because the objective of every communications system is to transmit the maximum amount of data while using the fewest possible resources (symbols, data weight...).

