Hw3B S18 fp Maria short Solns pdf PDF

Title	Hw3B S18 fp Maria short Solns pdf
Author	Naruto 1999
Course	Computer Architecture
Institution	Rutgers University
Pages	12
File Size	697.4 KB
File Type	PDF


COMPUTER ARCHITECTURE & ASSEMBLY LANGUAGE 14:331:331
Instructor: Maria Striki
Rutgers University, Spring 2018
Homework 3B: Floating Point (120 Points)
Issued: Fri, March 30, 2018. Due: Wed, April 11, 2018
2-member Group Names:
Exercises Allocation Table:

Problem 1 (10 marks)
1. Write the binary representation of the number 1033.24 in IEEE 754 single precision. Express the result in binary, octal, and hex formats.
2. Write the binary representation of the number 71.37 in IEEE 754 double precision. Express the result in binary, octal, and hex formats.
3. Register f3 contains the 32-bit number 10101010 11100000 00000000 00000000. What is the corresponding signed decimal number? Assume IEEE 754 representation.

Solution:
Part 1:
Sign: the number is positive, so the first bit is 0.
Exponent: 1033.24 / (2^10) = 1.0090234375, so the number normalizes to 1.0090234375 x 2^10. e - 127 = 10, so e = 137 = 10001001.
Fraction: the normalized value minus 1, that is 1.0090234375 - 1 = 0.0090234375
Binary fraction = 00000010010011110101110
Full binary = 0 10001001 00000010010011110101110
Octal = 01 000 100 100 000 010 010 011 110 101 110 = 10440223656 octal
Hex = 0100 0100 1000 0001 0010 0111 1010 1110 = 448127AE hex

Part 2:
Sign: the number is positive, so the first bit is 0.
Exponent: 71.37 / (2^6) = 1.11515625. e - 1023 = 6, so e = 1029 = 10000000101.
Fraction = 1.11515625 - 1 = 0.11515625
Binary fraction = 0001110101111010111000010100011110101110000101001000
Full binary = 0 10000000101 0001110101111010111000010100011110101110000101001000
Octal = 0401216572702436560510 octal
Hex = 4051D7AE147AE148 hex
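The encodings in Parts 1 and 2 can be cross-checked mechanically. The following is a minimal sketch, assuming Python's standard struct module; the helper name float_bits is introduced here for illustration and is not part of the assignment.

```python
import struct

def float_bits(value, fmt):
    """Return the IEEE 754 bit pattern of value as an integer.

    fmt is 'f' for single precision or 'd' for double precision.
    """
    int_fmt = {"f": "I", "d": "Q"}[fmt]
    (bits,) = struct.unpack(">" + int_fmt, struct.pack(">" + fmt, value))
    return bits

single = float_bits(1033.24, "f")  # Part 1, single precision
double = float_bits(71.37, "d")    # Part 2, double precision

# Print each pattern in binary, octal, and hex.
for name, bits, width in (("single", single, 32), ("double", double, 64)):
    print(name)
    print("  binary:", format(bits, "0{}b".format(width)))
    print("  octal: ", format(bits, "o"))
    print("  hex:   ", format(bits, "X"))
```

Python omits leading zeros, so the octal string for the double-precision value starts with 4 rather than 04.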

Part 3:
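One way to approach Part 3 is to split the 32-bit pattern into its sign, exponent, and fraction fields and apply the normalized-number formula (-1)^s x (1 + fraction) x 2^(e - 127). The sketch below assumes a normalized input; decode_single is a hypothetical helper, not code from the assignment.

```python
def decode_single(bits):
    """Decode a normalized IEEE 754 single-precision bit pattern.

    Returns (sign, biased_exponent, significand, value).
    Zero, subnormal, infinity, and NaN patterns are not handled here.
    """
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    significand = 1 + fraction / 2**23  # restore the hidden 1
    value = (-1) ** sign * significand * 2.0 ** (exponent - 127)
    return sign, exponent, significand, value

f3 = int("10101010" "11100000" "00000000" "00000000", 2)
sign, exponent, significand, value = decode_single(f3)
print(sign, exponent, significand)  # fields of the register contents
print(value)                        # signed decimal value
```

For this register the fields are sign 1, biased exponent 85, and significand 1.75, so the value is -1.75 x 2^-42.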

Problem 2 (20 marks): Consider the numbers 25.524 and 0.4433770219.
1. Normalize both numbers.
2. Calculate their sum by hand.
3. Convert both to binary, assuming each number is stored in a 16-bit register. Half-precision binary floating-point has a 1-bit sign, a 5-bit exponent with a bias of 15, and a 10-bit significand (16 bits total).
4. Show each step of their binary addition, assuming you have one guard, one round, and one sticky bit, rounding to the nearest even.

Solution:
Part 1: 25.524 = 2.5524 x 10^1, 0.4433770219 = 4.433770219 x 10^-1.

Part 2:

Part 3:

Part 4:
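As a rough cross-check for Parts 3 and 4, the half-precision conversion and addition can be reproduced in software. This is a sketch under the assumption that Python's struct format 'e' (IEEE 754 binary16, round to nearest even) stands in for the 16-bit register; to_half is a name chosen here for illustration.

```python
import struct

def to_half(x):
    """Round x to the nearest binary16 value (ties to even).

    Returns (value, bits), where bits is the 16-bit pattern as an integer.
    """
    packed = struct.pack(">e", x)
    (bits,) = struct.unpack(">H", packed)
    (value,) = struct.unpack(">e", packed)
    return value, bits

a, a_bits = to_half(25.524)
b, b_bits = to_half(0.4433770219)

# The double-precision sum of two binary16 values is exact here,
# so converting it back to binary16 performs a single rounding.
total, total_bits = to_half(a + b)

for name, value, bits in (("a", a, a_bits), ("b", b, b_bits), ("a+b", total, total_bits)):
    print(name, value, format(bits, "016b"))
```

A hand computation with guard, round, and sticky bits should arrive at the same 16-bit pattern for the sum, since those three bits are enough to decide round to nearest even.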


Problem 3 (20 marks): Assume the IEEE 754 floating point format is used and $f1, $f2, and $f3 store the floating point numbers below:
$f1: 1010 1010 0110 1010 1001 1010 0000 0011
$f2: 1010 1010 0100 1011 0100 0010 0000 1010
$f3: 0010 0001 0011 0111 0101 0111 0110 0001
a) Show the step-by-step computation of ($f1+$f2)+$f3 in floating point format. First show the steps required to compute $f1+$f2, then the steps for adding this result to $f3.
b) Show the value of each register in decimal, as well as the value of the final result ($f1+$f2+$f3) in decimal.

Solution:
$f1:
Sign  Exponent   Fraction
1     0101 0100  110 1010 1001 1010 0000 0011

$f1 = (-1)^1 x (1 + 0.83282506465911865234375) x 2^(84-127) = -1.83282506465911865234375 x 2^-43

$f2:
Sign  Exponent   Fraction
1     0101 0100  100 1011 0100 0010 0000 1010

$f2 = (-1)^1 x (1 + 0.5879528522491455078125) x 2^(84-127) = -1.5879528522491455078125 x 2^-43

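$f1+$f2:
Both exponents are equal, so the significands add directly: -(1.11010101001101000000011 + 1.10010110100001000001010) x 2^-43 = -11.01101011101110000001101 x 2^-43. Shifting right by one position to renormalize leaves a trailing 1 that is exactly halfway, and rounding to the nearest even drops it.
$f1+$f2 = -1.10110101110111000000110 x 2^-42 (renormalized)
Decimal representation: -1.7103888988494873046875 x 2^-42

Sign  Exponent   Fraction
1     0101 0101  101 1010 1110 1110 0000 0110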

$f3:
Sign  Exponent   Fraction
0     0100 0010  011 0111 0101 0111 0110 0001

$f3 = (-1)^0 x (1.01101110101011101100001)_2 x 2^(66-127) = 1.01101110101011101100001 x 2^-61
Decimal representation: 1.43235409259796142578125 x 2^-61

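($f1+$f2)+$f3:
The exponents differ by 19, so $f3 is shifted right by 19 positions to align it with 2^-42. Keeping 23 fraction bits, $f3 = 0.00000000000000000010110 x 2^-42.
($f1+$f2)+$f3 = -1.10110101110111000000110 x 2^-42 + 1.01101110101011101100001 x 2^-61 = -1.10110101110110111110000 x 2^-42 (no need to renormalize)

Sign  Exponent   Fraction
1     0101 0101  101 1010 1110 1101 1111 0000

Decimal representation: -1.7103862762451171875 x 2^-42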
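The two additions can be replayed in software as a sanity check. The sketch below assumes Python's struct module for the single-precision conversions; the helper names are illustrative. It rounds every result to nearest even, as IEEE 754 hardware does by default, so its final fraction ends in ...1110 1111. That is one unit in the last place away from a hand result that truncates the shifted $f3 bits instead of rounding them.

```python
import struct

def bits_to_float(bits):
    """Interpret a 32-bit integer as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def float_to_bits(x):
    """Round x to single precision (nearest even) and return its bit pattern."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def add_single(x_bits, y_bits):
    """Add two single-precision values given as bit patterns.

    The sum is formed in double precision (exact for these operands)
    and then rounded once to single precision.
    """
    return float_to_bits(bits_to_float(x_bits) + bits_to_float(y_bits))

F1 = 0xAA6A9A03  # 1010 1010 0110 1010 1001 1010 0000 0011
F2 = 0xAA4B420A  # 1010 1010 0100 1011 0100 0010 0000 1010
F3 = 0x21375761  # 0010 0001 0011 0111 0101 0111 0110 0001

s12 = add_single(F1, F2)
result = add_single(s12, F3)

print(format(s12, "032b"), bits_to_float(s12))
print(format(result, "032b"), bits_to_float(result))
```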


Problem 4 (15 marks):
a) Repeat Question 3.a and show the steps required to compute ($f1*$f2)*$f3.
b) Show the result of ($f1*$f2)*$f3 in decimal.

Solution:
$f1:
Sign  Exponent   Fraction
1     0101 0100  110 1010 1001 1010 0000 0011

$f1 = (-1)^1 x (1 + 0.83282506465911865234375) x 2^(84-127) = -1.83282506465911865234375 x 2^-43

$f2:
Sign  Exponent   Fraction
1     0101 0100  100 1011 0100 0010 0000 1010

$f2 = (-1)^1 x (1 + 0.5879528522491455078125) x 2^(84-127) = -1.5879528522491455078125 x 2^-43

$f1*$f2:
Sign = 1 XOR 1 = 0 (+)
Exponent (unbiased): (-43) + (-43) = -86
(1.11010101001101000000011)_2 x (1.10010110100001000001010)_2 = (10.11 1010 0100 0100 1010 0101 0011 1111 1100 1010 0001 1110)_2 x 2^-86
First we normalize the product. Thus,
(10.11 1010 0100 0100 1010 0101 0011 1111 1100 1010 0001 1110)_2 x 2^-86 = (1.011 1010 0100 0100 1010 0101 0011 1111 1100 1010 0001 1110)_2 x 2^-85
There is no underflow because -127 < -85 < 128.

We need to round to the correct number of bits, that is 24 in total (23 for the fraction plus 1 for the hidden bit). So the number is (1.011 1010 0100 0100 1010 0101)_2 x 2^-85. Check again whether there is a need to normalize; in this case there is not. The result is:

Sign  Exponent   Fraction
0     0010 1010  011 1010 0100 0100 1010 0101

b. ($f1*$f2)*$f3
$f3:
Sign  Exponent   Fraction
0     0100 0010  011 0111 0101 0111 0110 0001

$f3 = (-1)^0 x (1.01101110101011101100001)_2 x 2^(66-127) = 1.01101110101011101100001 x 2^-61
Decimal representation: 1.43235409259796142578125 x 2^-61

($f1*$f2)*$f3:
Sign = 0 XOR 0 = 0 (+)
Exponent (unbiased): (-85) + (-61) = -146
(1.01110100100010010100101)_2 x (1.01101110101011101100001)_2 = (10.00 0101 0110 0110 1010 0101 1101 1011 0001 0101 1000 0101)_2 x 2^-146
First we normalize the product. Thus,
(10.00 0101 0110 0110 1010 0101 1101 1011 0001 0101 1000 0101)_2 x 2^-146 = (1.000 0101 0110 0110 1010 0101 1101 1011 0001 0101 1000 0101)_2 x 2^-145
There is underflow because -145 < -127, and we throw an exception.


Problem 5 (20 marks): IEEE 754-2008 contains a half precision format that is only 16 bits wide. The leftmost bit is still the sign bit, the exponent is 5 bits wide and has a bias of 15, and the mantissa is 10 bits long. A hidden 1 is assumed.
Part a) Calculate (3.984375 x 10^-1 + 3.4375 x 10^-1) + 1.771 x 10^3 by hand, assuming each of the values is stored in the 16-bit half precision format described above. Assume 1 guard bit, 1 round bit, and 1 sticky bit, and round to the nearest even. Show all the steps, and write your answer in both the 16-bit floating point format and in decimal.
Part b) Calculate 3.984375 x 10^-1 + (3.4375 x 10^-1 + 1.771 x 10^3) by hand, assuming each of the values is stored in the 16-bit half precision format described above. Assume 1 guard bit, 1 round bit, and 1 sticky bit, and round to the nearest even. Show all the steps, and write your answer in both the 16-bit floating point format and in decimal.
Part c) Is the answer in a) equivalent to the answer in b)?

Solution:

Part a)

Part b:


Part c:
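As a cross-check on Parts a) through c), the two groupings can be evaluated in software. The sketch assumes Python's struct format 'e' as a model of the 16-bit format (it rounds to nearest even, which the guard, round, and sticky bits also achieve); half and half_add are names introduced here.

```python
import struct

def half(x):
    """Round x to the nearest binary16 value (ties to even)."""
    return struct.unpack(">e", struct.pack(">e", x))[0]

def half_bits(x):
    """Return the 16-bit pattern of x after rounding to binary16."""
    return struct.unpack(">H", struct.pack(">e", x))[0]

def half_add(x, y):
    """Add two values in binary16: exact sum in double, then one rounding."""
    return half(half(x) + half(y))

a, b, c = 3.984375e-1, 3.4375e-1, 1.771e3  # all three are exact in binary16

part_a = half_add(half_add(a, b), c)  # (a + b) + c
part_b = half_add(a, half_add(b, c))  # a + (b + c)

print(part_a, format(half_bits(part_a), "016b"))
print(part_b, format(half_bits(part_b), "016b"))
print("equal:", part_a == part_b)
```

In this model the two groupings differ: adding the two small values first gives a partial sum large enough to round 1771 up to 1772, whereas adding each small value to 1771 separately rounds back down to 1771 both times. Floating point addition is therefore not associative here.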

Problem 6 (20 marks): IEEE 754-2008 contains a half precision format that is only 16 bits wide. The leftmost bit is still the sign bit, the exponent is 5 bits wide and has a bias of 15, and the mantissa is 10 bits long. A hidden 1 is assumed.
Part a) Calculate (3.41796875 x 10^-3 x 6.34765625 x 10^-3) x 1.05625 x 10^2 by hand, assuming each of the values is stored in the 16-bit half precision format described above. Assume 1 guard bit, 1 round bit, and 1 sticky bit, and round to the nearest even. Show all the steps, and write your answer in both the 16-bit floating point format and in decimal.
Part b) Calculate 3.41796875 x 10^-3 x (6.34765625 x 10^-3 x 1.05625 x 10^2) by hand, assuming each of the values is stored in the 16-bit half precision format described above. Assume 1 guard bit, 1 round bit, and 1 sticky bit, and round to the nearest even. Show all the steps, and write your answer in both the 16-bit floating point format and in decimal.
Part c) Is the answer in a) equivalent to the answer in b)?

Solution:

Part a)

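We note that the exponent of the intermediate product (3.41796875 x 10^-3 x 6.34765625 x 10^-3 = 1.421875 x 2^-16) is -16, which is outside the exponent range of normalized half precision numbers (the smallest normalized exponent is -14). This means that the result cannot be stored in this type of register, which results in an underflow error.

Part b)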

Part c
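The comparison between the two groupings can be sketched in software. The code below assumes Python's struct format 'e' as a model of the 16-bit format and, following the convention of the solution above, flags any nonzero product whose magnitude is below the smallest normalized half precision value (2^-14) as an underflow. The flag is an assumption of this sketch: struct itself would silently store such a product as a subnormal.

```python
import struct

MIN_NORMAL_HALF = 2.0 ** -14  # smallest normalized binary16 magnitude

def half(x):
    """Round x to the nearest binary16 value (ties to even)."""
    return struct.unpack(">e", struct.pack(">e", x))[0]

def half_mul(x, y):
    """Multiply two binary16 values.

    Returns (rounded_product, underflow), where underflow is True when
    the exact product is nonzero but below the normalized range.
    """
    exact = half(x) * half(y)  # exact in double: two 11-bit significands
    underflow = exact != 0 and abs(exact) < MIN_NORMAL_HALF
    return half(exact), underflow

a, b, c = 3.41796875e-3, 6.34765625e-3, 1.05625e2

# Part a: (a x b) x c. The first product already leaves the normalized range.
ab, ab_underflow = half_mul(a, b)

# Part b: a x (b x c). Both intermediate results stay in range.
bc, bc_underflow = half_mul(b, c)
part_b, part_b_underflow = half_mul(a, bc)

print("a*b underflows:", ab_underflow)
print("b*c =", bc, "underflows:", bc_underflow)
print("a*(b*c) =", part_b, "underflows:", part_b_underflow)
```

Under this convention the grouping in Part a) raises an underflow on its first multiplication while the grouping in Part b) completes, so the two orders of evaluation do not behave the same way.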


Problem 7 (15 marks): You are asked to design a system of FP number representation (with your own design choices), labeled Custom_FP_48, for which we have 48 bits at our disposal in order to represent a number, in analogy with the IEEE 754 prototype. You are asked to provide:
a) The types for evaluating this number
b) The width of representation of these numbers (upper and lower)
c) The maximum precision (i.e., the minimum difference between two successive numbers).

Solution: This problem can have multiple acceptable solutions, depending on the relative weight that one attributes to the precision and representation width of the above Custom_FP_48 system. One such solution is the following:
a) Evaluation types:
s  1 bit
e  10 bits
f  37 bits

The bias b is the median value that can be represented by the bits of e. Bias = 511 (since 2^10 - 1 = 1023).
b) Representation Width (upper and lower): The value of number Z is defined as follows: - If 0...