08383694 - research paper PDF

Title	08383694 - research paper
Author	Azhar Zahid
Course	Design of Advanced Manufacturing Systems
Institution	University of Engineering and Technology Taxila
Pages	14
File Size	1.4 MB
File Type	PDF
Total Downloads	26
Total Views	134

Preview

CLICK TO PREVIEW PDF

Summary

research paper...

Description

This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS–I: REGULAR PAPERS

1

Approximate Multipliers Based on New Approximate Compressors Darjn Esposito , Member, IEEE, Antonio Giuseppe Maria Strollo , Senior Member, IEEE, Ettore Napoli , Senior Member, IEEE, Davide De Caro , Senior Member, IEEE, and Nicola Petra, Member, IEEE Abstract— Approximate computing is an emerging trend in digital design that trades off the requirement of exact computation for improved speed and power performance. This paper proposes novel approximate compressors and an algorithm to exploit them for the design of efficient approximate multipliers. By using the proposed approach, we have synthesized approximate multipliers for several operand lengths using a 40-nm library. Comparison with previously presented approximated multipliers shows that the proposed circuits provide better power or speed for a target precision. Applications to image filtering and to adaptive least mean squares filtering are also presented in the paper. Index Terms— Approximate multiplier, digital arithmetic.

computing,

approximate

I. I NTRODUCTION

A

PPROXIMATE computing is an emerging trend in digital design [1], [2], relaxing the requisite of exact computation to gain substantial performance improvement in terms of power, speed and area. This approach is becoming more and more important for embedded and mobile systems, characterized by severe energy and speed constraints. Approximate computing can be fruitfully applied in several error-resilient applications. Examples are multimedia processing [3], data mining and recognition [2], machine learning. Multipliers are fundamental subsystems for microprocessors, digital signal processors, and embedded systems with applications ranging from filtering to convolutional neural networks. Unfortunately, multipliers are characterized by complex logic design [4] and constitute one of the most energy-hungry digital blocks [5]. Therefore, approximate multiplier design has become an important research subject in recent years [6]. A multiplier includes a few basic blocks: partial products generation, partial products reduction and carry-propagate addition. Approximations can be introduced in any of these blocks [6]. For example, truncation of the partial products is a well stablished approximation technique in which some of the partial products are not formed and the truncation error is reduced with the help of suitable correction functions [7]–[9]. Manuscript received January 10, 2018; revised March 12, 2018; accepted May 9, 2018. This work was supported by the Italian PRIN Project Advanced Nanometer IC Technologies for Next Generation Transceivers under Grant 2015ABZ44K. This paper was recommended by Associate Editor Y. Yu . (Corresponding author: Darjn Esposito.) The authors are with the Department of Electrical Engineering and Information Technology, University of Napoli Federico II, 80138 Naples, Italy (e-mail: [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TCSI.2018.2839266

Other approaches simplify the partial-product matrix by using approximate 2 × 2 or 4 × 4 sub-multipliers [10], [11]. Most of the proposed approximate multipliers rely on approximations in the partial product reduction step. Partial products reduction uses compressors to turn a multi-operand sum into a two-operand addition, using logarithmic schemes, such as Wallace [12], Dadda [13], or the Three-Dimensional Method TDM [14], [15]. A compressor is a logic circuit that counts the number of “ones” in the input. The simple and most used compressor is the full-adder (acting as a 3/2 compressor), but higher-order compressors (such as 4/2 or 5/3 [16]–[20]) are also employed. Compressors are XOR-rich circuits and partial products reduction is a critical multiplier block in terms of speed, power and area. Therefore, several approaches have been recently proposed that use approximate compressors for partial products reduction in approximate multipliers. Kelly et al. [21] propose approximate compressors obtained by discarding (truncating) the outputs of exact compressors; a similar approach is used in [22] where approximate compressor with only two outputs are considered. In [23], Momeni et al. propose two approximate 4/2 compressors and investigate the performance of approximate multipliers using developed circuits. Dual-quality 4/2 compressors, having the flexibility of switching between exact and approximate operating modes, are presented in [24]. Approximate halfadders, full-adders and 4/2 compressors are presented in [25], where the proposed circuits are utilized in two variants of 16-bit multipliers. The paper [26] proposes an approximate 15/4 compressor, built using 5/3 compressor as basic module. In [27] a lossy compression of the partial-product rows, based on their bit significance, is proposed; this technique uses approximate half-adders (realized with simple OR gates) to generate a reduced set of product terms. A similar approach, using simple OR gates as approximate counters, is used in [28]. Architectural-space exploration of approximate arithmetic units is investigated in [29] where the authors propose four novels approximate multipliers, based on [10]. In this paper, we present a new family of approximate compressors, obtained in a systematic way, aimed to minimize the error probability and the average error. The proposed approximate compressors are implemented by using simple AND-OR gates (no XOR gates are required) and outperform previously proposed circuits in terms of both precision and hardware complexity. We then investigate the use of the proposed compressors to build approximate multipliers. To that purpose, a simple algorithm is proposed, that uses the approximate compressors in the first steps of the partial product reduction tree. Approximate compressors are introduced in the less-significant part of the partial product matrix and added to

1549-8328 © 2018 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination. 2

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS–I: REGULAR PAPERS

A compressor computes the arithmetic sum in (2), encoding the results in a binary format. The most common compressor is the full-adder having 3 inputs and encoding the results S, into two outputs: sum (having the same weight as the inputs) and carry (having double weight). Note that a compressor is fed by signals having the same weight, and some inputs can be carries coming from the column to the right. As an example, the widely used 4/2 compressor [17]–[19] has five inputs (one being a carry from a column to the right) and encodes the result on three outputs: sum (having the same weight as the inputs) and carry1, carry2 Fig. 1. Partial product matrix of a 8 × 8 bit Multiplier. (having double weight). The 5/2 compressor [17], [18] has the remaining portion of the matrix only if needed, to minimize seven inputs (two being carries from a column to the right) the overall approximation error. For each operand size, we and encodes the result on four outputs: sum (having the same propose four approximate multiplier versions, with different weight as the inputs) and carry1, carry2, carry3 (having double weight). precision vs electrical performance trade-off. We have synthesized the circuits developed in this paper III. P ROPOSED A PPROXIMATE C OMPRESSORS and previously proposed approximated multipliers, using a The compressors described in the following approximate the 40nm library, for several operand lengths. Syntheses show arithmetic sum in (2). For most of the combinations of pi they that our circuits provide better error-electrical performance trade-off, compared to previously proposed approximate mul- compute the same value as (2), while in some cases they give tipliers. We have also investigated the use of approximate an error. The proposed approximate compressors have j inputs multipliers in two applications: image filtering and LMS system identification. For image filtering, the proposed circuits p0 , p1 , . . . , p j −1 and compute d j /2e outputs by using a novel provide excellent compromise between Structural Similarity approach aimed to minimize the error probability and the aver(SSIM) index and power saving (either 2% SSIM degradation age error. Please note that the outputs of proposed approximate with 64% power saving or 8% SSIM degradation with 77% compressors have the same weight of the inputs (i.e. there are power saving). In the LMS system identification example, no carry outputs). This is different from standard compressors, the two most accurate versions of the proposed circuit reach where the weight of the carry output is two times the weight convergence with accuracy comparable to that obtained by of the inputs. In the following, we assume that compressor inputs are using exact multipliers. The other approximate multipliers fail partial products belonging to a PPM column. We assume, to converge. The paper is organized as follows. Section II briefly reviews moreover, that the input bits x i and y j are uniformly and exact compressors. The proposed approximate compressors independently distributed, so that the probability that each are described in Section III where a comparison with pre- partial product is high can be expressed as: vious designs is also presented. The algorithm for the design P ( pi ) = 1/4 (3) of approximated multipliers using the proposed approximate compressors is described in Section IV. Sections V provide synthesis results and the comparison with previously proposed A. Approximate 2/1 Compressor approximated multipliers. Image filtering and LMS system Let us consider the problem of summing two paridentification applications are investigated in Section VI, while tial products belonging to the same column. As observed conclusions are drawn in Section VII. in [22], [25], and [28]:   II. E XACT C OMPRESSORS S= { p0 , p1 } = { p0 p1 , p0 + p1 } (4) Let us consider two unsigned n bit inputs X = x n−1 2n−1 + Eq. (4) shows that we can sum a recoded version of the two . . . + x 0 , Y = y n−1 2n−1 + . . . + y 0 . The product Z , between partial products, given by the logical AND and the logical OR X and Y is: of the partial products. The probability of the two terms in the n−1  n−1  right-hand side of (4) is: i+ j 2n−1 xi y j 2 (1) Z = X ·Y = p2n−1 2 + . . . + p0 = P ( p0 p1) = 1/16 i=0 j =0 P ( p0 + p1) = 7/16 (5) The computation of Z requires the summation of the partial products x i y j according to their weights 2 i+ j . Fig. 1 shows, From (5), we foresee the possibility of simplifying the as an example, the partial products matrix (PPM) for an hardware, with a reduced error probability, by neglecting the 8 × 8 multiplier. low-probability term p0 p1 . In this way, the arithmetic sum Let us consider the partial products belonging to the of two partial products belonging to the same column can be j-th column of the PPM: p0 = x j −1 y 0 , p1 = x j −2 y 1 , approximated as: p2 = x j −3 y 3 , …, p j −1 = x 0 y j −1 . The arithmetic sum of the  partial products of this column, S, ranges between 0 (when S AP P = { p0 + p1 } ' S (6) all partial products are low) up to j (when all partial products With (6) we approximate a 2/1 compressor (half-adder) are 1), and will be indicated in the following as:   with an OR gate (with consequent area, delay and power S= p0 , p1 , p2 , . . . p j −1 (2) improvement), at a price of an error when both partial products

This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination. ESPOSITO et al.: APPROXIMATE MULTIPLIERS BASED ON NEW APPROXIMATE COMPRESSORS

3

TABLE I

TABLE II

A P P ROXIMATE 2/1 C OMP RE SSOR

A P P ROXIMATE 3/2 C OMP RE SSOR

are high. Table I shows the behavior of the approximate 2/1 compressor. Here, the output of the approximate compressor is indicated as w1 , while Erri indicates the difference (error) between exact and approximate compressor: Err i = S − S AP P

(7)

From Table I, it can be observed that the approximate compressor described by (6) under-estimates the exact sum S. As it will be shown in the following, this behavior is common to all the approximate compressors proposed in this paper. Let us define the error probability, PE , and the mean error, Fig. 2. Schematic of the proposed approximate 3/2 compressor. Emean, as: With (13) we approximate a 3/2 compressor (full-adder)  with an OR gate and an AND-OR gate. Table II shows P (Err ) PE = i the behavior of the approximate 3/2 compressor. We have i  Err = 1 only when p2 = p1 = p0 = 1, which occurs with P (Err i ) Err i (8) probability 1/64. Therefore: Emean = i  PE = 1/64 where p(Erri ) is the probability of having an error for the (approx. 3/2 compressor) (14) Emean = 1/64 i -th partial products combination. From Table I, Erri = 1 when p1 = p0 = 1, which occurs with probability 1/16. The schematic of the proposed approximate 3/2 compresTherefore: sor is shown in Fig. 2. Compared with the approximate  full-adder proposed in [25], our circuit is both simpler PE = 1/16 (approx. 2/1 compressor) (9) ([25] requires an XOR gate) and more precise. The last column Emean = 1/16 in Table II reports the arithmetic result computed by the It is worth noting that (6) was already used in [27] and [28]; approximate 3/2 compressor in [25]. As it can be observed, it is also worth observing that the approximate 2 × 2 sub- design [25] gives an error for two inputs combinations. multiplier of [10] can be obtained by using this approximate The first input combination, “110”, occurs with probability: 2/1 compressor. In [25] an approximate half-adder is proposed, (1/4)(1/4)(3/4) = 3/64 (please, recall that from (3) the probwhere an OR gate is used as sum output and an addi- ability that each partial product is high is 1/4 and hence tional AND gate produces the carry output. The last column the probability that each partial product is low is 3/4). in Table I shows the error between exact and approximate half- The second input combination, “111”, occurs with probability: adder of [25]. As it can be observed, design [25], while using (1/4)(1/4)(1/4) = 1/64. From (8), the error probability of the an additional gate, gives the same error probability (1/16) and approximate 3/2 compressor [25] is: PE = 3/64 +1/64 = 4/64. absolute mean error (1/16) as the proposed 2/1 approximate The mean error is obtained by observing that the error is +1 in both input combinations (the result is under-estimated by 1). compressor. Thus, we have: Emean = 3/64 + 1/64 = 4/64. B. Approximate 3/2 Compressor Using (4), the sum of three partial products is written as:   S= { p0 , p1 , p2 } = { p0 p1 , p0 + p1 , p2 } (10)

We can apply the AND, OR recoding of (4) again, to the first and last term at the right-hand side of (10):  S= { p0 p1 p2 , p0 p1 + p2 , p0 + p1 } (11)

The probability of the p0 p1 p2 term in (4), involving the AND of three partial products, is very low: P ( p0 p1 p2 ) = 1/64

(12)

Thus, we can neglect this low-probability term obtaining:  S AP P = { p0 p1 + p2 , p0 + p1 } (13)

C. Approximate 4/2 Compressor The sum of four partial products can be written, with the help of (4), as:  S= { p0 p1 , p0 + p1 , p2 p3 , p2 + p3 } (15) We can apply the AND, OR recoding again, obtaining:  S= {( p0 p1 ) ( p2 + p3) , ( p2 p3) ( p0 + p1 ) ,

( p0 p1) + ( p2 + p3 ) , ( p2 p3 ) + ( p0 + p1 )} (16)

The probability of the first two terms in (16), involving the AND of three partial products, is very low (see (12)). Thus, we can obtain an approximate 4/2 compressor by neglecting these low-probability terms, as follows:  S AP P = { p0 p1 + p2 + p3 , p2 p3 + p0 + p1 } (17)

This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination. 4

Fig. 3.

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS–I: REGULAR PAPERS

TABLE III

TABLE IV

A P P ROXIMATE 4/2 C OMP RE SSOR

A P P ROXIMATE 5/3 C OMP RE SSOR

Schematic of the proposed approximate 4/2 compressor.

Table III shows the behavior of the proposed approximate 4/2 compressor. For four pi combinations, we have an error of 1, while there is a unique partial products combination resulting in an error of 2. The error probability and mean error are easily obtained as:  PE = 13/256 D. Approximate 5/3 Compressor (18) (approx. 4/2 compressor) Emean = 14/256 The design of approximate 5/3 compressor follows the same approach outlined for 3/2 and 4/2 approximate compressors. Fig. 3 shows the schematic of the proposed approximate 4/2 In a first step, we use (4) for all but the last partial products: compressor, using two AND-OR gates.  The last three columns in Table III report the arithmetic S= { p0 p1 , p0 + p1 , p2 p3 , p2 + p3 , p4 } (19) result computed by previously proposed approximate 4/2 comIn the second step, we apply the AND, OR recoding of (4) pressor. In every case the compressor proposed in this paper requires simpler hardware and provides smaller error probabil- again, to the first and the fourth terms and to the third and the ity and mean error. The circuit proposed in [23] (design 2) uses fifth terms at the right-hand side of (19):  two XOR and four NOR gates, giving an error in only four S = { p0 p1 ( p2 + p3 ) , p0 p1 + p2 + p3 , p0 + p1 , entries of the table. It exhibits a rather large error probability p2 p3 p4 , p2 p3 + p4 } (20) PE = 100/256 (mainly due to the error in the first row of the table occurring with probability 81/256), with a mean In this way, we obtain two terms where the AND of three error Emean = −62/256. The approximate compressor in [25] partial products appears. The probability of these terms is very (requiring two XOR and several AND-OR gates) gives error low; hence we neglect these terms and obtain:  in five entries of the table with PE = Emea n = 37/256. The S AP P = { p0 p1 + p2 + p3 , p0 + p1 , p2 p3 + p4 } (21) approximate 4/2 compressor proposed in [24] (structure 4) still requires two XOR gates, is slightly simpler than [25], and Table IV shows the behavior of the proposed approximate exhibits PE = 37/256, Emean = 38/256. 5/3 compressor. Again, there is a unique partial products Data in Table III highlight the main feature of the proposed combination resulting in an error of 2, having a very low approach: errors in our approximate compressor are present probability of 1/1024. From Table IV, the error probability only in the low-probability cases in which three or more inputs and mean error are obtained as: are ‘1’. In [23]–[25], instead, errors are more randomly spread  PE = 43/1024 across the table, resulting in larger overall error probability and (approx. 5/3 compressor) (22) Emean = 44/1024 mean error.

This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination. ESPOSITO et al.: APPROXIMATE MULTIPLIERS BASED ON NEW APPROXIMATE COMPRESSORS

5

TABLE V A P P ROXIMATE 6/3 C OMP RE SSOR

Fig. 4.

Schematic of the proposed approximate 5/3 compressor.

The schematic of the proposed 5/3 compressor is shown in Fig. 4. It is worth noting that an approximate 5/3 compressor can also be obtained with an approximate 2/1 and an approximate 3/2 compressor. This results in a slightly simpler ...