# HIGH PERFORMANCE ADVANCED SIGNED MULTIPLIER FOR DSP APPLICATIONS

<sup>1</sup>N. Sharath Babu, <sup>2</sup>V. Himaja, <sup>3</sup>B. Srinu

<sup>1</sup>Department of Electronics and Communication Engineering, Anurag Group of Institutions, Venkatapur, Ghatkesar, Hyderabad, T.S

<sup>2,3</sup> Department of Electronics and Communication Engineering, Mahaveer Institute of Science and Technology, Bandlaguda, Hyderabad, T.S

*Abstract*- The speed of the multiplier is of utmost importance to the performance of any Digital Signal Processor (DSP), and precision plays a major role alongside speed. Although floating point multipliers provide the required precision, they tend to consume more silicon area and are relatively slow compared to fixed point (Q-format) multipliers. In this paper we propose a method for fast fixed point signed multiplication based on the Urdhava Tiryagbhyam method of Vedic mathematics. The coding is done for 16 bit (Q15) and 32 bit (Q31) fractional fixed point multiplications using Verilog and synthesized using Xilinx ISE version 12.2. Further, a speed comparison of this multiplier with a normal Booth multiplier and the Xilinx LogiCORE parallel multiplier Intellectual Property (IP) is presented. The results indicate that Urdhava Tiryagbhyam can have a great impact on improving the speed of Digital Signal Processors.

Keywords- Q-Notation; Vedic Multiplier Mathematics; Fractional fixed point

# I. Introduction

Vedic Mathematics hails from the ancient Indian scriptures called "Vedas", or the source of knowledge. This system of computation covers all forms of mathematics, be it geometry, trigonometry or algebra. The striking feature of Vedic Mathematics is the coherence of its algorithms, which are designed the way our mind naturally works. This makes it the easiest and fastest way to perform any mathematical calculation mentally. Vedic Mathematics is believed to have been created around 1500 BC and was rediscovered between 1911 and 1918 by Sri Bharti Krishna Tirthaji (1884-1960), who was a Sanskrit scholar, mathematician and philosopher [1]. He organized and classified the whole of Vedic Mathematics into 16 formulae, also called *sutras*. These formulae form the backbone of Vedic mathematics. A great amount of research has been done over the years to implement the algorithms of Vedic mathematics on digital processors. It has been observed that, due to the coherence and symmetry of these algorithms, they can have a regular silicon layout and consume less area [2,3] along with lower power consumption.

Normally, signal processing algorithms are developed in high level languages like C or Matlab using floating point number representations. Algorithm to architecture mapping with floating point number representation consumes more hardware, which tends to be expensive. Fixed point number representation is a good option for implementation at silicon level. Hence our focus in this work is to develop optimized hardware modules for the multiplication operation, which is one of the most frequently used operations in signal processing applications like Fourier transforms, FIR and IIR filters, image processing systems, seismic signal processing, optical signal processing etc. Any attempt to come up with an optimized architecture for this basic block is advantageous during the product development stages.

Considering fixed point representation, the 16 bit Q15 format and the 32 bit Q31 format provide the required precision for most digital signal processing applications and are best suited for implementation on processors. Their advantage over floating point multipliers lies in the fact that Q-format fraction multiplications can be carried out using integer multipliers, which are faster and consume less die area. DSP processors like the TMS320 series from Texas Instruments work on the 16 bit Q15 format. In this paper we propose the implementation of a fixed point Q-format [6] high speed multiplier using the Urdhava Tiryagbhyam method of Vedic mathematics. Further, we have also implemented multipliers using the normal Booth algorithm [8] and the Xilinx parallel multiplier Intellectual Property, and present a comparative study of the maximum frequency, or speed, of these multipliers.

The paper is organized into six sections. Section II explains the fixed point or Q-format representation of a number; Section III describes the Urdhava Tiryagbhyam method of Vedic mathematics; Section IV explains the architecture of the proposed Q-format Urdhava multipliers; Section V presents the results and comparison; and lastly Section VI provides the conclusion of the work.

# II. Fixed Point Arithmetic

An N-bit fixed point number [6] can be interpreted as either an integer or a fractional number. Integer fixed point is difficult to use in processors due to possible overflow. For example, in a 16-bit processor the dynamic range for signed integers is from  $-2^{15}$  to  $2^{15}-1$ , i.e. -32768 to 32767. If 500 is multiplied by 800 the result is 400000, which is an overflow. In order to overcome this situation, fractional fixed point representation, also known as Q-format, is used.
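The overflow in the integer example can be checked with a few lines of Python. This is only an illustrative sketch: the helper names `fits_int16` and `wrap_int16` are hypothetical and are not part of the proposed design.

```python
def fits_int16(value: int) -> bool:
    """Return True if value is representable as a 16-bit two's-complement integer."""
    return -(1 << 15) <= value <= (1 << 15) - 1


def wrap_int16(value: int) -> int:
    """Keep only the low 16 bits and reinterpret them as a signed 16-bit value."""
    low = value & 0xFFFF
    return low - 0x10000 if low & 0x8000 else low


product = 500 * 800
# The true product does not fit, so a 16-bit register would hold a wrapped value.
print(product, fits_int16(product), wrap_int16(product))
```

Both operands fit in 16 bits, but their product does not, which is the situation the fractional Q-format avoids.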

# A. Q-format Representation

In general, any Q-format representation is denoted by Qm.n, where *m* is the number of bits representing the integer part, *n* is the number of bits representing the fractional part, and the total number of bits is N = m+n+1 for signed numbers. For example, the Q4.11 format signifies that a total of 16 bits are required to represent a fractional number, in which 4 bits are reserved for the integer part, 11 bits for the fractional part and 1 bit for the sign. Special cases of Q-format use zero bits to represent the integer part. Q0.15 (Q15) and Q0.31 (Q31) are two such formats for 16 bit and 32 bit representations respectively. As there is no integer part, the fractional number lies in the range between -1 and 1. Therefore the products of such numbers also lie between -1 and 1. This property is well suited for implementing multipliers, since the product can be stored with the same bit length as the inputs, and thus Q-format finds its application in digital signal processing hardware.

An N-bit number in Qm.n format is represented as follows.[6]

$$a_{n+m}a_{n+m-1} \dots a_n \,.\, a_{n-1} \dots a_1 a_0 \tag{1}$$

Here the '.' between  $a_n$  and  $a_{n-1}$  represents the fixed point. With  $a_{n+m}$  as the sign bit of a 2's complement number, the value of (1) is given by

$$(-a_{n+m}2^{N-1} + a_{n+m-1}2^{N-2} + \dots + a_2 2^2 + a_1 2 + a_0)2^{-n}.$$
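As a minimal sketch of how such a bit pattern is evaluated, the following Python function interprets a bit string as a Qm.n number. It assumes 2's complement storage, so the sign bit carries negative weight; the function name `q_value` is hypothetical.

```python
def q_value(bits: str, n: int) -> float:
    """Interpret a two's-complement bit string as a Q-format number
    with n fractional bits (the remaining bits are sign and integer part)."""
    width = len(bits)
    raw = int(bits, 2)
    if bits[0] == "1":
        # Sign bit set: the most significant bit has weight -2^(N-1).
        raw -= 1 << width
    # Scaling by 2^-n places the binary point above the n fractional bits.
    return raw / (1 << n)


print(q_value("0001100000000000", 15))  # a Q15 word
print(q_value("0000100000000000", 11))  # the same width read as Q4.11
```

The same 16-bit word yields different values depending on where the binary point is assumed to be, which is why the format must be fixed by convention.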

To convert a fractional number that lies within the range of the desired Qm.n format, we multiply it by  $2^n$ . The resulting value is truncated or rounded to the nearest integer. A small amount of precision loss is therefore involved, which reduces as the number of bits representing the fractional part increases. We prefer rounding since its error is equally biased in the positive and negative directions [6], so the rounded value is more precise on average.

For example, 0.2625 is converted to Q15 format by multiplying it by  $2^{15}$ , which gives 8601.6 and, after rounding, 8602. This is stored as 0010000110011010 in a 16 bit memory location. The most significant bit indicates the sign of the number. If the number is negative, the 2's complement method is used to store it. Thus a fraction is converted to an integer in a Q-format, and the position of the binary point lies entirely in the hands of the programmer. In general, a Qm.n format has a resolution of  $2^{-n}$  and its dynamic range lies between  $-2^{m}$  and  $2^{m} - 2^{-n}$ . Therefore, as the number of bits in the fractional part increases the resolution becomes finer, and as the number of bits in the integer part increases the dynamic range increases. The resolution of the Q15 format is  $2^{-15}$ , and that of the Q31 format is  $2^{-31}$ . Therefore a number represented in Q31 format has higher resolution and is more precise than the one in Q15 format. In this paper we mainly concentrate on the Q15 and Q31 formats since they are best suited for implementing multipliers for DSP applications.
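The conversion described above can be sketched in Python as follows. The function name `to_q` and its parameters are illustrative assumptions, and rounding is done as round-half-up, which is one of several possible rounding rules.

```python
import math


def to_q(x: float, n: int, width: int) -> str:
    """Convert x to a Q-format word with n fractional bits, returned as a
    two's-complement bit string of the given width."""
    scaled = math.floor(x * (1 << n) + 0.5)       # multiply by 2^n, then round
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    if not lo <= scaled <= hi:
        raise OverflowError(f"{x} is outside the range of this format")
    # Masking to `width` bits yields the two's-complement pattern for negatives.
    return format(scaled & ((1 << width) - 1), f"0{width}b")


print(to_q(0.2625, 15, 16))
print(to_q(-0.75, 15, 16))
```

Note that +1.0 is rejected for Q15, since the largest representable value is $1 - 2^{-15}$.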

# **B.** Q-format Multiplication

When two Q15 numbers are multiplied, their product is 32 bits long, as illustrated in Fig. 1. The product has a redundant, or extended, sign bit. Since the product stored in memory should also be a Q15 number, we left shift the product by one bit, and the most significant 16 bits (including the sign bit) are stored in the memory. The process remains the same for the Q31 format, wherein after left shifting the product by one bit, the most significant 32 bits are stored in the memory. Therefore, with Q-format, multiplication of two fractional numbers can be carried out using integer multiplication. Integer multipliers consume less area and are faster than floating point multipliers, which is the major advantage of the Q-format representation.

Fig. 1. Multiplication of two Q15 format numbers
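The shift-and-keep-upper-half procedure can be sketched in Python. This is a behavioural model under stated assumptions, not the proposed hardware: the names `to_signed` and `q_mul` are hypothetical, the integer product uses Python's built-in multiplication, and the discarded lower bits are simply truncated.

```python
def to_signed(raw: int, width: int) -> int:
    """Reinterpret the low `width` bits of raw as a two's-complement integer."""
    raw &= (1 << width) - 1
    return raw - (1 << width) if raw >> (width - 1) else raw


def q_mul(a_raw: int, b_raw: int, width: int = 16) -> int:
    """Multiply two Q(width-1) words and return a word of the same format."""
    full = to_signed(a_raw, width) * to_signed(b_raw, width)  # 2*width-bit product
    # Left shift by one bit to drop the redundant sign bit, staying in 2*width bits.
    shifted = (full << 1) & ((1 << (2 * width)) - 1)
    # Keep the most significant `width` bits, sign bit included.
    return shifted >> width


# Q15 words for -0.75 and -0.25
print(format(q_mul(0xA000, 0xE000), "016b"))
```

One corner case is not handled in this sketch: multiplying -1 by -1 gives +1, which is not representable in Q15 or Q31, so the result wraps to -1.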

# III. Urdhava Tiryagbhyam Method

Urdhava Tiryagbhyam [2] is a Sanskrit term which means "vertically and crosswise" in English. The method is a general multiplication formula applicable to all cases of multiplication. It is based on a novel concept through which all partial products are generated concurrently. Fig. 2 demonstrates a 4 x 4 binary multiplication using this method. The method can be generalized for any N x N bit multiplication. This type of multiplier is independent of the clock frequency of the processor because the partial products and their sums are calculated in parallel. The net advantage is that it reduces the need for microprocessors to operate at increasingly higher clock frequencies. As the operating frequency of a processor increases, the number of switching instances also increases. This results in more power consumption and more dissipation in the form of heat, which leads to higher device operating temperatures.

Another advantage of the Urdhava Tiryagbhyam multiplier is its scalability. The processing power can easily be increased by increasing the input and output data bus widths since it has a regular structure [3]. Due to its regular structure, it can be easily laid out on a silicon chip and also consumes optimum area [2]. As the number of input bits increases, gate delay and area increase very slowly compared to other multipliers. Therefore the Urdhava Tiryagbhyam multiplier is time, space and power efficient.

The line diagram in fig. 2 illustrates the algorithm for multiplying two 4-bit binary numbers  $a_3a_2a_1a_0$  and  $b_3b_2b_1b_0$ . The procedure is divided into 7 steps and each step generates partial products. Initially as shown in step 1 of fig. 2,



Fig. 2 Multiplication of two 4 bit numbers using Urdhava Tiryagbhyam method.[7]
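A software sketch of the vertical and crosswise scheme may help make the seven steps concrete. It is an unsigned behavioural model only: the function name `urdhava_multiply` is hypothetical, and the carry handling is one straightforward choice that need not match the adder arrangement of the hardware in Fig. 2.

```python
def urdhava_multiply(a: int, b: int, n: int = 4) -> int:
    """Unsigned n x n multiplication by vertical and crosswise column sums."""
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> i) & 1 for i in range(n)]
    result = 0
    carry = 0
    for k in range(2 * n - 1):            # 2n - 1 steps, i.e. 7 steps for n = 4
        # Step k adds every cross product a_i AND b_j with i + j == k.
        # These AND terms do not depend on each other, so hardware can
        # generate all of them concurrently.
        column = sum(a_bits[i] & b_bits[k - i]
                     for i in range(max(0, k - n + 1), min(k, n - 1) + 1))
        total = column + carry
        result |= (total & 1) << k        # low bit becomes product bit k
        carry = total >> 1                # the rest carries into the next step
    result |= carry << (2 * n - 1)        # final carry is the top product bit
    return result


print(urdhava_multiply(0b1101, 0b1011))   # 13 x 11
```

For n = 4 the loop runs over steps with 1, 2, 3, 4, 3, 2 and 1 cross products, which is the vertical and crosswise pattern of the line diagram.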

# IV. Implementation

The proposed Urdhava Tiryagbhyam Q-format multiplier is designed in the Verilog hardware description language using a structural coding style. The basic block of both the Q15 and Q31 multipliers is a 4 x 4 Urdhava Tiryagbhyam integer multiplier, which in turn is made up of two 2 x 2 multiplier blocks. The design is completely synchronized by the clock. Further, the Q-format multipliers were also implemented using the normal Booth's algorithm and the Xilinx parallel multiplier Intellectual Property (IP) generated by Xilinx Core Generator, optimized for speed with no pipelining stage. The code is synthesized using Xilinx XST and implemented on device family Virtex-5, device XC5VL50, package FF324 with speed grade -2.
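The text does not spell out how the smaller blocks are combined into wider multipliers. The Python sketch below shows one common hierarchical arrangement, in which each level splits the operands into halves and uses four half-width products. This decomposition, the block count and the names `mul2x2` and `mul_blocks` are assumptions for illustration and may differ from the authors' structural design; the model is unsigned only.

```python
def mul2x2(a: int, b: int) -> int:
    """2 x 2 bit vertical and crosswise multiplier (unsigned)."""
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1
    s0 = a0 & b0                      # vertical: least significant bits
    s1 = (a1 & b0) + (a0 & b1)        # crosswise
    s2 = a1 & b1                      # vertical: most significant bits
    return s0 + (s1 << 1) + (s2 << 2)


def mul_blocks(a: int, b: int, n: int) -> int:
    """n x n unsigned multiplier built from half-width blocks (n a power of 2)."""
    if n == 2:
        return mul2x2(a, b)
    h = n // 2
    mask = (1 << h) - 1
    a_lo, a_hi = a & mask, a >> h
    b_lo, b_hi = b & mask, b >> h
    # Four half-width products, aligned by their weights and summed.
    ll = mul_blocks(a_lo, b_lo, h)
    lh = mul_blocks(a_lo, b_hi, h)
    hl = mul_blocks(a_hi, b_lo, h)
    hh = mul_blocks(a_hi, b_hi, h)
    return ll + ((lh + hl) << h) + (hh << n)


print(mul_blocks(13, 11, 4))
```

In hardware the four half-width products are independent and can be evaluated in parallel, leaving only the final additions on the critical path.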

# V. Simulation Results

The design was simulated using ISim in Xilinx ISE version 12.2.

For Q15 format multiplication as shown in Fig. 3,

Input1 = -0.75 = 1010 0000 0000 0000

Input2 = -0.25 = 1110 0000 0000 0000

Output = 0.1875 = 0001 1000 0000 0000

For Q31 format multiplication as shown in Fig. 4,

Input1 = -0.666666 = 10101010101010101010101000001000010

Input2 = 0.333333 = 00101010101010101010101111111

Output = 1110 0011 1000 1110 0011 1100 1001 1110, whose value is -0.222221777743935585021972655625. The actual value of the product is -0.222221777778, so the multiplication involves a precision loss of 3.40644E-11, which is less than the resolution of the Q31 representation, i.e.  $2^{-31}$ . Thus it provides a 32 bit accurate product, which is acceptable for most DSP applications.

As shown in Table 1, the comparison report suggests that a Q31 format Urdhava multiplier is faster than the Xilinx parallel multiplier Intellectual Property (IP) by 1.25 times, although the slice LUT usage is higher for the Urdhava multiplier.



Fig. 3. Q15 multiplication

Fig. 4. Q31 multiplication (ISim waveform at about 2 µs showing the signals clk, input1[31:0], input2[31:0], out[31:0] and out1[31:0])

Compared to the Booth multiplier, the Urdhava Q-format multiplier is faster by 2.61 times. For a Q15 format multiplier, as seen in Table 2, the speed improvement factor is 1.40 and 1.84 times compared to the Xilinx parallel multiplier IP and the Booth multiplier respectively. When Virtex-5 DSP48E slices were used with the Xilinx parallel multiplier IP, the Urdhava multiplier still proved to be faster, indicating that it is the best choice among the compared designs for implementing faster multipliers on FPGA.

Table 1 Comparison of 32 Bit Q31-Format Multipliers

|                                        | Maximum<br>Frequency<br>(in MHz) | 6-input<br>Slice LUT<br>Usage | Factor by<br>which<br>Urdhava<br>Multiplier is<br>faster |
|----------------------------------------|----------------------------------|-------------------------------|----------------------------------------------------------|
| Urdhava<br>Multiplier                  | 158.90                           | 2710/19200                    | -                                                        |
| Xilinx<br>parallel<br>multiplier<br>IP | 126.24                           | 1896/19200                    | 1.25 times                                               |
| Normal<br>Booth<br>Multiplier          | 60.88                            | 3047/19200                    | 2.61 times                                               |

Table 2 Comparison of 16 Bit Q15-Format Multipliers

|                                        | Maximum<br>Frequency<br>(in MHz) | 6-input<br>Slice LUT<br>Usage | Factor by<br>which<br>Urdhava<br>Multiplier is<br>faster |
|----------------------------------------|----------------------------------|-------------------------------|----------------------------------------------------------|
| Urdhava<br>Multiplier                  | 236.18                           | 662/19200                     | _                                                        |
| Xilinx<br>parallel<br>multiplier<br>IP | 168.77                           | 434/19200                     | 1.40 times                                               |
| Normal<br>Booth<br>Multiplier          | 128.35                           | 718/19200                     | 1.84 times                                               |
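The speed factors in Tables 1 and 2 are ratios of maximum frequencies, which can be recomputed from the tabulated values. The dictionary layout and the function name `speedups` below are illustrative assumptions.

```python
# Maximum frequencies in MHz, copied from Tables 1 and 2.
FMAX_MHZ = {
    "Q31": {"Urdhava": 158.90, "Xilinx IP": 126.24, "Booth": 60.88},
    "Q15": {"Urdhava": 236.18, "Xilinx IP": 168.77, "Booth": 128.35},
}


def speedups(fmax: dict) -> dict:
    """Factor by which the Urdhava multiplier is faster than each other design."""
    return {name: fmax["Urdhava"] / f
            for name, f in fmax.items() if name != "Urdhava"}


for fmt, fmax in FMAX_MHZ.items():
    for name, factor in speedups(fmax).items():
        print(f"{fmt}: Urdhava vs {name}: {factor:.4f}")
```

The recomputed ratios agree with the tabulated factors to within about 0.01.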

# VI. Conclusion

This paper proposed a fast multiplier architecture for signed Q-format multiplications using the Urdhava Tiryagbhyam method of Vedic mathematics. Since Q-format representation is widely used in Digital Signal Processors, the proposed multiplier can substantially speed up the multiplication operation, which is the basic hardware block. The proposed multipliers occupy less area and are faster than the Booth multipliers. It is also shown that the Urdhava Tiryagbhyam Q-format multiplier is faster than the Xilinx parallel multiplier Intellectual Property (IP), both when the IP uses slice LUTs and when it uses DSP48E blocks, which are meant specifically for digital signal processing operations like multiply-accumulate, multiply-add etc. Therefore the Urdhava Tiryagbhyam Q-format multiplier is well suited for signal processing applications requiring faster multiplications. Future work lies in the direction of introducing pipeline stages in the multiplier architecture to maximize throughput.

# References

- [1] Jagadguru Swami Sri Bharati Krisna Tirthaji Maharaja, "Vedic Mathematics: Sixteen Simple Mathematical Formulae from the Veda," Motilal Banarasidas Publishers, Delhi, 2009, pp. 5-45.
- [2] H. Thapliyal, M. B. Srinivas and H. Arabnia, "Design and Analysis of a VLSI Based High Performance Low Power Parallel Square Architecture," Int. Conf. Algo. Math. Comp. Sc., Las Vegas, June 2005, pp. 72-76.
- [3] H. Thapliyal and M. B. Srinivas, "An efficient method of elliptic curve encryption using Ancient Indian Vedic Mathematics," 48th IEEE International Midwest Symposium on Circuits and Systems, 2005, vol. 1, pp. 826-828.
- [4] M. Pradhan and R. Panda, "Design and Implementation of Vedic Multiplier," A.M.S.E Journal, Computer Science and Statistics, France, vol. 15, July 2010, pp. 1-19.
- [5] H. S. Dhillon and A. Mitra, "A Reduced-Bit Multiplication Algorithm for Digital Arithmetics," International Journal of Computational and Mathematical Sciences, Spring 2008, pp. 64-69.
- [6] Sen-Maw Kuo and Woon-Seng Gan, "Digital Signal Processors: Architectures, Implementations, and Applications," Pearson Prentice Hall, 2005, pp. 253-323.

Indian J.Sci.Res. 17(2): 329-331, 2018