Papers by Harish M Kittur
Computers & Electrical Engineering, 2014
Due to the increase in the number of Intellectual Property (IP) cores, clock generation in current-day System-on-Chips (SoCs) is facing a crisis. The conventional method of using a dedicated Phase Locked Loop (PLL) to generate the clock for each IP core is becoming inefficient in terms of power and cost. We propose an algorithm based on the Least Common Multiple (LCM) to minimize the number of PLLs required to generate the clocks for the IP cores in an SoC. This is done by finding an Optimum Operating Frequency (OOF) for each IP core within 10% below the maximum operating frequency of the core. The OOF is chosen such that the LCM of the OOFs of all the IP cores is minimized. Simulated annealing is used to find the LCM. This LCM is the crucial high frequency from which the maximum number of clocks can be derived by clock dividers.
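The frequency selection described in this abstract can be illustrated with a minimal sketch. It assumes integer frequencies in MHz and swaps the paper's simulated annealing for an exhaustive search, which is only practical for a handful of cores. The function name `choose_oofs` and the example frequencies are hypothetical and do not come from the paper.

```python
from itertools import product
from math import lcm  # multi-argument lcm needs Python 3.9+

def choose_oofs(max_freqs_mhz):
    """Pick one integer frequency per core, within 10% below its maximum,
    so that the LCM of all chosen frequencies is as small as possible.
    Exhaustive search: an illustrative stand-in for simulated annealing."""
    ranges = []
    for fmax in max_freqs_mhz:
        lowest = -(-9 * fmax // 10)  # ceil(0.9 * fmax) in integer arithmetic
        ranges.append(range(lowest, fmax + 1))
    best = min(product(*ranges), key=lambda freqs: lcm(*freqs))
    return best, lcm(*best)

# Hypothetical example: three cores with maximum frequencies in MHz.
oofs, source_clock = choose_oofs([100, 150, 220])
dividers = [source_clock // f for f in oofs]  # integer divider ratio per core
print(oofs, source_clock, dividers)
```

With these example inputs the search selects 100, 150 and 200 MHz, so a single 600 MHz source with dividers 6, 4 and 3 could serve all three cores.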
Journal of Telecommunication, Electronic and Computer Engineering, 2017
Based on a pay-per-use policy, cloud computing is used extensively in the scientific community, in areas such as biomedical, healthcare and online financial applications. Fault tolerance is one of the biggest challenges in guaranteeing the reliability and availability of critical services. The system must remain available by minimizing the impact of failures. In this paper, we conduct a comparative analysis of various approaches for tolerating faults through scheduling in a cloud computing environment, based on their policies. The goal of this paper is not only to analyze the existing methods, but also to identify the areas that need future research.
Power management systems, which include Low Drop-Out (LDO) voltage regulators, are an essential component of today’s battery-powered SoCs. LDO voltage regulators improve the battery’s power efficiency and life. In this paper the design and analysis of an LDO voltage regulator is presented. The LDO voltage regulator is designed with a self-compensated error amplifier. It provides 30 mA load current with a stable 1.6 V output voltage. It consumes 172 μA quiescent current and has a power efficiency of 88.38% with a dropout voltage of 200 mV.
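The efficiency figure in this abstract can be cross-checked with a short calculation. The sketch below assumes the regulator is operated at its minimum input voltage (output voltage plus dropout) and that the quiescent current is drawn from the input in addition to the load current. The function name `ldo_efficiency` is hypothetical.

```python
def ldo_efficiency(v_out, v_dropout, i_load, i_quiescent):
    """Power efficiency of a linear regulator at its minimum input voltage."""
    v_in = v_out + v_dropout              # assumption: input sits exactly at dropout
    p_out = v_out * i_load                # power delivered to the load
    p_in = v_in * (i_load + i_quiescent)  # load current plus quiescent current
    return p_out / p_in

efficiency = ldo_efficiency(v_out=1.6, v_dropout=0.2,
                            i_load=30e-3, i_quiescent=172e-6)
print(f"{efficiency:.2%}")
```

Under these assumptions the result agrees with the 88.38% quoted in the abstract.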
Research Journal of Applied Sciences, Engineering and Technology, 2015
In this study, an area-optimized Dadda multiplier with a data-aware Brent-Kung adder in the final addition stage of the Dadda algorithm is described in 45 nm technology for improved efficiency. The current trend is to shift towards low-area designs due to the increasing cost of scaled CMOS. An area-reduced full adder is the key component in our design. It uses fewer gates than the conventional design and hence has less area and delay. The data-aware Brent-Kung adder in the final addition stage helps reduce dynamic power, as it reduces switching activity depending on the inputs. We have compared the results to existing benchmark designs, and our experimental results show a reduction in area of 13.011% and in total power of 26.1%, with only a slight increase in delay.

In this work, faster Baugh-Wooley multiplication is achieved by combining two design techniques: partitioning the partial products into two parts for independent parallel column compression, and accelerating the final addition using a hybrid adder proposed in this work. Based on the proposed techniques, 8-, 16-, 32- and 64-bit Dadda-based Baugh-Wooley multipliers have been developed and compared with the regular Baugh-Wooley multiplier. The performance of the proposed multiplier is analyzed by evaluating the delay, area and power, with 180 nm process technologies on interconnect and layout, using industry-standard design and layout tools. The results show that the proposed 64-bit multiplier is as much as 26.9% faster than the regular Baugh-Wooley multiplier and requires only 2.21% more power. The power-delay product of the proposed design is also significantly lower than that of the regular Baugh-Wooley multiplier.

A Content Addressable Memory (CAM) is a memory primarily designed for high-speed search operations. A parallel search scheme forms the basis of CAM, so power reduction is the challenge associated with the large number of parallel active circuits. We present a novel algorithm and architecture, the Selective Match-Line Energizer Content Addressable Memory (SMLE-CAM), which energizes only those match-lines (MLs) whose first three bits conditionally match the corresponding first three search bits, using a special architecture that comprises a novel XNOR-CAM cell and a novel XOR-CAM cell. The rest of the CAM chain consists of NOR-CAM cells. The 256 × 144-bit SMLE-CAM is implemented in TSMC 90 nm technology, and its robustness across PVT variation is verified. Post-layout simulation results show an energy metric of 0.115 fJ/bit/search with a search time of 361.6 ps, the best reported so far. The maximum operating frequency is 1 GHz.

Comparison of tunnel currents through SiO2, HfO2, Ta2O5, ZrO2 and Dy2O3 dielectrics in MOS devices for ultra large scale integration using first principle calculations
2013 Annual International Conference on Emerging Research Areas and 2013 International Conference on Microelectronics, Communications and Renewable Energy, 2013
The work presented in this paper focuses on the effects of high leakage current in field effect transistors and on possible ways to reduce the leakage currents. This paper combines density functional theory and the non-equilibrium Green's function formalism to perform atomic-scale calculation of tunnel currents through SiO2, HfO2, Ta2O5, ZrO2 and Dy2O3 dielectrics in MOSFETs. The tunnel currents for different bias voltages applied to Si/Insulator/Si systems have been obtained, along with plots of tunnel conductance versus bias voltage for each system, and the plots have been analyzed with reference to the presently used bulk Si/SiO2/Si systems that have SiO2 as the gate dielectric material. The results justify the use of high dielectric constant materials as the gate dielectric in FET devices, enabling further downscaling of MOSFETs with reduced gate leakage currents and thereby ultra large scale integration.

90nm CMOS Low Power Multimodulus 32/33/39/40/47/48 Prescaler with METSPC Based Logic
2013 Third International Conference on Advances in Computing and Communications, 2013
The prescaler is primarily used in a phase-locked loop (PLL) to generate a higher reference frequency for the loop, which supplies more samples per unit time to the phase detector to attain better frequency stability. This paper is the first to present a Modified Extended True Single Phase Clock (METSPC) based 2/3 prescaler design. The METSPC-FF is fully investigated across all process corners for power consumption and delay, along with its functionality for GHz operation. A comparison of ETSPC and METSPC shows that the PDP of METSPC is 64.96% better than that of ETSPC. Using METSPC thus enhances the operating performance of the prescaler. A multimodulus 32/33/39/40/47/48 prescaler is proposed, and its operation is verified over all PVT variations with a maximum frequency of 6 GHz. Simulation is performed in TSMC 90 nm technology using the CADENCE SPECTRE simulator at a supply voltage of 1.1 V.

Indian Journal of Science and Technology, 2015
Due to the continual downscaling of technology, the System on Chip (SoC) is becoming denser and denser, with multiple IP cores within. As the number of cores within an SoC increases, so does the number of faults within the chip. Along with the design aspect of a chip, design for testability is also a major area of concern. Testing methods like Built-In-Self-Test (BIST) allow the chip to test itself without the need for external testing equipment. Test patterns for BIST are generated using a Linear Feedback Shift Register (LFSR), which produces test vectors in a pseudo-random manner. This paper concentrates on improving the hardware, in terms of area and number of logic gates, of the 2-D LFSR used for testing an SoC with multiple IP cores, so that vectors in various patterns can be generated using a single reconfigurable 2-D LFSR. The proposed technique is especially useful for testing an SoC with a large number of cores, as the same configuration network is used to test different SoC cores.
Engineering Science and Technology, an International Journal, 2014
In the recent past, mesh-based clock distribution has received interest due to its tolerance to process variations in deep-submicron technology. Mesh buffers are placed on the mesh to drive the large load capacitance of the clock sinks and the mesh wire capacitance. In this paper, we propose a buffer placement algorithm which can reduce the short-circuit power dissipated in clock meshes. Our buffer placement algorithm uses a clustering technique to judiciously place buffers such that short-circuit power is minimized while skew is minimized at the same time. This is verified by Monte Carlo simulations incorporating process, voltage and systemic variations in NGSPICE.

A 180nm CMOS Low Power Latched Comparator for NMR Applications
IET Chennai 3rd International Conference on Sustainable Energy and Intelligent Systems (SEISCON 2012), 2012
This paper presents a CMOS comparator design for Nuclear Magnetic Resonance (NMR) applications. The design is based on the CMOS Operational Transconductance Amplifier (OTA) technique with a reduced cascode current mirror circuit for proper biasing. Present-day Magnetic Resonance Imagers (MRI) operate at a magnetic field of 1.5 Tesla, which corresponds to a nuclear resonance frequency of 64 MHz. Hence the proposed comparator architecture uses a sampler and a comparator (quantizer) for this frequency specification. The overall CMOS comparator design is realised in 180 nm CMOS technology, occupies an active area of 44.39 × 34.25 μm² and consumes 118.5 μW from a 1.5 V power supply.
International Journal of Computer Applications, 2011
Content-addressable memories (CAMs) are hardware search engines that are much faster than algorithmic approaches for search-intensive applications. CAMs are composed of conventional semiconductor memory (usually SRAM) with added comparison circuitry that enables a search operation to complete in a single clock cycle. Advanced applications need large CAMs, which have the disadvantage of high power consumption. To overcome this drawback, the power consumed by the CAM during a search must be reduced. This paper proposes an idea for improving the power, area and performance of recently proposed high-performance hybrid-type CAM designs: the basic 9T CAM cell is replaced with a 4T CAM cell. The simulation results show the success of the method.

VLSI Design, 2013
We demonstrate faster and energy-efficient column compression multiplication with very small area overheads by using a combination of two techniques: partition of the partial products into two parts for independent parallel column compression and acceleration of the final addition using new hybrid adder structures proposed here. Based on the proposed techniques, 8-b, 16-b, 32-b, and 64-b Wallace (W), Dadda (D), and HPM (H) reduction tree based Baugh-Wooley multipliers are developed and compared with the regular W, D, H based Baugh-Wooley multipliers. The performance of the proposed multipliers is analyzed by evaluating the delay, area, and power, with 65 nm process technologies on interconnect and layout using industry standard design and layout tools. The result analysis shows that the 64-bit proposed multipliers are as much as 29%, 27%, and 21% faster than the regular W, D, H based Baugh-Wooley multipliers, respectively, with a maximum of only 2.4% power overhead. Also, the powe...

Power Optimized Digital Decimation Filter for Medical Applications
2012 International Conference on Advances in Computing and Communications, 2012
A digital decimation filter for medical applications (NMRI) is designed and proposed in this paper. The digital decimation filter is one of the main blocks of a Σ-Δ ADC. For clinical NMRI systems, the resonance frequency is 64 MHz at a magnetic field strength of 1.5 T. The decimation filter downsamples the oversampled output of the Σ-Δ modulator to the Nyquist sampling rate of 128 MHz. The power, speed and area of a Σ-Δ ADC are largely governed by its decimation filter. The digital decimation filter is based on the Cascaded Integrator-Comb (CIC) filter architecture. In this scheme the two-stage CIC structure is optimized, and the power consumption is considerably reduced compared to conventional CIC filter architectures. The proposed resolution filter is realised in 0.18 μm CMOS technology, occupies an area of 0.088 mm² and consumes 2.67 mW from a 1.5 V supply.
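The CIC structure named in this abstract can be sketched as a behavioural model: integrators running at the input rate, a rate reduction, then comb (differencing) stages at the output rate. This is a minimal textbook-style sketch, not the optimized architecture of the paper. The function name `cic_decimate`, the decimation ratio of 4 and the differential delay of 1 are assumptions chosen for illustration; only the two-stage count is taken from the abstract.

```python
def cic_decimate(samples, ratio, stages=2, delay=1):
    """Behavioural model of a CIC decimator on a list of integers."""
    data = list(samples)
    # Integrator stages, running at the high (input) sample rate.
    for _ in range(stages):
        acc = 0
        integrated = []
        for value in data:
            acc += value
            integrated.append(acc)
        data = integrated
    # Rate reduction: keep every `ratio`-th sample.
    data = data[ratio - 1::ratio]
    # Comb stages, running at the low (output) sample rate.
    for _ in range(stages):
        combed = []
        for i, value in enumerate(data):
            previous = data[i - delay] if i >= delay else 0
            combed.append(value - previous)
        data = combed
    return data

# A constant input settles to the DC gain (ratio * delay) ** stages.
output = cic_decimate([1] * 32, ratio=4)
print(output)
```

For a constant input of 1 the output settles to 16, the expected DC gain for a ratio of 4 with two stages. A hardware design would normally compensate this gain and size the register widths accordingly.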
Low power energy efficient pipelined multiply-accumulate architecture
This paper proposes and implements an energy efficient, high speed pipelined Multiply and Accumulate (MAC) architecture for DSP applications. A controller has been designed to detect the input pattern such that it bypasses the multiplier and accumulator units depending on the consecutive input bits. This architecture is used for both signed and unsigned multiplication; it includes guard bits to support

Design of Low-Power Multiplier Using UCSLA Technique
Advances in Intelligent Systems and Computing, 2014
Multiplication is one of the fundamental operations, and the multiplier one of the key hardware blocks, in any digital system. This paper compares the VLSI design of a uniform carry select adder (UCSLA)-based multiplier technique with the variable carry select adder (VCSLA)-based multiplier technique. The analysis is carried out for different bit widths of unsigned inputs, and the results show that the area, power, and delay are reduced in the UCSLA-based multiplier technique compared to the VCSLA-based technique. The timing delay of the 64-bit VCSLA-based multiplier is 95.25 ns for performing the multiplication, which is reduced by 11.11% in the UCSLA-based multiplier technique. In the same manner, area is reduced by 39.42% and power by 19.28% in the UCSLA-based multiplier technique. The multipliers are simulated in Verilog HDL (ModelSim), and the results are then obtained using Cadence tools.

Low power, high speed hybrid clock divider circuit
The clock divider circuit has found immense application in Multiple Clock Domain (MCD) systems like ASICs, SoCs and GALS. In MCD systems, many clock signals of various frequencies are generated from a high-frequency clock by frequency division. Power is an important parameter to be minimized, since the nodes in a clock divider circuit toggle at the clock frequency. In this paper, we present a low power hybrid clock divider circuit which can take an input frequency of up to 6 GHz and perform frequency division. The divider is hybrid because it uses two different flip flops: a Modified Extended True Single-Phase Clock flip flop (METSPC-FF) and a self blocking FF (SBFF). The METSPC-FF is fast enough to divide a GHz frequency but consumes more power than the SBFF, while the SBFF is relatively slow but consumes less power than the METSPC-FF. We analyze the performance of these two FFs across PVT variations and implement them in a clock divider circuit. Our clock divider circuit consumes 149.56 µW for a divide-by-8 operation on a 6 GHz clock. Simulation of these flip flops in TSMC 90 nm technology using the CADENCE SPECTRE simulator shows that they are very energy efficient and hence can be used for other high speed applications without compromising on power.
Clock frequency doubler circuit for multiple frequencies and its application in a CDN to reduce power
2012 International Conference on Computing, Electronics and Electrical Technologies (ICCEET), 2012
The frequency doubler (FD) circuit has found immense use in digital CMOS systems. Such a circuit is especially useful in a clock distribution network where the clock signal can be distributed at a low frequency and multiplied (clock frequency made 2 or 4 times) at the blocks where a higher frequency is needed. This reduces the power consumption of the clock

ASIC based logarithmic multiplier using iterative pipelined architecture
2013 IEEE CONFERENCE ON INFORMATION AND COMMUNICATION TECHNOLOGIES, 2013
Multiplication is a significant operation in digital signal processing algorithms. These algorithms involve a large number of multiplications, which is time consuming. In digital signal processing applications, speed is more important than accuracy. In this paper a simple and efficient multiplier architecture is proposed which uses adders, shifters, encoders, decoders and similar blocks that consume less area, time and power. The multiplication is based on Mitchell's algorithm. The multiplier can achieve arbitrary accuracy, but with only two iterations the error is already limited to 2%, which is tolerable in digital signal processing algorithms. The multiplier is implemented as an ASIC using SoC Encounter and the NCSIM simulator in Cadence with 180 nm technology for 16-bit operands at a frequency of 12.5 MHz.
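The idea of a shift-and-add logarithmic multiplier with iterative error correction can be sketched as follows. The sketch assumes non-negative integer operands and uses the Mitchell-style approximation log2(1 + x) ≈ x written with integer shifts, feeding the neglected residual product back into the same approximation on each further iteration. It is an illustration of the general technique, not necessarily the pipelined architecture of the paper, and the names `mitchell_term` and `iterative_log_multiply` are hypothetical.

```python
def mitchell_term(a, b):
    """One approximation step for a, b > 0.
    With a = 2**k1 + r1 and b = 2**k2 + r2, the exact product is
    2**(k1+k2) + r1*2**k2 + r2*2**k1 + r1*r2; the last term is dropped
    here and returned as the residual operands (r1, r2)."""
    k1 = a.bit_length() - 1   # position of the leading one in a
    k2 = b.bit_length() - 1   # position of the leading one in b
    r1 = a - (1 << k1)
    r2 = b - (1 << k2)
    approx = (1 << (k1 + k2)) + (r1 << k2) + (r2 << k1)
    return approx, r1, r2

def iterative_log_multiply(a, b, iterations=2):
    """Approximate a * b using only shifts and adds.
    Each extra iteration adds an approximation of the residual r1 * r2."""
    total = 0
    for _ in range(iterations):
        if a == 0 or b == 0:  # residual is zero: result is already exact
            break
        approx, a, b = mitchell_term(a, b)
        total += approx
    return total

for n in (1, 2, 3):
    print(n, iterative_log_multiply(13, 11, iterations=n))
```

For 13 × 11 = 143 the sketch returns 128 after one iteration, 142 after two and the exact 143 after three. The result never exceeds the true product, because every dropped residual is non-negative.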

International Journal of Computer Applications, 2011
This paper proposes a modified design for a low dynamic power adder using a reset network in the CMOS dynamic logic family. The results show that the dynamic power is reduced compared to low dynamic power logic and domino logic. In this modified form of the low dynamic power adder, the logic outputs are reset to low during the pre-discharge phase, when the clock input is high. Logic evaluation takes place when the clock input is low. The modified logic is better than domino logic since it does not require an inverter for cascading the gates. Resetting the output low during pre-discharge prevents the charge sharing and charge leakage problems associated with other dynamic logic families, and it also avoids the static power dissipation that exists in low power dynamic logic. Resetting the output low also avoids the problem of high transition time from high level to low level that exists in circuits employing PMOS logic. The proposed circuit is a mix of PMOS logic and dynamic logic. The proposed logic cell can be cascaded in a domino-like fashion without the need for an inverter.