Question
Over the last several decades, the device geometry of integrated circuits has been drastically reduced. We now manufacture printed circuit boards with dimensions close to those that were only possible on ICs in the recent past. Moore's Law suggests that the speed of integrated circuits has been doubling every 18 months, and there is every indication that it may continue to do so. Estimate the change in CMOS gate dimensions that is necessary to accomplish a 2x speed-up in circuit operation. What is the corresponding reduction in power that will result from this reduction in gate dimensions, assuming the device operates at the same speed as its predecessor?
Explanation / Answer
Scaling, kT/q, and the Problem
While CMOS technology was invented in 1963, it took the first power crisis in the 1980s to cause VLSI chips to switch from nMOS, which during the late 1970s was the dominant VLSI technology. During this period Vdd was fixed at 5V and was not scaling with technology, in order to maintain system compatibility. For control and speed reasons, this meant that the depletion thresholds for the nMOS loads did not scale rapidly, so the current per minimum gate scaled only slowly. The net result was that the power of the chips started growing with their complexity, and chips rapidly went from a Watt to multiple Watts, with the final nMOS VLSI chips dissipating over 10W [2]. While the peak currents in CMOS were as large as in nMOS, since they were transients that lasted roughly 1/20 of a clock cycle, a CMOS processor ran at roughly 10x lower power than a similar nMOS chip.
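As a rough answer to the question itself, here is a minimal sketch assuming classical constant-field (Dennard) scaling, in which all linear dimensions, the supply voltage, and the threshold scale by the same factor k. The first-order delay and power models and the numbers below are textbook assumptions, not results taken from the excerpt.

```python
# First-order constant-field (Dennard) scaling estimate for the question above.
# Under constant-field scaling by a factor k: dimensions -> 1/k, Vdd -> 1/k,
# capacitance -> 1/k, and gate delay -> 1/k, so a 2x speedup needs k = 2,
# i.e., gate dimensions roughly halved.
k = 2.0

dimension_scale = 1 / k               # gate length/width shrink to ~50%
cap_scale, vdd_scale = 1 / k, 1 / k   # C and Vdd also scale by 1/k

# Dynamic power per gate: P = C * Vdd^2 * f
# Case 1: run the scaled gate at the same clock speed as its predecessor.
power_same_speed = cap_scale * vdd_scale**2 * 1.0   # f unchanged
# Case 2: run it at the new, 2x higher speed.
power_full_speed = cap_scale * vdd_scale**2 * k     # f doubled

print(f"gate dimensions scale to ~{dimension_scale:.0%} of the original")
print(f"power per gate at the old speed : ~{power_same_speed:.3f}x (about 1/8)")
print(f"power per gate at 2x speed      : ~{power_full_speed:.3f}x (about 1/4)")
```

Under these assumptions, halving the gate dimensions gives the 2x speed-up, and a gate held at its predecessor's clock rate dissipates roughly one eighth of the original power.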
Low-Power Circuits and Architecture
This same view on equalizing the marginal delay cost for a reduction in energy holds for low-power circuits and architectures, although it is rarely discussed that way. Many papers simply discuss energy savings without discussing the performance costs. A technique with moderate performance cost might be well-suited for a low-speed machine with a large marginal delay cost per unit energy, but would actually make the power higher if it was applied to a fast machine with a small marginal delay cost for energy reduction.
The best techniques have negative performance cost to reduce energy: they improve both performance and energy. These techniques generally involve problem reformulation or algorithmic changes that allow the desired task to be accomplished with less computation than before. While they are by their nature application specific, these techniques can change the power required for a task by orders of magnitude [6], more than any other method. These changes are generally made at the architectural level, but sometimes implementation decisions are critical too. Adding specialized hardware reduces the overhead work a more general hardware block would need to do, and thus can improve both energy and performance. Since these ideas require domain-specific insight, no tools to support this activity exist.
The next set of low-power techniques are those that nominally have zero performance cost: these techniques remove energy that is simply being wasted by the system. Before power became a critical problem, designers were rarely concerned whether a unit was doing useful work; they were only concerned about functionality and performance. At the circuit level these techniques are generally tied to clock gating, to prevent units from transitioning when they are not producing useful outputs. The larger power reductions come from applying this idea at the system level. Subsystems often support different execution states, from powered off to ready-to-run. Modern PCs use an interface called ACPI to allow the software to deactivate unused units so that they don't dissipate power [7]. A digital cell phone's power advantage over analog phones comes mostly from an architecture that was borrowed from pagers, in which the phone is actually off most of the time.
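To make the clock-gating argument concrete, here is a minimal Python sketch using a first-order power model (P ≈ αCVdd²f plus leakage). The function name and all numbers are illustrative assumptions, not values from the text.

```python
# Minimal sketch of why clock gating removes "wasted" dynamic energy.
def average_power(cap, vdd, freq, i_leak, duty_cycle, clock_gated):
    # Dynamic power P_dyn = alpha * C * Vdd^2 * f is burned whenever the clock
    # toggles the unit; without gating it toggles every cycle, useful or not.
    alpha = duty_cycle if clock_gated else 1.0
    p_dynamic = alpha * cap * vdd ** 2 * freq
    p_leakage = vdd * i_leak      # leakage flows whether or not the clock runs
    return p_dynamic + p_leakage

# Illustrative unit: 5 pF switched capacitance, 1 V supply, 1 GHz clock,
# 1 mA leakage, producing useful results only 10% of the time.
ungated = average_power(5e-12, 1.0, 1e9, 1e-3, duty_cycle=0.10, clock_gated=False)
gated   = average_power(5e-12, 1.0, 1e9, 1e-3, duty_cycle=0.10, clock_gated=True)
print(f"ungated: {ungated*1e3:.2f} mW   gated: {gated*1e3:.2f} mW")
# Gating removes the wasted 90% of dynamic power; leakage is untouched, which is
# why sleep/off states (e.g., ACPI device states) are still needed for idle units.
```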
The dual of reducing energy with no performance cost is techniques that improve performance with no energy cost. Parallelism is the most commonly used example of this approach [8]. For applications with data parallelism, it is possible to use two functional units each running at half rate, rather than using a single unit running at full rate. Since the energy per operation is lower as you decrease performance, this parallel solution will dissipate less power than the original solution. Often there is no need to explicitly build parallel units, because pipelining can achieve a similar effect. In reality the energy cost of parallelism is not zero, since there is some cost in distributing operands and collecting the results, or in the pipeline flops, but these costs are generally modest. The efficiency of parallelism is often limited by the application: it must have enough work to do that partially filled blocks don't occur that often, since these increase the average energy cost.
Other "low-power" techniques are really methods to reduce energy by increasing the delay of the circuit, or techniques that give the low-level optimizer more degrees of freedom. The former include using power gating to reduce leakage and low-swing interconnects, while the latter include dual-threshold technologies [9], or allowing gates to connect to either of two different power supplies [10]. As previously mentioned, techniques with modest delay costs might be advantageous for a low-performance design, but may not be in a high-performance system, since these systems operate at a point where the allowable marginal delay cost is very small. Most of the remaining low-power techniques are really methods of dealing with application, environmental, or fabrication uncertainty, so before we describe them we first need to discuss the energy cost of variability.
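The parallelism trade-off can be sketched numerically. The sketch below assumes, as a rough first-order model not stated in the text, that achievable frequency scales about linearly with Vdd well above threshold, so a half-rate unit can run at roughly half the supply voltage; the capacitance, voltage, and frequency values are illustrative.

```python
# Two half-rate units at a lower supply voltage vs. one full-rate unit.
def dynamic_power(num_units, cap, vdd, freq):
    # P = n * C * Vdd^2 * f  (switching power only; leakage ignored here)
    return num_units * cap * vdd ** 2 * freq

C, VDD, F = 1e-12, 1.0, 1e9                       # illustrative C, Vdd, f

single   = dynamic_power(1, C, VDD, F)            # one unit at full rate
parallel = dynamic_power(2, C, VDD / 2, F / 2)    # two units, half rate, half Vdd

print(f"single unit : {single*1e3:.3f} mW")       # ~1.000 mW
print(f"two parallel: {parallel*1e3:.3f} mW")     # ~0.250 mW at the same throughput
# Throughput is unchanged (2 units x f/2 = f), but energy per operation drops
# with Vdd^2, which is why parallelism or pipelining can cut power.
```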
Impact of Variability on Energy
So far we have examined the optimization problem as if we knew what the desired performance requirement was, and we also had the relationship between our control variables (Vdd, Vth, etc.) and performance. Neither of these assumptions is true in a real system. If we build a fixed system for an application with variable computation rates, its performance must exceed the requirements of the application, and its power must always be smaller than what the system can support. Since we have shown that higher levels of performance require higher energy per operation, this solution will, on average, waste energy.
As an example, consider a system with no variations except that the input comes in bursts, and the machine is active only 1% of the time. If techniques such as power gating (also known as sleep transistors [14]) are not used, the optimal Vth will make the leakage power 30% of the average active power, or 100x lower than in the case when the unit is busy all the time. This will increase Vth by roughly 160mV, and force Vdd to rise by a similar percentage to maintain the desired performance. The increase in Vdd makes the energy per operation higher, so the low duty cycle translates into a loss in power. If the threshold increases by 50%, then Vdd will increase by roughly 40%, roughly doubling the energy of each operation.
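The arithmetic in this example can be checked with a small hedged sketch; the subthreshold leakage model, slope factor, and thermal voltage used below are standard textbook assumptions rather than values given in the text.

```python
# Rough back-of-the-envelope check of the burst-mode example above.
# Assumes subthreshold leakage I_leak ~ exp(-Vth/(n*kT/q)), n ~ 1.35, kT/q ~ 26 mV.
import math

n, kT_q = 1.35, 0.026                 # subthreshold slope factor, thermal voltage (V)

# To keep leakage at the same fraction of a 100x lower average active power,
# leakage current must drop by about 100x, which needs a Vth increase of:
delta_vth = n * kT_q * math.log(100)  # ~0.16 V, matching the "roughly 160 mV" above
print(f"required Vth increase: {delta_vth*1e3:.0f} mV")

# If that raises Vth by ~50% and Vdd must rise ~40% to hold the same speed,
# dynamic energy per operation (~ C*Vdd^2) roughly doubles:
energy_ratio = 1.40 ** 2
print(f"energy per op grows by about {energy_ratio:.1f}x")
```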
Unlike the deterministic optimization problem that was described in the previous section, fabrication variations change the problem into the optimization of a probabilistic circuit. The inability to set all device parameters to exactly their desired value has an energy cost. To understand this cost and what can be done to reduce it, we first need to look at the types of variability that occur in modern chips. The uncertainty in transistor parameters can be broken into three large groups by looking at how the errors are correlated. Die to Die (D2D) variations have large correlation distances and affect all transistors on a die in the same way. Within Die (WID) variations are correlated only over small distances, affecting a group of transistors on the die. Random variations (Ran) are uncorrelated changes that affect each transistor; this last group depends on the area of the device [11]. The correlated variations are often systematic in nature, and can often be traced to design differences in parameters such as local density or device orientation.
With uncertainty, the first question is what to use as the objective function for the optimization. Generally, one wants to optimize the energy and performance specifications so that some fraction of the parts will meet these targets. For example, if we wanted to sell 80% of the parts, the performance specification would be the performance of the part that is slower than 90% of the distribution, and the energy spec would be the energy of the part that is higher than 90% of the distribution. Thus in the face of uncertainty, the optimizer must use this lower performance and higher power as the metrics for the part, even though they can't exist on the same die. This cost is easy to see for D2D variations, since all transistors will be changed by the same amount, so the underlying optimization problem remains the same. Fig. 5 shows how the optimal energy-performance curve degrades as the uncertainty in Vth increases.
While the optimization problem gets more complex with Ran and WID variations, since one must consider variations during construction of the delay paths to be optimized, some tools for this task are starting to emerge [12],[13]. The effect of Vth variation on leakage current is also critical, but it is easier to calculate. For leakage, we are interested in calculating the average leakage current of each transistor, and for exponential functions this can be much larger than predicted by simply using the average Vth. Even though averaging all of the device threshold voltages together may result in the desired Vth, the leakage of the devices with lower thresholds will be exponentially larger than that of the devices with high thresholds. This means that the total leakage will be dominated by the devices with lower thresholds, and hence the average leakage per device will be significantly higher than the leakage of a single device with the average Vth.
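The exponential averaging effect can be illustrated with a tiny Monte Carlo experiment. The subthreshold model, slope factor, and Vth spread below are assumed for illustration and are not taken from the text.

```python
# With Vth varying randomly across devices, the average leakage is much larger
# than the leakage computed at the average Vth.
import math
import random

n, kT_q = 1.35, 0.026                  # subthreshold slope factor, thermal voltage (V)
vth_mean, vth_sigma = 0.30, 0.04       # assumed mean and std-dev of Vth (V)

def leakage(vth):
    # relative leakage, normalized so leakage(vth_mean) == 1.0
    return math.exp(-(vth - vth_mean) / (n * kT_q))

random.seed(0)
samples = [leakage(random.gauss(vth_mean, vth_sigma)) for _ in range(100_000)]

print("leakage at average Vth : 1.00")
print(f"average leakage        : {sum(samples)/len(samples):.2f}x")
# For a Gaussian Vth, the ratio is exp(sigma^2 / (2*(n*kT/q)^2)), about 1.9x here:
print(f"analytic prediction    : {math.exp(vth_sigma**2 / (2*(n*kT_q)**2)):.2f}x")
```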