#### Mitigating Parameter Variation with Dynamic Fine-Grain Body Biasing<sup>\*</sup>

Radu Teodorescu, Jun Nakano, Abhishek Tiwari and Josep Torrellas

University of Illinois at Urbana-Champaign

http://iacoma.cs.uiuc.edu





\*to appear in MICRO-40, December 2007

## Motivation

- Technology scaling continues
- More and more transistors every generation!
- However...
- Chips are increasingly affected by parameter variation





#### **Parameter Variation**

- Process variation
  - Manufacturing at low feature sizes
- Temperature variation
  - Uneven activity distribution
- Supply voltage variation
  - IR drop, di/dt noise





[Image credit: Intel Corp.]



#### Effects of Parameter Variation

- Higher power consumption
- Lower frequency
- Uncertainty in the design process



#### Outline

- A Model of Process Variation
- **Dynamic Fine-Grain Body Biasing**
- Evaluation
- Conclusions






# A Model For Process Variation

- Fast, simple and parameterizable model
- We model two key process parameters:
  - Transistor critical dimension (L<sub>eff</sub>) and threshold voltage (V<sub>th</sub>)
- We also model temperature effects



# Variation Components

- Granularity:
  - Within-die (WID)
  - Die-to-die (D2D)
- WID variation:
  - Systematic variation
  - Random variation




# A Model For Process Variation

- Variation in any parameter P:

$$\Delta P = \Delta P_{D2D} + \Delta P_{WID} = \Delta P_{D2D} + \Delta P_{rand} + \Delta P_{sys}$$

- We focus on WID variation
- D2D is a chip-wide offset to $\Delta P_{WID}$
- Random and systematic components
  - Modeled as normal distributions
  - Treated separately because they impact different levels of the microarchitecture



#### Systematic Variation

- We divide the chip into a grid of points
  - Each point has one random value of  $\Delta P_{sys}$
- Multivariate normal distribution ( $\mu_{sys}=0, \sigma_{sys}$ )
  - Characterized by a correlation function:

$$corr(P_{\vec{x}}, P_{\vec{y}}) = \rho(r) ; r = |\vec{x} - \vec{y}|$$



- Correlation is position independent and isotropic
- For  $\rho(r)$  we choose the spherical model
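For reference, the spherical correlation function is commonly written in the form below. The range parameter $\phi$ (the distance beyond which two points are uncorrelated) is a symbol assumed here; the slides do not name it.

$$\rho(r) = \begin{cases} 1 - \dfrac{3r}{2\phi} + \dfrac{r^{3}}{2\phi^{3}}, & r \le \phi \\ 0, & r > \phi \end{cases}$$

With this form, $\rho(0) = 1$ and $\rho(\phi) = 0$, so correlation falls smoothly from full to none over the distance $\phi$.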

#### **Spherical Model**



[Figure: spherical model, illustrated for stronger and weaker correlation]



Matches measured data [Friedberg et al. 05]
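A minimal sketch of how one systematic variation map could be drawn from a multivariate normal with spherical correlation. It uses plain Python rather than R, which the talk used. The grid size `n`, the range `phi` (as a fraction of the chip width), `sigma_sys` and the helper names are all illustrative assumptions.

```python
import math
import random

def spherical_corr(r, phi):
    """Spherical correlation: 1 at r = 0, falling to 0 for r >= phi."""
    if r >= phi:
        return 0.0
    return 1.0 - 1.5 * (r / phi) + 0.5 * (r / phi) ** 3

def cholesky(a):
    """Lower-triangular factor L of a symmetric positive-definite matrix (a = L L^T)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def systematic_map(n=8, phi=0.5, sigma_sys=1.0, seed=0):
    """Draw one n x n map of Delta P_sys on a unit-square chip (assumed geometry)."""
    pts = [((i + 0.5) / n, (j + 0.5) / n) for i in range(n) for j in range(n)]
    m = len(pts)
    # Covariance depends only on the distance between grid points.
    cov = [[sigma_sys ** 2 * spherical_corr(math.dist(p, q), phi) for q in pts]
           for p in pts]
    for i in range(m):
        cov[i][i] += 1e-9  # tiny jitter for numerical stability
    low = cholesky(cov)
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(m)]
    # Correlated sample = L z, where z holds independent standard normals.
    flat = [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(m)]
    return [flat[r * n:(r + 1) * n] for r in range(n)]

chip = systematic_map()
```

Calling `systematic_map` with different seeds gives different chips; a larger `phi` makes neighbouring grid points more alike.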





#### **Random Variation**

- Random variation occurs at the level of individual transistors
- We model it analytically as a normal distribution
- Both  $\Delta P_{rand}$  and  $\Delta P_{sys}$  are normal and independent with  $\sigma_{rand}$  and  $\sigma_{sys}$

$$\sigma_{total} = \sqrt{\sigma_{rand}^2 + \sigma_{sys}^2}$$
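The quadrature sum above can be checked numerically. The sketch below splits a total sigma into two components and confirms by sampling that independent normal components recombine to the total. The `sys_fraction` parameter and the normalized value `sigma_total = 1.0` are illustrative assumptions.

```python
import math
import random
import statistics

def split_sigma(sigma_total, sys_fraction=0.5):
    """Split the total variance into systematic and random parts.

    sys_fraction is the share of the variance that is systematic
    (0.5 gives sigma_sys == sigma_rand); it is an illustrative parameter.
    """
    sigma_sys = math.sqrt(sys_fraction) * sigma_total
    sigma_rand = math.sqrt(1.0 - sys_fraction) * sigma_total
    return sigma_sys, sigma_rand

sigma_sys, sigma_rand = split_sigma(1.0)

# Monte Carlo check: the sum of two independent normals has
# standard deviation sqrt(sigma_sys^2 + sigma_rand^2).
rng = random.Random(42)
samples = [rng.gauss(0.0, sigma_sys) + rng.gauss(0.0, sigma_rand)
           for _ in range(100_000)]
sigma_total_est = statistics.pstdev(samples)
```

The estimate `sigma_total_est` should land close to the requested total, up to sampling noise.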






# **Body Biasing**

- Well-known technique for V<sub>th</sub> control
- A voltage is applied between source/drain and substrate of a transistor
- Forward body bias (FBB): $V_{th} \downarrow$, Freq $\uparrow$, Leak $\uparrow$
- Reverse body bias (RBB): $V_{th} \uparrow$, Freq $\downarrow$, Leak $\downarrow$
- Useful knob to control frequency and leakage
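The direction of these effects can be illustrated with a first-order model. This is a sketch under assumed models, not the circuit model behind the talk: a linear body-effect coefficient `K_BODY`, the alpha-power law for frequency, and exponential subthreshold leakage with slope `N_VT`. Only V<sub>dd</sub> = 1 V and V<sub>th0</sub> = 150 mV come from the evaluation setup; the other constants are illustrative.

```python
import math

VDD = 1.0      # supply voltage (V), from the evaluation setup
VTH0 = 0.150   # nominal threshold voltage (V), from the evaluation setup
K_BODY = 0.1   # assumed: V_th shift per volt of body bias
ALPHA = 1.3    # assumed alpha-power-law exponent
N_VT = 0.040   # assumed subthreshold slope factor n*kT/q (V)

def vth(v_bb):
    """Positive v_bb is forward bias (lowers V_th); negative is reverse bias."""
    return VTH0 - K_BODY * v_bb

def drive(v_t):
    """Alpha-power-law drive strength, proportional to switching frequency."""
    return (VDD - v_t) ** ALPHA

def rel_frequency(v_bb):
    """Frequency relative to zero body bias."""
    return drive(vth(v_bb)) / drive(VTH0)

def rel_leakage(v_bb):
    """Subthreshold leakage relative to zero body bias."""
    return math.exp(-(vth(v_bb) - VTH0) / N_VT)

fbb = (rel_frequency(0.5), rel_leakage(0.5))     # forward bias of 500 mV
rbb = (rel_frequency(-0.5), rel_leakage(-0.5))   # reverse bias of 500 mV
```

In this toy model leakage reacts far more strongly than frequency, because it depends exponentially on V<sub>th</sub>.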



#### **Body Bias Design Space**

| Space / Time | Static<br>BB fixed for chip lifetime | Simple adaptation<br>FBB in active mode, RBB in standby | **Dynamic**<br>BB changes with T and workload |
|--------------|--------------------------------------|---------------------------------------------------------|-----------------------------------------------|
| Chip-wide    | D2D variation<br>[Intel XScale] | D2D variation, power, performance<br>[Intel's 80-core chip] | |
| Fine-grain   | **S-FGBB**<br>WID variation<br>[Tschanz et al.] | WID variation, power, performance | **D-FGBB**<br>WID variation, T variation (space and time) |



# Motivation for D-FGBB

- Body bias trades off frequency for leakage
- **Optimal** body bias:

The **lowest** FBB or **highest** RBB s.t. circuit delay meets frequency target

- Circuit delay changes with temperature
- Therefore optimal BB changes with temperature
- D-FGBB keeps the body bias optimal as T changes

[Figure: switching frequency versus temperature (50-100 °C) for V<sub>th</sub> = 0.120 V, 0.135 V, 0.150 V, 0.165 V and 0.180 V]
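The definition of the optimal body bias can be read as a search: sweep the bias from the strongest RBB toward the strongest FBB and stop at the first level that meets timing. The sketch below does this with a hypothetical delay model (`toy_delay`) in which a cell gets slower as it heats up. The coefficients, the 50 mV step and the helper names are assumptions for illustration; the ±500 mV range matches the evaluation setup.

```python
def toy_delay(v_bb, temp_c):
    """Hypothetical normalized cell delay: slower when hot, faster with FBB."""
    return (1.0 + 0.002 * (temp_c - 60.0)) * (1.0 - 0.2 * v_bb)

def optimal_bias(temp_c, target_delay=1.0, step=0.05, v_max=0.5, delay=toy_delay):
    """Return the most reverse bias (lowest leakage) that still meets timing.

    Positive values are FBB, negative values are RBB. Returns None if even
    the maximum FBB cannot meet the target.
    """
    levels = int(round(2 * v_max / step))
    for i in range(levels + 1):
        v_bb = -v_max + i * step          # sweep from max RBB toward max FBB
        if delay(v_bb, temp_c) <= target_delay:
            return round(v_bb, 3)
    return None

hot = optimal_bias(100.0)    # hot cell: needs forward bias to meet timing
cool = optimal_bias(40.0)    # cool cell: has slack, can take reverse bias
```

Because the delay depends on temperature, the returned bias differs between the hot and the cool case, which is the motivation for adjusting BB dynamically.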

# Finding the Optimal BB

- Measure the delay of each BB cell
- Critical path replicas to sample cell delay
- Phase detector "times" the critical path replica
  - If slow: FBB signal raised
  - If fast: RBB signal raised
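The feedback loop described above can be sketched in software. This is only an illustration of the control idea, not the hardware design: the `HOLD` state, the 2% margin that prevents oscillation, the 50 mV step and the function names are all assumptions.

```python
def phase_detector(replica_delay, clock_period, margin=0.02):
    """Hypothetical phase detector: compare replica delay with the clock period."""
    if replica_delay > clock_period:
        return "FBB"          # too slow: request more forward bias
    if replica_delay < clock_period * (1.0 - margin):
        return "RBB"          # comfortably fast: request more reverse bias
    return "HOLD"             # assumed dead band: leave the bias alone

def update_bias(v_bb, signal, step=0.05, v_max=0.5):
    """Move the cell's body bias one step in the requested direction."""
    if signal == "FBB":
        v_bb += step
    elif signal == "RBB":
        v_bb -= step
    return max(-v_max, min(v_max, v_bb))

def settle(replica_delay_fn, clock_period, v_bb=0.0, max_iters=100):
    """Iterate detector and bias update until no further change is requested."""
    for _ in range(max_iters):
        signal = phase_detector(replica_delay_fn(v_bb), clock_period)
        if signal == "HOLD":
            break
        v_bb = update_bias(v_bb, signal)
    return v_bb

def slow_cell(v):
    """Hypothetical replica delay of a slow cell (normalized to the clock period)."""
    return 1.05 * (1.0 - 0.2 * v)

def fast_cell(v):
    """Hypothetical replica delay of a fast cell."""
    return 0.90 * (1.0 - 0.2 * v)

v_slow = settle(slow_cell, clock_period=1.0)
v_fast = settle(fast_cell, clock_period=1.0)
```

In this sketch the slow cell settles at a forward bias and the fast cell at a reverse bias, each just meeting the clock period.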





# Applying Fine Grain BB






#### **Applications of D-FGBB**

| Operating<br>environments | S-FGBB                       | D-FGBB                        |
|---------------------------|------------------------------|-------------------------------|
| Normal                    | Improve chip operating point | Save leakage<br>power         |
| High<br>Performance       | Improve chip operating point | Increase average<br>frequency |
| Low Power                 | Save leakage<br>power        | Save leakage<br>power         |






# Improving a Chip's Operating Point




Intel PhD Fellowship Forum, October 2007



# Improving a Chip's Operating Point

- Post-manufacturing calibration phase:
  1. Bring the chip to T<sub>cal</sub>
  2. Set a target frequency $F_{cal}^{0}$ and run at full load
  3. BB is adjusted automatically
  4. Measure total power $P_{cal}$: if $P_{cal} < P_{target}$, raise the frequency ($F_{cal}^{1} > F_{cal}^{0}$); otherwise lower it ($F_{cal}^{1} < F_{cal}^{0}$)
  5. Repeat if needed, until $P_{cal} \approx P_{target}$
- The final F<sub>cal</sub><sup>i</sup> becomes the chip's frequency
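The calibration steps above amount to a simple search loop. In the sketch below, `measure_power` is a hypothetical hook standing for "run the chip at T<sub>cal</sub> and frequency f at full load, let the BB settle, and read total power". The step size, tolerance and the linear power model in the usage lines are illustrative assumptions.

```python
def calibrate(measure_power, p_target, f_start, f_step=0.05, tol=0.02, max_iters=50):
    """Search for the frequency whose full-load power is close to p_target.

    measure_power(f) is a hypothetical hook, not a real API.
    """
    f = f_start
    for _ in range(max_iters):
        p = measure_power(f)
        if abs(p - p_target) <= tol * p_target:
            break                      # P_cal is close enough to P_target
        if p < p_target:
            f += f_step                # headroom left: raise the frequency
        else:
            f -= f_step                # over budget: lower the frequency
    return f

def toy_power(f_ghz):
    """Hypothetical chip: total power (W) grows linearly with frequency (GHz)."""
    return 25.0 * f_ghz

f_cal = calibrate(toy_power, p_target=100.0, f_start=3.5)
```

The step must be small relative to the tolerance, otherwise the loop can jump back and forth across the target without settling.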




#### D-FGBB Adapts to Changes in T

- Calibration temperature T<sub>cal</sub> is conservative
- Average T is much lower:

[Figure: temperature per functional unit; x-axis: Functional Units]



#### **D-FGBB Saves Leakage Power**

- S-FGBB finds and sets F<sub>cal</sub>
- D-FGBB adjusts dynamically to T changes to save power while running at F<sub>cal</sub>








## **D-FGBB Improves Performance**

- Average power P<sub>avg</sub><P<sub>max</sub>
- D-FGBB is used to push the chip to F<sub>avg</sub>>F<sub>cal</sub>, as long as P<P<sub>max</sub>






#### **D-FGBB Saves Leakage Power**

- The chip runs at its original F<sub>orig</sub>
- D-FGBB adjusts dynamically to T changes to save power while running at F<sub>orig</sub>








#### **Evaluation Infrastructure**

- R statistical package: generates variation maps for 200 chips
- SESC cycle-accurate microarchitectural simulator: execution time, dynamic power
  - Mix of SPECint and SPECfp benchmarks
- HotLeakage and SPICE: leakage power
- HotSpot: temperature estimation








#### **Evaluation Methodology**

- 4-core CMP, based on Alpha 21364
- 45nm technology, 4GHz
- V<sub>th</sub> variation: $\sigma_{Vth}/\mu_{Vth}$ = 0.03 to 0.12, $\sigma_{sys}=\sigma_{rand}$
- L<sub>eff</sub> variation: $\sigma_{\text{Leff}} = \sigma_{\text{Vth}}/2$
- $V_{dd}=1V$ ,  $V_{th0}=150mV$ ,  $V_{bb}=\pm500mV$
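As a worked example of what these settings imply, take the upper end of the range, $\sigma_{Vth}/\mu_{Vth} = 0.12$, and assume that $\mu_{Vth}$ equals $V_{th0}$ = 150 mV (an assumption; the slide does not state this equivalence). With $\sigma_{sys} = \sigma_{rand}$ and the two components adding in quadrature:

$$\sigma_{Vth} = 0.12 \times 150\,\mathrm{mV} = 18\,\mathrm{mV}, \qquad \sigma_{sys} = \sigma_{rand} = \frac{\sigma_{Vth}}{\sqrt{2}} \approx 12.7\,\mathrm{mV}$$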



#### **CMP** Architecture





#### **Body Bias Cells**

- We partition each core into BB cells
- Shapes and sizes follow functional units






#### Variation Impact






# S-FGBB Improves the Chip's Operating Point





#### **D-FGBB Reduces Leakage**



[Figure: leakage power versus number of BB cells]

- Large leakage reduction after binning: 28-42%
- More BB cells result in higher savings






#### **D-FGBB Improves Frequency**



- Average frequency improvement 7-9% over S-FGBB and 7-16% over NoBB
- More BB cells result in higher increase




#### **Power Cost**



Significant power cost, but still within the power budget





#### **D-FGBB Reduces Leakage**



[Figure: leakage power versus number of BB cells]

- Large leakage reduction at constant frequency: 10-51% vs. S-FGBB and 12-69% vs NoBB
- More BB cells result in higher savings



# Combining D-FGBB with DVFS

- D-FGBB targets leakage power
- DVFS targets mostly dynamic power
- Can they be combined effectively?



# Combining D-FGBB with DVFS



- D-FGBB scales well with DVFS
- S-FGBB does not scale unless calibrated at multiple voltages



#### Conclusions

- D-FGBB is an effective and versatile tool to address parameter variation
- We show three scenarios:
  - Normal: 28-42% leakage savings vs. S-FGBB
  - High performance: 7-9% frequency increase
  - Low power: 10-51% leakage reduction vs. S-FGBB
- Combines well with DVFS



# More in our MICRO 2007 paper

#### http://iacoma.cs.uiuc.edu

- More details on the variation model
- A solution for combining D-FGBB with DVFS
- Estimated overheads of D-FGBB
- More implementation details

# Thank you! Questions?

