

#### Circuit Reliability: Mechanisms, Monitors, and Effects in Near-Threshold Processors

#### Chris H. Kim

#### University of Minnesota, Minneapolis, MN

chriskim@umn.edu www.umn.edu/~chriskim/

## **Scaling Challenges**



### **Overcoming the Power Wall**



 Proven solutions: Multi-core chips, dynamic voltage and frequency scaling, clock gating, power gating, ...

## **Overcoming the Variability Wall**



 Proven solutions: Variation aware design, memory assist/repair, lithography techniques, adaptive systems


#### **Overcoming the Reliability Wall**



 Possible solutions: Guardbanding, sensing and compensation, wear-leveling, failure resistant systems, ...

# Outline

- Device Reliability Issues
- Reliability Monitors and Measurements
- Reliability Effects in NTV Processors
- Summary

## **Aging in CMOS Transistors**




## HCI, BTI, and TDDB in Digital Logic







 Transistors are exposed to different stress conditions during normal digital circuit operation

## Practical Solutions for Preventing Aging Related Failures

- BTI and HCI
  - Gradual decline in performance
  - <u>Guard banding (static or dynamic), adjust Vmax</u>
  - CAD, firmware & architecture level support essential
- TDDB
  - Single incident may lead to outright system failure
  - Can happen anywhere inside a chip
  - Improve fabrication procedure, adjust Vmax
- Bottom line: Precise measurement and understanding of circuit degradation is a key aspect of robust design

## **Transistor Lifetime Estimation**



Extrapolate stress results with respect to:

- <u>Op. conditions</u> based on acceleration models
- <u>Larger chip areas</u> (e.g., Poisson scaling for TDDB)
- Lower percentiles based on chosen distribution
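The area and percentile extrapolations can be sketched numerically. The snippet below is a minimal illustration that assumes a Weibull failure distribution (one possible "chosen distribution") with weakest-link (Poisson) area scaling; the function names and all numbers (`t63_test`, `beta`, `area_ratio`) are hypothetical and are not taken from the slides.

```python
import math

def scale_weibull_area(t63_test, area_ratio, beta):
    """Scale the Weibull characteristic life (63.2% point) from the test
    structure to a larger area, with area_ratio = A_chip / A_test.
    Weakest-link (Poisson) scaling gives t63_chip = t63_test * ratio**(-1/beta)."""
    return t63_test * area_ratio ** (-1.0 / beta)

def weibull_percentile(t63, fail_fraction, beta):
    """Time at which the cumulative failure fraction F is reached,
    from F(t) = 1 - exp(-(t / t63)**beta)."""
    return t63 * (-math.log(1.0 - fail_fraction)) ** (1.0 / beta)

# Hypothetical inputs, for illustration only.
t63_test = 1.0e9      # seconds, characteristic life of the small test structure
beta = 1.5            # Weibull shape parameter (assumed)
area_ratio = 1.0e4    # chip gate area / test structure gate area (assumed)

t63_chip = scale_weibull_area(t63_test, area_ratio, beta)
t_100ppm = weibull_percentile(t63_chip, 1.0e-4, beta)

print(f"t63 at chip area : {t63_chip:.3e} s")
print(f"100 ppm lifetime : {t_100ppm:.3e} s")
```

Both steps shorten the projected lifetime: a larger area contains more potential weak spots, and a lower percentile asks for an earlier point on the same distribution. Extrapolation to operating conditions would add a separate acceleration model, which is not sketched here.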

## Benefits of In-Situ Reliability Monitors over Device Probing

- Information from actual circuits (test circuit must be representative)
- High (timing) precision + short measurement interrupt
- No expensive equipment
- Short test time and reduced test area
- Measurements at use condition → allows realistic lifetime projection
- Complements traditional probing methods



## Usage Scenarios and Design Issues of In-situ Reliability Monitors

- Usage scenario 1: Process characterization and yield improvement
  - Early technology characterization is often performed before many metallization layers are fabricated
  - Library cells may not be available (flip-flops, scan)
  - Device probing would still be a competitive solution for extracting analog parameters such as I–V or C–V
- Usage scenario 2: In-field monitoring and data collection
  - Workload unknown
  - Simple circuits are practical but they have limited capabilities
  - Firmware and architecture support needed


- Usage scenario 3: Sensor for real time aging compensation
  - Effectiveness versus overhead
  - Measurements are from a proxy circuit
  - Practical issues: type of sensor, temporal granularity, spatial granularity, communication with sensors, interface and protocol
  - Personally not a big fan

# Outline

- Device Reliability Issues
- Monitors and Measurements
- Effects in NTV Processors
- Summary

## Circuit Based Reliability Monitors (or Silicon Odometers)

| Year                             | 2007                                        | 2008                                              | 2009                                                           | 2010                                                 | 2011                                                                           | 2012                                                                        |
|----------------------------------|---------------------------------------------|---------------------------------------------------|----------------------------------------------------------------|------------------------------------------------------|--------------------------------------------------------------------------------|-----------------------------------------------------------------------------|
| Process                          | 130nm                                       | 65nm                                              | 65nm                                                           | 65nm                                                 | 32nm SOI                                                                       | 32nm SOI                                                                    |
| Odometer<br>Projects             | Original<br>Silicon<br>Odometer             | All-In-One<br>Odometer                            | Statistical,<br>Duty-Cycle,<br>and RTN<br>Odometer             | Interconnect<br>Odometer                             | PBTI and<br>SRAM<br>Odometer                                                   | SRAM and<br>RTN<br>Odometer                                                 |
| Focused<br>Reliability<br>Issues | NBTI<br>Induced<br>Frequency<br>Degradation | Separately<br>Monitoring<br>NBTI, HCI and<br>TDDB | Statistical<br>Behavior of<br>NBTI;<br>RTN on Logic<br>Circuit | Impact of<br>Interconnect<br>on BTI and<br>HCI Aging | Monitoring<br>PBTI in HKMG<br>Process;<br>BTI Impact on<br>SRAM Read/<br>Write | SRAM Timing<br>Issues Due to<br>BTI;<br>RTN Impact<br>on Ring<br>Oscillator |

### **Beat Frequency Silicon Odometer**



- Beat frequency of two free running ROSCs measured by DFF and edge detector
- Benefits of beat frequency detection system
  - Achieve ps resolution with µs measurement interrupt
  - Insensitive to common mode noise such as temperature drifts
  - Fully digital, scan based interface, easy to implement

#### **Beat Frequency Silicon Odometer**

Block diagram: the stressed ROSC (frequency $f_{stress}$) and the reference ROSC (frequency $f_{ref}$) drive a phase comparator. Its output PC_OUT toggles at the beat frequency $f_{beat} = f_{ref} - f_{stress}$, and a counter records the count N.

| $f_{ref}$ | $f_{stress}$ | Count N |
|-----------|--------------|---------|
| 1.00GHz   | 0.99GHz      | 100     |
| 1.00GHz   | 0.98GHz      | 50      |

- Sample stressed ROSC output with reference ROSC
  - 1% frequency difference before stress  $\rightarrow$  N=100
  - 2% frequency difference after stress  $\rightarrow$  N=50
  - $\Delta f$ or $\Delta T$ sensing resolution is better than 0.01%
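The counts on this slide follow from N = f_ref / (f_ref − f_stress), the number of reference periods in one beat period. A minimal sketch of that arithmetic is shown below; the function names are illustrative and the code assumes f_ref > f_stress.

```python
def beat_count(f_ref_hz, f_stress_hz):
    """Reference ROSC periods in one beat period: N = f_ref / (f_ref - f_stress)."""
    f_beat = f_ref_hz - f_stress_hz
    if f_beat <= 0:
        raise ValueError("this sketch assumes f_ref > f_stress")
    return f_ref_hz / f_beat

def stress_freq_from_count(f_ref_hz, n):
    """Invert the relation above: f_stress = f_ref * (1 - 1/N)."""
    return f_ref_hz * (1.0 - 1.0 / n)

f_ref = 1.00e9
n_fresh = beat_count(f_ref, 0.99e9)   # 1% frequency difference before stress
n_aged = beat_count(f_ref, 0.98e9)    # 2% frequency difference after stress

# Frequency change that moves the count by one step near N = 100,
# expressed as a fraction of f_ref.
one_count_step = (stress_freq_from_count(f_ref, 100)
                  - stress_freq_from_count(f_ref, 99)) / f_ref

print(round(n_fresh), round(n_aged))
print(f"one-count step near N=100: {one_count_step * 100:.4f}% of f_ref")
```

A 1% change in the stressed frequency halves the count, while a single count near N = 100 corresponds to roughly a 0.01% frequency change. This is where the fine resolution of the beat frequency scheme comes from.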

#### **ROSC Based Aging Sensor Comparison**

| System                                              | Single ROSC                                                                                                                         | 2 ROSC, simple                                                                                        | 2 ROSC, beat freq.                                                                            |
|-----------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------|
| Block<br>Diagram                                    | Stress ROSC → Counter (variable N2)                                                                                                 | Ref. ROSC (T1 = constant) and Stress ROSC (T2 = degrades) → Counter (variable N2)                     | Ref. ROSC (T1 = constant) and Stress ROSC (T2 = degrades) → $f_{PC} = f_{ref} - f_{stress}$   |
| Function                                            | Count Stress ROSC<br>periods during externally<br>controlled meas. time                                                             | Count Stress ROSC<br>periods during N1 periods<br>of Ref. ROSC                                        | Count Ref. ROSC periods<br>during one period of<br>PC_OUT                                     |
| Features                                            | Simple; compact                                                                                                                     | Simple; immune to common mode variations                                                              | High resolution w/ short<br>meas. time; immune to<br>common mode variations                   |
| Issues                                              | Voltage and temp.<br>variations; meas. time vs.<br>resolution tradeoff; requires<br>absolute timing reference<br>(e.g. oscilloscope) | Meas. time vs. resolution<br>tradeoff                                                                 | Requires extra circuits<br>(e.g., Phase Comp., edge<br>detector, etc)                         |
| Meas. time<br>for 0.01%<br>max res. *               | 30 µs                                                                                                                               | 30 µs                                                                                                 | 0.3 µs                                                                                        |
| Meas. error<br>wrt. common<br>mode<br>variations ** | +10.18% / -8.57%                                                                                                                    | +0.26% / -0.38%                                                                                       | +0.06% / -0.07%                                                                               |

\* ROSC period = 3 ns. \*\* Simulated with ±4% $\Delta V_{CC}$.
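The measurement times in the table can be reproduced with a back-of-the-envelope calculation. The sketch below assumes that direct counting needs 1/resolution ROSC periods, and that the beat frequency scheme operates near the count N where a one-count step equals the target resolution (about 1/N²). The function names are illustrative only.

```python
ROSC_PERIOD_S = 3e-9  # ROSC period from the table footnote

def counting_meas_time(resolution, period_s=ROSC_PERIOD_S):
    """Direct counting: one count out of M periods is a 1/M change,
    so M = 1/resolution periods must be counted."""
    return period_s / resolution

def beat_meas_time(resolution, period_s=ROSC_PERIOD_S):
    """Beat frequency: a one-count step near N is about 1/N**2 of the
    frequency, so N = sqrt(1/resolution) reference periods suffice."""
    n = (1.0 / resolution) ** 0.5
    return n * period_s

target = 1e-4  # 0.01% resolution
print(f"single / 2-ROSC counting: {counting_meas_time(target) * 1e6:.1f} us")
print(f"beat frequency          : {beat_meas_time(target) * 1e6:.1f} us")
```

Under these assumptions the counting schemes need about 30 µs and the beat frequency scheme about 0.3 µs, a 100x reduction for the same resolution.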

### Separately Monitoring NBTI and PBTI







**PBTI** is becoming an important concern in high-k metal-gate processes

- A conventional ring oscillator (ROSC) can only provide overall frequency degradation information due to combined NBTI and PBTI effects
- New RO structure separates NBTI and PBTI effects

J. Kim, et al., IBM, IRPS 2011

### **Separately Monitoring BTI and HCI**



- Backdriving action equalizes BTI in both BTI\_ROSC and DRIVE\_ROSC
- Negligible HCI in BTI\_ROSC: only 3-5% of the switching current in the DRIVE\_ROSC
- Fresh power gates are used for frequency measurements

#### **Temp. and Voltage Dependencies**



- HCI slightly reduced with temperature
  - Due to reduced drain current
- Both mechanisms degrade with stress voltage
  - Point when HCI begins to dominate pushed out in time by >1 order of magnitude at 1.8V vs. 2.4V



- Interconnect affects the voltage and current shapes
  - Increased transition time (decreased slew rate)
  - Increased current pulse width; decreased current peak value
- BTI and HCI have different sensitivities to bias conditions

#### **Interconnect Aging Monitor**





| Process            | 65nm LP CMOS                                     |
|--------------------|--------------------------------------------------|
| Core / IO Supplies | 1.2V / 2.5V                                      |
| Stress Voltage     | 1.8V, 2.4V                                       |
| Active Area        | 0.182 mm<sup>2</sup>                             |
| Interconnect Layer | M2, W=100nm, double<br>shielded w/ 100nm spacing |
| Δf Resolution      | > 0.016%                                         |
| Meas. Interrupt    | < 3µs                                            |

- Serpentine wires for a dense chip implementation
- Ground shielding on both sides for reducing noise

X. Wang, et al., IRPS 2012, TVLSI 2014

### **BTI and HCI Aging: With Interconnect**



- BTI aging decreases with interconnect length
- HCI degradation peaks at L=500µm

## **BTI Aging vs. Interconnect Length**



- BTI induced frequency degradation decreases with longer interconnect
- Longer transition time → shorter PMOS stress duration → Less BTI aging

## HCI Aging vs. Interconnect Length



- HCI aging exhibits a non-monotonic behavior with respect to interconnect length
  - Current pulse width increases
  - Current peak decreases

#### **Statistical Behavior of Aging**



- Finite number and random spatial distribution of discrete charges → NBTI & HCI variation
- Inversely proportional to A<sub>GATE</sub> → worse with scaling
- Small number of aging measurements not sufficient to characterize aging

## **Statistical Reliability Monitor**

- Need stressed & reference ROSC frequencies to be close
- Difficult, costly to tune each stressed ROSC
- Use multiple ref. ROSCs with different frequencies
- Cover the frequency distribution of the stressed array



J. Keane, et al., IEDM 2010, JSSC 2011
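One way to picture the multiple-reference idea is a simple selection rule: pair each stressed ROSC with the closest reference that is still faster than it. The sketch below uses that rule as an assumption for illustration; it is not taken from the cited papers, and the frequencies are hypothetical.

```python
def pick_reference(f_stress_hz, ref_freqs_hz):
    """Return the closest reference frequency above f_stress, or None.
    Assumed rule: the reference should stay faster than the stressed ROSC
    so that aging (a slowdown) only widens the gap."""
    candidates = [f for f in ref_freqs_hz if f > f_stress_hz]
    return min(candidates) if candidates else None

# Hypothetical reference bank spanning the stressed-array distribution.
refs = [0.97e9, 0.99e9, 1.01e9, 1.03e9]
stressed = [0.962e9, 0.985e9, 1.004e9, 1.035e9]

for f in stressed:
    ref = pick_reference(f, refs)
    label = "no usable reference" if ref is None else f"{ref / 1e9:.2f} GHz"
    print(f"stressed {f / 1e9:.3f} GHz -> reference {label}")
```

A stressed ROSC that is faster than every reference has no usable pairing under this rule, which is why the reference bank needs to cover the whole frequency distribution of the stressed array.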

## 65nm Test Chip Data



- Fresh and post-stress ROSC frequency PDFs
- No significant correlation of the frequency shift with fresh frequency

## SRAM Memory Design Challenges at Low Supply Voltages



- Ratio-ed operation leads to poor noise margin at low voltages for 6T SRAM cells
- Conflicting requirements: a stronger access transistor improves write margin but worsens read margin

## Impact of BTI on SRAM Read

Read: cell recovers on a fail



## Impact of BTI on SRAM Write

Write: cell recovers on a pass



#### **Representative SRAM Reliability Macro**



- Represents a product SRAM sub-array
- BIST function done by on-chip FSM with supply switches

P. Jain, et al., IEDM, 2012

#### Aging Monitor in IBM Microprocessors

Pong-Fei Lu, Keith Jenkins, IBM, IRPS 2013



- Implemented on IBM's z196 Enterprise systems to monitor long-term degradation under real-use conditions
- Over 500 days' worth of ring oscillator degradation data from customer systems
- Other companies have aging monitors too, but they tend not to publish their work



#### Aging Monitor in IBM Microprocessors

Pong-Fei Lu, Keith Jenkins, IBM, IRPS 2013



- Time-zero problem: Some time will elapse between applying voltage (burn-in, test, operation) and making the first measurement → time-zero frequency is completely unknown → incorrect time slope of 0.42
- Use fitting parameters assuming $\Delta f = A(t-t_0)^n At^n$ → time slope of 0.172
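The time-zero problem can be illustrated with synthetic data. The sketch below assumes the true shift follows A·tⁿ from first power-on and that the first sample is taken after an unknown delay `t0`, so only the shift relative to that first sample is observed. The values of `A` and `t0` are hypothetical, and `n_true` is set to the slide's corrected slope purely for illustration.

```python
import math

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) versus log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den

A, n_true, t0 = 1.0, 0.172, 30.0                # hypothetical; t0 in days
elapsed = [float(d) for d in range(7, 500, 7)]  # weekly samples after the first one

# Observed shift is relative to the first sample, not to the fresh device.
observed = [A * ((t0 + d) ** n_true - t0 ** n_true) for d in elapsed]

# Naive fit: treat the first sample as time zero.
naive = loglog_slope(elapsed, observed)

# Corrected fit: restore the time offset and the shift accumulated before t0.
corrected = loglog_slope([t0 + d for d in elapsed],
                         [y + A * t0 ** n_true for y in observed])

print(f"naive slope     : {naive:.3f}")
print(f"corrected slope : {corrected:.3f}")
```

In this synthetic case the naive fit overestimates the time exponent, while accounting for the offset recovers the assumed value. In practice `t0` and `A` are unknown and would have to be fitting parameters.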

#### Aging Monitor in IBM Microprocessors

Pong-Fei Lu, Keith Jenkins, IBM, IRPS 2013

| Design<br>Considerations               | Examples of Practical Issues                                                       | Aging Sensor Implementation<br>in IBM z196 Server [3]                                     |
|----------------------------------------|------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------|
| Type of Sensor                         | BTI, HCI, TDDB, RTN, transient errors, memory bit failures, etc.                   | Ring Oscillator based BTI monitor for long-<br>term frequency degradation measurement     |
| Temporal Granularity                   | Sensing period, threshold setting,<br>dynamic range, etc.                          | Sampling period: once a week                                                              |
| Spatial Granularity                    | Per CPU/GPU/memory, per functional unit, per sub-block, etc.                       | Total: 5 sensors per chip; One sensor per core (x4 cores) plus one sensor in L2 cache     |
| Stress and<br>Measurement<br>Condition | AC vs. DC, accelerated vs. usage condition, fast measurement                       | AC stress, usage condition,<br>0.5ms measurement time                                     |
| Communication                          | Between data gathering sensor,<br>across sensors, between sensors<br>and processor | Sensors are integrated with IBM z196<br>pervasive infrastructure with firmware<br>support |
| Interface and Protocol                 | Interrupt based, polling, event alarms, performance counter based, etc.            | Interrupt based in-field frequency degradation measurement                                |
| Testing and<br>Calibration             | Similar to any other on-chip monitor circuit                                       | Time 0 frequency shift unknown since first sample is taken after some stress              |

# Outline

- Device Reliability Issues
- Monitors and Measurements
- Effects in NTV Processors
- Summary

## **DVFS Systems in ISSCC 2014**



- 22nm Intel Haswell processor: N. Kurd, *et al.*, ISSCC, 2014
- 22nm IBM POWER8 processor: Z. Toprak-Deniz, *et al.*, ISSCC, 2014

 Latest trends: On-chip distributed VRM (fast transients, supply noise suppression), per-core DVS, NTV/Turbo



- Constant V<sub>DD</sub>: Frequency degrades with stress
- High V<sub>DD</sub> to low V<sub>DD</sub>: Freq. dips due to lower V<sub>DD</sub> followed by recovery
- Low V<sub>DD</sub> to high V<sub>DD</sub>: Freq. jumps and then degrades




#### **Modeling Approach using Superposition**



C. Zhou, et al., IRPS 2014

#### **BTI Recovery Model using Superposition**



- Stress model: t<sup>n</sup> (power law)
- Recovery model derived from superposition property: ΔV<sub>T,recovery</sub>(t) = t<sup>n</sup>-(t-t<sub>0</sub>)<sup>n</sup>
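The stress and recovery expressions can be combined into one piecewise function. The sketch below is a minimal illustration for a single stress interval ending at `t_stress_end` (the slide's t<sub>0</sub>); the prefactor `a` and the default exponent `n = 0.2` are hypothetical placeholders, not fitted values.

```python
def bti_shift(t, t_stress_end, n=0.2, a=1.0):
    """Threshold-voltage shift (arbitrary units) for one stress interval.
    Stress phase   (t <= t_stress_end): a * t**n
    Recovery phase (t >  t_stress_end): a * (t**n - (t - t_stress_end)**n)"""
    if t < 0:
        raise ValueError("time must be non-negative")
    if t <= t_stress_end:
        return a * t ** n
    return a * (t ** n - (t - t_stress_end) ** n)

t_stress_end = 1.0
for t in (0.5, 1.0, 1.5, 2.0, 10.0):
    phase = "stress" if t <= t_stress_end else "recovery"
    print(f"t = {t:5.1f}  {phase:8s}  shift = {bti_shift(t, t_stress_end):.4f}")
```

The shift rises as tⁿ while stress is applied, then falls once the stress is removed, because the subtracted term grows faster than the stress term at that point. The two branches agree at t = t<sub>0</sub>, so the curve is continuous.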

## Translating $V_T$ Shift to Delay Shift



**Pull-down Delay** 

**Pull-up Delay** 

### Android Development Board for Collecting DVFS Traces

| Processor          | **ARM Cortex A15**      |
|--------------------|-------------------------|
| System             | Samsung Exynos 5410 SoC |
| Process            | 28nm                    |
| Frequency          | 0.8 – 1.8 GHz           |
| Voltage            | 0.9 – 1.25 V            |
| DVFS meas.         | National Instr. DAQ     |
| Sampling frequency | 1000 samples per second |
| Linux kernel       | v 3.4.5                 |
| Operating system   | Android v 4.2.2         |

Board photo labels: Exynos 5410 SoC, sense resistor

- V<sub>DD</sub> and operating frequency collected in real time
- Navigating websites, running benchmark applications

#### Sample Waveform and Estimated Frequency Shift



- High V<sub>DD</sub> duration: Freq. degrades with time
- Low V<sub>DD</sub> duration: Freq. shift dips and then recovers

#### **Applying Model to Other DVFS Traces**



- Worst case frequency dip
  - 3D-raytrace: Δf=1.0% at t=6s when V<sub>DD</sub> drops by 29% after staying in high V<sub>DD</sub> mode for 5.8s

# Summary

- Power wall (2000) → Variability wall (2010) → Reliability wall (2020)
  - Example: NTV + RDF + BTI
- Aging sensor deployed for the first time in a commercial processor (IBM z systems)
- Per-Core DVFS with sub-microsecond ramp time becoming a standard feature in new processors
- Turbo boost + NTV: Best of both worlds in terms of power and performance, but presents new reliability challenges