## Fiber-to-the-processor and other challenges for photonics in future systems

A.F.J. Levi

http://www.usc.edu/alevi

with contributions from

Bindu Madhavan – USC and Agilent Technologies

Stanford, April 21, 2005

## What is a system?

#### VSR interconnect

- Understand electronics in systems
  - Definition of system
    - Complex enough to require system area network
      - Multi-processor rack-based system, router, data center, telephone switch, automobile etc., are systems
      - Cell-phone, telephone handset, camera, pocket calculator, etc., are not complex enough to be systems
  - Chip IO performance
  - Backplane performance
- Chassis systems composed of passive backplane with connectors for linecards
  - Backplane supplies power to linecards
  - Connectors are interconnected by traces in backplane
- Chassis systems have slots for linecards that plug into backplane at connectors
- Total chip-to-chip interconnect length up to 1 meter
- Interconnect loss is a tradeoff between
  - Cost: improved line characteristics using costlier dielectric materials, blind-via techniques, and counterboring of backplane press-fit connector vias
  - Density: reduced signal density at the linecard-backplane interface allows for cheaper PCB manufacturing options





# System interconnect hierarchy and advanced optical solutions



#### Parallel optical interconnect products emerge from DARPA funded POLO – PONI – MAUI programs

POLO-PONI-MAUI

- POLO (1994 – 1997)
- PONI (1997 – 2000): inspired products for 10 m – 600 m interconnect lengths
  - Agilent, Zarlink, Picolight, Gore, Emcore, Paracer, E20, Silicon Light Machines, Cielo
  - Agilent announced 12 x 3.3 Gb/s = 40 Gb/s in November 2000; full production November 2001
  - Customers: Nortel, Cisco, IBM
  - 12 x 10 Gb/s = 120 Gb/s demonstrated 2003
- MAUI (2002 – present): **combination of VCSEL WDM and parallel fiber optic technology** for FTTP 1 m – 100 m interconnect length applications
  - 240 Gb/s, < 1 W demonstrated 2004
  - 8 mm x 6 mm PMOSA, 240 – 1000 Gb/s, < 1 W

Figure labels: timeline 1995, 2000, 2004; module cross-section showing VCSELs / PINs, guide pin, passives, optics, silicon IC, flex circuit, metal base.

#### **Parallel optics and CMOS integration**

POLO

#### HP experimental JetStream ring network 1 Gb/s Tx 1 Gb/s Rx

Afterburner

JetStream





← 144 mm →



July 1995

Point-to-point host interface for parallel optics 16 Gb/s Tx 16 Gb/s Rx





October 1997

Ring network for parallel optics integrated in single CMOS IC 20 Gb/s Tx 20 Gb/s Rx 20× JetStream on a chip



Link Adapter Chip for parallel fiber-optic ring network

- 400,000 transistors, includes ring MAC
- 10.2 mm x 7.3 mm in 0.5 $\mu m$ CMOS
- Tape-out 8.17.00, received 11.10.00

December 2000

## New markets for optical interconnects: Solving the electronics interconnect and packaging mess!

FTTP



The SAN The memory access bottleneck



- Integration trend places multi-processors on single chip
  - Chip multi-processor (CMP) from Broadcom (SiByte BCM1250)
- Main memory likely to remain separate in most systems
  - 10 nm CMOS circuits have 100M transistors/mm<sup>2</sup>
    - 6 transistors per bit in SRAM $\rightarrow$ 16 Mb = 2 MB/mm<sup>2</sup> or 200 MB/cm<sup>2</sup>
    - 1 transistor per bit in DRAM $\rightarrow$ 100 Mb = 12 MB/mm<sup>2</sup> or 1.2 GB/cm<sup>2</sup>
      - Might be useful for single-chip notebook computer or make an interesting L2 cache for a CMP
- Multiple processor boards in chassis systems are connected by switches
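
The SRAM and DRAM density figures follow from dividing the transistor budget by the transistors needed per bit. A minimal Python sketch of that arithmetic is given here; the function name is illustrative, and the calculation assumes every transistor stores data (no decoders, sense amplifiers, or wiring overhead).

```python
def memory_density_per_mm2(transistors_per_mm2, transistors_per_bit):
    """Return (megabits, megabytes) of storage per mm^2.

    Assumption: all transistors are storage cells, so this is an upper bound.
    """
    bits = transistors_per_mm2 / transistors_per_bit
    megabits = bits / 1e6
    return megabits, megabits / 8.0


TRANSISTORS_PER_MM2 = 100e6  # 10 nm CMOS figure quoted in the list above

sram_mbit, sram_mbyte = memory_density_per_mm2(TRANSISTORS_PER_MM2, 6)
dram_mbit, dram_mbyte = memory_density_per_mm2(TRANSISTORS_PER_MM2, 1)

print(f"SRAM: {sram_mbit:.1f} Mb/mm^2 = {sram_mbyte:.1f} MB/mm^2")
print(f"DRAM: {dram_mbit:.1f} Mb/mm^2 = {dram_mbyte:.1f} MB/mm^2")
```

The unrounded results (about 16.7 Mb and 12.5 MB per mm<sup>2</sup>) are slightly above the rounded values quoted in the bullets.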

## 1U (1.75") thick 20-port GbE switch/router for chassis servers (2001)

#### System example

- 96 W, hot-swappable 20-port GbE router
  - 15.5" x 5.35"
  - ~2300 components
  - ~7000 nets, ~11000 pins
  - Electrical and optical **GbE IO**
    - 8 GbE optical links
    - 8 GbE backplane links
    - 4 GbE Cat-5 links



Management Microprocessor and support circuitry



Quad 8-port, mesh-connected GbE Switch ICs with 20 external ports

SERDES + dual quad-channel MMF optical modules

## Integration and packing driven processor crisis: The case for fiber-to-the-processor (FTTP)

System level issues

#### Electronics fails to deliver

- Power crisis: projected kW CPU not viable
- Processor crisis driving multi-core processor design with increased IO demand and only a fraction of transistors being active at any one time
  - Intel moves to CMP and Pentium IV uni-processor development terminated - 2005
- **Bandwidth density and latency crisis** 
  - increasing mismatch between memory bus bandwidth and CPU
  - many CPU cycles wasted after cache miss
- Signal integrity crisis
  - EMI, reflections, crosstalk, device noise may lead the way to optical interconnects
  - high-speed electrical signaling not reliable
    - \$400M i820 memory translator hub recall because of electrical noise - 5.10.00
    - 1.13 GHz PIII recall because of electrical noise in circuit element - 8.28.00
- Fiber-to-the-processor is a new design point
  - Less power, less power density in distributed system using WDM SAN
  - Better signal integrity, optical isolation
  - More bandwidth density gives reduced latency in node and SAN
  - **Removes electrical backplane bottleneck for future multi-processor systems**



### **Optical interconnects and the memory access bottleneck**

FTTP



## FTTP: A new architecture enabled by optical interconnects and high-performance CMOS integration



#### **Example latency estimate**



## System impact of increased available bandwidth: Reduced message latency and improved scaling

$$k = \sqrt[n]{N}$$

$$D = n \cdot \frac{k}{4}$$

$$n = \frac{\text{Ports}}{2}$$



- The 4-SAN ports can be used to design a 2-D torus with  $N = k^2$ processors (n = 2, N = [16, 64, 256, 1024])
- Message latency is

$$t_{\text{message latency}} = \frac{k}{2} (t_r + t_s + t_w) + \frac{L}{BW}$$

- For a 32-processor network
  - A 32 GB/s, 4-port switch achieves × 1.5 better no-load average message latency than a 20 GB/s, 6-port switch
    - (× 1.36 better no-load average message latency for 2048 processors)

- Bisection bandwidth and message latency for a k-ary n-cube network
  - A network with n dimensions and k nodes per dimension

$$BW_{bisection} = 2 \cdot BW_{port} \cdot k^{n-1}$$

$$t_{\text{message latency}} = D \cdot (t_r + t_s + t_w) + \frac{L}{BW}$$

- Where
  - $N$ $\rightarrow$ Total number of nodes
  - $k$ $\rightarrow$ Number of nodes in each dimension
  - $n$ $\rightarrow$ Number of dimensions
  - $D$ $\rightarrow$ Average distance between any pair of nodes
  - $t_r$ $\rightarrow$ Time to make routing decision (10 cycles, < 20 ns)
  - $t_s$ $\rightarrow$ The delay through switch (6 cycles, < 20 ns)
  - $t_w$ $\rightarrow$ The interconnection delay (1.0 m hop length)
  - $BW$ $\rightarrow$ Bandwidth of each port = B × W, where B is the bandwidth of each line and W is the port width
  - $L$ $\rightarrow$ Packet length (1 kB)
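
The bisection-bandwidth and no-load latency expressions can be evaluated directly. The sketch below is illustrative rather than the model used for the quoted results: it takes the average distance as D = n·k/4 (the form that reduces to the k/2 factor of the 2-D torus expression), and the function name, the 5 ns hop delay (roughly the fiber propagation time over 1 m), the 1024-byte packet, and the per-port bandwidth in the example are assumptions.

```python
def torus_metrics(num_nodes, ports, port_bw_bytes_per_s, packet_bytes,
                  t_r, t_s, t_w):
    """No-load metrics for a k-ary n-cube (times in seconds).

    Assumes n = ports / 2, k = N**(1/n), and average distance D = n*k/4.
    """
    n = ports // 2
    k = num_nodes ** (1.0 / n)          # may be non-integer for arbitrary N
    avg_hops = n * k / 4.0
    bisection_bw = 2.0 * port_bw_bytes_per_s * k ** (n - 1)
    latency = (avg_hops * (t_r + t_s + t_w)
               + packet_bytes / port_bw_bytes_per_s)
    return {"n": n, "k": k, "D": avg_hops,
            "bisection_bw": bisection_bw, "latency_s": latency}


# Example: 64-node 2-D torus built from 4-port switches.
example = torus_metrics(
    num_nodes=64, ports=4,
    port_bw_bytes_per_s=32e9,   # assumption: 32 GB/s taken as per-port bandwidth
    packet_bytes=1024,          # assumption: 1 kB interpreted as 1024 bytes
    t_r=20e-9, t_s=20e-9,       # upper bounds quoted in the symbol list
    t_w=5e-9,                   # assumption: ~5 ns for a 1.0 m hop
)
print(example)
```

Because the routing and switch delays are quoted only as upper bounds, the resulting latency is a pessimistic estimate under these assumptions.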



3-ary 2-cube (2-D torus)

3-ary 3-cube (3-D torus), wrap-around not shown

#### System impact of reduced cache miss



#### **Simulation assumptions**

- L1 hit rate 90% (based on third party test results)
  - http://www.aceshardware.com/Spades/read.php?article\_id=20000190
- L2 access latency 9 cycles (based on P4)
  - http://www.aceshardware.com/Spades/read.php?article\_id=20000190
- L3 access latency 20 cycles (based on Merced)
  - http://www.geek.com/procspec/features/itanium/index.htm
  - Assume 96% of memory accesses are satisfied by L1 and L2.
- 5.0 GHz processor speed
- 1.3 cycles per instruction
  - Using Intel assumptions
    - http://developer.intel.com/design/pentium4/manuals/248966.htm
    - Each instruction is sub-divided into micro-ops during execution

Impact of memory access bandwidth on cache hit rate not taken into account

- Improved BW improves hit rate because of reduced prefetch distance

Performance of FTTP with only L2 cache and 96% cache hit rate is equal to RAMBUS with L2 and L3 with 99.3% cache hit rate

- Adding an L3 cache to hide memory access latency does not outperform FTTP
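
To see why main-memory latency dominates once the caches miss, a simple average-access-time model can be sketched in Python. This is not the simulation behind the results above: the function name, the L1 latency, and the main-memory latency are hypothetical placeholders, and only the 90% L1 hit rate, the 96% combined L1+L2 rate, the 9-cycle L2 latency, and the 5.0 GHz clock come from the stated assumptions.

```python
def average_access_cycles(levels, memory_latency_cycles):
    """Average cycles per memory access.

    levels: list of (fraction_of_all_accesses_served, latency_cycles).
    Accesses not served by any cache level go to main memory.
    """
    served = sum(fraction for fraction, _ in levels)
    if not 0.0 <= served <= 1.0 + 1e-12:
        raise ValueError("served fractions must sum to at most 1")
    cache_part = sum(fraction * latency for fraction, latency in levels)
    return cache_part + max(0.0, 1.0 - served) * memory_latency_cycles


CLOCK_GHZ = 5.0         # processor speed from the simulation assumptions
L1_LATENCY = 2          # ASSUMPTION: L1 latency is not given in the source
MEMORY_LATENCY = 400    # ASSUMPTION: hypothetical main-memory latency (cycles)

# 90% of accesses hit L1; L1 and L2 together serve 96%, so L2 serves 6%.
two_level = [(0.90, L1_LATENCY), (0.06, 9)]
cycles = average_access_cycles(two_level, MEMORY_LATENCY)
print(f"{cycles:.2f} cycles = {cycles / CLOCK_GHZ:.2f} ns per access")
```

With these placeholder values the 4% of accesses that reach main memory account for most of the average, which is the effect a lower-latency memory port is meant to reduce.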

## Fiber-to-the processor: Exposing raw CPU performance

System level issues

- Single-chip multi-CPU module with integrated switch and optical system area network (SAN)
  - SoC internal bandwidth 10GHz×128×2×2=5.12Tb/s
- Main memory module with high-performance optical IO port
- All off-chip high-speed signals are optical
  - 1.28 Tb/s × 5 ports = 6.4 Tb/s SoC IO bisection bandwidth
    - RDMA ready
    - 1RU electrical backplane supports only two (2) SoC processors
    - Number of SoC processors using FTTP backplane determined by power dissipation
- All off-chip slow-speed signals are electrical (including electrical power)
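
The two bandwidth figures in this list are simple products. The Python sketch below reproduces them; the helper name is illustrative, the 10 GHz clock is treated as 10 Gb/s per line (an assumption), and the two factors of 2 are carried through as written since their meaning is not stated here.

```python
def product_tbps(gbps_per_line, *multipliers):
    """Multiply a per-line rate in Gb/s by each multiplier; return Tb/s."""
    total_gbps = float(gbps_per_line)
    for factor in multipliers:
        total_gbps *= factor
    return total_gbps / 1000.0


internal_tbps = product_tbps(10, 128, 2, 2)   # 10 GHz x 128 x 2 x 2
io_tbps = product_tbps(1280, 5)               # 1.28 Tb/s per port x 5 ports

print(f"SoC internal bandwidth: {internal_tbps:.2f} Tb/s")
print(f"SoC IO bandwidth:       {io_tbps:.2f} Tb/s")
```

Under these figures the optical IO bandwidth is of the same order as the on-chip bandwidth.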



## FTTP exposes raw CPU performance with multiple serial optical chip-to-chip interconnects



#### Flip-chip optical socket LGA concept



- Today at USC: 1.27 mm pitch FC-LGA, 40 x 40 mm<sup>2</sup>, 960-pin, Rogers 2800 dielectric, estimated price \$30 in 10k volume
- 212.5 µm center-to-center IC pad-pitch
  - Option 1: 6.5 x 6.5 mm<sup>2</sup> IC = 216 diff IO
  - Option 2: 5.0 x 5.0 mm<sup>2</sup> IC = 108 diff IO
- Package performance
  - -3 dB > 20 GHz, NEXT < -30 dB
  - Can be improved to -3 dB ~40 GHz, NEXT < -30 dB
- Easily modified to implement "optical socket" for fiber to the processor



- Package level optical interconnect for inter-chip optical buses
- 8mm x 5mm chip scale optical port is a prototype today
- Today: 0.48 Tb/s, <2W unidirectional fiber-optic port
- Future: >1 Tb/s, <1W unidirectional fiber-optic port
- Includes alignment pins for MT ferrule with 12-fiber ribbon

### A system architecture roadmap: The FTTP opportunity



#### 'Optics will not speed up memory access'

- said Howard Davidson, OIDA, October 21, 2004, Burlingame, CA.
- Actually only true for SMP and its current programming model, in which latency is dominated by global directory coherency
  - NUMA, which has local coherency, does not suffer from this problem but you have to change your software

Embracing myths as truths avoids the need to innovate



# Impact of decreasing CMOS device feature size on interconnect: 80 Gb/s serial IO

#### Scaling trends





#### Transistor scaling to 10 nm CMOS by 2016

- 100 M transistors/mm<sup>2</sup> (2 Intel Pentium-IV processors)
  - Scaling fails due to IO, on-chip wiring, and  $V_{\rm dd}$  ~ 0.8 V to give 10-60 W power dissipation
- 80 Gb/s IO based on PAM-4,  $f_T$  > 400 GHz and 400 mW
- High-speed IO pad-pitch improvement limited by crosstalk and package material properties
- 75  $\mu$ m pad diameter and 150  $\mu$ m pitch
- 36 bond-pads/mm<sup>2</sup>
- 9 differential pair IO/mm<sup>2</sup>
- 18 power and ground pads/mm<sup>2</sup>
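
The pad counts in this list are mutually consistent, as the Python sketch below shows. It assumes that only whole pads fit along each millimetre at 150 µm pitch; the function name is illustrative, and the aggregate IO density it prints is derived here from the 9 pairs/mm<sup>2</sup> and 80 Gb/s figures rather than quoted from the slide.

```python
import math


def pad_budget(pitch_um, diff_pairs_per_mm2, io_rate_gbps):
    """Return (pads/mm^2, signal pads, power+ground pads, Gb/s per mm^2)."""
    pads_per_mm = math.floor(1000 / pitch_um)   # whole pads along 1 mm
    pads_per_mm2 = pads_per_mm ** 2
    signal_pads = 2 * diff_pairs_per_mm2        # two pads per differential pair
    power_ground_pads = pads_per_mm2 - signal_pads
    io_density_gbps = diff_pairs_per_mm2 * io_rate_gbps
    return pads_per_mm2, signal_pads, power_ground_pads, io_density_gbps


pads, signal, power_ground, io_density = pad_budget(150, 9, 80)
print(f"{pads} pads/mm^2: {signal} signal, {power_ground} power/ground")
print(f"Derived IO density: {io_density} Gb/s per mm^2")
```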



Figure: IC IO density, 75 µm diameter pads on 150 µm pitch.

## **Challenges for electronics and photonics driven by CMOS scaling**

#### Electronics

**Computation** (Proc., Mem, Comm)

- 10 nm CMOS, $f_T$ > 400 GHz, < 10<sup>-18</sup> J switching energy
- 10 - 12 metal layers
- 100 transistors/µm<sup>2</sup> for random logic
- 500 transistors/µm<sup>2</sup> for SRAM cells
  - 0.0122 µm<sup>2</sup>/SRAM single-port cell
- 100M transistors/mm<sup>2</sup>
- 2 Pentium-IV/mm<sup>2</sup>
- 80 Gb/s IO (PAM-4 and $f_T$ > 400 GHz)
- Integration implies high power density ~ 10-60 W/mm<sup>2</sup>
- Assumes 110 °C junction temperature
- Si thermal conductivity κ = 1.5 W/cm °C
- Forces 10 mm<sup>2</sup> area (~ 1-6 W/mm<sup>2</sup>) for 100M transistor circuit in 10 nm CMOS (*or* liquid cooling)
- Distributed architecture on chip
- Benefit from large $f_T$ to reduce power and use high-speed serial IO to reduce packaging cost
- Remaining area for power regulation, RF-

**Communication** (Pkg trace, Connector)

- Controlled-impedance launch to package trace with S11 < -10 dB restricts flip-chip IO pitch on IC/Pkg to 150 µm pitch
- 9 differential IO/mm<sup>2</sup> suggests high-speed serial that also reduces backplane design effort
- Low-loss (< -3 dB), low-crosstalk (< -30 dB), dense IO electrical packages require
  - tan δ < 0.002
  - ε<sub>r</sub> < 2.5
- Via technology
  - High-aspect ratio, blind-via, tight pad overlap of via, relatively tight registration
- Low-loss tangent PCB dielectric (tan δ < 0.002)
- High density, *perfect* electrical backplane connector is required that is mechanically reliable, manufacturable, low-cost, low-NEXT, and impedance-matched at data rate

## **Challenges for electronics and photonics driven by Moore's Law CMOS scaling**

Photonics

| Photonics                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |                                |  |
**Computation** (Proc., Mem)

- Optical logic and memory not practical at present time
- Optical devices cannot match electronic feature size (100 transistors/µm² in 10 nm CMOS) and efficiency, or approach computational equivalence for digital processing
- Electronic interface to optical devices potentially limited by:
  - Bias voltage and current
  - Intimacy of integration requiring fan-in/fan-out of controlled-impedance lines
- Slow speed photonic devices!
  - ≤ 20 Gb/s digital modulation of laser diodes
- Message latency
  - 0.5 ns conversion latency
  - 64 B message per signal line: 25.6 ns optical, 6… electrical

**Communication** (Comm)

- Fiber optics superior to electrical interconnect on length scales ≥ 1 m, using metrics of signal loss, power dissipation and bandwidth
- Lower-power, higher-impedance lines can be used to interface electronics to optical devices. "*Optical PCB-trace*" required for intra-chassis interconnect
- Optical connector has superior form-factor (3× – 10×) compared to electrical connector
- Low-cost line-card to backplane version of parallel-optics connector needed to enable optical interconnect in chassis
- Conclude photonics useful for communication in systems but presently limited by *slow speed* photonic devices and incompatibility with PAM-4
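The message-latency figure above is serialization arithmetic: a 64 B message is 512 bits, and the time to clock it onto one signal line is the bit count divided by the line rate. The sketch below reproduces the 25.6 ns optical figure from the slide's 20 Gb/s laser modulation limit. The 80 Gb/s electrical rate is an assumption borrowed from the serial PAM-4 figure used elsewhere in this talk, and the function name is hypothetical.

```python
def serialization_time_ns(message_bytes: int, line_rate_gbps: float) -> float:
    """Time to clock a message onto one signal line, in ns.

    bits / (Gb/s) has units of ns, so no further scaling is needed.
    """
    return message_bytes * 8 / line_rate_gbps


MESSAGE_BYTES = 64            # from the slide
CONVERSION_LATENCY_NS = 0.5   # from the slide: electrical/optical conversion
OPTICAL_RATE_GBPS = 20.0      # from the slide: <= 20 Gb/s laser modulation
ELECTRICAL_RATE_GBPS = 80.0   # ASSUMPTION: serial PAM-4 rate, for comparison only

optical_ns = serialization_time_ns(MESSAGE_BYTES, OPTICAL_RATE_GBPS)
electrical_ns = serialization_time_ns(MESSAGE_BYTES, ELECTRICAL_RATE_GBPS)

print(f"optical serialization:    {optical_ns:.1f} ns")
print(f"optical incl. conversion: {optical_ns + CONVERSION_LATENCY_NS:.1f} ns")
print(f"electrical (assumed rate): {electrical_ns:.1f} ns")
```

At these rates the serialization time, not the 0.5 ns conversion latency, dominates the per-line message latency, which is why the slide flags slow photonic devices as the limit.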

## IO bandwidth example for 10 nm / 50 nm CMOS IC

#### CMOS IO



128-bit wide datapath at 2.5 Gb/s per line = 320 Gb/s (unidirectional)

### **40 GHz differential PCB via simulation test fixture**

RO4503\_diffvia2\_40GHz\_v4





- Parameterized Ansoft HFSS v9.1 test structure for 100-ohm differential microstrip-stripline transition
  - RO4503 ($\epsilon_r = 3.48$, $\tan\delta = 0.004$); copper traces ($5.8 \times 10^7$ S/m); surface roughness not considered; radiation boundaries on all sides
  - 7-mil wide trace, 8-mil space, 1.2-mil thick planes
    - Microstrip: 1.2-mil thick trace, 4-mil dielectric
    - Stripline: 0.7-mil thick trace, 16.7-mil dielectric
  - 100-mil microstrip, 100-mil stripline, 15.7-mil tall via, no via stub
- A number of geometrical parameters associated with the transition were varied to determine the best fit (minimum SDD11, maximum SDD21)
  - Ground plane opening (major and minor axes of ellipse), which affects spacing of guard vias
  - Relative spacing of trace vias, transition length to vias

## Six-via model, 33" model: 3 sections of microstrip (0.5") - stripline (10") - microstrip (0.5")



- 3 sections of microstrip (0.5") - stripline (10") - microstrip (0.5") transition, nominal 100-ohm differential structures
  - Ground-plane ellipse opening with axis ratio = 2 and major radius = 14 mil; via offset = 9 mil from the line of symmetry of the coupled-line structure; transition length = 20 mil
- Near-linear roll-off with no significant notches or ripples in SDD21/SDD12
- TxLine gives a trace loss alone of 35.91 dB at 40 GHz (30" × 1.1 dB/" + 3" × 0.97152 dB/")
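The 35.91 dB trace-loss figure is a length-weighted sum over the segments of the 33" model. A minimal sketch of that bookkeeping follows. Assigning 1.1 dB/" to stripline and 0.97152 dB/" to microstrip is inferred from the segment lengths (3 × 10" = 30" of stripline, 6 × 0.5" = 3" of microstrip) and is not stated explicitly on the slide; the function and constant names are hypothetical.

```python
def trace_loss_db(segments):
    """Sum loss over (length_inches, loss_db_per_inch) segments."""
    return sum(length * loss for length, loss in segments)


# Per-inch losses at 40 GHz, from the slide. The mapping to line type is an
# INFERENCE from the 30" / 3" split, not something the slide states.
STRIPLINE_DB_PER_INCH = 1.1
MICROSTRIP_DB_PER_INCH = 0.97152

# One section: microstrip (0.5") - stripline (10") - microstrip (0.5")
section = [
    (0.5, MICROSTRIP_DB_PER_INCH),
    (10.0, STRIPLINE_DB_PER_INCH),
    (0.5, MICROSTRIP_DB_PER_INCH),
]
model = section * 3  # three sections in series

total_length_in = sum(length for length, _ in model)
total_loss_db = trace_loss_db(model)

print(f"total length: {total_length_in:.1f} in")
print(f"trace loss at 40 GHz: {total_loss_db:.2f} dB")
```

This accounts for conductor and dielectric loss in the traces only; the via transitions add to it, which is what the full-wave simulation is used to quantify.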

## IC interconnect paradigm bifurcation: Optical interconnect insertion in intra-chassis communication



## Incompatible technology paths: Thin-fast electrical IO versus wide-slow optical IO

#### **Thin-fast electrical IO**



Serial electrical IO at chip boundary

- Electrical links need 20 dB+ of equalization at 28 GHz for 80 Gb/s serial PAM-4
- Power: 400 mW estimate per 80 Gb/s serial link in 10 nm CMOS
- Challenge: the PCB connector is the key enabler! Material loss must also be lowered so that low-cost electrical links can continue to be used power-efficiently



**Parallel IO at chip boundary** 

- Optics need 8×10 Gb/s or 4×20 Gb/s parallel fiber-optics or WDM
- Power: > 320 mW (8×40 mW) per 8×10 Gb/s parallel link
- Challenge: Per-lane CDR must be avoided and traces from IC to Tx/Rx electronics of optical module must be ~ 1 mm to be competitive in power with electrical; need high yields, thermal regulation and low-cost test
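The two power figures are easier to compare as energy per bit, since both options carry 80 Gb/s in aggregate. The sketch below does that conversion using only the numbers quoted on this slide; the function name is hypothetical, and the optical value is a lower bound because the slide gives "> 320 mW".

```python
def energy_per_bit_pj(power_mw: float, rate_gbps: float) -> float:
    """Energy per bit in pJ. mW / (Gb/s) has units of pJ/bit."""
    return power_mw / rate_gbps


# Thin-fast electrical: one 80 Gb/s serial PAM-4 link (slide estimate)
electrical_pj = energy_per_bit_pj(400.0, 80.0)

# Wide-slow optical: 8 lanes x 10 Gb/s at 40 mW per lane (slide lower bound)
LANES = 8
optical_pj_floor = energy_per_bit_pj(LANES * 40.0, LANES * 10.0)

print(f"electrical: {electrical_pj:.1f} pJ/bit")
print(f"optical:  > {optical_pj_floor:.1f} pJ/bit")
```

On these numbers the optical link has only a small margin over the electrical estimate, which is consistent with the slide's point that per-lane CDR and long IC-to-module traces would erase it.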

## **Slow-wide packaging solution**

#### The IBM way

- Large number of IO limited by size of pads and die
- Increase packaging complexity, cost, system integration
- Keep electrical interconnect by using relatively slow signaling rate
- ... and IBM Microelectronics failed to make money in past XQs



#### **Packaging roadmap**

#### **Roadmaps**

- Following the directions of roadmaps only makes sense if you can make money on the journey
  - Big companies have a vested interest in following the yellow brick road especially if they can exclude direct competition from using the same road
- However, if the road turns into a dirt track
  - Off-road technology can win and dinosaurs following the dirt track will die



The yellow brick road to the emerald city



When the road turns to dirt, the dinosaurs die

## Driving market force for photonics: Opening the 'fat' photonic pipe for global application-on-demand



Source: http://www.caspiannetworks.com/library/presentations/traffic/GEthernet.ppt

## Who is going to provide the components, modules, and system integration?

Where are the new devices going to come from?

#### Volume manufacture and component integration: The new path forward for fiber-optic system development

Volume production







Fiber-optic components and modules

Since the telecom meltdown, the technology base has moved from the US to the Pacific Rim (China) to remove labor cost from products.

Even with zero labor cost, components are *still* too expensive!
Need

- New high-volume markets (metro-FTTH, FTTP, automotive, ...)
- New cutting-edge technologies, which must be characterized by:
  - Ultra-low cost (small, light-weight, low-power, few sub-component parts, approach cost-of-materials)
  - High added value (e.g. integration of multiple functions)
  - High level of volume manufacturability (10M/month, true $6\sigma$)

#### A new platform based on

- Ultra-precise metal coining with nm tolerance
- Advanced photonic devices
- High levels of integration with CMOS electronics

#### Volume production with nano-scale precision. Example: The fiber-optic connector!


Fiber connector average selling price is too high (e.g. \$4 per installed plug in 2006, 500M units)

- Tolerance scale set by wavelength of light $\lambda_0 = 1550$ nm and mode diameter in fiber
  - SMF-28e lateral-displacement-induced loss (dB) $= 4.343\,(d/r)^2$, where $d$ = lateral offset and $r$ = mode field radius
- ± 300 nm typical finish tolerance on 2.5 mm diameter ferrule ($l / \Delta l = 8{,}333$)
- Volume production (> 10M/month, > 250/min) best if true $6\sigma$ or < 2 PPB failure rate, c.f. Motorola 'six sigma process' $\equiv 4.5\sigma$ or 3.4 PPM failure rate
- Assuming normal distribution, true $6\sigma$ requires better than $\sigma = 50$ nm tolerance

- New volume production nano-technology!
  - Production cost must approach cost-of-materials
  - Ultra low-cost, high-volume, precision fiber-optic manufacturing enables revolutionary wide-scale adoption of optics in systems
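The tolerance numbers on this slide can be checked with a few lines of arithmetic. The sketch below evaluates the slide's lateral-offset loss formula (the prefactor 4.343 equals $10\log_{10}e$, as expected for the overlap of two offset Gaussian modes), the ferrule aspect ratio, and the normal-distribution tail probabilities behind the "true 6σ" and Motorola figures. The 5.2 µm mode field radius is an assumption (roughly half the nominal SMF-28e mode field diameter at 1550 nm) and is not given on the slide; all function names are hypothetical.

```python
import math


def lateral_offset_loss_db(offset_nm: float, mode_field_radius_nm: float) -> float:
    """Loss in dB from the slide's formula: 4.343 * (d / r)**2."""
    return 4.343 * (offset_nm / mode_field_radius_nm) ** 2


def one_sided_tail(k_sigma: float) -> float:
    """Probability that a normal variable exceeds its mean by more than k sigma."""
    return 0.5 * math.erfc(k_sigma / math.sqrt(2.0))


FINISH_TOLERANCE_NM = 300.0             # from the slide: +/- 300 nm
FERRULE_DIAMETER_NM = 2.5e6             # from the slide: 2.5 mm
ASSUMED_MODE_FIELD_RADIUS_NM = 5200.0   # ASSUMPTION: not stated on the slide

aspect_ratio = FERRULE_DIAMETER_NM / FINISH_TOLERANCE_NM
sigma_needed_nm = FINISH_TOLERANCE_NM / 6.0  # +/- 300 nm must span +/- 6 sigma

# "True" six sigma: both tails beyond 6 sigma, with no mean shift
true_six_sigma_ppb = 2.0 * one_sided_tail(6.0) * 1e9
# Motorola convention: one tail at 4.5 sigma (6 sigma minus a 1.5 sigma shift)
motorola_ppm = one_sided_tail(4.5) * 1e6

loss_at_tolerance_db = lateral_offset_loss_db(
    FINISH_TOLERANCE_NM, ASSUMED_MODE_FIELD_RADIUS_NM
)

print(f"l / delta-l:           {aspect_ratio:,.0f}")
print(f"sigma for true 6-sigma: {sigma_needed_nm:.0f} nm")
print(f"true 6-sigma failures:  {true_six_sigma_ppb:.2f} PPB")
print(f"4.5-sigma failures:     {motorola_ppm:.2f} PPM")
print(f"loss at 300 nm offset:  {loss_at_tolerance_db:.4f} dB (assumed r)")
```

Under the assumed mode field radius, an offset at the full 300 nm tolerance costs only a few hundredths of a dB, so the manufacturing challenge is holding that tolerance at volume rather than the optical penalty at the limit.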



#### Stamping process is path to cost-of-materials manufacturing

**Precision stamping of SMF MT-RJ:** 

- Closed die process
- Small clearances between punch and punch holder
- The linear gauge reader is attached to the punch, and hydraulic pressure is monitored for future active tooling





#### New volume markets for optical interconnects: The automobile

- Mercedes-Benz S-class model year 2005 has a fiber-optic data bus backbone operating at Gb/s rates and, for the first time, using VCSELs (E-class and other models already use LED-based systems at ~5 Mb/s)
- Data carried includes several video channels, the entertainment channels, and all sensor data / telemetry
- Fiber beats copper!
  - > 30M fiber links in 2005, over 120M fiber links in 2010



Research and Technology

### Future needs for optical interconnects in multi-processor automobile systems

- > 12-fiber ribbon and multi-Gb/s/fiber
- Ultra-high reliability for real-time processing of drive-by-fiber data in multiprocessor embedded system environment
  - MOST protocol, physical layer standards
  - Aircraft as secondary market!

#### DAIMLERCHRYSLER


Future needs for robust high speed data networks



## Summary

- Development of a *perfect* electrical connector would be a significant technical barrier to optics penetrating ≤ 0.5 m interconnect length in systems
  - Electronic interconnect distance is collapsing to $\leq 0.5$ m
    - 1RU electrical bisection bandwidth limited to $\leq 18.7$ Tb/s

#### Challenge for optics is to be competitive with electronic solutions

- Opportunity to implement new architectures such as FTTP (8 Tb/s/SoC) that require optical interconnect inside the box
  - New optical devices
    - Optical electrical socket for FTTP
    - Optical electrical PCB, optical backplane connectors
    - New PAM-4 compatible optical components that directly interface to 80 Gb/s data bandwidth PAM-4 electrical signaling, *or* > 40 Gb/s VCSELs with $I_{th} < 0.5$ mA at 100 °C, $I_d < 2$ mA, $\eta > 0.5$
    - Cost-of-materials manufacturing
  - Complete optical solution for system designer
    - Standards for socket, PCB, connectors, testing
    - One-stop shopping
    - Multi-sourcing of components
    - Design tools that are transparent to system designer

#### Adapt and innovate or die!