

### The Coming Age of Extreme Heterogeneity

Jeffrey S. Vetter

With many contributions from FTG Group and Colleagues

ATPESC 2019 Chicago 30 Jul 2019

ORNL is managed by UT-Battelle, LLC for the US Department of Energy





http://ft.ornl.gov

vetter@computer.org



Alternate title

# Does 'A New Golden Age for Computer Architecture' equal 'Dark Ages for Software, Algorithms, and Applications?'

#### Jeffrey S. Vetter

With many, many contributions from workshop participants, FTG Group, and Colleagues




# Time for a short poll...



Q: Think back 10 years. How many of you would have predicted that many of our top HPC systems would be GPU-based architectures?

- a) Yes
- b) No
- c) Waffle 🙂



Q: Think forward 10 years. How many of you predict that most of our top HPC systems will have the following architectural features?

- a) X86 multicore CPU
- b) GPU
- c) FPGA/Reconfigurable processor
- d) Neuromorphic processor
- e) Deep learning processor
- f) Quantum processor
- g) RISC-V processor
- h) Some new unknown processor
- i) All/some of the above in one SoC



Q: Now imagine you are building a new application with ~3M LOC and 20 team members over the next 10 years. What on-node programming model/system do you use?

- a) C, C++, Fortran
- b) C++ templates, policies, etc. (e.g., AMP, Kokkos, RAJA)
- c) CUDA, cu\*\*\*, HIP
- d) OpenCL, SYCL
- e) OpenMP or OpenACC
- f) R, Python, Matlab, etc
- g) A Domain Specific Language (e.g., Claw, PySL)
- h) A Domain Specific Framework (e.g., PETSc)
- i) Some new unknown programming approach
- j) All/some of the above



# **Motivating Trends**



## **Contemporary devices are approaching fundamental limits**



Dennard scaling has already ended. Dennard observed that voltage and current should scale in proportion to the linear dimensions of a transistor, so each process generation that doubles the transistor count in the same area also yields roughly 40% faster circuits at roughly 50% less power per transistor.

R.H. Dennard, F.H. Gaensslen, V.L. Rideout, E. Bassous, and A.R. LeBlanc, "Design of ion-implanted MOSFET's with very small physical dimensions," *IEEE Journal of Solid-State Circuits*, 9(5):256-68, 1974.
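The 40% and 50% figures follow from the classical constant-field scaling rules. The sketch below is a minimal illustration, assuming an ideal shrink in which every linear dimension and the supply voltage are divided by the same factor `kappa`; the function name and dictionary keys are illustrative and do not come from the slides or the cited paper.

```python
import math


def dennard_scale(kappa: float) -> dict:
    """Ideal constant-field scaling factors for a linear shrink by kappa (> 1).

    Assumption: dimensions, voltage, and current all scale as 1/kappa.
    """
    density = kappa ** 2                  # device area shrinks as 1/kappa^2
    frequency = kappa                     # gate delay shrinks as 1/kappa
    power_per_transistor = 1 / kappa ** 2  # P = V * I, each scaled by 1/kappa
    return {
        "transistor_density": density,
        "frequency": frequency,
        "power_per_transistor": power_per_transistor,
        "power_density": density * power_per_transistor,
    }


# Doubling the transistor count in a fixed area means kappa = sqrt(2).
factors = dennard_scale(math.sqrt(2))
for name, value in factors.items():
    print(f"{name}: {value:.2f}x")
```

With `kappa = sqrt(2)` the frequency factor is about 1.41 (roughly 40% faster), power per transistor halves, and power density stays constant. The end of Dennard scaling means the last of these no longer holds in practice.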




Figure 1 | As a metal oxide-semiconductor field effect transistor (MOSFET) shrinks, the gate dielectric (yellow) thickness approaches several atoms (0.5 nm at the 22-nm technology node). Atomic spacing limits the



Figure 2 | As a MOSFET transistor shrinks, the shape of its electric field departs from basic rectilinear models, and the level curves become disconnected. Atomic-level manufacturing variations, especially for dopant

I.L. Markov, "Limits on fundamental limits to computation," Nature, 512(7513):147-54, 2014, doi:10.1038/nature13570.



- "Foundries' Sales Show Hard Times Continuing," Peter Clarke, 5/23/2016
- "Uncertainty Grows For 5nm, 3nm," Semiconductor Engineering
- "GlobalFoundries Forfeit 7nm Manufacturing," EE Times Asia
- "Intel's 10nm Is Broken, Delayed Until 2019"
- "GlobalFoundries Selling ASIC Business to Marvell," Dylan McGrath, 05.20.19
- "Another Step Toward the End of Moore's Law: Samsung and TSMC move to 5-nanometer manufacturing"
- "Samsung to Invest \$115 Billion in Foundry & Chip Businesses by 2030"

**Number of Foundries with a Cutting Edge Logic Fab**

| 180 nm | 130 nm | 90 nm | 65 nm | 45 nm/40 nm | 32 nm/28 nm | 22 nm/20 nm | 16 nm/14 nm | 10 nm | 7 nm | 5 nm |
|--------|--------|-------|-------|-------------|-------------|-------------|-------------|-------|------|------|
| SilTerra | | | | | | | | | | |
| X-FAB | | | | | | | | | | |
| Dongbu HiTek | | | | | | | | | | |
| ADI | ADI | | | | | | | | | |
| Atmel | Atmel | | | | | | | | | |
| Rohm | Rohm | | | | | | | | | |
| Sanyo | Sanyo | | | | | | | | | |
| Mitsubishi | Mitsubishi | | | | | | | | | |
| ON | ON | | | | | | | | | |
| Hitachi | Hitachi | | | | | | | | | |
| Cypress | Cypress | Cypress | | | | | | | | |
| Sony | Sony | Sony | | | | | | | | |
| Infineon | Infineon | Infineon | | | | | | | | |
| Sharp | Sharp | Sharp | | | | | | | | |
| Freescale | Freescale | Freescale | | | | | | | | |
| Renesas (NEC) | Renesas | Renesas | Renesas | Renesas | | | | | | |
| SMIC | SMIC | SMIC | SMIC | SMIC | | | | | | |
| Toshiba | Toshiba | Toshiba | Toshiba | Toshiba | | | | | | |
| Fujitsu | Fujitsu | Fujitsu | Fujitsu | Fujitsu | | | | | | |
| TI | TI | TI | TI | TI | | | | | | |
| Panasonic | Panasonic | Panasonic | Panasonic | Panasonic | Panasonic | | | | | |
| STMicroelectronics | STM | STM | STM | STM | STM | | | | | |
| UMC | UMC | UMC | UMC | UMC | UMC | | | | | |
| IBM | IBM | IBM | IBM | IBM | IBM | IBM | | | | |
| AMD | AMD | AMD | GlobalFoundries | GF | GF | GF | GF | | | |
| Samsung | Samsung | Samsung | Samsung | Samsung | Samsung | Samsung | Samsung | Samsung | Samsung | |
| TSMC | TSMC | TSMC | TSMC | TSMC | TSMC | TSMC | TSMC | TSMC | TSMC | |
| Intel | Intel | Intel | Intel | Intel | Intel | Intel | Intel | Intel | Intel | Future |

#### Business climate reflects this uncertainty, cost, complexity, consolidation

- "NVIDIA Buys Mellanox To Bring HPC Scaling To Data Centers," Kevin Krewell, Tirias Research
- "Hewlett Packard Enterprise to Acquire Supercomputer Pioneer Cray," nytimes.com: Hewlett Packard Enterprise will pay about \$1.4 billion to acquire Cray, which has designed some of the most powerful computer systems in use.
- Amazon is the latest major tech company, after Google and Apple, to design its own AI chips, in hopes of differentiating its products from those of rivals. That strategy has major ramifications for chip companies like Intel and Nvidia.
- ARM Holdings: 25 per cent of Britain's largest technology company placed into a new, Saudi-backed \$100bn investment fund.
- "SANDISK COMPLETES ACQUISITION OF FUSION"
- "Q1 Chip Sales Drop Among Largest on Record," Dylan McGrath, 05.01.19: Global chip sales sank by 15.5% sequentially in the first quarter, among the largest quarter-to-quarter declines for the industry in the last 35 years. Chip sales totaled \$96.8 billion in the first quarter, down from \$114.7 billion, according to the World Semiconductor Trade Statistics (WSTS) organization.


## Sixth Wave of Computing



http://www.kurzweilai.net/exponential-growth-of-computing



### Q: when was the field effect transistor patented?

### Lilienfeld patents field effect transistor, October 8, 1926

Jessica MacNeil, October 08, 2018

On this day in tech history, JE Lilienfeld filed a patent for a three-electrode structure using copper-sulfide semiconductor material, known today as a field-effect transistor.

Lilienfeld's patent for a "**method and apparatus for controlling electric currents**" was granted on January 28, 1930.

According to the patent, his invention was for controlling the flow of electric current between two terminals of an electrically conducting solid by establishing a third potential between the terminals, particularly for the amplification of oscillating currents like those in radio communication.







#### Optimize Software and Expose New Hierarchical Parallelism

- Redesign software to boost performance on upcoming architectures
- Exploit new levels of parallelism and efficient data movement

#### Architectural Specialization and Integration

- Use CMOS more effectively for specific workloads
- Integrate components to boost performance and eliminate inefficiencies
- Workload specific memory+storage system design

- Investigate new computational paradigms
  - Quantum
  - Neuromorphic
  - Advanced Digital
  - Emerging Memory Devices






# Quantum computing: Qubit design and fabrication have made recent progress but still face challenges

Science 354, 1091 (2016) - 2 December

#### A bit of the action

In the race to build a quantum computer, companies are pursuing many types of quantum bits, or qubits, each with its own strengths and weaknesses.



Note: Longevity is the record coherence time for a single qubit superposition state, logic success rate is the highest reported gate fidelity for logic operations on two qubits, and number entangled is the maximum number of qubits entangled and capable of performing two-qubit operations.

The National Academies of SCIENCES • ENGINEERING • MEDICINE

CONSENSUS STUDY REPORT

#### **QUANTUM COMPUTING** Progress and Prospects



FIGURE 7.4 An illustration of potential milestones of progress in quantum computing. The arrangement of milestones corresponds to the order in which the committee thinks they are likely to be achieved; however, it is possible that some will not be achieved, or that they will not be achieved in the order indicated.




# **Neuromorphic (Brain Inspired) Computing**



One of Intel's Nahuku boards, each of which contains 8 to 32 Intel Loihi neuromorphic chips, shown here interfaced to an Intel Arria 10 FPGA development kit. Intel's latest neuromorphic system, Pohoiki Beach, announced in July 2019, is made up of multiple Nahuku boards and contains 64 Loihi chips. (Credit: Tim Herman/Intel Corporation)

https://newsroom.intel.com/news/intels-pohoiki-beach-64-chip-neuromorphic-system-delivers-breakthrough-results-research-tests/


- SpiNNaker
- IBM TrueNorth
- BrainScaleS
- DANNA
- Others...



https://m-cacm.acm.org/news/201072-thefuture-of-microchips



## **Emerging Memory Devices**





### New devices: Carbon Nanotube Transistors and Circuits



https://cen.acs.org/materials/electronic-materials/Carbon-nanotube-computers-face-makebreak/97/i8



A wafer contains hundreds of tiny computer chips made from carbon nanotubes, which switch faster and more efficiently than transistors made from silicon. STANFORD ENGINEERING

#### Beyond silicon: \$1.5 billion U.S. program aims to spur new types of computer chips

By Robert F. Service | Jul. 24, 2018, 8:30 AM






# Pace of Architectural Specialization is Quickening

- Industry, lacking Moore's Law, will need to continue to differentiate products (to stay in business)
  - Use the same transistors differently to enhance performance
- Architectural design will become extremely important, critical
  - Dark Silicon
  - Address new parameters for benefits/curse of Moore's Law
- 50+ new companies focusing on hardware for Machine Learning



Intel's Nervana AI platform takes aim

At the forefront of the firm's AI ambitions is the Intel Nervana platform, which was announced on Thursday following Intel's acquisition of deep learning startup Nervana Systems earlier this year.

http://www.theinquirer.net/inquirer/news/2477796/intels-nerva ai-platform-takes-aim-at-nvidias-gpu-techology



D.E. Shaw, M.M. Deneroff, R.O. Dror et al., "Anton, a special-purpose machine for molecular dynamics simulation," Communications of the ACM, 51(7):91-7, 2008.



#### GOOGLE BUILT ITS VERY OWN CHIPS TO POWER ITS AI BOTS



GOOGLE HAS DESIGNED its own computer chip for driving deep neural networks, an AI technology that is reinventing the way Internet services operate.

This morning at Google I/O, the centerpiece of the company's year, CEO Sundar Pichai said that Google has designed an ASIC, or application-specific integrated circuit, that's specific to deep neural nets. These are networks of

http://www.wired.com/2016/05/google-tpu-custom-chips/



TOM SIMONITE BUSINESS 11.27.18 08:12 PM



Amazon Web Services CEO Andy Jassy speaks at an event in San Francisco in 2017. DAVID PAUL MORRIS/BLOOMBERG/GETTY IMAGES

BIG SOFTWARE COMPANIES don't just stick to software any more—they build computer chips. The latest proof comes from Amazon, which announced late Monday that its cloud computing division has created its own chips to power customers' websites and other services. The chips, dubbed Graviton, are built around the same technology that powers smartphones and tablets. That approach has been much discussed in the cloud industry but never



https://fossbytes.com/nvidia-volta-gddr6-2018/







### Analysis of Apple A-\* SoCs







### **Growing Open Source Hardware Movement Enables Rapid Chip Design**



- Ariane, PicoRV32, Piccolo, SCR1, Hummingbird, ...
- Codasip, Cortus, C-Sky, Nuclei, SiFive, Syntacore, ...

RISC-V Summit, 2018


# **DARPA ERI Programs Aiming for Agile (and Frequent) Chip Creation**





# Transition Period will be Disruptive – Opportunities and Pitfalls Abound

- New devices and architectures may not be hidden in traditional levels of abstraction
- Examples
  - A new type of CNT transistor may be completely hidden from higher levels
  - A new paradigm like quantum may require new architectures, programming models, and algorithmic approaches

| Layer       | Switch, 3D | NVM | Approximate | Neuro | Quantum |
|-------------|------------|-----|-------------|-------|---------|
| Application | 1          | 1   | 2           | 2     | 3       |
| Algorithm   | 1          | 1   | 2           | 3     | 3       |
| Language    | 1          | 2   | 2           | 3     | 3       |
| API         | 1          | 2   | 2           | 3     | 3       |
| Arch        | 1          | 2   | 2           | 3     | 3       |
| ISA         | 1          | 2   | 2           | 3     | 3       |
| Microarch   | 2          | 3   | 2           | 3     | 3       |
| FU          | 2          | 3   | 2           | 3     | 3       |
| Logic       | 3          | 3   | 2           | 3     | 3       |
| Device      | 3          | 3   | 2           | 3     | 3       |

Adapted from IEEE Rebooting Computing Chart
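The table can also be read programmatically: for each technology, which is the highest abstraction layer that feels the change? The sketch below is illustrative only. It assumes that a larger number means more disruption at that layer; the slide does not define the scale, and the function name `highest_affected_layer` and its `threshold` parameter are hypothetical.

```python
# Layers ordered from the top of the stack (Application) down to Device.
LAYERS = ["Application", "Algorithm", "Language", "API", "Arch",
          "ISA", "Microarch", "FU", "Logic", "Device"]

# Values copied column by column from the table above.
# ASSUMPTION: larger value = more disruption at that layer.
LEVELS = {
    "Switch, 3D":  [1, 1, 1, 1, 1, 1, 2, 2, 3, 3],
    "NVM":         [1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    "Approximate": [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
    "Neuro":       [2, 3, 3, 3, 3, 3, 3, 3, 3, 3],
    "Quantum":     [3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
}


def highest_affected_layer(tech, threshold=2):
    """Return the topmost layer whose level is at least `threshold`, or None."""
    for layer, level in zip(LAYERS, LEVELS[tech]):
        if level >= threshold:
            return layer
    return None


for tech in LEVELS:
    print(f"{tech}: change first visible at {highest_affected_layer(tech)}")
```

Under that assumption, a new switch or 3D device stays hidden until the microarchitecture layer, while quantum reaches all the way up to the application, which matches the two examples in the bullets above.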



# **Department of Energy (DOE) Roadmap to Exascale Systems**

An impressive, productive lineup of *accelerated node* systems supporting DOE's mission



#### U.S. Department of Energy and Cray to Deliver Record-Setting Frontier Supercomputer at ORNL

# Exascale system expected to be world's most powerful computer for science and innovation

Topic: Supercomputing

May 7, 2019



OAK RIDGE, Tenn., May 7, 2019—The U.S. Department of Energy today announced a contract with Cray Inc. to build the Frontier supercomputer at Oak Ridge National Laboratory, which is anticipated to debut in 2021 as the world's most powerful computer with a performance of greater than 1.5 exaflops.

Scheduled for delivery in 2021, Frontier will accelerate innovation in science and technology and maintain U.S. leadership in high-performance computing and artificial intelligence. The total contract award is valued at more than \$600 million for the system and technology development. The system will be based on Cray's new Shasta architecture and Slingshot interconnect and will feature high-performance AMD EPYC CPU and AMD Radeon Instinct GPU technology.

# Frontier | 7 May 2019

| Peak Performance     | >1.5 EF                                                                                                                                                                       |
|----------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Footprint            | > 100 cabinets                                                                                                                                                                |
| Node                 | 1 HPC and AI Optimized AMD EPYC CPU<br>4 Purpose Built AMD Radeon Instinct GPU                                                                                                |
| CPU-GPU Interconnect | AMD Infinity Fabric<br>Coherent memory across the node                                                                                                                        |
| System Interconnect  | Multiple Slingshot NICs providing 100 GB/s network bandwidth<br>Slingshot dragonfly network which provides adaptive routing, congestion management<br>and quality of service. |
| Storage              | 2-4x performance and capacity of Summit's I/O subsystem. Frontier will have near node storage like Summit.                                                                    |



### **ASCR Extreme Heterogeneity Workshop**

#### January 23-25, 2018 Virtual Meeting

- Goal: Identify Priority Research Directions for Computer Science needed to make future supercomputers usable, useful and secure for science applications in the 2025-2040 timeframe
  - Note that <u>quantum computing</u> was defined as out of scope by ASCR.
- Primary focus on the software stack and programming models/environments/tools
- 150+ participants: DOE labs, academia, and industry
- White papers solicited (106 received!) to contribute to the FSD, identify potential participants, and help refine the agenda
- First ASCR workshop to use Basic Research Needs format (BES inspired)
  - Summit, Summit report, Factual Status Document, whitepapers, BRN/PRD result
- Organizing Committee
  - Jeffrey Vetter (ORNL), Lead Organizer and Program Committee Chair
  - Ron Brightwell (Sandia-NM), Pat McCormick (LANL), Rob Ross (ANL), John Shalf (LBNL)
  - Lucy Nowell, ASCR Program Manager
- Program Committee Members
  - Katy Antypas (LBNL, NERSC), David Donofrio (LBNL), Maya Gokhale (LLNL), Travis Humble (ORNL), Catherine Schuman (ORNL), Brian Van Essen (LLNL), Shinjae Yoo (BNL)

https://orau.gov/exheterogeneity2018/ https://doi.org/10.2172/1473756





# **EH Priority Research Directions (PRDs)**

| Maintaining and improving programmer productivity | <ul> <li>Flexible, expressive, programming models and languages</li> <li>Intelligent, domain-aware compilers and tools</li> <li>Composition of disparate software components</li> </ul> |
|---|---|
| Managing resources<br>intelligently | <ul> <li>Automated methods using introspection and machine learning</li> <li>Optimize for performance, energy efficiency, and availability</li> </ul> |
| Modeling & predicting<br>performance | <ul> <li>Evaluate impact of potential system designs and application mappings</li> <li>Model-automated optimization of applications</li> </ul> |
| Enabling reproducible<br>science despite non-<br>determinism & asynchrony | <ul> <li>Methods for validation on non-deterministic architectures</li> <li>Detection and mitigation of pervasive faults and errors</li> </ul>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    
                                                                                                                             |
| Facilitating Data<br>Management, Analytics, and<br>Workflows              | <ul> <li>Mapping of science workflows to heterogeneous hardware and software services</li> <li>Adapting workflows and services to meet facility-level objectives through learning approaches</li> </ul>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           
                                                                                                                             |

# Future Technologies Group (FTG)

Jeffrey S. Vetter, Group Leader

The Future Technologies Group performs research in core technologies for emerging generations of high-end computing architectures, including prototype computer architectures and experimental software systems. We investigate these technologies with the goal of improving the performance, energy efficiency, reliability, and productivity of these architectures for our sponsors and applications teams. See <u>http://ft.ornl.gov</u>.





#### Key Technical Areas

- Heterogeneous architectures
- Deep memory hierarchies including non-volatile memory
- Performance measurement, analysis, simulation, and modeling of emerging architectures
- Programming systems to address emerging architectures

Mathematics Division

 Beyond Moore's Computing

#### Software Artifacts

- Scalable Heterogeneous Computing Benchmarks (SHOC)
- mpiP
- DESTINY
- Aspen
- OpenARC
- Papyrus
- NVL-C
- Oxbow
- LLVM Clacc and Parallel IR
- DRAGON
- RISC-V Extensions

#### Sponsors

- DOE ASCR, BER
- DOE Exascale Computing Project
- DOE SciDAC
- DARPA
- ORNL LDRD
- National Science Foundation
- Department of Defense
- NIH

#### Impact

- Publications in SC, ICS, HPDC, TPDS, DATE, PLDI, IPDPS, Trans VLSI, etc.
- Two Gordon Bell awards
- NSF Keeneland
- DOE Titan
- IEEE TCHPC Early Career
- IEEE Fellows
- ~100 interns
- ~130 FTG seminars





https://www.thebroadcastbridge.com/con

|             | TRL 1-3 Basic | TRL 4-6 Emerging | TRL 7-9 Operational |
|-------------|---------------|------------------|---------------------|
| Examples    | Carbon-nanotube computing, memristor-based neuromorphic computing, chip-level silicon photonics, universal quantum computing | FPGAs in HPC, TrueNorth, SpiNNaker, D-Wave, Emu, many SoC-based systems, TPU, Gen-Z NoCs, near-memory computing | Titan, Cori, Mira, Summit, BlueWaters, Keeneland, Stampede, Tsubame2.5 |
| System      | "Bench" system | Limited access testbed | Production |
| Programming | Assembly language, or less | Few, if any, development tools | Language support and compilers |
| OS-R        | Manual | Specialized programming environments and OSs | Commodity OS & runtime systems |
| Scale       | Small collections of devices | Single to hundreds of engineered processing elements | >10,000 processing elements |
| Performance | Analytical projections based on device empirical evaluation | Analytical projections or simulation based on component or pilot system empirical evaluation | Empirical evaluation of prototype and final systems |
| Apps        | Small encoded kernels | Architecture-aware algorithms; mini-apps; small applications | Numerical libraries; full-scale applications |

CS & Math Research: Evaluate, Select, and Improve Emerging Computing Technologies

# **ORNL ExCL Model**

https://excl.ornl.gov

- Provide low-level access to emerging computer architectures to encourage experimentation and prototyping of new hardware and software solutions.
- Not just testbeds, but staff and software environments to support this mode of operation.

### **ExCL Common Infrastructure**



### **ExCL** Technology Pillars



### Learn more about ExCL or Apply for Access

#### https://excl.ornl.gov

### **ORNL Experimental Computing Laboratory** (ExCL)

Pathfinding the future of computing

Welcome

Welcome to ExCL! We are excited to collaborate with users exploring emerging computing technologies.

The Experimental Computing Lab (ExCL) is a laboratory designed for computer science research. At a time when heterogeneity defines the path forward, this system offers heterogeneous resources that researchers can use in their work. The computational resources provided by ExCL comprise diverse technologies in terms of chips, memories, and storage. ExCL will also adapt to the ever-changing computing ecosystem, incorporating the latest technology and making it available to its users.

The system will support full configurability of the software stack. Users will be able to provision bare metal nodes and network interconnects to meet their computational requirements.

The Experimental Computing Lab will offer a mix of exclusive access nodes and shared nodes where users will be able to carry out their research. It follows a novel design that allows a high degree of flexibility for users and administrators to accommodate a wide range of experiments.

#### News

- ExCL announces availability of Intel Stratix 10 FPGA (16 April 2019)
- ExCL announces availability of Summit node for software development and benchmarking (12 April 2019)
- Call for proposals (12 April 2019)
- Revisiting the 2008 Exascale Computing Study at SC18 (1 December 2018)
- FTG participates in DARPA ERI Summit (25 July 2018)

#### Accessing ExCL

Thanks for your interest in ExCL. We provide access to researchers using the following criteria: 1/ the researcher can demonstrate a need for experimental computer science on ExCL resources, 2/ the researcher can show a sufficient level of competency with the target resource and privilege level, and 3/ ExCL staff has sufficient resources in terms of hardware and staff to satisfy the researcher's request.

To use ExCL, researchers need to have an approved project and an active account. The checklist below enumerates the steps for applying for access. We make project awards on at least a quarterly basis to industry, academia, laboratories, and others. Duration of projects is typically three or six months. Some systems have restrictions on access, such as the requirement for an NDA with the vendor, that we must navigate for each user, which may extend the time required for approval.

If you have questions or need assistance, please contact excl-help@ornl.gov.

#### https://excl.ornl.gov/accessing-excl/



The exclusive access nodes allow privileged bare metal access to the entire compute node to the

### **Take Away Messages**

1. Moore's Law as we know it is definitely ending, for either economic or technical reasons, by 2025
2. CMOS continues indefinitely: incredible technology!
   - Specialization: use the same transistors differently
   - Architecting effective solutions will be critical for industry, HPC
3. Parallelism, our area of expertise, will continue to be the major contributor to performance improvements in HPC and enterprise for the next decade
   - Interconnect and memory bandwidth and capacity will need to improve
4. Our community must aggressively explore <u>emerging technologies now!</u>
   - Some technologies will disrupt the entire stack
5. Tremendous opportunities and challenges in designing and deploying these new technologies with massive existing software
   - Many opportunities to provide new software frameworks for fundamental computer science problems: resource management, mapping, programming models, portability, algorithms, etc.
6. Start exploring these new technologies!
7. Talk to your colleagues in physics, chemistry, electrical engineering, math, etc.



# Recap

- Recent trends in computing paint an ambiguous future (for HPC and broader community)
  - Contemporary systems provide evidence that power constraints are driving architectures to change rapidly (e.g., Dennard, Moore)
  - Multiple architectural dimensions are being (dramatically) redesigned: Processors, node design, memory systems, I/O
- Major transition point for computing
  - New devices
  - New architectures
  - New programming systems
- Complexity and uncertainty are ubiquitous
- Programming systems must provide performance portability (in addition to functional portability)!!
- In the near term, the rate of change will accelerate and architectures will grow more diverse
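
One way to make the performance-portability point above concrete is a metric from the literature (proposed by Pennycook, Sewall, and Lee; it is not part of this talk): the harmonic mean of an application's efficiency over a set of platforms, defined as zero if the application does not run on one of them. The sketch below is a minimal illustration under that assumption; the platform names and efficiency values are hypothetical.

```python
from typing import Mapping, Optional


def performance_portability(efficiency: Mapping[str, Optional[float]]) -> float:
    """Harmonic mean of per-platform efficiencies (each in (0, 1]).

    `efficiency` maps a platform name to the fraction of achievable
    performance the application reaches there, or None if the
    application does not run on that platform. Returns 0.0 when any
    platform is unsupported.
    """
    if not efficiency:
        raise ValueError("need at least one platform")
    values = list(efficiency.values())
    # An unsupported platform makes the application non-portable.
    if any(e is None or e <= 0.0 for e in values):
        return 0.0
    return len(values) / sum(1.0 / e for e in values)


# Hypothetical measurements, for illustration only.
example = {"cpu": 0.8, "gpu": 0.4}
print(performance_portability(example))
```

Because the harmonic mean is dominated by the worst platform, a code that is fast on one architecture and poor on another scores low, which matches the intent of requiring performance portability and not just functional portability.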

#### Visit us

- We host interns and other visitors year round
  - Faculty, grad, undergrad, high school, industry
- Jobs in FTG
  - Postdoctoral Research Associate in Computer Science
  - Software Engineer
  - Computer Scientist
  - Visit <u>https://jobs.ornl.gov</u>
- Contact me <u>vetter@ornl.gov</u>



# Acknowledgements



- Contributors and Sponsors
  - Future Technologies Group: <u>http://ft.ornl.gov</u>
  - US Department of Energy Office of Science
    - Exascale Computing Project
    - DOE Vancouver Project: <u>https://ft.ornl.gov/trac/vancouver</u>
    - DOE Blackcomb Project: <u>https://ft.ornl.gov/trac/blackcomb</u>
    - SciDAC RAPIDS Project
  - US DARPA





# **Bonus Material**



### Memory Systems Started Diversifying Several Years Ago

- Architectures
  - HMC, HBM/2/3, LPDDR4, GDDR5X, etc
  - 2.5D, 3D Stacking
- Configurations
  - Unified memory
  - Scratchpads
  - Write through, write back, etc
  - Consistency and coherence protocols
  - Virtual v. Physical, paging strategies
- New devices
  - ReRAM, PCRAM, STT-MRAM, 3D-Xpoint
- Integrating compute and memory
  - PIM, CIM, In-mem





Copyright (c) 2014 Hiroshige Goto All rights reserved.

|                             | SRAM    | DRAM    | eDRAM   | 2D NAND<br>Flash | 3D NAND<br>Flash | PCRAM                             | STTRAM | 2D ReRAM                          | 3D ReRAM |
|-----------------------------|---------|---------|---------|------------------|------------------|-----------------------------------|--------|-----------------------------------|----------|
| Data Retention              | N       | N       | N       | Y                | Y                | Y                                 | Y      | Y                                 | Y        |
| Cell Size (F<sup>2</sup>)   | 50-200  | 4-6     | 19-26   | 2-5              | <1               | 4-10                              | 8-40   | 4                                 | <1       |
| Minimum F demonstrated (nm) | 14      | 25      | 22      | 16               | 64               | 20                                | 28     | 27                                | 24       |
| Read Time (ns)              | < 1     | 30      | 5       | 10<sup>4</sup>   | 10<sup>4</sup>   | 10-50                             | 3-10   | 10-50                             | 10-50    |
| Write Time (ns)             | < 1     | 50      | 5       | 10<sup>5</sup>   | 10<sup>5</sup>   | 100-300                           | 3-10   | 10-50                             | 10-50    |
| Number of Rewrites          | 10<sup>16</sup> | 10<sup>16</sup> | 10<sup>16</sup> |          |                  | 10<sup>8</sup>-10<sup>10</sup> | 10<sup>15</sup> | 10<sup>4</sup>-10<sup>12</sup> | 10<sup>8</sup>-10<sup>12</sup> |
| Read Power                  | Low     | Low     | Low     | High             | High             | Low                               | Medium | Medium                            | Medium   |
| Write Power                 | Low     | Low     | Low     | High             | High             | High                              | Medium | Medium                            | Medium   |
| Power (other than R/W)      | Leakage | Refresh | Refresh | None             | None             | None                              | None   | Sneak                             | Sneak    |
| Maturity                    |         |         |         |                  |                  |                                   |        |                                   |          |

J.S. Vetter and S. Mittal, "Opportunities for Nonvolatile Memory Systems in Extreme-Scale High Performance Computing," CiSE, 17(2):73-82, 2015.
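
To get a feel for the read and write times in the table, the sketch below expresses a few of the technologies as a slowdown (or speedup) relative to DRAM. The numbers are copied from the table; the dictionary layout and function name are illustrative assumptions, and each range is kept as a (low, high) pair with no typical value guessed.

```python
# Read/write times in nanoseconds, taken from the table above.
# Each entry is a (low, high) range; single values repeat the number.
LATENCY_NS = {
    "DRAM":          {"read": (30, 30),     "write": (50, 50)},
    "eDRAM":         {"read": (5, 5),       "write": (5, 5)},
    "3D NAND Flash": {"read": (1e4, 1e4),   "write": (1e5, 1e5)},
    "PCRAM":         {"read": (10, 50),     "write": (100, 300)},
    "STTRAM":        {"read": (3, 10),      "write": (3, 10)},
    "3D ReRAM":      {"read": (10, 50),     "write": (10, 50)},
}


def relative_to_dram(tech: str, op: str) -> tuple:
    """Return (low, high) latency of `tech` for `op` as a multiple of DRAM's.

    Values above 1 mean slower than DRAM, below 1 mean faster.
    """
    low, high = LATENCY_NS[tech][op]
    dram = LATENCY_NS["DRAM"][op][0]
    return (low / dram, high / dram)


for name in LATENCY_NS:
    print(name, relative_to_dram(name, "read"), relative_to_dram(name, "write"))
```

Read this way, the table shows the asymmetry that matters for system design: PCRAM reads are in the same range as DRAM, while its writes are two to six times slower.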



**Fig. 4.** (a) A typical 1T1R structure of RRAM with HfO<sub>x</sub>; (b) HR-TEM image of the TiN/Ti/HfO<sub>x</sub>/TiN stacked layer; the thickness of the HfO<sub>2</sub> is 20 nm.

H.S.P. Wong, H.Y. Lee, S. Yu et al., "Metal-oxide RRAM," Proceed



### **NVRAM Technology Continues to Improve – Driven by Broad Market Forces**



The forecasted total of \$102 billion for the overall semiconductor industry — including upgrades to existing wafer fab lines and brand new manufacturing facilities — would m
