1.
Haensch W, Raghunathan A, Roy K, Chakrabarti B, Phatak CM, Wang C, Guha S. Compute in-Memory with Non-Volatile Elements for Neural Networks: A Review from a Co-Design Perspective. Advanced Materials 2023; 35:e2204944. PMID: 36579797. DOI: 10.1002/adma.202204944. Received 2022-05-31; revised 2022-11-01; indexed 2023-06-17.
Abstract
Deep learning has become ubiquitous, touching daily lives across the globe. Today, traditional computer architectures are stressed to their limits in efficiently executing the growing complexity of data and models. Compute-in-memory (CIM) can potentially play an important role in developing efficient hardware solutions that reduce data movement between compute unit and memory, the source of the von Neumann bottleneck. At its heart is a crossbar architecture with nodal non-volatile-memory elements that performs an analog multiply-and-accumulate operation, enabling the matrix-vector multiplications used repeatedly in all neural network workloads. The memory materials can significantly influence final system-level characteristics and chip performance, including speed, power, and classification accuracy. With an overarching co-design viewpoint, this review assesses the use of crossbar-based CIM for neural networks, connecting the material properties and the associated design constraints and demands to application, architecture, and performance. Both digital and analog memory are considered, assessing the status for training and inference, and providing metrics for the collective set of properties non-volatile memory materials will need to demonstrate for a successful CIM technology.
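The crossbar operation the abstract describes can be illustrated numerically. The following is a minimal sketch, not taken from the paper: signed weights are mapped onto a differential pair of quantized conductances, input voltages drive the rows, and each column current accumulates the products (Ohm's law plus Kirchhoff's current law), yielding the matrix-vector product in one analog step. The conductance range and number of levels are illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch of a CIM crossbar MVM: each weight is stored as a
# pair of conductances (G+, G-); applying input voltages v on the rows
# makes column j collect current I_j = sum_i G[i, j] * v[i].

rng = np.random.default_rng(0)

G_MIN, G_MAX, LEVELS = 1e-6, 1e-4, 16  # assumed device parameters

def program_conductances(weights):
    """Map signed weights onto a differential pair of quantized
    conductances, as in typical differential CIM weight mappings."""
    scale = (G_MAX - G_MIN) / np.max(np.abs(weights))
    g_pos = np.where(weights > 0, weights, 0.0) * scale + G_MIN
    g_neg = np.where(weights < 0, -weights, 0.0) * scale + G_MIN
    # non-volatile devices offer only a finite number of levels
    step = (G_MAX - G_MIN) / (LEVELS - 1)
    quant = lambda g: G_MIN + np.round((g - G_MIN) / step) * step
    return quant(g_pos), quant(g_neg), scale

def crossbar_mvm(g_pos, g_neg, v):
    """Analog multiply-and-accumulate: the difference of the two
    column-current vectors gives the signed matrix-vector product."""
    return v @ g_pos - v @ g_neg

weights = rng.standard_normal((8, 4))
v = rng.standard_normal(8)
g_pos, g_neg, scale = program_conductances(weights)
y_analog = crossbar_mvm(g_pos, g_neg, v) / scale
y_exact = v @ weights
print(np.max(np.abs(y_analog - y_exact)))  # residual quantization error
```

The residual error here comes purely from conductance quantization; the review's point is that real devices add further nonidealities (drift, noise, nonlinearity) that co-design must absorb.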
Affiliation(s)
- Wilfried Haensch
- Materials Science Division, Argonne National Laboratory, Lemont, IL, 60439, USA
- Anand Raghunathan
- Department of Electrical Engineering, Purdue University, West Lafayette, IN, 47907, USA
- Kaushik Roy
- Department of Electrical Engineering, Purdue University, West Lafayette, IN, 47907, USA
- Bhaswar Chakrabarti
- Department of Electrical Engineering, Indian Institute of Technology Madras, Chennai, Tamil Nadu, 600036, India
- Charudatta M Phatak
- Materials Science Division, Argonne National Laboratory, Lemont, IL, 60439, USA
- Cheng Wang
- Department of Electrical Engineering, Purdue University, West Lafayette, IN, 47907, USA
- Supratik Guha
- Materials Science Division, Argonne National Laboratory, Lemont, IL, 60439, USA
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, IL, 60637, USA
2.
Rasch MJ, Mackin C, Le Gallo M, Chen A, Fasoli A, Odermatt F, Li N, Nandakumar SR, Narayanan P, Tsai H, Burr GW, Sebastian A, Narayanan V. Hardware-aware training for large-scale and diverse deep learning inference workloads using in-memory computing-based accelerators. Nat Commun 2023; 14:5282. PMID: 37648721. PMCID: PMC10469175. DOI: 10.1038/s41467-023-40770-4. Received 2023-02-16; accepted 2023-08-08; indexed 2023-09-01. Open access.
Abstract
Analog in-memory computing, a promising approach for energy-efficient acceleration of deep learning workloads, computes matrix-vector multiplications only approximately, due to nonidealities that are often nondeterministic or nonlinear. This can adversely impact the achievable inference accuracy. Here, we develop a hardware-aware retraining approach to systematically examine the accuracy of analog in-memory computing across multiple network topologies, and investigate sensitivity and robustness to a broad set of nonidealities. By introducing a realistic crossbar model, we improve significantly on earlier retraining approaches. We show that many larger-scale deep neural networks, including convnets, recurrent networks, and transformers, can in fact be successfully retrained to show iso-accuracy with the floating-point implementation. Our results further suggest that nonidealities that add noise to the inputs or outputs, not the weights, have the largest impact on accuracy, and that recurrent networks are particularly robust to all nonidealities.
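The core idea of hardware-aware retraining can be sketched with a toy model. This is an illustrative simplification, not the authors' implementation (which targets full deep networks on a detailed crossbar model): nonidealities are modeled as Gaussian perturbations on the programmed weights and on the MVM output, injected into every forward pass, so gradient descent settles on weights that remain accurate under those perturbations at inference time. The noise magnitudes and the linear-regression task are assumptions for illustration.

```python
import numpy as np

# Minimal sketch of noise-injection ("hardware-aware") training on a
# toy linear regression. Programming noise perturbs the weights and
# read-out noise perturbs the accumulated output in each forward pass.

rng = np.random.default_rng(1)

def noisy_forward(W, x, w_sigma=0.05, out_sigma=0.02):
    """One noisy analog MVM: multiplicative weight noise plus additive
    output noise, both redrawn on every call (nondeterministic)."""
    W_eff = W + w_sigma * np.abs(W) * rng.standard_normal(W.shape)
    y = x @ W_eff
    return y + out_sigma * rng.standard_normal(y.shape)

# toy regression target
W_true = rng.standard_normal((6, 3))
X = rng.standard_normal((256, 6))
Y = X @ W_true

W = np.zeros((6, 3))
lr = 0.05
for _ in range(500):
    y = noisy_forward(W, X)        # nonidealities injected here
    grad = X.T @ (y - Y) / len(X)  # gradient computed on noisy output
    W -= lr * grad

print(np.mean((X @ W - Y) ** 2))  # low error despite training under noise
```

Because the injected noise has zero mean, the expected gradient still points toward the noise-free optimum; what training under noise buys is weights whose accuracy degrades gracefully when the same perturbations reappear at inference, which is the effect the paper measures at network scale.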
Affiliation(s)
- Malte J Rasch
- IBM Research, TJ Watson Research Center, Yorktown Heights, NY, USA
- An Chen
- IBM Research Almaden, 650 Harry Road, San Jose, CA, USA
- Andrea Fasoli
- IBM Research Almaden, 650 Harry Road, San Jose, CA, USA
- Ning Li
- IBM Research, TJ Watson Research Center, Yorktown Heights, NY, USA
- Hsinyu Tsai
- IBM Research Almaden, 650 Harry Road, San Jose, CA, USA
- Vijay Narayanan
- IBM Research, TJ Watson Research Center, Yorktown Heights, NY, USA