Tutorial Proposal ISCA 2026

SCALE-Sim Across the Silicon Lifecycle:
Pre-Silicon Cycle-Accurate Simulation for Next-Gen AI Accelerators (SCALE-Sim v3) and Post-Silicon Evaluation on State-of-the-art AI Accelerators (SCALE-Sim TPU)

28 June 2026, Sunday Afternoon Session  ·  Half-Day

Abstract

Design and simulation tools are increasingly used across the AI landscape, enabling rapid design-space exploration (DSE) and performance/power analysis to meet the demands of modern AI workloads. Within this landscape, SCALE-Sim supports two tasks: designing next-generation AI accelerators through pre-silicon, cycle-accurate full-system simulation, and selecting AI models through post-silicon evaluation on state-of-the-art (SOTA) AI accelerators.

SCALE-Sim is widely used in industry (including Arm and IMEC) and in academia (including Stanford University), and its repository has more than 500 GitHub stars. In this tutorial, we will present two major updates to the SCALE-Sim infrastructure.

Tutorial Overview

SCALE-Sim v3 Overview
Figure: Overview of SCALE-Sim v3 highlighting the new features over SCALE-Sim v2
SCALE-Sim TPU Validation Overview
Figure: SCALE-Sim TPU post-silicon evaluation on Google TPU v4 and TPU v6e

SCALE-Sim v3: Pre-Silicon

A modular, cycle-accurate simulator that extends v2 with five significant enhancements:

  • Multi-core simulation with spatio-temporal partitioning
  • Support for sparse matrix multiplications (SpMM) with layer-wise and row-wise sparsity
  • Integration with Ramulator for detailed DRAM analysis
  • Precise on-chip data layout modeling
  • Energy and power estimation via Accelergy

SCALE-Sim TPU: Post-Silicon

Validated against measured on-device runtimes on Google TPU v4 and TPU v6e:

  • Regression analysis across multiple matrix-size regimes (<128, 128–1024, 1024–4096)
  • Achieves up to R² = 0.99 against measured TPU fusion-kernel runtimes
  • Supports Weight-Stationary (WS) and Input-Stationary (IS) dataflow modes
  • Reliable cycle-level performance estimates for GEMM workloads on TPU-class architectures
  • A timely resource for studying deployment on the upcoming Google TPU v7 (Ironwood)
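
As an illustration of the per-regime regression mentioned in the list above, the following minimal sketch fits a separate least-squares line from simulated cycle counts to measured runtimes for each matrix-size regime and reports R² for each fit. It is not the SCALE-Sim TPU implementation: the function names, the handling of the regime boundaries (128 and 1024 are assigned to the larger regime), and the sample values are all assumptions made for illustration, and the data points are synthetic placeholders rather than TPU measurements.

```python
def regime(dim):
    """Map a matrix dimension to a size regime.

    Assumption: the boundary values 128 and 1024 fall into the larger regime.
    """
    if dim < 128:
        return "<128"
    if dim < 1024:
        return "128-1024"
    if dim <= 4096:
        return "1024-4096"
    raise ValueError(f"dimension {dim} is outside the modelled regimes")


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys, slope, intercept):
    """Coefficient of determination of the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def fit_by_regime(samples):
    """samples: iterable of (matrix_dim, simulated_cycles, measured_runtime_us).

    Returns {regime_name: (slope, intercept, r2)}.
    """
    groups = {}
    for dim, cycles, runtime_us in samples:
        groups.setdefault(regime(dim), []).append((cycles, runtime_us))
    fits = {}
    for name, points in groups.items():
        xs = [p[0] for p in points]
        ys = [p[1] for p in points]
        slope, intercept = fit_line(xs, ys)
        fits[name] = (slope, intercept, r_squared(xs, ys, slope, intercept))
    return fits


# Synthetic placeholder data (hypothetical, NOT measured TPU runtimes).
samples = [
    (64, 100, 2.0), (96, 200, 3.0), (112, 300, 4.0),
    (256, 1000, 12.0), (512, 2000, 22.0), (768, 3000, 32.0),
]

for name, (slope, intercept, r2) in fit_by_regime(samples).items():
    print(f"{name}: runtime_us ~ {slope:.4f} * cycles + {intercept:.2f}, R^2 = {r2:.3f}")
```

Fitting each regime separately allows the slope and intercept to differ between small and large matrices, where fixed overheads and array utilization may affect runtime differently.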

Schedule

To be Updated

Prior Offerings

This is the first iteration of the SCALE-Sim tutorial to present SCALE-Sim v3 and SCALE-Sim TPU. Previous iterations used v2 of the simulator:

ASPLOS 2021 View Tutorial
ISCA 2021 View Tutorial

Organizers

Dr. Tushar Krishna
Associate Professor, School of ECE, Georgia Institute of Technology  ·  tushar@ece.gatech.edu
Ritik Raj
Ph.D. Student, ECE, Georgia Institute of Technology  ·  ritik.raj@gatech.edu
Jingtian Dang
Ph.D. Student, ECE, Georgia Institute of Technology  ·  dangjingtian@gatech.edu
Dr. Sarbartha Banerjee
Postdoctoral Fellow, Georgia Institute of Technology  ·  sbanerjee76@gatech.edu

Invited Presenters

Dr. Suvinay Subramanian
Computer Architect, Google  ·  suvinay@google.com
Dr. Ananda Samajdar
Research Staff Member, IBM T.J. Watson Research Center  ·  ananda.samajdar@ibm.com

Resources

Papers

  • J. Dang, R. Raj, C. Man, J. Tong, and T. Krishna. "SCALE-Sim TPU: Validating and Extending SCALE-Sim for TPUs." arXiv preprint arXiv:2603.22535, 2026.
  • R. Raj, S. Banerjee, N. Chandra, Z. Wan, J. Tong, A. Samajdar, and T. Krishna. "SCALE-Sim v3: A modular cycle-accurate systolic accelerator simulator for end-to-end system analysis." In 2025 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 186–200. IEEE, 2025.
  • A. Samajdar, J. M. Joseph, Y. Zhu, P. Whatmough, M. Mattina, and T. Krishna. "A systematic methodology for characterizing scalability of DNN accelerators using SCALE-Sim." In 2020 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 58–68. IEEE, 2020.

Source Code

View on GitHub