SCALE-Sim Across the Silicon Lifecycle:
Pre-Silicon Cycle-Accurate Simulation for Next-Gen AI Accelerators (SCALE-Sim v3)
and Post-Silicon Evaluation on State-of-the-art AI Accelerators (SCALE-Sim TPU)
Abstract
Design and simulation tools are increasingly central to AI hardware development, enabling rapid design-space exploration (DSE) and performance and power analysis to meet the demands of modern AI workloads. SCALE-Sim supports two stages of this lifecycle. Pre-silicon, it helps design next-gen AI accelerators through cycle-accurate full-system simulation. Post-silicon, it helps choose AI models through evaluation on state-of-the-art (SOTA) AI accelerators.
SCALE-Sim is widely used in industry (including Arm and IMEC) and in academia (including Stanford University), and its repository has more than 500 GitHub stars. In this tutorial, we will present two major updates to the SCALE-Sim infrastructure.
Tutorial Overview
SCALE-Sim v3: Pre-Silicon
A modular, cycle-accurate simulator that extends v2 with five significant enhancements:
- Multi-core simulation with spatio-temporal partitioning
- Support for sparse matrix multiplications (SpMM) with layer-wise and row-wise sparsity
- Integration with Ramulator for detailed DRAM analysis
- Precise on-chip data layout modeling
- Energy and power estimation via Accelergy
SCALE-Sim TPU: Post-Silicon
Validated against measured on-device runtimes on Google TPU v4 and TPU v6e:
- Regression analysis across multiple matrix-size regimes (<128, 128–1024, 1024–4096)
- Achieves up to R² = 0.99 against measured TPU fusion-kernel runtimes
- Supports Weight-Stationary (WS) and Input-Stationary (IS) dataflow modes
- Reliable cycle-level performance estimates for GEMM workloads on TPU-class architectures
- A timely resource for studying deployment on the upcoming Google TPU v7 (Ironwood)
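The per-regime regression described in the list above can be illustrated with a minimal sketch. It assumes a simple linear fit of measured runtime against simulated cycles within each matrix-size regime and reports R² for each fit. The function names, the choice of a one-variable linear model, the handling of regime boundaries, and the sample values are all hypothetical placeholders for illustration. They are not taken from SCALE-Sim TPU and are not measured results.

```python
from collections import defaultdict

# Assumed regime boundaries: a size on a boundary goes to the upper regime,
# and 4096 itself is included in the last regime.
REGIMES = [("<128", 0, 128), ("128-1024", 128, 1024), ("1024-4096", 1024, 4097)]


def regime_of(dim):
    """Return the name of the matrix-size regime that contains dim."""
    for name, lo, hi in REGIMES:
        if lo <= dim < hi:
            return name
    raise ValueError(f"matrix dimension {dim} is outside the modeled regimes")


def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def r_squared(xs, ys, a, b):
    """Coefficient of determination of the line y = a * x + b on (xs, ys)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def fit_per_regime(samples):
    """samples: iterable of (matrix_dim, simulated_cycles, measured_runtime).

    Returns {regime_name: (slope, intercept, r2)} with one fit per regime.
    """
    groups = defaultdict(list)
    for dim, cycles, measured in samples:
        groups[regime_of(dim)].append((cycles, measured))
    results = {}
    for name, points in groups.items():
        xs = [p[0] for p in points]
        ys = [p[1] for p in points]
        a, b = fit_line(xs, ys)
        results[name] = (a, b, r_squared(xs, ys, a, b))
    return results


if __name__ == "__main__":
    # Made-up placeholder samples: (matrix_dim, simulated_cycles, measured_us).
    demo = [
        (64, 100, 1.1), (96, 200, 1.9), (112, 300, 3.2),
        (256, 1000, 9.5), (512, 4000, 41.0), (768, 9000, 88.0),
    ]
    for name, (a, b, r2) in fit_per_regime(demo).items():
        print(f"{name}: runtime ~ {a:.4f} * cycles + {b:.3f}, R^2 = {r2:.3f}")
```

Fitting each regime separately lets the slope and intercept absorb effects that differ with matrix size, such as fixed launch overhead dominating small kernels. Whether a linear model is adequate for a given regime is something the R² value helps judge.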
Schedule
To be updated
Prior Offerings
This is the first iteration of the SCALE-Sim tutorial presenting SCALE-Sim v3 and SCALE-Sim TPU. Previous iterations using v2 of the simulator:
Organizers
Invited Presenters
Resources
Papers
- J. Dang, R. Raj, C. Man, J. Tong, and T. Krishna. "SCALE-Sim TPU: Validating and Extending SCALE-Sim for TPUs." arXiv preprint arXiv:2603.22535, 2026.
- R. Raj, S. Banerjee, N. Chandra, Z. Wan, J. Tong, A. Samajdar, and T. Krishna. "SCALE-Sim v3: A modular cycle-accurate systolic accelerator simulator for end-to-end system analysis." In 2025 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 186–200. IEEE, 2025.
- A. Samajdar, J. M. Joseph, Y. Zhu, P. Whatmough, M. Mattina, and T. Krishna. "A systematic methodology for characterizing scalability of DNN accelerators using SCALE-Sim." In 2020 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 58–68. IEEE, 2020.
Source Code
View on GitHub