ICS 2023: International Conference on Supercomputing

June 21-23, Orlando, FL

In Conjunction with the Federated Computing Research Conference

Day 1: Wednesday, June 21


Opening Remarks (2.00 pm - 2.20 pm)


Session 1 (Plenary): Best Papers (2.20 pm - 3.40 pm), Location: Magnolia 10-11


Chair: Bill Magro, Google

FAZ: A flexible auto-tuned modular error-bounded compression framework for scientific data Jinyang Liu University of California, Riverside, Sheng Di Argonne National Laboratory, Kai Zhao University of Alabama at Birmingham, Xin Liang University of Kentucky, Zizhong Chen University of California, Riverside, Franck Cappello Argonne National Laboratory.

Using Additive Modifications in LU Factorization Instead of Pivoting Neil Lindquist University of Tennessee, Piotr Luszczek University of Tennessee, Jack Dongarra University of Tennessee.

FLORIA: A Fast and Featherlight Approach for Predicting Cache Performance Jun Xiao Peking University, University of Amsterdam, Yaocheng Xiang Huawei, Peking University, Xiaolin Wang Peking University, Yingwei Luo Peking University, Andy Pimentel University of Amsterdam, Zhenlin Wang Michigan Tech.

FCRC Break (3.40 pm - 4.00 pm)


Session 2 (Plenary): Compilation and Scheduling (4.00 pm - 5.40 pm), Location: Magnolia 10-11


Chair: Chen Ding, University of Rochester

Transfer-learning-based Autotuning using Gaussian Copula Thomas Randall Clemson University, Jaehoon Koo Hanyang University, Brice Videau Argonne National Laboratory, Michael Kruse Argonne National Laboratory, Xingfu Wu Argonne National Laboratory, Paul Hovland Argonne National Laboratory, Mary Hall University of Utah, Rong Ge Clemson University, Prasanna Balaprakash Argonne National Laboratory.

Performance Embeddings: A Similarity-based Transfer Tuning Approach to Performance Optimization Lukas Trümper ETH Zurich, Tal Ben-Nun Lawrence Livermore National Laboratory, Philipp Schaad ETH Zurich, Alexandru Calotoiu ETH Zurich, Torsten Hoefler ETH Zurich.

CMLCompiler: A Unified Compiler for Classical Machine Learning Xu Wen Institute of Computing Technology, Chinese Academy of Sciences & University of Chinese Academy of Sciences, Wanling Gao Institute of Computing Technology, Chinese Academy of Sciences & University of Chinese Academy of Sciences, Anzheng Li Institute of Computing Technology, Chinese Academy of Sciences & University of Chinese Academy of Sciences, Lei Wang Institute of Computing Technology, Chinese Academy of Sciences & University of Chinese Academy of Sciences, Zihan Jiang Huawei Technologies Co., Ltd., Jianfeng Zhan Institute of Computing Technology, Chinese Academy of Sciences & University of Chinese Academy of Sciences.

PAC: Preference-Aware Co-location Scheduling on Heterogeneous NUMA Architectures To Improve Resource Utilization Pu Pang Shanghai Jiao Tong University, Yaoxuan Li Shanghai Jiao Tong University, Bo Liu Shanghai Jiao Tong University, Quan Chen Shanghai Jiao Tong University, Zhou Yu Shuhai Lab, Huawei Cloud Computing Technologies Co., Ltd, Zhibin Yu Shuhai Lab, Huawei Cloud Computing Technologies Co., Ltd, Deze Zeng China University of Geosciences, Jingwen Leng Shanghai Jiao Tong University, Jieru Zhao Shanghai Jiao Tong University, Minyi Guo Shanghai Jiao Tong University.

Day 2: Thursday, June 22


Session 3 (Plenary): Keynote Talk (8.45 am - 9.45 am), Location: Magnolia 10-11


Chair: Dimitrios S. Nikolopoulos, Virginia Tech

Title: HPC is dead, long live HPC! The future of HPC in a post-exascale era.
Speaker: William (Bill) Magro (Google).

Session 4A (Parallel): Tools and Libraries (9.45 am - 11.05 am), Location: Magnolia 10-11


Chair: Dingwen Tao, Indiana University

BiRFIA: Selective Binary Rewriting for Function Interception on ARM Kelun Lei Beihang University, Xin You Beihang University, Hailong Yang Beihang University, Zhongzhi Luan Beihang University, Depei Qian Beihang University.

Lightweight Huffman Coding for Efficient GPU Compression Milan Shah North Carolina State University, Xiaodong Yu Argonne National Laboratory, Sheng Di Argonne National Laboratory, Michela Becchi North Carolina State University, Franck Cappello Argonne National Laboratory.

Towards a Unified Implementation of GEMM in BLIS RuQing G. Xu The University of Tokyo, Field G. Van Zee The University of Texas at Austin, Robert A. van de Geijn The University of Texas at Austin.

Session 4B (Parallel): I/O and Storage (9.45 am - 11.05 am), Location: Magnolia 12


Chair: John Sopka

Use Only What You Need: Judicious Parallelism For File Transfers in High Performance Networks Md Arifuzzaman University of Nevada, Reno, Engin Arslan University of Nevada, Reno.

DStore: A Lightweight Scalable Learning Model Repository with Fine-Grain Tensor-Level Access Meghana Madhyastha Johns Hopkins University, Robert Underwood Argonne National Laboratory, Randal Burns Johns Hopkins University, Bogdan Nicolae Argonne National Laboratory.

DyVer: Dynamic Version Handling for Array Databases Amelie Chi Zhou Shenzhen University, Zhoubin Ke Shenzhen University, Jianming Lao University of California Irvine.

FCRC Break (11.05 am - 11.20 am)



FCRC Plenary (11.20 am - 12.30 pm)



FCRC Lunch (12.30 pm - 2.00 pm)



Session 5A (Parallel): Accelerator Programming I (2.00 pm - 3.40 pm), Location: Magnolia 10-11


Chair: Milind Kulkarni, Purdue University

Accelerating BWA-MEM Read Mapping on GPUs Minh Pham University of South Florida, Yicheng Tu University of South Florida, Xiaoyi Lv Xinjiang University.

PERKS: a Locality-Optimized Execution Model for Iterative Memory-bound GPU Applications Lingqi Zhang Tokyo Institute of Technology, Mohamed Wahib RIKEN Center for Computational Science, Peng Chen National Institute of Advanced Industrial Science and Technology, Jintao Meng Shenzhen Institutes of Advanced Technology, Xiao Wang Oak Ridge National Laboratory, Toshio Endo Tokyo Institute of Technology, Satoshi Matsuoka RIKEN Center for Computational Science.

Wafer-Scale Fast Fourier Transforms Marcelo Orenes-Vera Princeton University, Ilya Sharapov Cerebras Systems, Rob Schreiber Cerebras Systems, Mathias Jacquelin Cerebras Systems, Philippe Vandermersch Cerebras Systems, Sharan Chetlur Cerebras Systems.

Multi-GPU Communication Schemes for Iterative Solvers: When CPUs are Not in Charge Ismayil Ismayilov Koç University, Istanbul, Turkey, Javid Baydamirli Koç University, Istanbul, Turkey, Doğan Sağbili Koç University, Istanbul, Turkey, Mohamed Wahib RIKEN Center for Computational Science, Didem Unat Koç University, Istanbul, Turkey.

Session 5B (Parallel): Large Scale Applications I (2.00 pm - 3.40 pm), Location: Magnolia 12


Chair: Adnan Siraj Rakin, Binghamton University (SUNY)

A Novel Tensor-Expert Hybrid Parallelism Approach to Scale Mixture-of-Experts Training Siddharth Singh University of Maryland, Olatunji Ruwase Microsoft, Ammar Ahmad Awan Microsoft, Samyam Rajbhandari Microsoft Corporation, Yuxiong He Microsoft, Abhinav Bhatele University of Maryland.

Scalable parallelization for the solution of phonon Boltzmann Transport Equation Han D. Tran University of Utah, Siddharth Saurav The Ohio State University, P. Sadayappan University of Utah, Sandip Mazumder The Ohio State University, Hari Sundar University of Utah.

Optimizing Multi-grid Computation and Parallelization on Multi-cores Xiaojian Yang National University of Defense Technology, Shengguo Li National University of Defense Technology, Fan Yuan Xiangtan University, Dezun Dong NUDT, Chun Huang National University of Defense Technology, Zheng Wang University of Leeds.

FT-topo: Architecture-Driven Folded-Triangle Partitioning for Communication-efficient Graph Processing Xinbiao Gan NUDT, Guang Wu NUDT, Ruigeng Zeng NUDT, Jiaqi Si NUDT, Cong Liu Information Center of Logistic Support Department, CMC, Ji Liu Big Data Lab, Baidu Inc, Daxiang Dong Baidu Inc, Chunye Gong NUDT, Tiejun Li NUDT.

FCRC Break (3.40 pm - 4.00 pm)

Session 6A (Parallel): Accelerator Programming II (4.00 pm - 5.40 pm), Location: Magnolia 10-11

Chair: Amelie Chi Zhou, Shenzhen University

Revisiting Temporal Blocking Stencil Optimizations Lingqi Zhang Tokyo Institute of Technology, National Institute of Advanced Industrial Science and Technology, Mohamed Wahib RIKEN Center for Computational Science, Peng Chen National Institute of Advanced Industrial Science and Technology, Jintao Meng Shenzhen Institutes of Advanced Technology, Xiao Wang Oak Ridge National Laboratory, Toshio Endo Tokyo Institute of Technology, Satoshi Matsuoka RIKEN Center for Computational Science.

BitGNN: Unleashing the Performance Potential of Binary Graph Neural Networks on GPUs Jou-An Chen North Carolina State University, Hsin-Hsuan Sung North Carolina State University, Xipeng Shen North Carolina State University, Sutanay Choudhury Pacific Northwest National Laboratory, Ang Li Pacific Northwest National Laboratory.

Fast All-Pairs Shortest Paths Algorithm in Large Sparse Graph Shaofeng Yang Institute of Computing Technology, Chinese Academy of Sciences, Xiandong Liu Institute of Computing Technology, Chinese Academy of Sciences, Yunting Wang Institute of Computing Technology, Chinese Academy of Sciences, Xin He Institute of Computing Technology, Chinese Academy of Sciences, Guangming Tan Institute of Computing Technology, Chinese Academy of Sciences.

RT-kNNS Unbound: Using RT Cores to Accelerate Unrestricted Neighbor Search Vani Nagarajan Purdue University, Durga Mandarapu Purdue University, Milind Kulkarni Purdue University.

Session 6B (Parallel): Large Scale Applications II (4.00 pm - 5.40 pm), Location: Magnolia 12

Chair: Piyush Sao, Oak Ridge National Laboratory

Distributed-Memory Parallel JointNMF Srinivas Eswar Argonne National Laboratory, Benjamin Cobb Georgia Institute of Technology, Koby Hayashi Georgia Institute of Technology, Ramakrishnan Kannan Oak Ridge National Laboratory, Grey Ballard Wake Forest University, Richard Vuduc Georgia Institute of Technology, Haesun Park Georgia Institute of Technology.

Parallel Software for Million-scale Exact Kernel Regression Yu Chen William & Mary, Lucca Skon William & Mary, James McCombs Indiana University, Zhenming Liu William & Mary, Andreas Stathopoulos William & Mary.

HEAT: A Highly Efficient and Affordable Training System for Collaborative Filtering Based Recommendation on Multi-core CPUs Chengming Zhang Indiana University, Shaden Smith Microsoft, Baixi Sun Indiana University, Jiannan Tian Indiana University, Jonathan Soifer Microsoft, Xiaodong Yu Argonne National Laboratory, Shuaiwen Leon Song Microsoft, Yuxiong He Microsoft, Dingwen Tao Indiana University.

Software-Hardware Co-design of Heterogeneous SmartNIC System for Recommendation Models Inference and Training Anqi Guo Boston University, Yuchen Hao Meta Platforms, Chunshu Wu Boston University, Pouya Haghi Boston University, Zhenyu Pan University of Rochester, Min Si Meta Platforms, Dingwen Tao Indiana University, Ang Li Pacific Northwest National Laboratory, Martin Herbordt Boston University, Tong Geng University of Rochester.

ICS Reception (6.00 pm - 8.00 pm), Location: Magnolia Patio

Day 3: Friday, June 23


Session 7 (Plenary): Parallel Algorithms (8.45 am - 11.00 am), Location: Magnolia 10-11

Chair: James McCombs, Indiana University

GPULZ: Optimizing LZSS Lossless Compression for Multi-byte Data on Modern GPUs Boyuan Zhang Indiana University, Jiannan Tian Indiana University, Sheng Di Argonne National Laboratory, Xiaodong Yu Argonne National Laboratory, Martin Swany Indiana University, Dingwen Tao Indiana University, Franck Cappello Argonne National Laboratory.

Anatomy of High-Performance GEMM with Online Fault Tolerance on GPUs Shixun Wu University of California, Riverside, Yujia Zhai University of California, Riverside, Jinyang Liu University of California, Riverside, Jiajun Huang University of California, Riverside, Zizhe Jian University of California, Riverside, Bryan Wong University of California, Riverside, Zizhong Chen University of California, Riverside.

FMI: Fast and Cheap Message Passing for Serverless Functions Marcin Copik ETH Zurich, Roman Böhringer ETH Zurich, Alexandru Calotoiu ETH Zurich, Switzerland, Torsten Hoefler ETH Zurich.

Scalable algorithms for compact spanners on real world graphs Maulein Pathak University of Delhi, Yogish Sabharwal IBM India Research Laboratory, Neelima Gupta University of Delhi.

OpenFFT: An Adaptive Tuning Framework for 3D FFT on ARM Multicore CPUs Tun Chen High Performance Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences and University of Chinese Academy of Sciences, Haipeng Jia Institute of Computing Technology, Chinese Academy of Sciences, Yunquan Zhang Institute of Computing Technology, Chinese Academy of Sciences, Kun Li Institute of Computing Technology, Chinese Academy of Sciences, Zhihao Li Huawei Technologies Co., Ltd., Xiang Zhao Ocean University of China, Jianyu Yao Institute of Computing Technology, Chinese Academy of Sciences, Chendi Li Institute of Computing Technology, Chinese Academy of Sciences.

FCRC Break (11.00 am - 11.20 am)

FCRC Plenary (11.20 am - 12.30 pm)

FCRC Lunch (12.30 pm - 2.00 pm)

Session 8 (Plenary): Architecture and Interconnects (2.00 pm - 4.30 pm), Location: Magnolia 10-11

Chair: Robert van Engelen, Genivia

Seizing the Bandwidth Scaling of On-Package Interconnect in a Post-Moore’s Law World Grigory Chirkov Princeton University, David Wentzlaff Princeton University.

A Router Microarchitecture for In-network Allreduce Ruiqi Wang National University of Defense Technology, Dezun Dong National University of Defense Technology, Fei Lei National University of Defense Technology, Ke Wu National University of Defense Technology, Junchao Ma National University of Defense Technology, Kai Lu National University of Defense Technology.

GRAP: Group-level Resource Allocation Policy for Reconfigurable Dragonfly Network in HPC Guangnan Feng Sun Yat-sen University, Dezun Dong College of Computer, National University of Defense Technology, Shizhen Zhao Shanghai Jiao Tong University, Yutong Lu Sun Yat-sen University.

FLASH: FPGA-Accelerated Smart Switches with GCN Case Study Pouya Haghi Boston University, William Krska Boston University, Cheng Tan Microsoft, Tong Geng University of Rochester, Po Hao Chen Boston University, Connor Greenwood Boston University, Anqi Guo Boston University, Thomas Hines University of Tennessee at Chattanooga, Chunshu Wu Boston University, Ang Li Pacific Northwest National Laboratory, Anthony Skjellum University of Tennessee at Chattanooga, Martin Herbordt Boston University.

SPARTA: Spatial Acceleration for Efficient and Scalable Horizontal Diffusion Weather Stencil Computation Gagandeep Singh ETH Zurich, Alireza Khodamoradi AMD, Kristof Denolf AMD, Jack Lo AMD, Juan Gomez-Luna ETH Zurich, Joseph Melber AMD, Andra Bisca AMD, Henk Corporaal Eindhoven University of Technology, Onur Mutlu ETH Zurich.

Enabling Reconfigurable HPC through MPI-based Inter-FPGA Communication Nicholas Contini The Ohio State University, Bharath Ramesh The Ohio State University, Kaushik Kandadi Suresh The Ohio State University, Tu Tran The Ohio State University, Ben Michalowicz The Ohio State University, Mustafa Abduljabbar The Ohio State University, Hari Subramoni The Ohio State University, Dhabaleswar Panda The Ohio State University.