Publications (Google Scholar Profile)
Journal
[3] Fault Injection for TensorFlow Applications
Niranjhana Narayanan, Zitao Chen, Bo Fang, Guanpeng Li, Karthik Pattabiraman, Nathan DeBardeleben
IEEE Transactions on Dependable and Secure Computing (TDSC). Acceptance date: May 2022.
[2] Improving the Accuracy of IR-level Fault Injection
Lucas Palazzi, Guanpeng Li, Bo Fang, and Karthik Pattabiraman
IEEE Transactions on Dependable and Secure Computing (TDSC). Acceptance date: February 2020.
[1] A Systematic Methodology for Evaluating the Error Resilience of GPGPU Applications
Bo Fang, Karthik Pattabiraman, Matei Ripeanu, and Sudhanva Gurumurthi
IEEE Transactions on Parallel and Distributed Systems (TPDS). Acceptance date: December 2015.
Conference and others
[32] ATTNChecker: Highly-Optimized Fault Tolerant Attention for Large Language Model Training
Yuhang Liang, Bo Fang, Xinyi Li, Jie Ren, Ang Li, Jieyang Chen
to appear in 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP 2025)
[31] PQML: Enabling the Predictive Reproducibility on NISQ Machines for Quantum ML Applications
Priyabrata Senapati, Samuel Yen-Chi Chen, Bo Fang, Tushar M. Athawale, Ang Li, Weiwen Jiang, Cheng Chang Lu, Qiang Guan
Proceedings of the IEEE International Conference on Quantum Computing and Engineering, QCE24
[30] HAPPA: A Modular Platform for HPC Application Resilience Analysis with LLMs Embedded
Hailong Jiang, Jianfeng Zhu, Bo Fang, Kevin Barker, Chao Chen, Ruoming Jin, Qiang Guan
The 43rd International Symposium on Reliable Distributed Systems (SRDS2024)
[29] Privacy-Preserving Artificial Intelligence on Edge Devices: A Homomorphic Encryption Approach
Khan M., B. Fang, G. Cimino, S. Cirillo, D. Zhao, and L. Yang
In International Conference on Web Services 2024
[28] A Testing-Guided Approach to Characterize NVIDIA and AMD Matrix Accelerator Numerics
Xinyi Li, Ang Li, Bo Fang, Ignacio Laguna, Ganesh Gopalakrishnan
24th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID24)
[27] Red-QAOA: Efficient Variational Optimization through Circuit Reduction
Meng Wang, Bo Fang, Ang Li, and Prashant J. Nair
In Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) 2024
[26] MPGemmFI: A Fault Injection Technique for Mixed Precision GEMM in ML Applications
Bo Fang, Xinyi Li, Harvey Dam, Cheng Tan, Siva Kumar Sastry Hari, Timothy Tsai, Ignacio Laguna, Dingwen Tao, Ganesh Gopalakrishnan, Prashant Nair, Kevin Barker, Ang Li
arXiv IEEE Cluster 2024
[25] Towards Redefining the Reproducibility in Quantum Computing: A Data Analysis Approach on NISQ Devices
Priyabrata Senapati, Zhepeng Wang, Weiwen Jiang, Travis Humble, Bo Fang, Shuai Xu and Qiang Guan
Proceedings of the IEEE International Conference on Quantum Computing and Engineering, QCE23
[24] AMRIC: A Novel In Situ Lossy Compression Framework for Efficient I/O in Adaptive Mesh Refinement Applications
Daoce Wang, Jesus Pulido, Pascal Grosset, Jiannan Tian, Sian Jin, Houjun Tang, Jean Sexton, Sheng Di, Zarija Lukić, Kai Zhao, Bo Fang, Franck Cappello, James Ahrens, and Dingwen Tao
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC), Denver, CO, USA, Nov 12-17, 2023. Acceptance Rate: 24%
[23] Practical GPU Floating-Point Exception Detection, Diagnosis and Repair
Xinyi Li, Ignacio Laguna, Bo Fang, Katarzyna Swirydowicz, Ang Li and Ganesh Gopalakrishnan
ACM International Symposium on High-Performance Parallel and Distributed Computing, Orlando, FL, USA. June 16-23, 2023
[22] Towards Precision-Aware Fault-tolerance Approaches for Mixed-Precision Applications
Bo Fang, Siva Kumar Sastry Hari, Timothy Tsai, Xinyi Li, Ganesh Gopalakrishnan, Ignacio Laguna, Kevin Barker, Ang Li
FTXS 2022: Workshop on Fault Tolerance for HPC at eXtreme Scale (co-located at SC22)
[21] MARS: Malleable Actor-Critic Reinforcement Learning Scheduler
Betis Baheri, Qiang Guan, Jacob Tronge, Bo Fang, Ang Li, Vipin Chaudhary
2022 IEEE International Performance, Computing, and Communications Conference (IPCCC)
[20] Efficient Hierarchical State Vector Simulation of Quantum Circuits via Acyclic Graph Partitioning
Bo Fang, M. Yusuf Ozkaya, Ang Li, Umit V. Catalyurek, Sriram Krishnamoorthy
to appear in IEEE Cluster 2022, co-first author
Best paper award
[19] Pinpointing the System Reliability Degradation in NISQ Machines
Qiang Guan, Betis Baheri, Zixuan Xu, Ying Mao, Vipin Chaudhary, Shuai Xu and Bo Fang
2022 IEEE International Conference on Quantum Computing and Engineering (QCE22)
[18] ASAP - Automatic Synthesis of Area-Efficient and Precision-Aware CGRA
Cheng Tan, Thierry Tambe, Jeff Zhang, Bo Fang, Tong Geng, Gu-Yeon Wei, David Brooks, Antonino Tumeo, Ganesh Gopalakrishnan, Ang Li
International Conference on Supercomputing. Jun 27-30, 2022 (Accepted)
[17] SV-Sim: Scalable PGAS-based State Vector Simulation of Quantum Circuits
Ang Li, Bo Fang, Christopher Granade, Guen Prawiroatmodjo, Bettina Heim, Martin Roetteler and Sriram Krishnamoorthy
The 2021 International Conference for High Performance Computing, Networking, Storage and Analysis, St. Louis, MO, USA. Nov 14-19, 2021
[16] A Hybrid System for Learning Classical Data in Quantum States
Stein, S. A., L’Abbate, R., Mu, W., Liu, Y., Baheri, B., Mao, Y., Guan, Q., Li, A., & Fang, B.
In 2021 IEEE 34th International Performance Computing and Communications Conference (IPCCC), pages 1–8, 2021. IEEE
[15] QuGAN: A Quantum State Fidelity based Generative Adversarial Network
Samuel A. Stein, Betis Baheri, Ray Marie Tischio, Ying Mao, Qiang Guan, Ang Li, Bo Fang, Shuai Xu
2021 IEEE International Conference on Quantum Computing and Engineering.
[14] Characterizing Impacts of Storage Faults on HPC Applications: A Methodology and Insights
Bo Fang, Daoce Wang, Sian Jin, Quincey Koziol, Zhao Zhang, Qiang Guan, Suren Byna, Sriram Krishnamoorthy, Dingwen Tao
To appear at the IEEE CLUSTER, 2021 (Acceptance Rate: 29%, co-first author)
[13] TensorFlowFI: A Flexible Fault Injection Framework for TensorFlow Applications
Zitao Chen, Niranjhana Narayanan, Bo Fang, Guanpeng Li, Karthik Pattabiraman, Nathan DeBardeleben
To appear at the IEEE International Symposium on Software Reliability Engineering (ISSRE), 2020 (Acceptance Rate: 25.6%)
[12] Chaser: An Enhanced Fault Injection Tool for Tracing Soft Errors in MPI Applications
Qiang Guan, Xunchao Hu, Terence Grove, Bo Fang, Hailong Jiang, Heng Yin, Nathan DeBardeleben
To appear at the 50th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2020, tool track) (Acceptance Rate: 16.5%)
[11] A Tale of Two Injectors: End-to-End Comparison of IR-level and Assembly-Level Fault Injection
Lucas Palazzi, Guanpeng Li, Bo Fang, and Karthik Pattabiraman
To appear at the IEEE International Symposium on Software Reliability Engineering (ISSRE), 2019 (Acceptance Rate: 31.4%)
[10] BonVoision: Leveraging Spatial Data Smoothness for Recovery from Memory Soft Errors
Bo Fang, Hassan Halawa, Karthik Pattabiraman, Matei Ripeanu, Sriram Krishnamoorthy
ACM International Conference on Supercomputing (ICS’2019) (acceptance rate 23%)
[9] Towards Predicting the Impact of Roll-Forward Failure Recovery for HPC Applications
Bo Fang, Jieyang Chen, Karthik Pattabiraman, Matei Ripeanu, Sriram Krishnamoorthy
The 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2019), Fast abstract
[8] LetGo: A Lightweight Continuous Framework for HPC Applications upon Failures
Bo Fang, Qiang Guan, Nathan DeBardeleben, Karthik Pattabiraman, Matei Ripeanu
The ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC), June 2017 (acceptance rate 18%)
[7] SDC is in the Eye of the Beholder: A Survey and Preliminary Study
Bo Fang, Panruo Wu, Qiang Guan, Nathan DeBardeleben, Laura Monroe, Sean Blanchard, Zizhong Chen, Karthik Pattabiraman, Matei Ripeanu
3rd IEEE International Workshop on Reliability and Security Data Analysis (co-located with DSN 2016), June 2016.
[6] ePVF: An Enhanced Program Vulnerability Factor Methodology for Cross-Layer Resilience Analysis
Bo Fang, Qining Lu, Karthik Pattabiraman, Matei Ripeanu and Sudhanva Gurumurthi
Proceedings of the IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2016 (acceptance rate 21%)
[5] Evaluating the Error Resilience of Parallel Programs
Bo Fang, Karthik Pattabiraman, Matei Ripeanu and Sudhanva Gurumurthi
Workshop on Fault Tolerance for HPC at eXtreme Scale (FTXS), in conjunction with DSN 2014
[4] GPUS: Combining high-performance with high-reliability
L. Bautista Gomez, F. Cappello, L. Carro, N. DeBardeleben, B. Fang, S. Gurumurthi, K. Pattabiraman, P. Rech, M. Sonza Reorda
Embedded tutorial paper (invited), Proceedings of the International Symposium on Design Automation and Test in Europe (DATE), 2014
[3] GPU-Qin: A Methodology for Evaluating the Error Resilience of GPGPU Applications
Bo Fang, Karthik Pattabiraman, Matei Ripeanu, and Sudhanva Gurumurthi
IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2014 (acceptance rate 31%).
[2] Towards Building Error Resilient GPGPU Applications
Bo Fang, Jiesheng Wei, Karthik Pattabiraman, Matei Ripeanu
3rd IEEE Workshop on Resilient Architecture (WRA) in conjunction with MICRO 2012.
[1] Evaluating Error Resiliency of GPGPU Applications
Bo Fang, Jiesheng Wei, Karthik Pattabiraman, Matei Ripeanu
International Conference for High Performance Computing, Networking, Storage and Analysis (SC12), poster.