Zhongliang Chen Ph.D. Candidate, Dept. of ECE, Northeastern University
zhonchen@ece.neu.edu

I am a Ph.D. candidate in the Department of Electrical and Computer Engineering at Northeastern University. My advisor is Dr. David Kaeli. I am a member of the Northeastern University Computer Architecture Research Group (NUCAR). I received my master's degree in Computer Science from the State Key Laboratory of Computer Architecture (CARCH), Institute of Computing Technology, Chinese Academy of Sciences (ICT, CAS) in 2010, and my bachelor's degree in Information Engineering from the School of Information and Communication Engineering, Beijing University of Posts and Telecommunications (BUPT) in 2007.

Research


My research interests include parallel computing with Graphics Processing Units (GPUs) and computer architecture. Currently, I am working on identifying and analyzing compiler- and architecture-level scalar opportunities in GPGPU applications. These opportunities are then exploited on a novel scalar-vector GPU architecture to improve performance and power efficiency.

I am also very interested in big data analytics and very-large-scale machine learning with GPUs. I am working with IBM Research on parallel and distributed stochastic gradient descent on GPUs.

I have spent a great deal of time on GPU architecture modeling and simulation. I worked on modeling a complete tessellation pipeline at the Samsung Advanced Processor Lab. I was also a lead developer of the NVIDIA Fermi/Kepler simulator in the Multi2Sim simulation framework.

During my master's studies, I did intensive research on microprocessor reliability in the Reliable Design Research Group.

Projects


We are designing a stochastic gradient descent based matrix factorization algorithm on single and multiple GPUs.
We are designing a full-system CPU-GPU simulator that enables efficient hardware-software design exploration.
We modeled the NVIDIA Fermi/Kepler microarchitecture in the Multi2Sim simulation framework and implemented a native ISA-level simulator.
I ported a Fortran implementation of a 3D finite-difference time-domain algorithm to a GPU cluster using OpenCL and MPI.
We analyzed the performance of an industrial-strength multi-GPU CT imaging application.
I optimized a CUDA implementation of a growing neural gas network algorithm.
We developed a forward ray tracer for 3D simulation and real-time inversion for whole-body imaging with the NVIDIA OptiX ray tracing engine.

Work


I worked as a Research Intern at the IBM T. J. Watson Research Center from May to August 2015. I worked on large-scale matrix factorization using parallel stochastic gradient descent on GPUs.
I worked as a GPU Architecture Intern at the Advanced Processor Lab, Samsung Research America, from May to August 2014. I worked on modeling and implementing a complete tessellation pipeline in the Samsung GPU simulator.

I worked as a Performance Compiler Engineer Intern in the AMD Shader Compiler Group from July to December 2011. My work focused primarily on compile-time scalar opportunity analysis in GPGPU applications and performance evaluation of scalar coprocessors in AMD Southern Islands GPUs.

Publications


  1. Kathryn Williams, Luis Tirado, Zhongliang Chen, Borja Gonzalez-Valdes, Jose Martinez-Lorenzo, and Carey Rappaport, "Ray Tracing for Simulation of Millimeter Wave Whole Body Imaging Systems," IEEE Transactions on Antennas and Propagation, vol. 63, no. 12, pp. 5913-5918, 2015.
  2. Yu Hu, Zhongliang Chen, and Xiaowei Li, "OWARE: operand width aware redundant execution for whole-processor error detection," Intelligent Automation and Soft Computing, vol. 17, no. 6, pp. 771-780, 2011.
  1. Zhongliang Chen and David Kaeli, "Balancing Scalar and Vector Execution on GPU Architectures," IPDPS, 2016.
  2. Yash Ukidave, Fanny Nina Paravecino, Leiming Yu, Charu Kalra, Amir Momeni, Zhongliang Chen, Nick Materise, Brett Daley, Perhaad Mistry, and David Kaeli, "NUPAR: a benchmark suite for modern GPU architectures," ICPE, 2015.
  3. Kathryn Williams, Luis Tirado, Zhongliang Chen, Borja Gonzalez-Valdes, Jose Martinez-Lorenzo, and Carey Rappaport, "Ray tracing simulation tool for portal-based millimeter-wave security systems using the NVIDIA OptiX ray tracing engine," USNC-URSI Radio Science Meeting, 2014.
  4. Rafael Ubal, Dana Schaa, Perhaad Mistry, Xiang Gong, Yash Ukidave, Zhongliang Chen, Gunar Schirner, and David R. Kaeli, "Exploring the heterogeneous design space for both performance and reliability," DAC, 2014.
  5. Ayse Yilmazer, Zhongliang Chen, and David Kaeli, "Scalar waving: improving the efficiency of SIMD execution on GPUs," IPDPS, 2014.
  6. Zhongliang Chen, David Kaeli, and Norman Rubin, "Characterizing Scalar Opportunities in GPGPU Applications," ISPASS, 2013.
  7. Kathryn Williams, Borja Gonzalez-Valdes, Zhongliang Chen, Luis Tirado, Jose Martinez-Lorenzo, and Carey Rappaport, "A GPU Ray Tracer for Modeling Electromagnetic Scattering from the Human Body," Northeastern University Research, Innovation, and Scholarship Expo (RISE), 2013.
  8. Kathryn Williams, Zhongliang Chen, Luis Tirado, Borja Gonzalez-Valdes, Jose Martinez-Lorenzo, and Carey Rappaport, "Ray tracing for 3D simulation and inversion for whole-body imaging," APSURSI, 2012.
  9. Zhongliang Chen and David Kaeli, "Delivering 100x speedup for three-dimensional finite difference time domain (FDTD) on GPU," Workshop on Advances in GPU Computing, 2011.
  10. Yu Hu, Zhongliang Chen, and Xiaowei Li, "Using data-level parallelism to accelerate instruction-level temporal redundancy," the 4th Conference on Dependable Computing (CDC), 2010.
  11. Li Zhao, Zhongliang Chen, Yu Hu, and Xiaowei Li, "Software-hardware co-simulation based evaluation platform for reliable design of microprocessors (in Chinese)," China Test Conference (CTC), 2010.
  12. Zhongliang Chen, Yu Hu, and Xiaowei Li, "Overview of software-based fault tolerance," China Fault Tolerance Conference (CFTC), 2009.
  13. Zhongliang Chen and Yubin Huang, "The design of low-cost audio signal infrared transceiver (in Chinese)," the 3rd Annual Conference of School of Information Engineering, Beijing University of Posts and Telecommunications, 2007.
  1. Zhongliang Chen, "Research on operand-width aware fault tolerance for microprocessors (in Chinese)," Master's Thesis, Institute of Computing Technology, Chinese Academy of Sciences, 2010.
  2. Zhongliang Chen, "Research on reconfigurable boundary scan technique (in Chinese)," Bachelor's Thesis, Beijing University of Posts and Telecommunications, 2007.

Teaching


I am teaching CUDA programming and GPU architecture to undergraduate and graduate students at Northeastern University in Spring 2016.

I am the teaching assistant for a graduate course on Operating Systems at Northeastern University in Spring 2016.

I taught OpenCL programming and GPU architecture to undergraduate students at Northeastern University in Spring 2015.

I assisted Professor Kaeli in teaching GPU programming to undergraduate students at Northeastern University in Spring 2011.