Zhongliang Chen
Ph.D. Candidate, Dept. of ECE, Northeastern University
zhonchen@ece.neu.edu

I am a Ph.D. candidate in the Department of Electrical and Computer Engineering at Northeastern University. My advisor is Dr. David Kaeli. I am a member of the Northeastern University Computer Architecture Research Group (NUCAR). I received my master's degree in Computer Science from the State Key Laboratory of Computer Architecture (CARCH), Institute of Computing Technology, Chinese Academy of Sciences (ICT, CAS) in 2010, and my bachelor's degree in Information Engineering from the School of Information and Communication Engineering, Beijing University of Posts and Telecommunications (BUPT) in 2007.

Research


My research interests include GPU computing, computer architecture, and machine learning. I have been designing scalar-vector GPU architectures to take advantage of compiler- and architecture-level scalar opportunities in GPGPU applications for better performance and power efficiency.

I have also spent a great deal of time on GPU architecture modeling and simulation. I am a lead developer of the NVIDIA Fermi/Kepler simulator in the Multi2Sim simulation framework. I also modeled a complete tessellation pipeline at the Samsung Advanced Processor Lab.

Currently, I am working with IBM Research on big data analytics and very-large-scale machine learning on GPUs.

During my master's studies, I did intensive research on microprocessor reliability in the Reliable Design Research Group.

Projects


We are designing a scalar-vector GPU architecture to take advantage of compiler- and architecture-level scalar opportunities in GPGPU applications for better performance and power efficiency.
We are designing parallel and distributed matrix factorization algorithms based on stochastic gradient descent for single and multiple GPUs.
We are modeling the NVIDIA Fermi/Kepler microarchitectures in the Multi2Sim simulation framework and have implemented a native ISA-level simulator.
We designed a full-system CPU-GPU simulator to enable efficient hardware-software design exploration.
I ported a 3D finite difference time domain algorithm in Fortran to a GPU cluster using OpenCL and MPI.
We analyzed the performance of an industrial-strength multi-GPU CT imaging application and provided advice on future optimization directions.
I optimized the CUDA implementation of a growing neural gas network algorithm.
We developed a forward ray tracer for 3D simulation and real-time inversion for whole-body imaging with the NVIDIA OptiX ray tracing engine.

Work


I worked as a Research Intern at the IBM T. J. Watson Research Center from May to August 2015. I designed, implemented, and optimized large-scale parallel stochastic gradient descent on GPUs. I also wrote automation scripts for the profiling infrastructure on Intel- and IBM-based clusters.
I worked as a GPU Architecture Intern at the Advanced Processor Lab, Samsung Research America, from May to August 2014. I modeled and implemented a complete tessellation pipeline in Samsung's GPU simulator.
I worked as a Performance Compiler Engineer Intern in the AMD Shader Compiler Group from July to December 2011. I designed a compiler pass to identify scalar opportunities in GPGPU applications. I also evaluated and analyzed the performance of scalar coprocessors in the AMD Graphics Core Next GPU architecture.

Publications


Journal Articles

  1. Kathryn Williams, Luis Tirado, Zhongliang Chen, Borja Gonzalez-Valdes, Jose Martinez-Lorenzo, and Carey Rappaport, "Ray Tracing for Simulation of Millimeter Wave Whole Body Imaging Systems," IEEE Transactions on Antennas and Propagation, vol. 63, no. 12, pp. 5913-5918, 2015.
  2. Yu Hu, Zhongliang Chen, and Xiaowei Li, "OWARE: operand width aware redundant execution for whole-processor error detection," Intelligent Automation and Soft Computing, vol. 17, no. 6, pp. 771-780, 2011.

Conference and Workshop Publications

  1. Zhongliang Chen and David Kaeli, "Balancing Scalar and Vector Execution on GPU Architectures," IPDPS, 2016.
  2. Yash Ukidave, Fanny Nina Paravecino, Leiming Yu, Charu Kalra, Amir Momeni, Zhongliang Chen, Nick Materise, Brett Daley, Perhaad Mistry, and David Kaeli, "NUPAR: a benchmark suite for modern GPU architectures," ICPE, 2015.
  3. Kathryn Williams, Luis Tirado, Zhongliang Chen, Borja Gonzalez-Valdes, Jose Martinez-Lorenzo, and Carey Rappaport, "Ray tracing simulation tool for portal-based millimeter-wave security systems using the NVIDIA OptiX ray tracing engine," USNC-URSI Radio Science Meeting, 2014.
  4. Rafael Ubal, Dana Schaa, Perhaad Mistry, Xiang Gong, Yash Ukidave, Zhongliang Chen, Gunar Schirner, and David R. Kaeli, "Exploring the heterogeneous design space for both performance and reliability," DAC, 2014.
  5. Ayse Yilmazer, Zhongliang Chen, and David Kaeli, "Scalar waving: improving the efficiency of SIMD execution on GPUs," IPDPS, 2014.
  6. Zhongliang Chen, David Kaeli, and Norman Rubin, "Characterizing Scalar Opportunities in GPGPU Applications," ISPASS, 2013.
  7. Kathryn Williams, Borja Gonzalez-Valdes, Zhongliang Chen, Luis Tirado, Jose Martinez-Lorenzo, and Carey Rappaport, "A GPU Ray Tracer for Modeling Electromagnetic Scattering from the Human Body," Northeastern University Research, Innovation, and Scholarship Expo (RISE), 2013.
  8. Kathryn Williams, Zhongliang Chen, Luis Tirado, Borja Gonzalez-Valdes, Jose Martinez-Lorenzo, and Carey Rappaport, "Ray tracing for 3D simulation and inversion for whole-body imaging," APSURSI, 2012.
  9. Zhongliang Chen and David Kaeli, "Delivering 100x speedup for three-dimensional finite difference time domain (FDTD) on GPU," Workshop on Advances in GPU Computing, 2011.
  10. Yu Hu, Zhongliang Chen, and Xiaowei Li, "Using data-level parallelism to accelerate instruction-level temporal redundancy," the 4th Conference on Dependable Computing (CDC), 2010.
  11. Li Zhao, Zhongliang Chen, Yu Hu, and Xiaowei Li, "Software-hardware co-simulation based evaluation platform for reliable design of microprocessors (in Chinese)," China Test Conference (CTC), 2010.
  12. Zhongliang Chen, Yu Hu, and Xiaowei Li, "Overview of software-based fault tolerance," China Fault Tolerance Conference (CFTC), 2009.
  13. Zhongliang Chen and Yubin Huang, "The design of low-cost audio signal infrared transceiver (in Chinese)," the 3rd Annual Conference of School of Information Engineering, Beijing University of Posts and Telecommunications, 2007.

Theses

  1. Zhongliang Chen, "Research on operand-width aware fault tolerance for microprocessors (in Chinese)," Master's Thesis, Institute of Computing Technology, Chinese Academy of Sciences, 2010.
  2. Zhongliang Chen, "Research on reconfigurable boundary scan technique (in Chinese)," Bachelor's Thesis, Beijing University of Posts and Telecommunications, 2007.

Teaching


I am teaching CUDA programming and GPU architecture to undergraduate and graduate students at Northeastern University in Spring 2016.

I am the teaching assistant for a graduate course on Operating Systems at Northeastern University in Spring 2016.

I taught OpenCL programming and GPU architecture to undergraduate students at Northeastern University in Spring 2015.

I assisted Professor Kaeli in teaching GPU programming to undergraduate students at Northeastern University in Spring 2011.