Zhongliang Chen, Ph.D. Candidate, Dept. of ECE, Northeastern University
zhonchen@ece.neu.edu

I am a PhD candidate in the Department of Electrical and Computer Engineering at Northeastern University, advised by Dr. David Kaeli, and a member of the Northeastern University Computer Architecture Research Group (NUCAR). I received my master's degree in Computer Science from the State Key Laboratory of Computer Architecture (CARCH), Institute of Computing Technology, Chinese Academy of Sciences (ICT, CAS) in 2010, and my bachelor's degree in Information Engineering from the School of Information and Communication Engineering, Beijing University of Posts and Telecommunications (BUPT) in 2007.

Research


My research interests include GPU computing, computer architecture, and machine learning. I design scalar-vector GPU architectures that exploit compiler- and architecture-level scalar opportunities in GPU compute applications, improving both performance and power efficiency. I also work on parallel and distributed machine learning algorithms targeting GPUs. In addition, I have extensive hands-on experience with GPU architecture modeling and simulation.

During my master's studies, I did intensive research on microprocessor reliability in the Reliable Design Research Group.

Projects


We are designing scalar-vector GPU architectures that exploit compiler- and architecture-level scalar opportunities in GPU compute applications, improving both performance and power efficiency.
We are designing parallel and distributed matrix factorization algorithms, based on stochastic gradient descent, that target single and multiple GPUs.
We designed a full-system CPU-GPU simulator to enable efficient hardware-software design exploration.
I ported a Fortran implementation of a 3D finite-difference time-domain (FDTD) algorithm to a GPU cluster using OpenCL and MPI.
We analyzed the performance of an industrial-strength multi-GPU CT imaging application and recommended directions for future optimization.
I optimized a CUDA implementation of a growing neural gas network algorithm.
We developed a forward ray tracer for 3D simulation and real-time inversion for whole-body imaging using the NVIDIA OptiX ray tracing engine.

Work


I worked as a Research Intern at the IBM T. J. Watson Research Center from May to August 2015. I explored parallel and distributed matrix factorization techniques for recommender systems. I also designed, implemented, and optimized large-scale parallel and distributed stochastic gradient descent targeting GPUs.
I worked as a GPU Architecture Intern at the Advanced Processor Lab, Samsung Research America, from May to August 2014. I modeled and implemented a complete tessellation pipeline in Samsung's GPU simulator. I also implemented ISA instructions to support tessellation and added supporting logic to the on-chip memory and shader states.

I worked as a Performance Compiler Engineer Intern in AMD's Shader Compiler Group from July to December 2011. I designed a compiler pass to identify scalar opportunities in GPU compute applications, and measured the scalar opportunity detection rate using a version of the AMD GPU shader compiler that I modified. I also evaluated and analyzed the performance of scalar coprocessors in AMD's Graphics Core Next GPU architecture.

Publications


  1. Perhaad Mistry, Yash Ukidave, Zhongliang Chen, and David Kaeli, "Computer Organization," book chapter in Encyclopedia of Computer Science and Technology, to appear, 2016.

  1. Kathryn Williams, Luis Tirado, Zhongliang Chen, Borja Gonzalez-Valdes, Jose Martinez-Lorenzo, and Carey Rappaport, "Ray Tracing for Simulation of Millimeter Wave Whole Body Imaging Systems," IEEE Transactions on Antennas and Propagation, vol. 63, no. 12, pp. 5913-5918, 2015.
  2. Yu Hu, Zhongliang Chen, and Xiaowei Li, "OWARE: operand width aware redundant execution for whole-processor error detection," Intelligent Automation and Soft Computing, vol. 17, no. 6, pp. 771-780, 2011.

  1. Zhongliang Chen and David Kaeli, "Balancing Scalar and Vector Execution on GPU Architectures," IPDPS PhD Forum, 2016.
  2. Zhongliang Chen and David Kaeli, "Balancing Scalar and Vector Execution on GPU Architectures," IPDPS, 2016.
  3. Yash Ukidave, Fanny Nina Paravecino, Leiming Yu, Charu Kalra, Amir Momeni, Zhongliang Chen, Nick Materise, Brett Daley, Perhaad Mistry, and David Kaeli, "NUPAR: a benchmark suite for modern GPU architectures," ICPE, 2015.
  4. Kathryn Williams, Luis Tirado, Zhongliang Chen, Borja Gonzalez-Valdes, Jose Martinez-Lorenzo, and Carey Rappaport, "Ray tracing simulation tool for portal-based millimeter-wave security systems using the NVIDIA OptiX ray tracing engine," USNC-URSI Radio Science Meeting, 2014.
  5. Rafael Ubal, Dana Schaa, Perhaad Mistry, Xiang Gong, Yash Ukidave, Zhongliang Chen, Gunar Schirner, and David R. Kaeli, "Exploring the heterogeneous design space for both performance and reliability," DAC, 2014.
  6. Ayse Yilmazer, Zhongliang Chen, and David Kaeli, "Scalar waving: improving the efficiency of SIMD execution on GPUs," IPDPS, 2014.
  7. Zhongliang Chen, David Kaeli, and Norman Rubin, "Characterizing Scalar Opportunities in GPGPU Applications," ISPASS, 2013.
  8. Kathryn Williams, Borja Gonzalez-Valdes, Zhongliang Chen, Luis Tirado, Jose Martinez-Lorenzo, and Carey Rappaport, "A GPU Ray Tracer for Modeling Electromagnetic Scattering from the Human Body," Northeastern University Research, Innovation, and Scholarship Expo (RISE), 2013.
  9. Kathryn Williams, Zhongliang Chen, Luis Tirado, Borja Gonzalez-Valdes, Jose Martinez-Lorenzo, and Carey Rappaport, "Ray tracing for 3D simulation and inversion for whole-body imaging," APSURSI, 2012.
  10. Zhongliang Chen and David Kaeli, "Delivering 100x speedup for three-dimensional finite difference time domain (FDTD) on GPU," Workshop on Advances in GPU Computing, 2011.
  11. Yu Hu, Zhongliang Chen, and Xiaowei Li, "Using data-level parallelism to accelerate instruction-level temporal redundancy," the 4th Conference on Dependable Computing (CDC), 2010.
  12. Li Zhao, Zhongliang Chen, Yu Hu, and Xiaowei Li, "Software-hardware co-simulation based evaluation platform for reliable design of microprocessors (in Chinese)," China Test Conference (CTC), 2010.
  13. Zhongliang Chen, Yu Hu, and Xiaowei Li, "Overview of software-based fault tolerance," China Fault Tolerance Conference (CFTC), 2009.
  14. Zhongliang Chen and Yubin Huang, "The design of low-cost audio signal infrared transceiver (in Chinese)," the 3rd Annual Conference of School of Information Engineering, Beijing University of Posts and Telecommunications, 2007.

  1. Zhongliang Chen, "Scalar-Vector GPU Architectures," PhD dissertation (in preparation), Northeastern University.
  2. Zhongliang Chen, "Research on operand-width aware fault tolerance for microprocessors (in Chinese)," Master's Thesis, Institute of Computing Technology, Chinese Academy of Sciences, 2010.
  3. Zhongliang Chen, "Research on reconfigurable boundary scan technique (in Chinese)," Bachelor's Thesis, Beijing University of Posts and Telecommunications, 2007.

Teaching


I am teaching CUDA programming and GPU architecture to undergraduate and graduate students at Northeastern University in Spring 2016.

I am the teaching assistant for a graduate course on Operating Systems at Northeastern University in Spring 2016.

I taught OpenCL programming and GPU architecture to undergraduate students at Northeastern University in Spring 2015.

I assisted Professor Kaeli in teaching GPU programming to undergraduate students at Northeastern University in Spring 2011.