My research focuses on the development of advanced runtime features for heterogeneous computing using CUDA, OpenCL, and OpenACC on GPUs and other accelerators. I work on the design of advanced scheduling, partitioning, and virtual context management for CUDA/OpenCL programs running on the GPU. My research also targets architectural enhancements in GPUs to support such advanced runtime features.
Other research topics involve the development of simulators that support heterogeneous computing with GPU cores and CPUs, such as ARM, for SoC-level solutions. I developed the ARM functional simulator for the Multi2Sim simulation framework.
I worked on analyzing the power and performance of different optimizations applied to heterogeneous applications on Nvidia GPUs, AMD GPUs, and APUs from AMD and Intel. I also have a special interest in supercomputing and high-performance computing infrastructure. I served as an advisor to the team that won the Student Cluster Competition at SuperComputing 2013 in Denver, CO.
I worked at Advanced Micro Devices (AMD) as a co-op in Summer 2014, with AMD Research and the CPU core architecture group, on ARM and x86 processors. There I worked on microarchitecture research to support enhanced predication schemes for server-class ARM processors from AMD.
I worked at Intel Corp. on the OpenCL compiler development team in Summer 2013. There I worked on OpenCL runtime design for future products and on compiler design to support new features of the OpenCL 2.0 specification.
I also worked as an Applications Engineering intern in the DSP Product Development group at Analog Devices Inc. in Summer and Fall 2011. My projects covered architectural analysis and cache characterization of a product under design, as well as power unit and hysteresis validation for Blackfin products.