Senior Research Associate
Dept. of Computer Science and Engineering
The Ohio State University
2015 Neil Avenue
Columbus, OH-43210, USA
khaledhamidouche (at) gmail.com
I have been a Senior Research Associate in the Department of Computer Science and Engineering at The Ohio State University since November 2014.
My research advisor is Prof. Dhabaleswar
K. (DK) Panda.
I am a member of the Network-Based Computing Laboratory.
My research interests include parallel programming models, parallel systems, distributed systems and heterogeneous HPC architectures, including GPUs and MIC coprocessors. From 2012 to 2014, I held a postdoctoral researcher position within the same team.
Before that, I was a postdoctoral researcher with the HP2 (High Performance and Parallel) team at Telecom Sud-Paris, Evry. Working on compilation and code generation for parallel architectures, I contributed both to optimizing the source-to-source transformation tool (STEP) and to porting it to manycore architectures.
I defended my PhD thesis at the LRI (Laboratoire de Recherche en Informatique, Parall Team) in November 2011, on parallel computing and parallel architectures. My thesis work (dissertation available here), supervised by Prof. Daniel Etiemble and Dr. Joel Falcou, focused on a programming model and on deployment and hybrid code generation for hierarchical and heterogeneous parallel architectures, including the development of supporting tools.
In September 2008, I graduated from University of Paris-Sud 11 with a Master's degree in Computer Science.
A full version of my resume is available in PDF form.
Events and Announcements:
MVAPICH User Group (MUG) Meeting
held in Columbus, Ohio, USA during August 19-21, 2015.
- High Performance Computing
- High level Parallel Programming
- Tools and Environments for parallel and heterogeneous architectures
- Compilation and code generation
- Multi/Many-cores architectures
- 33) K. Hamidouche, A. Venkatesh, A. A. Awan, H. Subramoni, C. Chu and D. K. Panda, Exploiting GPUDirect RDMA in Designing High Performance OpenSHMEM for NVIDIA GPU Clusters. IEEE Cluster 2015. September 2015, Chicago, USA
- 32) M. Li, H. Subramoni, K. Hamidouche, X. Lu and D. K. Panda, High Performance MPI Datatype Support with User-mode
Challenges, Designs and Benefits. IEEE Cluster 2015. September 2015, Chicago, USA
- 31) M. Li, K. Hamidouche, X. Lu, J. Lin and D. K. Panda, High-Performance and Scalable Design of MPI-3 RMA on Xeon Phi Clusters. International EURO-PAR Conference (Euro-par 2015). August 2015
- 30) H. Subramoni, A. A. Awan, K. Hamidouche, D. Pekurovsky, A. Venkatesh, S. Chakraborty, K. Tomko and D. K. Panda, Designing Non-Blocking Personalized Collectives with Near Perfect Overlap for RDMA-Enabled
International Supercomputing Conference (ISC '15). July 2015, Germany
- 29) A. Gomez-Iglesias, D. Pekurovsky, K. Hamidouche, J. Zhang and J. Vienne, Porting Scientific Libraries to PGAS in XSEDE Resources: Practice and Experience. XSEDE'2015 Conference. July 2015, St. Louis, USA
- 28) J. Lian, K. Hamidouche, X. Lu, M. Li and D. K. Panda, Coarray Fortran Support with MVAPICH2-X: Initial Experience and Evaluation. International Workshop on High-Level Parallel Programming Models and Supportive Environments (HIPS '15), affiliated with IPDPS 2015. May 2015, India
- 27) R. Rajachandrasekar, A. Venkatesh, K. Hamidouche and D. K. Panda, Power-Check: An Energy-Efficient Checkpointing Framework for HPC Clusters. IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid'2015). May 2015, Shenzhen, China
- 26) J. Jose, S. Potluri, H. Subramoni, X. Lu, K. Hamidouche, K. Schulz, H. Sundar and D. K. Panda, Designing Scalable Out-of-core Sorting with Hybrid MPI+PGAS Programming Models. International Conference on Partitioned Global Address Space Programming Models (PGAS'2014). October 2014, Oregon, USA
- 25) R. Shi, S. Potluri, K. Hamidouche, M. Li, J. Perkins, D. Rossetti and D. K. Panda, Designing Efficient Small Message Transfer Mechanism for Inter-node MPI Communication on InfiniBand GPU Clusters. IEEE International Conference on High Performance Computing (HiPC'2014). December 2014, Goa, India
- 24) A. Venkatesh, H. Subramoni, K. Hamidouche and D. K. Panda, A High Performance Broadcast Design with Hardware Multicast and GPUDirect RDMA for Streaming Applications on InfiniBand Clusters. IEEE International Conference on High Performance Computing (HiPC'2014). December 2014, Goa, India
- 23) J. Jose, K. Hamidouche, X. Lu, S. Potluri, J. Zhang, K. Tomko and D. K. Panda, High Performance OpenSHMEM for MIC Clusters: Extensions, Runtime Designs and Application Co-design. IEEE CLUSTER'14 (Best Paper Nominee). September 2014, Madrid, Spain
- 22) M. Li, X. Lu, S. Potluri, K. Hamidouche, J. Jose, K. Tomko and D. K. Panda, Scalable Graph500 Design with MPI-3 RMA. IEEE CLUSTER'14. September 2014, Madrid, Spain
- 21) R. Rajachandrasekar, J. Perkins, K. Hamidouche, M. Arnold and D. K. Panda, Understanding the Memory-Utilization of MPI Libraries: Challenges and Designs in Implementing the MPI_T Interface. EUROMPI'14. September 2014, Japan
- 20) R. Shi, X. Lu, S. Potluri, K. Hamidouche, J. Zhang and D. K. Panda, HAND: A Hybrid Approach to Accelerate Non-contiguous Data Movement using MPI Datatypes on GPU Clusters. International Conference on Parallel Processing (ICPP'14). September 2014, Minneapolis, USA
- 19) R. Rajachandrasekar, S. Potluri, A. Venkatesh, K. Hamidouche, Md. Wasi-ur-Rahman and D. K. Panda, MIC-Check: A Distributed Checkpointing Framework for the Intel Many Integrated Cores Architecture. The International ACM Symposium on High-Performance Parallel and Distributed Computing (HPDC '14). June 2014, Vancouver, Canada
- 18) H. Subramoni, K. Hamidouche, A. Venkatesh, S. Chakraborty and D. K. Panda, Designing MPI Library with Dynamic Connected Transport (DCT) of InfiniBand: Early Experiences. IEEE International Supercomputing Conference (ISC '14). June 2014, Leipzig, Germany
- 17) J. Jose, K. Hamidouche, J. Zhang, A. Venkatesh and D. K. Panda, Optimizing Collective in UPC. International Workshop on High-Level Parallel Programming Models and Supportive Environments (HIPS '14). May 2014, Phoenix, USA
- 16) A. Venkatesh, S. Potluri, R. Rajachandrasekar, M. Luo, K. Hamidouche and D. K. Panda, High Performance Alltoall and Allgather designs for InfiniBand MIC Clusters. IEEE International Parallel & Distributed Processing Symposium (IPDPS '14). May 2014, Phoenix, USA
- 15) M. Luo, X. Lu, K. Hamidouche, K. Kandalla and D. K. Panda, Initial Study of Multi-Endpoint Runtime for MPI+OpenMP Hybrid Applications on Multi-Core Systems. International Symposium on Principles and Practice of Parallel Programming (PPoPP '14). February 2014, Orlando, USA
- 14) R. Shi, S. Potluri, K. Hamidouche, X. Lu, K. Tomko and D. K. Panda, A Scalable and Portable Approach to Accelerate Hybrid HPL on Heterogeneous CPU-GPU Clusters. IEEE Cluster (Cluster13). Best Student Paper Award. September 2013, Indianapolis, USA
- 13) S. Potluri, D. Bureddy, K. Hamidouche, A. Venkatesh, K. Kandalla, H. Subramoni and D. K. Panda, MVAPICH-PRISM: A Proxy-based Communication Framework using InfiniBand and SCIF for Intel MIC Clusters. IEEE/ACM International Conference on Supercomputing (SC13). November 2013, Denver, CO, USA
- 12) S. Potluri, K. Hamidouche, D. Bureddy and D. K. Panda, MVAPICH2-MIC: A High-Performance MPI Library for Xeon Phi Clusters with InfiniBand. Extreme Scaling Workshop. August 2013, Boulder, CO, USA
- 11) K. Kandalla, A. Venkatesh, K. Hamidouche, S. Potluri, D. Bureddy and D. K. Panda, Designing Optimized MPI Broadcast and Allreduce for Many Integrated Core (MIC) InfiniBand Clusters. IEEE International Symposium on High-Performance Interconnects (HotI 2013). August 2013, San Jose, CA, USA
- 10) S. Potluri, K. Hamidouche, A. Venkatesh, D. Bureddy and D. K. Panda, Efficient Inter-node MPI Communication using GPUDirect RDMA for InfiniBand Clusters with NVIDIA GPUs. IEEE International Conference on Parallel Processing (ICPP 2013). October 2013, Lyon, France
- 9) M. Li, S. Potluri, K. Hamidouche, J. Jose and D. K. Panda, Efficient and Truly Passive MPI-3 RMA Using InfiniBand Atomics. EuroMPI 13. September 2013, Madrid, Spain
- 8) K. Hamidouche, S. Potluri, H. Subramoni, K. Kandalla and D. K. Panda, MIC-RO: Enabling Efficient Remote Offload on Heterogeneous Many Integrated Core (MIC) Clusters with InfiniBand. ACM International Conference on Supercomputing (ICS 2013). June 2013, Oregon, USA
- 7) K. Hamidouche, F. M. Mendonca, J. Falcou, A. C. M. A. Melo and D. Etiemble, Parallel Smith-Waterman Comparison on Multicore and Manycore Computing Platforms with BSP++. International Journal of Parallel Programming (IJPP). August 2012
- 6) K. Hamidouche, F. M. Mendonca, J. Falcou and D. Etiemble, Parallel Biological Sequence Comparison on Heterogeneous High Performance Computing Platforms with BSP++. 23rd IEEE International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD'2011). October 26-29, 2011, Vitoria, Espirito Santo, Brazil
- 5) K. Hamidouche, J. Falcou and D. Etiemble, A Framework for an Automatic Hybrid MPI+OpenMP code generation. ACM High Performance Computing Symposium (HPC-11). April 3-7, 2011, Boston, USA
- 4) K. Hamidouche, J. Falcou and D. Etiemble, Hybrid Bulk Synchronous Parallelism Library for Clustered SMP Architectures. ACM International Workshop on High Level Parallel Programming and Applications (HLPP2010), affiliated with ICFP 2010. September 25, 2010, Baltimore, USA
- 3) K. Hamidouche, A. Borghi, P. Esterie, J. Falcou and S. Peyronnet, Three High Performance Architectures in the Parallel APMC Boat. IEEE International Workshop on Parallel and Distributed Methods in Verification (PDMC 2010). September 30, 2010, Enschede, Netherlands
- 2) C. Tadonki, L. Lacassagne, T. Saidani, J. Falcou and K. Hamidouche, The Harris algorithm revisited on the CELL processor. International Workshop on Highly-Efficient Accelerators and Reconfigurable Technologies (HEART 2010). June 1, 2010, Tsukuba, Japan
- 1) K. Hamidouche, F. Cappello and D. Etiemble, Comparaison de MPI, OpenMP et MPI+OpenMP sur un noeud multiprocesseur multicoeurs AMD a memoire partagee. Rencontre Francophone de Parallelisme (RenPar 2009). September 9-11, 2009, Toulouse
The MVAPICH2 software, which supports the MPI 3.0 standard, delivers high performance, scalability and fault tolerance for high-end computing systems and servers using InfiniBand, 10GigE/iWARP and RoCE networking technologies. The MVAPICH2-X software package provides support for hybrid MPI+PGAS (UPC and OpenSHMEM) programming models with a unified communication runtime for emerging exascale systems. The MVAPICH2-GDR package provides support for clusters with NVIDIA GPUs supporting the GPUDirect RDMA feature. The MVAPICH2-MIC package provides support for clusters with Intel MIC coprocessors. MVAPICH2 software powers several supercomputers in the TOP500 list.
- BSP++ Library
BSP++ is a generic library built on C++ templates. Based on a hierarchical model, it takes hybrid architectures (multicore clusters and Cell BE accelerator-based clusters) as native targets. Using a small set of primitives and intuitive concepts, BSP++ provides a simple way to program hybrid and heterogeneous architectures. It generates MPI, OpenMP, MPI+OpenMP, Cell BE and MPI+Cell BE code from the same version of the user's base code (the choice of target machine is just a preprocessing symbol).
- BSPGen Framework
BSPGen is a tool for automatic generation of hybrid, multi-level hierarchical code (MPI+OpenMP or MPI+Cell BE). Using the BSP++ cost model, BSPGen predicts and generates the appropriate hierarchical hybrid code (BSP++ code) for a given application on a target architecture. The prediction is based on a new pass in the LLVM compiler. BSPGen generates hybrid code from a list of sequential functions and a description of the parallel algorithm (XML file).
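The remark in the BSP++ description that the target machine is selected by a preprocessing symbol can be illustrated with a small sketch. This is not the BSP++ API: the target types, the symbol `DEMO_TARGET_FOUR_WORKERS` and the functions `superstep_sum` and `user_sum` are all hypothetical names chosen for illustration, and the "workers" are simulated serially rather than mapped to MPI or OpenMP.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical target descriptors. These are NOT BSP++ types; they only
// stand in for "the target machine" in this sketch.
struct SequentialTarget {
    static constexpr std::size_t workers = 1;
};
struct FourWorkerTarget {
    static constexpr std::size_t workers = 4;
};

// DEMO_TARGET_FOUR_WORKERS is an assumed symbol name, e.g. passed as
// -DDEMO_TARGET_FOUR_WORKERS on the compiler command line.
#if defined(DEMO_TARGET_FOUR_WORKERS)
using SelectedTarget = FourWorkerTarget;
#else
using SelectedTarget = SequentialTarget;
#endif

// One BSP-style superstep, simulated serially: each worker reduces its own
// chunk (local computation), then the partial results are combined
// (standing in for the exchange and synchronization step).
template <typename Target>
long superstep_sum(const std::vector<int>& data) {
    const std::size_t p = Target::workers;
    assert(p > 0);
    std::vector<long> partial(p, 0L);
    const std::size_t chunk = (data.size() + p - 1) / p;  // ceiling division
    for (std::size_t w = 0; w < p; ++w) {
        const std::size_t first = std::min(w * chunk, data.size());
        const std::size_t last = std::min(first + chunk, data.size());
        partial[w] = std::accumulate(
            data.begin() + static_cast<std::ptrdiff_t>(first),
            data.begin() + static_cast<std::ptrdiff_t>(last), 0L);
    }
    return std::accumulate(partial.begin(), partial.end(), 0L);
}

// User code is written once against SelectedTarget and never names a
// concrete target itself.
inline long user_sum(const std::vector<int>& data) {
    return superstep_sum<SelectedTarget>(data);
}
```

In this sketch, rebuilding with `-DDEMO_TARGET_FOUR_WORKERS` changes how `user_sum` partitions its work without touching the user code, which is the general idea behind selecting a backend at preprocessing time.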
- I was an instructor at the University of Paris-Sud 11. During my thesis, I taught several courses at different levels:
Polytechnique - IFIPS - University of Paris-Sud 11
- (5th degree - Formation Continue) Parallel and Distributed Programming
- (5th degree - Apprentis 3) Parallel and Distributed Programming
IUT d'Orsay - University of Paris-Sud 11
- (3rd degree) Operating Systems
UFR d'Orsay - University of Paris-Sud 11
- (1st degree) Introduction to Algorithms and the C Language