CS 680 GPU Program Optimization

Course Description
A seminar-style course on parallel programming using GPUs and CUDA. Covers GPU architecture and program optimization, as well as distributed-memory parallel programming using MPI in order to utilize a cluster with multiple GPUs. The course is project based: students will implement and optimize a significant parallel program on a 24-node cluster, with each node consisting of 6 GPUs in addition to a multicore CPU. Students will present papers related to an algorithm of their choice in preparation for their project.
Course Objectives
  1. To be able to design, implement, and analyze correct and efficient parallel programs.
  2. To be able to write correct and efficient GPU programs with CUDA.
  3. To be able to write correct and efficient message passing programs with MPI.
  4. To read and synthesize research papers on parallel computing.
  5. To implement a substantial parallel program exhibiting significant parallel speedup.
Prerequisites
A course on parallel programming (CS 676), parallel architecture (ECEC 622), or high-performance computing (CS 540 or ECEC 621), or permission of the instructor.
Instructor
Jeremy Johnson
Office: 100C University Crossings
Phone: (215) 895-2669
E-mail: jjohnson AT cs DOT drexel DOT edu
Office hours: W 5-6 (or by appointment)
Meeting Time
W 6:00-9:00 in UC 153 (or online)
Textbook
The following books on CUDA and MPI will be used for the first four lectures. Additional research papers and references will be assigned.
  1. Peter Pacheco, Parallel Programming with MPI, Morgan Kaufmann Publishers, 1996.
  2. Jason Sanders and Edward Kandrot, CUDA by Example: An Introduction to General-Purpose GPU Programming, Addison-Wesley Professional, 2010 (Safari).


Grading

  1. Class participation 40% (paper discussion and presentation)
  2. Project 60%


Resources

Reference Books
  1. F. Thomas Leighton, Introduction to Parallel Algorithms and Architectures: Arrays, Trees & Hypercubes, Morgan Kaufmann Publishers, 1992.
  2. Randima (Randy) Fernando, Ed., GPU Gems: Programming Techniques, Tips, and Tricks for Real-Time Graphics, Addison-Wesley Professional, 2004 (online).
  3. Matt Pharr, Ed., GPU Gems 2: Programming Techniques for High-Performance Graphics, Addison-Wesley Professional, 2005 (online).
  4. Hubert Nguyen, Ed., GPU Gems 3 - 3D and General Programming Techniques for GPUs, Addison-Wesley Professional, 2007 (online).
  5. Peter Pacheco, Parallel Programming with MPI, Morgan Kaufmann Publishers, 1996.
  6. William Gropp, Ewing Lusk, and Anthony Skjellum, Using MPI, 2nd Edition: Portable Parallel Programming with the Message Passing Interface, The MIT Press, 1999.
  7. William Gropp, Ewing Lusk, and Rajeev Thakur, Using MPI-2: Advanced Features of the Message Passing Interface, The MIT Press, 1999.
  8. Marc Snir, Steve Otto, Steven Huss-Lederman, David Walker, and Jack Dongarra, MPI: The Complete Reference (Vol. 1 - The MPI Core), 2nd Ed., 1998. The first edition is available online at MPI: The Complete Reference.
  9. Marc Snir, Steve Otto, Steven Huss-Lederman, David Walker, and Jack Dongarra, MPI: The Complete Reference (Vol. 2 - The MPI-2 Extensions), 1998.
Web Pages


Look in BbVista for Announcements


Lectures

This list is tentative and may be modified at the instructor's discretion.
  1. Message Passing with MPI
  2. Grouping Data and Communicators in MPI
  3. Advanced Communication in MPI
  4. Parallel Programming with CUDA
  5. Project Reports
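As a taste of the material in the CUDA lectures, the sketch below shows a minimal vector-addition kernel in the style of the Sanders and Kandrot text. It is illustrative only (array size, block size, and variable names are arbitrary choices, and error checking is omitted for brevity); it compiles with nvcc and requires a CUDA-capable GPU.

```cuda
#include <stdio.h>

#define N 1024

// Each thread computes one element; the grid as a whole covers the array.
__global__ void add(const int *a, const int *b, int *c) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < N)
        c[i] = a[i] + b[i];
}

int main(void) {
    int a[N], b[N], c[N];
    int *dev_a, *dev_b, *dev_c;

    for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2 * i; }

    // Allocate device memory and copy the inputs to the GPU.
    cudaMalloc(&dev_a, N * sizeof(int));
    cudaMalloc(&dev_b, N * sizeof(int));
    cudaMalloc(&dev_c, N * sizeof(int));
    cudaMemcpy(dev_a, a, N * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dev_b, b, N * sizeof(int), cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all N elements.
    add<<<(N + 255) / 256, 256>>>(dev_a, dev_b, dev_c);

    // Copy the result back to the host and print one element.
    cudaMemcpy(c, dev_c, N * sizeof(int), cudaMemcpyDeviceToHost);
    printf("c[10] = %d\n", c[10]);

    cudaFree(dev_a); cudaFree(dev_b); cudaFree(dev_c);
    return 0;
}
```

Much of the course concerns how choices such as block size, memory layout, and host-device transfer patterns in programs like this affect performance.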


Programs


Assignments


Created: 12/29/2010 by jjohnson AT cs DOT drexel DOT edu