CS 540 High Performance Computing

Course Description
Covers the design, evaluation, and use of high-performance processors, including instruction set architecture, pipelining, superscalar execution, instruction-level parallelism, vector instructions, the memory hierarchy, parallel computing (including multi-core and GPU), and high-performance I/O. Special attention is given to the effective use of these features, including automated techniques, in the design and optimization of performance-driven software.

The course will be organized around three fundamental computations: matrix multiplication, the fast Fourier transform, and integer multiplication.

Course Objective
To develop the skills required to implement high-performance software, including an understanding of the interaction among algorithms, computer architecture, and compilers. To learn techniques for analyzing the performance of programs and their interaction with the underlying hardware. To understand the features of modern processors that affect performance and to use these features in the design and optimization of high-performance software. To use techniques that automatically implement, optimize, and adapt programs to different platforms.
Prerequisites
Students should have taken undergraduate courses in data structures, discrete mathematics, linear algebra, and computer architecture.
Instructor
Jeremy Johnson
Office: 100 University Crossings
Phone: (215) 895-2669
E-mail: jjohnson AT cs DOT drexel DOT edu
Office hours: W 4-6, T 7-8 (online). Additional hours by appointment.
Course mailing list: use the discussion groups in BbVista.
Meeting Time
W 6:00-9:00 in University Crossings 149
Textbook
There is no required textbook. However, the book Computer Systems: A Programmer's Perspective by Bryant and O'Hallaron will be referenced, and several of its chapters (3, 5, 6, and 9) will be used. Students may optionally purchase this book; a copy will be available in the Cyber Learning Center (UC 147). The course will also reference the paper How To Write Fast Numerical Code: A Small Introduction by Srinivas Chellappa, Franz Franchetti and Markus Püschel. Additional foundational material will come from standard texts on algorithms (e.g., Cormen, Leiserson, and Rivest), computer architecture (e.g., Hennessy and Patterson), and a journal paper surveying compiler optimization. The remainder of the material will come from the instructor's notes, recent journal papers (including papers from the February 2005 issue of the Proceedings of the IEEE on "Program Generation, Optimization, and Platform Adaptation"), and architecture reference manuals.


Grading Policy
  1. Class assignments and quizzes (30%)
  2. Assignments (45%: three assignments, each worth 15%)
  3. Take-home final exam (25%)

All assignments must be completed individually unless otherwise stated. Late assignments will not be accepted without prior approval.



Announcements
Important announcements are posted in BbVista.


Lectures
This list is tentative and may be modified at the instructor's discretion.
  1. Lecture 1 (Sept. 23): Three Divide-and-Conquer Algorithms.
  2. Lecture 2 (Sept. 30): From Ops to Instructions.
  3. Lecture 3 (Oct. 7): Program Tuning and Optimization.
  4. Lecture 4 (Oct. 14): Optimizing for the Memory Hierarchy.
  5. Lecture 5 (Oct. 21): Automatic Performance Tuning.
  6. Lecture 6 (Oct. 28): WHT Package: a self-adapting package for computing the Walsh-Hadamard Transform (WHT).
  7. Lecture 7 (Nov. 4): Short Vector Instructions (SIMD Computation).
  8. Lecture 8 (Nov. 11): Shared-Memory Parallel Programming.
  9. Lecture 9 (Nov. 18): GPU Programming.
  10. Nov. 25: Thanksgiving holiday, no class.
  11. Lecture 10 (Dec. 2): Cell Processor.
  12. Final exam due (Dec. 9).



Created: 9/23/09 by jjohnson AT cs DOT drexel DOT edu