CS 480/680 Program Generation and Optimization

Course Description
The fast evolution and increasing complexity of computing platforms pose a major challenge for developers of high-performance numeric libraries: it is increasingly difficult to harness the available computing power, and straightforward implementations may lose as much as one or two orders of magnitude in performance. Creating optimal implementations requires the developer to understand algorithms, the capabilities and limitations of compilers, and the target platform's microarchitecture. For these reasons, a recent trend in numerical computing is toward "self-adaptable" software that achieves optimal performance and portability with reduced coding effort. One approach to self-adaptable software is to generate alternative algorithms and implementations automatically and to use intelligent search to find the "best" implementation on a given platform.
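The search-based approach can be illustrated with a deliberately small sketch: time a blocked matrix multiplication for several candidate block sizes and keep the fastest, in the spirit of empirical tuning systems such as ATLAS. The function names `blocked_matmul` and `tune_block_size` and the candidate block sizes are hypothetical, chosen only for illustration. In pure Python, interpreter overhead dominates, so the timings say little about cache behavior; a real tuner would generate and compile C code instead.

```python
import time


def blocked_matmul(A, B, n, nb):
    """Return C = A * B for n x n matrices stored as lists of lists,
    iterating over nb x nb blocks (tiles) of the index space."""
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, nb):            # block row of C and A
        for kk in range(0, n, nb):        # block of the shared dimension
            for jj in range(0, n, nb):    # block column of C and B
                for i in range(ii, min(ii + nb, n)):
                    Ci = C[i]
                    for k in range(kk, min(kk + nb, n)):
                        a = A[i][k]
                        Bk = B[k]
                        for j in range(jj, min(jj + nb, n)):
                            Ci[j] += a * Bk[j]
    return C


def tune_block_size(n, candidates, repeats=3):
    """Time blocked_matmul on an n x n problem for each candidate block
    size and return (fastest block size, its best observed time in seconds)."""
    A = [[float(i + j) for j in range(n)] for i in range(n)]
    B = [[float(i - j) for j in range(n)] for i in range(n)]
    best_nb, best_time = None, float("inf")
    for nb in candidates:
        elapsed = float("inf")
        for _ in range(repeats):          # keep the minimum to reduce timing noise
            start = time.perf_counter()
            blocked_matmul(A, B, n, nb)
            elapsed = min(elapsed, time.perf_counter() - start)
        if elapsed < best_time:
            best_nb, best_time = nb, elapsed
    return best_nb, best_time
```

For example, `tune_block_size(64, [8, 16, 32, 64])` returns the block size that ran fastest on the current machine together with its best time. The answer can differ from one machine to another, which is why the choice is made by measurement rather than fixed in advance.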

This course introduces students to the foundations and state-of-the-art techniques of high-performance software development for numeric libraries (including linear algebra and signal processing kernels). Topics include: 1) fundamental tools in algorithm theory and analysis; 2) fast signal processing and numerical algorithms; 3) optimizing compilers, what they can and cannot do, and how to write software that overcomes compiler limitations; 4) the role of the memory hierarchy and other microarchitectural features in software development; 5) how to use special instruction sets, such as SSE/MMX on the Pentium; 6) an introduction to the concepts of self-adaptable software and program generators.

The course will be organized around three fundamental computations: matrix multiplication, the fast Fourier transform, and integer multiplication.
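As an illustration of the divide-and-conquer style shared by these computations, here is a minimal sketch of Karatsuba's algorithm, one standard divide-and-conquer method for integer multiplication (whether the course presents this exact formulation is an assumption). Splitting each operand into high and low halves lets three recursive products replace the four of the schoolbook method, giving O(n^log2 3) ≈ O(n^1.585) digit operations instead of O(n^2). The function name `karatsuba` is hypothetical, and the split is done in base 10 only for readability; practical implementations split on machine words and switch to the schoolbook method below a tuned threshold.

```python
def karatsuba(x, y):
    """Multiply non-negative integers x and y with Karatsuba's
    divide-and-conquer algorithm, splitting in base 10."""
    if x < 10 or y < 10:
        return x * y                      # base case: a single-digit operand
    m = max(len(str(x)), len(str(y))) // 2
    p = 10 ** m
    x1, x0 = divmod(x, p)                 # x = x1 * p + x0
    y1, y0 = divmod(y, p)                 # y = y1 * p + y0
    z2 = karatsuba(x1, y1)                # product of the high halves
    z0 = karatsuba(x0, y0)                # product of the low halves
    # (x1 + x0)(y1 + y0) - z2 - z0 = x1*y0 + x0*y1: both cross terms from one product
    z1 = karatsuba(x1 + x0, y1 + y0) - z2 - z0
    return z2 * p * p + z1 * p + z0
```

For example, `karatsuba(1234, 5678)` returns 7006652, the same value as `1234 * 5678`.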

Course Objective
To develop the skills required to implement high-performance software, including the interaction between algorithms, computer architecture and compilers. To learn techniques for analyzing the performance of programs and their interaction with the underlying hardware. To utilize techniques to automatically implement, optimize, and adapt programs to different platforms.
Prerequisites
Graduate students should have taken CS 521 (Data Structures and Algorithms I), CS 550 (Programming Languages), and undergraduate courses in discrete mathematics, linear algebra, and computer architecture.

Undergraduate students in the 480 section must have completed CS 260, MATH 201, MATH 221, and CS 282.

Instructor
Jeremy Johnson
Office: 100 University Crossings
Phone: (215) 895-2669
E-mail: jjohnson@cs.drexel.edu
Office hours: T, R 10-11; W 5-6. Additional hours by appointment.
Course mailing list: HPC AT cs dot drexel dot edu
Meeting Time
W 6:00-9:00 in Crossings 153
Textbook
There is no required textbook. The foundational material will come from standard texts on algorithms (Cormen, Leiserson, and Rivest), computer architecture (e.g., Hennessy and Patterson), and a survey journal paper on compiler optimization. The remaining material will come from the instructor's notes and recent journal papers, including papers from the February 2005 special issue of the Proceedings of the IEEE on "Program Generation, Optimization, and Platform Adaptation".


Grading Policy
  1. Project 1 - Matrix Multiplication (20%)
  2. Final Project - Walsh-Hadamard Transform (WHT) (30%)
  3. Class Participation (20%)
  4. Quizzes (30%; two quizzes, worth 15% each)

All assignments must be completed individually unless otherwise stated. No late assignments will be accepted without prior approval.



Announcements
Look here for important announcements.


Lectures
This list is tentative and may be modified at the instructor's discretion.
  1. Lecture 1 (Sept. 27): Three Divide-and-Conquer Algorithms
  2. Lecture 2 (Oct. 4): Implementation and Optimization of Matrix Multiplication
  3. Lecture 3 (Oct. 11): ATLAS - Automatically Tuned Linear Algebra Software; Quiz 1
  4. Lecture 4 (Oct. 18): Program Tuning and Optimization
  5. Lecture 5 (Oct. 25): Optimizing for the Memory Hierarchy
  6. Lecture 6 (Nov. 1): A Family of Divide-and-Conquer Algorithms for the Walsh-Hadamard Transform
  7. Lecture 7 (Nov. 8): FFTW vs. Numerical Recipes
  8. Lecture 8 (Nov. 15): FFT Algorithms and Operation Count
  9. Lecture 9 (Nov. 29): FFT Compiler
  10. Lecture 10 (Dec. 6): SPIRAL - Generating Loop Code
  11. Final Project Presentations (Dec. 13)



Created: 9/27/06 by jjohnson@cs.drexel.edu