**Course Description**-
The fast evolution and increasing complexity of computing platforms pose a
major challenge for developers of high performance numeric libraries: it is
increasingly difficult to harness the available computing power; conversely,
straightforward implementations may lose as much as one or two orders of
magnitude in performance. Creating optimal implementations requires the
developer to have an understanding of algorithms, capabilities and limitations
of compilers, and the target platform's microarchitecture. For these reasons,
a recent trend in numerical computing is towards "self-adaptable" software to
achieve optimal performance and portability with reduced coding effort. One
approach to self-adapting software is the automatic generation of algorithms
and implementations and the use of intelligent search to find the "best"
implementation on a given platform.

This course introduces the student to the foundations and state-of-the-art techniques in high performance software development for numeric libraries (including linear algebra and signal processing kernels). Topics include: 1) fundamental tools in algorithm theory and analysis; 2) fast signal processing and numerical algorithms; 3) optimizing compilers, what they can and cannot do, and how to write software that overcomes compiler limitations; 4) the role of the memory hierarchy and other microarchitectural features in software development; 5) how to use special instruction sets, such as SSE/MMX on Pentium; 6) an introduction to the concepts of self-adaptable software and program generators.

The course will be organized around three fundamental computations: matrix multiplication, the fast Fourier transform, and integer multiplication.
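
As a rough illustration of the first of these computations, the sketch below contrasts a straightforward triple-loop matrix multiplication with a blocked (tiled) variant, the kind of restructuring used to exploit the memory hierarchy. The function names and the `block` parameter are hypothetical and are not taken from any course material; in pure Python the example only shows the loop structure and does not demonstrate a speedup.

```python
def matmul_naive(a, b):
    """Straightforward i-j-k matrix product of list-of-lists matrices."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            s = 0
            for k in range(m):
                s += a[i][k] * b[k][j]
            c[i][j] = s
    return c


def matmul_blocked(a, b, block=2):
    """Same product, computed tile by tile.

    `block` is an assumed tuning parameter: a self-adapting library
    would search for the value that performs best on a given machine.
    """
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    # Outer three loops walk over tiles; inner three loops work inside a tile.
    for i0 in range(0, n, block):
        for k0 in range(0, m, block):
            for j0 in range(0, p, block):
                for i in range(i0, min(i0 + block, n)):
                    row_c = c[i]
                    for k in range(k0, min(k0 + block, m)):
                        a_ik = a[i][k]
                        row_b = b[k]
                        for j in range(j0, min(j0 + block, p)):
                            row_c[j] += a_ik * row_b[j]
    return c
```

Both functions return the same result for any positive block size; only the order in which the partial products are accumulated changes.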

**Course Objective**- To develop the skills required to implement high-performance software, including the interaction between algorithms, computer architecture and compilers. To learn techniques for analyzing the performance of programs and their interaction with the underlying hardware. To utilize techniques to automatically implement, optimize, and adapt programs to different platforms.
**Prerequisites**-
Graduate students should have had CS 521 (Data Structures and Algorithms I),
CS 550 (Programming Languages), and undergraduate courses in discrete
mathematics, linear algebra, and computer architecture.
Undergraduate students in the 480 section must have CS 260, MATH 201, MATH 221, and CS 282.

**Instructor**- Jeremy Johnson
**Meeting Time**- W 6:00-9:00 in Crossings 153
**Textbook**-
There is no required textbook.
The foundational material will come from standard
texts on algorithms (Cormen, Leiserson, and Rivest),
computer architecture (e.g. Hennessy and Patterson), and a summary
journal paper on compiler optimization. The remainder of the material
will come from notes from the instructor and recent journal papers, including
papers from the February 2005 special issue of the Proceedings of the IEEE on
"Program Generation, Optimization, and Platform Adaptation".

**Grading**-
- Project 1 - Matrix Multiplication (20%)
- Final Project - WHT (30%)
- Class Participation (20%)
- Quizzes (30%; two quizzes, each worth 15%)

All assignments must be completed alone unless otherwise stated. No late assignments will be accepted without prior approval.

Resources

**Reference Books**- More to be added.
**Web Pages**-
- General Architecture References
- DLX
  - The DLX Instruction Set Architecture Handbook
  - A Neophyte's Guide to DLX
  - WinDLX Manual
  - SUPERDLX - simulator and compiler link

- Intel
- MIPS
- Sun
- Benchmarks
- Simulators and Performance Tools
- Scientific Computing
- Programming and Compiler Tools
- Architecture-Adapting Software
  - SPIRAL project (Automatic Implementation of Signal Processing Algorithms)
  - FFTW (High Performance, self-adapting FFT package)
  - ATLAS (Automatically Tuned Linear Algebra Software)
  - PHIPAC (Portable High Performance ANSI C)
  - Sparsity (Automatically tuned sparse matrix package)
  - WHT package (Self-adapting package for computing the Walsh-Hadamard Transform)

- More to be added.
**Other Reference**- More to be added.

Announcements

Look Here for Important Announcements

This list is tentative and may be modified at the instructor's discretion.

Lectures

- Lecture 1 (Sept. 27): Three Divide and Conquer Algorithms
- Lecture 2 (Oct. 4): Implementation and Optimization of Matrix Multiplication
- Lecture 3 (Oct. 11): ATLAS - Automatically Tuned Linear Algebra Software - Quiz 1
- Lecture 4 (Oct. 18): Program Tuning and Optimization
- Lecture 5 (Oct. 25): Optimizing for the Memory Hierarchy
- Lecture 6 (Nov. 1): A Family of Divide and Conquer Algorithms for the Walsh-Hadamard Transform
- Lecture 7 (Nov. 8): FFTW vs. Numerical Recipes
- Lecture 8 (Nov. 15): FFT Algorithms and Operation Count
- Lecture 9 (Nov. 29): FFT Compiler
- Lecture 10 (Dec. 6): SPIRAL - Generating Loop Code
- Final Project Presentations (Dec. 13)
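
For orientation on the Walsh-Hadamard Transform that appears in Lecture 6 and the final project, here is a minimal sketch of one member of the divide and conquer family: the plain radix-2 recursion in natural (Hadamard) order, based on the factorization WHT(2N) = [[WHT(N), WHT(N)], [WHT(N), -WHT(N)]]. The function name `wht` is hypothetical; this is not the interface of the WHT package listed under Resources, and it makes no attempt at performance.

```python
def wht(x):
    """Unnormalized Walsh-Hadamard transform of a sequence whose
    length is a power of two (natural / Hadamard ordering)."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return list(x)
    half = n // 2
    # Transform each half recursively, then combine with one butterfly stage.
    top = wht(x[:half])
    bottom = wht(x[half:])
    sums = [t + b for t, b in zip(top, bottom)]
    diffs = [t - b for t, b in zip(top, bottom)]
    return sums + diffs


if __name__ == "__main__":
    print(wht([1, 2, 3, 4]))  # [10, -2, -4, 0]
```

Because the unnormalized transform matrix H satisfies H·H = N·I, applying `wht` twice returns the input scaled by its length, which gives a quick sanity check. Other members of the family split the input into pieces of different sizes rather than always in half.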

Programs

- Intel Processor Frequency ID Utility
- windlx_d.exe - Windows version of a DLX pipeline simulator
- Wcpuid.zip - Windows program to gather information about CPU and cache

Assignments

- Assignment 1 - Matrix multiplication generator - due Friday, Oct. 27
- Assignment 2 - SIMD WHT Generator - due in class on Wed., Dec. 13

Created: 9/27/06 by jjohnson@cs.drexel.edu

- Office: 100 University Crossings
- Phone: (215) 895-2669
- E-mail: jjohnson@cs.drexel.edu
- Office hours: T,R 10-11, W 5-6. Additional hours by appointment.
- Course mail list: HPC AT cs dot drexel dot edu