**Course Description**- The fast evolution and increasing complexity of computing platforms pose a major challenge for developers of high-performance numeric libraries: harnessing the available computing power is increasingly difficult, and straightforward implementations may lose as much as one or two orders of magnitude in performance. Creating optimal implementations requires the developer to understand algorithms, the capabilities and limitations of compilers, and the target platform's microarchitecture. For these reasons, a recent trend in numerical computing is towards "self-adaptable" software that achieves optimal performance and portability with reduced coding effort. This course introduces the student to the foundations and state-of-the-art techniques in high-performance software development for numeric libraries (including linear algebra and signal processing kernels). Topics include: 1) fundamental tools in algorithm theory and analysis; 2) fast signal processing and numerical algorithms; 3) optimizing compilers, what they can and cannot do, and how to write software that overcomes compiler limitations; 4) the role of the memory hierarchy and other microarchitectural features in software development; 5) how to use special instruction sets, such as SSE/MMX on Pentium; 6) an introduction to the concepts of self-adaptable software and program generators.
**Course Objective**- To develop the skills required to implement high-performance software, including an understanding of the interaction between algorithms, computer architecture, and compilers. To learn techniques for analyzing the performance of programs and their interaction with the underlying hardware. To utilize techniques to automatically implement, optimize, and adapt programs to different platforms.
**Prerequisites**- CS 557 (Data Structures and Algorithms I), CS 560 (Programming Languages), and undergraduate courses in discrete mathematics, linear algebra, and computer architecture.
**Instructor**- Jeremy Johnson
**Meeting Time**- T 6:00-9:00 in Crossings 153
**Textbook**-
There is no text.
The foundational material will come from standard
texts on algorithms (Cormen, Leiserson, and Rivest),
computer architecture (e.g. Hennessy and Patterson), and a summary
journal paper on compiler optimization. The remainder of the material
will come from the instructor's notes and recent journal papers, including
papers from the Feb. 2005 issue of the Proceedings of the IEEE on
"Program Generation, Optimization, and Platform Adaptation".

### Grading

- Assignments 40%
- Class Participation 20%
- Final Project and Presentation 40%

Final grades will be determined by your total points weighted according to this distribution. Grades will be curved based on relative student performance. Students who successfully complete all of the homework and do reasonably well on the exam should receive a B. Students with high exam and project scores and who do well on the assignments will receive an A.

All assignments must be completed alone unless otherwise stated. No late assignments will be accepted without prior approval.

### Resources

**Reference Books**- More to be added.
**Web Pages**

- General Architecture References
- DLX
  - The DLX Instruction Set Architecture Handbook
  - A Neophyte's Guide to DLX
  - WinDLX Manual
  - SUPERDLX - simulator and compiler link
- Intel
- MIPS
- Sun
- Benchmarks
- Simulators and Performance Tools
- Scientific Computing
- Programming and Compiler Tools
- Architecture-Adapting Software
  - SPIRAL project (Automatic Implementation of Signal Processing Algorithms)
  - FFTW (High Performance, self-adapting FFT package)
  - ATLAS (Automatically Tuned Linear Algebra Software)
  - PHIPAC (Portable High Performance ANSI C)
  - Sparsity (Automatically tuned sparse matrix package)
  - WHT package (Self-adapting package for computing the Walsh-Hadamard Transform)
- More to be added.
**Other Reference**

### Announcements (Thur. Nov. 30 @ 9:50pm)

Look Here for Important Announcements

This list is tentative and may be modified at the instructor's discretion.

### Lectures

- Design and analysis of divide-and-conquer algorithms
- The fast Fourier transform (FFT)
- Numerical Recipes vs. FFTW
- Overview of computer architecture
- Overview of optimizing compilers
- Guide to benchmarking
- LAPACK and BLAS
- Overview of program generation and optimization
- ATLAS/Sparsity
- FFTW
- SPIRAL
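
As an illustration of the first two lecture topics, the following is a minimal Python sketch of a recursive radix-2 (decimation-in-time) FFT, a textbook divide-and-conquer algorithm. It is not taken from the course notes, Numerical Recipes, or FFTW; the function names `fft` and `dft` are hypothetical, and the input length is assumed to be a power of two. The direct `dft` is included only as an O(n^2) reference for checking results.

```python
import cmath


def fft(x):
    """Recursive radix-2 FFT sketch; len(x) is assumed to be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a positive power of two")
    if n == 1:
        return [complex(x[0])]
    # Divide: transform the even- and odd-indexed samples separately.
    even = fft(x[0::2])
    odd = fft(x[1::2])
    # Combine: one twiddle-factor multiply and two additions per output pair.
    out = [0j] * n
    half = n // 2
    for k in range(half):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + half] = even[k] - t
    return out


def dft(x):
    """Direct O(n^2) discrete Fourier transform, used only as a reference."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]
```

The combine step does O(n) work on top of two half-size subproblems, which gives the familiar O(n log n) operation count. A straightforward version like this one leaves considerable performance on the table compared with tuned libraries, which is the kind of gap the later lectures examine.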

### Programs

- Intel Processor Frequency ID Utility
- windlx_d.exe: Windows version of a DLX pipeline simulator
- Wcpuid.zip: Windows program to gather information about the CPU and cache

### Assignments

- Assignment Submission
- Assignment 1 - Divide and Conquer Algorithms and Analysis - due at the start of class on July 5.
  - strassen.pdf - Solution to the operation-count recurrence (with cutoff 1 and, more generally, cutoff c)
  - strassen.mws - Solution to Assignment 1 (Maple worksheet)
- Assignment 2 - FFT Timings and Analysis - due at the start of class on July 19.
- Project List
- Project Guidelines
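
The operation-count recurrence mentioned under Assignment 1 can be explored numerically. The sketch below is not the posted solution; it assumes the original Strassen scheme (7 half-size products and 18 half-size matrix additions per level), a classical multiply below the cutoff costing 2n^3 - n^2 arithmetic operations, and matrix sizes that are powers of two. The function names `classical_ops` and `strassen_ops` are hypothetical.

```python
def classical_ops(n):
    """Arithmetic operations for the classical n x n multiply:
    n^3 multiplications plus n^2 * (n - 1) additions."""
    return 2 * n**3 - n**2


def strassen_ops(n, cutoff=1):
    """Operation count for Strassen's algorithm on n x n matrices.

    Assumes n is a power of two and that sizes <= cutoff fall back to
    the classical algorithm. Recurrence (an assumption of this sketch):
        T(n) = 7 * T(n/2) + 18 * (n/2)^2   for n > cutoff
        T(n) = 2n^3 - n^2                  for n <= cutoff
    """
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a positive power of two")
    if cutoff < 1:
        raise ValueError("cutoff must be at least 1")
    if n <= cutoff:
        return classical_ops(n)
    half = n // 2
    return 7 * strassen_ops(half, cutoff) + 18 * half * half


if __name__ == "__main__":
    # Compare counts for a few sizes and cutoffs.
    for n in (2, 64, 1024):
        print(n, classical_ops(n), strassen_ops(n, 1), strassen_ops(n, 16))
```

Under these assumptions, cutoff 1 gives the closed form 7n^(log2 7) - 6n^2, so for n = 2 the recursion costs 25 operations against 12 for the classical method. This is why a larger cutoff c is worth analyzing.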

### Exam Study Guide

- Study guide for the midterm exam.

### Solutions

- To be added when appropriate

Created: 12/29/03 by jjohnson@cs.drexel.edu

- Office: 100 University Crossings

phone: (215) 895-2669

e-mail: jjohnson@cs.drexel.edu

office hours: M 4-6:30, T 4-6. Additional hours by appointment.

Course mail list: HPC AT cs dot drexel dot edu