Lecture 3: Automatically Tuned Linear Algebra Software (ATLAS)
Background Material
- Lecture 2 on matrix multiplication.
- Memory Hierarchy (Chapter 5 of Hennessy and Patterson,
Chapter 6 of Bryant and O'Hallaron)
- Optimizing Program Performance (Chapter 5 of Bryant and O'Hallaron)
Reading
- Kamen Yotov, Xiaoming Li, Gang Ren, Maria Jesus Garzaran,
David Padua, Keshav Pingali, and Paul Stodghill, "Is Search Necessary
to Generate High-Performance BLAS?", In Special Issue on: Program
Generation, Optimization, and Platform Adaptation, Proc. of the IEEE,
Vol. 93, No. 2, 2005.
- ATLAS
Topics
- ATLAS (Automatic tuning of matrix multiplication)
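To make "automatic tuning" concrete, here is a minimal sketch of empirical search: run the same blocked matrix multiplication with several tile sizes, time each, and keep the fastest. The names `blocked_matmul` and `search_block_size` and the candidate list are hypothetical, chosen only for illustration. ATLAS itself generates and times C code and searches over more parameters than the tile size. In pure Python the interpreter overhead hides most cache effects, so the sketch shows the structure of the search rather than realistic timings.

```python
import random
import time


def blocked_matmul(a, b, n, nb):
    """Multiply two n x n matrices (lists of lists) using nb x nb tiles."""
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, nb):
        for kk in range(0, n, nb):
            for jj in range(0, n, nb):
                # Multiply one tile of A by one tile of B, accumulating into C.
                for i in range(ii, min(ii + nb, n)):
                    ci = c[i]
                    ai = a[i]
                    for k in range(kk, min(kk + nb, n)):
                        aik = ai[k]
                        bk = b[k]
                        for j in range(jj, min(jj + nb, n)):
                            ci[j] += aik * bk[j]
    return c


def search_block_size(n, candidates, repeats=3):
    """Time each candidate tile size; return (fastest_nb, {nb: seconds})."""
    rng = random.Random(0)  # fixed seed so every candidate sees the same data
    a = [[rng.random() for _ in range(n)] for _ in range(n)]
    b = [[rng.random() for _ in range(n)] for _ in range(n)]
    timings = {}
    for nb in candidates:
        best = float("inf")
        for _ in range(repeats):
            t0 = time.perf_counter()
            blocked_matmul(a, b, n, nb)
            best = min(best, time.perf_counter() - t0)
        timings[nb] = best  # keep the minimum to reduce timing noise
    fastest = min(timings, key=timings.get)
    return fastest, timings


if __name__ == "__main__":
    fastest, timings = search_block_size(64, [4, 8, 16, 32, 64])
    for nb, seconds in sorted(timings.items()):
        print(f"NB={nb:3d}  {seconds * 1e3:8.2f} ms")
    print("fastest tile size:", fastest)
```

The reading for this lecture asks whether a search of this kind is needed at all, or whether a model of the machine can predict good parameter values directly.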
Tasks
- Determine machine parameters (CPU type, CPU speed, CPU info such
as pipeline and functional units, memory, cache info)
- Time and instrument matrix multiplication code.
- Experiment with variants of matrix multiplication.
- Install ATLAS and MKL (compare to Numerical Recipes)
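The first three tasks can be started with a small timing harness. The sketch below is one possible setup, not a prescribed solution: `machine_info`, `matmul_ijk`, `matmul_ikj`, and `time_variant` are hypothetical names. The Python standard library exposes only a few machine parameters; cache sizes and pipeline details have to come from other sources (on Linux, for example, `/proc/cpuinfo` or `lscpu`). The MFLOPS figure assumes 2n^3 floating-point operations per n x n multiplication (n^3 multiplies and n^3 adds).

```python
import os
import platform
import random
import time


def machine_info():
    """Collect the machine parameters the standard library exposes."""
    return {
        "system": platform.system(),
        "machine": platform.machine(),
        "processor": platform.processor(),  # may be an empty string
        "logical_cpus": os.cpu_count(),
    }


def matmul_ijk(a, b, n):
    """Textbook loop order: inner loop walks a column of B."""
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(n):
                s += a[i][k] * b[k][j]
            c[i][j] = s
    return c


def matmul_ikj(a, b, n):
    """Interchanged loop order: inner loop walks rows of B and C."""
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        ci = c[i]
        for k in range(n):
            aik = a[i][k]
            bk = b[k]
            for j in range(n):
                ci[j] += aik * bk[j]
    return c


def time_variant(matmul, n, repeats=3):
    """Return (best seconds, MFLOPS) for one variant on random n x n inputs."""
    rng = random.Random(0)
    a = [[rng.random() for _ in range(n)] for _ in range(n)]
    b = [[rng.random() for _ in range(n)] for _ in range(n)]
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        matmul(a, b, n)
        best = min(best, time.perf_counter() - t0)
    flops = 2.0 * n ** 3  # n^3 multiplies plus n^3 adds
    return best, flops / best / 1e6


if __name__ == "__main__":
    for key, value in machine_info().items():
        print(f"{key}: {value}")
    for variant in (matmul_ijk, matmul_ikj):
        seconds, mflops = time_variant(variant, 100)
        print(f"{variant.__name__}: {seconds:.4f} s, {mflops:.1f} MFLOPS")
```

Taking the minimum over several repeats reduces noise from other processes. Running the same harness over a range of n shows how performance changes as the matrices outgrow each level of cache, which is clearer in a compiled language than in Python.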
Created: Sept. 27, 2006 by jjohnson@cs.drexel.edu