Assignment 1
CS 540 High Performance Computing
Instructor: Jeremy Johnson
Due date: (Wed. Oct. 29 by the start of class)
[Preliminary data should be obtained by class time on Oct. 22]
In this assignment you will empirically explore the performance of matrix multiplication.
You will start with the code from Numerical Recipes
(included here), explore several variants,
and finally compare against high-quality code from ATLAS or
Intel's MKL library.
Numerical Recipes
Here is the code from Numerical Recipes, along with a modified version that uses block matrix multiplication.
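For orientation, the idea behind block matrix multiplication can be sketched as follows. This is not the Numerical Recipes routine or the modified version distributed with the assignment; it is a minimal illustration that assumes square, row-major matrices stored in flat arrays. The function names and the tile-size parameter `bs` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Reference triple loop (ijk order): c = a * b for n x n row-major matrices. */
void matmul_naive(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

static size_t min_size(size_t x, size_t y) { return x < y ? x : y; }

/* Blocked (tiled) version: the same arithmetic, but performed one
 * bs x bs tile at a time so that the tiles in use can stay in cache.
 * bs is an illustrative tuning parameter, not a value from the assignment. */
void matmul_blocked(size_t n, size_t bs,
                    const double *a, const double *b, double *c)
{
    assert(bs > 0); /* a zero tile size would never advance the loops */

    for (size_t i = 0; i < n * n; i++)
        c[i] = 0.0;

    for (size_t ii = 0; ii < n; ii += bs) {
        size_t imax = min_size(ii + bs, n);
        for (size_t kk = 0; kk < n; kk += bs) {
            size_t kmax = min_size(kk + bs, n);
            for (size_t jj = 0; jj < n; jj += bs) {
                size_t jmax = min_size(jj + bs, n);
                /* Multiply one tile of a by one tile of b and
                 * accumulate the product into the matching tile of c. */
                for (size_t i = ii; i < imax; i++) {
                    for (size_t k = kk; k < kmax; k++) {
                        double aik = a[i * n + k];
                        for (size_t j = jj; j < jmax; j++)
                            c[i * n + j] += aik * b[k * n + j];
                    }
                }
            }
        }
    }
}
```

The `min_size` clamps handle matrix sizes that are not a multiple of the tile size. Comparing the output of `matmul_blocked` against `matmul_naive` on small inputs is one simple way to check that a variant is correct before timing it.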
Assignment Tasks
- Determine the architectural features of your computing platform (clock speed, machine type,
amount of memory, cache parameters, number of functional units, pipeline information).
You should use various microbenchmark programs to gather this information (e.g. those
that come with PAPI, such as mem_info, as well as MOB,
Calibrator,
lmbench,
and X-Ray).
You should confirm as much of the data as possible using CPUID or an architecture manual.
- Carefully time and plot the running time for the Numerical Recipes matrix multiplication routine
discussed in class. Present the data in terms of MFLOPS. Make sure to provide relevant compiler
information (type, version, flags). Compare optimized and unoptimized versions. Also measure
instructions, L1 data cache misses, and other interesting metrics available through PAPI.
Plot the normalized data (i.e. divided by the number of operations, as in MFLOPS) and compare to the peak MFLOPS
obtainable on your machine.
- Experiment with the different loop orders (i.e. ijk, ikj, jik, jki, kij, kji) and the blocked version.
You should measure and plot time, instructions, and other interesting metrics available through PAPI,
comparing them to the data in part 2.
- Install ATLAS or MKL on your computing platform, perform the same timings/measurements as you did for
Numerical Recipes, and compare. [For extra credit, install both and compare.]
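The timing and MFLOPS tasks above can be sketched as follows. This is a minimal illustration, not the required harness. It assumes square, row-major matrices in flat arrays, uses the conventional count of 2n^3 floating-point operations for an n x n product (n^3 multiplies plus n^3 adds), and measures CPU time with the standard `clock()` function, whose resolution is coarse, so small sizes need many repetitions. All function names are hypothetical, and only two of the six loop orders are shown.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

typedef void (*matmul_fn)(size_t n, const double *a, const double *b, double *c);

/* ijk order: the inner loop is a dot product of a row of a with a column of b. */
void matmul_ijk(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

/* ikj order: the inner loop walks rows of b and c with unit stride. */
void matmul_ikj(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n * n; i++)
        c[i] = 0.0;
    for (size_t i = 0; i < n; i++) {
        for (size_t k = 0; k < n; k++) {
            double aik = a[i * n + k];
            for (size_t j = 0; j < n; j++)
                c[i * n + j] += aik * b[k * n + j];
        }
    }
}

/* Conventional operation count for an n x n matrix product. */
double matmul_flops(size_t n)
{
    return 2.0 * (double)n * (double)n * (double)n;
}

/* Convert an operation count and a time in seconds to MFLOPS. */
double mflops(double flops, double seconds)
{
    return flops / (seconds * 1.0e6);
}

/* Run f reps times and return the average CPU seconds per call. */
double time_matmul(matmul_fn f, size_t n,
                   const double *a, const double *b, double *c, int reps)
{
    assert(reps > 0);
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        f(n, a, b, c);
    clock_t stop = clock();
    return (double)(stop - start) / CLOCKS_PER_SEC / reps;
}
```

A driver would allocate and fill the matrices, call `time_matmul` for each variant and size, and report `mflops(matmul_flops(n), seconds)`. If the measured time is zero, the repetition count is too small for the timer and should be raised before dividing. The same function-pointer interface could wrap a library routine so that every variant is timed the same way.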
Submission
You should prepare a summary report of your experiments and performance data. Make sure all relevant information (e.g. compiler
version and flags, machine type, memory, clock speed, cache info)
for each set of data is provided. Make sure all plots are labeled and easy to read. Raw data and all
source files (along with compilation instructions - i.e. a makefile) should be provided. The report should
be submitted as a pdf file. Make sure that any code that is timed is correct.
Students should submit their solution electronically using BbVista.
Submit a gzipped tar file, called A1.tar.gz (the tar file should contain a directory called A1 which
contains the files). The tar file should contain source code, instructions on how to run your programs,
sample input and output files, and a README file. The README file should describe all files that are
included, contain instructions on how to build and use the code, and outline how the code works.