Assignment 3

CS 540 High Performance Computing
Instructor: Jeremy Johnson 
Due date: (Wed. Dec. 9 at midnight)

In this assignment, you will vectorize and parallelize your matrix multiplication code from assignment 2.
  1. Vectorize your best mini MMM code using SSE2 (2-way vectorization). You may need to revisit your unrolled micro MMM to achieve appropriate vectorization. Measure and record the speedup over the unvectorized code.
  2. Use multithreading (either pthreads or OpenMP) to parallelize your MMM code from assignment 2. You should use the vectorized code from part 1 for the mini MMM. You should experiment with different scheduling strategies for performing the mini MMM block multiplications in parallel. You may wish to introduce another level of blocking; however, start by performing the mini MMM blocks in parallel. Compute and present the speedup for each strategy you tried. What was the best approach and why?
  3. Create a plot showing the MFLOPS of your best code for different matrix sizes. Note that for small sizes the sequential code may be best. Compare your best times with the original triply nested loop code. What was your overall improvement?
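
To make parts 1 and 2 concrete, the following is a minimal sketch, not a reference solution. It assumes double-precision, row-major, square matrices whose dimension is a multiple of the block size, and it uses OpenMP rather than pthreads. The names NB, mini_mmm, and blocked_mmm are hypothetical, and the block size of 4 is a placeholder rather than a tuned value. The scalar branch exists only so the sketch still compiles where SSE2 is unavailable.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics: a __m128d holds two doubles */
#endif

#define NB 4  /* hypothetical mini MMM block size; must be even */

/* C_blk += A_blk * B_blk for one NB x NB block.
   Each pointer addresses the top-left element of its block inside a
   row-major n x n matrix, so n is the row stride. */
static void mini_mmm(const double *A, const double *B, double *C, size_t n)
{
    for (size_t i = 0; i < NB; i++) {
        for (size_t j = 0; j < NB; j += 2) {
#if defined(__SSE2__)
            /* 2-way vectorization: update C[i][j] and C[i][j+1] together. */
            __m128d c = _mm_loadu_pd(&C[i * n + j]);
            for (size_t k = 0; k < NB; k++) {
                __m128d a = _mm_set1_pd(A[i * n + k]);    /* broadcast A[i][k] */
                __m128d b = _mm_loadu_pd(&B[k * n + j]);  /* B[k][j], B[k][j+1] */
                c = _mm_add_pd(c, _mm_mul_pd(a, b));
            }
            _mm_storeu_pd(&C[i * n + j], c);
#else
            /* Scalar fallback for targets without SSE2. */
            for (size_t k = 0; k < NB; k++) {
                C[i * n + j]     += A[i * n + k] * B[k * n + j];
                C[i * n + j + 1] += A[i * n + k] * B[k * n + j + 1];
            }
#endif
        }
    }
}

/* C += A * B for n x n matrices; assumes n is a multiple of NB.
   The (bi, bj) blocks of C are distributed across threads. The bk loop
   stays inside one thread, so no two threads write the same block. */
void blocked_mmm(const double *A, const double *B, double *C, size_t n)
{
    long nblocks = (long)(n / NB);
    #pragma omp parallel for collapse(2) schedule(static)
    for (long bi = 0; bi < nblocks; bi++) {
        for (long bj = 0; bj < nblocks; bj++) {
            size_t i0 = (size_t)bi * NB;
            size_t j0 = (size_t)bj * NB;
            for (size_t k0 = 0; k0 < n; k0 += NB) {
                mini_mmm(&A[i0 * n + k0], &B[k0 * n + j0], &C[i0 * n + j0], n);
            }
        }
    }
}
```

With gcc, the pragma takes effect only when compiling with -fopenmp; without that flag the same code runs sequentially, which gives a convenient baseline. Changing schedule(static) to schedule(dynamic) or schedule(guided) is one way to compare scheduling strategies. The unaligned loads and stores are used here so the sketch works with any allocation; aligned variants are something you may want to try once your data is 16-byte aligned.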

Submission

You should prepare a summary report of your experiments and performance data. Make sure all relevant information (e.g. compiler version and flags, and machine type, including memory, clock speed, and cache info) is provided for each set of data. Make sure all plots are labeled and easy to read. Raw data and all source files (along with compilation instructions, i.e. a makefile) should be provided. The report should be submitted as a PDF file. Make sure that any code that is timed is correct.
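
One way to check that timed code is correct and to turn measured times into MFLOPS is sketched below. It assumes double-precision, row-major n x n matrices and counts 2n^3 floating-point operations per multiplication (n^3 multiplies and n^3 adds). The function names naive_mmm, max_abs_diff, and mflops are hypothetical helpers, not part of any provided code.

```c
#include <stddef.h>

/* Reference triply nested loop: C = A * B for row-major n x n matrices. */
void naive_mmm(const double *A, const double *B, double *C, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++) {
                sum += A[i * n + k] * B[k * n + j];
            }
            C[i * n + j] = sum;
        }
    }
}

/* Largest absolute element-wise difference between two n x n matrices.
   Compare the result against a small tolerance rather than zero, since
   optimized code may sum in a different order than the reference. */
double max_abs_diff(const double *X, const double *Y, size_t n)
{
    double worst = 0.0;
    for (size_t i = 0; i < n * n; i++) {
        double d = X[i] - Y[i];
        if (d < 0.0) d = -d;
        if (d > worst) worst = d;
    }
    return worst;
}

/* MFLOPS for an n x n multiplication that took `seconds` of wall time. */
double mflops(size_t n, double seconds)
{
    double flops = 2.0 * (double)n * (double)n * (double)n;
    return flops / (seconds * 1.0e6);
}
```

Speedup is then the reference time divided by the optimized time for the same matrix size. For multithreaded runs, measure wall-clock time rather than CPU time, since CPU time summed over threads will hide the benefit of parallelism.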

Students should submit their solution electronically using BbVista. Submit a gzipped tar file called A3.tar.gz (the tar file should contain a directory called A1 which contains the files). The tar file should contain source code, instructions on how to run your programs, sample input and output files, and a README file. The README file should describe all files that are included, contain instructions on how to build and use the code, and outline how the code works.