- Vectorize your best mini MMM code using SSE2 (2-way vectorization). You
may need to revisit your unrolled micro MMM to achieve appropriate
vectorization. Measure and record the speedup.
- Use multithreading (either pthreads or OpenMP) to parallelize your MMM code
from Assignment 2. You should use the vectorized code from part 1 for the mini
MMM. You should experiment with different scheduling strategies for performing
the mini MMM block multiplications in parallel. You may wish to introduce
another level of blocking; however, start by performing the mini MMM blocks
in parallel. Compute and present the speedup for each of the strategies you
tried. What was the best approach and why?
- Create a plot showing the MFLOPS of your best code for different matrix
sizes. Note that for small sizes the sequential code may be best. Compare
your best times with those of the original triply nested loop code. What was your
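A minimal sketch of a 2-way SSE2 mini MMM, assuming double precision, row-major storage, an x86-64 target where `<emmintrin.h>` is available, and a hypothetical block size `NB` that is a multiple of 4. Each `__m128d` register holds two doubles, so the sketch keeps a 2x4 tile of C in four accumulators, broadcasts one element of A per row with `_mm_set1_pd`, and multiplies it against two consecutive elements of a row of B. The name `mini_mmm_sse2`, the tile shape, and `NB` are illustrative choices, not something the assignment prescribes.

```c
#include <emmintrin.h> /* SSE2: __m128d, _mm_loadu_pd, _mm_mul_pd, ... */
#include <stddef.h>

/* Hypothetical block size; assumed to be a multiple of 4. Tune per cache. */
#define NB 32

/*
 * mini_mmm_sse2: C[NB x NB] += A[NB x NB] * B[NB x NB]
 * A, B, and C point at row-major sub-blocks of larger matrices whose
 * row stride (leading dimension) is ld. The micro MMM is unrolled into
 * a 2x4 register tile: 2 rows of C, 4 columns = 2 vectors of 2 doubles.
 */
static void mini_mmm_sse2(const double *A, const double *B, double *C,
                          size_t ld)
{
    for (size_t i = 0; i < NB; i += 2) {
        for (size_t j = 0; j < NB; j += 4) {
            /* Load the 2x4 tile of C (unaligned loads work for any ld). */
            __m128d c00 = _mm_loadu_pd(&C[i * ld + j]);
            __m128d c01 = _mm_loadu_pd(&C[i * ld + j + 2]);
            __m128d c10 = _mm_loadu_pd(&C[(i + 1) * ld + j]);
            __m128d c11 = _mm_loadu_pd(&C[(i + 1) * ld + j + 2]);

            for (size_t k = 0; k < NB; k++) {
                /* B[k][j..j+1] and B[k][j+2..j+3]. */
                __m128d b0 = _mm_loadu_pd(&B[k * ld + j]);
                __m128d b1 = _mm_loadu_pd(&B[k * ld + j + 2]);
                /* Broadcast A[i][k] and A[i+1][k] into both lanes. */
                __m128d a0 = _mm_set1_pd(A[i * ld + k]);
                __m128d a1 = _mm_set1_pd(A[(i + 1) * ld + k]);

                c00 = _mm_add_pd(c00, _mm_mul_pd(a0, b0));
                c01 = _mm_add_pd(c01, _mm_mul_pd(a0, b1));
                c10 = _mm_add_pd(c10, _mm_mul_pd(a1, b0));
                c11 = _mm_add_pd(c11, _mm_mul_pd(a1, b1));
            }

            /* Write the updated tile back to memory. */
            _mm_storeu_pd(&C[i * ld + j], c00);
            _mm_storeu_pd(&C[i * ld + j + 2], c01);
            _mm_storeu_pd(&C[(i + 1) * ld + j], c10);
            _mm_storeu_pd(&C[(i + 1) * ld + j + 2], c11);
        }
    }
}
```

SSE2 is part of the x86-64 baseline, so `gcc -O2` needs no extra flag there (32-bit x86 would need `-msse2`). If the blocks are known to be 16-byte aligned, `_mm_load_pd`/`_mm_store_pd` can replace the unaligned variants; comparing against the scalar mini MMM on the same blocks gives the speedup to record.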
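One way to start the parallel version, sketched here with OpenMP under the assumption that `n` is a multiple of a hypothetical block size `BS`: distribute the (bi, bj) blocks of C across threads so that each C block has exactly one writer, and let that thread run the whole bk loop for it. Using `schedule(runtime)` means different scheduling strategies (static, dynamic, guided, with various chunk sizes) can be compared by setting the `OMP_SCHEDULE` environment variable instead of recompiling. The scalar `mini_mmm` and the names `mmm_parallel` and `BS` are placeholders; the vectorized kernel from part 1 would take the place of `mini_mmm`.

```c
#include <stddef.h>

/* Hypothetical block size; n is assumed to be a multiple of BS. */
#define BS 32

/* Scalar placeholder for the mini MMM: C[BS x BS] += A * B, stride ld. */
static void mini_mmm(const double *A, const double *B, double *C, size_t ld)
{
    for (size_t i = 0; i < BS; i++)
        for (size_t k = 0; k < BS; k++) {
            double a = A[i * ld + k];
            for (size_t j = 0; j < BS; j++)
                C[i * ld + j] += a * B[k * ld + j];
        }
}

/*
 * C += A * B for n x n row-major matrices, parallelized over C blocks.
 * collapse(2) turns the (bi, bj) loops into one iteration space of
 * nb*nb independent block tasks; no locking is needed because no two
 * iterations write the same C block.
 */
void mmm_parallel(const double *A, const double *B, double *C, size_t n)
{
    long nb = (long)(n / BS); /* number of blocks per dimension */

    #pragma omp parallel for collapse(2) schedule(runtime)
    for (long bi = 0; bi < nb; bi++)
        for (long bj = 0; bj < nb; bj++)
            for (long bk = 0; bk < nb; bk++)
                mini_mmm(&A[bi * BS * n + bk * BS],
                         &B[bk * BS * n + bj * BS],
                         &C[bi * BS * n + bj * BS], n);
}
```

Built with something like `gcc -O2 -fopenmp`, a run such as `OMP_SCHEDULE="dynamic,1" OMP_NUM_THREADS=4 ./mmm` (program name hypothetical) selects dynamic scheduling with one block task per chunk. Without `-fopenmp` the pragma is ignored and the same code runs sequentially, which is convenient for checking correctness against the triply nested loop.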
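For the MFLOPS plot, a common convention (assumed here) is to count 2n^3 floating-point operations for an n x n MMM, n^3 multiplies plus n^3 adds, regardless of blocking or vectorization, so every version is rated against the same amount of work. The helpers below have hypothetical names and use the POSIX `clock_gettime` with `CLOCK_MONOTONIC` for wall-clock time; this matters for multithreaded runs, because a CPU-time clock such as `clock()` accumulates the time of all threads and would hide the parallel speedup.

```c
#define _POSIX_C_SOURCE 199309L /* expose clock_gettime in <time.h> */
#include <stddef.h>
#include <time.h>

/* Wall-clock time in seconds from a monotonic clock. */
static double wall_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* MFLOPS for an n x n MMM: 2n^3 flops divided by elapsed microseconds. */
static double mmm_mflops(size_t n, double seconds)
{
    double flops = 2.0 * (double)n * (double)n * (double)n;
    return flops / (seconds * 1e6);
}
```

Timing the multiply between two `wall_seconds()` calls and passing the difference to `mmm_mflops` gives one point on the plot; repeating each size several times and keeping the best or median time makes the curve less noisy.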
You should prepare a summary report of your experiments and performance data. Make
sure all relevant information (e.g., compiler version and flags, and machine type,
including memory, clock speed, and cache info) is provided for each set of data. Make
sure all plots are labeled and easy to read. Raw data and all source files (along with
compilation instructions, i.e., a makefile) should be provided. The report should be
submitted as a PDF file. Make sure that any code that is timed is correct.
Students should submit their solution electronically using BbVista.
Submit a gzipped tar file called A3.tar.gz (the tar file should contain a directory called A3 that
contains the files). The tar file should contain source code, instructions on how to run your programs,
sample input and output files, and a README file. The README file should describe all files that are
included, contain instructions on how to build and use the code, and outline how the code works.