Lecture 8: Shared Memory Parallel Programming

Background Material



Lecture Slides



  1. Implement and time a parallel radix-2 divide-and-conquer WHT using Pthreads. Compute the speedup relative to your sequential code. What was the smallest input size for which you obtained a speedup?
  2. Implement and time multiple WHTs computed in parallel, i.e., I_M tensor W_N, using OpenMP.
  3. Implement and time a parallel WHT, i.e., W_MN = (W_M tensor I_N)(I_M tensor W_N), using OpenMP. Both the divide and the conquer steps should be parallelized. To improve performance by removing false sharing, use loop interleaving for (W_M tensor I_N).
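A minimal sketch for exercise 1, assuming an in-place transform on an array of doubles whose length n is a power of two. It uses the radix-2 split W_n = (W_2 tensor I_{n/2})(I_2 tensor W_{n/2}): the two half-size transforms in (I_2 tensor W_{n/2}) touch disjoint halves of the array, so one half is handed to a new thread while the calling thread transforms the other, and the butterflies of (W_2 tensor I_{n/2}) then combine them. The names wht_seq, wht_par and wht_task and the depth cutoff are hypothetical, and the combine step is left sequential for brevity.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Sequential radix-2 divide-and-conquer WHT, in place, n = 2^k:
   W_n = (W_2 tensor I_{n/2}) (I_2 tensor W_{n/2}). */
static void wht_seq(double *x, size_t n)
{
    if (n < 2)
        return;
    size_t h = n / 2;
    wht_seq(x, h);                      /* I_2 tensor W_{n/2}: each half */
    wht_seq(x + h, h);
    for (size_t i = 0; i < h; i++) {    /* W_2 tensor I_{n/2}: butterflies */
        double a = x[i], b = x[i + h];
        x[i] = a + b;
        x[i + h] = a - b;
    }
}

/* Arguments for a worker thread (hypothetical helper type). */
typedef struct {
    double *x;
    size_t n;
    int depth;
} wht_task;

static void *wht_par_worker(void *arg);

/* Parallel version: for the first `depth` recursion levels the second half
   goes to a new thread while the caller transforms the first half, so the
   leaves run on up to 2^depth threads. depth = 0 is purely sequential. */
void wht_par(double *x, size_t n, int depth)
{
    assert(n > 0 && (n & (n - 1)) == 0);    /* n must be a power of two */
    if (depth <= 0 || n < 2) {
        wht_seq(x, n);
        return;
    }
    size_t h = n / 2;
    pthread_t tid;
    wht_task right = { x + h, h, depth - 1 };   /* lives until the join */
    int spawned = pthread_create(&tid, NULL, wht_par_worker, &right) == 0;
    if (!spawned)
        wht_par(x + h, h, depth - 1);       /* thread creation failed: do it here */
    wht_par(x, h, depth - 1);
    if (spawned)
        pthread_join(tid, NULL);
    for (size_t i = 0; i < h; i++) {        /* sequential combine step */
        double a = x[i], b = x[i + h];
        x[i] = a + b;
        x[i + h] = a - b;
    }
}

static void *wht_par_worker(void *arg)
{
    wht_task *t = arg;
    wht_par(t->x, t->n, t->depth);
    return NULL;
}
```

Compile with gcc -O2 -pthread. One way to time it is to read clock_gettime(CLOCK_MONOTONIC, ...) before and after each call; wht_par(x, n, 0) runs entirely sequentially and can serve as the baseline when computing speedup.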
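For exercise 2, I_M tensor W_N applied to a vector of length M*N is simply M independent size-N WHTs on consecutive blocks of N elements. Because the blocks are disjoint, a single OpenMP parallel for over the blocks is enough. The sketch below assumes in-place transforms on doubles; wht_seq (an iterative WHT swapped in as the sequential kernel; the recursive code from exercise 1 would work equally) and wht_multi are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Sequential iterative in-place WHT of size n = 2^k (natural ordering). */
static void wht_seq(double *x, size_t n)
{
    for (size_t h = 1; h < n; h *= 2)
        for (size_t i = 0; i < n; i += 2 * h)
            for (size_t j = i; j < i + h; j++) {
                double a = x[j], b = x[j + h];
                x[j] = a + b;
                x[j + h] = a - b;
            }
}

/* In place: x <- (I_M tensor W_N) x, i.e. a size-N WHT on each of the M
   contiguous blocks. Iterations write disjoint blocks, so they are
   independent and can be split among threads. */
void wht_multi(double *x, size_t M, size_t N)
{
    assert(N > 0 && (N & (N - 1)) == 0);    /* N must be a power of two */
    #pragma omp parallel for schedule(static)
    for (long m = 0; m < (long)M; m++)
        wht_seq(x + (size_t)m * N, N);
}
```

Compile with gcc -O2 -fopenmp and set the thread count with the OMP_NUM_THREADS environment variable. Without -fopenmp the pragma is ignored and the same code runs sequentially, which gives a convenient baseline.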
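For exercise 3, view x as M rows of length N, with element (i, j) at x[i*N + j]. Then (I_M tensor W_N) transforms each contiguous row, and (W_M tensor I_N) applies a size-M WHT down every column at stride N. Giving each thread single columns (for example schedule(static,1) over j) would make neighbouring threads write adjacent doubles in the same cache line, which is false sharing. The sketch below shows one possible reading of loop interleaving, not necessarily the intended one: the column loop is moved innermost inside the W_M butterflies, and threads receive whole chunks of adjacent columns. The chunk width of 8 doubles assumes a 64-byte cache line and a 64-byte-aligned array; wht_seq, wht_2d and COLS_PER_CHUNK are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Sequential iterative in-place WHT of size n = 2^k (natural ordering). */
static void wht_seq(double *x, size_t n)
{
    for (size_t h = 1; h < n; h *= 2)
        for (size_t i = 0; i < n; i += 2 * h)
            for (size_t j = i; j < i + h; j++) {
                double a = x[j], b = x[j + h];
                x[j] = a + b;
                x[j + h] = a - b;
            }
}

/* Columns per chunk in the W_M tensor I_N step: 8 doubles = 64 bytes,
   ASSUMING a 64-byte cache line and a 64-byte-aligned x. */
#define COLS_PER_CHUNK 8

/* In place: x <- W_MN x = (W_M tensor I_N)(I_M tensor W_N) x, with x viewed
   as M rows of length N and element (i, j) stored at x[i*N + j]. */
void wht_2d(double *x, size_t M, size_t N)
{
    assert(M > 0 && (M & (M - 1)) == 0);
    assert(N > 0 && (N & (N - 1)) == 0);

    /* I_M tensor W_N: M independent size-N WHTs on contiguous rows. */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < (long)M; i++)
        wht_seq(x + (size_t)i * N, N);

    /* W_M tensor I_N: a size-M WHT down every column (stride N). Each
       iteration owns a chunk of adjacent columns [j0, j1), and the column
       loop is interleaved into the butterfly loops as the innermost,
       unit-stride loop, so threads write to separate (assumed) cache lines. */
    long nchunks = (long)((N + COLS_PER_CHUNK - 1) / COLS_PER_CHUNK);
    #pragma omp parallel for schedule(static)
    for (long c = 0; c < nchunks; c++) {
        size_t j0 = (size_t)c * COLS_PER_CHUNK;
        size_t j1 = j0 + COLS_PER_CHUNK < N ? j0 + COLS_PER_CHUNK : N;
        for (size_t h = 1; h < M; h *= 2)
            for (size_t i = 0; i < M; i += 2 * h)
                for (size_t k = i; k < i + h; k++) {
                    double *top = x + k * N;
                    double *bot = x + (k + h) * N;
                    for (size_t j = j0; j < j1; j++) {
                        double a = top[j], b = bot[j];
                        top[j] = a + b;
                        bot[j] = a - b;
                    }
                }
    }
}
```

Since W_MN = W_M tensor W_N, wht_2d(x, M, N) should agree with a sequential size-MN WHT for every split of MN into powers of two, which makes the sequential code a simple correctness check before timing different choices of M and N.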

Related Links and Info

  1. Multi-core page from Wikipedia
  2. Multi-core processors
  3. Xeon 5000 series: Intel quad-core (kodiak.cs.drexel.edu) information
  4. Intel Nehalem microarchitecture
  5. Hyper-threading page from Wikipedia
  6. Pthreads page from Wikipedia
  7. Pthreads tutorial
  8. openmp.org
  9. OpenMP specifications
  10. OpenMP tutorial
Created: Nov. 18, 2008 by jjohnson AT cs DOT drexel DOT edu