Professor Ernesto Gomez
home page: www.cse.csusb.edu/egomez/
Class: 12-2:50 PM JBH 113
text: Programming Massively Parallel Processors, 3rd edition, Kirk and Hwu, ISBN: 9780128119860, Publisher: Elsevier Science & Technology Books
Notes on MPI programming (and use together with CUDA) - text CH 18
MPI 1.1 standard
MPI 2 standard
Designing and Building Parallel Programs, Ian Foster (Full text online) - CH 8.
Numerical Recipes code for various numeric algorithms
SHARED MEMORY REFERENCES:
(SEE CLASS NOTES, BELOW)
This course will focus on programming distributed computer systems. The course will include theoretical models of distributed processing, their application to real systems, and programming paradigms and algorithms for distributed processing.
Specific objectives of the course are:
To study the main theoretical models of concurrent processing
To study the major programming paradigms of distributed processing
To learn how concurrent systems are programmed through some example algorithms, and program them using different models of parallelism
Other topics will be covered as resources and time permit; added topics may be explored depending on student interest.
This class will include advanced topics of current research that are not present in the text (or in any text). Where possible, additional references will be cited, and notes will be provided, but some topics will, of necessity, be covered only in class discussion. It is your responsibility to take notes on such topics.
ANNOUNCEMENTS: THE FINAL - due Friday, December 7, by email.
CLASS NOTES and REFERENCES
Visualization of parallel and distributed execution: PDF
Further notes on parallel visualization: (presentation at Baylor University, Spring 2015)
Barrier implementation: hypercube reduction/broadcast algorithm: sample code
MPICH manual: PostScript / PDF
1. Designing and Building Parallel Programs, Ian Foster (Full text online)
2. Harry F. Jordan and Gita Alaghband, "Fundamentals of Parallel Processing", Prentice Hall 2003
3. Hagit Attiya and Jennifer Welch, "Distributed Computing - Fundamentals, Simulations and Advanced Topics", McGraw-Hill 1998
4. J. Blazewicz, K. Ecker, B. Plateau and D. Trystram, Eds., "Handbook on Parallel and Distributed Processing", Springer 2000
5. Raymond Greenlaw, H. James Hoover and Walter L. Ruzzo, "Limits to Parallel Computation", Oxford University Press 1995
6. Andrew S. Tanenbaum and Maarten van Steen, "Distributed Systems - Principles and Paradigms", Prentice-Hall 2002
7. Krzysztof R. Apt and Ernst-Rüdiger Olderog, "Verification of Sequential and Concurrent Programs", Second Edition, Springer 1997
Parallel computing timeline (from Gregory V. Wilson, University of Toronto): Text
OpenMPI docs and downloads
PVM Home Page and downloads
CRPC - Rice University
Analysis of Algorithms
BSP home page
MIT parallel+distributed systems
MIT DSM download page
Week1 - introduction - read 1, 2, 3 in the distributed computing notes (Gomez)
Week2 - MPI and message passing - MPI standard 1.1, 18.3-18.6 text
Week3 - point-to-point messages, collective operations; semantics of messages, issues
with deadlock; efficiency and Amdahl's law in parallel computing, scalability.
(Some of this is in the notes on distributed computing; further references will be added.
A minimal send/receive sketch appears just after this schedule.)
Week4 - Shared memory
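For the Week3 material, here is a minimal C + MPI sketch (not from the course notes) of a two-rank exchange. The sends and receives are ordered by rank so that both processes cannot block in MPI_Send at the same time, which is the classic point-to-point deadlock; the buffer contents and message tag are arbitrary choices for the example.

    /* exchange.c - two-rank MPI exchange ordered to avoid deadlock.
       Build: mpicc exchange.c -o exchange   Run: mpirun -np 2 ./exchange */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int rank, other, sendval, recvval;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        other = 1 - rank;        /* partner rank: 0 <-> 1 */
        sendval = rank + 100;

        /* If both ranks called MPI_Send first, each could block waiting for
           the other to post a receive; ordering by rank prevents this. */
        if (rank == 0) {
            MPI_Send(&sendval, 1, MPI_INT, other, 0, MPI_COMM_WORLD);
            MPI_Recv(&recvval, 1, MPI_INT, other, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else {
            MPI_Recv(&recvval, 1, MPI_INT, other, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(&sendval, 1, MPI_INT, other, 0, MPI_COMM_WORLD);
        }
        printf("rank %d received %d\n", rank, recvval);
        fflush(stdout);          /* see the programming note below */
        MPI_Finalize();
        return 0;
    }

MPI also provides MPI_Sendrecv, which performs the paired send and receive in a single call and sidesteps the ordering question entirely.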
1) Write a program to find all prime numbers up to some value M, using the Sieve of Eratosthenes algorithm we discussed in class (look up details on the web; a serial starting-point sketch appears after this list).
You will be parallelizing this algorithm using MPI, OpenMP and CUDA (details will be discussed; Week 2 covers MPI, the others later).
Evaluate the difficulty of getting each version to work and how long the calculation takes under the different methods and on different compute platforms (a single 8-core machine, a set of machines on the network, a single GPU).
2) Same as above for a parallel optimization code: downhill simplex (see description of algorithm in link to "Numerical Recipes", above).
3) Alternative: write a program to model a cellular automaton (see links above on cellular automata). The grid should be at least 600x600, and the neighborhood around each cell should be variable, allowing a radius from 1 (a 3x3 square) to at least 3 (a 7x7 square centered on the cell). A sketch of the neighborhood computation appears after the notes link below.
Notes on parallelizing cellular automata
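For assignment 1), here is a minimal serial C sketch of the Sieve of Eratosthenes, intended only as a starting point before the MPI, OpenMP and CUDA versions; the default M and the output format are arbitrary choices.

    /* sieve.c - serial Sieve of Eratosthenes up to M (starting point only) */
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[])
    {
        long M = (argc > 1) ? atol(argv[1]) : 1000;  /* upper bound */
        char *composite = calloc(M + 1, 1);          /* 0 = possibly prime */
        long i, j, count = 0;

        for (i = 2; i * i <= M; i++)
            if (!composite[i])
                for (j = i * i; j <= M; j += i)      /* mark multiples of i */
                    composite[j] = 1;

        for (i = 2; i <= M; i++)
            if (!composite[i])
                count++;
        printf("%ld primes up to %ld\n", count, M);
        free(composite);
        return 0;
    }

One natural MPI decomposition gives each rank a block of the range and broadcasts each new sieving prime; in OpenMP the inner marking loop parallelizes directly.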
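For option 3), the variable-radius neighborhood is the main implementation wrinkle. Below is a hedged serial C sketch (not from the course materials) that counts live neighbors within radius r, simply ignoring cells that fall outside the grid; a torus wraparound is an equally common choice.

    /* neighbors.c - count live neighbors within a variable radius */
    #include <stdio.h>

    #define N 600                       /* grid size, the assignment minimum */

    static unsigned char grid[N][N];    /* 0 = dead, 1 = alive */

    /* Live cells within radius r of (x, y), excluding (x, y) itself;
       out-of-grid positions contribute nothing. */
    int neighbors(int x, int y, int r)
    {
        int dx, dy, count = 0;
        for (dx = -r; dx <= r; dx++)
            for (dy = -r; dy <= r; dy++) {
                int i = x + dx, j = y + dy;
                if ((dx || dy) && i >= 0 && i < N && j >= 0 && j < N)
                    count += grid[i][j];
            }
        return count;
    }

    int main(void)
    {
        grid[10][10] = grid[10][11] = grid[12][12] = 1;
        printf("r=1: %d  r=3: %d\n", neighbors(11, 11, 1), neighbors(11, 11, 3));
        return 0;
    }

For the parallel versions, a standard decomposition gives each process a strip of rows plus r "ghost rows" exchanged with neighboring processes each generation (see the parallelization notes linked above).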
Here is how to compile and link programs that combine MPI and CUDA code. In the example, the CUDA code is "cuda.cu" and the MPI code is "mpi.c"; the final executable is "cudaMPI".
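What follows is a minimal sketch of the standard build, compiling each part with its own compiler and linking against the CUDA runtime; the library path /usr/local/cuda/lib64 is an assumption and varies by installation.

    # compile the CUDA part with nvcc and the MPI part with mpicc
    nvcc -c cuda.cu -o cuda.o
    mpicc -c mpi.c -o mpi.o
    # link with mpicc, adding the CUDA runtime library
    mpicc mpi.o cuda.o -L/usr/local/cuda/lib64 -lcudart -o cudaMPI

Because nvcc compiles .cu files as C++, any function in cuda.cu that mpi.c calls should be declared extern "C" so the names match at link time.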
Programming note: if programming in C or C++, output from different processes may appear out of order, even if the processes are ordered, because standard output is buffered. When using printf to standard output, follow it with fflush(stdout); to force the buffer to empty. In C++, always terminate cout statements with endl, which flushes the stream.
Only do this if output order from multiple processes is important, because it slows things down.
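A minimal self-contained illustration of the note above (the message text is arbitrary):

    /* flushdemo.c - per-rank output with an explicit flush */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        printf("hello from rank %d\n", rank);
        fflush(stdout);   /* empty the stdout buffer now, so the line is not
                             held back or interleaved oddly with other ranks */
        MPI_Finalize();
        return 0;
    }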