Professor Ernesto Gomez
home page: www.cse.csusb.edu/egomez/
Class: 4-5:50 PM CE 113
text: L. Ridgway Scott, Terry Clark, Babak Bagheri, "Scientific Parallel Computing", Princeton 2005
This course will focus on programming distributed computer systems. The course will include theoretical models of distributed processing, their application to real systems, programming paradigms and algorithms for distributed processing.
Specific objectives of the course are:
To study the main theoretical models of distributed processing
To study the major programming paradigms of distributed processing
To learn how distributed systems are programmed through some examples of distributed algorithms
To learn how to program a distributed computer system in the message passing model, using MPI
Other topics will be covered as resources and time permit; added topics may be explored depending on student interest.
This class will include advanced topics of current research that are not present in the text (or in any text). Where possible, additional references will be cited, and notes will be provided, but some topics will, of necessity only be covered in class discussion. It is your responsibility to take notes of such topics.
You will be using MPI (MPICH v2), the NVIDIA compiler, PC (Parallel C, one of the Planguages – see text), and possibly other software depending on need and interest.
Your syllabus is this page plus the links:
(Tentative) Class Schedule/outline
Grading and assessment
If you are in need of an accommodation for a disability in order to participate in this class, please see the instructor and contact Services to Students with Disabilities at (909)537-5238.
ANNOUNCEMENTS: Class today, Monday June 13, will be in the regular classroom, CE 113. The FINAL is posted here and is due Friday June 17. The term project is also due June 17.
The latest version of PC is available at this link
CELLULAR AUTOMATA LINKS - more to follow. See TERM PROJECT link, below.
Conway's Game of Life
Some other cellular automata
I have placed files "profile" and "bashrc" in /home/CUDA4_SDK
If you are having trouble with environment or paths, try copying these files into your directory and renaming them to .bashrc and .profile - for example, "cp /home/CUDA4_SDK/profile ." followed by "mv profile .profile"
Here is how to compile and link programs that combine MPI and CUDA code. In the example, the CUDA code is "cuda.cu" and the MPI code is "mpi.c". The final executable is "cudaMPI".
$ nvcc -c cuda.cu // Makes cuda.o
$ mpicc -c mpi.c // Makes mpi.o
$ mpicc mpi.o cuda.o -o cudaMPI -lcudart -L /usr/local/cuda/lib64
==> makes cudaMPI. Execute with mpirun or mpiexec, like a standard MPI program.
With thanks to Ahmed Algadi, who figured out this sequence of instructions and in particular the library specifications.
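For concreteness, here is a rough sketch of what the two files might contain; the kernel, wrapper, and file contents below are illustrative, not the class example. Keeping an extern "C" wrapper around all CUDA calls inside cuda.cu means mpi.c needs no CUDA headers, so mpicc can compile and link it directly.

/* cuda.cu -- illustrative device code, compiled with: nvcc -c cuda.cu */
#include <cuda_runtime.h>

__global__ void scale(float *x, int n)            /* double each element */
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= 2.0f;
}

extern "C" void gpu_scale(float *host, int n)     /* C-callable wrapper */
{
    float *dev;
    cudaMalloc((void **)&dev, n * sizeof(float));
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);
    scale<<<(n + 255) / 256, 256>>>(dev, n);
    cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);
}

/* mpi.c -- illustrative host code, compiled with: mpicc -c mpi.c */
#include <mpi.h>
#include <stdio.h>

void gpu_scale(float *host, int n);               /* defined in cuda.cu */

int main(int argc, char **argv)
{
    int rank;
    float data[4] = {1, 2, 3, 4};
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    gpu_scale(data, 4);               /* each rank runs the kernel on its own copy */
    printf("rank %d: data[0] = %f\n", rank, data[0]);
    MPI_Finalize();
    return 0;
}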
As of January 31 all nodes on AKEK are up, NFS is restored and CUDA devices are accessible on all nodes.
You should have completed the MPI hello program, and you should be working on the Sieve of Eratosthenes: a) sequential, b) MPI, c) CUDA. (A minimal MPI hello sketch appears after the notes link below.) If you look at the Class Schedule/outline (preliminary) you will find we are more or less following it, and we are where we should be in weeks 3-4. The material we have covered/are covering is in chapters 1-5 of
Collected notes with index
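For reference, a minimal MPI hello program looks roughly like this (a sketch only):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */
    printf("hello from process %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}

Compile with "mpicc hello.c -o hello" and run with "mpirun -np 4 ./hello".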
Your accounts should be working on all nodes. Paths to the CUDA and MPI compilers are
given below. CUDA documentation is on the web (look at the CUDA SDK) but I will be creating
an accessible repository. Links to the MPI 1 and 2 standards are given below; even though
we are using MPI 2, the MPI 1 standard is more useful for the basic send and receive operations.
You need to add the following to your .bashrc file:
Use ONE of the following two paths for MPI
system MPI path:
--- export PATH="/usr/local/mvapich2/mvapich2-1.6/x86_64/gcc/bin/:$PATH"
standard MPI path:
--- export PATH="/home/mpich2/bin:$PATH"
These lines set the path to the CUDA tools, libraries, and MPICH binaries. They are also in the file /etc/akekpath on the master node, so you can copy them from there.
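For example (illustrative only – the CUDA lines must be copied from /etc/akekpath, not from here), the relevant part of a .bashrc using the system MPI path could look like:

export PATH="/usr/local/mvapich2/mvapich2-1.6/x86_64/gcc/bin/:$PATH"
# plus the CUDA PATH and library lines copied from /etc/akekpath on the master node

After logging in again (or running "source ~/.bashrc"), check that the compilers are found:

$ which mpicc
$ which nvcc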
You will be programming on akek.ias.csusb.edu. Akek is a 6-node cluster; each node has two 6-core CPUs, and nodes 1-5 each have 3 NVIDIA Tesla 2060 GPUs. Access is only by ssh to the head node; nodes are connected via two local private networks – Gigabit Ethernet and 20Gb InfiniBand.
CLASS NOTES and REFERENCES
Visualization of parallel and distributed execution: PDF
Further notes on parallel visualization (presentation at Baylor University, Spring 2015)
The Loop! A cycle in time without a message cycle between processes
SPMD execution, synchronization, fusions (old)
Loop transformations and message passing
Barrier implementation: hypercube reduction/broadcast algorithm : sample code
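As a rough illustration of the hypercube reduction/broadcast idea (this is not the posted sample code), a barrier over a power-of-two number of processes can be written as one pairwise exchange per hypercube dimension:

/* Sketch of a hypercube barrier; assumes the number of processes is a power of two. */
#include <mpi.h>

void hypercube_barrier(MPI_Comm comm)
{
    int rank, size;
    char s = 0, r;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);
    for (int bit = 1; bit < size; bit <<= 1) {
        int partner = rank ^ bit;          /* neighbor across this dimension */
        MPI_Sendrecv(&s, 1, MPI_CHAR, partner, 0,
                     &r, 1, MPI_CHAR, partner, 0,
                     comm, MPI_STATUS_IGNORE);
    }
    /* after log2(P) exchanges each process has (transitively) heard from every other */
}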
MPI 1.1 standard
MPI 2 standard
MPICH manual: Postscript / PDF
1. Ian Foster, "Designing and Building Parallel Programs" (full text online)
2. Harry F. Jordan and Gita Alaghband, "Fundamentals of Parallel Processing", Prentice Hall 2003
3. Hagit Attiya and Jennifer Welch, "Distributed Computing - Fundamentals, Simulations and Advanced Topics", McGraw-Hill 1998
4. J. Blazewicz, K. Ecker, B. Plateau and D. Trystram, Eds., "Handbook on Parallel and Distributed Processing", Springer 2000
5. Raymond Greenlaw, H. James Hoover and Walter L. Ruzzo, "Limits to Parallel Computation", Oxford University Press
6. Andrew S. Tanenbaum and Maarten van Steen, "Distributed Systems - Principles and Paradigms", Prentice-Hall 2002
7. Krzysztof R. Apt and Ernst-Rüdiger Olderog, "Verification of Sequential and Concurrent Programs", Second Edition, Springer 1997
Parallel computing timeline (from Gregory V. Wilson, University of Toronto): Text
PVM Home Page and downloads
CRPC - Rice University
Analysis of Algorithms
BSP home page
MIT parallel+distributed systems
MIT DSM download page
TERM PROJECT: Implement and test one of exercises 7, 9, 11 or 23 from Ian Foster (see recommended references above). You will implement using either MPI message passing or PC (see chapters 7 and 8 of the text).
Alternative - you may do a parallel implementation of a cellular automaton program, which should partition a square grid into P square subgrids for P processes and support von Neumann neighborhoods of r>4. You may also implement any of the projects on a GPU using CUDA. You may work individually or in teams of up to 3 people for the project.
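For the cellular automaton option, one possible way to set up the P-way partition is an MPI Cartesian grid with ghost (halo) borders of width r exchanged each generation. The outline below is only a sketch with illustrative names (periodic boundaries and the radius value are assumptions), not a required structure:

/* Sketch of the grid partitioning: P processes in a 2-D Cartesian grid,
   each owning a square block plus a ghost border of width R refreshed
   every generation. Column datatypes and corner handling are left out. */
#include <mpi.h>

#define R 5                      /* neighborhood radius, r > 4 per the assignment */

int main(int argc, char **argv)
{
    int p, dims[2] = {0, 0}, periods[2] = {1, 1};   /* assume a periodic (torus) grid */
    MPI_Comm grid;
    int up, down, left, right;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &p);
    MPI_Dims_create(p, 2, dims);                    /* e.g. P = 4 -> 2 x 2 */
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &grid);

    MPI_Cart_shift(grid, 0, 1, &up, &down);         /* row neighbors */
    MPI_Cart_shift(grid, 1, 1, &left, &right);      /* column neighbors */

    /* each generation: exchange R rows with up/down and R columns with
       left/right (e.g. MPI_Sendrecv), then update the local block */

    MPI_Finalize();
    return 0;
}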
Programming and theory problems - individual work. Do (2), (4) and (7).
(1) Write a program to determine, using MPI point-to-point messages:
-a) Latency for a minimal MPI message between 2 nodes
-b) Bandwidth as a function of increasing message size for MPI
All tests should be performed multiple times, and you should calculate the average and standard deviation. For part b) you should display results in tabular and/or graphic form.
You should test both the Ethernet and the InfiniBand networks on AKEK. Do not mix networks. The master node has a somewhat different installation than the others, so for consistency ssh into one of node1-node6 and use a pair of nodes that does not include the head node.
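One possible structure for the timing loop in problem (1) is a ping-pong between ranks 0 and 1 timed with MPI_Wtime; the message size and repetition count below are placeholders you would vary, and this is only a sketch, not a required solution:

/* Ping-pong timing sketch for problem (1); run with at least 2 processes. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, reps = 1000, n = 1;        /* n = message size in bytes (placeholder) */
    char *buf;
    double t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    buf = malloc(n);

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (int i = 0; i < reps; i++) {
        if (rank == 0) {
            MPI_Send(buf, n, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, n, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, n, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, n, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();
    if (rank == 0)                        /* one-way time = round trip / 2 */
        printf("size %d bytes: %g s per one-way message\n", n, (t1 - t0) / (2.0 * reps));

    free(buf);
    MPI_Finalize();
    return 0;
}

Repeat the whole measurement several times to compute the average and standard deviation, and sweep n over increasing sizes for part b).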
There is no specific date for finishing each assignment, but all work must be turned in on or before the last day of the term. Late work may result in an incomplete so that I have time to review it. Deliverables for programs must include source code. In all cases, you should also turn in a written report as a typed PDF file describing what experiments you did and showing your results (in the case of programs), or solving the given problem (for theory).
Programming note: if programming in MPI and C, or in PC, printf output from different processes may appear out of order, even if the processes are ordered, due to buffered standard output. When using printf to standard output, follow it with "fflush(stdout);" to force the buffer to empty. Only do this if output order from multiple processes is important, because it slows things down.
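For example:

printf("process %d finished phase 1\n", rank);
fflush(stdout);   /* push this line out before other processes print */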