MATLAB: Case Studies
Overview
Teaching: 30 min
Exercises: 10 minQuestions
What kind of MATLAB code can be paralellized?
Objectives
Illustrate differences between code that does and does not parallelize well.
Practice parallelizing code.
Example MATLAB parfor
Monte Carlo Illustration
It is easier to learn how to program by working through real examples.
Let us look at a typical example of a Monte Carlo process - computing the value of pi.
First, to understand what this example does let’s run code/montecarlo/montecarlo_illustration.m. This code display a quarter of a circle. Random points are selected and determined to be inside or outside the circle. The ratio of points inside vs outside the circle is proportional to pi. So as the random selection of points grows, we can approximate the value of pi.
Monte Carlo Parallel
The montecarlo_illustration.m code only shows a few points. Let’s now move on to code/montecarlo/montecarlo.m. This code is similar, except it generates many more points, and for speed it no longer displays a visual image.
How could we improve the performance of this code?
Can this code be parallelized? We only have one loop that could be a candidate for parfor
. Each iteration of the loop has its own temporary x
,y
values. The only shared variable is incircle
, which satisfies classification as a reduction variable, and the loop variable counts consecutive integers. So each iteration of the loop is in indeed independent. The code even has tic/toc commands already, so we are all set to convert this to a parallel loop.
Go ahead and rename the for
statement to a parfor
statement. Now let’s see whether it goes any faster. But first, start a MATLAB parallel pool. You may do so using the parpool command you saw previously, or using MATLAB’s parallel pool menu in the bottom left corner. MATLAB is usually configured to automatically start a parallel pool when needed. However, our pi calculation doesn’t take very long unlike a full-scale problem where the parallel pool startup time would not be significant. So let’s avoid skewing the timing results in our small example and exclude timing the parallel pool startup.
Once the pool is started, run your parallel code. What speedup did you obtain?
Limits of parfor
It is important to reiterate that not all MATLAB for
loops can be converted to parfor
.
For example, the prefix sum problem computes the sums of elements up to a particular location in a vector, for every location in the vector. In MATLAB this is called the cumsum
function. Our own code to compute this might look like code/small/prefixsum.m
There is an dependency for the current prefix sum element on the previous one. This stops us from being able to use a parfor. When working on your own projects, it is worthwhile to remember that not all MATLAB loops can be parallelized with parfor
.
Parallel MATLAB performance
Here is another example of a parallel loop. This fast parallel code obtains appreciable speedup vs the original sequential code:
n = 200;
A = 500;
a = zeros(n);
parfor i = 1:n
a(i) = max(abs(eig(rand(A))));
end
On the other hand, this slow parallel code does not run faster than the sequential code:
n = 1024;
A = zeros(n);
parfor (i = 1:n)
A(i,:) = (1:n) .* sin(i*2*pi/1024);
end
The first loop performs a longer computation within each loop iteration, and only returns a single value from the workers back to the main MATLAB process. On the other hand, the second loop takes extra time because it shares more data with the workers. This is an example of how communication overhead can adversely affect performance of parallel code.
For reference: https://www.mathworks.com/help/distcomp/decide-when-to-use-parfor.html
Fixing code for MATLAB parfor
Let’s look at code that computes and then draws the Mandelbrot set. The Mandelbrot set is an example of a fractal, where a rough-edged geometric pattern repeats indefinitely as you zoom in. It is fun to look at, so it makes a good example.
The code is available at code/mandelbrot/mandelbrot_fixthis.m. If you try running this now, you will see that it takes a while to compute all the points in a particular zoomed image on the Mandelbrot set. The set is then displayed, along with the time to compute. This code can be parallelized with a parfor, but a few fixes will need to be made. Can you figure out how to make a parfor
version of mandelbrot.m?
This example is based on code from the Mathworks website
Key Points
Monte Carlo simulation is an example that obtain good speedup in parallel.
Prefix Sum is an example that cannot be parallelized using parfor.
Some code can be parallelized, but speedup may be greatly affected by communication overhead.