MATLAB: Best Practices
Overview
Teaching: 15 min
Exercises: 0 minQuestions
What guidelines are there for improving MATLAB code performance?
Objectives
Review best practices for MATLAB performance beyond just writing parallel code.
Overview
The emphasis in this workshop is on parallel computing. Parallelism is great for using all your processor cores. With parallelism, you can speed up already “tuned” code to go beyond what you can do on your laptop. The point of parallel computing is to improve your code’s performance. Because clock speed is not going up (i.e. CPU cores are not getting faster), we need to make efficient use of multicore hardware to achieve performance gains.
However, it is also worth noting tricks that can speed up your code before parallelizing. This includes some tips from Mathworks to improve MATLAB code performance. These techniques are preallocation, vectorization, parallelization, and loop constants.
Preallocation
In MATLAB, variables do not have to be declared. If you are saving into a matrix’s row or column that doesn’t yet exist, then MATLAB will grow the size of the matrix. Under the hood, MATLAB actually has to create a new matrix in RAM and copy the data to it from the previous, smaller matrix. If a loop writes data to a new row each iteration, then this matrix resize is repeated many times, and can greatly effect performance.
You can avoid this performance problem by first creating an empty matrix large enough for all your data - this is called preallocation. For example, let’s run this code code/small/bestpractice_preallocate.m:
N = 2E5;
clear A;
tic;
for i = 1:N
A(i,:) = rand(1,4);
end
fprintf('Time without preallocation: %f seconds\n', toc);
Time without preallocation: 12.282884 seconds
N = 2E5;
clear A;
tic;
A = zeros(N, 4);
for i = 1:N
A(i,:) = rand(1,4);
end
fprintf('Time with preallocation: %f seconds\n', toc);
Time with preallocation: 0.166861 seconds
For reference, see https://www.mathworks.com/help/matlab/matlab_prog/preallocating-arrays.html
Loop Constants
The idea with this optimization is simple. If you are computing the same value for a variable every time through a loop, then you are repeatedly calculating the same thing. Instead, if you can just calculate the value once outside the loop, then you will save some time.
tic;
for ii = 1:100
A = eig(magic(2000));
% Let's pretend there is more code here...
end
toc;
Elapsed time is 43.479244 seconds.
tic;
A = eig(magic(2000));
for ii = 1:100
% Let's pretend there is more code here...
end
toc;
Elapsed time is 2.116375 seconds
Vectorization
As previously discussed, vectorized operations perform computation on each element in a matrix. For example, the .*
element-wise operator will multiply each element of an array with the matching element. Using vectorization where possible can help to speed up your code.
For reference, see https://www.mathworks.com/help/matlab/matlab_prog/vectorization.html
Parallelization
As we have seen with parfor, MATLAB has commands that allow you to explicitly break apart work into separate pieces to run at once. MATLAB has other advanced parallel programming constructs. For example, parfeval calls a function in parallel across multiple inputs. Also, the spmd command runs one task for each worker and allows you to explicitly exchange data between workers. These additional MATLAB parallelization techniques are out of scope of this workshop and will not be discussed in detail.
MATLAB MEX Compilation
The MATLAB Coder is able to convert MATLAB code from .m
files to .mex
. The .mex
file is a compiled version that has the potential to run faster than the original MATLAB source.
There are some restrictions when using MATLAB Coder. The MATLAB .m
must be a function file, there are certain functions and toolboxes that cannot be used, you must specify data types and usually also fixed sizes for all arguments passed to your MATLAB function, and the resulting .mex
file targets the specific computer and operating system on which it was created.
Let us return to the PI calculation code code/montecarlo/mexexample/montecarlo.m. The easiest way to compile this code to a MEX file is using the graphical MATLAB Coder app, which will walk you through the process of creating a MEX file. The other way is using the MATLAB codegen
command. The codegen
command is simple in this case, since our montecarlo function does not have any inputs or outputs:
codegen montecarlo
And when comparing the performance of the original montecarlo to the montecarlo_mex, we obtain this speedup:
Column-Major Memory Access
MATLAB stores matrices in column-major format. This means a matrix such as the one below will be stored in RAM in this order: 1, 4, 7, 2, 5, 8, 3, 6, 9.
ans =
1 2 3
4 5 6
7 8 9
You may recall that items close to other recently accessed items are cached in the processor cores to speed up memory access. So by accessing matrix elements by column for large matrices, you will use the fast cache more often. On the other hand, if you access elements by row, you may need to wait for them to be fetched from RAM for each element.
So you should work with matrix data by column instead of by row to improve memory performance.
For example, the code code/imagefilter/processimage.m is faster than code/imagefilter/processimage_rowmajor.m.
Key Points
Preallocating variables, moving constants out of loops, compilation, and memory access patterns are other MATLAB performance optimization techniques.