Monitoring resources
Overview
Teaching: 30 min
Exercises: 0 min
Questions
What hardware resources are being used?
Objectives
Understand the computing concept of a process
Differentiate types of memory use
Understand used and available memory.
Display disk space usage
Examine how many CPU cores are being used
After you are able to connect to a server and run programs, it is often useful to know how much of the server your programs use. Servers are often shared, so it is also useful to see what is available on the server. There are three types of server hardware resources to consider: disk space, memory, and CPU.
Let’s start by looking at the top command. This is an interactive command that shows what is currently running on your server, as well as how much memory and CPU are being used.
From your server shell prompt, start the top program:
$ top
top - 14:14:43 up 34 days, 22:24, 2 users, load average: 0.00, 0.06, 0.05
Tasks: 133 total, 1 running, 132 sleeping, 0 stopped, 0 zombie
%Cpu(s): 49.9 us, 0.0 sy, 0.0 ni, 50.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.1 st
KiB Mem : 7980368 total, 216248 free, 1120544 used, 6643576 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 6487508 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
23097 jane 20 0 1071500 1.001g 1444 S 200.0 13.2 0:14.43 run_simulation
23100 jane 20 0 40524 3660 3044 R 0.3 0.0 0:00.01 top
1 root 20 0 37764 5736 3908 S 0.0 0.1 0:25.26 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.02 kthreadd
3 root 20 0 0 0 0 S 0.0 0.0 0:00.16 ksoftirqd/0
5 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/0:0H
At any time you may quit top by typing q.
Process
We have seen how you can run commands in the Linux shell. These commands are software programs, just like your web browser or word processor. A running instance of a program is represented in Linux as a process.
In the top command’s output, we can see a list of processes in the bottom section of the display.
Each process has a unique pid (process identifier). We will occasionally need to use these numbers in other shell commands.
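As a small illustration of using a PID in another command, the sketch below looks up the PID of the current shell and checks that the process exists. The variable name my_pid is only chosen for this example.

```shell
# $$ expands to the PID of the shell that is running this code.
my_pid=$$
echo "This shell is process $my_pid"

# kill -0 sends no signal; it only checks that the process exists
# and that you are allowed to signal it.
if kill -0 "$my_pid" 2>/dev/null; then
    echo "Process $my_pid is running"
fi
```

The same check works with any PID you read from the top display, as long as the process belongs to you.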
CPU Usage
The system load is a measure of how many processes are running and how computationally intensive those tasks are. The amount of processing power you have will depend on the type of processor in the machine you are using. In all likelihood your computer has a multicore CPU. These factors are important in determining the current load on a given machine and whether you are able to take advantage of any available resources. To get a summary of the CPUs available on your current machine, use the command lscpu.
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 4
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 60
Model name: Intel Core Processor (Haswell, no TSX)
Stepping: 1
CPU MHz: 2294.470
BogoMIPS: 4588.94
Virtualization: VT-x
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 4096K
NUMA node0 CPU(s): 0-3
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl eagerfpu pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm tpr_shadow vnmi flexpriority ept fsgsbase bmi1 avx2 smep bmi2 erms invpcid xsaveopt
This summary gives us more information than we need at this point, but importantly it tells us how many CPU cores there are and their speed. In this example we see the machine has 4 available CPUs, each of which runs at roughly 2.3 GHz. Looking back to the top output, you will see a line in the summary for ‘%Cpu(s)’. This indicates the average CPU use over the last few seconds. It is the percentage used of all CPUs combined, so it will range from 0% to 100%. One of the numbers is ‘id’, which means idle. This percentage, taken as a fraction and multiplied by the number of cores, indicates how many CPU cores are available if you were about to start a program. In the output above, 50.0% idle on a 4-core machine means about 2 cores are free.
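The idle-core arithmetic can be sketched as a small shell function. The name idle_cores is made up for this example, and the inputs are the ‘id’ value and CPU count from the sample output above.

```shell
# idle_cores PCT NCORES: convert top's overall idle percentage
# into an approximate number of idle cores.
idle_cores() {
    awk -v pct="$1" -v n="$2" 'BEGIN { printf "%.1f\n", pct / 100 * n }'
}

# Values from the example output: 50.0 id on a machine with 4 CPUs.
idle_cores 50.0 4
```

On a live server you would replace the two numbers with the ‘id’ value you read from top and the ‘CPU(s)’ value reported by lscpu.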
Now let’s run the run_simulation example program to see how to monitor an individual process’s CPU use.
$ ./run_simulation -c 2
In top’s listing of processes, you should see a run_simulation process that is using 200% CPU. For processes, the %CPU is the percentage of a single core being used. So 200% indicates that roughly 2 cores are being used, on average.
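To turn per-process %CPU values into core counts without reading the screen by eye, a sketch like the following may help. The function name busy_procs is hypothetical, and it assumes top’s default column layout, where %CPU is column 9 and the command name is column 12.

```shell
# busy_procs MIN: read top's process rows on stdin and report every process
# whose %CPU (column 9) is at least MIN, expressed as a number of cores.
# Rows that do not start with a numeric PID (headers, summary) are skipped.
busy_procs() {
    awk -v min="$1" '$1 ~ /^[0-9]+$/ && $9 + 0 >= min + 0 {
        printf "%s uses about %.1f cores\n", $12, $9 / 100
    }'
}

# Two rows copied from the top output above.
printf '%s\n' \
    '23097 jane 20 0 1071500 1.001g 1444 S 200.0 13.2 0:14.43 run_simulation' \
    '23100 jane 20 0 40524 3660 3044 R 0.3 0.0 0:00.01 top' | busy_procs 100
```

With real data, you could feed it one non-interactive snapshot from top, for example `top -b -n 1 | busy_procs 100`, where -b selects batch mode and -n 1 stops after a single update.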
Memory
Another major aspect to consider when monitoring the load of a given computer is the amount of used and free memory. Memory, or RAM, is where processes and their data are kept while running. Every process uses a variable amount of memory and can only function efficiently when there is enough free memory for it to use. Besides the real memory, the computer can also use virtual memory, a combination of real memory and disk space. The portion of virtual memory that uses the disk is known as swap space. When the computer has to resort to using the hard drive in place of memory, the performance of the system drops considerably because the disk is orders of magnitude slower than memory. For this reason, monitoring your memory usage (and ensuring you do not run out) is important.
To see how much memory is available, we can return to the top program’s display. The ‘KiB Mem’ and ‘KiB Swap’ lines describe memory use. You can see how much memory you have in the ‘total’ number. The ‘avail Mem’ number shows how much memory, in KiB, is available to processes. This is not the same as the ‘free’ number, which leaves out memory that Linux is using for buffers and cache (‘buff/cache’) but can hand back to processes when they need it.
The same information about memory can be seen by running this shell command:
$ free -h
This free command gives a quick summary of the free and used memory across your VM. Including the option -h displays the output using human-readable units.
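If you want these numbers in a script, Linux also exposes them in the file /proc/meminfo. The sketch below parses that layout; the helper name mem_field is made up for this example, and the sample text is built from the totals in the top output above instead of being read from a real server.

```shell
# mem_field NAME: print the value (in KiB) of one field read from stdin,
# in the "Name:   value kB" layout used by /proc/meminfo.
mem_field() {
    awk -v name="$1" '$1 == name ":" { print $2 }'
}

# Sample data using the numbers from the top output above.
sample='MemTotal:        7980368 kB
MemFree:          216248 kB
MemAvailable:    6487508 kB'

total=$(printf '%s\n' "$sample" | mem_field MemTotal)
avail=$(printf '%s\n' "$sample" | mem_field MemAvailable)

# Report available memory as a share of the total.
awk -v a="$avail" -v t="$total" \
    'BEGIN { printf "%.1f%% of memory is available\n", a / t * 100 }'
```

On a real server you would read the file directly, for example `mem_field MemAvailable < /proc/meminfo`.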
Back in the top display, we see a ‘%MEM’ column for each process. To try it, run this example command:
$ ./run_simulation -m 1000
It might be easier to see the run_simulation process if we sort by memory used. Do this by pressing M. You can go back to the default CPU sort by pressing P.
Disk Space
The last resource that is important to monitor is the amount of free disk space. In scientific computing, you may have a large amount of input data that needs to be processed by a series of programs, each of which produces a significant amount of output data. On top of this, there may be several other people using the machine and its disk for similar tasks. Completely running out of disk space is probably the most catastrophic scenario compared to running out of the other resources mentioned. This is because the operating system also uses the hard disk frequently to read and write files of all kinds (temporary files, log files, etc.). When the disk becomes completely full, these basic operating system tasks cannot be done and the whole system will grind to a halt. This can also complicate rebooting the system. For these reasons it is important to keep an eye on the amount of disk space being used at all times.
The disk free command, df, gives a quick usage summary of the available disk storage. You should intervene if you notice any listed volume approaching 100% usage.
$ df -h ~
Filesystem Size Used Avail Use% Mounted on
/dev/vda1 78G 25G 53G 32% /
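If you would rather have a script watch for volumes that are filling up, the check can be sketched as below. The function name full_volumes and the 30% limit are hypothetical, and the parsing assumes mount points contain no spaces.

```shell
# full_volumes LIMIT: read df output on stdin and print the mount point and
# Use% of every volume at or above LIMIT percent. The header row is skipped.
full_volumes() {
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%/, "", use)
        if (use + 0 >= limit + 0) print $6, $5
    }'
}

# The df output shown above, checked against a 30% limit for illustration.
printf '%s\n' \
    'Filesystem Size Used Avail Use% Mounted on' \
    '/dev/vda1 78G 25G 53G 32% /' | full_volumes 30
```

With real data you might run `df -P | full_volumes 90`, where -P asks df for the POSIX output format so that each file system stays on one line.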
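Additionally, the disk usage command, du, will tell you how much disk space a directory is using. The -s option prints a single total for the directory, and -x skips directories on other file systems: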
$ du -hsx ~
1.9G /home/jane
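When a directory turns out to be large, a common next step is to find which subdirectory is responsible. The following sketch builds a throwaway directory (the names small and big are invented for the demonstration) and ranks its contents by size.

```shell
# Build a throwaway directory so the example is safe to run anywhere.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/small" "$demo_dir/big"
head -c 4096 /dev/urandom > "$demo_dir/small/data"     # 4 KiB of data
head -c 2097152 /dev/urandom > "$demo_dir/big/data"    # 2 MiB of data

# -s: one total per argument, -k: sizes in KiB; sort largest first.
report=$(du -sk "$demo_dir"/* | sort -n -r)
printf '%s\n' "$report"

# du separates size and path with a tab, so field 2 of the first line
# is the largest subdirectory.
largest=$(printf '%s\n' "$report" | head -n 1 | cut -f 2)
echo "Largest: ${largest##*/}"

# Remove the demonstration files again.
rm -rf "$demo_dir"
```

To inspect your own files, the same pipeline can be pointed at a real directory, for example `du -sk ~/* | sort -n -r | head`.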
To see the disk usage of a particular file, additional options to ls will help. The -l option provides a long format listing and the -h option again provides disk space in human-readable units:
$ ls -lh run_simulation
-rwxrwxr-x 1 ryantaylor ryantaylor 14K Aug 30 15:52 run_simulation
This form of ls provides more information than we will address here. We are looking only for the size of the file, which is just before the date: 14 kilobytes in this example.
Playing nicely with other users
You may want to run a program that takes a long time and uses most of the CPU cores in your server. If other users may need to run smaller programs at the same time, it is polite to lower your program’s priority. This will cause your program to slow down a bit more than the other users’ programs.
$ nice ./run_simulation -c 4
The nice command will lower a process’s priority by increasing its nice value. You can see each process’s nice value in the ‘NI’ column of top.
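To see the effect of nice on the nice value itself, you can ask nice to report it. This sketch assumes the GNU coreutils version of nice, which prints the current nice value when run without a command. The adjustment of 5 is an arbitrary choice for the example; without -n, nice adds 10.

```shell
# With no arguments, GNU nice prints the current nice value.
base=$(nice)

# Run the same query again, this time with the nice value raised by 5.
lowered=$(nice -n 5 nice)

echo "Nice value here: $base, under 'nice -n 5': $lowered"
```

In a typical login shell the first number is 0 and the second is 5. The highest nice value (lowest priority) is 19.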
Process Status command
The ps command will show you a list of your processes, similar to the process list in the top command’s display:
$ ps u
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
jane 22825 0.2 0.0 21396 5248 pts/0 Ss 14:06 0:00 -bash
jane 22875 0.2 0.0 21468 5436 pts/1 Ss 14:06 0:00 -bash
jane 22920 176 13.1 1071500 1050084 pts/1 Sl+ 14:07 0:14 ./run_simulation -c 2 -m 1024
jane 22926 0.0 0.0 36088 3316 pts/0 R+ 14:07 0:00 ps u
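Because ps prints plain text, its output can be passed to other tools. As a rough equivalent of pressing M in top, the sketch below ranks rows by the %MEM column. The function name top_by_mem is made up, and it assumes the column order shown above (PID in column 2, %MEM in column 4, command in column 11).

```shell
# top_by_mem N: read `ps u` style output on stdin and print the PID, %MEM
# and command of the N processes using the most memory.
top_by_mem() {
    awk 'NR > 1' | sort -k4,4nr | head -n "$1" | awk '{ print $2, $4, $11 }'
}

# Sample rows copied from the listing above.
printf '%s\n' \
    'USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND' \
    'jane 22825 0.2 0.0 21396 5248 pts/0 Ss 14:06 0:00 -bash' \
    'jane 22920 176 13.1 1071500 1050084 pts/1 Sl+ 14:07 0:14 ./run_simulation -c 2 -m 1024' \
    'jane 22926 0.0 0.0 36088 3316 pts/0 R+ 14:07 0:00 ps u' | top_by_mem 1
```

On a live system you would pipe the real command into it, for example `ps u | top_by_mem 3`.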
Running out of memory?
You are concerned that some running processes are taking up too much memory and it will affect your ability to run your program. What tool(s) can you use to assess the situation?
Solution
top will give you a summary of all of the processes currently running, along with the amount of memory currently being used by each process.
free will give you a summary of the current memory use and how much memory is available.
Checking disk space
Which tool would you use to determine how much free space is available on the local hard disk?
ps
free
du
df
Solution
- No: ps, aka ‘process status’, gives you information about running processes, not disk space.
- No: free displays the amount of free and used memory (not disk space) in the system.
- No: du gives an estimated file size for a specified file or folder; it will not tell you how much space is remaining.
- Yes: df will report the total used and available space (among other things) on every disk in the machine.
Key Points
The top command shows an interactive display of hardware resources.
The free command shows memory use.
The df command shows available disk space.
The nice command changes a process’s priority.
The ps command shows a list of processes.