This content is submitted by a BigAdmin user. It has not been reviewed for technical accuracy by Sun Microsystems, though it may have been lightly edited to improve readability. If you find an error or would like to comment on the article, please contact the submitter or use the comment field at the bottom of the article.
Automating a System Performance Check Using the checkperf Utility
This tech tip provides two scripts: checkperf and runqueue.d. The
checkperf script works on systems that run the Solaris 9 or Solaris 10 Operating System. The
runqueue.d script works on systems that run the Solaris 10 OS.
The checkperf utility checks system performance in terms of CPU, memory, I/O, and network TCP.
The default warning threshold for each of these items can be changed. Whenever one of the thresholds is reached,
checkperf sends a warning email to a specified recipient. The email might
include suggestions about how to improve system performance.
You can schedule checkperf with cron so that you don't have to
go to each server to check its system performance manually. checkperf can be scheduled to run during business hours because it has a negligible effect on system performance. By default, it uses sar to collect statistics every 5 seconds for 5 minutes.
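As a minimal sketch of the cron scheduling described above, the following builds a crontab entry that runs checkperf at the top of every hour from 09:00 to 17:00, Monday through Friday. The install path and the schedule are assumptions for illustration, not values taken from checkperf itself.

```shell
# Hypothetical install location; substitute the directory that holds checkperf.
CHECKPERF=/home/username/bin/checkperf

# crontab fields: minute hour day-of-month month day-of-week command
# "0 9-17 * * 1-5" = on the hour, 09:00 through 17:00, Monday through Friday
CRON_LINE="0 9-17 * * 1-5 $CHECKPERF"

# Print the entry so that it can be pasted in with "crontab -e".
printf '%s\n' "$CRON_LINE"
```

Because each default run collects statistics for about 5 minutes, an hourly schedule leaves plenty of room between runs.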
The minimum interval at which sar can collect statistics is one second. If a system has many processes that each take only a couple of milliseconds to run,
sar will not know that they are in the run queue. Therefore, if DTrace is installed on the system
(for example, if the system runs the Solaris 10 OS), checkperf calls runqueue.d,
which collects run queue information every millisecond for 30 seconds.
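To make the difference in sampling resolution concrete, the arithmetic below compares how many samples each collector takes with the defaults quoted above. The variable names are illustrative only and do not come from checkperf.

```shell
# sar defaults quoted above: one sample every 5 seconds for 5 minutes.
SAR_INTERVAL_SEC=5
SAR_DURATION_SEC=300
SAR_SAMPLES=$((SAR_DURATION_SEC / SAR_INTERVAL_SEC))

# runqueue.d: one sample every millisecond for 30 seconds.
DTRACE_DURATION_SEC=30
DTRACE_SAMPLES=$((DTRACE_DURATION_SEC * 1000))

echo "sar samples:        $SAR_SAMPLES"
echo "runqueue.d samples: $DTRACE_SAMPLES"
```

With these defaults, a sar invocation such as sar -q 5 60 would cover the same 5-minute window.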
The remaining sections of this tech tip demonstrate how checkperf reacts when a system has various performance issues. Before we continue,
you need to set a few variables in checkperf:
DIR: Specifies the directory where checkperf and runqueue.d are located (for example, /home/<username>/bin)
LOG: Specifies the file that will contain generated warning messages (for example, /home/<username>/bin/perf_msg)
RECEIVER: Specifies the email address of the person who should receive warning messages (for example, <username@domain.com>)
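The three settings might look like this near the top of checkperf. This is only a sketch: "username" and the email address are placeholders, and the exact layout of the script is an assumption.

```shell
# Placeholder values; replace "username" and the address with your own.
DIR=/home/username/bin          # directory that holds checkperf and runqueue.d
LOG=$DIR/perf_msg               # file that receives the warning messages
RECEIVER=username@domain.com    # recipient of the warning email

echo "Warnings are written to $LOG and mailed to $RECEIVER"
```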
Note: My system has 32 CPUs. For testing purposes, I turned off 30 of them using psradm -f 2-31.
CPU Performance Warning
The checkperf parameter that sets the CPU utilization warning threshold is CPU_UTIL_WARN.
It is set to 80 (percent) by default.
If the CPU utilization rate is more than 80%, checkperf checks the threads in the run queue, checks
whether the system has CPUs offline, and sends out email.
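The threshold test described above can be sketched as follows. This is not the code from checkperf. It assumes that utilization is taken as 100 minus the %idle column of the sar -u Average line, and the function name cpu_util is made up for illustration.

```shell
CPU_UTIL_WARN=80   # warning threshold in percent (the checkperf default)

# Read "sar -u" output on stdin and print 100 - %idle from the Average line.
cpu_util() {
    awk '$1 == "Average" { print 100 - $NF }'
}

# Sample Average line: %usr=69 %sys=31 %wio=0 %idle=0
util=$(printf 'Average 69 31 0 0\n' | cpu_util)

if [ "$util" -gt "$CPU_UTIL_WARN" ]; then
    echo "CPU average utilization: ${util}%(>${CPU_UTIL_WARN}%)"
fi
```

On a live system the sample line would be replaced by real output, for example sar -u 5 60 | cpu_util.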
We can run dd if=/dev/zero of=/dev/null & to consume CPU resources:
root@host # dd if=/dev/zero of=/dev/null &
[1] 1571
root@host # dd if=/dev/zero of=/dev/null &
[2] 1572
root@host # dd if=/dev/zero of=/dev/null &
[3] 1573
root@host # dd if=/dev/zero of=/dev/null &
[4] 1574
root@host # sar -uq 5 3
15:21:25 %usr %sys %wio %idle
runq-sz %runocc swpq-sz %swpocc
Average 69 31 0 0
Average 2.0 99 0.0 0
root@host # ./checkperf
root@host # more perf_msg
CPU average utilization: 100%(>80%)
There are 30 CPUs offline and use psradm to bring them online
Threads (per second) waiting for CPU to run: 2.0.
Recommend adding 2.0 CPUs to your system. Use prstat -L to see
if running processes have multiple threads so that you may switch to
thread-based-processor machine, such as the Sun Fire T2000 server.
The accurate threads waiting for CPU: 2.1
The "accurate threads waiting for CPU: 2.1" text is generated by runqueue.d, which provides more accurate
information about the run queue.
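The offline-CPU check could be approximated by counting the processors that psrinfo reports as off-line. The sketch below is an assumption about how such a check might work, not the logic in checkperf. The function name count_offline and the sample lines are invented for illustration.

```shell
# Read psrinfo-style output on stdin and count processors marked off-line.
count_offline() {
    awk '$2 == "off-line" { n++ } END { print n + 0 }'
}

# Illustrative sample; on a live system use: psrinfo | count_offline
sample='0 on-line since 01/01/2008 09:00:00
1 on-line since 01/01/2008 09:00:00
2 off-line since 01/01/2008 09:05:00
3 off-line since 01/01/2008 09:05:00'

offline=$(printf '%s\n' "$sample" | count_offline)
echo "There are $offline CPUs offline"
```

Processors found this way can be brought back online with psradm -n followed by their processor IDs.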
Memory Shortage Warning
There are two parameters in checkperf for checking memory:
MEM_FREEPHY_WARN_PERCENT: The warning threshold for available physical memory, as a percentage of total physical memory. The default is 20.
MEM_FREESWAP_WARN_PERCENT: The warning threshold for available swap space (virtual memory), as a percentage of total swap space. The default is 20.
If the available swap space is less than 20%, checkperf also checks whether the total size of physical swap devices
is less than 1.5 times the size of physical memory. As I demonstrated in a previous article,
Impact of Swap Space on System Performance for the Solaris 9 and 10 OS,
the lack of physical swap space affects system performance when a system is low on physical memory.
Here we will use the myfilltmp.sh script (which is shown in the previous article) to consume memory:
root@host # ./myfilltmp.sh
root@host # sar -r 5 3
15:34:39 freemem freeswap
Average 122536 6180453
sar reports freemem in pages (8 Kbytes each on this system) and freeswap in 512-byte blocks. So, free memory is 122536*8/1024, which is about 957 Mbytes, and free swap space is 6180453*512/1024/1024, which is about 3017 Mbytes.
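The same conversion can be done in the shell. The 8-Kbyte page size is an assumption taken from the calculation above (the pagesize command reports the real value), and the variable names are illustrative.

```shell
PAGE_KB=8                 # page size in Kbytes; assumed, check with "pagesize"
FREEMEM_PAGES=122536      # freemem column from sar -r (pages)
FREESWAP_BLOCKS=6180453   # freeswap column from sar -r (512-byte blocks)

# pages -> Kbytes -> Mbytes
FREE_MEM_MB=$((FREEMEM_PAGES * PAGE_KB / 1024))

# 512-byte blocks -> Kbytes (divide by 2) -> Mbytes; dividing first
# keeps the intermediate value small enough for 32-bit shell arithmetic.
FREE_SWAP_MB=$((FREESWAP_BLOCKS / 2 / 1024))

echo "Free memory: $FREE_MEM_MB MB"
echo "Free swap:   $FREE_SWAP_MB MB"
```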
root@host # ./checkperf
root@host # more perf_msg
Available physical memory: 937 MB(<3275 MB)
Available swap space: 2956 MB(<3552 MB)
Recommend adding 20465 MB swap device. The total size
of physical swap devices should be 1.5 times physical memory.
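The two memory rules above (a percentage threshold, and physical swap devices totaling 1.5 times physical memory) can be sketched as shell arithmetic. The function names and the machine sizes are hypothetical, and this is not necessarily how checkperf computes its figures.

```shell
# Print pct percent of total_mb as an integer number of Mbytes.
# usage: threshold_mb total_mb pct
threshold_mb() {
    echo $(( $1 * $2 / 100 ))
}

# Print how many Mbytes of swap device to add so that the total reaches
# 1.5 times physical memory; print 0 if there is already enough.
# usage: swap_shortfall_mb phys_mem_mb swap_dev_mb
swap_shortfall_mb() {
    want=$(( $1 * 3 / 2 ))
    if [ "$want" -gt "$2" ]; then
        echo $(( want - $2 ))
    else
        echo 0
    fi
}

# Hypothetical machine: 16384 MB of memory and 4096 MB of swap devices.
threshold_mb 16384 20         # warning level for free memory, in MB
swap_shortfall_mb 16384 4096  # swap to add, in MB
```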
I/O Performance Warning
The checkperf parameter that sets the I/O device utilization warning threshold is IO_UTIL_WARN.
It is set to 80 (percent) by default.
Let's generate some heavy I/O load:
root@host # cp myusr.tar myusr.tar2
root@host # sar -d 5 5
Average nfs1 0 0.0 0 0 0.0 0.0
sd1 99 6.8 134 80513 0.0 51.0
root@host # ./checkperf
root@host # more perf_msg
IO utilization on sd1: 100%(>80%)
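A per-device check of this kind can be sketched by scanning the Average block of sar -d for devices whose %busy value exceeds the threshold. The function name io_warn is hypothetical, and the parsing assumes that only the Average block is fed in, with %busy in the column right after the device name. On Solaris, use nawk or /usr/xpg4/bin/awk, because the older /usr/bin/awk does not accept -v.

```shell
IO_UTIL_WARN=80   # warning threshold in percent (the checkperf default)

# Read the Average block of "sar -d" on stdin and report busy devices.
# The first device line carries the "Average" label, so its fields
# are shifted right by one compared with the lines that follow.
io_warn() {
    awk -v warn="$1" '
        $1 == "Average" { dev = $2; busy = $3 }
        $1 != "Average" { dev = $1; busy = $2 }
        busy + 0 > warn + 0 {
            printf "IO utilization on %s: %d%%(>%d%%)\n", dev, busy, warn
        }'
}

report=$(printf '%s\n' \
    'Average nfs1 0 0.0 0 0 0.0 0.0' \
    'sd1 99 6.8 134 80513 0.0 51.0' | io_warn "$IO_UTIL_WARN")
echo "$report"
```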
Network Performance Warning
The following factors degrade TCP performance:
Retransmission: Messages that are lost must be retransmitted.
Duplicate packets: The local host might receive duplicate packets if it times out on the original request, issues another request, and then
receives the original packet.
Listen queues: A listen queue grows as the arrival rate of client requests to a server exceeds the server's processing rate.
In checkperf, the warning threshold for the retransmission rate is 15%, the warning threshold for the duplicate packet
rate is 15%, and the warning threshold for listen queue drops is 100. Because the test server has no retransmitted messages or duplicate
packets, and its listen queue drops do not exceed 100, the perf_msg file is empty.
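A rate check like the ones above boils down to dividing one counter by another and comparing the result with the threshold. The sketch below shows that arithmetic only. Which counters checkperf actually reads is not stated here; on Solaris, netstat -s -P tcp reports counters such as tcpRetransSegs, tcpOutDataSegs, tcpInDupSegs, and tcpListenDrop, and pairing them this way is an assumption. The names rate_pct and RETRANS_WARN_PCT, the counter values, and the message text are all illustrative. On Solaris, use nawk or /usr/xpg4/bin/awk, because the older /usr/bin/awk does not accept -v.

```shell
RETRANS_WARN_PCT=15   # hypothetical name for the 15% threshold quoted above

# Print part as an integer percentage of whole (0 when whole is 0).
# usage: rate_pct part whole
rate_pct() {
    awk -v n="$1" -v d="$2" 'BEGIN {
        if (d > 0) printf "%d\n", n * 100 / d
        else print 0
    }'
}

# Hypothetical counter values: 30 retransmitted segments out of 150 sent.
retrans=$(rate_pct 30 150)

if [ "$retrans" -gt "$RETRANS_WARN_PCT" ]; then
    echo "TCP retransmission rate: ${retrans}%(>${RETRANS_WARN_PCT}%)"
fi
```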
Putting It All Together
Finally, let's perform CPU, memory, and I/O performance checks all together:
root@host # dd if=/dev/zero of=/dev/null &
root@host # dd if=/dev/zero of=/dev/null &
root@host # dd if=/dev/zero of=/dev/null &
root@host # dd if=/dev/zero of=/dev/null &
root@host # ./myfilltmp.sh
root@host # cp myusr.tar myusr.tar2
root@host # more perf_msg
CPU average utilization: 100%(>80%)
There are 30 CPUs offline and use psradm to bring them online
Threads (per second) waiting for CPU to run: 3.1.
Recommend adding 3.1 CPUs to your system. Use prstat -L to see
if running processes have multiple threads so that you may switch to
thread-based-processor machine, such as the Sun Fire T2000 server.
The accurate threads waiting for CPU: 3.1
Available physical memory: 778 MB(<3275 MB)
Available swap space: 2821 MB(<3517 MB)
Recommend adding 20465 MB swap device. The total size of physical
swap devices should be 1.5 times physical memory.
IO utilization on sd1: 51%(>30%)
Because of the high CPU utilization and the memory shortage, the average disk utilization could not reach 80%, so I lowered
IO_UTIL_WARN to 30 for this test. This example shows that CPU and memory pressure can affect I/O performance too.