3.7. SYSTEM PERFORMANCE COUNTERS
The performance and scalability of a software system are determined by
various performance and scalability factors. The factors that most
affect the performance and scalability of a software system are
classified as bottlenecks. System performance counters help capture
those bottlenecks.
All operating systems, whether Windows, UNIX, or Linux, have built-in
system performance counters that can be used to monitor how a system is
utilizing its resources. From a system's resource utilizations, one can
immediately infer what the system is doing and where the problem areas
are. Capturing system resource utilizations is one of the most
fundamental tasks in diagnosing software performance and scalability
problems.
A performance counter enabled through a system monitoring tool is simply
a logical entity that quantitatively represents one aspect of a
resource. For example, one often needs to know:
How busy the CPUs of a system are
How much memory is being used by the application under test
How busy the disks of a data storage system are
How busy the networks are
System resource utilizations
can be monitored in real time or collected into log files for later
analysis. In this section, I describe how this can be done on Windows
and UNIX platforms.
3.7.1. Windows Performance Console
On Windows-based computers, the performance monitoring utility program perfmon
can be used to log performance counters. Since many developers and QA
engineers may not have had a chance to become familiar with perfmon, we spend a few minutes here showing how to use it.
To start up perfmon, click on Start | All Programs | Run, and enter perfmon as shown in Figure 3.29.
Then click OK and you should see the Performance Console as shown in Figure 3.30.
The left-hand side of the
Console shows two items, System Monitor and Performance Logs and Alerts.
When the System Monitor is selected, the right-hand side frame displays
the current readings of the added counters. At the bottom of the frame,
added counters are shown. For example, Figure 3.30 shows that on the computer WHENRY-NB, the counter %Processor_Time of the Performance Object Processor
was added to display CPU utilizations. The readings associated with
this counter are: Last CPU utilization reading 53.125%, Average CPU
utilization 41.359%, Minimum CPU utilization 3.906%, and Maximum CPU
utilization 57.813%. This is how to tell how busy the CPUs of a system
are.
It might be helpful at this point to get familiar with the above
Performance Console. Placing the mouse pointer on an icon at the top of
the right-hand side frame shows what that icon is for. For example:
Clicking on the second icon would clear the display.
Clicking on the third icon would enable viewing current activities.
Clicking on the "+" icon would bring up the Add Counters dialog box for adding new counters to the monitoring list.
Clicking on the "x" icon would remove a counter from the current monitoring list.
Clicking on the light bulb icon would highlight the display for the counter selected currently.
Clicking/unclicking on the red-cross icon would freeze/unfreeze displaying the current activities.
Next, let's see how to add various performance counters. Figure 3.31 shows what Performance object to select, what Counters to select, and whether to select for All instances or only some specific instances.
After selecting Performance object, instances, and counters based on your needs, click Add to add the desired counters. Click Close
to exit the Add Counters dialog box. If you want to know what a
specific counter is for, select the counter you are interested in, then
click Explain and you will get a fairly detailed description about that counter.
You can adjust the sampling interval by clicking on the Properties icon and then specifying Sample automatically every n seconds, where n is the number of seconds you desire as the sampling interval. The default of 1 second shown in Figure 3.32 is too frequent; you can increase it based on how long your test will last.
Real-time display is meant for short test durations only, and you lose
the data once you close the console. Alternatively, you can log the counters into a perfmon log file and analyze the logs afterwards.
To set up a perfmon logging task, follow this procedure:
Select Counter Logs under Performance Logs and Alerts, and right-click on Counter Logs to select the New Log Settings dialog box as shown in Figure 3.33.
Enter a name and click on OK, which would bring up the dialog box as shown in Figure 3.34.
From
here you can add any counters you are interested in and specify a
sampling interval. At the top, it shows the log file name, which will
contain the performance log data for later offline analysis.
You
can specify the log format under the Log Files tab, either Binary File
or Text File (Comma delimited) for working with Excel to plot charts.
Even if you select binary format now, you can re-save logged data in
text format later. To change the log format from binary to text with a
log file, first import the logged data in binary format, and then
specify the time range, add the counters you are interested in, and then
display the data. Right click anywhere on the display area and re-save
data in text format.
You
specify the schedules under the Schedule tab. You can select to
manually start and stop or specify a logging duration to avoid logging
too much unnecessary data even after a test is complete.
To analyze the perfmon log data, follow this procedure:
Select the System
Monitor entry, and then click on the fourth icon of View Log Data, which
should bring up the dialog box as shown in Figure 3.35.
Click on Add and then add the perfmon log file you want to analyze, which should bring up a dialog box similar to Figure 3.36.
Click
on the Time Range button to display the time range for which the
counters were logged. You can move the sliding bars to adjust the exact
range you want. Keep in mind that the average value of a counter is
based on the exact range you select, so you may want to adjust to the
exact start and stop times of your test. You should keep a daily
activity log that records the exact details of your test such as test
start/stop time, all test conditions, and test results so that you can
easily look back at exactly what you did with your previous test. This
is a good habit to have as a software performance engineer.
Then
click on the Data tab to get to the Add Counters dialog box. From
there, first delete all counters and then select the counters you are
interested in for analyzing your perfmon log data.
This may seem a little tedious, but it helps you learn perfmon
quickly without having to experiment with it on your own. Initially, it
might be difficult to decide which counters you should select out of the
hundreds of built-in counters. To help you get started, Table 3.3 shows the common perfmon
counters I typically use for diagnosing performance issues. You can
add more based on your special needs, but this list of counters should
be sufficient in general.
Before moving on to the UNIX system performance counters, I'd like to share with you some techniques for using perfmon to diagnose common performance and scalability issues such as memory leaks, CPU bottlenecks, and disk I/O bottlenecks. Using perfmon
to diagnose performance and scalability issues is a very important
skill to acquire for testing the performance and scalability of a
software system on the Windows platform. perfmon
is intuitive, easy to learn, and very powerful for diagnosing
performance and scalability issues on Windows. This is true not only for
troubleshooting the performance and scalability problems you encounter
with a complex, large-scale software system, but also for figuring out
what's wrong when your desktop or laptop Windows system is too slow to
bear.
Let's start with using perfmon to diagnose memory leaks.
Table 3.3. A Minimum Set of Perfmon Counters to be Logged for Performance Tests in Windows Environment
Performance Object | Performance Counters
Processor | %Processor Time
System | Processor Queue Length
Process | %Processor Time, Private Bytes, Thread Count, Virtual Bytes, Working Set
Memory | Available MBytes, Page Reads/sec, Page Writes/sec
Physical disk or logical disk | %Idle Time (Note: use 100 - %Idle Time for %Busy Time), Avg. Disk Read Queue Length, Avg. Disk Write Queue Length, Avg. Disk Bytes/Read, Avg. Disk Bytes/Write, Avg. Disk sec/Read, Avg. Disk sec/Write, Disk Read Bytes/sec, Disk Write Bytes/sec, Disk Bytes/sec, Disk Reads/sec, Disk Writes/sec
Network interface | Bytes Received/sec, Bytes Sent/sec, Bytes Total/sec
3.7.2. Using perfmon to Diagnose Memory Leaks
The first chart I'd like
to show is the memory growth chart, which might help you evaluate the
memory leak issues associated with your application. Memory leak is a
very common factor affecting the performance and scalability of a
software system on Windows, especially with 32-bit Windows operating
systems. It's one of the toughest issues in developing software, as most
of the time, you know your software leaks memory, but it's hard to know
where leaks come from, perfmon
can only help diagnose whether you have memory leaks in your software;
it doesn't tell you where the leaks come from. You have to use some
other tools like Purify® to find and fix the memory leaks that your
product suffers.
In a 32-bit
environment, the maximum addressable memory space is 4 GB. On the
Windows platform, this 4 GB is split evenly between the kernel and a
process, leaving each process 2 GB of addressable memory. Although you
can extend that 2-GB limit to 3 GB using the 3-GB switch parameter, 3 GB
may still not be enough for applications with severe memory leak
problems. So the best defense is to contain memory growth in your
application. Otherwise, when that 2-GB limit is hit, your application
will start to malfunction, which makes it totally unusable.
As a performance engineer,
you are obligated to check memory growth in your software product by
using a large volume of data. When you observe significant memory
growth, you need to communicate it back to your development team so that
they can fix it in time. Keep in mind that you need to check whether
memory usage comes back down after your test is complete. If it
doesn't, it can probably be classified as a memory leak,
which sounds more alarming than memory growth. There is also a
likelihood that the memory growth you observe is actually memory
fragmentation, which is related to how the operating system manages
memory. Whether it is a memory leak or memory fragmentation, the
consequences for the application are equally bad.
Figure 3.37
shows memory growth with two processes of an application written in
C/C++. The total test duration was about 24 hours. Note that the private
bytes curves are smoother than the virtual bytes curves, which appear
stair-cased. One should use private bytes to evaluate actual physical
memory consumption. It is seen that Process A is much more benign than
Process B in terms of memory growth, as its private bytes curve is much
flatter. Process B reached 320 MB at the end of the test, which means it
might reach the 2-GB memory limit if the test were to last about 5 days.
From this test, it's clear that some action needs to be taken against
the memory growth of Process B.
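The extrapolation above can be sketched as a back-of-the-envelope calculation: given growth to G megabytes over H hours, estimate the hours until the 2-GB (2048-MB) limit is reached. This assumes roughly linear growth, which real memory growth rarely follows exactly, so treat it as an order-of-magnitude estimate only.

```shell
#!/bin/bash
# Estimate hours until a process hits the 2-GB (2048-MB) limit,
# assuming linear memory growth (a rough approximation).
hours_to_limit() {
  # $1 = MB of growth observed, $2 = hours of observation
  awk -v g="$1" -v h="$2" 'BEGIN { printf "%.0f\n", 2048 * h / g }'
}
hours_to_limit 320 24   # 320 MB in 24 hours -> ~154 hours, i.e. 5-6 days
```

This is consistent with the roughly 5 days quoted for Process B above.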
In the next section, I'll discuss how to use perfmon to diagnose CPU bottlenecks.
3.7.3. Using perfmon to Diagnose CPU Bottlenecks
You can monitor the CPU
utilizations of a Windows system using the performance object of
Processor with the %Processor Time counter if you know you have only one
major process such as a database server running on your system. If you
have multiple processes running on your system, then use the Process
performance object with the %Processor Time counter for the process
instances you are concerned with. The %Processor Time counter for the
Processor performance object measures the total average CPU utilization
across multiple CPUs, whereas the %Processor Time counter for the
Process performance object measures the accumulative CPU utilizations
across multiple CPUs. So the maximum value is 100% for the former and N × 100% for the latter, where N is the number of total CPUs of an N-way Windows system. This is a subtle difference that must be accounted for when interpreting CPU utilizations.
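To illustrate that subtle difference, here is a minimal sketch (sample values are made up, not from the book's test): it converts a Process-object %Processor Time reading, which ranges from 0 to N × 100%, to the 0-100% scale of the Processor object by dividing by the CPU count.

```shell
#!/bin/bash
# Convert a cumulative per-process CPU reading (0..N*100) to the
# averaged 0..100 scale, given N CPUs.
normalize_cpu() {
  local process_pct=$1 ncpus=$2
  # awk handles the floating-point division
  awk -v p="$process_pct" -v n="$ncpus" 'BEGIN { printf "%.1f\n", p / n }'
}
normalize_cpu 250 4   # a process at 250% on a 4-way system -> 62.5
```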
Typically, an application
might be deployed on multiple systems, for example, the application
server on one physical box and the database server on another physical
box. When the application is properly sized, and the application is well
optimized and tuned, CPU utilizations across multiple systems should be
well balanced to yield sustainable, maximum possible performance and
scalability. Figure 3.38
shows such a balanced flow where the application server and database
server were about equally utilized, yielding a high throughput of
creating 127 objects/second. Over one million objects were created
during a period of 2 hours and 11 minutes with the associated test run.
If you see the CPU utilization
of the database server going up while the CPU utilization of the
application server is going down, then some tuning is required to bring
both of them to a steady state. This phenomenon was called
"bifurcating," which might be very common for applications that are not
well tuned [Liu, 2006]. This is a good example of why you should not
just keep generating performance test numbers. You should examine the
utilizations of various resources to see if there are opportunities for
improving the performance and scalability of your application as well.
The general criterion for
identifying the CPU as the bottleneck on a computer system is that the
average CPU utilization is above 70% or the average processor queue
length per CPU is above two. However, there might be cases where other
resources, such as disks, become the bottleneck before the CPU does.
This is especially true with database-intensive software applications.
Let's look at such a scenario next.
3.7.4. Using perfmon to Diagnose Disk I/O Bottlenecks
In this section, I'd like to
share with you a chart that shows disk activities. It is very important
to make sure that your disk I/O is not the bottleneck if your
application is database intensive.
perfmon
provides a sufficient number of counters associated with disk
activities. However, very often you may find that the %Disk Time
counter gives bogus numbers exceeding 100%. As a workaround,
use 100 - %Idle Time to calculate the disk %Busy Time, which is
analogous to the average utilization of CPUs. Figure 3.39
shows the average disk utilizations calculated using 100 - %Idle Time
for the one-million-object creation batch job discussed in the
preceding section. The database storage used for this test was an
internal RAID 0 configuration striped across three physical disks.
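The workaround reduces to a one-line calculation, sketched below with an illustrative idle figure (not from the book's test):

```shell
#!/bin/bash
# Disk %Busy Time derived from the reliable %Idle Time counter:
# %Busy = 100 - %Idle.
busy_from_idle() {
  awk -v idle="$1" 'BEGIN { printf "%.1f\n", 100 - idle }'
}
busy_from_idle 40   # 40% idle -> 60.0% busy
```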
Unlike CPUs, a disk
utilization level of above 20% starts to indicate that I/O is the
bottleneck, whereas for CPUs the threshold is about 70%. This disparity
between disks and CPUs is due to the fact that CPUs in general can crank
much faster than disks can spin.
Exploring disk
activities is a lot more interesting than exploring CPU activities, as
we can dig deeper into more metrics such as average (read | write) queue
length, average (reads | writes) / sec, average disk sec / (read |
write), and disk (read | write) / sec. Let's explore each of these disk
activity metrics.
Figure 3.40
shows the average disk read queue length and average disk write queue
length recorded during that one million object creation batch job. It is
seen that the write queue length is much larger than the read queue
length, which implies that a lot more disk write activities occurred
than read activities. This is not surprising at all, as during this
batch job test, one million objects were created and persisted to the
disks, which inevitably incurred a lot more disk writes than reads.
Queue length is a measure of
the number of items waiting in a queue to be processed. According to
queuing theory, which will be introduced in a later chapter of this
book, a resource is considered a bottleneck if its queue length is
larger than 2. As we introduced earlier, the database storage used for
this test was an internal RAID 0 configuration striped across three
physical disks, which would push the queue length threshold to 6. It's
clear from Figure 3.40
that the write queue length was around 20, which had far exceeded the
threshold value of 6. This implies that a more capable storage system
would help improve the performance and scalability of this batch job
further.
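The threshold arithmetic used above can be sketched directly: queuing theory flags a resource whose queue length exceeds 2, and striping across D spindles scales that threshold to 2 × D. The observed queue length of about 20 is taken from Figure 3.40.

```shell
#!/bin/bash
# Queue-length bottleneck threshold for a striped array:
# 2 per spindle, times the number of disks.
queue_threshold() { echo $(( 2 * $1 )); }

observed=20                       # approximate write queue length, Figure 3.40
threshold=$(queue_threshold 3)    # three striped disks -> 6
[ "$observed" -gt "$threshold" ] && echo "disk write queue is a bottleneck"
```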
Figure 3.41
shows the average number of reads and writes per second that occurred
during this test. There were about 300 writes/second and 50
reads/second, which once more confirmed that more writes than reads
occurred during the test period for the one million object creation
batch job. Remember that the throughput for this batch job was 127
CIs/s, which implies that about 2 to 3 writes occurred per object
creation on average. This seems to be normal for most database
write-intensive applications.
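The "2 to 3 writes per object" figure follows from dividing the observed write rate by the object-creation throughput, as this small sketch shows (using the rounded figures quoted above):

```shell
#!/bin/bash
# Writes per object created = write rate / object-creation throughput.
writes_per_object() {
  awk -v w="$1" -v t="$2" 'BEGIN { printf "%.1f\n", w / t }'
}
writes_per_object 300 127   # ~300 writes/s at 127 objects/s -> 2.4
```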
In addition to knowing the
disk queue lengths and I/O rates associated with a test, it's also
insightful to know how long it takes on average per read and per write.
Normally, disk times should range from 5 milliseconds to 20 milliseconds
with normal I/O loads. You may get submillisecond disk times if the
database storage has a huge cache, for example, from a few gigabytes to
tens of gigabytes.
For this test, each disk has
only a 256-MB cache, so we would expect disk read and write times to be
well above 1 millisecond. Actual disk read and write times associated
with this test are shown in Figure 3.42.
As is seen, the average disk write time is much longer than the average
disk read time, as we already know from the previous analysis that many
more requests accumulated in the write queue than in the read queue. It
builds confidence when all metrics are consistent with one another.
Charts are very useful for qualitatively characterizing each performance factor. However, they are less precise for quantifying each performance factor. To get more quantitative, you can use the View Report functionality of perfmon to obtain the average value of each performance counter, such as shown in Figure 3.43 with the following quantitative values for some of the indicative disk performance counters:
Average disk utilization: 60%
Average disk time per read: 11 milliseconds
Average disk time per write: 73 milliseconds
Average disk write queue length: 22
Disk reads/sec: 46
Disk writes/sec: 297
Keep in mind that you need to narrow the time range of your perfmon
log data down to the exact range corresponding to the start and end
times of your test; otherwise, the averaged values won't be accurate.
Performance
Console allows you to monitor system resource utilizations over an
extended period of time when the test is running. It's convenient for
post-testing performance and scalability analysis. However, sometimes,
you may want to use another Windows utility tool—Task Manager—for
checking the resource consumption on the current Windows system. This is
the topic for the next section.
3.7.5. Using Task Manager to Diagnose System Bottlenecks
We'll see in this section that Task Manager is more convenient than perfmon for some tasks. For example:
You may want to have a quick look at how busy the CPUs of a system are overall right now.
You
may want to check how well balanced the CPU utilizations are across
multiple CPUs of the system. This actually is an easy way to tell
whether the software is running in multithreaded mode by examining
whether all CPUs are about equally busy simultaneously.
You may want to check which processes are consuming most of the CPU power on this system right now.
You
may want to check how much memory is used right now. And you can drill
down to which processes are consuming most of the memory.
You may want to check the network utilization right now.
You
can even see in real time if memory is leaking or not. If you see the
memory consumption of a process is only going up, then there are
probably memory leaks with that process.
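In the spirit of that last bullet, the "memory only goes up" check can be automated once you have a series of memory samples (for example, RSS in MB collected with the script in Section 3.7.6). The sample values below are made up for illustration; a real check would also want a long observation window.

```shell
#!/bin/bash
# Flag a possible leak if a series of memory samples never decreases.
leak_check() {
  printf '%s\n' "$@" | awk '
    NR > 1 && ($1 + 0) < (prev + 0) { shrank = 1 }   # memory came back down
    { prev = $1 }
    END { print (shrank ? "no obvious leak" : "possible leak") }'
}
leak_check 100 120 150 180   # monotonic growth -> possible leak
leak_check 100 150 110 150   # memory was released at some point
```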
First, to start up Task Manager, press CTRL + ALT + DELETE and you should get a dialog box similar to Figure 3.44.
As shown in Figure 3.44 under the Performance
tab, this system has two CPUs and both of them were busy, which suggests
that the application running is multithreaded. It also shows that a
total of 374 MB of memory was in use at that moment.
You can check the network utilizations by clicking on the Networking tab, and check the users currently logged in by clicking the Users tab. But the most important tab for troubleshooting a performance issue with a system is the Processes tab.
Computer programs run on a computer system as processes. Each process has an ID and a name to identify it on the system where it is running. By clicking on the Processes
tab of the Windows Task Manager dialog box, you can bring up a list of
the processes currently running on the system, as shown in Figure 3.45.
A few notes about this Processes tab:
You may want to check the box Show processes from all users at the bottom left corner of the screenshot in Figure 3.45 in order to see the processes you are looking for.
You can't see your processes unless they are running right now.
You
can sort by CPU Usage or Memory Usage to look for the most
CPU-intensive or memory-intensive processes running on this system right
now.
You can
decide on what metrics you want to be displayed by clicking on the View |
Select Columns ... which would bring up the list of metrics you can
select, such as shown in Figure 3.46.
As you can see from the screenshot in Figure 3.46,
you can select Memory Usage, Memory Usage Delta, and Peak Memory
Usage from the view options made available. These counters give a
complete view of a process's memory consumption. When a memory-intensive
application is running, you will see the memory usage for that process
keep growing, with more positive memory usage deltas than negative ones.
If the memory usage doesn't come down after the process has completed
its task and is waiting for new tasks, that's an indication of a memory
leak in that process.
This concludes our
discussion on performance counters on Windows systems. Most software
development work is done on Windows, which is why we covered more topics
on Windows.
However, for enterprise
software applications, UNIX or Linux platforms are the most likely
choice for some customers, so you might need to test out your software
on these platforms as well. Instead of repeating what is already
available in many UNIX/Linux texts, in the next section, I'll show you a
simple script that can be used to capture the CPU and memory
consumptions for the processes that you are concerned with. This
probably is sufficient for most of your performance test needs. In a
production environment, UNIX/Linux systems are typically managed by
professional administrators who have developed special ways of capturing
various system performance counters or simply use tools provided by
vendors. That is beyond the scope of this book.
3.7.6. UNIX Platforms
On UNIX and
Linux systems, vendors provide their own system performance monitoring
tools, although some common utilities such as sar are available across most UNIX flavors.
Performance
troubleshooting often requires monitoring resource utilizations on a
per-process basis. This might be a little more challenging on UNIX
systems than on Windows systems. On Windows, you use perfmon
to configure which processes and which counters you want to monitor. On
UNIX systems, you need a script to do the same job. Here, I'd like to
share a script that I often used on a popular UNIX platform for
capturing CPU and memory utilizations while my tests were running. Since
it is written for the bash shell, it can be run on other UNIX and Linux systems as well.
Here is the script that you can
adapt to your needs for monitoring systems resource usages on a
per-process basis in your UNIX or Linux environment:
#!/bin/bash
sleepTime=60
pattern="yourPattern"
x=0
procId=$(ps -eo pid,pcpu,time,args | grep $pattern |\
    grep -v grep | awk '{print $1}')
echo "procId=" $procId
while [ $x -ge 0 ]
do
    date=$(date)
    ps0=$(ps -o vsz,rss,pcpu -p $procId)
    x=$((x+1))
    echo $x $date $ps0 | awk '{print $1, $3, $4, $5, $8,\
        $11/1000.0, $9, $12/1000.0, $10, $13, $14}'
    sleep $sleepTime
done
As you see, this is a bash script. You need to specify how often you want to sample (sleepTime) and enter a string (pattern)
that represents your process. Then you extract the process ID of that
process. Using that process ID, you keep polling in an infinite loop for
the counters you want to record. In this script, I was most interested
in three counters, vsz, rss, and pcpu, which represent the virtual memory size, resident memory size, and CPU usage associated with that process. The counters vsz and rss are equivalent to the virtual bytes and private bytes counters of perfmon on Windows, respectively. These counters are very useful for monitoring memory growth associated with a process.
To execute this script, change to the bash shell environment and issue the following command:
prompt> ./scriptFileName > filename.txt &
The output is directed to a
text file that you can analyze later. The file is plain delimited text,
so you can import it into an Excel spreadsheet to draw the charts
you are interested in. Remember that this runs in an infinite loop, so
you need to bring it to the foreground using the command fg and stop it after you are done.
If you cannot run this script, you might need to execute the command chmod 700 scriptFileName to set proper permissions.
Also, this simple script can be modified to monitor multiple processes in a single script.
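One way that multi-process variant might look is sketched below. It accepts one or more name patterns and samples each matching process per interval; a sampleCount variable is added here so the loop terminates, whereas the original script loops forever. The process names in the commented example are hypothetical placeholders.

```shell
#!/bin/bash
# Sketch: sample vsz/rss/pcpu for several processes in one script.
sleepTime=60
sampleCount=3

monitor() {
  local n=0
  while [ "$n" -lt "$sampleCount" ]; do
    for pattern in "$@"; do
      # first PID whose command line matches the pattern
      pid=$(ps -eo pid,args | grep "$pattern" | grep -v grep |
            awk 'NR==1 {print $1}')
      # same counters as the single-process script above
      [ -n "$pid" ] && echo "$(date) $pattern $(ps -o vsz= -o rss= -o pcpu= -p "$pid")"
    done
    n=$((n+1))
    sleep "$sleepTime"
  done
}

# monitor serverA serverB > usage.txt &   # hypothetical process names
```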
In the next section, I'll
propose several software performance data principles to help enforce the
notion that software performance and scalability testing is not just
about gathering data. It's about getting data that has value for your
company and your customers.