3.7. SYSTEM PERFORMANCE COUNTERS
The performance and scalability of a software system are determined by many factors. The factors that affect performance and scalability the most are the bottlenecks. System performance counters help capture those bottlenecks.
All operating systems, whether Windows, UNIX, or Linux, have built-in system performance counters that can be used to monitor how a system is utilizing its resources. Based on the resource utilizations of a system, one can often infer what the system is doing and where the problem areas are. Capturing system resource utilizations is one of the most fundamental tasks in diagnosing software performance and scalability problems.
A performance counter enabled through a system monitoring tool is simply a logical entity that represents one of the aspects of a resource quantitatively. For example, one often needs to know:
How busy the networks are
System resource utilizations can be monitored in real time or collected into log files for later analysis. In this section, I describe how this can be done on Windows and UNIX platforms.
3.7.1. Windows Performance Console
On Windows-based computers, the performance monitoring utility program perfmon can be used to log performance counters. Since many developers and QA engineers may not have had a chance to become familiar with perfmon, we spend a few minutes here showing how to use it.
Figure 3.29. Dialog box for starting up perfmon.
Figure 3.30. Windows performance console.
To start up perfmon, click on Start | All Programs | Run, and enter perfmon as shown in Figure 3.29.
Then click OK and you should see the Performance Console as shown in Figure 3.30.
The left-hand side of the Console shows two items, System Monitor and Performance Logs and Alerts. When the System Monitor is selected, the right-hand side frame displays the current readings of the added counters. The added counters are listed at the bottom of the frame. For example, Figure 3.30 shows that on the computer WHENRY-NB, the counter %Processor Time of the Performance Object Processor was added to display CPU utilizations. The readings associated with this counter are: Last CPU utilization reading 53.125%, Average CPU utilization 41.359%, Minimum CPU utilization 3.906%, and Maximum CPU utilization 57.813%. This is how to tell how busy the CPUs of a system are.
It might be helpful at this point to get familiar with the above performance console. Placing the mouse pointer on an icon at the top of the right-hand side frame shows what that icon is for. Some of the examples include:
Clicking on the third icon would enable viewing current activities.
Clicking on the "+" icon would bring up the Add Counters dialog box for adding new counters to the monitoring list.
Clicking on the "x" icon would remove a counter from the current monitoring list.
Clicking on the light bulb icon would highlight the display for the counter selected currently.
Clicking/unclicking on the red-cross icon would freeze/unfreeze displaying the current activities.
Next, let's see how to add various performance counters. Figure 3.31 shows what Performance object to select, what Counters to select, and whether to select for All instances or only some specific instances.
After selecting Performance object, instances, and counters based on your needs, click Add to add the desired counters. Click Close to exit the Add Counters dialog box. If you want to know what a specific counter is for, select the counter you are interested in, then click Explain and you will get a fairly detailed description about that counter.
You can adjust the sampling interval by clicking on the Properties icon and then specify Sample automatically every n seconds, where n is the number of seconds you desire as the sampling interval. The default 1 second shown in Figure 3.32 is too fast and you can increase it based on how long your test would last.
Real-time display is meant for short tests only, and the data is lost once you close the console. For longer tests, you can log the counters into a perfmon log file and analyze the log afterwards.
Figure 3.31. Dialog box for adding perfmon counters.
Figure 3.32. Dialog box for entering perfmon sample interval.
To set up a perfmon logging task, follow this procedure:
Select Counter Logs under Performance Logs and Alerts, and right-click on Counter Logs to select the New Log Settings dialog box as shown in Figure 3.33.
Enter a name and click on OK, which would bring up the dialog box as shown in Figure 3.34.
From here you can add any counters you are interested in and specify a sampling interval. At the top, it shows the log file name, which will contain the performance log data for later offline analysis.
You can specify the log format under the Log Files tab, either Binary File or Text File (Comma delimited) for working with Excel to plot charts. Even if you select binary format now, you can re-save logged data in text format later. To change the log format from binary to text with a log file, first import the logged data in binary format, and then specify the time range, add the counters you are interested in, and then display the data. Right click anywhere on the display area and re-save data in text format.
You specify the schedules under the Schedule tab. You can select to manually start and stop or specify a logging duration to avoid logging too much unnecessary data even after a test is complete.
To analyze the perfmon log data, follow this procedure:
Select the System Monitor entry, and then click on the fourth icon of View Log Data, which should bring up the dialog box as shown in Figure 3.35.
Click on Add and then add the perfmon log file you want to analyze, which should bring up a dialog box similar to Figure 3.36.
Click on the Time Range button to display the time range for which the counters were logged. You can move the sliding bars to adjust the exact range you want. Keep in mind that the average value of a counter is based on the exact range you select, so you may want to adjust to the exact start and stop times of your test. You should keep a daily activity log that records the exact details of your test such as test start/stop time, all test conditions, and test results so that you can easily look back at exactly what you did with your previous test. This is a good habit to have as a software performance engineer.
Then click on the Data tab to get to the Add Counters dialog box. From there, first delete all counters and then select the counters you are interested in for analyzing your perfmon log data.
Figure 3.33. Dialog box for naming a new perfmon log setting.
Figure 3.34. Dialog box for configuring a new perfmon log setting.
Figure 3.35. Dialog box for selecting the perfmon log file to be analyzed.
Figure 3.36. Dialog box with a perfmon log file added.
This may seem a little tedious, but it helps you learn perfmon quickly without having to experiment with it yourself. Initially, it might be difficult for you to decide which counters to select out of the hundreds of built-in counters. To help you get started, Table 3.3 shows the common perfmon counters I typically use for diagnosing performance issues. You can add more based on your special needs, but this list of counters should be sufficient in general.
Before moving on to the UNIX system performance counters, I'd like to share with you some techniques of using perfmon to diagnose common performance and scalability issues such as memory leaks, CPU bottlenecks, and disk I/O bottlenecks. Using perfmon to diagnose such issues is a very important skill to acquire for testing the performance and scalability of a software system on the Windows platform. perfmon is intuitive, easy to learn, and very powerful. This is true not only for troubleshooting the performance and scalability problems you encounter with a complex, large-scale software system, but also for figuring out what's wrong when your desktop or laptop Windows system is unbearably slow.
Let's start with using perfmon to diagnose memory leaks.
Performance Object | Performance Counters |
---|---|
Processor | %Processor Time |
System | Processor Queue Length |
Process | %Processor Time; Private Bytes; Thread Count; Virtual Bytes; Working Set |
Memory | Available MBytes; Page Reads/sec; Page Writes/sec |
Physical disk or logical disk | %Idle Time (Note: Use 100 - %Idle Time for %Busy Time); Avg. Disk Read Queue Length; Avg. Disk Write Queue Length; Avg. Disk Bytes/Read; Avg. Disk Bytes/Write; Avg. Disk sec/Read; Avg. Disk sec/Write; Disk Read Bytes/sec; Disk Write Bytes/sec; Disk Bytes/sec; Disk Reads/sec; Disk Writes/sec |
Network interface | Bytes Received/sec; Bytes Sent/sec; Bytes Total/sec |

Note: Select instances that are pertinent to your tests.
3.7.2. Using perfmon to Diagnose Memory Leaks
The first chart I'd like to show is the memory growth chart, which might help you evaluate the memory leak issues associated with your application. Memory leaks are a very common factor affecting the performance and scalability of a software system on Windows, especially with 32-bit Windows operating systems. They are among the toughest issues in developing software: most of the time, you know your software leaks memory, but it's hard to know where the leaks come from. perfmon can only help diagnose whether you have memory leaks in your software; it doesn't tell you where the leaks come from. You have to use some other tools like Purify® to find and fix the memory leaks that your product suffers.
In a 32-bit environment, the maximum addressable memory space is 4 GB. On the Windows platform, this 4 GB is split evenly between the kernel and a process, leaving 2 GB for each. Although you can extend that 2-GB process limit to 3 GB using a 3-GB switch parameter, that 3 GB may still not be enough for some applications with severe memory leak problems. So the best defense is to contain memory growth in your application. Otherwise, when the 2-GB limit is hit, your application will start to malfunction, which makes it totally unusable.
Figure 3.37. Memory growth associated with two processes of an application written in C/C++ in a Windows environment.
As a performance engineer, you are obligated to check memory growth with your software product by using a large volume of data. When you observe significant memory growth, you need to communicate it back to your development team so that they can fix it in time. Keep in mind that you need to check whether memory usage comes back down after your test is complete. If it doesn't, the growth can probably be classified as a memory leak, which sounds more horrible than memory growth. There is also a likelihood that the memory growth you observe is actually memory fragmentation, which is related to how the operating system manages memory. Whether it is a memory leak or memory fragmentation, the two are equally bad as far as the consequences to the application are concerned.
Figure 3.37 shows memory growth with two processes of an application written in C/C++. The total test duration was about 24 hours. Note that private bytes curves are smoother than virtual bytes curves, which appear stair-cased. One should use private memory to evaluate actual physical memory consumption. It is seen that Process A is much more benign than Process B in terms of memory growth, as its private bytes curve is much flatter. Process B reached 320 MB at the end of the test, which means it might reach the 2-GB memory limit if the test lasts 5 days. From this test, it's clear that it's necessary to take some action against the memory growth for Process B.
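The kind of projection made above can be sketched with a small script. The following is only an illustration under stated assumptions: the function name hours_to_limit is hypothetical, the input is assumed to be exported samples of elapsed hours and private bytes in MB, the sample values are made up (they are not the data behind Figure 3.37), and growth is assumed to stay linear.

```shell
#!/bin/bash
# Estimate how long a process can keep growing before it hits a memory limit.
# stdin:    one sample per line, "<elapsed_hours> <private_mb>"
# argument: the limit in MB (2048 for the 2-GB process address space)
hours_to_limit() {
    awk -v limit="$1" '
        { n++; sx += $1; sy += $2; sxy += $1 * $2; sxx += $1 * $1; last = $2 }
        END {
            d = n * sxx - sx * sx
            if (d == 0) { print "need at least two distinct samples"; exit }
            # Least-squares slope of private bytes over time, in MB per hour.
            slope = (n * sxy - sx * sy) / d
            if (slope <= 0) { print "no growth"; exit }
            # Hours remaining, counted from the last sample.
            printf "slope_mb_per_hour=%.2f hours_to_limit=%.1f\n", slope, (limit - last) / slope
        }'
}

# Hypothetical samples: 100 MB at the start, growing by 10 MB every hour.
printf '%s\n' "0 100" "1 110" "2 120" "3 130" | hours_to_limit 2048
```

A fitted slope is less sensitive to a single noisy reading than dividing the last value by the elapsed time, which is why the sketch uses a least-squares fit.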
In the next section, I'll discuss how to use perfmon to diagnose CPU bottlenecks.
3.7.3. Using perfmon to Diagnose CPU Bottlenecks
You can monitor the CPU utilizations of a Windows system using the performance object of Processor with the %Processor Time counter if you know you have only one major process such as a database server running on your system. If you have multiple processes running on your system, then use the Process performance object with the %Processor Time counter for the process instances you are concerned with. The %Processor Time counter for the Processor performance object measures the total average CPU utilization across multiple CPUs, whereas the %Processor Time counter for the Process performance object measures the cumulative CPU utilization across multiple CPUs. So the maximum value is 100% for the former and N × 100% for the latter, where N is the total number of CPUs of an N-way Windows system. This is a subtle difference that must be accounted for when interpreting CPU utilizations.
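To compare the two kinds of readings on the same scale, a Process-object reading can be divided by the number of CPUs. A minimal sketch follows; the function name normalize_process_cpu and the sample numbers are hypothetical.

```shell
#!/bin/bash
# Convert a Process-object %Processor Time reading (0 to N x 100) to the
# 0-100 scale used by the Processor object.
# arguments: the reading, the number of CPUs
normalize_process_cpu() {
    awk -v pct="$1" -v ncpu="$2" 'BEGIN { printf "%.1f\n", pct / ncpu }'
}

# A process reading 250% on a 4-way system.
normalize_process_cpu 250 4
```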
Typically, an application might be deployed on multiple systems, for example, the application server on one physical box and the database server on another physical box. When the application is properly sized, and the application is well optimized and tuned, CPU utilizations across multiple systems should be well balanced to yield sustainable, maximum possible performance and scalability. Figure 3.38 shows such a balanced flow where the application server and database server were about equally utilized, yielding a high throughput of creating 127 objects/second. Over one million objects were created during a period of 2 hours and 11 minutes with the associated test run.
If you see the CPU utilization of the database server is going up while the CPU utilization of the application server is going down, then some tuning is required to bring both of them to a steady state. This phenomenon was called "bifurcating," which might be very common for applications that are not well tuned [Liu, 2006]. This is a good example of why you should not just keep generating performance test numbers. You should also examine utilizations of various resources to see if there are opportunities for improving the performance and scalability of your application.
Figure 3.38. CPU utilizations of two identical Intel Xeon systems on Windows 2003, one as the application server and the other as the database server.
The general criterion for defining CPU as the bottleneck on a computer system is that the average CPU utilization is above 70% or the average processor queue length per CPU is above two. However, there might be a case where other resources, such as disks, may become the bottleneck before the CPU does. This is especially true with database-intensive software applications. Let's look at such a scenario next.
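The CPU rule of thumb above can be written down as a small check. This is a sketch only: the function name cpu_bottleneck is hypothetical, and the inputs are assumed to be averages taken from a perfmon report for the test window.

```shell
#!/bin/bash
# Apply the rule of thumb: average utilization above 70%, or average
# processor queue length per CPU above 2.
# arguments: average CPU utilization (%), average processor queue length, CPU count
cpu_bottleneck() {
    awk -v util="$1" -v qlen="$2" -v ncpu="$3" 'BEGIN {
        if (util > 70 || qlen / ncpu > 2) print "bottleneck"
        else print "ok"
    }'
}

# Moderate utilization, but 5 waiting threads per CPU on a 2-way system.
cpu_bottleneck 45 10 2
```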
3.7.4. Using perfmon to Diagnose Disk I/O Bottlenecks
In this section, I'd like to share with you a chart that shows disk activities. It is very important to make sure that your disk I/O is not the bottleneck if your application is database intensive.
perfmon provides a sufficient number of counters associated with disk activities. However, very often, you may find that the %Disk Time counter may give you some bogus numbers exceeding 100%. As a workaround, use 100 - %Idle Time to calculate the disk %Busy Time, which is equivalent to the average utilization for CPUs. Figure 3.39 shows the average disk utilizations calculated using 100 - %Idle Time for that one million object creation batch job discussed in the preceding section. The database storage used for this test was an internal RAID 0 configuration striped across three physical disks.
For disks, a utilization level above 20% starts to indicate that I/O is the bottleneck, whereas for CPUs the threshold is about 70%. This disparity between disks and CPUs is due to the fact that CPUs in general can crank much faster than disks can spin.
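The workaround and the 20% rule of thumb can be combined in a few lines. The sketch below assumes the %Idle Time samples have already been exported one per line; the function name disk_busy and the sample values are hypothetical.

```shell
#!/bin/bash
# Compute average disk %Busy Time as 100 - %Idle Time and compare it
# with the 20% rule of thumb.
# stdin: one %Idle Time sample per line
disk_busy() {
    awk '{ sum += 100 - $1; n++ }
         END {
             if (n == 0) exit 1
             busy = sum / n
             printf "%.1f %s\n", busy, (busy > 20 ? "bottleneck" : "ok")
         }'
}

# Three hypothetical %Idle Time samples.
printf '%s\n' 70 60 65 | disk_busy
```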
Figure 3.39. Average disk utilizations.
Figure 3.40. Average disk read queue length and write queue length.
Exploring disk activities is a lot more interesting than exploring CPU activities, as we can dig deeper into more metrics such as average (read | write) queue length, average (reads | writes) / sec, average disk sec / (read | write), and disk (read | write) / sec. Let's explore each of these disk activity metrics.
Figure 3.40 shows the average disk read queue length and average disk write queue length recorded during that one million object creation batch job. It is seen that the write queue length is much larger than the read queue length, which implies that a lot more disk write activities occurred than read activities. This is not surprising at all, as during this batch job test, one million objects were created and persisted to the disks, which inevitably incurred a lot more disk writes than reads.
Queue length is a measure of the number of items waiting in a queue to be processed. According to queuing theory, which will be introduced in a later chapter of this book, a resource is considered a bottleneck if its queue length is larger than 2. As we introduced earlier, the database storage used for this test was an internal RAID 0 configuration striped across three physical disks, which would push the queue length threshold to 6. It's clear from Figure 3.40 that the write queue length was around 20, which far exceeded the threshold value of 6. This implies that a more capable storage system would help improve the performance and scalability of this batch job further.
Figure 3.41 shows the average number of reads and writes per second that occurred during this test. There were about 300 writes/second and 50 reads/second, which once more confirmed that more writes than reads occurred during the test period for the one million object creation batch job. Remember that the throughput for this batch job was 127 objects/second, which implies that about 2 to 3 writes occurred per object creation on average. This seems to be normal for most database write-intensive applications.
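The scaling of the queue length threshold with the number of disks can be sketched as follows. The function name queue_check is hypothetical, and the factor of 2 per physical disk is simply the rule of thumb stated above.

```shell
#!/bin/bash
# Scale the queue-length threshold of 2 by the number of physical disks
# and compare it with the observed average queue length.
# arguments: average queue length, number of physical disks in the stripe set
queue_check() {
    awk -v qlen="$1" -v disks="$2" 'BEGIN {
        threshold = 2 * disks
        printf "threshold=%d %s\n", threshold, (qlen > threshold ? "bottleneck" : "ok")
    }'
}

# A write queue length of about 20 on a three-disk RAID 0 set.
queue_check 20 3
```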
Figure 3.41. Average number of reads and writes per second that occurred during the one million object creation batch job.
In addition to knowing the disk queue lengths and I/O rates associated with a test, it's also insightful to know how long it takes on average per read and per write. Normally, disk times should range from 5 milliseconds to 20 milliseconds with normal I/O loads. You may get submillisecond disk times if the database storage has a huge cache, for example, from a few gigabytes to tens of gigabytes.
For this test, each disk has only a 256-MB cache, so we would expect disk read and write times to be well above 1 millisecond. Actual disk read and write times associated with this test are shown in Figure 3.42. As is seen, the average disk write time is much longer than the average disk read time, which agrees with the previous analysis showing that many more requests accumulated in the write queue than in the read queue. You can have confidence in your measurements when all metrics are consistent with each other.
Charts are very useful for qualitatively characterizing each performance factor. However, they are less precise for quantifying each performance factor. To get more quantitative, you can use the View Report functionality of perfmon to obtain the average value of each performance counter, such as shown in Figure 3.43 with the following quantitative values for some of the indicative disk performance counters:
Average disk time per write: 73 milliseconds
Average disk write queue length: 22
Disk reads/sec: 46
Disk writes/sec: 297
Figure 3.42. Average disk read time and average disk write time with the one million object creation batch job. Note that avg. disk sec / (read | write) are perfmon counter names and the actual units are in milliseconds.
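One way to check that these numbers are consistent with each other is Little's law from queuing theory, which says that the average number of requests in a system equals the arrival rate multiplied by the average time per request. The sketch below applies it to the write figures; the function name expected_queue_length is hypothetical, and it assumes the reported queue length counts all outstanding write requests.

```shell
#!/bin/bash
# Little's law: average number of outstanding requests = rate x time per request.
# arguments: I/O operations per second, average seconds per operation
expected_queue_length() {
    awk -v rate="$1" -v secs="$2" 'BEGIN { printf "%.1f\n", rate * secs }'
}

# 297 writes/sec at 73 milliseconds (0.073 seconds) per write.
expected_queue_length 297 0.073
```

The result is close to the measured average write queue length of 22, so the three write metrics agree with each other.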
Keep in mind that you need to narrow the time range of your perfmon log data down to the exact range corresponding to the start and end times of your test; otherwise, the averaged values won't be accurate.
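If you have saved the log in text (comma-delimited) format, the same narrowing can be done outside perfmon. The following sketch makes several assumptions that you should verify against your own log: every field is quoted, the first row is a header of counter paths, the first column is a fixed-width timestamp, and the start and end arguments are written in exactly the same timestamp format. The function name window_average, the host name HOST, and the sample values are hypothetical.

```shell
#!/bin/bash
# Average one counter column over a time window of a perfmon text log.
# arguments: start timestamp, end timestamp, 1-based column number of the counter
# stdin:     the comma-delimited log
window_average() {
    awk -F',' -v lo="$1" -v hi="$2" -v col="$3" '
        NR == 1 { next }              # skip the header row of counter paths
        {
            gsub(/"/, "")             # drop the quotes; awk re-splits the fields
            # Fixed-width timestamps in one format compare correctly as strings.
            if ($1 >= lo && $1 <= hi) { sum += $col; n++ }
        }
        END { if (n) printf "%.3f\n", sum / n; else print "no samples in range" }'
}

# Hypothetical log: only the three samples inside the window are averaged.
window_average "08/25/2011 10:00:10.000" "08/25/2011 10:00:20.000" 2 <<'EOF'
"(PDH-CSV 4.0) (Pacific Standard Time)(480)","\\HOST\Processor(_Total)\% Processor Time"
"08/25/2011 10:00:05.000","90.0"
"08/25/2011 10:00:10.000","40.0"
"08/25/2011 10:00:15.000","50.0"
"08/25/2011 10:00:20.000","60.0"
"08/25/2011 10:00:25.000","5.0"
EOF
```

The two samples outside the window (90.0 during warm-up and 5.0 after the test) would noticeably distort the average if they were included.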
Performance Console allows you to monitor system resource utilizations over an extended period of time when the test is running. It's convenient for post-testing performance and scalability analysis. However, sometimes, you may want to use another Windows utility tool—Task Manager—for checking the resource consumption on the current Windows system. This is the topic for the next section.
Figure 3.43. Perfmon report.
3.7.5. Using Task Manager to Diagnose System Bottlenecks
We'll see in this section that Task Manager is more convenient than perfmon for some tasks. For example:
You may want to have a quick look at how busy the CPUs of a system are overall right now.
You may want to check how well balanced the CPU utilizations are across multiple CPUs of the system. This actually is an easy way to tell whether the software is running in multithreaded mode by examining whether all CPUs are about equally busy simultaneously.
You may want to check which processes are consuming most of the CPU power on this system right now.
You may want to check how much memory is used right now. And you can drill down to which processes are consuming most of the memory.
You may want to check the network utilization right now.
You can even see in real time if memory is leaking or not. If you see the memory consumption of a process is only going up, then there are probably memory leaks with that process.
First, to start up Task Manager, press CTRL + ALT + DELETE and you should get a dialog box similar to Figure 3.44.
As shown in Figure 3.44 under Performance tab, this system has two CPUs and both of them were busy, which means that the application is a multithreaded application. It also shows that a total memory of 374 MB was used up to that moment.
You can check the network utilizations by clicking on the Network tab, and check the users currently logged in by clicking the Users tab. But the most important tab for troubleshooting a performance issue with a system is the Process tab.
Computer programs run on a computer system as processes. Each process has an ID and a name to identify itself on the system it is running. By clicking on the Process tab on the Windows Task Manager dialog box, you can bring up a list of processes that are running currently on the system, as shown in Figure 3.45.
A few notes about this Process tab:
You may want to check the box of Show processes from all users at the left bottom corner of the screenshot in Figure 3.45 in order to see the processes that you are looking for.
You can't see your processes unless they are running right now.
You can sort by CPU Usage or Memory Usage to look for the most CPU-intensive or memory-intensive processes running on this system right now.
You can decide on what metrics you want to be displayed by clicking on the View | Select Columns ... which would bring up the list of metrics you can select, such as shown in Figure 3.46.
As you can see from the screenshot in Figure 3.46, you can select the Memory Usage, Memory Usage Delta, and Peak Memory Usage from the view options made available. These counters give a complete view of the process memory consumption. When a memory-intensive application is running, you will see the memory usage for that process keeps growing with more positive memory usage deltas than negative ones. If the memory usage doesn't come down after the process completed its task and is waiting for more new tasks, that's an indication that there is a memory leak issue with that process.
Figure 3.44. Windows Task Manager.
This concludes our discussion on performance counters on Windows systems. Most software development work is done on Windows, which is why we covered more topics on Windows.
However, for enterprise software applications, UNIX or Linux platforms are the most likely choice for some customers, so you might need to test out your software on these platforms as well. Instead of repeating what is already available in many UNIX/Linux texts, in the next section, I'll show you a simple script that can be used to capture the CPU and memory consumptions for the processes that you are concerned with. This probably is sufficient for most of your performance test needs. In a production environment, UNIX/Linux systems are typically managed by professional administrators who have developed special ways of capturing various system performance counters or simply use tools provided by vendors. That is beyond the scope of this book.
Figure 3.45. Process view from Windows Task Manager.
Figure 3.46. Select columns for process view.
3.7.6. UNIX Platforms
On UNIX and Linux systems, vendors provide their own system performance monitoring tools, although some common utilities such as sar might be available on all specially flavored platforms.
Performance troubleshooting often requires monitoring resource utilizations on a per-process basis. This might be a little bit more challenging on UNIX systems than on Windows systems. On Windows, you use perfmon to configure which processes and what counters you want to monitor. On UNIX systems, you need a script to do the same job. Here, I'd like to share one script that I often use on a specially flavored, popular UNIX platform for capturing CPU and memory utilizations when my tests were running. Since it's written in bash shell, it could be run on other UNIX and Linux systems as well.
Here is the script that you can adapt to your needs for monitoring systems resource usages on a per-process basis in your UNIX or Linux environment:
    #!/bin/bash
    # Sampling interval in seconds
    sleepTime=60
    # String that identifies the process to monitor
    pattern="yourPattern"
    x=0
    # Look up the process ID of the matching process
    procId=$(ps -eo pid,pcpu,time,args | grep "$pattern" | \
        grep -v grep | awk '{print $1}')
    echo "procId=" $procId
    # Poll forever; stop the script manually when the test is done
    while [ $x -ge 0 ]
    do
        date=$(date)
        ps0=$(ps -o vsz,rss,pcpu -p $procId)
        x=$((x+1))
        echo $x $date $ps0 | awk '{print $1, $3, $4, $5, $8, \
            $11/1000.0, $9, $12/1000.0, $10, $13, $14}'
        sleep $sleepTime
    done

As you see, this is a bash script. You need to specify how often you want to sample (sleepTime) and enter a string (pattern) that represents your process. Then you extract the process ID of that process. Using that process ID, you keep polling in an infinite loop for the counters you want to record. In this script, I was most interested in three counters, vsz, rss, and pcpu, which represent the virtual memory size, resident memory size, and CPU usage associated with that process. The counters vsz and rss are roughly equivalent to the virtual bytes and private bytes counters of perfmon on Windows, respectively. These counters are very useful for monitoring memory growth associated with a process.
To execute this script, change to the bash shell environment and issue the following command:
prompt> ./scriptFileName > filename.txt &
The output is directed to a text file that you can analyze later. The fields in the text file are separated by spaces, so you can import it into an Excel spreadsheet as space-delimited text to draw the charts you are interested in. Remember that this is an infinite loop, so you need to bring it to the foreground using the command fg and stop it with CTRL + C after you are done.
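If you prefer a comma-delimited file with a header row, the output can be converted with a short filter. This is a sketch that assumes each output line holds whitespace-separated fields in the order given by the awk print statement in the script (sample number, month, day, time, the VSZ label and value, the RSS label and value, the %CPU label and value), which in turn depends on the date format of your system. The function name to_csv and the column names are hypothetical.

```shell
#!/bin/bash
# Rewrite the monitoring output as comma-delimited text with a header row.
# stdin: lines assumed to look like
#   1 Aug 25 10:00:00 VSZ 123.456 RSS 45.678 %CPU 1.5
to_csv() {
    awk 'BEGIN { print "sample,time,vsz,rss,pcpu" }
         # Skip lines that do not have all ten expected fields.
         NF >= 10 { print $1 "," $2 " " $3 " " $4 "," $6 "," $8 "," $10 }'
}

# Hypothetical output line from the monitoring script.
echo "1 Aug 25 10:00:00 VSZ 123.456 RSS 45.678 %CPU 1.5" | to_csv
```

With a real log you would run it as to_csv < filename.txt > filename.csv.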
If you cannot run this script, you might need to execute the command chmod 700 scriptFileName to set proper permissions.
Also, this simple script can be modified to monitor multiple processes in a single script.
In the next section, I'll propose several software performance data principles to help enforce the notion that software performance and scalability testing is not just about gathering data. It's about getting data that has value for your company and your customers.