3 posts tagged 'memory'

  1. 2011.12.24 Using Memory Correctly by CEOinIRVINE
  2. 2008.12.25 'Orphan Train' carried children to new lives by CEOinIRVINE
  3. 2008.10.02 Linux Memory/CPU administration by CEOinIRVINE

Using Memory Correctly

Online Game 2011. 12. 24. 03:23

Did you ever hear the joke about the programmer trying to beat the Devil in a coding contest? Part of his solution involved overcoming a memory limitation by storing a few bytes in a chain of sound waves between the microphone and the speaker. That’s an interesting idea, and I’ll bet we would have tried that one on Ultima VII had someone on our team thought of it.

Memory comes in very different shapes, sizes, and speeds. If you know what you’re doing, you can write programs that make efficient use of these different memory blocks. If you believe that it doesn’t matter how you use memory, you’re in for a real shock. This includes assuming that the standard memory manager for your operating system is efficient; it usually isn’t, and you’ll have to think about writing your own.

Understanding the Different Kinds of Memory

The system RAM is the main warehouse for storage, as long as the system has power. Video RAM or VRAM is usually much smaller and is specifically used for storing objects that will be used by the video card. Some platforms, such as Xbox and Xbox360, have a unified memory architecture that makes no distinctions between RAM and VRAM. Desktop PCs run operating systems like Windows Vista, and have virtual memory that mimics much larger memory space by swapping blocks of little-used RAM to your hard disk. If you’re not careful, a simple memcpy() could cause the hard drive to seek, which to a computer is like waiting for the sun to cool off.

System RAM

Your system RAM is a series of memory sticks that are installed on the motherboard. Depending on the hardware, each byte may be stored with an extra ninth bit that is used to catch memory parity errors. Depending on the OS, you get to play with a certain addressable range of memory; the operating system keeps some to itself. The portion you do get to play with is divided into three parts when your application loads:

  • Global memory: This memory never changes size. It is allocated when your program loads and stores global variables, text strings, and virtual function tables.

  • Stack: This memory grows as your code makes deeper nested function calls, and it shrinks as each call returns. The stack is used for parameters in function calls and local variables. The stack has a fixed size that can be changed with compiler settings.

  • Heap: This memory grows and shrinks with dynamic memory allocation. It is used for persistent objects and dynamic data structures.

Old-timers used to call global memory the DATA segment, harkening back to the days when there used to be near memory and far memory. It was called that because programmers used different pointers to get to it. What a disgusting practice! Everything is much cleaner these days because each pointer is a full 32 bits. (Don’t worry, I’m not going to bore you with the “When I went to school I used to load programs from a linear access tape cassette” story.)

Your compiler and linker will attempt to optimize the location of anything you put into the global memory space based on the type of variable. This includes constant text strings. Many compilers, including Visual Studio, will attempt to store text strings only once to save space:

#include <cstdio>

const char *error1 = "Error";
const char *error2 = "Error";

int main()
{
   printf("%p\n", (const void *)error1);
   // How quaint. A printf.
   printf("%p\n", (const void *)error2);
   return 0;
}

This code yields interesting results. You’ll notice that under Visual C++, the two pointers point to the same text string in the global address space. Even better than that, the text string is one that was already global and stuck in the CRT libraries. It’s as if we wasted our time typing “Error.” This trick only works for constant text strings, since the compiler knows they can never change; everything else gets its own space. If you want the compiler to consolidate equivalent text strings, declare them const.

Don’t make the mistake of counting on some kind of rational order to the global addresses. You can’t count on anything the compiler or linker will do, especially if you are considering crossing platforms.

On most operating systems, the stack starts at high addresses and grows toward lower addresses. C and C++ parameters get pushed onto the stack from right to left—the last parameter is the first to get pushed onto the stack in a function call. Local variables get pushed onto the stack in their order of appearance:

#include <cstdio>
#include <cstdint>

void testStack(int x, int y)
{
   int a = 1;
   int b = 2;

   // The casts keep printf's %x happy on both 32- and 64-bit targets.
   printf("&x= %-10x &y= %-10x\n", (unsigned)(uintptr_t)&x, (unsigned)(uintptr_t)&y);
   printf("&a= %-10x &b= %-10x\n", (unsigned)(uintptr_t)&a, (unsigned)(uintptr_t)&b);
}

This code produces the following output:

&x= 12fdf0  &y= 12fdf4
&a= 12fde0  &b= 12fdd4

Stack addresses grow downward to smaller memory addresses. Thus, it should be clear that the order in which the parameters and local variables were pushed was y, x, a, and b: parameters from right to left, then locals in their order of appearance. The next time you’re debugging some assembler code, you’ll be glad to understand this, especially if you are setting your instruction pointer by hand.

C++ allows a high degree of control over the local scope. Every time you enclose code in a set of braces, you open a local scope with its own local variables:

#include <cstdio>

int main()
{
   int a = 0;
   {           // start a local scope here...
      int a = 1;   // this 'a' shadows the outer one
      printf("%d\n", a);
   }

   printf("%d\n", a);
   return 0;
}

This code compiles and runs just fine. The two integer variables are completely separate entities. I’ve written this example to make a clear point, but I’d never actually write code like this. Doing something like this in Texas is likely to get you shot. The real usefulness of this kind of code is for use with C++ objects that perform useful tasks when they are destroyed—you can control the exact moment a destructor is called by closing a local scope.
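Here is a minimal sketch of that idiom. The ScopedLog class is my own invention for illustration; any object that releases a resource in its destructor works the same way:

#include <cstdio>

class ScopedLog
{
public:
   ScopedLog(const char *name) : m_name(name) { printf("enter %s\n", m_name); }
   ~ScopedLog()                               { printf("leave %s\n", m_name); }
private:
   const char *m_name;
};

int main()
{
   ScopedLog outer("outer");
   {
      ScopedLog inner("inner");
      // ...work that needs inner's resource...
   }   // inner's destructor runs right here, at the closing brace
   printf("inner is already gone\n");
   return 0;
}

This is exactly how lock guards and other RAII helpers work: the closing brace, not the end of the function, decides when the resource is released.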

Video Memory (VRAM)

Video RAM is the memory installed on your video card, unless we’re talking about an Xbox. Xbox hardware has a unified memory architecture, or UMA, so there’s no difference between system RAM and VRAM. It would be nice if the rest of the world worked that way. Other hardware, such as the Intel architectures, must send any data between VRAM and system RAM over a bus. The PS2 has even more kinds of memory. There are quite a few bus architectures and speeds out there, and it is wise to understand how reading and writing data across the bus affects your game’s speed.

As long as the CPU doesn’t have to read from VRAM, everything clicks along pretty fast. If you need to grab a piece of VRAM for something, the bits have to be sent across the bus to system RAM. Depending on your architecture, your CPU and GPU must argue for a moment about timing, stream the bits, and go their separate ways. While this painful process is occurring, your game has come to a complete halt.

This problem was pretty horrific back in the days of fixed function pipelines when anything not supported by the video card had to be done with the CPU, such as the first attempts at motion blur. With programmable pipelines, you can create shaders that can run directly on the bits stored in VRAM, making this kind of graphical effect extremely efficient.

The hard disk can’t write straight to VRAM, so every time a new texture is needed you’ll need to stop the presses, so to speak. The smart approach is to limit any communication needed between the CPU and the video card. If you are going to send anything to it, it is best to send it in batches.

If you’ve been paying attention, you’ll realize that the GPU in your video card is simply painting the screen using the components in VRAM. If it ever has to stop and ask system RAM for something, your game won’t run as fast as it could.

Mr. Mike’s First Texture Manager

The first texture manager I ever wrote was for Ultima IX. (That was before the game was called Ultima: Ascension.) I wrote the texture manager for 3DFx’s Glide API, and I had all of an hour to do it. We wanted to show some Origin execs what Ultima looked like running under hardware acceleration. Not being a programmer extraordinaire, and with so little time to work, I had to keep my algorithm pretty simple. I chose a variant of LRU, but since I didn’t have time to write the code to sort and organize the textures, I simply threw out every texture in VRAM the moment there wasn’t any additional space. I think this code got some nomination for the dumbest texture manager ever written, but it actually worked. The player would walk around for 90 seconds or so before the hard disk lit up and everything came to a halt for two seconds. I’m pretty sure someone rewrote it before U9 shipped. At least, I hope someone rewrote it!


Optimizing Memory Access

Every access to system RAM goes through the CPU cache. If the desired memory location is already in the cache, the contents of the memory location are presented to the CPU extremely quickly. If, on the other hand, the memory is not in the cache, a new block of system RAM must be fetched into the cache. This takes a lot longer than you’d think.

A good test bed for this problem uses multidimensional arrays. C++ defines its arrays in row major order. This ordering puts the members of the right-most index next to each other in memory.

TestData[0][0][0] and TestData[0][0][1] are stored in adjacent memory locations.

Row Order or Column Order?

Not every language defines arrays in row order. FORTRAN, for example, defines arrays in column order. Don’t make assumptions unless you like writing slow code.


If you access an array in the wrong order, it will create a worst-case CPU cache scenario. Here’s an example of two functions that access the same array and do the same task. One will run much faster than the other:

const int g_n = 250;
float TestData[g_n][g_n][g_n];

inline void column_ordered()
{
  for (int k=0; k<g_n; k++)           // K
     for (int j=0; j<g_n; j++)        // J
        for (int i=0; i<g_n; i++)     // I
           TestData[i][j][k] = 0.0f;
}

inline void row_ordered()
{
  for (int i=0; i<g_n; i++)           // I
     for (int j=0; j<g_n; j++)        // J
        for (int k=0; k<g_n; k++)     // K
           TestData[i][j][k] = 0.0f;
}

The timed output of running both functions on my test machine showed that accessing the array in row order was more than nine times faster:

Column Ordered=2817 ms  Row Ordered=298 ms  Delta=2519 ms

Any code that accesses any largish data structure can benefit from this technique. If you have a multistep process that affects a large data set, try to arrange your code to perform as much work as possible in smaller memory blocks. You’ll optimize the use of the L2 cache and make a much faster piece of code. While you surely won’t have any piece of runtime game code do something this crazy, you might very well have a game editor or production tool that does.
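One way to arrange that work is loop tiling: walk the array in small square blocks so each block stays resident in cache while you finish with it. A minimal sketch, where the 32-element tile size is an arbitrary assumption you would tune for your target’s cache:

const int N = 1024;        // N must be a multiple of TILE in this sketch
const int TILE = 32;
float data[N][N];

void process_tiled()
{
   // Do all the work on one TILE x TILE block before moving on,
   // so the block is fetched into cache only once.
   for (int i0 = 0; i0 < N; i0 += TILE)
      for (int j0 = 0; j0 < N; j0 += TILE)
         for (int i = i0; i < i0 + TILE; ++i)
            for (int j = j0; j < j0 + TILE; ++j)
               data[i][j] = data[i][j] * 0.5f + 1.0f;
}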

Memory Alignment

The CPU reads and writes memory-aligned data much faster than other data. Any N-byte data type is memory aligned if the starting address is evenly divisible by N. For example, a 32-bit integer is memory aligned on a 32-bit architecture if the starting address is 0x04000000. The same 32-bit integer is unaligned if the starting address is 0x04000002, since the memory address is not evenly divisible by 4 bytes.
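The rule is just a modulo test, as this little sketch shows:

#include <cstddef>
#include <cstdint>
#include <cstdio>

// True if 'address' is correctly aligned for an n-byte type.
bool isAligned(uintptr_t address, size_t n)
{
   return (address % n) == 0;
}

int main()
{
   printf("%d\n", (int)isAligned(0x04000000, 4));   // 1: aligned for a 4-byte int
   printf("%d\n", (int)isAligned(0x04000002, 4));   // 0: unaligned
   return 0;
}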

You can perform a little experiment in memory alignment and how it affects access time by using example code like this:

#pragma pack(push, 1)
struct ReallySlowStruct
{
   char c : 6;
   __int64 d : 64;
   int b : 32;
   char a : 8;
};

struct SlowStruct
{
   char c;
   __int64 d;
   int b;
   char a;
};

struct FastStruct
{
   __int64 d;
   int b;
   char a;
   char c;
   char unused[2];
};

#pragma pack(pop)

I wrote a piece of code to perform some operations on the member variables in each structure. The difference in times is as follows:

Really slow=417 ms
Slow=222 ms
Fast=192 ms

Your penalty for using the SlowStruct over FastStruct is about 14 percent on my test machine. The penalty for using ReallySlowStruct is code that runs twice as slowly.

The first structure isn’t even aligned properly on bit boundaries, hence the name ReallySlowStruct. The definition of the 6-bit char variable throws the entire structure out of alignment. The second structure, SlowStruct, is also out of alignment, but at least the byte boundaries are aligned. The last structure, FastStruct, is completely aligned for each member. The last member, unused, ensures that the structure fills out to an 8-byte boundary in case someone declares an array of FastStruct.

Notice the #pragma pack(push, 1) at the top of the source example? It’s accompanied by a #pragma pack(pop) at the bottom. Without them, the compiler, depending on your project settings, will choose to spread out the member variables and place each one on an optimal byte boundary. When the member variables are spread out like that, the CPU can access each member quickly, but all that unused space can add up. If the compiler were left to optimize SlowStruct by adding unused bytes, each structure would be 24 bytes instead of just 14. Seven extra bytes are padded after the first char variable, and the remaining bytes are added at the end. This ensures that the entire structure always starts on an 8-byte boundary. That’s about 40 percent of wasted space, all due to a careless ordering of member variables.

Don’t let the compiler waste precious memory space. Put some of your brain cells to work and align your own member variables. You don’t get many opportunities to save memory and optimize CPU at the same time.
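If you do lay out members by hand, it is worth adding compile-time checks so a later edit can’t silently reintroduce padding. A sketch against the FastStruct defined above, using C++11’s static_assert (on older compilers, a negative-array-size macro does the same job):

#include <cstddef>   // offsetof

static_assert(sizeof(FastStruct) == 16, "padding crept in");          // 8+4+1+1+2
static_assert(offsetof(FastStruct, d) % 8 == 0, "d must be 8-byte aligned");
static_assert(offsetof(FastStruct, b) % 4 == 0, "b must be 4-byte aligned");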

Virtual Memory

Virtual memory increases the addressable memory space by caching unused memory blocks to the hard disk. The scheme depends on the fact that even though you might have a 500MB data structure, you aren’t going to be playing with the whole thing at the same time. The unused bits are saved off to your hard disk until you need them again. You should be cheering and wincing at the same time. Cheering because every programmer likes having a big memory playground, and wincing because anything involving the hard disk wastes a lot of time.

Just to see how bad it can get, I took the code from the array access example and modified it to iterate through a three-dimensional array 500 elements cubed. The total size of the array would be 476MB, much bigger than the installed memory on the test machine. A data structure bigger than available memory is sometimes called out-of-core. I ran the column_ordered() function and went to lunch. When I got back about 30 minutes later, the test program was still chugging away. The hard drive was seeking like mad, and I began to wonder whether my hard disk would give out. I became impatient, re-ran the example, and timed just one iteration of the inner loop: it took 379.75 seconds. At that rate, the entire run would have taken over 50 hours. I’m glad I didn’t wait. Any game written badly can suffer the same fate, and as you can see, the difference between running quickly and paging constantly to your hard disk can be as small as a single byte.

Remember that the original array, 250 elements cubed, ran the test code in 298ms when the fast row_ordered() function was used. The large array is only eight times bigger, giving an expectation that the same code should have run in 2384ms, or just under two-and-a-half seconds.

Compare 2384ms with 50 hours, and you’ll see how virtual memory can work against you if your code accesses virtual memory incorrectly.

Cache Misses Can Cost You Dearly

Any time a cache is used inefficiently, you can degrade the overall performance of your game by many orders of magnitude. This is commonly called “thrashing the cache” and is your worst nightmare. If your game is thrashing cache, you might be able to solve the problem by reordering some code, but most likely you will need to reduce the size of the data.


Writing Your Own Memory Manager

Most games extend the provided memory management system. The biggest reasons to do this are performance, efficiency, and improved debugging. Default memory managers in the C runtime are designed to run fairly well in a wide range of memory allocation scenarios. They tend to break down under the load of computer games, though, where allocations and deallocations of relatively tiny memory blocks can be fast and furious.

A standard memory manager, like the one in the C runtime, must support multithreading. Each time the memory manager’s data structures are accessed or changed, they must be protected with critical sections, allowing only one thread to allocate or deallocate memory at a time. All this extra code is time-consuming, especially if you use malloc and free very frequently. Most games are multithreaded to support sound systems but don’t necessarily need a multithreaded memory manager for every part of the game. A single-threaded memory manager that you write yourself might be a good solution.

The Infamous Voodoo Memory Manager

Ultima VII: The Black Gate had a legendary memory manager: The VooDoo Memory Management System. It was written by a programmer who used to work on guided missile systems for the Department of Defense, a brilliant and dedicated engineer. U7 ran in good old DOS back in the days when protected mode was the neat new thing. VooDoo was a true 32-bit memory system for a 16-bit operating system, and the only problem with it was you had to read and write to the memory locations with assembly code, since the Borland compiler didn’t understand 32-bit pointers. It was done this way because U7 couldn’t really exist in a 16-bit memory space—there were atomic data structures larger than 64KB. For all its hoopla, VooDoo was actually pretty simple, and it only provided the most basic memory management features. The fact that it was actually called VooDoo was a testament to the fact that it actually worked; it wasn’t exactly supported by the operating system or the Borland compilers.

VooDoo MM for Ultima VII is a great example of writing a simple memory manager to solve a specific problem. It didn’t support multithreading, it assumed that memory blocks were large, and finally it wasn’t written to support a high number or frequency of allocations.


Simple memory managers can use a doubly-linked list as the basis for keeping track of allocated and free memory blocks. The C runtime uses a more complicated system to reduce the algorithmic complexity of searching through the allocated and free blocks that could be as small as a single byte. Your memory blocks might be either more regularly shaped, fewer in number, or both. This creates an opportunity to design a simpler, more efficient system.

Default memory managers must assume that deallocations happen approximately as often as allocations, and they might happen in any order and at any time. Their data structures have to keep track of a large number of blocks of available and used memory. Any time a piece of memory changes state from used to available, the data structures must be quickly traversed. When blocks become available again, the memory manager must detect adjacent available blocks and merge them to make a larger block. Finding free memory of an appropriate size to minimize wasted space can be extremely tricky. Since default memory managers solve these problems to a large extent, their performance isn’t as high as another memory manager that can make more assumptions about how and when memory allocations occur.

If your game can allocate and deallocate most of its dynamic memory space at once, you can write a memory manager based on a data structure no more complicated than a singly-linked list. You’d never use something this simple in the general case, of course, because searching a singly-linked list is an O(n) operation; that would cripple a general-purpose memory manager.
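A minimal sketch of that idea: fixed-size blocks carved from one preallocated pool, with frees pushed back onto a singly-linked free list. The names are my own, and alignment and thread safety are deliberately ignored:

#include <cstdlib>

class BlockPool
{
public:
   BlockPool(size_t blockSize, size_t blockCount)
   {
      m_blockSize = (blockSize < sizeof(Node)) ? sizeof(Node) : blockSize;
      m_pool = static_cast<char *>(malloc(m_blockSize * blockCount));
      m_free = 0;
      // Thread every block onto the free list up front.
      for (size_t i = 0; i < blockCount; ++i)
      {
         Node *n = reinterpret_cast<Node *>(m_pool + i * m_blockSize);
         n->next = m_free;
         m_free = n;
      }
   }
   ~BlockPool() { free(m_pool); }

   void *alloc()
   {
      if (!m_free)
         return 0;               // exhausted; caller can fall back to malloc
      Node *n = m_free;
      m_free = n->next;          // pop the free-list head: O(1)
      return n;
   }

   void dealloc(void *p)
   {
      Node *n = static_cast<Node *>(p);
      n->next = m_free;          // push back onto the free list: O(1)
      m_free = n;
   }

private:
   struct Node { Node *next; };
   size_t m_blockSize;
   char  *m_pool;
   Node  *m_free;
};

Because every block is the same size, there is nothing to search and no adjacent blocks to merge; allocation and deallocation are both a single pointer swap.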

A good reason to extend a memory manager is to add some debugging features. Two common additions are extra bytes placed before and after each allocation to detect memory corruption, and bookkeeping to track memory leaks. The C runtime adds only one byte before and after an allocated block, which might be fine to catch those pesky x+1 and x-1 errors but doesn’t help for much else. If the memory corruption seems pretty random, and most of them sure seem that way, you can increase your odds of catching the culprit by writing a custom manager that adds more bytes to the beginning and ending of each block. In practice, the extra space is set to a small number, even one byte, in the release build.

Different Build Options will Change Runtime Behavior

Anything you do differently from the debug and release builds can change the behavior of bugs from one build target to another. Murphy’s Law dictates that the bug will only appear in the build target that is hardest, or even impossible, to debug.


Another common extension to memory managers is leak detection. It is a common practice to redefine the new operator to add __FILE__ and __LINE__ information to each allocated memory block in debug mode. When the memory manager is shut down, all the unfreed blocks are printed out in the output window in the debugger. This should give you a good place to start when you need to track down a memory leak.
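A common shape for that redefinition, sketched here with a deliberately toy tracking function (a real manager would record each entry in a table keyed by the returned pointer and dump unfreed entries at shutdown, and would pair the overload with a matching operator delete for exception safety):

#include <cstdlib>

// Toy stand-in for the manager's record-keeping.
void *TrackAlloc(size_t size, const char *file, int line)
{
   (void)file; (void)line;   // a real version stores these with the block
   return malloc(size);
}

void *operator new(size_t size, const char *file, int line)
{
   return TrackAlloc(size, file, line);
}

// Route allocations through the tracking overload in debug builds only.
#if defined(_DEBUG)
   #define MY_NEW new(__FILE__, __LINE__)
#else
   #define MY_NEW new
#endif

A call like MY_NEW int expands to new(__FILE__, __LINE__) int in debug builds, so every tracked block knows the file and line that allocated it. (Redefining new itself with #define new new(__FILE__, __LINE__) is the traditional trick, but it breaks placement new, which is why a named macro is safer.)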

If you decide to write your own memory manager, keep the following points in mind:

  • Data structures: Choose the data structure that matches your memory allocation scenario. If you traverse a large number of free and available blocks very frequently, choose a hash table or tree-based structure. If you hardly ever traverse it to find free blocks, you could get away with a list. Store the data structure separately from the memory pool, so that corruption of pool memory leaves your memory manager’s data structures intact.

  • Single/multithreaded access: Don’t forget to add appropriate code to protect your memory manager from multithreaded access if you need it. Eliminate the protections if you are sure that access to the memory manager will only happen from a single thread, and you’ll gain some performance.

  • Debug and testing: Allocate a little additional memory before and after the block to detect memory corruption. Add caller information to the debug memory blocks; at a minimum, you should use __FILE__ and __LINE__ to track where the allocation occurred.

One of the best reasons to extend the C runtime memory manager is to write a better system to manage small memory blocks. The memory managers supplied in the C runtime or MFC library are not meant for tiny allocations. You can prove it to yourself by allocating two integers and subtracting their memory addresses as shown here:

#include <cstdint>

int *a = new int;
int *b = new int;

// Cast through intptr_t rather than int so the pointer
// difference isn't truncated on a 64-bit build.
intptr_t delta1 = ((intptr_t)b - (intptr_t)a) - (intptr_t)sizeof(int);

The wasted space for the C runtime library was 28 bytes for a release build and 60 bytes for the debug build under Visual Studio. Even with the release build, an integer takes eight times as much memory space as it would if it weren’t dynamically allocated.

Most games overload the new operator to allocate small blocks of memory from a reserved pool set aside for smaller allocations. Memory allocations that are larger than a set number of bytes can still use the C runtime. I recommend that you start with 128 bytes as the largest block your small allocator will handle and tweak the size until you are happy with the performance.
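A sketch of that routing, where the 128-byte threshold is the tunable. SmallAlloc and SmallFree stand in for a real pool like the free-list one sketched earlier; they are toy versions here so the example links:

#include <cstdlib>

const size_t kSmallBlockSize = 128;   // largest request the small allocator handles

// Toy stand-ins for a real small-block pool.
static void *SmallAlloc(size_t size) { return malloc(size); }
static bool  SmallFree(void *p)      { free(p); return true; }  // true if p was pool memory

void *operator new(size_t size)
{
   if (size <= kSmallBlockSize)
   {
      if (void *p = SmallAlloc(size))
         return p;                  // fast path: small block from the pool
   }
   return malloc(size);             // large request, or pool exhausted
   // A production version would throw std::bad_alloc on failure.
}

void operator delete(void *p) throw()
{
   if (p && !SmallFree(p))          // not pool memory? hand it back to malloc
      free(p);
}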

Posted by CEOinIRVINE

'Orphan Train' carried children to new lives

PUEBLO, Colorado (CNN) -- Orphan Train rider Stanley Cornell's oldest memory is of his mother's death in 1925.

Stanley Cornell, right, and his younger brother, Victor, were adopted from an "Orphan Train."

"My first feeling was standing by my mom's bedside when she was dying. She died of tuberculosis," recalls Cornell. "I remember her crying, holding my hand, saying to 'be good to Daddy.' "

"That was the last I saw of her. I was probably four," Cornell says of his mother, Lottie Cornell, who passed away in Elmira, New York.

His father, Floyd Cornell, was still suffering the effects of nerve gas and shell shock after serving as a soldier in combat during WWI. That made it difficult for him to keep steady work or care for his two boys.

"Daddy Floyd," as Stanley Cornell calls his birth father, eventually contacted the Children's Aid Society. The society workers showed up in a big car with candy and whisked away Stanley and his brother, Victor, who was 16 months younger. Photo See the Cornell family album »

Stanley Cornell remembers his father was crying and hanging on to a post. The little boy had a feeling he would not see his father again.

The two youngsters were taken to an orphanage, the Children's Aid Society of New York, founded by social reformer Charles Loring Brace.

"It was kind of rough in the orphans' home," Cornell remembers, adding that the older children preyed on the younger kids -- even though officials tried to keep them separated by chicken wire fences. He says he remembers being beaten with whips like those used on horses.

New York City in 1926 was teeming with tens of thousands of homeless and orphaned children. These so-called "street urchins" resorted to begging, stealing or forming gangs to commit violence to survive. Some children worked in factories and slept in doorways or flophouses.

The Orphan Train movement took Stanley Cornell and his brother out of the city during the last part of a mass relocation movement for children called "placing out."

Brace's agency took destitute children, in small groups, by train to small towns and farms across the country, with many traveling to the West and Midwest. From 1854 to 1929, more than 200,000 children were placed with families across 47 states. It was the beginning of documented foster care in America.

"It's an exodus, I guess. They called it Orphan Train riders that rode the trains looking for mom and dad like my brother and I."

"We'd pull into a train station, stand outside the coaches dressed in our best clothes. People would inspect us like cattle farmers. And if they didn't choose you, you'd get back on the train and do it all over again at the next stop."

Cornell and his brother were "placed out" twice with their aunts in Pennsylvania and Coffeyville, Kansas. But their placements didn't last and they were returned to the Children's Aid Society.

"Then they made up another train. Sent us out West. A hundred-fifty kids on a train to Wellington, Texas," Cornell recalls. "That's where Dad happened to be in town that day."

Each time an Orphan Train was sent out, adoption ads were placed in local papers before the arrival of the children.

J.L. Deger, a 45-year-old farmer, knew he wanted a boy even though he already had two daughters ages 10 and 13.

"He'd just bought a Model T. Mr. Deger looked those boys over. We were the last boys holding hands in a blizzard, December 10, 1926," Cornell remembers. He says that day he and his brother stood in a hotel lobby.

"He asked us if we wanted to move out to farm with chickens, pigs and a room all to your own. He only wanted to take one of us, decided to take both of us."

Life on the farm was hard work.

"I did have to work and I expected it, because they fed me, clothed me, loved me. We had a good home. I'm very grateful. Always have been, always will be."

Taking care of a family wasn't always easy.

"In 1931, the Dust Bowl days started. The wind never quit. Sixty, 70 miles an hour, all that dust. It was a mess. Sometimes, Dad wouldn't raise a crop in two years."

A good crop came in 1940. With his profit in hand, "first thing Dad did was he took that money and said, 'we're going to repay the banker for trusting us,' " Cornell says.

When World War II began, Cornell joined the U.S. Army Signal Corps. He shipped out to Africa and landed near Casablanca, Morocco, where he laid telephone and teletype lines. Later he served in Egypt and northern Sicily. While in Italy, he witnessed Mount Vesuvius erupting.

It was on a telephone line-laying mission between Naples and Rome that Cornell suffered his first of three wounds.

"Our jeep was hit by a bomb. I thought I was in the middle of the ocean. It was the middle of January and I was in a sea of mud."

With their jeep destroyed and Cornell bleeding from a head wound, his driver asked a French soldier to use his vehicle to transport them. The Frenchman refused to drive Cornell the five miles to the medical unit.

"So, the driver pulled out his pistol, put the gun to the French soldier's head and yelled, 'tout suite!' or 'move it!' " Cornell recalls.

Once he was treated, Cornell remembers the doctor saying, "You've got 30 stitches in your scalp. An eighth of an inch deeper and you'd be dead."

Cornell always refused to accept his commendations for a Purple Heart even though he'd been wounded three times, twice severely enough to be hospitalized for weeks. He felt the medals were handed out too often to troops who suffered the equivalent of a scratch.

His younger brother served during the war in the Air Force at a base in Nebraska, where he ran a film projector at the officers' club.

As WWII was drawing to a close, Stanley Cornell headed up the teletype section at Allied headquarters in Reims, France. "I saw [Gen. Dwight] Eisenhower every day," he recalls.

On May 7, 1945, the Nazis surrendered. "I sent the first teletype message from Eisenhower saying the war was over with Germany," Cornell says.

In 1946, the 25-year-old Stanley Cornell met with his 53-year-old birth father, Daddy Floyd. It was the last time they would see each other.

Cornell eventually got married and he and his wife, Earleen, adopted two boys, Dana and Dennis, when each was just four weeks old.

"I knew what it was like to grow up without parents," Cornell says. "We were married seven years and couldn't have kids, so I asked my wife, 'how about adoption?' She'd heard my story before and said, 'OK.' "

After they adopted their two boys, Earleen gave birth to a girl, Denyse.

Dana Cornell understands what his father and uncle went through.

"I don't think [Uncle] Vic and Stan could have been better parents. I can relate, you know, because Dad adopted Dennis and me. He has taught me an awful lot over the years," Dana Cornell says.

Dana Cornell says his adoptive parents have always said that if the boys wanted to find their birth parents, they would help. But he decided not to because of how he feels about the couple who adopted him. "They are my parents and that's the way it's gonna be."

Posted by CEOinIRVINE

Linux Memory/CPU administration

IT 2008. 10. 2. 03:02

 

Red Hat Linux comes with a variety of resource monitoring tools. While there are more than those listed here, these tools are representative in terms of functionality. The tools are:

  • free

  • top (and GNOME System Monitor, a more graphically oriented version of top)

  • vmstat

  • The Sysstat suite of resource monitoring tools

Let us look at each one in more detail.

1. free

The free command displays system memory utilization. Here is an example of its output:

             total       used       free     shared    buffers     cached
Mem:        255508     240268      15240          0       7592      86188
-/+ buffers/cache:     146488     109020
Swap:       530136      26268     503868

The Mem: row displays physical memory utilization, the Swap: row displays the utilization of the system swap space, and the -/+ buffers/cache: row shows the used and free figures adjusted to exclude memory currently devoted to system buffers and cache.

Since free by default only displays memory utilization information once, it is only useful for very short-term monitoring, or quickly determining if a memory-related problem is currently in progress. Although free has the ability to repetitively display memory utilization figures via its -s option, the output scrolls, making it difficult to easily see changes in memory utilization.

Tip

A better solution than using free -s would be to run free using the watch command. For example, to display memory utilization every two seconds (the default display interval), use this command:

watch free

The watch command issues the free command every two seconds, after first clearing the screen. This makes it much easier to see how memory utilization changes over time, as it is not necessary to scan continually scrolling output. You can control the delay between updates by using the -n option, and can cause any changes between updates to be highlighted by using the -d option, as in the following command:

watch -n 1 -d free

For more information, refer to the watch man page.

The watch command runs until interrupted with [Ctrl]-[C]. The watch command is something to keep in mind; it can come in handy in many situations.

2. top

While free displays only memory-related information, the top command does a little bit of everything. CPU utilization, process statistics, memory utilization — top does it all. In addition, unlike the free command, top's default behavior is to run continuously; there is no need to use the watch command. Here is a sample display:

11:13am  up 1 day, 31 min,  5 users,  load average: 0.00, 0.05, 0.07
89 processes: 85 sleeping, 3 running, 1 zombie, 0 stopped
CPU states:  0.5% user,  0.7% system,  0.0% nice, 98.6% idle
Mem:  255508K av, 241204K used,  14304K free,    0K shrd,   16604K buff
Swap: 530136K av,  56964K used, 473172K free                64724K cached

  PID USER   PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
 8532 ed      16   0  1156 1156   912 R     0.5  0.4   0:11 top
 1520 ed      15   0  4084 3524  2752 S     0.3  1.3   0:00 gnome-terminal
 1481 ed      15   0  3716 3280  2736 R     0.1  1.2   0:01 gnome-terminal
 1560 ed      15   0 11216  10M  4256 S     0.1  4.2   0:18 emacs
    1 root    15   0   472  432   416 S     0.0  0.1   0:04 init
    2 root    15   0     0    0     0 SW    0.0  0.0   0:00 keventd
    3 root    15   0     0    0     0 SW    0.0  0.0   0:00 kapmd
    4 root    34  19     0    0     0 SWN   0.0  0.0   0:00 ksoftirqd_CPU0
    5 root    15   0     0    0     0 SW    0.0  0.0   0:00 kswapd
    6 root    25   0     0    0     0 SW    0.0  0.0   0:00 bdflush
    7 root    15   0     0    0     0 SW    0.0  0.0   0:00 kupdated
    8 root    25   0     0    0     0 SW    0.0  0.0   0:00 mdrecoveryd
   12 root    15   0     0    0     0 SW    0.0  0.0   0:00 kjournald
   91 root    16   0     0    0     0 SW    0.0  0.0   0:00 khubd
  185 root    15   0     0    0     0 SW    0.0  0.0   0:00 kjournald
  186 root    15   0     0    0     0 SW    0.0  0.0   0:00 kjournald
  576 root    15   0   712  632   612 S     0.0  0.2   0:00 dhcpcd

The display is divided into two sections. The top section contains information related to overall system status — uptime, load average, process counts, CPU status, and utilization statistics for both memory and swap space. The lower section displays process-level statistics, the exact nature of which can be controlled while top is running.

Warning

Although top looks like a simple display-only program, this is not the case. top uses single character commands to perform various operations; if you are logged in as root, it is possible to change the priority and even kill any process on your system. Therefore, until you have reviewed top's help screen (type [?] to display it), it is safest to only type [q] (which exits top).

2.1. The GNOME System Monitor — A Graphical top

If you are more comfortable with graphical user interfaces, the GNOME System Monitor may be more to your liking. Like top, the GNOME System Monitor displays information related to overall system status, process counts, memory and swap utilization, and process-level statistics.

However, the GNOME System Monitor goes a step further by also including graphical representations of CPU, memory, and swap utilization, along with a tabular disk space utilization listing. Here is an example of the GNOME System Monitor's Process Listing display:

Figure 2-1. The GNOME System Monitor Process Listing Display

Additional information can be displayed for a specific process by first clicking on the desired process and then clicking on the More Info button.

To view the CPU, memory, and disk usage statistics, click on the System Monitor tab.

3. vmstat

For a more concise view of system performance, try vmstat. Using this resource monitor, it is possible to get an overview of process, memory, swap, I/O, system, and CPU activity in one line of numbers:

   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 1  0  0      0 524684 155252 338068   0   0     1     6  111   114  10   3  87

The process-related fields are:

  • r — The number of runnable processes waiting for access to the CPU

  • b — The number of processes in an uninterruptible sleep state

  • w — The number of processes swapped out, but runnable

The memory-related fields are:

  • swpd — The amount of virtual memory used

  • free — The amount of free memory

  • buff — The amount of memory used for buffers

  • cache — The amount of memory used as page cache

The swap-related fields are:

  • si — The amount of memory swapped in from disk

  • so — The amount of memory swapped out to disk

The I/O-related fields are:

  • bi — Blocks received from a block device (data read in from disk)

  • bo — Blocks sent to a block device (data written out to disk)

The system-related fields are:

  • in — The number of interrupts per second

  • cs — The number of context switches per second

The CPU-related fields are:

  • us — The percentage of the time the CPU ran user-level code

  • sy — The percentage of the time the CPU ran system-level code

  • id — The percentage of the time the CPU was idle

When vmstat is run without any options, only one line is displayed. This line contains averages, calculated from the time the system was last booted.

However, most system administrators do not rely on the data in this line, as the time over which it was collected varies. Instead, most administrators take advantage of vmstat's ability to repetitively display resource utilization data at set intervals. For example, the command vmstat 1 displays one new line of utilization data every second, while the command vmstat 1 10 displays one new line per second, but only for the next ten seconds.

In the hands of an experienced administrator, vmstat can be used to quickly determine resource utilization and performance issues. But to gain more insight into those issues, a different kind of tool is required — a tool capable of more in-depth data collection and analysis.

4. The Sysstat Suite of Resource Monitoring Tools

While the previous tools may be helpful for gaining more insight into system performance over very short time frames, they are of little use beyond providing a snapshot of system resource utilization. In addition, there are aspects of system performance that cannot be easily monitored using such simplistic tools.

Therefore, a more sophisticated tool is necessary. Sysstat is such a tool.

Sysstat contains the following tools related to collecting I/O and CPU statistics:

iostat

Displays an overview of CPU utilization, along with I/O statistics for one or more disk drives.

mpstat

Displays more in-depth CPU statistics.

Sysstat also contains tools that collect system resource utilization data and create daily reports based on that data. These tools are:

sadc

Known as the system activity data collector, sadc collects system resource utilization information and writes it to a file.

sar

sar produces reports from the files created by sadc. Reports can be generated interactively or written to a file for more intensive analysis.

The following sections explore each of these tools in more detail.

4.1. The iostat command

The iostat command at its most basic provides an overview of CPU and disk I/O statistics:

Linux 2.4.18-18.8.0 (pigdog.example.com)     12/11/2002

avg-cpu:  %user   %nice    %sys   %idle
           6.11    2.56    2.15   89.18

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
dev3-0            1.68        15.69        22.42   31175836   44543290

Below the first line (which displays the system's kernel version and hostname, along with the current date), iostat displays an overview of the system's average CPU utilization since the last reboot. The CPU utilization report includes the following percentages:

  • Percentage of time spent in user mode (running applications, etc.)

  • Percentage of time spent in user mode by processes that have altered their scheduling priority using nice(2)

  • Percentage of time spent in kernel mode

  • Percentage of time spent idle

Below the CPU utilization report is the device utilization report. This report contains one line for each active disk device on the system and includes the following information:

  • The device specification, displayed as dev<major-number>-<sequence-number>, where <major-number> is the device's major number[1], and <sequence-number> is a sequence number starting at zero.

  • The number of transfers (or I/O operations) per second.

  • The number of 512-byte blocks read per second.

  • The number of 512-byte blocks written per second.

  • The total number of 512-byte blocks read.

  • The total number of 512-byte blocks written.

This is just a sample of the information that can be obtained using iostat. For more information, see the iostat(1) man page.

4.2. The mpstat command

The mpstat command at first appears no different from the CPU utilization report produced by iostat:

Linux 2.4.18-14smp (pigdog.example.com)      12/11/2002

07:09:26 PM  CPU   %user   %nice %system   %idle    intr/s
07:09:26 PM  all    6.40    5.84    3.29   84.47    542.47

With the exception of an additional column showing the interrupts per second being handled by the CPU, there is no real difference. However, the situation changes if mpstat's -P ALL option is used:

Linux 2.4.18-14smp (pigdog.example.com)      12/11/2002

07:13:03 PM  CPU   %user   %nice %system   %idle    intr/s
07:13:03 PM  all    6.40    5.84    3.29   84.47    542.47
07:13:03 PM    0    6.36    5.80    3.29   84.54    542.47
07:13:03 PM    1    6.43    5.87    3.29   84.40    542.47

On multiprocessor systems, mpstat allows the utilization for each CPU to be viewed individually, making it possible to determine how effectively each CPU is being used.

4.3. The sadc command

As stated earlier, the sadc command collects system utilization data and writes it to a file for later analysis. By default, the data is written to files in the /var/log/sa/ directory. The files are named sa<dd>, where <dd> is the current day's two-digit date.

sadc is normally run by the sa1 script. This script is periodically invoked by cron via the file sysstat, which is located in /etc/cron.d/. The sa1 script invokes sadc for a single one-second measuring interval. By default, cron runs sa1 every 10 minutes, adding the data collected during each interval to the current /var/log/sa/sa<dd> file.

4.4. The sar command

The sar command produces system utilization reports based on the data collected by sadc. As configured in Red Hat Linux, sar is run automatically to process the files collected by sadc. The report files are written to /var/log/sa/ and are named sar<dd>, where <dd> is the previous day's two-digit date.

sar is normally run by the sa2 script. This script is periodically invoked by cron via the file sysstat, which is located in /etc/cron.d/. By default, cron runs sa2 once a day at 23:53, allowing it to produce a report for the entire day's data.
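For reference, the sysstat cron file that drives both scripts looks approximately like this; paths and times vary between releases, so treat this as a sketch of /etc/cron.d/sysstat:

# run the system activity data collector every 10 minutes
*/10 * * * * root /usr/lib/sa/sa1 1 1
# generate the daily report at 23:53
53 23 * * * root /usr/lib/sa/sa2 -A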

4.4.1. Reading sar Reports

The format of a sar report produced by the default Red Hat Linux configuration consists of multiple sections, with each section containing a specific type of data, ordered by the time of day that the data was collected. Since sadc is configured to perform a one-second measurement interval every ten minutes, the default sar reports contain data in ten-minute increments, from 00:00 to 23:50[2].

Each section of the report starts with a heading that illustrates the data contained in the section. The heading is repeated at regular intervals throughout the section, making it easier to interpret the data while paging through the report. Each section ends with a line containing the average of the data reported in that section.

Here is a sample section of a sar report, with the data from 00:30 through 23:40 removed to save space:

00:00:01          CPU     %user     %nice   %system     %idle
00:10:00          all      6.39      1.96      0.66     90.98
00:20:01          all      1.61      3.16      1.09     94.14
…
23:50:01          all     44.07      0.02      0.77     55.14
Average:          all      5.80      4.99      2.87     86.34

In this section, CPU utilization information is displayed. This is very similar to the data displayed by iostat.

Other sections may have more than one line's worth of data per time entry, as shown by this section generated from CPU utilization data collected on a dual-processor system:

00:00:01          CPU     %user     %nice   %system     %idle
00:10:00            0      4.19      1.75      0.70     93.37
00:10:00            1      8.59      2.18      0.63     88.60
00:20:01            0      1.87      3.21      1.14     93.78
00:20:01            1      1.35      3.12      1.04     94.49
…
23:50:01            0     42.84      0.03      0.80     56.33
23:50:01            1     45.29      0.01      0.74     53.95
Average:            0      6.00      5.01      2.74     86.25
Average:            1      5.61      4.97      2.99     86.43

There are a total of seventeen different sections present in reports generated by the default Red Hat Linux sar configuration; many are discussed in upcoming chapters. For more information about the data contained in each section, see the sar(1) man page.
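You do not have to wait for the nightly sa2 report to look at the data; sar can read a daily data file directly. For example, to display just the CPU utilization sections from the file for the 11th (the file name here simply follows the sa<dd> convention described above):

sar -u -f /var/log/sa/sa11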

Notes

[1]

Device major numbers can be found by using ls -l to display the desired device file in /dev/. Here is sample output from ls -l /dev/hda:

brw-rw----    1 root     disk       3,   0 Aug 30 19:31 /dev/hda

The major number in this example is 3, and appears between the file's group and its minor number.

[2]

Due to changing system loads, the actual time that the data was collected may vary by a second or two.

Posted by CEOinIRVINE