How to Keep Your AWS Credentials on an EC2 Instance Securely

August 31, 2009 · 17 comments

If you’ve been using EC2 for anything serious then you have some code on your instances that requires your AWS credentials. I’m talking about code that does things like this:

  • Attach an EBS volume
  • Download your application from a non-public location in S3
  • Send and receive SQS messages
  • Query or update SimpleDB

All these actions require your credentials. How do you get the credentials onto the instance in the first place? How can you store them securely once they’re there? First let’s examine the issues involved in securing your keys, and then we’ll explore the available options for doing so.

Potential Vulnerabilities in Transferring and Storing Your Credentials

There are a number of vulnerabilities that should be considered when trying to protect a secret. I’m going to ignore the ones that result from obviously foolish practice, such as transferring secrets unencrypted.

  1. Root: root can get at any file on an instance and can see into any process’s memory. If an attacker gains root access to your instance, and your instance can somehow know the secret, your secret is as good as compromised.
  2. Privilege escalation: User accounts can exploit vulnerabilities in installed applications or in the kernel (whose latest privilege escalation vulnerability was patched in new Amazon Kernel Images on 28 August 2009) to gain root access.
  3. User-data: Any user account able to open a socket on an EC2 instance can see the user-data by fetching the URL http://169.254.169.254/latest/user-data (see the sketch after this list). This is exploitable if a web application running in EC2 does not validate input before visiting a user-supplied URL. Accessing the user-data URL is particularly problematic if you use the user-data to pass the secret into the instance unencrypted – one quick wget (or curl) command by any user and your secret is compromised. And there is no way to clear the user-data: once it is set at launch time, it is visible for the entire life of the instance.
  4. Repeatability: HTTPS URLs transport their content securely, but anyone who has the URL can get the content. In other words, there is no authentication on HTTPS URLs. If you specify an HTTPS URL pointing to your secret it is safe in transit but not safe from anyone who discovers the URL.
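
To make vulnerability 3 concrete, here is a minimal Python sketch, equivalent to the wget one-liner mentioned above, of how any local account can read the user-data; no credentials or special privileges are required:

# Minimal sketch: reading EC2 user-data from the metadata service.
# Any process on the instance that can open a socket can do this.
import urllib.request

METADATA_URL = "http://169.254.169.254/latest/user-data"

def read_user_data():
    # The metadata service requires no authentication at all.
    with urllib.request.urlopen(METADATA_URL, timeout=2) as resp:
        return resp.read().decode("utf-8", errors="replace")

if __name__ == "__main__":
    print(read_user_data())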

Benefits Offered by Transfer and Storage Methods

Each transfer and storage method offers a different set of benefits. Here are the benefits against which I evaluate the various methods presented below:

  1. Easy to do. It’s easy to create a file in an AMI or in S3. It’s slightly more complicated to encrypt it. But you should have a script to automate provisioning new credentials, so all of the methods are graded as “easy to do”.
  2. Possible to change (now). Once an instance has launched, can the credentials it uses be changed?
  3. Possible to change (future). Is it possible to change the credentials that will be used by instances launched in the future? All methods provide this benefit but some make it more difficult to achieve than others, for example instances launched via Auto Scaling may require the Launch Configuration to be updated.


How to Put AWS Credentials on an EC2 Instance

With the above vulnerabilities and benefits in mind let’s look at different ways of getting your credentials onto the instance and the consequences of each approach.

Mitch Garnaat has a great set of articles about the AWS credentials. Part 1 explores what each credential is used for, and part 2 presents some methods of getting them onto an instance, the risks involved in leaving them there, and a strategy to mitigate the risk of them being compromised. A summary of part 1: keep all your credentials secret, like you keep your bank account info secret, because they are – literally – the keys to your AWS kingdom.

As discussed in part 2 of Mitch’s article, there are a number of methods to get the credentials (or indeed, any secret) onto an instance. Here are two, evaluated in light of the benefits presented above:

1. Burn the secret into the AMI

Pros:

  • Easy to do.

Cons:

  • Not possible to change (now) easily. Requires SSHing into the instance, updating the secret, and forcing all applications to re-read it.
  • Not possible to change (future) easily. Requires bundling a new AMI.
  • The secret can be mistakenly bundled into the image when making derived AMIs.

Vulnerabilities:

  • root, privilege escalation.

2. Pass the secret in the user-data

Pros:

  • Easy to do. Putting the secret into the user-data must be integrated into the launch procedure.
  • Possible to change (future). Simply launch new instances with updated user-data. With Auto Scaling, create a new Launch Configuration with the updated user-data.

Cons:

  • Not possible to change (now). User-data cannot be changed once an instance is launched.

Vulnerabilities:

  • user-data, root, privilege escalation.

Here are some additional methods to transfer a secret to an instance, not mentioned in the article:

3. Put the secret in a public URL
The URL can be on a website you control or in S3. It’s insecure and foolish to keep secrets at a publicly accessible URL. Please don’t do this; I mention it only for completeness.

Pros:

  • Easy to do.
  • Possible to change (now). Simply update the content at that URL. Any processes on the instance that read the secret each time will see the new value once it is updated.
  • Possible to change (future).

Cons:

  • Completely insecure. Any attacker between the endpoint and the EC2 boundary can see the packets and discover the URL, revealing the secret.

Vulnerabilities:

  • repeatability, root, privilege escalation.

4. Put the secret in a private S3 object and provide the object’s path
To get content from a private S3 object you need the secret access key in order to authenticate with S3. The question then becomes “how to put the secret access key on the instance”, which you need to do via one of the other methods.
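
As a sketch of this method, here is how an instance might fetch the secret with the boto library; the bucket and key names are placeholders, and note that the call itself already needs working credentials:

# Sketch: fetching a secret from a private S3 object with boto.
# Chicken-and-egg: the access key id and secret access key must
# already be on the instance for this to work.
from boto.s3.connection import S3Connection

def fetch_secret(access_key_id, secret_access_key):
    conn = S3Connection(access_key_id, secret_access_key)
    bucket = conn.get_bucket('my-private-bucket')    # placeholder name
    obj = bucket.get_key('config/credentials.txt')   # placeholder path
    return obj.get_contents_as_string()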

Pros:

  • Easy to do.
  • Possible to change (now). Simply update the content at that URL. Any processes on the instance that read the secret each time will see the new value once it is updated.
  • Possible to change (future).

Cons:

  • Inherits the cons of the method used to transfer the secret access key.

Vulnerabilities:

  • root, privilege escalation.

5. Put the secret in a private S3 object and provide a signed HTTPS S3 URL
The signed URL must be created before launching the instance and specified somewhere that the instance can access – typically in the user-data. The signed URL expires after some time, limiting the window of opportunity for an attacker to access the URL. The URL should be HTTPS so that the secret cannot be sniffed in transit.
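
As an illustration, a signed URL of this kind can be generated with the boto library before launch; the bucket and key names below are placeholders:

# Sketch: generating a time-limited signed HTTPS URL for a private
# S3 object. Anyone holding the URL can fetch the object until it
# expires; after that it returns an error.
from boto.s3.connection import S3Connection

def make_signed_url(access_key_id, secret_access_key, expiry_seconds=240):
    conn = S3Connection(access_key_id, secret_access_key)
    return conn.generate_url(
        expires_in=expiry_seconds,    # short window for the instance to boot
        method='GET',
        bucket='my-private-bucket',   # placeholder name
        key='config/credentials.txt'  # placeholder path
    )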

Pros:

  • Easy to do. The S3 URL signing must be integrated into the launch procedure.
  • Possible to change (now). Simply update the content at that URL. Any processes on the instance that read the secret each time will see the new value once it is updated.
  • Possible to change (future). In order to integrate with Auto Scaling you would need to (automatically) update the Auto Scaling Group’s Launch Configuration to provide an updated signed URL for the user-data before the previously specified signed URL expires.

Cons:

  • The secret must be cached on the instance. Once the signed URL expires the secret cannot be fetched from S3 anymore, so it must be stored on the instance somewhere. This may make the secret liable to be burned into derived AMIs.

Vulnerabilities:

  • repeatability (until the signed URL expires), root, privilege escalation.

6. Put the secret on the instance from an outside source, via SCP or SSH
This method involves an outside client – perhaps your local computer, or a management node – whose job it is to put the secret onto the newly-launched instance. The management node must have the private key with which the instance was launched, and must know the secret in order to transfer it. This approach can also be automated, by having a process on the management node poll every minute or so for newly-launched instances.
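
Here is a sketch of the push, using the paramiko SSH library on the management node; the host, key file, and paths are placeholders:

# Sketch: a management node pushing a secret file onto a newly
# launched instance over SFTP.
import paramiko

def push_secret(host, keyfile, local_path, remote_path='/etc/aws-creds'):
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    # Authenticate with the keypair the instance was launched with.
    client.connect(host, username='root', key_filename=keyfile)
    try:
        sftp = client.open_sftp()
        sftp.put(local_path, remote_path)
        sftp.chmod(remote_path, 0o600)  # owner-only read/write
    finally:
        client.close()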

Pros:

  • Easy to do. OK, not “easy” because it requires an outside management node, but it’s doable.
  • Possible to change (now). Have the management node put the updated secret onto the instance.
  • Possible to change (future). Simply put a new secret onto the management node.

Cons:

  • The secret must be cached somewhere on the instance because it cannot be “pulled” from the management node when needed. This may make the secret liable to be burned into derived AMIs.

Vulnerabilities:

  • root, privilege escalation.

The above methods can be used to transfer the credentials – or any secret – to an EC2 instance.

Instead of transferring the secret directly, you can transfer an encrypted secret. In that case, you’d need to provide a decryption key also – and you’d use one of the above methods to do that. The overall security of the secret would be influenced by the combination of methods used to transfer the encrypted secret and the decryption key. For example, if you encrypt the secret and pass it in the user-data, providing the decryption key in a file burned into the AMI, the secret is vulnerable to anyone with access to both user-data and the file containing the decryption key. Also, if you encrypt your credentials then changing the encryption key requires changing two items: the encryption key and the encrypted credentials. Therefore, changing the encryption key can only be as easy as changing the credentials themselves.
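
To illustrate the split between the two items, here is a sketch using the Python cryptography package (chosen purely for brevity; any symmetric cipher gives the same structure):

# Sketch: the encrypted credentials and the decryption key travel by
# two different methods; neither item alone reveals the secret.
from cryptography.fernet import Fernet

# Before launch, on a machine you trust:
decryption_key = Fernet.generate_key()           # send via method A
encrypted_creds = Fernet(decryption_key).encrypt(
    b'ACCESS_KEY_ID:SECRET_ACCESS_KEY')          # send via method B

# On the instance, once both items have arrived:
credentials = Fernet(decryption_key).decrypt(encrypted_creds)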

How to Keep AWS Credentials on an EC2 Instance

Once your credentials are on the instance, how do you keep them there securely?

First off, let’s remember that in an environment out of your control, such as EC2, you have no guarantees of security. Anything processed by the CPU or put into memory is vulnerable to bugs in the hypervisor (the virtualization provider), to malicious AWS personnel (though the AWS Security White Paper goes to great lengths to explain the internal procedures and controls they have implemented to mitigate that possibility), and to legal search and seizure. What this means is that you should only run applications in EC2 for which the risk of secrets being exposed via these vulnerabilities is acceptable. This is true of all applications and data that you allow to leave your premises.

But this article is about the security of the AWS credentials, which control the access to your AWS resources. It is perfectly acceptable to ignore the risk of AWS personnel exposing your credentials, because AWS folks can manipulate your account resources without needing your credentials! In short, if you are willing to use AWS then you trust Amazon with your credentials.

There are three ways to store information on a running machine: on disk, in memory, and not at all.

1. Keeping a secret on disk
The secret is stored in a file on disk, with the appropriate permissions set on the file. The secret survives a reboot intact, which can be a pro or a con: it’s a good thing if you want the instance to be able to remain in service through a reboot; it’s a bad thing if you’re trying to hide the location of the secret from an attacker, because the reboot process contains the script to retrieve and cache the secret, revealing its cached location. You can work around this by altering the script that retrieves the secret, after it does its work, to remove traces of the secret’s location. But applications will still need to access the secret somehow, so it remains vulnerable.
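
Here is a sketch of the on-disk case, creating the file with owner-only permissions from the start so there is no window in which it is world-readable; the path is a placeholder:

# Sketch: caching the secret in a file that is born with mode 0600,
# rather than created open and chmod-ed afterwards.
import os

def store_secret(secret_bytes, path='/etc/aws-credentials'):
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, secret_bytes)
    finally:
        os.close(fd)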

Pros:

  • Easily accessible by applications on the instance.

Cons:

  • Visible to any process with the proper permissions.
  • Easy to forget when bundling an AMI of the instance.

Vulnerabilities:

  • root, privilege escalation.

2. Keeping the secret in memory
The secret is stored as a file on a ramdisk. (There are other memory-based methods, too.) The main difference between storing the secret in memory and on the filesystem is that memory does not survive a reboot. If you remove the traces of retrieving the secret and storing it from the startup scripts after they run during the first boot, the secret will only exist in memory. This can make it more difficult for an attacker to discover the secret, but it does not add any additional security.
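
Here is a sketch of setting up such a ramdisk with tmpfs (run as root; the mount point and size are placeholders). One caveat: tmpfs pages can be swapped out under memory pressure, so use ramfs if that matters to you:

# Sketch: mount a small tmpfs ramdisk to hold the secret; the mount
# and its contents vanish on reboot.
import subprocess

def mount_secret_ramdisk(mount_point='/mnt/secrets'):
    subprocess.check_call(['mkdir', '-p', mount_point])
    subprocess.check_call([
        'mount', '-t', 'tmpfs',
        '-o', 'size=1m,mode=0700',  # tiny, owner-only
        'tmpfs', mount_point,
    ])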

Pros:

  • Easily accessible by applications on the instance.

Cons:

  • Visible to any process with the proper permissions.

Vulnerabilities:

  • root, privilege escalation.

3. Do not store the secret; retrieve it each time it is needed
This method requires your applications to support the chosen transfer method.

Pros:

  • Secret is never stored on the instance.

Cons:

  • Requires more time because the secret must be fetched each time it is needed.
  • Cannot be used with signed S3 URLs. These URLs expire after some time and the secret will no longer be accessible. If the URL does not expire in a reasonable amount of time then it is as insecure as a public URL.
  • Cannot be used with externally-transferred (via SSH or SCP) secrets because the secret cannot be pulled from the management node. Any protocol that lets the instance pull the secret from the management node could also be used by an attacker to request the secret.

Vulnerabilities:

  • root, privilege escalation.

Choosing a Method to Transfer and Store Your Credentials

The above two sections explore some options for transferring and storing a secret on an EC2 instance. If the secret is guarded by another key – such as an encryption key or an S3 secret access key – then this key must also be kept secret and transferred and stored using one of those same methods. Let’s put all this together into some tables presenting the viable options.

Unencrypted Credentials

Here is a summary table evaluating the transfer and storage of unencrypted credentials using different combinations of methods:

[Table: Transferring and Keeping Unencrypted Credentials]

Some notes on the above table:

  • Methods making it “hard” to change credentials are highlighted in yellow because, through scripting, the difficulty can be minimized. Similarly, the risk of forgetting credentials in an AMI can be minimized by scripting the AMI creation process and choosing a location for the credential file that is excluded from the AMI by the script.
  • While you can transfer credentials using a private S3 URL, you still need to provide the secret access key in order to access that private S3 URL. This secret access key must also be transferred and stored on the instance, so the private S3 URL is not usable by itself; the Private S3 URL entries are therefore marked as N/A. See below for an analysis of using a private S3 URL to transfer encrypted credentials.
  • You can burn credentials into an AMI and store them in memory. The startup process can remove them from the filesystem and place them in memory. The startup process should then remove all traces from the startup scripts mentioning the key’s location in memory, in order to make discovery more difficult for an attacker with access to the startup scripts.
  • Credentials burned into the AMI cannot be “not stored”. They can be erased from the filesystem, but must be stored somewhere in order to be usable by applications. Therefore these entries are marked as N/A.
  • Credentials transferred via a signed S3 URL cannot be “not stored” because the URL expires and, once that happens, is no longer able to provide the credentials. Thus, these entries are marked N/A.
  • Credentials “pushed” onto the instance from an outside source, such as SSH, cannot be “not stored” because they must be accessible to applications on the instance. These entries are marked N/A.

A glance at the above table shows that it is, overall, not difficult to manage unencrypted credentials via any of the methods. Remember: don’t use the Public URL method; it’s completely insecure.

Bottom line: If you don’t care about keeping your credentials encrypted then pass a signed S3 HTTPS URL in the user-data. The startup scripts of the instance should retrieve the credentials from this URL and store them in a file with appropriate permissions (or in a ramdisk if you don’t want them to remain through a reboot), then the startup scripts should remove their own commands for getting and storing the credentials. Applications should read the credentials from the file (or directly from the signed URL if you don’t care that it will stop working after it expires).

Encrypted Credentials

We discussed 6 different ways of transferring credentials and 3 different ways of storing them. A transfer method and a storage method must be used for the encrypted credentials and for the decryption key. That gives us 36 combinations of transfer methods, and 9 combinations of storage methods, for a grand total of 324 choices.

Here are the first 54, summarizing the options when you choose to burn the encrypted credentials into the AMI:

As (I hope!) you can see, all combinations that involve burning encrypted credentials into the AMI make it hard (or impossible) to change the credentials or the encryption key, both on running instances and for future ones.

Here are the next set, summarizing the options when you choose to pass encrypted credentials via the user-data:

Passing encrypted credentials in the user-data requires the decryption key to be transferred also. It’s pointless from a security perspective to pass the decryption key together with the encrypted credentials in the user-data. The most flexible option in the above table is to pass the decryption key via a signed S3 HTTPS URL (specified in the user-data, or specified at a public URL burned into the AMI) with a relatively short expiry time (say, 4 minutes) allowing enough time for the instance to boot and retrieve it.

Here is a summary of the combinations when the encrypted credentials are passed via a public URL:

It might be surprising, but passing encrypted credentials via a public URL is actually a viable option. You just need to make sure you send and store the decryption key securely, so send that key via a signed S3 HTTPS URL (specified in the user-data or at a public URL burned into the AMI) for maximum flexibility.

The combinations with passing the encrypted credentials via a private S3 URL are summarized in this table:

As explained earlier, the private S3 URL is not usable by itself because it requires the AWS secret access key. (The access key id is not a secret). The secret access key can be transferred and stored using the combinations of methods shown in the above table.

The most flexible of the options shown in the above table is to pass in the secret access key inside a signed S3 HTTPS URL (which is itself provided in the user-data or at a public URL burned into the AMI).

Almost there…. This next table summarizes the combinations with encrypted credentials passed via a signed S3 HTTPS URL:

The signed S3 HTTPS URL containing the encrypted credentials can be specified in the user-data or specified behind a public URL which is burned into the AMI. The best options for providing the decryption key are via another signed URL or from an external management node via SSH or SCP.

And, the final section of the table summarizing the combinations of using encrypted credentials passed in via SSH or SCP from an outside management node:

The above table summarizing the use of an external management node to place encrypted credentials on the instance shows exactly the same results as the previous table (for a signed S3 HTTPS URL). The same flexibility is achieved using either method.

The Bottom Line

Here’s a practical recommendation: if you have code that generates signed S3 HTTPS URLs then pass in two signed URLs into the user-data, one containing the encrypted credentials and the other containing the decryption key. The startup sequence of the AMI should read these two items from their URLs, decrypt the credentials, and store the credentials in a ramdisk file with the minimum permissions necessary to run the applications. The start scripts should then remove all traces of the procedure (beginning with “read the user-data URL” and ending with “remove all traces of the procedure”) from themselves.
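
Here is a launch-side sketch of that recommendation using boto; the AMI id, bucket, and object names are placeholders, and the user-data format is simply whatever your startup scripts agree to parse:

# Sketch: sign two short-lived URLs (encrypted credentials and the
# decryption key) and pass both to the new instance in its user-data.
from boto.s3.connection import S3Connection
from boto.ec2.connection import EC2Connection

def launch_with_credentials(access_key_id, secret_access_key):
    s3 = S3Connection(access_key_id, secret_access_key)
    creds_url = s3.generate_url(240, 'GET',
                                bucket='my-bucket', key='creds.enc')
    key_url = s3.generate_url(240, 'GET',
                              bucket='my-bucket', key='creds.key')
    ec2 = EC2Connection(access_key_id, secret_access_key)
    # The AMI's startup scripts parse these two lines, fetch both URLs,
    # decrypt, store the result in a ramdisk, then scrub themselves.
    user_data = "CREDS_URL=%s\nKEY_URL=%s\n" % (creds_url, key_url)
    return ec2.run_instances('ami-12345678', user_data=user_data)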

If you don’t have code to generate signed S3 URLs then burn the encrypted credentials into the AMI and pass the decryption key via the user-data. As above, the startup sequence should decrypt the credentials, store them in a ramdisk, and destroy all traces of the raw ingredients and the process itself.

This article is an informal review of the benefits and vulnerabilities offered by different methods of transferring credentials to and storing credentials on an EC2 instance. In a future article I will present scripts to automate the procedures described. In the meantime, please leave your feedback in the comments.

Comments

1 Michael Fairchild October 18, 2009 at 9:04 pm

Another option you can add to the matrix is using an additional authenticate-only AWS user.
Create a new AWS user, 'wimpy', but do not sign up for any services and do not provide a credit card.
Although the new user cannot provision any AWS resources, it does get an account id and access keys. Private S3 buckets and objects can be shared with this wimpy user. The wimpy user's credentials can be provided in the user-data (or via the similar options mentioned), allowing boot scripts to retrieve authenticated objects from S3 while not exposing the keys to the entire AWS kingdom.
A benefit of this approach, as compared to time-expiring S3 URLs, is that it can be used with autoscaling.

This method will not, however, give access to EC2 API commands such as ec2-attach-volume. If (and only if) access to these commands is required from the instance, a separate monitor instance that does have the primary AWS keys can be used to proxy EC2 commands. The monitor host can listen for requests on the 10.* network from authenticated security groups, and run whatever additional verification is required before executing the requested command. This reduces the exposure of your secret to a single instance.


2 shlomo October 18, 2009 at 10:09 pm

@Michael Fairchild,

Mitch Garnaat suggests a similar two-credential method in part 2 of his article (linked above). He calls them "Secret Credentials" ('wimpy') and "Double Secret Credentials" (the real ones).

The "monitor host" idea is similar to one I've been kicking around lately. My comment to Mitch's blog post outlines the idea, and some more detail is in the comments to the following blog:
http://elastic-security.com/2009/08/20/ec2-design-patterns-1-externalconsole/


3 6p00e54ee6e7b68834 November 10, 2009 at 11:58 pm

Given the constraints imposed by Auto Scaling, I think there's a better option than using signed URLs.

Signed URLs have the problem of a hardcoded expiration date, which means you need some external script which is vigilant in continually generating new signed URLs and updating your Auto Scaling Group parameters with the latest URL (which will need to be replaced again in X minutes). This puts robustness in direct opposition to security – the most secure solution mandates a short expiration time, which decreases the robustness of the system by requiring the external script to run frequently, without fail.

There's a better solution that keeps all the security goodness of signed URLs with none of the "signed-URL-generator-must-run-or-Auto-Scaling-will-fail" badness.

Instead of using signed URLs, use *public* URLs with a random path element:

https://s3etc/as0df98a0b980a98a0sd98f0a98sdfa/secret-user-data.txt

The URL is world-readable, but its path is unguessable (just like a signed URL).

Your Auto Scaling Launch config is initially configured with this URL.

Your external script then runs *whenever it wants*, creating a new random path & uploading your data to it, and then updating the Auto Scaling Launch Config to point at the new path. The script then deletes the file from the old path, so all running instances no longer have access to the secret data.

This can be combined with the "wimpy" auth scheme so that the URL doesn't even need to be public, and thus your attacker (if lucky enough to remote-exec on the machine before the URL dies) needs more than just 'curl' to get the secret data.


4 shlomo December 6, 2009 at 3:41 am

@6p00e54ee6e7b68834,

That's also a good suggestion. Even better would be to use a single-use URL, which would cease to work after the first retrieval. Then it would not need to be deleted.


5 Gabe March 30, 2010 at 5:02 am

AWS could help a lot by providing a way to generate credentials constrained to specific APIs. For example, if I have a machine that simply writes to an SQS queue, then I would generate credentials that only have access to the SendMessage API. If my machine needs to attach EBS volumes and access S3, I would generate credentials with only those permissions. That way in the case of a compromised system or elevation of privilege the damage done is limited to the rights granted in the credentials.


6 shlomo March 30, 2010 at 5:28 pm

@Gabe,

Absolutely, I agree that fine-grained credentials would help mitigate the risk of compromised credentials.


7 Yarin April 22, 2010 at 2:50 pm

Good article- any thoughts on using SimpleDB to store credentials?


8 shlomo April 22, 2010 at 3:18 pm

@Yarin,

SimpleDB requires AWS credentials to access. So it’s equivalent to the option “4. Put the secret in a private S3 object and provide the object’s path” discussed above.


9 Jack July 9, 2010 at 11:47 pm

If I generate a presigned URL with Amazon’s SDK to a private S3 object, I can access it in a regular browser, but wget/curl gives me an Error 403: Forbidden. Do you know why that is?


10 shlomo July 10, 2010 at 7:45 pm

@Jack,

Try putting the URL you give to wget in quotes. Some of these URLs have special characters that the shell interprets and quoting the URL argument will prevent the shell from interpreting those special characters.


11 Ewout July 12, 2010 at 9:38 pm

@Schlomo,

I have been struggling with the same challenge of getting AWS credentials on an EC2 instance. I came up with roughly the same list of options as you, until tonight, when I thought of another possibility:

when launching an instance, one can specify a snapshot from which to automatically create an EBS volume and bind it to a block device. What if you created an EBS volume, put your credentials on it, created a snapshot from it, and then used the mentioned approach? One could use the user-data script (or whatever) to mount the block device and read the credentials. And when an instance terminates, by default the created EBS volume gets deleted (unless you turned that off in the --block-device-mapping option). Make sure the snapshot is private though. I assume traffic between EC2 and EBS is secure, though I'm not sure of that – but there are many EBS boot images now, so it would be awkward if it weren't. Finally, it's possible to encrypt the EBS volume at filesystem level and pass the key for it in your user-data script; it doesn't add security, but it prevents someone else from reading the raw storage after the volume is deleted.

That still leaves the ‘How to Keep AWS Credentials on an EC2 Instance’ part, probably you would need to look at SELinux or AppArmor to fix that one, if EC2 even supports that (since EC2 provides the kernels). Also, one could use a read-only filesystem on the EBS volume and have some credentials broker there which takes proper measures to prevent unauthorized retrieving of the credentials; but no idea how to really secure that yet, if it is even possible (since root can do anything, but one could look at the pid of the process requesting the credentials, see which binary it belongs to and check whether the binary is untampered with for example, one could store a list of binaries and sha1sums in the read-only filesystem; but the filesystem itself might be unmounted/recreated/mounted as well).


12 shlomo July 19, 2010 at 4:16 pm

@Ewout,

Thanks for your comment! I’ve written an article showing how to implement this technique.


13 never mind November 6, 2010 at 9:47 am

You do realize that once the volume is mounted the credentials are available in clear text to any process with uid 0, right? (think “hackers” here) So what’s the improvement then? Let’s face it, there is *no* secure way to store clear-text credentials. And you need them in clear text if you want to use them for AWS.


14 shlomo November 28, 2010 at 1:18 am

@never mind,

True, there’s no secure way to secure clear-text credentials.

The AWS Identity and Access Management features can be used to mitigate the risk of credentials being exposed.




Configuring ConTEXT


Time for action — Configuring ConTEXT

Now we'll set up ConTEXT to make reading UnrealScript easier, and use it to compile scripts with a single button press.

  1. Click on Options in the top toolbar, then Environment Options. In the first tab, General, set When started to Open last file/project. That way any files that we're working on will automatically open the next time we use ConTEXT.

  2. Make sure that Remember editing positions is checked. This makes the files we're working with open in the same position the next time we open ConTEXT. This saves a lot of time remembering where we left off.

  3. In the Editor tab, uncheck Allow cursor after end of line. This will keep our code clean by preventing unnecessary spaces all over the place.

  4. Uncheck Smart tabs. Part of writing clean code is having it lined up, and Smart tabs tends to move the cursor to the beginning of words instead of a set number of spaces.

  5. Make sure that Line numbers is checked. When we start compiling, any errors that show up will give us a line number which makes them easier to find and fix. This also helps when we search through our code as the searches will also give us line numbers.

  6. Finally for this tab, set Block indent and C/Java Block Indent to 4. This comes down to personal preference but having four spaces instead of two makes it easier to quickly scan through code and find what you're looking for.

  7. Now we're going to set up ConTEXT to compile code. On the Execute Keys tab, click on Add, then type .uc into the Extensions field that comes up.

  8. Once that's done, four keys (F9 through F12) will show up in the User Exec Keys window. Let's click on F9 to make it convenient. Once clicked, the options on the right become available.

  9. For the Execute line, click on the button to the right of the field and navigate to our UDK installation's Binaries\Win32 folder, and select UDK.exe. For Start In, copy the Execute line but leave out UDK.exe.

  10. In the Parameters field, type "make" without the quote marks. This tells UDK.exe that we want to compile code instead of opening the game.

  11. Change Save to All Files Before Execution. This makes sure that all of our changes get compiled if we're working in more than one file.

  12. Check Capture Console Output and Scroll Console to the Last Line. This lets you see the compile progress at the bottom of ConTEXT, and any compiler errors will show up there as well.

  13. Now we're going to set up an UnrealScript highlighter. Highlighters make code easier to read by color coding keywords for a programming language. Since each language has different keywords, we need a highlighter specific to UnrealScript. Close ConTEXT and find the UnrealScript.chl file included with this book, or head to http://wiki.beyondunreal.com/ConTEXT and follow the instructions for the UnrealScript highlighter. Once you have your .chl file, place it in ConTEXT's Highlighters folder.

  14. Open ConTEXT again. In the top toolbar there is a drop-down menu, and our UnrealScript highlighter should show up in the list now. Select it and we're done setting up ConTEXT!

What just happened?

ConTEXT is now set up to compile our UnrealScript files; all we have to do is press F9. The first time we do this it will also recompile Epic's UnrealScript files; this is normal. The compiler may also show up in a separate window instead of at the bottom of ConTEXT, which is also normal.

Starting to feel like a programmer yet? Now that we're able to compile code we just need an easy way to browse through Epic's UnrealScript source code, and to do that we're going to install another small program, UnCodeX.

UnCodeX

We can write our own code with ConTEXT, but now we need something to make sense of the Development\Src folder. There are over 2,000 files in there! This is where UnCodeX comes in. UnCodeX organizes the files into a class tree so that we can easily browse through them and see their relationship to each other. It also allows us to quickly search through the source code, which is where the line numbers in ConTEXT come in handy when we're searching through our own code.


Using Memory Correctly


Did you ever hear the joke about the programmer trying to beat the Devil in a coding contest? Part of his solution involved overcoming a memory limitation by storing a few bytes in a chain of sound waves between the microphone and the speaker. That’s an interesting idea, and I’ll bet we would have tried that one on Ultima VII had someone on our team thought of it.

Memory comes in very different shapes, sizes, and speeds. If you know what you’re doing, you can write programs that make efficient use of these different memory blocks. If you believe that it doesn’t matter how you use memory, you’re in for a real shock. This includes assuming that the standard memory manager for your operating system is efficient; it usually isn’t, and you’ll have to think about writing your own.

Understanding the Different Kinds of Memory

The system RAM is the main warehouse for storage, as long as the system has power. Video RAM or VRAM is usually much smaller and is specifically used for storing objects that will be used by the video card. Some platforms, such as Xbox and Xbox360, have a unified memory architecture that makes no distinctions between RAM and VRAM. Desktop PCs run operating systems like Windows Vista, and have virtual memory that mimics much larger memory space by swapping blocks of little-used RAM to your hard disk. If you’re not careful, a simple memcpy() could cause the hard drive to seek, which to a computer is like waiting for the sun to cool off.

System RAM

Your system RAM is a series of memory sticks that are installed on the motherboard. On systems with parity RAM, memory is actually stored as nine bits per byte, with the extra bit used to catch memory parity errors. Depending on the OS, you get to play with a certain addressable range of memory. The operating system keeps some to itself. Of the parts you get to play with, it is divided into three parts when your application loads:

  • Global memory: This memory never changes size. It is allocated when your program loads and stores global variables, text strings, and virtual function tables.

  • Stack: This memory grows as your code calls deeper into core code, and it shrinks as the code returns. The stack is used for parameters in function calls and local variables. The stack has a fixed size that can be changed with compiler settings.

  • Heap: This memory grows and shrinks with dynamic memory allocation. It is used for persistent objects and dynamic data structures.

Old-timers used to call global memory the DATA segment, harkening back to the days when there used to be near memory and far memory. It was called that because programmers used different pointers to get to it. What a disgusting practice! Everything is much cleaner these days because each pointer is a full 32 bits. (Don’t worry, I’m not going to bore you with the “When I went to school I used to load programs from a linear access tape cassette” story.)

Your compiler and linker will attempt to optimize the location of anything you put into the global memory space based on the type of variable. This includes constant text strings. Many compilers, including Visual Studio, will attempt to store text strings only once to save space:

const char *error1 = "Error";
const char *error2 = "Error";

int main()
{
   printf ("%x\n", (int)error1);
   // How quaint. A printf.
   printf ("%x\n", (int)error2);
   return 0;
}

This code yields interesting results. You’ll notice that under Visual C++, the two pointers point to the same text string in the global address space. Even better than that, the text string is one that was already global and stuck in the CRT libraries. It’s as if we wasted our time typing “Error.” This trick only works for constant text strings, since the compiler knows they can never change. Everything else gets its own space. If you want the compiler to consolidate equivalent text strings, they must be constant text strings.

Don’t make the mistake of counting on some kind of rational order to the global addresses. You can’t count on anything the compiler or linker will do, especially if you are considering crossing platforms.

On most operating systems, the stack starts at high addresses and grows toward lower addresses. C and C++ parameters get pushed onto the stack from right to left—the last parameter is the first to get pushed onto the stack in a function call. Local parameters get pushed onto the stack in their order of appearance:

void testStack(int x, int y)
{
   int a = 1;
   int b = 2;

   printf("&x= %-10x &y= %-10x\n", &x, &y);
   printf("&a= %-10x &b= %-10x\n", &a, &b);
}

This code produces the following output:

&x= 12fdf0  &y= 12fdf4
&a= 12fde0  &b= 12fdd4

Stack addresses grow downward to smaller memory addresses. Thus, it should be clear that the order in which the parameters and local variables were pushed was y, x, a, and b. That turns out to be exactly the order in which you read them, which makes a good mnemonic. The next time you’re debugging some assembler code, you’ll be glad to understand this, especially if you are setting your instruction pointer by hand.

C++ allows a high degree of control over the local scope. Every time you enclose code in a set of braces, you open a local scope with its own local variables:

int main()
{
   int a = 0;
   {           // start a local scope here...
     int a = 1;
     printf("%d\n", a);
   }

   printf("%d\n", a);
}

This code compiles and runs just fine. The two integer variables are completely separate entities. I’ve written this example to make a clear point, but I’d never actually write code like this. Doing something like this in Texas is likely to get you shot. The real usefulness of this kind of code is for use with C++ objects that perform useful tasks when they are destroyed—you can control the exact moment a destructor is called by closing a local scope.

Video Memory (VRAM)

Video RAM is the memory installed on your video card, unless we’re talking about an Xbox. Xbox hardware has a unified memory architecture, or UMA, so there’s no difference between system RAM and VRAM. It would be nice if the rest of the world worked that way. Other hardware such as the Intel architectures must send any data between VRAM and system RAM over a bus. The PS2 has even more kinds of memory. There are quite a few bus architectures and speeds out there, and it is wise to understand how reading and writing data across the bus affects your game’s speed.

As long as the CPU doesn’t have to read from VRAM, everything clicks along pretty fast. If you need to grab a piece of VRAM for something, the bits have to be sent across the bus to system RAM. Depending on your architecture, your CPU and GPU must argue for a moment about timing, stream the bits, and go their separate ways. While this painful process is occurring, your game has come to a complete halt.

This problem was pretty horrific back in the days of fixed function pipelines when anything not supported by the video card had to be done with the CPU, such as the first attempts at motion blur. With programmable pipelines, you can create shaders that can run directly on the bits stored in VRAM, making this kind of graphical effect extremely efficient.

The hard disk can’t write straight to VRAM, so every time a new texture is needed you’ll need to stop the presses, so to speak. The smart approach is to limit any communication needed between the CPU and the video card. If you are going to send anything to it, it is best to send it in batches.

If you’ve been paying attention, you’ll realize that the GPU in your video card is simply painting the screen using the components in VRAM. If it ever has to stop and ask system RAM for something, your game won’t run as fast as it could.

Mr. Mike’s First Texture Manager

The first texture manager I ever wrote was for Ultima IX. (That was before the game was called Ultima: Ascension.) I wrote the texture manager for 3DFx’s Glide API, and I had all of an hour to do it. We wanted to show some Origin execs what Ultima looked like running under hardware acceleration. Not being a programmer extraordinaire, and having so little time to work, my algorithm had to be pretty simple. I chose a variant of LRU, but since I didn’t have time to write the code to sort and organize the textures, I simply threw out every texture in VRAM the moment there wasn’t any additional space. I think this code got some nomination for the dumbest texture manager ever written, but it actually worked. The player would walk around for 90 seconds or so before the hard disk lit up and everything came to a halt for two seconds. I’m pretty sure someone rewrote it before U9 shipped. At least, I hope someone rewrote it!


Optimizing Memory Access

Every access to system RAM uses a CPU cache. If the desired memory location is already in the cache, the contents of the memory location are presented to the CPU extremely quickly. If, on the other hand, the memory is not in the cache, a new block of system RAM must be fetched into the cache. This takes a lot longer than you’d think.

A good test bed for this problem uses multidimensional arrays. C++ defines its arrays in row major order. This ordering puts the members of the right-most index next to each other in memory.

TestData[0][0][0] and TestData[0][0][1] are stored in adjacent memory locations.

Row Order or Column Order?

Not every language defines arrays in row order. FORTRAN, for example, defines arrays in column order. Don’t make assumptions unless you like writing slow code.


If you access an array in the wrong order, it will create a worst-case CPU cache scenario. Here’s an example of two functions that access the same array and do the same task. One will run much faster than the other:

const int g_n = 250;
float TestData[g_n][g_n][g_n];

inline void column_ordered()
{
  for (int k=0; k<g_n; k++)           // K
     for (int j=0; j<g_n; j++)        // J
        for (int i=0; i<g_n; i++)     // I
           TestData[i][j][k] = 0.0f;
}

inline void row_ordered()
{
  for (int i=0; i<g_n; i++)           // I
     for (int j=0; j<g_n; j++)        // J
        for (int k=0; k<g_n; k++)     // K
           TestData[i][j][k] = 0.0f;
}

The timed output of running both functions on my test machine showed that accessing the array in row order was more than nine times faster:

Column Ordered=2817 ms  Row Ordered=298 ms  Delta=2519 ms

Any code that accesses any largish data structure can benefit from this technique. If you have a multistep process that affects a large data set, try to arrange your code to perform as much work as possible in smaller memory blocks. You’ll optimize the use of the L2 cache and make a much faster piece of code. While you surely won’t have any piece of runtime game code do something this crazy, you might very well have a game editor or production tool that does.

Memory Alignment

The CPU reads and writes memory-aligned data much faster than other data. Any N-byte data type is memory aligned if the starting address is evenly divisible by N. For example, a 32-bit integer is memory aligned on a 32-bit architecture if the starting address is 0x04000000. The same 32-bit integer is unaligned if the starting address is 0x04000002, since the memory address is not evenly divisible by 4 bytes.

You can perform a little experiment in memory alignment and how it affects access time by using example code like this:

#pragma pack(push, 1)
struct ReallySlowStruct
{
   char c : 6;
   __int64 d : 64;
   int b : 32;
   char a : 8;
};

struct SlowStruct
{
   char c;
   __int64 d;
   int b;
   char a;
};

struct FastStruct
{
   __int64 d;
   int b;
   char a;
   char c;
   char unused[2];
};

#pragma pack(pop)

I wrote a piece of code to perform some operations on the member variables in each structure. The difference in times is as follows:

Really slow=417 ms
Slow=222 ms
Fast=192 ms

Your penalty for using the SlowStruct over FastStruct is about 14 percent on my test machine. The penalty for using ReallySlowStruct is code that runs twice as slowly.

The first structure isn’t even aligned properly on bit boundaries, hence the name ReallySlowStruct. The definition of the 6-bit char variable throws the entire structure out of alignment. The second structure, SlowStruct, is also out of alignment, but at least the byte boundaries are aligned. The last structure, FastStruct, is completely aligned for each member. The last member, unused, ensures that the structure fills out to an 8-byte boundary in case someone declares an array of FastStruct.

Notice the #pragma pack(push, 1) at the top of the source example? It’s accompanied by a #pragma pack(pop) at the bottom. Without them, the compiler, depending on your project settings, will choose to spread out the member variables and place each one on an optimal byte boundary. When the member variables are spread out like that, the CPU can access each member quickly, but all that unused space can add up. If the compiler were left to optimize SlowStruct by adding unused bytes, each structure would be 24 bytes instead of just 14. Seven extra bytes are padded after the first char variable, and the remaining bytes are added at the end. This ensures that the entire structure always starts on an 8-byte boundary. That’s about 40 percent of wasted space, all due to a careless ordering of member variables.
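
A quick way to see the padding for yourself is to print both sizes. This sketch uses long long in place of the Microsoft-specific __int64, and the exact numbers depend on your compiler:

#include <cstdio>

#pragma pack(push, 1)
struct PackedSlow { char c; long long d; int b; char a; };
#pragma pack(pop)
struct PaddedSlow { char c; long long d; int b; char a; };

int main()
{
    // Packed: 1 + 8 + 4 + 1 = 14 bytes. Padded: the compiler inserts
    // 7 bytes after 'c' and 3 at the end, giving 24 bytes.
    printf("packed: %u\n", (unsigned)sizeof(PackedSlow));
    printf("padded: %u\n", (unsigned)sizeof(PaddedSlow));
    return 0;
}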

Don’t let the compiler waste precious memory space. Put some of your brain cells to work and align your own member variables. You don’t get many opportunities to save memory and optimize CPU at the same time.

Virtual Memory

Virtual memory increases the addressable memory space by caching unused memory blocks to the hard disk. The scheme depends on the fact that even though you might have a 500MB data structure, you aren’t going to be playing with the whole thing at the same time. The unused bits are saved off to your hard disk until you need them again. You should be cheering and wincing at the same time. Cheering because every programmer likes having a big memory playground, and wincing because anything involving the hard disk wastes a lot of time.

Just to see how bad it can get, I took the code from the array access example and modified it to iterate through a three-dimensional array 500 elements cubed. The total size of the array would be 476MB, much bigger than the installed memory on the test machine. A data structure bigger than available memory is sometimes called out-of-core. I ran the column_ordered() function and went to lunch. When I got back about 30 minutes later, the test program was still chugging away. The hard drive was seeking like mad, and I began to wonder whether my hard disk would give out. I became impatient and re-ran the example and timed just one iteration of the inner loop. It took 379.75 seconds to run the inner loop. The entire thing would have taken over 50 hours to run. I’m glad I didn’t wait. Any game written badly can suffer the same fate, and as you can see, the difference between running quickly and paging constantly to your hard disk can be as small as a single byte.

Remember that the original array, 250 elements cubed, ran the test code in 298ms when the fast row_ordered() function was used. The large array is only eight times bigger, giving an expectation that the same code should have run in 2384ms, or just under two-and-a-half seconds.

Compare 2384ms with 50 hours, and you’ll see how virtual memory can work against you if your code accesses virtual memory incorrectly.

Cache Misses Can Cost You Dearly

Any time a cache is used inefficiently, you can degrade the overall performance of your game by many orders of magnitude. This is commonly called “thrashing the cache” and is your worst nightmare. If your game is thrashing cache, you might be able to solve the problem by reordering some code, but most likely you will need to reduce the size of the data.


Writing Your Own Memory Manager

Most games extend the provided memory management system. The biggest reasons to do this are performance, efficiency, and improved debugging. Default memory managers in the C runtime are designed to run fairly well in a wide range of memory allocation scenarios. They tend to break down under the load of computer games, though, where allocations and deallocations of relatively tiny memory blocks can be fast and furious.

A standard memory manager, like the one in the C runtime, must support multithreading. Each time the memory manager’s data structures are accessed or changed, they must be protected with critical sections, allowing only one thread to allocate or deallocate memory at a time. All this extra code is time consuming, especially if you use malloc and free very frequently. Most games are multithreaded to support sound systems, but don’t necessarily need a multithreaded memory manager for every part of the game. A single threaded memory manager that you write yourself might be a good solution.

The Infamous Voodoo Memory Manager

Ultima VII: The Black Gate had a legendary memory manager: The VooDoo Memory Management System. It was written by a programmer who used to work on guided missile systems for the Department of Defense, a brilliant and dedicated engineer. U7 ran in good old DOS back in the days when protected mode was the neat new thing. VooDoo was a true 32-bit memory system for a 16-bit operating system, and the only problem with it was you had to read and write to the memory locations with assembly code, since the Borland compiler didn’t understand 32-bit pointers. It was done this way because U7 couldn’t really exist in a 16-bit memory space—there were atomic data structures larger than 64KB. For all its hoopla, VooDoo was actually pretty simple, and it only provided the most basic memory management features. The fact that it was actually called VooDoo was a testament to the fact that it actually worked; it wasn’t exactly supported by the operating system or the Borland compilers.

VooDoo MM for Ultima VII is a great example of writing a simple memory manager to solve a specific problem. It didn’t support multithreading, it assumed that memory blocks were large, and finally it wasn’t written to support a high number or frequency of allocations.


Simple memory managers can use a doubly-linked list as the basis for keeping track of allocated and free memory blocks. The C runtime uses a more complicated system to reduce the algorithmic complexity of searching through the allocated and free blocks that could be as small as a single byte. Your memory blocks might be either more regularly shaped, fewer in number, or both. This creates an opportunity to design a simpler, more efficient system.

Default memory managers must assume that deallocations happen approximately as often as allocations, and they might happen in any order and at any time. Their data structures have to keep track of a large number of blocks of available and used memory. Any time a piece of memory changes state from used to available, the data structures must be quickly traversed. When blocks become available again, the memory manager must detect adjacent available blocks and merge them to make a larger block. Finding free memory of an appropriate size to minimize wasted space can be extremely tricky. Since default memory managers solve these problems to a large extent, their performance isn’t as high as another memory manager that can make more assumptions about how and when memory allocations occur.

If your game can allocate and deallocate most of its dynamic memory space at once, you can write a memory manager based on a data structure no more complicated than a singly-linked list. You’d never use something this simple in the general case, of course, because searching a singly-linked list is an O(n) operation, which would cripple any general-purpose memory management system.
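
Here is a minimal sketch of that idea: a pool of fixed-size blocks threaded onto a singly-linked free list, so allocation and deallocation are each a couple of pointer operations. It is deliberately simple: no coalescing, no per-block headers, and not thread-safe.

#include <cstdlib>

// Fixed-size block pool built on a singly-linked free list.
class BlockPool
{
    struct Node { Node* next; };

    Node* m_free;     // head of the free list
    void* m_memory;   // the one big slab we carve blocks from

public:
    BlockPool(size_t blockSize, size_t blockCount)
        : m_free(0), m_memory(malloc(blockSize * blockCount))
    {
        // Thread every block onto the free list (blockSize must be
        // at least sizeof(Node*)).
        char* p = static_cast<char*>(m_memory);
        for (size_t i = 0; i < blockCount; ++i, p += blockSize)
        {
            Node* n = reinterpret_cast<Node*>(p);
            n->next = m_free;
            m_free = n;
        }
    }
    ~BlockPool() { free(m_memory); }

    void* Alloc()                  // O(1): pop the head of the list
    {
        if (!m_free)
            return 0;              // pool exhausted
        Node* n = m_free;
        m_free = n->next;
        return n;
    }

    void Free(void* block)         // O(1): push back onto the list
    {
        Node* n = static_cast<Node*>(block);
        n->next = m_free;
        m_free = n;
    }
};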

A good reason to extend a memory manager is to add some debugging features. Two common additions are extra bytes before and after each allocation to catch memory corruption, and tracking of memory leaks. The C runtime adds only one byte before and after an allocated block, which might be fine to catch those pesky x+1 and x-1 errors, but doesn’t help for much else. If the memory corruption seems pretty random, and most of them sure seem that way, you can increase your odds of catching the culprit by writing a custom manager that adds more bytes to the beginning and ending of each block. In practice, the extra space is set to a small number, even one byte, in the release build.

Different Build Options will Change Runtime Behavior

Anything you do differently from the debug and release builds can change the behavior of bugs from one build target to another. Murphy’s Law dictates that the bug will only appear in the build target that is hardest, or even impossible, to debug.


Another common extension to memory managers is leak detection. It is a common practice to redefine the new operator to add __FILE__ and __LINE__ information to each allocated memory block in debug mode. When the memory manager is shut down, all the unfreed blocks are printed out in the output window in the debugger. This should give you a good place to start when you need to track down a memory leak.
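
A bare-bones sketch of the __FILE__/__LINE__ trick follows; a real implementation would record each allocation in a table keyed by address and dump whatever remains in the table at shutdown:

#include <cstdio>
#include <cstdlib>

// Placement form of new that records the call site.
void* operator new(size_t size, const char* file, int line)
{
    void* p = malloc(size);
    printf("alloc %p (%u bytes) at %s:%d\n",
           p, (unsigned)size, file, line);
    return p;
}

// In debug builds only, route every 'new' through the form above.
#define new new(__FILE__, __LINE__)

// After the #define, a statement like 'int* a = new int;' logs the
// file and line where the allocation happened.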

If you decide to write your own memory manager, keep the following points in mind:

  • Data structures: Choose the data structure that matches your memory allocation scenario. If you traverse a large number of free and available blocks very frequently, choose a hash table or tree-based structure. If you hardly ever traverse it to find free blocks, you could get away with a list. Store the data structure separately from the memory pool; any corruption will keep your memory manager’s data structure intact.

  • Single/multithreaded access: Don’t forget to add appropriate code to protect your memory manager from multithreaded access if you need it. Eliminate the protections if you are sure that access to the memory manager will only happen from a single thread, and you’ll gain some performance.

  • Debug and testing: Allocate a little additional memory before and after the block to detect memory corruption. Add caller information to the debug memory blocks; at a minimum, you should use __FILE__ and __LINE__ to track where the allocation occurred.

One of the best reasons to extend the C runtime memory manager is to write a better system to manage small memory blocks. The memory managers supplied in the C runtime or MFC library are not meant for tiny allocations. You can prove it to yourself by allocating two integers and subtracting their memory addresses as shown here:

int *a = new int;
int *b = new int;

// Overhead: the gap between consecutive allocations minus the payload size.
int delta1 = ((int)b - (int)a) - sizeof(int);

The wasted space for the C runtime library was 28 bytes for a release build and 60 bytes for the debug build under Visual Studio. Even with the release build, an integer takes eight times as much memory space as it would if it weren’t dynamically allocated.

Most games overload the new operator to allocate small blocks of memory from a reserved pool set aside for smaller allocations. Memory allocations that are larger than a set number of bytes can still use the C runtime. I recommend that you start with 128 bytes as the largest block your small allocator will handle and tweak the size until you are happy with the performance.
