by Dejan Jelovic

"Java is high performance. By high performance we mean adequate. By adequate we mean slow." - Mr. Bunny

Anybody that has ever used a non-trivial Java program or has programmed in Java knows that Java is slower than native programs written in C++. This is a fact of life, something that we accept when we use Java.

However, many folks would like to convince us that this is just a temporary condition. Java is not slow by design, they say. Instead, it is slow because today's JIT implementations are relatively young and don't do all the optimizations they could.

This is incorrect. No matter how good the JITs get, Java will always be slower than C++.

The Idea

People who claim that Java can be as fast as C++ or even faster often base their opinion on the idea that more disciplined languages give the compiler more room for optimization. So, unless you are going to hand-optimize the whole program, the compiler will do a better job overall.

This is true. Fortran still kicks C++'s ass in numeric computing because it is more disciplined. With no fear of pointer aliasing the compiler can optimize better. The only way that C++ can rival the speed of Fortran is with a cleverly designed active library like Blitz++.

However, in order to achieve overall results like that, the language must be designed to give the compiler room for optimization. Unfortunately, Java was not designed that way. So no matter how smart the compilers get, Java will never approach the speed of C++.

The Benchmarks

Perversely, the only area in which Java can be as fast as C++ is a typical benchmark. If you need to calculate Nth Fibonacci number or run Linpack, there is no reason why Java cannot be as fast as C++. As long as all the computation stays in one class and uses only primitive data types like int and double, the Java compiler is on equal footing with the C++ compiler.

The Real World

The moment you start using objects in your program, Java looses the potential for optimization. This section lists some of the reasons why.

1. All Objects are Allocated on the Heap

Java only allocates primitive data types like int and double and object references on the stack. All objects are allocated on the heap.

For large objects which usually have identity semantics, this is not a handicap. C++ programmers will also allocate these objects on the heap. However, for small objects with value semantics, this is a major performance killer.

What small objects? For me these are iterators. I use a lot of them in my designs. Someone else may use complex numbers. A 3D programmer may use a vector or a point class. People dealing with time series data will use a time class. Anybody using these will definitely hate trading a zero-time stack allocation for a constant-time heap allocation. Put that in a loop and that becomes O (n) vs. zero. Add another loop and you get O (n^2) vs. again, zero.

2. Lots of Casts

With the advent of templates, good C++ programmers have been able to avoid casts almost completely in high-level programs. Unfortunately, Java doesn't have templates, so Java code is typically full of casts.

What does that mean for performance? Well, all casts in Java are dynamic casts, which are expensive. How expensive? Consider how you would implement a dynamic cast:

The fastest thing you could do is assign a number to each class and then have a matrix that tells if any two classes are related, and if they are, what is the offset that needs to be added to the pointer in order to make the cast. In that case, the pseudo-code for the cast would look something like this:

DestinationClass makeCast (Object o, Class destinationClass) {
    Class sourceClass = o.getClass (); // JIT compile-time
    int sourceClassId = sourceClass.getId (); // JIT compile-time

    int destinationId = destinationClass.getId ();

    int offset = ourTable [sourceClassId][destinationClassId];

    if (offset != ILLEGAL_OFFSET_VALUE) {
        return <object o adjusted for offset>;
    }
    else {
        throw new IllegalCastException ();
    }
}

Quite a lot of code, this little cast! And this here is a rosy picture - using a matrix to represent class relationships takes up a lot of memory and no sane compiler out there would do that. Instead, they will either use a map or walk the inheritance hierarchy - both of which will slow things down even further.

3. Increased Memory Use

Java programs use about double the memory of comparable C++ programs to store the data. There are three reasons for this:

  1. Programs that utilize automatic garbage collection typically use about 50% more memory that programs that do manual memory management.
  2. Many of the objects that would be allocated on stack in C++ will be allocated on the heap in Java.
  3. Java objects will be larger, due to all objects having a virtual table plus support for synchronization primitives.

A larger memory footprint increases the probability that parts of the program will be swapped out to the disk. And swap file usage kills the speed like nothing else.

4. Lack of Control over Details

Java was intentionally designed to be a simple language. Many of the features available in C++ that give the programmer control over details were intentionally stripped away.

For example, in C++ one can implement schemes that improve the locality of reference. Or allocate and free many objects at once. Or play pointer tricks to make member access faster. Etc.

None of these schemes are available in Java.

5. No High-Level Optimizations

Programmers deal with high-level concepts. Unlike them, compilers deal exclusively with low-level ones. To a programmer, a class named Matrix represents a different high-level concept from a class named Vector. To a compiler, those names are only entries in the symbol table. What it cares about are the functions that those classes contain, and the statements inside those functions.

Now think about this: say you implement the function exp (double x, double y) that raises x to the exponent y. Can a compiler, just by looking at the statements in that function, figure out that exp (exp (x, 2), 0.5) can be optimized by simply replacing it with x? Of course not!

All the optimizations that a compiler can do are done at the statement level, and they are built into the compiler. So although the programmer might know that two functions are symmetric and cancel each other now, or that the order of some function calls is irrelevant in some place, unless the compiler can figure it out by looking at the statements, the optimization will not be done.

So, if a high-level optimization is to be done, there has to be a way for the programmer to specify the high-level optimization rules for the compiler.

No popular programming language/system does this today. At least not in the totally open sense, like what the Microsoft's Intentional Programming project promises. However, in C++ you can do template metaprogramming to implement optimizations that deal with high-level objects. Temporary elimination, partial evaluation, symmetric function call removal and other optimizations can be implemented using templates. Of course, not all high-level optimizations can be done this way. And implementing some of these things can be cumbersome. But a lot can be done, and people have implemented some snazzy libraries using these techniques.

Unfortunately, Java doesn't have any metaprogramming facilities, and thus high-level optimizations are not possible in Java.

So...

Java, with the current language features, will never be as fast as C++. This pretty much means that it's not a sensible choice for high-performance software and the highly competitive COTS arena. But its small learning curve, its forgiveness, and its large standard library make it a good choice for some small and medium-sized in-house and custom-built software.

Notes

1. James Gosling has proposed a number of language features that would help improve Java performance. You can find the text here. Unfortunately, the Java language has not changed for four years, so it doesn't seem like these will be implemented any time soon.

2. The most promising effort to bring generic types to Java is Generic Java. Unfortunately, GJ works by removing all type information when it compiles the program, so what the execution environment sees is the end is again the slow casts.

3. The Garbage Collection FAQ contains the information that garbage collections is slower than customized allocator (point 4 in the above text).

4. There is a paper that claims that Garbage Collection Can Be Faster than Stack Allocation. But the requirement is that there is seven times more physical memory than what the program actually uses. Plus, it describes a stop-and-copy collector and doesn't take concurrency into account. [Peter Drayton: FWIW, this is an over-simplification of the paper, which provides a means of calculating what the cross-over point is, but doesn't claim that 7 is a universal cross-over point: it is merely the crossover point he derives using the sample inputs in the paper.]

Feedback

I received a lot of feedback about this article. Here are the typical comments, together with my answers:

"You forgot to mention that all methods in Java are virtual, because nobody is using the final keyword."

The fact that people are not using the final keyword is not a problem with the language, but with the programmers using it. Also, virtual functions calls in general are not problematic because of the call overhead, but because of lost optimization opportunities. But since JITs know how to inline across virtual function boundaries, this is not a big deal.

Java can be faster than C++ because JITs can inline over virtual function boundaries.

C++ can also be compiled using JITs. Check out the C++ compiler in .NET.

In the end, speed doesn't matter. Computers spend most of their time waiting on our input.

Speed still maters. I still wait for my laptop to boot up. I wait for my compiler. I wait on Word when I have a long document.

I work in the financial markets industry. Sometimes I have to run a simulation over a huge data set. Speed matters in those cases.

It is possible for a JIT to allocate some objects on a stack.

Sure. Some.

Your casting pseudo-code is naive. For classes a check can be made based on inheritance depth.

First, that's only a tad faster than the matrix lookup.

Second, that works only for classes, which make up what percentage of casts? Low-level details are usually implemented through interfaces.

So we should all use assembly, ha!?

No. We should all use languages that make sense for a given project. Java is great because it has a large standard library that makes many common tasks easy. It's more portable than any other popular language (but not 100% portable - different platforms fire events at different times and in different order). It has garbage collection that makes memory management simpler and some constructs like closures possible.

But, at the same time, Java, just like any other language, has some deficiencies. It has no support for types with value semantics. Its synchronization constructs are not efficient enough. Its standard library relies on checked exceptions which are evil because they push implementation details into interfaces. Its performance could be better. The math library has some annoying problems. Etc.

Are these deficiencies a big deal? It depends on what you are building. So know a few languages and pick the one that, together with the compiler and available libraries, makes sense for a given project.

Read more...


Content of this site is © Dejan Jelovic. All rights reserved.

'IT' 카테고리의 다른 글

The Best AirPrint Compatible Printers  (0) 2011.12.23
Setting the secure flag in the cookie is easy  (0) 2011.12.10
a  (0) 2011.12.09
Information Security Interview Questions  (0) 2011.12.05
Endpoint Buyers Guide (Anti Virus)  (0) 2011.12.02
Posted by CEOinIRVINE
l

a

IT 2011. 12. 9. 04:07

 

Posted by CEOinIRVINE
l

Information Security Interview Questions

websec

What follows is a list of questions for use in vetting candidates for positions in Information Security. Many of the questions are designed to get the candidate to think, and to articulate that thought process in a scenario where preparation was not possible. Observing these types of responses is often as important as the actual answers.

I’ve mixed technical questions with those that are more theory and opinion-based, and they are also mixed in terms of difficulty. A number of trick questions are included, but the goal there is to expose glaring technical weakness, not to be cute. I also include with each question a few words on expected responses.

Where do you get your security news from?

Here I’m looking to see how in tune they are with the security community. Answers I’m looking for include syndication feeds for solid sites like liquidmatrix, packetstorm, rootsecure, secguru, astalavista, whitedust, internet storm center, etc. The exact sources don’t really matter. What does matter is that he doesn’t respond with, “I go to the CNET website.”, or, “Steve Gibson’s home page”. It’s these types of answers that will tell you he’s likely not on top of things.

If you had to both encrypt and compress data during transmission, which would you do first, and why?

If they don’t know the answer immediately it’s ok. The key is how they react. Do they panic, or do they enjoy the challenge and think through it? I was asked this question during an interview at Cisco. I told the interviewer that I didn’t know the answer but that I needed just a few seconds to figure it out. I thought out loud and within 10 seconds gave him my answer: “Compress then encrypt. If you encrypt first you’ll have nothing but random data to work with, which will destroy any potential benefit from compression.”

What’s the difference between HTTP and HTML?

Obviously the answer is that one is the networking/application protocol and the other is the markup language, but again–the main thing you’re looking for is for him not to panic.

How does HTTP handle state?

It doesn’t, of course. Not natively. Good answers are things like “cookies”, but the best answer is that cookies are a hack to make up for the fact that HTTP doesn’t do it itself.

What exactly is Cross Site Scripting?

You’d be amazed at how many security people don’t know even the basics of this immensely important topic. We’re looking for them to say anything regarding an attacker getting a victim to run script content (usually Javascript) within their browser.

What’s the difference between stored and reflected XSS?

Stored is on a static page or pulled from a database and displayed to the user directly. Reflected comes from the user in the form of a request (usually constructed by an attacker), and then gets run in the victim’s browser when the results are returned from the site.

What are the common defenses against XSS?

Input Validation/Output Sanitization, with focus on the latter.

What’s the difference between symmetric and public-key cryptography

Standard stuff here–single key vs. two keys, etc, etc.

In public-key cryptography you have a public and a private key, and you often perform both encryption and signing functions. Which key is used for which function?

You encrypt with the other person’s public key, and you sign with your own private. If they confuse the two, don’t put them in charge of your PKI project.

What kind of network do you have at home?

Good answers here are anything that shows you he’s a computer/technology/security enthusiast and not just someone looking for a paycheck. So if he’s got multiple systems running multiple operating systems you’re probably in good shape. What you don’t want to hear is, “I get enough computers when I’m at work..” I’ve yet to meet a serious security guy who doesn’t have a considerable home network.

What is Cross-Site Request Forgery?

Not knowing this is more forgivable than not knowing what XSS is, but only for junior positions. Desired answer: when an attacker gets a victim’s browser to make requests, ideally with their credentials included, without their knowing. A solid example of this is when an IMG tag points to a URL associated with an action, e.g. http://foo.com/logout/. A victim just loading that page could potentially get logged out from foo.com, and their browser would have made the action, not them (since browsers load all IMG tags automatically).

How does one defend against CSRF?

Nonces required by the server for each page or each request is an accepted, albeit not foolproof, method. Again, we’re looking for recognition and basic understanding here–not a full, expert level dissertation on the subject. Adjust expectations according to the position you’re hiring for.

What port does ping work over?

A trick question, to be sure, but an important one. If he starts throwing out port numbers you may want to immediately move to the next candidate. Hint: ICMP is a layer 3 protocol (it doesn’t work over a port) A good variation of this question is to ask whether ping uses TCP or UDP. An answer of either is a fail, as those are layer 4 protocols.

How exactly does traceroute/tracert work at the protocol level?

This is a fairly technical question but it’s an important concept to understand. It’s not natively a “security” question really, but it shows you whether or not they like to understand how things work, which is crucial for an Infosec professional. If they get it right you can lighten up and offer extra credit for the difference between Linux and Windows versions.

The key point people usually miss is that each packet that’s sent out doesn’t go to a different place. Many people think that it first sends a packet to the first hop, gets a time. Then it sends a packet to the second hop, gets a time, and keeps going until it gets done. That’s incorrect. It actually keeps sending packets to the final destination; the only change is the TTL that’s used. The extra credit is the fact that Windows uses ICMP by default while Linux uses UDP.

If you were to start a job as head engineer or CSO at a Fortune 500 company due to the previous guy being fired for incompetence, what would your priorities be? [Imagine you start on day one with no knowledge of the environment]

We don’t need a list here; we’re looking for the basics. Where is the important data? Who interacts with it? Network diagrams. Visibility touch points. Ingress and egress filtering. Previous vulnerability assessments. What’s being logged an audited? Etc. The key is to see that they could quickly prioritize, in just a few seconds, what would be the most important things to learn in an unknown situation.

As a corporate Information Security professional, what’s more important to focus on: threats or vulnerabilities?

This one is opinion-based, and we all have opinions. Focus on the quality of the argument put forth rather than whether or not they they chose the same as you, necessarily. My answer to this is that vulnerabilities should usually be the main focus since we in the corporate world usually have little control over the threats.

Another way to take that, however, is to say that the threats (in terms of vectors) will always remain the same, and that the vulnerabilities we are fixing are only the known ones. Therefore we should be applying defense-in-depth based on threat modeling in addition to just keeping ourselves up to date.

Both are true, of course; the key is to hear what they have to say on the matter.

Describe the last program or script that you wrote. What problem did it solve?

All we want to see here is if the color drains from the guy’s face. If he panics then we not only know he’s not a programmer (not necessarily bad), but that he’s afraid of programming (bad). I know it’s controversial, but I think that any high-level security guy needs some programming skills. They don’t need to be a God at it, but they need to understand the concepts and at least be able to muddle through some scripting when required.

What are Linux’s strengths and weaknesses vs. Windows?

Look for biases. Does he absolutely hate Windows and refuse to work with it? This is a sign of an immature hobbyist who will cause you problems in the future. Is he a Windows fanboy who hates Linux with a passion? If so just thank him for his time and show him out. Linux is everywhere in the security world.

What’s the difference between a threat, vulnerability, and a risk?

As weak as the CISSP is as a security certification it does teach some good concepts. Knowing basics like risk, vulnerability, threat, exposure, etc. (and being able to differentiate them) is important for a security professional. Ask as many of these as you’d like, but keep in mind that there are a few differing schools on this. Just look for solid answers that are self-consistent.

Cryptographically speaking, what is the main method of building a shared secret over a public medium?

Diffie-Hellman. And if they get that right you can follow-up with the next one.

What’s the difference between Diffie-Hellman and RSA?

Diffie-Hellman is a key-exchange protocol, and RSA is an encryption/signing protocol. If they get that far, make sure they can elaborate on the actual difference, which is that one requeres you to have key material beforehand (RSA), while the other does not (DH). Blank stares are undesirable.

What kind of attack is a standard Diffie-Hellman exchange vulnerable to?

Man-in-the-middle, as neither side is authenticated.

What’s the goal of information security within an organization?

This is a big one. What I look for is one of two approaches; the first is the über-lockdown approach, i.e. “To control access to information as much as possible, sir!” While admirable, this again shows a bit of immaturity. Not really in a bad way, just not quite what I’m looking for. A much better answer in my view is something along the lines of, “To help the organization succeed. ”This type of response shows that the individual understands that business is there to make money, and that we are there to help them do that. It is this sort of perspective that I think represents the highest level of security understanding—-a realization that security is there for the company and not the other way around.

Are open-source projects more or less secure than proprietary ones?

The answer to this question is often very telling about a given candidate. It shows 1) whether or not they know what they’re talking about in terms of development, and 2) it really illustrates the maturity of the individual (a common theme among my questions). My main goal here is to get them to show me pros and cons for each. If I just get the “many eyes” regurgitation then I’ll know he’s read Slashdot and not much else. And if I just get the “people in China can put anything in the kernel” routine then I’ll know he’s not so good at looking at the complete picture.

The ideal answer involves the size of the project, how many developers are working on it (and what their backgrounds are), and most importantly — quality control. In short, there’s no way to tell the quality of a project simply by knowing that it’s either open-source or proprietary. There are many examples of horribly insecure applications that came from both camps.

What’s the difference between encoding, encryption, and hashing?

Encoding is designed to protect the integrity of data as it crosses networks and systems, i.e. to keep its original message upon arriving, and it isn’t primarily a security function. It is easily reversible because the system for encoding is almost necessarily and by definition in wide use. Encryption is designed purely for confidentiality and is reversible only if you have the appropriate key/keys. With hashing the operation is one-way (non-reversible), and the output is of a fixed length that is usually much smaller than the input.

Who do you look up to within the field of Information Security? Why?

A standard question type. All we’re looking for here is to see if they pay attention to the industry leaders, and to possibly glean some more insight into how they approach security. If they name a bunch of hackers/criminals that’ll tell you one thing, and if they name a few of the pioneers that’ll say another. If they don’t know anyone in Security, well…consider closely what position you’re hiring them for. Hopefully it’s a junior position.

Advanced

Ok, now for some more advanced questions:

  1. If I’m on my laptop, here inside my company, and I have just plugged in my network cable. How many packets must leave my NIC in order to complete a traceroute to twitter.com?

    The key here is that they need to factor in all layers: Ethernet, IP, DNS, ICMP/UDP, etc. And they need to consider round-trip times. What you’re looking for is a realization that this is the way to approach it, and an attempt to knock it out. A bad answer is the look of WTF on the fact of the interviewee.


  2. How would you build the ultimate botnet?

    Answers here can vary widely; you want to see them cover the basics: encryption, DNS rotation, the use of common protocols, obscuring the heartbeat, the mechanism for providing updates, etc. Again, poor answers are things like, “I don’t make them; I stop them.”

Bonus: Scenario Role-Play

For special situations you may want to do the ultimate interview question. This is a role-played scenario, where the candidate is a consultant and you control the environment. I had one of these during an interview and it was quite valuable.

So you tell them, for example, that they’ve been called in to help a client who’s received a call from their ISP stating that one or more computers on their network have been compromised. And it’s their job to fix it. They are now at the client site and are free to talk to you as the client (interviewing them), or to ask you as the controller of the environment, e.g. “I sniff the external connection using tcpdump on port 80. Do I see any connections to IP 4.2.2.2?” And you can then say yes or no, etc.

From there they continue to troubleshooting/investigating until they solve the problem or you discontinue the exercise due to frustration or pity.

Feel free to contact me if you have any comments on the questions, or if you have an ideas for additions.

Posted by CEOinIRVINE
l