function filter_xss

Filters HTML to prevent cross-site-scripting (XSS) vulnerabilities.

Based on kses by Ulf Harnhammar, see http://sourceforge.net/projects/kses. For examples of various XSS attacks, see: http://ha.ckers.org/xss.html.

This code does four things:

  • Removes characters and constructs that can trick browsers.
  • Makes sure all HTML entities are well-formed.
  • Makes sure all HTML tags and attributes are well-formed.
  • Makes sure no HTML tags contain URLs with a disallowed protocol (e.g. javascript:).
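
The entity step in the list above can be illustrated in isolation. The following is a minimal sketch, not the function itself: example_normalize_entities() is a hypothetical name introduced here, and it only mirrors the defuse-then-restore pattern visible in the code further down. Every ampersand is escaped first, and only well-formed decimal, hexadecimal, and named entities are changed back.

```php
<?php
// Hypothetical helper (not part of Drupal): reproduces only the
// entity-handling portion of filter_xss() for illustration.
function example_normalize_entities($string) {
  // Defuse every ampersand so that no entity survives by default.
  $string = str_replace('&', '&amp;', $string);
  // Change back well-formed decimal numeric entities, e.g. &#60;
  $string = preg_replace('/&amp;#([0-9]+;)/', '&#\\1', $string);
  // Change back well-formed hexadecimal numeric entities, e.g. &#x3C;
  $string = preg_replace('/&amp;#[Xx]0*((?:[0-9A-Fa-f]{2})+;)/', '&#x\\1', $string);
  // Change back well-formed named entities, e.g. &lt;
  $string = preg_replace('/&amp;([A-Za-z][A-Za-z0-9]*;)/', '&\\1', $string);
  return $string;
}
```

With this sketch, a bare ampersand such as the one in "AT&T" stays escaped, while an input that is already a well-formed entity such as "&lt;" passes through unchanged.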

Parameters

$string: The string with raw HTML in it. It will be stripped of everything that can cause an XSS attack.

$allowed_tags: An array of allowed tag names. Defaults to a, em, strong, cite, blockquote, code, ul, ol, li, dl, dt, and dd.

Return value

An XSS-safe version of $string, or an empty string if $string is not valid UTF-8.
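
The empty-string behavior for invalid UTF-8 can be sketched without a Drupal installation. This is an assumption-laden illustration: example_is_valid_utf8() and example_utf8_guard() are hypothetical names, and the PCRE-based check stands in for drupal_validate_utf8() rather than reproducing it. It relies on the fact that PCRE refuses to match a subject that is not valid UTF-8 when the u modifier is set.

```php
<?php
// Hypothetical stand-in for drupal_validate_utf8(): with the "u"
// modifier, preg_match() fails on subjects that are not valid UTF-8.
function example_is_valid_utf8($text) {
  if (strlen($text) == 0) {
    // An empty string is trivially valid.
    return TRUE;
  }
  return preg_match('/^./us', $text) === 1;
}

// Hypothetical guard mirroring the early return at the top of filter_xss().
function example_utf8_guard($string) {
  return example_is_valid_utf8($string) ? $string : '';
}
```

A caller therefore cannot distinguish "input was empty" from "input was rejected as invalid UTF-8" by the return value alone.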

See also

drupal_validate_utf8()

25 calls to filter_xss()
aggregator_filter_xss in modules/aggregator/aggregator.module
Renders the HTML content safely, as allowed.
CommentTokenReplaceTestCase::testCommentTokenReplacement in modules/comment/comment.test
Creates a comment, then tests the tokens generated from it.
comment_tokens in modules/comment/comment.tokens.inc
Implements hook_tokens().
CommonXssUnitTest::testInvalidMultiByte in modules/simpletest/tests/common.test
Check that invalid multi-byte sequences are rejected.
DBLogTestCase::assertLogMessage in modules/dblog/dblog.test
Confirms that a log message appears on the database log overview screen.

File

includes/common.inc, line 1538

Code

function filter_xss($string, $allowed_tags = array(
    'a',
    'em',
    'strong',
    'cite',
    'blockquote',
    'code',
    'ul',
    'ol',
    'li',
    'dl',
    'dt',
    'dd',
)) {
    // Only operate on valid UTF-8 strings. This is necessary to prevent cross
    // site scripting issues on Internet Explorer 6.
    if (!drupal_validate_utf8($string)) {
        return '';
    }
    // Store the allowed tags for later use by the _filter_xss_split() callback.
    _filter_xss_split($allowed_tags, TRUE);
    // Remove NULL characters (ignored by some browsers).
    $string = str_replace(chr(0), '', (string) $string);
    // Remove Netscape 4 JS entities.
    $string = preg_replace('%&\\s*\\{[^}]*(\\}\\s*;?|$)%', '', $string);
    // Defuse all HTML entities.
    $string = str_replace('&', '&amp;', $string);
    // Change back only well-formed entities in our whitelist:
    // Decimal numeric entities.
    $string = preg_replace('/&amp;#([0-9]+;)/', '&#\\1', $string);
    // Hexadecimal numeric entities.
    $string = preg_replace('/&amp;#[Xx]0*((?:[0-9A-Fa-f]{2})+;)/', '&#x\\1', $string);
    // Named entities.
    $string = preg_replace('/&amp;([A-Za-z][A-Za-z0-9]*;)/', '&\\1', $string);
    return preg_replace_callback('%
    (
    <(?=[^a-zA-Z!/])  # a lone <
    |                 # or
    <!--.*?-->        # a comment
    |                 # or
    <[^>]*(>|$)       # a string that starts with a <, up until the > or the end of the string
    |                 # or
    >                 # just a >
    )%x', '_filter_xss_split', $string);
}
