Same name and namespace in other branches
  1. 4.6.x modules/filter.module \filter_xss()
  2. 4.7.x modules/filter.module \filter_xss()
  3. 6.x modules/filter/filter.module \filter_xss()
  4. 7.x includes/common.inc \filter_xss()

Filters an HTML string to prevent cross-site scripting (XSS) attacks. Based on kses by Ulf Harnhammar; see http://sourceforge.net/projects/kses

For examples of various XSS attacks, see: http://ha.ckers.org/xss.html

This code does four things:

  • Removes characters and constructs that can trick browsers
  • Makes sure all HTML entities are well-formed
  • Makes sure all HTML tags and attributes are well-formed
  • Makes sure no HTML tags contain URLs with a disallowed protocol (e.g. javascript:)
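
The first two items can be tried in isolation. The following is a minimal sketch, not the Drupal function itself: sketch_normalize_entities() is a hypothetical name, and it assumes the "defuse" step rewrites every & as &amp; before the well-formed entities are restored.

```php
<?php
// Hypothetical helper for illustration only; it is not part of Drupal.
function sketch_normalize_entities($string) {
  // Remove NUL characters, which some browsers ignore.
  $string = str_replace(chr(0), '', $string);
  // Remove Netscape 4 JavaScript entities such as &{alert(1)};
  $string = preg_replace('%&\s*\{[^}]*(\}\s*;?|$)%', '', $string);
  // Defuse every ampersand (assumed behavior of the "defuse" step).
  $string = str_replace('&', '&amp;', $string);
  // Restore only well-formed named, decimal, and hexadecimal entities.
  $string = preg_replace('/&amp;([A-Za-z][A-Za-z0-9]*;)/', '&\1', $string);
  $string = preg_replace('/&amp;#([0-9]+;)/', '&#\1', $string);
  $string = preg_replace('/&amp;#[Xx]0*((?:[0-9A-Fa-f]{2})+;)/', '&#x\1', $string);
  return $string;
}

// A bare ampersand is escaped, the JS entity is dropped, and the three
// well-formed entities survive (the hex one is normalized to lowercase x).
echo sketch_normalize_entities('Tom & Jerry&{alert(1)}; &amp; &#60; &#X003c;'), "\n";
```

A bare & ends up escaped, while entities that were already well-formed pass through, so the browser never sees a malformed entity.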

Parameters

$string: The string with raw HTML in it. It will be stripped of everything that can cause an XSS attack.

$allowed_tags: An array of allowed tags.

$format: The format to use. Note that the function signature shown under Code accepts only $string and $allowed_tags.
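
A hedged usage sketch of the two parameters in the signature. It assumes a bootstrapped Drupal site where filter_xss() is already loaded; the sample string and the narrower tag list are made up for illustration.

```php
<?php
// Sample input, invented for this sketch.
$comment = '<em>Hi</em> <script>alert(1)</script>';

// Guarded so the snippet does nothing outside a Drupal bootstrap.
if (function_exists('filter_xss')) {
  // Use the default list of allowed tags.
  $safe_default = filter_xss($comment);

  // Pass a narrower list, for example for a single-line field.
  $safe_inline = filter_xss($comment, array('em', 'strong'));
}
```

Passing the list explicitly keeps the set of permitted tags visible at the call site.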

4 calls to filter_xss()
aggregator_filter_xss in modules/aggregator/aggregator.module
Safely render HTML content, as allowed.
filter_xss_admin in modules/filter/filter.module
Very permissive XSS/HTML filter for admin-only use.
node_revision_overview in modules/node/node.module
Generate an overview table of older revisions of a node.
_filter_html in modules/filter/filter.module
HTML filter. Provides filtering of input into accepted HTML.

File

modules/filter/filter.module, line 1276
Framework for handling filtering of content.

Code

function filter_xss($string, $allowed_tags = array(
  'a',
  'em',
  'strong',
  'cite',
  'code',
  'ul',
  'ol',
  'li',
  'dl',
  'dt',
  'dd',
)) {

  // Only operate on valid UTF-8 strings. This is necessary to prevent cross
  // site scripting issues on Internet Explorer 6.
  if (!drupal_validate_utf8($string)) {
    return '';
  }

  // Store the allowed tags for the _filter_xss_split() callback used below
  _filter_xss_split($allowed_tags, TRUE);

  // Remove NUL characters (ignored by some browsers)
  $string = str_replace(chr(0), '', $string);

  // Remove Netscape 4 JS entities
  $string = preg_replace('%&\\s*\\{[^}]*(\\}\\s*;?|$)%', '', $string);

  // Defuse all HTML entities
  $string = str_replace('&', '&amp;', $string);

  // Change back only well-formed entities in our whitelist
  // Named entities
  $string = preg_replace('/&amp;([A-Za-z][A-Za-z0-9]*;)/', '&\\1', $string);

  // Decimal numeric entities
  $string = preg_replace('/&amp;#([0-9]+;)/', '&#\\1', $string);

  // Hexadecimal numeric entities
  $string = preg_replace('/&amp;#[Xx]0*((?:[0-9A-Fa-f]{2})+;)/', '&#x\\1', $string);
  return preg_replace_callback('%
    (
    <(?=[^a-zA-Z!/])  # a lone <
    |                 # or
    <[^>]*(>|$)       # a string that starts with a <, up until the > or the end of the string
    |                 # or
    >                 # just a >
    )%x', '_filter_xss_split', $string);
}
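
To see which pieces of a string the final regular expression hands to its callback, the pattern can be run on its own. This is a sketch under stated assumptions: the anonymous callback is hypothetical and merely records and escapes each piece, whereas _filter_xss_split() decides whether to keep the tag.

```php
<?php
// Same pattern as in filter_xss(), in extended (x) mode.
$pattern = '%
  (
  <(?=[^a-zA-Z!/])  # a lone <
  |                 # or
  <[^>]*(>|$)       # a string that starts with a <, up until the > or the end of the string
  |                 # or
  >                 # just a >
  )%x';

// Hypothetical stand-in callback: record each matched piece, then escape it.
$pieces = array();
$out = preg_replace_callback($pattern, function ($m) use (&$pieces) {
  $pieces[] = $m[1];
  return htmlspecialchars($m[1]);
}, 'a < b <em>c</em> d > e <img src=x');

// $pieces now holds: '<', '<em>', '</em>', '>', '<img src=x'
print_r($pieces);
echo $out, "\n";
```

The unterminated `<img src=x` at the end is still captured as one piece, which is why the pattern also accepts the end of the string in place of a closing >.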