filter_xss

Drupal 5, filter.module: filter_xss($string, $allowed_tags = array('a', 'em', 'strong', 'cite', 'code', 'ul', 'ol', 'li', 'dl', 'dt', 'dd'))
Drupal 6, filter.module: filter_xss($string, $allowed_tags = array('a', 'em', 'strong', 'cite', 'code', 'ul', 'ol', 'li', 'dl', 'dt', 'dd'))
Drupal 7, common.inc: filter_xss($string, $allowed_tags = array('a', 'em', 'strong', 'cite', 'blockquote', 'code', 'ul', 'ol', 'li', 'dl', 'dt', 'dd'))
Drupal 8, common.inc: filter_xss($string, $allowed_tags = array('a', 'em', 'strong', 'cite', 'blockquote', 'code', 'ul', 'ol', 'li', 'dl', 'dt', 'dd'))

Filters an HTML string to prevent cross-site-scripting (XSS) vulnerabilities.

Based on kses by Ulf Harnhammar, see http://sourceforge.net/projects/kses. For examples of various XSS attacks, see http://ha.ckers.org/xss.html.

This code does four things:

  • Removes characters and constructs that can trick browsers.
  • Makes sure all HTML entities are well-formed.
  • Makes sure all HTML tags and attributes are well-formed.
  • Makes sure no HTML tags contain URLs with a disallowed protocol (e.g. javascript:).
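
To make the last point more concrete, here is a minimal sketch of how a disallowed URL protocol could be stripped. The function name example_strip_bad_protocol() and its default list of allowed protocols are assumptions made for illustration; they are not part of the code shown on this page. The sketch is deliberately simplified and does not decode entities or handle other obfuscation.

```php
<?php
/**
 * Hypothetical helper (not Drupal's own code): strips any leading URL scheme
 * that is not in the allowed list. The default list is an assumption.
 */
function example_strip_bad_protocol($uri, array $allowed = array('http', 'https', 'ftp', 'mailto')) {
  do {
    $before = $uri;
    $colon = strpos($uri, ':');
    if ($colon !== FALSE && $colon > 0) {
      $scheme = substr($uri, 0, $colon);
      // A "/", "?" or "#" before the colon means a relative URL, not a scheme.
      if (preg_match('![/?#]!', $scheme)) {
        break;
      }
      // Drop the scheme and its colon when it is not explicitly allowed.
      if (!in_array(strtolower($scheme), $allowed, TRUE)) {
        $uri = (string) substr($uri, $colon + 1);
      }
    }
    // Repeat so that stacked schemes such as "javascript:javascript:" are caught.
  } while ($before !== $uri);
  return $uri;
}

print example_strip_bad_protocol('javascript:alert(1)') . "\n";  // prints "alert(1)"
print example_strip_bad_protocol('http://example.com/') . "\n";  // prints "http://example.com/"
```

The loop runs until the string stops changing, so removing one scheme cannot expose a second disallowed one behind it.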

Parameters

$string: The string with raw HTML in it. It will be stripped of everything that can cause an XSS attack.

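$allowed_tags: An array of allowed HTML tag names. Tags not in this list are stripped from the output; the default list is shown in the signature above.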
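
As an illustration of the two parameters, the following sketch passes a narrower tag list than the default. The helper name example_render_untrusted() and the chosen tags are assumptions. Because filter_xss() only exists inside Drupal, the sketch falls back to escaping all markup when it is run standalone.

```php
<?php
/**
 * Hypothetical helper (not part of Drupal): renders untrusted text while
 * allowing only a small set of inline tags.
 */
function example_render_untrusted($raw) {
  // Tags this example chooses to allow; adjust for your own use case.
  $allowed_tags = array('em', 'strong', 'code');
  if (function_exists('filter_xss')) {
    // Inside Drupal: keep only the allowed tags.
    return filter_xss($raw, $allowed_tags);
  }
  // Outside Drupal (for example when running this file on its own), escape
  // all markup so the sketch still returns something safe to print.
  return htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');
}

print example_render_untrusted('<em>Hello</em> <script>alert(1)</script>') . "\n";
```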

Return value

An XSS-safe version of $string, or an empty string if $string is not valid UTF-8.
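
The empty-string case depends on a UTF-8 validity check. The body of drupal_validate_utf8() is not shown on this page, so the following is only a sketch of one way such a check can be written in PHP; the name example_is_valid_utf8() is hypothetical.

```php
<?php
/**
 * Hypothetical stand-in for a UTF-8 validity check. PCRE's "u" modifier makes
 * preg_match() return FALSE when the subject is not valid UTF-8.
 */
function example_is_valid_utf8($text) {
  // The empty pattern matches any valid string, including the empty one.
  return preg_match('//u', $text) === 1;
}

var_dump(example_is_valid_utf8("caf\xC3\xA9"));  // bool(true): "é" encoded as UTF-8
var_dump(example_is_valid_utf8("caf\xE9"));      // bool(false): a lone Latin-1 byte
```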

See also

drupal_validate_utf8()

7 functions call filter_xss()

aggregator_filter_xss in modules/aggregator/aggregator.module
Safely render HTML content, as allowed.
drupal_error_handler in includes/common.inc
Log errors as defined by administrator.
drupal_html_to_text in includes/mail.inc
Transform an HTML string into plain text, preserving the structure of the markup. Useful for preparing the body of a node to be sent by e-mail.
filter_xss_admin in modules/filter/filter.module
Very permissive XSS/HTML filter for admin-only use.
locale_string_is_safe in includes/locale.inc
Check that a string is safe to be added or imported as a translation.
node_revision_overview in modules/node/node.pages.inc
Generate an overview table of older revisions of a node.
_filter_html in modules/filter/filter.module
HTML filter. Provides filtering of input into accepted HTML.

File

modules/filter/filter.module, line 992
Framework for handling filtering of content.

Code

<?php
function filter_xss($string, $allowed_tags = array('a', 'em', 'strong', 'cite', 'code', 'ul', 'ol', 'li', 'dl', 'dt', 'dd')) {
  // Only operate on valid UTF-8 strings. This is necessary to prevent cross
  // site scripting issues on Internet Explorer 6.
  if (!drupal_validate_utf8($string)) {
    return '';
  }
  // Store the allowed tags so the _filter_xss_split() callback can use them
  _filter_xss_split($allowed_tags, TRUE);
  // Remove NUL characters (ignored by some browsers)
  $string = str_replace(chr(0), '', $string);
  // Remove Netscape 4 JS entities
  $string = preg_replace('%&\s*\{[^}]*(\}\s*;?|$)%', '', $string);

  // Defuse all HTML entities
  $string = str_replace('&', '&amp;', $string);
  // Change back only well-formed entities in our whitelist
  // Decimal numeric entities
  $string = preg_replace('/&amp;#([0-9]+;)/', '&#\1', $string);
  // Hexadecimal numeric entities
  $string = preg_replace('/&amp;#[Xx]0*((?:[0-9A-Fa-f]{2})+;)/', '&#x\1', $string);
  // Named entities
  $string = preg_replace('/&amp;([A-Za-z][A-Za-z0-9]*;)/', '&\1', $string);

  return preg_replace_callback('%
    (
    <(?=[^a-zA-Z!/])  # a lone <
    |                 # or
    <!--.*?-->        # a comment
    |                 # or
    <[^>]*(>|$)       # a string that starts with a <, up until the > or the end of the string
    |                 # or
    >                 # just a >
    )%x', '_filter_xss_split', $string);
}
?>
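
The entity handling in the code above can be tried in isolation. The sketch below repeats only those four statements inside a hypothetical helper, example_normalize_entities(), which is not part of Drupal. It shows that bare ampersands and malformed entities stay escaped while well-formed entities are restored.

```php
<?php
/**
 * Hypothetical helper that repeats only the entity-handling statements of
 * filter_xss() so their effect can be observed on their own.
 */
function example_normalize_entities($string) {
  // Defuse every ampersand first.
  $string = str_replace('&', '&amp;', $string);
  // Then restore well-formed decimal, hexadecimal and named entities.
  $string = preg_replace('/&amp;#([0-9]+;)/', '&#\1', $string);
  $string = preg_replace('/&amp;#[Xx]0*((?:[0-9A-Fa-f]{2})+;)/', '&#x\1', $string);
  $string = preg_replace('/&amp;([A-Za-z][A-Za-z0-9]*;)/', '&\1', $string);
  return $string;
}

print example_normalize_entities('AT&T &lt; &#60; &#x3C; &bogus') . "\n";
// prints "AT&amp;T &lt; &#60; &#x3C; &amp;bogus"
```

Escaping everything and then restoring only known-good patterns means an entity has to match one of the three patterns to survive, which is safer than trying to list every malformed variant.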

Comments

finer filtering for style attribute

Great work here, especially after reading the XSS attacks reference!
Internal editors often want to add markup to locally used data (e.g. entered via CKEditor), and that markup requires the style attribute.
The attribute is stripped out when the text is passed through filter_filter() and its helper functions.

Would it be possible to add an optional style cleanup routine that:
- first removes embedded comments, and
- then removes the 'behavior' property, 'expression' values, and 'javascript' wherever they appear?

Thanks

style attribute

That actually gets pulled out further down the line, in _filter_xss_attributes(). I agree with you, Jons: judging from that function, "style" is hard-coded for removal. It would be cool if there were an "allowed attributes" setting. Feature request?

See possible solution here:

See a possible solution here: http://drupal.org/node/1311064
