function filter_xss

7 common.inc filter_xss($string, $allowed_tags = array('a', 'em', 'strong', 'cite', 'blockquote', 'code', 'ul', 'ol', 'li', 'dl', 'dt', 'dd'))
4.6 filter.module filter_xss($string, $allowed_tags = array('a', 'em', 'strong', 'cite', 'code', 'ul', 'ol', 'li', 'dl', 'dt', 'dd'))
4.7 filter.module filter_xss($string, $allowed_tags = array('a', 'em', 'strong', 'cite', 'code', 'ul', 'ol', 'li', 'dl', 'dt', 'dd'))
5 filter.module filter_xss($string, $allowed_tags = array('a', 'em', 'strong', 'cite', 'code', 'ul', 'ol', 'li', 'dl', 'dt', 'dd'))
6 filter.module filter_xss($string, $allowed_tags = array('a', 'em', 'strong', 'cite', 'code', 'ul', 'ol', 'li', 'dl', 'dt', 'dd'))
8 common.inc filter_xss($string, $allowed_tags = array('a', 'em', 'strong', 'cite', 'blockquote', 'code', 'ul', 'ol', 'li', 'dl', 'dt', 'dd'))

Filters an HTML string to prevent cross-site-scripting (XSS) vulnerabilities.

Based on kses by Ulf Harnhammar, see http://sourceforge.net/projects/kses. For examples of various XSS attacks, see http://ha.ckers.org/xss.html.

This code does four things:

  • Removes characters and constructs that can trick browsers.
  • Makes sure all HTML entities are well-formed.
  • Makes sure all HTML tags and attributes are well-formed.
  • Makes sure no HTML tags contain URLs with a disallowed protocol (e.g. javascript:).
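The last item in the list above, the protocol check, can be illustrated with a minimal sketch. The function name and the allowed-protocol list are assumptions made for this example, not Drupal's API, and the sketch does not attempt to handle obfuscated schemes (for example entity-encoded ones), which a real filter has to deal with.

```php
<?php
// Hypothetical helper for illustration only; not part of Drupal.
// Returns TRUE when $url is relative or uses a scheme from $allowed.
function sketch_has_allowed_protocol($url, array $allowed = array('http', 'https', 'ftp', 'mailto')) {
  $colon = strpos($url, ':');
  if ($colon === FALSE) {
    // No colon at all: a relative URL, so there is no scheme to reject.
    return TRUE;
  }
  $scheme = substr($url, 0, $colon);
  // A colon that appears after "/", "?" or "#" belongs to the path, query
  // or fragment, so the URL is still relative.
  if (preg_match('![/?#]!', $scheme)) {
    return TRUE;
  }
  // Schemes are case-insensitive, so compare in lower case.
  return in_array(strtolower($scheme), $allowed, TRUE);
}
```

With these assumptions, 'javascript:alert(1)' is rejected, while 'http://example.com/' and a relative path such as '/node/1?x=a:b' are accepted.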

Parameters

$string: The string with raw HTML in it. It will be stripped of everything that can cause an XSS attack.

$allowed_tags: An array of allowed tags.

Return value

An XSS safe version of $string, or an empty string if $string is not valid UTF-8.
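The empty-string behaviour for invalid UTF-8 can be sketched in isolation. The helper names below are hypothetical, and the check shown is one common way to validate UTF-8 in PHP; drupal_validate_utf8() itself may be implemented differently.

```php
<?php
// Hypothetical helper for illustration; not Drupal's drupal_validate_utf8().
function sketch_is_valid_utf8($text) {
  // With the /u modifier PCRE refuses to run on malformed UTF-8, so
  // preg_match() returns FALSE rather than 1 for invalid input.
  return preg_match('//u', $text) === 1;
}

// Assumed calling pattern: return an empty string when the input is invalid.
function sketch_guard($string) {
  return sketch_is_valid_utf8($string) ? $string : '';
}
```

A caller therefore cannot tell an invalid input from an empty one by looking at the result alone; validate separately if that distinction matters.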

See also

drupal_validate_utf8()

8 calls to filter_xss()
aggregator_filter_xss in modules/aggregator/aggregator.module
Safely render HTML content, as allowed.
dblog_overview in modules/dblog/dblog.admin.inc
Menu callback; displays a listing of log messages.
drupal_error_handler in includes/common.inc
Log errors as defined by administrator.
drupal_html_to_text in includes/mail.inc
Transform an HTML string into plain text, preserving the structure of the markup. Useful for preparing the body of a node to be sent by e-mail.
filter_xss_admin in modules/filter/filter.module
Very permissive XSS/HTML filter for admin-only use.

1 string reference to 'filter_xss'
drupal_error_handler in includes/common.inc
Log errors as defined by administrator.

File

modules/filter/filter.module, line 992
Framework for handling filtering of content.

Code

function filter_xss($string, $allowed_tags = array('a', 'em', 'strong', 'cite', 'code', 'ul', 'ol', 'li', 'dl', 'dt', 'dd')) {
  // Only operate on valid UTF-8 strings. This is necessary to prevent cross
  // site scripting issues on Internet Explorer 6.
  if (!drupal_validate_utf8($string)) {
    return '';
  }
  // Store the input format
  _filter_xss_split($allowed_tags, TRUE);
  // Remove NUL characters (ignored by some browsers)
  $string = str_replace(chr(0), '', $string);
  // Remove Netscape 4 JS entities
  $string = preg_replace('%&\s*\{[^}]*(\}\s*;?|$)%', '', $string);

  // Defuse all HTML entities
  $string = str_replace('&', '&amp;', $string);
  // Change back only well-formed entities in our whitelist
  // Decimal numeric entities
  $string = preg_replace('/&amp;#([0-9]+;)/', '&#\1', $string);
  // Hexadecimal numeric entities
  $string = preg_replace('/&amp;#[Xx]0*((?:[0-9A-Fa-f]{2})+;)/', '&#x\1', $string);
  // Named entities
  $string = preg_replace('/&amp;([A-Za-z][A-Za-z0-9]*;)/', '&\1', $string);

  return preg_replace_callback('%
    (
    <(?=[^a-zA-Z!/])  # a lone <
    |                 # or
    <!--.*?-->        # a comment
    |                 # or
    <[^>]*(>|$)       # a string that starts with a <, up until the > or the end of the string
    |                 # or
    >                 # just a >
    )%x', '_filter_xss_split', $string);
}
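
The entity handling in the middle of the function can be tried on its own. This is a minimal sketch that lifts that step into a standalone helper; the helper name is hypothetical, and it assumes every ampersand is first rewritten to &amp; before well-formed entities are restored.

```php
<?php
// Hypothetical helper for illustration; isolates the entity step only.
function sketch_normalize_entities($string) {
  // Defuse every ampersand so that no entity survives by default.
  $string = str_replace('&', '&amp;', $string);
  // Restore well-formed decimal numeric entities, e.g. &#169;
  $string = preg_replace('/&amp;#([0-9]+;)/', '&#\1', $string);
  // Restore well-formed hexadecimal numeric entities, e.g. &#xA9;
  $string = preg_replace('/&amp;#[Xx]0*((?:[0-9A-Fa-f]{2})+;)/', '&#x\1', $string);
  // Restore well-formed named entities, e.g. &amp; or &copy;
  $string = preg_replace('/&amp;([A-Za-z][A-Za-z0-9]*;)/', '&\1', $string);
  return $string;
}
```

A bare ampersand or an unterminated entity stays escaped, while complete entities pass through unchanged.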

Comments

There is still a security issue with this function. According to OWASP, the following characters should be escaped in order to prevent an XSS attack:

<?php
& --> &amp;
< --> &lt;
> --> &gt;
" --> &quot;
' --> &#x27;     &apos; is not recommended
/ --> &#x2F;     forward slash is included as it helps end an HTML entity
?>

Source: https://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet#RULE_.232_-_Attribute_Escape_Before_Inserting_Untrusted_Data_into_HTML_Common_Attributes

With filter_xss() the system only escapes "&", "<" and ">", which leaves holes for other kinds of JavaScript that can be sent through a parameter. For example, because double quotes (") pass through unchanged, event handlers such as onclick, among others, can be injected and executed successfully:

http://example.com/?q=" onmouseover=prompt("Example")""
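
The attack above depends on a quote breaking out of an attribute value. Here is a minimal sketch of attribute-context escaping, assuming the value is echoed into a double-quoted attribute. It uses PHP's built-in htmlspecialchars() rather than any Drupal API, and the helper name is hypothetical. Note that htmlspecialchars() does not escape "/" and encodes the single quote as &#039; rather than &#x27;.

```php
<?php
// Hypothetical helper for illustration; wraps PHP's htmlspecialchars().
function sketch_attribute_value($value) {
  // ENT_QUOTES escapes both double and single quotes, in addition to
  // "&", "<" and ">".
  return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// The query value from the example URL above.
$q = '" onmouseover=prompt("Example")""';
// The quotes are encoded, so the value cannot close the attribute early.
$html = '<input type="text" name="q" value="' . sketch_attribute_value($q) . '" />';
```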

For the time being I've hacked the core filter.module, using the same approach it already uses to escape "&". I would like to know whether this is the correct approach and, if so, to see it turned into a patch.

<?php
$string = str_replace('"', '&quot;', $string);
$string = str_replace("'", '&#x27;', $string);
$string = str_replace('/', '&#x2F;', $string);
?>

After reviewing what I did, I had to move my custom code into a separate function, because it was breaking some of the links on my site.
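
The breakage is easy to reproduce. This is a minimal sketch that assumes the three replacements are moved into a standalone helper (the name is hypothetical) and then applied to a string that already contains markup:

```php
<?php
// Hypothetical helper for illustration; holds the three extra replacements.
function sketch_escape_extra($string) {
  $string = str_replace('"', '&quot;', $string);
  $string = str_replace("'", '&#x27;', $string);
  $string = str_replace('/', '&#x2F;', $string);
  return $string;
}

// Applied to finished markup, the helper rewrites the attribute quotes and
// the slashes in both the href and the closing tag, so the link breaks.
$broken = sketch_escape_extra('<a href="/node/1">x</a>');
```

Under these assumptions the helper only suits plain values that are about to be placed into an attribute, not strings that already contain allowed tags.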

I'd like to know what the right approach to this would be in Drupal core, since it's a security matter.