Same name and namespace in other branches
  1. 4.6.x includes/common.inc \drupal_xml_parser_create()
  2. 4.7.x includes/unicode.inc \drupal_xml_parser_create()
  3. 5.x includes/unicode.inc \drupal_xml_parser_create()
  4. 6.x includes/unicode.inc \drupal_xml_parser_create()
  5. 8.9.x core/includes/unicode.inc \drupal_xml_parser_create()

Prepares a new XML parser.

This is a wrapper around xml_parser_create() that first extracts the encoding from the XML data and then sets the output encoding to UTF-8. Use it instead of calling xml_parser_create() directly, because PHP 4's XML parser doesn't check the input encoding itself. As the PHP manual puts it: "Starting from PHP 5, the input encoding is automatically detected, so that the encoding parameter specifies only the output encoding."

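This is also where unsupported encodings are converted. Callers should take this into account: $data is passed by reference and may be modified by the call (a UTF-8 byte order mark is stripped, and data in an unsupported encoding is converted to UTF-8 with its encoding declaration rewritten).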
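To make the by-reference behavior concrete, here is a minimal sketch of one of the in-place changes, the removal of a UTF-8 byte order mark. The helper example_strip_bom() is a hypothetical stand-in written for illustration; it is not part of Drupal and covers only the BOM step, not the encoding conversion.

```php
<?php
/**
 * Hypothetical stand-in (not a Drupal function): strips a UTF-8 BOM in place.
 *
 * Mirrors only the BOM handling; the caller's variable is modified
 * because $data is taken by reference.
 */
function example_strip_bom(&$data) {
  // strncmp() returns 0 when the first three bytes equal the BOM.
  if (!strncmp($data, "\xEF\xBB\xBF", 3)) {
    $data = substr($data, 3);
    return TRUE;
  }
  return FALSE;
}

// Sample document that starts with a UTF-8 byte order mark.
$data = "\xEF\xBB\xBF" . '<?xml version="1.0"?><feed/>';
$original_length = strlen($data);

// After this call, $data itself no longer contains the BOM.
$had_bom = example_strip_bom($data);
```

The practical consequence is that a caller should pass a variable (not a literal or expression) and hand that same variable to xml_parse() afterwards, rather than a copy taken before the call.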

Parameters

$data: The XML data which will be parsed later.

Return value

An XML parser object or FALSE on error.

2 calls to drupal_xml_parser_create()
aggregator_parse_feed in modules/aggregator/aggregator.parser.inc
Parses a feed and stores its items.
_aggregator_parse_opml in modules/aggregator/aggregator.admin.inc
Parses an OPML file.

File

includes/unicode.inc, line 189
Provides Unicode-related conversions and operations.

Code

function drupal_xml_parser_create(&$data) {

  // Default XML encoding is UTF-8
  $encoding = 'utf-8';
  $bom = FALSE;

  // Check for UTF-8 byte order mark (PHP5's XML parser doesn't handle it).
  if (!strncmp($data, "\xEF\xBB\xBF", 3)) {
    $bom = TRUE;
    $data = substr($data, 3);
  }

  // Check for an encoding declaration in the XML prolog if no BOM was found.
  if (!$bom && preg_match('/^<\\?xml[^>]+encoding="(.+?)"/', $data, $match)) {
    $encoding = $match[1];
  }

  // Unsupported encodings are converted here into UTF-8.
  $php_supported = array(
    'utf-8',
    'iso-8859-1',
    'us-ascii',
  );
  if (!in_array(strtolower($encoding), $php_supported)) {
    $out = drupal_convert_to_utf8($data, $encoding);
    if ($out !== FALSE) {
      $encoding = 'utf-8';
      $data = preg_replace('/^(<\\?xml[^>]+encoding)="(.+?)"/', '\\1="utf-8"', $out);
    }
    else {
      watchdog('php', 'Could not convert XML encoding %s to UTF-8.', array(
        '%s' => $encoding,
      ), WATCHDOG_WARNING);
      return FALSE;
    }
  }
  $xml_parser = xml_parser_create($encoding);
  xml_parser_set_option($xml_parser, XML_OPTION_TARGET_ENCODING, 'utf-8');
  return $xml_parser;
}
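
The detection order used above (byte order mark first, then the encoding declaration, then the UTF-8 default) can be tried outside Drupal with a small sketch. Both helpers below, example_detect_xml_encoding() and example_needs_conversion(), are hypothetical names introduced for illustration; they reuse the same regular expression and the same list of natively supported encodings, but perform no conversion.

```php
<?php
/**
 * Hypothetical helper (not part of Drupal): reports the encoding that the
 * detection steps would settle on before any conversion takes place.
 */
function example_detect_xml_encoding($data) {
  // A UTF-8 byte order mark takes precedence over the XML declaration.
  if (!strncmp($data, "\xEF\xBB\xBF", 3)) {
    return 'utf-8';
  }
  // Otherwise look for encoding="..." in the XML declaration.
  if (preg_match('/^<\?xml[^>]+encoding="(.+?)"/', $data, $match)) {
    return strtolower($match[1]);
  }
  // No BOM and no declaration: fall back to the XML default.
  return 'utf-8';
}

/**
 * Hypothetical helper: TRUE if the encoding is outside the list that the
 * function above passes straight to xml_parser_create().
 */
function example_needs_conversion($encoding) {
  $php_supported = array('utf-8', 'iso-8859-1', 'us-ascii');
  return !in_array(strtolower($encoding), $php_supported);
}

// Three sample inputs: declared encoding, BOM plus declaration, no prolog.
$samples = array(
  'declared' => '<?xml version="1.0" encoding="Windows-1252"?><a/>',
  'bom' => "\xEF\xBB\xBF" . '<?xml version="1.0" encoding="iso-8859-1"?><a/>',
  'bare' => '<a/>',
);
$detected = array_map('example_detect_xml_encoding', $samples);
```

Note that the pattern only matches a declaration written with double quotes; a prolog using encoding='...' with single quotes falls through to the UTF-8 default in this sketch, just as it would in the function above.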