Same name and namespace in other branches
  1. 4.7.x includes/unicode.inc \drupal_xml_parser_create()
  2. 5.x includes/unicode.inc \drupal_xml_parser_create()
  3. 6.x includes/unicode.inc \drupal_xml_parser_create()
  4. 7.x includes/unicode.inc \drupal_xml_parser_create()
  5. 8.9.x core/includes/unicode.inc \drupal_xml_parser_create()

Prepare a new XML parser.

This is a wrapper around xml_parser_create() that first extracts the encoding from the XML data and then sets the parser's output (target) encoding to UTF-8. Use this function instead of calling xml_parser_create() directly, because PHP's XML parser does not check the input encoding itself.
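
A minimal standalone sketch of the two PHP calls this wrapper is built on, assuming PHP 7 or later with the xml extension enabled. The sample document and its ISO-8859-1 encoding are made up for illustration. The source encoding is passed to xml_parser_create(), and XML_OPTION_TARGET_ENCODING makes the parser hand UTF-8 to every handler.

```php
<?php
// Illustrative input: an ISO-8859-1 document containing the byte 0xE9 ("é").
$xml = '<?xml version="1.0" encoding="ISO-8859-1"?><title>caf' . "\xE9" . '</title>';

// Source encoding goes to the constructor; output encoding is set as an option.
$parser = xml_parser_create('ISO-8859-1');
xml_parser_set_option($parser, XML_OPTION_TARGET_ENCODING, 'UTF-8');

$text = '';
// Character data may arrive in several chunks, so append each one.
xml_set_character_data_handler($parser, function ($p, $chunk) use (&$text) {
    $text .= $chunk;
});
xml_parse($parser, $xml, true);

// The Latin-1 byte 0xE9 is delivered as the UTF-8 pair 0xC3 0xA9.
echo bin2hex($text), "\n"; // 636166c3a9
```

Without the target-encoding option, handlers would receive whatever default PHP applies, so setting it explicitly is what guarantees callers always see UTF-8.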

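This is also where encodings that PHP's parser does not support (anything other than UTF-8, ISO-8859-1 and US-ASCII) are converted to UTF-8. Because $data is passed by reference, callers should expect it to have changed after the call: a leading UTF-8 byte order mark is removed, and converted data has its XML declaration rewritten to encoding="utf-8".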
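
To show how $data can change under the caller, here is a small sketch that mirrors only the first two steps of the function (BOM stripping and reading the XML declaration). The helper name sniff_xml_encoding() is hypothetical and not part of Drupal; it assumes PHP 7 or later and uses preg_match() where the original uses ereg().

```php
<?php
// Hypothetical helper, not Drupal API: detect the encoding and strip a BOM.
function sniff_xml_encoding(string &$data): string
{
    // A UTF-8 byte order mark is removed from $data and implies UTF-8.
    if (strncmp($data, "\xEF\xBB\xBF", 3) === 0) {
        $data = substr($data, 3);
        return 'utf-8';
    }
    // Otherwise read encoding="..." from the XML prolog, defaulting to UTF-8.
    if (preg_match('/^<\?xml[^>]+encoding="([^"]+)"/', $data, $match)) {
        return $match[1];
    }
    return 'utf-8';
}

$with_bom = "\xEF\xBB\xBF<rss/>";
$encoding = sniff_xml_encoding($with_bom);
// $encoding is 'utf-8' and $with_bom is now '<rss/>' (three bytes shorter).

$latin = '<?xml version="1.0" encoding="ISO-8859-1"?><rss/>';
$declared = sniff_xml_encoding($latin);
// $declared is 'ISO-8859-1' and $latin is left unchanged.
```

The caller's variable is modified in place, which is why the original documentation warns that $data may differ after the call.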

Parameters

&$data: The XML data which will be parsed later.

Return value

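An XML parser resource whose target encoding is set to UTF-8, or 0 if the data's encoding could not be converted to UTF-8 (in which case a warning is logged via watchdog()).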

1 call to drupal_xml_parser_create()
aggregator_parse_feed in modules/aggregator.module

File

includes/common.inc, line 1639
Common functions that many Drupal modules will need to reference.

Code

function drupal_xml_parser_create(&$data) {

  // Default XML encoding is UTF-8
  $encoding = 'utf-8';
  $bom = false;

  // Check for UTF-8 byte order mark (PHP5's XML parser doesn't handle it).
  if (!strncmp($data, "\xEF\xBB\xBF", 3)) {
    $bom = true;
    $data = substr($data, 3);
  }

  // Check for an encoding declaration in the XML prolog if no BOM was found.
  if (!$bom && preg_match('/^<\?xml[^>]+encoding="(.+?)"/', $data, $match)) {
    $encoding = $match[1];
  }

  // Unsupported encodings are converted here into UTF-8.
  $php_supported = array(
    'utf-8',
    'iso-8859-1',
    'us-ascii',
  );
  if (!in_array(strtolower($encoding), $php_supported)) {
    $out = drupal_convert_to_utf8($data, $encoding);
    if ($out !== false) {
      $encoding = 'utf-8';
      $data = preg_replace('/^(<\?xml[^>]+encoding)="(.+?)"/', '\\1="utf-8"', $out);
    }
    else {
      watchdog('php', t("Could not convert XML encoding '%s' to UTF-8.", array(
        '%s' => theme('placeholder', $encoding),
      )), WATCHDOG_WARNING);
      return 0;
    }
  }
  $xml_parser = xml_parser_create($encoding);
  xml_parser_set_option($xml_parser, XML_OPTION_TARGET_ENCODING, 'utf-8');
  return $xml_parser;
}
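
For readers who want to run the same logic outside Drupal, the following is a hedged, standalone sketch rather than Drupal's implementation. Assumptions: PHP 7 or later with the xml and mbstring extensions; mb_convert_encoding() stands in for drupal_convert_to_utf8(); preg_* replaces ereg*; and the hypothetical sketch_xml_parser_create() returns false instead of 0 and does no watchdog() logging.

```php
<?php
// Hypothetical standalone port; NOT Drupal's code (see assumptions above).
function sketch_xml_parser_create(string &$data)
{
    $encoding = 'utf-8';
    $bom = false;

    // Strip a UTF-8 byte order mark.
    if (strncmp($data, "\xEF\xBB\xBF", 3) === 0) {
        $bom = true;
        $data = substr($data, 3);
    }

    // Without a BOM, honour encoding="..." in the XML declaration.
    if (!$bom && preg_match('/^<\?xml[^>]+encoding="([^"]+)"/', $data, $match)) {
        $encoding = $match[1];
    }

    // Convert encodings PHP's parser cannot read natively, then rewrite the
    // declaration so it no longer names the old encoding.
    if (!in_array(strtolower($encoding), ['utf-8', 'iso-8859-1', 'us-ascii'], true)) {
        try {
            // PHP 7 warns and returns false for unknown encodings.
            $out = @mb_convert_encoding($data, 'UTF-8', $encoding);
        } catch (\ValueError $e) {
            // PHP 8 throws ValueError for unknown encodings instead.
            $out = false;
        }
        if ($out === false) {
            return false;
        }
        $encoding = 'utf-8';
        $data = preg_replace('/^(<\?xml[^>]+encoding)="([^"]+)"/', '$1="utf-8"', $out);
    }

    $parser = xml_parser_create($encoding);
    xml_parser_set_option($parser, XML_OPTION_TARGET_ENCODING, 'utf-8');
    return $parser;
}

// Usage: an ISO-8859-15 feed where byte 0xA4 is the euro sign.
$feed = '<?xml version="1.0" encoding="ISO-8859-15"?><t>' . "\xA4" . '</t>';
$parser = sketch_xml_parser_create($feed);
if ($parser === false) {
    echo "Could not convert the feed to UTF-8\n";
}
// On success, $feed now declares encoding="utf-8" and holds the euro
// sign as the UTF-8 bytes 0xE2 0x82 0xAC.
```

As in the original, callers must check the return value before parsing and must parse the (possibly rewritten) $data, not a copy taken before the call.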