function MailFormatHelper::htmlToText

Same name and namespace in other branches
  1. 9 core/lib/Drupal/Core/Mail/MailFormatHelper.php \Drupal\Core\Mail\MailFormatHelper::htmlToText()
  2. 10 core/lib/Drupal/Core/Mail/MailFormatHelper.php \Drupal\Core\Mail\MailFormatHelper::htmlToText()
  3. 11.x core/lib/Drupal/Core/Mail/MailFormatHelper.php \Drupal\Core\Mail\MailFormatHelper::htmlToText()

Transforms an HTML string into plain text, preserving its structure.

The output will be suitable for use as 'format=flowed; delsp=yes' text (RFC 3676) and can be passed directly to MailManagerInterface::mail() for sending.

We deliberately use LF rather than CRLF; see MailManagerInterface::mail().

This function provides suitable alternatives for the following tags: <a> <em> <i> <strong> <b> <br> <p> <blockquote> <ul> <ol> <li> <dl> <dt> <dd> <h1> <h2> <h3> <h4> <h5> <h6> <hr>
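As a rough illustration of the inline cases, the sketch below applies the same two emphasis patterns that appear in the method body to a sample string. The function name applyInlineStyles() is hypothetical and not part of Drupal, and the real method also filters and normalizes the markup before this step.

```php
<?php

/**
 * Replaces emphasis tags with plain-text markers.
 *
 * Hypothetical helper for illustration only; the two patterns are copied
 * from MailFormatHelper::htmlToText().
 */
function applyInlineStyles(string $html): string {
  // <em> and <i> (opening or closing, with or without attributes) become '/'.
  $html = preg_replace('!</?(em|i)((?> +)[^>]*)?>!i', '/', $html);
  // <strong> and <b> become '*'.
  return preg_replace('!</?(strong|b)((?> +)[^>]*)?>!i', '*', $html);
}

echo applyInlineStyles('<em>Note</em>: <strong class="x">read this</strong>'), "\n";
// Prints: /Note/: *read this*
```

Because the patterns require either a space or the closing bracket directly after the tag name, tags such as <br> or <blockquote> are not caught by the <b> rule.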

Parameters

string $string: The string to be transformed.

array $allowed_tags: (optional) If supplied, a list of tags that will be transformed. If omitted, all supported tags are transformed.
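One way to picture the $allowed_tags handling: the supplied list is intersected with the supported list, so names outside the supported set are dropped, and the result is what Xss::filter() keeps. The following standalone sketch mirrors that step. The names SUPPORTED_TAGS and resolveAllowedTags() are illustrative assumptions, not Drupal API, and array_values() is added here only to make the result easier to read.

```php
<?php

// Same list as static::$supportedTags in the method body.
const SUPPORTED_TAGS = [
  'a', 'em', 'i', 'strong', 'b', 'br', 'p', 'blockquote', 'ul', 'ol', 'li',
  'dl', 'dt', 'dd', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'hr',
];

/**
 * Returns the tags that would be kept for a given $allowed_tags argument.
 *
 * Hypothetical helper for illustration only.
 */
function resolveAllowedTags(?array $allowed_tags): array {
  return isset($allowed_tags)
    // Keep only requested tags that are also supported.
    ? array_values(array_intersect(SUPPORTED_TAGS, $allowed_tags))
    // NULL means "all supported tags".
    : SUPPORTED_TAGS;
}

print_r(resolveAllowedTags(['p', 'script', 'a']));
// The result contains 'a' and 'p' only; 'script' is not a supported tag.
```

Note that an empty array is not the same as omitting the argument: under this logic it yields an empty list, so no tags would be kept.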

Return value

string The transformed string.
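When the input contains links, the returned string ends with a numbered list of their URLs. The sketch below imitates only that step, using the link pattern from the method body and a closure as a stand-in for htmlToMailUrls(). The function name linksToFootnotes() is hypothetical, and the real method additionally wraps the text and may process URLs further, so its exact output can differ.

```php
<?php

/**
 * Replaces links with "text [n]" and appends "[n] url" footnotes.
 *
 * Hypothetical helper for illustration only.
 */
function linksToFootnotes(string $html): string {
  $urls = [];
  $pattern = '@(<a[^>]+?href="([^"]*)"[^>]*?>(.+?)</a>)@i';
  $text = preg_replace_callback($pattern, function (array $match) use (&$urls): string {
    // Remember the URL and leave a numbered marker after the link text.
    $urls[] = $match[2];
    return $match[3] . ' [' . count($urls) . ']';
  }, $html);

  $footnotes = '';
  if (count($urls)) {
    $footnotes .= "\n";
    foreach ($urls as $i => $url) {
      $footnotes .= '[' . ($i + 1) . '] ' . $url . "\n";
    }
  }
  return $text . $footnotes;
}

echo linksToFootnotes('See <a href="https://www.drupal.org">the Drupal site</a>');
// Prints:
// See the Drupal site [1]
// [1] https://www.drupal.org
```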

5 calls to MailFormatHelper::htmlToText()
ContactSitewideTest::testAutoReply in core/modules/contact/tests/src/Functional/ContactSitewideTest.php
Tests auto-reply on the site-wide contact form.
HtmlToTextTest::assertHtmlToText in core/modules/system/tests/src/Functional/Mail/HtmlToTextTest.php
Helper function to test \Drupal\Core\Mail\MailFormatHelper::htmlToText().
HtmlToTextTest::testDrupalHtmlToTextBlockTagToNewline in core/modules/system/tests/src/Functional/Mail/HtmlToTextTest.php
Tests that text separated by block-level tags in HTML is separated by (at least) a newline in the plain-text version.
HtmlToTextTest::testVeryLongLineWrap in core/modules/system/tests/src/Functional/Mail/HtmlToTextTest.php
Tests \Drupal\Core\Mail\MailFormatHelper::htmlToText() wrapping.
PhpMail::format in core/lib/Drupal/Core/Mail/Plugin/Mail/PhpMail.php
Concatenates and wraps the email body for plain-text mails.

File

core/lib/Drupal/Core/Mail/MailFormatHelper.php, line 103

Class

MailFormatHelper
Defines a class containing utility methods for formatting mail messages.

Namespace

Drupal\Core\Mail

Code

public static function htmlToText($string, $allowed_tags = NULL) {
    // Cache list of supported tags.
    if (empty(static::$supportedTags)) {
        static::$supportedTags = [
            'a',
            'em',
            'i',
            'strong',
            'b',
            'br',
            'p',
            'blockquote',
            'ul',
            'ol',
            'li',
            'dl',
            'dt',
            'dd',
            'h1',
            'h2',
            'h3',
            'h4',
            'h5',
            'h6',
            'hr',
        ];
    }
    // Make sure only supported tags are kept.
    $allowed_tags = isset($allowed_tags) ? array_intersect(static::$supportedTags, $allowed_tags) : static::$supportedTags;
    // Make sure tags, entities and attributes are well-formed and properly
    // nested.
    $string = Html::normalize(Xss::filter($string, $allowed_tags));
    // Apply inline styles.
    $string = preg_replace('!</?(em|i)((?> +)[^>]*)?>!i', '/', $string);
    $string = preg_replace('!</?(strong|b)((?> +)[^>]*)?>!i', '*', $string);
    // Replace inline <a> tags with the text of link and a footnote.
    // 'See <a href="https://www.drupal.org">the Drupal site</a>' becomes
    // 'See the Drupal site [1]' with the URL included as a footnote.
    static::htmlToMailUrls(NULL, TRUE);
    $pattern = '@(<a[^>]+?href="([^"]*)"[^>]*?>(.+?)</a>)@i';
    $string = preg_replace_callback($pattern, [static::class, 'htmlToMailUrls'], $string);
    $urls = static::htmlToMailUrls();
    $footnotes = '';
    if (count($urls)) {
        $footnotes .= "\n";
        for ($i = 0, $max = count($urls); $i < $max; $i++) {
            $footnotes .= '[' . ($i + 1) . '] ' . $urls[$i] . "\n";
        }
    }
    // Split tags from text.
    $split = preg_split('/<([^>]+?)>/', $string, -1, PREG_SPLIT_DELIM_CAPTURE);
    // Note: preg_split() returns an array of alternating literals and
    // delimiters that begins and ends with a literal (inserting empty
    // strings as required).
    // Odd/even counter (tag or no tag).
    $tag = FALSE;
    // Case conversion function.
    $casing = NULL;
    $output = '';
    // All current indentation string chunks.
    $indent = [];
    // Array of counters for opened lists.
    $lists = [];
    foreach ($split as $value) {
        // Holds a string ready to be formatted and output.
        $chunk = NULL;
        // Process HTML tags (but don't output any literally).
        if ($tag) {
            list($tagname) = explode(' ', strtolower($value), 2);
            switch ($tagname) {
                // List counters.
                case 'ul':
                    array_unshift($lists, '*');
                    break;
                case 'ol':
                    array_unshift($lists, 1);
                    break;
                case '/ul':
                case '/ol':
                    array_shift($lists);
                    // Ensure blank new-line.
                    $chunk = '';
                    break;
                // Quotation/list markers, non-fancy headers.
                case 'blockquote':
                    // Format=flowed indentation cannot be mixed with lists.
                    $indent[] = count($lists) ? ' "' : '>';
                    break;
                case 'li':
                    $indent[] = isset($lists[0]) && is_numeric($lists[0]) ? ' ' . $lists[0]++ . ') ' : ' * ';
                    break;
                case 'dd':
                    $indent[] = '    ';
                    break;
                case 'h3':
                    $indent[] = '.... ';
                    break;
                case 'h4':
                    $indent[] = '.. ';
                    break;
                case '/blockquote':
                    if (count($lists)) {
                        // Append closing quote for inline quotes (immediately).
                        $output = rtrim($output, "> \n") . "\"\n";
                        // Ensure blank new-line.
                        $chunk = '';
                    }
                // Intentional fall-through to the processing for '/li' and '/dd'.
                case '/li':
                case '/dd':
                    array_pop($indent);
                    break;
                case '/h3':
                case '/h4':
                    array_pop($indent);
                // Intentional fall-through to the processing for '/h5' and '/h6'.
                case '/h5':
                case '/h6':
                    // Ensure blank new-line.
                    $chunk = '';
                    break;
                // Fancy headers.
                case 'h1':
                    $indent[] = '======== ';
                    $casing = 'mb_strtoupper';
                    break;
                case 'h2':
                    $indent[] = '-------- ';
                    $casing = 'mb_strtoupper';
                    break;
                case '/h1':
                case '/h2':
                    $casing = NULL;
                    // Pad the line with '=' (h1) or '-' (h2) characters.
                    $output = static::htmlToTextPad($output, $tagname == '/h1' ? '=' : '-', ' ');
                    array_pop($indent);
                    // Ensure blank new-line.
                    $chunk = '';
                    break;
                // Horizontal rulers.
                case 'hr':
                    // Insert immediately.
                    $output .= static::wrapMail('', implode('', $indent)) . "\n";
                    $output = static::htmlToTextPad($output, '-');
                    break;
                // Paragraphs and definition lists.
                case '/p':
                case '/dl':
                    // Ensure blank new-line.
                    $chunk = '';
                    break;
            }
        }
        else {
            // Convert inline HTML text to plain text; not removing line-breaks or
            // white-space, since that breaks newlines when sanitizing plain-text.
            $value = trim(Html::decodeEntities($value));
            if (mb_strlen($value)) {
                $chunk = $value;
            }
        }
        // See if there is something waiting to be output.
        if (isset($chunk)) {
            // Apply any necessary case conversion.
            if (isset($casing)) {
                $chunk = call_user_func($casing, $chunk);
            }
            $line_endings = Settings::get('mail_line_endings', PHP_EOL);
            // Format it and apply the current indentation.
            $output .= static::wrapMail($chunk, implode('', $indent)) . $line_endings;
            // Remove non-quotation markers from indentation.
            $indent = array_map('\\Drupal\\Core\\Mail\\MailFormatHelper::htmlToTextClean', $indent);
        }
        $tag = !$tag;
    }
    return $output . $footnotes;
}
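
The main loop above relies on preg_split() with PREG_SPLIT_DELIM_CAPTURE returning text and tag names in strict alternation, which is why a simple boolean toggle is enough to tell them apart. A minimal standalone sketch of that behaviour follows; splitTagsFromText() is a hypothetical name used only for this example.

```php
<?php

/**
 * Splits markup into alternating text and tag tokens.
 *
 * Hypothetical helper for illustration only; the pattern and flags are the
 * ones used in MailFormatHelper::htmlToText().
 */
function splitTagsFromText(string $html): array {
  return preg_split('/<([^>]+?)>/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
}

// Even positions hold text (possibly empty), odd positions hold tag contents.
$is_tag = FALSE;
foreach (splitTagsFromText('<p>Hello <br>world</p>') as $value) {
  printf("%s '%s'\n", $is_tag ? 'tag: ' : 'text:', $value);
  $is_tag = !$is_tag;
}
```

For the input '<p>Hello</p>' the split yields ['', 'p', 'Hello', '/p', '']: the empty strings at both ends keep the text/tag alternation intact even though the markup starts and ends with a tag.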
