PHP htmlentities() Function

Usage — The PHP htmlentities() function is used to convert all applicable characters to HTML entities.

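In PHP 5.6 through 8.0, it has the following syntax:

string htmlentities ( string $string [, int $flags = ENT_COMPAT | ENT_HTML401 [, string $encoding = ini_get("default_charset") [, bool $double_encode = true ]]] )

As of PHP 8.1.0, the signature is:

htmlentities(string $string, int $flags = ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401, ?string $encoding = null, bool $double_encode = true): string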

Here is an example of using htmlentities():

<?php
$para = '<p class="example">Everything is looking good.</p>';

// Output — &lt;p class=&quot;example&quot;&gt;Everything is looking good.&lt;/p&gt;
echo htmlentities($para);
?>

Return Value — This function returns the encoded string. If the input string contains an invalid code unit sequence in the given encoding, an empty string is returned instead, unless the ENT_IGNORE or ENT_SUBSTITUTE flag is set.

Additional Information — If you want to convert HTML entities back to characters, use the html_entity_decode() function. You can use the get_html_translation_table() function to return the translation table used by htmlentities().
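
As a small illustration of the translation table, the sketch below fetches the table for UTF-8 input with ENT_QUOTES | ENT_HTML401 and applies it with strtr(). It assumes the script file itself is saved as UTF-8 (so the literal é is UTF-8 bytes); the variable names are only for illustration.

```php
<?php
// Table of character => entity that htmlentities() uses for these flags and this encoding.
$table = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES | ENT_HTML401, 'UTF-8');

echo $table['<'], "\n"; // &lt;
echo $table['é'], "\n"; // &eacute;
echo $table["'"], "\n"; // &#039; (this key exists because ENT_QUOTES is set)

// For this input, applying the table with strtr() matches htmlentities() with the same settings.
$input = '<b>café</b>';
$viaTable = strtr($input, $table);
$viaFunction = htmlentities($input, ENT_QUOTES | ENT_HTML401, 'UTF-8');
var_dump($viaTable === $viaFunction); // bool(true)
```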

The double_encode parameter was added in PHP 5.2.3. The doctype flags were added in PHP 5.4.0, and in the same version the default value of the encoding parameter was changed to UTF-8. In PHP 5.6.0, the default value of the encoding parameter was changed to the value of the default_charset configuration option. In PHP 8.0.0, the encoding parameter became nullable, and in PHP 8.1.0 the default value of flags was changed from ENT_COMPAT | ENT_HTML401 to ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401.

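This function is identical to htmlspecialchars() in all respects except one: htmlentities() converts every character that has an HTML character entity equivalent into that entity, while htmlspecialchars() converts only a small set of special characters (&, <, > and, depending on the flags, quotes).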
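
The difference is easiest to see side by side. This sketch assumes a UTF-8 source file and passes the flags and encoding explicitly so the result does not depend on the PHP version's defaults.

```php
<?php
$text = 'Café & <crème>';

// htmlspecialchars() only converts &, <, > and (with ENT_QUOTES) both quote characters.
$special = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

// htmlentities() also converts characters that have named entities, such as é and è.
$entities = htmlentities($text, ENT_QUOTES, 'UTF-8');

echo $special, "\n";  // Café &amp; &lt;crème&gt;
echo $entities, "\n"; // Caf&eacute; &amp; &lt;cr&egrave;me&gt;
```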

To decode a string that was encoded with htmlentities(), pass the same flags and encoding values to html_entity_decode().
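
A minimal round-trip sketch, assuming valid UTF-8 input: encode with htmlentities() and decode with html_entity_decode() using identical flags and encoding, and the original string comes back.

```php
<?php
$original = "Tom & Jerry's \"café\" <show>";

// Use the same flags and encoding in both directions.
$flags = ENT_QUOTES | ENT_HTML401;
$encoded = htmlentities($original, $flags, 'UTF-8');
$decoded = html_entity_decode($encoded, $flags, 'UTF-8');

echo $encoded, "\n"; // Tom &amp; Jerry&#039;s &quot;caf&eacute;&quot; &lt;show&gt;
var_dump($decoded === $original); // bool(true)
```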

PHP Version and Changelog — The htmlentities() function is available in PHP 4, PHP 5, PHP 7 and PHP 8.

Relevant Functions — Other related PHP functions that you should know about are:

  • html_entity_decode() - converts HTML entities to their corresponding characters.
  • get_html_translation_table() - returns the translation table used by htmlspecialchars() and htmlentities().
  • htmlspecialchars() - converts special characters to HTML entities.
  • nl2br() - inserts HTML line breaks before all newlines in a string.
  • urlencode() - URL-encodes a string.
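
Two of these functions are often combined with htmlentities(). The sketch below shows one common ordering, under the assumption that the text is UTF-8: escape first and then call nl2br(), so the <br /> tags it inserts are not escaped; and URL-encode a query value before entity-encoding the whole URL for an attribute. The page name search.php is a placeholder.

```php
<?php
$comment = "First line <b>\nSecond line";

// Escape the user text first, then let nl2br() insert <br /> before each newline.
$html = nl2br(htmlentities($comment, ENT_QUOTES, 'UTF-8'));
echo $html, "\n"; // First line &lt;b&gt;<br /> followed by the newline and "Second line"

// URL-encode the query value, then entity-encode the full URL for the href attribute.
$query = 'fish & chips';
$href = htmlentities('search.php?q=' . urlencode($query) . '&page=2', ENT_QUOTES, 'UTF-8');
echo '<a href="' . $href . '">Search</a>', "\n";
// <a href="search.php?q=fish+%26+chips&amp;page=2">Search</a>
```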

Parameters

string

The string parameter is used to specify the input string that needs to be encoded. This is a required parameter.

flags

The flags parameter is a bitmask of one or more of the following flags, which determine how quotes, invalid code unit sequences and the document type are handled. This parameter is optional; its default value is ENT_COMPAT | ENT_HTML401 before PHP 8.1.0 and ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401 as of PHP 8.1.0.

The following quote style flags are available:

  • ENT_COMPAT - This will convert double-quotes and leave single-quotes alone.
  • ENT_QUOTES - This will convert both double and single quotes.
  • ENT_NOQUOTES - This will leave both double and single quotes unconverted.
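
The three quote styles can be compared on a string that contains both kinds of quote. This is a sketch assuming UTF-8 input and the default HTML 4.01 doctype (no doctype flag set).

```php
<?php
$s = 'She said "it\'s fine"';

$compat   = htmlentities($s, ENT_COMPAT, 'UTF-8');   // She said &quot;it's fine&quot;
$quotes   = htmlentities($s, ENT_QUOTES, 'UTF-8');   // She said &quot;it&#039;s fine&quot;
$noquotes = htmlentities($s, ENT_NOQUOTES, 'UTF-8'); // She said "it's fine"

echo $compat, "\n", $quotes, "\n", $noquotes, "\n";
```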

Invalid encoding is handled using the following flags:

  • ENT_IGNORE - Silently discards invalid code unit sequences instead of returning an empty string. Avoid this flag, as it may have security implications.
  • ENT_SUBSTITUTE - Replaces invalid code unit sequences with a Unicode Replacement Character U+FFFD (UTF-8) or &#xFFFD; (other encodings) instead of returning an empty string.
  • ENT_DISALLOWED - Replaces code points that are invalid in the specified doctype with a Unicode Replacement Character U+FFFD (UTF-8) or &#xFFFD; (other encodings) instead of leaving them as is.
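
The effect of ENT_IGNORE and ENT_SUBSTITUTE on malformed input can be sketched with a string containing a lone UTF-8 continuation byte. The flags are passed explicitly because PHP 8.1 and later include ENT_SUBSTITUTE by default; the \u{FFFD} escape assumes PHP 7.0 or later.

```php
<?php
// "\x80" is a continuation byte with no lead byte, so this is not valid UTF-8.
$bad = "abc\x80def";

$strict     = htmlentities($bad, ENT_QUOTES, 'UTF-8');                  // "" (empty string)
$ignored    = htmlentities($bad, ENT_QUOTES | ENT_IGNORE, 'UTF-8');     // "abcdef"
$substitute = htmlentities($bad, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8'); // "abc", U+FFFD, "def"

var_dump($strict, $ignored, $substitute === "abc\u{FFFD}def");
```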

The following additional flags specify the document type:

  • ENT_HTML401 - Default. Handle code as HTML 4.01
  • ENT_HTML5 - Handle code as HTML 5
  • ENT_XML1 - Handle code as XML 1
  • ENT_XHTML - Handle code as XHTML
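
One visible effect of the doctype flag is how a single quote is written when ENT_QUOTES is set. The sketch below uses a string of plain letters plus an apostrophe, so the only difference between the outputs is that entity.

```php
<?php
$s = "It's";

$html401 = htmlentities($s, ENT_QUOTES | ENT_HTML401, 'UTF-8'); // It&#039;s
$html5   = htmlentities($s, ENT_QUOTES | ENT_HTML5, 'UTF-8');   // It&apos;s
$xml1    = htmlentities($s, ENT_QUOTES | ENT_XML1, 'UTF-8');    // It&apos;s
$xhtml   = htmlentities($s, ENT_QUOTES | ENT_XHTML, 'UTF-8');   // It&apos;s

echo $html401, "\n", $html5, "\n", $xml1, "\n", $xhtml, "\n";
```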

encoding

The encoding parameter is an optional argument that defines the character encoding used when converting characters. Allowed values are:

  • UTF-8 - ASCII-compatible multi-byte 8-bit Unicode
  • ISO-8859-1 - Western European
  • ISO-8859-15 - Western European (adds the Euro sign and the French and Finnish letters missing in ISO-8859-1)
  • cp866 - DOS-specific Cyrillic charset
  • cp1251 - Windows-specific Cyrillic charset
  • cp1252 - Windows-specific charset for Western European
  • KOI8-R - Russian
  • BIG5 - Traditional Chinese, mainly used in Taiwan
  • GB2312 - Simplified Chinese, national standard character set
  • BIG5-HKSCS - Big5 with Hong Kong extensions
  • Shift_JIS - Japanese
  • EUC-JP - Japanese
  • MacRoman - Character set that was used by Mac OS

Please note that unrecognized character sets are ignored and replaced by ISO-8859-1 in versions prior to PHP 5.4. As of PHP 5.4, they are ignored and replaced by UTF-8.

If omitted, the default value of the encoding varies depending on the PHP version in use. PHP 5.4 and 5.5 will use UTF-8 as the default. Earlier versions of PHP use ISO-8859-1. In PHP 5.6 and later, the default_charset configuration option is used as the default value.

Even though this argument is technically optional, you are strongly encouraged to specify the correct value if you are using PHP 5.5 or earlier, or if the default_charset configuration option may be set incorrectly for the given input.
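
To see why the encoding matters, the sketch below builds the word "café" as ISO-8859-1 bytes (é is the single byte 0xE9) and encodes it twice: once with the correct encoding and once while wrongly claiming UTF-8.

```php
<?php
// "café" in ISO-8859-1: é is the single byte 0xE9.
$latin1 = "caf\xE9";

// Declaring the real encoding produces the expected named entity.
$right = htmlentities($latin1, ENT_QUOTES, 'ISO-8859-1'); // caf&eacute;

// Claiming UTF-8 turns 0xE9 into an incomplete multi-byte sequence, so the result is "".
$wrong = htmlentities($latin1, ENT_QUOTES, 'UTF-8');

// The configured default_charset is what PHP 5.6+ falls back to when encoding is omitted.
echo ini_get('default_charset'), "\n";
```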

double_encode

The double_encode parameter is optional and controls whether existing HTML entities are encoded again. The default value, true, converts everything, including the & that starts an existing entity; when it is set to false, existing entities are left as they are.
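
A short sketch of both settings on a string that already contains entities, assuming UTF-8 input and explicit flags. Note that a bare & that does not start an entity is still converted when double_encode is false.

```php
<?php
$s = 'Fish &amp; chips &copy; & more';

// Default (true): the & of every existing entity is encoded again.
$twice = htmlentities($s, ENT_QUOTES, 'UTF-8');        // Fish &amp;amp; chips &amp;copy; &amp; more

// false: existing entities are kept, only the bare & is converted.
$once  = htmlentities($s, ENT_QUOTES, 'UTF-8', false); // Fish &amp; chips &copy; &amp; more

echo $twice, "\n", $once, "\n";
```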

Working Examples

Here are some examples of using the htmlentities() function:

An htmlentities() example

<?php
$str = "A 'quote' is <b>bold</b>";

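// Output (PHP 8.0 and earlier, default flags ENT_COMPAT | ENT_HTML401): A 'quote' is &lt;b&gt;bold&lt;/b&gt;
// Output (PHP 8.1 and later, default flags include ENT_QUOTES): A &#039;quote&#039; is &lt;b&gt;bold&lt;/b&gt;
echo htmlentities($str);

// Output: A &#039;quote&#039; is &lt;b&gt;bold&lt;/b&gt;
echo htmlentities($str, ENT_QUOTES);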
?>

Additional Tips

Here are some of the most upvoted tips taken from the comment section of the PHP manual:

  1. An important note about using this function to protect your application against Cross-Site Scripting (XSS) vulnerabilities.

    When printing user input in an attribute of an HTML tag, the default configuration of htmlentities() before PHP 8.1 doesn’t protect you against XSS when single quotes delimit the attribute value. XSS is then possible by injecting a single quote:

    <?php
    $_GET['a'] = "#000' onload='alert(document.cookie)";
    ?>

    XSS possible (insecure):

    <?php
    $href = htmlentities($_GET['a']);
    print "<body bgcolor='$href'>"; # with the pre-8.1 defaults, results in: <body bgcolor='#000' onload='alert(document.cookie)'>
    ?>

    Use the ENT_QUOTES flag so that single quotes are encoded as well and cannot break out of the attribute value:

    <?php
    $href = htmlentities($_GET['a'], ENT_QUOTES);
    print "<body bgcolor='$href'>"; # results in: <body bgcolor='#000&#039; onload=&#039;alert(document.cookie)'>
    ?>

    The ENT_QUOTES flag doesn’t protect you against JavaScript evaluation in certain attributes, such as the href attribute of an a tag. When the link below is clicked, the given JavaScript is executed:

    <?php
    $_GET['a'] = 'javascript:alert(document.cookie)';
    $href = htmlentities($_GET['a'], ENT_QUOTES);
    print "<a href='$href'>link</a>"; # results in: <a href='javascript:alert(document.cookie)'>link</a>
    ?>

    Suggested by - Sijmen Ruwhof
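
The tip above can be turned into a pair of small helpers. This is only a sketch under stated assumptions: escape_attr() and safe_href() are hypothetical names, input is assumed to be UTF-8, and safe_href() deliberately accepts only absolute http:// or https:// URLs (relative URLs are also rejected), replacing anything else with "#". It is not a complete XSS defence.

```php
<?php
// Hypothetical helper: escape a value for use inside a quoted HTML attribute.
function escape_attr(string $value): string
{
    // ENT_QUOTES covers single-quoted attributes; ENT_SUBSTITUTE avoids the
    // empty-string result on invalid UTF-8.
    return htmlentities($value, ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401, 'UTF-8');
}

// Hypothetical helper: allow only absolute http(s) URLs in href, anything else becomes "#".
function safe_href(string $url): string
{
    if (preg_match('~\Ahttps?://~i', $url) !== 1) {
        return '#';
    }
    return escape_attr($url);
}

$color = "#000' onload='alert(document.cookie)";
echo "<body bgcolor='" . escape_attr($color) . "'>\n";
// <body bgcolor='#000&#039; onload=&#039;alert(document.cookie)'>

echo "<a href='" . safe_href('javascript:alert(document.cookie)') . "'>link</a>\n";
// <a href='#'>link</a>
```

Checking the URL scheme against an allowlist, rather than trying to detect javascript: URLs, avoids having to anticipate every way a browser might interpret a scheme.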

Further Reading

  1. You can read more about the PHP htmlentities() function on PHP.net.