
I want to convert this to

I have tried:


This returns the same string I entered, with the @ symbol converted to %40.

I also tried:


This just returns the same string unchanged.

I am using the UTF-8 charset; I'm not sure whether that makes a difference.



Here it is (it assumes UTF-8 input, but that is trivial to change):

function encode($str) {
    // Convert to UTF-32 so every character occupies exactly 4 bytes (big endian)
    $str = mb_convert_encoding($str, 'UTF-32', 'UTF-8');
    $split = str_split($str, 4);

    $res = "";
    foreach ($split as $c) {
        // Rebuild the 32-bit code point from its 4 big-endian bytes
        $cur = 0;
        for ($i = 0; $i < 4; $i++) {
            $cur |= ord($c[$i]) << (8 * (3 - $i));
        }
        $res .= "&#" . $cur . ";";
    }
    return $res;
}

EDIT: A recommended alternative using unpack():

function encode2($str) {
    $str = mb_convert_encoding($str, 'UTF-32', 'UTF-8');
    // "N*" reads the whole string as unsigned 32-bit big-endian integers
    $t = unpack("N*", $str);
    $t = array_map(function($n) { return "&#$n;"; }, $t);
    return implode("", $t);
}
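On PHP 7.4 or later, the same kind of output can probably be produced without handling UTF-32 bytes at all. The following is a minimal sketch, assuming the mbstring extension; `encode_chars()` is a hypothetical name. It splits the string into characters with `mb_str_split()` and reads each code point with `mb_ord()`.

```php
<?php
// Hypothetical helper; assumes PHP 7.4+ with the mbstring extension.
function encode_chars(string $str): string
{
    $out = '';
    // Split the UTF-8 string into individual characters (not bytes)
    foreach (mb_str_split($str, 1, 'UTF-8') as $char) {
        // mb_ord() returns the Unicode code point of one character
        $out .= '&#' . mb_ord($char, 'UTF-8') . ';';
    }
    return $out;
}

echo encode_chars('é@'), "\n"; // &#233;&#64;
```

Because it works on characters rather than 4-byte chunks, this version does not rely on the byte order of mbstring's UTF-32 output.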
Sunday, August 14, 2022

This works for me for decoding entities to UTF-8:

html_entity_decode($str, ENT_QUOTES | ENT_HTML5, 'UTF-8');

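Edit: The "trick" is the combination of flags in the second parameter together with the encoding in the third parameter. If you just called html_entity_decode($str), the result might not be UTF-8 (before PHP 5.4 the default charset was ISO-8859-1), and the default flags do not decode every entity; &apos;, for example, is not part of the HTML 4.01 entity table.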
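A small sketch of the call above, using an arbitrary sample string (chosen only for illustration) that mixes named entities, the HTML5-only `&apos;`, and decimal and hexadecimal numeric entities:

```php
<?php
// Sample input (illustrative): named, HTML5-only, decimal and hex entities
$str = '&eacute;t&eacute; &apos;&#64;&#x263A;';

$decoded = html_entity_decode($str, ENT_QUOTES | ENT_HTML5, 'UTF-8');
echo $decoded, "\n"; // été '@☺
```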

Thursday, October 6, 2022


This answer is meant for situations where it's not possible to run or install the 'intl' extension. It only sorts strings by replacing accented characters with their non-accented equivalents. To sort accented strings according to a specific locale, a Collator is a better approach; see the other answer to this question for more information.

Sorting by non-accented characters in PHP 5.2

You may try converting both strings to ASCII using iconv() with the //TRANSLIT option to get rid of accented characters:

$str1 = iconv('utf-8', 'ascii//TRANSLIT', $str1);

Then compare the converted strings.
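The comparison step might look like the following sketch. `compare_transliterated()` is a hypothetical helper name, and the //TRANSLIT output it relies on depends on the iconv implementation and the current locale. It can be passed to usort() as a comparator.

```php
<?php
// Hypothetical helper: compare two UTF-8 strings by their ASCII transliterations.
// Assumes the iconv extension; //TRANSLIT results vary with the iconv
// implementation and the LC_CTYPE locale.
function compare_transliterated(string $a, string $b): int
{
    $asciiA = (string) iconv('UTF-8', 'ASCII//TRANSLIT', $a);
    $asciiB = (string) iconv('UTF-8', 'ASCII//TRANSLIT', $b);
    // Case-insensitive byte comparison of the ASCII forms
    return strcasecmp($asciiA, $asciiB);
}

$list = array('zoey', 'Amber', 'Mia');
usort($list, 'compare_transliterated');
print_r($list); // order: Amber, Mia, zoey
```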

See the documentation here:

[Updated in response to @Esailija's remark] I overlooked the problem of //TRANSLIT translating accented characters in unexpected ways. This problem is discussed in the question "php iconv translit for removing accents: not working as excepted?"

To make the iconv() approach work, I've added a code sample below that uses preg_replace() to strip everything except word characters and whitespace from the transliterated string.


setlocale(LC_ALL, 'fr_FR'); // //TRANSLIT output depends on the current locale

$names = array(
   'Zoey and another (word) ',
   'Émilie and another word',
   'Amber',
);

$converted = array();

foreach ($names as $name) {
    // Transliterate to ASCII, then drop everything except word characters and whitespace
    $converted[] = preg_replace('#[^\w\s]+#', '', iconv('UTF-8', 'ASCII//TRANSLIT', $name));
}

sort($converted);

echo '<pre>'; print_r($converted);

// Array
// (
//     [0] => Amber
//     [1] => Emilie and another word
//     [2] => Zoey and another word 
// )
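The sample above sorts the converted strings, so the accents are lost in the result. If the original names should be kept, one option is to use the converted strings only as sort keys. This is a sketch under the same iconv caveats as above; `sort_by_ascii_key()` is a hypothetical helper, and lowercasing the keys for a case-insensitive order is an added assumption.

```php
<?php
// Hypothetical helper: sort the original strings, using ASCII-transliterated,
// punctuation-stripped, lowercased copies only as sort keys.
// Assumes the iconv extension; //TRANSLIT output depends on the iconv
// implementation and the LC_CTYPE locale.
function sort_by_ascii_key(array $names): array
{
    $keys = array();
    foreach ($names as $name) {
        $ascii = (string) iconv('UTF-8', 'ASCII//TRANSLIT', $name);
        $keys[] = strtolower(preg_replace('#[^\w\s]+#', '', $ascii));
    }
    // Sort $keys as strings and reorder $names in the same way
    array_multisort($keys, SORT_ASC, SORT_STRING, $names);
    return $names;
}

setlocale(LC_ALL, 'fr_FR.UTF-8'); // assumption: this locale is installed
print_r(sort_by_ascii_key(array('Zoey and another (word) ', 'Émilie and another word', 'Amber')));
```

array_multisort() reorders every array passed to it according to the first one, so $names ends up in the order of its ASCII keys while keeping its original spelling.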
Wednesday, November 23, 2022

Use mb_encode_numericentity:

// Encode every code point from U+0080 to U+FFFF as a decimal numeric entity
$convmap = array(0x80, 0xffff, 0, 0xffff);
echo mb_encode_numericentity($utf8Str, $convmap, 'UTF-8');
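To see what the $convmap above covers, the following sketch (sample string chosen for illustration) encodes é, leaves ASCII such as @ untouched, and shows that a character outside the Basic Multilingual Plane, like 😀 (U+1F600), is only encoded once both the upper bound and the mask are widened. The wider map is an added assumption, not part of the original answer.

```php
<?php
// U+0080..U+FFFF only, as in the answer's $convmap
$bmpMap = array(0x80, 0xffff, 0, 0xffff);
// Wider map (assumption): U+0080..U+10FFFF, with a mask large enough
// that supplementary code points are not truncated
$fullMap = array(0x80, 0x10ffff, 0, 0xffffff);

$input = "é@😀";
echo mb_encode_numericentity($input, $bmpMap, 'UTF-8'), "\n";  // &#233;@😀
echo mb_encode_numericentity($input, $fullMap, 'UTF-8'), "\n"; // &#233;@&#128512;
```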
Sunday, October 9, 2022

When opening binary files with fopen(), use the rb mode, i.e.:

$fp = fopen($tmp_name, 'rb');

Alternatively, you can simply use file_get_contents(), e.g.:

$file_content = file_get_contents($tmp_name);
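A minimal sketch of both options reading the same binary data back unchanged. $tmpName is a hypothetical stand-in for the answer's $tmp_name, pointing at a temporary file the sketch creates itself. The PHP manual recommends always passing the b flag for portability, since text-mode line-ending translation applies on Windows.

```php
<?php
// $tmpName is a hypothetical stand-in for the answer's $tmp_name.
$tmpName = tempnam(sys_get_temp_dir(), 'bin');
// Sample bytes: NUL, CR LF (subject to line-ending translation in text mode
// on Windows), and two other non-text bytes
$bytes = "\x00\x01\r\n\x1A\xFF";
file_put_contents($tmpName, $bytes);

// Option 1: fopen() in binary mode, then read the rest of the stream
$fp = fopen($tmpName, 'rb');
$viaFopen = stream_get_contents($fp);
fclose($fp);

// Option 2: file_get_contents() reads the whole file in one call
$viaGet = file_get_contents($tmpName);

unlink($tmpName);
var_dump($viaFopen === $bytes, $viaGet === $bytes); // bool(true) bool(true)
```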

To enable better error reporting, place this at the top of your script:

error_reporting(E_ALL);
ini_set('display_errors', 'On');
Tuesday, August 9, 2022