
I'm looking for a simple function that removes emoji characters from Instagram comments. Here is what I have so far (based on a lot of code from examples I found on SO and other websites):

// PHP class
public static function removeEmoji($string)
{
    // split the string into UTF8 char array
    // for loop inside char array
        // if char is emoji, remove it
    // endfor
    // return newstring
}

Any help would be appreciated

 Answers

4

I think the preg_replace function is the simplest solution.

As EaterOfCode suggests, I read the wiki page and wrote new regexes, since none of the answers on SO (or other websites) seemed to work for Instagram photo captions (in the format the API returns). Note: the /u modifier is mandatory for matching \x{...} Unicode code points.

public static function removeEmoji($text) {

    $clean_text = "";

    // Match Emoticons
    $regexEmoticons = '/[\x{1F600}-\x{1F64F}]/u';
    $clean_text = preg_replace($regexEmoticons, '', $text);

    // Match Miscellaneous Symbols and Pictographs
    $regexSymbols = '/[\x{1F300}-\x{1F5FF}]/u';
    $clean_text = preg_replace($regexSymbols, '', $clean_text);

    // Match Transport And Map Symbols
    $regexTransport = '/[\x{1F680}-\x{1F6FF}]/u';
    $clean_text = preg_replace($regexTransport, '', $clean_text);

    // Match Miscellaneous Symbols
    $regexMisc = '/[\x{2600}-\x{26FF}]/u';
    $clean_text = preg_replace($regexMisc, '', $clean_text);

    // Match Dingbats
    $regexDingbats = '/[\x{2700}-\x{27BF}]/u';
    $clean_text = preg_replace($regexDingbats, '', $clean_text);

    return $clean_text;
}

The function does not remove all emojis since there are many more, but you get the point.

Please refer to unicode.org - full emoji list (thanks Epoc)
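
The five replacements above can also be folded into a single character class, so the string is scanned only once. This is a minimal sketch, assuming the same five Unicode blocks are the only ones of interest; the name removeEmojiSinglePass is hypothetical and not part of the original answer.

```php
<?php
// Hypothetical single-pass variant of removeEmoji(); it covers the same five
// Unicode blocks as the function above and nothing more.
function removeEmojiSinglePass(string $text): string
{
    $pattern = '/['
        . '\x{1F600}-\x{1F64F}' // Emoticons
        . '\x{1F300}-\x{1F5FF}' // Miscellaneous Symbols and Pictographs
        . '\x{1F680}-\x{1F6FF}' // Transport and Map Symbols
        . '\x{2600}-\x{26FF}'   // Miscellaneous Symbols
        . '\x{2700}-\x{27BF}'   // Dingbats
        . ']/u';

    // preg_replace() returns null on failure (for example, invalid UTF-8 input);
    // fall back to the unmodified text in that case.
    $clean = preg_replace($pattern, '', $text);

    return $clean === null ? $text : $clean;
}

echo removeEmojiSinglePass("Nice shot \u{1F600}\u{2764}!"), "\n"; // prints "Nice shot !"
```

Like the original, this leaves any emoji outside those five ranges untouched.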

Tuesday, August 30, 2022
1
if(preg_match('/\xEE[\x80-\xBF][\x80-\xBF]|\xEF[\x81-\x83][\x80-\xBF]/', $value)

You really want to match Unicode at a character level, rather than trying to keep track of UTF-8 byte sequences. Use the u modifier to treat your UTF-8 string on a character basis.

The emoji are encoded in the block U+1F300–U+1F5FF. However:

  • many characters from Japanese carriers' ‘emoji’ sets are actually mapped to existing Unicode symbols, eg the card suits, zodiac signs and some arrows. Do you count these symbols as ‘emoji’ now?

  • there are still systems which don't use the newly-standardised Unicode emoji code points, instead using ad-hoc ranges in the Private Use Area. Each carrier had their own encodings. iOS 4 used the Softbank set. You may wish to block the entire Private Use Area.

eg:

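// Return the UTF-8 encoding of the Unicode code point $i
function unichr($i) {
    return iconv('UCS-4LE', 'UTF-8', pack('V', $i));
}

// Match the U+1F300–U+1F5FF block plus the whole Private Use Area
if (preg_match('/['.
    unichr(0x1F300).'-'.unichr(0x1F5FF).
    unichr(0xE000).'-'.unichr(0xF8FF).
']/u', $value)) {
    ...
}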
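
With the u modifier, PCRE also accepts \x{...} code point escapes directly, so the same check can be written without the iconv helper. This is a rough sketch, assuming the two ranges from the answer (U+1F300–U+1F5FF and the Private Use Area U+E000–U+F8FF); the function name is made up for illustration.

```php
<?php
// Hypothetical helper: true if $value contains a character from either range.
function containsEmojiOrPrivateUse(string $value): bool
{
    // preg_match() returns 1 on a match, 0 on no match and false on error,
    // so compare strictly against 1.
    return preg_match('/[\x{1F300}-\x{1F5FF}\x{E000}-\x{F8FF}]/u', $value) === 1;
}

var_dump(containsEmojiOrPrivateUse("hello"));         // bool(false)
var_dump(containsEmojiOrPrivateUse("hi \u{1F31F}"));  // bool(true)
var_dump(containsEmojiOrPrivateUse("hi \u{E001}"));   // bool(true)
```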
Friday, August 5, 2022
 
3

Your API endpoint is wrong. This is the endpoint for fetching a user's recent media from Instagram:

https://api.instagram.com/v1/users/self/media/recent/?access_token=ACCESS-TOKEN

Get the most recent media published by the owner of the access_token.

Parameters

  • access_token: A valid access token.

  • max_id: Return media earlier than this max_id.

  • min_id: Return media later than this min_id.

  • count: Count of media to return.

Read more: https://www.instagram.com/developer/endpoints/users/#get_users_media_recent_self
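
For illustration, the request URL and its optional parameters can be assembled with http_build_query(). This is only a sketch: the helper name buildRecentMediaUrl is hypothetical, the token is a placeholder, and no request is actually sent.

```php
<?php
// Hypothetical helper: builds the URL for the endpoint quoted above.
function buildRecentMediaUrl(string $accessToken, array $options = []): string
{
    // Keep only the optional parameters listed for this endpoint.
    $allowed = ['max_id', 'min_id', 'count'];
    $params  = ['access_token' => $accessToken]
             + array_intersect_key($options, array_flip($allowed));

    // Pass '&' explicitly so the result does not depend on arg_separator.output.
    return 'https://api.instagram.com/v1/users/self/media/recent/?'
         . http_build_query($params, '', '&');
}

echo buildRecentMediaUrl('ACCESS-TOKEN', ['count' => 5]), "\n";
// prints https://api.instagram.com/v1/users/self/media/recent/?access_token=ACCESS-TOKEN&count=5
```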

Monday, December 19, 2022
3

Here's my "basic" version that does the job. It's "half" recursive (a loop that may or may not call the function again), and the improvements I'm planning (handling a "+" separator to concatenate the return values of two functions, and handling "=" to set variables as short aliases for a function's return value) seem quite easy to implement in the _compute() function. Maybe that's because I wrote the code myself, and maybe because, as Paul Crovella said, I'm not using PCRE, which can very easily become an unmaintainable mess.

NB: this code can easily be optimized, and it's not perfect (there are some cases where it doesn't work, such as (a()+b())), but if someone is willing to finish it, they're welcome!

class Parser
{
    private $ref = array(
        'a'         => array( 'type' => 'fn',  'val' => '_a'),
        'b'         => array( 'type' => 'fn',  'val' => '_b'),
        'c'         => array( 'type' => 'fn',  'val' => '_c'),
        'd'         => array( 'type' => 'fn',  'val' => '_d'),
        'e'         => array( 'type' => 'fn',  'val' => '_e'),
        'f'         => array( 'type' => 'fn',  'val' => '_f'),
        'intro'         => array( 'type' => 'fn',  'val' => '_getIntro'),
        'insist'        => array( 'type' => 'fn',  'val' => '_insist'),
        'summoner_name' => array( 'type' => 'fn',  'val' => '_getSummonerName'),
        'type'          => array( 'type' => 'fn',  'val' => '_getEtat'),
        ' '             => array( 'type' => 'str', 'val' => ' ')
    );
    private function _a($p)        { return 'valfnA'; }
    private function _b($p)        { return 'valfnB'; }
    private function _c($p)        { return 'valfnC'; }
    private function _d($p)        { return 'valfnD'; }
    private function _e($p)        { return 'valfnE'; }
    private function _f($p)        { return 'valfnF'; }
    private function _getIntro($p)        { return 'valGetIntro'; }
    private function _insist($p)          { return 'valInsist'; }
    private function _getSummonerName($p) { return 'valGetSqmmonerName'; }
    private function _getEtat($p)         { return 'valGetEtat'; }

    private function _convertKey($key, $params=false)
    {
        $retour = 'indéfini';
        if (isset($this->ref[$key])) {
            $val = $this->ref[$key];
            switch ($val['type']) {
                case 'fn':
                    $val=$val['val'];
                    if (method_exists($this, $val)) {
                        $retour = $this->$val($params);
                    }
                    break;

                default:
                    // Plain string entry: read it from the local $val, not $this
                    if (isset($val['val'])) {
                        $retour = $val['val'];
                    }
                    break;
            }
        }
        return $retour;
    }
    private function _compute($str)
    {
        $p=strpos($str, '?');
        if ($p===false) {
            $p=strpos($str, '=');
            if ($p===false) {
                return $str;
            }
        } else {
            $or=strpos($str, '|');
            if ($or===false) {
                return false;
            }
            $s=substr($str,0,$p);
            if (empty($s) || (strtolower($s)=='false')) {
                return substr($str, $or+1);
            }
            return substr($str, $p+1, ($or-$p)-1);
        }
        return $str;
    }
    private function _getTexte($str, $i, $level)
    {
        if (empty($str)) {
            return $str;
        }
        $level++;
        $f   = (strlen($str)-$i);
        $val = substr($str, $i);
        do {
            $d = $i;
            do {
                $p=$d;
                $d=strpos($str, '(', $p+1);
                if (($p==$i) && ($d===false)) {
                    $retour = $this->_compute($str);
                    return $retour;
                } elseif (($d===false) && ($p>$i)) {
                    $f=strpos($str, ')', $p+1);
                    if ($f===false) {
                        return false;
                    }
                    $d=$p;
                    while((--$d)>=$i) {
                        if (($str[$d]!=' ')
                            && ($str[$d]!='_')
                            && (!ctype_alnum($str[$d]))
                        ) {
                            break;
                        }
                    }
                    if ($d>=$i) {
                        $d++;
                    } else {
                        $d=$i;
                    }
                    $val=substr($str, $d, ($f-$d)+1);
                    $fn=substr($str, $d, $p-$d);
                    $param=$this->_getTexte(
                        substr($str, $p+1, ($f-$p)-1), 0, $level+1
                    );
                    if (!empty($fn)) {
                        $val = $this->_convertKey($fn, $param);
                    } else {
                        $val = $this->_compute($param);
                    }
                    $str = substr($str, 0, $d).$val.substr($str, $f+1);
                    break;
                } elseif ($d===false) {
                    break;
                }
            } while (true);
        } while (true);
    }
    public function parse($str)
    {
        $retour=preg_replace('/\{\*[^.{]+\*\}/', '', $str); //}
        $retour=str_replace("\n", "", $retour);
        $retour=str_replace("\r", "", $retour);
        while (strpos($retour, '  ')!==false) {
            $retour=str_replace("  ", " ", $retour);
        }
        return trim($this->_getTexte($retour, 0, 0));
    }
}

$p=new Parser();
$tests = [
    "a",
    "a()",
    "a(b)",
    "(a?b|c)",
    "(a()?(b()?d|e)|(c()?f|g))",
    "(a()?(b()?d|e)|(c()?f()|g))",
    "((h() ? a | i) ? (b() ? d | e) | (c() ? f | g))",
    "(a(d(f))?b(e(f))|c)",
    '(intro(intro(type(insist(poou))))?toutou|tutu)',
    'type()intro(intro(type(insist(poou))))?type()|tutu'
];
foreach ($tests as $test) {
    $res=$p->parse($test);
    echo $test.' = '.var_export($res,true)."\n";
}
Monday, September 5, 2022
2

Your PHP testing function:

<?php
function test_req($key, $default = '') {
    if(isset($_REQUEST[$key]) and
       !empty($_REQUEST[$key])) {
        return $_REQUEST[$key];
    } else {
        return $default;
    }
}
?>

Then in your form HTML:

<input name="my_field" value="<?php echo htmlentities(test_req('my_field')); ?>" />

$_REQUEST is a PHP superglobal that contains both POST ($_POST) and GET ($_GET) request parameters (and, depending on the PHP configuration, cookies as well).

If you only want to capture POST request parameters then it would be:

<?php
function test_req($key, $default = '') {
    if(isset($_POST[$key]) and
       !empty($_POST[$key])) {
        return $_POST[$key];
    } else {
        return $default;
    }
}
?>

For example.
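
Since the two versions differ only in the array they read, the source can also be passed in as a parameter. This is a small sketch along those lines; the name req_value and the extra string check are additions for illustration, not part of the answer above.

```php
<?php
// Hypothetical generalisation of test_req(): the caller chooses the source array.
function req_value(array $source, string $key, string $default = ''): string
{
    // empty() already covers unset keys, so no separate isset() is needed.
    // Like the original, this treats "" and "0" as missing. Non-string values
    // (such as arrays from my_field[]) also fall back to the default; that
    // check is an addition.
    if (empty($source[$key]) || !is_string($source[$key])) {
        return $default;
    }
    return $source[$key];
}

echo req_value(['my_field' => 'abc'], 'my_field'), "\n";      // prints abc
echo req_value(['my_field' => ''], 'my_field', 'n/a'), "\n";  // prints n/a
echo req_value([], 'missing', 'n/a'), "\n";                   // prints n/a
```

In the form, the call would then read req_value($_POST, 'my_field') or req_value($_REQUEST, 'my_field').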

Tuesday, September 13, 2022