Viewed   72 times

I have a feed taken from third-party sites, and sometimes I have to apply utf8_decode and other times utf8_encode to get the desired visible output.

If by mistake the same stuff is applied twice/or the wrong method is used I get something more ugly, this is what I want to change.

How can I detect when what have to apply on the string?

Actually the content returns UTF-8, but inside there are parts that are not.



I can't say I can rely on mb_detect_encoding(). I had some freaky false positives a while back.

The most universal way I found to work well in every case was:

if (preg_match('!!u', $string))
   // This is UTF-8
   // Definitely not UTF-8
Sunday, October 2, 2022

Solution: after $description_dom = new DOMDocument(); , i placed this code.

$description_html = mb_convert_encoding($description_node, 'HTML-ENTITIES', "UTF-8");

Simply converts html entities to UTF-8. Instead of

$description_dom->loadHTML( (string)$description_node );

now i load the converted html

$description_dom->loadHTML( (string)$description_html );
Tuesday, September 27, 2022

try this for the software #2

iconv("UTF-8", "CP437", $this->_output);

Extended ASCII is not the same as plain ASCII. The first one maybe accepts ASCII, but the second software requires Extended ASCII - Codepage 437

see this link

Wednesday, December 7, 2022

In Python 3, all strings are sequences of Unicode characters. There is a bytes type that holds raw bytes.

In Python 2, a string may be of type str or of type unicode. You can tell which using code something like this:

def whatisthis(s):
    if isinstance(s, str):
        print "ordinary string"
    elif isinstance(s, unicode):
        print "unicode string"
        print "not a string"

This does not distinguish "Unicode or ASCII"; it only distinguishes Python types. A Unicode string may consist of purely characters in the ASCII range, and a bytestring may contain ASCII, encoded Unicode, or even non-textual data.

Tuesday, August 30, 2022

UTF8 is the encoding supported by MongoDB out of the box.

Wednesday, November 30, 2022
Only authorized users can answer the search term. Please sign in first, or register a free account.
Not the answer you're looking for? Browse other questions tagged :