
Possible Duplicate:
UTF-8 all the way through

I'm developing some new features on a website that somebody else already developed.

I'm having a problem with the charset.

I saw that the database has some tables in utf8 and some in latin1,

so I'm trying to convert all the tables to UTF-8.

I did it for one table (the fields of this table are now utf8 as well), but it didn't fix the problem.

I'm using the normal mysql connect. Do I have to set some config option to tell it to connect to the DB with utf8? If yes, which one?

In my html I have:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

It looks like some letters work and others are displayed as a question mark. For example, it is not able to display this ’, which is different from this: '

 Answers

3

Try this

<?php
    header('Content-Type: text/html; charset=utf-8');
?>

and then in the connection

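<?php
    // Note: the mysql_* extension was removed in PHP 7; mysqli or PDO replace it.
    // Open the connection and select the database first.
    $dbLink = mysql_connect($argHost, $argUsername, $argPassword);
    mysql_select_db($argDB, $dbLink);

    // Tell MySQL that the client sends and expects UTF-8.
    // SET NAMES also sets character_set_results, so one statement is enough.
    mysql_query("SET NAMES 'utf8'", $dbLink);

    // These only affect the mb_* functions, not the database connection.
    mb_language('uni');
    mb_internal_encoding('UTF-8');
?>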
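On current PHP versions the mysql_* functions are gone, so the same idea would have to be written with mysqli or PDO. Below is a minimal sketch, assuming PDO with the MySQL driver. The helper name buildUtf8Dsn and the host and database names in the usage line are made up for illustration, and utf8mb4 is assumed to be available on the server (older servers would need utf8 instead).

```php
<?php
// Hypothetical helper: builds a PDO DSN that asks the MySQL driver to use
// the given charset for the connection (the PDO counterpart of SET NAMES).
function buildUtf8Dsn($host, $dbName, $charset = 'utf8mb4')
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbName, $charset);
}

$dsn = buildUtf8Dsn('localhost', 'mydb');
echo $dsn, "\n";

// Usage (needs a reachable server, so it is left commented out):
// $pdo = new PDO($dsn, $user, $password);
```

Putting the charset in the DSN keeps the encoding choice in one place instead of relying on a separate query after connecting.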
Tuesday, October 18, 2022
1

Try this:

function convert( $str ) {
    return iconv( "Windows-1252", "UTF-8", $str );
}

public function getRow()
{
    if (($row = fgetcsv($this->_handle, 10000, $this->_delimiter)) !== false) {
        $row = array_map( "convert", $row );
        $this->_line++;
        return $this->_headers ? array_combine($this->_headers, $row) : $row;
    } else {
        return false;
    }
}
Tuesday, August 16, 2022
 
5
  • mb_internal_encoding('UTF-8') doesn't do anything by itself, it only sets the default encoding parameter for each mb_ function. If you're not using any mb_ function, it doesn't make any difference. If you are, it makes sense to set it so you don't have to pass the $encoding parameter each time individually.
  • IMO mb_detect_encoding is mostly useless since it's fundamentally impossible to accurately detect the encoding of unknown text. You should either know what encoding a blob of text is in because you have a specification about it, or you need to parse appropriate meta data like headers or meta tags where the encoding is specified.
  • Using mb_check_encoding to check if a blob of text is valid in the encoding you expect it to be in is typically sufficient. If it's not, discard it and throw an appropriate error.
  • Regarding:

    does this mean I have to use all multi byte functions instead of its core functions

    If you are manipulating strings that contain multibyte characters, then yes, you need to use the mb_ functions to avoid getting wrong results. The core string functions work on the byte level, not the character level, and the character level is what you typically want when working with strings.

  • utf8_general_ci vs. utf8_bin only makes a difference when collating, i.e. sorting and comparing strings. With utf8_bin data is treated in binary form, i.e. only identical data is identical. With utf8_general_ci some logic is applied, e.g. "é" sorts together with "e" and upper case is considered equal to lower case.
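The byte-versus-character point and the mb_check_encoding advice above can be seen in a few lines. This is only a sketch and assumes the mbstring extension is loaded. The sample strings are written as explicit byte escapes so the result does not depend on how the file itself is saved.

```php
<?php
// "café" with "é" spelled out as its two UTF-8 bytes.
$word = "caf\xC3\xA9";

$byteLength = strlen($word);             // core function: counts bytes
$charLength = mb_strlen($word, 'UTF-8'); // mb_ function: counts characters

// A lone 0xE9 is "é" in latin1 but is not a valid UTF-8 sequence.
$latin1Word  = "caf\xE9";
$isValidUtf8 = mb_check_encoding($latin1Word, 'UTF-8');

printf(
    "%d bytes, %d characters, latin1 input valid as UTF-8: %s\n",
    $byteLength,
    $charLength,
    var_export($isValidUtf8, true)
);
// prints "5 bytes, 4 characters, latin1 input valid as UTF-8: false"
```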
Monday, November 21, 2022
 
5

Did you save the PHP file without a BOM? If not, try that. See: Potential issues with the UTF-8 BOM
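If the editor does not show whether a BOM is present, it can also be checked from PHP. A minimal sketch follows. The helper name stripUtf8Bom is made up for illustration, and it only looks at the three bytes EF BB BF that make up the UTF-8 BOM.

```php
<?php
// Hypothetical helper: removes a leading UTF-8 BOM (bytes EF BB BF) if present.
function stripUtf8Bom($text)
{
    $bom = "\xEF\xBB\xBF";
    return strncmp($text, $bom, 3) === 0 ? substr($text, 3) : $text;
}

$withBom = "\xEF\xBB\xBFhello";
$clean   = stripUtf8Bom($withBom);

printf("%d bytes with BOM, %d bytes without\n", strlen($withBom), strlen($clean));
// prints "8 bytes with BOM, 5 bytes without"

// Usage on a file (the path is a placeholder):
// $source = stripUtf8Bom(file_get_contents('/path/to/file.php'));
```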


Also try SET NAMES with the charset name in single quotes, and without SET CHARACTER_SET:

mysql_query("SET NAMES 'utf8'");

and send charset utf-8 in the HTTP Content-Type header:

header("content-type: text/html; charset=utf-8");
Wednesday, August 3, 2022
1

I agree with the previous answers that UTF-8 is a good choice for most applications.

Beware the traps that might be awaiting you, though! You'll want to be careful that you use a consistent character encoding throughout your system (input forms, output web pages, other front ends that might access or change the data).

I have spent some unpleasant hours trying to figure out why a simple β or é was mangled on my web page, only to find that something somewhere had goofed up an encoding. I've even seen cases of text that gets run through multiple encoders--once turning a single quotation mark into eight bytes.

Bottom line, don't assume the correct translation will be done; be explicit about character encoding throughout your project.
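To make the "eight bytes" case concrete, here is a small sketch of one way a quotation mark can end up that long: its UTF-8 bytes are mistaken for Windows-1252 and converted to UTF-8 a second time. It assumes the mbstring extension is loaded, and the choice of Windows-1252 as the wrongly assumed encoding is an assumption for the example.

```php
<?php
// The typographic apostrophe U+2019 as its three UTF-8 bytes.
$quote = "\xE2\x80\x99";

// Simulate the mistake: treat those UTF-8 bytes as Windows-1252
// and convert them to UTF-8 once more.
$mangled = mb_convert_encoding($quote, 'UTF-8', 'Windows-1252');

printf("%d bytes before, %d bytes after\n", strlen($quote), strlen($mangled));
// prints "3 bytes before, 8 bytes after"

echo bin2hex($mangled), "\n";
// prints "c3a2e282ace284a2"
```

The eight bytes are the UTF-8 encodings of â, € and ™, which is why a doubly encoded ’ tends to show up as "â€™" on a page.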

Edit: I see in your update you've already started to discover this particular joy. :)

Wednesday, November 2, 2022
 
aran_k
 