Possible Duplicate:
UTF-8 all the way through

Searching high & low for a solution. I've tried many variations before posting the question.

What is required to have names appear the same in phpMyAdmin and html page? Can this even be accomplished?

EDIT 1: It would seem that this is a mysql issue. Why? Because the php generated html page will always show the correct characters. At this point it is only the database that shows incorrectly.

EDIT 2: Clarification. With the original settings shown in code snip and images below,

  1. Enter João and submit
  2. João displayed in database
  3. João displayed after reload

Adding the mysqli_query ( $link, 'SET NAMES utf8' )

  1. Enter João and submit
  2. João displayed in database
  3. Jo?o displayed after reload

end Edit 2

In a mysql database, viewed with phpMyAdmin:

The items appear in the database like this: (I've modified the first João to appear correct in database)

And in the html page with encoding set the names appear like (order is reversed & modified has black diamond),

Encoding: <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

I have tried changing the column collation to utf8_bin, utf8_general_ci, utf8_unicode_ci, all with no change to either side. Also changed the document (BBEdit) from UTF-8 to UTF-8 (with BOM), ISO Latin 1 and Windows Latin 1. Several of these created more black diamonds, making the issue worse. (Set to UTF-8 in images) I even tried to preg_replace ã, é etc with the encoded equivalents.

The short story is, João is entered on the page (content type above), João is in database, and João comes to the html page on refresh.

Looking for ideas. Thanks.

 Answers

2

Character set issues are often really tricky to figure out. Basically, you need to make sure that all of the following are true:

  • The DB connection is using UTF-8
  • The DB tables are using UTF-8
  • The individual columns in the DB tables are using UTF-8
  • The data is actually stored properly in the UTF-8 encoding inside the database (often not the case if you've imported from bad sources, or changed table or column collations)
  • The web page is requesting UTF-8
  • Apache is serving UTF-8
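
As a rough sketch of the connection and page items in the list above: this assumes the mysqli extension and an already-open connection (the `$link` from the question). The function name is made up for illustration, and 'utf8' is simply the charset named in the question.

```php
<?php
// Hypothetical helper: declare UTF-8 for both the HTTP response and the
// MySQL connection. Call it before any output is sent.
function use_utf8_for_page_and_connection(mysqli $link): bool
{
    // Tell the browser which encoding the page uses (only possible
    // while the headers have not been sent yet).
    if (!headers_sent()) {
        header('Content-Type: text/html; charset=utf-8');
    }

    // Set the connection character set through the API instead of a raw
    // 'SET NAMES' query, so the client library is aware of it too.
    return mysqli_set_charset($link, 'utf8');
}
```

If existing rows were written over a non-UTF-8 connection, switching the connection charset alone may change how those rows display, which would be consistent with EDIT 2 in the question. The stored data would then need to be repaired separately.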

Here's a good tutorial on dealing with that list, from start to finish: http://www.bluebox.net/news/2009/07/mysql_encoding/

It sounds like your problem is specifically that you've got double-encoded (or triple-encoded) characters, probably from changing character sets or importing already-encoded data with the wrong charset. There's a whole section on fixing that in the above tutorial.
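
To make "double-encoded" concrete, here is a small sketch that reproduces the effect without a database. It assumes the mbstring extension; the function names are hypothetical. It uses ISO-8859-1 as the wrong charset, which agrees with MySQL's latin1 for the two bytes involved here.

```php
<?php
// Simulate the mistake: UTF-8 bytes are misread as ISO-8859-1 and then
// converted to UTF-8 a second time.
function double_encode(string $utf8): string
{
    return mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
}

// Reverse one round of double encoding.
function undo_double_encode(string $mangled): string
{
    return mb_convert_encoding($mangled, 'ISO-8859-1', 'UTF-8');
}

$name    = "Jo\xC3\xA3o";             // "João" as UTF-8 bytes
$mangled = double_encode($name);      // displays as "JoÃ£o" on a UTF-8 page

echo bin2hex($name), "\n";            // 4a6fc3a36f
echo bin2hex($mangled), "\n";         // 4a6fc383c2a36f
echo undo_double_encode($mangled) === $name ? "repaired\n" : "not repaired\n";
```

The single character ã (bytes C3 A3) becomes two characters, Ã and £, each encoded again as UTF-8. That is why the stored value looks wrong in phpMyAdmin even when the page happens to round-trip it correctly.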

Wednesday, September 28, 2022
 
reid
 
1

If you have made sure that both the tables, and the output encoding are UTF-8, almost the only thing left is the connection encoding.

The reason for the change in behaviour when updating servers could be a change of the default connection encoding:

[mysql]
default-character-set=utf8

However, I can't see any changes in the default encoding between versions, so if those were brand-new installs, I can't see that happening.

Anyway, what happens if you run the following from within your PHP script and output the results? Are there any differences from the command-line output?

 SHOW VARIABLES LIKE 'character_set%';
 SHOW VARIABLES LIKE 'collation%'; 
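
One possible way to run those two statements from PHP, assuming the mysqli extension and an open connection; both function names are hypothetical:

```php
<?php
// Hypothetical helper: collect the charset and collation variables that
// apply to this connection, e.g. ['character_set_client' => 'utf8', ...].
function connection_charset_variables(mysqli $link): array
{
    $vars = [];
    $statements = [
        "SHOW VARIABLES LIKE 'character_set%'",
        "SHOW VARIABLES LIKE 'collation%'",
    ];
    foreach ($statements as $sql) {
        $result = mysqli_query($link, $sql);
        if ($result === false) {
            continue; // statement failed; skip it
        }
        // Each row is [Variable_name, Value].
        while ($row = mysqli_fetch_row($result)) {
            $vars[$row[0]] = $row[1];
        }
        mysqli_free_result($result);
    }
    return $vars;
}

// Turn the collected variables into "name = value" lines for printing.
function format_variables(array $vars): string
{
    $lines = [];
    foreach ($vars as $name => $value) {
        $lines[] = $name . ' = ' . $value;
    }
    return implode("\n", $lines);
}
```

Printing `format_variables(connection_charset_variables($link))` next to the command-line output should show whether the PHP connection negotiates a different character set than the mysql client does.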
Friday, November 18, 2022
 
1

Please bear with my somewhat lengthy response.
Let's start with your second question. %C3%96 means that the bytes 0xC3 and 0x96 are transmitted. Those two bytes encode the character Ö in utf-8.
From this (and that your query yields the described results) I assume that you're using utf-8 all the way through.

The lexicographical order of characters of a given charset is determined by the collation used.
That's more or less an ordered list of characters. E.g. A,B,C,D,.... meaning A<B<C....
But these lists may contain multiple characters in the same "location", e.g.
[A,Ä],B,C,D.... meaning that A==Ä->true

___ excursion, not immediately relevant to your question ____
Let's take a look at the "name" of the character Ö, it's LATIN CAPITAL LETTER O WITH DIAERESIS.
So, the base character is O, it just has some decoration(s).
Some systems/libraries allow you to specify the "granularity"/level/strength of the comparison, see e.g. Collator::setStrength of the php-intl extension.

<?php
// utf8 characters
define('SMALL_O_WITH_DIAERESIS', chr(0xC3) . chr(0xB6));
define('CAP_O_WITH_DIAERESIS', chr(0xC3) . chr(0x96));

$coll = collator_create( 'utf-8' );
foreach( array('PRIMARY', 'SECONDARY', 'TERTIARY') as $strength) {
    echo $strength, "\r\n";
    $coll->setStrength( constant('Collator::'.$strength) );
    echo '  o ~ ö = ', $coll->compare('o', SMALL_O_WITH_DIAERESIS), "\r\n";
    echo '  Ö ~ ö = ', $coll->compare(CAP_O_WITH_DIAERESIS, SMALL_O_WITH_DIAERESIS), "\r\n";
}

prints

PRIMARY
  o ~ ö = 0
  Ö ~ ö = 0
SECONDARY
  o ~ ö = -1
  Ö ~ ö = 0
TERTIARY
  o ~ ö = -1
  Ö ~ ö = 1

On the primary level all the involved characters (o,O,ö,Ö) are just some irrelevant variations of the character O, so all are regarded as equal.
On the secondary level the additional "feature" WITH DIAERESIS is taken into consideration and on the third level also whether it is a small or a capital letter.
But ...MySQL doesn't exactly work that way ...so, sorry again ;-)
___ end of excursion ____

In MySQL there are collation tables that specify the order. When you select a charset you also implicitly select the default collation for that charset, unless you explicitly specify one. In your case the implicitly selected collation is probably utf8_general_ci and it treats ö==o.
This applies to both the table definition and the charset/collation of the connection (the latter being almost irrelevant in your case).
utf8_turkish_ci on the other hand treats ö!=o. That's probably the collation you want.

When you have a table definition like

CREATE TABLE soFoo (
  x varchar(32)
)
CHARACTER SET utf8

the default collation for utf8 is chosen -> general_ci -> o=ö
You can specify the default collation for the table when defining it

CREATE TABLE soFoo (
  x varchar(32)
)
CHARACTER SET utf8 COLLATE utf8_turkish_ci

Since you already have a table plus data, you can change the collation of the table ...but if you do it on the table level you have to use ALTER TABLE ... CONVERT (in case you use MODIFY, the column keeps its "original" collation).

ALTER TABLE soFoo CONVERT TO CHARACTER SET utf8 COLLATE utf8_turkish_ci

That should pretty much take care of your problem.


As a side note there is (as mentioned) a collation assigned to your connection as well. Selecting a charset means selecting a collation. I use mainly PDO when (directly) connecting to MySQL and my default connection code looks like this

$pdo = new PDO('mysql:host=localhost;dbname=test;charset=utf8', 'localonly', 'localonly', array(
    PDO::ATTR_EMULATE_PREPARES=>false,
    PDO::MYSQL_ATTR_DIRECT_QUERY=>false,
    PDO::ATTR_ERRMODE=>PDO::ERRMODE_EXCEPTION
));

note the charset=utf8; no collation, so again general_ci is assigned to the connection. And that's why

<?php
$pdo = new PDO('mysql:host=localhost;dbname=test;charset=utf8', 'localonly', 'localonly', array(
    PDO::ATTR_EMULATE_PREPARES=>false,
    PDO::MYSQL_ATTR_DIRECT_QUERY=>false,
    PDO::ATTR_ERRMODE=>PDO::ERRMODE_EXCEPTION
));

$smallodiaresis_utf8 = chr(0xC3) . chr(0xB6);
foreach( $pdo->query("SELECT 'o'='$smallodiaresis_utf8'") as $row ) {
    echo $row[0];
}

prints 1 meaning o==ö. The string literals used in the statement are treated as utf8/utf8_general_ci.

I could either specify the collation for the string literal explicitly in the statement

SELECT 'o' COLLATE utf8_turkish_ci ='ö'

(only setting it for one of the two literals/operands; for why and how this works see Collation of Expressions)
or I can set the connection collation via

$pdo->exec("SET collation_connection='utf8_turkish_ci'");

both result in

foreach( $pdo->query("SELECT 'o'[...]='$smallodiaresis_utf8'") as $row ) {
    echo $row[0];
}

printing 0.

edit: and to complicate things even a bit further:
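MySQL's utf8 charset stores at most three bytes per character, so it can't represent every Unicode character (anything outside the Basic Multilingual Plane, such as emoji, is excluded). The broader utf8mb4 character set covers the full range.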
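
A minimal sketch for checking whether a string would fit in the three-byte utf8 charset. The function names are hypothetical, and the input is assumed to be valid UTF-8.

```php
<?php
// Return the byte length of the longest UTF-8 encoded character in $utf8.
function max_bytes_per_char(string $utf8): int
{
    // With the /u modifier an empty pattern splits between code points;
    // preg_split() returns false if the input is not valid UTF-8.
    $chars = preg_split('//u', $utf8, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    $max = 0;
    foreach ($chars as $char) {
        $max = max($max, strlen($char)); // strlen() counts bytes
    }
    return $max;
}

// MySQL's utf8 stores at most 3 bytes per character; longer ones need utf8mb4.
function fits_in_mysql_utf8(string $utf8): bool
{
    return max_bytes_per_char($utf8) <= 3;
}

var_dump(fits_in_mysql_utf8("Jo\xC3\xA3o"));      // bool(true): ã takes 2 bytes
var_dump(fits_in_mysql_utf8("\xF0\x9F\x98\x80")); // bool(false): U+1F600 takes 4 bytes
```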

Thursday, November 17, 2022
2

Obviously, 'localhost' isn't the correct host to connect to.

If you're running this app from an emulator and you want to connect to a web server running on the PC that hosts the emulator, use 10.0.2.2. More about this: http://developer.android.com/tools/devices/emulator.html

If you're not on an emulator, use the actual IP/host of the machine running the web server. Make sure it's accessible to the Android device first.

Wednesday, September 14, 2022
1

Query to create the database only if it doesn't already exist:

            CREATE DATABASE IF NOT EXISTS DBName;

Query to create the table only if it doesn't already exist (a column list is required; the id column here is just a placeholder):

            CREATE TABLE IF NOT EXISTS tablename (id INT);
Saturday, September 10, 2022
 