After upgrading PHP on our development server from 5.2 to 5.3, Russian text that we fetch from our database is displayed on our web pages with the wrong encoding.
- Dev OS: Debian GNU/Linux 6.0
- Dev PHP: 5.3.5-0.dotdeb.1
- Live MySQL: Distrib 5.1.49
In PHP 5.3, the default client library for interacting with MySQL databases changed from libmysql to mysqlnd, which would appear to be the cause of the issue we are encountering.
We are connecting to the database with the following code:
$conn = mysql_pconnect('database.hostname', 'database_user', 'database_password');
mysql_select_db('database', $conn);
The data stored in our database is encoded with UTF-8 encoding. Connecting to the database via the command-line client and running queries confirms that the data is intact and encoded properly. However, when we query the database in PHP and try to display the exact same data, it becomes garbled. In this specific case, we're attempting to display Russian characters and the result is non-English, non-Russian characters:
The response headers we receive confirm that the content-type is UTF-8:
We tested the strings before display with mb_detect_encoding in strict mode as well as mb_check_encoding and were told the string was a UTF-8 string before displaying it. We also used mysql_client_encoding to test the client encoding and it also indicates the character set is UTF-8.
In performing research, we discovered some suggestions to try to work around this issue:
header("Content-type: text/html; charset=utf-8");
mysql_set_charset('utf8');
mysql_query("SET SESSION character_set_results = 'UTF8'");
mysql_query('SET NAMES UTF8', $conn);
We even tried utf8_encode:
However, none of these solutions worked.
Running out of options, we upgraded MySQL on our development system to Distrib 5.1.55. After that upgrade, everything displayed correctly when we connected to our development database. Of course, it continues to display incorrectly when we connect to our live database.
Ideally, we would like to resolve this issue without upgrading MySQL on our production servers unless we can verify the exact reason why this isn't working and why the upgrade will fix it. How can we resolve this encoding issue without upgrading MySQL? Alternatively, why does the MySQL upgrade fix the issue?
If you have made sure that both the tables and the output encoding are UTF-8, almost the only thing left is the connection encoding.
The reason for the change in behaviour when updating servers could be a change of the default connection encoding:
However, I can't see any changes in the default encoding between versions, so if those were brand-new installs, I can't see that happening.
Anyway, what happens if you run this from within your PHP script and output the results? Are there any differences from the command-line output?
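A minimal sketch of one way to make that comparison, assuming the statement to run is something like SHOW VARIABLES LIKE 'character_set%' (that statement is an assumption here, as are the helper names fetch_charset_vars and diff_charset_vars, which are hypothetical). It uses the legacy mysql_* functions from the question, so the database part only runs on PHP 5.x with a live connection in $conn:

```php
<?php
// Hypothetical helper: list the variables whose values differ between two
// name => value maps, e.g. one collected from PHP and one typed in from
// the command-line client's output.
function diff_charset_vars(array $fromPhp, array $fromCli)
{
    $diff = array();
    foreach ($fromPhp as $name => $value) {
        $cliValue = isset($fromCli[$name]) ? $fromCli[$name] : null;
        if ($cliValue !== $value) {
            $diff[$name] = array('php' => $value, 'cli' => $cliValue);
        }
    }
    return $diff;
}

// Hypothetical helper: read the session's character set variables over an
// existing link. Assumes $conn comes from mysql_pconnect() as in the question
// (legacy ext/mysql, removed in PHP 7).
function fetch_charset_vars($conn)
{
    $vars = array();
    $result = mysql_query("SHOW VARIABLES LIKE 'character_set%'", $conn);
    while ($row = mysql_fetch_assoc($result)) {
        $vars[$row['Variable_name']] = $row['Value'];
    }
    return $vars;
}
```

Calling print_r(diff_charset_vars(fetch_charset_vars($conn), $cliVars)), where $cliVars holds the values seen in the command-line client, would show any variable (character_set_client, character_set_connection, character_set_results and so on) that differs between the two connections.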