
I am generating XML with PHP's DOM extension, as below:

$dom = new DOMDocument("1.0","utf-8");

Doing this results in a page that shows the following message above the output:

This page contains the following errors: error on line 16 at column 274505: PCDATA invalid Char value 27. Below is a rendering of the page up to the first error.

I have tried fixing this with the Tidy library, and I used iconv to convert the Chinese characters to UTF-8.

 Answers

2

A useful function to get rid of that error is suggested on this website: http://www.phpwact.org/php/i18n/charsets#common_problem_areas_with_utf-8

When you put UTF-8 encoded strings in an XML document, remember that not all valid UTF-8 characters are accepted in an XML document: http://www.w3.org/TR/REC-xml/#charsets

So you should strip out the unwanted characters, or you will get a fatal XML parsing error such as the one above.

function utf8_for_xml($string)
{
    return preg_replace('/[^\x{0009}\x{000a}\x{000d}\x{0020}-\x{D7FF}\x{E000}-\x{FFFD}]+/u', ' ', $string);
}
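
As a rough illustration of where such a filter fits, here is a minimal sketch that cleans a string before it is added to a DOMDocument. The helper name `strip_invalid_xml_chars` and the sample string are hypothetical. Unlike the function above, this variant also keeps the range U+10000 to U+10FFFF, which the XML 1.0 character production allows.

```php
<?php
// Hypothetical helper: replaces every run of characters that XML 1.0 does not
// allow (for example ESC, char value 27) with a single space.
function strip_invalid_xml_chars(string $text): string
{
    $pattern = '/[^\x{0009}\x{000A}\x{000D}\x{0020}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]+/u';

    // With the /u modifier, preg_replace returns null when $text is not valid
    // UTF-8; this sketch falls back to an empty string in that case.
    return preg_replace($pattern, ' ', $text) ?? '';
}

// Sample input (an assumption): contains an ESC byte (0x1B) between valid text.
$raw = "caf\u{00E9} \x1B[0m \u{4E2D}\u{6587}";

$dom  = new DOMDocument('1.0', 'utf-8');
$root = $dom->createElement('item');
// createTextNode escapes markup characters such as & and < on output.
$root->appendChild($dom->createTextNode(strip_invalid_xml_chars($raw)));
$dom->appendChild($root);

echo $dom->saveXML();
```

Filtering at the point where text enters the document keeps the check in one place, but it silently alters data, so it may be worth logging when a replacement happens.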

Hope that saves someone else some time..

Saturday, October 15, 2022
4
header('Content-type: text/html; charset=UTF-8') ;

/**
 * Encodes HTML safely for UTF-8. Use instead of htmlentities. 
 *
 * @param string $var 
 * @return string 
 */
function html_encode($var)
{
    return htmlentities($var, ENT_QUOTES, 'UTF-8');
}

Those two rescued me and I think it is now working. I'll come back if I continue to encounter problems. Should I store it in the DB, e.g. as "&amp;" or as "&"?
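
For illustration, here is a small sketch of the wrapper above applied at output time. The sample value is hypothetical, and keeping the raw text in the database while encoding only when printing into HTML is one common approach, not the only one.

```php
<?php
// Same wrapper as above, repeated so the snippet runs on its own.
function html_encode($var)
{
    return htmlentities($var, ENT_QUOTES, 'UTF-8');
}

// Hypothetical value as it might be stored, unescaped, in the database.
$stored = 'Tom & "Jerry" <b>';

// Encode only at the moment the value is placed into HTML.
echo '<p>' . html_encode($stored) . '</p>', PHP_EOL;
// prints: <p>Tom &amp; &quot;Jerry&quot; &lt;b&gt;</p>
```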

Wednesday, November 16, 2022
 
4

Not a full solution, but just a data point (because encoding issues suck). When I downloaded your XML via the static file and PHP file and diffed them, I got the following results

% diff php.xml static.xml
1c1
< <?xml version="1.0"?>
---
> <?xml version="1.0"?>
10a11
>
19a21
>

Your static file has an extra "non-ASCII" character at the start of it.

My guess is that your static XML file has a UTF-8 BOM that the PHP-generated file doesn't, and that your Flash movie is expecting a UTF-8 file. I'd try generating a BOM with your PHP XML file and seeing if it helps, or fiddling with your server's encoding settings.
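
A minimal sketch of how a BOM could be detected and prepended in PHP follows. The helper names `has_utf8_bom` and `with_utf8_bom` are hypothetical, and the tiny document is only a stand-in for the real XML.

```php
<?php
// The UTF-8 byte order mark is the three bytes EF BB BF.
const UTF8_BOM = "\xEF\xBB\xBF";

// Hypothetical helper: true when the byte string starts with a UTF-8 BOM.
function has_utf8_bom(string $bytes): bool
{
    return strncmp($bytes, UTF8_BOM, 3) === 0;
}

// Hypothetical helper: prepends a BOM unless one is already present.
function with_utf8_bom(string $bytes): string
{
    return has_utf8_bom($bytes) ? $bytes : UTF8_BOM . $bytes;
}

// Stand-in document; the real script would build its own XML here.
$dom = new DOMDocument('1.0', 'utf-8');
$dom->appendChild($dom->createElement('root', 'hello'));

$xml = with_utf8_bom($dom->saveXML());
echo bin2hex(substr($xml, 0, 3)), PHP_EOL;   // prints: efbbbf
```

The same `has_utf8_bom` check can be run on the contents of the static file to confirm whether the BOM really is the difference between the two downloads.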

Try changing your PHP so it outputs the following header (matching the web page the movie is embedded in).

header('Content-Type: text/html; charset=UTF-8');
Tuesday, November 1, 2022
 
exort
 
1

From your output:

Error #1088: The markup in the document following the root element must be well-formed.

It seems that the problem is not with the loader but with the PHP output. Check to make sure your output looks as expected by accessing the generated XML directly from your browser and downloading it. You may be able to spot the error in the output if you go through it in a text editor line-by-line, or you could try using an XML editor and see if it finds issues.
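
As an alternative to reading the output by eye, here is a minimal sketch that asks libxml for the parse errors. The helper name `xml_wellformedness_errors` and the sample string are hypothetical; in practice the input would be the text your PHP script actually emits.

```php
<?php
// Hypothetical helper: returns a list of "line:column message" strings,
// which is empty when the XML is well-formed.
function xml_wellformedness_errors(string $xml): array
{
    // Collect libxml errors instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $dom = new DOMDocument();
    $dom->loadXML($xml);

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('%d:%d %s', $error->line, $error->column, trim($error->message));
    }

    // Restore the previous error-handling mode.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $messages;
}

// Sample input (an assumption): markup follows the root element,
// which is the situation Error #1088 describes.
$sample = '<root><a>1</a></root><extra/>';
print_r(xml_wellformedness_errors($sample));
```

The reported line and column point at the first place the parser gave up, which is usually close to the stray output.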

Tuesday, November 29, 2022
 
grant
 
1

Your problem is that your SET NAMES 'utf8_persian_ci' command was invalid (utf8_persian_ci is a collation, not a character set). If you run it in a terminal you will see the error Unknown character set: 'utf8_persian_ci'. Thus your application, when it stored the data, was using the latin1 character set. MySQL interpreted your input as latin1 characters, which it then stored encoded as UTF-8. Likewise, when the data was pulled back out, MySQL converted it from UTF-8 back to latin1, yielding (hopefully, most of the time) the original bytes you gave it.

In other words, all your data in the database is completely messed up, but it just so happened to work.
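
The round trip can be imitated in plain PHP, without a database, to see why it appeared to work. This is only a sketch: it assumes the mbstring extension is available, and it uses ISO-8859-1 as a stand-in for MySQL's latin1, which is not exactly the same mapping.

```php
<?php
// Original text as UTF-8 bytes (the Persian word "salam", four letters).
$original = "\u{0633}\u{0644}\u{0627}\u{0645}";

// Write path: the bytes are treated as latin1 characters, and each of those
// characters is then stored encoded as UTF-8.
$stored = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');

// Read path: the stored text is converted back to latin1, which happens to
// reproduce the original bytes.
$readBack = mb_convert_encoding($stored, 'ISO-8859-1', 'UTF-8');

echo strlen($original), ' bytes sent, ', strlen($stored), ' bytes stored', PHP_EOL;
// prints: 8 bytes sent, 16 bytes stored
var_dump($readBack === $original);
// prints: bool(true)
```

The stored value is twice as long and is not the Persian text at all, yet the application gets its original bytes back, so nothing looks wrong until another client reads the table with a correct connection charset.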

To fix this, you need to undo what you did. The most straightforward way is using PHP:

  1. SET NAMES latin1;
  2. Select every single text field from every table.
  3. SET NAMES utf8;
  4. Update the same rows using the same string unaltered.

Alternatively you can perform these steps inside MySQL, but it's tricky because MySQL understands the data to be in a certain character set. You need to modify your text columns to a BLOB type, then modify them back to text types with a utf8 character set. See the section at the bottom of the ALTER TABLE MySQL documentation labeled "Warning" in red.

After you do either one of these things, the bytes stored in your database columns will actually be in the character set they claim to be. Then, make sure you always use mysql_set_charset('utf8') on any database access from PHP that you may do in the future! Otherwise you will mess things up again. (Note: do not use a simple mysql_query('SET NAMES utf8')! There are corner cases, such as a reset connection, where this can be reset to latin1 without your knowledge. mysql_set_charset() will set the charset whenever necessary.)

It would be best if you switched away from mysql_* functions and used PDO instead with the charset=utf8 parameter in your PDO dsn.
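
A minimal sketch of such a DSN follows. The helper name `mysql_dsn`, the host, the database name, and the credentials in the comment are all placeholders, not values from the question.

```php
<?php
// Hypothetical helper: builds a MySQL DSN with the connection charset set up front.
function mysql_dsn(string $host, string $dbname, string $charset = 'utf8'): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbname, $charset);
}

$dsn = mysql_dsn('localhost', 'example_db');
echo $dsn, PHP_EOL;
// prints: mysql:host=localhost;dbname=example_db;charset=utf8

// Connecting needs a running server and real credentials (placeholders here):
// $pdo = new PDO($dsn, 'example_user', 'example_password', [
//     PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
// ]);
```

On MySQL, `utf8` is limited to three bytes per character, so passing `utf8mb4` as the third argument may be preferable if the tables use that character set.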

Tuesday, August 23, 2022
 