Only today I realized that I was missing this in my PHP scripts:

mysql_set_charset('utf8');

All my tables are InnoDB with collation "utf8_unicode_ci", and all my VARCHAR columns are "utf8_unicode_ci" as well. I have mb_internal_encoding('UTF-8'); in my PHP scripts, and all my PHP files are encoded as UTF-8.

So, until now, every time I "INSERT" something with diacritics, for example:

mysql_query('INSERT INTO `table` SET `name`="Jáuò Iñe"');

The 'name' contents would be, in this case: JÃ¡uÃ² IÃ±e.

Since I fixed the charset between PHP and MySQL, new INSERTs are stored correctly. However, I want to fix all the older rows that are still messed up. I have tried many things already, but every attempt cuts the string off at the first "illegal" character. Here is my current code:

$m = mysql_real_escape_string('¿<?php echo "¬<b>'PHP &aacute; (á)??riî? </b>"; ?> ?-?i abcdd;//;ñç´????????ç?â????????????ñ ;');
mysql_set_charset('utf8');
mysql_query('INSERT INTO `table` SET `name`="'.$m.'"');
mysql_set_charset('latin1');
mysql_query('INSERT INTO `table` SET `name`="'.$m.'"');
mysql_set_charset('utf8');

$result = mysql_iquery('SELECT * FROM `table`');
while ($row = mysql_fetch_assoc($result)) {
    $message = $row['name'];
    $message = mb_convert_encoding($message, 'ISO-8859-15', 'UTF-8');
    //$message = iconv("UTF-8", "ISO-8859-1//IGNORE", $message);
    mysql_iquery('UPDATE `table` SET `name`="'.mysql_real_escape_string($message).'" WHERE `a1`="'.$row['a1'].'"');
}

It "UPDATE"s with the expected characters, except that the string gets truncated at the character "?": that character and everything after it are missing from the stored string.

Testing with "iconv()" (commented out in the code) gives the same result, even with //IGNORE and //TRANSLIT.

I also tested several charsets, between ISO-8859-1 and ISO-8859-15.

 Answers

5

From what you describe, it seems you have UTF-8 data that was originally stored as Latin-1 and then not converted correctly to UTF-8. The data is recoverable; you'll need a MySQL function like

convert(cast(convert(name using latin1) as binary) using utf8)

It's possible that you may need to omit the inner conversion, depending on how the data was altered during the encoding conversion.
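To see why the double conversion works, here is a minimal Python sketch of the same round trip outside MySQL. It assumes the damage was exactly one extra Latin-1 decode of UTF-8 bytes (MySQL's latin1 is effectively Windows-1252, so `cp1252` is used here); `corrupt` and `repair_mojibake` are illustrative names, not library functions.

```python
def corrupt(text: str) -> str:
    """Simulate the bug: UTF-8 bytes misread as Windows-1252 text."""
    return text.encode("utf-8").decode("cp1252")


def repair_mojibake(text: str) -> str:
    """Undo one round of UTF-8-read-as-Windows-1252 corruption.

    Re-encoding as cp1252 recovers the original bytes, which are then
    decoded as the UTF-8 they always were. If either step fails, the
    text was probably not corrupted this way, so it is returned as-is.
    """
    try:
        return text.encode("cp1252").decode("utf-8")
    except UnicodeError:
        return text


original = "Jáuò Iñe"
broken = corrupt(original)
print(broken)                   # the mojibake form stored in the old rows
print(repair_mojibake(broken))  # the original text again
```

The `try`/`except` plays the role of the caveat above: rows that were already stored correctly usually fail the strict UTF-8 decode and are left alone. It is still worth running any such repair on a copy of the table first.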

Wednesday, October 26, 2022
5

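U+FFFD is the Unicode REPLACEMENT CHARACTER. A decoder emits it in place of a byte sequence it cannot decode, and a renderer may show it for a character it cannot represent. Seeing it means the bytes being displayed are not valid in the encoding used to read them.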
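As a small illustration, the Python sketch below decodes a Latin-1 byte string as UTF-8. The sample word is made up for the example; the point is only that a lenient decoder substitutes U+FFFD, after which the original byte is gone.

```python
raw = "café".encode("latin-1")  # b'caf\xe9': the lone 0xE9 is not valid UTF-8

# A strict decode refuses the input outright.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# A lenient decode substitutes U+FFFD for the undecodable byte.
decoded = raw.decode("utf-8", errors="replace")
print(strict_ok)                # False
print(decoded == "caf\ufffd")   # True

# Re-encoding does not bring 0xE9 back: the replacement is lossy.
print(decoded.encode("utf-8"))  # b'caf\xef\xbf\xbd'
```

So when U+FFFD is already stored in the data, the fix has to happen upstream, at the point where the bytes are first decoded.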

Maybe this will offer some guidance on how to proceed: How to handle user input of invalid UTF-8 characters?

Tuesday, December 6, 2022
 
grizzly
 
3

U+2019 RIGHT SINGLE QUOTATION MARK is not a character in ISO-8859-1. It is a character in windows-1252, as 0x92. The actual ISO-8859-1 character 0x92 is a rarely-used C1 control character called "Private Use 2".

It is very common to mislabel Windows-1252 text data with the charset label ISO-8859-1. Many web browsers and e-mail clients treat the MIME charset ISO-8859-1 as Windows-1252 characters in order to accommodate such mislabeling but it is not standard behaviour and care should be taken to avoid generating these characters in ISO-8859-1 labeled content.

It appears that this is what's happening here. Change "ISO-8859-1" to "windows-1252".
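The difference between the two encodings at 0x92 can be checked directly. This is a minimal Python sketch using only the standard codecs; the variable names are illustrative.

```python
import unicodedata

quote = "\u2019"  # RIGHT SINGLE QUOTATION MARK

# Windows-1252 has a slot for it at 0x92.
cp1252_bytes = quote.encode("cp1252")

# ISO-8859-1 has no such character, so encoding fails.
try:
    quote.encode("iso-8859-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# In ISO-8859-1, byte 0x92 is a C1 control character instead.
c1 = b"\x92".decode("iso-8859-1")
category = unicodedata.category(c1)

print(cp1252_bytes)  # b'\x92'
print(latin1_ok)     # False
print(category)      # Cc (control character)
```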

Thursday, December 8, 2022
 
1

Thanks all for your replies. I wrote one myself. Please note that this uses jQuery.

Code snippet:

var myList = [
  { "name": "abc", "age": 50 },
  { "age": "25", "hobby": "swimming" },
  { "name": "xyz", "hobby": "programming" }
];

// Builds the HTML Table out of myList.
function buildHtmlTable(selector) {
  var columns = addAllColumnHeaders(myList, selector);

  for (var i = 0; i < myList.length; i++) {
    var row$ = $('<tr/>');
    for (var colIndex = 0; colIndex < columns.length; colIndex++) {
      var cellValue = myList[i][columns[colIndex]];
      if (cellValue == null) cellValue = "";
      row$.append($('<td/>').html(cellValue));
    }
    $(selector).append(row$);
  }
}

// Adds a header row to the table and returns the set of columns.
// Takes the union of keys from all records, since some records may not contain
// all keys.
function addAllColumnHeaders(myList, selector) {
  var columnSet = [];
  var headerTr$ = $('<tr/>');

  for (var i = 0; i < myList.length; i++) {
    var rowHash = myList[i];
    for (var key in rowHash) {
      if ($.inArray(key, columnSet) == -1) {
        columnSet.push(key);
        headerTr$.append($('<th/>').html(key));
      }
    }
  }
  $(selector).append(headerTr$);

  return columnSet;
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>

<body onLoad="buildHtmlTable('#excelDataTable')">
  <table id="excelDataTable" border="1">
  </table>
</body>
Sunday, November 20, 2022
 
r3verse
 
5

This is very similar to @eddi's answer, but using base difftime instead of lubridate functions:

# modifying the example:
DT[1, StartDateTime := as.POSIXct("2015-01-21 13:12")]

DT[,{
    t0  = StartDateTime
    t1  = StartDateTime + Duration*60

    h0  = trunc(t0, units="hour") 
    h1  = trunc(t1, units="hour") 
    h   = seq(h0, h1, by="hour")
    nh  = length(h)     

    dur = as.difftime(rep("1",nh), format="%H", units="mins")
    if (h0 <  t0) dur[1 ] = difftime(h0 + as.difftime("1", format="%H", units="mins"), t0)
    if (h1 <  t1) dur[nh] = difftime(t1, h1)
    if (h0 == h1) dur     = difftime(t1, t0)

    list(h = h, dur = dur)
}, by=.(IDX, ID, Trip)]

which gives

    IDX ID Trip                   h     dur
 1:   1  1    1 2015-01-21 13:00:00 48 mins
 2:   1  1    1 2015-01-21 14:00:00 52 mins
 3:   2  1    1 2015-01-21 13:00:00 60 mins
 4:   2  1    1 2015-01-21 14:00:00 60 mins
 5:   2  1    1 2015-01-21 15:00:00 60 mins
 6:   2  1    1 2015-01-21 16:00:00  4 mins
 7:   3  1    1 2015-01-21 10:00:00 60 mins
 8:   3  1    1 2015-01-21 11:00:00 31 mins
 9:   4  2    1 2015-01-22 13:00:00 30 mins
10:   5  2    2 2015-01-30 23:00:00 60 mins
11:   5  2    2 2015-01-31 00:00:00 40 mins
Friday, November 25, 2022
 