My string contains a UTF-8 non-breaking space (bytes 0xC2 0xA0) and I want to replace it with something else.
When I use
$str=preg_replace('~\xc2\xa0~', 'X', $str);
it works OK.
But when I use
$str=preg_replace('~\x{C2A0}~siu', 'W', $str);
the non-breaking space is not found (and so not replaced).
Why? What is wrong with the second regexp?
The format \x{C2A0} is correct, and I also used the u flag.
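The documentation about escape sequences in PHP is misleading here. The two syntaxes mean different things.
\xc2\xa0 matches the two raw bytes 0xC2 0xA0, which are the UTF-8 encoding of the non-breaking space.
\x{c2a0} with the u flag names a Unicode code point, which is then matched in its UTF-8 encoded form. U+C2A0 is a different character altogether (a Hangul syllable, encoded as 0xEC 0x8A 0xA0), so nothing in your string matches.
A non-breaking space is U+00A0 in Unicode, but it is encoded as C2 A0 in UTF-8.
So if you use the pattern ~\x{00a0}~siu, it will work as expected.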
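A minimal sketch comparing the three patterns side by side. The sample string "a", non-breaking space, "b" and the variable names are made up for illustration; only the patterns come from the question and answer.

```php
<?php
// Sample input (an assumption for this sketch): "a", U+00A0, "b".
// In a double-quoted string, \xC2\xA0 produces the two UTF-8 bytes of U+00A0.
$str = "a\xC2\xA0b";

// Byte-level pattern without the u flag: matches the raw bytes C2 A0.
$bytes = preg_replace('~\xc2\xa0~', 'X', $str);

// UTF-8 mode with the wrong code point: \x{C2A0} means U+C2A0,
// which is not present in $str, so the string comes back unchanged.
$wrong = preg_replace('~\x{C2A0}~u', 'W', $str);

// UTF-8 mode with the correct code point U+00A0.
$right = preg_replace('~\x{00A0}~u', 'W', $str);

var_dump($bytes === 'aXb'); // bool(true)
var_dump($wrong === $str);  // bool(true)
var_dump($right === 'aWb'); // bool(true)
```

Note that the patterns are in single quotes, so the backslash escapes reach the regex engine untouched rather than being interpreted by PHP first.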