What is the best way to remove accents, e.g. "ÈâuÑ" becomes "Eaun", without using iconv?
You can split each entry on "-" and build the nested array as deep as needed. Note the use of & in the code: $current is a reference that walks down into the result array.
Example:
$str = "15-02-01-0000,15-02-02-0000,15-02-03-0000,15-02-04-0000,15-02-05-0000,15-02-10-0000,15-02-10-9100,15-02-10-9101,15-15-81-0000,15-15-81-0024";
$arr = explode(",", $str);
$res = [];
foreach ($arr as $e) { // for each entry in your data
    $a = explode("-", $e); // break the entry into its prefix parts
    $current = &$res; // start at the root of the result
    while (count($a) > 1) { // walk down, creating levels where needed
        $key = array_shift($a); // take the first key
        if (!isset($current[$key])) { // if the path does not exist yet, create an empty array
            $current[$key] = array();
        }
        $current = &$current[$key];
    }
    $current[] = $e; // found the right path, so add the element
}
unset($current); // drop the reference so later code cannot modify $res by accident
The full result will be in $res.
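To make the resulting shape concrete, here is a minimal sketch that wraps the same loop in a function and reads one branch back. The function name buildTree and the shortened input are assumptions made for illustration only. Note that PHP casts a key such as "15" or "10" to an integer, while "02" stays a string because of its leading zero.

```php
<?php
// Hypothetical helper (the name is an assumption) wrapping the loop shown above.
function buildTree(string $str): array
{
    $res = [];
    foreach (explode(",", $str) as $e) {
        $a = explode("-", $e);
        $current = &$res;                 // start at the root
        while (count($a) > 1) {
            $key = array_shift($a);
            if (!isset($current[$key])) {
                $current[$key] = [];      // create the missing level
            }
            $current = &$current[$key];   // descend one level
        }
        $current[] = $e;                  // store the full entry at the leaf
        unset($current);                  // break the reference before the next entry
    }
    return $res;
}

$tree = buildTree("15-02-01-0000,15-02-10-0000,15-02-10-9100,15-15-81-0024");
print_r($tree[15]["02"][10]); // the two entries that share the 15-02-10 prefix
```

The last segment of each entry (for example 0000) is not used as a key; the whole entry is appended to the list found under its first three segments.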
Correct me if I'm wrong, but I don't think you can do this with a simple regexp. In a full regexp implementation you could use something like this:
$parts = preg_split("/(?<!<[^>]*)\./", $input);
But PHP does not allow non-fixed-length lookbehind, so that won't work. Apparently the only two flavors that do are JGsoft and the .NET regexp.
My method of dealing with this would be:
function splitStringUp($input, $maxlen) {
    $parts = explode(".", $input);
    $i = 0;
    // stop at the last part: there is nothing left to merge it with
    while ($i < count($parts) - 1) {
        $joined = $parts[$i] . "." . $parts[$i + 1];
        // merge when the split fell inside an unclosed tag,
        // or when the merged piece still fits under $maxlen
        if (preg_match("/<[^>]*$/", $parts[$i]) || strlen($joined) < $maxlen) {
            array_splice($parts, $i, 2, $joined);
        } else {
            $i++;
        }
    }
    return $parts;
}
You didn't mention what you want to happen when an individual sentence is longer than 8000 characters, so this just leaves such sentences intact.
Sample output:
var_dump(splitStringUp('this is a sentence. this is another sentence. this is an html <a href="a.b.c">tag. and the closing tag</a>. hooray', 8000));
array(1) {
[0]=> string(114) "this is a sentence. this is another sentence. this is an html <a href="a.b.c">tag. and the closing tag</a>. hooray"
}
var_dump(splitStringUp('this is a sentence. this is another sentence. this is an html <a href="a.b.c">tag. and the closing tag</a>. hooray', 80));
array(2) {
[0]=> string(81) "this is a sentence. this is another sentence. this is an html <a href="a.b.c">tag"
[1]=> string(32) " and the closing tag</a>. hooray"
}
var_dump(splitStringUp('this is a sentence. this is another sentence. this is an html <a href="a.b.c">tag. and the closing tag</a>. hooray', 40));
array(4) {
[0]=> string(18) "this is a sentence"
[1]=> string(25) " this is another sentence"
[2]=> string(36) " this is an html <a href="a.b.c">tag"
[3]=> string(32) " and the closing tag</a>. hooray"
}
var_dump(splitStringUp('this is a sentence. this is another sentence. this is an html <a href="a.b.c">tag. and the closing tag</a>. hooray', 0));
array(5) {
[0]=> string(18) "this is a sentence"
[1]=> string(25) " this is another sentence"
[2]=> string(36) " this is an html <a href="a.b.c">tag"
[3]=> string(24) " and the closing tag</a>"
[4]=> string(7) " hooray"
}
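As a possible alternative to the lookbehind that PHP rejects, PCRE's backtracking verbs (*SKIP)(*F) can make the splitter step over whole tags, so only dots outside <...> split the string. This is only a sketch: the function name splitOutsideTags is an assumption, it assumes every tag is properly closed with ">", and it does no $maxlen merging.

```php
<?php
// Hypothetical helper (the name is an assumption): split on "." except inside <...>.
function splitOutsideTags(string $input): array
{
    // The first alternative consumes a complete tag and then forces a failure
    // with (*SKIP)(*F), so matching resumes after the tag. Only the second
    // alternative, a literal dot, can ever act as the delimiter.
    return preg_split('/<[^>]*>(*SKIP)(*F)|\./', $input);
}

$parts = splitOutsideTags('one. two <a href="a.b.c">tag. end</a>. three');
print_r($parts);
```

The dots inside href="a.b.c" are left alone, while the dot after "tag" still splits because it sits between the opening and closing tags rather than inside one of them.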
I think the problem here is that your encodings consider ä and å different symbols to 'a'. In fact, the PHP documentation for strtr offers a sample for removing accents the ugly way :(
http://ie2.php.net/strtr
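A minimal sketch of that strtr approach, assuming the script file and the input are both UTF-8. The function name removeAccents and the small mapping table are illustrative assumptions; a real table would need many more entries. Passing an array to strtr replaces whole substrings, so multi-byte characters are handled as units.

```php
<?php
// Hypothetical helper (the name is an assumption): map accented letters to ASCII.
function removeAccents(string $text): string
{
    // Illustrative subset only, not a complete table.
    $map = [
        'À' => 'A', 'Â' => 'A', 'Ä' => 'A', 'Å' => 'A',
        'È' => 'E', 'É' => 'E', 'Ê' => 'E', 'Ë' => 'E',
        'Ñ' => 'N',
        'à' => 'a', 'â' => 'a', 'ä' => 'a', 'å' => 'a',
        'è' => 'e', 'é' => 'e', 'ê' => 'e', 'ë' => 'e',
        'ñ' => 'n',
    ];
    return strtr($text, $map);
}

$plain = removeAccents('ÈâuÑ');
echo $plain, "\n"; // prints "EauN" (case is preserved)
```

If the lowercase form from the question is wanted, the result can be passed through strtolower afterwards, since it contains only ASCII letters at that point.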
Experiment shows that indicating UTF-16 rather than UTF-16BE does what you want:
iconv -f UTF-16 -t UTF-8 myfile.txt
Complete working code. I know this is long, but it's a sure-shot way used by WordPress.