
I know this comment. I would like a tool similar to tr for PHP, so that I could simply run

tr -d " "

I tried the function php_strip_whitespace, without success:

$tags_trimmed = php_strip_whitespace($tags);

I also tried a regex replacement, again without success:

$tags_trimmed = preg_replace(" ", "", $tags);



A regular expression does not account for UTF-8 characters by default. The \s meta-character only accounts for the original Latin set. Therefore, the following command only removes tabs, spaces, carriage returns and newlines:

// http://.com/a/1279798/54964
$str = preg_replace('/\s+/', '', $str);

With UTF-8 becoming mainstream, this expression will fail more and more often when it reaches the newer UTF-8 characters, leaving behind white space that \s cannot account for.

To deal with the new types of white space introduced in Unicode/UTF-8, a more extensive pattern is required to match and remove modern white space.

Because regular expressions by default do not recognize multi-byte characters, each one has to be written as a delimited group of bytes. This prevents the same byte segments from being altered inside other UTF-8 characters (\x80 in the quad set could otherwise replace every \x80 sub-byte in smart quotes).
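To make the smart-quote concern concrete, here is a minimal sketch. The sample string and variable names are made up for illustration. It compares a byte-wise character class with a grouped byte sequence, both in the default byte mode (no u modifier):

```php
<?php
$quote  = "\xE2\x80\x9C"; // U+201C left double quotation mark (a "smart quote")
$enQuad = "\xE2\x80\x80"; // U+2000 en quad
$text   = $quote . "a" . $enQuad . "b";

// Byte-wise character class: each listed byte is removed wherever it occurs,
// so the smart quote loses two of its three bytes.
$broken = preg_replace("/[\xE2\x80]/", '', $text);

// Grouped sequence: only the complete three-byte en quad is removed.
$intact = preg_replace("/(\xE2\x80\x80)+/", '', $text);

echo bin2hex($broken), PHP_EOL; // 9c6162
echo bin2hex($intact), PHP_EOL; // e2809c6162
```

The first result is no longer valid UTF-8 (a lone 0x9C continuation byte), while the second keeps the quotation mark intact.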

$cleanedstr = preg_replace(
    "/(\t|\n|\v|\f|\r| |\xC2\x85|\xc2\xa0|\xe1\xa0\x8e|\xe2\x80[\x80-\x8D]|\xe2\x80\xa8|\xe2\x80\xa9|\xe2\x80\xaF|\xe2\x81\x9f|\xe2\x81\xa0|\xe3\x80\x80|\xef\xbb\xbf)+/",
    "",
    $str
);

This accounts for and removes tabs, newlines, vertical tabs, formfeeds, carriage returns, spaces, and additionally from here:

next line, non-breaking space, Mongolian vowel separator, [en quad, em quad, en space, em space, three-per-em space, four-per-em space, six-per-em space, figure space, punctuation space, thin space, hair space, zero width space, zero width non-joiner, zero width joiner], line separator, paragraph separator, narrow no-break space, medium mathematical space, word joiner, ideographic space, and the zero width non-breaking space.
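A different route, not the one this answer takes, is to let PCRE decode the string itself with the u modifier and match code points instead of raw bytes. The sketch below assumes the input is valid UTF-8. The function name strip_unicode_whitespace is hypothetical, and the character class is one possible way of covering the list above, not an exhaustive one.

```php
<?php
// Hypothetical helper (not part of the original answer).
function strip_unicode_whitespace(string $str): string
{
    // /u     : treat pattern and subject as UTF-8 code points, not bytes
    // \s     : the usual white space class
    // \p{Z}  : Unicode separators (spaces, line and paragraph separators)
    // others : next line, Mongolian vowel separator, zero width space,
    //          zero width non-joiner/joiner, word joiner, zero width no-break space
    $pattern = '/[\s\p{Z}\x{0085}\x{180E}\x{200B}-\x{200D}\x{2060}\x{FEFF}]+/u';
    $result = preg_replace($pattern, '', $str);

    // With /u, preg_replace() returns null for invalid UTF-8; fall back to the input.
    return $result ?? $str;
}

// space, tab, no-break space, zero width space, ideographic space
echo strip_unicode_whitespace("a \t\xC2\xA0b\xE2\x80\x8Bc\xE3\x80\x80d"), PHP_EOL; // abcd
```

Because the match happens per code point, the smart-quote problem described earlier does not arise, but the call only works on well-formed UTF-8, hence the null fallback.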

Many of these wreak havoc in XML files exported from automated tools or sites, where they foul up text searches and recognition. They can also be pasted invisibly into PHP source code, where they cause the parser to jump to the next command (paragraph and line separators). Lines of code are then skipped, resulting in intermittent, unexplained errors that we have begun referring to as "textually transmitted diseases".

[It's not safe to copy and paste from the web anymore. Use a character scanner to protect your code. lol]

Thursday, August 4, 2022

If they're only in the current directory

find * -type f -print

Is that what you want?

Monday, November 14, 2022

There is a special-case shortcut for exactly this use case!

If you call str.split without an argument, it splits on runs of whitespace instead of single characters. So:

>>> ' '.join("Please \n don't \t hurt \x0b me.".split())
"Please don't hurt me."
Thursday, September 1, 2022

You want to replace it, not strip it:

s = s.replace(',', '')
Monday, December 12, 2022

str_pad is padding with spaces, not adding spaces. You're padding an existing value with spaces so that it is 6 characters long, not adding 6 whitespaces to the value. So if $_SESSION["emp_id"] is 6 characters long or more, nothing will be added.
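A short sketch of the difference, using made-up values in place of $_SESSION["emp_id"]:

```php
<?php
// Hypothetical stand-ins for $_SESSION["emp_id"].
$short = "42";
$long  = "1234567";

// str_pad() pads up to a total length of 6; it does not append 6 characters.
$paddedShort = str_pad($short, 6); // "42    " (length 6)
$paddedLong  = str_pad($long, 6);  // "1234567" (already 6 or longer, unchanged)

// Appending six spaces regardless of length is plain concatenation.
$appended = $short . str_repeat(" ", 6); // "42      " (length 8)

var_dump($paddedShort, $paddedLong, $appended);
```

If the goal is to add a fixed number of spaces whatever the current length, the concatenation form is the one to use.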

Thursday, September 1, 2022