Viewed   86 times

I am facing an issue with URLs, I want to be able to convert titles that could contain anything and have them stripped of all special characters so they only have letters and numbers and of course I would like to replace spaces with hyphens.

How would this be done? I've heard a lot about regular expressions (regex) being used...

 Answers

4

This should do what you're looking for:

function clean($string) {
   $string = str_replace(' ', '-', $string); // Replaces all spaces with hyphens.

   return preg_replace('/[^A-Za-z0-9-]/', '', $string); // Removes special chars.
}

Usage:

echo clean('a|"bc!@£de^&$f g');

Will output: abcdef-g

Edit:

Hey, just a quick question, how can I prevent multiple hyphens from being next to each other? and have them replaced with just 1?

function clean($string) {
   $string = str_replace(' ', '-', $string); // Replaces all spaces with hyphens.
   $string = preg_replace('/[^A-Za-z0-9-]/', '', $string); // Removes special chars.

   return preg_replace('/-+/', '-', $string); // Replaces multiple hyphens with single one.
}
Saturday, October 1, 2022
3

There's no need to use a regex for this. PHP has an inbuilt function to do just this. Use parse_url():

$domain = parse_url($url, PHP_URL_HOST);
Tuesday, December 27, 2022
 
lher
 
2

Many frameworks provide functions for this

CodeIgniter: http://bitbucket.org/ellislab/codeigniter/src/c39315f13a76/system/helpers/url_helper.php#cl-472

wordpress (has many more in the code): http://core.trac.wordpress.org/browser/trunk/wp-includes/formatting.php#L814

Sunday, November 13, 2022
 
5

You need to use regular expressions to identify the unwanted characters. For the most easily readable code, you want the str_replace_all from the stringr package, though gsub from base R works just as well.

The exact regular expression depends upon what you are trying to do. You could just remove those specific characters that you gave in the question, but it's much easier to remove all punctuation characters.

x <- "a1~!@#$%^&*(){}_+:"<>?,./;'[]-=" #or whatever
str_replace_all(x, "[[:punct:]]", " ")

(The base R equivalent is gsub("[[:punct:]]", " ", x).)

An alternative is to swap out all non-alphanumeric characters.

str_replace_all(x, "[^[:alnum:]]", " ")

Note that the definition of what constitutes a letter or a number or a punctuatution mark varies slightly depending upon your locale, so you may need to experiment a little to get exactly what you want.

Thursday, November 24, 2022
 
2

You can try p{L} for all letters and p{N} for all numbers:

text = text.replaceAll("[^\p{L}\p{N} ]+", "");
Tuesday, December 27, 2022
 
tlav
 
Only authorized users can answer the search term. Please sign in first, or register a free account.
Not the answer you're looking for? Browse other questions tagged :