
I've got this regular expression, which removes common words ($commonWords) from a string ($input), and I would like to tweak it so that it ignores hyphenated words, as these sometimes contain common words.

return preg_replace('/\b('.implode('|',$commonWords).')\b/i','',$input);

return preg_replace('/(?<!-)\b('.implode('|',$commonWords).')\b(?!-)/i','',$input);

This adds a negative lookbehind at the start and a negative lookahead at the end of the regex, so a match is only allowed if there is no hyphen directly before or after it.
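A minimal, self-contained sketch of the lookaround version, wrapped in a hypothetical removeCommonWords() helper. Unlike the one-liner above, it also passes each word through preg_quote() (an addition, not part of the original answer) so that words containing regex metacharacters such as "." are matched literally.

```php
<?php
// Hypothetical helper: removes whole-word, case-insensitive occurrences of
// $commonWords from $input, but skips any word with a hyphen directly
// before or after it (i.e. parts of hyphenated compounds).
function removeCommonWords(array $commonWords, string $input): string
{
    // Escape each word so regex metacharacters are matched literally.
    $escaped = array_map(function ($w) {
        return preg_quote($w, '/');
    }, $commonWords);

    // (?<!-) and (?!-) reject matches touching a hyphen; \b keeps whole words.
    $pattern = '/(?<!-)\b(' . implode('|', $escaped) . ')\b(?!-)/i';

    return preg_replace($pattern, '', $input);
}

$common = array('the', 'and', 'of');
echo removeCommonWords($common, 'The state-of-the-art design and the result'), "\n";
```

Here "state-of-the-art" survives intact while the standalone "The", "and" and "the" are removed. Like the original regex, it leaves the surrounding spaces behind, so the result may contain leading or doubled spaces.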

Friday, September 9, 2022

Assuming you want to remove both ( and ) from the $search string:

$search = preg_replace('/\(|\)/','',$search);

I think the fastest way to do this is to use the strtr function, which avoids the regex engine entirely:

$search = strtr($search, array('(' => '', ')' => ''));
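A small sketch comparing the two approaches on a hypothetical sample string, plus str_replace() as another non-regex option (not mentioned in the answers above). Whether strtr() is actually the fastest should be measured on real data; the sketch only shows that all three produce the same result.

```php
<?php
// Hypothetical sample input containing parentheses.
$search = 'foo (bar) (baz)';

// Regex version: the parentheses must be escaped. Unescaped, "(|)" is an
// empty group that only matches the empty string, so nothing is removed.
$viaRegex = preg_replace('/\(|\)/', '', $search);

// strtr() with an array replaces each key with its value, no regex involved.
$viaStrtr = strtr($search, array('(' => '', ')' => ''));

// str_replace() with an array of needles is another non-regex option.
$viaStrReplace = str_replace(array('(', ')'), '', $search);

echo $viaRegex, "\n";
echo $viaStrtr, "\n";
echo $viaStrReplace, "\n";
```

All three lines print "foo bar baz".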
Tuesday, November 15, 2022

AFAIK Lucene can do what you want. With StandardAnalyzer and StopAnalyzer you can do the stop word removal. In combination with the Lucene contrib-snowball project (which includes work from Snowball), you can do the stemming too.

For stemming, also consider the answers to the question "Stemming algorithm that produces real words".

Thursday, November 24, 2022

preg_replace('/[_]+/', '_', $your_string);
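A minimal usage sketch of the pattern above, with a hypothetical sample value for $your_string. Note that preg_replace() returns the new string rather than modifying its argument, so the result has to be assigned. The character class [_] is equivalent to a plain _ here.

```php
<?php
// Hypothetical sample value with runs of consecutive underscores.
$your_string = 'my__file___name_v1';

// Replace every run of one or more underscores with a single underscore.
$collapsed = preg_replace('/[_]+/', '_', $your_string);

echo $collapsed, "\n"; // my_file_name_v1
```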

Friday, December 9, 2022

How about this?


This doesn't take into account anything non-alphabetic though. It also assumes that all words are separated by a single whitespace character. You will need to modify it if you want more complex support.

Monday, September 12, 2022