I've got this regular expression, which removes common words ($commonWords) from a string ($input), and I would like to tweak it so that it ignores hyphenated words, as these sometimes contain common words.

return preg_replace('/\b('.implode('|',$commonWords).')\b/i','',$input);





return preg_replace('/(?<!-)\b('.implode('|',$commonWords).')\b(?!-)/i','',$input);

This adds negative lookaround assertions to both ends of the regex: a negative lookbehind, (?<!-), at the start and a negative lookahead, (?!-), at the end. A common word is therefore matched only if there is no hyphen immediately before or after it.
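
To make the effect of the lookarounds concrete, here is a minimal sketch that wraps the pattern in a function. The function name removeCommonWords and the sample inputs are assumptions made for illustration, not part of the original question. The preg_quote call is an extra precaution, also not in the original, so that a word containing regex metacharacters is matched literally.

```php
<?php
// Hypothetical helper (name is an assumption): removes common words from
// $input unless they are directly attached to a hyphen.
function removeCommonWords(string $input, array $commonWords): string
{
    // Escape each word so characters such as "." or "/" are matched literally.
    $escaped = array_map(function ($word) {
        return preg_quote($word, '/');
    }, $commonWords);

    // (?<!-) rejects a match preceded by a hyphen,
    // (?!-)  rejects a match followed by a hyphen.
    $pattern = '/(?<!-)\b(' . implode('|', $escaped) . ')\b(?!-)/i';

    return preg_replace($pattern, '', $input);
}

// "up" and "to" survive inside "up-to-date"; the standalone "up", "in" and "the" are removed.
echo removeCommonWords('Look it up in the up-to-date guide', array('up', 'to', 'in', 'the'));
```

As in the original one-liner, the whitespace around a removed word is left in place, so the result can contain runs of spaces that may need collapsing afterwards.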

Friday, September 9, 2022

Assuming you want to remove both ( and ) from the $search string:

$search = preg_replace('/\(|\)/','',$search);

I think the fastest way to do this is using the strtr function, like this:

$search = strtr($search, array('(' => '', ')' => ''));
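
As a quick sanity check, the sketch below runs both approaches on a sample string, with the parentheses escaped in the regex, and adds str_replace as a third option that the thread does not mention. The sample string and the variable names are assumptions for illustration. No timing is shown: which variant is fastest depends on the PHP version and the input, so it is worth measuring on real data.

```php
<?php
// Sample input (an assumption for illustration).
$search = 'foo (bar) baz';

// Regex with both parentheses escaped so they are treated as literals.
$viaRegex = preg_replace('/\(|\)/', '', $search);

// strtr with a replacement map: each key is replaced by its (empty) value.
$viaStrtr = strtr($search, array('(' => '', ')' => ''));

// str_replace with an array of needles, all replaced by the empty string.
$viaStrReplace = str_replace(array('(', ')'), '', $search);

// All three variants should produce the same string.
var_dump($viaRegex === $viaStrtr && $viaStrtr === $viaStrReplace);
```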

Tuesday, November 15, 2022

AFAIK Lucene can do what you want. With StandardAnalyzer and StopAnalyzer you can do the stop word removal. In combination with the Lucene contrib-snowball project (which includes work from Snowball), you can do the stemming too.

For stemming, though, also consider this answer to "Stemming algorithm that produces real words".

Thursday, November 24, 2022

preg_replace('/[_]+/', '_', $your_string);

Friday, December 9, 2022

How about this?


This doesn't take into account anything non-alphabetic though. It also assumes that all words are separated by a single whitespace character. You will need to modify it if you want more complex support.

Monday, September 12, 2022
