I need to split a paragraph into sentences. That's where I got a bit confused with the regex.
I have already looked at the question this one is marked as a duplicate of, but the issue here is different.
Here is an example of the string I need to split:
hello! how are you? how is life
live life, live free. "isnt it?"
Here is the code I tried:
$sentence_array = preg_split('/([.!?\r\n|\r|\n])+(?![^"]*")/', $paragraph, -1);
What I need is:
array (
[0] => "hello"
[1] => "how are you"
[2] => "how is life"
[3] => "live life, live free"
[4] => ""isnt it?""
)
What I get is:
array(
[0] => "hello! how are you? how is life live life, live free. "isnt it?""
)
When I do not have any quotes in the string, the split works as required.
Any help is appreciated. Thank you.
There are some problems with your regular expression, the main one being that it confuses group constructs with character classes. A pipe | in a character class means a literal |. It doesn't have any special meaning there. Alternation only works outside a class, for example in a group like (?:a|b). What you need is this:
This first tries to match a string enclosed in double quotation marks (and captures its content). It then tries to match any punctuation mark from the set [!?.] to split on. Finally, it matches any kind of newline character, if found.
PHP:
Output:
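The three steps described above amount to an alternation that is tried left to right, and they can be sketched roughly as follows. This is a minimal sketch and not necessarily the exact pattern meant above: the pattern itself, the helper name split_sentences, and the choice of flags are assumptions. PREG_SPLIT_DELIM_CAPTURE keeps the captured quoted string as an element of its own, and PREG_SPLIT_NO_EMPTY drops the empty pieces around it.

```php
<?php
// Hypothetical helper (name and pattern are assumptions, not from the original answer).
function split_sentences(string $paragraph): array
{
    // 1. ("[^"]*")   a double-quoted string, captured so it is returned as a piece
    // 2. [!?.]+\s*   one or more sentence-ending marks plus trailing whitespace
    // 3. \R+         one or more newline sequences of any kind
    $pattern = '~("[^"]*")|[!?.]+\s*|\R+~';

    return preg_split(
        $pattern,
        $paragraph,
        -1,
        PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE
    );
}

$paragraph = "hello! how are you? how is life\nlive life, live free. \"isnt it?\"";
print_r(split_sentences($paragraph));
```

For the sample paragraph this should give five elements: hello, how are you, how is life, live life, live free, and "isnt it?" (with its quotes kept). Because the quoted alternative comes first, punctuation inside the quotes is consumed as part of the quoted string and never acts as a split point.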