
I need to split a string on commas and spaces, but ignore the ones inside double quotes, single quotes and parentheses:

$str = "Questions, \"Quote\",'single quote','comma,inside' (inside parentheses) space #specialchar";

so that the resulting array is:

[0]Questions
[1]Quote
[2]single quote
[3]comma,inside
[4]inside parentheses
[5]space
[6]#specialchar

My current regexp is

$tags = preg_split("/[,\s]*[^\w\s]+[\s]*/", $str, 0, PREG_SPLIT_NO_EMPTY);

but it drops the special characters and still splits on the commas inside quotes. The resulting array is:

[0]Questions
[1]Quote
[2]single quote
[3]comma
[4]inside
[5]inside parentheses
[6]space
[7]specialchar

PS: this is not CSV.

Many Thanks

 Answers

2

This will work only for non-nested parentheses:

    // Nowdoc (<<<'HERE'): backslashes reach PCRE exactly as written.
    $regex = <<<'HERE'
    /  "  ( (?:[^"\\]++|\\.)*+ ) "
     | '  ( (?:[^'\\]++|\\.)*+ ) '
     | \( ( [^)]*                  ) \)
     | [\s,]+
    /x
    HERE;

    $tags = preg_split($regex, $str, -1,
                         PREG_SPLIT_NO_EMPTY
                       | PREG_SPLIT_DELIM_CAPTURE);

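The possessive quantifiers ++ and *+ consume as much as they can and give nothing back for backtracking. perlre(1) describes this technique as the most efficient way to do this kind of matching. Because of PREG_SPLIT_DELIM_CAPTURE, the captured contents of the quotes and parentheses come back as array elements, and PREG_SPLIT_NO_EMPTY drops the empty strings.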
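
For comparison, the same idea can be sketched in Java, whose java.util.regex also supports possessive quantifiers. This is only a sketch: it matches tokens instead of splitting on delimiters, and the class name TagSplitter and the fourth alternative for bare tokens are assumptions made for illustration, not part of the answer above. Like the PHP version, it does not handle nested parentheses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagSplitter {
    // Alternatives: "double quoted" | 'single quoted' | (parenthesised) | bare token.
    // ++ and *+ are possessive: once matched, they never backtrack.
    private static final Pattern TOKEN = Pattern.compile(
          "\"((?:[^\"\\\\]++|\\\\.)*+)\""
        + "|'((?:[^'\\\\]++|\\\\.)*+)'"
        + "|\\(([^)]*)\\)"
        + "|([^\\s,]+)");

    static List<String> split(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            // Exactly one group participates per match; skip empty captures.
            for (int g = 1; g <= m.groupCount(); g++) {
                String s = m.group(g);
                if (s != null && !s.isEmpty()) {
                    tokens.add(s);
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String str = "Questions, \"Quote\",'single quote','comma,inside' "
                   + "(inside parentheses) space #specialchar";
        // prints [Questions, Quote, single quote, comma,inside, inside parentheses, space, #specialchar]
        System.out.println(split(str));
    }
}
```

Commas and whitespace outside quotes and parentheses match none of the alternatives, so find() simply skips over them.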

Saturday, October 29, 2022
5

The Arabic regex is:

[\u0600-\u06FF]

Actually, ?-? is a subset of this Arabic range, so I think you can remove them from the pattern.

So, in JS it will be

/^[a-z0-9+,()\/'\s\u0600-\u06FF-]+$/i


Tuesday, October 11, 2022
3

Here's a sample of a parser that would implement your need :

import java.util.ArrayList;
import java.util.List;

public static List<String> splitter(String input) {
    int nestingLevel = 0; // depth of currently open parentheses
    StringBuilder currentToken = new StringBuilder();
    List<String> result = new ArrayList<>();
    for (char c : input.toCharArray()) {
        if (nestingLevel == 0 && c == '/') { // a top-level '/' is a separator
            result.add(currentToken.toString());
            currentToken = new StringBuilder();
        } else {
            if (c == '(') { nestingLevel++; }
            // never go below zero: an unmatched ')' is kept as plain text
            else if (c == ')' && nestingLevel > 0) { nestingLevel--; }

            currentToken.append(c);
        }
    }
    result.add(currentToken.toString()); // flush the last token
    return result;
}


Note that it doesn't produce the expected output you posted, but I'm not sure what algorithm you followed to obtain that result. In particular, I've made sure the nesting level never goes negative, so for starters the / in "Mango 003 )/( ASDJ" is considered to be outside the parentheses and is parsed as a separator.

Anyway, I'm sure you can tweak my answer much more easily than you could a regex answer. The whole point of my answer is to show that writing a small parser for such problems is often more realistic than trying to craft a regex.
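
As one illustration of how such a parser can be tweaked, here is a hedged variation that takes the separator as a parameter and trims each token. The class name TopLevelSplitter, the separator parameter and the trimming are all assumptions added for this sketch; they are not part of the answer above.

```java
import java.util.ArrayList;
import java.util.List;

class TopLevelSplitter {
    // Splits on `separator` only when it appears outside parentheses.
    static List<String> split(String input, char separator) {
        int nestingLevel = 0;
        StringBuilder current = new StringBuilder();
        List<String> result = new ArrayList<>();
        for (char c : input.toCharArray()) {
            if (nestingLevel == 0 && c == separator) {
                result.add(current.toString().trim()); // assumed tweak: trim tokens
                current.setLength(0);
            } else {
                if (c == '(') {
                    nestingLevel++;
                } else if (c == ')' && nestingLevel > 0) {
                    nestingLevel--; // an unmatched ')' leaves the level at zero
                }
                current.append(c);
            }
        }
        result.add(current.toString().trim());
        return result;
    }

    public static void main(String[] args) {
        System.out.println(split("a / (b/c) / d", '/'));      // prints [a, (b/c), d]
        System.out.println(split("Mango 003 )/( ASDJ", '/')); // prints [Mango 003 ), ( ASDJ]
    }
}
```

The second call shows the behaviour described above: the unmatched ) does not push the nesting level below zero, so the following / still counts as a separator.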

Tuesday, October 18, 2022
 
alla
 
3

For this PHP regex:

$str = preg_replace ( '{(.)\1+}', '$1', $str );   // collapse runs of a repeated character
$str = preg_replace ( '{[ \'-_()]}', '', $str );  // note: '-_ inside the class is a range

In Java:

str = str.replaceAll("(.)\\1+", "$1");
str = str.replaceAll("[ '-_\\(\\)]", "");

I suggest you provide your input and expected output; you will then get better answers on how it can be done in PHP and/or Java.
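
To make the behaviour of the two replaceAll calls concrete, here is a small sketch. Inside a character class, '-_ is a range from U+0027 to U+005F, which also covers digits and uppercase letters. The second method, with an escaped hyphen, is only an assumed alternative in case a literal hyphen was intended; the class and method names are made up for illustration.

```java
class CollapseAndStrip {
    // Faithful translation: '-_ is a RANGE (U+0027..U+005F) inside the class.
    static String withRange(String s) {
        return s.replaceAll("(.)\\1+", "$1")
                .replaceAll("[ '-_\\(\\)]", "");
    }

    // Assumed alternative: the hyphen is escaped, so it is a literal character.
    static String withLiteralHyphen(String s) {
        return s.replaceAll("(.)\\1+", "$1")
                .replaceAll("[ '\\-_()]", "");
    }

    public static void main(String[] args) {
        System.out.println(withRange("Hello 123"));         // prints elo
        System.out.println(withLiteralHyphen("Hello 123")); // prints Helo123
    }
}
```

In both variants the first call turns "ll" into "l"; only the range version then also strips the uppercase H and the digits.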

Sunday, October 9, 2022
 
haodong
 
5

Use a CSV parser like OpenCSV to take care of things like commas in quoted elements, values that span multiple lines etc. automatically. You can use the library to serialize your text back as CSV as well.

// Assumes OpenCSV 3.x or later; older releases use the package au.com.bytecode.opencsv.
import com.opencsv.CSVReader;
import java.io.StringReader;

String str = "value1, value2, value3, value4, \"value5, 1234\", " +
        "value6, value7, \"value8\", value9, \"value10, 123.23\"";

CSVReader reader = new CSVReader(new StringReader(str));

String[] tokens;
while ((tokens = reader.readNext()) != null) {
    System.out.println(tokens[0]); // value1
    System.out.println(tokens[4]); // value5, 1234
    System.out.println(tokens[9]); // value10, 123.23
}
Monday, August 8, 2022
 