I found it in the following regex:
\[(?:[^][]|(?R))*\]
It matches square brackets (with their content) together with nested square brackets.
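Java's `java.util.regex` has no recursion (`(?R)` is not supported), so this pattern cannot be ported as-is. As a swapped-in technique, here is a minimal sketch of a plain depth-counting scanner that, for input whose brackets are balanced, returns the same outermost `[...]` groups (nested brackets included) that a global match of the recursive pattern would find. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class BracketGroups {
    // Returns each outermost [...] group, nested brackets included.
    // A ']' with no open '[' is ignored; an unclosed '[' at the end yields no group.
    static List<String> outermost(String s) {
        List<String> groups = new ArrayList<>();
        int depth = 0;   // how many '[' are currently open
        int start = -1;  // index of the '[' that opened the current outer group
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '[') {
                if (depth == 0) {
                    start = i;
                }
                depth++;
            } else if (c == ']' && depth > 0) {
                depth--;
                if (depth == 0) {
                    groups.add(s.substring(start, i + 1));
                }
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(outermost("a [b [c] d] e [f]")); // [[b [c] d], [f]]
    }
}
```

Unlike the regex, the scanner gives up on an outer group that is never closed, so results can differ on unbalanced input.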
The Arabic regex is:
[\u0600-\u06FF]
Actually, the Arabic letter range already listed in your pattern is a subset of this range, so you can remove it from the pattern.
So, in JS it will be:
/^[a-z0-9+,()\/'\s\u0600-\u06FF-]+$/i
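If the same check is needed on the Java side, a minimal sketch might look like this. It assumes the whole field is validated (hence `matches()`), and the class and method names are hypothetical. Java's regex syntax accepts the same `\u0600-\u06FF` range, and `Pattern.CASE_INSENSITIVE` plays the role of the JS `i` flag (it folds ASCII letters only unless `UNICODE_CASE` is added).

```java
import java.util.regex.Pattern;

class ArabicFieldCheck {
    // Same class as the JS literal: ASCII letters (any case), digits, + , ( ) / ' whitespace,
    // the Arabic block U+0600 to U+06FF, and a literal hyphen placed last.
    private static final Pattern ALLOWED = Pattern.compile(
            "^[a-z0-9+,()/'\\s\\u0600-\\u06FF-]+$", Pattern.CASE_INSENSITIVE);

    static boolean isValid(String input) {
        return ALLOWED.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("\u0645\u0631\u062D\u0628\u0627 123")); // true
        System.out.println(isValid("abc!"));                               // false
    }
}
```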
See also a lot of general hints and useful links at the regex tag details page.
Online tutorials
Quantifiers
- `*`: greedy, `*?`: reluctant, `*+`: possessive
- `+`: greedy, `+?`: reluctant, `++`: possessive
- `?`: optional (zero-or-one)
- `{n,m}`: between n & m, `{n,}`: n-or-more, `{n}`: exactly n
- `{n}` and `{n}?`
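A quick way to see the difference between the greedy, reluctant and possessive forms is to run the same pattern with each quantifier in Java, which supports all three. This is a sketch; the helper name `firstMatch` is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String html = "<a><b>";
        System.out.println(firstMatch("<.+>", html));          // greedy: <a><b>
        System.out.println(firstMatch("<.+?>", html));         // reluctant: <a>
        System.out.println(firstMatch("<.++>", html));         // possessive: null, .++ never gives back the final >
        System.out.println(firstMatch("\\d{2,3}", "12345"));   // 123
        System.out.println(firstMatch("\\d{2,3}?", "12345"));  // 12
        System.out.println(firstMatch("\\d{2}?", "12345"));    // 12: an exact count leaves nothing to be lazy about
    }
}
```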
Character Classes
- `[...]`: any one character, `[^...]`: negated/any character but
- `[^]` matches any one character including newlines (javascript)
- `[\w-[\d]]` / `[a-z-[qz]]`: set subtraction (.net, xml-schema, xpath, JGSoft)
- `[\w&&[^\d]]`: set intersection (java, ruby 1.9+)
- `[[:alpha:]]`: POSIX character classes
- Why do `[^\\D2]`, `[^[^0-9]2]`, `[^2[^0-9]]` get different results in Java? (java)
- `\d`: digit, `\D`: non-digit
- `\w`: word character, `\W`: non-word character
- `\s`: whitespace, `\S`: non-whitespace
- Unicode categories (`\p{L}`, `\P{L}`, etc.)
Escape Sequences
- `\h`: space-or-tab, `\t`: tab
- `\r`, `\n`: carriage return and line feed
- `\R`: generic newline (php, java-8)
- `\H`: non-horizontal-whitespace character, `\V`: non-vertical-whitespace character, `\N`: non-line-feed character (pcre, php5, java-8)
- `\v`: vertical tab (in PCRE and Java 8+, `\v` is any vertical whitespace), `\e`: the escape character
Anchors
- `^`: start of line/input, `\b`: word boundary, and `\B`: non-word boundary, `$`: end of line/input
- `\A`: start of input, `\Z`: end of input (php, perl, ruby)
- `\z`: the very end of input (`\Z` in Python) (.net, php, pcre, java, ruby, icu, swift, objective-c)
- `\G`: start of match, i.e. where the previous match ended (php, perl, ruby)
(Also see "Flavor-Specific Information > Java > The functions in Matcher")
Groups
- `(...)`: capture group, `(?:)`: non-capture group
- `\1`: backreference and capture-group reference, `$1`: capture group reference
- `\g<1>123`: how to follow a numbered capture group, such as `\1`, with a number (python)
- What does `(?i:regex)` mean? What does `(?P<group_name>regexp)` mean?
- `(?>)`: atomic group or independent group, `(?|)`: branch reset
- Named capture groups: regular-expressions.info
- `(?<groupname>regex)`: overview and naming rules (non-Stack Overflow links)
- `(?P<groupname>regex)` python, `(?<groupname>regex)` .net, `(?<groupname>regex)` perl, `(?P<groupname>regex)` and `(?<groupname>regex)` php
Lookarounds
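- Lookahead: `(?=...)`: positive, `(?!...)`: negative
- Lookbehind: `(?<=...)`: positive, `(?<!...)`: negative (not supported by older JavaScript engines; standardized in ES2018)
- `{0,n}`: Java allows lookbehind only with a bounded length such as `{0,n}` (java)
- `\K`: drops what was matched so far from the reported match (php, perl) (flavors that support `\K`)
Modifiers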
flag | modifier | flavors |
---|---|---|
a | ASCII | python |
c | current position | perl |
e | expression | perl, php before 7.0 |
g | global | most |
i | case-insensitive | most |
m | multiline | php perl python javascript .net java |
m | (non)multiline | ruby |
o | once | perl ruby |
S | study | php |
s | single line | most; unsupported: ruby, javascript before ES2018 |
U | ungreedy | php r |
u | unicode | most |
x | whitespace-extended | most |
y | sticky | javascript |
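Java supports most of the group and lookaround syntax listed above, plus the inline forms of the `i`, `m`, `s` and `x` flags. The following sketch tries a numbered backreference, a named group, an atomic group, both lookbehinds and two inline flags; the class name and the `findAll` helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupsLookaroundsFlags {
    // Collects every match of regex in input, left to right.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // \1 refers back to group 1: collapse a doubled word.
        System.out.println("hello hello world".replaceAll("\\b(\\w+)\\s+\\1\\b", "$1")); // hello world

        // Named groups, read back with group(name).
        Matcher date = Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})").matcher("due 2024-05");
        if (date.find()) {
            System.out.println(date.group("year") + "/" + date.group("month")); // 2024/05
        }

        // Atomic group: once (?>bc|b) has taken "bc" it never retries "b", so this fails,
        // while the plain non-capturing version backtracks and matches "abc".
        System.out.println(findAll("a(?>bc|b)c", "abc")); // []
        System.out.println(findAll("a(?:bc|b)c", "abc")); // [abc]

        // Positive and negative lookbehind (bounded length, as Java requires).
        System.out.println(findAll("(?<=\\$)\\d+", "$42 and 17"));     // [42]
        System.out.println(findAll("(?<!\\$)\\b\\d+", "$42 and 17")); // [17]

        // Inline flags: i = case-insensitive, m = ^ and $ at line breaks, s = dot matches newline.
        System.out.println(findAll("(?im)^hello$", "Hi\nHELLO\nbye")); // [HELLO]
        System.out.println(findAll("(?s)a.b", "a\nb").size());         // 1
    }
}
```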
Other:
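- `|`: alternation (OR) operator, `.`: any character, `[.]`: literal dot character
- Backtracking control verbs: `(*PRUNE)`, `(*SKIP)`, `(*FAIL)` and `(*F)`
- `(*BSR_ANYCRLF)`: makes `\R` match only CR, LF or CRLF
- Recursion: `(?R)`, `(?0)` and `(?1)`, `(?-1)`, `(?&groupname)`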
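The difference between `.` and `[.]`, and the left-to-right order in which `|` tries its alternatives, show up clearly with `String.split` and a first-match helper in Java. A minimal sketch; the method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DotAndAlternation {
    // [.] is a literal dot, so "a.b.c" splits into its three parts.
    static String[] splitOnLiteralDot(String s) {
        return s.split("[.]");
    }

    // A bare . matches any character: every character becomes a separator,
    // and split() then drops the trailing empty strings, leaving nothing.
    static String[] splitOnAnyChar(String s) {
        return s.split(".");
    }

    // Returns the first match; the alternatives of | are tried left to right.
    static String firstMatch(String regex, String s) {
        Matcher m = Pattern.compile(regex).matcher(s);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(splitOnLiteralDot("a.b.c").length);           // 3
        System.out.println(splitOnAnyChar("a.b.c").length);              // 0
        System.out.println(firstMatch("Java|JavaScript", "JavaScript")); // Java
    }
}
```

The last line shows that alternation is ordered rather than longest-match: put the longer alternative first when both can match at the same position.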
Common Tasks
- `{...}`
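For the `{...}` task, one common approach (assuming the braces are not nested) is a negated character class between escaped braces, with the content captured in group 1. A Java sketch with a hypothetical `between` helper:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BraceContents {
    // \{ and \} are literal braces; [^{}]* stops at the next brace of either kind,
    // so with nested braces only the innermost pairs are found.
    private static final Pattern BRACES = Pattern.compile("\\{([^{}]*)\\}");

    // Returns the text inside each {...} pair (group 1), without the braces.
    static List<String> between(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = BRACES.matcher(s);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(between("Hello {name}, you owe {amount}.")); // [name, amount]
    }
}
```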
Advanced Regex-Fu
- `(?!a)a`: a pattern that can never match
- Matching `this` except in contexts A, B and C
Flavor-Specific Information
(Except for those marked with *, this section contains non-Stack Overflow links.)
- Java: the differences between the functions in `java.util.regex.Matcher`:
  - `matches()`: the match must be anchored to both input start and end
  - `find()`: a match may be anywhere in the input string (substrings)
  - `lookingAt()`: the match must be anchored to input start only
- Java: the `java.lang.String` functions that accept regular expressions: `matches(s)`, `replaceAll(s,s)`, `replaceFirst(s,s)`, `split(s)`, `split(s,i)`
- Java: `java.util.regex`
- PHP: `preg_match`
- Python: `search` vs `match`, how-to
- Rust: `regex`, struct `regex::Regex`
- Tcl: `regexp` command
General information
(Links marked with * are non-Stack Overflow links.)
- Examples of regexes that can cause the regex engine to fail
Tools: Testers and Explainers
(This section contains non-Stack Overflow links.)
Online (* includes replacement tester, + includes split tester):
- freeformatter.com (xregexp)
- regex.larsolavtorvik.com (php PCRE and POSIX, javascript)
For this PHP regex:
$str = preg_replace('{(.)\1+}', '$1', $str);
$str = preg_replace('{[ \'-_()]}', '', $str);
In Java:
str = str.replaceAll("(.)\\1+", "$1");
str = str.replaceAll("[ '-_\\(\\)]", "");
I suggest you provide your input and expected output; you will then get better answers on how it can be done in PHP and/or Java.
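A runnable Java version of the two replacements could look like the sketch below (class and method names are hypothetical). One thing worth checking in the original pattern: inside `[ '-_()]`, the `'-_` part is a range from `'` (0x27) to `_` (0x5F), so it also removes digits, uppercase letters and most ASCII punctuation. If only those literal characters were meant (an assumption about the intent), the hyphen should go last, as in `stripLiteralSet`.

```java
class PhpToJava {
    // preg_replace('{(.)\1+}', '$1', $str): collapse each run of a repeated character.
    static String collapseRuns(String str) {
        return str.replaceAll("(.)\\1+", "$1");
    }

    // Literal translation of the second class. Note that '-_ is a RANGE
    // from ' to _, covering digits, uppercase letters and most punctuation.
    static String stripAsWritten(String str) {
        return str.replaceAll("[ '-_\\(\\)]", "");
    }

    // Hypothetical fix: space, quote, underscore, parentheses and a literal hyphen (placed last).
    static String stripLiteralSet(String str) {
        return str.replaceAll("[ '_()-]", "");
    }

    public static void main(String[] args) {
        System.out.println(collapseRuns("Mississippi"));        // Misisipi
        System.out.println(stripAsWritten("Hello (World)_1"));  // elloorld
        System.out.println(stripLiteralSet("Hello (World)_1")); // HelloWorld1
    }
}
```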
`\pL` is a Unicode property shortcut. It can also be written as `\p{L}` or `\p{Letter}`. It matches any kind of letter from any language.
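In Java, both `\pL` and `\p{L}` are accepted; since support for the long `\p{Letter}` spelling varies between flavors, the sketch below sticks to the first two. The `letterRuns` helper is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LetterRuns {
    // Collects runs of letters from any script; letterClass is "\\pL" or "\\p{L}".
    static List<String> letterRuns(String s, String letterClass) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(letterClass + "+").matcher(s);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // German and Chinese letters are matched; punctuation, spaces and digits are not.
        String text = "Gr\u00FC\u00DFe, \u4E16\u754C! 42";
        System.out.println(letterRuns(text, "\\pL").size());   // 2
        System.out.println(letterRuns(text, "\\p{L}").size()); // 2, same runs
    }
}
```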
`[^][]` is a character class that means all characters except `[` and `]`.
You can avoid escaping the `[` and `]` special characters since it is not ambiguous for PCRE, the regex engine used in the `preg_` functions.
Since `[^]` is incorrect in PCRE, the only way for the regex to parse is that the `]` is inside the character class, which will be closed later. The same goes for the `[` that follows: it cannot reopen a character class inside a character class (except a POSIX character class such as `[:alnum:]`). Then the last `]` is clear: it is the end of the character class. However, a `[` outside a character class must be escaped, since it is parsed as the beginning of a character class.
In the same way, you can write `[]]` or `[[]` or `[^[]` without escaping the `[` or `]` in the character class.
Note: since PHP 7.3, you can use the inline `xx` modifier, which allows blank characters to be ignored even inside character classes. This way you can write these classes in a less ambiguous way, like this:
(?xx) [^ ][ ] [ ] ] [ [ ] [^ [ ]
You can use this syntax with several regex flavours: PCRE (PHP, R), Perl, Python, .NET, Go, awk, Tcl (if you delimit your pattern with curly brackets, thanks Donal Fellows), ...
But not with: Ruby, JavaScript (except for IE < 9), Java, ...
As m.buettner noted, `[^]]` is not ambiguous because `]` is the first character. `[^a]]`, however, is seen as "any character that is not `a`" followed by a literal `]`. To exclude both `a` and `]`, you must write `[^a\]]` or `[^]a]`.
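In Java, an unescaped `[` inside a class starts a nested class (Java's union syntax), so the sketch below escapes both brackets, which is the safe choice in any flavor. It also shows a swapped-in way to handle nested brackets without recursion: repeatedly delete the innermost `[...]` groups until the string stops changing. Class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InnermostBrackets {
    // [^\[\]] = any character except [ and ]; both brackets escaped for Java.
    private static final Pattern INNERMOST = Pattern.compile("\\[[^\\[\\]]*\\]");

    // Bracket groups that contain no further brackets.
    static List<String> innermost(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = INNERMOST.matcher(s);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    // Removes bracket groups from the inside out until nothing changes:
    // a loop-based stand-in for a recursive pattern, which Java does not support.
    static String stripNested(String s) {
        String previous;
        do {
            previous = s;
            s = INNERMOST.matcher(s).replaceAll("");
        } while (!s.equals(previous));
        return s;
    }

    public static void main(String[] args) {
        System.out.println(innermost("a[b[c]d][e]")); // [[c], [e]]
        System.out.println(stripNested("a[b[c]d]e")); // ae
    }
}
```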
In the particular case of JavaScript, the specification allows `[]` as a regex token that never matches (in other words, `[]` will always fail) and `[^]` as a regex that matches any character. Then `[^]]` is seen as any character followed by a `]`. The actual implementation varies, but modern browsers generally stick to the definition in the specification.
Pattern details:
In your pattern example, you don't need to escape the last `]`.
But you can do the same with this slightly optimized pattern, which is also more useful because it can be reused as a subpattern (with the `(?-1)`):
(\[(?:[^][]+|(?-1))*+])
or better:
(\[[^][]*(?:(?-1)[^][]*)*+])
which avoids the cost of an alternation.