
I found it in the following regex:

\[(?:[^][]|(?R))*\]

It matches square brackets (with their content) together with nested square brackets.

 Answers

5

[^][] is a character class that means all characters except [ and ].

You do not need to escape the special characters [ and ] here, since the class is not ambiguous for PCRE, the regex engine used by the preg_ functions.

Since [^] is incorrect in PCRE, the only way for the regex to parse is for the ] to be inside the character class, which is closed later. The same goes for the [ that follows: it cannot reopen a character class (except a POSIX character class such as [:alnum:]) inside a character class. The last ] is then unambiguous; it is the end of the character class. However, a [ outside a character class must be escaped, since it would otherwise be parsed as the beginning of a character class.

In the same way, you can write []] or [[] or [^[] without escaping the [ or ] in the character class.
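The rule is easy to check outside PHP as well. Here is a minimal sketch using Python's re module, which treats a ] placed first in a class the same way; the sample string and the variable names are made up for illustration.

```python
import re

# "]" placed first in a class (right after "[" or "[^") is a literal,
# and the "[" inside the class is taken literally too, so nothing is escaped.
not_bracket = re.compile(r"[^][]")   # any character except "[" and "]"
close_bracket = re.compile(r"[]]")   # only "]"

sample = "a[b]c"
non_brackets = not_bracket.findall(sample)      # ['a', 'b', 'c']
close_brackets = close_bracket.findall(sample)  # [']']
```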

Note: since PHP 7.3, you can use the inline xx modifier, which allows blank characters to be ignored even inside character classes. This way you can write these classes in a less ambiguous way, like this: (?xx) [^ ][ ] [ ] ] [ [ ] [^ [ ].

You can use this syntax with several regex flavours: PCRE (PHP, R), Perl, Python, Java, .NET, Go, awk, Tcl (if you delimit your pattern with curly brackets, thanks Donal Fellows), ...

But not with: Ruby, JavaScript (except for IE < 9), ...

As m.buettner noted, [^]] is not ambiguous because ] is the first character, whereas [^a]] is read as any character that is not an a, followed by a ]. To exclude both a and ], you must write [^a\]] or [^]a].

In the particular case of JavaScript, the specification allows [] as a regex token that never matches (in other words, [] always fails) and [^] as a regex that matches any character. [^]] is then read as any character followed by a ]. Actual implementations vary, but modern browsers generally stick to the definition in the specification.

Pattern details:

\[          # literal [
(?:         # open a non capturing group
    [^][]   # a character that is not a ] or a [
  |         # OR
    (?R)    # the whole pattern (here is the recursion)
)*          # repeat zero or more times
\]          # a literal ]

In your pattern example, you don't need to escape the last ]

But you can do the same with this slightly optimized pattern, which is also more useful because it is reusable as a subpattern (with (?-1)): (\[(?:[^][]+|(?-1))*+])

(                     # open the capturing group
    \[                # a literal [
        (?:           # open a non-capturing group
            [^][]+    # all characters but ] or [ one or more times
          |           # OR
            (?-1)     # the last opened capturing group (recursion)
                      # (the capture group where you are)
        )*+           # repeat the group zero or more times (possessive)
    ]                 # literal ] (no need to escape)
)                     # close the capturing group

or better: (\[[^][]*(?:(?-1)[^][]*)*+]), which avoids the cost of an alternation.
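To see what the recursion actually does, it can help to spell it out by hand. Python's built-in re module has no (?R), so the sketch below swaps the regex for a small recursive function that mirrors the two branches of the pattern. The function names match_brackets and find_bracket_groups are made up for this illustration, and very deep nesting would hit Python's recursion limit.

```python
def match_brackets(text, pos):
    """Return the end index of a balanced [...] starting at pos, or None."""
    if pos >= len(text) or text[pos] != "[":
        return None                      # the leading literal [ is required
    i = pos + 1
    while i < len(text):
        ch = text[i]
        if ch == "]":
            return i + 1                 # the closing literal ]
        if ch == "[":
            end = match_brackets(text, i)  # the (?R) branch
            if end is None:
                return None
            i = end
        else:
            i += 1                       # the [^][] branch
    return None                          # ran out of input before the ]


def find_bracket_groups(text):
    """Collect non-overlapping balanced groups, scanning left to right."""
    groups = []
    i = 0
    while i < len(text):
        end = match_brackets(text, i)
        if end is None:
            i += 1                       # no match here, try the next position
        else:
            groups.append(text[i:end])
            i = end
    return groups
```

For example, find_bracket_groups("a[b[c]d]e[f]") gives ['[b[c]d]', '[f]'], and on the unbalanced input "[a[b]" only the inner '[b]' is found, because the outer [ is never closed.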

Friday, October 7, 2022
5

The Arabic regex is:

[\u0600-\u06FF]

Actually, ?-? is a subset of this Arabic range, so I think you can remove them from the pattern.

So, in JS it will be

/^[a-z0-9+,()\/'\s\u0600-\u06FF-]+$/i
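For comparison, here is roughly how the same check could look in Python. This is only a sketch: the helper name is_allowed is hypothetical, and it assumes the same set of allowed characters as the JS pattern above (Latin letters, digits, a few punctuation marks, whitespace and the Arabic block).

```python
import re

# Same character class as the JS pattern; re.IGNORECASE plays the role of /i.
# The "/" needs no escaping because Python patterns have no delimiters.
allowed = re.compile(r"^[a-z0-9+,()/'\s\u0600-\u06FF-]+$", re.IGNORECASE)


def is_allowed(text):
    """True when every character of text belongs to the allowed set."""
    return allowed.match(text) is not None
```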

See regex demo

Tuesday, October 11, 2022
5

The Regular Expressions FAQ

See also a lot of general hints and useful links at the regex tag details page.


Online tutorials

  • RegexOne
  • Regular Expressions Info

Quantifiers

  • Zero-or-more: *:greedy, *?:reluctant, *+:possessive
  • One-or-more: +:greedy, +?:reluctant, ++:possessive
  • ?:optional (zero-or-one)
  • Min/max ranges (all inclusive): {n,m}:between n & m, {n,}:n-or-more, {n}:exactly n
  • Differences between greedy, reluctant (a.k.a. "lazy", "ungreedy") and possessive quantifier:
    • Greedy vs. Reluctant vs. Possessive Quantifiers
    • In-depth discussion on the differences between greedy versus non-greedy
    • What's the difference between {n} and {n}?
    • Can someone explain Possessive Quantifiers to me? php, perl, java, ruby
    • Emulating possessive quantifiers .net
    • Non- references: From Oracle, regular-expressions.info
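As a small illustration of greedy versus reluctant matching, here is a sketch in Python with a made-up input string. Possessive quantifiers are left out because Python's re only accepts them from version 3.11 on.

```python
import re

text = "<a><b>"

# Greedy: .+ grabs as much as possible, then backs off to the last ">".
greedy = re.search(r"<.+>", text).group()      # '<a><b>'

# Reluctant: .+? grabs as little as possible, stopping at the first ">".
reluctant = re.search(r"<.+?>", text).group()  # '<a>'
```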

Character Classes

  • What is the difference between square brackets and parentheses?
  • [...]: any one character, [^...]: negated/any character but
  • [^] matches any one character including newlines javascript
  • [\w-[\d]] / [a-z-[qz]]: set subtraction .net, xml-schema, xpath, JGSoft
  • [\w&&[^\d]]: set intersection java, ruby 1.9+
  • [[:alpha:]]:POSIX character classes
  • Why do [^\D2], [^[^0-9]2], [^2[^0-9]] get different results in Java? java
  • Shorthand:
    • Digit: \d:digit, \D:non-digit
    • Word character (Letter, digit, underscore): \w:word character, \W:non-word character
    • Whitespace: \s:whitespace, \S:non-whitespace
  • Unicode categories (\p{L}, \P{L}, etc.)

Escape Sequences

  • Horizontal whitespace: \h:space-or-tab, \t:tab
  • Newlines:
    • \r, \n:carriage return and line feed
    • \R:generic newline php java-8
  • Negated whitespace sequences: \H:Non horizontal whitespace character, \V:Non vertical whitespace character, \N:Non line feed character pcre php5 java-8
  • Other: \v:vertical tab, \e:the escape character

Anchors

  • ^:start of line/input, \b:word boundary, and \B:non-word boundary, $:end of line/input
  • \A:start of input, \Z:end of input php, perl, ruby
  • \z:the very end of input (\Z in Python) .net, php, pcre, java, ruby, icu, swift, objective-c
  • \G:start of match php, perl, ruby

(Also see "Flavor-Specific Information > Java > The functions in Matcher")
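The difference between line anchors and input anchors is easiest to see on a two-line string. A minimal Python sketch with a made-up sample text (in Python, \Z is the very end of input):

```python
import re

text = "one two\nthree four"

# With MULTILINE, ^ and $ work per line; \A and \Z still refer to the whole input.
line_starts = re.findall(r"^\w+", text, re.MULTILINE)   # ['one', 'three']
input_start = re.findall(r"\A\w+", text, re.MULTILINE)  # ['one']
line_ends = re.findall(r"\w+$", text, re.MULTILINE)     # ['two', 'four']
input_end = re.findall(r"\w+\Z", text, re.MULTILINE)    # ['four']

# \b marks the edge between a word character and a non-word character.
t_words = re.findall(r"\bt\w*\b", text)                 # ['two', 'three']
```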

Groups

  • (...):capture group, (?:):non-capture group
    • Why is my repeating capturing group only capturing the last match?
  • \1:backreference and capture-group reference, $1:capture group reference
    • What's the meaning of a number after a backslash in a regular expression?
    • \g<1>123:How to follow a numbered capture group, such as \1, with a number?: python
  • What does a subpattern (?i:regex) mean?
  • What does the 'P' in (?P<group_name>regexp) mean?
  • (?>):atomic group or independent group, (?|):branch reset
    • Equivalent of branch reset in .NET/C# .net
  • Named capture groups:
    • General named capturing group reference at regular-expressions.info
    • java: (?<groupname>regex): Overview and naming rules (Non- links)
    • Other languages: (?P<groupname>regex) python, (?<groupname>regex) .net, (?<groupname>regex) perl, (?P<groupname>regex) and (?<groupname>regex) php

Lookarounds

  • Lookaheads: (?=...):positive, (?!...):negative
  • Lookbehinds: (?<=...):positive, (?<!...):negative (not supported by javascript)
  • Lookbehind limits in:
    • Lookbehinds need to be constant-length php, perl, python, ruby
    • Lookarounds of limited length {0,n} java
    • Variable length lookbehinds are allowed .net
  • Lookbehind alternatives:
    • Using \K php, perl (Flavors that support \K)
    • Alternative regex module for Python python
      • The hacky way
      • JavaScript negative lookbehind equivalents External link
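Here is a minimal Python sketch of a lookbehind, a negative lookahead, and the constant-length restriction on lookbehinds mentioned above. The sample text is made up, and the last part only records whether Python's built-in re rejects the variable-length lookbehind.

```python
import re

text = "price: $15, discount: 20%, items: 3"

# Positive lookbehind: digits that come right after a "$" (the "$" is not consumed).
after_dollar = re.findall(r"(?<=\$)\d+", text)   # ['15']

# Negative lookahead: whole numbers that are not followed by "%".
not_percent = re.findall(r"\b\d+\b(?!%)", text)  # ['15', '3']

# Python's built-in re insists on fixed-width lookbehinds.
try:
    re.compile(r"(?<=a+)b")
    variable_lookbehind_ok = True
except re.error:
    variable_lookbehind_ok = False
```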

Modifiers

flag  modifier             flavors
a     ASCII                python
c     current position     perl
e     expression           php perl
g     global               most
i     case-insensitive     most
m     multiline            php perl python javascript .net java
m     (non)multiline       ruby
o     once                 perl ruby
S     study                php
s     single line          unsupported: javascript (workaround) | ruby
U     ungreedy             php r
u     unicode              most
x     whitespace-extended  most
y     sticky               javascript
  • How to convert preg_replace e to preg_replace_callback?
  • What are inline modifiers?
  • What is '?-mix' in a Ruby Regular Expression
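A few of these modifiers in action, as a Python sketch (Python spells them as re.S, re.VERBOSE and so on, or inline as (?i)). The inputs are invented for the example.

```python
import re

# s (single line / DOTALL): lets "." match a newline as well.
dot_default = re.search(r"a.b", "a\nb")            # None
dot_single_line = re.search(r"a.b", "a\nb", re.S)  # matches

# i (case-insensitive), written as an inline modifier.
case_insensitive = re.search(r"(?i)hello", "HeLLo")  # matches

# x (whitespace-extended): blanks and # comments in the pattern are ignored.
year_month = re.compile(r"""
    \d{4}   # year
    -       # separator
    \d{2}   # month
""", re.VERBOSE)
verbose_match = year_month.fullmatch("2022-10")      # matches
```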

Other:

  • |:alternation (OR) operator, .:any character, [.]:literal dot character
  • What special characters must be escaped?
  • Control verbs (php and perl): (*PRUNE), (*SKIP), (*FAIL) and (*F)
    • php only: (*BSR_ANYCRLF)
  • Recursion (php and perl): (?R), (?0) and (?1), (?-1), (?&groupname)

Common Tasks

  • Get a string between two curly braces: {...}
  • Match (or replace) a pattern except in situations s1, s2, s3...
  • How do I find all YouTube video ids in a string using a regex?
  • Validation:
    • Internet: email addresses, URLs (host/port: regex and non-regex alternatives), passwords
    • Numeric: a number, min-max ranges (such as 1-31), phone numbers, date
    • Parsing HTML with regex: See "General Information > When not to use Regex"

Advanced Regex-Fu

  • Strings and numbers:
    • Regular expression to match a line that doesn't contain a word
    • How does this PCRE pattern detect palindromes?
    • Match strings whose length is a fourth power
    • How does this regex find triangular numbers?
    • How to determine if a number is a prime with regex?
    • How to match the middle character in a string with regex?
  • Other:
    • How can we match a^n b^n?
    • Match nested brackets
      • Using a recursive pattern php, perl
      • Using balancing groups .net
    • “Vertical” regex matching in an ASCII “image”
    • List of highly up-voted regex questions on Code Golf
    • How to make two quantifiers repeat the same number of times?
    • An impossible-to-match regular expression: (?!a)a
    • Match/delete/replace this except in contexts A, B and C
    • Match nested brackets with regex without using recursion or balancing groups?

Flavor-Specific Information

(Except for those marked with *, this section contains non- links.)

  • Java
    • Official documentation: Pattern Javadoc ?, Oracle's regular expressions tutorial ?
    • The differences between functions in java.util.regex.Matcher:
      • matches(): The match must be anchored to both input-start and -end
      • find(): A match may be anywhere in the input string (substrings)
      • lookingAt(): The match must be anchored to input-start only
      • (For anchors in general, see the section "Anchors")
    • The only java.lang.String functions that accept regular expressions: matches(s), replaceAll(s,s), replaceFirst(s,s), split(s), split(s,i)
    • *An (opinionated and) detailed discussion of the disadvantages of and missing features in java.util.regex
  • .NET
    • How to read a .NET regex with look-ahead, look-behind, capturing groups and back-references mixed together?
  • Official documentation:
    • Boost regex engine: General syntax, Perl syntax (used by TextPad, Sublime Text, UltraEdit, ...???)
    • JavaScript 1.5 general info and RegExp object
    • .NET MySQL Oracle Perl5 version 18.2
    • PHP: pattern syntax, preg_match
    • Python: Regular expression operations, search vs match, how-to
    • Rust: crate regex, struct regex::Regex
    • Splunk: regex terminology and syntax and regex command
    • Tcl: regex syntax, manpage, regexp command
    • Visual Studio Find and Replace
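The three anchoring behaviours listed for java.util.regex.Matcher have rough counterparts in Python's re, which is also what the "search vs match" link is about: fullmatch() behaves like matches(), match() like lookingAt(), and search() like find(). A small sketch with made-up inputs:

```python
import re

digits = re.compile(r"\d+")

# fullmatch(): the whole input must match (compare Java's matches()).
full_miss = digits.fullmatch("abc123")  # None
full_hit = digits.fullmatch("123")      # matches '123'

# match(): anchored at the start only (compare Java's lookingAt()).
prefix_hit = digits.match("123abc")     # matches '123'
prefix_miss = digits.match("abc123")    # None

# search(): a match anywhere in the input (compare Java's find()).
anywhere = digits.search("abc123")      # matches '123'
```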

General information

(Links marked with * are non- links.)

  • Other general documentation resources: Learning Regular Expressions, *Regular-expressions.info, *Wikipedia entry, *RexEgg, Open-Directory Project
  • DFA versus NFA
  • Generating Strings matching regex
  • Books: Jeffrey Friedl's Mastering Regular Expressions
  • When to not use regular expressions:
    • Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. (blog post written by 's founder)*
    • Do not use regex to parse HTML:
      • Don't. Please, just don't
      • Well, maybe...if you're really determined (other answers in this question are also good)

Examples of regex that can cause regex engine to fail

  • Why does this regular expression kill the Java regex engine?

Tools: Testers and Explainers

(This section contains non- links.)

  • Online (* includes replacement tester, + includes split tester):

    • Debuggex (Also has a repository of useful regexes) javascript, python, pcre
    • *Regular Expressions 101 php, pcre, python, javascript
    • Regex Pal, regular-expressions.info javascript
    • Rubular ruby RegExr Regex Hero dotnet
    • *+ regexstorm.net .net
    • *RegexPlanet: Java java, Go go, Haskell haskell, JavaScript javascript, .NET dotnet, Perl perl php PCRE php, Python python, Ruby ruby, XRegExp xregexp
    • freeformatter.com xregexp
    • *+regex.larsolavtorvik.com php PCRE and POSIX, javascript
    • Refiddle javascript ruby .net
  • Offline:

    • Microsoft Windows: RegexBuddy (analysis), RegexMagic (creation), Expresso (analysis, creation, free)
Saturday, November 19, 2022
 
jameo
 
3

For this PHP regex:

$str = preg_replace ( '{(.)\1+}', '$1', $str );
$str = preg_replace ( '{[ \'-_\(\)]}', '', $str );

In Java:

str = str.replaceAll("(.)\\1+", "$1");
str = str.replaceAll("[ '-_\\(\\)]", "");

I suggest you provide your input and expected output; then you will get better answers on how it can be done in PHP and/or Java.
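For reference, here is how the same two replacements might look in Python. This is only a sketch with hypothetical function names. One thing worth noticing: inside the class, '-_ is a range running from ' (0x27) to _ (0x5F), so it also removes digits, uppercase letters and other punctuation. The answer does not say whether that was intended, so a variant with an escaped hyphen is shown as well.

```python
import re


def squeeze_repeats(text):
    # (.)\1+ matches a character followed by one or more copies of itself.
    return re.sub(r"(.)\1+", r"\1", text)


def strip_range(text):
    # Same class as in the answer: '-_ is a range from ' (0x27) to _ (0x5F).
    return re.sub(r"[ '-_()]", "", text)


def strip_listed(text):
    # Escaped hyphen: removes only space, ', -, _, ( and ).
    return re.sub(r"[ '\-_()]", "", text)
```

For example, strip_range("aB1-c") returns "ac", while strip_listed("aB1-c") returns "aB1c".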

Sunday, October 9, 2022
 
haodong
 
4

\pL is a Unicode property shortcut. It can also be written as \p{L} or \p{Letter}. It matches any kind of letter from any language.

Saturday, December 17, 2022