Note: This is a question about possibilities of modern regex flavors. It's not about the best way to solve this using other methods. It's inspired by an earlier question, but that one is not restricted to regex.
The Problem
In an ASCII "image"/art/map/string like:
....X.......
..X..X...X....
X.X...X..X.....
X....XXXXXX.....
X..XXX...........
.....X..........
..............X
..X...........X....
..X...........X....X...
....X.....
I'd like to find a simple vertical line formation of three X
s:
X
X
X
The number of lines is variable in the image, and the width of each line is variable too.
The Question(s)
With regex (PCRE/PHP, Perl, .NET or similar) is it possible to:
- Determine if such a formation exists
- Count the number of such formations/match the starting point of them all (4 in the above example)
Answer to question 1
To answer the first question one could use:
Online demo
This expression works in Perl, PCRE, Java and should work in .NET.
The expression uses lookaheads with self referencing capturing groups to add a character for every repetition of the lookahead (this is used to "count").
1?+
means if1
matches (or is defined) consume it, and don't give it back (don't backtrack). In this case it's equivalent to(?(1) 1 )
. Which means match1
if1
is defined.polygenelubricants explains this kinds of lookaheads with backreferences very nicely in his answer for How can we match a^n b^n with Java regex?. (He has also written about other impressive tricks for Java regex involving backreferences and lookarounds.)
Answer to question 2
Plain matching
When just using matching and requiring the answer (count) in the number of matches, then the question 2 answer would be:
It can not be directly solved in regex flavors that have a limited lookbehind. While other flavors like Java and .NET could (as for example in m.buettner's .NET solution).
Thus plain regex matches in Perl and PCRE (PHP, etc) cannot directly answer this question in this case.
(Semi?)proof
Assume that no variable length lookbehinds are available.
You have to in some way count the number of characters on a line before an
X
.Only way to do that is to match them, and since no variable length lookbehinds are available you have to start the match (at least) at the beginning of the line.
If you start the match at the beginning of a line you can only get at most one match per line.
Since there can be multiple occurrences per line, this would not count them all and would not give a correct answer.
Length/indirect solution
On the other hand if we accept the answer as the length of a match or substitution result, then the 2nd question can be answered in PCRE and Perl (and other flavors).
This solution is based on/inspired by m.buettner's nice "partial PCRE solution".
One could simply replace all matches of the following expression with
$3
, getting the answer to question two (the number of patterns of interests) as the length of the resulting string.Which in Perl could be written as:
Online demo
This expression is similar to the solution to question 1 above, with some modifications to include
X
in the characters matched in the first lookahead, wrapped with a quantifier and counting number of matches of the quantifier.Except for direct matches this is as close as it gets (extra code wise besides regex), and could be an acceptable answer to question 2.
Test cases
Some test cases and results for the above solution. Result showing the numerical answer (length of the resulting string) and in parenthesis the resulting string after the substitution(s).