
How can I make a RegEx in PHP that only accepts 3-9 letters (uppercase) and 5-50 numbers?

I'm not that good at regular expressions. But this one doesn't work:

/[A-Z]{3,9}[0-9]{5,50}/

For instance, it matches ABC12345 but not A12345BC

Any ideas?

 Answers

4

This is a classic "password validation"-type problem. For this, the "rough recipe" is to check each condition with a lookahead, then we match everything.

^(?=(?:[^A-Z]*[A-Z]){3,9}[^A-Z]*$)(?=(?:[^0-9]*[0-9]){5,50}[^0-9]*$)[A-Z0-9]*$

I'll explain this one below, but here's a variation that I'll leave for you to figure out.

^(?=(?:[^A-Z]*[A-Z]){3,9}[0-9]*$)(?=(?:[^0-9]*[0-9]){5,50}[A-Z]*$).*$

Let's look at the first regex piece by piece.

  1. We anchor the regex between the head of string ^ and end of string $ assertions, ensuring that the match (if any) is the whole string.
  2. We have two lookaheads: one for the capital letters, one for the digits.
  3. After the lookaheads, [A-Z0-9]* matches the whole string (if it consists only of uppercase ASCII letters and digits). (Thanks to @TimPietzcker for pointing out that I was asleep at the wheel for starting out with a dot-star there.)

How do the lookaheads work?

The first lookahead, (?=(?:[^A-Z]*[A-Z]){3,9}[^A-Z]*$), asserts that at the current position, i.e. the beginning of the string, we are able to match "any number of characters that are not capital letters, followed by a single capital letter", 3 to 9 times. This ensures we have enough capital letters. Note that the {3,9} is greedy, so we will match as many capital letters as possible. But we don't want to allow more capital letters than the maximum, so after the group quantified by {3,9}, the lookahead checks that we can match zero or more characters that are not capital letters, all the way to the end of the string, marked by the anchor $. If a tenth capital letter were present, that final check would fail.

The second lookahead works in similar fashion.
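As a quick check of the first regex, here is a minimal PHP sketch. The function name isValidCode is made up for illustration; only the pattern comes from this answer.

```php
<?php
// Hypothetical helper: wraps the first regex from this answer in preg_match().
function isValidCode(string $s): bool
{
    $pattern = '/^(?=(?:[^A-Z]*[A-Z]){3,9}[^A-Z]*$)'   // 3 to 9 uppercase letters in total
             . '(?=(?:[^0-9]*[0-9]){5,50}[^0-9]*$)'    // 5 to 50 digits in total
             . '[A-Z0-9]*$/';                          // nothing but uppercase letters and digits
    return preg_match($pattern, $s) === 1;
}

var_dump(isValidCode('ABC12345'));   // bool(true)
var_dump(isValidCode('A12345BC'));   // bool(true): letters and digits may be interleaved
var_dump(isValidCode('AB12345'));    // bool(false): only 2 letters
var_dump(isValidCode('ABC1234'));    // bool(false): only 4 digits
var_dump(isValidCode('ABCd12345'));  // bool(false): lowercase letter is not allowed
```

One thing to keep in mind: in PHP, $ also matches just before a trailing newline, so "ABC12345\n" would pass. If that matters, adding the D modifier after the closing delimiter makes $ match only at the very end of the string.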

For a more in-depth explanation of this technique, you may want to peruse the password validation section of this page about regex lookarounds.

In case you are interested, here is a token-by-token explanation of the technique.

^                      the beginning of the string
(?=                    look ahead to see if there is:
 (?:                   group, but do not capture (between 3 and 9 times)
  [^A-Z]*              any character except: 'A' to 'Z' (0 or more times)
   [A-Z]               any character of: 'A' to 'Z'
 ){3,9}                end of grouping
  [^A-Z]*              any character except: 'A' to 'Z' (0 or more times)
$                      before an optional \n, and the end of the string
)                      end of look-ahead
(?=                    look ahead to see if there is:
 (?:                   group, but do not capture (between 5 and 50 times)
  [^0-9]*              any character except: '0' to '9' (0 or more times)
   [0-9]               any character of: '0' to '9'
 ){5,50}               end of grouping
  [^0-9]*              any character except: '0' to '9' (0 or more times)
$                      before an optional \n, and the end of the string
)                      end of look-ahead
[A-Z0-9]*              any character of: 'A' to 'Z', '0' to '9' (0 or more times)
$                      before an optional \n, and the end of the string
Monday, September 5, 2022
 
1

Such a code consists of either

  • an arbitrary count (minimum 1) of letters followed by one number and an arbitrary count (minimum 0) of letters and/or numbers
  • or an arbitrary count (minimum 1) of numbers followed by one letter and an arbitrary count (minimum 0) of letters and/or numbers

written as a capture group:

((?:[a-zA-Z]+[0-9]|[0-9]+[a-zA-Z])[a-zA-Z0-9]*)
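A minimal PHP sketch of how this group could be used for validation. The function name hasLetterAndDigit and the surrounding ^...$ anchors are my own assumptions, added so that the whole string has to match; they are not part of the pattern above.

```php
<?php
// Hypothetical helper: the capture group from this answer, anchored at both ends
// (the anchors are an assumption, added to validate the entire string).
function hasLetterAndDigit(string $s): bool
{
    $pattern = '/^((?:[a-zA-Z]+[0-9]|[0-9]+[a-zA-Z])[a-zA-Z0-9]*)$/';
    return preg_match($pattern, $s) === 1;
}

var_dump(hasLetterAndDigit('abc1'));  // bool(true): letters, then a digit
var_dump(hasLetterAndDigit('1a'));    // bool(true): a digit, then a letter
var_dump(hasLetterAndDigit('a1b2'));  // bool(true)
var_dump(hasLetterAndDigit('abc'));   // bool(false): no digit
var_dump(hasLetterAndDigit('123'));   // bool(false): no letter
var_dump(hasLetterAndDigit('a-1'));   // bool(false): '-' is not alphanumeric
```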
Saturday, November 26, 2022
 
bzezzz
 
5

The Arabic regex is:

[\u0600-\u06FF]

Actually, ?-? is a subset of this Arabic range, so I think you can remove them from the pattern.

So, in JS it will be

/^[a-z0-9+,()/'\s\u0600-\u06FF-]+$/i


Tuesday, October 11, 2022
3

You can use Jonny 5's method, which consists of listing all the characters you need in a character class. You can also use the predefined class \p{Latin}, which contains all Latin letters (including accented ones):

$content = preg_replace('~[^\p{Latin}0-9]+~u', '', $string);

If you want all letters or digits "of the world":

$content = preg_replace('~\P{Xan}+~u', '', $string);
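To make the difference between the two patterns concrete, here is a small PHP sketch. The sample strings and variable names are invented for illustration, and the non-ASCII characters are written as \u{...} escapes (PHP 7+) so the result does not depend on the file encoding.

```php
<?php
// Sample input "Café déjà-vu 2022!" (made up for this example).
$string = "Caf\u{00E9} d\u{00E9}j\u{00E0}-vu 2022!";

// Keep only Latin-script letters and ASCII digits; everything else is removed.
$latinOnly = preg_replace('~[^\p{Latin}0-9]+~u', '', $string);
var_dump($latinOnly === "Caf\u{00E9}d\u{00E9}j\u{00E0}vu2022");  // bool(true)

// Sample input with two CJK ideographs followed by " 42!".
$mixed = "\u{4E16}\u{754C} 42!";

// \P{Xan} matches anything that is not a Unicode letter or number,
// so letters and digits of any script are kept.
$alnumOnly = preg_replace('~\P{Xan}+~u', '', $mixed);
var_dump($alnumOnly === "\u{4E16}\u{754C}42");                   // bool(true)

// The Latin-only pattern, by contrast, drops the ideographs as well.
var_dump(preg_replace('~[^\p{Latin}0-9]+~u', '', $mixed));       // string(2) "42"
```

The u modifier matters here: without it the subject is treated as bytes rather than UTF-8 characters, and multibyte letters can be mangled.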
Wednesday, August 31, 2022
 
3

For this PHP regex:

$str = preg_replace ( '{(.)\1+}', '$1', $str );
$str = preg_replace ( '{[ \'-_\(\)]}', '', $str );

In Java:

str = str.replaceAll("(.)\\1+", "$1");
str = str.replaceAll("[ '-_\\(\\)]", "");

I suggest you provide your input and expected output; then you will get better answers on how it can be done in PHP and/or Java.

Sunday, October 9, 2022
 
haodong
 