
I am working with the Amazon Mechanical Turk API and it will only allow me to use regular expressions to filter a field of data.

I would like to input an integer range to a function, such as 256-311 or 45-1233, and return a regex that would match only that range.

A regex matching 256-321 would be:

\b((25[6-9])|(2[6-9][0-9])|(3[0-1][0-9])|(32[0-1]))\b
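One quick way to sanity-check a hand-written range regex like the one above is brute force: every integer in a test window should match if and only if it lies inside the range. This is a minimal PHP sketch; the helper name checkRangeRegex and the 0-1000 window are assumptions made for illustration only.

```php
<?php
// Brute-force check of a range regex: each integer in the (arbitrary) window
// 0..1000 must match exactly when it lies inside [$from, $to].
function checkRangeRegex(string $regex, int $from, int $to): bool {
    for ($n = 0; $n <= 1000; $n++) {
        $matches = preg_match($regex, (string)$n) === 1;
        $inRange = $n >= $from && $n <= $to;
        if ($matches !== $inRange) {
            echo "Mismatch at $n\n";
            return false;
        }
    }
    return true;
}

// The regex from the question; in a single-quoted PHP string \b stays a word boundary.
var_dump(checkRangeRegex('/\b((25[6-9])|(2[6-9][0-9])|(3[0-1][0-9])|(32[0-1]))\b/', 256, 321));
```

Because the subjects here are bare numbers, \b behaves like ^...$; on longer strings \b would also allow matches such as the "256" inside "x256".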

Writing that by hand is fairly easy, but I am having trouble writing the loop that generates such a regex.
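One common way to structure that loop (an assumption, not the only approach) is to first split the range wherever the number of digits changes, so that every piece has a fixed width and can be handled digit by digit. A minimal PHP sketch of just that first step, using a hypothetical helper named splitByDigitCount:

```php
<?php
// Split [$from, $to] into sub-ranges whose endpoints have the same number of
// digits, e.g. 5-1234 -> 5-9, 10-99, 100-999, 1000-1234.
// Assumes 0 <= $from <= $to.
function splitByDigitCount(int $from, int $to): array {
    $pieces = [];
    $start = $from;
    while ($start <= $to) {
        // Largest number with as many digits as $start: 9, 99, 999, ...
        $maxSameWidth = 10 ** strlen((string)$start) - 1;
        $end = min($maxSameWidth, $to);
        $pieces[] = [$start, $end];
        $start = $end + 1;
    }
    return $pieces;
}

foreach (splitByDigitCount(5, 1234) as [$a, $b]) {
    echo "$a-$b\n";   // 5-9, 10-99, 100-999, 1000-1234
}
```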

I am trying to build a function defined like this:

function getRangeRegex(int $fromInt, int $toInt): string
{
    $regexString = '';
    // ... build the alternation for $fromInt..$toInt here ...
    return $regexString;
}

I searched all over the web and was surprised that nobody seems to have solved this before. It is a harder problem than it looks.

Thanks for your time.

 Answers


Here's a quick hack:

<?php

function regex_range($from, $to) {

  if($from < 0 || $to < 0) {
    throw new Exception("Negative values not supported"); 
  }

  if($from > $to) {
    throw new Exception("Invalid range $from..$to, from > to"); 
  }

  $ranges = array($from);
  $increment = 1;
  $next = $from;
  $higher = true;

  while(true) {

    $next += $increment;

    if($next + $increment > $to) {
      if($next <= $to) {
        $ranges[] = $next;
      }
      $increment /= 10;
      $higher = false;
    }
    else if($next % ($increment*10) === 0) {
      $ranges[] = $next;
      $increment = $higher ? $increment*10 : $increment/10;
    }

    if(!$higher && $increment < 10) {
      break;
    }
  }

  $ranges[] = $to + 1;

  $regex = '/^(?:';

  for($i = 0; $i < sizeof($ranges) - 1; $i++) {
    $str_from = (string)($ranges[$i]);
    $str_to = (string)($ranges[$i + 1] - 1);

    for($j = 0; $j < strlen($str_from); $j++) {
      if($str_from[$j] == $str_to[$j]) {
        $regex .= $str_from[$j];
      }
      else {
        $regex .= "[" . $str_from[$j] . "-" . $str_to[$j] . "]";
      }
    }
    $regex .= "|";
  }

  return substr($regex, 0, strlen($regex)-1) . ')$/';
}

function test($from, $to) {
  try {
    printf("%-10s %s\n", $from . '-' . $to, regex_range($from, $to));
  } catch (Exception $e) {
    echo $e->getMessage() . "\n";
  }
}

test(2, 8);
test(5, 35);
test(5, 100);
test(12, 1234);
test(123, 123);
test(256, 321);
test(256, 257);
test(180, 195);
test(2,1);
test(-2,4);

?>

which produces:

2-8        /^(?:[2-7]|8)$/
5-35       /^(?:[5-9]|[1-2][0-9]|3[0-5])$/
5-100      /^(?:[5-9]|[1-9][0-9]|100)$/
12-1234    /^(?:1[2-9]|[2-9][0-9]|[1-9][0-9][0-9]|1[0-2][0-3][0-4])$/
123-123    /^(?:123)$/
256-321    /^(?:25[6-9]|2[6-9][0-9]|3[0-2][0-1])$/
256-257    /^(?:256|257)$/
180-195    /^(?:18[0-9]|19[0-5])$/
Invalid range 2..1, from > to
Negative values not supported

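Not properly tested, use at your own risk! In particular, the upper end of the range is not always split correctly: for 256-321 the last alternative 3[0-2][0-1] rejects values such as 309 and 319, and for 12-1234 the alternative 1[0-2][0-3][0-4] rejects values such as 1099.

And yes, the generated regex could be written more compactly in many cases, but I leave that as an exercise for the reader :)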
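A sketch of a different, recursive approach (not the loop used above): split the range where the digit count changes, then handle each fixed-width piece digit by digit, collapsing fully covered runs into [0-9]. The names rangeRegex and sameWidthParts are illustrative; like the answer above, it assumes non-negative integers and anchors the result with ^...$.

```php
<?php
// Patterns matching every number between $lo and $hi (equal-length digit strings, $lo <= $hi).
function sameWidthParts(string $lo, string $hi): array {
    if ($lo === $hi) {
        return [$lo];                          // a single literal number
    }
    $a = $lo[0];
    $b = $hi[0];
    $restLo = substr($lo, 1);
    $restHi = substr($hi, 1);
    $n = strlen($restLo);                      // digits after the first one
    if ($n === 0) {
        return ['[' . $a . '-' . $b . ']'];    // one digit, $a < $b
    }
    if ($a === $b) {                           // shared leading digit: recurse on the rest
        $out = [];
        foreach (sameWidthParts($restLo, $restHi) as $p) {
            $out[] = $a . $p;
        }
        return $out;
    }
    $zeros = str_repeat('0', $n);
    $nines = str_repeat('9', $n);
    $head = [];
    $tail = [];
    $midStart = (int)$a + 1;                   // leading digits whose whole block is covered
    $midEnd = (int)$b - 1;
    if ($restLo === $zeros) {
        $midStart = (int)$a;                   // lower bound starts a full block
    } else {
        foreach (sameWidthParts($restLo, $nines) as $p) {
            $head[] = $a . $p;                 // $a followed by restLo..99..9
        }
    }
    if ($restHi === $nines) {
        $midEnd = (int)$b;                     // upper bound ends a full block
    } else {
        foreach (sameWidthParts($zeros, $restHi) as $p) {
            $tail[] = $b . $p;                 // $b followed by 00..0..restHi
        }
    }
    if ($midStart <= $midEnd) {
        $first = $midStart === $midEnd ? (string)$midStart : '[' . $midStart . '-' . $midEnd . ']';
        $head[] = $first . str_repeat('[0-9]', $n);
    }
    return array_merge($head, $tail);
}

function rangeRegex(int $from, int $to): string {
    if ($from < 0 || $from > $to) {
        throw new InvalidArgumentException("Invalid range $from-$to");
    }
    $parts = [];
    $start = $from;
    while ($start <= $to) {
        // End this piece at 9, 99, 999, ... or at $to, whichever comes first.
        $end = min(10 ** strlen((string)$start) - 1, $to);
        foreach (sameWidthParts((string)$start, (string)$end) as $p) {
            $parts[] = $p;
        }
        $start = $end + 1;
    }
    return '/^(?:' . implode('|', $parts) . ')$/';
}

echo rangeRegex(256, 321), "\n";
```

For 256-321 this prints /^(?:25[6-9]|2[6-9][0-9]|3[0-1][0-9]|32[0-1])$/, the same alternation as the hand-written regex in the question; for 12-1234 the last piece becomes 1[0-1][0-9][0-9]|12[0-2][0-9]|123[0-4].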

Monday, September 19, 2022
 

In case you don't want a regex (this version uses strripos(), following xzyfer's comment):

<?php 

function stripTypeTitle($title) {
    // Position of the last occurrence of each keyword (case-insensitive), or false if absent
    $dvdpos = strripos($title, 'dvd');
    $bluraypos = strripos($title, 'bluray');
    // Cut the title at whichever keyword occurs last
    if ($dvdpos !== false && ($bluraypos === false || $dvdpos > $bluraypos)) {
        $title = substr($title, 0, $dvdpos);
    } elseif ($bluraypos !== false) {
        $title = substr($title, 0, $bluraypos);
    }
    return $title;
}

$title = "Avatar DVD 2009";
echo stripTypeTitle($title)."<br/>";
$title = "War of the Roses DVD 1989 Region 1 US import";
echo stripTypeTitle($title)."<br/>";
$title = "Wanted Bluray 2008 US Import";
echo stripTypeTitle($title)."<br/>";
$title = "This Bluray is Wanted DVD 2008 US Import";
echo stripTypeTitle($title)."<br/>";
$title = "This DVD is Wanted Bluray 2008 US Import";
echo stripTypeTitle($title)."<br/>";

?>

Prints:

Avatar
War of the Roses
Wanted
This Bluray is Wanted
This DVD is Wanted 
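For comparison, a minimal regex-based sketch (the name stripTypeTitleRegex is hypothetical) that should behave the same way on these inputs: the greedy (.*) makes the match backtrack to the last "dvd" or "bluray", and the /i flag makes it case-insensitive, mirroring strripos().

```php
<?php
// Keep everything before the LAST "dvd" or "bluray" (case-insensitive).
// Titles without either keyword do not match and are returned unchanged.
function stripTypeTitleRegex(string $title): string {
    return preg_replace('/^(.*)(?:dvd|bluray).*$/is', '$1', $title) ?? $title;
}

echo stripTypeTitleRegex("This Bluray is Wanted DVD 2008 US Import") . "<br/>";
```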
Saturday, November 26, 2022

If this is written in C, then you are pretty close. Compiling this code:

#include <stdio.h>
#include <stdlib.h>

int main() {
    int i;
    for (i = 0; i < 1000000; i++) {
        printf("%d\n", rand()%(35-18+1)+18);
    }
}

And running it in a pipeline produces this output:

chris@zack:~$ gcc -o test test.c
chris@zack:~$ ./test | sort | uniq -c
  55470 18
  55334 19
  55663 20
  55463 21
  55818 22
  55564 23
  55322 24
  55886 25
  55947 26
  55554 27
  55342 28
  55526 29
  55719 30
  55435 31
  55669 32
  55818 33
  55205 34
  55265 35

The key is that you forgot to add 1: the classic fencepost (off-by-one) error.

You can generalize this into a function:

int random_between(int min, int max) {
    return rand() % (max - min + 1) + min;
}
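If the same task comes up in PHP, the built-in random_int($min, $max) already includes both endpoints, so no + 1 is needed there. A small sketch that tallies draws for 18-35, in the spirit of the sort | uniq -c check above (the 100000 draw count is arbitrary):

```php
<?php
// random_int() is inclusive on both ends: random_int(18, 35) can return 18 and 35.
$counts = array_fill(18, 35 - 18 + 1, 0);   // 18 slots, keys 18..35
for ($i = 0; $i < 100000; $i++) {
    $counts[random_int(18, 35)]++;
}
foreach ($counts as $value => $count) {
    printf("%7d %d\n", $count, $value);     // count, then value, like uniq -c
}
```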
Monday, September 19, 2022
 

Unfortunately, there's no easy way to define numeric ranges in a regex. For the range 1-23, for example, you'll end up with a regex like this:

([1-9]|1[0-9]|2[0-3])

Explanation:

  1. Either the value is 1-9
  2. or the value starts with 1 and is followed by 0-9
  3. or the value starts with 2 and is followed by 0-3
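The three rules above generalize to any two-digit upper bound. A minimal PHP sketch, assuming 10 <= $to <= 99 (the helper name oneToTwoDigit is hypothetical):

```php
<?php
// Build the 1..$to pattern from the three rules, for 10 <= $to <= 99.
function oneToTwoDigit(int $to): string {
    $tens = intdiv($to, 10);   // first digit of the upper bound, e.g. 2 for 23
    $units = $to % 10;         // last digit of the upper bound, e.g. 3 for 23
    $parts = ['[1-9]'];        // rule 1: the single digits 1-9
    // rule 2: every full decade below the top one, e.g. 1[0-9]
    for ($t = 1; $t < $tens; $t++) {
        $parts[] = $t . '[0-9]';
    }
    // rule 3: the top decade, cut off at the last digit, e.g. 2[0-3]
    $parts[] = $tens . '[0-' . $units . ']';
    return '(' . implode('|', $parts) . ')';
}

echo oneToTwoDigit(23), "\n";   // ([1-9]|1[0-9]|2[0-3])
```

Wrap the result in ^...$ (or \b...\b) when using it, so that for example "123" is not accepted because it contains "12".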
Monday, September 5, 2022
 
mmvie
 

[Solved]

We just needed to add a newline (press Enter) after the closing tag of the class or function.

So maybe this is a bug.

Thanks

Sunday, December 18, 2022