Asked  2 Years ago    Answers:  5   Viewed   67 times

Where can I find algorithms that values the spelling of misplaced characters more accurately than levenshtein() and php similar_text() methods?

Example:

similar_text('jonas', 'xxjon', $similar); echo $similar; // returns 60
similar_text('jonas', 'asjon', $similar); echo $similar; // returns 60 <- although more similar!
echo levenshtein('jonas', 'xxjon'); // returns 4
echo levenshtein('jonas', 'asjon'); // returns 4  <- although more similar!

/ Jonas

 Answers

4

Here's a solution that I've come up to. It's based on Tim's suggestion of comparing the order of subsequent charachters. Some results:

  • jonas / jonax : 0.8
  • jonas / sjona : 0.68
  • jonas / sjonas : 0.66
  • jonas / asjon : 0.52
  • jonas / xxjon : 0.36

I'm sure i isn't perfect, and that it could be optimized, but nevertheless it seems to produce the results that I'm after... One weak spot is that when strings have different length, it produces different result when the values are swapped...

static public function string_compare($str_a, $str_b) 
{
    $length = strlen($str_a);
    $length_b = strlen($str_b);

    $i = 0;
    $segmentcount = 0;
    $segmentsinfo = array();
    $segment = '';
    while ($i < $length) 
    {
        $char = substr($str_a, $i, 1);
        if (strpos($str_b, $char) !== FALSE) 
        {               
            $segment = $segment.$char;
            if (strpos($str_b, $segment) !== FALSE) 
            {
                $segmentpos_a = $i - strlen($segment) + 1;
                $segmentpos_b = strpos($str_b, $segment);
                $positiondiff = abs($segmentpos_a - $segmentpos_b);
                $posfactor = ($length - $positiondiff) / $length_b; // <-- ?
                $lengthfactor = strlen($segment)/$length;
                $segmentsinfo[$segmentcount] = array( 'segment' => $segment, 'score' => ($posfactor * $lengthfactor));
            } 
            else 
            {
                $segment = '';
                $i--;
                $segmentcount++;
            } 
        } 
        else 
        {
            $segment = '';
            $segmentcount++;
        }
        $i++;
    }   

    // PHP 5.3 lambda in array_map      
    $totalscore = array_sum(array_map(function($v) { return $v['score'];  }, $segmentsinfo));
    return $totalscore;     
}
Monday, August 1, 2022
1
function substr_with_ellipsis($string, $chars = 100)
{
    preg_match('/^.{0,' . $chars. '}(?:.*?)b/iu', $string, $matches);
    $new_string = $matches[0];
    return ($new_string === $string) ? $string : $new_string . '&hellip;';
}
Tuesday, December 20, 2022
1

You should point to your vendor/autoload.php at Settings | PHP | PHPUnit when using PHPUnit via Composer.

This blog post has all the details (with pictures) to successfully configure IDE for such scenario: http://confluence.jetbrains.com/display/PhpStorm/PHPUnit+Installation+via+Composer+in+PhpStorm

Related usability ticket: http://youtrack.jetbrains.com/issue/WI-18388

P.S. The WI-18388 ticket is already fixed in v8.0

Sunday, July 31, 2022
 
4

On Mac OS X environment variables available in Terminal and for the normal applications can be different, check the related question for the solution how to make them similar.

Note that this solution will not work on Mountain Lion (10.8).

Sunday, November 6, 2022
3

I'm so sorry. Issue was simple, even if it has gone me mad for a pair of weeks! Look at fetch file "Fetch_User_Data.php" and then look at string where I call it as "FetchUserData.php" and you'll be done...

Sunday, November 13, 2022
 
phyatt
 
Only authorized users can answer the search term. Please sign in first, or register a free account.
Not the answer you're looking for? Browse other questions tagged :
 

Browse Other Code Languages