Viewed   77 times

PHP

Hi, I have been struggling with this problem for awhile and can not find a solution to it and was wondering if anyone could help.

I need to group similar strings for example:

Input

Slim Aluminium HDMI Lead, 1m Blue
Slim Aluminium HDMI Lead, 2m Blue
Slim Aluminium HDMI Lead, 3m Blue
Frozen Kids Headphones with Volume Limiter
XLR Plug to Socket Lead, 3m
XLR Plug to Socket Lead, 6m
Monster High Kids Headphones with Volume Limiter
TMNT Kids Headphones with Volume Limiter
Batman Kids Headphones with Volume Limiter
1-Gang Cable Entry Brush Wall Plate White/White Brushes 50 x 45mm
2-Gang Cable Entry Brush Wall Plate White/White Brushes 50 x 100mm
1-Gang Cable Entry Brush Wall Plate White/Black Brushes 50 x 45mm
2-Gang Cable Entry Brush Wall Plate White/Black Brushes 50 x 100mm
Slim Aluminium HDMI Lead, 5m Blue
Slim Aluminium HDMI Lead, 7.5m Blue
6.35mm (1/4") Mono Jack to Jack Guitar Lead, 5m Orange
XLR Plug to Socket Lead, 0.5m
XLR Plug to Socket Lead, 1m
XLR Plug to Socket Lead, 2m

Output (Grouped In Array)

Slim Aluminium HDMI Lead, 1m Blue
Slim Aluminium HDMI Lead, 2m Blue
Slim Aluminium HDMI Lead, 3m Blue
Slim Aluminium HDMI Lead, 5m Blue
Slim Aluminium HDMI Lead, 7.5m Blue

XLR Plug to Socket Lead, 0.5m
XLR Plug to Socket Lead, 1m
XLR Plug to Socket Lead, 2m
XLR Plug to Socket Lead, 3m
XLR Plug to Socket Lead, 6m

1-Gang Cable Entry Brush Wall Plate White/White Brushes 50 x 45mm
2-Gang Cable Entry Brush Wall Plate White/White Brushes 50 x 100mm
1-Gang Cable Entry Brush Wall Plate White/Black Brushes 50 x 45mm
2-Gang Cable Entry Brush Wall Plate White/Black Brushes 50 x 100mm

Frozen Kids Headphones with Volume Limiter
Monster High Kids Headphones with Volume Limiter
TMNT Kids Headphones with Volume Limiter
Batman Kids Headphones with Volume Limiter

6.35mm (1/4") Mono Jack to Jack Guitar Lead, 5m Orange

 Answers

1

Update: I've found a better solution

I've never played with php's similar_text() function but I thought I'd give it a shot...

$array = explode(PHP_EOL,'Slim Aluminium HDMI Lead, 1m Blue
Slim Aluminium HDMI Lead, 2m Blue
Slim Aluminium HDMI Lead, 3m Blue
Frozen Kids Headphones with Volume Limiter
XLR Plug to Socket Lead, 3m
XLR Plug to Socket Lead, 6m
Monster High Kids Headphones with Volume Limiter
TMNT Kids Headphones with Volume Limiter
Batman Kids Headphones with Volume Limiter
1-Gang Cable Entry Brush Wall Plate White/White Brushes 50 x 45mm
2-Gang Cable Entry Brush Wall Plate White/White Brushes 50 x 100mm
1-Gang Cable Entry Brush Wall Plate White/Black Brushes 50 x 45mm
2-Gang Cable Entry Brush Wall Plate White/Black Brushes 50 x 100mm
Slim Aluminium HDMI Lead, 5m Blue
Slim Aluminium HDMI Lead, 7.5m Blue
6.35mm (1/4") Mono Jack to Jack Guitar Lead, 5m Orange
XLR Plug to Socket Lead, 0.5m
XLR Plug to Socket Lead, 1m
XLR Plug to Socket Lead, 2m');

$output = array();
while( empty( $array ) === false )
{
  $currentWord = array_shift( $array );
  $currentGroup = array( $currentWord );
  foreach( $array as $index => $word )
  {
    if( similar_text( $word, $currentWord, $percentage ) and $percentage > 80 )
    {
      $currentGroup[] = $word;
      unset( $array[ $index ] );
    }
  }
  $output[] = $currentGroup;
}

print_r($output);

// Array
// (
//     [0] => Array
//         (
//             [0] => Slim Aluminium HDMI Lead, 1m Blue
//             [1] => Slim Aluminium HDMI Lead, 2m Blue
//             [2] => Slim Aluminium HDMI Lead, 3m Blue
//             [3] => Slim Aluminium HDMI Lead, 5m Blue
//             [4] => Slim Aluminium HDMI Lead, 7.5m Blue
//         )
// 
//     [1] => Array
//         (
//             [0] => Frozen Kids Headphones with Volume Limiter
//             [1] => Monster High Kids Headphones with Volume Limiter
//             [2] => TMNT Kids Headphones with Volume Limiter
//             [3] => Batman Kids Headphones with Volume Limiter
//         )
// 
//     [2] => Array
//         (
//             [0] => XLR Plug to Socket Lead, 3m
//             [1] => XLR Plug to Socket Lead, 6m
//             [2] => XLR Plug to Socket Lead, 0.5m
//             [3] => XLR Plug to Socket Lead, 1m
//             [4] => XLR Plug to Socket Lead, 2m
//         )
// 
//     [3] => Array
//         (
//             [0] => 1-Gang Cable Entry Brush Wall Plate White/White Brushes 50 x 45mm
//             [1] => 2-Gang Cable Entry Brush Wall Plate White/White Brushes 50 x 100mm
//             [2] => 1-Gang Cable Entry Brush Wall Plate White/Black Brushes 50 x 45mm
//             [4] => 2-Gang Cable Entry Brush Wall Plate White/Black Brushes 50 x 100mm
//         )
// 
//     [4] => Array
//         (
//             [0] => 6.35mm (1/4") Mono Jack to Jack Guitar Lead, 5m Orange
//         )
// 
// )

Assoc array edit

$products = array(
  array('id'=>'A','name'=>'Slim Aluminium HDMI Lead, 1m Blue'),
  array('id'=>'B','name'=>'Slim Aluminium HDMI Lead, 2m Blue'),
  array('id'=>'C','name'=>'Slim Aluminium HDMI Lead, 3m Blue'),
  array('id'=>'D','name'=>'1-Gang Cable Entry Brush Wall Plate White/White Brushes 50 x 45mm'),
  array('id'=>'E','name'=>'2-Gang Cable Entry Brush Wall Plate White/White Brushes 50 x 100mm'),
  array('id'=>'F','name'=>'1-Gang Cable Entry Brush Wall Plate White/Black Brushes 50 x 45mm'),
  array('id'=>'G','name'=>'2-Gang Cable Entry Brush Wall Plate White/Black Brushes 50 x 100mm'),
  array('id'=>'H','name'=>'Slim Aluminium HDMI Lead, 5m Blue'),
  array('id'=>'I','name'=>'Slim Aluminium HDMI Lead, 7.5m Blue')
);
$output = array();
while( empty( $products) === false )
{
  $currentProduct = array_shift( $products );
  $currentGroup = array( $currentProduct );
  foreach( $products as $index => $product )
  {
    if( similar_text( $product['name'], $currentProduct['name'], $percentage ) and $percentage > 80 )
    {
      $currentGroup[] = $product;
      unset( $products[ $index ] );
    }
  }
  $output[] = $currentGroup;
}
print_r($output);
Friday, December 23, 2022
 
1

see similar_text(). And if you want to exclude spaces simple str_replace(' ', '', $string); prior.

echo similar_text ( 'LEGENDARY' , 'BARNEYSTINSON', $percent); // outputs 3
echo $percent; // outputs 27.272727272727

Here's another way using only unique characters

<?php
function unique_chars($string) {
   return count_chars(strtolower(str_replace(' ', '', $string)), 3);
}
function compare_strings($a, $b) {
    $index = similar_text(unique_chars($a), unique_chars($b), $percent);
    return array('index' => $index, 'percent' => $percent);
}
print_r( compare_strings('LEGENDARY', 'BARNEY STINSON') );

// outputs:
?>

Array
(
    [index] => 5
    [percent] => 55.555555555556
)
Thursday, August 4, 2022
4

Using the in-php-array-is-the-duct-tape-of-the-universe way :P

function get_all_substrings($input, $delim = '') {
    $arr = explode($delim, $input);
    $out = array();
    for ($i = 0; $i < count($arr); $i++) {
        for ($j = $i; $j < count($arr); $j++) {
            $out[] = implode($delim, array_slice($arr, $i, $j - $i + 1));
        }       
    }
    return $out;
}

$subs = get_all_substrings("a b c", " ");
print_r($subs);
Sunday, December 25, 2022
 
1

If you just want to simply group by longest common prefix (that means "apple ipad mini" will be chosen even though "apple ipad" would yield a larger group), then maybe something like this?

sort the list
i = 0
while i < end of list:
  key = longest common prefix of list[i] & list[i + 1]
        or list[i] if the common prefix is less than (1?) words or i is the last index
  groups[key] = list[i++]
  while key is prefix of list[i]:
    add list[i++] to groups[key]
Saturday, October 8, 2022
 
k_anas
 
5

I would recommend looking into the networkx package. It allows you to create directed graphs such as you are describing. For measuring the similarity of 2 routes, I would recommend the Jaccard similarity index. Here is code showing the example you illustrated.

First, import a few libraries: graphs, plotting, and numeric python. Then build the directed graph by adding nodes numbered 1 to 8. Build the connections from node to node to build your path. The networkx package has built-in capability to find all paths in the graph from one node to another: nx.all_simple_paths(g, start_node, end_node).

Once you have all of the paths, you can compute the a matrix, J, of the Jaccard similarity between the paths. How you actually want to cluster the paths by their similarity is up to you.

import networkx as nx
import matplotlib.pyplot as plt
import numpy as np

g = nx.DiGraph()
g.add_nodes_from(range(1,8))
g.add_edges_from([(1,2),(2,3),(3,4),(4,7)]) #path 1,2,3,4,7
g.add_edges_from([(4,5),(5,7)])             #path 1,2,3,4,5,7
g.add_edges_from([(4,6),(6,7)])             #path 1,2,3,4,6,7
paths_iter = nx.all_simple_paths(g,1,7)
paths = [p for p in paths]

np.random.seed(100000)
nx.draw_spring(g, with_labels=True)
plt.show()

def jaccard(v1, v2):
    return (len(np.intersect1d(v1,v2))+0.0)/len(np.union1d(v1,v2))

J = np.zeros([len(paths),len(paths)])
for i in range(J.shape[0]):
    for j in range(i, J.shape[1]):
        J[i,j] = J[j,i] = jaccard(paths[i],paths[j])

print J

> [[ 1.          0.71428571  0.83333333]
>  [ 0.71428571  1.          0.83333333]
>  [ 0.83333333  0.83333333  1.        ]]

EDIT Since you have a metric to compare the similarity of the paths to each other, look into k-means clustering to cluster the paths together.

from scipy.cluster.vq import kmeans2

I don't have enough of your code or your data to help anymore from this point.

Monday, September 26, 2022
 
sgvd
 
Only authorized users can answer the search term. Please sign in first, or register a free account.
Not the answer you're looking for? Browse other questions tagged :