Asked 2 years ago | Answers: 5 | Viewed 159 times

I need to parse an HTML document and find all occurrences of the string "asdf" in it.

I currently have the HTML loaded into a string variable. I just need the character positions so I can loop through them and return the data that follows each occurrence.

The strpos function only returns the first occurrence. How can I get all of them?

 Answers

5

Without using regex, something like this should work for returning the string positions:

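<?php
$html = "dddasdfdddasdffff";
$needle = "asdf";
$lastPos = 0;
$positions = array();

// strpos() returns false once there is no further match at or after $lastPos
while (($lastPos = strpos($html, $needle, $lastPos)) !== false) {
    $positions[] = $lastPos;
    // Resume the search just past this match (overlapping matches are skipped)
    $lastPos = $lastPos + strlen($needle);
}

// Displays 3 and 10
foreach ($positions as $value) {
    echo $value . "<br />";
}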
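For comparison, here is a minimal sketch of the same loop in Python, where str.find returns -1 instead of false. The helper name find_all and the "next 3 characters" slice are hypothetical, chosen only to illustrate returning the data after each match as the question asks. The empty-needle guard is there because a zero-length needle would never advance the search position and the loop would not terminate.

```python
def find_all(haystack: str, needle: str) -> list:
    """Return the start index of every non-overlapping occurrence of needle.

    Hypothetical helper, sketched after the PHP strpos() loop above.
    """
    if not needle:
        raise ValueError("needle must not be empty")
    positions = []
    pos = haystack.find(needle)  # -1 plays the role of PHP's false
    while pos != -1:
        positions.append(pos)
        # Resume just past this match, like $lastPos + strlen($needle)
        pos = haystack.find(needle, pos + len(needle))
    return positions


html = "dddasdfdddasdffff"
needle = "asdf"
positions = find_all(html, needle)
print(positions)  # [3, 10]

# Assumption for illustration: the "data after the string" is the next 3 characters
following = []
for p in positions:
    start = p + len(needle)
    following.append(html[start:start + 3])
print(following)  # ['ddd', 'fff']
```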
Sunday, October 2, 2022
4

A regex would be simplest:

$input = 'foo_left.jpg';
if (preg_match('/_(left|right|center)/', $input, $matches)) {
    $suffix = $matches[0]; // "_left", "_right" or "_center"
} else {
    // no match: $matches is empty here, so do not read $matches[0]
}
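A rough Python equivalent of the same pattern, assuming Python's re module is acceptable for illustration (its syntax for these constructs matches PCRE). re.search plays the role of preg_match, and re.finditer additionally reports the start offset of every match, which is what the original "find all occurrences" question needs (in PHP, preg_match_all with the PREG_OFFSET_CAPTURE flag serves that purpose). The names PATTERN, first_suffix and all_suffix_positions are hypothetical.

```python
import re

# Same pattern as the PHP preg_match() call above
PATTERN = re.compile(r"_(left|right|center)")


def first_suffix(filename):
    """Return "_left", "_right" or "_center", or None when nothing matches."""
    m = PATTERN.search(filename)
    return m.group(0) if m else None


def all_suffix_positions(text):
    """Return (start offset, matched text) for every match, left to right."""
    return [(m.start(), m.group(0)) for m in PATTERN.finditer(text)]


print(first_suffix("foo_left.jpg"))  # _left
print(first_suffix("foo.jpg"))  # None
print(all_suffix_positions("a_left_right.jpg"))  # [(1, '_left'), (6, '_right')]
```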


Update:

For a more defensive approach (if the filename might contain "_left" and friends more than once), you can make the regex more specific.

This will match only if the l/r/c is followed by a dot:

preg_match('/(_(left|right|center))\./', $input, $matches);

This will match only if the l/r/c is followed by the last dot in the filename (which practically means that the base name ends with the l/r/c specification):

preg_match('/(_(left|right|center))\.[^\.]*$/', $input, $matches);

And so on.

If using these regexes, you will find the result in $matches[1] instead of $matches[0].
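To see why the dot must be escaped (an unescaped "." matches any character), the sketch below runs a loose pattern with a bare ".", the escaped "\." version, and the last-dot version against a few made-up filenames using Python's re module. The helper suffix() is hypothetical; it returns capture group 1, the analogue of $matches[1].

```python
import re

# Bare "." matches ANY character, so "_leftover" is wrongly accepted
LOOSE = re.compile(r"(_(left|right|center)).")
# Escaped "\." requires a literal dot right after the suffix
FOLLOWED_BY_DOT = re.compile(r"(_(left|right|center))\.")
# The suffix must sit right before the LAST dot of the filename
BEFORE_LAST_DOT = re.compile(r"(_(left|right|center))\.[^.]*$")


def suffix(pattern, filename):
    """Return capture group 1 (e.g. "_left"), or None when there is no match."""
    m = pattern.search(filename)
    return m.group(1) if m else None


for name in ["foo_left.jpg", "foo_leftover.jpg", "a_left.b_right.png"]:
    print(name,
          suffix(LOOSE, name),
          suffix(FOLLOWED_BY_DOT, name),
          suffix(BEFORE_LAST_DOT, name))
```

With the bare dot, "foo_leftover.jpg" still yields "_left" because "." matches the "o"; the escaped pattern rejects it, and the last-dot pattern picks "_right" rather than "_left" in "a_left.b_right.png".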

Saturday, August 27, 2022
2

This can't work properly. Unicode can represent far more characters than ANSI, so if you "convert" to ANSI, you will lose many characters.

http://php.net/manual/en/function.htmlentities.php

You can pass the Unicode (UTF-8) charset to htmlentities:

string htmlentities ( string $string [, int $flags = ENT_COMPAT [, string $charset [, bool $double_encode = true ]]] )

htmlentities($myString, ENT_COMPAT, "UTF-8"); should work.
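The character loss described above is easy to demonstrate. This Python sketch assumes "ANSI" means the Western Windows code page cp1252 (the exact code page varies by system). It also escapes non-ASCII characters as numeric character references, which is in the same spirit as htmlentities() with the right charset, although htmlentities() emits named entities such as &eacute; where one exists.

```python
# Sample text: ASCII, a Latin-1 letter, a Greek letter and two CJK characters
text = "café Ω 日本"

# UTF-8 can encode every Unicode character, so the round trip is lossless
utf8_ok = text.encode("utf-8").decode("utf-8") == text
print(utf8_ok)  # True

# Assumption: "ANSI" = cp1252. Unencodable characters become "?" and are lost
ansi_text = text.encode("cp1252", errors="replace").decode("cp1252")
print(ansi_text)  # café ? ??

# Numeric character references keep every character in pure ASCII output
refs = text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")
print(refs)  # caf&#233; &#937; &#26085;&#26412;
```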

Saturday, October 15, 2022
 
zbr
2

One way to do this is to find the indices using a list comprehension:

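currentWord = "hello"

guess = "l"

# How many times the guessed letter occurs (2 for "hello")
occurrences = currentWord.count(guess)

# Every index whose character equals the guess
indices = [i for i, a in enumerate(currentWord) if a == guess]

print(indices)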

output:

[2, 3]
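The comprehension above compares single characters. For a multi-character substring such as "asdf" from the question, one option is to test str.startswith at every offset; a zero-width lookahead with re.finditer reports the same positions and also catches overlapping matches. This is a sketch; substring_indices is a hypothetical name.

```python
import re


def substring_indices(text, sub):
    """Return every index where sub starts in text (overlaps included)."""
    return [i for i in range(len(text) - len(sub) + 1) if text.startswith(sub, i)]


print(substring_indices("hello hello", "ll"))  # [2, 8]
print(substring_indices("aaaa", "aa"))  # [0, 1, 2]

# Lookahead "(?=...)" matches zero characters, so every start offset is reported
print([m.start() for m in re.finditer("(?=aa)", "aaaa")])  # [0, 1, 2]
```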
Friday, September 2, 2022
 
seb_t
 
4

You can keep track of the index:

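String theString = "the cat and the hat"; // example input
int counter2 = 0;
int index = theString.indexOf("the");
while (index >= 0) {
    counter2++;                                  // count the match found at 'index'
    index = theString.indexOf("the", index + 1); // look for the next one after it
}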
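The same counting loop sketched in Python, with str.find standing in for indexOf (count_occurrences is a hypothetical name). Because the search resumes at index + 1, overlapping occurrences are counted, which differs from Python's built-in str.count, which counts non-overlapping ones.

```python
def count_occurrences(text, word):
    """Count occurrences of word, overlaps included (search resumes at index + 1)."""
    count = 0
    index = text.find(word)
    while index >= 0:
        count += 1
        index = text.find(word, index + 1)
    return count


print(count_occurrences("the cat and the hat", "the"))  # 2
print(count_occurrences("aaaa", "aa"))  # 3 (overlapping)
print("aaaa".count("aa"))  # 2 (str.count is non-overlapping)
```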
Sunday, September 4, 2022
 