Asked  2 Years ago    Answers:  5   Viewed   159 times

I need to parse an HTML document and find all occurrences of the string asdf in it.

I currently have the HTML loaded into a string variable. I just need the character positions, so I can loop through the list and return some data after each occurrence.

The strpos function only returns the first occurrence. How about returning all of them?



Without using regex, something like this should work for returning the string positions:

$html = "dddasdfdddasdffff";
$needle = "asdf";
$lastPos = 0;
$positions = array();

while (($lastPos = strpos($html, $needle, $lastPos)) !== false) {
    $positions[] = $lastPos;
    // Continue searching just past the match that was found
    $lastPos = $lastPos + strlen($needle);
}

// Displays 3 and 10
foreach ($positions as $value) {
    echo $value . "<br />";
}
Sunday, October 2, 2022

A regex would be simplest:

$input = 'foo_left.jpg';
if (!preg_match('/_(left|right|center)/', $input, $matches)) {
    // no match: handle it here before reading $matches
}

$pos = $matches[0]; // "_left", "_right" or "_center"



For a more defensive approach (if there might be multiple instances of "_left" and friends in the filename), consider tightening the regex.

This will match only if the l/r/c is followed by a dot:

preg_match('/(_(left|right|center))\./', $input, $matches);

This will match only if the l/r/c is followed by the last dot in the filename (which practically means that the base name ends with the l/r/c specification):

preg_match('/(_(left|right|center))\.[^\.]*$/', $input, $matches);

And so on.

If using these regexes, you will find the result in $matches[1] instead of $matches[0].
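The two stricter patterns behave the same way outside PHP, so their difference can be checked quickly in Python's re module. This is only a sketch: the filenames are made-up examples, the helper name suffix_before_extension is hypothetical, and group(1) plays the role of $matches[1].

```python
import re

# Suffix followed by any dot in the filename.
followed_by_dot = re.compile(r"(_(left|right|center))\.")

# Suffix followed by the last dot in the filename.
followed_by_last_dot = re.compile(r"(_(left|right|center))\.[^.]*$")


def suffix_before_extension(name):
    """Return "_left", "_right" or "_center" if it ends the base name, else None."""
    match = followed_by_last_dot.search(name)
    return match.group(1) if match else None


# Hypothetical filenames, chosen only to exercise the patterns.
print(suffix_before_extension("foo_left.jpg"))           # _left
print(suffix_before_extension("left_right_center.png"))  # _center
print(suffix_before_extension("foo_left.backup.jpg"))    # None

# The looser pattern still accepts a suffix in front of an earlier dot.
print(followed_by_dot.search("foo_left.backup.jpg").group(1))  # _left
```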

Saturday, August 27, 2022

This can't work properly. Unicode contains many more characters than ANSI, so if you "convert" to ANSI, you will lose a lot of characters.

You can use Unicode (UTF-8) charset with htmlentities:

string htmlentities ( string $string [, int $flags = ENT_COMPAT [, string $charset [, bool $double_encode = true ]]] )

htmlentities($myString, ENT_COMPAT, "UTF-8"); should work.

Saturday, October 15, 2022

One way to do this is to find the indices using a list comprehension:

currentWord = "hello"

guess = "l"

occurrences = currentWord.count(guess)

indices = [i for i, a in enumerate(currentWord) if a == guess]

print(indices)

[2, 3]
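
The comprehension above compares one character at a time, so it only finds matches when guess is a single character. For a longer substring, looping over str.find is one option. The following is a minimal sketch; the helper name find_all and its overlapping flag are illustrative and not part of the original answer.

```python
def find_all(haystack, needle, overlapping=False):
    """Return the start index of every occurrence of needle in haystack."""
    if not needle:
        raise ValueError("needle must be a non-empty string")
    indices = []
    start = haystack.find(needle)
    while start != -1:
        indices.append(start)
        # Step past the whole match, or by one character to allow overlaps.
        step = 1 if overlapping else len(needle)
        start = haystack.find(needle, start + step)
    return indices


print(find_all("hello", "l"))                    # [2, 3]
print(find_all("dddasdfdddasdffff", "asdf"))     # [3, 10]
print(find_all("aaaa", "aa"))                    # [0, 2]
print(find_all("aaaa", "aa", overlapping=True))  # [0, 1, 2]
```

Whether overlapping matches should count depends on the use case, which is why the sketch leaves it as a flag instead of choosing one behavior.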
Friday, September 2, 2022

You can keep track of the index:

int index = theString.indexOf("the");
while (index >= 0) {
    System.out.println(index); // use the current match position here
    index = theString.indexOf("the", index + 1);
}
Sunday, September 4, 2022