
I need to match parts of a string while ignoring HTML tags. For example, suppose the user searches for the string "foo and foo1" in this source:

Two strings, <u>foo</u> and foo1

They would not get a match, because of the tags.

I've tried regex, but since the tags may or may not be there, it seems rather too complicated.

It's not a server-side script. It would be an application run from the console.

To be more specific: it is for syntax highlighting. The user wants "foo and foo1" to be italic, but part of it is already underlined, so the search wouldn't match. That's why I can't simply strip the tags from the string.

 Answers

3

Use the PHP function strip_tags to remove the HTML tags from the text. Then do your search.

http://php.net/manual/en/function.strip-tags.php
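Since the question says the tags have to stay in place for highlighting, here is a rough sketch of the same "strip, then search" idea that also remembers where each visible character came from. It is written in JavaScript rather than PHP, findIgnoringTags is a made-up name, and it assumes every "<" in the input opens a tag and that no ">" appears inside an attribute value, so treat it as an illustration only.

```javascript
// Hypothetical helper: find `needle` in `html` while ignoring tags.
// Returns { start, end } offsets into the ORIGINAL html, or null.
// Assumption: every "<" starts a tag and the next ">" ends it.
function findIgnoringTags(html, needle) {
    var text = '';   // html with the tags removed
    var map = [];    // map[i] = offset in html of text.charAt(i)
    var inTag = false;

    for (var i = 0; i < html.length; i++) {
        var c = html.charAt(i);
        if (c === '<') {
            inTag = true;
        } else if (c === '>' && inTag) {
            inTag = false;
        } else if (!inTag) {
            text += c;
            map.push(i);
        }
    }

    var pos = text.indexOf(needle);
    if (pos === -1 || needle.length === 0) {
        return null;
    }
    // translate the match boundaries back to offsets in the original html
    return { start: map[pos], end: map[pos + needle.length - 1] + 1 };
}

var source = 'Two strings, <u>foo</u> and foo1';
var hit = findIgnoringTags(source, 'foo and foo1');
console.log(source.substring(hit.start, hit.end)); // foo</u> and foo1
```

The returned range still contains the closing </u>, so wrapping it in a new tag directly would produce badly nested markup; the highlighter would have to split the range at tag boundaries.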

Wednesday, October 12, 2022
5

The Arabic regex is:

[\u0600-\u06FF]

Actually, ?-? is a subset of this Arabic range, so I think you can remove them from the pattern.

So, in JS it will be

/^[a-z0-9+,()\/'\s\u0600-\u06FF-]+$/i

See regex demo
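A quick way to try the pattern in JavaScript. The sample strings are made up for illustration (the Arabic word is written with \u escapes to avoid encoding problems), and the character class is assumed to be the one reconstructed above.

```javascript
// Letters, digits, a few punctuation marks, whitespace and the Arabic block U+0600..U+06FF
var allowed = /^[a-z0-9+,()\/'\s\u0600-\u06FF-]+$/i;

// Latin text, punctuation from the class, and an Arabic word
console.log(allowed.test("Main St (2), \u0645\u0631\u062D\u0628\u0627")); // true

// "@" is not in the class, so the whole string is rejected
console.log(allowed.test("foo@bar")); // false
```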

Tuesday, October 11, 2022
3

Taking into consideration that parsing html with regex is a bad idea, here is a solution that does just that :)

EDIT: Just to be clear: This is not a valid solution, it was meant as an exercise that made very lenient assumptions about the input string, and as such should be taken with a grain of salt. Read the link above and see why parsing html with regex can never be done.

function htmlSubstring(s, n) {
    var m, r = /<([^>\s]*)[^>]*>/g,
        stack = [],
        lasti = 0,
        result = '';

    //for each tag, while we don't have enough characters
    while ((m = r.exec(s)) && n) {
        //get the text substring between the last tag and this one
        var temp = s.substring(lasti, m.index).substr(0, n);
        //append to the result and count the number of characters added
        result += temp;
        n -= temp.length;
        lasti = r.lastIndex;

        if (n) {
            result += m[0];
            if (m[1].indexOf('/') === 0) {
                //if this is a closing tag, then pop the stack (does not account for bad html)
                stack.pop();
            } else if (m[1].lastIndexOf('/') !== m[1].length - 1) {
                //if this is not a self-closing tag, then push it onto the stack
                stack.push(m[1]);
            }
        }
    }

    //add the remainder of the string, if needed (there are no more tags in here)
    result += s.substr(lasti, n);

    //fix the unclosed tags
    while (stack.length) {
        result += '</' + stack.pop() + '>';
    }

    return result;

}

Example: http://jsfiddle.net/danmana/5mNNU/

Note: patrick dw's solution may be safer regarding bad html, but I'm not sure how well it handles whitespace.

Monday, November 21, 2022
3

For this PHP regex:

$str = preg_replace ( '{(.)\1+}', '$1', $str );
$str = preg_replace ( '{[ \'-_\(\)]}', '', $str );

In Java:

str = str.replaceAll("(.)\\1+", "$1");
str = str.replaceAll("[ '-_\\(\\)]", "");

I suggest you provide your input and expected output; then you will get better answers on how it can be done in PHP and/or Java.
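To see what the two replacements actually do, here is a small JavaScript sketch of the same patterns; the function names and the sample string are invented for illustration. Note that inside the character class '-_ is a range running from ' to _, which also covers digits and uppercase letters. If only the literal characters were meant (a guess about the intent, not something stated above), the hyphen would need escaping, as in the third function.

```javascript
// Collapse any run of the same character into a single one
function collapseRepeats(str) {
    return str.replace(/(.)\1+/g, '$1');
}

// Same character class as above: '-_ is a RANGE (0x27..0x5F)
function stripWithRange(str) {
    return str.replace(/[ '-_()]/g, '');
}

// Assumed variant: hyphen escaped, so only space ' - _ ( ) are removed
function stripLiteral(str) {
    return str.replace(/[ '\-_()]/g, '');
}

var sample = collapseRepeats('Heello  Wo-rld_(1)');
console.log(sample);                 // Helo Wo-rld_(1)
console.log(stripWithRange(sample)); // eloorld
console.log(stripLiteral(sample));   // HeloWorld1
```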

Sunday, October 9, 2022
 
haodong
 
1

You can solve this easily with jQuery:

jQuery.trim( jQuery('label').text() )

That will strip the tags for you, and produce $36.07 which you can then test with a much simpler regex.


(If you're not currently using jQuery, and don't want to use it, you can still take a look at its source code and see how they've implemented the .text() function in order to emulate it.)


Hmmm, re-reading your question, you might be asking something else. To retrieve all labels containing $ (and ignore the inputs), you can do:

jQuery('label:contains($)')

or

jQuery('label').each( checkForDollars );

function checkForDollars()
{
    // a literal "$" followed by two to five digits
    if ( jQuery(this).text().match(/\$\d{2,5}/) )
    {
        // do something
    }
}
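If you want to try the pattern without jQuery or a DOM, something like this works on plain strings. It is only a sketch: hasDollarAmount and the sample label texts are made up for illustration.

```javascript
// Plain-JS stand-in for the check above, operating on strings instead of elements
function hasDollarAmount(text) {
    // a literal "$" followed by two to five digits
    return /\$\d{2,5}/.test(text);
}

// Made-up label texts; trim them first, as jQuery.trim would
var labelTexts = ['  $36.07 ', 'Total', '$5'];
var trimmed = labelTexts.map(function (s) { return s.trim(); });
var withDollars = trimmed.filter(hasDollarAmount);

console.log(withDollars); // [ '$36.07' ]
```

'$5' is left out because the pattern asks for at least two digits after the dollar sign.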
Monday, November 21, 2022
 
aosmith
 