
I originally asked a similar question using regex, but was advised to use the PHP DOM library instead. It is the better approach, but I am still stuck.

Basically, I want to wrap the contents of an <a> in a <span> if it is not already wrapped in <span>.

<?php
$input = <<<EOT
<html><head></head>
<body bgcolor="#393a36">
    <a href="#"><span style="color:#ffffff;">Link 1</span></a>
    <a href="#">Link 2</a>
    <a href="#"><img src="mypic.gif" />Image Link</a>
    <a href="#"><u>Underlined Link</u></a>
</body>
</html>
EOT;


$doc = new DOMDocument();
$doc->loadHTML($input);
$tags = $doc->getElementsByTagName('a');
foreach ($tags as $tag) {
    $spancount = $tag->getElementsByTagName("span")->length;
    if($spancount == 0){
        $content = nodeContent($tag);
        $element = $doc->createElement('span');
        $element->setAttribute('style','color:#ffffff;');
        $frag = $doc->createDocumentFragment();
        $frag->appendXML($content);
        $element->appendChild($frag);   
        $tag->nodeValue = ""; //clear node
        $tag->appendChild($element);
    }
}
echo $doc->saveHTML();

function nodeContent($n, $outer=false) { 
    $d = new DOMDocument('1.0'); 
    $d->formatOutput = true;
    $b = $d->importNode($n->cloneNode(true),true); 
    $d->appendChild($b);
    $h = $d->saveHTML(); 
    // remove outer tags
    if (!$outer) $h = substr($h,strpos($h,'>')+1,-(strlen($n->nodeName)+4)); 
    return $h; 
} 

It provides this output:

PHP Warning: DOMDocumentFragment::appendXML(): Entity: line 1: parser error : Premature end of data in tag img line 1 in /private/var/folders/78/78vHGigZHcuFeXB1KKJSb++++TI/-Tmp-/untitled_3xd..php on line 24
PHP Warning: DOMDocumentFragment::appendXML(): Image Link in /private/var/folders/78/78vHGigZHcuFeXB1KKJSb++++TI/-Tmp-/untitled_3xd..php on line 24
PHP Warning: DOMDocumentFragment::appendXML(): ^ in /private/var/folders/78/78vHGigZHcuFeXB1KKJSb++++TI/-Tmp-/untitled_3xd..php on line 24
PHP Warning: DOMNode::appendChild(): Document Fragment is empty in /private/var/folders/78/78vHGigZHcuFeXB1KKJSb++++TI/-Tmp-/untitled_3xd..php on line 25

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>  
<head></head>  
<body bgcolor="#393a36">  
    <a href="#"><span style="color:#ffffff;">Link 1</span></a>  
    <a href="#"><span style="color:#ffffff;">Link 2</span></a>  
    <a href="#"><span style="color:#ffffff;"></span></a>  
    <a href="#"><span style="color:#ffffff;"><u>Underlined Link</u></span></a>  
</body>  
</html>

This mostly works, but it is fragile: as you can see, it fails when there is an img (or similar) tag inside the link.

What is the best way to make this work? I've been banging my head against this for an embarrassingly long time now.

EDIT

Based on the feedback below, here is the revised code and output. Note that the "Image Link" text next to the img tag isn't being wrapped for some reason. Any ideas?

$doc = new DOMDocument();
$doc->loadHTML($input);
$tags = $doc->getElementsByTagName('a');
foreach ($tags as $tag) {
    $spancount = $tag->getElementsByTagName("span")->length;
    if($spancount == 0){
        $element = $doc->createElement('span');
        $element->setAttribute('style','color:#ffffff;');
        foreach ($tag->childNodes as $child) {
            $tag->removeChild($child);
            $element->appendChild($child);
        }
        $tag->appendChild($element);
    }
}
echo $doc->saveHTML();

Output:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head></head>
<body bgcolor="#393a36">
    <a href="#"><span style="color:#ffffff;">Link 1</span></a>
    <a href="#"><span style="color:#ffffff;">Link 2</span></a>
    <a href="#">Image Link<span style="color:#ffffff;"><img src="mypic.gif"></span></a>
    <a href="#"><span style="color:#ffffff;"><u>Underlined Link</u></span></a>
</body>
</html>

 Answers

2

Why bother with re-creating the node? Why not just replace the node? (If I understand what you're trying to do)...

if($spancount == 0){
    $element = $doc->createElement('span');
    $element->setAttribute('style','color:#ffffff;');
    $tag->parentNode->replaceChild($element, $tag);
    $element->appendChild($tag);
}

Edit: Whoops, it looks like you're trying to wrap everything under $tag in the span... Try this instead:

if($spancount == 0){
    $element = $doc->createElement('span');
    $element->setAttribute('style','color:#ffffff;');
    foreach ($tag->childNodes as $child) {
        $tag->removeChild($child);
        $element->appendChild($child);
    }
    $tag->appendChild($element);
}

Edit 2: Based on your results, it looks like that foreach is not completing because of the node removal. childNodes is a live list, so removing a child while iterating over it causes nodes to be skipped. Try replacing the foreach with this:

while ($tag->childNodes->length > 0) {
    $child = $tag->childNodes->item(0);
    $tag->removeChild($child);
    $element->appendChild($child);
}
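
Putting the pieces of this answer together, a minimal self-contained sketch might look like the following. It assumes markup like the question's $input (shortened here) and uses $tag->firstChild in place of childNodes->item(0), which should behave the same. It also relies on appendChild moving a node that already has a parent, so the explicit removeChild call is left out.

```php
<?php
// Shortened version of the markup from the question (an assumption for this sketch).
$input = <<<EOT
<html><head></head>
<body>
    <a href="#"><span style="color:#ffffff;">Link 1</span></a>
    <a href="#">Link 2</a>
    <a href="#"><img src="mypic.gif" />Image Link</a>
</body>
</html>
EOT;

$doc = new DOMDocument();
$doc->loadHTML($input);

foreach ($doc->getElementsByTagName('a') as $tag) {
    // Skip links that already contain a span somewhere inside them.
    if ($tag->getElementsByTagName('span')->length > 0) {
        continue;
    }

    $element = $doc->createElement('span');
    $element->setAttribute('style', 'color:#ffffff;');

    // Always take the current first child; appendChild moves it into the span,
    // so the loop ends once the link has no children left.
    while ($tag->firstChild !== null) {
        $element->appendChild($tag->firstChild);
    }

    $tag->appendChild($element);
}

$output = $doc->saveHTML();
echo $output;
```

With this loop the img and the "Image Link" text should both end up inside the new span, in their original order.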
Saturday, August 27, 2022
 
squeek
 
4
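$d = new DOMDocument();
$d->loadXML($xml);
$x = new DOMXPath($d);
// collect the id attributes of all ancestors of text nodes containing the value
$result = $x->evaluate("//text()[contains(.,'617.99')]/ancestor::*/@id");
$unique = null;
// iterate in reverse document order, so the deepest ancestor is checked first
for($i = $result->length -1;$i >= 0 && $item = $result->item($i);$i--){
    // an id counts as unique if exactly one element in the document carries it
    if($x->query("//*[@id='".addslashes($item->value)."']")->length == 1){
        echo 'Unique ID is '.$item->value."\n";
        $unique = $item->value;
        break;
    }
}
if(is_null($unique)) echo 'no unique ID found';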
Tuesday, August 30, 2022
1

Try the following CSS selector

b > span.marked

That would return the span though, so you probably have to do $e->parent() to get to the b element.

Also see Best Methods to parse HTML for alternatives to SimpleHtmlDom


Edit after update:

Your browser will modify the DOM. If you look at your markup, you will see that there are no tbody elements. Yet Firebug gives you

html body div#wrapper table.desc tbody tr td div span.marked'
html body div#wrapper table.desc tbody tr td table.split tbody tr td b'

Also, your question does not match the queries. You asked how to find

elements surrounded with the <b>,</b>-tags followed by a <span class="marked">

That can be read to either mean

<b><span class="marked">foo</span></b>

or

<b><element>foo</element></b><span class="marked">foo</span>

For the first, use the child combinator shown earlier. For the second, use the adjacent sibling combinator

b + span.marked

to get the span and then use $e->prev_sibling() to return the previous sibling of element (or null if not found).

However, the markup you show contains neither. There is only a DIV with a SPAN child that has the marked class

<div style="text-align: center"> <span class="marked">marked</span>

If that is what you want to match, it's the child combinator again. Of course, you then have to change the b to a div.
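
For comparison, here is a rough sketch of the same two combinators using PHP's built-in DOM extension and XPath instead of SimpleHtmlDom. This is a swapped-in technique, not SimpleHtmlDom's API, and the sample markup and the $marked helper expression are made up for illustration.

```php
<?php
// Hypothetical markup containing one match for each combinator.
$html = '<div><b><span class="marked">child</span></b>'
      . '<b>bold</b><span class="marked">sibling</span></div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// XPath test for "has the class marked" (also matches multi-class attributes).
$marked = "contains(concat(' ', normalize-space(@class), ' '), ' marked ')";

// Roughly "b > span.marked": a marked span that is a direct child of a b.
$children = $xpath->query("//b/span[{$marked}]");

// Roughly "b + span.marked": a marked span that is the next element after a b.
$siblings = $xpath->query("//b/following-sibling::*[1][self::span][{$marked}]");

foreach ($children as $span) {
    // parentNode plays the role of $e->parent() here.
    echo $span->parentNode->nodeName, ' > span: ', $span->textContent, "\n";
}

foreach ($siblings as $span) {
    // The closest preceding element sibling plays the role of $e->prev_sibling().
    $prev = $xpath->query('preceding-sibling::*[1]', $span)->item(0);
    echo $prev->nodeName, ' + span: ', $span->textContent, "\n";
}
```

In both loops the query returns the span itself, so one extra step is needed to reach the b element, just as with the CSS selectors.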

Wednesday, September 28, 2022
 
sanya
 
2

Since the answer in the linked duplicate is not that comprehensive, I'll give an example:

$dom = new DOMDocument;
$dom->loadXml($html); // use loadHTML if it's invalid (X)HTML

// create the new element
$newNode = $dom->createElement('div', 'this is new');
$newNode->setAttribute('id', 'new_div');

// fetch and replace the old element
$oldNode = $dom->getElementById('old_div');
$oldNode->parentNode->replaceChild($newNode, $oldNode);

// print xml
echo $dom->saveXml($dom->documentElement);

Technically, you don't need XPath for this. However, it can happen that your version of libxml cannot do getElementById for non-validated documents (id attributes are special in XML). In that case, replace the call to getElementById with

$xp = new DOMXPath($dom);
$oldNode = $xp->query('//div[@id="old_div"]')->item(0);

Demo on codepad


To create a $newNode with child nodes without having to create and append elements one by one, you can do

$newNode = $dom->createDocumentFragment();
$newNode->appendXML('
<div id="new_div">
    <p>some other text</p>
    <p>some other text</p>
    <p>some other text</p>
    <p>some other text</p>
</div>
');
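
As a rough end-to-end sketch, the fragment can then be passed to replaceChild in place of a single element. The $xml sample below is made up for illustration, and the XPath lookup is used instead of getElementById because the document is not validated.

```php
<?php
// Hypothetical input document; only the id "old_div" matters here.
$xml = '<root><div id="old_div">old</div></root>';

$dom = new DOMDocument();
$dom->loadXML($xml);

// Build the replacement, including its children, from an XML string.
// appendXML expects well-formed XML, so every tag must be closed.
$newNode = $dom->createDocumentFragment();
$newNode->appendXML('<div id="new_div"><p>some other text</p></div>');

// Look the old element up via XPath and swap the fragment in for it.
$xp = new DOMXPath($dom);
$oldNode = $xp->query('//div[@id="old_div"]')->item(0);
$oldNode->parentNode->replaceChild($newNode, $oldNode);

$result = $dom->saveXML($dom->documentElement);
echo $result;
```

When a fragment is inserted, its child nodes take the place of the old element, so the result should contain the new div and no trace of old_div.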
Wednesday, September 7, 2022
5

I've worked a lot with "non-English" characters, and several things are required to display and store them properly.

In no particular order (as I don't know what charset is best suited for Persian, I'll use UTF-8, if it's different, you just use the one you need):

Tell your browser what charset you are using, either by setting the proper header from PHP header('Content-type: text/html; charset=utf-8'); or set the meta tag in your html like so: <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

In the database, avoid mixing different collations and charsets in the columns/tables. I always set the database, the tables and the columns to utf8_general_ci, which works for my needs all the time (languages like English, German, Serbian, Hungarian...).

As Jan said, read http://dev.mysql.com/doc/refman/5.0/en/charset-connection.html You'll most likely need to execute a query such as SET NAMES utf8 right after connecting to the database.

All this should ensure that Unicode characters are displayed properly. However, there is one more thing that can override all this: the web server. Apache (I don't know about the other servers) has an AddDefaultCharset directive. On most setups this is left as Off, but I have come across setups where the default charset was set to latin1, thus overriding all my charset settings. If this is set, it is set in httpd.conf (or a similar configuration file). If you have access to it, I recommend setting it to Off. If you don't, you can override the global value with an .htaccess file placed in your webroot, containing something like: AddDefaultCharset utf-8

Wednesday, December 21, 2022
 