
I'm using PHP DOM and I'm trying to get an element within a DOM node that has a given class name. What's the best way to get that sub-element?

Update: I ended up using Mechanize for PHP, which was much easier to work with.

 Answers

2

Update: XPath version of the *[class~='my-class'] CSS selector

So, after my comment below in response to hakre's comment, I got curious and looked into the code behind Zend_Dom_Query. It looks like the above selector is compiled to the following XPath (untested):

[contains(concat(' ', normalize-space(@class), ' '), ' my-class ')]

So the PHP would be:

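$dom = new DOMDocument();
$dom->load($filePath); // XML parser; for ordinary HTML files use loadHTMLFile($filePath)
$finder = new DOMXPath($dom);
$classname = "my-class";
// pad both the class list and the searched class with spaces so only whole class names match
$nodes = $finder->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]");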

Basically, all we do here is normalize the class attribute (collapsing runs of whitespace to single spaces) and pad it with a leading and a trailing space, so every class in the list, even a lone one, is bounded by spaces. Then we pad the class we are searching for with a space on each side as well. This way we match only whole instances of my-class, and not classes such as my-class-2 that merely contain it as a substring.
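Since the question asks for matches inside a particular DOM node, here is a minimal sketch (untested against the asker's real markup) that passes that node as the context argument of DOMXPath::query() and starts the path with ".//" so the search stays below it. The sample HTML and the id "box" are hypothetical.

```php
<?php
// Hypothetical sample markup: one match inside #box, one near miss, one match outside.
$html = '<div id="box">'
      . '<p class="intro my-class">inside</p>'
      . '<p class="my-class-2">near miss</p>'
      . '</div>'
      . '<p class="my-class">outside</p>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$finder = new DOMXPath($dom);

// The node we want to search within (looked up by id via XPath).
$context = $finder->query("//*[@id='box']")->item(0);

$classname = 'my-class';
// ".//" makes the path relative to the context node given as the second argument.
$expr = ".//*[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]";
$nodes = $finder->query($expr, $context);

foreach ($nodes as $node) {
    echo $node->textContent, "\n";
}
```

With a leading "//" instead of ".//" the context node would be ignored and the search would start from the document root again, so the "outside" paragraph would match too.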


Use an XPath selector?

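$dom = new DOMDocument();
$dom->load($filePath); // XML parser; for ordinary HTML files use loadHTMLFile($filePath)
$finder = new DOMXPath($dom);
$classname = "my-class";
// note: contains() is a substring test, so this also matches e.g. "my-class-2"
$nodes = $finder->query("//*[contains(@class, '$classname')]");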

If you only ever need one type of element, you can replace the * with that tag name (for example //div[...]).
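A small sketch, on hypothetical markup, of both points: the plain contains(@class, ...) test also picks up "my-class-2", and replacing * with div limits the result to div elements (the near miss is still included, because the substring problem is unchanged).

```php
<?php
// Hypothetical markup: two exact matches and one substring near miss.
$html = '<div class="my-class">a</div>'
      . '<span class="my-class">b</span>'
      . '<div class="my-class-2">c</div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$finder = new DOMXPath($dom);
$classname = 'my-class';

// Any element whose class attribute contains the substring "my-class".
$naive = $finder->query("//*[contains(@class, '$classname')]");

// Same test, restricted to <div> elements.
$divsOnly = $finder->query("//div[contains(@class, '$classname')]");

echo $naive->length, "\n";    // 3
echo $divsOnly->length, "\n"; // 2
```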

If you need to do a lot of this with very complex selectors, I would recommend Zend_Dom_Query, which supports CSS selector syntax (à la jQuery):

$finder = new Zend_Dom_Query($html);
$classname = 'my-class';
$nodes = $finder->query("*[class~='$classname']");
Friday, November 4, 2022
5

You can use XPath on your DOMDocument as follows:

$doc = new DOMDocument(); // or reuse your existing DOMDocument
$doc->loadHTML($article_header);
$xpath = new DOMXPath($doc);

// "|" unions both node sets; results come back in document order
$imagesAndIframes = $xpath->query('//img | //iframe');

$length = $imagesAndIframes->length;
for ($i = 0; $i < $length; $i++) {
    $element = $imagesAndIframes->item($i);

    if ($element->tagName === 'img') {
        echo 'img';
    } else {
        echo 'iframe';
    }
}
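A self-contained variant of the loop above, assuming a made-up $article_header string. DOMNodeList is Traversable, so foreach can replace the index-based loop; collecting tag names into an array also makes the document order of the union easy to see.

```php
<?php
// Hypothetical stand-in for $article_header.
$article_header = '<div><img src="a.png"><iframe src="b.html"></iframe><img src="c.png"></div>';

$doc = new DOMDocument();
$doc->loadHTML($article_header);
$xpath = new DOMXPath($doc);

$tags = [];
// foreach works directly on the DOMNodeList returned by query().
foreach ($xpath->query('//img | //iframe') as $element) {
    $tags[] = $element->tagName;
}
echo implode(',', $tags), "\n";
```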
Monday, October 17, 2022
 
sinned
 
2

You can set the user agent through the user_agent ini setting (in php.ini, or at runtime with ini_set()), without needing cURL. Just run the lines below before you load the DOMDocument:

$agent = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)";
ini_set('user_agent', $agent);

And then your code:

$doc = new DOMDocument();
// @ suppresses the parser warnings that real-world HTML usually triggers
@$doc->loadHTMLFile('http://www.facebook.com');
$xpath = new DOMXPath($doc);
echo $xpath->query('//title')->item(0)->nodeValue . "\n";
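ini_set('user_agent', ...) changes the default for every HTTP stream the script opens. If you would rather scope it to libxml loads only, one possible sketch is to build a stream context and hand it to libxml_set_streams_context(), which applies it to libxml document loads such as loadHTMLFile(). The helper names makeAgentContext and fetchTitle are hypothetical, not part of any library.

```php
<?php
// Hypothetical helper: an HTTP stream context carrying a custom User-Agent.
function makeAgentContext(string $agent)
{
    return stream_context_create(['http' => ['user_agent' => $agent]]);
}

// Hypothetical helper: fetch a page and return its <title>, or null.
function fetchTitle(string $url, string $agent): ?string
{
    // Used by libxml for the next document load (e.g. loadHTMLFile below).
    libxml_set_streams_context(makeAgentContext($agent));

    libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $doc = new DOMDocument();
    if (!$doc->loadHTMLFile($url)) {
        return null;
    }
    $title = (new DOMXPath($doc))->query('//title')->item(0);
    return $title === null ? null : $title->nodeValue;
}
```

Usage would be along the lines of echo fetchTitle('http://www.facebook.com', $agent); note that it makes a real network request.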
Wednesday, December 7, 2022
 
beano
 
2

Use this:

$img = $dom->getElementsByTagName('img')->item(0);
echo $img->attributes->getNamedItem("src")->value;
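For comparison, a minimal self-contained sketch using DOMElement::getAttribute(), which reads the same value more directly. The sample markup is hypothetical. When the attribute is absent, getAttribute() returns an empty string, whereas getNamedItem() returns null and reading ->value on it raises a warning.

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML('<p><img src="photo.jpg" alt="x"></p>');

$img = $dom->getElementsByTagName('img')->item(0);
if ($img !== null) {
    // Same value as $img->attributes->getNamedItem("src")->value.
    $src = $img->getAttribute('src');
    echo $src, "\n";
}
```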
Wednesday, August 24, 2022
 
5

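I think it has to do with how you're iterating. The list returned by getElementsByTagNameNS() is live, so when you replace nodes while iterating forward, the list changes under the loop and elements get skipped (a side effect of modifying what you are iterating over). Try changing your loop to this: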

$nodes = $root->getElementsByTagNameNS($root->lookupNamespaceURI('zuq'), 'data');
$i = $nodes->length - 1;
while ($i >= 0) {
    $node = $nodes->item($i);
    $node->parentNode->replaceChild(
        $node->ownerDocument->createTextNode('foo'), 
        $node
    );
    $i--;
}

Basically, it iterates backwards over the list of nodes, so that when nodes drop out of the list, they drop out from the end rather than the beginning, and the indexes still to be visited don't shift.
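A self-contained sketch of the backwards loop, assuming a made-up document and namespace URI (urn:example:zuq) in place of the asker's real ones. Every zuq:data element ends up replaced by the text "foo".

```php
<?php
// Hypothetical document with three namespaced <zuq:data> elements.
$xml = '<root xmlns:zuq="urn:example:zuq">'
     . '<zuq:data>1</zuq:data><zuq:data>2</zuq:data><zuq:data>3</zuq:data>'
     . '</root>';

$dom = new DOMDocument();
$dom->loadXML($xml);
$root = $dom->documentElement;

$nodes = $root->getElementsByTagNameNS($root->lookupNamespaceURI('zuq'), 'data');
// Walk from the last item to the first; each replacement removes an item we have already passed.
for ($i = $nodes->length - 1; $i >= 0; $i--) {
    $node = $nodes->item($i);
    $node->parentNode->replaceChild($dom->createTextNode('foo'), $node);
}

echo $dom->saveXML($root), "\n";
```

Another common approach is to snapshot the list first with iterator_to_array($nodes) and then replace the nodes in any order, since a plain PHP array does not change when the document does.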

Saturday, November 5, 2022
 