Asked 2 years ago. Answers: 5

I'm using PHP DOM and I'm trying to get an element within a DOM node that has a given class name. What's the best way to get that sub-element?

Update: I ended up using Mechanize for PHP which was much easier to work with.



Update: XPath version of the *[@class~='my-class'] CSS selector

So after my comment below in response to hakre's comment, I got curious and looked into the code behind Zend_Dom_Query. It looks like the above selector is compiled to the following XPath predicate (untested):

[contains(concat(' ', normalize-space(@class), ' '), ' my-class ')]

So the PHP would be:

$classname = 'my-class';
$dom = new DOMDocument();
$dom->loadHTML($html); // $html holds the markup to search
$finder = new DOMXPath($dom);
$nodes = $finder->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]");

Basically, all we do here is normalize the class attribute so that the class names are separated by single spaces, and then wrap the whole list in a leading and a trailing space, so that even a single class is bounded by spaces. We then pad the class we are searching for with a space on each side. This way we match my-class only as a whole class name, and not as part of a longer name such as my-class-2.
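
To make the difference concrete, here is a minimal sketch that runs the padded predicate next to a plain contains() test. The sample markup and the variable names $loose and $strict are made up for illustration; only the two XPath expressions come from this answer.

```php
<?php
// Hypothetical sample markup, used only for illustration.
$html = '<div><p class="my-class">a</p><p class="intro  my-class note">b</p>'
      . '<p class="my-class-2">c</p><p class="not-my-class">d</p></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$finder = new DOMXPath($dom);

$classname = 'my-class';

// Substring test: also matches "my-class-2" and "not-my-class".
$loose = $finder->query("//*[contains(@class, '$classname')]");

// Token test: both sides are padded with spaces, so only the whole name matches.
$strict = $finder->query(
    "//*[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]"
);

echo $loose->length, "\n";  // 4
echo $strict->length, "\n"; // 2
```

The second paragraph has two spaces between its class names on purpose: normalize-space() collapses them, so the padded comparison still works.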

Use an XPath selector?

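$classname = 'my-class';
$dom = new DOMDocument();
$dom->loadHTML($html); // $html holds the markup to search
$finder = new DOMXPath($dom);
// Note: contains() is a substring test, so this also matches e.g. "my-class-2"
$nodes = $finder->query("//*[contains(@class, '$classname')]");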

If it is only ever one type of element, you can replace the * with the particular tag name.
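
Since the question asks for an element within a given node, the following sketch may help: it swaps the * for a tag name and passes a context node as the second argument of DOMXPath::query(). The markup, the id values and the variable $container are assumptions made up for this example.

```php
<?php
// Hypothetical markup: two containers, each holding elements with the class.
$html = '<div id="a"><span class="my-class">first</span></div>'
      . '<div id="b"><span class="my-class">second</span><p class="my-class">third</p></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$finder = new DOMXPath($dom);

// The node to search inside (here: the div with id "b").
$container = $finder->query("//div[@id='b']")->item(0);

$classname = 'my-class';

// The leading "." makes the path relative to $container; "span" replaces "*".
$matches = $finder->query(
    ".//span[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]",
    $container
);

foreach ($matches as $node) {
    echo $node->nodeValue, "\n"; // second
}
```

Without the leading dot, a path starting with // is evaluated against the whole document even when a context node is passed.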

If you need to do a lot of this with very complex selectors, I would recommend Zend_Dom_Query, which supports CSS selector syntax (a la jQuery):

$finder = new Zend_Dom_Query($html);
$classname = 'my-class';
$nodes = $finder->query("*[class~=\"$classname\"]");
Friday, November 4, 2022

You can use XPath on your DOMDocument as follows:

$xpath = new DOMXpath($doc);

$imagesAndIframes = $xpath->query('//img | //iframe');

$length = $imagesAndIframes->length;
for ($i = 0; $i < $length; $i++) {
    $element = $imagesAndIframes->item($i);

    if ($element->tagName == 'img') {
        echo 'img';
    } else {
        echo 'iframe';
    }
}
Monday, October 17, 2022

You can set PHP's user_agent ini setting at runtime, without the need for curl. Just add the lines below before you load the DOMDocument:

$agent = "Mozilla/5.0 (compatible; Googlebot/2.1; +";
ini_set('user_agent', $agent);

And then your code:

$doc = new DOMDocument();
$doc->loadHTMLFile($url); // assumption: $url is a placeholder for the page being fetched
$xpath = new DOMXPath($doc);
echo $xpath->query('//title')->item(0)->nodeValue."\n";
Wednesday, December 7, 2022

Use this:

$img = $dom->getElementsByTagName('img')->item(0);
echo $img->attributes->getNamedItem("src")->value;
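
As a rough sketch of the same idea for several images, DOMElement::getAttribute() and hasAttribute() can be used instead of going through the attributes map. The markup and the $sources array are invented for this example.

```php
<?php
// Hypothetical markup: one image with a src attribute and one without.
$dom = new DOMDocument();
$dom->loadHTML('<p><img src="logo.png" alt="Logo"><img alt="no source"></p>');

$sources = [];
foreach ($dom->getElementsByTagName('img') as $img) {
    // getAttribute() returns an empty string when the attribute is missing,
    // so hasAttribute() is used to tell "missing" apart from "empty".
    $sources[] = $img->hasAttribute('src') ? $img->getAttribute('src') : '(none)';
}

echo implode("\n", $sources), "\n";
```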
Wednesday, August 24, 2022

I think it has to do with how you're iterating. You're changing the result list while it's being iterated over, so the loop winds up breaking (a side effect of the node list being live). Try changing your loop to this:

$nodes = $root->getElementsByTagNameNS($root->lookupNamespaceURI('zuq'), 'data');
$i = $nodes->length - 1;
while ($i >= 0) {
    $node = $nodes->item($i);
    // ... remove or replace $node here ...
    $i--;
}

Basically, it just iterates backwards over the list of nodes, so that when nodes are removed, they are removed from the end rather than the beginning...
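
Here is a small self-contained sketch of the backwards walk. The XML is made up, and it uses plain getElementsByTagName() without the zuq namespace to keep the example short.

```php
<?php
// Hypothetical document; the element name "data" mirrors the loop above.
$dom = new DOMDocument();
$dom->loadXML('<root><data>1</data><keep/><data>2</data><data>3</data></root>');

// The returned list is live: it shrinks as matching nodes are removed.
$nodes = $dom->getElementsByTagName('data');

// Walk from the end so removals never shift the indexes still to be visited.
for ($i = $nodes->length - 1; $i >= 0; $i--) {
    $node = $nodes->item($i);
    $node->parentNode->removeChild($node);
}

echo $dom->saveXML($dom->documentElement), "\n"; // <root><keep/></root>
```

Another common option is to copy the list first with iterator_to_array($nodes) and loop over the copy, which is not affected by later removals.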

Saturday, November 5, 2022
