I'm using PHP DOM and I'm trying to get an element within a DOM node that has a given class name. What's the best way to get that sub-element?
Update: I ended up using Mechanize for PHP, which was much easier to work with.
You can use XPath on your DOMDocument as follows:
$doc = new DOMDocument();
$doc->loadHTML($article_header);
$xpath = new DOMXPath($doc);
// Select every <img> and <iframe> element in the document.
$imagesAndIframes = $xpath->query('//img | //iframe');
$length = $imagesAndIframes->length;
for ($i = 0; $i < $length; $i++) {
    $element = $imagesAndIframes->item($i);
    if ($element->tagName == 'img') {
        echo 'img';
    } else {
        echo 'iframe';
    }
}
You can set the user agent through the user_agent ini setting, without the need for cURL. Just call ini_set() with the lines below before you load the DOMDocument:
$agent = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)";
ini_set('user_agent', $agent);
And then your code:
$doc = new DOMDocument();
@$doc->loadHTMLFile('http://www.facebook.com');
$xpath = new DOMXPath($doc);
echo $xpath->query('//title')->item(0)->nodeValue . "\n";
Use this:
$img = $dom->getElementsByTagName('img')->item(0);
echo $img->attributes->getNamedItem("src")->value;
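The same value can also be read with DOMElement::getAttribute(). A minimal sketch, assuming hypothetical sample markup (the image path is made up) and adding a guard for the case where no img element exists:

```php
<?php
// Hypothetical sample markup; the image path is made up for illustration.
$dom = new DOMDocument();
$dom->loadHTML('<p><img src="/images/logo.png" alt="Logo"></p>');

$img = $dom->getElementsByTagName('img')->item(0);

// item(0) returns null when there is no <img>, so check before reading.
$src = ($img !== null && $img->hasAttribute('src'))
    ? $img->getAttribute('src')
    : null;

echo $src, "\n";
```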
I think it has to do with how you're iterating. The node list is live, so changing it while you iterate over it shifts the remaining items, and the loop winds up breaking (side effects). Try changing your loop to this:
$nodes = $root->getElementsByTagNameNS($root->lookupNamespaceURI('zuq'), 'data');
$i = $nodes->length - 1;
while ($i >= 0) {
    $node = $nodes->item($i);
    $node->parentNode->replaceChild(
        $node->ownerDocument->createTextNode('foo'),
        $node
    );
    $i--;
}
Basically, it just iterates backwards over the list of nodes, so that when nodes are removed, they are removed from the end rather than the beginning...
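To see the effect in isolation, here is a small self-contained sketch. The XML document and its element names are hypothetical (no namespace is used, to keep it short); it only illustrates that the list returned by getElementsByTagName() shrinks as its nodes are replaced:

```php
<?php
// Hypothetical sample document; element names are for illustration only.
$doc = new DOMDocument();
$doc->loadXML('<root><data>a</data><data>b</data><data>c</data></root>');

$nodes  = $doc->getElementsByTagName('data');
$before = $nodes->length; // 3 matching elements to start with

// Walk from the last index down so a removal never shifts unvisited items.
for ($i = $nodes->length - 1; $i >= 0; $i--) {
    $node = $nodes->item($i);
    $node->parentNode->replaceChild($doc->createTextNode('foo'), $node);
}

$after = $nodes->length;                     // the list is live, so it is now empty
$text  = $doc->documentElement->textContent; // the three replacement text nodes
```

Another option, if you prefer a forward loop, is to copy the list into a plain array first with iterator_to_array($nodes) and iterate over that copy.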
Update: XPath version of the *[@class~='my-class'] CSS selector
So after my comment below in response to hakre's comment, I got curious and looked into the code behind Zend_Dom_Query. It looks like the above selector is compiled to the following XPath (untested):
[contains(concat(' ', normalize-space(@class), ' '), ' my-class ')]
So the PHP would be:
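A minimal sketch of how that predicate might be used with PHP's DOM extension. The sample markup and the names $html, $className, and $matched are hypothetical and not from the original post, and the class name is assumed to contain no quote characters:

```php
<?php
// Hypothetical sample markup; not from the original post.
$html = '<div><p class="my-class">one</p>'
      . '<p class="other my-class extra">two</p>'
      . '<p class="my-class-wide">three</p></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$finder    = new DOMXPath($dom);
$className = 'my-class';

// Pad the normalized class list with spaces, then search for " my-class ".
$query = "//*[contains(concat(' ', normalize-space(@class), ' '), ' $className ')]";
$nodes = $finder->query($query);

$matched = [];
foreach ($nodes as $node) {
    $matched[] = $node->textContent;
}
// "three" is skipped: "my-class-wide" only contains the name as a substring.
```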
Basically, all we do here is normalize the class attribute so that even a single class is bounded by spaces, and the complete class list is bounded by spaces. Then we pad the class we are searching for with a space on each side. This way we effectively look for, and find, only instances of my-class.
Use an XPath selector?
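Since the question asks for a sub-element of a particular node, one possible approach is to pass that node as the context of the XPath query. This is only a sketch: the helper name findByClass and the sample markup are hypothetical, and the class name is assumed to contain no quote characters:

```php
<?php
// Hypothetical helper: first descendant of $context carrying $className, or null.
function findByClass(DOMXPath $xpath, DOMNode $context, string $className): ?DOMNode
{
    // The leading "." keeps the search relative to $context, not the whole document.
    $query  = ".//*[contains(concat(' ', normalize-space(@class), ' '), ' $className ')]";
    $result = $xpath->query($query, $context);

    return ($result !== false && $result->length > 0) ? $result->item(0) : null;
}

// Hypothetical sample markup with the same class inside two different nodes.
$dom = new DOMDocument();
$dom->loadHTML('<div id="a"><span class="my-class">in a</span></div>'
             . '<div id="b"><span class="my-class">in b</span></div>');
$xpath = new DOMXPath($dom);

$second = $xpath->query('//div[@id="b"]')->item(0);
$found  = findByClass($xpath, $second, 'my-class');

echo $found->textContent, "\n";
```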
If it is only ever one type of element you can replace the * with the particular tag name.
If you need to do a lot of this with very complex selectors I would recommend Zend_Dom_Query, which supports CSS selector syntax (a la jQuery):