
I am trying to display XML content in a table. Everything works, but the tag contains some content that I don't want to display. I only want the image, not this text:

November 2012 calendar from 5.10 The Test

Here is the XML:
 <content:encoded><![CDATA[<p>November 2012 calendar from 5.10 The Test</p>
    <p><a class="shutterset_" href='' title='&lt;br&gt;November 2012 calendar from 5.10 The Test&lt;br&gt; &lt;a href=&quot;</a></p>]]>

I want to display the image, but not the text:

November 2012 calendar from 5.10 The Test

// load SimpleXML (third argument true: the first argument is a file path)
$item = new SimpleXMLElement('test1.xml', 0, true);

echo '<table border="1">';

foreach ($item->channel->item as $boo) { // loop through the feed items
    echo <<<EOF
    <tr>
        <td rowspan="3">{$boo->children('content', true)->encoded}</td>
    </tr>
EOF;
}

echo '</table>';



I answered this once before, but I can no longer find that answer.

If you take a look at the string (simplified/beautified):

    <p>Lorem Ipsum</p>
      <a href='laura-bertram-trance-gemini-145-1080.jpg' 
         title='&lt;br&gt;November 2012 calendar from 5.10 The Test&lt;br&gt; &lt;a href=&quot;</a>

As you can see, the node value of the <content:encoded> element contains HTML. So first you need to obtain that HTML string, which you already do:

$html = $boo->children('content', true)->encoded;
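The line above works because the CDATA section comes back as ordinary text. As a rough Python analogue (xml.etree.ElementTree, not SimpleXML), the sketch below reads each item's <content:encoded> and gets a plain HTML string back. It assumes the feed declares the standard RSS content namespace http://purl.org/rss/1.0/modules/content/ (the question does not show the declaration); the feed, the caption and the photo file names are made up.

```python
import xml.etree.ElementTree as ET

# Assumption: the feed binds the "content" prefix to the standard RSS content module.
CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"

# Hypothetical, trimmed feed shaped like the one in the question.
FEED = """<?xml version="1.0"?>
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <item>
      <content:encoded><![CDATA[<p>Some caption</p><p><a href="photo.jpg"><img src="photo-thumb.jpg"/></a></p>]]></content:encoded>
    </item>
  </channel>
</rss>"""


def encoded_html(feed_xml):
    """Return the HTML string stored in each item's <content:encoded>."""
    root = ET.fromstring(feed_xml)
    # Namespaced tags are addressed as "{namespace-uri}localname" in ElementTree.
    return [item.findtext(f"{{{CONTENT_NS}}}encoded", default="")
            for item in root.iter("item")]


for html in encoded_html(FEED):
    print(html)  # still markup: it has to be parsed again as HTML
```

The result is just a string, which is why a second, HTML-aware parsing step is needed.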

Then you need to parse the HTML inside $html. Which PHP libraries can parse HTML is outlined in:

  • How to parse and process HTML/XML with PHP?

If you decide to use the more or less recommended DOMDocument for the job, you only need to get the attribute value of a certain element:

  • PHP DOMDocument getting Attribute of Tag

Or, for its sister library SimpleXML, which you already use (so this is even more recommended; see also the next section):

  • How to get an attribute with SimpleXML?

In the context of your question, here is a tip:

You're using SimpleXML. DOMDocument is a sister library, meaning you can pass documents between the two, so you don't need to learn an entirely new library.

For example, you can use just the HTML parsing feature of DOMDocument and then import the result into SimpleXML. This is useful because SimpleXML does not support HTML parsing.

That works via simplexml_import_dom().

A simplified step-by-step example:

// get the HTML string out of the feed:
$htmlString = (string) $boo->children('content', true)->encoded;

// create DOMDocument for HTML parsing:
$htmlParser = new DOMDocument();

// load the HTML:
$htmlParser->loadHTML($htmlString);

// import it into simplexml:
$html = simplexml_import_dom($htmlParser);

Now you can use $html as a new SimpleXMLElement that represents the HTML document. Because your HTML chunk has no <body> tag, the parser puts its content inside <body>, as the HTML specification requires. This lets you, for example, access the href attribute of the first <a> inside the second <p> element in your example:

// access the element you're looking for:
$href = $html->body->p[1]->a['href'];
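The question's actual goal is to keep the image and drop the caption text. A minimal Python sketch of that idea, using the standard library's html.parser instead of DOMDocument/SimpleXML, records <img src> and <a href> values and simply ignores text nodes. It assumes the encoded HTML contains <img> tags (the snippet in the question is truncated); ImageCollector, images_only and the sample markup are hypothetical.

```python
from html import escape
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Record <img src> and <a href> values; text nodes are ignored."""

    def __init__(self):
        super().__init__()
        self.images = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        if tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])


def images_only(fragment):
    """Return markup containing only the fragment's images, without its text."""
    parser = ImageCollector()
    parser.feed(fragment)
    parser.close()
    # html.parser unescapes attribute values, so escape them again on output.
    return "".join(f'<img src="{escape(src)}">' for src in parser.images)


sample = '<p>Some caption</p><p><a href="photo.jpg"><img src="photo-thumb.jpg"/></a></p>'
print(images_only(sample))
```

Self-closing <img/> tags also reach handle_starttag, because HTMLParser's default handle_startendtag forwards to it.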

Here is the full example from above:

// get the HTML string out of the feed:
$htmlString = (string) $boo->children('content', true)->encoded;

// create DOMDocument for HTML parsing:
$htmlParser = new DOMDocument();

// your HTML gives parser warnings, keep them internal:
libxml_use_internal_errors(true);

// load the HTML:
$htmlParser->loadHTML($htmlString);

// import it into simplexml:
$html = simplexml_import_dom($htmlParser);

// access the element you're looking for:
$href = $html->body->p[1]->a['href'];

// output it
echo $href, "\n";

And what it outputs:

Wednesday, August 10, 2022

You can suppress warnings with libxml_use_internal_errors() while loading the document, e.g.:

libxml_use_internal_errors(true); // collect parser warnings instead of emitting them
$doc = new DOMDocument();
$doc->loadHTML("<strong>This is an example of a <pseud-template>fake tag</pseud-template></strong>");

If, for some reason, you need access to the warnings, use libxml_get_errors().

Thursday, September 8, 2022

You only have one <Pages> element, so that foreach runs only once. Loop over the <Url> elements instead:

$xml = "<?xml version='1.0'?> ...";  // your XML document (elided)

$xmlparsed = new SimpleXMLElement($xml);
foreach ($xmlparsed->Response->Campaign->Pages->Url as $url) {
    echo $url, PHP_EOL;
}
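The difference between iterating over the single <Pages> element and over its <Url> children shows up in a short Python sketch with xml.etree.ElementTree, used here as a stand-in for SimpleXML. The root name <Root> and the URLs are made up, since the original XML is not shown; only the Response/Campaign/Pages/Url nesting comes from the answer.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: root name and URLs are invented for illustration.
XML = """<?xml version='1.0'?>
<Root>
  <Response>
    <Campaign>
      <Pages>
        <Url>http://example.com/a</Url>
        <Url>http://example.com/b</Url>
      </Pages>
    </Campaign>
  </Response>
</Root>"""

root = ET.fromstring(XML)

# Iterating over <Pages> runs only once: there is a single <Pages> element.
pages = root.findall("./Response/Campaign/Pages")

# Iterating over the <Url> elements visits every URL.
urls = [url.text for url in root.findall("./Response/Campaign/Pages/Url")]

print(len(pages))
for u in urls:
    print(u)
```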


Friday, December 2, 2022
import javax.xml.parsers.*;
import org.w3c.dom.*;

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(f); // f: the java.io.File holding your XML
Element root = doc.getDocumentElement();
NodeList nodeList = doc.getElementsByTagName("player");
for (int i = 0; i < nodeList.getLength(); i++) {
  Node node = nodeList.item(i);
  // do your stuff
}
but I would rather suggest using XPath:

import javax.xml.parsers.*;
import javax.xml.xpath.*;
import org.w3c.dom.*;

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(<uri_as_string>);
XPathFactory xPathfactory = XPathFactory.newInstance();
XPath xpath = xPathfactory.newXPath();
XPathExpression expr = xpath.compile("/GameWorld/player");
NodeList nl = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
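The two Java snippets are not strictly equivalent: getElementsByTagName("player") returns matching elements at any depth, while the XPath /GameWorld/player only matches <player> elements that are direct children of the root. A small Python sketch with xml.etree.ElementTree illustrates the distinction; the document, the <team> element and the player names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical game file: two top-level players and one nested inside a team.
XML = """<GameWorld>
  <player name="alice"/>
  <player name="bob"/>
  <team><player name="carol"/></team>
</GameWorld>"""

root = ET.fromstring(XML)

# Like getElementsByTagName("player"): every <player>, at any depth, in document order.
everywhere = [p.get("name") for p in root.iter("player")]

# Like the XPath /GameWorld/player: only direct children of the <GameWorld> root.
direct = [p.get("name") for p in root.findall("player")]

print(everywhere)
print(direct)
```

If you do want nested players through XPath as well, //player (or ".//player" in ElementTree) matches at any depth.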
Saturday, December 17, 2022

Note that the documentation of the handle_starttag method states:

The tag argument is the name of the tag converted to lower case. The attrs argument is a list of (name, value) pairs containing the attributes found inside the tag’s <> brackets.

So, you're probably looking for something like:

from html.parser import HTMLParser  # Python 3 module name

class MyHTMLParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            for name, value in attrs:
                if name == 'class':
                    print('Found class', value)

p = MyHTMLParser()
p.feed(html)  # html: the HTML document you want to scan


Found class Table_Heading
Found class Table_row
Found class alternat_table_row
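Both documented properties, lowercased tag names and attributes delivered as (name, value) pairs, can be checked with a small recorder. StartTagRecorder and the sample markup are hypothetical; note that html.parser also lowercases attribute names, while attribute values keep their original case.

```python
from html.parser import HTMLParser


class StartTagRecorder(HTMLParser):
    """Store every (tag, attrs) pair that handle_starttag receives."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append((tag, attrs))


recorder = StartTagRecorder()
# Upper-case tag and attribute names on purpose; values keep their case.
recorder.feed('<TR CLASS="Table_row" id=r1><td>cell</td></TR>')
recorder.close()
print(recorder.seen)
```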

P.S. I also recommend BeautifulSoup for parsing HTML with Python.

Friday, September 9, 2022