I have a simple XML document:
<?xml version="1.0"?>
<cellphones>
<telefon>
<model>Easy DB</model>
<proizvodjac>Alcatel</proizvodjac>
<cena>25</cena>
</telefon>
<telefon>
<model>3310</model>
<proizvodjac>Nokia</proizvodjac>
<cena>30</cena>
</telefon>
<telefon>
<model>GF768</model>
<proizvodjac>Ericsson</proizvodjac>
<cena>15</cena>
</telefon>
<telefon>
<model>Skeleton</model>
<proizvodjac>Panasonic</proizvodjac>
<cena>45</cena>
</telefon>
<telefon>
<model>Earl</model>
<proizvodjac>Sharp</proizvodjac>
<cena>60</cena>
</telefon>
</cellphones>
I need to print the content of this file using XML DOM, and it needs to be structured like this:
"model: Easy DB
proizvodjac: Alcatel
cena: 25"
for each node inside the XML.
IT HAS TO BE DONE using XML DOM. That's the problem. I can do it the usual, simple way. But this one bothers me because I can't seem to find any solution on the internet.
This is as far as I can go, but I need to access inside nodes (child nodes) and to get node values. I also want to get rid of some weird string "#text" that comes up out of the blue.
<?php
// create a DOMDocument object
$xmlDoc = new DOMDocument();
// load the XML file into the document object
$xmlDoc->load("poruke.xml");
// assign the root element to a variable
$x = $xmlDoc->documentElement;
// loop over the child nodes and print information about each one
foreach ($x->childNodes AS $item){
print $item->nodeName . " = " . $item->nodeValue . "<br />";
}
?>
Thanks
Explanation for weird #text strings
The weird #text strings don't come out of the blue but are actual Text Nodes. When you load a formatted XML document with DOM, any whitespace (e.g. indenting and linebreaks) and node values will be part of the DOM as DOMText instances by default, e.g.
where E is a DOMElement and T is a DOMText.
To get around that, load the document like this:
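A minimal sketch of such a load, assuming the whitespace handling is controlled through the preserveWhiteSpace property of DOMDocument, which has to be set before the XML is parsed. The sample XML and the variable names are illustrative, not taken from the original answer.

```php
<?php
// Hypothetical sample data: two <telefon> entries, indented for readability.
$xml = <<<'XML'
<cellphones>
  <telefon>
    <model>Easy DB</model>
  </telefon>
  <telefon>
    <model>3310</model>
  </telefon>
</cellphones>
XML;

$dom = new DOMDocument();
$dom->preserveWhiteSpace = false; // must be set before loading
$dom->loadXML($xml);              // or $dom->load('poruke.xml') for a file

// Only the two <telefon> elements remain as children of the root element.
$childCount = $dom->documentElement->childNodes->length;
echo $childCount, "\n"; // 2
```

With the default setting (preserveWhiteSpace left at true) the same count would also include the whitespace-only text nodes between the elements.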
Then your document will be structured as follows
Note that individual nodes representing the value of a DOMElement will still be DOMText instances, but the nodes that control the formatting are gone. More on that later.
Proof
You can test this easily with this code:
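A sketch of what such a test might look like. The sample XML is an assumption (one telefon entry, indented with two and four spaces), and dumpTelefonChildren is a made-up helper name; the output shown in the comments is for preserved whitespace.

```php
<?php
// Hypothetical sample data, indented with two and four spaces.
$xml = <<<'XML'
<cellphones>
  <telefon>
    <model>Easy DB</model>
    <proizvodjac>Alcatel</proizvodjac>
    <cena>25</cena>
  </telefon>
</cellphones>
XML;

/**
 * Returns one "name - type - urlencoded value" line per child node
 * of every <telefon> element (illustrative helper, not a library function).
 */
function dumpTelefonChildren(string $xml, bool $preserveWhiteSpace): array
{
    $dom = new DOMDocument();
    $dom->preserveWhiteSpace = $preserveWhiteSpace; // vary this
    $dom->loadXML($xml);

    $lines = [];
    foreach ($dom->getElementsByTagName('telefon') as $telefon) {
        foreach ($telefon->childNodes as $child) {
            $lines[] = sprintf(
                '%s - %s - %s',
                $child->nodeName,
                $child->nodeType,
                urlencode($child->nodeValue)
            );
        }
    }
    return $lines;
}

$lines = dumpTelefonChildren($xml, true);
echo implode("\n", $lines), "\n";
// #text - 3 - %0A++++
// model - 1 - Easy+DB
// #text - 3 - %0A++++
// proizvodjac - 1 - Alcatel
// #text - 3 - %0A++++
// cena - 1 - 25
// #text - 3 - %0A++
```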
This code runs through all the telefon elements in your given XML and prints out the node name, type and urlencoded node value of its child nodes. When you preserve the whitespace, you will get something like
The reason I urlencoded the value is to show that there are in fact DOMText nodes containing the indenting and the linebreaks in your DOMDocument. %0A is a linebreak, while each + is a space.
When you compare this with your XML, you will see there is a line break after each <telefon> element followed by four spaces until the <model> element starts. Likewise, there is only a newline and two spaces between the closing </cena> and the closing </telefon>.
The given type for these nodes is 3, which, according to the list of predefined constants, is XML_TEXT_NODE, i.e. a DOMText node. Lacking a proper element name, these nodes have a name of #text.
Disregarding Whitespace
Now, when you disable preservation of whitespace, the above will output:
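As an illustration, here is a sketch of the same kind of dump with whitespace preservation turned off. The sample XML and variable names are assumptions made for this example.

```php
<?php
// Hypothetical sample data, indented with two and four spaces.
$xml = <<<'XML'
<cellphones>
  <telefon>
    <model>Easy DB</model>
    <proizvodjac>Alcatel</proizvodjac>
    <cena>25</cena>
  </telefon>
</cellphones>
XML;

$dom = new DOMDocument();
$dom->preserveWhiteSpace = false; // drop whitespace-only text nodes
$dom->loadXML($xml);

$rows = [];
$telefon = $dom->getElementsByTagName('telefon')->item(0);
foreach ($telefon->childNodes as $child) {
    $rows[] = sprintf(
        '%s - %s - %s',
        $child->nodeName,
        $child->nodeType,
        urlencode($child->nodeValue)
    );
}

echo implode("\n", $rows), "\n";
// model - 1 - Easy+DB
// proizvodjac - 1 - Alcatel
// cena - 1 - 25
```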
As you can see, there are no more #text nodes, but only type 1 nodes, which means XML_ELEMENT_NODE, i.e. DOMElement.
DOMElements contain DOMText nodes
In the beginning I said that the values of DOMElements are DOMText instances too. But in the output above, they are nowhere to be seen. That's because we are accessing the nodeValue property, which returns the value of the DOMText as a string. We can prove that the value is a DOMText easily though:
will output
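One way such a check could look, as a sketch: it inspects the first child of a model element and prints its class. The one-line sample document and the variable names are assumptions for this example.

```php
<?php
$dom = new DOMDocument();
$dom->preserveWhiteSpace = false;
// Hypothetical one-element sample document.
$dom->loadXML('<telefon><model>Easy DB</model></telefon>');

$model = $dom->getElementsByTagName('model')->item(0);

// The value of <model> is stored in a child node of its own.
$valueClass = get_class($model->firstChild);
echo $valueClass, "\n";                   // DOMText
echo $model->firstChild->nodeValue, "\n"; // Easy DB
```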
And this proves that a DOMElement contains its value as a DOMText and nodeValue is just returning the content of the DOMText directly.
More on nodeValue
In fact, nodeValue is smart enough to concatenate the contents of any DOMText children:
will output
although these are really the combined values of
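To illustrate the concatenation, here is a sketch using a small mixed-content document. The markup is invented for this example and is not from the question; the loop lists the individual child nodes that nodeValue combines.

```php
<?php
$dom = new DOMDocument();
// Hypothetical mixed content: text, an inline element, then more text.
$dom->loadXML('<root><p>Hello <em>World</em>!!!</p></root>');

$p = $dom->getElementsByTagName('p')->item(0);
$combined = $p->nodeValue;
echo $combined, "\n"; // Hello World!!!

// The same value, broken down into the child nodes it is built from.
$parts = [];
foreach ($p->childNodes as $child) {
    $parts[] = get_class($child) . ': "' . $child->nodeValue . '"';
}
echo implode("\n", $parts), "\n";
// DOMText: "Hello "
// DOMElement: "World"
// DOMText: "!!!"
```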
Printing content of an XML file using XML DOM
To finally answer your question, look at the first test code. Everything you need is in there. And of course by now you have been given other fine answers too.
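For completeness, a sketch of how the pieces could be combined to print the requested "name: value" format. It assumes the XML is available as a string (for the file in the question, $dom->load('poruke.xml') would take the place of loadXML) and uses plain newlines instead of <br />; the shortened sample data and variable names are illustrative.

```php
<?php
// Hypothetical excerpt of the question's XML (first two entries only).
$xml = <<<'XML'
<cellphones>
  <telefon>
    <model>Easy DB</model>
    <proizvodjac>Alcatel</proizvodjac>
    <cena>25</cena>
  </telefon>
  <telefon>
    <model>3310</model>
    <proizvodjac>Nokia</proizvodjac>
    <cena>30</cena>
  </telefon>
</cellphones>
XML;

$dom = new DOMDocument();
$dom->preserveWhiteSpace = false; // avoid the whitespace-only #text nodes
$dom->loadXML($xml);

$out = '';
foreach ($dom->getElementsByTagName('telefon') as $telefon) {
    foreach ($telefon->childNodes as $child) {
        // Extra guard: skip anything that is not an element.
        if ($child->nodeType === XML_ELEMENT_NODE) {
            $out .= $child->nodeName . ': ' . $child->nodeValue . "\n";
        }
    }
    $out .= "\n"; // blank line between entries
}

echo $out;
// model: Easy DB
// proizvodjac: Alcatel
// cena: 25
//
// model: 3310
// proizvodjac: Nokia
// cena: 30
```

The nodeType check makes the loop work even when whitespace is preserved, so either safeguard alone would be enough here.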