Viewed   59 times

I'm building an XML file from scratch and need to know if htmlentities() converts every character that could potentially break an XML file (and possibly UTF-8 data)?

The values will be from a twitter/flickr feed, so I need to be sure-

 Answers

4

htmlentities() is not a guaranteed way to build legal XML.

Use htmlspecialchars() instead of htmlentities() if this is all you are worried about. If you have encoding mismatches between the representation of your data and the encoding of your XML document, htmlentities() may serve to work around/cover them up (it will bloat your XML size in doing so). I believe it's better to get your encodings consistent and just use htmlspecialchars().

Also, be aware that if you pump the return value of htmlspecialchars() inside XML attributes delimited with single quotes, you will need to pass the ENT_QUOTES flag as well so that any single quotes in your source string are properly encoded as well. I suggest doing this anyway, as it makes your code immune to bugs resulting from someone using single quotes for XML attributes in the future.

Edit: To clarify:

htmlentities() will convert a number of non-ANSI characters (I assume this is what you mean by UTF-8 data) to entities (which are represented with just ANSI characters). However, it cannot do so for any characters which do not have a corresponding entity, and so cannot guarantee that its return value consists only of ANSI characters. That's why I 'm suggesting to not use it.

If encoding is a possible issue, handle it explicitly (e.g. with iconv()).

Edit 2: Improved answer taking into account Josh Davis's comment belowis .

Wednesday, August 31, 2022
1

For those of you not using the PEAR packages, but you've got PHP5 installed. This worked for me:

/**
 * Build A XML Data Set
 *
 * @param array $data Associative Array containing values to be parsed into an XML Data Set(s)
 * @param string $startElement Root Opening Tag, default fx_request
 * @param string $xml_version XML Version, default 1.0
 * @param string $xml_encoding XML Encoding, default UTF-8
 * @return string XML String containig values
 * @return mixed Boolean false on failure, string XML result on success
 */
public function buildXMLData($data, $startElement = 'fx_request', $xml_version = '1.0', $xml_encoding = 'UTF-8') {
    if(!is_array($data)) {
        $err = 'Invalid variable type supplied, expected array not found on line '.__LINE__." in Class: ".__CLASS__." Method: ".__METHOD__;
        trigger_error($err);
        if($this->_debug) echo $err;
        return false; //return false error occurred
    }
    $xml = new XmlWriter();
    $xml->openMemory();
    $xml->startDocument($xml_version, $xml_encoding);
    $xml->startElement($startElement);

    /**
     * Write XML as per Associative Array
     * @param object $xml XMLWriter Object
     * @param array $data Associative Data Array
     */
     function write(XMLWriter $xml, $data) {
         foreach($data as $key => $value) {
             if(is_array($value)) {
                 $xml->startElement($key);
                 write($xml, $value);
                 $xml->endElement();
                 continue;
             }
             $xml->writeElement($key, $value);
         }
     }
     write($xml, $data);

     $xml->endElement();//write end element
     //Return the XML results
     return $xml->outputMemory(true); 
}
Monday, October 24, 2022
 
3

Nodes are accessed as object properties, attributes use the array notation. foreach lets you iterate over nodes. You can get the content of a node by casting it as a string. (so if you use echo it's implied)

$shrs = simplexml_load_string($xml);

foreach ($shrs->rs->r as $r)
{
    $jobTitle = $r->jt;
    $city = $r->loc['cty'];

    echo "There's an offer for $jobTitle in $city<br />n";
}
Wednesday, September 21, 2022
 
3

Have a look at Jeroen Pluimers Sessions at CodeRage 4

called Practical XML in Delphi

"Starting with the XML basics, learn about well formed and valid documents, encoding, and recoding and XSD validation. See examples in Delphi for Win32 and Delphi Prism showing you which tool to choose when. Finally, learn where things can go wrong and how to prevent that: improper but well formed XML, copying data between XML documents, convert XML to tables and objects, etc."

Tuesday, October 4, 2022
 
elph
 
1

The absolute fastest way to query an XML document is the hardest: write a method that uses an XmlReader to process the input stream, and have it process nodes as it reads them. This is the way to combine parsing and querying into a single operation. (Simply using XPath doesn't do this; both XmlDocument and XPathDocument parse the document in their Load methods.) This is usually only a good idea if you're processing extremely large streams of XML data.

All three methods you've describe perform similarly. XSLT has a lot of room to be the slowest of the lot, because it lets you combine the inefficiencies of XPath with the inefficiencies of template matching. XPath and LINQ queries both do essentially the same thing, which is linear searching through enumerable lists of XML nodes. I would expect LINQ to be marginally faster in practice because XPath is interpreted at runtime while LINQ is interpreted at compile-time.

But in general, how you write your query is going to have a much greater impact on execution speed than what technology you use.

The way to write fast queries against XML documents is the same whether you're using XPath or LINQ: formulate the query so that as few nodes as possible get visited during its execution. It doesn't matter which technology you use: a query that examines every node in the document is going to run a lot slower than one that examines only a small subset of them. Your ability to do that is more dependent on the structure of the XML than anything else: a document with a navigable hierarchy of elements is generally going to be a lot faster to query than one whose elements are all children of the document element.

Edit:

While I'm pretty sure I'm right that the absolute fastest way to query an XML is the hardest, the real fastest (and hardest) way doesn't use an XmlReader; it uses a state machine that directly processes characters from a stream. Like parsing XML with regular expressions, this is ordinarily a terrible idea. But it does give you the option of exchanging features for speed. By deciding not to handle those pieces of XML that you don't need for your application (e.g. namespace resolution, expansion of character entities, etc.) you can build something that will seek through a stream of characters faster than an XmlReader would. I can think of applications where this is even not a bad idea, though there I can't think of many.

Monday, October 10, 2022
Only authorized users can answer the search term. Please sign in first, or register a free account.
Not the answer you're looking for? Browse other questions tagged :