"fix malformed xml in php before processing using domdocument functions" Code Answer


Try using the Tidy library, which can clean up bad HTML and XML: http://php.net/manual/en/book.tidy.php
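
As a rough sketch of the Tidy route, assuming the tidy extension is installed: the input-xml and output-xml options ask Tidy to treat the document as XML rather than HTML. The helper name tidyRepairXml is made up for illustration, and whether Tidy repairs a particular kind of damage (a bare < in text, say) depends on the input, so check the result before relying on it.

```php
<?php
// Hypothetical helper (not part of the tidy extension): returns Tidy's
// repaired XML, or null when ext-tidy is unavailable or the repair fails.
function tidyRepairXml(string $xml): ?string
{
    if (!function_exists('tidy_repair_string')) {
        return null; // ext-tidy is not loaded
    }
    $config = [
        'input-xml'  => true, // parse the input as XML, not HTML
        'output-xml' => true, // serialize the result as XML
        'wrap'       => 0,    // do not re-wrap long lines
    ];
    $repaired = tidy_repair_string($xml, $config, 'utf8');
    return is_string($repaired) ? $repaired : null;
}

$fixed = tidyRepairXml('<feed><NAME>This & This</NAME></feed>');
if ($fixed === null) {
    echo "Tidy unavailable or repair failed\n";
} else {
    echo $fixed, "\n";
}
```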

A pure PHP solution to fix some XML like this:

<?xml version="1.0"?>
<feed>
<RECORD>
<ID>117387</ID>
<ADVERTISERNAME>Test < texter</ADVERTISERNAME>
<AID>10544740</AID>
<NAME>This & This</NAME>
<DESCRIPTION>For one day only this is > than this.</DESCRIPTION>
</RECORD>
</feed>

Would be something like this:

  function cleanupXML($xml) {
    $xmlOut = '';
    $inTag = false;
    $xmlLen = strlen($xml);
    for ($i = 0; $i < $xmlLen; ++$i) {
        $char = $xml[$i];
        switch ($char) {
        case '<':
          if (!$inTag) {
              // Seek forward for the next tag boundary
              $isTag = false;  // Stays false if the input ends first
              for ($j = $i + 1; $j < $xmlLen; ++$j) {
                 switch ($xml[$j]) {
                 case '<':  // Means a < in text
                   break 2;
                 case '>':  // Means we are in a tag
                   $isTag = true;
                   break 2;
                 }
              }
              if ($isTag) {
                 $inTag = true;
              } else {
                 $char = '&lt;';
              }
          } else {
             $char = '&lt;';
          }
          break;
        case '>':
          if (!$inTag) {  // No need to seek ahead here
             $char = '&gt;';
          } else {
             $inTag = false;
          }
          break;
        case '&':
          // Note: an already-escaped entity such as &amp; is escaped again
          if (!$inTag) {
             $char = '&amp;';
          }
          break;
        }
        $xmlOut .= $char;
    }
    return $xmlOut;
  }

This is a simple state machine: it tracks whether we are inside a tag and, when we are not, escapes <, > and & in the text. It writes the three XML entities directly instead of calling htmlentities() on each byte, since htmlentities() targets HTML and does not handle a multibyte character that has been split into single bytes.
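
Since the goal is to hand the result to DOMDocument, it can help to check whether a string parses at all, both to decide if cleanup is needed and to confirm that it worked. A minimal sketch, assuming the dom and libxml extensions are enabled; the helper name isWellFormedXml is hypothetical:

```php
<?php
// Hypothetical helper: true when libxml can parse $xml without a fatal error.
function isWellFormedXml(string $xml): bool
{
    if ($xml === '') {
        return false; // loadXML() rejects an empty string outright
    }
    // Collect parse errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $ok = $doc->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the caller's setting
    return $ok;
}

var_dump(isWellFormedXml('<feed><NAME>This & This</NAME></feed>'));     // bool(false)
var_dump(isWellFormedXml('<feed><NAME>This &amp; This</NAME></feed>')); // bool(true)
```

Running the check on the input first also avoids escaping a feed that is already well formed.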

Note that this holds the whole input and the whole output as strings in memory, so it will be memory hungry on large files; for those you may want to rewrite it as a stream filter or a pre-processor.
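
One possible shape for such a pre-processor is sketched below: it reads the feed one line at a time and passes each line through a cleaner callback, so only a single line is in memory at once. It assumes no tag spans a line break (true for the sample feed above), and the name cleanupXmlStream and the stand-in cleaner are illustrative only.

```php
<?php
// Hypothetical helper: copies $in to $out line by line, passing each line
// through $cleaner. Assumes no tag is split across a line break.
function cleanupXmlStream($in, $out, callable $cleaner): void
{
    while (($line = fgets($in)) !== false) {
        fwrite($out, $cleaner($line));
    }
}

// Demo with in-memory streams and a stand-in cleaner that only escapes
// ampersands; in practice the cleanup function would be passed instead.
$in = fopen('php://memory', 'w+');
fwrite($in, "<NAME>This & This</NAME>\n<ID>1</ID>\n");
rewind($in);
$out = fopen('php://memory', 'w+');

cleanupXmlStream($in, $out, function (string $line): string {
    return str_replace('&', '&amp;', $line);
});

rewind($out);
echo stream_get_contents($out);
```

For real files, the two streams would come from fopen() on the source and destination paths, and the callback could be 'cleanupXML'.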

By justinlam0566-4ca628f70a5c on October 6 2022
