
I need to parse some HTML files; however, they are not well-formed and PHP prints out warnings too. I want to avoid such debugging/warning behavior programmatically. Please advise. Thank you!

Code:

// create a DOM document and load the HTML data
$xmlDoc = new DomDocument;
// this dumps out the warnings
$xmlDoc->loadHTML($fetchResult);

This:

@$xmlDoc->loadHTML($fetchResult);

can suppress the warnings, but how can I capture those warnings programmatically?

 Answers

2

You can install a temporary error handler with set_error_handler:

class ErrorTrap {
  protected $callback;
  protected $errors = array();
  function __construct($callback) {
    $this->callback = $callback;
  }
  // Calls the wrapped callback with the arguments given to call(),
  // recording every PHP error/warning raised while it runs.
  function call() {
    set_error_handler(array($this, 'onError'));
    try {
      return call_user_func_array($this->callback, func_get_args());
    } finally {
      // Restore the previous handler even if the callback throws
      // (this also covers PHP 7+ Error objects, which catch (Exception) misses).
      restore_error_handler();
    }
  }
  function onError($errno, $errstr, $errfile, $errline) {
    $this->errors[] = array($errno, $errstr, $errfile, $errline);
    // Returning true marks the error as handled, so PHP's standard handler is skipped.
    return true;
  }
  function ok() {
    return count($this->errors) === 0;
  }
  function errors() {
    return $this->errors;
  }
}

Usage:

// create a DOM document and load the HTML data
$xmlDoc = new DomDocument();
$caller = new ErrorTrap(array($xmlDoc, 'loadHTML'));
// this doesn't dump out any warnings
$caller->call($fetchResult);
if (!$caller->ok()) {
  var_dump($caller->errors());
}
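An alternative that avoids a custom error handler altogether is libxml's own error buffering (a different technique from the one above): libxml_use_internal_errors(true) stops the parser's complaints from being emitted as PHP warnings, and libxml_get_errors() returns them as LibXMLError objects. A minimal sketch, assuming a short malformed string stands in for $fetchResult:

```php
<?php
// Stand-in for $fetchResult: deliberately malformed HTML (unclosed <b>, stray </div>).
$fetchResult = '<html><body><p>Hello <b>world</p></div></body></html>';

// Buffer libxml errors instead of emitting them as PHP warnings;
// remember the previous setting so it can be restored afterwards.
$previous = libxml_use_internal_errors(true);

$xmlDoc = new DOMDocument();
$xmlDoc->loadHTML($fetchResult);

// Collect the buffered errors, then clear the buffer and restore the old setting.
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

foreach ($errors as $error) {
    // Each entry is a LibXMLError with level, code, column, line and message fields.
    printf("line %d: %s\n", $error->line, trim($error->message));
}
```

Restoring the previous libxml_use_internal_errors() setting keeps the change local to this one parse, in the same spirit as restore_error_handler() in ErrorTrap.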
Friday, November 25, 2022

If you want to get:

  • The text
  • that's inside a <div> tag with class="text"
  • that's, itself, inside a <div> with class="main"

I would say the easiest way is not to use DOMDocument::getElementsByTagName, which returns all tags with a given name (while you only want some of them).

Instead, I would use an XPath query on your document, using the DOMXPath class.


For example, something like this should do to load the HTML string into a DOM object and instantiate the DOMXPath class:

$html = <<<HTML
<div class="main">
    <div class="text">
    Capture this text 1
    </div>
</div>

<div class="main">
    <div class="text">
    Capture this text 2
    </div>
</div>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);


Then you can run XPath queries with the DOMXPath::query method, which returns the list of elements you were searching for:

$tags = $xpath->query('//div[@class="main"]/div[@class="text"]');
foreach ($tags as $tag) {
    var_dump(trim($tag->nodeValue));
}


Executing this gives me the following output:

string 'Capture this text 1' (length=19)
string 'Capture this text 2' (length=19)
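One caveat: @class="text" compares the whole attribute value, so an element written as class="text highlight" would not match. A sketch of the same query using the class-token idiom contains(concat(' ', normalize-space(@class), ' '), ' name '); the extra classes "highlight" and "wide" in the sample markup are hypothetical, added only to show the difference:

```php
<?php
// Same structure as above, but some divs carry an extra (hypothetical) class.
$html = <<<HTML
<div class="main">
    <div class="text highlight">Capture this text 1</div>
</div>
<div class="main wide">
    <div class="text">Capture this text 2</div>
</div>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Build a predicate that matches $name as one whitespace-separated class token.
$hasClass = function ($name) {
    return "contains(concat(' ', normalize-space(@class), ' '), ' $name ')";
};

$query = '//div[' . $hasClass('main') . ']/div[' . $hasClass('text') . ']';
$texts = array();
foreach ($xpath->query($query) as $tag) {
    $texts[] = trim($tag->nodeValue);
}
print_r($texts);
```

On this markup the exact-match query //div[@class="main"]/div[@class="text"] finds nothing, while the token-based one finds both inner divs.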
Monday, December 5, 2022

Is this what you are looking for?

    $result = array();

    $doc = <<< HTML
    <html>
        <body>
            <div>1
                <span>2</span>
            </div>
            <div>3</div>
            <div>4
                <span class="class1"><strong>5</strong></span>
                <span class="class1"><strong>6</strong></span>
                <span>7</span>
            </div>
        </body>
    </html>
HTML;
    $classname = "class1";
    $domdocument = new DOMDocument();
    $domdocument->loadHTML($doc);
    $a = new DOMXPath($domdocument);
    $spans = $a->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]");

    for ($i = $spans->length - 1; $i > -1; $i--) {
        $result[] = $spans->item($i)->firstChild->nodeValue;
    }

    echo "<pre>";
    print_r($result);
    exit();
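The reverse for loop above collects the matches last-to-first (for this input $result holds "6", then "5"). If document order is wanted, iterating the DOMNodeList with foreach is enough. A sketch wrapped in a helper; the name textByClass() is hypothetical, and it uses textContent so nested markup does not depend on firstChild:

```php
<?php
// Hypothetical helper: return the text of every element whose class list
// contains $classname, in document order.
function textByClass($html, $classname) {
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc);
    $query = "//*[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]";
    $texts = array();
    foreach ($xpath->query($query) as $node) {
        // textContent gathers all descendant text, not just the first child's.
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

$html = '<div>4<span class="class1"><strong>5</strong></span>'
      . '<span class="class1"><strong>6</strong></span><span>7</span></div>';
print_r(textByClass($html, 'class1'));
```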
Thursday, December 15, 2022
 
aminner
 

Replace

dateforService = $dateforService 

with

dateforService = '$dateforService'

And, just in case, replace

jobrequestnumber = '$jobrequestnumber' // remove the { and }: you are not using an array here. Also add quotes around it.
Thursday, October 20, 2022
PostBackUrl='<%# "~/Add/CheckMovie.aspx?movie=" + Eval("mov_id") %>'
Saturday, August 6, 2022
 