Asked  2 Years ago    Answers:  5   Viewed   373 times

Using CURL to get content from website. Getting response in object. How to convert that object in to PHP Simple HTML DOM Parser

function get_data($url) 
{
    $ch = curl_init();
    $timeout = 30;
    curl_setopt($ch,CURLOPT_URL,$url);
    curl_setopt($ch,CURLOPT_RETURNTRANSFER,false);
    curl_setopt($ch,CURLOPT_CONNECTTIMEOUT,$timeout);
    curl_setopt($ch,CURLOPT_POST,false);
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:10.0) Gecko/20100101 Firefox/10.0");
    //curl_exec($ch);
    $dom = new simple_html_dom(curl_exec($ch));
    print_r( $dom );
    curl_close($ch);
    return $data;
}
$url = 'http://www.example.com';
$data = get_data($url);

?>

Result

simple_html_dom Object ( [root] => simple_html_dom_node Object ( [nodetype] => 5 [tag] => root [attr] => Array ( ) [children] => Array ( ) [nodes] => Array ( [0] => simple_html_dom_node Object ( [nodetype] => 3 [tag] => text [attr] => Array ( ) [children] => Array ( ) [nodes] => Array ( ) [parent] => simple_html_dom_node Object *RECURSION* [_] => Array ( [4] => 1 ) [tag_start] => 0 [dom:private] => simple_html_dom Object *RECURSION* ) ) [parent] => [_] => Array ( [0] => -1 [1] => 2 ) [tag_start] => 0 [dom:private] => simple_html_dom Object *RECURSION* ) [nodes] => Array ( [0] => simple_html_dom_node Object ( [nodetype] => 5 [tag] => root [attr] => Array ( ) [children] => Array ( ) [nodes] => Array ( [0] => simple_html_dom_node Object ( [nodetype] => 3 [tag] => text [attr] => Array ( ) [children] => Array ( ) [nodes] => Array ( ) [parent] => simple_html_dom_node Object *RECURSION* [_] => Array ( [4] => 1 ) [tag_start] => 0 [dom:private] => simple_html_dom Object *RECURSION* ) ) [parent] => [_] => Array ( [0] => -1 [1] => 2 ) [tag_start] => 0 [dom:private] => simple_html_dom Object *RECURSION* ) [1] => simple_html_dom_node Object ( [nodetype] => 3 [tag] => text [attr] => Array ( ) [children] => Array ( ) [nodes] => Array ( ) [parent] => simple_html_dom_node Object ( [nodetype] => 5 [tag] => root [attr] => Array ( ) [children] => Array ( ) [nodes] => Array ( [0] => simple_html_dom_node Object *RECURSION* ) [parent] => [_] => Array ( [0] => -1 [1] => 2 ) [tag_start] => 0 [dom:private] => simple_html_dom Object *RECURSION* ) [_] => Array ( [4] => 1 ) [tag_start] => 0 [dom:private] => simple_html_dom Object *RECURSION* ) ) [callback] => [lowercase] => 1 [original_size] => 1 [size] => 1 [pos:protected] => 1 [char:protected] => [cursor:protected] => 2 [parent:protected] => simple_html_dom_node Object ( [nodetype] => 5 [tag] => root [attr] => Array ( ) [children] => Array ( ) [nodes] => Array ( [0] => simple_html_dom_node Object ( [nodetype] => 3 [tag] => text [attr] => Array ( ) [children] => Array ( ) [nodes] => Array ( ) [parent] => simple_html_dom_node Object *RECURSION* [_] => Array ( [4] => 1 ) [tag_start] => 0 [dom:private] => simple_html_dom Object *RECURSION* ) ) [parent] => [_] => Array ( [0] => -1 [1] => 2 ) [tag_start] => 0 [dom:private] => simple_html_dom Object *RECURSION* ) [token_blank:protected] => [token_equal:protected] => =/> [token_slash:protected] => /> [token_attr:protected] => > [_charset] => UTF-8 [_target_charset] => UTF-8 [default_br_text:protected] => [default_span_text] => [self_closing_tags:protected] => Array ( [img] => 1 [br] => 1 [input] => 1 [meta] => 1 [link] => 1 [hr] => 1 [base] => 1 [embed] => 1 [spacer] => 1 ) [block_tags:protected] => Array ( [root] => 1 [body] => 1 [form] => 1 [div] => 1 [span] => 1 [table] => 1 ) [optional_closing_tags:protected] => Array ( [tr] => Array ( [tr] => 1 [td] => 1 [th] => 1 ) [th] => Array ( [th] => 1 ) [td] => Array ( [td] => 1 ) [li] => Array ( [li] => 1 ) [dt] => Array ( [dt] => 1 [dd] => 1 ) [dd] => Array ( [dd] => 1 [dt] => 1 ) [dl] => Array ( [dd] => 1 [dt] => 1 ) [p] => Array ( [p] => 1 ) [nobr] => Array ( [nobr] => 1 ) [b] => Array ( [b] => 1 ) [option] => Array ( [option] => 1 ) ) [doc:protected] => 1 [noise:protected] => Array ( ) ) 


 Answers

3

You're not creating the DOM correctly, you must do it like this:

// Create a DOM object
$dom = new simple_html_dom();
// Load HTML from a string
$dom->load(curl_exec($ch))

print_r( $dom );

Check the Manual for more details...

Edit

It seems that is a cURL settings problem, please refer to the documentation to configure it correctly...

This is a function I usualy use to download some pages, feel free to adjust it to your needs:

function dlPage($href) {

    $curl = curl_init();
    curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, FALSE);
    curl_setopt($curl, CURLOPT_HEADER, false);
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($curl, CURLOPT_URL, $href);
    curl_setopt($curl, CURLOPT_REFERER, $href);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.125 Safari/533.4");
    $str = curl_exec($curl);
    curl_close($curl);

    // Create a DOM object
    $dom = new simple_html_dom();
    // Load HTML from a string
    $dom->load($str);

    return $dom;
    }

$url = 'http://www.example.com/';
$data = dlPage($url);
print_r($data);
Sunday, August 14, 2022
2
$newimage->data-original;

means

$newimage->data - original;

A way round this is to try:

$property = 'data-original';
$newimage->$property;

or, to use the alternative syntax:

$newimage['data-original'];
Friday, November 4, 2022
 
3

Found the answer to my question with help from user sms who commented above. This php pulls the data from the first table and encodes it in the JSON format.

<?php

include('simple_html_dom.php');
header('Content-Type: application/json');

$html = file_get_html('http://www.1011now.com/weather/closings');
$row_count=0;
$json = array();

// Find all links 
$table = $html->find('table', 0);
foreach($table->find('tr') as $row) {
    $name = $row->find('td',0)->innertext;
    $status = $row->find('td',1)->innertext;

    $json[] = [ 'name' => strip_tags($name), 'status' => strip_tags($status)];
}

$options = array(
    'http' => array(
    'method'  => 'POST',
    'content' => json_encode(array('Closings' =>$json)),
    'header'=>  "Content-Type: application/jsonrn" .
                "Accept: application/jsonrn"
    )
);

$context  = stream_context_create( $options );
$result = file_get_contents( $url, false, $context );
$response = json_decode( $result );

echo json_encode(array('Closings' =>$json), JSON_PRETTY_PRINT);  


?>
Monday, November 14, 2022
2

The find function. http://simplehtmldom.sourceforge.net/manual.htm#section_find

$content = file_get_html($link);

$elems = $content->find("/html/body/div/div");
Wednesday, December 21, 2022
5

No, simple html dom doesn't do dom manipulation. With phpquery though you can do:

$doc->find('head')->append('<script src="foo"></script>');
Thursday, October 27, 2022
 
Only authorized users can answer the search term. Please sign in first, or register a free account.
Not the answer you're looking for? Browse other questions tagged :
 

Browse Other Code Languages