I get the HTML from another site with file_get_contents. My question is: how can I get the value of a specific tag?
Let's say I have:
<div id="global"><p class="paragraph">1800</p></div>
How can I get the paragraph's value? Thanks.
A few years ago I benchmarked the two and cURL was faster. With cURL you create one handle that can be reused for every request, and it maps directly onto the very fast libcurl library. With file_get_contents you have the overhead of the protocol wrappers, and the initialization code gets executed for every single request.
I will dig out my benchmark script and run it on PHP 5.3, but I suspect that cURL will still be faster.
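To make the "one handle for every request" point concrete, here is a minimal sketch, not the benchmark script itself. The function name `fetchAll`, the option values, and the example URLs are assumptions for illustration; it needs the cURL extension.

```php
<?php
// Hypothetical helper: fetch several URLs while reusing a single cURL handle.
function fetchAll(array $urls): array
{
    if (!$urls) {
        return []; // nothing to fetch, so no handle is created
    }

    $ch = curl_init();
    // Options set once stay in effect for every request on this handle.
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_TIMEOUT        => 10,   // seconds; an arbitrary choice
    ]);

    $bodies = [];
    foreach ($urls as $url) {
        curl_setopt($ch, CURLOPT_URL, $url); // only the URL changes per request
        $body = curl_exec($ch);
        $bodies[$url] = ($body === false) ? null : $body;
    }

    return $bodies; // the handle is released when $ch goes out of scope
}

// Usage (performs real network requests, so it is left commented out):
// $pages = fetchAll(['http://example.com/a', 'http://example.com/b']);
```

Reusing the handle also lets libcurl keep connections to the same host open between requests, which is probably part of the difference, but measure it on your own setup.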
In your URL, try:
http://user:password@host/
(replace user, password and host with your own values, and append whatever the rest of the URL for your API should be)
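If the username or password contains characters such as @ or :, they have to be percent-encoded before going into the URL. A small sketch, where `buildAuthUrl`, the host `api.example.com` and the credentials are all made up for illustration:

```php
<?php
// Hypothetical helper: put HTTP Basic credentials into the userinfo part of a URL.
function buildAuthUrl(string $user, string $pass, string $host, string $path = '/'): string
{
    // rawurlencode() escapes "@", ":" and "/" so they cannot be mistaken for URL delimiters.
    return 'http://' . rawurlencode($user) . ':' . rawurlencode($pass) . '@' . $host . $path;
}

$url = buildAuthUrl('user', 'p@ss:word', 'api.example.com', '/v1/items');
echo $url, "\n";

// Network call left commented out; the HTTP wrapper should turn the
// credentials into an Authorization: Basic header.
// $response = file_get_contents($url);
```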
Hooray!!!
I found this source code:
1) create Readability.php
2) create JSLikeHTMLElement.php
3) create index.php with this code:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>
<title>!</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
</head>
<body dir="rtl">
<?php
include_once 'Readability.php';
// get latest Medialens alert
// (change this URL to whatever you'd like to test)
$url = 'http://';
$html = file_get_contents($url);
// Note: PHP Readability expects UTF-8 encoded content.
// If your content is not UTF-8 encoded, convert it
// first before passing it to PHP Readability.
// Both iconv() and mb_convert_encoding() can do this.
// If we've got Tidy, let's clean up input.
// This step is highly recommended - PHP's default HTML parser
// often doesn't do a great job and results in strange output.
if (function_exists('tidy_parse_string')) {
$tidy = tidy_parse_string($html, array(), 'UTF8');
$tidy->cleanRepair();
$html = $tidy->value;
}
// give it to Readability
$readability = new Readability($html, $url);
// print debug output?
// useful to compare against Arc90's original JS version -
// simply click the bookmarklet with FireBug's console window open
$readability->debug = false;
// convert links to footnotes?
$readability->convertLinksToFootnotes = true;
// process it
$result = $readability->init();
// does it look like we found what we wanted?
if ($result) {
echo "== Title =====================================\n";
echo $readability->getTitle()->textContent, "\n\n";
echo "== Body ======================================\n";
$content = $readability->getContent()->innerHTML;
// if we've got Tidy, let's clean it up for output
if (function_exists('tidy_parse_string')) {
$tidy = tidy_parse_string($content, array('indent'=>true, 'show-body-only' => true), 'UTF8');
$tidy->cleanRepair();
$content = $tidy->value;
}
echo $content;
} else {
echo "Looks like we couldn't find the content. :(";
}
?>
</body>
</html>
In the line $url = 'http://'; put the URL of your site.
Thank you;)
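The comments in index.php say the content must be UTF-8 and that iconv() can convert it. A minimal sketch of that step, assuming you already know the source charset (the helper name `toUtf8` and the ISO-8859-1 sample are assumptions, not part of PHP Readability):

```php
<?php
// Hypothetical helper: return $html as UTF-8, converting from $sourceCharset if needed.
function toUtf8(string $html, string $sourceCharset): string
{
    // An empty pattern with the /u modifier matches only if the subject is valid UTF-8.
    if (preg_match('//u', $html) === 1) {
        return $html; // already UTF-8, leave it alone
    }

    $converted = iconv($sourceCharset, 'UTF-8', $html);

    // iconv() returns false on failure; fall back to the original bytes in that case.
    return ($converted === false) ? $html : $converted;
}

// "caf\xE9" is "café" encoded as ISO-8859-1.
$html = toUtf8("<p>caf\xE9</p>", 'ISO-8859-1');
```

You would call this on $html right after file_get_contents() and before the Tidy step.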
As per the manual, set ignore_errors to true:
$opts = array(
'http' => array(
'method' => "GET",
'header' => "Accept-language: en\r\n",
'ignore_errors' => true
)
);
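For completeness, a sketch of how such options might be used. With ignore_errors set, file_get_contents returns the body even for 4xx/5xx responses, so you have to read the status yourself. The names `lastStatusCode` and `fetchIgnoringErrors` are made up for this example:

```php
<?php
// Hypothetical helper: pull the final HTTP status code out of a list of raw header lines.
// After redirects the list holds several status lines, so the last one wins.
function lastStatusCode(array $headers): ?int
{
    $status = null;
    foreach ($headers as $line) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
        }
    }
    return $status;
}

// Hypothetical wrapper around file_get_contents() with ignore_errors enabled.
function fetchIgnoringErrors(string $url): array
{
    $opts = array(
        'http' => array(
            'method'        => "GET",
            'header'        => "Accept-language: en\r\n",
            'ignore_errors' => true, // return the body even on 4xx/5xx responses
        ),
    );
    $context = stream_context_create($opts);
    $body    = @file_get_contents($url, false, $context);

    // The HTTP wrapper fills $http_response_header in the calling scope.
    $headers = $http_response_header ?? [];

    return array(
        'status' => lastStatusCode($headers),
        'body'   => ($body === false) ? null : $body,
    );
}

// Usage (real network request, so commented out):
// $result = fetchIgnoringErrors('http://example.com/missing-page');
```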
If the example is really that trivial you could just use a regular expression. For generic HTML parsing though, PHP has DOM support:
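A minimal sketch of both options, using the markup from the question. The XPath expression and the regex are just one way to write it, and $html would normally come from file_get_contents():

```php
<?php
$html = '<div id="global"><p class="paragraph">1800</p></div>';

// Option 1: DOM + XPath, which copes with real-world markup.
libxml_use_internal_errors(true); // keep warnings about invalid HTML quiet
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//div[@id="global"]/p[@class="paragraph"]');
$value = ($nodes->length > 0) ? $nodes->item(0)->textContent : null;

echo $value, "\n";

// Option 2: a regular expression, only reasonable when the markup is this fixed.
$regexValue = null;
if (preg_match('#<p class="paragraph">(.*?)</p>#s', $html, $m)) {
    $regexValue = $m[1];
}
```

The regex breaks as soon as the attributes change order or the tag gains another attribute, which is why the DOM route is the safer default.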