I'm trying to figure out how to retrieve the following tags from a page:

<title>A common title</title>
<meta name="keywords" content="Keywords blabla" />
<meta name="description" content="This is the description" />

The tags may appear in any order. I've heard of the PHP Simple HTML DOM Parser, but I'd rather not use it. Is there a solution that doesn't rely on it?

Will preg_match fail if the HTML is invalid?

Can cURL do something like this with preg_match?

Facebook does something similar, but it relies on the page declaring tags like this one:

<meta property="og:description" content="Description blabla" />

I want this so that when someone posts a link, the title and meta tags are retrieved automatically. If there are no meta tags, they are ignored, or the user can set them manually (but I'll handle that later myself).

 Answers

4

This is the way it should be:

function file_get_contents_curl($url)
{
    $ch = curl_init();

    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);

    $data = curl_exec($ch);
    curl_close($ch);

    return $data;
}

$html = file_get_contents_curl("http://example.com/");

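// Parsing begins here. The @ suppresses warnings caused by malformed HTML.
$doc = new DOMDocument();
@$doc->loadHTML($html);

// Default to empty strings so pages without these tags don't trigger
// "undefined variable" notices when the values are printed below.
$title = '';
$description = '';
$keywords = '';

// Get the title, if the page has one:
$nodes = $doc->getElementsByTagName('title');
if ($nodes->length > 0) {
    $title = $nodes->item(0)->nodeValue;
}

// Get the description and keywords from the <meta> tags:
$metas = $doc->getElementsByTagName('meta');

for ($i = 0; $i < $metas->length; $i++) {
    $meta = $metas->item($i);
    if ($meta->getAttribute('name') == 'description') {
        $description = $meta->getAttribute('content');
    }
    if ($meta->getAttribute('name') == 'keywords') {
        $keywords = $meta->getAttribute('content');
    }
}

echo "Title: $title" . '<br/><br/>';
echo "Description: $description" . '<br/><br/>';
echo "Keywords: $keywords";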
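
The question also mentions Open Graph tags, which use a property attribute instead of name. As a rough sketch (not part of the original answer), the same DOMDocument approach could collect both kinds of tag into one array. The function name extract_page_meta and the sample markup are assumptions made for illustration only.

```php
<?php
// Hypothetical helper: collect the title plus every <meta> tag that has
// either a name="" or a property="" attribute (e.g. og:description).
function extract_page_meta(string $html): array
{
    $result = ['title' => null, 'meta' => []];
    if (trim($html) === '') {
        return $result; // loadHTML() rejects empty input on PHP 8
    }

    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $titles = $doc->getElementsByTagName('title');
    if ($titles->length > 0) {
        $result['title'] = trim($titles->item(0)->textContent);
    }

    foreach ($doc->getElementsByTagName('meta') as $meta) {
        // Standard tags use name="", Open Graph tags use property="".
        $key = $meta->getAttribute('name') ?: $meta->getAttribute('property');
        if ($key !== '' && $meta->hasAttribute('content')) {
            $result['meta'][strtolower($key)] = $meta->getAttribute('content');
        }
    }

    return $result;
}

// Sample markup built from the tags shown in the question.
$sample = '<html><head><title>A common title</title>'
    . '<meta name="description" content="This is the description" />'
    . '<meta property="og:description" content="Description blabla" />'
    . '</head><body></body></html>';

$info = extract_page_meta($sample);
```

Tags that are missing simply don't appear in $info['meta'], so the caller can fall back to user-supplied values.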
Thursday, December 15, 2022
1

Yes, but you'll have to do it yourself by parsing the response and looking for things that look like:

<meta http-equiv="refresh" content="5;url=http://example.com/" />

Obeying <meta> refresh requests is a browser-side thing. Use DOM parsing to look for <meta> tags with the appropriate attributes in the response cURL gives you.

If you can guarantee that the response is valid XML, you could do something like this:

// $cURLResponse is the response body as a string, so use the string loader:
$xml = simplexml_load_string($cURLResponse);
$result = $xml->xpath("//meta[@http-equiv='refresh']");
// Process the $result element to get the relevant bit out of the content attribute
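
When the response is ordinary HTML rather than valid XML, a more forgiving variant is to load it with DOMDocument and then pull the URL out of the content attribute. The following is only a sketch under that assumption; find_meta_refresh_url is a hypothetical name, and the regular expression covers the common "delay;url=..." form rather than every variant browsers accept.

```php
<?php
// Hypothetical helper: return the target URL of the first
// <meta http-equiv="refresh"> tag, or null if there is none.
function find_meta_refresh_url(string $html): ?string
{
    if (trim($html) === '') {
        return null; // loadHTML() rejects empty input on PHP 8
    }

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('meta') as $meta) {
        if (strtolower($meta->getAttribute('http-equiv')) !== 'refresh') {
            continue;
        }
        // The content attribute looks like "5;url=http://example.com/":
        // a delay in seconds, a separator, then the target URL.
        $content = $meta->getAttribute('content');
        $pattern = '/^\s*\d+\s*[;,]\s*url\s*=\s*["\']?(.*?)["\']?\s*$/i';
        if (preg_match($pattern, $content, $m) === 1 && $m[1] !== '') {
            return $m[1];
        }
    }

    return null;
}
```

The returned URL can then be passed to a second cURL request, since cURL itself will not follow this kind of redirect.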
Saturday, October 15, 2022
3

Apart from a few exceptions, the order of children in the head element doesn’t matter.

That said, consumers like search engines may of course do what they want (e.g., ignoring every third element, just for the fun of it), but discussing the possible behaviour of undesignated consumers is off-topic here.

The exceptions:

  • meta-charset should ideally be the first child (i.e., the element must be within the first 1024 bytes of the document, and at best before any non-ASCII characters)

  • base should ideally be the second child (i.e., it must come before any other element in head that has a URI as attribute value)

  • those link and script elements that the user agent wants to process are by default processed in the order they appear

  • the order of link-stylesheet and style elements can play a role for applying CSS

  • the first link-alternate with a type of application/rss+xml or application/atom+xml is "the default syndication feed for the purposes of feed autodiscovery"
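
The first exception in the list is easy to check mechanically. As a rough illustration only, the function below (a hypothetical name, not from any library) tests whether a charset declaration ends within the first 1024 bytes of a document. It uses a simple regular expression on the raw bytes, so it is a heuristic rather than a real HTML parse.

```php
<?php
// Hypothetical check: does a charset declaration end within the first
// $limit bytes of the document?
function charset_declared_early(string $html, int $limit = 1024): bool
{
    // Matches both <meta charset="..."> and the older
    // <meta http-equiv="Content-Type" content="...; charset=..."> form.
    $pattern = '/<meta\b[^>]*\bcharset\s*=[^>]*>/i';
    if (preg_match($pattern, $html, $m, PREG_OFFSET_CAPTURE) !== 1) {
        return false; // no charset declaration found at all
    }

    [$tag, $offset] = $m[0];
    // strlen() and the captured offset both count bytes, not characters.
    return $offset + strlen($tag) <= $limit;
}
```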

Sunday, August 21, 2022
 
tumen_t
 
3

Is there a way to use cURL to do something equivalent to the get_meta_tags() function in PHP?

Nope, I don't think so.

The best way would be to fetch the data and parse it using an HTML parser. Alternatively, there are several regex-based approaches in the user-contributed notes in the manual.
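
As a minimal sketch of the parser route, assuming the HTML has already been fetched into a string (for example with cURL), something like the following returns an array shaped roughly like the one get_meta_tags() produces. The function name meta_tags_from_html is hypothetical, and the key normalisation (lowercase, other characters replaced with underscores) only approximates the built-in's behaviour.

```php
<?php
// Hypothetical stand-in for get_meta_tags() that works on an HTML string
// instead of a file name or URL.
function meta_tags_from_html(string $html): array
{
    $tags = [];
    if (trim($html) === '') {
        return $tags; // loadHTML() rejects empty input on PHP 8
    }

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Select only <meta> elements that carry both attributes we need.
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//meta[@name and @content]') as $meta) {
        // Lowercase the name and replace anything that is not a letter
        // or digit with "_", approximating get_meta_tags() keys.
        $key = preg_replace('/[^a-z0-9]/', '_', strtolower($meta->getAttribute('name')));
        $tags[$key] = $meta->getAttribute('content');
    }

    return $tags;
}
```

Because it takes a string, the fetching step stays separate, so cURL options such as timeouts or redirects can be set independently of the parsing.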

Sunday, December 11, 2022
 
nes1983
 
3

Different titles on tag page:

<title>{block:TagPage}Posts tagged {Tag} - {/block:TagPage} {Title}</title>

You can't define different meta tags on tumblr.

You can, however, define multiple meta tags (for background color, for instance) and use one as the standard and one on the tag page.

<meta name="color:Background" content="#eee"/>
<meta name="color:Background Tag Page" content="#666"/>

In the CSS:

body {
  background: {color:background};
  {block:TagPage}
  background: {color:background tag page};
  {/block:TagPage}
}

The usefulness of this method is debatable though. I think it would work okay for a personal blog at least.

Wednesday, August 24, 2022