Notice how Google News has sources on the bottom of each article excerpt.

The Guardian - ABC News - Reuters - Bloomberg

I'm trying to imitate that.

For example, upon submitting the URL http://www.washingtontimes.com/news/2010/dec/3/debt-panel-fails-test-vote/ I want to return "The Washington Times".

How is this possible with PHP?

 Answers

4

My answer is expanding on @AI W's answer of using the title of the page. Below is the code to accomplish what he said.

<?php

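function get_title($url){
  $str = file_get_contents($url);
  if($str !== false && strlen($str)>0){
    $str = trim(preg_replace('/\s+/', ' ', $str)); // supports line breaks inside <title>
    // ignore case; the "/" in </title> is escaped; (.*?) stops at the first </title>
    if(preg_match('/<title>(.*?)<\/title>/i',$str,$title)){
      return $title[1];
    }
  }
  return null; // page could not be fetched or has no <title>
}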
//Example:
echo get_title("http://www.washingtontimes.com/");

?>

OUTPUT

Washington Times - Politics, Breaking News, US and World News

As you can see, it is not exactly what Google is using, so this leads me to believe that they get a URL's hostname and match it to their own list.

http://www.washingtontimes.com/ => The Washington Times
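
The hostname lookup guessed at above could be sketched roughly like this. It is only a minimal sketch under stated assumptions: the function name `get_source_name` and the `$sources` table are hypothetical and hand-maintained, and nothing here is confirmed to be how Google News does it.

```php
<?php

// Hypothetical, hand-maintained map from hostname to display name.
$sources = [
    'washingtontimes.com' => 'The Washington Times',
    'theguardian.com'     => 'The Guardian',
];

// Returns the display name for a URL's host, or null if it is unknown.
function get_source_name(string $url, array $sources): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return null; // no host component could be parsed
    }
    $host = strtolower($host);
    // Drop a leading "www." so both forms hit the same entry.
    if (strpos($host, 'www.') === 0) {
        $host = substr($host, 4);
    }
    return $sources[$host] ?? null;
}

echo get_source_name('http://www.washingtontimes.com/news/2010/dec/3/debt-panel-fails-test-vote/', $sources);
```

A table like this has to be filled in by hand for every source, which is the price for getting the exact display name rather than whatever the page puts in its title.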

Wednesday, November 9, 2022
2

Found a good way:

$html = preg_replace('#(<\s*a\s+[^>]*href\s*=\s*["\'])(?!http)([^"\'>]+)(["\'>]+)#', '$1http://mydomain.com/$2$3', $html);

You can use (?!http|mailto) instead if your $html also contains mailto links.
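
A runnable sketch of that replacement with the `(?!http|mailto)` lookahead follows. It assumes the whitespace classes are written with their backslashes (`\s`); `mydomain.com` is the placeholder from the answer, and the sample `$html` is made up for illustration.

```php
<?php

// Sample input, invented for illustration.
$html = '<a href="about.html">About</a> '
      . '<a href="http://example.com/">Ext</a> '
      . '<a href="mailto:me@example.com">Mail</a>';

// Prefix href values that do not start with "http" or "mailto".
$pattern = '#(<\s*a\s+[^>]*href\s*=\s*["\'])(?!http|mailto)([^"\'>]+)(["\'>]+)#';
$fixed   = preg_replace($pattern, '$1http://mydomain.com/$2$3', $html);

echo $fixed;
```

Only the first link is rewritten here; the absolute and mailto links are skipped by the negative lookahead.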

Saturday, December 17, 2022
 
1

You want to create slugs, but from experience I can tell you that the possibilities for decoding them are limited. For example, "Foo - Bar" becomes "foo-bar", so how could you then know that it wasn't "foo bar" or "foo-bar" all along?

Or what about characters that you don't want in your slug and that have no representation, like " ` "? So you can either use a one-to-one conversion like rawurlencode(), or you can create a slug. Here is an example function, but as I said, no reliable decoding is possible. That is in its nature, since you have to throw away information.

function sanitizeStringForUrl($string){
    $string = strtolower($string);
    $string = html_entity_decode($string);
    $string = str_replace(array('ä','ü','ö','ß'),array('ae','ue','oe','ss'),$string);
    $string = preg_replace('#[^\w\säüöß]#','',$string);
    $string = preg_replace('#[\s]{2,}#',' ',$string);
    $string = str_replace(array(' '),array('-'),$string);
    return $string;
}
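
To make the trade-off described above concrete, here is a small sketch contrasting the two approaches. `simple_slug` is a hypothetical, stripped-down helper written only for this illustration; it is not the function from the answer.

```php
<?php

// Hypothetical minimal slug helper, for illustration only.
function simple_slug(string $s): string
{
    $s = strtolower($s);
    $s = preg_replace('#[^a-z0-9\s-]#', '', $s);   // drop everything but letters, digits, whitespace, hyphens
    $s = preg_replace('#[\s-]+#', '-', trim($s));  // collapse runs of whitespace/hyphens to one hyphen
    return $s;
}

$title = 'Foo - Bar';

// rawurlencode() is reversible: decoding restores the original string.
$encoded = rawurlencode($title);
var_dump(rawurldecode($encoded) === $title);

// A slug is lossy: different inputs collapse to the same output,
// so there is nothing to decode back to.
var_dump(simple_slug('Foo - Bar') === simple_slug('foo bar'));
```

If the original title must be recoverable, a common approach is to store it next to the slug and look it up, rather than trying to reverse the slug.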
Wednesday, August 31, 2022
 
zen
3

You can also get the title of any webpage using this API

http://textance.herokuapp.com/title/

$.ajax({
      url: "http://textance.herokuapp.com/title/www.bbc.co.uk",
      complete: function(data) {
        alert(data.responseText);
      }
});
Wednesday, December 21, 2022
 
4

For the regex, use:

document.title = document.title.replace (/[^0-9:]/g, "");

To detect title changes, use MutationObserver, an HTML5 feature that is implemented in both Google Chrome and Firefox (the two main userscript browsers).

This complete script will work:

// ==UserScript==
// @name        Shakes & Fidget Buffed title shortener
// @namespace   http://släcker.de
// @version     0.1
// @description  Removes the page title of Shakes & Fidget to only display left time if it exists
// @include     *.sfgame.*
// @exclude     www.sfgame.*
// @exclude     sfgame.*
// @copyright   2013+, slaecker, 
// @grant       GM_addStyle
// ==/UserScript==
/*- The @grant directive is needed to work around a design change
    introduced in GM 1.0.   It restores the sandbox.
*/

var MutationObserver = window.MutationObserver || window.WebKitMutationObserver;
var myObserver       = new MutationObserver (titleChangeDetector);
var obsConfig        = {
    //-- Subtree needed.
    childList: true, characterData: true, subtree: true
};

myObserver.observe (document, obsConfig);

function titleChangeHandler () {
    this.weInitiatedChange      = this.weInitiatedChange || false;
    if (this.weInitiatedChange) {
        this.weInitiatedChange  = false;
        //-- No further action needed
    }
    else {
        this.weInitiatedChange  = true;
        document.title = document.title.replace (/[^0-9:]/g, "");
    }
}

function titleChangeDetector (mutationRecords) {

    mutationRecords.forEach ( function (mutation) {
        //-- Sensible, Firefox
        if (    mutation.type                       == "childList"
            &&  mutation.target.nodeName            == "TITLE"
        ) {
            titleChangeHandler ();
        }
        //-- WTF, Chrome
        else if (mutation.type                      == "characterData"
            &&  mutation.target.parentNode.nodeName == "TITLE"
        ) {
            titleChangeHandler ();
        }
    } );
}

//-- Probably best to wait for first title change, but uncomment the next line if desired.
//titleChangeHandler ();

If you are using some other browser (state that in the question), then fall back to using setInterval().

Tuesday, August 30, 2022