
I'm trying to find every href link on a webpage and rewrite it so that it points through my own proxy.

For example

<a href="http://www.google.com">Google</a>

Needs to be

<a href="http://www.example.com/?loadpage=http://www.google.com">Google</a>

 Answers

1

Use PHP's DOMDocument to parse the page:

$doc = new DOMDocument();

// load the string into the DOM (this is your page's HTML), see below for more info
$doc->loadHTML('<a href="http://www.google.com">Google</a>');

//Loop through each <a> tag in the dom and change the href property
foreach($doc->getElementsByTagName('a') as $anchor) {
    $link = $anchor->getAttribute('href');
    $link = 'http://www.example.com/?loadpage='.urlencode($link);
    $anchor->setAttribute('href', $link);
}
echo $doc->saveHTML();

Check it out here: http://codepad.org/9enqx3Rv

If you don't have the HTML as a string, you can fetch it with cURL, or load it directly with the loadHTMLFile method of DOMDocument.
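On real pages, loadHTML tends to emit warnings for malformed markup, and some anchors have no href at all. The following is a minimal sketch of the same loop wrapped in a function that handles both cases. The function name proxify_links and its parameters are made up for illustration; the proxy URL is the one from the question.

```php
<?php
// Hypothetical helper: rewrite every <a href> in $html to go through $proxyBase.
function proxify_links(string $html, string $proxyBase): string
{
    $doc = new DOMDocument();

    // Real-world pages are rarely valid HTML; collect parser warnings
    // internally instead of printing them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $link = $anchor->getAttribute('href');
        if ($link === '') {
            continue; // no href (e.g. a named anchor): leave it untouched
        }
        $anchor->setAttribute('href', $proxyBase . urlencode($link));
    }

    return $doc->saveHTML();
}

$result = proxify_links(
    '<a href="http://www.google.com">Google</a><a name="top">Top</a>',
    'http://www.example.com/?loadpage='
);
echo $result;
```

Note that this sketch passes relative links (such as /about) to the proxy unchanged; resolving them against the page's base URL first would be a separate step.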

Documentation

  • DOMDocument - http://php.net/manual/en/class.domdocument.php
  • DOMElement - http://www.php.net/manual/en/class.domelement.php
  • DOMElement::getAttribute - http://www.php.net/manual/en/domelement.getattribute.php
  • DOMElement::setAttribute - http://www.php.net/manual/en/domelement.setattribute.php
  • urlencode - http://php.net/manual/en/function.urlencode.php
  • DOMDocument::loadHTMLFile - http://www.php.net/manual/en/domdocument.loadhtmlfile.php
  • cURL - http://php.net/manual/en/book.curl.php
Wednesday, September 21, 2022
 
diwp
 
1

Use glob() or a GlobIterator to find pathnames matching a pattern.

If you need the search to be recursive, combine a RegexIterator with a RecursiveDirectoryIterator.
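As a rough sketch of both suggestions, assuming the goal is to collect file paths: the function names find_files and find_files_recursive are hypothetical, as are the pattern and regex shown in the usage comment.

```php
<?php
// Non-recursive: shell-style pattern matched inside a single directory.
function find_files(string $dir, string $pattern): array
{
    $matches = glob($dir . '/' . $pattern);
    return $matches === false ? [] : $matches;
}

// Recursive: walk the whole tree and keep paths matching a regex.
function find_files_recursive(string $dir, string $regex): array
{
    $files = new RegexIterator(
        new RecursiveIteratorIterator(
            // SKIP_DOTS leaves out the "." and ".." entries
            new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
        ),
        $regex // applied to each entry's full pathname
    );

    $paths = [];
    foreach ($files as $file) {
        $paths[] = $file->getPathname();
    }
    sort($paths);
    return $paths;
}

// Usage (paths are placeholders):
// print_r(find_files('/var/log', '*.log'));
// print_r(find_files_recursive('/var/log', '/\.log$/'));
```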

Marking this CW because the question is a sure duplicate and you can easily find examples for all of the above when using the Search function. Please do so.

Saturday, August 27, 2022
 
svick
 
2

To convert plain-text line breaks to HTML line breaks, try this:

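    // Open the text file for reading
    $fh = fopen("filename.txt", 'r');

    // Read up to 25,000 bytes; anything beyond that is not read
    $pageText = fread($fh, 25000);
    fclose($fh);

    // Insert <br /> before each newline so the breaks show up in HTML
    echo nl2br($pageText);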

Note the nl2br function wrapping the text.
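If the text may contain characters such as < or &, it is usually safer to escape it before adding the line breaks. A minimal sketch, assuming the whole file fits in memory; the helper name text_to_html is made up for illustration:

```php
<?php
// Hypothetical helper: escape HTML special characters, then add <br /> tags.
function text_to_html(string $text): string
{
    return nl2br(htmlspecialchars($text));
}

// With a file, file_get_contents() reads everything in one call:
// echo text_to_html(file_get_contents('filename.txt'));
echo text_to_html("first line\nsecond <line>");
```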

Tuesday, September 27, 2022
 
adrianm
 
5

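You could use a regular expression replace. The pattern matches any run of spaces that is followed by one more space and removes it, so exactly one space is left: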

str = str.replace(/ +(?= )/g,'');

Credit: The above regex was taken from Regex to replace multiple spaces with a single space
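The snippet above is JavaScript. The same pattern should carry over to PHP through preg_replace; this is a sketch with a made-up sample string, not part of the original answer:

```php
<?php
$str = "too    many   spaces";

// Remove every run of spaces that is followed by one more space,
// which leaves a single space in each place.
$collapsed = preg_replace('/ +(?= )/', '', $str);

echo $collapsed;
```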

Tuesday, October 18, 2022
1

You should be able to accomplish this with a basic fread(). It lets you specify how many bytes to read, so you can read a fixed amount at a time and write each chunk to a new file.

Try something like this:

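$i = 1;
$fp = fopen("test.txt", 'r');
while (!feof($fp)) {
    // Read up to 1000 bytes per chunk
    $contents = fread($fp, 1000);
    // Skip the empty read that can occur right at end of file
    if ($contents === '' || $contents === false) {
        break;
    }
    file_put_contents('new_file_' . $i . '.txt', $contents);
    $i++;
}
fclose($fp);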

EDIT

If you wish to stop after a maximum length OR at a certain character, you could use stream_get_line() instead of fread(). It is almost identical, except that it lets you specify any ending delimiter you wish. Note that it does not return the delimiter as part of the read.

$contents = stream_get_line($fp,1000,".");
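To show how that call might sit in a loop, here is a minimal sketch that collects every delimited segment of a file. The function name read_segments and the sample text are assumptions for illustration; the length 1000 and the "." delimiter come from the line above.

```php
<?php
// Hypothetical helper: read a file in segments that end at $delimiter
// or after $maxLength bytes, whichever comes first.
function read_segments(string $path, int $maxLength, string $delimiter): array
{
    $fp = fopen($path, 'r');
    if ($fp === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $segments = [];
    // stream_get_line() returns false once nothing is left to read
    while (($segment = stream_get_line($fp, $maxLength, $delimiter)) !== false) {
        $segments[] = $segment; // the delimiter itself is not included
    }
    fclose($fp);

    return $segments;
}

// Demo on a temporary file
$path = tempnam(sys_get_temp_dir(), 'seg');
file_put_contents($path, 'First sentence. Second sentence. Third');
$segments = read_segments($path, 1000, '.');
unlink($path);
print_r($segments);
```

Each segment could then be written out with file_put_contents(), as in the fread() loop.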
Thursday, November 10, 2022
 
cyk
 