I'm trying to find all href links on a webpage and replace the link with my own proxy link.
For example, this:
<a href="http://www.google.com">Google</a>
needs to become:
<a href="http://www.example.com/?loadpage=http://www.google.com">Google</a>
Use glob() to find pathnames matching a pattern, or a GlobIterator.
If you need that to be recursive, use a RegexIterator and a RecursiveDirectoryIterator.
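A minimal sketch of both approaches, assuming PHP 7 or later (for the type declarations). The function names `findTextFiles` and `findTextFilesRecursive` and the `*.txt` / `/\.txt$/` patterns are hypothetical placeholders for whatever you are actually matching.

```php
<?php
// Hypothetical helper, non-recursive: glob() returns a sorted array of matching paths, or false on error.
function findTextFiles(string $dir): array
{
    $matches = glob($dir . '/*.txt');
    return $matches === false ? [] : $matches;
}

// Hypothetical helper, recursive: walk every file under $dir and keep paths matching the regex.
function findTextFilesRecursive(string $dir): array
{
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    // RegexIterator tests the pattern against each SplFileInfo's string form, i.e. its full path.
    $matches = new RegexIterator($files, '/\.txt$/');

    $paths = [];
    foreach ($matches as $file) {
        $paths[] = $file->getPathname();
    }
    sort($paths); // directory iteration order is not guaranteed
    return $paths;
}
```

Because RegexIterator sees the full path, anchor the pattern on the part you care about (here the extension) so directory names don't produce accidental matches.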
Marking this CW because the question is a sure duplicate and you can easily find examples for all of the above when using the Search function. Please do so.
To convert the plain text line breaks to html line breaks, try this:
$fh = fopen("filename.txt", 'r');
$pageText = fread($fh, 25000); // reads at most 25000 bytes
fclose($fh);
echo nl2br($pageText);
Note the nl2br function wrapping the text: nl2br() inserts an HTML line break (<br />) before every newline in the string.
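A slightly more defensive sketch, assuming you want the whole file rather than only its first 25000 bytes: it reads the file with file_get_contents() and escapes HTML special characters with htmlspecialchars() before converting newlines, so any markup in the text file is displayed instead of interpreted. The function name `textFileToHtml` is hypothetical, and the escaping step is an addition to the original answer.

```php
<?php
// Hypothetical helper: turn a plain-text file into HTML that keeps its line breaks.
function textFileToHtml(string $path): string
{
    $text = file_get_contents($path); // reads the whole file, no fixed byte limit
    if ($text === false) {
        throw new RuntimeException("Cannot read $path");
    }
    // Escape <, >, & and quotes first, then insert <br /> before each newline.
    return nl2br(htmlspecialchars($text, ENT_QUOTES, 'UTF-8'));
}
```

The order matters: calling htmlspecialchars() after nl2br() would escape the <br /> tags themselves.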
You could use a regular expression replace:
str = str.replace(/ +(?= )/g,'');
Credit: The above regex was taken from Regex to replace multiple spaces with a single space
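Since the rest of this page is PHP, here is the same lookahead regex with preg_replace(), as a sketch. ` +(?= )` matches a run of spaces that is followed by one more space, so deleting each match leaves exactly one space behind. It only affects the space character, not tabs or newlines. The function name `collapseSpaces` is hypothetical.

```php
<?php
// Hypothetical helper: collapse runs of spaces into a single space.
function collapseSpaces(string $str): string
{
    // " +(?= )" matches spaces that are followed by another space; removing them keeps one.
    return preg_replace('/ +(?= )/', '', $str);
}

echo collapseSpaces("a   b  c"), "\n"; // a b c
```

If you also want tabs and other whitespace collapsed, `preg_replace('/\s+/', ' ', $str)` is the usual alternative, at the cost of turning newlines into spaces too.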
You should be able to accomplish this easily with a basic fread(). You can specify how many bytes you want to read, so it's trivial to read in an exact amount and output it to a new file.
Try something like this:
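$i = 1;
$fp = fopen("test.txt", 'rb');
// Read 1000 bytes at a time; stop when fread() returns an empty string at end of file,
// so no empty trailing file is written when the size is an exact multiple of 1000
while (($contents = fread($fp, 1000)) !== false && $contents !== '') {
    file_put_contents('new_file_' . $i . '.txt', $contents);
    $i++;
}
fclose($fp);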
EDIT
If you wish to stop after a certain length OR on a certain character, you could use stream_get_line() instead of fread(). It's almost identical, except that it lets you specify any ending delimiter you wish. Note that it does not return the delimiter as part of the read.
$contents = stream_get_line($fp,1000,".");
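A sketch of the delimiter-based variant as a complete loop, writing one output file per piece. The function name `splitOnDelimiter` and its parameters are hypothetical; the defaults mirror the 1000-byte limit and "." delimiter from the line above. Each piece ends at the delimiter (consumed but not returned), at the length limit, or at end of file, whichever comes first.

```php
<?php
// Hypothetical helper: split $source into files named $prefix1.txt, $prefix2.txt, ...
// Returns the number of files written.
function splitOnDelimiter(string $source, string $prefix, int $maxLength = 1000, string $delimiter = '.'): int
{
    $fp = fopen($source, 'rb');
    if ($fp === false) {
        throw new RuntimeException("Cannot open $source");
    }
    $i = 0;
    // stream_get_line() returns false once there is nothing left to read.
    while (($piece = stream_get_line($fp, $maxLength, $delimiter)) !== false) {
        $i++;
        file_put_contents($prefix . $i . '.txt', $piece);
    }
    fclose($fp);
    return $i;
}
```

Note that two delimiters in a row (e.g. "..") produce an empty piece, so filter out `''` inside the loop if you don't want empty files.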
Use PHP's DomDocument to parse the page. Check it out here: http://codepad.org/9enqx3Rv
If you don't have the HTML as a string, you can use cURL to fetch it, or use the loadHTMLFile method of DomDocument.
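The codepad snippet isn't reproduced on this page, so here is a minimal sketch of the approach described (not that code): load the HTML into DOMDocument, walk every <a> element, and rewrite its href with getAttribute()/setAttribute(). The proxy base comes from the question; the function name `proxifyLinks` is hypothetical. It applies urlencode() to the original URL, as the documentation list below suggests, so the result reads `?loadpage=http%3A%2F%2Fwww.google.com` rather than the unencoded form in the question. PHP decodes it back automatically when you read `$_GET['loadpage']`.

```php
<?php
// Hypothetical helper: point every <a href> in $html at a proxy URL.
function proxifyLinks(string $html, string $proxyBase): string
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('a') as $link) {
        if (!$link->hasAttribute('href')) {
            continue; // skip anchors such as <a name="top">
        }
        $target = $link->getAttribute('href');
        // Encode the original URL so its own ?, & and # don't clash with the proxy's query string.
        $link->setAttribute('href', $proxyBase . urlencode($target));
    }
    return $doc->saveHTML();
}
```

Called as `proxifyLinks($html, 'http://www.example.com/?loadpage=')` on the question's example, the link's href becomes `http://www.example.com/?loadpage=http%3A%2F%2Fwww.google.com`. saveHTML() returns the whole document, so libxml may add a doctype and <html>/<body> wrappers around a fragment.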
Documentation
- DomDocument: http://php.net/manual/en/class.domdocument.php
- DomElement: http://www.php.net/manual/en/class.domelement.php
- DomElement::getAttribute: http://www.php.net/manual/en/domelement.getattribute.php
- DOMElement::setAttribute: http://www.php.net/manual/en/domelement.setattribute.php
- urlencode: http://php.net/manual/en/function.urlencode.php
- DomDocument::loadHTMLFile: http://www.php.net/manual/en/domdocument.loadhtmlfile.php