Asked  2 Years ago    Answers:  5   Viewed   152 times

Possible Duplicate:
PHP String Manipulation: Extract hrefs

I am using PHP and have a string with the content:

<a href="www.something.com">Click here</a>

I need to get rid of everything except "www.something.com". I assume this can be done with regular expressions. Any help is appreciated! Thank you.
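Since the question assumes a regular expression, here is a minimal PHP sketch of that approach. It assumes the href value is wrapped in double quotes and contains no double quote itself. `hrefViaRegex` is a hypothetical helper name. For arbitrary real-world markup, a proper parser is the more robust choice.

```php
<?php
// Hypothetical helper: regex extraction of a double-quoted href value.
// Assumption: the value is quoted with " and contains no " characters.
function hrefViaRegex(string $html): ?string
{
    // "href", optional whitespace around "=", then capture everything up to the closing quote
    if (preg_match('/href\s*=\s*"([^"]*)"/i', $html, $m)) {
        return $m[1];
    }
    return null; // no href attribute found
}

var_dump(hrefViaRegex('<a href="www.something.com">Click here</a>')); // string(17) "www.something.com"
```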

 Answers

4

This is very easy to do using SimpleXML:

$a = new SimpleXMLElement('<a href="www.something.com">Click here</a>');
echo $a['href']; // will echo www.something.com
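One caveat worth illustrating: SimpleXML only accepts well-formed XML, so the constructor throws an `Exception` on sloppy HTML such as an unclosed tag. The sketch below (the helper name `hrefFromAnchor` is hypothetical) guards against that and casts the attribute to a plain string.

```php
<?php
// Hypothetical helper: read href from a single well-formed <a> element with SimpleXML.
function hrefFromAnchor(string $markup): ?string
{
    libxml_use_internal_errors(true); // keep libxml parse warnings off the output
    try {
        $a = new SimpleXMLElement($markup);
    } catch (Exception $e) {
        return null; // markup is not well-formed XML
    }
    // $a['href'] is a SimpleXMLElement; cast it to get a plain string
    return isset($a['href']) ? (string) $a['href'] : null;
}

var_dump(hrefFromAnchor('<a href="www.something.com">Click here</a>')); // string(17) "www.something.com"
var_dump(hrefFromAnchor('<a href="x">unclosed'));                      // NULL
```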
Friday, November 4, 2022
 
4

You can use PHP's DOMDocument class to parse XML and/or HTML. Something like the following should do the trick to get the href attributes from a string of HTML.

$html = '<h1>Doctors</h1>
<a title="C - G" href="link1.html">C - G</a>
<a title="G - K" href="link2.html">G - K</a>
<a title="K - M" href="link3.html">K - M</a>';

$hrefs = array();

$dom = new DOMDocument();
$dom->loadHTML($html);

// Collect the href attribute of every <a> element
$tags = $dom->getElementsByTagName('a');
foreach ($tags as $tag) {
    $hrefs[] = $tag->getAttribute('href');
}

print_r($hrefs); // link1.html, link2.html, link3.html
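Applied to the exact string from the question, the same DOMDocument approach can return just the first href. This is a minimal sketch; `firstHref` is a hypothetical helper name, and it skips `<a>` elements that have no href.

```php
<?php
// Hypothetical helper: first href found in an HTML fragment, via DOMDocument.
function firstHref(string $html): ?string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate sloppy real-world HTML quietly
    $dom->loadHTML($html);
    libxml_clear_errors();
    foreach ($dom->getElementsByTagName('a') as $a) {
        if ($a->hasAttribute('href')) {
            return $a->getAttribute('href');
        }
    }
    return null; // no <a> with an href
}

echo firstHref('<a href="www.something.com">Click here</a>'), PHP_EOL; // www.something.com
```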
Thursday, December 15, 2022
1

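The strcspn function is what you are looking for: it returns the length of the initial segment of a string that contains none of the characters in a given mask.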

<?php

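$mask = "abc";

$string = "log dog hat bat";

// Length of the leading run of $string containing none of 'a', 'b', 'c' (here 9)
$result = substr($string, 0, strcspn($string, $mask));

var_dump($result); // string(9) "log dog h"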

?>
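For the original question, strcspn can also cut out the href value once you know where it starts. This is a sketch under the assumption that the attribute is written exactly as `href="..."` with double quotes; `hrefViaStrcspn` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: locate href=" and use strcspn to measure up to the closing quote.
function hrefViaStrcspn(string $html): ?string
{
    $needle = 'href="';
    $start = strpos($html, $needle);
    if ($start === false) {
        return null; // no href=" in the string
    }
    $start += strlen($needle);
    // Count characters from $start until the first double quote
    $length = strcspn($html, '"', $start);
    return substr($html, $start, $length);
}

var_dump(hrefViaStrcspn('<a href="www.something.com">Click here</a>')); // string(17) "www.something.com"
```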
Thursday, October 20, 2022
 
5

Download the file as plain text/HTML in Java and pass it through Jsoup or HtmlCleaner. Both are similar and can parse even malformed HTML 4.0 markup. You can then use the familiar DOM-style methods such as getElementsByTag("a"), or, even more conveniently in Jsoup, CSS-style selectors:

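import java.io.File;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Jsoup.parse(File, charset, baseUri) throws IOException; call it where that is handled
File input = new File("/tmp/input.html");
Document doc = Jsoup.parse(input, "UTF-8", "http://example.com/");

Elements links = doc.select("a[href]");        // a with href
Elements pngs = doc.select("img[src$=.png]");  // img with src ending .png

Element masthead = doc.select("div.masthead").first();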

Then, having found all links, get the details using:

String linkhref = links.attr("href"); // href of the first matched element

Taken from http://jsoup.org/cookbook/extracting-data/selector-syntax

The selectors use a CSS/jQuery-like syntax, so if you know jQuery and its function chaining, you will certainly love it.

EDIT: In case you want more tutorials, you can try out this one made by mkyong.

http://www.mkyong.com/java/jsoup-html-parser-hello-world-examples/
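For comparison, in PHP (the language of the question) a rough counterpart of Jsoup's `doc.select("a[href]")` is the XPath query `//a[@href]` run through DOMXPath. This is a sketch, not a Jsoup port; `hrefsViaXPath` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: all href values of <a> elements that carry one, via DOMXPath.
function hrefsViaXPath(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // suppress warnings from malformed HTML
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $hrefs = [];
    foreach ($xpath->query('//a[@href]') as $a) { // only <a> elements with an href attribute
        $hrefs[] = $a->getAttribute('href');
    }
    return $hrefs;
}

print_r(hrefsViaXPath('<a href="one.html">1</a><a name="x">no href</a><a href="two.html">2</a>'));
```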

Sunday, September 18, 2022
42

You can use a Python script here.

This script gets any links starting with http://:

import re

# Print every http:// URL that sits directly between a '>' and a '<'
with open('sitemap.xml') as f:
    for line in f:
        for url in re.findall(r'>(http://[^<]+)<', line):
            print(url)

And in your case, the following script finds all data wrapped in <loc> tags:

import re

# Print every http:// URL wrapped in <loc>...</loc>
with open('sitemap.xml') as f:
    for line in f:
        for url in re.findall(r'<loc>(http://[^<]+)</loc>', line):
            print(url)

Here is a nice tool to play with regular expressions if you are not familiar with them.

If you need to load a remote file, you can use the following code:

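import re
from urllib.request import urlopen  # Python 3 replacement for urllib2

with urlopen('http://server.com/sitemap.xml') as f:
    for line in f:
        # urlopen yields bytes; decode before applying a str pattern
        for url in re.findall(r'<loc>(http://[^<]+)</loc>', line.decode('utf-8')):
            print(url)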
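For PHP users, the `<loc>` regex above translates directly to `preg_match_all`. This minimal sketch works on an in-memory sitemap string and assumes, like the Python version, that each URL sits inside `<loc>...</loc>` and starts with http://; `locUrls` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: every http:// URL wrapped in <loc>...</loc>.
function locUrls(string $xml): array
{
    preg_match_all('#<loc>(http://[^<]+)</loc>#', $xml, $m);
    return $m[1]; // capture group 1 holds the URLs, in document order
}

$sitemap = '<urlset><url><loc>http://example.com/a</loc></url>'
         . '<url><loc>http://example.com/b</loc></url></urlset>';
print_r(locUrls($sitemap));
```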
Sunday, October 23, 2022