I am using the latest PHP. I want to parse an HTML page to extract data.

HTML:

<table class="margin15" style="margin-left: 0pt; margin-right: 0pt;" width="100%" align="left" border="0" cellpadding="0" cellspacing="0">
TRs, TDs, Data
</table>

<table class="margin15" style="margin-left: 0pt; margin-right: 0pt;" width="100%" align="left" border="0" cellpadding="0" cellspacing="0">
TRs, TDs, Data
</table>

<table class="margin15" style="margin-left: 0pt; margin-right: 0pt;" width="100%" align="left" border="0" cellpadding="0" cellspacing="0">
TRs, TDs, Data
</table>

<table class="margin15" style="margin-left: 0pt; margin-right: 0pt;" width="100%" align="left" border="0" cellpadding="0" cellspacing="0">
TRs, TDs, Data
</table>

PHP Code:

<?php

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.test.com/mypage.html');  
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$result = curl_exec($ch);


$pattern = '/<table class="margin15" style="margin-left: 0pt; margin-right: 0pt;" width="100%" align="left" border="1" cellpadding="0" cellspacing="0">[^~]</table>/';
preg_match_all($pattern, $result, $matches);
print_r($matches);

?>

I am not able to get all the tables. When I use a simple pattern such as $pattern='/table/';, it matches as expected. How can I create a pattern that captures each whole table in a single array element?

 Answers

5

Parsing HTML with regex is painful at best because HTML is not a regular language. I suggest you use Simple HTML DOM instead.
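As a rough illustration of the parser-based approach, here is a minimal sketch that uses PHP's built-in DOM extension (DOMDocument and DOMXPath) rather than Simple HTML DOM, which is a swapped-in technique and not what this answer names. The sample markup in $html is hypothetical and stands in for the cURL result from the question.

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$html = <<<HTML
<html><body>
<table class="margin15" width="100%" border="0"><tr><td>First</td></tr></table>
<table class="other"><tr><td>Skipped</td></tr></table>
<table class="margin15" width="100%" border="0">
  <tr><td>Second</td></tr>
</table>
</body></html>
HTML;

$doc = new DOMDocument();
// Real-world HTML is often invalid; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Select every <table> whose class attribute contains the token "margin15".
$nodes = $xpath->query('//table[contains(concat(" ", normalize-space(@class), " "), " margin15 ")]');

$tables = [];
foreach ($nodes as $node) {
    // saveHTML($node) serializes a single element, so each table lands in one array slot.
    $tables[] = $doc->saveHTML($node);
}

print_r($tables);
```

Because the selection is done on the parsed tree, differences in attribute order or values such as border="0" versus border="1" do not affect the match.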

Saturday, October 29, 2022
 
embo
 
3

Note that the W3C says:

Defining term: If the dfn element has a title attribute, then the exact value of that attribute is the term being defined.

Accordingly, you can have a simple solution where you put the term being defined inside the title attribute and the definition inside a button element nested in the dfn:

dfn::before {content: attr(title);padding: 0 0 1em;}      
dfn button {display: none; position: absolute}
dfn:hover button, dfn:focus button {display: block;}
<p>An
     <dfn title="onomasticon" tabindex="0"><button disabled>
       <p>Another word for <strong>thesaurus</strong></p>
       <p><img src="http://i.imgur.com/G0bl4k7.png" /></p>
      </button></dfn> is not a dinosaur.
</p>

I did not find any tag that can replace the button element here; it seems to be the only one that works.

So we have to add the disabled attribute to the button element to disable its button behavior (focus) and set the tabindex on the dfn element to enable arrow navigation.

Tuesday, September 13, 2022
4

You are missing the /ims flag at the end of your regex. Otherwise . will not match line breaks (as in your first paragraph). Actually /s would suffice, but I'm always using all three for simplicity.
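To illustrate the /s point on markup like the tables in the question at the top of this page, here is a minimal sketch. The two-table sample string and the tilde-delimited pattern are assumptions made for illustration, and the lazy quantifier would not cope with nested tables.

```php
<?php
// Hypothetical two-table sample with line breaks inside each table.
$html = "<table class=\"margin15\" border=\"0\">\n<tr><td>A</td></tr>\n</table>\n"
      . "<table class=\"margin15\" border=\"0\">\n<tr><td>B</td></tr>\n</table>";

// Without /s, "." stops at "\n", so no multi-line table can match.
$withoutS = preg_match_all('~<table\b[^>]*class="margin15"[^>]*>.*?</table>~i', $html, $m1);

// With /s, "." also matches line breaks; the lazy ".*?" stops at the first </table>.
$withS = preg_match_all('~<table\b[^>]*class="margin15"[^>]*>.*?</table>~is', $html, $m2);

var_dump($withoutS); // int(0)
var_dump($withS);    // int(2)
print_r($m2[0]);     // each whole table in its own array element
```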

Also, preg_match works for many simple cases. But if you are attempting more complex extractions, consider switching to phpQuery or QueryPath, which allow for:

foreach (qp($html)->find("p") as $p)  { print $p->text(); }
Friday, September 23, 2022
 
1

Pattern: /(?:\G(?!^)|Temp\(C\):) \K\d+/ (Demo)

Code: (Demo)

$in='28.6MH/s 27.3MH/s | Temp(C): 64 66 61 64 63 | Fan: 74% 76% 69% 75% 72% | HW: 21 21 21 ';

var_export(preg_match_all('/(?:\G(?!^)|Temp\(C\):) \K\d+/',$in,$out)?$out[0]:'fail');

Output:

array (
  0 => '64',
  1 => '66',
  2 => '61',
  3 => '64',
  4 => '63',
)

Explanation:

You can see the official terminology explanation in the Pattern Demo link, but here is my way of explaining...

(?:         # start a non-capturing group so that regex understands the piped "alternatives"
\G          # match from the start of the string or where the previous match left off
(?!^)       # ...but not at the start of the string (for your case, this can actually be omitted, but it is a more trustworthy pattern with it included)
|           # OR
Temp\(C\):  # literally match Temp(C):
)           # end the non-capturing group
            # <-- there is a blank space there which needs to be matched
\K          # "release" previous matched characters (restart fullstring match)
\d+         # match one or more digits greedily

The pattern stops when it hits " |" (space and pipe) after 63, because that is not matched by " \d+" (space and digits).
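For comparison, the same numbers can be pulled out in two simpler steps without \G or \K. This is only an alternative sketch, not the pattern explained above; the variable names $m and $temps are made up for illustration.

```php
<?php
$in = '28.6MH/s 27.3MH/s | Temp(C): 64 66 61 64 63 | Fan: 74% 76% 69% 75% 72% | HW: 21 21 21 ';

$temps = [];
// Step 1: capture the run of space-separated numbers that follows "Temp(C):".
if (preg_match('/Temp\(C\):((?: \d+)+)/', $in, $m)) {
    // Step 2: split that run on whitespace after trimming the leading space.
    $temps = preg_split('/\s+/', trim($m[1]));
}

var_export($temps);
```

The single-pattern version avoids the second pass, while this one may be easier to read for people unfamiliar with \G.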

Wednesday, November 2, 2022
 
kxr
2

Jukka K. Korpela is right, but regarding your question:

This should get you started on selecting the font family:

<!DOCTYPE html>
<html>
<head>
</head>
<body>
  <h1 id="liveh1">Some text</h1>
  <select id="selecth1FontFamily" name="selectFontFamily" onchange="updateh1family();">
    <option> Serif </option>
    <option> Arial </option>
    <option> Sans-Serif </option>                                  
    <option> Tahoma </option>
    <option> Verdana </option>
    <option> Lucida Sans Unicode </option>                               
  </select>
    <script>
      function updateh1family() {
        var selector = document.getElementById('selecth1FontFamily');
        var family = selector.options[selector.selectedIndex].value;
        var h1 = document.getElementById('liveh1');
        h1.style.fontFamily = family;        
      }

    </script>
</body>
</html>

If it's important to see whether the fonts are installed, you can use this tool to check. And if it's OK to use Flash, you may use something like font-detect-js (they also have a demo, but I didn't manage to get it to work in Chrome).

For the color selector, I would recommend using jscolor.

Hope it helps, and good luck!

Thursday, September 29, 2022
 