Quote:
Originally Posted by Zoxxa
I would first extract all the "a href" tags with regex, xpath, or this: http://simplehtmldom.sourceforge.net/
Then detect which urls contain search engine keywords or domains.
Something like this (Typed out fast, did not test):
Code:
$href_array = array('<a href="http://google.com">google</a>', '<a href="http://www.bing.com">bing</a>', 'etc..');
$search_engines = array('bing.com', 'google.com', 'etc...');

$final = array(); // initialize, so print_r() doesn't hit an undefined variable when nothing matches
foreach ($href_array as $link) {
    foreach ($search_engines as $site) {
        if (strpos($link, $site) !== FALSE) {
            // SE link found
            $final[] = $link;
            break; // this link is already matched; no need to test the remaining engines
        }
    }
}
echo '<pre>';
print_r($final);
Zoxxa, sorry, but this makes no sense at all.
If you already know all the search engine links (the $search_engines array), why are you searching for them?
It's like saying "I *know* there's a lightbulb and a toy car inside this box, but I'll check anyway".
Also, what happens when test.txt contains a search engine link that isn't listed in $search_engines?
And why fire up PHP and do DOM/regexp processing at all, when this can be done with a single sed command?
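For example, something along these lines (a sketch, not tested against your real data; the sample file contents and the domain list are just placeholders, adjust them to whatever test.txt actually holds):

```shell
# Stand-in for test.txt, one anchor tag per line
cat > test.txt <<'EOF'
<a href="http://google.com">google</a>
<a href="http://www.bing.com">bing</a>
<a href="http://example.com">other</a>
EOF

# Single sed invocation: -n suppresses default output, -E enables
# extended regexes, /.../p prints only lines whose href contains
# one of the known search engine domains
sed -nE '/href="[^"]*(google|bing)\.com/p' test.txt
```

This prints the google and bing lines and skips example.com, same end result as the nested PHP loops.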