Old 08-02-2012, 12:09 PM  
kazymjir
Confirmed User
Join Date: Oct 2011
Location: Munich
Posts: 411
Quote:
Originally Posted by Zoxxa
I would first extract all the "a href" tags with regex, xpath, or this: http://simplehtmldom.sourceforge.net/

Then detect which URLs contain search-engine keywords or domains.

Something like this (typed out fast, did not test):

Code:
$href_array = array('<a href="http://google.com">google</a>', '<a href="http://www.bing.com">bing</a>', 'etc..');

$search_engines = array('bing.com', 'google.com', 'etc...');

$final = array(); // initialize, so print_r() works even when nothing matches
foreach($href_array as $link) {

	foreach($search_engines as $site){
		if(strpos($link, $site) !== FALSE){

			// SE link found
			$final[] = $link;
			break; // avoid duplicates if a link matches more than one domain
		}
	}

}

echo '<pre>';
print_r($final);
Zoxxa, sorry, but this makes no sense at all.

If you already know all the search-engine links (the $search_engines array), why search for them?
It's like saying "I *know* there's a lightbulb and a toy car inside this box, but I'll check anyway".

Also, what happens when test.txt contains a search-engine link that isn't in $search_engines?

And why fire up PHP and do DOM/regexp processing when a single sed command can do it?
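For illustration, a single GNU sed command along these lines could pull the search-engine links out of a file like test.txt (a sketch only; the file name comes from the thread and the domain list is assumed):

```shell
# Hypothetical one-liner: print href values containing google.com or bing.com.
# -n suppresses default output, -E enables extended regexes (GNU sed),
# and the trailing /p prints only lines where the substitution matched.
sed -nE 's/.*<a href="([^"]*(google|bing)\.com[^"]*)".*/\1/p' test.txt
```

Note this only catches one link per line of input; for messier real-world HTML a proper parser is still the safer route.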
__________________
http://kazymjir.com/

Last edited by kazymjir; 08-02-2012 at 12:10 PM.