| Discuss what's fucking going on, and which programs are best and worst. One-time "program" announcements from "established" webmasters are allowed. | 
| 
		 | 
	Thread Tools | 
| 
			
			 | 
		#1 | 
| 
			
			
			
			 Too lazy to set a custom title 
			
		
			
				
			
			
			Industry Role:  
				Join Date: Aug 2002 
				
				
				
					Posts: 55,372
				 
				
				
				
				 | 
	
	
	
	
		
			
			 
				
				Best way to do this? python/ruby/perl/sed/awk/php/etc?
			 
			I have a list of links. 
		
	
		
		
		
		
			Example:
Code: 
	<h3>search engine links</h3>
<a href="http://google.com">google</a>
<a href="http://www.bing.com">bing</a>
<a href="http://www.yahoo.com">yahoo</a>
<h3>payment links</h3>
<a href="http://www.paypal.com">paypal</a>
<a href="http://www.paxum.com">paxum</a>
I only want to get the search engine links:
Code: 
	<a href="http://google.com">google</a>
<a href="http://www.bing.com">bing</a>
<a href="http://www.yahoo.com">yahoo</a>
I was using sed, but it's printing the 2nd h3:
Code: 
	sed -n '/<h3>/,/<\/h3>/p' test.txt 
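Since Python is one of the options in the title, here is a minimal sketch of the same task as a line scanner. It assumes one tag per line, as in the sample above; `SAMPLE` and `first_block` are illustrative names, not anything from a library.

```python
SAMPLE = """\
<h3>search engine links</h3>
<a href="http://google.com">google</a>
<a href="http://www.bing.com">bing</a>
<a href="http://www.yahoo.com">yahoo</a>
<h3>payment links</h3>
<a href="http://www.paypal.com">paypal</a>
<a href="http://www.paxum.com">paxum</a>
"""


def first_block(lines):
    """Return the lines after the first <h3> line, up to the next <h3> line."""
    block = []
    headings_seen = 0
    for line in lines:
        if "<h3>" in line:
            headings_seen += 1
            if headings_seen > 1:
                break      # second heading reached: the first block is complete
            continue       # skip the heading line itself
        if headings_seen == 1:
            block.append(line)
    return block


if __name__ == "__main__":
    print("\n".join(first_block(SAMPLE.splitlines())))
```

Unlike the sed range, the heading lines are used only as markers and never printed.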
				__________________ 
		
		
		
		
	
	Since 1999: 69 Adult Industry awards for Best Hosting Company and professional excellence. ![]() WP Stuff  | 
| 
		 | 
	
	
	
		
                 
		
		
		
		
		
		
		
			
			
		
	 | 
| 
			
			 | 
		#2 | 
| 
			
			
			
			 Confirmed User 
			
		
			
			
			Industry Role:  
				Join Date: May 2012 
				
				
				
					Posts: 124
				 
				
				
				
				 | 
	
	
	
	
		
		
		
		 have you tried using xpath? 
		
	
		
		
		
		
		
		
			
		
		
	
	I don't know exactly what you're doing, but here //a[position()<4] will return the search engine links. I'm sure xpath could handle whatever you want to do. I guess you just want everything after the first h3 tag, and there's probably a dynamic number of links; this was just an example of using xpath.  | 
| 
		 | 
	
	
	
		
                 
		
		
		
		
		
		
		
			
			
		
	 | 
| 
			
			 | 
		#3 | 
| 
			
			
			
			 Confirmed User 
			
		
			
				
			
			
			Industry Role:  
				Join Date: Oct 2011 
				Location: Munich 
				
				
					Posts: 411
				 
				
				
				
				 | 
	
	
	
	
		
		
		
		 Code: 
	$ cat test.txt
<h3>search engine links</h3>
<a href="http://google.com">google</a>
<a href="http://www.bing.com">bing</a>
<a href="http://www.yahoo.com">yahoo</a>
<h3>payment links</h3>
<a href="http://www.paypal.com">paypal</a>
<a href="http://www.paxum.com">paxum</a>
$ sed -e '/<h3>payment/,/<\/h3>/ d' -e '/<h3>/ d' test.txt
<a href="http://google.com">google</a>
<a href="http://www.bing.com">bing</a>
<a href="http://www.yahoo.com">yahoo</a>
$ 
				__________________ 
		
		
		
		
	
	http://kazymjir.com/  | 
| 
		 | 
	
	
	
		
                 
		
		
		
		
		
		
		
			
			
		
	 | 
| 
			
			 | 
		#4 | 
| 
			
			
			
			 Confirmed User 
			
		
			
				
			
			
			Industry Role:  
				Join Date: Feb 2011 
				Location: Ontario, Canada 
				
				
					Posts: 1,026
				 
				
				
				
				 | 
	
	
	
	
		
		
		
		 I would first extract all the "a href" tags with regex, xpath, or this: http://simplehtmldom.sourceforge.net/ 
		
	
		
		
		
		
			Then detect which urls contain search engine keywords or domains. Something like this (Typed out fast, did not test): Code: 
	
$href_array = array('<a href="http://google.com">google</a>', '<a href="http://www.bing.com">bing</a>', 'etc..');
$search_engines = array('bing.com', 'google.com', 'etc...');
$final = array(); // collected search engine links
foreach($href_array as $link) {
	
	foreach($search_engines as $site){
		if(strpos($link, $site) !== FALSE){
			
			// SE link found; no need to check the remaining sites for this link
			$final[] = $link;
			break;
		}
	}
}
echo '<pre>';
print_r($final);
				__________________ 
		
		
		
		
		
			
		
		
	
	[email protected] ICQ: 269486444 ZoxEmbedTube - Build unlimited "fake" tubes with this easy 100% unencoded CMS!  | 
| 
		 | 
	
	
	
		
                 
		
		
		
		
		
		
		
			
			
		
	 | 
| 
			
			 | 
		#5 | |
| 
			
			
			
			 Confirmed User 
			
		
			
				
			
			
			Industry Role:  
				Join Date: Oct 2011 
				Location: Munich 
				
				
					Posts: 411
				 
				
				
				
				 | 
	
	
	
	
		
		
		
		 Quote: 
	
 If you know all the search engine links (the $search_engines array), why do you search for them? It's like "I *know* that a lightbulb and a toy car are inside this box, but I will check it anyway". Also, what happens if test.txt contains a link that isn't in $search_engines? And why are you firing up PHP and doing DOM/regexp processing when it can be done with a single sed command? 
	 | 
|
| 
		 | 
	
	
	
		
                 
		
		
		
		
		
		
		
			
			
		
	 | 
| 
			
			 | 
		#6 | 
| 
			
			
			
			 Confirmed User 
			
		
			
			
			Industry Role:  
				Join Date: May 2012 
				
				
				
					Posts: 124
				 
				
				
				
				 | 
	
	
	
	
		
		
		
		 if you wanted to use xpath u could use //a[following-sibling::h3[1]]  
		
	
		
		
		
		
		
	
	but kazymjir's method is probably what you are looking for  | 
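To make the xpath idea concrete: `//a[following-sibling::h3]` picks every link that still has an h3 after it at the same level. Below is a rough standard-library Python sketch of that selection. The siblings are walked by hand because the XPath subset in `xml.etree.ElementTree` has no `following-sibling` axis. It assumes the fragment is well-formed XML once wrapped in a made-up `<root>` element, which real-world HTML often is not; `FRAGMENT` and `links_before_last_h3` are illustrative names.

```python
import xml.etree.ElementTree as ET

FRAGMENT = """\
<h3>search engine links</h3>
<a href="http://google.com">google</a>
<a href="http://www.bing.com">bing</a>
<a href="http://www.yahoo.com">yahoo</a>
<h3>payment links</h3>
<a href="http://www.paypal.com">paypal</a>
<a href="http://www.paxum.com">paxum</a>
"""


def links_before_last_h3(fragment):
    """Return the href of every <a> that has an <h3> sibling somewhere after it."""
    # Wrap in a hypothetical <root> so the fragment parses as one XML document.
    root = ET.fromstring("<root>" + fragment + "</root>")
    siblings = list(root)
    # Index of the last <h3>; 0 if there is none, which yields an empty result.
    last_h3 = max((i for i, el in enumerate(siblings) if el.tag == "h3"), default=0)
    return [el.get("href") for el in siblings[:last_h3] if el.tag == "a"]


if __name__ == "__main__":
    for href in links_before_last_h3(FRAGMENT):
        print(href)
```

With more than two h3 blocks this returns everything except the last block, so it is not a drop-in way to pick one specific section.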
| 
		 | 
	
	
	
		
                 
		
		
		
		
		
		
		
			
			
		
	 | 
| 
			
			 | 
		#7 | ||
| 
			
			
			
			 Confirmed User 
			
		
			
				
			
			
			Industry Role:  
				Join Date: Feb 2011 
				Location: Ontario, Canada 
				
				
					Posts: 1,026
				 
				
				
				
				 | 
	
	
	
	
		
		
		
		 To you. 
		
	
		
		
		
		
			Quote: 
	
 Quote: 
	
 I code with php, not sed, so obviously my help would be provided with php. I didn't see your post so chill the fuck out allstar. 
	 | 
||
| 
		 | 
	
	
	
		
                 
		
		
		
		
		
		
		
			
			
		
	 | 
| 
			
			 | 
		#8 | |
| 
			
			
			
			 It's 42 
			
		
			
			
			Industry Role:  
				Join Date: Jun 2010 
				Location: Global 
				
				
					Posts: 18,083
				 
				
				
				
				 | 
	
	
	
	
		
		
		
		 Quote: 
	
 in a foreach loop Code: 
	Perl
foreach(@_){if ($_=~/href/i)   {chomp $_; print FILE "$_\n";}}
 | 
|
| 
		 | 
	
	
	
		
                 
		
		
		
		
		
		
		
			
			
		
	 | 
| 
			
			 | 
		#9 | 
| 
			
			
			
			 Confirmed User 
			
		
			
				
			
			
			Industry Role:  
				Join Date: Oct 2011 
				Location: Munich 
				
				
					Posts: 411
				 
				
				
				
				 | 
	
	
	
	
		
		
		
		 Zoxxa, notice that Fris provided an example input. It doesn't have to be search engines, it can be anything.  
		
	
		
		
		
		
			He wants to get the content between given <h3>s. He may not know what content is between them, so in this case your code is useless. Search engines are only an example, as Fris said. The links could be totally random. Zoxxa, you should chill out. I didn't mean to attack you, I was just expressing my opinion. 
	 | 
| 
		 | 
	
	
	
		
                 
		
		
		
		
		
		
		
			
			
		
	 | 
| 
			
			 | 
		#10 | 
| 
			
			
			
			 frc 
			
		
			
			
			Industry Role:  
				Join Date: Jul 2003 
				Location: Bitcoin wallet 
				
				
					Posts: 4,663
				 
				
				
				
				 | 
	
	
	
	
		
		
		
		
		
	
		
		
		
		
			 
				__________________ 
		
		
		
		
	
	Crazy fast VPS for $10 a month. Try with $20 free credit  | 
| 
		 | 
	
	
	
		
                 
		
		
		
		
		
		
		
			
			
		
	 | 
| 
			
			 | 
		#11 | |
| 
			
			
			
			 Confirmed User 
			
		
			
				
			
			
			Industry Role:  
				Join Date: Feb 2011 
				Location: Ontario, Canada 
				
				
					Posts: 1,026
				 
				
				
				
				 | 
	
	
	
	
		
		
		
		 Quote: 
	
 I apologize, I misread his post where it says "i only want to get the search engine links being". I thought he was being literal and actually meant only urls being search engines, I did not read the text between the h3 tag or his last sed example which shows what he wants. 
	 | 
|
| 
		 | 
	
	
	
		
                 
		
		
		
		
		
		
		
			
			
		
	 | 
| 
			
			 | 
		#12 | 
| 
			
			
			
			 It's 42 
			
		
			
			
			Industry Role:  
				Join Date: Jun 2010 
				Location: Global 
				
				
					Posts: 18,083
				 
				
				
				
				 | 
	
	
	
	
		
		
		
		http://search.cpan.org/dist/libwww-perl/lwpcook.pod  | 
| 
		 | 
	
	
	
		
                 
		
		
		
		
		
		
		
			
			
		
	 | 
| 
			
			 | 
		#13 | |
| 
			
			
			
			 Too lazy to set a custom title 
			
		
			
				
			
			
			Industry Role:  
				Join Date: Aug 2002 
				
				
				
					Posts: 55,372
				 
				
				
				
				 | 
	
	
	
	
		
		
		
		 Quote: 
	
 
	 | 
|
| 
		 | 
	
	
	
		
                 
		
		
		
		
		
		
		
			
			
		
	 | 
| 
			
			 | 
		#14 | 
| 
			
			
			
			 there's no $$$ in porn 
			
		
			
				
			
			
			Industry Role:  
				Join Date: Jul 2005 
				Location: icq: 195./568.-230 (btw: not getting offline msgs) 
				
				
					Posts: 33,063
				 
				
				
				
				 | 
	
	
	
	
		
		
		
		 ugly, but it'll work... and less memory intensive than splitting the file: 
		
	
		
		
		
		
		
	
	Code: 
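	open(FILE, '<', 'stuff.txt') or die "Could not open stuff.txt: $!\n";
$h3 = 0;
while(<FILE>)
{
  chomp;
  # heading lines are never printed; stop reading at the second <h3>
  if($_ =~ "<h3>"){$h3++; if($h3 > 1){last;} }
  else{print "$_\n";}
  
}
close FILE;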
 | 
| 
		 | 
	
	
	
		
                 
		
		
		
		
		
		
		
			
			
		
	 | 
| 
			
			 | 
		#15 | 
| 
			
			
			
			 Too lazy to set a custom title 
			
		
			
				
			
			
			Industry Role:  
				Join Date: Aug 2002 
				
				
				
					Posts: 55,372
				 
				
				
				
				 | 
	
	
	
	
		
		
		
		 Forgot to mention: I want to pick the links based on the h3 tag, so <h3>payment links</h3> would grab the links under that h3 element ;) 
		
	
		
		
		
		
			
	 | 
| 
		 | 
	
	
	
		
                 
		
		
		
		
		
		
		
			
			
		
	 | 
| 
			
			 | 
		#16 | 
| 
			
			
			
			 Confirmed User 
			
		
			
				
			
			
			Industry Role:  
				Join Date: May 2005 
				Location: UK 
				
				
					Posts: 1,201
				 
				
				
				
				 | 
	
	
	
	
		
		
		
		 cut and paste into a file editor, replace all <a href="  with \t<a href=" 
		
	
		
		
		
		
			
		
		
		
		
	
	cut and paste into excel, select column, done  | 
| 
		 | 
	
	
	
		
                 
		
		
		
		
		
		
		
			
			
		
	 | 
| 
			
			 | 
		#18 | |
| 
			
			
			
			 there's no $$$ in porn 
			
		
			
				
			
			
			Industry Role:  
				Join Date: Jul 2005 
				Location: icq: 195./568.-230 (btw: not getting offline msgs) 
				
				
					Posts: 33,063
				 
				
				
				
				 | 
	
	
	
	
		
		
		
		 Quote: 
	
 Code: 
	open(FILE, 'stuff.txt');
$h3 = 0;
$h3str = "payment links";
while(<FILE>)
{
  chomp;
  if($_ =~ "<h3>$h3str</h3>"){$h3++;}
  elsif($_ =~ "<h3>"){$h3 = 0;}
  elsif($h3>0){print "$_\n";}
}
close FILE;
 | 
|
| 
		 | 
	
	
	
		
                 
		
		
		
		
		
		
		
			
			
		
	 | 
| 
			
			 | 
		#19 | 
| 
			
			
			
			 It's 42 
			
		
			
			
			Industry Role:  
				Join Date: Jun 2010 
				Location: Global 
				
				
					Posts: 18,083
				 
				
				
				
				 | 
	
	
	
	
		
		
		
		 Code: 
	if($_ =~ m{<h3>\Q$h3str\E</h3>}i)
Might work better  | 
| 
		 | 
	
	
	
		
                 
		
		
		
		
		
		
		
			
			
		
	 | 
| 
			
			 | 
		#20 | 
| 
			
			
			
			 Beer Money Baron 
			
		
			
				
			
			
			Industry Role:  
				Join Date: Jan 2001 
				Location: brujah / gmail 
				
				
					Posts: 22,157
				 
				
				
				
				 | 
	
	
	
	
		
		
		
		 Maybe something along this line: 
		
	
		
		
		
		
			Code: 
	echo preg_replace( '|.*</h3>(.*)<h3>.*|s', '$1', $input ); 
	 | 
| 
		 | 
	
	
	
		
                 
		
		
		
		
		
		
		
			
			
		
	 | 
| 
			
			 | 
		#21 | 
| 
			
			
			
			 Beer Money Baron 
			
		
			
				
			
			
			Industry Role:  
				Join Date: Jan 2001 
				Location: brujah / gmail 
				
				
					Posts: 22,157
				 
				
				
				
				 | 
	
	
	
	
		
		
		
		 I guess if multiple h3 blocks continue this will work: 
		
	
		
		
		
		
			Code: 
	echo preg_replace( '|.*?</h3>(.*?)<h3>.*|s', '$1', $input ); 
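The difference between the two patterns is greedy versus lazy matching. Here is a small Python sketch of it, using the `re` module with `re.DOTALL` as the counterpart of the `s` modifier. `DATA`, `GREEDY`, `LAZY` and `block` are made-up names, and the three-block input is only an assumed sample.

```python
import re

DATA = """\
<h3>search engine links</h3>
<a href="http://google.com">google</a>
<a href="http://www.bing.com">bing</a>
<h3>payment links</h3>
<a href="http://www.paypal.com">paypal</a>
<h3>block three</h3>
<a href="http://php.net">php</a>
"""

GREEDY = r".*</h3>(.*)<h3>.*"    # .* grabs as much as it can before backtracking
LAZY = r".*?</h3>(.*?)<h3>.*"    # .*? stops at the first place the rest can match


def block(pattern, text):
    """Return the captured group as a list of lines, or [] when nothing matches."""
    match = re.search(pattern, text, re.DOTALL)
    return match.group(1).strip().splitlines() if match else []


if __name__ == "__main__":
    print("greedy:", block(GREEDY, DATA))
    print("lazy:  ", block(LAZY, DATA))
```

On this input the greedy pattern ends up capturing the paypal link (the block just before the last h3), while the lazy one captures the google and bing links (the first block).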
	 | 
| 
		 | 
	
	
	
		
                 
		
		
		
		
		
		
		
			
			
		
	 | 
| 
			
			 | 
		#23 | |
| 
			
			
			
			 Beer Money Baron 
			
		
			
				
			
			
			Industry Role:  
				Join Date: Jan 2001 
				Location: brujah / gmail 
				
				
					Posts: 22,157
				 
				
				
				
				 | 
	
	
	
	
		
		
		
		 Quote: 
	
 Code: 
	$link_block = preg_replace( '|.*?</h3>(.*?)<h3>.*|s', '$1', $input ); 
	 | 
|
| 
		 | 
	
	
	
		
                 
		
		
		
		
		
		
		
			
			
		
	 | 
| 
			
			 | 
		#24 | |
| 
			
			
			
			 Too lazy to set a custom title 
			
		
			
				
			
			
			Industry Role:  
				Join Date: Aug 2002 
				
				
				
					Posts: 55,372
				 
				
				
				
				 | 
	
	
	
	
		
		
		
		 Quote: 
	
 Code: 
	
$data = file_get_contents('links.txt');
$block = preg_replace( '|.*?</h3>(.*?)<h3>.*|s', '$1', $data );
echo $block;
	 | 
|
| 
		 | 
	
	
	
		
                 
		
		
		
		
		
		
		
			
			
		
	 | 
| 
			
			 | 
		#25 | 
| 
			
			
			
			 Beer Money Baron 
			
		
			
				
			
			
			Industry Role:  
				Join Date: Jan 2001 
				Location: brujah / gmail 
				
				
					Posts: 22,157
				 
				
				
				
				 | 
	
	
	
	
		
		
		
		 It displays the first block for me, but all I had to go on was your sample links.txt code above. 
		
	
		
		
		
		
			Code: 
	# cat links.txt
<h3>search engine links</h3>
<a href="http://google.com">google</a>
<a href="http://www.bing.com">bing</a>
<a href="http://www.yahoo.com">yahoo</a>
<h3>payment links</h3>
<a href="http://www.paypal.com">paypal</a>
<a href="http://www.paxum.com">paxum</a>
<h3>block three</h3>
<a href="https://gfy.com">gfy</a>
<a href="http://php.net">php</a>
# php test.php
<a href="http://google.com">google</a>
<a href="http://www.bing.com">bing</a>
<a href="http://www.yahoo.com">yahoo</a>
# cat test.php
<?php
$data = file_get_contents('links.txt');
$block = preg_replace( '|.*?</h3>(.*?)<h3>.*|s', '$1', $data );
echo $block;
	 | 
| 
		 | 
	
	
	
		
                 
		
		
		
		
		
		
		
			
			
		
	 | 
| 
			
			 | 
		#26 | |
| 
			
			
			
			 Too lazy to set a custom title 
			
		
			
				
			
			
			Industry Role:  
				Join Date: Aug 2002 
				
				
				
					Posts: 55,372
				 
				
				
				
				 | 
	
	
	
	
		
		
		
		 Quote: 
	
 It's Chrome bookmarks, so each folder has an h3 heading with the folder name. I just want to get the links under a given h3 folder name. 
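For a bookmarks export, a tolerant HTML parser may be safer than regexes. Below is a minimal sketch with Python's standard `html.parser`. It assumes the export uses the usual Netscape bookmark layout, where each folder is an `<H3>` followed by a `<DL>` list of `<A HREF=...>` entries; the `SAMPLE` text and the `BookmarkFolders` / `links_by_folder` names are made up for illustration.

```python
from html.parser import HTMLParser

# Hypothetical sample in the assumed export layout.
SAMPLE = """<!DOCTYPE NETSCAPE-Bookmark-file-1>
<H1>Bookmarks</H1>
<DL><p>
    <DT><H3>search engine links</H3>
    <DL><p>
        <DT><A HREF="http://google.com">google</A>
        <DT><A HREF="http://www.bing.com">bing</A>
    </DL><p>
    <DT><H3>payment links</H3>
    <DL><p>
        <DT><A HREF="http://www.paypal.com">paypal</A>
        <DT><A HREF="http://www.paxum.com">paxum</A>
    </DL><p>
</DL><p>
"""


class BookmarkFolders(HTMLParser):
    """Collect hrefs per folder, where a folder is named by its <h3> heading."""

    def __init__(self):
        super().__init__()
        self.folders = {}     # folder name -> list of hrefs
        self._stack = []      # names of the <dl> lists currently open
        self._pending = None  # last heading seen, not yet attached to a <dl>
        self._in_h3 = False
        self._text = []

    def _current(self):
        # Inside a <dl>, use its folder; in flat input, fall back to the last heading.
        return self._stack[-1] if self._stack else self._pending

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self._in_h3 = True
            self._text = []
        elif tag == "dl":
            self._stack.append(self._pending)
            self._pending = None
        elif tag == "a":
            href = dict(attrs).get("href")
            folder = self._current()
            if href and folder:
                self.folders.setdefault(folder, []).append(href)

    def handle_data(self, data):
        if self._in_h3:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "h3":
            self._in_h3 = False
            self._pending = "".join(self._text).strip()
        elif tag == "dl" and self._stack:
            self._stack.pop()


def links_by_folder(html):
    parser = BookmarkFolders()
    parser.feed(html)
    parser.close()
    return parser.folders


if __name__ == "__main__":
    for href in links_by_folder(SAMPLE).get("payment links", []):
        print(href)
```

Links that sit directly in the top-level list, outside any named folder, are skipped in this sketch.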
	 | 
|
| 
		 | 
	
	
	
		
                 
		
		
		
		
		
		
		
			
			
		
	 | 
| 
			
			 | 
		#27 | 
| 
			
			
			
			 It's 42 
			
		
			
			
			Industry Role:  
				Join Date: Jun 2010 
				Location: Global 
				
				
					Posts: 18,083
				 
				
				
				
				 | 
	
	
	
	
		
		
		
		One idea...  | 
| 
		 | 
	
	
	
		
                 
		
		
		
		
		
		
		
			
			
		
	 | 
| 
			
			 | 
		#28 | 
| 
			
			
			
			 Beer Money Baron 
			
		
			
				
			
			
			Industry Role:  
				Join Date: Jan 2001 
				Location: brujah / gmail 
				
				
					Posts: 22,157
				 
				
				
				
				 | 
	
	
	
	
		
		
		
		 Ah ok.  If you're still interested in a php solution, maybe this? 
		
	
		
		
		
		
			Code: 
	if ( empty( $argv[1] ) ) die( 'Usage: php test.php keyword' . PHP_EOL );
$fp = fopen( 'links.txt', 'r' );
while( $line = fgets( $fp ) )
{
    if ( strpos( $line, '<h3>' ) !== false AND strpos( $line, $argv[1] ) !== false )
    {
        do {
            $line = fgets( $fp );
            if ( strpos( $line, '<h3>' ) !== false ) break 2;
            else echo $line;
        } while ( ! feof( $fp ) );
    }
}
fclose( $fp );
Code: 
	~ $ php test.php
Usage: php test.php keyword
~ $ php test.php search
<a href="http://google.com">google</a>
<a href="http://www.bing.com">bing</a>
<a href="http://www.yahoo.com">yahoo</a>
~ $ php test.php pay
<a href="http://www.paypal.com">paypal</a>
<a href="http://www.paxum.com">paxum</a>
~ $ php test.php bleh
<a href="http://php.net">php</a>
<a href="http://nginx.org">nginx</a>
~ $ php test.php 'search engine links'
<a href="http://google.com">google</a>
<a href="http://www.bing.com">bing</a>
<a href="http://www.yahoo.com">yahoo</a>
~ $ php test.php 'payment links'
<a href="http://www.paypal.com">paypal</a>
<a href="http://www.paxum.com">paxum</a>
	 | 
| 
		 | 
	
	
	
		
                 
		
		
		
		
		
		
		
			
			
		
	 | 
| 
			
			 | 
		#29 | 
| 
			
			
			
			 Confirmed User 
			
		
			
				
			
			
			Industry Role:  
				Join Date: May 2005 
				Location: UK 
				
				
					Posts: 1,201
				 
				
				
				
				 | 
	
	
	
	
		
		
		
		 you could have done it in excel by now or registered a website that parses chrome book marks with php preg_match 
		
	
		
		
		
		
			
		
		
		
		
	
	is it just chrome bookmarks? I'll make a damn site to stop reading this  | 
| 
		 | 
	
	
	
		
                 
		
		
		
		
		
		
		
			
			
		
	 | 
| 
			
			 | 
		#30 | |
| 
			
			
			
			 Beer Money Baron 
			
		
			
				
			
			
			Industry Role:  
				Join Date: Jan 2001 
				Location: brujah / gmail 
				
				
					Posts: 22,157
				 
				
				
				
				 | 
	
	
	
	
		
		
		
		 Quote: 
	
 ![]() 
	 | 
|
| 
		 | 
	
	
	
		
                 
		
		
		
		
		
		
		
			
			
		
	 | 
| 
			
			 | 
		#32 | 
| 
			
			
			
			 Confirmed User 
			
		
			
				
			
			
			Industry Role:  
				Join Date: Dec 2009 
				Location: Texas 
				
				
					Posts: 1,643
				 
				
				
				
				 | 
	
	
	
	
		
		
		
		 This is untested, and you already have a lot of good examples, but I believe you have more options on the command line this way. 
		
	
		
		
		
		
			Code: 
	#!/usr/bin/perl
die "Usage is $0 <start> <stop> <filename>\n" unless $ARGV[2];
$start = shift;
$stop = shift;
$file = shift;
open(FILE, "$file") or die "Could not open $file $!\n";
while(<FILE>){
  chomp;
  $sp = 1 if $_ =~ /$start/;   # start printing once the start pattern is seen
  last if $_ =~ /$stop/;       # stop at the first line matching the stop pattern
  next if $_ =~ /<h3>/;        # skip heading lines
  if($sp == 1){ print "$_\n"; }
}
close FILE;
				__________________ 
		
		
		
		
	
	Go Fuck Yourself  | 
| 
		 | 
	
	
	
		
                 
		
		
		
		
		
		
		
			
			
		
	 |