Here is a way to do it without regex:
<?php
// buffer is a variable to hold the data we are working on
$buffer='';
// set vars for the beginning of what we want to parse and the end of what we want to parse
$begin_pattern='<a href="';
$end_pattern='">';
// set up a var for the data being extracted. This could be an array, or a string to write to a file, whatever
// here I am just using it to echo the extracted data
$dataout='';
// set file2read to point at the path and file that the list is stored in
$file2read='testfile.txt';
// open the file
$filein=fopen($file2read,'r');
// suck the entire file into a variable
while (!feof($filein)){
$buffer=$buffer . fgets($filein);
}
// close the file
fclose($filein);
// check to make sure we got something out of the file
if ($buffer!==''){
// do this while any occurrences of the beginning pattern are still in the data
while( substr_count(strtolower($buffer),$begin_pattern)>0 ){
// trim the data to just past the next beginning pattern occurrence
$buffer=substr($buffer, strpos(strtolower($buffer),$begin_pattern)+strlen( $begin_pattern));
// pull the data in from where we trimmed the data to the occurrence of the next end pattern
$dataout=substr($buffer,0,strpos($buffer,$end_pattern));
// trim the buffer by the length of the data we pulled
$buffer=substr($buffer,strlen($dataout));
// output the data we pulled - could go into an array here, or be written to a file, whatever
echo $dataout . '<br>';
}
}
?>
It takes a file that looks like this:
<a href="testurl1.com">crapcrapcrap<a href="testurl2.com">morecrapmorecrap<a href="testurl3.com">yesevenmore<a href="testurl4.com">awholelottacrap<a href="testurl5.com"><a href="testurl6.com"><a href="testurl7.com"><a href="testurl8.com"><a href="testurl9.com"><a href="testurl10.com"><a href="testurl11.com"><a href="testurl12.com"><a href="testurl13.com"><a href="testurl14.com"><a href="testurl15.com"><a href="testurl16.com"><a href="testurl17.com"><a href="testurl18.com"><a href="testurl19.com"><a href="testurl20.com"><a href="testurl21.com"><a href="testurl22.com"><a href="testurl23.com"><a href="testurl24.com"><a href="testurl25.com">
and outputs it like this:
testurl1.com
testurl2.com
testurl3.com
testurl4.com
testurl5.com
testurl6.com
testurl7.com
testurl8.com
testurl9.com
testurl10.com
testurl11.com
testurl12.com
testurl13.com
testurl14.com
testurl15.com
testurl16.com
testurl17.com
testurl18.com
testurl19.com
testurl20.com
testurl21.com
testurl22.com
testurl23.com
testurl24.com
testurl25.com
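If you want the results in an array rather than echoed one at a time, the same idea can probably be wrapped in a function that walks through the string with an offset instead of re-trimming the buffer on every pass. This is only a rough sketch: extract_between() is a name I made up for illustration, and it assumes the whole input fits in memory as a single string.

```php
<?php
// extract_between() is a made-up helper name, not a built-in PHP function.
// It returns every substring found between $begin and $end in $haystack.
function extract_between(string $haystack, string $begin, string $end): array
{
    $found = [];
    $offset = 0;
    // stripos() matches case-insensitively, like the strtolower() comparison in the script
    while (($start = stripos($haystack, $begin, $offset)) !== false) {
        // move to just past the beginning pattern
        $start += strlen($begin);
        // look for the next end pattern after that point
        $stop = strpos($haystack, $end, $start);
        if ($stop === false) {
            break; // no closing pattern left, so stop scanning
        }
        $found[] = substr($haystack, $start, $stop - $start);
        // continue scanning after the end pattern we just consumed
        $offset = $stop + strlen($end);
    }
    return $found;
}

// sample input in the same shape as the test file
$sample = '<a href="testurl1.com">crapcrapcrap<A HREF="testurl2.com">morecrap';
foreach (extract_between($sample, '<a href="', '">') as $url) {
    echo $url . "<br>\n";
}
```

Unlike the script, this stops as soon as a beginning pattern has no matching end pattern, rather than echoing an empty line for it.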