Welcome to the GoFuckYourself.com - Adult Webmaster Forum forums.

Discuss what's fucking going on, and which programs are best and worst. One-time "program" announcements from "established" webmasters are allowed.

 
03-08-2004, 12:09 PM #1
whee
Confirmed User
 
Join Date: Sep 2002
Location: http://www.nightstation.com
Posts: 1,375
PHP gurus! Question about "preg_match_all" function

Help anyone!
I've got the HTML of an external gallery in a string by using fopen.
Now I wanna grab all the picture URLs from this gallery by using a regex / preg_match_all or whatever.
I've tried:

1. preg_match_all('/href=\"([^\"]+\.jpg)\"/i',$str,$arr2);

2. preg_match_all('/\"([^\"]+\.jpg)\"/i',$str,$arr2);

Neither seems to work.
I want all the picture URLs from the gallery in an array.

Can anyone give me the right search pattern, or point me in the right direction?

Most appreciated... I've spent a long time on this.

__________________
http://www.nightstation.com
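For reference, a minimal sketch of what preg_match_all returns, run on a small made-up gallery snippet with the first pattern above. With the default PREG_PATTERN_ORDER, $arr2[0] holds the whole matches (href="...jpg") and $arr2[1] holds only the captured URLs. Since that pattern does match double-quoted hrefs, one guess is that a "not working" result comes from reading $arr2[0] instead of $arr2[1], or from a gallery that uses single quotes.

```php
<?php
// Hypothetical sample HTML standing in for the fetched gallery page.
$str = '<a href="pics/01.jpg"><img src="thumbs/01.jpg"></a>'
     . "\n"
     . '<a href="pics/02.JPG"><img src="thumbs/02.jpg"></a>';

// The asker's first pattern, unchanged; /i also catches ".JPG".
$count = preg_match_all('/href=\"([^\"]+\.jpg)\"/i', $str, $arr2);

// $arr2[0] = full matches (href="..."), $arr2[1] = capture group 1 (URLs only).
$urls = $arr2[1];

echo "Found $count links\n";
print_r($urls);
```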
03-08-2004, 12:24 PM #2
garett
Confirmed User
 
Join Date: Mar 2004
Posts: 683
Here's a little script I wrote for you:

Code:
<?php
	header("Content-type: text/plain");
	
	$f = file_get_contents("http://path_to_gallery_html",  "r");
	
	$file = strtolower($f);
	
	$lines = explode("\n", $file); // split() was removed in PHP 7; explode() does the same job here
	
	foreach($lines as $line) {
		
		if(preg_match("/href=[\"|\']([^\"|\']+.jpg)|([^\"|\']+.jpeg)/", $line, $matches)) {
			echo $matches[1] . "\n";
		}
	}
?>
I prefer to do it on a line-by-line basis, and I also converted the entire HTML to lowercase so you don't have to test for file names that are uppercase.

cheers

Edit: I just realized a little more explanation of the pattern might be needed.

First off, a lot of people write their link tags with single quotes rather than double quotes, like <a href='http://blahblah'>, so it's a good idea to check for both.

You also need to check for both '.jpg' and '.jpeg'.
__________________

Last edited by garett; 03-08-2004 at 12:32 PM..
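A sketch of the same idea applied to the whole string at once, assuming the page HTML is already in a variable (the sample and the extract_jpeg_links() name are hypothetical). It accepts single or double quotes and both extensions, and uses the /i modifier instead of strtolower(), since lowercasing the HTML also lowercases the URLs, which can break on servers with case-sensitive paths.

```php
<?php
// ["'] accepts either quote style (a "|" inside [...] would be a literal pipe).
// \.jpe?g matches ".jpg" or ".jpeg" with a literal dot.
// /i makes the whole match case-insensitive, so no strtolower() is needed.
function extract_jpeg_links(string $html): array
{
    preg_match_all('/href\s*=\s*["\']([^"\']+\.jpe?g)["\']/i', $html, $m);
    return $m[1]; // capture group 1 = the URLs themselves
}

$html = <<<HTML
<a href="pics/01.jpg"><img src="t/01.jpg"></a> <a href='pics/02.JPEG'>two</a>
<a HREF="Pics/03.Jpg">three</a>
<a href="page.html">not a picture</a>
HTML;

print_r(extract_jpeg_links($html));
```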
03-08-2004, 12:39 PM #3
whee
Confirmed User
 
Join Date: Sep 2002
Location: http://www.nightstation.com
Posts: 1,375
Garret, you are a fox...

Thank you very much, this was exactly what I needed!



to Garret!

__________________
http://www.nightstation.com
03-08-2004, 12:48 PM #4
garett
Confirmed User
 
Join Date: Mar 2004
Posts: 683

Np..

... and I just noticed a stupid little mistake in my code.

I was initially using fopen() instead of file_get_contents(). With file_get_contents() you don't need the "r" (its second parameter is use_include_path, not a file mode), so you can take that out.
__________________
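To illustrate the difference, a sketch with two hypothetical helpers: fopen() takes a mode such as "r", while file_get_contents() only needs the path or URL. Reading a remote gallery URL either way assumes allow_url_fopen is enabled; the demo uses a temporary local file instead so it runs offline.

```php
<?php
// fopen() style: open with a mode, read the rest of the stream, close.
function read_with_fopen(string $path): string
{
    $fh = fopen($path, 'r'); // here "r" really is the open mode
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $data = stream_get_contents($fh);
    fclose($fh);
    return $data === false ? '' : $data;
}

// file_get_contents() style: one call, no mode argument needed.
function read_with_file_get_contents(string $path): string
{
    $data = file_get_contents($path);
    if ($data === false) {
        throw new RuntimeException("Cannot read $path");
    }
    return $data;
}

// Demo: a temporary local file stands in for the gallery page.
$tmp = tempnam(sys_get_temp_dir(), 'gal');
file_put_contents($tmp, '<a href="pics/01.jpg">1</a>');
var_dump(read_with_fopen($tmp) === read_with_file_get_contents($tmp));
unlink($tmp);
```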
03-08-2004, 12:57 PM #5
Lane
Will code for food...
 
Join Date: Apr 2001
Location: Buckeye, AZ
Posts: 8,496
Quote:
Originally posted by garett
Here's a little script I wrote for you:

Your code assumes there is at most one image URL per line, and the regular expression has redundant parts.
Correct me if I'm wrong.
__________________
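A small sketch of the one-per-line limitation, using a made-up two-line sample: preg_match() stops at the first match on each line, while preg_match_all() over the whole string picks up every URL.

```php
<?php
$html = '<a href="a.jpg">A</a> <a href="b.jpg">B</a>' . "\n"
      . '<a href="c.jpg">C</a>';
$pattern = '/href=["\']([^"\']+\.jpe?g)["\']/i';

// Line by line with preg_match(): only the first URL on each line is kept.
$perLine = [];
foreach (explode("\n", $html) as $line) {
    if (preg_match($pattern, $line, $m)) {
        $perLine[] = $m[1];
    }
}

// Whole string with preg_match_all(): every URL is collected.
preg_match_all($pattern, $html, $all);

print_r($perLine); // a.jpg, c.jpg (b.jpg is lost)
print_r($all[1]);  // a.jpg, b.jpg, c.jpg
```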
03-08-2004, 01:00 PM #6
garett
Confirmed User
 
Join Date: Mar 2004
Posts: 683
Quote:
Originally posted by Lane


Your code assumes there is at most one image URL per line, and the regular expression has redundant parts.
Correct me if I'm wrong.
Thanks for pointing that out. It's true, I didn't think of that.

And looking at it more carefully, the pattern could be shortened from this:

/href=[\"|']([^\"|']+.jpg)|([^\"|']+.jpeg)/

To

/href=[\"|']([^\"|']+[.jpg|.jpeg])/

Is that what you had in mind when you said redundant code?
__________________
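A sketch comparing the shortened pattern with an alternation-based one, on a hypothetical sample. Inside square brackets every character is literal, so [.jpg|.jpeg] matches a single character from the set . j p g | e, and a link such as page.html gets cut down to "page." and reported as a match. Grouping the extensions with (?:jpg|jpeg) after an escaped dot keeps the check intact.

```php
<?php
// The shortened pattern: [.jpg|.jpeg] is a one-character class, not "jpg or jpeg".
$short = '/href=["|\']([^"|\']+[.jpg|.jpeg])/';
// An alternation-based alternative: literal dot, then "jpg" or "jpeg", then a quote.
$fixed = '/href=["\']([^"\']+\.(?:jpg|jpeg))["\']/i';

$html = '<a href="page.html">info</a> <a href="pics/01.jpeg">pic</a>';

preg_match_all($short, $html, $a);
preg_match_all($fixed, $html, $b);

print_r($a[1]); // includes the false positive "page."
print_r($b[1]); // only "pics/01.jpeg"
```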
03-08-2004, 01:05 PM #7
fletcher
Confirmed User
 
Join Date: Jan 2003
Location: Austin, TX
Posts: 698
In order to match pretty much any HTML, here's what I use:

PHP Code:
preg_match_all("/<a.*href=[\"|\']?([^\"|^\'|^\s|^\>]+)[\"|\']?.*><img.*src=[\"|\']?([^\"|^\'|^\s|^\>]+)[\"|\']?.*><\/a>/im", $tmpStr, $matches);

I'm sure it could be optimized a bit, but .* is greedy and can match more than you expect, so I like to have extra checks in there. You'll find it a lot easier to grab the whole HTML page, remove any line breaks, and then run that regexp on the whole thing. That way, any links or image tags that span more than one line will still match.

__________________
[email protected]
ICQ: 6411138
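A sketch in the spirit of fletcher's pattern, with one swapped-in change: [^>]* instead of .*, so each part of the match stays inside a single tag and line breaks no longer have to be stripped first (on a page flattened to one line, a greedy .* can run across several links). The sample HTML and the extract_thumb_links() name are hypothetical, and like any regex approach this assumes fairly regular markup.

```php
<?php
// Returns (link URL, thumbnail URL) pairs from <a ...><img ...></a> blocks.
// [^>]* cannot leave the current tag but does match newlines, so tags that
// span several lines still match; \s* allows whitespace between tags.
function extract_thumb_links(string $html): array
{
    $pattern = '/<a\s[^>]*href=["\']?([^"\'\s>]+)["\']?[^>]*>\s*'
             . '<img\s[^>]*src=["\']?([^"\'\s>]+)["\']?[^>]*>\s*<\/a>/i';
    preg_match_all($pattern, $html, $m, PREG_SET_ORDER);

    $pairs = [];
    foreach ($m as $set) {
        $pairs[] = ['href' => $set[1], 'src' => $set[2]];
    }
    return $pairs;
}

$html = <<<HTML
<a href="pics/01.jpg"><img src="thumbs/01.jpg"></a>
<A HREF='pics/02.jpg'
   target="_blank"><IMG SRC=thumbs/02.jpg border=0></A>
HTML;

print_r(extract_thumb_links($html));
```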

©2000-, AI Media Network Inc


