GoFuckYourself.com - Adult Webmaster Forum

whee 03-08-2004 12:09 PM

PHP gurus! Question about "preg_match_all" function
 
Can anyone help?
I've got the HTML of an external gallery in a string, loaded with fopen.
Now I want to grab all the picture URLs from this gallery with a regex (preg_match_all or whatever).
I've tried:

1. preg_match_all('/href=\"([^\"]+\.jpg)\"/i',$str,$arr2);

2. preg_match_all('/\"([^\"]+\.jpg)\"/i',$str,$arr2);

Neither seems to work.
I want all the picture URLs from the gallery in an array.

Can anyone give me the right search pattern, or point me in the right direction?

Much appreciated... I've spent a long time on this.

:helpme
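For reference: pattern 1 is valid PCRE, so when it "doesn't work" the likely suspects are how the result is read and what the HTML looks like. preg_match_all() puts the full matches in $arr2[0] and the captured URLs in $arr2[1], and a double-quote-only pattern skips links written with single quotes. A minimal sketch, using a made-up two-link sample in place of the fetched gallery page:

```php
<?php
// Hypothetical sample standing in for the gallery HTML read with fopen()
$str = '<a href="pics/01.jpg"><img src="thumbs/01.jpg"></a>'
     . "<a href='pics/02.jpg'><img src='thumbs/02.jpg'></a>";

// Pattern 1 from the question, unchanged
preg_match_all('/href=\"([^\"]+\.jpg)\"/i', $str, $arr2);

// $arr2[0]: full matches such as href="pics/01.jpg"
// $arr2[1]: the captured URLs; the single-quoted second link is not found
print_r($arr2[1]);
```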

garett 03-08-2004 12:24 PM

Here's a little script I wrote for you:

Code:

<?php
        header("Content-type: text/plain");
       
        $f = file_get_contents("http://path_to_gallery_html",  "r");
       
        $file = strtolower($f);
       
        // break the page into lines
        $lines = explode("\n", $file);
       
        foreach($lines as $line) {
               
                if(preg_match("/href=[\"|\']([^\"|\']+.jpg)|([^\"|\']+.jpeg)/", $line, $matches)) {
                        echo $matches[1] . "\n";
                }
        }
?>

I prefer to do it line by line, and I also converted the entire HTML to lower case so you don't have to test for upper-case file names.

cheers

Edit: I just realized the pattern might need a little more explanation.

First off, a lot of people write their link tags with single quotes rather than double quotes, like <a href='http://blahblah'>, so it's a good idea to check for both.

Then you also need to check for both '.jpg' and '.jpeg'.
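The two checks described above (either quote style, either extension) fit in one pattern. A minimal sketch, assuming a hypothetical inline sample rather than a fetched page. It uses the /i modifier instead of strtolower(), since lowercasing the HTML also lowercases the URLs, which can break them on servers with case-sensitive paths:

```php
<?php
// Hypothetical sample: mixed quote styles, both extensions, mixed case
$html = '<a href="pics/One.JPG">1</a> <a href=\'pics/two.jpeg\'>2</a> <a href="page.html">x</a>';

// (["'])           group 1: the opening quote, double or single
// ([^"']+\.jpe?g)  group 2: the URL, ending in .jpg or .jpeg
// \1               the closing quote must be the same as the opening one
// /i               case-insensitive, so the HTML keeps its original case
preg_match_all('/href=(["\'])([^"\']+\.jpe?g)\1/i', $html, $m);

print_r($m[2]); // the image URLs, in page order
```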

whee 03-08-2004 12:39 PM

Garett, you are a fox...

Thank you very much, this was exactly what I needed!

:) :) :)

:thumbsup to Garett!

garett 03-08-2004 12:48 PM

Np.. :thumbsup

...and I just noticed a silly little mistake in my code.

I was originally using fopen() instead of file_get_contents(). file_get_contents() doesn't take a mode argument (its second parameter is use_include_path), so you can drop the "r".
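The difference, sketched against a temporary local file so the example runs without network access (an assumption; with allow_url_fopen enabled, both functions also accept http:// URLs). fopen() takes the "r" mode string; file_get_contents() opens, reads and closes in one call:

```php
<?php
// Hypothetical local file standing in for the gallery URL
$path = tempnam(sys_get_temp_dir(), 'gal');
file_put_contents($path, '<a href="pics/01.jpg">1</a>');

// fopen() needs a mode; "r" means open for reading only
$fp = fopen($path, 'r');
$viaFopen = stream_get_contents($fp);
fclose($fp);

// file_get_contents() has no mode argument
$viaGet = file_get_contents($path);

unlink($path);
var_dump($viaFopen === $viaGet); // bool(true)
```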

Lane 03-08-2004 12:57 PM

Quote:

Originally posted by garett
Here's a little script I wrote for you:

Your code assumes there is at most one image URL per line, and the regular expression has redundant code.
Correct me if I'm wrong.
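The first point comes down to preg_match() versus preg_match_all(): preg_match() stops at the first match, while preg_match_all() collects every match, which also makes splitting the page into lines unnecessary. A minimal sketch, using a hypothetical line with two links:

```php
<?php
// Hypothetical gallery line with two thumbnail links on the same line
$line = '<a href="pics/01.jpg"><img src="t/01.jpg"></a><a href="pics/02.jpg"><img src="t/02.jpg"></a>';
$pattern = '/href=["\']([^"\']+\.jpe?g)["\']/i';

// preg_match() stops after the first match
preg_match($pattern, $line, $one);
echo $one[1], "\n";                 // pics/01.jpg

// preg_match_all() finds every match; the captured URLs are in $all[1]
preg_match_all($pattern, $line, $all);
echo implode(", ", $all[1]), "\n";  // pics/01.jpg, pics/02.jpg
```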

garett 03-08-2004 01:00 PM

Quote:

Originally posted by Lane


Your code assumes there is at most one image URL per line, and the regular expression has redundant code.
Correct me if I'm wrong.

Thanks for pointing that out; it's true, I didn't think of that.

And looking at it more carefully, the pattern can be shortened from this:

/href=[\"|']([^\"|']+.jpg)|([^\"|']+.jpeg)/

To

/href=["']([^"']+\.jpe?g)/

Is that what you had in mind when you said redundant code?
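A caution for anyone shortening this kind of pattern: a character class such as [.jpg|.jpeg] matches a single character from the set (., j, p, g, |, e), not either extension, so a pattern ending in it accepts almost any href. A small sketch of the difference, and of two forms that do mean "jpg or jpeg":

```php
<?php
// A character class matches exactly ONE character from the set
var_dump(preg_match('/x[.jpg|.jpeg]$/', 'xg'));    // int(1): "g" is in the set
var_dump(preg_match('/x[.jpg|.jpeg]$/', 'x.jpg')); // int(0): only one char allowed after x

// Alternation or an optional letter expresses "jpg or jpeg"
var_dump(preg_match('/\.(?:jpg|jpeg)$/', 'a.jpeg')); // int(1)
var_dump(preg_match('/\.jpe?g$/', 'a.jpg'));         // int(1)
```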

fletcher 03-08-2004 01:05 PM

To match thumbnail links in pretty much any HTML, here's what I use:

PHP Code:

preg_match_all("/<a.*href=[\"|\']?([^\"|^\'|^\s|^\>]+)[\"|\']?.*><img.*src=[\"|\']?([^\"|^\'|^\s|^\>]+)[\"|\']?.*><\/a>/im", $tmpStr, $matches);


I'm sure it could be optimized a bit, but .* sometimes gets greedy, so I like to have extra checks in there. You'll find it a lot easier to grab the whole HTML page, remove any line breaks, and run that regexp on the whole thing. That way, links or image tags that span more than one line will still match.

:2 cents:
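One catch with removing the line breaks: once the page is a single line, a greedy .* can run from the first <a to the last </a>, so the whole page may come back as one match. A minimal sketch of the same idea with [^>]* in place of .*, so each part of the match stays inside one tag. The sample markup is made up, and the sketch still assumes the <img> sits directly inside the <a>:

```php
<?php
// Hypothetical gallery markup: the first link spans two lines
$tmpStr = "<a href=\"pics/01.jpg\">\n  <img src=\"t/01.jpg\"></a>"
        . "<p>text</p><a href='pics/02.jpg'><img src='t/02.jpg' border=0></a>";

// Remove line breaks so tags split across lines can still match
$tmpStr = str_replace(["\r", "\n"], ' ', $tmpStr);

// [^>]* cannot cross a ">", so a match cannot run on into later links
$pattern = '/<a\s[^>]*href=["\']?([^"\'\s>]+)["\']?[^>]*>\s*'
         . '<img\s[^>]*src=["\']?([^"\'\s>]+)["\']?[^>]*>\s*<\/a>/i';

// PREG_SET_ORDER: one array per link, [full match, href, src]
preg_match_all($pattern, $tmpStr, $matches, PREG_SET_ORDER);
foreach ($matches as $m) {
    echo $m[1], " -> ", $m[2], "\n"; // linked URL -> thumbnail URL
}
```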


All times are GMT -7.