GoFuckYourself.com - Adult Webmaster Forum - PHP gurus! Question about "preg_match


- - PHP gurus! Question about "preg_match_all" function (https://gfy.com/showthread.php?t=249192)

PHP gurus! Question about "preg_match_all" function

Help anyone!
I've got the HTML of an external gallery in a string (read with fopen).
Now I want to grab all the picture URLs from this gallery with a regex / preg_match_all or whatever.
I've tried:

1. preg_match_all('/href=\"([^\"]+\.jpg)\"/i',$str,$arr2);

2. preg_match_all('/\"([^\"]+\.jpg)\"/i',$str,$arr2);

Neither seems to work.
I want all the picture URLs from the gallery in an array.

Can anyone give me the right search pattern, or point me in the right direction?

Much appreciated... I've spent a long time on this.

:helpme
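A small sketch of how `preg_match_all()` fills its result array, using pattern 1 from the question on a made-up snippet of gallery HTML (the `$str` value is hypothetical). With one capture group in the pattern, `$arr2[0]` holds the full matched text and `$arr2[1]` holds the captured URLs, so the URL list lives in `$arr2[1]`. It also shows that this pattern only finds double-quoted `href` values ending in `.jpg`:

```php
<?php
// Hypothetical gallery HTML; in the real script $str would be the fetched page.
$str = '<a href="pics/01.jpg"><img src="t/01.jpg"></a>'
     . "<a href='pics/02.jpg'><img src='t/02.jpg'></a>";

// Pattern 1 from the question, unchanged.
preg_match_all('/href=\"([^\"]+\.jpg)\"/i', $str, $arr2);

// $arr2[0] = whole matches (href="..."), $arr2[1] = first capture group (the URLs).
print_r($arr2[1]); // only pics/01.jpg: the single-quoted href is not matched
```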

Here's a little script I wrote for you:

Code:

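<?php
        // Send the output as plain text, one URL per line
        header("Content-type: text/plain");

        // Read the whole gallery page into one string
        $f = file_get_contents("http://path_to_gallery_html", "r");

        // Lowercase everything so upper-case file names are caught too
        $file = strtolower($f);

        // Split into lines (explode() replaces split(), which was removed in PHP 7)
        $lines = explode("\n", $file);

        foreach($lines as $line) {
                // Check this line for an image link and print the first capture group
                if(preg_match("/href=[\"|\']([^\"|\']+.jpg)|([^\"|\']+.jpeg)/", $line, $matches)) {
                        echo $matches[1] . "\n";
                }
        }
?>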

I prefer to do it line by line, and I also converted the entire HTML to lowercase so you don't have to test for uppercase file names.

cheers

Edit: I realized the pattern might need a little more explanation.

First off, a lot of people write their link tags with single quotes rather than double quotes, like <a href='http://blahblah'>, so it's a good idea to check for both.

Second, you need to check for both the '.jpg' and '.jpeg' extensions.
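Putting those two points together, here is a minimal sketch (not Garret's code; the function name `extract_jpeg_urls` and the sample HTML are hypothetical) that runs `preg_match_all()` over the whole page instead of line by line. It accepts either quote style, matches `.jpg` or `.jpeg` with `\.jpe?g`, and uses the `/i` flag instead of `strtolower()`, so the URLs keep their original case (file paths are case-sensitive on many web servers):

```php
<?php
/**
 * Hypothetical helper: return every href value ending in .jpg or .jpeg
 * (any case), in page order. Works on the whole HTML string, so several
 * links on one line are all found.
 */
function extract_jpeg_urls(string $html): array
{
    // (["']) remembers which quote opened the value; \1 requires the same one to close it.
    // \.jpe?g matches ".jpg" or ".jpeg"; /i makes the match case-insensitive.
    $pattern = '/href\s*=\s*(["\'])([^"\']+\.jpe?g)\1/i';
    preg_match_all($pattern, $html, $m);
    return $m[2]; // capture group 2 holds the URL itself
}

$html = "<a href=\"Pics/01.JPG\">one</a> <a href='pics/02.jpeg'>two</a>\n"
      . '<a href="page.html">not an image</a>';

print_r(extract_jpeg_urls($html)); // Pics/01.JPG and pics/02.jpeg
```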

Garret, you are a fox...

Thank you very much, this was exactly what I needed!

:) :) :)

:thumbsup to Garret!


Np.. :thumbsup

...and I just noticed a silly little mistake in my code.

Initially I was using fopen() instead of file_get_contents(). With file_get_contents() you don't need the "r", so you can take that out.
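For context, the second parameter of `file_get_contents()` is the boolean `$use_include_path`, not a mode, and the function returns `false` on failure. A minimal sketch of a fetch helper that checks for that (the name `fetch_html` is hypothetical; reading an `http://` URL this way assumes `allow_url_fopen` is enabled on the server):

```php
<?php
/**
 * Hypothetical helper: read a page (URL or local path) or throw.
 * No mode argument is passed; file_get_contents() doesn't take one.
 */
function fetch_html(string $source): string
{
    $html = @file_get_contents($source); // @ hides the warning; the result is checked below
    if ($html === false) {
        throw new RuntimeException("Could not read $source");
    }
    return $html;
}

// Usage (placeholder URL from the thread):
// $f = fetch_html("http://path_to_gallery_html");
```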

Quote:

Originally posted by garett
Here's a little script I wrote for you:


Your code assumes there is at most one image URL per line, and the regular expression has redundant code.
Correct me if I'm wrong.
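Lane's first point in code form: `preg_match()` stops at the first match, while `preg_match_all()` collects all of them, so switching to `preg_match_all()` lifts the one-URL-per-line limit. A minimal sketch on a made-up line of HTML, using a simplified pattern chosen for illustration:

```php
<?php
// One hypothetical line of gallery HTML containing two image links.
$line = '<a href="01.jpg">1</a> <a href="02.jpg">2</a>';
$pattern = '/href=["\']([^"\']+\.jpe?g)["\']/i';

// preg_match() stops at the first match...
preg_match($pattern, $line, $first);
echo $first[1], "\n"; // 01.jpg

// ...while preg_match_all() collects every match on the line.
preg_match_all($pattern, $line, $all);
print_r($all[1]); // 01.jpg and 02.jpg
```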

Quote:

Originally posted by Lane

Your code assumes there is at most one image URL per line, and the regular expression has redundant code.
Correct me if I'm wrong.

Thanks for pointing that out. It's true, I didn't think of that.

And looking more carefully, you could shorten the pattern from this:

/href=[\"|']([^\"|']+.jpg)|([^\"|']+.jpeg)/

to

/href=[\"']([^\"']+\.jpe?g)/

Is that what you had in mind when you said redundant code?
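One caveat when shortening: square brackets do not group alternatives the way parentheses do. A pattern such as `[.jpg|.jpeg]` is a character class that matches exactly one character from the set `. j p g | e`, while `\.jpe?g` (an optional `e`) means ".jpg or .jpeg". A quick check with made-up inputs:

```php
<?php
// A character class matches ONE character from the set, so
// [.jpg|.jpeg] means "one of . j p g | e", not "the word .jpg or .jpeg".
$classPattern = '/^[.jpg|.jpeg]$/';
var_dump(preg_match($classPattern, 'g'));    // int(1): a single listed character
var_dump(preg_match($classPattern, '.jpg')); // int(0): four characters

// An optional "e" expresses ".jpg or .jpeg" directly.
$extPattern = '/\.jpe?g$/';
var_dump(preg_match($extPattern, 'pic.jpg'));  // int(1)
var_dump(preg_match($extPattern, 'pic.jpeg')); // int(1)
var_dump(preg_match($extPattern, 'pic.gif'));  // int(0)
```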

To match thumbnail links in pretty much any gallery HTML, here's what I use:

PHP Code:

preg_match_all("/<a.*?href=[\"']?([^\"'\s>]+)[\"']?.*?><img.*?src=[\"']?([^\"'\s>]+)[\"']?.*?><\/a>/im", $tmpStr, $matches);

I'm sure it could be optimized a bit, but .* can get greedy, so I like to have extra checks in there. You'll find it a lot easier to grab the whole HTML page, remove any line breaks, and run the regex on the whole thing. That way, links or image tags that span more than one line will still match.
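A variation on the same idea, sketched under assumptions (the function name `extract_thumb_links` and the sample HTML are hypothetical): replacing each `.*` with `[^>]*` keeps every wildcard inside a single tag, so once the line breaks are removed a greedy match cannot stretch from the first link on the page to the last one. Because `[^>]` and `\s` also match newlines, this version doesn't need the line breaks stripped at all. `PREG_SET_ORDER` groups the results per match, which makes the link/thumbnail pairs easy to read off:

```php
<?php
/**
 * Hypothetical helper: return [link URL, thumbnail URL] pairs for every
 * <a href=...><img src=...></a> in the page (quoted or unquoted values).
 * [^>]* keeps each wildcard inside one tag, so matches never run together.
 */
function extract_thumb_links(string $html): array
{
    $pattern = '/<a\b[^>]*href\s*=\s*["\']?([^"\'\s>]+)["\']?[^>]*>\s*'
             . '<img\b[^>]*src\s*=\s*["\']?([^"\'\s>]+)["\']?[^>]*>\s*<\/a>/i';
    preg_match_all($pattern, $html, $m, PREG_SET_ORDER);

    $pairs = [];
    foreach ($m as $match) {
        $pairs[] = [$match[1], $match[2]]; // [href, src]
    }
    return $pairs;
}

$html = "<a href=\"big/1.jpg\"><img src=\"thumb/1.jpg\"></a>\n"
      . "<A HREF='big/2.jpg' target=_blank>\n  <img src=thumb/2.jpg border=0></a>";
print_r(extract_thumb_links($html));
```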

:2 cents: