Welcome to the GoFuckYourself.com - Adult Webmaster Forum forums.

You are currently viewing our boards as a guest which gives you limited access to view most discussions and access our other features. By joining our free community you will have access to post topics, communicate privately with other members (PM), respond to polls, upload content and access many other special features. Registration is fast, simple and absolutely free so please, join our community today!

If you have any problems with the registration process or your account login, please contact us.

Post New Thread Reply

Register GFY Rules Calendar
Go Back   GoFuckYourself.com - Adult Webmaster Forum > >
Discuss what's fucking going on, and which programs are best and worst. One-time "program" announcements from "established" webmasters are allowed.

 
Thread Tools
Old 02-24-2009, 01:11 PM   #1
Angry Jew Cat - Banned for Life
(felis madjewicus)
 
Industry Role:
Join Date: Jul 2006
Location: In Mom & Dad's Basement
Posts: 20,368
This is a long shot, but wth: Any Perl gurus here familiar with HTML::TokeParser use?

Already dropped my question on a few Perl forums, and I've been detached from my regular Perl master for a few days. If you know your shit and have a moment I'd like to run a question passed you.
Angry Jew Cat - Banned for Life is offline   Share thread on Digg Share thread on Twitter Share thread on Reddit Share thread on Facebook Reply With Quote
Old 02-24-2009, 01:41 PM   #2
fris
Too lazy to set a custom title
 
fris's Avatar
 
Industry Role:
Join Date: Aug 2002
Posts: 55,372
post the question here
__________________
Since 1999: 69 Adult Industry awards for Best Hosting Company and professional excellence.


WP Stuff
fris is offline   Share thread on Digg Share thread on Twitter Share thread on Reddit Share thread on Facebook Reply With Quote
Old 02-24-2009, 02:25 PM   #3
Angry Jew Cat - Banned for Life
(felis madjewicus)
 
Industry Role:
Join Date: Jul 2006
Location: In Mom & Dad's Basement
Posts: 20,368
what i'm trying to achieve is that i'd like to extract all the html contained within a specified table. i have targeted the table, and i attempted to clip out the required html using "get_trimmed_text" but it parses the html as text, so all the html tags are not saved using this method. is there an equivelant to using get_trimmed_text I could use within HTML::TokeParser or should I be looking into a different module. IS there a funtion for trimming down html in WWW::Mechanize?

Code:
!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;
use LWP::Simple;

# extract.pl

  print "Enter the page URL: ";
      chomp( my $domain = <STDIN> );

  print "Enter the output HTML filename: ";
      chomp( my $html_output = <STDIN> );

  my $content = get($domain) or die $!;

  my $stream = HTML::TokeParser->new( \$content ) or die $!;

  while ( my $tag = $stream->get_tag( "table" ) ) {

      if ( $tag->[1]{cellpadding} and  $tag->[1]{cellpadding} eq '8' ) {

      # what do i do here?

      }

  }
Angry Jew Cat - Banned for Life is offline   Share thread on Digg Share thread on Twitter Share thread on Reddit Share thread on Facebook Reply With Quote
Old 02-24-2009, 02:42 PM   #4
Angry Jew Cat - Banned for Life
(felis madjewicus)
 
Industry Role:
Join Date: Jul 2006
Location: In Mom & Dad's Basement
Posts: 20,368
Shit I just found out there is an HTML::TableExtract module :D, gonna go Google now, peace...
Angry Jew Cat - Banned for Life is offline   Share thread on Digg Share thread on Twitter Share thread on Reddit Share thread on Facebook Reply With Quote
Old 02-24-2009, 02:54 PM   #5
HorseShit
Too lazy to set a custom title
 
Join Date: Dec 2004
Posts: 17,513
peace fucker
HorseShit is offline   Share thread on Digg Share thread on Twitter Share thread on Reddit Share thread on Facebook Reply With Quote
Old 02-24-2009, 03:47 PM   #6
Angry Jew Cat - Banned for Life
(felis madjewicus)
 
Industry Role:
Join Date: Jul 2006
Location: In Mom & Dad's Basement
Posts: 20,368
hrmmm, if i could trim all points before and after a set point in a text file somehow i could make this work. any ideas?
Angry Jew Cat - Banned for Life is offline   Share thread on Digg Share thread on Twitter Share thread on Reddit Share thread on Facebook Reply With Quote
Old 02-24-2009, 05:09 PM   #7
Tempest
Too lazy to set a custom title
 
Industry Role:
Join Date: May 2004
Location: West Coast, Canada.
Posts: 10,217
Don't know why you're using Tokeparser for this...
Code:
$lchtml = lc($html);
$start = index($lchtml, '<table');
$end = index($lchtml, '</table>') + 8;
$table = substr($html, $start, $end - $start);
Tempest is offline   Share thread on Digg Share thread on Twitter Share thread on Reddit Share thread on Facebook Reply With Quote
Old 02-24-2009, 05:09 PM   #8
fris
Too lazy to set a custom title
 
fris's Avatar
 
Industry Role:
Join Date: Aug 2002
Posts: 55,372
why not use php. its prob about 3 lines one to grab the content and a regex
__________________
Since 1999: 69 Adult Industry awards for Best Hosting Company and professional excellence.


WP Stuff
fris is offline   Share thread on Digg Share thread on Twitter Share thread on Reddit Share thread on Facebook Reply With Quote
Old 02-24-2009, 05:28 PM   #9
Alky
Confirmed User
 
Alky's Avatar
 
Join Date: Apr 2002
Location: Houston
Posts: 5,651
Quote:
Originally Posted by fris View Post
why not use php. its prob about 3 lines one to grab the content and a regex
perl can use regex....
Alky is offline   Share thread on Digg Share thread on Twitter Share thread on Reddit Share thread on Facebook Reply With Quote
Old 02-24-2009, 05:31 PM   #10
Tempest
Too lazy to set a custom title
 
Industry Role:
Join Date: May 2004
Location: West Coast, Canada.
Posts: 10,217
Quote:
Originally Posted by Alky View Post
perl can use regex....
Code:
$html =~ /(<table.+?<\/table>)/si;
$table = $1;
Tempest is offline   Share thread on Digg Share thread on Twitter Share thread on Reddit Share thread on Facebook Reply With Quote
Old 02-24-2009, 05:45 PM   #11
Angry Jew Cat - Banned for Life
(felis madjewicus)
 
Industry Role:
Join Date: Jul 2006
Location: In Mom & Dad's Basement
Posts: 20,368
Quote:
Originally Posted by Tempest View Post
Don't know why you're using Tokeparser for this...
Code:
$lchtml = lc($html);
$start = index($lchtml, '<table');
$end = index($lchtml, '</table>') + 8;
$table = substr($html, $start, $end - $start);
I don't know why I'm using TokeParser for this either. I've just been on a google mission all day slowly stepping closer to getting my goal achieved. I'm pretty new to Perl, and real coding in general. I had a good Perl mentor for a while, but he up and disappeared for the last few days so I've been busting ass on google trying to accomplish my aims here. Never tried to manipulate a page before.

I got some help and have achieved my goal using HTML::TreeBuilder though. Everything has been running just skippy. I'm gonna look mor einto what you've given me though, looks as though it'd shave a few lines off my code...
Angry Jew Cat - Banned for Life is offline   Share thread on Digg Share thread on Twitter Share thread on Reddit Share thread on Facebook Reply With Quote
Old 02-24-2009, 06:12 PM   #12
Angry Jew Cat - Banned for Life
(felis madjewicus)
 
Industry Role:
Join Date: Jul 2006
Location: In Mom & Dad's Basement
Posts: 20,368
Cheers Tempest, shaved 7 lines of code off, and is much easier to remember then the TreeBuilder method for future use.
Angry Jew Cat - Banned for Life is offline   Share thread on Digg Share thread on Twitter Share thread on Reddit Share thread on Facebook Reply With Quote
Old 02-24-2009, 07:48 PM   #13
Tempest
Too lazy to set a custom title
 
Industry Role:
Join Date: May 2004
Location: West Coast, Canada.
Posts: 10,217
If you're going to do quite a bit of Perl, I'd recomend you get and read this book http://oreilly.com/catalog/9780596520106/ and then the rest in the series... You might also want to check out this downloadable book http://www.perl.org/books/beginning-perl/ or perhaps this site http://www.perltutorial.org/
Tempest is offline   Share thread on Digg Share thread on Twitter Share thread on Reddit Share thread on Facebook Reply With Quote
Old 02-24-2009, 07:49 PM   #14
Angry Jew Cat - Banned for Life
(felis madjewicus)
 
Industry Role:
Join Date: Jul 2006
Location: In Mom & Dad's Basement
Posts: 20,368
Quote:
Originally Posted by Tempest View Post
If you're going to do quite a bit of Perl, I'd recomend you get and read this book http://oreilly.com/catalog/9780596520106/ and then the rest in the series... You might also want to check out this downloadable book http://www.perl.org/books/beginning-perl/ or perhaps this site http://www.perltutorial.org/
i forgot my llama during my last move, but i agree, its a great book. i wish i had it with me
Angry Jew Cat - Banned for Life is offline   Share thread on Digg Share thread on Twitter Share thread on Reddit Share thread on Facebook Reply With Quote
Post New Thread Reply
Go Back   GoFuckYourself.com - Adult Webmaster Forum > >

Bookmarks



Advertising inquiries - marketing at gfy dot com

Contact Admin - Advertise - GFY Rules - Top

©2000-, AI Media Network Inc



Powered by vBulletin
Copyright © 2000- Jelsoft Enterprises Limited.