This is a long shot, but wth: Any Perl gurus here familiar with HTML::TokeParser use? - GoFuckYourself.com

Angry Jew Cat - Banned for Life · 02-24-2009, 01:11 PM

Already dropped my question on a few Perl forums, and I've been detached from my regular Perl master for a few days. If you know your shit and have a moment I'd like to run a question past you.

fris · 02-24-2009, 01:41 PM

post the question here

Angry Jew Cat - Banned for Life · 02-24-2009, 02:25 PM

what i'm trying to achieve is that i'd like to extract all the html contained within a specified table. i have targeted the table, and i attempted to clip out the required html using "get_trimmed_text", but it parses the html as text, so the html tags are not saved using this method. is there an equivalent to get_trimmed_text I could use within HTML::TokeParser that keeps the tags, or should I be looking into a different module? Is there a function for trimming down html in WWW::Mechanize?

Code:

#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;
use LWP::Simple;

# extract.pl

  print "Enter the page URL: ";
      chomp( my $domain = <STDIN> );

  print "Enter the output HTML filename: ";
      chomp( my $html_output = <STDIN> );

  my $content = get($domain) or die "Could not fetch $domain\n";

  my $stream = HTML::TokeParser->new( \$content ) or die $!;

  while ( my $tag = $stream->get_tag( "table" ) ) {

      if ( $tag->[1]{cellpadding} and  $tag->[1]{cellpadding} eq '8' ) {

      # what do i do here?

      }

  }
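
For reference, a minimal sketch of one way to fill in the "what do i do here?" step while staying with HTML::TokeParser. It relies on the fact that every token returned by get_token carries the original source text, so the tokens can be glued back together as raw HTML. The helper names raw_text and extract_table are made up for illustration, and the sample string stands in for the fetched page; treat this as a sketch under those assumptions rather than the definitive way to do it.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: return the raw source text carried by a get_token() token.
sub raw_text {
    my ($token) = @_;
    my $type = $token->[0];
    return $token->[4] if $type eq 'S';                    # ["S", $tag, $attr, $attrseq, $text]
    return $token->[2] if $type eq 'E' or $type eq 'PI';   # ["E", $tag, $text] / ["PI", $token0, $text]
    return $token->[1];                                    # ["T", ...], ["C", ...], ["D", ...]
}

# Hypothetical helper: return the HTML of the first table with cellpadding="8".
sub extract_table {
    my ($content) = @_;
    my $stream = HTML::TokeParser->new( \$content )
        or die "Could not parse content\n";

    while ( my $tag = $stream->get_tag('table') ) {
        next unless defined $tag->[1]{cellpadding}
            and $tag->[1]{cellpadding} eq '8';

        my $html  = $tag->[3];    # raw text of the opening <table ...> tag
        my $depth = 1;            # track nested tables so we stop at the matching </table>

        while ( $depth > 0 ) {
            my $token = $stream->get_token or last;    # stop at end of document
            $depth++ if $token->[0] eq 'S' and $token->[1] eq 'table';
            $depth-- if $token->[0] eq 'E' and $token->[1] eq 'table';
            $html .= raw_text($token);
        }
        return $html;
    }
    return;    # no matching table found
}

# Sample input standing in for the downloaded page.
my $sample = '<p>intro</p><table cellpadding="8"><tr><td><b>hi</b></td></tr></table>';
print extract_table($sample), "\n";
```

In the original script, the returned string could then be written to the $html_output file.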

Angry Jew Cat - Banned for Life · 02-24-2009, 02:42 PM

Shit I just found out there is an HTML::TableExtract module :D, gonna go Google now, peace...

HorseShit · 02-24-2009, 02:54 PM

peace fucker

Angry Jew Cat - Banned for Life · 02-24-2009, 03:47 PM

hrmmm, if i could trim all points before and after a set point in a text file somehow i could make this work. any ideas?

Tempest · 02-24-2009, 05:09 PM

Don't know why you're using TokeParser for this...

Code:

$lchtml = lc($html);
$start = index($lchtml, '<table');
# search from $start so a stray </table> earlier in the page is ignored
$end = index($lchtml, '</table>', $start) + 8;
$table = substr($html, $start, $end - $start);
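
A slightly more defensive take on the index/substr approach, as a sketch: it returns nothing when either tag is missing instead of slicing with a -1 offset, and it looks for the closing tag only after the opening one. The sub name first_table is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first <table>...</table> span, or undef if none.
sub first_table {
    my ($html) = @_;
    my $lc = lc $html;    # lowercase copy for case-insensitive searching

    my $start = index( $lc, '<table' );
    return if $start < 0;             # no opening tag

    my $close = index( $lc, '</table>', $start );
    return if $close < 0;             # opening tag without a closing tag

    # Slice the original string so the markup keeps its original case.
    return substr( $html, $start, $close + length('</table>') - $start );
}

my $table = first_table('<p>a</p><TABLE id="t"><tr><td>x</td></tr></TABLE><p>b</p>');
print "$table\n" if defined $table;
```

Like the original snippet, this stops at the first closing tag, so a table nested inside the target would cut the result short.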

fris · 02-24-2009, 05:09 PM

why not use php? it's prob about 3 lines: one to grab the content and a regex

Alky · 02-24-2009, 05:28 PM

Quote:

Originally Posted by fris

why not use php? it's prob about 3 lines: one to grab the content and a regex

perl can use regex....

Tempest · 02-24-2009, 05:31 PM

Quote:

Originally Posted by Alky

perl can use regex....

Code:

$html =~ /(<table.+?<\/table>)/si;
$table = $1;
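
A sketch of the same regex idea that checks whether the match succeeded before using $1, and narrows the match to the table with a given cellpadding value (the attribute the original script was testing for). The sub name table_with_cellpadding is hypothetical. Because the match is non-greedy, it ends at the first closing tag, so this assumes the target table has no tables nested inside it.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first table whose opening tag has the given
# cellpadding value, or undef if there is no such table.
sub table_with_cellpadding {
    my ( $html, $padding ) = @_;

    # [^>]* keeps the attribute search inside the opening tag;
    # (?!\w) stops a value of 8 from matching 80.
    if ( $html =~ m{(<table\b[^>]*\bcellpadding\s*=\s*["']?\Q$padding\E(?!\w)["']?[^>]*>.*?</table>)}si ) {
        return $1;
    }
    return;    # no match, so $1 is never read
}

my $page = '<table><tr><td>skip</td></tr></table>'
         . '<table border="0" cellpadding="8"><tr><td>keep</td></tr></table>';
my $table = table_with_cellpadding( $page, 8 );
print "$table\n" if defined $table;
```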

Angry Jew Cat - Banned for Life · 02-24-2009, 05:45 PM

Quote:

Originally Posted by Tempest

Don't know why you're using TokeParser for this...

Code:

$lchtml = lc($html);
$start = index($lchtml, '<table');
# search from $start so a stray </table> earlier in the page is ignored
$end = index($lchtml, '</table>', $start) + 8;
$table = substr($html, $start, $end - $start);

I don't know why I'm using TokeParser for this either. I've just been on a google mission all day slowly stepping closer to getting my goal achieved. I'm pretty new to Perl, and real coding in general. I had a good Perl mentor for a while, but he up and disappeared for the last few days so I've been busting ass on google trying to accomplish my aims here. Never tried to manipulate a page before.

I got some help and have achieved my goal using HTML::TreeBuilder though. Everything has been running just skippy. I'm gonna look more into what you've given me though, looks as though it'd shave a few lines off my code...

Angry Jew Cat - Banned for Life · 02-24-2009, 06:12 PM

Cheers Tempest, shaved 7 lines of code off, and it's much easier to remember than the TreeBuilder method for future use.

Tempest · 02-24-2009, 07:48 PM

If you're going to do quite a bit of Perl, I'd recommend you get and read this book http://oreilly.com/catalog/9780596520106/ and then the rest in the series... You might also want to check out this downloadable book http://www.perl.org/books/beginning-perl/ or perhaps this site http://www.perltutorial.org/

Angry Jew Cat - Banned for Life · 02-24-2009, 07:49 PM

Quote:

Originally Posted by Tempest

If you're going to do quite a bit of Perl, I'd recommend you get and read this book http://oreilly.com/catalog/9780596520106/ and then the rest in the series... You might also want to check out this downloadable book http://www.perl.org/books/beginning-perl/ or perhaps this site http://www.perltutorial.org/

i forgot my llama during my last move, but i agree, it's a great book. i wish i had it with me
