Angry Jew Cat - 02-24-2009, 02:25 PM
What I'm trying to achieve is to extract all of the HTML contained within a specific table. I have targeted the table, and I attempted to clip out the required HTML using get_trimmed_text, but that method returns the content as plain text, so all the HTML tags are stripped out. Is there an equivalent of get_trimmed_text within HTML::TokeParser that keeps the markup, or should I be looking into a different module? Is there a function for trimming out a chunk of HTML in WWW::Mechanize?

Code:
#!/usr/bin/perl
# extract.pl
use strict;
use warnings;
use HTML::TokeParser;
use LWP::Simple;

print "Enter the page URL: ";
chomp( my $domain = <STDIN> );

print "Enter the output HTML filename: ";
chomp( my $html_output = <STDIN> );

my $content = get($domain) or die "Couldn't fetch $domain";

my $stream = HTML::TokeParser->new( \$content ) or die $!;

while ( my $tag = $stream->get_tag("table") ) {

    if ( $tag->[1]{cellpadding} and $tag->[1]{cellpadding} eq '8' ) {

        # what do i do here?

    }

}
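For reference, one possible direction (a rough sketch, untested, and not necessarily the only way): once the target table is found with get_tag, switch to get_token and append the raw source text of each token while tracking table nesting depth, so a nested table doesn't end the capture early. The URL below is just a stand-in, and the cellpadding="8" test is the same check as in the code above.

Code:
#!/usr/bin/perl
# table_html.pl - rough sketch: capture the raw HTML of the cellpadding="8" table
use strict;
use warnings;
use HTML::TokeParser;
use LWP::Simple;

my $url     = 'http://www.example.com/';                 # stand-in URL
my $content = get($url) or die "Couldn't fetch $url";
my $stream  = HTML::TokeParser->new( \$content ) or die $!;

while ( my $tag = $stream->get_tag('table') ) {

    next unless $tag->[1]{cellpadding} and $tag->[1]{cellpadding} eq '8';

    my $html  = $tag->[3];    # raw text of the opening <table ...> tag
    my $depth = 1;            # currently inside one <table>

    # get_token hands back each token along with its raw source text,
    # so the markup can be stitched back together verbatim
    while ( my $token = $stream->get_token ) {
        if ( $token->[0] eq 'S' ) {                       # start tag
            $depth++ if $token->[1] eq 'table';
            $html .= $token->[4];
        }
        elsif ( $token->[0] eq 'E' ) {                    # end tag
            $depth-- if $token->[1] eq 'table';
            $html .= $token->[2];
            last unless $depth;                           # matching </table> reached
        }
        else {                                            # text, comments, declarations
            $html .= $token->[1];
        }
    }

    print $html;
    last;
}

Another route that might fit (again, just a guess) is HTML::TreeBuilder: parse the page, find the table with look_down, and dump it back out with as_HTML, which keeps the tags intact.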