Angry Jew Cat - 02-24-2009, 02:25 PM
What I'm trying to achieve is to extract all of the HTML contained within a specific table. I have targeted the table, and I attempted to clip out the required HTML using get_trimmed_text, but that method returns the content as plain text, so all the HTML tags are stripped out. Is there an equivalent of get_trimmed_text within HTML::TokeParser that keeps the markup, or should I be looking into a different module? Is there a function for trimming out a chunk of HTML in WWW::Mechanize?

Code:
#!/usr/bin/perl
# extract.pl
use strict;
use warnings;
use HTML::TokeParser;
use LWP::Simple;

print "Enter the page URL: ";
chomp( my $domain = <STDIN> );

print "Enter the output HTML filename: ";
chomp( my $html_output = <STDIN> );

my $content = get($domain) or die "Couldn't fetch $domain";

my $stream = HTML::TokeParser->new( \$content ) or die $!;

while ( my $tag = $stream->get_tag("table") ) {

    if ( $tag->[1]{cellpadding} and $tag->[1]{cellpadding} eq '8' ) {

        # what do i do here?

    }

}
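For reference, one possible direction (a rough sketch, untested, and not necessarily the only way): once the target table is found with get_tag, switch to get_token and append the raw source text of each token while tracking table nesting depth, so a nested table doesn't end the capture early. The URL below is just a stand-in, and the cellpadding="8" test is the same check as in the code above.

Code:
#!/usr/bin/perl
# table_html.pl - rough sketch: capture the raw HTML of the cellpadding="8" table
use strict;
use warnings;
use HTML::TokeParser;
use LWP::Simple;

my $url     = 'http://www.example.com/';                 # stand-in URL
my $content = get($url) or die "Couldn't fetch $url";
my $stream  = HTML::TokeParser->new( \$content ) or die $!;

while ( my $tag = $stream->get_tag('table') ) {

    next unless $tag->[1]{cellpadding} and $tag->[1]{cellpadding} eq '8';

    my $html  = $tag->[3];    # raw text of the opening <table ...> tag
    my $depth = 1;            # currently inside one <table>

    # get_token hands back each token along with its raw source text,
    # so the markup can be stitched back together verbatim
    while ( my $token = $stream->get_token ) {
        if ( $token->[0] eq 'S' ) {                       # start tag
            $depth++ if $token->[1] eq 'table';
            $html .= $token->[4];
        }
        elsif ( $token->[0] eq 'E' ) {                    # end tag
            $depth-- if $token->[1] eq 'table';
            $html .= $token->[2];
            last unless $depth;                           # matching </table> reached
        }
        else {                                            # text, comments, declarations
            $html .= $token->[1];
        }
    }

    print $html;
    last;
}

Another route that might fit (again, just a guess) is HTML::TreeBuilder: parse the page, find the table with look_down, and dump it back out with as_HTML, which keeps the tags intact.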