GoFuckYourself.com - Adult Webmaster Forum

-   -   This is a long shot, but wth: Any Perl gurus here familiar with HTML::TokeParser use? (https://gfy.com/showthread.php?t=889891)

Angry Jew Cat - Banned for Life 02-24-2009 01:11 PM

This is a long shot, but wth: Any Perl gurus here familiar with HTML::TokeParser use?
 
Already dropped my question on a few Perl forums, and I've been detached from my regular Perl master for a few days. If you know your shit and have a moment I'd like to run a question past you. :helpme

fris 02-24-2009 01:41 PM

post the question here

Angry Jew Cat - Banned for Life 02-24-2009 02:25 PM

what i'm trying to achieve is to extract all the html contained within a specified table. i have targeted the table, and i attempted to clip out the required html using "get_trimmed_text", but it returns the content as plain text, so the html tags are not saved using this method. is there an equivalent to get_trimmed_text I could use within HTML::TokeParser, or should I be looking into a different module? Is there a function for trimming down html in WWW::Mechanize?

Code:

#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;
use LWP::Simple;

# extract.pl

  print "Enter the page URL: ";
      chomp( my $domain = <STDIN> );

  print "Enter the output HTML filename: ";
      chomp( my $html_output = <STDIN> );

  # LWP::Simple's get() returns undef on failure and does not set $!
  my $content = get($domain) or die "Could not fetch $domain\n";

  my $stream = HTML::TokeParser->new( \$content ) or die $!;

  while ( my $tag = $stream->get_tag( "table" ) ) {

      if ( $tag->[1]{cellpadding} and  $tag->[1]{cellpadding} eq '8' ) {

      # what do i do here?

      }

  }
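
One way to fill in that empty branch while staying with HTML::TokeParser, offered as a minimal sketch rather than a definitive answer: get_trimmed_text only returns text, but get_tag and get_token also hand back the original source text of each token, so the markup can be stitched back together. The helper name extract_table and the sample HTML are made up for illustration, and the depth counter assumes table tags are properly nested.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: return the raw HTML of the first <table> whose
# cellpadding attribute equals $wanted, or nothing if no table matches.
sub extract_table {
    my ( $content, $wanted ) = @_;
    my $stream = HTML::TokeParser->new( \$content )
        or die "Could not create parser\n";

    while ( my $tag = $stream->get_tag('table') ) {
        my $attr = $tag->[1];
        next unless defined $attr->{cellpadding}
            and $attr->{cellpadding} eq $wanted;

        my $html  = $tag->[3];    # original text of the opening <table ...> tag
        my $depth = 1;            # track nested tables to find the matching </table>

        while ( my $token = $stream->get_token ) {
            my $type = $token->[0];

            # Text tokens keep their source text in slot 1; every other
            # token type keeps it in the last slot.
            $html .= $type eq 'T' ? $token->[1] : $token->[-1];

            if ( $type eq 'S' and $token->[1] eq 'table' ) {
                $depth++;
            }
            elsif ( $type eq 'E' and $token->[1] eq 'table' ) {
                last if --$depth == 0;
            }
        }
        return $html;
    }
    return;
}

# Made-up sample input, only for demonstration.
my $sample = '<table cellpadding="2"><tr><td>skip me</td></tr></table>'
    . '<table cellpadding="8"><tr><td><b>keep</b> me'
    . '<table><tr><td>inner</td></tr></table></td></tr></table><p>after</p>';

print extract_table( $sample, '8' ), "\n";
```

In the script above, the result could then be written to the file named in $html_output instead of printed.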


Angry Jew Cat - Banned for Life 02-24-2009 02:42 PM

Shit I just found out there is an HTML::TableExtract module :D, gonna go Google now, peace...

HorseShit 02-24-2009 02:54 PM

peace fucker

Angry Jew Cat - Banned for Life 02-24-2009 03:47 PM

hrmmm, if i could somehow trim everything before one set point and everything after another in a text file, i could make this work. any ideas?

Tempest 02-24-2009 05:09 PM

Don't know why you're using Tokeparser for this...
Code:

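my $lchtml = lc($html);
my $start  = index($lchtml, '<table');
# look for the closing tag only after the opening one
my $end    = index($lchtml, '</table>', $start);
# index() returns -1 when nothing is found, so guard before slicing
my $table  = ($start >= 0 && $end >= 0)
    ? substr($html, $start, $end + 8 - $start) : '';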


fris 02-24-2009 05:09 PM

why not use php? it's prob about 3 lines: one to grab the content and a regex

Alky 02-24-2009 05:28 PM

Quote:

Originally Posted by fris (Post 15544755)
why not use php? it's prob about 3 lines: one to grab the content and a regex

perl can use regex....

Tempest 02-24-2009 05:31 PM

Quote:

Originally Posted by Alky (Post 15544807)
perl can use regex....

Code:

# list-context match: $table stays undef if no table is found
my ($table) = $html =~ /(<table.+?<\/table>)/si;
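
The one-liner grabs the first table on the page. Since the original script was after the table with cellpadding set to 8, a hedged variation is to put the attribute into the pattern. This is only a sketch: the helper name and sample HTML are invented for illustration, and like any regex approach it stops at the first closing table tag, so a table nested inside the target would cut the match short.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first <table> whose opening tag carries
# cellpadding=8 (quoted or unquoted), or undef if there is none.
sub table_with_cellpadding_8 {
    my ($html) = @_;
    return $html =~ m{
        ( <table \b [^>]* \b cellpadding \s* = \s* ["']? 8 ["']? (?=[\s>]) [^>]* >
          .*?
          </table \s* > )
    }six ? $1 : undef;
}

# Made-up sample input, only for demonstration.
my $page = '<table cellpadding="2"><tr><td>skip</td></tr></table>'
    . '<TABLE border="0" cellpadding=8 ><tr><td>keep</td></tr></TABLE><p>x</p>';

my $found = table_with_cellpadding_8($page);
print defined $found ? "$found\n" : "no matching table\n";
```

The lookahead after the 8 keeps values such as 80 from matching.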


Angry Jew Cat - Banned for Life 02-24-2009 05:45 PM

Quote:

Originally Posted by Tempest (Post 15544753)
Don't know why you're using Tokeparser for this...
Code:

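my $lchtml = lc($html);
my $start  = index($lchtml, '<table');
# look for the closing tag only after the opening one
my $end    = index($lchtml, '</table>', $start);
# index() returns -1 when nothing is found, so guard before slicing
my $table  = ($start >= 0 && $end >= 0)
    ? substr($html, $start, $end + 8 - $start) : '';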


I don't know why I'm using TokeParser for this either. I've just been on a google mission all day slowly stepping closer to getting my goal achieved. I'm pretty new to Perl, and real coding in general. I had a good Perl mentor for a while, but he up and disappeared for the last few days so I've been busting ass on google trying to accomplish my aims here. Never tried to manipulate a page before.

I got some help and have achieved my goal using HTML::TreeBuilder though. Everything has been running just skippy. I'm gonna look more into what you've given me though, looks as though it'd shave a few lines off my code...
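
The TreeBuilder code itself was never posted in the thread, so the following is only a guess at what such a solution might look like, not the version actually used. It assumes the same cellpadding="8" target as the original script; the helper name and sample HTML are invented. Note that as_HTML re-serializes the parsed tree, so the output is normalized markup rather than a byte-for-byte copy of the source.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: parse the page into a tree, find the first table
# with cellpadding="8", and serialize that subtree back to HTML.
sub table_via_tree {
    my ($html) = @_;
    my $tree  = HTML::TreeBuilder->new_from_content($html);
    my $table = $tree->look_down( _tag => 'table', cellpadding => '8' );

    # An empty hashref as the third argument asks as_HTML to write out
    # every closing tag instead of omitting the optional ones.
    my $out = $table ? $table->as_HTML( undef, undef, {} ) : undef;

    $tree->delete;    # release the tree explicitly
    return $out;
}

# Made-up sample input, only for demonstration.
my $doc = '<table cellpadding="2"><tr><td>skip me</td></tr></table>'
    . '<table cellpadding="8"><tr><td>keep me</td></tr></table>';

my $result = table_via_tree($doc);
print defined $result ? $result : "no matching table\n";
```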

Angry Jew Cat - Banned for Life 02-24-2009 06:12 PM

Cheers Tempest, shaved 7 lines of code off, and it's much easier to remember than the TreeBuilder method for future use.

Tempest 02-24-2009 07:48 PM

If you're going to do quite a bit of Perl, I'd recommend you get and read this book http://oreilly.com/catalog/9780596520106/ and then the rest in the series... You might also want to check out this downloadable book http://www.perl.org/books/beginning-perl/ or perhaps this site http://www.perltutorial.org/

Angry Jew Cat - Banned for Life 02-24-2009 07:49 PM

Quote:

Originally Posted by Tempest (Post 15545247)
If you're going to do quite a bit of Perl, I'd recommend you get and read this book http://oreilly.com/catalog/9780596520106/ and then the rest in the series... You might also want to check out this downloadable book http://www.perl.org/books/beginning-perl/ or perhaps this site http://www.perltutorial.org/

i forgot my llama during my last move, but i agree, it's a great book. i wish i had it with me :(


All times are GMT -7.
