Welcome to the GoFuckYourself.com - Adult Webmaster Forum forums.

Old 02-11-2012, 12:38 PM   #1
Barry-xlovecam
It's 42
 
Join Date: Jun 2010
Location: Global
Posts: 18,083
Damn Duplicates -- p00f begone!

I wrote a fast file cleaner; the grep use is interesting ...
Clean up your lists, etc. ...

Code:
#!/usr/bin/perl
####################################
# nodupes.cgi
# you may use free as-is with no warranty
# make the outfile's chmod 666 if using the webserver
# chmod this script to 755
####################################
use strict;
use warnings;
use CGI::Carp qw/fatalsToBrowser/;
use CGI qw/:standard/;

print "Content-type: text/html\n\n";

# refuse any query string with characters outside [a-zA-Z0-9_]
my $query = $ENV{'QUERY_STRING'} // '';
if ($query =~ /[^a-zA-Z0-9_]/) { print qq~HUH???~; exit; }

my $infile  = "somesiteurl.txt";
my $outfile = "somesiteurlduped.txt";

open(my $in, "<", $infile) || die "cannot read $infile: $!\n";
my @array = <$in>;
close $in;
chomp @array;    # strip newlines first so a last line without "\n" still matches

# keep only the first occurrence of each line; original order is preserved
my %seen   = ();
my @unique = grep { !$seen{$_}++ } @array;

# ">>" appends, so empty or delete the outfile before a rerun
open(my $out, ">>", $outfile) || die "cannot write $outfile: $!\n";
print $out "$_\n" for @unique;
close $out;
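
A quick way to check the result, sketched in plain shell: ask uniq for any line that still repeats and count them. The helper name count_dupes and the sample data are made up for illustration; the file names in the comments are the ones the script above uses.

```shell
#!/bin/sh
# count_dupes (hypothetical helper): number of distinct lines that occur
# more than once on stdin. 0 means the list is clean.
count_dupes() {
    sort | uniq -d | wc -l | tr -d ' '
}

# made-up sample list with two repeated entries
printf 'b.com\na.com\nb.com\nc.com\na.com\n' | count_dupes    # prints 2

# against the real files it would be something like:
#   count_dupes < somesiteurl.txt         (before)
#   count_dupes < somesiteurlduped.txt    (after, should print 0)
```

The sort is only there because uniq compares neighbouring lines; it does not touch the files themselves.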

Last edited by Barry-xlovecam; 02-11-2012 at 12:39 PM..
Old 02-11-2012, 02:33 PM   #2
V_RocKs
Damn Right I Kiss Ass!
 
Join Date: Dec 2003
Location: Cowtown, USA
Posts: 32,409
Nice.... reminds me of 2000...
Old 02-11-2012, 02:52 PM   #3
mikke
Confirmed User
 
Join Date: Jan 2010
Location: Europe
Posts: 1,327
can you port it to brainfuck?
Old 02-11-2012, 04:28 PM   #4
Barry-xlovecam
It's 42
 
Join Date: Jun 2010
Location: Global
Posts: 18,083
Quote:
Originally Posted by mikke
can you port it to brainfuck?

Something understandable ...
Old 02-11-2012, 06:52 PM   #5
fris
Too lazy to set a custom title
 
Join Date: Aug 2002
Posts: 55,372
Quote:
Originally Posted by Barry-xlovecam
Something understandable ...

or you could just use cat file.txt | sort -u

;)

much quicker
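
Worth knowing before swapping the script for this: sort -u also reorders the list, and sort can read the file itself, so the cat is optional. A small sketch with made-up data (demo.txt is a throwaway temp file, not anything from the thread):

```shell
#!/bin/sh
# throwaway sample file in a temp dir (names are made up for the demo)
tmp=$(mktemp -d)
printf 'b.com\na.com\nb.com\nc.com\na.com\n' > "$tmp/demo.txt"

# sort reads the file directly; -u keeps one copy of each line, sorted
sorted_unique=$(sort -u "$tmp/demo.txt")
printf '%s\n' "$sorted_unique"
# prints a.com, b.com, c.com (one per line): deduped, but no longer in file order

rm -rf "$tmp"
```

If the original order of the list matters, that is the case where the perl script still earns its keep.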
Old 02-12-2012, 07:33 AM   #6
Barry-xlovecam
It's 42
 
Join Date: Jun 2010
Location: Global
Posts: 18,083
Code:
cat infile.txt|sort -u > outfile.txt
Without the spaces around the pipe, and with a redirect to the outfile.

That is a lot easier, fris, ty
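
One trap with the redirect: pointing > at the same file you are reading empties it before sort gets to read it. If the cleaned list should replace the original, sort's -o option is the safer route, since POSIX allows the output file to be one of the inputs. A sketch with a made-up temp file:

```shell
#!/bin/sh
tmp=$(mktemp -d)
list="$tmp/list.txt"     # hypothetical file name for the demo
printf 'b.com\na.com\nb.com\n' > "$list"

# DON'T: sort -u "$list" > "$list"   (the shell truncates the file first)
# -o writes the result only after all input has been read
sort -u -o "$list" "$list"

result=$(cat "$list")
printf '%s\n' "$result"    # a.com then b.com
rm -rf "$tmp"
```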
Old 02-12-2012, 07:47 AM   #7
alextokyo
So Fucking Banned
 
Join Date: Sep 2011
Posts: 975
Someone said something about a poof?


Old 02-12-2012, 08:08 AM   #8
Barry-xlovecam
It's 42
 
Join Date: Jun 2010
Location: Global
Posts: 18,083
p00f not poof -- comprehension problems?
Old 02-12-2012, 08:29 AM   #9
fris
Too lazy to set a custom title
 
Join Date: Aug 2002
Posts: 55,372
Quote:
Originally Posted by Barry-xlovecam
Code:
cat infile.txt|sort -u > outfile.txt
Without the spaces around the pipe, and with a redirect to the outfile.

That is a lot easier, fris, ty

An awk way to do it, which removes dupes without sorting:

Code:
awk '!x[$0]++' file.txt

The same in perl, without sorting:

Code:
perl -ne 'print if !$a{$_}++' file.txt
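
For anyone wondering why these two work: the array entry for a line is 0 (false) the first time it is seen, so the negation is true and the line prints; the ++ then bumps the count so later copies are skipped. A spelled-out sketch of the same thing, wrapped in a function (first_seen is a made-up name) and fed made-up data:

```shell
#!/bin/sh
# first_seen (hypothetical name): long form of awk '!x[$0]++'
first_seen() {
    awk '{
        if (seen[$0] == 0)   # unset entries compare equal to 0: first sighting
            print
        seen[$0]++           # count it so repeats are skipped
    }'
}

printf 'b.com\na.com\nb.com\nc.com\na.com\n' | first_seen
# prints b.com, a.com, c.com (one per line): first copies, original order kept
```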

To remove dupe entries keyed on the first column only (on a single-column file this is the same as whole-line dedupe):

awk

Code:
awk '{ if ($1 in stored_lines) x=1; else print; stored_lines[$1]=1 }' infile.txt > outfile.txt

perl

Code:
perl -ane 'print unless $x{$F[0]}++' infile.txt > outfile.txt
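
To make the difference from whole-line dedupe concrete, here is a sketch with made-up two-column data (URL plus a note). Lines whose first field was already seen are dropped even when the rest of the line differs. first_by_col1 is a hypothetical name, and the awk inside is just a shorter spelling of the awk one-liner above.

```shell
#!/bin/sh
# first_by_col1 (hypothetical name): keep the first line for each value of field 1
first_by_col1() {
    awk '!seen[$1]++'
}

printf 'a.com first\nb.com x\na.com second\n' | first_by_col1
# prints:
#   a.com first
#   b.com x
```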

Sunday GFY bonus: count and show duplicate file names

Code:
find . -type f  |sed "s#.*/##g" |sort |uniq -c -d
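
What that pipeline does, step by step, on a throwaway directory tree (all paths below are made up): find lists every file, sed strips everything up to the last slash so only the file name is left, and uniq -c -d prints the names that occur more than once with a count in front.

```shell
#!/bin/sh
tmp=$(mktemp -d)
mkdir -p "$tmp/one" "$tmp/two"
: > "$tmp/one/index.html"
: > "$tmp/two/index.html"
: > "$tmp/two/about.html"

# the awk at the end only tidies the output to "count name"
dupe_names=$(cd "$tmp" && find . -type f | sed 's#.*/##' | sort | uniq -c -d | awk '{print $1, $2}')
printf '%s\n' "$dupe_names"    # 2 index.html
rm -rf "$tmp"
```

The tidy-up awk assumes file names without spaces, which is fine for a demo but not for every real tree.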

Extra bonus: find duplicate files based on filesize, then md5 hash

Code:
find . -not -empty -type f -printf "%s\n" | sort -rn | uniq -d | xargs -I{} find . -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate
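
The one-liner leans on GNU-only bits (find -printf, uniq -w and --all-repeated). Where those are missing, a rough equivalent can be sketched with POSIX cksum, which prints a CRC and the byte count for each file, so size and checksum get compared in one go. This swaps md5 for a much weaker CRC, so treat matches as candidates and confirm with cmp. dupe_files and the demo tree are made up.

```shell
#!/bin/sh
# dupe_files DIR (hypothetical helper): list files under DIR that share
# both CRC and size with at least one other file.
dupe_files() {
    find "$1" -type f -size +0c -exec cksum {} + |
    awk '{
        key = $1 " " $2                   # CRC plus size
        name = $0
        sub(/^[0-9]+ [0-9]+ /, "", name)  # what is left is the path
        count[key]++
        paths[key] = paths[key] name "\n"
    }
    END {
        for (k in count)
            if (count[k] > 1)
                printf "%s", paths[k]
    }' | sort
}

tmp=$(mktemp -d)
printf 'same\n'  > "$tmp/a.txt"
printf 'same\n'  > "$tmp/b.txt"
printf 'other\n' > "$tmp/c.txt"
dupes=$(dupe_files "$tmp")
printf '%s\n' "$dupes"    # the a.txt and b.txt paths; c.txt is left out
rm -rf "$tmp"
```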