GoFuckYourself.com - Adult Webmaster Forum

GoFuckYourself.com - Adult Webmaster Forum (https://gfy.com/index.php)
-   Fucking Around & Business Discussion (https://gfy.com/forumdisplay.php?f=26)
-   -   Damn Duplicates -- p00f begone! (https://gfy.com/showthread.php?t=1057221)

Barry-xlovecam 02-11-2012 12:38 PM

Damn Duplicates -- p00f begone!
 
I wrote a fast file cleaner -- the grep usage is interesting ...
Clean your lists up, etc. ...

Code:

#!/usr/bin/perl
####################################
# nodupes.cgi
# you may use free as-is with no warranty
# make the outfile's chmod 666 if using the webserver
# chmod this script to 755
####################################
use strict;
use warnings;
use CGI::Carp qw/fatalsToBrowser/;
use CGI qw/:standard/;

print "Content-type: text/html\n\n";

# reject any query string containing characters outside [a-zA-Z0-9_]
# (match with //, not s///, so the check doesn't mangle the string)
my $query = $ENV{'QUERY_STRING'} || '';
if ($query =~ /[^a-zA-Z0-9_]/) { print qq~HUH???~; exit; }

my $infile  = "somesiteurl.txt";
my $outfile = "somesiteurlduped.txt";

open(INPUT, "<", $infile)    || die "infile not found: $!\n";
my @array = <INPUT>;
open(OUTPUT, ">>", $outfile) || die "outfile not writable: $!\n";

# keep only the first occurrence of each line
my %seen = ();
my @unique = grep { !$seen{$_}++ } @array;

foreach my $unique (@unique) {
    chomp $unique;
    print OUTPUT "$unique\n";
}

close OUTPUT;
close INPUT;


V_RocKs 02-11-2012 02:33 PM

Nice.... reminds me of 2000...

mikke 02-11-2012 02:52 PM

can you port it to brainfuck?

Barry-xlovecam 02-11-2012 04:28 PM

Quote:

Originally Posted by mikke (Post 18753155)
can you port it to brainfuck?

Something understandable ...

fris 02-11-2012 06:52 PM

Quote:

Originally Posted by Barry-xlovecam (Post 18753323)
Something understandable ...

or you could just use cat file.txt | sort -u

;)

much quicker

Barry-xlovecam 02-12-2012 07:33 AM

Code:

cat infile.txt|sort -u > outfile.txt
No spaces needed around the pipe, and redirect into the outfile

That is a lot easier fris, ty
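Quick throwaway demo -- one thing to note is that sort -u also reorders the lines, which may or may not matter for a URL list:

```shell
printf 'c\na\nc\nb\n' | sort -u
# output comes back sorted: a, b, c
```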

alextokyo 02-12-2012 07:47 AM

Someone said something about a poof?


http://i40.tinypic.com/aua3kn.jpg

Barry-xlovecam 02-12-2012 08:08 AM

p00f not poof -- comprehension problems?

fris 02-12-2012 08:29 AM

Quote:

Originally Posted by Barry-xlovecam (Post 18754251)
Code:

cat infile.txt|sort -u > outfile.txt
No spaces and the outfile

That is a lot easier fris, ty

the awk way to do it -- remove dupes without sorting

Code:

awk '!x[$0]++' file.txt
perl without sorting

Code:

perl -ne 'print if !$a{$_}++'
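quick throwaway test showing these keep the original order (first occurrence wins), unlike sort -u:

```shell
printf 'c\na\nc\nb\n' | awk '!x[$0]++'
# prints c, a, b -- first occurrences, original order
```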
this would remove dupe entries keyed on a single column (the first field)

awk

Code:

awk '{ if ($1 in stored_lines) x=1; else print; stored_lines[$1]=1 }' infile.txt > outfile.txt
perl

Code:

perl -ane 'print unless $x{$F[0]}++' infile > outfile
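e.g. with a made-up two-column file, only the first line per key survives:

```shell
printf 'id1 foo\nid2 bar\nid1 baz\n' \
  | awk '{ if ($1 in stored_lines) x=1; else print; stored_lines[$1]=1 }'
# prints "id1 foo" and "id2 bar"; "id1 baz" is dropped (same first field as id1 foo)
```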
sunday gfy bonus

count and show duplicate file names

Code:

find . -type f  |sed "s#.*/##g" |sort |uniq -c -d
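throwaway example (directory and file names made up):

```shell
# two files named readme.txt in different dirs, one unique name
mkdir -p demo/a demo/b
touch demo/a/readme.txt demo/b/readme.txt demo/a/only.txt
find demo -type f | sed "s#.*/##g" | sort | uniq -c -d
# shows: 2 readme.txt
```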
extra bonus

find duplicate files based on filesize, then md5 hash

Code:

find -not -empty -type f -printf "%s\n" | sort -rn | uniq -d | xargs -I{} -n1 find -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate
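throwaway demo of that pipeline (needs GNU find + md5sum; names made up) -- files with identical content get grouped, unique content is skipped:

```shell
# two identical files plus one different one
mkdir -p dupedemo
echo same > dupedemo/f1.txt
echo same > dupedemo/f2.txt
echo different > dupedemo/f3.txt
find dupedemo -not -empty -type f -printf "%s\n" | sort -rn | uniq -d \
  | xargs -I{} -n1 find dupedemo -type f -size {}c -print0 | xargs -0 md5sum \
  | sort | uniq -w32 --all-repeated=separate
# lists f1.txt and f2.txt together; f3.txt never shows up
```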
:pimp: :pimp:


All times are GMT -7. The time now is 02:38 AM.

Powered by vBulletin® Version 3.8.8
Copyright ©2000 - 2025, vBulletin Solutions, Inc.
©2000- AI Media Network Inc