Old 05-23-2008, 06:57 AM   #1
Mr Pheer
Retired
 
Join Date: Dec 2002
Posts: 21,246
need to de-dupe keyword list... solution?

I have a list of keywords, one phrase or keyword per line.

The list has a lot of duplicates... what's the best way to strip them out?

help please
__________________
2 lifeguards for Jessica
Old 05-23-2008, 07:40 AM   #2
gornyhuy
Chafed.
 
 
Join Date: May 2002
Location: Face Down in Pussy
Posts: 18,041
One approach:
- Import to Excel
- Sort alphabetically
- Run a formula comparing each entry to the ones above and below, and mark it as a dupe (or delete it)
- For example: =IF(OR(A3=A4,A3=A2),"Duplicate","")
- Then sort by the duplicate status and delete the marked rows

ish.
__________________

icq:159548293
Old 05-23-2008, 07:46 AM   #3
gornyhuy
Chafed.
 
 
Join Date: May 2002
Location: Face Down in Pussy
Posts: 18,041
Here is a less manual Excel approach that I haven't tested, but it looks damn sexy:
http://www.rondebruin.nl/easyfilter.htm
__________________

icq:159548293
Old 05-23-2008, 07:46 AM   #4
Mr Pheer
Retired
 
Join Date: Dec 2002
Posts: 21,246
What about a solution for people that don't have Excel?

I don't have any office applications.
__________________
2 lifeguards for Jessica
Old 05-23-2008, 07:48 AM   #5
mrkris
Confirmed User
 
Join Date: May 2005
Posts: 2,737
If you have access to *nix, try:

$ cat list.txt | uniq > newlist.txt
__________________

PHP-MySQL-Rails | ICQ: 342500546
Old 05-23-2008, 07:55 AM   #6
severe
Confirmed User
 
Join Date: Dec 2007
Posts: 331
In Excel you don't need a formula to remove dupes; there's a built-in feature for it. In older versions it's something like 'unique records only' under the advanced filter; in 2007 it's just called Remove Duplicates, under the Data tab.
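
If you'd rather script it, that same Remove Duplicates feature is exposed to VBA in 2007+. Untested sketch, assuming the keywords sit in column A of the active sheet:

Code:
Sub DedupeColumnA()
    ' Same Data-tab feature, scripted (Excel 2007+ only).
    ' CurrentRegion grabs the contiguous block of data starting at A1.
    ActiveSheet.Range("A1").CurrentRegion.RemoveDuplicates Columns:=1, Header:=xlNo
End Sub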
Old 05-23-2008, 07:58 AM   #7
Mr Pheer
Retired
 
Join Date: Dec 2002
Posts: 21,246
Quote:
Originally Posted by mrkris
If you have access to *nix, try:

$ cat list.txt | uniq > newlist.txt
I tried that on FreeBSD and it just made a copy of the same file with a new name.
__________________
2 lifeguards for Jessica
Old 05-23-2008, 07:59 AM   #8
Mr Pheer
Retired
 
Join Date: Dec 2002
Posts: 21,246
Can someone help me out with the syntax error on line 18, please?

Code:
#!/usr/bin/perl
use strict;

my $FileName = 'file.txt'; # Modify file name as needed.

my(@List,%List,@NewList) = ();

# Print any messages and quit.
sub Abandon
{
    print join "\n", @_;
    exit;
} # sub Abandon

print "Content-type: text/plain\n\n";

Abandon("Unable to read file $FileName") unless open R, "<$FileName";
@List = <R>; # Line 18: the <R> was missing (likely eaten as an HTML tag).
close R;

# Write an untouched backup before modifying anything.
Abandon("Unable to create temporary file ${FileName}.tmp.txt") unless open W, ">${FileName}.tmp.txt";
for (@List) { print W $_; }
close W;

# Keep only the first occurrence of each line.
for (@List)
{
    next if $List{$_};
    $List{$_}++;
    push @NewList, $_;
}

# Overwrite the original with the de-duped list.
Abandon('Something wrong.', "Backup file is ${FileName}.tmp.txt") unless open W, ">$FileName";
for (@NewList) { print W $_; }
close W;

unlink "${FileName}.tmp.txt";

print 'D O N E';
__________________
2 lifeguards for Jessica
Old 05-23-2008, 08:07 AM   #9
react
Confirmed User
 
Join Date: Sep 2003
Location: NZ
Posts: 673
You must sort before you can uniq:

cat infile | sort | uniq > outputfile
__________________
--
react
Old 05-23-2008, 08:11 AM   #10
Mr Pheer
Retired
 
Join Date: Dec 2002
Posts: 21,246
Quote:
Originally Posted by react
You must sort before you can uniq:

cat infile | sort | uniq > outputfile
w00t!!!

thanks man
__________________
2 lifeguards for Jessica
Old 05-23-2008, 08:30 AM   #11
gornyhuy
Chafed.
 
 
Join Date: May 2002
Location: Face Down in Pussy
Posts: 18,041
While we are on the subject, does anybody have a good query for deduping MySQL tables across multiple fields?
__________________

icq:159548293
Old 05-23-2008, 09:56 AM   #12
react
Confirmed User
 
Join Date: Sep 2003
Location: NZ
Posts: 673
That multiple fields bit isn't super clear... but if you want to combine data from several columns of one table into a single unique column, create a new table with one column that has a unique index on it. Then, for each of the columns in the old table:

insert ignore into newtable (newcolumn) select oldcolumn1 from oldtable;
insert ignore into newtable (newcolumn) select oldcolumn2 from oldtable;

If you just want to keep all unique rows, then create a new table with the same column structure, create a unique index across all columns, and then:

insert ignore into newtable select * from oldtable;
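
For example, something like this should do the whole-row version (untested sketch; keywords / keywords_dedup and the column names are made up, swap in your own):

Code:
-- New table with the same structure as the old one.
create table keywords_dedup like keywords;

-- Unique index across every column, so whole-row dupes collide.
alter table keywords_dedup add unique key dedup_all (kw, category);

-- insert ignore silently drops the colliding (duplicate) rows.
insert ignore into keywords_dedup select * from keywords;

-- Once it looks right, swap the tables.
rename table keywords to keywords_old, keywords_dedup to keywords;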
__________________
--
react
Old 05-23-2008, 01:39 PM   #13
rowan
Too lazy to set a custom title
 
Join Date: Mar 2002
Location: Australia
Posts: 17,393
Quote:
Originally Posted by react
You must sort before you can uniq:

cat infile | sort | uniq > outputfile
No need for uniq in that case... or cat

sort -u infile > outfile
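
And if the original line order matters (sort -u will reorder everything), the usual awk idiom dedupes without sorting. Untested here, but something like:

Code:
awk '!seen[$0]++' infile > outfile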