GoFuckYourself.com - Adult Webmaster Forum (https://gfy.com/index.php)
-   Fucking Around & Business Discussion (https://gfy.com/forumdisplay.php?f=26)
-   -   need to de-dupe keyword list... solution? (https://gfy.com/showthread.php?t=830337)

Mr Pheer 05-23-2008 06:57 AM

need to de-dupe keyword list... solution?
 
I have a list of keywords, one phrase or keyword per line.

The list has a lot of duplicates... what's the best way to strip them out?

Help please :helpme

gornyhuy 05-23-2008 07:40 AM

One approach:
- Import to Excel
- Sort alphabetically
- Run a formula comparing each entry to the one above and below, and mark it as a dupe (or delete it)
- For example: =IF(OR(A3=A4,A3=A2),"Duplicate","")
- Then sort by the duplicate status and delete

ish.

gornyhuy 05-23-2008 07:46 AM

Here is a less manual Excel approach that I haven't tested, but it looks damn sexy:
http://www.rondebruin.nl/easyfilter.htm

Mr Pheer 05-23-2008 07:46 AM

What about a solution for people who don't have Excel?

I don't have any office applications.

mrkris 05-23-2008 07:48 AM

If you have access to *nix, try:

$ cat list.txt|uniq > newlist.txt

severe 05-23-2008 07:55 AM

In Excel you don't need a formula to remove dupes; there's a feature to show only non-dupes. In older versions it's called something like 'show original content'; in 2007, under the Data tab, it's just called Remove Duplicates.

Mr Pheer 05-23-2008 07:58 AM

Quote:

Originally Posted by mrkris (Post 14227587)
If you have access to *nix, try:

$ cat list.txt|uniq > newlist.txt

I tried that on FreeBSD and it just made a copy of the same file under a new name.

Mr Pheer 05-23-2008 07:59 AM

Someone help me out with the syntax error on line 18, please?

Code:

#!/usr/bin/perl
use strict;

my $FileName = 'file.txt'; # Modify file name as needed.

my (@List, %List, @NewList) = ();

sub Abandon
{
# print each message on its own line, then quit
print join("\n", @_);
exit;
} # sub Abandon

print "Content-type: text/plain\n\n";

Abandon("Unable to read file $FileName") unless open R,"<$FileName";
@List = <R>; # line 18: read the whole file; the missing <R> was the syntax error
close R;

Abandon("Unable to create temporary file ${FileName}.tmp.txt") unless open W,">${FileName}.tmp.txt";
for(@List) { print W $_; }
close W;

for(@List)
{
next if $List{$_}; # skip lines we've already seen
$List{$_}++;
push @NewList,$_; # keep only the first occurrence of each line
}

Abandon('Something wrong.',"Backup file is ${FileName}.tmp.txt") unless open W,">$FileName";
for(@NewList) { print W $_; }
close W;

unlink "${FileName}.tmp.txt";

print 'D O N E';


react 05-23-2008 08:07 AM

You must sort before you can uniq, since uniq only removes adjacent duplicate lines:

cat infile | sort | uniq > outputfile
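
For what it's worth, here's a quick illustration with a made-up three-line list; on unsorted input the repeated line survives uniq:

Code:

$ printf 'cat\ndog\ncat\n' | uniq
cat
dog
cat

$ printf 'cat\ndog\ncat\n' | sort | uniq
cat
dog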

Mr Pheer 05-23-2008 08:11 AM

Quote:

Originally Posted by react (Post 14227697)
You must sort before you can uniq:

cat infile | sort | uniq > outputfile

w00t!!!

thanks man :)

gornyhuy 05-23-2008 08:30 AM

While we are on the subject, does anybody have a good query for deduping MySQL tables across multiple fields?

react 05-23-2008 09:56 AM

That 'multiple fields' bit isn't super clear... but if you want to combine data from several columns of one table into a single deduped column, create a new table with one column that has a unique index on it. Then, for each of the columns in the old table:

insert ignore into newtable (newcolumn) select oldcolumn1 from oldtable;
insert ignore into newtable (newcolumn) select oldcolumn2 from oldtable;

If you just want to keep all unique rows, then create a new table with the same column structure, add a unique index across all columns, then:

insert ignore into newtable select * from oldtable;
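
Spelled out end to end it looks something like this. The table and column names (keywords, kw, source) are made up for the example, and note that a unique index across all columns can hit key-length limits if any column is a long text type:

Code:

-- make an empty copy of the original table (names here are placeholders)
CREATE TABLE keywords_dedup LIKE keywords;

-- a unique index across every column defines what counts as a duplicate row
ALTER TABLE keywords_dedup ADD UNIQUE KEY uniq_all (kw, source);

-- duplicate rows are silently skipped thanks to IGNORE
INSERT IGNORE INTO keywords_dedup SELECT * FROM keywords;

-- swap the deduped table into place, keeping the original as a backup
RENAME TABLE keywords TO keywords_old, keywords_dedup TO keywords;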

rowan 05-23-2008 01:39 PM

Quote:

Originally Posted by react (Post 14227697)
You must sort before you can uniq:

cat infile | sort | uniq > outputfile

No need for uniq in that case... or cat :)

sort -u infile > outfile

