GoFuckYourself.com - Adult Webmaster Forum (https://gfy.com/index.php)
-   Fucking Around & Business Discussion (https://gfy.com/forumdisplay.php?f=26)
-   -   need to de-dupe keyword list... solution? (https://gfy.com/showthread.php?t=830337)

Mr Pheer 05-23-2008 06:57 AM

need to de-dupe keyword list... solution?
 
I have a list of keywords, one phrase or keyword per line.

The list has a lot of duplicates... what's the best way to strip them out?

Help please :helpme

gornyhuy 05-23-2008 07:40 AM

One approach:
- Import to Excel
- Sort alphabetically
- Run a formula comparing each entry to the one above and below, and mark it as a dupe (or delete it)
- For example: =IF(OR(A3=A4,A3=A2),"Duplicate","")
- Then sort by the duplicate status and delete

ish.

gornyhuy 05-23-2008 07:46 AM

Here is a less manual Excel approach that I haven't tested, but it looks damn sexy:
http://www.rondebruin.nl/easyfilter.htm

Mr Pheer 05-23-2008 07:46 AM

What about a solution for people who don't have Excel?

I don't have any office applications.

mrkris 05-23-2008 07:48 AM

If you have access to *nix, try:

$ cat list.txt|uniq > newlist.txt

severe 05-23-2008 07:55 AM

In Excel you don't need a formula to remove dupes; there's a feature to show only non-dupes. In older versions it's called something like 'show original content'; in 2007, under the Data tab, it's just called Remove Duplicates.

Mr Pheer 05-23-2008 07:58 AM

Quote:

Originally Posted by mrkris (Post 14227587)
If you have access to *nix, try:

$ cat list.txt|uniq > newlist.txt

I tried that on FreeBSD and it just made a copy of the same file under a new name.

Mr Pheer 05-23-2008 07:59 AM

Someone help me out with the syntax error on line 18, please?

Code:

#!/usr/bin/perl
use strict;

my $FileName = 'file.txt'; # Modify file name as needed.

my (@List, %List, @NewList) = ();

sub Abandon
{
# print each message on its own line, then quit
print join("\n", @_);
exit;
} # sub Abandon

print "Content-type: text/plain\n\n";

Abandon("Unable to read file $FileName") unless open R,"<$FileName";
@List = <R>; # line 18: read the whole file; the missing <R> was the syntax error
close R;

Abandon("Unable to create temporary file ${FileName}.tmp.txt") unless open W,">${FileName}.tmp.txt";
for(@List) { print W $_; }
close W;

for(@List)
{
next if $List{$_}; # skip lines we've already seen
$List{$_}++;
push @NewList,$_; # keep only the first occurrence of each line
}

Abandon('Something wrong.',"Backup file is ${FileName}.tmp.txt") unless open W,">$FileName";
for(@NewList) { print W $_; }
close W;

unlink "${FileName}.tmp.txt";

print 'D O N E';


react 05-23-2008 08:07 AM

You must sort before you can uniq, since uniq only removes adjacent duplicate lines:

cat infile | sort | uniq > outputfile
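
For what it's worth, here's a quick illustration with a made-up three-line list; on unsorted input the repeated line survives uniq:

Code:

$ printf 'cat\ndog\ncat\n' | uniq
cat
dog
cat

$ printf 'cat\ndog\ncat\n' | sort | uniq
cat
dog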

Mr Pheer 05-23-2008 08:11 AM

Quote:

Originally Posted by react (Post 14227697)
You must sort before you can uniq:

cat infile | sort | uniq > outputfile

w00t!!!

thanks man :)

gornyhuy 05-23-2008 08:30 AM

While we are on the subject, does anybody have a good query for deduping MySQL tables across multiple fields?

react 05-23-2008 09:56 AM

That 'multiple fields' bit isn't super clear... but if you want to combine data from several columns of one table into a single deduped column, create a new table with one column that has a unique index on it. Then, for each of the columns in the old table:

insert ignore into newtable (newcolumn) select oldcolumn1 from oldtable;
insert ignore into newtable (newcolumn) select oldcolumn2 from oldtable;

If you just want to keep all unique rows, then create a new table with the same column structure, add a unique index across all columns, then:

insert ignore into newtable select * from oldtable;
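
Spelled out end to end it looks something like this. The table and column names (keywords, kw, source) are made up for the example, and note that a unique index across all columns can hit key-length limits if any column is a long text type:

Code:

-- make an empty copy of the original table (names here are placeholders)
CREATE TABLE keywords_dedup LIKE keywords;

-- a unique index across every column defines what counts as a duplicate row
ALTER TABLE keywords_dedup ADD UNIQUE KEY uniq_all (kw, source);

-- duplicate rows are silently skipped thanks to IGNORE
INSERT IGNORE INTO keywords_dedup SELECT * FROM keywords;

-- swap the deduped table into place, keeping the original as a backup
RENAME TABLE keywords TO keywords_old, keywords_dedup TO keywords;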

rowan 05-23-2008 01:39 PM

Quote:

Originally Posted by react (Post 14227697)
You must sort before you can uniq:

cat infile | sort | uniq > outputfile

No need for uniq in that case... or cat :)

sort -u infile > outfile

