Welcome to the GoFuckYourself.com - Adult Webmaster Forum forums.

You are currently viewing our boards as a guest which gives you limited access to view most discussions and access our other features. By joining our free community you will have access to post topics, communicate privately with other members (PM), respond to polls, upload content and access many other special features. Registration is fast, simple and absolutely free so please, join our community today!

If you have any problems with the registration process or your account login, please contact us.

Post New Thread Reply

Register GFY Rules Calendar
Go Back   GoFuckYourself.com - Adult Webmaster Forum > >
Discuss what's fucking going on, and which programs are best and worst. One-time "program" announcements from "established" webmasters are allowed.

 
Thread Tools
Old 03-07-2003, 01:21 PM   #1
mrthumbs
salad tossing sig guy
 
mrthumbs's Avatar
 
Join Date: Apr 2002
Location: mrthumbs*gmail.com
Posts: 11,702
MYSQL: wildcard search through 40 million records.

A record consists of 1 to 50 characters.

What hardware requirements do you recommend and what's the time-frame we are talking about to perform one search?

(should be about 6k to 15k searches per day)

Any cache modules or non-mysql methods available that are faster since the records are static?
It's for simple wildcard keyword-match searching.

Last edited by mrthumbs; 03-07-2003 at 01:26 PM..
mrthumbs is offline   Share thread on Digg Share thread on Twitter Share thread on Reddit Share thread on Facebook Reply With Quote
Old 03-07-2003, 01:25 PM   #2
WiredGuy
Pounding Googlebot
 
Industry Role:
Join Date: Aug 2002
Location: Canada
Posts: 34,486
I'd like to see the answer to this too. I tried doing a large scale database infrastructure of about 18 million records (each record up to 64 characters) last year and it seemed that mysql was buckling entirely.

WG
__________________
I play with Google.
WiredGuy is online now   Share thread on Digg Share thread on Twitter Share thread on Reddit Share thread on Facebook Reply With Quote
Old 03-07-2003, 01:30 PM   #3
sneaker
So Fucking Banned
 
Join Date: Dec 2002
Posts: 251
There are things you can do to improve performance with mysql, some are run it on a dedicated server for itself with plenty of ram and cpu power with possibly a SCSI drive. But for large scale applications I recommend you use MS SQL it has more "search" and "indexing" power.


sneaker is offline   Share thread on Digg Share thread on Twitter Share thread on Reddit Share thread on Facebook Reply With Quote
Old 03-07-2003, 02:21 PM   #4
vending_machine
Confirmed User
 
Join Date: Jun 2002
Location: Seattle
Posts: 1,062
Can you split up the databases? If you can, have a dedicated mysql server, put each split database on its own 15k rpm SCSI drive.

You should also look into caching results on the frontend as well as indexing your tables.

btw; What datatype is your 1-50 char field? varchar vs. char both have benefits so you should look into which one works the best for your application.
vending_machine is offline   Share thread on Digg Share thread on Twitter Share thread on Reddit Share thread on Facebook Reply With Quote
Old 03-07-2003, 02:26 PM   #5
XXXManager
So Fucking Banned
 
Join Date: Mar 2002
Location: Far out in the uncharted backwaters of the unfashionable end of the Western Spiral arm of the Galaxy
Posts: 893
I wouldn't adopt Sneaker's recommendadtion about database switch. especially if you care about price of hardware (as well as software) - especially since full-text search isn't the strongest side of MS-SQL as well.

Your question requires clarification..
Are you talking about full matches or partial wildcard matches?
Are you talking about text only search or co-searching with other fields?
What is your column type atm? text, char, varchar?

As a first step I would consider using full-text search especially if you can work with 4.01 since its full-text search capabilities and it has got caching for query results.

As to other databases. If you are moving to another tool, I would recommend using a special tool for text indexing and searching.
But in general - there are many times many shoutcuts you can make to avoid moving to expensive tools.

Last edited by XXXManager; 03-07-2003 at 02:35 PM..
XXXManager is offline   Share thread on Digg Share thread on Twitter Share thread on Reddit Share thread on Facebook Reply With Quote
Old 03-07-2003, 02:33 PM   #6
DrGuile
Confirmed User
 
Join Date: Jan 2002
Posts: 2,025
shouldnt be too much of a problem if that all that box does and that record is indexed. (and its 'selects' only)

also depends how many records your query will return everytime...
__________________
LiveBucks / Privatefeeds - Giving you money since 1999
Up to 50% Commission!
25% Webmaster Referal
Powered by Gamma
DrGuile is offline   Share thread on Digg Share thread on Twitter Share thread on Reddit Share thread on Facebook Reply With Quote
Old 03-07-2003, 02:35 PM   #7
buran
Confirmed User
 
Join Date: Mar 2002
Location: how'd I get here?
Posts: 264
I'm with XXXManager on this one. You could use MS-SQL. And you could also have it shutdown your network when it gets rooted again.

The biggest factor is going to be your datatype, the wildcard scope, and if you're going to be able to index it.

You might also take a look at postgres if you think mysql just can't do it for you.
__________________
[this signature intentionally left blank]
buran is offline   Share thread on Digg Share thread on Twitter Share thread on Reddit Share thread on Facebook Reply With Quote
Old 03-07-2003, 02:47 PM   #8
XXXManager
So Fucking Banned
 
Join Date: Mar 2002
Location: Far out in the uncharted backwaters of the unfashionable end of the Western Spiral arm of the Galaxy
Posts: 893
Quote:
Originally posted by DrGuile
shouldnt be too much of a problem if that all that box does and that record is indexed. (and its 'selects' only)
also depends how many records your query will return everytime...
Wrong. 2GB max data in 40M rows is not a very easy task for text search out of the hat.
As to the indexing and returned records amount. Assuming you don't count the data returned - it does not matter too much if the result is 40 rows or 40K rows. Once indexed selects work by the index which is not a sequential search. The data passed back is an issue though but noone passes back (from the DB) more than 100 (lets say) rows at one time.

When I say there are options to override the problem I mean things like caching most popular searches etc.
I assume (statistically speaking) that 80% of the searches are with the same 20% or the search terms. Just a guess - but the dude that asked the question don't seem to care about our replies and questions
XXXManager is offline   Share thread on Digg Share thread on Twitter Share thread on Reddit Share thread on Facebook Reply With Quote
Old 03-07-2003, 03:00 PM   #9
extreme
Confirmed User
 
Industry Role:
Join Date: Oct 2002
Location: lalaland
Posts: 2,120
I don't think you can carry this is of cheaply.
I'm beginning to see problems with fulltext searching through 120k rows. But then thats with a search per second or so.

You will need enough mem to keep the index in memory otherwise you are toast.

You probably must go with a FULLTEXT index, wildcard searching will be too slow.
extreme is offline   Share thread on Digg Share thread on Twitter Share thread on Reddit Share thread on Facebook Reply With Quote
Old 03-07-2003, 03:17 PM   #10
mrthumbs
salad tossing sig guy
 
mrthumbs's Avatar
 
Join Date: Apr 2002
Location: mrthumbs*gmail.com
Posts: 11,702


thanks for the responses so far:

>Are you talking about full matches or partial wildcard matches?

partial matches as wel!
in example searching for 'ever' should return
ever, whatever, everglades etc etc

>Are you talking about text only search or co-searching with
>other fields?

It's just ONE column i need to search for partial wildcard matches!
The database just consists of a column with 40 million
'phrases'. No special chars or 'double entries per record'.

>What is your column type atm? text, char, varchar?

It's VARCHAR[50] or any other datatype that would be appropiate: doesnt matter.
mrthumbs is offline   Share thread on Digg Share thread on Twitter Share thread on Reddit Share thread on Facebook Reply With Quote
Old 03-07-2003, 03:18 PM   #11
MrPopup
Confirmed User
 
Join Date: Aug 2002
Location: on the internet
Posts: 3,783
[QUOTE]Originally posted by WiredGuy
[B]I'd like to see the answer to this too. I tried doing a large scale database infrastructure of about 18 million records (each record up to 64 characters) last year and it seemed that mysql was buckling entirely.


Is this for spam purposes? What kind of db holds that many records and for what?
__________________
<table cellspacing="0" cellpadding="3" border="1" bgcolor="#008000"><tr><td><font size=3>Gone</font></td></tr></table>
MrPopup is offline   Share thread on Digg Share thread on Twitter Share thread on Reddit Share thread on Facebook Reply With Quote
Old 03-07-2003, 03:21 PM   #12
mrthumbs
salad tossing sig guy
 
mrthumbs's Avatar
 
Join Date: Apr 2002
Location: mrthumbs*gmail.com
Posts: 11,702
Second thing is: it has to run on FreeBSD. Any Microsoft platform
solution is not an option.

Unfortunatly every search phrase is random so

" assume (statistically speaking) that 80% of the searches are with the same 20% or the search terms. "

this is not the case with my issue :-(
mrthumbs is offline   Share thread on Digg Share thread on Twitter Share thread on Reddit Share thread on Facebook Reply With Quote
Old 03-07-2003, 03:21 PM   #13
mrthumbs
salad tossing sig guy
 
mrthumbs's Avatar
 
Join Date: Apr 2002
Location: mrthumbs*gmail.com
Posts: 11,702
[QUOTE]Originally posted by MrPopup
[B]
Quote:
Originally posted by WiredGuy


Is this for spam purposes? What kind of db holds that many records and for what?
go back to your linux desktop issues kid.
mrthumbs is offline   Share thread on Digg Share thread on Twitter Share thread on Reddit Share thread on Facebook Reply With Quote
Old 03-07-2003, 03:38 PM   #14
SpaceAce
Confirmed User
 
Join Date: Jul 2002
Location: Magrathea
Posts: 6,493
Quote:
Originally posted by extreme
I don't think you can carry this is of cheaply.
I'm beginning to see problems with fulltext searching through 120k rows. But then thats with a search per second or so.

You will need enough mem to keep the index in memory otherwise you are toast.

You probably must go with a FULLTEXT index, wildcard searching will be too slow.
What's your hardware? I wouldn't expect MySQL to give you trouble with 120K records, even at 1 search per second.

SpaceAce
SpaceAce is offline   Share thread on Digg Share thread on Twitter Share thread on Reddit Share thread on Facebook Reply With Quote
Old 03-07-2003, 03:49 PM   #15
XXXManager
So Fucking Banned
 
Join Date: Mar 2002
Location: Far out in the uncharted backwaters of the unfashionable end of the Western Spiral arm of the Galaxy
Posts: 893
Quote:
Originally posted by mrthumbs
Unfortunatly every search phrase is random so
" assume (statistically speaking) that 80% of the searches are with the same 20% or the search terms. "
this is not the case with my issue :-(
I dont see how - unless its not humans searching
Human beings usually look for the same things
ANYWAY - I suggest use special text indexing mechanism. A non-SQL solution will be best for that one since you seem to be talking about that has nothing to do with DB anyway so I dont see the point. There are many solutions for that
XXXManager is offline   Share thread on Digg Share thread on Twitter Share thread on Reddit Share thread on Facebook Reply With Quote
Old 03-07-2003, 03:53 PM   #16
mrthumbs
salad tossing sig guy
 
mrthumbs's Avatar
 
Join Date: Apr 2002
Location: mrthumbs*gmail.com
Posts: 11,702
Quote:
Originally posted by XXXManager

I dont see how - unless its not humans searching
Human beings usually look for the same things
ANYWAY - I suggest use special text indexing mechanism. A non-SQL solution will be best for that one since you seem to be talking about that has nothing to do with DB anyway so I dont see the point. There are many solutions for that

Hehe.. it are non human searches

And like i stated in my original question
"Any cache modules or non-mysql methods available?"

So a 'special text indexing mechanism' IS exactly
what im looking for but do you have a name for it?

If you have a sec: hook me up on icq.. would appreciate it a lot.
160272407
mrthumbs is offline   Share thread on Digg Share thread on Twitter Share thread on Reddit Share thread on Facebook Reply With Quote
Old 03-11-2003, 07:27 AM   #17
mrthumbs
salad tossing sig guy
 
mrthumbs's Avatar
 
Join Date: Apr 2002
Location: mrthumbs*gmail.com
Posts: 11,702
bumpie
mrthumbs is offline   Share thread on Digg Share thread on Twitter Share thread on Reddit Share thread on Facebook Reply With Quote
Old 03-11-2003, 07:55 AM   #18
XXXManager
So Fucking Banned
 
Join Date: Mar 2002
Location: Far out in the uncharted backwaters of the unfashionable end of the Western Spiral arm of the Galaxy
Posts: 893
I contacted you on ICQ (You seem to not respond)
XXXManager is offline   Share thread on Digg Share thread on Twitter Share thread on Reddit Share thread on Facebook Reply With Quote
Old 03-11-2003, 08:27 AM   #19
kad
Confirmed User
 
Join Date: Feb 2003
Location: Brisbane, Australia
Posts: 735
This might sound odd, but RAM (last time i checked) is really cheap.

Wack in 4Gig of ram, create a 3gig ramdisk and replicate your database onto it.

This is a bad solution if you want to scale your dataset past physical RAM limitations

I have seen a similar working in a large scale commercial environment.




(edited for grammer)
__________________
lol internet.
kad is offline   Share thread on Digg Share thread on Twitter Share thread on Reddit Share thread on Facebook Reply With Quote
Post New Thread Reply
Go Back   GoFuckYourself.com - Adult Webmaster Forum > >

Bookmarks



Advertising inquiries - marketing at gfy dot com

Contact Admin - Advertise - GFY Rules - Top

©2000-, AI Media Network Inc



Powered by vBulletin
Copyright © 2000- Jelsoft Enterprises Limited.