MySQL: wildcard search through 40 million records.
A record consists of 1 to 50 characters.
What hardware do you recommend, and what time frame are we talking about to perform one search? (It should handle about 6k to 15k searches per day.) Are there any cache modules or non-MySQL methods available that are faster, since the records are static? It's for simple wildcard keyword-match searching. |
I'd like to see the answer to this too. I tried building a large-scale database infrastructure of about 18 million records (each record up to 64 characters) last year, and MySQL seemed to buckle entirely.
WG |
There are things you can do to improve performance with MySQL: run it on its own dedicated server with plenty of RAM and CPU power, and possibly a SCSI drive. But for large-scale applications I recommend you use MS SQL; it has more "search" and "indexing" power.
:2 cents: |
Can you split up the databases? If you can, have a dedicated MySQL server and put each split database on its own 15k RPM SCSI drive.
You should also look into caching results on the frontend, as well as indexing your tables. By the way, what datatype is your 1-50 char field? VARCHAR and CHAR both have benefits, so you should look into which one works best for your application. A starting point might look like the sketch below (the phrases table and phrase column names are made up, since you haven't told us yours):
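CREATE TABLE phrases (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    phrase VARCHAR(50) NOT NULL,
    INDEX idx_phrase (phrase)  -- B-tree index; useful for prefix searches like LIKE 'ever%'
);
|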
I wouldn't adopt Sneaker's recommendation about a database switch, especially if you care about the price of hardware (as well as software), and especially since full-text search isn't the strongest side of MS-SQL either.
Your question requires clarification. Are you talking about full matches or partial wildcard matches? Are you talking about text-only search or co-searching with other fields? What is your column type at the moment: TEXT, CHAR, VARCHAR? As a first step I would consider using full-text search, especially if you can work with 4.0.1, given its full-text search capabilities and its caching of query results. As to other databases: if you are moving to another tool, I would recommend using a special tool for text indexing and searching. But in general there are many, many shortcuts you can make to avoid moving to expensive tools. For example, something along these lines (a rough sketch, assuming a MyISAM table and the hypothetical phrases table from above):
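-- FULLTEXT indexes require the MyISAM table type in MySQL 3.23/4.0
ALTER TABLE phrases ADD FULLTEXT INDEX ft_phrase (phrase);

-- word-based matching; in BOOLEAN MODE (4.0.1+) a trailing * matches prefixes,
-- so this finds 'everglades' but NOT 'whatever' (no leading wildcard support)
SELECT phrase FROM phrases
WHERE MATCH(phrase) AGAINST ('ever*' IN BOOLEAN MODE);
|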
Shouldn't be too much of a problem if that's all that box does and the column is indexed (and it's SELECTs only).
It also depends on how many records your query will return every time... |
I'm with XXXManager on this one. You could use MS-SQL. And you could also have it shut down your network when it gets rooted again.
The biggest factors are going to be your datatype, the wildcard scope, and whether you're going to be able to index it. You might also take a look at Postgres if you think MySQL just can't do it for you. |
As to the indexing and the number of returned records: assuming you don't count the data returned, it does not matter too much whether the result is 40 rows or 40K rows. Once indexed, SELECTs work through the index, which is not a sequential search. The data passed back is an issue, though, but no one passes back (from the DB) more than, let's say, 100 rows at a time. When I say there are options to work around the problem, I mean things like caching the most popular searches, etc. I assume (statistically speaking) that 80% of the searches use the same 20% of the search terms. Just a guess, but the dude that asked the question doesn't seem to care about our replies and questions :( A popular-searches cache could be as simple as the sketch below (hypothetical names throughout):
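-- hypothetical cache of hot search terms and their serialized results
CREATE TABLE search_cache (
    term VARCHAR(50) NOT NULL PRIMARY KEY,
    results TEXT NOT NULL,            -- serialized list of matching phrases
    hits INT UNSIGNED NOT NULL DEFAULT 0
);

-- check the cache first; run the real (expensive) search only on a miss
SELECT results FROM search_cache WHERE term = 'ever';
|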
I don't think you can carry this off cheaply.
I'm beginning to see problems with full-text searching through 120k rows, and that's with only about one search per second. You will need enough memory to keep the index in memory, otherwise you are toast. You probably must go with a FULLTEXT index; wildcard searching will be too slow. To keep the index in RAM, give MyISAM a bigger key buffer, something like the snippet below (the 512M figure is just illustrative):
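# my.cnf -- enlarge the MyISAM index cache so the FULLTEXT index stays in RAM
[mysqld]
key_buffer_size = 512M
|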
:thumbsup
Thanks for the responses so far.
> Are you talking about full matches or partial wildcard matches?
Partial matches as well! For example, searching for 'ever' should return ever, whatever, everglades, etc.
> Are you talking about text only search or co-searching with other fields?
It's just ONE column I need to search for partial wildcard matches! The database just consists of a column with 40 million 'phrases'. No special characters or 'double entries per record'.
> What is your column type atm? text, char, varchar?
It's VARCHAR(50), or any other datatype that would be appropriate: doesn't matter. In other words, the query I need is effectively the one below, which (as I understand it) cannot use a normal index because of the leading wildcard:
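-- infix match: the leading % forces a full scan of all 40M rows
SELECT phrase FROM phrases WHERE phrase LIKE '%ever%';
|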
Quote: Originally posted by WiredGuy
I'd like to see the answer to this too. I tried doing a large scale database infrastructure of about 18 million records (each record up to 64 characters) last year and it seemed that mysql was buckling entirely.

Is this for spam purposes? What kind of db holds that many records, and for what? |
Second thing is: it has to run on FreeBSD. Any Microsoft platform solution is not an option.
Unfortunately, every search phrase is random, so "I assume (statistically speaking) that 80% of the searches use the same 20% of the search terms" is not the case with my issue :-( |
Human beings usually look for the same things :Graucho
ANYWAY - I suggest using a special text indexing mechanism. A non-SQL solution will be best for that, since what you seem to be talking about has nothing to do with a DB anyway, so I don't see the point. There are many solutions for that. One classic mechanism (a hypothetical sketch, not a specific product) is an inverted index of character n-grams, which turns an infix search into index lookups; you can even prototype it in SQL, as below. :2 cents:
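-- hypothetical trigram inverted index: one row per 3-char substring per phrase
CREATE TABLE trigrams (
    tri CHAR(3) NOT NULL,
    phrase_id INT UNSIGNED NOT NULL,
    INDEX idx_tri (tri)
);

-- candidates containing 'ever' = phrases that have both trigrams 'eve' and 'ver'
SELECT t1.phrase_id
FROM trigrams t1, trigrams t2
WHERE t1.tri = 'eve' AND t2.tri = 'ver'
  AND t1.phrase_id = t2.phrase_id;
-- then confirm the few candidates with LIKE '%ever%' instead of scanning 40M rows
|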
Hehe... they are non-human searches :Graucho And like I stated in my original question: "Any cache modules or non-mysql methods available?" So a 'special text indexing mechanism' IS exactly what I'm looking for, but do you have a name for one? If you have a sec, hook me up on ICQ; I would appreciate it a lot. 160272407 |
bumpie
|
I contacted you on ICQ (you don't seem to respond).
|
This might sound odd, but RAM (last time I checked) is really cheap.
Whack in 4 GB of RAM, create a 3 GB ramdisk, and replicate your database onto it. This is a bad solution if you want to scale your dataset past physical RAM limitations :) but I have seen something similar working in a large-scale commercial environment. An in-MySQL variation on the same idea is a HEAP table, sketched below (assuming the data fits under max_heap_table_size).
(edited for grammar) :(
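-- copy the static phrases into an in-memory table (TYPE=HEAP in MySQL 4.0 terms);
-- a LIKE '%ever%' still scans every row, but the scan happens entirely in RAM
CREATE TABLE phrases_mem TYPE=HEAP
SELECT id, phrase FROM phrases;
|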