MySQL: wildcard search through 40 million records.
A record consists of 1 to 50 characters.
What hardware do you recommend, and what time frame are we talking about to perform one search? (It should handle about 6k to 15k searches per day.) Are there any cache modules or non-MySQL methods available that are faster, since the records are static? It's for simple wildcard keyword-match searching. |
I'd like to see the answer to this too. I tried building a large-scale database infrastructure of about 18 million records (each record up to 64 characters) last year, and MySQL seemed to buckle entirely.
WG |
There are things you can do to improve performance with MySQL: run it on its own dedicated server with plenty of RAM and CPU power, and possibly a SCSI drive. But for large-scale applications I recommend you use MS SQL; it has more "search" and "indexing" power.
:2 cents: |
Can you split up the databases? If you can, have a dedicated MySQL server and put each split database on its own 15k RPM SCSI drive.
You should also look into caching results on the frontend, as well as indexing your tables. By the way, what datatype is your 1-50 char field? VARCHAR and CHAR both have benefits, so you should look into which one works best for your application. A starting point might look like the sketch below (the phrases table and phrase column names are made up, since you haven't told us yours):
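CREATE TABLE phrases (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    phrase VARCHAR(50) NOT NULL,
    INDEX idx_phrase (phrase)  -- B-tree index; useful for prefix searches like LIKE 'ever%'
);
|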
I wouldn't adopt Sneaker's recommendation about a database switch, especially if you care about the price of hardware (as well as software), and especially since full-text search isn't the strongest side of MS-SQL either.
Your question requires clarification. Are you talking about full matches or partial wildcard matches? Are you talking about text-only search or co-searching with other fields? What is your column type at the moment: TEXT, CHAR, VARCHAR? As a first step I would consider using full-text search, especially if you can work with 4.0.1, given its full-text search capabilities and its caching of query results. As to other databases: if you are moving to another tool, I would recommend using a special tool for text indexing and searching. But in general there are many, many shortcuts you can make to avoid moving to expensive tools. For example, something along these lines (a rough sketch, assuming a MyISAM table and the hypothetical phrases table from above):
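-- FULLTEXT indexes require the MyISAM table type in MySQL 3.23/4.0
ALTER TABLE phrases ADD FULLTEXT INDEX ft_phrase (phrase);

-- word-based matching; in BOOLEAN MODE (4.0.1+) a trailing * matches prefixes,
-- so this finds 'everglades' but NOT 'whatever' (no leading wildcard support)
SELECT phrase FROM phrases
WHERE MATCH(phrase) AGAINST ('ever*' IN BOOLEAN MODE);
|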
Shouldn't be too much of a problem if that's all that box does and the column is indexed (and it's SELECTs only).
It also depends on how many records your query will return every time... |
I'm with XXXManager on this one. You could use MS-SQL. And you could also have it shut down your network when it gets rooted again.
The biggest factors are going to be your datatype, the wildcard scope, and whether you're going to be able to index it. You might also take a look at Postgres if you think MySQL just can't do it for you. |
As to the indexing and the number of returned records: assuming you don't count the data returned, it does not matter too much whether the result is 40 rows or 40K rows. Once indexed, SELECTs work through the index, which is not a sequential search. The data passed back is an issue, though, but no one passes back (from the DB) more than, let's say, 100 rows at a time. When I say there are options to work around the problem, I mean things like caching the most popular searches, etc. I assume (statistically speaking) that 80% of the searches use the same 20% of the search terms. Just a guess, but the dude that asked the question doesn't seem to care about our replies and questions :( A popular-searches cache could be as simple as the sketch below (hypothetical names throughout):
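-- hypothetical cache of hot search terms and their serialized results
CREATE TABLE search_cache (
    term VARCHAR(50) NOT NULL PRIMARY KEY,
    results TEXT NOT NULL,            -- serialized list of matching phrases
    hits INT UNSIGNED NOT NULL DEFAULT 0
);

-- check the cache first; run the real (expensive) search only on a miss
SELECT results FROM search_cache WHERE term = 'ever';
|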
I don't think you can carry this off cheaply.
I'm beginning to see problems with full-text searching through 120k rows, and that's with only about one search per second. You will need enough memory to keep the index in memory, otherwise you are toast. You probably must go with a FULLTEXT index; wildcard searching will be too slow. To keep the index in RAM, give MyISAM a bigger key buffer, something like the snippet below (the 512M figure is just illustrative):
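# my.cnf -- enlarge the MyISAM index cache so the FULLTEXT index stays in RAM
[mysqld]
key_buffer_size = 512M
|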
:thumbsup
Thanks for the responses so far.
> Are you talking about full matches or partial wildcard matches?
Partial matches as well! For example, searching for 'ever' should return ever, whatever, everglades, etc.
> Are you talking about text only search or co-searching with other fields?
It's just ONE column I need to search for partial wildcard matches! The database just consists of a column with 40 million 'phrases'. No special characters or 'double entries per record'.
> What is your column type atm? text, char, varchar?
It's VARCHAR(50), or any other datatype that would be appropriate: doesn't matter. In other words, the query I need is effectively the one below, which (as I understand it) cannot use a normal index because of the leading wildcard:
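-- infix match: the leading % forces a full scan of all 40M rows
SELECT phrase FROM phrases WHERE phrase LIKE '%ever%';
|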
Quote: Originally posted by WiredGuy
I'd like to see the answer to this too. I tried doing a large scale database infrastructure of about 18 million records (each record up to 64 characters) last year and it seemed that mysql was buckling entirely.

Is this for spam purposes? What kind of db holds that many records, and for what? |
Second thing is: it has to run on FreeBSD. Any Microsoft platform solution is not an option.
Unfortunately, every search phrase is random, so "I assume (statistically speaking) that 80% of the searches use the same 20% of the search terms" is not the case with my issue :-( |
Human beings usually look for the same things :Graucho
ANYWAY - I suggest using a special text indexing mechanism. A non-SQL solution will be best for that, since what you seem to be talking about has nothing to do with a DB anyway, so I don't see the point. There are many solutions for that. One classic mechanism (a hypothetical sketch, not a specific product) is an inverted index of character n-grams, which turns an infix search into index lookups; you can even prototype it in SQL, as below. :2 cents:
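-- hypothetical trigram inverted index: one row per 3-char substring per phrase
CREATE TABLE trigrams (
    tri CHAR(3) NOT NULL,
    phrase_id INT UNSIGNED NOT NULL,
    INDEX idx_tri (tri)
);

-- candidates containing 'ever' = phrases that have both trigrams 'eve' and 'ver'
SELECT t1.phrase_id
FROM trigrams t1, trigrams t2
WHERE t1.tri = 'eve' AND t2.tri = 'ver'
  AND t1.phrase_id = t2.phrase_id;
-- then confirm the few candidates with LIKE '%ever%' instead of scanning 40M rows
|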
Hehe... they are non-human searches :Graucho And like I stated in my original question: "Any cache modules or non-mysql methods available?" So a 'special text indexing mechanism' IS exactly what I'm looking for, but do you have a name for one? If you have a sec, hook me up on ICQ; I would appreciate it a lot. 160272407 |
bumpie
|
I contacted you on ICQ (you don't seem to respond).
|
This might sound odd, but RAM (last time I checked) is really cheap.
Whack in 4 GB of RAM, create a 3 GB ramdisk, and replicate your database onto it. This is a bad solution if you want to scale your dataset past physical RAM limitations :) but I have seen something similar working in a large-scale commercial environment. An in-MySQL variation on the same idea is a HEAP table, sketched below (assuming the data fits under max_heap_table_size).
(edited for grammar) :(
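-- copy the static phrases into an in-memory table (TYPE=HEAP in MySQL 4.0 terms);
-- a LIKE '%ever%' still scans every row, but the scan happens entirely in RAM
CREATE TABLE phrases_mem TYPE=HEAP
SELECT id, phrase FROM phrases;
|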