3) QUERIES

Once you have a solid table structure and proper indexing, your battle is nearly won. Nearly, but not quite. Badly-formed queries are the second most common cause of slow performance, after missing indexes. There are often numerous ways of achieving the same result set; that does not mean they are all the right way to query your database. In fact, a single malformed query can bring even the most powerful of databases to its knees; such is the power they wield.

Using Explain

In MySQL, the EXPLAIN command is one of the most important tools at your disposal. Unfortunately, it won't do much good until you have a realistic data environment; to that end, if your site is in pre-production, write a quick script to populate your tables with as good a sampling of actual, expected data as you can. Once you have a realistic demo environment, EXPLAIN every single one of your SELECT queries. ALL of them. Not only will it help you find potential typos or syntax errors, it will also show you how the MySQL optimizer sees your query and which indexes it will use to execute it.
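
For example, prefixing a typical lookup with EXPLAIN (the table and column names here are purely hypothetical) shows which index the optimizer picks, roughly how many rows it expects to examine, and any notes in the 'extra' column:

EXPLAIN SELECT user_id, username
FROM users
WHERE country = 'US' AND status = 1;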

Generally, you want to make sure that MySQL is using the indexes you want it to use and that the results are simple and short. You also want to avoid "Filesort" and "Temporary Tables" (shown under the 'extra' column of the EXPLAIN output) whenever possible, though there are some cases where they are unavoidable. Sometimes EXPLAIN will show the wrong index being used; when that happens, consider whether adding a new multi-column index fixes things. If not, you may need FORCE INDEX in your query to "guide" the optimizer (it isn't perfect). Lastly, if you just cannot get the result you want, it may be time to examine alternatives such as a different table structure or breaking your super-query into multiple, simpler queries. Trial and error is the absolute key to all of this.
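
As a rough sketch (hypothetical names again), a query that filters on two columns can often be served by a single multi-column index, with FORCE INDEX as the last resort if the optimizer still refuses to use it:

ALTER TABLE users ADD INDEX idx_country_status (country, status);

SELECT user_id
FROM users FORCE INDEX (idx_country_status)
WHERE country = 'US' AND status = 1;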

Joins and Sub-queries

When you need to query multiple tables at once, you usually end up performing a JOIN. There are different types that suit different situations, but for the most part you will use the default JOIN (or INNER JOIN), where you join two tables on a matching key and rows that do not match are excluded. Try to make sure the column you join on is indexed and of the same data type in both tables (integer if possible); if it's a primary key, even better. JOINs can get very complicated very quickly; a good rule of thumb is, if you can no longer understand the query you have written just by looking at it, you have made things too complicated and performance may suffer. As stated above, your best friend for diagnosing wayward JOINs is EXPLAIN.
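
A simple illustration (users and orders are hypothetical tables, joined on an indexed integer key):

SELECT u.username, o.order_id, o.total
FROM users u
INNER JOIN orders o ON o.user_id = u.user_id
WHERE u.country = 'US';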

Sub-queries, on the other hand, should be used sparingly and only in specific situations. Otherwise, when selecting large result sets, you may end up performing thousands of queries without even realizing it. I typically only use them to compare against a single column in a large table, or to select a certain field or fields out of one of my non-indexed information tables (see the profile example above).
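
For instance, pulling one field out of a profile-style table for a single known row is about the extent of it (table and column names below are hypothetical):

SELECT u.username,
       (SELECT p.about_me
        FROM user_profiles p
        WHERE p.user_id = u.user_id) AS about_me
FROM users u
WHERE u.user_id = 42;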

Select * From

This is done all the time, when in fact it should rarely (or never) be used. The obvious problem is that, many times, you do not need all the data a table contains; you only require certain fields. Selecting additional data for no reason is a waste of system resources and can contribute to network or delivery delays. The second reason relates to your code. I know that when I am writing code, it is much easier to work when I have the available field names right in front of me, in the query. It will be the same for other people, especially those who may not have database access. Readability is a key component of efficient coding, and using SELECT * goes against that completely.
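
In other words (hypothetical columns again), instead of:

SELECT * FROM users WHERE user_id = 42;

spell out exactly what you need:

SELECT user_id, username, email FROM users WHERE user_id = 42;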

Order By and Rand()

When trying to select a random group of results, many people use ORDER BY RAND() to achieve the desired result. Depending on your data, this can consume a lot of processing power and become very slow. If you know the number of rows in your table, you can instead generate a random offset below that count and use it with LIMIT, speeding up your query greatly. If you need multiple results, however, that method only works if you select the random rows out one by one. In the right circumstances, performing 10 separate queries can be much faster than retrieving 10 random records from a very large table with a single ORDER BY RAND() query. Trial and error, EXPLAIN, and you'll be fine.
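
A sketch of the offset approach (the table is hypothetical, and the row count and random offset would normally be generated in your application code, with the offset kept below the known row count):

SELECT user_id, username
FROM users
LIMIT 1 OFFSET 734210;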

Limit 1

Sometimes you know you are looking for one specific record, or you only want to fetch a single row. In these cases, it's best to add a LIMIT 1 clause at the end of your query. This boosts performance by letting MySQL stop as soon as the first match is found, rather than scanning all possible results. It's little things like this that add up to big performance savings.
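
For example, an existence-style lookup (hypothetical names):

SELECT user_id
FROM users
WHERE email = 'someone@example.com'
LIMIT 1;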

4) MYISAM & INNODB

MyISAM is the default MySQL table engine. It has multiple drawbacks, and while engine comparisons can become quite lengthy and argumentative, I will simply tell you when I find myself using each one. I recommend MyISAM only for very small, compact tables that are rarely changed (INSERTed, UPDATEd, DELETEd); in practice, I generally only end up using it for reference tables (for example, the interests table described up above). For everything else, I recommend InnoDB. The key to using InnoDB successfully is to configure its settings properly. If you leave things at the defaults, you may find InnoDB causes a lot more harm than good. Finely tuned (and that goes for all your my.cnf settings in general), you'll be able to store and query much larger tables at much greater speed. You'll be able to better utilize indexes, joins and everything else you have grown to love (or hate) about SQL.
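
The engine is chosen per table at creation time (or changed later with ALTER TABLE). A hypothetical sketch of the split described above:

CREATE TABLE interests (
    interest_id SMALLINT UNSIGNED NOT NULL PRIMARY KEY,
    name VARCHAR(50) NOT NULL
) ENGINE=MyISAM;

CREATE TABLE user_actions (
    action_id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    user_id INT UNSIGNED NOT NULL,
    created_at DATETIME NOT NULL
) ENGINE=InnoDB;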

Please do remember that if your database becomes corrupted (generally due to a crash), InnoDB tables will attempt to recover themselves automatically upon restart. The same is not true for MyISAM; don't forget about those tables, or as soon as a query hits one of them you may find yourself in trouble again.

As for the actual my.cnf settings file, it truly is a case-by-case basis. Hardware and software both come into play heavily, so my best advice is, if you aren't sure of what you are doing with it - hire someone to help.
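
To give an idea of the kind of knobs involved (the values below are purely illustrative; they depend entirely on your RAM, disks and workload, so don't copy them blindly):

[mysqld]
# Illustrative values only - size to your own hardware and workload
innodb_buffer_pool_size        = 4G
innodb_log_file_size           = 256M
innodb_flush_log_at_trx_commit = 2
key_buffer_size                = 256M
max_connections                = 300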

5) CONCLUSION

There is so much to this topic that I could go on for a year. The above should give you a broad overview, with a few specific examples and recommendations that may help you out. As mentioned, I'd be happy to cover the more advanced topics in a future article if the demand is there; I didn't want this particular article to alienate 95% of you reading it, though.

For the record, I have dealt with multiple projects involving MySQL and PostgreSQL (I actually worked for a time with one of the developers of PostgreSQL, attempting to implement a few optimizations into their engine), involving millions upon millions of records and some very complex queries: clusters/replication, advanced procedures/triggers, constraints. That said, I still learn new tricks practically every week. You will never stop learning a better way to do something.

If anyone has any questions, feel free to ask away below and I'll answer them to the best of my ability.