View Single Post
Old 08-09-2010, 07:54 AM  
Varius
Confirmed User
 
Industry Role:
Join Date: Jun 2004
Location: New York, NY
Posts: 6,890
Database Basics, Tips and Best Practices

As my third entry in the GFY Educational Series, I'd like to try and shed some light on another important topic which people often have difficulty with - database management. For most websites, the database can be considered the "core" or single-most important piece to the website puzzle; it is also one of the most-neglected. Most programmers spend their time coding; they can build a simple, functional database that interacts with their code and that's all they need. Unless they are specifically asked by their clients to do so, they don't spend time on database optimization or analysis - they delivered a working product and wish to be paid and move on to the next job. It may also be that they simply lack real-world experience. For example, you cannot realistically expect someone who has never worked with large-scale websites to understand some of the bottlenecks that only occur on million-record databases for a site getting millions of visitors.

As this article is aimed at a broad audience, who may or may not have deep technical knowledge, I will attempt to cover the more common and simple tips and best practices. If the demand if there, I would be happy to cover more advanced topics such as Clusters, Detailed Storage Engine Comparisons, Hardware and OS Benchmarking, RAID Configurations, Constraints, Triggers, Procedures, Events, Fulltext Indexing, Transactions and the like in a future article. Additionally, while much of the below subject matter relates to most database softwares, for my examples I will be referring to MySQL - as I believe PHP/MySQL to be the most popular development environment for the adult community.

1) INDEXES

Without a doubt, this is the most common reason for slow applications. I have personally seen popular softwares amongst the GFY community, built by fellow GFYers, containing MySQL tables that lack indexes completely. An index, in database jargon, can be thought of like a "hint" system. When a field is indexed, an analysis of the data contained within is put into a quick-access "list". When a query is performed, if an index is available for the search query parts, it may increase performance by avoiding the need to scan the entire table of data. There are many types of indexes available, but the main three are: Primary Indexes, Unique Indexes and Non-Unique (default) Indexes. You can also utilize multiple-column indexes, which can boost performance further in specific cases.

Primary Index

The Primary Index type, or "Primary Key" as it is more often referred to, uniquely identifies each record in a table. For example, if you have a table containing user information, this field might be named "user_id" and be set to auto_increment. One common misunderstanding is that a Primary Key MUST be auto_incrementing; this is false - the key simply must be unique. If you wished, you could use an actual username, as long as the intent is for no two records to have the same username value, it may be defined as a Primary Key.

Your Primary Key can also be composed of more than one field; this can be useful in eliminating the need for an extra, otherwise useless ID column. For example, let's say you have a customer table and a product table. In your application, every customer can have a custom limit for the amount of each product they can order. Thus, you have a three-column "link" table: CustID,ProdID,CustLimit. Since you know that for every customer, his limit can vary per product BUT he will only ever have one limit per product, you can use a multiple-column (or composite) primary key here on the first two fields (CustID,ProdID). That Primary Key can then be called upon to reference any individual row, without need for a space-wasting auto_increment ID column.

Note: when using Primary Keys, you do not need to add an additional "regular" index on the field. In the case of composite primary keys, you may require additional indexes if calling the keys out of turn. For example, "SELECT CustLimit FROM table WHERE CustID=1 AND ProdID=1" would be fine, but "SELECT CustLimit FROM table WHERE ProdID=1" would require an individual index on ProdID.

Unique Indexes

Unique indexes are extremely simple types of keys. For columns where there must be no duplicate values, you should make it unique. For example, say in your customer table, you have a field for their email address and you do not wish that any customer record can have a duplicate address - that is a prime target for a Unique Index.

Non-Unique Indexes

These are your regular, run-of-the-mill indexes. They are best put to use on fields which are used as search criteria or for ordering of results. For example, let's say you have a field called age or birthdate. When a search is performed, perhaps you need to find only those records between the age of 25 and 40. Your age or birthdate field needs an index. Another example, is you select our records from your table and wish to order them by last name. The last name column should be indexed. There is a special kind of index for text-type fields, but that is a little more advanced and will be covered in a future article.

As with the above, you can also create composite/multi-column non-unique indexes. Let's say that you have a table where two of the fields are named username and password. You perhaps perform this query: "SELECT something FROM table WHERE username='user' and password='whatever'". You can easily make sure that both username and password have an individual index, but for higher performance, you can also create an index on both fields (username,password). Just remember that adding too many columns to a composite index may defeat the purpose and always remember indexes are read from left-to-right.

CONTINUED IN NEXT POST...
__________________
Skype variuscr - Email varius AT gmail
Varius is offline   Share thread on Digg Share thread on Twitter Share thread on Reddit Share thread on Facebook Reply With Quote