Indexes
Use the table indexes. They are designed to facilitate fast retrieval of a subset of data so use them as much as possible. Generally, you should try to design your overall data retrieval strategy to take advantage of indexed queries as much as possible but it also often means that adding seemingly useless conditions to your WHERE clause can greatly speed up the execution of your query. The MSSQL query optimizer generally does the best optimization but it works within a set of parameters defined by the query itself. As such, it may seem like a black art but by adding extra clauses or fields in the result set, you can give query optimizer the opportunity to utilize an index in a situation where it normally would not have.
Each table’s indexes can be viewed through the MSSQL Server Enterprise Manager and the Execution Plan displayed in query analyzer shows the actual usage of indexes in a particular query. The key is to look for the operation with the highest Query Percentage cost within a given execution plan, an then try to figure out how to improve that performance.
A situation where high Query Percentage cost often occurs is when the table indexes are not properly utilized and too many rows are initially selected from a large table before they are filtered out by other means (for example, a Join). This is shown selected in the Execution Plan by the high number of rows selected. Modification of the query can cause the query optimizer to use a different index and reduce the number of initial rows selected.
In any query, it is also important to keep the WHERE clause as definite as possible and avoid too many OR statements that bring multiple unconnected columns into play. This may cause the Query Optimizer to become confused and then not use any indexes at all. The key to identifying this situation is to look at your WHERE clause and determine whether there are multiple conditions that could apply to the same record.
For example:
The following queries the Variables table looking for a specific string in the Extended_Info field, which is a non-indexed field.
SELECT @Unload_Date_Var_Id = Var_Id
FROM Variables
WHERE PU_Id = @PU_Id
AND Extended_Info Like '%' + @Flag + '%'
This query can actually be made faster by included a search against the Var_Desc field, which is an indexed field. Peculiarly, to achieve the best performance this query needs the search string to be in a variable as opposed to a constant.
SELECT @Wildcard = ‘%’
SELECT @Unload_Date_Var_Id = Var_Id
FROM Variables
WHERE PU_Id = @PU_Id
AND Var_Desc Like @Wildcard
AND Extended_Info Like '%' + @ Flag + '%
Basically, you need to pick the indexes based on what you're putting into your WHERE clause or your JOIN statements. To properly design a database, you need to write the queries at the same time. In theory, you could create an index for every column or column combination but every index you create adds space to the database and ultimately affects performance. So there is a balance but unfortunately there are no rules about it. SQL provides an 'Index Tuning Wizard' which may be off assistance.
Fundamentally, there are 3 choices to make:
- Primary Key: You should always have a unique primary key. This is the unique column or column combination within the table. In Plant Applications, this is almost always an identity field and, because of this, we usually make it a non-clustered index (as there is no point in having a clustered index on a single column with unique values).
- Clustered Index : You can only have 1 clustered index and it should always be a multi-column index that is the most commonly selected key. For example, in the Plant Applications Events table, while Event_Id is the unique primary key, the clustered index is on PU_Id and Time Stamp because that is the most commonly selected combination. Clustered indexes act as a tree so they are very fast at retrieving data (for example, they search for all PU_Id records first before drilling down to Time stamps). The order of the columns in the clustered index is important (as it is for any index).
- Non-clustered indexes: You can have many of these. Typically they should be for commonly selected columns or column combinations other than the Clustered Index or Primary Key.
The best way to choose your indexes is by writing the queries you need and figuring out what's the most commonly selected columns. As you test the queries you look at the Execution Plan in Query Analyzer and see which indexes the query is using. You will view things such as:
- Table Scan: This means the query is not using any index and is checking every single row for a match. As the table grows larger the query takes longer and performance degrades.
- Index Scan: This means that the query is scanning the full index for the rows it wants. This is better than a table scan but still means the query has to check every single element. Because indexes are smaller than the actual table, Index Scans are much faster than Table Scans. As the table grows larger, the scan take longer and performance degrades.
- Index Seek: This means the index is being fully utilized in the search. This is what you are aiming for and means that the query performance should remain stable as the table grows larger. As the table grows larger, performance should remain stable. Also, larger tables mean slower performance in all cases, but the effect is be much less pronounced with Index Seeks.