Index Troubleshooting Techniques

One of the more common DBA requests is to troubleshoot a badly
performing piece of SQL code.  

----
Step 1: benchmark the query as is, if possible.  Just having a user
say "its slow" won't help you later as you attempt to fix the problem.
Try to quantify the query so you can tell if you've improved things
later.  Use

- set statistics time on: informational; calculates time a query has run

or just time it by hand.

----
Step 2.  Examine the showplan for the code

set showplan on
go

this will cause the engine to print out its decision tree for
the query...which index it selects, how it orders the data of
the query etc.

Next, turn on a "test mode" option that executes the query in
the Optimizer but not on the data.  If you are just examining
one particular sql statement, run:

set noexec on
go

if you're examining a stored procedure, execute

set fmtonly on
go

(set fmtonly on accomplishes the same thing as set noexec on for
stored procs...do NOT set noexec on before setting fmtonly)

Now run your code.  Each join in the query will have detailed
information printed out concerning the index selected (or lack
there of), the direction of the scan etc.  The key things to
look for are "table scans" where one expects an index to be used.

----
Step 3: analyze the table scans.  The most common problem with
table joins is a poorly chosen join field.  If an index does not
exist on the join fields, the Optimizer will have to scan the
entire table into memory to return the result set.  

Other things to watch out for/things that cause table scans:

- datatype mismatches; a join field is (for example) defined as an
integer in table A and a numeric in table B.  
- stored procedure parameter datatype mismatches; trickier to find;
if a parameter is defined as (for example) a numeric, but the
target table has the field defined as an integer.
- vague search arguments; if the search list is too vague, too many
results will be returned, which eliminates the usefulness of your
index because after a certain percentage of the table is read,
the optimizer defaults to doing a table scan for performance.  (note;
this used to be thought to be the 20% rule...if a query was going
to return more than 20% of the rows of the table, the Optimizer would
default to using a table scan.  This arbitrary percentage value has
been debunked by Sybase Engineers, who have taken great pains to
say that number was the result of an off-the-cuff remark and is not true.
- Missing indexes on search arguments; when the columns used in the
where clause simply are not indexed.
- Overly complex non-clustered indexes.  Sometimes the Optimizer will
decide its easier (less-costly in terms of physical i/o) to just read
the pages physically (table scan) than use a non-clustered index that
requires several tree steps to traverse.

- Certain where clauses (order by, group by, having, between, like,
calculations in where clauses) can be problematic.  Be careful using
them, and consider rewriting queries not to use them.

- misordering of where clauses; always try to use the leading column
of a composite index...secondary keys of composite indexes are not
as effective

Note: sometimes table scans are necessary.  If you have 16K i/o
buffer pools and they are efficiently used, table scans can be 
efficient ways to return large numbers of rows.  Not all table
scans are capable of being eliminated.


----
Step 4.  Issue an update statistics, sp_recompile, then repeat step 2.  

The Sybase optimizer depends on detailed numerical statistics about
the data, its distribution, etc to make index selections.  If data
changes frequently, then these statistics must be updated frequently
as well so the Optimizer always knows the most about the data it can.
the command Update statistics forces the statistics to flush and be
recollected.  sp_recompile instructs the Dataserver to recompile the
query plan of a stored procedure the next time it executes.  This is
necessary because Sybase stores a compiled version of each stored 
procedure in sybsysprocs (along with a complied query plan) and runs
this instead of the actual code for performance.  

Issue these two commands, then repeat step 2.  Compare whether the
Optimizer has changed its query plan, based on new statistics.  If
it has, re-run the original query (with out the noexec on) and
compare against the initial run time.

NOTE: some dataserver activities reduce performance on particular
frequently run operations.  For example; an often-run stored procedure
that pulls back commonly requested data will eventually populate the
cache chain with this information, greatly speeding access.  But,
a nightly update statistics or dbcc job will quickly flush out these
pages, causing a known performace issue the first time this particular
job is run each morning.

----
Step 5: More Advanced steps.

Usually the first 4 steps are sufficient.  I've only had to delve into
these techniques rarely in order to resolve problems.

- dbcc traceon(302): the 302 traceflag, which prints out the calculations
the Optimizer is making to select an index.
- dbcc traceon(310); prints Optimizer join information.
Note: dbcc traceon(3604) sends the dbcc output generated by 302/310 to
the screen, 3605 to the errorlog.

(note: dbcc traceon (302, 310, 3604) will set it all at once).

- set statistics io on; prints out the number of disk reads.  Useful
if you're trying to calculate numbers of I/O.  Can be extremely detailed
if resource limits are enabled in the Server.

- sp_sysmon; run sp_sysmon during a query to determine the nature of
System activity

- Cache, Buffer pool modification; prefetch theory, MRU-LRU cache management,
etc.  See the Performance and Tuning guide.

----
Step 6: Last Ditch efforts.  These are considered last ditch because
its considered rare that a human can outthink the Optimizer.  These
techniques more or less stop the statistics and cost-based Optimizer
decision tree process from occuring.

- set forceplan on; forces the optimizer to use the tables in the
order they are specified in the from clause

- specifying the index to be used: you can use the "index "
clause in a select statement to physically force the use of a particular
index.