All about Keys, Constraints and Referential Integrity (RI)

Keys:
There are two main kinds of keys: primary and foreign.  
- Primary Keys: an ID field (or fields) that uniquely identifies a 
row in the table and which may not contain nulls.  A primary key is 
meant to be a logical construct; theoretically it is not part of the 
database.  Every table should have a primary key defined.
- Foreign Keys: A foreign key in a table is a key reference to the
primary key of another table.  They're noted in the surrogate table
for integrity purposes; if the primary table deletes a record, and
a surrogate table has data that references that record, then the 
surrogate table's data is now "orphaned" and invalid. 

Notes:
- Good Database specifies that Primary keys be defined as sequential,
monotonically increasing numbers.  These are also referred to as Surrogate
(i've also heard Artificial or Synthetic) keys.  I also typically define
"Business" or "Natural" keys on tables; these are business-fields that 
also uniquely define records in tables, but which are typically alphanumeric
or composite in nature, or used by humans instead of surrogate ID numbers.  

- "Composite Keys" are logical primary keys to a table that require more
than one field to define.  These are acceptable in minor tables which
do not pass along its primary keys to a surrogate table.

- Foreign keys that are part of the surrogate table's primary key have
"Defining foreign key relationships" with the primary table.

- Why are Surrogate keys good?  Because Sybase's Optimizer works more
efficiently when joining and searching integer-only indexes.  Thus, if
all tables are joined via surrogate keys, queries will perform better.

---
Logical versus Physical keys:

Sybase (unlike other RDBMSs) distinguishes between defined logical key
relationships and actual physical implementations.  You define logical
keys on a table through these two stored procedures:
- sp_primarykey , , , ...
- sp_foreignkey , , , , ...
(, ... is optional if you don't have composite key)

Running these two stored procedures enters information into the syskeys
system table for a particular database, but DOES NOT implement any
types of physical constraints on the database.  These are documentation
only stored procedures.

- Why are logical key definitions important??  Because some 3rd party
ERD tools ONLY read the syskeys info defined by these two stored procedures.
Others are smart enough to infer relationships from constraint definitions,
but if you handle RI through triggers and not constrants (as a lot of
places do), ERD tools have no way of knowing the relationships between tables.

- if you have fully defined your logical keys using these two stored
procedures, you may run the stored proc "sp_helpjoin" and see possible
join fields between tables.
- Limitation; these two stored procs only take 8 field parameters, whereas
Sybase can use upto 16 fields in a composite key.  So, if you (for some
weird reason) have more than 8 fields in your primary key you cannot
document it properly using these two stored procs.

---
How do I tell what keys I have set now?
Well, it depends on what kind of keys you're trying to discover:
- sp_helpkey tablename will show you what keys have been created
via sp_primarykey and sp_foreignkey statements.
- sp_helpconstraint will show  you what declarative integrity statements
(primary key and foreign key constraints) have been created physically
on the table.

Note: Not too many DBA tools are good at creating sp_primarykey and
sp_foreignkey statements for your database.  The easiest way to get such
DDL is to usedbschema.pl, a widely known took out there for reverse 
engineering Sybase databases.  It will create all the create table, alter 
table and sp_primary and sp_foreignkey statement ddl for you.  Surf to
http://http.midsomer.org/ and you can download the code.

---
Issue: Should I use declarative constraints or triggers to enforce Referential
Integrity (mostly Foreign keys) in my database?
A: Not a simple question.  Pros and cons to both sides (some of this taken
from a discussion on Sybase-L 2/5/02)

Declarative constraints (foreign key constraints)
Pro
- No SQL coding required, just a simple alter table command.  No SQL to
program and depend on Developers.
- Constraints are read/interpreted by 3rd Party ERD tools, allowing for proper
modelling
- RI constraints appear in system tables; thus creating documentation as
to what RI relationships exist 
- ANSI Standard, whereas triggers are not.
- Better for performance?  Arguable: i've heard both are better performing.

Con
- inflexibility doing table modifications.  If you're doing a lot of
table mods (adding and dropping columns) FK constraints and their existing
relationships make for major headaches.  I've always preferred to add/drop
tables instead of using alter commands.  If you've got any sort of
cascading constraint relationship (table A has a foreign key reference in
table b, which has a FK reference in table C) you've got to drop the
FK constraints manually in table C, then table B just to do your table
drop.  It can get tedious.  This would be easier if there was a "cascade
constraint" option similar to Oracle's
- Do not create indexes on FK fields; needs additional step (con in both)
- Older versions of Sybase (allegedly) have had FK constraint bugs (though
I've never seen it in practice).  
- Can't do cascading deletes.
- Declarative RI have inflexible error messages
- If you don't specify a constraint name, Sybase assigns a non-user friendly
name (minor issue)


Triggers 
Pro
- Better for performance?  Arguable: i've heard both are better performing.
- easily disabled w/ "alter table disable trigger" to turn off RI temporarily
- More flexible than constraints; can do more error checking.  
- Can program in before/after behavior, even though all triggers in Sybase
are "after" triggers
- Can program in cascading update and delete behavior
- Can have more flexible error messages than Constraint RI violations.
(Example: a FK relationship to a PK table needs RI for two purposes: to ensure
a child row has a parent, and to ensure a parent row can't be deleted if
there's an existant child row.  The error messages really should distinguish
between these two cases).

Con
- Are NOT read by 3rd party ERD tools, requiring additional maitenance to keep 
RI documentation updated through sp_primarykey and sp_foreignkey statements
- Do not create indexes on FK fields; needs additional step (con in both)

----


Physical Implementation of keys:
Implementing Primary Keys:
2 choices:
- Create a primary key constraint on the table at creation (or through
alter table): you can specify it to be clustered or nonclustered (but
it defaults to a unique clustered index when you use the primary key
constraint).  This is called a "declarative constraint."
- Define a unique index using create index: you can specify it to be
clustered or non clustered, but you should define it unique.

- Performance Note: it is not always wise to place your clustered index on 
your surrogate key!  Especially if its an OLTP environment with large numbers 
of inserts.  This will create a heap table with a hot-spot at insertion point,
and could lead to blocking, deadlocking, and performance issues.
Recommendation: Place a unique non-clustered index on this field, and put the
clustered index on the "business key," if it exists.  If not, find an 
alphabetic field (last name, eg) that is a candidate for human searches, or
which at least guarantees somewhat random data distribution throughout the
table.

Implementing Foreign Keys:
2 choices:
- Create a foreign key constraint through create table or alter table.
- Create triggers: much more flexible, allows before/after delete trigger
behavior.

- Performance note: always put non-clustered indexes on each FK field, because
these are the fields the Optimizer will be joining on.

---
ERD Tool notation:

Two major notation conventions exist on ERDs:
1. IDEF1X:
- lines indicate relationships between tables: the table with the solid
at its line contact is the foreign key table.  
- Solid Lines indicate an "identifying relationship" between the primary table
and its foreign key.  In other words, the FK is part of the primary 
key of the surrogate table
- Dashed lines between tables indicate a foreign key relationship exists,
but that the FK is not part of the PK of the surrogate table.  However,
the FK is still mandatory (non-null)
- A lighter dot at the PK table indicates an optional relationship, along
with a dashed line (indicating a non-identifying relationship)

- Light dots at both ends of a table pair indicate an exact 1-1 relationship.
- Solid dots at both ends indicate a non-specific relationship (an
unresolved many-to-many for example).

- Keys are noted with (PK) and (FK) notations in the column list per table.

2. IE (crows feet)
- Lines indicate relationships (either identifying or non-identifying;
no distinction is made).
- "Crows feet" at the foreign key table indicates a 1-many relationship.
- A short "t-bar" at the primary key table indicates a non-null mandatory
field relationship
- a small circle indicates a nullable, optional relationship
- a lack of crows feet indicates a 1-1 relationship.