SQL Tutorial: Miscellaneous Topics

Miscellaneous Topics

Indexes

Indexes allow a DBMS to access data quicker (please note: this feature is nonstandard/not available on all systems). The system creates this internal data structure (the index) which causes selection of rows, when the selection is based on indexed columns, to occur faster. This index tells the DBMS where a certain row is in the table given an indexed-column value, much like a book index tells you what page a given word appears. Let's create an index for the OwnerID in the AntiqueOwners column:

CREATE INDEX OID_IDX ON ANTIQUEOWNERS (OWNERID);

Now on the names:

CREATE INDEX NAME_IDX ON ANTIQUEOWNERS (OWNERLASTNAME, OWNERFIRSTNAME);

To get rid of an index, drop it:

DROP INDEX OID_IDX;

By the way, you can also "drop" a table, as well (careful!--that means that your table is deleted). In the second example, the index is kept on the two columns, aggregated together--strange behavior might occur in this situation...check the manual before performing such an operation.

Some DBMS's do not enforce primary keys; in other words, the uniqueness of a column is not enforced automatically. What that means is, if, for example, I tried to insert another row into the AntiqueOwners table with an OwnerID of 02, some systems will allow me to do that, even though, we do not, as that column is supposed to be unique to that table (every row value is supposed to be different). One way to get around that is to create a unique index on the column that we want to be a primary key, to force the system to enforce prohibition of duplicates:

CREATE UNIQUE INDEX OID_IDX ON ANTIQUEOWNERS (OWNERID);

GROUP BY & HAVING

One special use of GROUP BY is to associate an aggregate function (especially COUNT; counting the number of rows in each group) with groups of rows. First, assume that the Antiques table has the Price column, and each row has a value for that column. We want to see the price of the most expensive item bought by each owner. We have to tell SQL to group each owner's purchases, and tell us the maximum purchase price:

SELECT BUYERID, MAX(PRICE) FROM ANTIQUES GROUP BY BUYERID;

Now, say we only want to see the maximum purchase price if the purchase is over $1000, so we use the HAVING clause:

SELECT BUYERID, MAX(PRICE) FROM ANTIQUES GROUP BY BUYERID HAVING PRICE > 1000;

More Subqueries

Another common usage of subqueries involve the use of logical operators to allow a Where condition to include the Select output of a subquery. First, list the buyers who purchased an expensive item (the Price of the item is $100 greater than the average price of all items purchased):

SELECT OWNERID FROM ANTIQUES WHERE PRICE >

(SELECT AVG(PRICE) + 100 FROM ANTIQUES);

The subquery calculates the average Price, plus $100, and using that figure, an OwnerID is printed for every item costing over that figure. One could use DISTINCT OWNERID, to eliminate duplicates.

List the Last Names of those in the AntiqueOwners table, ONLY if they have bought an item:

SELECT OWNERLASTNAME FROM ANTIQUEOWNERS WHERE OWNERID =

(SELECT DISTINCT BUYERID FROM ANTIQUES);

The subquery returns a list of buyers, and the Last Name is printed for an Antique Owner if and only if the Owner's ID appears in the subquery list (sometimes called a candidate list).

For an Update example, we know that the gentleman who bought the bookcase has the wrong First Name in the database...it should be John:

UPDATE ANTIQUEOWNERS SET OWNERFIRSTNAME = 'John' WHERE OWNERID =

(SELECT BUYERID FROM ANTIQUES WHERE ITEM = 'Bookcase');

First, the subquery finds the BuyerID for the person(s) who bought the Bookcase, then the outer query updates his First Name.

EXISTS & ALL

EXISTS uses a subquery as a condition, where the condition is True if the subquery returns any rows, and False if the subquery does not return any rows; this is a nonintuitive feature with few unique uses. However, if a prospective customer wanted to see the list of Owners only if the shop dealt in Chairs, try:

SELECT OWNERFIRSTNAME, OWNERLASTNAME FROM ANTIQUEOWNERS WHERE EXISTS

(SELECT * FROM ANTIQUES WHERE ITEM = 'Chair');

If there are any Chairs in the Antiques column, the subquery would return a row or rows, making the EXISTS clause true, causing SQL to list the Antique Owners. If there had been no Chairs, no rows would have been returned by the outside query.

ALL is another unusual feature, as ALL queries can usually be done with different, and possibly simpler methods; let's take a look at an example query:

SELECT BUYERID, ITEM FROM ANTIQUES WHERE PRICE >= ALL

(SELECT PRICE FROM ANTIQUES);

This will return the largest priced item (or more than one item if there is a tie), and its buyer. The subquery returns a list of all Prices in the Antiques table, and the outer query goes through each row of the Antiques table, and if its Price is greater than or equal to every (or ALL) Prices in the list, it is listed, giving the highest priced Item. The reason ">=" must be used is that the highest priced item will be equal to the highest price on the list, because this Item is in the Price list.

UNION & Outer Joins

There are occasions where you might want to see the results of multiple queries together, combining their output; use UNION. To merge the output of the following two queries, displaying the ID's of all Buyers, plus all those who have an Order placed:

SELECT BUYERID FROM ANTIQUEOWNERS UNION SELECT OWNERID FROM ORDERS;

Notice that SQL requires that the Select list (of columns) must match, column-by-column, in data type. In this case BuyerID and OwnerID are of the same data type (integer). Also notice that SQL does automatic duplicate elimination when using UNION (as if they were two "sets"); in single queries, you have to use DISTINCT.

The outer join is used when a join query is "united" with the rows not included in the join, and are especially useful if constant text "flags" are included. First, look at the query:

SELECT OWNERID, 'is in both Orders & Antiques' FROM ORDERS, ANTIQUES WHERE OWNERID = BUYERID UNION SELECT BUYERID, 'is in Antiques only' FROM ANTIQUES WHERE BUYERID NOT IN

(SELECT OWNERID FROM ORDERS);

The first query does a join to list any owners who are in both tables, and putting a tag line after the ID repeating the quote. The UNION merges this list with the next list. The second list is generated by first listing those ID's not in the Orders table, thus generating a list of ID's excluded from the join query. Then, each row in the Antiques table is scanned, and if the BuyerID is not in this exclusion list, it is listed with its quoted tag. There might be an easier way to make this list, but it's difficult to generate the informational quoted strings of text.

This concept is useful in situations where a primary key is related to a foreign key, but the foreign key value for some primary keys is NULL. For example, in one table, the primary key is a salesperson, and in another table is customers, with their salesperson listed in the same row. However, if a salesperson has no customers, that person's name won't appear in the customer table. The outer join is used if the listing of all salespersons is to be printed, listed with their customers, whether the salesperson has a customer or not--that is, no customer is printed (a logical NULL value) if the salesperson has no customers, but is in the salespersons table. Otherwise, the salesperson will be listed with each customer.

ENOUGH QUERIES!!! you say?...now on to something completely different...

Embedded SQL--an ugly example (do not write a program like this...for purposes of argument ONLY)

/* -To get right to it, here is an example program that uses Embedded SQL. Embedded SQL allows programmers to connect to a database and include SQL code right in the program, so that their programs can use, manipulate, and process data from a database. -This example C Program (using Embedded SQL) will print a report. -This program will have to be precompiled for the SQL statements, before regular compilation. -The EXEC SQL parts are the same (standard), but the surrounding C code will need to be changed, including the host variable declarations, if you are using a different language. -Embedded SQL changes from system to system, so, once again, check local documentation, especially variable declarations and logging in procedures, in which network, DBMS, and operating system considerations are crucial. */

/************************************************/ /* THIS PROGRAM IS NOT COMPILABLE OR EXECUTABLE */ /* IT IS FOR EXAMPLE PURPOSES ONLY */ /************************************************/

#include

/* This section declares the host variables; these will be the variables your program uses, but also the variable SQL will put values in or take values out. */ EXEC SQL BEGIN DECLARE SECTION; int BuyerID; char FirstName[100], LastName[100], Item[100]; EXEC SQL END DECLARE SECTION;

/* This includes the SQLCA variable, so that some error checking can be done. */ EXEC SQL INCLUDE SQLCA;

main() {

/* This is a possible way to log into the database */ EXEC SQL CONNECT UserID/Password;

/* This code either says that you are connected or checks if an error code was generated, meaning log in was incorrect or not possible. */ if(sqlca.sqlcode) { printf(Printer, "Error connecting to database server.\n"); exit(); } printf("Connected to database server.\n");

/* This declares a "Cursor". This is used when a query returns more than one row, and an operation is to be performed on each row resulting from the query. With each row established by this query, I'm going to use it in the report. Later, "Fetch" will be used to pick off each row, one at a time, but for the query to actually be executed, the "Open" statement is used. The "Declare" just establishes the query. */ EXEC SQL DECLARE ItemCursor CURSOR FOR SELECT ITEM, BUYERID FROM ANTIQUES ORDER BY ITEM; EXEC SQL OPEN ItemCursor;

/* +-- You may wish to put a similar error checking block here --+ */

/* Fetch puts the values of the "next" row of the query in the host variables, respectively. However, a "priming fetch" (programming technique) must first be done. When the cursor is out of data, a sqlcode will be generated allowing us to leave the loop. Notice that, for simplicity's sake, the loop will leave on any sqlcode, even if it is an error code. Otherwise, specific code checking must be performed. */ EXEC SQL FETCH ItemCursor INTO :Item, :BuyerID; while(!sqlca.sqlcode) {

/* With each row, we will also do a couple of things. First, bump the price up by $5 (dealer's fee) and get the buyer's name to put in the report. To do this, I'll use an Update and a Select, before printing the line on the screen. The update assumes however, that a given buyer has only bought one of any given item, or else the price will be increased too many times. Otherwise, a "RowID" logic would have to be used (see documentation). Also notice the colon before host variable names when used inside of SQL statements. */

EXEC SQL UPDATE ANTIQUES SET PRICE = PRICE + 5 WHERE ITEM = :Item AND BUYERID = :BuyerID;

EXEC SQL SELECT OWNERFIRSTNAME, OWNERLASTNAME INTO :FirstName, :LastName FROM ANTIQUEOWNERS WHERE BUYERID = :BuyerID;

printf("%25s %25s %25s", FirstName, LastName, Item);

/* Ugly report--for example purposes only! Get the next row. */ EXEC SQL FETCH ItemCursor INTO :Item, :BuyerID; }

/* Close the cursor, commit the changes (see below), and exit the program. */ EXEC SQL CLOSE DataCursor; EXEC SQL COMMIT RELEASE; exit(); }

Common SQL Questions (see FAQ link for several more)

Why can't I just ask for the first three rows in a table? --Because in relational databases, rows are inserted in no particular order, that is, the system inserts them in an arbitrary order; so, you can only request rows using valid SQL features, like ORDER BY, etc.
What is this DDL and DML I hear about? --DDL (Data Definition Language) refers to (in SQL) to the Create Table statement...DML (Data Manipulation Language) refers to the Select, Update, Insert, and Delete statements.
Aren't database tables just files? --Well, DBMS's store data in files declared by system managers before new tables are created (on large systems), but the system stores the data in a special format, and may spread data from one table over several files. In the database world, a set of files created for a database is called a tablespace. In general, on small systems, everything about a database (definitions and all table data) is kept in one file.
(Related question) Aren't database tables just like spreadsheets? --No, for two reasons. First, spreadsheets can have data in a cell, but a cell is more than just a row-column-intersection. Depending on your spreadsheet software, a cell might also contain formulas and formatting, which database tables cannot have (currently). Secondly, spreadsheet cells are often dependent on the data in other cells. In databases, "cells" are independent, except that columns are logically related (hopefully; together a row of columns describe an entity), and, other than primary key and foreign key constraints, each row in a table in independent from one another.
What is normalization? --Normalization is a technique of database design that suggests that certain criteria be used when constructing a table layout (deciding what columns each table will have, and creating the key structure), where the idea is to eliminate redundancy of non-key data across tables. Normalization is usually referred to in terms of forms, and I will introduce only the first three, even though it is somewhat common to use other, more advanced forms (fourth, fifth, Boyce-Codd; see documentation).

First Normal Form refers to moving data into separate tables where the data in each table is of a similar type, and by giving each table a primary key.

Putting data in Second Normal Form involves taking out data off to other tables that is only dependent of a part of the key. For example, if I had left the names of the Antique Owners in the items table, that would not be in second normal form because that data would be redundant; the name would be repeated for each item owned, so the names were placed in their own table. The names themselves don't have anything to do with the items, only the identities of the buyers and sellers.

Third Normal Form involves getting rid of anything in the tables that doesn't depend solely on the primary key. Only include information that is dependent on the key, and move off data to other tables that are independent of the primary key, and create a primary keys for the new tables.

There is some redundancy to each form, and if data is in 3NF (shorthand for 3rd normal form), it is already in 1NF and 2NF. In terms of data design then, arrange data so that any non-primary key columns are dependent only on the whole primary key. If you take a look at the sample database, you will see that the way then to navigate through the database is through joins using common key columns.

Two other important points in database design are using good, consistent, logical, full-words in the database itself. On the last point, my database is lacking, as I use numeric codes for identification. It is usually best, if possible, to come up with keys that are, by themselves, self-explanatory; for example, a better key would be the first four letters of the last name and first initial of the owner, like JONEB for Bill Jones (or for tiebreaking purposes, add numbers to the end to differentiate two or more people with similar names, so you could try JONEB1, JONEB2, etc.).
What is the difference between a single-row query and a multiple-row query and why is it important to know the difference? --First, to cover the obvious, a single-row query is a query that returns one row as its result, and a multiple-row query is a query that returns more than one row as its result. Whether a query returns one row or more than one row is entirely dependent on the design (or schema) of the tables of the database. As query-writer, you must be aware of the schema, be sure to include enough conditions, and structure your SQL statement properly, so that you will get the desired result (either one row or multiple rows). For example, if you wanted to be sure that a query of the AntiqueOwners table returned only one row, consider an equal condition of the primary key-column, OwnerID.

Three reasons immediately come to mind as to why this is important. First, getting multiple rows when you were expecting only one, or vice-versa, may mean that the query is erroneous, that the database is incomplete, or simply, you learned something new about your data. Second, if you are using an update or delete statement, you had better be sure that the statement that you write performs the operation on the desired row (or rows)...or else, you might be deleting or updating more rows than you intend. Third, any queries written in Embedded SQL must be carefully thought out as to the number of rows returned. If you write a single-row query, only one SQL statement may need to be performed to complete the programming logic required. If your query, on the other hand, returns multiple rows, you will have to use the Fetch statement, and quite probably, some sort of looping structure in your program will be required to iterate processing on each returned row of the query.
What are some important nonstandard SQL features (extremely common question)? --Well, see the next section...

Nonstandard SQL..."check local listings"

INTERSECT and MINUS are like the UNION statement, except that INTERSECT produces rows that appear in both queries, and MINUS produces rows that result from the first query, but not the second.
Report Generation Features: the COMPUTE clause is placed at the end of a query to place the result of an aggregate function at the end of a listing, like COMPUTE SUM (PRICE); Another option is to use break logic: define a break to divide the query results into groups based on a column, like BREAK ON BUYERID. Then, to produce a result after the listing of a group, use COMPUTE SUM OF PRICE ON BUYERID. If, for example, you used all three of these clauses (BREAK first, COMPUTE on break second, COMPUTE overall sum third), you would get a report that grouped items by their BuyerID, listing the sum of Prices after each group of a BuyerID's items, then, after all groups are listed, the sum of all Prices is listed, all with SQL-generated headers and lines.
In addition to the above listed aggregate functions, some DBMS's allow more functions to be used in Select lists, except that these functions (some character functions allow multiple-row results) are to be used with an individual value (not groups), on single-row queries. The functions are to be used only on appropriate data types, also. Here are some Mathematical Functions:

ABS(X)	Absolute value-converts negative numbers to positive, or leaves positive numbers alone
CEIL(X)	X is a decimal value that will be rounded up.
FLOOR(X)	X is a decimal value that will be rounded down.
GREATEST(X,Y)	Returns the largest of the two values.
LEAST(X,Y)	Returns the smallest of the two values.
MOD(X,Y)	Returns the remainder of X / Y.
POWER(X,Y)	Returns X to the power of Y.
ROUND(X,Y)	Rounds X to Y decimal places. If Y is omitted, X is rounded to the nearest integer.
SIGN(X)	Returns a minus if X < 0, else a plus.
SQRT(X)	Returns the square root of X.

**Character Functions**
LEFT(,X)	Returns the leftmost X characters of the string.
RIGHT(,X)	Returns the rightmost X characters of the string.
UPPER()	Converts the string to all uppercase letters.
LOWER()	Converts the string to all lowercase letters.
INITCAP()	Converts the string to initial caps.
LENGTH()	Returns the number of characters in the string.
\|\|	Combines the two strings of text into one, concatenated string, where the first string is immediately followed by the second.
*LPAD(,X,'')**	Pads the string on the left with the * (or whatever character is inside the quotes), to make the string X characters long.
*RPAD(,X,'')**	Pads the string on the right with the * (or whatever character is inside the quotes), to make the string X characters long.
SUBSTR(,X,Y)	Extracts Y letters from the string beginning at position X.
NVL(,)	The Null value function will substitute for any NULLs for in the . If the current value of is not NULL, NVL has no effect.

Syntax Summary--For Advanced Users Only

Here are the general forms of the statements discussed in this tutorial, plus some extra important ones (explanations given). REMEMBER that all of these statements may or may not be available on your system, so check documentation regarding availability:

ALTER TABLE ADD|DROP|MODIFY (COLUMN SPECIFICATION[S]...see Create Table); --allows you to add or delete a column or columns from a table, or change the specification (data type, etc.) on an existing column; this statement is also used to change the physical specifications of a table (how a table is stored, etc.), but these definitions are DBMS-specific, so read the documentation. Also, these physical specifications are used with the Create Table statement, when a table is first created. In addition, only one option can be performed per Alter Table statement--either add, drop, OR modify in a single statement. COMMIT; --makes changes made to some database systems permanent (since the last COMMIT; known as a transaction) CREATE [UNIQUE] INDEX ON (); --UNIQUE is optional; within brackets. CREATE TABLE ( [()] , ...other columns); (also valid with ALTER TABLE) --where SIZE is only used on certain data types (see above), and constraints include the following possibilities (automatically enforced by the DBMS; failure causes an error to be generated): NULL or NOT NULL (see above) UNIQUE enforces that no two rows will have the same value for this column PRIMARY KEY tells the database that this column is the primary key column (only used if the key is a one column key, otherwise a PRIMARY KEY (column, column, ...) statement appears after the last column definition. CHECK allows a condition to be checked for when data in that column is updated or inserted; for example, CHECK (PRICE > 0) causes the system to check that the Price column is greater than zero before accepting the value...sometimes implemented as the CONSTRAINT statement. DEFAULT inserts the default value into the database if a row is inserted without that column's data being inserted; for example, BENEFITS INTEGER DEFAULT = 10000 FOREIGN KEY works the same as Primary Key, but is followed by: REFERENCES (), which refers to the referential primary key.CREATE VIEW AS ; DELETE FROM WHERE ; INSERT INTO [()] VALUES (); ROLLBACK; --Takes back any changes to the database that you have made, back to the last time you gave a Commit command...beware! Some software uses automatic committing on systems that use the transaction features, so the Rollback command may not work. SELECT [DISTINCT|ALL] FROM [WHERE ] [GROUP BY ] [HAVING ] [ORDER BY [ASC|DESC]]; --where ASC|DESC allows the ordering to be done in ASCending or DESCending order UPDATE SET = [WHERE ]; --if the Where clause is left out, all rows will be updated according to the Set statement Important Links Computing & SQL/DB Links: Netscape -- Oracle -- Sybase -- Informix --Microsoft Programmer's Source -- DBMS Sites -- inquiry.com -- DB Ingredients -- SQL for Beginners Web Authoring -- Computing Dictionary -- DBMS Lab/Links -- SQL FAQ -- SQL Databases SQL Reference Page -- Ask the SQL Pro -- SQL Pro's Relational DB Useful Sites RIT Database Design Page -- Database Jump Site -- Programming Tutorials on the Web Development Resources -- Query List -- IMAGE SQL Miscellaneous: CNN -- USA Today -- Pathfinder -- ZDNet -- Metroscope -- CNet Internet Resource List -- Netcast Weather -- TechWeb -- LookSmart -- NY Times Fax Search Engines: Yahoo -- Alta Vista -- Excite -- WebCrawler -- Lycos -- Infoseek -- search.com These sites are not endorsed by the author. Disclaimer I hope you have learned something from this introductory look at a very important language that is becoming more prevalent in the world of client-server computing. I wrote this web page in order to contribute something of value to the web and the web community. In fact, I have been informed that this document is being used at several colleges for use in database classes and for use by researchers. Also, look for this page in Waite Publishing's newest book about Borland C++ Builder, which will be out this summer. In addition, I would like to thank all of the people from across five continents who have contacted me regarding this web page. I also hope to continue to add more material to this tutorial, such as topics about database design and nonstandard SQL extensions, even though I wish to stay away from material about individual Database Management Systems. Good luck in your SQL and other computing adventures. Jim Hoffman Comments or suggestions? Mail jhoffman@one.net. Or you may wish to look at Jim Hoffman's Index Page for more information about Jim Hoffman. This page is best viewed with Netscape Navigator; it doesn't look quite right with Microsoft Internet Explorer. Last updated: 4-12-1997; added some text. http://w3.one.net/~jhoffman/sqltut.htm