Tag: SQL

Some notes on SQL: 5 – database design

This is the fifth in a series of blog posts on SQL. The first covered creating a database, the second selecting information from a database, the third commands to modify the structure and contents of an existing database, and the fourth advanced selection. This post covers database design, so it is a little lighter on code examples. No claim of authority is made for these posts; they are mainly intended as my notes on the topic. These notes are based largely on Head First SQL.

The goal of database design is to produce a database which is straightforward and efficient to search. This is done by splitting data into a set of tables, with lookups between those tables used to build the desired output results.

Efficient database design is normally discussed with reference to “normal forms”, the goal being to reach the highest-order normal form. In practice some pragmatism is applied, and it may be sensible to stop short of the highest form.

First normal form – each row of data must contain atomic values, and each row must have a unique identifier, known as a Primary Key. “Atomic values” are essentially the smallest pieces into which data can sensibly be divided; this may depend on the application. For example, in some cases a house address may be kept as a single text field, whilst in others it might be divided into Number, Street, Town etc. Furthermore, to be “atomic”, data should not be repeated across columns (i.e. a table containing interests should not contain columns “interest_1”, “interest_2”, …).

The Primary Key may be a single column of ‘synthetic’ numbers (i.e. they have no other purpose), a pre-existing column in the table, or a combination of columns, in which case it is called a Composite Key. Primary and Composite Keys are indicated using the PRIMARY KEY keyword:

CREATE TABLE customer
(
sid        INTEGER,
last_name  VARCHAR(30),
first_name VARCHAR(30),
PRIMARY KEY (sid)
);

For a composite key, this form is used:
PRIMARY KEY (column_1,column_2,column_3)
Second normal form – the table is in first normal form and, in addition, contains no ‘partial functional dependencies’. A partial functional dependency means that a non-key column depends on some, but not all, of the columns in a composite primary key. This condition is met automatically when the primary key is a single synthetic column.
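To make the idea of a partial dependency concrete, here is a minimal sketch. It assumes SQLite through Python's built-in sqlite3 module rather than the MySQL client used for these notes, and the order_items and products tables are hypothetical, invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Not in 2NF: product_name depends only on product_id, which is
-- just one part of the composite key (order_id, product_id).
CREATE TABLE order_items_flat (
    order_id     INTEGER,
    product_id   INTEGER,
    product_name VARCHAR(30),
    quantity     INTEGER,
    PRIMARY KEY (order_id, product_id)
);
INSERT INTO order_items_flat VALUES (1, 10, 'Widget', 2);
INSERT INTO order_items_flat VALUES (2, 10, 'Widget', 5);

-- 2NF: the name moves to a table keyed by product_id alone.
CREATE TABLE products (
    product_id   INTEGER,
    product_name VARCHAR(30),
    PRIMARY KEY (product_id)
);
CREATE TABLE order_items (
    order_id   INTEGER,
    product_id INTEGER,
    quantity   INTEGER,
    PRIMARY KEY (order_id, product_id)
);
INSERT INTO products VALUES (10, 'Widget');
INSERT INTO order_items VALUES (1, 10, 2);
INSERT INTO order_items VALUES (2, 10, 5);
""")

# The flat table stores the product name once per order line.
flat_copies = conn.execute(
    "SELECT COUNT(*) FROM order_items_flat WHERE product_name = 'Widget'"
).fetchone()[0]

# The split design stores it exactly once.
split_copies = conn.execute(
    "SELECT COUNT(*) FROM products WHERE product_name = 'Widget'"
).fetchone()[0]

# The composite key still rejects a repeated (order_id, product_id) pair.
try:
    conn.execute("INSERT INTO order_items VALUES (1, 10, 9)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(flat_copies, split_copies, duplicate_rejected)
```

In the flat table a product rename would have to touch every order line; after the split it is a single-row change.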

Third normal form – the table is in second normal form and, in addition, contains no ‘transitive dependencies’. A transitive functional dependency is when a non-key column depends on another non-key column rather than directly on the key. This page has a nice example: if we have a table with columns {Project_id, manager_name, manager_address} then manager address is transitively dependent on the key via manager name: change the manager name and we must change the manager address. To address this in third normal form we split the table into two tables, {Project_id, manager_name} and {Manager_name, manager_address}. As the author writes:

In a normalised relation a non-key field must provide a fact about the key, the whole key and nothing but the key.
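The project and manager split described above can be sketched as follows. This is an illustration only, again assuming SQLite via Python's sqlite3 module; the sample rows and the join used to rebuild the combined view are my own additions, not part of the cited example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE projects (
    project_id   INTEGER,
    manager_name VARCHAR(30),
    PRIMARY KEY (project_id)
);
CREATE TABLE managers (
    manager_name    VARCHAR(30),
    manager_address VARCHAR(60),
    PRIMARY KEY (manager_name)
);
INSERT INTO managers VALUES ('Smith', '1 High Street');
INSERT INTO projects VALUES (1, 'Smith');
INSERT INTO projects VALUES (2, 'Smith');
""")

# The address is stored once, so one UPDATE fixes it for every project.
conn.execute(
    "UPDATE managers SET manager_address = '2 Low Road' "
    "WHERE manager_name = 'Smith'"
)

# A lookup between the two tables rebuilds the original three columns.
rows = conn.execute("""
    SELECT p.project_id, p.manager_name, m.manager_address
    FROM   projects p
    JOIN   managers m ON m.manager_name = p.manager_name
    ORDER  BY p.project_id
""").fetchall()
print(rows)
```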

Relationships between tables in a database are indicated like this:

CREATE TABLE orders
(
order_id     INTEGER,
order_date   DATE,
customer_sid INTEGER,
amount       DOUBLE,
PRIMARY KEY (order_id),
FOREIGN KEY (customer_sid) REFERENCES customer(sid)
);
(Example borrowed from here.) PRIMARY KEY and FOREIGN KEY are examples of ‘constraints’: primary key values must be unique, and a foreign key value cannot be used in a table if it does not exist as a primary key in the referenced table. The CONSTRAINT keyword is used to give a name to a constraint (a constraint being one of NOT NULL, UNIQUE, CHECK, PRIMARY KEY, FOREIGN KEY). MySQL versions before 8.0.16 parse CHECK but do not enforce it; from 8.0.16 onwards CHECK constraints are enforced.
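A quick way to see a foreign key constraint at work is to try inserting an order for a customer that does not exist. The sketch below assumes SQLite via Python's sqlite3 module (where foreign key checking has to be switched on per connection); the constraint name fk_orders_customer and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific; off by default
conn.executescript("""
CREATE TABLE customer (
    sid        INTEGER,
    last_name  VARCHAR(30),
    first_name VARCHAR(30),
    PRIMARY KEY (sid)
);
CREATE TABLE orders (
    order_id     INTEGER,
    order_date   DATE,
    customer_sid INTEGER,
    amount       DOUBLE,
    PRIMARY KEY (order_id),
    -- CONSTRAINT gives the foreign key a name (hypothetical name here)
    CONSTRAINT fk_orders_customer
        FOREIGN KEY (customer_sid) REFERENCES customer(sid)
);
INSERT INTO customer VALUES (1, 'Hopkinson', 'Ian');
INSERT INTO orders VALUES (100, '2010-06-21', 1, 9.99);
""")

# Customer 42 does not exist, so this order should be refused.
try:
    conn.execute("INSERT INTO orders VALUES (101, '2010-06-22', 42, 5.00)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(orphan_rejected, order_count)
```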

Keywords: PRIMARY KEY, FOREIGN KEY, REFERENCES, CONSTRAINT

Some notes on SQL: 4 – Advanced select

This is the fourth in a series of blog posts on SQL. The first covered creating a database, the second selecting information from a database, and the third commands to modify the structure and contents of an existing database. This post covers more advanced commands for selecting information from a database and ways of manipulating the results returned. No claim of authority is made for these posts; they are mainly intended as my notes on the topic.

SQL supports CASE statements, similar to those found in a range of programming languages. They are used to write sequences of multiple comparisons more compactly:


UPDATE my_table
SET    new_column = CASE
                      WHEN column1 = somevalue1 THEN newvalue1
                      WHEN column2 = somevalue2 THEN newvalue2
                      ELSE newvalue3
                    END;

The CASE statement can also be used in a SELECT:


SELECT title,
       price,
       CASE
         WHEN price > 20.00 THEN 'Expensive'
         WHEN price BETWEEN 10.00 AND 20.00 THEN 'Moderate'
         WHEN price < 10.00 THEN 'Inexpensive'
         ELSE 'Unknown'
       END AS budget
FROM   titles;

(This second example is adapted from here.)
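To check how a searched CASE (one with full conditions after each WHEN and a label supplied with AS) behaves, here is a small sketch. It assumes SQLite via Python's sqlite3 module, and the book titles and prices are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE titles (title VARCHAR(40), price DECIMAL(6, 2));
INSERT INTO titles VALUES ('Cheap Read', 5.00);
INSERT INTO titles VALUES ('Mid Read', 15.00);
INSERT INTO titles VALUES ('Dear Read', 25.00);
INSERT INTO titles VALUES ('Unpriced', NULL);
""")

# Each WHEN is tested in turn; a NULL price matches none of them
# and so falls through to ELSE.
budget = dict(conn.execute("""
    SELECT title,
           CASE
             WHEN price > 20.00 THEN 'Expensive'
             WHEN price BETWEEN 10.00 AND 20.00 THEN 'Moderate'
             WHEN price < 10.00 THEN 'Inexpensive'
             ELSE 'Unknown'
           END AS budget
    FROM   titles
""").fetchall())
print(budget)
```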

The order in which results are returned from a SELECT statement can be controlled by the ORDER BY keyword with the ASC (ascending) and DESC (descending) modifiers. Results can be ordered by multiple keys. Numbers sort before letters; whether uppercase letters sort before lowercase letters depends on the collation in use (with a binary collation they do, whereas MySQL's default case-insensitive collations treat them as equal).

SELECT title,purchased
FROM   movie_table
ORDER  BY title ASC, purchased DESC;

Ascending order (ASC) is assumed when neither keyword is given.
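Ordering by two keys can be tried out with a few rows. This is a sketch assuming SQLite via Python's sqlite3 module; the films and purchase dates are made up, and the dates are stored as ISO text so that they sort chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movie_table (title VARCHAR(40), purchased DATE);
INSERT INTO movie_table VALUES ('Zorro', '2009-01-05');
INSERT INTO movie_table VALUES ('Alien', '2008-03-01');
INSERT INTO movie_table VALUES ('Alien', '2010-07-15');
""")

# Titles ascending; within the same title, newest purchase first.
rows = conn.execute("""
    SELECT title, purchased
    FROM   movie_table
    ORDER  BY title ASC, purchased DESC
""").fetchall()

# Leaving out ASC gives the same title ordering.
default_titles = [r[0] for r in conn.execute(
    "SELECT title FROM movie_table ORDER BY title")]
print(rows, default_titles)
```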

There are various functions that can be applied to the set of rows returned by a query to produce a single value; these include MIN, MAX, AVG, COUNT and SUM. The functions are used like this:

SELECT SUM(sales)
FROM   cookie_sales
WHERE  first_name = 'Nicole';

This returns the sum of the “sales” values in the rows matched by the WHERE clause. Related is DISTINCT, which is a keyword rather than a function, so the syntax is slightly different:

SELECT DISTINCT sale_date
FROM   cookie_sales
ORDER  BY sale_date;

This returns a set of unique dates in the sale_date column.
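The aggregate functions and DISTINCT can be checked against a tiny table. A sketch, assuming SQLite via Python's sqlite3 module; the sales figures and dates are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cookie_sales (
    first_name VARCHAR(20),
    sales      INTEGER,
    sale_date  DATE
);
INSERT INTO cookie_sales VALUES ('Nicole', 10, '2007-03-06');
INSERT INTO cookie_sales VALUES ('Nicole', 20, '2007-03-07');
INSERT INTO cookie_sales VALUES ('Britney', 5, '2007-03-06');
""")

# One value summarising all of Nicole's rows.
total = conn.execute(
    "SELECT SUM(sales) FROM cookie_sales WHERE first_name = 'Nicole'"
).fetchone()[0]

# Several aggregates can be requested in the same query.
stats = conn.execute("""
    SELECT MIN(sales), MAX(sales), AVG(sales), COUNT(*)
    FROM   cookie_sales
    WHERE  first_name = 'Nicole'
""").fetchone()

# DISTINCT collapses the repeated date into one row.
dates = [r[0] for r in conn.execute(
    "SELECT DISTINCT sale_date FROM cookie_sales ORDER BY sale_date")]
print(total, stats, dates)
```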

The GROUP BY keyword is used with aggregate functions such as SUM, which combine values from multiple rows into a single output, so that one result is produced per group. It can also reduce a list to its distinct elements (in these circumstances it gives the same result as the DISTINCT keyword, but execution may be faster). The format for GROUP BY is shown by example below:


SELECT first_name, SUM(sales)
FROM   cookie_sales
GROUP  BY first_name;

This will return the sum of the “sales” for each person identified by “first_name”.

A final keyword used to control the output of a SELECT statement is LIMIT, which can take one or two parameters; the behaviour of the two forms is quite different. One parameter form:

SELECT * FROM your_table LIMIT  5;

This returns the first five results from a SELECT. Two parameter form:

SELECT * FROM your_table LIMIT  5, 5;

This returns results 6,7,8,9 and 10 from the SELECT. The first parameter is the index of the first result to return (starting at 0 for the first position) and the second parameter is the number of results to return.
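Both GROUP BY and the two LIMIT forms can be tried on small tables. The sketch below assumes SQLite via Python's sqlite3 module, which accepts the same comma form of LIMIT; the single column n in your_table and the sales rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cookie_sales (first_name VARCHAR(20), sales INTEGER);
INSERT INTO cookie_sales VALUES ('Nicole', 10);
INSERT INTO cookie_sales VALUES ('Nicole', 20);
INSERT INTO cookie_sales VALUES ('Britney', 5);
CREATE TABLE your_table (n INTEGER);
""")
conn.executemany("INSERT INTO your_table VALUES (?)",
                 [(i,) for i in range(1, 13)])

# One SUM per distinct first_name.
totals = conn.execute("""
    SELECT first_name, SUM(sales)
    FROM   cookie_sales
    GROUP  BY first_name
    ORDER  BY first_name
""").fetchall()

# One parameter: the first five rows.
first_five = [r[0] for r in conn.execute(
    "SELECT n FROM your_table ORDER BY n LIMIT 5")]

# Two parameters: skip five rows, then return the next five.
next_five = [r[0] for r in conn.execute(
    "SELECT n FROM your_table ORDER BY n LIMIT 5, 5")]
print(totals, first_five, next_five)
```

The ORDER BY clauses are there only to make the row order predictable; without one, the rows that LIMIT picks are not guaranteed.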


Keywords: CASE, WHEN, THEN, ELSE, ORDER BY, ASC, DESC, DISTINCT, MIN, MAX, AVG, COUNT, SUM, GROUP BY, LIMIT

Some notes on SQL: 3 – changing a table

This is the third in a series of blog posts on SQL. The first covered creating a database and the second selecting information from a database. This post covers commands to modify the structure and contents of an existing database. No claim of authority is made for these posts; they are mainly intended as my notes on the topic.

UPDATE and DELETE allow the rows in a table to be either updated or deleted according to a select-like WHERE clause. This is UPDATE, acting on multiple columns:

UPDATE your_table
SET    first_column = 'newvalue',
       second_column = 'another_value'
WHERE  some_column = 'a test';

And this is the DELETE command:

DELETE FROM your_table
WHERE  some_column = 'a test';
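The effect of the WHERE clause on UPDATE and DELETE can be seen by counting affected rows. A sketch, assuming SQLite via Python's sqlite3 module; the two sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE your_table (
    first_column  VARCHAR(20),
    second_column VARCHAR(20),
    some_column   VARCHAR(20)
);
INSERT INTO your_table VALUES ('a', 'b', 'a test');
INSERT INTO your_table VALUES ('c', 'd', 'other');
""")

# Only rows matching the WHERE clause are changed.
updated = conn.execute("""
    UPDATE your_table
    SET    first_column = 'newvalue',
           second_column = 'another_value'
    WHERE  some_column = 'a test'
""").rowcount

# Likewise only matching rows are deleted; with no WHERE clause
# every row in the table would be affected.
deleted = conn.execute(
    "DELETE FROM your_table WHERE some_column = 'other'"
).rowcount

remaining = conn.execute("SELECT * FROM your_table").fetchall()
print(updated, deleted, remaining)
```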

In combination with the ALTER keyword, the following operations can be performed:
The CHANGE keyword allows the name and data type of an existing column to be changed.

ALTER TABLE project_table 
CHANGE COLUMN a_silly_column_name a_better_column_name VARCHAR(100), 
CHANGE COLUMN another_poorly_named_column a_better_name VARCHAR(30);

It’s necessary to be cautious here because data loss can occur depending on the source and destination types; for example, going from VARCHAR(100) to VARCHAR(30) could truncate values by up to 70 characters.

The MODIFY keyword allows the data type or position of an existing column to be changed.

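ALTER TABLE my_table
MODIFY COLUMN target_column VARCHAR(120),
-- MODIFY needs the full column definition even when only moving a column;
-- VARCHAR(30) is assumed here to be another_column's existing type.
MODIFY COLUMN another_column VARCHAR(30) AFTER target_column;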

The ADD keyword allows new columns to be added to a table:

ALTER TABLE my_table 
ADD COLUMN new_column INT NOT NULL AUTO_INCREMENT FIRST,
ADD PRIMARY KEY (new_column); -- an AUTO_INCREMENT column must be a key in MySQL

In MySQL the ADD, CHANGE and MODIFY keywords take the position specifiers FIRST and AFTER; AFTER requires a second column name, as indicated in the MODIFY example. If no position is given, a new column is added at the end of the table.

In addition RENAME TO allows the table to be renamed:

ALTER TABLE poor_name RENAME TO good_name;

And DROP deletes a column:

ALTER TABLE my_table DROP COLUMN unwanted_column; 

Obviously you should use DROP COLUMN cautiously!
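CHANGE and MODIFY are MySQL syntax, but the ADD COLUMN and RENAME TO forms can be tried elsewhere too. The sketch below assumes SQLite via Python's sqlite3 module and covers only those two operations; the starting columns id and title are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE poor_name (id INTEGER, title VARCHAR(30))")

# Add a column (SQLite always appends it; it has no FIRST / AFTER).
conn.execute("ALTER TABLE poor_name ADD COLUMN new_column INT")

# Rename the table.
conn.execute("ALTER TABLE poor_name RENAME TO good_name")

# PRAGMA table_info is SQLite's rough equivalent of MySQL's DESC;
# the column name is the second field of each row it returns.
columns = [row[1] for row in conn.execute("PRAGMA table_info(good_name)")]
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
print(columns, tables)
```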

Keywords: ALTER, UPDATE, SET, DELETE, CHANGE, MODIFY, ADD, FIRST, AFTER, RENAME TO, DROP

Some notes on SQL: 2 – Basic SELECT

Part 1 of this sequence of blog posts provided a preamble and showed how to create databases. This post introduces the basic SELECT command, which shows you what lies within your database and, as its name implies, allows you to select only parts of the data contained within.

The basic form of SELECT is:

SELECT * FROM my_contacts
WHERE first_name = 'Anne';

* indicates that all fields should be returned from the table ‘my_contacts’, where the first_name field = ‘Anne’. We don’t have to take all the fields from a table:

SELECT first_name, last_name, email FROM my_contacts
WHERE first_name = 'Anne';

As well as the equality operator =, we can also use the comparison operators <> (not equal), <, >, <= and >=. These work not only with numerical values but also with text values. WHERE clauses can also be combined with AND and OR operators.

SELECT drink_name FROM drink_info
WHERE calories >= 30
AND calories <= 60;


SELECT drink_name FROM drink_info
WHERE calories BETWEEN 30 AND 60;

The second select using the BETWEEN keyword is equivalent to the first.
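The equivalence of the two forms, including the fact that BETWEEN includes both end values, can be checked directly. A sketch, assuming SQLite via Python's sqlite3 module; the drinks and calorie counts are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE drink_info (drink_name VARCHAR(20), calories INTEGER);
INSERT INTO drink_info VALUES ('Blackthorn', 33);
INSERT INTO drink_info VALUES ('Oh My Gosh', 24);
INSERT INTO drink_info VALUES ('Lime Fizz', 60);
INSERT INTO drink_info VALUES ('Hot Gold', 135);
""")

with_and = sorted(r[0] for r in conn.execute("""
    SELECT drink_name FROM drink_info
    WHERE calories >= 30
    AND calories <= 60
"""))

with_between = sorted(r[0] for r in conn.execute("""
    SELECT drink_name FROM drink_info
    WHERE calories BETWEEN 30 AND 60
"""))
print(with_and, with_between)
```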
In addition there are wildcards, % meaning ‘any number of characters’ and _ meaning ‘one character’ which are accessed via the LIKE keyword:

SELECT first_name FROM my_contacts
WHERE first_name LIKE '%im';

This first search will return ‘Tim’, ‘Slim’, and ‘Ephraim’.

SELECT first_name FROM my_contacts
WHERE first_name LIKE '_im';

This second search will only return ‘Tim’.

NULL is special: nothing equals NULL, but you can check whether something is NULL:

SELECT first_name FROM my_contacts WHERE flag IS NULL;
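The wildcard and NULL behaviour can be confirmed on a handful of rows. A sketch, assuming SQLite via Python's sqlite3 module; the contacts and flag values are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE my_contacts (first_name VARCHAR(20), flag VARCHAR(5));
INSERT INTO my_contacts VALUES ('Tim', NULL);
INSERT INTO my_contacts VALUES ('Slim', 'x');
INSERT INTO my_contacts VALUES ('Ephraim', 'x');
INSERT INTO my_contacts VALUES ('Anne', NULL);
""")

def names(where):
    """Return the sorted first names matching a WHERE condition."""
    sql = "SELECT first_name FROM my_contacts WHERE " + where
    return sorted(r[0] for r in conn.execute(sql))

any_prefix = names("first_name LIKE '%im'")   # any number of characters
one_char = names("first_name LIKE '_im'")     # exactly one character
equals_null = names("flag = NULL")            # never true, even for NULLs
is_null = names("flag IS NULL")
print(any_prefix, one_char, equals_null, is_null)
```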

Comparisons can be made to a list with the IN keyword:

SELECT drink_name FROM drink_info
WHERE rating IN ('good', 'excellent', 'average');

Finally, the NOT operator can be used to find the inverse of the selection made. With IN, the NOT keyword goes directly before IN (i.e. after the column name); otherwise it goes directly after WHERE:

SELECT drink_name FROM drink_info
WHERE rating NOT IN ('good', 'excellent', 'average');


SELECT first_name FROM my_contacts
WHERE NOT first_name LIKE '_im';
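IN and its inverse can be compared side by side. A sketch, assuming SQLite via Python's sqlite3 module; the drinks and ratings are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE drink_info (drink_name VARCHAR(20), rating VARCHAR(10));
INSERT INTO drink_info VALUES ('Blackthorn', 'good');
INSERT INTO drink_info VALUES ('Oh My Gosh', 'poor');
INSERT INTO drink_info VALUES ('Lime Fizz', 'excellent');
""")

def drinks(where):
    """Return the sorted drink names matching a WHERE condition."""
    sql = "SELECT drink_name FROM drink_info WHERE " + where
    return sorted(r[0] for r in conn.execute(sql))

in_list = drinks("rating IN ('good', 'excellent', 'average')")
not_in_list = drinks("rating NOT IN ('good', 'excellent', 'average')")

# Putting NOT straight after WHERE negates the whole condition instead.
not_prefix = drinks("NOT rating IN ('good', 'excellent', 'average')")
print(in_list, not_in_list, not_prefix)
```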

Keywords: AND, OR, BETWEEN, IS NULL, NOT, LIKE, IN

Some notes on SQL: 1 – creation

These are some notes on SQL, a language for creating and querying databases. I’m learning it because a couple of programming projects I have in mind for work and home will need it. The source for these notes is the Head First SQL book; I’ve used a previous book in this series and I quite like the presentational style. I’m using MySQL via its command line client to do the exercises, because it’s about as straightforward as you can get. The code is formatted for display using this SQL to HTML formatter. Notepad++ recognises SQL as a language and will do syntax highlighting, so I type my commands into it and copy them into the MySQL command line client.

SQL is quite an old language and the convention is to write keywords in block capitals (reminds me of FORTRAN!). Command sequences are terminated by a semi-colon.

To start, this sequence creates a database, sets it as active and then adds a table containing a range of fields of different types. The command DESC shows the layout of a table:


CREATE DATABASE my_database;
USE my_database;
CREATE TABLE contacts
(
contact_id  INT NOT NULL,
first_name  VARCHAR(20),
last_name   VARCHAR(20),
birthday    DATE,
life_story  BLOB,
weight      DEC(5, 2) NOT NULL DEFAULT 80.00,
state_code  CHAR(2),
appointment DATETIME
);

DESC contacts;



The NOT NULL keywords are used for a field that must always hold a value, so it must be specified on INSERT unless the column has a DEFAULT. Once the table is created, data can be added to it using the INSERT command:

INSERT INTO contacts (contact_id,first_name,last_name,birthday,life_story,weight,
appointment)
VALUES
(1,'Ian','Hopkinson','1970-04-24','A very long text string',80.0,
'2010-06-21 10:30:00'
);

INSERT can be used with no specified fields (in which case values must be provided for all fields, in the order the columns were defined), or with a subset of fields. To include the ' character in a string we can use either '' or \' (where '' is two single quote characters, rather than a double quote).
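The different INSERT forms and the doubled-quote escape can be tried as follows. This is a sketch assuming SQLite via Python's sqlite3 module, with a cut-down three-column version of the contacts table and made-up names; note that SQLite accepts the doubled quote but not the backslash form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE contacts (
        contact_id INT NOT NULL,
        first_name VARCHAR(20),
        last_name  VARCHAR(20)
    )
""")

# No field list: a value is needed for every column, in table order.
conn.execute("INSERT INTO contacts VALUES (1, 'Ian', 'Hopkinson')")

# Subset of fields; the doubled '' stores a single quote character.
conn.execute(
    "INSERT INTO contacts (contact_id, last_name) VALUES (2, 'O''Brien')")

# From a program, placeholders avoid hand-escaping altogether.
conn.execute(
    "INSERT INTO contacts (contact_id, last_name) VALUES (?, ?)",
    (3, "D'Arcy"))

# Leaving out the NOT NULL column (which has no default) is refused.
try:
    conn.execute("INSERT INTO contacts (first_name) VALUES ('Nobody')")
    missing_id_rejected = False
except sqlite3.IntegrityError:
    missing_id_rejected = True

rows = conn.execute("""
    SELECT contact_id, first_name, last_name
    FROM   contacts
    ORDER  BY contact_id
""").fetchall()
print(rows, missing_id_rejected)
```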

To delete a table:

DROP TABLE contacts;

This command should be used with care since it deletes the table whether or not it contains data. The next post should be on the SELECT command. 

Of course you can find SQL cheatsheets elsewhere.

Keywords: CREATE, DATABASE, TABLE, USE, DESC, DROP, INSERT INTO, VALUES, INT, VARCHAR, BLOB, CHAR, DATETIME, DATE, DEC