Tag Archive: computing

Mar 04 2011

Far, far away

This week I have journeyed into the heart of darkness.

Actually it was my company’s IT outsourcing system. I work for a very big company: it has about 150,000 employees spread across the world. I work in north west England amongst other things I look after a little unit which uses a particular piece of bespoke software, the unit involves seven people in an office a couple of hundred metres from where I sit at work. The tale of our new bespoke software is long and tortuous and I won’t go into it here but to relate my adventures in getting the test version of the software copied onto the live system today.

The servers on which this software resides are located in North Wales (15 miles away) and a spot down the road about 8 miles away. The outsourcing of our IT services means that the manager for this process is located in the Netherlands, and the person actually doing the process, Supriya, is in India. I can tell she is in India because she has an Indian phone number. Her e-mail signature says her “office base” is in North Wales, it must be a bit inconvenient having your “office base” in North Wales, a location I suspect Supriya has never visited, and a phone in India. Do my company think I am some sort of dribbling BNP little Englander who would dissolve in rage if I thought I was dealing with someone in India? I regularly work with people from China, France and even the US, trying to obfuscate where someone works is frankly patronising and offensive – particularly if you do it so ineptly.

I’ve spoken to Supriya before – she’s a friendly and helpful lass but she doesn’t half ask some odd questions: “Could I confirm that Ireland was not going to be impacted by the change I had requested?”. “Had I notified NL service mfgpro(users)?” Just to be clear: I have no idea how Ireland might be affected or who the “NL service mfgpro(users)” are, these aren’t recognised code words for me. I clearly provided the right answer in these cases because I was informed that both Ireland and the Benelux countries had given their approval. But the fear arises in my mind: I’ve not cleared things with the Austro-Hungarian Empire – could I have inadvertently started World War III? This is yet to be determined.

The process doesn’t go entirely smoothly, largely because Supriya is too polite to tell me that the procedure she’d been asked to carry out throws up some errors. I can’t help because I’m not given permissions to see the servers where the software resides, Supriya has a difficult time because she has no absolutely idea what the software does. However, with the help of  James, who wrote the software, based in Manchester but whose boss is in Sunderland we do manage to get everything sorted out by the end of the day (or about 10pm in Supriya’s time zone).

This is not an isolated incident: receipts for my travel claims are sent to Iron Mountain (a company just outside Birmingham) where they are converted to electronic form before being sent to Manila (I can’t help thinking this may have been due to a misunderstanding involving envelopes) and paid via India. In a fit of tidiness I once decided to get a stash of 6 computers removed from a desk in my office: they’d been left by a sequence of unnamed, and now forgotten contractors. I received endless fractious e-mails from a centre in Bulgaria, belonging to the leasing company, demanding to know who all these computers belonged to, or why I appeared to be in possession of 6 computers.

The old way of doing things involved a prescriptive system of doing stuff where you filled in a form and it went through a process and something got done. But actually it didn’t, actually you learnt who was going to do what you wanted, went over for a little chat whereby you found out what incantation you needed to inject into the system in order to get your job squared with the system whilst they got on and did the job. Outsourcing frequently loses this human contact, in fact it purposefully eliminates it.

Aug 02 2010

A set of blog posts on SQL

This is a roundup post for a rather specialist set of posts I wrote on SQL (Structured Query Language), a computer language for creating and querying databases. Basically the posts are my notes on the language which I’m learning because a couple of programming projects I have in mind will need it. The main source for these notes is the Head First SQL book. I’ve used a another book in this series (Head First Design Patterns) – I quite like the presentational style. The code in the posts is formatted for display using this SQL to HTML formatter.

Topics covered:
Some notes on SQL: 1 – Creation
Some notes on SQL: 2 – Basic SELECT
Some notes on SQL: 3 – Changing a table
Some notes on SQL: 4 – Advanced SELECT
Some notes on SQL: 5 – Database design
Some notes on SQL: 6 – Multi-table operations
Some notes on SQL: 7 – Subqueries and views

Of course you can find SQL cheatsheets elsewhere.

The Head First SQL book also has material on transactions and security, if I get a renewed bout of enthusiasm I will add a post on these items.

I used MySQL via its command line client to do the exercises, because it’s about as straightforward as you can get. Notepad++ recognises SQL as a language and will do syntax highlighting, so I type my commands into it and copy them into the MySQL command line client. MySQL is pretty straightforward to install. I also installed Microsoft SQL Server Express 2008, which turned out to be a bit more of a struggle but on the plus side integration the C# .NET, which is what I normally program in, looks better than for MySQL.

I’ve been using with the SQL Server via SQL Management Studio (a graphical interface to databases) on the general election data compiled by The Guardian. First contact with actual data, as opposed to learning exercises has proved interesting! A lot of things that are fiddly to do in a spreadsheet are straightforward using  SQL.

SQL was designed in the early 1970’s, with commercial implementations appearing towards the end of the decade. It’s influence visible is visible in more modern languages, such as the LINQ extensions to C# (this influence is pretty explicitly stated). Some of the ideas of database design (normalisation) seem relevant to object-oriented programming.

It’s been an interesting learning experience, my scientific background in programming has me stuffing pretty much any sort of data into an array in the first instance. SQL and a database look like a much better solution for many situations.

Aug 02 2010

Some notes on SQL: 7 – Subqueries, combining selections and views

This is the seventh, and final, in a series of blog posts on SQL, the first covered creating a database, the second selecting information from a database, the third commands to modify the structure and contents of an existing database, the fourth, advanced selection. The fifth post covered database design. The sixth post covered multi-table database operations. This post covers subqueries and views. No claim of authority is made for these posts, they are mainly intended as my notes on the topic.
This is largely a wrapping up blog post to cover a couple of items.As an alternative to the joins described in a previous post, “subqueries” can often be used. A subquery is essentially an entire query embedded in another query. Subqueries can be used with UPDATE, INSERT and DELETE statements, whilst joins cannot. However, joins can be used to bring columns from multiple tables. There are no special keywords involved in creating a subquery. There is some discussion on the pros and cons of subqueries and joins here on Stackoverflow.

In an uncorrelated subquery the so-called inner query can be evaluated with no knowledge of the contents of the outer query. This expression contains an uncorrelated subquery:

UPDATE raw_candidate_data
SET    gender = ‘f’
WHERE  firstname IN (SELECT firstname
FROM   firstname_gender_lookup
WHERE  gender = ‘f’); 

The inner query is the clause in brackets, in this instance it is a shorthand way of building a list for an “IN” comparison. Often an inner query returns a single value, i.e. for an average or maximum in a list.

This expression contains a correlated subquery:

UPDATE raw_candidate_data
SET    party_id = (SELECT party_id
FROM   parties
WHERE  raw_candidate_data.party = parties.party_name); 

The inner query requires information from the outer query to work, this expression acts as a look up.

Complex queries can be given aliases using the VIEW keyword, for example:

CREATE VIEW web_designer AS SELECT mc.first_name, mc.last_name, mc.phone,
mc.email FROM my_contacts mc NATURAL JOIN job_desired jd WHERE jd.title=‘Web Designer’; 

Can subsequently be used by:

SELECT * FROM   web_designers; 

The view web_designers can be treated just as any other table.

The results of multiple select commands can be combined using the keywords: UNION, UNION ALL, INTERSECT and EXCEPT. UNION returns the distinct union of the results of all the selects, i.e. with no repeats, UNION ALL includes the repeats. INTERSECT returns items that are common to both SELECTs and EXCEPT returns those items that are returned by the first SELECT but not the second. The general form for these combination operations is as follows:

SELECT title FROM   job_current
SELECT title FROM   job_desired
SELECT title FROM   job_listings
ORDER  BY title; 

Each SELECT statement must return the same number of columns and each column must be coercible to the same datatype. Only a single ORDER BY statement for the set of SELECTs can be used.


Jul 25 2010

Some notes on SQL: 6 – Multi-table operations

This is the sixth in a series of blog posts on SQL, the first covered creating a database, the second selecting information from a database, the third commands to modify the structure and contents of an existing database, the fourth, advanced selection. The fifth post covered database design. This post covers multi-table database operations. No claim of authority is made for these posts, they are mainly intended as my notes on the topic.

Good database design leads us to separate information into separate tables, the information we require from a SELECT statement may reside in multiple tables. There are keywords and methods in SQL to help with extracting data from multiple tables. To assist with clarity aliases, indicated using the AS keyword, allow tables to be given shorter, or clearer, names temporarily. Various JOIN keywords enable lookups between tables, as with other aspects of SQL there are multiple ways of achieving the same results – in this case ‘subqueries’.

The AS keyword can be used to populate a new table with the results of a SELECT statement, or it can be used to alias a table name. In it’s aliasing guise it can be dropped, in shorthand. This is AS being used in table creation:
CREATE TABLE profession
profession VARCHAR(20)
) AS
SELECT profession
FROM   my_contacts
GROUP  BY profession
ORDER  BY profession;

The following two forms are equivalent, the first uses the AS to alias, the second uses an implicit alias:
SELECT profession AS mc_prof
FROM   my_contacts AS mc
GROUP  BY mc_prof
ORDER  BY mc_prof;

SELECT profession mc_prof
FROM   my_contacts mc
GROUP  BY mc_prof
ORDER  BY mc_prof;

The following examples use two tables boys which is a three column table {boy_id, boy, toy_id} and toys a two column table {toy_id, toy}.

boy_id boy toy_id
1 Davey 3
2 Bobby 5
3 Beaver 2
4 Richie 1
toy_id toy
1 Hula hoop
2 balsa glider
3 Toy soldiers
4 Harmonica
5 Baseball cards

Cross, cartesian, comma join are all names for the same, relatively little used operation which returns every row from one table crossed with every row from a second table, that’s to say two 6 row tables will produce a result with 36 rows. Although see here for an application.
SELECT t.toy,
FROM   toys AS t

Notice the use of the period and aliases to reference columns, this query will produce a 20 row table.

Inner join combines the rows from two tables using comparison operators in a condition, an equijoin returns rows which are the same, a non-equijoin returns rows that are different. These are carried out with the same keywords, the condition is different. This is an equijoin:
SELECT boys.boy,
FROM   boys
ON boys.toy_id = toys.toy_id;

The ON and WHERE keywords can be used interchangeable; in this instance we do not use aliases furthermore since the columns in the two tables (toys and boys) have the same name we could use a natural join:
SELECT boys.boy,
FROM   boys

Natural join is a straightforward lookup operation, a key from one table is used to extract a matching row from a second table, where the key column has the same name in each table. Both of these versions produce the following table:

boy toy
Richie hula hoop
Beaver balsa glider
Davey toy soldiers
Bobby harmonica

A non-equijoin looks like this:
SELECT boys.boy,
FROM   boys
ON boys.toy_id<>toys.toy_id
ORDER  BY boys.boy;

the resultant in this instance is four rows for each boy containing the four toys he does not have.

Outer joins are quite similar to inner joins, with the exception that they can return rows when no match is found, inserting a null value. The following query
SELECT b.boy,
FROM   boys b
ON b.toy_id = t.toy_id;

produces this result

Boy toy
Richie Hula hoop
Beaver balsa glider
Davey Toy soldiers
NULL Harmonica
Bobby Baseball cards

That’s to say each row of the toys table is taken and matched to the boys table, where there is no match (for toy_id=4, harmonica) a null value is inserted in the boy column. Both LEFT OUTER JOIN and RIGHT OUTER JOIN are available but the same effect can be achieved by swapping the order in which tables are used in the query.

In some instances a table contains a self-referencing foreign key which is the primary key of the table. An example might be a three column table, clown_info, of “clowns” {id, name, boss_id} where each id refers to a clown name and the bosses identified by boss_id are simply other clowns in the same table. To resolve this type of key a self-join is required this uses two aliases of the same table.
SELECT c1.name,
c2.name AS boss
FROM   clown_info c1
INNER JOIN clown_info c2
ON c1.boss_id = c2.id; 

Notice both c1 and c2 alias to clown_info.


Jul 20 2010

Some notes on SQL: 5 – database design

This is the fifth in a series of blog posts on SQL, the first covered creating a database, the second selecting information from a database, the third commands to modify the structure and contents of an existing database, the fourth on advanced selection. This post covers database design, as such it is a little lighter on the code examples. No claim of authority is made for these posts, they are mainly intended as my notes on the topic. These notes are based largely on Head First SQL.

The goal of database design is to produce a database which is straightforward and efficient to search. This is done by splitting data into a set of tables, with lookups between those tables used to build the desired output results.

Efficient database design is normally discussed with reference to “normal forms“, the goal being to reach the highest order normal form. In practice, pragmatism is applied which means it may be sensible to hold back a little on this.

First normal form – each row of data must contain atomic values, and each row of data must have a unique identifier, known as a Primary Key. “Atomic values” are essentially the smallest pieces into which data can be sensibly divided, this may depend on application. So, for example, in some cases a house address may be kept as a single text field whilst in others it might be divided into Number, Street, Town etc. Furthermore to be “atomic” data should not be repeated (i.e. a table containing interests should not contain columns “interest_1”, “interest_2″…The Primary Key may be a single column of ‘synthetic’ numbers (i.e. they don’t have any other purpose), or it may be a pre-existing column in the table, or it may be a combination of columns which case it is called a Composite Key. Primary and Composite Keys are indicated using the PRIMARY KEY keyword :

sid        INTEGER,
last_name  VARCHAR(30),
first_name VARCHAR(30),

For a composite key, this form is used:
PRIMARY KEY (column_1,column_2,column_3)
Second normal form the table is in first normal form, and in addition contains no ‘partial functional dependencies’, this happens naturally with synthetic primary keys. Partial functional dependency means that a non-key column is dependent on some but not all of the columns in a composite primary key.

Third normal form the table is in second normal form, and in addition contains no ‘transitive dependencies’. Transitive functional dependency is when any non-key column is related to any of the other non-key columns. This page has a nice example, if we have a table with columns: {Project_id, manager_name, manager_address} then manager address and manager name are transitively dependent: change manager name and we change manager address. To address this in third normal form we split the table into two tables {Project_id, manager name} and {Manager_name, manager_address}. As the author writes:

In a normalised relation a non-key field must provide a fact about the key, the whole key and nothing but the key.

Relationships between tables in a database are indicated like this:

order_id     INTEGER,
order_date   DATE,
customer_sid INTEGER,
amount       DOUBLE,
PRIMARY KEY (order_id),
FOREIGN KEY (customer_sid) REFERENCES customer(sid)
(Example borrowed from here). PRIMARY KEY and FOREIGN KEY are examples of ‘constraints’, primary keys must be unique and a foreign key value cannot be used in a table if it does not exist as a primary key in the referenced table. The CONSTRAINT keyword is used to give a name to a constraint (a constraint being one of NOT NULL, UNIQUE, CHECK, Primary Key, Foreign Key). CHECK is not supported in MySQL.


Older posts «