Selecting Data – Part 3

The SELECT reserved word is part of the Data Manipulation Language (DML) defined by Codd and is used to query data from the database. Because of its popularity, there are many clauses and functions that can be used to construct a query to retrieve data.

Today, I am continuing our exploration by reviewing SELECT queries that aggregate data. Data aggregation is the process of converting many records into a few records with special meaning. Many of the functions used in aggregation have statistical meaning such as minimum value, maximum value, etc.

I will be using the Adventure Works Sample database supplied by Microsoft during this talk.

We will start by looking at the the sales people and their year-to-date sales numbers. The following query joins the SalesPerson and Person tables to retrieve our results.

One nice feature of Microsoft Access is the FIRST and LAST functions are part of the aggregation set. We can recreate these features by using the ORDER BY and TOP clauses. The queries below return the first and last sales person by total sales.

The power of aggregation is achieved by using the GROUP BY and HAVING clauses. Basically, GROUP BY takes all values that are the same for a given column list and returns one unique row. The HAVING clause is a secondary filter like the WHERE clause that can specify some aggregate condition needs to be meet.

The query below returns the number of sales people by country. The COUNT function returns a integer value.

We can rewite this query with a HAVING clause to return all countries that have multiple sales people.

Common day statistics such as MIN, MAX and AVG can be calculated by selecting the correct function. The example below shows the minimum, maximum, and average bonus for a sales person in each country.

The SUM function can be used to totaled up all values in a particular column. The example below looks at the average and total sales by country regardless of person.

Last but not least, advance statistics such as standard deviation using the STDEV function or variance using the VAR fucntion can be calculated. The example below goes to the Sales Order Header level and returns statistics by country for the
2008 year.

In summary, aggregation is a great way to summarize data for organizational decision makers.
The following functions are part of TSQL aggregation with the except of FIRST and LAST which can be
emulated.

  • AVG – average of values in a column.
  • COUNT – counts how many rows.
  • FIRST – returns the value of the first record in a field.
  • LAST – returns the value of the last record in a field.
  • MAX – maximum value in a column.
  • MIN – minimum value in a column.
  • STDEV – sample standard deviation of the values in a column.
  • STDEVP – standard deviation of the values in a column.
  • SUM – adds the values in a column.
  • VAR – sample variance of the values in a column.
  • VARP – variance of the values in a column.

There are some exciting new aggregation features in TSQL 2008 that I will talk in the future about.
Knowing how to summarize data is key to using the information that is stored in the database.

Related posts

3 Thoughts to “Selecting Data – Part 3”

  1. Much appreciated for the information and share!

  2. Many thanks for your review! Frankly speaking I have never come across anything that cool.

  3. Hey! Someone in my Myspace group shared this site with us so I came to take a look. I’m definitely loving the information. I’m bookmarking and will be tweeting this to my followers! Outstanding blog and outstanding design and style.

Leave a Comment