Summarizing Data – Part 2

The SELECT reserved word is part of the Data Manipulation Language (DML) defined by Codd and is used to query data from the database. Because of its popularity, there are many clauses and functions that can be used to construct a query to retrieve data.

I am continuing our exploration by reviewing SELECT queries that aggregate data. Data aggregation is the process of converting many records into a few records with special meaning. I will be using the Adventure Works 2012 sample database supplied by Microsoft during this talk.

Today, I am going to be introducing the CUBE operator. Let us start off by getting the business requirements for the query from the Sales & Marketing manager.

(S)he wants to see the Adventure Works production inventory grouped by main category, subcategory and item color. The basic statistics for reporting should be total items in stock, average list price per item and average cost per item.

My solution to this business problem is to use the CREATE VIEW statement to make a view named ‘vw_Inventory_Cube1’ that represents a three-dimensional cube. The three columns act as coordinates that locate a tile on the cube. Each tile holds the basic statistics for that grouping.

Normally, the CUBE operator will return a NULL value in a column when it is summarizing across that dimension. I am going to use the GROUPING() function to detect these summary rows and show the word ‘All’ instead. When there is an actual NULL value in the data, I am going to convert this value to the word ‘Unknown’. I am going to use the CASE expression to implement this IF-THEN-ELSE logic.

The snippet below creates the required view.
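A minimal sketch of such a view follows. The table and column names come from the Production schema of the sample database. The [dbo] schema for the view, the table aliases, the output column names for the three statistics and the choice to average over inventory rows are my assumptions for illustration.

```sql
USE AdventureWorks2012;
GO

-- Sketch only: the dbo schema, the aliases and the statistic column names are assumptions.
CREATE VIEW dbo.vw_Inventory_Cube1
AS
SELECT
    -- GROUPING() returns 1 when the NULL marks a summary row produced by CUBE
    CASE
        WHEN GROUPING(pc.Name) = 1 THEN N'All'
        WHEN pc.Name IS NULL THEN N'Unknown'
        ELSE pc.Name
    END AS MainCategory,
    CASE
        WHEN GROUPING(ps.Name) = 1 THEN N'All'
        WHEN ps.Name IS NULL THEN N'Unknown'
        ELSE ps.Name
    END AS SubCategory,
    CASE
        WHEN GROUPING(p.Color) = 1 THEN N'All'
        WHEN p.Color IS NULL THEN N'Unknown'
        ELSE p.Color
    END AS ItemColor,
    SUM(inv.Quantity)   AS TotalItems,    -- total items in stock
    AVG(p.ListPrice)    AS AvgListPrice,  -- averaged over inventory rows (assumption)
    AVG(p.StandardCost) AS AvgCost        -- averaged over inventory rows (assumption)
FROM Production.ProductInventory AS inv
JOIN Production.Product AS p
    ON p.ProductID = inv.ProductID
JOIN Production.ProductSubcategory AS ps
    ON ps.ProductSubcategoryID = p.ProductSubcategoryID
JOIN Production.ProductCategory AS pc
    ON pc.ProductCategoryID = ps.ProductCategoryID
-- Drop items that cannot be categorized (the inner joins already imply this)
WHERE p.ProductSubcategoryID IS NOT NULL
GROUP BY CUBE (pc.Name, ps.Name, p.Color);
GO
```

The order of the WHEN branches matters: the GROUPING() test must come first, otherwise a summary row would be labeled ‘Unknown’ instead of ‘All’.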

We need to join four tables together to extract the required information from the database.

The [ProductInventory] table contains inventory levels and the [Product] table has the characteristics of each item. Both the [ProductCategory] and [ProductSubcategory] tables are used to classify the products into various descriptive buckets.

I am using the WHERE clause to drop all the items that cannot be categorized. Also, we are grouping by [MainCategory], [SubCategory] and [ItemColor]. The expressions used in the aggregation introduce no new functions.

Now that we have our view in place, we need to run two queries to retrieve the data that the Sales & Marketing manager wanted.
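The two queries might look something like the sketch below. The filter values are hypothetical examples of what the manager could ask for, and the [dbo] schema is an assumption. Only the three coordinate columns named in this article are referenced by name.

```sql
-- Query 1 (sketch): return every tile of the cube, sorted by its coordinates
SELECT *
FROM dbo.vw_Inventory_Cube1
ORDER BY MainCategory, SubCategory, ItemColor;

-- Query 2 (sketch): pick one tile, here the summary for the Bikes category
-- across all subcategories and all colors
SELECT *
FROM dbo.vw_Inventory_Cube1
WHERE MainCategory = N'Bikes'
  AND SubCategory = N'All'
  AND ItemColor = N'All';
```

Because the view has already translated the summary NULLs into the word ‘All’, finding a tile is a simple equality test on each coordinate.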

Please note that ANSI SQL is a living work of art. The older GROUP BY ... WITH CUBE syntax is not ISO compliant, so it might not be available in future versions of SQL Server. The ISO-compliant syntax combines both clauses into one by placing the operator inside the GROUP BY clause, as in GROUP BY CUBE (...).
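To make the difference concrete, here is a small sketch that writes the same summary both ways against the [Product] and [ProductInventory] tables. The choice of the [Color] and [Class] columns and the alias names are just for illustration.

```sql
-- Older, non-ISO form: the operator trails the column list
SELECT p.Color, p.Class, SUM(inv.Quantity) AS TotalItems
FROM Production.ProductInventory AS inv
JOIN Production.Product AS p
    ON p.ProductID = inv.ProductID
GROUP BY p.Color, p.Class WITH CUBE;

-- ISO-compliant form: the operator wraps the column list
SELECT p.Color, p.Class, SUM(inv.Quantity) AS TotalItems
FROM Production.ProductInventory AS inv
JOIN Production.Product AS p
    ON p.ProductID = inv.ProductID
GROUP BY CUBE (p.Color, p.Class);
```

Both statements should produce the same groupings, so moving existing code to the newer form is mostly a mechanical edit.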

To recap, the CUBE operator calculates a summary for every combination of the columns in the GROUP BY clause, so n columns produce 2^n groupings (eight for our three columns). Therefore, the more columns you add, the longer the query will take to execute.
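One way to see where the growth comes from is to spell out what a three-column cube expands to. The sketch below uses three columns of the [Product] table chosen purely for illustration. It lists the eight groupings that GROUP BY CUBE (p.Color, p.Class, p.Style) is shorthand for.

```sql
-- Equivalent to GROUP BY CUBE (p.Color, p.Class, p.Style)
SELECT p.Color, p.Class, p.Style, COUNT(*) AS ProductCount
FROM Production.Product AS p
GROUP BY GROUPING SETS
(
    (p.Color, p.Class, p.Style),  -- detail level
    (p.Color, p.Class),           -- summarized over Style
    (p.Color, p.Style),           -- summarized over Class
    (p.Class, p.Style),           -- summarized over Color
    (p.Color),
    (p.Class),
    (p.Style),
    ()                            -- grand total
);
```

A fourth column would double this list to sixteen groupings.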

Use the GROUPING() function to determine whether a NULL in a grouped column marks a summary row (‘All’) or an actual NULL value in the data (‘Unknown’). I suggest using a view to save the T-SQL in a format that can be queried to glean information out of the cube.

Next time, I will be examining the GROUPING SETS operator which gives you more control over how things are summarized.

