{"id":933,"date":"2011-09-20T21:34:16","date_gmt":"2011-09-20T21:34:16","guid":{"rendered":"http:\/\/craftydba.com\/?p=933"},"modified":"2011-09-26T14:44:37","modified_gmt":"2011-09-26T14:44:37","slug":"selecting-data-part-3","status":"publish","type":"post","link":"https:\/\/craftydba.com\/?p=933","title":{"rendered":"Selecting Data &#8211; Part 3"},"content":{"rendered":"<p>The <a href=\"http:\/\/en.wikipedia.org\/wiki\/Select_(SQL)\">SELECT<\/a> reserved word is part of the Data Manipulation Language (DML) defined by <a href=\"http:\/\/en.wikipedia.org\/wiki\/Edgar_F._Codd\">Codd<\/a> and is used to query data from the database. Because of its popularity, there are many clauses and functions that can be used to construct a query to retrieve data.<\/p>\n<p>Today, I am continuing our exploration by reviewing SELECT queries that aggregate data.<a href=\"http:\/\/en.wikipedia.org\/wiki\/Aggregate_(data_warehouse)\"> Data aggregation <\/a>is the process of converting many records into a few records with special meaning. Many of the <a href=\"http:\/\/en.wikipedia.org\/wiki\/Aggregate_function\">functions <\/a>used in aggregation have statistical meaning such as minimum value, maximum value, etc.<\/p>\n<p>I will be using the<a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/aa992075.aspx\"> Adventure Works <\/a>Sample database supplied by Microsoft during this talk.<\/p>\n<p>We will start by looking at the the sales people and their year-to-date sales numbers. The following query joins the SalesPerson and Person tables to retrieve our results.<\/p>\n<pre><span style=\"color: #008000; font-family: Lucida Console; font-size: small;\">-- Use the correct database\r\nUSE AdventureWorks2008R2\r\nGO\r\n\r\n-- List existing sales force\r\nSELECT \r\n    P.BusinessEntityID as Id, \r\n    cast(P.LastName as char(20)) as LastName, \r\n    cast(P.FirstName as char(20)) as FirstName, \r\n    Str(S.SalesYTD, 12, 2) as SalesYtd\r\nFROM Sales.SalesPerson S INNER JOIN Person.Person P\r\nON s.BusinessEntityID = P.BusinessEntityID\r\nWHERE TerritoryID IS NOT NULL\r\nGO\r\n\r\n-- Output\r\nId          LastName             FirstName            SalesYtd\r\n----------- -------------------- -------------------- ------------\r\n275         Blythe               Michael                3763178.18\r\n276         Mitchell             Linda                  4251368.55\r\n277         Carson               Jillian                3189418.37\r\n278         Vargas               Garrett                1453719.47\r\n279         Reiter               Tsvi                   2315185.61\r\n280         Ansman-Wolfe         Pamela                 1352577.13\r\n281         Ito                  Shu                    2458535.62\r\n282         Saraiva              Jos\u00e9                   2604540.72\r\n283         Campbell             David                  1573012.94\r\n284         Mensa-Annan          Tete                   1576562.20\r\n286         Tsoflias             Lynn                   1421810.92\r\n288         Valdez               Rachel                 1827066.71\r\n289         Pak                  Jae                    4116871.23\r\n290         Varkey Chudukatil    Ranjit                 3121616.32\r\n<\/span><\/pre>\n<\/p>\n<p>One nice feature of Microsoft Access is the <a href=\"http:\/\/www.techonthenet.com\/access\/functions\/advanced\/first.php\">FIRST<\/a> and <a href=\"http:\/\/www.techonthenet.com\/access\/functions\/advanced\/last.php\">LAST <\/a>functions are part of the aggregation set. We can recreate these features by using the <a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms188385.aspx\">ORDER BY<\/a> and <a href=\"http:\/\/technet.microsoft.com\/en-us\/library\/ms189463.aspx\">TOP <\/a>clauses. The queries below return the first and last sales person by total sales.<\/p>\n<pre><span style=\"color: #008000; font-family: Lucida Console; font-size: small;\">-- Top sales person (LAST)\r\nSELECT TOP 1 \r\n    P.BusinessEntityID as Id,\r\n    cast(P.LastName as char(20)) as LastName,\r\n    cast(P.FirstName as char(20)) as FirstName,\r\n    Str(S.SalesYTD, 12, 2) as SalesYtd\r\nFROM Sales.SalesPerson S INNER JOIN Person.Person P ON s.BusinessEntityID = P.BusinessEntityID\r\nWHERE TerritoryID IS NOT NULL\r\nORDER BY SalesYTD DESC\r\n\r\n-- Output\r\nId          LastName             FirstName            SalesYtd\r\n----------- -------------------- -------------------- ------------\r\n276         Mitchell             Linda                  4251368.55\r\n\r\n-- Bottom sales person (FIRST)\r\nSELECT TOP 1 \r\n    P.BusinessEntityID as Id,\r\n    cast(P.LastName as char(20)) as LastName,\r\n    cast(P.FirstName as char(20)) as FirstName,\r\n    Str(S.SalesYTD, 12, 2) as SalesYtd\r\nFROM Sales.SalesPerson S INNER JOIN Person.Person P ON s.BusinessEntityID = P.BusinessEntityID\r\nWHERE TerritoryID IS NOT NULL\r\nORDER BY SalesYTD ASC\r\n\r\n-- Output\r\nId          LastName             FirstName            SalesYtd\r\n----------- -------------------- -------------------- ------------\r\n280         Ansman-Wolfe         Pamela                 1352577.13\r\n<\/span><\/pre>\n<\/p>\n<p>The power of aggregation is achieved by using the <a href=\"http:\/\/technet.microsoft.com\/en-us\/library\/ms177673.aspx\">GROUP BY <\/a>and <a href=\"http:\/\/technet.microsoft.com\/en-us\/library\/ms180199.aspx\">HAVING <\/a>clauses. Basically, GROUP BY takes all values that are the same for a given column list and returns one unique row. The HAVING clause is a secondary filter like the WHERE clause that can specify some aggregate condition needs to be meet.<\/p>\n<p>The query below returns the number of sales people by country. The <a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms175997.aspx\">COUNT <\/a>function returns a integer value.<\/p>\n<pre><span style=\"color: #008000; font-family: Lucida Console; font-size: small;\">-- Count sales people by country\r\nSELECT T.CountryRegionCode AS Country, COUNT(*) AS SalesPeople\r\nFROM Sales.SalesPerson P INNER JOIN Sales.SalesTerritory T ON P.TerritoryID = T.TerritoryID\r\nGROUP BY T.CountryRegionCode\r\n\r\n-- Output\r\nCountry SalesPeople\r\n------- -----------\r\nAU      1\r\nCA      2\r\nDE      1\r\nFR      1\r\nGB      1\r\nUS      8 \r\n<\/span><\/pre>\n<\/p>\n<p>We can rewite this query with a HAVING clause to return all countries that have multiple sales people.<\/p>\n<pre><span style=\"color: #008000; font-family: Lucida Console; font-size: small;\">-- -- Show countries that have multiple sales people\r\nSELECT T.CountryRegionCode AS Country, COUNT(*) AS SalesPeople\r\nFROM Sales.SalesPerson P INNER JOIN Sales.SalesTerritory T ON P.TerritoryID = T.TerritoryID\r\nGROUP BY T.CountryRegionCode\r\nHAVING COUNT(*) &gt; 1\r\n\r\n--Output\r\nCountry SalesPeople\r\n------- -----------\r\nCA      2\r\nUS      8\r\n<\/span><\/pre>\n<\/p>\n<p>Common day statistics such as <a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms179916.aspx\">MIN<\/a>, <a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms187751.aspx\">MAX <\/a>and <a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms177677.aspx\">AVG <\/a>can be calculated by selecting the correct function.  The example below shows the minimum, maximum, and average bonus for a sales person in each country.<\/p>\n<pre><span style=\"color: #008000; font-family: Lucida Console; font-size: small;\">-- Show the min, max, &amp; avg bonus by country\r\nSELECT \r\n    T.CountryRegionCode AS Country, \r\n    MIN(P.Bonus) AS SmallBonus, \r\n    MAX(P.Bonus) AS BigBonus, \r\n    AVG(P.Bonus) AS AvgBonus\r\nFROM Sales.SalesPerson P INNER JOIN Sales.SalesTerritory T ON P.TerritoryID = T.TerritoryID\r\nGROUP BY T.CountryRegionCode\r\n\r\n-- Output\r\nCountry SmallBonus            BigBonus              AvgBonus\r\n------- --------------------- --------------------- ---------------------\r\nAU      5650.00               5650.00               5650.00\r\nCA      500.00                5000.00               2750.00\r\nDE      75.00                 75.00                 75.00\r\nFR      985.00                985.00                985.00\r\nGB      5150.00               5150.00               5150.00\r\nUS      2000.00               6700.00               3906.25\r\n<\/span><\/pre>\n<\/p>\n<p>The <a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms187810.aspx\">SUM <\/a>function can be used to totaled up all values in a particular column. The example below looks at the average and total sales by country regardless of person.<\/p>\n<pre><span style=\"color: #008000; font-family: Lucida Console; font-size: small;\">-- Show avg and total sales by sales person for a particular country\r\nSELECT \r\n    T.CountryRegionCode AS TheCountry, \r\n    AVG(P.SalesYTD) AS TheAvg, \r\n    SUM(P.SalesYTD) AS TheSales\r\nFROM Sales.SalesPerson P INNER JOIN Sales.SalesTerritory T ON P.TerritoryID = T.TerritoryID\r\nGROUP BY T.CountryRegionCode\r\n\r\n-- Output\r\nTheCountry TheAvg                TheSales\r\n---------- --------------------- ---------------------\r\nAU         1421810.9242          1421810.9242\r\nCA         2029130.0912          4058260.1825\r\nDE         1827066.7118          1827066.7118\r\nFR         3121616.3202          3121616.3202\r\nGB         4116871.2277          4116871.2277\r\nUS         2559979.8237          20479838.5899\r\n<\/span><\/pre>\n<\/p>\n<p>Last but not least, advance statistics such as standard deviation using the <a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms190474.aspx\">STDEV <\/a>function or variance using the <a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms186290.aspx\">VAR <\/a>fucntion can be calculated. The example below goes to the Sales Order Header level and returns statistics by country for the<br \/>\n2008 year.<\/p>\n<pre><span style=\"color: #008000; font-family: Lucida Console; font-size: small;\">-- Get statistics on sales detail by country for 2008\r\nSELECT\r\n    YEAR(H.OrderDate) AS TheYear, \r\n    T.CountryRegionCode AS TheCountry, \r\n    STR(AVG(H.TotalDue), 12, 2) AS TheAvg, \r\n    STR(STDEV(H.TotalDue), 12, 3) AS TheStdDev, \r\n    STR(VAR(H.TotalDue), 12, 3) AS TheVar\r\nFROM Sales.SalesOrderHeader H INNER JOIN Sales.SalesTerritory T ON H.TerritoryID = T.TerritoryID\r\nWHERE YEAR(H.OrderDate) = 2008\r\nGROUP BY YEAR(H.OrderDate), T.CountryRegionCode\r\nORDER BY YEAR(H.OrderDate), T.CountryRegionCode\r\n\r\n-- Output\r\nTheYear     TheCountry TheAvg       TheStdDev    TheVar\r\n----------- ---------- ------------ ------------ ------------\r\n2008        AU              1254.82     3111.270  9680003.033\r\n2008        CA              1800.43     7957.318 63318908.399\r\n2008        DE              1764.02     6209.980 38563854.668\r\n2008        FR              2110.00     8906.554 79326696.856\r\n2008        GB              1888.58     7335.738 53813050.772\r\n2008        US              2748.81     9902.666 98062789.479\r\n<\/span><\/pre>\n<\/p>\n<p>In summary, aggregation is a great way to summarize data for organizational decision makers.<br \/>\nThe following functions are part of <a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms173454.aspx\">TSQL aggregation <\/a>with the except of FIRST and LAST which can be<br \/>\nemulated.<\/p>\n<ul>\n<li>AVG &#8211; average of values in a column.<\/li>\n<li>COUNT &#8211; counts how many rows.<\/li>\n<li>FIRST &#8211; returns the value of the first record in a field.<\/li>\n<li>LAST &#8211; returns the value of the last record in a field.<\/li>\n<li>MAX &#8211; maximum value in a column.<\/li>\n<li>MIN &#8211; minimum value in a column.<\/li>\n<li>STDEV &#8211; sample standard deviation of the values in a column.<\/li>\n<li>STDEVP &#8211; standard deviation of the values in a column.<\/li>\n<li>SUM &#8211; adds the values in a column.<\/li>\n<li>VAR &#8211; sample variance of the values in a column.<\/li>\n<li>VARP &#8211; variance of the values in a column.<\/li>\n<\/ul>\n<p>There are some exciting new aggregation features in TSQL 2008 that I will talk in the future about.<br \/>\nKnowing how to summarize data is key to using the information that is stored in the database.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The SELECT reserved word is part of the Data Manipulation Language (DML) defined by Codd and is used to query data from the database. Because of its popularity, there are many clauses and functions that can be used to construct a query to retrieve data. Today, I am continuing our exploration by reviewing SELECT queries that aggregate data. Data aggregation is the process of converting many records into a few records with special meaning. Many of the functions used in aggregation have statistical meaning such as minimum value, maximum value,&hellip;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[9],"tags":[106,95,96,97,12,93,94,98,99,100,37,28,101,103,102,29,104,105],"class_list":["post-933","post","type-post","status-publish","format-standard","hentry","category-db-dev","tag-aggregation","tag-avg","tag-count","tag-first","tag-free-code","tag-group-by","tag-having","tag-last","tag-max","tag-min","tag-select","tag-sql-server","tag-stdev","tag-stdevp","tag-sum","tag-tsql","tag-var","tag-varp"],"_links":{"self":[{"href":"https:\/\/craftydba.com\/index.php?rest_route=\/wp\/v2\/posts\/933","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/craftydba.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/craftydba.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/craftydba.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/craftydba.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=933"}],"version-history":[{"count":0,"href":"https:\/\/craftydba.com\/index.php?rest_route=\/wp\/v2\/posts\/933\/revisions"}],"wp:attachment":[{"href":"https:\/\/craftydba.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=933"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/craftydba.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=933"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/craftydba.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=933"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}