{"id":2970,"date":"2012-11-16T02:26:33","date_gmt":"2012-11-16T02:26:33","guid":{"rendered":"http:\/\/craftydba.com\/?p=2970"},"modified":"2012-11-16T21:58:31","modified_gmt":"2012-11-16T21:58:31","slug":"surrogate-keys","status":"publish","type":"post","link":"https:\/\/craftydba.com\/?p=2970","title":{"rendered":"Surrogate Keys"},"content":{"rendered":"<p><a href=\"https:\/\/craftydba.com\/wp-content\/uploads\/2012\/11\/golden-keys.jpg\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/craftydba.com\/wp-content\/uploads\/2012\/11\/golden-keys-150x150.jpg\" alt=\"\" title=\"golden-keys\" width=\"150\" height=\"150\" class=\"alignleft size-thumbnail wp-image-2978\" srcset=\"https:\/\/craftydba.com\/wp-content\/uploads\/2012\/11\/golden-keys-150x150.jpg 150w, https:\/\/craftydba.com\/wp-content\/uploads\/2012\/11\/golden-keys.jpg 192w\" sizes=\"auto, (max-width: 150px) 100vw, 150px\" \/><\/a><\/p>\n<p>I recently worked on a project I which I redesigned sales data warehouse as a <a href=\"http:\/\/en.wikipedia.org\/wiki\/Star_schema\">STAR schema<\/a>, using daily file partitions, with a automatic sliding window, and applying data compression at the page level.  I ended up reducing a 5 terabyte database to less than 750 GB.  I will be writing several articles on the lessons that I learned during the process.<\/p>\n<p>Today, I want to talk about how surrogate keys can be used to tie the dimension tables to the fact tables.  I need to start the talk off with some definitions grabbed from IBM&#8217;s designing and modeling data warehouse <a href=\"http:\/\/pic.dhe.ibm.com\/infocenter\/idm\/docv3\/index.jsp?topic=%2Fcom.ibm.datatools.dimensional.ui.doc%2Ftopics%2Fc_dm_surrogatekeys.html\">webpage<\/a>.<\/p>\n<p>A dimension table in a star or snowflake schema stores details about the facts. For example, a DATE dimension table stores the various aspects of time such as year, quarter, month, and day.  A fact table in a star or snowflake schema stores business measures such as sales, cost of goods, or profit.<\/p>\n<p>The concept of a STAR is obtained by placing the fact table in the middle of the ER Diagram, having the dimension on the outside, and the relationships are spokes in a wheel.<\/p>\n<p>Let&#8217;s get back to the DATE dimension since this table appears in almost every data model.  A 10 digit date formatted at YYYY-MM-DD is a <a href=\"http:\/\/en.wikipedia.org\/wiki\/Natural_key\">natural ke<\/a>y of the table.  However, joining tables on a character field is slower than joining them on <a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms187745.aspx\">integers<\/a>.  That is where the speed of a STAR schema comes from.  <\/p>\n<p>Expanding the BASIC TRAINING database, I am going to use the following T-SQL snippet to create the DATE dimension and populate it with data for each day in November 2012. <\/p>\n<p>A surrogate key can be created by defining an tiny integer column with a <a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/aa933196(v=sql.80).aspx\">IDENTITY<\/a>() property.  I purposely used a TINYINT with a range of 0 to 255 to illustrate a point later on in the talk.<\/p>\n<pre><span style=\"color: #008000;font-size: small;\">-- Use the correct database\r\nUSE [BASIC];\r\nGO\r\n\r\n-- Delete existing table\r\nIF  OBJECT_ID(N'[TRAINING].[DIM_DATE]') > 0\r\n    DROP TABLE [TRAINING].[DIM_DATE]\r\nGO\r\n\r\n-- Create new table\r\nCREATE TABLE [TRAINING].[DIM_DATE]\r\n(\r\n  DATE_KEY TINYINT IDENTITY(1,1) NOT NULL CONSTRAINT PK_DIM_DATE PRIMARY KEY,\r\n  DATE_DATE SMALLDATETIME NOT NULL,\r\n  DATE_STRING VARCHAR(10) NOT NULL,\r\n  DATE_MONTH VARCHAR(3) NOT NULL,\r\n  DATE_DAY VARCHAR(3) NOT NULL,\r\n  DATE_INT_QTR INT NOT NULL,\r\n  DATE_INT_DOY INT NOT NULL,\r\n  DATE_INT_YYYY INT NOT NULL,\r\n  DATE_INT_MM INT NOT NULL,\r\n  DATE_INT_DD INT NOT NULL\r\n);\r\nGO\r\n\r\n-- Create and intialize loop variable\r\nDECLARE @VAR_DATE SMALLDATETIME;\r\nSELECT @VAR_DATE = '2012-11-01';\r\n\r\n-- Add data to our new table\r\nWHILE (DATEDIFF(D, @VAR_DATE, '2012-12-01') <> 0)\r\nBEGIN\r\n\r\n    -- Add row to dimension table\r\n    INSERT INTO [TRAINING].[DIM_DATE]\r\n    (\r\n        DATE_DATE,\r\n        DATE_STRING,\r\n        DATE_MONTH,\r\n        DATE_DAY,\r\n        DATE_INT_QTR,\r\n        DATE_INT_DOY,\r\n        DATE_INT_YYYY,\r\n        DATE_INT_MM,\r\n        DATE_INT_DD\r\n    )\r\n    VALUES\r\n\t(\r\n\t   -- As small date time\r\n\t   @VAR_DATE,\r\n\r\n\t   -- Date as string\r\n\t   SUBSTRING(CONVERT(CHAR(10), @VAR_DATE, 120) + REPLICATE(' ', 10), 1, 10), \r\n\r\n\t   -- Month as string\r\n\t   UPPER(SUBSTRING(CONVERT(CHAR(10), @VAR_DATE, 100) + REPLICATE(' ', 3), 1, 3)),\r\n\r\n\t   -- Day as string\r\n\t   UPPER(SUBSTRING(DATENAME(DW, @VAR_DATE), 1, 3)),\r\n\r\n\t   -- As quarter of year\r\n\t   DATEPART(QQ, @VAR_DATE),\r\n\r\n\t   -- As day of year\r\n\t   DATEPART(DY, @VAR_DATE),\r\n\r\n\t   -- Year as integer\r\n\t   YEAR(@VAR_DATE),\r\n\r\n\t   -- Month as integer\r\n\t   MONTH(@VAR_DATE),\r\n\r\n\t   -- Day as integer\r\n\t   DAY(@VAR_DATE)\r\n\r\n    );\r\n\r\n    -- Increment the counter\r\n    SELECT @VAR_DATE = DATEADD(D, 1, @VAR_DATE);\r\nEND;<\/span><\/pre>\n<\/p>\n<p>I used some interesting date and conversion functions to get the date variable into the correct format for the dimension.  I will be exploring functions in a future BASIC TRAINING article.<\/p>\n<p>From the image below, we can see that our table is populated with a increasing surrogate key.  <\/p>\n<p><a href=\"https:\/\/craftydba.com\/wp-content\/uploads\/2012\/11\/screen-shot-of-date-dimension.jpg\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/craftydba.com\/wp-content\/uploads\/2012\/11\/screen-shot-of-date-dimension.jpg\" alt=\"\" title=\"screen-shot-of-date-dimension\" width=\"1156\" height=\"712\" class=\"alignleft size-full wp-image-3001\" \/><\/a><\/p>\n<p>&nbsp;<\/P><\/p>\n<p>Let&#8217;s make believe that the business line comes back and asks for a years worth of date entries  Because we chose a TINYINT at the start, we would have to drop and recreate the table.  Therefore, it is very important to choose a large enough integer to satisfy the business need.  A typical INT has a positive range up to 2 billion +.  This used to be a large number.  But if you are counting x-mass light bulbs made in China, this range might be exceeded now a-days.<\/p>\n<p>Try to stick with integers when defining surrogates keys.  This will lend to speedy joins.  On the other hand, a <a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms187942.aspx\">UNIQUEIDENTIFIER <\/a>can be used.  It is twice as large as an BIGINT data type (8 bytes).  Use the <a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms189786.aspx\">NEWSEQUENTIALID<\/a>() function to generate ascending keys so that you can use a clustered index, the default index for a primary key.<\/p>\n<p>In summary, surrogates key can be used to join dimension tables to fact tables.  Try to define the surrogate key using a integer data type that is large enough to hold current and future data.<br \/>\nYou will be suprised on how fast reporting queries execute using this simple type of key.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I recently worked on a project I which I redesigned sales data warehouse as a STAR schema, using daily file partitions, with a automatic sliding window, and applying data compression at the page level. I ended up reducing a 5 terabyte database to less than 750 GB. I will be writing several articles on the lessons that I learned during the process. Today, I want to talk about how surrogate keys can be used to tie the dimension tables to the fact tables. I need to start the talk off&hellip;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[4],"tags":[607,609,608,610,602,606,605,12,15,611,615,28,604,614,603,29,613,612],"class_list":["post-2970","post","type-post","status-publish","format-standard","hentry","category-db-admin","tag-date-dimension","tag-datename","tag-datepart","tag-day","tag-dba","tag-dimension-table","tag-fact-table","tag-free-code","tag-john-f-miner-iii","tag-month","tag-replicate","tag-sql-server","tag-star-schema","tag-substring","tag-surrogate-key","tag-tsql","tag-upper","tag-year"],"_links":{"self":[{"href":"https:\/\/craftydba.com\/index.php?rest_route=\/wp\/v2\/posts\/2970","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/craftydba.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/craftydba.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/craftydba.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/craftydba.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=2970"}],"version-history":[{"count":0,"href":"https:\/\/craftydba.com\/index.php?rest_route=\/wp\/v2\/posts\/2970\/revisions"}],"wp:attachment":[{"href":"https:\/\/craftydba.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=2970"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/craftydba.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=2970"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/craftydba.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=2970"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}