Just the other day, I was tasked with redesigning a data warehouse’s star schema that had grown to over 4 terabytes. After completing the project, I realized that if the original designers had known more about storage (data types, data pages, index pages), the explosive growth would not have been so bad.
I ended up putting the database on a diet of daily table partitions and page compression. Today, the database is 20% of its original size.
In short, I am going to start a series of talks covering such fundamental topics. I am a proud veteran of the United States Army Reserve (USAR). In honor of the boot camp I attended so long ago, I am going to nickname the series BASIC TRAINING.
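For readers who want to try the compression half of that diet, here is a minimal sketch of the usual sequence: estimate the savings, rebuild, then confirm. The table [dbo].[FACT_SALES] is a hypothetical stand-in that the script creates only for the demo. sp_estimate_data_compression_savings, ALTER TABLE ... REBUILD WITH (DATA_COMPRESSION = PAGE), and sys.partitions are standard SQL Server features. Before SQL Server 2016 SP1, however, page compression required Enterprise (or Developer) Edition.

```sql
-- Hypothetical stand-in for a large fact table (created only if missing).
IF OBJECT_ID(N'dbo.FACT_SALES', N'U') IS NULL
    CREATE TABLE dbo.FACT_SALES
    (
        SALE_DATE DATE          NOT NULL,
        STORE_ID  INT           NOT NULL,
        AMOUNT    DECIMAL(9, 2) NOT NULL
    );
GO

-- Step 1: estimate how much space PAGE compression would save.
EXEC sp_estimate_data_compression_savings
    @schema_name      = N'dbo',
    @object_name      = N'FACT_SALES',
    @index_id         = NULL,   -- all indexes (or the heap)
    @partition_number = NULL,   -- all partitions
    @data_compression = N'PAGE';
GO

-- Step 2: rebuild the table (heap or clustered index) with PAGE compression.
ALTER TABLE dbo.FACT_SALES
    REBUILD WITH (DATA_COMPRESSION = PAGE);
GO

-- Step 3: confirm the compression setting of each partition.
SELECT p.partition_number, p.data_compression_desc, p.rows
FROM sys.partitions AS p
WHERE p.object_id = OBJECT_ID(N'dbo.FACT_SALES');
GO
```

ALTER TABLE ... REBUILD compresses only the heap or the clustered index. Nonclustered indexes do not inherit that setting and need their own ALTER INDEX ... REBUILD WITH (DATA_COMPRESSION = PAGE).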
The most basic part of a database is a TABLE, which consists of COLUMNS. The most important decision during the initial design is choosing data types that capture the information you need in the least amount of space.
I am going to talk today about Exact Numerical data types. These types can be categorized as integer (including BIT), money, or decimal with a fixed precision and scale. Unlike Approximate Numerical data types such as FLOAT and REAL, they store values exactly, so no precision is lost.
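A quick way to see what each choice costs is the built-in DATALENGTH function, which returns the number of bytes a value occupies. As a minimal sketch, the query below stores the same value, 1, as each of the four integer types.

```sql
-- DATALENGTH returns the number of bytes used to store each value.
SELECT
    DATALENGTH(CAST(1 AS TINYINT))  AS tinyint_bytes,   -- 1 byte
    DATALENGTH(CAST(1 AS SMALLINT)) AS smallint_bytes,  -- 2 bytes
    DATALENGTH(CAST(1 AS INT))      AS int_bytes,       -- 4 bytes
    DATALENGTH(CAST(1 AS BIGINT))   AS bigint_bytes;    -- 8 bytes
```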
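To make the exact-versus-approximate distinction concrete, here is a small sketch comparing the same sum in both families. SQL Server stores FLOAT as an IEEE 754 double, and 0.1 and 0.2 have no exact binary representation, so the first query should report 'not equal'. The DECIMAL version keeps every digit and reports 'equal'.

```sql
-- Approximate numeric: FLOAT literals (E notation) are binary doubles.
SELECT CASE WHEN 0.1E0 + 0.2E0 = 0.3E0
            THEN 'equal' ELSE 'not equal' END AS float_result;

-- Exact numeric: DECIMAL stores the decimal digits exactly.
SELECT CASE WHEN CAST(0.1 AS DECIMAL(3, 1)) + CAST(0.2 AS DECIMAL(3, 1))
                 = CAST(0.3 AS DECIMAL(3, 1))
            THEN 'equal' ELSE 'not equal' END AS decimal_result;
```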
The first step is to create a sample database named [BASIC] that contains a sample schema named [TRAINING]. The snippet below accomplishes these actions.
```sql
--
-- Create basic database
--

-- Which database to use.
USE [master]
GO

-- Delete existing databases.
IF EXISTS (SELECT name FROM sys.databases WHERE name = N'BASIC')
    DROP DATABASE [BASIC]
GO

-- Add new databases.
-- (The C:\MSSQL\DATA and C:\MSSQL\LOG folders must already exist.)
CREATE DATABASE [BASIC] ON
PRIMARY
(
    NAME = N'BASIC_DAT',
    FILENAME = N'C:\MSSQL\DATA\BASIC.MDF',
    SIZE = 5MB,
    MAXSIZE = 20MB,
    FILEGROWTH = 20%
)
LOG ON
(
    NAME = N'BASIC_LOG',
    FILENAME = N'C:\MSSQL\LOG\BASIC.LDF',
    SIZE = 1MB,
    MAXSIZE = 5MB,
    FILEGROWTH = 512KB
);
GO

--
-- Create training schema
--

-- Which database to use.
USE [BASIC]
GO

-- Delete existing schema.
IF EXISTS (SELECT * FROM sys.schemas WHERE name = N'TRAINING')
    DROP SCHEMA [TRAINING]
GO

-- Add new schema.
CREATE SCHEMA [TRAINING] AUTHORIZATION [dbo]
GO
```
The second step is to create a sample table named [EXACT_NUMERICS] that contains one or more columns for each data type. Books Online describes nine data types that are considered exact numerics: BIT, TINYINT, SMALLINT, INT, BIGINT, SMALLMONEY, MONEY, DECIMAL, and NUMERIC. Each data type has a range of values it can store. DECIMAL and NUMERIC use precision (the total number of digits) and scale (the number of digits to the right of the decimal point) to vary that range. Their storage size grows with the precision.
```sql
--
-- Create test tables (exact numerics)
--

-- Delete existing table
IF EXISTS (
    SELECT * FROM sys.objects
    WHERE object_id = OBJECT_ID(N'[TRAINING].[EXACT_NUMERICS]')
      AND type IN (N'U'))
DROP TABLE [TRAINING].[EXACT_NUMERICS]
GO

-- Create new table
CREATE TABLE [TRAINING].[EXACT_NUMERICS]
(
    EN1 BIT,              -- UP TO 8 BIT COLUMNS SHARE 1 BYTE
    EN2 TINYINT,          -- 1 BYTE
    EN3 SMALLINT,         -- 2 BYTES
    EN4 INT,              -- 4 BYTES
    EN5 BIGINT,           -- 8 BYTES
    EN6 SMALLMONEY,       -- 4 BYTES
    EN7 MONEY,            -- 8 BYTES
    EN8 DECIMAL(8, 3),    -- DEPENDS ON PRECISION: 5 BYTES
    EN9 NUMERIC(18, 5),   -- DEPENDS ON PRECISION: 9 BYTES
    EN10 NUMERIC(38, 0)   -- DEPENDS ON PRECISION: 17 BYTES
);
GO
```
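To see how precision alone drives DECIMAL and NUMERIC storage, here is a minimal sketch that creates a throwaway temporary table. The name #precision_demo and its columns are hypothetical. Every column has the same scale but a different precision. The query reads max_length from tempdb's sys.columns, the same column the row-size query later in this post sums.

```sql
-- Hypothetical demo table: same scale, one precision from each storage band.
IF OBJECT_ID(N'tempdb..#precision_demo') IS NOT NULL
    DROP TABLE #precision_demo;

CREATE TABLE #precision_demo
(
    d9  DECIMAL(9, 2),   -- precision 1-9   -> 5 bytes
    d10 DECIMAL(10, 2),  -- precision 10-19 -> 9 bytes
    d28 DECIMAL(28, 2),  -- precision 20-28 -> 13 bytes
    d38 DECIMAL(38, 2)   -- precision 29-38 -> 17 bytes
);

-- max_length is the number of bytes reserved for each column.
SELECT c.name, c.precision, c.scale, c.max_length
FROM tempdb.sys.columns AS c
WHERE c.object_id = OBJECT_ID(N'tempdb..#precision_demo')
ORDER BY c.column_id;
```

Changing the scale while keeping the precision fixed leaves max_length unchanged. That is why DECIMAL(8, 3) costs 5 bytes and NUMERIC(38, 0) costs 17.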
The third step is to load the table with values that show the minimum and maximum data points that can be stored.
```sql
-- Insert lower range min values
INSERT INTO [TRAINING].[EXACT_NUMERICS]
VALUES
(
    0,
    0,
    -32768,
    -2147483648,
    -9223372036854775808,
    -214748.3648,
    -922337203685477.5808,
    -99999.999,
    -9999999999999.99999,
    -99999999999999999999999999999999999999   -- 38 nines
);
GO

-- Insert upper range max values
INSERT INTO [TRAINING].[EXACT_NUMERICS]
VALUES
(
    1,
    255,
    32767,
    2147483647,
    9223372036854775807,
    214748.3647,
    922337203685477.5807,
    99999.999,
    9999999999999.99999,
    99999999999999999999999999999999999999    -- 38 nines
);
GO

-- Return the data from the table
SELECT * FROM [TRAINING].[EXACT_NUMERICS]
GO
```
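Values just one step outside those ranges are rejected. A plain CAST or INSERT raises an arithmetic overflow error. As a minimal sketch, the query below uses TRY_CAST (SQL Server 2012 and later), which returns NULL instead of raising an error, so you can probe the boundaries without aborting the batch.

```sql
-- TRY_CAST returns NULL when a value does not fit the target type.
SELECT
    TRY_CAST(255 AS TINYINT)              AS tinyint_max,        -- fits
    TRY_CAST(256 AS TINYINT)              AS tinyint_overflow,   -- NULL
    TRY_CAST(-1 AS TINYINT)               AS tinyint_negative,   -- NULL (TINYINT is unsigned)
    TRY_CAST(32768 AS SMALLINT)           AS smallint_overflow,  -- NULL
    TRY_CAST(100000.000 AS DECIMAL(8, 3)) AS decimal_overflow;   -- NULL (needs 9 digits)
```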
As a database designer, you should always question the components that make up your database.
One question that you might have is ‘What is the maximum number of bytes that a row can have?’ This is important because data is stored in the *.MDF and *.NDF files as 8 KB (8,192-byte) pages. After the 96-byte page header, 8,096 bytes remain for rows and the slot array that points to them, and a single row can be at most 8,060 bytes. Using 8,060 bytes as the budget for one page, you can figure out how many records fit on a page and how many bytes are wasted.
The following code uses the sys.columns catalog view to count the number of columns and calculate the maximum row size.
```sql
-- Maximum row length (num cols, max bytes)
SELECT
    OBJECT_NAME(c.object_id) AS tablename,
    COUNT(*)                 AS nr_columns,
    SUM(c.max_length)        AS maxrowlength
FROM sys.columns AS c
WHERE c.object_id = OBJECT_ID(N'TRAINING.EXACT_NUMERICS')
GROUP BY OBJECT_NAME(c.object_id)
ORDER BY OBJECT_NAME(c.object_id);
```
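We can see that the 10 columns in the table have a maximum record length of 59 bytes. Since 8,060 / 59 = 136 with a remainder of 36, 136 records will fit into one page, leaving 36 bytes of wasted space on each data page. Treat this as a rough estimate. Each row also carries a few bytes of overhead (a row header, a null bitmap, and a 2-byte slot-array entry), so the real number of rows per page is somewhat lower. The sp_spaceused stored procedure shows us that one data page and one index page have been allocated for the table. Because the table is a heap, that "index" page is its IAM (Index Allocation Map) page. For a table this small, these pages come from a mixed extent, an eight-page extent shared with other objects. In SQL Server 2016 and later, user databases allocate uniform extents by default, so the numbers you see may differ.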
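The page-fit arithmetic above can be reproduced straight from sys.columns. Here is a minimal sketch, assuming the [BASIC] database is current. The 8,060-byte budget is the simplification used in this post, not the exact space the storage engine has available per page.

```sql
-- Rough page-fit estimate from the catalog (ignores per-row overhead).
DECLARE @page_budget INT = 8060;
DECLARE @max_row     INT;

SELECT @max_row = SUM(c.max_length)
FROM sys.columns AS c
WHERE c.object_id = OBJECT_ID(N'TRAINING.EXACT_NUMERICS');

SELECT
    @max_row                AS max_row_bytes,   -- 59 for this table
    @page_budget / @max_row AS rows_per_page,   -- integer division
    @page_budget % @max_row AS wasted_bytes;    -- remainder per page
```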
```sql
-- Real life numbers (pages/extents)
EXEC sp_spaceused 'TRAINING.EXACT_NUMERICS';
```
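sp_spaceused summarizes the allocation in kilobytes. For a page-level view, here is a minimal sketch that reads two dynamic management objects directly. sys.dm_db_partition_stats gives page and row counts. sys.dm_db_index_physical_stats gives the actual average record size, which includes the per-row overhead that the 59-byte estimate ignores. The sketch assumes you are connected to the [BASIC] database after the two rows above have been inserted.

```sql
-- Page and row counts per partition (index_id 0 = heap; each page is 8 KB).
SELECT
    ps.index_id,
    ps.row_count,
    ps.in_row_data_page_count,
    ps.used_page_count,
    ps.reserved_page_count
FROM sys.dm_db_partition_stats AS ps
WHERE ps.object_id = OBJECT_ID(N'TRAINING.EXACT_NUMERICS');

-- Actual on-page record size, including row header and null bitmap.
SELECT
    ips.page_count,
    ips.record_count,
    ips.avg_record_size_in_bytes
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'TRAINING.EXACT_NUMERICS'), NULL, NULL, 'DETAILED') AS ips;
```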
Last but not least, the sp_help stored procedure displays the details of the table. These include many settings that you can choose as a DDL designer, such as computed columns, field length, nullability, and collation, to mention a few.
```sql
-- Display size details of table
EXEC sp_help 'TRAINING.EXACT_NUMERICS';
```
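sp_help returns several result sets. If you prefer the column details as one result set, they can be read from the sys.columns catalog view. Here is a minimal sketch, assuming the [BASIC] database is current. TYPE_NAME, max_length, precision, scale, is_nullable, is_computed, and collation_name are standard parts of SQL Server's metadata.

```sql
-- Catalog-view alternative to sp_help for the column details.
SELECT
    c.column_id,
    c.name                    AS column_name,
    TYPE_NAME(c.user_type_id) AS data_type,
    c.max_length,
    c.precision,
    c.scale,
    c.is_nullable,
    c.is_computed,
    c.collation_name          -- NULL for numeric types
FROM sys.columns AS c
WHERE c.object_id = OBJECT_ID(N'TRAINING.EXACT_NUMERICS')
ORDER BY c.column_id;
```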
In summary, when designing a table that uses exact numerical data types, choose the data type that stores the information in the least amount of space. Most people default to INT (4 bytes) or MONEY (8 bytes). Selecting the correct data type can cut a column’s storage by 50% or more. For example, a SMALLINT (2 bytes) takes half the space of an INT, and a TINYINT (1 byte) takes a quarter. Next time, I will be going over approximate numerical types.
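Before shrinking an existing column, it helps to check that the stored data actually fits the smaller type. Here is a minimal sketch. The temporary table #ORDERS, its QUANTITY column, and the sample rows are hypothetical stand-ins; point the final query at your own table and column.

```sql
-- Hypothetical sample table standing in for a real one.
IF OBJECT_ID(N'tempdb..#ORDERS') IS NOT NULL
    DROP TABLE #ORDERS;

CREATE TABLE #ORDERS (ORDER_ID INT NOT NULL, QUANTITY INT NOT NULL);

INSERT INTO #ORDERS (ORDER_ID, QUANTITY)
VALUES (1, 5), (2, 120), (3, 900);

-- Does the INT column's data fit in SMALLINT (-32,768 to 32,767)?
SELECT
    MIN(QUANTITY) AS min_value,
    MAX(QUANTITY) AS max_value,
    CASE WHEN MIN(QUANTITY) >= -32768 AND MAX(QUANTITY) <= 32767
         THEN 'fits SMALLINT' ELSE 'keep INT' END AS verdict
FROM #ORDERS;
```

A passing check only describes today’s data. Make sure the column’s business domain will also stay within the smaller range before altering it.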