Basic Training – Data Types – Part 1

Just the other day, I was tasked with redesigning a data warehouse’s star schema that had grown to over 4 terabytes in size. After completing the project, I realized that if the original designers had known more about storage (data types, data pages, index pages), the explosive growth would not have been so bad.

I ended up putting the database on a diet of daily table partitions and page compression. Today, the database is 20% of its original size.

In short, I am going to start a series of talks covering such fundamental topics. I am a proud United States Army Reserve (USAR) veteran. Just like the boot camp I went to so long ago, I am going to nickname the series BASIC TRAINING.

The most basic part of a database is a TABLE which consists of COLUMNS. The most important decision during the initial design is to choose the data types that will capture the information you want in the least amount of space.

I am going to talk today about Exact Numerical data types. These types can be categorized as integer, money, or decimal with exact precision. Unlike Approximate Numerical data types, they lose no precision during storage.

The first step is to create a sample database named [BASIC] that contains a sample schema named [TRAINING]. The snippet below accomplishes these actions.
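A minimal sketch of this step might look like the following. It assumes default file sizes and locations and a schema owned by [dbo]; neither detail is stated in the text, so adjust both to your environment.

```sql
-- Sketch only: default file sizes/locations and dbo ownership are assumptions.
-- GO is a batch separator understood by SSMS and sqlcmd.
USE [master];
GO

-- Create the sample database
CREATE DATABASE [BASIC];
GO

-- Switch to the new database
USE [BASIC];
GO

-- Create the sample schema (CREATE SCHEMA must be the first statement in its batch)
CREATE SCHEMA [TRAINING] AUTHORIZATION [dbo];
GO
```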

The second step is to create a sample table named [EXACT_NUMERICS] that contains one or more fields for each data type. Books Online describes nine data types that are considered exact numerics: BIGINT, INT, SMALLINT, TINYINT, BIT, DECIMAL, NUMERIC, MONEY, and SMALLMONEY. Most data types have a fixed range of values that can be stored. DECIMAL and NUMERIC use precision (total number of digits) and scale (number of digits to the right of the decimal point) to vary the range of values. With this variation come differences in storage size.
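As a hedged illustration, one possible layout for such a table is sketched below. The column names and the precision and scale values are assumptions chosen for this example, not the original definition. The byte counts in the comments are the documented storage sizes for each type; this particular layout has 10 columns that add up to 59 bytes.

```sql
-- Sketch only: column names, precision, and scale are illustrative assumptions.
USE [BASIC];
GO

CREATE TABLE [TRAINING].[EXACT_NUMERICS]
(
    MY_BIGINT     BIGINT         NULL,  -- 8 bytes
    MY_INT        INT            NULL,  -- 4 bytes
    MY_SMALLINT   SMALLINT       NULL,  -- 2 bytes
    MY_TINYINT    TINYINT        NULL,  -- 1 byte
    MY_BIT        BIT            NULL,  -- 1 byte (shared by up to 8 bit columns)
    MY_DECIMAL    DECIMAL(38, 0) NULL,  -- precision 29-38: 17 bytes
    MY_NUMERIC19  NUMERIC(19, 4) NULL,  -- precision 10-19: 9 bytes
    MY_NUMERIC9   NUMERIC(9, 2)  NULL,  -- precision 1-9: 5 bytes
    MY_MONEY      MONEY          NULL,  -- 8 bytes
    MY_SMALLMONEY SMALLMONEY     NULL   -- 4 bytes
);
GO
```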

The third step is to load the table with values that show the minimum and maximum data points that can be stored.
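A sketch of such a load is shown below. It assumes a hypothetical table with one column per type, using the made-up column names listed in the INSERT; the values themselves are the documented minimum and maximum for each type at the assumed precision and scale.

```sql
-- Sketch only: the column names and the DECIMAL/NUMERIC precision are assumptions.
USE [BASIC];
GO

INSERT INTO [TRAINING].[EXACT_NUMERICS]
    (MY_BIGINT, MY_INT, MY_SMALLINT, MY_TINYINT, MY_BIT,
     MY_DECIMAL, MY_NUMERIC19, MY_NUMERIC9, MY_MONEY, MY_SMALLMONEY)
VALUES
    -- Row 1: smallest value each column can hold
    (-9223372036854775808, -2147483648, -32768, 0, 0,
     CAST('-' + REPLICATE('9', 38) AS DECIMAL(38, 0)),  -- -(10^38 - 1)
     -999999999999999.9999, -9999999.99,
     -922337203685477.5808, -214748.3648),
    -- Row 2: largest value each column can hold
    (9223372036854775807, 2147483647, 32767, 255, 1,
     CAST(REPLICATE('9', 38) AS DECIMAL(38, 0)),        -- 10^38 - 1
     999999999999999.9999, 9999999.99,
     922337203685477.5807, 214748.3647);
GO
```

Going one unit past any of these values (for example 256 for the TINYINT column) should raise an arithmetic overflow error, which is a quick way to confirm the boundaries.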

As a database designer, you should always question the components that make up your database.

One question that you might have is, ‘What is the maximum number of bytes that a row can have?’ This is important because data is stored in the *.MDF or *.NDF files as 8 KB pages. A page is 8,192 bytes, and a single in-row record is limited to 8,060 bytes. Using 8,060 bytes as the usable space, you can estimate how many records fit on a page and how many bytes are wasted. The estimate is rough because it ignores the per-row overhead.

The following code uses the sys.columns catalog view to count the number of fields and calculate the maximum row size.
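One way to write such a query is sketched here; the column aliases are my own, and 8060 is used as the usable bytes per page as described above. It sums max_length, so it only makes sense for fixed-length columns like these.

```sql
-- Sketch only: aliases are illustrative; the estimate ignores per-row overhead.
USE [BASIC];
GO

SELECT
    COUNT(*)                 AS total_columns,   -- number of fields in the table
    SUM(c.max_length)        AS max_row_bytes,   -- sum of the column sizes in bytes
    8060 / SUM(c.max_length) AS rows_per_page,   -- integer division: whole rows only
    8060 % SUM(c.max_length) AS wasted_bytes     -- remainder left over on the page
FROM sys.columns AS c
WHERE c.object_id = OBJECT_ID(N'TRAINING.EXACT_NUMERICS');
GO
```

For a 59-byte row, the arithmetic works out to 8060 / 59 = 136 rows per page, with 8060 - (136 * 59) = 36 bytes left over.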

We can see that the 10 columns in the table have a maximum record length of 59 bytes and that 136 records will fit into one page. This leaves 36 bytes of wasted space on each data page. The sp_spaceused stored procedure shows us that 1 data page and 1 index page have been allocated for the table. Because the table is so small, these pages come from a mixed extent, which is an extent shared with other objects.
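The space report can be requested with a call along these lines:

```sql
-- Report rows, reserved space, data size, index size, and unused space for the table
USE [BASIC];
GO

EXEC sp_spaceused @objname = N'TRAINING.EXACT_NUMERICS';
GO
```

The data and index_size columns are reported in KB, so 8 KB in each corresponds to one page.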

Last but not least, the sp_help stored procedure displays the details of the table. This includes many of the settings that a table designer can choose in the DDL, such as computed columns, field length, nullability, and collation, to mention a few.
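A call of this shape returns those details as several result sets:

```sql
-- Show column definitions, identity, indexes, and constraints for the table
USE [BASIC];
GO

EXEC sp_help @objname = N'TRAINING.EXACT_NUMERICS';
GO
```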

In summary, when designing a table to use exact numerical data types, choose the data type that will store the information in the least amount of space. Most people choose INT (4 bytes) or MONEY (8 bytes) as the default. Selecting the correct data type can yield savings of up to 50%, for example by using SMALLINT (2 bytes) instead of INT or SMALLMONEY (4 bytes) instead of MONEY when the values fit. Next time, I will be going over approximate numerical types.
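As a quick way to see the size difference for yourself, the following sketch uses DATALENGTH to report the bytes used by the same value in different types. The value 100 is an arbitrary example.

```sql
-- Compare the storage size of one value across four exact numeric types
SELECT
    DATALENGTH(CAST(100 AS INT))        AS int_bytes,        -- 4
    DATALENGTH(CAST(100 AS SMALLINT))   AS smallint_bytes,   -- 2
    DATALENGTH(CAST(100 AS MONEY))      AS money_bytes,      -- 8
    DATALENGTH(CAST(100 AS SMALLMONEY)) AS smallmoney_bytes; -- 4
```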

