Basic Training – Data Types – Part 3


I am continuing my series of talks on fundamental topics like data types. I am proud to be a United States Army Reservist (USAR) Veteran.

Just like boot camp I went to so long ago, I am going nick name the series BASIC TRAINING.

The most basic unit of any database is a TABLE which consists of COLUMNS. A vital decision during the initial database design is to choose the data types that will capture the information you want in the least amount of space.

Today, I am exploring character strings which can be fixed or variable length in size.

I am extenting the sample database named [BASIC] that contains a sample schema named [TRAINING]. The snippet below creates a sample table named [CHAR_STRINGS] that contains one or more fields for each data type.

Books online describes two data types that are categorized as character strings: CHAR(N) is defined as fixed length of N and VARCHAR(N) is defined as variable length from zero to N. The overhead for VARCHAR is 2 bytes.

In SQL Server versions prior to 2005, N was defined from 0 to 8000 so that the record could fit on a single data page. The MAX key word was introduced in SQL Server 2005 in preparation for removing the TEXT data type in the future. Both VARCHAR(MAX) and TEXT can hold textual data
up to 2 GB in size.

We should test the newly crafted table by inserting data. The first test below will work since all columns are less than or eqaul to eight characters. The second test will fail since truncation will occur due to constant being greater than the field size.

We can tell the database engine to ignore the truncation warning by using the SET command.

I am going to alter the table definition to experiment with TEXT data type. By default, data for TEXT is stored out of row in seperate data pages.

The sp_tableoption stored procedure can be used to force the storage of TEXT data in row. The default for this setting is to store any string that is 256 bytes or less in row. Again, this stored procedure is slated to be discontinued in the future. It is suggested that you use VARCHAR(MAX) instead.

As a database designer, you should always question the components that make up your database.
One question that you might have is ‘What is the maximum number of bytes that a row can have?’.

This is important because data is stored in the *.MDF or *.NDF files as pages ~ 8k.
Since a page can only save 8060 bytes, you can figure out how many records can fit on a page and how many bytes are wasted space.

The following code uses the dynamic management view, sys.dm_db_index_physical_stats, to count the number of fields and calculate the maximum row size.

The sp_help stored procedure displays the details of the table. This includes many different settings that can be choosen as a DDL designer such as computed column, field length, nullabilty, and collation to mention a few.

You might have noticed that the index physical statistics DMV displayed two heap pages and three total records. How can we have a total of three records when we did two inserts and two updates? The answer is the out of row storage for BLOBS.

The undocumented DBCC IND command will show the pages being used by the table or HEAP.

The DBCC TRACEON flag 3604 will dump pages to the output window. The DBCC page command will dump the contents of each page. The DBCC TRACEOFF flag 3604 will restore the flag to its orginal setting.

The in row data page entry shows the actual text.

The out of row data page entry shows a text pointer to the BLOB.

In summary, use the VARCHAR data type when length of the data is double digits or more in size. While there is a two byte overhead for the datatype, the odds are the empty space reclaimed by the table will make up a total positive savings. Remember, the TEXT data type is on the depreciated list. Use the VARCHAR(MAX) instead.

Next time, I will be reviewing unicode character strings.

Related posts

Leave a Comment