I am continuing my series of talks on fundamental topics like data types. I am proud to be a United States Army Reservist (USAR) Veteran.
Just like boot camp I went to so long ago, I am going nick name the series BASIC TRAINING.
The most basic unit of any database is a TABLE which consists of COLUMNS. A vital decision during the initial database design is to choose the data types that will capture the information you want in the least amount of space.
Today, I am exploring binary strings by extending the sample database named [BASIC] that contains a schema named [TRAINING] and table named [BINARY_STRINGS].
Books online describes three data types that are categorized as binary. I created a field for each type in our new table as well as a surrogate key.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 |
<span style="color: #008000; font-size: small;">-- -- 1 - Create test table (binary data types) -- -- Delete existing table IF EXISTS ( SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[TRAINING].[BINARY_STRINGS]') AND type in (N'U')) DROP TABLE [TRAINING].[BINARY_STRINGS] GO -- Create new table CREATE TABLE [TRAINING].[BINARY_STRINGS] ( ID1 TINYINT NOT NULL, -- 1 BYTE BIN1 BINARY(8), -- 8 BYTES BIN2 VARBINARY(4), -- DATA SIZE (1..4) + 2 BYTES OVERHEAD BIN3 VARBINARY(MAX), -- OUT OF ROW STORAGE (0 .. 2 ^ 31) BYTES BIN4 IMAGE -- DITTO - DEPRECIATED IN FUTURE RELEASES ); GO </span> |
Many people use binary fields to store documents such as PDF files. Due to the size of these files, I am going to make adjustments to increase the data and log files associated with the [BASIC] database.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 |
<span style="color: #008000; font-size: small;">-- -- 2 - Alter data & log files settings -- -- Select the correct database USE BASIC GO -- Switch owner to system admin ALTER AUTHORIZATION ON DATABASE::BASIC TO SA; GO -- Change data size to load pdf ALTER DATABASE BASIC MODIFY FILE (NAME = BASIC_DAT, SIZE = 10 MB, MAXSIZE = 1000 MB, FILEGROWTH = 10 MB); GO -- Change log size to load pdf ALTER DATABASE BASIC MODIFY FILE (NAME = BASIC_DAT, SIZE = 10 MB, MAXSIZE = 1000 MB, FILEGROWTH = 10 MB); GO </span> |
We should test the newly created table by loading some binary data.
1 2 3 4 5 6 7 8 9 10 11 12 |
<span style="color: #008000; font-size: small;">-- -- 3 - Add one row with some simple binary data -- INSERT INTO [TRAINING].[BINARY_STRINGS] VALUES (44, 0x03A0202F41100819, 0xA02F1019, 0x0, 0x0) GO -- Return the data from the table SELECT * FROM [TRAINING].[BINARY_STRINGS]; GO </span> |
To make this test more complex, I found a E-book version of Arthur Conan Doyle’s Sherlock Holmes adventures.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
<span style="color: #008000; font-size: small;">-- -- 4 - Update row with Sherlock Holmes Adventures (pdf) -- -- Update test table with data file UPDATE [TRAINING].[BINARY_STRINGS] SET BIN3 = (SELECT * FROM OPENROWSET(BULK N'C:\TEMP\SHERLOCK-HOLMES.PDF', SINGLE_BLOB) AS FREE_EBOOK), BIN4 = (SELECT * FROM OPENROWSET(BULK N'C:\TEMP\SHERLOCK-HOLMES.PDF', SINGLE_BLOB) AS FREE_EBOOK) WHERE ID1 = 44; GO -- Return the data from the table SELECT * FROM [TRAINING].[BINARY_STRINGS]; GO </span> |
To complete this example, what would a database be if you could not extract the PDF document when it was needed? We will first need to enable to command shell extended stored procedure using the code below.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 |
<span style="color: #008000; font-size: small;">-- -- 5 - Enable the command shell -- -- To allow advanced options to be changed. EXEC sp_configure 'show advanced options', 1 GO -- To update the currently configured value for advanced options. RECONFIGURE GO -- To enable the feature. EXEC sp_configure 'xp_cmdshell', 1 GO -- To update the currently configured value for this feature. RECONFIGURE GO </span> |
The only utility that can export binary objects is the bcp.exe program. The TSQL code below will create and execute a command shell script to extract the document.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 |
<span style="color: #008000; font-size: small;">-- -- 6 - Export blob using BCP -- -- Local variables DECLARE @VAR_CMD VARCHAR(1024); DECLARE @VAR_RETURN INT; -- Build the string SET @VAR_CMD = ''; SET @VAR_CMD = @VAR_CMD + 'BCP "SELECT [BIN3] FROM [BASIC].[TRAINING].[BINARY_STRINGS] WHERE ID1=44" '; SET @VAR_CMD = @VAR_CMD + 'QUERYOUT C:\TEMP\BIN3.PDF -T -f C:\TEMP\SHERLOCK-HOLMES.FMT -S ' + @@SERVERNAME; -- Show the string PRINT @VAR_CMD; -- Execute the command EXEC @VAR_RETURN = MASTER..xp_cmdshell @VAR_CMD; -- Did not work IF (@VAR_RETURN > 0) BEGIN RAISERROR ('Unable to export the pdf file to os system. Please check the file system for issues.', 16, 1) WITH LOG; END GO </span> |
The screen shot below shows the execution of the BCP program exporting the binary document to disk.
As a database designer, you should always question the components that make up your database.
One question that you might have is ‘What is the maximum number of bytes that a row can have?’. This is important because data is stored in the *.MDF or *.NDF files as pages ~ 8k. Since a page can only save 8060 bytes, you can figure out how many records can fit on a page and how many bytes are wasted space.
The following code uses the sys.columns table to count the number of fields and calculate the maximum row size.
1 2 3 4 5 6 7 8 9 10 11 |
<span style="color: #008000; font-size: small;">-- -- Maximum row length (num cols, max bytes) SELECT OBJECT_NAME (c.object_id) tablename, COUNT (1) nr_columns, SUM (c. max_length) maxrowlength FROM sys.columns AS c WHERE OBJECT_NAME(c.object_id) = 'DATE_N_TIME' GROUP BY OBJECT_NAME (c.object_id) ORDER BY OBJECT_NAME (c.object_id); </span> |
We can see that 5 columns in the table have a maximum record length of 28 bytes. This information is misleading since it does not account for the out of row data. It only counts the first three columns. The [sp_spaceused] stored procedure shows us that 1 index page and many data pages have been allocated for the table.
1 2 3 |
<span style="color: #008000; font-size: small;">-- Real life numbers (pages/extents) EXEC sp_spaceused 'TRAINING.BINARY_STRINGS'; </span> |
We can query the dynamic management view named index physical stats to the an accurate number of data pages in the heap. A heap is a table without an index. In our example, there are 1297 pages allocated for the documents.
1 2 3 4 5 6 |
<span style="color: #008000; font-size: small;">-- Valid for SQL Server 2005 + SELECT s.database_id, s.object_id, s.index_type_desc, s.page_count, s.record_count, s.min_record_size_in_bytes, s.max_record_size_in_bytes FROM sys.dm_db_index_physical_stats (DB_ID(N'BASIC'), OBJECT_ID(N'TRAINING.BINARY_STRINGS'), NULL, NULL , 'DETAILED') AS s; GO </span> |
Last but not least, the sp_help stored procedure displays the details of the table. This includes many different settings that can be choosen as a DDL designer such as computed column, field length, nullabilty, and collation to mention a few.
1 2 3 4 |
<span style="color: #008000; font-size: small;">-- Display size details of table EXEC sp_help 'TRAINING.BINARY_STRINGS'; GO </span> |
In summary, binary strings can be used to save bit patterns from your application specific integrated circuit or your favorite music file (MP3) in the database. Next time, I will be going over types classified as other.