A Microsoft SQL Server database is composed of at least two operating system files: a primary data file and a transaction log file. I am going to be talking about how to define databases with multiple files tonight.
There are three types of operating system files.
PRIMARY data file is the starting point of the database and is a pointer to any other data files that are allocated. A extension of .mdf is used to identify these files. This data file can contains system tables and user defined objects as well as data.
SECONDARY data files can be associated with the database. These files are used to spread the data files between LOGICAL disk drives to achieve high input/output rates. An extension of .ndf is used to identify these files. These data files contain only data.
LOG file(s) keep track of database transactions. Every database has at least one defined. It is used by recovery operations when a database becomes corrupt. Since the logging process is single threaded, I suggest to use one log file.
However, if a large SQL transaction is chewing up log space, you might have to allocate another log file until the process is completed. At that point, the transaction log can be backed up and truncated. The temporary log file can be deleted.
Each data file type has a logical name and physical file location. The logical name is used as a parameter to system stored procedures such as DBCC.
Filegroups are used to organize a bunch of files into one object. Each database comes with a default PRIMARY filegroup. Additional USER DEFINED filegroups can be added.
Today’s storage systems are built using RAID controllers and/or SAN appliance technologies. Thus, an allocated LOGICAL disk drive is really spread across several physical disks. The idea is to get as many drives involved when the database reads and writes data pages. The best way to do this is by having your storage administrator carve out N+1 drives for your use. The N drives will hold a single secondary data file. These data files can be grouped into a filegroup called MULTIFILE in our example. The N+1 drive will be used to hold the log file. The best situation is to have this log file on its own RAID 0 or 1 disk.
Since I do not have a RAID controller on my laptop, I can only show you the CREATE DATABASE statement below. I am going to re-use the [WILDLIFE] database that I have talked about in the past. This example assumes that you have a MSSQL directory on each drive. With either a DATA or LOG sub-directory.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 |
-- -- 1 - Create WILD LIFE database using 5 LOGICAL drives -- -- Define the db with file groups CREATE DATABASE [WILDLIFE] ON PRIMARY ( NAME = 'WildLifeData', FILENAME = 'C:\MSSQL\DATA\Wild_Life_Data.mdf', SIZE = 5MB, MAXSIZE = 10MB, FILEGROWTH = 500KB), FILEGROUP MULTIFILE ( NAME = 'Multi1_Data', FILENAME = 'C:\MSSQL\DATA\Multi1_Data.ndf', SIZE = 5MB, MAXSIZE = 100MB, FILEGROWTH = 10MB), ( NAME = 'Multi2_Data', FILENAME = 'C:\MSSQL\DATA\Multi2_Data.ndf', SIZE = 5MB, MAXSIZE = 100MB, FILEGROWTH = 10MB), ( NAME = 'Multi3_Data', FILENAME = 'C:\MSSQL\DATA\Multi3_Data.ndf', SIZE = 5MB, MAXSIZE = 100MB, FILEGROWTH = 10MB), ( NAME = 'Multi4_Data', FILENAME = 'C:\MSSQL\DATA\Multi4_Data.ndf', SIZE = 5MB, MAXSIZE = 100MB, FILEGROWTH = 10MB), ( NAME = 'Multi5_Data', FILENAME = 'C:\MSSQL\DATA\Multi5_Data.ndf', SIZE = 5MB, MAXSIZE = 100MB, FILEGROWTH = 10MB) LOG ON ( NAME = 'WildLifeLog', FILENAME = 'C:\MSSQL\LOG\Wild_Life_Log.ldf', SIZE = 12MB, MAXSIZE = 120MB, FILEGROWTH = 10MB) GO -- Switch owner to system admin ALTER AUTHORIZATION ON DATABASE::[WILDLIFE] TO SA; GO |
Only database objects that store data can take advantage of the new filegroup. Both TABLES and INDEXES can be defined on our new user defined file group.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 |
-- -- 2 - Create table & index using MULTIFILE group -- -- Use the database USE [WILDLIFE] GO -- Create the animals table CREATE TABLE ANIMALS ( ID INT NOT NULL IDENTITY (1, 1), NAME VARCHAR(200) NOT NULL, CATEGORY INT DEFAULT (0) ) ON [MULTIFILE] GO -- Add index on name (non-clustered) CREATE CLUSTERED INDEX IDX_ANIMAL_CATEGORY ON DBO.ANIMALS(CATEGORY) ON [MULTIFILE]; GO |
Looking at the table and index storage properties in SQL Server Management Studio (SSMS), we can see that they do exist on the MULTIFILE group.
The user defined FUNCTION and TRIGGER for cataloging the ANIMAL names will be stored in the PRIMARY file group on the PRIMARY data file.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 |
-- -- 3 - Create a function, map name -> code -- CREATE FUNCTION UFN_MY_CATEGORIES(@NAME VARCHAR(200)) RETURNS INT AS BEGIN -- Default value DECLARE @RETVAL INT = 0; -- 2 POWER 0 IF (LOWER(@NAME) LIKE '%fly%') SET @RETVAL = @RETVAL + 1; -- 2 POWER 1 IF (LOWER(@NAME) LIKE '%pig%') SET @RETVAL = @RETVAL + 2; -- 2 POWER 2 IF (LOWER(@NAME) LIKE '%bat%') SET @RETVAL = @RETVAL + 4; -- 2 POWER 3 IF (LOWER(@NAME) LIKE '%dog%') SET @RETVAL = @RETVAL + 8; -- RETURN THE CODE RETURN(@RETVAL); END; -- -- Create a trigger to categorize name -- CREATE TRIGGER TRG_CATEGORIZE_ANIMALS ON ANIMALS AFTER INSERT, UPDATE AS BEGIN -- Detect inserts (1 .. n) IF EXISTS (SELECT * FROM inserted) AND NOT EXISTS(SELECT * FROM deleted) BEGIN -- Update records UPDATE ANIMALS SET CATEGORY = dbo.UFN_MY_CATEGORIES(I.NAME) FROM ANIMALS A JOIN INSERTED I ON A.ID = I.ID; -- Debugging PRINT 'INSERT DETECTED, CREATED CATEGORY'; END; -- Detect deletes (1 .. n) IF EXISTS (SELECT * FROM deleted) AND NOT EXISTS(SELECT * FROM inserted) BEGIN -- Debugging PRINT 'DELETE DETECTED, NOTHING TO DO'; END; -- Detected updates (1 .. n) IF EXISTS (SELECT * FROM inserted) AND EXISTS(SELECT * FROM deleted) BEGIN -- Update records UPDATE ANIMALS SET CATEGORY = dbo.UFN_MY_CATEGORIES(I.NAME) FROM ANIMALS A JOIN INSERTED I ON A.ID = I.ID; -- Debugging PRINT 'UPDATE DETECTED, ADJUST CATEGORY'; END; END GO |
The rest of the sample code to insert data and select from the table is enclosed for completeness.
So what have we learnt tonight?
First, every database has a primary data file and at least one log file. Additional secondary files can be defined across multiple disks and grouped into a single file group. This user defined filegroup can be used to define all tables and indexes. By having multiple path ways to the RAID controller, hopefully the file I/O will be spread across the whole bank of disk drives. This practice should increase the performance of the database. Last but not least, data files and log files should not be defined on the same disk or even controller.