{"id":1421,"date":"2012-01-18T22:19:59","date_gmt":"2012-01-18T22:19:59","guid":{"rendered":"http:\/\/craftydba.com\/?p=1421"},"modified":"2012-02-12T02:46:17","modified_gmt":"2012-02-12T02:46:17","slug":"full-text-search-part-1","status":"publish","type":"post","link":"https:\/\/craftydba.com\/?p=1421","title":{"rendered":"Full-Text Search &#8211; Part 1"},"content":{"rendered":"<p>Today, I want to go over why text searching for patterns is an expensive query operation.  <\/p>\n<p>I am going to create a [WILD LIFE] database that contains a table called [ANIMALS].  A identity column called [ID] will be populated by the system automatically and a [NAME] column will be loaded with 445 animals names that I grabbed from WIKIPEDIA.  The idea is to search for animals that have a root word in common.<\/p>\n<p>The snippet below creates the database and table.  The full SQL script including insert statements is at the end of this article.<\/p>\n<pre><span style=\"color: #008000;\">-- Create a very basic database\r\nCREATE DATABASE WILDLIFE;\r\nGO\r\n\r\n-- Use the database\r\nUSE [WILDLIFE]\r\nGO\r\n\r\n-- Create the animals table\r\nCREATE TABLE ANIMALS\r\n(\r\n    ID INT NOT NULL IDENTITY (1, 1),\r\n    NAME VARCHAR(200) NOT NULL\r\n)\r\nGO\r\n<\/span><\/pre>\n<p><\/P><\/p>\n<p>I am going to search for all animals that contain the word fly.  We are going to look at how the query analyzer creates different cost based execution plans.  <\/p>\n<p>For a given plan, we are going to look at the time in milliseconds (<a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms190287.aspx\">SET STATISTICS TIME<\/a>) and the io counts (<a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms184361.aspx\">SET STATISTICS IO<\/a>). <\/p>\n<p>To get a good test, we should write dirty pages to disk (<a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms188748.aspx\">CHECKPOINT<\/a>), drop clean buffers from memory (<a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms187762.aspx\">DBCC DROPCLEANBUFFERS<\/a>) and free the procedure cache of any query plans (<a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms174283.aspx\">DBCC FREEPROCCAHE<\/a>).  These commands should be completed before executing the query.<\/p>\n<p>The sql snippet below does just that.<\/p>\n<pre><span style=\"color: #008000;\">-- Show time & i\/o\r\nSET STATISTICS TIME ON\r\nSET STATISTICS IO ON\r\nGO\r\n\r\n-- Remove clean buffers & clear plan cache\r\nCHECKPOINT \r\nDBCC DROPCLEANBUFFERS \r\nDBCC FREEPROCCACHE\r\nGO\r\n<\/span><\/pre>\n<p><\/P><\/p>\n<p>Here is the query to select all records that have the word &#8216;fly&#8217; in the name.<\/p>\n<pre><span style=\"color: #008000;\">-- Select everything with word 'fly'\r\nSELECT * FROM dbo.ANIMALS WHERE NAME LIKE '%FLY%'\r\nGO\r\n<\/span><\/pre>\n<\/p>\n<p>A FULL TABLE <a href=\"http:\/\/blog.sqlauthority.com\/2007\/03\/30\/sql-server-index-seek-vs-index-scan-table-scan\/\">SCAN <\/a>is performed because no index was defined.  The data is stored as a unordered <a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms188270.aspx\">HEAP<\/a> structure.  This execution plan is very expensive when the table contains 1 Million rows.<\/p>\n<p><a href=\"https:\/\/craftydba.com\/wp-content\/uploads\/2012\/01\/full-table-scan.jpg\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/craftydba.com\/wp-content\/uploads\/2012\/01\/full-table-scan.jpg\" alt=\"\" title=\"full-table-scan\" width=\"744\" height=\"200\" class=\"alignnone size-full wp-image-1436\" srcset=\"https:\/\/craftydba.com\/wp-content\/uploads\/2012\/01\/full-table-scan.jpg 744w, https:\/\/craftydba.com\/wp-content\/uploads\/2012\/01\/full-table-scan-300x80.jpg 300w\" sizes=\"auto, (max-width: 744px) 100vw, 744px\" \/><\/a><\/p>\n<p>We can also see 8 records returned in the result set after 1 scans and 2 logical reads.<\/p>\n<p><a href=\"https:\/\/craftydba.com\/wp-content\/uploads\/2012\/01\/first-time-n-io.jpg\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/craftydba.com\/wp-content\/uploads\/2012\/01\/first-time-n-io.jpg\" alt=\"\" title=\"first-time-n-io\" width=\"486\" height=\"568\" class=\"alignnone size-full wp-image-1443\" srcset=\"https:\/\/craftydba.com\/wp-content\/uploads\/2012\/01\/first-time-n-io.jpg 486w, https:\/\/craftydba.com\/wp-content\/uploads\/2012\/01\/first-time-n-io-256x300.jpg 256w\" sizes=\"auto, (max-width: 486px) 100vw, 486px\" \/><\/a><\/p>\n<p>Let us create a nonclustered index (<a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms188783.aspx\">CREATE INDEX<\/a>) on the [NAME] column.  Hopefully, the query optimizer will use the index.  The data is stored in <a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms177484.aspx\">NONCLUSTERED INDEX<\/a> structure in which leaf nodes are not data pages.<\/p>\n<p>Oh no, we can see the same plan being used and the same execution results.<\/p>\n<pre><span style=\"color: #008000;\">-- Add index on name (non-clustered)\r\nCREATE NONCLUSTERED INDEX IDX_ANIMAL_NAME ON DBO.ANIMALS(NAME);\r\nGO\r\n\r\n-- Select everything with word 'fly'\r\nSELECT * FROM dbo.ANIMALS WHERE NAME LIKE '%FLY%'\r\nGO\r\n\r\n-- Drop index on name\r\nDROP INDEX ANIMALS.IDX_ANIMAL_NAME;\r\nGO\r\n<\/span><\/pre>\n<\/p>\n<p>Lets drop the index and create a clustered index on [NAME].  Again, we are hoping the query optimizer will use the index.  The data is stored in <a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms177443.aspx\">CLUSTERED INDEX<\/a> structure in which leaf nodes are data pages.<\/p>\n<pre><span style=\"color: #008000;\">-- Add index on name (clustered)\r\nCREATE CLUSTERED INDEX IDX_ANIMAL_NAME ON DBO.ANIMALS(NAME);\r\nGO\r\n\r\n-- Select everything with word 'fly'\r\nSELECT * FROM dbo.ANIMALS WHERE NAME LIKE '%FLY%'\r\nGO\r\n\r\n-- Drop index on name\r\nDROP INDEX ANIMALS.IDX_ANIMAL_NAME;\r\nGO\r\n<\/span><\/pre>\n<\/p>\n<p>This time, we do get the query optimizer to use the index but it is not optimal since it is a FULL CLUSTERED INDEX SCAN.  Again, this is very expensive plan when the table contains alot of rows.  A <a href=\"http:\/\/blog.sqlauthority.com\/2007\/03\/30\/sql-server-index-seek-vs-index-scan-table-scan\/\">SEEK<\/a> would be a better operation.<\/p>\n<p><a href=\"https:\/\/craftydba.com\/wp-content\/uploads\/2012\/01\/clustered-index-scan.jpg\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/craftydba.com\/wp-content\/uploads\/2012\/01\/clustered-index-scan.jpg\" alt=\"\" title=\"clustered-index-scan\" width=\"763\" height=\"426\" class=\"alignleft size-full wp-image-1444\" srcset=\"https:\/\/craftydba.com\/wp-content\/uploads\/2012\/01\/clustered-index-scan.jpg 763w, https:\/\/craftydba.com\/wp-content\/uploads\/2012\/01\/clustered-index-scan-300x167.jpg 300w\" sizes=\"auto, (max-width: 763px) 100vw, 763px\" \/><\/a><\/p>\n<p>The main issue with this query is that we are looking for a pattern in a string.  Indexing basically orders (logical or physical) the data given a sort collation.  However, we are searching for part of a word.  Thus, every [NAME] field needs to be examined to look for the &#8216;fly&#8217; pattern.<\/p>\n<p>The SQL snippet below turns off the messaging for io and time since we are done for now.<\/p>\n<pre><span style=\"color: #008000;\">-- Hide time & i\/o\r\nSET STATISTICS TIME OFF\r\nSET STATISTICS IO OFF\r\nGO\r\n<\/span><\/pre>\n<\/p>\n<p>Next time, I am going to introduce setting up a  <a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms142571.aspx\">full-text index<\/a> using the GUI.  Full-text indexes are great for searching on a single word or phrase (and optionally ranking the result set), searching on a word or phrase close to another word or phrase, and\/or searching on synonymous forms of a specific word.<\/p>\n<p>While this speeds up our searching, we will see that it does not completely solve the problem at hand.<\/p>\n<p>Sample Code:<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;  <a href=\"https:\/\/craftydba.com\/wp-content\/uploads\/2012\/01\/animals.txt\">TSQL<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Today, I want to go over why text searching for patterns is an expensive query operation. I am going to create a [WILD LIFE] database that contains a table called [ANIMALS]. A identity column called [ID] will be populated by the system automatically and a [NAME] column will be loaded with 445 animals names that I grabbed from WIKIPEDIA. The idea is to search for animals that have a root word in common. The snippet below creates the database and table. The full SQL script including insert statements is at&hellip;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[4],"tags":[311,309,308,117,312,313,310,12,15,307,306,28,29],"class_list":["post-1421","post","type-post","status-publish","format-standard","hentry","category-db-admin","tag-checkpoint","tag-create-clustered-index","tag-create-nonclustered-index","tag-database-administrator","tag-dbcc-dropcleanbuffers","tag-dbcc-freeproccache","tag-drop-index","tag-free-code","tag-john-f-miner-iii","tag-set-statistics-io","tag-set-statistics-time","tag-sql-server","tag-tsql"],"_links":{"self":[{"href":"https:\/\/craftydba.com\/index.php?rest_route=\/wp\/v2\/posts\/1421","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/craftydba.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/craftydba.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/craftydba.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/craftydba.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1421"}],"version-history":[{"count":0,"href":"https:\/\/craftydba.com\/index.php?rest_route=\/wp\/v2\/posts\/1421\/revisions"}],"wp:attachment":[{"href":"https:\/\/craftydba.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1421"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/craftydba.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1421"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/craftydba.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1421"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}