{"id":1664,"date":"2012-02-12T03:54:14","date_gmt":"2012-02-12T03:54:14","guid":{"rendered":"http:\/\/craftydba.com\/?p=1664"},"modified":"2017-10-12T01:20:58","modified_gmt":"2017-10-12T01:20:58","slug":"full-text-search-part-3","status":"publish","type":"post","link":"https:\/\/craftydba.com\/?p=1664","title":{"rendered":"Full Text Search &#8211; Part 3"},"content":{"rendered":"<p>Tonight, I want to revisit the [WILD LIFE] database that we were using last time.  The main goal is to correctly search the [ANIMALS] table returning all [NAME]&#8217;s that contain the word fly.  We want the query plan associated with our query to use a index to speed up overall execution time.<\/p>\n<p>The first solution to this business problem resulted in <span style=\"color: #0033ff;\">FULL TABLE SCANS<\/span> or  <span style=\"color: #0033ff;\">CLUSTERED INDEX SCANS<\/span>.  Please see the desciptions of <a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms175913.aspx\">graphical icons<\/a> that can show up in a query plan.  Since every record in the table has to be parsed, this was not a good scalable solution.  <\/p>\n<p>The second solution lead us to believe that a full text index was going to save the day.  The TSQL query used an  <span style=\"color: #0033ff;\">INDEX SEEK<\/span> to retrieve data; However, the results were not what was expected.<\/p>\n<p>The third time is a charm.  I suggest that we categorize the data during the load process.  This can be done with a ETL tool or DML table TRIGGERS.  I am going to option for the later since the amount of data we are talking about<br \/>\nis relatively samll.<\/p>\n<p>Drop and recreate the [WILD LIFE] database from scratch.  We are going to add a column called [CATERGORY] to the table.  It can be defined as an INT or a CHAR.  When using a integer, store matches as powers of 2.  The bitwise or (&#038;) function can be used to retrive any combination of words you are categorizing.  For larger sets of values, use a character array.  X&#8217; and O&#8217;s and be used to mark the existence of a word.  The new table is shown below.<\/p>\n<pre class=\"lang:TSQL theme:familiar mark:1,2-3\" title=\"tsql - animals table &#038; clustered index\">\r\n-- Create the animals table\r\nCREATE TABLE ANIMALS\r\n(\r\nID INT NOT NULL IDENTITY (1, 1),\r\nNAME VARCHAR(200) NOT NULL,\r\nCATEGORY INT DEFAULT (0)\r\n)\r\nGO\r\n\r\n-- Add index on name (clustered)\r\nCREATE CLUSTERED INDEX IDX_ANIMAL_CATEGORY ON DBO.ANIMALS(CATEGORY);\r\nGO\r\n<\/pre>\n<p><\/P><\/p>\n<p>A user defined <a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms186755.aspx\">FUNCTION<\/a> is going to be designed to take a [NAME] as an input and return a [CATEGORY] value as a result.  The FUNCTION will parse the [NAME] looking for 4 key animals.<\/p>\n<pre class=\"lang:TSQL theme:familiar mark:1,2-3\" title=\"tsql - categorizing function\">\r\n-- Create a function, map name -> code\r\nCREATE FUNCTION UFN_MY_CATEGORIES(@NAME VARCHAR(200))\r\nRETURNS INT AS\r\nBEGIN\r\n\r\n   -- Default value\r\n   DECLARE @RETVAL INT = 0;\r\n   \r\n   -- 2 POWER 0\r\n   IF (LOWER(@NAME) LIKE '%fly%')\r\n       SET @RETVAL = @RETVAL + 1;\r\n\r\n   -- 2 POWER 1\r\n   IF (LOWER(@NAME) LIKE '%pig%')\r\n       SET @RETVAL = @RETVAL + 2;\r\n\r\n   -- 2 POWER 2\r\n   IF (LOWER(@NAME) LIKE '%bat%')\r\n       SET @RETVAL = @RETVAL + 4;\r\n\r\n   -- 2 POWER 3\r\n   IF (LOWER(@NAME) LIKE '%dog%')\r\n       SET @RETVAL = @RETVAL + 8;\r\n   \r\n   -- RETURN THE CODE\r\n   RETURN(@RETVAL);\r\n   \r\nEND;\r\n<\/pre>\n<p><\/P><\/p>\n<p>A user defined <a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms189799.aspx\">TRIGGER<\/a> is going to be added to the [ANIMALS] table.  When INSERTS or UPDATES happen to the table, the data will be categorized.  Esentially, we are moving the text parsing from the QUERY execution to the DATA loading phase.  This technique is only good if the data changes seldomly.  Large record changes can make this choice worse than the first solution.  Since animal names do not change that often, we have a good solution.  <\/p>\n<pre class=\"lang:TSQL theme:familiar mark:1,2-3\" title=\"tsql - insert &#038; update trigger for category\">\r\n-- Create a trigger to categorize name\r\nCREATE TRIGGER TRG_CATEGORIZE_ANIMALS ON ANIMALS\r\nAFTER INSERT, UPDATE AS\r\nBEGIN\r\n\r\n    -- Detect inserts (1 .. n)\r\n    IF EXISTS (SELECT * FROM inserted) AND NOT EXISTS(SELECT * FROM deleted) \r\n    BEGIN     \r\n        -- Update records\r\n        UPDATE ANIMALS\r\n        SET CATEGORY = dbo.UFN_MY_CATEGORIES(I.NAME)\r\n        FROM ANIMALS A JOIN INSERTED I ON A.ID = I.ID;\r\n        \r\n        -- Debugging    \r\n        PRINT 'INSERT DETECTED, CREATED CATEGORY';\r\n    END; \r\n    \r\n    -- Detect deletes (1 .. n)\r\n    IF EXISTS (SELECT * FROM deleted) AND NOT EXISTS(SELECT * FROM inserted) \r\n    BEGIN\r\n        -- Debugging    \r\n        PRINT 'DELETE DETECTED, NOTHING TO DO';\r\n    END;\r\n        \r\n    -- Detected updates (1 .. n)\r\n    IF EXISTS (SELECT * FROM inserted) AND EXISTS(SELECT * FROM deleted)\r\n    BEGIN     \r\n        -- Update records\r\n        UPDATE ANIMALS\r\n        SET CATEGORY = dbo.UFN_MY_CATEGORIES(I.NAME)\r\n        FROM ANIMALS A JOIN INSERTED I ON A.ID = I.ID;\r\n        \r\n        -- Debugging    \r\n        PRINT 'UPDATE DETECTED, ADJUST CATEGORY';\r\n    END; \r\n    \r\nEND\r\nGO\r\n<\/pre>\n<p><\/P><\/p>\n<p>Please run the rest of the <a href='https:\/\/craftydba.com\/wp-content\/uploads\/2012\/02\/animals-example3.txt'>full script<\/a> to load the table with data.<\/p>\n<p>I am going to search for all animals that contain the word fly.  We are going to look at how the query analyzer creates different cost based execution plans.  <\/p>\n<p>For a given plan, we are going to look at the time in milliseconds (<a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms190287.aspx\">SET STATISTICS TIME<\/a>) and the io counts (<a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms184361.aspx\">SET STATISTICS IO<\/a>). <\/p>\n<p>To get a good test, we should write dirty pages to disk (<a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms188748.aspx\">CHECKPOINT<\/a>), drop clean buffers from memory (<a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms187762.aspx\">DBCC DROPCLEANBUFFERS<\/a>) and free the procedure cache of any query plans (<a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms174283.aspx\">DBCC FREEPROCCAHE<\/a>).  These commands should be completed before executing the query.<\/p>\n<p>The sql snippet below does just that.<\/p>\n<pre class=\"lang:TSQL theme:familiar mark:1,2-3\" title=\"show time &#038; i\/o - start with clean buffer &#038; cache\">\r\n-- Show time & i\/o\r\nSET STATISTICS TIME ON\r\nSET STATISTICS IO ON\r\nGO\r\n\r\n-- Remove clean buffers & clear plan cache\r\nCHECKPOINT \r\nDBCC DROPCLEANBUFFERS \r\nDBCC FREEPROCCACHE\r\nGO\r\n<\/pre>\n<p><\/P><\/p>\n<p>Here is the query to select all records that have the word &#8216;fly&#8217; in the name.<\/p>\n<pre class=\"lang:TSQL theme:familiar mark:1,2-3\" title=\"tsql - search for the word fly\">\r\n-- Select everything with word 'fly'\r\nSELECT * FROM dbo.ANIMALS WHERE NAME LIKE '%FLY%'\r\nGO\r\n<\/pre>\n<\/p>\n<p><a href=\"https:\/\/craftydba.com\/wp-content\/uploads\/2012\/02\/categorized-results.jpg\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/craftydba.com\/wp-content\/uploads\/2012\/02\/categorized-results.jpg\" alt=\"\" title=\"categorized-results\" width=\"257\" height=\"208\" class=\"aligncenter size-full wp-image-1681\" \/><\/a><\/p>\n<p>Looking at the query results, query plan and io statistics, we have a <span style=\"color: #0033ff;\">CLUSTERED INDEX SEEK<\/span> that returns 8 correct results with a 18 ms compile time and a 1 ms run time.  <\/p>\n<p><a href=\"https:\/\/craftydba.com\/wp-content\/uploads\/2012\/02\/categorized-query-plan.jpg\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/craftydba.com\/wp-content\/uploads\/2012\/02\/categorized-query-plan.jpg\" alt=\"\" title=\"categorized-query-plan\" width=\"533\" height=\"444\" class=\"aligncenter size-full wp-image-1684\" srcset=\"https:\/\/craftydba.com\/wp-content\/uploads\/2012\/02\/categorized-query-plan.jpg 533w, https:\/\/craftydba.com\/wp-content\/uploads\/2012\/02\/categorized-query-plan-300x249.jpg 300w\" sizes=\"auto, (max-width: 533px) 100vw, 533px\" \/><\/a><\/p>\n<p>In summary, categorizing the data ahead of time resulted in the quickest solution yet.  Next time, I want to revisit Full Text Indexing (FTI) to discuss how it works internally.  This talk will go over all the system stored procedures you need to know when working with FTI&#8217;s.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Tonight, I want to revisit the [WILD LIFE] database that we were using last time. The main goal is to correctly search the [ANIMALS] table returning all [NAME]&#8217;s that contain the word fly. We want the query plan associated with our query to use a index to speed up overall execution time. The first solution to this business problem resulted in FULL TABLE SCANS or CLUSTERED INDEX SCANS. Please see the desciptions of graphical icons that can show up in a query plan. Since every record in the table has&hellip;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[4],"tags":[318,317,12,358,15,28,359,29],"class_list":["post-1664","post","type-post","status-publish","format-standard","hentry","category-db-admin","tag-create-function","tag-create-trigger","tag-free-code","tag-full-text-searching","tag-john-f-miner-iii","tag-sql-server","tag-text-parsing","tag-tsql"],"_links":{"self":[{"href":"https:\/\/craftydba.com\/index.php?rest_route=\/wp\/v2\/posts\/1664","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/craftydba.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/craftydba.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/craftydba.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/craftydba.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1664"}],"version-history":[{"count":0,"href":"https:\/\/craftydba.com\/index.php?rest_route=\/wp\/v2\/posts\/1664\/revisions"}],"wp:attachment":[{"href":"https:\/\/craftydba.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1664"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/craftydba.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1664"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/craftydba.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1664"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}