{"id":1629,"date":"2012-02-09T19:53:17","date_gmt":"2012-02-09T19:53:17","guid":{"rendered":"http:\/\/craftydba.com\/?p=1629"},"modified":"2017-10-12T01:22:45","modified_gmt":"2017-10-12T01:22:45","slug":"full-text-search-part-2","status":"publish","type":"post","link":"https:\/\/craftydba.com\/?p=1629","title":{"rendered":"Full-Text Search &#8211; Part 2"},"content":{"rendered":"<p>I want to revisit the [WILD LIFE] database that we were using last time.  First goal is to create a <a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms142571.aspx\">full text index<\/a> (FTI) using the SQL Server Management Studio (<a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms174173.aspx\">SSMS<\/a>) graphical user interface (GUI).  Second goal is to have our SELECT query perform a INDEX SEEK using the FTI.<\/p>\n<p>We have the existing [ANIMALS] table with a identity column called [ID] and a data column called [NAME] which is loaded with 445 animals names that I grabbed from WIKIPEDIA.  The idea is to search for animals that have &#8216;fly&#8217; root word in common.<\/p>\n<p>To start the Full Text Index Wizard, right click on the [ANIMALS] table name and select &#8216;Define Full Text Index&#8217;.<\/p>\n<p><a href=\"https:\/\/craftydba.com\/wp-content\/uploads\/2012\/02\/right-click-menu-items22.jpg\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/craftydba.com\/wp-content\/uploads\/2012\/02\/right-click-menu-items22.jpg\" alt=\"\" title=\"right-click-menu-items2\" width=\"485\" height=\"497\" class=\"aligncenter size-full wp-image-1662\" srcset=\"https:\/\/craftydba.com\/wp-content\/uploads\/2012\/02\/right-click-menu-items22.jpg 485w, https:\/\/craftydba.com\/wp-content\/uploads\/2012\/02\/right-click-menu-items22-292x300.jpg 292w\" sizes=\"auto, (max-width: 485px) 100vw, 485px\" \/><\/a><\/p>\n<p>The splash screen of the wizard list a summary of steps that will happen to define a FTI.<\/p>\n<p><a href=\"https:\/\/craftydba.com\/wp-content\/uploads\/2012\/02\/wizard-page1.jpg\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/craftydba.com\/wp-content\/uploads\/2012\/02\/wizard-page1.jpg\" alt=\"\" title=\"wizard-page1\" width=\"525\" height=\"455\" class=\"aligncenter size-full wp-image-1637\" srcset=\"https:\/\/craftydba.com\/wp-content\/uploads\/2012\/02\/wizard-page1.jpg 525w, https:\/\/craftydba.com\/wp-content\/uploads\/2012\/02\/wizard-page1-300x260.jpg 300w\" sizes=\"auto, (max-width: 525px) 100vw, 525px\" \/><\/a><\/p>\n<p>The first step of the process is to select a unique index that will be associated with the FTI.  This association is neccessary for query processor to join with other tables.<\/p>\n<p><a href=\"https:\/\/craftydba.com\/wp-content\/uploads\/2012\/02\/wizard-page2.jpg\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/craftydba.com\/wp-content\/uploads\/2012\/02\/wizard-page2.jpg\" alt=\"\" title=\"wizard-page2\" width=\"527\" height=\"460\" class=\"aligncenter size-full wp-image-1639\" srcset=\"https:\/\/craftydba.com\/wp-content\/uploads\/2012\/02\/wizard-page2.jpg 527w, https:\/\/craftydba.com\/wp-content\/uploads\/2012\/02\/wizard-page2-300x261.jpg 300w\" sizes=\"auto, (max-width: 527px) 100vw, 527px\" \/><\/a><\/p>\n<p>The second step is to pick the column that you want to create a index on.  The column has to be of type text, xml or image.  Only the text data type is in scope for this discussion.  The [name] column is selected as the search column.  English is selected as the word breakerlanguage, whose word breakers and stemmers perform linguistic analysis on all full-text indexed data. <\/p>\n<p><a href=\"https:\/\/craftydba.com\/wp-content\/uploads\/2012\/02\/Wizard-Page3.jpg\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/craftydba.com\/wp-content\/uploads\/2012\/02\/Wizard-Page3.jpg\" alt=\"\" title=\"Wizard-Page3\" width=\"529\" height=\"465\" class=\"aligncenter size-full wp-image-1641\" srcset=\"https:\/\/craftydba.com\/wp-content\/uploads\/2012\/02\/Wizard-Page3.jpg 529w, https:\/\/craftydba.com\/wp-content\/uploads\/2012\/02\/Wizard-Page3-300x263.jpg 300w\" sizes=\"auto, (max-width: 529px) 100vw, 529px\" \/><\/a><\/p>\n<p>The third step is to define how the system is going to detect changes.  <\/p>\n<p>AUTOMATIC &#8211; SQL Server update the full-text index automatically<br \/>\nMANUAL &#8211; SQL Server tracks data changes.  DBA must appply the changes.<br \/>\nNONE &#8211; No tracking of data changes.  Full or incremental population has to be performed. <\/p>\n<p><a href=\"https:\/\/craftydba.com\/wp-content\/uploads\/2012\/02\/Wizard-Page4.jpg\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/craftydba.com\/wp-content\/uploads\/2012\/02\/Wizard-Page4.jpg\" alt=\"\" title=\"Wizard-Page4\" width=\"522\" height=\"464\" class=\"aligncenter size-full wp-image-1642\" srcset=\"https:\/\/craftydba.com\/wp-content\/uploads\/2012\/02\/Wizard-Page4.jpg 522w, https:\/\/craftydba.com\/wp-content\/uploads\/2012\/02\/Wizard-Page4-300x266.jpg 300w\" sizes=\"auto, (max-width: 522px) 100vw, 522px\" \/><\/a><\/p>\n<p>The fourth step is to specify the catalog name, the index filegroup and the stop list.  A full text index (FTI) can be case sensitive or insensitive.  I am going to specify [ANIMAL_NAMES] as the catalog name.<\/p>\n<p><a href=\"https:\/\/craftydba.com\/wp-content\/uploads\/2012\/02\/Wizard-Page5.jpg\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/craftydba.com\/wp-content\/uploads\/2012\/02\/Wizard-Page5.jpg\" alt=\"\" title=\"Wizard-Page5\" width=\"527\" height=\"462\" class=\"aligncenter size-full wp-image-1643\" srcset=\"https:\/\/craftydba.com\/wp-content\/uploads\/2012\/02\/Wizard-Page5.jpg 527w, https:\/\/craftydba.com\/wp-content\/uploads\/2012\/02\/Wizard-Page5-300x262.jpg 300w\" sizes=\"auto, (max-width: 527px) 100vw, 527px\" \/><\/a><\/p>\n<p>The fifth step specifies a schedule for the catalog maintenance.  Options such as frequency, time of day, and duration can be specified.  I am going to specify [UPDATE_ANIMALS] as the name of the schedule.<\/p>\n<p><a href=\"https:\/\/craftydba.com\/wp-content\/uploads\/2012\/02\/Wizard-Page6.jpg\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/craftydba.com\/wp-content\/uploads\/2012\/02\/Wizard-Page6.jpg\" alt=\"\" title=\"Wizard-Page6\" width=\"684\" height=\"588\" class=\"aligncenter size-full wp-image-1644\" srcset=\"https:\/\/craftydba.com\/wp-content\/uploads\/2012\/02\/Wizard-Page6.jpg 684w, https:\/\/craftydba.com\/wp-content\/uploads\/2012\/02\/Wizard-Page6-300x257.jpg 300w\" sizes=\"auto, (max-width: 684px) 100vw, 684px\" \/><\/a><\/p>\n<p>The sixth step specified how the FTI is updated.  Either a full or incremental population of the index can be performed.<\/p>\n<p><a href=\"https:\/\/craftydba.com\/wp-content\/uploads\/2012\/02\/Wizard-Page7.jpg\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/craftydba.com\/wp-content\/uploads\/2012\/02\/Wizard-Page7.jpg\" alt=\"\" title=\"Wizard-Page7\" width=\"675\" height=\"508\" class=\"aligncenter size-full wp-image-1645\" srcset=\"https:\/\/craftydba.com\/wp-content\/uploads\/2012\/02\/Wizard-Page7.jpg 675w, https:\/\/craftydba.com\/wp-content\/uploads\/2012\/02\/Wizard-Page7-300x225.jpg 300w\" sizes=\"auto, (max-width: 675px) 100vw, 675px\" \/><\/a><\/p>\n<p>The seventh step is to confirm all the selections before proceeding.<\/p>\n<p><a href=\"https:\/\/craftydba.com\/wp-content\/uploads\/2012\/02\/Wizard-Page8.jpg\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/craftydba.com\/wp-content\/uploads\/2012\/02\/Wizard-Page8.jpg\" alt=\"\" title=\"Wizard-Page8\" width=\"671\" height=\"512\" class=\"aligncenter size-full wp-image-1646\" srcset=\"https:\/\/craftydba.com\/wp-content\/uploads\/2012\/02\/Wizard-Page8.jpg 671w, https:\/\/craftydba.com\/wp-content\/uploads\/2012\/02\/Wizard-Page8-300x228.jpg 300w\" sizes=\"auto, (max-width: 671px) 100vw, 671px\" \/><\/a><\/p>\n<p>After a short time, the FTI creation process will be complete.  To verify the creation process, take a look at the Storage section of the [WILD LIFE] database to find the full text catalog.  Right click the name to see the properties of the catalog.<\/p>\n<p><a href=\"https:\/\/craftydba.com\/wp-content\/uploads\/2012\/02\/Storage-Full-Text-Catalog2.jpg\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/craftydba.com\/wp-content\/uploads\/2012\/02\/Storage-Full-Text-Catalog2.jpg\" alt=\"\" title=\"Storage-Full-Text-Catalog2\" width=\"952\" height=\"638\" class=\"aligncenter size-full wp-image-1648\" srcset=\"https:\/\/craftydba.com\/wp-content\/uploads\/2012\/02\/Storage-Full-Text-Catalog2.jpg 952w, https:\/\/craftydba.com\/wp-content\/uploads\/2012\/02\/Storage-Full-Text-Catalog2-300x201.jpg 300w\" sizes=\"auto, (max-width: 952px) 100vw, 952px\" \/><\/a><\/p>\n<p>Again, I am going to search for all animals that contain the word fly.  Unlike last time, I am interested in looking just at the query plan.  Time is a secondary concern.  I want to see if the FTI index will be used during query execution. I want to make sure the expected results are returned in the result set.<\/p>\n<p>There are four new TSQL keywords that can be used with full text catalogs \/ indexes.  The <a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms187787.aspx\">CONTAINS<\/a> function looks for exact word or suffix matches.  The <a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms176078.aspx\">FREETEXT<\/a> looks for fuzzy word phrase matches.  Both these functions have a variation called <a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms189760.aspx\">CONTAINSTABLE<\/a> and <a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms177652.aspx\">FREETEXTTABLE<\/a> respectively.  This variation returns a ranking number associated with quality of the match.  <\/p>\n<p>The code below was used to extent examples in part 1.<\/p>\n<pre class=\"lang:TSQL theme:familiar mark:1,2-3\" title=\"full text indexing - table valued functions\">\r\n-- Add index on id (unique - needed for FT joins)\r\nCREATE UNIQUE INDEX IDX_ANIMAL_ID ON DBO.ANIMALS(ID);\r\nGO\r\n\r\n-- Manually Create Full Text Index\r\n\r\n-- Exact match word 'fly' - (2) results\r\nSELECT ID, SUBSTRING(NAME,1,20) AS NAME2\r\nFROM ANIMALS \r\nWHERE CONTAINS(*,  ' \"fly\" ')\r\nGO\r\n\r\n-- Exact match word 'fly' as a word or prefix & ranking value - (3) results,\r\nSELECT ID, SUBSTRING(NAME,1,20) AS NAME2, [RANK] FROM ANIMALS AS A \r\nJOIN CONTAINSTABLE(ANIMALS, NAME, ' \"fly*\" ') FTS\r\nON A.ID = FTS.[KEY]\r\nGO\r\n\r\n-- Fuzzy match on a phrase\r\nSELECT ID, SUBSTRING(NAME,1,20) AS NAME2, [RANK] FROM ANIMALS AS A \r\nJOIN FREETEXTTABLE(ANIMALS, NAME, 'old pig ate fruit fly') FTS\r\nON A.ID = FTS.[KEY]\r\nGO\r\n<\/pre>\n<\/p>\n<p>Looking at the query plan for the CONTAINSTABLE query, we can see that our second goal has been achieved.  The full text index is being used. <\/p>\n<p><a href=\"https:\/\/craftydba.com\/wp-content\/uploads\/2012\/02\/Full-Text-Query-Plan2.jpg\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/craftydba.com\/wp-content\/uploads\/2012\/02\/Full-Text-Query-Plan2.jpg\" alt=\"\" title=\"Full-Text-Query-Plan2\" width=\"1135\" height=\"230\" class=\"aligncenter size-full wp-image-1653\" srcset=\"https:\/\/craftydba.com\/wp-content\/uploads\/2012\/02\/Full-Text-Query-Plan2.jpg 1135w, https:\/\/craftydba.com\/wp-content\/uploads\/2012\/02\/Full-Text-Query-Plan2-300x60.jpg 300w, https:\/\/craftydba.com\/wp-content\/uploads\/2012\/02\/Full-Text-Query-Plan2-1024x207.jpg 1024w\" sizes=\"auto, (max-width: 1135px) 100vw, 1135px\" \/><\/a><\/p>\n<p>However, we can also see they are not what we expected.  Instead of 8 rows, we have 3.  This is because full text index matches on whole words or start of words.<\/p>\n<p><a href=\"https:\/\/craftydba.com\/wp-content\/uploads\/2012\/02\/various-fti-functions2.jpg\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/craftydba.com\/wp-content\/uploads\/2012\/02\/various-fti-functions2.jpg\" alt=\"\" title=\"various-fti-functions2\" width=\"500\" height=\"459\" class=\"aligncenter size-full wp-image-1654\" srcset=\"https:\/\/craftydba.com\/wp-content\/uploads\/2012\/02\/various-fti-functions2.jpg 500w, https:\/\/craftydba.com\/wp-content\/uploads\/2012\/02\/various-fti-functions2-300x275.jpg 300w\" sizes=\"auto, (max-width: 500px) 100vw, 500px\" \/><\/a><\/p>\n<p>An interesting observation that the fuzzy phrase returns six animals including pigs, bats, flies and dogs.  This function is useful when searching large blobs of text for key words.  <\/p>\n<p>In the past, I designed ASP pages using fuzzy searches against help desk ticket data.  The help desk was told to put key words into the free form memo field when logging application user calls.  The web page would load a days worth of data every night and incrementally repopulate the catalog.  In short, it was a great solution to a business problem.<\/p>\n<p>In summary, full-text indexes are great for searching on a single word or phrase.  While a FTI speeds up our searching, it does not correctly solve the  problem at hand.  Next time, I am going to solve the business problem by categorizing data upon loading.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I want to revisit the [WILD LIFE] database that we were using last time. First goal is to create a full text index (FTI) using the SQL Server Management Studio (SSMS) graphical user interface (GUI). Second goal is to have our SELECT query perform a INDEX SEEK using the FTI. We have the existing [ANIMALS] table with a identity column called [ID] and a data column called [NAME] which is loaded with 445 animals names that I grabbed from WIKIPEDIA. The idea is to search for animals that have &#8216;fly&#8217;&hellip;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[4],"tags":[354,355,12,356,357,352,351,350,353,15,28,29],"class_list":["post-1629","post","type-post","status-publish","format-standard","hentry","category-db-admin","tag-contains","tag-containstable","tag-free-code","tag-freetext","tag-freetextable","tag-full-population","tag-full-text-catalog","tag-full-text-index","tag-incremental-population","tag-john-f-miner-iii","tag-sql-server","tag-tsql"],"_links":{"self":[{"href":"https:\/\/craftydba.com\/index.php?rest_route=\/wp\/v2\/posts\/1629","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/craftydba.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/craftydba.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/craftydba.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/craftydba.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1629"}],"version-history":[{"count":0,"href":"https:\/\/craftydba.com\/index.php?rest_route=\/wp\/v2\/posts\/1629\/revisions"}],"wp:attachment":[{"href":"https:\/\/craftydba.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1629"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/craftydba.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1629"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/craftydba.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1629"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}