Full-Text Search – Part 2

I want to revisit the [WILD LIFE] database that we were using last time. First goal is to create a full text index (FTI) using the SQL Server Management Studio (SSMS) graphical user interface (GUI). Second goal is to have our SELECT query perform a INDEX SEEK using the FTI.

We have the existing [ANIMALS] table with a identity column called [ID] and a data column called [NAME] which is loaded with 445 animals names that I grabbed from WIKIPEDIA. The idea is to search for animals that have ‘fly’ root word in common.

To start the Full Text Index Wizard, right click on the [ANIMALS] table name and select ‘Define Full Text Index’.

The splash screen of the wizard list a summary of steps that will happen to define a FTI.

The first step of the process is to select a unique index that will be associated with the FTI. This association is neccessary for query processor to join with other tables.

The second step is to pick the column that you want to create a index on. The column has to be of type text, xml or image. Only the text data type is in scope for this discussion. The [name] column is selected as the search column. English is selected as the word breakerlanguage, whose word breakers and stemmers perform linguistic analysis on all full-text indexed data.

The third step is to define how the system is going to detect changes.

AUTOMATIC – SQL Server update the full-text index automatically
MANUAL – SQL Server tracks data changes. DBA must appply the changes.
NONE – No tracking of data changes. Full or incremental population has to be performed.

The fourth step is to specify the catalog name, the index filegroup and the stop list. A full text index (FTI) can be case sensitive or insensitive. I am going to specify [ANIMAL_NAMES] as the catalog name.

The fifth step specifies a schedule for the catalog maintenance. Options such as frequency, time of day, and duration can be specified. I am going to specify [UPDATE_ANIMALS] as the name of the schedule.

The sixth step specified how the FTI is updated. Either a full or incremental population of the index can be performed.

The seventh step is to confirm all the selections before proceeding.

After a short time, the FTI creation process will be complete. To verify the creation process, take a look at the Storage section of the [WILD LIFE] database to find the full text catalog. Right click the name to see the properties of the catalog.

Again, I am going to search for all animals that contain the word fly. Unlike last time, I am interested in looking just at the query plan. Time is a secondary concern. I want to see if the FTI index will be used during query execution. I want to make sure the expected results are returned in the result set.

There are four new TSQL keywords that can be used with full text catalogs / indexes. The CONTAINS function looks for exact word or suffix matches. The FREETEXT looks for fuzzy word phrase matches. Both these functions have a variation called CONTAINSTABLE and FREETEXTTABLE respectively. This variation returns a ranking number associated with quality of the match.

The code below was used to extent examples in part 1.

Looking at the query plan for the CONTAINSTABLE query, we can see that our second goal has been achieved. The full text index is being used.

However, we can also see they are not what we expected. Instead of 8 rows, we have 3. This is because full text index matches on whole words or start of words.

An interesting observation that the fuzzy phrase returns six animals including pigs, bats, flies and dogs. This function is useful when searching large blobs of text for key words.

In the past, I designed ASP pages using fuzzy searches against help desk ticket data. The help desk was told to put key words into the free form memo field when logging application user calls. The web page would load a days worth of data every night and incrementally repopulate the catalog. In short, it was a great solution to a business problem.

In summary, full-text indexes are great for searching on a single word or phrase. While a FTI speeds up our searching, it does not correctly solve the problem at hand. Next time, I am going to solve the business problem by categorizing data upon loading.

Related posts

Leave a Comment