Full-Text Search – Part 1 – Ramblings of a Crafty DBA

Today, I want to go over why text searching for patterns is an expensive query operation.

I am going to create a [WILDLIFE] database that contains a table called [ANIMALS]. An identity column called [ID] will be populated by the system automatically, and a [NAME] column will be loaded with 445 animal names that I grabbed from Wikipedia. The idea is to search for animals that have a root word in common.

The snippet below creates the database and table. The full SQL script including insert statements is at the end of this article.

<span style="color: #008000;">-- Create a very basic database
CREATE DATABASE WILDLIFE;
GO

-- Use the database
USE [WILDLIFE]
GO

-- Create the animals table
CREATE TABLE ANIMALS
(
    ID INT NOT NULL IDENTITY (1, 1),
    NAME VARCHAR(200) NOT NULL
)
GO
</span>
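
To see how the identity column behaves before loading the real data, here is a minimal sketch. The temporary table #ANIMALS_DEMO and its three rows are hypothetical stand-ins of my own, not part of the full script, so the [ANIMALS] table is left untouched.

```sql
-- Hypothetical demo table with the same shape as ANIMALS
CREATE TABLE #ANIMALS_DEMO
(
    ID INT NOT NULL IDENTITY (1, 1),
    NAME VARCHAR(200) NOT NULL
)
GO

-- Only NAME is supplied; the system assigns ID
INSERT INTO #ANIMALS_DEMO (NAME) VALUES ('Butterfly');
INSERT INTO #ANIMALS_DEMO (NAME) VALUES ('Dragonfly');
INSERT INTO #ANIMALS_DEMO (NAME) VALUES ('Zebra');
GO

-- Show the generated identity values next to the names
SELECT ID, NAME FROM #ANIMALS_DEMO ORDER BY ID;
GO

-- Clean up the demo table
DROP TABLE #ANIMALS_DEMO;
GO
```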

I am going to search for all animals that contain the word fly. We are going to look at how the query optimizer creates different cost-based execution plans.

For a given plan, we are going to look at the time in milliseconds (SET STATISTICS TIME) and the I/O counts (SET STATISTICS IO).

To get a good test, we should write dirty pages to disk (CHECKPOINT), drop clean buffers from memory (DBCC DROPCLEANBUFFERS) and free the procedure cache of any query plans (DBCC FREEPROCCACHE). These commands should be run before executing each query.

The SQL snippet below does just that.

<span style="color: #008000;">-- Show time & i/o
SET STATISTICS TIME ON
SET STATISTICS IO ON
GO

-- Remove clean buffers & clear plan cache
CHECKPOINT 
DBCC DROPCLEANBUFFERS 
DBCC FREEPROCCACHE
GO
</span>

Here is the query to select all records that have the word ‘fly’ in the name.

<span style="color: #008000;">-- Select everything with word 'fly'
SELECT * FROM dbo.ANIMALS WHERE NAME LIKE '%FLY%'
GO
</span>

A FULL TABLE SCAN is performed because no index was defined. The data is stored as an unordered HEAP structure. This execution plan would be very expensive if the table contained a million rows.

We can also see 8 records returned in the result set after 1 scan and 2 logical reads.
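
If you want to confirm that the table really is a heap, one way is to look it up in the sys.indexes catalog view. This is a small sketch of my own rather than part of the original script. A table with no clustered index shows up as a single row with a NULL name and a type_desc of HEAP.

```sql
-- List the storage structures defined on the ANIMALS table
SELECT i.index_id, i.name, i.type_desc
FROM sys.indexes AS i
WHERE i.object_id = OBJECT_ID('dbo.ANIMALS');
GO
```

Running the same query after each CREATE INDEX statement later in this article should show the new index in place of, or alongside, the heap entry.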

Let us create a nonclustered index (CREATE INDEX) on the [NAME] column. Hopefully, the query optimizer will use the index. A NONCLUSTERED INDEX is a separate structure whose leaf nodes are not data pages; they hold the index key and a pointer back to the row.

Oh no, we can see the same plan being used and the same execution results.

<span style="color: #008000;">-- Add index on name (non-clustered)
CREATE NONCLUSTERED INDEX IDX_ANIMAL_NAME ON DBO.ANIMALS(NAME);
GO

-- Select everything with word 'fly'
SELECT * FROM dbo.ANIMALS WHERE NAME LIKE '%FLY%'
GO

-- Drop index on name
DROP INDEX ANIMALS.IDX_ANIMAL_NAME;
GO
</span>

Let's drop the index and create a clustered index on [NAME]. Again, we are hoping the query optimizer will use the index. The data is stored in a CLUSTERED INDEX structure in which the leaf nodes are the data pages.

<span style="color: #008000;">-- Add index on name (clustered)
CREATE CLUSTERED INDEX IDX_ANIMAL_NAME ON DBO.ANIMALS(NAME);
GO

-- Select everything with word 'fly'
SELECT * FROM dbo.ANIMALS WHERE NAME LIKE '%FLY%'
GO

-- Drop index on name
DROP INDEX ANIMALS.IDX_ANIMAL_NAME;
GO
</span>

This time, we do get the query optimizer to use the index, but it is not optimal since it is a FULL CLUSTERED INDEX SCAN. Again, this is a very expensive plan when the table contains a lot of rows. A SEEK would be a better operation.

The main issue with this query is that we are looking for a pattern in a string. An index orders the data (logically or physically) according to the sort collation, which only helps when the start of the value is known. Here the pattern begins with a wildcard, so the match can occur anywhere in the string. Thus, every [NAME] field needs to be examined to look for the ‘fly’ pattern.
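
To make the difference concrete, the sketch below compares the leading-wildcard search with a prefix-only search. This comparison is my own addition and not part of the full script. It uses SET SHOWPLAN_TEXT so that the estimated plans come back as text and the queries themselves are not executed. With the clustered index in place, I would expect the first query to scan and the second to seek, although the optimizer makes the final call.

```sql
-- Recreate the clustered index for this comparison
CREATE CLUSTERED INDEX IDX_ANIMAL_NAME ON dbo.ANIMALS(NAME);
GO

-- Return estimated plans as text; the queries below are not executed
SET SHOWPLAN_TEXT ON
GO

-- Leading wildcard: the match can be anywhere, so every row is examined
SELECT * FROM dbo.ANIMALS WHERE NAME LIKE '%FLY%'
GO

-- Prefix only: the sort order of the index can narrow the search
SELECT * FROM dbo.ANIMALS WHERE NAME LIKE 'FLY%'
GO

-- Go back to executing statements normally
SET SHOWPLAN_TEXT OFF
GO

-- Drop index on name
DROP INDEX IDX_ANIMAL_NAME ON dbo.ANIMALS;
GO
```

Note that the prefix search answers a different question: it only finds names that start with the pattern, so it is not a replacement for the original query.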

The SQL snippet below turns off the messaging for I/O and time since we are done for now.

<span style="color: #008000;">-- Hide time & i/o
SET STATISTICS TIME OFF
SET STATISTICS IO OFF
GO
</span>

Next time, I am going to introduce setting up a full-text index using the GUI. Full-text indexes are great for searching on a single word or phrase (and optionally ranking the result set), searching on a word or phrase close to another word or phrase, and/or searching on synonymous forms of a specific word.

While this speeds up our searching, we will see that it does not completely solve the problem at hand.

Sample Code:
TSQL
