Full Text Search in SQL Server

Introduction:

Full-Text Search in SQL Server and Azure SQL Database lets users and applications run full-text queries against character-based data in SQL Server tables. These queries are much faster than traditional search operations performed over character based data.

We can utilize this powerful tool provided in SQL Server for various large-scale businesses. Some of them are mentioned below-

  1. e-businesses: searching for an item on an ecommerce website.
  2. Schools/Colleges: searching for content in a large set of e-books
  3. Law Firms: searching for old case history logs
  4. HR Departments: matching job description with resume

Real-Life Requirement:

I recently came across this requirement. I worked on a product where the application used to dump error message (along with Full Stack Trace), debugging information and lots of other stuff in a database table called Logs. The client asked us to add a Report in the application, where they could see all the errors reported by the application.

Applying filters on one of the columns with data type varchar(max) was difficult and therefore, the report was really very slow. I had around 2 lakhs record in the Logs table (it was development and not a production database so one can imagine the amount of record in The Logs table on production database), and it took around 35-40 seconds to filter records based on the varchar(max) type column.

One won’t believe the effect of full-text index. After applying full-text indexing on that column, the same query (although now, I used predicates CONTAINS and FREETEXT instead of like) took less than a second to provide results.

Full-Text Index:

Full Text Search is performed by querying full-text indexed columns.

SQL Server provides two types of indexes in general- clustered and nonclustered indexes. These indexes can be created on almost all types of columns except those with large object (LOB) data types, for example- text and varchar(max).

Although, rarely we query over these types of columns however, there are some cases where we need to query over these columns. And without indexing, such a query with a traditional say “LIKE” keyword will perform full database scan and will often take minutes to provide results.

Key Points:

Some of key points to note down for a full-text index are mentioned below-

  1. A full-text index can be applied on any of the following data types- char, varchar, nchar, nvarchar, text, ntext, image, xml, varbinary(max) and FILESTREAM.
  2. A full-text index can be defined on tables and views.
  3. A full-text index is defined at the table level, and only one full-text index can be defined per table.
  4. A full-text index can be applied on one or more columns.
  5. A full-text index can be defined for a table only if there exists a UNIQUE NON NULLABLE Index on that table. This column is used as the Key Indexin the full-text index definition. Generally, key index should be defined on a column configured with an integer data type for best performance. The primary key is an appropriate choice for key index.
  6. Full-text search supports almost 50 diverse languages, such as English, Spanish, Chinese, Japanese, Arabic, Bengali, and Hindi.
     

Implementation:

In this blog, I’m going to illustrate the implementation of full-text search on a database table Logs (mentioned in the example at the beginning of this blog). We are going to use an existing database TestDB. Our table Logs will have some general information which is almost similar to any application irrespective of its domain.

Query to create Logs table in database TestDb-

USE [TestDb]
GO
CREATE TABLE [dbo].[Logs]
(
	[Id] [int] PRIMARY KEY IDENTITY(1,1) NOT NULL,
	[TimeStamp] [datetime] NOT NULL,			-- DateTime of error generation
	[Type] [varchar](200) NOT NULL,				-- Error/Debug Info
	[UserId] [varchar](max) NOT NULL,			-- LoggedIn user's id
	[Error] [nvarchar](max) NOT NULL,			-- Error message
)
GO

We will apply full-text index on Error column. To implement full-text indexing in SQL Server, we are going to follow following steps-

  1. Create a Full-Text Catalog: 
    1. A Full-Text Catalog provides a mechanism for managing full-text indexes.
    2. A Full-Text Catalog is associated with specific database.
    3. A Full-Text Catalog can have one or more full-test indexes but a full-text index can be associated with only one full-test catalog.
    4. A database can have zero or more full-text catalogs.

      Query to create a full-text catalog on TestDb-

      USE TestDb;  
      GO  
      CREATE FULLTEXT CATALOG LogsFTC; 

     

  2. Create a Full-Text Index:
    As mentioned earlier in the blog, there must be a UNIQUE NON NULLABLE index on the table, before creating a full-text index. But, since we already have a PRIMARY KEY on our Logs table, we don’t need to add another unique index for creating the full-text index. So, we’ll move forward and add full-text index on our Logs table.

    Query to add full-text index on Logs table-
    CREATE FULLTEXT INDEX ON Logs
    (  
        Error						--Full-text index column name   
        Language 2057				--2057 is the LCID for British English  
    )  
    KEY INDEX [PK_Logs] ON LogsFTC	--Unique index  
    WITH CHANGE_TRACKING AUTO     	--Population type;  
    GO
    Now, let me give some insights of the about the above query-

    1. The first line of the query CREATE FULLTEXT INDEX ON Logs, creates a full-text index on table Logs.
    2. Line number two, Error, supplies the column name(s) on which full-text index is to be applied.
    3. Third line, Language 2057, specifies the language code to be used for each column specified in the full-text index. Here, we’ve used British English.
    4. Line number four, KEY INDEX [PK_Logs] ON LogsFTC, specifies the Key Index to be used. Here, we’ve used the primary key of the table as the key index. Remember, we need to specify index name and not column name here. You can check all the indexes on a table with the following query.
      SELECT * from sys.indexes
      WHERE object_id = (SELECT object_id FROM sys.objects WHERE Name = 'Logs')
      
      The query will list all the index names and from there you can pick the index name applied on your unique indexed column. Further, LogsFTC, is the name of the full-text catalog we created above in the blog.
    5. The last line, WITH CHANGE_TRACKING AUTO, specifies that the full-text index will auto populate by tracking changes in the Logs table. As soon as the above full-text index is created, the full-text index is automatically fully populated with the existing data in the table.

  3. Create a StopList:
    1. A stoplist is a list of words that we want the full-text index to ignore while performing full-text search. It is also referred as Noise Words.
    2. These are those words that are used most commonly in a language and they don’t help in the search.
    3. A full-text index uses stoplist and ignores the listed words meaning it will not search for those words. This is extremely beneficial for the further improvement in performance of a full-text search.
      However, in this blog, we are not making use of stoplist words.

       

  4. Full-Text Querying:
    Now, we’ve successfully defined full-text index on our table, we’ve come to the last stage of our blog, i.e., querying over the full-text index. We can write full-text queries by using the full-text predicates CONTAINS and FREETEXT and the rowset-valued functions CONTAINSTABLE and FREETEXTTABLE with the SELECT.

    We can use these methods only on a full-text indexed column. To help make you a choice among the above ways to perform a full-text search, below are some points-
    1. To match words and phrases, use CONTAINSand CONTAINSTABLE.
    2. To match the meaning, but not the exact wording, use FREETEXTand FREETEXTTABLE.

    Below is a query that searches for a keyword ‘error’. It takes less than a second to give us the results.
    SELECT * FROM Logs
    WHERE CONTAINS(Error, 'error')
    You can now compare the performance of the search before and after applying a full-text index. Try performing the same search (‘error’) with the most commonly used ‘LIKE’ keyword, it’ll take a lot of time (even minutes depending on the number of records in the table) to give us the results.

Conclusion:

Full-Text Index optimizes the search operation and enhances some of the key operations of an application but, you should be aware of some of the downsides of it before using it. The blog tried to give you a short but meaningful introduction and implementation of the full-text index. You can learn more about full-text search here.

Hope you enjoyed!!