|Read more about this book|
(For more resources on Search Engine, see here.)
Sphinx is a full-text search engine. So, before going any further, we need to understand what full-text search is and how it excels over the traditional searching.
What is full-text search?
Full-text search is one of the techniques for searching a document or database stored on a computer. While searching, the search engine goes through and examines all of the words stored in the document and tries to match the search query against those words. A complete examination of all the words (text) stored in the document is undertaken and hence it is called a full-text search.
Full-text search excels in searching large volumes of unstructured text quickly and effectively. It returns pages based on how well they match the user’s query.
To understand the difference between a normal search and full-text search, let’s take an example of a MySQL database table and perform searches on it.
It is assumed that MySQL Server and phpMyAdmin are already installed on your system.
Time for action – normal search in MySQL
- Open phpMyAdmin in your browser and create a new database called myblog.
- Select the myblog database:
- Create a table by executing the following query:
CREATE TABLE `posts` (
`id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY ,
`title` VARCHAR( 255 ) NOT NULL ,
`description` TEXT NOT NULL ,
`created` DATETIME NOT NULL ,
`modified` DATETIME NOT NULL
) ENGINE = MYISAM;
Queries can be executed from the SQL page in phpMyAdmin. You can find the link to that page in the top menu.
- Populate the table with some records:
INSERT INTO `posts`(`id`, `title`, `description`, `created`,
(1, 'PHP scripting language', 'PHP is a web scripting language
originally created by Rasmus Lerdorf', NOW(), NOW()),
(2, 'Programming Languages', 'There are many languages available
to cater any kind of programming need', NOW(), NOW()),
(3, 'My Life', 'This post is about my life which in a sense is
beautiful', NOW(), NOW()),
(4, 'Life on Mars', 'Is there any life on mars?', NOW(), NOW());
- Next, run the following queries against the table:
SELECT * FROM posts WHERE title LIKE 'programming%';
The above query returns row 2.
SELECT * FROM posts WHERE description LIKE '%life%';
The above query return rows 3 and 4.
SELECT * FROM posts WHERE description LIKE '%scripting language%';
The above query returns row 1.
SELECT * FROM posts WHERE description LIKE '%beautiful%' OR
description LIKE '%programming%';
The above query returns rows 2 and 3.
To administer MySQL database, I highly recommend using a GUI interface tool like phpMyAdmin (http://www.phpmyadmin.net). All the above mentioned queries can easily be executed
What just happened?
We first created a table posts to hold some data. Each post has a title and a description. We then populated the table with some records.
With the first SELECT query we tried to find all posts where the title starts with the word programming. This correctly gave us the row number 2. But what if you want to search for the word anywhere in the field and not just at that start? For this we fired the second query, wherein we searched for the word life anywhere in the description of the post. Again this worked pretty well for us and as expected we got the result in the form of row numbers 3 and 4.
Now what if we wanted to search for multiple words? For this we fired the third query where we searched for the words scripting language. As row 1 has those words in its description, it was returned correctly.
Until now everything looked fine and we were able to perform searches without any hassle. The query gets complex when we want to search for multiple words and those words are not necessarily placed consecutively in a field, that is, side by side. One such example is shown in the form of our fourth query where we tried to search for the words programming and beautiful in the description of the posts. Since the number of words we need to search for increases, this query gets complicated, and moreover, slow in execution, since it needs to match each word individually.
The previous SELECT queries and their output also don’t give us any information about the relevance of the search terms with the results found. Relevance can be defined as a measure of how closely the returned database records match the user’s search query. In other words, how pertinent the result set is to the search query.
Relevance is very important in the search world because users want to see the items with highest relevance at the top of their search results. One of the major reasons for the success of Google is that their search results are always sorted by relevance.
MySQL full-text search
This is where full-text search comes to the rescue. MySQL has inbuilt support for full-text search and you only need to add FULLTEXT INDEX to the field against which you want to perform your search.
Continuing the earlier example of the posts table, let’s add a full-text index to the description field of the table. Run the following query:
ALTER TABLE `posts` ADD FULLTEXT (
The query will add an INDEX of type FULLTEXT to the description field of the posts table.
Only MyISAM Engine in MySQL supports the full-text indexes.
Now to search for all the records which contain the words programming or beautiful anywhere in their description, the query would be:
SELECT * FROM posts WHERE
MATCH (description) AGAINST ('beautiful programming');
This query will return rows 2 and 3, and the returned results are sorted by relevance. One more thing to note is that this query takes less time than the earlier query, which used LIKE for matching.
By default, the MATCH() function performs a natural language search, it attempts to use natural language processing to understand the nature of the query and then search accordingly.
Full-text search in MySQL is a big topic in itself and we have only seen the tip of the iceberg. For a complete reference, please refer to the MySQL manual at http://dev.mysql.com/doc/.
Advantages of full-text search
The following points are some of the major advantages of full-text search:
- It is quicker than traditional searches as it benefits from an index of words that is used to look up records instead of doing a full table scan
- It gives results that can be sorted by relevance to the searched phrase or term, with sophisticated ranking capabilities to find the best documents or records
- It performs very well on huge databases with millions of records
- It skips the common words such as the, an, for, and so on
When to use a full-text search?
- When there is a high volume of free-form text data to be searched
- When there is a need for highly optimized search results
- When there is a demand for flexible search querying