Search Engines in ColdFusion

Built-In Search Engine

Verity comes packaged with ColdFusion, and the power of this tool is one of the reasons people pay for ColdFusion. Verity is also one of the most powerful standalone commercial search engines, and some of the biggest companies in the world have extended their internal services with the same Verity tool that we will learn about here.

Before we can search anything, we must create collections. Building search capability is a three-step process, and ColdFusion provides a standard tag for each step:

Create collections (<cfcollection>)
Index the collections (<cfindex>)
Search the collections (<cfsearch>)
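To show how the three steps fit together, here is a minimal end-to-end sketch. The collection name exampleDocs, both directory paths, the URL, and the search term are hypothetical placeholders; in a real application the create step would normally run once, not on every page request.

```cfml
<!--- Step 1: create the collection (hypothetical name and path) --->
<cfcollection action="create"
    collection="exampleDocs"
    path="c:\ColdFusion8\verity\collections\" />

<!--- Step 2: index every PDF under a directory, including subdirectories (hypothetical path and URL) --->
<cfindex action="refresh"
    collection="exampleDocs"
    type="path"
    key="c:\inetpub\wwwroot\exampledocs"
    extensions=".pdf"
    recurse="yes"
    urlpath="http://localhost/exampledocs/" />

<!--- Step 3: search the collection; the results come back as a query --->
<cfsearch name="results"
    collection="exampleDocs"
    criteria="ColdFusion" />

<!--- Columns are scoped to the query so that "url" is not confused with the URL scope --->
<cfoutput query="results">
    <a href="#results.url#">#HTMLEditFormat(results.title)#</a> (score: #results.score#)<br />
</cfoutput>
```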

Collections can hold information about web pages and binary documents, and can even serve as a powerful way to search cached query results. Many document formats are supported. Keep in mind that, in the real business world, even the latest bleeding-edge applications can still save files in a previous version's format. Archived and shared documents should be stored in formats and versions that can be searched.
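Searching query results works by handing <cfindex> a query instead of files, using type="custom". The sketch below assumes a hypothetical datasource exampleDSN with a hypothetical articles table (columns id, title, body), and a hypothetical collection articlesIndex that already exists.

```cfml
<!--- An ordinary query; datasource, table, and columns are hypothetical --->
<cfquery name="getArticles" datasource="exampleDSN">
    SELECT id, title, body
    FROM articles
</cfquery>

<!--- Index the query result: key names the column that identifies each row,
      title and body name the columns whose text becomes searchable --->
<cfindex action="refresh"
    collection="articlesIndex"
    type="custom"
    query="getArticles"
    key="id"
    title="title"
    body="body" />

<!--- Each hit carries the key, so it can be matched back to its database row --->
<cfsearch name="hits" collection="articlesIndex" criteria="Verity" />
<cfoutput query="hits">#hits.key#: #HTMLEditFormat(hits.title)#<br /></cfoutput>
```

Because the search returns the original key, a page can use it to fetch the full record from the database when a user selects a result.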

Creating a Collection

The first step is to create our collection, which we can do in the ColdFusion Administrator under Data & Services.

[Screenshot: Data & Services section of the ColdFusion Administrator]

Here, we can add new collections and edit existing ones. ColdFusion installations include one default collection, which holds the data for the bookclub demonstration application. For this lesson, we will create a collection of PDF documents: we have placed ColdFusion and Flex documentation, along with some issues of the Fusion Authority Quarterly periodical, in a directory for indexing. Here is the screen for adding the collection through the Administrator.

[Screenshot: adding a collection in the ColdFusion Administrator]

We select the Enable Category Support option. Language libraries are also available if a collection holds content in other languages. The new devdocs collection now appears in the list, with four icons for working with it. From right to left, they are the index, optimize, purge, and remove actions; the Name link also takes us to the index action. The listing gives the number of documents in the collection and the size of the index file on the server. The details screen shows when the index was last modified and the language in which it is stored, lists the categories, and shows the actual path where the index is stored.
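The optimize, purge, and remove icons also have tag equivalents, which is part of what makes managing collections from code possible. A sketch against the devdocs collection follows; purging and deleting are destructive, so treat it as an illustration of each action rather than a page to run as-is.

```cfml
<!--- Optimize: reorganize the collection's index for more efficient searching --->
<cfcollection action="optimize" collection="devdocs" />

<!--- Purge: remove every document from the index but keep the collection itself --->
<cfindex action="purge" collection="devdocs" />

<!--- Remove: delete the collection entirely --->
<cfcollection action="delete" collection="devdocs" />
```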

[Screenshot: details of the devdocs collection]

Here is code that creates the same collection. Because collections can be managed in code, it is possible to build an entire administrative interface for them. It is also possible to move from tags to objects and wrap all of these functions in that style.

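<!--- Create the devdocs collection; categories="yes" matches the Enable Category Support option selected in the Administrator --->
<cfcollection
    action="create"
    collection="devdocs"
    categories="yes"
    path="c:\ColdFusion8\verity\collections\documents" />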
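As a sketch of moving from tags to objects, the collection tags can be wrapped in a component. The component name CollectionManager and its method names are hypothetical; the list action returns a query with one row per collection.

```cfml
<!--- CollectionManager.cfc (hypothetical name): wraps cfcollection in methods --->
<cfcomponent output="false">

    <!--- Return a query with one row per collection on this server --->
    <cffunction name="listCollections" access="public" returntype="query" output="false">
        <cfset var collections = "" />
        <cfcollection action="list" name="collections" />
        <cfreturn collections />
    </cffunction>

    <!--- Create a collection; the categories flag corresponds to Enable Category Support --->
    <cffunction name="createCollection" access="public" returntype="void" output="false">
        <cfargument name="collectionName" type="string" required="true" />
        <cfargument name="path" type="string" required="true" />
        <cfargument name="categories" type="boolean" required="false" default="false" />
        <cfcollection action="create"
            collection="#arguments.collectionName#"
            path="#arguments.path#"
            categories="#yesNoFormat(arguments.categories)#" />
    </cffunction>

</cfcomponent>
```

A page could then call createObject("component", "CollectionManager").listCollections() and loop over the returned query to build its own collection management screen.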

If our collection has categories and we want a list of them, we can use the following code:

<cfcollection 
action="categoryList" 
collection="bookClub" 
name="myCats" />
<cfdump var="#myCats#">
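Categories pay off when documents are assigned to them at indexing time and searches are then filtered by them. This sketch assumes the devdocs collection was created with category support; the Flex subdirectory, its URL, the category names flex and coldfusion, and the search phrase are hypothetical.

```cfml
<!--- Add the Flex documents under the "flex" category (hypothetical path and URL) --->
<cfindex action="update"
    collection="devdocs"
    type="path"
    key="c:\inetpub\wwwroot\documents\Flex"
    extensions=".pdf"
    category="flex"
    urlpath="http://localhost/documents/Flex/" />

<!--- Add the ColdFusion documents under the "coldfusion" category --->
<cfindex action="update"
    collection="devdocs"
    type="path"
    key="c:\inetpub\wwwroot\documents\ColdFusion"
    extensions=".pdf"
    category="coldfusion"
    urlpath="http://localhost/documents/ColdFusion/" />

<!--- Restrict the search to documents in the "flex" category --->
<cfsearch name="flexHits"
    collection="devdocs"
    criteria="data binding"
    category="flex" />
<cfdump var="#flexHits#">
```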

Indexing a Collection

We can index a collection through the Administrator interface, as shown in the following screenshot. We have used a deliberately limited directory as an example for searching.

[Screenshot: indexing the devdocs collection in the Administrator]

This is the result of indexing the devdocs collection submitted above.

[Screenshot: the devdocs collection after indexing]

Indexing produced 12 documents, with a collection size of 4,611 KB.

Note that ColdFusion includes a feature called Sandbox Security, which can block these three core Verity tags, among many others, if that suits your environment. Consider carefully what actually gets indexed and what needs to be searched; if documents are secured correctly, this should not be an issue.

Now we will look at how to do the same thing in code, building the index outside the Administrator interface. The collection must exist before we try to index files into it, and it can be created either in the Administrator or in code.
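Since the collection has to exist before indexing, a page that builds the index in code can check for it first and create it only when it is missing. A minimal sketch, reusing the devdocs name and paths from this section and assuming only the PDF files in the documents directory should be indexed:

```cfml
<!--- Look up the existing collections; each row describes one collection --->
<cfcollection action="list" name="existing" />

<!--- Create devdocs only if it is not already present --->
<cfif NOT listFindNoCase(valueList(existing.name), "devdocs")>
    <cfcollection action="create"
        collection="devdocs"
        categories="yes"
        path="c:\ColdFusion8\verity\collections\documents" />
</cfif>

<!--- The collection now exists, so it is safe to index into it --->
<cfindex action="refresh"
    collection="devdocs"
    type="path"
    key="c:\inetpub\wwwroot\documents"
    extensions=".pdf"
    recurse="yes"
    urlpath="http://localhost/documents/" />
```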

When we build an index, we can choose whether or not to index recursively. A recursive index includes all the subdirectories of the specified directory, not just the directory itself. Note also that this service cannot index other websites; it indexes files on this server only.

<!--- Index all HTML and CFML files under the documents directory, including subdirectories --->
<cfindex action="refresh"
    collection="bookClub"
    recurse="true"
    type="path"
    extensions=".html, .htm, .cfm, .cfml"
    key="c:\inetpub\wwwroot\documents"
    urlpath="http://localhost/documents/" />

Your collection has been indexed.

It is important to note that this tag produces no output, so we need to put some text on the screen to let the person using the site know that the task has completed. To index a single file rather than a whole directory, we can use this code:

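<!--- Add a single PDF file. action="update" adds it without first purging the collection,
      and recurse/extensions are omitted because they apply only to type="path" --->
<cfindex action="update"
    collection="bookClub"
    type="file"
    key="c:\inetpub\wwwroot\documents\ColdFusion\cf8_devguide.pdf"
    urlpath="http://localhost/documents/ColdFusion" />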

Your collection has been indexed.
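Because <cfindex> produces no output, one option is to wrap it in <cftry> so the confirmation only appears when indexing succeeds, with a readable message otherwise. This sketch indexes only the top level of the documents directory (recurse="no") and then runs a quick <cfsearch> to confirm the collection answers queries; the search term is a hypothetical placeholder.

```cfml
<cftry>
    <!--- Index only the top-level directory; subdirectories are skipped --->
    <cfindex action="refresh"
        collection="bookClub"
        type="path"
        key="c:\inetpub\wwwroot\documents"
        extensions=".html, .htm, .cfm, .cfml"
        recurse="no"
        urlpath="http://localhost/documents/" />

    <!--- Quick check that the collection can be searched --->
    <cfsearch name="check" collection="bookClub" criteria="ColdFusion" />
    <cfoutput>Your collection has been indexed. A test search found #check.recordCount# documents.</cfoutput>

    <cfcatch type="any">
        <!--- Show a readable message instead of a raw error page --->
        <cfoutput>Indexing failed: #HTMLEditFormat(cfcatch.message)#</cfoutput>
    </cfcatch>
</cftry>
```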