Included with OpenCms is a distribution of the Lucene search engine. Lucene is an open source, high-performance text search engine that is both easy to use and full-featured. Lucene is not a product. It is a Java library providing data indexing, and search and retrieval support. OpenCms integrates with Lucene to provide these features for its VFS content.
Though Lucene is simple to use, it is highly flexible and has many options. We will not go into the full details of all the options here, but will provide a basic overview, which will help us in developing our search code. A full understanding of Lucene is not required for completing this article, but interested readers can find more information at the Lucene website: http://jakarta.apache.org/lucene. There are also several excellent books available, which can easily be found with a web search.
For any data to be searched, it must first be indexed. Lucene supports both disk and memory based indexes, but OpenCms uses the more suitable disk based indexes. There are three basic concepts to understand regarding Lucene search indexes: Documents, Analyzers, and Fields.
Lucene also provides the ability to define a boost value for a field. This affects the relevance of the field when it is used in a search. A value other than the default value of 1.0 may be used to raise or lower the relevance.
These are the important concepts to be understood while creating a Lucene search index. After an index has been created, documents may be searched through queries.
Querying Lucene search indexes is supported through a Java API and a search querying language. Search queries are made up of terms and operators. A term can be a simple word such as “hello” or a phrase such as “hello world”. Operators are used to form logical expressions with terms, such as AND or NOT. With the Java API, terms can be built and aggregated together along with operators to form a query. When using the query language, a Java class is provided to parse the query and convert it into a format suitable for passing to the engine. In addition to these search features, there are more advanced operations that may be performed such as fuzzy searches, range searches, and proximity searches.
All these options and flexibility allow Lucene to be used in an application in many ways. OpenCms does a good job of using these options to provide search capabilities for a wide range of content types. Next, we will look at how OpenCms interfaces with Lucene to provide this support.
OpenCms maintains search settings in the opencms-search.xml configuration file located in the WEB-INF/config directory. Prior to OpenCms 7, most of the settings in this configuration file needed to be made by hand. With OpenCms 7, the Search Management tool in the Administration View has been improved to cover most of the settings. We will first go over the settings that are controlled through the Search Management view, and will then visit the settings that must still be changed by hand. The first thing we’ll do is define our own search index for the blog content. Creating a new search index is simple with the Administration tool. We access it by clicking on the Search Management icon of the Administrative View, and then clicking on the New Index icon:
The Name field contains the name of the index file. This name can also be passed to a Java API. If the content differs between the online and offline areas, we can create an index for each one. For now, we will start with the offline index. We’ll name it: Blogs – Offline. The other fields are:
We do not have our own field configuration yet; so for now press OK to save the index. Next, we will define a field configuration for the blog content.
I remember deciding to pursue my first IT certification, the CompTIA A+. I had signed…
Key takeaways The transformer architecture has proved to be revolutionary in outperforming the classical RNN…
Once we learn how to deploy an Ubuntu server, how to manage users, and how…
Key-takeaways: Clean code isn’t just a nice thing to have or a luxury in software projects; it's a necessity. If we…
While developing a web application, or setting dynamic pages and meta tags we need to deal with…
Software architecture is one of the most discussed topics in the software industry today, and…