Pluggable Enterprise Search

As an alternative to using Lucene, the portal supports pluggable search engines. The first implementation of this uses the open source search engine Solr, but in the future there will be plugins for other search engines of your choice, such as FAST, GSA, and Coveo. In this section, we’re going to discuss caching, indexing, and using Solr for search.

Caching settings

EHCache is a widely used cache implemented in Java, which the portal uses to provide distributed caching in a clustered environment. It is also used in a non-clustered environment to speed up repeated data retrievals. The portal uses EHCache caching by default, alongside Hibernate caching, and lets you configure both.

The portal specifies Hibernate as the default ORM (Object-Relational Mapping) persistence provider in portal.properties:

persistence.provider=hibernate

The preceding code sets hibernate as the provider used for ORM persistence. You can instead set this property to jpa (Java Persistence API), in which case the properties with the prefix jpa.* will be read. Similarly, if this property is set to hibernate, then the properties with the prefix hibernate.* will be read. Note that this property also determines whether hibernate-spring.xml or jpa-spring.xml is loaded from the property spring.configs. For example, the portal has the following JPA configuration specified in portal.properties:

jpa.configs=\
    META-INF/mail-orm.xml,\
    META-INF/portal-orm.xml
jpa.provider=eclipselink
jpa.provider.property.eclipselink.allow-zero-id=true
jpa.load.time.weaver=org.springframework.instrument.classloading.ReflectiveLoadTimeWeaver

As shown in the preceding code, the property jpa.configs sets a list of comma-delimited JPA configurations. The default JPA provider is set to eclipselink via the property jpa.provider. You can set it to other values such as hibernate, openjpa, and toplink. Provider-specific properties are prefixed with jpa.provider.property.*, as in jpa.provider.property.eclipselink.allow-zero-id. The property jpa.load.time.weaver names an implementation of Spring’s LoadTimeWeaver interface, which allows JPA ClassTransformer instances to be plugged in a specific manner depending on the environment. Note that not all JPA providers require a JVM agent. If your provider doesn’t require an agent or you have other alternatives, the load-time weaver shouldn’t be used.
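The prefix rule described above (the value of persistence.provider decides whether the jpa.* or the hibernate.* properties are read) can be illustrated with a small sketch. This is not the portal’s own loading code: the class name ProviderProperties and the method activeProperties are hypothetical, and the fallback to hibernate simply mirrors the default named in the text.

```java
import java.util.Properties;
import java.util.TreeMap;

public class ProviderProperties {

    // Returns only the properties whose keys start with "<provider>.",
    // where <provider> is the value of persistence.provider.
    public static TreeMap<String, String> activeProperties(Properties props) {
        // Assumption: fall back to "hibernate", the default named in the text.
        String provider = props.getProperty("persistence.provider", "hibernate");
        String prefix = provider + ".";

        TreeMap<String, String> active = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith(prefix)) {
                active.put(key, props.getProperty(key));
            }
        }
        return active;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("persistence.provider", "jpa");
        props.setProperty("jpa.provider", "eclipselink");
        props.setProperty("hibernate.cache.use_query_cache", "true");

        // With persistence.provider=jpa, only the jpa.* key is selected.
        System.out.println(activeProperties(props).keySet());
    }
}
```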

Configure Hibernate caching

First of all, let’s consider Hibernate caching settings. The portal will automatically detect the Hibernate dialect. However, you can set the following property in portal-ext.properties to manually override the automatically detected dialect.

hibernate.dialect=

The portal also specifies the following properties related to Hibernate caching in portal.properties.

hibernate.configs=//ignore details
META-INF/ext-hbm.xml
hibernate.cache.provider_class=com.liferay.portal.dao.orm.hibernate.EhCacheProvider
net.sf.ehcache.configurationResourceName=/ehcache/hibernate.xml
hibernate.cache.use_query_cache=true
hibernate.cache.use_second_level_cache=true
hibernate.cache.use_minimal_puts=true
hibernate.cache.use_structured_entries=false
hibernate.jdbc.batch_size=20
hibernate.jdbc.use_scrollable_resultset=true
hibernate.bytecode.use_reflection_optimizer=true
hibernate.query.factory_class=org.hibernate.hql.classic.ClassicQueryTranslatorFactory
hibernate.generate_statistics=false

As shown in the preceding code, the property hibernate.configs sets Hibernate configurations. You may input a list of comma-delimited Hibernate configurations in portal-ext.properties. The property hibernate.cache.provider_class sets the Hibernate cache provider. The property net.sf.ehcache.configurationResourceName is used only if Hibernate is configured to use Ehcache’s cache provider; Ehcache is recommended in a clustered environment. In a clustered environment, you need to set the property in portal-ext.properties as follows:

net.sf.ehcache.configurationResourceName=/ehcache/hibernate-clustered.xml

The portal has specified other Hibernate cache settings with properties starting with hibernate.cache.use_*. The property hibernate.jdbc.batch_size sets the JDBC batch size to improve performance. Note that if you’re using Hypersonic databases or Oracle 9i, you should set the batch size to 0 as a workaround for a logging bug in the Hypersonic database driver or Oracle 9i driver.

In addition, the property hibernate.query.factory_class selects the classic query translator factory, and the portal sets the property hibernate.generate_statistics to false. You can set hibernate.generate_statistics to true in portal-ext.properties to enable Hibernate cache monitoring.

Setting up EHCache caching

The portal specifies the following EHCache caching settings in portal.properties:

ehcache.single.vm.config.location=/ehcache/liferay-single-vm.xml
ehcache.multi.vm.config.location=/ehcache/liferay-multi-vm.xml
ehcache.portal.cache.manager.jmx.enabled=true
ehcache.blocking.cache.allowed=true

As shown in the preceding code, the property ehcache.single.vm.config.location sets the classpath location of the Ehcache configuration file /ehcache/liferay-single-vm.xml for internal caches of a single VM, whereas the property ehcache.multi.vm.config.location sets the classpath location of the Ehcache configuration file /ehcache/liferay-multi-vm.xml for internal caches of multiple VMs. In a clustered environment, you need to set the following in portal-ext.properties:

ehcache.multi.vm.config.location=/ehcache/liferay-multi-vm-clustered.xml

In addition, the portal sets the property ehcache.portal.cache.manager.jmx.enabled to true to enable JMX integration in com.liferay.portal.cache.EhcachePortalCacheManager. Moreover, the portal sets the property ehcache.blocking.cache.allowed to true to allow Ehcache to use blocking caches. This improves performance significantly by locking on keys instead of the entire cache. The drawback is that threads can hang if the cache isn’t used properly. Therefore, make sure that all queries that return a miss also immediately populate the cache, or else other threads that are blocked on a query of that same key will continue to hang.

Of course, you can override the preceding properties in portal-ext.properties.
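To see why a miss must always be followed by a put, here is a minimal sketch of a cache that locks per key. It is a simplified stand-in written with plain java.util.concurrent classes, not Ehcache’s blocking cache and not portal code; the class KeyLockingCache and its methods are hypothetical names chosen for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Function;

public class KeyLockingCache {

    private final Map<String, Object> values = new ConcurrentHashMap<>();
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    private ReentrantLock lockFor(String key) {
        return locks.computeIfAbsent(key, k -> new ReentrantLock());
    }

    // On a hit, returns the value. On a miss, returns null and KEEPS the
    // lock for this key, so other threads asking for the same key wait.
    public Object get(String key) {
        ReentrantLock lock = lockFor(key);
        lock.lock();
        Object value = values.get(key);
        if (value != null) {
            lock.unlock();
        }
        return value;
    }

    // Releases the key lock taken by a missed get(). A null value stores
    // nothing but still releases the lock.
    public void put(String key, Object value) {
        if (value != null) {
            values.put(key, value);
        }
        ReentrantLock lock = lockFor(key);
        if (lock.isHeldByCurrentThread()) {
            lock.unlock();
        }
    }

    // Safe usage: every miss is followed by a put(), even if loading fails.
    public Object getOrLoad(String key, Function<String, Object> loader) {
        Object value = get(key);
        if (value == null) {
            try {
                value = loader.apply(key);
            } finally {
                put(key, value); // always releases the key lock
            }
        }
        return value;
    }

    // True while some thread holds the lock for this key.
    public boolean isLocked(String key) {
        return lockFor(key).isLocked();
    }
}
```

Calling get() on a missing key without a matching put() leaves the key locked, which is the hang the text warns about. Wrapping the load in try/finally, as getOrLoad does, is one way to guarantee the release.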

Customization

As you can see, the property net.sf.ehcache.configurationResourceName can have the value /ehcache/hibernate.xml for a non-clustered environment and /ehcache/hibernate-clustered.xml for a clustered environment.

In the same pattern, the property ehcache.single.vm.config.location can have the value /ehcache/liferay-single-vm.xml, and the property ehcache.multi.vm.config.location can have the value /ehcache/liferay-multi-vm.xml for a non-clustered environment and /ehcache/liferay-multi-vm-clustered.xml for a clustered environment.

In real cases, you may need to update both Hibernate caching settings and Ehcache caching settings, either in a non-clustered environment or in a clustered environment. The following is an example of how to do this:

  1. Create a folder named ext-ehcache under the folder $PORTAL_ROOT_HOME/WEB-INF/classes/. You can choose a different folder name (${ehcache.folder.name}); here we use ext-ehcache as an example.
  2. Locate the JAR file portal-impl.jar under the folder $PORTAL_ROOT_HOME/WEB-INF/lib and unzip all the files under the folder ehcache into the folder $PORTAL_ROOT_HOME/WEB-INF/classes/ext-ehcache.
  3. Update the following files according to your requirements for both a non-clustered environment and a clustered environment:
    hibernate.xml
    hibernate-clustered.xml
    liferay-single-vm.xml
    liferay-multi-vm.xml
    liferay-multi-vm-clustered.xml
  4. Set the following for a non-clustered environment in portal-ext.properties:
    net.sf.ehcache.configurationResourceName=/ext-ehcache/hibernate.xml
    ehcache.single.vm.config.location=/ext-ehcache/liferay-single-vm.xml
    ehcache.multi.vm.config.location=/ext-ehcache/liferay-multi-vm.xml
  5. Otherwise, set the following for a clustered environment in portal-ext.properties:
    net.sf.ehcache.configurationResourceName=/ext-ehcache/hibernate-clustered.xml
    ehcache.multi.vm.config.location=/ext-ehcache/liferay-multi-vm-clustered.xml

That’s it! You have customized both the Hibernate caching settings and the Ehcache caching settings.
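If you would rather script the unzip step than do it by hand, the following sketch copies every file under the ehcache folder of a JAR into a target folder using only the JDK. It is an illustration rather than a portal utility: the class EhcacheConfigExtractor is a hypothetical name, and the paths passed on the command line are whatever your installation uses for portal-impl.jar and the ext-ehcache folder.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class EhcacheConfigExtractor {

    // Copies every file under "ehcache/" in the archive into destDir,
    // dropping the "ehcache/" prefix. Returns the number of files copied.
    public static int extract(Path jar, Path destDir) throws IOException {
        String prefix = "ehcache/";
        Path root = destDir.toAbsolutePath().normalize();
        Files.createDirectories(root);

        int copied = 0;
        try (ZipFile zip = new ZipFile(jar.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                String name = entry.getName();
                if (entry.isDirectory() || !name.startsWith(prefix)) {
                    continue;
                }
                Path target = root.resolve(name.substring(prefix.length())).normalize();
                // Refuse entries such as "ehcache/../x" that would land outside destDir.
                if (!target.startsWith(root)) {
                    throw new IOException("Entry escapes target folder: " + name);
                }
                Files.createDirectories(target.getParent());
                try (InputStream in = zip.getInputStream(entry)) {
                    Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                }
                copied++;
            }
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: args[0] = path to portal-impl.jar,
        // args[1] = path to the ext-ehcache folder.
        if (args.length == 2) {
            int copied = extract(Paths.get(args[0]), Paths.get(args[1]));
            System.out.println(copied + " file(s) copied");
        }
    }
}
```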

 

Indexing settings

Search engine indexing collects, parses, and stores data to facilitate fast and accurate information retrieval. Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is suitable for nearly any application that requires full-text search, especially cross-platform applications. Refer to http://lucene.apache.org for more information. By default, the portal uses Lucene for search and indexing.

Lucene search

For faster performance, the portal disables Lucene indexing on startup by default, as follows in portal.properties:

index.read.only=false
index.on.startup=false
index.on.startup.delay=60
index.with.thread=true

As you can see, the portal sets the property index.read.only to false to allow any writes to the index. You should set it to true if you want to avoid any writes to the index. This is useful in some clustering environments where there is a shared index, and only one node of the cluster updates it.

The portal sets the property index.on.startup to false in order to avoid indexing on every startup. You could set this property to true if you want to index your entire library of files on startup. This property is available so that automated test environments index the files on startup.

Don’t set this to true on production systems, or else all of your data will be re-indexed on every startup.

The property index.on.startup.delay adds a delay before indexing on startup. A delay may be necessary if a lot of plugins need to be loaded and re-indexed. Note that this property is only valid if the property index.on.startup is set to true.

In addition, the portal sets the property index.with.thread to true to allow indexing on startup to be executed on a separate thread to speed up execution.

Of course, you could re-index either all resources or an individual resource through the web UI. For example, to rebuild all search indexes, you can go to Control Panel | Server | Server Administration | Resources | Actions, and click on the button Execute next to the “Reindex all search indexes” option.

To re-index an individual resource such as Users, you can use the Plugin Installation portlet in the Control Panel. Go to Control Panel | Server | Plugin Installation | Portlet Plugins, and click on the button Reindex next to the portlet Users.

Index storage

Lucene can store its indexes in the filesystem, in a database, or in RAM. The portal provides a set of properties to configure index storage, as follows in portal.properties.

lucene.store.type=file
lucene.store.jdbc.auto.clean.up=false
lucene.store.jdbc.dialect.*
lucene.dir=${liferay.home}/data/lucene/
lucene.file.extractor=com.liferay.portal.search.lucene.LuceneFileExtractor
lucene.file.extractor.regexp.strip=
lucene.analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
lucene.commit.batch.size=0
lucene.commit.time.interval=0
lucene.buffer.size=16
lucene.merge.factor=10
lucene.optimize.interval=100

As shown in the preceding code, the property lucene.store.type designates whether Lucene stores indexes in a database via JDBC, in the filesystem, or in RAM. The default setting is the filesystem. When Lucene stores indexes via JDBC, temporary files don’t get removed properly, which can eat up disk space over time. In that case, set the property lucene.store.jdbc.auto.clean.up to true to automatically clean up the temporary files once a day.

The property lucene.store.jdbc.dialect.* sets the JDBC dialect so that Lucene can use it to store indexes in the database. This property is referenced only when Lucene stores indexes in the database. The portal will attempt to load the proper dialect based on the URL of the JDBC connection.

The property lucene.dir sets the directory where Lucene indexes are stored. This is referenced only when Lucene stores indexes in the filesystem. In a clustered environment, you could point the property lucene.dir to a shared folder that is accessible to all nodes. You could then let a single node write to the indexes by leaving the property index.read.only set to false on that node and setting it to true on the rest of the nodes.

The property lucene.file.extractor specifies a class, called by Lucene to extract text from complex files so that they can be properly indexed. The file extractor can sometimes return text that isn’t valid for Lucene. The property lucene.file.extractor.regexp.strip expects a regular expression. Any character that doesn’t match the regular expression will be replaced with a blank space. You can set an empty regular expression to disable this feature. The property lucene.analyzer sets the default analyzer used for indexing and retrieval.
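The stripping rule for lucene.file.extractor.regexp.strip can be sketched in a few lines. This is an illustration of the behaviour as described above, not the portal’s extractor code; the class RegexpStripSketch is a hypothetical name, and the pattern [\d\w] used in main is only an example value.

```java
import java.util.regex.Pattern;

public class RegexpStripSketch {

    // Replaces every character that does NOT match the regular expression
    // with a blank space. An empty (or null) expression disables stripping.
    // Simplification: works char by char, so surrogate pairs are not handled.
    public static String strip(String text, String regexp) {
        if (regexp == null || regexp.isEmpty()) {
            return text;
        }
        Pattern pattern = Pattern.compile(regexp);
        StringBuilder result = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            String ch = String.valueOf(text.charAt(i));
            result.append(pattern.matcher(ch).matches() ? ch : " ");
        }
        return result.toString();
    }

    public static void main(String[] args) {
        // Example pattern: keep digits and word characters only.
        System.out.println(strip("a-b!c", "[\\d\\w]")); // prints "a b c"
    }
}
```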

In addition, the property lucene.commit.batch.size sets how often index updates will be committed. Set the batch size to configure how many consecutive updates will trigger a commit. If the value is 0, then the index will be committed on every update. The property lucene.commit.time.interval sets the time interval in milliseconds to configure how often to commit the index. The time interval isn’t read unless the batch size is greater than 0 because the time interval works in conjunction with the batch size to guarantee that the index is committed after a specified time interval. The portal sets the time interval to 0 to disable committing the index by a time interval.
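The interplay of the two commit properties can be summarized as a small decision rule. The sketch below models only the semantics described above; it is not the portal’s implementation, the class CommitPolicySketch is a hypothetical name, and checking the time interval only when an update arrives is a simplifying assumption.

```java
public class CommitPolicySketch {

    private final int batchSize;           // models lucene.commit.batch.size
    private final long timeIntervalMillis; // models lucene.commit.time.interval
    private int pendingUpdates;
    private long lastCommitMillis;

    public CommitPolicySketch(int batchSize, long timeIntervalMillis, long nowMillis) {
        this.batchSize = batchSize;
        this.timeIntervalMillis = timeIntervalMillis;
        this.lastCommitMillis = nowMillis;
    }

    // Records one index update at time nowMillis and returns true if the
    // index should be committed now.
    public boolean onUpdate(long nowMillis) {
        pendingUpdates++;
        boolean commit;
        if (batchSize <= 0) {
            // Batch size 0: commit on every update; the interval is ignored.
            commit = true;
        } else if (pendingUpdates >= batchSize) {
            // Enough consecutive updates have accumulated.
            commit = true;
        } else {
            // The interval only matters when batching is on; 0 disables it.
            commit = timeIntervalMillis > 0
                && nowMillis - lastCommitMillis >= timeIntervalMillis;
        }
        if (commit) {
            pendingUpdates = 0;
            lastCommitMillis = nowMillis;
        }
        return commit;
    }
}
```

With the default values of 0 and 0, this rule commits on every update, which matches the behaviour the text attributes to the portal’s defaults.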

The property lucene.buffer.size sets Lucene’s buffer size in megabytes and the property lucene.merge.factor sets Lucene’s merge factor. For both of these properties, higher numbers mean that indexing goes faster but uses more memory. Lucene’s default merge factor is 10, and it should never be set to a number less than 2. The property lucene.optimize.interval sets how often to run Lucene’s optimize method. Optimization speeds up searching but slows down writing. You can set this property to 0 to always optimize.

Indexer framework

As mentioned earlier, you could re-index either all resources or an individual resource through the web UI. For example, you could re-index out-of-the-box portlets like Users (portlet ID 125) and plugins like the Mail portlet. This is because, in $PORTAL_ROOT_HOME/WEB-INF/liferay-portlet.xml, the portlet Users (named enterprise_admin_users) has specified the following line:

<indexer-class>com.liferay.portlet.enterpriseadmin.util.UserIndexer</indexer-class>

As shown in the preceding code, the indexer-class value, which is the specified indexer framework, must be a class that implements com.liferay.portal.kernel.search.Indexer and is called to create or update a search index for the portlet Users. Additionally, you could find the indexer framework in out-of-the-box portlets such as Organizations (portlet ID 126, called enterprise_admin_organizations), Web Content (portlet ID 15), Image Gallery (portlet ID 31), Document Library (portlet ID 20), and so on.

Similarly, the indexer-class value, which is the specified indexer framework, is also available in plugins. For example, the portlet Mail has specified the following line in $AS_WEB_APP_HOME/mail-portlet/WEB-INF/liferay-portlet.xml:

<indexer-class>com.liferay.mail.search.Indexer</indexer-class>

In the same pattern, you may add the indexer framework to other plugins. For example, the Knowledge Base portlet’s KBIndexer supports keyword search against titles, descriptions, content, tags, categories, and the category hierarchy. It treats “San Francisco” in double quotes as one word and ‘San Francisco’ in single quotes as multiple words (“San” or “Francisco”). The portlet is available at http://liferay.cignex.com/palm_tree/book/0387/chapter11/knowledge-base-portlet-6.0.0.1.war
