
 

Apache Solr 3.1 Cookbook

Over 100 recipes to discover new ways to work with Apache’s Enterprise Search Server

Getting more documents similar to those returned in the results list

Let’s imagine a situation where you run an online bookstore and want to show users books similar to the ones they found while using your application. This recipe will show you how to do that with the more like this component.

How to do it…

Let’s assume that we have the following index structure (just add this to your schema.xml file’s fields section):

<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="name" type="text" indexed="true" stored="true" termVectors="true" />


The test data looks like this:

<add>
  <doc>
    <field name="id">1</field>
    <field name="name">Solr Cookbook first edition</field>
  </doc>
  <doc>
    <field name="id">2</field>
    <field name="name">Solr Cookbook second edition</field>
  </doc>
  <doc>
    <field name="id">3</field>
    <field name="name">Solr by example first edition</field>
  </doc>
  <doc>
    <field name="id">4</field>
    <field name="name">My book second edition</field>
  </doc>
</add>
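The test data is a Solr XML update message. If the documents come from an application rather than a hand-written file, a minimal Python sketch that builds the same structure with the standard library could look like this; build_add_message is a hypothetical helper name. The resulting string would still have to be sent to Solr’s update handler and committed, which this sketch does not do.

```python
import xml.etree.ElementTree as ET


def build_add_message(docs):
    """Build a Solr XML update message: <add><doc><field name=...>...</field></doc></add>."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            # One <field> element per field, named by its "name" attribute.
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


# The same four books as in the test data above.
books = [
    {"id": "1", "name": "Solr Cookbook first edition"},
    {"id": "2", "name": "Solr Cookbook second edition"},
    {"id": "3", "name": "Solr by example first edition"},
    {"id": "4", "name": "My book second edition"},
]

print(build_add_message(books))
```

Building the message with ElementTree rather than string concatenation takes care of escaping characters such as & or < in book titles.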


Let’s assume that our hypothetical user wants to find books that have the word first in their names. We also want to show the user similar books. To do that, we send the following query:

http://localhost:8983/solr/select?q=name:edition&mlt=true&mlt.fl=name&mlt.mintf=1&mlt.mindf=1
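If the application builds this request in code instead of typing it into a browser, a minimal sketch using only the Python standard library could look like this. The function name more_like_this_url and its arguments are hypothetical; the parameter names are the ones from the query above.

```python
from urllib.parse import urlencode


def more_like_this_url(base, query, fields, mintf=1, mindf=1):
    """Build a select URL with the more like this parameters used in this recipe."""
    params = {
        "q": query,
        "mlt": "true",                 # enable the more like this component
        "mlt.fl": ",".join(fields),    # fields used to find similar documents
        "mlt.mintf": mintf,            # minimum term frequency in the source document
        "mlt.mindf": mindf,            # minimum number of documents containing the term
    }
    return base + "/select?" + urlencode(params)


url = more_like_this_url("http://localhost:8983/solr", "name:edition", ["name"])
print(url)
```

urlencode percent-encodes the colon in name:edition as %3A; Solr decodes it back, so the request is equivalent to the one typed above.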


The results returned by Solr are as follows:

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
    <lst name="params">
      <str name="mlt.mindf">1</str>
      <str name="mlt.fl">name</str>
      <str name="q">name:edition</str>
      <str name="mlt.mintf">1</str>
      <str name="mlt">true</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="id">3</str>
      <str name="name">Solr by example first edition</str>
    </doc>
  </result>
  <lst name="moreLikeThis">
    <result name="3" numFound="3" start="0">
      <doc>
        <str name="id">1</str>
        <str name="name">Solr Cookbook first edition</str>
      </doc>
      <doc>
        <str name="id">2</str>
        <str name="name">Solr Cookbook second edition</str>
      </doc>
      <doc>
        <str name="id">4</str>
        <str name="name">My book second edition</str>
      </doc>
    </result>
  </lst>
</response>


Now let’s see how it works.

How it works…

As you can see, the index structure and the data are really simple. One thing to notice is that the termVectors attribute is set to true in the name field definition. Term vectors are worth having on the fields the more like this component uses: when they are present, Solr can read a document’s terms and their frequencies directly from the index instead of re-analyzing the stored field value. Enable them on those fields whenever possible.

Now let’s take a look at the query. Besides the standard q parameter, we added a few more. The mlt=true parameter tells Solr to run the more like this component on the results. The mlt.fl parameter specifies which fields the component should use; in our case, the name field. The mlt.mintf parameter tells Solr to ignore terms from the source document (the one from the original result list) whose term frequency is below the given value; we set it to 1, so no term is dropped because of its frequency. The last parameter, mlt.mindf, tells Solr to ignore words that appear in fewer documents than the given value; we set it to 1, so we consider words that appear in at least one document. Both values are lowered on purpose: the defaults (2 for mlt.mintf and 5 for mlt.mindf) would filter out every term in our tiny, four-document index.
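To see why both thresholds matter, here is a deliberately simplified Python model of the term filtering the more like this component applies to the source document. It assumes plain lowercase whitespace tokenization (the real text field type runs a full analysis chain) and ignores the scoring and other limits the component applies; interesting_terms is a hypothetical name. It uses the four names from the test data, with document 3 as the source.

```python
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split; Solr's "text" type does more
    # (stop words, stemming, and so on).
    return text.lower().split()


def interesting_terms(source, index, mintf=2, mindf=5):
    """Keep source terms with tf >= mintf in the source and df >= mindf in the index.

    The default thresholds mirror the mlt.mintf / mlt.mindf defaults.
    """
    tf = Counter(tokenize(source))
    df = Counter()
    for text in index:
        df.update(set(tokenize(text)))  # count each term once per document
    return sorted(t for t, freq in tf.items() if freq >= mintf and df[t] >= mindf)


names = [
    "Solr Cookbook first edition",
    "Solr Cookbook second edition",
    "Solr by example first edition",
    "My book second edition",
]
source = names[2]  # document 3

print(interesting_terms(source, names))                    # → []
print(interesting_terms(source, names, mintf=1, mindf=1))  # → ['by', 'edition', 'example', 'first', 'solr']
```

With the defaults nothing survives, because every word occurs only once in the source name and no word occurs in five documents; with both thresholds at 1, every word of the source document is kept.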

Finally, let’s take a look at the search results. There is an additional section (<lst name="moreLikeThis">) that holds the more like this component results. Solr adds one result entry to it for each document in the main result list. In our case, Solr added an entry for the document with the unique identifier 3 (<result name="3" numFound="3" start="0">), and three similar documents were found. The name attribute of each such result element holds the unique identifier of the document for which the similar documents were calculated.
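An application that shows the similar books next to each hit has to read this section. A minimal sketch with Python’s standard ElementTree module, assuming the default XML response format shown above; similar_ids is a hypothetical helper, and the sample is a trimmed copy of that response.

```python
import xml.etree.ElementTree as ET


def similar_ids(response_xml):
    """Map each source document id to the ids of its similar documents."""
    root = ET.fromstring(response_xml)
    mlt = root.find("lst[@name='moreLikeThis']")
    if mlt is None:  # the more like this component was not enabled
        return {}
    return {
        # The name attribute of each <result> is the source document's id.
        result.get("name"): [doc.findtext("str[@name='id']") for doc in result.findall("doc")]
        for result in mlt.findall("result")
    }


sample = """<response>
  <lst name="moreLikeThis">
    <result name="3" numFound="3" start="0">
      <doc><str name="id">1</str></doc>
      <doc><str name="id">2</str></doc>
      <doc><str name="id">4</str></doc>
    </result>
  </lst>
</response>"""

print(similar_ids(sample))  # {'3': ['1', '2', '4']}
```

When reading a live response, passing the raw bytes to ET.fromstring lets the parser honor the encoding declared in the XML header.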

Presenting search results in a fast and easy way

Imagine a situation where you have to show a client a prototype of your brilliant search algorithm built with Solr. The client doesn’t want to wait another four weeks to see what the algorithm can do; they want to see it very soon. On the other hand, you don’t want to show a raw XML results page. What can you do? This recipe will show you how to use the Velocity response writer (a.k.a. Solritas) to present a prototype quickly.

How to do it…

Let’s assume that we have the following index structure (just add this to your schema.xml file’s fields section):

<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="name" type="text" indexed="true" stored="true" />


The test data looks like this:

<add>
  <doc>
    <field name="id">1</field>
    <field name="name">Solr Cookbook first edition</field>
  </doc>
  <doc>
    <field name="id">2</field>
    <field name="name">Solr Cookbook second edition</field>
  </doc>
  <doc>
    <field name="id">3</field>
    <field name="name">Solr by example first edition</field>
  </doc>
  <doc>
    <field name="id">4</field>
    <field name="name">My book second edition</field>
  </doc>
</add>


First, we need to add the response writer definition to the solrconfig.xml file (it should already be present in the example configuration):

<queryResponseWriter name="velocity" class="org.apache.solr.request.VelocityResponseWriter"/>


Now let’s define a search handler that uses the Velocity response writer. To do that, we add the following section to the solrconfig.xml file (a similar handler should already be present in the example configuration):

<requestHandler name="/browse" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="wt">velocity</str>
    <str name="v.template">browse</str>
    <str name="v.layout">layout</str>
    <str name="title">Solr cookbook example</str>
    <str name="defType">dismax</str>
    <str name="q.alt">*:*</str>
    <str name="rows">10</str>
    <str name="fl">*,score</str>
    <str name="qf">name</str>
  </lst>
</requestHandler>


Now run Solr and open the following URL in your browser:

http://localhost:8983/solr/browse

You should see the following page:


How it works…

As you can see, the index structure and the data are really simple, so I’ll skip discussing this part of the recipe.

The first step in configuring the solrconfig.xml file is adding the Velocity response writer definition. It registers the writer under the name velocity, which tells Solr to render the response of any request with wt=velocity using Velocity templates.

Next, we add a search handler that uses the Velocity response writer. We could pass all the parameters with every query, but we would rather have Solr add them automatically, so we define them as the handler’s defaults. Let’s go through the parameters:

  • wt: The response writer type; in our case, the Velocity response writer.
  • v.template: The template used to render the page contents; in our case, Velocity will use the browse.vm file (the .vm extension is appended automatically). This parameter tells Velocity which file renders the actual page contents.
  • v.layout: The layout used to render the view; in our case, Velocity will use the layout.vm file (again, the .vm extension is appended automatically). This parameter defines the common look of all the pages rendered by Solritas.
  • title: The title of the page.
  • defType: The query parser to use; here, dismax.
  • q.alt: The alternative query the dismax parser uses when the q parameter is missing or empty; *:* matches all documents.
  • rows: The maximum number of documents to return.
  • fl: The fields to return in the results; *,score returns all stored fields plus the relevance score.
  • qf: The fields to search.
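The way the defaults section combines with the parameters sent in a request can be modeled in a few lines of Python. This is a simplified sketch, assuming single-valued parameters: a parameter sent with the request overrides the value from defaults, and anything not sent falls back to the handler’s default. effective_params and main_query are hypothetical names.

```python
def effective_params(defaults, request):
    """Merge handler defaults with request parameters (single-valued model).

    A value sent with the request wins over the handler's default.
    """
    merged = dict(defaults)  # start from the handler's defaults
    merged.update(request)   # request parameters override them
    return merged


def main_query(params):
    # dismax falls back to q.alt when q is missing or blank.
    q = params.get("q", "").strip()
    return q if q else params.get("q.alt")


# The defaults of the /browse handler defined above.
BROWSE_DEFAULTS = {
    "wt": "velocity",
    "v.template": "browse",
    "v.layout": "layout",
    "title": "Solr cookbook example",
    "defType": "dismax",
    "q.alt": "*:*",
    "rows": "10",
    "fl": "*,score",
    "qf": "name",
}

params = effective_params(BROWSE_DEFAULTS, {"q": "cookbook", "rows": "5"})
# The template file is the v.template value plus the .vm extension.
template_file = params["v.template"] + ".vm"
print(params["rows"], params["wt"], template_file)  # 5 velocity browse.vm
```

Because defaults can be overridden, adding wt=xml to a /browse request should return the plain XML response instead of the rendered page, which is handy when debugging the templates.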

Of course, the page generated by the Velocity Response Writer is just an example. To modify the page, you should modify the Velocity files, but this is beyond the scope of this article.

There’s more…

If you are still using Solr 1.4.1 or 1.4, there is one more thing that can be useful.

Running Solritas on Solr 1.4.1 or 1.4

Because the Velocity response writer is a contrib module in Solr 1.4 and 1.4.1, we need a few extra steps to use it. Copy the following libraries from the /contrib/velocity/src/main/solr/lib directory to the /lib directory of your Solr instance:

  • apache-solr-velocity-1.4.dev.jar
  • commons-beanutils-1.7.0.jar
  • commons-collections-3.2.1.jar
  • velocity-1.6.1.jar
  • velocity-tools-2.0-beta3.jar

Then copy the /velocity directory from the code examples (the directory itself, not just its contents) to your Solr configuration directory.
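If you set up several Solr 1.4 instances, the copying can be scripted. A minimal Python sketch (3.8 or later), assuming the directory layout described above; install_solritas and all of its path arguments are hypothetical, so adjust them to your installation.

```python
import shutil
from pathlib import Path

# The libraries listed above, as shipped in the Solr 1.4 velocity contrib.
VELOCITY_JARS = [
    "apache-solr-velocity-1.4.dev.jar",
    "commons-beanutils-1.7.0.jar",
    "commons-collections-3.2.1.jar",
    "velocity-1.6.1.jar",
    "velocity-tools-2.0-beta3.jar",
]


def install_solritas(solr_dist, solr_instance, examples_velocity, conf_dir):
    """Copy the Velocity contrib jars and templates into a Solr instance.

    Hypothetical arguments:
      solr_dist         - the unpacked Solr 1.4/1.4.1 distribution
      solr_instance     - the Solr instance whose lib directory receives the jars
      examples_velocity - the velocity directory from the code examples
      conf_dir          - the instance's configuration directory
    """
    src_lib = Path(solr_dist, "contrib", "velocity", "src", "main", "solr", "lib")
    dst_lib = Path(solr_instance, "lib")
    dst_lib.mkdir(parents=True, exist_ok=True)
    for jar in VELOCITY_JARS:
        shutil.copy2(src_lib / jar, dst_lib / jar)
    # Copy the velocity directory itself, not only its contents.
    shutil.copytree(examples_velocity, Path(conf_dir, "velocity"), dirs_exist_ok=True)
```

copy2 raises an error if one of the expected jars is missing, which makes a wrong distribution path obvious early.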
