Administrating Solr

October 11, 2013 - 12:00 am

Query nesting

You may run into situations where you need to nest one query inside another in order to search for a specific keyword or phrase. Suppose you want to run a query using the standard request handler, but you need to embed in it a query that is parsed by the dismax query parser. This section shows how to do it.

Our example data looks like this:

<add>
<doc>
<field name="id">1</field>
<field name="title">Reviewed solrcook book</field>
</doc>
<doc>
<field name="id">2</field>
<field name="title">Some book reviewed</field>
</doc>
<doc>
<field name="id">3</field>
<field name="title">Another reviewed little book</field>
</doc>
</add>

Here, we are going to use the standard query parser so that Lucene query syntax is supported, but we would like to boost phrases using the dismax query parser. At first this seems impossible to achieve, but it can be done. Suppose that we want to find books having the words reviewed and book in their title field, and we would like to boost the reviewed book phrase by 10. The query looks like this:

http://localhost:8080/solr/select?q=reviewed+AND+book+AND+_query_:"{!dismax qf=title pf=title^10 v=$qq}"&qq=reviewed+book

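The results of the preceding query should look like the following (the captured response echoes the terms in the order book reviewed and also carries the fl=*,score parameter, which returns each document’s score):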

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">2</int>
<lst name="params">
<str name="fl">*,score</str>
<str name="qq">book reviewed</str>
<str name="q">book AND reviewed AND _query_:"{!dismax qf=title pf=title^10 v=$qq}"</str>
</lst>
</lst>
<result name="response" numFound="3" start="0" maxScore="0.77966106">
<doc>
<float name="score">0.77966106</float>
<str name="id">2</str>
<str name="title">Some book reviewed</str>
</doc>
<doc>
<float name="score">0.07087828</float>
<str name="id">1</str>
<str name="title">Reviewed solrcook book</str>
</doc>
<doc>
<float name="score">0.07087828</float>
<str name="id">3</str>
<str name="title">Another reviewed little book</str>
</doc>
</result>
</response>

Let us focus on the query. The q parameter is built of two parts connected with the AND operator. The first part, reviewed+AND+book, is an ordinary query using the logical operator AND. The second part starts with a strange-looking expression, _query_. This expression tells Solr that another query should be made that will affect the results list. Next comes the expression stating that Solr should use the dismax query parser (the !dismax part) along with the parameters that will be passed to that parser (qf and pf).

The v parameter is an abbreviation for value and is used to pass the query string to the nested parser. Here v=$qq refers to the separate qq request parameter, so reviewed+book is passed to the dismax query parser.

And that’s it! We arrive at the search results we expected.
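
The same request can also be assembled programmatically. The following is a minimal Python sketch, not part of the original recipe, that builds the nested query URL with the standard library. The function name nested_dismax_url and its arguments are illustrative assumptions; the host, handler path, and query parameters are the ones used above.

```python
from urllib.parse import urlencode


def nested_dismax_url(base_url, main_query, phrase, field="title", boost=10):
    """Build a select URL whose q embeds a dismax query via _query_.

    The function name and arguments are hypothetical; only the Solr
    parameters (q, qq, qf, pf, v) come from the article.
    """
    # The nested part: local params select the dismax parser, and v=$qq
    # points at the separate qq request parameter.
    nested = '_query_:"{!dismax qf=%s pf=%s^%d v=$qq}"' % (field, field, boost)
    params = {
        "q": "%s AND %s" % (main_query, nested),
        "qq": phrase,
    }
    # urlencode percent-encodes braces, quotes, and the dollar sign,
    # and turns spaces into plus signs.
    return base_url + "?" + urlencode(params)


url = nested_dismax_url(
    "http://localhost:8080/solr/select",
    "reviewed AND book",
    "reviewed book",
)
print(url)
```

Keeping the phrase in its own qq parameter means user input never has to be escaped inside the quoted _query_ string; only the request parameter itself is URL-encoded.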

Stats.jsp

When you click the Statistics link in the admin interface, you receive a web page of information about the specific index. This information is actually served to the browser as XML linked to an XSL stylesheet, which the browser then transforms into HTML. This means that if you perform a GET request on stats.jsp, you get the XML back, as follows:

curl http://localhost:8080/solr/mbartists/admin/stats.jsp

If you look at the response, you will see all the data as XML. The following extract of the available statistics covers the cache that stores individual documents and the standard request handler, including metrics you might wish to monitor (such as hitratio and avgTimePerRequest):

<entry>
<name>documentCache</name>
<class>org.apache.solr.search.LRUCache</class>
<version>1.0</version>
<description>LRU Cache(maxSize=512, initialSize=512)</description>
<stats>
<stat name="lookups">3251</stat>
<stat name="hits">3101</stat>
<stat name="hitratio">0.95</stat>
<stat name="inserts">160</stat>
<stat name="evictions">0</stat>
<stat name="size">160</stat>
<stat name="warmupTime">0</stat>
<stat name="cumulative_lookups">3251</stat>
<stat name="cumulative_hits">3101</stat>
<stat name="cumulative_hitratio">0.95</stat>
<stat name="cumulative_inserts">150</stat>
<stat name="cumulative_evictions">0</stat>
</stats>
</entry>
<entry>
<name>standard</name>
<class>org.apache.solr.handler.component.SearchHandler</class>
<version>$Revision: 1052938 $</version>
<description>Search using components:
org.apache.solr.handler.component.QueryComponent,
org.apache.solr.handler.component.FacetComponent</description>
<stats>
<stat name="handlerStart">1298759020886</stat>
<stat name="requests">359</stat>
<stat name="errors">0</stat>
<stat name="timeouts">0</stat>
<stat name="totalTime">9122</stat>
<stat name="avgTimePerRequest">25.409472</stat>
<stat name="avgRequestsPerSecond">0.446995</stat>
</stats>
</entry>

The method of integrating with a monitoring system varies from system to system. As an example, you may explore ./examples/8/check_solr.rb, a simple Ruby script that queries the core and checks whether the average hit ratio and the average time per request are above a defined threshold.

./check_solr.rb -w 13 -c 20 -imtracks
CRITICAL - Average Time per request more than 20 milliseconds old: 39.5

In the previous example, we defined 20 milliseconds as the threshold, and the average time to serve a request is 39.5 milliseconds, which is far greater than the threshold we set.
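
The source of check_solr.rb is not reproduced here, so the following Python sketch is not a port of it. It only illustrates the general idea under stated assumptions: read one metric out of stats.jsp-style XML and compare it against warning and critical thresholds. The wrapper element stats-extract, the function names, and the warning/critical semantics are all hypothetical; the entry, stats, and stat elements follow the extract shown earlier.

```python
import xml.etree.ElementTree as ET

# A trimmed stand-in for the stats.jsp output; the <stats-extract>
# wrapper is an assumption added only to make the fragment well-formed.
SAMPLE_STATS = """<stats-extract>
  <entry>
    <name>documentCache</name>
    <stats>
      <stat name="hitratio">0.95</stat>
    </stats>
  </entry>
  <entry>
    <name>standard</name>
    <stats>
      <stat name="avgTimePerRequest">25.409472</stat>
    </stats>
  </entry>
</stats-extract>"""


def read_stat(xml_text, entry_name, stat_name):
    """Return one numeric stat from the entry with the given name."""
    root = ET.fromstring(xml_text)
    for entry in root.iter("entry"):
        # Element text may carry surrounding whitespace, so strip it.
        name = (entry.findtext("name") or "").strip()
        if name != entry_name:
            continue
        for stat in entry.iter("stat"):
            if stat.get("name") == stat_name:
                return float((stat.text or "").strip())
    raise KeyError("%s/%s not found" % (entry_name, stat_name))


def check_avg_time(avg_ms, warn_ms, crit_ms):
    """Classify an average request time against two thresholds."""
    if avg_ms > crit_ms:
        return "CRITICAL"
    if avg_ms > warn_ms:
        return "WARNING"
    return "OK"


avg = read_stat(SAMPLE_STATS, "standard", "avgTimePerRequest")
print(check_avg_time(avg, warn_ms=13, crit_ms=20))
```

In a real check, the XML would come from a GET request on stats.jsp instead of an embedded string, and the status would be mapped to whatever exit codes the monitoring system expects.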

Ping status

Ping status is the outcome returned by PingRequestHandler, which is primarily used for reporting SolrCore health to a load balancer; that is, this handler is designed to be the endpoint that an HTTP load balancer uses when checking the “health” or “up status” of a Solr server. In simpler terms, ping status denotes the availability of your Solr server (uptime and downtime) over a defined duration.

Additionally, the handler should be configured with some defaults that specify the request to execute. If the request succeeds, PingRequestHandler responds with a simple OK status. If the request fails, it responds with the corresponding HTTP error code. Clients (such as load balancers) can be configured to poll PingRequestHandler and watch for these types of responses (or for a simple connection failure) to know whether there is a problem with the Solr server.

A PingRequestHandler configuration looks something like the following:
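
From the client side, polling can be as simple as treating anything other than a successful response as “down”. The following Python sketch shows one way a monitoring client might do that; the function name solr_is_healthy, the timeout value, and the injectable opener argument are assumptions made for illustration, not part of Solr.

```python
import urllib.error
import urllib.request


def solr_is_healthy(ping_url, timeout=2.0, opener=urllib.request.urlopen):
    """Return True only if the ping endpoint answers with HTTP 200.

    HTTP error responses and connection failures both count as unhealthy.
    The opener argument exists so the network call can be replaced in tests.
    """
    try:
        with opener(ping_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # urlopen raises HTTPError (a URLError subclass) for error status
        # codes and URLError/OSError for refused or timed-out connections.
        return False
```

A caller would pass the URL of the ping handler for its own core, for example a path ending in /admin/ping, and take the server out of rotation after some number of consecutive False results.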

<requestHandler name="/admin/ping"
class="solr.PingRequestHandler">
<lst name="invariants">
<str name="qt">/search</str><!-- handler to delegate to -->
<str name="q">some test query</str>
</lst>
</requestHandler>

You may also try a more advanced option, which is to configure the handler with a healthcheckFile that can be used to enable or disable the PingRequestHandler. It would look something like the following:

<requestHandler name="/admin/ping"
class="solr.PingRequestHandler">
<!-- relative paths are resolved against the data dir -->
<str name="healthcheckFile">server-enabled.txt</str>
<lst name="invariants">
<str name="qt">/search</str><!-- handler to delegate to -->
<str name="q">some test query</str>
</lst>
</requestHandler>

Keep the following points in mind when using the healthcheckFile option:

If the health check file exists, the handler will execute the query and return the status as described previously.
If the health check file does not exist, the handler will return an HTTP error even though the server is working fine and the query would have succeeded.

This health check file feature can be used to indicate to load balancers that the server should be “removed from rotation” for maintenance, upgrades, or any other reason you may wish.
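
The two cases above amount to a small decision procedure. The following Python sketch models that logic only; it is not Solr’s implementation. The function name ping_status and the specific status codes returned (503 for a missing file, 500 for a failed query) are assumptions chosen for illustration.

```python
import os


def ping_status(healthcheck_file, run_query):
    """Model the ping decision: check the file first, then run the query.

    healthcheck_file may be None when no file is configured.
    run_query is any callable that raises an exception on failure.
    Returns an HTTP-style status code (values are illustrative).
    """
    # A configured but missing file disables the handler, even if the
    # query itself would have succeeded.
    if healthcheck_file is not None and not os.path.exists(healthcheck_file):
        return 503
    try:
        run_query()
    except Exception:
        return 500
    return 200
```

Under this model, taking a server out of rotation is just a matter of deleting or renaming the file, and putting it back restores normal ping responses without restarting anything.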

Business rules

You might come across situations wherein a customer running an e-store with different types of products, such as jewelry, electronic gadgets, automotive products, and so on, defines a business need for search results that are flexible enough to change based on the search keyword.

For instance, imagine a customer requirement wherein you need to add facets such as Brand, Model, Lens, Zoom, Flash, Dimension, Display, Battery, Price, and so on whenever a user searches for the keyword “Camera”. So far the requirement is easy and can be achieved in a simple way. Now let us add some complexity: facets such as Year, Make, Model, VIN, Mileage, Price, and so on should be added automatically when the user searches for the keyword “Bike”. Worried about how to handle such a complex requirement? This is where business rules come into play. There are a number of rule engines on the market (both proprietary and open source), such as Drools, JRules, and so on, which can be plugged into Solr.
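
To make the requirement concrete before bringing in a rule engine, the following Python sketch hard-codes the keyword-to-facet mapping and turns it into Solr facet parameters. The field names, the FACET_RULES table, and the function name facet_params are hypothetical; facet and facet.field are the standard Solr faceting parameters.

```python
from urllib.parse import urlencode

# Hypothetical mapping from search keyword to the facet fields to add.
FACET_RULES = {
    "camera": ["brand", "model", "lens", "zoom", "flash", "price"],
    "bike": ["year", "make", "model", "vin", "mileage", "price"],
}


def facet_params(keyword):
    """Return request parameters for a keyword, adding facets if a rule matches."""
    fields = FACET_RULES.get(keyword.strip().lower(), [])
    params = [("q", keyword)]
    if fields:
        params.append(("facet", "true"))
        # facet.field may be repeated, so parameters are kept as pairs.
        params.extend(("facet.field", field) for field in fields)
    return params


print(urlencode(facet_params("Bike")))
```

A table like this works while the rules are few and static. A rule engine becomes attractive once business users need to change the conditions and actions without redeploying code.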

Drools

Now let us understand how Drools functions. The application inserts objects (facts) into working memory, and Drools then evaluates which custom rules should be triggered based on the conditions those facts satisfy. Rules are based on if-then clauses, which enable the rule author to define what condition must be true (using the when clause) and what action or event should be triggered when that condition is met (using the then clause). The facts that conditions are matched against can be any Java objects that the application wishes to inject as input. A business rule has more or less the following format:

rule "ruleName"
when
// CONDITION
then
// ACTION
end

We will now show you how to write an example rule in Drools:

rule "WelcomeLucidWorks"
no-loop
when
$respBuilder : ResponseBuilder();
then
$respBuilder.rsp.add("welcome", "lucidworks");
end

The given code snippet checks for a ResponseBuilder object (one of the prime objects that help in processing search requests in a SearchComponent) in the working memory and then adds a key-value pair to the response held by that ResponseBuilder (in our case, welcome and lucidworks).
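
The when/then mechanics can be mimicked in a few lines of ordinary code. The following Python sketch is a toy model, not Drools and not Solr: the ResponseBuilder and SolrQueryResponse classes here are simplified stand-ins with the same names, and fire_all_rules is a hypothetical helper that tries every rule against every object in working memory.

```python
class SolrQueryResponse:
    """Stand-in for the response object; collects key-value pairs."""

    def __init__(self):
        self.values = []

    def add(self, key, value):
        self.values.append((key, value))


class ResponseBuilder:
    """Stand-in for the object a SearchComponent works with."""

    def __init__(self):
        self.rsp = SolrQueryResponse()


def welcome_rule(fact):
    """when: the fact is a ResponseBuilder; then: add the welcome pair."""
    if isinstance(fact, ResponseBuilder):
        fact.rsp.add("welcome", "lucidworks")
        return True
    return False


def fire_all_rules(working_memory, rules):
    """Try each rule once against each fact and count how many fired."""
    fired = 0
    for rule in rules:
        for fact in working_memory:
            if rule(fact):
                fired += 1
    return fired


builder = ResponseBuilder()
fired = fire_all_rules([builder, "some other fact"], [welcome_rule])
print(fired, builder.rsp.values)
```

A real engine such as Drools does far more than this loop, including efficient pattern matching and re-evaluation when facts change, which is why attributes like no-loop exist.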

Summary

In this article, we saw how to nest a query within another query, learned about stats.jsp and how to use ping status, and looked at what business rules are, when they prove important, and how to write a custom rule using Drools.

