17 min read

(For more resources related to this topic, see here.)

During the writing of this article, I used Solr version 4.0 and Jetty Version 8.1.5. If another version of Solr is mandatory for a feature to run, then it will be mentioned.

If you don’t have any experience with Apache Solr, please refer to the Apache Solr tutorial which can be found at : http://lucene.apache.org/solr/tutorial.html.

Running Solr on Jetty

The simplest way to run Apache Solr on a Jetty servlet container is to run the provided example configuration based on embedded Jetty. But it’s not the case here. In this recipe, I would like to show you how to configure and run Solr on a standalone Jetty container.

Getting ready

First of all you need to download the Jetty servlet container for your platform. You can get your download package from an automatic installer (such as, apt-get), or you can download it yourself from http://jetty.codehaus.org/jetty/

How to do it…

The first thing is to install the Jetty servlet container, which is beyond the scope of this article, so we will assume that you have Jetty installed in the /usr/share/jetty directory or you copied the Jetty files to that directory.

Let’s start by copying the solr.war file to the webapps directory of the Jetty installation (so the whole path would be /usr/share/jetty/webapps). In addition to that we need to create a temporary directory in Jetty installation, so let’s create the temp directory in the Jetty installation directory.

Next we need to copy and adjust the solr.xml file from the context directory of the Solr example distribution to the context directory of the Jetty installation. The final file contents should look like the following code:

<?xml version="1.0"?> <!DOCTYPE Configure PUBLIC "-//Jetty//Configure//EN" "http://www. eclipse.org/jetty/configure.dtd"> <Configure class="org.eclipse.jetty.webapp.WebAppContext"> <Set name="contextPath">/solr</Set> <Set name="war"><SystemProperty name="jetty.home"/>/webapps/solr. war</Set> <Set name="defaultsDescriptor"><SystemProperty name="jetty.home"/>/ etc/webdefault.xml</Set> <Set name="tempDirectory"><Property name="jetty.home" default="."/>/ temp</Set> </Configure>

Downloading the example code

You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com.
If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

Now we need to copy the jetty.xml, webdefault.xml, and logging.properties files from the etc directory of the Solr distribution to the configuration directory of Jetty, so in our case to the /usr/share/jetty/etc directory.

The next step is to copy the Solr configuration files to the appropriate directory. I’m talking about files such as schema.xml, solrconfig.xml, solr.xml, and so on. Those files should be in the directory specified by the solr.solr.home system variable (in my case this was the /usr/share/solr directory). Please remember to preserve the directory structure you’ll see in the example deployment, so for example, the /usr/share/solr directory should contain the solr.xml (and in addition zoo.cfg in case you want to use SolrCloud) file with the contents like so:

<?xml version="1.0" encoding="UTF-8" ?> <solr persistent="true"> <cores adminPath="/admin/cores" defaultCoreName="collection1"> <core name="collection1" instanceDir="collection1" /> </cores> </solr>

All the other configuration files should go to the /usr/share/solr/collection1/conf directory (place the schema.xml and solrconfig.xml files there along with any additional configuration files your deployment needs). Your cores may have other names than the default collection1, so please be aware of that.

The last thing about the configuration is to update the /etc/default/jetty file and add –Dsolr.solr.home=/usr/share/solr to the JAVA_OPTIONS variable of that file. The whole line with that variable could look like the following:

JAVA_OPTIONS="-Xmx256m -Djava.awt.headless=true -Dsolr.solr.home=/usr/ share/solr/"

If you didn’t install Jetty with apt-get or a similar software, you may not have the /etc/default/jetty file. In that case, add the –Dsolr.solr.home=/usr/share/solr parameter to the Jetty startup.

We can now run Jetty to see if everything is ok. To start Jetty, that was installed, for example, using the apt-get command, use the following command:

/etc/init.d/jetty start

You can also run Jetty with a java command. Run the following command in the Jetty installation directory:

java –Dsolr.solr.home=/usr/share/solr –jar start.jar

If there were no exceptions during the startup, we have a running Jetty with Solr deployed and configured. To check if Solr is running, try going to the following address with your web browser: http://localhost:8983/solr/.

You should see the Solr front page with cores, or a single core, mentioned. Congratulations! You just successfully installed, configured, and ran the Jetty servlet container with Solr deployed.

How it works…

For the purpose of this recipe, I assumed that we needed a single core installation with only I and solrconfig.xml configuration files. Multicore installation is very similar – it differs only in terms of the Solr configuration files.

The first thing we did was copy the solr.war file and create the temp directory. The WAR file is the actual Solr web application. The temp directory will be used by Jetty to unpack the WAR file.

The solr.xml file we placed in the context directory enables Jetty to define the context for the Solr web application. As you can see in its contents, we set the context to be /solr, so our Solr application will be available under http://localhost:8983/solr/ We also specified where Jetty should look for the WAR file (the war property), where the web application descriptor file (the defaultsDescriptor property) is, and finally where the temporary directory will be located (the tempDirectory property).

The next step is to provide configuration files for the Solr web application. Those files should be in the directory specified by the system solr.solr.home variable. I decided to use the /usr/share/solr directory to ensure that I’ll be able to update Jetty without the need of overriding or deleting the Solr configuration files. When copying the Solr configuration files, you should remember to include all the files and the exact directory structure that Solr needs. So in the directory specified by the solr.solr.home variable, the solr.xml file should be available – the one that describes the cores of your system.

The solr.xml file is pretty simple – there should be the root element called solr. Inside it there should be a cores tag (with the adminPath variable set to the address where Solr’s cores administration API is available and the defaultCoreName attribute that says which is the default core). The cores tag is a parent for cores definition – each core should have its own cores tag with name attribute specifying the core name and the instanceDir attribute specifying the directory where the core specific files will be available (such as the conf directory).

If you installed Jetty with the apt–get command or similar, you will need to update the /etc/default/jetty file to include the solr.solr.home variable for Solr to be able to see its configuration directory.

After all those steps we are ready to launch Jetty. If you installed Jetty with apt–get or a similar software, you can run Jetty with the first command shown in the example. Otherwise you can run Jetty with a java command from the Jetty installation directory.

After running the example query in your web browser you should see the Solr front page as a single core. Congratulations! You just successfully configured and ran the Jetty servlet container with Solr deployed.

There’s more…

There are a few tasks you can do to counter some problems when running Solr within the Jetty servlet container. Here are the most common ones that I encountered during my work.

I want Jetty to run on a different port

Sometimes it’s necessary to run Jetty on a different port other than the default one. We have two ways to achieve that:

  • Adding an additional startup parameter, jetty.port. The startup command would look like the following command:

    java –Djetty.port=9999 –jar start.jar

  • Changing the jetty.xml file – to do that you need to change the following line:

    <Set name="port"><SystemProperty name="jetty.port" default="8983"/></Set>

    To:

    <Set name="port"><SystemProperty name="jetty.port" default="9999"/></Set>

Buffer size is too small

Buffer overflow is a common problem when our queries are getting too long and too complex, – for example, when we use many logical operators or long phrases. When the standard head buffer is not enough you can resize it to meet your needs. To do that, you add the following line to the Jetty connector in the jetty.xml file. Of course the value shown in the example can be changed to the one that you need:

<Set name="headerBufferSize">32768</Set>

After adding the value, the connector definition should look more or less like the following snippet:

<Call name="addConnector"> <Arg> <New class="org.mortbay.jetty.bio.SocketConnector"> <Set name="port"><SystemProperty name="jetty.port" default="8080"/></ Set> <Set name="maxIdleTime">50000</Set> <Set name="lowResourceMaxIdleTime">1500</Set> <Set name="headerBufferSize">32768</Set> </New> </Arg> </Call>

Running Solr on Apache Tomcat

Sometimes you need to choose a servlet container other than Jetty. Maybe because your client has other applications running on another servlet container, maybe because you just don’t like Jetty. Whatever your requirements are that put Jetty out of the scope of your interest, the first thing that comes to mind is a popular and powerful servlet container – Apache Tomcat. This recipe will give you an idea of how to properly set up and run Solr in the Apache Tomcat environment.

Getting ready

First of all we need an Apache Tomcat servlet container. It can be found at the Apache Tomcat website – http://tomcat.apache.org. I concentrated on the Tomcat Version 7.x because at the time of writing of this book it was mature and stable. The version that I used during the writing of this recipe was Apache Tomcat 7.0.29, which was the newest one at the time.

How to do it…

To run Solr on Apache Tomcat we need to follow these simple steps:

  1. Firstly, you need to install Apache Tomcat. The Tomcat installation is beyond the scope of this book so we will assume that you have already installed this servlet container in the directory specified by the $TOMCAT_HOME system variable.

  2. The second step is preparing the Apache Tomcat configuration files. To do that we need to add the following inscription to the connector definition in the server.xml configuration file:

    URIEncoding="UTF-8"

    The portion of the modified server.xml file should look like the following code snippet:

    <Connector port="8080" protocol="HTTP/1.1" connectionTimeout="20000" redirectPort="8443" URIEncoding="UTF-8" />

  3. The third step is to create a proper context file. To do that, create a solr.xml file in the $TOMCAT_HOME/conf/Catalina/localhost directory. The contents of the file should look like the following code:

    <Context path="/solr" docBase="/usr/share/tomcat/webapps/solr.war" debug="0" crossContext="true"> <Environment name="solr/home" type="java.lang.String" value="/ usr/share/solr/" override="true"/> </Context>

  4. The next thing is the Solr deployment. To do that we need the apache-solr-4.0.0.war file that contains the necessary files and libraries to run Solr that is to be copied to the Tomcat webapps directory and renamed solr.war.

  5. The one last thing we need to do is add the Solr configuration files. The files that you need to copy are files such as schema.xml, solrconfig.xml, and so on. Those files should be placed in the directory specified by the solr/home variable (in our case /usr/share/solr/). Please don’t forget that you need to ensure the proper directory structure. If you are not familiar with the Solr directory structure please take a look at the example deployment that is provided with the standard Solr package.

  6. Please remember to preserve the directory structure you’ll see in the example deployment, so for example, the /usr/share/solr directory should contain the solr.xml (and in addition zoo.cfg in case you want to use SolrCloud) file with the contents like so:

    <?xml version="1.0" encoding="UTF-8" ?> <solr persistent="true"> <cores adminPath="/admin/cores" defaultCoreName="collection1"> <core name="collection1" instanceDir="collection1" /> </cores> </solr>

  7. All the other configuration files should go to the /usr/share/solr/collection1/ conf directory (place the schema.xml and solrconfig.xml files there along with any additional configuration files your deployment needs). Your cores may have other names than the default collection1, so please be aware of that.

  8. Now we can start the servlet container, by running the following command:

    bin/catalina.sh start

  9. In the log file you should see a message like this:

    Info: Server startup in 3097 ms

  10. To ensure that Solr is running properly, you can run a browser and point it to an address where Solr should be visible, like the following:

    http://localhost:8080/solr/

If you see the page with links to administration pages of each of the cores defined, that means that your Solr is up and running.

How it works…

Let’s start from the second step as the installation part is beyond the scope of this book. As you probably know, Solr uses UTF-8 file encoding. That means that we need to ensure that Apache Tomcat will be informed that all requests and responses made should use that encoding. To do that, we modified the server.xml file in the way shown in the example.

The Catalina context file (called solr.xml in our example) says that our Solr application will be available under the /solr context (the path attribute). We also specified the WAR file location (the docBase attribute). We also said that we are not using debug (the debug attribute), and we allowed Solr to access other context manipulation methods. The last thing is to specify the directory where Solr should look for the configuration files. We do that by adding the solr/home environment variable with the value attribute set to the path to the directory where we have put the configuration files.

The solr.xml file is pretty simple – there should be the root element called solr. Inside it there should be the cores tag (with the adminPath variable set to the address where the Solr cores administration API is available and the defaultCoreName attribute describing which is the default core). The cores tag is a parent for cores definition – each core should have its own core tag with a name attribute specifying the core name and the instanceDir attribute specifying the directory where the core-specific files will be available (such as the conf directory).

The shell command that is shown starts Apache Tomcat. There are some other options of the catalina.sh (or catalina.bat) script; the descriptions of these options are as follows:

  • stop: This stops Apache Tomcat

  • restart: This restarts Apache Tomcat

  • debug: This start Apache Tomcat in debug mode

  • run: This runs Apache Tomcat in the current window, so you can see the output on the console from which you run Tomcat.

After running the example address in the web browser, you should see a Solr front page with a core (or cores if you have a multicore deployment). Congratulations! You just successfully configured and ran the Apache Tomcat servlet container with Solr deployed.

There’s more…

There are some other tasks that are common problems when running Solr on Apache Tomcat.

Changing the port on which we see Solr running on Tomcat

Sometimes it is necessary to run Apache Tomcat on a different port other than 8080, which is the default one. To do that, you need to modify the port variable of the connector definition in the server.xml file located in the $TOMCAT_HOME/conf directory. If you would like your Tomcat to run on port 9999, this definition should look like the following code snippet:

<Connector port="9999" protocol="HTTP/1.1" connectionTimeout="20000" redirectPort="8443" URIEncoding="UTF-8" />

While the original definition looks like the following snippet:

<Connector port="8080" protocol="HTTP/1.1" connectionTimeout="20000" redirectPort="8443" URIEncoding="UTF-8" />

Installing a standalone ZooKeeper

You may know that in order to run SolrCloud—the distributed Solr installation–you need to have Apache ZooKeeper installed. Zookeeper is a centralized service for maintaining configurations, naming, and provisioning service synchronization. SolrCloud uses ZooKeeper to synchronize configuration and cluster states (such as elected shard leaders), and that’s why it is crucial to have a highly available and fault tolerant ZooKeeper installation. If you have a single ZooKeeper instance and it fails then your SolrCloud cluster will crash too. So, this recipe will show you how to install ZooKeeper so that it’s not a single point of failure in your cluster configuration.

Getting ready

The installation instruction in this recipe contains information about installing ZooKeeper Version 3.4.3, but it should be useable for any minor release changes of Apache ZooKeeper. To download ZooKeeper please go to http://zookeeper.apache.org/releases.html This recipe will show you how to install ZooKeeper in a Linux-based environment. You also need Java installed.

How to do it…

Let’s assume that we decided to install ZooKeeper in the /usr/share/zookeeper directory of our server and we want to have three servers (with IP addresses 192.168.1.1, 192.168.1.2, and 192.168.1.3) hosting the distributed ZooKeeper installation.

  1. After downloading the ZooKeeper installation, we create the necessary directory:

    sudo mkdir /usr/share/zookeeper

  2. Then we unpack the downloaded archive to the newly created directory. We do that on three servers.

  3. Next we need to change our ZooKeeper configuration file and specify the servers that will form the ZooKeeper quorum, so we edit the /usr/share/zookeeper/conf/ zoo.cfg file and we add the following entries:

    clientPort=2181 dataDir=/usr/share/zookeeper/data tickTime=2000 initLimit=10 syncLimit=5 server.1=192.168.1.1:2888:3888 server.2=192.168.1.2:2888:3888 server.3=192.168.1.3:2888:3888

  4. And now, we can start the ZooKeeper servers with the following command:

    /usr/share/zookeeper/bin/zkServer.sh start

  5. If everything went well you should see something like the following:

    JMX enabled by default Using config: /usr/share/zookeeper/bin/../conf/zoo.cfg Starting zookeeper ... STARTED

And that’s all. Of course you can also add the ZooKeeper service to start automatically during your operating system startup, but that’s beyond the scope of the recipe and the book itself.

How it works…

Let’s skip the first part, because creating the directory and unpacking the ZooKeeper server there is quite simple. What I would like to concentrate on are the configuration values of the ZooKeeper server. The clientPort property specifies the port on which our SolrCloud servers should connect to ZooKeeper. The dataDir property specifies the directory where ZooKeeper will hold its data. So far, so good right ? So now, the more advanced properties; the tickTime property specified in milliseconds is the basic time unit for ZooKeeper. The initLimit property specifies how many ticks the initial synchronization phase can take. Finally, the syncLimit property specifies how many ticks can pass between sending the request and receiving an acknowledgement.

There are also three additional properties present, server.1, server.2, and server.3. These three properties define the addresses of the ZooKeeper instances that will form the quorum. However, there are three values separated by a colon character. The first part is the IP address of the ZooKeeper server, and the second and third parts are the ports used by ZooKeeper instances to communicate with each other.


Subscribe to the weekly Packt Hub newsletter

* indicates required

LEAVE A REPLY

Please enter your comment!
Please enter your name here