Getting ready

Assuming that you have walked through the tutorial, you should be nearly ready with the setup. Still, it does not hurt to go through the checklist:

  • Ensure that you know how to start your operating system’s shell (cmd.exe on Windows, Terminal/iTerm on Mac, and sh/bash/tcsh/zsh on Unix).

  • Ensure that running the java -version command at the shell’s prompt reports at least version 1.6. You may need to upgrade if you have an older version.

  • Ensure that you know where you unpacked the Solr distribution and the full path to the example directory within that. You needed that directory for the tutorial, but that’s also where we are going to start our own Solr instance. That allows us to easily run an embedded Jetty web server and to also find all the additional JAR files that Solr needs to operate properly.
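The Java check from the list above can be sketched as a small shell function. This is only an illustration: the function name java_at_least_1_6 is made up for this example, and the script assumes that the first line printed by java -version contains the version in double quotes, which may not hold on every setup.

```shell
# Hypothetical helper: succeeds if a version string such as
# "1.6.0_45" or "17.0.2" denotes Java 1.6 or newer.
java_at_least_1_6() {
  v="$1"
  major="${v%%.*}"               # text before the first dot
  rest="${v#*.}"                 # text after the first dot
  minor="${rest%%.*}"            # second component, if any
  major="${major%%[!0-9]*}"      # keep leading digits only
  minor="${minor%%[!0-9]*}"
  major="${major:-0}"
  minor="${minor:-0}"
  if [ "$major" -gt 1 ]; then
    return 0                     # newer numbering scheme (9, 11, 17, ...)
  fi
  if [ "$major" -eq 1 ] && [ "$minor" -ge 6 ]; then
    return 0                     # old scheme: 1.6, 1.7, 1.8
  fi
  return 1
}

if command -v java >/dev/null 2>&1; then
  # java -version writes to stderr; take the quoted version from line 1.
  found="$(java -version 2>&1 | sed -n '1s/.*version "\([^"]*\)".*/\1/p')"
  if java_at_least_1_6 "$found"; then
    echo "Java $found is new enough"
  else
    echo "Java '$found' is too old or was not recognized"
  fi
else
  echo "java was not found on the PATH"
fi
```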

Now, create a directory where we will store our indexes and experiments. It can be anywhere on your drive. Because Solr runs on any operating system that runs Java, and path conventions differ between them, we will use SOLR-INDEXING as a placeholder name whenever we refer to that directory. Make sure to use an absolute path when substituting your real directory for it.
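On a Unix-like shell, creating that directory and resolving its absolute path might look like the following sketch. The variable name SOLR_INDEXING and the default location under the current directory are assumptions made for this example; any writable location works.

```shell
# Hypothetical default location; override by setting SOLR_INDEXING beforehand.
SOLR_INDEXING="${SOLR_INDEXING:-$PWD/solr-indexing}"

# Create the directory if it does not exist yet.
mkdir -p "$SOLR_INDEXING"

# Resolve it to an absolute path, which is what the later commands expect.
SOLR_INDEXING="$(cd "$SOLR_INDEXING" && pwd -P)"

echo "Working directory: $SOLR_INDEXING"
```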

How to do it…

As our first example, we will create an index that stores and allows for the searching of simplified e-mail information. For now, we will just look at the addr_from and addr_to e-mail addresses and the subject line. You will see that it takes only two simple configuration files to get the basic Solr index working.

  1. Under the SOLR-INDEXING directory, create a collection1 directory and inside that create a conf directory.

  2. In the conf directory, create two files: schema.xml and solrconfig.xml.

  3. The schema.xml file should have the following content:

    <?xml version="1.0" encoding="UTF-8" ?>
    <schema version="1.5">
      <!-- One entry per field of a record; all four are mandatory -->
      <fields>
        <field name="id" type="string" indexed="true" stored="true" required="true"/>
        <field name="addr_from" type="string" indexed="true" stored="true" required="true"/>
        <field name="addr_to" type="string" indexed="true" stored="true" required="true"/>
        <field name="subject" type="string" indexed="true" stored="true" required="true"/>
      </fields>
      <!-- The field whose value identifies each document -->
      <uniqueKey>id</uniqueKey>
      <!-- Field types referenced by the type attribute above -->
      <types>
        <fieldType name="string" class="solr.StrField" />
      </types>
    </schema>

  4. The solrconfig.xml file should have the following content:

    <?xml version="1.0" encoding="UTF-8" ?>
    <config>
      <luceneMatchVersion>LUCENE_43</luceneMatchVersion>
      <requestDispatcher handleSelect="false">
        <httpCaching never304="true" />
      </requestDispatcher>
      <!-- Query, update, and administrative endpoints -->
      <requestHandler name="/select" class="solr.SearchHandler" />
      <requestHandler name="/update" class="solr.UpdateRequestHandler" />
      <requestHandler name="/admin" class="solr.admin.AdminHandlers" />
      <requestHandler name="/analysis/field" class="solr.FieldAnalysisRequestHandler" startup="lazy" />
    </config>

  5. That is it. Now, let’s start our just-created Solr instance. Open a new shell (we’ll need the current one later). On that shell’s command prompt, change the directory to the example directory of the Solr distribution and run the following command:

    java -Dsolr.solr.home=SOLR-INDEXING -jar start.jar

    Notice that solr.solr.home is not a typo; you do need the solr part twice. And, as always, if you have spaces in your paths (now or later), you may need to escape them in platform-specific ways, such as with backslashes on Unix/Linux or by quoting the whole value.

  6. In the window of your shell, you should see a long list of messages that you can safely ignore (at least for now).

  7. You can verify that everything is working fine by checking for the following three elements:

    • The long list of messages should finish with a message like Started [email protected]:8983. This means that Solr is now running on port 8983 successfully.

    • You should now have a directory called data, right next to the directory called conf that we created earlier.

    • If you open a web browser and go to http://localhost:8983/solr/, you should see a web-based admin interface that makes testing and troubleshooting your Solr instance much easier. We will be using this interface later, so do spend a couple of minutes clicking around now.

  8. Now, let’s load some actual content into our collection:

    • Copy post.jar from the Solr distribution’s example/exampledocs directory to our root SOLR-INDEXING directory.
    • Create a file called input1.csv in the collection1 directory, next to the conf and data directories with the following three-line content:

      id,addr_from,addr_to,subject
      email1,[email protected],[email protected],"Kari, we need more Junior Java engineers"
      email2,[email protected],[email protected].com,"Updating vacancy description"

    • Run the import command from the command line in the SOLR-INDEXING directory (one long command; do not split it across lines):

      java -Dauto -Durl=http://localhost:8983/solr/collection1/update -jar post.jar collection1/input1.csv

    • You should see the following in one of the message lines:

      "1 files indexed".

  9. If you now open a web browser and go to http://localhost:8983/solr/collection1/select?q=*%3A*&wt=ruby&indent=true, you should see Solr output with both documents displayed on the screen in a somewhat readable format.
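If something does not work, it can help to confirm that the files created in the steps above are where Solr and post.jar expect them. The following is a minimal sketch for a Unix-like shell; the function name check_layout is made up for this example, and the demo runs against a throwaway temporary directory rather than your real SOLR-INDEXING directory.

```shell
# Hypothetical helper: report files from the steps above that are missing
# under the given root directory. Succeeds only if nothing is missing.
check_layout() {
  root="$1"
  missing=0
  for f in collection1/conf/schema.xml \
           collection1/conf/solrconfig.xml \
           collection1/input1.csv \
           post.jar; do
    if [ ! -f "$root/$f" ]; then
      echo "missing: $f"
      missing=1
    fi
  done
  return "$missing"
}

# Demo on a temporary directory that contains only the two config files.
demo="$(mktemp -d)"
mkdir -p "$demo/collection1/conf"
touch "$demo/collection1/conf/schema.xml" "$demo/collection1/conf/solrconfig.xml"
check_layout "$demo" || echo "layout incomplete"
# prints:
#   missing: collection1/input1.csv
#   missing: post.jar
#   layout incomplete
```

To check your own setup, call check_layout with the absolute path of your SOLR-INDEXING directory. The data directory is not checked here because Solr creates it on startup.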

How it works…

We have created two files to get our example working. Let’s review what they mean and how they fit together:

  • The schema.xml file in the collection’s conf directory defines the actual shape of the data that you want to store and index. The fields define the structure of a record. Each field has a type, which is also defined in the same file. A field definition states whether the field is stored, indexed, required, or multivalued, and can set a small number of other, more advanced properties. The field type, on the other hand, defines what is actually done to the field’s content when it is indexed and when it is searched. We will explore all of these later.

  • The solrconfig.xml file, also in the collection’s conf directory, defines and tunes the components that make up Solr’s runtime environment. At the very least, it needs to define which URLs can be called to add records to a collection (here, /update), which to query a collection (here, /select), and which to perform various administrative tasks (here, /admin and /analysis/field).

When Solr started, it created a single collection with the default name of collection1 and assigned to it an update handler at the /solr/collection1/update URL and a search handler at the /solr/collection1/select URL (as per solrconfig.xml). At that point, Solr was ready for data to be imported into the four required fields (as per schema.xml).
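The way those URLs are put together from the server address, the collection name, and the handler names in solrconfig.xml can be sketched in a few lines of shell. The variable names are made up for this example; the host and port are the ones used throughout this recipe.

```shell
# Base address of the Solr web application started earlier.
base="http://localhost:8983/solr"
# Default collection name.
collection="collection1"

# Handler names come from the name attributes in solrconfig.xml.
update_url="$base/$collection/update"
select_url="$base/$collection/select?q=*%3A*&wt=ruby&indent=true"

echo "Update handler: $update_url"
echo "Search handler: $select_url"
```

If Solr is running and curl is installed, passing the quoted search URL to curl should return the same output that the browser shows.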

We then proceeded to populate the index from a CSV file (one of many update formats available) and then verified that the records are all present in an indented Ruby format (again, one of many result formats available).
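The header row of the CSV file names the field that each column is loaded into, so it has to line up with the field names in schema.xml. As a rough sketch, the comparison can be done with standard Unix tools. The script below works on copies written to a temporary directory, and its sed pattern assumes that each field element starts with its name attribute on the same line, as in the schema shown earlier; it is not a general XML parser.

```shell
# Work on throwaway copies so the real files are not touched.
work="$(mktemp -d)"

# The field definitions from schema.xml, one per line.
cat > "$work/fields.xml" <<'EOF'
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<field name="addr_from" type="string" indexed="true" stored="true" required="true"/>
<field name="addr_to" type="string" indexed="true" stored="true" required="true"/>
<field name="subject" type="string" indexed="true" stored="true" required="true"/>
EOF

# The header row of input1.csv.
printf '%s\n' 'id,addr_from,addr_to,subject' > "$work/header.csv"

# Field names declared in the schema, one per line, sorted.
schema_fields="$(sed -n 's/.*<field name="\([^"]*\)".*/\1/p' "$work/fields.xml" | sort)"

# Column names from the CSV header, one per line, sorted.
csv_fields="$(head -n 1 "$work/header.csv" | tr ',' '\n' | sort)"

if [ "$schema_fields" = "$csv_fields" ]; then
  echo "CSV header matches the schema fields"
else
  echo "CSV header and schema fields differ"
fi
```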

Summary

This article helped you create a basic Solr collection and populate it with a simple dataset in CSV format.
