In this article by Josh Diakun, Paul R Johnson, and Derek Mock, authors of the book Splunk Operational Intelligence Cookbook – Second Edition, we will cover the basic ways to search data in Splunk and how to make raw event data readable.
The ability to search machine data is one of Splunk’s core functions, and it should come as no surprise that many other features and functions of Splunk are heavily driven by searches. Everything from basic reports and dashboards to data models and fully featured Splunk applications is powered by Splunk searches behind the scenes.
Splunk has its own search language known as the Search Processing Language (SPL). SPL contains hundreds of search commands, most of which also have several functions, arguments, and clauses. While a basic understanding of SPL is required in order to effectively search your data in Splunk, you are not expected to know all the commands! Even the most seasoned ninjas do not know all the commands and regularly refer to the Splunk manuals, website, or Splunk Answers (http://answers.splunk.com).
To get you on your way with SPL, be sure to check out the search command cheat sheet and download the handy quick reference guide available at http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/SplunkEnterpriseQuickReferenceGuide.
Searching
Searches in Splunk usually start with a base search, followed by a number of commands that are delimited by one or more pipe (|) characters. The result of a command or search to the left of the pipe is used as the input for the next command to the right of the pipe. Multiple pipes are often found in a Splunk search to continually refine data results as needed. As we go through this article, this concept will become very familiar to you.
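As a minimal sketch of this pipeline idea, assume web access logs are held in the main index under the access_combined sourcetype, with the status and uri_path fields used later in this article. The following search filters first and then refines the results through three piped commands (it is split across lines here only for readability):

```
index=main sourcetype=access_combined status=404
| stats count by uri_path
| sort - count
| head 10
```

The base search finds the matching events, stats counts them per uri_path, sort orders the counts from highest to lowest, and head keeps only the first ten rows.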
Splunk allows you to search for anything that might be found in your log data. For example, the most basic search in Splunk might be a search for a keyword such as error or an IP address such as 10.10.12.150. However, searching for a single word or IP address over the terabytes of data that might be in Splunk is not very efficient. Therefore, we can use SPL and a number of Splunk commands to refine our searches. The more refined and granular the search, the faster it runs and the quicker you get to the data you are looking for!
When searching in Splunk, try to filter as much as possible before the first pipe (|) character, as this will save CPU and disk I/O. Also, pick your time range wisely. Often, it helps to run the search over a small time range when testing it and then extend the range once the search provides what you need.
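To illustrate, the two searches below should return the same events, assuming an access_combined sourcetype with a status field. The first retrieves every web access event and only then filters on the status code after the pipe. The second places the filter in the base search, which is the habit recommended above:

```
index=main sourcetype=access_combined | search status=500

index=main sourcetype=access_combined status=500
```

Depending on the Splunk version, the search optimizer may treat both forms similarly, but writing the filter before the first pipe makes the intent explicit.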
Boolean operators
There are three Boolean operators available in Splunk: AND, OR, and NOT. These operators are case sensitive and must be in uppercase to be recognized by Splunk. The AND operator is implied between terms by default, so it is not needed, but it does no harm if used.
For example, searching for error OR success would return all the events that contain either the word error or the word success. Searching for error success would return all the events that contain both the word error and the word success; this can also be written as error AND success. Searching web access logs for error OR success NOT mozilla would return all the events that contain either the word error or the word success, but not those events that also contain the word mozilla.
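The searches described above can be written out as follows, one per line. The parentheses in the last search are an addition for readability and are not required; they simply make the grouping of the OR terms explicit:

```
error OR success
error success
error AND success
(error OR success) NOT mozilla
```

The second and third searches are equivalent, because AND is implied between adjacent terms.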
Common commands
There are many commands that you will likely use on a daily basis when searching data in Splunk. These common commands are outlined in the following table:
| Command | Description |
| --- | --- |
| chart/timechart | This command outputs results in a tabular and/or time-based output for use by Splunk charts. |
| dedup | This command de-duplicates results based upon specified fields, keeping the most recent match. |
| eval | This command evaluates new or existing fields and values. There are many different functions available for eval. |
| fields | This command specifies the fields to keep or remove in search results. |
| head | This command keeps the first X (as specified) rows of results. |
| lookup | This command looks up fields against an external source or list, to return additional field values. |
| rare | This command identifies the least common values of a field. |
| rename | This command renames the fields. |
| replace | This command replaces the values of fields with another value. |
| search | This command permits subsequent searching and filtering of results. |
| sort | This command sorts results in either ascending or descending order. |
| stats | This command performs statistical operations on the results. There are many different functions available for stats. |
| table | This command formats the results into a tabular output. |
| tail | This command keeps only the last X (as specified) rows of results. |
| top | This command identifies the most common values of a field. |
| transaction | This command merges events into a single event based upon a common transaction identifier. |
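A few of these commands are sketched below against the web access logs used later in this article. The index, sourcetype, and field names (useragent, status, JSESSIONID, uri_path) are taken from that recipe; the is_error and page field names are illustrative only:

```
index=main sourcetype=access_combined | top limit=5 useragent

index=main sourcetype=access_combined | timechart span=1h count by status

index=main sourcetype=access_combined | eval is_error=if(status>=400, "yes", "no") | stats count by is_error

index=main sourcetype=access_combined | dedup JSESSIONID | rename uri_path AS page | table _time, JSESSIONID, page
```

The first search lists the five most common user agents, the second counts events per hour for each status code, the third uses eval to derive a new field before summarizing it, and the fourth keeps one event per session and renames a field before tabulating.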
Time modifiers
The drop-down time range picker in the Graphical User Interface (GUI) to the right of the Splunk search bar allows users to select from a number of different preset and custom time ranges. However, in addition to using the GUI, you can also specify time ranges directly in your search string using the earliest and latest time modifiers. When a time modifier is used in this way, it automatically overrides any time range that might be set in the GUI time range picker.
The earliest and latest time modifiers can accept a number of different time units: seconds (s), minutes (m), hours (h), days (d), weeks (w), months (mon), quarters (q), and years (y). Time modifiers can also make use of the @ symbol to round down and snap to a specified time.
For example, searching for sourcetype=access_combined earliest=-1d@d latest=-1h will search all the access_combined events from midnight a day ago until an hour ago. Note that the snap (@) rounds down, so if it were 12 p.m. now, we would be searching from midnight a day and a half ago until 11 a.m. today.
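A few more combinations of the earliest and latest modifiers are sketched here, again assuming the access_combined sourcetype:

```
sourcetype=access_combined earliest=-1d@d latest=@d

sourcetype=access_combined earliest=-4h@h latest=@h

sourcetype=access_combined earliest=-7d@d latest=now
```

The first search covers all of yesterday, from midnight to midnight. The second covers the four complete hours before the start of the current hour. The third runs from midnight seven days ago up to the present moment.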
Working with fields
Fields in Splunk can be thought of as keywords that have one or more values. These fields are fully searchable by Splunk. At a minimum, every data source that comes into Splunk will have the source, host, index, and sourcetype fields, but some sources might have hundreds of additional fields. If the raw log data contains key-value pairs or is in a structured format such as JSON or XML, then Splunk will automatically extract the fields and make them searchable. Splunk can also be told how to extract fields from the raw log data in the backend props.conf and transforms.conf configuration files.
Searching for specific field values is simple. For example, sourcetype=access_combined status!=200 will search for events that have a sourcetype field value of access_combined and a status field with a value other than 200.
Splunk Enterprise ships with a number of built-in, pre-trained sourcetypes that might work out of the box with common data sources. These are available at http://docs.splunk.com/Documentation/Splunk/latest/Data/Listofpretrainedsourcetypes.
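Some other common ways of matching field values are sketched below. The status and useragent fields come from the recipe later in this article, and the specific values are illustrative:

```
sourcetype=access_combined status=404

sourcetype=access_combined status>=500

sourcetype=access_combined (status=404 OR status=500)

sourcetype=access_combined useragent="*Mozilla*"

sourcetype=access_combined NOT status=200
```

Note the difference between the last search and status!=200. The != form returns only events that have a status field whose value is not 200, whereas NOT status=200 also returns events that have no status field at all.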
In addition, Technical Add-Ons (TAs), which contain event types and field extractions for many other common data sources such as Windows events, are available from the Splunk app store at https://splunkbase.splunk.com.
Saving searches
Once you have written a nice search in Splunk, you may wish to save the search so that you can use it again at a later date or use it for a dashboard. Saved searches in Splunk are known as Reports. To save a search in Splunk, you simply click on the Save As button on the top right-hand side of the main search bar and select Report.
Making raw event data readable
When a basic search is executed in Splunk from the search bar, the search results are displayed in a raw event format by default. To many users, this raw event information is not particularly readable, and valuable information is often clouded by other less valuable data within the event. Additionally, if the events span several lines, only a few events can be seen on the screen at any one time.
In this recipe, we will write a Splunk search to demonstrate how we can leverage Splunk commands to make raw event data readable, tabulating events and displaying only the fields we are interested in.
Getting ready
You should be familiar with the Splunk search bar and search results area.
How to do it…
Follow the given steps to search and tabulate the selected event data:
- Log in to your Splunk server.
- Select the Search & Reporting application from the drop-down menu located in the top left-hand side of the screen.
- Set the time range picker to Last 24 hours and type the following search into the Splunk search bar:
index=main sourcetype=access_combined
Then, click on Search or hit Enter.
- Splunk will return the results of the search and display the raw search events under the search bar.
- Let’s rerun the search, but this time we will add the table command as follows:
index=main sourcetype=access_combined | table _time, referer_domain, method, uri_path, status, JSESSIONID, useragent
- Splunk will now return the same number of events, but instead of presenting the raw events to you, the data will be in a nicely formatted table, displaying only the fields we specified. This is much easier to read!
- Save this search by clicking on Save As and then on Report. Give the report the name cp02_tabulated_webaccess_logs and click on Save. On the next screen, click on Continue Editing to return to the search.
How it works…
Let’s break down the search piece by piece:
| Search fragment | Description |
| --- | --- |
| index=main | All the data in Splunk is held in one or more indexes. While not strictly necessary, it is a good practice to specify the index(es) to search, as this will ensure a more precise search. |
| sourcetype=access_combined | This tells Splunk to search only the data associated with the access_combined sourcetype, which, in our case, is the web access logs. |
| \| table _time, referer_domain, method, uri_path, status, JSESSIONID, useragent | Using the table command, we take the result of our search to the left of the pipe and tell Splunk to return the data in a tabular format. Splunk will only display the fields specified after the table command in the table of results. |
In this recipe, you used the table command. The table command can have a noticeable performance impact on large searches. It should be used towards the end of a search, once all the other processing on the data by the other Splunk commands has been performed.
The stats command is more efficient than the table command and should be used in place of table where possible. However, be aware that stats and table are two very different commands.
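To show how the two commands differ, here is a minimal sketch using the same base search. Rather than listing each event as table does, stats returns one summary row per combination of the grouping fields:

```
index=main sourcetype=access_combined | stats count by status, method
```

This is a suitable replacement only when a summary is what you actually need. If you want to see individual events, table remains the appropriate command.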
There’s more…
The table command is very useful in situations where we wish to present data in a readable format. Additionally, tabulated data in Splunk can be downloaded as a CSV file, which many users find useful for offline processing in spreadsheet software or for sending to others. There are some other ways we can leverage the table command to make our raw event data readable.
Tabulating every field
Often, there are situations where we want to present every field of the events in a tabular format, without having to specify each field one by one. To do this, we simply use a wildcard (*) character as follows:
index=main sourcetype=access_combined | table *
Removing fields, then tabulating everything else
While tabulating every field using the wildcard (*) character is useful, you will notice that there are a number of Splunk internal fields, such as _raw, that appear in the table. We can use the fields command before the table command to remove the fields as follows:
index=main sourcetype=access_combined | fields - sourcetype, index, _raw, source, date*, linecount, punct, host, time*, eventtype | table *
If we do not include the minus (-) character after the fields command, Splunk will keep the specified fields and remove all the other fields.
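As a sketch of the keep form, the following search retains only a handful of fields from the recipe before tabulating them:

```
index=main sourcetype=access_combined | fields _time, status, method, uri_path | table *
```

Be aware that, according to the Splunk documentation for the fields command, internal fields such as _raw are retained unless they are removed explicitly, so they may still appear in the table.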
Summary
In this article, we introduced searching in Splunk and covered how to make raw event data readable.