An index in Splunk is a storage pool for events, capped by size and time. By default, all events will go to the index specified by defaultDatabase, which is called main but lives in a directory called defaultdb.
In this tutorial, we put focus to index structures, need of multiple indexes, how to size an index and how to manage multiple indexes in a Splunk environment.
Directory structure of an index
Each index occupies a set of directories on the disk. By default, these directories live in $SPLUNK_DB, which, by default, is located in $SPLUNK_HOME/var/lib/splunk.
Look at the following stanza for the main index:
[main] homePath = $SPLUNK_DB/defaultdb/db coldPath = $SPLUNK_DB/defaultdb/colddb thawedPath = $SPLUNK_DB/defaultdb/thaweddb maxHotIdleSecs = 86400 maxHotBuckets = 10 maxDataSize = auto_high_volume
If our Splunk installation lives at /opt/splunk, the index main is rooted at the path /opt/splunk/var/lib/splunk/defaultdb.
To change your storage location, either modify the value of SPLUNK_DB in $SPLUNK_HOME/etc/splunk-launch.conf or set absolute paths in indexes.conf.
splunk-launch.conf cannot be controlled from an app, which means it is easy to forget when adding indexers. For this reason, and for legibility, I would recommend using absolute paths in indexes.conf.
The homePath directories contain index-level metadata, hot buckets, and warm buckets. coldPath contains cold buckets, which are simply warm buckets that have aged out. See the upcoming sections The lifecycle of a bucket and Sizing an index for details.
When to create more indexes
There are several reasons for creating additional indexes. If your needs do not meet one of these requirements, there is no need to create more indexes. In fact, multiple indexes may actually hurt performance if a single query needs to open multiple indexes.
If you do not have a test environment, you can use test indexes for staging new data. This then allows you to easily recover from mistakes by dropping the test index. Since Splunk will run on a desktop, it is probably best to test new configurations locally, if possible.
It may be the case that you need more history for some source types than others. The classic example here is security logs, as compared to web access logs. You may need to keep security logs for a year or more, but need the web access logs for only a couple of weeks.
If these two source types are left in the same index, security events will be stored in the same buckets as web access logs and will age out together. To split these events up, you need to perform the following steps:
- Create a new index called security, for instance
- Define different settings for the security index
- Update inputs.conf to use the new index for security source types
For one year, you might make an indexes.conf setting such as this:
[security] homePath = $SPLUNK_DB/security/db coldPath = $SPLUNK_DB/security/colddb thawedPath = $SPLUNK_DB/security/thaweddb #one year in seconds frozenTimePeriodInSecs = 31536000
For extra protection, you should also set maxTotalDataSizeMB, and possibly coldToFrozenDir.
If you have multiple indexes that should age together, or if you will split homePath and coldPath across devices, you should use volumes. See the upcoming section, Using volumes to manage multiple indexes, for more information.
Then, in inputs.conf, you simply need to add an index to the appropriate stanza as follows:
[monitor:///path/to/security/logs/logins.log] sourcetype=logins index=security
If some data should only be seen by a specific set of users, the most effective way to limit access is to place this data in a different index, and then limit access to that index by using a role. The steps to accomplish this are essentially as follows:
- Define the new index.
- Configure inputs.conf or transforms.conf to send these events to the new index.
- Ensure that the user role does not have access to the new index.
- Create a new role that has access to the new index.
- Add specific users to this new role. If you are using LDAP authentication, you will need to map the role to an LDAP group and add users to that LDAP group.
To route very specific events to this new index, assuming you created an index called sensitive, you can create a transform as follows:
[contains_password] REGEX = (?i)password[=:] DEST_KEY = _MetaData:Index FORMAT = sensitive
You would then wire this transform to a particular sourcetype or source index in props.conf.
Using more indexes to increase performance
Placing different source types in different indexes can help increase performance if those source types are not queried together. The disks will spend less time seeking when accessing the source type in question.
If you have access to multiple storage devices, placing indexes on different devices can help increase the performance even more by taking advantage of different hardware for different queries. Likewise, placing homePath and coldPath on different devices can help performance.
However, if you regularly run queries that use multiple source types, splitting those source types across indexes may actually hurt performance. For example, let’s imagine you have two source types called web_access and web_error.
We have the following line in web_access:
2012-10-19 12:53:20 code=500 session=abcdefg url=/path/to/app
And we have the following line in web_error:
2012-10-19 12:53:20 session=abcdefg class=LoginClass
If we want to combine these results, we could run a query like the following:
(sourcetype=web_access code=500) OR sourcetype=web_error | transaction maxspan=2s session | top url class
If web_access and web_error are stored in different indexes, this query will need to access twice as many buckets and will essentially take twice as long.
The life cycle of a bucket
An index is made up of buckets, which go through a specific life cycle. Each bucket contains events from a particular period of time.
The stages of this lifecycle are hot, warm, cold, frozen, and thawed. The only practical difference between hot and other buckets is that a hot bucket is being written to, and has not necessarily been optimized. These stages live in different places on the disk and are controlled by different settings in indexes.conf:
- homePath contains as many hot buckets as the integer value of maxHotBuckets, and as many warm buckets as the integer value of maxWarmDBCount. When a hot bucket rolls, it becomes a warm bucket. When there are too many warm buckets, the oldest warm bucket becomes a cold bucket.
- Do not set maxHotBuckets too low. If your data is not parsing perfectly, dates that parse incorrectly will produce buckets with very large time spans. As more buckets are created, these buckets will overlap, which means all buckets will have to be queried every time, and performance will suffer dramatically. A value of five or more is safe.
- coldPath contains cold buckets, which are warm buckets that have rolled out of homePath once there are more warm buckets than the value of maxWarmDBCount. If coldPath is on the same device, only a move is required; otherwise, a copy is required.
- Once the values of frozenTimePeriodInSecs, maxTotalDataSizeMB, or maxVolumeDataSizeMB are reached, the oldest bucket will be frozen. By default, frozen means deleted. You can change this behavior by specifying either of the following:
- coldToFrozenDir: This lets you specify a location to move the buckets once they have aged out. The index files will be deleted, and only the compressed raw data will be kept. This essentially cuts the disk usage by half. This location is unmanaged, so it is up to you to watch your disk usage.
- coldToFrozenScript: This lets you specify a script to perform some action when the bucket is frozen. The script is handed the path to the bucket that is about to be frozen.
- thawedPath can contain buckets that have been restored. These buckets are not managed by Splunk and are not included in all time searches. To search these buckets, their time range must be included explicitly in your search.
I have never actually used this directory. Search https://splunk.com for restore archived to learn the procedures.
Sizing an index
To estimate how much disk space is needed for an index, use the following formula:
(gigabytes per day) * .5 * (days of retention desired)
Likewise, to determine how many days you can store an index, the formula is essentially:
(device size in gigabytes) / ( (gigabytes per day) * .5 )
The .5 represents a conservative compression ratio. The log data itself is usually compressed to 10 percent of its original size. The index files necessary to speed up search brings the size of a bucket closer to 50 percent of the original size, though it is usually smaller than this.
If you plan to split your buckets across devices, the math gets more complicated unless you use volumes. Without using volumes, the math is as follows:
homePath = (maxWarmDBCount + maxHotBuckets) * maxDataSize coldPath = maxTotalDataSizeMB - homePath
For example, say we are given these settings:
[myindex] homePath = /splunkdata_home/myindex/db coldPath = /splunkdata_cold/myindex/colddb thawedPath = /splunkdata_cold/myindex/thaweddb maxWarmDBCount = 50 maxHotBuckets = 6 maxDataSize = auto_high_volume #10GB on 64-bit systems maxTotalDataSizeMB = 2000000 Filling in the preceding formula, we get these values: homePath = (50 warm + 6 hot) * 10240 MB = 573440 MB coldPath = 2000000 MB - homePath = 1426560 MB
If we use volumes, this gets simpler and we can simply set the volume sizes to our available space and let Splunk do the math.
Using volumes to manage multiple indexes
Volumes combine pools of storage across different indexes so that they age out together. Let’s make up a scenario where we have five indexes and three storage devices.
The indexes are as follows:
Name Data per day Retention required Storage needed web 50 GB no requirement ? security 1 GB 2 years 730 GB * 50 percent app 10 GB no requirement ? chat 2 GB 2 years 1,460 GB * 50 percent web_summary 1 GB 1 years 365 GB * 50 percent
Now let’s say we have three storage devices to work with, mentioned in the following table:
Name Size small_fast 500 GB big_fast 1,000 GB big_slow 5,000 GB
We can create volumes based on the retention time needed. Security and chat share the same retention requirements, so we can place them in the same volumes. We want our hot buckets on our fast devices, so let’s start there with the following configuration:
[volume:two_year_home] #security and chat home storage path = /small_fast/two_year_home maxVolumeDataSizeMB = 300000 [volume:one_year_home] #web_summary home storage path = /small_fast/one_year_home maxVolumeDataSizeMB = 150000
For the rest of the space needed by these indexes, we will create companion volume definitions on big_slow, as follows:
[volume:two_year_cold] #security and chat cold storage path = /big_slow/two_year_cold maxVolumeDataSizeMB = 850000 #([security]+[chat])*1024 - 300000 [volume:one_year_cold] #web_summary cold storage path = /big_slow/one_year_cold maxVolumeDataSizeMB = 230000 #[web_summary]*1024 - 150000
Now for our remaining indexes, whose timeframe is not important, we will use big_fast and the remainder of big_slow, like so:
[volume:large_home] #web and app home storage path = /big_fast/large_home maxVolumeDataSizeMB = 900000 #leaving 10% for pad [volume:large_cold] #web and app cold storage path = /big_slow/large_cold maxVolumeDataSizeMB = 3700000 #(big_slow - two_year_cold - one_year_cold)*.9
Given that the sum of large_home and large_cold is 4,600,000 MB, and a combined daily volume of web and app is 60,000 MB approximately, we should retain approximately 153 days of web and app logs with 50 percent compression.
In reality, the number of days retained will probably be larger. With our volumes defined, we now have to reference them in our index definitions:
[web] homePath = volume:large_home/web coldPath = volume:large_cold/web thawedPath = /big_slow/thawed/web [security] homePath = volume:two_year_home/security coldPath = volume:two_year_cold/security thawedPath = /big_slow/thawed/security coldToFrozenDir = /big_slow/frozen/security [app] homePath = volume:large_home/app coldPath = volume:large_cold/app thawedPath = /big_slow/thawed/app [chat] homePath = volume:two_year_home/chat coldPath = volume:two_year_cold/chat thawedPath = /big_slow/thawed/chat coldToFrozenDir = /big_slow/frozen/chat [web_summary] homePath = volume:one_year_home/web_summary coldPath = volume:one_year_cold/web_summary thawedPath = /big_slow/thawed/web_summary thawedPath cannot be defined using a volume and must be specified for Splunk to start.
For extra protection, we specified coldToFrozenDir for the indexes’ security and chat. The buckets for these indexes will be copied to this directory before deletion, but it is up to us to make sure that the disk does not fill up. If we allow the disk to fill up, Splunk will stop indexing until space is made available.
This is just one approach to using volumes. You could overlap in any way that makes sense to you, as long as you understand that the oldest bucket in a volume will be frozen first, no matter what index put the bucket in that volume.
With this, we learned to operate multiple indexes and how we can get effective business intelligence out of the data without hurting system performance. If you found this tutorial useful, do check out the book Implementing Splunk 7 – Third Edition and start creating advanced Splunk dashboards.