8 min read

Troubleshooting Web Interface

There might be cases where accessing the Nagios URL shows an error instead of the welcome screen. If this happens, it can be due to various reasons, for example, because the web server has not started, or the Nagios related configuration setup is incorrect, or permissions on the Nagios directories are incorrect.

The first thing that we should check is whether Apache is working properly. We can manually run the check_http plugin from Nagios. If the web server is up and running, we should see something similar to what is shown here:

 # /opt/nagios/plugins/check_http -H 127.0.0.1
HTTP OK HTTP/1.1 200 OK - 296 bytes in 0.006 seconds

and if Apache is not running currently, the plugin will report an error similar to the following one:

# /opt/nagios/plugins/check_http -H 127.0.0.1
HTTP CRITICAL - Unable to open TCP socket

If it was stopped, start it by running /etc/init.d/apache2 start.

The next step is to check whether the http://127.0.0.1/nagios/ URL is working properly. We can also use the same plugin for this. The -u argument can specify the exact link to access, and -a allows you to specify the username and password to be authorized. It is passed in the form of <username>:<password>.

# /opt/nagios/plugins/check_http -H 127.0.0.1 
u /nagios/ -a nagiosadmin:<yourpassword>
HTTP OK HTTP/1.1 200 OK - 979 bytes in 0.019 seconds

We can also check the actual CGI scripts by passing a URL to one of the scripts:

# /opt/nagios/plugins/check_http -H 127.0.0.1 
u /nagios/cgi-bin/tac.cgi -a nagiosadmin:<yourpassword>
HTTP OK HTTP/1.1 200 OK - 979 bytes in 0.019 seconds

If any of these checks return any HTTP code other than 200, it means that this is the problem.

If the code is 500, it means that Apache is not configured correctly. In such cases, the Apache error log contains useful information about any potential problems. On most systems, including Ubuntu Linux, the filename of the log is /var/log/apache2/error.log. An example entry in the error log could be:

[error] [client 127.0.0.1] need AuthName: /nagios/cgi-bin/tac.cgi

In this particular case, the problem is the missing AuthName directive for CGI scripts.

Internal errors can usually be resolved by making sure that the Nagios-related Apache configuration is correct.

If this does not help, it is worth checking other parts of the configuration, especially the ones related to virtual hosts and CGI configuration. Commenting out parts of the configuration can help in determining which parts of the configuration are causing problems.

Another possibility is that either the check for /nagios/ or the check for the /nagios/cgi-bin/tac.cgi URL returned code 404. This code means that the page was not found. In this case, please make sure that Apache is configured according to the previous steps.

If it is, then it’s a good idea to enable more verbose debugging to a custom file. The following Apache 2 directives can be added either to /etc/apache2/conf.d/nagios or to any other file in Apache configuration:

LogFormat "%h %l %u "%r" %>s %b %{Host}e %f" debuglog
CustomLog /var/log/apache2/access-debug.log debuglog

The first entry defines a custom logging format that also logs exact paths to files. The second one enables logging with this format to a dedicated file. An example entry in such a log would be:

127.0.0.1 - - "GET /nagios/ HTTP/1.1" 404 481 127.0.0.1 /var/www/nagios

This log entry tells us that http://127.0.0.1/nagios/ was incorrectly expanded to the /var/www/nagios directory. In this case, the Alias directive describing the /nagios/ prefix is missing. Making sure that actual configuration matches the one provided in the previous section will also resolve this issue.

Another error that you can get is 403, which indicates that Apache was unable to access either CGI scripts in /opt/nagios/sbin, or Nagios static pages in /opt/nagios/share. In this case, you need to make sure that these directories are readable by the user Apache is running as.

The error might also be related to the directories above /opt/nagios or /opt. One of these might also be inaccessible to the user Apache is running as, which will also cause the same error to occur.

If you run into any other problems, it is best to start with making sure that Nagios related configuration matches the examples from the previous section. It is also a good idea to reduce the number of enabled features and virtual hosts in your Apache configuration.

Troubleshooting Passive Checks

It’ s not always possible to set up passive checks correctly the first time. In such cases, it is a good thing to try to debug the issue one step at a time in order to find any potential problems. Sometimes the problem could be a configuration issue, while in other cases, it could be an issue such as the mistyping of the host or service name.

One thing worth checking is whether the Web UI shows changes after you have sent the passive result check. If it doesn’t, then at some point, things are not working correctly.

The first thing you should start with is enabling the logging of external commands and passive checks. To do this, make sure that the following values are enabled in the main Nagios configuration file:

log_external_commands=1
log_passive_checks=1

In order for the changes to take effect, a restart of the Nagios process is needed. After this has been done, Nagios will log all commands passed via the command pipe and log all of the passive check results it receives.

The first issue, a common problem, is that an application or script cannot write data to the Nagios command pipe. In order to test this, simply change to the user your scripts are running as, and try the following command:

user@ubuntuserver:~$ echo "TEST" >/var/nagios/rw/nagios.cmd

If the command above runs fine, and no errors are reported, then your permissions are set up correctly. If an error shows up, you should add the user to the nagioscmd group. The next thing to do is to manually send a passive check result to the Nagios command pipe and check whether the Nagios log file was received and parsed correctly. To test this, run the following command:

echo "['date +%s'] PROCESS_HOST_CHECK_RESULT;host1;2;test" 
>/var/nagios/rw/nagios.cmd

The name, host1, needs to be replaced with an actual host name from your configuration. A few seconds after running this command, the Nagios log file should reflect the command that we have just sent. You should see the following lines in your log:

EXTERNAL COMMAND: PROCESS_HOST_CHECK_RESULT;host1;2;test
[1220257561] PASSIVE HOST CHECK: host1;2;test

If both of these lines are in your log file, then we can conclude that Nagios has received and parsed the command correctly.

If only the first line is present, then it means that either the global option to receive passive host check results is disabled, or it is disabled for this particular object. The first thing you should do is to make sure that your main Nagios configuration file contains the following line:

accept_passive_host_checks=1

Next, you should check your configuration to see whether the host definition has passive checks enabled as well. If not, simply add the following directive to the object definition:

passive_checks_enabled 1

If you have misspelled the name of the host object, then the following will be logged:

Warning: Passive check result was received for host host01',
but the host could not be found!

In this case, make sure that your host name is correct.

Similar checks can also be done for services. You can run the following command to check if a passive service check is being handled correctly by Nagios:

echo "['date +%s'] PROCESS_SERVICE_CHECK_RESULT;host1;APT;0;test" 
>/var/nagios/rw/nagios.cmd

Again, host1 should be replaced by the actual host name, and APT needs to be an existing service for that host. After a few seconds, the following entries in Nagios log file would indicate the result has been successfully parsed:

EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;host1;APT;0;test
PASSIVE SERVICE CHECK: host1;APT;0;test

If the second line is not in the log file, either the option to accept service passive checks is disabled on a global basis, or this particular service has the option to accept passive check results disabled. You should start by making sure that your main Nagios configuration file contains the following line:

accept_passive_service_checks=1

You should also make sure that the service definition has passive checks enabled as well, and if not, add the following directive to the object definition:

passive_checks_enabled 1

If you have misspelled the name of the host or service, then the following will be logged:

Warning: Passive check result was received for service APT' on host 
host1', but the service could not be found!

LEAVE A REPLY

Please enter your comment!
Please enter your name here