Fix a Nagios Service Stuck in Scheduled Downtime

Today I was asked to look at a Nagios service check that was stuck in scheduled downtime. The only way to fix it turned out to be manually updating values in the database.

Background

The service did not show up in the "Scheduled Downtime" page in the XI UI, nor did it show up in the nagios_scheduleddowntime table in our MariaDB instance. Yet, the service page still showed it as being in downtime.

According to the comment history on the service, the downtime was scheduled until 2042, so we couldn't really just wait and hope it expired.

This service has been scheduled for fixed downtime from 2019-04-30 00:39:16 to 2042-02-21 07:39:16.

Note: Before I talk about how the problem was resolved, if you're only here to find out how to remove scheduled downtime, you can do that from the aforementioned "Scheduled Downtime" page. It's at nagiosxi/includes/components/xicore/downtime.php in the XI UI, and under "System -> Downtime" in Nagios Core.

Technical Info

The problem turned out to be that the service's entry in the nagios_servicestatus table had incorrect values in the acknowledgement_type and scheduled_downtime_depth columns.

The acknowledgement_type column can be 0, 1, or 2, which represent None, Normal, or Sticky, respectively.

The scheduled_downtime_depth column can be any smallint value, and it represents the number of downtimes the service is currently in (a service can be in several overlapping downtimes at once).
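
To see what these two columns currently hold, queries along the following lines can help. This is a minimal sketch that assumes the standard NDOUtils schema in a database named nagios. The ID 8540 is just the example object ID used in this post, and joining nagios_scheduleddowntime on its object_id column is my assumption about how downtime rows map to services.

```sql
-- show the current acknowledgement and downtime state for one service
select service_object_id, acknowledgement_type, scheduled_downtime_depth
from nagios.nagios_servicestatus
where service_object_id = 8540;

-- list services that claim to be in downtime but have no scheduled downtime row
select ss.service_object_id, ss.scheduled_downtime_depth
from nagios.nagios_servicestatus ss
left join nagios.nagios_scheduleddowntime sd
  on sd.object_id = ss.service_object_id
where ss.scheduled_downtime_depth > 0
  and sd.object_id is null;
```

Any row returned by the second query is a candidate for the same kind of stuck state described here, though it is worth checking each one by hand before changing anything.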

Fix

After determining the object ID of the service and verifying that no entries corresponding to it existed in nagios_scheduleddowntime, I used the following SQL to fix the service's status.

-- Fix acknowledgement_type, replace service_object_id with the appropriate ID
update nagios.nagios_servicestatus set acknowledgement_type='0' where service_object_id='8540';

-- Fix scheduled_downtime_depth, replace service_object_id with the appropriate ID
update nagios.nagios_servicestatus set scheduled_downtime_depth='0' where service_object_id='8540';
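
The lookup and verification steps around these updates might look like the following. It is a sketch, not the exact queries that were run. It assumes the NDOUtils nagios_objects table marks services with objecttype_id = 2 and stores the host name and service description in name1 and name2. The host and service names shown are placeholders.

```sql
-- find the object ID of the service (placeholder names, replace with your own)
select object_id
from nagios.nagios_objects
where objecttype_id = 2
  and name1 = 'example-host'
  and name2 = 'Example Service';

-- confirm that no scheduled downtime rows exist for that object (expect a count of 0)
select count(*)
from nagios.nagios_scheduleddowntime
where object_id = 8540;

-- after running the updates, confirm both columns are back to 0
select acknowledgement_type, scheduled_downtime_depth
from nagios.nagios_servicestatus
where service_object_id = 8540;
```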

After doing this, you can also optionally remove the comment history for the downtime from the nagios_commenthistory table.

-- replace object_id and commenthistory_id with appropriate values.
delete from nagios.nagios_commenthistory where object_id = '8540' and commenthistory_id='2461761';
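
To find the commenthistory_id to delete, the comment history for the object can be listed first. This sketch assumes the comment text is stored in a comment_data column and the timestamp in entry_time, as in the NDOUtils schema I am familiar with. The like pattern is taken from the downtime comment quoted earlier.

```sql
-- list downtime-related comment history rows for the service
select commenthistory_id, entry_time, comment_data
from nagios.nagios_commenthistory
where object_id = 8540
  and comment_data like '%scheduled for fixed downtime%';
```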