Chapter 12: Operations & Maintenance — Underground Security Surveillance Design Guide

12.1 O&M Overview

Effective operations and maintenance (O&M) of an underground security surveillance system requires a combination of remote monitoring, scheduled preventive maintenance, and a well-defined incident response procedure. The harsh underground environment — high humidity, temperature cycling, vibration, and limited access — means that equipment degrades faster than in surface installations, and that access for repairs is more time-consuming and costly. A proactive O&M strategy that catches problems early through remote monitoring and scheduled inspections is far more cost-effective than a reactive strategy that waits for failures to occur.

The photograph below shows a typical O&M operations center where a monitoring engineer tracks system health in real time using a multi-screen dashboard, while a field technician simultaneously performs a cabinet inspection. This two-layer approach — remote monitoring plus scheduled field visits — is the recommended O&M model for underground security surveillance systems.

Figure 12.1: O&M Operations Center — Real-time monitoring dashboard (camera online status 48/50, storage utilization 67%, network health all green) with concurrent field cabinet inspection by maintenance technician.

12.2 Key Performance Indicators

The following KPIs define the operational targets for an underground security surveillance system. These targets should be included in the service level agreement (SLA) between the system operator and the maintenance contractor. KPI performance should be reviewed monthly and reported quarterly to the facility manager.

KPI	Target
Camera Availability	≥99% of cameras online at any time
Mean Time to Repair (MTTR)	<4 hours for critical zone failures
Recording Retention Compliance	100% of required retention days available
Storage Utilization	<70% to allow headroom for retention spikes
Network Latency	<10 ms round-trip on local LAN
Security Incidents (Cyber)	Zero successful unauthorized access events
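
The targets above lend themselves to an automated check at each monthly review. The following is a minimal Python sketch, not part of any VMS product: `KpiSnapshot`, its field names, and `evaluate_kpis` are hypothetical, and the sketch assumes the measurements have already been collected from the monitoring dashboard.

```python
from dataclasses import dataclass


@dataclass
class KpiSnapshot:
    """Measurements for one reporting period (hypothetical field names)."""
    cameras_online: int
    cameras_total: int
    mttr_hours: float                # critical-zone failures only
    retention_days_available: int
    retention_days_required: int
    storage_utilization_pct: float
    lan_rtt_ms: float                # round-trip time on the local LAN
    unauthorized_access_events: int


def evaluate_kpis(s: KpiSnapshot) -> dict:
    """Map each KPI name to True when its Section 12.2 target is met."""
    if s.cameras_total <= 0:
        raise ValueError("cameras_total must be positive")
    availability_pct = 100.0 * s.cameras_online / s.cameras_total
    return {
        "camera_availability": availability_pct >= 99.0,
        "mttr": s.mttr_hours < 4.0,
        "retention_compliance":
            s.retention_days_available >= s.retention_days_required,
        "storage_utilization": s.storage_utilization_pct < 70.0,
        "network_latency": s.lan_rtt_ms < 10.0,
        "cyber_incidents": s.unauthorized_access_events == 0,
    }


# Camera count and storage figure are taken from Figure 12.1;
# all other values are made-up placeholders for illustration.
snapshot = KpiSnapshot(
    cameras_online=48,
    cameras_total=50,
    mttr_hours=3.5,
    retention_days_available=30,
    retention_days_required=30,
    storage_utilization_pct=67.0,
    lan_rtt_ms=4.0,
    unauthorized_access_events=0,
)
failed = [name for name, ok in evaluate_kpis(snapshot).items() if not ok]
print(failed)  # ['camera_availability']
```

With 48 of 50 cameras online the availability is 96%, so this snapshot misses the ≥99% target even though storage utilization is within its limit.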

12.3 Preventive Maintenance Schedule

The preventive maintenance schedule defines the activities, frequency, and responsible party for all routine maintenance tasks. Preventive maintenance is the primary tool for maintaining system availability and extending equipment life in the harsh underground environment. All maintenance activities must be documented in the maintenance log, and any defects found must be raised as corrective maintenance work orders within 24 hours of discovery.

Frequency	Activity	Responsible	Documentation
Daily (Remote)	Check camera online status in VMS dashboard	Control room operator	Daily log entry
	Verify recording is active on all channels	Control room operator	Daily log entry
	Check storage utilization — alert if >80%	Control room operator	Daily log entry
	Review UPS battery status and alarm log	Control room operator	Daily log entry
Monthly (Field)	Clean camera dome covers with IPA wipe	Maintenance technician	Maintenance work order
	Inspect cable entries and glands for seal integrity	Maintenance technician	Maintenance work order
	Test UPS battery runtime under load	Maintenance technician	UPS test report
	Check cabinet temperature and humidity logs	Maintenance technician	Maintenance work order
Quarterly (Field)	Verify camera focus and coverage area	Maintenance technician	Quarterly inspection report
	Test motion detection in all zones	Maintenance technician	Quarterly inspection report
	Check firmware versions; apply updates if available	Network engineer	Firmware update log
	Review and rotate VMS user passwords	IT security	Password change log
Annual (Full)	Full acceptance re-test (all items from Chapter 10)	Project manager + client	Annual inspection report
	Replace UPS batteries if capacity <80% of rated	Maintenance technician	Battery replacement record
	Anti-corrosion inspection and touch-up of all metalwork	Maintenance technician	Annual inspection report
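
The daily remote checks and the annual battery rule in the schedule above are simple threshold tests, so they can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and parameters are hypothetical, and it assumes the readings are entered by the operator or exported from the VMS and UPS by some other means.

```python
def daily_remote_findings(cameras_online, cameras_total,
                          channels_recording, channels_total,
                          storage_pct, ups_alarms):
    """Return findings for the daily log; an empty list means all clear."""
    findings = []
    if cameras_online < cameras_total:
        findings.append(f"{cameras_total - cameras_online} camera(s) offline")
    if channels_recording < channels_total:
        missing = channels_total - channels_recording
        findings.append(f"{missing} channel(s) not recording")
    if storage_pct > 80.0:  # daily alert threshold from the schedule
        findings.append(f"storage utilization {storage_pct:.0f}% exceeds 80%")
    # Each UPS alarm text is passed through as its own finding.
    findings.extend(f"UPS alarm: {alarm}" for alarm in ups_alarms)
    return findings


def battery_needs_replacement(measured_capacity, rated_capacity):
    """Annual rule: replace when capacity is below 80% of rated."""
    return measured_capacity < 0.8 * rated_capacity


# Example readings (illustrative values, same units for both capacities).
todays_findings = daily_remote_findings(48, 50, 50, 50, 67.0, [])
replace_battery = battery_needs_replacement(7.0, 9.0)
```

Any non-empty result from `daily_remote_findings` would go into the daily log entry, and a defect would then be raised as a corrective work order within the 24-hour window.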

12.4 Incident Response Procedure

When a system alarm or failure is detected, the incident response procedure defines the steps needed to restore service within the target MTTR. The procedure distinguishes between critical incidents (affecting a complete zone or the entire system) and non-critical incidents (affecting individual cameras or non-essential functions). Critical incidents must be escalated immediately to the on-call engineer. Non-critical incidents can be queued for the next scheduled maintenance visit only if the affected camera count is below the threshold defined in the SLA; otherwise they follow the response and resolution targets for their priority level in the table below.

Incident Type	Example	Response Time	Escalation Path	Resolution Target
P1 — Critical	All cameras in a zone offline; VMS server down; complete recording failure	Acknowledge within 15 min; dispatch within 1 hour	Operator → On-call engineer → Facility manager	Restore within 4 hours
P2 — High	Multiple cameras offline (>10%); UPS alarm; storage >90%	Acknowledge within 30 min; dispatch within 4 hours	Operator → On-call engineer	Restore within 8 hours
P3 — Medium	Single camera offline; image quality degraded; motion detection false alarms	Acknowledge within 2 hours; schedule within 24 hours	Operator → Maintenance queue	Restore within 48 hours
P4 — Low	Firmware update available; label damaged; minor cable management issue	Acknowledge within 24 hours	Maintenance queue	Resolve at next scheduled visit
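
The priority table above can be expressed as a small classification routine with deadline arithmetic. The Python sketch below is one possible encoding, not a prescribed implementation: the `classify` and `deadlines` functions and their parameter names are hypothetical, and the rule order (P1 checked first, then P2, then P3) is an assumption about how overlapping conditions should be resolved.

```python
from datetime import datetime, timedelta
from typing import NamedTuple, Optional


class Sla(NamedTuple):
    acknowledge: timedelta
    dispatch: Optional[timedelta]   # for P3 this is the scheduling window
    restore: Optional[timedelta]    # None: resolve at next scheduled visit


# Time limits copied from the incident table.
SLA = {
    "P1": Sla(timedelta(minutes=15), timedelta(hours=1), timedelta(hours=4)),
    "P2": Sla(timedelta(minutes=30), timedelta(hours=4), timedelta(hours=8)),
    "P3": Sla(timedelta(hours=2), timedelta(hours=24), timedelta(hours=48)),
    "P4": Sla(timedelta(hours=24), None, None),
}


def classify(*, zone_offline=False, vms_down=False, recording_failed=False,
             cameras_offline=0, cameras_total=1, ups_alarm=False,
             storage_pct=0.0, image_degraded=False):
    """Return the priority label for a set of observed conditions."""
    if zone_offline or vms_down or recording_failed:
        return "P1"
    offline_share = cameras_offline / cameras_total
    if offline_share > 0.10 or ups_alarm or storage_pct > 90.0:
        return "P2"
    if cameras_offline >= 1 or image_degraded:
        return "P3"
    return "P4"


def deadlines(priority, detected_at):
    """Return absolute deadlines; None where the table sets no time limit."""
    return {
        step: (detected_at + limit if limit is not None else None)
        for step, limit in SLA[priority]._asdict().items()
    }


detected = datetime(2024, 1, 1, 8, 0)       # example timestamp
priority = classify(cameras_offline=6, cameras_total=50)  # 12% offline
due = deadlines(priority, detected)
```

In this example 6 of 50 cameras (12%) are offline, which exceeds the 10% threshold, so the incident is classified as P2 and must be restored by 16:00 on the same day.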

Spare Parts Readiness: To meet the MTTR targets above, a minimum spare parts inventory (as defined in Chapter 8) must be maintained on-site or at a nearby depot. Without spares on hand, even a simple camera replacement can take days while parts are ordered. Review the spare parts inventory quarterly and replenish any consumed items within 30 days.