12.1 O&M Overview

Effective operations and maintenance (O&M) of an underground security surveillance system requires a combination of remote monitoring, scheduled preventive maintenance, and a well-defined incident response procedure. The harsh underground environment — high humidity, temperature cycling, vibration, and limited access — means that equipment degrades faster than in surface installations, and that access for repairs is more time-consuming and costly. A proactive O&M strategy that catches problems early through remote monitoring and scheduled inspections is far more cost-effective than a reactive strategy that waits for failures to occur.

The photograph below shows a typical O&M operations center where a monitoring engineer tracks system health in real time using a multi-screen dashboard, while a field technician simultaneously performs a cabinet inspection. This two-layer approach — remote monitoring plus scheduled field visits — is the recommended O&M model for underground security surveillance systems.

Figure 12.1: O&M Operations Center — Real-time monitoring dashboard (camera online status 48/50, storage utilization 67%, network health all green) with concurrent field cabinet inspection by maintenance technician.

12.2 Key Performance Indicators

The following KPIs define the operational targets for an underground security surveillance system. These targets should be included in the service level agreement (SLA) between the system operator and the maintenance contractor. KPI performance should be reviewed monthly and reported quarterly to the facility manager.

| KPI | Target |
|---|---|
| Camera Availability | ≥99% of cameras online at any time |
| Mean Time to Repair (MTTR) | <4 hours for critical zone failures |
| Recording Retention Compliance | 100% of required retention days available |
| Storage Utilization | <70% to allow headroom for retention spikes |
| Network Latency | <10 ms round-trip on local LAN |
| Security Incidents (Cyber) | Zero successful unauthorized access events |
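
The KPI targets above lend themselves to an automated monthly check. The following is a minimal sketch, not part of any particular VMS product: the `KpiSnapshot` class, its field names, and the `evaluate_kpis` function are hypothetical, and only the threshold values are taken from the targets listed above.

```python
from dataclasses import dataclass


@dataclass
class KpiSnapshot:
    """One reporting-period snapshot; all field names are illustrative."""
    cameras_online: int
    cameras_total: int
    storage_used_pct: float        # percent of total capacity in use
    retention_days_available: int
    retention_days_required: int
    lan_round_trip_ms: float
    critical_mttr_hours: float     # mean repair time, critical zone failures
    unauthorized_access_events: int


def evaluate_kpis(s: KpiSnapshot) -> dict[str, bool]:
    """Return True for each KPI whose target from this section is met."""
    availability_pct = 100.0 * s.cameras_online / s.cameras_total
    return {
        "camera_availability": availability_pct >= 99.0,
        "mttr": s.critical_mttr_hours < 4.0,
        "retention_compliance":
            s.retention_days_available >= s.retention_days_required,
        "storage_utilization": s.storage_used_pct < 70.0,
        "network_latency": s.lan_round_trip_ms < 10.0,
        "cyber_incidents": s.unauthorized_access_events == 0,
    }


# Camera count and storage figure are the dashboard values in Figure 12.1;
# every other value is made up for the example.
snapshot = KpiSnapshot(
    cameras_online=48,
    cameras_total=50,
    storage_used_pct=67.0,
    retention_days_available=30,
    retention_days_required=30,
    lan_round_trip_ms=3.2,
    critical_mttr_hours=2.5,
    unauthorized_access_events=0,
)
failed = [name for name, ok in evaluate_kpis(snapshot).items() if not ok]
print(failed)
```

With the Figure 12.1 values, 48 of 50 cameras online is 96% availability, so `camera_availability` is the only KPI reported as missed; storage at 67% stays inside the <70% target.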

12.3 Preventive Maintenance Schedule

The preventive maintenance schedule defines the activities, frequency, and responsible party for all routine maintenance tasks. Preventive maintenance is the primary tool for maintaining system availability and extending equipment life in the harsh underground environment. All maintenance activities must be documented in the maintenance log, and any defects found must be raised as corrective maintenance work orders within 24 hours of discovery.

| Frequency | Activity | Responsible | Documentation |
|---|---|---|---|
| Daily (Remote) | Check camera online status in VMS dashboard | Control room operator | Daily log entry |
| Daily (Remote) | Verify recording is active on all channels | Control room operator | Daily log entry |
| Daily (Remote) | Check storage utilization; alert if >80% | Control room operator | Daily log entry |
| Daily (Remote) | Review UPS battery status and alarm log | Control room operator | Daily log entry |
| Monthly (Field) | Clean camera dome covers with IPA wipe | Maintenance technician | Maintenance work order |
| Monthly (Field) | Inspect cable entries and glands for seal integrity | Maintenance technician | Maintenance work order |
| Monthly (Field) | Test UPS battery runtime under load | Maintenance technician | UPS test report |
| Monthly (Field) | Check cabinet temperature and humidity logs | Maintenance technician | Maintenance work order |
| Quarterly (Field) | Verify camera focus and coverage area | Maintenance technician | Quarterly inspection report |
| Quarterly (Field) | Test motion detection in all zones | Maintenance technician | Quarterly inspection report |
| Quarterly (Field) | Check firmware versions; apply updates if available | Network engineer | Firmware update log |
| Quarterly (Field) | Review and rotate VMS user passwords | IT security | Password change log |
| Annual (Full) | Full acceptance re-test (all items from Chapter 10) | Project manager + client | Annual inspection report |
| Annual (Full) | Replace UPS batteries if capacity <80% of rated | Maintenance technician | Battery replacement record |
| Annual (Full) | Anti-corrosion inspection and touch-up of all metalwork | Maintenance technician | Annual inspection report |
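
Several rules in this schedule are simple threshold checks: the daily storage alert above 80%, the annual battery replacement below 80% of rated capacity, and the 24-hour window for raising a corrective work order. The sketch below shows one way these might be encoded. All function and parameter names are hypothetical, and the inputs are assumed to come from whatever VMS or UPS reporting the site already has.

```python
from datetime import datetime, timedelta

STORAGE_ALERT_PCT = 80.0          # daily check: alert if storage > 80%
BATTERY_REPLACE_FRACTION = 0.80   # annual check: replace below 80% of rated
WORK_ORDER_WINDOW = timedelta(hours=24)  # defects raised within 24 hours


def daily_remote_findings(offline_cameras, idle_channels,
                          storage_pct, ups_alarms):
    """Collect the findings an operator would note in the daily log."""
    findings = []
    if offline_cameras:
        findings.append("cameras offline: " + ", ".join(sorted(offline_cameras)))
    if idle_channels:
        findings.append("not recording: " + ", ".join(sorted(idle_channels)))
    if storage_pct > STORAGE_ALERT_PCT:
        findings.append(f"storage at {storage_pct:.1f}% exceeds alert level")
    if ups_alarms:
        findings.append(f"{len(ups_alarms)} UPS alarm(s) to review")
    return findings


def battery_needs_replacement(measured_capacity_ah, rated_capacity_ah):
    """True when measured capacity has dropped below 80% of rated."""
    return measured_capacity_ah < BATTERY_REPLACE_FRACTION * rated_capacity_ah


def work_order_deadline(discovered_at):
    """Latest time by which a defect must be raised as a work order."""
    return discovered_at + WORK_ORDER_WINDOW


# Example inputs are invented for illustration.
todays_findings = daily_remote_findings(["cam-07"], [], 82.5, [])
for finding in todays_findings:
    print(finding)
print(work_order_deadline(datetime(2024, 1, 1, 9, 0)))
```

Keeping the thresholds as named constants makes it easier to align them with the values agreed in the SLA if those differ from the figures in this schedule.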

12.4 Incident Response Procedure

The incident response procedure defines the steps to take when a system alarm or failure is detected, so that service is restored within the target MTTR. The procedure distinguishes between critical incidents (affecting a complete zone or the entire system) and non-critical incidents (affecting individual cameras or non-essential functions), and the table below grades incidents into four priority levels, P1 to P4. Critical incidents must be escalated immediately to the on-call engineer. Non-critical incidents can be queued for the next scheduled maintenance visit, provided the number of affected cameras is below the threshold defined in the SLA.

| Incident Type | Example | Response Time | Escalation Path | Resolution Target |
|---|---|---|---|---|
| P1 (Critical) | All cameras in a zone offline; VMS server down; complete recording failure | Acknowledge within 15 min; dispatch within 1 hour | Operator → On-call engineer → Facility manager | Restore within 4 hours |
| P2 (High) | Multiple cameras offline (>10%); UPS alarm; storage >90% | Acknowledge within 30 min; dispatch within 4 hours | Operator → On-call engineer | Restore within 8 hours |
| P3 (Medium) | Single camera offline; image quality degraded; motion detection false alarms | Acknowledge within 2 hours; schedule within 24 hours | Operator → Maintenance queue | Restore within 48 hours |
| P4 (Low) | Firmware update available; label damaged; minor cable management issue | Acknowledge within 24 hours | Maintenance queue | Resolve at next scheduled visit |
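
The priority levels and response times above can be turned into concrete deadlines once an incident is detected. The following sketch is illustrative only: `classify`, `deadlines`, and the `Sla` record are hypothetical names, and treating any offline share of 10% or less as P3 is an assumption, since the table only gives "single camera" and ">10%" as examples.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Sla:
    acknowledge: timedelta
    dispatch_or_schedule: Optional[timedelta]  # None: no time limit stated
    restore: Optional[timedelta]               # None: next scheduled visit


# Time limits copied from the incident table in this section.
SLA = {
    "P1": Sla(timedelta(minutes=15), timedelta(hours=1), timedelta(hours=4)),
    "P2": Sla(timedelta(minutes=30), timedelta(hours=4), timedelta(hours=8)),
    "P3": Sla(timedelta(hours=2), timedelta(hours=24), timedelta(hours=48)),
    "P4": Sla(timedelta(hours=24), None, None),
}


def classify(*, zone_or_server_down: bool, offline_fraction: float,
             ups_alarm: bool, storage_pct: float) -> str:
    """Map observed symptoms to a priority level."""
    if zone_or_server_down:
        return "P1"
    if offline_fraction > 0.10 or ups_alarm or storage_pct > 90.0:
        return "P2"
    if offline_fraction > 0.0:
        # Assumption: any offline share up to 10% is handled as P3.
        return "P3"
    return "P4"


def deadlines(priority: str, detected_at: datetime) -> dict:
    """Return the due time for each SLA step, or None where none applies."""
    sla = SLA[priority]

    def due(limit: Optional[timedelta]) -> Optional[datetime]:
        return detected_at + limit if limit is not None else None

    return {
        "acknowledge": due(sla.acknowledge),
        "dispatch_or_schedule": due(sla.dispatch_or_schedule),
        "restore": due(sla.restore),
    }


# Example: 6 of 50 cameras offline (12%), detected at 08:00.
priority = classify(zone_or_server_down=False, offline_fraction=6 / 50,
                    ups_alarm=False, storage_pct=67.0)
print(priority, deadlines(priority, datetime(2024, 1, 1, 8, 0)))
```

Checking the most severe condition first means an incident that matches several rows is always assigned the highest applicable priority.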

Spare Parts Readiness: To meet the MTTR targets above, a minimum spare parts inventory (as defined in Chapter 8) must be maintained on-site or at a nearby depot. Without spare parts, even a simple camera replacement can take days if parts must be ordered. Review spare parts inventory quarterly and replenish any consumed items within 30 days.