Why does my Network Management Card report warmstart/coldstart or network interface restart/coldstart?



Issue

My UPS Network Management Card or Network Management Card-enabled product is reporting one or more of the following alarms: System: Warmstart, System: Coldstart, System: Network Interface restarted, System: Network Interface coldstarted.


Product Lines
 
  • Web/SNMP Card - AP9606
    Devices with an embedded Web/SNMP Card include (but are not limited to): Environmental Monitoring Unit 1 (AP9312TH)
     
  • Network Management Card 1 (NMC1) - AP9617, AP9618, AP9619
    Devices with an embedded Network Management Card 1 include (but are not limited to): Metered/Switched Rack PDUs (AP78XX, AP79XX), Rack Automatic Transfer Switches (AP77XX), Environmental Monitoring Units (AP9320, AP9340, NetBotz 200)
     
  • Network Management Card 2 (NMC2) - AP9630/AP9630CH, AP9631/AP9631CH, AP9635/AP9635CH
    Devices with an embedded Network Management Card 2 include (but are not limited to): 2G Metered/Switched Rack PDUs (AP84XX, AP86XX, AP88XX, AP89XX), certain Audio/Video Network Management Enabled products.

 
Environment
  • All serial numbers
  • All firmware versions

     
Note: In APC Operating System (AOS) v5.1.5 and higher, the critical alerts System: Warmstart and System: Coldstart have been renamed to the informational alerts System: Network Interface restarted and System: Network Interface coldstarted, respectively.
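
Because the same restart can appear under either the old or the new name depending on the AOS version, it can help to treat both names as one event type when reviewing logs from mixed firmware. The following is a minimal sketch, not an APC tool: the four alarm names come from this article, while the function name and the "cold"/"warm" labels are assumptions made for illustration.

```python
# Alarm names as listed in this article, mapped to a normalized restart type.
# The "cold"/"warm" labels are illustrative, not APC terminology.
RESTART_EVENTS = {
    "system: coldstart": "cold",
    "system: network interface coldstarted": "cold",
    "system: warmstart": "warm",
    "system: network interface restarted": "warm",
}


def classify_restart(event_text):
    """Return "cold", "warm", or None if the text is not a restart alarm."""
    text = event_text.strip().lower()
    for name, kind in RESTART_EVENTS.items():
        if text.startswith(name):
            return kind
    return None
```

With this mapping, a log line reading System: Warmstart and one reading System: Network Interface restarted are both counted as a warm restart.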


Cause


A System Coldstart event indicates that the NMC (Network Management Card) has just been powered on and has completed startup.
  • Power interruption to the NMC - If a device powering the NMC suffers an interruption of power, the NMC will restart when power is reapplied to that device.

A System Warmstart (or restart) event indicates that the NMC has rebooted and completed startup without losing power. This may be due to any of the conditions listed below.
  • The default gateway is wrong, or network traffic is too heavy and the gateway cannot be reached.
    The default gateway can be the IP address of any valid node on the Network Management Card's network. The Management Card implements internal watchdog mechanisms to protect itself from becoming inaccessible over the network. For example, if the Management Card does not receive any network traffic for 9.5 minutes (either direct traffic, such as SNMP, or broadcast traffic, such as an Address Resolution Protocol [ARP] request), it assumes that there is a problem with its network interface and restarts to prevent further problems. To ensure that the Management Card does not restart if the network is quiet for 9.5 minutes, the Management Card attempts to contact the default gateway every 4.5 minutes. If the gateway is present, it responds to the Management Card, and that response restarts the 9.5-minute timer. If your application does not require or have a gateway, specify the IP address of a computer that is running on the network most of the time and is on the same subnet. The network traffic of that computer will restart the 9.5-minute timer frequently enough to prevent the Management Card from restarting.
  • FTP upload completion - After a new AOS or Application firmware upgrade has been uploaded to the NMC, the NMC will automatically reboot and run the new firmware.
  • Logout if settings are changed - Modification of some NMC settings will require a reboot of the NMC.
  • Reset Switch - If the Reset button on the front panel of the NMC is pressed, the NMC will reboot immediately.
  • Web Interface Reboot request - One of the NMC options for Reboot was selected in the Web User Interface.
  • Network settings have changed - At least one of the TCP/IP settings changed. The system will need to reboot for the settings to take effect.
  • Restart SNMP Agent (SNMP v1 or v3) - A request to restart the current SNMP agent service was received. Most SNMP configuration changes require an SNMP agent service restart to apply the change.
  • Load and Execute SNMP Agent Service (SNMP v1 or v3) - An internal request to load and execute a new SNMP agent service was received. Enabling or disabling SNMP v1 or v3 will require a restart to apply the change.
  • Clear Network and Start SNMP Agent (SNMP v1 or v3) - A request to clear the NMC's network settings and restart the SNMP agent service was received.
  • Smart-UPS Output Voltage Change - Some Smart-UPS models allow the output voltage to be changed. This requires a reboot of the NMC firmware.
     
  • Remote Monitoring Service (RMS) communication has been lost (NMC2 only) - If RMS communication is lost for a certain period of time, you may see several System: RMS failed to communicate with the server events followed by a Warmstart/Network Interface Restart message in the NMC2 event log. This is normal: the NMC2 reboots in this situation as a failsafe in case the communication problem is internal to the NMC2. On NMC2, the debug.txt or dump.txt file in the /dbg folder of the card's file system, if present, can help confirm this cause.
  • Detected Firmware Error - The NMC detected an internal firmware error, and the NMC firmware explicitly reboots itself as a failsafe to clear the error. On NMC2, the debug.txt or dump.txt file in the /dbg folder of the card's file system, if present, can help confirm this cause.
  • Unrecoverable Firmware Error - An undetected firmware error occurred and the hardware watchdog rebooted the NMC to clear the error. On NMC2, the debug.txt or dump.txt file in the /dbg folder of the card's file system, if present, can help confirm this cause.
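
The network watchdog described in the first cause above can be illustrated with a small model. This is a simplified sketch rather than the card's actual firmware logic: only the 9.5-minute and 4.5-minute values and the same-subnet advice come from this article, and the function names, parameters, and the assumption that every gateway poll is answered instantly are hypothetical.

```python
import ipaddress

WATCHDOG_SECONDS = 9.5 * 60      # restart if no traffic is seen for this long
GATEWAY_POLL_SECONDS = 4.5 * 60  # interval between gateway contact attempts


def simulate_watchdog(traffic_times, gateway_reachable, duration):
    """Return the time (seconds) of the first watchdog restart, or None.

    traffic_times: times at which other network traffic reaches the card.
    gateway_reachable: whether the gateway answers each poll (assumed constant).
    duration: length of the simulated period in seconds.
    """
    arrivals = set(traffic_times)
    if gateway_reachable:
        # Each answered poll counts as received traffic.
        t = GATEWAY_POLL_SECONDS
        while t <= duration:
            arrivals.add(t)
            t += GATEWAY_POLL_SECONDS
    last_seen = 0.0
    for t in sorted(arrivals):
        if t > duration:
            break
        if t - last_seen >= WATCHDOG_SECONDS:
            return last_seen + WATCHDOG_SECONDS
        last_seen = t
    if duration - last_seen >= WATCHDOG_SECONDS:
        return last_seen + WATCHDOG_SECONDS
    return None


def same_subnet(card_ip, subnet_mask, candidate_ip):
    """Check whether candidate_ip lies on the card's IPv4 subnet."""
    network = ipaddress.ip_network(f"{card_ip}/{subnet_mask}", strict=False)
    return ipaddress.ip_address(candidate_ip) in network
```

In this model, a reachable gateway alone keeps the timer from expiring on an otherwise silent network, while an unreachable gateway on a silent network leads to a restart 9.5 minutes after the last received traffic. The same_subnet helper is one way to sanity-check a stand-in gateway address before configuring it.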

 
Resolution
 
Follow the instructions in article FA156131 to download all available log files for your product. These are event.txt, data.txt, and config.ini for NMC1 and NMC2, plus debug.txt and dump.txt for NMC2 only.

For NMC1:
 
  • Review the event.txt file to see if any of the causes listed above could be why your Network Management Card has restarted or coldstarted.
  • Is this affecting more than one Network Management Card in your environment? This may point to a network traffic issue (too much, too little, or "bad" packets) causing the Management Card to reboot due to the watchdog mechanism outlined above.
  • Note the frequency of the events in question. Can you pinpoint it to a certain time/certain set of events before and after?
    • If the warmstarts/network interface restarts are always the same amount of time apart, this may relate to a network traffic issue as well.
  • Depending on what you find, try rebooting your card's interface or resetting the card to defaults (after backing up your configuration). See if the issue persists.
  • Contact APC support at any time for additional options and for help determining the root cause of a warmstart/restart/coldstart. Note that these events do not necessarily indicate a problem and affect only the Network Management Card's interface; your UPS or device load is unaffected. Have your log files (event.txt, data.txt, config.ini) available for tech support to review.
 
 
For NMC2:
  • Review the event.txt file to see if any of the causes listed above is why your Network Management Card has restarted or coldstarted. 
  • Is this affecting more than one Network Management Card in your environment? This may point to a network traffic issue (too much, too little, or "bad" packets) causing the Management Card to reboot due to the watchdog mechanism outlined above.
  • Note the frequency of the events in question. Can you pinpoint it to a certain time/certain set of events before and after?
    • If the warmstarts/network interface restarts are always the same amount of time apart, this may relate to a network traffic issue as well.
  • Depending on what you find, consider rebooting your card's interface or resetting the card to defaults (only after backing up your configuration and obtaining the aforementioned log files). See if the issue persists.
  • Contact APC support for additional options and for help determining the root cause of a warmstart/restart/coldstart. Note that these events do not necessarily indicate a problem and affect only the Network Management Card's interface; your UPS or device load is unaffected. Have your log files available for tech support to review. The event.txt, debug.txt, and dump.txt files are the most important because they make the situation much easier to debug. The debug.txt and dump.txt files are not available on NMC1 products.
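
To check whether restarts are evenly spaced, as suggested in the review steps above for both NMC1 and NMC2, the intervals between restart events in event.txt can be computed. The sketch below rests on assumptions that should be verified against your own file: it treats each event line as tab-separated with the date in the first field and the time in the second, uses a MM/DD/YYYY date format, and matches events by the alarm names from this article. The function names are hypothetical.

```python
from datetime import datetime
from statistics import mean, pstdev

# Substrings of the warm-restart alarm names used in this article.
RESTART_MARKERS = ("warmstart", "network interface restarted")


def restart_times(lines, fmt="%m/%d/%Y %H:%M:%S"):
    """Collect timestamps of restart events from event log lines.

    Assumes tab-separated lines: date, time, then the event text.
    Lines that do not fit this layout are skipped.
    """
    times = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue
        if not any(marker in line.lower() for marker in RESTART_MARKERS):
            continue
        try:
            times.append(datetime.strptime(fields[0] + " " + fields[1], fmt))
        except ValueError:
            continue  # header or differently formatted line
    return sorted(times)


def interval_summary(times):
    """Return (mean, standard deviation) of gaps in seconds, or None."""
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    if len(gaps) < 2:
        return None
    return mean(gaps), pstdev(gaps)
```

A standard deviation that is small relative to the mean suggests the restarts are regularly spaced, which this article associates with a possible network traffic issue. For a real file, something like interval_summary(restart_times(open("event.txt"))) would apply it, adjusting fmt if your card uses a different date format.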
  